Showing posts with label MS Office. Show all posts
Showing posts with label MS Office. Show all posts

04 March 2024

Business Intelligence: A Software Engineer's Perspective VI (The Data Citizen)

Business Intelligence
Business Intelligence Series

More than a century ago, Jerbert G Wells wrote on mathematical literacy: "[...] the time may not be very remote when it will be understood that for complete initiation as an efficient citizen of one of the new great complex world-wide States that are now developing, it is as necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and write” [1]. The quote is occasionally misquoted as referring to Statistics, though frankly the boundaries of mathematical, statistical, numerical and data literacy tend to melt into each other, existing multiple dependencies between them.

In the age of big data, data citizens, business people able to use data, data processing and visualization tools for building solutions that enable their job, become steadily a necessity for businesses in their quest of making data-driven decisions, gaining insight and whatever valuable use data might have for the organizations. The need is not new,  Microsoft Access and Excel were used for similar purposes already in the 90s, becoming a maintenance nightmare for IT, data islands without proper backup or documentation existing through the organizations, diverse numbers being reported and contradicting each other. 

Then IT took over, trying to find alternatives for the data islands, implementing concepts like single source(s) of truth, quality gates and supporting processes, designing data models and infrastructures for self-service, allowing users to tap into the data for data exploration, discovery, reporting, etc. Getting all this right required to redesign existing infrastructures, making one step forward and a few steps back, in the end everything is a learning process. Such an effort can easily consume an organization's resources. 

Microsoft and other vendors for data-driven solutions keep insisting on how much potential exist in their tools for the data citizen, how the citizens can bring competitive advantage for organizations, automating business and supporting processes. The potential is not to neglect, though it requires a considerable investment from organizations in training and mentoring data citizens, in building data warehouses or data meshes that focus on end-user self-service needs. The data citizen needs time to learn, to play with the data, build solutions, test their usefulness in the daily tasks, respectively incorporate and disseminate the knowledge gained within the organization. 

There are many scenarios in which results can be obtained with a minimum of effort, however there are also hard limits. Besides the learning effort and the time available, there are cognitive, knowledge and ability limits that vary from person to person. Understanding what good architecture, design and techniques means is unfortunately not for everybody, and here's where the concept of citizen data analyst or citizen scientist breaks, and this independently of the tools used. There are also IT people who have similar challenges. 

It must be also recognized that the solutions built in the early stages by data citizens are primarily personal solutions that need to be reviewed and brought to the standards adopted by the organization. In time, it's expected to reduce considerably such effort by evolving data citizen's knowledge and skillset. Without this further work, the solutions built will tend to display some of the shortcomings of the solutions built on MS Access or Excel

The concept of data citizen can work as long the various assumptions and needs are adequately addressed, however progress will not happen overnight. The effort needs to become part of organization's long-term strategy, and the effort can be considerable for many organizations. Mentorship in terms of technical and non-technical support is needed. It's advisable to proceed in small iterative steps and integrate gradually the lessons learned.

Previous Post <<||>> Next Post

Resources:

[1] “Mankind in the Making”, by Herbert G Wells, 1903 [Source]

13 January 2021

MS Office: Excel for SQL Developers V (Formatting Output to HTML)

Some years back I found a tool to format the SQL and VB.Net code in posts (see hilite.me), tool which made blogging much easier, as I didn't had to format the code manually myself. However, showing the output of the queries into blog posts resumed mainly to making screenshots, which is unproductive and wastes space from my quota. Therefore, today I took the time to create a small Excel macro which allows formatting an MS Excel range to a HTML table. The macro is pretty basic, looping though range's cells:

'formats a range as html table
Public Function GetTable(rng As Range)
  Dim retval As String
  Dim i, j As Long
  
  retval = "<table style=""width: 90%;color:black;border-color:black;font-size:10px;"" border=""0"" cellpadding=""1"">"
  
  For i = 1 To rng.Rows.Count()
    retval = retval & "<tr style=""background-color:" & IIf(i = 1, "#b0c4de", "white") & ";font-weight:" & IIf(i = 1, "bold", "normal") & """>"
    
    For j = 1 To rng.Columns.Count()
       retval = retval & "<td align=""" & IIf(IsNumeric(rng.Cells(i, j)), "right", "left") & """>" & rng.Cells(i, j) & "</td>"
    Next
    
    retval = retval & "</tr>" & vbCrLf
  Next
  
  retval = retval & "</table>"
  
  GetTable = retval
End Function

Just copy the GetTable macro into a new module in Excel and provide the range with data as parameter.

 Unfortunately, copying macro's output to a text file introduces two double quotes where just one was supposed to be:

This requires as intermediary step to replace the two double quotes with one in Notepad (e.g. via the Replace functionality), respectively to remove manually the first and last double quotes. 

Notes:
1. Feel free to use and improve the macro. 
2. Further formatting can be added afterwards as seems fit. 

Happy coding!

06 November 2016

SQL Server Administration: System.OutOfMemoryException in SQL Server Management Studio and other 32-bit Drawbacks

    I was playing this week with a few datasets downloaded from the web on various topics, trying to torture the data until they’ll confess something. A few of the datasets were prepared for load into a MySQL database as individual INSERT INTO statements. They were containing between 100000 and a few millions of records. While looking at the big but slim datasets in SSMS (SQL Server Management Studio) and reconciling the differences between MySQL and SQL Server I got several times the System.OutOfMemoryException exception, SSMS crashing one or two times. That should be ok, given the number of records, though I was surprised that I got the same error message while executing the INSERT INTO statements for one of the smallest datasets which had about 300000 records:

    „An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown”

    Kb 2874903 brings some light into the topic – SSMS is still a 32-bit process and thus limited to 2GB of memory. The Kb offers three methods to avoid this issue. The first two, outputting the query results to text or file didn’t worked. The third method based on using sqlcmd utility worked smoothly with a syntax like the one below:
sqlcmd -i “<file_name.sql>” -d “<database name>”

    So it doesn’t matter that you’re having a supercomputer and that working with big datasets becomes a necessity nowadays, this limitation can make data loading just a little bit more complicated. On one side, it’s true that when dealing with such datasets is probably recommended to use directly sqlcmd to execute the scripts. On the other side, independently from this type of problem, even if understandable from the need of keeping backwards compatibility with 32-bit platforms/solutions, it’s hard to digest the fact that Microsoft keeps some of its products 32-bit based when SQL Server is targeting 64-bit platforms. One has same problem when using BIDS (Business Intelligence Development Studio), developing SSRS, SSIS or SSAS solutions under 32-bit and having maybe to deploy the code as 64-bit (e.g. SQL Server Agent). From my point of view most of the issues I had were when dealing with proprietary drivers like the ones for Oracle or even for MS Office. In addition in SSIS there could be features that are only available in 32-bit versions, or have limitations on 64-bit computers (see [5]). As it seems also the SQL Server Data Tools (SSDT) will have similar drawbacks…

   Anyway, sqlcmd utility saved the day with a minimum of overhead. Unfortunately it’s not always that easy to solvethe compatibility issues between 32-bit and 64-bit software and platforms.

Update 20.06.2017:
   One can synchronize the runtime version between BIDS and SQL Server Agent pretty easy. In BIDS under "Configuration Properties/Debug Option" at Project level, there is the “Run64BitRuntime” Property. Set to false it will run your package on 32-bit version. In a SQL Server Agent Package, there is the “Use 32 bit runtime” Checkbox under “Execution options” at step level. Checking this checkbox will run your package on 32-bit version.

   A hint that the two values might be out of synch is the following error message raised when running the package:
"Attempt to load Oracle client libraries threw BadImageFormatException. This problem will occur when running in 64 bit mode with the 32 bit Oracle client components installed."
Resources:
[1] Microsoft Support (2013) Kb 2874903: "System.OutOfMemoryException" exception when you execute a query in SQL Server Management Studio https://support.microsoft.com/en-us/kb/2874903
[2] MSDN (2016) SQL Server 2016: sqlcmd Utility https://msdn.microsoft.com/en-us/library/ms162773.aspx
[3] MSDN (2016) SQL Server 2016: Use the sqlcmd Utility https://msdn.microsoft.com/en-us/library/ms180944.aspx
[4] MSDN (2012) Introducing Business Intelligence Development Studio  https://msdn.microsoft.com/en-us/library/ms173767.aspx
[5] SQL Server 2008 R2: 64 bit Considerations for Integration Services https://technet.microsoft.com/en-us/library/ms141766(v=sql.105).aspx





02 March 2016

Business Intelligence: Self-Service BI (An Introduction)

Business Intelligence

Introduction


According to Gartner, the world's leading information technology research and advisory company, Self-Service BI (aka self-service analytics, ad-hoc analysis, personal analytics), for short SSBI, is a “form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support” [1].

Reading between the lines, SSBI presumes the existence of an infrastructure made of tools to support it (aka self-service BI tools), direct or indirect access to row data and/or data models for the users, and the skillset needed in order to work with data and answer to business problems/questions.

A Little History

The concept of self-service is not new, it just got “rebranded” and transformed into a business opportunity. The need for business users to perform ad-hoc analyses was always there in organizations, especially in the ones not having the right infrastructure for harnessing their data. Even since the 90s with the appearance of products like MS Excel or MS Access in many organizations users were forced by the state of art to learn how to use such products in order to get the answers they needed from the data. Users started building personal solutions, many of them temporary, intended to fill the reporting gaps organizations had. With a little effort and relatively small investment users had the possibility of playing with the data, understanding the data, identifying and solving problems in the business. They acquired thus a certain level of business expertise and data awareness becoming valuable resources in the organization.

With time such solutions grew in scope and data volume, gained broader visibility and reached deeper in organizations, some of them becoming team, departmental or cross-departmental solutions. What grows uncontrolled with time starts to have negative impact on the environment. First tools’ management became a problem because the solutions needed to be backed-up and maintained regularly, then other problems started to surface: security of data, inefficient data processing as increasing volumes of data were processed on local computers and transferred over the network, data and effort were duplicated, different versions of reality existed as different numbers were reported, numbers that were reflecting different definitions, knowledge about the business or data-analysis skillsets. The management needed a more consolidated and standardized effort in order to address these problems. Organizations were forced or embraced the idea of investing money in modern BI solutions, in more powerful servers capable of handling a larger amount of requests, in flexible data models that facilitate data consumption, in data quality initiatives. Thus through various projects a considerable number of such solutions were converted into more standardized and performant BI solutions, the IT department being in control of the changes and new requests.

Back to Present

With IT in control of the reporting requirements the business is forced to rely on the rapidity with which IT is able to address new requirements. Some organizations acquired internal resources in order to build reports and afferent infrastructure in-house, others created partnerships with vendors, or approached a combination of the two. As the volume of requirements isn’t uniform over time, the business has to wait several days between the time a requirement was addressed to IT and a solution was provided. In business terms a few of days of waiting for data can equate with the loss of an opportunity, a decision taken too late, decision that could have broader impact.

A few years ago things started to change when the ad-hoc analysis concept was rebranded as self-service and surfaced as trend. This time vendors like Qlik, Tableau, MicroStrategy or Microsoft, some of the main SSBI vendors, are offering easy to use and rich functionality tools for data integration, visualization and discovery, tools that reflect the advances made in graphics, data storage and processing technologies (e.g. in-memory databases, parallel processing). With just a few drag-and-drops users are able to display details, aggregate data, identify trends and correlations between data. In addition the tools can make use of the existing data models available in data warehouses, data marts and other types of data repositories, including the rich set of open data available on the web.

Looking at the Future

Like its predecessors, SSBI seems to address primarily data analysts and data-aware business users (aka data citizens), however in time is expected to be adopted by more organizations and become more mature where already adopted. Of course, some of the problems from the early days more likely will resurface though through governance, better architectures and tools, integration with other BI capabilities, trainings and awareness most of the problems will be overcome. More likely there will be also organizations in which SSBI will fail. In the end each organization will need to find by itself the value of SSBI.

Previous Post <<||>> Next Post

Resources:
[1] Gartner (2016) Self-Service Analytics [Online] Available from: http://www.gartner.com/it-glossary/self-service-analytics
[2
] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb

12 October 2010

MS Office: Why I (dis)like MS Access

   In the previous post, “The Limitations of MS Access Database”, I highlighted a few of the limitations of MS Access  as database, ignoring the other two or three important aspects – Access as development, reporting, respectively data analysis platform. In this post I’ll retake the topic from a general and personal perspective, considering some of the features that I like and makes from Access a useful and powerful tool, attempting also to mention some of its usage limitations I run into over the time. With the risk of repeating myself, I can’t say I’m an expert in MS Access even if I provided several solutions based on it, its use in the various contexts being not always so inspired, that being one of the reasons why in the first post “Is MS Access or MS Excel the Answer to Your Problem?” I insisted on this aspect.

Ad-Hoc Database

   MS Access is a file server-based relational database, being one of the most used databases, though it can’t be compared with more mature RDBMS like Oracle, MySQL, SQL Server, Sybase, PostgreSQL, Teradata, Informix or DB2, richer in features, especially in what concerns their administration, transactional and concurrent processing, scalability, stability, availability, performance, reliability, portability, replication, integration, security, manageability, extensibility, the degree to which they fit in the overall architecture of an enterprise, of relevance being topics like Business Intelligence, Data Warehousing, SOA, Cloud Computing, etc. These are some of the reasons for which I categorized Access as a Personal or Ad-Hoc database, being, at least from my point of view, more appropriate for small-size or personal solutions. In essence Access has the characteristics of a relational database though the lack in the mentioned features makes it less desirable. Nobody denies Access’ usefulness, the point is that when compared with full-featured RDBMS Access has no chance, fact reflected also in the following market share diagrams:

MarketShare  gartner-database-deployment[1]
DBMS Market 2006 (JoinVision e-Services via[2]) DBMS Market 2008 (Gartner via MySQL) [1]

     Even if the diagrams are a few years old, I think they are still representative in what concerns the state of art in the world of databases, the first diagram providing an historical perspective, while the second the “actual” and “future” reflected tendencies. It’s not the first time I’m seeing MS Access and SQL Server represented together even if they belong to different technology stacks, Access’ strength and weakness being deeply rooted in its affiliation to MS Office set of tools. It would be interesting to know which was the ratio then between the number of Access and SQL Servers, and what’s the ratio now, SQL Server Express replacing Access’ role of personal or small-scale database.

    The statistics are less representative when it comes to people, their interests and immediate needs. The bottom line is that Access is an easy to use database with pretty low learning curve, you don’t need to know the fancy stuff about databases, you could experiment and learn it as an add-on to your job, making the consumption of data much easier, at least in theory. Are you having your data stored across several Excel files? You can import or copy paste them in Access and there you have an ad-hoc database, then create several queries on top of them with the Query Designer or Query Designer, and this without any knowledge of SQL. The saved queries could be reused much as the views, they could be parameterized, the parameters could be bounded much like the user-defined functions, and made available for further consumption. I can’t say I met any other similar software tool that simplifies so much the design and consumption of databases. The simplicity of Access query designing comes with its tribute, especially when you want to achieve more from your database, the minimum of features making difficult to design complex queries, Access requiring a different mindset in problem solving. In addition, those used with the rich features of RDBMS won’t feel too comfortable in using the Query Designer or Editor, the ANSI syntax it’s inflexible while the troubleshooting quite painful.

    I used Access as database only when I had no other alternative, preferring to store the data in a RDBMS like SQL Server or Oracle. In exchange I used Access as presentation layer, allowing users for example to access and analyze the data. In many occasions I played with Access databases in small projects or enhancing existing applications, spending many hours in tweaking Access queries or on porting such queries to other RDBMS. I had the occasion to work with several tools that were using Access as backend, one of them IQ Insight, used to assess the quality of data, was an interesting tool to work with though it was paying tribute to the stability and speed of its database, in a next implementation project deciding to take it out of the landscape, the performance of VB + SQL Server solution that replaced it, increased the performance from a matter of hours to minutes. I know that many people out there love Access as database, though once you acknowledged the performance power and flexibility of other databases, you don’t feel like returning in the past.

Data Analysis Tool

    When having multiple Excel or other data sources, you don’t need to store your data in Access itself, it’s enough to link your text, Excel or any other ODBC data source, built a query on top of them, and there you have on the fly your data at your disposal, something that Data Warehousing and Business Intelligence tools hardly manage to do when considering all the people’s needs. By importing the data in your Access database, you could even correct some of the inherent issues existing in data, use some mappings in order to translate the data, use several queries in order to aggregate the data at the needed level of detail or get new insights. From a mapping table or a query to creating a whole data analysis framework is just a small step, and this without too much involvement of the IT guys. Even more, the framework could be used by your colleagues too, they could use it directly or indirectly by re-linking the results of your analysis with a minimum of effort, they could even improve the character of your analysis or find other purposes for the data. Thus results a complex network of interconnected Access databases, and that’s a matter of time until it’s getting out of control, for example by not knowing how a change in one of the queries could impact the other known and unknown users of your data, on whether you are using the actual data, on whether the data have been tempered, and so on. There should be no wonder when people are arriving to report different numbers, when the numbers don’t tie together, though also more modern reporting frameworks are dealing with these types of issues, isn’t it? In addition, you arrive to have multiple instances of the same data or have data distributed and isolated in a uncontrolled fashion, not the best strategy for an enterprise though…

   I used Access as data access end point for data available in various data sources, allowing users to analyze and recombine the data by themselves, but this mainly in order to overcome the limitations of available standard reporting tools. This combined with the fact that has been attempted to move most of the logic created by users in a standardized form, limiting the risks of running into Access fallacies. Sure, there could be done more in order to avoid such pitfalls, for example having adequate reporting and data analysis tools, having in place a Data Management Policy which addresses common data problems, training users, etc.

Reporting

    The possibility to present the data in a reporting-like fashion is one of the greatest advantages of Access, the tabular structure being easy to integrate with charting, paging, results breaking, formulas, filtering/parameterization, rich formatting, subreports and other types of report structures (e.g. footer, header), in other words the ingredients of a typical report. The combination between ad-hoc data analysis and reporting,  it quite an advantage, depending on users’ skills in making most out of it. Reports’ functionality could be extended using Reports’ DOM and VBA, only the fact that a report could be entirely created and modified at runtime is quite of a deal.

   I used Access reports only in the applications which were built entirely on MS Access, whenever was possible preferring to move the reports on more standardized platforms. Sometimes I find it more useful to export the data directly to Excel or to a more portable format like PDF, thing also possible with Access reports, though eliminating thus the intermediary platform. Now it depends on users’ preferences and organizations’ infrastructure.

Rapid Prototyping

    Access could be used as frontend for various types of applications, and you don’t need to put too much effort in your application. Is enough to drop a form and link it to a table, then link the screens together and here you have an already functioning application, fact that makes from Access a tool ideal for rapid prototyping.

   I used Access in several projects for building proof of concept prototypes, allowing customers to gather requirements, evaluate the concept and the available functionality. There were also cases in which the prototypes were comparable as performance with the applications that replaced them, from some points of view even better, though that’s a matter of architecture, skills and sometimes infrastructure.

Extensibility: VBA

    A person could create in Access a data analysis framework, a report or a prototype without writing a single line of code, richer functionality being available by using VBA, which is nothing more than old-fashioned VB based on Access’ DOM. VBA extensibility refers here to the possibility of going beyond the wizarding and drag-and-drop functionality provided by Access, for example by adding complex validation into forms, linking forms, altering or creating content at runtime, etc. Not everybody needs to do go so far, however those who used formulas or have some programming experience would find VBA easy to learn. Those wanting to change the default behavior of Access or provide missing functionality then they will have to go deeper in VBA’s secrets, in using built-in or third party developed libraries. For example in order to change the “sequential” access of data provided by Access a programmer will have to use ADO or DAO, the built-in transactional functionality provided in the two libraries could be used to cover the lack of transactional processing not built-in in Access. With some exceptions, in theory you could do with VBA anything you do with old fashioned VB, though with VB.Net the gap to VBA increased considerably ( see Converting Code from VBA to Visual Basic .NET for differences). There are also some limitations, for example the adding of controls in Access forms at runtime, and I remember I found a few other with time, some of them deriving from bugs existing in the tool itself. 

Extensibility: .DLLs

   I was saying that it’s possible to use third party developed libraries in Access, this functionality relying on COM+ and its predecessors DCOM, COM or ActiveX, technologies that allow the communication between components not only on the local computer but also in distributed networks or internet as in the case of ActiveX. In this way it’s possible to encapsulate functionality in libraries saved as .dlls, distribute them with your applications or reuse them in other applications. Writing COM classes is a job for programming languages like C++, VB, VB.Net, C#, etc. The old-fashioned VB was great in creating and debugging COM components in just a question of minutes, in theory any piece of code could be encapsulated in such a component. Having the possibility to extend the functionality of MS Access with such libraries open the door to an unlimited number of architectural scenarios.

Extensibility: Add-ins

   Add-ins are forming a special type of components rooted in OLE, later based on COM, that use the MS Office DOM architecture, their primary utility relying in the fact that they make it possible to provide new features for MS Office itself. An example of such “bonus” features are Save as PDF add-in for Access 2007 or Open Database Connectivity add-in for Excel. I used Add-ins only to extend Excel’s UI-based functionality, therefore I can’t talk too much about their use in Access. For more see Building COM Add-ins for Office Applications material available in MSDN.

Database Templates

   I observed that in MS Access 2007 are available several templates (e.g. Assets, Contacts, Sales pipeline, etc.) that could be extended or used in learning how an application is designed. Doing a little research I found out that is possible to create templates for whole databases, reports or forms. I haven’t use templates until now in Access, but it could prove to be an interesting feature when common architectural or functional characteristics are found.

References:

[1] MySQL. (2010). Market Share. [Online] Available from: http://www.mysql.com/why-mysql/marketshare/ (Accessed: 10 October 2010)

[2] Creative System Design. (2010) Databases. [Online] Available from: http://online.creativesystemdesigns.com/projects/databases.asp (Accessed: 10 October 2010)

05 October 2010

MS Office: The Limitations of MS Access Database

In the previous post I was highlighting some general considerations on the use of MS Access and Excel as frameworks for building applications. I left many things out from the lack of time and space, therefore, as the title reveals, in this post I will focus simply on the limitations of MS Access considered as Database. I considered then that Access is a fairly good as database, recommending it for 10-20 concurrent users, fact that could equate, after case, maybe with a total of users that range between 1-100. Of course, this doesn’t mean that MS Access can’t do more, actually it supports 255 concurrent users and with a good design that limit could be reached.

Another important limitation regards the size of an Access database, set to 2GB, it used to be more than sufficient a few years back, though nowadays, it’s sometimes the equivalent of a month/year of transactions. I never tried to count how many records could store a MS Access, though if I remember correctly, a relatively small to average table of 1000000 (10^6) records occupies about 100MB, using this logic 2GB could equate with about 20000000 (2*10^7) records, the equivalent of a small to average database size. Anyway, the numbers are relative, the actual size depends also on the number of objects the database stores, the size of attributes stored, on the fact that even if Access is supposed to have a limitation of 2GB, I met cases in which a database of 1GB was crashing a lot, needing to be repaired or backed up regularly. 

Sometimes it could be repaired, other times not, unfortunately the “recovery” built within a MS Access can’t be compared with the recovery available in a RDBMS. That’s ok in the end, even mature databases crash from time to time, though the logs and transaction isolation models allow them to provide high recoverability and reliability, to which adds up scalability, availability, security and manageability. If all these are not essential for your database solution, the MS Access is ok, though you’ll have to invest effort in each of these area when you have to raise your standards.

One of the most painful issues when dealing with concurrent data access is the transaction processing that needs to guarantee the consistency and recoverability of operations. As Access is not handling the transactions, the programmer has to do that using ADO or DAO transactions. As many applications still don’t need pessimistic concurrency, with some effort and a good row versioning also this issue could be solved. Also the security-related issues could be solved programmatically by designing a role-based permission framework, though it occasionally it could be breached when the user is aware of the few Access hacks and has direct access to the database. 

Manageability resumes usually in controlling resources utilization, monitoring the progress of the actions running on the database. If Access is doing a relatively good job in what concerns the manageability of its objects, it has no reliable way to control their utilization, when a query is running for too long, the easiest way to solve this is to coldly kill the process belonging to Access. Not sure if it makes sense to philosophy about Access’ scalability and availability, at least can’t be comparable from this point of view with RDBMS for which failover clustering, mirroring, log shipping, online backup and in general online maintenance have an important impact on the two.

Excepting the above theoretical limitations, when MS Access is part of your solution, it’s always a good idea to know its maximal capacity specifications, this applying to all type of databases or technologies.  Most probably you won’t want that in the middle of your project or even later you realize that you reach one of such limitations. I tried to put together a comparison between the maximal capacity specifications for 2000, 2007 and 2010 versions of MS Access and, for reference, the same specification for SQL Server (2000, 2005, 2008 R2). The respective information come mainly from Microsoft websites, with a few additions from [5] and [6].


MS Access
SQL Server
Attribute
2000 [1]
2007/2010 [2]
2000 [7]
2005 [4]
2008 R2 [3]
 SQL statements size
64kb
64kb
64kb
64kb
64kb
# characters in Memo field
65535
65535
-
2^30-1
2^31-1
# characters in Text field
255
255
8000
8000
8000
# characters in object name
64
64
128
128
128
# characters in record
4000
4000
8000
8000
8000
# concurrent users
255

255


32767
# databases per instance
1
1
32767
32767
32767
# fields in index
10
10
16
16
16
# fields in recordset
255
255
4096
4096
4096
# fields in table
255
255
1024
1024
1024/30000
# files per database
1
1
32767
32767
32767
# forced relationships per table
32
32
253
253
253
# indexes per table
32
32
250 (1 clustered)
250 (1 clustered)
250 (1 clustered)
# instances


16
50
50
# joins in a query
16
16
32
32
32
# levels nested queries
50
50
32
32
32
# nested subqueries


32
32
32
# objects
32768
32768
2147483647
<>
</>
2147483647
2147483647
# open tables
2048
2048
2147483647
2147483647
2147483647
# roles per database
n/a
n/a
16379
16379
16379
# tables in a query
32
32
256
256
256
# users per database
n/a
n/a
16379
16379
16379
database size
<2GB
<2GB
1048516 TB
542272TB
542272TB
file size (data)
2GB
2GB
32TB
16TB
16TB
file size (log)
n/a
n/a
32TB
2TB
2TB


For my surprise the maximal capacity specifications of Access are comparable with the ones of SQL Server for many of the above attributes. Sure, there is a huge difference in what concerns the number of databases, the database/file size and the number of supported objects, quite relevant in the architecture of applications. Several other differences, for example the number of indexes supported per table or relationships per table, are less important for the majority of solutions. Another fact that is not remarked in the above table is the fact that the number of records in a table are typically limited by storage. Please note that many important features not available in Access were left out, therefore, for a better overview is advisable to check directly the referenced sources.

There are two one more personal observations for this post. Even if MS Access is great for non-SQL developers giving its nice Designer, for SQL developers it lacks a rich editor, the initial formatting being lost, this doubled by the poor support for later versions of the ANSI standard make from Access a tool to avoid.

References:
[1] Microsoft. 2010. Microsoft Access database specifications. [Online] Available form:
http://office.microsoft.com/en-us/access-help/access-specifications-HP005186808.aspx (Accessed: 04.10.2010)
[2] Microsoft. 2010. Access 2010 specifications [Online] Available form: http://office.microsoft.com/en-us/access-help/access-2010-specifications-HA010341462.aspx (Accessed: 04.10.2010)
[3] MSDN. (2010). Maximum Capacity Specifications for SQL Server: SQL Server 2008 R2. [Online] Available form: http://msdn.microsoft.com/en-us/library/ms143432.aspx (Accessed: 04.10.2010)
[4] MSDN. (2010). Maximum Capacity Specifications for SQL Server: SQL Server 2005. [Online] Available form: http://msdn.microsoft.com/en-us/library/ms143432(SQL.90).aspx (Accessed: 04.10.2010)
[5] SQL Server Helper. (2005). SQL Server 2005: Maximum Capacity Specifications. [Online] Available form: http://www.sql-server-helper.com/sql-server-2005/maximum-capacity-specifications.aspx (Accessed: 04.10.2010)
[6] MSDN. (2008).SQL 2005 and SQL 2008 database volume capacity. [Online] Available form: http://social.msdn.microsoft.com/forums/en-US/sqlgetstarted/thread/4225734e-e480-4b21-8cd4-4228ca2abf55/ (Accessed: 04.10.2010)
[7] MSDN. (2010). Maximum Capacity Specifications for SQL Server: SQL Server 2000. [Online] Available form: http://technet.microsoft.com/en-us/library/aa274604(SQL.80).aspx (Accessed: 04.10.2010)
[8] MSDN. (2010). Comparison of Microsoft Access SQL and ANSI SQL. [Online] Available form: http://msdn.microsoft.com/en-us/library/bb208890.aspx (Accessed: 04.10.2010)

02 October 2010

Business Intelligence: Is MS Access or Excel the Answer to your Problems?

Business Intelligence
Business Intelligence Series

Introduction 

That’s one of the questions that followed me for years, quite often being asked by customers to provide a MS Access or MS Excel solution as an answer to a business need. The beauty of this question is that there is no right answer and, as I stressed out in several occasions, there is not always a straightforward answer to such a question in IT, the feasibility of an IT solution relying on many variables formulated typically in term of business and IT requirements. 

When a customer is requesting to built a MS Access or Excel solution outside of Office paradigm, I’m kind of circumspect, and this not because they are not great tools, but because they are not adequate for all purposes. I even recommend the two for personal or for small-scale solutions, though their applicability should stop right there.

A personal solution is an application developed for personal use, for example to store and maintain the data for a report, to process data automatically or any other attempt of automating some tasks. By small-scale solutions I’m referring to the following types of applications: 
- applications of basic to average complexity, that don’t require complex design or could be developed by a developer with average skills.
- applications that target a small number of users, usually a small group of max 10-20 concurrent users, it may be occasionally a whole department or it could be cross departmental as long the previous mentioned condition are met.

A Short Review 
 
MS Excel is the perfect tool for storing non-relational tabular data, manipulating data manually or with the help of formulas, doing data analysis with pivoting and charting, or of querying various data sources. Its extensibility based on its DOM (Document Object Model), VBA (Visual Basic for Applications) and its IDE (Integrated Development Environment), Forms, add-ins, in-house or third-party developed libraries, the template and wizard-based approach, make from Excel a powerful development environment. I would say that Excel’s weakness resides in its intrinsic design, the DOM model which lacks a rich event model, in the fact that Excel is mainly a tool for data entry, analysis and reporting, the other types of functionality coming on a secondary plan. Excepting a few new features built in Excel itself, the important new functionality comes as add-on – SQL Server-based data mining add-in, MS Sharepoint Server-based Web Services features like multiuser collaboration, slicer and a few other.

The extensibility capabilities mentioned above are not only a particularity of Excel but apply to the whole Office family: Access, Word, Outlook, Powerpoint, and even Visio if is considered the “extended family”, each of them with its role. Access’ role is that of flexible relational data storage, querying and reporting solution, its strength relying mainly in the easiness of providing a simple UI (User Interface) for maintaining and navigating the data, in the easiness of pulling data from various sources for further analysis. As in the case of Excel, Access’ weakness resides in its DOM, in the fact that it’s not a full RDBMS (Relational Database Management System) and all the consequences deriving from it.

Programming for the Masses/Citizens
 
The great thing about VBA is that also non-developers could successfully adventure in developing Office-based applications, the possibility of learning from the code built with “Record Macro” functionality allowing a small learning curve. Enabling “non-developers” to built applications makes from Office a powerful and altogether dangerous tool because such applications could be easily misused. Misused here refers to the fact that often is attempted to built in Excel or Access complex applications that sooner or later break apart under their complexity, that organizations arrive to have a multitude of such applications with no control over their existence, maintenance, security, etc. 

Unfortunately the downsides of such applications are discovered late in the process, when intended functionality is not available, thus arriving to reinvent the wheel, patch up functionality in a jumble, in a tumble. With some hard-work you could achieve the alike functionality as the one available in powerful frameworks like .Net, WPF, WCF or Silverlight, to mention the Microsoft technologies I’m somewhat acquainted to. VBA is great but with time became less powerful than VB, C# or C++ (the comparison between VBA and C++ is a little forced), to mention the most important programming languages for writing managed code in .Net. The barriers between the capabilities of the two types of programming languages are somehow broken by the possibility of developing add-ins and libraries for MS Office or of using Office DOM in .Net applications, though few (non-) programmers adventure on this path.

The Architectural Perspective 
 
There is another important architectural perspective – separating the data storage and eventually data processing from presentation. Also when using Access or Excel the data storage could be separated from presentation, though I’ve seen few solutions doing that, the three layers coexisting usually within the same tire. An Access solution could be split in two, one for database and other for UI and processing, allowing more flexibility in what concerns the architecture, security, version management, etc. 

Access is good for data presentation and rapid prototyping, though the concept and the data controls are quite old, having several limitations when compared with similar controls available for example in .Net. The advantage of using simple drag-and-drop or wizards in Access is for long over, the same functionality existing also in Visual Studio (Express), environment in which applications could be built with drag-and-drop and wizards too, in plus taking advantage of additional built-in features. The database layer could be replaced with a full RDBMS, same as the presentation layer could be replaced with a .Net UI. It’s not much easier then to built the architecture around the .Net UI and a RDBMS?!
 
Excel is considered by many as a (relational) database, is it really so? It’s true the data could be stored in tabular format in which a sheet plays the role of a table and queryable through the various drivers available, though no primary key is available, less control over the data entered and many other features available in RDBMS need to be provided programmatically, again reinventing the wheel. Same as in the case of Access, Excel could be considered for data storage and presentation, its functionality being reduced when compared with the one of Access. 

Many people are used with the data entry mechanism available in Excel, especially in what concerns data manipulation, wanting similar functionality in other tools. If this was Excels’ advantage some time ago, that’s no more valid, several rich data grids offering similar data entry functionality which, with some effort, could simulate to an acceptable degree the functionality of Excel, and they could provide also richer validation functionality.

It’s all about Costs 
 
In the past MS Excel and Access were quite cheap as "development platforms" when compared with the purchasing of existing IDE, especially when we consider their extensibility through VBA and IDE’s availability, thus the functionality vs. extensibility favorable ratio. Recently were introduced express (aka community) versions of powerful IDEs for Visual Studio, respectively open source IDE and development frameworks that provide rich capabilities, the report of forces changed dramatically in the favor of the later. 

Today you could put together a small-scale application with a minimum of investment, making sometimes obsolete the use of Office tools outside of the Office solutions. The pool of software tools and technologies changed in the past years considerable, but the mentality in what concerns the IT infrastructure and software development changed less. It’s true that sometimes organizations lack the resources who could architect and design such solutions, relying mainly on external resources, or being much easier to rely on an employee’s programming skills who knows “exactly” what's needed and it would be in theory much easier in order to attempt solving a problem directly rather than writing the requirements down. 

In VBA’s advantage comes also the fact that normally software solutions evolve and need to be changed in order to reflect business or philosophy changes, being much easier to introduce such changes directly by the employee who built the application in contrast with starting a whole project for this purpose. This aspect is rooted in other perspective – sometimes organizations ignore the software needs, falling in employees attribution to find cheap and fast ways of automating tasks in particular, solving work-related problems in general, Excel or Access being quite handy for this purpose. Sure, you can do almost anything also in Excel/Access but with what costs?

The Strategic Context 
 
Several times I heard people talking about replacing the collection of Excel sheets with an Access solution. I know that in the absence of adequate solutions people arrive to store various types of data in Excel sheets, duplicating data, loosing the control over versions, data quality, making data unsecure/unavailable or un-processable. Without a good data management and infrastructure strategy the situation doesn’t change significantly by using an Access solution. 

It’s true that the data could be easier stored in a global place, some validation could result in better data quality, while security, availability and data maintainability could suffer some improvements too, however the gain is insignificant when compared with the capabilities of a full-featured RDBMS. Even if a company doesn’t have the resources to invest in a mature RDBMS like Oracle or SQL Server, there are also the Express versions for the respective databases, several other free solutions existing on the market especially in the area of open source. On the other side it’s true that MS Access, through its easy to use SQL Designer, allows people building queries with simple drag-and-drops and limited SQL knowledge, though its value is relative.

Talking about data management strategy, it concerns mainly the data quality as a function of its 6 main dimensions (accuracy, conformity, consistency, completeness, duplicates, referential integration) to which add data actuality, accessibility, security, relevance, usability, and so on. The main problem with personal solutions is that they lead to data and logic duplication, and even when such solutions are consolidated in one form or another, their consolidation and integration is quite complex because you have to consider not only the various designs but also the overall requirements from a higher perspective. On the other side it’s difficult to satisfy the needs of all the people in an organization, in a form or another, duplication of data being inevitable, with direct or indirect implications on data quality. It is required some effort and a good strategy in what concerns these aspects, finding the balance between the various requirements and the number of solutions to satisfy them.

Reformulating the Question

How can we determine which tool or set of tools is appropriate for our problem? Normally the answer to this question depends on the needed functionality. The hard road in answering this question is to identify all the requirements, the features available in the various tools, weight both of them, and decide what worth best. Unfortunately that’s not an easy task, it need to be considered not only actual but also future requirements, organization’s strategy, and whatever might come around. 

Reports, best practices, lessons learned or other type of succinct content might help as well in taking a decision without going too deep in analyzing features and requirements thoroughly. Sometimes a gut feeling might work as well, especially when comes from a person with experience in the field. Other times you don’t have too many options – time, resources, knowledge, IT infrastructure, philosophy or politics reducing your area of maneuverability/decision. In the end we learn by doing, by fighting with the constraints and problems we have, hopefully we learn also from our or others’ mistakes…

PS: Even if I’m having several good cumulated years in developing solutions based on Excel and Access, and I can’t pretend that I know their full potential, especially when judged from the perspective of the new features introduced with Excel 2007 or 2010, even more when considering their integration with SharePoint, SQL Server or other similar platforms. The various software tools or platforms existing on the market allow people to mix functionality theoretically in unlimited ways, the separation of functionality between layers, SaaS (software as a service) and data meshes changing the way we program and perceive software development.

Previous Post <<||>> Next Post

27 June 2010

Market Review: What’s New in Microsoft World II

Microsoft Office - Cloud Computing is the Word

    Two weeks ago, on 15th of June 2010, Microsoft Office was shipped together with Visio and Project 2010, closing the cycle of releases started with SQL Server 2008 R2, Visual Studio 2010, Sharepoint 2010 (all 3 shipped in April 2010) and Windows Azure (available also in April). The words that describe/unite at best these software tools is cloud computing and collaboration, why that? First we have to consider Azure, the new product from Windows’ portfolio, a framework for cloud computing and SaaS (Software as a Service) architectures, and composed of 3 components, namely Windows Azure which allows running applications and accessing data in the cloud, SQL Azure Database  provides data services in the cloud, while Windows Azure platform AppFabric allows the communication between the applications residing in the cloud. Also MS Office 2010 is part of Microsoft’s strategy toward cloud computing, the weight falling on SharePoint 2010, a business collaboration platform that together with the other MS Office tools allow to manage information, automate and manage business processes, facilitate decision making process, etc. A cornerstone of the framework is the co-authoring tool that “allows multiple people to work on a single copy of a document at the same time or at different times, seamlessly, whether they are online or offline”. As it seems are provided also “community features that allows users to share data as they do on Twitter and Facebook”, a step toward social computing. Microsoft plans to offer an online version of Office 2010, called Office Web Apps (OWA), supposed to be also a competitor for Google Docs.

    There are also people who question the steps done by Microsoft toward cloud computing, but in the end is important to establish the software infrastructure in which cloud computing-based applications could be developed, futures that don’t exist currently could appear in future versions or could be provided by third-party vendors.

    Microsoft comes also with some unpleasant surprises, as it seems Microsoft’s SharePoint Server runs only on 64-bit hardware and requires also a 64 bit SQL Server edition, and this could be quite an important constraint for many customers. The most unpleasant surprise is that Microsoft renounces to the well-known upgrade schema, the reason for that, as mentioned in Ars Technica quoting a Microsoft spokesman, from the need to simplify the product lineup and pricing, based on “partner and customer feedback” (I’m sorry but I can’t really buy that!). The same source expects that upgrades will be available with promotions, after Office’s launch. The only promotion I heard of is Microsoft Office 2010 Technology Guarantee program but if refers only to the customers who “purchased, installed, and activated a qualifying Microsoft Office 2007 product between March 5, 2010, and September 30, 2010”, they being eligible to download Office 2010 at no additional cost. How about the ones who bought a Microsoft Office 2007 copy in 2010 but before 5th of March (like I did)?!

Microsoft TechEd North America Sessions are Online

    The Microsoft TechEd North America sessions held in New Orleans were made available online (video and slides), an opportunity for technical professionals to get an overview on the new advancements in Microsoft technologies, being approached topics related to the various platforms of Windows, MS Office, Dynamics, Web, Cloud Computing & Online Services, etc. I really like the way Microsoft makes its technologies available to the public, especially the fact that it provides also Express versions of their software, allowing newbies and developers to get acquainted and use essential basic functionality. The MSDN, TechNet, webcasts, Channel9  and community and personal blogs bring the technical and non-technical closer to the company and its technologies.

16 March 2010

MS Office: Excel for SQL Developers IV (Differences Between Two Datasets)

    One of the tasks which appears from time to time on my table is to determine the differences between two datasets coming from different systems or from the same system/source but taken at different points in time. Even if it’s fairly simple to write such a query, especially when the number of attributes is quite small, there is lot of repetitive work that needs to be done because typically for each set of matched attributes needs to be added a third attribute (difference flag) showing whether there is a difference or not. In time I started to use Excel formulas to create the SELECT clause for such a query, matching tables’ attribute first and categorizing the data types mainly in 4 categories: text, amounts, numeric (other than amounts) and dates. This categorization is requested first of all by the different ways in handling the NULL values for each data type, and secondly by the difference between amounts and numeric, even if they might have the same data type, the difference resides in the fact that amounts might be rounded differently in each data source, therefore a difference of a few cents will not be considered as difference.

    Given two attributes ColumnX and ColumnY from tables A, respectively B, let’s look first on how the difference flag constraint could be written for each category:

--text attributes: 
CASE 
    WHEN IsNull(A.ColumnX , '')<> ISNULL(B.ColumnY, '') THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--amount attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) - IsNull(B.ColumnY, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--numeric attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) <> IsNull(B.ColumnY, 0) THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--date attributes: 
CASE 
   WHEN IsNull(DateDiff(d, A.ColumnX, B.ColumnY), -1)<>0 THEN 'Y' 
   ELSE 'N' 
END DiffColumnXYFlag 


Notes:
1.   Bit attributes can be treated as numeric as long as they are considered as having a bi-state, for tri-state values in which also NULL is considered as a distinct value then the constraint must be changed, the most natural way being to translate the NULL to –1:
CASE
    WHEN IsNull(A.ColumnX, -1) <> IsNull(B.ColumnY, -1) THEN 'Y'
    ELSE 'N'

END
DiffColumnXYFlag

2.   In most of the examples I worked with the difference between two pair dates, the difference was calculated at day level, though it might happen that is needed to compare the values at smaller time intervals to the order of hours, minutes or seconds. The only thing that needs to be changed then is the first parameter from DateDiff function. There could be also situations in which a difference of several seconds is acceptable, a BETWEEN operator could be used then as per the case of numeric vs. amount values.
3.    In case one of the attributes is missing, the corresponding difference flag could take directly the value ‘N’ or ‘n/a’.
4.   It could happen that there are mismatches between the attributes’ data type, in this case at least one of them must be converted to a form that could be used in further processing.

    Thus a macro for this purpose would take as input a range with the list of attributes from the two tables, the data type category and the columns participating in the join constraint, two parameters designating the name of the left and right table participating in the FULL OUTER JOIN, the time interval considered, the error margin value interval (e.g. [-e, e]) and a flag indicating whether to show all combined data or only the records for which there is at least a difference found. 

Function GetComparisonQuery(ByVal rng As Range, ByVal LeftTable As String, ByVal RightTable As String, ByVal TimeInterval As String, ByVal ErrorMargin As String, ByVal ShowOnlyDifferencesFlag As Boolean) As String 
'builds the code for a comparison query between two tables 
Dim attributes As String 
Dim differences As String 
Dim constraint As String 
Dim whereConstraints As String 
Dim joinConstraints As String 
Dim columnX As String 
Dim columnY As String 
Dim index As Integer 
For index = 1 To rng.Rows.Count 
    columnX = Trim(rng.Cells(index, 1).Value) 
    columnY = Trim(rng.Cells(index, 2).Value) 
  
    If Len(columnX) > 0 Or Len(columnY) > 0 Then 
       If Len(columnX) > 0 Then 
            attributes = attributes & ", A." & columnX & " LT" & columnX & vbCrLf 
       End If 
       If Len(columnY) > 0 Then 
            attributes = attributes & ", B." & columnY & " RT" & columnX & vbCrLf 
       End If 
       
       constraint = "" 
       If Len(Trim(rng.Cells(index, 4).Value)) = 0 Then 
            If Len(columnX) > 0 And Len(columnY) > 0 Then 
                 'creating the difference flag 
                 Select Case Trim(rng.Cells(index, 3).Value) 
                 Case "text": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , '') <> IsNUll(B." & columnY & ", '') THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "amount": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) - IsNUll(B." & columnY & ", 0) NOT BETWEEN -" & ErrorMargin & " AND " & ErrorMargin & " THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "numeric": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) <> IsNUll(B." & columnY & ", 0) THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "date": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN DateDiff(" & TimeInterval & ", A." & columnX & ", B." & columnY & ")<>0 THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case Else: 'error 
                     MsgBox "Incorrect data type provided for " & index & " row!", vbCritical 
                 End Select 
                 
                
                If ShowOnlyDifferencesFlag Then 
                   whereConstraints = whereConstraints & " OR " & constraint & " = 'Y'" & vbCrLf 
                End If 
                
                differences = differences & ", " & constraint & " Diff" & columnX & "Flag" & vbCrLf 
            Else 
                differences = differences & ", 'n/a' Diff" & IIf(Len(columnX) > 0, columnX, columnY) & "Flag" & vbCrLf 
            End If 
                             
        Else 
            joinConstraints = joinConstraints & "    AND A." & columnX & " = B." & columnY & vbCrLf 
        End If 
     
     End If 
Next 
If Len(attributes) > 0 Then 
    attributes = Right(attributes, Len(attributes) - 2) 
End If 
If Len(joinConstraints) > 0 Then 
    joinConstraints = Right(joinConstraints, Len(joinConstraints) - 8) 
End If 
If Len(whereConstraints) > 0 Then 
    whereConstraints = Right(whereConstraints, Len(whereConstraints) - 4) 
End If 
'building the comparison query 
GetComparisonQuery = "SELECT " & attributes & _ 
    differences & _ 
    "FROM " & LeftTable & " A" & vbCrLf & _ 
    "     FULL OUTER JOIN " & RightTable & " B" & vbCrLf & _ 
    "       ON " & joinConstraints & _ 
    IIf(ShowOnlyDifferencesFlag And Len(whereConstraints) > 0, "WHERE " & whereConstraints, "") 
   
End Function 
 

Excel - Comparison Datasets

    The query returned by the macro for the above example based on attributes from Production.Product table from AdventureWorks database and Production.Products table created in Saving Data With Stored Procedures post: 
 
SELECT A.ProductID LTProductID 
, B.ProductID RTProductID 
, A.Name LTName 
, B.Name RTName 
, A.ProductNumber LTProductNumber 
, B.ProductNumber RTProductNumber 
, A.MakeFlag LTMakeFlag 
, B.MakeFlag RTMakeFlag 
, A.FinishedGoodsFlag LTFinishedGoodsFlag 
, B.FinishedGoodsFlag RTFinishedGoodsFlag 
, A.Color LTColor 
, B.Color RTColor 
, A.SafetyStockLevel LTSafetyStockLevel 
, B.SafetyStockLevel RTSafetyStockLevel 
, A.ReorderPoint LTReorderPoint 
, A.StandardCost LTStandardCost 
, B.StandardCost RTStandardCost 
, A.ListPrice LTListPrice 
, B.ListPrice RTListPrice 
, A.Size LTSize 
, A.DaysToManufacture LTDaysToManufacture 
, A.ProductLine LTProductLine 
, A.Class LTClass 
, A.Style LTStyle 
, A.SellStartDate LTSellStartDate 
, B.StartDate RTSellStartDate 
, A.SellEndDate LTSellEndDate 
, B.EndDate RTSellEndDate 
, A.DiscontinuedDate LTDiscontinuedDate 
, B.CreationDate RT 
, B.CreatedBy RT 
, A.ModifiedDate LTModifiedDate 
, B.ModifiedDate RTModifiedDate 
, B.ModifiedBy RT 
, CASE 
    WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
    ELSE 'N' 
END DiffNameFlag 
, CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
    ELSE 'N' 
END DiffProductNumberFlag 
, CASE 
    WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffMakeFlagFlag 
, CASE 
    WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffFinishedGoodsFlagFlag 
, CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
    ELSE 'N' 
END DiffColorFlag 
, CASE 
    WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
    ELSE 'N' 
END DiffSafetyStockLevelFlag 
, 'n/a' DiffReorderPointFlag 
, CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05       
     THEN 'Y' 
ELSE 'N' 
END DiffStandardCostFlag 
, CASE 
     WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
     ELSE 'N' 
END DiffListPriceFlag 
, 'n/a' DiffSizeFlag 
, 'n/a' DiffDaysToManufactureFlag 
, 'n/a' DiffProductLineFlag 
, 'n/a' DiffClassFlag 
, 'n/a' DiffStyleFlag 
, CASE 
    WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellStartDateFlag 
, CASE 
    WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellEndDateFlag 
, 'n/a' DiffDiscontinuedDateFlag 
, 'n/a' DiffCreationDateFlag 
, 'n/a' DiffCreatedByFlag 
, CASE 
     WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
     ELSE 'N' 
END DiffModifiedDateFlag 
, 'n/a' DiffModifiedByFlag 
FROM Production.Product A 
    FULL OUTER JOIN Production.Products B 
       ON A.ProductID = B.ProductID 
WHERE CASE 
     WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05 
      THEN 'Y' 
ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
       WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 


Notes:
1.   The macro doesn’t consider an ORDER BY clause, though it could be easily added manually
2.   Not all of the join constraints are so simple so that they can be reduced to one or more simple equalities, on the other side we have to consider that the most time consuming task is listing the attributes and the difference flags.
3.    Sometimes it’s easier to create two extracts – in the first being considered all the records from the left table and the matches from the right table (left join), respectively all the records from the right table and the matches from the left table (right join).
4.     Given the fact that the attributes participating in the join clause should in theory match, each pair of such attributes could be merged in one attribute using the formula: IsNull(A.ColumnX, B.ColumnY) As ColumnXY.
5.     In order to show all the data from the two tables and not only the differences, all is needed to do is to change the value of the last parameter from true to false:
=GetComparisonQuery(A2:D23, "Production.Product", "Production.Products", "d", "0.05", false) 
6.     For TimeInterval parameter should be provided only the values taken by DatePart parameter (first parameter) of SQL Server’s DateDiff function.
7.     Please note that no validation is made for the input parameters.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.