About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Wednesday, February 03, 2010

Reports - Ways of Looking at Data – Part III

    In the previous posts on the same topic (see Part I and Part II) I was showing how the ranking window functions can be used in SQL Server in order to show the first/last record appearing in a certain partition. A broader set of reports are the ones focusing on the top n records matching a certain criteria, for example the top 10 Vendors/Products based on the amount purchased. Before the introduction of ranking window functions in SQL Server 2005 such requirements were previously addressed using the TOP clause. The following two queries are showing the top 10 Vendors, respectively the top 10 Vendors/Products based on the amount purchased, the first query focusing on the active Vendors while the second on all Vendors – now it depends on the requirements whether only the active or all Vendors are needed, therefore the Users should be always asked about this concern.
    In contrast the use of ranking/aggregated window functions allows more flexibility in selecting not only the first n Vendors, but any interval, thus being possible to use this feature as a simple pagination technique. Here is the first query modified.
    As for the second query the TOP clause allows only to select the top 10 Vendors/Products based on the amount purchased, not being possible to select each top n Vendors for each purchased Product as ranking functions allow. Here is the query for the top 3 Vendors for a Product.
    The logic for the current example could be reused in further analysis, therefore it makes sense to encapsulated it in a database object. A view would do, though if we would like to use the number of Vendors as parameter, then it’s more appropriate to use a table-valued function or a stored procedure for this. Even if in theory stored procedures provide better performance, if we like to include the above logic in further queries then creating a table-valued function is the best choice.
    Once the table-valued function was created we can go on and use it in other queries, for example here is a simple query showing maximum 5 vendors for each product.
    Most probably the Users used with the pivoting functionality of Excel would ask you if it’s possible to show all the Vendors together in one line. Starting with SQL Server 2005 Microsoft introduced the PIVOT operator though it can pivot only one attribute, fortunately there is an alternative using a GROUP BY together with the CASE function.
    The downside of this technique is that you can work only with a predefined number of Vendors per Product (here 3), on the other side there are few the cases in which more than 5 Vendors exist for the same Product. An alternative would be to process the data using a programming language as VBA, not a complicated thing to do at all. In almost 10 years of experience I found only a few cases in which the functionality provided by RDBMS needed to be extended with additional coding outside of the RDBMS (e.g. Excel, ASP), mainly when the data needed to be organized beyond the simple tabular structure.

No comments: