About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Saturday, January 16, 2010

Levels of Detail in Reporting

    Working with data mainly from frontend (User Interface), in general Users have limited knowledge on how the data are physically stored in the various systems existing in an organization. When a new report is needed they point out the attributes they know from the screens they are working with, falling in developers’ duties to figure out whether the “soup of attributes” makes really sense and find a workable solution. Once the developer has identified the attributes and the tables they are stored in, he/she can go on and create the query/queries on which the report will be based upon. For this task is important to know how the various tables relate to each other, in other words knowing the attributes which can be used to link the tables or which is the relational path to link the table, and relations’ cardinality, which reflects how the number of records of a table or data set changes when is joined with another table or data set.

    Between two tables and extensively two data sets the relations could have any of the four types of cardinality – many-to-many (represented as m:n), one-to-one (1:1), one-to-many (1:n) and the reverse many-to-one (n:1). It could be given a definition for each of the cardinality types, though it’s easier to remember that in the x-to-y compounds, between two tables or data sets A and B, x has the value ‘one’ when the number of references from table B to any record from A is maximum 1 (there could be records not referenced at all), and ‘many’ when at least a record could be referenced more than once; the same logic applies for y but inversing the tables’ perspective.

    The level of detail (LOD) of a report is directly correlated with the changes in cardinality - by adding a data set to an existing data set, if the cardinality of one-to-many and many-to-many is implied then the level of detail changes too. In other words if a record in a given data set is referenced more than once in the new added table then the LOD changes. A simple example of change of LOD occurs in parent-child/master-details relations. For example if the Invoice Lines are added to a data set based on Invoice Header then the LOD changes from Invoice Headers to Invoice Lines. If the Payments are added to the resulted data set then the LOD changes from Invoice Lines to Payments only if there are multiple Payments for an Invoice (one-to-many or many-to-many cardinalities), otherwise LOD remains the same.

    Summarizing, the level of detail of a report is the lowest level at which a cardinality change occurs at entity level. It can be discussed about lower or higher LOD in relation to a given level of detail, for example Invoice Payments have a lower LOD than Invoice Lines, while Invoice Header a higher LOD.

    Why is it necessary to introduce the LOD especially when considering a relatively complex notion as cardinality?! It worth defining it mainly because the term is used when defining/mitigating reporting requirements, its use being intuitive rather than well-defined and understood. When creating reports it’s important to find the adequate LOD for a report and know what methods can be used in order to pull data that would normally change reports’ LOD without actually changing reports’ LOD.

    The limitations imposed by the need to report at a certain LOD higher than the one implied by the row data can be overcome with the help of aggregate functions used with GROUP BY constructs. Such constructs make it possible to bring into a report data found at lower LOD than the reporting LOD by grouping the data based on a set of attributes. The aggregate functions provide functionality to calculate for a set of values the maximum, minimum, sum, count of records, standard deviation, and other statistical values, all of them working on numeric data types, while maximum/minimum and count of records work also with dates or alphanumeric data types.

    For example using aggregate functions the Invoice amounts can be aggregated and shown at Invoice Header level, however, in contrast, it’s not possible to do the same with quantities, because typically each Invoice Line refers to a specific Product, and as apples can’t be counted with peaches, this means that in order to see the quantities, the level of detail has to be changed from Invoice Header to Invoice Line. The same technique could be applied to similar parent-child (master-details) relations covering one-to-many or one-to-one cardinality, and also many-to-many relations that involve higher reporting complexity.

    Direct many-to-many relations are seldom, involving mainly data sets, the attributes used in relation appearing more than once in each dataset (at least once). Taking an example from ERP world, there could be multiple Receipts and Invoices for the same Purchase Order, thus if there is no direct match between Receipts and Invoices to identify uniquely which Invoice belong to each Receipt, a report involving all the three entities could be barely usable, records being duplicated (e.g. 2 Invoices vs. 3 Receipts result in 6 records in a such scenario). For such reports to be usable the LOD needs to be change at a higher level at least on one side, thus reconsidering the mentioned example a report could show the Purchase Order details and aggregating the Invoice & Receipt Quantities/Amounts at Purchase Order level, or show detailed Invoices with aggregated Receipt information, respectively detailed Receipts with aggregated Invoice information.

    Unfortunately even if the aggregate functions are quite handy they have their own limitations, being difficult to use them in order to answer to questions like “what was the last product sold for each customer”, “which was latest invoice placed for each customer”, at least not without considerable effort from developer’s side. Fortunately database vendors implemented their own more specialized type of aggregate functions - analytic functions in Oracle and window functions in SQL Server, they allowing modeling such questions with a report. The downside for developers is that each database vendor comes with its own philosophy, so techniques and features working in one database might not work in another.

No comments: