About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Wednesday, January 13, 2010

Data Quality Dimensions – Completeness

    Completeness refers to the extent there are missing data in a data set, fact reflected in the number of the missing values, also referred as empty or Null “values” (simpler Nulls), and/or in the number of missing records.

    The missing values are typically considered in report to mandatory attributes, attributes that need a not Null value for each record, though after case might be applied to non-mandatory attributes (optional attributes) too, for example when is intended to understand whether the attributes are adequately maintained or not. It’s interesting that [1] considers also the inapplicable attributes referring to the attributes not applicable in certain scenarios (e.g. physical dimensions for service-based materials), which together with the applicable attributes can be considered as another type of categorization for attributes. Whether an attribute is mandatory is decided upon request and not necessarily upon the physical structure containing the attribute, in other words an attribute could be optional as per database schema and mandatory per business rules.

    Missing records can be a misleading term because it’s used in several contexts, however within data completeness context it refers only to the cases not covered by data integrity. For example in parent-child table relations the header data was entered though the detail data is missing, either not entered or deleted; such a case is not covered by data integrity because there is no missing reference, but just the parent without child data (1:n cardinality). A second such example occurs when the same entity is split across several tables at the same level of detail, though also here one of the tables must function as a parent, falling in the previous mentioned example (1:1 cardinality). Both examples are dealing with explicit data referents – the “parent” data, though there are cases in which the referents are implicit, for example when the data are not available for a certain time interval (e.g. period, day) even if needed, though also for this case the referents could be made explicit, falling in the previous mentioned examples. In such scenarios all the attributes corresponding to the missing records will be null.

    Normally the completeness of parent-child relations is enforced with the help of referential integrity and database transactions, a set of actions performed as a single unit of work, they allowing saving the parent data only if the child data were saved successfully, though such type of constraints is not always necessary.

[1] Loshin D. (2009). Master Data Management. Morgan Kaufmann OMG Press. ISBN 978-0-12-374225-4.

No comments: