18 January 2010

Data Management: Data Quality Dimensions (Part VI: Referential Integrity)

  
Data Management
Data Management Series

Referential integrity, when considered as data quality dimension, refers to the degree to which the values of a key in one table (aka reference value) match the values of a key in a related table (aka the referenced value). Typically, that's assured by design in Database Management Systems (DBMS) using a feature called referential integrity that defines an explicit relationship between the two tables that makes sure that the values remain valid during database changes. Thus, when a record is inserted or updated and a value is provided for the reference value, the system makes sure that the referenced value is valid, otherwise it throws a referential integrity error. A similar error is thrown when one attempts to delete the record with the referenced value as long is referenced by a table on which the relationship was explicitly defined.

Using referential integrity is a recommended technique for assuring the overall integrity of the data in a database, though there are also exceptions when that's not enforced for all tables (e.g. data warehouses) or only for exceptions (e.g. interface tables where records are imported as they are, attribute whose values references data from multiple tables). Therefore, even if there are tables with the referential integrity enforced, don't make the assumption that it applies to all tables!

In relational DBMS there are three types of integrity mentioned – entity, referential and domain integrity. Entity integrity demands that all the tables must have a primary key that contains no Null values. The referential integrity demands that each non-null value of a foreign key must match the value of a primary key [1], while the domain integrity demands that the type of an attribute should be restricted to a certain data type, the format should be restricted by using constraints, rules or range of possible values [2]. 

Even if not mandatory, all three types of integrity are quintessential for reliable relational databases. When the referential integrity is not enforced at database level or at least in code, when a record from a table is deleted and a foreign key it’s still pointing to it, fact that could lead to unexpected disappearance of records from the system’s UI even if the records are still available. 

During conversions or data migrations is important to assure that the various sets loaded match the referential and domain integrity of the database in which the data will be loaded, otherwise the records not respecting the mentioned type of integrity will be rejected. The rejection itself might not be a problem for several records, though when it happens at large scale, then the situations changes dramatically, especially when the system gives no adequate messages for the cause or rejection. A recommended approach is to assure that the scope is synchronized between the various data elements, and that the referential integrity of datasets is validated before the data are loaded in the destination database.

There are several sources (e.g. [3]) that consider Codd’s referential integrity constraint as a type of consistency, in the support of this idea could be mentioned the fact that referential integrity could be used to solve data consistency issues by bringing the various LOV in the systems. Referential integrity is mainly an architectural concept even if it involves the 'consistency' of foreign key/primary key pairs.

Note:
Expect the unforeseeable! It’s always a good idea to check whether the referential integrity is kept by a system – there are so many things that could go wrong! In data migration solutions, data warehouses and more general analytical solutions is a good idea to have in place mechanisms that check for this kind of issues.


Written: Jan-2010, Last Reviewed: Mar-2024

References:
[1] Halpin. T. (2001) Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design. Morgan Kaufmann Publishers. ISBN 1-55860-672-6.
[2] MSDN. 2009. Data Integrity. [Online] Available from: http://msdn.microsoft.com/en-us/library/ms184276.aspx (Accessed: 18 January 2009)
[3] Lee Y.W., Pipino L.L., Funk J.D., Wang R.Y. (2006) "Journey to Data Quality", MIT Press. ISBN: 0-262-12287-1

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.