18 January 2010

Data Quality Dimensions – Referential Integrity

    In relational databases’ world there are three types of integrity mentioned – entity, referential and domain integrity. Entity integrity demands that all the tables must have a primary key that contains no Null values, the referential integrity demands that each non-null value of a foreign key must match the value of a primary key [1], while the domain integrity demands that the type of an attribute should be restricted to a certain data type, the format should be restricted by using constraints, rules or range of possible values [2]. Even if not mandatory, all three types of integrity are quintessential for reliable relational databases, with a plus for referential integrity. When the referential integrity is not enforced at database level or at least in code, then it might happen that a record from a table is deleted and a foreign key it’s still pointing to it, fact that could lead to unexpected disappearance of records from system’s screens even if the records are still in the database.

    During conversions or data migrations is important to assure that the various sets loaded match the referential and domain integrity of the database in which the data will be loaded, otherwise the records not respecting the mentioned type of integrity will be rejected. The rejection itself might not be a problem for several records, though when it happens at large scale, then the situations changes dramatically, especially when the system gives no adequate messages for the cause or rejection. A good approach for such scenarios is to assure that the scope is synchronized between the various data elements, and that the referential integrity of data sets is validated before the data are loaded in the destination database.

    Expect the unforeseeable! It’s always a good idea to check whether the referential integrity is kept by a system – there are so many things that could go wrong! There are even systems (e.g. data warehouses) that don’t necessarily need referential integrity, though in time that might become quite a problem, therefore it’s a good idea to have in place mechanisms that check for this kind of issues.

[1] Halpin. T. (2001) Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design. Morgan Kaufmann Publishers. ISBN 1-55860-672-6.
[2] MSDN. 2009. Data Integrity. [Online] Available from: http://msdn.microsoft.com/en-us/library/ms184276.aspx (Accessed: 18 January 2009)

No comments: