Data Migrations Series |
The data requirements for the Data Migration (DM) and Data Quality (DQ) are driven by the processes implemented in the target system(s). Therefore, a good knowledge of these requirements can decrease the effort needed for these two subprojects considerably. The needed knowledge basis starts with the entities and their attributes, the dependencies existing between them and the various rules that apply, and ends with the parametrization requirements, respectively the architecture(s) that can be used to import the data.
The DM process starts with
defining the entities in scope and their attributes, respectively identifying the
corresponding entities and attributes from the legacy systems. The attributes not
having a correspondent in the legacy system need to be provided by the business
and integrated in the DM logic. In addition, it’s needed to consider also the attributes
needed by the business and not available in the target system, some of them more
likely available in the legacy systems. For such attributes is needed either to
misuse an attribute from the target or to extend the target system.
For each entity is created
a data mapping that basically documents the data transformations needed for migrating
the data. In the process is needed to consider also attributes’ data types, the
(standard) formatting, their domain of definition, as well the various rules that
apply. Their implementation belongs into the DM layer from which the data are exported
in a standard format as needed by the target system.
Exporting the data from
the DM layer directly into the target system’s tables has in theory the lowest overhead
even if the rejected records are difficult to track, the rejections resulting only
from records’ ‘validation against database’s schema. For this approach to work,
one must have a good knowledge of the database schema and of the business rules
implemented into the target system.
To solve the issue with
errors’ logging, systems have a further layer on top of the database model, which
also allow running data validation against target system’s business rules. Modern
import frameworks allow loading the data via a set of standard files with a predefined
structure. The data can be thus imported manually or via load jobs into the system
a log with the issues being generated in the process. Some frameworks allow even
the manual editing of failed records, respectively to import the data. Unfortunately,
calling the layer from the DM layer is not possible from a database, though this
would bring seldom a benefit. Some third-party tools attempt to improve the import
functionality by calling the target system’s import layer.
The import files must be
generated from the DM layer in the required structure with the appropriate formatting.
The challenge however resides in identifying all the attributes that should make
scope of the load. It’s an iterative process which sometimes is backed by try-and-error
heuristics. Unless target system’s validation rules are known beforehand, the rules
need to be discovered in this process, which can prove time-consuming. The discoveries
need to be integrated also in the DM and from here results the big number of changes
that need to be performed.
Given the dependencies
existing between entities the files need to be generated and loaded in a predefined
order. These dependencies are reflected also in the data processing and the validation
rules considered in the DM layer.
A quality checkpoint can
be implemented between the export from the DM layer and import to enforce the four-eyes
principle. It’s normally the last opportunity for trapping the eventual issues.
A further quality check is performed after import by validating on whether the data
were imported as expected.
Previous Post <<||>> Next Post