
03 July 2023

📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" II (Technical Aspects in Dynamics 365 Finance & Operations)

 

Data Migration
Data Migrations Series

Introduction

This weekend I read chapter 5 on Data Migrations ("Planning for Successful Data Migration") from Brent Dawson’s recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The chapter makes a few good points, however there are statements that require further clarification, while others are questionable.

Concerning the Data Migration (DM), besides several architectural recommendations, the author also makes several technical recommendations that can be summarized as follows:

(10) migrate transactional data manually via direct input or by using the Excel add-in (the author doesn’t recommend migrating transactional data via data packages because the data change frequently);
(11) plan a data outage as part of the cutover timeframe;
(12) new transactional data should be migrated after Go-live;
(13) include the effort for data entry in the cutover plan.

General Aspects

Concerning the data, there are four important phases during a cutover: configuring the production environment, migrating the master data, migrating the transactional data, and finally importing/creating the new transactional data. After each of these phases a data validation step is required to assure data quality and sign off on it.

Ideally, one can make sure that the production environment is correctly set up by deploying a copy of the database with the gold configuration (e.g. export the database and restore it in the target environment). Otherwise, direct data entry and templates, when available, can help obtain the same result, though the effort and the risk of errors are higher.

Moving to the next phases, it’s important to understand that a data migration is not a copy-paste of data from one system to another. Often the systems have different schemas, data definitions or granularity of the data entities. Ideally, a DM layer in between should take the source data as input and prepare the load data for the target system. This applies to master as well as to transactional data.

For importing data into D365 FO there are the following main options:
(a) manual data entry
(b) import via the Excel add-in and templates
(c) manual/automated data packages
(d) Batch API

As a rule of thumb, if one has no more than 100-200 records for a data entity, it might be OK to enter the data manually, possibly by splitting the effort between several users. This would allow users to familiarize themselves with the system, even if errors are made in the process. However, the importance of having “clean data” and a repeatable process for Go-Live makes this approach less desirable. On the other hand, there will be cases when this is the only available option.

As soon as the data volume goes above this threshold, the manual effort no longer makes sense. Preparing the data in Excel and importing it via the Excel add-in is in most cases the recommended approach, as long as the volume of data is manageable. Moreover, the data can be partitioned and imported in batches of 1000-2000 records (see the sketch below). Ideally, the data should be available in the same structure as required by the templates used.
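If the prepared data sit in a SQL database (e.g. a DM layer as mentioned above), the batching can be done upfront with a ranking function. Below is a minimal T-SQL sketch, assuming a hypothetical prepared table dbo.CustomersPrepared; the batch size, table and column names are illustrative and not taken from the book or from D365 FO:

-- Assign a batch number of ~1000 records to the prepared data, so that each batch
-- can be exported/copied separately into the Excel template.
SELECT CustomerAccount
     , Name
     , CustomerGroup
     , (ROW_NUMBER() OVER (ORDER BY CustomerAccount) - 1) / 1000 + 1 AS BatchNo
INTO dbo.CustomersPreparedBatched
FROM dbo.CustomersPrepared;

-- Export one batch at a time (e.g. the first one) in the structure of the template:
SELECT CustomerAccount, Name, CustomerGroup
FROM dbo.CustomersPreparedBatched
WHERE BatchNo = 1
ORDER BY CustomerAccount;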

There will be however a second threshold that makes a Batch API solution more attractive. How big is this threshold? It depends. I was able to import 50-100k records via partitioning in the Excel add-in, though these values shouldn’t be taken as fixed.

The dependencies existing between data will dictate the order in which data must be imported, while the size of each data entity can be used to decide which approach will be used.

Master Data

In theory, the migration of master data can start as soon as the corresponding configuration is available. However, it is recommended to split the two phases and make sure that the environment is fully configured. This also allows taking a backup of the configuration when such a snapshot is not yet available (see the gold configuration in the previous post).

Before taking a snapshot of the master data from the source system(s) it’s recommended to disable access for changing the respective data (aka a master data freeze). Otherwise, besides the fact that the changes will not appear in the target system(s), they can make the validation of the master data more complex. Sometimes, that's a risk the business is willing to take.

The master data are typically imported a few days before the transactional data, to allow the team to validate them and, if the data don’t have the expected quality, to perform at most one more migration. Thus, the migration of master data can start one or two weeks earlier, however the longer the timeframe, the higher the chances that the business will be impacted (e.g. new orders with new products are needed urgently).

Transaction Data

Before migrating the transactional data, a few processes must be run (e.g. monthly/yearly closing, inventory counting, receiving goods in transit, etc.). Once this is accomplished, the system can be frozen and access for making changes disabled. This can happen in phases, depending on the requirements (e.g. migrating the balances can happen much later, even weeks after Go-Live).

Only open transactions (e.g. open purchase orders, open sales orders, open customer/vendor invoices, active assets) and balances (e.g. inventory, trial balance) can be migrated. Migrating historical data is usually out of the question; a data warehouse or similar data repository is more appropriate for storing it. Otherwise, keeping the source system(s) available for some users to cover regulatory requirements would be a better option, when feasible.

The biggest issue with transactional data is that the referenced values (products, customers, vendors) must be available in the target system(s). Even if names and descriptions may be the same, the unique identifiers or surrogate keys are likely to change, e.g. a product, vendor or customer will have a different product number, vendor number or customer number than in the source system(s). This means that the old values need to be replaced with the new ones, which can become a tedious and error-prone process even in Excel. Unless the number of records is really small and there’s no other solution, I don’t recommend this approach.

The alternative would be to build a data migration layer that can address many of the challenges of data migrations. The effort for building such a layer might be high compared with a manual transformation of the data, though it increases the chances of success by a considerable factor.
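For illustration, in such a layer the remapping becomes a simple join against a mapping table maintained during the DM. A minimal T-SQL sketch with hypothetical tables src.OpenSalesOrders and map.CustomerNumbers (old vs. new customer accounts), neither of them a standard D365 FO structure:

-- Replace the old customer numbers with the new ones before preparing the load data.
SELECT cmp.NewCustomerAccount AS CustomerAccount
     , ord.SalesOrderNumber
     , ord.ItemNumber
     , ord.OrderedQuantity
FROM src.OpenSalesOrders ord
     JOIN map.CustomerNumbers cmp
       ON ord.OldCustomerAccount = cmp.OldCustomerAccount;

-- Orders whose customers have no mapping yet surface as exceptions to be fixed first.
SELECT ord.*
FROM src.OpenSalesOrders ord
     LEFT JOIN map.CustomerNumbers cmp
       ON ord.OldCustomerAccount = cmp.OldCustomerAccount
WHERE cmp.OldCustomerAccount IS NULL;

The second query is the kind of exception list that keeps the remapping repeatable across dry runs instead of relying on manual lookups in Excel.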

During and Post-Go-Live

After validating and signing off on the DM (and here extracts from the source and target systems can help), the Go-Live will depend only on the results of the functional testing (and many things can go wrong in this area).

During the freeze period(s) of the source systems, it is likely that new master and transactional data need to be created. Ideally, these data should be entered after the Go-Live announcement, though it isn’t a must if a backup of the target system was taken before. For this the Excel add-in can become the tool of choice.

With the Go-Live the DM should be over, though there will always be inquiries from the business. In fact, the DM is over only when the auditor has signed off. Even when one thinks that everything is done, a few more surprises can appear: forgotten data, data enrichment, data for new features, etc.

Wrap Up

These are the most important aspects the reader should be aware of. There is more to say about the DM architecture and process, and more best practices to be considered in areas like planning, conceptualization, quality assurance, principles, etc.

Coming back to the best practices from the book, it's worth stressing that the frequency with which data change is not the main driver for which approach to use in the DM. More important are the volume and complexity of the data entities to be migrated, and this applies to master and transactional data alike. Therefore, the argumentation behind (10) doesn't stand entirely.

Concerning (11), a multi-level data freeze is more appropriate than an outage, even if the author maybe intended to say the same thing.

(12) and (13) make sense, though the new data are part of daily business (business as usual) and not of the DM. Moreover, if the data entry or import fails for whatever reason, the DM can't be blamed. Even if the lessons learned during the DM can be further used for mass data entry and updates, this doesn't mean that the DM project continues to exist. In theory, the DM layer can be used further on, though the respective layer was built on premises that become obsolete with the Go-Live. One needs to think only from the perspective of the new system. Data Management, or more specifically Master Data Management, should be responsible for this type of data changes!

Previous Post <<||>>  Next Post

📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" I (Architecture Aspects in Dynamics 365 Finance & Operations)

 

Data Migrations
Data Migrations Series

Introduction

This weekend I read chapter 5 on Data Migration ("Planning for Successful Data Migration") from Brent Dawson’s recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The section on best practices makes many good points, however some of the practices require further clarification, while some statements are questionable, as the context associated with them can make an important difference. Overall, however, the recommendations hold.

Concerning Data Migrations (DM), besides a few technical recommendations, the author also makes several architectural recommendations that can be summarized as follows:

(1) put the data into a backup system or database, if possible, and use that system for the data extraction parts of the DM tasks;
(2) use a Tier 2 system for the majority of the development of the data packages;
(3) once the data packages are validated, they can be used against production environments;
(4) don’t use the OData protocol for data transfer, but use the Batch API instead;
(5) don’t use dual-write for the DM (a technology used for data integrations); first complete the DM and enable dual-write afterwards;
(6) have a backup of the environments involved;
(7) have a good internet connection;
(8) plan an environment for the DM (a Tier 2 environment, distinct from the one used for functional testing);
(9) for the gold configuration have an environment with limited access.

General Aspects
 
In a Data Migration there are at least two systems involved, though in more complex scenarios there can be one or more source systems and one or more target systems. At a minimum there is a source and a target system.

Ideally, a target production environment should not be used for testing the data migration! On the other hand, as long as there’s a backup with a given state of the system (e.g. only configuration data, without master or transactional data), a system can always be restored to a previous state. This applies to D365 or to any system for which a database backup and restore can be applied. Even so, as a best practice it isn’t recommended to use a production environment for testing, as this can increase the complexity of the data migration.

Moreover, the same constraint applies to the sandbox used for UAT (User Acceptance Testing), given that it is supposed to represent, at different points in time, the same state as the production environment. Thus, at least a third environment will be needed.

There are no hard constraints on the source systems. Ideally, one should use the production source system(s) or environments that resemble the production environments. A read replica of the respective environment(s) will work as well, given that there are typically only reads involved.

The downside of accessing a production environment directly for the DM is that the data change frequently, which makes it more difficult to validate the DM logic, the time factor needing to be considered as data are added, deleted or changed. That’s why an environment with a recent snapshot from production would facilitate the process and would make sure that the DM workloads don’t affect the production environment’s performance.

Often, a better alternative would be to have a database in between (aka a DM layer) that contains only the data in scope of the DM. ETL (Extract Transform Load) jobs can extract the data on demand and in a consistent manner, this approach assuring a snapshot. This layer can be used to build, test and troubleshoot the DM logic, before Go-Live and after, as issues will more likely be raised by the business and will need to be mitigated.

There are also scenarios in which direct access to the source systems is not possible, a push, or a push & pull, scenario being needed. Ideally, the data needed for the migration could be exported directly from the source system(s) in the form needed by the target system(s). In some scenarios this might be achievable, though the bigger the differences in schemas between the systems and the more complex the data, the more transformations are needed and the more difficult it becomes to achieve this. Therefore, moving such logic to an intermediate DM layer would simplify the DM architecture and allow addressing many of the challenges.
 
Batch API
 
Using the Batch API could be a solution when the source environments allow only API access to the data (thus no direct access over SQL scripting) or when the volume of data makes the alternatives unusable. Indeed, OData seems to be slow or unusable when the volume of data exceeds a given threshold, even if the calls can be partitioned.

Another scenario for the Batch API is when the source and target systems need to operate in parallel for a considerable amount of time, which would make other approaches unusable. Even if a DM typically involves the replacement of one or more systems, there can be exceptions. Such a scenario increases a DM’s complexity by several factors and should be avoided. Even if such scenarios seem logical and approachable at first sight, the benefits can easily be outweighed by the downsides.

Backup

Hopefully, your organization has a backup and restore strategy for the production and other essential environments! The strategy needs to be extended to the further environments available during the implementation. It’s also true that until Go-Live the target environments don’t suffer many changes. Ideally, a backup should be taken at least when important changes are made to the systems. This can involve the configuration as well as the DM. E.g. a backup would be required after the configuration is completed, after the master data and, respectively, the transactional data were migrated. A backup of all the systems involved should be taken before Go-Live.
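Where a plain SQL Server database is involved (e.g. the migration database or a Tier-1 box; production and Tier-2 sandbox backups in D365 FO are handled via LCS), such milestone backups can be scripted. A minimal T-SQL sketch, the database and file names being illustrative:

-- Take a copy-only, compressed backup after an important DM milestone,
-- so that the regular backup chain is not affected.
BACKUP DATABASE [DMigration]
TO DISK = N'D:\Backup\DMigration_AfterMasterData.bak'
WITH COPY_ONLY, COMPRESSION, INIT,
     NAME = N'DMigration - after master data migration';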

Gold Configuration

Having a system with the gold configuration (the values used to configure the system) available can indeed facilitate the implementation, and there are two main reasons for this. Primarily, the gold configuration allows building reliable processes around its maintenance and minimizing the risk of discrepancies between expectations and reality. Secondly, the database with the gold configuration can be used to easily set up a new environment, and this might be needed more often than thought (e.g. for dry runs).

However, in practice the technical value is easily outweighed by the financial aspect, as such an environment is barely used and can involve significant costs. As an alternative, one can use the DAT legal entity from an available environment for storing the gold configuration common across all the legal entities and easily copy it to the other legal entities. In addition, the deviations need to be documented, however it’s recommended to document all configurations and use this as a baseline for the post-Go-Live changes.
Indeed, access to the gold configuration should be restricted as much as possible (e.g. only admins, consultants and/or data owners) and change policies should be enforced. Otherwise, one risks having different configurations between the environments. For Go-Live it is critical that the UAT and Go-Live environments have the same configuration.

Independently of the approach used to maintain the gold configuration, it’s recommended to perform a comparison between the UAT and production environments to make sure that there are no differences. The comparison can also be handled via SQL scripts, the effort being well spent when such comparisons need to be done several times (see the sketch below). Even if the data from production aren’t directly accessible, a snapshot of the production database can be copied into another environment. However, this approach requires a good understanding of the tables and/or entities involved. There will be cases (e.g. module parameters) in which it’s easier to perform a manual comparison.
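For illustration, when the UAT database and a copy of the production database sit on the same SQL Server instance, a symmetric difference over a configuration table reveals the deviations. A minimal T-SQL sketch, the database names and the parameter table (a hypothetical key-value structure, not an actual D365 FO table) being illustrative:

-- Configuration rows present in UAT but missing or different in the production copy:
SELECT DataAreaId, ParameterKey, ParameterValue
FROM UAT_Db.dbo.CustParameters
EXCEPT
SELECT DataAreaId, ParameterKey, ParameterValue
FROM ProdCopy_Db.dbo.CustParameters;

-- And the other direction:
SELECT DataAreaId, ParameterKey, ParameterValue
FROM ProdCopy_Db.dbo.CustParameters
EXCEPT
SELECT DataAreaId, ParameterKey, ParameterValue
FROM UAT_Db.dbo.CustParameters;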

Wrap Up

Coming back to the recommendations, the only points that require some discussion are (1), (2) and maybe (8), while (9) was discussed above (see 'Gold configuration' section).

The recommendation of putting the data into a backup system or database is too vague. A backup system can mean a backup database that is typically accessible only over the RDBMS, or an instance of the system holding a copy of the data (which usually implies an RDBMS as well). Besides these, a database can refer to a read replica of the source system's database or to a DM layer.

Besides price and performance, the main differences between a Tier-1 and a Tier-2 environment (see also the Microsoft documentation) lie in the number of virtual machines (aka boxes) involved, how the various components are distributed between them, and the edition of SQL Server used. Otherwise, for the users the system will look the same. The most important constraint is that a Tier-1 environment isn't suitable for UAT or performance testing. In other words, the environment will be slow for concurrent use.

If the performance is acceptable and the volume of data and the number of users are small, a Tier-1 environment can be used for building data packages, performing initial DM dry runs and other tasks. However, a Tier-2 environment resembles the production environment more closely, and if the UAT is performed on such a system, it is more likely that the performance bottlenecks will be identified and addressed. Unless they accept the costs blindly, customers will need to trade off performance against costs from the perspective of their requirements and business context.

21 May 2020

📦Data Migrations (DM): In-house Built Solutions (Part III: The Data Preparation Layer)

Data Migration
Data Migrations Series

Once the source data (including the data needed for enrichment) are made available in the migration database, one can model the source entities by encapsulating the logic in views or table-valued functions (TVFs). This enables maintainability (by providing better visibility over the transformations), reuse (various validations are necessary), performance (by taking advantage of native RDBMS functionality), and flexibility in changing the code. The objects thus created can be used in a new set of similar objects that contain the mapping logic between the source and target data models. One thus attempts to prepare the data as needed by the target system.
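For illustration, below is a minimal T-SQL sketch of these two layers of objects, assuming hypothetical staging tables src.Customers and src.CustomerGroups already loaded into the migration database; the names and the target structure are illustrative and not actual D365 FO entities:

-- Assumes the src and map schemas and the staging tables already exist.
-- Source entity: encapsulates the denormalization of the source model.
CREATE VIEW src.vCustomers
AS
SELECT cst.CustomerId
     , cst.CustomerName
     , grp.GroupCode
FROM src.Customers cst
     LEFT JOIN src.CustomerGroups grp
       ON cst.GroupId = grp.GroupId;
GO

-- Mapping entity: transforms the source entity into the structure of the target model.
CREATE VIEW map.vCustomers
AS
SELECT UPPER(LEFT(CustomerName, 20)) AS CustomerAccount  -- illustrative key derivation
     , CustomerName                  AS Name
     , COALESCE(GroupCode, 'DEF')    AS CustomerGroup
FROM src.vCustomers;
GO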

As each target entity is modelled, it’s useful to dump the resulting data into tables, which will then be used as source for the subsequent logic, instead of using the views or TVFs directly. This allows keeping a copy of the data, performing a range of validations, and most importantly, can provide better performance as indexes can be built on the tables. In addition, one can further manipulate the data in the tables as needed, e.g. by including information which becomes available later (e.g. attributes from the target system), or for testing or correcting the data without affecting the built logic. For the same reasons such tables can be used in intermediate steps of the migration, including when modelling the source data entities. However, one should avoid their excessive use, as this can complicate the architecture unnecessarily.
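A minimal sketch of this materialization step, reusing the hypothetical map.vCustomers view from above:

-- Materialize the mapped entity, so that validations and subsequent steps
-- work against a stable, indexable copy instead of the view.
SELECT *
INTO map.Customers
FROM map.vCustomers;

CREATE CLUSTERED INDEX IX_Customers_CustomerAccount
ON map.Customers (CustomerAccount);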

The last step in the preparation layer is to prepare queries which select only the attributes needed by the target system and include additional formatting. It is in general recommended to detach the formatting from the other transformations, as this approach provides better flexibility in addressing the migration requirements. The data can afterwards be exported manually or via an automated job, the latter approach being recommended especially when the data need to be partitioned.
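Continuing the illustrative example, such a final query touches only the attributes expected by the target template and keeps the formatting separate from the mapping logic:

-- Export query: only the attributes required by the target template,
-- with the formatting (trimming, casing) applied at the very end.
SELECT LTRIM(RTRIM(CustomerAccount)) AS CUSTOMERACCOUNT
     , LTRIM(RTRIM(Name))            AS NAME
     , UPPER(CustomerGroup)          AS CUSTOMERGROUP
FROM map.Customers
ORDER BY CustomerAccount;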

At this stage, after validating the data, they can be imported into the target system via the mechanisms available. In theory, the boundary of the migration logic lies here, however this layer can be extended for data validation purposes. It would be helpful, for example, to assure that the data imported into the target fully reflect the prepared data. It can happen that during import data are truncated, incorrectly imported (wrong attribute, values changed or incorrectly mapped) or even whole records are not imported, with impact on data consistency.

After importing the data into the target system, one can bring the migrated data back into the migration database via ETL packages and build queries which match the data at attribute level. This step may seem redundant, though it’s a way to assure that the migration occurred according to expectations and thus to minimize later surprises. Even if not done for all entities, this type of import might be the easiest solution for feeding data back into the logic (e.g. when identity values need to be mapped into the logic after migrating an entity).
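A minimal sketch of such an attribute-level match, assuming the migrated records were loaded back into a hypothetical tgt.Customers table:

-- Compare the prepared data against the data imported back from the target;
-- any returned row points to a missing record, a truncation or a wrong mapping.
SELECT COALESCE(prp.CustomerAccount, imp.CustomerAccount) AS CustomerAccount
     , prp.Name          AS PreparedName
     , imp.Name          AS ImportedName
     , prp.CustomerGroup AS PreparedGroup
     , imp.CustomerGroup AS ImportedGroup
FROM map.Customers prp
     FULL OUTER JOIN tgt.Customers imp
       ON prp.CustomerAccount = imp.CustomerAccount
WHERE prp.CustomerAccount IS NULL
   OR imp.CustomerAccount IS NULL
   OR prp.Name <> imp.Name
   OR prp.CustomerGroup <> imp.CustomerGroup;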

It can also be helpful to import the tables for dropdown values (typically parameters), to assure that the values used for parameterizing the system were built into the migration logic as expected. It may sound surprising, but not all systems perform checks on import, or the checks may have been disabled for other reasons.

In data migrations it is recommended to assure the data's internal consistency by design. Even if various validations for uniqueness, completeness, consistency, conformity, timeliness, referential integrity or even accuracy might seem redundant or involve extra work, in the long run they pay off as they allow trapping the issues early in the process.

📦Data Migrations (DM): In-house Built Solutions (Part II: The Import Layer)

Data Migration
Data Migrations Series

A data migration involves the mapping between two data models at (data) entity level, where an entity is a data abstraction modelling a business entity (e.g. Products, Vendors, Customers, Sales Orders, Purchase Orders, etc.). Thus, the Products business entity from the source will be migrated to a similar entity in the target. The work would be simpler if the two models provided direct access to the data through entities. Unfortunately, this is seldom the case, the entities being normalized and thus broken into several tables, with important structural differences.

Therefore, the first step within a DM is identifying the business entities in scope from source and target, and providing a mapping between their attributes which defines how the data will flow between source and target.

In theory, the source entities could be defined directly in the source with the help of views, if they are not already available. The problem with this approach is that the base data change, which can easily lead to inconsistencies between the various steps of the migration. For example, records are added, deleted or inactivated, or certain attributes are changed, which can easily make troubleshooting and validation a nightmare.

The easiest way to address this is by assuring that the data change only when actually needed. Thus, one needs to create a snapshot of the data and work with it. However, (database) snapshots can become costly for the performance of the source database, as they involve additional maintenance overhead. Another solution is to take the snapshot via a backup or by copying the data via ETL functionality into another database (aka the migration database). Considering that the data in scope are a small subset, a backup is usually costly in terms of storage space and time, and it is not always possible to take a backup when needed.

An ETL-based solution provides acceptable performance and is flexible enough to address all important types of requirements. The data can be accessed directly from the source (pull mechanism) or, when direct access is an issue, they can be pushed to the migration database (push mechanism) or made available for load at a given location and then imported into the migration database (hybrid mechanism). There’s also the possibility to integrate the migration database when a publisher/subscriber mechanism is in place, however such solutions raise other types of issues.
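For illustration, a minimal T-SQL sketch of the pull and hybrid variants, assuming a hypothetical linked server SRC_ERP for the former and a delivered CSV extract for the latter; the names and paths are illustrative:

-- Pull: copy a source table 1:1 into the migration database over a linked server.
SELECT *
INTO src.Customers
FROM SRC_ERP.SourceDb.dbo.Customers;

-- Hybrid: the source system drops an extract at an agreed location, which is then loaded
-- (the staging table src.CustomerGroups must already exist).
BULK INSERT src.CustomerGroups
FROM 'D:\Extracts\CustomerGroups.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ';', ROWTERMINATOR = '\n', CODEPAGE = '65001');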

One can import the tables 1:1 from the source for the entities in scope, attempt to model the entities directly within the ETL jobs, or find a solution in between (e.g. import the base tables while also joining in the dropdown tables). The latter seems to provide the best approach because it minimizes the number of tables to be imported while still reflecting the data structures from the source. An entity-based import addresses the first but not the second aspect, though depending on the requirements it can work as well.

In Data Warehousing (DW) there’s the practice of loading the data into staging tables with no constraints on them, and only when the load is complete moving the data into the base tables which will be used as source for the further processing. This approach assures that the data are loaded completely and that the unavailability of the base tables is limited. In contrast to DW solutions, ideally no transformations are performed on the data, as they should reflect the quality characteristics of the source. It's best to keep the data extraction, respectively the ETL jobs, as simple as possible and resist building the migration logic into this layer already.
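A minimal sketch of this staging pattern, with a hypothetical constraint-free staging table whose content is moved into the typed base table only after the load has completed:

-- Staging table: all columns as text and no constraints, so the load never fails halfway.
CREATE TABLE stg.Vendors (
    VendorId    nvarchar(50)  NULL
  , VendorName  nvarchar(200) NULL
  , CreditLimit nvarchar(50)  NULL
);

-- Only after a complete load, move the data into the typed base table used downstream;
-- TRY_CONVERT surfaces bad values as NULLs instead of aborting the batch.
INSERT INTO src.Vendors (VendorId, VendorName, CreditLimit)
SELECT VendorId
     , VendorName
     , TRY_CONVERT(decimal(18, 2), CreditLimit)
FROM stg.Vendors;

Keeping the staging layer dumb in this way is what allows the quality characteristics of the source to remain visible further downstream.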

19 May 2020

📦Data Migrations (DM): In-house Built Solutions (Part I - An Introduction)

Data Migration
Data Migrations Series

A Data Migration (DM) is the move of all or a subset of the data available from one or more system(s) into other system(s). For ERP systems this can be achieved by having a database in between which allows the import and combination of data from various sources, the further processing of the data, and the preparation of the data as per the target system(s)’ needs.

At a deeper level one needs to model the entities from the source and the target systems at a level that allows easier processing, which comes down to removing and adding data, and performing mappings and transformations between the input and output models. In other words, a set of entities is transformed into another set of entities, while at a much lower level attributes from the source system must be mapped to the target system.

SQL scripting provides enough flexibility in modelling the source or target system(s), in changing the data and logic on the fly, in building additional functionality, and in reusing logic or cleaning the data. It works with big volumes of structured data, while for unstructured data one can still combine the power of SQL with ad-hoc or even NoSQL processing. It all sounds easy, doesn’t it? Then where does the complexity people talk about come from?

Knowing SQL and the features of a (relational) database (the how) is only a small part of all that is needed. Building a DM solution implies knowledge of the source data models and of the data behind them (the what), knowing the target (the where), how and when to import the data, why the data must be made available, in what way the data impact the target system(s), and who needs to be involved. The more of these aspects are known, the higher the chances for the DM to succeed.

When there’s a gap in the needed knowledge, the knowledge must be acquired from the business and/or consultants, by investing time in data discovery or similar activities, in experimenting and testing. The more the uncertainty, the more iterations are needed to achieve the degree of certainty (aka quality) required. At a minimum one needs 3-4 iterations to migrate the data: to build the concept, to fully test the migration, to get the sign-off in a test environment, and to perform the migration into production.

Another type of complexity derives from the interdependencies existing between the DM and vital activities like system parametrization and data cleaning. The system’s parametrization defines how the system behaves, an important number of the parametrization values being reflected also in the data prepared for the DM. Wrong parametrization values can be trapped during the various iterations, however poor data quality will be reflected throughout the system after Go-Live and will haunt the business in the long term, unless addressed correctly within the project.

All these aspects are reflected to some degree in the DM’s architecture (e.g. building restart points, including data validation and automatic cleaning) and process (e.g. the sequencing of steps). Of all the effort involved in a DM, maybe 20-30% needs to be invested in building the solution. These 20-30% can in theory be reduced by using specialized DM tools, though the advantage provided by such tools can be illusory, especially when further limitations are involved. On the other hand, such tools may provide functionality that addresses the other 70-80%.

As usual in IT, there’s a trade-off between advantages and disadvantages. An in-house built DM solution may provide the needed flexibility at the price of a small time investment. Whether it pays off is something one has to prove by walking this path.

22 June 2018

📦Data Migrations (DM): Approaches to Planning a Data Migration for an ERP Implementation

Data Migration
Data Migrations Series

Introduction

ERP implementations are among the most complex projects to plan, as they often imply changes/transformations at different levels (e.g. strategic, processes, data, cultural, technological), span over one or more years, involve many resources that need to be efficiently managed, and often come with important costs for the organization.

One way of handling complexity is to ignore the nonessential in planning by focusing on the important activities/phases, and to go deeper as the project progresses. Another way to handle complexity is to split it into manageable parts, identifying and grouping components together. For example, Data Migration (DM) and Data Quality (DQ) are managed as subprojects, with their own planning. The two strategies can be combined to increase the effect.

Planning a DM cannot be done without looking at the timelines of the ERP implementation and considering the various interfaces to the DQ, however in this post I will focus only on the first two.

The Context

In the context of an ERP implementation there are three main approaches to the planning of a DM – pushing the activities toward the end of the implementation, pushing most activities toward the beginning of the implementation, or splitting the various activities over the whole timeline of the ERP implementation. Borrowing a term from statistics, we can talk about a left-skewed plan, a right-skewed plan, respectively a uniform-distributed plan.

For exemplification I will use a set of Lego pieces grouped in 3 rows, representing the main phases of an ERP implementation, the DM, and the DQ:

[Figure: ERP implementation, DM and DQ phases represented as rows of Lego bricks]

The Lego pieces are a good tool for representing the phases, even if they can be misleading because there’s often no clear delimitation between certain phases, as they overlap or repeat over several iterations, and the bricks’ length doesn’t necessarily represent the actual duration of the phases. In addition, the phases are oversimplified in order not to clutter the diagrams. The detailed phases will be considered in further posts. The color changes gradually as the activities get closer to the end.

Left-Skewed Planning

One way to plan a DM is backwards from the Go-Live, the DM activities flowing continuously backwards (the DM bricks are arranged from the end over the implementation bricks) and thus accumulating toward the end of the ERP implementation. This approach is natural considering that the requirements stabilize in the second half of the implementation, the code freeze occurring toward the end. Stabilizing means that the most important code changes have been performed and only minor changes are still needed, typically bug fixing, refactoring or last-minute changes.

The DM starts with a set of requirements concerning the business processes, the data and the configuration. Each change to these requirements equates to rework that needs to be performed. Typically, this happens only at entity level, however there can be changes that have an impact on the whole migration or on important parts of it. Additionally, after each set of changes another dry run needs to be performed in order to test the changes. Therefore, to minimize the volume of rework a DM needs a stable environment, in other words a stable data model, configuration and requirements.

Thus, the conceptualization of the migration, including the prototyping, can start in the first half of the implementation or, for long-running projects, even in the second half. Performing the DM without interruptions assures an optimal use and planning of the resources: the resources work continuously on the project, focused toward the end.

On the other hand, the accumulation of activities toward the end can easily lead to problems concerning resources’ availability. This type of approach needs good planning, otherwise the project runs the risk of having the Go-Live delayed until the stability of the DM is assured, or of going live with data that don’t have the expected quality. These risks can be alleviated by adding a buffer to the DM’s timeline, or by considering one of the other two approaches.

This approach minimizes the various types of waste associated with software projects, and thus the costs associated with waste.

[Figure: Left-skewed planning]


Right-Skewed Planning

To release the pressure existing toward the end of the project, some of the DM activities can be performed toward the beginning of the project: the conceptualization and prototyping, as well as the data mapping. In theory one can also build an important part of the DM for the standard functionality, and address the changes in data model, processes and configuration in subsequent iterations. This could involve a higher volume of rework and more dry runs, however this depends on the complexity and number of the changes. If a small number of customizations is expected, then this may be the best approach. Even in the case of many customizations this approach might be something to consider, however the DM costs increase with the number of customizations made, and in certain contexts the increase can be exponential.

This approach pushes some of the costs toward the beginning of the implementation, and this can have positive as well as negative aspects. For example, it is well known that ERP implementations involve cost overruns. With this approach the DM costs are incurred toward the beginning and one can better get hold of the budget, at least in theory. As a negative aspect one could consider the cases in which an ERP implementation is stopped toward the middle of the project, the incurred costs being thus higher. In the end the main cost driver is the volume of customizations.

Breaking a DM in two can have several other negative aspects. The data cleaning may need to be broken up as well; most probably in the second phase more data enrichment activities need to be considered.

The resources that worked on the first phase might not be available for the second phase. An adequate knowledge transfer might be hard to achieve, so the second team might need good documentation or time to understand the solution. This can lead to other types of behavior, e.g. unnecessarily rewriting the code, pushing for a redesign, and so on.

As the environment stabilizes much later, there is the risk that an important part of the migration needs to be reworked/redesigned. In extreme cases it might be necessary to start from zero. The chances for this to happen are small, though such a case can occur. Probably some of the code and transformations can be reused, though this depends from case to case. Without knowing the implementation’s details it’s difficult to estimate the chances of something like this happening. Sometimes it’s enough to invalidate a premise considered in the design phase. Usually the interplay between several new requirements leads to a redesign.

[Figure: Right-skewed planning]


Uniform-distributed Plan

To alleviate the risks of the first two approaches, some of the activities could be uniformly distributed over the whole duration of the ERP implementation. This approach works well when the same resources from the vendor side are involved in activities from all three layers, the nature of the tasks allowing them to work continuously on the project. For example, the consultants working on the DM concept help with the mapping of the attributes, as well as with data cleansing. When working on multiple activities isn’t possible, the vendor(s) will more likely have problems in assigning resources to the project. Either the same resources will be assigned for big parts of the project, incurring higher costs, or the resources will be replaced by others, additional learning being involved. In either case the costs are higher.

One of the main dangers of this approach is that certain activities will expand to take the time available, thus incurring higher costs. When the implementation time is much longer than the DM’s duration, the distance between the DM’s phases can increase dramatically, making it almost impossible to manage the resources adequately. Keeping the metaphor of the Lego pieces, it will thus also be more difficult to build a structure on which an edifice can be built. With proper planning and adequate use of resources and knowledge, the empty spaces can be incorporated in the structure to the project’s advantage.

Even if this approach attempts to even out the DM effort over the whole duration of the ERP implementation, performing the activities too early, before the requirements have stabilized, can have an adverse effect.

[Figure: Uniform-distributed planning]


Personal Approach

Looking back at the projects I worked on, I think I used a hybrid of the 3 approaches. The DM was planned backwards from the Go-Live, however the first draft of the DM concept and the prototyping were performed at the beginning of the implementation. This assured that the technical solution was working. Being involved in the creation of the data mappings as well as in data cleansing, I could switch smoothly between the various activities, however toward the end of the project this became a bottleneck, the activities being harder to synchronize, and at that point the volume of work could only be addressed with overtime.

With a few exceptions I worked mainly alone on the technical activities, being responsible for the data mappings, design, prototyping, implementation, testing, documentation of the runs, and execution of the DM. I think that more resources would have removed some of the burden but made the planning more complicated and the synchronization even harder. Probably a team of 2-3 people covering these activities would provide the optimum balance between costs, effort and quality.

Conclusion

I suppose there is no best solution that will work for all. The three approaches are more an attempt to highlight some of the extremes of planning. In an ERP implementation there are so many factors, so many chances for a decision to become an opportunity or a threat. My advice: ponder the various aspects/constraints, choose an approach, and adjust it as the project advances.

Previous Post <<||>> Next Post

09 June 2018

📦Data Migrations (DM): Guiding Principles

Data Migration
Data Migrations Series

Introduction

“An army of principles can penetrate where an army of soldiers cannot."
Thomas Paine

In life as well as in IT, principles serve as patterns of advice in the form of general or fundamental ideas, truths or values stated in a context-independent manner. They can be used as guidelines in understanding and modeling reality, the world we live in. With the invasion of technologies in our lives, principles serve as a solid ground on which we can build castles: solutions to our problems. Each technology comes with its own set of principles that defines in general terms its usage. That's why most IT books attempt to capture these sets of principles. Unfortunately, few technical writers manage to define meaningful principles and showcase their usage.

Many of the ideas considered principles in papers on Data Migration (DM) are at best just practices, and some can be considered best/good practices. Just because something worked well in a previous migration doesn’t automatically mean that the idea behind the respective decision turns into a principle. Some of the advice advanced is just lessons learned in disguise. Principles, through their generality, apply to a broad range of cases, while practices are more activity-specific.

A DM by its nature finds its characteristics at the intersection of several areas: database architecture design, ETL workflows, data management, project management (PM) and services. From these areas one can pull a set of principles that can be used in building DM architectures.

Architecture Principles

“Architecture starts when you carefully put two bricks together.”
Ludwig Mies van der Rohe

There are several general principles that apply to the architecture of applications, independently of the technologies used or the industry, e.g. research first, keep it simple/small, start with the end in mind, model first, design to handle failure, secure by design (aka safety first), prototype, progress iteratively, focus on value, reuse (aka don't reinvent the wheel), test early, early feedback, refactor, govern, validate, document, right tool – right people, make it to last, make it sustainable, partition around limits, scale out, defensive coding, minimal intervention, use common sense, process orientation, follow the data, abstract, anticipate obsolescence, benchmark, single-responsibility, single dispatch, separation of concerns, right perspective.

To these one can add a range of application design characteristics that can be considered principles as well: extensibility, modularity, adaptability, reusability, repeatability, performance, revocability, auditability, subject-orientation, traceability, robustness, locality, heterogeneity, consistency, atomicity, increased cohesion, reduced coupling, monitoring, usability, etc. There are also several principles that can be transported from problem solving into design: divide and conquer, prioritize, the systems approach, take inventory, and so on.

A DM’s architecture has more in common with a data warehouse, as it relies heavily on ETL tasks and data need to be stored for various purposes. Besides the principles of good database design, a few other principles apply: model (the domain) first, denormalize, design for performance, maintainability and security, validate continuously. From the ETL area the following principles can be considered: single point of processing, each step must have a purpose, minimize touch points, rest data for checkpoints, leverage existing knowledge, automate the steps, batch processing.

In addition, considering its data-specific character, a DM can be regarded as one or several data products, though in contrast with typical data products a DM has a limited purpose. From this area the following principles could be considered: build trust with transparency, blend in, visualize the complex.

Data Management Principles

Considering that a DM’s focus is an organization's data, some principles need to focus on the management and governance of data. Data Governance together with Data Quality, Data Architecture, Metadata Management and Master Data Management are functions of Data Management. The focus is on data, metadata and their lifecycle, on processes, on ownership and on roles and their responsibilities. With this in mind, several principles can be defined that facilitate the functions of Data Management: manage data as an asset, manage the data lifecycle, the business owns the data, integration across the organization, make data/metadata accessible, transparent and auditable processes, one source of truth.

As part of a DM there are customer, employee and vendor information which fall under the General Data Protection Regulation (GDPR, EU 2016/679), which defines the legal framework for data protection and privacy for all individuals within the European Union (EU) and the European Economic Area (EEA), as well as the export of personal data outside the EU and EEA. The regulation defines a set of principles that make up its backbone: fairness, lawfulness and transparency, purpose limitation, data minimization, accuracy, storage limitation, integrity and confidentiality, accountability [6].

Overseas, the US Federal Trade Commission (FTC) issued in 2012 a report recommending that organizations design and implement their own privacy programs based on a set of best practices. The report reaffirms the FTC’s focus on the Fair Information Processing Principles, which include notice/awareness, choice/consent, access/participation, integrity/security, and enforcement/redress [6].


Project Management (PM) Principles

"Management is doing things right […]"
Peter Drucker

A DM, through its characteristics, is a project and, to increase the chances of success, it needs to be managed as a project. Managing the DM as a project is one of the most important principles to consider. The usage of a PM framework will further increase the chances of success, as long as the framework is adequate for the purpose and the organization's team is able to use it. PMI, PRINCE2, Agile/Scrum/Kanban are probably the most used PM methodologies, and they come with their own sets of principles.

In general, all or some of the PM principles apply independently of whether a methodology is used alone or in combination with other PM methodologies: a single project manager, an informed and supportive management, a dedicated team of qualified people to do the work of the project, clearly defined goals addressing stakeholders’ priorities, an integrated plan and schedule, as well as a budget of costs and/or resources required [1].

On the other hand, an agile approach could prove to be a better match for a DM, given that requirements change a lot, frequent and continuous deliveries are needed, collaboration is necessary, and agile processes as well as self-organizing teams can facilitate the migration. These are just a few of the catchwords that make up the backbone of the Agile Manifesto (see [3]).

An agile form of PRINCE2 could be something to consider as well, especially when PRINCE2 is used as the methodology for other projects. For PRINCE2 the following principles are to be considered: continued business justification, learn from experience, defined roles and responsibilities, manage by stages, management by exception, focus on products, tailor to suit the project environment [2].

All these PM principles reveal important aspects to ponder upon, and maybe with a few exceptions, all can be incorporated in the way the DM project is managed.


Service Principles

Considering the dependencies existing between the DM and Data Quality, as well as to the broader project, a DM can have the characteristics of a service. It’s not an IT service per se, as IT only supports the project technically and possibly from a PM perspective. Even if a DM is not an ITSM service, some of the ITIL principles can still apply: focus on value, design for experience, start where you are, work holistically, progress iteratively, observe directly, be transparent, collaborate, and keep it simple [4].


Conclusion

“Obey the principles without being bound by them.”
Bruce Lee

Within a DM all the above principles can be considered, though the network of implications they create can easily shift the focus from the solution to the philosophical aspects, and that’s a marshy road to follow. Even if all principles are noble, not all can be considered; it would be utopian to consider each possible principle. The trick is to identify the most “important” principles (principles that make sense) and prioritize them according to the existing requirements. In theory, this is a one-time process that involves establishing a “framework” of best/good practices for the DM, subsequent migrations needing only to consider the new facts and aspects.

Previous Post <<||>> Next Post

References:
[1] “Principles of project management”, by J. A. Bing, PM Network, 1994 (link)
[2] Axelos (2018) What is PRINCE2? (link)
[3] Agile Manifesto (2001) Principles behind the Agile Manifesto (link)
[4] Axelos (2018) ITIL® Practitioner 9 Guiding Principles (link)
[5] The Data Governance Institute (2018) Goals and Principles for Data Governance (link) 
[6] Navigating the Labyrinth: An Executive Guide to Data Management, by Laura Sebastian-Coleman for DAMA International, Technics Publications, 2018 (link)  

08 May 2018

📦Data Migrations (DM): Facts, Principles and Practices

Data Migration
Data Migrations Series

Introduction

Ask a person who is familiar with cars 'how a car works' and you’ll get an answer, even if it doesn’t entirely reflect reality. Of course, the deeper one's knowledge about cars, the more elaborate or exact the answer. One doesn't have to be a mechanic to give an acceptable explanation, though in order to design, repair or build a car one needs extensive knowledge about a car’s inner workings.

Keeping the proportions, the same statements are valid for the inner workings of Data Migrations (DM) – almost everybody in IT knows what a DM is, though to design or perform one you might need an expert.

The good news about DMs is that their inner workings are less complex than the ones of cars. Basically, a DM requires some understanding of data architecture, data modelling and data manipulation, and some knowledge of business data and processes. A data architect, a database developer, a data modeler or any other data specialist can approach such an endeavor. In theory, with some guidance, a person with knowledge about business data and processes can also do the work. Even if DMs imply a certain complexity, they are not rocket science! In fact, there are tools that can be used to do most of the work, and there are some general principles and best practices about the architecture, planning and execution that can help in the process.

Principles and Best Practices

It might be useful to explain the difference between principles and best practices, because they’ll more likely lead you to success if you understand and incorporate them in your solutions. Principles, as patterns of advice, are general or fundamental ideas, truths or values stated in a context-independent manner. Practices, on the other hand, are specific actions or applications of these principles stated in a context-dependent way. The difference between them is relatively thin, and therefore they are easy to confound, though by looking at their generality one can easily identify which is which.

For example, in the ’60s the “keep it simple, stupid” (aka KISS) principle became known, which states that a simple solution works better than a complex one, and therefore one should seek simplicity in design as a key goal. Even if kind of pejorative, it’s a much simpler restatement of Occam’s razor: do something in the simplest manner possible because simpler is usually better. To apply it one must understand what simplicity means and how it can be translated into designs. According to Hans Hofmann, “the ability to simplify means to eliminate the unnecessary so that the necessary may speak”, or in a quote attributed to Einstein, “everything should be made as simple as possible, but not simpler”. This is the range within which the best practices derived from KISS can be defined.

There are multiple practices that allow reducing the complexity of DM solutions: start with a Proof-of-Concept (PoC), start small and build incrementally, use off-the-shelf software, use the best tool for the purpose, use incremental data loads, split big data files into smaller ones, and so on. As can be seen all of them are direct actions that address specific aspects of the DM architecture or process.


Data Migration Truths

When looking at principles and best practices, they seem to be rooted in some basic truths or facts common to most DMs. When considered together, they offer a broader view and understanding of what a DM is about. Here are some of the most important facts about DMs:

DM as a project:

  • A DM is a subproject with specific characteristics
  • A DM is typically a one-time activity before Go live
  • A DM’s success is entirely dependent on an organization’s capability of running projects
  • Responsibilities are not always clear
  • Requirements change as the project progresses
  • Resources aren't available when needed
  • Parallel migrations require a common strategy
  • A successful DM can be used as recipe for further migrations
  • A DM's success is a matter of perception
  • The volume of work increases toward the end


DM Architecture

  • A DM is more complex and messier than initially thought
  • A DM needs to be repeatable
  • A DM requires experts from various areas
  • There are several architectures to be considered
  • The migration approach is dependent on the future architecture
  • Management Systems have their own requirements
  • No matter how detailed the planning, something is always forgotten
  • Knowledge of the source and target systems isn't always available
  • DMs are too big to be performed manually
  • Some tasks are easier to perform manually
  • Steps in the migration need to be rerun
  • It takes several iterations before arriving to the final solution
  • Several data regulations apply
  • Fall-back is always an alternative
  • IT supports the migration project/processes
  • Technologies are enablers and not guarantees for success
  • Tools address only a set of the needed functionality
  • Troubleshooting needs to be performed before, during and after migrations
  • Failure/nonconformities need to be documented
  • A DM is an opportunity to improve the quality of the data
  • A DM needs to be transparent for the business


DM implications for the Business:

  • A DM requires a downtime for the system involved
  • The business has several expectations/assumptions
  • Some expectations are considered as self-evident
  • The initial assumptions are almost always wrong
  • A DM's success/failure depends on business' perception
  • Business' knowledge about the data and processes is relative
  • The business is involved for the whole duration of the project
  • Business needs continuous communication
  • Data migration is mainly a business rather than a technical challenge
  • Business expertise in every data area is needed
  • DM and Data Quality (DQ) need to be part of a Data Management strategy
  • Old legacy system data have further value for the business
  • Reporting requirements come with their own data requirements


DM and Data Quality:

  • Not all required data are available
  • Data don't match the expectations
  • Quality of the data needs to be judged based on the target system
  • DQ is usually performed as a separate project with different timelines
  • Data don't have the same importance for the business
  • Improving DQ is a collective effort
  • Data cleaning needs to be done at the source (when possible)
  • Data cleaning is a business activity
  • The business is responsible for the data
  • Quality improvement is governed by the 80-20 rule
  • No organization is willing to pay for perfect data quality
  • If it can’t be counted, it isn’t visible

More to come, stay tuned…

