08 May 2018

Data Migration – Facts, Principles and Practices


Introduction

    Ask a person who is familiar with cars ‘how a car works’ and you’ll get an answer, even if it doesn’t entirely reflect reality. Of course, the deeper one's knowledge about cars, the more elaborate or exact the answer. One doesn't have to be a mechanic to give an acceptable explanation, though in order to design, repair or build a car one needs extensive knowledge of a car’s inner workings.

    Keeping a sense of proportion, the same statements are valid for the inner workings of Data Migrations (DM): almost everybody in IT knows what a DM is, though to design or perform one you might need an expert.

    The good news about DMs is that their inner workings are less complex than those of cars. Basically, a DM requires some understanding of data architecture, data modelling and data manipulation, and some knowledge of business data and processes. A data architect, a database developer, a data modeler or any other data specialist can approach such an endeavor. In theory, with some guidance, a person with knowledge of business data and processes can also do the work. Even if DMs imply a certain complexity, they are not rocket science! In fact, there are tools that can be used to do most of the work, and there are some general principles and best practices about the architecture, planning and execution that can help in the process.

Principles and Best Practices

    It might be useful to explain the difference between principles and best practices, because they are more likely to lead you to success if you understand them and incorporate them in your solutions. Principles, as patterns of advice, are general or fundamental ideas, truths or values stated in a context-independent manner. Practices, on the other hand, are specific actions or applications of these principles stated in a context-dependent way. The difference between them is relatively thin, and therefore they are easy to confuse, though by looking at their generality one can easily identify which is which.

    For example, the “keep it simple, stupid” (aka KISS) principle became known in the 1960s. It states that a simple solution works better than a complex one, and therefore simplicity should be a key goal in design. Even if somewhat pejorative, it’s a much simpler restatement of Occam’s razor: do something in the simplest manner possible, because simpler is usually better. To apply it one must understand what simplicity means and how it can be translated into designs. According to Hans Hofmann, “the ability to simplify means to eliminate the unnecessary so that the necessary may speak”, or in a quote attributed to Einstein: “everything should be made as simple as possible, but not simpler”. This is the range within which the best practices derived from KISS can be defined.

    There are multiple practices that help reduce the complexity of DM solutions: start with a Proof-of-Concept (PoC), start small and build incrementally, use off-the-shelf software, use the best tool for the purpose, use incremental data loads, split big data files into smaller ones, and so on. As can be seen, all of them are direct actions that address specific aspects of the DM architecture or process.
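
    Of the practices listed above, splitting a big data file into smaller ones is easy to illustrate. The following is only a minimal sketch in Python, assuming the extract is a UTF-8 CSV file with a header row; the function names split_csv and _write_chunk, the file-naming scheme and the chunk size are hypothetical choices made for illustration, not part of any particular migration tool.

```python
import csv
from pathlib import Path


def _write_chunk(out_dir: Path, stem: str, index: int,
                 header: list[str], rows: list[list[str]]) -> Path:
    """Write one chunk file (hypothetical naming: <stem>_<4-digit index>.csv)."""
    target = out_dir / f"{stem}_{index:04d}.csv"
    with target.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)  # repeat the header so each chunk loads on its own
        writer.writerows(rows)
    return target


def split_csv(source: Path, out_dir: Path, rows_per_file: int) -> list[Path]:
    """Split `source` into files holding at most `rows_per_file` data rows."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []

    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return written

        chunk: list[list[str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(
                    _write_chunk(out_dir, source.stem, len(written) + 1, header, chunk)
                )
                chunk = []
        if chunk:  # remaining rows that did not fill a whole file
            written.append(
                _write_chunk(out_dir, source.stem, len(written) + 1, header, chunk)
            )
    return written
```

    Called on a hypothetical customers.csv with split_csv(Path("customers.csv"), Path("chunks"), 50000), the sketch would write customers_0001.csv, customers_0002.csv and so on, each small enough to be loaded, checked and, if needed, reloaded on its own.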


Data Migration Truths

    Principles and best practices seem to be further rooted in some basic truths or facts common to most DMs. When considered together, these facts offer a broader view and understanding of what a DM is about. Here are some of the most important facts about DMs:

DM as a project:

  • A DM is a subproject with specific characteristics
  • A DM is typically a one-time activity before go-live
  • A DM’s success is entirely dependent on an organization’s capability of running projects
  • Responsibilities are not always clear
  • Requirements change as the project progresses
  • Resources aren't available when needed
  • Parallel migrations require a common strategy
  • A successful DM can be used as a recipe for further migrations
  • A DM's success is a matter of perception
  • The volume of work increases toward the end


DM Architecture:

  • A DM is more complex and messier than initially thought
  • A DM needs to be repeatable
  • A DM requires experts from various areas
  • There are several architectures to be considered
  • The migration approach is dependent on the future architecture
  • Management Systems have their own requirements
  • No matter how detailed the planning, something is always forgotten
  • Knowledge of the source and target systems isn't always available
  • DMs are too big to be performed manually
  • Some tasks are easier to perform manually
  • Steps in the migration need to be rerun
  • It takes several iterations before arriving at the final solution
  • Several data regulations apply
  • Fall-back is always an alternative
  • IT supports the migration project/processes
  • Technologies are enablers and not guarantees for success
  • Tools address only a subset of the needed functionality
  • Troubleshooting needs to be performed before, during and after migrations
  • Failures/nonconformities need to be documented
  • A DM is an opportunity to improve the quality of the data
  • A DM needs to be transparent for the business
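
    The points about repeatability and rerunning steps can be made concrete with a small sketch. It assumes a hypothetical customer table keyed by id and uses Python's built-in sqlite3 module only as a stand-in for a real target system; the idea is simply that a load step overwrites what it loaded before instead of duplicating it, so it can be rerun safely.

```python
import sqlite3


def load_customers(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
    """Load (id, name) rows into the hypothetical `customer` table.

    Rerunning the step is safe: a row whose id already exists is replaced
    rather than inserted a second time. Returns the table's row count.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if the load fails
        conn.executemany(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

    Running load_customers twice with the same two ids leaves two rows in the table, with the values from the second run. A real migration would more likely rely on the target system's own import or merge facilities, though the principle is the same.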


DM implications for the Business:

  • A DM requires downtime for the systems involved
  • The business has several expectations/assumptions
  • Some expectations are considered self-evident
  • The initial assumptions are almost always wrong
  • A DM's success/failure depends on the business' perception
  • Business' knowledge about the data and processes is relative
  • The business is involved for the whole project’s duration
  • The business needs continuous communication
  • Data migration is mainly a business rather than a technical challenge
  • Business’ expertise in every data area is needed
  • DM and Data Quality (DQ) need to be part of a Data Management strategy
  • Old legacy system data have further value for the business
  • Reporting requirements come with their own data requirements


DM and Data Quality:

  • Not all required data are available
  • Data don't match the expectations
  • Quality of the data needs to be judged based on the target system
  • DQ is usually performed as a separate project with different timelines
  • Data don't have the same importance for the business
  • Improving DQ is a collective effort
  • Data cleaning needs to be done at the source (when possible)
  • Data cleaning is a business activity
  • The business is responsible for the data
  • Quality improvement is governed by the 80-20 rule
  • No organization is willing to pay for perfect data quality
  • If it can’t be counted, it isn’t visible
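
    The last point, that what can't be counted isn't visible, lends itself to a short illustration. The sketch below, in Python, counts how many records lack a value for each field the target system requires; the function name count_missing, the field names and the sample records are assumptions made for the example.

```python
def count_missing(records: list[dict], required_fields: list[str]) -> dict[str, int]:
    """Count, per required field, the records without a usable value.

    A value counts as missing when the key is absent, the value is None,
    or it is blank after stripping whitespace.
    """
    missing = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return missing


# Hypothetical sample data, for illustration only
sample = [
    {"id": 1, "name": "Ada", "country": "DE"},
    {"id": 2, "name": " ", "country": None},
    {"id": 3, "country": "FR"},
]
print(count_missing(sample, ["name", "country"]))  # {'name': 2, 'country': 1}
```

    Counts like these, taken before and after each cleaning round, give the business a simple way to see where the gaps are and whether the effort is paying off.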


More to come, stay tuned…
