
03 July 2023

📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" II (Technical Aspects in Dynamics 365 Finance & Operations)

 

Data Migration
Data Migrations Series

Introduction

This weekend I read chapter 5 on Data Migrations ("Planning for Successful Data Migration") from Brent Dawson’s recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The chapter makes a few good points; however, some statements require further clarification, while others are questionable.

Concerning the Data Migration (DM), besides several architectural recommendations, the author also makes several technical recommendations that can be summarized as follows:

(10) migrate transactional data manually via direct input or by using the Excel add-in (the author doesn’t recommend migrating transactional data via data packages because the data change frequently);
(11) plan a data outage as part of the cutover timeframe;
(12) migrate new transactional data after Go-Live;
(13) include the effort for data entry in the cutover plan.

General Aspects

Concerning the data, there are four important phases during a cutover: configuring the production environment, migrating the master data, migrating the transactional data, and importing/creating the new transactional data. After each of these phases a data validation step is required to assure data quality and sign off on it.

Ideally, one can make sure that the production environment is correctly set up by deploying a copy of the database with the gold configuration (e.g. export the database and restore it in the target environment). Otherwise, direct data entry and templates, when available, can help obtain the same result, though the effort and the risk of errors are higher.

Moving to the next phases, it’s important to understand that a data migration is not a copy-paste of data from one system to another. Often the systems have different schemas, data definitions or granularity of the data entities. Ideally, a DM layer in between should take the source data as input and prepare the load data for the target system. This applies to master as well as to transactional data.

For importing data into D365 FO there are the following main options:
(a) manual data entry
(b) import via the Excel add-in and templates
(c) manual/automated data packages
(d) the Batch API

As a rule of thumb, if one has no more than 100-200 records for a data entity, it might be OK to enter the data manually, possibly by splitting the effort between several users. This would allow users to accommodate themselves with the system, even if errors are made in the process. However, given the importance of having “clean data” and a repeatable process for Go-Live, this approach is less desirable. On the other hand, there will be cases when this is the only available option.

As soon as the data volume goes above this threshold, the manual effort no longer makes sense. Preparing the data in Excel and importing it via the Excel add-in is in most cases the recommended approach, as long as the volume of data is manageable. Moreover, the data can be partitioned and imported in batches of 1000-2000 records. Ideally, the data should be available in the same structure as required by the templates used.
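For illustration, the partitioning itself can be scripted. Below is a minimal Python sketch (pandas assumed; the file names and the 2000-record batch size are only placeholders) that splits a file already shaped like the import template into numbered batches:

import pandas as pd

BATCH_SIZE = 2000  # upper end of the 1000-2000 range mentioned above

def split_into_batches(source_file, prefix, batch_size=BATCH_SIZE):
    """Write the prepared data into numbered batch files and return their names."""
    data = pd.read_csv(source_file)  # data already shaped like the import template
    files = []
    for start in range(0, len(data), batch_size):
        batch_file = f"{prefix}_{start // batch_size + 1:03d}.csv"
        data.iloc[start:start + batch_size].to_csv(batch_file, index=False)
        files.append(batch_file)
    return files

print(split_into_batches("customers_prepared.csv", "customers_batch"))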

There will be however a second threshold that makes a batch API solution more attractive. How big is this threshold? It depends. I was able to import 50-100k records via partitioning in the Excel add-in, though these values shouldn’t be taken as fixed.

The dependencies existing between data will dictate the order in which the data must be imported, while the size of each data entity can be used to decide which approach to use.
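Once the dependencies are known, the import order can even be derived programmatically. A minimal Python sketch (the entity names and dependencies below are purely illustrative):

from graphlib import TopologicalSorter

# entity -> entities it depends on (which must be imported first)
dependencies = {
    "Customers": {"CustomerGroups", "PaymentTerms"},
    "Products": {"ProductGroups"},
    "SalesOrders": {"Customers", "Products"},
    "CustomerGroups": set(),
    "PaymentTerms": set(),
    "ProductGroups": set(),
}

import_order = list(TopologicalSorter(dependencies).static_order())
print(import_order)  # parents come first, e.g. groups before customers, customers before orders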

Master Data

In theory, the migration of master data can start as soon as the corresponding configuration is available. However, it is recommended to split the two phases and make sure that the environment is fully configured first. This also helps take a backup of the configuration when such a snapshot is not yet available (see the golden configuration in the previous post).

Before taking a snapshot of the master data from the source system(s), it’s recommended to disable access for changing the respective data (aka a master data freeze). Otherwise, besides the fact that the changes will not appear in the target system(s), the changes can make the validation of the master data more complex. Sometimes, that's a risk the business is willing to take.

The master data are typically imported a few days before the transactional data, to allow the team to validate them and, if the data don’t have the expected quality, to perform at most one more migration. Thus, the migration of master data can start one or two weeks earlier; however, the longer the timeframe, the higher the chances that the business will be impacted (e.g. new orders with new products are needed urgently).

Transactional Data

Before migrating the transactional data, a few processes must be run (e.g. monthly/yearly closing, inventory counting, receiving goods in transit, etc.). Once this is accomplished, the system can be frozen and the access for making changes disabled. This can happen in phases, depending on the requirements (e.g. migrating the balances can happen much later, even weeks after Go-Live).

What one can migrate are only open transactions (e.g. open purchase orders, open sales orders, open customer/vendor invoices, active assets) and balances (e.g. inventory, trial balance). Migrating historical data is usually out of the question; a data warehouse or a similar data repository is more appropriate for storing it. Otherwise, keeping the source system(s) available for some users to meet regulatory requirements would be a better option, when feasible.

The biggest issue with transactional data is that the referenced values (products, customers, vendors) must be available in the target system(s). Even if the names and descriptions may be the same, the unique identifiers or surrogate keys are likely to change. E.g. a product, vendor or customer will have a different product, vendor or customer number than in the source system(s). This means that the old values need to be replaced with the new ones, which can become a tedious and error-prone process even in Excel. Unless the number of records is really small and there’s no other solution, I don’t recommend this approach.
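Whether the remapping is done in Excel or in a DM layer, the core operation is a lookup against mapping tables exported from the target system after the master data migration. A minimal Python sketch (pandas assumed; file and column names are only placeholders):

import pandas as pd

orders = pd.read_csv("open_sales_orders_legacy.csv")   # contains OldCustomerId, OldItemId
customer_map = pd.read_csv("customer_mapping.csv")     # OldCustomerId -> NewCustomerId
item_map = pd.read_csv("item_mapping.csv")             # OldItemId -> NewItemId

remapped = (orders
            .merge(customer_map, on="OldCustomerId", how="left")
            .merge(item_map, on="OldItemId", how="left"))

# records without a new key reference master data that wasn't migrated
missing = remapped[remapped["NewCustomerId"].isna() | remapped["NewItemId"].isna()]
missing.to_csv("orders_missing_references.csv", index=False)

remapped.dropna(subset=["NewCustomerId", "NewItemId"]).to_csv(
    "open_sales_orders_remapped.csv", index=False)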

The alternative would be to build a data migration layer that can address many of the challenges of data migrations. The effort for building such a layer might be high compared with a manual transformation of the data, though it increases the chances of success by a considerable factor.

During and Post-Go-Live

After validating and signing off on the DM (extracts from the source and target systems can help here), the Go-Live will depend only on the results of the functional testing (and many things can go wrong in this area).

During the freeze period(s) of the source systems, it is likely that new master and transactional data will need to be created. Ideally, these data should be entered after the Go-Live announcement, though this isn’t a must if a backup of the target system was taken before. For this, the Excel add-ins can become the tool of choice.

With the Go-Live the DM should be over, though there will always be inquiries from the business. In fact, the DM is over only when the auditor has signed off on it. Even when one thinks that everything is over, a few more surprises can appear: forgotten data, data enrichment, data for new features, etc.

Wrap Up

These are the most important aspects the reader should be aware of. There is more to say about the DM architecture and process, and there are more best practices to consider in areas like planning, conceptualization, quality assurance, principles, etc.

Coming back to the best practices from the book, it's worth stressing that the frequency with which data change is not the main driver for choosing the DM approach. Far more important are the volume and complexity of the data entities to be migrated, and this applies to master and transactional data alike. Therefore, the argumentation behind (10) doesn't entirely stand.

Concerning (11), a multi-level data freeze is more appropriate than an outage, even if the author maybe intended to say the same thing.

(12) and (13) make sense, though the new data are part of the daily business (business as usual) and not of the DM. Moreover, if the data entry or import fails for whatever reason, the DM can't be blamed for it. Even if the lessons learned during the DM can be further used for mass data entry and updates, this doesn't mean that the DM project continues to exist. In theory, the DM layer can be used further on, though the respective layer was built on premises that become obsolete with the Go-Live. One needs to think only from the perspective of the new system. Data Management, or more specifically Master Data Management, should be responsible for this type of data changes!


📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" I (Architecture Aspects in Dynamics 365 Finance & Operations)

 

Data Migrations
Data Migrations Series

Introduction

This weekend I read chapter 5 on Data Migration ("Planning for Successful Data Migration") from Brent Dawson’s recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The section on best practices makes many good points; however, some of the practices require further clarification, while some statements are questionable, as the context associated with them can make an important difference. Overall, however, the recommendations hold.

Concerning Data Migrations (DM), besides a few technical recommendations, the author also makes several architectural recommendations that can be summarized as follows:

(1) put the data into a backup system or database, if possible, and use that system for the data extraction part of the DM tasks;
(2) use a Tier 2 system for the majority of the development of the data packages;
(3) once the data packages are validated, they can be used against production environments;
(4) don’t use the OData protocol for data transfer; use the Batch API instead;
(5) don’t use dual-write for DM (a technology used for data integrations); first complete the DM and enable dual-write after that;
(6) have a backup of the environments involved;
(7) have a good internet connection;
(8) plan an environment for DM (a Tier 2 environment, distinct from the one used for functional testing);
(9) for the gold configuration have an environment with limited access.

General Aspects
 
In a Data Migration there are at least two systems involved, though in more complex scenarios there can be one or more source systems, respectively one or more target systems. At minimum there is a source and a target system.

Ideally, a target production environment should not be used for testing the data migration! On the other hand, as long as there’s a backup with a given state of the system (e.g. only configuration data, without master or transactional data), a system can always be restored to a previous state. This applies to D365 or to any system for which a database backup and restore can be applied. Even so, as a best practice it isn’t recommended to use a production environment for testing, as this can increase the complexity of the data migration.

Moreover, the same constraint applies to the sandbox used for UAT (User Acceptance Testing), given that it is supposed to represent, at different points in time, the same state as the production environment. Thus, at least a third environment will be needed.

There are no hard constraints on the source systems. Ideally, one should use the production source system(s) or environments that resemble the production environments. A read replica of the respective environment(s) will work as well, given that there are typically only reads involved.

The downside of accessing a production environment directly for DM is that the data change frequently, which makes it more difficult to validate the DM logic: the time factor needs to be considered, as data are being added, deleted or changed. That’s why an environment with a recent snapshot from production would facilitate the process and would make sure that the DM workloads don’t affect the production environment’s performance.

Often, a better alternative would be to have a database in between (aka a DM layer) that contains only the data in the scope of the DM. ETL (Extract Transform Load) jobs can extract the data on demand and in a consistent manner, this approach assuring a snapshot. This layer can be used to build, test and troubleshoot the DM logic, before Go-Live and after, as issues will more likely be raised by the business and will need to be mitigated.
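As a rough illustration of such an extract job, the Python sketch below copies the in-scope tables into a staging schema and stamps each row with the snapshot timestamp (connection strings, table names and the staging schema are assumptions, not a prescribed setup):

import pandas as pd
import sqlalchemy as sa
from datetime import datetime, timezone

# DSN-based connections; user, password and DSN names are placeholders
source = sa.create_engine("mssql+pyodbc://user:pwd@SourceErpDsn")
staging = sa.create_engine("mssql+pyodbc://user:pwd@DmLayerDsn")

IN_SCOPE = ["Customers", "Vendors", "Products", "OpenSalesOrders"]
snapshot_ts = datetime.now(timezone.utc)

for table in IN_SCOPE:
    frame = pd.read_sql_table(table, source)
    frame["SnapshotTs"] = snapshot_ts   # tag every row with the snapshot timestamp
    frame.to_sql(table, staging, schema="stg", if_exists="replace", index=False)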

There are also scenarios in which direct access to the source systems is not possible, a push, respectively a push & pull scenario being needed. If possible, it would be great if the data needed for migration could be exported directly from the source system(s) as needed by the target system(s). In some scenarios this might be achievable, though the bigger the differences in schemas between the systems and the more complex the data, the more transformations are needed and the more difficult it becomes to achieve this. Therefore, moving such logic to an intermediate DM layer would simplify the DM architecture, allowing many of the challenges to be addressed.
 
Batch API
 
Using the Batch API can be a solution when the source environments allow only API access to the data (thus no direct access via SQL scripting) or when the volume of data makes the alternatives unusable. Indeed, OData seems to be slow or unusable when the volume of data exceeds a given threshold, even if the calls can be partitioned.
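As a rough illustration of partitioned submission, the Python sketch below posts the records in chunks to a bulk-import endpoint. The endpoint, payload shape and authentication are hypothetical placeholders; the actual D365 F&O Batch API calls and the OAuth flow should be taken from Microsoft's documentation:

import requests

API_URL = "https://example.operations.dynamics.com/api/bulk-import"  # hypothetical endpoint
TOKEN = "..."       # in a real setup acquired via OAuth against Azure AD
BATCH_SIZE = 5000   # tune based on what the API and the environment can handle

def submit_in_batches(records):
    headers = {"Authorization": f"Bearer {TOKEN}"}
    for start in range(0, len(records), BATCH_SIZE):
        chunk = records[start:start + BATCH_SIZE]
        response = requests.post(API_URL, json={"records": chunk},
                                 headers=headers, timeout=300)
        response.raise_for_status()   # fail fast; real code would log and retry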

Another scenario for the Batch API is when the source and target systems need to operate in parallel for a considerable amount of time, which would make other approaches unusable. Even if a DM typically involves the replacement of one or more systems, there can be exceptions. Such a scenario increases a DM’s complexity by several factors and should be avoided. Even if such scenarios seem logical and approachable at first sight, the benefits can easily be outrun by the downsides.

Backup

Hopefully, your organization has a backup and restore strategy for the production and other essential environments! The strategy needs to be extended to cover the other environments available during the implementation. It’s also true that until Go-Live the target environments don’t suffer many changes. Ideally, a backup should be taken at least when important changes are made to the systems. This can involve the configuration as well as the DM. E.g. a backup would be required after the configuration is completed, respectively after the master data and the transactional data were migrated. A backup of all the systems involved should be taken before Go-Live.

Gold Configuration

Having a system with the gold configuration (the values used to configure the system) available can indeed facilitate the implementation, and there are two main reasons for this. Primarily, the gold configuration allows building reliable processes around its maintenance and minimizing the risk of discrepancies between expectations and reality. Secondly, the database with the gold configuration can be used to easily set up a new environment, and this might be needed more often than thought (e.g. for dry runs).

However, in practice the technical value is easily overrun by the financial aspect, as such an environment is barely used and can involve significant costs. As an alternative, one can use the DAT legal entity from an available environment to store the gold configuration common across all the legal entities and easily copy it to the other legal entities. In addition, the deviations need to be documented; in fact, it’s recommended to document all configurations and use this as the baseline for the post-Go-Live changes.
Indeed, access to the gold configuration should be restricted as much as possible (e.g. only admins, consultants and/or data owners) and change policies should be enforced. Otherwise, one risks having different configurations between the environments. For Go-Live it is critical that the UAT and Go-Live environments have the same configuration.

Independently of the approach used to maintain the gold configuration, it’s recommended to perform a comparison between the UAT and production environments to make sure that there are no differences. The comparison can also be handled via SQL scripts, the effort being well spent when such comparisons need to be done several times. Even if the data from production aren’t directly accessible, a snapshot of the production database can be copied into another environment. However, this approach requires a good understanding of the tables and/or entities involved. There will be cases (e.g. module parameters) in which it’s easier to perform a manual comparison.
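As an illustration, a comparison of the same parameter extract from the two environments can be reduced to a few lines; the Python sketch below assumes illustrative file and column names, and the same logic can of course be expressed directly in SQL:

import pandas as pd

uat = pd.read_csv("uat_parameters.csv")     # columns: ParameterName, Value
prod = pd.read_csv("prod_parameters.csv")

diff = uat.merge(prod, on="ParameterName", how="outer",
                 suffixes=("_uat", "_prod"), indicator=True)
# keep parameters missing on either side or having different values
issues = diff[(diff["_merge"] != "both") | (diff["Value_uat"] != diff["Value_prod"])]
issues.to_csv("parameter_differences.csv", index=False)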

Wrap Up

Coming back to the recommendations, the only points that require some discussion are (1), (2) and maybe (8), while (9) was discussed above (see the 'Gold Configuration' section).

The recommendation of putting the data into a backup system or database is too vague. A backup system can mean a backup database, typically accessible only via the RDBMS, or an instance of the system holding a copy of the data (which usually implies an RDBMS as well). Besides these, a database can refer to a read replica of the source system's database or to a DM layer.

Besides price and performance, the main differences between a Tier-1 and a Tier-2 environment (see also the Microsoft documentation) lie in the number of virtual machines (aka boxes) involved, how the various components are distributed between them, and the edition of SQL Server used. Otherwise, for the users the system will look the same. The most important constraint is that a Tier-1 environment isn't suitable for UAT or performance testing. In other words, the environment will be slow under concurrent use.

If the performance is acceptable, and if the volume of data and the number of users are small, a Tier-1 environment can be used for building data packages, performing initial DM dry runs and other tasks. However, a Tier-2 environment resembles the production environment more closely, and if the UAT is performed on such a system, it is more likely that the performance bottlenecks will be identified and addressed. Unless they accept the costs blindly, customers will need to trade off performance against costs from the perspective of their requirements and their business context.

17 July 2019

💻IT: Configuration Management [CM] (Definitions)

 "A discipline applying technical and administrative direction and surveillance to: identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, record and report change processing and implementation status, and verify compliance with specified requirements. (IEEE 610, 1990)

"The process of identifying and defining the configuration items in a system, controlling the release and change of these items throughout the system life cycle, recording and reporting the status of configuration items and change requests, and verifying the completeness and correctness of configuration items." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"Process for the definition and management of configurations, allowing change control and change monitoring over a defined period. Configuration management allows access to individual configurations or configuration items (i.e., work products). Differences between individual configurations are readily identifiable. A configuration can be used to form a baseline; see also Baseline." (Lars Dittmann et al, "Automotive SPICE in Practice", 2008)

"A generic term that is often used to describe the whole of the activities concerned with the creation, maintenance, and control of databases and their environments." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The management of configurations, normally involving holding configuration data in a database so that the data can be managed and changed where necessary." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"Managing the items produced by the project such as requirements documents, designs, and, of course, source code. This may include controlling changes to those items so that changes don’t happen willy-nilly." (Rod Stephens, "Beginning Software Engineering", 2015)

"The detailed recording, management, and updating of the details of an information system." (Weiss, "Auditing IT Infrastructures for Compliance, 2nd Ed", 2015)

"An operational process aimed at ensuring that systems and controls are configured correctly and are responsive to the current threat and operational environments." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed, 2018)

"The process of controlling modifications to a system’s hardware, software, and documentation, which provides sufficient assurance that the system is protected against the introduction of improper modification before, doing, and after system implementation." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"The process of managing versions of configuration items and their coherent consistent sets, in order to control their modification and release, and to ensure their consistency, completeness, and accuracy." (Bruce P Douglass, "Real-Time Agility: The Harmony/ESW Method for Real-Time and Embedded Systems Development", 2009)

"Process responsible for maintaining information about CIs required to deliver an IT service, including their relationships" (ITIL)

02 January 2017

#️⃣Software Engineering: Programming (Part VII: Documentation - Lessons Learned)

Software Engineering
Software Engineering Series

Introduction


“Documentation is a love letter that you write to your future self.”
Damian Conway

For programmers, as well as for other professionals who write code, documentation might seem a waste of time, an effort few are willing to make. On the other hand, documenting important facts can sometimes save time and provide a useful base for building one's own and others’ knowledge. Sometimes I found out the hard way what I needed to document. With the hope that others will benefit from my experience, here are my lessons learned:

 

Lesson #1: Document your worked tasks


“The more transparent the writing, the more visible the poetry.”
Gabriel Garcia Marquez


Personally, I like to keep a list of what I worked on on a daily basis: typically nothing more than a 3-5 word description of the task, who requested it, and eventually the corresponding project, CR or ticket. I’m doing it because it makes it easier to track my work over time, especially when I have to retrieve some piece of information that is documented in detail somewhere else.

Within the same list one can also track the effective time worked on a task, though I sometimes find it difficult, especially when working on several tasks simultaneously. In theory this can be used to estimate further similar work. One can also use a categorization distinguishing, for example, between the various types of work: design, development, maintenance, testing, etc. This approach offers finer granularity, especially in estimations, though more work is needed to track the time accurately. Therefore, track the information that is worth tracking, as long as there is value in it.

Documenting tasks offers not only easier retrieval and a base for accurate estimations, but also visibility into my work, for me as well as, if necessary, for others. In addition, it can be a useful sensemaking tool (into my work) over time.
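A plain text file or spreadsheet is enough for such a list; as a minimal sketch, the log can even be maintained with a few lines of Python (the field names and the sample entry are just one possible choice):

import csv
from datetime import date

def log_task(description, requested_by, reference="", hours=0.0, log_file="task_log.csv"):
    # append one row per worked task: date, short description, requester, ticket/CR, time spent
    with open(log_file, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([date.today().isoformat(), description,
                                requested_by, reference, hours])

log_task("Extended sales report with margin column", "J. Doe", "CR-1234", 1.5)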

Lesson #2: Document your code


“Always code as if the guy who ends up maintaining your code will be
a violent psychopath who knows where you live.”
Damian Conway

There are split opinions over the need to document code. There are people who advise against it, and probably one of the most frequent reasons is rooted in the Agile methodology. I have to stress that Agile values “working software over comprehensive documentation”, which doesn’t imply the total absence of documentation. There are also other reasons frequently advanced, like “there’s no need to document something that’s already self-explanatory” (like good code should be), “no time for it”, etc. Probably in each statement there is a grain of truth, especially considering that in software engineering there are so many requirements for documentation (see e.g. ISO/IEC 26513:2009).

Without diving too deep into the subject: document what is worth documenting; however, this needs to be regarded from a broader perspective, as there might be other people who need to review, modify and manage your code.

    Documenting code doesn’t resume only to the code being part of a “deliverables”, but also to intermediary code written for testing or other activities. Personally I find it useful to save within the same fill all the scripts developed within same day. When some piece of code has a “definitive” character then I save it individually for reuse or faster retrieval, typically with a meaningful name that facilitates file’s retrieval. With the code it helps maybe to provide also some metadata like: a short description and purpose (who and when requested it).

Code versioning can be used as a tool to facilitate the process, though not everything is worth versioning.

 

Lesson #3: Document all issues as well as the steps used for troubleshooting and fixing


“It’s not an adventure until something goes wrong.”
Yvon Chouinard

Independently of the types of errors occurring while developing or troubleshooting code, one of their common characteristics is that errors can have a recurring character. Therefore, I found it useful to document all the errors I got in terms of screenshots, ways to fix them (including workarounds) and sometimes also the steps followed to troubleshoot the problem.

Considering that the issues are rooted in programming fallacies or undocumented behavior, there is almost always something to learn from one’s own as well as from others’ errors. In fact, that was the reason why I started the “SQL Troubles” blog: as a way to document some of the issues I met, to provide others some help, and, why not, to get some feedback.

 

Lesson #4: Document software installations and changes in configurations


At least for me, this lesson is rooted in the fact that years back, release candidate as well as final software was quite often not that easy to install, having to deal with various installation errors rooted in OS or component incompatibilities, invalid or missing permissions, or unexpected presumptions made by the vendor (e.g. default settings). Over the years installation became smoother, though such issues still occur. Documenting the installation in terms of screenshots with the setup settings allows repeating the steps later. It can also provide a base for further troubleshooting when the configuration within the software changes, or serve as evidence when something goes wrong.


Talking about changes occurring in the environment: more than once I found myself troubleshooting something that stopped working, only to discover that something had changed in the environment. It’s useful to document the changes occurring in an environment, an importance stressed also in the “Configuration Management” section of ITIL® (Information Technology Infrastructure Library).

 

Lesson #5: Document your processes


“Verba volant, scripta manent.” Latin proverb
"Spoken words fly away, written words remain."

In process-oriented organizations one has the expectation that the processes are documented. One can find that this is not always the case, some organizations relying on the common or individual knowledge about the various processes. Or it might happen that the processes aren’t documented to the level of detail needed. What one can do is to document the processes from one’s own perspective, to the level of detail needed.

 

Lesson #6: Document your presumptions


“Presumption first blinds a man, then sets him a running.”
Benjamin Franklin

Probably this is more a Project Management related topic, though I find it useful also when coding: define your presumptions/expectations upfront: where the libraries should reside, the type and format of the content, the files’ structure, the output, and so on. Even if a piece of software is expected to be a black box with inputs and outputs, at least the inputs, outputs and expectations about the environment need to be specified upfront.

 

Lesson #7: Document your learning sources


“Intelligence is not the ability to store information, but to know where to find it.”
Albert Einstein

Computer specialists are heavily dependent on the internet to keep up with the advances in the field, best practices, methodologies, techniques, myths, and other knowledge. Even if one learns something, over time the degree of retention varies, and it can decrease significantly if the knowledge wasn’t used for a long time. Nowadays, with a quick search on the internet one can find (almost) everything, though the content available varies in quality and coverage, and it might be difficult to find the same piece of information again. Therefore, independently of the type of source used for learning, I found it useful to also document the information sources.

 

Lesson #8: Document the known as well as the unknown

 

“A genius without a roadmap will get lost in any country but an average person
with a roadmap will find their way to any destination.”
Brian Tracy

Over the years I found it useful to map and structure the learned content for further review, sometimes considering only key information about the subject like definitions, applicability, limitations, or best practices, while other times providing a level of depth that allows me and others to memorize and understand the topic. As part of the process I attempted to keep the copyright attributions, just in case I need to refer to the source later. Together with what I learned, I considered also the subjects that I still have to learn and review for further understanding. This provides a good way to map what I know as well as what I don’t know. One can use for this a rich text editor or knowledge mapping tools like mind mapping or concept mapping.


Conclusion


    Documentation doesn’t resume only to pieces of code or software but also to knowledge one acquires, its sources, what it takes to troubleshoot the various types of issues, and the work performed on a daily basis. Documenting all these areas of focus should be done based on the principle: “document everything that worth documenting”.



