
03 July 2023

📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" II (Technical Aspects in Dynamics 365 Finance & Operations)

 

Data Migration
Data Migrations Series

Introduction

This weekend I read chapter 5 on Data Migration ("Planning for Successful Data Migration") from Brent Dawson’s recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The chapter makes a few good points; however, some statements require further clarification, while others are questionable.  

Concerning Data Migration (DM), besides several architectural recommendations, the author also makes several technical recommendations that can be summarized as follows:

(10) migrate transactional data manually via direct input or by using the Excel add-in (the author doesn’t recommend migrating transactional data via data packages because the data change frequently);
(11) put a data outage in place as part of the cutover timeframe;
(12) migrate new transactional data after Go-live;
(13) include the effort for data entry in the cutover plan.

General Aspects

In what concerns the data, there are four important phases during a cutover: configuring the production environment, migrating the master data, migrating the transactional data, respectively importing/creating the new transactional data. After each of these phases a data validation step is required to assure and sign off on data quality.

Ideally, one can make sure that the production environment is correctly set up by deploying a copy of the database with the golden configuration (e.g. export the database and restore it in the target environment). Otherwise, direct data entry and templates, when available, can help obtain the same result, though the effort and the risk of errors are higher.
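For an on-premises or development (Tier-1) environment, where direct access to SQL Server is available, the idea can be sketched with a plain backup/restore; for cloud sandbox/production environments the platform's own database movement tooling is typically used instead. In the minimal sketch below the database, logical file and path names are hypothetical:

-- taking a copy-only backup of the golden configuration database
BACKUP DATABASE GoldenConfig
TO DISK = N'D:\Backup\GoldenConfig.bak'
WITH COPY_ONLY, COMPRESSION, INIT;

-- restoring it in the target environment under the name expected there
RESTORE DATABASE AxDB
FROM DISK = N'D:\Backup\GoldenConfig.bak'
WITH MOVE N'GoldenConfig' TO N'D:\Data\AxDB.mdf'        -- logical data file (assumed name)
   , MOVE N'GoldenConfig_log' TO N'D:\Log\AxDB_log.ldf' -- logical log file (assumed name)
   , REPLACE;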

Moving to the next phases, it’s important to understand that a data migration is not a copy-paste of data from one system to another. Often the systems have different schemas, data definitions or granularity of the data entities. Ideally, a DM layer in between should take the source data as input and prepare the load data for the target system. This applies to master as well as to transactional data.

For importing data in D365 FO there are the following main options: 
(a) manual data entry
(b) import via the Excel add-in and templates
(c) manual/automated data packages
(d) batch API

As a rule of thumb, if one has no more than 100-200 records for a data entity, it might be OK to enter the data manually, possibly splitting the effort between several users. This would allow the users to accommodate themselves with the system, even if errors are made in the process. However, given the importance of having “clean data” and a repeatable process for Go-Live, this approach is less desirable. On the other hand, there will be cases when this is the only available option.

As soon as the data volume goes above this threshold, the manual effort doesn’t make sense anymore. Preparing the data in Excel and importing it via the Excel add-in is in most cases recommended, as long as the volume of data is manageable. Moreover, the data can be partitioned and imported in batches of 1000-2000 records. Ideally, the data should be available in the same structure as required by the templates used.

There will be however a second threshold that makes a batch API solution more attractive. How big is this threshold? It depends. I was able to import 50-100k records via partitioning in the Excel add-in, though these values shouldn’t be taken as fixed.

The dependencies existing between data will dictate the order in which data must be imported, while the size of each data entity can be used to decide which approach will be used.
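To illustrate the partitioning mentioned above, a batch number can already be assigned in the preparation step, e.g. directly in SQL when the source extract resides in a staging database. A minimal sketch, in which the staging table and its columns are hypothetical:

-- assigning batches of ~2000 records each for the Excel add-in import
SELECT (ROW_NUMBER() OVER(ORDER BY PRD.ItemNumber) - 1) / 2000 + 1 AS BatchNo
, PRD.ItemNumber
, PRD.ProductName
, PRD.ItemGroup
FROM dbo.Stg_Products PRD
ORDER BY BatchNo
, PRD.ItemNumber;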

Master Data

In theory, the migration of master data can start as soon as the corresponding configuration is available. However, it is recommended to split the two phases and make sure that the environment is fully configured first. This also helps when taking a backup of the configuration, in case such a snapshot is not available (see the golden configuration in the previous post).

Before taking a snapshot of the master data from the source system(s) it’s recommended to disable access to changing the respective data (aka master data freeze). Otherwise, besides the fact that the changes will not appear in the target system(s), the changes can make the validation of the master data more complex. Sometimes, that's a risk the business is willing to take. 

The master data are typically imported a few days before the transactional data to allow the team to validate them and, if the data don’t have the expected quality, perform at most one more migration. Thus, the migration of master data can start one or two weeks earlier; however, the longer the timeframe, the higher the chances that the business will be impacted by this (e.g. new orders with new products are needed urgently).

Transactional Data

Before migrating the transactional data, a few processes must be run (e.g. monthly/yearly closing, inventory counting, receiving the goods in transit, etc.). Once this is accomplished, the system can be frozen and thus the access to making changes disabled. This can happen in phases, depending on the requirements (e.g. migrating the balances can happen much later, even weeks after Go-Live).

Only open transactions (e.g. open purchase orders, open sales orders, open customer/vendor invoices, active assets) and balances (e.g. inventory, trial balance) can typically be migrated. Migrating historical data is usually out of the question; a data warehouse or a similar data repository is more appropriate for storing historical data. Otherwise, keeping the source system(s) available for some users for regulatory requirements would be a better option, when feasible.

The biggest issue with transactional data is that the referenced values (products, customers, vendors) must be available in the target system(s). Even if the names and descriptions may be the same, the unique identifiers or surrogate keys are likely to change. E.g. a product, vendor or customer will have a different product number, vendor number or customer number than in the source system(s). This means that the old values need to be replaced with the new ones, which can become a tedious and error-prone process even in Excel. Unless the number of records is really small and there’s no other solution, I don’t recommend this approach.

The alternative would be to build a data migration layer that can address many of the challenges of data migrations. The effort for building such a layer might be high compared with a manual transformation of the data, though it increases the chances of success by a considerable factor.
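A minimal sketch of what such a layer does for the identifier problem described above: mapping tables translate the old keys into the new ones, and records without a mapping surface immediately as candidates for cleansing. All table and column names below are hypothetical:

-- replacing the source identifiers with the values from the target system
SELECT MAP.NewCustomerNumber AS CustomerAccount
, ORD.SalesOrderNumber
, ORD.OrderDate
, ORD.Amount
FROM dbo.Stg_OpenSalesOrders ORD
JOIN dbo.Map_Customers MAP
  ON ORD.OldCustomerNumber = MAP.OldCustomerNumber;

-- orders whose customers have no mapping yet (candidates for cleansing)
SELECT ORD.SalesOrderNumber
, ORD.OldCustomerNumber
FROM dbo.Stg_OpenSalesOrders ORD
LEFT JOIN dbo.Map_Customers MAP
  ON ORD.OldCustomerNumber = MAP.OldCustomerNumber
WHERE MAP.OldCustomerNumber IS NULL;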

During and Post-Go-Live

After validating and signing off on the DM (and here extracts from the source and target systems can help), the Go-Live will depend only on the results of the functional testing (and many things can go wrong in this area).
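For the validation, a simple reconciliation of record counts and totals per data entity between the source and target extracts already catches the most obvious gaps. A minimal sketch, in which the staging tables are hypothetical:

-- comparing counts and totals between source and target extracts
SELECT COALESCE(SRC.Entity, TRG.Entity) AS Entity
, SRC.RecordCount AS SourceCount
, TRG.RecordCount AS TargetCount
, SRC.TotalAmount AS SourceAmount
, TRG.TotalAmount AS TargetAmount
FROM (
	SELECT 'Open sales orders' AS Entity
	, COUNT(*) AS RecordCount
	, SUM(Amount) AS TotalAmount
	FROM dbo.Stg_OpenSalesOrders
	) SRC
FULL OUTER JOIN (
	SELECT 'Open sales orders' AS Entity
	, COUNT(*) AS RecordCount
	, SUM(Amount) AS TotalAmount
	FROM dbo.Ext_TargetOpenSalesOrders
	) TRG
  ON SRC.Entity = TRG.Entity;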

During the freeze period(s) of the source systems, it is likely that new master and transactional data needed to be created. Ideally, these data should be entered after the Go-Live announcement, though it isn’t a must if a backup of the target system was taken before. For this the Excel add-ins can become the tool of choice.

With the Go-Live the DM should be over, though there will always be inquiries from the business. In fact, the DM is over only when the auditor has signed off. Even when one thinks that everything is over, a few more surprises can appear – forgotten data, data enrichment, data for new features, etc.

Wrap Up

These are the most important aspects the reader should be aware of. There is more to say about the DM architecture and process, and there are more best practices that need to be considered in areas like planning, conceptualization, quality assurance, principles, etc.

Coming back to the best practices from the book, it's worth stressing that the frequency with which data changes is not the main driver for which approach to use in the DM. Definitely more important are the volume and complexity of the data entities to be migrated, and this applies to master and transactional data alike. Therefore, the argumentation behind (10) doesn't stand entirely. 

Concerning (11), a multi-level data freeze is more appropriate than an outage, even if the author maybe intended to say the same thing. 

(12) and (13) make sense, though the new data are part of the daily business (business as usual) and not of the DM. Moreover, if the data entry or import fails for whatever reason, it can't be the DM that is to blame. Even if the lessons learned during the DM can be further used for mass data entry and updates, this doesn't mean that the DM project continues to exist. In theory, the DM layer can be used further on, though the respective layer was built on premises that become obsolete with the Go-Live. One needs to think only from the perspective of the new system. Data Management, or more specifically Master Data Management, should be responsible for this type of data changes!


06 July 2020

💠🛠️🪄SQL Server: Undocumented (Part III: SQL Server CPU Utilization via the Ring Buffer)

Introduction

If no proper monitoring solution for the SQL Server and the hosting server is in place to review the CPU utilization, one can use the Scheduler Monitor buffer provided by the undocumented sys.dm_os_ring_buffers dynamic management view (DMV). Introduced with SQL Server 2005, the DMV provides a significant amount of diagnostic memory information in XML form via several buffers: Resource Monitor, Out-of-Memory, Memory Broker, Buffer Pool, respectively the Scheduler Monitor buffer [2]. A ring buffer is a recorded response to a notification [1].

The view changed between the various versions of SQL Server, while with the introduction of Always On availability groups further ring buffers were made available (see [5]).

Warning:
According to Microsoft (see [4]), sys.dm_os_ring_buffers is provided for informational purposes only, and future compatibility beyond SQL Server 2019 is not guaranteed!

Querying the Scheduler Monitor Buffer

Within the Scheduler Monitor buffer, the DMV stores a history of about 4 hours of uptime with minute-by-minute data points (in total 256 entries), containing the CPU utilization for the SQL Server process, for the other processes, respectively the system idle time, as percentages. This allows identifying the peaks in CPU utilization and thus determining the intervals to focus on for further troubleshooting. As the data are stored within an XML structure, the values can be queried via the XQuery syntax as follows: 

-- cpu utilization for SQL Server and other applications
DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks)
        FROM sys.dm_os_sys_info); 

SELECT DAT.record_id
, DAT.EventTime
, DAT.SQLProcessUtilization 
, DAT.SystemIdle 
, 100 - (DAT.SystemIdle + DAT.SQLProcessUtilization) OtherUtilization
FROM ( 
	SELECT record.value('(./Record/@id)[1]', 'int') record_id
	, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') SystemIdle 
	, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') SQLProcessUtilization
	, EventTime 
	FROM ( 
		SELECT DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) EventTime
		, [timestamp]
		, CONVERT(xml, record) AS [record] 
		FROM sys.dm_os_ring_buffers 
		WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' 
		  AND record LIKE N'%<SystemHealth>%') AS x 
	) AS DAT
ORDER BY DAT.record_id DESC;

If the SQL Server is not busy at all, the SQL Server utilization may tend to 0%, while the system idle time to 90% or more. (It's the case of my SQL Server lab.)

[Chart: CPU utilization for my home SQL Server lab]

Notes:
If the server was restarted within the last 4 hours, then there will be a gap between two readings corresponding to the downtime interval.
The query is supposed to run also on Linux machines, though the SystemIdle value will be 0. One can thus consider only the SQL and non-SQL CPU utilization.

Storing the History

The above query can be run on a regular basis (e.g. every 3-4 hours) via an SSIS package, pushing the data into a table for historical purposes. Because a continuous history of the readings is needed, it's better if the gap between runs is smaller than 4 hours. No matter which approach is used, it's better to check for overlaps when storing the data:

-- dropping the table
-- DROP TABLE IF EXISTS dbo.T_RingBufferReadings 

-- reinitializing the history
-- TRUNCATE TABLE dbo.T_RingBufferReadings

-- creating the table
CREATE TABLE dbo.T_RingBufferReadings (
  Id bigint IDENTITY (1,1) NOT NULL
, RecordId bigint 
, EventTime datetime2(3) NOT NULL
, SQLProcessUtilization int NOT NULL
, SystemIdle int NOT NULL
, OtherUtilization int NOT NULL
)


-- reviewing the data
SELECT *
FROM dbo.T_RingBufferReadings 
ORDER BY EventTime DESC
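
To check that the history is indeed continuous, one can look at the time difference between consecutive readings; a gap of more than a couple of minutes points to a missed run or a server restart. A minimal sketch based on the table above (LAG requires SQL Server 2012 or later):

-- checking for gaps between consecutive readings
WITH CTE AS (
	SELECT RBR.EventTime
	, LAG(RBR.EventTime) OVER(ORDER BY RBR.EventTime) PreviousEventTime
	FROM dbo.T_RingBufferReadings RBR
	)
SELECT EventTime
, PreviousEventTime
, DATEDIFF(minute, PreviousEventTime, EventTime) MinutesBetween
FROM CTE
WHERE DATEDIFF(minute, PreviousEventTime, EventTime) > 2
ORDER BY EventTime DESC;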

If there are many records, one can also create an index to improve the performance; the index can include the reading points as well:

-- creating a unique index with an include 
CREATE UNIQUE NONCLUSTERED INDEX [UI_T_RingBufferReadings_EventTime] ON dbo.T_RingBufferReadings
(
	EventTime ASC,
    RecordId ASC
) INCLUDE (SQLProcessUtilization, SystemIdle, OtherUtilization)
GO

The above query based on the DMV becomes:

-- cpu utilization by SQL Server and other applications
DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks)
        FROM sys.dm_os_sys_info); 

INSERT INTO dbo.T_RingBufferReadings (RecordId, EventTime, SQLProcessUtilization, SystemIdle, OtherUtilization)
SELECT DAT.record_id
, DAT.EventTime
, DAT.SQLProcessUtilization 
, DAT.SystemIdle 
, 100 - (DAT.SystemIdle + DAT.SQLProcessUtilization) OtherUtilization
FROM ( 
	SELECT record.value('(./Record/@id)[1]', 'int') record_id
	, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') SystemIdle 
	, record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') SQLProcessUtilization
	, EventTime 
	FROM ( 
		SELECT DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) EventTime
		, [timestamp]
		, CONVERT(xml, record) AS [record] 
		FROM sys.dm_os_ring_buffers 
		WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' 
		  AND record LIKE N'%<SystemHealth>%') AS x 
	) AS DAT
	LEFT JOIN dbo.T_RingBufferReadings RBR
	  ON DAT.record_id = RBR.Recordid 
WHERE RBR.Recordid IS NULL
ORDER BY DAT.record_id DESC;

Notes:
A ServerName column can be added to the table if it is needed to store the values for different SQL Servers. The LEFT JOIN then has to consider the newly added column as well. 
Either of the two queries can be used to display the data points within a chart via SSRS, Power BI or any other reporting tool available. 
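
For the multi-server scenario, the extension could look as follows (a minimal sketch; the default constraint and the hourly aggregation level are just examples):

-- adding a ServerName column with the current server as default value
ALTER TABLE dbo.T_RingBufferReadings
ADD ServerName nvarchar(128) NOT NULL
    CONSTRAINT DF_T_RingBufferReadings_ServerName DEFAULT (CAST(SERVERPROPERTY('ServerName') AS nvarchar(128)));

-- hourly averages per server, e.g. as source for an SSRS or Power BI chart
SELECT ServerName
, DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0) EventHour
, AVG(SQLProcessUtilization) AvgSQLProcessUtilization
, AVG(SystemIdle) AvgSystemIdle
, AVG(OtherUtilization) AvgOtherUtilization
FROM dbo.T_RingBufferReadings
GROUP BY ServerName
, DATEADD(hour, DATEDIFF(hour, 0, EventTime), 0)
ORDER BY ServerName
, EventHour;

The anti-join in the insert query would then need to compare both RecordId and ServerName, otherwise readings from different servers would exclude each other.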

Happy coding!

References:
[1] Grant Fritchey (2014) SQL Server Query Performance Tuning: Troubleshoot and Optimize Query Performance in SQL Server 2014, 4th Ed.
[2] Sunil Agarwal et al (2005), Troubleshooting Performance Problems in SQL Server 2005, Source: TShootPerfProbs.docx
[3] Sunil Agarwal et al (2008), Troubleshooting Performance Problems in SQL Server 2008, Source: TShootPerfProbs2008.docx
[4] Microsoft SQL Docs (2018) Related Dynamic Management Views, Source
[5] Microsoft SQL Docs (2017) Use ring buffers to obtain health information about Always On availability groups, Source

06 May 2019

🧭Business Intelligence: Key Performance Indicators [KPI] (Part I: An Introduction)

Business Intelligence

Key Performance Indicators (KPIs) are quantifiable measurements (aka metrics) that reflect the critical success factors of an organization in respect to its strategic goals and objectives. They allow measuring the progress toward reaching the defined goals and, to some degree, forecasting further evolution. They help keep the focus on the goals, increase awareness of the goals and provide visibility into the business.

As they reflect an organization’s objectives, KPIs need to be anchored in and aligned with them. If there’s no association with an objective, then one doesn’t deal with a KPI but with another form of performance metric. Therefore KPIs need to change with the objectives; they are not fixed.

One important requirement for a KPI is to be defined using the SMART (specific, measurable, attainable, relevant, time-bound) criteria. Thus a KPI needs to be clear and unambiguous (specific), needs to measure the progress against a goal (measurable), needs to be realistic (attainable), needs to be relevant for the business and its current strategy (relevant), and needs to specify when the result(s) can be achieved (time-bound). To the SMART criteria some add the requirement for a KPI to be periodically and consistently evaluated and reviewed (trackable) and agreed between the parties affected by it (agreed).

A KPI needs to be visible within an organization, understandable and non-redundant. Even if KPIs are a tool for the upper management, their definition and impact need to be visible to and understood by all the people working with them, even if this can lead to unexpected behavior. The requirement for non-redundancy implies a partition of the KPIs so as to limit the cases in which two or more KPIs provide the same information.

A KPI needs to be supported by actions and needs to trigger actions. It’s nice to have KPIs reported periodically to the upper management, though as long as no action is triggered, there’s no value in it. A KPI is a kind of reinforcement for questions like: “why are we doing well/badly?”. Negative variations must trigger some form of action, though positive variations could also involve further analysis to understand what caused the improvement.

The variation of a KPI needs to be supported by facts – each variation needs to be explainable in one form or another. A number without a story remains a number that may or may not be trusted. Therefore, it might be needed to have further metrics or reports that support the KPIs and that can be used to identify the sources of variation, in order to understand the data.

Last but not least, KPIs need to be documented. The documentation needs to include at minimum a rough definition that covers the rationale, the boundaries as well as the critical values, the metric’s owner, the unit of measure, etc. In addition, one can add historical information about the KPI in respect to when and what caused variations, respectively how the variations were brought under control.

KPIs vary from one organization to another; the variation is influenced not only by the different goals organizations might have, but also by the fact that organizations tend to measure different things, often the wrong things. It’s in general recommended to have a small number of KPIs that reflect in one dashboard how the business is doing and what is important for the business.

KPIs provide a basis for change by providing insights into what needs to change to improve some aspects of the business. When adequately defined and measured, KPIs provide a good perspective over an organization’s effort in achieving its goals and objectives, and therefore a good tool for monitoring and steering the organization’s strategy.

30 March 2018

🔬Data Science: Forecast/Forecasting (Definitions)

"1. A projection or an estimate of future sales, revenue, earnings, or costs. 2. A projection of future financial position and operating results of an organization." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"The outcome of a series of exercises and analysis that helps a company, division, or product group to predict the number of units they might sell or produce, or the market share they could attain." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"An estimate or prediction of conditions and events in the project's future, based on information and knowledge available at the time of the forecast. The information is based on the project's past performance and expected future performance, and includes information that could impact the project in the future, such as estimate at completion and estimate to complete." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"Refers to the operation responding to a wish to 'see in advance' what will happen later in a given field. Forecasting methods typically rely on data from the past to make forward-looking extrapolations; they assume continuity with possible inflections based on expert opinion(s)." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

"Anticipating the future using quantitative techniques, such as mathematical and statistical rules and analysis of past data to predict the future, plus qualitative techniques, such as expert judgment and opinions to validate or adjust predictions." (Joan C Dessinger, "Fundamentals of Performance Improvement" 3rd Ed., 2012)

"A numerical prediction of a future value for a time series. Forecasting techniques are used to identify previously unseen trends and anticipate fluctuations to facilitate better planning." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"The practice of predicting or estimating a future event or trend, typically from historical data." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"A planning tool to help management to cope with the uncertainty of the future. It is based on certain assumptions based on management’s experience, knowledge and judgment and these estimates are projected into the future using techniques such as Box-Jenkins models, Delphi method, exponential smoothing, moving averages, regression analysis and trend projection. The technique of sensitivity analysis is also often used which assigns a range of values to uncertain variables in order to reduce potential errors." (Duncan Angwin & Stephen Cummings, "The Strategy Pathfinder" 3rd Ed., 2017)

"Estimates or predictions of conditions and events in the project's future based on information and knowledge available at the time of the forecast. Forecasts are updated and reissued based on work performance information provided as the project is executed." (Project Management Institute, "Practice Standard for Scheduling" 3rd Ed., 2019)

"Forecast usually refers to a projected value for a metric. Organizations will often create a forecast that is different than their target for a given metric. There are multiple types of forecasting methods for creating forecasts based on past data and usage of them varies widely across organizations." (Intrafocus)
