24 May 2020

🧮💫ERP Implementations: Migrating AdventureWorks to Dynamics 365 - Products

ERP Implementations
ERP Implementations Series

Below is exemplified the migration of Products from the AdventureWorks database to Dynamics 365 (D365), reduced to a minimum of steps. Variations (e.g. data enrichment, successive migrations) and other entities (Product Variants, Released Products, Released Product Variants, etc.) will be considered in future posts.

As the AdventureWorks database is available only for testing and exemplification purposes, there is no need for a data import layer, the data being prepared directly in the “Map” schema created for this purpose. In theory the same approach can be used in production systems as well, though usually it’s better to detach the migration layer from the source system(s) for performance or security reasons.

-- creating a schema into the AdventureWorks database 
CREATE SCHEMA [Map]

Step 1: Data Discovery

Within this step one attempts to get a thorough understanding of the systems involved in the data migration, in this case AdventureWorks and D365. As a basis for this one analyzes the tables behind each entity, the relations existing between them, the values and their distribution, as well as the relations existing between attributes. One needs to analyze the similarities as well as the differences between the involved data models, both at structural and at value level.

In AdventureWorks the SKU (Stock Keeping Unit) has Color, Size and Style as dimensions, a Product being created for each SKU. In D365 one differentiates between a Product and the Dimensions associated with it, thus having two levels defined. In addition, in D365 a Product also has the Configuration as dimension:


In addition, the Products and their Dimensions are defined at master level with a minimal set of attributes like the Dimension Group. After that the Products and their Dimensions can be released for each Business Unit (aka Data Area), where the detailed attributes for Purchasing, Sales, Production or Inventory are maintained. For those acquainted with Dynamics AX 2009 it’s the same structure.
Once the structural differences are identified, one can start looking at the values that define a Product in both systems. The following queries are run against the source system.


-- reviewing the sizes 
SELECT Size 
, count(*) NoSizes
FROM [Production].[vProductDetails]
WHERE CultureId = 'en'
GROUP BY Size
ORDER BY Size

-- reviewing the colors 
SELECT Color
, count(*) NoColors
FROM [Production].[vProductDetails]
WHERE CultureId = 'en'
GROUP BY Color
ORDER BY Color

-- reviewing the styles  
SELECT Style
, count(*) NoStyles
FROM [Production].[vProductDetails]
WHERE CultureId = 'en'
GROUP BY Style
ORDER BY Style

While the above queries show what values are used, the following queries show the dependencies between them:

-- reviewing the sizes, colors, styles  
SELECT Size
, Color
, Style
, count(*) NoValues
FROM [Production].[vProductDetails]
WHERE CultureId = 'en'
GROUP BY Size
, Color
, Style
ORDER BY 1,2,3

-- reviewing the dependencies between sizes, colors, styles  
SELECT CASE WHEN IsNull(Size, '') != '' THEN 'x' ELSE '' END HasSize
, CASE WHEN IsNull(Color, '') != '' THEN 'x' ELSE '' END HasColor
, CASE WHEN IsNull(Style, '') != '' THEN 'x' ELSE '' END HasStyle
, count(*) NoValues
FROM [Production].[vProductDetails]
WHERE CultureId = 'en'
GROUP BY CASE WHEN IsNull(Size, '') != '' THEN 'x' ELSE '' END 
, CASE WHEN IsNull(Color, '') != '' THEN 'x' ELSE '' END 
, CASE WHEN IsNull(Style, '') != '' THEN 'x' ELSE '' END
ORDER BY 1,2,3

The last query is probably the most important, as it shows how the products need to be configured into the target system:



As can be seen, a Product can have only a Color, Color and Style, Size and Color, all dimensions, respectively no dimensions at all. A Dimension Group will need to be defined for each of these cases (e.g. Col, ColSty, SizCol, SizColSty, None). (More information on this in a future post.)

Unfortunately, unless the target system is already in use, there are usually no values on the target side, though one can attempt entering a few representative values manually over the user interface, at least to see which tables get populated.

Step 2: Data Mapping

Once the main attributes from source and target have been identified, one can create the mapping between them at attribute level. Typically, one includes all the information relevant for a migration, from table, attribute and description to the attributes’ definition (e.g. type, length, precision, whether mandatory) in all the systems involved:


The mapping was kept to a minimum to display only the most relevant information. Except for a warning concerning the length of one attribute, respectively a new attribute (the old item number), the mapping doesn’t involve any challenges.

A data dictionary or even a metadata repository for the involved systems can help in the process, otherwise one needs to access the information from the available documentation or system’s metadata and prepare the data manually.

The relevant metadata for D365 can be obtained from the Microsoft documentation. The data can be loaded into the system via the EcoResProductV2Entity (see also data entities or the AX 2012 documentation for tables and enumeration data types).

Step 3: Building the Source Entity

AdventureWorks already provides a view which models the Products entity, though because of its structure it needs to undergo some changes or, often more advisable, the changes can be done in a separate view as follows:
-- Products source entity 
CREATE VIEW Map.vProductDetails 
AS 
SELECT CASE WHEN Size<>'' THEN dbo.CutLeft(ProductNumber, '-',1) ELSE ProductNumber End ItemIdOld 
, CASE WHEN Size<>'' THEN dbo.CutLeft(Name, '-',1) ELSE Name End Name 
, row_number() OVER(PARTITION BY CASE WHEN Size<>'' THEN dbo.CutLeft(ProductNumber, '-',1) ELSE ProductNumber End ORDER BY ProductNumber) Ranking
, ProductID -- assumes the underlying view exposes ProductID; needed further in the mapping
, ProductNumber 
, Description 
, Color
, Size
, Style
, CultureId 
, Subcategory 
, Category
, MakeFlag
, FinishedGoodsFlag
, SellStartDate 
, SellEndDate 
, StandardCost 
, ListPrice 
, SafetyStockLevel 
, ReorderPoint 
FROM [Production].[vProductDetails]

-- reviewing the data 
SELECT *
FROM Map.vProductDetails 
WHERE CultureId = 'en'
ORDER BY ProductNumber


To prepare the data for the migration, the Product Number as well as the Name were stripped of the Size with the help of the dbo.CutLeft function (defined in the post on functions useful in data migrations, below). The row_number ranking window function was used to allow selecting later the first Size for a given Product.
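For example (the Product Number below is only illustrative):

-- stripping the Size with dbo.CutLeft (reverse = 1 cuts at the last delimiter)
SELECT dbo.CutLeft('BK-M68B-38', '-', 1) ItemIdOld -- returns 'BK-M68B'
, dbo.CutLeft('BK-M68B-38', '-', 0) FirstToken     -- returns 'BK'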

The discovery process continues, this time in respect to the target. It’s useful to understand, for example, whether a Product has more than one Color or Style, whether the prices vary between Sizes, whether attributes like the Subcategory are consistent between Sizes, etc. It’s useful to prove anything that could have an impact on the migration logic. The list of tests will be extended while building the logic, as new information is discovered.

-- checking dimensions' definition
SELECT ItemidOld 
, count(Size) NoSizes
, count(DISTINCT Color) NoColors
, count(DISTINCT Style) NoStyles
FROM Map.vProductDetails 
WHERE CultureId = 'en'
  --AND Ranking = 1
GROUP BY ItemidOld
ORDER BY ItemidOld

-- checking the price variances between dimensions 
SELECT ItemidOld 
, Min(StandardCost) MinStandardCost
, Max(StandardCost) MaxStandardCost
FROM Map.vProductDetails 
WHERE CultureId = 'en'
  AND Ranking = 1
GROUP BY ItemidOld
HAVING Min(IsNull(StandardCost, 0)) != Max(IsNull(StandardCost, 0)) 
ORDER BY ItemidOld

-- checking attribute's consistency between dimensions 
SELECT ItemidOld 
, Min(Subcategory) MinSubcategory
, Max(Subcategory) MaxSubcategory
FROM Map.vProductDetails 
WHERE CultureId = 'en'
  AND Ranking = 1
GROUP BY ItemidOld
HAVING Min(IsNull(Subcategory, '')) != Max(IsNull(Subcategory, '')) 
ORDER BY ItemidOld

When the view starts performing poorly, for example because of the number of joins or the data volume, it might be useful to dump the data into a table and perform the tests on it, as shown in the sketch below.
Even if it may not be the case here, it’s useful to apply defensive techniques in the logic by handling nulls adequately.
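A minimal sketch for such a dump (the Map.ProductDetails table and the index are assumptions):

-- dumping the source entity into a table for further tests
SELECT *
INTO Map.ProductDetails
FROM Map.vProductDetails
WHERE CultureId = 'en'

-- supporting index for the tests
CREATE INDEX IX_ProductDetails_ItemIdOld
ON Map.ProductDetails (ItemIdOld)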

Step 4: Implementing the Mapping 

The attributes which need to be considered here are based on the target entities. It might also be necessary to include attributes that are further needed to build the logic.

-- Product Mapping 
CREATE VIEW [Map].vEcoResProductV2Entity
AS
SELECT ProductId 
, ItemidOld ItemId 
, 'Item' ProductType 
, 'ProductMaster' ProductSubtype 
, Left(Replace(Name, ' ', ''), 20) ProductSearchName 
, ItemidOld ProductNumber 
, Name ProductName 
, Description ProductDescription 
, CASE 
    WHEN IsNull(Size, '') != '' AND IsNull(Color, '') != '' AND IsNull(Style, '') != '' THEN 'SizColSty'
    WHEN IsNull(Size, '') != '' AND IsNull(Color, '') != '' THEN 'SizCol'
    WHEN IsNull(Style, '') != '' AND IsNull(Color, '') != '' THEN 'ColSty'
    WHEN IsNull(Color, '') != '' THEN 'Col'
    WHEN IsNull(Style, '') != '' THEN 'Sty'
    WHEN IsNull(Size, '') != '' THEN 'Siz'
    ELSE 'None'
  END ProductDimensionGroupName 
, 'WHS' StorageDimensionGroupName 
, CASE WHEN MakeFlag = 1 THEN 'SN' ELSE '' END TrackingDimensionGroupName 
, 'PredefinedVariants' VariantConfigurationTechnology 
, Subcategory ProductCategory 
, ItemidOld ItemIdOld 
, 1 IsNewItem 
FROM Map.vProductDetails 
WHERE CultureId = 'en'
  AND Ranking = 1

The Dimension Group is based on the above observations. It was assumed that all Products are stocked (see the Storage Dimension), while the manufactured Products will get a Serial Number (see the Tracking Dimension). IsNewItem will be used further to migrate deltas (and thus to partition migrations).
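For subsequent (delta) runs, the flag could be derived instead by checking which Products were already prepared – a minimal sketch, assuming the previously migrated Products are available in the Map.EcoResProductV2Entity table created in the next step:

-- deriving the IsNewItem flag for delta migrations (sketch)
SELECT ITM.ItemIdOld
, CASE 
    WHEN NOT EXISTS (SELECT 1
          FROM Map.EcoResProductV2Entity MIG
          WHERE MIG.ItemIdOld = ITM.ItemIdOld) THEN 1 
    ELSE 0 
  END IsNewItem
FROM Map.vProductDetails ITM
WHERE ITM.CultureId = 'en'
  AND ITM.Ranking = 1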

Step 5: Building the Target Entity

The target entity is in the end just a table in which usually only the attributes in scope are kept. In this case the definition is given by the following DDL:

CREATE TABLE Map.EcoResProductV2Entity(
 Id int IDENTITY(1,1) NOT NULL,
 ProductId int NULL,
 ProductType nvarchar(20) NULL,
 ProductSubtype nvarchar(20) NULL,
 ProductsearchName nvarchar(255) NULL,
 ProductNumber nvarchar(20) NOT NULL,
 ProductName nvarchar(60) NULL,
 ProductDescription nvarchar(1000) NULL,
 ProductDimensionGroupName nvarchar(50) NULL,
 StorageDimensionGroupName nvarchar(50) NULL,
 TrackingDimensionGroupName nvarchar(50) NULL,
 VariantConfigurationTechnology nvarchar(50) NULL,
 ProductCategory nvarchar(255) NULL,
 ItemIdOld nvarchar(20) NULL,
 IsNewItem bit,
 CONSTRAINT I_EcoResProductV2Entity PRIMARY KEY CLUSTERED 
(
 Id ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] 

The table was built to match the name and definition from the target system. The definition is followed by a few inserts based on the logic defined in the previous step:

-- preparing the data for EcoResProductV2Entity 
INSERT INTO [Map].EcoResProductV2Entity 
SELECT ITM.ProductId
, ITM.ProductType
, ITM.ProductSubtype
, ITM.ProductsearchName
, ITM.ProductNumber
, ITM.ProductName
, ITM.ProductDescription
, ITM.ProductDimensionGroupName 
, ITM.StorageDimensionGroupName 
, ITM.TrackingDimensionGroupName 
, ITM.VariantConfigurationTechnology
, ITM.ProductCategory
, ITM.ItemIdOld 
, ITM.IsNewItem
FROM [Map].VEcoResProductV2Entity ITM
WHERE ITM.IsnewItem = 1
ORDER BY ProductType 
, ITM.ItemIdOld 

-- reviewing the data 
SELECT *
FROM [Map].EcoResProductV2Entity 
ORDER BY ItemIdOld

The business might decide to take over the Product Number as unique identifier into the target system, though that’s not always the case. It might opt instead to create a new sequence number, which could start e.g. with 10000000 (8 characters). In such a case only the logic for the Product Number changes, the value being generated using a ranking window function:

-- preparing the data for EcoResProductV2Entity  
DECLARE @StartItemId as int = 10000000
--INSERT INTO [Map].EcoResProductV2Entity 
SELECT ...
, @StartItemId + Rank() OVER(ORDER BY ITM.ProductType, ITM.ItemIdOld) ProductNumber
, ...
FROM [Map].VEcoResProductV2Entity ITM
WHERE ITM.IsnewItem = 1
ORDER BY ProductType 
, ITM.ItemIdOld 

Step 6: Reviewing the Data

Before exporting the data it makes sense to review them from various perspectives: how many Products of a certain type will be created, whether the current and old Product Numbers are unique, etc. Such scripts make sure that the data are consistent in respect to the target system. 

-- checking values' frequency (overview, no implications)
SELECT ITM.ProductType
, ITM.ProductSubtype
, ITM.ProductDimensionGroupName 
, ITM.StorageDimensionGroupName 
, ITM.TrackingDimensionGroupName 
, ITM.VariantConfigurationTechnology
, count(*) NoRecords
FROM [Map].EcoResProductV2Entity ITM
WHERE IsNewItem = 1
GROUP BY ITM.ProductType
, ITM.ProductSubtype
, ITM.ProductDimensionGroupName 
, ITM.StorageDimensionGroupName 
, ITM.TrackingDimensionGroupName 
, ITM.VariantConfigurationTechnology

-- check ProductNumber's uniqueness (no duplicates allowed)
SELECT ProductNumber
, Min(ItemidOld) 
, max(ItemIdold)
, count(*)
FROM [Map].EcoResProductV2Entity
GROUP BY ProductNumber
HAVING count(*)>1

-- check old Product's uniqueness (no duplicates allowed)
SELECT ItemIdOld
, count(*)
FROM [Map].EcoResProductV2Entity
GROUP BY ItemIdOld
HAVING count(*)>1

This set of checks will grow during the implementation, as further entities are added.

Step 7: Exporting the Data

The export query usually reflects the entity and can include further formatting of the data, when needed:

-- Export Products
SELECT ITM.ProductType
, ITM.ProductSubtype
, ITM.ProductsearchName
, ITM.ProductNumber
, ITM.ProductName
, ITM.ProductDescription
, ITM.ProductDimensionGroupName 
, ITM.StorageDimensionGroupName 
, ITM.TrackingDimensionGroupName 
, ITM.VariantConfigurationTechnology
, ITM.ProductCategory RetailProductCategoryName 
, ITM.ItemIdOld
FROM [Map].EcoResProductV2Entity ITM
WHERE ITM.isNewItem = 1
ORDER BY ProductNumber

Depending on the import needs, the data can be exported to Excel or to a delimited text file (e.g. the “|” pipe is an ideal delimiter).

Step 8: Validating the Data before Import

Before importing the data into the target system, it makes sense to have the data checked by the business or consultants. A visual check at this stage can help save time later.

Step 9: Validating the Data after Import

Unfortunately, Microsoft doesn’t allow direct access to the D365 production database, however one can still access various tables’ and entities’ content via the table browser. In any case, the validation usually takes place in the UAT (User Acceptance Testing) system. So, if everything went well in UAT and all measures were taken to have the same parameters across all systems, there should be no surprises during Go-Live.
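If the migrated Products can be exported from the target (e.g. via the same data entity) and loaded back into the migration database, a quick check is to look for records which were prepared but not (correctly) imported – a sketch, where Map.EcoResProductV2Entity_Import is an assumed table holding the re-imported data:

-- Products prepared but missing in the target (sketch)
SELECT ITM.ProductNumber
, ITM.ItemIdOld
FROM Map.EcoResProductV2Entity ITM
     LEFT JOIN Map.EcoResProductV2Entity_Import IMP
       ON ITM.ProductNumber = IMP.ProductNumber
WHERE ITM.IsNewItem = 1
  AND IMP.ProductNumber IS NULL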


21 May 2020

💎SQL Reloaded: Functions Useful in Data Migrations

Left and Right-Padding

Oracle has the LPAD and RPAD functions, which return an expression left-padded, respectively right-padded, with the specified characters. The functions are useful for formatting unique identifiers to fit a certain format (e.g. 0000012345 instead of 12345). Here are similar implementations of the respective functions:

-- left padding function 
CREATE FUNCTION dbo.LeftPadding( 
  @str varchar(50) 
, @length int  
, @padchar varchar(1))
RETURNS varchar(50)
AS 
BEGIN 
  RETURN CASE  
    WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
    ELSE @str  
    END 
END  

-- example left padding 
SELECT dbo.LeftPadding('12345', 10, '0')

-- right padding function 
CREATE FUNCTION dbo.RightPadding( 
  @str varchar(50) 
, @length int  
, @padchar varchar(1))
RETURNS varchar(50)
AS 
BEGIN 
  RETURN CASE  
    WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
	ELSE @str  
  END
END  

-- example right padding 
SELECT dbo.RightPadding('12345', 10, '0')

Left and Right Side

When multiple pieces of data are stored within the same attribute, it’s helpful to get the left, respectively the right part of the string based on a given delimiter, where the reverse flag tells whether the first or the last occurrence of the delimiter is considered:

-- left part function 
CREATE FUNCTION dbo.CutLeft( 
  @str varchar(max) 
, @delimiter varchar(1)
, @reverse bit = 0)
RETURNS varchar(max)
AS 
BEGIN 
  RETURN CASE  
     WHEN CharIndex(@delimiter, @str)>0 THEN 
       CASE @reverse 
         WHEN 0 THEN Left(@str, CharIndex(@delimiter, @str)-1)    
         ELSE Left(@str, Len(@str)-CharIndex(@delimiter, Reverse(@str))) 
       END
	 ELSE @str  
  END
END  

-- example left part 
SELECT dbo.CutLeft('12345,045,000', ',', 0)
, dbo.CutLeft('12345,045,000', ',', 1)



-- right part function 
CREATE FUNCTION dbo.CutRight( 
  @str varchar(max) 
, @delimiter varchar(1)
, @reverse bit = 0)
RETURNS varchar(max)
AS 
BEGIN 
  RETURN CASE  
    WHEN CharIndex(@delimiter, @str)>0 THEN 
      CASE @reverse 
        WHEN 0 THEN Right(@str, CharIndex(@delimiter, Reverse(@str))-1) 
        ELSE Right(@str, Len(@str)-CharIndex(@delimiter, @str)) 
      END
   ELSE @str  
  END
END  

-- example right part 
SELECT dbo.CutRight('12345,045,000', ',', 0)
, dbo.CutRight('12345,045,000', ',', 1)

Replacing Special Characters

Special characters can prove to be undesirable in certain scenarios (e.g. matching values, searching). Until a “replace from list” kind of function is made available, the solution is to include the replacements in a user-defined function similar to the one below:

DROP FUNCTION IF EXISTS [dbo].[ReplaceSpecialChars]

CREATE FUNCTION [dbo].[ReplaceSpecialChars](
@string nvarchar(max)
, @replacer as nvarchar(1) = '-'
) RETURNS nvarchar(max)
-- replaces special characters with a given character (e.g. an empty string, space)
AS
BEGIN   
  IF CharIndex('*', @string) > 0  
     SET @string = replace(@string, '*', @replacer)	
    
  IF CharIndex('#', @string) > 0  
     SET @string = replace(@string, '#', @replacer)	
    
  IF CharIndex('$', @string) > 0  
     SET @string = replace(@string, '$', @replacer)	
    
  IF CharIndex('%', @string) > 0  
     SET @string = replace(@string, '%', @replacer)	
    
  IF CharIndex('&', @string) > 0  
     SET @string = replace(@string, '&', @replacer)	
    
  IF CharIndex(';', @string) > 0  
     SET @string = replace(@string, ';', @replacer)	
    
  IF CharIndex('/', @string) > 0  
     SET @string = replace(@string, '/', @replacer)	
    
  IF CharIndex('?', @string) > 0  
     SET @string = replace(@string, '?', @replacer)	
    
  IF CharIndex('\', @string) > 0  
     SET @string = replace(@string, '\', @replacer)	
    
  IF CharIndex('(', @string) > 0  
     SET @string = replace(@string, '(', @replacer)	
    
  IF CharIndex(')', @string) > 0  
     SET @string = replace(@string, ')', @replacer)	
    
  IF CharIndex('|', @string) > 0  
     SET @string = replace(@string, '|', @replacer)	
    
  IF CharIndex('{', @string) > 0  
     SET @string = replace(@string, '{', @replacer)	
    
  IF CharIndex('}', @string) > 0  
     SET @string = replace(@string, '}', @replacer)	
    
  IF CharIndex('[', @string) > 0  
     SET @string = replace(@string, '[', @replacer)	
    
  IF CharIndex(']', @string) > 0  
     SET @string = replace(@string, ']', @replacer)	
                                
  RETURN (LTrim(RTrim(@string)))
END

SELECT [dbo].[ReplaceSpecialChars]('1*2#3$4%5&6;7/8?9\10(11)12|13{14}15[16]', '')
SELECT [dbo].[ReplaceSpecialChars]('1*2#3$4%5&6;7/8?9\10(11)12|13{14}15[16]', ' ')

Another type of special characters are the umlauts (e.g. ä, ß, ö, ü from the German language):

DROP FUNCTION IF EXISTS [dbo].[ReplaceUmlauts]

CREATE FUNCTION [dbo].[ReplaceUmlauts](
@string nvarchar(max)
) RETURNS nvarchar(max)
-- replaces umlauts with their equivalent
AS
BEGIN   
  IF CharIndex('ä', @string) > 0  
     SET @string = replace(@string, 'ä', 'ae')	
    
  IF CharIndex('ö', @string) > 0  
     SET @string = replace(@string, 'ö', 'oe')	
    
  IF CharIndex('ß', @string) > 0  
     SET @string = replace(@string, 'ß', 'ss')	
    
  IF CharIndex('ü', @string) > 0  
     SET @string = replace(@string, 'ü', 'ue')	
    
                                
  RETURN Trim(@string)
END

SELECT [dbo].[ReplaceUmlauts]('Hr Schrötter trinkt ein heißes Getränk')

Handling Umlauts

Another type of special characters are the letters specific to certain languages that deviate from the basic Latin characters (e.g. umlauts in German, accents in French). The usage of such characters introduces further challenges, especially when converting the data between character sets. A common scenario is the one in which text encoded in UTF-8 is read as ISO-8859-1 (or Windows-1252), the umlauts appearing as two-character sequences. Probably the easiest way to handle such cases is to write a function as follows:

-- create the function 
CREATE FUNCTION [dbo].[ReplaceCodes2Umlauts](
  @string nvarchar(max)
) RETURNS nvarchar(max)
-- replaces mis-encoded sequences (UTF-8 read as ISO 8859-1) with the corresponding umlauts 
AS
BEGIN   
  IF CharIndex('ß', @string) > 0  
     SET @string = replace(@string, 'ß', 'ß')	

  IF CharIndex('ö', @string) > 0  
     SET @string = replace(@string, 'ö', 'ö')	

  IF CharIndex('Ö', @string) > 0  
     SET @string = replace(@string, 'Ö', 'Ö')	
    
  IF CharIndex('ü', @string) > 0  
     SET @string = replace(@string, 'ü', 'ü')	

  IF CharIndex('Ü', @string) > 0  
     SET @string = replace(@string, 'Ãœ', 'Ü')	

  IF CharIndex('ä', @string) > 0  
     SET @string = replace(@string, 'ä', 'ä')

  IF CharIndex('Ä', @string) > 0  
     SET @string = replace(@string, 'Ä', 'Ä')
                                
  RETURN (LTrim(RTrim(@string)))
END


--test the function 
SELECT [dbo].[ReplaceCodes2Umlauts]('Falsches Ãœben von Xylophonmusik quält jeden größeren Zwerg')

In the inverse scenario, at least for the German language, it is possible to replace the umlauts with the corresponding transliterations:

-- drop the function 
DROP FUNCTION IF EXISTS [dbo].[ReplaceUmlauts]

-- create the function 
CREATE FUNCTION [dbo].[ReplaceUmlauts](
  @string nvarchar(max)
) RETURNS nvarchar(max)
-- replaces umlauts with corresponding transliterations 
AS
BEGIN   
  IF CharIndex('ß', @string) > 0  
     SET @string = replace(@string, 'ß', 'ss')	

  IF CharIndex('ü', @string) > 0  
     SET @string = replace(@string, 'ü', 'ue')	

  IF CharIndex('ä', @string) > 0  
     SET @string = replace(@string, 'ä', 'ae')	
    
  IF CharIndex('ö', @string) > 0  
     SET @string = replace(@string, 'ö', 'oe')	
                                
  RETURN (LTrim(RTrim(@string)))
END

--test the function 
SELECT [dbo].[ReplaceUmlauts]('Falsches üben von Xylophonmusik quält jeden größeren Zwerg')

Happy coding!

💎SQL Reloaded: Splitting a String (Before and After)

Starting with SQL Server 2016 Microsoft introduced the STRING_SPLIT table-valued function, which splits a string into rows of substrings, based on a specified separator character (e.g. “:”).


-- splitting a string (SQL Server 2016+)
SELECT *
FROM STRING_SPLIT('100001::M:black:', ':') 

Table-valued functions allowed implementing the same functionality also in earlier versions, either by looping through the string or by using a common table expression. Here’s the implementation based on a loop (it was modified from a previous version to include an index):


-- split table-valued function with an index
CREATE FUNCTION [dbo].[SplitListWithIndex]( 
  @ListValues varchar(500) 
, @Delimiter char(1)) 
RETURNS @Temp TABLE(
  Ind int
, Value varchar(50)) 
AS 
BEGIN 
 DECLARE @Index int 
 DECLARE @Length int 
 DECLARE @Ind int 

 SET @Index = CharIndex(@Delimiter, @ListValues) 
 SET @Length = Len(@ListValues) - @Index 
 SET @Ind = 1
   
 WHILE @Index > 0 -- while a delimiter is still found
 BEGIN 
  INSERT @Temp 
  VALUES(@Ind, Substring(@ListValues, 0, @Index)) 

  SET @ListValues = Substring(@ListValues, @Index+1, @Length) 
  SET @Index = CharIndex(@Delimiter, @ListValues) 
  SET @Length = @Length - @Index 
  SET @Ind = @Ind + 1
 END  
   
 INSERT @Temp 
 VALUES(@Ind, @ListValues) 
RETURN 
END  
GO

The function can be called in code the same way as STRING_SPLIT:

-- splitting a string (SQL Server 2000+)
SELECT *
FROM dbo.SplitListWithIndex('100001::M:black:', ':') 

The two functions are helpful when a column stores delimited values. It’s the case of Dynamics 365, which stores the SKU (Stock Keeping Unit) – the Product Number together with its Dimensions (ItemId, Configid, InventSizeId, InventColorId, StyleId) – delimited by “:”, like in the above examples. Therefore, to parse the records one could write code like this:


-- parsing delimited values (SQL Server 2000+)
SELECT DAT.ProductNumber
, Max(CASE WHEN LV.Ind = 1 THEN LV.Value END) ItemId
, Max(CASE WHEN LV.Ind = 2 THEN LV.Value END) Configid
, Max(CASE WHEN LV.Ind = 3 THEN LV.Value END) InventSizeId
, Max(CASE WHEN LV.Ind = 4 THEN LV.Value END) InventColorId
, Max(CASE WHEN LV.Ind = 5 THEN LV.Value END) StyleId
FROM ( VALUES ('100001::S:black:')
 , ('100001::M:black:')
 , ('100001::L:black:')
 , ('100001::XL:black:')
 , ('100001::S:white:')
 , ('100001::M:white:')
 , ('100001::L:white:')
 , ('100001::XL:white:')
) DAT (ProductNumber)
CROSS APPLY dbo.SplitListWithIndex(DAT.ProductNumber, ':') LV
GROUP BY DAT.ProductNumber
ORDER BY DAT.ProductNumber

Similar output can be obtained via STRING_SPLIT with the help of the row_number() ranking window function, introduced with SQL Server 2005:

-- parsing delimited values (SQL Server 2016+)
SELECT LV.ProductNumber
, Max(CASE WHEN LV.Ind = 1 THEN LV.Value END) ItemId
, Max(CASE WHEN LV.Ind = 2 THEN LV.Value END) Configid
, Max(CASE WHEN LV.Ind = 3 THEN LV.Value END) InventSizeId
, Max(CASE WHEN LV.Ind = 4 THEN LV.Value END) InventColorId
, Max(CASE WHEN LV.Ind = 5 THEN LV.Value END) StyleId
FROM (
 SELECT DAT.ProductNumber
 , XT.VALUE
 , ROW_NUMBER() OVER (PARTITION BY DAT.ProductNumber ORDER BY DAT.ProductNumber) Ind
 FROM ( VALUES ('100001::S:black:')
  , ('100001::M:black:')
  , ('100001::L:black:')
  , ('100001::XL:black:')
  , ('100001::S:white:')
  , ('100001::M:white:')
  , ('100001::L:white:')
  , ('100001::XL:white:')
 ) DAT (ProductNumber)
 CROSS APPLY STRING_SPLIT(DAT.ProductNumber, ':') XT
) LV
GROUP BY LV.ProductNumber
ORDER BY LV.ProductNumber

As can be seen, the introduction of an index into the dbo.SplitListWithIndex function simplifies the code, making the use of a ranking window function unnecessary. It would be useful for STRING_SPLIT to provide the same, as this type of processing is pretty common.  
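As a side note, in newer versions (SQL Server 2022, respectively Azure SQL Database) STRING_SPLIT accepts an optional enable_ordinal argument, which returns the position of each substring directly:

-- splitting a string with ordinal (SQL Server 2022+, Azure SQL Database)
SELECT value
, ordinal
FROM STRING_SPLIT('100001::M:black:', ':', 1)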
Here’s another example, based on the PIVOT clause, also introduced in SQL Server 2005:


-- parsing delimited values (SQL Server 2017+, because of Trim)
SELECT P.ProductNumber
, Cast(Trim([1]) as nvarchar(20)) ItemId
, Cast(Trim([2]) as nvarchar(20)) ConfigId
, Cast(Trim([3]) as nvarchar(20)) InventSizeid
, Cast(Trim([4]) as nvarchar(20)) InventColorId
, Cast(Trim([5]) as nvarchar(20)) StyleId
FROM (
 SELECT DAT.ProductNumber
 , XT.VALUE
 , ROW_NUMBER() OVER (PARTITION BY DAT.ProductNumber ORDER BY DAT.ProductNumber) Ranking
 FROM ( VALUES ('100001::S:black:')
  , ('100001::M:black:')
  , ('100001::L:black:')
  , ('100001::XL:black:')
  , ('100001::S:white:')
  , ('100001::M:white:')
  , ('100001::L:white:')
  , ('100001::XL:white:')
 ) DAT (ProductNumber)
 CROSS APPLY STRING_SPLIT(DAT.ProductNumber, ':') XT
) DAT
PIVOT (MAX(DAT.[VALUE]) FOR DAT.Ranking IN ([1],[2],[3],[4],[5])) P

Happy coding!

📦Data Migrations (DM): In-house Built Solutions (Part III: The Data Preparation Layer)

Data Migration
Data Migrations Series

Once the source data (including the data needed for enrichment) were made available in the migration database, one can model the source entities by encapsulating the logic in views or table-valued functions (TVFs). This enables maintainability (by providing better visibility over the transformations), reuse (various validations are necessary), performance (by taking advantage of the RDBMS’ native functionality), and flexibility in changing the code. The objects thus created can be used in a new set of similar objects supposed to contain the mapping logic between the source and target data models. One thus attempts to prepare the data as needed by the target system. 

As each target entity is modelled, it’s useful to dump the resulting data into tables, which will be used as source for the subsequent logic instead of using the views or TVFs directly. This allows keeping a copy of the data, performing a range of validations, and, most importantly, can provide better performance as indexes can be built on the tables. In addition, one can further manipulate the data in the tables as required, e.g. by including information which becomes available only later (e.g. attributes from the target system), or for testing or correcting the data without affecting the built logic. For the same reasons such tables can be used in intermediate steps of the migration, including when modelling the source data entities. However, one should avoid their excessive use, as this can complicate the architecture unnecessarily. 

The last step in the preparation layer is to prepare queries which select only the attributes needed by the target system and include additional formatting. It’s in general recommended to detach the formatting from the other transformations, as this approach provides better flexibility in addressing the migration requirements. The data can afterwards be exported manually or via an automated job, the latter approach being recommended especially when the data need to be partitioned. 

At this stage, after validating the data, they can be imported into the target system via the available mechanisms. In theory this is where the boundary of the migration logic lies, however this layer can be extended for data validation purposes. It would be helpful, for example, to assure that the data imported into the target entirely reflect the prepared data. It can happen that during import data are truncated, incorrectly imported (wrong attribute, values changed or incorrectly mapped), or even whole records are not imported, with impact on data consistency. 

After importing the data into the target system one can bring the migrated data via ETL packages back into the migration database and build queries which match the data at attribute level. This step may seem redundant, though it’s a way to assure that the migration occurred according to the expectations and thus minimize later surprises. Even if not done for all entities, this type of import might be the easiest solution for feeding data back into the logic (e.g. when identity values need to be mapped into the logic after migrating an entity).
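A minimal sketch for such a match at attribute level, reusing the Products example from the post above (Map.EcoResProductV2Entity_Import is an assumed table holding the re-imported data):

-- matching prepared vs. imported data at attribute level (sketch)
SELECT ITM.ProductNumber
, ITM.ProductName
, IMP.ProductName ProductNameImported
FROM Map.EcoResProductV2Entity ITM
     FULL OUTER JOIN Map.EcoResProductV2Entity_Import IMP
       ON ITM.ProductNumber = IMP.ProductNumber
WHERE IsNull(ITM.ProductName, '') != IsNull(IMP.ProductName, '')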

It can also be helpful to import the tables for dropdown values (typically parameters), to assure that the values used for parameterizing the system were built into the migration logic as expected. It may sound surprising, but not all systems perform such checks during imports, or the checks were disabled for other reasons. 

In data migrations it is recommended to assure the data’s internal consistency by design. Even if various validations for uniqueness, completeness, consistency, conformity, timeliness, referential integrity or even accuracy might seem redundant or involve extra work, in the long run they pay off, as they allow trapping issues early in the process.
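For example, completeness and conformity checks can be built with a few lines of code against the prepared data (a sketch based on the Products entity from the post above):

-- completeness: mandatory attributes not provided (sketch)
SELECT ProductNumber
FROM Map.EcoResProductV2Entity
WHERE IsNull(ProductName, '') = ''
   OR IsNull(ProductDimensionGroupName, '') = ''

-- conformity: values exceeding the length allowed by the target (sketch)
SELECT ProductNumber
, Len(ProductName) NameLength
FROM Map.vEcoResProductV2Entity
WHERE Len(ProductName) > 60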

📦Data Migrations (DM): In-house Built Solutions (Part II: The Import Layer)

Data Migration
Data Migrations Series

A data migration involves the mapping between two data models at (data) entity level, where an entity is a data abstraction modelling a business entity (e.g. Products, Vendors, Customers, Sales Orders, Purchase Orders, etc.). Thus, the Products business entity from the source will be migrated to a similar entity in the target. Ideally, the work would be simplified if the two models provided direct access to the data through entities. Unfortunately, this is seldom the case, the entities being normalized and thus broken into several tables, with important structural differences. 

Therefore, the first step within a DM is identifying the business entities that make up its scope in source and target, and providing a mapping between their attributes, which will define how the data flow between source and target. 

In theory, the source entity could be defined directly in the source with the help of views, if they are not already available. The problem with this approach is that the base data change, which can easily lead to inconsistencies between the various steps of the migration. For example, records are added, deleted or inactivated, or certain attributes are changed, which can easily make troubleshooting and validation a nightmare. 

The easiest way to address this is by assuring that the data change only when actually needed. One thus needs to create a snapshot of the data and work with it. Database snapshots can however become costly for the performance of the source database, as they involve an additional maintenance overhead. Another solution is to make the snapshot via a backup or by copying the data via ETL functionality into another database (aka the migration database). Considering that the data in scope make up only a small subset, a backup is usually costly in terms of storage space and time, and it’s not always possible to take a backup when needed. 

An ETL-based solution provides acceptable performance and is flexible enough to address all important types of requirements. The data can be accessed directly from the source (pull mechanism) or, when direct access is an issue, they can be pushed to the migration database (push mechanism) or made available for load at a given location and then imported into the migration database (hybrid mechanism). There’s also the possibility to integrate the migration database when a publisher/subscriber mechanism is in place, however such solutions raise other types of issues. 

One can import the tables 1:1 from the source for the entities in scope, attempt to model the entity directly within the ETL jobs, or find a solution in between (e.g. import the base tables while also joining the dropdown tables). The latter seems to provide the best approach because it minimizes the number of tables to be imported while still reflecting the data structures from the source. An entity-based import addresses the first but not the second aspect, though depending on the requirements it can work as well. 

In Data Warehousing (DW) it’s common practice to load the data into staging tables with no constraints on them, and only when the load is complete to move the data into the base tables which will be used as source for the further processing (see the sketch below). This approach assures that the data are loaded completely and that the unavailability of the base tables is limited. In contrast to DW solutions, ideally no transformations should be performed on the data, as they should reflect the quality characteristics of the source. It’s advisable to keep the data extraction, respectively the ETL jobs, as simple as possible and resist building the migration logic into this layer already. 
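A minimal sketch of such a load via a pull mechanism (the linked server SRC, the Stage schema and the pre-created base table dbo.Product are assumptions):

-- loading a source table 1:1 into a staging table via a linked server (sketch)
SELECT *
INTO Stage.Product
FROM SRC.AdventureWorks.Production.Product

-- moving the data into the base table once the load is complete
INSERT INTO dbo.Product
SELECT *
FROM Stage.Product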

💼Project Management: Project Planning (Part III: Planning Correctly Misunderstood III)

Mismanagement

One of the most misunderstood topics in Project Management seems to be the one of planning, probably because everyone has a good idea of what it means to plan an activity – we do it daily and most of the time (hopefully) we hit a bull’s-eye (or we have the impression we did). You must do this and that, you have that dependency, you must coordinate with a few people, you must first reach that milestone before going further, you do one step at a time, and so on. It’s pretty easy, isn’t it?

From a bird’s-eye view, project planning is like planning any other activity, though there are several important differences. The most important one is of scale – the number of activities and resources involved, the level of coordination and communication and the quality with which they occur, the level of uncertainty and control, respectively manageability. All these create a complexity that is hardly manageable by just one person. 

Another difference lies in the detail needed for the planning and the targets’ reachability. Some believe that the plan needs to be done down to the lowest level of detail, which even if possible can prove to be an impediment to planning. A project’s environment shares some important characteristics with a battlefield in terms of the complexity of interactions, their dynamics and logistical requirements. Within an army’s structure there are levels of organization that require different mindsets and levels of planning. A general thinks primarily at the strategic level, at which troops and actions are seen as aggregations at the level of abstraction that makes their organization and planning manageable. The strategy is however devised in collaboration with other generals and upper structures, and, having defined the strategic goals, the general must devise the tactics together with the immediate subalterns. In theory the project manager must regard the project from the same perspective. This results in three levels of planning – strategic, done with the upper management, tactical, done with the team members, respectively logistical, done within the team. That’s a way of breaking down the complexity and dividing the responsibilities within the project. 

A project’s final destination often has the character of a wish list more or less anchored in reality. From a technical point of view the target can be achievable, though in big projects the most important challenges are of an organizational nature – being able to allocate and coordinate effectively the resources as needed by the project. The wish-like character is also reflected in the cost, scope and time triangle in respect to the expected quality – at some point one is forced to choose between two of them. On the other side, there’s the tendency to see the targets and milestones as fixed, with little room for deviation. One can easily forget that a strategic plan’s purpose is to set the objectives and identify the challenges and the possible lines of action, while a tactical plan’s objective is to devise the means to reach the objectives. Bringing everything together can easily obscure the view and, in extremis, the plan loses its actuality as soon as it was created (and approved). 

The most confusing aspect is probably the adherence of a plan to a given methodology – dicing a project, and thus a plan, to fit a methodology by blindly following the rules and principles imposed by it, instead of fitting the methodology to the project. Besides the fact that methodologies are best practices but not necessarily good practices for a given organization, they tend to be either too general, specifying the what and not the how, or too restrictive in the way they are interpreted. 

