
07 August 2024

🧭Business Intelligence: Perspectives (Part II: From Data to Data Models)

Business Intelligence Series

A data model can be defined as an abstract, self-contained, logical definition of the data structures available in a database or similar repositories. It’s typically an abstraction of the data structures underpinning a set of processes, procedures and business logic used for a predefined purpose. A data model can also be formed of unrelated micromodels, thus depicting various aspects of a business.

The association between data and data models is bidirectional. Given a set of data, a data model can be built to underpin the respective data. Conversely, one can create or generate data based on a data model. In business setups, however, a bidirectional relationship between the data and the data model(s) underpinning them is the more realistic picture as the business evolves. In extremis, the data model can be used to reflect a business’ needs, at least when the respective needs are addressed by extending the data model(s) accordingly.

Given a set of data (e.g. the data stored in one or more spreadsheets or other types of files), multiple data models can in theory be defined to reflect the respective data. Within a data model, the fields (aka attributes) are partitioned into a set of data entities, where a data entity is a non-unique grouping of attributes that together attempt to describe one unitary aspect of the world. Customers, Vendors, Products, Invoices or Sales Orders are examples of such data entities, though entities can have a broader granularity (e.g. Customers can be modeled over several tables like Entity, Addresses, Contact information, etc.).
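
As a rough illustration, a Customer entity spread over several tables could be surfaced as a single database object; the view, table and column names below are hypothetical and only sketch the idea:

-- Customers (hypothetical entity built over several tables)
CREATE OR ALTER VIEW dbo.vCustomerEntity
AS
SELECT CST.CustomerId
, CST.CustomerName
, ADR.Street
, ADR.City
, ADR.Country
, CNT.Email
, CNT.Phone
FROM dbo.Customers CST
     LEFT JOIN dbo.CustomerAddresses ADR
       ON CST.CustomerId = ADR.CustomerId
      AND ADR.IsPrimary = 1
     LEFT JOIN dbo.CustomerContactInformation CNT
       ON CST.CustomerId = CNT.CustomerId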

From an operational database’s perspective, a data entity is based on one or more tables, though several entities can share some of the tables. From a BI artifact’s perspective, an entity should be easy to create from the underlying tables, with a minimal set of transformations. Ideally, the BI data model should be as close as possible to the entity needed for reporting, however an optimal solution usually lies somewhere in between. Therein resides the complexity of modeling BI solutions – providing an optimal data model that can be easily built on the source tables and that allows addressing all, or at least most, of the BI requirements.

In other words, we deal with two optimization problems of two distinct data models. On one side the business data model must be flexible enough to provide fast read/write operations while keeping the referential data’s granularity efficient. Conversely, a BI data model needs to abstract these entities and provide a fast way of processing the data, while making data reads extremely efficient. These perspectives must apply when we move to Microsoft Fabric too. 

The operational data layer must provide this abstraction, and in this resides the complexity of building optimal BI solutions. This is the layer at which the modeling problems need to be tackled. The challenge of BI and Analytics resides in finding an optimal data model that allows us to address most or ideally all the BI requirements. Several overlapping layers of abstraction may be built in the process.

Looking at the data modeling techniques used in notebooks and other similar solutions, data modeling risks becoming a redundant practice prone to errors. Moreover, data models tend to be multilayered and based on certain perspectives into the processes they model. Providing reliable, flexible models involves finding the right view into the data for modeling the respective aspects of the business. Database views allow us to model such perspectives easily, often in a unique way. Moving away from them just shifts the burden onto the multiple solutions built around the base data, which can create other important challenges.

Previous Post <<||>> Next Post

10 April 2024

🧭Business Intelligence: Perspectives (Part I: Ways of Thinking about Data)

Business Intelligence Series

One can sometimes observe the tendency of data professionals to move from a business problem directly to data and data modeling without trying to understand the processes behind the data. One could say the behavior is driven by the eagerness to explore the data, though even later questions about the processes themselves are seldom considered. One can argue that maybe the processes are self-explanatory, though that’s seldom the case.

Conversely, looking at the datasets available on the web, usually there’s a fact table and the associated dimensions, the data describing only one process. It’s natural to presume that there are data professionals who don’t think much about, or better said in terms of, processes. A similar big jump can be observed in blog posts on dashboards and/or reports, with bloggers moving from the data directly to the data model.

In the world of complex systems like Enterprise Resource Planning (ERP) systems, thinking in terms of processes is mandatory because a fact table can hold the data for different processes, while processes can span multiple fact-like tables and thus have multiple levels of detail. Moreover, processes are broken down into sub-processes and procedures that have a counterpart in the data as well.

Moreover, within a process there can be multiple perspectives that are usually module or role dependent. A perspective is a role’s orientation to the world to which the data belongs, and it’s slightly different from what the data professional considers a view: the perspective is a projection over a set of processes within the data, while a view is a projection of the perspectives into the data structure.

For example, considering the order-to-cash process there are several sub-processes like order fulfillment, invoicing, and payment collection, though there can be several other processes involved like credit management or production and manufacturing. Creating, respectively updating, or canceling an order can be examples of procedures. 

The sales representative, the shop worker and the accountant will have different perspectives projected into the data, focusing on the projection of the data on the modules they work with. Thinking in terms of modules is probably the easiest way to identify the boundaries of the perspectives, though the rules are occasionally more complex than this.

When defining and/or attempting to understand a problem it’s important to understand which perspective needs to be considered. For example, the sales volume can be projected based on Sales orders, on invoiced Sales orders, respectively on the General ledger postings, and the three views can result in different numbers. Moreover, there are partitions within these perspectives based on business rules that determine what to include in or exclude from the logic.
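
For illustration, the three perspectives could translate into different aggregations; the table and column names below are only indicative of D365-like structures, and in practice further joins and filters (e.g. on statuses, respectively on the revenue Main accounts) would be needed per environment:

-- sales volume based on Sales order lines (illustrative)
SELECT Month(SLS.ShippingDateRequested) [Month]
, SUM(SLS.LineAmount) OrderedAmount
FROM dbo.SalesLine SLS
GROUP BY Month(SLS.ShippingDateRequested)

-- sales volume based on invoiced Sales orders (illustrative)
SELECT Month(CIT.InvoiceDate) [Month]
, SUM(CIT.LineAmount) InvoicedAmount
FROM dbo.CustInvoiceTrans CIT
GROUP BY Month(CIT.InvoiceDate)

-- sales volume based on General ledger postings (illustrative; needs a filter on the revenue accounts)
SELECT Month(GJE.AccountingDate) [Month]
, SUM(GJA.AccountingCurrencyAmount) PostedAmount
FROM dbo.GeneralJournalEntry GJE
     JOIN dbo.GeneralJournalAccountEntry GJA
       ON GJE.RecId = GJA.GeneralJournalEntry
GROUP BY Month(GJE.AccountingDate)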

One can define a business rule as a set of conditional logic that constrains some part of the data in the data structures by specifying what is allowed or not, though usually we refer to a special type called a selection business rule that determines what data are selected (e.g. open Purchase orders, Products with Inventory, etc.). However, when building the data model we need to consider the business rules as well, though we might also need to check whether they are enforced.
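
A selection business rule typically ends up as a WHERE clause; the sketch below uses D365-like names, and the status value is an assumption that would need to be validated per environment:

-- open Purchase orders as a selection business rule
SELECT POH.PurchId
, POH.OrderAccount
, POH.PurchStatus
FROM dbo.PurchTable POH
WHERE POH.PurchStatus = 1 -- assumption: 1 = open order (backorder) in the target environment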

Moreover, it’s useful to also think in terms of (data) entities and sub-entities, in which the data entity is an abstraction from the physical implementation of the database tables. A data entity encapsulates (hides the internal details of) a business concept and/or perspective into an abstraction (simplified representation) that makes development, integration, and data processing easier. In certain systems like Dynamics 365 it’s important to think at this level because data entities can simplify data modeling considerably.

Previous Post <<||>> Next Post

04 March 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part V: Domains and the Data Mesh I -The Challenge of Structure Matching)

Business Intelligence Series

The holy grail of building a Data Analytics infrastructure seems to be nowadays the creation of a data mesh, a decentralized data architecture that organizes data by specific business domains. This endeavor proves to be difficult to achieve given the various challenges faced  – data integration, data ownership, data product creation and ownership, enablement of data citizens, respectively enforcing security and governance in a federated manner. 

Microsoft Fabric promises to facilitate the creation of data meshes with the help of domains and subdomains by providing built-in security, administration, and governance features associated with them. A domain is a way of logically grouping together all the data in an organization that is relevant to a particular area or field. A subdomain is a way of fine-tuning the logical grouping of the data.

Business domains & their entities

At a high level, the challenge of building a data mesh is how to match or aggregate structures. On one side is the high-level structure of the data mesh, while on the other side is the structure of the business data entities. The data entities can be grouped within a taxonomy with multiple levels that expands to the departments. That’s why it seems somehow natural to consider the departments as the top-most domains of the data mesh. The issue is that if the segmentation starts from a high level, it becomes inflexible in modeling. Moreover, one has only domains and subdomains, and thus a 2-level structure to model the main aspects of the data mesh.

Some organizations allow unrestricted access to the data belonging to a given department, while others break down the access to a more granular level. There are also organizations that don’t restrict the access at all, though this may change later. Besides permissions and a way of grouping the entities together, what value does setting the domains as departments bring?

Therefore, I’m not convinced about using an organization’s departmental structure as domains, especially when such a structure may change, and this would imply a full range of further changes. Moreover, such a structure doesn’t reflect the span of processes or how permissions are assigned for the various roles, which are better reflected in how the information systems are structured. Most probably the solution needs to accommodate both perspectives and sit somewhere in the middle.

Take for example the internal structure of the modules from Dynamics 365 (D365). The Finance area is broken down into Accounts Payable, Accounts Receivable, Fixed Assets, General Ledger, etc. In some organizations the departments reflect this delimitation to some degree, while in others they are just associated with finance-related roles. Moreover, the permissions are more granular, reflecting the data entities the users work with.

Conversely, SCM extends into Finance as Purchase orders, Sales orders and other business documents are the starting or intermediary points of processes that span modules. Similarly, there are processes that start in CRM or other systems. The span of processes seems more appropriate for structuring the data mesh, though the overlapping of systems with the roles involved in the processes and the free definition of process boundaries can overcomplicate the whole design.

It makes sense to define the domains at a level that resembles the structure of the modules available in D365, with the macro data entities representing the subdomains. The subdomains would then cover master as well as transactional data entities from the perspective of the domains, though there will be entities that need to be shared between multiple domains. Such a structure has fewer chances to change over time, allowing more flexibility and smaller areas of focus, and is thus easier to design, develop, test, deploy and maintain.

Previous Post <<||>> Next Post

14 February 2024

🧭Business Intelligence: A One-Man Show (Part VI: The Lakehouse Perspective)

Business Intelligence Suite

This post continues the ideas on Christopher Laubenthal's article "Why one person can't do everything in the data space" [1] and why his analogy between a college's functional structure and the core data roles is poorly chosen. In the last post I mentioned as a first argument that the two constructions have different foundations.

Secondly, it's a matter of construction, namely the steps used to arrive from one state to another. Indeed, there's somebody who builds the data warehouse (DWH), somebody who builds the ETL/ELT pipelines for moving the data from the sources to the DWH, somebody who builds the semantic data model that includes the business-related logic, respectively people who tap into the data for reporting, data visualizations, data science projects, and whatever is still needed in the organization. On top of this, there should be somebody who manages the DWH. I haven't associated any role with these steps because one of the core roles can be responsible for more than one step.

In the case of a lakehouse, it is the data engineer who moves the data from the various data sources to the data lake, if that doesn't happen already by design or configuration. As per my understanding, the data engineers are the ones who design and build the new lakehouse, respectively move, transform and manage the data as required. The Data Analysts, Data Scientists and maybe some Information Designers can then tap into the data. However, the DWH and the lakehouse(s) are technologies that facilitate their work. They can still do their work if the same data are made available by other means.

In what concerns the dorm analogy, the verbs were chosen to match the way data warehouses (DWH) or lakehouses are built, though the congruence of the steps is questionable. One could have compared the number of students with the number of data entities, but not with the data themselves. Usually, students move by themselves and occupy the places. The storytellers, the assistants and the researchers are independent of whether the students are hosted in the dorm or not. Therefore, the analogy seems a bit forced.

Frankly, I covered all the steps except the ones related to Data Science by myself for both described scenarios. It helped that I knew the data from the data sources and the transformation rules I had to apply, respectively the techniques needed for moving and transforming the data, and the volume of data entities was somewhat manageable. Conversely, 1-2 more resources in the area of data analysis and visualizations could have helped bring more value to the business.

This opens the challenge of scale, and it has to do with systems engineering and how the number of components and the interactions between them increase a system's complexity and the demand for managing the respective components. In the simplest linear models, for each multiplier of a certain number of components of the same type, the number of resources managing the respective layer matches the multiplier to some degree. E.g. if a data engineer can handle x data entities in a unit of time, then for handling n*x components at least n data engineers are likely required. However, the output of the n components is only a fraction of n*x, given the dependencies existing between components and other constraints.

An optimization problem thus comes down to finding out which data roles to choose to cover an organization's needs. A one-man show can be the best solution for small organizations, though unless there's a good division of labor, bringing in a second person will make the throughput slower before it becomes faster.

Previous Post <<|||>> Next Post

Resources:
[1] Christopher Laubenthal (2024) "Why One Person Can’t Do Everything In Data" (link)

10 October 2023

🪄💫SSRS (& Paginated Reports): Products Master Report in Dynamics 365 for Finance & Operations via Base Tables

As mentioned in the previous post, building the Products Master paginated report via the base D365 FO tables synchronized in the Data Lake involves more effort and implies duplicating the logic from the data entity(-ies) used. One can attempt recreating the entities 1:1 as they are defined in the D365 database (an Azure SQL database), however the chances are high that in certain areas (e.g. financial dimensions) the features used in building the logic are not supported. Moreover, it might be the case that only a subset of the functionality was implemented, and with this, pieces of the logic become obsolete.

There's also a limitation on the number of tables which can be synchronized, and therefore one needs to compromise. In the dbo.EcoResProductV2Entity there are about 20 tables involved, though probably only half of them are used. The base table dbo.EcoResProduct contains most of the fields available also in the data entity, though the names for the various attributes need to be retrieved from the various tables. 

Some of the issues met in working with data entities (see the previous post) appear in this scenario as well: (2) there are fields based on Enums that store only the code, and the corresponding values need to be mapped, (4) further transformations are needed (e.g., converting fields to certain formats). It makes sense to similarly encapsulate the logic in views, though this time one or two additional layers of logic are needed. One can choose to do this for each table in scope, or replicate the data entity's logic by focusing only on the needed parts. Both approaches will prove challenging in certain scenarios. Let's consider the latter:

CREATE OR ALTER VIEW TDM.vEcoResProductV2Entity
AS 
/*
name:  Products - Master (base view)
created: 01.04.2021
modified: 01.04.2021
*/
SELECT ITM.ProductType
, CASE ITM.ProductType
    WHEN 1 THEN 'Item'
	WHEN 2 THEN 'Service'
  END ProductTypeName
, CAST ((CASE WHEN ITM.InstanceRelationType = 13678 THEN 1 ELSE 2 END) AS INT) ProductSubtype
, CASE 
    WHEN ITM.InstanceRelationType = 13678 THEN 'Product'
	ELSE 'Product Master'
   END ProductSubtypeName
, ITM.DisplayProductNumber ProductNumber
, Replace(Replace(ILT.Name, char(10), ' '), char(13), ' ')  ProductName
, Replace(Replace(ILT.Description, char(10), ' '), char(13), ' ') ProductDescription
, PCA.CategoryName RetailProductCategoryName 
, PCA.CategoryCode RetailProductCategoryCode
, PDG.Name ProductDimensionGroupName 
, SDG.Name StorageDimensionGroupName 
, TDG.Name TrackingDimensionGroupName 
, ITM.RetailColorGroupId ProductColorGroupId 
, ITM.RetailSizeGroupId ProductSizeGroupId 
, ITM.RetailStyleGroupId ProductStyleGroupId
, ITM.VariantConfigurationTechnology
, CASE ITM.VariantConfigurationTechnology
	WHEN 0 THEN 'None'
	WHEN 1 THEN 'Predefined Variants'
    WHEN 2 THEN 'Dimension Based'
	WHEN 3 THEN 'RuleBased'
	WHEN 4 THEN 'Constraint Based'
  END VariantConfigurationTechnologyName
, CASE WHEN KIT.ProductMaster IS NOT NULL THEN 1 ELSE 0 END IsProductKit
, CASE WHEN KIT.ProductMaster IS NOT NULL  THEN 'Yes' ELSE 'No' END IsProductKitName
, ITM.PDSCWProduct IsCatchWeightproduct
, CASE ITM.PDSCWProduct WHEN 1 THEN 'Yes' ELSE 'No' END IsCatchWeightproductName
, ITM.IsProductVariantUnitConversionEnabled
, CASE ITM.IsProductVariantUnitConversionEnabled WHEN 1 THEN 'Yes' ELSE 'No' END IsProductVariantUnitConversionEnabledName
-- system
, PPD.ProductDimensionGroup ProductDimensionGroupRecId
, PSG.StorageDimensionGroup StorageDimensionGroupRecId
, PTD.TrackingDimensionGroup TrackingDimensionGroupRecId
, PCA.RetailCategoryRecId
, ITM.RecId
, ITM.Partition
FROM dbo.EcoResProduct ITM
     LEFT JOIN dbo.EcoResProductTranslation ILT
	   ON ITM.RecId = ILT.Product 
	  AND ITM.Partition = ILT.Partition 
	  AND ILT.LanguageId = 'en-us'
     LEFT JOIN dbo.EcoResProductDimensionGroupProduct PPD
	   ON ITM.RecId = PPD.Product
	  AND ITM.Partition = PPD.Partition
	      LEFT JOIN dbo.EcoResProductDimensionGroup PDG
		    ON PPD.ProductDimensionGroup = PDG.RecId 
		   AND PPD.Partition = PDG.Partition 
	 LEFT JOIN dbo.EcoResStorageDimensionGroupProduct PSG
	   ON ITM.RecId = PSG.Product
	  AND ITM.Partition = PSG.Partition
	      LEFT JOIN dbo.EcoResStorageDimensionGroup SDG
		    ON PSG.StorageDimensionGroup = SDG.RecId 
		   AND PSG.Partition = SDG.Partition 
	 LEFT JOIN dbo.EcoResTrackingDimensionGroupProduct PTD
	   ON ITM.RecId = PTD.Product
	  AND ITM.Partition = PTD.Partition
	      LEFT JOIN dbo.EcoResTrackingDimensionGroup TDG
		    ON PTD.TrackingDimensionGroup = TDG.RecId 
		   AND PTD.Partition = TDG.Partition 
	 LEFT JOIN (-- product retail category 
	   SELECT PCA.Product 
	   , PCA.Partition 
	   , CAT.Code CategoryCode 
	   , CAT.Name CategoryName
	   , PCA.Category RetailCategoryRecId
	   FROM dbo.EcoResProductCategory PCA
	        JOIN dbo.EcoResProductRetailCategoryHierarchy RCH
			  ON PCA.CategoryHierarchy = RCH.RetailCategoryHierarchy
			 AND PCA.Product = RCH.Product 
			 AND PCA.Partition = RCH.Partition 
			JOIN dbo.EcoResCategory CAT
			  ON PCA.Category = CAT.RecId 
			 AND PCA.Partition = CAT.Partition 
		 ) PCA
	   ON ITM.RecId = PCA.Product 
	  AND ITM.Partition = PCA.Partition 
	  LEFT JOIN dbo.RetailKit KIT
	    ON ITM.RecId = KIT.ProductMaster 
	   AND ITM.Partition = KIT.Partition
WHERE NOT(ITM.InstanceRelationType  =  4211)

As can be seen, the logic is quite complex, and only half of the tables were used. There will be entities even more complex than this (the query was restructured for understandability). On the other hand, there will be environments where only half of the above tables will be used (e.g., when no inventory and/or dimensions are needed).

To test the view, one just needs to change the schema referenced in the logic. The view is created under the TDM (Table Data Model) schema, so there should be no issues as long as the schemas are used correctly. However, when duplicating logic, one should check whether the values match, respectively whether the objects have the same record count:

-- checking whether the values match for an example
SELECT * 
FROM TDM.vEcoResProductV2Entity
WHERE ProductNumber = '0169'

SELECT * 
FROM EDM.vEcoResProductV2Entity
WHERE ProductNumber = '0169'


-- checking the number of records
SELECT count(*)
FROM TDM.vEcoResProductV2Entity

SELECT count(*)
FROM EDM.vEcoResProductV2Entity
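
For a more thorough check one can also compare the values of the common columns between the two views; a minimal sketch, assuming the compared columns exist in both objects:

-- checking the values for the common columns (records in TDM missing from EDM)
SELECT ProductNumber
, ProductName
, ProductTypeName
, ProductSubtypeName
FROM TDM.vEcoResProductV2Entity
EXCEPT
SELECT ProductNumber
, ProductName
, ProductTypeName
, ProductSubtypeName
FROM EDM.vEcoResProductV2Entity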

There are several optimization or restructuring opportunities in the logic, e.g., the 'product retail category' subquery can be created as a separate view (see the sketch below). Also, the groupings of two tables for the 'dimension groups' could be encapsulated individually in views. This can increase the number of views created considerably, though in some cases the views could prove useful for troubleshooting or reuse.
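
Such an encapsulation could look as follows, the view name being only a suggestion:

-- product retail category (encapsulation of the above subquery)
CREATE OR ALTER VIEW TDM.vEcoResProductRetailCategory
AS
SELECT PCA.Product 
, PCA.Partition 
, CAT.Code CategoryCode 
, CAT.Name CategoryName
, PCA.Category RetailCategoryRecId
FROM dbo.EcoResProductCategory PCA
     JOIN dbo.EcoResProductRetailCategoryHierarchy RCH
       ON PCA.CategoryHierarchy = RCH.RetailCategoryHierarchy
      AND PCA.Product = RCH.Product 
      AND PCA.Partition = RCH.Partition 
     JOIN dbo.EcoResCategory CAT
       ON PCA.Category = CAT.RecId 
      AND PCA.Partition = CAT.Partition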

Moreover, in environments with only one partition, the constraints on the respective field could be removed (an important decision though), while the value for the retail category could be hardcoded. Even if hardcoding values should in general be avoided, it's needed to compromise when there are more important constraints.

One way to reduce the logic's complexity is to create a "base view" for each table in which the needed transformations are made, respectively only the needed columns are used. Reducing the number of columns simplifies the identification of the attributes. For example, the base view for dbo.EcoResProduct could be written as follows:

CREATE OR ALTER VIEW [TDM].[vEcoResProduct]
AS
/*
name: Products - master data (base view)
created: 01.04.2021
modified: 01.04.2021
*/
SELECT PRD.DisplayProductNumber ProductNumber
, PRD.ProductType 
, CASE PRD.ProductType
    WHEN 1 THEN 'Item'
    WHEN 2 THEN 'Service'
  END ProductTypeName
, CASE WHEN PRD.INSTANCERELATIONTYPE = 15969 THEN 1 ELSE 2 END ProductSubtype 
, CASE 
    WHEN PRD.INSTANCERELATIONTYPE = 15969 THEN 'Product'
    ELSE 'Product master'  
   END ProductSubtypeName
, PRD.RetailSizeGroupId
, PRD.RetailColorGroupId
, PRD.RetailStyleGroupId
, PRD.RetailFlavorGroupId
, PRD.VariantConfigurationTechnology 
, CASE PRD.VariantConfigurationTechnology
    WHEN 0 THEN 'None'
    WHEN 1 THEN 'PredefinedVariants'
    WHEN 2 THEN 'DimensionBased'
    WHEN 3 THEN 'RuleBased'
    WHEN 4 THEN 'ConstraintBased'
  END VariantConfigurationTechnologyName
, PRD.IsProductVariantUnitConversionEnabled
, CASE PRD.IsProductVariantUnitConversionEnabled WHEN 1 THEN 'Yes' ELSE 'No' END IsProductVariantUnitConversionEnabledName
-- system
, PRD.RecId 
, PRD.Partition 
, PRD.ModifiedBy
FROM dbo.EcoResProduct PRD
WHERE NOT(PRD.InstanceRelationType  =  4211)
GO

A similar approach can be used for each table, or at least for the tables that need further transformations. There are several choices - ideally one should find a good-enough approach and stick to it. There will also be exceptions, though the general design should prevail!

The bottom line: when using the Export to Data Lake add-in (*), as only the export of tables is supported, the logic can occasionally become complex, though it's still manageable (in other areas the tables are straightforward to use). When using Link to Data Lake, one can rely on data entities, as long as they are synchronized correctly. However, one still needs to rely on tables to fill the gaps.

Note:
(*) Existing customers have until 1-Nov-2024 to transition from Export to Data lake to Synapse link. Microsoft advises new customers to use Synapse Link.

Happy coding!

Previous Post << ||>> Next Post

09 October 2023

🪄💫SSRS (& Paginated Reports): Products Master Report in Dynamics 365 for Finance & Operations via Data Entities

With the Export to Data Lake add-in (*), respectively with Azure Synapse Link for Dataverse with Azure Data Lake, it's possible now to create Power BI paginated reports based directly on the Dynamics 365 Finance & Operations (D365 FO) data model as available in the Data Lake. With the add-in it's possible to export the tables from D365 FO, while with the second option it's possible to export data entities and hopefully soon base tables as well. 

The two features can be used in parallel in the same environment and even if they are based on different mechanisms, they synchronize D365 FO data to the Data Lake in standard data files that can be further consumed via a serverless/dedicated SQL server pool by reporting tools. Thus, one can build a data lakehouse/warehouse on top of the data for reporting. At least this was the approach before Microsoft Fabric. 

So, in the end a developer will have the option to choose between base tables and data entities, as synchronized to the Data Lake. Data entities encapsulate the logic as used by the D365 FO application and can thus allow developing a report with a minimum of overhead, at least in theory. Unfortunately, there are other downsides that make the use of base tables a better approach, at least for certain scenarios (e.g. data entities are not available for synchronization or don't include all the needed fields).

For the developer of reports the synchronization mechanism is less important. A set of database objects will be available, and reports can be built on top of them. Ideally, the base objects will have the same or similar names as in D365 FO, otherwise the differences between the two data models need to be translated in each piece of logic, which adds more complexity and overhead to the development.

Thus, to build a report showing the Product master data the developer can use the dbo.EcoResProductV2Entity data entity. Unfortunately, there are several important issues: (1) the name of the source entity can change as versions change, (2) there are fields based on Enums that store only the code, and the corresponding values need to be mapped, (3) some fields from the base table need to be made available, (4) further transformations are needed (e.g., converting fields to certain formats). Thus, it's useful to encapsulate the logic into a view, when creating views is possible:

CREATE OR ALTER VIEW EDM.vEcoResProductV2Entity
AS 
/*
name:  Products - Master (base entity)
created: 01.04.2021
modified: 01.04.2021
*/
SELECT ITM.ProductType
, CASE ITM.ProductType
    WHEN 1 THEN 'Item'
    WHEN 2 THEN 'Service'
  END ProductTypeName
, ITM.ProductSubtype
, CASE ITM.ProductSubtype
    WHEN 1 THEN 'Product'
    WHEN 2 THEN 'Product Master'
    WHEN 3 THEN 'Product Variant'
   END ProductSubtypeName
, ITM.ProductNumber
, Replace(Replace(ITM.ProductName, char(10), ' '), char(13), ' ')  ProductName
, Replace(Replace(ITM.ProductDescription, char(10), ' '), char(13), ' ') ProductDescription
, ITM.RetailProductCategoryName 
, ITM.ProductDimensionGroupName 
, ITM.StorageDimensionGroupName 
, ITM.TrackingDimensionGroupName 
, ITM.ProductColorGroupId 
, ITM.ProductSizeGroupId 
, ITM.ProductStyleGroupId
, ITM.VariantConfigurationTechnology
, CASE ITM.VariantConfigurationTechnology
    WHEN 0 THEN 'None'
    WHEN 1 THEN 'Predefined Variants'
    WHEN 2 THEN 'Dimension Based'
    WHEN 3 THEN 'Rule Based'
    WHEN 4 THEN 'Constraint Based'
  END VariantConfigurationTechnologyName
, ITM.IsProductKit
, CASE ITM.IsProductKit WHEN 1 THEN 'Yes' ELSE 'No' END IsProductKitName
, ITM.IsCatchWeightproduct
, CASE ITM.IsCatchWeightproduct WHEN 1 THEN 'Yes' ELSE 'No' END IsCatchWeightproductName
, ITM.IsProductVariantUnitConversionEnabled
, CASE ITM.IsProductVariantUnitConversionEnabled WHEN 1 THEN 'Yes' ELSE 'No' END IsProductVariantUnitConversionEnabledName
-- system
, ITM.ProductDimensionGroupRecId
, ITM.StorageDimensionGroupRecId
, ITM.TrackingDimensionGroupRecId
, ITM.RetailCategoryRecId
, ITM.RecId
, ITM.Partition
FROM dbo.EcoResProductV2Entity ITM

As can be seen, the view was created in the EDM (Entity Data Model) schema and has the "v" prefix added to the original name. The EDM schema was created to store the objects based on data coming via data entities.
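
If the schemas don't exist yet, they need to be created upfront; a minimal sketch for the two schemas used across these posts:

-- creating the schemas used for entity-based, respectively table-based objects
CREATE SCHEMA EDM; -- Entity Data Model
GO
CREATE SCHEMA TDM; -- Table Data Model
GO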

With this, the view's data can be consumed in the paginated report:

--Products - Master
SELECT ITM.ProductTypeName
, ITM.ProductSubtypeName
, ITM.ProductNumber
, ITM.ProductName
, ITM.RetailProductCategoryName 
, ITM.ProductDimensionGroupName 
, ITM.StorageDimensionGroupName 
, ITM.TrackingDimensionGroupName 
, ITM.ProductColorGroupId 
, ITM.ProductSizeGroupId 
, ITM.ProductStyleGroupId
, ITM.VariantConfigurationTechnologyName
, ITM.IsProductKitName
, ITM.IsCatchWeightproductName
FROM EDM.vEcoResProductV2Entity ITM
WHERE ITM.ProductName LIKE 'Mens%'
ORDER BY ITM.ProductNumber

One can use this query directly to generate the report, though in a second step, once the report was created, one might prefer to provide the query as an expression to the corresponding dataset, as this allows better handling of the parameters and thus creating the query dynamically as fit:

= "--Products - Master" & vbCrLf 
& "SELECT ITM.ProductTypeName" & vbCrLf 
& ", ITM.ProductSubtypeName" & vbCrLf 
& ", ITM.ProductNumber" & vbCrLf 
& ", ITM.ProductName" & vbCrLf 
& ", ITM.RetailProductCategoryName " & vbCrLf 
& ", ITM.ProductDimensionGroupName " & vbCrLf 
& ", ITM.StorageDimensionGroupName " & vbCrLf 
& ", ITM.TrackingDimensionGroupName " & vbCrLf 
& ", ITM.ProductColorGroupId " & vbCrLf 
& ", ITM.ProductSizeGroupId " & vbCrLf 
& ", ITM.ProductStyleGroupId" & vbCrLf 
& ", ITM.VariantConfigurationTechnologyName" & vbCrLf 
& ", ITM.IsProductKitName" & vbCrLf 
& ", ITM.IsCatchWeightproductName" & vbCrLf 
& ", ITM.IsProductVariantUnitConversionEnabledName" & vbCrLf 
& "FROM EDM.vEcoResProductV2Entity ITM" & vbCrLf 
& "WHERE 0=0 " & vbCrLf 
& IIf(Parameters!ProductName.Value<> "", " AND ITM.ProductName LIKE @ProductName ", "") & vbcrlf 
& IIf(Parameters!ProductNumber.Value<> "", " AND ITM.ProductNumber LIKE @ProductNumber", "") & vbcrlf 
& IIf(Parameters!ProductType.Value<> "", " 	AND ITM.ProductType = @ProductType", "") & vbcrlf 
& IIf(Parameters!ProductSubtype.Value<> "", " AND ITM.ProductSubtype = @ProductSubtype ", "") & vbcrlf 
& "ORDER BY ITM.ProductNumber" & vbCrLf

The expression can be created using an Excel formula (observe that after vbCrLf the empty space needs to be kept, otherwise the report will fail):
="& """ & A1 & """ & vbCrLf 

The four parameters created for exemplification need to be defined accordingly in the dataset as well.

For Product Number and Name the user can use wildcards - that's what the "*" beside the parameter's Prompt signifies.

Unfortunately, the logic needs to be extended accordingly as soon as further attributes not available in the data entities are needed. In this scenario and several others, using the base tables directly allows more flexibility, even if the logic behind the data entity needs to be duplicated.

Notes:
(1) When building and testing the report use only a small subset of the data. 
(2) For the dropdown parameters an "(all)" label with an empty value was added. It allows the expression to ignore the parameter when selected.
(3) In text values with a high probability of coming from Excel sheets (e.g. Product Name or Description) it might be a good idea to replace the char(10) line feed, char(13) carriage return, and even char(9) horizontal tab values, otherwise the data might not be exported correctly to Excel.
(4) If the definitions for Enums change, the mappings need to be changed accordingly. 
(*) Existing customers have until 1-Nov-2024 to transition from Export to Data lake to Synapse link. Microsoft advises new customers to use Synapse Link.

Happy coding!

Previous Post <<||>> Next Post

13 January 2021

💎SQL Reloaded: Useful Queries for D365 FO (and not only)

Microsoft Dynamics 365 for Finance and Operations (D365 FO or simply D365) currently has more than 12000 tables, therefore attempting to understand the data model behind it might seem at first sight to be a foolhardy effort. Fortunately, only a small percentage of all the tables contain meaningful master or transactional data, a fact that makes the task easier.

Moreover, the tables have meaningful prefixes (e.g. 'Invent' for Inventory, 'Purch' for Purchase Orders, 'Sales' for Sales Orders, etc.) and postfixes (e.g. 'Table' for header data, 'Line' for position data, 'Trans' for transaction data), while their names are in many cases self-explanatory if one knows the corresponding business terms.
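
A quick way to get a feeling for these naming conventions is to list the tables by prefix directly from the system catalog (the prefix below is just an example):

-- retrieving the tables with a given prefix
SELECT SCHEMA_NAME(tbl.schema_id) + '.' + tbl.name TableName
FROM sys.tables tbl
WHERE tbl.name LIKE 'Purch%'
ORDER BY tbl.name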

For those coming from earlier versions of D365 (e.g. Microsoft Dynamics AX 2009), with a few exceptions in the inventory transactions, product master data, addresses and financial dimensions, the data structures remained almost the same, if one considers the main functional areas of the ERP system. However, as new modules and/or functionality were added, the number of new tables increased considerably between the versions.

On the other side, in D365 the data are exposed through data entities, abstractions from the physical implementation of the database tables [1], implemented as views typically having the 'Entity' postfix. In many scenarios the data entities will more likely prove to be the best way of accessing the data, as they include an important part of the logic one would need to reengineer anyway, however there are maybe also exceptions in which it's needed to work with the base tables. In the end one will need to know the main data entities as well as the important tables related to them, respectively to check which views reference a given table and which tables are referenced by a view. This can be easily achieved via the sys.sql_expression_dependencies system catalog view.

The following queries can be run in non-production D365 environments:

-- retrieving the views where an object was used 
SELECT SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ obj.name AS ReferencingObject
, obj.type_desc
, SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ sed.referenced_entity_name AS ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.OBJECT_ID = sed.REFERENCING_ID 
	      JOIN sys.all_objects obr
		    ON sed.referenced_id = obr.object_id
WHERE sed.referenced_entity_name = 'PurchTable'
  AND obj.Type = 'V'
  AND obj.name LIKE 'Purch%'
ORDER BY ReferencingObject

Output (12 records):
ReferencingObject                          type_desc  ReferencedObject  ReferencedObjectType
dbo.PURCHASEORDERCUBE                      VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHASEORDERRESPONSEHEADERSUMMARY     VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHLINEBACKORDER                     VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHLINEBACKORDERWORKSPACE            VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHLINEEXPANDED                      VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHORDERAPPROVEDVIEW                 VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHPREPAYOPENBASE                    VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHPURCHASEORDERHEADERCHARGEENTITY   VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHPURCHASEORDERHEADERENTITY         VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY       VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHPURCHASEORDERLINECDSENTITY        VIEW       dbo.PURCHTABLE    USER_TABLE
dbo.PURCHTABLENOTARCHIVEDVERSIONS          VIEW       dbo.PURCHTABLE    USER_TABLE

The inverse search: 

-- retrieving the objects referenced in a view
SELECT SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ obj.name AS ReferencingObject
, obj.type_desc ReferencingObjectType
, SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ sed.referenced_entity_name AS ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.OBJECT_ID = sed.referencing_id 
	      JOIN sys.all_objects obr
		    ON sed.referenced_id = obr.object_id
WHERE obj.name = 'PURCHPURCHASEORDERHEADERV2ENTITY'
ORDER BY ReferencedObject

Output:
ReferencingObject                      ReferencingObjectType  ReferencedObject                       ReferencedObjectType
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.ACCOUNTINGDISTRIBUTIONTEMPLATE     USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.CONFIRMINGPO                       USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.DIMENSIONSETENTITY                 VIEW
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.GetValidFromInContextInfo          SQL_SCALAR_FUNCTION
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.GetValidToInContextInfo            SQL_SCALAR_FUNCTION
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.HCMWORKER                          USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.LOGISTICSPOSTALADDRESS             USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.LOGISTICSPOSTALADDRESSBASEENTITY   VIEW
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.PURCHTABLE                         USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.PURCHTABLE_W                       USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.REASONTABLEREF                     USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.TMSPURCHTABLE                      USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.TRANSPORTATIONDOCUMENT             USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY   VIEW                   dbo.VENDINVOICEDECLARATION_IS          USER_TABLE

Notes:
1. There will be cases in which it's needed to check the views' definitions. To understand the logic it will be needed to format the queries in a more readable form. This can take time, given that the joins and the further constraints are not always straightforward. Therefore, it might be a good idea to export a formatted copy of the view for later use.
2. Unfortunately, the data model for D365 is not publicly available. One can however use the data model for AX 2012 (see [2], [3]) as a basis and fill the gaps when needed.
3. The important changes between the various data models versions are partially documented in several documents available via the Microsoft Dynamics CustomerSource. 
4. For non-production environments the data from views, tables or other database objects can be exported directly from the database. Unfortunately, Microsoft blocked the direct access to production environments. The data from tables can be accessed and exported via the table browser, while the data from entities can be exported to Excel typically from the display forms or via the Data Management Framework.
5. The queries can be used to explore any SQL Server database. The logic can be easily encapsulated in a view to simplify the calls, as sketched below.
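
A minimal sketch of such an encapsulation, with a hypothetical view name, the filters being applied afterwards by the caller:

-- encapsulating the object dependencies
CREATE OR ALTER VIEW dbo.vObjectDependencies
AS
SELECT SCHEMA_NAME(obj.schema_id) + '.' + obj.name ReferencingObject
, obj.type_desc ReferencingObjectType
, SCHEMA_NAME(obr.schema_id) + '.' + obr.name ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.object_id = sed.referencing_id 
     JOIN sys.all_objects obr
       ON sed.referenced_id = obr.object_id
GO

-- retrieving the views referencing a given table
SELECT *
FROM dbo.vObjectDependencies
WHERE ReferencedObject = 'dbo.PurchTable'
  AND ReferencingObjectType = 'VIEW'
ORDER BY ReferencingObject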

References:
[1] Microsoft Dynamics 365 (2020) Data entities overview [source]
[2] Microsoft Docs (2014) Application tables [source]
[3] Microsoft Docs (2014) Base Enums AX 2012 [source]

10 May 2019

🧊💫Data Warehousing: Architecture (Part II: Data Warehousing and Microsoft Dynamics 365)

Data Warehousing

With Dynamics 365 (D365) Online Microsoft made an important strategic move on the ERP market, however in what concerns the BI & Data Warehousing (BI/DW) area Microsoft changed the rules of the game by allowing no direct SQL access to the production environment. This primarily means that it will become challenging for organizations to use the existing DW infrastructure to access the D365 data, and for Vendors and Service Providers to provide BI/DW solutions integrated within the D365 platform.

D365 includes its own data warehouse (actually a data mart) designed for financial reporting, however as of now it can’t be extended to support other business areas. The solution favored by Microsoft for DW seems to be the use of an Azure SQL Database aka BYOD (Bring Your Own Database), to which entity-based data can be exported incrementally (aka incremental push) or fully (aka full push) via the Data Management Framework (DMF) packages.

Because many of the D365 tables (e.g. Inventory Transactions, Products, Customers, Vendors) were overnormalized over the years and other tables were added as part of new functionality, to hide this complexity Microsoft introduced a new layer of abstraction formed from data entities organized within an entity store. Data entities are view-like encapsulations of the underlying D365 table schema, the data import/export from and to D365 being performed extensively over these data entities via the DMF, which extends the Data Import/Export Framework (DIXF).

One can thus use a BYOD as a direct source for other reporting tools as long as they support a connection to Azure, otherwise the data can be further loaded into a database in the cloud, which seems to be the best option so far, as long as the organization has other data that need to be consolidated for reporting. From here on, one deals with the traditional way of reporting, and the available infrastructure can be extended to use an additional data source.

The BYOD solution comes with several restrictions: a package needs to be created for each business unit, no composite data entities can be exported, data entities that don’t have a unique key can’t be exported via an incremental push, data entities can change over time (new versions becoming available), while during synchronization no active locks should be on the database. In addition, organizations which followed this path report also some bugs that needed to be addressed via Microsoft support. Even if the roughly 1700 available data entities facilitate data consumption to some degree, they seem to be more appropriate for data migrations and data integrations than for DW workloads.

In the absence of direct SQL connectivity, in theory organizations can still use SSIS or similar integration tools to connect to D365 production environments and consume the data entities via the Open Data Protocol (OData), a standard that defines a set of best practices for building and consuming RESTful APIs. Besides some architectural challenges, loading big tables with transactional data is reportedly slow and impracticable for loading a data warehouse. Therefore, the usability of such an architecture becomes limited in time.

Microsoft imposed a hard limitation upon its D365 architecture by making its production database inaccessible. Of course, there’s still time for Microsoft to do some magic and pull new solutions from the technology stack hat. Unfortunately, the constraints imposed to the production environments limit organizations’ choices of building a modern and flexible data warehouse. For the future it would be great if the DMF could be used directly with standard SQL Server databases, avoiding thus the need for the intermediary Azure database, or if a real-time operational solution could be provided out-of-the-box. We’ll see what the future brings...
