SQL Troubles: dependecy

Showing posts with label dependecy. Show all posts

13 January 2021

💎💫SQL Reloaded: Useful Queries for Dynamics 365 F&O and not only (Part I: Dependencies)

Microsoft Dynamics 365 for Finance and Operations (D365 FO or simply D365) has currently more than 12000 tables, therefore attempting to understand the data model behind might seem at first sight to be a foolhardy effort. Fortunately, only a small percentage from all the tables contain meaningful master or transactional data, fact that makes easier the task.

Moreover, the tables have meaningful prefixes (e.g. 'Invent' for Inventory, 'Purch' for Purchae Orders, 'Sales' for Sales Orders, etc.) and postfixes (e.g. 'Table for header data, 'Line' for position data, 'Trans' for transaction data) while their names are in many cases self-explanatory if one knows the corresponding business terms.

For those coming from earlier versions of D365 (e.g. Microsoft Dynamics AX 2009), with a few exceptions in the inventory transactions, product master data, addresses and finance dimensions, the data structures remained almost the same, if one considers the main functional area of the ERP system. However as new modules and/or functionality were added, the number of new tables increased considerably between the versions.

On the other side in D365 the data are exposed through data entities, abstractions from the physical implementation of the database tables [1], implemented as views typically having the 'Entity' postfix. In many scenarios the data entities more likely will prove to be the best way of accessing the data, as they include an important part of the logic one will need anyway to reengineer, however there are maybe also exceptions in which is needed to work with the base tables. In the end it will be needed to know the main data entities as well the important tables related to them, being needed to check which views reference a given table, respectively which tables are references by a view. This can be easily achieved via the sys.sql_expression_dependencies system catalog view.

The following queries can be run in non-production D365 environments:

-- retrieving the views where an object was used 
SELECT SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ obj.name AS ReferencingObject
, obj.type_desc
, SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ sed.referenced_entity_name AS ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.OBJECT_ID = sed.REFERENCING_ID 
	      JOIN sys.all_objects obr
		    ON sed.referenced_id = obr.object_id
WHERE sed.referenced_entity_name = 'PurchTable'
  AND obj.Type = 'V'
  AND obj.name LIKE 'Purch%'
ORDER BY ReferencingObject

Output (12 records):

ReferencingObject	type_desc	ReferencedObject	ReferencedObjectType
dbo.PURCHASEORDERCUBE	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHASEORDERRESPONSEHEADERSUMMARY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHLINEBACKORDER	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHLINEBACKORDERWORKSPACE	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHLINEEXPANDED	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHORDERAPPROVEDVIEW	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPREPAYOPENBASE	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERCHARGEENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERLINECDSENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHTABLENOTARCHIVEDVERSIONS	VIEW	dbo.PURCHTABLE	USER_TABLE

The inverse search:

-- retrieving the objects referenced in a view
SELECT SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ obj.name AS ReferencingObject
, obj.type_desc ReferncingObjectType
, SCHEMA_NAME(obj.SCHEMA_ID) + '.'+ sed.referenced_entity_name AS ReferencedObject
, obr.type_desc ReferencedObjectType
FROM sys.all_objects obj
     JOIN sys.sql_expression_dependencies sed 
       ON obj.OBJECT_ID = sed.referencing_id 
	      JOIN sys.all_objects obr
		    ON sed.referenced_id = obr.object_id
WHERE obj.name = 'PURCHPURCHASEORDERHEADERV2ENTITY'
ORDER BY ReferencedObject

Output:

ReferencingObject	ReferncingObjectType	ReferencedObject	ReferencedObjectType
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.ACCOUNTINGDISTRIBUTIONTEMPLATE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.CONFIRMINGPO	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.DIMENSIONSETENTITY	VIEW
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.GetValidFromInContextInfo	SQL_SCALAR_FUNCTION
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.GetValidToInContextInfo	SQL_SCALAR_FUNCTION
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.HCMWORKER	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.LOGISTICSPOSTALADDRESS	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.LOGISTICSPOSTALADDRESSBASEENTITY	VIEW
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.PURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.PURCHTABLE_W	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.REASONTABLEREF	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.TMSPURCHTABLE	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.TRANSPORTATIONDOCUMENT	USER_TABLE
dbo.PURCHPURCHASEORDERHEADERV2ENTITY	VIEW	dbo.VENDINVOICEDECLARATION_IS	USER_TABLE

Notes:

1. There will be cases in which is needed to check views' definition. To understand the logic it will be needed to format the queries in a more readable form. This can take time, given that the joins and the further constraints are not always straightforward. Therefore, it might be a good idea to export a formatted copy of the view for later use.

2. Unfortunately, the data model for D365 is not publicly available. One can use however the data model for AX 2012 (see [2] [3]) as basis and fill the gaps when needed.

3. The important changes between the various data models versions are partially documented in several documents available via the Microsoft Dynamics CustomerSource.

4. For non-production environments the data from views, tables or other database objects can be exported directly from the database. Unfortunately, Microsoft blocked the direct access to production environments. The data from tables can be accessed and exported via the table browser, while the data from entities can be exported to Excel typically from the display forms or via the Data Management Framework.

5. The queries can be used to explore any SQL Server database. The logic can be easily encapsulated in a view to simplify the calls.

Previous Post <<||>> Next Post

References:
[1] Microsoft Dynamics 365 (2020) Data entities overview [source]

[2] Microsoft Docs (2014) Application tables [source]

[3] Microsoft Docs (2014) Base Enums AX 2012 [source]

06 July 2020

🪄SSRS (& Paginated Reports): Design (Part II: Parameter Dependencies in Dropdowns)

Introduction

Quite often there are dependencies between the values of two distinct dropdowns, typically the values representing the different levels of a direct on implied hierarchy. This means that the choice at the upper level determines the values which will be shown at a lower level, the corresponding query needing to handle the value as a parameter. There are several ways on how one can handle this, the choice depending also on the data provider considered for the report.

Because the AdventureWorks database is used within an SQL Server, the available choices are MSOLEDBSQL, SQLOLEDB, ODBC or SQL Server Native Client. It should be noted that the SQLOLEDB and the Native Clients providers have been deprecated and shouldn't be used for new developments! In the project was used the Microsoft OLE DB Provider for SQL Server (MSOLEDBSQL) provider.

For exemplification will be used the reports build previously. The hierarchy will be built based on the State and Territory, a Country belonging to a Territory, while a State to a Country. For the beginning we will build the dependency existing between the Country and State.

First Level of Dependency

The text parameter for StateCountry dataset will be defined similarly to the one consider for the Country, and have as source the following query:

-- State Provinces
SELECT '' Id
, '(All)' Name
UNION ALL
SELECT StateProvinceCode
, StateProvinceCode + ' ' + Name
FROM [Person].[StateProvince]
WHERE CountryRegionCode = ISNULL(NullIf(?, ''), CountryRegionCode)
ORDER BY Id

The Country parameter will be handled in the WHERE constraint - it's the role of the question mark. When an empty string was selected, the equivalent for ignoring the parameter, the formula used will result to an identity, which however makes sense only if there's always a value for the Country Code in the Person.StateProvince table. Otherwise the same line needs to be written as follows:

WHERE IsNull(CountryRegionCode, '') = COALESCE(NullIf(?, ''), CountryRegionCode, '')

This might look strange, though it does the trick of handling the nulls. According to the rules of defensive programming, one should treat the eventuality of nulls. However, this depends also on application's design. If one develops reports with this supposition, then the supposition should be made explicit in the documentation!

Once the query provided, SSRS recognizes that a parameter was provided and requires to map the question mark to the respective parameter. Within dataset's properties click on the "Parameter" tab and selected @Country as Parameter Value:

Parameters in Dataset Properties

A question mark needs to be considered in the query for each parameter, the parameters from the Dataset Properties being displayed in the order in which they appear in the query.

Once the dataset defined, one can go on and define the StateProvince parameter similarly to the Country parameter and consider the StateProvinces dataset as source. The StateProvince parameter needs to be added also in report's header:
= "State Province: " & Parameters!StateProvince.Label

Within the Addresses main dataset there needs to be added the declaration for the StateProvince parameter, while in where the corresponding clause:

-- SQL Parameter
& " DECLARE @StateProvince as nvarchar(3) = '" & Parameters!StateProvince.Value & "'" & vbcrlf 

-- constraint in WHERE clause
& IIf(Parameters!StateProvince.Value<> "", "     AND SIC.StateProvinceCode = @StateProvince", "") & vbcrlf W

With these changes the report should be ready to run.

Notes:
The above query for the StateProvince dataset can be used inside of a formula with a few small changes. This should work independently of the provider used. Just replace the query in the formula and delete the parameters previously created for the data source via the Dataset Properties:

= " -- State Provinces " & vbCrLf
 & " DECLARE @Country as nvarchar(3) = '" & Parameters!Country.Value & "'" & vbcrlf 
 & " SELECT '' Id " & vbCrLf
 & " , '(All)' Name " & vbCrLf
 & " UNION ALL " & vbCrLf
 & " SELECT StateProvinceCode " & vbCrLf
 & " , StateProvinceCode + ' ' + Name " & vbCrLf
 & " FROM [Person].[StateProvince] " & vbCrLf
 & IIf(Parameters!Country.Value<>"", "     WHERE CountryRegionCode = @Country", "") & vbcrlf 
 & " ORDER BY Id " & vbCrLf

Second Level of Dependency

The second dependency is based on the Territory Group formed of North America, Pacific and Europe. There is however no Territory Group table, the relations between Countries and Territory Groups being stored into the Sales.SalesTerritory table. Therefore, the query for the corresponding dropdown can be written via a DISTINCT:

-- Territories
SELECT '' Id
, '(All)' Name
UNION ALL
SELECT DISTINCT [Group] Id
, [Group] Name
FROM Sales.SalesTerritory
ORDER BY Id

When the user selects a Territory, in the Countries and State Province dropdowns are supposed to be shown only the corresponding values for the selected value. We need to modify thus both queries via formulas.
The query for the Country dataset includes the Territory parameter and the corresponding constraint via an EXIST:

= "  -- Countries  " & vbCrLf 
& " DECLARE @Territory as nvarchar(50) = '" & Parameters!Territory.Value & "'" & vbcrlf 
& " SELECT '' Id  " & vbCrLf 
& " , '(All)' Name  " & vbCrLf 
& " UNION ALL  " & vbCrLf 
& " SELECT SSP.CountryRegionCode Id  " & vbCrLf 
& " , SSP.Name   " & vbCrLf 
& " FROM [Person].[CountryRegion] SSP " & vbCrLf 
& IIf(Parameters!Territory.Value<> "", "     WHERE EXISTS (SELECT * FROM [Sales].[SalesTerritory] SST WHERE SSP.CountryRegionCode = SST.CountryRegionCode AND SST.[Group] = @Territory)", "") & vbcrlf 
& " ORDER BY Id " & vbCrLf

In exchange the StateProvince dataset is slightly more complex:

= " -- State Provinces " & vbCrLf
 & " DECLARE @Country as nvarchar(3) = '" & Parameters!Country.Value & "'" & vbcrlf 
 & " DECLARE @Territory as nvarchar(50) = '" & Parameters!Territory.Value & "'" & vbcrlf 
 & " SELECT '' Id " & vbCrLf
 & " , '(All)' Name " & vbCrLf
 & " UNION ALL " & vbCrLf
 & " SELECT PSP.StateProvinceCode " & vbCrLf
 & " , PSP.StateProvinceCode + ' ' + PSP.Name " & vbCrLf
 & " FROM [Person].[StateProvince] PSP" & vbCrLf
 & "      JOIN [Sales].[SalesTerritory] PST" & vbCrLf
 & "       ON PSP.TerritoryId = PST.TerritoryId " & vbCrLf
 & " WHERE 0=0 " & vbCrLf
 & IIf(Parameters!Country.Value<> "", "     AND PSP.CountryRegionCode = @Country", "") & vbcrlf 
 & IIf(Parameters!Territory.Value<> "", "     AND PST.[Group] = @Territory", "") & vbcrlf 
 & " ORDER BY Id " & vbCrLf

Once the datasets changed, is still needed to modify the order of the parameters to reflect the order in which they are called within dependency, respectively Territory, State Province, Country. In the end the report looks like this:

Notes:
(1) If the order of the parameters doesn't reflect the otder in which they are called, you'll get a generic error similar with the following one (the error appears when there are problems with the query):
An error occurred during local report processing. The definition of the report '/Customer Addresses' is invalid. The CommandText expression for the query 'Countries' contains an error: [BC30277] Type character '8e does not match declared data type 'String'.

(2) When designing parameters one should consider also the best practices for avoiding the poor design of parameters.

Previous Post <<||>> Next Post

24 February 2018

💎SQL Reloaded: Misusing Views and Pseudo-Constants

Views as virtual tables can be misused to replace tables in certain circumstances, either by storing values within one or multiple rows, like in the below examples:

-- parameters for a BI solution
CREATE VIEW dbo.vLoV_Parameters
AS
SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId
 , Cast(GetDate() as Date) AS CurrentDate 
 , Cast(100 as int) AS BatchCount 

GO

SELECT *
FROM dbo.vLoV_Parameters

GO

-- values for a dropdown 
CREATE VIEW dbo.vLoV_DataAreas
AS
SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId
, Cast('Company ABC' as nvarchar(50)) AS Description 
UNION ALL
SELECT 'XYZ' DataAreaId 
, 'Company XYZ'

GO

SELECT *
FROM dbo.vLoV_DataAreas

GO

These solutions aren’t elegant, and typically not recommended because they go against one of the principles of good database design, namely “data belong in tables”, though they do the trick when needed. Personally, I used them only in a handful of cases, e.g. when it wasn’t allowed to create tables, when it was needed testing something for a short period of time, or when there was some overhead of creating a table for 2-3 values. Because of their scarce use, I haven’t given them too much thought, not until I discovered Jared Ko’s blog posting on pseudo-constants. He considers the values from the first view as pseudo-constants, and advocates for their use especially for easier dependency tracking, easier code refactoring, avoiding implicit data conversion and easier maintenance of values.

All these are good reasons to consider them, therefore I tried to take further the idea to see if it survives a reality check. For this I took Dynamics AX as testing environment, as it makes extensive use of enumerations (aka base enums) to store list of values needed allover through the application. Behind each table there are one or more enumerations, the tables storing master data abounding of them. For exemplification let’s consider InventTrans, table that stores the inventory transactions, the logic that governs the receipt and issued transactions are governed by three enumerations: StatusIssue, StatusReceipt and Direction.

-- Status Issue Enumeration 
 CREATE VIEW dbo.vLoV_StatusIssue
 AS
 SELECT cast(0 as int) AS None
 , cast(1 as int) AS Sold
 , cast(2 as int) AS Deducted
 , cast(3 as int) AS Picked
 , cast(4 as int) AS ReservPhysical
 , cast(5 as int) AS ReservOrdered
 , cast(6 as int) AS OnOrder
 , cast(7 as int) AS QuotationIssue

GO

-- Status Receipt Enumeration 
 CREATE VIEW dbo.vLoV_StatusReceipt
 AS
SELECT cast(0 as int) AS None
 , cast(1 as int) AS Purchased
 , cast(2 as int) AS Received
 , cast(3 as int) AS Registered
 , cast(4 as int) AS Arrived
 , cast(5 as int) AS Ordered
 , cast(6 as int) AS QuotationReceipt

GO

-- Inventory Direction Enumeration 
 CREATE VIEW dbo.vLoV_InventDirection
 AS
 SELECT cast(0 as int) AS None
 , cast(1 as int) AS Receipt
 , cast(2 as int) AS Issue

To see these views at work let’s construct the InventTrans table on the fly:

-- creating an ad-hoc table  
SELECT *
INTO  dbo.InventTrans
FROM (VALUES (1, 1, 0, 2, -1, 'A0001')
, (2, 1, 0, 2, -10, 'A0002')
, (3, 2, 0, 2, -6, 'A0001')
, (4, 2, 0, 2, -3, 'A0002')
, (5, 3, 0, 2, -2, 'A0001')
, (6, 1, 0, 1, 1, 'A0001')
, (7, 0, 1, 1, 50, 'A0001')
, (8, 0, 2, 1, 100, 'A0002')
, (9, 0, 3, 1, 30, 'A0003')
, (10, 0, 3, 1, 20, 'A0004')
, (11, 0, 1, 2, 10, 'A0001')
) A(TransId, StatusIssue, StatusReceipt, Direction, Qty, ItemId)

Here are two sets of examples using literals vs. pseudo-constants:

--example issued with literals 
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusIssue = 1 
  AND ITR.Direction = 2

GO
--example issued with pseudo-constants
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
      JOIN dbo.vLoV_StatusIssue SI
        ON ITR.StatusIssue = SI.Sold
      JOIN dbo.vLoV_InventDirection ID
        ON ITR.Direction = ID.Issue

GO

--example receipt with literals 
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt= 1
   AND ITR.Direction = 1

GO

--example receipt with pseudo-constants
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
      JOIN dbo.vLoV_StatusReceipt SR
        ON ITR.StatusReceipt= SR.Purchased
      JOIN dbo.vLoV_InventDirection ID
        ON ITR.Direction = ID.Receipt

As can be seen the queries using pseudo-constants make the code somehow readable, though the gain is only relative, each enumeration implying an additional join. In addition, when further business tables are added to the logic (e.g. items, purchases or sales orders) it complicates the logic, making it more difficult to separate the essential from nonessential. Imagine a translation of the following query:

-- complex query 
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
              <several tables here>
WHERE ((ITR.StatusReceipt<=3 AND ITR.Direction = 1)
  OR (ITR.StatusIssue<=3 AND ITR.Direction = 2))
  AND (<more constraints here>)

The more difficult the constraints in the WHERE clause, the more improbable is a translation of the literals into pseudo-constraints. Considering that an average query contains 5-10 tables, each of them with 1-3 enumerations, the queries would become impracticable by using pseudo-constants and quite difficult to troubleshoot their execution plans.

The more I’m thinking about, an enumeration data type as global variable in SQL Server (like the ones available in VB) would be more than welcome, especially because values are used over and over again through the queries. Imagine, for example, the possibility of writing code as follows:

-- hypothetical query
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt = @@StatusReceipt .Purchased
  AND ITR.Direction = @@InventDirection.Receipt

From my point of view this would make the code more readable and easier to maintain. Instead, in order to make the code more readable, one’s usually forced to add some comments in the code. This works as well, though the code can become full of comments.

-- query with commented literals
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt <=3 -- Purchased, Received, Registered 
   AND ITR.Direction = 1-- Receip

In conclusion, pseudo-constants’ usefulness is only limited, and their usage is against developers’ common sense, however a data type in SQL Server with similar functionality would make code more readable and easier to maintain.

Notes:
1) It is possible to simulate an enumeration data type in tables’ definition by using a CHECK constraint.
2) The queries work also in SQL databases in Microsoft Fabric (see file in GitHub repository). You might want to use another schema (e.g. Test), not to interfere with the existing code.

Happy coding!

16 January 2017

⛏️Data Management: Data Flow (Definitions)

"The sequence in which data transfer, use, and transformation are performed during the execution of a computer program." (IEEE," IEEE Standard Glossary of Software Engineering Terminology", 1990)

"A component of a SQL Server Integration Services package that controls the flow of data within the package." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"Activities of a business process may exchange data during the execution of the process. The data flow graph of the process connects activities that exchange data and - in some notations - may also represent which input/output parameters of the activities are involved." (Cesare Pautasso, "Compiling Business Process Models into Executable Code", 2009)

"Data dependency and data movement between process steps to ensure that required data is available to a process step at execution time." (Christoph Bussler, "B2B and EAI with Business Process Management", 2009)

[logical data flow:] "A data flow diagram that describes the flow of information in an enterprise without regard to any mechanisms that might be required to support that flow." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

[physical data flow:] "A data flow diagram that identifies and represents data flows and processes in terms of the mechanisms currently used to carry them out." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"The fact that data, in the form of a virtual entity class, can be sent from a party, position, external entity, or system process to a party, position, external entity, or system process." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"An abstract representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction [Beizer]." (International Qualifications Board for Business Analysis, "Standard glossary of terms used in Software Engineering", 2011)

"Data flow refers to the movement of data from one purpose to another; also the movement of data through a set of systems, or through a set of transformations within one system; it is a nontechnical description of how data is processed. See also Data Chain." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The movement of data through a group of connected elements that extract, transform, and load data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A path that carries packets of information of known composition; a roadway for data. Every data flow’s composition is recorded in the data dictionary." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"the path, in information systems or otherwise, through which data move during the active phase of a study." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"The lifecycle movement and storage of data assets along business process networks, including creation and collection from external sources, movement within and between internal business units, and departure through disposal, archiving, or as products or other outputs." (Kevin J Sweeney, "Re-Imagining Data Governance", 2018)

"A graphical model that defines activities that extract data from flat files or relational tables, transform the data, and load it into a data warehouse, data mart, or staging table." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"An abstract representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction." (Software Quality Assurance)

14 August 2011

📈Graphical Representation: Data Flow Diagram (Definitions)

"A diagram that shows the data flows in an organization, including sources of data, where data are stored, and processes that transform data." (Jan L Harrington, "Relational Database Dessign: Clearly Explained" 2nd Ed., 2002)

"A diagram of the data flow from sources through processes and files to users. A source or user is represented by a square; a data file is represented by rectangles with missing righthand edges; a process is represented by a circle or rounded rectangle; and a data flow is represented by an arrow." (Jens Mende, "Data Flow Diagram Use to Plan Empirical Research Projects", 2009)

"A diagram used in functional analysis which specifies the functions of the system, the inputs/outputs from/to external (user) entities, and the data being retrieved from or updating data stores. There are well-defined rules for specifying correct DFDs, as well as for creating hierarchies of interrelated DFDs." (Peretz Shoval & Judith Kabeli, "Functional and Object-Oriented Methodology for Analysis and Design", 2009)

[Control Data Flow Graph (CDFG):] " Represents the control flow and the data dependencies in a program." (Alexander Dreweke et al, "Text Mining in Program Code", 2009)

"A graphic method for documenting the flow of data within an organization." (Jan L Harrington, "Relational Database Design and Implementation: Clearly explained" 3rd Ed., 2009)

"A graphic representation of the interactions between different processes in an organization in terms of data flow communications among them. This may be a physical data flow diagram that describes processes and flows in terms of the mechanisms involved, a logical data flow diagram that is without any representation of the mechansm, or an essential data flow diagram that is a logical data flow diagram organized in terms of the processes that respond to each external event." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Data-flow diagrams (DFDs) are system models that show a functional perspective where each transformation represents a single function or process. DFDs are used to show how data flows through a sequence of processing steps." (Ian Sommerville, "Software Engineering" 9th Ed., 2011)

"A model of the system that shows the system’s processes, the data that flow between them (hence the name), and the data stores used by the processes. The data flow diagram shows the system as a network of processes, and is thought to be the most easily recognized of all the analysis models." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"A picture of the movement of data between external entities and the processes and data stores within a system." (Jeffrey A Hoffer et al, "Modern Systems Analysis and Design" 7th Ed., 2014)

"A schematic indicating the direction of the movement of data" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A Data Flow Diagram (DFD) is a graphical representation of the 'flow' of data through an information system, modeling its process aspects. Often it is a preliminary step used to create an overview of the system that can later be elaborated." (Henrikvon Scheel et al, "Process Concept Evolution", 2015)

"Data flow maps are tools that graphically represent the results of a comprehensive data assessment to illustrate what information comes into an organization, for what purposes that information is used, and who has access to that information." (James R Kalyvas & Michael R Overly, "Big Data: A Business and Legal Guide", 2015)

"A graphical representation of the logical or conceptual movement of data within an existing or planned system." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"a visual depiction using standard symbols and conventions of the sources of, movement of, operations on, and storage of data." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"A data-flow diagram is a way of representing a flow of data through a process or a system (usually an information system). The DFD also provides information about the outputs and inputs of each entity and the process itself." (Wikipedia) [source]

"A graphical representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction." (IQBBA)

07 August 2010

Database Design: Object Dependencies (Part I - An Introduction)

Data Management Series

Introduction

Around the various data islands existing in the blue and the models that support them are created a whole range of database objects (views, stored procedures, user-defined functions) and other type of non-database objects (classes, strong-typed datasets, reports, ad-hoc queries, etc.) With each reference to a database object is created a database dependency, or simply dependency, between the database object and the other objects that reference it, thus any change occurring in a database object could impact the various referents resulting in broken links, invalid calls or any type of error that might brake the calling applications or the isolated pieces of code (e.g. reports, ad-hoc queries, SQL scripts-based logic).

Tracking Database Dependencies

Many organizations use to document such dependencies in data dictionaries or any other type of similar documentation, one of the reasons being the easier identification of the objects that are impacted by the changes occurring in the database structure. One of the problems is that the documentation is often application-oriented, targeting thus the application using the data, and if there are multiple applications consuming the same data, then it’s not so easy to aggregate all the dependencies especially when they are stored in Excel files, dispersed documents, repositories, with (complicated) permissions access,

That’s one of the reasons for which an organization might consider storing in the source database as much of the business logic related directly to data. Encapsulating queries and procedural logic in views, stored procedures, user-defined functions or any other similar objects seems a good idea in order to reduce the maintenance of code, hide the complexity of a database from the consumers (users, services, web/desktop applications, etc.), and from several other considerations.

The most important of these considerations is the fact that databases store not only the respective objects and statistics about them, but could store also the dependencies between them, making easier the impact analysis or any type of analysis based on the dependency between objects. It’s at the discretion of developers, architects or any other type of professional with decision power on whether they want to take advantage of such functionality.

Foreign Key Constraints

The simplest and most natural dependence information to store are the primary-foreign key relations implemented in the form of a constraint. The foreign key constraints, as they are called, identify and enforce the relationship between two tables; “identify” because it makes the relation explicit, and “enforce” because it checks the validity of foreign-primary key values pairs when records are inserted, updated or deleted, enforcing thus the referential integrity of the database.

When a deletion is attempted on a record, the database engine checks if there is any dependent record (in the same or other table) that references the respective record, and if such a constraint is defined, the deletion is aborted raising also an error message. A check is performed also when a record is inserted or updated in the child table, the respective actions being aborted if the foreign key reference is not valid.

Conversely, a foreign key constraint could bring additional complexity during migration tasks, though with a little effort and a good architecture the overhead is minimized. In addition the foreign key constraints could be used by third party tools to provide some degree of automation when joining tables or for other purposes.

Object Dependencies

More complex dependency regard the dependency between database objects – views, stored procedures, user-defined functions or tables. In some cases is enough to see that there is a dependency between two objects, though in more complex situations would be useful to know which specific attribute is used from the dependency objects, especially when using the metadata for automation tasks. By creating the dependency tree, the tree of objects resulted from the dependency between the various database objects, it’s possible to provide more accurate impact assessments.

So until now were considered the dependencies between database objects, though, as highlighted above, there are many other objects stored outside of a database and referencing database objects. It makes sense to have a global repository in which to store information about dependencies, preferably in a relational database which could be easily interrogated using simple flat or hierarchical queries.

Metadata

The object definitions, statistics, dependencies and other type of information stored about data or the structures or objects related to data are encompassed under the denomination of metadata, which in common understanding is defined as “data about data”. The metadata could be used not only as input for impact analysis but also for automating business logic, functionality that opens new perspectives in development. Einstein’s believe that “problems cannot be solved by the same level of thinking that created them” is reflected in the world of databases by the fact that the metadata stored about database objects help to solved problems related to the objects and the data the databases contain. For example during a data migration project the two database structures could be mapped at table and attribute level, being possible thus to create validation rules in an automated manner.

Even if databases come with a predefined structure of storing metadata, the various solutions developed on top of such databases require additional metadata to be stored, and in theory it would be great if databases’ metadata structures could be extended for this purposes, though given the risks involved in altering such structures lead to the existence of parallel metadata repositories, in which an important percent of the database’s metadata are duplicated.

Beyond Database Dependencies

Talking about data mappings, integration projects and integration functionality/features rely heavily on data mappings, they involving a degree of automation too. Integration of data doesn’t necessarily occur only at application level, in the context of web’s evolution, the tendency is to link and integrate the various data islands (see linkeddata.org), especially the ones with public character, and provide thus cross-database functionality. Many of the problems such an approach implies are solved at metadata level, new metadata and dependency levels being required for this purpose.

Created: Aug-2010, Last Reviewed: Mar-2024

15 February 2010

🕋Data Warehousing: Control Flow (Definitions)

"A component of a SQL Server Integration Services package that controls the flow of tasks within the package. The high-level workflow of a package. Often contains one or more data flow tasks." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"Concept to define causal dependency between process steps to enforce a specific execution order." (Christoph Bussler, "B2B and EAI with Business Process Management", 2009)

"The flow of control defines a partial order relationship between the activities of a business process model, specifying in which temporal order they will be executed." (Cesare Pautasso, "Compiling Business Process Models into Executable Code", 2009)

"A group of connected control flow elements that perform tasks." (Microsoft, "SQL Server 2012 Glossary", 2012)

"An abstract representation of all possible sequences of events (paths) during execution of a component or system. Often represented in graphical form, see control flow graph." (Tilo Linz et al, "Software Testing Foundations" 4th Ed., 2014)

"The control-flow perspective focuses on the control-flow, i.e., the ordering of activities. The goal of mining this perspective is to find a good characterization of all possible paths. Other popular perspectives are the organizational perspective and the case perspective." (Pavlos Delias et al, "Applying Process Mining to the Emergency Department", Encyclopedia of Business Analytics and Optimization, 2014)

"In DB2 data warehousing, a graphical model that sequences data flows and mining flows, integrates external commands, programs, and stored procedures, and provides conditional processing logic for a data warehouse application." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A sequence of events (paths) in the execution through a component or system." (ISTQB)

"The ordered workflow in an Integration Services package that performs tasks." (Microsoft Technet)

30 April 2009

🛢DBMS: Functional Dependency (Definitions)

"Y is functionally dependent on X if the value of Y is determined by X. In other words if Y = X +1, the value of X will determine the resultant value of Y. Thus, Y is dependent on X as a function of the value of X. Functional dependence is the opposite of determinance." (Gavin Powell, "Beginning Database Design", 2006)

"A constraint between two attributes or two sets of attributes in a relation." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A relationship between two attributes (or two combination of attributes) in a relation such that for every unique value of the second attribute, the table contains only one value of the first attribute. The first attribute, however, may be associated with multiple values of the second attribute." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"The property of one or more attributes (data items) that uniquely determines the value of one or more other attributes (data items). Given a table R, a set of attributes B is functionally dependent on another set of attributes A if, at each instant of time, each A value is associated with only one B value." (Toby J Teorey, ", Database Modeling and Design" 4th Ed., 2010)

"Within a relation R, an attribute B is functionally dependent on an attribute A if and only if a given value of the attribute A determines exactly one value of the attribute B. The relationship 'B is dependent on A' is equivalent to 'A determines B' and is written as AB." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

SQL Troubles

Pages