SQL Troubles

30 October 2022

💎SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part II: Running Totals) 🆕

In the previous post I shown how to use the newly introduced WINDOW clause in SQL Server 2022 by providing some historical context concerning what it took to do the same simple aggregations as SUM or AVG within previous versions of SQL Server. Let's look at another scenario based on the previously created Sales.vSalesOrders view - using the same two functions in providing running totals.

A running total is a summation of a sequence of values (e.g. OrderQty) ordered by a key, typically one or more columns that uniquely define the sequence, something like a unique record identifier (e.g. SalesOrderId). Moreover, the running total can be defined within a partition (e.g. Year/Month).

In SQL Server 2000, to calculate the running total and average for a value within a partition would resume implementing the logic for each attribute in correlated subqueries. One needed thus one subquery for each attribute, resulting in multiple processing of the base tables. Even if the dataset was already in the memory, it still involves a major overhead.

-- running total/average based on correlated subquery (SQL Server 2000+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, (-- correlated subquery
  SELECT SUM(SRT.OrderQty)
  FROM Sales.vSalesOrders SRT
  WHERE SRT.ProductId = SOL.ProductId 
    AND SRT.[Year] = SOL.[Year]
	AND SRT.[Month] = SOL.[Month]
	AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) RunningTotalQty
, (-- correlated subquery
  SELECT AVG(SRT.OrderQty)
  FROM Sales.vSalesOrders SRT
  WHERE SRT.ProductId = SOL.ProductId 
    AND SRT.[Year] = SOL.[Year]
	AND SRT.[Month] = SOL.[Month]
	AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) RunningAvgQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.SalesOrderId

Then SQL Server 2005 allowed consolidating the different correlated subqueries into one by using the CROSS APPLY join:

-- running total/average based on correlated subquery with CROSS APPLY (SQL Server 2005+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SRT.RunningTotalQty
, SRT.RunningAvgQty
FROM Sales.vSalesOrders SOL
     CROSS APPLY (-- correlated subquery
	  SELECT SUM(SRT.OrderQty) RunningTotalQty
	  , AVG(SRT.OrderQty) RunningAvgQty
	  FROM Sales.vSalesOrders SRT
	  WHERE SRT.ProductId = SOL.ProductId 
		AND SRT.[Year] = SOL.[Year]
		AND SRT.[Month] = SOL.[Month]
		AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) SRT
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.SalesOrderId

Even if SQL Server 2005 introduced window functions that work on a partition, an ORDER BY clause was supported only with SQL Server 2012:

-- running total/average based on window functions (SQL Server 2012+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.SalesOrderId) RunningTotalQty
, AVG(SOL.OrderQty) OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.SalesOrderId) RunningAvgQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.SalesOrderId

Now, in SQL Server 2022 the WINDOW clause allows simplifying the query as follows by defining the partition only once:

-- running total/average based on window functions and clause (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER SalesByMonth AS RunningTotalQty

, AVG(SOL.OrderQty) OVER SalesByMonth AS RunningAvgQty

FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.SalesOrderId)
ORDER BY SOL.SalesOrderId

Moreover, the WINDOW clause allows forward (and backward) referencing of one window into the other:

-- running total/average based on window functions and clause (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- simple aggregations
, SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQty

, AVG(SOL.OrderQty) OVER SalesByMonth AS AvgQty

-- running totals
, SUM(SOL.OrderQty) OVER SalesByMonthSorted AS RunningTotalQty

, AVG(SOL.OrderQty) OVER SalesByMonthSorted AS RunningAvgQty

FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month])
, SalesByMonthSorted AS (SalesByMonth ORDER BY SOL.SalesOrderId)
ORDER BY SOL.SalesOrderId

Happy coding!

💎💫SQL Reloaded: Querying Azure Synapse Metadata for the D365 CRM & FO Tables Exported via Azure Synapse Link for Dataverse

Enabling Azure Synapse Link for Dataverse for Dynamics 365 CRM (D365 CRM) or for Dynamics 365 Finance & Operations (D365 FO) allows to have all the needed tables for reporting in a CDM structure, however the csv files with data don’t have any headers, which makes it challenging to use directly the files without the corresponding metadata. For example, attempting to define external tables would be useless without proper headers and data types. Fortunately, attribute’s name and data types are available in JSON files and can be queried.

D365 CRM

For CRM the files start with tables’ names and have the EntityMetadata.json postfix, the files being stored in the Microsoft.Athena.TrickleFeedService folder which lies with the other folders containing the data. Given that all files are in the same folder, the following query can be used to export the metadata for all or some of the tables. Just replace "(container name)" and "(data source)" with the corresponding values for your environment.

-- export definition for D365 CRM' CDM (all attributes, all tables)
SELECT DAT.EntityName
, DAT.AttributeName
, DAT.Timestamp
, DAT.AttributeType
, DAT.MetadataId
, DAT.Precision
, DAT.MaxLength
FROM openrowset(
        bulk '/(container name)/Microsoft.Athena.TrickleFeedService/*-EntityMetadata.json',
        data_source = '(data source)',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b',
        rowterminator = '0x0b' --> You need to override rowterminator to read classic JSON
    ) with (doc nvarchar(max)) as rows
       CROSS APPLY OPENJSON(doc, '$.AttributeMetadata')--definitions[0].hasAttributes
       with (
         EntityName nvarchar(255) '$.EntityName'
        , AttributeName nvarchar(255) '$.AttributeName'
        , Timestamp nvarchar(50) '$.Timestamp'
        , AttributeType nvarchar(50) '$.AttributeType'
        , MetadataId nvarchar(100) '$.MetadataId'
        , Precision int '$.Precision'
        , MaxLength int '$.MaxLength'
     ) as DAT
WHERE DAT.EntityName IN ('lead', 'opportunity', 'product')

D365 FO

For Finance & Operations folders’ structure is more complex, a whole hierarchy of folders being built based on a set of predefined categories. One would need to traverse the hierarchy structure to find the files. Thus, it might be easier to generate the metadata for each table. Files’ names start with tables’ names and have the .cdm.json postfix. The below query generates the metadata for the InventTable table. Just replace the "(container name)", "(instance name)" and "(data source)" with the values for your environment.

-- export definition for D365 FO's CDM (all attributes, specific table)

SELECT DAT.name
, DAT.dataFormat
FROM openrowset(
        bulk '/(container name)/(instance name)/Tables/SupplyChain/ProductInformationManagement/Main/InventTable.cdm.json',
        data_source = '(data source)',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b',
        rowterminator = '0x0b' --> You need to override rowterminator to read classic JSON
    ) with (doc nvarchar(max)) as rows
       CROSS APPLY OPENJSON(doc, '$.definitions[0].hasAttributes')
       with (
          name nvarchar(255) '$.name'
        , dataFormat nvarchar(50) '$.dataFormat'
     ) as DAT

Further Steps

One can obtain thus the needed metadata, however after a first inspection, the data types are too general compared with the ones considered for the data source attributes. One can work probably with these data types as well, though there's a Microsoft recommendation to minimize the row length by using the smallest possible column size, which leads to better query performance. Exporting the metadata from the source system and matching the two datasets based on tables and attributes’ names would allow addressing this recommendation, even if this implies checking from time to time whether their definition changed. The trick is to keep the same sorting order like in the Synapse files. Unfortunately, not all SQL Server data types are supported (e.g. text, ntext, sql_variant, etc.) or are ideal to work with (e.g. money), however there are alternative data types that can be used.

Happy coding!

Previous Post <<||>> Next Post

💎🏭SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part I: Simple Aggregations)

Between the many new features introduced in SQL Server 2022, Microsoft brings the WINDOW clause for defining the partitioning and ordering of a dataset which uses window functions. But before unveiling the change, let's take a walk down the memory lane.

In the early ages of SQL Server, many of descriptive statistics techniques were resuming in using the averages over a dataset that contained more or less complex logic. For example, to get the total and average of sales per Month, the query based on AdventureWorks database would look something like:

-- aggregated sales orders - detailed (SQL Server 2000+)
SELECT SOL.ProductID
, Year(SOH.OrderDate) [Year]
, Month(SOH.OrderDate) [Month]
, SUM(SOL.OrderQty) TotalQty
, AVG(SOL.OrderQty) AverageQty
FROM Sales.SalesOrderDetail SOL
     JOIN Sales.SalesOrderHeader SOH
	   ON SOL.SalesOrderID = SOH.SalesOrderID
WHERE SOL.ProductId IN (745)
  AND Year(SOH.OrderDate) = 2012
  AND Month(SOH.OrderDate) BETWEEN 1 AND 3
GROUP BY SOL.ProductID
, Year(SOH.OrderDate) 
, Month(SOH.OrderDate)

When possible, the base logic was encapsulated within a view, hiding query's complexity from the users, allowing also reuse and better maintenance, just to mention a few of the benefits of this approach:

-- encapsulating the logic into a view (SQL Server 2000+)
CREATE OR ALTER VIEW Sales.vSalesOrders
AS
-- sales orders details
SELECT SOL.SalesOrderID
, SOL.ProductID
, Cast(SOH.OrderDate as Date) OrderDate
, Year(SOH.OrderDate) [Year]
, Month(SOH.OrderDate) [Month]
, Cast(SOL.OrderQty as decimal(18,2)) OrderQty
-- many more columns 
FROM Sales.SalesOrderDetail SOL
     JOIN Sales.SalesOrderHeader SOH
	   ON SOL.SalesOrderID = SOH.SalesOrderID
/* -- some constraints can be brought directly into the view
WHERE SOL.ProductID IN (745)
  AND Year(SOH.OrderDate) = 2012
  AND Month(SOH.OrderDate) BETWEEN 1 AND 3
*/

The above query now becomes:

-- aggregated sales orders - simplified (SQL Server 2000+)
SELECT SOL.ProductID
, SOL.[Year]
, SOL.[Month]
, SUM(SOL.OrderQty) TotalQty
, AVG(SOL.OrderQty) AverageQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
GROUP BY SOL.ProductID
, SOL.[Year]
, SOL.[Month]

In many cases, performing this kind of aggregations was all the users wanted. However, quite often it's useful to look and the individual sales and consider them as part of the broader picture. For example, it would be interesting to know, what's the percentage of an individual value from the total (see OrderQtyPct) or any similar aggregate information. In SQL Server 2000 addressing such requirements would involve a join between the same view - one query with the detailed records, while the other contains the aggregated data:

-- sales orders from the totals (SQL Server 2000+)
SELECT SOL.*
, ASO.TotalQty
, Cast(CASE WHEN ASO.TotalQty <> 0 THEN 100*SOL.OrderQty/ASO.TotalQty ELSE 0 END as decimal(18,2)) OrderQtyPct
, ASO.AverageQty
FROM Sales.vSalesOrders SOL
     JOIN ( -- aggregated sales orders
		SELECT SOL.ProductID
		, SOL.[Year]
		, SOL.[Month]
		, SUM(SOL.OrderQty) TotalQty
		, AVG(SOL.OrderQty) AverageQty
		FROM Sales.vSalesOrders SOL
		WHERE SOL.ProductID IN (745)
		  AND SOL.[Year] = 2012
		  AND SOL.[Month] BETWEEN 1 AND 3
		GROUP BY SOL.ProductId
		, SOL.[Year]
		, SOL.[Month]
	) ASO
	ON SOL.ProductID = ASO.ProductID 
   AND SOL.[Year] = ASO.[Year]
   AND SOL.[Month] = ASO.[Month]
ORDER BY SOL.ProductID
, SOL.OrderDate

Now, just imagine that you can't create a view and having to duplicate the logic each time the view is used! Aggregating the data at different levels would require similar joins to the same view or piece of logic.

Fortunately, SQL Server 2005 came with two long-awaited features, common table expressions (CTEs) and window functions, which made a big difference. First, CTEs allowed defining inline views that can be referenced multiple times. Secondly, the window functions allowed aggregations within a partition.

-- sales orders with CTE SQL Server 2005+ query 
WITH CTE
AS (--sales orders in scope
SELECT SOL.SalesOrderID 
, SOL.ProductID
, Cast(SOH.OrderDate as Date) OrderDate
, Year(SOH.OrderDate) [Year]
, Month(SOH.OrderDate) [Month]
, SOL.OrderQty
FROM Sales.SalesOrderDetail SOL
     JOIN Sales.SalesOrderHeader SOH
	   ON SOL.SalesOrderID = SOH.SalesOrderID
WHERE SOL.ProductID IN (745)
  AND Year(SOH.OrderDate) = 2012
  AND Month(SOH.OrderDate) BETWEEN 1 AND 3
)
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) TotalQty
, AVG(SOL.OrderQty)  OVER(PARTITION BY SOL.[Year], SOL.[Month]) AverageQty
FROM CTE SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3

This kind of structuring the queries allows to separate the base logic for better maintainability, readability and fast prototyping. Once the logic from CTE becomes stable, it can be moved within a view, following to replace the CTE reference with view's name in the final query. On the other side, the window functions allow writing more flexible and complex code, even if the statements can become occasionally complex (see OrderQtyPct):

-- aggregated sales orders (SQL Server 2005+)
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) TotalQty
, AVG(SOL.OrderQty)  OVER(PARTITION BY SOL.[Year], SOL.[Month]) AverageQty
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) ELSE 0 END as decimal(18,2)) OrderQtyPct
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3

Now, SQL Server 2022 allows to further simplify the logic by allowing to define with a name the window at the end of the statement and reference the name in several window functions (see last line of the query):

-- Aggregations per month (SQL Server 2022+)
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQty
, AVG(SOL.OrderQty) OVER SalesByMonth AS AverageQty
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByMonth <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByMonth ELSE 0 END as decimal(18,2)) OrderQtyPct
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.[Year], SOL.[Month])

Moreover, several windows can be defined. For example, aggregating the data also per year:

-- Aggregations per month and year (SQL Server 2022+) 
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQtyByMonth
, AVG(SOL.OrderQty) OVER SalesByMonth AS AverageQtyByMonth
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByMonth <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByMonth ELSE 0 END as decimal(18,2)) OrderQtyPctByMonth
, SUM(SOL.OrderQty) OVER SalesByYear AS TotalQtyByYear
, AVG(SOL.OrderQty) OVER SalesByYear AS AverageQtyByYear
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByYear <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByYear ELSE 0 END as decimal(18,2)) OrderQtyPctByYear
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.[Year], SOL.[Month])
, SalesByYear AS  (PARTITION BY SOL.[Year])

Isn't that cool? It will take probably some time to get used in writing and reading this kind of queries, though using descriptive names for the windows should facilitate the process!

As a side note, even if there's no product function like in Excel, there's a mathematical trick to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see example).

Notes:
The queries work also in a SQL databases in Microsoft Fabric. Just replace the Sales with SalesLT schema (see post, respectively GitHub repository with the changed code).

Happy coding!

19 October 2022

🌡Performance Management: Mastery (Part II: First Time Right - The Aim toward Operational Excellence)

Performance Management Series

Rooted in Six Sigma methodology as a step toward operational excellence, First Time Right (FTR) implies that any procedure is performed in the right manner the first time and every time. It equates to minimizing the waste in its various forms (inventory, motion, overprocessing, overproduction, waiting, transportation, defects). Like many quality concepts from the manufacturing industry, the concept was transported in the software development process as principle, process, goal and/or metric. Thus, it became part of Software Engineering, Project Management, Data Science, and any other similar endeavors whose outcome results in software products.

Besides the quality aspect, FTR is rooted also in the economic imperative – the need to achieve something in the minimum amount of time with the minimum of effort. It’s about being efficient in delivering a product or achieving a given target. It can be associated with continuous improvement, learning and mastery, the aim being to encompass FTR as part of organization’s culture.

Even if not explicitly declared, FTR lurks in each task planned. It seems that it became common practice to plan with the FTR in mind, however between this theoretical aim and practice there’s as usual an important gap. Unfortunately, planners, managers and even tasks' performers often forget that mistakes are made, that several iterations are needed to get the job done. It starts with the communication between people in clarifying the requirements and ends with the formal sign off. All the deviations from the FTR add up in the deviations between expected and actual effort, though probably more important are the deviations from the plan and all the consequences deriving from it. Especially in complex projects this adds up into a spiral of issues that can easily reinforce themselves.

Many of the jobs that imply creativity, innovation, research or exploration require at least several iterations to get the job done and this is independent of participants’ professionalism and experience. Moreover, the more quality one needs, the higher the effort, the 80/20 being sometimes a good approximation of the effort needed. In extremis, aiming for perfection instead of excellence can make certain tasks a never-ending story.

Achieving FTR requires practice - the more novelty, the higher the complexity, the communication or the synchronization needs, the more practice is needed. It starts with the individual to master the individual tasks and ends with the team, where communication, synchronization and other aspects need to be considered. The practice is usually achieved on hands-on work as part of the daily duties, project work, and so on. Unfortunately, it’s based primarily on individual experience, and seldom groomed in advance, as preparation for future tasks. That’s why sometimes when efficiency is needed in performing critical complex tasks, one also needs to consider the learning curve in achieving the required quality.

Of course, many organizations demand from job applicants experience and, when possible, they hire people with experience, however the diversity, complexity and changing nature of tasks require further practice. This aspect is somehow recognized in the implementation in organizations of the various forms of DevOps, though how many organizations adopt it and enforce it on a regular basis? Moreover, a major requirement of nowadays businesses is to be agile, and besides the mere application of methodologies, being agile means to have also a FTR mindset.

FTR starts with the wish for mastery at individual and team level and, with the right management attention, by allocating time for learning, self-development in the important areas, providing relevant feedback and building an infrastructure for knowledge sharing and harnessing, FTR can become part of organization’s culture. It’s up to each of us to do it!

18 October 2022

💎SQL Reloaded: Successive Price Increases/Discounts via Windowing Functions and CTEs

I was trying today to solve a problem that apparently requires recursive common table expressions, though they are not (yet) available in Azure Synapse serverless SQL pool. The problem can be summarized in the below table definition, in which given a set of Products with an initial Sales price, is needed to apply Price Increases successively for each Cycle. The cumulated increase is simulated in the last column for each line.

Unfortunately, there is no SQL Server windowing function that allows multiplying incrementally the values of a column (similar as the running total works). However, there’s a mathematical trick that can be used to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see Solution 1), and which frankly is more elegant than applying CTEs (see Solution 2).

-- create table with test data
SELECT *
INTO dbo.ItemPrices
FROM (VALUES ('ID001', 1000, 1, 1.02, '1.02')
, ('ID001', 1000, 2, 1.03, '1.02*1.03')
, ('ID001', 1000, 3, 1.03, '1.02*1.03*1.03')
, ('ID001', 1000, 4, 1.04, '1.02*1.03*1.03*1.04')
, ('ID002', 100, 1, 1.02, '1.02')
, ('ID002', 100, 2, 1.03, '1.02*1.03')
, ('ID002', 100, 3, 1.04, '1.02*1.03*1.04')
, ('ID002', 100, 4, 1.05, '1.02*1.03*1.04*1.05')
) DAT (ItemId, SalesPrice, Cycle, PriceIncrease, CumulatedIncrease)

-- reviewing the data
SELECT *
FROM dbo.ItemPrices

-- Solution 1: new sales prices with log & exp
SELECT ItemId
, SalesPrice
, Cycle
, PriceIncrease
, EXP(SUM(Log(PriceIncrease)) OVER(PARTITION BY Itemid ORDER BY Cycle)) CumulatedIncrease
, SalesPrice * EXP(SUM(Log(PriceIncrease)) OVER(PARTITION BY Itemid ORDER BY Cycle)) NewSalesPrice
FROM dbo.ItemPrices

-- Solution 2: new sales prices with recursive CTE
;WITH CTE 
AS (
-- initial record
SELECT ITP.ItemId
, ITP.SalesPrice
, ITP.Cycle
, ITP.PriceIncrease
, cast(ITP.PriceIncrease as decimal(38,6)) CumulatedIncrease
FROM dbo.ItemPrices ITP
WHERE ITP.Cycle = 1
UNION ALL
-- recursice part
SELECT ITP.ItemId
, ITP.SalesPrice
, ITP.Cycle
, ITP.PriceIncrease
, Cast(ITP.PriceIncrease * ITO.CumulatedIncrease as decimal(38,6))  CumulatedIncrease
FROM dbo.ItemPrices ITP
    JOIN CTE ITO
	  ON ITP.ItemId = ITO.ItemId
	 AND ITP.Cycle-1 = ITO.Cycle
)
-- final result
SELECT ItemId
, SalesPrice
, Cycle
, PriceIncrease
, CumulatedIncrease
, SalesPrice * CumulatedIncrease NewSalesPrice
FROM CTE
ORDER BY ItemId
, Cycle


-- validating the cumulated price increases (only last ones)
SELECT 1.02*1.03*1.03*1.04 
, 1.02*1.03*1.04*1.05

-- cleaning up
DROP TABLE IF EXISTS dbo.ItemPrices

Notes:
1. The logarithm in SQL Server’s implementation works only with positive numbers!
2. For simplification I transformed percentages (e.g. 1%) in values that are easier to multitply with (e.g. 1.01). The solution can be easily modified to consider discounts.
3. When CTEs are not available, one is forced to return to the methods used in SQL Server 2000 (I've been there) and thus use temporary tables or table variables with loops. Moreover, the logic can be encapsulated in multi-statement table-valued functions (see example), unfortunately, another feature not (yet) supported by serverless SQL pools.
4. Unfortunately, STRING_AGG, which concatenates values across rows, works only with a GROUP BY clause. Anyway, its result is useless without the availability of a Eval function in SQL (see example), however the Expr function available in data flows could be used as workaround.
4. Even if I thought about the use of logarithms for transforming the product into a sum, I initially ignored the idea, thinking that the solution would be too complex to implement. So, the credit goes to another blogpost. Kudos!
5. The queries work also in a SQL databases in Microsoft Fabric. Just replace the Sales with SalesLT schema (see post, respectively GitHub repository with the changed code).

6. With SQL Server 2025, Azure SQL Database and SQL databases was intrduced the Product function.

Happy coding!

21 August 2022

🧮ERP: Implementations (Part VI: It’s all about Partnership II - Closing the Gap)

ERP Implementations Series

When starting an ERP implementation project an organization needs to fill the existing knowledge gaps in respect to whatever it takes to achieve the goals associated with the respective project. Therefore, it makes sense to work with a implementer that can help cover the gaps directly or indirectly. Moreover, it makes sense to establish a long-term relationship that would allow to harness ERP system’s capabilities after project’s end, increase the ROI and, why not, find other areas of cooperation. It’s in theory what a partner does, and a strategic technology partnership is about – providing any kind of technological expertise the customer doesn't have in-house.

Unfortunately, from being a ‘service provider’ to becoming a ‘partner’ is a challenging road for many organizations, especially when this type of relationship is not understood and managed accordingly. Partnership’s management may resume in defining common goals, principles, values and processes, establishing a communication strategy and a common understanding of the challenges and the steps ahead, providing visibility into the cost estimates, billing, resources’ availability and utilization. Addressing these aspects would offer a framework on which the partnerships can nourish. Without considering these topics, the implementer remains just a 'service provider', no matter of the names used to characterize the relationship.

Now, the use of the word ‘partner’ would make someone think that only one partner is considered, typically a big to middle-sized organization that would have this kind of resources. The main reason behind this reasoning is that the number of functional areas and volume of skillset required for filling the requirements of an implementation are high compared with other projects, the resources needing to be available on-demand without affecting the other constraints: costs, quality, time. This can be challenging, therefore can be met scenarios in which two or more external organizations are involved in the partnership, ideally organizations that complement each other.

It is common in ERP implementations to appeal also to individual consultants for specific areas or the whole project. The principles and values of a partnership, as well the framework behind, can be applied to individual consultants as well. Independently of resources’ provenience more important is the partnership ‘mindset’ - being together in the same boat, working together on a shared and understood strategy, with clear goals and objectives.

Moreover, the people participating in the project must have a ‘partner's mindset’ as well. Without this, the project will likely get different impulses in the wrong direction(s), as a group’s interests will take priority over the ones of the organization. Ideally, this mindset should extend to the whole organization as topics like Data Quality and Process Improvement must be an organization’s effort, deeply imprinted in organization’s culture.

More like ever, it’s important for the business to see and treat the IT department as a ‘partner’ and not as a ‘service provider’ by providing the needed level of transparency in requirements, issues, practices and processes, by treating the IT department as equal party in the decision-making and addressing its current and future strategical requirements. Ideally, this partnership should happen long before the implementation starts, given that it takes time for mentalities and practices to change, for knowledge to be acquired and used appropriately.

Building a partnership takes time, effort and strategic thinking, this on top of the actual implementation, increasing thus the overall complexity, at least at the beginning. Does it pay off? Like in a marriage, it’s useful to have somebody you can trust, who knows you, whom you can rely upon, and talk with to find solutions. However, only time will tell whether such expectations are met and kept till the end.

Previous <<||>> Next Post

09 August 2022

🧭🪄Business Intelligence: Power BI (Part I: Learning Curve I)

A learning curve attempts depicting the (average) time it takes a person to learn how to use a method, tool, or technique, tracing the path from newbie to mastery. A common definition of the learning curve is based on the correlation between a learner’s performance on a task or activity and the number of attempts or amount of time required to complete it.

There are several diagrams in circulation which depict the correlation between the difficulty of Power BI concepts and probably their implementation as functionality. Even if they reflect to some degree the rate of learning, their simplicity and fuzziness can easily make one question their accuracy in reflecting the reality.

Researchers tend to categorize the curves associated with the learning process in simple idealized patterns like S-curve (aka sigmoid), exponential growth, exponential rise and fall to limit, or power law, however the learning process in IT-based endeavors is seldom characterized by a linear or exponential curve, given that the tasks seldom allow a steady path. The jumps of knowledge between tasks can be wide enough to appear insurmountable, and they can prove to be quite of a challenge without some help.

Like a baby’s first steps, we, as learners, must learn first to crawl, before making some unsteady steps, and it can take long time until visible progress is made. It’s a slow progress until we suddenly hit a (tipping) point from which everything seems easy, fact that increases our confidence in us. On the other side, when we find that we make no visible progress for a long period, it’s easy to arrive to the opposite, a critical zone, which in extremis could make one lose interest.

As beginners, after the first tipping point on the learning journey, it’s easy to arrive at a plateau in which there seem no need to learn new things, the current knowledge allowing to handle a range of tasks of small to average complexity. This can last for a long time, and then, a big thing comes our way – a hard problem to solve or a concept hard to understand. It’s the point where we stagnate, and the deeper we go, and the more such challenges are thrown in our way, the more difficult the learning seems to be. However, with new understanding, small steps are made, one step after the other, the pace makes us to evolve faster until we reach again a critical point from which the process increases smoothly until we seem to stagnate again. We meet again a hard limit to growth, which seems to be more solid than the previous one.

Both limits to growth can appear to be hard, however, considering that the knowledge in the field expands, more opportunity for growth appear, thus, the limits are apparent. Even if knowledge tends to increase ‘indefinitely’, the limits are there in terms of complexity, time available, knowledge quality (incl. availability) or any other dimension of the learning process. Moreover, these successions of tipping points, growth limits, plateaus, critical, steady and fast progress zones can occur in several iterations in the learning path. Thus, the path seems to resemble a snakelike curve with many ups and downs.

For the learner is important to be aware of this last aspect, there are always ups and downs, taking effort, patience and maybe an expert’s help to bridge the gaps in between. The chances are high that the gap between what we think we know and what we know is considerable, therefore a reality check is useful from time to time. A new problem to tackle will provide that occasion!

Previous Post <<||>> Next Post

SQL Troubles

Pages

30 October 2022

💎SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part II: Running Totals) 🆕

💎💫SQL Reloaded: Querying Azure Synapse Metadata for the D365 CRM & FO Tables Exported via Azure Synapse Link for Dataverse

💎🏭SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part I: Simple Aggregations)

19 October 2022

🌡Performance Management: Mastery (Part II: First Time Right - The Aim toward Operational Excellence)

18 October 2022

💎SQL Reloaded: Successive Price Increases/Discounts via Windowing Functions and CTEs

21 August 2022

🧮ERP: Implementations (Part VI: It’s all about Partnership II - Closing the Gap)

09 August 2022

🧭🪄Business Intelligence: Power BI (Part I: Learning Curve I)

About Me