Between the many new features introduced in SQL Server 2022, Microsoft brings the WINDOW clause for defining the partitioning and ordering of a dataset which uses window functions. But before unveiling the change, let's take a walk down the memory lane.
In the early ages of SQL Server, many of descriptive statistics techniques were resuming in using the averages over a dataset that contained more or less complex logic. For example, to get the total and average of sales per Month, the query based on AdventureWorks database would look something like:
-- aggregated sales orders - detailed (SQL Server 2000+) SELECT SOL.ProductID , Year(SOH.OrderDate) [Year] , Month(SOH.OrderDate) [Month] , SUM(SOL.OrderQty) TotalQty , AVG(SOL.OrderQty) AverageQty FROM Sales.SalesOrderDetail SOL JOIN Sales.SalesOrderHeader SOH ON SOL.SalesOrderID = SOH.SalesOrderID WHERE SOL.ProductId IN (745) AND Year(SOH.OrderDate) = 2012 AND Month(SOH.OrderDate) BETWEEN 1 AND 3 GROUP BY SOL.ProductID , Year(SOH.OrderDate) , Month(SOH.OrderDate)
When possible, the base logic was encapsulated within a view, hiding query's complexity from the users, allowing also reuse and better maintenance, just to mention a few of the benefits of this approach:
-- encapsulating the logic into a view (SQL Server 2000+) CREATE OR ALTER VIEW Sales.vSalesOrders AS -- sales orders details SELECT SOL.SalesOrderID , SOL.ProductID , Cast(SOH.OrderDate as Date) OrderDate , Year(SOH.OrderDate) [Year] , Month(SOH.OrderDate) [Month] , Cast(SOL.OrderQty as decimal(18,2)) OrderQty -- many more columns FROM Sales.SalesOrderDetail SOL JOIN Sales.SalesOrderHeader SOH ON SOL.SalesOrderID = SOH.SalesOrderID /* -- some constraints can be brought directly into the view WHERE SOL.ProductID IN (745) AND Year(SOH.OrderDate) = 2012 AND Month(SOH.OrderDate) BETWEEN 1 AND 3 */
-- aggregated sales orders - simplified (SQL Server 2000+) SELECT SOL.ProductID , SOL.[Year] , SOL.[Month] , SUM(SOL.OrderQty) TotalQty , AVG(SOL.OrderQty) AverageQty FROM Sales.vSalesOrders SOL WHERE SOL.ProductID IN (745) AND SOL.[Year] = 2012 AND SOL.[Month] BETWEEN 1 AND 3 GROUP BY SOL.ProductID , SOL.[Year] , SOL.[Month]
In many cases, performing this kind of aggregations was all the users wanted. However, quite often it's useful to look and the individual sales and consider them as part of the broader picture. For example, it would be interesting to know, what's the percentage of an individual value from the total (see OrderQtyPct) or any similar aggregate information. In SQL Server 2000 addressing such requirements would involve a join between the same view - one query with the detailed records, while the other contains the aggregated data:
-- sales orders from the totals (SQL Server 2000+) SELECT SOL.* , ASO.TotalQty , Cast(CASE WHEN ASO.TotalQty <> 0 THEN 100*SOL.OrderQty/ASO.TotalQty ELSE 0 END as decimal(18,2)) OrderQtyPct , ASO.AverageQty FROM Sales.vSalesOrders SOL JOIN ( -- aggregated sales orders SELECT SOL.ProductID , SOL.[Year] , SOL.[Month] , SUM(SOL.OrderQty) TotalQty , AVG(SOL.OrderQty) AverageQty FROM Sales.vSalesOrders SOL WHERE SOL.ProductID IN (745) AND SOL.[Year] = 2012 AND SOL.[Month] BETWEEN 1 AND 3 GROUP BY SOL.ProductId , SOL.[Year] , SOL.[Month] ) ASO ON SOL.ProductID = ASO.ProductID AND SOL.[Year] = ASO.[Year] AND SOL.[Month] = ASO.[Month] ORDER BY SOL.ProductID , SOL.OrderDate
-- sales orders with CTE SQL Server 2005+ query WITH CTE AS (--sales orders in scope SELECT SOL.SalesOrderID , SOL.ProductID , Cast(SOH.OrderDate as Date) OrderDate , Year(SOH.OrderDate) [Year] , Month(SOH.OrderDate) [Month] , SOL.OrderQty FROM Sales.SalesOrderDetail SOL JOIN Sales.SalesOrderHeader SOH ON SOL.SalesOrderID = SOH.SalesOrderID WHERE SOL.ProductID IN (745) AND Year(SOH.OrderDate) = 2012 AND Month(SOH.OrderDate) BETWEEN 1 AND 3 ) SELECT SOL.SalesOrderID , SOL.ProductID , SOL.OrderDate , SOL.[Year] , SOL.[Month] , SOL.OrderQty , SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) TotalQty , AVG(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) AverageQty FROM CTE SOL WHERE SOL.ProductID IN (745) AND SOL.[Year] = 2012 AND SOL.[Month] BETWEEN 1 AND 3
-- aggregated sales orders (SQL Server 2005+) SELECT SOL.SalesOrderID , SOL.ProductID , SOL.OrderDate , SOL.[Year] , SOL.[Month] , SOL.OrderQty , SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) TotalQty , AVG(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) AverageQty , Cast(CASE WHEN SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) ELSE 0 END as decimal(18,2)) OrderQtyPct FROM Sales.vSalesOrders SOL WHERE SOL.ProductID IN (745) AND SOL.[Year] = 2012 AND SOL.[Month] BETWEEN 1 AND 3
Now, SQL Server 2022 allows to further simplify the logic by allowing to define with a name the window at the end of the statement and reference the name in several window functions (see last line of the query):
-- Aggregations per month (SQL Server 2022+) SELECT SOL.SalesOrderID , SOL.ProductID , SOL.OrderDate , SOL.[Year] , SOL.[Month] , SOL.OrderQty , SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQty , AVG(SOL.OrderQty) OVER SalesByMonth AS AverageQty , Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByMonth <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByMonth ELSE 0 END as decimal(18,2)) OrderQtyPct FROM Sales.vSalesOrders SOL WHERE SOL.ProductID IN (745) AND SOL.[Year] = 2012 AND SOL.[Month] BETWEEN 1 AND 3 WINDOW SalesByMonth AS (PARTITION BY SOL.[Year], SOL.[Month])
Moreover, several windows can be defined. For example, aggregating the data also per year:
-- Aggregations per month and year (SQL Server 2022+) SELECT SOL.SalesOrderID , SOL.ProductID , SOL.OrderDate , SOL.[Year] , SOL.[Month] , SOL.OrderQty , SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQtyByMonth , AVG(SOL.OrderQty) OVER SalesByMonth AS AverageQtyByMonth , Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByMonth <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByMonth ELSE 0 END as decimal(18,2)) OrderQtyPctByMonth , SUM(SOL.OrderQty) OVER SalesByYear AS TotalQtyByYear , AVG(SOL.OrderQty) OVER SalesByYear AS AverageQtyByYear , Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByYear <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByYear ELSE 0 END as decimal(18,2)) OrderQtyPctByYear FROM Sales.vSalesOrders SOL WHERE SOL.ProductID IN (745) AND SOL.[Year] = 2012 AND SOL.[Month] BETWEEN 1 AND 3 WINDOW SalesByMonth AS (PARTITION BY SOL.[Year], SOL.[Month]) , SalesByYear AS (PARTITION BY SOL.[Year])
Isn't that cool? It will take probably some time to get used in writing and reading this kind of queries, though using descriptive names for the windows should facilitate the process!
As a side note, even if there's no product function like in Excel, there's a mathematical trick to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see example).
Notes:
The queries work also in a SQL databases in Microsoft Fabric. Just replace the Sales with SalesLT schema (see post, respectively GitHub repository with the changed code).
Happy coding!
No comments:
Post a Comment