
08 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventstreams [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Mar-2025

Real-Time Intelligence architecture
Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventstream(s)

  • {def} feature in Microsoft Fabric's Real-Time Intelligence experience that allows bringing real-time events into Fabric
    • bring real-time events into Fabric, transform them, and then route them to various destinations without writing any code 
      • ⇐ aka no-code solution
      • {feature} drag and drop experience 
        • gives users an intuitive and easy way to create your event data processing, transforming, and routing logic without writing any code 
    • work by creating a pipeline of events from multiple internal and external sources to different destinations
      • a conveyor belt that moves data from one place to another [1]
      • transformations to the data can be added along the way [1]
        • filtering, aggregating, or enriching
  • {def} eventstream
    • an instance of the Eventstream item in Fabric [2]
    • {feature} end-to-end data flow diagram 
      • provide a comprehensive understanding of the data flow and organization [2].
  • {feature} eventstream visual editor
    • used to design pipelines by dragging and dropping different nodes [1]
    • sources
      • where event data comes from 
      • one can choose
        • the source type
        • the data format
        • the consumer group
      • Azure Event Hubs
        • allows to get event data from an Azure event hub [1]
        • allows to create a cloud connection  with the appropriate authentication and privacy level [1]
      • Azure IoT Hub
        • SaaS service used to connect, monitor, and manage IoT assets with a no-code experience [1]
      • CDC-enabled databases
        • software process that identifies and tracks changes to data in a database, enabling real-time or near-real-time data movement [1]
        • Azure SQL Database 
        • PostgreSQL Database
        • MySQL Database
        • Azure Cosmos DB
      • Google Cloud Pub/Sub
        • messaging service for exchanging event data among applications and services [1]
      • Amazon Kinesis Data Streams
        • collect, process, and analyze real-time, streaming data [1]
      • Confluent Cloud Kafka
        • fully managed service based on Apache Kafka for stream processing [1]
      • Fabric workspace events
        • events triggered by changes in Fabric Workspace
          • e.g. creating, updating, or deleting items. 
        • allows to capture, transform, and route events for in-depth analysis and monitoring within Fabric [1]
        • the integration offers enhanced flexibility in tracking and understanding workspace activities [1]
      • Azure blob storage events
        • system triggers for actions like creating, replacing, or deleting a blob [1]
          • these actions are linked to Fabric events
            • allowing to process Blob Storage events as continuous data streams for routing and analysis within Fabric  [1]
        • support streamed or unstreamed events [1]
      • custom endpoint
        • REST API or SDKs can be used to send event data from custom app to eventstream [1]
        • allows to specify the data format and the consumer group of the custom app [1]
      • sample data
        • out-of-box sample data
    • destinations
      • where transformed event data is stored. 
        • in a table in an eventhouse or a lakehouse [1]
        • redirect data to 
          • another eventstream for further processing [1]
          • an activator to trigger an action [1]
      • Eventhouse
        • offers the capability to funnel your real-time event data into a KQL database [1]
      • Lakehouse
        • allows to preprocess real-time events before their ingestion in the lakehouse
          • the events are transformed into Delta Lake format and later stored in specific lakehouse tables [1]
            • facilitating the data warehousing needs [1]
      • custom endpoint
        • directs real-time event traffic to a bespoke application [1]
        • enables the integration of proprietary applications with the event stream, allowing for the immediate consumption of event data [1]
        • {scenario} aim to transfer real-time data to an independent system not hosted on the Microsoft Fabric [1]
      • Derived Stream
        • specialized destination created post-application of stream operations like Filter or Manage Fields to an eventstream
        • represents the altered default stream after processing, which can be routed to various destinations within Fabric and monitored in the Real-Time hub [1]
      • Fabric Activator
        • enables triggering automated actions based on values in streaming data [1]
    • transformations
      • filter or aggregate the data as is processed from the stream [1]
      • include common data operations
        • filtering
          • filter events based on the value of a field in the input
          • depending on the data type (number or text), the transformation keeps the values that match the selected condition, such as is null or is not null [1]
        • joining
          • transformation that combines data from two streams based on a matching condition between them [1]
        • aggregating
          • calculates an aggregation every time a new event occurs over a period of time [1]
            • Sum, Minimum, Maximum, or Average
          • allows renaming calculated columns, and filtering or slicing the aggregation based on other dimensions in your data [1]
          • one can have one or more aggregations in the same transformation [1]
        • grouping
          • allows to calculate aggregations across all events within a certain time window [1]
            • one can group by the values in one or more fields [1]
          • allows for the renaming of columns
            • similar to the Aggregate transformation 
            • ⇐ provides more options for aggregation and includes more complex options for time windows [1]
          • allows to add more than one aggregation per transformation [1]
            • allows to define the logic needed for processing, transforming, and routing event data [1]
        • union
          • allows to connect two or more nodes and add events with shared fields (with the same name and data type) into one table [1]
            • fields that don't match are dropped and not included in the output [1]
        • expand
          • array transformation that allows to create a new row for each value within an array [1]
        • manage fields
          • allows to add, remove, change data type, or rename fields coming in from an input or another transformation [1]
        • temporal windowing functions 
          • enable to analyze data events within discrete time periods [1]
          • way to perform operations on the data contained in temporal windows [1]
            • e.g. aggregating, filtering, or transforming streaming events that occur within a specified time period [1]
            • allow analyzing streaming data that changes over time [1]
              • e.g. sensor readings, web-clicks, on-line transactions, etc.
              • provide great flexibility to keep an accurate record of events as they occur [1]
          • {type} tumbling windows
            • divides incoming events into fixed and nonoverlapping intervals based on arrival time [1]
          • {type} sliding windows 
            • divides incoming events into fixed and overlapping intervals based on time [1]
          • {type} session windows 
            • divides events into variable and nonoverlapping intervals based on gaps of inactivity [1]
          • {type} hopping windows
            • are different from tumbling windows as they model scheduled overlapping windows [1]
          • {type} snapshot windows 
            • group event stream events that have the same timestamp [1]
            • unlike the other windowing functions, they don't require the window function to be named [1]
              • one can apply them by adding System.Timestamp() to the GROUP BY clause [1]
          • {parameter} window duration
            • the length of each window interval [1]
            • can be in seconds, minutes, hours, and even days [1]
          • {parameter} window offset
            • optional parameter that shifts the start and end of each window interval by a specified amount of time [1]
          • {concept} grouping key
            • one or more columns in your event data that you wish to group by [1]
          • aggregation function
            • one or more of the functions applied to each group of events in each window [1]
              • where the counts, sums, averages, min/max, and even custom functions become useful [1]
    • see the event data flowing through the pipeline in real-time [1]
    • handles the scaling, reliability, and security of event stream automatically [1]
      • no need to write any code or manage any infrastructure [1]
    • {feature} eventstream editing canvas
      • used to 
        • add and manage sources and destinations [1]
        • see the event data [1]
        • check the data insights [1]
        • view logs for each source or destination [1]
  • {feature} Apache Kafka endpoint on the Eventstream item
    • {benefit} enables users to connect and consume streaming events through the Kafka protocol [2]
      • application using the protocol can send or receive streaming events with specific topics [2]
      • requires updating the connection settings to use the Kafka endpoint provided in the Eventstream [2]
  • {feature} support runtime logs and data insights for the connector sources in Live View mode [3]
    • allows to examine detailed logs generated by the connector engines for the specific connector [3]
      • help with identifying failure causes or warnings [3]
      • ⇐ accessible in the bottom pane of an eventstream by selecting the relevant connector source node on the canvas in Live View mode [3]
  • {feature} support data insights for the connector sources in Live View mode [3]
  • {feature} integrates eventstreams with CI/CD tools
      • {benefit} developers can efficiently build and maintain eventstreams from end-to-end in a web-based environment, while ensuring source control and smooth versioning across projects [3]
  • {feature} REST APIs
    • allow to automate and manage eventstreams programmatically
      • {benefit} simplify CI/CD workflows and make it easier to integrate eventstreams with external applications [3]
  • {recommendation} use event streams feature with at least SKU: F4 [2]
  • {limitation} maximum message size: 1 MB [2]
  • {limitation} maximum retention period of event data: 90 days [2]
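
For intuition, the tumbling window described above corresponds in KQL (the language behind the Eventhouse destination) to grouping by bin(); the table and column names below are hypothetical, just a sketch of the concept:

```kusto
// tumbling window: fixed, nonoverlapping 10-second intervals per device
// (hypothetical table 'SensorReadings' with DeviceId, Timestamp, Value)
SensorReadings
| summarize AvgValue = avg(Value), EventCount = count()
    by DeviceId, bin(Timestamp, 10s)
```

Each event falls into exactly one interval; a hopping or sliding window would instead let events contribute to several overlapping intervals.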

References:
[1] Microsoft Learn (2024) Microsoft Fabric: Use real-time eventstreams in Microsoft Fabric [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Fabric Eventstream - overview [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What's new in Fabric event streams? [link]
[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Learn (2025) Use real-time eventstreams in Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Fabric Updates Blog (2024) CI/CD – Git Integration and Deployment Pipeline [link]
[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
API - Application Programming Interface
CDC - Change Data Capture
CI/CD  - Continuous Integration/Continuous Delivery
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SDK - Software Development Kit
SKU - Stock Keeping Unit

09 February 2025

🌌🏭KQL Reloaded: First Steps (Part XI: Window Functions)

Window functions are one of the more powerful features available in RDBMS, as they allow operating across several rows of a result set and returning a result for each row. The good news is that KQL supports several window functions that address a range of scenarios. However, the support is limited and needs improvement.

One of the basic scenarios that takes advantage of window functions is the creation of a running sum or of a (dense) rank across a whole dataset. For these purposes, in Kusto one can use row_cumsum for running sums, respectively row_number, row_rank_dense and row_rank_min for ranking. In addition, one can refer to the values of a field from the previous (prev) or next (next) record.

// rank window functions within dataset (not partitioned)
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| where CustomerKey in (12617, 12618)
| project CustomerKey, DateKey, ProductName, TotalCost
| sort by CustomerKey asc, DateKey asc 
| extend Rank1 = row_rank_dense(DateKey)
    , Rank2 = row_number(0)
    , Rank3 = row_rank_min(DateKey)
    , Sum = row_cumsum(TotalCost)
    , NextDate = next(DateKey)
    , PrevDate = prev(DateKey)

Often, one needs to operate only inside a partition rather than across the whole dataset. Some of the functions provide additional parameters for this:

// rank window functions within partitions
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| where CustomerKey in (12617, 12618)
| project CustomerKey, DateKey, ProductName, TotalCost
| sort by CustomerKey asc, DateKey asc 
| extend RowRank1 = row_rank_dense(DateKey, prev(CustomerKey) != CustomerKey)
    , RowRank2 = row_number(0, prev(CustomerKey) != CustomerKey)
    , Sum = row_cumsum(TotalCost, prev(CustomerKey) != CustomerKey)
    , NextDate = iif(CustomerKey == next(CustomerKey), next(DateKey), datetime(null))
    , PrevDate = iif(CustomerKey == prev(CustomerKey), prev(DateKey),  datetime(null))

In addition, the partitions can be defined explicitly via the partition operator:

// creating explicit partitions
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| where CustomerKey in (12617, 12618)
| project CustomerKey, DateKey, ProductName, TotalCost
| partition by CustomerKey
(
    order by DateKey asc
    | extend prev_cost = prev(TotalCost, 1)
)
| order by CustomerKey asc, DateKey asc
| extend DifferenceCost = TotalCost - prev_cost

It will be interesting to see whether Microsoft plans to introduce further window functions in KQL to bridge the gap. Experience has shown that such functions are quite useful in data analytics, and developers sometimes need to go to great lengths to achieve the respective behavior otherwise (see visual calculations in Power BI). For example, SQL Server provides several types of such functions, though it took Microsoft more than a few versions to make this progress. Moreover, developers can introduce their own functions, even if this involves .NET programming.

Happy coding!


References:
[1] Microsoft Learn (2024) Kusto: Window functions overview [link]

30 October 2022

💎SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part III: Ranking)

In two previous posts I showed how to use the newly introduced WINDOW clause in SQL Server 2022 for simple aggregations, respectively running totals, providing some historical context on what it took to do the same simple aggregations such as SUM or AVG in previous versions of SQL Server. Let's look at another scenario based on the previously created Sales.vSalesOrders view: ranking records within a partition.

There are four ranking functions that work across partitions: Row_Number, Rank, Dense_Rank and NTile. However, in SQL Server 2000 only Row_Number could be easily implemented, and only if a unique identifier existed (or one was created on the fly):

-- ranking based on correlated subquery (SQL Server 2000+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, (-- correlated subquery
  SELECT count(SRT.SalesOrderId)
  FROM Sales.vSalesOrders SRT
  WHERE SRT.ProductId = SOL.ProductId 
    AND SRT.[Year] = SOL.[Year]
	AND SRT.[Month] = SOL.[Month]
    AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) RowNumberByDate
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderDate ASC

As an alternative for implementing the other ranking functions, one could use a procedural language for looping, though this approach was not recommended given the performance concerns.

SQL Server 2005 introduced all four ranking functions, as they are still in use today:

-- ranking functions (SQL Server 2005+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- rankings
, Row_Number() OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) RowNumberQty
, Rank() OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) AS RankQty
, Dense_Rank() OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) AS DenseRankQty
, NTile(4) OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) AS NTileQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderQty DESC

Now, in SQL Server 2022 the WINDOW clause allows simplifying the query as follows by defining the partition only once:

-- ranking functions (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- rankings
, Row_Number() OVER SalesByMonth AS RowNumberQty
, Rank() OVER SalesByMonth AS RankQty
, Dense_Rank() OVER SalesByMonth AS DenseRankQty
, NTile(4) OVER SalesByMonth AS NTileQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC)
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderQty DESC

Forward (and backward) referencing of one window into the other can be used with ranking functions as well:
 
-- ranking functions with ascending/descending sorting (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- rankings (descending)
, Row_Number() OVER SalesByMonthSortedDESC AS DescRowNumberQty
, Rank() OVER SalesByMonthSortedDESC AS DescRankQty
, Dense_Rank() OVER SalesByMonthSortedDESC AS DescDenseRankQty
, NTile(4) OVER SalesByMonthSortedDESC AS DescNTileQty
-- rankings (ascending)
, Row_Number() OVER SalesByMonthSortedASC AS AscRowNumberQty
, Rank() OVER SalesByMonthSortedASC AS AscRankQty
, Dense_Rank() OVER SalesByMonthSortedASC AS AscDenseRankQty
, NTile(4) OVER SalesByMonthSortedASC AS AscNTileQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month])
, SalesByMonthSortedDESC AS (SalesByMonth ORDER BY SOL.OrderQty DESC)
, SalesByMonthSortedASC AS (SalesByMonth ORDER BY SOL.OrderQty ASC)
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderQty DESC

Happy coding!


💎SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part II: Running Totals)

In the previous post I showed how to use the newly introduced WINDOW clause in SQL Server 2022, providing some historical context on what it took to do the same simple aggregations such as SUM or AVG in previous versions of SQL Server. Let's look at another scenario based on the previously created Sales.vSalesOrders view: using the same two functions to provide running totals.

A running total is a summation of a sequence of values (e.g. OrderQty) ordered by a key, typically one or more columns that uniquely define the sequence, something like a unique record identifier (e.g. SalesOrderId). Moreover, the running total can be defined within a partition (e.g. Year/Month).

In SQL Server 2000, calculating the running total and average for a value within a partition came down to implementing the logic for each attribute in a correlated subquery. One thus needed one subquery per attribute, resulting in the base tables being processed multiple times. Even if the dataset was already in memory, this still involved major overhead.

-- running total/average based on correlated subquery (SQL Server 2000+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, (-- correlated subquery
  SELECT SUM(SRT.OrderQty)
  FROM Sales.vSalesOrders SRT
  WHERE SRT.ProductId = SOL.ProductId 
    AND SRT.[Year] = SOL.[Year]
	AND SRT.[Month] = SOL.[Month]
	AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) RunningTotalQty
, (-- correlated subquery
  SELECT AVG(SRT.OrderQty)
  FROM Sales.vSalesOrders SRT
  WHERE SRT.ProductId = SOL.ProductId 
    AND SRT.[Year] = SOL.[Year]
	AND SRT.[Month] = SOL.[Month]
	AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) RunningAvgQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.SalesOrderId 

Then SQL Server 2005 allowed consolidating the different correlated subqueries into one by using the CROSS APPLY operator:

-- running total/average based on correlated subquery with CROSS APPLY (SQL Server 2005+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SRT.RunningTotalQty
, SRT.RunningAvgQty
FROM Sales.vSalesOrders SOL
     CROSS APPLY (-- correlated subquery
	  SELECT SUM(SRT.OrderQty) RunningTotalQty
	  , AVG(SRT.OrderQty) RunningAvgQty
	  FROM Sales.vSalesOrders SRT
	  WHERE SRT.ProductId = SOL.ProductId 
		AND SRT.[Year] = SOL.[Year]
		AND SRT.[Month] = SOL.[Month]
		AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) SRT
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.SalesOrderId 

Even if SQL Server 2005 introduced window functions that work over a partition, an ORDER BY clause in the OVER definition was supported only starting with SQL Server 2012:

-- running total/average based on window functions (SQL Server 2012+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.SalesOrderId) RunningTotalQty
, AVG(SOL.OrderQty) OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.SalesOrderId) RunningAvgQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.SalesOrderId 

Now, in SQL Server 2022 the WINDOW clause allows simplifying the query as follows by defining the partition only once:

-- running total/average based on window functions and clause (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER SalesByMonth AS RunningTotalQty
, AVG(SOL.OrderQty) OVER SalesByMonth AS RunningAvgQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.SalesOrderId)
ORDER BY SOL.SalesOrderId 

Moreover, the WINDOW clause allows forward (and backward) referencing of one window into the other:

-- running total/average based on window functions and clause (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- simple aggregations
, SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQty
, AVG(SOL.OrderQty) OVER SalesByMonth AS AvgQty
-- running totals
, SUM(SOL.OrderQty) OVER SalesByMonthSorted AS RunningTotalQty
, AVG(SOL.OrderQty) OVER SalesByMonthSorted AS RunningAvgQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month])
, SalesByMonthSorted AS (SalesByMonth ORDER BY SOL.SalesOrderId)
ORDER BY SOL.SalesOrderId 

Happy coding!

💎🏭SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part I: Simple Aggregations)

Among the many new features introduced in SQL Server 2022, Microsoft brought the WINDOW clause for defining the partitioning and ordering of a dataset used by window functions. But before unveiling the change, let's take a walk down memory lane.

In the early days of SQL Server, many descriptive statistics came down to computing totals and averages over a dataset with more or less complex logic. For example, to get the total and average sales per month, a query based on the AdventureWorks database would look something like:

-- aggregated sales orders - detailed (SQL Server 2000+)
SELECT SOL.ProductID
, Year(SOH.OrderDate) [Year]
, Month(SOH.OrderDate) [Month]
, SUM(SOL.OrderQty) TotalQty
, AVG(SOL.OrderQty) AverageQty
FROM Sales.SalesOrderDetail SOL
     JOIN Sales.SalesOrderHeader SOH
	   ON SOL.SalesOrderID = SOH.SalesOrderID
WHERE SOL.ProductId IN (745)
  AND Year(SOH.OrderDate) = 2012
  AND Month(SOH.OrderDate) BETWEEN 1 AND 3
GROUP BY SOL.ProductID
, Year(SOH.OrderDate) 
, Month(SOH.OrderDate)

When possible, the base logic was encapsulated within a view, hiding the query's complexity from the users and also allowing reuse and better maintenance, just to mention a few of the benefits of this approach:

-- encapsulating the logic into a view (SQL Server 2000+)
CREATE OR ALTER VIEW Sales.vSalesOrders
AS
-- sales orders details
SELECT SOL.SalesOrderID
, SOL.ProductID
, Cast(SOH.OrderDate as Date) OrderDate
, Year(SOH.OrderDate) [Year]
, Month(SOH.OrderDate) [Month]
, Cast(SOL.OrderQty as decimal(18,2)) OrderQty
-- many more columns 
FROM Sales.SalesOrderDetail SOL
     JOIN Sales.SalesOrderHeader SOH
	   ON SOL.SalesOrderID = SOH.SalesOrderID
/* -- some constraints can be brought directly into the view
WHERE SOL.ProductID IN (745)
  AND Year(SOH.OrderDate) = 2012
  AND Month(SOH.OrderDate) BETWEEN 1 AND 3
*/

The above query now becomes:
 
-- aggregated sales orders - simplified (SQL Server 2000+)
SELECT SOL.ProductID
, SOL.[Year]
, SOL.[Month]
, SUM(SOL.OrderQty) TotalQty
, AVG(SOL.OrderQty) AverageQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
GROUP BY SOL.ProductID
, SOL.[Year]
, SOL.[Month]

In many cases, performing this kind of aggregations was all the users wanted. However, quite often it's useful to look at the individual sales and consider them as part of the broader picture. For example, it would be interesting to know what percentage an individual value represents of the total (see OrderQtyPct) or any similar aggregate information. In SQL Server 2000, addressing such requirements involved joining the view with itself - one query with the detailed records, while the other contained the aggregated data:

-- sales orders from the totals (SQL Server 2000+)
SELECT SOL.*
, ASO.TotalQty
, Cast(CASE WHEN ASO.TotalQty <> 0 THEN 100*SOL.OrderQty/ASO.TotalQty ELSE 0 END as decimal(18,2)) OrderQtyPct
, ASO.AverageQty
FROM Sales.vSalesOrders SOL
     JOIN ( -- aggregated sales orders
		SELECT SOL.ProductID
		, SOL.[Year]
		, SOL.[Month]
		, SUM(SOL.OrderQty) TotalQty
		, AVG(SOL.OrderQty) AverageQty
		FROM Sales.vSalesOrders SOL
		WHERE SOL.ProductID IN (745)
		  AND SOL.[Year] = 2012
		  AND SOL.[Month] BETWEEN 1 AND 3
		GROUP BY SOL.ProductId
		, SOL.[Year]
		, SOL.[Month]
	) ASO
	ON SOL.ProductID = ASO.ProductID 
   AND SOL.[Year] = ASO.[Year]
   AND SOL.[Month] = ASO.[Month]
ORDER BY SOL.ProductID
, SOL.OrderDate

Now, just imagine that you couldn't create a view and had to duplicate the logic each time the view was needed! Aggregating the data at different levels would require similar joins to the same view or piece of logic.

Fortunately, SQL Server 2005 came with two long-awaited features, common table expressions (CTEs) and window functions, which made a big difference. First, CTEs allowed defining inline views that can be referenced multiple times. Second, window functions allowed aggregations within a partition.
 
-- sales orders with CTE (SQL Server 2005+)
WITH CTE
AS (--sales orders in scope
SELECT SOL.SalesOrderID 
, SOL.ProductID
, Cast(SOH.OrderDate as Date) OrderDate
, Year(SOH.OrderDate) [Year]
, Month(SOH.OrderDate) [Month]
, SOL.OrderQty
FROM Sales.SalesOrderDetail SOL
     JOIN Sales.SalesOrderHeader SOH
	   ON SOL.SalesOrderID = SOH.SalesOrderID
WHERE SOL.ProductID IN (745)
  AND Year(SOH.OrderDate) = 2012
  AND Month(SOH.OrderDate) BETWEEN 1 AND 3
)
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) TotalQty
, AVG(SOL.OrderQty)  OVER(PARTITION BY SOL.[Year], SOL.[Month]) AverageQty
FROM CTE SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3

Structuring queries this way separates the base logic for better maintainability, readability and fast prototyping. Once the logic in the CTE becomes stable, it can be moved into a view, after which the CTE reference is replaced with the view's name in the final query. On the other side, window functions allow writing more flexible and complex code, even if the statements occasionally become complex (see OrderQtyPct):
 
-- aggregated sales orders (SQL Server 2005+)
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) TotalQty
, AVG(SOL.OrderQty)  OVER(PARTITION BY SOL.[Year], SOL.[Month]) AverageQty
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER(PARTITION BY SOL.[Year], SOL.[Month]) ELSE 0 END as decimal(18,2)) OrderQtyPct
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3

Now, SQL Server 2022 allows simplifying the logic further: a window can be defined and named at the end of the statement, and the name referenced in several window functions (see the last line of the query):

-- Aggregations per month (SQL Server 2022+)
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQty
, AVG(SOL.OrderQty) OVER SalesByMonth AS AverageQty
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByMonth <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByMonth ELSE 0 END as decimal(18,2)) OrderQtyPct
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.[Year], SOL.[Month])

Moreover, several windows can be defined. For example, aggregating the data also per year:

-- Aggregations per month and year (SQL Server 2022+) 
SELECT SOL.SalesOrderID 
, SOL.ProductID
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, SUM(SOL.OrderQty) OVER SalesByMonth AS TotalQtyByMonth
, AVG(SOL.OrderQty) OVER SalesByMonth AS AverageQtyByMonth
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByMonth <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByMonth ELSE 0 END as decimal(18,2)) OrderQtyPctByMonth
, SUM(SOL.OrderQty) OVER SalesByYear AS TotalQtyByYear
, AVG(SOL.OrderQty) OVER SalesByYear AS AverageQtyByYear
, Cast(CASE WHEN SUM(SOL.OrderQty) OVER SalesByYear <> 0 THEN 100*SOL.OrderQty/SUM(SOL.OrderQty) OVER SalesByYear ELSE 0 END as decimal(18,2)) OrderQtyPctByYear
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.[Year], SOL.[Month])
, SalesByYear AS  (PARTITION BY SOL.[Year])

Isn't that cool? It will probably take some time to get used to writing and reading this kind of queries, though using descriptive names for the windows should facilitate the process!

As a side note, even if there's no PRODUCT function like in Excel, there's a mathematical trick to transform a product into a sum of elements by applying the EXP (exponential) and LOG (logarithm) functions (see example).
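
A minimal sketch of the trick, reusing the same view; note it assumes OrderQty is strictly positive, since LOG is undefined for zero or negative values:

```sql
-- product via EXP(SUM(LOG(x))): the logarithms turn the product into a sum,
-- and the exponential converts the summed logs back into the product
SELECT SOL.ProductID
, SOL.[Year]
, SOL.[Month]
, EXP(SUM(LOG(SOL.OrderQty))) ProductQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductID IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
GROUP BY SOL.ProductID
, SOL.[Year]
, SOL.[Month]
```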

Notes:
The queries also work in a SQL database in Microsoft Fabric. Just replace the Sales schema with the SalesLT schema (see post, respectively the GitHub repository with the changed code).

Happy coding!
