09 February 2025

🌌🏭KQL Reloaded: First Steps (Part XI: Window Functions)

Window functions are one of the powerful features available in RDBMS as they allow to operate across several lines or a result set and return a result for each line. The good news is that KQL supports several window functions, which allow to address several scenarios. However, the support is limited and needs improvement.

One of the basic scenarios that takes advantage of windows functions is the creation of a running sum or creating a (dense) rank across a whole dataset. For this purposes, in Kusto one can use the row_cumsum for running sum, respectively the row_number, row_rank_dense and row_rank_min for ranking. In addition, one can refer to the values of a field from previous (prev) and next record. 

// rank window functions within dataset (not partitioned)
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| where CustomerKey in (12617, 12618)
| project CustomerKey, DateKey, ProductName, TotalCost
| sort by CustomerKey asc, DateKey asc 
| extend Rank1 = row_rank_dense(DateKey)
    , Rank2 = row_number(0)
, Rank3 = row_rank_min(DateKey) , Sum = row_cumsum(TotalCost) , NextDate = next(DateKey) , PrevDate = prev(DateKey)

Often, it's needed to operate only inside of a partition and not across the whole dataset. Some of the functions provide additional parameters for this:

// rank window functions within partitions
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| where CustomerKey in (12617, 12618)
| project CustomerKey, DateKey, ProductName, TotalCost
| sort by CustomerKey asc, DateKey asc 
| extend RowRank1 = row_rank_dense(DateKey, prev(CustomerKey) != CustomerKey)
    , RowRank2 = row_number(0, prev(CustomerKey) != CustomerKey)
    , Sum = row_cumsum(TotalCost, prev(CustomerKey) != CustomerKey)
    , NextDate = iif(CustomerKey == next(CustomerKey), next(DateKey), datetime(null))
    , PrevDate = iif(CustomerKey == prev(CustomerKey), prev(DateKey),  datetime(null))

In addition, the partitions can be defined explicitly via the partition operator:

// creating explicit partitions
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| where CustomerKey in (12617, 12618)
| project CustomerKey, DateKey, ProductName, TotalCost
| partition by CustomerKey
(
    order by DateKey asc
    | extend prev_cost = prev(TotalCost, 1)
)
| order by CustomerKey asc, DateKey asc
| extend DifferenceCost = TotalCost - prev_cost

It will be interesting to see whether Microsoft plans for the introduction of further window functions in KQL to bridge the gap. The experience proved that such functions are quite useful in data analytics, sometimes developers needing to go a long extent for achieving the respective behavior (see visual calcs in Power BI). For example, SQL Server leverages several types of such functions, though it took Microsoft more than several versions to this make progress. More over, developers can introduce their own functions, even if this involves .Net programming. 

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Kusto: Window functions overview [link]

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.