Window functions are one of the powerful features available in RDBMS as they allow to operate across several lines or a result set and return a result for each line. The good news is that KQL supports several window functions, which allow to address several scenarios. However, the support is limited and needs improvement.
One of the basic scenarios that takes advantage of windows functions is the creation of a running sum or creating a (dense) rank across a whole dataset. For this purposes, in Kusto one can use the row_cumsum for running sum, respectively the row_number, row_rank_dense and row_rank_min for ranking. In addition, one can refer to the values of a field from previous (prev) and next record.
// rank window functions within dataset (not partitioned) NewSales | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201"))) | where CustomerKey in (12617, 12618) | project CustomerKey, DateKey, ProductName, TotalCost | sort by CustomerKey asc, DateKey asc | extend Rank1 = row_rank_dense(DateKey), Rank2 = row_number(0), Rank3 = row_rank_min(DateKey) , Sum = row_cumsum(TotalCost) , NextDate = next(DateKey) , PrevDate = prev(DateKey)
Often, it's needed to operate only inside of a partition and not across the whole dataset. Some of the functions provide additional parameters for this:
// rank window functions within partitions NewSales | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201"))) | where CustomerKey in (12617, 12618) | project CustomerKey, DateKey, ProductName, TotalCost | sort by CustomerKey asc, DateKey asc | extend RowRank1 = row_rank_dense(DateKey, prev(CustomerKey) != CustomerKey) , RowRank2 = row_number(0, prev(CustomerKey) != CustomerKey) , Sum = row_cumsum(TotalCost, prev(CustomerKey) != CustomerKey) , NextDate = iif(CustomerKey == next(CustomerKey), next(DateKey), datetime(null)) , PrevDate = iif(CustomerKey == prev(CustomerKey), prev(DateKey), datetime(null))
In addition, the partitions can be defined explicitly via the partition operator:
// creating explicit partitions NewSales | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201"))) | where CustomerKey in (12617, 12618) | project CustomerKey, DateKey, ProductName, TotalCost | partition by CustomerKey ( order by DateKey asc | extend prev_cost = prev(TotalCost, 1) ) | order by CustomerKey asc, DateKey asc | extend DifferenceCost = TotalCost - prev_cost
It will be interesting to see whether Microsoft plans for the introduction of further window functions in KQL to bridge the gap. The experience proved that such functions are quite useful in data analytics, sometimes developers needing to go a long extent for achieving the respective behavior (see visual calcs in Power BI). For example, SQL Server leverages several types of such functions, though it took Microsoft more than several versions to this make progress. More over, developers can introduce their own functions, even if this involves .Net programming.
Happy coding!
Previous Post <<||>> Next Post
References:
[1] Microsoft Learn (2024) Kusto: Window functions overview [link]