08 February 2025

🌌🏭KQL Reloaded: First Steps (Part X: Translating SQL to KQL - Correlated Subqueries)

In SQL Server and other RDBMS databases there are many  scenarios in which one needs information from a fact table based on a dimension table without requiring information from the dimension table. 

Correlated Subquery via EXISTS

Before considering the main example, let's start with a simple subquery:

// subquery in SQL
--
explain
SELECT CustomerKey
, ProductKey
, ProductName
FROM NewSales 
WHERE ProductKey IN (
    SELECT DISTINCT ProductKey
    FROM Products 
    WHERE ProductSubcategoryName = 'MP4&MP3'
    ) 

// subquery in KQL
NewSales
| where ProductKey in (
    (Products
    | where (ProductSubcategoryName == "MP4&MP3")
    | project ProductKey
    | distinct *))
| project CustomerKey, ProductKey, ProductName

Of course, the ProductKey is unique by design, though there can be dimension, fact tables or subqueries in which the value is not unique.

Now let's consider the correlated subquery pattern, which should provide the same outcome as above, though in RDBMS there are scenarios in which it provides better performance, especially when the number of values from subquery is high.

// correlated subquery in SQL
--
explain
SELECT CustomerKey
, ProductKey
, ProductName
FROM NewSales 
WHERE EXISTS (
    SELECT Products.ProductKey
    FROM Products 
    WHERE NewSales.ProductKey = Products.ProductKey)

Unfortunately, trying to translate the code via explain leads to the following error, which confirms that the syntax is not supported in KQL (see [1]):

"Error: Reference to missing column 'NewSales.ProductKey'"

 Fortunately, in this case one can use the first version of the query. 

Correlated Subquery via CROSS APPLY

Before creating the main query, let's look at the inner query and check whether it gets correctly translate to KQL:

// subquery logic
--
explain
SELECT sum(TotalCost) TotalCost 
FROM NewSales 
WHERE DateKey > '20240101' and DateKey <'20240201'

Now, let's bring the logic within the CROSS APPLY:

// correlated subquery in SQL
--
explain
SELECT ProductKey
, ProductName
, TotalCost
FROM Products
    CROSS APPLY (
        SELECT sum(TotalCost) TotalCost 
        FROM NewSales 
        WHERE DateKey > '20240101' and DateKey <'20240201'
          AND Products.ProductKey = NewSales.ProductKey
    ) DAT

Running the above code leads to the following error:

"Sql node of type 'Microsoft.SqlServer.TransactSql.ScriptDom.UnqualifiedJoin' is not implemented"

Unfortunately, many SQL queries are written following this pattern, especially when an OUTER CROSS APPLY is used, retrieving thus all the records from the dimension table. 

In this case one can rewrite the query via a RIGHT JOIN:

// correlated subquery in SQL
--
explain
SELECT PRD.ProductKey
, PRD.ProductName
, SAL.TotalCost
FROM Products PRD
    LEFT JOIN (
        SELECT ProductKey
        , sum(TotalCost) TotalCost 
        FROM NewSales 
        WHERE DateKey > '20240101' and DateKey <'20240201'
        GROUP BY ProductKey
    ) SAL
      ON PRD.ProductKey = SAL.ProductKey

// direct translation of the query
Products
| join kind=leftouter 
    (NewSales
        | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
        | summarize TotalCost=sum(TotalCost) by ProductKey
        | project ProductKey, TotalCost
    ) on ($left.ProductKey == $right.ProductKey)
| project ProductKey, ProductName, TotalCost
//| where isnull(TotalCost)
//| summarize record_number = count()


// query after restructuring
NewSales
| where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
| summarize TotalCost=sum(TotalCost) by ProductKey
| join kind=rightouter
    (
        Products
        | project ProductKey, ProductName
    ) on ($left.ProductKey == $right.ProductKey)
| project ProductKey, ProductName, TotalCost
//| where isnull(TotalCost)
//| summarize record_number = count()

During transformations it's important to check whether the number of records changes between the various versions of the query (including the most general version in which filtering constraints were applied).

Especially when SQL solutions are planned to be migrated to KQL, it's important to know which query patterns can be used in KQL. 

Happy coding!

Previous Post <<||>>  Next Post

References:
[1] GitHib (2024) Module Ex-01 - Advanced KQL [link]

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.