One of the great features of KQL is the possibility of translating SQL code to KQL via the "explain" keyword, allowing thus to port SQL code to KQL, respectively help translate knowledge from one programming language to another.
Let's start with a basic example:
// transform SQL to KQL code (to be run only the first part from --) -- explain SELECT top(10) CustomerKey, FirstName, LastName, CityName, CompanyName FROM Customers ORDER BY CityName DESC // output: translated KQL code Customers | project CustomerKey, FirstName, LastName, CityName, CompanyName | sort by CityName desc nulls first | take int(10)
The most interesting part of the translation is how "explain" translate joins from SQL to KQL. Let's start with a FULL JOIN from the set of patterns considered in a previous post on SQL joins:
-- explain SELECT CST.CustomerKey , CST.FirstName + ' ' + CST.LastName CustomerName , Cast(SAL.DateKey as Date) DateKey , SAL.TotalCost FROM NewSales SAL JOIN Customers CST ON SAL.CustomerKey = CST.CustomerKey WHERE SAL.DateKey > '20240101' AND SAL.DateKey < '20240201' ORDER BY CustomerName, DateKey, TotalCost DESC
And, here's the translation:
// translated code NewSales | project-rename ['SAL.DateKey']=DateKey | join kind=inner (Customers | project-rename ['CST.CustomerKey']=CustomerKey , ['CST.CityName']=CityName , ['CST.CompanyName']=CompanyName , ['CST.ContinentName']=ContinentName , ['CST.Education']=Education , ['CST.FirstName']=FirstName , ['CST.Gender']=Gender , ['CST.LastName']=LastName , ['CST.MaritalStatus']=MaritalStatus , ['CST.Occupation']=Occupation , ['CST.RegionCountryName']=RegionCountryName , ['CST.StateProvinceName']=StateProvinceName) on ($left.CustomerKey == $right.['CST.CustomerKey']) | where ((['SAL.DateKey'] > todatetime("20240101")) and (['SAL.DateKey'] < todatetime("20240201"))) | project ['CST.CustomerKey']
, CustomerName=__sql_add(__sql_add(['CST.FirstName'] , " "), ['CST.LastName']) , DateKey=['SAL.DateKey'] , TotalCost | sort by CustomerName asc nulls first , DateKey asc nulls first , TotalCost desc nulls first | project-rename CustomerKey=['CST.CustomerKey']
The code was slightly formatted to facilitated its reading. Unfortunately, the tool doesn't work well with table aliases, introduces also all the fields available from the dimension table, which can become a nightmare for the big dimension tables, the concatenation seems strange, and if one looks deeper, further issues can be identified. So, the challenge is how to write a query in SQL so it can minimize the further changed in QKL.
Probably, one approach is to write the backbone of the query in SQL and add the further logic after translation.
-- explain SELECT NewSales.CustomerKey , NewSales.DateKey , NewSales.TotalCost FROM NewSales INNER JOIN Customers ON NewSales.CustomerKey = Customers.CustomerKey WHERE DateKey > '20240101' AND DateKey < '20240201' ORDER BY NewSales.CustomerKey , NewSales.DateKey
And the translation looks simpler:
// transformed query NewSales | join kind=inner (Customers | project-rename ['Customers.CustomerKey']=CustomerKey , ['Customers.CityName']=CityName , ['Customers.CompanyName']=CompanyName , ['Customers.ContinentName']=ContinentName , ['Customers.Education']=Education , ['Customers.FirstName']=FirstName , ['Customers.Gender']=Gender , ['Customers.LastName']=LastName , ['Customers.MaritalStatus']=MaritalStatus , ['Customers.Occupation']=Occupation , ['Customers.RegionCountryName']=RegionCountryName , ['Customers.StateProvinceName']=StateProvinceName) on ($left.CustomerKey == $right.['Customers.CustomerKey']) | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201"))) | project CustomerKey, DateKey, TotalCost | sort by CustomerKey asc nulls first , DateKey asc nulls first
// transformed final query NewSales | where (DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")) | join kind=inner ( Customers | project CustomerKey , FirstName , LastName ) on $left.CustomerKey == $right.CustomerKey | project CustomerKey , CustomerName = strcat(FirstName, ' ', LastName) , DateKey , TotalCost | sort by CustomerName asc nulls first , DateKey asc nulls first
So, it makes sense to create the backbone of a query, translate it to KQL via explain, remove the unnecessary columns and formatting, respectively add what's missing. Once the patterns were mastered, there's probably no need to use the translation tool, but could prove to be also some exceptions. Anyway, the translation tool helps considerably in learning. Big kudos for the development team!
Notes:
1) The above queries ignore the fact that Customer information is available also in the NewSales table, making thus the joins obsolete. The joins were considered only for exemplification purposes. Similar joins might be still needed for checking the "quality" of the data (e.g. for dimensions that change over time). Even if such "surprises" shouldn't appear by design, real life designs continue to surprise...
2) Queries should be less verbose by design (aka simplicity by design)! The more unnecessary code is added, the higher the chances for errors to be overseen, respectively the more time is needed to understand and validated the queries!
Happy coding!
Previous Post <<||>> Next Post
References
[1] Microsoft Lear n (2024) Azure: Query data using T-SQL [link]