In a standard scenario there is a fact table and multiple dimension table (see previous post), though one can look at the same data from multiple perspectives. In KQL it's recommended to start with the fact table, however in some reports one needs records from the dimension table independently whether there are any records in the fact tale.
For example, it would be useful to show all the products, independently whether they were sold or not. It's what the below query does via a RIGHT JOIN between the fact table and the Customer dimension:
// totals by customer via right join NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) by CustomerKey | join kind=rightouter ( Customers | where RegionCountryName in ('Canada', 'Australia') | project CustomerKey, RegionCountryName, CustomerName = strcat(FirstName, ' ', LastName) ) on CustomerKey | project RegionCountryName, CustomerName, TotalCost //| summarize record_count = count() | order by CustomerName asc
The Product details can be added then via a LEFT JOIN between the fact table and the Product dimension:
// total by customer via right join with product information NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) , FirstPurchaseDate = min(DateKey) , LastPurchaseDate = max(DateKey) by CustomerKey, ProductKey | join kind=rightouter ( Customers | where RegionCountryName in ('Canada', 'Australia') | project CustomerKey, RegionCountryName, CustomerName = strcat(FirstName, ' ', LastName) ) on CustomerKey | join kind=leftouter ( Products | where ProductCategoryName == 'TV and Video' | project ProductKey, ProductName ) on ProductKey | project RegionCountryName, CustomerName, ProductName, TotalCost, FirstPurchaseDate, LastPurchaseDate, record_count //| where record_count>1 // multiple purchases | order by CustomerName asc, ProductName asc
These kind of queries need adequate validation and for this it might be needed to restructure the queries.
// validating the multiple records (defailed) NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | lookup Products on ProductKey | lookup Customers on CustomerKey | where FirstName == 'Alexandra' and LastName == 'Sanders' | project CustomerName = strcat(FirstName, ' ', LastName), ProductName, TotalCost, DateKey, ProductKey, CustomerKey // validating the multiple records against the fact table (detailed) NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | where CustomerKey == 14912
Mixing RIGHT and LEFT joins in this way increases the complexity of the queries and sometimes comes with a burden for validating the logic. In SQL one could prefer to start with the Customer table, add the summary data and the other dimensions. In this way one can do a total count individually starting with the Customer table and adding each join which reviewing the record count for each change.
In special scenario instead of a RIGHT JOIN one could use a FULL JOIN and add the Customers without any orders via a UNION. In some scenarios this approach can offer even a better performance. For this approach, one needs to get the Customers without Orders via an anti-join, more exactly a rightanti:
// customers without orders NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) , FirstPurchaseDate = min(DateKey) , LastPurchaseDate = max(DateKey) by CustomerKey, ProductKey | join kind=rightanti ( Customers | project CustomerKey, RegionCountryName, CustomerName = strcat(FirstName, ' ', LastName) ) on CustomerKey | project RegionCountryName, CustomerName, ProductName = '', TotalCost = 0 //| summarize count() | order by CustomerName asc, ProductName asc
And now, joining the Customers with orders with the ones without orders gives an overview of all the customers:
// total by product via lookup with table alias NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) by CustomerKey //, ProductKey //| lookup Products on ProductKey | lookup Customers on CustomerKey | project RegionCountryName , CustomerName = strcat(FirstName, ' ', LastName) //, ProductName , TotalCost //| summarize count() | union withsource=SourceTable kind=outer ( // customers without orders NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) , FirstPurchaseDate = min(DateKey) , LastPurchaseDate = max(DateKey) by CustomerKey //, ProductKey | join kind=rightanti ( Customers | project CustomerKey , RegionCountryName , CustomerName = strcat(FirstName, ' ', LastName) ) on CustomerKey | project RegionCountryName , CustomerName //, ProductName = '' , TotalCost = 0 //| summarize count() ) //| summarize count() | order by CustomerName asc //, ProductName asc
And, of course, the number of records returned by the three queries must match. The information related to the Product were left out for this version, though it can be added as needed. Unfortunately, there are queries more complex than this, which makes the queries more difficult to read, understand and troubleshoot. Inline views could be useful to structure the logic as needed.
let T_customers_with_orders = view () { NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) by CustomerKey //, ProductKey //, DateKey //| lookup Products on ProductKey | lookup Customers on CustomerKey | project RegionCountryName , CustomerName = strcat(FirstName, ' ', LastName) //, ProductName //, ProductCategoryName , TotalCost //, DateKey }; let T_customers_without_orders = view () { // customers without orders NewSales | where SalesAmount <> 0 and ProductCategoryName == 'TV and Video' | where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01) | summarize record_count = count() , TotalCost = sum(TotalCost) , FirstPurchaseDate = min(DateKey) , LastPurchaseDate = max(DateKey) by CustomerKey //, ProductKey | join kind=rightanti ( Customers | project CustomerKey , RegionCountryName , CustomerName = strcat(FirstName, ' ', LastName) ) on CustomerKey | project RegionCountryName , CustomerName //, ProductName = '' , TotalCost = 0 //| summarize count() }; T_customers_with_orders | union withsource=SourceTable kind=outer T_customers_without_orders | summarize count()
In this way the queries should be easier to troubleshoot and restructure the logic in more manageable pieces.
It would be useful to save the definition of the view (aka stored view), however it's not possible to create views on the machine on which the tests are run:
"Principal 'aaduser=...' is not authorized to write database 'ContosoSales'."
Happy coding!