In a standard scenario there is a fact table and multiple dimension table (see
previous post), though one can look at the same data from multiple perspectives. In KQL
it's recommended to start with the fact table, however in some reports one
needs records from the dimension table independently whether there are any
records in the fact tale.
For example, it would be useful to show all the products, independently
whether they were sold or not. It's what the below query does via a RIGHT JOIN
between the fact table and the Customer dimension:
// totals by customer via right join
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost) by CustomerKey
| join kind=rightouter (
Customers
| where RegionCountryName in ('Canada', 'Australia')
| project CustomerKey, RegionCountryName, CustomerName = strcat(FirstName, ' ', LastName)
)
on CustomerKey
| project RegionCountryName, CustomerName, TotalCost
//| summarize record_count = count()
| order by CustomerName asc
The Product details can be added then via a LEFT JOIN between the fact
table and the Product dimension:
// total by customer via right join with product information
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost)
, FirstPurchaseDate = min(DateKey)
, LastPurchaseDate = max(DateKey) by CustomerKey, ProductKey
| join kind=rightouter (
Customers
| where RegionCountryName in ('Canada', 'Australia')
| project CustomerKey, RegionCountryName, CustomerName = strcat(FirstName, ' ', LastName)
)
on CustomerKey
| join kind=leftouter (
Products
| where ProductCategoryName == 'TV and Video'
| project ProductKey, ProductName
)
on ProductKey
| project RegionCountryName, CustomerName, ProductName, TotalCost, FirstPurchaseDate, LastPurchaseDate, record_count
//| where record_count>1 // multiple purchases
| order by CustomerName asc, ProductName asc
These kind of queries need adequate validation and for this it might be needed
to restructure the queries.
// validating the multiple records (defailed)
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| lookup Products on ProductKey
| lookup Customers on CustomerKey
| where FirstName == 'Alexandra' and LastName == 'Sanders'
| project CustomerName = strcat(FirstName, ' ', LastName), ProductName, TotalCost, DateKey, ProductKey, CustomerKey
// validating the multiple records against the fact table (detailed)
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| where CustomerKey == 14912
Mixing RIGHT and LEFT joins in this way increases the complexity of the
queries and sometimes comes with a burden for validating the logic. In SQL one could
prefer to start with the Customer table, add the summary data and the other
dimensions. In this way one can do a total count individually starting with
the Customer table and adding each join which reviewing the record count for
each change.
In special scenario instead of a RIGHT JOIN one could use a FULL JOIN and add
the Customers without any orders via a UNION. In some scenarios this approach
can offer even a better performance. For this approach, one needs to get the
Customers without Orders via an anti-join, more exactly a rightanti:
// customers without orders
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost)
, FirstPurchaseDate = min(DateKey)
, LastPurchaseDate = max(DateKey) by CustomerKey, ProductKey
| join kind=rightanti (
Customers
| project CustomerKey, RegionCountryName, CustomerName = strcat(FirstName, ' ', LastName)
)
on CustomerKey
| project RegionCountryName, CustomerName, ProductName = '', TotalCost = 0
//| summarize count()
| order by CustomerName asc, ProductName asc
And now, joining the Customers with orders with the ones without orders gives an overview of all the customers:
// total by product via lookup with table alias
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost) by CustomerKey //, ProductKey
//| lookup Products on ProductKey
| lookup Customers on CustomerKey
| project RegionCountryName
, CustomerName = strcat(FirstName, ' ', LastName)
//, ProductName
, TotalCost
//| summarize count()
| union withsource=SourceTable kind=outer (
// customers without orders
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost)
, FirstPurchaseDate = min(DateKey)
, LastPurchaseDate = max(DateKey) by CustomerKey
//, ProductKey
| join kind=rightanti (
Customers
| project CustomerKey
, RegionCountryName
, CustomerName = strcat(FirstName, ' ', LastName)
)
on CustomerKey
| project RegionCountryName
, CustomerName
//, ProductName = ''
, TotalCost = 0
//| summarize count()
)
//| summarize count()
| order by CustomerName asc
//, ProductName asc
And, of course, the number of records returned by the three queries must match. The information related to the Product were left out for this version, though it can be added as needed. Unfortunately, there are queries more complex than this, which makes the queries more difficult to read, understand and troubleshoot. Inline views could be useful to structure the logic as needed.
let T_customers_with_orders = view () {
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost)
by CustomerKey
//, ProductKey
//, DateKey
//| lookup Products on ProductKey
| lookup Customers on CustomerKey
| project RegionCountryName
, CustomerName = strcat(FirstName, ' ', LastName)
//, ProductName
//, ProductCategoryName
, TotalCost
//, DateKey
};
let T_customers_without_orders = view () {
// customers without orders
NewSales
| where SalesAmount <> 0 and ProductCategoryName == 'TV and Video'
| where DateKey >=date(2023-02-01) and DateKey < datetime(2023-03-01)
| summarize record_count = count()
, TotalCost = sum(TotalCost)
, FirstPurchaseDate = min(DateKey)
, LastPurchaseDate = max(DateKey) by CustomerKey
//, ProductKey
| join kind=rightanti (
Customers
| project CustomerKey
, RegionCountryName
, CustomerName = strcat(FirstName, ' ', LastName)
)
on CustomerKey
| project RegionCountryName
, CustomerName
//, ProductName = ''
, TotalCost = 0
//| summarize count()
};
T_customers_with_orders
| union withsource=SourceTable kind=outer T_customers_without_orders
| summarize count()
In this way the queries should be easier to troubleshoot and restructure the logic in more manageable pieces.
It would be useful to save the definition of the view (aka stored view), however it's not possible to create views on the machine on which the tests are run:
"Principal 'aaduser=...' is not authorized to write database 'ContosoSales'."
Happy coding!
Previous Post
<<||>>
Next Post