Let's consider two more tables involving aggregations on Sales orders and On-hand:
-- products analysis (via JOINs) SELECT ITM.ProductNumber , ITM.Name , IsNull(OHD.OnHand, 0) OnHand , IsNull(SOL.SalesQty, 0) SalesQty , IsNull(POL.PurchQty, 0) PurchQty FROM [Production].[Product] ITM LEFT JOIN ( -- cumulated on hand SELECT OHD.ProductId , SUM(OHD.Quantity) OnHand , 0 SalesQty , 0 PurchQty FROM [Production].[ProductInventory] OHD GROUP BY OHD.ProductId ) OHD ON ITM.ProductId = OHD.ProductId LEFT JOIN ( -- cumulated sales orders SELECT SOL.ProductId , 0 OnHand , SUM(SOL.OrderQty) SalesQty , 0 PurchQty FROM [Sales].[SalesOrderDetail] SOL GROUP BY SOL.ProductId ) SOL ON ITM.ProductID = SOL.ProductId LEFT JOIN (-- cumulated open purchase orders SELECT POL.ProductId , 0 OnHand , 0 SalesQty , SUM(POL.OrderQty) PurchQty FROM Purchasing.PurchaseOrderDetail POL WHERE OrderQty - (ReceivedQty - RejectedQty)>0 --AND POL.DueDate BETWEEN '2014-01-01' AND '2014-12-31' GROUP BY POL.ProductId ) POL ON ITM.ProductId = POL.ProductId WHERE COALESCE(OHD.ProductId, SOL.ProductId, POL.ProductId) IS NOT NULL ORDER BY ITM.ProductNumber
-- products analysis (via CTEs) WITH OnHand AS ( -- cumulated on hand SELECT OHD.ProductId , SUM(OHD.Quantity) OnHand , 0 SalesQty , 0 PurchQty FROM [Production].[ProductInventory] OHD GROUP BY OHD.ProductId ) , SalesOrders AS ( -- cumulated sales orders SELECT SOL.ProductId , 0 OnHand , SUM(SOL.OrderQty) SalesQty , 0 PurchQty FROM [Sales].[SalesOrderDetail] SOL GROUP BY SOL.ProductId ) , OpenPOs AS ( -- cumulated open purchase orders SELECT POL.ProductId , 0 OnHand , 0 SalesQty , SUM(POL.OrderQty) PurchQty FROM Purchasing.PurchaseOrderDetail POL WHERE OrderQty - (ReceivedQty - RejectedQty)>0 --AND POL.DueDate BETWEEN '2014-01-01' AND '2014-12-31' GROUP BY POL.ProductId ) SELECT ITM.ProductNumber , ITM.Name , IsNull(OHD.OnHand, 0) OnHand , IsNull(SOL.SalesQty, 0) SalesQty , IsNull(POL.PurchQty, 0) PurchQty FROM [Production].[Product] ITM LEFT JOIN OnHand OHD ON ITM.ProductId = OHD.ProductId LEFT JOIN SalesOrders SOL ON ITM.ProductID = SOL.ProductId LEFT JOIN OpenPOs POL ON ITM.ProductId = POL.ProductId WHERE COALESCE(OHD.ProductId, SOL.ProductId, POL.ProductId) IS NOT NULL ORDER BY ITM.ProductNumber
This way of structuring the code allows us in theory also a better way to troubleshoot the logic on segments (each CTE) and a better grip at the end query. In fact, any of the above alternatives would result in a similar writing of the end query and, with the right indexes and processing power, the performance won't be in general an issue.
A CTE is just a way of reorganizing a query and the changes made by moving the logic to the CTE shouldn't have any impact on the performance. Even if the constraints and tables appear somewhere else, the database engine will push down predicates and move operators around to obtain an optimal algebraic tree. In theory, the two approaches should lead to the same query plan. Of course, there are also exceptions, e.g. when restructuring the logic inline in a query without CTE can favor one plan over the other.
There's also the temptation to use CTEs for everything. An extreme case is using a CTE for each table on which a filter is applied instead of bringing the constraints into the JOIN or the WHERE clause. Even if the former approach allows troubleshooting individual sources, it can create further overhead as is needed to check back and forth between CTEs and JOINs, taking thus more time and allowing some errors maybe to escape. CTEs are for breaking down complex logic to a level at which the code can be managed and not create further overhead. There must be a balance between usability and effort.
Moving to a distributed environment like Azure Synapse, where tables need to be distributed between several compute nodes, because of the dependencies existing between tables, queries based on several aggregations might take longer than expected. Besides following the best practices for table design including tables' distribution (see [1], [2]), it's maybe worth playing around with the query structure and see how the database engine will react to it.
For example, the considered query could be restructured as follows by transforming the many LEFT JOINs to UNIONs:
-- products analysis (via UNIONs) SELECT ITM.ProductNumber , ITM.Name , SUM(DAT.OnHand) OnHand , SUM(DAT.SalesQty) SalesQty , SUM(DAT.PurchQty) PurchQty FROM [Production].[Product] ITM JOIN ( -- cumulated on hand SELECT OHD.ProductId , SUM(OHD.Quantity) OnHand , 0 SalesQty , 0 PurchQty FROM [Production].[ProductInventory] OHD GROUP BY OHD.ProductId UNION ALL -- cumulated sales orders SELECT SOL.ProductId , 0 OnHand , SUM(SOL.OrderQty) SalesQty , 0 PurchQty FROM [Sales].[SalesOrderDetail] SOL GROUP BY SOL.ProductId UNION ALL -- cumulated open purchase orders SELECT POL.ProductId , 0 OnHand , 0 SalesQty , SUM(POL.OrderQty) PurchQty FROM Purchasing.PurchaseOrderDetail POL WHERE OrderQty - (ReceivedQty - RejectedQty)>0 --AND POL.DueDate BETWEEN '2014-01-01' AND '2014-12-31' GROUP BY POL.ProductId ) DAT ON ITM.ProductId = DAT.ProductId GROUP BY ITM.ProductNumber , ITM.Name ORDER BY ITM.ProductNumber
In similar situations I found this approach to provide better performance, especially when working with serverless SQL pools. Though, this might depend also on tables' particularities.
If the queries take long time and the result sets are needed frequently, it might be useful to store the intermediary or final result sets in (base) tables, or in case of serverless SQL pool to parquet files and have the data updated on a regular basis. Upon case, unless the data chance too often, this might prove a much better option than using temporary tables or table variables.
Happy coding!
Resources:
[1] Microsoft Learn (2022) Guidance for designing distributed tables using dedicated SQL pool in Azure Synapse Analytics (link)
[1] Microsoft Learn (2022) Guidance for designing distributed tables using dedicated SQL pool in Azure Synapse Analytics (link)
[2] Microsoft Learn (2022) Design tables using dedicated SQL pool in Azure Synapse Analytics (link)
No comments:
Post a Comment