The order in which a database engine processes the joins is highly dependent on queries, tables and servers’ characteristics, and it’s quite a comprehensive topic that deserves an ample discussion that could span several posts. Keeping it simple, once the execution plan is determined, the engine starts with the first join, the result being joined with the next table, and so on. The question is how could we make the database engine to apply a certain processing order and how could we facilitate the whole process. Given the fact that in order to solve a problem often we have only 1 or 2 solutions that together with the nature of the relationship between tables involve dictates the nature (types) and the order of the joins, I can’t they there is much flexibility from this point of view. In the end it resumes in knowing the rules that govern the joins, for example in case of commutative join operators it doesn’t really matter how the joins are written, however special attention must be given to non-commutative operations (e.g. left/right outer joins) and the implications of mixing outer with inner joins.
One approach is to start with the table having the lowest level of detail or the row-preserving table, and first add successively the child tables with the lower cardinality (number of records) and then the other tables in increasing level of detail. On the other side it depends also on the problems needed to be solved. Taking for example the Production.Product, Purchasing.PurchaseOrderDetail and Purchasing.PurchaseOrderHeader tables from AdventureWorks database a query showing the lowest level of details would start with the PO (Purchase Order) Details, and considering that for each PO Detail should exist a PO Header and a Product, could be used an inner join in order to add the PO Header and Product information:
-- PO Details
SELECT POD.PurchaseOrderDetailID
, POD.PurchaseOrderID
, ITM.ProductID
, ITM.ProductNumber
, ITM.StandardCost
, POD.UnitPrice
, POD.OrderQty
FROM Purchasing.PurchaseOrderDetail POD
JOIN Production.Product ITM
ON POD.ProductID = ITM.ProductID
JOIN Purchasing.PurchaseOrderHeader POH
ON POD.PurchaseOrderID = POH.PurchaseOrderID
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete
Another view within the same data could be based on a requirement to have the Products as row-preserving table, and because there could be Products without a PO placed, this requires to have a left join between the Products and PO Details. In theory because there should be a PO Header for each PO Detail, an inner join would be enough, however an inner join will restrain the scope of the query only to the PO Details eliminating thus the Products without a PO Placed, so a left join must be used between PO Details and PO Headers:
-- Products with PO Details SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID , ITM.ProductID , ITM.ProductNumber , ITM.StandardCost , POD.UnitPrice , POD.OrderQty FROM Production.Product ITM LEFT JOIN Purchasing.PurchaseOrderDetail POD ON ITM.ProductID = POD.ProductID LEFT JOIN Purchasing.PurchaseOrderHeader POH ON POD.PurchaseOrderID = POH.PurchaseOrderID --WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete -- OR POD.ProductID IS NULL
In case a constraint needs to be added based on the attributes from the null-preserving table (e.g. the commented constraint based on the PO Status), then the overall constraint should be modified in order to reflect the need to show the Products without a PO placed, this being the purpose of the second commented constraint.
Each table referenced by the PO Details or PO Headers and the tables further referenced by the respective tables, when added to the above query require a left join in order to keep the level of detail of the query. On the other side the left joins are performing poorer than the inner joins, so inner joins should be used whenever is possible. A technique to reduce the number of left joins in a query and accommodate the non-join constraints based on the null-preserving tables is to include such tables in a view or inline view. Here’s the modified above query using an inline view for PO Details:
-- Products with PO Details (inline view) SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID , ITM.ProductID , ITM.ProductNumber , ITM.StandardCost , POD.UnitPrice , POD.OrderQty FROM Production.Product ITM LEFT JOIN ( -- PO Details SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID , POD.ProductID , POD.UnitPrice , POD.OrderQty FROM Purchasing.PurchaseOrderDetail POD JOIN Purchasing.PurchaseOrderHeader POH ON POD.PurchaseOrderID = POH.PurchaseOrderID --WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete ) POD ON ITM.ProductID = POD.ProductID
A few years back I found an interesting article of Itzik Ben-Gan on undocumented join features existing in SQL Server, in the respective article he was showing how parentheses and join constraints’ order could be used to control the order in which joined are processed. Parentheses could be used to group the tables and the join between them, allowing thus to perform an inner join between PO Details and PO Headers, the results following to represent the null-preserving table joined with Products table:
-- Products with PO Details (parentheses grouping) SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID , ITM.ProductID , ITM.ProductNumber , ITM.StandardCost , POD.UnitPrice , POD.OrderQty FROM Production.Product ITM LEFT JOIN ( Purchasing.PurchaseOrderDetail POD JOIN Purchasing.PurchaseOrderHeader POH ON POD.PurchaseOrderID = POH.PurchaseOrderID --AND POH.Status IN (2, 4)-- 2-Approved, 4-Complete ) ON ITM.ProductID = POD.ProductID
The second undocumented method is based on tables and join constraints’ order, being possible to list the table in the reverse order in which they need to be processed and thus accommodate the right type of join in between, the join constraints following the same reverse logic. Thus the PO Headers could be placed the last in the list together with the PO Details and use a inner join between them, the result between them being joined to Products using a left join.
-- Products with PO Details (condition order) SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID , ITM.ProductID , ITM.ProductNumber , ITM.StandardCost , POD.UnitPrice , POD.OrderQty FROM Production.Product ITM LEFT JOIN Purchasing.PurchaseOrderDetail POD LEFT JOIN Purchasing.PurchaseOrderHeader POH ON POD.PurchaseOrderID = POH.PurchaseOrderID ON ITM.ProductID = POD.ProductID --WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete -- OR POD.ProductID IS NULL
No comments:
Post a Comment