03 April 2010

SQL Reloaded: The Power of Joins IV (Multi-Joins Under Control)

       Few are the queries based only on the join between two tables, each additional table added to the query involving an additional join with one or more of the already considered tables. Such cases in which more than one join is involved in a query are referred as multi-joins or multiple joins, the joins could involve the same join operator or it can be a mixture of the horizontal or vertical joins. In this context can be discussed about join precedence, in other words the way in which a database engine processes the joins, on how to control the order in which the joins are processed, on how to use the joins in order to model a solution for a given problem or how the constraints are considered at macro level.

      The order in which a database engine processes the joins is highly dependent on queries, tables and servers’ characteristics, and it’s quite a comprehensive topic that deserves an ample discussion that could span several posts. Keeping it simple, once the execution plan is determined, the engine starts with the first join, the result being joined with the next table, and so on. The question is how could we make the database engine to apply a certain processing order and how could we facilitate the whole process. Given the fact that in order to solve a problem often we have only 1 or 2 solutions that together with the nature of the relationship between tables involve dictates the nature (types) and the order of the joins, I can’t they there is much flexibility from this point of view. In the end it resumes in knowing the rules that govern the joins, for example in case of commutative join operators it doesn’t really matter how the joins are written, however special attention must be given to non-commutative operations (e.g. left/right outer joins) and the implications of mixing outer with inner joins.

    One approach is to start with the table having the lowest level of detail or the row-preserving table, and first add successively the child tables with the lower cardinality (number of records) and then the other tables in increasing level of detail. On the other side it depends also on the problems needed to be solved. Taking for example the Production.Product, Purchasing.PurchaseOrderDetail and Purchasing.PurchaseOrderHeader tables from AdventureWorks database a query showing the lowest level of details would start with the PO (Purchase Order) Details, and considering that for each PO Detail should exist a PO Header and a Product, could be used an inner join in order to add the PO Header and Product information:

-- PO Details 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Purchasing.PurchaseOrderDetail POD  
    JOIN Production.Product ITM 
      ON POD.ProductID = ITM.ProductID 
   JOIN Purchasing.PurchaseOrderHeader POH 
     ON POD.PurchaseOrderID = POH.PurchaseOrderID 
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 

  Another view within the same data could be based on a requirement to have the Products as row-preserving table, and because there could be Products without a PO placed, this requires to have a left join between the Products and PO Details. In theory because there should be a PO Header for each PO Detail, an inner join would be enough, however an inner join will restrain the scope of the query only to the PO Details eliminating thus the Products without a PO Placed, so a left join must be used between PO Details and PO Headers:

-- Products with PO Details 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
    LEFT JOIN Purchasing.PurchaseOrderDetail POD                
       ON ITM.ProductID = POD.ProductID  
     LEFT JOIN Purchasing.PurchaseOrderHeader POH           
      ON POD.PurchaseOrderID = POH.PurchaseOrderID 
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
-- OR POD.ProductID IS NULL 

     
In case a constraint needs to be added based on the attributes from the null-preserving table (e.g. the commented constraint based on the PO Status), then the overall constraint should be modified in order to reflect the need to show the Products without a PO placed, this being the purpose of the second commented constraint.  
    Each table referenced by the PO Details or PO Headers and the tables further referenced by the respective tables, when added to the above query require a left join in order to keep the level of detail of the query. On the other side the left joins are performing poorer than the inner joins, so inner joins should be used whenever is possible. A technique to reduce the number of left joins in a query and accommodate the non-join constraints based on the null-preserving tables is to include such tables in a view or inline view. Here’s the modified above query using an inline view for PO Details:

-- Products with PO Details (inline view) 
SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
    LEFT JOIN ( -- PO Details 
    SELECT POD.PurchaseOrderDetailID  
    , POD.PurchaseOrderID 
    , POD.ProductID     , POD.UnitPrice 
     , POD.OrderQty 
     FROM Purchasing.PurchaseOrderDetail POD  
        JOIN Purchasing.PurchaseOrderHeader POH  
           ON POD.PurchaseOrderID = POH.PurchaseOrderID 
       --WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete  
     ) POD  
     ON ITM.ProductID = POD.ProductID 

     A few years back I found an interesting article of Itzik Ben-Gan on undocumented join features existing in SQL Server, in the respective article he was showing how parentheses and join constraints’ order could be used to control the order in which joined are processed. Parentheses could be used to group the tables and the join between them, allowing thus to perform an inner join between PO Details and PO Headers, the results following to represent the null-preserving table joined with Products table:

-- Products with PO Details (parentheses grouping) 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
     LEFT JOIN ( 
           Purchasing.PurchaseOrderDetail POD  
           JOIN Purchasing.PurchaseOrderHeader POH  
               ON POD.PurchaseOrderID = POH.PurchaseOrderID 
        --AND POH.Status IN (2, 4)-- 2-Approved, 4-Complete 
       )  
     ON ITM.ProductID = POD.ProductID 

     The second undocumented method is based on tables and join constraints’ order, being possible to list the table in the reverse order in which they need to be processed and thus accommodate the right type of join in between, the join constraints following the same reverse logic. Thus the PO Headers could be placed the last in the list together with the PO Details and use a inner join between them, the result between them being joined to Products using a left join.
-- Products with PO Details (condition order) 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
     LEFT JOIN Purchasing.PurchaseOrderDetail POD 
        LEFT JOIN Purchasing.PurchaseOrderHeader POH 
           ON POD.PurchaseOrderID = POH.PurchaseOrderID 
         ON ITM.ProductID = POD.ProductID  
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
-- OR POD.ProductID IS NULL 

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.