About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Friday, April 02, 2010

The Power of Joins – Part IV: Multi-Joins Under Control

       Few are the queries based only on the join between two tables, each additional table added to the query involving an additional join with one or more of the already considered tables. Such cases in which more than one join is involved in a query are referred as multi-joins or multiple joins, the joins could involve the same join operator or it can be a mixture of the horizontal or vertical joins. In this context can be discussed about join precedence, in other words the way in which a database engine processes the joins, on how to control the order in which the joins are processed, on how to use the joins in order to model a solution for a given problem or how the constraints are considered at macro level.

      The order in which a database engine processes the joins is highly dependent on queries, tables and servers’ characteristics, and it’s quite a comprehensive topic that deserves an ample discussion that could span several posts. Keeping it simple, once the execution plan is determined, the engine starts with the first join, the result being joined with the next table, and so on. The question is how could we make the database engine to apply a certain processing order and how could we facilitate the whole process. Given the fact that in order to solve a problem often we have only 1 or 2 solutions that together with the nature of the relationship between tables involve dictates the nature (types) and the order of the joins, I can’t they there is much flexibility from this point of view. In the end it resumes in knowing the rules that govern the joins, for example in case of commutative join operators it doesn’t really matter how the joins are written, however special attention must be given to non-commutative operations (e.g. left/right outer joins) and the implications of mixing outer with inner joins.

    One approach is to start with the table having the lowest level of detail or the row-preserving table, and first add successively the child tables with the lower cardinality (number of records) and then the other tables in increasing level of detail. On the other side it depends also on the problems needed to be solved. Taking for example the Production.Product, Purchasing.PurchaseOrderDetail and Purchasing.PurchaseOrderHeader tables from AdventureWorks database a query showing the lowest level of details would start with the PO (Purchase Order) Details, and considering that for each PO Detail should exist a PO Header and a Product, could be used an inner join in order to add the PO Header and Product information:
-- PO Details
SELECT POD.PurchaseOrderDetailID

,
POD.PurchaseOrderID

,
ITM.ProductID

,
ITM.ProductNumber

,
ITM.StandardCost

,
POD.UnitPrice

,
POD.OrderQty

FROM
Purchasing.PurchaseOrderDetail POD
    JOIN
Production.Product ITM
      ON
POD.ProductID = ITM.ProductID
   JOIN Purchasing.PurchaseOrderHeader POH
     ON POD.PurchaseOrderID = POH.PurchaseOrderID

--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete

   Another view within the same data could be based on a requirement to have the Products as row-preserving table, and because there could be Products without a PO placed, this requires to have a left join between the Products and PO Details. In theory because there should be a PO Header for each PO Detail, an inner join would be enough, however an inner join will restrain the scope of the query only to the PO Details eliminating thus the Products without a PO Placed, so a left join must be used between PO Details and PO Headers:
-- Products with PO Details
SELECT POD.PurchaseOrderDetailID

,
POD.PurchaseOrderID

,
ITM.ProductID

,
ITM.ProductNumber

,
ITM.StandardCost

,
POD.UnitPrice

,
POD.OrderQty

FROM
Production.Product ITM
    LEFT JOIN
Purchasing.PurchaseOrderDetail POD
        ON ITM.ProductID = POD.ProductID

     LEFT JOIN Purchasing.PurchaseOrderHeader POH
        ON POD.PurchaseOrderID = POH.PurchaseOrderID

--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete
-- OR POD.ProductID IS NULL

    In case a constraint needs to be added based on the attributes from the null-preserving table (e.g. the commented constraint based on the PO Status), then the overall constraint should be modified in order to reflect the need to show the Products without a PO placed, this being the purpose of the second commented constraint.
    Each table referenced by the PO Details or PO Headers and the tables further referenced by the respective tables, when added to the above query require a left join in order to keep the level of detail of the query. On the other side the left joins are performing poorer than the inner joins, so inner joins should be used whenever is possible. A technique to reduce the number of left joins in a query and accommodate the non-join constraints based on the null-preserving tables is to include such tables in a view or inline view. Here’s the modified above query using an inline view for PO Details:
-- Products with PO Details (inline view)
SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID
,
ITM.ProductID

,
ITM.ProductNumber

,
ITM.StandardCost

,
POD.UnitPrice

,
POD.OrderQty

FROM
Production.Product ITM
    LEFT JOIN ( -- PO Details

    SELECT POD.PurchaseOrderDetailID
    , POD.PurchaseOrderID
    , POD.ProductID     , POD.UnitPrice
     , POD.OrderQty
     FROM Purchasing.PurchaseOrderDetail POD
        JOIN Purchasing.PurchaseOrderHeader POH
           ON POD.PurchaseOrderID = POH.PurchaseOrderID
       --WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete

     ) POD
     ON ITM.ProductID = POD.ProductID


     A few years back I found an interesting article of Itzik Ben-Gan on undocumented join features existing in SQL Server, in the respective article he was showing how parentheses and join constraints’ order could be used to control the order in which joined are processed. Parentheses could be used to group the tables and the join between them, allowing thus to perform an inner join between PO Details and PO Headers, the results following to represent the null-preserving table joined with Products table:
-- Products with PO Details (parentheses grouping)
SELECT POD.PurchaseOrderDetailID
,
POD.PurchaseOrderID

,
ITM.ProductID

,
ITM.ProductNumber

,
ITM.StandardCost

,
POD.UnitPrice

,
POD.OrderQty

FROM
Production.Product ITM
     LEFT JOIN (

           Purchasing.PurchaseOrderDetail POD
           JOIN Purchasing.PurchaseOrderHeader POH
               ON POD.PurchaseOrderID = POH.PurchaseOrderID
        --AND POH.Status IN (2, 4)-- 2-Approved, 4-Complete

)
ON ITM.ProductID = POD.ProductID

     The second undocumented method is based on tables and join constraints’ order, being possible to list the table in the reverse order in which they need to be processed and thus accommodate the right type of join in between, the join constraints following the same reverse logic. Thus the PO Headers could be placed the last in the list together with the PO Details and use a inner join between them, the result between them being joined to Products using a left join.
-- Products with PO Details (condition order)
SELECT POD.PurchaseOrderDetailID

,
POD.PurchaseOrderID

,
ITM.ProductID

,
ITM.ProductNumber

,
ITM.StandardCost

,
POD.UnitPrice

,
POD.OrderQty

FROM
Production.Product ITM
     LEFT JOIN Purchasing.PurchaseOrderDetail POD
        LEFT JOIN Purchasing.PurchaseOrderHeader POH
           ON POD.PurchaseOrderID = POH.PurchaseOrderID
         ON ITM.ProductID = POD.ProductID

--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete
-- OR POD.ProductID IS NULL

No comments: