If we talk about join constraints then more likely we refer to vertical joins, though also anti-joins and semi-joins could include sporadically not-equality operators. In a normal join based on equalities for the set of attributes participating in the join from the left table are matched all the records from the right table, even if this means retrieving more than one record. Same happens also for not-equal constraints even if the business scenarios in which such needs arises are not so intuitive. One relatively simple example I often met is the case when is needed to calculate the incremental aggregation of values, for example to calculate the incremental volume of PO placed over time:
SELECT DII.DateSequence , IsNull(POH.SubTotal, 0) SubTotal FROM dbo.fGetDatesInInterval('2002-01-01', '2002-01-31') DII CROSS APPLY (--incremental PO Volume SELECT SUM(POH.SubTotal) SubTotal FROM Purchasing.PurchaseOrderHeader POH WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete AND DATEDIFF(d, POH.OrderDate, DII.DateSequence)>=0 --not-equal constraint ) POH
In order to simplify the query I used the dbo.fGetDatesInInterval table-valued function that returns a sequence of dates within a given time interval, and a CROSS APPLY returning the aggregated value for each date returned by the function, for the calculations being considered all the records having an Order Date smaller or equal with the given (sequence) date. As can be seen from the above query the not-equal constraint is based on DateDiff function, instead we could have written POH.OrderDate<=DII.DateSequence, one of the reasons for its use is the fact that the constraint could be easily modified to show for example only the aggregated values for the past 90 days, the BETWEEN operator entering in scene:
SELECT DII.DateSequence , IsNull(POH.SubTotal, 0) SubTotal FROM dbo.fGetDatesInInterval('2002-01-01', '2002-01-31') DII CROSS APPLY (--incremental PO Volume SELECT SUM(POH.SubTotal) SubTotal FROM Purchasing.PurchaseOrderHeader POH WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete AND DATEDIFF(d, POH.OrderDate, DII.DateSequence) BETWEEN 0 AND 90--not-equal constraint ) POH
The not-equal constraint from the query is part of interval-based constraints category which covers the scenarios in which one of the attributes from the tables participating in a join is checked whether it falls within a given interval, typically that being valid for numeric values as above or date value, as in the next query that shows the Last Standard Cost Details per Product.
-- Products & Last Standard Cost details SELECT ITM.ProductID , ITM.ProductNumber , ITM.StandardCost , PCH.StartDate , PCH.EndDate FROM Production.Product ITM JOIN Production.ProductCostHistory PCH ON ITM.ProductID = PCH.ProductID AND GETDATE() BETWEEN PCH.StartDate AND IsNull(PCH.EndDate, GETDATE())
Such queries need often to be written when the records in a table have a given validity, no overlap existing between intervals.
Using list of values within a join constraint and thus the IN or NOT IN operators occurs when for example the base table stores data coming from multiple sources and more encodings are used for the same source, each of them caring its meaning. For example using the Production.TransactionHistory and several other tables coming with AdventureWorks database, and supposing that for Purchase Orders are two transaction types P and B, and for Sales Orders S and I, a general scope query on Transactions could be written as follows:
-- Transaction History SELECT PTH.TransactionID , PTH.ProductID , PTH.ReferenceOrderID , CASE PTH.TransactionType WHEN 'W' THEN 'Work Order' WHEN 'S' THEN 'Sales Order' WHEN 'I' THEN 'Internal Sales Order' WHEN 'P' THEN 'Purchase Order' WHEN 'B' THEN 'Blanket Purchase Order' ELSE 'n.a' END TransactionType , CASE WHEN PTH.TransactionType = 'W' THEN StartDate WHEN PTH.TransactionType IN ('S', 'I') THEN SOH.OrderDate WHEN PTH.TransactionType IN ('P', 'B') THEN POH.OrderDate END ReferenceDate FROM Production.TransactionHistory PTH LEFT JOIN Purchasing.PurchaseOrderHeader POH ON PTH.ReferenceOrderID = POH.PurchaseOrderID AND PTH.TransactionType IN ('P', 'B') --not-equal constraint LEFT JOIN Sales.SalesOrderHeader SOH ON PTH.ReferenceOrderID = SOH.SalesOrderID AND PTH.TransactionType IN ('S', 'I') --not-equal constraint LEFT JOIN Production.WorkOrder PWO ON PTH.ReferenceOrderID = PWO.WorkOrderID AND PTH.TransactionType = 'W'
The need for writing such a query is seldom and could have been written as 3 distinct queries whose results are joined with two unions, both approaches having their pluses and minuses.
In one of the above queries was used a constraint based on DateDiff function, and in theory any other function could be used in a join constraint, including user-defined functions. Such function-based join constraints are handy but should be used with caution because they could impact query’s performance. Pattern-based join constraints used with LIKE operator could be used as well.
There are cases in which the constraints that typically would be included in the WHERE clause are added in the body of the join, apparently without any good reason. Such based denatured join constraints are base only on the attributes of one of the tables involved in the join, like in the next query:
-- PO for Products with StandardCost>10 SELECT POD.PurchaseOrderDetailID , POD.ProductID , ITM.ProductNumber , ITM.StandardCost , POD.UnitPrice , POD.OrderQty FROM Purchasing.PurchaseOrderDetail POD JOIN Production.Product ITM ON POD.ProductID = ITM.ProductID AND ITM.StandardCost>10 -- denaturated join constraint
No comments:
Post a Comment