SQL Server 2008 documentation mentions the not-equal joins without giving an explicit definition, the term appearing within the context of joining values in two columns that are not equal. On the other side the not-equal join denomination could be used to refer more generally to the joins whose join constraints involve comparison operators other than equal (“=”), for example <, >, <>, BETWEEN, IN, and its negation NOT IN, LIKE, etc. The difference between the two views resides on the fact that some of the operators (e.g. IN, BETWEEN) could imply the equality of values from joined columns. Another objection could be made on whether the not-equality join constraint could be based only on one of the table’s attributes participating in the join, the other thus missing, such constraints could be considered also as join constraints even if in some scenarios they could have been written in the main WHERE clause as well. For the sake of simplicity and flexibility I would consider a
not-equal join or
not-equality join as a join in which at least one of the join constraints is based on a operator other than the equal operator. Could be discussed thus about
not-equal join constraints referring to the join constraints that involve other operators than equal, and equal join constraints, the two types of joins could coexist in the same join. A not-equal join constraint often involves the possibility of being returned more records then if a equal join constraint was used.
If we talk about join constraints then more likely we refer to vertical joins, though also anti-joins and semi-joins could include sporadically not-equality operators. In a normal join based on equalities for the set of attributes participating in the join from the left table are matched all the records from the right table, even if this means retrieving more than one record. Same happens also for not-equal constraints even if the business scenarios in which such needs arises are not so intuitive. One relatively simple example I often met is the case when is needed to calculate the incremental aggregation of values, for example to calculate the incremental volume of PO placed over time:
SELECT DII.DateSequence
, IsNull(POH.SubTotal, 0) SubTotal
FROM dbo.fGetDatesInInterval('2002-01-01', '2002-01-31') DII
CROSS APPLY (--incremental PO Volume
SELECT SUM(POH.SubTotal) SubTotal
FROM Purchasing.PurchaseOrderHeader POH
WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete
AND DATEDIFF(d, POH.OrderDate, DII.DateSequence)>=0 --not-equal constraint
) POH
In order to simplify the query I used the
dbo.fGetDatesInInterval table-valued function that returns a sequence of dates within a given time interval, and a CROSS APPLY returning the aggregated value for each date returned by the function, for the calculations being considered all the records having an Order Date smaller or equal with the given (sequence) date. As can be seen from the above query the not-equal constraint is based on DateDiff function, instead we could have written POH.OrderDate<=DII.DateSequence, one of the reasons for its use is the fact that the constraint could be easily modified to show for example only the aggregated values for the past 90 days, the BETWEEN operator entering in scene:
SELECT DII.DateSequence
, IsNull(POH.SubTotal, 0) SubTotal
FROM dbo.fGetDatesInInterval('2002-01-01', '2002-01-31') DII
CROSS APPLY (--incremental PO Volume
SELECT SUM(POH.SubTotal) SubTotal
FROM Purchasing.PurchaseOrderHeader POH
WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete
AND DATEDIFF(d, POH.OrderDate, DII.DateSequence) BETWEEN 0 AND 90--not-equal constraint
) POH
The not-equal constraint from the query is part of
interval-based constraints category which covers the scenarios in which one of the attributes from the tables participating in a join is checked whether it falls within a given interval, typically that being valid for numeric values as above or date value, as in the next query that shows the Last Standard Cost Details per Product.
-- Products & Last Standard Cost details SELECT ITM.ProductID
, ITM.ProductNumber
, ITM.StandardCost
, PCH.StartDate
, PCH.EndDate
FROM Production.Product ITM
JOIN Production.ProductCostHistory PCH
ON ITM.ProductID = PCH.ProductID AND GETDATE() BETWEEN PCH.StartDate AND IsNull(PCH.EndDate, GETDATE())
Such queries need often to be written when the records in a table have a given validity, no overlap existing between intervals.
Using
list of values within a join constraint and thus the IN or NOT IN operators occurs when for example the base table stores data coming from multiple sources and more encodings are used for the same source, each of them caring its meaning. For example using the Production.TransactionHistory and several other tables coming with AdventureWorks database, and supposing that for Purchase Orders are two transaction types P and B, and for Sales Orders S and I, a general scope query on Transactions could be written as follows:
-- Transaction History
SELECT PTH.TransactionID
, PTH.ProductID
, PTH.ReferenceOrderID
, CASE PTH.TransactionType
WHEN 'W' THEN 'Work Order'
WHEN 'S' THEN 'Sales Order'
WHEN 'I' THEN 'Internal Sales Order'
WHEN 'P' THEN 'Purchase Order'
WHEN 'B' THEN 'Blanket Purchase Order'
ELSE 'n.a'
END TransactionType , CASE
WHEN PTH.TransactionType = 'W' THEN StartDate
WHEN PTH.TransactionType IN ('S', 'I') THEN SOH.OrderDate
WHEN PTH.TransactionType IN ('P', 'B') THEN POH.OrderDate
END ReferenceDate FROM Production.TransactionHistory PTH
LEFT JOIN Purchasing.PurchaseOrderHeader POH ON PTH.ReferenceOrderID = POH.PurchaseOrderID AND PTH.TransactionType IN ('P', 'B') --not-equal constraint
LEFT JOIN Sales.SalesOrderHeader SOH
ON PTH.ReferenceOrderID = SOH.SalesOrderID
AND PTH.TransactionType IN ('S', 'I') --not-equal constraint
LEFT JOIN Production.WorkOrder PWO
ON PTH.ReferenceOrderID = PWO.WorkOrderID
AND PTH.TransactionType = 'W'
The need for writing such a query is seldom and could have been written as 3 distinct queries whose results are joined with two unions, both approaches having their pluses and minuses.
In one of the above queries was used a constraint based on DateDiff function, and in theory any other function could be used in a join constraint, including user-defined functions. Such
function-based join constraints are handy but should be used with caution because they could impact query’s performance. Pattern-based join constraints used with LIKE operator could be used as well.
There are cases in which the constraints that typically would be included in the WHERE clause are added in the body of the join, apparently without any good reason. Such based d
enatured join constraints are base only on the attributes of one of the tables involved in the join, like in the next query:
-- PO for Products with StandardCost>10 SELECT POD.PurchaseOrderDetailID
, POD.ProductID
, ITM.ProductNumber
, ITM.StandardCost
, POD.UnitPrice
, POD.OrderQty
FROM Purchasing.PurchaseOrderDetail POD
JOIN Production.Product ITM ON POD.ProductID = ITM.ProductID
AND ITM.StandardCost>10 -- denaturated join constraint