One of the tasks which appears from time to time on my table is to determine the differences between two datasets coming from different systems or from the same system/source but taken at different points in time. Even if it’s fairly simple to write such a query, especially when the number of attributes is quite small, there is lot of repetitive work that needs to be done because typically for each set of matched attributes needs to be added a third attribute (difference flag) showing whether there is a difference or not. In time I started to use Excel formulas to create the SELECT clause for such a query, matching tables’ attribute first and categorizing the data types mainly in 4 categories: text, amounts, numeric (other than amounts) and dates. This categorization is requested first of all by the different ways in handling the NULL values for each data type, and secondly by the difference between amounts and numeric, even if they might have the same data type, the difference resides in the fact that amounts might be rounded differently in each data source, therefore a difference of a few cents will not be considered as difference.
Given two attributes ColumnX and ColumnY from tables A, respectively B, let’s look first on how the difference flag constraint could be written for each category:
--text attributes:
CASE
WHEN IsNull(A.ColumnX , '')<> ISNULL(B.ColumnY, '') THEN 'Y'
ELSE 'N'
END DiffColumnXYFlag
--amount attributes:
CASE
WHEN IsNull(A.ColumnX, 0) - IsNull(B.ColumnY, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y'
ELSE 'N'
END DiffColumnXYFlag
--numeric attributes:
CASE
WHEN IsNull(A.ColumnX, 0) <> IsNull(B.ColumnY, 0) THEN 'Y'
ELSE 'N'
END DiffColumnXYFlag
--date attributes:
CASE
WHEN IsNull(DateDiff(d, A.ColumnX, B.ColumnY), -1)<>0 THEN 'Y'
ELSE 'N'
END DiffColumnXYFlag
Notes: 1. Bit attributes can be treated as numeric as long as they are considered as having a bi-state, for tri-state values in which also NULL is considered as a distinct value then the constraint must be changed, the most natural way being to translate the NULL to –1:
CASE
WHEN IsNull(A.ColumnX, -1) <> IsNull(B.ColumnY, -1) THEN 'Y'
ELSE 'N'
END DiffColumnXYFlag 2. In most of the examples I worked with the difference between two pair dates, the difference was calculated at day level, though it might happen that is needed to compare the values at smaller time intervals to the order of hours, minutes or seconds. The only thing that needs to be changed then is the first parameter from DateDiff function. There could be also situations in which a difference of several seconds is acceptable, a BETWEEN operator could be used then as per the case of numeric vs. amount values.
3. In case one of the attributes is missing, the corresponding difference flag could take directly the value ‘N’ or ‘n/a’.
4. It could happen that there are mismatches between the attributes’ data type, in this case at least one of them must be converted to a form that could be used in further processing.
Thus a macro for this purpose would take as input a range with the list of attributes from the two tables, the data type category and the columns participating in the join constraint, two parameters designating the name of the left and right table participating in the FULL OUTER JOIN, the time interval considered, the error margin value interval (e.g. [-e, e]) and a flag indicating whether to show all combined data or only the records for which there is at least a difference found.
Function GetComparisonQuery(ByVal rng As Range, ByVal LeftTable As String, ByVal RightTable As String, ByVal TimeInterval As String, ByVal ErrorMargin As String, ByVal ShowOnlyDifferencesFlag As Boolean) As String
'builds the code for a comparison query between two tables
Dim attributes As String
Dim differences As String
Dim constraint As String
Dim whereConstraints As String
Dim joinConstraints As String
Dim columnX As String
Dim columnY As String
Dim index As Integer
For index = 1 To rng.Rows.Count
columnX = Trim(rng.Cells(index, 1).Value)
columnY = Trim(rng.Cells(index, 2).Value)
If Len(columnX) > 0 Or Len(columnY) > 0 Then
If Len(columnX) > 0 Then
attributes = attributes & ", A." & columnX & " LT" & columnX & vbCrLf
End If
If Len(columnY) > 0 Then
attributes = attributes & ", B." & columnY & " RT" & columnX & vbCrLf
End If
constraint = ""
If Len(Trim(rng.Cells(index, 4).Value)) = 0 Then
If Len(columnX) > 0 And Len(columnY) > 0 Then
'creating the difference flag
Select Case Trim(rng.Cells(index, 3).Value)
Case "text":
constraint = "CASE" & vbCrLf & _
" WHEN IsNull(A." & columnX & " , '') <> IsNUll(B." & columnY & ", '') THEN 'Y'" & vbCrLf & _
" ELSE 'N'" & vbCrLf & _
" END"
Case "amount":
constraint = "CASE" & vbCrLf & _
" WHEN IsNull(A." & columnX & " , 0) - IsNUll(B." & columnY & ", 0) NOT BETWEEN -" & ErrorMargin & " AND " & ErrorMargin & " THEN 'Y'" & vbCrLf & _
" ELSE 'N'" & vbCrLf & _
" END"
Case "numeric":
constraint = "CASE" & vbCrLf & _
" WHEN IsNull(A." & columnX & " , 0) <> IsNUll(B." & columnY & ", 0) THEN 'Y'" & vbCrLf & _
" ELSE 'N'" & vbCrLf & _
" END"
Case "date":
constraint = "CASE" & vbCrLf & _
" WHEN DateDiff(" & TimeInterval & ", A." & columnX & ", B." & columnY & ")<>0 THEN 'Y'" & vbCrLf & _
" ELSE 'N'" & vbCrLf & _
" END"
Case Else: 'error
MsgBox "Incorrect data type provided for " & index & " row!", vbCritical
End Select
If ShowOnlyDifferencesFlag Then
whereConstraints = whereConstraints & " OR " & constraint & " = 'Y'" & vbCrLf
End If
differences = differences & ", " & constraint & " Diff" & columnX & "Flag" & vbCrLf
Else
differences = differences & ", 'n/a' Diff" & IIf(Len(columnX) > 0, columnX, columnY) & "Flag" & vbCrLf
End If
Else
joinConstraints = joinConstraints & " AND A." & columnX & " = B." & columnY & vbCrLf
End If
End If
Next
If Len(attributes) > 0 Then
attributes = Right(attributes, Len(attributes) - 2)
End If
If Len(joinConstraints) > 0 Then
joinConstraints = Right(joinConstraints, Len(joinConstraints) - 8)
End If
If Len(whereConstraints) > 0 Then
whereConstraints = Right(whereConstraints, Len(whereConstraints) - 4)
End If
'building the comparison query
GetComparisonQuery = "SELECT " & attributes & _
differences & _
"FROM " & LeftTable & " A" & vbCrLf & _
" FULL OUTER JOIN " & RightTable & " B" & vbCrLf & _
" ON " & joinConstraints & _
IIf(ShowOnlyDifferencesFlag And Len(whereConstraints) > 0, "WHERE " & whereConstraints, "")
End Function
The query returned by the macro for the above example based on attributes from Production.Product table from AdventureWorks database and Production.Products table created in
Saving Data With Stored Procedures post:
SELECT A.ProductID LTProductID
, B.ProductID RTProductID
, A.Name LTName
, B.Name RTName
, A.ProductNumber LTProductNumber
, B.ProductNumber RTProductNumber
, A.MakeFlag LTMakeFlag
, B.MakeFlag RTMakeFlag
, A.FinishedGoodsFlag LTFinishedGoodsFlag
, B.FinishedGoodsFlag RTFinishedGoodsFlag
, A.Color LTColor
, B.Color RTColor
, A.SafetyStockLevel LTSafetyStockLevel
, B.SafetyStockLevel RTSafetyStockLevel
, A.ReorderPoint LTReorderPoint
, A.StandardCost LTStandardCost
, B.StandardCost RTStandardCost
, A.ListPrice LTListPrice
, B.ListPrice RTListPrice
, A.Size LTSize
, A.DaysToManufacture LTDaysToManufacture
, A.ProductLine LTProductLine
, A.Class LTClass
, A.Style LTStyle
, A.SellStartDate LTSellStartDate
, B.StartDate RTSellStartDate
, A.SellEndDate LTSellEndDate
, B.EndDate RTSellEndDate
, A.DiscontinuedDate LTDiscontinuedDate
, B.CreationDate RT
, B.CreatedBy RT
, A.ModifiedDate LTModifiedDate
, B.ModifiedDate RTModifiedDate
, B.ModifiedBy RT
, CASE
WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y'
ELSE 'N'
END DiffNameFlag
, CASE
WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y'
ELSE 'N'
END DiffProductNumberFlag
, CASE
WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y'
ELSE 'N'
END DiffMakeFlagFlag
, CASE
WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y'
ELSE 'N'
END DiffFinishedGoodsFlagFlag
, CASE
WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y'
ELSE 'N'
END DiffColorFlag
, CASE
WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y'
ELSE 'N'
END DiffSafetyStockLevelFlag
, 'n/a' DiffReorderPointFlag
, CASE
WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05
THEN 'Y'
ELSE 'N'
END DiffStandardCostFlag
, CASE
WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y'
ELSE 'N'
END DiffListPriceFlag
, 'n/a' DiffSizeFlag
, 'n/a' DiffDaysToManufactureFlag
, 'n/a' DiffProductLineFlag
, 'n/a' DiffClassFlag
, 'n/a' DiffStyleFlag
, CASE
WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y'
ELSE 'N'
END DiffSellStartDateFlag
, CASE
WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y'
ELSE 'N'
END DiffSellEndDateFlag
, 'n/a' DiffDiscontinuedDateFlag
, 'n/a' DiffCreationDateFlag
, 'n/a' DiffCreatedByFlag
, CASE
WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y'
ELSE 'N'
END DiffModifiedDateFlag
, 'n/a' DiffModifiedByFlag
FROM Production.Product A
FULL OUTER JOIN Production.Products B
ON A.ProductID = B.ProductID
WHERE CASE
WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05
THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y'
ELSE 'N'
END = 'Y'
OR CASE
WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y'
ELSE 'N'
END = 'Y'
Notes: 1. The macro doesn’t consider an ORDER BY clause, though it could be easily added manually
2. Not all of the join constraints are so simple so that they can be reduced to one or more simple equalities, on the other side we have to consider that the most time consuming task is listing the attributes and the difference flags.
3. Sometimes it’s easier to create two extracts – in the first being considered all the records from the left table and the matches from the right table (left join), respectively all the records from the right table and the matches from the left table (right join).
4. Given the fact that the attributes participating in the join clause should in theory match, each pair of such attributes could be merged in one attribute using the formula: IsNull(A.ColumnX, B.ColumnY) As ColumnXY.
5. In order to show all the data from the two tables and not only the differences, all is needed to do is to change the value of the last parameter from true to false:
=GetComparisonQuery(A2:D23, "Production.Product", "Production.Products", "d", "0.05", false)
6. For TimeInterval parameter should be provided only the values taken by DatePart parameter (first parameter) of SQL Server’s DateDiff function.
7. Please note that no validation is made for the input parameters.