Given two attributes ColumnX and ColumnY from tables A, respectively B, let’s look first on how the difference flag constraint could be written for each category:
--text attributes: CASE WHEN IsNull(A.ColumnX , '')<> ISNULL(B.ColumnY, '') THEN 'Y' ELSE 'N' END DiffColumnXYFlag --amount attributes: CASE WHEN IsNull(A.ColumnX, 0) - IsNull(B.ColumnY, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' ELSE 'N' END DiffColumnXYFlag --numeric attributes: CASE WHEN IsNull(A.ColumnX, 0) <> IsNull(B.ColumnY, 0) THEN 'Y' ELSE 'N' END DiffColumnXYFlag --date attributes: CASE WHEN IsNull(DateDiff(d, A.ColumnX, B.ColumnY), -1)<>0 THEN 'Y' ELSE 'N' END DiffColumnXYFlag
Notes:
1. Bit attributes can be treated as numeric as long as they are considered as having a bi-state, for tri-state values in which also NULL is considered as a distinct value then the constraint must be changed, the most natural way being to translate the NULL to –1:
CASE
WHEN IsNull(A.ColumnX, -1) <> IsNull(B.ColumnY, -1) THEN 'Y'
ELSE 'N'
END DiffColumnXYFlag
2. In most of the examples I worked with the difference between two pair dates, the difference was calculated at day level, though it might happen that is needed to compare the values at smaller time intervals to the order of hours, minutes or seconds. The only thing that needs to be changed then is the first parameter from DateDiff function. There could be also situations in which a difference of several seconds is acceptable, a BETWEEN operator could be used then as per the case of numeric vs. amount values.
3. In case one of the attributes is missing, the corresponding difference flag could take directly the value ‘N’ or ‘n/a’.
4. It could happen that there are mismatches between the attributes’ data type, in this case at least one of them must be converted to a form that could be used in further processing.
Thus a macro for this purpose would take as input a range with the list of attributes from the two tables, the data type category and the columns participating in the join constraint, two parameters designating the name of the left and right table participating in the FULL OUTER JOIN, the time interval considered, the error margin value interval (e.g. [-e, e]) and a flag indicating whether to show all combined data or only the records for which there is at least a difference found.
Function GetComparisonQuery(ByVal rng As Range, ByVal LeftTable As String, ByVal RightTable As String, ByVal TimeInterval As String, ByVal ErrorMargin As String, ByVal ShowOnlyDifferencesFlag As Boolean) As String 'builds the code for a comparison query between two tables Dim attributes As String Dim differences As String Dim constraint As String Dim whereConstraints As String Dim joinConstraints As String Dim columnX As String Dim columnY As String Dim index As Integer For index = 1 To rng.Rows.Count columnX = Trim(rng.Cells(index, 1).Value) columnY = Trim(rng.Cells(index, 2).Value) If Len(columnX) > 0 Or Len(columnY) > 0 Then If Len(columnX) > 0 Then attributes = attributes & ", A." & columnX & " LT" & columnX & vbCrLf End If If Len(columnY) > 0 Then attributes = attributes & ", B." & columnY & " RT" & columnX & vbCrLf End If constraint = "" If Len(Trim(rng.Cells(index, 4).Value)) = 0 Then If Len(columnX) > 0 And Len(columnY) > 0 Then 'creating the difference flag Select Case Trim(rng.Cells(index, 3).Value) Case "text": constraint = "CASE" & vbCrLf & _ " WHEN IsNull(A." & columnX & " , '') <> IsNUll(B." & columnY & ", '') THEN 'Y'" & vbCrLf & _ " ELSE 'N'" & vbCrLf & _ " END" Case "amount": constraint = "CASE" & vbCrLf & _ " WHEN IsNull(A." & columnX & " , 0) - IsNUll(B." & columnY & ", 0) NOT BETWEEN -" & ErrorMargin & " AND " & ErrorMargin & " THEN 'Y'" & vbCrLf & _ " ELSE 'N'" & vbCrLf & _ " END" Case "numeric": constraint = "CASE" & vbCrLf & _ " WHEN IsNull(A." & columnX & " , 0) <> IsNUll(B." & columnY & ", 0) THEN 'Y'" & vbCrLf & _ " ELSE 'N'" & vbCrLf & _ " END" Case "date": constraint = "CASE" & vbCrLf & _ " WHEN DateDiff(" & TimeInterval & ", A." & columnX & ", B." & columnY & ")<>0 THEN 'Y'" & vbCrLf & _ " ELSE 'N'" & vbCrLf & _ " END" Case Else: 'error MsgBox "Incorrect data type provided for " & index & " row!", vbCritical End Select If ShowOnlyDifferencesFlag Then whereConstraints = whereConstraints & " OR " & constraint & " = 'Y'" & vbCrLf End If differences = differences & ", " & constraint & " Diff" & columnX & "Flag" & vbCrLf Else differences = differences & ", 'n/a' Diff" & IIf(Len(columnX) > 0, columnX, columnY) & "Flag" & vbCrLf End If Else joinConstraints = joinConstraints & " AND A." & columnX & " = B." & columnY & vbCrLf End If End If Next If Len(attributes) > 0 Then attributes = Right(attributes, Len(attributes) - 2) End If If Len(joinConstraints) > 0 Then joinConstraints = Right(joinConstraints, Len(joinConstraints) - 8) End If If Len(whereConstraints) > 0 Then whereConstraints = Right(whereConstraints, Len(whereConstraints) - 4) End If 'building the comparison query GetComparisonQuery = "SELECT " & attributes & _ differences & _ "FROM " & LeftTable & " A" & vbCrLf & _ " FULL OUTER JOIN " & RightTable & " B" & vbCrLf & _ " ON " & joinConstraints & _ IIf(ShowOnlyDifferencesFlag And Len(whereConstraints) > 0, "WHERE " & whereConstraints, "") End Function
The query returned by the macro for the above example based on attributes from Production.Product table from AdventureWorks database and Production.Products table created in Saving Data With Stored Procedures post:
SELECT A.ProductID LTProductID , B.ProductID RTProductID , A.Name LTName , B.Name RTName , A.ProductNumber LTProductNumber , B.ProductNumber RTProductNumber , A.MakeFlag LTMakeFlag , B.MakeFlag RTMakeFlag , A.FinishedGoodsFlag LTFinishedGoodsFlag , B.FinishedGoodsFlag RTFinishedGoodsFlag , A.Color LTColor , B.Color RTColor , A.SafetyStockLevel LTSafetyStockLevel , B.SafetyStockLevel RTSafetyStockLevel , A.ReorderPoint LTReorderPoint , A.StandardCost LTStandardCost , B.StandardCost RTStandardCost , A.ListPrice LTListPrice , B.ListPrice RTListPrice , A.Size LTSize , A.DaysToManufacture LTDaysToManufacture , A.ProductLine LTProductLine , A.Class LTClass , A.Style LTStyle , A.SellStartDate LTSellStartDate , B.StartDate RTSellStartDate , A.SellEndDate LTSellEndDate , B.EndDate RTSellEndDate , A.DiscontinuedDate LTDiscontinuedDate , B.CreationDate RT , B.CreatedBy RT , A.ModifiedDate LTModifiedDate , B.ModifiedDate RTModifiedDate , B.ModifiedBy RT , CASE WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' ELSE 'N' END DiffNameFlag , CASE WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' ELSE 'N' END DiffProductNumberFlag , CASE WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' ELSE 'N' END DiffMakeFlagFlag , CASE WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' ELSE 'N' END DiffFinishedGoodsFlagFlag , CASE WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' ELSE 'N' END DiffColorFlag , CASE WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' ELSE 'N' END DiffSafetyStockLevelFlag , 'n/a' DiffReorderPointFlag , CASE WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' ELSE 'N' END DiffStandardCostFlag , CASE WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' ELSE 'N' END DiffListPriceFlag , 'n/a' DiffSizeFlag , 'n/a' DiffDaysToManufactureFlag , 'n/a' DiffProductLineFlag , 'n/a' DiffClassFlag , 'n/a' DiffStyleFlag , CASE WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' ELSE 'N' END DiffSellStartDateFlag , CASE WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' ELSE 'N' END DiffSellEndDateFlag , 'n/a' DiffDiscontinuedDateFlag , 'n/a' DiffCreationDateFlag , 'n/a' DiffCreatedByFlag , CASE WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' ELSE 'N' END DiffModifiedDateFlag , 'n/a' DiffModifiedByFlag FROM Production.Product A FULL OUTER JOIN Production.Products B ON A.ProductID = B.ProductID WHERE CASE WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' ELSE 'N' END = 'Y' OR CASE WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' ELSE 'N' END = 'Y'
Notes:
1. The macro doesn’t consider an ORDER BY clause, though it could be easily added manually
2. Not all of the join constraints are so simple so that they can be reduced to one or more simple equalities, on the other side we have to consider that the most time consuming task is listing the attributes and the difference flags.
3. Sometimes it’s easier to create two extracts – in the first being considered all the records from the left table and the matches from the right table (left join), respectively all the records from the right table and the matches from the left table (right join).
4. Given the fact that the attributes participating in the join clause should in theory match, each pair of such attributes could be merged in one attribute using the formula: IsNull(A.ColumnX, B.ColumnY) As ColumnXY.
5. In order to show all the data from the two tables and not only the differences, all is needed to do is to change the value of the last parameter from true to false:
=GetComparisonQuery(A2:D23, "Production.Product", "Production.Products", "d", "0.05", false)
6. For TimeInterval parameter should be provided only the values taken by DatePart parameter (first parameter) of SQL Server’s DateDiff function.
7. Please note that no validation is made for the input parameters.