SQL Troubles: defensive programming

Showing posts with label defensive programming. Show all posts

16 May 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 190: How Invisible Prompt Injection Could Be a Good Thing for AI)

Prompt Engineering Series

Prompt: "write a post of 600 words on how invisible prompt injection could be a good thing for AI"

Introduction

Invisible prompt injection is usually discussed as a threat - an attack vector that manipulates AI systems without the user’s awareness. And yes, in the wild, it is dangerous. But in controlled environments, invisible prompt injection can actually be a powerful tool for strengthening AI safety, improving robustness, and helping developers understand how models behave under pressure. By studying how AI systems respond to hidden instructions, researchers can build models that are more resilient, more transparent, and ultimately more trustworthy. In this sense, invisible prompt injection isn’t just a vulnerability; it’s also a diagnostic instrument that reveals how AI systems interpret, prioritize, and negotiate conflicting signals.

1. A Testing Ground for AI Robustness

Invisible prompt injection acts like a stress test. When researchers embed hidden instructions into text, images, or metadata, they can observe how the AI responds when its input channel is compromised. This helps developers identify:

Weak points in the model’s reasoning
Situations where the model over‑trusts user input
Scenarios where safety guardrails fail

By intentionally exposing the model to controlled injections, teams can strengthen its resistance to real‑world attacks. This transforms a vulnerability into a research tool that improves system resilience.

2. A Way to Understand How AI Prioritizes Instructions

Invisible prompt injection reveals how an AI model weighs different layers of input. Does it prioritize the user’s visible request? The hidden instruction? The system‑level rules? The model’s internal alignment?

Studying these interactions helps researchers map the model’s internal decision‑making. This is crucial for:

Improving interpretability
Refining alignment strategies
Ensuring consistent behavior across contexts

In other words, invisible prompt injection becomes a lens through which developers can examine the model’s internal hierarchy of influence.

3. A Tool for Building Better Defenses

You can’t defend against what you don’t understand. Controlled invisible prompt injection allows researchers to simulate attacks that malicious actors might attempt. This helps teams design:

Stronger input sanitization
Better content‑filtering pipelines
More resilient prompt‑parsing mechanisms

By studying how injections succeed, developers can build systems that automatically detect and neutralize them. This proactive approach turns a threat into a training mechanism for safer AI.

4. A Method for Evaluating Real‑World Risk

Invisible prompt injection helps researchers evaluate how AI systems behave in messy, unpredictable environments. Real‑world data is full of:

Hidden formatting
Embedded metadata
Unintended instructions
Noisy or adversarial contention

Testing with invisible injections helps developers understand how the model behaves when confronted with ambiguous or corrupted inputs. This leads to AI systems that are more stable, more predictable, and more reliable in everyday use.

5. A Catalyst for Better AI Governance

Invisible prompt injection research encourages organizations to adopt stronger governance practices. It highlights the need for:and

Clear safety protocols
Rigorous red‑team testing
Transparent risk assessments
Continuous monitoring

By treating invisible prompt injection as a legitimate research tool, organizations can build a culture of proactive safety rather than reactive patching.

Closing Statement

Invisible prompt injection is dangerous when used maliciously - but in controlled, ethical research settings, it becomes a powerful instrument for strengthening AI. It exposes weaknesses, reveals hidden dynamics, and helps developers build systems that are more robust, more transparent, and more aligned with human values. By studying how AI responds to invisible manipulation, we gain the insight needed to design models that behave predictably and safely, even in the face of unexpected inputs. In this way, invisible prompt injection isn’t just a threat - it’s also an opportunity to build better, safer AI.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate feature's ability to answer standard general questions, independently on whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

13 August 2017

#️⃣Software Engineering: SQL Reloaded (Patt II: Who Messed with My Data?)

Introduction

"Errors, like straws, upon the surface flow;
He who would search for pearls must dive below."
(John Dryden)

Life of a programmer is full of things that stopped working overnight. What’s beautiful about such experiences is that always there is a logical explanation for such “happenings”. There are two aspects - one is how to troubleshoot such problems, and the second – how to avoid such situations, and this is typically done through what we refer as defensive programming. On one side avoiding issues makes one’s life simpler, while issues make it fuller.

I can say that I had plenty such types of challenges in my life, most of them self-created, mainly in the learning process, but also a good share of challenges created by others. Independently of the time spent on troubleshooting such issues, it’s the experience that counts, the little wins against the “dark” side of programming. In the following series of posts I will describe some of the issues I was confronted directly or indirectly over time. In an ad-hoc characterization they can be split in syntax, logical, data, design and systemic errors.

Syntax Errors

"Watch your language young man!"

(anonymous mother)

Syntax in natural languages like English is the sequence in which words are put together, word’s order indicating the relationship existing between words. Based on the meaning the words carry and the relationships formed between words we are capable to interpret sentences. SQL, initially called SEQUEL (Structured English Query Language) is an English-like language designed to manipulate and retrieve data. Same as natural languages, artificial languages like SQL have their own set of (grammar) rules that when violated lead to runtime errors, leading to interruption in code execution or there can be cases when the code runs further leading to inconsistencies in data. Unlike natural languages, artificial languages interpreters are quite sensitive to syntax errors.

Syntax errors are common to beginners, though a moment of inattention or misspelling can happen to anyone, no matter how versatile one’s coding is. Some are more frequent or have a bigger negative impact than others. Here are some of the typical types of syntax errors:
- missing brackets and quotes, especially in complex formulas;
- misspelled commands, table or column names;
- omitting table aliases or database names;
- missing objects or incorrectly referenced objects or other resources;
- incorrect statement order;
- relying on implicit conversion;
- incompatible data types;
- incorrect parameters’ order;
- missing or misplaced semicolons;
- usage of deprecated syntax.

Typically, syntax errors are easy to track at runtime with minimal testing as long the query is static. Dynamic queries on the other side require sometimes a larger number of combinations to be tested. The higher the number of attributes to be combined and the more complex the logic behind them, the more difficult is to test all combinations. The more combinations not tested, the higher the probability that an error might lurk in the code. Dynamics queries can thus easily become (syntax) error generators.

Logical Errors

"Students are often able to use algorithms to solve numerical problems
without completely understanding the underlying scientific concept."
(Eric Mazur)

One beautiful aspect of the human mind is that it needs only a rough understanding about how a tool works in order to make use of it up to an acceptable level. Therefore often it settles for the minimum of understanding that allows it to use a tool. Aspects like the limits of a tool, contexts of applicability, how it can be used efficiently to get the job done, or available alternatives, all these can be ignored in the process. As the devil lies in details, misunderstanding how a piece of technology works can prove to be our Achilles’ heel. For example, misunderstanding how sets and the different types of joins work, that lexical order differ from logical order and further to order of execution, when is appropriate or inappropriate to use a certain technique or functionality can make us make poor choices.

One of these poor choices is the method used to solve a problem. A mature programming language can offer sometimes two or more alternatives for solving a problem. Choosing the inadequate solution can lead to performance issues in time. This type of errors can be rooted in the lack of understanding of the data, of how an application is used, or how a piece of technology works.

"I suppose it is tempting, if the only tool you have is a hammer,
to treat everything as if it were a nail."
(Abraham Maslow)

Some of the errors derive from the difference between how different programming languages work with data. There can be considerable differences between procedural, relational and vector languages. When jumping from one language to another, one can be tempted to apply the same old techniques to the new language. The solution might work, though (by far) not optimal.

The capital mistake is to be the man of one tool, and use it in all the cases, even when not appropriate. For example. when one learned working with views, attempts to apply them all over the code in order to reuse logic, creating thus chains of views which even prove to be flexible, their complexity sooner or later will kick back. Same can happen with stored procedures and other object types as well. A sign of mastery is when the developer adapts his tools to the purpose.

"For every complex problem there is an answer
that is clear, simple, and wrong."
(Henry L Mencken)

One can build elegant solutions but solve the wrong problem. Misunderstanding the problem at hand is one type of error sometimes quite difficult to identify. Typically, they can be found through thorough testing. Sometimes the unavailability of (quality) data can impede the process of testing, such errors being found late in the process.

At the opposite side, one can attempt to solve the right problem but with logic flaws – wrong steps order, wrong algorithm, wrong set of tools, or even missing facts/assumptions. A special type of logical errors are the programmatic errors, which occur when SQL code encounters a logic or behavioral error during processing (e.g. infinite loop, out of range input). [1]

Data Errors

"Data quality requires certain level of sophistication within a company
to even understand that it’s a problem."
(Colleen Graham)

Poor data quality is the source for all evil, or at least for some of the evil. Typically, a good designed database makes use of a mix of techniques to reduce the chances for inconsistencies: appropriate data types and data granularity, explicit transactions, check constraints, default values, triggers or integrity constraints. Some of these techniques can be too restrictive, therefore in design one has to provide a certain flexibility in the detriment of one of the above techniques, fact that makes the design vulnerable to same range of issues: missing values, missing or duplicate records.

No matter how good a database was designed, sometimes is difficult to cope with users’ ingenuity – misusage of functionality, typically resulting in deviations from standard processes, that can invalidate an existing query. Similar effects have the changes to processes or usage of new processed not addressed in existing queries or reports.

Another topic that have a considerable impact on queries’ correctness is the existence, or better said the inexistence of master data policies and a board to regulate the maintenance of master data. Without proper governance of master data one might end up with a big mess with no way to bring some order in it without addressing the quality of data adequately.

Designed to Fail

"The weakest spot in a good defense is designed to fail."
(Mark Lawrence)

In IT one can often meet systems designed to fail, the occurrences of errors being just a question of time, kind of a ticking bomb. In such situations, a system is only as good as its weakest link(s). Issues can be traced back to following aspects:
- systems used for what they were not designed to do – typically misusing a tool for a purpose for which another tool would be more appropriate (e.g. using Excel as database, using SSIS for real-time, using a reporting tool for data entry);
- poor performing systems - systems not adequately designed for the tasks supposed to handle (e.g. handling large volume of data/transactions);
- systems not coping with user’s inventiveness or mistakes (e.g. not validating adequately user input or not confirming critical actions like deletion of records);
- systems not configurable (e.g. usage of hardcoded values instead of parameters or configurable values);
- systems for which one of the design presumptions were invalidated by reality (e.g. input data don’t have the expected format, a certain resource always exists);
- systems not being able to handle changes in environment (e.g. changing user settings for language, numeric or data values);
- systems succumbing in their own complexity (e.g. overgeneralization, wrong mix of technologies);
- fault intolerant systems – system not handling adequately more or less unexpected errors or exceptions (e.g. division by zero, handling of nulls, network interruptions, out of memory).

Systemic Errors

Systemic errors can be found at the borders of the "impossible", situations in which the errors defy the common sense. Such errors are not determined by chance but are introduced by an inaccuracy inherent to the system/environment.

A systemic error occurs when a SQL program encounters a deficiency or unexpected condition with a system resource (e.g. a program encountered insufficient space in tempdb to process a large query, database/transaction log running out of space). [1]

Such errors are often difficult but not impossible to reproduce. The difficulty resides primarily in figuring out what happened, what caused the error. Once one found the cause, with a little resourcefulness one can come with an example to reproduce the error.

Conclusion

"To err is human; to try to prevent recurrence of error is science."
(Anon)

When one thinks about it, there are so many ways to fail. In the end to err is human and nobody is exempted from making mistakes, no matter how good or wise. The quest of a (good) programmer is to limit errors’ occurrences, and to correct them early in process, before they start becoming a nightmare.

References:
[1] Transact-SQL Programming: Covers Microsoft SQL Server 6.5 /7.0 and Sybase, by Kevin Kline, Lee Gould & Andrew Zanevsky, O’Reilly, ISBN 10: 1565924010, 1999

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

Typically, a database in general, and a table in particular, that follows the normalization rules, is designed to have the columns contain the smallest semantic chunks of data, it could be a Street, a Zip Code, City, a person’s First or Last Name, but also a large chunk of text like a Description or a Comment. No matter how well designed is a database, there will always be the need to do various operations with strings, typically concatenation, extraction of subpart of a string, insertion or deletion of characters, rearangement of string’s characters, trimming, splitting it in substrings, or of getting various numeric values: length, position of a given text, number of not empty characters, on whether the text represents a valid numeric or date values, etc. In the following posts I will attempt to address the respective operations in the context of select statements, and let’s start with concatenation.

Concatenation is the operation of joining two or more string values within an expression. In SQL Server the “+” operator is used for concatenation, and it could be used to concatenate two or more members. In order to concatenate two members from which one of them is a string, the second term needs to be explicitly converted to a string data type, otherwise an error will occur. For readability or post-processing purposes, the strings are concatenated using a delimiter in order to delimit the boundaries of the initial value, it could be used a space, a comma, a semicolon, a pipe, a tab or any other character that could be used typically in delimiting columns.

-- concatenation of strings 
SELECT 'a string' + ' and ' + 'a second string' Example1  
, 'a string' + ',' + 'a second string' Example2  
, '1245' + '67890' Example3  
, '1245' + '.' + '67890' Example4

The concatenation of string variables or columns functions based on the same principles:

-- concatenating string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 varchar(50) 
DECLARE @string3 varchar(50) 

SET @string1 = 'this is a string'  
SET @string2 = 'this is another string'  

SELECT @string1 + ' and ' + @string2 Example1 
, @string1 + char(31) + @string2 Example2 
, @string1 + ', ' + @string2 Example3 
, @string1 + ' ' + @string3 Example4 
, @string1 + IsNull(@string3, '!') Example5

Here’s another example based on the concatenation of columns coming from two joined tables from AdventureWorks database:

-- concatenating columns of joined tables 
SELECT PAD.AddressID 
, IsNull(PAD.AddressLine1, '')  
+ IsNull(', ' + PAD.AddressLine2, '') 
+ IsNull(', ' + PAD.City, '') 
+ IsNull(', ' + PAD.PostalCode, '') 
+ IsNull(', ' + PSP.Name, '') Address 
FROM Person.Address PAD 
      JOIN Person.StateProvince PSP 
         ON PAD.StateProvinceID = PSP.StateProvinceID

As stressed in the previous post, the NULL values need to be adequately handled either by initializing values or by using the IsNull or COALESCE functions. The concatenation of strings combined with IsNull function could be used creatively in order to add a comma only when a value is not null, as in the above example.

There are scenarios in which is needed to concatenate the values belonging to the same column but from different records, for example concatenating the email values in order to send a single email to all the users in one single action. Before the introduction of common table expressions (CTE), wasn’t possible to concatenate the string values belonging to different records, at least not in a query, this functionality being achieved by using cursors or loops, or simply performed on client or intermediate layers. As I already gave an example on how to use cursor in order to loop through the values of a table and concatenate them (see “Cursors and Lists” post), I will focus on the use of loops and simple CTEs.

Loops are one of the basic functionality in programming languages, no matter of their complexity or type. Either if are talking about WHILE, FOR, foreach or do … until loops, the principles are the same: perform a list of actions until one or more conditions are met. In this case the actions performed is reduced to a set of concatenations based on the letters of the (English) alphabet:

-- concatenation within a loop 
DECLARE @list varchar(max) 
DECLARE @index int  
SET @list = '' 
SET @index = ASCII('a') WHILE (@index<ASCII('z')) 
BEGIN 
     SET @list = @list + ', ' + Char(@index) 
     SET @index = @index + 1 
END 
SELECT @list Result

There is more work that needs to be performed in order to remove the leading comma from the output, but that’s a topic for the next post, when discussing about decomposition of strings.

CTEs are a little more complex to use than the loops, though the concatenation could be achieved across records and this in one query and not in procedural language as in the above example. In order delimit the two components of a CTE, I made use of a second CTE which simulates the existence of a given table:

-- concatenation across records 
;WITH Data(Column1, Ranking)  
AS  
( -- preparing test data 
       SELECT 'A' Column1, 0 Ranking  
       UNION ALL  
       SELECT 'B' Column1, 1 Ranking  
       UNION ALL  
       SELECT 'C' Column1, 2 Ranking 
) 
, Result(Column1, Ranking)  
AS 
(  -- performing the actual concatenation 
      SELECT Cast(Column1 as nvarchar(max)) Column1 , Ranking

      FROM Data

      WHERE Ranking = 0  
      UNION ALL 
      SELECT Cast(B.Column1 + ',' + A.Column1 as nvarchar(max)) Column1 , A.Ranking  
      FROM Data A  
         JOIN Result  B  
             ON A.Ranking - 1 = B.Ranking  
) SELECT Column1  
FROM Result  
WHERE Ranking IN (SELECT MAX(Ranking) FROM Result)

The logic for doing a simple concatenation seems maybe complicated, though the volume of work is not so big if we ignore the first CTE. On the other side I introduced an “index” within the Ranking column, fact that allows processing easier the records. When dealing with the records coming from a table it’s probably much easier to use one of the ranking functions that suits best.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

It doesn’t make sense to talk about the creation of string data types without talking about their “selection” as people typically want to see also examples at work, and not only learn about theoretical facts. I’m saying that because often programming seems to be like putting together the pieces of a puzzle without knowing how the final image would look like. Anyway, coming back to the topic, what I find great about SQL Server is that it allows to “create” and select a string with a minimum overhead within a simple select statement:

-- selecting a string value 
SELECT 'this is a string' -- string 1 
, 'and this is a second string' --string2 

-- selecting a string value (named columns)  
SELECT 'this is a string' String1  
, 'and this is a second string' String2

Pretty simple, isn’t it? This kind of scripts could be useful especially when debugging logic or testing an expression (e.g. functions at work), the advantage residing in the fact that is not necessary to built a table in order to test simple things. When performing multiple operations with the same values could be handy to store the values in declared variables:

-- selecting string variables 
DECLARE @string1 varchar(50)

DECLARE @string2 char(50) 
DECLARE @string3 char(50) = 'this is a string'  

SET @string1 = 'this is a string'  

SELECT @string1 String1 , @string2 String2 
, @string3 String3

As can be seen a not initialized variable has by default the value NULL, fact that could lead to unexpected behavior problems when used in expressions involving multiple variables. Therefore it’s recommended to initialize the values with the empty string or at least to handle the nulls in expressions. Starting with SQL Server 2008 it’s possible to declare and initialize variables within the same statement, so the above code won’t work in previous versions, unless the third declaration is broken into two pieces as per definition of first string.

It looks like that’s all, at least in what concerns the direct declaration of a string, be it implicit or explicit. However a basic introduction into strings’ creation is incomplete without mentioning the creation of strings from other data types and the various functions that could be used for this purpose. Ignoring the Cast and Convert functions used to explicitly convert the other data types to string data types, there are several other functions for this purpose, namely Char, Space and Replicate.

Probably some of you are already familiar with the ASCII (American Standard Code for Information Interchange) character-encoding scheme (vs. binary encoding) used to encode files. ASCII code represents a mapping between number and characters, SQL Server supporting the transformation between two sets through the ASCII and Char functions. If ASCII translates a non-unicode character into an integer, the Char function translates an integer value into a non-unicode character. The integer values range between 0 and 255, they encompassing the 0-9 digits, the characters of English together with the diacritics of other important languages, punctuation signs and several graphical characters. The mapping between the two sets is unique, and as can be seen from the below example based on a common table expression, the functions are inverse:

-- ASCII character values 
WITH CTE 
AS (    
    SELECT 0 [Index]  
    UNION ALL 
    SELECT [Index]+1 [Index]  
    FROM CTE 
    WHERE [Index]<255 
) SELECT [Index] 
, CHAR([Index]) [Character] 
, ASCII(CHAR([Index])) [ASCII] 
FROM CTE 
OPTION (MAXRECURSION 0)

There is not much to say about the Space and Replicate functions, the Space function returns a string of repeated spaces, while the Replicate function forms a string as a repeated sequence of values. The definition of the Space function could be considered as redundant as long the same output could be obtained by using the space as repeating sequence.

-- Space & Replicate at work 
SELECT Space(1) Example1 
, Space(5) Example2 
, Replicate(' ', 1) Example3 
, Replicate(' ', 5) Example4 
, Replicate('10', 1) Example5 
, Replicate('10', 2) Example6 
, Replicate('10', 3) Example7

As per the last statement, the first and third examples, respectively the second and fourth example will return the same values, unfortunately the output of the respective examples it’s not so easy to see. For exemplification, it could have been enriched by comparing or concatenating the strings, though that’s a topic for the next post.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

The (character) string or simply the character data type, how is named by the MSDN documentation, is one of the primary data types available in a database and probably the most complex given the fact that it can encompass any combination of numeric, literal, date or any other primary data type. In fact it could include any type of chunk of text that could be written in any language as SQL Server supports Unicode and thus most of the (written) languages. In this post I’m not intending to make a complete anthology of strings, and neither to retake the whole volume of information available in MSDN or other important books. My intent is to look at strings from a slightly different perspective, considering the various functions involving strings and how they could be used in order to provide various functionality, in fact the cornerstone of everyday developer.

A few things to remember:

1. there were mainly two types of non-unicode strings: the char (or character) of fixed length, respectively the varchar of variable length (varying character)

2. if initially both types of strings were having a maximum length of 8000 of characters, with SQL Server it’s possible to have a varchar with maximum storage size, declared as varchar(max).

3. if in the past there were some strict recommendations in what concerns the use of char or varchar, nowadays the varchar tends to be used almost everywhere, even for the strings of length 1.

4. talking about length, it denotes the number of chracters a string stores.

5. the trailing spaces, the spaces found at the right extremity of a string are typically ignored, while the leading spaces, the spaces found at the left extremity, are not ignored.

6. starting with SQL Server 2000, for the two character data types were introduced the corresponding unicode data types prefixed with n (national): nchar, respectively nvarchar.

7. given the fact that a unicode character needs more space to store the same non-unicode string, actually the number of bits doubles, the maximum length for an unicode string is only 4000.

8. there is also a non-unicode text, respectively ntext unicode data type, designed to store maximum length, though as it seems they could become soon deprecated, so might be a good idea to avoid it.

9. not-initialized variables, including strings, have the value NULL, referred also the NULL string, therefore it’s always a good idea to initialize your variables.

10. by empty string string is designated the string containing no character “’’”, and has the length 0.

11. there are several specific functions available for the creation, manipulation and conversion of strings from and to other data types.

12. not all of the aggregated functions work with string values (specifically the ones requesting a number value like SUM, AVG, STDV).

13. the operations performed on strings of different data types are generally not impacted by this aspect, though there are some exceptions.

14. there are several (basic) operations with strings, typically concatenation, extraction of subpart of a string, insertion, replacement or deletion of characters, rearrangement of string’s characters, trimming, splitting it in substrings (decomposition), etc.

15. there are several numeric values based on strings: length, position of a given text in a text, number of not empty characters, encoding of a character, on whether the text represents a valid numeric or date values, etc.

16 March 2010

🔏MS Office: Excel for SQL Developers IV (Differences Between Two Datasets)

One of the tasks which appears from time to time on my table is to determine the differences between two datasets coming from different systems or from the same system/source but taken at different points in time. Even if it’s fairly simple to write such a query, especially when the number of attributes is quite small, there is lot of repetitive work that needs to be done because typically for each set of matched attributes needs to be added a third attribute (difference flag) showing whether there is a difference or not. In time I started to use Excel formulas to create the SELECT clause for such a query, matching tables’ attribute first and categorizing the data types mainly in 4 categories: text, amounts, numeric (other than amounts) and dates. This categorization is requested first of all by the different ways in handling the NULL values for each data type, and secondly by the difference between amounts and numeric, even if they might have the same data type, the difference resides in the fact that amounts might be rounded differently in each data source, therefore a difference of a few cents will not be considered as difference.

Given two attributes ColumnX and ColumnY from tables A, respectively B, let’s look first on how the difference flag constraint could be written for each category:

--text attributes: 
CASE 
    WHEN IsNull(A.ColumnX , '')<> ISNULL(B.ColumnY, '') THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--amount attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) - IsNull(B.ColumnY, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--numeric attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) <> IsNull(B.ColumnY, 0) THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--date attributes: 
CASE 
   WHEN IsNull(DateDiff(d, A.ColumnX, B.ColumnY), -1)<>0 THEN 'Y' 
   ELSE 'N' 
END DiffColumnXYFlag

Notes:
1.   Bit attributes can be treated as numeric as long as they are considered as having a bi-state, for tri-state values in which also NULL is considered as a distinct value then the constraint must be changed, the most natural way being to translate the NULL to –1:
CASE
    WHEN IsNull(A.ColumnX, -1) <> IsNull(B.ColumnY, -1) THEN 'Y'
    ELSE 'N'
END DiffColumnXYFlag
2.   In most of the examples I worked with the difference between two pair dates, the difference was calculated at day level, though it might happen that is needed to compare the values at smaller time intervals to the order of hours, minutes or seconds. The only thing that needs to be changed then is the first parameter from DateDiff function. There could be also situations in which a difference of several seconds is acceptable, a BETWEEN operator could be used then as per the case of numeric vs. amount values.
3.    In case one of the attributes is missing, the corresponding difference flag could take directly the value ‘N’ or ‘n/a’.
4.   It could happen that there are mismatches between the attributes’ data type, in this case at least one of them must be converted to a form that could be used in further processing.

    Thus a macro for this purpose would take as input a range with the list of attributes from the two tables, the data type category and the columns participating in the join constraint, two parameters designating the name of the left and right table participating in the FULL OUTER JOIN, the time interval considered, the error margin value interval (e.g. [-e, e]) and a flag indicating whether to show all combined data or only the records for which there is at least a difference found.

Function GetComparisonQuery(ByVal rng As Range, ByVal LeftTable As String, ByVal RightTable As String, ByVal TimeInterval As String, ByVal ErrorMargin As String, ByVal ShowOnlyDifferencesFlag As Boolean) As String 
'builds the code for a comparison query between two tables 
Dim attributes As String 
Dim differences As String 
Dim constraint As String 
Dim whereConstraints As String 
Dim joinConstraints As String 
Dim columnX As String 
Dim columnY As String 
Dim index As Integer 
For index = 1 To rng.Rows.Count 
    columnX = Trim(rng.Cells(index, 1).Value) 
    columnY = Trim(rng.Cells(index, 2).Value) 
  
    If Len(columnX) > 0 Or Len(columnY) > 0 Then 
       If Len(columnX) > 0 Then 
            attributes = attributes & ", A." & columnX & " LT" & columnX & vbCrLf 
       End If 
       If Len(columnY) > 0 Then 
            attributes = attributes & ", B." & columnY & " RT" & columnX & vbCrLf 
       End If 
       
       constraint = "" 
       If Len(Trim(rng.Cells(index, 4).Value)) = 0 Then 
            If Len(columnX) > 0 And Len(columnY) > 0 Then 
                 'creating the difference flag 
                 Select Case Trim(rng.Cells(index, 3).Value) 
                 Case "text": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , '') <> IsNUll(B." & columnY & ", '') THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "amount": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) - IsNUll(B." & columnY & ", 0) NOT BETWEEN -" & ErrorMargin & " AND " & ErrorMargin & " THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "numeric": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) <> IsNUll(B." & columnY & ", 0) THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "date": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN DateDiff(" & TimeInterval & ", A." & columnX & ", B." & columnY & ")<>0 THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case Else: 'error 
                     MsgBox "Incorrect data type provided for " & index & " row!", vbCritical 
                 End Select 
                 
                
                If ShowOnlyDifferencesFlag Then 
                   whereConstraints = whereConstraints & " OR " & constraint & " = 'Y'" & vbCrLf 
                End If 
                
                differences = differences & ", " & constraint & " Diff" & columnX & "Flag" & vbCrLf 
            Else 
                differences = differences & ", 'n/a' Diff" & IIf(Len(columnX) > 0, columnX, columnY) & "Flag" & vbCrLf 
            End If 
                             
        Else 
            joinConstraints = joinConstraints & "    AND A." & columnX & " = B." & columnY & vbCrLf 
        End If 
     
     End If 
Next 
If Len(attributes) > 0 Then 
    attributes = Right(attributes, Len(attributes) - 2) 
End If 
If Len(joinConstraints) > 0 Then 
    joinConstraints = Right(joinConstraints, Len(joinConstraints) - 8) 
End If 
If Len(whereConstraints) > 0 Then 
    whereConstraints = Right(whereConstraints, Len(whereConstraints) - 4) 
End If 
'building the comparison query 
GetComparisonQuery = "SELECT " & attributes & _ 
    differences & _ 
    "FROM " & LeftTable & " A" & vbCrLf & _ 
    "     FULL OUTER JOIN " & RightTable & " B" & vbCrLf & _ 
    "       ON " & joinConstraints & _ 
    IIf(ShowOnlyDifferencesFlag And Len(whereConstraints) > 0, "WHERE " & whereConstraints, "") 
   
End Function

The query returned by the macro for the above example based on attributes from Production.Product table from AdventureWorks database and Production.Products table created in Saving Data With Stored Procedures post:

SELECT A.ProductID LTProductID 
, B.ProductID RTProductID 
, A.Name LTName 
, B.Name RTName 
, A.ProductNumber LTProductNumber 
, B.ProductNumber RTProductNumber 
, A.MakeFlag LTMakeFlag 
, B.MakeFlag RTMakeFlag 
, A.FinishedGoodsFlag LTFinishedGoodsFlag 
, B.FinishedGoodsFlag RTFinishedGoodsFlag 
, A.Color LTColor 
, B.Color RTColor 
, A.SafetyStockLevel LTSafetyStockLevel 
, B.SafetyStockLevel RTSafetyStockLevel 
, A.ReorderPoint LTReorderPoint 
, A.StandardCost LTStandardCost 
, B.StandardCost RTStandardCost 
, A.ListPrice LTListPrice 
, B.ListPrice RTListPrice 
, A.Size LTSize 
, A.DaysToManufacture LTDaysToManufacture 
, A.ProductLine LTProductLine 
, A.Class LTClass 
, A.Style LTStyle 
, A.SellStartDate LTSellStartDate 
, B.StartDate RTSellStartDate 
, A.SellEndDate LTSellEndDate 
, B.EndDate RTSellEndDate 
, A.DiscontinuedDate LTDiscontinuedDate 
, B.CreationDate RT 
, B.CreatedBy RT 
, A.ModifiedDate LTModifiedDate 
, B.ModifiedDate RTModifiedDate 
, B.ModifiedBy RT 
, CASE 
    WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
    ELSE 'N' 
END DiffNameFlag 
, CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
    ELSE 'N' 
END DiffProductNumberFlag 
, CASE 
    WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffMakeFlagFlag 
, CASE 
    WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffFinishedGoodsFlagFlag 
, CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
    ELSE 'N' 
END DiffColorFlag 
, CASE 
    WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
    ELSE 'N' 
END DiffSafetyStockLevelFlag 
, 'n/a' DiffReorderPointFlag 
, CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05       
     THEN 'Y' 
ELSE 'N' 
END DiffStandardCostFlag 
, CASE 
     WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
     ELSE 'N' 
END DiffListPriceFlag 
, 'n/a' DiffSizeFlag 
, 'n/a' DiffDaysToManufactureFlag 
, 'n/a' DiffProductLineFlag 
, 'n/a' DiffClassFlag 
, 'n/a' DiffStyleFlag 
, CASE 
    WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellStartDateFlag 
, CASE 
    WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellEndDateFlag 
, 'n/a' DiffDiscontinuedDateFlag 
, 'n/a' DiffCreationDateFlag 
, 'n/a' DiffCreatedByFlag 
, CASE 
     WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
     ELSE 'N' 
END DiffModifiedDateFlag 
, 'n/a' DiffModifiedByFlag 
FROM Production.Product A 
    FULL OUTER JOIN Production.Products B 
       ON A.ProductID = B.ProductID 
WHERE CASE 
     WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05 
      THEN 'Y' 
ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
       WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y'

Notes:
1.   The macro doesn’t consider an ORDER BY clause, though it could be easily added manually
2.   Not all of the join constraints are so simple so that they can be reduced to one or more simple equalities, on the other side we have to consider that the most time consuming task is listing the attributes and the difference flags.
3.    Sometimes it’s easier to create two extracts – in the first being considered all the records from the left table and the matches from the right table (left join), respectively all the records from the right table and the matches from the left table (right join).
4.     Given the fact that the attributes participating in the join clause should in theory match, each pair of such attributes could be merged in one attribute using the formula: IsNull(A.ColumnX, B.ColumnY) As ColumnXY.
5.     In order to show all the data from the two tables and not only the differences, all is needed to do is to change the value of the last parameter from true to false:
=GetComparisonQuery(A2:D23, "Production.Product", "Production.Products", "d", "0.05", false)
6.     For TimeInterval parameter should be provided only the values taken by DatePart parameter (first parameter) of SQL Server’s DateDiff function.
7.     Please note that no validation is made for the input parameters.

SQL Troubles

Pages