SQL Troubles: SQL Reloaded: Troubleshooting Query Issues III (Logical errors)

20 June 2010

SQL Reloaded: Troubleshooting Query Issues III (Logical errors)

    In last post I provided some general guidelines on how to troubleshoot query issues related to errors thrown by the database engine. That’s a fortunate case, because engine’s validation shows that something went wrong. I’m saying fortunate because there are also cases in which there are also logical errors that result in unexpected output, and the errors could remain even for years undiscovered. It’s true that such errors are hard to discover, but not impossible; adequate testing combined with defensive programming techniques could allow decreasing the volume of such errors. The simplest and handiest test could be performed by looking in a first phase at the number of records returned by the query, and secondly by studying the evolution of cardinality with each table added to the query. The clearest signal that something went bad is when no records, too less or too many records are returned (than expected), then the problem most probably resides in the constraints used, namely in JOIN, WHERE or HAVING clause constraints.

WHERE clause constraints

    If your query is just a simple statement without joins of any type, then more likely the problem resides in the constraints used in the WHERE clause, at least one of them being evaluated to false. Here are several possible reasons:
- One of the hardcoded values or one of the provided parameters is not found between the values available in the table;
- Not treated NULL cases;
- Wrong use of predicates or parenthesis.

    The simplest constraint evaluated to false is “0=1” or something similar, actually can be used any similar values (e.g. ‘A’=’B’, ‘ABC’=’’, etc.). In contrast “1=1” is evaluated always to true. The below query returns no records because the constraint is evaluated for false:

-- single constraint evaluated to false 
SELECT * 
FROM Production.Product 
WHERE 1=0

A more complex scenario is when multiple predicates and at least one equates always to true or to false, the task in such situations being to identify such constraints:

-- multiple constraints, one evaluated to false 
SELECT * 
FROM Production.Product 
WHERE 1=0 
    AND Class = 'L' 
    AND SafetyStockLevel >100

The constraints could be grouped also using parenthesis:

-- multiple constraints with parenthesis, one evaluated to false 
SELECT * 
FROM Production.Product 
WHERE 1=0 
    AND ((Class = 'L' 
    AND SafetyStockLevel >100) 
      OR (Class = 'M' 
    AND SafetyStockLevel =100))

Dealing with multiple such constraints is not a question of guessing but pure applied Boolean algebra essential in writing accurate queries. So if you are feeling that you’re not mastering such concepts, then maybe it’s a good idea to consult some material on this topic.

There could be situations in which the total outcome of a set of constraints it’s not so easy to see, in such cases the constraint could be brought in the SELECT statement. Which constraints should be tested depends also on the particularities of your query, here is an example:

--testing constraints' outcome for each record 
SELECT * 
, CASE  
     WHEN Class = 'L' AND SafetyStockLevel >100 THEN 1 
     ELSE 0 
  END Constraint1  
, CASE  
     WHEN Class = 'M' AND SafetyStockLevel =100 THEN 1 
     ELSE 0 
  END Constraint2 
, CASE  
     WHEN (Class = 'L' AND SafetyStockLevel >100) 
        OR (Class = 'M' AND SafetyStockLevel =100) THEN 1 
     ELSE 0 
END Constraint3 
FROM Production.Product

The constraints in which one of the members is NULL and the IS NULL or Null functions are not used, are incorrect evaluated, actually ignored, and in certain cases they might look even correct, though this depends also on the expressions used. For example the below query will return no records, while the next one will return some records. For more on the correct handling of NULLs see Null-ifying the world or similar posts on the web.

--misusing NULL-based constraints 
  SELECT * 
FROM Production.Product 
WHERE (NOT 1=NULL) OR (1=NULL)

JOIN constraints

JOINs based on single JOIN constraints should in theory pose no problem unless the wrong attributes are used in the JOIN and that’s so easy to do especially when no documentation is available on the topic or it is incomplete. Functional or other technical specifications, physical or semantic models, metadata or other developers’ knowledge could help the developer to figure out about the relations between the various entities. On the other side it’s in human nature to make mistakes, forgetting to add one constraint, using the wrong type of join or the wrong attributes in the join constraint. In addition all the situations described above for WHERE clauses constraints apply also to join constraints, independently on whether the ANSI or non-ANSI syntax is used, the later based on writing the JOIN constraints in the WHERE clause, making the identification of the JOIN constraints not so easy to see, this being one of the reasons for which I recommend you to use ANSI syntax.

A technique I use on a daily basis in order to test my scripts is to check the changes in the number of records with the adding of each new table to the initial query. In a first phase is enough to count the number of records from the table with the lowest level of detail, and the number of records from the final query, this ignoring the WHERE constraints. For example let’s consider the following query written some time ago (most of attributes were removed for convenience):

-- Purchase Orders 
 SELECT POD.PurchaseOrderDetailID 
, POD.PurchaseOrderID 
, POD.ProductID  
, POH.ShipMethodID 
, POH.VendorID 
FROM Purchasing.PurchaseOrderDetail POD 
     JOIN Purchasing.PurchaseOrderHeader POH 
       ON POD.PurchaseOrderID = POH.PurchaseOrderID 
           JOIN Purchasing.ShipMethod PSM 
              ON POH.ShipMethodID = PSM.ShipMethodID 
           JOIN Purchasing.Vendor SMF 
              ON POH.VendorID = SMF.VendorID  
     JOIN Production.Product ITM 
        ON POD.ProductID = ITM.ProductID 
WHERE POD.StockedQty >100 
     AND ITM.Class = 'L'

Usually I write the table with the lowest level of detail (in theory also with the highest number of records) first, adding the referenced table gradually, starting with the tables with the lowest number of records, thus the lookup tables will come first. This usually doesn’t affect the way the database engine processes the query (unless special techniques are used), and it allows to use kind of a systematic approach. This allows me also to test the query without making important changes. As I wrote above, the first test is done against the first table, for this commenting the other joins and eventually the WHERE constraints:

-- testing changes in the number of records - test 1 
SELECT count(1) NumberRecords 
FROM Purchasing.PurchaseOrderDetail POD 
/* JOIN Purchasing.PurchaseOrderHeader POH 
     ON POD.PurchaseOrderID = POH.PurchaseOrderID 
          JOIN Purchasing.ShipMethod PSM 
             ON POH.ShipMethodID = PSM.ShipMethodID 
          JOIN Purchasing.Vendor SMF 
             ON POH.VendorID = SMF.VendorID 
      JOIN Production.Product ITM 
         ON POD.ProductID = ITM.ProductID 
WHERE POD.StockedQty >100 
AND ITM.Class = 'L' */

In the second step is enough to check the number of records returned by the whole query without the WHERE constraints:

-- testing changes in the number of records - test 2 
SELECT count(1) NumberRecords 
FROM Purchasing.PurchaseOrderDetail POD  
     JOIN Purchasing.PurchaseOrderHeader POH 
        ON POD.PurchaseOrderID = POH.PurchaseOrderID 
            JOIN Purchasing.ShipMethod PSM 
               ON POH.ShipMethodID = PSM.ShipMethodID 
            JOIN Purchasing.Vendor SMF 
               ON POH.VendorID = SMF.VendorID  
      JOIN Production.Product ITM 
          ON POD.ProductID = ITM.ProductID 
/* WHERE POD.StockedQty >100 
     AND ITM.Class = 'L' */

If the number of records is correct then most probably the query is correct, this because there could be incorrect LEFT JOINS that are not reflected in the number of records but in the fact that the corresponding attributes are missing. So don’t rely entirely on this type of test, but take 1-2 examples for which you are sure that records must be retrieved from the LEFT JOIN, and check whether the expected values are shown. Eventually, only for testing purposes, the LEFT JOIN could be modified as a FULL JOIN to see what records are returned.

Now if there are changes in the number of records from one test to the other, then take the first query and move the comment one join further (see below), and repeat the step until the JOIN that causes the variances is identified.

-- testing changes in the number of records - test 3  
SELECT count(1) NumberRecords 
FROM Purchasing.PurchaseOrderDetail POD 
JOIN Purchasing.PurchaseOrderHeader POH 
ON POD.PurchaseOrderID = POH.PurchaseOrderID 
/*      JOIN Purchasing.ShipMethod PSM 
           ON POH.ShipMethodID = PSM.ShipMethodID 
         JOIN Purchasing.Vendor SMF 
            ON POH.VendorID = SMF.VendorID 
     JOIN Production.Product ITM 
        ON POD.ProductID = ITM.ProductID 
WHERE POD.StockedQty >100       
AND ITM.Class = 'L' */

Once the join that causes the variances is found, then might be requested then to add a new join constraint or use a LEFT JOIN instead of a FULL JOIN. For example if in the above query there are ShipMethodID with NULL values, then the query should have been written using a LEFT JOIN. On the other side, if duplicates are found, then do a grouping by using the primary key of the table with the lowest level of detail, in this case the Purchasing.PurchaseOrderDetail.

HAVING clause constraints

HAVING clause constraints behave more like WHERE clause constraints, the difference residing in the different levels where the constraints are applied, so the same technique could be applied in this case too, with the difference that the constraints need to be accommodated to support aggregate functions.

--Vendors with total purchase value >100000 
SELECT POH.VendorID 
, SUM(POH.SubTotal) TotalSubTotal 
FROM Purchasing.PurchaseOrderHeader POH 
WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete  
GROUP BY POH.VendorID 
HAVING SUM(POH.SubTotal)>100000

Sometimes it helps to bring the HAVING constraints into the SELECT, though that offers limited functionality, therefore it’s more productive maybe to re-link the aggregated query back to the base table(s) on which the statement is based:

--Vendors with total purchase value >100000 - base records 
SELECT A.* 
, B.TotalSubTotal 
, CASE 
     WHEN B.VendorID IS NOT NULL THEN 1 
     ELSE 0 
END ConsideredFlag 
FROM Purchasing.PurchaseOrderHeader A 
LEFT JOIN ( --aggregated data 
      SELECT POH.VendorID 
      , SUM(POH.SubTotal) TotalSubTotal 
      FROM Purchasing.PurchaseOrderHeader POH 
      WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete  
      GROUP BY POH.VendorID 
HAVING SUM(POH.SubTotal)>100000 
) B 
     ON A.VendorID = B.VendorID 
WHERE A.Status IN (2,4) -- 2-Approved, 4-Complete

Also this technique could prove to be limited, though I found it useful in several cases, especially when a complicated formula is included in the aggregated function. The same query could be rewritten with a window aggregate function introduced with SQL Server 2005.

--Vendors with total purchase value &gt;100000 - window aggregate function 
SELECT POH.* 
, SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID) TotalSubTotal 
, CASE 
     WHEN SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID)>100000 THEN 1 
     ELSE 0 
END ConsideredFlag 
FROM Purchasing.PurchaseOrderHeader POH 
WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete

When working with amounts, quantities or other number values, it could be useful to check the total amount/quantity based on the query logic against the total amount/quantity based on the base table.

SQL Troubles

Pages

20 June 2010

SQL Reloaded: Troubleshooting Query Issues III (Logical errors)

No comments:

About Me