20 June 2010

Troubleshooting – Part III: Troubleshooting query issues: Logical errors

    In last post I provided some general guidelines on how to troubleshoot query issues related to errors thrown by the database engine. That’s a fortunate case, because engine’s validation shows that something went wrong. I’m saying fortunate because there are also cases in which there are also logical errors that result in unexpected output, and the errors could remain even for years undiscovered. It’s true that such errors are hard to discover, but not impossible; adequate testing combined with defensive programming techniques could allow decreasing the volume of such errors. The simplest and handiest test could be performed by looking in a first phase at the number of records returned by the query, and secondly by studying the evolution of cardinality with each table added to the query. The clearest signal that something went bad is when no records, too less or too many records are returned (than expected), then the problem most probably resides in the constraints used, namely in JOIN, WHERE or HAVING clause constraints.

WHERE clause constraints

    If your query is just a simple statement without joins of any type, then more likely the problem resides in the constraints used in the WHERE clause, at least one of them being evaluated to false. Here are several possible reasons:
- One of the hardcoded values or one of the provided parameters is not found between the values available in the table;
- Not treated NULL cases;
- Wrong use of predicates or parenthesis.

    The simplest constraint evaluated to false is “0=1” or something similar, actually can be used any similar values (e.g. ‘A’=’B’, ‘ABC’=’’, etc.). In contrast “1=1” is evaluated always to true. The below query returns no records because the constraint is evaluated for false:
-- single constraint evaluated to false
SELECT *
FROM
Production.Product

WHERE
1=0


    A more complex scenario is when multiple predicates and at least one equates always to true or to false, the task in such situations being to identify such constraints:
-- multiple constraints, one evaluated to false
SELECT *
FROM
Production.Product

WHERE
1=0
    AND Class = 'L'

    AND SafetyStockLevel >100


    The constraints could be grouped also using parenthesis:
-- multiple constraints with parenthesis, one evaluated to false
SELECT *
FROM
Production.Product

WHERE
1=0
    AND ((Class = 'L'

    AND SafetyStockLevel >100)

      OR (Class = 'M'

    AND SafetyStockLevel =100))


    Dealing with multiple such constraints is not a question of guessing but pure applied Boolean algebra essential in writing accurate queries. So if you are feeling that you’re not mastering such concepts, then maybe it’s a good idea to consult some material on this topic.

    There could be situations in which the total outcome of a set of constraints it’s not so easy to see, in such cases the constraint could be brought in the SELECT statement. Which constraints should be tested depends also on the particularities of your query, here is an example:
--testing constraints' outcome for each record
SELECT *
,
CASE
     WHEN Class = 'L' AND SafetyStockLevel >100 THEN 1
     ELSE 0
  END Constraint1

,
CASE
     WHEN Class = 'M' AND SafetyStockLevel =100 THEN 1
     ELSE 0
  END Constraint2

,
CASE
     WHEN (Class = 'L' AND SafetyStockLevel >100)

        OR (Class = 'M' AND SafetyStockLevel =100) THEN 1
     ELSE 0
END Constraint3

FROM
Production.Product 

  The constraints in which one of the members is NULL and the IS NULL or Null functions are not used, are incorrect evaluated, actually ignored, and in certain cases they might look even correct, though this depends also on the expressions used. For example the below query will return no records, while the next one will return some records. For more on the correct handling of NULLs please see Null-ifying the world or similar posts on the web.

--misusing NULL-based constraints
SELECT *
FROM
Production.Product

WHERE
(NOT 1=NULL) OR (1=NULL)


JOIN constraints

    JOINs based on single JOIN constraints should in theory pose no problem unless the wrong attributes are used in the JOIN and that’s so easy to do especially when no documentation is available on the topic or it is incomplete. Functional or other technical specifications, physical or semantic models, metadata or other developers’ knowledge could help the developer to figure out about the relations between the various entities. On the other side it’s in human nature to make mistakes, forgetting to add one constraint, using the wrong type of join or the wrong attributes in the join constraint. In addition all the situations described above for WHERE clauses constraints apply also to join constraints, independently on whether the ANSI or non-ANSI syntax is used, the later based on writing the JOIN constraints in the WHERE clause, making the identification of the JOIN constraints not so easy to see, this being one of the reasons for which I recommend you to use ANSI syntax.

    A technique I use on a daily basis in order to test my scripts is to check the changes in the number of records with the adding of each new table to the initial query. In a first phase is enough to count the number of records from the table with the lowest level of detail, and the number of records from the final query, this ignoring the WHERE constraints. For example let’s consider the following query written some time ago (most of attributes were removed for simplicity):
-- Purchase Orders
SELECT POD.PurchaseOrderDetailID
,
POD.PurchaseOrderID

,
POD.ProductID

,
POH.ShipMethodID

,
POH.VendorID

FROM
Purchasing.PurchaseOrderDetail POD
     JOIN Purchasing.PurchaseOrderHeader POH
       ON POD.PurchaseOrderID = POH.PurchaseOrderID
           JOIN Purchasing.ShipMethod PSM
              ON POH.ShipMethodID = PSM.ShipMethodID
           JOIN Purchasing.Vendor SMF
              ON POH.VendorID = SMF.VendorID
     JOIN Production.Product ITM
        ON POD.ProductID = ITM.ProductID

WHERE
POD.StockedQty >100
     AND ITM.Class = 'L'


    Usually I write the table with the lowest level of detail (in theory also with the highest number of records) first, adding the referenced table gradually, starting with the tables with the lowest number of records, thus the lookup tables will come first. This usually doesn’t affect the way the database engine processes the query (unless special techniques are used), and it allows to use kind of a systematic approach. This allows me also to test the query without making important changes. As I wrote above, the first test is done against the first table, for this commenting the other joins and eventually the WHERE constraints:
-- testing changes in the number of records - test 1
SELECT count(1) NumberRecords
FROM
Purchasing.PurchaseOrderDetail POD

/* JOIN Purchasing.PurchaseOrderHeader POH
     ON POD.PurchaseOrderID = POH.PurchaseOrderID
          JOIN Purchasing.ShipMethod PSM
             ON POH.ShipMethodID = PSM.ShipMethodID
          JOIN Purchasing.Vendor SMF
             ON POH.VendorID = SMF.VendorID
      JOIN Production.Product ITM
         ON POD.ProductID = ITM.ProductID
WHERE POD.StockedQty >100
AND ITM.Class = 'L' */


    In the second step is enough to check the number of records returned by the whole query without the WHERE constraints:
-- testing changes in the number of records - test 2
SELECT count(1) NumberRecords
FROM
Purchasing.PurchaseOrderDetail POD

     JOIN Purchasing.PurchaseOrderHeader POH
        ON POD.PurchaseOrderID = POH.PurchaseOrderID
            JOIN Purchasing.ShipMethod PSM
               ON POH.ShipMethodID = PSM.ShipMethodID
            JOIN Purchasing.Vendor SMF
               ON POH.VendorID = SMF.VendorID
      JOIN Production.Product ITM
          ON POD.ProductID = ITM.ProductID

/* WHERE POD.StockedQty >100
     AND ITM.Class = 'L' */

    If the number of records is correct then most probably the query is correct, this because there could be incorrect LEFT JOINS that are not reflected in the number of records but in the fact that the corresponding attributes are missing. So don’t rely entirely on this type of test, but take 1-2 examples for which you are sure that records must be retrieved from the LEFT JOIN, and check whether the expected values are shown. Eventually, only for testing purposes, the LEFT JOIN could be modified as a FULL JOIN to see what records are returned.

    Now if there are changes in the number of records from one test to the other, then take the first query and move the comment one join further (see below), and repeat the step until the JOIN that causes the variances is identified. 
 
-- testing changes in the number of records - test 3
SELECT count(1) NumberRecords
FROM
Purchasing.PurchaseOrderDetail POD
JOIN Purchasing.PurchaseOrderHeader POH
ON POD.PurchaseOrderID = POH.PurchaseOrderID

/*      JOIN Purchasing.ShipMethod PSM
           ON POH.ShipMethodID = PSM.ShipMethodID
         JOIN Purchasing.Vendor SMF
            ON POH.VendorID = SMF.VendorID
     JOIN Production.Product ITM
        ON POD.ProductID = ITM.ProductID
WHERE POD.StockedQty >100
AND ITM.Class = 'L' */ 

    Once the join that causes the variances is found, then might be requested then to add a new join constraint or use a LEFT JOIN instead of a FULL JOIN. For example if in the above query there are ShipMethodID with NULL values, then the query should have been written using a LEFT JOIN. On the other side, if duplicates are found, then do a grouping by using the primary key of the table with the lowest level of detail, in this case the Purchasing.PurchaseOrderDetail.

HAVING clause constraints

    HAVING clause constraints behave more like WHERE clause constraints, the difference residing in the different levels where the constraints are applied, so the same technique could be applied in this case too, with the difference that the constraints need to be accommodated to support aggregate functions.
--Vendors with total purchase value >100000
SELECT POH.VendorID
,
SUM(POH.SubTotal) TotalSubTotal

FROM
Purchasing.PurchaseOrderHeader POH

WHERE
POH.Status IN (2,4) -- 2-Approved, 4-Complete

GROUP
BY POH.VendorID

HAVING
SUM(POH.SubTotal)>100000


    Sometimes it helps to bring the HAVING constraints into the SELECT, though that offers limited functionality, therefore it’s more productive maybe to re-link the aggregated query back to the base table(s) on which the statement is based:
--Vendors with total purchase value >100000 - base records
SELECT A.*
,
B.TotalSubTotal

,
CASE

     WHEN B.VendorID IS NOT NULL THEN 1
     ELSE 0
END ConsideredFlag

FROM
Purchasing.PurchaseOrderHeader A
LEFT JOIN ( --aggregated data

      SELECT POH.VendorID
      , SUM(POH.SubTotal) TotalSubTotal
      FROM Purchasing.PurchaseOrderHeader POH
      WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete

      GROUP BY POH.VendorID
HAVING SUM(POH.SubTotal)>100000
) B
     ON A.VendorID = B.VendorID

WHERE
A.Status IN (2,4) -- 2-Approved, 4-Complete

 
    Also this technique could prove to be limited, though I found it useful in several cases, especially when a complicated formula is included in the aggregated function. The same query could be rewritten with a window aggregate function introduced with SQL Server 2005.
--Vendors with total purchase value >100000 - window aggregate function
SELECT POH.*
,
SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID) TotalSubTotal

,
CASE

     WHEN SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID)>100000 THEN 1
     ELSE 0
END ConsideredFlag

FROM
Purchasing.PurchaseOrderHeader POH

WHERE
POH.Status IN (2,4) -- 2-Approved, 4-Complete

    When working with amounts, quantities or other number values, it could be useful to check the total amount/quantity based on the query logic against the total amount/quantity based on the base table.

No comments: