SQL Troubles: CASE

Showing posts with label CASE. Show all posts

18 February 2010

💎SQL Reloaded: Just in CASE V (Dynamic Queries)

Stored procedures allow not only to encapsulate logic, enhance queries’ performance by cashing query plans or reduce the risks of injection by using parameters, they allow also altering the structure of a query at runtime, modifying for example the query based on the provided parameters. Such behavior is ideal when needed to base the logic of a query on a list of filters (filter controls) in which some of the attributes are not mandatory. A dynamic created query can be executed using the EXEC function or the sp_executesql stored procedure, the later allowing parameters’ substitution, ensuring thus parameters’ data types, another benefit residing in the fact that the execution plans are more likely to be reused by SQL Server, Microsoft recommending thus the use of sp_executesql over Exec [1]. Now it depends also on how much the queries vary between requests and how easy is to handle the missing parameters or parameter’s substitution – sometimes quite a difficult topic for a beginner. Therefore for exemplifying the techniques I will use Exec, and here is the example:

-- Retrieving Products 
CREATE PROCEDURE dbo.pSearchProducts( 
@ProductID int , 
@ProductSubcategoryID int , 
@ProductModelIDs varchar(500) , 
@ProductName nvarchar(50) , 
@ProductNumber nvarchar(25) , 
@MakeFlag bit , 
@FinishedGoodsFlag bit , 
@StartStandardCost money , 
@EndStandardCost money , 
@StartSellStartDate varchar(10) , 
@EndSellStartDate varchar(10) , 
@OrderBy nvarchar(500) ) 
AS 
BEGIN 
DECLARE @Query varchar(2000) 
-- translating Order By clause 
IF LEN(@OrderBy)&gt;0 
BEGIN 
    SET @OrderBy = ',' + @OrderBy  
    SET @OrderBy = Replace(@OrderBy, ',0', ', ProductNumber') 
    SET @OrderBy = Replace(@OrderBy, ',1', ', ProductName') 
    SET @OrderBy = Replace(@OrderBy, ',2', ', ProductModel') 
    SET @OrderBy = Replace(@OrderBy, ',3', ', SellStartDate') 
    SET @OrderBy = Replace(@OrderBy, ',4', ', StandardCost') 
    SET @OrderBy = Right(@OrderBy, Len(@OrderBy)-1) 
END 
-- building the query 
SET @Query = 'SELECT ProductID, ProductName, ProductNumber, ProductModel, CatalogDescription, ProductSubcategory' 
+ ', MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice' 
+ ', Size, SizeUnitMeasureCode, SizeUnitMeasure, WeightUnitMeasureCode, WeightUnitMeasure, Weight' 
+ ', DaysToManufacture, ProductLine, Class, Style, dbo.GetDateAsString(SellStartDate) SellStartDate' 
+ ', dbo.GetDateAsString(SellEndDate) SellEndDate, dbo.GetDateAsString(ModifiedDate) ModifiedDate' 
+ ' FROM Production.vProducts' 
+ ' WHERE 0=0' 
+ CASE IsNull(@ProductID, 0) 
     WHEN 0 THEN '' 
    ELSE ' AND ProductID = ' + cast(@ProductID as varchar(10)) 
END 
+ CASE IsNull(@ProductSubcategoryID, 0) 
     WHEN 0 THEN '' 
     ELSE ' AND ProductSubcategoryID = ' + cast(@ProductSubcategoryID as varchar(10)) 
END 
+ CASE IsNull(@ProductModelIDs, '') 
    WHEN '' THEN '' 
    ELSE ' AND ProductModelID IN (' + @ProductModelIDs + ')' 
END  
+ CASE IsNull(@ProductName, '') 
    WHEN '' THEN '' 
    ELSE ' AND ProductName LIKE ''' + @ProductName + '''' 
END  
+ CASE IsNull(@ProductNumber, '') 
    WHEN '' THEN '' 
    ELSE ' AND ProductNumber LIKE ''' + @ProductNumber + '''' 
END  
+ CASE IsNull(@MakeFlag, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND MakeFlag = 1' 
END  
+ CASE IsNull(@FinishedGoodsFlag, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND FinishedGoodsFlag = 1' 
END  
+ CASE IsNull(@StartStandardCost, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND StandardCost &gt;= ' + cast(@StartStandardCost as varchar(20)) 
END  
+ CASE IsNull(@EndStandardCost, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND StandardCost &lt;= ' + cast(@EndStandardCost as varchar(20)) 
END 
+ CASE IsNull(@StartSellStartDate, '') 
    WHEN '' THEN '' 
    ELSE ' AND DateDiff(d, SellStartDate, dbo.GetStringDate(''' + @StartSellStartDate + '''))&lt;=0' 
END  
+ CASE IsNull(@EndSellStartDate, '') 
     WHEN '' THEN '' 
     ELSE 'AND DateDiff(d, SellStartDate, dbo.GetStringDate(''' + @EndSellStartDate + '''))&gt;=0' 
END  
+ ' ORDER BY ' 
+ CASE LEN(@OrderBy) 
    WHEN 0 THEN ' ProductNumber' -- default sorting 
     ELSE @OrderBy -- chosen sorting 
END 
-- executing the query 
EXEC (@Query)  
END

The above stored procedure could split in 5 logical section:

1. The definition holding stored procedure’s name and parameters
Parameters’ selection depends on the requirements, usually the natural and database key for the main entity or for master data (e.g. Customer, Vendor, Item Number, etc.), the LOV-based attributes used most often as selection criteria (e.g. Statuses, Characteristics), flags (e.g. Make Flag, Finished Goods Flag) that have important meaning for the business, dates used to partition the result set (e.g. Creation Date, Document Date, GL Date) being in general needed to provide the lower and upper bound of the interval. For special type of analysis it’s useful to include also a set of controls for specifying the numeric intervals, like in this case for Standard Cost.
In general the parameters have the same date type as the attribute they represent, with a few exceptions:
dates – because the different date formats used in the various environments (e.g. client vs. web server vs. database server) I prefer to send the dates as string, thus assuring that the date will be always in the expected format.
multi-choice attributes – in case is needed to handle multi-choice selections then the easiest way to handle this is to provide a list of concatenated values as a string, as in @ProductModelIDs’ case.
flags – sometimes I prefer to use characters instead of boolean (0,1) values because it offers more flexibility in handling triple or multi-state situations.

2. The declaration of internal variables, here the @Query holding the dynamic built query, the variable needs to be big enough to hold the whole concatenated string with maximum list of parameters and ORDER BY attributes.

3. The translation of ORDER BY attributes here stored in @OrderBy input parameter, which contains a list with numbers from 0 to 9 that will dictate the attributes used in ORDER BY clause and their order. For example will be sent ‘3,1’ to sort the result set based on SellStartDate and ProductName.

4. The actual concatenation of the dynamic query, the handling for NULL values being handled with a set of CASE functions, existing one CASE for each parameter, including @OrderBy.

5. Query’s execution, here done using Exec.

Notes:
Usually I’m including a 6th section which includes several metadata about the current object (e.g. who created it and when, when was last time modified, objects that call current object, the purpose of the object, etc.).
It’s needed to test such a procedure for all parameters – either by providing all parameters as in the first below example, or creating a test case for each parameter, the later approach being much safer because when multiple parameters are provided, one of them could hide an existing issue.

EXEC dbo.pSearchProducts 973, 2, '25,26,27', 'Road%', 'BK%', 1, 1, 1000, 1200, '01/01/2003', '31/12/2003', '' -- all parameters 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 0, 0, 0, 0, '', '', '' -- all records 
EXEC dbo.pSearchProducts 1, 0, '', '', '', 0, 0, 0, 0, '', '', '' -- ProductID test 
EXEC dbo.pSearchProducts 0, 2, '', '', '', 0, 0, 0, 0, '', '', '' -- ProductSubcategoryID test 
EXEC dbo.pSearchProducts 0, 0, '3, 4, 5', '', '', 0, 0, 0, 0, '', '', '2,1' -- ProductModelIDs test 
EXEC dbo.pSearchProducts 0, 0, '', 'Mountain%', '', 0, 0, 0, 0, '', '', '1' -- ProductName test 
EXEC dbo.pSearchProducts 0, 0, '', '', 'BK-T18U%', 0, 0, 0, 0, '', '', '2' -- ProductNumber test 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 1, 1, 0, 0, '', '', '' -- MakeFlag &amp; FinishedGoodsFlag test 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 0, 0, 50, 110, '', '', '' -- StandardCost test 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 0, 0, 0, 0, '01/01/2001', '31/12/2002', '3,1' -- SellStartDate test

Is it really needed to create a dynamic query in order to handle the various filtering scenarios? In theory yes, there are several techniques that could be used to address a special type of parameters:

1. Single values excepting and dates

In such cases the CASE could be replaced with the IsNull function test on parameter, if the parameter is null then take the value of the attribute against which the test is performed. For example the following CASE:

CASE IsNull(@ProductID, 0) 
    WHEN 0 THEN '' 
     ELSE ' AND ProductID = ' + cast(@ProductID as varchar(10)) 
END

could be replaced with the following constraint:

ProductID = IsNull(@ProductID, ProductID)

A similar technique could be used also for interval-based attributes like the StandardCost from our example, the two above cases could be replaced with the following constraint:

StandardCost BETWEEN ISNULL(@StartStandardCost, StandardCost) AND ISNULL(@EndStandardCost, StandardCost)

Note:
In case the attributes contain also NULL values then the Null value need to be replaced with a special value, typically not found in the list of values the respective attribute could take, for example –1 for foreign keys, an incredibly small number for numeric data types, single or empty space for string data types.

2. List of values

The list of values are a little more difficult to handle but not impossible, for this scenario could be used the dbo.SplitList function introduced in a previous post. For example the CASE for @ProductModelIDs could be rewritten as follows:

IsNull (ProductModelID, -1) IN (SELECT value FROM dbo.SplitList(IsNull(NullIf(@ProductModelIDs, ''), CAST(IsNull(ProductModelID, -1) as varchar(20)) ), ','))

The IsNull(ProductModelID, -1) expression has been introduced in order to handle the Null values, while the expression for the first parameters of dbo.SplitList was written in this way in order to handle the cases in which the @ProductModelIDs was not provided or it contains no values.

Unfortunately this technique even if it solves the problem, is not quite efficient because the dbo.SplitList function is called for each record, fact that impacts query’s performance considerably. On the other side if the @ProductModelIDs would be mandatory than the dbo.SplitList could be joined with the main query, fact that increases query’s performance, though it must be paid attention to the duplicates, because each duplicate found in the list provided to dbo.SplitList function could lead to duplicates in the final result!

3. Dates

Because the Date functions can’t be used errorless with the BETWEEN operator in SQL Server, the DateDiff function being preferred for such case, the handing of Date intervals could be achieved by providing the lower/upper bound of the interval date types used, for example ‘0001-01-01’, respectively ‘9999-12-31’ for date data type, or ‘1900-01-01’, respectively ‘2079-06-06’ for smalldatetime data type. or ‘1753-01-01’, respectively ‘9999-12-31’ for datetime data type. Because the dbo.GetDateAsString function returns a datetime I will use the lower/uppers bounds of the respective data type in case the date values are null, thus the two CASEs for SellStartDate could be rewritten as:

DateDiff (d, SellStartDate, dbo.GetStringDate(IsNull(@StartSellStartDate, '01-01-1753')))&lt;=0 
AND DateDiff(d, SellStartDate, dbo.GetStringDate(IsNull(@EndSellStartDate, '31-12-2009')))&gt;=0

In case is needed to return also the dates having Null value, then the SellStatDate could be replaced with any of the two bounds of the data type used, thus the above constraint becomes:

DateDiff (d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(@StartSellStartDate, '01-01-1753')))&lt;=0 
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(@EndSellStartDate, '31-12-2009')))&gt;=0

Unfortunately, I haven’t found any easy way to handle the different attribute combinations that could appear in the ORDER BY clause. On the other side there are controls that handle the sorting on the client, therefore a default ORDER BY could be provided, and if the Users need other sorting, then they could sort the data by themselves. Here’s an alterative writing for the above stored procedure without using a dynamic query, though this doesn’t mean that the performance of this approach is better than the one of the first stored procedure given the fact that a constraint is evaluated for each parameter, the unnecessary constraints for list of values and dates coming with a considerable decrease in performance.

  -- Retrieving Products w/o dynamic query 
CREATE PROCEDURE dbo.pSearchProductsND( 
@ProductID int , @ProductSubcategoryID int , @ProductModelIDs varchar(500) , @ProductName nvarchar(50) , @ProductNumber nvarchar(25) , @MakeFlag bit , @FinishedGoodsFlag bit , @StartStandardCost money , @EndStandardCost money , @StartSellStartDate varchar(10) , @EndSellStartDate varchar(10)) 
AS 
BEGIN -- the query SELECT ProductID, ProductName, ProductNumber, ProductModel, CatalogDescription, ProductSubcategory 
,MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice 
,Size, SizeUnitMeasureCode, SizeUnitMeasure, WeightUnitMeasureCode, WeightUnitMeasure, Weight 
,DaysToManufacture, ProductLine, Class, Style, dbo.GetDateAsString(SellStartDate) SellStartDate 
,dbo.GetDateAsString(SellEndDate) SellEndDate, dbo.GetDateAsString(ModifiedDate) ModifiedDate 
FROM Production.vProducts 
WHERE IsNull(ProductID, -1) = COALESCE(NullIf(@ProductID, 0), ProductID, -1) 
AND IsNull(ProductSubcategoryID, -1) = COALESCE(NullIf(@ProductSubcategoryID, 0), ProductSubcategoryID, -1) 
AND IsNull(ProductModelID, -1) IN (SELECT value FROM dbo.SplitList(IsNull(NullIf(@ProductModelIDs, ''), CAST(IsNull(ProductModelID, -1) as varchar(20)) ), ','))  
AND IsNull(ProductName, ' ') LIKE IsNull(NullIf(@ProductName, ''), IsNull(ProductName, ' '))  
AND IsNull(ProductNumber, ' ') LIKE IsNull(NullIf(@ProductNumber, ''), IsNull(ProductNumber, ' '))  
AND IsNull(MakeFlag, 0) = COALESCE(NullIf(@MakeFlag, 0), MakeFlag, 0)  
AND IsNull(FinishedGoodsFlag, 0) = COALESCE(NullIf(@FinishedGoodsFlag, 0), FinishedGoodsFlag, 0) 
AND StandardCost BETWEEN ISNULL(NullIf(@StartStandardCost, 0), StandardCost) AND ISNULL(NullIf(@EndStandardCost, 0), StandardCost) 
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(NullIf(@StartSellStartDate, ''), '01-01-1753')))&lt;=0 
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(NullIf(@EndSellStartDate, ''), '31-12-2009')))&gt;=0 
ORDER BY ProductNumber END

Given the format in which parameters are sent (0 for numeric values, empty string for string values) instead of using Nulls, I had to transform the parameters to Null in case they take the value 0 or empty string, this being the only difference between the constraints used in the query and the example constraints from above. In this way the stored procedures should return the same number of records for the same test cases, here are the test cases for the second stored procedure:

EXEC dbo.pSearchProductsND 973, 2, '25,26,27', 'Road%', 'BK%', 1, 1, 1000, 1200, '01/01/2003', '31/12/2003' -- all parameters 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 0, 0, 0, 0, '', '' -- all records 
EXEC dbo.pSearchProductsND 1, 0, '', '', '', 0, 0, 0, 0, '', '' -- ProductID test 
EXEC dbo.pSearchProductsND 0, 2, '', '', '', 0, 0, 0, 0, '', '' -- ProductSubcategoryID test 
EXEC dbo.pSearchProductsND 0, 0, '3, 4, 5', '', '', 0, 0, 0, 0, '', '' -- ProductModelIDs test 
EXEC dbo.pSearchProductsND 0, 0, '', 'Mountain%', '', 0, 0, 0, 0, '', '' -- ProductName test 
EXEC dbo.pSearchProductsND 0, 0, '', '', 'BK-T18U%', 0, 0, 0, 0, '', '' -- ProductNumber test 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 1, 1, 0, 0, '', '' -- MakeFlag &amp; FinishedGoodsFlag test 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 0, 0, 50, 110, '', '' -- StandardCost test 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 0, 0, 0, 0, '01/01/2001', '31/12/2002' -- SellStartDate test

Now it depends which method you will want to use, frankly I like the dynamic query approach because it offers more flexibility in dealing with parameters and in handling other types of requirements. In addition, if it happens that you use the second approach, the request could change and you might need to redesign your stored procedure using the first approach.

References:

[1] MSDN (2009) Using sp_executesql

15 February 2010

💎⌛SQL Reloaded: Oracle vs. SQL Server (Handling Missing Dates)

What do we do with missing values? How do they influence the data analysis? Two questions each developer should ask users when creating/modifying a report. In general for numbers things are pretty simple, just replace the NULL values with a 0, this impacting the report minimally. With Dates is a little more complicated because maybe is needed to remove records having a certain date null, replace it with another date representing a certain event or maybe with the current date.

As NULLIF, COALESCE and CASE functions are available in both systems, they could be used in particular to handle missing dates or other type of data types, especially when intended to make code portable between the two platforms. In addition SQL Server provides the IsNull function, the equivalent in Oracle being the NVL, extended by NVL2. Oracle provides also an equivalent for simple CASE function, namely the DECODE function. Here are some examples based on Product table from AdventureWorks database:

-- SQL Server NULL handling 
SELECT ProductID, SellStartDate, SellEndDate, DiscontinuedDate, GetDate() CurrentDate 
, IsNull(SellEndDate, GetDate()) Example1 
, COALESCE(SellEndDate, GetDate()) Example3 
, COALESCE(DiscontinuedDate, SellEndDate, GetDate()) Example4 
, CASE 
     WHEN SellEndDate IS NULL THEN GetDate() 
     ELSE SellEndDate 
END Example6 
, NullIf(SellEndDate, SellStartDate) Example7 
FROM Production.Product 

-- Oracle NULL handling 
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, DISCONTINUEDDATE, SYSDATE 
, NVL(SELLENDDATE, SYSDATE) EXAMPLE1 
, NVL2(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE2 
, COALESCE(SELLENDDATE, SYSDATE) EXAMPLE3 
, COALESCE(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE4 
, DECODE(SELLENDDATE, NULL, SYSDATE, SELLENDDATE) EXAMPLE5 
, CASE 
    WHEN SELLENDDATE IS NULL THEN SYSDATE 
    ELSE SELLENDDATE 
END EXAMPLE6 
, NULLIF(SELLENDDATE, SELLSTARTDATE) EXAMPLE7 
FROM PRODUCT

The simple CASE function can’t be use to check for NULL values unless the NULL value is implied within the ELSE branch, and this because the IS NULL clause needs to be used for checking whether a value is missing. An exception from this rule seems to be the DECODE function, as can be seen from 5th example from the Oracle-based query.

Aggregated functions ignore NULL values in both systems, therefore any of the above techniques could be used to handle the missing values if other behavior is expected. The selection of the date used in case the target date is missing depends on the requirements, usually is used the current date when the reports focus on today’s status, while for the calculation of lead/cycle times is preferred to use the closest date to the target date.

-- SQL Server: handling nulls in aggregations 
SELECT AVG(DateDiff(d, SELLSTARTDATE, IsNull(SELLENDDATE, GETDATE()))) -- handling null values 
, AVG(DateDiff(d,SELLSTARTDATE, SELLENDDATE)) -- ignoring null values 
FROM Production.Product 

-- Oracle: handling nulls in aggregations 
SELECT AVG(TRUNC(NVL(SELLENDDATE, SYSDATE)) - TRUNC(SELLSTARTDATE)) -- handling null values 
, AVG(TRUNC(SELLENDDATE) - TRUNC(SELLSTARTDATE)) -- ignoring null values 
FROM PRODUCT

Another scenario when missing date values need to be handled is in joins, often for such cases a missing value denoting that the record is still active, like in the below example:

-- SQL Server: getting the Start Date for current List Price 
SELECT ITM.ProductID 
, ITM.ProductNumber 
, ITM.ListPrice 
, PPH.StartDate 
FROM Production.Product ITM 
JOIN Production.ProductListPriceHistory PPH 
ON ITM.ProductID = PPH.ProductID 
AND PPH.EndDate IS NULL

The above query could be rewritten also in the following form that can be useful to get the record in use during a certain event:

-- SQL Server: getting the List Price when the product was last time modified 
SELECT ITM.ProductID 
, ITM.ProductNumber 
, PPH.ListPrice 
, PPH.StartDate 
, PPH.EndDate 
, ITM.ModifiedDate 
FROM Production.Product ITM 
JOIN Production.ProductListPriceHistory PPH 
ON ITM.ProductID = PPH.ProductID 
AND DateDiff(d, ITM.ModifiedDate, IsNull(PPH.EndDate, GETDATE()))>=0 
AND DateDiff(d, PPH.StartDate, ITM.ModifiedDate)>=0

In Oracle special attention must be given to LEAST and GREATEST functions that consider the smallest/greatest values from a list of attributes because if one of the vales is NULL then function’s value is NULL too, therefore if one of the attributes could take NULL values then it should be replaced with the values from a NOT NULL attribute used in the functions, like in 3rd and 4th example from the below example, this change having no impact on functions’ output.

-- Oracle Greatest/Least example 
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, SYSDATE 
, GREATEST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE1 
, LEAST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE2 
, GREATEST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE3 
, LEAST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE4 
FROM PRODUCT

05 February 2010

💎SQL Reloaded: Just in CASE IV (Other Scenarios)

Two days ago I was looking on the Web on random posts to see whether I missed something important related to CASE function and I stumble over an old post of Craig S. Mullins, in one of the examples (See SQL Statement #5) there is a query similar with the one below:

As can be seen on more than one branch of a CASE there is a correlated sub-query based on the same table, the aggregated value being used for further calculations. Most probably the respective query had only the purpose to demonstrate a technique, though I’m having mainly two observations related to it:
1. The query returns a label of whether the Stock is High, Normal, Low or there is no Stock, though I have no measure of what the Stock actually is, thus in a first phase I can’t validate whether the query shows the correct data. In case a report is based on the respective query this reduces considerably also report's usability.
2. A select is replicated across several CASE branches fact that I find it complicates query’s complexity reflected in query’s understanding and maintainability, while considering that the logic needs to be kept in synchronization increases the chances for making a mistake. At least in this situation, being involved only one table, the logic is quite simple though what do you do when the correlated sub-query is more complex?

How about encapsulating query’s logic in a sub-query and join it to the main tables using a left join? This approach would address both issues mentioned above:

    When creating a query there is always an important facts that needs to be considered - query’s performance. The second query is simpler and in theory it should be easier to process, expecting to have at least similar performance as the first version. In this case the ProductID is a foreign key in Production.ProductInventory, the search performed on the respective table having minimum impact on performance. If no index is available on the searched attribute, more likely first query’s performance will decrease considerably. The best approach for mitigating the performance differences between the two queries is to look at Client Statistics and Execution Plan. In what concerns the Client Statistics both queries are having similar performance, while the Execution Plan of the second query excepting the fact that is simpler and, without going into details, it seems the second plan is better.

    Maybe it makes sense to use correlated sub-queries in a CASE only when the correlated sub-queries have distinct logic, though even then I would recommend using a left join instead, this technique allowing more flexibility being possible to show the actual values in the query and even reuse them if needed in additional calculations.

    Talking about correlated queries, I found cases in which two distinct correlated sub-queries with the same logic were used to pull two distinct attributes available in the same table, something like in the below example.

What if the correlated sub-queries are used again with the CASE function like in the below example?

Even if maybe the number of calls to the correlated sub-queries is reduced by using a CASE, I would recommend using a left join instead, this technique offering more flexibility, the logic becoming also much easier to maintain and debug.

04 February 2010

💎SQL Reloaded: Just In CASE III (Quest for Performance)

Arranging branches

Given the fact that the CASE function stops after the first match, in case there are no two expressions that evaluate to true for any same value, the CASE could be optimized for the best performance by arranging the branches in the order of the highest number of matches. For example taking the logic from the 3rd example from Part I and checking the number of occurrences for each value, surprisingly (or maybe not) there are 257 NULL values, 97 ‘L’ values, 82 ‘H’ values, respectively 68 ‘M’ values. If we change the branches in this order, the number of evaluations will decrease from 82*1+68*2+97*3+257*3=1280 to 257*1+97*2+82*3+68*3=901, the difference is small though when applied to big data sets the improvement could pay the effort. Here is the modified logic for Example 3:

This technique is unfortunately not bulletproof – even if less probable in many cases, the order of the values’ occurrences could change overnight, the number of total evaluations varying in time on the number of occurrences for each value.

Rewriting Simple Unions

Have you ever seen a set of similar queries whose output is merged with the help of UNION (ALL) operator? I’ve seen so many queries like the below one, in which the only difference between the sub-queries was residing in the WHERE constraint(s) applied and maybe a few additional calculations!

The example is quite simple, though I think it demonstrates the point. The UNION could be translated to a simple CASE as in the 3rd example. The problem with such queries is that if there is no index on the attribute(s) used is the WHERE constraint, it will be performed a full table scan for each sub-query, quite expensive from a performance standpoint. Even if the respective attribute is indexed, there’s still a pay in performance. Such an approach could be maybe acceptable when in each scenario different tables are joined to the main table, though even then should be checked which one of the two approaches has better performance. A recent example I can recall and in which could have been used several CASE functions instead of the UNION was based on the JOIN of two tables, the query looking something like:

Rewriting Complex Unions

Even if when the base table is used in several union-based merged queries, it doesn’t sense to apply this technique all the times, especially when dealing with complex queries using multiple joins. Now it depends, there are scenarios and scenarios, is must be always considered the trade in performance, readability and usability of a query when applying any technique. The Person.Address from AdventureWorks is the best way to exemplify two scenarios in which it makes sense and doesn’t makes sense to combine the logic in two union queries. The respective table stored the addresses for Customers (Sales & Individuals), Vendors and Employees, thus supposing that we would like to see the Owners and Owner Type for each Address we would need to create several unions.

Let’s consider first the Customer-based Addresses – as there are two types of Customers and because the details for each type are stored in different tables, we might end up creating the following query:

As can be seen the two queries are quite similar, most of the tables joined are the same with 3 exceptions: Sales.Store vs. Sales.Individual & Person.Contact. The query could be rewritten using a CASE and left joins instead of inner joins as follows:

Similar queries could be written in order to get the Vendor and Employee details, and as can be seen also these queries share many similarities.

Does it makes sense to use the above technique also in this case? Maybe it would be acceptable to merge the Vendor with Employee Addresses queries, though does it make sense to do merge also the Customer Addresses using CASE & left joins. The queries are not so complex so it’s possible to do that, though I think a decision should be taken only when has been studied the performance of the two approaches.

Pivoting Behavior

Another technique when a CASE could help eliminate several joins to the same table is presented in a post on List Based Aggregations, the CASE function being used together with aggregated functions in order to cumulate the On Hand for several locations and show it on the same line. This approach was quite useful before the PIVOT operator was introduced in SQL Server 2005 (see Pivot operator example), and is still is in case is needed to select more than 1 attribute for each line.

SQL Troubles

Pages