Showing posts with label date functions. Show all posts
Showing posts with label date functions. Show all posts

05 April 2010

SQL Reloaded: The Power of Joins VI (Not-Equal Joins)

    SQL Server 2008 documentation mentions the not-equal joins without giving an explicit definition, the term appearing within the context of joining values in two columns that are not equal. On the other side the not-equal join denomination could be used to refer more generally to the joins whose join constraints involve comparison operators other than equal (“=”), for example <, >, <>, BETWEEN, IN, and its negation NOT IN, LIKE, etc. The difference between the two views resides on the fact that some of the operators (e.g. IN, BETWEEN) could imply the equality of values from joined columns. Another objection could be made on whether the not-equality join constraint could be based only on one of the table’s attributes participating in the join, the other thus missing, such constraints could be considered also as join constraints even if in some scenarios they could have been written in the main WHERE clause as well. For the sake of simplicity and flexibility I would consider a not-equal join or not-equality join as a join in which at least one of the join constraints is based on a operator other than the equal operator. Could be discussed thus about not-equal join constraints referring to the join constraints that involve other operators than equal, and equal join constraints, the two types of joins could coexist in the same join. A not-equal join constraint often involves the possibility of being returned more records then if a equal join constraint was used.

      If we talk about join constraints then more likely we refer to vertical joins, though also anti-joins and semi-joins could include sporadically not-equality operators. In a normal join based on equalities for the set of attributes participating in the join from the left table are matched all the records from the right table, even if this means retrieving more than one record. Same happens also for not-equal constraints even if the business scenarios in which such needs arises are not so intuitive. One relatively simple example I often met is the case when is needed to calculate the incremental aggregation of values, for example to calculate the incremental volume of PO placed over time:
 
SELECT DII.DateSequence 
, IsNull(POH.SubTotal, 0) SubTotal 
FROM dbo.fGetDatesInInterval('2002-01-01', '2002-01-31') DII 
    CROSS APPLY (--incremental PO Volume 
     SELECT SUM(POH.SubTotal) SubTotal 
    FROM Purchasing.PurchaseOrderHeader POH 
   WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete  
       AND DATEDIFF(d, POH.OrderDate, DII.DateSequence)&gt;=0 --not-equal constraint 
    ) POH 

    In order to simplify the query I used the dbo.fGetDatesInInterval table-valued function that returns a sequence of dates within a given time interval, and a CROSS APPLY returning the aggregated value for each date returned by the function, for the calculations being considered all the records having an Order Date smaller or equal with the given (sequence) date. As can be seen from the above query the not-equal constraint is based on DateDiff function, instead we could have written POH.OrderDate<=DII.DateSequence, one of the reasons for its use is the fact that the constraint could be easily modified to show for example only the aggregated values for the past 90 days, the BETWEEN operator entering in scene:
 
SELECT DII.DateSequence 
, IsNull(POH.SubTotal, 0) SubTotal 
FROM dbo.fGetDatesInInterval('2002-01-01', '2002-01-31') DII 
     CROSS APPLY (--incremental PO Volume 
    SELECT SUM(POH.SubTotal) SubTotal 
    FROM Purchasing.PurchaseOrderHeader POH 
    WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete  
        AND DATEDIFF(d, POH.OrderDate, DII.DateSequence) BETWEEN 0 AND 90--not-equal constraint 
    ) POH 

   The not-equal constraint from the query is part of interval-based constraints category which covers the scenarios in which one of the attributes from the tables participating in a join is checked whether it falls within a given interval, typically that being valid for numeric values as above or date value, as in the next query that shows the Last Standard Cost Details per Product.  

-- Products &amp; Last Standard Cost details SELECT ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, PCH.StartDate 
, PCH.EndDate 
FROM Production.Product ITM 
    JOIN Production.ProductCostHistory PCH 
     ON ITM.ProductID = PCH.ProductID AND GETDATE() BETWEEN PCH.StartDate AND IsNull(PCH.EndDate, GETDATE()) 

    Such queries need often to be written when the records in a table have a given validity, no overlap existing between intervals.

    Using list of values within a join constraint and thus the IN or NOT IN operators occurs when for example the base table stores data coming from multiple sources and more encodings are used for the same source, each of them caring its meaning. For example using the Production.TransactionHistory and several other tables coming with AdventureWorks database, and supposing that for Purchase Orders are two transaction types P and B, and for Sales Orders S and I, a general scope query on Transactions could be written as follows:
   
-- Transaction History 

SELECT PTH.TransactionID 
, PTH.ProductID 
, PTH.ReferenceOrderID 
, CASE PTH.TransactionType 
     WHEN 'W' THEN 'Work Order' 
    WHEN 'S' THEN 'Sales Order' 
    WHEN 'I' THEN 'Internal Sales Order' 
    WHEN 'P' THEN 'Purchase Order' 
    WHEN 'B' THEN 'Blanket Purchase Order' 
    ELSE 'n.a' 
  END TransactionType , CASE  
    WHEN PTH.TransactionType = 'W' THEN StartDate 
    WHEN PTH.TransactionType IN ('S', 'I') THEN SOH.OrderDate 
    WHEN PTH.TransactionType IN ('P', 'B') THEN POH.OrderDate 
  END ReferenceDate FROM Production.TransactionHistory PTH 
    LEFT JOIN Purchasing.PurchaseOrderHeader POH       ON PTH.ReferenceOrderID = POH.PurchaseOrderID   AND PTH.TransactionType IN ('P', 'B') --not-equal constraint 
   LEFT JOIN Sales.SalesOrderHeader SOH 
     ON PTH.ReferenceOrderID = SOH.SalesOrderID 
  AND PTH.TransactionType IN ('S', 'I')  --not-equal constraint 
    LEFT JOIN Production.WorkOrder PWO 
      ON PTH.ReferenceOrderID = PWO.WorkOrderID 
   AND PTH.TransactionType = 'W' 

    The need for writing such a query is seldom and could have been written as 3 distinct queries whose results are joined with two unions, both approaches having their pluses and minuses. 

   In one of the above queries was used a constraint based on DateDiff function, and in theory any other function could be used in a join constraint, including user-defined functions. Such function-based join constraints are handy but should be used with caution because they could impact query’s performance. Pattern-based join constraints used with LIKE operator could be used as well.

    There are cases in which the constraints that typically would be included in the WHERE clause are added in the body of the join, apparently without any good reason. Such based denatured join constraints are base only on the attributes of one of the tables involved in the join, like in the next query:
   
-- PO for Products with StandardCost&gt;10 SELECT POD.PurchaseOrderDetailID  
, POD.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Purchasing.PurchaseOrderDetail POD  
    JOIN Production.Product ITM       ON POD.ProductID = ITM.ProductID 
  AND ITM.StandardCost&gt;10  -- denaturated join constraint

16 March 2010

MS Office: Excel for SQL Developers IV (Differences Between Two Datasets)

    One of the tasks which appears from time to time on my table is to determine the differences between two datasets coming from different systems or from the same system/source but taken at different points in time. Even if it’s fairly simple to write such a query, especially when the number of attributes is quite small, there is lot of repetitive work that needs to be done because typically for each set of matched attributes needs to be added a third attribute (difference flag) showing whether there is a difference or not. In time I started to use Excel formulas to create the SELECT clause for such a query, matching tables’ attribute first and categorizing the data types mainly in 4 categories: text, amounts, numeric (other than amounts) and dates. This categorization is requested first of all by the different ways in handling the NULL values for each data type, and secondly by the difference between amounts and numeric, even if they might have the same data type, the difference resides in the fact that amounts might be rounded differently in each data source, therefore a difference of a few cents will not be considered as difference.

    Given two attributes ColumnX and ColumnY from tables A, respectively B, let’s look first on how the difference flag constraint could be written for each category:

--text attributes: 
CASE 
    WHEN IsNull(A.ColumnX , '')<> ISNULL(B.ColumnY, '') THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--amount attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) - IsNull(B.ColumnY, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--numeric attributes: 
CASE 
    WHEN IsNull(A.ColumnX, 0) <> IsNull(B.ColumnY, 0) THEN 'Y' 
    ELSE 'N' 
END DiffColumnXYFlag 

--date attributes: 
CASE 
   WHEN IsNull(DateDiff(d, A.ColumnX, B.ColumnY), -1)<>0 THEN 'Y' 
   ELSE 'N' 
END DiffColumnXYFlag 


Notes:
1.   Bit attributes can be treated as numeric as long as they are considered as having a bi-state, for tri-state values in which also NULL is considered as a distinct value then the constraint must be changed, the most natural way being to translate the NULL to –1:
CASE
    WHEN IsNull(A.ColumnX, -1) <> IsNull(B.ColumnY, -1) THEN 'Y'
    ELSE 'N'

END
DiffColumnXYFlag

2.   In most of the examples I worked with the difference between two pair dates, the difference was calculated at day level, though it might happen that is needed to compare the values at smaller time intervals to the order of hours, minutes or seconds. The only thing that needs to be changed then is the first parameter from DateDiff function. There could be also situations in which a difference of several seconds is acceptable, a BETWEEN operator could be used then as per the case of numeric vs. amount values.
3.    In case one of the attributes is missing, the corresponding difference flag could take directly the value ‘N’ or ‘n/a’.
4.   It could happen that there are mismatches between the attributes’ data type, in this case at least one of them must be converted to a form that could be used in further processing.

    Thus a macro for this purpose would take as input a range with the list of attributes from the two tables, the data type category and the columns participating in the join constraint, two parameters designating the name of the left and right table participating in the FULL OUTER JOIN, the time interval considered, the error margin value interval (e.g. [-e, e]) and a flag indicating whether to show all combined data or only the records for which there is at least a difference found. 

Function GetComparisonQuery(ByVal rng As Range, ByVal LeftTable As String, ByVal RightTable As String, ByVal TimeInterval As String, ByVal ErrorMargin As String, ByVal ShowOnlyDifferencesFlag As Boolean) As String 
'builds the code for a comparison query between two tables 
Dim attributes As String 
Dim differences As String 
Dim constraint As String 
Dim whereConstraints As String 
Dim joinConstraints As String 
Dim columnX As String 
Dim columnY As String 
Dim index As Integer 
For index = 1 To rng.Rows.Count 
    columnX = Trim(rng.Cells(index, 1).Value) 
    columnY = Trim(rng.Cells(index, 2).Value) 
  
    If Len(columnX) > 0 Or Len(columnY) > 0 Then 
       If Len(columnX) > 0 Then 
            attributes = attributes & ", A." & columnX & " LT" & columnX & vbCrLf 
       End If 
       If Len(columnY) > 0 Then 
            attributes = attributes & ", B." & columnY & " RT" & columnX & vbCrLf 
       End If 
       
       constraint = "" 
       If Len(Trim(rng.Cells(index, 4).Value)) = 0 Then 
            If Len(columnX) > 0 And Len(columnY) > 0 Then 
                 'creating the difference flag 
                 Select Case Trim(rng.Cells(index, 3).Value) 
                 Case "text": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , '') <> IsNUll(B." & columnY & ", '') THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "amount": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) - IsNUll(B." & columnY & ", 0) NOT BETWEEN -" & ErrorMargin & " AND " & ErrorMargin & " THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "numeric": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN IsNull(A." & columnX & " , 0) <> IsNUll(B." & columnY & ", 0) THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case "date": 
                      constraint = "CASE" & vbCrLf & _ 
                                   "     WHEN DateDiff(" & TimeInterval & ", A." & columnX & ", B." & columnY & ")<>0 THEN 'Y'" & vbCrLf & _ 
                                   "     ELSE 'N'" & vbCrLf & _ 
                                   "  END" 
                 Case Else: 'error 
                     MsgBox "Incorrect data type provided for " & index & " row!", vbCritical 
                 End Select 
                 
                
                If ShowOnlyDifferencesFlag Then 
                   whereConstraints = whereConstraints & " OR " & constraint & " = 'Y'" & vbCrLf 
                End If 
                
                differences = differences & ", " & constraint & " Diff" & columnX & "Flag" & vbCrLf 
            Else 
                differences = differences & ", 'n/a' Diff" & IIf(Len(columnX) > 0, columnX, columnY) & "Flag" & vbCrLf 
            End If 
                             
        Else 
            joinConstraints = joinConstraints & "    AND A." & columnX & " = B." & columnY & vbCrLf 
        End If 
     
     End If 
Next 
If Len(attributes) > 0 Then 
    attributes = Right(attributes, Len(attributes) - 2) 
End If 
If Len(joinConstraints) > 0 Then 
    joinConstraints = Right(joinConstraints, Len(joinConstraints) - 8) 
End If 
If Len(whereConstraints) > 0 Then 
    whereConstraints = Right(whereConstraints, Len(whereConstraints) - 4) 
End If 
'building the comparison query 
GetComparisonQuery = "SELECT " & attributes & _ 
    differences & _ 
    "FROM " & LeftTable & " A" & vbCrLf & _ 
    "     FULL OUTER JOIN " & RightTable & " B" & vbCrLf & _ 
    "       ON " & joinConstraints & _ 
    IIf(ShowOnlyDifferencesFlag And Len(whereConstraints) > 0, "WHERE " & whereConstraints, "") 
   
End Function 
 

Excel - Comparison Datasets

    The query returned by the macro for the above example based on attributes from Production.Product table from AdventureWorks database and Production.Products table created in Saving Data With Stored Procedures post: 
 
SELECT A.ProductID LTProductID 
, B.ProductID RTProductID 
, A.Name LTName 
, B.Name RTName 
, A.ProductNumber LTProductNumber 
, B.ProductNumber RTProductNumber 
, A.MakeFlag LTMakeFlag 
, B.MakeFlag RTMakeFlag 
, A.FinishedGoodsFlag LTFinishedGoodsFlag 
, B.FinishedGoodsFlag RTFinishedGoodsFlag 
, A.Color LTColor 
, B.Color RTColor 
, A.SafetyStockLevel LTSafetyStockLevel 
, B.SafetyStockLevel RTSafetyStockLevel 
, A.ReorderPoint LTReorderPoint 
, A.StandardCost LTStandardCost 
, B.StandardCost RTStandardCost 
, A.ListPrice LTListPrice 
, B.ListPrice RTListPrice 
, A.Size LTSize 
, A.DaysToManufacture LTDaysToManufacture 
, A.ProductLine LTProductLine 
, A.Class LTClass 
, A.Style LTStyle 
, A.SellStartDate LTSellStartDate 
, B.StartDate RTSellStartDate 
, A.SellEndDate LTSellEndDate 
, B.EndDate RTSellEndDate 
, A.DiscontinuedDate LTDiscontinuedDate 
, B.CreationDate RT 
, B.CreatedBy RT 
, A.ModifiedDate LTModifiedDate 
, B.ModifiedDate RTModifiedDate 
, B.ModifiedBy RT 
, CASE 
    WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
    ELSE 'N' 
END DiffNameFlag 
, CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
    ELSE 'N' 
END DiffProductNumberFlag 
, CASE 
    WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffMakeFlagFlag 
, CASE 
    WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
    ELSE 'N' 
END DiffFinishedGoodsFlagFlag 
, CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
    ELSE 'N' 
END DiffColorFlag 
, CASE 
    WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
    ELSE 'N' 
END DiffSafetyStockLevelFlag 
, 'n/a' DiffReorderPointFlag 
, CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05       
     THEN 'Y' 
ELSE 'N' 
END DiffStandardCostFlag 
, CASE 
     WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
     ELSE 'N' 
END DiffListPriceFlag 
, 'n/a' DiffSizeFlag 
, 'n/a' DiffDaysToManufactureFlag 
, 'n/a' DiffProductLineFlag 
, 'n/a' DiffClassFlag 
, 'n/a' DiffStyleFlag 
, CASE 
    WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellStartDateFlag 
, CASE 
    WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
    ELSE 'N' 
END DiffSellEndDateFlag 
, 'n/a' DiffDiscontinuedDateFlag 
, 'n/a' DiffCreationDateFlag 
, 'n/a' DiffCreatedByFlag 
, CASE 
     WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
     ELSE 'N' 
END DiffModifiedDateFlag 
, 'n/a' DiffModifiedByFlag 
FROM Production.Product A 
    FULL OUTER JOIN Production.Products B 
       ON A.ProductID = B.ProductID 
WHERE CASE 
     WHEN IsNull(A.Name , '') <> IsNUll(B.Name, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
    WHEN IsNull(A.ProductNumber , '') <> IsNUll(B.ProductNumber, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.MakeFlag , 0) <> IsNUll(B.MakeFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.FinishedGoodsFlag , 0) <> IsNUll(B.FinishedGoodsFlag, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.Color , '') <> IsNUll(B.Color, '') THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.SafetyStockLevel , 0) <> IsNUll(B.SafetyStockLevel, 0) THEN 'Y' 
     ELSE 'N' 
END = 'Y' 
OR CASE 
     WHEN IsNull(A.StandardCost , 0) - IsNUll(B.StandardCost, 0) NOT BETWEEN -0.05 AND 0.05 
      THEN 'Y' 
ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN IsNull(A.ListPrice , 0) - IsNUll(B.ListPrice, 0) NOT BETWEEN -0.05 AND 0.05 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellStartDate, B.StartDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
      WHEN DateDiff(d, A.SellEndDate, B.EndDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 
OR CASE 
       WHEN DateDiff(d, A.ModifiedDate, B.ModifiedDate)<>0 THEN 'Y' 
      ELSE 'N' 
END = 'Y' 


Notes:
1.   The macro doesn’t consider an ORDER BY clause, though it could be easily added manually
2.   Not all of the join constraints are so simple so that they can be reduced to one or more simple equalities, on the other side we have to consider that the most time consuming task is listing the attributes and the difference flags.
3.    Sometimes it’s easier to create two extracts – in the first being considered all the records from the left table and the matches from the right table (left join), respectively all the records from the right table and the matches from the left table (right join).
4.     Given the fact that the attributes participating in the join clause should in theory match, each pair of such attributes could be merged in one attribute using the formula: IsNull(A.ColumnX, B.ColumnY) As ColumnXY.
5.     In order to show all the data from the two tables and not only the differences, all is needed to do is to change the value of the last parameter from true to false:
=GetComparisonQuery(A2:D23, "Production.Product", "Production.Products", "d", "0.05", false) 
6.     For TimeInterval parameter should be provided only the values taken by DatePart parameter (first parameter) of SQL Server’s DateDiff function.
7.     Please note that no validation is made for the input parameters.

18 February 2010

SQL Reloaded: Just in CASE V (Dynamic Queries)

    Stored procedures allows not only to encapsulate logic, enhance queries’ performance by cashing query plans or reduce the risks of injection by using parameters, they allow also altering the structure of a query at runtime, modifying for example the query based on the provided parameters. Such behavior is ideal when needed to base the logic of a query on a list of filters (filter controls) in which some of the attributes are not mandatory. A dynamic created query can be executed using the EXEC function or the sp_executesql stored procedure, the later allowing parameters’ substitution, ensuring thus parameters’ data types, another benefit residing in the fact that the execution plans are more likely to be reused by SQL Server, Microsoft recommending thus the use of sp_executesql over Exec [1]. Now it depends also on how much the queries vary between requests and how easy is to handle the missing parameters or parameter’s substitution – sometimes quite a difficult topic for a beginner. Therefore for exemplifying the techniques I will use Exec, and here is the example:
 
-- Retrieving Products 
CREATE PROCEDURE dbo.pSearchProducts( 
@ProductID int , 
@ProductSubcategoryID int , 
@ProductModelIDs varchar(500) , 
@ProductName nvarchar(50) , 
@ProductNumber nvarchar(25) , 
@MakeFlag bit , 
@FinishedGoodsFlag bit , 
@StartStandardCost money , 
@EndStandardCost money , 
@StartSellStartDate varchar(10) , 
@EndSellStartDate varchar(10) , 
@OrderBy nvarchar(500) ) 
AS 
BEGIN 
DECLARE @Query varchar(2000) 
-- translating Order By clause 
IF LEN(@OrderBy)&gt;0 
BEGIN 
    SET @OrderBy = ',' + @OrderBy  
    SET @OrderBy = Replace(@OrderBy, ',0', ', ProductNumber') 
    SET @OrderBy = Replace(@OrderBy, ',1', ', ProductName') 
    SET @OrderBy = Replace(@OrderBy, ',2', ', ProductModel') 
    SET @OrderBy = Replace(@OrderBy, ',3', ', SellStartDate') 
    SET @OrderBy = Replace(@OrderBy, ',4', ', StandardCost') 
    SET @OrderBy = Right(@OrderBy, Len(@OrderBy)-1) 
END 
-- building the query 
SET @Query = 'SELECT ProductID, ProductName, ProductNumber, ProductModel, CatalogDescription, ProductSubcategory' 
+ ', MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice' 
+ ', Size, SizeUnitMeasureCode, SizeUnitMeasure, WeightUnitMeasureCode, WeightUnitMeasure, Weight' 
+ ', DaysToManufacture, ProductLine, Class, Style, dbo.GetDateAsString(SellStartDate) SellStartDate' 
+ ', dbo.GetDateAsString(SellEndDate) SellEndDate, dbo.GetDateAsString(ModifiedDate) ModifiedDate' 
+ ' FROM Production.vProducts' 
+ ' WHERE 0=0' 
+ CASE IsNull(@ProductID, 0) 
     WHEN 0 THEN '' 
    ELSE ' AND ProductID = ' + cast(@ProductID as varchar(10)) 
END 
+ CASE IsNull(@ProductSubcategoryID, 0) 
     WHEN 0 THEN '' 
     ELSE ' AND ProductSubcategoryID = ' + cast(@ProductSubcategoryID as varchar(10)) 
END 
+ CASE IsNull(@ProductModelIDs, '') 
    WHEN '' THEN '' 
    ELSE ' AND ProductModelID IN (' + @ProductModelIDs + ')' 
END  
+ CASE IsNull(@ProductName, '') 
    WHEN '' THEN '' 
    ELSE ' AND ProductName LIKE ''' + @ProductName + '''' 
END  
+ CASE IsNull(@ProductNumber, '') 
    WHEN '' THEN '' 
    ELSE ' AND ProductNumber LIKE ''' + @ProductNumber + '''' 
END  
+ CASE IsNull(@MakeFlag, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND MakeFlag = 1' 
END  
+ CASE IsNull(@FinishedGoodsFlag, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND FinishedGoodsFlag = 1' 
END  
+ CASE IsNull(@StartStandardCost, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND StandardCost &gt;= ' + cast(@StartStandardCost as varchar(20)) 
END  
+ CASE IsNull(@EndStandardCost, 0) 
    WHEN 0 THEN '' 
    ELSE ' AND StandardCost &lt;= ' + cast(@EndStandardCost as varchar(20)) 
END 
+ CASE IsNull(@StartSellStartDate, '') 
    WHEN '' THEN '' 
    ELSE ' AND DateDiff(d, SellStartDate, dbo.GetStringDate(''' + @StartSellStartDate + '''))&lt;=0' 
END  
+ CASE IsNull(@EndSellStartDate, '') 
     WHEN '' THEN '' 
     ELSE 'AND DateDiff(d, SellStartDate, dbo.GetStringDate(''' + @EndSellStartDate + '''))&gt;=0' 
END  
+ ' ORDER BY ' 
+ CASE LEN(@OrderBy) 
    WHEN 0 THEN ' ProductNumber' -- default sorting 
     ELSE @OrderBy -- chosen sorting 
END 
-- executing the query 
EXEC (@Query)  
END 

    The above stored procedure could split in 5 logical section:
1. The definition holding stored procedure’s name and parameters
    Parameters’ selection depends on the requirements, usually the natural and database key for the main entity or for master data (e.g. Customer, Vendor, Item Number, etc.), the LOV-based attributes used most often as selection criteria (e.g. Statuses, Characteristics), flags (e.g. Make Flag, Finished Goods Flag) that have important meaning for the business, dates used to partition the result set (e.g. Creation Date, Document Date, GL Date) being in general needed to provide the lower and upper bound of the interval. For special type of analysis it’s useful to include also a set of controls for specifying the numeric intervals, like in this case for Standard Cost.
   In general the parameters have the same date type as the attribute they represent, with a few exceptions:
dates – because the different date formats used in the various environments (e.g. client vs. web server vs. database server) I prefer to send the dates as string, thus assuring that the date will be always in the expected format.
multi-choice attributes – in case is needed to handle multi-choice selections then the easiest way to handle this is to provide a list of concatenated values as a string, as in @ProductModelIDs’ case.
flags – sometimes I prefer to use characters instead of boolean (0,1) values because it offers more flexibility in handling triple or multi-state situations.
2. The declaration of internal variables, here the @Query holding the dynamic built query, the variable needs to be big enough to hold the whole concatenated string with maximum list of parameters and ORDER BY attributes.
3. The translation of ORDER BY attributes here stored in @OrderBy input parameter, which contains a list with numbers from 0 to 9 that will dictate the attributes used in ORDER BY clause and their order. For example will be sent ‘3,1’ to sort the result set based on SellStartDate and ProductName.
4. The actual concatenation of the dynamic query, the handling for NULL values being handled with a set of CASE functions, existing one CASE for each parameter, including @OrderBy.
5. Query’s execution, here done using Exec.

Notes:  
    Usually I’m including a 6th section which includes several metadata about the current object (e.g. who created it and when, when was last time modified, objects that call current object, the purpose of the object, etc.).
     It’s needed to test such a procedure for all parameters – either by providing all parameters as in the first below example, or creating a test case for each parameter, the later approach being much safer because when multiple parameters are provided, one of them could hide an existing issue.

EXEC dbo.pSearchProducts 973, 2, '25,26,27', 'Road%', 'BK%', 1, 1, 1000, 1200, '01/01/2003', '31/12/2003', '' -- all parameters 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 0, 0, 0, 0, '', '', '' -- all records 
EXEC dbo.pSearchProducts 1, 0, '', '', '', 0, 0, 0, 0, '', '', '' -- ProductID test 
EXEC dbo.pSearchProducts 0, 2, '', '', '', 0, 0, 0, 0, '', '', '' -- ProductSubcategoryID test 
EXEC dbo.pSearchProducts 0, 0, '3, 4, 5', '', '', 0, 0, 0, 0, '', '', '2,1' -- ProductModelIDs test 
EXEC dbo.pSearchProducts 0, 0, '', 'Mountain%', '', 0, 0, 0, 0, '', '', '1' -- ProductName test 
EXEC dbo.pSearchProducts 0, 0, '', '', 'BK-T18U%', 0, 0, 0, 0, '', '', '2' -- ProductNumber test 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 1, 1, 0, 0, '', '', '' -- MakeFlag &amp; FinishedGoodsFlag test 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 0, 0, 50, 110, '', '', '' -- StandardCost test 
EXEC dbo.pSearchProducts 0, 0, '', '', '', 0, 0, 0, 0, '01/01/2001', '31/12/2002', '3,1' -- SellStartDate test 

    Is it really needed to create a dynamic query in order to handle the various filtering scenarios? In theory yes, there are several techniques that could be used to address a special type of parameters:
 
1. Single values excepting  and dates

    In such cases the CASE could be replaced with the IsNull function test on parameter, if the parameter is null then take the value of the attribute against which the test is performed. For example the following CASE:
 
CASE IsNull(@ProductID, 0) 
    WHEN 0 THEN '' 
     ELSE ' AND ProductID = ' + cast(@ProductID as varchar(10)) 
END 
 
could be replaced with the following constraint:
 
ProductID = IsNull(@ProductID, ProductID) 
 
    A similar technique could be used also for interval-based attributes like the StandardCost from our example, the two above cases could be replaced with the following constraint:
 
StandardCost BETWEEN ISNULL(@StartStandardCost, StandardCost) AND ISNULL(@EndStandardCost, StandardCost) 
 
 Note:
    In case the attributes contain also NULL values then the Null value need to be replaced with a special value, typically not found in the list of values the respective attribute could take, for example –1 for foreign keys, an incredibly small number for numeric data types, single or empty space for string data types.

2. List of values
 
   The list of values are a little more difficult to handle but not impossible, for this scenario could be used the dbo.SplitList function introduced in a previous post. For example the CASE for @ProductModelIDs could be rewritten as follows:
 
IsNull (ProductModelID, -1) IN (SELECT value FROM dbo.SplitList(IsNull(NullIf(@ProductModelIDs, ''), CAST(IsNull(ProductModelID, -1) as varchar(20)) ), ',')) 

    The IsNull(ProductModelID, -1) expression has been introduced in order to handle the Null values, while the expression for the first parameters of dbo.SplitList was written in this way in order to handle the cases in which the @ProductModelIDs was not provided or it contains no values.
   Unfortunately this technique even if it solves the problem, is not quite efficient because the dbo.SplitList function is called for each record, fact that impacts query’s performance considerably. On the other side if the @ProductModelIDs would be mandatory than the dbo.SplitList could be joined with the main query, fact that increases query’s performance, though it must be paid attention to the duplicates, because each duplicate found in the list provided to dbo.SplitList function could lead to duplicates in the final result!

3. Dates

  Because the Date functions can’t be used errorless with the BETWEEN operator in SQL Server, the DateDiff function being preferred for such case, the handing of Date intervals could be achieved by providing the lower/upper bound of the interval date types used, for example ‘0001-01-01’, respectively ‘9999-12-31’ for date data type, or ‘1900-01-01’, respectively ‘2079-06-06’ for smalldatetime data type. or ‘1753-01-01’, respectively ‘9999-12-31’ for datetime data type.  Because the dbo.GetDateAsString function returns a datetime I will use the lower/uppers bounds of the respective data type in case the date values are null, thus the two CASEs for SellStartDate could be rewritten as:

DateDiff (d, SellStartDate, dbo.GetStringDate(IsNull(@StartSellStartDate, '01-01-1753')))&lt;=0 
AND DateDiff(d, SellStartDate, dbo.GetStringDate(IsNull(@EndSellStartDate, '31-12-2009')))&gt;=0     

   In case is needed to return also the dates having Null value, then the SellStatDate could be replaced with any of the two bounds of the data type used, thus the above constraint becomes:

DateDiff (d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(@StartSellStartDate, '01-01-1753')))&lt;=0 
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(@EndSellStartDate, '31-12-2009')))&gt;=0 

   Unfortunately I haven’t found any easy way to handle the different attribute combinations that could appear in the ORDER BY clause. On the other side there are controls that handle the sorting on the client, therefore a default ORDER BY could be provided, and if the Users need other sorting, then they could sort the data by themselves. Here’s an alterative writing for the above stored procedure without using a dynamic query, though this doesn’t mean that the performance of this approach is better than the one of the first stored procedure given the fact that a constraint is evaluated for each parameter, the unnecessary constraints for list of values and dates coming with a considerable decrease in performance.

  -- Retrieving Products w/o dynamic query 
CREATE PROCEDURE dbo.pSearchProductsND( 
@ProductID int , @ProductSubcategoryID int , @ProductModelIDs varchar(500) , @ProductName nvarchar(50) , @ProductNumber nvarchar(25) , @MakeFlag bit , @FinishedGoodsFlag bit , @StartStandardCost money , @EndStandardCost money , @StartSellStartDate varchar(10) , @EndSellStartDate varchar(10)) 
AS 
BEGIN -- the query SELECT ProductID, ProductName, ProductNumber, ProductModel, CatalogDescription, ProductSubcategory 
,MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice 
,Size, SizeUnitMeasureCode, SizeUnitMeasure, WeightUnitMeasureCode, WeightUnitMeasure, Weight 
,DaysToManufacture, ProductLine, Class, Style, dbo.GetDateAsString(SellStartDate) SellStartDate 
,dbo.GetDateAsString(SellEndDate) SellEndDate, dbo.GetDateAsString(ModifiedDate) ModifiedDate 
FROM Production.vProducts 
WHERE IsNull(ProductID, -1) = COALESCE(NullIf(@ProductID, 0), ProductID, -1) 
AND IsNull(ProductSubcategoryID, -1) = COALESCE(NullIf(@ProductSubcategoryID, 0), ProductSubcategoryID, -1) 
AND IsNull(ProductModelID, -1) IN (SELECT value FROM dbo.SplitList(IsNull(NullIf(@ProductModelIDs, ''), CAST(IsNull(ProductModelID, -1) as varchar(20)) ), ','))  
AND IsNull(ProductName, ' ') LIKE IsNull(NullIf(@ProductName, ''), IsNull(ProductName, ' '))  
AND IsNull(ProductNumber, ' ') LIKE IsNull(NullIf(@ProductNumber, ''), IsNull(ProductNumber, ' '))  
AND IsNull(MakeFlag, 0) = COALESCE(NullIf(@MakeFlag, 0), MakeFlag, 0)  
AND IsNull(FinishedGoodsFlag, 0) = COALESCE(NullIf(@FinishedGoodsFlag, 0), FinishedGoodsFlag, 0) 
AND StandardCost BETWEEN ISNULL(NullIf(@StartStandardCost, 0), StandardCost) AND ISNULL(NullIf(@EndStandardCost, 0), StandardCost) 
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(NullIf(@StartSellStartDate, ''), '01-01-1753')))&lt;=0 
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(NullIf(@EndSellStartDate, ''), '31-12-2009')))&gt;=0 
ORDER BY ProductNumber END 
      
   Given the format in which parameters are sent (0 for numeric values, empty string for string values) instead of using Nulls, I had to transform the parameters to Null in case they take the value 0 or empty string, this being the only difference between the constraints used in the query and the example constraints from above. In this way the stored procedures should return the same number of records for the same test cases, here are the test cases for the second stored procedure: 


EXEC dbo.pSearchProductsND 973, 2, '25,26,27', 'Road%', 'BK%', 1, 1, 1000, 1200, '01/01/2003', '31/12/2003' -- all parameters 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 0, 0, 0, 0, '', '' -- all records 
EXEC dbo.pSearchProductsND 1, 0, '', '', '', 0, 0, 0, 0, '', '' -- ProductID test 
EXEC dbo.pSearchProductsND 0, 2, '', '', '', 0, 0, 0, 0, '', '' -- ProductSubcategoryID test 
EXEC dbo.pSearchProductsND 0, 0, '3, 4, 5', '', '', 0, 0, 0, 0, '', '' -- ProductModelIDs test 
EXEC dbo.pSearchProductsND 0, 0, '', 'Mountain%', '', 0, 0, 0, 0, '', '' -- ProductName test 
EXEC dbo.pSearchProductsND 0, 0, '', '', 'BK-T18U%', 0, 0, 0, 0, '', '' -- ProductNumber test 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 1, 1, 0, 0, '', '' -- MakeFlag &amp; FinishedGoodsFlag test 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 0, 0, 50, 110, '', '' -- StandardCost test 
EXEC dbo.pSearchProductsND 0, 0, '', '', '', 0, 0, 0, 0, '01/01/2001', '31/12/2002' -- SellStartDate test 

  Now it depends which method you will want to use, frankly I like the dynamic query approach because it offers more flexibility in dealing with parameters and in handling other types of queries. In addition if it happens that you use the second approach, the request could change and you might need to redesign your stored procedure using the first approach.

References:
[1] MSDN. 2009. Using sp_executesql. [Online] Available from:
http://msdn.microsoft.com/en-us/library/ms175170.aspx (Accessed: 15 February 2010)


Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.