15 February 2010

Oracle vs. SQL Server: Handling Missing Dates

    What do we do with missing values? How do they influence the data analysis? Two questions each developer should ask users when creating/modifying a report. In general for numbers things are pretty simple, just replace the NULL values with a 0, this impacting the report minimally. With Dates is a little more complicated because maybe is needed to remove records having a certain date null, replace it with another date representing a certain event or maybe with the current date.

    As NULLIF, COALESCE and CASE functions are available in both systems, they could be used in particular to handle missing dates or other type of data types, especially when intended to make code portable between the two platforms. In addition SQL Server provides the IsNull function, the equivalent in Oracle being the NVL, extended by NVL2. Oracle provides also an equivalent for simple CASE function, namely the DECODE function. Here are some examples based on Product table from AdventureWorks database:
-- SQL Server NULL handling
SELECT ProductID, SellStartDate, SellEndDate, DiscontinuedDate, GetDate() CurrentDate
,
IsNull(SellEndDate, GetDate()) Example1

,
COALESCE(SellEndDate, GetDate()) Example3

,
COALESCE(DiscontinuedDate, SellEndDate, GetDate()) Example4

,
CASE
     WHEN SellEndDate IS NULL THEN GetDate()

     ELSE SellEndDate
END Example6

,
NullIf(SellEndDate, SellStartDate) Example7

FROM
Production.Product


-- Oracle NULL handling
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, DISCONTINUEDDATE, SYSDATE
,
NVL(SELLENDDATE, SYSDATE) EXAMPLE1

,
NVL2(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE2

,
COALESCE(SELLENDDATE, SYSDATE) EXAMPLE3

,
COALESCE(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE4

,
DECODE(SELLENDDATE, NULL, SYSDATE, SELLENDDATE) EXAMPLE5

,
CASE

    WHEN SELLENDDATE IS NULL THEN SYSDATE
    ELSE SELLENDDATE
END EXAMPLE6
, NULLIF(SELLENDDATE, SELLSTARTDATE) EXAMPLE7
FROM
PRODUCT


      The simple CASE function can’t be use to check for NULL values unless the NULL value is implied within the ELSE branch, and this because the IS NULL clause needs to be used for checking whether a value is missing. An exception from this rule seems to be the DECODE function, as can be seen from 5th example from the Oracle-based query.

     Aggregated functions ignore NULL values in both systems, therefore any of the above techniques could be used to handle the missing values if other behavior is expected. The selection of the date used in case the target date is missing depends on the requirements, usually is used the current date when the reports focus on today’s status, while for the calculation of lead/cycle times is preferred to use the closest date to the target date.
-- SQL Server: handling nulls in aggregations
SELECT AVG(DateDiff(d, SELLSTARTDATE, IsNull(SELLENDDATE, GETDATE()))) -- handling null values
,
AVG(DateDiff(d,SELLSTARTDATE, SELLENDDATE)) -- ignoring null values

FROM
Production.Product


-- Oracle: handling nulls in aggregations
SELECT AVG(TRUNC(NVL(SELLENDDATE, SYSDATE)) - TRUNC(SELLSTARTDATE)) -- handling null values
,
AVG(TRUNC(SELLENDDATE) - TRUNC(SELLSTARTDATE)) -- ignoring null values

FROM
PRODUCT


    Another scenario when missing date values need to be handled is in joins, often for such cases a missing value denoting that the record is still active, like in the below example:
-- SQL Server: getting the Start Date for current List Price
SELECT ITM.ProductID
,
ITM.ProductNumber

,
ITM.ListPrice

,
PPH.StartDate

FROM
Production.Product ITM
JOIN Production.ProductListPriceHistory PPH
ON ITM.ProductID = PPH.ProductID
AND PPH.EndDate IS NULL

   The above query could be rewritten also in the following form that can be useful to get the record in use during a certain event:
-- SQL Server: getting the List Price when the product was last time modified
SELECT ITM.ProductID
,
ITM.ProductNumber

,
PPH.ListPrice

,
PPH.StartDate

,
PPH.EndDate

,
ITM.ModifiedDate

FROM
Production.Product ITM
JOIN Production.ProductListPriceHistory PPH
ON ITM.ProductID = PPH.ProductID
AND DateDiff(d, ITM.ModifiedDate, IsNull(PPH.EndDate, GETDATE()))>=0
AND DateDiff(d, PPH.StartDate, ITM.ModifiedDate)>=0


    In Oracle special attention must be given to LEAST and GREATEST functions that consider the smallest/greatest values from a list of attributes because if one of the vales is NULL then function’s value is NULL too, therefore if one of the attributes could take NULL values then it should be replaced with the values from a NOT NULL attribute used in the functions, like in 3rd and 4th example from the below example, this change having no impact on functions’ output.
-- Oracle Greatest/Least example
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, SYSDATE
,
GREATEST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE1

,
LEAST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE2

,
GREATEST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE3

,
LEAST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE4

FROM
PRODUCT

No comments: