15 February 2010

SQL Reloaded: Oracle vs. SQL Server (Handling Missing Dates)

    What do we do with missing values? How do they influence the data analysis? Two questions each developer should ask users when creating/modifying a report. In general for numbers things are pretty simple, just replace the NULL values with a 0, this impacting the report minimally. With Dates is a little more complicated because maybe is needed to remove records having a certain date null, replace it with another date representing a certain event or maybe with the current date.

    As NULLIF, COALESCE and CASE functions are available in both systems, they could be used in particular to handle missing dates or other type of data types, especially when intended to make code portable between the two platforms. In addition SQL Server provides the IsNull function, the equivalent in Oracle being the NVL, extended by NVL2. Oracle provides also an equivalent for simple CASE function, namely the DECODE function. Here are some examples based on Product table from AdventureWorks database: 
  
-- SQL Server NULL handling 
SELECT ProductID, SellStartDate, SellEndDate, DiscontinuedDate, GetDate() CurrentDate 
, IsNull(SellEndDate, GetDate()) Example1 
, COALESCE(SellEndDate, GetDate()) Example3 
, COALESCE(DiscontinuedDate, SellEndDate, GetDate()) Example4 
, CASE 
     WHEN SellEndDate IS NULL THEN GetDate() 
     ELSE SellEndDate 
END Example6 
, NullIf(SellEndDate, SellStartDate) Example7 
FROM Production.Product 

-- Oracle NULL handling 
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, DISCONTINUEDDATE, SYSDATE 
, NVL(SELLENDDATE, SYSDATE) EXAMPLE1 
, NVL2(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE2 
, COALESCE(SELLENDDATE, SYSDATE) EXAMPLE3 
, COALESCE(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE4 
, DECODE(SELLENDDATE, NULL, SYSDATE, SELLENDDATE) EXAMPLE5 
, CASE 
    WHEN SELLENDDATE IS NULL THEN SYSDATE 
    ELSE SELLENDDATE 
END EXAMPLE6 
, NULLIF(SELLENDDATE, SELLSTARTDATE) EXAMPLE7 
FROM PRODUCT 

      The simple CASE function can’t be use to check for NULL values unless the NULL value is implied within the ELSE branch, and this because the IS NULL clause needs to be used for checking whether a value is missing. An exception from this rule seems to be the DECODE function, as can be seen from 5th example from the Oracle-based query.

     Aggregated functions ignore NULL values in both systems, therefore any of the above techniques could be used to handle the missing values if other behavior is expected. The selection of the date used in case the target date is missing depends on the requirements, usually is used the current date when the reports focus on today’s status, while for the calculation of lead/cycle times is preferred to use the closest date to the target date. 

-- SQL Server: handling nulls in aggregations 
SELECT AVG(DateDiff(d, SELLSTARTDATE, IsNull(SELLENDDATE, GETDATE()))) -- handling null values 
, AVG(DateDiff(d,SELLSTARTDATE, SELLENDDATE)) -- ignoring null values 
FROM Production.Product 

-- Oracle: handling nulls in aggregations 
SELECT AVG(TRUNC(NVL(SELLENDDATE, SYSDATE)) - TRUNC(SELLSTARTDATE)) -- handling null values 
, AVG(TRUNC(SELLENDDATE) - TRUNC(SELLSTARTDATE)) -- ignoring null values 
FROM PRODUCT 

    Another scenario when missing date values need to be handled is in joins, often for such cases a missing value denoting that the record is still active, like in the below example: 

-- SQL Server: getting the Start Date for current List Price 
SELECT ITM.ProductID 
, ITM.ProductNumber 
, ITM.ListPrice 
, PPH.StartDate 
FROM Production.Product ITM 
JOIN Production.ProductListPriceHistory PPH 
ON ITM.ProductID = PPH.ProductID 
AND PPH.EndDate IS NULL 
   
   The above query could be rewritten also in the following form that can be useful to get the record in use during a certain event:

-- SQL Server: getting the List Price when the product was last time modified 
SELECT ITM.ProductID 
, ITM.ProductNumber 
, PPH.ListPrice 
, PPH.StartDate 
, PPH.EndDate 
, ITM.ModifiedDate 
FROM Production.Product ITM 
JOIN Production.ProductListPriceHistory PPH 
ON ITM.ProductID = PPH.ProductID 
AND DateDiff(d, ITM.ModifiedDate, IsNull(PPH.EndDate, GETDATE()))>=0 
AND DateDiff(d, PPH.StartDate, ITM.ModifiedDate)>=0 

    In Oracle special attention must be given to LEAST and GREATEST functions that consider the smallest/greatest values from a list of attributes because if one of the vales is NULL then function’s value is NULL too, therefore if one of the attributes could take NULL values then it should be replaced with the values from a NOT NULL attribute used in the functions, like in 3rd and 4th example from the below example, this change having no impact on functions’ output.

-- Oracle Greatest/Least example 
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, SYSDATE 
, GREATEST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE1 
, LEAST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE2 
, GREATEST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE3 
, LEAST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE4 
FROM PRODUCT

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.