23 September 2007

💎SQL Reloaded: 6 out of 49 (Part 2: Modeling the Data)

In a previous post I created a random set of data and stored them in LottoExtracts table, the first step is done, now we have to model the data in a form which could be useful for our analysis. Thus it makes sense to split the table in two tables:     
  • LottoExtracts containing mainly the DrawingDate and eventually additional information about extraction (e.g. Country, System, etc.). For simplicity I’m including only the Drawing Date.     
  • LottoNumbers containing the numbers and their order in extraction. 
Here are the tables:
 
-- dropping the LottoDrawings table
DROP TABLE IF EXISTS dbo.LottoDrawings
GO

--creating the LottoDrawings table 
CREATE TABLE dbo.LottoDrawings( 
  DrawingID int IDENTITY(1,1) NOT NULL
 , DrawingDate smalldatetime NULL) 
GO

 -- dropping the LottoNumbers table
DROP TABLE IF EXISTS dbo.LottoNumbers

 --creating LottoNumbers table 
 CREATE TABLE dbo.LottoNumbers(NumberID int IDENTITY(1,1) NOT NULL
 , DrawingID int NULL
 , Position smallint NULL
 , Number smallint NULL)   

 And, the view which will be used as source for analysis: 

 --creating vLottoNumbers view 
 CREATE VIEW dbo.vLottoNumbers AS 
 -- consolidated drawings by number
 SELECT LN.NumberID 
 , LN.Position 
 , LN.Number 
 , LN.DrawingID 
 , LD.DrawingDate 
 , Year(LD.DrawingDate) DrawingYear , Month(LD.DrawingDate) DrawingMonth 
 , DatePart(dd, LD.DrawingDate) DrawingDay 
 , DatePart(wk, LD.DrawingDate) DrawingWeek 
 FROM dbo.LottoNumbers LN 
      JOIN dbo.LottoDrawings LD 
	    ON LN.DrawingID = LD.DrawingID    

Now that we have the objects to model the data, we need to populate the tables with the data from LottoExtracts table: 

--populating LottoDrawings table 
INSERT dbo.LottoDrawings (DrawingDate) 
SELECT DISTINCT DrawingDate 
FROM dbo.LottoExtracts 

--populating the LottoNumbers table - first drawing number 
INSERT LottoNumbers (DrawingID, Position, Number) 
SELECT DrawingID, 1, N1 
FROM dbo.LottoExtracts LE 
     JOIN dbo.LottoDrawings LD 
	   ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0  

The above query inserted only the first number of the extraction, in order to insert the other 5 numbers, all you have to do is to replace 1 with 2 in the second line and run the query again, the insert for second number becomes: 

--populating the LottoNumbers table – second drawing number 
INSERT LottoNumbers (DrawingID, Position, Number) 
SELECT DrawingID, 2, N2 
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0

Here's the code for the remaining numbers:

--populating LottoNumbers table – the remaining drawing numbers
INSERT LottoNumbers (DrawingID, Position, Number) 
SELECT DrawingID, 3, N3 
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0
UNION ALL 
SELECT DrawingID, 4, N4
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0
UNION ALL 
SELECT DrawingID, 5, N5 
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0
UNION ALL 
SELECT DrawingID, 6, N6
FROM dbo.LottoExtracts LE 
       JOIN dbo.LottoDrawings LD
	     ON DateDiff(d, LE.DrawingDate, LD.DrawingDate) = 0

Checking the output, you should have 12000 records: 

--checking the data 
SELECT * 
FROM dbo.vLottoNumbers 
ORDER BY DrawingDate
, Position

Now we have the data for analysis (see the next post).

Happy coding!

Previous Post <<||>> Next Post

💎SQL Reloaded: 6 out of 49 (Part 1: Getting the data)

I was thinking to model the 6 out of 49 lottery drawing, so common across EU countries. To study numbers’ occurrences we need to import the historical data, though during my first attempts this proved to be challenging. Each lottery has a website from which data can be taken drawing by drawing or when possible, as a data dump (see notes). Moreover, some of the lotteries have variations for 6/49.       

To avoid any legal litigations, I will use a self-made set of data via the RAND function (which unfortunately is not a perfect randomizer). To store the generated data, I've created a table called LottoExtracts: 

-- cleaning the table
DROP TABLE IF EXISTS dbo.LottoExtracts

--creating the LottoExtracts table 
CREATE TABLE dbo.LottoExtracts(     
   DrawingDate date NULL
 , N1 smallint NULL
 , N2 smallint NULL
 , N3 smallint NULL
 , N4 smallint NULL
 , N5 smallint NULL
 , N6 smallint NULL)

This format will be used to upload the data in a staging table which is used later as source for our data model. As can be seen it has a Drawing Date which identifies uniquely an extraction, while in N1, …, N6 are stored the drawings’ numbers in the order they were picked. 

--creating random data 
DECLARE @index, @counter int 
DECLARE @N1, @N2, @N3, @N4, @N5, @N6 smallint 
  
SET @index = 0 
WHILE @index < 2000 
BEGIN 
  -- creating the first number 
  SET @N1 = Cast(1000 * RAND() as int) % 48 + 1 

  -- creating the second number 
  SET @counter = 1 
  WHILE @counter < 50 or @N1 = @N2 
  BEGIN 
	  SET @N2 = Cast(1000 * RAND() as int) % 48 + 1;
	  SET @counter = @counter + 1 
  END;

  -- creating the third number
  SET @counter = 1 
  WHILE @counter < 50 or @N3 IN (@N1, @N2) 
  BEGIN 
	  SET @N3 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the fourth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N4 IN (@N1, @N2, @N3) 
  BEGIN 
	  SET @N4 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the fifth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N5 IN (@N1, @N2, @N3, @N4) 
  BEGIN 
	  SET @N5 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the sixth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N6 IN (@N1, @N2, @N3, @N4, @N5) 
  BEGIN 
	  SET @N6 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  --inserting a new drawing 
  INSERT LottoExtracts(DrawingDate, N1, N2, N3, N4, N5, N6) 
  VALUES (DateAdd(d, @index*7, '01/01/2000'), @N1, @N2, @N3, @N4, @N5, @N6) 
  SET @index = @index + 1 
END 

I've created only 200 records, which equates with almost 4 years of history, and should be enough for testing purposes. Of course, one can increase or decrease the number of drawings as needed! 
 
--checking the data 
SELECT * 
FROM dbo.LottoExtracts   

In what concerns the code, the following statement creates a random number in 1 … 49 range:
 
-- creating first number 
SET @N1 = Cast(1000 * RAND() as int) % 48 + 1    

Thus, can be created all 6 numbers, however because the same number can't be extracted twice in the same drawing, I had to add duplication checks:
 
-- creating third number 
SET @counter = 1 
WHILE @counter < 50 or @N3 IN (@N1, @N2) 
BEGIN 
	SET @N3 = Cast(1000 * RAND() as int) % 48 + 1 
	SET @counter = @counter + 1 
END    

Notes:
(1) The code is verbose and probably suboptimal, though it's a must because of the behavior of RAND function as part of transactions.
(2) The below links could change in time. I provided them just to make easier the search. (*) denotes that historical data can be downloaded.
Previous Post <<||>> Next Post

15 September 2007

💎SQL Reloaded: Updating Data With Values From The Same Table

I have to deal with cases in which I have to update data from a table with values from the same table across records, a simple example I can sketch is the one of Standard Prices, which are normally valid during a year. For example if I would have to reset the Standard Prices from a year to the ones from the previous year, how this can be done? Actually that's pretty sample, for exemplification purposes I will create a table having only 3 columns, ProductID, Year and StandardPrice: 

-- creating the table 
CREATE TABLE dbo.StandardPrices( 
ProductID int 
, [Year] int 
, StandardPrice decimal(13,2) )      

--creating test data 
DECLARE @index int 
SET @index = 0 
  WHILE @index < 50 
  BEGIN 
      INSERT dbo.StandardPrices(ProductID, [Year], StandardPrice) 
      VALUES (@index % 25+1, 2000 + (@index/25), 100 * RAND()) 
      SET @index = @index + 1 
END     

Let's see the content, the RAND() function allows me to assign random prices to each product. As can be seen there are 25 products with Standard Prices for the Years 2000 and 2001.

 --testing StandardPrices table's content 
SELECT * 
FROM dbo.StandardPrices 
ORDER BY ProductID, [Year]     

Let's try to use a normal update: 

--first update attempt 
UPDATE dbo.StandardPrices 
SET StandardPrice = A.StandardPrice 
FROM StandardPrices A 
WHERE StandardPrices.ProductID = A.ProductID 
AND StandardPrices.[Year] = 2001 AND A.[Year] = 2000     

Ups, I got an error (SQL Server 2005):
Msg 4104, Level 16, State 1, Line 1 The multi-part identifier "StandardPrices.ProductID" could not be bound. Msg 4104, Level 16, State 1, Line 1 The multi-part identifier "StandardPrices.Year" could not be bound. 
SQL Server 2000:
Server: Msg 107, Level 16, State 3, Line 1 The column prefix 'StandardPrices' does not match with a table name oralias name used in the query. Server: Msg 107, Level 16, State 1, Line 1 The column prefix 'StandardPrices' does not match with a table name oralias name used in the query.     

It seems we can't use the same table in updates of this type, but we can use a subquery or view which refers to the same table, so we can write: 

  --the update 
UPDATE dbo.StandardPrices 
SET StandardPrice = A.StandardPrice 
FROM ( 
     SELECT ProductID , [Year] , StandardPrice 
     FROM dbo.StandardPrices 
     WHERE [Year] = 2000 
) A 
WHERE StandardPrices.ProductID = A.ProductID 
     AND StandardPrices.[Year] = 2001     

Now there are no errors, 25 records updated, let's see the output: 

--testing table's content after update 
SELECT * 
FROM dbo.StandardPrices 
ORDER BY ProductID
, [Year]

💎SQL Reloaded: PIVOT Operator Example

    Starting with SQL Server 2005, Microsoft introduced the PIVOT operator, which tries to implement the pivoting feature available in Excel, however it does only static pivoting, the values on which the pivoting is done must be mentioned explicitely.
    In List Based Aggregations I gave an example of a query in which the On Hand was cumulated per Location, the same output can be obtained with the below query:
SELECT Name
, ProductNumber
, MakeFlag
, FinishedGoodsFlag
, IsNull([Paint Storage], 0) OnHandPaintStorage
, IsNull([Frame Forming], 0) OnHandFrameForming
, IsNull([Frame Welding], 0) OnHandFrameWelding
, IsNull([Debur and Polish], 0) OnHandDeburPolish
, IsNull([Specialized Paint], 0) OnHandSpecializedPaint
, IsNull([Subassembly], 0) OnHandAssembly
, IsNull([Final Assembly], 0) OnHandFinalAssembly
, IsNull([Paint Storage], 0) + IsNull([Frame Forming], 0) + IsNull([Frame Welding], 0) + IsNull([Debur and Polish], 0) + IsNull([Specialized Paint], 0) + IsNull([Subassembly], 0) + IsNull([Final Assembly], 0) OnHand
, ListPrice
, StandardCost
, (IsNull([Paint Storage], 0) + IsNull([Frame Forming], 0) + IsNull([Frame Welding], 0) + IsNull([Debur and Polish], 0) + IsNull([Specialized Paint], 0) + IsNull([Subassembly], 0) + IsNull([Final Assembly], 0) ) * StandardCost Value
FROM
(
SELECT PP.Name
, PP.ProductNumber
, PP.MakeFlag
, PP.FinishedGoodsFlag
, PL.Name AS Location
, PP.ListPrice
, PP.StandardCost
, PPI.Quantity
FROM Production.ProductInventory PPI
JOIN Production.Product PP
ON PPI.ProductID = PP.ProductID
JOIN Production.Location PL
ON PPI.LocationID = PL.LocationID
) AS A
PIVOT
(
SUM(Quantity)
FOR Location IN ([Paint Storage], [Frame Forming], [Frame Welding], [Debur and Polish], [Specialized Paint], [Subassembly], [Final Assembly])
) PVT

    The query is not rocket science, but it took me some time to figure out that when I need to use multiple tables, I have to create first a subquery, and only then can apply the PIVOT operator. This is how I tried to write my query:
SELECT PP.Name
, PP.ProductNumber
, PP.MakeFlag
, PP.FinishedGoodsFlag
, IsNull([Paint Storage], 0) OnHandPaintStorage
, IsNull([Frame Forming], 0) OnHandFrameForming
, IsNull([Frame Welding], 0) OnHandFrameWelding
, IsNull([Debur and Polish], 0) OnHandDeburPolish
, IsNull([Specialized Paint], 0) OnHandSpecializedPaint
, IsNull([Subassembly], 0) OnHandAssembly
, IsNull([Final Assembly], 0) OnHandFinalAssembly
, IsNull([Paint Storage], 0) + IsNull([Frame Forming], 0) + IsNull([Frame Welding], 0) + IsNull([Debur and Polish], 0) + IsNull([Specialized Paint], 0) + IsNull([Subassembly], 0) + IsNull([Final Assembly], 0) OnHand
, PP.ListPrice
, PP.StandardCost
, (IsNull([Paint Storage], 0) + IsNull([Frame Forming], 0) + IsNull([Frame Welding], 0) + IsNull([Debur and Polish], 0) + IsNull([Specialized Paint], 0) + IsNull([Subassembly], 0) + IsNull([Final Assembly], 0) ) * StandardCost Value
FROM Production.ProductInventory PPI
JOIN Production.Product PP
ON PPI.ProductID = PP.ProductID
JOIN Production.Location PL
ON PPI.LocationID = PL.LocationID
PIVOT
(
SUM(Quantity)
FOR PL.Name IN ([Paint Storage], [Frame Forming], [Frame Welding], [Debur and Polish], [Specialized Paint], [Subassembly], [Final Assembly])
) PVT
and the error: Msg 8156, Level 16, State 1, Line 1
The column 'ProductID' was specified multiple times for 'PVT'.
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "PP.Name" could not be bound.
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "PP.ProductNumber" could not be bound.
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "PP.MakeFlag" could not be bound.
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "PP.FinishedGoodsFlag" could not be bound.
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "PP.ListPrice" could not be bound.
Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "PP.StandardCost" could not be bound.

Notes:
1. Probably it's better to create a view for subquery and use the view instead. I am doing that most of the times, sometimes I prefer to write table-valued functions; more on this in another post.
2. As it seems can be used only one aggregation function with PIVOT, that's pretty bad because even if you start doing your query using a PIVOT and then the requirements change (e.g. is needed to output how many records are for each Location, a more appropriate example is the ones of Purchase Orders, in which could be aggregations applied on Ordered, Receipt and Open Quantity) then you'll have to change the whole query. I'm expecting more from next implementations of PIVOT operator, I hope Microsoft will have that on their list.
3. The benefit of PIVOT operator is supposed to be that it allows writing shorter code, not necessarily. Of course, it allows to eliminate the redundant code - the multiple CASE WHEN statments, however this works when there is only one column (Location in our example) used to do the split, what if there is additional logic which must be included in CASEs?

💎SQL Reloaded: List Based Aggregations (On Hand example)

    I found some time ago a query like the one below (I wrote this kind of queries myself too):
SELECT PP.Name
, PP.ProductNumber
, PP.MakeFlag
, PP.FinishedGoodsFlag
, IsNull(PI1.Quantity, 0) OnHandPaintStorage
, IsNull(PI2.Quantity, 0) OnHandFrameForming
, IsNull(PI3.Quantity, 0) OnHandFrameWelding
, IsNull(PI1.Quantity, 0)+IsNull(PI2.Quantity, 0)+IsNull(PI3.Quantity, 0) OnHand
, PP.ListPrice
, PP.StandardCost
, (IsNull(PI1.Quantity, 0)+IsNull(PI2.Quantity, 0)+IsNull(PI3.Quantity, 0))*PP.StandardCost Value
FROM Production.Product PP
LEFT JOIN
(-- On Hand Paint Storage
SELECT SUM (Quantity) Quantity
, PPI.ProductID
FROM Production.ProductInventory PPI
JOIN Production.Location PL
ON PPI.LocationID = PL.LocationID
WHERE PL.Name = 'Paint Storage' GROUP BY PPI.ProductID
) PI1
ON PP.ProductID = PI1.ProductID
LEFT JOIN
(--On Hand Frame Forming
SELECT SUM (Quantity) Quantity
, PPI.ProductID
FROM Production.ProductInventory PPI
JOIN Production.Location PL
ON PPI.LocationID = PL.LocationID
WHERE PL.Name = 'Frame Forming'
GROUP BY PPI.ProductID
) PI2
ON PP.ProductID = PI2.ProductID
LEFT JOIN
(--On Hand Frame Welding
SELECT SUM (Quantity) Quantity
, PPI.ProductID
FROM Production.ProductInventory PPI
JOIN Production.Location PL
ON PPI.LocationID = PL.LocationID
WHERE PL.Name = 'Frame Welding'
GROUP BY PPI.ProductID
) PI3
ON PP.ProductID = PI3.ProductID
WHERE IsNull(PI1.Quantity, 0)+IsNull(PI2.Quantity, 0)+IsNull(PI3.Quantity, 0) <> 0
ORDER BY PP.Name

    I exemplified the above query by using tables from AdventureWorks database, and it's based on Product, ProductInventoy and Location table.
The query returns the OnHand from special Locations (here I exemplified with only 3 locations, but could be more), the query is not difficult even if it involves 3 tables, however many people write this kind of lengthy queries. A much simpler solution is to use directly aggregations as in the below query. Such method is flexible enough to allow to add another Locations to the query in no time, and performs better!
SELECT PP.Name
, PP.ProductNumber
, PP.MakeFlag
, PP.FinishedGoodsFlag
, SUM(CASE WHEN PL.Name = 'Paint Storage' THEN PPI.Quantity ELSE 0 END) OnHandPaintStorage
, SUM(CASE WHEN PL.Name = 'Frame Forming' THEN PPI.Quantity ELSE 0 END) OnHandFrameForming
, SUM(CASE WHEN PL.Name = 'Frame Welding' THEN PPI.Quantity ELSE 0 END) OnHandFrameWelding
, SUM(CASE WHEN PL.Name = 'Debur and Polish' THEN PPI.Quantity ELSE 0 END) OnHandDeburPolish
, SUM(CASE WHEN PL.Name = 'Specialized Paint' THEN PPI.Quantity ELSE 0 END) OnHandSpecializedPaint
, SUM(CASE WHEN PL.Name = 'Subassembly' THEN PPI.Quantity ELSE 0 END) OnHandSubassembly
, SUM(CASE WHEN PL.Name = 'Final Assembly' THEN PPI.Quantity ELSE 0 END) OnHandFinalAssembly
, SUM(PPI.Quantity) OnHand
, PP.ListPrice
, PP.StandardCost
, SUM(PPI.Quantity)*PP.StandardCost Value
FROM Production.ProductInventory PPI
JOIN Production.Product PP
ON PPI.ProductID = PP.ProductID
JOIN Production.Location PL
ON PPI.LocationID = PL.LocationID
WHERE PL.Name IN ('Paint Storage', 'Frame Forming', 'Frame Welding', 'Debur and Polish', 'Specialized Paint', 'Subassembly', 'Final Assembly')
GROUP BY PP.Name
, PP.ProductNumber
, PP.MakeFlag
, PP.FinishedGoodsFlag
, PP.ListPrice
, PP.StandardCost
HAVING SUM(PPI.Quantity)<>0
ORDER BY PP.Name

Note:
1. Probably you'll wonder why I put the constraint SUM(PPI.Quantity)<>0?! The explanation is simple, I met cases in which ERP systems were allowing negative values for On Hand, pretty cool, isn't it?
2. AdventureWorks database can be downloaded from Microsoft site.
3. I met many other cases in which this type of quries are very handy, in here I tried to exemplify the method in the easiest possible way, with minimum of effort :D.

14 September 2007

💎SQL Reloaded: The Stuff function

No matter how much experience we have in a programming language or a technology, there is always something new to learn. There are a few hardly used functions in SQL Server 2000, but they could be really useful in certain situations. One of such functions is Stuff, I discovered its use long time after I started to play with SQL. 

  Stuff ( character_expression , start , length , character_expression )

--inserting a string inside another without doing a replacement
SELECT Stuff('This is just a test', 3, 0, 'x')  -- Output: Thxis is just a test 

--inserting a string inside another without doing a replacement 
SELECT Stuff('This is just a test', 3, 5, 'at was') --Output: That was just a test  

 So, it could be useful when we check whether a character is on a certain position, and replace it with another character. Normally we would have to write something like: 

DECLARE @string varchar(50) 
SET @string = 'DE1988299X8829' 
SELECT 
    CASE WHEN Substring(@string, 10,1) = 'X' THEN Stuff(@string, 10, 1, 'Y') 
   ELSE @string 
END 
Output: DE1988299Y8829     

Another function I haven't saw too often in SQL logic is Replicate, yeah, it does exactly what it's name suggests - it takes a string and replicates it's content multiple times. 

 Replicate ( character_expression , integer_expression ) 

 SELECT Replicate ('0', 10) --Output: 0000000000 
 SELECT Replicate ('tone', 3) --Output: tonetonetone     

The function could be useful when we need to put a number of characters in front of a value. For example a table contains integer values, but in a report we need them with leading zeroes (e.g. 00056 instead of 56). I tried to implement such functionality as a function, in a previous posting with the help of Space function; using Space and Replace functions can be obtained the same result as using Replicate

  SELECT Replace(Space(3), ' ', 'tone') --Output: tonetonetone

💎SQL Reloaded: Useful Functions in Data Conversion

    When working with data, even if it's about reporting, data conversion or data analysis, there are a few useful functions which makes life easier. They deal with three types of basic variables and their variations: strings (e.g. char, varchar), numeric (e.g. int, decimal) and date (e.g. smalldatetime, datetime).  

1. Case Function

CASE input_expression 
        WHEN when_expression 
        THEN result_expression [ ...n ] 
       [ELSE else_result_expression] 
END   
        
Judging from experience, CASE is the most used function in data conversions, when the output value is based on on one or multiple constraints over the already existing values. 
  
CASE WHEN column1 = 'yes' OR column2 = 'other' THEN 'yes/other' END    

 Just be careful, when the ELSE is not provided, a null will be returned! 

Another important thing, the values returned must have the same base type, dates, strings or numeric values can't be mixed without conversion! 

--example 1 
CASE 
    WHEN DateDiff(d, DateValue, GetDate())>0 THEN 'past' WHEN DateDiff(d, DateValue, GetDate())=0 THEN 'today' 
    ELSE 'future' 
END 

--example 2 
CASE 
     WHEN column1 BETWEEN 1 AND 100 THEN '1st range' 
     WHEN column1 BETWEEN 101 AND 1000 THEN '2nd range' 
     ELSE '3rd range' 
END 

--example 3 
CASE 
    WHEN column1 IN ('A', 'B', 'C') THEN '1st set' 
    WHEN column1 IN ('C', 'D', 'E') THEN '2nd set' 
    ELSE '3rd set' 
END      

 There is a saying for DBAs and not only, "Know your data!". As each statement is checked until comparison operation evaluated True, it is better to arrange the WHEN expression in the descending order of highest occurrence. 

 2. IsNull Function

IsNull(expression)   
  
 It's always a good idea to check for NULL values, they could occur anytime when a default value is not provided for a value. One of the cases I often found was when joining two tables when one of the values from JOIN was NULL. 

-- example for string values 
SELECT * 
FROM table1 A 
         LEFT JOIN table2 B 
             ON A.column1 = B.column1 
           AND IsNull(A.column2, ' ') = IsNull(B.column2, ' ') 
WHERE B.column1 IS NULL  

-- example for numeric values 
SELECT * 
FROM table1 A 
      LEFT JOIN table2 B 
          ON A.column1 = B.column1  
        AND IsNull(A.column2, 0) = IsNull(B.column2, 0) 
WHERE B.column1 IS NULL 

The ' ' and 0 values should be used when they aren't found in the list of possible values or there is no impact on the result. Another used value for IsNull function is 'N/A', to point out the missing value. 

 3. Coalesce Function

COALESCE ( expression [ ,...n ] )  
       
There are cases when is needed to take the first not null value from a list of columns, could be dates or any type of other value. Instead of writting something like IsNull(Date1, IsNull(Date2, IsNull(Date3, GetDate()) it's much easier to write COALESCE(Date1, Date2, Date3, GetDate()) 

-- example coalesce 
SELECT * 
FROM table1 
WHERE COALESCE(Column1, Column2, Column3) IS NOT NULL 

-- example IsNull alternative 
SELECT * 
FROM table1 
WHERE Column1 IS NOT NULL 
        OR Column2 IS NOT NULL 
        OR Column3 IS NOT NULL 

 4. NullIf Function
  
NullIf ( expression , expression )    

NullIf function is handy especially when is not needed a certain value in output, for example 0 or the .  

-- example numeric values 
SELECT column1
, NullIf(column2, 0) column2
FROM table_name     

-- example string or date values 
SELECT column1
, NullIf(column2, '') column2
FROM table_name     

When multiple values are not needed, it's easier maybe to write 

-- example multiple checks 
 SELECT column1 
, CASE 
     WHEN column2 IN ('A', 'B', 'V') THEN NULL 
     ELSE column1 END 
FROM table_name 

-- alternative 
SELECT column1, NullIf(NullIf(NullIf(column2, 'A'), 'B'), 'V') 
FROM table_name

 5. Convert Function

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )       

 I used to use CAST instead CONVERT, however CONVERT is more powerful, especially when it comes to dates. SQL Server comes with a few dates formats obtain by providing a value for style parameter: 

SELECT CONVERT(varchar(10), GetDate(),101) -- US format: mm/dd/yyyy
SELECT CONVERT(varchar(10), GetDate(),102) -- ANSI format: yyyy.mm.dd
SELECT CONVERT(varchar(10), GetDate(),103) -- British/French dd/mm/yyyy
SELECT CONVERT(varchar(10), GetDate(),104) -- German format: dd.mm.yyyy
SELECT CONVERT(varchar(12), GetDate(),105) -- Italian format: dd-mm-yyyy
SELECT CONVERT(varchar(12), GetDate(),106) -- dd mon yyyy
SELECT Upper(CONVERT(varchar(12), GetDate(),106))   -- upper case of: dd mon yyyy
SELECT CONVERT(varchar(8), GetDate(),112) -- ISO format (preferred): yyyymmdd
SELECT CONVERT(varchar(10), GetDate(), 23) -- yyyy-mm-dd
SELECT CONVERT(varchar(23), GetDate(),126) -- yyyy-mm-ddThh:mi:ss.mmm

The conversion of a date to a string is mainly required by the need to have the date in a special format or not allow Excel to format the date to its default. 

 6. IsNumeric Function

IsNumeric ( expression )     

This function shouldn't miss when casting string values to numeric values and doing operations with them. 

SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name 
WHERE IsNumeric(column1)=1     

Please note that when multiple constraints are used, the IsNumeric function should appear first after the WHERE clause, a query like the next one might raise an error if one of the column1 values is not numeric. 

-- example 
SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name 
WHERE column2 = 'value' 
AND IsNumeric(column1)=1

-- alternative writing (recommended)
SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name WHERE IsNumeric(column1)=1 
   AND column2 = 'value' 

 7. IsDate Function

IsDate( expression )          

 It works the same way as IsNumeric, but on date data types. Thus if the provided values is a date, it will return 1, otherwise 0. 

  SELECT IsDate(GetDate()) 
  SELECT IsDate('11/31/2007') --november has only 30 days 
  SELECT CASE IsDate('11/31/2007') WHEN 1 THEN 'Yes' ELSE 'No' END 

 Note:     
For more information on the function is always a good idea to consult Microsoft Technical documentation.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.