Showing posts with label RAND. Show all posts
Showing posts with label RAND. Show all posts

13 September 2009

💎SQL Reloaded: Randomized Data Selection

 A few years back I was asked to provide a random set of data from a data collection. Dealing only with a small number of records and as they were in Excel, it was much easier to use the Excel’s RAND function. Even if SQL Server provides a RAND function too, it returns the same value within a data set when the same parameter is provided. In order to avoid this behavior it would be useful if could be provided a distinct parameter value for each row, for example using the data set's integer unique identifier or eventually create one, thus if ID is such a column the query would look something like that: 

SELECT TOP 10 RAND(ID) PERC, ID, N1, N2 
FROM 
ORDER BY RAND(ID)     

The fact that RAND will return always the same value for the same parameter value might be a problem because the above query will always return the same records. In exchange you can use a multiplier for the parameter provided to RAND function - for example in a first attempt you can use RAND(id), in a second RAND(id*2), in a third RAND(id*3), etc.     

This approach offers unfortunately a relatively "weak" randomization which might be not sufficient for repetitive selections, therefore under SQL Server 2005+ could be used instead a CLR-based UDF (user defined function) and use the random functions available in VB, C# or any other CLR programming language, or even the .Net Random class designed to produce a sequence of numbers that meet certain statistical requirements for randomness.

23 September 2007

💎SQL Reloaded: 6 out of 49 (Part 1: Getting the data)

I was thinking to model the 6 out of 49 lottery drawing, so common across EU countries. To study numbers’ occurrences we need to import the historical data, though during my first attempts this proved to be challenging. Each lottery has a website from which data can be taken drawing by drawing or when possible, as a data dump (see notes). Moreover, some of the lotteries have variations for 6/49.       

To avoid any legal litigations, I will use a self-made set of data via the RAND function (which unfortunately is not a perfect randomizer). To store the generated data, I've created a table called LottoExtracts: 

-- cleaning the table
DROP TABLE IF EXISTS dbo.LottoExtracts

--creating the LottoExtracts table 
CREATE TABLE dbo.LottoExtracts(     
   DrawingDate date NULL
 , N1 smallint NULL
 , N2 smallint NULL
 , N3 smallint NULL
 , N4 smallint NULL
 , N5 smallint NULL
 , N6 smallint NULL)

This format will be used to upload the data in a staging table which is used later as source for our data model. As can be seen it has a Drawing Date which identifies uniquely an extraction, while in N1, …, N6 are stored the drawings’ numbers in the order they were picked. 

--creating random data 
DECLARE @index, @counter int 
DECLARE @N1, @N2, @N3, @N4, @N5, @N6 smallint 
  
SET @index = 0 
WHILE @index < 2000 
BEGIN 
  -- creating the first number 
  SET @N1 = Cast(1000 * RAND() as int) % 48 + 1 

  -- creating the second number 
  SET @counter = 1 
  WHILE @counter < 50 or @N1 = @N2 
  BEGIN 
	  SET @N2 = Cast(1000 * RAND() as int) % 48 + 1;
	  SET @counter = @counter + 1 
  END;

  -- creating the third number
  SET @counter = 1 
  WHILE @counter < 50 or @N3 IN (@N1, @N2) 
  BEGIN 
	  SET @N3 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the fourth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N4 IN (@N1, @N2, @N3) 
  BEGIN 
	  SET @N4 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the fifth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N5 IN (@N1, @N2, @N3, @N4) 
  BEGIN 
	  SET @N5 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  -- creating the sixth number 
  SET @counter = 1 
  WHILE @counter < 50 or @N6 IN (@N1, @N2, @N3, @N4, @N5) 
  BEGIN 
	  SET @N6 = Cast(1000 * RAND() as int) % 48 + 1 
	  SET @counter = @counter + 1 
  END 

  --inserting a new drawing 
  INSERT LottoExtracts(DrawingDate, N1, N2, N3, N4, N5, N6) 
  VALUES (DateAdd(d, @index*7, '01/01/2000'), @N1, @N2, @N3, @N4, @N5, @N6) 
  SET @index = @index + 1 
END 

I've created only 200 records, which equates with almost 4 years of history, and should be enough for testing purposes. Of course, one can increase or decrease the number of drawings as needed! 
 
--checking the data 
SELECT * 
FROM dbo.LottoExtracts   

In what concerns the code, the following statement creates a random number in 1 … 49 range:
 
-- creating first number 
SET @N1 = Cast(1000 * RAND() as int) % 48 + 1    

Thus, can be created all 6 numbers, however because the same number can't be extracted twice in the same drawing, I had to add duplication checks:
 
-- creating third number 
SET @counter = 1 
WHILE @counter < 50 or @N3 IN (@N1, @N2) 
BEGIN 
	SET @N3 = Cast(1000 * RAND() as int) % 48 + 1 
	SET @counter = @counter + 1 
END    

Notes:
(1) The code is verbose and probably suboptimal, though it's a must because of the behavior of RAND function as part of transactions.
(2) The below links could change in time. I provided them just to make easier the search. (*) denotes that historical data can be downloaded.
Previous Post <<||>> Next Post

13 September 2007

💎SQL Reloaded: Preparing Test Data

There are cases in which we need to create test data in order to check a certain functionality. In this post I'll show how can be created data with the help of loops. For this let's create a simple table containing 5 fields: 

-- creating the test table 
CREATE TABLE Table1( 
ID int NOT NULL 
, [Year] int NOT NULL 
, Product varchar(30) NOT NULL 
, Price decimal(13,2) NOT NULL 
, Quantity decimal(13,3) NOT NULL) 

-- dropping the test table 
--DROP TABLE Table1 

-- deleting the data 
--TRUNCATE TABLE Table1          
        
Usually I'm commenting DROP and TRUNCATES so I can avoid the modification of data by mistake.

 And now the loop, with the help of RAND and modulo the values can be varied:

-- inserting the data
DECLARE @index int 
SET @index = 1 
WHILE @index < 99
BEGIN 
      INSERT Table1(ID, [Year], Product, Price, Quantity) 
      VALUES (@index, 2000 + (@index % 4), 'Product ' + Cast(@index%10 as char(3)), 10 * RAND(), 100 * RAND()) SET @index = @index + 1 
END 


--testing the output 
SELECT * 
FROM Table1     

-- using the data
SELECT [Year], Product, SUM(Price * Quantity) Amount 
FROM Table1 
GROUP BY [Year], Product
ORDER BY [Year], Product 
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.