The most basic information one can get is the number of drawings in which a number occurred, respectively which was its frequency:
-- number & frequency drawings by number
DECLARE @NumberDrawings int = 2000
SELECT Number
, Cast(100.0*count(*)/@NumberDrawings as decimal(10,2)) Frequency
, count(*) NumberDrawings
FROM dbo.vLottoNumbers
GROUP BY Number
ORDER BY Number
One of the interesting things to see in a data set of drawings is the distance between two drawings in which appears the same numbers. It doesn’t look too easy to compare two consecutive records within the same query; actually this can be done easy with the help of DENSE_RANK function (works only on SQL Server 2005+ and Oracle), which ranks the data within a partition, thus if two values are identical, they have the same ranking. Let’s see how DENSE_RANK function works:
-- ranked drawings by number
SELECT Number
, DrawingDate
, DENSE_RANK() OVER (PARTITION BY Number ORDER BY DrawingDate) Ranking
FROM dbo.vLottoNumbers
The partition is created by Number, while the ranking is delimited by DrawingDate. Now all we have to do is to join two such queries by Number with two consecutive Rankings:
--creating the view
CREATE VIEW dbo.vLottoConsecutiveDrawings
AS
-- consecutive drawings by numbers
WITH DAT
AS (
SELECT Number
, DrawingDate
, DENSE_RANK() OVER (PARTITION BY Number ORDER BY DrawingDate) Ranking
FROM dbo.vLottoNumbers
)
SELECT A.Number
, A.DrawingDate
, B.DrawingDate NextDrawingDate
, IsNull(DateDiff(d, A.DrawingDate, B.DrawingDate), 0) DifferenceDays
, IsNull(DateDiff(wk, A.DrawingDate, B.DrawingDate), 0) DifferenceWeeks
FROM DAT A
LEFT JOIN DAT B
ON A.Number = B.Number
AND A.Ranking = B.Ranking - 1
The first interesting thing to find out is:
- What’s the average number of occurrences?
- What’s the minimum/maximum number of consecutive drawings in which the number hasn’t appeared?
- How many times a number appeared?
The following query answers to these questions, by doing a simple grouping by Number over vLottoConsecutiveDrawings output:
-- minimum/maximum differences between drawings
SELECT Number
, Min(NullIf(DifferenceWeeks, 0)) Minimum
, Max(NullIf(DifferenceWeeks, 0)) Maximum
, Avg(NullIf(DifferenceWeeks, 0)) Average
, count(*) NumberDrawings
FROM dbo.vLottoConsecutiveDrawings
GROUP BY Number
ORDER BY Number
The output doesn’t give much information, but it’s a good start. The study interval can be broken down by using the DrawingDate in GROUP and/or WHERE clause:
-- minimum/maximum differences between drawings by year
SELECT Year(DrawingDate) DrawingYear
, Number
, Min(NullIf(DifferenceWeeks, 0)) Minimum
, Max(NullIf(DifferenceWeeks, 0)) Maximum
, Avg(NullIf(DifferenceWeeks, 0)) Average
, count(*) NumberDrawings
FROM vLottoConsecutiveDrawings
--WHERE Year(DrawingDate) IN (2000, 2001)
GROUP BY Number
, Year(DrawingDate)
ORDER BY Number, DrawingYear
It would be also interesting to know how many numbers fall on a given row or column within the 7x7 matrix:
-- aggregating data by drawing & row
SELECT LN.DrawingDate
, SUM(CASE WHEN LN.Number BETWEEN 1 AND 7 THEN 1 ELSE 0 END) Bucket1
, SUM(CASE WHEN LN.Number BETWEEN 8 AND 14 THEN 1 ELSE 0 END) Bucket2
, SUM(CASE WHEN LN.Number BETWEEN 15 AND 21 THEN 1 ELSE 0 END) Bucket3
, SUM(CASE WHEN LN.Number BETWEEN 22 AND 28 THEN 1 ELSE 0 END) Bucket4
, SUM(CASE WHEN LN.Number BETWEEN 29 AND 35 THEN 1 ELSE 0 END) Bucket5
, SUM(CASE WHEN LN.Number BETWEEN 36 AND 42 THEN 1 ELSE 0 END) Bucket6
, SUM(CASE WHEN LN.Number BETWEEN 43 AND 49 THEN 1 ELSE 0 END) Bucket7
FROM dbo.vLottoNumbers LN
GROUP BY LN.DrawingDate
-- aggregating data by drawing & column
SELECT LN.DrawingDate
, SUM(CASE WHEN LN.Number %7 = 1 THEN 1 ELSE 0 END) Bucket1
, SUM(CASE WHEN LN.Number %7 = 2 THEN 1 ELSE 0 END) Bucket2
, SUM(CASE WHEN LN.Number %7 = 3 THEN 1 ELSE 0 END) Bucket3
, SUM(CASE WHEN LN.Number %7 = 4 THEN 1 ELSE 0 END) Bucket4
, SUM(CASE WHEN LN.Number %7 = 5 THEN 1 ELSE 0 END) Bucket5
, SUM(CASE WHEN LN.Number %7 = 6 THEN 1 ELSE 0 END) Bucket6
, SUM(CASE WHEN LN.Number %7 = 7 THEN 1 ELSE 0 END) Bucket7
FROM dbo.vLottoNumbers LN
GROUP BY LN.DrawingDate
The buckets can be aggregated as follows (observe the replacement of SUM with MAX):
-- consolidated drawings by 7 buckets (row)
SELECT SUM(Bucket1) Bucket1
, SUM(Bucket2) Bucket2
, SUM(Bucket3) Bucket3
, SUM(Bucket4) Bucket4
, SUM(Bucket5) Bucket5
, SUM(Bucket6) Bucket6
, SUM(Bucket7) Bucket7
FROM (
-- aggregating data by drawing & row
SELECT LN.DrawingDate
, Max(CASE WHEN LN.Number BETWEEN 1 AND 7 THEN 1 ELSE 0 END) Bucket1
, Max(CASE WHEN LN.Number BETWEEN 8 AND 14 THEN 1 ELSE 0 END) Bucket2
, Max(CASE WHEN LN.Number BETWEEN 15 AND 21 THEN 1 ELSE 0 END) Bucket3
, Max(CASE WHEN LN.Number BETWEEN 22 AND 28 THEN 1 ELSE 0 END) Bucket4
, Max(CASE WHEN LN.Number BETWEEN 29 AND 35 THEN 1 ELSE 0 END) Bucket5
, Max(CASE WHEN LN.Number BETWEEN 36 AND 42 THEN 1 ELSE 0 END) Bucket6
, Max(CASE WHEN LN.Number BETWEEN 43 AND 49 THEN 1 ELSE 0 END) Bucket7
FROM dbo.vLottoNumbers LN
GROUP BY LN.DrawingDate
) DAT
Happy coding!