17 May 2020

💎🏭SQL Reloaded: Query Patterns in SQL Server (Part IV: Window Functions)

For a long time aggregate functions were the only tool for statistical purposes available for raw SQL scripting. Their limitations become more evident with the introduction of window functions, which allow to apply the aggregates over a defined partition. Besides aggregate window functions, the use of ranking, respectively value window functions, opened the door to a new set of techniques. Here are a few examples based on the tables defined in a previous post.

Ranking windows functions allow ranking a record within the whole dataset or a partition by providing a sorting key:

-- Ranking window functions (RANK, DENSE_RANK, ROW_NUMBER)
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
, A.StartDate 
, A.EndDate 
, RANK() OVER (ORDER BY A.Mark) [Rank over whole set]
, DENSE_RANK() OVER (ORDER BY A.Mark) [Dense Rank over whole set]
, ROW_NUMBER() OVER (ORDER BY A.Mark) [Row Number over whole set]
--, RANK() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Rank over Course]
--, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Rank over Student]
--, DENSE_RANK() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Dense Rank over Course]
--, DENSE_RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Dense Rank over Student]
--, ROW_NUMBER() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Dense Rank over Course]
--, ROW_NUMBER() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Dense Rank over Student]
FROM dbo.T_Allocations A
     JOIN dbo.T_Courses C
       ON A.CourseId = C.CourseId 
     JOIN dbo.T_Students S
       ON A.StudentId = S.StudentId 
ORDER BY C.CourseName 
, S.StudentName 

Being able to rank records allows to easier select the last, respectively the first records from a dataset or a given partition:

-- first course 
SELECT *
, A.StartDate 
FROM dbo.T_Students S
     LEFT JOIN (
		SELECT A.CourseId
		, A.StudentId 
		, A.StartDate 
		, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) Ranking 
		FROM dbo.T_Allocations A
    ) A
   ON S.StudentId = A.StudentId 
  AND A.Ranking = 1 
ORDER BY S.StudentName

The aggregate window functions function similarly as their simple counterparts, except that they are valid over a partition. If the partition is left out, the functions apply over the whole data set.

-- aggregate window functions 
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
, A.StartDate 
, A.EndDate 
, SUM(A.Mark) OVER (PARTITION BY A.StudentId) [Sum]
, MAX(A.Mark) OVER (PARTITION BY A.StudentId) [Max]
, MIN(A.Mark) OVER (PARTITION BY A.StudentId) [Min]
, AVG(A.Mark) OVER (PARTITION BY A.StudentId) [Avg]
, COUNT(A.Mark) OVER (PARTITION BY A.StudentId) [Count]
FROM dbo.T_Allocations A
     JOIN dbo.T_Courses C
       ON A.CourseId = C.CourseId 
     JOIN dbo.T_Students S
       ON A.StudentId = S.StudentId 
ORDER BY C.CourseName 
, S.StudentName 

When the ordering attributes are specified, running averages, respectively sums derive:

-- Running averages/sums and ranking via Sum and ORDER BY
SELECT A.StudentId 
, A.CourseId
, A.StartDate
, A.EndDate
, A.Mark 
, AVG(Cast(Mark as decimal(4,2))) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) AvgGrade
, SUM(Mark) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) SumGrade
, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) Ranking
FROM dbo.T_Allocations A
ORDER BY A.StudentId
, A.StartDate

Running averages were introduced only with SQL Server 2008. Previously, one could still calculate them using common table expressions and a ranking function:
 
-- Running averages/sums and ranking via common table expressions 
WITH CTE AS (
	SELECT A.StudentId 
	, A.CourseId
	, A.StartDate
	, A.EndDate
	, A.Mark 
	, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) Ranking
	FROM dbo.T_Allocations A
), 
DAT AS (
	SELECT A.StudentId 
	, A.CourseId
	, A.StartDate
	, A.EndDate
	, A.Mark 
	, A.Ranking
	, A.Mark SumMark
	FROM CTE A
	WHERE A.Ranking = 1
	UNION ALL
	SELECT CTE.StudentId 
	, CTE.CourseId
	, CTE.StartDate
	, CTE.EndDate
	, CTE.Mark 
	, CTE.Ranking
	, CTE.Mark + DAT.Mark SumMark 
	FROM CTE
		 JOIN DAT
		   ON CTE.StudentID = DAT.StudentId 
		  AND CTE.Ranking-1 = DAT.Ranking 
) 
SELECT DAT.StudentId 
, DAT.CourseId
, DAT.StartDate
, DAT.EndDate
, DAT.Mark 
, cast(DAT.SumMark/DAT.Ranking as decimal(4,2)) AvgMark
, DAT.SumMark
, DAT.Ranking
FROM DAT
ORDER BY DAT.StudentId
, DAT.StartDate
   

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.