For a long time aggregate functions were the only tool for statistical purposes available for raw SQL scripting. Their limitations become more evident with the introduction of window functions, which allow to apply the aggregates over a defined partition. Besides aggregate window functions, the use of ranking, respectively value window functions, opened the door to a new set of techniques. Here are a few examples based on the tables defined in a previous post.
Ranking windows functions allow ranking a record within the whole dataset or a partition by providing a sorting key:
-- Ranking window functions (RANK, DENSE_RANK, ROW_NUMBER) SELECT A.CourseId , C.CourseName , A.StudentId , S.StudentName , A.StartDate , A.EndDate , RANK() OVER (ORDER BY A.Mark) [Rank over whole set] , DENSE_RANK() OVER (ORDER BY A.Mark) [Dense Rank over whole set] , ROW_NUMBER() OVER (ORDER BY A.Mark) [Row Number over whole set] --, RANK() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Rank over Course] --, RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Rank over Student] --, DENSE_RANK() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Dense Rank over Course] --, DENSE_RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Dense Rank over Student] --, ROW_NUMBER() OVER (PARTITION BY A.CourseId ORDER BY A.StartDate) [Dense Rank over Course] --, ROW_NUMBER() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) [Dense Rank over Student] FROM dbo.T_Allocations A JOIN dbo.T_Courses C ON A.CourseId = C.CourseId JOIN dbo.T_Students S ON A.StudentId = S.StudentId ORDER BY C.CourseName , S.StudentName
Being able to rank records allows to easier select the last, respectively the first records from a dataset or a given partition:
-- first course SELECT * , A.StartDate FROM dbo.T_Students S LEFT JOIN ( SELECT A.CourseId , A.StudentId , A.StartDate , RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate) Ranking FROM dbo.T_Allocations A ) A ON S.StudentId = A.StudentId AND A.Ranking = 1 ORDER BY S.StudentName
The aggregate window functions function similarly as their simple counterparts, except that they are valid over a partition. If the partition is left out, the functions apply over the whole data set.
-- aggregate window functions SELECT A.CourseId , C.CourseName , A.StudentId , S.StudentName , A.StartDate , A.EndDate , SUM(A.Mark) OVER (PARTITION BY A.StudentId) [Sum] , MAX(A.Mark) OVER (PARTITION BY A.StudentId) [Max] , MIN(A.Mark) OVER (PARTITION BY A.StudentId) [Min] , AVG(A.Mark) OVER (PARTITION BY A.StudentId) [Avg] , COUNT(A.Mark) OVER (PARTITION BY A.StudentId) [Count] FROM dbo.T_Allocations A JOIN dbo.T_Courses C ON A.CourseId = C.CourseId JOIN dbo.T_Students S ON A.StudentId = S.StudentId ORDER BY C.CourseName , S.StudentName
When the ordering attributes are specified, running averages, respectively sums derive:
-- Running averages/sums and ranking via Sum and ORDER BY SELECT A.StudentId , A.CourseId , A.StartDate , A.EndDate , A.Mark , AVG(Cast(Mark as decimal(4,2))) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) AvgGrade , SUM(Mark) OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) SumGrade , RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) Ranking FROM dbo.T_Allocations A ORDER BY A.StudentId , A.StartDate
-- Running averages/sums and ranking via common table expressions WITH CTE AS ( SELECT A.StudentId , A.CourseId , A.StartDate , A.EndDate , A.Mark , RANK() OVER (PARTITION BY A.StudentId ORDER BY A.StartDate, A.AllocationId) Ranking FROM dbo.T_Allocations A ), DAT AS ( SELECT A.StudentId , A.CourseId , A.StartDate , A.EndDate , A.Mark , A.Ranking , A.Mark SumMark FROM CTE A WHERE A.Ranking = 1 UNION ALL SELECT CTE.StudentId , CTE.CourseId , CTE.StartDate , CTE.EndDate , CTE.Mark , CTE.Ranking , CTE.Mark + DAT.Mark SumMark FROM CTE JOIN DAT ON CTE.StudentID = DAT.StudentId AND CTE.Ranking-1 = DAT.Ranking ) SELECT DAT.StudentId , DAT.CourseId , DAT.StartDate , DAT.EndDate , DAT.Mark , cast(DAT.SumMark/DAT.Ranking as decimal(4,2)) AvgMark , DAT.SumMark , DAT.Ranking FROM DAT ORDER BY DAT.StudentId , DAT.StartDate