Showing posts with label query patterns. Show all posts
Showing posts with label query patterns. Show all posts

20 May 2020

Query Patterns in SQL Server: Matrix Representation

In a previous post on Project Management (PM) I used Excel to represent the data into a matrix format. The same output can be obtained by using SQL in a simple query based on a CROSS JOIN between two numeric intervals created via a CTE within a table-valued function. For this purpose I used an old piece of code which displays the values within an interval given a step. It’s the typical loop with a specified step.
I have made mainly two changes to the old example - I defined the start, end and step to be numeric, while considering also an index into the output table. Here’s the modified function:

-- SQL Server 2005+: Numeric Interval 
CREATE FUNCTION dbo.fGetNInterval( 
  @Start numeric(18,2)
, @End numeric(18,2)
, @Step numeric(18,2)) 
RETURNS @Interval TABLE( 
  Ind int PRIMARY KEY
, Val numeric(18,2)) 
AS 
BEGIN 
 DECLARE @Sign smallint 
 SET @Sign = IsNull(NullIf(Sign(@End-@Start), 0), 1); 
 WITH CTE(Ind, Val) 
 AS( 
   SELECT Cast(1 as int) Ind
   , Cast(COALESCE(@Start, @End, 0) as numeric(18,2)) Val 
   UNION ALL 
   SELECT cast(Ind+1 as int) Ind
   , Cast(CTE.Val + @Step * @Sign as numeric(18,2)) Val 
   FROM CTE 
   WHERE (@End-CTE.Val)*@Sign-@Step>=0 
 ) 
 INSERT @Interval 
 SELECT Ind
 , Val 
 FROM CTE 
 OPTION(MAXRECURSION 0) 
 RETURN 
END 
As it can be seen, the function can be used to step forwards as well backwards:

--descending sequence 
SELECT UT.Val, 100/UT.Val
FROM dbo.fGetNInterval(100, 25, 5) UT

-- ascending sequence 
SELECT *
FROM dbo.fGetNInterval(1, 2, 0.25)
Based on these two sequences the matrix query can be written as follows:

-- matrix representation
DECLARE @EE as numeric(18,2) = 100
SELECT UT.Val [UT %]
, Max(CASE WHEN QF.Ind = 1 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [80%]
, Max(CASE WHEN QF.Ind = 2 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [85%]
, Max(CASE WHEN QF.Ind = 3 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [90%]
, Max(CASE WHEN QF.Ind = 4 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [95%]
, Max(CASE WHEN QF.Ind = 5 THEN Cast(QF.Val*@EE*100/UT.Val as numeric(18,1)) END) [100%]
FROM dbo.fGetNInterval(100, 25, 5) UT
, dbo.fGetNInterval(1, 2, 0.25) QF
GROUP BY UT.Ind
, UT.Val

Here’s the output:

Note:
The indexes were used to display the values into the needed format. The table-valued function could be created also without the respective indexes, however it would have complicated the query.

17 May 2020

Query Patterns in SQL Server: Aggregate Functions

As their names denote, aggregate functions are used to summarize information across the whole data set or based on several attributes. The following queries exemplify the use of aggregates based on the tables created in a previous post:
-- aggreates over the whole table 
SELECT COUNT(*) NoRecords 
, MIN(StartDate) StartDate
, MAX(EndDate) EndDate
, DateDiff(d, MIN(StartDate), MAX(EndDate)) [DiffDays]
FROM dbo.T_Allocations A

-- aggregates based on several several attributes 
 SELECT A.CourseId
 , A.StudentId 
 , COUNT(*) NoRecords 
 , MIN(StartDate) StartDate
 , MAX(EndDate) EndDate
 , DateDiff(d, MIN(StartDate), MAX(EndDate)) [DiffDays]
 FROM dbo.T_Allocations A
GROUP BY A.CourseId
 , A.StudentId 

Correlated subqueries can be of help either inside of a SELECT or an CROSS, respectively OUTER APPLY: 

-- correlated subquery within a SELECT 
 SELECT *
 , (SELECT COUNT(*) NoRecords 
     FROM dbo.T_Allocations A
     WHERE C.CourseId = A.CourseId
     GROUP BY A.CourseId) NoRecords 
 FROM dbo.T_Courses C


 -- CROSS APPLY (correlated subquery)
 SELECT *
 , A.NoRecords
 , A.StartDate
 FROM dbo.T_Courses C
      CROSS APPLY (
         SELECT A.CourseId
         , COUNT(*) NoRecords 
         , MIN(StartDate) StartDate
         FROM dbo.T_Allocations A
         WHERE C.CourseId = A.CourseId
         GROUP BY A.CourseId
     ) A
 ORDER BY C.CourseName

-- OUTER APPLY (correlated subquery)
 SELECT *
 , A.NoRecords
 , A.StartDate
 FROM dbo.T_Courses C
      OUTER APPLY (
         SELECT A.CourseId
         , COUNT(*) NoRecords 
         , MIN(StartDate) StartDate
         FROM dbo.T_Allocations A
         WHERE C.CourseId = A.CourseId
         GROUP BY A.CourseId
     ) A
 ORDER BY C.CourseName

Aggregates are useful in a series of techniques like the listing of duplicates:

-- identifying duplicates via an aggregate 
SELECT A.*
FROM dbo.T_Allocations A
     JOIN (
  SELECT A.CourseId
  , A.StudentId 
  , COUNT(*) NoRecords 
  FROM dbo.T_Allocations A
  GROUP BY A.CourseId
  , A.StudentId 
  HAVING COUNT(*)>1
  ) DUP
   ON A.CourseId = DUP.CourseId 
  AND A.StudentId = DUP.Studentid
ORDER BY A.CourseId
, A.StudentId 

A similar technique can be used to remove the duplicates from the base table:

-- removing the duplicates except the last record  
 DELETE dbo.T_Allocations 
 FROM (
 SELECT A.StudentId 
 , A.CourseId
 , A.StartDate
 , A.EndDate
 , Max(A.AllocationId) AllocationId
 FROM dbo.T_Allocations A
 GROUP BY A.StudentId 
 , A.CourseId
 , A.StartDate
 , A.EndDate
 HAVING count(*)>1
  ) A
WHERE dbo.T_Allocations.CourseId = A.CourseId
  AND dbo.T_Allocations.StudentId = A.StudentId
  AND dbo.T_Allocations.StartDate = A.StartDate
  AND dbo.T_Allocations.EndDate = A.EndDate
  AND dbo.T_Allocations.AllocationId <> A.AllocationId

When the number of values is fixed and unique within a grouping, one can use aggregate functions to display the values within a matrix:

-- Matrix display via aggregates 
SELECT S.StudentId 
, S.StudentName 
, CASE WHEN A.Course1 = 1 THEN 'x' ELSE '-' END Course1
, CASE WHEN A.Course2 = 1 THEN 'x' ELSE '-' END Course2
, CASE WHEN A.Course3 = 1 THEN 'x' ELSE '-' END Course3
, CASE WHEN A.Course4 = 1 THEN 'x' ELSE '-' END Course4
, CASE WHEN A.Course5 = 1 THEN 'x' ELSE '-' END Course5
FROM dbo.T_Students S
     LEFT JOIN (
  SELECT A.StudentId 
  , Max(CASE WHEN A.CourseId = 1 THEN 1 ELSE 0 END) Course1
  , Max(CASE WHEN A.CourseId = 2 THEN 1 ELSE 0 END) Course2
  , Max(CASE WHEN A.CourseId = 3 THEN 1 ELSE 0 END) Course3
  , Max(CASE WHEN A.CourseId = 4 THEN 1 ELSE 0 END) Course4
  , Max(CASE WHEN A.CourseId = 5 THEN 1 ELSE 0 END) Course5
  FROM dbo.T_Allocations A
  GROUP BY A.StudentId
    ) A
   ON S.StudentId = A.StudentId 
ORDER BY S.StudentName 

16 May 2020

Query Patterns in SQL Server: Joins and Subqueries

The basis for being able to manipulate data via SQL scripting is a good knowledge of using joins and subqueries as seems fit for the purpose and data models. The following scripts are based on the tables created in a previous post.

-- FULL JOIN
SELECT *
FROM dbo.T_Courses C
     JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId
 ORDER BY C.CourseName
 , S.StudentName 

-- FULL JOIN (deprecated)
 SELECT *
 FROM dbo.T_Courses C
 , dbo.T_Students S
 WHERE C.CourseId = S.CourseId
 ORDER BY C.CourseName
 , S.StudentName 

--LEFT JOIN
 SELECT *
 FROM dbo.T_Courses C
      LEFT JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId 
 WHERE S.CourseId IS NULL
 ORDER BY C.CourseName
 , S.StudentName 

-- RIGHT JOIN
 SELECT *
 FROM dbo.T_Courses C
      RIGHT JOIN dbo.T_Students S
       ON C.CourseId = S.CourseId 
 WHERE C.CourseId IS NULL
 ORDER BY C.CourseName
 , S.StudentName 

-- FULL OUTER JOIN
 SELECT *
 FROM dbo.T_Courses C
      FULL OUTER JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId 
 --WHERE C.CourseId IS NULL
 --WHERE S.CourseId IS NULL
 --WHERE (C.CourseId IS NULL OR S.StudentId IS NULL)
 --WHERE (C.CourseId IS NULL AND S.StudentId IS NULL)
 ORDER BY C.CourseName
 , S.StudentName 

The IN, NOT IN, EXISTS and NOT EXISTS allow using correlated queries, their use being indicated when there are no actual data needed from the tables involved in the correlated queries:

-- EXISTS (correlated subquery)
SELECT *
FROM dbo.T_Courses C
WHERE EXISTS (SELECT StudentId 
        FROM dbo.T_Students S
        WHERE C.CourseId = S.CourseId)
ORDER BY C.CourseName

-- NOT EXISTS (correlated subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE EXISTS (SELECT StudentId 
        FROM dbo.T_Students S
        WHERE C.CourseId = S.CourseId)
 ORDER BY C.CourseName

-- IN (subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE CourseId IN (SELECT CourseId 
        FROM dbo.T_Students S)
 ORDER BY C.CourseName

Joining multiples tables is done using the same principles as above:

-- joins with more tables 
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
 , A.StartDate 
 , A.EndDate 
FROM dbo.T_Allocations A
      JOIN dbo.T_Courses C
        ON A.CourseId = C.CourseId 
      JOIN dbo.T_Students S
        ON A.StudentId = S.StudentId 
 ORDER BY C.CourseName 
 , S.StudentName 

One can obtain the same result via correlated subqueries (a technique often met between Oracle developers). From readability reasons I avoid writing such queries, unless there’s a special purpose to do so.

-- correlated subquery for individual values
SELECT A.CourseId 
, (SELECT C.CourseName 
    FROM dbo.T_Courses C
    WHERE A.CourseId = C.CourseId) CourseName
 , A.StudentId 
 , (SELECT S.StudentName 
    FROM dbo.T_Students S
    WHERE A.StudentId = S.StudentId) StudentName
 , A.StartDate 
 , A.EndDate 
FROM dbo.T_Allocations A
 ORDER BY CourseName 
 , StudentName 

When displaying values within a SELECT via a correlated subqueries, some developers feel the need to use MAX or MIN functions to make sure only one value will be returned. For data analysis it may be acceptable, however if the data model imposes it, then a redesign of the solution is more likely necessary.

-- correlated subquery for individual values
 SELECT A.CourseId 
 , (SELECT Max(C.CourseName)
    FROM dbo.T_Courses C
    WHERE A.CourseId = C.CourseId) CourseName
 , A.StudentId 
 , (SELECT Max(S.StudentName)
    FROM dbo.T_Students S
    WHERE A.StudentId = S.StudentId) StudentName
 , A.StartDate 
 , A.EndDate 
 FROM dbo.T_Allocations A
 ORDER BY CourseName 
 , StudentName 

Another technique not recommended is displaying one or more attributes from the same table with the same conditions via individual correlated queries. The use of aggregate functions is more appropriate however with numerical or date values.

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.