Showing posts with label subquery. Show all posts
Showing posts with label subquery. Show all posts

16 May 2020

Query Patterns in SQL Server: Joins and Subqueries

The basis for being able to manipulate data via SQL scripting is a good knowledge of using joins and subqueries as seems fit for the purpose and data models. The following scripts are based on the tables created in a previous post.

-- FULL JOIN
SELECT *
FROM dbo.T_Courses C
     JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId
 ORDER BY C.CourseName
 , S.StudentName 

-- FULL JOIN (deprecated)
 SELECT *
 FROM dbo.T_Courses C
 , dbo.T_Students S
 WHERE C.CourseId = S.CourseId
 ORDER BY C.CourseName
 , S.StudentName 

--LEFT JOIN
 SELECT *
 FROM dbo.T_Courses C
      LEFT JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId 
 WHERE S.CourseId IS NULL
 ORDER BY C.CourseName
 , S.StudentName 

-- RIGHT JOIN
 SELECT *
 FROM dbo.T_Courses C
      RIGHT JOIN dbo.T_Students S
       ON C.CourseId = S.CourseId 
 WHERE C.CourseId IS NULL
 ORDER BY C.CourseName
 , S.StudentName 

-- FULL OUTER JOIN
 SELECT *
 FROM dbo.T_Courses C
      FULL OUTER JOIN dbo.T_Students S
        ON C.CourseId = S.CourseId 
 --WHERE C.CourseId IS NULL
 --WHERE S.CourseId IS NULL
 --WHERE (C.CourseId IS NULL OR S.StudentId IS NULL)
 --WHERE (C.CourseId IS NULL AND S.StudentId IS NULL)
 ORDER BY C.CourseName
 , S.StudentName 

The IN, NOT IN, EXISTS and NOT EXISTS allow using correlated queries, their use being indicated when there are no actual data needed from the tables involved in the correlated queries:

-- EXISTS (correlated subquery)
SELECT *
FROM dbo.T_Courses C
WHERE EXISTS (SELECT StudentId 
        FROM dbo.T_Students S
        WHERE C.CourseId = S.CourseId)
ORDER BY C.CourseName

-- NOT EXISTS (correlated subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE EXISTS (SELECT StudentId 
        FROM dbo.T_Students S
        WHERE C.CourseId = S.CourseId)
 ORDER BY C.CourseName

-- IN (subquery)
 SELECT *
 FROM dbo.T_Courses C
 WHERE CourseId IN (SELECT CourseId 
        FROM dbo.T_Students S)
 ORDER BY C.CourseName

Joining multiples tables is done using the same principles as above:

-- joins with more tables 
SELECT A.CourseId 
, C.CourseName 
, A.StudentId 
, S.StudentName 
 , A.StartDate 
 , A.EndDate 
FROM dbo.T_Allocations A
      JOIN dbo.T_Courses C
        ON A.CourseId = C.CourseId 
      JOIN dbo.T_Students S
        ON A.StudentId = S.StudentId 
 ORDER BY C.CourseName 
 , S.StudentName 

One can obtain the same result via correlated subqueries (a technique often met between Oracle developers). From readability reasons I avoid writing such queries, unless there’s a special purpose to do so.

-- correlated subquery for individual values
SELECT A.CourseId 
, (SELECT C.CourseName 
    FROM dbo.T_Courses C
    WHERE A.CourseId = C.CourseId) CourseName
 , A.StudentId 
 , (SELECT S.StudentName 
    FROM dbo.T_Students S
    WHERE A.StudentId = S.StudentId) StudentName
 , A.StartDate 
 , A.EndDate 
FROM dbo.T_Allocations A
 ORDER BY CourseName 
 , StudentName 

When displaying values within a SELECT via a correlated subqueries, some developers feel the need to use MAX or MIN functions to make sure only one value will be returned. For data analysis it may be acceptable, however if the data model imposes it, then a redesign of the solution is more likely necessary.

-- correlated subquery for individual values
 SELECT A.CourseId 
 , (SELECT Max(C.CourseName)
    FROM dbo.T_Courses C
    WHERE A.CourseId = C.CourseId) CourseName
 , A.StudentId 
 , (SELECT Max(S.StudentName)
    FROM dbo.T_Students S
    WHERE A.StudentId = S.StudentId) StudentName
 , A.StartDate 
 , A.EndDate 
 FROM dbo.T_Allocations A
 ORDER BY CourseName 
 , StudentName 

Another technique not recommended is displaying one or more attributes from the same table with the same conditions via individual correlated queries. The use of aggregate functions is more appropriate however with numerical or date values.

13 June 2009

DBMS: Subquery (Definitions)

"A SELECT statement nested inside another SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery." (Patrick Dalton, "Microsoft SQL Server Black Book", 1997)

"A SELECT statement that is usually a part of a larger SQL statement; a query nested inside a SELECT, INSERT, UPDATE, or DELETE statement." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"A SELECT within another SQL statement, usually within another SELECT." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A query that is embedded in a main, or parent, query and used to assist in filtering the result set from a query." (Bob Bryla, "Oracle Database Foundations", 2004)

"A nested query that returns information to an outer query, thereby helping the outer query correctly identify results." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"A complete SELECT statement that is part of another SELECT." (Jan L Harrington, "SQL Clearly Explained 3rd Ed. ", 2010)

"A SELECT statement that contains one or more subqueries." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A query nested within another SQL statement. Unlike implicit queries, subqueries use a SELECT statement to retrieve data." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"A form of a full-select that is enclosed within parentheses and used as a component of a query." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

10 June 2009

DBMS: Correlated Subquery (Definitions)

 "A subquery that cannot be evaluated independently, but depends on the outer query for its results. Also called a repeating subquery, since the subquery is executed once for each row that might be selected by the outer query. See also nested query." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A repeating subquery. Many queries can be evaluated by executing the subquery once and substituting the resulting value or values into the WHERE clause of the outer query. In queries that include a correlated subquery, the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, one time for each row that is selected by the outer query." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A subquery that references a column in an outer statement. The inner query is executed for each candidate row in the outer statement." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A subquery that contains a reference to a column in the main, or parent, query." (Bob Bryla, "Oracle Database Foundations", 2004)

"A subquery that a DBMS cannot process completely before turning to the outer query. The DBMS must execute the subquery repeatedly for every row in the outer query." (Jan L Harrington, "SQL Clearly Explained 3rd Ed. ", 2010)

"A subquery that contains a correlated reference." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

30 August 2008

Oracle Troubleshooting: ANSI 92 JOIN syntax error

 Lately I’ve been working a lot with Oracle APPS, doing mainly ad-hoc reporting. One of my nightmares is an Oracle bug related to ANSI 92 syntax:

  “ORA-01445: cannot select ROWID from, or sample, a join without a key-preserved table”

 Unfortunately, even if the bug was solved by Oracle, it seems the update was missed on some servers and the bug haunts my queries almost on a daily basis. 

 Having an SQL Server background and, for code clearness, I prefer ANSI 92 JOIN syntax:

SELECT A.column1, B.column2
FROM table1 A JOIN table2 B
 ON A.column1 = B.column2 

instead of using the old fashioned writing:

SELECT A.column1, B.column2
FROM table1 A , table2 B
WHERE A.column1 = B.column2

 In theory the two queries should provide the same output and have, hopefully, similar performance. The problem with ANSI 92 syntax is that, on some Oracle installations, when the number of joins exceeds a certain limit, usually greater than 7, the above error is thrown.

What one can do is to reduce the number of joins to the main table by restructuring the query and grouping multiple tables into subqueries, which are then joined to the main table. For the tables from which is returned only one column, one can move the table into the SELECT statement.

Happy coding! 

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.