SQL Troubles: inline view

Showing posts with label inline view. Show all posts

30 July 2010

💎SQL Reloaded: Self-Join in Update Query II

In yesterday’s post on the same topic I tried to exemplify how a self-join update query could be written in order to avoid an error raised by the database engine. Even if this type of query is normally written for hierarchical structures consisting of one-level parent-child relations stored in the same table, my example was using a simple join based directly on the primary key. During the day I was thinking that might be straightforward to create a simple hierarchical structure based on a relatively realistic example. For this let’s consider a department in which the performance of the the manager is a function of the performance of its employees, while the number of hours of training allocated to each employee is proportional to the number of hours allocated to its manager. For this let’s consider the following table:

--Employee table's script 
CREATE TABLE [dbo].[Employees]( 
[EmployeeID] [int]  NOT NULL 
, [ManagerID] [int]  NULL 
, [Name] [nvarchar] (50) NULL 
, [Performance] [numeric]  (5, 3) NULL 
, [TrainingHours] [numeric]  (5, 2) NULL ) 
ON [PRIMARY] 

-- inserting the test data 
INSERT INTO dbo.Employees 
VALUES (1, NULL, 'Joe', NULL, 10) 
, (2, 1, 'Jack', .65, NULL) 
, (3, 1, 'Mary', .45, NULL) 
, (4, 1, 'Ross', .54, NULL) 
, (5, 1, 'Scott', .48, NULL) 
, (6, NULL, 'Jane', NULL, 15) 
, (7, 6, 'Sam', .50, NULL) 
, (8, 6, 'Ron', .45, NULL)

In order to calculate Manager’s performance we could use a query similar with the following, of course the formula could be more complicated and not a simple average:

-- updating Manager's Performance 
UPDATE dbo.Employees 
SET Performance = EMP.Performance 
FROM ( -- inline view 
     SELECT ManagerID 
     , AVG(Performance) Performance 
     FROM dbo.Employees 
     WHERE ManagerID IS NOT NULL 
     GROUP BY ManagerID 
) EMP 
WHERE dbo.Employees.ManagerID IS NULL 
AND dbo.Employees.EmployeeID = EMP.ManagerID

Let’s check the updates and their correctitude:

-- Checking updated data 
SELECT * 
FROM dbo.Employees 
WHERE ManagerID IS NULL -- Verifying output SELECT ManagerID 
, AVG(Performance) Performance 
FROM dbo.Employees 
WHERE ManagerID IS NOT NULL 
GROUP BY ManagerID

Note:
    The average needs to be calculated in the inline view. In case are retrieved more records for each record from the updated table, the query will still work though the result is “unpredictable”:

-- updating Manager's Performance w/o aggregates 
UPDATE dbo.Employees 
SET Performance = EMP.Performance 
FROM ( -- inline view 
     SELECT ManagerID 
     , Performance 
     FROM dbo.Employees 
     WHERE ManagerID IS NOT NULL 
) EMP WHERE dbo.Employees.ManagerID IS NULL 
AND dbo.Employees.EmployeeID = EMP.ManagerID

The reverse update based on the number of hours of training could be written as follows:

-- updating Employees' Training Hours 
UPDATE dbo.Employees 
SET TrainingHours = 0.75 * EMP.TrainingHours 
FROM ( -- inline view 
    SELECT EmployeeID 
    , TrainingHours 
    FROM dbo.Employees 
   WHERE ManagerID IS NULL 
) EMP 
WHERE dbo.Employees.ManagerID IS NOT NULL 
AND dbo.Employees.ManagerID = EMP.EmployeeID       
      
-- Checking updated data 
SELECT * 
FROM dbo.Employees 
WHERE ManagerID IS NOT NULL -- Verifying output SELECT EmployeeID 
, 0.75 * TrainingHours TrainingHours 
FROM dbo.Employees 
WHERE ManagerID IS NULL

28 July 2010

💎SQL Reloaded: Self-Join in Update Query I

While reading R. Scheldon’s article on “UPDATE Basics in SQL Server” I remembered about a problem I had long time ago while attempting to do a self-join in an Update query. Such a query is quite useful in hierarchical structures consisting of one-level parent-child relations stored in the same table, when needed to update the parent based on child information, or vice-versa. The problem I had could be also exemplified by using a simple table (no hierarchical structure), here’s the query:

-- self-join update - problematic query 
UPDATE Production.Product 
SET StandardCost = ITM.StandardCost* (1+.012) 
FROM Production.Product ITM 
WHERE Production.Product.ProductID = ITM.ProductID

The above statement returns the following error:
Msg 4104, Level 16, State 1, Line 5
The multi-part identifier "Production.Product.ProductID" could not be bound.

In order to avoid this error message, the table from the FROM clause could be included in an inline view, something like:

-- self-join update - solution 
UPDATE Production.Product 
SET StandardCost = ITM.StandardCost*(1+.012) 
FROM ( -- inline view 
    SELECT * 
    FROM Production.Product ITM 
    ) ITM  
WHERE Production.Product.ProductID = ITM.ProductID

The inline view could be replaced with a standard view, table-valued UDF or the update could be done through a view. For exemplification I will use only the first and third case, here’s the view based on Production.Product table and the queries corresponding to the two cases:

-- view based on Production.Product 
CREATE VIEW Production.vProductTest 
AS 
SELECT *  
FROM Production.Product   

-- self-join update through view update 
UPDATE Production.vProductTest 
SET StandardCost = ITM.StandardCost* (1+.012) 
FROM Production.Product ITM 
WHERE Production.vProductTest.ProductID = ITM.ProductID     

-- self-join update from view 
UPDATE Production.Product 
SET StandardCost = ITM.StandardCost* (1+.012) 
FROM Production.vProductTest ITM 
WHERE Production.Product.ProductID = ITM.ProductID

Note:
1. The first update query was written in this way only to exemplify the self-update within an hierarchical structure, if it’s needed to modify the StandardCost and nothing more, then the update could be written simply in any of the following forms:

-- Query 1: update statement 
UPDATE Production.Product 
SET StandardCost = StandardCost* (1+.012) 

--Query 2: update statement 
UPDATE ITM 
SET StandardCost = StandardCost * (1+.012) 
FROM Production.Product ITM

2. Because the join constraint is based on a hierarchical structure that considers in the join different columns, the query can’t be written as follows:

 
--simple update statement 

UPDATE Production.Product 
SET StandardCost = ITM.StandardCost* (1+.012) 
FROM Production.Product ITM

This query is nothing more than an alternative to the queries shown in the first note.

03 April 2010

💎SQL Reloaded: The Power of Joins IV (Multi-Joins Under Control)

       Few are the queries based only on the join between two tables, each additional table added to the query involving an additional join with one or more of the already considered tables. Such cases in which more than one join is involved in a query are referred as multi-joins or multiple joins, the joins could involve the same join operator or it can be a mixture of the horizontal or vertical joins. In this context can be discussed about join precedence, in other words the way in which a database engine processes the joins, on how to control the order in which the joins are processed, on how to use the joins in order to model a solution for a given problem or how the constraints are considered at macro level.

      The order in which a database engine processes the joins is highly dependent on queries, tables and servers’ characteristics, and it’s quite a comprehensive topic that deserves an ample discussion that could span several posts. Keeping it simple, once the execution plan is determined, the engine starts with the first join, the result being joined with the next table, and so on. The question is how could we make the database engine to apply a certain processing order and how could we facilitate the whole process. Given the fact that in order to solve a problem often we have only 1 or 2 solutions that together with the nature of the relationship between tables involve dictates the nature (types) and the order of the joins, I can’t they there is much flexibility from this point of view. In the end it resumes in knowing the rules that govern the joins, for example in case of commutative join operators it doesn’t really matter how the joins are written, however special attention must be given to non-commutative operations (e.g. left/right outer joins) and the implications of mixing outer with inner joins.

    One approach is to start with the table having the lowest level of detail or the row-preserving table, and first add successively the child tables with the lower cardinality (number of records) and then the other tables in increasing level of detail. On the other side it depends also on the problems needed to be solved. Taking for example the Production.Product, Purchasing.PurchaseOrderDetail and Purchasing.PurchaseOrderHeader tables from AdventureWorks database a query showing the lowest level of details would start with the PO (Purchase Order) Details, and considering that for each PO Detail should exist a PO Header and a Product, could be used an inner join in order to add the PO Header and Product information:

-- PO Details 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Purchasing.PurchaseOrderDetail POD  
    JOIN Production.Product ITM 
      ON POD.ProductID = ITM.ProductID 
   JOIN Purchasing.PurchaseOrderHeader POH 
     ON POD.PurchaseOrderID = POH.PurchaseOrderID 
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete

Another view within the same data could be based on a requirement to have the Products as row-preserving table, and because there could be Products without a PO placed, this requires to have a left join between the Products and PO Details. In theory because there should be a PO Header for each PO Detail, an inner join would be enough, however an inner join will restrain the scope of the query only to the PO Details eliminating thus the Products without a PO Placed, so a left join must be used between PO Details and PO Headers:

-- Products with PO Details 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
    LEFT JOIN Purchasing.PurchaseOrderDetail POD                
       ON ITM.ProductID = POD.ProductID  
     LEFT JOIN Purchasing.PurchaseOrderHeader POH           
      ON POD.PurchaseOrderID = POH.PurchaseOrderID 
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
-- OR POD.ProductID IS NULL

In case a constraint needs to be added based on the attributes from the null-preserving table (e.g. the commented constraint based on the PO Status), then the overall constraint should be modified in order to reflect the need to show the Products without a PO placed, this being the purpose of the second commented constraint.
Each table referenced by the PO Details or PO Headers and the tables further referenced by the respective tables, when added to the above query require a left join in order to keep the level of detail of the query. On the other side the left joins are performing poorer than the inner joins, so inner joins should be used whenever is possible. A technique to reduce the number of left joins in a query and accommodate the non-join constraints based on the null-preserving tables is to include such tables in a view or inline view. Here’s the modified above query using an inline view for PO Details:

-- Products with PO Details (inline view) 
SELECT POD.PurchaseOrderDetailID , POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
    LEFT JOIN ( -- PO Details 
    SELECT POD.PurchaseOrderDetailID  
    , POD.PurchaseOrderID 
    , POD.ProductID     , POD.UnitPrice 
     , POD.OrderQty 
     FROM Purchasing.PurchaseOrderDetail POD  
        JOIN Purchasing.PurchaseOrderHeader POH  
           ON POD.PurchaseOrderID = POH.PurchaseOrderID 
       --WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete  
     ) POD  
     ON ITM.ProductID = POD.ProductID

A few years back I found an interesting article of Itzik Ben-Gan on undocumented join features existing in SQL Server, in the respective article he was showing how parentheses and join constraints’ order could be used to control the order in which joined are processed. Parentheses could be used to group the tables and the join between them, allowing thus to perform an inner join between PO Details and PO Headers, the results following to represent the null-preserving table joined with Products table:

-- Products with PO Details (parentheses grouping) 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
     LEFT JOIN ( 
           Purchasing.PurchaseOrderDetail POD  
           JOIN Purchasing.PurchaseOrderHeader POH  
               ON POD.PurchaseOrderID = POH.PurchaseOrderID 
        --AND POH.Status IN (2, 4)-- 2-Approved, 4-Complete 
       )  
     ON ITM.ProductID = POD.ProductID

The second undocumented method is based on tables and join constraints’ order, being possible to list the table in the reverse order in which they need to be processed and thus accommodate the right type of join in between, the join constraints following the same reverse logic. Thus the PO Headers could be placed the last in the list together with the PO Details and use a inner join between them, the result between them being joined to Products using a left join.

-- Products with PO Details (condition order) 
SELECT POD.PurchaseOrderDetailID  
, POD.PurchaseOrderID 
, ITM.ProductID  
, ITM.ProductNumber  
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM Production.Product ITM  
     LEFT JOIN Purchasing.PurchaseOrderDetail POD 
        LEFT JOIN Purchasing.PurchaseOrderHeader POH 
           ON POD.PurchaseOrderID = POH.PurchaseOrderID 
         ON ITM.ProductID = POD.ProductID  
--WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
-- OR POD.ProductID IS NULL

02 April 2010

💎SQL Reloaded: The Power of Joins III (Vertical Joins)

Introduction

A vertical join allows the possibility of retrieving data from multiple datasets by combining the attributes from one or more datasets on the same row, being possible to include complex expressions using any combination of attributes. The join within a vertical join is based only on the attributes specified in the join constraint(s), typically each constraint taking the form of an identity, in this case the data types must match, though also complex expressions evaluated to a value of true/false could be used.

Within SQL Server 2000 were possible two syntaxes for vertical joins – the ANSI and non-ANSI syntaxes, however in later versions the non-ANSI syntax could be enabled only by changing database’s compatibility level to 80, Microsoft recommends using only the ANSI standard because the non-ANSI syntax could be dropped in future versions. In ANSI syntax are used the LEFT OTER JOIN, RIGHT OUTER JOIN, INNER JOIN and FULL OUTER JOIN operators in the FROM clause while in non-ANSI syntax the join constraints appear in WHERE clause, being used the *= and =* operators for LEFT OUTER JOIN, respectively RIGHT OUTER JOIN, and = for an INNER JOIN, the FULL OUTER JOIN not being supported in non-ANSI. Starting with SQL Server 2005 were introduced also the CROSS APPLY and OUTER APPLY operators that could be used to model cross joins.

-- Joins ANSI syntax 
SELECT A.Attribute1 
[, A.Attribute] 
[, B.Attribute] 
FROM TABLE1 A 
    [LEFT|RIGHT|FULL] [INNER|OUTER] TABLE2 B 
        ON <join_constraints> 
[WHERE <constraints>] 

-- Joins non-ANSI syntax 
SELECT A.Attribute1 
[, A.Attribute] 
[, B.Attribute] 
FROM TABLE1 A 
,TABLE2 B 
[WHERE<join_constraints>] 
[AND|OR <join_constraints>]

Notes:
1. In case of horizontal joins the Set Theory applies mainly to the join constraints, an element in the dataset being the n-uple formed by the attributes participating in the join constraints. Many times it’s natural to use the foreign key-primary key attribute pairs though tables’ design and the logic to be modeled don’t always allow this luxury, therefore when choosing the attributes participating in the join is targeted to find the smallest n-uple with unique values across the whole dataset. From a performance standpoint is preferred to use the attributes that are part of an index, actually in many cases an index arrives to be created for the attributes used often in joins in order to increase queries’ performance. If the n-uple used in the join doesn’t have unique values, in other words exists two records in the dataset on which the n-uple takes the same values for each attribute of the n-uple, then duplicates are induced in the join.

2. The records whose attributes participating in the join constraint have NULL values are ignored, so it might be needed to consider replacing the NULL values with a default value, though must be paid attention also to the possibility of introducing duplicates.

3. Oracle supports the SQL ANSI syntax together with a slightly different form of non-ANSI syntax. Especially when considering the portability of code it makes sense to use the SQL ANSI syntax in any RDBMS that supports it; I mention this aspect because many Oracle developers are still using the non-ANSI syntax.

4. In case an attribute is found in more than one table then it’s needed to use aliases for the tables in which the attribute is found or prefix the attribute with the table and schema name (e.g. Production.Product.Name). Actually from performance reasons it’s advisable to always use aliases, thus the database engine identifying easier the source table for each attribute.

5. In theory in the SELECT statement could include all the attributes from the tables participating in the join, though it’s recommendable to use only the attributes needed (when possible should be also avoided SELECT * constructs). For output’s readability it makes sense to group together the attributes coming from the same table or arrange them in a patterns that could be easily understood by the users.

6. The cross, inner and full outer joins are the only commutative operations, in other words the same results are obtained if the tables are inversed. It can be discussed also about associativity, the ability of changing the table join processing precedence without affecting the result, [1] for example discussing in detail this topic.

For demonstrating the application of vertical joins I will use the Production.Product, Purchasing.PurchaseOrderDetail POD and HumanResources.Employee tables from AdventureWorks database.

The Inner Join

The INNER JOIN operators, shortened sometimes as JOIN, allows retrieving the rows matched from both tables based on the join constraints, thus if for one record isn’t found a match in the other table then the record will not appear in the result dataset.

-- Inner Join ANSI syntax 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
    JOIN Purchasing.PurchaseOrderDetail POD  
       ON ITM.ProductID = POD.ProductID 
WHERE ITM.MakeFlag = 0 

-- Inner Join non-ANSI syntax 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
Purchasing.PurchaseOrderDetail POD 
WHERE ITM.ProductID=POD.ProductID 
    AND ITM.MakeFlag = 0

Note:
The inner join could be considered the projection of the intersection operation from Set Theory applied to the attributes participating in the join constraints.

In order to exemplify what’s happening in the background I will use a smaller subset from each of the two tables used as input, and the final result for the inner join:

As can be seen from the above image, the 8th, 9th and 10th records from Purchase Orders sample table corresponding to PurchaseOrderID = 7 are not appearing in the result dataset because the corresponding Product IDs (317, 318 and 319) for the respective lines are not found in the Products sample table. Also the second line from Products sample table corresponding to ProductID = 2 is not appearing in the result dataset because there is no Purchase Order placed for the respective Product.

Note:
The below query summarizes the sample tables used above and the output of the inner join, inline views being used for each of the tables in order to simplify the use of the example with all types of join, thus only the join type needs to be changed for exemplification:

SELECT IsNull(ITM.ProductID, POD.ProductID) ProductID 
, POD.PurchaseOrderID 
, ITM.ProductNumber 
, ITM.StandardCost 
, POD.UnitPrice 
, POD.OrderQty 
FROM ( -- sample Products 
     SELECT ProductID 
     , ProductNumber 
     , StandardCost 
      FROM Production.Product 
      WHERE ProductID IN (1, 2, 359, 360, 530, 4, 512, 513) 
      AND MakeFlag = 0) ITM 
      JOIN (-- sample Purchase Orders 
          SELECT ProductID 
         , PurchaseOrderID 
         , UnitPrice 
         , OrderQty 
         FROM Purchasing.PurchaseOrderDetail 
         WHERE PurchaseOrderID IN (1,2,3,4,5,6,7)) POD 
     ON ITM.ProductID = POD.ProductID 
ORDER BY 2

The Left Outer Join

The LEFT OUTER JOIN, shortened sometimes as LEFT JOIN, allows retrieving all the rows from the left table and only the matched records from the right table based on the join constraints.

-- Left Join ANSI syntax 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
      LEFT JOIN Purchasing.PurchaseOrderDetail POD 
         ON ITM.ProductID = POD.ProductID 
WHERE ITM.MakeFlag = 0 

-- Left Join non-ANSI syntax 
SELECTITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
, Purchasing.PurchaseOrderDetail POD 
WHERE ITM.ProductID*=POD.ProductID 
    AND ITM.MakeFlag = 0

In order to exemplify what’s happening in the background I will use a smaller subset from each of the two tables used as input, and the final result for the left outer join:

Because the Products sample table is used as row-preserving table it will be shown all the records it holds together with the matched records from the Purchase Orders sample table, thus the Purchase Orders sample table corresponding to PurchaseOrderID = 7 are not appearing in the result dataset because the corresponding Product IDs (317, 318 and 319) for the respective lines are not found in the Products sample table.

If the tables are inversed then the Purchase Orders table becomes the row-preserving table and thus all the records from it are shown, including the records for which no Product is found in the Products sample table.

Notes:
1. The left join has no direct correspondent operation from Set Theory but it’s the projection of AU(A∩B) formula, which equals to (A\B)U(A∩B). The left join could be used also to determine the projection of A\B by adding in the WHERE clause the constraint that allows retrieving only the records for which no match was found to the right table:

-- LEFT Join ANSI syntax for A\B 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
    LEFT JOIN Purchasing.PurchaseOrderDetail POD 
      ON ITM.ProductID = POD.ProductID 
WHERE POD.ProductID IS NULL

2. Special attention must be given to the way the join and non-join constraints are added to the WHERE clause, because if the constraint is based solely on the null-supplying table and included in the WHERE clause without handling the Null case then an inner join could have been written instead because the rows for the attributes participating in constraints are ignored. In order to preserve left join’s character the constraint should be brought into the join constraint as below:

-- Right Join ANSI syntax modified 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Purchasing.PurchaseOrderDetail POD 
     LEFT JOIN Production.Product ITM 
        ON ITM.ProductID = POD.ProductID 
AND ITM.MakeFlag = 0

Because unlike in Oracle the both terms of an non-ANSI left join operator must contain columns (attributes), then the IsNull function could be used in order to handle the NULL values:

SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROMPurchasing.PurchaseOrderDetail POD 
, Production.Product ITM 
WHERE POD.ProductID *= ITM.ProductID 
AND IsNull(ITM.MakeFlag, 0) = 0

The same method could be used also to keep non-join constraints in the WHERE clause when using the ANSI syntax for outer joins.

3. There are cases in which is needed to replace with default values the NULL values from the not matched records from the Null-preserving table.

The Right Outer Join

The RIGHT OUTER JOIN, shortened sometimes as RIGHT JOIN, allows retrieving all the rows from the right table and only the matched records from the right table based on the join constraints. A left join could be rewritten as a right join, just by inversing the tables, in fact the above examples given for left joins are rewritten using a right join:

-- Right Join ANSI syntax 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Purchasing.PurchaseOrderDetail POD 
     RIGHT JOIN Production.Product ITM 
          ON ITM.ProductID = POD.ProductID 
WHERE ITM.MakeFlag = 0 

-- Right Join ANSI syntax modified 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Purchasing.PurchaseOrderDetail POD 
    RIGHT JOIN Production.Product ITM 
         ON ITM.ProductID = POD.ProductID 
AND ITM.MakeFlag = 0

In order to exemplify what’s happening in the background I will use a smaller subset from each of the two tables used as input, and the final result for the right outer join:

As opposed to the general syntax examples given above for the left outer join in which the Products sample table is used as row-preserving table, in the above image the Purchase Orders sample table is used as row-preserving table, thus are shown all the records it holds and only the matched Products.

The Full Outer Join

The FULL OUTER JOIN, shortened sometimes as FULL JOIN, allows retrieving the rows matched from both tables based on the join together with the not matched records from both tables.

-- Full Outer Join ANSI syntax 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
    FULL JOIN Purchasing.PurchaseOrderDetail POD 
       ON ITM.ProductID = POD.ProductID

In order to exemplify what’s happening in the background I will use a smaller subset from each of the two tables used as input, and the final result for the full outer join:

As can be seen from the above image are considered all the rows from both tables, including the Purchase Orders for which no match was found in the Products sample table and the Products for which no Purchase Order is found in the corresponding sample table.

Note:
1. As in the left outer join’s case, special attention must be given to the non-join constraints added in the WHERE clause because they could reduce the range of applicability of the full outer join to a left/right outer join or to an inner join. Eventually, if really needed to add non-join constraints, instead of the base table it could be used an inline view as input for the full outer join.

2. In SQL Server there is no non-ANSI operator equivalent to the ANSI full outer join operator, though the same functionality could be obtained by using the union between the result of a left outer join and the one of a right outer join:

-- Full Outer Join non-ANSI syntax equivalent 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
, Purchasing.PurchaseOrderDetail POD 
WHERE ITM.ProductID *= POD.ProductID 
UNION 
SELECT 
ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
, Purchasing.PurchaseOrderDetail POD 
WHERE ITM.ProductID =* POD.ProductID

In some cases could be useful to rewrite the above query using the ANSI syntax, but this time it makes sense to use the UNION ALL operator instead and limit the second dataset only to the Products for which no Purchase Order was placed:

-- Full Outer Join ANSI syntax equivalent using Left & Right Outer Joins 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
     LEFT JOIN Purchasing.PurchaseOrderDetail POD  
         ON ITM.ProductID = POD.ProductID 
UNIONALL 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
      RIGHT JOIN Purchasing.PurchaseOrderDetail POD  
            ON ITM.ProductID = POD.ProductID 
WHERE ITM.ProductID IS NULL

The Cross Join

The CROSS JOIN resumes at listing the joined tables without specifying the join constraint and it returns the carthezian product between the two tables, a row from the first table being matched to each row from the second table, thus if the first table has m rows and the second n rows, the final query will return m*n rows. If in non-ANSI syntax it’s pretty simple to create a cross join, using ANSI syntax the same could be obtained using a full outer join with a join constraint that always equates to true independently of the tables’ attributes values (e.g. 1=1) or, starting with SQL Server 2005, when was first introduced, could be used the cross apply operator.

-- Cross Join ANSI syntax (CROSS APPLY) 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
     CROSS APPLY Purchasing.PurchaseOrderDetail POD 

-- Cross Join ANSI syntax (FULL OUTER JOIN) 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
    FULL OUTER JOIN Purchasing.PurchaseOrderDetail POD 
      ON 1=1 

-- Cross Join non-ANSI syntax 
SELECT ITM.ProductID 
, POD.PurchaseOrderID 
FROM Production.Product ITM 
, Purchasing.PurchaseOrderDetail POD

The Self Join

The self join is a special type of vertical join in which the left and right tables in a join represent the same table, and it could take the form of any of the above mentioned types of joins. This typically implies the existence of a self-reference attribute that references the primary key of another record, such structures being useful in modeling hierarchies (e.g. Bill of Materials, Human Resources hierarchies). For example HumanResources.Employee table stores together with the current Employee also ManagerID which stores the EmployeeID of Employee’s Manager, that stores at its turn the reference to Manager’s Manager, thus a whole structure of an organization could be built with self joins. The simplified queries for getting the Manager of Employee’s Manager could be written with a left join, this mainly because there might be Employees who don’t have a manager:

-- Self Join ANSI Syntax (LEFT JOIN) 
SELECT EMP.EmployeeID 
, EMP.ManagerID 
, MNG.ManagerID ManagersManagerID 
FROM HumanResources.Employee EMP 
    LEFT JOIN HumanResources.Employee MNG 
       ON EMP.ManagerID = MNG.EmployeeID 

-- Self Join non-ANSI Syntax (LEFT JOIN) 
SELECT EMP.EmployeeID 
, EMP.ManagerID 
, MNG.ManagerID ManagersManagerID 
FROM HumanResources.Employee EMP 
, HumanResources.Employee MNG 
WHERE EMP.ManagerID *= MNG.EmployeeID

Note:
In order to built the whole hierarchy, starting with SQL Server 2005 instead of using multiple self joins could be used common tables expressions (CTE), they offering more flexibility and better performance.

References:
[1] David M.M., (1999). Advanced ANSI SQL Data Modeling and Structure Processing. Artech House. ISBN: 1-58053-038-9

11 March 2010

💎SQL Reloaded: Oracle vs. SQL Server: Averages I

For many people working with averages is not a complicated topic at all, and that’s quite true, though as always it depends on how much you want to complicate everything. The average or the arithmetic mean is defined as the middle value of the values within a data set calculated as the sum of the values divided by the number of values:

RDBMS vendors provide similar functionality within the AVG aggregate function, and even if in theory the result could be calculated using the SUM and COUNT aggregate functions, as per the above definition, given its importance in statistics, to provide an AVG function seems to be a natural choice. AVG could be used across the whole data set or within a given “partition” determined by the GROUP BY clause.

-- Oracle/SQL Server: Simple Averages 
SELECT ProductID 
, AVG(UnitPrice) AverageUnitPrice 
, SUM(UnitPrice)/COUNT(1) AverageUnitPriceDef 
FROM Purchasing.PurchaseOrderDetail 
GROUP BY ProductID

For most of the data analysis requirements the simple average function would be enough, though there are cases in which is required to address the sensitivity of values to certain modifiers – for example quantities or the number of lines. For such cases is used the weighted average (weighted arithmetic mean) calculated as the sum of values and quantifiers products, the total being divided by the sum of quantifiers:

Note:
In case the quantifiers are all 1 then the weighted average equates with the simple average, the output being the same also when the values are constant.

The weighted average is quite a useful tool when performing PO Price Analysis in which often needs to be be considered also the volume of purchased quantities with a given Price Unit:

-- Oracle/SQL Server: Simple/Weighted Averages 
SELECT ProductID 
, AVG(UnitPrice) AverageUnitPrice 
, SUM(UnitPrice*OrderQty)/SUM(OrderQty) WeightedAverageUnitPrice 
FROM Purchasing.PurchaseOrderDetail 
GROUP BY ProductID

The actual implementation in a query might differ based on requirements and preferences – as multiple left joins, inline view or correlated query, each of the mentioned approaches coming with their downsides and strengths:

-- Oracle/SQL Server: Averages (inline view) 
SELECT ITM.ProductID 
, ITM.Name ProductName 
, ITM.ProductNumber 
, DAT.AverageUnitPrice 
, DAT.WeightedAverageUnitPrice 
FROM Production.Product ITM 
LEFT JOIN ( --calculated averages 
    SELECT POD.ProductID 
    , AVG(POD.UnitPrice) AverageUnitPrice 
    , SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice 
    FROM Purchasing.PurchaseOrderDetail POD 
       JOIN Purchasing.PurchaseOrderHeader POH 
          ON POD.PurchaseOrderID = POH.PurchaseOrderID 
    WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
   GROUP BY POD.ProductID)DAT 
    ON ITM.ProductID = DAT.ProductID 
ORDER BY ITM.Name 

-- Oracle/SQL Server: Averages (multiple left joins) 
SELECT ITM.ProductID 
, ITM.Name ProductName 
, ITM.ProductNumber 
, AVG(POD.UnitPrice) AverageUnitPrice 
, SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice 
FROM Production.Product ITM 
   LEFT JOIN Purchasing.PurchaseOrderDetail POD 
      ON ITM.ProductID = POD.ProductID 
   LEFT JOIN Purchasing.PurchaseOrderHeader POH 
      ON POD.PurchaseOrderID = POH.PurchaseOrderID  
WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
GROUP BY ITM.ProductID 
,ITM.Name 
,ITM.ProductNumber 
ORDER BY ITM.Name  

-- SQL Server 2005: Averages (OUTER APPLY) 
SELECT ITM.ProductID 
, ITM.Name ProductName 
, ITM.ProductNumber 
, DAT.AverageUnitPrice 
, DAT.WeightedAverageUnitPrice 
FROM Production.Product ITM 
OUTER APPLY ( -- calculated averages 
    SELECT AVG(POD.UnitPrice) AverageUnitPrice 
    , SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice 
    FROM Purchasing.PurchaseOrderDetail POD  
       JOIN Purchasing.PurchaseOrderHeader POH 
          ON POD.PurchaseOrderID = POH.PurchaseOrderID  
    WHERE ITM.ProductID = POD.ProductID  
        AND POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
) DAT 
ORDER BY ITM.Name

A different type of approach could be focused on calculating the simple or weighted average based only on the last n PO Lines, for example the last 3 PO Lines:

-- Oracle/SQL Server 2005: last 3 PO average prices 
SELECT DAT.ProductID 
, MAX(DAT.RANKING) NumberRecords 
, AVG(DAT.UnitPrice) AveragePrice 
, SUM(DAT.UnitPrice* DAT.OrderQty)/SUM(DAT.OrderQty) WeightedAveragePrice 
FROM (-- ranked PO Prices 
    SELECT POD.ProductID 
    , POD.UnitPrice 
    , POD.OrderQty 
    , RANK() OVER(PARTITION BY POD.ProductID ORDER BY POH.OrderDate DESC,   POD.PurchaseOrderDetailID DESC) RANKING 
    FROM Purchasing.PurchaseOrderDetail POD 
       JOIN Purchasing.PurchaseOrderHeader POH 
          ON POD.PurchaseOrderID = POH.PurchaseOrderID 
    WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
) DAT 
WHERE DAT.RANKING BETWEEN 1 AND 3 
GROUP BY DAT.ProductID

The problem could be complicated even more by selecting all the POs corresponding to the last 3 distinct Prices used; not sure if it brings any value but the formulation is perfectly valid and don’t despair if a user comes with such a request.

Occasionally it might be needed to study the evolution of prices over time, the previous problem becoming a simple moving average or weighted moving average problem, in which the average is calculated in report with a moving point on the time scale with respect to the previous k points or concerning the previous and following k points including the current point:

G. Priester shows in his Calculating Moving Averages with T-SQL article a nice solution for simple moving averages calculation exemplifying it on stock market data. Frankly, given its flexibility, I prefer to use the CROSS APPLY operator in combination with fGetDatesInInterval function introduced in More Date-related Functionality – Part I post, the function returning all the dates falling in a certain interval. Because in case of POs there could be multiple transactions per day for the same Product, the problem is slightly changed, the averages being based within a 31 window:

-- SQL Server 2005: 31 days moving averages 
SELECT ITM.ProductNumber 
, ITM.Name ProductName 
, DIN.DateSequence ReportingDate 
, DAT.AverageUnitPrice 
, DAT.WeightedAverageUnitPrice 
FROM dbo.fGetDatesInInterval('2003-01-01', '2003-12-31') DIN 
CROSS APPLY (--calculated averages 
    SELECT POD.ProductID 
    , AVG(POD.UnitPrice) AverageUnitPrice 
    , SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice 
    FROM Purchasing.PurchaseOrderDetail POD 
       JOIN Purchasing.PurchaseOrderHeader POH 
          ON POD.PurchaseOrderID = POH.PurchaseOrderID 
     WHERE DATEDIFF(d, POH.OrderDate, DIN.DateSequence) BETWEEN 0 AND 30 -- 31 days 
     --WHERE DATEDIFF(d, POH.OrderDate, DIN.DateSequence) BETWEEN -15 AND 15 -- 31 days 
         AND POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
GROUP BY POD.ProductID) DAT 
LEFT JOIN Production.Product ITM 
   ON DAT.ProductID = ITM.ProductID 
WHERE DATEPART(dw, DIN.DateSequence) BETWEEN 2 AND 6 -- Working Days 
ORDER BY ITM.Name 
, DIN.DateSequence

SQL Troubles

Pages