One of the SQL Server 2005+ functionalities I kind of ignored in the past years is the CROSS APPLY operator that allows returning rows from a table-valued function of a correlated query, allowing for example to calculate running totals, cumulated amounts from the previous records and the current record in a data set. Before the introduction of CROSS APPLY operator such a query could be written as follows:
-- SQL Server/Oracle: Running Totals (simple join)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, SUM(POT.SubTotal) RunningSubTotal
, SUM(POT.TaxAmt) RunningTaxAmt
, SUM(POT.Freight) RunningFreight
FROM Purchasing.PurchaseOrderHeader POH
LEFT JOIN Purchasing.PurchaseOrderHeader POT
ON POH.PurchaseOrderID >= POT.PurchaseOrderID
GROUP BY POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
ORDER BY POH.PurchaseOrderID
Each of the attributes not aggregated needs to be added in the GROUP BY clause, fact that could be quite a hit from a performance standpoint. If only one aggregated value is needed then the query could be rewritten, the cumulated value being calculated in a correlated query:
-- SQL Server/Oracle: Running Totals (correlated query)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, ( -- calculating aggregated running total
SELECT SUM(POT.SubTotal)
FROM Purchasing.PurchaseOrderHeader POT
WHERE POH.PurchaseOrderID >= POT.PurchaseOrderID
) RunningSubTotal
FROM Purchasing.PurchaseOrderHeader POH
In theory could be added a correlated query for each aggregated value, though this means an additional table scan for each attribute, not the best approach for sure. The CROSS APPLY operator simplifies such queries allowing to do multiple calculations within the same correlated query:
-- SQL Server 2005: Running Totals (CROSS APPLY)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, DAT.RunningSubTotal
, DAT.RunningTaxAmt
, DAT.RunningFreight
FROM Purchasing.PurchaseOrderHeader POH
CROSS APPLY (-- calculating aggregated running total
SELECT SUM(POT.SubTotal) RunningSubTotal
, SUM(POT.TaxAmt) RunningTaxAmt
, SUM(POT.Freight) RunningFreight
FROM Purchasing.PurchaseOrderHeader POT
WHERE POH.PurchaseOrderID >= POT.PurchaseOrderID) DAT
ORDER BY POH.PurchaseOrderID
The advantage of the first two approaches is that they work also in Oracle, though also Oracle comes with an alternative using analytic aggregate functions:
-- Oracle: Running Totals (analytic aggregate function)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, SUM(POH.SubTotal) OVER (ORDER BY POH.PurchaseOrderID ) RunningSubTotal
, SUM(POH.TaxAmt) OVER (ORDER BY POH.PurchaseOrderID) RunningTaxAmt
, SUM(POH.Freight) OVER (ORDER BY POH.PurchaseOrderID) RunningFreight
FROM PurchaseOrderHeader POH
ORDER BY POH.PurchaseOrderID
Notes:
1. The calculation of running totals requires the existence of a unique identifier for each record that can be sorted, in this case being used the PurchaseOrderID attribute, though typically could be used the creation date (timestamp) when the record was modified, alone or in combination with other attributes; when used in combination then maybe it makes sense to add a running number in order to apply the above techniques. In Oracle the use of analytic aggregate functions offers less headaches from this perspective, being enough to specify the unique combination of attributes in the ORDER BY clause in the body of the function.
2. SQL Server 2005’s window aggregate functions, even if similar with Oracle’s analytic aggregate functions, unlike the window ranking functions they don’t allow the ORDER BY clause in the functions, therefore running totals can’t be modeled in SQL Server with the help of window aggregate functions.
3. In case the running totals are based on other table, then might be needed to use the OUTER APPLY operator, returning thus also the records for which not match is retrieved in the outer dataset.
Sometimes might be requested to create running totals within a given partition, for example by Vendor, in this case all is needed to do is to add the VendorID in the WHERE or JOIN constraint, while in Oracle’s analytic functions case in the PARTITION BY clause. Here are the three queries modified:
-- SQL Server/Oracle: Running Totals per Vendor (simple join)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, SUM(POT.SubTotal) RunningSubTotal
, SUM(POT.TaxAmt) RunningTaxAmt
, SUM(POT.Freight) RunningFreight
FROM Purchasing.PurchaseOrderHeader POH
LEFT JOIN Purchasing.PurchaseOrderHeader POT
ON POH.VendorID = POT.VendorID
AND POH.PurchaseOrderID >= POT.PurchaseOrderID
GROUP BY POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
ORDER BY POH.VendorID
, POH.PurchaseOrderID
-- SQL Server 2005: Running Totals per Vendor (CROSS APPLY)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, DAT.RunningSubTotal
, DAT.RunningTaxAmt
, DAT.RunningFreight
FROM Purchasing.PurchaseOrderHeader POH
CROSS APPLY ( -- calculating aggregated running total
SELECT SUM(POT.SubTotal) RunningSubTotal
, SUM(POT.TaxAmt) RunningTaxAmt
, SUM(POT.Freight) RunningFreight
FROM Purchasing.PurchaseOrderHeader POT
WHERE POH.VendorID = POT.VendorID
AND POH.PurchaseOrderID >= POT.PurchaseOrderID) DAT
ORDER BY POH.VendorID
, POH.PurchaseOrderID
-- Oracle: Running Totals per Vendor (analytic aggregate function)
SELECT POH.PurchaseOrderID
, POH.VendorID
, POH.OrderDate
, POH.SubTotal
, POH.TaxAmt
, POH.Freight
, SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID ORDER BY POH.PurchaseOrderID ) RunningSubTotal
, SUM(POH.TaxAmt) OVER (PARTITION BY POH.VendorID ORDER BY POH.PurchaseOrderID) RunningTaxAmt
, SUM(POH.Freight) OVER (PARTITION BY POH.VendorID ORDER BY POH.PurchaseOrderID) RunningFreight
FROM PurchaseOrderHeader POH
ORDER BY POH.VendorID
, POH.PurchaseOrderID
A Software Engineer and data professional's blog on SQL, data, databases, data architectures, data management, programming, Software Engineering, Project Management, ERP implementation and other IT related topics.
Subscribe to:
Post Comments (Atom)
About Me
- Adrian
- Koeln, NRW, Germany
- IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.
No comments:
Post a Comment