About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Thursday, March 11, 2010

Oracle vs. SQL Server: Averages – Part I

     For many people working with averages is not a complicated topic at all, and that’s quite true, though as always it depends on how much you want to complicate everything. The average or the arithmetic mean is defined as the middle value of the values within a data set calculated as the sum of the values divided by the number of values:
AVG - Simple Average
    RDBMS vendors provide similar functionality within the AVG aggregate function, and even if in theory the result could be calculated using the SUM and COUNT aggregate functions, as per the above definition, given its importance in statistics, to provide an AVG function seems to be a natural choice. AVG could be used across the whole data set or within a given “partition” determined by the GROUP BY clause.
-- Oracle/SQL Server: Simple Averages
SELECT ProductID
, AVG(UnitPrice) AverageUnitPrice
,
SUM(UnitPrice)/COUNT(1) AverageUnitPriceDef

FROM
Purchasing.PurchaseOrderDetail

GROUP
BY ProductID

 
    For most of the data analysis requirements the simple average function would be enough, though there are cases in which is required to address the sensitivity of values to certain modifiers – for example quantities or the number of lines. For such cases is used the weighted average (weighted arithmetic mean) calculated as the sum of values and quantifiers products, the total being divided by the sum of quantifiers:
AVG - Weighted Average
Note:
     In case the quantifiers are all 1 then the weighted average equates with the simple average, the output being the same also when the values are constant.

      The weighted average is quite a useful tool when performing PO Price Analysis in which often needs to be be considered also the volume of purchased quantities with a given Price Unit:
-- Oracle/SQL Server: Simple/Weighted Averages
SELECT ProductID
,
AVG(UnitPrice) AverageUnitPrice

,
SUM(UnitPrice*OrderQty)/SUM(OrderQty) WeightedAverageUnitPrice

FROM
Purchasing.PurchaseOrderDetail

GROUP
BY ProductID
 

   The actual implementation in a query might differ based on requirements and preferences – as multiple left joins, inline view or correlated query, each of the mentioned approaches coming with their downsides and strengths:
-- Oracle/SQL Server: Averages (inline view)
SELECT ITM.ProductID
,
ITM.Name ProductName

,
ITM.ProductNumber

,
DAT.AverageUnitPrice

,
DAT.WeightedAverageUnitPrice

FROM
Production.Product ITM
LEFT JOIN (—calculated averages

    SELECT POD.ProductID
    , AVG(POD.UnitPrice) AverageUnitPrice
    , SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice
    FROM Purchasing.PurchaseOrderDetail POD
       JOIN Purchasing.PurchaseOrderHeader POH
          ON POD.PurchaseOrderID = POH.PurchaseOrderID
    WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete

   GROUP BY POD.ProductID)DAT
    ON ITM.ProductID = DAT.ProductID

ORDER BY ITM.Name


-- Oracle/SQL Server: Averages (multiple left joins)
SELECT ITM.ProductID
,
ITM.Name ProductName

,
ITM.ProductNumber

,
AVG(POD.UnitPrice) AverageUnitPrice

,
SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice

FROM Production.Product ITM

   LEFT JOIN Purchasing.PurchaseOrderDetail POD
      ON ITM.ProductID = POD.ProductID
   LEFT JOIN Purchasing.PurchaseOrderHeader POH
      ON POD.PurchaseOrderID = POH.PurchaseOrderID
 
WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete

GROUP BY ITM.ProductID
,
ITM.Name

,
ITM.ProductNumber

ORDER BY ITM.Name
 

-- SQL Server 2005: Averages (OUTER APPLY)
SELECT ITM.ProductID
,
ITM.Name ProductName

,
ITM.ProductNumber

,
DAT.AverageUnitPrice

,
DAT.WeightedAverageUnitPrice

FROM
Production.Product ITM
OUTER APPLY ( -- calculated averages

    SELECT AVG(POD.UnitPrice) AverageUnitPrice
    , SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice
    FROM Purchasing.PurchaseOrderDetail POD 

       JOIN Purchasing.PurchaseOrderHeader POH
          ON POD.PurchaseOrderID = POH.PurchaseOrderID 
    WHERE ITM.ProductID = POD.ProductID 
        AND POH.Status IN (2, 4) -- 2-Approved, 4-Complete

) DAT
ORDER
BY ITM.Name


      A different type of approach could be focused on calculating the simple or weighted average based only on the last n PO Lines, for example the last 3 PO Lines:
-- Oracle/SQL Server 2005: last 3 PO average prices
SELECT DAT.ProductID
,
MAX(DAT.RANKING) NumberRecords

,
AVG(DAT.UnitPrice) AveragePrice

,
SUM(DAT.UnitPrice* DAT.OrderQty)/SUM(DAT.OrderQty) WeightedAveragePrice

FROM
(-- ranked PO Prices

    SELECT POD.ProductID
    , POD.UnitPrice
    , POD.OrderQty
    , RANK() OVER(PARTITION BY POD.ProductID ORDER BY POH.OrderDate DESC,   POD.PurchaseOrderDetailID DESC) RANKING
    FROM Purchasing.PurchaseOrderDetail POD
       JOIN Purchasing.PurchaseOrderHeader POH
          ON POD.PurchaseOrderID = POH.PurchaseOrderID
    WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete

) DAT

WHERE
DAT.RANKING BETWEEN 1 AND 3

GROUP BY DAT.ProductID


    The problem could be complicated even more by selecting all the POs corresponding to the last 3 distinct Prices used; not sure if it brings any value but the formulation is perfectly valid and don’t despair if a user comes with such a request :) !

    Occasionally it might be needed to study the evolution of prices over time, the previous problem becoming a simple moving average or weighted moving average problem, in which the average is calculated in report with a moving point on the time scale with respect to the previous k points or concerning the previous and following k points including the current point:
AVG - Simple Moving Average
AVG - Weighted Moving Average
     G. Priester shows in his Calculating Moving Averages with T-SQL article a nice solution for simple moving averages calculation exemplifying it on stock market data. Frankly, given its flexibility, I prefer to use the CROSS APPLY operator in combination with fGetDatesInInterval function introduced in More Date-related Functionality – Part I post, the function returning all the dates falling in a certain interval. Because in case of POs there could be multiple transactions per day for the same Product, the problem is slightly changed, the averages being based within a 31 window:
-- SQL Server 2005: 31 days moving averages
SELECT ITM.ProductNumber
,
ITM.Name ProductName

,
DIN.DateSequence ReportingDate

,
DAT.AverageUnitPrice

,
DAT.WeightedAverageUnitPrice

FROM
dbo.fGetDatesInInterval('2003-01-01', '2003-12-31') DIN
CROSS APPLY (--calculated averages

    SELECT POD.ProductID
    , AVG(POD.UnitPrice) AverageUnitPrice
    , SUM(POD.UnitPrice*OrderQty)/SUM(POD.OrderQty) WeightedAverageUnitPrice
    FROM Purchasing.PurchaseOrderDetail POD
       JOIN Purchasing.PurchaseOrderHeader POH
          ON POD.PurchaseOrderID = POH.PurchaseOrderID
     WHERE DATEDIFF(d, POH.OrderDate, DIN.DateSequence) BETWEEN 0 AND 30 -- 31 days
     --WHERE DATEDIFF(d, POH.OrderDate, DIN.DateSequence) BETWEEN -15 AND 15 -- 31 days

         AND POH.Status IN (2, 4) -- 2-Approved, 4-Complete

GROUP BY POD.ProductID) DAT
LEFT JOIN Production.Product ITM
   ON DAT.ProductID = ITM.ProductID

WHERE
DATEPART(dw, DIN.DateSequence) BETWEEN 2 AND 6 -- Working Days

ORDER
BY ITM.Name

,
DIN.DateSequence


No comments: