13 March 2010

SSIS: Aggregate Data Flow Transformation

    At a first look the SSIS Aggregate Data Flow Transformation doesn’t seem to be so useful given the fact that the same functionality could be easily obtained with an aggregate query, which actually has more flexibility in what concerns data manipulation, however must be not forgotten that the power of transformations reside in the way they are combined in order to solve a problem and not in their isolated use. On the other side in order to master the basics it makes sense to create simple examples that include a minimum of information/tasks. It’s also the case of this post, for exemplification I will use the following query based on AdventureWorks’ Purchase Orders Header/Detail tables:
SELECT POD.PurchaseOrderDetailID
,
POD.PurchaseOrderID

,
POD.ProductID

,
POD.OrderQty

,
POD.ReceivedQty

,
POD.UnitPrice

,
POD.OrderQty * POD.UnitPrice OrderValue

,
POD.ReceivedQty * POD.UnitPrice ReceiptValue

,
POH.OrderDate

,
POD.DueDate

FROM
Purchasing.PurchaseOrderDetail POD
   JOIN Purchasing.PurchaseOrderHeader POH
      ON POD.PurchaseOrderID = POH.PurchaseOrderID

WHERE
POH.Status IN (2, 4) -- 2-Approved, 4-Complete


     Using the template SSIS package defined in Third Magic Class post, copy paste the Package.dtsx in the project and rename it (e.g. Package Aggregate.dtsx), then in Data Flow Task rename the Data Source (e.g. OLE DB Source POs) and change the SQL command text using the above query. From Toolbox add an Aggregate Transformation and link it to the OLE DB Source and access the Aggregate Transformation Editor in which, if everything went swell until now, it will appear a “table” based on the above query and check in a first phase the following attributes: ProductID, OrderQty, ReceivedQty, OrderValue and ReceiptValue. Once an attribute is checked in the table, it will appear also as Input Column in the Editor, the same name being used also for Output Alias. For ProductID the Editor chosen the ‘Group by’ as Operation and ‘Sum’ for OrderValue. Change the Operation as Sum for the other three attributes like in the below screenshot. Add also the Order Date and once more the ProductID as Input Columns, for the first choose ‘Maximum’ as Operation while for the second the ‘Count’ in order to get the Last OrderDate, respectively the Number of Records.
SSIS - Aggregate Transformation Editor

     Once the above changes completed, delete the existing OLE DB Destination and add another one, link the Aggregate Transformation to it, create a new table (e.g. Purchasing.PurchaseOrderProductAggregation) based on Aggregate’s input, accept the mappings and test the Data Flow Task:
SSIS - Aggregate Data Flow Task

     In the last step, before testing the whole package, in Control Flow tab change the Execute SQL Task’s SQLStatement property to point to the current destination table:
TRUNCATE TABLE [Purchasing].[PurchaseOrderProductAggregation]  

   Save the project, test (debug) the package (twice) and don’t forget to validate the output data:
SELECT *
FROM
[Purchasing].[PurchaseOrderProductAggregation]  

    The output from the above table should be actually the same with the output’s of the following aggregated query:
-- PO Product Aggregation SELECT POD.ProductID
,
SUM(POD.OrderQty) OrderQty

,
SUM(POD.ReceivedQty) ReceivedQty

,
SUM(POD.OrderQty * POD.UnitPrice) OrderValue

,
SUM(POD.ReceivedQty * POD.UnitPrice) ReceiptValue

,
Max(POH.OrderDate) LastOrderDate

,
COUNT(1) NumberRecords

FROM
Purchasing.PurchaseOrderDetail POD
   JOIN Purchasing.PurchaseOrderHeader POH
      ON POD.PurchaseOrderID = POH.PurchaseOrderID

WHERE
POH.Status IN (2, 4) -- 2-Approved, 4-Complete

GROUP
BY POD.ProductID

ORDER
BY POD.ProductID

No comments: