15 March 2010

MS Office: Excel for SQL Developers III (Creating a Stored Procedure from Table’s Metadata)

In a previous post I was showing a simple technique for using stored procedures for inserting/updating data within SQL Server 2005+. From a rough estimation I think that about 80% of the stored procedures built for this purpose could use the same template, without including additional business logic. Given the fact that databases store metadata about their objects, table’s metadata could be used to create such a stored procedure using VBA or more complex programming languages. SQL Server 2008 stores its table’s columns metadata in INFORMATION_SCHEMA.COLUMNS view, the required information for the Products created in the previous post could be get using the following query:

-- SQL Server 2008 Table/Column Metadata SELECT TABLE_SCHEMA SchemaName 
, TABLE_NAME TableName 
, COLUMN_NAME ColumnName 
, DATA_TYPE DataType 
, CHARACTER_MAXIMUM_LENGTH Length 
, IS_NULLABLE IsNullable 
, NUMERIC_PRECISION Precision 
, NUMERIC_SCALE NumericScale 
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = 'Products' 
ORDER BY TABLE_NAME 
, ORDINAL_POSITION 

The output of the above query could be easily copy-pasted into Excel or directly exported using the SQL Server Import and Export Wizard, the range with data (without headers) being passed as parameter to GetStoredProcedure macro created for this post together with stored procedure and table’s name.

Excel - Get Stored Procedure 

Macro’s construction is based on the following presumptions:
1.   The first attribute in the target table is always table’s primary key.
2.   The stored procedure considers only simple data types though it can be extended to support further data types.
3.   Stored procedure's name and table's name contains also the schemas in which the respective objects are created.
4.   CreationDate, ModifiedDate, CreatedBy and ModifiedBy are reserved words for tracking the timestamp and the User who created/modified the record.
5.   The UserID  will be the last parameter in case the CreatedBy or ModifiedBy are in scope.

And here’s the macro GetStoredProcedure with its afferent Metadata Enumeration that maps the column order from the above query, and the GetDataType function that translates the metadata related to data type to the actual parameters’ data type: 
 
Enum Metadata 
  enmSchemaName = 1 
  enmTableName = 2 
  enmColumnName = 3 
  enmDataType = 4 
  enmLength = 5 
  enmIsNullable = 6 
  enmPrecision = 7 
  enmNumericScale = 8 
End Enum 

Function GetStoredProcedure(ByVal rng As Range, ByVal ProcedureName As String, ByVal TableName As String) As String 
'builds the code for a stored procedure based on table's metadata 
Dim parameters As String 
Dim columnName As String 
Dim parameterName As String 
Dim columnValue As String 
Dim insert As String 
Dim columns As String 
Dim update As String 
Dim indexRow As Integer 
Dim ID As String 
Dim insertUserID As Boolean 

insertUserID = False 

For index = 1 To rng.Rows.Count 
    If Len(Trim(Cells(index, Metadata.enmTableName).Value)) > 0 And Len(Trim(Cells(index, Metadata.enmColumnName))) Then 
       'translating column names 
       Select Case rng.Cells(index, Metadata.enmColumnName) 
       Case "CreationDate": 
            columnName = "" 
            columnValue = "GetDate()" 
            parameterName = "" 
       Case "ModifiedDate": 
            columnName = "ModifiedDate" 
            columnValue = "GetDate()" 
            parameterName = "" 
       Case "CreatedBy": 
            columnName = "" 
            columnValue = "@UserID" 
            parameterName = "" 
            insertUserID = True 
       Case "ModifiedBy": 
            columnName = Trim(rng.Cells(index, Metadata.enmColumnName)) 
            columnValue = "@UserID" 
            parameterName = "" 
            insertUserID = True 
       Case Else: 
            columnName = Trim(rng.Cells(index, Metadata.enmColumnName)) 
            columnValue = "@" & columnName 
            parameterName = "@" & columnName 
       End Select 
       
       If index = 1 Then 
            ID = columnName 
       Else 
            columns = columns & ", " & Trim(rng.Cells(index, Metadata.enmColumnName)) 
            insert = insert & ", " & columnValue 
            
            If Len(columnName) > 0 Then 
                 update = update & Space(9) & ", " & columnName & " = " & columnValue & "" & vbCrLf 
            End If 
       End If 
       
        If Len(parameterName) > 0 Then 
             parameters = parameters & ", " & parameterName & " " & GetDataType(Trim(rng.Cells(index, Metadata.enmDataType).Value), Trim(rng.Cells(index, Metadata.enmLength).Value), Trim(rng.Cells(index, Metadata.enmPrecision).Value), Trim(rng.Cells(index, Metadata.enmNumericScale).Value)) & vbCrLf 
         End If 
    End If 
Next 

If Len(columns) > 0 Then 
    columns = Right(columns, Len(columns) - 1) 
End If 
If Len(insert) > 0 Then 
    insert = Right(insert, Len(insert) - 1) 
End If 
If Len(update) > 0 Then 
    update = Right(update, Len(update) - 10) 
End If 
If Len(parameters) > 0 Then 
    parameters = Right(parameters, Len(parameters) - 1) & IIf(insertUserID, ", @UserID int" & vbCrLf, "") 
End If 

'building the stored procedure string 
GetStoredProcedure = "CREATE PROCEDURE " & ProcedureName & "(" & vbCrLf & parameters & ")" & vbCrLf & _ 
    "AS" & vbCrLf & _ 
    "BEGIN" & vbCrLf & _ 
    "BEGIN TRY" & vbCrLf & _ 
    "    IF ISNULL(@" & ID & ", 0) = 0" & vbCrLf & _ 
    "    BEGIN" & vbCrLf & _ 
    "    -- insert statement" & vbCrLf & _ 
    "       INSERT INTO " & TableName & " (" & columns & ")" & vbCrLf & _ 
    "       VALUES (" & insert & ")" & vbCrLf & _ 
    "       SET @" & ID & " = @@IDENTITY" & vbCrLf & _ 
    "    END" & vbCrLf & _ 
    "    ELSE" & vbCrLf & _ 
    "    BEGIN" & vbCrLf & _ 
    "        -- update statement" & vbCrLf & _ 
    "        UPDATE " & TableName & vbCrLf & _ 
    "        SET " & update & _ 
    "        WHERE " & ID & "= @" & ID & vbCrLf & _ 
    "    END" & vbCrLf & _ 
    "    SELECT @" & ID & vbCrLf & _ 
    "END TRY" & vbCrLf & _ 
    "BEGIN CATCH" & vbCrLf & _ 
    "   SELECT Cast(ERROR_NUMBER() as varchar(10)) + ':' + ERROR_MESSAGE()" & vbCrLf & _   
    "END CATCH" & vbCrLf & _ 
    "END" 

End Function 

Private Function GetDataType(ByVal DataType As String, ByVal Length As String, ByVal NumericPrecision As String, ByVal NumericScale As String) As String 
'translating data types 
Select Case DataType 
    Case "varchar": GetDataType = DataType & "(" & Length & ")" 
    Case "nvarchar": GetDataType = DataType & "(" & Length & ")" 
    Case "nchar": GetDataType = DataType & "(" & Length & ")" 
    Case "decimal": GetDataType = DataType & "(" & NumericPrecision & "," & NumericScale & ")" 
    Case "numeric": GetDataType = DataType & "(" & NumericPrecision & "," & NumericScale & ")" 
    Case "money": GetDataType = DataType & "(" & NumericPrecision & "," & NumericScale & ")" 
    Case "smallmoney": GetDataType = DataType & "(" & NumericPrecision & "," & NumericScale & ")" 
    Case Else: GetDataType = DataType 
End Select 

End Function 

Note:
As always, there is enough place for improvement, I just tried to exemplify the utility of such a function that could reduce considerably developers’ time for building such stored procedures.

14 March 2010

💎SQL Reloaded: Saving Data With Stored Procedures

    Without going too much into details on the benefits of stored procedures, in this post I will show a simple technique I usually like to use for database updates. It’s not rocket science, I bet many of the developers use similar approaches, and many more don’t! Stored procedures are the ideal place for storing your business logic, including insert/updates, in addition stored procedures being recommended as a way of avoiding SQL injection and facilitate code’s maintenance. Doing inserts/updates in a stored procedure implies sending each attribute that needs to be maintain, and given the fact that, with small exceptions, the same parameters are sent, it makes sense to bring the two statements in the same stored procedure. The difference in the number of parameters resides actually in the Primary Key which is sent in the update case in order to identify uniquely the record that needs to be updated, though this fact could be used to determine whether is dealt with an insert or an update.

   Enough with the talking, so let’s consider the below table in which a unique constrain is defined on ProductNumber attribute: 
 
CREATE TABLE Production.Products( 
ProductID int IDENTITY(1,1) NOT NULL, 
Name nvarchar(250) NOT NULL, 
ProductNumber nvarchar(25) NOT NULL, 
MakeFlag bit NOT NULL, 
FinishedGoodsFlag bit NOT NULL, 
Color nvarchar(15) NULL, 
SafetyStockLevel int NULL, 
ListPrice decimal(13,2) NOT NULL, 
StandardCost decimal(13,2) NOT NULL, 
CreationDate smalldatetime NOT NULL, 
CreatedBy int NOT NULL, 
ModifiedDate smalldatetime NOT NULL, 
ModifiedBy int NOT NULL) 

    The table is based on Production.Product table from AdventureWorks database, however I reduced the number of attributes and simplified table’s definition. I introduced also the Creation/Modified Dates used as timestamps for the creation/modification of a record, and the Created/Modified By as a reference to the User that performed the respective actions. From all these 4 attributes is enough to send only the User ID, the actual timestamp being obtained using the GetDate function in the background, so we’ll have in the end 13-3= 10 parameters. Here is the stored procedure: 
 
CREATE PROCEDURE dbo.pUpdateProducts( 
  @ProductID int 
, @Name nvarchar(250) 
, @ProductNumber nvarchar(25) 
, @MakeFlag bit 
, @FinishedGoodsFlag bit 
, @Color nvarchar(15) 
, @SafetyStockLevel int 
, @ListPrice decimal(13,2) 
, @StandardCost decimal(13,2) 
, @StartDate smalldatetime 
, @EndDate smalldatetime 
, @UserID int) 
AS 
BEGIN 
BEGIN TRY 
    IF ISNULL(@ProductID, 0) = 0 
    BEGIN 
        -- insert statement 
        INSERT INTO Production.Products (Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color , SafetyStockLevel 
, ListPrice, StandardCost, StartDate, EndDate, CreationDate, CreatedBy, ModifiedDate, ModifiedBy) 
        VALUES ( @Name, @ProductNumber, IsNull(@MakeFlag, 0), IsNull(@FinishedGoodsFlag, 0), @Color , @SafetyStockLevel 
, @ListPrice, @StandardCost, @StartDate, @EndDate, GETDATE(), @UserID, GETDATE(), @UserID) 
        SET @ProductID = @@IDENTITY 
     END 
     ELSE 
     BEGIN 
          -- update statement 
          UPDATE Production.Products 
          SET Name = @Name 
         , ProductNumber = @ProductNumber 
         , MakeFlag = IsNull(@MakeFlag, 0)  
         , FinishedGoodsFlag = IsNull(@FinishedGoodsFlag, 0) 
         , Color = @Color 
         , SafetyStockLevel = @SafetyStockLevel 
         , ListPrice = @ListPrice 
         , StandardCost = @StandardCost 
        , StartDate = @StartDate 
        , EndDate = @EndDate 
        , ModifiedDate = GETDATE() 
        , ModifiedBy = @UserID 
        WHERE ProductID = @ProductID 
     END 
     SELECT @ProductID 
END TRY 
BEGIN CATCH 
     SELECT Cast(ERROR_NUMBER() as varchar(10)) + ':' + ERROR_MESSAGE() 
END CATCH 
END 

    The TRY… CATCH… block introduced with SQL Server 2005 allows trapping the errors and thus returns error details to the calling application. In case the DML statement succeeded is returned the Primary Key of the just updated, respectively inserted record. Let’s see the stored procedure in action: 

--inserting new records    
EXEC dbo.pUpdateProducts 0, 'Product 1', 'Product1', 1, 1, 'Blue', 100, 50.32 , 44.06, '2010-01-01', NULL, 1 -- Example 1 
EXEC dbo.pUpdateProducts 0, 'Product 2', 'Product2', 0, 1, 'Red', 100, 51.32 , 44.45, '2010-01-01', NULL, 1 -- Example 2 
EXEC dbo.pUpdateProducts 0, 'Product 3', 'Product3', 1, 0, 'Gren', 100, 59.32 , 44.87, '2010-01-01', NULL, 1 -- Example 3 
EXEC dbo.pUpdateProducts 0, 'Product 4', 'Product4', 0, 0, 'Red', 100, 57.32 , 44.78, '2010-01-01', NULL, 1 -- Example 4 


    After running the above EXEC statements try to run them again, each of them will error out with the following error details: 2601:Cannot insert duplicate key row in object 'Production.Products' with unique index 'IX_Products_UniqueProductNumber'. It will be in developer’s attribution to translate the error to a more appealing error message.

    In order to update a record all is needed to do is to transmit the ID of the record to be updated and the new values. For example in a new created table the record created by first example will more likely correspond to ProductID = 1, thus the following statement is used to change its values: 
 
-- updating a record 
EXEC dbo.pUpdateProducts 1, 'Product 1', 'Product1', 1, 1, 'White', 100, 50.32 , 44.06, '2010-01-01', NULL, 1 -- Example 1 


Note:
    This technique has the inconvenience that it doesn’t consider the cases in which two different Users update the same record, though that’s a topic for another post.

13 March 2010

MS Office: Excel for SQL Developers II (Insert Statements)

    From time to time is required to load a bulk of data into a table, and even if SQL Server and Oracle provides tools for this purpose, there are also occasions when this needs to be done with statements, an insert statement needing to be prepared for each record from the dataset. It sounds like a task that could be easily done in Excel, isn’t it? For this purpose in order to concatenate the values on a row I will use the GetRangeAsList macro defined in the previous post on the same topic. Considering the dataset from the below screenshot and that the headers match attributes’ name from the destination table (see table definition below) the formula that will allow me to create the multiple insert statements would look like:
 
= "INSERT INTO Production.ProductsTest (" & SUBSTITUTE(GetRangeAsList($A$1:$F$1), "'", "") & ")" & CHAR(13) & "VALUES (" & GetRangeAsList(A2:F2) & ")" 

Excel - Insert statement

After stripping off the output of the double quotes the output would like below: 

INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
VALUES( 'Road-150 Red, 62', 'BK-R93R-62', '1', '1', 'Red', '100') 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
VALUES( 'Road-150 Red, 44', 'BK-R93R-44', '1', '1', 'Red', '100') 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
VALUES( 'Road-150 Red, 48', 'BK-R93R-48', '1', '1', 'Red', '100') 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
VALUES( 'Road-150 Red, 52', 'BK-R93R-52', '1', '1', 'Red', '100') 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
VALUES( 'Road-150 Red, 56', 'BK-R93R-56', '1', '1', 'Red', '100') 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
VALUES( 'Road-450 Red, 58', 'BK-R68R-58', '1', '1', 'Red', '100') 

    A much simpler approach is based on the use of a UNION ALL to join the records, for this purpose could be used the = " SELECT " & GetRangeAsList(A2:F2) & " UNION ALL" formula, following to strip off the last UNION ALL and eventually add the INSERT INTO statement manually: 
 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) 
SELECT 'Road-150 Red, 62', 'BK-R93R-62', '1', '1', 'Red', '100' UNION ALL 
SELECT 'Road-150 Red, 44', 'BK-R93R-44', '1', '1', 'Red', '100' UNION ALL 
SELECT 'Road-150 Red, 48', 'BK-R93R-48', '1', '1', 'Red', '100' UNION ALL 
SELECT 'Road-150 Red, 52', 'BK-R93R-52', '1', '1', 'Red', '100' UNION ALL 
SELECT 'Road-150 Red, 56', 'BK-R93R-56', '1', '1', 'Red', '100' UNION ALL 
SELECT 'Road-450 Red, 58', 'BK-R68R-58', '1', '1', 'Red', '100' 

    SQL Server 2008 allows to insert multiple records within a single statements, for this purpose the last formula could be changed to = " (" & GetRangeAsList(A2:F2) & " ) ,", following to just add the INSERT INTO statement manually: 
 
INSERT INTO Production.ProductsTest ( Name, ProductNumber, MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel) VALUES 
( 'Road-150 Red, 62', 'BK-R93R-62', '1', '1', 'Red', '100' ) , 
( 'Road-150 Red, 44', 'BK-R93R-44', '1', '1', 'Red', '100' ) , 
( 'Road-150 Red, 48', 'BK-R93R-48', '1', '1', 'Red', '100' ) , 
( 'Road-150 Red, 52', 'BK-R93R-52', '1', '1', 'Red', '100' ) , 
( 'Road-150 Red, 56', 'BK-R93R-56', '1', '1', 'Red', '100' ) , 
( 'Road-450 Red, 58', 'BK-R68R-58', '1', '1', 'Red', '100' ) 

   Just in case you want to test the statements here is Production.ProductsTest table’s definition:

CREATE TABLE [Production].[ProductsTest]( 
[ProductID] [int]IDENTITY(1,1) NOT NULL, 
[Name] [nvarchar](50) NOT NULL, 
[ProductNumber] [nvarchar](25) NOT NULL, 
[MakeFlag] [bit]NOT NULL, 
[FinishedGoodsFlag] [bit]NOT NULL, 
[Color] [nvarchar](15) NULL, 
[SafetyStockLevel] [smallint] NOT NULL) 

Note:
    Of course, also for the above 3 cases could be written macros that produces the same output taking a range as parameter, however the formulas are fairly simple to use, the only thing that needs to be changed are the ranges and, in the first example, table’s name.

MS Office: Excel for SQL Developers I (Range to Delimited List of Values)

    For many SQL developers Excel could prove to be a useful tool in what concerns not only data analysis and manipulation but also in using its functionality for handling repetitive tasks. One of the simplest such examples I may thing of is the formatting of a list of values for use within a query.

    Typically users are requesting to provide a report based on a list of values provided in Excel, for example a list of Products. If the list is quite small, let's say 5-10 products, list's transformation to be used in the query is not quite a big deal. On the other side when dealing with hundreds or thousand of records the story totally changes. In all cases what I do is to select first the distinct list of values and copy them eventually in a new sheet where I use a formula like the one in the screenshot and apply it to the whole data set. 
 
Excel - List of values formula 
 
Notes:
1.    Within the = ", '" & TRIM(A2) & "'" formula the Trim function helps to remove the additional spaces.
2.    Please note that when the values from Excel were copied beforehand from other applications (e.g. Web pages) it could happen that additional formatting is stored resulting in unexpected behavior. What it helps to solve this issue is to copy paste the values in another sheet using the Paste Special feature, use Data/Text To Columns functionality for each column involved or export the data to a comma delimited file and re-import it in another Excel sheet.

     Now all I have to do is to copy the transformed output into my SQL Editor of choice and use it in a query.

SELECT * -- specify explicitly the attributes! 
FROM Production.Product 
WHERE ProductNumber IN ( 
'BK-R93R-62' , 'BK-R93R-44' 
, 'BK-R93R-48' 
, 'BK-R93R-52' 
, 'BK-R93R-56' 
, 'BK-R68R-58' 
, 'BK-R68R-60' 
, 'BK-R68R-44' 
, 'BK-R68R-48' 
, 'BK-R68R-52' 
, 'BK-R50R-58' 
, 'BK-R50R-60' 
, 'BK-R50R-62' 
, 'BK-R50R-44' 
, 'BK-R50R-48' 
, 'BK-R50R-52' 
, 'BK-R50B-58' 
, 'BK-R50B-60' 
, 'BK-R50B-62' 
, 'BK-R50B-44') 

    An even much easier approach is to create a simple macro that takes as input the range I would like to extract the information from and transform the input directly into a comma delimited list of values. Here’s the function used for this purpose: 
 
Function GetRangeAsList(ByVal rng As Range) As String 
Dim result As String 
Dim cell As Range 
For Each cell In rng 
    If Len(Trim(cell.Value)) > 0 Then 
       result = result & ", '" & Trim(cell.Value) & "'" 
    End If 
Next 
GetRangeAsList = IIf(Len(result) > 0, Right(result, Len(result) - 1), result) 
End Function 

   Now all I have to do in the Excel sheet with my data is to type the following formula =GetRangeAsList(A2:A21) in an empty cell out of existing dataset, function’s output being similar to the previous used formula, however the list of values is more compact:
'BK-R93R-62', 'BK-R93R-44', 'BK-R93R-48', 'BK-R93R-52', 'BK-R93R-56', 'BK-R68R-58', 'BK-R68R-60', 'BK-R68R-44', 'BK-R68R-48', 'BK-R68R-52', 'BK-R50R-58', 'BK-R50R-60', 'BK-R50R-62', 'BK-R50R-44', 'BK-R50R-48', 'BK-R50R-52', 'BK-R50B-58', 'BK-R50B-60', 'BK-R50B-62', 'BK-R50B-44'

Note:
  The function could be easily changed to provide the delimiter as parameter too, though I haven’t met many occasions when other delimiter was required.

🎡SSIS: Percentage Sampling Data Flow Transformation

    One of the problems relatively difficult to address in the past was the random sampling within a SQL Server data set. Of course there is always Excel that could be used for this purpose just with a few tricks, though what do you do when working with huge source dataset? As we will see below SSIS 2008’s Percentage Sampling Data Flow Transformation allows easily addressing this problem.

    Using the template SSIS package defined in Third Magic Class post, copy paste the Package.dtsx in the project and rename it (e.g. Package Percentage Sampling.dtsx), and from Toolbox add an Percentage Sampling Transformation and link it to the OLE DB Source. Access the Percentage Sampling Editor in which modify the Percentage of rows value from 10 to 50. It doesn’t really makes sense to rename the sample and unselected outputs, though you might need to do that when dealing with multiple Percentage Sampling Transformations.

SSIS - Percentage Sampling Transformation Editor
Note:
    The percentage of rows you’d like to work with depends entirely upon request, in many cases it’s indicated to determine statistically the size of your sample. Given the fact that the number of records in this example is quite small I preferred to use a medium dataset size.

    Link the Aggregate Transformation to the OLE DB Destination and in the Input Output Selection dialog select as Output the ‘Sampling Selected Output’, while in the OLE DB Destination Editor create a new table (e.g. Production.BikesSample). 
 
SSIS - Input Output Selection

   In the last step, before testing the whole package, in Control Flow tab change the Execute SQL Task’s SQLStatement property to point to the current destination table: 
 
TRUNCATE TABLE [Production].[BikesSample]  

  
SSIS - Percentage Sampling Data Flow

   Save the project, test (debug) the package (twice) and don’t forget to validate the output data: 
  
SELECT * 
FROM [Production].[BikesSample]   
 

Note:
   I was actually expecting to have 48 or 49 records (97:2=48.5) in the output and not 45, I wonder from where comes the difference?! That’s a topic I still have to investigate. I tried also to change the percentage of rows to 25 resulting an output of 23 of records (23*4=92), 75 resulting an output of 74 records, respectively 100, all the records being this time selected. At least the algorithm used by Microsoft partitions the output in complementary datasets.

🎡SSIS: Aggregate Data Flow Transformation

    At a first look the SSIS Aggregate Data Flow Transformation doesn’t seem to be so useful given the fact that the same functionality could be easily obtained with an aggregate query, which actually has more flexibility in what concerns data manipulation, however must be not forgotten that the power of transformations reside in the way they are combined in order to solve a problem and not in their isolated use. On the other side in order to master the basics it makes sense to create simple examples that include a minimum of information/tasks. It’s also the case of this post, for exemplification I will use the following query based on AdventureWorks’ Purchase Orders Header/Detail tables:

SELECT POD.PurchaseOrderDetailID 
, POD.PurchaseOrderID 
, POD.ProductID 
, POD.OrderQty 
, POD.ReceivedQty 
, POD.UnitPrice 
, POD.OrderQty * POD.UnitPrice OrderValue 
, POD.ReceivedQty * POD.UnitPrice ReceiptValue 
, POH.OrderDate 
, POD.DueDate 
FROM Purchasing.PurchaseOrderDetail POD 
   JOIN Purchasing.PurchaseOrderHeader POH 
      ON POD.PurchaseOrderID = POH.PurchaseOrderID 
WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 

   Using the template SSIS package defined in Third Magic Class post, copy paste the Package.dtsx in the project and rename it (e.g. Package Aggregate.dtsx), then in Data Flow Task rename the Data Source (e.g. OLE DB Source POs) and change the SQL command text using the above query. From Toolbox add an Aggregate Transformation and link it to the OLE DB Source and access the Aggregate Transformation Editor in which, if everything went swell until now, it will appear a “table” based on the above query and check in a first phase the following attributes: ProductID, OrderQty, ReceivedQty, OrderValue and ReceiptValue. 

    Once an attribute is checked in the table, it will appear also as Input Column in the Editor, the same name being used also for Output Alias. For ProductID the Editor chosen the ‘Group by’ as Operation and ‘Sum’ for OrderValue. Change the Operation as Sum for the other three attributes like in the below screenshot. Add also the Order Date and once more the ProductID as Input Columns, for the first choose ‘Maximum’ as Operation while for the second the ‘Count’ in order to get the Last OrderDate, respectively the Number of Records. 
 
SSIS - Aggregate Transformation Editor

     Once the above changes completed, delete the existing OLE DB Destination and add another one, link the Aggregate Transformation to it, create a new table (e.g. Purchasing.PurchaseOrderProductAggregation) based on Aggregate’s input, accept the mappings and test the Data Flow Task: 
 
SSIS - Aggregate Data Flow Task

     In the last step, before testing the whole package, in Control Flow tab change the Execute SQL Task’s SQLStatement property to point to the current destination table:

TRUNCATE TABLE [Purchasing].[PurchaseOrderProductAggregation]   

   Save the project, test (debug) the package (twice) and don’t forget to validate the output data: 
 
SELECT * 
FROM [Purchasing].[PurchaseOrderProductAggregation]    

     The output from the above table should be actually the same with the output’s of the following aggregated query:

-- PO Product Aggregation SELECT POD.ProductID 
, SUM(POD.OrderQty) OrderQty 
, SUM(POD.ReceivedQty) ReceivedQty 
, SUM(POD.OrderQty * POD.UnitPrice) OrderValue 
, SUM(POD.ReceivedQty * POD.UnitPrice) ReceiptValue 
, Max(POH.OrderDate) LastOrderDate 
, COUNT(1) NumberRecords 
FROM Purchasing.PurchaseOrderDetail POD 
   JOIN Purchasing.PurchaseOrderHeader POH 
      ON POD.PurchaseOrderID = POH.PurchaseOrderID 
WHERE POH.Status IN (2, 4) -- 2-Approved, 4-Complete 
GROUP BY POD.ProductID 
ORDER BY POD.ProductID

Happy Coding!

12 March 2010

🕋Data Warehousing: Pivot/Unpivot (Definitions)

"To rotate rows to columns, and columns to rows, in a cross-tabular data browser. Also refers to choosing dimensions from the set of available dimensions in a multidimensional data structure for display in the rows and columns of a cross-tabular structure." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The process of rotating the view of data. For example, viewing what was the x-axis in the y-axis’s position and vice versa." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The UNPIVOT operator is used within a SELECT statement to create a normalized data report from data that is stored as a spreadsheet." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"The PIVOT operator is used within a SELECT statement. It is used to create cross-tab reports (similar to a spreadsheet) from normalized data." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"The act of rotating rows to columns, and columns to rows." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

[unpivot:] "To expand values from multiple columns in a single record into multiple records with the same values in a single column." (Microsoft, "SQL Server 2012 Glossary", 2012)

"To rotate a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and perform aggregations where they are required on any remaining column values that are wanted in the final output." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In the Lean start-up world, a pivot is a structured, often rapid, course correction on the basis of new market, customer, and development information." (Pamela Schure & Brian Lawley, "Product Management For Dummies", 2017)

 "1. To rotate rows to columns, and columns to rows, in a cross-tabular data browser. 2. To choose dimensions from the set of available dimensions in a multidimensional data structure for display in the rows and columns of a cross-tabular structure." (Microsoft Technet)

[unpivot:] "In Integration Services, the process of creating a more normalized dataset by expanding data columns in a single record into multiple records." (Microsoft Technet)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.