14 February 2010

💎SQL Reloaded: Oracle vs. SQL Server (Date Operations)

Difference between two dates 
 
    Often is needed to calculate the difference in days (or in any other time unit) between two dates, for example in order to determine the cycle/lead time, further analysis being done based on such data. In SQL Server the difference between two dates can be obtained using the DateDiff function which allows specifying the time unit used. Here’s the use for most important time units: 

-- SQL Server Date Difference 
SELECT ProductID, SellStartDate, SellEndDate, GetDate() CurrentDate , 
DATEDIFF(ss, SellStartDate, IsNull(SellEndDate, GetDate())) NumberSeconds , 
DATEDIFF(mi, SellStartDate, IsNull(SellEndDate, GetDate())) NumberMinutes , 
DATEDIFF(hh, SellStartDate, IsNull(SellEndDate, GetDate())) NumberHours , 
DATEDIFF(d, SellStartDate, IsNull(SellEndDate, GetDate())) NumberDays , 
DATEDIFF(wk, SellStartDate, IsNull(SellEndDate, GetDate())) NumberWeeks , 
DATEDIFF(mm, SellStartDate, IsNull(SellEndDate, GetDate())) NumberMonths , 
DATEDIFF(qq, SellStartDate, IsNull(SellEndDate, GetDate())) NumberQuarters , 
DATEDIFF(yyyy, SellStartDate, IsNull(SellEndDate, GetDate())) NumberYears  
FROM Production.Product   

    In Oracle the difference in days could be obtained using the subtraction operator “-”, therefore in order to get the difference in smaller time units the result should be multiplied 24 for hours, 24*60 for minutes, respectively 24*60*60 for seconds, and divided by 7, 30, 3*30 or 365 in order to approximate the difference in weeks, months, quarters, respectively years. 

-- Oracle Date Difference 
SELECT ProductID, SellStartDate, SellEndDate, SYSDATE , 
(NVL(SellEndDate, SYSDATE)- SellStartDate) * 24 * 60 * 60 NumberSeconds , 
(NVL(SellEndDate, SYSDATE)- SellStartDate) * 24 * 60 NumberMinutes , 
(NVL(SellEndDate, SYSDATE)- SellStartDate) * 24 NumberHours , 
NVL(SellEndDate, SYSDATE)- SellStartDate NumberDays , 
(NVL(SellEndDate, SYSDATE)- SellStartDate)/7 NumberWeeks , 
(NVL(SellEndDate, SYSDATE)- SellStartDate)/30 NumberMonths , 
(NVL(SellEndDate, SYSDATE)- SellStartDate)/(30*3) NumberQuarters , 
(NVL(SellEndDate, SYSDATE)- SellStartDate)/365 NumberYears  
FROM SQLServer.PRODUCT     

    Unfortunately the above approach won’t work if you want to use the result with numeric-based functions like Floor, because the difference is an interval and not a numeric value. If you attempt to use directly the Floor function you’ll get the “ORA-00932: inconsistent datatypes: expected NUMBER got INTERVAL DAY TO SECOND”. In order to overcome this issue the dates need to be truncated, the lowest time unit for this being the minutes. Here’s the modified query: 
-- Oracle Date Difference using truncated values 
SELECT ProductID, SellStartDate, SellEndDate, SYSDATE , 
(TRUNC(NVL(SellEndDate, SYSDATE), 'MI')- TRUNC(SellStartDate, 'MI')) * 24 * 60 * 60 NumberSeconds , 
(TRUNC(NVL(SellEndDate, SYSDATE), 'MI')- TRUNC(SellStartDate, 'MI')) * 24 * 60 NumberMinutes , 
TRUNC((TRUNC(NVL(SellEndDate, SYSDATE), 'HH')- TRUNC(SellStartDate, 'HH')) * 24) NumberHours , 
TRUNC(NVL(SellEndDate, SYSDATE))- TRUNC(SellStartDate) NumberDays , 
FLOOR((TRUNC(NVL(SellEndDate, SYSDATE), 'IW')- TRUNC(SellStartDate, 'IW'))/7) NumberWeeks , 
FLOOR((TRUNC(NVL(SellEndDate, SYSDATE), 'MM')- TRUNC(SellStartDate, 'MM'))/30) NumberMonths , 
FLOOR((TRUNC(NVL(SellEndDate, SYSDATE), 'Q')- TRUNC(SellStartDate, 'Q'))/(30*3)) NumberQuarters , 
FLOOR((TRUNC(NVL(SellEndDate, SYSDATE), 'YYYY')- TRUNC(SellStartDate, 'YYYY'))/365) NumberYears FROM  SQLServer.PRODUCT   

    Up until weeks the differences between the two queries are minimal, the important differences resulting for months, quarters and years. The logic used in the Oracle query could be encapsulated in a date function like the below one:
CREATE OR REPLACE FUNCTION DateDiff ( datepart varchar2 , 
startdate date , 
enddate date) RETURN 
NUMBER IS datepart1 varchar2 (4); 
BEGIN 
      datepart1:= CASE WHEN datepart = 'SS' THEN 'MI' ELSE datepart END;
RETURN FLOOR((TRUNC(enddate, datepart1)- TRUNC(startdate, datepart1)) 

* CASE datepart

    WHEN 'SS' THEN 24 * 60 * 60 
   WHEN 'MI' THEN 24 * 60 
   WHEN 'HH' THEN 24 
   WHEN 'DD' THEN 1 
   WHEN 'IW' THEN 1/7 
   WHEN 'MM' THEN 1/30 
   WHEN 'Q' THEN 1/(30*3)

   WHEN 'YYYY' THEN 1/365 
   ELSE 1 
END); END;  
  
    Here’s the above Oracle query rewritten using the Oracle DateDiff function: 
-- Oracle Date Difference 
SELECT  ProductID, SellStartDate, SellEndDate, SYSDATE , 
DateDiff('SS', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberSeconds , 
DateDiff('MI', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberMinutes , 
DateDiff('HH', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberHours , 
DateDiff('DD', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberDays , 
DateDiff('IW', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberWeeks , 
DateDiff('MM', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberMonths , 
DateDiff('Q', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberQuarters , 
DateDiff('YYYY', SellStartDate, NVL(SellEndDate, SYSDATE)) NumberYears 
FROM SQLServer.PRODUCT 

Adding Time Units
  
  Oracle is using the add/subtract operators also in order to add/subtract time units from a Date, while SQL Server provides the DateAdd built-in function for the same type of operation. 

-- SQL Server adding time unit 
SELECT GetDate() CurrentDate , 
DateAdd(ss, 1, GetDate()) AddingSecond , 
DateAdd(mi, 1, GetDate()) AddingMinute , 
DateAdd(hh, 1, GetDate()) AddingHour , 
DateAdd(dd, 1, GetDate()) AddingDay , 
DateAdd(wk, 1, GetDate()) AddingWeek , 
DateAdd(mm, 1, GetDate()) AddingMonth , 
DateAdd(q, 1, GetDate()) AddingQuarter , 
DateAdd(yyyy, 1, GetDate()) AddingYear 

-- Oracle adding time unit 
SELECT to_char(SYSDATE, 'dd-mon-yyyy hh24:mi:ss') CurrentDate , 
to_char(SYSDATE + 1/(24*60*60), 'dd-mon-yyyy hh24:mi:ss') AddingSecond , 
to_char(SYSDATE + 1/(24*60), 'dd-mon-yyyy hh24:mi:ss') AddingMinute , 
to_char(SYSDATE + 1/24, 'dd-mon-yyyy hh24:mi:ss') AddingHour , 
to_char(SYSDATE + 1, 'dd-mon-yyyy hh24:mi:ss') AddingDay , 
to_char(SYSDATE + 7, 'dd-mon-yyyy hh24:mi:ss') AddingWeek , 
to_char(add_months(SYSDATE, 1), 'dd-mon-yyyy hh24:mi:ss') AddingMonth , 
to_char(add_months(SYSDATE, 3), 'dd-mon-yyyy hh24:mi:ss') AddingQuarter , 
to_char(add_months(SYSDATE, 12), 'dd-mon-yyyy hh24:mi:ss') AddingYear 
FROM DUAL    

    In order to provide similar functionality with SQL Server, could be created a DateAdd function also in Oracle: 
--Oracle DateAdd implementation 
CREATE OR REPLACE FUNCTION DateAdd ( 
 datepart varchar2 , 
unit number , 
targetdate date) RETURN 
date IS     resultdate  date; 
BEGIN 
     IF datepart IN ('MM', 'Q', 'YYYY') THEN resultdate  := CASE datepart  
           WHEN 'MM' THEN add_months(targetdate, unit)
         WHEN 'Q' THEN add_months(targetdate, unit*3) 

         ELSE add_months(targetdate, unit*12) 
   END; 
  ELSE resultdate  := targetdate + unit * CASE datepart 

     WHEN 'SS' THEN 1/ (24 * 60 * 60) 
     WHEN 'MI' THEN 1/ (24 * 60) 
     WHEN 'HH' THEN 1 / 24 
     WHEN 'DD' THEN 1 
     WHEN 'IW' THEN 7 
      ELSE 1 
END; 
END IF; 
RETURN resultdate; 
END; 

     Here’s the above Oracle query rewritten using the Oracle DateAdd function: 
-- Oracle using DateAdd function 
SELECT to_char(SYSDATE, 'dd-mon-yyyy hh24:mi:ss') CurrentDate , 
to_char(DateAdd('SS', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingSecond , 
to_char(DateAdd('MI', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingMinute , 
to_char(DateAdd('HH', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingHour , 
to_char(DateAdd('DD', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingDay , 
to_char(DateAdd('IW', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingWeek , 
to_char(DateAdd('MM', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingMonth , 
to_char(DateAdd('Q', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingQuarter , 
to_char(DateAdd('YYYY', 1, SYSDATE), 'dd-mon-yyyy hh24:mi:ss') AddingYear 
FROM DUAL      

    Of course, in order to subtract a time unit then a negative value needs to be provided for the second parameter.

12 February 2010

🕋Data Warehousing: Operational Data Store (Definitions)

"The operational data store is subject-oriented and contains current, integrated, consistent data that reflects the current state of its subject. Operational data stores are similar to enterprise data warehouses in that they may include data from different systems that has been made consistent. Operational data stores are different from enterprise data warehouses in that they are updated frequently to reflect the current state of the operational systems." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A physical set of tables sitting between the operational systems and the data warehouse or a specially administered hot partition of the data warehouse itself. The main reason for an ODS is to provide immediate reporting of operational results if neither the operational system nor the regular data warehouse can provide satisfactory access. Because an ODS is necessarily an extract of the operational data, it also may play the role of source for the data warehouse." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"The operational data store is a subject-oriented, integrated, current, volatile collection of data used to support the operational and tactical decision-making process for the enterprise. It is the central point of data integration for business management, delivering a common view of enterprise data." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A hybrid structure designed to support both operational transaction processing and analytical processing." (William H Inmon, "Building the Data Warehouse", 2005)

"A collection of data from operational systems, most often integrated together, that is used for some operational purpose. The most critical characteristic here is that this is used for some operational function. This operational dependency takes precedence and the ODS should not be considered a central component of the data warehousing environment. An ODS can be a clean, integrated source of data to be pulled into the data warehousing environment." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A database designed to integrate data from multiple sources to facilitate operations. This is as opposed to a data warehouse, which integrates data from multiple sources to facilitate reporting and analysis." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A database that is subject-oriented, read-only to end users, current (non-historical), volatile, and integrated; is separate from and derived from one or more systems of record; and supports day-today business operations and real-time decision making." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A DB system designed to integrate data from multiple sources to allow operational access to the data for operational reporting." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Database for transaction processing systems that uses data warehouse concepts to provide clean data." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"A database designed to integrate data from multiple sources for additional operations on the data." (Craig S Mullins, "Database Administration", 2012)

"A data store that provides data from the original source in near real time." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"ODS is the decision support database that integrated operational data from multiple source systems used to capture operational data and is used primarily for near real time operational reporting and analytics. ODSs are used to measure the operations processes efficiencies. The integration pattern is at the lowest levels of granularity and can happen from near real-time to multiple times in a day." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"A data store that integrates data from a range of sources, which is subsequently merged and cleaned to serve as the foundation for enterprise operational reporting. It is an important piece of an Enterprise Data Warehouse (EDW) used for enterprise analytical reporting." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"An ODS system integrates operational or transactional data from multiple systems to support operational reporting." (John D Kelleher & Brendan Tierney, "Data science", 2018)

"An operational data store (ODS) is an alternative to having operational decision support system (DSS) applications access data directly from the database that supports transaction processing (TP). While both require a significant amount of planning, the ODS tends to focus on the operational requirements of a particular business process (for example, customer service), and on the need to allow updates and propagate those updates back to the source operational system from which the data elements were obtained. The data warehouse, on the other hand, provides an architecture for decision makers to access data to perform strategic analysis, which often involves historical and cross-functional data and the need to support many applications." (Gartner)

💎SQL Reloaded: Ways of Looking at Data V (Hierarchies)


While commenting on a post on User Defined Functions from SQL Server Programmers Blog I realized that I could use a multi-statement table-valued function in order to simulate recursive processing for ‘Identifying the top-most parent’ problem presented in Reports - Ways of Looking at Data – Part IV: Hierarchies. Some years back, I needed to create the BOM structure for a report based on IFS-iV ERP system, the data being stored in a SQL Server 2000-based data warehouse. Then, given the fact that common table expressions were introduced only on SQL Server 2005, I used a similar technique as the below one though based on temporary tables, if I’m remembering correctly. Then, like now, I was just iterating through the various levels of the BOM inserting data in the temporary table, this approach having better performance than creating a query self-referencing the same logic multiple times, the logic scaling quite well on multiple millions of records.

 -- BOM Top Most Items 
CREATE FUNCTION dbo.BOMTopMostItems(  
   @ProductID  int , 
   @Date datetime)  
RETURNS  
@BOM TABLE( 
    BillOfMaterialsID int NULL,  
   ProductAssemblyID int NULL,  
   ComponentID int NOT NULL, 
   ModifiedDate datetime NOT NULL, 
   Level smallint NOT NULL)  
AS 
BEGIN 
DECLARE @NoRecords int -- stored the number of Products part of an Assembly  

DECLARE @Level smallint  

SET @Date = ISNull(@Date, GetDate())  

SET @Level = 0 -- initializing the routine 
INSERT @BOM(BillOfMaterialsID, ProductAssemblyID, ComponentID, ModifiedDate, Level)  
SELECT BillOfMaterialsID, ProductAssemblyID, ComponentID, ModifiedDate, @Level  
FROM Production.BillOfMaterials  
WHERE ComponentID = @ProductID  
AND DATEDIFF(d, IsNull(EndDate, GetDate()), @Date)<=0  

SELECT @NoRecords = count(1) 
FROM @BOM  
WHERE ProductAssemblyID IS NOT NULL  

 
WHILE  (IsNull(@NoRecords, 0)>1)  
BEGIN  

    -- inserting next level 
    INSERT @BOM(BillOfMaterialsID, ProductAssemblyID, ComponentID, ModifiedDate, Level) 
    SELECT BOM.BillOfMaterialsID, BOM.ProductAssemblyID, BOM.ComponentID, BOM.ModifiedDate, @Level+1 
    FROM @BOM B  
         JOIN Production.BillOfMaterials BOM 
           ON B.ProductAssemblyID = BOM.ComponentID 
          AND DATEDIFF(d, IsNull(BOM.EndDate, GetDate()), @Date)<=0 
    WHERE Level = @Level 

    SET @Level = @Level+1  

    SELECT @NoRecords = count(1) 
    FROM @BOM 
    WHERE ProductAssemblyID IS NOT NULL 
      AND Level = @Level 
 
END -- deleting the intermediate levels 
 
DELETE   
FROM @BOM  
WHERE ProductAssemblyID IS NOT NULL 

RETURN 
END 
 
As can be seen I’m first inserting the BOM information of the Product searched, then based on these data I’m taking and inserting the next level, stopping when the top-most items where identified, here ProductAssemblyID IS NOT NULL, in the end deleting the intermediate levels. The same approach could be used to generate the whole hierarchy, in this case starting from the top.

Here is an example based on the above function, I used NULL in the second parameter because the @date parameter is set to the current date if not provided.
    

SELECT * 
FROM dbo.BOMTopMostItems(486, NULL)  

 When checking the data returned I observed that for the given ProductID the same BillOfMaterialsID appears more than once, this because, as can be seen by generating the whole BOM for one of the respective duplicated ComponentID, the same ProductID could appear in a BOM at different levels or at the same level but belonging to different assemblies. Therefore in order to get only the list of top-most items you could use a DISTINCT:

SELECT DISTINCT ComponentID  
FROM dbo.BOMTopMostItems(486, NULL) 

On the other side if we want to remove the duplicates resulted from Products appearing at the same level of the same BOM but for different assemblies, could be used a DISTINCT when inserting into the @BOM table or, even if not recommended, use a DISTINCT when calling the function:
 
SELECT DISTINCT BillOfMaterialsID
, ProductAssemblyID
, ComponentID
, ModifiedDate
, Level  
FROM dbo.BOMTopMostItems(486, NULL)

Happy coding!

Previous Post <<||>> Next Post

11 February 2010

🕋Data Warehousing: Staging Area (Definitions)

"The staging area is where data from the operational systems is first brought together. It is an informally designed and maintained grouping of data that may or may not have persistence beyond the load process." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A database design used to preprocess data before loading it into a different structure." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A place where data in transit is placed, usually coming from the legacy environment prior to entering the ETL layer of processing." (William H Inmon, "Building the Data Warehouse", 2005)

"Place where data is stored while it is being prepared for use, typically where data used by ETL processes is stored. This may encompass everything from where the data is extracted from its original source until it is loaded into presentation servers for end user access. It may also be where data is stored to prepare it for loading into a normalized data warehouse." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

[initial staging area:] "The area where the copy of the data from sources persists as a result of the extract/data movement process. (Data from real-time sources that is intended for real-time targets only is not passed through extract/data movement and does not land in the initial staging area.) The major purpose for the initial staging area is to persist source data in nonvolatile storage to achieve the “pull it once from source” goal." (Anthony D Giordano, "Data Integration Blueprint and Modeling", 2010)

"A location where data that is to be transformed is held in abeyance waiting for other events to occur" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"An area for the provision of data, where data is stored temporarily for validation and correction before it will be imported into the target database. A staging area thereby provides technical support for the first time right principle of data management." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

10 February 2010

💎SQL Reloaded: Hierarchies (The Oracle Solution)

    In yesterday’s post on hierarchy-based reports I was showing, using a BOM example, how hierarchy-related problems could be approached using common table expressions introduced starting with SQL Server 2005. Oracle’s approach to hierarchical queries is based on CONNECT BY clause that allows to specify the join constraint between the table(s) used in the recursive logic and previous output. If in SQL Server the logic is split in two parts – the anchor member sub-query representing the starting point of the query, respectively the recursive member, the query on which the recursive technique is applied, in Oracle the two are combined in only one query with the help of START WITH clause. Here’s the basic structure for building a hierarchy based on BillOfMaterials table coming with AdventureWorks.
    The START WITH ASS.productassemblyid IS NULL and TRUNC(NVL(ass.enddate, SYSDATE))>=TRUNC(SYSDATE) constraints function as initialization for the recursive logic, being ignore in next iterations, while the CONNECT BY PRIOR ass.componentid = ass.productassemblyid and TRUNC(NVL(ass.enddate, SYSDATE))>=TRUNC(SYSDATE) constraints function as join constraints between the prior result set and current logic. Maybe somebody is asking why the ‘Active BOM’ constraint if specified in both START WITH, respectively CONNECT BY clauses and not in the WHERE clause? If we are interested only in the Active BOMs then the corresponding constraint must be added to the initiating clause, and is added to the CONNECT BY clause because we need to assure that we take only the active children. If we move the ‘Active BOM’ constrain from the CONNECT BY to WHERE clause then most probably we’ll get a totally different number of records, respectively 9525 vs. 8730 how much the above query returns. The easiest way to check whether the logic is correct is to do include the Activer ecords in an inline view and construct with it the BOM.
    As the above query returns 8730 records it seems that the above logic is correct.

    In order to match the ‘CTE-Based BOM structure’ query from the previous post, the Oracle query needs several further changes, namely to construct the Path, add the top-most ComponentID, calculate the TotalQty, and add the Product details. The Path can be constructed using the SYS_CONNECT_BY_PATH function using as parameters the attribute needed to be concatenated (e.g. componentid) and the delimiter (e.g. ‘\’). The top-most ComponentID can be retrieved using the CONNECT_BY_ROOT operator. Unfortunately Oracle doesn’t provides a straight way to calculate the TotalQty, though there is an alternative, using again the SYS_CONNECT_BY_PATH function with the PerAssemblyQty attribute and multiplication sign ‘*’ as parameters, this smart trick allowing to evaluate the final expression (e.g. ‘1*1*2*5’ will be evaluated to 10); because the extract is typically saved to excel, the final concatenation could be written as an Excel formula expression (e.g. ‘=1*1*2*5’). Here is the final query:
    The specification of an ORDER BY is not always required because typically the output of a hierarchical query in Oracle is sorted as if it were used the depth-first search, though if I remember correctly there are exceptions too.

    Sometimes it’s useful to create a concatenation of Products in order to see the actual structure of the BOM (e.g. ‘\BK-R93R-62\BB-9108\BE-2349\BA-8327’) or to use to use incremental spaces instead (see. IndentPath), for this needing to bring the Product join for ComponentID into the recursive logic.
    By comparing the SQL Server vs. Oracle implementation for hierarchical queries, I would say that Oracle’s implementation is much simpler to use from the creational point of view with a trade off in functionality, several functions being required in order to extract information from top-most record or to concatenate attributes. On the other side SQL Server provides a more elegant solution that allows more functional flexibility in the detriment of code duplication – the anchor vs. the recursive member.

Note:
    In case you need to load the Product and BillOfMaterials tables in Oracle, you could do this by using the ‘SQL Server Import and Export Wizard’, for the steps see the SQL Server to Oracle Data Export – Second Magic Class tutorial.

09 February 2010

🕋Data Warehousing: Role-playing Dimension (Definitions)

"A dimension for which multiple foreign key columns exist in a single fact table to represent different contexts for the same dimension. For example, time is a common role-playing dimension with each date in the fact table associated with a different event, such as order date, ship date, and delivery date." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"Instances of a dimension that legitimately has more than one value for a given business transaction, such as order date and shipped date. Each attribute with the dimension is uniquely identified to enable easy differentiation between the different roles, such as Order Date, Order Quarter and Shipped Date, and Shipped Quarter." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A single database dimension joined to the fact table on a different foreign keys to produce multiple cube dimensions." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The situation where a single physical dimension table appears several times in a single fact table. Each of the dimension roles is represented as a separate logical table with unique column names through views." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

🕋Data Warehousing: Dimensional Model (Definitions)

"A dimensional model is a form of data modeling that packages data according to specific business queries and processes. The goals are business user understandability and multidimensional query performance." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A data model influenced by performance considerations and optimized for data access and retrieval. This is a form of entity-relationship (ER) modeling that’s limited to working with “measurement” types of subjects and has a prescribed set of allowable structures." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A dimensional model contains a central fact table and a set of surrounding dimension tables, each corresponding to one of the components or dimensions of the fact table." (James Yao et al, "Development and Design Methodologies in DWM", 2008)

"A data model organized for the purpose of user understandability and high performance. In a relational database, a dimensional model is a star join schema characterized by a central fact table with a multi-part key. The components of this key are joined to a set of dimension tables, each defined by its own primary keys. In a multi-dimensional database, a dimensional model is the database cube." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A data model that represents data in a star-like structure of only one-to-many relationships, where each entity has either all relationships having the ‘one’ side or the ‘many’ side." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A specialized type of physical data model particular to a retrieval-only database design, commonly used in Data Warehouses and data marts, where de-normalized fact tables are linked to dimension tables. Star schemas and snowflake schemas are examples of dimensional models." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

💎SQL Reloaded: Ways of Looking at Data IV (Hierarchies)

Introduction

There is a special type of entities that when related to each other, the relations put together form a hierarchy. It’s the case of assemblies and the components they are made of, a component could be at its turn an assembly formed from smaller components, resulting thus a structure known as Bill of Materials (BoM). Take for example a laptop made of parts like the keyboard, modem, hard disk, CD-ROM, microprocessor, volatile memory, graphic card, display, to name just a few of the main components, each of these parts are at their turn are made of smaller components that at their turn could be made from smaller components, and so on. Another popular example is the one of employees in an organization, following the line of command from CEO to the lowest person as importance, a person could be chief for more employees and have only one chief, resulting thus another hierarchical structure.

Most of the hierarchies I met were having maximum 7 levels, where a level represents a parent-child relation, typically there is no need for higher hierarchies. In a RDBMS such relations are represented with self-referencing tables or self-referencing structures in case more than one table is used to represent the relation. The problem with such structures is that in order to get the whole hierarchy a join must be created for each relation, though as the number of levels could vary, such queries can become quite complex. Therefore RDBMS came with their own solutions in order to work with hierarchies

There are several types of problems specific to hierarchies: 1. Building the hierarchy behind of an item in the structure
2. Identifying the top-most parent of an item in the structure
3. Identifying the top-most parent of an item in the structure and building the hierarchy behind it – a combination of the previous mentioned problems

Building the hierarchy

SQL Server 2005 introduced the common table expressions (CTE), a powerful feature that allows building whole hierarchies in a simplistic manner. From a structural point of view a CTE has two elements – the anchor member defining the starting point of the query, respectively the recursive member that builds upon the anchor member. The starting point of the query in this case is the entity or entities for which the structure is built, and here there are 2 situations – we are looking for all topmost entities or for the entities matching a certain criteria, typically looking for one or more Product Numbers. The difference between the two scopes could lead to different approaches – whether to include the Product information in the CTE or to use the CTE only to built the hierarchical structure, and add only in the end the production information. Before things would become fuzzier let’s start with an example, here are the two approaches for defining the anchor member, the first query identifying the top-most assemblies, while the second searches for the components in the BOM that matches a criteria (e.g. the Product’s name starts with BK-R93R).

So we have the starting point, how about the recursive member? Without using a CTE, normally we would have to join either of the two queries again with the BillofMaterials table in order to retrieve the components for the Products already retrieved, then the final result must be joined again with the BillofMaterials, and repeat this step over and over again until a predefined number of iterations are met, or until no components are retrieved anymore. Actually that’s what a CTE does too, and in order to make things simpler, a CTE has a name that it’s reused in the CTE in order to handle the recursive part, and outside of the CTE in order to use the final result. Supposing that CTE’s name is BOM here is the recursive member:
    The anchor and recursive member are merged in the CTE with the help of a UNION therefore the number of attributes must be the same and their data type must be compatible. Here is the final query:
    The TotalQty attribute has been introduced in order to show the cumulated quantity for each component, for example if the Assembly has the PerAssemblyQty 2 and the Component has the PerAssemblyQty 10 then the actual total quantity needed is 2*10.
    The Path attribute has been introduced in order to sort the hierarchy lexically based on the sequence formed from the ComponentID, it could have been used instead the Product Number.

Identifying the top-most parent

    The CTE can be used also for identifying the top-most parent of a given Product or set of Items, the second anchor query member could be used for this, while the recursive member would involve the reverse join between BOM and BillOfMaterials. Here is the final query:
Given the fact that the final CTE output includes the output from each iteration when only the data from the last iteration are needed, this might not be the best solution. I would expect that the simulation of recursion with the help of a temporary table would scale better in this case.

Another interesting fact to note is that in this case more assemblies are found to match the criteria of top-most item, meaning that the BOMs are not partitioned – two BOMs could have common elements.

Happy coding!

07 February 2010

🎡SSIS: SQL Server to Oracle Data Export (Second Magic Class)

In a previous post I described the steps needed to be followed in order to export the data from SQL Server to Excel, this time I will describe the steps needed in order to export the data from SQL Server to Oracle. For this is requested to have Oracle database product installed on your computer, or in case you are having already an Oracle database server on the network then you’ll need only the Oracle Client installed locally, and a SQL Development Tool like SQL Developer or TOAD, with a plus for the later.

Like in the previous tutorial we’ll use the Production.Product table from AdventureWorks database coming with SQL Server.

Step 1: Create an Oracle User

If you already have an Oracle User created then you could skip this step, unless you want to create a User only for the current tutorial or for loading other AdventureWorks tables in Oracle. For this we will create first a permanent tablespace (e.g. AdventureWorks), an allocation of space in the database, then create the actual User (e.g. SQLServer) on the just created tablespace, action the will create a schema with the same name as the User, and finally grant the User ‘create session’, ‘create materialized view’ and ‘create table’ privileges.

Open your SQL Developer tool of choice and connect to the Oracle database, then type the following statements and run them one by one, do not forget to provide a strong-typed password in IDENTIFIED BY clause, then create a new connection using the just created User.
 
-- Create TableSpace
CREATE BIGFILE TABLESPACE AdventureWorks
DATAFILE 'AdventureWorks.dat'
SIZE 20M AUTOEXTEND ON; 

-- Create User CREATE USER SQLServer 
IDENTIFIED BY <your_password>
DEFAULT TABLESPACE AdventureWorks 
QUOTA 20M ON AdventureWorks; 

-- Grant privileges
GRANT CREATE SESSION
, CREATE MATERIALIZED VIEW
, CREATE TABLE TO SQLServer; 

Step 2
: Start SQL Server Import and Export Wizard

From SQL Server Management Studio choose the database from which you want to export the data (e.g. AdventureWorks), right click on it and from the floating menu choose Tasks/Export Data

Step 3: Choose a Data Source

In ‘Choose a Data Source’ step select the ‘Data Source’, SQL Server Native Client 10.0 for exporting data from SQL Server, choose ‘Server name’ from the list of SQL Server available, select the Authentication mode and the Database (e. g. AdventureWorks), then proceed to the next step by clicking ‘Next’.

Step 4: Choose a Destination

In ‘Choose a Destination’ step select the Destination, in this case ‘Oracle Provider for OLE DB’ and then click on ‘Properties’ in order to provide the connectivity details, in ‘Data Link Properties’ dialog just opened, enter the Data Source (the SID of your Oracle database), the ‘User name’ (e.g. SQLServer), the ‘Password’ and check ‘Allow saving password’ checkbox, without this last step being not possible to connect to the Oracle database. Then test the connection by clicking the ‘Test Connection’, and if the ‘Test connection succeeded’ proceed to the next step. 

Note:
Excepting ‘Oracle Provider for OLE DB’ there are three other drivers that allows you to connect to an Oracle database: ‘Microsoft OLE DB Provider for Oracle’, ‘.Net Framework Data Provider for Oracle’, respectively ‘Oracle Data Provider for .Net’, each of them coming with their own downsizes and benefits.

Step 5: Specify Table Copy Or Query

In this step choose: ‘Copy data from one or more tables or views’ option.

Step 6: Select Source Tables and Views

In ‘Select Source Tables and Views’ step select the database objects (e.g. Production.vProducts) from which you’ll export the data. You could go with the provided Destination table, though if you are using the current settings Oracle table’s name will be “Product” (including quotes) and not Product as we’d expected. There are two options, go with the current settings and change table’s name and columns, they having also this problem, in Oracle, or create the table manually in Oracle. I prefer the second option because I could use the SQL script automatically generated by Oracle, for this with the just chosen table selected click on ‘Edit Mappings’ button that will bring the ‘Column Mappings’ dialog and click the ‘Edit SQL’ button, the query appearing as editable in ‘Create Table SQL Statement’ dialog. 


Copy the query and copy paste it in the Oracle SQL Developer, then do a replace of the quotes and delete the rowguid column as we don’t needed, in plus when attempting to dump the data, given the incompatibility of data types, an error will be raised. There is a second change we’ll have to make, to rename the Size column to ProductSize, this because Size is a reserved keyword.

 
CREATE TABLE sqlserver.Product (
ProductID INTEGER NOT NULL,
Name NVARCHAR2(50) NOT NULL,
ProductNumber NVARCHAR2(25) NOT NULL,
MakeFlag NUMBER NOT NULL,
FinishedGoodsFlag NUMBER NOT NULL,
Color NVARCHAR2(15),
SafetyStockLevel INTEGER NOT NULL,
ReorderPoint INTEGER NOT NULL,
StandardCost NUMBER NOT NULL,
ListPrice NUMBER NOT NULL,
ProductSize NVARCHAR2(5),
SizeUnitMeasureCode NCHAR(3),
WeightUnitMeasureCode NCHAR(3),
Weight NUMBER,
DaysToManufacture INTEGER NOT NULL,
ProductLine NCHAR(2),
Class NCHAR(2),
Style NCHAR(2),
ProductSubcategoryID INTEGER,
ProductModelID INTEGER,
SellStartDate TIMESTAMP NOT NULL,
SellEndDate TIMESTAMP,
DiscontinuedDate TIMESTAMP,
ModifiedDate TIMESTAMP NOT NULL) 

Unfortunately this approach has a downside, we have to go two steps back in Choose a Destination, then click on ‘Properties’ again and without doing any change test the connection by clicking the ‘Test Connection’, then proceed forward to the current step, and in ‘Select Source Tables and Views’ step select again the database object (e.g. Production.vProducts) from which you’ll export the data but this time in Destination dropdown control search for the table just created (e.g. sqlserver.Product).

Because we have removed the rowguid column from the table created in Oracle, we’ll have to change the mappings, for this click on ‘Edit Mappings’ and in ‘Mapping’ list box from ‘Column Mappings’, in the Destination dropdown next to rowguid column select ‘’, thus the respective column being ignored from the mapping. Accept the change and proceed to the next step. 


Note:
You could choose to export more than one table, though for this you’ll have to create manually each table in Oracle and then map the created tables in ‘Select Source Tables and Views’ step.

Step 7: Review Data Type Mapping

Step 8: Save and Run Package

In ‘Save and Run Package’ check the ‘Run immediately’, respectively the ‘Save SSIS Package’ and ‘File System’ option, then proceed to the next step by clicking ‘Next’.

Step 9: Save SSIS Package

    In ‘Save SSIS Package’ step provide the intended Name (e.g. Export Products To Oracle) or Description of the Package, choose the location where the package will be saved, then proceed to the next step by clicking ‘Next’.

Step 10: Complete the Wizard

Step 11: Executing the Package

Step 12: Checking the Data

If the previous step completed successfully, then you can copy the following statement in Oracle SQL Developer and execute it, the table should be already populated.

SELECT * 
FROM sqlserver.Product 

Note:

If you used other schema to connect to, then you’ll have to replace the above schema (e.g. SQLServer) with your schema.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.