Showing posts with label OLAP. Show all posts

23 February 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part IX: From OLTP to OLAP Data Models)

With SQL databases, Microsoft brought OLTP to Microsoft Fabric, which allows addressing a wider range of requirements, though this also involves some challenges that are usually addressed by the transition from OLTP to OLAP architectures. Typically, there's an abstraction layer built on top of the OLTP data models that allows addressing the various OLAP requirements. As soon as OLTP and OLAP models are mixed together, this opens the door to design and data quality issues that impact the adoption of solutions by users. Probably, those who worked with MS Access or even MS Excel, directly or in combination with SQL Server, can still remember the issues they ran into.

Ideally, there should be a separation layer between the OLTP and the OLAP data. This can be easily achieved in SQL databases by using two different schemas that mimic the interaction between the two types of architectures. So, supposing that the SalesLT schema of the sample database holds the data as maintained by the OLTP layer, one can add an additional schema Test in which the OLAP logic is modelled. This scenario is not ideal, though it allows modelling the two aspects of the topic considered. The following steps are to be performed in the environment in which the SalesLT database was created.

Independently of the layer in which one works, it's ideal to create a set of views that abstracts the logic and simplifies the processing of data. So, in a first step it's recommended to abstract the data from the source by creating a set of views like the one below:

-- drop view (cleaning)
-- DROP VIEW IF EXISTS SalesLT.vCustomerLocations 

-- create view
CREATE VIEW SalesLT.vCustomerLocations
AS
-- Customers with main office
SELECT CST.CustomerId 
, CSA.AddressID
, CST.Title
, CST.FirstName 
, IsNull(CST.MiddleName, '') MiddleName
, CST.LastName 
, CST.CompanyName 
, CST.SalesPerson 
, IsNull(CSA.AddressType, '') AddressType
, IsNull(ADR.City, '') City
, IsNull(ADR.StateProvince, '') StateProvince
, IsNull(ADR.CountryRegion, '') CountryRegion
, IsNull(ADR.PostalCode, '') PostalCode
FROM SalesLT.Customer CST
	 LEFT JOIN SalesLT.CustomerAddress CSA
	   ON CST.CustomerID = CSA.CustomerID
	  AND CSA.AddressType = 'Main Office'
	 	LEFT JOIN SalesLT.Address ADR
		  ON CSA.AddressID = ADR.AddressID

The view uses LEFT instead of INNER joins because this keeps all the customers in the output and allows identifying the gaps existing between entities (e.g. customers without addresses). In these abstractions, the number of transformations is kept to a minimum so that the data mirror the source. One may choose to minimize the occurrence of NULL values, as this simplifies the logic for comparisons (see the use of IsNull).
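The gaps mentioned above can be checked directly against the view. A minimal sketch, assuming the view was created as shown: because AddressID comes from the left-joined table and is not wrapped in IsNull, it stays NULL for customers without a 'Main Office' address.

```sql
-- customers without a 'Main Office' address
-- (AddressID is NULL when the LEFT JOIN found no match)
SELECT CSL.CustomerId
, CSL.CompanyName
, CSL.SalesPerson
FROM SalesLT.vCustomerLocations CSL
WHERE CSL.AddressID IS NULL
```

Filtering on AddressID rather than on AddressType or City is deliberate, since those columns were converted to empty strings in the view.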

Once the abstraction of the OLTP layer is built, one can make the data available in the OLAP layer:

-- create schema
CREATE SCHEMA Test

-- dropping the target table (for cleaning)
-- DROP TABLE IF EXISTS Test.CustomerLocations

-- Option 1
-- create the table on the fly
SELECT *
INTO Test.CustomerLocations
FROM SalesLT.vCustomerLocations

-- Option 2
-- create the table manually (alternative to the previous step)
CREATE TABLE [Test].[CustomerLocations](
	[CustomerId] [int] NOT NULL,
	[AddressID] [int] NULL,
	[Title] [nvarchar](8) NULL,
	[FirstName] [nvarchar](50) NULL,
	[MiddleName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NULL,
	[CompanyName] [nvarchar](128) NULL,
	[SalesPerson] [nvarchar](256) NULL,
	[AddressType] [nvarchar](50) NULL,
	[City] [nvarchar](30) NULL,
	[StateProvince] [nvarchar](50) NULL,
	[CountryRegion] [nvarchar](50) NULL,
	[PostalCode] [nvarchar](15) NULL
)

-- insert records
INSERT INTO Test.CustomerLocations
SELECT *
FROM SalesLT.vCustomerLocations

-- checking the output (both scenarios)
SELECT top 100 *
FROM Test.CustomerLocations

-- drop the view (for cleaning)
-- DROP VIEW IF EXISTS Test.vCustomerLocations

-- create view
CREATE VIEW Test.vCustomerLocations
AS
-- Customer locations
SELECT CSL.CustomerId 
, CSL.AddressID
, CSL.Title
, CSL.FirstName 
, CSL.MiddleName 
, CSL.LastName 
, Concat(CSL.FirstName, ' ' + NullIf(CSL.MiddleName, ''), ' ', CSL.LastName) FullName -- NullIf avoids a double space when the middle name is empty
, CSL.CompanyName 
, CSL.SalesPerson 
, CSL.AddressType
, CSL.City
, CSL.StateProvince
, CSL.CountryRegion 
, CSL.PostalCode
FROM Test.CustomerLocations CSL

-- test the view
SELECT top 100 *
FROM Test.vCustomerLocations

Further on, one can create additional objects as required. Usually, a set of well-designed views is enough, offering the needed flexibility with a minimum of code duplication. In addition, one can build stored procedures and table-valued functions as needed:

-- drop the function (for cleaning)
-- DROP FUNCTION IF EXISTS Test.tvfGetCustomerAddresses

-- generated template - function
CREATE FUNCTION Test.tvfGetCustomerAddresses (
    @CountryRegion nvarchar(50) NULL,
    @StateProvince nvarchar(50) NULL
)
RETURNS TABLE
AS
RETURN
-- Customers by Country & State province
SELECT CSL.CustomerId 
, CSL.AddressID
, CSL.Title
, CSL.FirstName 
, CSL.MiddleName 
, CSL.LastName 
, CSL.FullName
, CSL.CompanyName 
, CSL.SalesPerson 
, CSL.AddressType 
, CSL.City
, CSL.StateProvince 
, CSL.CountryRegion 
, CSL.PostalCode
FROM Test.vCustomerLocations CSL
WHERE CSL.CountryRegion = IsNull(@CountryRegion, CSL.CountryRegion)
  AND CSL.StateProvince = IsNull(@StateProvince, CSL.StateProvince)

-- retrieving all records
SELECT *
FROM Test.tvfGetCustomerAddresses(NULL, NULL)

-- providing parameters
SELECT *
FROM Test.tvfGetCustomerAddresses('United States', 'Utah')

-- filtering on non-parametrized columns
SELECT *
FROM Test.tvfGetCustomerAddresses('United States', 'Utah')
WHERE City = 'Salt Lake City'

-- drop the procedure (for cleaning)
-- DROP PROCEDURE IF EXISTS Test.spGetCustomerAddresses 

-- generated template - stored procedure
CREATE PROCEDURE Test.spGetCustomerAddresses (
    @CountryRegion nvarchar(50) NULL,
    @StateProvince nvarchar(50) NULL
)
AS
-- Customers by Country & State province
	SELECT CSL.CustomerId 
	, CSL.AddressID
	, CSL.Title
	, CSL.FirstName 
	, CSL.MiddleName 
	, CSL.LastName 
	, CSL.FullName
	, CSL.CompanyName 
	, CSL.SalesPerson 
	, CSL.AddressType 
	, CSL.City
	, CSL.StateProvince 
	, CSL.CountryRegion 
	, CSL.PostalCode
	FROM Test.vCustomerLocations CSL
	WHERE CSL.CountryRegion = IsNull(@CountryRegion, CSL.CountryRegion)
	AND CSL.StateProvince = IsNull(@StateProvince, CSL.StateProvince)

-- retrieving all records
EXEC Test.spGetCustomerAddresses NULL, NULL

-- providing parameters
EXEC Test.spGetCustomerAddresses 'United States', 'Utah'

These steps can be repeated for each entity in scope.

This separation between OLTP and OLAP is usually necessary given that business processes need a certain amount of time until they are correctly reflected as per reporting needs. Otherwise, the gaps can negatively impact the quality of the data used for reporting. For some reports these deviations might be acceptable, though there will probably also be (many) exceptions. Independently of the solution used, one still needs to make sure that the data are appropriate for the processes and reporting.
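When the persisted table is kept, it has to be reloaded at some point so that the reporting layer catches up with the source. A minimal sketch of such a reload, assuming that a full refresh is acceptable and that Test.CustomerLocations has the same column order as the view (the full-reload approach and the transaction are assumptions, not part of the steps above):

```sql
-- full reload of the persisted table (assumption: no incremental logic is needed)
BEGIN TRANSACTION;

-- remove the previous snapshot
DELETE FROM Test.CustomerLocations;

-- reload the data from the OLTP abstraction
INSERT INTO Test.CustomerLocations
SELECT *
FROM SalesLT.vCustomerLocations;

COMMIT TRANSACTION;
```

How often such a reload runs decides how far the reporting data may lag behind the business processes.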

If no physical separation is needed between the two types of layers, one can remove the persisted tables from the logic and keep the objects as they are.
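As a sketch of this variant, the view from the Test schema could read directly from the source view instead of the persisted table, while the function and the procedure remain unchanged. This assumes that CREATE OR ALTER VIEW is available in the environment; otherwise the view can be dropped and recreated.

```sql
-- view reading directly from the OLTP abstraction (no persisted table)
CREATE OR ALTER VIEW Test.vCustomerLocations
AS
SELECT CSL.CustomerId
, CSL.AddressID
, CSL.Title
, CSL.FirstName
, CSL.MiddleName
, CSL.LastName
-- NullIf avoids a double space when the middle name is empty
, Concat(CSL.FirstName, ' ' + NullIf(CSL.MiddleName, ''), ' ', CSL.LastName) FullName
, CSL.CompanyName
, CSL.SalesPerson
, CSL.AddressType
, CSL.City
, CSL.StateProvince
, CSL.CountryRegion
, CSL.PostalCode
FROM SalesLT.vCustomerLocations CSL
```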

Independently of which architecture is chosen, one shouldn't forget to validate one's assumptions about the data model (e.g. customers without addresses, address types, etc.).
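Two checks of this kind are sketched below, assuming the standard SalesLT tables: the first lists the address types actually in use, the second looks for customers with more than one 'Main Office' address, which would lead to duplicated customers in the views above.

```sql
-- address types in use and their number of records
SELECT CSA.AddressType
, COUNT(*) NoRecords
FROM SalesLT.CustomerAddress CSA
GROUP BY CSA.AddressType
ORDER BY NoRecords DESC;

-- customers with more than one 'Main Office' address
SELECT CSA.CustomerID
, COUNT(*) NoAddresses
FROM SalesLT.CustomerAddress CSA
WHERE CSA.AddressType = 'Main Office'
GROUP BY CSA.CustomerID
HAVING COUNT(*) > 1;
```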


01 March 2010

🕋Data Warehousing: Roll/Drill Up (Definitions)

"The act of removing a row header or replacing a row header in a report to summarize the rows of the answer set. Sometimes called dynamic aggregation." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"To present higher levels of summarization." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"Method of analysis for retrieving higher levels of summary data starting from detailed data." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"Data analysis performed on a data set with applied mathematical functions, associated with fewer dimensions, higher levels of hierarchy in one or more dimensions, or both." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"In SQL, an OLAP extension used with the GROUP BY clause to aggregate data by different dimensions. (Rolling up the data is the exact opposite of drilling down the data.)" (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"To collect subsets of data from multiple locations in one location." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The act of summarizing or aggregating data to a level higher than the previous level of detail." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)
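The SQL extension mentioned in the definitions can be illustrated with T-SQL's GROUP BY ROLLUP. The sketch below assumes the Test.vCustomerLocations view from the Fabric post on this page; it counts the customers per state, adds a subtotal per country and a grand total.

```sql
-- customers by country and state, with subtotals and a grand total
SELECT CSL.CountryRegion
, CSL.StateProvince
, COUNT(*) NoCustomers
-- 1 when the state level was rolled up (country subtotal or grand total)
, GROUPING(CSL.StateProvince) IsSubtotal
FROM Test.vCustomerLocations CSL
GROUP BY ROLLUP (CSL.CountryRegion, CSL.StateProvince)
ORDER BY CSL.CountryRegion
, CSL.StateProvince
```

The GROUPING function helps to tell the summary rows apart from rows where the column is NULL or empty in the data.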

01 February 2010

🕋Data Warehousing: Cube (Definitions)

"A subset of data, usually constructed from a data warehouse, that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures. A cube's data is stored in one or more partitions." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Name for a dimensional structure on a multidimensional or online analytical processing (OLAP) database platform, originally referring to the simple three-dimension case of product, market, and time." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"Proprietary data structure used to store data for an online analytical processing (OLAP) end user data access and analysis tool." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A multidimensional data structure that represents the intersections of each unique combination of dimensions. At each intersection there is a cell that contains a data value." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"Used with online analytical processing (OLAP), data cubes are multidimensional structures built from one or more tables in a relational database(s)." (Sara Morgan & Tobias Thernstrom, "MCITP Self-Paced Training Kit: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"A multidimensional structure that contains dimensions and measures." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"A multidimensional structure that contains dimensions and measures. Cubes are a denormalized version of either the entire database or part of the database and are used within SQL Server Analysis Services (SSAS)." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"A set of data that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A database object that organizes data for accessibility in an OLAP database." (Ken Withee, "Microsoft® Business Intelligence For Dummies®", 2010)

"A multi-dimensional data structure that contains an aggregate value at each point, i.e., the result of applying an aggregate function to an underlying relation. Data cubes are used to implement OLAP." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Refers to the multidimensional data structure used to store and manipulate data in a multidimensional DBMS. The location of each data value in the data cube is based on the x-, y-, and z-axes of the cube. Data cubes are static (must be created before they are used), so they cannot be created by an ad hoc query." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A set of data that is organized and summarized into a multidimensional structure that is defined by a set of dimensions and measures." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A multidimensional representation of data needed for online analytical processing, multidimensional reporting, or multidimensional planning applications." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Cubes, also known as OLAP cubes, are preprocessed and presummarized collections of data that drastically improve query time. [...] OLAP cubes are logical structures as defined by the metadata." (Piethein Strengholt, "Data Management at Scale", 2020)

16 July 2009

🛢DBMS: Online Analytical Processing [OLAP] (Definitions)

"A technology that uses multidimensional structures to provide rapid access to data for analysis. The source data for OLAP is commonly stored in data warehouses in a relational database." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

[analytical processing:] "A general term that encompasses data warehousing and OLAP. Analytical processing produces information for management decisions. Contrasts with operational processing." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The capability to view data in different ways, organizing the data by various dimensions to perform analysis, query, and reporting interactively." (Margaret Y Chu, "Blissful Data", 2004)

[analytical processing:] "using the computer to produce an analysis for management decision, usually involving trend analysis, drill-down analysis, demographic analysis, profiling, and so forth." (William H Inmon, "Building the Data Warehouse", 2005)

"A database designed to support analysis for decision making in an organization." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"The ability for a user to 'drill down' on various data attributes in order to gain a more detailed view of the data. Such analysis enables a user to view different perspectives of the same data in order to facilitate decision making. OLAP is part of the broader category of business intelligence." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"Tools that provide different ways of summarizing multidimensional data." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"Process whereby raw data is stored in a multidimensional format so that it can be analyzed easily by decision-makers." (Sara Morgan & Tobias Thernstrom, "MCITP Self-Paced Training Kit: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"A data mining approach for performing multi-dimensional queries." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"A collection of common business analysis functions that are difficult to perform directly with SQL. Some of the specific functions that fall under the OLAP umbrella include time series comparisons, ranking, ratios, penetration, thresholds, and contribution to report or to the whole data population. Most business intelligence tools provide this type of functionality. The capabilities can be implemented in a variety of different data storage mechanisms." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A query service that overlays a data warehouse by creating and maintaining a set of summary views (automatic summary tables, or ASTs) to enable quick access to summary data." (Toby J Teorey, "Database Modeling and Design 4th Ed", 2010)

"An approach to database design that focuses on analytical activities such as viewing data in various aggregations, slicing and dicing data to meet different criteria, and grouping data." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"An approach to quickly answer multidimensional analytical queries." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Systems that contain read-only data that can be queried and analyzed much more efficiently than OLTP application databases." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A type of computer processing that provides analysis of data stored in a database. OLAP tools enable users to analyze different dimensions of multidimensional data." (Craig S Mullins, "Database Administration", 2012)

"This technique for analyzing business data uses cubes, which are like multidimensional pivot tables in spreadsheets. OLAP tools can perform trend analysis and enable drilling down into data. They enable multidimensional analysis, such as analyzing by time, product, and geography." (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"OLAP is software for manipulating multidimensional data from a variety of sources that has been stored in a data warehouse. The software can create various views and representations of the data. OLAP software provides fast, consistent, interactive access to shared, multidimensional data." (Ciara Heavin & Daniel J Power, "Decision Support, Analytics, and Business Intelligence" 3rd Ed., 2017)

"The process of collecting data from one or many sources; transforming and analyzing the consolidated data quickly and interactively; and examining the results across different dimensions of the data by looking for patterns, trends, and exceptions within complex relationships of that data." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

08 November 2008

🧭Business Intelligence: Enterprise Reporting (Part I: An Introduction)

Business Intelligence Series


Let's suppose that your company invested a lot of money in an ERP system, and besides the complex setup many customizations were made. To increase the ERP system's value, monitor the operations and make accurate decisions, you'll need some reports out of it. What do you do then?

In general, there are 5 types of reporting needs: 
  • OLTP (On-Line Transaction Processing) system providing reports with actual (live) data;
  • OLAP (On-Line Analytical Processing) reports with drill-down, roll-up, slice and dice or pivoting functionality, working with historical data, the data source(s) being refreshed periodically;
  • ad-hoc reports – reports provided on request, often satisfying one time reports or reports with sporadic needs;
  • Data Mining tool(s) focusing on knowledge discovery (aka Data Science);
  • direct data access and analysis (aka self-service BI).

Standard Reports

ERP systems like Oracle Applications, Dynamics AX/365 or SAP come by default with a set of (predefined) standard reports, which in theory cover the basic reporting needs. Unfortunately, the standard reports are not as flexible as expected: e.g. they can be exported only to text and/or in a non-tabular format and are therefore impossible to reuse for detailed analysis, or they have inadequate filtering parameters/constraints, behavior or scope. If existing functionality has been customized, most probably the existing reports need to be adapted to the new logic. In the end, customers need to change the existing reports or adopt an OLAP solution.

Vendors tend to keep their solutions secret and/or don't invest much time into documenting the systems' functionality. Therefore, the information about an ERP's internals is limited, while good developers are hard to find or really expensive, and they often need to reinvent the wheel. ERP vendors do provide documentation about their systems' internals, though there are still many gaps concerning the tables' structure and functionality. Fortunately, armed with enough patience and some knowledge about the existing business processes and databases, a developer can reverse-engineer an important part of the logic, though there's always a shade of doubt whether the logic is entirely correct or complete. Other good news is that more and more professionals blog on ERP topics, however few are the sources that bring something new.

OLAP Reporting  

OLAP solutions presume the existence of a data warehouse that reflects the business model, and when intelligently built it can satisfy an important percentage of the BI requirements. Building a data warehouse or a set of data marts is an expensive and time-consuming endeavor that rarely manages to satisfy everybody's needs. There are also vendors that provide commercial off-the-shelf data models and solutions, and at first sight they look like a good deal, however such models are inflexible and seldom cover all requirements. One can end up customizing and extending the model, running into all kinds of issues involving the model's design, flexibility, quality, resources and costs.

There are many ways in which things can go wrong or be misused. One such scenario is when an OLAP system is used to satisfy OLTP reporting needs. It's like using a city car in a cross-country race: you might manage to compete or even finish the race, if you are lucky enough, but don't expect to make a success out of it!

Ad-hoc Reporting   

The need for ad-hoc reports will be there no matter how complete and flexible your existing reports are. There are always new requirements that must be fulfilled in a timely manner and that can't rely on the long cycle time needed for an OLTP/OLAP report. Actually, many of the reports start as ad-hoc reports, and once their scope and logic have stabilized they are moved to the reporting solution. Talking about new report requirements, it's worth mentioning that many of the users don't know exactly what they want, what is possible to get, what information it makes sense to show and at what level of detail in order to have a report that reflects the reality.

In theory, a person is needed who facilitates the communication between the users and the development team, especially when the work is outsourced. Such a person should have a deep understanding of the business, of the ERP system and of the reporting possibilities: the deeper the knowledge, the shorter the delivery cycle time. Maybe such a person could be dispensable if the users and the developers have the required skill set and knowledge to define and interpret the requirements clearly, however I doubt that's achievable on a large scale. On the other side, such responsibilities could be taken over by the IM or functional leaders who support the ERP system; it might work, at least in theory.

Data Mining   

Data Mining tools and models are supposed to leverage the value of an ERP system beyond the functionality provided by analytic reports by helping to find hidden patterns and trends in data and to elaborate predictions and estimates. Here I'll limit myself to saying that DM makes sense only when the business has reached a certain maturity, and I'm considering here mainly the costs/value ratio (the expected benefits needing to be greater than the costs) and the effort required from the business side in pursuing such a project.

Self-Service BI   

There are situations in which the functionality provided by reporting tools doesn't fulfill the users' requirements, one such situation being when users (aka data citizens) need to analyze data by themselves or to link data from different sources, especially Excel sheets. It's true that vendors tried to address such requirements, though I don't think the solutions are mature enough, easy to use or allow users to go beyond their skills and knowledge.

Most such scenarios come down to accessing various sources over ODBC or directly, using Excel or MS Access, such solutions being more adequate for personal use. The negative side is that people end up misusing them, often having a multitude of such solutions which would maybe make more sense implemented as a report.

There are managers who believe that such tools would allow eliminating the need for ad-hoc reports. It might be possible in isolated cases, though don't expect users to be a Bill Inmon or a Bill Gates!


All the tools have their limitations, no matter how complex they are, and I believe that a single reporting tool or platform will not always address all requirements. Each of these tools needs a support team and even a center of excellence, so make sure that you have the resources, knowledge and infrastructure to support them!


About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.