23 February 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part IX: From OLTP to OLAP Data Models)

With SQL databases Microsoft brought OLTP to Microsoft Fabric which allows addressing a wider range of requirements, though this involves also some challenges that usually are addressed by the transition from the OLTP to OLAP architectures. Typically, there's an abstraction layer that is built on top of the OLTP data models that allows to address the various OLAP requirements. As soon as OLTP and OLAP models are mixed together, this opens the door to design and data quality issues that have impact on the adoption of solutions by users. Probably, those who worked with MS Access or even MS Excel directly or in combination with SQL Server can still remember the issues they run into.

Ideally, it should be a separation layer between OLTP and the OLAP data. This can be easily achieved in SQL databases by using two different schemas that mimic the interaction between the two types of architectures. So, supposing that the dbo schema from the SalesLT is the data as maintain by the OLTP layer, one can add an additional schema Test in which the OLAP logic is modelled. This scenario is not ideal, though it allows to model the two aspects of the topic considered. The following steps are to be performed in the environment in which the SalesLT database was created. 

Independently in which layer one works, it's ideal to create a set of views that abstracts the logic and ideally simplifies the processing of data. So, in a first step it's recommended to abstract the data from the source by creating a set of views like the one below:

-- drop view (cleaning)
-- DROP VIEW IF EXISTS SalesLT.vCustomerLocations 

-- create view
CREATE VIEW SalesLT.vCustomerLocations
-- Customers with main office
AS
SELECT CST.CustomerId 
, CSA.AddressID
, CST.Title
, CST.FirstName 
, IsNull(CST.MiddleName, '') MiddleName
, CST.LastName 
, CST.CompanyName 
, CST.SalesPerson 
, IsNull(CSA.AddressType, '') AddressType
, IsNull(ADR.City, '') City
, IsNull(ADR.StateProvince, '') StateProvince
, IsNull(ADR.CountryRegion, '') CountryRegion
, IsNull(ADR.PostalCode, '') PostalCode
FROM SalesLT.Customer CST
	 LEFT JOIN SalesLT.CustomerAddress CSA
	   ON CST.CustomerID = CSA.CustomerID
	  AND CSA.AddressType = 'Main Office'
	 	LEFT JOIN SalesLT.Address ADR
		  ON CSA.AddressID = ADR.AddressID

The view uses LEFT instead of FULL joins because this allows more flexibility, respectively identifying the gaps existing between entities (e.g. customers without addresses). In these abstractions, the number of transformations is kept to a minimum to reflect the data as reflected by the source. It may be chosen to minimize the occurrence of NULL values as this simplifies the logic for comparisons (see the use of IsNull).

Once the abstraction from the OLTP layer was built, one can make the data available in the OLAP layer:

-- create schema
CREATE SCHEMA Test

-- dropping the target table (for cleaning)
-- DROP TABLE IF EXISTS Test.CustomerLocations

-- Option 1
-- create the table on the fly
SELECT *
INTO Test.CustomerLocations
FROM SalesLT.vCustomerLocations

-- Option 2
-- create the table manually (alternative to precedent step
CREATE TABLE [Test].[CustomerLocations](
	[CustomerId] [int] NOT NULL,
	[AddressID] [int] NULL,
	[Title] [nvarchar](8) NULL,
	[FirstName] [nvarchar](50) NULL,
	[MiddleName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NULL,
	[CompanyName] [nvarchar](128) NULL,
	[SalesPerson] [nvarchar](256) NULL,
	[AddressType] [nvarchar](50) NULL,
	[City] [nvarchar](30) NULL,
	[StateProvince] [nvarchar](50) NULL,
	[CountryRegion] [nvarchar](50) NULL,
	[PostalCode] [nvarchar](15) NULL
) ON [PRIMARY]
GO

-- insert records
INSERT INTO Test.CustomerLocations
SELECT *
FROM SalesLT.vCustomerLocations


-- checking the output (both scenarios)
SELECT top 100 *
FROM Test.CustomerLocations


-- drop the view (for cleaning)
-- DROP VIEW IF EXISTS Test.vCustomerLocations

-- create view
CREATE VIEW Test.vCustomerLocations
-- Customer locations
AS
SELECT CSL.CustomerId 
, CSL.AddressID
, CSL.Title
, CSL.FirstName 
, CSL.MiddleName 
, CSL.LastName 
, Concat(CSL.FirstName, ' ' + CSL.MiddleName, ' ', CSL.LastName) FullName
, CSL.CompanyName 
, CSL.SalesPerson 
, CSL.AddressType
, CSL.City
, CSL.StateProvince
, CSL.CountryRegion 
, CSL.PostalCode
FROM Test.CustomerLocations CSL

-- test the view
SELECT top 100 *
FROM Test.vCustomerLocations

Further on, one can create additional objects as required. Usually, a set of well-designed views is enough, offering the needed flexibility with a minimum of code duplication. In addition, one can build stored procedures and table-valued functions as needed:

-- drop the function (for cleaning)
-- DROP FUNCTION IF EXISTS Test.tvfGetCustomerAddresses

-- generated template - function
CREATE FUNCTION Test.tvfGetCustomerAddresses (
    @CountryRegion nvarchar(50) NULL,
    @StateProvince nvarchar(50) NULL
)
RETURNS TABLE
-- Customers by Country & State province
AS
RETURN (
SELECT CSL.CustomerId 
, CSL.AddressID
, CSL.Title
, CSL.FirstName 
, CSL.MiddleName 
, CSL.LastName 
, CSL.FullName
, CSL.CompanyName 
, CSL.SalesPerson 
, CSL.AddressType 
, CSL.City
, CSL.StateProvince 
, CSL.CountryRegion 
, CSL.PostalCode
FROM Test.vCustomerLocations CSL
WHERE CSL.CountryRegion = IsNull(@CountryRegion, CSL.CountryRegion)
  AND CSL.StateProvince = IsNull(@StateProvince, CSL.StateProvince)
);

-- retrieving all records
SELECT *
FROM Test.tvfGetCustomerAddresses(NULL, NULL)

-- providing parameters
SELECT *
FROM Test.tvfGetCustomerAddresses('United States', 'Utah')

-- filtering on non-parametrized volumns
SELECT *
FROM Test.tvfGetCustomerAddresses('United States', 'Utah')
WHERE City = 'Salt Lake City'



-- drop the procedure (for cleaning)
-- DROP PROCEDURE IF EXISTS Test.spGetCustomerAddresses 

-- generated template - stored procedure
CREATE PROCEDURE Test.spGetCustomerAddresses (
    @CountryRegion nvarchar(50) NULL,
    @StateProvince nvarchar(50) NULL
)
-- Customers by Country & State province
AS
BEGIN
	SELECT CSL.CustomerId 
	, CSL.AddressID
	, CSL.Title
	, CSL.FirstName 
	, CSL.MiddleName 
	, CSL.LastName 
	, CSL.FullName
	, CSL.CompanyName 
	, CSL.SalesPerson 
	, CSL.AddressType 
	, CSL.City
	, CSL.StateProvince 
	, CSL.CountryRegion 
	, CSL.PostalCode
	FROM Test.vCustomerLocations CSL
	WHERE CSL.CountryRegion = IsNull(@CountryRegion, CSL.CountryRegion)
	AND CSL.StateProvince = IsNull(@StateProvince, CSL.StateProvince)
END 

-- retrieving all records
EXEC Test.spGetCustomerAddresses NULL, NULL

-- providing parameters
 EXEC Test.spGetCustomerAddresses 'United States', 'Utah'

These steps can repeated for each entity in scope.

This separation between OLTP and OLAP is usually necessary given that business processes need a certain amount of time until they are correctly reflected as per reporting needs. Otherwise, the gaps can negatively impact the quality of data used for reporting. For some reports these deviation might be acceptable, though there will be probably also (many) exceptions. Independently of the solution used, it's still needed to make sure that the data are appropriate for the processes and reporting. 

If no physical separation is needed between the two types of layers, one can remove the persisted tables from the logic and keep the objects as they are.

Independently of which architecture is chosen, one shouldn't forget to validate one's presumptions in what concerns the data model (e.g. customers without addresses, address types, etc.).

Previous Post <<||>> Next Post

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.