SQL Troubles: OLTP

Showing posts with label OLTP. Show all posts

23 February 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part IX: From OLTP to OLAP Data Models)

With SQL databases Microsoft brought OLTP to Microsoft Fabric which allows addressing a wider range of requirements, though this involves also some challenges that usually are addressed by the transition from the OLTP to OLAP architectures. Typically, there's an abstraction layer that is built on top of the OLTP data models that allows to address the various OLAP requirements. As soon as OLTP and OLAP models are mixed together, this opens the door to design and data quality issues that have impact on the adoption of solutions by users. Probably, those who worked with MS Access or even MS Excel directly or in combination with SQL Server can still remember the issues they run into.

Ideally, it should be a separation layer between OLTP and the OLAP data. This can be easily achieved in SQL databases by using two different schemas that mimic the interaction between the two types of architectures. So, supposing that the dbo schema from the SalesLT is the data as maintain by the OLTP layer, one can add an additional schema Test in which the OLAP logic is modelled. This scenario is not ideal, though it allows to model the two aspects of the topic considered. The following steps are to be performed in the environment in which the SalesLT database was created.

Independently in which layer one works, it's ideal to create a set of views that abstracts the logic and ideally simplifies the processing of data. So, in a first step it's recommended to abstract the data from the source by creating a set of views like the one below:

-- drop view (cleaning)
-- DROP VIEW IF EXISTS SalesLT.vCustomerLocations 

-- create view
CREATE VIEW SalesLT.vCustomerLocations
-- Customers with main office
AS
SELECT CST.CustomerId 
, CSA.AddressID
, CST.Title
, CST.FirstName 
, IsNull(CST.MiddleName, '') MiddleName
, CST.LastName 
, CST.CompanyName 
, CST.SalesPerson 
, IsNull(CSA.AddressType, '') AddressType
, IsNull(ADR.City, '') City
, IsNull(ADR.StateProvince, '') StateProvince
, IsNull(ADR.CountryRegion, '') CountryRegion
, IsNull(ADR.PostalCode, '') PostalCode
FROM SalesLT.Customer CST
	 LEFT JOIN SalesLT.CustomerAddress CSA
	   ON CST.CustomerID = CSA.CustomerID
	  AND CSA.AddressType = 'Main Office'
	 	LEFT JOIN SalesLT.Address ADR
		  ON CSA.AddressID = ADR.AddressID

The view uses LEFT instead of FULL joins because this allows more flexibility, respectively identifying the gaps existing between entities (e.g. customers without addresses). In these abstractions, the number of transformations is kept to a minimum to reflect the data as reflected by the source. It may be chosen to minimize the occurrence of NULL values as this simplifies the logic for comparisons (see the use of IsNull).

Once the abstraction from the OLTP layer was built, one can make the data available in the OLAP layer:

-- create schema
CREATE SCHEMA Test

-- dropping the target table (for cleaning)
-- DROP TABLE IF EXISTS Test.CustomerLocations

-- Option 1
-- create the table on the fly
SELECT *
INTO Test.CustomerLocations
FROM SalesLT.vCustomerLocations

-- Option 2
-- create the table manually (alternative to precedent step
CREATE TABLE [Test].[CustomerLocations](
	[CustomerId] [int] NOT NULL,
	[AddressID] [int] NULL,
	[Title] [nvarchar](8) NULL,
	[FirstName] [nvarchar](50) NULL,
	[MiddleName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NULL,
	[CompanyName] [nvarchar](128) NULL,
	[SalesPerson] [nvarchar](256) NULL,
	[AddressType] [nvarchar](50) NULL,
	[City] [nvarchar](30) NULL,
	[StateProvince] [nvarchar](50) NULL,
	[CountryRegion] [nvarchar](50) NULL,
	[PostalCode] [nvarchar](15) NULL
) ON [PRIMARY]
GO

-- insert records
INSERT INTO Test.CustomerLocations
SELECT *
FROM SalesLT.vCustomerLocations


-- checking the output (both scenarios)
SELECT top 100 *
FROM Test.CustomerLocations


-- drop the view (for cleaning)
-- DROP VIEW IF EXISTS Test.vCustomerLocations

-- create view
CREATE VIEW Test.vCustomerLocations
-- Customer locations
AS
SELECT CSL.CustomerId 
, CSL.AddressID
, CSL.Title
, CSL.FirstName 
, CSL.MiddleName 
, CSL.LastName 
, Concat(CSL.FirstName, ' ' + CSL.MiddleName, ' ', CSL.LastName) FullName
, CSL.CompanyName 
, CSL.SalesPerson 
, CSL.AddressType
, CSL.City
, CSL.StateProvince
, CSL.CountryRegion 
, CSL.PostalCode
FROM Test.CustomerLocations CSL

-- test the view
SELECT top 100 *
FROM Test.vCustomerLocations

Further on, one can create additional objects as required. Usually, a set of well-designed views is enough, offering the needed flexibility with a minimum of code duplication. In addition, one can build stored procedures and table-valued functions as needed:

-- drop the function (for cleaning)
-- DROP FUNCTION IF EXISTS Test.tvfGetCustomerAddresses

-- generated template - function
CREATE FUNCTION Test.tvfGetCustomerAddresses (
    @CountryRegion nvarchar(50) NULL,
    @StateProvince nvarchar(50) NULL
)
RETURNS TABLE
-- Customers by Country & State province
AS
RETURN (
SELECT CSL.CustomerId 
, CSL.AddressID
, CSL.Title
, CSL.FirstName 
, CSL.MiddleName 
, CSL.LastName 
, CSL.FullName
, CSL.CompanyName 
, CSL.SalesPerson 
, CSL.AddressType 
, CSL.City
, CSL.StateProvince 
, CSL.CountryRegion 
, CSL.PostalCode
FROM Test.vCustomerLocations CSL
WHERE CSL.CountryRegion = IsNull(@CountryRegion, CSL.CountryRegion)
  AND CSL.StateProvince = IsNull(@StateProvince, CSL.StateProvince)
);

-- retrieving all records
SELECT *
FROM Test.tvfGetCustomerAddresses(NULL, NULL)

-- providing parameters
SELECT *
FROM Test.tvfGetCustomerAddresses('United States', 'Utah')

-- filtering on non-parametrized volumns
SELECT *
FROM Test.tvfGetCustomerAddresses('United States', 'Utah')
WHERE City = 'Salt Lake City'



-- drop the procedure (for cleaning)
-- DROP PROCEDURE IF EXISTS Test.spGetCustomerAddresses 

-- generated template - stored procedure
CREATE PROCEDURE Test.spGetCustomerAddresses (
    @CountryRegion nvarchar(50) NULL,
    @StateProvince nvarchar(50) NULL
)
-- Customers by Country & State province
AS
BEGIN
	SELECT CSL.CustomerId 
	, CSL.AddressID
	, CSL.Title
	, CSL.FirstName 
	, CSL.MiddleName 
	, CSL.LastName 
	, CSL.FullName
	, CSL.CompanyName 
	, CSL.SalesPerson 
	, CSL.AddressType 
	, CSL.City
	, CSL.StateProvince 
	, CSL.CountryRegion 
	, CSL.PostalCode
	FROM Test.vCustomerLocations CSL
	WHERE CSL.CountryRegion = IsNull(@CountryRegion, CSL.CountryRegion)
	AND CSL.StateProvince = IsNull(@StateProvince, CSL.StateProvince)
END 

-- retrieving all records
EXEC Test.spGetCustomerAddresses NULL, NULL

-- providing parameters
 EXEC Test.spGetCustomerAddresses 'United States', 'Utah'

These steps can repeated for each entity in scope.

This separation between OLTP and OLAP is usually necessary given that business processes need a certain amount of time until they are correctly reflected as per reporting needs. Otherwise, the gaps can negatively impact the quality of data used for reporting. For some reports these deviation might be acceptable, though there will be probably also (many) exceptions. Independently of the solution used, it's still needed to make sure that the data are appropriate for the processes and reporting.

If no physical separation is needed between the two types of layers, one can remove the persisted tables from the logic and keep the objects as they are.

Independently of which architecture is chosen, one shouldn't forget to validate one's presumptions in what concerns the data model (e.g. customers without addresses, address types, etc.).

Previous Post <<||>> Next Post

07 December 2024

🏭 💠Data Warehousing: Microsoft Fabric (Part VI: SQL Databases for OLTP scenarios) [new feature]

Data Warehousing Series

One interesting announcements at Ignite is the availability in public preview of SQL databases in Microsoft Fabric, "a versatile and developer-friendly transactional database built on the foundation of Azure SQL database". With this Fabric can address besides OLAP also OLTP scenarios, evolving thus from analytics to a data platform [1]. According to the announcement, besides the AI-optimized architectural aspects, the feature makes the SQL Azure simple, autonomous and secure by design [1], and these latest aspects are considered in this post.

Simplicity revolves around the deployment and configuration of databases, the creation of a new database requiring giving a name and the database is created in seconds [1]. It’s a considerable improvement compared with the relatively complex setup needed for on-premise configurations, though sometimes more flexibility in configuration is needed upfront or over database’s lifetime. To get a database ready for testing one can import a sample database or get specific data via data flows and/or pipelines [1]. As development tools one can use Visual Studio Code or SSMS [1], and probably more tools will be available in time.

The integration with both GitHub and Azure DevOps allows to configure each database under source control, which is needed for many scenarios especially when multiple resources make changes to the database objects [1]. Frankly, that’s mainly important during the development phase, respectively in scenarios in which multiple people make in parallel changes to the logic. It will be interesting to see how much overhead or challenges the feature adds to development and how smoothly everything works together!

The most important aspect for many solutions is the replication of data in near-real time to the (open-source) delta parquet format in OneLake and thus making the data available for analytics almost immediately [1]. Probably, from this aspect many cloud-based applications can benefit, even if the performance might not be as good as in other well-established architectures. However, there are many other scenarios in which one needs to maintain and use data for OLTP/OLAP purposes. This invites adequate testing and a good weighting of the advantages and disadvantages involved.

A SQL database is a native item in Fabric, and therefore it utilizes Fabric capacity units like other Fabric workloads [1]. One can use the Fabric SKU estimator (still in private preview) to estimate the costs [2], though it will be interesting to see how cost-effective the solutions are. Probably, especially when the infrastructure is already available outside of Fabric, it will be easier and cost-effective to use the mirroring functionality. One should test and have a better estimator before moving blindly from the existing infrastructure to Fabric.

SQL databases in Fabric are autonomous by design, while allowing to get the best performance and availability by default [1]. High availability is reached through zone redundancy, while performance is achieved by scaling automatically the storage and compute to accommodate the workloads [1]. The auto-optimization capability is achieved with the help of the latest Intelligent Query Processing (IQP) enhancements, respectively the creation of missing indexes to improve query performance [1]. It will be interesting to see how the whole process works, given that the maintenance of indexes usually involves some challenges (e.g. identifying covering indexes, indexes needed only for temporary workloads, duplicated indexes).

SQL databases in Fabric are automatically configured for high availability with zone redundancy, while storage and compute scale automatically to accommodate the user workload [1]. The database is auto-optimized through the latest IQP enhancements while the system creates any missing indexes to improve query performance. All data is replicated to OneLake by default [1]. Finally, the database always receives the latest security updates with auto-patching, while automatic backups help in disaster recovery scenarios [1], which can be of real help for database administrators.

References:
[1] Microsoft Fabric Updates Blog (2024) Announcing SQL database in Microsoft Fabric Public Preview [link]
[2] Microsoft Fabric Updates Blog (2024) Announcing New Recruitment for the Private Preview of Microsoft Fabric SKU Estimator [link]

20 May 2017

⛏️Data Management: Data Scrubbing (Definitions)

"The process of making data consistent, either manually, or automatically using programs." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

Processing data to remove or repair inconsistencies." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The process of building a data warehouse out of data coming from multiple online transaction processing (OLTP) systems." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A term that is very similar to data deidentification and is sometimes used improperly as a synonym for data deidentification. Data scrubbing refers to the removal, from data records, of identifying information (i.e., information linking the record to an individual) plus any other information that is considered unwanted. This may include any personal, sensitive, or private information contained in a record, any incriminating or otherwise objectionable language contained in a record, and any information irrelevant to the purpose served by the record." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"The process of removing corrupt, redundant, and inaccurate data in the data governance process. (Robert F Smallwood, Information Governance: Concepts, Strategies, and Best Practices, 2014)

"Data Cleansing (or Data Scrubbing) is the action of identifying and then removing or amending any data within a database that is: incorrect, incomplete, duplicated." (experian) [source]

"Data cleansing, or data scrubbing, is the process of detecting and correcting or removing inaccurate data or records from a database. It may also involve correcting or removing improperly formatted or duplicate data or records. Such data removed in this process is often referred to as 'dirty data'. Data cleansing is an essential task for preserving data quality." (Teradata) [source]

"Data scrubbing, also called data cleansing, is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated." (Techtarget) [source]

"Part of the process of building a data warehouse out of data coming from multiple online transaction processing (OLTP) systems." (Microsoft Technet)

"The process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse." (Information Management)

12 February 2010

🕋Data Warehousing: Operational Data Store (Definitions)

"The operational data store is subject-oriented and contains current, integrated, consistent data that reflects the current state of its subject. Operational data stores are similar to enterprise data warehouses in that they may include data from different systems that has been made consistent. Operational data stores are different from enterprise data warehouses in that they are updated frequently to reflect the current state of the operational systems." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A physical set of tables sitting between the operational systems and the data warehouse or a specially administered hot partition of the data warehouse itself. The main reason for an ODS is to provide immediate reporting of operational results if neither the operational system nor the regular data warehouse can provide satisfactory access. Because an ODS is necessarily an extract of the operational data, it also may play the role of source for the data warehouse." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"The operational data store is a subject-oriented, integrated, current, volatile collection of data used to support the operational and tactical decision-making process for the enterprise. It is the central point of data integration for business management, delivering a common view of enterprise data." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A hybrid structure designed to support both operational transaction processing and analytical processing." (William H Inmon, "Building the Data Warehouse", 2005)

"A collection of data from operational systems, most often integrated together, that is used for some operational purpose. The most critical characteristic here is that this is used for some operational function. This operational dependency takes precedence and the ODS should not be considered a central component of the data warehousing environment. An ODS can be a clean, integrated source of data to be pulled into the data warehousing environment." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A database designed to integrate data from multiple sources to facilitate operations. This is as opposed to a data warehouse, which integrates data from multiple sources to facilitate reporting and analysis." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A database that is subject-oriented, read-only to end users, current (non-historical), volatile, and integrated; is separate from and derived from one or more systems of record; and supports day-today business operations and real-time decision making." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A DB system designed to integrate data from multiple sources to allow operational access to the data for operational reporting." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Database for transaction processing systems that uses data warehouse concepts to provide clean data." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"A database designed to integrate data from multiple sources for additional operations on the data." (Craig S Mullins, "Database Administration", 2012)

"A data store that provides data from the original source in near real time." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"ODS is the decision support database that integrated operational data from multiple source systems used to capture operational data and is used primarily for near real time operational reporting and analytics. ODSs are used to measure the operations processes efficiencies. The integration pattern is at the lowest levels of granularity and can happen from near real-time to multiple times in a day." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"A data store that integrates data from a range of sources, which is subsequently merged and cleaned to serve as the foundation for enterprise operational reporting. It is an important piece of an Enterprise Data Warehouse (EDW) used for enterprise analytical reporting." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"An ODS system integrates operational or transactional data from multiple systems to support operational reporting." (John D Kelleher & Brendan Tierney, "Data science", 2018)

"An operational data store (ODS) is an alternative to having operational decision support system (DSS) applications access data directly from the database that supports transaction processing (TP). While both require a significant amount of planning, the ODS tends to focus on the operational requirements of a particular business process (for example, customer service), and on the need to allow updates and propagate those updates back to the source operational system from which the data elements were obtained. The data warehouse, on the other hand, provides an architecture for decision makers to access data to perform strategic analysis, which often involves historical and cross-functional data and the need to support many applications." (Gartner)

15 July 2009

🛢DBMS: Online Transaction Processing [OLTP] (Definitions)

"A database management system representing the state of a particular business function at a specific point in time. An OLTP database is typically characterized by having large numbers of concurrent users actively adding and modifying data." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A data processing system designed to record all of the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Any software capability that applies transactional updates and inquiries interactively." (Margaret Y Chu, "Blissful Data", 2004)

"A relational database system used to manage the day-to-day operations of an organization." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"The operational processes for executing a business activity while the customer or end user waits for the execution to complete. One example of OLTP would be an automated teller transaction." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"A data-processing system designed to record all the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data. Typically, OLTP systems perform large numbers of relatively small transactions." (Jim Joseph et al, "Microsoft® SQL Server™ 2008 Reporting Services Unleashed", 2009)

"Online transaction processing (OLTP) systems are the fundamental systems used to run the business. These are also called operational systems or operational applications. They are often used as sources of data for the data warehouse." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A data-processing system designed to record all the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data. Typically, OLTP systems perform large numbers of relatively small transactions." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"An approach to database design that focuses on data transactions in particular inserting, updating, and deleting data." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"Class of systems that facilitate and manage transaction-oriented applications." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"A transaction processing system where transactions are executed as soon as they occur." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A type of computer processing in which the computer responds immediately to user requests. Each request is a transaction. The opposite of transaction processing is batch processing." (Craig S Mullins, "Database Administration", 2012)

"A type of interactive application in which requests that are submitted by users are processed as soon as they are received. Results are returned to the requester in a relatively short period of time." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

Online transaction processing (OLTP) is a mode of processing that is characterized by short transactions recording business events and that normally requires high availability and consistent, short response times. This category of applications requires that a request for service be answered within a predictable period that approaches 'real time'. Unlike traditional mainframe data processing, in which data is processed only at specific times, transaction processing puts terminals online, where they can update the database instantly to reflect changes as they occur. In other words, the data processing models the actual business in real time, and a transaction transforms this model from one business state to another. Tasks such as making reservations, scheduling and inventory control are especially complex; all the information must be current. (Gartner)

12 February 2009

🛢DBMS: Operational Database [ODB] (Definitions)

"An OLTP database that supports the business operations - for example, logging orders and tracking customers. An operational database is usually the source of data for the data warehouse. Operational data is updated frequently to reflect the current value of all transactions." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A database containing a company's up-to-date and modifiable information." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A database that is designed primarily to support a company’s day-to-day operations. Also known as a transactional database or production database." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)

"A database system that runs core functions to the business in production environments. These are not test or reporting database systems, but actual systems that run the operations of the company." (Jason Williamson, Getting a Big Data Job For Dummies, 2015)

"The database of record, consisting of system-specific reference data and event data belonging to a transaction-update system. It may also contain system control data such as indicators, flags and counters. The operational database is the source of data for the data warehouse. It contains detailed data used to run the day-to-day operations of the business. The data continually changes as updates are made, and reflects the current value of the last transaction." (Information Management)

"The database that contains the live data that is viewed, retrieved, and edited in an Oracle Service Cloud application. While reports that are run on the operational database can access real-time data, the reports cannot process as much information as reports that are run on the report database." (Oracle)

"They carry out regular operations of an organisation and are generally very important to a business. They generally use online transaction processing that allows them to enter, collect and retrieve specific information about the company." (Data Floq)

02 December 2008

🧭Business Intelligence: Perspectives (Part 1: General Issues)

Business Intelligence Series

Introduction

BI projects are noble in intent though many managers and data professionals ignore their implications and prerequisites – data quality (incl. availability), cooperation, maturity, infrastructure, adequate tools and knowledge.

Data Quality

The problem with data starts usually at the source - ERP and other information systems (IS). In theory the system should cover all the basic reporting requirements existing in an enterprise, though that's seldom the case. Therefore, basic reporting needs arrive to be covered by ad-hoc developed tools which often include MS Excel/Access solutions, which are difficult to integrate and manage across organization.

Data Quality (DQ) is maybe the most ignored component in the attempt to build flexible, secure and reliable BI solutions. DQ is based on the validation implemented in source systems and the mechanisms used to cleanse the data before being reported, respectively on the efficiency and effectiveness of existing business processes and best practices.

DQ must be guaranteed for accurate decisions. If the quality is not validated and reviewed periodically, users will be reluctant in using the reports! The reports must be validated as part of the UAT process. Aggregated BI reports need detailed reports that can be used for validation, while the logic and data need to be synchronized accordingly.

The quality of decisions is based on the degree to which data were understood and presented to the decisional factors, though that’s not enough; it's need also a complete perspective, and maybe that’s why some business users prefer to prepare and aggregate data by themselves, the process allowing them in theory to get a deeper understanding of what’s happening.

Cooperation

A BI initiative doesn’t depend only on the effort of a department (usually IT), but on the business as a whole. Unfortunately, the so called partnership is more a theoretical term than a fact, while managers’ and business users' involvement is often suboptimal.

BI implementations are also dependent on consultants’ skills and the degree to which they understood business’ requirements, on team’s cohesion and other project (management) related prerequisites, respectively on knowledge transfer and training.

Tools

Most of the BI tools available on the market don’t satisfy all business, respectively users’ requirements. Even if they excel in some features, they lack in others. Usually, more than one BI tool is needed to cover (most of) the requirements. When features are not available, or they are not mature enough, or they are difficult to learn, users will prefer to use tools they already know.

Another important consideration is that BI tools rely on data models, often inflexible from the point of the data they provide, lacking integrating additional datasets, algorithms and customizations. The overall requirements need to be considered more recently from the point of cloud computing technologies, which becomes steadily a requirement for nowadays business dynamics.

Maturity

Besides the fact that Capability Maturity Models (CMMs) are difficult to implement, organizations lack the knowledge of transforming data into knowledge, respectively in understanding data and evolving it further in wisdom and competitive advantage.

Most of the fancy words used by salesmen to sell a product don’t become reality overnight. Of course, a BI tool might have the potentiality of fulfilling the various technical and nontechnical goals, though between a theoretical potentiality and harnessing the respective potential is a long road that need to be addressed at strategical, tactical and operational levels.

Infrastructure

Infrastructure refers to human and technical components and the way they interact in getting the job done. It's not only about "breaking habits" and using the best tools, but in aligning people and technologies to the desired level of performance, of retaining and diffusing knowledge.

Previous Post <<||>> Next Post

08 November 2008

🧭Business Intelligence: Enterprise Reporting (Part I: An Introduction)

Business Intelligence Series

Introduction

Let's suppose that your company invested a lots of money in an ERP system, and besides the complex setup many customizations were made. To increase ERP system's value, monitor the operations and make accurate decisions you'll need some reports out of it. What do you do then?

In general, there are 5 types of reporting needs:

OLTP (On-Line Transaction Processing) system providing reports with actual (live) data;
OLAP (On-Line Analytical Processing) reports with drill-down, roll-up, slice and dice or pivoting functionality, working with historical data, the data source(s) being refreshed periodically;
ad-hoc reports – reports provided on request, often satisfying one time reports or reports with sporadic needs;
Data Mining tool(s) focusing on knowledge discovery (aka Data Science);
direct data access and analysis (aka self-service BI).

Standard Reports

ERP systems like Oracle Applications, Dynamics AX/365 or SAP come by default with a set of (predefined) standard reports, which in theory cover basic reporting needs. Unfortunately the standard reports are not as flexible as expected, e.g. they can be exported only to text and/or in a non-tabular format, and therefore impossible to reuse for detailed analysis, have inadequate filtering parameters/constraints, behavior or scope. If existing functionality has been customized, most probably existing reports need to be adapted to the new logic. In the end customers need to change the existing reports or adopt an OLAP solution.

Vendors tend to keep the secrecy about their solutions and/or don't invest much time into documenting systems' functionality. Therefore, the information about ERP’s internals is limited, while good developers are hard to find or really expensive, and often they needing to reinvent the wheel. ERP vendors do provide documentation about their system's internals, though there are still many gaps concerning tables’ structure and functionality. Fortunately, armed with enough patience, some knowledge about existing business processes and databases, a developer can reengineer an important part of the logic, though there's always a shade of doubt whether the logic is entirely correct or complete. Other good news is that more and more professionals blog on ERP topics, however few are the source that bring something new.

OLAP Reporting

OLAP solutions presume the existence of a data warehouse that reflects the business model, and when intelligently built it can satisfy an important percentage from the BI requirements. Building a data warehouse or a set of data marts is an expensive and time consuming endeavor and rarely arrives to satisfy everybody’s needs. There are also vendors that provide commercial off-the-shelf data models and solutions, and at a first view they look like an important deal, however such models are inflexible and seldom cover all requirements. One can end up by customizing and extending the model, running in all kind of issues involving model’s design, flexibility, quality, resources and costs.

There are many ways in which things can go wrong or be misused. One of such scenarios is when an OLAP system is used to satisfy OLTP reporting needs. It’s like using a city car in a country cross race – you might make it to compete or even end the race, if you are lucky enough, but don’t expect to make a success out of it!

Ad-hoc Reporting

The need for ad-hoc reports will be there no matter how complete and flexible are your existing reports. There are always new requirements that must be fulfilled in utile time and not rely on the long cycle time needed for an OLTP/OLAP report. Actually many of the reports start as ad-hoc reports and once their scope and logic stabilized they are moved to the reporting solution. Talking about new reports requirements, it worth to mention that many of the users don’t know exactly what they want, what is possible to get and what information it makes sense to show and at what level of detail in order to have a report that reflects the reality.

In theory is needed a person who facilitate the communication between users and development team, especially when the work is outsourced. Such a person should have in theory a deep understanding of the business, of the ERP system and reporting possibilities, deeper the knowledge, shorter the delivery cycle time. Maybe such a person could be dispensable if the users and development have the required skill set and knowledge to define and interpret clearly the requirements, however I doubt that’s achievable on large scale. On the other side such attributions could be taken by the IM or functional leaders that support the ERP system, it might work, at least in theory.

Data Mining

Data Mining tools and models are supposed to leverage the value of an ERP system beyond the functionality provided by analytic reports by helping to find hidden patterns and trends in data, to elaborate predictions and estimates. Here I resume only saying that DM makes sense only when the business reached a certain maturity, and I’m considering here mainly the costs/value ratio (the expected benefits needing to be greater than the costs) and effort required from business side in pursuing such a project.

Self-Service BI

There are situations in which the functionality provided by reporting tools doesn’t fulfill users’ requirements, one of such situations being when users (aka data citizens) need to analyze data by themselves, to link data from different sources, especially Excel sheets. It’s true that vendors tried to address such requirements, though I don’t think they are mature enough, easy to use or allow users to go beyond their skills and knowledge.

Most of such scenarios resume in accessing various sources over ODBC or directly using Excel or MS Access, such solutions being adequate more for personal use. The negative side is that people arrive to misuse them, often ending up by having a multitude of such solution which maybe would make sense to have implemented as a report.

There are managers who believe that such tools would allow eliminating the need for ad-hoc reports, it might be possible in isolated cases though don’t expect from users to be a Bill Inmon or Bill Gates!

Conclusion

All the tools have their limitations, no matter how complex they are, and I believe that not always a single reporting tool or platform will address all requirements. Each of such tools need a support team and even a center of excellence, so assure yourself that you have the resources, knowledge and infrastructure to support them!