Showing posts sorted by relevance for query Data warehousing. Sort by date Show all posts
Showing posts sorted by relevance for query Data warehousing. Sort by date Show all posts

22 October 2022

🛢DBMS: Data Warehouse/Mart (Just the Quotes)

"Unfortunately, just collecting the data in one place and making it easily available isn’t enough. When operational data from transactions is loaded into the data warehouse, it often contains missing or inaccurate data. How good or bad the data is a function of the amount of input checking done in the application that generates the transaction. Unfortunately, many deployed applications are less than stellar when it comes to validating the inputs. To overcome this problem, the operational data must go through a 'cleansing' process, which takes care of missing or out-of-range values. If this cleansing step is not done before the data is loaded into the data warehouse, it will have to be performed repeatedly whenever that data is used in a data mining operation." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"Having a purposeless or poorly performing dashboard is more common than not. This happens when the underlying architecture is not designed properly to support the needs of dashboard interaction. There is an obvious disconnect between the design of the data warehouse and the design of the dashboards. The people who design the data warehouse do not know what the dashboard will do; and the people who design the dashboards do not know how the data warehouse was designed, resulting in a lack of cohesion between the two. A similar disconnect can also exist between the dashboard designer and the business analyst, resulting in a dashboard that may look beautiful and dazzling but brings very little business value." (Nils H Rasmussen et al, "Business Dashboards: A visual catalog for design and deployment", 2009)

"Having multiple data lakes replicates the same problems that were created with multiple data warehouses - disparate data siloes and data fiefdoms that don't facilitate sharing of the corporate data assets across the organization. Organizations need to have a single data lake from which they can source the data for their BI/data warehousing and analytic needs. The data lake may never become the 'single version of the truth' for the organization, but then again, neither will the data warehouse. Instead, the data lake becomes the 'single or central repository for all the organization's data' from which all the organization's reporting and analytic needs are sourced." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"There are, however, many problems with independent data marts. Independent data marts: (1) Do not have data that can be reconciled with other data marts (2) Require their own independent integration of raw data (3) Do not provide a foundation that can be built on whenever there are future analytical needs." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"Unfortunately, some organizations are replicating the bad data warehouse practice by creating special-purpose data lakes - data lakes to address a specific business need. Resist that urge! Instead, source the data that is needed for that specific business need into an 'analytic sandbox' where the data scientists and the business users can collaborate to find those data variables and analytic models that are better predictors of the business performance. Within the 'analytic sandbox', the organization can bring together (ingest and integrate) the data that it wants to test, build the analytic models, test the model's goodness of fit, acquire new data, refine the analytic models, and retest the goodness of fit." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"Data quality in warehousing and BI is typically defined in terms of the 4 C’s - is the data clean, correct, consistent, and complete? When it comes to big data, there are two schools of thought that have different views and expectations of data quality. The first school believes that the gold standard of the 4 C’s must apply to all data (big and little) used for clinical care and performance metrics. The second school believes that in big data environments, a stringent data quality standard is impossible, too costly, or not required. While diametrically opposite opinions may play well in panel discussions, they do little to reconcile the realities of healthcare data quality." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017) 

"Data warehousing has always been difficult, because leaders within an organization want to approach warehousing and analytics as just another technology or application buy. Viewed in this light, they fail to understand the complexity and interdependent nature of building an enterprise reporting environment." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"A data lake is a storage repository that holds a very large amount of data, often from diverse sources, in native format until needed. In some respects, a data lake can be compared to a staging area of a data warehouse, but there are key differences. Just like a staging area, a data lake is a conglomeration point for raw data from diverse sources. However, a staging area only stores new data needed for addition to the data warehouse and is a transient data store. In contrast, a data lake typically stores all possible data that might be needed for an undefined amount of analysis and reporting, allowing analysts to explore new data relationships. In addition, a data lake is usually built on commodity hardware and software such as Hadoop, whereas traditional staging areas typically reside in structured databases that require specialized servers." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"A data warehouse follows a pre-built static structure to model source data. Any changes at the structural and configuration level must go through a stringent business review process and impact analysis. Data lakes are very agile. Consumption or analytical layer can be modified to fit in the model requirements. Consumers of a data lake are not constant; therefore, schema and modeling lies at the liberty of analysts and scientists." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] "The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

01 March 2010

🕋Data Warehousing: Extraction, transformation, and loading [ETL] (Definitions)

"A process for acquiring data from source systems, reformatting and cleansing it based on business requirements, and loading it, usually into a data warehouse. The term is more generally used as a means of moving data between systems." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"The process of taking data from one source - the source - and transforming it and loading it into another location - the target." (William H Inmon & Anthony Nesavich, "Tapping into Unstructured Data", 2007)

[ETL system:] "Extract, transformation, and load system consisting of a set of processes by which the operational source data is prepared for the data warehouse. Consists of extracting operational data from source applications, cleaning and conforming the data, and delivering the data to the presentation servers, along with the ongoing management and support functions." (Ralph Kimball, "The Data Warehouse Lifecycle Toolkit", 2008)

"The very essence of business intelligence, this is the process of removing raw data from a data system, processing and cleaning it, and then making it available in a business-intelligence database." (Stuart Mudie et al, "BusinessObjects™ XI Release 2 for Dummies", 2008)

"The process of connecting to a source database, pulling data out of the source database, transforming the data into a standard format, and then loading the data into a destination system." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"A term that describes the activities used to prepare a collection of data sources for loading into a database, but also used to describe data mapping and transformation processes in general." (John R Talburt, "Entity Resolution and Information Quality", 2011)

"Generally, an approach to data integration from multiple source databases to integrated target databases." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"In computing, extract, transform, and load refers to a process in database usage and especially in data warehousing that: extracts data from outside sources; transforms it to fit operational needs, which can include quality levels, and loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse)." (Keith Holdaway, "Harness Oil and Gas Big Data with Analytics", 2014)

"Data warehouse process used to extract data from disparate or homogeneous data sources. Transforms the data to align with data structure for storing and analysis." (Dennis C Guster, "Scalable Data Warehouse Architecture: A Higher Education Case Study", 2018)

"An acronym for the Extract, Transform and Load process. This refers to extracting data from the source system, transforming (cleaning and manipulating etc.) the data and then loading the data into the data warehouse." (BI System Builders)

"ETL is a process that extracts, transforms, and loads data from multiple sources to a data warehouse or other unified data repository." (IBM) [source

"Refers to the process by which data is:
– Extracted from sources
– Transformed or standardized for storing in the proper heterogeneous format
– and Loaded into the final store or warehouse. The ETL process is commonly run in parallel with transformation processes executing as data is being extracted from sources." (Insight Software)

"Short for extraction, transformation, and loading. The process used to populate a data warehouse from disparate existing database systems." (Microstrategy)

"The act of extracting data from various sources, transforming data to consistent types, and loading the transformed data for use by applications." (Microsoft)

"a process in data warehousing responsible for pulling data out of one source, transforming them so that they meet the needs of the processes that will be using them on next stages, and placing them into target database." (KDnuggets)

"ETL is shorthand for – extract, transform, load. An ETL solution facilitates the replication of data from one or more sources that is converted into format suitable for use in analytics and moved into a destination system." (Qlik) [source]

"ETL (extract, transform, load) is three combined processes that are used to pull data from one database and move it to another database. It is a common function in data warehousing." (snowflake) [source]

"ETL (extract, transform, load) processes are three common functions performed on databases. Extract or read data from a database, transform the extracted data into a structure that can be placed into another database, and load or write the data into the target database." (kloudless)

"ETL (extract, transform, load) processes are three database functions. They extract or read data from a database, transform the extracted data into a structure that can be placed into another database, and load or write the data into the target database." (MuleSoft) 

"ETL stands for Extract-Transform-Load and it refers to the process used to collect data from numerous disparate databases, applications and systems, transforming the data so that it matches the target system’s required formatting and loading it into a destination database." (Databricks) [source]

"Extract Transform Load refers to a trio of processes that are performed when moving raw data from its source to a data warehouse, data mart, or relational database." (Informatica) [source]

"Extract, Transform and Load (ETL) refers to the process in data warehousing that concurrently reads (or extracts) data from source systems; converts (or transforms) the data into the proper format for querying and analysis; and loads it into a data warehouse, operational data store or data mart). ETL systems commonly integrate data from multiple applications or systems that may be hosted on separate hardware and managed by different groups or users. ETL is commonly used to assemble a temporary subset of data for ad-hoc reporting, migrate data to new databases or convert database into a new format or type." (Teradata) [source]

"Refers to a process in database usage consisting of three phases: extracting data from external sources, transforming it to fit operational needs (can include a quality check), and loading it into a target database or data warehouse." (Board International)

"In business intelligence, an ETL tool extracts data from one or more data-sources, transforms it and cleanses it to be optimized for reporting and analysis, and loads it into a data store or data warehouse. ETL stands for extract, transform, and load." (Logi Analytics) [source]

25 March 2010

🧊Data Warehousing: Mea Culpa (Part I: A Personal Journey)

Data Warehousing


Any discussion on data warehousing topics, even unconventional, can’t avoid to mention the two most widely adopted concepts in data warehousing, B. Inmon vs. R. Kimball’s methodologies. There is lot of ink consumed already on this topic and is difficult to come with something new, however I can insert in between my experience and personal views on the topic. From the beginning I have to state that I can’t take any of the two sides because from a philosophical viewpoint I am the adept of “the middle way” and, in addition, when choosing a methodology we have to consider business’ requirements and objectives, the infrastructure, the experience of resources, and many other factors. I don’t believe one method fits all purposes, therefore some flexibility is needed into this concern even from most virulent advocates. After all in the end it counts the degree to which the final solution fits the purpose, and no matter how complex and perfect is a methodology, no matter of the precautions taken, given the complexity of software development projects there is always the risk for failure.

  

B. Inmon defines the data warehouse as a “subject-oriented, integrated, non-volatile and time-varying collection of data in support of the management’s decisions” [3] - subject-oriented because is focused on an organization’s strategic subject areas, integrated because the data are coming from multiple legacy systems in order to provide a single overview, time-variant because data warehouse’s content is time dependent, and non-volatile because in theory data warehouse’s content is not updated but refreshed. 


Within my small library and the internet articles I read on this topic, especially the ones from Kimball University cycle,  I can’t say I found a similar direct definition for data warehouse given by R. Kimball, the closest I could get to something in this direction is the data warehouse as a union of data marts, in his definition a data mart is “a process-oriented subset of the overall organization’s data based on a foundation of atomic data, and that depends only on the physics of the data-measurement events, not on the anticipated user’s questions” [2]. This reflects also an important difference between the two approaches, in Inmon’s philosophy the data marts are updated through the data warehouse, the data in the warehouse being stored in a 3rd normal form, while in data marts are multidimensional and thus denormalized.


Even if it’s a nice conceptual tool intended to simplify data manipulation, I can’t say I’m a big fan of dimensional modeling, mainly because it can be easily misused to create awful (inflexible) monster models that can be barely used, sometimes being impossible to go around them without redesigning them. Also the relational models could be easily misused though they are less complex as physical design, easier to model and they offer greater flexibility even if in theory data’s normalization could add further complexity, however there is always a trade between flexibility, complexity, performance, scalability, usability and reusability, to mention just a few of the dimensions associated with data in general and data quality in particular.

  

In order to overcome dimensional modeling issues R. Kimball recommends a four step approach – first identifying the business processes corresponding to a business measurement or event, secondly declaring the grain (level of detail) and only after that defining the dimensions and facts [1]. I have to admit that starting from the business process adds a plus to this framework because in theory it allows better visibility over the processes, supporting processed-based data analysis, though given the fact that a process could span over multiple data elements or that multiple processes could partition the same data elements, this increases the complexity of such models. I find that a model based directly on the data elements allows more flexibility in the detriment of the work needed to bring the data together, though they should cover also the processes in scope.

  

Building a data warehouse it’s quite a complex task, especially if we take into consideration the huge percentage of software projects failure that holds also in data warehousing area. On the other side not sure how much such statistics about software projects failure can be taken ad literam because different project methodologies and data collection methods are used, not always detailed information are given about the particularities of each project, it would be however interesting to know what the failure rate per methodology. Occasionally there are some numbers advanced that sustain the benefit of using one or another methodology, and ignoring the subjective approach of such justifications they often lack adequate details to support them.


My first contact with building a data warehouse was almost 8 years ago, when as part of the Asset Management System I was supposed to work on, the project included also the creation of a data warehouse. Frankly few things are more scaring than seeing two IT professionals fighting on what approach to use in order to design a data warehouse, and is needless to say that the fight lasted for several days, calls with the customer, nerves, management involved, whole arsenal of negotiations that looked like a never ending story. 


Such fights are sometimes part of the landscape and they should be avoided, the simplest alternative being to put together the advantages and disadvantages of most important approaches and balance between them, unfortunately there are still professionals who don’t know how or not willing to do that. The main problem in such cases is the time which instead of being used constructively was wasted on futile fights. When lot of time is waisted and a tight schedule applies, one is forced to do the whole work in less time, leading maybe to sloppy solutions. 

  

A few years back I had the occasion to develop one data warehouse around the two ERP systems and the other smaller systems one of the customers I worked for was having in place, SQL Server 2000 and its DTS (Data Transformation Services) functionality being of great help for this purpose. Even if I was having some basic knowledge on the two data warehousing approaches, I had to build the initial data warehouse from scratch evolving the initial solution in time along several years. 


The design was quite simple, the DTS packages extracting the data from the legacy systems and dumping them in staging tables in normalized or denormalized form, after several simple transformations loading the data in the production tables, the role of the multidimensional data marts being played successfully by views that were scaling pretty well to the existing demands. Maybe many data warehouse developers would disregard such a solution, though it was quite an useful exercise and helped me to easier understand later the literature on this topic and the issues related to it. In addition, while working on the data conversion of two ERP implementations I had to perform more complex ETL (Extract Transform Load) tasks that the ones consider in the data warehouse itself.


In what concerns software development I am an adept of rapid evolutional prototyping because it allows getting customers’ feedback in early stages and thus being possible to identify earlier the issues as per customers’ perceptions, in plus allowing customers to get a feeling of what’s possible, how the application looks like. The prototyping method proved to be useful most of the times, I would actually say all the times, and often was interesting to see how customers’ conceptualization about what they need changed with time, changes that looked simple leading to partial redesign of the application. In other development approaches with long releases (e.g. waterfall) the customer gets a glimpse of the application late in the process, often being impossible to redesign the application so the customer has to live with what he got. Call me “old fashion” but I am the adept of rapid evolutional prototyping also in what concerns the creation of data warehouses, and even if people might argue that a data warehousing project is totally different than a typical development project, it should not be forgotten that almost all software development projects share many particularities from planning to deployment and further to maintenance.


Even if also B. Inmon embraces the evolutional/iterative approach in building a data warehouse, from a philosophical standpoint the rapid evolutional prototyping applied to data warehouses I feel it’s closer to R. Kimball’s methodology, resuming in choosing a functional key area and its essential business processes, building a data mart and starting from there building other data marts for the other functional key areas, eventually integrating and aligning them in a common solution – the data warehouse. On the other side when designing a software component or a module of one application you have also to consider the final goal, as the respective component or module will be part of a broader system, even if in some cases it could exist in isolation. Same can be said also about data marts’ creation, even if sometimes a data mart is rooted in the needs of a department, you have to look also at the final goal and address the requirements from that perspective or at least be aware of them.


Previous Post <<||>> Next Post


References:

[1] M. Ross R. Kimball, (2004) Fables and Facts: Do you know the difference between dimensional modeling truth and fiction? [Online] Available from: http://intelligent-enterprise.informationweek.com/info_centers/data_warehousing/showArticle.jhtml;jsessionid=530A0V30XJXTDQE1GHPSKH4ATMY32JVN?articleID=49400912 (Accessed: 18 March 2010)

[2] R. Kimball, J. Caserta (2004). The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. Wiley Publishing Inc. ISBN: 0-7645-7923 -1

[3] Inmon W.H. (2005) Building the Data Warehouse, 4th Ed. Wiley Publishing. ISBN: 978-0-7645-9944-6 

12 February 2010

🕋Data Warehousing: Operational Data Store (Definitions)

"The operational data store is subject-oriented and contains current, integrated, consistent data that reflects the current state of its subject. Operational data stores are similar to enterprise data warehouses in that they may include data from different systems that has been made consistent. Operational data stores are different from enterprise data warehouses in that they are updated frequently to reflect the current state of the operational systems." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A physical set of tables sitting between the operational systems and the data warehouse or a specially administered hot partition of the data warehouse itself. The main reason for an ODS is to provide immediate reporting of operational results if neither the operational system nor the regular data warehouse can provide satisfactory access. Because an ODS is necessarily an extract of the operational data, it also may play the role of source for the data warehouse." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"The operational data store is a subject-oriented, integrated, current, volatile collection of data used to support the operational and tactical decision-making process for the enterprise. It is the central point of data integration for business management, delivering a common view of enterprise data." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A hybrid structure designed to support both operational transaction processing and analytical processing." (William H Inmon, "Building the Data Warehouse", 2005)

"A collection of data from operational systems, most often integrated together, that is used for some operational purpose. The most critical characteristic here is that this is used for some operational function. This operational dependency takes precedence and the ODS should not be considered a central component of the data warehousing environment. An ODS can be a clean, integrated source of data to be pulled into the data warehousing environment." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A database designed to integrate data from multiple sources to facilitate operations. This is as opposed to a data warehouse, which integrates data from multiple sources to facilitate reporting and analysis." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A database that is subject-oriented, read-only to end users, current (non-historical), volatile, and integrated; is separate from and derived from one or more systems of record; and supports day-today business operations and real-time decision making." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A DB system designed to integrate data from multiple sources to allow operational access to the data for operational reporting." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Database for transaction processing systems that uses data warehouse concepts to provide clean data." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"A database designed to integrate data from multiple sources for additional operations on the data." (Craig S Mullins, "Database Administration", 2012)

"A data store that provides data from the original source in near real time." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"ODS is the decision support database that integrated operational data from multiple source systems used to capture operational data and is used primarily for near real time operational reporting and analytics. ODSs are used to measure the operations processes efficiencies. The integration pattern is at the lowest levels of granularity and can happen from near real-time to multiple times in a day." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"A data store that integrates data from a range of sources, which is subsequently merged and cleaned to serve as the foundation for enterprise operational reporting. It is an important piece of an Enterprise Data Warehouse (EDW) used for enterprise analytical reporting." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"An ODS system integrates operational or transactional data from multiple systems to support operational reporting." (John D Kelleher & Brendan Tierney, "Data science", 2018)

"An operational data store (ODS) is an alternative to having operational decision support system (DSS) applications access data directly from the database that supports transaction processing (TP). While both require a significant amount of planning, the ODS tends to focus on the operational requirements of a particular business process (for example, customer service), and on the need to allow updates and propagate those updates back to the source operational system from which the data elements were obtained. The data warehouse, on the other hand, provides an architecture for decision makers to access data to perform strategic analysis, which often involves historical and cross-functional data and the need to support many applications." (Gartner)

01 February 2010

🕋Data Warehousing: Data Warehousing (Definitions)

Data Warehousing (DW): "The operational extract, cleansing, transformation, and load processes, and associated control processes, that maintain the data containing within a Data Warehouse." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

Data Warehousing: "creating a central database, extracting data from source systems, and then cleaning and loading the data in to the database. The database is designed in such a way that business users can then access the data to answer their business questions." (BI System Builders)

Data Warehousing: "The process of aggregating data from disparate sources for the purpose of building a data warehouse. Data warehousing involves design, development, testing, deployment, operations, impact analysis, and change management." (Insight Software)

"Data warehousing is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed for greater business intelligence." (Informatica) [source]

28 February 2017

🧊Data Warehousing: Data Load Optimization (Part I: A Success Story)

Data Warehousing
Data Warehousing Series

Introduction

This topic has been waiting in the queue for almost two years already - since I finished optimizing an already existing relational data warehouse within a SQL Server 2012 Enterprise Edition environment. Through various simple techniques I managed then to reduce the running time for the load process by more than 65%, from 9 to 3 hours. It’s a considerable performance gain, considering that I didn’t have to refactor any business logic implemented in queries.

The ETL (Extract, Transform, Load) solution was making use of SSIS (SQL Server Integration Services) packages to load data sequentially from several sources into staging tables, and from stating further into base tables. Each package was responsible for deleting the data from the staging tables via TRUNCATE, extracting the data 1:1 from the source into the staging tables, then loading the data 1:1 from the staging table to base tables. It’s the simplest and a relatively effective ETL design I also used with small alterations for data warehouse solutions. For months the data load worked smoothly, until data growth and eventually other problems increased the loading time from 5 to 9 hours.

Using TABLOCK Hint

Using SSIS to bulk load data into SQL Server provides an optimum of performance and flexibility. Within a Data Flow, when “Table Lock” property on the destination is checked, it implies that the insert records are minimally logged, speeding up the load by a factor of two. The TABLOCK hint can be used also for other insert operations performed outside of SSIS packages. At least in this case the movement of data from staging into base tables was performed in plain T-SQL, outside of SSIS packages. Also further data processing had benefitted from this change. Only this optimization step alone provided 30-40% performance gain.

Drop/Recreating the Indexes on Big Tables

As the base tables were having several indexes each, it proved beneficial to drop the indexes for the big tables (e.g. with more than 1000000 records) before loading the data into the base tables, and recreate the indexes afterwards. This was done within SSIS, and provided an additional 20-30% performance gain from the previous step.

Consolidating the Indexes

Adding missing indexes, removing or consolidating (overlapping) indexes are typical index maintenance tasks, apparently occasionally ignored. It doesn’t always bring much performance as compared with the previous methods, though dropping and consolidating some indexes proved to be beneficial as fewer data were maintained. Data processing logic benefited from the creation of new indexes as well.

Running Packages in Parallel

As the packages were run sequentially (one package at a time), the data load was hardly taking advantage of the processing power available on the server. Even if queries could use parallelism, the benefit was minimal. Enabling packages run in parallel added additional performance gain, however this minimized the availability of processing resources for other tasks. When the data load is performed overnight, this causes minimal overhead, however it should be avoided when the data are loading to business hours.

Using Nonclustered Indexes

In my analysis I found out that many tables, especially the ones storing prepared data, were lacking a clustered index, even if further indexes were built on them. I remember that years back there was a (false) myth that fact and/or dimension tables don’t need clustered indexes in SQL Server. Of course clustered indexes have downsides (e.g. fragmentation, excessive key-lookups) though their benefits exceed by far the downsides. Besides missing clustered index, there were cases in which the tables would have benefited from having a narrow clustered index, instead of a multicolumn wide clustered index. Upon case also such cases were addressed.

Removing the Staging Tables

Given the fact that the source and target systems are in the same virtual environment, and the data are loaded 1:1 between the various layers, without further transformations and conversions, one could load the data directly into the base tables. After some tests I came to the conclusion that the load from source tables into the staging table, and the load from staging table into base table (with TABLOCK hint) were taking almost the same amount of time. This means that the base tables will be for the same amount of the time unavailable, if the data were loaded from the sources directly into the base tables. Therefore one could in theory remove the staging tables from the architecture. Frankly, one should think twice when doing such a change, as there can be further implications in time. Even if today the data are imported 1:1, in the future this could change.

Reducing the Data Volume

Reducing the data volume was identified as a possible further technique to reduce the amount of time needed for data loading. A data warehouse is built based on a set of requirements and presumptions that change over time. It can happen for example that even if the reports need only 1-2 years’ worth of data, the data load considers a much bigger timeframe. Some systems can have up to 5-10 years’ worth of data. Loading all data without a specific requirement leads to waste of resources and bigger load times. Limiting the transactional data to a given timeframe can make a considerable difference. Additionally, there are historical data that have the potential to be archived.

There are also tables for which a weekly or monthly refresh would suffice. Some tables or even data sources can become obsolete, however they continue to be loaded in the data warehouse. Such cases occur seldom, though they occur. Also some unused or redundant column could have been removed from the packages.

Further Thoughts

There are further techniques to optimize the data load within a data warehouse like partitioning large tables, using columnstore indexes or optimizing the storage, however my target was to provide maximum sufficient performance gain with minimum of effort and design changes. Therefore I stopped when I considered that the amount of effort is considerable higher than the performance gain.

Further Reading:
[1] TechNet (2009) The Data Loading Performance Guide, by Thomas Kejser, Peter Carlin & Stuart Ozer (link)
[2] MSDN (2010) Best Practices for Data Warehousing with SQL Server 2008 R2, by Mark Whitehorn, Keith Burns & Eric N Hanson (link)
[3] MSDN (2012) Whitepaper: Fast Track Data Warehouse Reference Guide for SQL Server 2012, by Eric Kraemer, Mike Bassett, Eric Lemoine & Dave Withers (link)
[4] MSDN (2008) Best Practices for Data Warehousing with SQL Server 2008, by Mark Whitehorn & Keith Burns (link)
[5] TechNet (2005) Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server, by Gandhi Swaminathan (link)
[6] SQL Server Customer Advisory Team (2013) Top 10 Best Practices for Building a Large Scale Relational Data Warehouse (link)

21 October 2023

🧊Data Warehousing: Architecture V (Dynamics 365, the Data Lakehouse and the Medallion Architecture)

Data Warehousing
Data Warehousing Series

An IT architecture is built and functions under a set of constraints that derive from architecture’s components. Usually, if we want flexibility or to change something in one area, this might have an impact in another area. This rule applies to the usage of the medallion architecture as well! 

In Data Warehousing the medallion architecture considers a multilayered approach in building a single source of truth, each layer denoting the quality of data stored in the lakehouse [1]. For the moment are defined 3 layers - bronze for raw data, silver for validated data, and gold for enriched data. The concept seems sound considering that a Data Lake contains all types of raw data of different quality that needs to be validated and prepared for reporting or other purposes.

On the other side there are systems like Dynamics 365 that synchronize the data in near-real-time to the Data Lake through various mechanisms at table and/or data entity level (think of data entities as views on top of other tables or views). The databases behind are relational and in theory the data should be of proper quality as needed by business.

The greatest benefit of serverless SQL pool is that it can be used to build near-real-time data analytics solutions on top of the files existing in the Data Lake and the mechanism is quite simple. On top of such files are built external tables in serverless SQL pool, tables that reflect the data model from the source systems. The external tables can be called as any other tables from the various database objects (views, stored procedures and table-valued functions). Thus, can be built an enterprise data model with dimensions, fact-like and mart-like entities on top of the synchronized filed from the Data Lake. The Data Lakehouse (= Data Warehouse + Data Lake) thus created can be used for (enterprise) reporting and other purposes.

As long as there are no special requirements for data processing (e.g. flattening hierarchies, complex data processing, high-performance, data cleaning) this approach allows to report the data from the data sources in near-real time (10-30 minutes), which can prove to be useful for operational and tactical reporting. Tapping into this model via standard Power BI and paginated reports is quite easy. 

Now, if it's to use the data medallion approach and rely on pipelines to process the data, unless one is able to process the data in near-real-time or something compared with it, a considerable delay will be introduced, delay that can span from a couple of hours to one day. It's also true that having the data prepared as needed by the reports can increase the performance considerably as compared to processing the logic at runtime. There are advantages and disadvantages to both approaches. 

Probably, the most important scenario that needs to be handled is that of integrating the data from different sources. If unique mappings between values exist, unique references are available in one system to the records from the other system, respectively when a unique logic can be identified, the data integration can be handled in serverless SQL pool.

Unfortunately, when compared to on-premise or Azure SQL functionality, the serverless SQL pool has important constraints - it's not possible to use scalar UDFs, tables, recursive CTEs, etc. So, one needs to work around these limitations and in some cases use the Spark pool or pipelines. So, at least for exceptions and maybe for strategic reporting a medallion architecture can make sense and be used in parallel. However, imposing it on all the data can reduce flexibility!

Bottom line: consider the architecture against your requirements!

Previous Post <<||>>> Next Post

[1] What is the medallion lakehouse architecture?
https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion

02 February 2010

🕋Data Warehousing: Data Warehouse [DWH] (Definitions)

"A subject oriented, integrated, time variant, and non-volatile collection of summary and detailed historical data used to support the strategic decision-making processes for the corporation." (William H Inmon, "What is a Data Warehouse?", Prism Vol. 1 (1), 1995)

"A copy of transaction data specifically structured for query and analysis." (Ralph Kimball, The Data Warehouse Toolkit, 1996)

"A database specifically structured for query and analysis. A data warehouse typically contains data representing the business history of an organization. Data in a data warehouse is usually less detailed and longer lived than data from an OLTP system." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

"The conglomeration of an organization’s data warehouse staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease-of-use." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed, 2002)

"The data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data used to support the strategic decision-making process for the enterprise. It is the central point of data integration for business intelligence and is the source of data for the data marts, delivering a common view of enterprise data." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A data warehouse is a set of computer databases specifically designed with related, historical blissful [high quality] data that assist in formulating decisions and taking action." (Margaret Y Chu, "Blissful Data", 2004)

"A set of computer databases specifically designed with related, historical blissful data that assist in formulating decisions and taking action." (Margaret Y Chu, "Blissful Data ", 2004)

"A collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is relevant to some moment in time. The data warehouse contains atomic data and lightly summarized data." (William H Inmon, "Building the Data Warehouse", 2005)

"A database that can follow a Third Normal Form (3NF) or dimensional design and that houses a time-variant collection of data from multiple sources. It’s generally used to collect and store integrated sets of historical data from multiple operational systems and then feed one or more dependent data marts. In some cases, a data warehouse may also provide end user access to support enterprise views of data." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A database containing a copy of operational data that is organized for analytic purposes." (Christopher Adamson, "Mastering Data Warehouse Aggregates", 2006)

[Enterprise Data Warehouse] "A database that contains a copy of enterprise data, reorganized for analytic purposes. Subject areas within the enterprise data warehouse are called data marts." (Christopher Adamson, "Mastering Data Warehouse Aggregates", 2006)

"A database specifically structured for query and analysis. A data warehouse typically contains data representing the business history of an organization." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"A relational database used as a repository for storing and analyzing numerical information that has been cleansed and verified." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"A technology platform that stores business data for the purpose of strategic decision making. Data warehouses are normally the central integration point for large amounts of detailed, historical data from heterogeneous systems across the company to avail data for business intelligence." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"Central repository holding cleaned and transformed information needed by an organization to make decisions, usually extracted from an operational database." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is relevant to some moment in time. The data warehouse contains atomic data and lightly summarized data." (William H Inmon & Anthony Nesavich, "Tapping into Unstructured Data", 2007)

"A data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts. (Standard definition from The Data Warehousing Institute)" (Steve Williams & Nancy Williams, "The Profit Impact of Business Intelligence", 2007)

"A specialised database containing consolidated historical data drawn from a number of existing databases to support strategic decision making." (Keith Gordon, "Principles of Data Management", 2007)

"Central repository holding cleaned and transformed information needed by an organization to make decisions, usually extracted from an operational database." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2007)

"Data warehouse is a central repository for summarized and integrated data from operational databases and external data sources." (S Sumathi & S Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A logical warehouse of data that gathers production and operational information from various departments of a corporation into a single data entity. This information is loaded regularly, which allows for careful analysis over a period of time." (Stuart Mudie et al, "BusinessObjects™ XI Release 2 for Dummies", 2008)

"A repository of data for offline use in building reports and analyzing historical data." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"A data warehouse is a system of records (a business intelligence gathering system) that takes data from a company's operational databases and other data sources and transforms it into a structure conducive to business analysis." (Sivakumar Harinath et al, "Professional Microsoft® SQL Server® Analysis Services 2008 with MDX", 2009)

"A database designed for reporting and data analysis. A data warehouse typically contains data representing the business history of an organization." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A large data store containing the organization’s historical data, which is used primarily for data analysis and data mining." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies" 2nd Ed., 2009)

"An integrated, centralized decision support database and the related software programs used to collect, cleanse, transform, and store data from a variety of operational sources to support Business Intelligence. A Data Warehouse may also include dependent data marts." (DAMA International, The DAMA Guide to the Data Management Body of Knowledge 1st Ed., 2009)

"(1) A centralized database for collecting the data from numerous other systems so that they can be made available for management reporting. The database is close to 3rd Normal Form.  (2)  A system that includes the central database described in 1; plus procedures for extracting, transforming, and loading data from other systems; and one or more data marts that organize subsets of the data for particular reporting purposes." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"[A] data warehouse is a collection of data designed to support management in the decision-making process. It is a subject oriented, integrated, time-variant, non-up datable collection of data used in support of management decision-making processes and business intelligence. It contains a wide variety of data that present a coherent picture of business condition at a single point of time. It is a unique kind of database which focuses on business intelligence, external data and time-variant data." (Vijay K Pallaw, "Database Management Systems" 2nd Ed., 2010)

"A data warehouse is a large, enterprise-wide database that acts as a central storage location for data that has been through the Extract, Transform, and Load (ETL) process. A data warehouse often includes historical data as well." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

[Active Data Warehouse (ADW):] "A data warehouse that is generally capable of supporting near-real-time updates, fast response times, and mixed workloads by leveraging well-architected data models, optimized ETL processes, and the use of workload management." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"A large, often enterprise-wide repository of data used for reporting and analysis. Data warehouses generally collect and manage data from a large number of operational and financial systems across an enterprise." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010) 

"A subject-oriented, integrated, time-variant, and historical collection of summary and detailed data used to support the decision-making and other reporting and analysis needs that require historical, point-in-time information. Data, once captured within the warehouse, is nonvolatile and relevant to a point in time." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A specialized type of database that is used to aggregate data from transaction databases for data analysis purposes, such as identifying and examining business trends, to support planning and decision making." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

[Enterprise Data Warehousing (EDW):] "A data repository of organizational data that is organized, analyzed, and used to enable more informed decision making and planning." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

[federated data warehouse:] "1.A conceptual Data Warehouse made up of multiple decision support databases, potentially on multiple servers, but presented transparently to Business Intelligence users as a unified schema for query, analysis, and reporting. 2.An Enterprise Data Warehouse fed by extracts from departmental Data Warehouses and/or legacy Data Warehouses prior to their incorporation and/or retirement." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The database that stores operations data for long periods of time." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A database used for reporting and analysis." (Craig S Mullins, "Database Administration", 2012)

"A large data store containing the organization’s historical data, which is used primarily for data analysis and data mining. It is the data system of record." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"A shared repository of data, often used to support the centralized consolidation of information for decision support." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A shared repository of data, often used to support the centralized consolidation of information for decision support." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A store that provides data from the originating source or the operational data stores; it contains historical and derived data. Also known as an information warehouse." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A large data store containing the organization’s historical data, which is used primarily for data analysis and data mining. It is the data system of record." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"A centralized database system which captures data and allows later analysis of the collected data." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"A secondary database that holds older data for analysis. In some applications, you may want to analyze the data and store modified or aggregated forms in the warehouse instead of keeping every outdated record." (Rod Stephens, "Beginning Software Engineering", 2015)

"Decision-making data base containing the totality of a business’ decision-making data (all subjects)." (Fernando Iafrate, "From Big Data to Smart Data", 2015) 

[Enterprise Data Warehouse (EDW):] "A clean data store created to merge and store data from different sources for enterprise data analysis." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"A granular, time-variant, structured store of historical data in a neutral, nonredundant format for multiple uses. Its purpose is the reuse of data." (Gregory Lampshire et al, "The Data and Analytics Playbook", 2016)

"Electronic storehouses where vast amounts of data are arrayed, integrated, categorised, stored, and sold." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A repository for storing business-relevant data." (Jonathan Ferrar et al, "The Power of People", 2017)

"A very large database designed to support decision making in organizations. It is usually batch updated and structured for rapid online queries and managerial summaries. A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data." (Daniel J Power & Ciara Heavin, "Decision Support, Analytics, and Business Intelligence" 3rd Ed., 2017)

"A data warehouse is a repository for all the data that an enterprise collects from internal and external sources." (Amar Sahay, "Business Analytics" Vol. I, 2018)

"A data warehouse is a repository of enterprise data used for reporting and analysis." (Michelle Gutzait et al, "Hands-On Data Warehousing with Azure Data Factory", 2018) 

"A subject-oriented collection of data that is used to support strategic decision making. The warehouse is the central point of data integration for business intelligence. It is the source of data for data marts within an enterprise and delivers a common view of enterprise data." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

[Enterprise Data Warehouse (EDW):] "system used for analysis and reporting that consists of central repositories of integrated data from a wide spectrum of different sources." (Francesco Corea, "An Introduction to Data: Everything You Need to Know About AI, Big Data and Data Science", 2019)

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management’s decision-making process." (Piethein Strengholt, "Data Management at Scale", 2020)

"A data warehouse is a central storage for all data that an enterprise’s various business systems collect." (Logi Analytics) [source]

"A data warehouse is a data management solution to store large quantities of historical business data, performing queries to support various business intelligence and analytics use cases." (Qlik) [source]

"A data warehouse is a database designed to enable business intelligence activities: it exists to help users understand and enhance their organization's performance. It is designed for query and analysis rather than for transaction processing, and usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources." (Oracle)

"A data warehouse is a relational database that is designed for analytical rather than transactional work. It collects and aggregates data from one or many sources so it can be analyzed to produce business insights. It serves as a federated repository for all or certain data sets collected by a business’s operational systems." (snowflake) [source]

"A data warehouse is a repository for data generated and collected by an enterprise's various operational systems." (Techtarget) [source]

"A data warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs." (Gartner)

"A data warehouse is a system used to do quick analysis of business trends using data from many sources." (KDnuggets)

[Big data warehouse:] "A specialized, cohesive set of data repositories andplatforms that supports a broad variety of analytics running on-premises, inthe cloud, or in a hybrid environment. BDW leverages traditional and new bigdata technologies such as Hadoop, Spark, columnar and row-based datawarehouses, ETL and streaming, and elastic in-memory and storage frameworks." (Forrester)

[Cloud data warehouse:] "An on-demand, secure, and scalable self-service data warehouse that automates the provisioning, administration, tuning, backup, and recovery to accelerate analytics and actionable insights while minimizing administration requirements." (Forrester)

"A database, typically very large, containing the historical data of an enterprise. Used for decision support or business intelligence, it organizes data and allows coordinated updates and loads." (Microstrategy)

"A large store of data drawing from a wide range of sources that can be processed, split, and analyzed to extract insights that guide management decisions. Data warehouses are typically relational databases that contain historical data and are designed for query and analysis." (Insight Software)

"A record of an enterprise’s past transactional and operational information, stored in a database. Data warehousing is not meant for current 'live' data; rather, data from the production databases are copied to the data warehouse so that queries can be performed without disturbing the performance or the stability of the production systems." (Appian)

"A system used for data analytics. They are a central location of integrated data from other more disparate sources, storing both current (real-time) and historical data which can then be used to create trends reports." (Solutions Review)

"An implementation of an informational database used to store sharable data sourced from an operational database-of-record. It is typically a subject database that allows users to tap into a company's vast store of operational data to track and respond to business trends and facilitate forecasting and planning efforts." (Information Management)

"The database that stores operations data for long periods of time." (Microsoft)

[Enterprise data warehouse:] "A repository of information that is used for reporting and analytics. It includes key data management functions, such as concurrency, security, storage, processing, SQL access, and integration." (Forrester)

[Enterprise data warehouse:] "a single database or set of databases that allow all of an organisation’s data to be stored in a central repository ideally in relationship to each other." (BI System Builders)

"In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons. The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc.)" (Wikipedia)

04 May 2019

🧊Data Warehousing: Architecture (Part I: Push vs. Pull)

Data Warehousing

In data integrations, data migrations and data warehousing there is the need to move data between two or more systems. In the simplest scenario there are only two systems involved, a source and a target system, though there can be complex scenarios in which data from multiple sources need to be available in a common target system (as in the case of data warehouses/marts or data migrations), or data from one source (e.g. ERP systems) need to be available in other systems (e.g. Web shops, planning systems), or there can be complex cases in which there is a many-to-many relationship (e.g. data from two ERP systems are consolidated in other systems).  

The data can flow in one direction from the source systems to the target systems (aka unidirectional flow), though there can be situations in which once the data are modified in the target system they need to flow back to the source system (aka bidirectional flow), as in the case of planning or product development systems. In complex scenarios the communication may occur multiple times within same process until a final state is reached.

Independently of the number of systems and the type of communication involved, data need to flow between the systems as smooth as possible, assuring that the data are consistent between the various systems and available when needed. The architectures responsible for moving data between the sources are based on two simple mechanisms - push vs pull – or combinations of them.

A push mechanism makes data to be pushed from the source system into the target system(s), the source system being responsible for the operation. Typically the push can happen as soon as an event occurs in the source system, event that leads to or follows a change in the data. There can be also cases when is preferred to push the data at regular points in time (e.g. hourly, daily), especially when the changes aren’t needed immediately. This later scenario allows to still make changes to the data in the source until they are sent to other system(s). When the ability to make changes is critical this can be controlled over specific business rules.

A pull mechanism makes the data to be pulled from the source system into the target system, the target systems being responsible for the operation. This usually happens at regular points in time or on demand, however the target system has to check whether the data have been changed.

Hybrid scenarios may involve a middleware that sits between the systems, being responsible for pulling the data from the source systems and pushing them into the targets system. Another hybrid scenario is when the source system pushes the data to an intermediary repository, the target system(s) pulling the data on a need basis. The repository can reside on the source, target on in-between. A variation of it is when the source informs the target that a change happened at it’s up to the target to decide whether it needs the data or not.

The main differentiators between the various methods is the timeliness, completeness and consistency of the data. Timeliness refers to the urgency with which data need to be available in the target system(s), completeness refers to the degree to which the data are ready to be sent, while consistency refers to the degree the data from the source are consistent with the data from the target systems.

Based on their characteristics integrations seem to favor push methods while data migrations and data warehousing the pull methods, though which method suits the best depends entirely on the business needs under consideration.

03 February 2010

🕋Data Warehousing: Data Mart [DM] (Definitions)

"A subset of the contents of a data warehouse, stored within its database. A data mart tends to contain data focused at the department level, or on a specific business area. It is frequently implemented to manage the volume and scope of data." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

"A data warehouse, or repository, whose scope is limited to a single subject area." (William A Giovinazzo, "Internet-Enabled Business Intelligence", 2002)

"A type of data warehouse with data specifically designed for a defined set of functions." (Margaret Y Chu, "Blissful Data ", 2004)

[dependent data mart:] "A data mart that obtains its source data from an Enterprise Data Warehouse." (Margaret Y Chu, "Blissful Data ", 2004)

[independent data mart:] "A data mart that obtains its source data from operational systems or other external media." (Margaret Y Chu, "Blissful Data ", 2004)

"A database that contains a copy of operational data, organized to support analysis of a business process. A data mart may be a subject area within an enterprise data warehouse, or an analytic database that is departmentally focused. When not planned as part of an enterprise data warehouse, a data mart may become a stovepipe. When deployed as an adjunct to a normalized data warehouse, a data mart may contain aggregated data. When built around a conformance bus, the data mart is neither a stovepipe nor an aggregation." (Christopher Adamson, "Mastering Data Warehouse Aggregates", 2006)

[stovepipe data mart:] "A departmentally focused data warehouse implementation that does not interoperate with other subject areas. Stovepipes are avoided through the design of a data warehouse bus - a set of conformed dimensions used consistently across subject areas." (Christopher Adamson, "Mastering Data Warehouse Aggregates", 2006)

"A platform that maintains data for analysis by a single organization or user group for a specific set of business purposes." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"A departmentalized structure of data feeding from the data warehouse where data is denormalized based on the department’s need for information." (William H Inmon & Anthony Nesavich, "Tapping into Unstructured Data", 2007)

"A data mart is a specialized database containing a subset of data from a data warehouse that is needed for a particular business purpose. A data mart is used for reporting and analysis of business data." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"A smaller data warehouse that holds data of interest to a particular group." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"A specialized type of data warehouse that works with a specific set of data to answer a specific need. A data mart is designed to provide quick, easy access to crucial data." (Stuart Mudie et al, "BusinessObjects™ XI Release 2 for Dummies", 2008) 

"A collection of related data from internal and external sources, transformed, integrated, and stored for the purpose of providing strategic information to a specific set of users in an enterprise." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"A database in a data warehouse configuration that holds a subset of data specifically organized for a particular kind of reporting." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A database structured for specific analysis and historical reporting needs." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A smaller, more specialized version of a data warehouse that includes data from a specific functional area or department." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"A decision support database supporting Business Intelligence in a limited subject area, using a dimensional data model design." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Small data warehouse designed to support a department or SBU." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"A subset of a data warehouse that is designed to focus on a specific set of business information." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"A subset of a data warehouse that’s usually oriented to a business group or team." (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

[dependent data mart:] "A data mart whose sole source of data is the data warehouse; a dependent data mart is a component of the corporate information factory" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

[independent data mart:] "A data mart whose source data comes directly from legacy systems, rather than being sourced by a data warehouse" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"Data organized to support specific needs of a user community." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"An analytical database built for and used by a business unit or department to slice and dice for analytical reporting and analysis." (Andrew Pham et al, "From Business Strategy to Information Technology Roadmap", 2016)

"A focused collection of operational data that is usually confined to a specific aspect or subject of a business, such as customers, products, or suppliers. It is a more focused decision support data store than a data warehouse." (Daniel J Power & Ciara Heavin, "Decision Support, Analytics, and Business Intelligence" 3rd Ed., 2017)

"A subset of a data warehouse that allows data to be accessed and customized by specific business functions." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"An analytical database built for and used by a business unit or department to slice and dice for analytical reporting and analysis." (Tiffany Pham et al, "From Business Strategy to Information Technology Roadmap", 2018)

"A subset of a data warehouse that contains data that is tailored and optimized for the specific reporting needs of a department or team. A data mart can be a subset of a warehouse for an entire organization, such as data that is contained in online analytical processing (OLAP) tools." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"a subset of the data warehouse used for a specific purpose. Data marts are then department-specific or related to a single line of business (LoB)."  (Francesco Corea, "An Introduction to Data: Everything You Need to Know About AI, Big Data and Data Science", 2019)

"A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers." (The Data Warehousing Institute)

"A database, usually smaller than a data warehouse, designed to help managers make strategic decisions about their business by focusing on a specific subject or department." (Microstrategy)

"A subset of the contents of a data warehouse that tends to contain data focused at the department level, or on a specific business area." (Microsoft)

"A simple data repository that houses data of a specific discipline." (Solutions Review)

"A data mart is a curated subset of data often generated for analytics and business intelligence users. Data marts are often created as a repository of pertinent information for a subgroup of workers or a particular use case." (snowflake) [source]

"A data mart is a subject-oriented database that is often a partitioned segment of an enterprise data warehouse. The subset of data held in a data mart typically aligns with a particular business unit like sales, finance, or marketing." (Talend) [source]

"A data mart is a subset of data from an enterprise data warehouse in which the relevance is limited to a specific business unit or group of users." (Informatica) [source]

"A data mart is a subset of data stored within the overall data warehouse, for the needs of a specific team, section or department within the business enterprise. […] Data marts make it much easier for individual departments to access key data insights more quickly and helps prevent departments within the business organization from interfering with each other’s data." (Sisense) [source]

"A data mart is the access layer of a data warehouse that is used to provide users with data. Data marts are often seen as small slices of the data warehouse. Data warehouses typically house enterprise-wide data, and information stored in a data mart usually belongs to a specific department or team." (Logi Analytics) [source]

"A data mart serves the same role as a data warehouse, but it is intentionally limited in scope. It may serve one particular department or line of business." (Oracle)

"The data mart is a subject-oriented slice of the data warehouse logical model serving a narrow group of users. Many data marts only need a subset of data from the full tables in the data warehouse." (Teradata) [source]

02 January 2015

📊Business Intelligence: Business Intelligence [BI] (Definitions)

"Throughout Holland, Flanders, France, and Germany, he maintained a complete and perfect train of business intelligence. The news of the many battles fought was thus received first by him, and the fall of Namur added to his profits, owing to his early receipt of the news." (Richard M Devens, "Cyclopaedia of Commercial and Business Anecdotes", 1865) [first usage of the term] 

"An automatic system is being developed to disseminate information to the various sections of any industrial, scientific or government organization. This intelligence system will utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the ‘action points’ in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points. […] All of these techniques are based on statistical procedures which can be performed on present-day data processing machines. Together with proper communication facilities and input-output equipment a comprehensive system may be assembled to accommodate all information problems of an organization. We call this a Business Intelligence System." (Hans P Luhn, "A Business Intelligence System", IBM Journal, 1958)  [first usage of the term in modern context] 

"The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as ‘the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal’." (Hans P Luhn,"A Business Intelligence System", IBM Journal, 1958)

"The process of accessing and analyzing data and using it to make better business decisions. Business intelligence distinguishes the use of data, which may or may not be valuable, with the use of information, which is always of value in business decisions." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A generic term to describe leveraging the organization’s internal and external information assets for making better business decisions." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"The processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business action. Business intelligence encompasses data warehousing, business analytic tools, and content/knowledge management." (Data Warehousing Institute, 2002)

"Thinking abstractly about an organization, reasoning about the business, organizing large quantities of information about the business in order to define and execute a strategy." (William A Giovinazzo, "Internet-Enabled Business Intelligence", 2002)

"Business intelligence is the set of processes and data structures used to analyze data and information used in strategic decision support. The components of Business Intelligence are the data warehouse, data marts, the DSS interface and the processes to 'get data in' to the data warehouse and to 'get information out'." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"The set of products or services used to access and analyze data to turn them into information or knowledge enhancement. It includes decision support and data warehousing." (Margaret Y Chu, "Blissful Data", 2004)

"A category of applications and technologies to guide the analysis and use of detailed business data for improved business decision making. The term is sometimes used synonymously with decision support, though business intelligence is technically much broader." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"An approach to management that allows an organization to define what information is useful and relevant to its corporate decision making. Business intelligence helps decision makers make better decisions faster by converting data into information." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"Business Intelligence is defined as getting the right information to the right people at the right time. The term encompasses all the capabilities required to turn data into intelligence that everyone in your organization can trust and use for more effective decision making."(Stefanie V Gerlach et al, "Business Intelligence Competency Centers", 2006)

"The part of information technology that focuses on reporting and analysis currently goes by the name business intelligence (BI)." (Stephen Few, "Information Dashboard Design", 2006)

"Business information and business analyses within the context of key business processes that lead to decisions and actions and which result in improved business performance." (Steve Williams & Nancy Williams, "The Profit Impact of Business Intelligence", 2007)

"The activity of converting data into information." (William H Inmon & Anthony Nesavich, "Tapping into Unstructured Data", 2007)

"Business Intelligence is a method of storing and presenting key enterprise data so that anyone in your company can quickly and easily ask questions of accurate and timely data. Effective BI allows end users to use data to understand why your business got the particular results that it did, to decide on courses of action based on past data, and to accurately forecast future results." (Lynn Langit, "Foundations of SQL Server 2005 Business Intelligence", 2007)

"A generic term to describe leveraging the organization’s internal and external information assets to support improved business decision making. Some commentators use the term business intelligence to refer only to the reporting and analysis of data stored in the data warehouse. Because the industry has not reached agreement, we consistently use the phrase data warehouse/business intelligence (DW/BI) to mean the complete end-to-end system. Though some would argue that you can theoretically deliver BI without a data warehouse, and vice versa, that is ill-advised from our perspective. Linking the two together in the DW/BI acronym further reinforces their dependency." (Ralph Kimball, "The Data Warehouse Lifecycle Toolkit", 2008)

"A method used to analyze and interpret business performance data so that fact-based business decisions can be made. The business data referred to in BI is usually extracted from a variety of domains and databases, and presented in a way to bring about more efficient analysis." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"A somewhat generic term used for computer programs that store, analyze, and broadcast data to users to answer business questions."  (Stuart Mudie et al, "BusinessObjects™ XI Release 2 for Dummies", 2008)

"Business intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision making." (Boris Evelson, Forrester Research, 2008)

"Skills and technologies used to help organizations make better decisions by better understanding their business, their market, and their customers." (Tony Fisher, "The Data Asset", 2009)

"The collection of one or more reports or analyses, using data from the data warehouse, that provide insight into the performance of a business organization. These reports and analyses are typically interactive to enable further understanding of specific areas of interest. They are used to support business professionals in their decision-making processes." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"BI combines products, technology, and methods to organize key information that management needs to improve profit and performance. More broadly, we think of BI as business information and business analyses within the context of key business processes that lead to decisions and actions and that result in improved business performance. In particular, BI means leveraging information assets within key business processes to achieve improved business performance." (Nancy Williams & Steve Williams, "The Profit Impact of Business Intelligence", 2010)

"Focuses on the collection of those transactions and forming them into a database structure that facilitates analysis." (Anthony D Giordano, "Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture", 2010)

"Generally used synonymously with the information available in an enterprise for making strategic decisions." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"Using computer software systematically, throughout an organization, to get a handle on the mountains of data that flow from modern business. BI turns the raw data into ready-to-use business information that becomes an ongoing part of strategic decision-making." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"Software that enables users to obtain enterprise-wide information for reporting, analytics, data mining, benchmarking, business performance management, and predictive analytics in order to support business decision making." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"This is a term that describes a broad variety of analytical applications used by an enterprise to get intelligent and meaningful insight into how the business performed in the past or is currently performing. This insight is typically used to make decisions, giving a business a competitive advantage. BI covers a broad field such as Data Warehousing, data marts, text analytics, data mining, or business reporting to name just a few." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"A collection of data analysis methods and techniques used by businesses to improve decision making, forecasting, and operational processes in order to gain a competitive advantage in the marketplace." (John R Talburt, "Entity Resolution and Information Quality", 2011)

"A comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information used to support business decision making." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Category of applications for gathering, storing, analyzing, and providing access to data to help enterprise users make better decisions." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"Software products that create integrated systems across an organization or between an organization and its customers and suppliers to improve management of employee teams, customer service, and supply chains. May be used for strategic planning, budgeting, financial consolidation, decision support, and reporting to support diagnostic and interactive controls." (Leslie G Eldenburg & Susan K Wolcott, "Cost Management 2nd Ed", 2011)

[Strategic BI:] The application of BI tools to provide metrics to executives, often in conjunction with some formal method of business performance management, to help determine if a corporation is on target for meeting its goals and objectives. Used to support long-term corporate goals and objectives." (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2010)

"Business intelligence (BI) is a set of techniques that takes business data and creates information from those data so that managers can make decisions. In that way, organizations create business intelligence." (Michael S Gendron, "Business Intelligence Applied", 2012)

"Business intelligence taps information systems to extract and report data in organized ways that are helpful to decision makers." (John R Schermerhorn Jr, "Management" 12th Ed., 2012)

"Computer-based techniques used in identifying, extracting, and analyzing business data. Common functions of BI technologies are reporting, online analytical processing (OLAP), analytics, data and process mining, complex event processing, business performance management, benchmarking, and predictive analytics." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed., 2012)

"A broad classification of information-systems-based technologies that support the identification and presentation of insight. Common historical usage referred primarily to reporting-focused systems, but usage of the term has been broadened by some to include all forms of insight generation (including exploratory data analysis and predictive analytics)." (Evan Stubbs, Delivering Business Analytics: Practical Guidelines for Best Practice, 2013)

"A term often used to describe the range of analysis approaches used to process business data." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"A broad category of applications and technologies for reporting, analyzing, and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support systems, query and reporting, and online analytical processing (OLAP)." (Jim Davis & Aiman Zeid, "Business Transformation", 2014)

"A process for improving the decision-making process through enhanced data analysis." (Owen P. Hall Jr., "Teaching and Using Analytics in Management Education", 2014)

"Business intelligence is a set of theories, methodologies, architectures, and technologies that transform raw data into meaningful and useful information for business purposes."(Keith Holdaway, "Harness Oil and Gas Big Data with Analytics", 2014)

"The ability to collect, integrate, and organize the data in a way which received by the right source, at the right time, and via the right tool. It provides basic insights about the data by regenerating reports, queries, alerts, etc." (Shokoufeh Mirzaei, Defining a Business-Driven Optimization Problem, 2014) 

"The practice of reporting what has happened, analyzing contributing data to determine why it happened, and monitoring new data to determine what is happening now. It may include data summarization, visualization, and data interactions capability. Also known as descriptive analytics and reporting." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"A broad category of applications, technologies, and processes for integrated acquisition, interpretation, collation, analysis, and exploitation of data to help business users make better decisions in order to improve business operations, reduce uncertainty and apply past experience to develop an exact understanding of business dynamics." (Mandana Farzaneh et al, "Using Fuzzy Logic for Optimizing Business Intelligence Success in Multiple Investment Combinations", 2015)

"Business Intelligence, the set of tools and structures related to the management and the use of data for operational or analytical (decision-making) purposes." (Fernando Iafrate, "From Big Data to Smart Data", 2015)

"Business intelligence is a broad set of information technology (IT) solutions that includes tools for gathering, analyzing, and reporting information to the users about performance of the organization and its environment." (Anil K. Maheshwari, "Business Intelligence and Data Mining", 2015)

"Raw data derived from manufacturing and other business processes that has been organized and structured into meaningful information on which decisions can be based." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"Business intelligence is the process of delivering actionable business decisions from analytical manipulation and presentation of data within the confines of a business environment." (Ahmed Sherif, "Practical Business Intelligence", 2016)

"BI is a popularized, umbrella term that describes a set of concepts and methods used to improve business decision making by using fact-based support systems. The term is sometimes used interchangeably with briefing books and executive information systems. A Business Intelligence system is a data-driven DSS." (Daniel J Power & Ciara Heavin, "Decision Support, Analytics, and Business Intelligence" 3rd Ed., 2017)

"Umbrella term that describe a set of concepts and methods to improve business decision making by using fact-based ­decision support systems. Also, refers to a category of software tools that can be used to extract and analyze data from corporate databases." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

"Business intelligence is getting the right information to the right people at the right time so they can make decisions that ultimately improve performance." (Satyadhyan Chickerur et al, "Forecasting the Demand of Agricultural Crops/Commodity Using Business Intelligence Framework", 2019)

"A technological driven process for analyzing data and presenting information, in such a way that user can take immediate actions and unable decision making." (Neha Garg & Kamlesh Sharma, "Machine Learning in Text Analysis", 2020)

"A set of processes, technologies and tools comprising data warehousing, On-Line Analytical Processing, and information delivery in order to turn data into information and information into knowledge." (Nenad Stefanovic, "Big Data Analytics in Supply Chain Management", 2021)

"A catchall term encompassing a variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources. BI can be used to prepare data for analysis, develop and run queries, and create reports, dashboards and visualizations with the end goal of providing results to decision makers and end users." (Insight Software)

"A process for analyzing data and presenting actionable insights to stakeholders in order to help them make more informed business decisions." (Solutions Review)

"A set of methodologies, processes, architectures, and technologies - supported by organizational structures, roles, and responsibilities - that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision making that contribute to improving overall enterprise performance." (Forrester)

"Encompasses the technologies, applications and practices used in the collection, integration, analysis, and presentation of business information to support better business decision-making." (Accenture)

"Uses technologies, processes, and applications to analyze mostly internal, structured data and business processes to support decision-making. Common functions are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, benchmarking, text mining, predictive analytics, and prescriptive analytics." (Board International)

"The activity of taking data from source systems and turning it into valuable information for business users." (BI System Builders)

"The applications, infrastructure, tools or processes for analyzing data and presenting information to help company executives, managers and others make more informed business decisions." (KDnuggets)

"Business intelligence (BI) combines business analytics, data mining, data visualization, data tools and infrastructure, and best practices to help organizations to make more data-driven decisions." (Tableau) [source]

"Business Intelligence (BI) encompasses the technologies, applications and practices used in the collection, integration, analysis, and presentation of business information to support better business decision-making." (Accenture)

"Business intelligence (BI) includes the applications, infrastructure, tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance." (Tibco) [source]

"Business intelligence involves using software to analyze data so companies can make informed decisions." (Xplenty) [source]

"Business Intelligence (BI), is a methodology which covers the compiling, analyzing and interpreting of business data in order to make better-informed decisions. BI data tends to be put together through extensive research across a wide range of sources like industry reports, customer feedback, actual usage data of the company’s products, and competitive research." (kloudless)

"Business intelligence is actually an environment in which business users receive data that is reliable, consistent, understandable, easily manipulated and timely. With this data, business users are able to conduct analyses that yield overall understanding of where the business has been, where it is now and where it will be in the near future. Business intelligence serves two main purposes. It monitors the financial and operational health of the organization (reports, alerts, alarms, analysis tools, key performance indicators and dashboards). It also regulates the operation of the organization providing two-way integration with operational systems and information feedback analysis." (Information Management)

"BI is a broad term that encompasses data mining, process analysis, performance benchmarking, and descriptive analytics. BI parses all the data generated by a business and presents easy-to-digest reports, performance measures, and trends driving management decisions. Business intelligence addresses the needs of casual users, including executives, managers, front-line workers, customers and suppliers. It delivers reports, dashboards and scorecards that are tailored to each user’s role and populated with metrics aligned with strategic objectives and goals. This top-down style is powered by a classic data warehousing structure that consolidates enterprise data and enforces information consistency by transforming shared data into a common data model (schema) and BI semantic layer (metadata)." (Teradata) [source]

"Business intelligence is a data-driven process for analyzing and understanding how organizations work and make better decisions based on real insights. Business intelligence, or BI, has become a popular term across industries, but it is a catch-all term that encompasses various processes, tools, and methodologies that let companies capture data, analyze it, and derive better answers to key questions." (Sisense) [source]

"Business intelligence is a software-driven process allowing organizations to analyze raw data from multiple sources, extracting insights that lead to more effective business decisions.  […] While the term 'business intelligence' describes both a methodology and a category of enterprise software, the primary activity in business intelligence is data analysis. Business intelligence tools and applications correlate data about business performance and process it to determine the best course of action for a wide range of business functions." (Informatica) [source]

"Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information which helps executives, managers and other corporate end users make informed business decisions. BI encompasses a wide variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources, prepare it for analysis, develop and run queries against that data and create reports, dashboards and data visualizations to make the analytical results available to corporate decision-makers, as well as operational workers." (Techtarget) [source]

"Business intelligence (BI) leverages software and services to transform data into actionable insights that inform an organization’s strategic and tactical business decisions. BI tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts and maps to provide users with detailed intelligence about the state of the business." (CIO) [source]

"Business intelligence (BI) is the collection of processes, technologies, skills, and applications used to make informed, data-driven business decisions. BI includes data collection, data aggregation, analysis, and meaningful presentation that facilitates decision-making." (Talend) [source]

"Business intelligence is the process by which enterprises use strategies and technologies for analyzing current and historical data, with the objective of improving strategic decision-making and providing a competitive advantage." (OmiSci) [source]

"[...] business intelligence is the process of collecting business data and turning it into information that is meaningful and actionable towards a strategic goal. Or put even more simply, BI is the effective use of data and information to make sound business decisions." (Logi Analytics) [source]

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.