"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)
"To interpret and understand information over time, a
whole new dimension of context is required. While content of information
remains important, the comparison and understanding of information over time
mandates that context be an equal partner to content. And in years past,
context has been an undiscovered, unexplored dimension of information."
"When management receives the conflicting reports, it is
forced to make decisions based on politics and personalities because neither
source is more or less credible. This is an example of the crisis of data
credibility in the naturally evolving architecture." (William H Inmon, "Building
the Data Warehouse" 4th Ed., 2005)
"An interesting aspect of KPIs are that they change over time. At one moment in time the organization is interested in profitability. There will be one set of KPIs that measure profitability. At another moment in time the organization is interested in market share. There will be another set of KPIs that measure market share. As the focus of the corporation changes over time, so do the KPIs that measure that focus." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)
"Both the ODS and a data warehouse contain subject-oriented, integrated information. In that regard they are similar. But an ODS contains data that can be individually updated, deleted, or added. And a data warehouse contains nonvolatile data. A data warehouse contains snapshots of data. Once the snapshot is taken, the data in the data warehouse does not change. So when it comes to volatility, a data warehouse and an ODS are very different." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)
"In general, analytic processing is known as 'heuristic' processing. In heuristic processing the requirements for analysis are
discovered by the results of the current iteration of processing. […] In
heuristic processing you start with some requirements. You build a system to
analyze those requirements. Then, after you have results, you sit back and
rethink your requirements after you have had time to reflect on the results
that have been achieved. You then restate the requirements and redevelop and
reanalyze again. Each time you go through the redevelopment exercise is called
an 'iteration'. You continue the process of building different iterations of
processing until such time as you achieve the results that satisfy the
organization that is sponsoring the exercise."
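The analyze-reflect-restate loop described above can be sketched as a simple control structure. All function names here are hypothetical placeholders for whatever analysis the organization actually performs:

```python
# A minimal sketch of heuristic (iterative) analytic processing.
# analyze, satisfied, and refine are hypothetical callables supplied by the user.

def heuristic_analysis(initial_requirements, analyze, satisfied, refine,
                       max_iterations=10):
    """Repeat build-analyze-rethink cycles until the sponsor is satisfied."""
    requirements = initial_requirements
    for iteration in range(1, max_iterations + 1):
        results = analyze(requirements)      # build and run this iteration
        if satisfied(results):               # sponsoring organization accepts
            return results, iteration
        # Reflect on the results and restate the requirements.
        requirements = refine(requirements, results)
    return results, max_iterations           # stop even if never satisfied
```

The key property is that `refine` sees the current results, so the next iteration's requirements are discovered from the previous iteration's output, which is what distinguishes heuristic processing from fixed-requirements development.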
"There are, however, many problems with independent data
marts. Independent data marts: (1) Do not have data that can be reconciled with
other data marts (2) Require their own independent integration of raw data (3) Do
not provide a foundation that can be built on whenever there are future
analytical needs."
"There is then a real mismatch between the volume of data
and the business value of data. For people who are examining repetitive data
and hoping to find massive business value there, there is most likely
disappointment in their future. But for people looking for business value in
nonrepetitive data, there is a lot to look forward to." (William H Inmon &
Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data,
Data Warehouse and Data Vault", 2015)
"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"At first, we threw all of this data into a pit called the 'data lake'. But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful - to be analyzed - data needed to (1) be related to each other and (2) have its analytical infrastructure carefully arranged and made available to the end user. Unless we meet these two conditions, the data lake turns into a swamp, and swamps start to smell after a while. [...] In a data swamp, data just sits there are no one uses it. In the data swamp, data just rots over time." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"Data privacy, data confidentiality, and data protection are sometimes incorrectly diluted with security. For example, data privacy is related to, but not the same as, data security. Data security is concerned with assuring the confidentiality, integrity, and availability of data. Data privacy focuses on how and to what extent businesses may collect and process information about individuals." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"Data visualization adds credibility to any message. [...] Data visualizations are incredibly cold mediums because they require a lot of interpretation and participation from the audience. While boring numbers are authoritative, data visualization is inclusive. [...] Data visualizations absorb the viewer in the chart and communicate the author’s credibility through active participation. Like a good teacher, they walk the reader through the thought process and convince him/her effortlessly."
"Data visualization‘s key responsibilities and challenges include the obligation to earn your audience’s attention - do not take it for granted." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"In general, a data or data set contains its sensitivity or controversial nature only if it is linked or related to an individual’s personal information. Else an isolated, abandoned, or unrelated sensitive or controversial attribute has no significance."
"It is dangerous to do an analysis and merge data with very different quality profiles. As a general rule, the veracity of merged data is only as good as the worst data that has been merged. [...] Not knowing the quality of the data being analyzed jeopardizes the entire analysis." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"Once you combine the data lake along with analytical infrastructure, the entire infrastructure can be called a data lakehouse. [...] The data lake without the analytical infrastructure simply becomes a data swamp. And a data swamp does no one any good." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] "The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"Raw data without appropriate visualization is like dumped construction raw materials at a building construction site. The finished house is the actual visuals created from those data like raw materials." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
"With the data lakehouse, it is possible to achieve a level of analytics and machine learning that is not feasible or possible any other way. But like all architectural structures, the data lakehouse requires an understanding of architecture and an ability to plan and create a blueprint." (Bill Inmon et al, "Building the Data Lakehouse", 2021)