23 November 2006

🔢Saurabh Gupta - Collected Quotes

"A data warehouse follows a pre-built static structure to model source data. Any changes at the structural and configuration level must go through a stringent business review process and impact analysis. Data lakes are very agile. Consumption or analytical layer can be modified to fit in the model requirements. Consumers of a data lake are not constant; therefore, schema and modeling lies at the liberty of analysts and scientists." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data in the data lake should never get disposed. Data driven strategy must define steps to version the data and handle deletes and updates from the source systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data governance policies must not enforce constraints on data - Data governance intends to control the level of democracy within the data lake. Its sole purpose of existence is to maintain the quality level through audits, compliance, and timely checks. Data flow, either by its size or quality, must not be constrained through governance norms. [...] Effective data governance elevates confidence in data lake quality and stability, which is a critical factor to data lake success story. Data compliance, data sharing, risk and privacy evaluation, access management, and data security are all factors that impact regulation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake induces accessibility and catalyzes availability. It warrants data discovery platforms to soak the data trends at a horizontal scale and produce visual insights. It largely cuts down the time that goes into data preparation and exhaustive data analysis." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake is a single window snapshot of all enterprise data in its raw format, be it structured, semi-structured, or unstructured. Starting from curating the data ingestion pipeline to the transformation layer for analytical consumption, every aspect of data gets addressed in a data lake ecosystem. It is supposed to hold enormous volumes of data of varied structures." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data lake is an ecosystem for the realization of big data analytics. What makes data lake a huge success is its ability to contain raw data in its native format on a commodity machine and enable a variety of data analytics models to consume data through a unified analytical layer. While the data lake remains highly agile and data-centric, the data governance council governs the data privacy norms, data exchange policies, and the ensures quality and reliability of data lake." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data swamp, on the other hand, presents the devil side of a lake. A data lake in a state of anarchy is nothing but turns into a data swamp. It lacks stable data governance practices, lacks metadata management, and plays weak on ingestion framework. Uncontrolled and untracked access to source data may produce duplicate copies of data and impose pressure on storage systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Metadata is the key to effective data governance. Metadata in this context is the data that defines the structure and attributes of data. This could mean data types, data privacy attributes, scale, and precision. In general, quality of data is directly proportional to the amount and depth of metadata provided. Without metadata, consumers will have to depend on other sources and mechanisms." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"The quality of data that flows within a data pipeline is as important as the functionality of the pipeline. If the data that flows within the pipeline is not a valid representation of the source data set(s), the pipeline doesn’t serve any real purpose. It’s very important to incorporate data quality checks within different phases of the pipeline. These checks should verify the correctness of data at every phase of the pipeline. There should be clear isolation between checks at different parts of the pipeline. The checks include checks like row count, structure, and data type validation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

22 November 2006

🎯William H Inmon - Collected Quotes

"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"To interpret and understand information over time, a whole new dimension of context is required. While content of information remains important, the comparison and understanding of information over time mandates that context be an equal partner to content. And in years past, context has been an undiscovered, unexplored dimension of information." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"When management receives the conflicting reports, it is forced to make decisions based on politics and personalities because neither source is more or less credible. This is an example of the crisis of data credibility in the naturally evolving architecture." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"An interesting aspect of KPIs are that they change over time. At one moment in time the organization is interested in profitability. There will be one set of KPIs that measure profitability. At another moment in time the organization is interested in market share. There will be another set of KPIs that measure market share. As the focus of the corporation changes over time, so do the KPIs that measure that focus." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"Both the ODS and a data warehouse contain subject-oriented, integrated information. In that regard they are similar. But an ODS contains data that can be individually updated, deleted, or added. And a data warehouse contains nonvolatile data. A data warehouse contains snapshots of data. Once the snapshot is taken, the data in the data warehouse does not change. So when it comes to volatility, a data warehouse and an ODS are very different." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"In general, analytic processing is known as 'heuristic' processing. In heuristic processing the requirements for analysis are discovered by the results of the current iteration of processing. […] In heuristic processing you start with some requirements. You build a system to analyze those requirements. Then, after you have results, you sit back and rethink your requirements after you have had time to reflect on the results that have been achieved. You then restate the requirements and redevelop and reanalyze again. Each time you go through the redevelopment exercise is called an 'iteration'. You continue the process of building different iterations of processing until such time as you achieve the results that satisfy the organization that is sponsoring the exercise." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There are, however, many problems with independent data marts. Independent data marts: (1) Do not have data that can be reconciled with other data marts (2) Require their own independent integration of raw data (3) Do not provide a foundation that can be built on whenever there are future analytical needs." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There is then a real mismatch between the volume of data and the business value of data. For people who are examining repetitive data and hoping to find massive business value there, there is most likely disappointment in their future. But for people looking for business value in nonrepetitive data, there is a lot to look forward to." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"At first, we threw all of this data into a pit called the 'data lake'. But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful - to be analyzed - data needed to (1) be related to each other and (2) have its analytical infrastructure carefully arranged and made available to the end user. Unless we meet these two conditions, the data lake turns into a swamp, and swamps start to smell after a while. [...] In a data swamp, data just sits there are no one uses it. In the data swamp, data just rots over time." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data privacy, data confidentiality, and data protection are sometimes incorrectly diluted with security. For example, data privacy is related to, but not the same as, data security. Data security is concerned with assuring the confidentiality, integrity, and availability of data. Data privacy focuses on how and to what extent businesses may collect and process information about individuals." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data visualization adds credibility to any message. [...] Data visualizations are incredibly cold mediums because they require a lot of interpretation and participation from the audience. While boring numbers are authoritative, data visualization is inclusive. [...] Data visualizations absorb the viewer in the chart and communicate the author’s credibility through active participation. Like a good teacher, they walk the reader through the thought process and convince him/her effortlessly."

"Data visualization‘s key responsibilities and challenges include the obligation to earn your audience’s attention - do not take it for granted." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"In general, a data or data set contains its sensitivity or controversial nature only if it is linked or related to an individual’s personal information. Else an isolated, abandoned, or unrelated sensitive or controversial attribute has no significance."

"It is dangerous to do an analysis and merge data with very different quality profiles. As a general rule, the veracity of merged data is only as good as the worst data that has been merged. [...] Not knowing the quality of the data being analyzed jeopardizes the entire analysis." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Once you combine the data lake along with analytical infrastructure, the entire infrastructure can be called a data lakehouse. [...] The data lake without the analytical infrastructure simply becomes a data swamp. And a data swamp does no one any good." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] "The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Raw data without appropriate visualization is like dumped construction raw materials at a building construction site. The finished house is the actual visuals created from those data like raw materials." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"With the data lakehouse, it is possible to achieve a level of analytics and machine learning that is not feasible or possible any other way. But like all architectural structures, the data lakehouse requires an understanding of architecture and an ability to plan and create a blueprint." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

21 November 2006

🔢Angelika Klidas - Collected Quotes

"Also, remember that data literacy is not just a set of technical skills. There is an equal need and weight for soft skills and business skills. This can be misleading for some technical resources within an organization, as those technical resources may believe they are data literate by default as they are data architects or data analysts. They have the existing technical skills, but maybe they do not have any deep proficiencies in other skills such as communicating with data, challenging assumptions, and mitigating bias, or perhaps they do not have an open mindset to be open to different perspectives." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Critical thinking is part of data literacy; it is the ability to question the logic of arguments or assumptions and examine evidence in order to determine whether a claim is true, false, or uncertain." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Current decision-making in business suffers from insight gaps. Organizations invest in data and analytics, hoping that will provide them with insights that they can use to make decisions, but in reality, there are many challenges and obstacles that get in the way of that process. One of the biggest challenges is that these organizations tend to focus on technology and hard skills only. They are definitely important, but you will not automatically get insights and better decisions with hard skills alone. Using data to make better data-informed decisions requires not only hard skills but also soft skills as well as mindsets." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is not achieved by mastering a uniform set of competencies that applies to everyone. Those that are relevant to each individual can vary significantly depending on how they interact with data and which part of the data process they are involved in." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Decision-makers are constantly provided data in the form of numbers or insights, or similar. The challenge is that we tend to believe every number or piece of data we hear, especially when it comes from a trusted source. However, even if the source is trusted and the data is correct, insights from the data are created when we put it in context and apply meaning to it. This means that we may have put incorrect meaning to the data and then made decisions based on that, which is not ideal. This is why anyone involved in the process needs to have the skills to think critically about the data, to try to understand the context, and to understand the complexity of the situation where the answer is not limited to just one specific thing. Critical thinking allows individuals to assess limitations of what was presented, as well as mitigate any cognitive bias that they may have." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is something that affects everyone and every organization. The more people who can debate, analyze, work with, and use data in their daily roles, the better data-informed decision-making will be." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"It is also important to note that data literacy is not about expecting to or becoming an expert; rather, it is a journey that must begin somewhere." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Organizations must have a plan and vision for data literacy, which they then communicate to all employees. They will need to develop and foster a culture that embraces data literacy and data-informed decisions. They will need to provide employees with access to various learning content specific to data literacy. Along their journey, they will need to make sure they benchmark and measure progress toward their vision and celebrate successes along the way." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"People can get confused or experience anxiety when they have to work with data and analytics. As we data literacy geeks say, people shouldn’t be pushed to work with data and analytics - they should do this because they want to. [...] Visualizations that are not understood present another risk. To be successful with data and analytics, we need visualizations that are presented in a clear meaningful way. If we do not take care of the data literacy levels within an organization, we might lose our public. Therefore, it is necessary to think of the risk of overwhelming our readers/viewers." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

20 November 2006

🔢Zach Gemignani - Collected Quotes

"A culture of data fluency needs to be built on a shared understanding of the data sources, data analysis, key metrics, and data products. It requires employees to be on the same page about how data is used and why it is important." (Zach Gemignani et al, "Data Fluency", 2014)

"Any presentation of data, whether a simple calculated metric or a complex predictive model, is going to have a set of assumptions and choices that the producer has made to get to the output. The more that these can be made explicit, the more the audience of the data will be open to accepting the message offered by the presenter." (Zach Gemignani et al, "Data Fluency", 2014)

"Broadly defined, data means events that are captured and made available for analysis. A data source is a consistent record of these events. And a data product translates this record of events into something that can easily be understood. [...] Data products can be organized and characterized by a series of continuums that describe the nature of the data and how it is presented." (Zach Gemignani et al, "Data Fluency", 2014)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Creating a data fluent organization doesn’t just happen. It starts with people who love using data as a tool to improve their job performance - people who have learned to converse with others in the language of data. It needs people who expect and demand better, more useful data products from themselves and others. It starts with you." (Zach Gemignani et al, "Data Fluency", 2014)

"Data alone isn’t valuable. In fact, it can be expensive in time and resources to manage and maintain. The analysis of this data is closer to something that is valuable. A clearly communicated analysis starts to transform a reflection of the world into knowledge in the minds of people. Even so, knowledge alone does not make your organization better. It is the decisions and actions of people - based on this data-sourced knowledge - that is the goal. But these decisions are seldom made in a vacuum. In most organizations, decisions are a collaborative, social experience. People come together to discuss options, review their knowledge of the situation, and arrive at a path to go down. Herein is one of the great powers of effective data products: They can shape and guide these discussions. Conclusions are seldom clear-cut, even when there is data to support a direction." (Zach Gemignani et al, "Data Fluency", 2014)

"Data captures actions and characteristics of the real world and transforms them into something that can be examined and explored after the fact." (Zach Gemignani et al, "Data Fluency", 2014)

"Data visualizations are designed to emphasize patterns and deviations in data. In fact, each specific chart type is well suited to highlighting particular forms of insight. A skilled author of data products will choose the right visualization to emphasize a message. The data, chart, and supporting descriptions should work in harmony to point out what is interesting. The reader simply goes along for the ride." (Zach Gemignani et al, "Data Fluency", 2014)

"Goals associated with a few, well-understood key metrics is a powerful combination. For both internal and external stakeholders, there is a strong alignment between organization mission, vision, goals, and tracking of progress. The efforts of everyone can be directed at these measurable goals, and people will focus on the processes that can impact these metrics." (Zach Gemignani et al, "Data Fluency", 2014)

"In fact, the analogy to storytelling is limited when applied to communicating with data. Data visualization has fundamental characteristics missing from traditional storytelling. For example, interactive data visualizations let audiences explore information to find insights that resonate with them. Visualizations take shape based to a large extent on the underlying data. And as this data changes, the emphasis and message of the visualization is likely to change." (Zach Gemignani et al, "Data Fluency", 2014)

"Metrics can serve two purposes: identifying problems and measuring performance. When the goal is to identify problems and pinpoint areas of operational inefficiency and ineffectiveness, defining the right metric requires a bit of detective work. It requires you to uncover the data residue of a problem and to determine what evidence can be found and how exactly it shows up. When the goal is to measure performance, the right success metrics focus on measures that can be controlled and where improvement in the metric is an unambiguously good thing." (Zach Gemignani et al, "Data Fluency", 2014)

"Most discussions of decision making assume that only senior executives make decisions or that only senior executives' decisions matter. This is a dangerous mistake. Decisions are made at every level of the organization, beginning with individual professional contributors and frontline supervisors. These apparently low-level decisions are extremely important in a knowledge-based organization." (Zach Gemignani et al, "Data Fluency", 2014)

"The most common mistake in ineffective data products is an inability to make difficult decisions about what information is most important. [...] Often information gets included in data products for reasons that are superfluous to the purpose, audience, and message - reasons that cater the product to someone influential or use information that has been included historically. The bar should be higher." (Zach Gemignani et al, "Data Fluency", 2014)

"We have an inbuilt ability to manipulate visual metaphors in ways we cannot do with the things and concepts they stand for—to use them as malleable, conceptual Tetris blocks or modeling clay that we can more easily squeeze, stack, and reorder. And then - whammo! - a pattern emerges, and we’ve arrived someplace we would never have gotten by any other means." (Zach Gemignani et al, "Data Fluency", 2014)

19 November 2006

✏️Steve Wexler - Collected Quotes

"A dashboard is a visual display of data used to monitor conditions and/or facilitate understanding. (Steve Wexler et al, "The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios", 2017)

"A good test of how effective your data visualizations are: can you remove all or most of the numbers and still understand the visualization and make comparisons?" (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"A good visualization can do more than just answer questions; it can help you see that there are other questions you need to answer." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Data visualization isn’t just about informing, it’s also about persuading." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Most organizations are drowning in data but are thirsty for understanding." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"The best way to engage people and drive adoption is to provide something useful and meaningful, and one of the best ways to do this is to make it personal." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"The goal of using data visualization to make better and faster decisions may lead people to think that any data visualization that is not immediately understood is a failure. Yes, a good visualization should allow you to see things that you might have missed, and to glean insights faster, but you still have to think." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"The problem is that a pie chart does one thing well, and most people don’t use it for that one thing. Specifically, they’re great at giving you a fast and accurate estimate of the part-to-whole relationship for two of the slices. Other than that, pie charts are terrible. [...] The same strengths and shortcomings that apply to the pie chart also apply to the donut chart." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"What is the secret to getting people to use charts and dashboards? Personalization. Inserting the audience into the visualization, and making it especially meaningful and relevant to the user, never fails." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"A dashboard is not a good place to tell a story. But it's a great place to find a story worth telling!" (Steve Wexler) [in media]

🎯Stephen Few - Collected Quotes

"An effective dashboard is the product not of cute gauges, meters, and traffic lights, but rather of informed design: more science than art, more simplicity than dazzle. It is, above all else, about communication." (Stephen Few, "Information Dashboard Design", 2006)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"A signal is a useful message that resides in data. Data that isn’t useful is noise. […] When data is expressed visually, noise can exist not only as data that doesn’t inform but also as meaningless non-data elements of the display (e.g. irrelevant attributes, such as a third dimension of depth in bars, color variation that has no significance, and artificial light and shadow effects)." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Apart from the secondary benefits of digital data, which are many, such as faster and cheaper information collection and distribution, the primary benefit is better decision making based on evidence. Despite our intellectual powers, when we allow our minds to become disconnected from reliable information about the world, we tend to screw up and make bad decisions." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Data contain descriptions. Some are true, some are not. Some are useful, most are not. Skillful use of data requires that we learn to pick out the pieces that are true and useful. [...] To find signals in data, we must learn to reduce the noise - not just the noise that resides in the data, but also the noise that resides in us. It is nearly impossible for noisy minds to perceive anything but noise in data." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Signals always point to something. In this sense, a signal is not a thing but a relationship. Data becomes useful knowledge of something that matters when it builds a bridge between a question and an answer. This connection is the signal." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context."  (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Everything that informs us of something useful that we didn't already know is a potential signal. If it matters and deserves a response, its potential is actualized." (Stephen Few)

"One of the great purposes of education today is to help us filter the data, to reduce it to what's true and useful." (Stephen Few)

17 November 2006

🔢Adam Bellemare - Collected Quotes

"A data mesh is inherently multimodal, and data products can be provided via a variety of means. Event streams remain the best option for the majority of data products, as it is far easier to power both operational and analytical use cases through a stream than a batch of files at rest." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Bad data is costly to fix, and it’s more costly the more widespread it is. Everyone who has accessed, used, copied, or processed the data may be affected and may require mitigating action on their part. The complexity is further increased by the fact that not every consumer will “fix” it in the same way. This can lead to divergent results that are divergent with others and can be a nightmare to detect, track down, and rectify." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Creating data products requires that domain owners have a degree of autonomy in modeling, building, and delivering data to their consumers. However, by empowering them with autonomy and independence, you run the risk of a significant technological sprawl across data product implementations, making it more difficult for consumers to use the data products for their own ends. Federated governance focuses on finding an equilibrium between the needs of the consumers, the autonomy of the data product owners, the business compliance and security requirements, and global data product requirements." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today’s computing environments, leading to ad hoc data pipelines, cobbled together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data mesh architectures are inherently decentralized, and significant responsibility is delegated to the data product owners. A data mesh also benefits from a degree of centralization in the form of data product compatibility and common self-service tooling. Differing opinions, preferences, business requirements, legal constraints, technologies, and technical debt are just a few of the many factors that influence how we work together." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data mesh promotes data to a product with the same rigor, ownership, and feature management of any other product in your business. The free-for-all, 'figure it out yourself' data access is replaced with purpose-built, maintained, and supported modes. It is as much a social shift as it is a technological shift and requires both top-down and bottom-up buy-in." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Enforcing a schema at read time, instead of at write time, leads to a proliferation of what we call 'bad data'. The lack of write-time checks means that data written into HDFS may not adhere to the schemas that the readers are using in their existing work […]. Some bad data will cause consumers to halt processing, while other bad data may go silently undetected. While both of these are problematic, silent failures can be deadly and difficult to detect." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Federated governance can be roughly broken down into two main tasks. The first is establishing cross-organization policies, including data product standards and data handling requirements, that apply to all users of the data mesh. The second is providing guidance on creating and using data products with self-service tools to make it easy to participate in the data mesh." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The premise of the data mesh solution is simple. Publish important business data sets to dedicated, durable, and easily accessible data structures known as data products. The original creators of the data are responsible for modeling, evolution, quality, and support of the data, treating it with the same first-class care given to any other product in the organization. Prospective consumers can explore, discover, and subscribe to the data products they need for their business use cases. The data products should be well-described, easy to interpret, and form the basis for a set of self-updating data primitives for powering both business services and analytics." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The problem of bad data has existed for a very long time. Data copies diverge as their original source changes. Copies get stale. Errors detected in one data set are not fixed in duplicate ones. Domain knowledge related to interpreting and understanding data remains incomplete, as does support from the owners of the original data." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

14 November 2006

🎯Zhamak Dehghani - Collected Quotes

"A data pipeline is a series of transformation steps (functions) executed as the data flows from one step to another. Data mesh refrains from using pipelines as a top-level architectural paradigm and in between data products. The challenge with pipelines as currently used is that they don’t create clear interfaces, contracts, and abstractions that can be maintained easily as the pipeline complexity grows. Due to lack of abstractions, single failure in the pipeline causes cascading failures." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product encapsulates more than just the data. It needs to contain all the structural components needed to manifest its baseline usability characteristics - discoverable, understandable, addressable, etc. - in an autonomous fashion, while continuing to share data in a compliant and secure manner." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product’s primary job is to consume data from upstream sources using its input data ports, transform it, and serve the result as permanently accessible data via its output data ports." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data lake architecture suffers from complexity and deterioration. It creates complex and unwieldy pipelines of batch or streaming jobs operated by a central team of hyper-specialized data engineers. It deteriorates over time. Its unmanaged datasets, which are often untrusted and inaccessible, provide little value. The data lineage and dependencies are obscured and hard to track." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data is a collection of facts put together according to a model. The data model is an approximation of reality, good enough for the (analytical) tasks at hand." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In addition to limitations of scale, other challenges of data centralization are data quality and resilience to change. This is because business domains and teams that are most familiar with the data are not responsible for data quality." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] the governance function is accountable to define what constitutes data quality and how each data product communicates that in a standard way. It’s no longer accountable for the quality of each data product. The platform team is accountable to build capabilities to validate the quality of the data and communicate its quality metrics, and each domain (data product owner) is accountable to adhere to the quality standards and provide quality data products." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data management of the future must build in embracing change, by default. Rigid data modeling and querying languages that expect to put the system in a straitjacket of a never-changing schema can only result in a fragile and unusable analytics system. [...] The data management of the future must support managing and accessing data across multiple hosting platforms, by default." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-team interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh focuses on the impact of the data and not its volumes. It values data usability, data satisfaction, data availability, and data quality over the volume of the data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] data mesh introduces a fundamental shift that the owners of the data products must communicate and guarantee an acceptable level of quality and trustworthiness - specific to their domain - as an intrinsic characteristic of their data product. This means cleansing and running automated data integrity tests at the point of the creation of a data product." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is an element of a data strategy that fosters a data-driven organization to get value from data at scale." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In short, a monolithic architecture, technology, and organizational structure are not suitable for analytical data management of large-scale and complex organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed. It has high functional cohesion, i.e., performing a specific analytical transformation and securely sharing the result as domain-oriented analytical data. It has all the structural components that it requires to do its function: the transformation code, the data, the metadata, the policies that govern the data, and its dependencies to infrastructure." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"One of the limitations of data management solutions today is how we have attempted to manage its unwieldy complexity, how we have decomposed an ever-growing monolithic data platform and team to smaller partitions. We have chosen the path of least resistance, a technical partitioning." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"The distributed nature of data mesh demands immutability to give confidence to data users that (1) there is consistency between multiple data products for a point-in-time piece of data and (2) once they read data at a point in time, that data doesn’t change and they can reliably repeat the reads and processing." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"There are a set of characteristics that can be grouped together as quality. These attributes aren’t intended to define whether a data product is good or bad. They just communicate the threshold of guarantees the data product expects to meet, which may be well within an acceptable range for certain use cases." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Unlike other analytical data management paradigms, data mesh does not embrace the concept of the mythical single source of truth. Every data product provides a truthful portion of the reality - for a particular domain - to the best of its ability, a single slice of truth." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Ultimately, Data Mesh’s goal is to enable organizations to thrive in the face of the growth of data sources, growth of data users and use cases, and the increasing change in cadence and complexity. Adopting Data Mesh, organizations must thrive in agility, creating data-driven value while embracing change." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

13 November 2006

🔢Sid Adelman - Collected Quotes

"Data archeology (finding bad data), data cleansing (correcting bad data), and data quality enforcement (preventing data defects at the source) should be business objectives. Therefore, data quality initiatives are business initiatives and require the involvement of business people, such as information consumers and data originators." (Sid Adelman et al, "Data Strategy", 2005)

"Data strategy is one of the most ubiquitous and misunderstood topics in the information technology (IT) industry. Most corporations' data strategy and IT infrastructure were not planned, but grew out of 'stovepipe' applications over time with little to no regard for the goals and objectives of the enterprise. This stovepipe approach has produced the highly convoluted and inflexible IT architectures so prevalent in corporations today." (Sid Adelman et al, "Data Strategy", 2005)

"Dealing with [...] resistance is where social sensitivity, leadership, and power come into play. Social sensitivity is the ability to read the players and respond appropriately to their concerns. Leadership and power can quickly overcome most resistance to change and allow you to establish an environment and convince management to properly support the data strategy." (Sid Adelman et al, "Data Strategy", 2005)

"It is important to remember that the 'single version of the truth' - or enterprise logical data model - is not and should not be built all at once (that would take too long), but that it evolves over time as the project-specific logical data models are merged, one-by-one, a project at a time." (Sid Adelman et al, "Data Strategy", 2005)

"The chaos without a data strategy is not as obvious, but the indicators abound: dirty data, redundant data, inconsistent data, the inability to integrate, poor performance, terrible availability, little accountability, users who are increasingly dissatisfied with the performance of IT, and the general feeling that things are out of control." (Sid Adelman et al, "Data Strategy", 2005)

"The data strategist is responsible for creating and maintaining the data strategy. This includes fully understanding the strategic goals of the organization. [...] The data strategist must know (or learn) the existing environment including the important internal databases, the external data that will be integrated, and the data quality characteristics. The data strategist must be aware of the data volumes expected in the next five years. [...] The data strategist must be aware of changes in the business that will require more complex transactions and queries. He or she must also be aware of governmental factors including regulations and governmental reporting requirements. The data strategist must know about the requirements of service level agreements (SLAs) for both performance and availability and be sure that the data strategy supports those SLAs (it's also likely that the data strategist would have input into creating those SLAs.) And finally, the data strategist must be wired into the politics of the organization so that his or her proposals will be pragmatic and accepted by management and staff." (Sid Adelman et al, "Data Strategy", 2005)

"The folks in IT don't like change if they believe it will diminish the power of the IT group. This is particularly true for managers. Managers put forward countless reasons why the organization should stay as is, especially if a change can decrease the number of employees they control because managers often equate headcount to power in the organization." (Sid Adelman et al, "Data Strategy", 2005) [?!]

"The vision of a data strategy that fits your organization has to conform to the overall strategy of IT, which in turn must conform to the strategy of the business. Therefore, the vision should conform to and support where the organization wants to be in 5 years." (Sid Adelman et al, "Data Strategy", 2005)

"Working without a data strategy is analogous to a company allowing each department and each person within each department to develop its own financial chart of accounts. This empowerment allows each person in the organization to choose his own numbering scheme. Existing charts of accounts would be ignored as each person exercises his or her own creativity." (Sid Adelman et al, "Data Strategy" 1st Ed., 2005)

"You cannot boil the ocean; you have to prioritize your data integration deliverables. An enterprise-wide data integration effort must be carved up into small iterative projects, starting with the most critical data and working down to the less significant data. The business people working with the data integration team must determine which data is most appropriate for integration. Some data might not be suitable for integration at all, such as department-specific data, highly secured data, and data that is too risky to integrate. The team also needs to look at historical data and decide how much of it to include in the data integration process." (Sid Adelman et al, "Data Strategy" 1st Ed., 2005)

🔢Ian Wallis - Collected Quotes

"A data strategy is the opportunity to bring data, one of the most important assets your organisation has, to the fore and to drive the future direction of the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"A data strategy which no longer reflects the priorities of the organisation as a whole is doomed to fail, and likely to struggle to keep any momentum beyond the immediate term." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"A KPI is a performance measure that demonstrates how effectively an organisation is achieving its critical objectives. They are used to track performance over a period of time to ensure the organisation is heading in the desired direction, and are quantifiable to guide whether activities need to be dialled up or down, resources adjusted or management resource focused on understanding what is in play that may be holding back the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Culture is not something that can be read in a corporate document (though many organisations will claim to have values, beliefs and other concepts that articulate the culture as the corporate centre wants it to be seen). It is intangible and can be challenging to comprehend to those on the outside looking in. Much of it is unspoken, a series of behavioural norms which are engrained in the fabric of the organisation and drive attitudes of employees to one another, management, change programmes and any external (to the group, as well as the organisation) effort to drive change that may be resisted simply because it ‘isn’t the way we do things around here’." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Data has a value, without which an organisation is largely a shell, worthless and of limited appeal other than as a means of sweeping up fixed assets at a knock-down price. It is the lifeblood of an organisation, so whether you regard it as the water that is essential to life or the blood circulating around the body, without it our organisations are not functional." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Data strategy is even less understood [than business strategy], so the chances of success can be further decreased, simply because you need organisation-wide commitment and buy-in to succeed. Data does not exist in a bubble; it is not the preserve of a function that can fix it for all, detached from touching everyone else. It is core to how you run the organisation, and without a focus on where you are heading, it is going to trip the organisation up at every turn - regulatory compliance; operational effectiveness; financial performance; customer and employee experience; essentially, the efficiency in managing virtually every activity in the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"I am using ‘data strategy’ as an overarching term to describe a far broader set of capabilities from which sub-strategies can be developed to focus on particular facets of the strategy, such as management information (MI) and reporting; analytics, machine learning and AI; insight; and, of course, data management." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"If there is one all too common a failing in data strategies, it is the temptation to make them too detailed through either straying into implementation activities or overplaying the content by providing too much information. The key is to recognise the level of information that needs to be imparted to make the data strategy coherent and likely to be endorsed, with as little information as is necessary to be able to make the point cogently. Brevity, and associated clarity in what needs to be achieved and why, is a winning formula in gaining senior executive sponsorship." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"It is also important to regard the data strategy as a living document. Do not regard it as a masterpiece, never to be reviewed, amended or critiqued within the time frame it covers, but instead see it as a strategy that can flex to the changing demands of an organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"[...] it is always useful to learn from past mistakes, but evidence shows that most strategies fail due to an inability to follow through into execution." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"In the same vein, data strategy is often a misnomer for a much wider scope of coverage, but the lack of coherence in how we use the language has led to data strategy being perceived to cover data management activities all the way through to exploitation of data in the broadest sense. The occasional use of information strategy, intelligence strategy or even data exploitation strategy may differentiate, but the lack of a common definition on what we mean tends to lead to data strategy being used as a catch-all for the more widespread coverage such a document would typically include. Much of this is due to the generic use of the term ‘data’ to cover everything from its capture, management, governance through to reporting, analytics and insight." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Many organisations start a data strategy from a need to get data into some sort of organised state in which it is feasible to demonstrate compliance. In my opinion, compliance should be a component of a data strategy, not the data strategy in itself." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The challenge with using OKRs is to focus on just three to five objectives - sounds simple enough, but so many organisations follow the ‘if it moves, track it’ philosophy such that they can’t see the wood for the trees." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The key for a successful data strategy is to align it clearly with the corporate strategy. The data strategy is a crucial enabler of the corporate strategy, and the data strategy should clearly call out those components that have a clear line of sight to delivering, or enabling, the corporate goals. If the data strategy does not align to the corporate goals it will be a much more challenging task to get the wider organisation to buy into it, not least because it will fail to have any resonance with the objectives of the organisational leaders and be regarded as optional at best." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The KPI juggernaut has been misused and abused in too many organisations to the extent it has devalued the concept of KPIs. KPIs used well - the ten things that really matter to an organisation - can, in my experience, be a real galvanising force to get focus and attention put in those areas which really can make a difference. The rest is a distraction, there through some misplaced view that more adds value when actually it detracts through losing the focus from where it needs to be." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The nature of the change that the data strategy is to drive will be determined by the appetite and commitment of the organisation to change. It will also be shaped by the maturity of the organisation, with the maturity assessment process having identified and demonstrated where the gaps lie, and the resolve of the organisation to set its own pace and objectives to be achieved by the time of the next assessment." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The premise of OKRs is to keep objectives and results simple and flexible, ensuring they align with business goals and enterprise initiatives guided by regular reviews to assess progress during the quarter. The intent is to keep OKRs clear and accountable, as well as measurable, with between three and five objectives recommended at a high level that can each be tracked by three to five key measures. They should be ambitious goals, even uncomfortable, in challenging aspirations, making them stretch targets." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

🔢Bernard Marr - Collected Quotes

"A good data strategy is not determined by what data is readily or potentially available – it’s about what your business wants to achieve, and how data can help you get there." (Bernard Marr, "Data Strategy", 2017)

"A picture can paint a thousand words, as the saying goes. In this way, visuals are great for conveying information because they’re quick and direct, they’re memorable, and they add interest (being much more likely to hold the reader’s attention than a full page of text). But unless we know how to decode its message, a picture can also be difficult to read." (Bernard Marr, "Data Strategy", 2017)

"Analytics is the process of collecting, processing and analysing data to generate insights that help you improve the way you do business." (Bernard Marr, "Data Strategy", 2017)

"Data for data’s sake is meaningless. Therefore, instead of hoarding data, collect only what you really need and what makes business sense." (Bernard Marr, "Data Strategy", 2017) [?!]

"Data is certainly exciting – revolutionary, even. But that doesn’t always mean useful. To be truly useful, in a business sense, data must address a specific business need, help the organization reach its strategic goals, or generate real value." (Bernard Marr, "Data Strategy", 2017)

"[…] from a data strategy point of view, you need to describe the ideal data sets that would help you achieve your strategic objectives. You can then choose the best options for you based on how well they help you achieve your objectives, how easy it is to access or gather that data, and how cost effective it is." (Bernard Marr, "Data Strategy", 2017)

"However you plan to use data, even if you plan to treat data as a key business asset, it is never a good idea to capture huge mountains of data that you don’t really need. Remember, the power of big data is not in the data itself, it’s in how you use it." (Bernard Marr, "Data Strategy", 2017) [?!]

"I can’t stress enough how important this stage is; ‘selling’ big data to your people is a crucial early step on your data journey. It instils confidence in data." (Bernard Marr, "Data Strategy", 2017) [?!]

"[…] if companies want to avoid drowning in data, they need to develop a smart strategy that focuses on the data they really need to achieve their goals. In other words, this means defining the business-critical questions that need answering and then collecting and analysing only that data which will answer those questions." (Bernard Marr, "Data Strategy", 2017)

"Structured data is any data or information that is located in a fixed field within a defined record or file, usually in databases or spreadsheets. Essentially, it is data that is organized in a predetermined way, usually in rows and columns." (Bernard Marr, "Data Strategy", 2017)

"[…] the better insights are communicated, the more likely it is that data leads to positive action (in this case, better business decisions)." (Bernard Marr, "Data Strategy", 2017)

"Unfortunately, the widespread perception among business executives is that data and analytics are purely IT matters. And as with all IT matters, this means they don’t really need to understand how they work, or why." (Bernard Marr, "Data Strategy", 2017)

"When data isn’t properly looked after, it becomes meaningless and valueless. Even worse, if the data is out of date, incorrectly categorized, or used out of context, it can lead to misinformed decisions that can damage the long-term health of the company." (Bernard Marr, "Data Strategy", 2017) [?!]

11 November 2006

🎯🏭🗒️Sonia Mezzetta - Collected Quotes

"A data architecture needs to have the robustness and ability to support multiple data management and operational models to provide the necessary business value and agility to support an enterprise’s business strategy and capabilities." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"A data strategy must align with the business goals and overall framework of how data will be used and managed within an organization. It needs to include standards for how data will be discovered, integrated, accessed, shared, and protected. It needs to address how data will meet regulatory compliance policies, Master Data Management, and data democratization. There needs to be an assurance that both data and metadata have a quality control framework in place to achieve data trust. A data strategy needs to have a clear path on how an organization will accomplish data monetization." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"Data Fabric focuses on Self-Service data access via active metadata leveraging a composable set of tools and technologies. It offers the ability to discover, understand, and access data across hybrid and multi-cloud data landscapes with automation and Data Governance. It is primarily process and technology centric with flexibility in supporting diverse organizational models. On the other hand, Data Mesh is organizationally and process driven. It requires a technical implementation approach to execute its design. Data Mesh is at a higher level and Data Fabric is at a lower level. Data Fabric is capable of fulfilling Data Mesh’s key principles." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"Data Fabric is a distributed and composable architecture that is metadata and event driven. It’s use case agnostic and excels in managing and governing distributed data. It integrates dispersed data with automation, strong Data Governance, protection, and security. Data Fabric focuses on the Self-Service delivery of governed data." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"[Data Fabric] is not a single technology, such as data virtualization. […] It is not a single tool like a data catalog and it doesn’t have to be a single data storage system like a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue. It doesn’t just support centralized data management but also federated and decentralized data management. It excels in connecting distributed data. Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"Data Fabric supports a federated, decentralized, or centralized organization. To participate in Data Fabric, metadata is contributed in an automated manner and knowledge is populated from it to propel data management. Data Fabric is different from a Data Mesh design in that it supports decentralized, federated, and centralized organizations. Data Fabric’s objectives are to help an organization to evolve to a more mature level of data management by leveraging active metadata, which is a core prerequisite." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"Data Mesh is a design concept based on federated data and business domains. It applies product management thinking to data management with the outcome being Data Products. It’s technology agnostic and calls for a domain-centric organization with federated Data Governance." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"I emphasize this point as there are views in the industry that Data Fabric is a centralized storage architecture, which is not the case from my point of view. A Data Fabric architecture is driven by the needs and direction of the business architecture." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"Where Data Mesh differs from Data Fabric is that it has fixed requirements for the Self-Service platform focused on organizing and managing Data Products by business domain. Another difference is Data Fabric supports managing data as an asset and as a product. A Data Product can be composed of assets that have been governed and managed in a Data Fabric architecture. Data Fabric does not have these fixed requirements, although it inherently supports isolating data and Data Governance enforcement via metadata by business domain. You can think of a Data Mesh Self-Service data platform as supporting separate, independent companies (business domains), although the key criteria are that it does not create data silos and attains data sharing across these companies in a secure, quick, and easy manner. In Data Mesh, Data Products are created and managed by federated business domains and a data platform requires capabilities that enable data and policy federation. This is where a Data Fabric solution can also address Data Mesh’s requirements." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

10 November 2006

🔢Pearl Zhu - Collected Quotes

"A good strategy tells you not only what specifically needs to accomplish, but WHY." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Agile is more a 'direction', than an 'end'. Transforming to Agile culture means the business knows the direction they want to go on." (Pearl Zhu, "Digital Agility: The Rocky Road from Doing Agile to Being Agile", 2016)

"Breaking rules is indeed an important part of creativity. Innovation needs a level of guidance." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Good governance is less about structure and rules than being focused, effective and accountable." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Governance is not about maximization, but about optimization." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Selecting the right measure and measuring things right are both art and science. And KPIs influence management behavior as well as business culture." (Pearl Zhu, "CIO Master: Unleash the Digital Potential of It", 2016)

"Setting the right priorities or having superior time management skill means knowing the difference between 'must have', and 'nice to have'." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"The art of questioning is to ignite innovative thinking; the science of questioning is to frame system thinking, with the progressive pursuit of better solutions." (Pearl Zhu, "Leadership Master: Five Digital Trends to Leap Leadership Maturity", 2016)

"The 'result' of micromanagement is perhaps tangible in the short run, but more often causes damage for the long term." (Pearl Zhu, "Change Insight: Change as an Ongoing Capability to Fuel Digital Transformation", 2016)

"Using two-dimensional lenses to perceive the multi-faceted world can limit your ability to observe the world more objectively." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"A 'roadmap' is simply a plan for moving or transitioning, from one state to another. A roadmap provides the direction to the future." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A well-defined set of digital rules are not for limiting innovation, but for setting the frame of relevance and guide through changes and digital transformation." (Pearl Zhu, "100 Digital Rules: Setting Guidelines to Explore Digital New Normal", 2017)

"Building a comprehensive problem-solving framework is about leveraging a structured methodology that allows you to frame problems systematically and solve problems creatively." (Pearl Zhu, "Problem Solving Master: Frame Problems Systematically and Solve Problem Creatively", 2017)

"Decision makers with emotional excellence have the ability to dispassionately examine alternatives via fact finding, analysis, structured planning, objective evaluations, and comparison." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision making is an art only until the person understands the science." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision maturity is to ensure the right decisions have been made by the right people at the right time to solve the right problems." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Digital synchronization and strategic alignment occur when all parts of the choir sing their respective parts in harmony to achieve a higher purpose." (Pearl Zhu, "12 CIO Personas: The Digital CIO's Situational Leadership Practices", 2017)

"Digitalization implies the full-scale changes in the way business is conducted so that it’s a multi-dimensional planning and orchestration." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"Framing the right problem is equally or even more important than solving it." (Pearl Zhu, "Change, Creativity and Problem-Solving", 2017)

"Most organizations fail to manage performance effectively because they fail to look into the system holistically." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"The science of decision-making is to make sure there is an effective decision process in place." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"It is important to strengthen the weakest link, to ensure all important business elements integrated and knitted into ongoing organizational capabilities and unique business competency." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"The simplicity and the complexity are just the opposite ends of the same spectrum." (Pearl Zhu, "Digital Gaps: Bridging Multiple Gaps to Run Cohesive Digital Business", 2017)

"We are moving slowly into an era where Big Data is the starting point, not the end." (Pearl Zhu, "Digital Master: Debunk the Myths of Enterprise Digital Maturity", 2017)

"You can’t improve what you are not managing, you can’t manage what you are not measuring, and you can’t measure what you are not focusing." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A business ecosystem is just like the natural ecosystem; first, needs to be understood, then, needs to be well planned, and also needs to be thoughtfully renewed as well." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"A seamless digital transformation requires a vision to convey 'WHY', a solid strategy to clarify 'WHAT', and a technical specification to articulate 'HOW' you want to transform radically." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"An organizational structure carries inherent capabilities as to what can be achieved within its frame." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Change Management is a journey, not just a one-time project, riding ahead of change curve takes both strategy and methodology." (Pearl Zhu, "The Change Agent CIO: The CIO’s Dynamic Role of Leading Digitalization", 2018)

"Coherence improves business flow; resilience makes business robust and anti-fragile." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"Going digital is more like a journey than a destination. Predicting and preparing the next level of digitalization is an iterative learning and doing continuum." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Ideally, the two structures - hierarchy, and relationship structure wrap around each other to ensure responsibility, to keep information flow and the creation of power." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Taking the multidimensional hybrid models for going digital is all about how to strike the right balance of reaping quick wins and focusing on the long-term strategic goals." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"The most effective digital workplace is one where collaboration and sharing are the norms." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

08 November 2006

🔢Robert Hawker - Collected Quotes

"[...] a conceptual data model [...] is system-agnostic and is a diagrammatic business representation of how different types of data are associated with one another in the organization." (Robert Hawker, "Practical Data Quality", 2023)

"A data quality rule is logic that is applied to each row of a dataset, which can determine whether the row of data is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed the rule – hence, the term failed data [...]" (Robert Hawker, "Practical Data Quality", 2023)
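The row-level rule described in the quote above can be illustrated with a minimal Python sketch. It is not taken from the book: the `apply_rule` helper, the sample `customers` rows, and the rule itself (an active customer must have a country code) are hypothetical and exist only to show how rows split into passed and failed data.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def apply_rule(rows: List[Row], rule: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (passed, failed) according to a row-level rule."""
    passed: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        (passed if rule(row) else failed).append(row)
    return passed, failed


# Hypothetical rule: an active customer must have a non-empty country code.
def active_customer_has_country(row: Row) -> bool:
    if row.get("status") != "active":
        return True  # the rule is scoped to active customers only
    return bool(row.get("country"))


# Illustrative sample data, not from the source.
customers = [
    {"id": 1, "status": "active", "country": "DE"},
    {"id": 2, "status": "active", "country": ""},
    {"id": 3, "status": "inactive", "country": None},
]

passed, failed = apply_rule(customers, active_customer_has_country)
failed_ids = [row["id"] for row in failed]  # ids of the "failed data"
```

Scoping the rule to active customers keeps inactive records out of the failed set, which is one simple way to keep a rule tightly scoped.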

"Correction of data in the secondary source is not recommended. However, it is important to recognize that sometimes, secondary source fixes are required." (Robert Hawker, "Practical Data Quality", 2023)

"Data discovery is the process where an organization obtains an understanding of which data matters the most and identifies challenges with that data. The outcome of data discovery is that the scope of a data quality initiative should be clear and data quality rules can be defined." (Robert Hawker, "Practical Data Quality", 2023)

"Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column." (Robert Hawker, "Practical Data Quality", 2023)
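As a rough illustration of the profiling outputs named in the quote above (values, string lengths, completeness, distribution), the following Python sketch profiles a single column. The `profile_column` function, its result keys, and the sample `countries` values are assumptions made for this example, not an API from the book or from any profiling tool.

```python
from collections import Counter


def profile_column(values):
    """Return basic profile statistics for one column of values."""
    # Treat None and empty strings as missing (an assumption of this sketch).
    non_missing = [v for v in values if v not in (None, "")]
    lengths = [len(str(v)) for v in non_missing]
    return {
        "count": len(values),
        "completeness": len(non_missing) / len(values) if values else 0.0,
        "distinct": len(set(non_missing)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "top_values": Counter(non_missing).most_common(3),
    }


# Illustrative sample column, not from the source.
countries = ["DE", "DE", "FR", "", None, "Germany"]
profile = profile_column(countries)
```

In this sample the maximum length of 7 against a minimum of 2 hints that a full country name slipped in among two-letter codes, the kind of pattern profiling is meant to surface.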

"Data quality rules are only effective if they are tightly scoped. Generic rules tend to produce a lot of unwanted failed records, and business users start to ignore the results. Once business users lose faith in what they see from a data quality tool, it is hard to restore engagement." (Robert Hawker, "Practical Data Quality", 2023)

"Every data quality initiative is different, and senior stakeholders at different organizations will have different needs." (Robert Hawker, "Practical Data Quality", 2023)

"If an organization had a single overall data quality key performance indicator (KPI), then it might be appropriate to put a greater weighting on those rules which would impact regulatory compliance. A lack of regulatory compliance is a risk to the very existence of organizations like these, and therefore, a greater weighting might be needed." (Robert Hawker, "Practical Data Quality", 2023)

"It rarely makes sense to aim for what people might consider perfect data (every record is complete, accurate, and up to date). The investment required is usually prohibitive, and the gains made for the last 1% of data quality improvement effort become far too marginal." (Robert Hawker, "Practical Data Quality", 2023)

"In truth, no one knows how much bad data quality costs a company – even companies with mature data quality initiatives in place, who are measuring hundreds of data points for their quality struggle to accurately measure quantitative impact. This is often a deal-breaker for senior leaders when trying to get approval for a budget for data quality work. Data quality initiatives often seek substantial budgets and are up against projects with more tangible benefits." (Robert Hawker, "Practical Data Quality", 2023)

"Momentum is important in data quality initiatives. If an issue is problematic, even where the priority is high, it can be better to move on to an issue that can be progressed efficiently." (Robert Hawker, "Practical Data Quality", 2023)

"Most data quality issues will re-occur if the root cause is not fully understood [...]" (Robert Hawker, "Practical Data Quality", 2023)

"Organizations will always only have a limited amount of resources available to remediate data. It will almost certainly not be possible to tackle all the issues at the same time. Therefore, prioritization is key to ensuring that the most value is generated from the available resources." (Robert Hawker, "Practical Data Quality", 2023)

"Successful organizations try to put a holistic data culture in place. Everyone is educated on the basics of looking after data and the importance of having good data. They consider what they have learned when performing their day-to-day tasks. This is often referred to as the promotion of good data literacy." (Robert Hawker, "Practical Data Quality", 2023)

"The biggest mistake that can be made in a data quality initiative is focusing on the wrong data. If you fix data that does not impact a critical business process or drive important decisions, your initiative simply will not make the difference that you want it to." (Robert Hawker, "Practical Data Quality", 2023)

"The data should be monitored in the source, it should be corrected in the source, and it should then feed the secondary source(s) with high-quality data that can be used without workarounds. The reduction in workarounds will make the data engineers, scientists, and data visualization specialists much more productive." (Robert Hawker, "Practical Data Quality", 2023)

"The level of data quality in an organization is the extent to which data can be used for its intended purposes." (Robert Hawker, "Practical Data Quality", 2023)

"Start with a business strategy. Too many organizations start their data quality initiative by looking at the details of the data and trying to see 'what is wrong with it'. The right approach is to understand what the business is trying to achieve and to work out where data issues might impede this. It ensures that data quality work will be truly impactful." (Robert Hawker, "Practical Data Quality", 2023)

05 November 2006

✏️John Hoffmann - Collected Quotes

"A useful way to think about tables and graphics is to visualize layers. Just as photographic files may be manipulated in photo editing software using layers, data presentations are constructed by imagining that layers of an image are placed one on top of another. There are three general layers that apply to visual data presentations: (a) a frame that is typically a rectangle or matrix, (b) axes and coordinate systems (for graphics), and (c) data presented as numbers or geometric objects." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Also known as line charts or line plots, this type of graphic displays a series of data points using line segments. […] Do not include too many lines, especially if they are difficult to distinguish. […] it is best to label the lines directly rather than use a legend. […] It is not a good idea to use line graphs with unordered categorical (nominal) data. These graphs are simpler to understand when the data are ordered in some way. […] Visual acuity is enhanced when the lines do not touch the x- or y-axis […] There is no need, except under exceptional circumstances, to include a marker to show at what point the line matches a specific value of the x- and y-axes. Line graphs are designed to display patterns and trends rather than data points." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Clarity is related to two other principles of good data presentation: precision and efficiency. Precision refers to ensuring that the data are presented accurately with minimal error. This is a topic that is equally important to data presentation as it is to data management. Always keep in mind: don’t mislead the audience. As already mentioned, people can be fooled by visual images, but they can also be misled by the myth of the infallible graphic. This refers to a tendency to believe there is an important association among concepts simply because they are correlated." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Contrasts can be a help or a hindrance. Our eyes are drawn to bright colors on muted backgrounds. In addition, warm colors, such as red, are more likely to get attention than cool colors (although the relative brightness affects this phenomenon). Objects in color that are included in black and white or grayscale visuals are quite effective at drawing the eye. Thus, using color to highlight certain parts of a graphic or table can be valuable. However, avoid using these strategies if they will draw attention to extraneous or trivial parts of the data presentation." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

"If colors are used for different bars in a graphic, use distinguishable shades of the same color rather than distinct colors. If lines are in color in a graph, use those that are easy to discriminate, such as red and blue. But be careful of lines that cross since a red line is perceived as in front of a blue line. If colors are employed in a table, use them to highlight the relevant comparisons you wish to make. […] Use colors to highlight important parts of the graphic. […] But be careful because this practice is easily abused." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"It is generally a good idea to avoid gridlines, vertical lines, and double lines. Use single horizontal lines to separate the title, headers, and content. Lines are also employed to identify column spanners, which are used to group particular columns of data." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Many data presentations spice up the image with background images, embedded visuals, ornate typeface, and bright colors. Our eyes may be drawn to these aspects, rather than to the patterns in the data, thus breaking the principles of clarity and efficiency. It is usually best to take out the clutter: remove the chartjunk." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

"People tend to comprehend visual images quicker and with fewer errors than words on a page. Visual images also activate memories better than words." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

"Reference tables show a lot of data with a high degree of precision. They are designed generally to provide users with a way to find particular pieces of data. […] Summary tables provide some type of extraction of data from a reference table or a spreadsheet. The data are usually manipulated, analyzed, or summarized in some way, such as by sorting or providing summary statistics (means, percentages, ranges). The results of statistical models are usually presented in research reports using this type of table." (John Hoffmann, "Principles of Data Management and Presentation", 2017)

"Some experts argue that axes - in particular, the y-axis - should always begin at zero. However, when differences are small, yet the size of the numbers is relatively large, this can make detection difficult. On the other hand, viewers can be misled by manipulating the axes to magnify differences. One guideline is to always use a zero bottom point when judging absolute magnitudes. This is often the case in bar charts." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

"Titles should clearly specify the content of the table or the graphic. What is being presented? Means and standard deviations? Confidence intervals? Percentages? Trends over time? Furthermore, consider the context, such as when and where the data were gathered, as well as the name of the dataset if using secondary data (although the dataset may also be identified in a source note)." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

"Whichever scale is used to represent the data, it is important to keep it consistent in data presentations. The principles of clarity, precision, and efficiency are rarely met if the measurement scales change within tables." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 


About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.