SQL Troubles

23 November 2006

🔢Saurabh Gupta - Collected Quotes

"A data warehouse follows a pre-built static structure to model source data. Any changes at the structural and configuration level must go through a stringent business review process and impact analysis. Data lakes are very agile. Consumption or analytical layer can be modified to fit in the model requirements. Consumers of a data lake are not constant; therefore, schema and modeling lies at the liberty of analysts and scientists." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data in the data lake should never get disposed. Data driven strategy must define steps to version the data and handle deletes and updates from the source systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data governance policies must not enforce constraints on data - Data governance intends to control the level of democracy within the data lake. Its sole purpose of existence is to maintain the quality level through audits, compliance, and timely checks. Data flow, either by its size or quality, must not be constrained through governance norms. [...] Effective data governance elevates confidence in data lake quality and stability, which is a critical factor to data lake success story. Data compliance, data sharing, risk and privacy evaluation, access management, and data security are all factors that impact regulation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake induces accessibility and catalyzes availability. It warrants data discovery platforms to soak the data trends at a horizontal scale and produce visual insights. It largely cuts down the time that goes into data preparation and exhaustive data analysis." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake is a single window snapshot of all enterprise data in its raw format, be it structured, semi-structured, or unstructured. Starting from curating the data ingestion pipeline to the transformation layer for analytical consumption, every aspect of data gets addressed in a data lake ecosystem. It is supposed to hold enormous volumes of data of varied structures." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data lake is an ecosystem for the realization of big data analytics. What makes data lake a huge success is its ability to contain raw data in its native format on a commodity machine and enable a variety of data analytics models to consume data through a unified analytical layer. While the data lake remains highly agile and data-centric, the data governance council governs the data privacy norms, data exchange policies, and the ensures quality and reliability of data lake." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data swamp, on the other hand, presents the devil side of a lake. A data lake in a state of anarchy is nothing but turns into a data swamp. It lacks stable data governance practices, lacks metadata management, and plays weak on ingestion framework. Uncontrolled and untracked access to source data may produce duplicate copies of data and impose pressure on storage systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Metadata is the key to effective data governance. Metadata in this context is the data that defines the structure and attributes of data. This could mean data types, data privacy attributes, scale, and precision. In general, quality of data is directly proportional to the amount and depth of metadata provided. Without metadata, consumers will have to depend on other sources and mechanisms." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"The quality of data that flows within a data pipeline is as important as the functionality of the pipeline. If the data that flows within the pipeline is not a valid representation of the source data set(s), the pipeline doesn’t serve any real purpose. It’s very important to incorporate data quality checks within different phases of the pipeline. These checks should verify the correctness of data at every phase of the pipeline. There should be clear isolation between checks at different parts of the pipeline. The checks include checks like row count, structure, and data type validation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

22 November 2006

🎯William H Inmon - Collected Quotes

"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"To interpret and understand information over time, a whole new dimension of context is required. While content of information remains important, the comparison and understanding of information over time mandates that context be an equal partner to content. And in years past, context has been an undiscovered, unexplored dimension of information." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"When management receives the conflicting reports, it is forced to make decisions based on politics and personalities because neither source is more or less credible. This is an example of the crisis of data credibility in the naturally evolving architecture." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"An interesting aspect of KPIs are that they change over time. At one moment in time the organization is interested in profitability. There will be one set of KPIs that measure profitability. At another moment in time the organization is interested in market share. There will be another set of KPIs that measure market share. As the focus of the corporation changes over time, so do the KPIs that measure that focus." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"Both the ODS and a data warehouse contain subject-oriented, integrated information. In that regard they are similar. But an ODS contains data that can be individually updated, deleted, or added. And a data warehouse contains nonvolatile data. A data warehouse contains snapshots of data. Once the snapshot is taken, the data in the data warehouse does not change. So when it comes to volatility, a data warehouse and an ODS are very different." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"In general, analytic processing is known as 'heuristic' processing. In heuristic processing the requirements for analysis are discovered by the results of the current iteration of processing. […] In heuristic processing you start with some requirements. You build a system to analyze those requirements. Then, after you have results, you sit back and rethink your requirements after you have had time to reflect on the results that have been achieved. You then restate the requirements and redevelop and reanalyze again. Each time you go through the redevelopment exercise is called an 'iteration'. You continue the process of building different iterations of processing until such time as you achieve the results that satisfy the organization that is sponsoring the exercise." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There are, however, many problems with independent data marts. Independent data marts: (1) Do not have data that can be reconciled with other data marts (2) Require their own independent integration of raw data (3) Do not provide a foundation that can be built on whenever there are future analytical needs." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There is then a real mismatch between the volume of data and the business value of data. For people who are examining repetitive data and hoping to find massive business value there, there is most likely disappointment in their future. But for people looking for business value in nonrepetitive data, there is a lot to look forward to." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"At first, we threw all of this data into a pit called the 'data lake'. But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful - to be analyzed - data needed to (1) be related to each other and (2) have its analytical infrastructure carefully arranged and made available to the end user. Unless we meet these two conditions, the data lake turns into a swamp, and swamps start to smell after a while. [...] In a data swamp, data just sits there are no one uses it. In the data swamp, data just rots over time." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data privacy, data confidentiality, and data protection are sometimes incorrectly diluted with security. For example, data privacy is related to, but not the same as, data security. Data security is concerned with assuring the confidentiality, integrity, and availability of data. Data privacy focuses on how and to what extent businesses may collect and process information about individuals." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data visualization adds credibility to any message. [...] Data visualizations are incredibly cold mediums because they require a lot of interpretation and participation from the audience. While boring numbers are authoritative, data visualization is inclusive. [...] Data visualizations absorb the viewer in the chart and communicate the author’s credibility through active participation. Like a good teacher, they walk the reader through the thought process and convince him/her effortlessly."

"Data visualization‘s key responsibilities and challenges include the obligation to earn your audience’s attention - do not take it for granted." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"In general, a data or data set contains its sensitivity or controversial nature only if it is linked or related to an individual’s personal information. Else an isolated, abandoned, or unrelated sensitive or controversial attribute has no significance."

"It is dangerous to do an analysis and merge data with very different quality profiles. As a general rule, the veracity of merged data is only as good as the worst data that has been merged. [...] Not knowing the quality of the data being analyzed jeopardizes the entire analysis." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Once you combine the data lake along with analytical infrastructure, the entire infrastructure can be called a data lakehouse. [...] The data lake without the analytical infrastructure simply becomes a data swamp. And a data swamp does no one any good." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] "The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Raw data without appropriate visualization is like dumped construction raw materials at a building construction site. The finished house is the actual visuals created from those data like raw materials." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"With the data lakehouse, it is possible to achieve a level of analytics and machine learning that is not feasible or possible any other way. But like all architectural structures, the data lakehouse requires an understanding of architecture and an ability to plan and create a blueprint." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

21 November 2006

🔢Angelika Klidas - Collected Quotes

"Also, remember that data literacy is not just a set of technical skills. There is an equal need and weight for soft skills and business skills. This can be misleading for some technical resources within an organization, as those technical resources may believe they are data literate by default as they are data architects or data analysts. They have the existing technical skills, but maybe they do not have any deep proficiencies in other skills such as communicating with data, challenging assumptions, and mitigating bias, or perhaps they do not have an open mindset to be open to different perspectives." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Critical thinking is part of data literacy; it is the ability to question the logic of arguments or assumptions and examine evidence in order to determine whether a claim is true, false, or uncertain." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Current decision-making in business suffers from insight gaps. Organizations invest in data and analytics, hoping that will provide them with insights that they can use to make decisions, but in reality, there are many challenges and obstacles that get in the way of that process. One of the biggest challenges is that these organizations tend to focus on technology and hard skills only. They are definitely important, but you will not automatically get insights and better decisions with hard skills alone. Using data to make better data-informed decisions requires not only hard skills but also soft skills as well as mindsets." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is not achieved by mastering a uniform set of competencies that applies to everyone. Those that are relevant to each individual can vary significantly depending on how they interact with data and which part of the data process they are involved in." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Decision-makers are constantly provided data in the form of numbers or insights, or similar. The challenge is that we tend to believe every number or piece of data we hear, especially when it comes from a trusted source. However, even if the source is trusted and the data is correct, insights from the data are created when we put it in context and apply meaning to it. This means that we may have put incorrect meaning to the data and then made decisions based on that, which is not ideal. This is why anyone involved in the process needs to have the skills to think critically about the data, to try to understand the context, and to understand the complexity of the situation where the answer is not limited to just one specific thing. Critical thinking allows individuals to assess limitations of what was presented, as well as mitigate any cognitive bias that they may have." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is something that affects everyone and every organization. The more people who can debate, analyze, work with, and use data in their daily roles, the better data-informed decision-making will be." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"It is also important to note that data literacy is not about expecting to or becoming an expert; rather, it is a journey that must begin somewhere." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Organizations must have a plan and vision for data literacy, which they then communicate to all employees. They will need to develop and foster a culture that embraces data literacy and data-informed decisions. They will need to provide employees with access to various learning content specific to data literacy. Along their journey, they will need to make sure they benchmark and measure progress toward their vision and celebrate successes along the way." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"People can get confused or experience anxiety when they have to work with data and analytics. As we data literacy geeks say, people shouldn’t be pushed to work with data and analytics - they should do this because they want to. [...] Visualizations that are not understood present another risk. To be successful with data and analytics, we need visualizations that are presented in a clear meaningful way. If we do not take care of the data literacy levels within an organization, we might lose our public. Therefore, it is necessary to think of the risk of overwhelming our readers/viewers." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

20 November 2006

🔢Zach Gemignani - Collected Quotes

"A culture of data fluency needs to be built on a shared understanding of the data sources, data analysis, key metrics, and data products. It requires employees to be on the same page about how data is used and why it is important." (Zach Gemignani et al, "Data Fluency", 2014)

"Any presentation of data, whether a simple calculated metric or a complex predictive model, is going to have a set of assumptions and choices that the producer has made to get to the output. The more that these can be made explicit, the more the audience of the data will be open to accepting the message offered by the presenter." (Zach Gemignani et al, "Data Fluency", 2014)

"Broadly defined, data means events that are captured and made available for analysis. A data source is a consistent record of these events. And a data product translates this record of events into something that can easily be understood. [...] Data products can be organized and characterized by a series of continuums that describe the nature of the data and how it is presented." (Zach Gemignani et al, "Data Fluency", 2014)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Creating a data fluent organization doesn’t just happen. It starts with people who love using data as a tool to improve their job performance - people who have learned to converse with others in the language of data. It needs people who expect and demand better, more useful data products from themselves and others. It starts with you." (Zach Gemignani et al, "Data Fluency", 2014)

"Data alone isn’t valuable. In fact, it can be expensive in time and resources to manage and maintain. The analysis of this data is closer to something that is valuable. A clearly communicated analysis starts to transform a reflection of the world into knowledge in the minds of people. Even so, knowledge alone does not make your organization better. It is the decisions and actions of people - based on this data-sourced knowledge - that is the goal. But these decisions are seldom made in a vacuum. In most organizations, decisions are a collaborative, social experience. People come together to discuss options, review their knowledge of the situation, and arrive at a path to go down. Herein is one of the great powers of effective data products: They can shape and guide these discussions. Conclusions are seldom clear-cut, even when there is data to support a direction." (Zach Gemignani et al, "Data Fluency", 2014)

"Data captures actions and characteristics of the real world and transforms them into something that can be examined and explored after the fact." (Zach Gemignani et al, "Data Fluency", 2014)

"Data visualizations are designed to emphasize patterns and deviations in data. In fact, each specific chart type is well suited to highlighting particular forms of insight. A skilled author of data products will choose the right visualization to emphasize a message. The data, chart, and supporting descriptions should work in harmony to point out what is interesting. The reader simply goes along for the ride." (Zach Gemignani et al, "Data Fluency", 2014)

"Goals associated with a few, well-understood key metrics is a powerful combination. For both internal and external stakeholders, there is a strong alignment between organization mission, vision, goals, and tracking of progress. The efforts of everyone can be directed at these measurable goals, and people will focus on the processes that can impact these metrics." (Zach Gemignani et al, "Data Fluency", 2014)

"In fact, the analogy to storytelling is limited when applied to communicating with data. Data visualization has fundamental characteristics missing from traditional storytelling. For example, interactive data visualizations let audiences explore information to find insights that resonate with them. Visualizations take shape based to a large extent on the underlying data. And as this data changes, the emphasis and message of the visualization is likely to change." (Zach Gemignani et al, "Data Fluency", 2014)

"Metrics can serve two purposes: identifying problems and measuring performance. When the goal is to identify problems and pinpoint areas of operational inefficiency and ineffectiveness, defining the right metric requires a bit of detective work. It requires you to uncover the data residue of a problem and to determine what evidence can be found and how exactly it shows up. When the goal is to measure performance, the right success metrics focus on measures that can be controlled and where improvement in the metric is an unambiguously good thing." (Zach Gemignani et al, "Data Fluency", 2014)

"Most discussions of decision making assume that only senior executives make decisions or that only senior executives' decisions matter. This is a dangerous mistake. Decisions are made at every level of the organization, beginning with individual professional contributors and frontline supervisors. These apparently low-level decisions are extremely important in a knowledge-based organization." (Zach Gemignani et al, "Data Fluency", 2014)

"The most common mistake in ineffective data products is an inability to make difficult decisions about what information is most important. [...] Often information gets included in data products for reasons that are superfluous to the purpose, audience, and message - reasons that cater the product to someone influential or use information that has been included historically. The bar should be higher." (Zach Gemignani et al, "Data Fluency", 2014)

"We have an inbuilt ability to manipulate visual metaphors in ways we cannot do with the things and concepts they stand for—to use them as malleable, conceptual Tetris blocks or modeling clay that we can more easily squeeze, stack, and reorder. And then - whammo! - a pattern emerges, and we’ve arrived someplace we would never have gotten by any other means." (Zach Gemignani et al, "Data Fluency", 2014)

19 November 2006

🎯Stephen Few - Collected Quotes

"An effective dashboard is the product not of cute gauges, meters, and traffic lights, but rather of informed design: more science than art, more simplicity than dazzle. It is, above all else, about communication." (Stephen Few, "Information Dashboard Design", 2006)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"A signal is a useful message that resides in data. Data that isn’t useful is noise. […] When data is expressed visually, noise can exist not only as data that doesn’t inform but also as meaningless non-data elements of the display (e.g. irrelevant attributes, such as a third dimension of depth in bars, color variation that has no significance, and artificial light and shadow effects)." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Apart from the secondary benefits of digital data, which are many, such as faster and cheaper information collection and distribution, the primary benefit is better decision making based on evidence. Despite our intellectual powers, when we allow our minds to become disconnected from reliable information about the world, we tend to screw up and make bad decisions." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Data contain descriptions. Some are true, some are not. Some are useful, most are not. Skillful use of data requires that we learn to pick out the pieces that are true and useful. [...] To find signals in data, we must learn to reduce the noise - not just the noise that resides in the data, but also the noise that resides in us. It is nearly impossible for noisy minds to perceive anything but noise in data." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Signals always point to something. In this sense, a signal is not a thing but a relationship. Data becomes useful knowledge of something that matters when it builds a bridge between a question and an answer. This connection is the signal." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Everything that informs us of something useful that we didn't already know is a potential signal. If it matters and deserves a response, its potential is actualized." (Stephen Few)

"One of the great purposes of education today is to help us filter the data, to reduce it to what's true and useful." (Stephen Few)

17 November 2006

🔢Adam Bellemare - Collected Quotes

"A data mesh is inherently multimodal, and data products can be provided via a variety of means. Event streams remain the best option for the majority of data products, as it is far easier to power both operational and analytical use cases through a stream than a batch of files at rest." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Bad data is costly to fix, and it’s more costly the more widespread it is. Everyone who has accessed, used, copied, or processed the data may be affected and may require mitigating action on their part. The complexity is further increased by the fact that not every consumer will “fix” it in the same way. This can lead to divergent results that are divergent with others and can be a nightmare to detect, track down, and rectify." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Creating data products requires that domain owners have a degree of autonomy in modeling, building, and delivering data to their consumers. However, by empowering them with autonomy and independence, you run the risk of a significant technological sprawl across data product implementations, making it more difficult for consumers to use the data products for their own ends. Federated governance focuses on finding an equilibrium between the needs of the consumers, the autonomy of the data product owners, the business compliance and security requirements, and global data product requirements." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today’s computing environments, leading to ad hoc data pipelines, cobbled together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data mesh architectures are inherently decentralized, and significant responsibility is delegated to the data product owners. A data mesh also benefits from a degree of centralization in the form of data product compatibility and common self-service tooling. Differing opinions, preferences, business requirements, legal constraints, technologies, and technical debt are just a few of the many factors that influence how we work together." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data mesh promotes data to a product with the same rigor, ownership, and feature management of any other product in your business. The free-for-all, 'figure it out yourself' data access is replaced with purpose-built, maintained, and supported modes. It is as much a social shift as it is a technological shift and requires both top-down and bottom-up buy-in." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Enforcing a schema at read time, instead of at write time, leads to a proliferation of what we call 'bad data'. The lack of write-time checks means that data written into HDFS may not adhere to the schemas that the readers are using in their existing work […]. Some bad data will cause consumers to halt processing, while other bad data may go silently undetected. While both of these are problematic, silent failures can be deadly and difficult to detect." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Federated governance can be roughly broken down into two main tasks. The first is establishing cross-organization policies, including data product standards and datah andling requirements, that apply to all users of the data mesh. The second is providing guidance on creating and using data products with self-service tools to make it easy to participate in the data mesh." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The premise of the data mesh solution is simple. Publish important business data sets to dedicated, durable, and easily accessible data structures known as data products. The original creators of the data are responsible for modeling, evolution, quality, and support of the data, treating it with the same first-class care given to any other product in the organization. Prospective consumers can explore, discover, and subscribe to the data products they need for their business use cases. The data products should be well-described, easy to interpret, and form the basis for a set of self-updating data primitives for powering both business services and analytics." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The problem of bad data has existed for a very long time. Data copies diverge as their original source changes. Copies get stale. Errors detected in one data set are not fixed in duplicate ones. Domain knowledge related to interpreting and understanding data remains incomplete, as does support from the owners of the original data." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

14 November 2006

🎯Zhamak Dehghani - Collected Quotes

"A data pipeline is a series of transformation steps (functions) executed as the data flows from one step to another. Data mesh refrains from using pipelines as a top-level architectural paradigm and in between data products. The challenge with pipelines as currently used is that they don’t create clear interfaces, contracts, and abstractions that can be maintained easily as the pipeline complexity complexity grows. Due to lack of abstractions, single failure in the pipeline causes cascading failures." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product encapsulates more than just the data. It needs to contain all the structural components needed to manifest its baseline usability characteristics - discoverable, understandable, addressable, etc. - in an autonomous fashion, while continuing to share data in a compliant and secure manner."(Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product’s primary job is to consume data from upstream sources using its input data ports, transform it, and serve the result as permanently accessible data via its output data ports." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data lake architecture suffers from complexity and deterioration. It creates complex and unwieldy pipelines of batch or streaming jobs operated by a central team of hyper-specialized data engineers. It deteriorates over time. Its unmanaged datasets, which are often untrusted and inaccessible, provide little value. The data lineage and dependencies are obscured and hard to track." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data is a collection of facts put together according to a model. The data model is an approximation of reality, good enough for the (analytical) tasks at hand." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In addition to limitations of scale, other challenges of data centralization are data quality and resilience to change. This is because business domains and teams that are most familiar with the data are not responsible for data quality." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] the governance function is accountable to define what constitutes data quality and how each data product communicates that in a standard way. It’s no longer accountable for the quality of each data product. The platform team is accountable to build capabilities to validate the quality of the data and communicate its quality metrics, and each domain (data product owner) is accountable to adhere to the quality standards and provide quality data products." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data management of the future must build in embracing change, by default. Rigid data modeling and querying languages that expect to put the system in a straitjacket of a never-changing schema can only result in a fragile and unusable analytics system. [...] The data management of the future must support managing and accessing data across multiple hosting platforms, by default." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-term interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh focuses on the impact of the data and not its volumes. It values data usability, data satisfaction, data availability, and data quality over the volume of the data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] data mesh introduces a fundamental shift that the owners of the data products must communicate and guarantee an acceptable level of quality and trustworthiness - specific to their domain - as an intrinsic characteristic of their data product. This means cleansing and running automated data integrity tests at the point of the creation of a data product." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is an element of a data strategy that fosters a data-driven organization to get value from data at scale." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In short, a monolithic architecture, technology, and organizational structure are not suitable for analytical data management of large-scale and complex organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed. It has high functional cohesion, i.e., performing a specific analytical transformation and securely sharing the result as domain-oriented analytical data. It has all the structural components that it requires to do its function: the transformation code, the data, the metadata, the policies that govern the data, and its dependencies to infrastructure." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"One of the limitations of data management solutions today is how we have attempted to manage its unwieldy complexity, how we have decomposed an ever-growing monolithic data platform and team to smaller partitions. We have chosen the path of least resistance, a technical partitioning." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"The distributed nature of data mesh demands immutability to give confidence to data users that (1) there is consistency between multiple data products for a point-in-time piece of data and (2) once they read data at a point in time, that data doesn’t change and they can reliably repeat the reads and processing." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"There are a set of characteristics that can be grouped together as quality. These attributes aren’t intended to define whether a data product is good or bad. They just communicate the threshold of guarantees the data product expects to meet, which may be well within an acceptable range for certain use cases." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Unlike other analytical data management paradigms, data mesh does not embrace the concept of the mythical single source of truth. Every data product provides a truthful portion of the reality - for a particular domain - to the best of its ability, a single slice of truth." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Ultimately, Data Mesh’s goal is to enable organizations to thrive in the face of the growth of data sources, growth of data users and use cases, and the increasing change in cadence and complexity. Adopting Data Mesh, organizations must thrive in agility, creating data-driven value while embracing change." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

13 November 2006

🔢Sid Adelman - Collected Quotes

"Data archeology (finding bad data), data cleansing (correcting bad data), and data quality enforcement (preventing data defects at the source) should be business objectives. Therefore, data quality initiatives are business initiatives and require the involvement of business people, such as information consumers and data originators." (Sid Adelman et al, "Data Strategy", 2005)

"Data strategy is one of the most ubiquitous and misunderstood topics in the information technology (IT) industry. Most corporations' data strategy and IT infrastructure were not planned, but grew out of "stovepipe" applications over time with little to no regard for the goals and objectives of the enterprise. This stovepipe approach has produced the highly convoluted and inflexible IT architectures so prevalent in corporations today." (Sid Adelman et al, "Data Strategy", 2005)

"Dealing with [...] resistance is where social sensitivity, leadership, and power come into play. Social sensitivity is the ability to read the players and respond appropriately to their concerns. Leadership and power can quickly overcome most resistance to change and allow you to establish an environment and convince management to properly support the data strategy." (Sid Adelman et al, "Data Strategy", 2005)

"It is important to remember that the 'single version of the truth' - or enterprise logical data model - is not and should not be built all at once (that would take too long), but that it evolves over time as the project-specific logical data models are merged, one-by-one, a project at a time." (Sid Adelman et al, "Data Strategy", 2005)

"The chaos without a data strategy is not as obvious, but the indicators abound: dirty data, redundant data, inconsistent data, the inability to integrate, poor performance, terrible availability, little accountability, users who are increasingly dissatisfied with the performance of IT, and the general feeling that things are out of control." (Sid Adelman et al, "Data Strategy", 2005)

"The data strategist is responsible for creating and maintaining the data strategy. This includes fully understanding the strategic goals of the organization. [...] The data strategist must know (or learn) the existing environment including the important internal databases, the external data that will be integrated, and the data quality characteristics. The data strategist must be aware of the data volumes expected in the next five years. [...] The data strategist must be aware of changes in the business that will require more complex transactions and queries. He or she must also be aware of governmental factors including regulations and governmental reporting requirements. The data strategist must know about the requirements of service level agreements (SLAs) for both performance and availability and be sure that the data strategy supports those SLAs (it's also likely that the data strategist would have input into creating those SLAs.) And finally, the data strategist must be wired into the politics of the organization so that his or her proposals will be pragmatic and accepted by management and staff." (Sid Adelman et al, "Data Strategy", 2005)

"The folks in IT don't like change if they believe it will diminish the power of the IT group. This is particularly true for managers. Managers put forward countless reasons why the organization should stay as is, especially if a change can decrease the number of employees they control because managers often equate headcount to power in the organization." (Sid Adelman et al, "Data Strategy", 2005) [?!]

"The vision of a data strategy that fits your organization has to conform to the overall strategy of IT, which in turn must conform to the strategy of the business. Therefore, the vision should conform to and support where the organization wants to be in 5 years." (Sid Adelman et al, "Data Strategy", 2005)

"Working without a data strategy is analogous to a company allowing each department and each person within each department to develop its own financial chart of accounts. This empowerment allows each person in the organization to choose his own numbering scheme. Existing charts of accounts would be ignored as each person exercises his or her own creativity." (Sid Adelman et al, "Data Strategy" 1st Ed., 2005)

"You cannot boil the ocean; you have to prioritize your data integration deliverables. An enterprise-wide data integration effort must be carved up into small iterative projects, starting with the most critical data and working down to the less significant data. The business people working with the data integration team must determine which data is most appropriate for integration. Some data might not be suitable for integration at all, such as department-specific data, highly secured data, and data that is too risky to integrate. The team also needs to look at historical data and decide how much of it to include in the data integration process." (Sid Adelman et al, "Data Strategy" 1st Ed., 2005)

🔢Ian Wallis - Collected Quotes

"A data strategy is the opportunity to bring data, one of the most important assets your organisation has, to the fore and to drive the future direction of the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"A data strategy which no longer reflects the priorities of the organisation as a whole is doomed to fail, and likely to struggle to keep any momentum beyond the immediate term." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"A KPI is a performance measure that demonstrates how effectively an organisation is achieving its critical objectives. They are used to track performance over a period of time to ensure the organisation is heading in the desired direction, and are quantifiable to guide whether activities need to be dialled up or down, resources adjusted or management resource focused on understanding what is in play that may be holding back the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Culture is not something that can be read in a corporate document (though many organisations will claim to have values, beliefs and other concepts that articulate the culture as the corporate centre wants it to be seen). It is intangible and can be challenging to comprehend to those on the outside looking in. Much of it is unspoken, a series of behavioural norms which are engrained in the fabric of the organisation and drive attitudes of employees to one another, management, change programmes and any external (to the group, as well as the organisation) effort to drive change that may be resisted simply because it ‘isn’t the way we do things around here’." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Data has a value, without which an organisation is largely a shell, worthless and of limited appeal other than as a means of sweeping up fixed assets at a knock-down price. It is the lifeblood of an organisation, so whether you regard it as the water that is essential to life or the blood circulating around the body, without it our organisations are not functional." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Data strategy is even less understood [thank business strategy], so the chances of success can be further decreased, simply because you need organisation-wide commitment and buy-in to succeed. Data does not exist in a bubble; it is not the preserve of a function that can fix it for all, detached from touching everyone else. It is core to how you run the organisation, and without a focus on where you are heading, it is going to trip the organisation up at every turn - regulatory compliance; operational effectiveness; financial performance; customer and employee experience; essentially, the efficiency in managing virtually every activity in the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"I am using ‘data strategy’ as an overarching term to describe a far broader set of capabilities from which sub-strategies can be developed to focus on particular facets of the strategy, such as management information (MI) and reporting; analytics, machine learning and AI; insight; and, of course, data management." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"If there is one all too common a failing in data strategies, it is the temptation to make them too detailed through either straying into implementation activities or overplaying the content by providing too much information. The key is to recognise the level of information that needs to be imparted to make the data strategy coherent and likely to be endorsed, with as little information as is necessary to be able to make the point cogently. Brevity, and associated clarity in what needs to be achieved and why, is a winning formula in gaining senior executive sponsorship." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"It is also important to regard the data strategy as a living document. Do not regard it as a masterpiece, never to be reviewed, amended or critiqued within the time frame it covers, but instead see it as a strategy that can flex to the changing demands of an organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"[...] it is always useful to learn from past mistakes, but evidence shows that most strategies fail due to an inability to follow through into execution." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"In the same vein, data strategy is often a misnomer for a much wider scope of coverage, but the lack of coherence in how we use the language has led to data strategy being perceived to cover data management activities all the way through to exploitation of data in the broadest sense. The occasional use of information strategy, intelligence strategy or even data exploitation strategy may differentiate, but the lack of a common definition on what we mean tends to lead to data strategy being used as a catch-all for the more widespread coverage such a document would typically include. Much of this is due to the generic use of the term ‘data’ to cover everything from its capture, management, governance through to reporting, analytics and insight." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Many organisations start a data strategy from a need to get data into some sort of organised state in which it is feasible to demonstrate compliance. In my opinion, compliance should be a component of a data strategy, not the data strategy in itself." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The challenge with using OKRs is to focus on just three to five objectives - sounds simple enough, but so many organisations follow the ‘if it moves, track it’ philosophy such that they can’t see the wood for the trees." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The key for a successful data strategy is to align it clearly with the corporate strategy. The data strategy is a crucial enabler of the corporate strategy, and the data strategy should clearly call out those components that have a clear line of sight to delivering, or enabling, the corporate goals. If the data strategy does not align to the corporate goals it will be a much more challenging task to get the wider organisation to buy into it, not least because it will fail to have any resonance with the objectives of the organisational leaders and be regarded as optional at best." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The KPI juggernaut has been misused and abused in too many organisations to the extent it has devalued the concept of KPIs. KPIs used well - the ten things that really matter to an organisation - can, in my experience, be a real galvanising force to get focus and attention put in those areas which really can make a difference. The rest is a distraction, there through some misplaced view that more adds value when actually it detracts through losing the focus from where it needs to be." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The nature of the change that the data strategy is to drive will be determined by the appetite and commitment of the organisation to change. It will also be shaped by the maturity of the organisation, with the maturity assessment process having identified and demonstrated where the gaps lie, and the resolve of the organisation to set its own pace and objectives to be achieved by the time of the next assessment." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The premise of OKRs is to keep objectives and results simple and flexible, ensuring they align with business goals and enterprise initiatives guided by regular reviews to assess progress during the quarter. The intent is to keep OKRs clear and accountable, as well as measurable, with between three and five objectives recommended at a high level that can each be tracked by three to five key measures. They should be ambitious goals, even uncomfortable, in challenging aspirations, making them stretch targets." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

🔢Bernard Marr - Collected Quotes

"A good data strategy is not determined by what data is readily or potentially available – it’s about what your business wants to achieve, and how data can help you get there." (Bernard Marr, "Data Strategy", 2017)

"A picture can paint a thousand words, as the saying goes. In this way, visuals are great for conveying information because they’re quick and direct, they’re memorable, and they add interest (being much more likely to hold the reader’s attention than a full page of text). But unless we know how to decode its message, a picture can also be difficult to read." (Bernard Marr, "Data Strategy", 2017)

"Analytics is the process of collecting, processing and analysing data to generate insights that help you improve the way you do business." (Bernard Marr, "Data Strategy", 2017)

"Data for data’s sake is meaningless. Therefore, instead of hoarding data, collect only what you really need and what makes business sense." (Bernard Marr, "Data Strategy", 2017) [?!]

"Data is certainly exciting – revolutionary, even. But that doesn’t always mean useful. To be truly useful, in a business sense, data must address a specific business need, help the organization reach its strategic goals, or generate real value." (Bernard Marr, "Data Strategy", 2017)

"[…] from a data strategy point of view, you need to describe the ideal data sets that would help you achieve your strategic objectives. You can then choose the best options for you based on how well they help you achieve your objectives, how easy it is to access or gather that data, and how cost effective it is." (Bernard Marr, "Data Strategy", 2017)

"However you plan to use data, even if you plan to treat data as a key business asset, it is never a good idea to capture huge mountains of data that you don’t really need. Remember, the power of big data is not in the data it - self, it’s in how you use it." (Bernard Marr, "Data Strategy", 2017) [?!]

"I can’t stress enough how important this stage is; ‘selling’ big data to your people is a crucial early step on your data journey. It instils confidence in data." (Bernard Marr, "Data Strategy", 2017) [?!]

"[…] if companies want to avoid drowning in data, they need to develop a smart strategy that focuses on the data they really need to achieve their goals. In other words, this means defining the business-critical questions that need answering and then collecting and analysing only that data which will answer those questions." (Bernard Marr, "Data Strategy", 2017)

"Structured data is any data or information that is located in a fixed field within a defined record or file, usually in databases or spreadsheets. Essentially, it is data that is organized in a predetermined way, usually in rows and columns." (Bernard Marr, "Data Strategy", 2017)

"[…] the better insights are communicated, the more likely it is that data leads to positive action (in this case, better business decisions)." (Bernard Marr, "Data Strategy", 2017)

"Unfortunately, the widespread perception among business executives is that data and analytics are purely IT matters. And as with all IT matters, this means they don’t really need to understand how they work, or why." (Bernard Marr, "Data Strategy", 2017)

"When data isn’t properly looked after, it becomes meaningless and valueless. Even worse, if the data is out of date, incorrectly categorized, or used out of context, it can lead to misinformed decisions that can damage the long-term health of the company." (Bernard Marr, "Data Strategy", 2017) [?!]

11 November 2006

🎯🏭🗒️Sonia Mezzetta - Collected Quotes

"A data architecture needs to have the robustness and ability to support multiple data management and operational models to provide the necessary business value and agility to support an enterprise’s business strategy and capabilities." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"A data strategy must align with the business goals and overall framework of how data will be used and managed within an organization. It needs to include standards for how data will be discovered, integrated, accessed, shared, and protected. It needs to address how data will meet regulatory compliance policies, Master Data Management, and data democratization. There needs to be an assurance that both data and metadata have a quality control framework in place to achieve data trust. A data strategy needs to have a clear path on how an organization will accomplish data monetization." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"A data strategy is a living document that needs to be continuously updated to align with business goals. It should have a clear maintenance process with frequent reviews and identification of authors and stakeholders that will contribute to the data strategy. This also includes the handling of exceptions to a data strategy process for any one-off decisions in special circumstances. A data strategy document must always be easily assessable, to the point, and understandable." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Apply DataOps principles to the development and delivery of data. DataOps is a best practice framework that accelerates the development of data and quality across its entire life cycle with high efficiency and quality. This is especially important when integrating data across distributed complex systems and environments." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric’s building blocks represent groupings of different components and characteristics. They are high-level blocks that describe a package of capabilities that address specific business needs. The building blocks are Data Governance and its knowledge layer, Data Integration, and Self-Service." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a composable architecture made up of different tools, technologies, and systems. It has an active metadata and event-driven design that automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are paramount to its design and to enable Self-Service data sharing. The following figure summarizes the different characteristics that constitute a Data Fabric design." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric focuses on Self-Service data access via active metadata leveraging a composable set of tools and technologies. It offers the ability to discover, understand, and access data across hybrid and multi-cloud data landscapes with automation and Data Governance. It is primarily process and technology centric with flexibility in supporting diverse organizational models. On the other hand, Data Mesh is organizationally and process driven. It requires a technical implementation approach to execute its design. Data Mesh is at a higher level and Data Fabric is at a lower level. Data Fabric is capable of fulfilling Data Mesh’s key principles." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed and composable architecture that is metadata and event driven. It’s use case agnostic and excels in managing and governing distributed data. It integrates dispersed data with automation, strong Data Governance, protection, and security. Data Fabric focuses on the Self-Service delivery of governed data." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed data architecture that connects scattered data across tools and systems with the objective of providing governed access to fit-for-purpose data at speed. Data Fabric focuses on Data Governance, Data Integration, and Self-Service data sharing. It leverages a sophisticated active metadata layer that captures knowledge derived from data and its operations, data relationships, and business context. Data Fabric continuously analyzes data management activities to recommend value-driven improvements. Data Fabric works with both centralized and decentralized data systems and supports diverse operational models." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"[Data Fabric] is not a single technology, such as data virtualization. […] It is not a single tool like a data catalog and it doesn’t have to be a single data storage system like a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue. It doesn’t just support centralized data management but also federated and decentralized data management. It excels in connecting distributed data. Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric supports a federated, decentralized, or centralized organization. To participate in Data Fabric, metadata is contributed in an automated manner and knowledge is populated from it to propel data management. Data Fabric is different from a Data Mesh design in that it supports decentralized, federated, and centralized organizations. Data Fabric’s objectives are to help an organization to evolve to a more mature level of data management by leveraging active metadata, which is a core prerequisite." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Mesh is a design concept based on federated data and business domains. It applies product management thinking to data management with the outcome being Data Products. It’s technology agnostic and calls for a domain-centric organization with federated Data Governance." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Establish an organization’s data maturity level and progress toward ongoing improvement. An organization needs to first understand what its current data maturity level is to determine the areas of improvement to create a forward-looking plan. A data maturity assessment offers a position on the current data maturity that serves as an indicator of the health of an organization. A data maturity assessment can be used as a tool to drive continuous improvement by measuring progress. The key thing here is to always strive for continuous improvement to achieve success." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"I emphasize this point as there are views in the industry that Data Fabric is a centralized storage architecture, which is not the case from my point of view. A Data Fabric architecture is driven by the needs and direction of the business architecture." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Manage data as a strategic asset that evolves into a data product. The premise here is to stop managing data as a byproduct and create an ecosystem that manages data as a valuable strategic asset that can evolve into a data product. Data producers are accountable for managing the life cycle of data from creation to end of life and ensuring it creates business value along the way for data consumers. This requires data that is governed, trusted, protected, secure, and easily accessible. Move data from technical data assets to Data Products by operationalizing data for high scale sharing." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Where Data Mesh differs from Data Fabric is that it has fixed requirements for the Self-Service platform focused on organizing and managing Data Products by business domain. Another difference is Data Fabric supports managing data as an asset and as a product. A Data Product can be composed of assets that have been governed and managed in a Data Fabric architecture. Data Fabric does not have these fixed requirements, although it inherently supports isolating data and Data Governance enforcement via metadata by business domain. You can think of a Data Mesh Self-Service data platform as supporting separate, independent companies (business domains), although the key criteria are that it does not create data silos and attains data sharing across these companies in a secure, quick, and easy manner. In Data Mesh, Data Products are created and managed by federated business domains and a data platform requires capabilities that enable data and policy federation. This is where a Data Fabric solution can also address Data Mesh’s requirements." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

🔢Charles D Tupper - Collected Quotes

"An architecture is the response to the integrated collections of models and views within the problem area being examined." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"An architecture represents combined perspectives in a structured format that is easily viewable and explains the context of the area being analyzed to all those viewing it." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Analyzing and defining an area must be done prior to doing any activity within that area. Without understanding all that must be done, incorrect assumptions can be reached. Short-term vision may handicap future development. Inappropriate scoping may produce artificial boundaries where there should be none." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Data architecture allows strategic development of flexible modular designs by insulating the data from the business as well as the technology process." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Methodologies provide guidelines for the application development process. They specify analysis and design techniques as well as the stages in which they occur. They also develop event sequencing. Lastly, they specify milestones and work products that must be created and the appropriate documentation that should be generated." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Data architectures are the heart of business functionality. Given the proper data architecture, all possible functions can be completed within the enterprise easily and expeditiously." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Processes that use data change far more frequently than the data structures themselves." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"The enterprise architecture delineates the data according to the inherent structure within the organization rather than by organizational function or use. In this manner it makes the data dependent on business objects but independent of business processes." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Using architecture leads to foundational stability, not rigidity. As long as the appropriate characteristics are in place to ensure positive architectural evolution, the architecture will remain a living construct. Well-developed architectures are frameworks that evolve as the business evolves." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

10 November 2006

🔢Pearl Zhu - Collected Quotes

"A good strategy tells you not only what specifically needs to accomplish, but WHY." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Agile is more a 'direction', than an 'end'. Transforming to Agile culture means the business knows the direction they want to go on." (Pearl Zhu, "Digital Agility: The Rocky Road from Doing Agile to Being Agile", 2016)

"Breaking rules is indeed an important part of creativity. Innovation needs a level of guidance." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Good governance is less about structure and rules than being focused, effective and accountable." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Governance is not about maximization, but about optimization." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Selecting the right measure and measuring things right are both art and science. And KPIs influence management behavior as well as business culture." (Pearl Zhu, "CIO Master: Unleash the Digital Potential of It", 2016)

"Setting the right priorities or having superior time management skill means knowing the difference between 'must have', and 'nice to have'." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"The art of questioning is to ignite innovative thinking; the science of questioning is to frame system thinking, with the progressive pursuit of better solutions." (Pearl Zhu, "Leadership Master: Five Digital Trends to Leap Leadership Maturity", 2016)

"The 'result' of micromanagement is perhaps tangible in the short run, but more often causes damage for the long term." (Pearl Zhu, "Change Insight: Change as an Ongoing Capability to Fuel Digital Transformation", 2016)

"Using two-dimensional lenses to perceive the multi-faceted world can limit your ability to observe the world more objectively." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"A 'roadmap' is simply a plan for moving or transitioning, from one state to another. A roadmap provides the direction to the future." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A well-defined set of digital rules are not for limiting innovation, but for setting the frame of relevance and guide through changes and digital transformation." (Pearl Zhu, "100 Digital Rules: Setting Guidelines to Explore Digital New Normal", 2017)

"Building a comprehensive problem-solving framework is about leveraging a structured methodology that allows you to frame problems systematically and solve problems creatively." (Pearl Zhu, "Problem Solving Master: Frame Problems Systematically and Solve Problem Creatively", 2017)

"Decision makers with emotional excellence have the ability to dispassionately examine alternatives via fact finding, analysis, structured planning, objective evaluations, and comparison." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision making is an art only until the person understands the science." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision maturity is to ensure the right decisions have been made by the right people at the right time to solve the right problems." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Digital synchronization and strategic alignment occur when all parts of the choir sing their respective parts in harmony to achieve a higher purpose." (Pearl Zhu, "12 CIO Personas: The Digital CIO's Situational Leadership Practices", 2017)

"Digitalization implies the full-scale changes in the way business is conducted so that it’s a multi-dimensional planning and orchestration." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"Framing the right problem is equally or even more important than solving it." (Pearl Zhu, “Change, Creativity and Problem-Solving”, 2017)

"Most organizations fail to manage performance effectively because they fail to look into the system holistically." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"The science of decision-making is to make sure there is an effective decision process in place." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"It is important to strengthen the weakest link, to ensure all important business elements integrated and knitted into ongoing organizational capabilities and unique business competency." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"The simplicity and the complexity are just the opposite ends of the same spectrum." (Pearl Zhu, "Digital Gaps: Bridging Multiple Gaps to Run Cohesive Digital Business", 2017)

"We are moving slowly into an era where Big Data is the starting point, not the end." (Pearl Zhu, "Digital Master: Debunk the Myths of Enterprise Digital Maturity", 2017)

"You can’t improve what you are not managing, you can’t manage what you are not measuring, and you can’t measure what you are not focusing." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A business ecosystem is just like the natural ecosystem; first, needs to be understood, then, needs to be well planned, and also needs to be thoughtfully renewed as well." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"A seamless digital transformation requires a vision to convey 'WHY', a solid strategy to clarify 'WHAT', and a technical specification to articulate 'HOW' you want to transform radically." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"An organizational structure carries inherent capabilities as to what can be achieved within its frame." (Pearl Zhu, Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight, 2018)

"Change Management is a journey, not just a one-time project, riding ahead of change curve takes both strategy and methodology." (Pearl Zhu, "The Change Agent CIO: The CIO’s Dynamic Role of Leading Digitalization", 2018)

"Coherence improves business flow; resilience makes business robust and anti-fragile." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"Going digital is more like a journey than a destination. Predicting and preparing the next level of digitalization is an iterative learning and doing continuum." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Ideally, the two structures - hierarchy, and relationship structure wrap around each other to ensure responsibility, to keep information flow and the creation of power." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Taking the multidimensional hybrid models for going digital is all about how to strike the right balance of reaping quick wins and focusing on the long-term strategic goals." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"The most effective digital workplace is one where collaboration and sharing are the norms." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

08 November 2006

🔢Robert Hawker - Collected Quotes

"[...] a conceptual data model [...] is system-agnostic and is a diagrammatic business representation of how different types of data are associated with one another in the organization." (Robert Hawker, "Practical Data Quality", 2023)

"A data quality rule is logic that is applied to each row of a dataset, which can determine whether the row of data is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed the rule – hence, the term failed data [...]" (Robert Hawker, "Practical Data Quality", 2023)

"Correction of data in the secondary source is not recommended. However, it is important to recognize that sometimes, secondary source fixes are required." (Robert Hawker, "Practical Data Quality", 2023)

"Data discovery is the process where an organization obtains an understanding of which data matters the most and identifies challenges with that data. The outcome of data discovery is that the scope of a data quality initiative should be clear and data quality rules can be defined." (Robert Hawker, "Practical Data Quality", 2023)

"Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column." (Robert Hawker, "Practical Data Quality", 2023)

"Data quality rules are only effective if they are tightly scoped. Generic rules tend to produce a lot of unwanted failed records, and business users start to ignore the results. Once business users lose faith in what they see from a data quality tool, it is hard to restore engagement." (Robert Hawker, "Practical Data Quality", 2023)

"Every data quality initiative is different, and senior stakeholders at different organizations will have different needs." (Robert Hawker, "Practical Data Quality", 2023)

"If an organization had a single overall data quality key performance indicator (KPI), then it might be appropriate to put a greater weighting on those rules which would impact regulatory compliance. A lack of regulatory compliance is a risk to the very existence of organizations like these, and therefore, a greater weighting might be needed." (Robert Hawker, "Practical Data Quality", 2023)

"It rarely makes sense to aim for what people might consider perfect data (every record is complete, accurate, and up to date). The investment required is usually prohibitive, and the gains made for the last 1% of data quality improvement effort become far too marginal." (Robert Hawker, "Practical Data Quality", 2023)

"In truth, no one knows how much bad data quality costs a company – even companies with mature data quality initiatives in place, who are measuring hundreds of data points for their quality struggle to accurately measure quantitative impact. This is often a deal-breaker for senior leaders when trying to get approval for a budget for data quality work. Data quality initiatives often seek substantial budgets and are up against projects with more tangible benefits." (Robert Hawker, "Practical Data Quality", 2023)

"Momentum is important in data quality initiatives. If an issue is problematic, even where the priority is high, it can be better to move on to an issue that can be progressed efficiently." (Robert Hawker, "Practical Data Quality", 2023)

"Most data quality issues will re-occur if the root cause is not fully understood [...]" (Robert Hawker, "Practical Data Quality", 2023)

"Organizations will always only have a limited amount of resources available to remediate data. It will almost certainly not be possible to tackle all the issues at the same time. Therefore, prioritization is key to ensuring that the most value is generated from the available resources." (Robert Hawker, "Practical Data Quality", 2023)

"Successful organizations try to put a holistic data culture in place. Everyone is educated on the basics of looking after data and the importance of having good data. They consider what they have learned when performing their day-to-day tasks. This is often referred to as the promotion of good data literacy." (Robert Hawker, "Practical Data Quality", 2023)

"The biggest mistake that can be made in a data quality initiative is focusing on the wrong data. If you fix data that does not impact a critical business process or drive important decisions, your initiative simply will not make the difference that you want it to." (Robert Hawker, "Practical Data Quality", 2023)

"The data should be monitored in the source, it should be corrected in the source, and it should then feed the secondary source(s) with high-quality data that can be used without workarounds. The reduction in workarounds will make the data engineers, scientists, and data visualization specialists much more productive." (Robert Hawker, "Practical Data Quality", 2023)

"The level of data quality in an organization is the extent to which data can be used for its intended purposes." (Robert Hawker, "Practical Data Quality", 2023)

"Start with a business strategy. Too many organizations start their data quality initiative by looking at the details of the data and trying to see 'what is wrong with it'. The right approach is to understand what the business is trying to achieve and to work out where data issues might impede this. It ensures that data quality work will be truly impactful." (Robert Hawker, "Practical Data Quality", 2023)

04 November 2006

🔢Dhanurjay "DJ" Patil - Collected Quotes

"[...] a good definition of a data product is a product that facilitates an end goal through the use of data. It’s tempting to think of a data product purely as a data problem. After all, there’s nothing more fun than throwing a lot of technical expertise and fancy algorithmic work at a difficult problem." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"As data scientists, we prefer to interact with the raw data. We know how to import it, transform it, mash it up with other data sources, and visualize it. Most of your customers can’t do that. One of the biggest challenges of developing a data product is figuring out how to give data back to the user. Giving back too much data in a way that’s overwhelming and paralyzing is 'data vomit'. It’s natural to build the product that you would want, but it’s very easy to overestimate the abilities of your users. The product you want may not be the product they want." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"By giving data back to the user, you can create both engagement and revenue. We’re far enough into the data game that most users have realized that they’re not the customer, they’re the product. Their role in the system is to generate data, either to assist in ad targeting or to be sold to the highest bidder, or both." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Generalizing beyond advertising, when building any data product in which the data is obfuscated (where there isn’t a clear relationship between the user and the result), you can compromise on precision, but not on recall. But when the data is exposed, focus on high precision." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Ideas for data products tend to start simple and become complex; if they start complex, they become impossible." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"In many applications, a design treatment that gives the user control over the outcome can go far to create interactions that leave the user feeling good." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The best way to avoid data vomit is to focus on actionability of data. That is, what action do you want the user to take? If you want them to be impressed with the number of things that you can do with the data, then you’re likely producing data vomit. If you’re able to lead them to a clear set of actions, then you’ve built a product with a clear focus." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The key aspect of making a data product is putting the 'product' first and 'data' second. Saying it another way, data is one mechanism by which you make the product user-focused. With all products, you should ask yourself the following three questions: (1) What do you want the user to take away from this product? (2) What action do you want the user to take because of the product? (3) How should the user feel during and after using your product?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"You can give your data product a better chance of success by carefully setting the users’ expectations. [...] One under-appreciated facet of designing data products is how the user feels after using the product. Does he feel good? Empowered? Or disempowered and dejected?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see." (Dhanurjay Patil, "A Code of Ethics for Data Science", 2016)