14 November 2006

Zhamak Dehghani - Collected Quotes

"A data pipeline is a series of transformation steps (functions) executed as the data flows from one step to another. Data mesh refrains from using pipelines as a top-level architectural paradigm and in between data products. The challenge with pipelines as currently used is that they don’t create clear interfaces, contracts, and abstractions that can be maintained easily as the pipeline complexity complexity grows. Due to lack of abstractions, single failure in the pipeline causes cascading failures." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product encapsulates more than just the data. It needs to contain all the structural components needed to manifest its baseline usability characteristics - discoverable, understandable, addressable, etc. - in an autonomous fashion, while continuing to share data in a compliant and secure manner."(Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product’s primary job is to consume data from upstream sources using its input data ports, transform it, and serve the result as permanently accessible data via its output data ports." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data lake architecture suffers from complexity and deterioration. It creates complex and unwieldy pipelines of batch or streaming jobs operated by a central team of hyper-specialized data engineers. It deteriorates over time. Its unmanaged datasets, which are often untrusted and inaccessible, provide little value. The data lineage and dependencies are obscured and hard to track." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data is a collection of facts put together according to a model. The data model is an approximation of reality, good enough for the (analytical) tasks at hand." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In addition to limitations of scale, other challenges of data centralization are data quality and resilience to change. This is because business domains and teams that are most familiar with the data are not responsible for data quality." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] the governance function is accountable to define what constitutes data quality and how each data product communicates that in a standard way. It’s no longer accountable for the quality of each data product. The platform team is accountable to build capabilities to validate the quality of the data and communicate its quality metrics, and each domain (data product owner) is accountable to adhere to the quality standards and provide quality data products." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data management of the future must build in embracing change, by default. Rigid data modeling and querying languages that expect to put the system in a straitjacket of a never-changing schema can only result in a fragile and unusable analytics system. [...] The data management of the future must support managing and accessing data across multiple hosting platforms, by default." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-term interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh focuses on the impact of the data and not its volumes. It values data usability, data satisfaction, data availability, and data quality over the volume of the data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] data mesh introduces a fundamental shift that the owners of the data products must communicate and guarantee an acceptable level of quality and trustworthiness - specific to their domain - as an intrinsic characteristic of their data product. This means cleansing and running automated data integrity tests at the point of the creation of a data product." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is an element of a data strategy that fosters a data-driven organization to get value from data at scale." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In short, a monolithic architecture, technology, and organizational structure are not suitable for analytical data management of large-scale and complex organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed. It has high functional cohesion, i.e., performing a specific analytical transformation and securely sharing the result as domain-oriented analytical data. It has all the structural components that it requires to do its function: the transformation code, the data, the metadata, the policies that govern the data, and its dependencies to infrastructure." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"One of the limitations of data management solutions today is how we have attempted to manage its unwieldy complexity, how we have decomposed an ever-growing monolithic data platform and team to smaller partitions. We have chosen the path of least resistance, a technical partitioning." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"The distributed nature of data mesh demands immutability to give confidence to data users that (1) there is consistency between multiple data products for a point-in-time piece of data and (2) once they read data at a point in time, that data doesn’t change and they can reliably repeat the reads and processing." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"There are a set of characteristics that can be grouped together as quality. These attributes aren’t intended to define whether a data product is good or bad. They just communicate the threshold of guarantees the data product expects to meet, which may be well within an acceptable range for certain use cases." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Unlike other analytical data management paradigms, data mesh does not embrace the concept of the mythical single source of truth. Every data product provides a truthful portion of the reality - for a particular domain - to the best of its ability, a single slice of truth." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Ultimately, Data Mesh’s goal is to enable organizations to thrive in the face of the growth of data sources, growth of data users and use cases, and the increasing change in cadence and complexity. Adopting Data Mesh, organizations must thrive in agility, creating data-driven value while embracing change." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.