SQL Troubles: 🎯Hubert Dulay

05 November 2006

🎯Hubert Dulay - Collected Quotes

"A data fabric is a pattern that is very similar to a data mesh in that both provide solutions encompassing data governance and self-service: discovery, access, security, integration, transformation, and lineage. [...] In simple terms, a data fabric is a metadriven means of connecting disparate sets of data and related tools to provide a cohesive data experience and to deliver data in a self-service manner." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"A data fabric is an architectural approach to provide data access across multiple technologies and platforms, and is based on a technology solution. One key contrast is that a data mesh is much more than just technology: it is a pattern that involves people and processes. Instead of taking ownership of an entire data platform, as in a data fabric, the data mesh allows data producers to focus on data production, allows data consumers to focus on consumption, and allows hybrid teams to consume other data products, blend other data to create even more interesting data products, and publish these data products - with some data governance considerations in place." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"A domain has two main roles: data product engineer (or just data engineer) and the data product owner (or data product manager, or data steward). These roles can be the same or dedicated people in the domain. Data product owners must have a deep understanding of who their data consumers are, how the data is used, and what methods are used to consume the data. This will help ensure that the data products meet the needs of their use cases. Data product engineers are responsible for creating data products that are high quality, reliable, and usable by consumers. It should be possible to extend existing domain roles to include these domain roles with minimal effort." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"Consumability is a very important requirement because it will directly affect the experience domain consumers will have in a streaming data mesh. If other domains cannot easily consume streaming data products, then they may opt out of the streaming data mesh and decide to build their own integrations by hand, bypassing any issues they encounter with the data mesh. Some factors to consider when ingesting data derivatives that will affect the consumability of other domains are as follows: (*) Lack of scalability (*) Lack of interoperability" (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"Data governance creates access controls between the data product producer and consumer and provides metadata like schema definitions and lineages. In some cases, mastered data along with reference data may be relevant to the implementation. Data governance allows us to create appropriate access controls for these resources as well." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"Data governance is a set of policies, standards, processes, roles, and responsibilities that collectively ensure accountability and ownership of data across the business. Policies are the rules and regulations surrounding data defined by the business itself or, more importantly, externally by laws that, if broken, could cost a business a massive amount in fines. These policies also include enforcement of standards that enable interoperability and consumability of data between domains, especially in a decentralized data platform like a streaming data mesh. These policies are implemented as processes and controls on data by authorizing, authenticating, and safeguarding private or personal data. Policies are implemented using roles that represent groups, people, or systems to create access controls around data." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)
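
The quote's last point, that policies are enforced through roles standing for groups, people, or systems, is essentially role-based access control. The Python sketch below illustrates that idea under stated assumptions. The `Grant` record, the `AccessPolicy` class, and the role, principal, and data product names are hypothetical, not an API from the book or from any library. A real streaming data mesh would delegate this to its platform's authorization layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One policy entry (hypothetical schema): a role may perform an action on a data product."""
    role: str          # group, person, or system the policy applies to
    data_product: str  # name of the (hypothetical) data product
    action: str        # e.g. "read" or "publish"


class AccessPolicy:
    """Minimal role-based access control over data products (illustrative only)."""

    def __init__(self) -> None:
        self._grants: set[Grant] = set()
        self._members: dict[str, set[str]] = {}  # principal -> roles it holds

    def assign_role(self, principal: str, role: str) -> None:
        self._members.setdefault(principal, set()).add(role)

    def grant(self, role: str, data_product: str, action: str) -> None:
        self._grants.add(Grant(role, data_product, action))

    def is_authorized(self, principal: str, data_product: str, action: str) -> bool:
        # A principal is authorized if any of its roles holds a matching grant.
        return any(
            Grant(role, data_product, action) in self._grants
            for role in self._members.get(principal, set())
        )
```

Attaching grants to roles rather than to individual principals keeps the policy stable when people or systems join or leave a domain.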

"Data lineage is the path the data took from its source origin, the stops it made along the way, and its destination. This includes information on all the systems it passed through, how it was cleansed, what it was enriched with, and how it was secured. Capturing all that metadata is difficult because many of those systems and applications don’t share information. It’s up to you to assemble the data’s path by pulling metadata from all those systems/applications and assembling them in hopes that you find the path your data took from its current location (destination) to its source system. Lineage is probably the hardest piece of metadata to acquire for either streaming or batching data pipelines." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)
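
The quote describes lineage as assembling metadata pulled from several systems and following it from the destination back to the source. The Python sketch below shows that walk. It assumes a hypothetical record shape (`LineageHop`) and a simple linear pipeline where each dataset has at most one upstream input. Pipelines with joins would need a full graph traversal instead. The dataset and system names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageHop:
    """One metadata record pulled from a single system (hypothetical schema)."""
    dataset: str    # dataset produced by this step
    upstream: str   # dataset consumed by this step
    system: str     # system or application that ran the step
    operation: str  # e.g. "cleanse", "enrich", "mask"


def trace_to_source(destination: str, hops: list[LineageHop]) -> list[LineageHop]:
    """Walk backward from the destination, then return the path as source -> destination.

    Assumes a linear chain (at most one upstream per dataset).
    """
    by_dataset = {hop.dataset: hop for hop in hops}
    path: list[LineageHop] = []
    seen: set[str] = set()
    current = destination
    # Stop when no system reported metadata for `current`, or on a cycle.
    while current in by_dataset and current not in seen:
        seen.add(current)
        hop = by_dataset[current]
        path.append(hop)
        current = hop.upstream
    path.reverse()
    return path


hops = [
    LineageHop("orders_clean", "orders_raw", "stream-processor", "cleanse"),
    LineageHop("orders_enriched", "orders_clean", "stream-processor", "enrich"),
    LineageHop("orders_product", "orders_enriched", "catalog", "mask"),
]
for hop in trace_to_source("orders_product", hops):
    print(f"{hop.upstream} -> {hop.dataset} ({hop.operation} in {hop.system})")
```

If any one system fails to report its record, the chain simply ends early. This is the gap the quote warns about when systems do not share information.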

"Data lineage provides the entire history of the data: origin, transformations, enrichments, users who engineered the transformations, etc. Consumers of the data need to trust that the data they will be using is the correct data. Data lineage provides a perspective that creates trust. It does so by mapping out the steps for policies, standards, processes, roles, and responsibilities that were involved with the sourcing, transformation, enrichment, and cleansing of the data." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"Data mesh is not completely decentralized. The data is decentralized in domains, but the mesh part of data mesh is not. Data governance is critical in building the mesh in a data mesh." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"Data tags are a simple way of providing the consuming domains with more information about the streaming data product: how it was built and what to expect when consuming it. Many of the streaming data characteristics are hard to measure, like quality and security, so it’s sometimes tough to provide that important information to the consuming domain. Instead of providing a number or a score, we can provide tags that represent levels of quality and security." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)
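
To illustrate tags that stand for levels rather than numeric scores, here is a minimal Python sketch. The tag levels (`BRONZE`/`SILVER`/`GOLD` for quality and the security levels) and the `is_acceptable` check are hypothetical choices for illustration, not a taxonomy defined in the book.

```python
from dataclasses import dataclass, field
from enum import Enum


class QualityTag(Enum):
    BRONZE = "bronze"  # raw, minimally validated (hypothetical level)
    SILVER = "silver"  # cleansed and schema-validated (hypothetical level)
    GOLD = "gold"      # curated for broad consumption (hypothetical level)


class SecurityTag(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PII_MASKED = "pii-masked"
    RESTRICTED = "restricted"


# Explicit ordering so consumers can ask for "at least" a given quality level.
_QUALITY_RANK = {QualityTag.BRONZE: 0, QualityTag.SILVER: 1, QualityTag.GOLD: 2}


@dataclass
class StreamingDataProduct:
    name: str
    quality: QualityTag
    security: SecurityTag
    extra_tags: set[str] = field(default_factory=set)  # free-form descriptive tags


def is_acceptable(product: StreamingDataProduct,
                  min_quality: QualityTag,
                  allowed_security: set[SecurityTag]) -> bool:
    """A consuming domain's check: quality at or above a level, security in an allowed set."""
    return (_QUALITY_RANK[product.quality] >= _QUALITY_RANK[min_quality]
            and product.security in allowed_security)
```

Using enumerated levels instead of a score lets a consuming domain express requirements such as "at least silver, PII masked" without the producer having to measure quality or security precisely.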

"Domain-driven design (DDD) is the methodology that helps us understand complex domain models by connecting the data model itself to core business concepts. The understanding that emerges from DDD creates a foundation to designing distributed, microservice-based, client-facing applications. DDD connects the implementation of software and its components to an evolving and ever-changing data model. The domain is the world of the business you are working with and the problems you are trying to solve. This typically involves rules, processes, and existing systems that need to be integrated as part of your solution." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"In a data mesh, data is decentralized, while in a data fabric, centralization of data is allowed. And with data centralization like data lakes, you get the monolithic problems that come with it. Data mesh tries to apply a microservices approach to data by decomposing data domains into smaller and more agile groups." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"Since domains are used to create data products, and sharing data products across many domains ultimately builds a mesh of data, we need to ensure that the data being served follows some guidelines. Data governance involves creating and adhering to a set of global rules, standards, and policies applied to all data products and their interfaces to ensure a collaborative and interoperable data mesh community. These guidelines must be agreed upon among the participating data mesh domains." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"The best approach to building self-services for domains is to follow what many SaaS services do. They follow a serverless model that is easier for their users to understand and utilize in their applications. The intention of the serverless SaaS providers is to not require their users to worry about the 'servers' that are allocated on their behalf. Users can focus more on their business rather than managing and tuning servers. This should be the same model for self-services: to make a streaming data mesh serverless so domains need to focus only on their business." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"To overcome ambiguous domain challenges, each domain boundary must be distinct and explicit. Business area, processes, and data that belong together need to stay together. Additionally, each data domain should belong to one, and only one, Agile or DevOps team. Data integration points within a data domain should be manageable and understood by all team members." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"We recommend making domain boundaries concrete and immutable. This helps avoid lengthy discussions about who owns what data, and also prohibits teams from freely interpreting domain boundaries to suit their own needs. Creating a domain-oriented structure is a transition - not only for data, but for people and resources. When creating domain boundaries, resources may eventually align with other teams, disrupting and evolving the current team structure. The entire concept of data mesh is just as much about resource alignment as it is about data, so the realignment of resources should not be considered a roadblock as you go through this process." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"When building a data mesh, it is necessary to enable existing engineers in a domain to perform the tasks required. Domains have to capture data from their operational stores, transform (join or enrich, aggregate, balance) that data, and publish their data products to the data mesh. Self-service services are the “easy buttons” necessary to make data mesh easy to adopt with high usability. In summary, the self-services enable the domain engineers to take on many of the tasks the data engineer was responsible for across all lines of the business. A data mesh not only breaks up the monolithic data lake, but also breaks up the monolithic role of the data engineer into simple tasks the domain engineers can perform." (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)

"While a data mesh seeks to solve many of the same problems that a data fabric addresses - namely, the ability to address data in a single, composite data environment - the approach is different. While a data fabric enables users to create a single, virtual layer on top of distributed data, a data mesh further empowers distributed groups of data producers to manage and publish data as they see fit. Data fabrics allow for a low-to-no-code data virtualization experience by applying data integration within APIs that reside within the data fabric. The data mesh, however, allows for data engineers to write code for APIs with which to interface further. Without clearly defined boundaries, domains appear to be too interconnected, and ownership becomes either political or subject to interpretation. For instance, a large retailer most likely has multiple domains. [...]" (Hubert Dulay & Stephen Mooney, "Streaming Data Mesh", 2023)
