14 November 2006

🎯Zhamak Dehghani - Collected Quotes

"A data pipeline is a series of transformation steps (functions) executed as the data flows from one step to another. Data mesh refrains from using pipelines as a top-level architectural paradigm and in between data products. The challenge with pipelines as currently used is that they don’t create clear interfaces, contracts, and abstractions that can be maintained easily as the pipeline complexity grows. Due to lack of abstractions, single failure in the pipeline causes cascading failures." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product encapsulates more than just the data. It needs to contain all the structural components needed to manifest its baseline usability characteristics - discoverable, understandable, addressable, etc. - in an autonomous fashion, while continuing to share data in a compliant and secure manner." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product’s primary job is to consume data from upstream sources using its input data ports, transform it, and serve the result as permanently accessible data via its output data ports." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data lake architecture suffers from complexity and deterioration. It creates complex and unwieldy pipelines of batch or streaming jobs operated by a central team of hyper-specialized data engineers. It deteriorates over time. Its unmanaged datasets, which are often untrusted and inaccessible, provide little value. The data lineage and dependencies are obscured and hard to track." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data is a collection of facts put together according to a model. The data model is an approximation of reality, good enough for the (analytical) tasks at hand." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In addition to limitations of scale, other challenges of data centralization are data quality and resilience to change. This is because business domains and teams that are most familiar with the data are not responsible for data quality." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] the governance function is accountable to define what constitutes data quality and how each data product communicates that in a standard way. It’s no longer accountable for the quality of each data product. The platform team is accountable to build capabilities to validate the quality of the data and communicate its quality metrics, and each domain (data product owner) is accountable to adhere to the quality standards and provide quality data products." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data management of the future must build in embracing change, by default. Rigid data modeling and querying languages that expect to put the system in a straitjacket of a never-changing schema can only result in a fragile and unusable analytics system. [...] The data management of the future must support managing and accessing data across multiple hosting platforms, by default." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-team interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh focuses on the impact of the data and not its volumes. It values data usability, data satisfaction, data availability, and data quality over the volume of the data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"[...] data mesh introduces a fundamental shift that the owners of the data products must communicate and guarantee an acceptable level of quality and trustworthiness - specific to their domain - as an intrinsic characteristic of their data product. This means cleansing and running automated data integrity tests at the point of the creation of a data product." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is an element of a data strategy that fosters a data-driven organization to get value from data at scale." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In short, a monolithic architecture, technology, and organizational structure are not suitable for analytical data management of large-scale and complex organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed. It has high functional cohesion, i.e., performing a specific analytical transformation and securely sharing the result as domain-oriented analytical data. It has all the structural components that it requires to do its function: the transformation code, the data, the metadata, the policies that govern the data, and its dependencies to infrastructure." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"One of the limitations of data management solutions today is how we have attempted to manage its unwieldy complexity, how we have decomposed an ever-growing monolithic data platform and team to smaller partitions. We have chosen the path of least resistance, a technical partitioning." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"The distributed nature of data mesh demands immutability to give confidence to data users that (1) there is consistency between multiple data products for a point-in-time piece of data and (2) once they read data at a point in time, that data doesn’t change and they can reliably repeat the reads and processing." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"There are a set of characteristics that can be grouped together as quality. These attributes aren’t intended to define whether a data product is good or bad. They just communicate the threshold of guarantees the data product expects to meet, which may be well within an acceptable range for certain use cases." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Unlike other analytical data management paradigms, data mesh does not embrace the concept of the mythical single source of truth. Every data product provides a truthful portion of the reality - for a particular domain - to the best of its ability, a single slice of truth." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Ultimately, Data Mesh’s goal is to enable organizations to thrive in the face of the growth of data sources, growth of data users and use cases, and the increasing change in cadence and complexity. Adopting Data Mesh, organizations must thrive in agility, creating data-driven value while embracing change." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

13 November 2006

🔢Sid Adelman - Collected Quotes

"Data archeology (finding bad data), data cleansing (correcting bad data), and data quality enforcement (preventing data defects at the source) should be business objectives. Therefore, data quality initiatives are business initiatives and require the involvement of business people, such as information consumers and data originators." (Sid Adelman et al, "Data Strategy", 2005)

"Data strategy is one of the most ubiquitous and misunderstood topics in the information technology (IT) industry. Most corporations' data strategy and IT infrastructure were not planned, but grew out of 'stovepipe' applications over time with little to no regard for the goals and objectives of the enterprise. This stovepipe approach has produced the highly convoluted and inflexible IT architectures so prevalent in corporations today." (Sid Adelman et al, "Data Strategy", 2005)

"Dealing with [...] resistance is where social sensitivity, leadership, and power come into play. Social sensitivity is the ability to read the players and respond appropriately to their concerns. Leadership and power can quickly overcome most resistance to change and allow you to establish an environment and convince management to properly support the data strategy." (Sid Adelman et al, "Data Strategy", 2005)

"It is important to remember that the 'single version of the truth' - or enterprise logical data model - is not and should not be built all at once (that would take too long), but that it evolves over time as the project-specific logical data models are merged, one-by-one, a project at a time." (Sid Adelman et al, "Data Strategy", 2005)

"The chaos without a data strategy is not as obvious, but the indicators abound: dirty data, redundant data, inconsistent data, the inability to integrate, poor performance, terrible availability, little accountability, users who are increasingly dissatisfied with the performance of IT, and the general feeling that things are out of control." (Sid Adelman et al, "Data Strategy", 2005)

"The data strategist is responsible for creating and maintaining the data strategy. This includes fully understanding the strategic goals of the organization. [...] The data strategist must know (or learn) the existing environment including the important internal databases, the external data that will be integrated, and the data quality characteristics. The data strategist must be aware of the data volumes expected in the next five years. [...] The data strategist must be aware of changes in the business that will require more complex transactions and queries. He or she must also be aware of governmental factors including regulations and governmental reporting requirements. The data strategist must know about the requirements of service level agreements (SLAs) for both performance and availability and be sure that the data strategy supports those SLAs (it's also likely that the data strategist would have input into creating those SLAs.) And finally, the data strategist must be wired into the politics of the organization so that his or her proposals will be pragmatic and accepted by management and staff." (Sid Adelman et al, "Data Strategy", 2005)

"The folks in IT don't like change if they believe it will diminish the power of the IT group. This is particularly true for managers. Managers put forward countless reasons why the organization should stay as is, especially if a change can decrease the number of employees they control because managers often equate headcount to power in the organization." (Sid Adelman et al, "Data Strategy", 2005) [?!]

"The vision of a data strategy that fits your organization has to conform to the overall strategy of IT, which in turn must conform to the strategy of the business. Therefore, the vision should conform to and support where the organization wants to be in 5 years." (Sid Adelman et al, "Data Strategy", 2005)

"Working without a data strategy is analogous to a company allowing each department and each person within each department to develop its own financial chart of accounts. This empowerment allows each person in the organization to choose his own numbering scheme. Existing charts of accounts would be ignored as each person exercises his or her own creativity." (Sid Adelman et al, "Data Strategy", 1st Ed., 2005)

"You cannot boil the ocean; you have to prioritize your data integration deliverables. An enterprise-wide data integration effort must be carved up into small iterative projects, starting with the most critical data and working down to the less significant data. The business people working with the data integration team must determine which data is most appropriate for integration. Some data might not be suitable for integration at all, such as department-specific data, highly secured data, and data that is too risky to integrate. The team also needs to look at historical data and decide how much of it to include in the data integration process." (Sid Adelman et al, "Data Strategy", 1st Ed., 2005)

🔢Ian Wallis - Collected Quotes

"A data strategy is the opportunity to bring data, one of the most important assets your organisation has, to the fore and to drive the future direction of the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"A data strategy which no longer reflects the priorities of the organisation as a whole is doomed to fail, and likely to struggle to keep any momentum beyond the immediate term." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"A KPI is a performance measure that demonstrates how effectively an organisation is achieving its critical objectives. They are used to track performance over a period of time to ensure the organisation is heading in the desired direction, and are quantifiable to guide whether activities need to be dialled up or down, resources adjusted or management resource focused on understanding what is in play that may be holding back the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Culture is not something that can be read in a corporate document (though many organisations will claim to have values, beliefs and other concepts that articulate the culture as the corporate centre wants it to be seen). It is intangible and can be challenging to comprehend to those on the outside looking in. Much of it is unspoken, a series of behavioural norms which are engrained in the fabric of the organisation and drive attitudes of employees to one another, management, change programmes and any external (to the group, as well as the organisation) effort to drive change that may be resisted simply because it ‘isn’t the way we do things around here’." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Data has a value, without which an organisation is largely a shell, worthless and of limited appeal other than as a means of sweeping up fixed assets at a knock-down price. It is the lifeblood of an organisation, so whether you regard it as the water that is essential to life or the blood circulating around the body, without it our organisations are not functional." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Data strategy is even less understood [than business strategy], so the chances of success can be further decreased, simply because you need organisation-wide commitment and buy-in to succeed. Data does not exist in a bubble; it is not the preserve of a function that can fix it for all, detached from touching everyone else. It is core to how you run the organisation, and without a focus on where you are heading, it is going to trip the organisation up at every turn - regulatory compliance; operational effectiveness; financial performance; customer and employee experience; essentially, the efficiency in managing virtually every activity in the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"I am using ‘data strategy’ as an overarching term to describe a far broader set of capabilities from which sub-strategies can be developed to focus on particular facets of the strategy, such as management information (MI) and reporting; analytics, machine learning and AI; insight; and, of course, data management." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"If there is one all too common a failing in data strategies, it is the temptation to make them too detailed through either straying into implementation activities or overplaying the content by providing too much information. The key is to recognise the level of information that needs to be imparted to make the data strategy coherent and likely to be endorsed, with as little information as is necessary to be able to make the point cogently. Brevity, and associated clarity in what needs to be achieved and why, is a winning formula in gaining senior executive sponsorship." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"It is also important to regard the data strategy as a living document. Do not regard it as a masterpiece, never to be reviewed, amended or critiqued within the time frame it covers, but instead see it as a strategy that can flex to the changing demands of an organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"[...] it is always useful to learn from past mistakes, but evidence shows that most strategies fail due to an inability to follow through into execution." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"In the same vein, data strategy is often a misnomer for a much wider scope of coverage, but the lack of coherence in how we use the language has led to data strategy being perceived to cover data management activities all the way through to exploitation of data in the broadest sense. The occasional use of information strategy, intelligence strategy or even data exploitation strategy may differentiate, but the lack of a common definition on what we mean tends to lead to data strategy being used as a catch-all for the more widespread coverage such a document would typically include. Much of this is due to the generic use of the term ‘data’ to cover everything from its capture, management, governance through to reporting, analytics and insight." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"Many organisations start a data strategy from a need to get data into some sort of organised state in which it is feasible to demonstrate compliance. In my opinion, compliance should be a component of a data strategy, not the data strategy in itself." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The challenge with using OKRs is to focus on just three to five objectives - sounds simple enough, but so many organisations follow the ‘if it moves, track it’ philosophy such that they can’t see the wood for the trees." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The key for a successful data strategy is to align it clearly with the corporate strategy. The data strategy is a crucial enabler of the corporate strategy, and the data strategy should clearly call out those components that have a clear line of sight to delivering, or enabling, the corporate goals. If the data strategy does not align to the corporate goals it will be a much more challenging task to get the wider organisation to buy into it, not least because it will fail to have any resonance with the objectives of the organisational leaders and be regarded as optional at best." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The KPI juggernaut has been misused and abused in too many organisations to the extent it has devalued the concept of KPIs. KPIs used well - the ten things that really matter to an organisation - can, in my experience, be a real galvanising force to get focus and attention put in those areas which really can make a difference. The rest is a distraction, there through some misplaced view that more adds value when actually it detracts through losing the focus from where it needs to be." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The nature of the change that the data strategy is to drive will be determined by the appetite and commitment of the organisation to change. It will also be shaped by the maturity of the organisation, with the maturity assessment process having identified and demonstrated where the gaps lie, and the resolve of the organisation to set its own pace and objectives to be achieved by the time of the next assessment." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The premise of OKRs is to keep objectives and results simple and flexible, ensuring they align with business goals and enterprise initiatives guided by regular reviews to assess progress during the quarter. The intent is to keep OKRs clear and accountable, as well as measurable, with between three and five objectives recommended at a high level that can each be tracked by three to five key measures. They should be ambitious goals, even uncomfortable, in challenging aspirations, making them stretch targets." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

🔢Bernard Marr - Collected Quotes

"A good data strategy is not determined by what data is readily or potentially available – it’s about what your business wants to achieve, and how data can help you get there." (Bernard Marr, "Data Strategy", 2017)

"A picture can paint a thousand words, as the saying goes. In this way, visuals are great for conveying information because they’re quick and direct, they’re memorable, and they add interest (being much more likely to hold the reader’s attention than a full page of text). But unless we know how to decode its message, a picture can also be difficult to read." (Bernard Marr, "Data Strategy", 2017)

"Analytics is the process of collecting, processing and analysing data to generate insights that help you improve the way you do business." (Bernard Marr, "Data Strategy", 2017)

"Data for data’s sake is meaningless. Therefore, instead of hoarding data, collect only what you really need and what makes business sense." (Bernard Marr, "Data Strategy", 2017) [?!]

"Data is certainly exciting – revolutionary, even. But that doesn’t always mean useful. To be truly useful, in a business sense, data must address a specific business need, help the organization reach its strategic goals, or generate real value." (Bernard Marr, "Data Strategy", 2017)

"[…] from a data strategy point of view, you need to describe the ideal data sets that would help you achieve your strategic objectives. You can then choose the best options for you based on how well they help you achieve your objectives, how easy it is to access or gather that data, and how cost effective it is." (Bernard Marr, "Data Strategy", 2017)

"However you plan to use data, even if you plan to treat data as a key business asset, it is never a good idea to capture huge mountains of data that you don’t really need. Remember, the power of big data is not in the data itself, it’s in how you use it." (Bernard Marr, "Data Strategy", 2017) [?!]

"I can’t stress enough how important this stage is; ‘selling’ big data to your people is a crucial early step on your data journey. It instils confidence in data." (Bernard Marr, "Data Strategy", 2017) [?!]

"[…] if companies want to avoid drowning in data, they need to develop a smart strategy that focuses on the data they really need to achieve their goals. In other words, this means defining the business-critical questions that need answering and then collecting and analysing only that data which will answer those questions." (Bernard Marr, "Data Strategy", 2017)

"Structured data is any data or information that is located in a fixed field within a defined record or file, usually in databases or spreadsheets. Essentially, it is data that is organized in a predetermined way, usually in rows and columns." (Bernard Marr, "Data Strategy", 2017)

"[…] the better insights are communicated, the more likely it is that data leads to positive action (in this case, better business decisions)." (Bernard Marr, "Data Strategy", 2017)

"Unfortunately, the widespread perception among business executives is that data and analytics are purely IT matters. And as with all IT matters, this means they don’t really need to understand how they work, or why." (Bernard Marr, "Data Strategy", 2017)

"When data isn’t properly looked after, it becomes meaningless and valueless. Even worse, if the data is out of date, incorrectly categorized, or used out of context, it can lead to misinformed decisions that can damage the long-term health of the company." (Bernard Marr, "Data Strategy", 2017) [?!]

11 November 2006

🎯🏭🗒️Sonia Mezzetta - Collected Quotes

"A data architecture needs to have the robustness and ability to support multiple data management and operational models to provide the necessary business value and agility to support an enterprise’s business strategy and capabilities." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"A data strategy must align with the business goals and overall framework of how data will be used and managed within an organization. It needs to include standards for how data will be discovered, integrated, accessed, shared, and protected. It needs to address how data will meet regulatory compliance policies, Master Data Management, and data democratization. There needs to be an assurance that both data and metadata have a quality control framework in place to achieve data trust. A data strategy needs to have a clear path on how an organization will accomplish data monetization." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"A data strategy is a living document that needs to be continuously updated to align with business goals. It should have a clear maintenance process with frequent reviews and identification of authors and stakeholders that will contribute to the data strategy. This also includes the handling of exceptions to a data strategy process for any one-off decisions in special circumstances. A data strategy document must always be easily accessible, to the point, and understandable." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Apply DataOps principles to the development and delivery of data. DataOps is a best practice framework that accelerates the development of data and quality across its entire life cycle with high efficiency and quality. This is especially important when integrating data across distributed complex systems and environments." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Automated data orchestration is a key DataOps principle. An example of orchestration can take ETL jobs and a Python script to ingest and transform data based on a specific sequence from different source systems. It can handle the versioning of data to avoid breaking existing data consumption pipelines already in place." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric’s building blocks represent groupings of different components and characteristics. They are high-level blocks that describe a package of capabilities that address specific business needs. The building blocks are Data Governance and its knowledge layer, Data Integration, and Self-Service." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a composable architecture made up of different tools, technologies, and systems. It has an active metadata and event-driven design that automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are paramount to its design and to enable Self-Service data sharing. The following figure summarizes the different characteristics that constitute a Data Fabric design." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric focuses on Self-Service data access via active metadata leveraging a composable set of tools and technologies. It offers the ability to discover, understand, and access data across hybrid and multi-cloud data landscapes with automation and Data Governance. It is primarily process and technology centric with flexibility in supporting diverse organizational models. On the other hand, Data Mesh is organizationally and process driven. It requires a technical implementation approach to execute its design. Data Mesh is at a higher level and Data Fabric is at a lower level. Data Fabric is capable of fulfilling Data Mesh’s key principles." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed and composable architecture that is metadata and event driven. It’s use case agnostic and excels in managing and governing distributed data. It integrates dispersed data with automation, strong Data Governance, protection, and security. Data Fabric focuses on the Self-Service delivery of governed data." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed data architecture that connects scattered data across tools and systems with the objective of providing governed access to fit-for-purpose data at speed. Data Fabric focuses on Data Governance, Data Integration, and Self-Service data sharing. It leverages a sophisticated active metadata layer that captures knowledge derived from data and its operations, data relationships, and business context. Data Fabric continuously analyzes data management activities to recommend value-driven improvements. Data Fabric works with both centralized and decentralized data systems and supports diverse operational models." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"[Data Fabric] is not a single technology, such as data virtualization. […] It is not a single tool like a data catalog and it doesn’t have to be a single data storage system like a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue. It doesn’t just support centralized data management but also federated and decentralized data management. It excels in connecting distributed data. Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric supports a federated, decentralized, or centralized organization. To participate in Data Fabric, metadata is contributed in an automated manner and knowledge is populated from it to propel data management. Data Fabric is different from a Data Mesh design in that it supports decentralized, federated, and centralized organizations. Data Fabric’s objectives are to help an organization to evolve to a more mature level of data management by leveraging active metadata, which is a core prerequisite." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Mesh is a design concept based on federated data and business domains. It applies product management thinking to data management with the outcome being Data Products. It’s technology agnostic and calls for a domain-centric organization with federated Data Governance." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Establish an organization’s data maturity level and progress toward ongoing improvement. An organization needs to first understand what its current data maturity level is to determine the areas of improvement to create a forward-looking plan. A data maturity assessment offers a position on the current data maturity that serves as an indicator of the health of an organization. A data maturity assessment can be used as a tool to drive continuous improvement by measuring progress. The key thing here is to always strive for continuous improvement to achieve success." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"I emphasize this point as there are views in the industry that Data Fabric is a centralized storage architecture, which is not the case from my point of view. A Data Fabric architecture is driven by the needs and direction of the business architecture." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Manage data as a strategic asset that evolves into a data product. The premise here is to stop managing data as a byproduct and create an ecosystem that manages data as a valuable strategic asset that can evolve into a data product. Data producers are accountable for managing the life cycle of data from creation to end of life and ensuring it creates business value along the way for data consumers. This requires data that is governed, trusted, protected, secure, and easily accessible. Move data from technical data assets to Data Products by operationalizing data for high scale sharing." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Where Data Mesh differs from Data Fabric is that it has fixed requirements for the Self-Service platform focused on organizing and managing Data Products by business domain. Another difference is Data Fabric supports managing data as an asset and as a product. A Data Product can be composed of assets that have been governed and managed in a Data Fabric architecture. Data Fabric does not have these fixed requirements, although it inherently supports isolating data and Data Governance enforcement via metadata by business domain. You can think of a Data Mesh Self-Service data platform as supporting separate, independent companies (business domains), although the key criteria are that it does not create data silos and attains data sharing across these companies in a secure, quick, and easy manner. In Data Mesh, Data Products are created and managed by federated business domains and a data platform requires capabilities that enable data and policy federation. This is where a Data Fabric solution can also address Data Mesh’s requirements." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

🔢Charles D Tupper - Collected Quotes

"An architecture is the response to the integrated collections of models and views within the problem area being examined." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"An architecture represents combined perspectives in a structured format that is easily viewable and explains the context of the area being analyzed to all those viewing it." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Analyzing and defining an area must be done prior to doing any activity within that area. Without understanding all that must be done, incorrect assumptions can be reached. Short-term vision may handicap future development. Inappropriate scoping may produce artificial boundaries where there should be none." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Data architecture allows strategic development of flexible modular designs by insulating the data from the business as well as the technology process." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Methodologies provide guidelines for the application development process. They specify analysis and design techniques as well as the stages in which they occur. They also develop event sequencing. Lastly, they specify milestones and work products that must be created and the appropriate documentation that should be generated." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Data architectures are the heart of business functionality. Given the proper data architecture, all possible functions can be completed within the enterprise easily and expeditiously." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Processes that use data change far more frequently than the data structures themselves." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"The enterprise architecture delineates the data according to the inherent structure within the organization rather than by organizational function or use. In this manner it makes the data dependent on business objects but independent of business processes." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Using architecture leads to foundational stability, not rigidity. As long as the appropriate characteristics are in place to ensure positive architectural evolution, the architecture will remain a living construct. Well-developed architectures are frameworks that evolve as the business evolves." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

10 November 2006

🔢Pearl Zhu - Collected Quotes

"A good strategy tells you not only what specifically needs to accomplish, but WHY." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Agile is more a 'direction', than an 'end'. Transforming to Agile culture means the business knows the direction they want to go on." (Pearl Zhu, "Digital Agility: The Rocky Road from Doing Agile to Being Agile", 2016)

"Breaking rules is indeed an important part of creativity. Innovation needs a level of guidance." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Good governance is less about structure and rules than being focused, effective and accountable." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Governance is not about maximization, but about optimization." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Selecting the right measure and measuring things right are both art and science. And KPIs influence management behavior as well as business culture." (Pearl Zhu, "CIO Master: Unleash the Digital Potential of It", 2016)

"Setting the right priorities or having superior time management skill means knowing the difference between 'must have', and 'nice to have'." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"The art of questioning is to ignite innovative thinking; the science of questioning is to frame system thinking, with the progressive pursuit of better solutions." (Pearl Zhu, "Leadership Master: Five Digital Trends to Leap Leadership Maturity", 2016)

"The 'result' of micromanagement is perhaps tangible in the short run, but more often causes damage for the long term." (Pearl Zhu, "Change Insight: Change as an Ongoing Capability to Fuel Digital Transformation", 2016)

"Using two-dimensional lenses to perceive the multi-faceted world can limit your ability to observe the world more objectively." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"A 'roadmap' is simply a plan for moving or transitioning, from one state to another. A roadmap provides the direction to the future." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A well-defined set of digital rules are not for limiting innovation, but for setting the frame of relevance and guide through changes and digital transformation." (Pearl Zhu, "100 Digital Rules: Setting Guidelines to Explore Digital New Normal", 2017)

"Building a comprehensive problem-solving framework is about leveraging a structured methodology that allows you to frame problems systematically and solve problems creatively." (Pearl Zhu, "Problem Solving Master: Frame Problems Systematically and Solve Problem Creatively", 2017)

"Decision makers with emotional excellence have the ability to dispassionately examine alternatives via fact finding, analysis, structured planning, objective evaluations, and comparison." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision making is an art only until the person understands the science." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision maturity is to ensure the right decisions have been made by the right people at the right time to solve the right problems." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Digital synchronization and strategic alignment occur when all parts of the choir sing their respective parts in harmony to achieve a higher purpose." (Pearl Zhu, "12 CIO Personas: The Digital CIO's Situational Leadership Practices", 2017)

"Digitalization implies the full-scale changes in the way business is conducted so that it’s a multi-dimensional planning and orchestration." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"Framing the right problem is equally or even more important than solving it." (Pearl Zhu, "Change, Creativity and Problem-Solving", 2017)

"Most organizations fail to manage performance effectively because they fail to look into the system holistically." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"The science of decision-making is to make sure there is an effective decision process in place." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"It is important to strengthen the weakest link, to ensure all important business elements integrated and knitted into ongoing organizational capabilities and unique business competency." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"The simplicity and the complexity are just the opposite ends of the same spectrum." (Pearl Zhu, "Digital Gaps: Bridging Multiple Gaps to Run Cohesive Digital Business", 2017)

"We are moving slowly into an era where Big Data is the starting point, not the end." (Pearl Zhu, "Digital Master: Debunk the Myths of Enterprise Digital Maturity", 2017)

"You can’t improve what you are not managing, you can’t manage what you are not measuring, and you can’t measure what you are not focusing." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A business ecosystem is just like the natural ecosystem; first, needs to be understood, then, needs to be well planned, and also needs to be thoughtfully renewed as well." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"A seamless digital transformation requires a vision to convey 'WHY', a solid strategy to clarify 'WHAT', and a technical specification to articulate 'HOW' you want to transform radically." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"An organizational structure carries inherent capabilities as to what can be achieved within its frame." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Change Management is a journey, not just a one-time project, riding ahead of change curve takes both strategy and methodology." (Pearl Zhu, "The Change Agent CIO: The CIO’s Dynamic Role of Leading Digitalization", 2018)

"Coherence improves business flow; resilience makes business robust and anti-fragile." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"Going digital is more like a journey than a destination. Predicting and preparing the next level of digitalization is an iterative learning and doing continuum." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Ideally, the two structures - hierarchy, and relationship structure wrap around each other to ensure responsibility, to keep information flow and the creation of power." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Taking the multidimensional hybrid models for going digital is all about how to strike the right balance of reaping quick wins and focusing on the long-term strategic goals." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"The most effective digital workplace is one where collaboration and sharing are the norms." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

🎯Rukmani Gopalan - Collected Quotes

"A cloud data warehouse is an enterprise data warehouse offered as a managed service (PaaS) on public clouds with optimized integrations for data ingestion, analytics processing, and BI analytics." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Churn refers to rapidly changing the activities and your plan when they are in flux - this is disruptive to your organization and slows your progress. Change refers to an inevitable movement in requirements and helps you plan for and execute this movement thoughtfully." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Data mesh relies on a distributed architecture that consists of domains. Each domain is an independent unit of data and its associated storage and compute components. When an organization contains various product units, each with its own data needs, each product team owns a domain that is operated and governed independently by the product team. […] Data mesh has a unique value proposition, not just offering scale of infrastructure and scenarios but also helping shift the organization’s culture around data." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"If there is one thing I strongly recommend, it is to invest in a cloud data lake and start collecting and processing data that you believe is useful to your organization today." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"It’s true that data and data strategy are critical to the organization; however, it’s also true that data by itself is a means to the end of business or customer impact unless you’re a provider of data or data-related services." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Plan for customer impact, and prepare to learn and fine-tune as you progress. Make choices based on the impact they offer to customers, and stay consistent in your implementation while keeping open-minded for learnings. Especially if you are an early adopter of a technology, you can help develop the technology with the provider and thus get ample support from the technology provider in return. Similarly, identify highly motivated early adopters within your customer base and offer to develop your solution with them." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Real-time stream processing refers to the ingestion, processing, and consumption of data with a specific focus on speed, targeting near real time - that is, almost instantaneous results. […] Real-time stream processing pipelines involve data that is arriving from its source at very high velocity; in other words, it is data that is streaming into the system, just like rain or a waterfall." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"The lakehouse provides a key advantage over the modern data warehouse by eliminating the need to have two places to store the same data. [...] Data lakehouses offer the key benefit of being able to run performant BI/SQL-based scenarios directly on the data lake, right alongside the other exploratory data science and machine learning scenarios." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"The promise of a cloud data lake architecture lies in the boundless diversity of scenarios that it enables." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"The very simple definition of cloud data lake storage is a service available as a cloud offering that can serve as a central repository for all kinds of data (structured, unstructured, and semistructured) and can support data and transactions at a large scale." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"When it comes to data lakes, some things usually stay constant: the storage and processing patterns. Change could come in any of the following ways: Adding new components and processing or consumption patterns to respond to new requirements. […] Optimizing existing architecture for better cost or performance" (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

08 November 2006

🔢Robert Hawker - Collected Quotes

"[...] a conceptual data model [...] is system-agnostic and is a diagrammatic business representation of how different types of data are associated with one another in the organization." (Robert Hawker, "Practical Data Quality", 2023)

"A data quality rule is logic that is applied to each row of a dataset, which can determine whether the row of data is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed the rule – hence, the term failed data [...]" (Robert Hawker, "Practical Data Quality", 2023)
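
To make the row-by-row logic concrete, here is a minimal Python sketch. The rule (`country_code_present`), the field names, and the sample records are hypothetical illustrations, not taken from Hawker's book; the point is only that a rule is a per-row predicate, and the rows for which it returns false form the failed data.

```python
from typing import Callable, Iterable

Row = dict[str, object]

def apply_rule(rows: Iterable[Row], rule: Callable[[Row], bool]) -> tuple[list[Row], list[Row]]:
    """Evaluate the rule on every row and split the rows into passed and failed data."""
    passed: list[Row] = []
    failed: list[Row] = []
    for row in rows:
        (passed if rule(row) else failed).append(row)
    return passed, failed

# Hypothetical rule: a customer record needs a two-letter country code.
def country_code_present(row: Row) -> bool:
    code = row.get("country_code")
    return isinstance(code, str) and len(code) == 2

customers = [
    {"id": 1, "country_code": "DE"},
    {"id": 2, "country_code": ""},  # empty value: fails
    {"id": 3},                      # missing field: fails
]
passed, failed = apply_rule(customers, country_code_present)
print([r["id"] for r in failed])  # [2, 3]
```

A tightly scoped predicate like this one is easy to explain to business users, which matters given the later point about generic rules producing noise.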

"Correction of data in the secondary source is not recommended. However, it is important to recognize that sometimes, secondary source fixes are required." (Robert Hawker, "Practical Data Quality", 2023)

"Data discovery is the process where an organization obtains an understanding of which data matters the most and identifies challenges with that data. The outcome of data discovery is that the scope of a data quality initiative should be clear and data quality rules can be defined." (Robert Hawker, "Practical Data Quality", 2023)

"Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column." (Robert Hawker, "Practical Data Quality", 2023)
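
A minimal Python sketch of column profiling along the dimensions the quote lists (values, string lengths, completeness, distribution). Treating `None` and the empty string as missing, the function names, and the sample rows are assumptions of this illustration; dedicated profiling tools compute many more statistics.

```python
from collections import Counter

def profile_column(values: list) -> dict:
    """Summarize one column: completeness, string lengths, and value distribution."""
    # Assumption: None and "" both count as missing values.
    present = [v for v in values if v not in (None, "")]
    lengths = [len(str(v)) for v in present]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "distinct_values": len(set(present)),
        "top_values": Counter(present).most_common(3),
    }

def profile(rows: list[dict], columns: list[str]) -> dict:
    """Profile each named column across all rows."""
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}

# Hypothetical sample: one missing city, one empty zip, one suspiciously short zip.
rows = [
    {"city": "Berlin", "zip": "10115"},
    {"city": "Berlin", "zip": ""},
    {"city": "Munich", "zip": "80331"},
    {"city": None, "zip": "8033"},
]
report = profile(rows, ["city", "zip"])
print(report["city"]["completeness"])                              # 0.75
print(report["zip"]["min_length"], report["zip"]["max_length"])    # 4 5
```

The length range already hints at a candidate rule (zip codes of a fixed length), which is how profiling typically feeds rule definition.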

"Data quality rules are only effective if they are tightly scoped. Generic rules tend to produce a lot of unwanted failed records, and business users start to ignore the results. Once business users lose faith in what they see from a data quality tool, it is hard to restore engagement." (Robert Hawker, "Practical Data Quality", 2023)

"Every data quality initiative is different, and senior stakeholders at different organizations will have different needs." (Robert Hawker, "Practical Data Quality", 2023)

"If an organization had a single overall data quality key performance indicator (KPI), then it might be appropriate to put a greater weighting on those rules which would impact regulatory compliance. A lack of regulatory compliance is a risk to the very existence of organizations like these, and therefore, a greater weighting might be needed." (Robert Hawker, "Practical Data Quality", 2023)

"It rarely makes sense to aim for what people might consider perfect data (every record is complete, accurate, and up to date). The investment required is usually prohibitive, and the gains made for the last 1% of data quality improvement effort become far too marginal." (Robert Hawker, "Practical Data Quality", 2023)

"In truth, no one knows how much bad data quality costs a company – even companies with mature data quality initiatives in place, who are measuring hundreds of data points for their quality struggle to accurately measure quantitative impact. This is often a deal-breaker for senior leaders when trying to get approval for a budget for data quality work. Data quality initiatives often seek substantial budgets and are up against projects with more tangible benefits." (Robert Hawker, "Practical Data Quality", 2023)

"Momentum is important in data quality initiatives. If an issue is problematic, even where the priority is high, it can be better to move on to an issue that can be progressed efficiently." (Robert Hawker, "Practical Data Quality", 2023)

"Most data quality issues will re-occur if the root cause is not fully understood [...]" (Robert Hawker, "Practical Data Quality", 2023)

"Organizations will always only have a limited amount of resources available to remediate data. It will almost certainly not be possible to tackle all the issues at the same time. Therefore, prioritization is key to ensuring that the most value is generated from the available resources." (Robert Hawker, "Practical Data Quality", 2023)

"Successful organizations try to put a holistic data culture in place. Everyone is educated on the basics of looking after data and the importance of having good data. They consider what they have learned when performing their day-to-day tasks. This is often referred to as the promotion of good data literacy." (Robert Hawker, "Practical Data Quality", 2023)

"The biggest mistake that can be made in a data quality initiative is focusing on the wrong data. If you fix data that does not impact a critical business process or drive important decisions, your initiative simply will not make the difference that you want it to." (Robert Hawker, "Practical Data Quality", 2023)

"The data should be monitored in the source, it should be corrected in the source, and it should then feed the secondary source(s) with high-quality data that can be used without workarounds. The reduction in workarounds will make the data engineers, scientists, and data visualization specialists much more productive." (Robert Hawker, "Practical Data Quality", 2023)

"The level of data quality in an organization is the extent to which data can be used for its intended purposes." (Robert Hawker, "Practical Data Quality", 2023)

"Start with a business strategy. Too many organizations start their data quality initiative by looking at the details of the data and trying to see 'what is wrong with it'. The right approach is to understand what the business is trying to achieve and to work out where data issues might impede this. It ensures that data quality work will be truly impactful." (Robert Hawker, "Practical Data Quality", 2023)

04 November 2006

🔢Dhanurjay "DJ" Patil - Collected Quotes

"[...] a good definition of a data product is a product that facilitates an end goal through the use of data. It’s tempting to think of a data product purely as a data problem. After all, there’s nothing more fun than throwing a lot of technical expertise and fancy algorithmic work at a difficult problem." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"As data scientists, we prefer to interact with the raw data. We know how to import it, transform it, mash it up with other data sources, and visualize it. Most of your customers can’t do that. One of the biggest challenges of developing a data product is figuring out how to give data back to the user. Giving back too much data in a way that’s overwhelming and paralyzing is 'data vomit'. It’s natural to build the product that you would want, but it’s very easy to overestimate the abilities of your users. The product you want may not be the product they want." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"By giving data back to the user, you can create both engagement and revenue. We’re far enough into the data game that most users have realized that they’re not the customer, they’re the product. Their role in the system is to generate data, either to assist in ad targeting or to be sold to the highest bidder, or both." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Generalizing beyond advertising, when building any data product in which the data is obfuscated (where there isn’t a clear relationship between the user and the result), you can compromise on precision, but not on recall. But when the data is exposed, focus on high precision." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Ideas for data products tend to start simple and become complex; if they start complex, they become impossible." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"In many applications, a design treatment that gives the user control over the outcome can go far to create interactions that leave the user feeling good." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The best way to avoid data vomit is to focus on actionability of data. That is, what action do you want the user to take? If you want them to be impressed with the number of things that you can do with the data, then you’re likely producing data vomit. If you’re able to lead them to a clear set of actions, then you’ve built a product with a clear focus." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The key aspect of making a data product is putting the 'product' first and 'data' second. Saying it another way, data is one mechanism by which you make the product user-focused. With all products, you should ask yourself the following three questions: (1) What do you want the user to take away from this product? (2) What action do you want the user to take because of the product? (3) How should the user feel during and after using your product?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"You can give your data product a better chance of success by carefully setting the users’ expectations. [...] One under-appreciated facet of designing data products is how the user feels after using the product. Does he feel good? Empowered? Or disempowered and dejected?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see." (Dhanurjay Patil, "A Code of Ethics for Data Science", 2016)

01 November 2006

🎯Clay Helberg - Collected Quotes

"Another key element in making informative graphs is to avoid confounding design variation with data variation. This means that changes in the scale of the graphic should always correspond to changes in the data being represented." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Another trouble spot with graphs is multidimensional variation. This occurs where two-dimensional figures are used to represent one-dimensional values. What often happens is that the size of the graphic is scaled both horizontally and vertically according to the value being graphed. However, this results in the area of the graphic varying with the square of the underlying data, causing the eye to read an exaggerated effect in the graph." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 
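
The arithmetic behind the exaggeration is short: if both width and height are scaled by the same ratio, the area scales by that ratio squared. The sketch below uses hypothetical values; scaling each side by the square root of the ratio, so that area tracks the data, is one common remedy and is an assumption of this example rather than a recommendation quoted from Helberg.

```python
import math

def area_ratio_naive(old_value: float, new_value: float) -> float:
    """Both sides scaled by the data ratio: the area grows with its square."""
    ratio = new_value / old_value
    return ratio * ratio

def side_scale_for_proportional_area(old_value: float, new_value: float) -> float:
    """Scale each side by sqrt(ratio) so the area grows in proportion to the data."""
    return math.sqrt(new_value / old_value)

# Hypothetical example: a value that doubles from 50 to 100.
print(area_ratio_naive(50, 100))  # 4.0: the eye reads a fourfold increase
side = side_scale_for_proportional_area(50, 100)
print(round(side * side, 9))      # 2.0: the area now doubles, matching the data
```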

"It may be helpful to consider some aspects of statistical thought which might lead many people to be distrustful of it. First of all, statistics requires the ability to consider things from a probabilistic perspective, employing quantitative technical concepts such as 'confidence', 'reliability', 'significance'. This is in contrast to the way non-mathematicians often cast problems: logical, concrete, often dichotomous conceptualizations are the norm: right or wrong, large or small, this or that." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"[...] many non-mathematicians hold quantitative data in a sort of awe. They have been led to believe that numbers are, or at least should be, unquestionably correct." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)

"Most statistical models assume error free measurement, at least of independent (predictor) variables. However, as we all know, measurements are seldom if ever perfect. Particularly when dealing with noisy data such as questionnaire responses or processes which are difficult to measure precisely, we need to pay close attention to the effects of measurement errors. Two characteristics of measurement which are particularly important in psychological measurement are reliability and validity." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Remember that a p-value merely indicates the probability of a particular set of data being generated by the null model - it has little to say about the size of a deviation from that model (especially in the tails of the distribution, where large changes in effect size cause only small changes in p-values)." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)
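
A quick numeric illustration of the tail behavior, assuming a two-sided z-test so that p = erfc(|z| / sqrt(2)). The z values stand in for effect sizes and are hypothetical: near the center of the distribution a modest change in z moves p a lot, while far in the tail even doubling z barely changes p in absolute terms.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Near the center: going from z = 1 to z = 2 changes p substantially.
center_shift = two_sided_p(1) - two_sided_p(2)

# In the tail: doubling z from 4 to 8 changes p by a tiny absolute amount.
tail_shift = two_sided_p(4) - two_sided_p(8)

print(f"z 1 -> 2: p drops by {center_shift:.4f}")
print(f"z 4 -> 8: p drops by {tail_shift:.6f}")
```

Both tail p-values would be reported as "highly significant", even though one effect is twice the size of the other, which is why an effect-size estimate is worth reporting alongside p.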

"There are a number of ways that statistical techniques can be misapplied to problems in the real world. Three of the most common hazards are designing experiments with insufficient power, ignoring measurement error, and performing multiple comparisons." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)

"We can consider three broad classes of statistical pitfalls. The first involves sources of bias. These are conditions or circumstances which affect the external validity of statistical results. The second category is errors in methodology, which can lead to inaccurate or invalid results. The third class of problems concerns interpretation of results, or how statistical results are applied (or misapplied) to real world issues." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

References:
[1] Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995

31 October 2006

⛩️Daniel Jackson - Collected Quotes

"A commitment to simplicity of design means addressing the essence of design - the abstractions on which software is built - explicitly and up front. Abstractions are articulated, explained, reviewed and examined deeply, in isolation from the details of the implementation. This doesn’t imply a waterfall process, in which all design and specification precedes all coding. But developers who have experienced the benefits of this separation of concerns are reluctant to rush to code, because they know that an hour spent on designing abstractions can save days of refactoring." (Daniel Jackson, "Software Abstractions", 2006)

"A language for describing software abstractions is more than just a logic. You need ways to organize a model, to build larger models from smaller ones, and to factor out components that can be used more than once. There are also small syntactic details - such as shorthands for declarations - that make a language usable in practice. And finally, there’s the need to communicate with an analysis tool, by indicating which analyses are to be performed." (Daniel Jackson, "Software Abstractions", 2006)

"A model diagram declares some sets and binary relations, and imposes some basic constraints on them. A diagram is a good way to convey the outline of a model, but diagrams aren’t expressive enough to include detailed constraints." (Daniel Jackson, "Software Abstractions", 2006) 

"Abstractions matter to users too. Novice users want programs whose abstractions are simple and easy to understand; experts want abstractions that are robust and general enough to be combined in new ways. When good abstractions are missing from the design, or erode as the system evolves, the resulting program grows barnacles of complexity. The user is then forced to master a mass of spurious details, to develop workarounds, and to accept frequent, inexplicable failures." (Daniel Jackson, "Software Abstractions", 2006)

"An assertion is a constraint that is intended to follow from the facts of the model. […] Typically, assertions play two different roles. Some express mundane properties that aren’t interesting in their own right; they’re written purely to detect flaws in the model. It’s surprising how effective even a few such assertions can be in uncovering subtle flaws. […] Other assertions express truly essential properties, and are sometimes more fundamental than the facts of the model." (Daniel Jackson, "Software Abstractions", 2006)

"Analysis brings software abstractions to life in three ways. First, it encourages you as you explore, by giving you concrete examples that reinforce intuition and suggest new scenarios. Second, it keeps you honest, by helping you to check as you go along that what you write down means what you think it means. And third, it can reveal subtle flaws that you might not have discovered until much later (or not at all)." (Daniel Jackson, "Software Abstractions", 2006)

"An abstraction is not a module, or an interface, class, or method; it is a structure, pure and simple - an idea reduced to its essential form. Since the same idea can be reduced to different forms, abstractions are always, in a sense, inventions, even if the ideas they reduce existed before in the world outside the software. The best abstractions, however, capture their underlying ideas so naturally and convincingly that they seem more like discoveries." (Daniel Jackson, "Software Abstractions", 2006)

"Software is built on abstractions. Pick the right ones, and programming will flow naturally from design; modules will have small and simple interfaces; and new functionality will more likely fit in without extensive reorganization […] Pick the wrong ones, and programming will be a series of nasty surprises: interfaces will become baroque and clumsy as they are forced to accommodate unanticipated interactions, and even the simplest of changes will be hard to make." (Daniel Jackson, "Software Abstractions", 2006)

30 October 2006

⛩️Alan J Perlis - Collected Quotes

"A language that doesn’t affect the way you think about programming, is not worth knowing." (Alan J Perlis, "Epigrams on Programming", 1982)

"A program without a loop and a structured variable isn’t worth writing." (Alan J Perlis, "Epigrams on Programming", 1982)

"A programming language is low level when its programs require attention to the irrelevant." (Alan J Perlis, "Epigrams on Programming", 1982)

"Adapting old programs to fit new machines usually means adapting new machines to behave like old ones." (Alan J Perlis, "Epigrams on Programming", 1982)

"Computers don’t introduce order anywhere as much as they expose opportunities." (Alan J Perlis, "Epigrams on Programming", 1982)

"Documentation is like term insurance: It satisfies because almost no one who subscribes to it depends on its benefits." (Alan J Perlis, "Epigrams on Programming", 1982)

"Don’t have good ideas if you aren’t willing to be responsible for them." (Alan J Perlis, "Epigrams on Programming", 1982)

"Epigrams retrieve deep semantics from a data base that is all procedure." (Alan J Perlis, "Epigrams on Programming", 1982)

"Every program has (at least) two purposes: the one for which it was written, and another for which it wasn’t." (Alan J Perlis, "Epigrams on Programming", 1982)

"Functions delay binding; data structures induce binding. Moral: Structure data late in the programming process." (Alan J Perlis, "Epigrams on Programming", 1982)

"If a program manipulates a large amount of data, it does so in a small number of ways." (Alan J Perlis, "Epigrams on Programming", 1982)

"If we believe in data structures, we must believe in independent (hence simultaneous) processing. For why else would we collect items within a structure? Why do we tolerate languages that give us the one without the other?" (Alan J Perlis, "Epigrams on Programming", 1982)

"In programming, everything we do is a special case of something more general — and often we know it too quickly." (Alan J Perlis, "Epigrams on Programming", 1982)

"In seeking the unattainable, simplicity only gets in the way." (Alan J Perlis, "Epigrams on Programming", 1982)

"Interfaces keep things tidy, but don’t accelerate growth: Functions do." (Alan J Perlis, "Epigrams on Programming", 1982)

"It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." (Alan J Perlis, "Epigrams on Programming", 1982)

"It is easier to change the specification to fit the program than vice versa." (Alan J Perlis, "Epigrams on Programming", 1982)

"It is easier to write an incorrect program than understand a correct one." (Alan J Perlis, "Epigrams on Programming", 1982)

"It is not a language’s weakness but its strengths that control the gradient of its change: Alas, a language never escapes its embryonic sac." (Alan J Perlis, "Epigrams on Programming", 1982)

"Make no mistake about it: Computers process numbers — not symbols. We measure our understanding (and control) by the extent to which we can arithmetize an activity." (Alan J Perlis, "Epigrams on Programming", 1982)

"Making something variable is easy. Controlling duration of constancy is the trick." (Alan J Perlis, "Epigrams on Programming", 1982)

"Most people find the concept of programming obvious, but the doing impossible." (Alan J Perlis, "Epigrams on Programming", 1982)

"Often it is the means that justify the ends: Goals advance technique and technique survives even when goal structures crumble." (Alan J Perlis, "Epigrams on Programming", 1982)

"One can only display complex information in the mind. Like seeing, movement or flow or alteration of view is more important than the static picture, no matter how lovely." (Alan J Perlis, "Epigrams on Programming", 1982)

"Programmers are not to be measured by their ingenuity and their logic but by the completeness of their case analysis." (Alan J Perlis, "Epigrams on Programming", 1982)

"Prolonged contact with the computer turns mathematicians into clerks and vice versa." (Alan J Perlis, "Epigrams on Programming", 1982)

"Recursion is the root of computation since it trades description for time." (Alan J Perlis, "Epigrams on Programming", 1982)

"Simplicity does not precede complexity, but follows it." (Alan J Perlis, "Epigrams on Programming", 1982)

"Software is under a constant tension. Being symbolic it is arbitrarily perfectible; but also it is arbitrarily changeable." (Alan J Perlis, "Epigrams on Programming", 1982)

"Some programming languages manage to absorb change, but withstand progress." (Alan J Perlis, "Epigrams on Programming", 1982)

"Symmetry is a complexity-reducing concept (co-routines include subroutines); seek it everywhere." (Alan J Perlis, "Epigrams on Programming", 1982)

"Systems have sub-systems and sub-systems have sub-systems and so on ad infinitum - which is why we’re always starting over." (Alan J Perlis, "Epigrams on Programming", 1982)

"The cybernetic exchange between man, computer and algorithm is like a game of musical chairs: The frantic search for balance always leaves one of the three standing ill at ease." (Alan J Perlis, "Epigrams on Programming", 1982)

"The goal of computation is the emulation of our synthetic abilities, not the understanding of our analytic ones." (Alan J Perlis, "Epigrams on Programming", 1982)

"The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information." (Alan J Perlis, "Epigrams on Programming", 1982)

"The use of a program to prove the 4-color theorem will not change mathematics - it merely demonstrates that the theorem, a challenge for a century, is probably not important to mathematics." (Alan J Perlis, "Epigrams on Programming", 1982)

"To understand a program you must become both the machine and the program." (Alan J Perlis, "Epigrams on Programming", 1982)

"We kid ourselves if we think that the ratio of procedure to data in an active data-base system can be made arbitrarily small or even kept small." (Alan J Perlis, "Epigrams on Programming", 1982)

"We will never run out of things to program as long as there is a single program around." (Alan J Perlis, "Epigrams on Programming", 1982)

"Wherever there is modularity there is the potential for misunderstanding: Hiding information implies a need to check communication." (Alan J Perlis, "Epigrams on Programming", 1982)

29 October 2006

⛩️Yegor Bugayenko - Collected Quotes

"All companies are built as hierarchies, no matter what holacracy adepts are saying now. It's always a boss on the top and then people who report to him down to the lowest level. Staying on the lowest level is what I always try to avoid. Not only because I have some dignity, but mostly because I am lazy. The lower you are in the hierarchy, the more work you have to do and the less money you get for it. This is how the division of labor works, not only in the software industry." (Yegor Bugayenko, "Code Ahead", 2018)

"Any software project must have a technical leader, who is responsible for all technical decisions made by the team and has enough authority to make them. Responsibility and authority are two mandatory components that must be present in order to make it possible to call such a person an architect." (Yegor Bugayenko, "Code Ahead", 2018)

"Attributing bugs to their authors doesn't make them more responsible, only more scared." (Yegor Bugayenko, "Code Ahead", 2018)

"Automated testing is a safety net that protects the program from its programmers." (Yegor Bugayenko, "Code Ahead", 2018)

"Every conflict must produce a win-win outcome and must never be resolved through a compromise, which makes both sides suffer in some way. Even forcing one side to do what the other side wants is better than a compromise." (Yegor Bugayenko, "Code Ahead", 2018)

"Fixing the system without fixing people that work in it would be a huge trauma for them; they will do everything they can to prevent it from happening." (Yegor Bugayenko, "Code Ahead", 2018)

"It is not loyalty or internal motivation that drives us programmers forward. We must write our code when the road to our personal success is absolutely clear for us and writing high quality code obviously helps us move forward on this road. To make this happen, the management has to define the rules of the game, also known as 'process', and make sure they are strictly enforced, which is much more difficult than 'being agile'." (Yegor Bugayenko, "Code Ahead", 2018)

"It's impossible to change the management system without changing the managers who built it. The management is the product of people who created it." (Yegor Bugayenko, "Code Ahead", 2018)

"Just by making the architect role explicit, a team can effectively resolve many technical conflicts." (Yegor Bugayenko, "Code Ahead", 2018)

"Punishment demotivates when it comes from people rather than a system of well-defined rules." (Yegor Bugayenko, "Code Ahead", 2018)

"Quality is a product of a conflict between programmers and testers." (Yegor Bugayenko, "Code Ahead", 2018)

"Quality must be enforced, otherwise it won't happen. We programmers must be required to write tests, otherwise we won't do it." (Yegor Bugayenko, "Code Ahead", 2018)

"Responsibility means an inevitable punishment for mistakes; authority means full power to make them." (Yegor Bugayenko, "Code Ahead", 2018)

"The higher the price of information in a software team, the less effective the team is." (Yegor Bugayenko, "Code Ahead", 2018)

"The job of a tester is to prove that the software is bug free, while it has to be the other way around: The job of a tester is to prove that the software is broken. The better testers are doing their jobs, the more bugs they manage to find and report." (Yegor Bugayenko, "Code Ahead", 2018)

"To make technical decisions, a result-oriented team needs a strong architect and a decision making process, not meetings." (Yegor Bugayenko, "Code Ahead", 2018)

"Very often managers are just a noise, while the real boss is the project, which we work for and which pays us." (Yegor Bugayenko, "Code Ahead", 2018)

"We must not blame programmers for their bugs. They belong to them only until the code is merged to the repository. After that, all bugs are ours!" (Yegor Bugayenko, "Code Ahead", 2018)

"We, newbies and young programmers, don't like chaos because it makes us dependent on experts. We have to beg for information and feel bad." (Yegor Bugayenko, "Code Ahead", 2018)

⛩️Martin Kleppmann - Collected Quotes

"A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user. It is impossible to reduce the probability of a fault to zero; therefore it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"[…] a NoSQL system may find itself accidentally reinventing SQL, albeit in disguise." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"An architecture that scales well for a particular application is built around assumptions of which operations will be common and which will be rare - the load parameters. If those assumptions turn out to be wrong, the engineering effort for scaling is at best wasted, and at worst counterproductive." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"[…] as software engineers and architects, we also need to have a technically accurate and precise understanding of the various technologies and their trade-offs if we want to build good applications." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"[…] building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"Consensus is one of the most important and fundamental problems in distributed computing. On the surface, it seems simple: informally, the goal is simply to get several nodes to agree on something." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"Every legacy system is unpleasant in its own way, and so it is difficult to give general recommendations for dealing with them." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"Everybody has an intuitive idea of what it means for something to be reliable or unreliable. For software, typical expectations include: The application performs the function that the user expected. It can tolerate the user making mistakes or using the software in unexpected ways. Its performance is good enough for the required use case, under the expected load and data volume. The system prevents any unauthorized access and abuse." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"It would be unwise to assume that faults are rare and simply hope for the best. It is important to consider a wide range of possible faults - even fairly unlikely ones - and to artificially create such situations in your testing environment to see what happens. In distributed systems, suspicion, pessimism, and paranoia pay off." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"Reducing response times at very high percentiles is difficult because they are easily affected by random events outside of your control, and the benefits are diminishing." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015) 

"Technology is a powerful force in our society. Data, software, and communication can be used for bad: to entrench unfair power structures, to undermine human rights, and to protect vested interests. But they can also be used for good: to make underrepresented people’s voices heard, to create opportunities for everyone, and to avert disasters." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"The architecture of systems that operate at large scale is usually highly specific to the application - there is no such thing as a generic, one-size-fits-all scalable architecture (informally known as magic scaling sauce)." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"The fact that SQL is more limited in functionality gives the database much more room for automatic optimizations." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"The need for data integration often only becomes apparent if you zoom out and consider the dataflows across an entire organization." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"This is a deliberate choice in the design of computers: if an internal fault occurs, we prefer a computer to crash completely rather than returning a wrong result, because wrong results are difficult and confusing to deal with. Thus, computers hide the fuzzy physical reality on which they are implemented and present an idealized system model that operates with mathematical perfection." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"[...] when measuring performance, it’s worth using percentiles rather than averages. The main advantage of the mean is that it’s easy to calculate, but percentiles are much more meaningful." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)
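A short sketch of why the mean can mislead, assuming a hypothetical batch of 100 response times (95 fast requests, 5 slow ones) and the simple nearest-rank percentile definition; real monitoring tools often use interpolating or streaming estimators instead:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical response times in milliseconds.
times_ms = [20] * 95 + [2000] * 5

mean = sum(times_ms) / len(times_ms)
print(f"mean = {mean:.0f} ms")                      # 119 ms: matches no real request
print(f"p50  = {percentile(times_ms, 50)} ms")      # 20 ms: the typical user
print(f"p99  = {percentile(times_ms, 99)} ms")      # 2000 ms: the slow tail
```

The median tells you what a typical request experiences and the high percentile exposes the tail, while the mean blends both into a number no user actually saw.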

"When we develop predictive analytics systems, we are not merely automating a human’s decision by using software to specify the rules for when to say yes or no; we are even leaving the rules themselves to be inferred from data. However, the patterns learned by these systems are opaque: even if there is some correlation in the data, we may not know why. If there is a systematic bias in the input to an algorithm, the system will most likely learn and amplify that bias in its output." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"Working with distributed systems is fundamentally different from writing software on a single computer - and the main difference is that there are lots of new and exciting ways for things to go wrong." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)


About Me

Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.