SQL Troubles

01 January 2016

♜Strategic Management: Strategy (Definitions)

"Strategy can be defined as the determination of the long-term goals and objectives of an enterprise, and the adoption of courses of action and the allocation of resources necessary for carrying out these goals." (Alfred D. Chandler Jr., "Strategy and Structure", 1962)

"Strategy is the pattern of objectives, purposes or goals and major policies and plans for achieving these goals, stated in such a way as to define what businesses the company is in or is to be in and the kind of company it is or is to be." (Edmund P Learned et al, "Business Policy: Text and Cases", 1965)

"Strategies are forward-looking plans that anticipate change and initiate actions to take advantage of opportunities that are integrated into the concept or mission of the company." (William A Newman & J. P Logan, "Strategy, Policy, and Central Management", 1971)

"Strategy is the basic goals and objectives of the organization, the major programs of action chosen to reach these goals and objectives, and the major pattern of resource allocation used to relate the organization to its environment." (Dan E Schendel & K J Hatten, "Business Policy or Strategic Management: A View for an Emerging Discipline", 1972)

"Strategy is a unified, comprehensive, and integrative plan designed to assure that the basic objectives of the enterprise are achieved." (William F Glueck, "Business Policy, Strategy Formation, and Management Action", 1976)

"Strategy is the forging of company missions, setting objectives for the organization in light of external and internal forces, formulating specific policies and strategies to achieve objectives, and ensuring their proper implementation so that the basic purposes and objectives of the organization will be achieved." (George A Steiner & John B. Miner,"Management Policy and Strategy", 1977)

"Strategy is a mediating force between the organization and its environment: consistent patterns of streams of organizational decisions to deal with the environment." (Henry Mintzberg, "The Structuring of Organizations", 1979)

"Strategy is defined as orienting 'metaphases' or frames of reference that allow the organization and its environment to be understood by organizational stakeholders. On this basis, stakeholders are motivated to believe and to act in ways that are expected to produce favorable results for the organization." (Ellen E Chaffee, "Three Models of Strategy," Academy of Management Review Vol. 10 (1), 1985)

"Strategy is the creation of a unique and valuable position, involving a different set of activities. [...] Strategy is creating fit among a company’s activities." (Michael E Porter, "What is Strategy?", Harvard Business Review, 1996)

"General direction set for the organization and its various components to achieve a desired state in the future, resulting from the detailed strategic planning process." (Alan Wa Steiss, "Strategic Management for Public and Nonprofit Organizations", 2003)

"An organization's overall plan of development, describing the effective use of resources in support of the organization in its future activities. It involves setting objectives and proposing initiatives for action." (ISO/IEC 38500:2008, 2008)

"An organized set of initiation programs and projects undertaken in order to achieve the organization ’ s vision." (Terry Schimidt, "Strategic Management Made Simple", 2009)

"This is a plan of action stating how an organisation will achieve its long-term objectives." (Bernard Burnes, "Managing change : a strategic approach to organisational dynamics" 5th Ed., 2009)

"The essential course of action attempted to achieve an enterprise’s end - particularly goals. Moreover, a strategy must be to carry out exactly one mission. In general, strategies address goals, and tactics address objectives." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A broad-based formula for how a business is going to accomplish its mission, what its goals should be, and what plans and policies will be needed to carry out those goals." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"denotes, by an extension of military language, the development of a policy by the enterprise (its objectives, structure, and operation), defined on one hand on the basis of its strengths and weaknesses and, on the other hand, taking into account threats and opportunities identified in its environment." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence", 2011)

"A comprehensive plan that states how a corporation will achieve its mission and objectives." (Thomas L Wheelen & J David Hunger., "Strategic management and business policy: toward global sustainability" 13th Ed., 2012)

"The proposed direction an organization will achieve over the long term, through the configuration of resources in a challenging environment, to meet the needs of markets and to fulfill stakeholder expectations." (Paul C Dinsmore et al, "Enterprise Project Governance", 2012)

"A strategy is a comprehensive plan guiding resource allocation to achieve long-term organization goals." (John R Schermerhorn Jr, "Management" 12th Ed., 2012)

"The definition of the model’s goals, the high-level approach to achieve these goals, and the decision making mechanisms to execute this approach." (Panos Alexopoulos, "Semantic Modeling for Data", 2020)

"Strategy is a style of thinking, a conscious and deliberate process, an intensive implementation system, the science of insuring future success." (Pete Johnson)

"Strategy is the way an organization seeks to achieve its vision and mission. It is a forward-looking statement about an organization’s planned use of resources and deployment capabilities. Strategy becomes real when it is associated with: 1) a concrete set of goals and objectives; and 2) a method involving people, resources and processes." (Intrafocus)

31 December 2015

🪙Business Intelligence: Data Fabric (Just the Quotes)

"Data architects often turn to graphs because they are flexible enough to accommodate multiple heterogeneous representations of the same entities as described by each of the source systems. With a graph, it is possible to associate underlying records incrementally as data is discovered. There is no need for big, up-front design, which serves only to hamper business agility. This is important because data fabric integration is not a one-off effort and a graph model remains flexible over the lifetime of the data domains." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Data fabrics are general-purpose, organization-wide data access interfaces that offer a connected view of the integrated domains by combining data stored in a local graph with data retrieved on demand from third-party systems. Their job is to provide a sophisticated index and integration points so that they can curate data across silos, offering consistent capabilities regardless of the underlying store (which might or might not be graph based) […]." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"A Data Fabric has its focus more on the architectural underpinning, technical capabilities, and intelligent analysis to produce active metadata supporting a smarter, AI-infused system to orchestrate various data integration styles, enabling trusted and reusable data in a hybrid cloud landscape to be consumed by humans, applications, or other downstream systems." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"Data Fabric’s building blocks represent groupings of different components and characteristics. They are high-level blocks that describe a package of capabilities that address specific business needs. The building blocks are Data Governance and its knowledge layer, Data Integration, and Self-Service." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a composable architecture made up of different tools, technologies, and systems. It has an active metadata and event-driven design that automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are paramount to its design and to enable Self-Service data sharing. The following figure summarizes the different characteristics that constitute a Data Fabric design." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed data architecture that connects scattered data across tools and systems with the objective of providing governed access to fit-for-purpose data at speed. Data Fabric focuses on Data Governance, Data Integration, and Self-Service data sharing. It leverages a sophisticated active metadata layer that captures knowledge derived from data and its operations, data relationships, and business context. Data Fabric continuously analyzes data management activities to recommend value-driven improvements. Data Fabric works with both centralized and decentralized data systems and supports diverse operational models." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Enterprises have difficulties in interpreting new concepts like the data mesh and data fabric, because pragmatic guidance and experiences from the field are missing. In addition to that, the data mesh fully embraces a decentralized approach, which is a transformational change not only for the data architecture and technology, but even more so for organization and processes. This means the transformation cannot only be led by IT; it’s a business transformation as well." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Gaining more insight into data, simplifying data access, enabling shopping-for-data, augmenting traditional data governance, generating active metadata, and accelerating development of products and services are enabled by infusing AI into the Data Fabric architecture. An AI-infused Data Fabric is not only leveraging AI but also likewise an architecture to manage and deal with AI artefacts, including AI models, pipelines, etc." (Eberhard Hechler et al, "Data Fabric and Data Mesh Approaches with AI", 2023)

"The data fabric is an approach that addresses today’s data management and scalability challenges by adding intelligence and simplifying data access using self-service. In contrast to the data mesh, it focuses more on the technology layer. It’s an architectural vision using unified metadata with an end-to-end integrated layer (fabric) for easily accessing, integrating, provisioning, and using data." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"At its core, a data fabric is an architectural framework, designed to be employed within one or more domains inside a data mesh. The data mesh, however, is a holistic concept, encompassing technology, strategies, and methodologies." (James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures. (James Serra, "Deciphering Data Architectures", 2024)

30 December 2015

🪙Business Intelligence: Complexity (Just the Quotes)

"The more complex the shape of any object. the more difficult it is to perceive it. The nature of thought based on the visual apprehension of objective forms suggests, therefore, the necessity to keep all graphics as simple as possible. Otherwise, their meaning will be lost or ambiguous, and the ability to convey the intended information and to persuade will be inhibited." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"The urge to tinker with a formula is a hunger that keeps coming back. Tinkering almost always leads to more complexity. The more complicated the metric, the harder it is for users to learn how to affect the metric, and the less likely it is to improve it." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Any presentation of data, whether a simple calculated metric or a complex predictive model, is going to have a set of assumptions and choices that the producer has made to get to the output. The more that these can be made explicit, the more the audience of the data will be open to accepting the message offered by the presenter." (Zach Gemignani et al, "Data Fluency", 2014)

"Decision trees are also considered nonparametric models. The reason for this is that when we train a decision tree from data, we do not assume a fixed set of parameters prior to training that define the tree. Instead, the tree branching and the depth of the tree are related to the complexity of the dataset it is trained on. If new instances were added to the dataset and we rebuilt the tree, it is likely that we would end up with a (potentially very) different tree." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"When datasets are small, a parametric model may perform well because the strong assumptions made by the model - if correct - can help the model to avoid overfitting. However, as the size of the dataset grows, particularly if the decision boundary between the classes is very complex, it may make more sense to allow the data to inform the predictions more directly. Obviously the computational costs associated with nonparametric models and large datasets cannot be ignored. However, support vector machines are an example of a nonparametric model that, to a large extent, avoids this problem. As such, support vector machines are often a good choice in complex domains with lots of data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"The tension between bias and variance, simplicity and complexity, or underfitting and overfitting is an area in the data science and analytics process that can be closer to a craft than a fixed rule. The main challenge is that not only is each dataset different, but also there are data points that we have not yet seen at the moment of constructing the model. Instead, we are interested in building a strategy that enables us to tell something about data from the sample used in building the model." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"Data lake architecture suffers from complexity and deterioration. It creates complex and unwieldy pipelines of batch or streaming jobs operated by a central team of hyper-specialized data engineers. It deteriorates over time. Its unmanaged datasets, which are often untrusted and inaccessible, provide little value. The data lineage and dependencies are obscured and hard to track." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Decision-makers are constantly provided data in the form of numbers or insights, or similar. The challenge is that we tend to believe every number or piece of data we hear, especially when it comes from a trusted source. However, even if the source is trusted and the data is correct, insights from the data are created when we put it in context and apply meaning to it. This means that we may have put incorrect meaning to the data and then made decisions based on that, which is not ideal. This is why anyone involved in the process needs to have the skills to think critically about the data, to try to understand the context, and to understand the complexity of the situation where the answer is not limited to just one specific thing. Critical thinking allows individuals to assess limitations of what was presented, as well as mitigate any cognitive bias that they may have." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

🪙Business Intelligence: Data Analysis (Just the Quotes)

"As in Mathematics, so in Natural Philosophy, the Investigation of difficult Things by the Method of Analysis, ought ever to precede the Method of Composition. This Analysis consists in making Experiments and Observations, and in drawing general Conclusions from them by Induction, and admitting of no Objections against the Conclusions but such as are taken from Experiments, or other certain Truths." (Sir Isaac Newton, "Opticks", 1704)

"The errors which arise from the absence of facts are far more numerous and more durable than those which result from unsound reasoning respecting true data." (Charles Babbage, "On the Economy of Machinery and Manufactures", 1832)

"In every branch of knowledge the progress is proportional to the amount of facts on which to build, and therefore to the facility of obtaining data." (James C Maxwell, [letter to Lewis Campbell] 1851)

"Not even the most subtle and skilled analysis can overcome completely the unreliability of basic data." (Roy D G Allen, "Statistics for Economists", 1951)

"The technical analysis of any large collection of data is a task for a highly trained and expensive man who knows the mathematical theory of statistics inside and out. Otherwise the outcome is likely to be a collection of drawings - quartered pies, cute little battleships, and tapering rows of sturdy soldiers in diversified uniforms - interesting enough in the colored Sunday supplement, but hardly the sort of thing from which to draw reliable inferences." (Eric T Bell, "Mathematics: Queen and Servant of Science", 1951)

"If data analysis is to be well done, much of it must be a matter of judgment, and ‘theory’ whether statistical or non-statistical, will have to guide, not command." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33 (1), 1962)

"If one technique of data analysis were to be exalted above all others for its ability to be revealing to the mind in connection with each of many different models, there is little doubt which one would be chosen. The simple graph has brought more information to the data analyst’s mind than any other device. It specializes in providing indications of unexpected phenomena." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics Vol. 33 (1), 1962)

"The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’ Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33, No. 1, 1962)

"The first step in data analysis is often an omnibus step. We dare not expect otherwise, but we equally dare not forget that this step, and that step, and other step, are all omnibus steps and that we owe the users of such techniques a deep and important obligation to develop ways, often varied and competitive, of replacing omnibus procedures by ones that are more sharply focused." (John W Tukey, "The Future of Processes of Data Analysis", 1965)

"The basic general intent of data analysis is simply stated: to seek through a body of data for interesting relationships and information and to exhibit the results in such a way as to make them recognizable to the data analyzer and recordable for posterity. Its creative task is to be productively descriptive, with as much attention as possible to previous knowledge, and thus to contribute to the mysterious process called insight." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Comparable objectives in data analysis are (l) to achieve more specific description of what is loosely known or suspected; (2) to find unanticipated aspects in the data, and to suggest unthought-of-models for the data's summarization and exposure; (3) to employ the data to assess the (always incomplete) adequacy of a contemplated model; (4) to provide both incentives and guidance for further analysis of the data; and (5) to keep the investigator usefully stimulated while he absorbs the feeling of his data and considers what to do next." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Data analysis must be iterative to be effective. [...] The iterative and interactive interplay of summarizing by fit and exposing by residuals is vital to effective data analysis. Summarizing and exposing are complementary and pervasive." (John W Tukey & Martin B Wilk, "Data Analysis and Statistics: An Expository Overview", 1966)

"Every student of the art of data analysis repeatedly needs to build upon his previous statistical knowledge and to reform that foundation through fresh insights and emphasis." (John W Tukey, "Data Analysis, Including Statistics", 1968)

"[...] bending the question to fit the analysis is to be shunned at all costs." (John W Tukey, "Analyzing Data: Sanctification or Detective Work?", 1969)

"Statistical methods are tools of scientific investigation. Scientific investigation is a controlled learning process in which various aspects of a problem are illuminated as the study proceeds. It can be thought of as a major iteration within which secondary iterations occur. The major iteration is that in which a tentative conjecture suggests an experiment, appropriate analysis of the data so generated leads to a modified conjecture, and this in turn leads to a new experiment, and so on." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"Almost all efforts at data analysis seek, at some point, to generalize the results and extend the reach of the conclusions beyond a particular set of data. The inferential leap may be from past experiences to future ones, from a sample of a population to the whole population, or from a narrow range of a variable to a wider range. The real difficulty is in deciding when the extrapolation beyond the range of the variables is warranted and when it is merely naive. As usual, it is largely a matter of substantive judgment - or, as it is sometimes more delicately put, a matter of 'a priori nonstatistical considerations'." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"[…] it is not enough to say: 'There's error in the data and therefore the study must be terribly dubious'. A good critic and data analyst must do more: he or she must also show how the error in the measurement or the analysis affects the inferences made on the basis of that data and analysis." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Typically, data analysis is messy, and little details clutter it. Not only confounding factors, but also deviant cases, minor problems in measurement, and ambiguous results lead to frustration and discouragement, so that more data are collected than analyzed. Neglecting or hiding the messy details of the data reduces the researcher's chances of discovering something new." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"[...] be wary of analysts that try to quantify the unquantifiable." (Ralph Keeney & Raiffa Howard, "Decisions with Multiple Objectives: Preferences and Value Trade-offs", 1976)

"[...] exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as for those we believe might be there. Except for its emphasis on graphs, its tools are secondary to its purpose." (John W Tukey, [comment] 1979)

"[...] any hope that we are smart enough to find even transiently optimum solutions to our data analysis problems is doomed to failure, and, indeed, if taken seriously, will mislead us in the allocation of effort, thus wasting both intellectual and computational effort." (John W Tukey, "Choosing Techniques for the Analysis of Data", 1981)

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...]For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"Exploratory data analysis, EDA, calls for a relatively free hand in exploring the data, together with dual obligations: (•) to look for all plausible alternatives and oddities - and a few implausible ones, (graphic techniques can be most helpful here) and (•) to remove each appearance that seems large enough to be meaningful - ordinarily by some form of fitting, adjustment, or standardization [...] so that what remains, the residuals, can be examined for further appearances." (John W Tukey, "Introduction to Styles of Data Analysis Techniques", 1982)

"A competent data analysis of an even moderately complex set of data is a thing of trials and retreats, of dead ends and branches." (John W Tukey, Computer Science and Statistics: Proceedings of the 14th Symposium on the Interface, 1983)

"Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance […]" (George Greenstein, "Frozen Star", 1983)

"Iteration and experimentation are important for all of data analysis, including graphical data display. In many cases when we make a graph it is immediately clear that some aspect is inadequate and we regraph the data. In many other cases we make a graph, and all is well, but we get an idea for studying the data in a different way with a different graph; one successful graph often suggests another." (William S Cleveland, "The Elements of Graphing Data", 1985)

"There are some who argue that a graph is a success only if the important information in the data can be seen within a few seconds. While there is a place for rapidly-understood graphs, it is too limiting to make speed a requirement in science and technology, where the use of graphs ranges from, detailed, in-depth data analysis to quick presentation." (William S Cleveland, "The Elements of Graphing Data", 1985)

"A first analysis of experimental results should, I believe, invariably be conducted using flexible data analytical techniques - looking at graphs and simple statistics - that so far as possible allow the data to 'speak for themselves'. The unexpected phenomena that such a approach often uncovers can be of the greatest importance in shaping and sometimes redirecting the course of an ongoing investigation." (George Box, "Signal to Noise Ratios, Performance Criteria, and Transformations", Technometrics 30, 1988)

"Data analysis is an art practiced by individuals who are skilled at quantitative reasoning and have much experience in looking at numbers and detecting patterns in data. Usually these individuals have some background in statistics." (David Lubinsky, Daryl Pregibon , "Data analysis as search", Journal of Econometrics Vol. 38 (1–2), 1988)

"Like a detective, a data analyst will experience many dead ends, retrace his steps, and explore many alternatives before settling on a single description of the evidence in front of him." (David Lubinsky & Daryl Pregibon , "Data analysis as search", Journal of Econometrics Vol. 38 (1–2), 1988)

"[…] data analysis in the context of basic mathematical concepts and skills. The ability to use and interpret simple graphical and numerical descriptions of data is the foundation of numeracy […] Meaningful data aid in replacing an emphasis on calculation by the exercise of judgement and a stress on interpreting and communicating results." (David S Moore, "Statistics for All: Why, What and How?", 1990)

"Data analysis is rarely as simple in practice as it appears in books. Like other statistical techniques, regression rests on certain assumptions and may produce unrealistic results if those assumptions are false. Furthermore it is not always obvious how to translate a research question into a regression model." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Data analysis typically begins with straight-line models because they are simplest, not because we believe reality is inherently linear. Theory or data may suggest otherwise [...]" (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"90 percent of all problems can be solved by using the techniques of data stratification, histograms, and control charts. Among the causes of nonconformance, only one-fifth or less are attributable to the workers." (Kaoru Ishikawa, The Quality Management Journal Vol. 1, 1993)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"Visualization is an approach to data analysis that stresses a penetrating look at the structure of data. No other approach conveys as much information. […] Conclusions spring from data when this information is combined with the prior knowledge of the subject under investigation." (William S Cleveland, "Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993)

"A careful and sophisticated analysis of the data is often quite useless if the statistician cannot communicate the essential features of the data to a client for whom statistics is an entirely foreign language." (Christopher J Wild, "Embracing the ‘Wider view’ of Statistics", The American Statistician 48, 1994)

"Science is not impressed with a conglomeration of data. It likes carefully constructed analysis of each problem." (Daniel E Koshland Jr, Science Vol. 263 (5144), [editorial] 1994)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No matter what the data, and no matter how the values are arranged and presented, you must always use some method of analysis to come up with an interpretation of the data.
While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"The purpose of analysis is insight. The best analysis is the simplest analysis which provides the needed insight." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Without meaningful data there can be no meaningful analysis. The interpretation of any data set must be based upon the context of those data." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Statistical analysis of data can only be performed within the context of selected assumptions, models, and/or prior distributions. A statistical analysis is actually the extraction of substantive information from data and assumptions. And herein lies the rub, understood well by Disraeli and others skeptical of our work: For given data, an analysis can usually be selected which will result in 'information' more favorable to the owner of the analysis then is objectively warranted." (Stephen B Vardeman & Max D Morris, "Statistics and Ethics: Some Advice for Young Statisticians", The American Statistician vol 57, 2003)

"Exploratory Data Analysis is more than just a collection of data-analysis techniques; it provides a philosophy of how to dissect a data set. It stresses the power of visualisation and aspects such as what to look for, how to look for it and how to interpret the information it contains. Most EDA techniques are graphical in nature, because the main aim of EDA is to explore data in an open-minded way. Using graphics, rather than calculations, keeps open possibilities of spotting interesting patterns or anomalies that would not be apparent with a calculation (where assumptions and decisions about the nature of the data tend to be made in advance)." (Alan Graham, "Developing Thinking in Statistics", 2006)

"It is the aim of all data analysis that a result is given in form of the best estimate of the true value. Only in simple cases is it possible to use the data value itself as result and thus as best estimate." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)

"Data analysis is careful thinking about evidence." (Michael Milton, "Head First Data Analysis", 2009)

"Doing data analysis without explicitly defining your problem or goal is like heading out on a road trip without having decided on a destination." (Michael Milton, "Head First Data Analysis", 2009)

"The discrepancy between our mental models and the real world may be a major problem of our times; especially in view of the difficulty of collecting, analyzing, and making sense of the unbelievable amount of data to which we have access today." (Ugo Bardi, "The Limits to Growth Revisited", 2011)

"Too often there is a disconnect between the people who run a study and those who do the data analysis. This is as predictable as it is unfortunate. If data are gathered with particular hypotheses in mind, too often they (the data) are passed on to someone who is tasked with testing those hypotheses and who has only marginal knowledge of the subject matter. Graphical displays, if prepared at all, are just summaries or tests of the assumptions underlying the tests being done. Broader displays, that have the potential of showing us things that we had not expected, are either not done at all, or their message is not able to be fully appreciated by the data analyst." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

"Data analysis is not generally thought of as being simple or easy, but it can be. The first step is to understand that the purpose of data analysis is to separate any signals that may be contained within the data from the noise in the data. Once you have filtered out the noise, anything left over will be your potential signals. The rest is just details." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"The four questions of data analysis are the questions of description, probability, inference, and homogeneity. Any data analyst needs to know how to organize and use these four questions in order to obtain meaningful and correct results. [...]
THE DESCRIPTION QUESTION: Given a collection of numbers, are there arithmetic values that will summarize the information contained in those numbers in some meaningful way?
THE PROBABILITY QUESTION: Given a known universe, what can we say about samples drawn from this universe? [...]
THE INFERENCE QUESTION: Given an unknown universe, and given a sample that is known to have been drawn from that unknown universe, and given that we know everything about the sample, what can we say about the unknown universe? [...]
THE HOMOGENEITY QUESTION: Given a collection of observations, is it reasonable to assume that they came from one universe, or do they show evidence of having come from multiple universes?" (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Each systems archetype embodies a particular theory about dynamic behavior that can serve as a starting point for selecting and formulating raw data into a coherent set of interrelationships. Once those relationships are made explicit and precise, the 'theory' of the archetype can then further guide us in our data-gathering process to test the causal relationships through direct observation, data analysis, or group deliberation." (Daniel H Kim, "Systems Archetypes as Dynamic Theories", The Systems Thinker Vol. 24 (1), 2013)

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"The random element in most data analysis is assumed to be white noise - normal errors independent of each other. In a time series, the errors are often linked so that independence cannot be assumed (the last examples). Modeling the nature of this dependence is the key to time series." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Statistics is an integral part of the quantitative approach to knowledge. The field of statistics is concerned with the scientific study of collecting, organizing, analyzing, and drawing conclusions from data." (Kandethody M Ramachandran & Chris P Tsokos, "Mathematical Statistics with Applications in R" 2nd Ed., 2015)
"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"[…] the data itself can lead to new questions too. In exploratory data analysis (EDA), for example, the data analyst discovers new questions based on the data. The process of looking at the data to address some of these questions generates incidental visualizations - odd patterns, outliers, or surprising correlations that are worth looking into further." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. 'Structure' has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Symmetries directly point to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. As our data models change, so too do our perspectives on analysing data." (Fionn Murtagh, "Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics", 2018)
"Analysis is a two-step process that has an exploratory and an explanatory phase. In order to create a powerful data story, you must effectively transition from data discovery (when you’re finding insights) to data communication (when you’re explaining them to an audience). If you don’t properly traverse these two phases, you may end up with something that resembles a data story but doesn’t have the same effect. Yes, it may have numbers, charts, and annotations, but because it’s poorly formed, it won’t achieve the same results." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"While visuals are an essential part of data storytelling, data visualizations can serve a variety of purposes from analysis to communication to even art. Most data charts are designed to disseminate information in a visual manner. Only a subset of data compositions is focused on presenting specific insights as opposed to just general information. When most data compositions combine both visualizations and text, it can be difficult to discern whether a particular scenario falls into the realm of data storytelling or not." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)
"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)
"We all know that the numerical values on each side of an equation have to be the same. The key to dimensional analysis is that the units have to be the same as well. This provides a convenient way to keep careful track of units when making calculations in engineering and other quantitative disciplines, to make sure one is computing what one thinks one is computing. When an equation exists only for the sake of mathiness, dimensional analysis often makes no sense." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)
"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)
[Murphy’s Laws of Analysis:] "(1) In any collection of data, the figures that are obviously correct contain errors. (2) It is customary for a decimal to be misplaced. (3) An error that can creep into a calculation, will. Also, it will always be in the direction that will cause the most damage to the calculation." (G C Deakly)

"We must include in any language with which we hope to describe complex data-processing situations the capability for describing data. We must also include a mechanism for determining the priorities to be applied to the data. These priorities are not fixed and are indicated in many cases by the data." (Grace Hopper)

See also: Analysis

🪙Business Intelligence: Data Pipelines (Just the Quotes)

"Data Lake is a single window snapshot of all enterprise data in its raw format, be it structured, semi-structured, or unstructured. Starting from curating the data ingestion pipeline to the transformation layer for analytical consumption, every aspect of data gets addressed in a data lake ecosystem. It is supposed to hold enormous volumes of data of varied structures." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018

"The quality of data that flows within a data pipeline is as important as the functionality of the pipeline. If the data that flows within the pipeline is not a valid representation of the source data set(s), the pipeline doesn’t serve any real purpose. It’s very important to incorporate data quality checks within different phases of the pipeline. These checks should verify the correctness of data at every phase of the pipeline. There should be clear isolation between checks at different parts of the pipeline. The checks include checks like row count, structure, and data type validation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"For advanced analytics, a well-designed data pipeline is a prerequisite, so a large part of your focus should be on automation. This is also the most difficult work. To be successful, you need to stitch everything together." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"A data pipeline is a series of transformation steps (functions) executed as the data flows from one step to another. Data mesh refrains from using pipelines as a top-level architectural paradigm and in between data products. The challenge with pipelines as currently used is that they don’t create clear interfaces, contracts, and abstractions that can be maintained easily as the pipeline complexity complexity grows. Due to lack of abstractions, single failure in the pipeline causes cascading failures." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today’s computing environments, leading to ad hoc data pipelines, cobbled together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service." (Adam Bellemare,"Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

26 December 2015

🪙Business Intelligence: Measurement (Just the Quotes)

"There is no inquiry which is not finally reducible to a question of Numbers; for there is none which may not be conceived of as consisting in the determination of quantities by each other, according to certain relations." (Auguste Comte, “The Positive Philosophy”, 1830)

"When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science.” (Lord Kelvin, "Electrical Units of Measurement", 1883)

“Of itself an arithmetic average is more likely to conceal than to disclose important facts; it is the nature of an abbreviation, and is often an excuse for laziness.” (Arthur Lyon Bowley, “The Nature and Purpose of the Measurement of Social Phenomena”, 1915)

“Science depends upon measurement, and things not measurable are therefore excluded, or tend to be excluded, from its attention.” (Arthur J Balfour, “Address”, 1917)

“It is important to realize that it is not the one measurement, alone, but its relation to the rest of the sequence that is of interest.” (William E Deming, “Statistical Adjustment of Data”, 1943)

“The purpose of computing is insight, not numbers […] sometimes […] the purpose of computing numbers is not yet in sight.” (Richard Hamming, “Numerical Methods for Scientists and Engineers”, 1962)

“A quantity like time, or any other physical measurement, does not exist in a completely abstract way. We ﬁnd no sense in talking about something unless we specify how we measure it. It is the deﬁnition by the method of measuring a quantity that is the one sure way of avoiding talking nonsense...” (Hermann Bondi, “Relativity and Common Sense”, 1964)

“Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate.” (Abraham Kaplan, “The Conduct of Inquiry: Methodology for Behavioral Science”, 1964)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

“Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance […]” (George Greenstein, “Frozen Star”, 1983)

"Changing measures are a particularly common problem with comparisons over time, but measures also can cause problems of their own. [...] We cannot talk about change without making comparisons over time. We cannot avoid such comparisons, nor should we want to. However, there are several basic problems that can affect statistics about change. It is important to consider the problems posed by changing - and sometimes unchanging - measures, and it is also important to recognize the limits of predictions. Claims about change deserve critical inspection; we need to ask ourselves whether apples are being compared to apples - or to very different objects." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Measurement is often associated with the objectivity and neatness of numbers, and performance measurement efforts are typically accompanied by hope, great expectations and promises of change; however, these are then often followed by disbelief, frustration and what appears to be sheer madness." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Measuring anything subjective always prompts perverse behavior. [...] All measurement systems are subject to abuse." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

“The value of having numbers - data - is that they aren't subject to someone else's interpretation. They are just the numbers. You can decide what they mean for you.” (Emily Oster, “Expecting Better”, 2013)

"Until a new metric generates a body of data, we cannot test its usefulness. Lots of novel measures hold promise only on paper." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Usually, it is impossible to restate past data. As a result, all history must be whitewashed and measurement starts from scratch." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

25 December 2015

🪙Business Intelligence: Data Mesh (Just the quotes)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-term interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh relies on a distributed architecture that consists of domains. Each domain is an independent unit of data and its associated storage and compute components. When an organization contains various product units, each with its own data needs, each product team owns a domain that is operated and governed independently by the product team. […] Data mesh has a unique value proposition, not just offering scale of infrastructure and scenarios but also helping shift the organization’s culture around data," (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Data mesh architectures are inherently decentralized, and significant responsibility is delegated to the data product owners. A data mesh also benefits from a degree of centralization in the form of data product compatibility and common self-service tooling. Differing opinions, preferences, business requirements, legal constraints, technologies, and technical debt are just a few of the many factors that influence how we work together." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"A data mesh splits the boundaries of the exchange of data into multiple data products. This provides a unique opportunity to partially distribute the responsibility of data security. Each data product team can be made responsible for how their data should be accessed and what privacy policies should be applied." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules."(James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"To explain a data mesh in one sentence, a data mesh is a centrally managed network of decentralized data products. The data mesh breaks the central data lake into decentralized islands of data that are owned by the teams that generate the data. The data mesh architecture proposes that data be treated like a product, with each team producing its own data/output using its own choice of tools arranged in an architecture that works for them. This team completely owns the data/output they produce and exposes it for others to consume in a way they deem fit for their data." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"With all the hype, you would think building a data mesh is the answer to all of these 'problems' with data warehousing. The truth is that while data warehouse projects do fail, it is rarely because they can’t scale enough to handle big data or because the architecture or the technology isn’t capable. Failure is almost always because of problems with the people and/or the process, or that the organization chose the completely wrong technology." (James Serra, "Deciphering Data Architectures", 2024)

"Data mesh fundamentally reframes data governance and validation by distributing accountability to domain-oriented teams who act as custodians and producers of their respective data products. These teams possess intimate domain knowledge, which is essential for nuanced validation criteria that adapt to the semantics, context, and evolution of their datasets. By treating datasets as first-class products with clear ownership, interfaces, and service-level objectives, data mesh encourages autonomous validation workflows embedded directly within the domains where data originates and is consumed." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Modern complex organizations increasingly confront the challenge of ensuring data quality at scale without centralizing validation activities into a single bottlenecked team. The data mesh paradigm and federated controls emerge as pivotal architectural styles and organizational patterns that enable decentralized, self-serve data quality validation while preserving coherence and reliability across diverse data products." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)