26 December 2015

🪙Business Intelligence: Measurement (Just the Quotes)

"There is no inquiry which is not finally reducible to a question of Numbers; for there is none which may not be conceived of as consisting in the determination of quantities by each other, according to certain relations." (Auguste Comte, “The Positive Philosophy”, 1830)

"When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science.” (Lord Kelvin, "Electrical Units of Measurement", 1883)

“Of itself an arithmetic average is more likely to conceal than to disclose important facts; it is the nature of an abbreviation, and is often an excuse for laziness.” (Arthur Lyon Bowley, “The Nature and Purpose of the Measurement of Social Phenomena”, 1915)

“Science depends upon measurement, and things not measurable are therefore excluded, or tend to be excluded, from its attention.” (Arthur J Balfour, “Address”, 1917)

“It is important to realize that it is not the one measurement, alone, but its relation to the rest of the sequence that is of interest.” (William E Deming, “Statistical Adjustment of Data”, 1943)

“The purpose of computing is insight, not numbers […] sometimes […] the purpose of computing numbers is not yet in sight.” (Richard Hamming, “Numerical Methods for Scientists and Engineers”, 1962)

“A quantity like time, or any other physical measurement, does not exist in a completely abstract way. We find no sense in talking about something unless we specify how we measure it. It is the definition by the method of measuring a quantity that is the one sure way of avoiding talking nonsense...” (Hermann Bondi, “Relativity and Common Sense”, 1964)

“Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate.” (Abraham Kaplan, “The Conduct of Inquiry: Methodology for Behavioral Science”, 1964)

“A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions.” (Otis D Duncan, “Introduction to Structural Equation Models”, 1975)

“Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance […]” (George Greenstein, “Frozen Star”, 1983)

"Changing measures are a particularly common problem with comparisons over time, but measures also can cause problems of their own. [...] We cannot talk about change without making comparisons over time. We cannot avoid such comparisons, nor should we want to. However, there are several basic problems that can affect statistics about change. It is important to consider the problems posed by changing - and sometimes unchanging - measures, and it is also important to recognize the limits of predictions. Claims about change deserve critical inspection; we need to ask ourselves whether apples are being compared to apples - or to very different objects." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Measurement is often associated with the objectivity and neatness of numbers, and performance measurement efforts are typically accompanied by hope, great expectations and promises of change; however, these are then often followed by disbelief, frustration and what appears to be sheer madness." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Measuring anything subjective always prompts perverse behavior. [...] All measurement systems are subject to abuse." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

“The value of having numbers - data - is that they aren't subject to someone else's interpretation. They are just the numbers. You can decide what they mean for you.” (Emily Oster, “Expecting Better”, 2013)

"Until a new metric generates a body of data, we cannot test its usefulness. Lots of novel measures hold promise only on paper." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Usually, it is impossible to restate past data. As a result, all history must be whitewashed and measurement starts from scratch." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

25 December 2015

🪙Business Intelligence: Data Mesh (Just the quotes)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh attempts to strike a balance between team autonomy and inter-term interoperability and collaboration, with a few complementary techniques. It gives domain teams autonomy to have control of their local decision making, such as choosing the best data model for their data products. While it uses the computational governance policies to impose a consistent experience across all data products; for example, standardizing on the data modeling language that all domains utilize." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh is a solution for organizations that experience scale and complexity, where existing data warehouse or lake solutions have become blockers in their ability to get value from data at scale and across many functions of their business, in a timely fashion and with less friction." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data Mesh must allow for data models to change continuously without fatal impact to downstream data consumers, or slowing down access to data as a result of synchronizing change of a shared global canonical model. Data Mesh achieves this by localizing change to domains by providing autonomy to domains to model their data based on their most intimate understanding of the business without the need for central coordinations of change to a single shared canonical model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh relies on a distributed architecture that consists of domains. Each domain is an independent unit of data and its associated storage and compute components. When an organization contains various product units, each with its own data needs, each product team owns a domain that is operated and governed independently by the product team. […] Data mesh has a unique value proposition, not just offering scale of infrastructure and scenarios but also helping shift the organization’s culture around data," (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today’s computing environments, leading to ad hoc data pipelines, cobbled together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service." (Adam Bellemare,"Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"Data mesh architectures are inherently decentralized, and significant responsibility is delegated to the data product owners. A data mesh also benefits from a degree of centralization in the form of data product compatibility and common self-service tooling. Differing opinions, preferences, business requirements, legal constraints, technologies, and technical debt are just a few of the many factors that influence how we work together." (Adam Bellemare, "Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"A data mesh splits the boundaries of the exchange of data into multiple data products. This provides a unique opportunity to partially distribute the responsibility of data security. Each data product team can be made responsible for how their data should be accessed and what privacy policies should be applied." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules."(James Serra, "Deciphering Data Architectures", 2024)

"At its core, a data fabric is an architectural framework, designed to be employed within one or more domains inside a data mesh. The data mesh, however, is a holistic concept, encompassing technology, strategies, and methodologies." (James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"To explain a data mesh in one sentence, a data mesh is a centrally managed network of decentralized data products. The data mesh breaks the central data lake into decentralized islands of data that are owned by the teams that generate the data. The data mesh architecture proposes that data be treated like a product, with each team producing its own data/output using its own choice of tools arranged in an architecture that works for them. This team completely owns the data/output they produce and exposes it for others to consume in a way they deem fit for their data." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"With all the hype, you would think building a data mesh is the answer to all of these 'problems' with data warehousing. The truth is that while data warehouse projects do fail, it is rarely because they can’t scale enough to handle big data or because the architecture or the technology isn’t capable. Failure is almost always because of problems with the people and/or the process, or that the organization chose the completely wrong technology." (James Serra, "Deciphering Data Architectures", 2024)

22 December 2015

🪙Business Intelligence: Data Lakes (Just the Quotes)

"If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. [...] The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples." (James Dixon, "Pentaho, Hadoop, and Data Lakes", 2010) [sorce] [first known usage]

"A data lake represents an environment that collects and stores large volumes of structured and unstructured datasets, typically in their original, unaltered forms. More than a data depository, the data lake architecture enables the various users and data science teams to conduct data exploration and related analytical activities." (EMC Education Services, "Data Science & Big Data Analytics", 2015)

"A data lake strategy supports the introduction of a separate analytics environment that off-loads the analytics being done today on your overly expensive data warehouse. This separate analytics environment provides the data science team an on-demand, fail-fast environment for quickly ingesting and analyzing a wide variety of data sources in an attempt to address immediate business opportunities independent of the data warehouse's production schedule and service level agreement (SLA) rules." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"At its core, it is a data storage and processing repository in which all of the data in an organization can be placed so that every internal and external systems', partners', and collaborators' data flows into it and insights spring out. [...] Data Lake is a huge repository that holds every kind of data in its raw format until it is needed by anyone in the organization to analyze." (Beulah S Purra & Pradeep Pasupuleti, "Data Lake Development with Big Data", 2015) 

"Having multiple data lakes replicates the same problems that were created with multiple data warehouses - disparate data siloes and data fiefdoms that don't facilitate sharing of the corporate data assets across the organization. Organizations need to have a single data lake from which they can source the data for their BI/data warehousing and analytic needs. The data lake may never become the 'single version of the truth' for the organization, but then again, neither will the data warehouse. Instead, the data lake becomes the 'single or central repository for all the organization's data' from which all the organization's reporting and analytic needs are sourced." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"[...] the real power of the data lake is to enable advanced analytics or data science on the detailed and complete history of data in an attempt to uncover new variables and metrics that are better predictors of business performance." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"The data lake is not an incremental enhancement to the data warehouse, and it is NOT data warehouse 2.0. The data lake enables entirely new capabilities that allow your organization to address data and analytic challenges that the data warehouse could not address." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"Unfortunately, some organizations are replicating the bad data warehouse practice by creating special-purpose data lakes - data lakes to address a specific business need. Resist that urge! Instead, source the data that is needed for that specific business need into an 'analytic sandbox' where the data scientists and the business users can collaborate to find those data variables and analytic models that are better predictors of the business performance. Within the 'analytic sandbox', the organization can bring together (ingest and integrate) the data that it wants to test, build the analytic models, test the model's goodness of fit, acquire new data, refine the analytic models, and retest the goodness of fit." (Billl Schmarzo, "Driving Business Strategies with Data Science: Big Data MBA" 1st Ed., 2015)

"A data lake is a storage repository that holds a very large amount of data, often from diverse sources, in native format until needed. In some respects, a data lake can be compared to a staging area of a data warehouse, but there are key differences. Just like a staging area, a data lake is a conglomeration point for raw data from diverse sources. However, a staging area only stores new data needed for addition to the data warehouse and is a transient data store. In contrast, a data lake typically stores all possible data that might be needed for an undefined amount of analysis and reporting, allowing analysts to explore new data relationships. In addition, a data lake is usually built on commodity hardware and software such as Hadoop, whereas traditional staging areas typically reside in structured databases that require specialized servers." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"A data warehouse follows a pre-built static structure to model source data. Any changes at the structural and configuration level must go through a stringent business review process and impact analysis. Data lakes are very agile. Consumption or analytical layer can be modified to fit in the model requirements. Consumers of a data lake are not constant; therefore, schema and modeling lies at the liberty of analysts and scientists." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data in the data lake should never get disposed. Data driven strategy must define steps to version the data and handle deletes and updates from the source systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data governance policies must not enforce constraints on data - Data governance intends to control the level of democracy within the data lake. Its sole purpose of existence is to maintain the quality level through audits, compliance, and timely checks. Data flow, either by its size or quality, must not be constrained through governance norms. [...] Effective data governance elevates confidence in data lake quality and stability, which is a critical factor to data lake success story. Data compliance, data sharing, risk and privacy evaluation, access management, and data security are all factors that impact regulation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake induces accessibility and catalyzes availability. It warrants data discovery platforms to soak the data trends at a horizontal scale and produce visual insights. It largely cuts down the time that goes into data preparation and exhaustive data analysis." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake is a single window snapshot of all enterprise data in its raw format, be it structured, semi-structured, or unstructured. Starting from curating the data ingestion pipeline to the transformation layer for analytical consumption, every aspect of data gets addressed in a data lake ecosystem. It is supposed to hold enormous volumes of data of varied structures." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data swamp, on the other hand, presents the devil side of a lake. A data lake in a state of anarchy is nothing but turns into a data swamp. It lacks stable data governance practices, lacks metadata management, and plays weak on ingestion framework. Uncontrolled and untracked access to source data may produce duplicate copies of data and impose pressure on storage systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"At first, we threw all of this data into a pit called the 'data lake'. But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful - to be analyzed - data needed to (1) be related to each other and (2) have its analytical infrastructure carefully arranged and made available to the end user. Unless we meet these two conditions, the data lake turns into a swamp, and swamps start to smell after a while. [...] In a data swamp, data just sits there are no one uses it. In the data swamp, data just rots over time." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data lake architecture suffers from complexity and deterioration. It creates complex and unwieldy pipelines of batch or streaming jobs operated by a central team of hyper-specialized data engineers. It deteriorates over time. Its unmanaged datasets, which are often untrusted and inaccessible, provide little value. The data lineage and dependencies are obscured and hard to track." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"When it comes to data lakes, some things usually stay constant: the storage and processing patterns. Change could come in any of the following ways: Adding new components and processing or consumption patterns to respond to new requirements. […] Optimizing existing architecture for better cost or performance" (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Delta Lake is a transactional storage software layer that runs on top of an existing data lake and adds RDW-like features that improve the lake’s reliability, security, and performance. Delta Lake itself is not storage. In most cases, it’s easy to turn a data lake into a Delta Lake; all you need to do is specify, when you are storing data to your data lake, that you want to save it in Delta Lake format (as opposed to other formats, like CSV or JSON)." (James Serra, "Deciphering Data Architectures", 2024)

17 December 2015

🪙Business Intelligence: Decision-Making (Just the Quotes)

"Charts and graphs are a method of organizing information for a unique purpose. The purpose may be to inform, to persuade, to obtain a clear understanding of certain facts, or to focus information and attention on a particular problem. The information contained in charts and graphs must, obviously, be relevant to the purpose. For decision-making purposes, information must be focused clearly on the issue or issues requiring attention. The need is not simply for 'information', but for structured information, clearly presented and narrowed to fit a distinctive decision-making context. An advantage of having a 'formula' or 'model' appropriate to a given situation is that the formula indicates what kind of information is needed to obtain a solution or answer to a specific problem." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (Donald T Campbell, "Assessing the impact of planned social change", 1976)

"The greater the uncertainty, the greater the amount of decision making and information processing. It is hypothesized that organizations have limited capacities to process information and adopt different organizing modes to deal with task uncertainty. Therefore, variations in organizing modes are actually variations in the capacity of organizations to process information and make decisions about events which cannot be anticipated in advance." (John K Galbraith, "Organization Design", 1977)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage.". (Margaret Y Chu, "Blissful Data", 2004)

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"If you simply present data, it’s easy for your audience to say, Oh, that’s interesting, and move on to the next thing. But if you ask for action, your audience has to make a decision whether to comply or not. This elicits a more productive reaction from your audience, which can lead to a more productive conversation - one that might never have been started if you hadn’t recommended the action in the first place." (Cole N Knaflic, "Storytelling with Data: A Data Visualization Guide for Business Professionals", 2015)

"All human storytellers bring their subjectivity to their narratives. All have bias, and possibly error. Acknowledging and defusing that bias is a vital part of successfully using data stories. By debating a data story collaboratively and subjecting it to critical thinking, organizations can get much higher levels of engagement with data and analytics and impact their decision making much more than with reports and dashboards alone." (James Richardson, 2017)

"An actionable task means that it is possible to act on its result. That action might be to present a useful result to a decision maker or to proceed to a next step in a different result. An answer is actionable when it no longer needs further work to make sense of it." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Business intelligence tools can only present the facts. Removing biases and other errors in decision making are dynamics of company culture that affect how well business intelligence is used." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The problem is when biases and inaccurate data also get filtered into the gut. In this case, the gut-feel decision making should be supported with objective data, or errors in decision making may occur." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Apart from the secondary benefits of digital data, which are many, such as faster and cheaper information collection and distribution, the primary benefit is better decision making based on evidence. Despite our intellectual powers, when we allow our minds to become disconnected from reliable information about the world, we tend to screw up and make bad decisions." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"The goal of using data visualization to make better and faster decisions may lead people to think that any data visualization that is not immediately understood is a failure. Yes, a good visualization should allow you to see things that you might have missed, and to glean insights faster, but you still have to think." (Steve Wexler, "The Big Picture: How to use data visualization to make better decisions - faster", 2021)

"Current decision-making in business suffers from insight gaps. Organizations invest in data and analytics, hoping that will provide them with insights that they can use to make decisions, but in reality, there are many challenges and obstacles that get in the way of that process. One of the biggest challenges is that these organizations tend to focus on technology and hard skills only. They are definitely important, but you will not automatically get insights and better decisions with hard skills alone. Using data to make better data-informed decisions requires not only hard skills but also soft skills as well as mindsets." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Decision-makers are constantly provided data in the form of numbers or insights, or similar. The challenge is that we tend to believe every number or piece of data we hear, especially when it comes from a trusted source. However, even if the source is trusted and the data is correct, insights from the data are created when we put it in context and apply meaning to it. This means that we may have put incorrect meaning to the data and then made decisions based on that, which is not ideal. This is why anyone involved in the process needs to have the skills to think critically about the data, to try to understand the context, and to understand the complexity of the situation where the answer is not limited to just one specific thing. Critical thinking allows individuals to assess limitations of what was presented, as well as mitigate any cognitive bias that they may have." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is something that affects everyone and every organization. The more people who can debate, analyze, work with, and use data in their daily roles, the better data-informed decision-making will be." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

🪙Business Intelligence: Business Intelligence (Just the Quotes)

"A key sign of successful business intelligence is the degree to which it impacts business performance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Business intelligence tools can only present the facts. Removing biases and other errors in decision making are dynamics of company culture that affect how well business intelligence is used." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful business intelligence is influenced by both technical aspects and organizational aspects. In general, companies rate organizational aspects (such as executive level sponsorship) as having a higher impact on success than technical aspects. And yet, even if you do everything right from an organizational perspective, if you don’t have high quality, relevant data, your BI initiative will fail." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The data architecture is the most important technical aspect of your business intelligence initiative. Fail to build an information architecture that is flexible, with consistent, timely, quality data, and your BI initiative will fail. Business users will not trust the information, no matter how powerful and pretty the BI tools. However, sometimes it takes displaying that messy data to get business users to understand the importance of data quality and to take ownership of a problem that extends beyond business intelligence, to the source systems and to the organizational structures that govern a company’s data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The data architecture is the most important technical aspect of your business intelligence initiative. Fail to build an information architecture that is flexible, with consistent, timely, quality data, and your BI initiative will fail. Business users will not trust the information, no matter how powerful and pretty the BI tools. However, sometimes it takes displaying that messy data to get business users to understand the importance of data quality and to take ownership of a problem that extends beyond business intelligence, to the source systems and to the organizational structures that govern a company’s data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"There is one crucial aspect of extending the reach of business intelligence that has nothing to do with technology and that is Relevance. Understanding what information someone needs to do a job or to complete a task is what makes business intelligence relevant to that person. Much of business intelligence thus far has been relevant to power users and senior managers but not to front/line workers, customers, and suppliers." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Data migration is not just about moving data from one place to another; it should be focused on: realizing all the benefits promised by the new system when you entertained the concept of new software in the first place; creating the improved enterprise performance that was the driver for the project; importing the best, the most appropriate and the cleanest data you can so that you enhance business intelligence; maintaining all your regulatory, legal and governance compliance criteria; staying securely in control of the project." (John Morris, "Practical Data Migration", 2009)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Dashboards are collections of several linked visualizations all in one place. The idea is very popular as part of business intelligence: having current data on activity summarized and presented all in one place. One danger of cramming a lot of disparate information into one place is that you will quickly hit information overload. Interactivity and small multiples are definitely worth considering as ways of simplifying the information a reader has to digest in a dashboard. As with so many other visualizations, layering the detail for different readers is valuable." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"The way we explore data today, we often aren't constrained by rigid hypothesis testing or statistical rigor that can slow down the process to a crawl. But we need to be careful with this rapid pace of exploration, too. Modern business intelligence and analytics tools allow us to do so much with data so quickly that it can be easy to fall into a pitfall by creating a chart that misleads us in the early stages of the process." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

See also: the [definitions] and the [index] of similar posts.

13 December 2015

Businesses Intelligence: Data Analytics Myths (Just the Quotes)

"[myth:] Accuracy is more important than precision. For single best estimates, be it a mean value or a single data value, this question does not arise because in that case there is no difference between accuracy and precision. (Think of a single shot aimed at a target.) Generally, it is good practice to balance precision and accuracy. The actual requirements will differ from case to case." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[myth:] Counting can be done without error. Usually, the counted number is an integer and therefore without (rounding) error. However, the best estimate of a scientifically relevant value obtained by counting will always have an error. These errors can be very small in cases of consecutive counting, in particular of regular events, e.g., when measuring frequencies." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"The simplicity of the process behavior chart can be deceptive. This is because the simplicity of the charts is based on a completely different concept of data analysis than that which is used for the analysis of experimental data.  When someone does not understand the conceptual basis for process behavior charts they are likely to view the simplicity of the charts as something that needs to be fixed.  Out of these urges to fix the charts all kinds of myths have sprung up resulting in various levels of complexity and obstacles to the use of one of the most powerful analysis techniques ever invented." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"The search for better numbers, like the quest for new technologies to improve our lives, is certainly worthwhile. But the belief that a few simple numbers, a few basic averages, can capture the multifaceted nature of national and global economic systems is a myth. Rather than seeking new simple numbers to replace our old simple numbers, we need to tap into both the power of our information age and our ability to construct our own maps of the world to answer the questions we need answering." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The field of big-data analytics is still littered with a few myths and evidence-free lore. The reasons for these myths are simple: the emerging nature of technologies, the lack of common definitions, and the non-availability of validated best practices. Whatever the reasons, these myths must be debunked, as allowing them to persist usually has a negative impact on success factors and Return on Investment (RoI). On a positive note, debunking the myths allows us to set the right expectations, allocate appropriate resources, redefine business processes, and achieve individual/organizational buy-in." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017) 

"The first myth is that prediction is always based on time-series extrapolation into the future (also known as forecasting). This is not the case: predictive analytics can be applied to generate any type of unknown data, including past and present. In addition, prediction can be applied to non-temporal (time-based) use cases such as disease progression modeling, human relationship modeling, and sentiment analysis for medication adherence, etc. The second myth is that predictive analytics is a guarantor of what will happen in the future. This also is not the case: predictive analytics, due to the nature of the insights they create, are probabilistic and not deterministic. As a result, predictive analytics will not be able to ensure certainty of outcomes." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Another myth is that we shall have a single source of truth for each concept or entity. […] This is a wonderful idea, and is placed to prevent multiple copies of out-of-date and untrustworthy data. But in reality it’s proved costly, an impediment to scale and speed, or simply unachievable. Data Mesh does not enforce the idea of one source of truth. However, it places multiple practices in place that reduces the likelihood of multiple copies of out-of-date data." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data mesh [...] reduces points of centralization that act as coordination bottlenecks. It finds a new way of decomposing the data architecture without slowing the organization down with synchronizations. It removes the gap between where the data originates and where it gets used and removes the accidental complexities - aka pipelines - that happen in between the two planes of data. Data mesh departs from data myths such as a single source of truth, or one tightly controlled canonical data model." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"I think sometimes organizations are looking at tools or the mythical and elusive data driven culture to be the strategy. Let me emphasize now: culture and tools are not strategies; they are enabling pieces." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"In the world of data and analytics, people get enamored by the nice, shiny object. We are pulled around by the wind of the latest technology, but in so doing we are pulled away from the sound and intelligent path that can lead us to data and analytical success. The data and analytical world is full of examples of overhyped technology or processes, thinking this thing will solve all of the data and analytical needs for an individual or organization. Such topics include big data or data science. These two were pushed into our minds and down our throats so incessantly over the past decade that they are somewhat of a myth, or people finally saw the light. In reality, both have a place and do matter, but they are not the only solution to your data and analytical needs. Unfortunately, though, organizations bit into them, thinking they would solve everything, and were left at the alter, if you will, when it came time for the marriage of data and analytical success with tools." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Unlike other analytical data management paradigms, data mesh does not embrace the concept of the mythical single source of truth. Every data product provides a truthful portion of the reality - for a particular domain - to the best of its ability, a single slice of truth." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

05 December 2015

🪙Business Intelligence: Indicators (Just the Quotes)

"If we view organizations as adaptive, problem-solving structures, then inferences about effectiveness have to be made, not from static measures of output, but on the basis of the processes through which the organization approaches problems. In other words, no single measurement of organizational efficiency or satisfaction - no single time-slice of organizational performance can provide valid indicators of organizational health." (Warren G Bennis, "General Systems Yearbook", 1962)

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (Donald T Campbell, "Assessing the impact of planned social change", 1976)

"Indicators tend to direct your attention toward what they are monitoring. It is like riding a bicycle: you will probably steer it where you are looking. If, for example, you start measuring your inventory levels carefully, you are likely to take action to drive your inventory levels down, which is good up to a point. But your inventories could become so lean that you can’t react to changes in demand without creating shortages. So because indicators direct one’s activities, you should guard against overreacting. This you can do by pairing indicators, so that together both effect and counter-effect are measured. Thus, in the inventory example, you need to monitor both inventory levels and the incidence of shortages. A rise in the latter will obviously lead you to do things to keep inventories from becoming too low." (Andrew S Grove, "High Output Management", 1983)

"So because indicators direct one’s activities, you should guard against overreacting. This you can do by pairing indicators, so that together both effect and counter-effect are measured. […] In sum, joint monitoring is likely to keep things in the optimum middle ground." (Andrew S Grove, "High Output Management", 1983)

"The first rule is that a measurement - any measurement - is better than none. But a genuinely effective indicator will cover the output of the work unit and not simply the activity involved. […] If you do not systematically collect and maintain an archive of indicators, you will have to do an awful lot of quick research to get the information you need, and by the time you have it, the problem is likely to have gotten worse." (Andrew S Grove, "High Output Management", 1983)

"The number of possible indicators you can choose is virtually limitless, but for any set of them to be useful, you have to focus each indicator on a specific operational goal. […] Put another way, which five pieces of information would you want to look at each day, immediately upon arriving at your office?" (Andrew S Grove, "High Output Management", 1983)

"All good KPIs that I have come across, that have made a difference, had the CEO’s constant attention, with daily calls to the relevant staff. [...] A KPI should tell you about what action needs to take place. [...] A KPI is deep enough in the organization that it can be tied down to an individual. [...] A good KPI will affect most of the core CSFs and more than one BSC perspective. [...] A good KPI has a flow on effect." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"If the KPIs you currently have are not creating change, throw them out because there is a good chance that they may be wrong. They are probably measures that were thrown together without the in-depth research and investigation KPIs truly deserve." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Key performance indicators (KPIs) are the vital navigation instruments used by managers to understand whether their business is on a successful voyage or whether it is veering off the prosperous path. The right set of indicators will shine light on performance and highlight areas that need attention. ‘What gets measured gets done’ and ‘if you can’t measure it, you can’t manage it’ are just two of the popular sayings used to highlight the critical importance of metrics. Without the right KPIs managers are sailing blind." (Bernard Marr, "Key Performance Indicators (KPI): The 75 measures every manager needs to know", 2011)

"KRAs and KPIs KRA and KPI are two confusing acronyms for an approach commonly recommended for identifying a person’s major job responsibilities. KRA stands for key result areas; KPI stands for key performance indicators. As academics and consultants explain this jargon, key result areas are the primary components or parts of the job in which a person is expected to deliver results. Key performance indicators represent the measures that will be used to determine how well the individual has performed. In other words, KRAs tell where the individual is supposed to concentrate her attention; KPIs tell how her performance in the specified areas should be measured. Probably few parts of the performance appraisal process create more misunderstanding and bewilderment than do the notion of KRAs and KPIs. The reason is that so much of the material written about KPIs and KRAs is both." (Dick Grote, "How to Be Good at Performance Appraisals: Simple, Effective, Done Right", 2011)

"A statistical index has all the potential pitfalls of any descriptive statistic - plus the distortions introduced by combining multiple indicators into a single number. By definition, any index is going to be sensitive to how it is constructed; it will be affected both by what measures go into the index and by how each of those measures is weighted." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Even if you have a solid indicator of what you are trying to measure and manage, the challenges are not over. The good news is that 'managing by statistics' can change the underlying behavior of the person or institution being managed for the better. If you can measure the proportion of defective products coming off an assembly line, and if those defects are a function of things happening at the plant, then some kind of bonus for workers that is tied to a reduction in defective products would presumably change behavior in the right kinds of ways. Each of us responds to incentives (even if it is just praise or a better parking spot). Statistics measure the outcomes that matter; incentives give us a reason to improve those outcomes." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Defining an indicator as lagging, coincident, or leading is connected to another vital notion: the business cycle. Indicators are lagging or leading based on where economists believe we are in the business cycle: whether we are heading into a recession or emerging from one." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] economics is a profession grounded in the belief that 'the economy' is a machine and a closed system. The more clearly that machine is understood, the more its variables are precisely measured, the more we will be able to manage and steer it as we choose, avoiding the frenetic expansions and sharp contractions. With better indicators would come better policy, and with better policy, states would be less likely to fall into depression and risk collapse." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Our needs going forward will be best served by how we make use of not just this data but all data. We live in an era of Big Data. The world has seen an explosion of information in the past decades, so much so that people and institutions now struggle to keep pace. In fact, one of the reasons for the attachment to the simplicity of our indicators may be an inverse reaction to the sheer and bewildering volume of information most of us are bombarded by on a daily basis. […] The lesson for a world of Big Data is that in an environment with excessive information, people may gravitate toward answers that simplify reality rather than embrace the sheer complexity of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are meaningless unless they exist in some context. One reason why the indicators have become more central and potent over time is that the longer they have been kept, the easier it is to find useful patterns and points of reference." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The indicators - through no particular fault of anyone in particular - have not kept up with the changing world. As these numbers have become more deeply embedded in our culture as guides to how we are doing, we rely on a few big averages that can never be accurate pictures of complicated systems for the very reason that they are too simple and that they are averages. And we have neither the will nor the resources to invent or refine our current indicators enough to integrate all of these changes." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"We don’t need new indicators that replace old simple numbers with new simple numbers. We need instead bespoke indicators, tailored to the specific needs and specific questions of governments, businesses, communities, and individuals." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Yet our understanding of the world is still framed by our leading indicators. Those indicators define the economy, and what they say becomes the answer to the simple question 'Are we doing well?'" (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] an overall green status indicator doesn’t mean anything most of the time. All it says is that the things under measurement seem okay. But there always will be many more things not under measurement. To celebrate green indicators is to ignore the unknowns. […] The tendency to roll up metrics into dashboards promotes ignorance of the real situation on the ground. We forget that we only see what is under measurement. We only act when something is not green." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Financial measures are a quantification of an activity that has taken place; we have simply placed a value on the activity. Thus, behind every financial measure is an activity. I call financial measures result indicators, a summary measure. It is the activity that you will want more or less of. It is the activity that drives the dollars, pounds, or yen. Thus financial measures cannot possibly be KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Key performance indicators (KPIs) are those indicators that focus on the aspects of organizational performance that are the most critical for the current and future success of the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Key Performance Indicators (KPIs) in many organizations are a broken tool. The KPIs are often a random collection prepared with little expertise, signifying nothing. [...] KPIs should be measures that link daily activities to the organization’s critical success factors (CSFs), thus supporting an alignment of effort within the organization in the intended direction." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Most organizational measures are very much past indicators measuring events of the last month or quarter. These indicators cannot be and never were KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"We need indicators of overall performance that need only be reviewed on a monthly or bimonthly basis. These measures need to tell the story about whether the organization is being steered in the right direction at the right speed, whether the customers and staff are happy, and whether we are acting in a responsible way by being environmentally friendly. These measures are called key result indicators (KRIs)." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Indicators represent a way of 'distilling' the larger volume of data collected by organizations. As data become bigger and bigger, due to the greater span of control or growing complexity of operations, data management becomes increasingly difficult. Actions and decisions are greatly influenced by the nature, use and time horizon (e.g., short or long-term) of indicators." (Fiorenzo Franceschini et al, "Designing Performance Measurement Systems: Theory and Practice of Key Performance Indicators", 2019)

"Indicators take on the role of real 'conceptual technologies', capable of driving organizational management in intangible terms, conditioning the 'what' to focus and the 'how'; in other words, they become the beating heart of the management, operational and technological processes." (Fiorenzo Franceschini et al, "Designing Performance Measurement Systems: Theory and Practice of Key Performance Indicators", 2019)

"Monitoring a process requires identifying specific activities, responsibilities and indicators for testing effectiveness and efficiency. Effectiveness means setting the right goals and objectives, making sure that they are properly accomplished (doing the right things); effectiveness is measured comparing the achieved results with target objectives. On the other hand, efficiency means getting the most (output) from the available (input) resources (doing things right): efficiency defines a link between process performance and available resources." (Fiorenzo Franceschini et al, "Designing Performance Measurement Systems: Theory and Practice of Key Performance Indicators", 2019)

"People do care about how they are measured. What can we do about this? If you are in the position to measure something, think about whether measuring it will change people’s behaviors in ways that undermine the value of your results. If you are looking at quantitative indicators that others have compiled, ask yourself: Are these numbers measuring what they are intended to measure? Or are people gaming the system and rendering this measure useless?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"A KPI is a performance measure that demonstrates how effectively an organisation is achieving its critical objectives. They are used to track performance over a period of time to ensure the organisation is heading in the desired direction, and are quantifiable to guide whether activities need to be dialled up or down, resources adjusted or management resource focused on understanding what is in play that may be holding back the organisation." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

"The KPI juggernaut has been misused and abused in too many organisations to the extent it has devalued the concept of KPIs. KPIs used well - the ten things that really matter to an organisation - can, in my experience, be a real galvanising force to get focus and attention put in those areas which really can make a difference. The rest is a distraction, there through some misplaced view that more adds value when actually it detracts through losing the focus from where it needs to be." (Ian Wallis, "Data Strategy: From definition to execution", 2021)

04 December 2015

🪙Business Intelligence: Measures/Metrics (Just the Quotes)

"The most important and frequently stressed prescription for avoiding pitfalls in the use of economic statistics, is that one should find out before using any set of published statistics, how they have been collected, analysed and tabulated. This is especially important, as you know, when the statistics arise not from a special statistical enquiry, but are a by-product of law or administration. Only in this way can one be sure of discovering what exactly it is that the figures measure, avoid comparing the non-comparable, take account of changes in definition and coverage, and as a consequence not be misled into mistaken interpretations and analysis of the events which the statistics portray." (Ely Devons, "Essays in Economics", 1961)

"If we view organizations as adaptive, problem-solving structures, then inferences about effectiveness have to be made, not from static measures of output, but on the basis of the processes through which the organization approaches problems. In other words, no single measurement of organizational efficiency or satisfaction - no single time-slice of organizational performance can provide valid indicators of organizational health." (Warren G Bennis, "General Systems Yearbook", 1962)

"[Management by objectives is] a process whereby the superior and the subordinate managers of an enterprise jointly identify its common goals, define each individual's major areas of responsibility in terms of the results expected of him, and use these measures as guides for operating the unit and assessing the contribution of each of its members." (Robert House, "Administrative Science Quarterly", 1971)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." (Charles Goodhart, "Problems of Monetary Management: the U.K. Experience", 1975)

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." (Donald T Campbell, "Assessing the impact of planned social change", 1976)

"Reengineering is the fundamental rethinking and radical redesign of business processes to achieve dramatic improvements in critical contemporary measures of performance such as cost, quality, service and speed." (James A Champy & Michael M Hammer, "Reengineering the Corporation", 1993)

"Industrial managers faced with a problem in production control invariably expect a solution to be devised that is simple and unidimensional. They seek the variable in the situation whose control will achieve control of the whole system: tons of throughput, for example. Business managers seek to do the same thing in controlling a company; they hope they have found the measure of the entire system when they say 'everything can be reduced to monetary terms'." (Stanford Beer, "Decision and Control", 1994)

"A strategy is a set of hypotheses about cause and effect. The measurement system should make the relationships (hypotheses) among objectives (and measures) in the various perspectives explicit so that they can be managed and validated. The chain of cause and effect should pervade all four perspectives of a Balanced Scorecard." (Robert S Kaplan & David P Norton, "The Balanced Scorecard", Harvard Business Review, 1996)

"The Balanced Scorecard has its greatest impact when it is deployed to drive organizational change. [...] The Balanced Scorecard is primarily a mechanism for strategy implementation, not for strategy formulation. It can accommodate either approach for formulating business unit strategy-starting from the customer perspective, or starting from excellent internal-business-process capabilities. For whatever approach that SBU senior executives use to formulate their strategy, the Balanced Scorecard will provide an invaluable mechanism for translating that strategy into specific objectives, measures, and targets, and monitoring the implementation of that strategy during subsequent periods." (Robert S Kaplan & David P Norton, "The Balanced Scorecard", Harvard Business Review, 1996)

"The Balanced Scorecard translates mission and strategy into objectives and measures, organized into four different perspectives: financial, customer, internal business process, and learning and growth. The scorecard provides a framework, a language, to communicate mission and strategy; it uses measurement to inform employees about the drivers of current and future success." (Robert S Kaplan & David P Norton, "The Balanced Scorecard", Harvard Business Review, 1996)

"When a measure becomes a target, it ceases to be a good measure." (Marilyn Strathern, "‘Improving ratings’: audit in the British University system", 1997)

"Since the average is a measure of location, it is common to use averages to compare two data sets. The set with the greater average is thought to ‘exceed’ the other set. While such comparisons may be helpful, they must be used with caution. After all, for any given data set, most of the values will not be equal to the average." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"First, good statistics are based on more than guessing. [...] Second, good statistics are based on clear, reasonable definitions. Remember, every statistic has to define its subject. Those definitions ought to be clear and made public. [...] Third, good statistics are based on clear, reasonable measures. Again, every statistic involves some sort of measurement; while all measures are imperfect, not all flaws are equally serious. [...] Finally, good statistics are based on good samples." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"If the KPIs you currently have are not creating change, throw them out because there is a good chance that they may be wrong. They are probably measures that were thrown together without the in-depth research and investigation KPIs truly deserve." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Key performance indicators (KPIs) are the vital navigation instruments used by managers to understand whether their business is on a successful voyage or whether it is veering off the prosperous path. The right set of indicators will shine light on performance and highlight areas that need attention. ‘What gets measured gets done’ and ‘if you can’t measure it, you can’t manage it’ are just two of the popular sayings used to highlight the critical importance of metrics. Without the right KPIs managers are sailing blind." (Bernard Marr, "Key Performance Indicators (KPI): The 75 measures every manager needs to know", 2011)

"A statistical index has all the potential pitfalls of any descriptive statistic - plus the distortions introduced by combining multiple indicators into a single number. By definition, any index is going to be sensitive to how it is constructed; it will be affected both by what measures go into the index and by how each of those measures is weighted." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Even if you have a solid indicator of what you are trying to measure and manage, the challenges are not over. The good news is that 'managing by statistics' can change the underlying behavior of the person or institution being managed for the better. If you can measure the proportion of defective products coming off an assembly line, and if those defects are a function of things happening at the plant, then some kind of bonus for workers that is tied to a reduction in defective products would presumably change behavior in the right kinds of ways. Each of us responds to incentives (even if it is just praise or a better parking spot). Statistics measure the outcomes that matter; incentives give us a reason to improve those outcomes." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons […] The advantage of any index is that it consolidates lots of complex information into a single number. We can then rank things that otherwise defy simple comparison […] Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades." (Charles Wheelan, "Naked Statistics: Stripping the Dread from the Data", 2012)

"No subjective metric can escape strategic gaming [...] The possibility of mischief is bottomless. Fighting ratings is fruitless, as they satisfy a very human need. If one scheme is beaten down, another will take its place and wear its flaws. Big Data just deepens the danger. The more complex the rating formulas, the more numerous the opportunities there are to dress up the numbers. The larger the data sets, the harder it is to audit them." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"The urge to tinker with a formula is a hunger that keeps coming back. Tinkering almost always leads to more complexity. The more complicated the metric, the harder it is for users to learn how to affect the metric, and the less likely it is to improve it." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Until a new metric generates a body of data, we cannot test its usefulness. Lots of novel measures hold promise only on paper." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"[…] an overall green status indicator doesn’t mean anything most of the time. All it says is that the things under measurement seem okay. But there always will be many more things not under measurement. To celebrate green indicators is to ignore the unknowns. […] The tendency to roll up metrics into dashboards promotes ignorance of the real situation on the ground. We forget that we only see what is under measurement. We only act when something is not green." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Financial measures are a quantification of an activity that has taken place; we have simply placed a value on the activity. Thus, behind every financial measure is an activity. I call financial measures result indicators, a summary measure. It is the activity that you will want more or less of. It is the activity that drives the dollars, pounds, or yen. Thus financial measures cannot possibly be KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"'Getting it right the first time' is a rare achievement, and ascertaining the organization’s winning KPIs and associated reports is no exception. The performance measure framework and associated reporting is just like a piece of sculpture: you can be criticized on taste and content, but you can’t be wrong. The senior management team and KPI project team need to ensure that the project has a just-do-it culture, not one in which every step and measure is debated as part of an intellectual exercise." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"In order to get measures to drive performance, a reporting framework needs to be developed at all levels within the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Most organizational measures are very much past indicators measuring events of the last month or quarter. These indicators cannot be and never were KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Rolling up fine-grained metrics to create high-level dashboards puts pressure on teams to keep the fine-grained metrics green even when it might not be the best use of their time." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Scaling supervision using metrics is one thing; scaling results is quite another. The former doesn’t automatically ensure the latter." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"We need indicators of overall performance that need only be reviewed on a monthly or bimonthly basis. These measures need to tell the story about whether the organization is being steered in the right direction at the right speed, whether the customers and staff are happy, and whether we are acting in a responsible way by being environmentally friendly. These measures are called key result indicators (KRIs)." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Unfortunately, setting the scale at zero is the best recipe for creating dull charts, in both senses of the word: boring and with little variation. The solution is not to break the scale, but rather to find a similar message that can be communicated using alternative metrics." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"Statistical metrics can show us facts and trends that would be impossible to see in any other way, but often they’re used as a substitute for relevant experience, by managers or politicians without specific expertise or a close-up view." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

02 December 2015

🪙Business Intelligence: Reporting (Just the Quotes)

"A man's judgment cannot be better than the information on which he has based it. Give him no news, or present him only with distorted and incomplete data, with ignorant, sloppy, or biased reporting, with propaganda and deliberate falsehoods, and you destroy his whole reasoning process and make him somewhat less than a man." (Arthur H Sulzberger, [speech] 1948)

"The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, 'opinion' polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense." (Darell Huff, "How to Lie with Statistics", 1954)

"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Darell Huff, "How to Lie with Statistics", 1954)

"It is probable that one day we shall begin to draw organization charts as a series of linked groups rather than as a hierarchical structure of individual 'reporting' relationships." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"[...] as the planning process proceeds to a specific financial or marketing state, it is usually discovered that a considerable body of 'numbers' is missing, but needed numbers for which there has been no regular system of collection and reporting; numbers that must be collected outside the firm in some cases. This serendipity usually pays off in a much better management information system in the form of reports which will be collected and reviewed routinely." (William H. Franklin Jr., Financial Strategies, 1987)

"Intangible assets [...] surpass physical assets in most business enterprises, both in value and contribution to growth, yet they are routinely expensed in the financial reports and hence remain absent from corporate balance sheets. This asymmetric treatment of capitalizing (considering as assets) physical and financial investment while expensing intangibles leads to biased and deficient reporting of firms’ performance and value." (Baruch Lev, "Intangibles: Management, Measurement, and Reporting", 2000)

"Project planning is the key to effective project management. Detailed and accurate planning of a project produces the managerial information that is the basis of project justification (costs, benefits, strategic impact, etc.) and the defining of the business drivers (scope, objectives) that form the context for the technical solution. In addition, project planning also produces the project schedules and resource allocations that are the framework for the other project management processes: tracking, reporting, and review." (Rob Thomsett, "Radical Project Management", 2002)

"Many management reports are not a management tool; they are merely memorandums of information. As a management tool, management reports should encourage timely action in the right direction, by reporting on those activities the Board, management, and staff need to focus on. The old adage 'what gets measured gets done' still holds true." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Reporting to the Board is a classic 'catch-22' situation. Boards complain about getting too much information too late, and management complains that up to 20% of their time is tied up in the Board reporting process. Boards obviously need to ascertain whether management is steering the ship correctly and the state of the crew and customers before they can relax and 'strategize' about future initiatives. The process of assessing the current status of the organization from the most recent Board report is where the principal problem lies. Board reporting needs to occur more efficiently and effectively for both the Board and management." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"A dashboard is like the executive summary of a report. We read executive summaries and skip the body of the report if the summary is more or less in line with our expectations. Trouble is, measurement is never exhaustive. It is only when we dive in that we realize what areas may have been missed." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"'Getting it right the first time' is a rare achievement, and ascertaining the organization’s winning KPIs and associated reports is no exception. The performance measure framework and associated reporting is just like a piece of sculpture: you can be criticized on taste and content, but you can’t be wrong. The senior management team and KPI project team need to ensure that the project has a just-do-it culture, not one in which every step and measure is debated as part of an intellectual exercise." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"In order to get measures to drive performance, a reporting framework needs to be developed at all levels within the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Statistics, because they are numbers, appear to us to be cold, hard facts. It seems that they represent facts given to us by nature and it’s just a matter of finding them. But it’s important to remember that people gather statistics. People choose what to count, how to go about counting, which of the resulting numbers they will share with us, and which words they will use to describe and interpret those numbers. Statistics are not facts. They are interpretations. And your interpretation may be just as good as, or better than, that of the person reporting them to you." (Daniel J Levitin, "Weaponized Lies", 2017)

🪙Business Intelligence: Analytics (Just the Quotes)

"Data are essential, but performance improvements and competitive advantage arise from analytics models that allow managers to predict and optimize outcomes. More important, the most effective approach to building a model rarely starts with the data; instead it originates with identifying the business opportunity and determining how the model can improve performance." (Dominic Barton & David Court, "Making Advanced Analytics Work for You", 2012) 

"Even with simple and usable models, most organizations will need to upgrade their analytical skills and literacy. Managers must come to view analytics as central to solving problems and identifying opportunities - to make it part of the fabric of daily operations." (Dominic Barton & David Court, "Making Advanced Analytics Work for You", 2012)

"There is another important distinction pertaining to mining data: the difference between (1) mining the data to find patterns and build models, and (2) using the results of data mining. Students often confuse these two processes when studying data science, and managers sometimes confuse them when discussing business analytics. The use of data mining results should influence and inform the data mining process itself, but the two should be kept distinct." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"It is important to remember that predictive data analytics models built using machine learning techniques are tools that we can use to help make better decisions within an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. Each of these is used by different communities and has different associations. Some have a long half-life, some less so." (Pedro Domingos, "The Master Algorithm", 2015)

"The human side of analytics is the biggest challenge to implementing big data." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"One important thing to bear in mind about the outputs of data science and analytics is that in the vast majority of cases they do not uncover hidden patterns or relationships as if by magic, and in the case of predictive analytics they do not tell us exactly what will happen in the future. Instead, they enable us to forecast what may come. In other words, once we have carried out some modelling there is still a lot of work to do to make sense out of the results obtained, taking into account the constraints and assumptions in the model, as well as considering what an acceptable level of reliability is in each scenario." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"One of the biggest truths about the real–time analytics is that nothing is actually real–time; it's a myth. In reality, it's close to real–time. Depending upon the performance and ability of a solution and the reduction of operational latencies, the analytics could be close to real–time, but, while day-by-day we are bridging the gap between real–time and near–real–time, it's practically impossible to eliminate the gap due to computational, operational, and network latencies." (Shilpi Saxena & Saurabh Gupta, "Practical Real-time Data Processing and Analytics", 2017)

"The tension between bias and variance, simplicity and complexity, or underfitting and overfitting is an area in the data science and analytics process that can be closer to a craft than a fixed rule. The main challenge is that not only is each dataset different, but also there are data points that we have not yet seen at the moment of constructing the model. Instead, we are interested in building a strategy that enables us to tell something about data from the sample used in building the model." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"For advanced analytics, a well-designed data pipeline is a prerequisite, so a large part of your focus should be on automation. This is also the most difficult work. To be successful, you need to stitch everything together." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"Data literacy is not a change in an individual’s abilities, talents, or skills within their careers, but more of an enhancement and empowerment of the individual to succeed with data. When it comes to data and analytics succeeding in an organization’s culture, the increase in the workforces’ skills with data literacy will help individuals to succeed with the strategy laid in front of them. In this way, organizations are not trying to run large change management programs; the process is more of an evolution and strengthening of individual’s talents with data. When we help individuals do more with data, we in turn help the organization’s culture do more with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"In the world of data and analytics, people get enamored by the nice, shiny object. We are pulled around by the wind of the latest technology, but in so doing we are pulled away from the sound and intelligent path that can lead us to data and analytical success. The data and analytical world is full of examples of overhyped technology or processes, thinking this thing will solve all of the data and analytical needs for an individual or organization. Such topics include big data or data science. These two were pushed into our minds and down our throats so incessantly over the past decade that they are somewhat of a myth, or people finally saw the light. In reality, both have a place and do matter, but they are not the only solution to your data and analytical needs. Unfortunately, though, organizations bit into them, thinking they would solve everything, and were left at the alter, if you will, when it came time for the marriage of data and analytical success with tools." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

04 August 2015

Statistics: Median (Definitions)

"The middle value in an ordered set of values for which there are an equal number of values." (Jennifer George-Palilonis, "A Practical Guide to Graphics Reporting", 2006)

"The center-most value in an ordered set of values. If the set quantity is even, then the average of the two center-most values." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The median is a statistical measure of variation. It represents the middle measurement when a set of measurements are collected in ascending order: 50% of the measurements are above the median and 50% are below it." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The middle value in a set of ordered numbers. The median value is determined by choosing the smallest value such that at least half of the values in the set are no greater than the chosen value. If the number of values within the set is odd, the median value corresponds to a single value. If the number of values within the set is even, the median value corresponds to the sum of the two middle values divided by two." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The middle value in a set of values. Half the values fall below the median, and half the values fall above the median. See also average; mode." (E C Nelson & Stephen L Nelson, "Excel Data Analysis For Dummies ", 2015)

"To find the median, list the values of the data set in numerical order and identify which value appears in the middle of the list." (Christopher Donohue et al, "Foundations of Financial Risk: An Overview of Financial Risk and Risk-based Financial Regulation, 2nd Ed", 2015)

"Middle score in a distribution." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

Statistics: Mean (Definitions)

"In a numerical sequence, the number that has an equal number of values before and after it. In the sequence 3, 5, 7, 9, 11, seven is the mean." (Dale Furtwengler, "Ten Minute Guide to Performance Appraisals", 2000)

"The average value of a sample of data that is typically gathered in a matrix experiment." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"The sum of all values in a variable divided by the number of values." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"The average value of a sample of data that is typically gathered in a matrix experiment." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"The sum of all values in a variable divided by the number of values." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2007)

"The result of dividing the sum of all values within a set by the count of all values included." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The mean is a statistical measure of central tendency. It is most easily understood as the mathematical average. It is calculated by summing the value of a set of measurements and dividing by the number of measurements taken." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement", 2012)

"To find the mean add up the values in the data set and then divide by the number of values." (Christopher Donohue et al, "Foundations of Financial Risk: An Overview of Financial Risk and Risk-based Financial Regulation" 2nd Ed., 2015)

"Arithmetic averages of scores. The mean is the most commonly used measure of central tendency, but should be computed only for score data." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.