29 December 2018

Data Science: Data (Just the Quotes)

"Before anything can be reasoned upon to a conclusion, certain facts, principles, or data, to reason from, must be established, admitted, or denied." (Thomas Paine, "Rights of Man", 1791) 

"The errors which arise from the absence of facts are far more numerous and more durable than those which result from unsound reasoning respecting true data." (Charles Babbage, "On the Economy of Machinery and Manufactures", 1832)

"In every branch of knowledge the progress is proportional to the amount of facts on which to build, and therefore to the facility of obtaining data." (James C Maxwell, [letter to Lewis Campbell] 1851)

"It is a capital mistake to theorise before one has data." (Arthur C Doyle, "The Adventures of Sherlock Holmes", 1892)

"The man of science, by virtue of his training, is alone capable of realising the difficulties - often enormous - of obtaining accurate data upon which just judgment may be based." (Sir Richard Gregory, "Discovery; or, The Spirit and Service of Science", 1918)

"Not even the most subtle and skilled analysis can overcome completely the unreliability of basic data." (Roy D G Allen, "Statistics for Economists", 1951)

"When evaluating the reliability and generality of data, it is often important to know the aims of the experimenter. When evaluating the importance of experimental results, however, science has a trick of disregarding the experimenter's rationale and finding a more appropriate context for the data than the one he proposed." (Murray Sidman, "Tactics of Scientific Research", 1960)

"Philosophers of science have repeatedly demonstrated that more than one theoretical construction can always be placed upon a given collection of data." (Thomas Kuhn, "The Structure of Scientific Revolutions", 1962) 

"Modern science is characterized by its ever-increasing specialization, necessitated by the enormous amount of data, the complexity of techniques and of theoretical structures within every field. Thus science is split into innumerable disciplines continually generating new subdisciplines. In consequence, the physicist, the biologist, the psychologist and the social scientist are, so to speak, encapusulated in their private universes, and it is difficult to get word from one cocoon to the other." (Ludwig von Bertalanffy, "General System Theory", 1968)

"At root what is needed for scientific inquiry is just receptivity to data, skill in reasoning, and yearning for truth. Admittedly, ingenuity can help too." (Willard v O Quine, "The Web of Belief", 1970)

"Statistical methods of analysis are intended to aid the interpretation of data that are subject to appreciable haphazard variability." (David V. Hinkley & David Cox, "Theoretical Statistics", 1974)

"In a way, science might be described as paranoid thinking applied to Nature: we are looking for natural conspiracies, for connections among apparently disparate data." (Carl Sagan, "The Dragons of Eden", 1977)

"If we gather more and more data and establish more and more associations, however, we will not finally find that we know something. We will simply end up having more and more data and larger sets of correlations." (Kenneth N Waltz, "Theory of International Politics Source: Theory of International Politics", 1979)

"There is a tendency to mistake data for wisdom, just as there has always been a tendency to confuse logic with values, intelligence with insight. Unobstructed access to facts can produce unlimited good only if it is matched by the desire and ability to find out what they mean and where they lead." (Norman Cousins, "Human Options : An Autobiographical Notebook", 1981) 

"Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance […]" (George Greenstein, "Frozen Star", 1983)

"Data is raw. It simply exists and has no significance beyond its existence (in and of itself). It can exist in any form, usable or not. It does not have meaning of itself. In computer parlance, a spreadsheet generally starts out by holding data." (Russell L Ackoff, "Towards a Systems Theory of Organization, 1985)

"Information is data that has been given meaning by way of relational connection. This "meaning" can be useful, but does not have to be. In computer parlance, a relational database makes information from the data stored within it." (Russell L Ackoff, "Towards a Systems Theory of Organization", 1985)

"Intuition becomes increasingly valuable in the new information society precisely because there is so much data." (John Naisbitt, "Re-Inventing the Corporation", 1988)

"The unit of coding is the most basic segment, or element, of the raw data or information that can be assessed in a meaningful way regarding the phenomenon." (Richard Boyatzis, "Transforming qualitative information", 1998)

"Data are collected as a basis for action. Yet before anyone can use data as a basis for action the data have to be interpreted. The proper interpretation of data will require that the data be presented in context, and that the analysis technique used will filter out the noise."  (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"While all data contain noise, some data contain signals. Before you can detect a signal, you must filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[…] you simply cannot make sense of any number without a contextual basis. Yet the traditional attempts to provide this contextual basis are often flawed in their execution. [...] Data have no meaning apart from their context. Data presented without a context are effectively rendered meaningless.(Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"The more data we have, the more likely we are to drown in it." (Nassim N Taleb, "Fooled by Randomness", 2001)

"Data is a fact of life. As time goes by, we collect more and more data, making our original reason for collecting the data harder to accomplish. We don't collect data just to waste time or keep busy; we collect data so that we can gain knowledge, which can be used to improve the efficiency of our organization, improve profit margins, and on and on. The problem is that as we collect more data, it becomes harder for us to use the data to derive this knowledge. We are being suffocated by this raw data, yet we need to find a way to use it." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage.". (Margaret Y Chu, "Blissful Data", 2004)

"Perception requires imagination because the data people encounter in their lives are never complete and always equivocal. [...] We also use our imagination and take shortcuts to fill gaps in patterns of nonvisual data. As with visual input, we draw conclusions and make judgments based on uncertain and incomplete information, and we conclude, when we are done analyzing the patterns, that out picture is clear and accurate. But is it?" (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"By giving data back to the user, you can create both engagement and revenue. We’re far enough into the data game that most users have realized that they’re not the customer, they’re the product. Their role in the system is to generate data, either to assist in ad targeting or to be sold to the highest bidder, or both." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Generalizing beyond advertising, when building any data product in which the data is obfuscated (where there isn’t a clear relationship between the user and the result), you can compromise on precision, but not on recall. But when the data is exposed, focus on high precision." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The value of having numbers - data - is that they aren't subject to someone else's interpretation. They are just the numbers. You can decide what they mean for you." (Emily Oster, "Expecting Better", 2013)

"A study that leaves out data is waving a big red flag. A decision to include or exclude data sometimes makes all the difference in the world. This decision should be based on the relevance and quality of the data, not on whether the data support or undermine a conclusion that is expected or desired." (Gary Smith, "Standard Deviations", 2014)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"Data without theory can fuel a speculative stock market bubble or create the illusion of a bubble where there is none. How do we tell the difference between a real bubble and a false alarm? You know the answer: we need a theory. Data are not enough. […] Data without theory is alluring, but misleading." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"We naturally draw conclusions from what we see […]. We should also think about what we do not see […]. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward." (Gary Smith, "Standard Deviations", 2014)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context."  (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"To find signals in data, we must learn to reduce the noise - not just the noise that resides in the data, but also the noise that resides in us. It is nearly impossible for noisy minds to perceive anything but noise in data. […] Signals always point to something. In this sense, a signal is not a thing but a relationship. Data becomes useful knowledge of something that matters when it builds a bridge between a question and an answer. This connection is the signal." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see." (Dhanurjay Patil, "A Code of Ethics for Data Science", 2016)

"The first epistemic principle to embrace is that there is always a gap between our data and the real world. We fall headfirst into a pitfall when we forget that this gap exists, that our data isn't a perfect reflection of the real-world phenomena it's representing. Do people really fail to remember this? It sounds so basic. How could anyone fall into such an obvious trap?" (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"The way we explore data today, we often aren't constrained by rigid hypothesis testing or statistical rigor that can slow down the process to a crawl. But we need to be careful with this rapid pace of exploration, too. Modern business intelligence and analytics tools allow us to do so much with data so quickly that it can be easy to fall into a pitfall by creating a chart that misleads us in the early stages of the process." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"In general, the more complex the data, the more the analyst has to make prior inferences of what is considered normal for modeling purposes." (Charu C Aggarwal, "Artificial Intelligence: A Textbook", 2021)

"Understanding the entire data ecosystem, from the production of a data point to its consumption in a dashboard or a visualization, provides the ability to invoke action, which is more valuable than the mere sum of its parts." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Data has historically been treated as a second-class citizen, as a form of exhaust or by-product emitted by business applications. This application-first thinking remains the major source of problems in today’s computing environments, leading to ad hoc data pipelines, cobbled together data access mechanisms, and inconsistent sources of similar-yet-different truths. Data mesh addresses these shortcomings head-on, by fundamentally altering the relationships we have with our data. Instead of a secondary by-product, data, and the access to it, is promoted to a first-class citizen on par with any other business service." (Adam Bellemare,"Building an Event-Driven Data Mesh: Patterns for Designing and Building Event-Driven Architectures", 2023)

"'Let the data speak'" is a catchy and powerful slogan, but [...] data itself is not always enough. It’s worth remembering that in many cases 'data cannot speak for themselves' and we might need more information than just observations to address some of our questions." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"Data are most valuable at their point of origin. The value of data is directly related to their timeliness." (Lawrence M Miller)

"Too little attention is given to the need for statistical control, or to put it more pertinently, since statistical control (randomness) is so rarely found, too little attention is given to the interpretation of data that arise from conditions not in statistical control." (William E Deming)

More quotes on "Data" at the-web-of-knowledge.blogspot.com.
