19 December 2018

🔭Data Science: Sampling (Just the Quotes)

"By a small sample we may judge of the whole piece." (Miguel de Cervantes, "Don Quixote de la Mancha", 1605–1615)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." (William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"The postulate of randomness thus resolves itself into the question, 'of what population is this a random sample?' which must frequently be asked by every practical statistician." (Ronald Fisher, "On the Mathematical Foundation of Theoretical Statistics", Philosophical Transactions of the Royal Society of London Vol. A222, 1922)

"The principle underlying sampling is that a set of objects taken at random from a larger group tends to reproduce the characteristics of that larger group: this is called the Law of Statistical Regularity. There are exceptions to this rule, and a certain amount of judgment must be exercised, especially when there are a few abnormally large items in the larger group. With erratic data, the accuracy of sampling can often be tested by comparing several samples. On the whole, the larger the sample the more closely will it tend to resemble the population from which it is taken; too small a sample would not give reliable results." (Lewis R Connor, "Statistics in Theory and Practice", 1932)

"If the chance of error alone were the sole basis for evaluating methods of inference, we would never reach a decision, but would merely keep increasing the sample size indefinitely." (C West Churchman, "Theory of Experimental Inference", 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper?", Journal of the American Statistical Association 44, 1949)

"A good sample-design is lost if it is not carried out according to plans." (W Edwards Deming, "Some Theory of Sampling", 1950)

"Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability." (William E Deming, "Some Theory of Sampling", 1950)

"Almost any sort of inquiry that is general and not particular involves both sampling and measurement […]. Further, both the measurement and the sampling will be imperfect in almost every case. We can define away either imperfection in certain cases. But the resulting appearance of perfection is usually only an illusion." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"By sampling we can learn only about collective properties of populations, not about properties of individuals. We can study the average height, the percentage who wear hats, or the variability in weight of college juniors [...]. The population we study may be small or large, but there must be a population - and what we are studying must be a population characteristic. By sampling, we cannot study individuals as particular entities with unique idiosyncrasies; we can study regularities (including typical variabilities as well as typical levels) in a population as exemplified by the individuals in the sample." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"In many cases general probability samples can be thought of in terms of (1) a subdivision of the population into strata, (2) a self-weighting probability sample in each stratum, and (3) combination of the stratum sample means weighted by the size of the stratum." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"Precision is expressed by an international standard, viz., the standard error. It measures the average of the difference between a complete coverage and a long series of estimates formed from samples drawn from this complete coverage by a particular procedure or drawing, and processed by a particular estimating formula." (W Edwards Deming, "On the Presentation of the Results of Sample Surveys as Legal Evidence", Journal of the American Statistical Association Vol 49 (268), 1954)

"The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it." (Darell Huff, "How to Lie with Statistics", 1954)

"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Darell Huff, "How to Lie with Statistics", 1954)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"[A] sequence is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution." (Joel N Franklin, 1962)

"Weighing a sample appropriately is no more fudging the data than is correcting a gas volume for barometric pressure." (Frederick Mosteller, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1964)

"[...] a priori reasons for believing that the null hypothesis is generally false anyway. One of the common experiences of research workers is the very high frequency with which significant results are obtained with large samples." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)

"Entropy theory is indeed a first attempt to deal with global form; but it has not been dealing with structure. All it says is that a large sum of elements may have properties not found in a smaller sample of them." (Rudolf Arnheim, "Entropy and Art: An Essay on Disorder and Order", 1974) 

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...]For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"Since a point hypothesis is not to be expected in practice to be exactly true, but only approximate, a proper test of significance should almost always show significance for large enough samples. So the whole game of testing point hypotheses, power analysis notwithstanding, is but a mathematical game without empirical importance." (Louis Guttman, "The illogic of statistical inference for cumulative science", Applied Stochastic Models and Data Analysis, 1985)

"The law of truly large numbers states: With a large enough sample, any outrageous thing is likely to happen." (Frederick Mosteller, "Methods for Studying Coincidences", Journal of the American Statistical Association Vol. 84, 1989)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen,"Things I Have Learned (So Far)", American Psychologist, 1990)

"When looking at the end result of any statistical analysis, one must be very cautious not to over interpret the data. Care must be taken to know the size of the sample, and to be certain the method forg athering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"Forget 'large-sample' methods. In the real world of experiments samples are so nearly always 'small' that it is not worth making any distinction, and small-sample methods are no harder to apply." (George Dyke, "How to avoid bad statistics", 1997)

"The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it." (Howard Wainer, "Improving Tabular Displays, With NAEP Tables as Examples and Inspirations", Journal of Educational and Behavioral Statistics Vol 22 (1), 1997)

"When the sample size is small or the study is of one organization, descriptive use of the thematic coding is desirable." (Richard Boyatzis, "Transforming qualitative information", 1998)

"Statisticians can calculate the probability that such random samples represent the population; this is usually expressed in terms of sampling error [...]. The real problem is that few samples are random. Even when researchers know the nature of the population, it can be time-consuming and expensive to draw a random sample; all too often, it is impossible to draw a true random sample because the population cannot be defined. This is particularly true for studies of social problems.[...] The best samples are those that come as close as possible to being random." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"There are two problems with sampling - one obvious, and  the other more subtle. The obvious problem is sample size. Samples tend to be much smaller than their populations. [...] Obviously, it is possible to question results based on small samples. The smaller the sample, the less confidence we have that the sample accurately reflects the population. However, large samples aren't necessarily good samples. This leads to the second issue: the representativeness of a sample is actually far more important than sample size. A good sample accurately reflects (or 'represents') the population." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"I have always thought that statistical design and sampling from populations should be the first courses taught, but all elementary courses I know of start with statistical methods or probability. To me, this is putting the cart before the horse!" (Walter Federer, "A Conversation with Walter T Federer", Statistical Science Vol 20, 2005)

"It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model - even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)

"Be careful not to confuse clustering and stratification. Even though both of these sampling strategies involve dividing the population into subgroups, both the way in which the subgroups are sampled and the optimal strategy for creating the subgroups are different. In stratified sampling, we sample from every stratum, whereas in cluster sampling, we include only selected whole clusters in the sample. Because of this difference, to increase the chance of obtaining a sample that is representative of the population, we want to create homogeneous groups for strata and heterogeneous (reflecting the variability in the population) groups for clusters." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Bias in sampling is the tendency for samples to differ from the corresponding population in some systematic way. Bias can result from the way in which the sample is selected or from the way in which information is obtained once the sample has been chosen. The most common types of bias encountered in sampling situations are selection bias, measurement or response bias, and nonresponse bias." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The central limit theorem is often used to justify the assumption of normality when using the sample mean and the sample standard deviation. But it is inevitable that real data contain gross errors. Five to ten percent unusual values in a dataset seem to be the rule rather than the exception. The distribution of such data is no longer Normal." (A S Hedayat and Guoqin Su, "Robustness of the Simultaneous Estimators of Location and Scale From Approximating a Histogram by a Normal Density Curve", The American Statistician 66, 2012)

"The goal of random sampling is to produce a sample that is likely to be representative of the population. Although random sampling does not guarantee that the sample will be representative, it does allow us to assess the risk of an unrepresentative sample. It is the ability to quantify this risk that will enable us to generalize with confidence from a random sample to the corresponding population." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question)." (Greg Snow, "R-Help", 2014)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"If you study one group and assume that your results apply to other groups, this is extrapolation. If you think you are studying one group, but do not manage to obtain a representative sample of that group, this is a different problem. It is a problem so important in statistics that it has a special name: selection bias. Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"There are many ways for error to creep into facts and figures that seem entirely straightforward. Quantities can be miscounted. Small samples can fail to accurately reflect the properties of the whole population. Procedures used to infer quantities from other information can be faulty. And then, of course, numbers can be total bullshit, fabricated out of whole cloth in an effort to confer credibility on an otherwise flimsy argument. We need to keep all of these things in mind when we look at quantitative claims. They say the data never lie - but we need to remember that the data often mislead." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"While the main emphasis in the development of power analysis has been to provide methods for assessing and increasing power, it should also be noted that it is possible to have too much power. If your sample is too large, nearly any difference, no matter how small or meaningless from a practical standpoint, will be ‘statistically significant’." (Clay Helberg) 

"The old rule of trusting the Central Limit Theorem if the sample size is larger than 30 is just that–old. Bootstrap and permutation testing let us more easily do inferences for a wider variety of statistics." (Tim Hesterberg)

More quotes on "Sampling" at the-web-of-knowledge.blogspot.com.

18 December 2018

🔭Data Science: Context (Just the Quotes)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"When evaluating the reliability and generality of data, it is often important to know the aims of the experimenter. When evaluating the importance of experimental results, however, science has a trick of disregarding the experimenter's rationale and finding a more appropriate context for the data than the one he proposed." (Murray Sidman, "Tactics of Scientific Research", 1960)

"Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance […]" (George Greenstein, "Frozen Star" , 1983)

"Graphics must not quote data out of context." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The problem solver needs to stand back and examine problem contexts in the light of different 'Ws' (Weltanschauungen). Perhaps he can then decide which 'W' seems to capture the essence of the particular problem context he is faced with. This whole process needs formalizing if it is to be carried out successfully. The problem solver needs to be aware of different paradigms in the social sciences, and he must be prepared to view the problem context through each of these paradigms." (Michael C Jackson, "Towards a System of Systems Methodologies", 1984)

"It is commonly said that a pattern, however it is written, has four essential parts: a statement of the context where the pattern is useful, the problem that the pattern addresses, the forces that play in forming a solution, and the solution that resolves those forces. [...] it supports the definition of a pattern as 'a solution to a problem in a context', a definition that [unfortunately] fixes the bounds of the pattern to a single problem-solution pair." (Martin Fowler, "Analysis Patterns: Reusable Object Models", 1997)

"We do not learn much from looking at a model - we learn more from building the model and manipulating it. Just as one needs to use or observe the use of a hammer in order to really understand its function, similarly, models have to be used before they will give up their secrets. In this sense, they have the quality of a technology - the power of the model only becomes apparent in the context of its use." (Margaret Morrison & Mary S Morgan, "Models as mediating instruments", 1999)

"Data are collected as a basis for action. Yet before anyone can use data as a basis for action the data have to be interpreted. The proper interpretation of data will require that the data be presented in context, and that the analysis technique used will filter out the noise."  (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"[…] you simply cannot make sense of any number without a contextual basis. Yet the traditional attempts to provide this contextual basis are often flawed in their execution. [...] Data have no meaning apart from their context. Data presented without a context are effectively rendered meaningless." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"All scientific theories, even those in the physical sciences, are developed in a particular cultural context. Although the context may help to explain the persistence of a theory in the face of apparently falsifying evidence, the fact that a theory arises from a particular context is not sufficient to condemn it. Theories and paradigms must be accepted, modified or rejected on the basis of evidence." (Richard P Bentall,  "Madness Explained: Psychosis and Human Nature", 2003)

"Mathematical modeling is as much ‘art’ as ‘science’: it requires the practitioner to (i) identify a so-called ‘real world’ problem (whatever the context may be); (ii) formulate it in mathematical terms (the ‘word problem’ so beloved of undergraduates); (iii) solve the problem thus formulated (if possible; perhaps approximate solutions will suffice, especially if the complete problem is intractable); and (iv) interpret the solution in the context of the original problem." (John A Adam, "Mathematics in Nature", 2003)

"Context is not as simple as being in a different space [...] context includes elements like our emotions, recent experiences, beliefs, and the surrounding environment - each element possesses attributes, that when considered in a certain light, informs what is possible in the discussion." (George Siemens, "Knowing Knowledge", 2006)

"Statistics can certainly pronounce a fact, but they cannot explain it without an underlying context, or theory. Numbers have an unfortunate tendency to supersede other types of knowing. […] Numbers give the illusion of presenting more truth and precision than they are capable of providing." (Ronald J Baker, "Measure what Matters to Customers: Using Key Predictive Indicators", 2006)

"A valid digit is not necessarily a significant digit. The significance of numbers is a result of its scientific context." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[… ] statistics is about understanding the role that variability plays in drawing conclusions based on data. […] Statistics is not about numbers; it is about data - numbers in context. It is the context that makes a problem meaningful and something worth considering." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Context (information that lends to better understanding the who, what, when, where, and why of your data) can make the data clearer for readers and point them in the right direction. At the least, it can remind you what a graph is about when you come back to it a few months later. […] Context helps readers relate to and understand the data in a visualization better. It provides a sense of scale and strengthens the connection between abstract geometry and colors to the real world." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Without context, data is useless, and any visualization you create with it will also be useless. Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote secondhand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later that the speaker meant the opposite of what you thought." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Statistics are meaningless unless they exist in some context. One reason why the indicators have become more central and potent over time is that the longer they have been kept, the easier it is to find useful patterns and points of reference." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Infographics combine art and science to produce something that is not unlike a dashboard. The main difference from a dashboard is the subjective data and the narrative or story, which enhances the data-driven visual and engages the audience quickly through highlighting the required context." (Travis Murphy, "Infographics Powered by SAS®: Data Visualization Techniques for Business Reporting", 2018)

"For numbers to be transparent, they must be placed in an appropriate context. Numbers must presented in a way that allows for fair comparisons." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Without knowing the source and context, a particular statistic is worth little. Yet numbers and statistics appear rigorous and reliable simply by virtue of being quantitative, and have a tendency to spread." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

More quotes on "Context" at the-web-of-knowledge.blogspot.com


🔭Data Science: Discovery (Just the Quotes)

"A good method of discovery is to imagine certain members of a system removed and then see how what is left would behave: for example, where would we be if iron were absent from the world: this is an old example." (Georg C Lichtenberg, Notebook J, 1789-1793)

"Every science has for its basis a system of principles as fixed and unalterable as those by which the universe is regulated and governed. Man cannot make principles; he can only discover them." (Thomas Paine, "The Age of Reason", 1794)

"We learn wisdom from failure much more than from success. We often discover what will do, by finding out what will not do; and probably he who never made a mistake never made a discovery." (Samuel Smiles, "Facilities and Difficulties", 1859)

"The process of discovery is very simple. An unwearied and systematic application of known laws to nature, causes the unknown to reveal themselves. Almost any mode of observation will be successful at last, for what is most wanted is method." (Henry Thoreau, "A Week on the Concord and Merrimack Rivers", 1862)

"Every process has laws, known or unknown, according to which it must take place. A consciousness of them is so far from being necessary to the process, that we cannot discover what they are, except by analyzing the results it has left us." (Lord William T Kelvin , "An Outline of the Necessary Laws of Thought", 1866)

"Accurate and minute measurement seems to the nonscientific imagination a less lofty and dignified work than looking for something new. But nearly all the grandest discoveries of science have been but the rewards of accurate measurement and patient long contained labor in the minute sifting of numerical results." (William T Kelvin, "Report of the British Association For the Advancement of Science" Vol. 41, 1871)

"Modern discoveries have not been made by large collections of facts, with subsequent discussion, separation, and resulting deduction of a truth thus rendered perceptible. A few facts have suggested an hypothesis, which means a supposition, proper to explain them. The necessary results of this supposition are worked out, and then, and not till then, other facts are examined to see if their ulterior results are found in Nature." (Augustus de Morgan, "A Budget of Paradoxes", 1872)

"Science arises from the discovery of Identity amid Diversity." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"It would be an error to suppose that the great discoverer seizes at once upon the truth, or has any unerring method of divining it. In all probability the errors of the great mind exceed in number those of the less vigorous one. Fertility of imagination and abundance of guesses at truth are among the first requisites of discovery; but the erroneous guesses must be many times as numerous as those that prove well founded. The weakest analogies, the most whimsical notions, the most apparently absurd theories, may pass through the teeming brain, and no record remain of more than the hundredth part. […] The truest theories involve suppositions which are inconceivable, and no limit can really be placed to the freedom of hypotheses." (W Stanley Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1877)

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"All great scientists have, in a certain sense, been great artists; the man with no imagination may collect facts, but he cannot make great discoveries." (Karl Pearson, "The Grammar of Science", 1892)

"The folly of mistaking a paradox for a discovery, a metaphor for a proof, a torrent of verbiage for a spring of capital truths, and oneself for an oracle, is inborn in us." (Paul Valéry, "Introduction to the Method of Leonardo da Vinci", 1895)

"If we study the history of science we see happen two inverse phenomena […] Sometimes simplicity hides under complex appearances; sometimes it is the simplicity which is apparent, and which disguises extremely complicated realities. […] No doubt, if our means of investigation should become more and more penetrating, we should discover the simple under the complex, then the complex under the simple, then again the simple under the complex, and so on, without our being able to foresee what will be the last term. We must stop somewhere, and that science may be possible, we must stop when we have found simplicity. This is the only ground on which we can rear the edifice of our generalizations." (Henri Poincaré, "Science and Hypothesis", 1901)

"The only true voyage of discovery […] would be not to visit new landscapes, but to possess other eyes, to see the universe through the eyes of another, of a hundred others, to see the hundred universes that each of them sees." (Marcel Proust, "À la recherche du temps perdu", 1913)

"To come very near to a true theory, and to grasp its precise application, are two very different things, as the history of a science teaches us. Everything of importance has been said before by somebody who did not discover it." (Alfred N Whitehead, "The Organization of Thought", 1917)

"Great scientific discoveries have been made by men seeking to verify quite erroneous theories about the nature of things." (Aldous L Huxley, "Life and Letters and the London Mercury" Vol. 1, 1928)

"The art of discovery is confused with the logic of proof and an artificial simplification of the deeper movements of thought results. We forget that we invent by intuition though we prove by logic." (Sarvepalli Radhakrishnan, "An Idealist View of Life", 1929)

"Science is the attempt to discover, by means of observation, and reasoning based upon it, first, particular facts about the world, and then laws connecting facts with one another and (in fortunate cases) making it possible to predict future occurrences." (Bertrand Russell, "Religion and Science, Grounds of Conflict", 1935) 

"A great discovery solves a great problem but there is a grain of discovery in the solution of any problem. Your problem may be modest; but if it challenges your curiosity and brings into play your inventive faculties, and if you solve it by your own means, you may experience the tension and enjoy the triumph of discovery." (George Polya, "How to solve it", 1944)

"It is always more easy to discover and proclaim general principles than it is to apply them." (Winston Churchill, "The Second World War: The gathering storm", 1948)

"Scientific discovery consists in the interpretation for our own convenience of a system of existence which has been made with no eye to our convenience at all." (Norbert Wiener, "The Human Use of Human Beings", 1949)

"The scientist who discovers a theory is usually guided to his discovery by guesses; he cannot name a method by means of which he found the theory and can only say that it appeared plausible to him, that he had the right hunch or that he saw intuitively which assumption would fit the facts." (Hans Reichenbach, "The Rise of Scientific Philosophy", 1951)

"It is important for him who wants to discover not to confine himself to one chapter of science, but to keep in touch with various others." (Jacques S Hadamard, "An Essay on the Psychology of Invention in the Mathematical Field", 1954)

"The true aim of science is to discover a simple theory which is necessary and sufficient to cover the facts, when they have been purified of traditional prejudices." (Lancelot L Whyte, "Accent on Form", 1954)

"There comes a point where the mind takes a leap - call it intuition or what you will - and comes out upon a higher plane of knowledge, but can never prove how it got there. All great discoveries have involved such a leap." (Albert Einstein, [interview in Life, "Death of a Genius"] 1955)

"Discovery follows discovery, each both raising and answering questions, each ending a long search, and each providing the new instruments for a new search." (J Robert Oppenheimer, "Prospects in the Arts and Sciences", 1964)

"Discovery always carries an honorific connotation. It is the stamp of approval on a finding of lasting value. Many laws and theories have come and gone in the history of science, but they are not spoken of as discoveries. […] Theories are especially precarious, as this century profoundly testifies. World views can and do often change. Despite these difficulties, it is still true that to count as a discovery a finding must be of at least relatively permanent value, as shown by its inclusion in the generally accepted body of scientific knowledge." (Richard J. Blackwell, "Discovery in the Physical Sciences", 1969)

"A discovery must be, by definition, at variance with existing knowledge." (Albert Szent-Gyorgyi, "Dionysians and Apollonians", Science 176, 1972)

"It is one of our most exciting discoveries that local discovery leads to a complex of further discoveries. Corollary to this we find that we no sooner get a problem solved than we are overwhelmed with a multiplicity of additional problems in a most beautiful payoff of heretofore unknown, previously unrecognized, and as-yet unsolved problems." (Buckminster Fuller, "Synergetics: Explorations in the Geometry of Thinking", 1975)

"You cannot learn, through common sense, how things are you can only discover where they fit into the existing scheme of things." (Stuart Hall, 1977)

"Every discovery, every enlargement of the understanding, begins as an imaginative preconception of what the truth might be. The imaginative preconception - a ‘hypothesis’ - arises by a process as easy or as difficult to understand as any other creative act of mind; it is a brainwave, an inspired guess, a product of a blaze of insight. It comes anyway from within and cannot be achieved by the exercise of any known calculus of discovery. " (Sir Peter B Medawar, "Advice to a Young Scientist", 1979)

"Metaphors can have profound significance because, as images or figures, they allow the mind to grasp or discover unsuspected ideal and material relationships between objects." (Giuseppe Del Re, "Cosmic Dance", 1999)

"Alternative models are neither right nor wrong, just more or less useful in allowing us to operate in the world and discover more and better options for solving problems." (Andrew Weil," The Natural Mind: A Revolutionary Approach to the Drug Problem", 2004)

"We tackle a multifaceted universe one face at a time, tailoring our models and equations to fit the facts at hand. Whatever mechanical conception proves appropriate, that is the one to use. Discovering worlds within worlds, a practical observer will deal with each realm on its own terms. It is the only sensible approach to take." (Michael Munowitz, "Knowing: The Nature of Physical Law", 2005)

"Equations seem like treasures, spotted in the rough by some discerning individual, plucked and examined, placed in the grand storehouse of knowledge, passed on from generation to generation. This is so convenient a way to present scientific discovery, and so useful for textbooks, that it can be called the treasure-hunt picture of knowledge." (Robert P Crease, "The Great Equations", 2009)

"Models do not only describe reality, they are also instruments for exploring reality. They are not only involved in the integration of known data, but also in the discovery of new data." (Andreas Bartels, "The Standard Model of Cosmology as a Tool for Interpretation and Discovery", 2013)

More quotes on "Discovery" at the-web-of-knowledge.blogspot.com.

🔭Data Science: Problem Solving (Just the Quotes)

"Reflexion is careful and laborious thought, and watchful attention directed to the agreeable effect of one's plan. Invention, on the other hand, is the solving of intricate problems and the discovery of new principles by means of brilliancy and versatility." (Marcus Vitruvius Pollio, "De architectura" ["On Architecture], cca. 15BC)

"The insights gained and garnered by the mind in its wanderings among basic concepts are benefits that theory can provide. Theory cannot equip the mind with formulas for solving problems, nor can it mark the narrow path on which the sole solution is supposed to lie by planting a hedge of principles on either side. But it can give the mind insight into the great mass of phenomena and of their relationships, then leave it free to rise into the higher realms of action." (Carl von Clausewitz, "On War", 1832)

"The correct solution to any problem depends principally on a true understanding of what the problem is." (Arthur M Wellington, "The Economic Theory of Railway Location", 1887)

"He who seeks for methods without having a definite problem in mind seeks for the most part in vain." (David Hilbert, 1902)

"This diagrammatic method has, however, serious inconveniences as a method for solving logical problems. It does not show how the data are exhibited by cancelling certain constituents, nor does it show how to combine the remaining constituents so as to obtain the consequences sought. In short, it serves only to exhibit one single step in the argument, namely the equation of the problem; it dispenses neither with the previous steps, i.e., 'throwing of the problem into an equation' and the transformation of the premises, nor with the subsequent steps, i.e., the combinations that lead to the various consequences. Hence it is of very little use, inasmuch as the constituents can be represented by algebraic symbols quite as well as by plane regions, and are much easier to deal with in this form." (Louis Couturat, "The Algebra of Logic", 1914)

"A great discovery solves a great problem but there is a grain of discovery in the solution of any problem. Your problem may be modest; but if it challenges your curiosity and brings into play your inventive faculties, and if you solve it by your own means, you may experience the tension and enjoy the triumph of discovery." (George Polya, "How to solve it", 1944)

"Success in solving the problem depends on choosing the right aspect, on attacking the fortress from its accessible side." (George Polya, "How to Solve It", 1944)

"[The] function of thinking is not just solving an actual problem but discovering, envisaging, going into deeper questions. Often, in great discovery the most important thing is that a certain question is found." (Max Wertheimer, "Productive Thinking", 1945)

"We can scarcely imagine a problem absolutely new, unlike and unrelated to any formerly solved problem; but if such a problem could exist, it would be insoluble. In fact, when solving a problem, we should always profit from previously solved problems, using their result or their method, or the experience acquired in solving them." (George Polya, 1945)

"I believe, that the decisive idea which brings the solution of a problem is rather often connected with a well-turned word or sentence. The word or the sentence enlightens the situation, gives things, as you say, a physiognomy. It can precede by little the decisive idea or follow on it immediately; perhaps, it arises at the same time as the decisive idea. […]  The right word, the subtly appropriate word, helps us to recall the mathematical idea, perhaps less completely and less objectively than a diagram or a mathematical notation, but in an analogous way. […] It may contribute to fix it in the mind." (George Pólya [in a letter to Jaque Hadamard, "The Psychology of Invention in the Mathematical Field", 1949])

"The problems are solved, not by giving new information, but by arranging what we have known since long." (Ludwig Wittgenstein, "Philosophical Investigations", 1953)

"Solving problems is the specific achievement of intelligence." (George Pólya, 1957)

"Systems engineering embraces every scientific and technical concept known, including economics, management, operations, maintenance, etc. It is the job of integrating an entire problem or problem to arrive at one overall answer, and the breaking down of this answer into defined units which are selected to function compatibly to achieve the specified objectives." (Instrumentation Technology, 1957)

"A problem that is located and identified is already half solved!" (Bror R Carlson, "Managing for Profit", 1961)

"If we view organizations as adaptive, problem-solving structures, then inferences about effectiveness have to be made, not from static measures of output, but on the basis of the processes through which the organization approaches problems. In other words, no single measurement of organizational efficiency or satisfaction - no single time-slice of organizational performance can provide valid indicators of organizational health." (Warren G Bennis, "General Systems Yearbook", 1962)

"Solving problems can be regarded as the most characteristically human activity." (George Pólya, "Mathematical Discovery", 1962)

"The final test of a theory is its capacity to solve the problems which originated it." (George Dantzig, "Linear Programming and Extensions", 1963)

"It is a commonplace of modern technology that there is a high measure of certainty that problems have solutions before there is knowledge of how they are to be solved." (John K Galbraith, "The New Industrial State", 1967)

"An expert problem solver must be endowed with two incompatible qualities, a restless imagination and a patient pertinacity.” (Howard W Eves, “In Mathematical Circles”, 1969)

"The problem-solving approach allows for mental double-clutching. It does not require a direct switch from one point of view to another. It provides a period 'in neutural' where there is an openness to facts and, therefore, a willingness to consider alternative views." (William Reddin, "Managerial Effectiveness", 1970)

"In general, complexity and precision bear an inverse relation to one another in the sense that, as the complexity of a problem increases, the possibility of analysing it in precise terms diminishes. Thus 'fuzzy thinking' may not be deplorable, after all, if it makes possible the solution of problems which are much too complex for precise analysis." (Lotfi A Zadeh, "Fuzzy languages and their relation to human intelligence", 1972)

"If we deal with our problem not knowing, or pretending not to know the general theory encompassing the concrete case before us, if we tackle the problem "with bare hands", we have a better chance to understand the scientist's attitude in general, and especially the task of the applied mathematician." (George Pólya, "Mathematical Methods in Science", 1977)

"Systems represent someone's attempt at solution to problems, but they do not solve problems; they produce complicated responses." (Melvin J Sykes, Maryland Law Review, 1978)

“Solving problems can be regarded as the most characteristically human activity.” (George Polya, 1981)

"The problem solver needs to stand back and examine problem contexts in the light of different 'Ws' (Weltanschauungen). Perhaps he can then decide which 'W' seems to capture the essence of the particular problem context he is faced with. This whole process needs formalizing if it is to be carried out successfully. The problem solver needs to be aware of different paradigms in the social sciences, and he must be prepared to view the problem context through each of these paradigms." (Michael C Jackson, "Towards a System of Systems Methodologies", 1984)

"People in general tend to assume that there is some 'right' way of solving problems. Formal logic, for example, is regarded as a correct approach to thinking, but thinking is always a compromise between the demands of comprehensiveness, speed, and accuracy. There is no best way of thinking." (James L McKenney & Peter G W Keen, Harvard Business Review on Human Relations, 1986)

"A great many problems are easier to solve rigorously if you know in advance what the answer is." (Ian Stewart, "From Here to Infinity", 1987)

"Define the problem before you pursue a solution." (John Williams, Inc. Magazine's Guide to Small Business Success, 1987)

"No matter how complicated a problem is, it usually can be reduced to a simple, comprehensible form which is often the best solution." (Dr. An Wang, Nation's Business, 1987)

"There are many things you can do with problems besides solving them. First you must define them, pose them. But then of course you can also refi ne them, depose them, or expose them or even dissolve them! A given problem may send you looking for analogies, and some of these may lead you astray, suggesting new and different problems, related or not to the original. Ends and means can get reversed. You had a goal, but the means you found didn’t lead to it, so you found a new goal they did lead to. It’s called play. Creative mathematicians play a lot; around any problem really interesting they develop a whole cluster of analogies, of playthings." (David Hawkins, "The Spirit of Play", Los Alamos Science, 1987)

"A scientific problem can be illuminated by the discovery of a profound analogy, and a mundane problem can be solved in a similar way." (Philip Johnson-Laird, "The Computer and the Mind", 1988)

"Anecdotes may be more useful than equations in understanding the problem." (Robert Kuttner, "The New Republic", The New York Times, 1988)

"Most people would rush ahead and implement a solution before they know what the problem is." (Q T Wiles, Inc. Magazine, 1988)

“A mental model is a knowledge structure that incorporates both declarative knowledge (e.g., device models) and procedural knowledge (e.g., procedures for determining distributions of voltages within a circuit), and a control structure that determines how the procedural and declarative knowledge are used in solving problems (e.g., mentally simulating the behavior of a circuit).” (Barbara Y White & John R Frederiksen, “Causal Model Progressions as a Foundation for Intelligent Learning Environments”, Artificial Intelligence 42, 1990)

"An important symptom of an emerging understanding is the capacity to represent a problem in a number of different ways and to approach its solution from varied vantage points; a single, rigid representation is unlikely to suffice." (Howard Gardner, “The Unschooled Mind”, 1991)

“[By understanding] I mean simply a sufficient grasp of concepts, principles, or skills so that one can bring them to bear on new problems and situations, deciding in which ways one’s present competencies can suffice and in which ways one may require new skills or knowledge.” (Howard Gardner, “The Unschooled Mind”, 1991)

"We consider the notion of ‘system’ as an organising concept, before going on to look in detail at various systemic metaphors that may be used as a basis for structuring thinking about organisations and problem situations." (Michael C Jackson, "Creative Problem Solving: Total Systems Intervention", 1991)

“But our ways of learning about the world are strongly influenced by the social preconceptions and biased modes of thinking that each scientist must apply to any problem. The stereotype of a fully rational and objective ‘scientific method’, with individual scientists as logical (and interchangeable) robots, is self-serving mythology.” (Stephen Jay Gould, “This View of Life: In the Mind of the Beholder”, Natural History Vol. 103 (2), 1994)

"The term mental model refers to knowledge structures utilized in the solving of problems. Mental models are causal and thus may be functionally defined in the sense that they allow a problem solver to engage in description, explanation, and prediction. Mental models may also be defined in a structural sense as consisting of objects, states that those objects exist in, and processes that are responsible for those objects’ changing states." (Robert Hafner & Jim Stewart, "Revising Explanatory Models to Accommodate Anomalous Genetic Phenomena: Problem Solving in the ‘Context of Discovery’", Science Education 79 (2), 1995)

"The purpose of a conceptual model is to provide a vocabulary of terms and concepts that can be used to describe problems and/or solutions of design. It is not the purpose of a model to address specific problems, and even less to propose solutions for them. Drawing an analogy with linguistics, a conceptual model is analogous to a language, while design patterns are analogous to rhetorical figures, which are predefined templates of language usages, suited particularly to specific problems." (Peter P Chen [Ed.], "Advances in Conceptual Modeling", 1999)

"The three basic mechanisms of averaging, feedback and division of labor give us a first idea of a how a CMM [Collective Mental Map] can be developed in the most efficient way, that is, how a given number of individuals can achieve a maximum of collective problem-solving competence. A collective mental map is developed basically by superposing a number of individual mental maps. There must be sufficient diversity among these individual maps to cover an as large as possible domain, yet sufficient redundancy so that the overlap between maps is large enough to make the resulting graph fully connected, and so that each preference in the map is the superposition of a number of individual preferences that is large enough to cancel out individual fluctuations. The best way to quickly expand and improve the map and fill in gaps is to use a positive feedback that encourages individuals to use high preference paths discovered by others, yet is not so strong that it discourages the exploration of new paths." (Francis Heylighen, "Collective Intelligence and its Implementation on the Web", 1999)

"What it means for a mental model to be a structural analog is that it embodies a representation of the spatial and temporal relations among, and the causal structures connecting the events and entities depicted and whatever other information that is relevant to the problem-solving talks. […] The essential points are that a mental model can be nonlinguistic in form and the mental mechanisms are such that they can satisfy the model-building and simulative constraints necessary for the activity of mental modeling." (Nancy J Nersessian, "Model-based reasoning in conceptual change", 1999)

"A model is an imitation of reality and a mathematical model is a particular form of representation. We should never forget this and get so distracted by the model that we forget the real application which is driving the modelling. In the process of model building we are translating our real world problem into an equivalent mathematical problem which we solve and then attempt to interpret. We do this to gain insight into the original real world situation or to use the model for control, optimization or possibly safety studies." (Ian T Cameron & Katalin Hangos, "Process Modelling and Model Analysis", 2001)

"[...] a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration." (Yu-Chi Ho & David L Pepyne, "Simple explanation of the no-free-lunch theorem and its implications", Journal of Optimization Theory and Applications 115, 2002)

"Mathematical modeling is as much ‘art’ as ‘science’: it requires the practitioner to (i) identify a so-called ‘real world’ problem (whatever the context may be); (ii) formulate it in mathematical terms (the ‘word problem’ so beloved of undergraduates); (iii) solve the problem thus formulated (if possible; perhaps approximate solutions will suffice, especially if the complete problem is intractable); and (iv) interpret the solution in the context of the original problem." (John A Adam, "Mathematics in Nature", 2003)

"What is a mathematical model? One basic answer is that it is the formulation in mathematical terms of the assumptions and their consequences believed to underlie a particular ‘real world’ problem. The aim of mathematical modeling is the practical application of mathematics to help unravel the underlying mechanisms involved in, for example, economic, physical, biological, or other systems and processes." (John A Adam, "Mathematics in Nature", 2003)

"Alternative models are neither right nor wrong, just more or less useful in allowing us to operate in the world and discover more and better options for solving problems." (Andrew Weil," The Natural Mind: A Revolutionary Approach to the Drug Problem", 2004)

“A conceptual model is a mental image of a system, its components, its interactions. It lays the foundation for more elaborate models, such as physical or numerical models. A conceptual model provides a framework in which to think about the workings of a system or about problem solving in general. An ensuing operational model can be no better than its underlying conceptualization.” (Henry N Pollack, “Uncertain Science … Uncertain World”, 2005)

"Graphics is the visual means of resolving logical problems." (Jacques Bertin, "Graphics and Graphic Information Processing", 2011)

"In specific cases, we think by applying mental rules, which are similar to rules in computer programs. In most of the cases, however, we reason by constructing, inspecting, and manipulating mental models. These models and the processes that manipulate them are the basis of our competence to reason. In general, it is believed that humans have the competence to perform such inferences error-free. Errors do occur, however, because reasoning performance is limited by capacities of the cognitive system, misunderstanding of the premises, ambiguity of problems, and motivational factors. Moreover, background knowledge can significantly influence our reasoning performance. This influence can either be facilitation or an impedance of the reasoning process." (Carsten Held et al, "Mental Models and the Mind", 2006)

"Every problem has a solution; it may sometimes just need another perspective.” (Rebecca Mallery et al, "NLP for Rookies", 2009)

"Mental acuity of any kind comes from solving problems yourself, not from being told how to solve them.” (Paul Lockhart, "A Mathematician's Lament", 2009)

"Mostly we rely on stories to put our ideas into context and give them meaning. It should be no surprise, then, that the human capacity for storytelling plays an important role in the intrinsically human-centered approach to problem solving, design thinking." (Tim Brown, "Change by Design: How Design Thinking Transforms Organizations and Inspires Innovation", 2009)

"Mental models are formed over time through a deep enculturation process, so it follows that any attempt to align mental models must focus heavily on collective sense making. Alignment only happens through a process of socialisation; people working together, solving problems together, making sense of the world together." (Robina Chatham & Brian Sutton, "Changing the IT Leader’s Mindset", 2010)

"Mathematical modeling is the application of mathematics to describe real-world problems and investigating important questions that arise from it." (Sandip Banerjee, "Mathematical Modeling: Models, Analysis and Applications", 2014)

"Mental imagery is often useful in problem solving. Verbal descriptions of problems can become confusing, and a mental image can clear away excessive detail to bring out important aspects of the problem. Imagery is most useful with problems that hinge on some spatial relationship. However, if the problem requires an unusual solution, mental imagery alone can be misleading, since it is difficult to change one’s understanding of a mental image. In many cases, it helps to draw a concrete picture since a picture can be turned around, played with, and reinterpreted, yielding new solutions in a way that a mental image cannot." (James Schindler, "Followership", 2014)

“Framing the right problem is equally or even more important than solving it.” (Pearl Zhu, “Change, Creativity and Problem-Solving”, 2017)

17 December 2018

🔭Data Science: Bias (Just the Quotes)

"The human mind can hardly remain entirely free from bias, and decisive opinions are often formed before a thorough examination of a subject from all its aspects has been made." (Helena P. Blavatsky, "The Secret Doctrine", 1888)

"The classification of facts, the recognition of their sequence and relative significance is the function of science, and the habit of forming a judgment upon these facts unbiased by personal feeling is characteristic of what may be termed the scientific frame of mind." (Karl Pearson, "The Grammar of Science", 1892)

"It may be impossible for human intelligence to comprehend absolute truth, but it is possible to observe Nature with an unbiased mind and to bear truthful testimony of things seen." (Sir Richard A Gregory, "Discovery, Or, The Spirit and Service of Science", 1916)

"Scientific discovery, or the formulation of scientific theory, starts in with the unvarnished and unembroidered evidence of the senses. It starts with simple observation - simple, unbiased, unprejudiced, naive, or innocent observation - and out of this sensory evidence, embodied in the form of simple propositions or declarations of fact, generalizations will grow up and take shape, almost as if some process of crystallization or condensation were taking place. Out of a disorderly array of facts, an orderly theory, an orderly general statement, will somehow emerge." (Sir Peter B Medawar, "Is the Scientific Paper Fraudulent?", The Saturday Review, 1964)

"Errors may also creep into the information transfer stage when the originator of the data is unconsciously looking for a particular result. Such situations may occur in interviews or questionnaires designed to gather original data. Improper wording of the question, or improper voice inflections. and other constructional errors may elicit nonobjective responses. Obviously, if the data is incorrectly gathered, any graph based on that data will contain the original error - even though the graph be most expertly designed and beautifully presented." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Numbers have undoubted powers to beguile and benumb, but critics must probe behind numbers to the character of arguments and the biases that motivate them." (Stephen J Gould, "An Urchin in the Storm: Essays About Books and Ideas", 1987)

"But our ways of learning about the world are strongly influenced by the social preconceptions and biased modes of thinking that each scientist must apply to any problem. The stereotype of a fully rational and objective ‘scientific method’, with individual scientists as logical (and interchangeable) robots, is self-serving mythology." (Stephen J Gould, "This View of Life: In the Mind of the Beholder", "Natural History", Vol. 103, No. 2, 1994)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this called underfitting. Too many covariates yields high variance; this called overfitting. Good predictions result from achieving a good balance between bias and variance. […] fiding a good model involves trading of fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Self-selection bias occurs when people choose to be in the data - for example, when people choose to go to college, marry, or have children. […] Self-selection bias is pervasive in 'observational data', where we collect data by observing what people do. Because these people chose to do what they are doing, their choices may reflect who they are. This self-selection bias could be avoided with a controlled experiment in which people are randomly assigned to groups and told what to do." (Gary Smith, "Standard Deviations", 2014)

"Self-selection bias occurs when we compare people who made different choices without thinking about why they made these choices. […] Our conclusions would be more convincing if choice was removed […]" (Gary Smith, "Standard Deviations", 2014)

"We naturally draw conclusions from what we see […]. We should also think about what we do not see […]. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward." (Gary Smith, "Standard Deviations", 2014)

"We live in a world with a surfeit of information at our service. It is our choice whether we seek out data that reinforce our biases or choose to look at the world in a critical, rational manner, and allow reality to bend our preconceptions. In the long run, the truth will work better for us than our cherished fictions." (Razib Khan, "The Abortion Stereotype", The New York Times, 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Bias occurs normally when the model is underfitted and has failed to learn enough from the training data. It is the difference between the mean of the probability distribution and the actual correct value. Hence, the accuracy of the model is different for different data sets (test and training sets). To reduce the bias error, data scientists repeat the model-building process by resampling the data to obtain better prediction values." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"High-bias models typically produce simpler models that do not overfit and in those cases the danger is that of underfitting. Models with low-bias are typically more complex and that complexity enables us to represent the training data in a more accurate way. The danger here is that the flexibility provided by higher complexity may end up representing not only a relationship in the data but also the noise. Another way of portraying the bias-variance trade-off is in terms of complexity v simplicity." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"If either bias or variance is high, the model can be very far off from reality. In general, there is a trade-off between bias and variance. The goal of any machine-learning algorithm is to achieve low bias and low variance such that it gives good prediction performance. In reality, because of so many other hidden parameters in the model, it is hard to calculate the real bias and variance error. Nevertheless, the bias and variance provide a measure to understand the behavior of the machine-learning algorithm so that the model model can be adjusted to provide good prediction performance." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"The human brain always concocts biases to aid in the construction of a coherent mental life, exclusively suitable for an individual’s personal needs." (Abhijit Naskar, "We Are All Black: A Treatise on Racism", 2017)

"The tension between bias and variance, simplicity and complexity, or underfitting and overfitting is an area in the data science and analytics process that can be closer to a craft than a fixed rule. The main challenge is that not only is each dataset different, but also there are data points that we have not yet seen at the moment of constructing the model. Instead, we are interested in building a strategy that enables us to tell something about data from the sample used in building the model." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017) 

"When we have all the data, it is straightforward to produce statistics that describe what has been measured. But when we want to use the data to draw broader conclusions about what is going on around us, then the quality of the data becomes paramount, and we need to be alert to the kind of systematic biases that can jeopardize the reliability of any claims." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"We over-fit when we go too far in adapting to local circumstances, in a worthy but misguided effort to be ‘unbiased’ and take into account all the available information. Usually we would applaud the aim of being unbiased, but this refinement means we have less data to work on, and so the reliability goes down. Over-fitting therefore leads to less bias but at a cost of more uncertainty or variation in the estimates, which is why protection against over-fitting is sometimes known as the bias/variance trade-off." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Any machine learning model is trained based on certain assumptions. In general, these assumptions are the simplistic approximations of some real-world phenomena. These assumptions simplify the actual relationships between features and their characteristics and make a model easier to train. More assumptions means more bias. So, while training a model, more simplistic assumptions = high bias, and realistic assumptions that are more representative of actual phenomena = low bias." (Imran Ahmad, "40 Algorithms Every Programmer Should Know", 2020)

"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"If you study one group and assume that your results apply to other groups, this is extrapolation. If you think you are studying one group, but do not manage to obtain a representative sample of that group, this is a different problem. It is a problem so important in statistics that it has a special name: selection bias. Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Machine learning bias is typically understood as a source of learning error, a technical problem. […] Machine learning bias can introduce error simply because the system doesn’t 'look' for certain solutions in the first place. But bias is actually necessary in machine learning - it’s part of learning itself." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"To accomplish their goals, what are now called machine learning systems must each learn something specific. Researchers call this giving the machine a 'bias'. […] A bias in machine learning means that the system is designed and tuned to learn something. But this is, of course, just the problem of producing narrow problem-solving applications." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Any time you run regression analysis on arbitrary real-world observational data, there’s a significant risk that there’s hidden confounding in your dataset and so causal conclusions from such analysis are likely to be (causally) biased." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"Science is the search for truth, that is the effort to understand the world: it involves the rejection of bias, of dogma, of revelation, but not the rejection of morality." (Linus Pauling)

"Facts and values are entangled in science. It's not because scientists are biased, not because they are partial or influenced by other kinds of interests, but because of a commitment to reason, consistency, coherence, plausibility and replicability. These are value commitments." (Alva Noë)

"A scientist has to be neutral in his search for the truth, but he cannot be neutral as to the use of that truth when found. If you know more than other people, you have more responsibility, rather than less." (Charles P Snow)

More quotes on "Bias" at the-web-of-knowledge.blogspot.com

🔭Data Science: Insight (Just the Quotes)

"[…] it is from long experience chiefly that we are to expect the most certain rules of practice, yet it is withal to be remembered, that observations, and to put us upon the most probable means of improving any art, is to get the best insight we can into the nature and properties of those things which we are desirous to cultivate and improve." (Stephen Hales, "Vegetable Staticks", 1727)

"The insights gained and garnered by the mind in its wanderings among basic concepts are benefits that theory can provide. Theory cannot equip the mind with formulas for solving problems, nor can it mark the narrow path on which the sole solution is supposed to lie by planting a hedge of principles on either side. But it can give the mind insight into the great mass of phenomena and of their relationships, then leave it free to rise into the higher realms of action." (Carl von Clausewitz, "On War", 1832)

"A law of nature, however, is not a mere logical conception that we have adopted as a kind of memoria technical to enable us to more readily remember facts. We of the present day have already sufficient insight to know that the laws of nature are not things which we can evolve by any speculative method. On the contrary, we have to discover them in the facts; we have to test them by repeated observation or experiment, in constantly new cases, under ever-varying circumstances; and in proportion only as they hold good under a constantly increasing change of conditions, in a constantly increasing number of cases with greater delicacy in the means of observation, does our confidence in their trustworthiness rise." (Hermann von Helmholtz, "Popular Lectures on Scientific Subjects", 1873)

"The attempt to characterize exactly models of an empirical theory almost inevitably yields a more precise and clearer understanding of the exact character of a theory. The emptiness and shallowness of many classical theories in the social sciences is well brought out by the attempt to formulate in any exact fashion what constitutes a model of the theory. The kind of theory which mainly consists of insightful remarks and heuristic slogans will not be amenable to this treatment. The effort to make it exact will at the same time reveal the weakness of the theory." (Patrick Suppes," A Comparison of the Meaning and Uses of Models in Mathematics and the Empirical Sciences", Synthese  Vol. 12 (2/3), 1960)

"Model-making, the imaginative and logical steps which precede the experiment, may be judged the most valuable part of scientific method because skill and insight in these matters are rare. Without them we do not know what experiment to do. But it is the experiment which provides the raw material for scientific theory. Scientific theory cannot be built directly from the conclusions of conceptual models." (Herbert G Andrewartha," Introduction to the Study of Animal Population", 1961)

"The purpose of computing is insight, not numbers […] sometimes […] the purpose of computing numbers is not yet in sight." (Richard Hamming, "Numerical Methods for Scientists and Engineers", 1962)

"The mediation of theory and praxis can only be clarified if to begin with we distinguish three functions, which are measured in terms of different criteria: the formation and extension of critical theorems, which can stand up to scientific discourse; the organization of processes of enlightenment, in which such theorems are applied and can be tested in a unique manner by the initiation of processes of reflection carried on within certain groups toward which these processes have been directed; and the selection of appropriate strategies, the solution of tactical questions, and the conduct of the political struggle. On the first level, the aim is true statements, on the second, authentic insights, and on the third, prudent decisions." (Jürgen Habermas, "Introduction to Theory and Practice", 1963)

"[...] it is rather more difficult to recapture directness and simplicity than to advance in the direction of ever more sophistication and complexity. Any third-rate engineer or researcher can increase complexity; but it takes a certain flair of real insight to make things simple again." (Ernst F Schumacher, "Small Is Beautiful", 1973)

"Every discovery, every enlargement of the understanding, begins as an imaginative preconception of what the truth might be. The imaginative preconception - a ‘hypothesis’ - arises by a process as easy or as difficult to understand as any other creative act of mind; it is a brainwave, an inspired guess, a product of a blaze of insight. It comes anyway from within and cannot be achieved by the exercise of any known calculus of discovery." (Sir Peter B Medawar, "Advice to a Young Scientist", 1979)

"There is a tendency to mistake data for wisdom, just as there has always been a tendency to confuse logic with values, intelligence with insight. Unobstructed access to facts can produce unlimited good only if it is matched by the desire and ability to find out what they mean and where they lead." (Norman Cousins, "Human Options : An Autobiographical Notebook", 1981)

"The heart of mathematics consists of concrete examples and concrete problems. Big general theories are usually afterthoughts based on small but profound insights; the insights themselves come from concrete special cases." (Paul Halmos, "Selecta: Expository writing", 1983)

"All the efforts of the researcher to find other models, conceptions, different mathematical forms, better linguistic modes of expression, to do justice to newly discovered layers of being mean self-transformation. The researcher in his place is the human being in self-transformation to more profound insight into what is given." (John Dessauer, Universitas: A Quarterly German Review of the Arts and Sciences Vol. 26 (4), 1984)

"[…] new insights fail to get put into practice because they conflict with deeply held internal images of how the world works [...] images that limit us to familiar ways of thinking and acting. That is why the discipline of managing mental models - surfacing, testing, and improving our internal pictures of how the world works - promises to be a major breakthrough for learning organizations." (Peter Senge, "The Fifth Discipline: The Art and Practice of the Learning Organization", 1990)

"Science is (or should be) a precise art. Precise, because data may be taken or theories formulated with a certain amount of accuracy; an art, because putting the information into the most useful form for investigation or for presentation requires a certain amount of creativity and insight." (Patricia H Reiff, "The Use and Misuse of Statistics in Space Physics", Journal of Geomagnetism and Geoelectricity 42, 1990)

"Management is not founded on observation and experiment, but on a drive towards a set of outcomes. These aims are not altogether explicit; at one extreme they may amount to no more than an intention to preserve the status quo, at the other extreme they may embody an obsessional demand for power, profit or prestige. But the scientist's quest for insight, for understanding, for wanting to know what makes the system tick, rarely figures in the manager's motivation. Secondly, and therefore, management is not, even in intention, separable from its own intentions and desires: its policies express them. Thirdly, management is not normally aware of the conventional nature of its intellectual processes and control procedures. It is accustomed to confuse its conventions for recording information with truths-about-the-business, its subjective institutional languages for discussing the business with an objective language of fact and its models of reality with reality itself." (Stanford Beer, "Decision and Control", 1994)

"Ideas about organization are always based on implicit images or metaphors that persuade us to see, understand, and manage situations in a particular way. Metaphors create insight. But they also distort. They have strengths. But they also have limitations. In creating ways of seeing, they create ways of not seeing. There can be no single theory or metaphor that gives an all-purpose point of view, and there can be no simple 'correct theory' for structuring everything we do." (Gareth Morgan, "Imaginization", 1997)

"We use mathematics and statistics to describe the diverse realms of randomness. From these descriptions, we attempt to glean insights into the workings of chance and to search for hidden causes. With such tools in hand, we seek patterns and relationships and propose predictions that help us make sense of the world."  (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"The purpose of analysis is insight. The best analysis is the simplest analysis which provides the needed insight." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"A model is an imitation of reality and a mathematical model is a particular form of representation. We should never forget this and get so distracted by the model that we forget the real application which is driving the modelling. In the process of model building we are translating our real world problem into an equivalent mathematical problem which we solve and then attempt to interpret. We do this to gain insight into the original real world situation or to use the model for control, optimization or possibly safety studies." (Ian T Cameron & Katalin Hangos, "Process Modelling and Model Analysis", 2001)

"Central tendency is the formal expression for the notion of where data is centered, best understood by most readers as 'average'. There is no one way of measuring where data are centered, and different measures provide different insights." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"A common mistake is that all visualization must be simple, but this skips a step. You should actually design graphics that lend clarity, and that clarity can make a chart 'simple' to read. However, sometimes a dataset is complex, so the visualization must be complex. The visualization might still work if it provides useful insights that you wouldn’t get from a spreadsheet. […] Sometimes a table is better. Sometimes it’s better to show numbers instead of abstract them with shapes. Sometimes you have a lot of data, and it makes more sense to visualize a simple aggregate than it does to show every data point." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The other buzzword that epitomizes a bias toward substitution is 'big data'. Today’s companies have an insatiable appetite for data, mistakenly believing that more data always creates more value. But big data is usually dumb data. Computers can find patterns that elude humans, but they don’t know how to compare patterns from different sources or how to interpret complex behaviors. Actionable insights can only come from a human analyst (or the kind of generalized artificial intelligence that exists only in science fiction)." (Peter Thiel & Blake Masters, "Zero to One: Notes on Startups, or How to Build the Future", 2014)

"As business leaders we need to understand that lack of data is not the issue. Most businesses have more than enough data to use constructively; we just don't know how to use it. The reality is that most businesses are already data rich, but insight poor." (Bernard Marr, Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance, 2015)

"While Big Data, when managed wisely, can provide important insights, many of them will be disruptive. After all, it aims to find patterns that are invisible to human eyes. The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"Big Data allows us to meaningfully zoom in on small segments of a dataset to gain new insights on who we are." (Seth Stephens-Davidowitz, "Everybody Lies: What the Internet Can Tell Us About Who We Really Are", 2017)

"Mathematical modeling is the modern version of both applied mathematics and theoretical physics. In earlier times, one proposed not a model but a theory. By talking today of a model rather than a theory, one acknowledges that the way one studies the phenomenon is not unique; it could also be studied other ways. One's model need not claim to be unique or final. It merits consideration if it provides an insight that isn't better provided by some other model." (Reuben Hersh, ”Mathematics as an Empirical Phenomenon, Subject to Modeling”, 2017)

"Quantum Machine Learning is defined as the branch of science and technology that is concerned with the application of quantum mechanical phenomena such as superposition, entanglement and tunneling for designing software and hardware to provide machines the ability to learn insights and patterns from data and the environment, and the ability to adapt automatically to changing situations with high precision, accuracy and speed." (Amit Ray, "Quantum Computing Algorithms for Artificial Intelligence", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope. (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The patterns that we extract using data science are useful only if they give us insight into the problem that enables us to do something to help solve the problem." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"A random collection of interesting but disconnected facts will lack the unifying theme to become a data story - it may be informative, but it won’t be insightful." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"An essential underpinning of both the kaizen and lean methodologies is data. Without data, companies using these approaches simply wouldn’t know what to improve or whether their incremental changes were successful. Data provides the clarity and specificity that’s often needed to drive positive change. The importance of having baselines, benchmarks, and targets isn’t isolated to just business; it can transcend everything from personal development to social causes. The right insight can instill both the courage and confidence to forge a new direction - turning a leap of faith into an informed expedition." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"An insight is when you mix your creative and intellectual labor with a set of data points to create a point of view resulting in a useful assertion. You 'see into' an object of inquiry to reveal important characteristics about its nature." (Eben Hewitt, "Technology Strategy Patterns: Architecture as strategy" 2nd Ed., 2019)

"An essential underpinning of both the kaizen and lean methodologies is data. Without data, companies using these approaches simply wouldn’t know what to improve or whether their incremental changes were successful. Data provides the clarity and specificity that’s often needed to drive positive change. The importance of having baselines, benchmarks, and targets isn’t isolated to just business; it can transcend everything from personal development to social causes. The right insight can instill both the courage and confidence to forge a new direction - turning a leap of faith into an informed expedition." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Before you can even consider creating a data story, you must have a meaningful insight to share. One of the essential attributes of a data story is a central or main insight. Without a main point, your data story will lack purpose, direction, and cohesion. A central insight is the unifying theme (telos appeal) that ties your various findings together and guides your audience to a focal point or climax for your data story. However, when you have an increasing amount of data at your disposal, insights can be elusive. The noise from irrelevant and peripheral data can interfere with your ability to pinpoint the important signals hidden within its core." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Data storytelling is transformative. Many people don’t realize that when they share insights, they’re not just imparting information to other people. The natural consequence of sharing an insight is change. Stop doing that, and do more of this. Focus less on them, and concentrate more on these people. Spend less there, and invest more here. A poignant insight will drive an enlightened audience to think or act differently. So, as a data storyteller, you’re not only guiding the audience through the data, you’re also acting as a change agent. Rather than just pointing out possible enhancements, you’re helping your audience fully understand the urgency of the changes and giving them the confidence to move forward." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Some problems are just too complicated for rational logical solutions. They admit of insights, not answers." (Jerome B Wiesner)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.