02 December 2018

🔭Data Science: Complexity (Just the Quotes)

"If we study the history of science we see happen two inverse phenomena […] Sometimes simplicity hides under complex appearances; sometimes it is the simplicity which is apparent, and which disguises extremely complicated realities. […] No doubt, if our means of investigation should become more and more penetrating, we should discover the simple under the complex, then the complex under the simple, then again the simple under the complex, and so on, without our being able to foresee what will be the last term. We must stop somewhere, and that science may be possible, we must stop when we have found simplicity. This is the only ground on which we can rear the edifice of our generalizations." (Henri Poincaré, "Science and Hypothesis", 1901)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"[Disorganized complexity] is a problem in which the number of variables is very large, and one in which each of the many variables has a behavior which is individually erratic, or perhaps totally unknown. However, in spite of this helter-skelter, or unknown, behavior of all the individual variables, the system as a whole possesses certain orderly and analyzable average properties. [...] [Organized complexity is] not problems of disorganized complexity, to which statistical methods hold the key. They are all problems which involve dealing simultaneously with a sizable number of factors which are interrelated into an organic whole. They are all, in the language here proposed, problems of organized complexity." (Warren Weaver, "Science and Complexity", American Scientist Vol. 36, 1948)

"Nor does complexity deny the valid simplification which is part of the process of analysis, and even a method of achieving complex architecture itself." (Robert Venturi, "Complexity and Contradiction in Architecture", 1966)

"The central task of a natural science is to make the wonderful commonplace: to show that complexity, correctly viewed, is only a mask for simplicity; to find pattern hidden in apparent chaos." (Herbert A Simon, "The Sciences of the Artificial", 1969)

"At each level of complexity, entirely new properties appear. [And] at each stage, entirely new laws, concepts, and generalizations are necessary, requiring inspiration and creativity to just as great a degree as in the previous one." (Herb Anderson, 1972)

"In general, complexity and precision bear an inverse relation to one another in the sense that, as the complexity of a problem increases, the possibility of analysing it in precise terms diminishes. Thus 'fuzzy thinking' may not be deplorable, after all, if it makes possible the solution of problems which are much too complex for precise analysis." (Lotfi A Zadeh, "Fuzzy languages and their relation to human intelligence", 1972)

"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage to move in the opposite direction." (Ernst F Schumacher, "Small is Beautiful", 1973)

"The aim of the model is of course not to reproduce reality in all its complexity. It is rather to capture in a vivid, often formal, way what is essential to understanding some aspect of its structure or behavior." (Joseph Weizenbaum, "Computer power and human reason: From judgment to calculation", 1976)

"All nature is a continuum. The endless complexity of life is organized into patterns which repeat themselves at each level of system." (James G Miller, "Living Systems", 1978)

"Simplicity does not precede complexity, but follows it." (Alan J Perlis, "Epigrams on Programming", 1982)

"Organized simplicity occurs where a small number of significant factors and a large number of insignificant factors appear initially to be complex, but on investigation display hidden simplicity." (Robert L Flood & Ewart R Carson, "Dealing with Complexity: An introduction to the theory and application of systems", 1988)

"The state of development of mathematical theory in relation to some attributes of complexity is a clear measure of our ability/inability to deal with that attribute […]" (Robert L Flood & Ewart R Carson, "Dealing with Complexity: An introduction to the theory and application of systems", 1988)

"Complexity is not an objective factor but a subjective one. Supersignals reduce complexity, collapsing a number of features into one. Consequently, complexity must be understood in terms of a specific individual and his or her supply of supersignals. We learn supersignals from experience, and our supply can differ greatly from another individual's. Therefore there can be no objective measure of complexity." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"Modeling in its broadest sense is the cost-effective use of something in place of something else for some [cognitive] purpose. It allows us to use something that is simpler, safer, or cheaper than reality instead of reality for some purpose. A model represents reality for the given purpose; the model is an abstraction of reality in the sense that it cannot represent all aspects of reality. This allows us to deal with the world in a simplified manner, avoiding the complexity, danger and irreversibility of reality." (Jeff Rothenberg, "The Nature of Modeling. In: Artificial Intelligence, Simulation, and Modeling", 1989)

"A measure that corresponds much better to what is usually meant by complexity in ordinary conversation, as well as in scientific discourse, refers not to the length of the most concise description of an entity (which is roughly what AIC [algorithmic information content] is), but to the length of a concise description of a set of the entity’s regularities. Thus something almost entirely random, with practically no regularities, would have effective complexity near zero. So would something completely regular, such as a bit string consisting entirely of zeroes. Effective complexity can be high only a region intermediate between total order and complete." (Murray Gell-Mann, "What is Complexity?", Complexity Vol 1 (1), 1995)

"The larger, more detailed and complex the model - the less abstract the abstraction – the smaller the number of people capable of understanding it and the longer it takes for its weaknesses and limitations to be found out." (John Adams, "Risk", 1995)

"Complexity is that property of a model which makes it difficult to formulate its overall behaviour in a given language, even when given reasonably complete information about its atomic components and their inter-relations." (Bruce Edmonds, "Syntactic Measures of Complexity", 1999)

"Falling between order and chaos, the moment of complexity is the point at which self-organizing systems emerge to create new patterns of coherence and structures of behaviour." (Mark C Taylor, "The Moment of Complexity: Emerging Network Culture", 2001)

"[…] most earlier attempts to construct a theory of complexity have overlooked the deep link between it and networks. In most systems, complexity starts where networks turn nontrivial." (Albert-László Barabási, "Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life", 2002)

"The urge to tinker with a formula is a hunger that keeps coming back. Tinkering almost always leads to more complexity. The more complicated the metric, the harder it is for users to learn how to affect the metric, and the less likely it is to improve it." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

More on "Complexity" at the-web-of-knowledge.blogspot.com.

🔭Data Science: Error (Just the Quotes)

"The probable is something which lies midway between truth and error" (Christian Thomasius, "Institutes of Divine Jurisprudence", 1688)

"Knowledge being to be had only of visible and certain truth, error is not a fault of our knowledge, but a mistake of our judgment, giving assent to that which is not true." (John Locke, "An Essay Concerning Human Understanding", 1689)

"The errors of definitions multiply themselves according as the reckoning proceeds; and lead men into absurdities, which at last they see but cannot avoid, without reckoning anew from the beginning." (Thomas Hobbes, "The Moral and Political Works of Thomas Hobbes of Malmesbury", 1750)

"Men are often led into errors by the love of simplicity, which disposes us to reduce things to few principles, and to conceive a greater simplicity in nature than there really is." (Thomas Reid, "Essays on the Intellectual Powers of Man", 1785)

"The orbits of certainties touch one another; but in the interstices there is room enough for error to go forth and prevail." (Johann Wolfgang von Goethe, "Maxims and Reflections", 1833)

"Nothing hurts a new truth more than an old error." (Johann Wolfgang von Goethe, "Sprüche in Prosa", 1840)

"Every detection of what is false directs us towards what is true: every trial exhausts some tempting form of error. Not only so; but scarcely any attempt is entirely a failure; scarcely any theory, the result of steady thought, is altogether false; no tempting form of error is without some latent charm derived from truth." (William Whewell, "Lectures on the History of Moral Philosophy in England", 1852)

"[…] ideas may be both novel and important, and yet, if they are incorrect - if they lack the very essential support of incontrovertible fact, they are unworthy of credence. Without this, a theory may be both beautiful and grand, but must be as evanescent as it is beautiful, and as unsubstantial as it is grand." (George Brewster, "A New Philosophy of Matter", 1858)

"When a power of nature, invisible and impalpable, is the subject of scientific inquiry, it is necessary, if we would comprehend its essence and properties, to study its manifestations and effects. For this purpose simple observation is insufficient, since error always lies on the surface, whilst truth must be sought in deeper regions." (Justus von Liebig," Familiar Letters on Chemistry", 1859)

"As in the experimental sciences, truth cannot be distinguished from error as long as firm principles have not been established through the rigorous observation of facts." (Louis Pasteur, "Étude sur la maladie des vers à soie", 1870)

"It would be an error to suppose that the great discoverer seizes at once upon the truth, or has any unerring method of divining it. In all probability the errors of the great mind exceed in number those of the less vigorous one. Fertility of imagination and abundance of guesses at truth are among the first requisites of discovery; but the erroneous guesses must be many times as numerous as those that prove well founded. The weakest analogies, the most whimsical notions, the most apparently absurd theories, may pass through the teeming brain, and no record remain of more than the hundredth part. […] The truest theories involve suppositions which are inconceivable, and no limit can really be placed to the freedom of hypotheses." (W Stanley Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1877)

"Perfect readiness to reject a theory inconsistent with fact is a primary requisite of the philosophic mind. But it, would be a mistake to suppose that this candour has anything akin to fickleness; on the contrary, readiness to reject a false theory may be combined with a peculiar pertinacity and courage in maintaining an hypothesis as long as its falsity is not actually apparent." (William S Jevons, "The Principles of Science", 1887)

"One is almost tempted to assert that quite apart from its intellectual mission, theory is the most practical thing conceivable, the quintessence of practice as it were, since the precision of its conclusions cannot be reached by any routine of estimating or trial and error; although given the hidden ways of theory, this will hold only for those who walk them with complete confidence." (Ludwig E Boltzmann, "On the Significance of Theories", 1890)

"[…] to kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact." (Charles R Darwin, "More Letters of Charles Darwin", Vol 2, 1903)

"Man's determination not to be deceived is precisely the origin of the problem of knowledge. The question is always and only this: to learn to know and to grasp reality in the midst of a thousand causes of error which tend to vitiate our observation." (Federigo Enriques, "Problems of Science", 1906)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"Poor statistics may be attributed to a number of causes. There are the mistakes which arise in the course of collecting the data, and there are those which occur when those data are being converted into manageable form for publication. Still later, mistakes arise because the conclusions drawn from the published data are wrong. The real trouble with errors which arise during the course of collecting the data is that they are the hardest to detect." (Alfred R Ilersic, "Statistics", 1959)

"When using estimated figures, i.e. figures subject to error, for further calculation make allowance for the absolute and relative errors. Above all, avoid what is known to statisticians as 'spurious' accuracy. For example, if the arithmetic Mean has to be derived from a distribution of ages given to the nearest year, do not give the answer to several places of decimals. Such an answer would imply a degree of accuracy in the results of your calculations which are quite un- justified by the data. The same holds true when calculating percentages." (Alfred R Ilersic, "Statistics", 1959)

"While it is true to assert that much statistical work involves arithmetic and mathematics, it would be quite untrue to suggest that the main source of errors in statistics and their use is due to inaccurate calculations." (Alfred R Ilersic, "Statistics", 1959)

"Errors may also creep into the information transfer stage when the originator of the data is unconsciously looking for a particular result. Such situations may occur in interviews or questionnaires designed to gather original data. Improper wording of the question, or improper voice inflections. and other constructional errors may elicit nonobjective responses. Obviously, if the data is incorrectly gathered, any graph based on that data will contain the original error - even though the graph be most expertly designed and beautifully presented." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"One grievous error in interpreting approximations is to allow only good approximations." (Preston C Hammer, "Mind Pollution", Cybernetics, Vol. 14, 1971)

"Thus, the construction of a mathematical model consisting of certain basic equations of a process is not yet sufficient for effecting optimal control. The mathematical model must also provide for the effects of random factors, the ability to react to unforeseen variations and ensure good control despite errors and inaccuracies." (Yakov Khurgin, "Did You Say Mathematics?", 1974)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"Most people like to believe something is or is not true. Great scientists tolerate ambiguity very well. They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory. If you believe too much you'll never notice the flaws; if you doubt too much you won't get started. It requires a lovely balance." (Richard W Hamming, "You and Your Research", 1986) 

"We have found that some of the hardest errors to detect by traditional methods are unsuspected gaps in the data collection (we usually discovered them serendipitously in the course of graphical checking)." (Peter Huber, "Huge data sets", Compstat '94: Proceedings, 1994)

"Humans may crave absolute certainty; they may aspire to it; they may pretend, as partisans of certain religions do, to have attained it. But the history of science - by far the most successful claim to knowledge accessible to humans - teaches that the most we can hope for is successive improvement in our understanding, learning from our mistakes, an asymptotic approach to the Universe, but with the proviso that absolute certainty will always elude us. We will always be mired in error. The most each generation can hope for is to reduce the error bars a little, and to add to the body of data to which error bars apply." (Carl Sagan, "The Demon-Haunted World: Science as a Candle in the Dark", 1995)

"[myth:] Counting can be done without error. Usually, the counted number is an integer and therefore without (rounding) error. However, the best estimate of a scientifically relevant value obtained by counting will always have an error. These errors can be very small in cases of consecutive counting, in particular of regular events, e.g., when measuring frequencies." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[myth:] Random errors can always be determined by repeating measurements under identical conditions. […] this statement is true only for time-related random errors ." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"[myth:] Systematic errors can be determined inductively. It should be quite obvious that it is not possible to determine the scale error from the pattern of data values." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"What is so unconventional about the statistical way of thinking? First, statisticians do not care much for the popular concept of the statistical average; instead, they fixate on any deviation from the average. They worry about how large these variations are, how frequently they occur, and why they exist. [...] Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. [...] Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. Ignoring group differences when they are present frequently portends inequitable treatment. [...] Fourth, decisions based on statistics can be calibrated to strike a balance between two types of errors. Predictably, decision makers have an incentive to focus exclusively on minimizing any mistake that could bring about public humiliation, but statisticians point out that because of this bias, their decisions will aggravate other errors, which are unnoticed but serious. [...] Finally, statisticians follow a specific protocol known as statistical testing when deciding whether the evidence fits the crime, so to speak. Unlike some of us, they don’t believe in miracles. In other words, if the most unusual coincidence must be contrived to explain the inexplicable, they prefer leaving the crime unsolved." (Kaiser Fung, "Numbers Rule the World", 2010) 

"A key difference between a traditional statistical problems and a time series problem is that often, in time series, the errors are not independent." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

 "A wide variety of statistical procedures (regression, t-tests, ANOVA) require three assumptions: (i) Normal observations or errors. (ii) Independent observations (or independent errors, which is equivalent, in normal linear models to independent observations). (iii) Equal variance - when that is appropriate (for the one-sample t-test, for example, there is nothing being compared, so equal variances do not apply).(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"If the observations/errors are not independent, the statistical formulations are completely unreliable unless corrections can be made.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Once a model has been fitted to the data, the deviations from the model are the residuals. If the model is appropriate, then the residuals mimic the true errors. Examination of the residuals often provides clues about departures from the modeling assumptions. Lack of fit - if there is curvature in the residuals, plotted versus the fitted values, this suggests there may be whole regions where the model overestimates the data and other whole regions where the model underestimates the data. This would suggest that the current model is too simple relative to some better model.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

 "The random element in most data analysis is assumed to be white noise - normal errors independent of each other. In a time series, the errors are often linked so that independence cannot be assumed (the last examples). Modeling the nature of this dependence is the key to time series.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"When data is not normal, the reason the formulas are working is usually the central limit theorem. For large sample sizes, the formulas are producing parameter estimates that are approximately normal even when the data is not itself normal. The central limit theorem does make some assumptions and one is that the mean and variance of the population exist. Outliers in the data are evidence that these assumptions may not be true. Persistent outliers in the data, ones that are not errors and cannot be otherwise explained, suggest that the usual procedures based on the central limit theorem are not applicable.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Machine learning bias is typically understood as a source of learning error, a technical problem. […] Machine learning bias can introduce error simply because the system doesn’t 'look' for certain solutions in the first place. But bias is actually necessary in machine learning - it’s part of learning itself." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

🔭Data Science: Certainty (Just the Quotes)

"Reasoning draws a conclusion and makes us grant the conclusion, but does not make the conclusion certain, nor does it remove doubt so that the mind may rest on the intuition of truth, unless the mind discovers it by the path of experience." (Roger Bacon, "Opus Majus", cca. 1267)

"[…] the highest probability amounts not to certainty, without which there can be no true knowledge." (John Locke, "An Essay Concerning Human Understanding", 1689)

"All effects follow not with like certainty from their supposed causes." (David Hume, "An Enquiry Concerning Human Understanding", 1748)

"As mathematical and absolute certainty is seldom to be attained in human affairs, reason and public utility require that judges and all mankind in forming their opinions of the truth of facts should be regulated by the superior number of the probabilities on the one side or the other whether the amount of these probabilities be expressed in words and arguments or by figures and numbers." (William Murray, 1773) 

"In order to supply the defects of experience, we will have recourse to the probable conjectures of analogy, conclusions which we will bequeath to our posterity to be ascertained by new observations, which, if we augur rightly, will serve to establish our theory and to carry it gradually nearer to absolute certainty." (Johann H Lambert, "The System of the World", 1800)

"One may even say, strictly speaking, that almost all our knowledge is only probable; and in the small number of things that we are able to know with certainty, in the mathematical sciences themselves, the principal means of arriving at the truth - induction and analogy - are based on probabilities, so that the whole system of human knowledge is tied up with the theory set out in this essay." (Pierre-Simon Laplace, "Philosophical Essay on Probabilities", 1814)

"The orbits of certainties touch one another; but in the interstices there is room enough for error to go forth and prevail." (Johann Wolfgang von Goethe, "Maxims and Reflections", 1833)

"All certainty which does not consist in mathematical demonstration is nothing more than the highest probability; there is no other historical certainty." (Voltaire, "A Philosophical Dictionary", 1881)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"Sometimes the probability in favor of a generalization is enormous, but the infinite probability of certainty is never reached." (William Dampier-Whetham, "Science and the Human Mind", 1912)

"No matter how solidly founded a prediction may appear to us, we are never absolutely sure that experiment will not contradict it, if we undertake to verify it . […] It is far better to foresee even without certainty than not to foresee at all." (Henri Poincaré, "The Foundations of Science", 1913)

"The very name calculus of probabilities is a paradox. Probability opposed to certainty is what we do not know, and how can we calculate what we do not know?" (Henri Poincaré, "The Foundations of Science", 1913)

"The making of decisions, as everyone knows from personal experience, is a burdensome task. Offsetting the exhilaration that may result from correct and successful decision and the relief that follows the termination of a struggle to determine issues is the depression that comes from failure, or error of decision, and the frustration which ensues from uncertainty." (Chester I Barnard, "The Functions of the Executive", 1938)

"Uncertainty is introduced, however, by the impossibility of making generalizations, most of the time, which happens to all members of a class. Even scientific truth is a matter of probability and the degree of probability stops somewhere short of certainty." (Wayne C Minnick, "The Art of Persuasion", 1957)

"Incomplete knowledge must be considered as perfectly normal in probability theory; we might even say that, if we knew all the circumstances of a phenomenon, there would be no place for probability, and we would know the outcome with certainty." (Félix E Borel, Probability and Certainty", 1963)

"It is a commonplace of modern technology that there is a high measure of certainty that problems have solutions before there is knowledge of how they are to be solved." (John K Galbraith, "The New Industrial State", 1967)

"Statistics is a body of methods and theory applied to numerical evidence in making decisions in the face of uncertainty." (Lawrence Lapin, "Statistics for Modern Business Decisions", 1973)

"The most dominant decision type [that will have to be made in an organic organization] will be decisions under uncertainty." (Henry L Tosi & Stephen J Carroll, "Management", 1976)

"The greater the uncertainty, the greater the amount of decision making and information processing. It is hypothesized that organizations have limited capacities to process information and adopt different organizing modes to deal with task uncertainty. Therefore, variations in organizing modes are actually variations in the capacity of organizations to process information and make decisions about events which cannot be anticipated in advance." (John K Galbraith, "Organization Design", 1977)

"Facts and theories are different things, not rungs in a hierarchy of increasing certainty. Facts are the world's data. Theories are structures of ideas that explain and interpret facts. Facts do not go away while scientists debate rival theories for explaining them." (Stephen J Gould "Evolution as Fact and Theory", 1981)

"Knowledge specialists may ascribe a degree of certainty to their models of the world that baffles and offends managers. Often the complexity of the world cannot be reduced to mathematical abstractions that make sense to a manager. Managers who expect complete, one-to-one correspondence between the real world and each element in a model are disappointed and skeptical." (Dale E Zand, "Information, Organization, and Power", 1981)

"But there is trouble in store for anyone who surrenders to the temptation of mistaking an elegant hypothesis for a certainty: the readers of detective stories know this quite well." (Primo Levi, "The Periodic Table", 1984)

"Probability is the mathematics of uncertainty. Not only do we constantly face situations in which there is neither adequate data nor an adequate theory, but many modem theories have uncertainty built into their foundations. Thus learning to think in terms of probability is essential. Statistics is the reverse of probability (glibly speaking). In probability you go from the model of the situation to what you expect to see; in statistics you have the observations and you wish to estimate features of the underlying model." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Probability plays a central role in many fields, from quantum mechanics to information theory, and even older fields use probability now that the presence of 'noise' is officially admitted. The newer aspects of many fields start with the admission of uncertainty." (Richard Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"Models are often used to decide issues in situations marked by uncertainty. However statistical differences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Greedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"The mathematical theories generally called 'mathematical theories of chance' actually ignore chance, uncertainty and probability. The models they consider are purely deterministic, and the quantities they study are, in the final analysis, no more than the mathematical frequencies of particular configurations, among all equally possible configurations, the calculation of which is based on combinatorial analysis. In reality, no axiomatic definition of chance is conceivable." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"The worst, i.e., most dangerous, feature of 'accepting the null hypothesis' is the giving up of explicit uncertainty. [...] Mathematics can sometimes be put in such black-and-white terms, but our knowledge or belief about the external world never can." (John Tukey, "The Philosophy of Multiple Comparisons", Statistical Science Vol. 6 (1), 1991)

"In nonlinear systems - and the economy is most certainly nonlinear - chaos theory tells you that the slightest uncertainty in your knowledge of the initial conditions will often grow inexorably. After a while, your predictions are nonsense." (M Mitchell Waldrop, "Complexity: The Emerging Science at the Edge of Order and Chaos", 1992)

"It is in the nature of theoretical science that there can be no such thing as certainty. A theory is only ‘true’ for as long as the majority of the scientific community maintain the view that the theory is the one best able to explain the observations." (Jim Baggott, "The Meaning of Quantum Theory", 1992)

"Statistics as a science is to quantify uncertainty, not unknown." (Chamont Wang, "Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety", 1993)

"There is a new science of complexity which says that the link between cause and effect is increasingly difficult to trace; that change (planned or otherwise) unfolds in non-linear ways; that paradoxes and contradictions abound; and that creative solutions arise out of diversity, uncertainty and chaos." (Andy P Hargreaves & Michael Fullan, "What’s Worth Fighting for Out There?", 1998)

"Information entropy has its own special interpretation and is defined as the degree of unexpectedness in a message. The more unexpected words or phrases, the higher the entropy. It may be calculated with the regular binary logarithm on the number of existing alternatives in a given repertoire. A repertoire of 16 alternatives therefore gives a maximum entropy of 4 bits. Maximum entropy presupposes that all probabilities are equal and independent of each other. Minimum entropy exists when only one possibility is expected to be chosen. When uncertainty, variety or entropy decreases it is thus reasonable to speak of a corresponding increase in information." (Lars Skyttner, "General Systems Theory: Ideas and Applications", 2001)

"Most physical systems, particularly those complex ones, are extremely difficult to model by an accurate and precise mathematical formula or equation due to the complexity of the system structure, nonlinearity, uncertainty, randomness, etc. Therefore, approximate modeling is often necessary and practical in real-world applications. Intuitively, approximate modeling is always possible. However, the key questions are what kind of approximation is good, where the sense of 'goodness' has to be first defined, of course, and how to formulate such a good approximation in modeling a system such that it is mathematically rigorous and can produce satisfactory results in both theory and applications." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"Any scientific data without (a stated) uncertainty is of no avail. Therefore the analysis and description of uncertainty are almost as important as those of the data value itself . It should be clear that the uncertainty itself also has an uncertainty – due to its nature as a scientific quantity – and so on. The uncertainty of an uncertainty is generally not determined." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter 'slope' these distant points carry more effective weight. Naturally, this weight is distinct from the 'statistical' weight usually used in regression analysis." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"As uncertainties of scientific data values are nearly as important as the data values themselves, it is usually not acceptable that a best estimate is only accompanied by an estimated uncertainty. Therefore, only the size of nondominant uncertainties should be estimated. For estimating the size of a nondominant uncertainty we need to find its upper limit, i.e., we want to be as sure as possible that the uncertainty does not exceed a certain value." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined.In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic)." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In fact, H [entropy] measures the amount of uncertainty that exists in the phenomenon. If there were only one event, its probability would be equal to 1, and H would be equal to 0 - that is, there is no uncertainty about what will happen in a phenomenon with a single event because we always know what is going to occur. The more events that a phenomenon possesses, the more uncertainty there is about the state of the phenomenon. In other words, the more entropy, the more information." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"The storytelling mind is allergic to uncertainty, randomness, and coincidence. It is addicted to meaning. If the storytelling mind cannot find meaningful patterns in the world, it will try to impose them. In short, the storytelling mind is a factory that churns out true stories when it can, but will manufacture lies when it can't." (Jonathan Gottschall, "The Storytelling Animal: How Stories Make Us Human", 2012)

"The data is a simplification - an abstraction - of the real world. So when you visualize data, you visualize an abstraction of the world, or at least some tiny facet of it. Visualization is an abstraction of data, so in the end, you end up with an abstraction of an abstraction, which creates an interesting challenge. […] Just like what it represents, data can be complex with variability and uncertainty, but consider it all in the right context, and it starts to make sense." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"We have minds that are equipped for certainty, linearity and short-term decisions, that must instead make long-term decisions in a non-linear, probabilistic world." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"The greater the uncertainty, the bigger the gap between what you can measure and what matters, the more you should watch out for overfitting - that is, the more you should prefer simplicity." (Brian Christian & Thomas L Griffiths, "Algorithms to Live By: The Computer Science of Human Decisions", 2016)

"A notable difference between many fields and data science is that in data science, if a customer has a wish, even an experienced data scientist may not know whether it’s possible. Whereas a software engineer usually knows what tasks software tools are capable of performing, and a biologist knows more or less what the laboratory can do, a data scientist who has not yet seen or worked with the relevant data is faced with a large amount of uncertainty, principally about what specific data is available and about how much evidence it can provide to answer any given question. Uncertainty is, again, a major factor in the data scientific process and should be kept at the forefront of your mind when talking with customers about their wishes."  (Brian Godsey, "Think Like a Data Scientist", 2017)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Uncertainty is an adversary of coldly logical algorithms, and being aware of how those algorithms might break down in unusual circumstances expedites the process of fixing problems when they occur - and they will occur. A data scientist’s main responsibility is to try to imagine all of the possibilities, address the ones that matter, and reevaluate them all as successes and failures happen." (Brian Godsey, "Think Like a Data Scientist", 2017)

"Bootstrapping provides an intuitive, computer-intensive way of assessing the uncertainty in our estimates, without making strong assumptions and without using probability theory. But the technique is not feasible when it comes to, say, working out the margins of error on unemployment surveys of 100,000 people. Although bootstrapping is a simple, brilliant and extraordinarily effective idea, it is just too clumsy to bootstrap such large quantities of data, especially when a convenient theory exists that can generate formulae for the width of uncertainty intervals." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Entropy is a measure of amount of uncertainty or disorder present in the system within the possible probability distribution. The entropy and amount of unpredictability are directly proportional to each other." (G Suseela & Y Asnath V Phamila, "Security Framework for Smart Visual Sensor Networks", 2019)

"Estimates based on data are often uncertain. If the data were intended to tell us something about a wider population (like a poll of voting intentions before an election), or about the future, then we need to acknowledge that uncertainty. This is a double challenge for data visualization: it has to be calculated in some meaningful way and then shown on top of the data or statistics without making it all too cluttered." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Uncertainty confuses many people because they have the unreasonable expectation that science and statistics will unearth precise truths, when all they can yield is imperfect estimates that can always be subject to changes and updates." (Alberto Cairo, "How Charts Lie", 2019)

"We over-fit when we go too far in adapting to local circumstances, in a worthy but misguided effort to be ‘unbiased’ and take into account all the available information. Usually we would applaud the aim of being unbiased, but this refinement means we have less data to work on, and so the reliability goes down. Over-fitting therefore leads to less bias but at a cost of more uncertainty or variation in the estimates, which is why protection against over-fitting is sometimes known as the bias/variance trade-off." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what anyone man will be up to, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician." (Sir Arthur C Doyle)

More quotes on" Certainty" at the-web-of-knowledge.blogspot.com

🔭Data Science: Hypothesis (Just the Quotes)

"[…] it is not necessary that these hypotheses should be true, or even probably; but it is enough if they provide a calculus which fits the observations […]" (Andrew Osiander, "On the Revolutions of the Heavenly Spheres", 1543)

"The art of discovering the causes of phenomena, or true hypothesis, is like the art of decyphering, in which an ingenious conjecture greatly shortens the road." (Gottfried W Leibniz, "New Essays Concerning Human Understanding", 1704) [published 1765]

"In order to shake a hypothesis, it is sometimes not necessary to do anything more than push it as far as it will go." (Denis Diderot, "On the Interpretation of Nature", 1753)

"No hypothesis can lay claim to any value unless it assembles many phenomena under one concept." (Johann Wolfgang von Goethe, [letter to Sommering] 1795)

"Induction, analogy, hypotheses founded upon facts and rectified continually by new observations, a happy tact given by nature and strengthened by numerous comparisons of its indications with experience, such are the principal means for arriving at truth." (Pierre-Simon Laplace, "A Philosophical Essay on Probabilities", 1814)

"The hypothesis is like the captain, and the observations like the soldiers of an army: while he appears to command them, and in this way to work his own will, he does in fact derive all his power of conquest from their obedience, and becomes helpless and useless if they mutiny." (William Whewell, "Philosophy of the Inductive Sciences", 1840)

"The process of scientific discovery is cautious and rigorous, not by abstaining from hypothesis, but by rigorously comparing hypotheses with facts, and by resolutely rejecting all which the comparison does not confirm." (William Whewell, "The Philosophy of the Inductive Sciences Founded Upon Their History" Vol. 2, 1840)

"When the hypothesis, of itself and without adjustment for the purpose, gives us the rule and reason of a class of facts not contemplated in its construction, we have a criterion of its reality, which has never yet been produced in favour of falsehood." (William Whewell, "The Philosophy of the Inductive Sciences", 1840) 

"An hypothesis being a mere supposition, there are no other limits to hypotheses than those of the human imagination; we may, if we please, imagine, by way of accounting for an effect, some cause of a kind utterly unknown, and acting according to a law altogether fictitious." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"It appears, then, to be a condition of a genuinely scientific hypothesis, that it be not destined always to remain an hypothesis, but be certain to be either proved or disproved by [...] comparison with observed facts." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The hypothesis, by suggesting observations and experiments, puts us upon the road to that independent evidence if it be really attainable; and till it be attained, the hypothesis ought not to count for more than a suspicion." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The rules of scientific investigation always require us, when we enter the domains of conjecture, to adopt that hypothesis by which the greatest number of known facts and phenomena may be reconciled." (Matthew F Maury, "The Physical Geography of the Sea", 1855) 

"An anticipative idea or an hypothesis is, then, the necessary starting point for all experimental reasoning. Without it, we could not make any investigation at all nor learn anything; we could only pile up sterile observations. If we experiment without a preconceived idea, we should move at random […]" (Claude Bernard, "An Introduction to the Study of Experimental Medicine", 1865)

"In scientific investigations, it is permitted to invent any hypothesis and, if it explains various large and independent classes of facts, it rises to the ranks of a well-grounded theory." (Charles Darwin, "The Variations of Animals and Plants Under Domestication" Vol. 1, 1868)

"The great tragedy of Science - the slaying of a beautiful hypothesis by an ugly fact." (Thomas H Huxley, "Biogenesis and abiogenesis", [address] 1870)

"[…] wrong hypotheses, rightly worked from, have produced more useful results than unguided observation." (Augustus de Morgan, "A Budget of Paradoxes", 1872)

"An hypothesis is only a habit - a habit of looking through a glass of one peculiar colour, which imparts its hue to all around it." (Frederick Marryat, "The King's Own", 1873) 

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"The scientific discovery appears first as the hypothesis of an analogy; and science tends to become independent of the hypothesis." (William K Clifford, "Lectures and Essays", 1879)

"Every hypothesis must derive indubitable results from mechanically well-defined assumptions by mathematically correct methods." (Ludwig Boltzmann, "Certain Questions of the Theory of Gasses", Nature Vol. 51 (1322), 1895) 

"For the truly scientific man, the hypothesis is destined solely to enable him to get the facts of nature in some definite order, an order which shall make apparent their connection with the great order and harmony which is believed to be present in the universe." (James M Baldwin, "The Processes of Life Revealed by the Microscope: A Plea for Physiological Histology", Science N.S. Vol. 2 (34), 1895)

"If the working hypothesis fails in any essential particular he [the scientist] is ready to modify or discard it. For the truly inspired investigator, one undoubted fact weighs more in the balance than a thousand theories." (James M Baldwin, "The Processes of Life Revealed by the Microscope: A Plea for Physiological Histology", Science N.S. Vol. 2 (34), 1895)

"In scientific investigations, it is permitted to invent any hypothesis and, if it explains various large and independent classes of facts, it rises to the ranks of a well-grounded theory." (Charles Darwin, "The Variations of Animals and Plants Under Domestication" Vol. 1, 1896)

"Entia non sunt multiplicanda praeter necessitatem. That is to say; before you try a complicated hypothesis, you should make quite sure that no simplification of it will explain the facts equally well." (Charles S Peirce," Pragmatism and Pragmaticism", [lecture] 1903)

"A false hypothesis, if it serve as a guide for further enquiry, may, at the right stage of science, be as useful as, or more useful than, a truer one for which acceptable evidence is not yet at hand." (William C Dampier, "Science and the Human Mind, Science in the Ancient World", 1912) 

"Without hypothesis there can be no progress in knowledge." (Max Verworn, "Irritability", 1913) 

"The great difference between induction and hypothesis is that the former infers the existence of phenomena such as we have observed in cases which are similar, while hypothesis supposes something of a different kind from what we have directly observed, and frequently something which it would be impossible for us to observe directly." (Charles S Peirce, "Chance, Love and Logic: Philosophical Essays, Deduction, Induction, Hypothesis", 1914)

"Theory is the best guide for experiment - that were it not for theory and the problems and hypotheses that come out of it, we would not know the points we wanted to verify, and hence would experiment aimlessly" (Henry Hazlitt,  "Thinking as a Science", 1916)

"A good hypothesis in science must have other properties than those of the phenomenon it is immediately invoked to explain, otherwise it is not prolific enough." (William James, "Selected Papers on Philosophy", 1918) 

"An indispensable hypothesis, even though still far from being a guarantee of success, is however the pursuit of a specific aim, whose lighted beacon, even by initial failures, is not betrayed." (Max Planck, [Nobel lecture] 1918) 

"A hypothesis or theory is clear, decisive, and positive, but it is believed by no one but the man who created it. Experimental findings, on the other hand, are messy, inexact things, which are believed by everyone except the man who did the work." (Harlow Shapley, "Review of Scientific Instruments" Vol. 6, 1922) 

"However successful a theory or law may have been in the past, directly it fails to interpret new discoveries its work is finished, and it must be discarded or modified. However plausible the hypothesis, it must be ever ready for sacrifice on the altar of observation." (Joseph W Mellor, "A Comprehensive Treatise on Inorganic and Theoretical Chemistry", 1922) 

"Hypothesis, however, is an inference based on knowledge which is insufficient to prove its high probability." (Frederick L Barry, "The Scientific Habit of Thought", 1927) 

"Abstraction is the detection of a common quality in the characteristics of a number of diverse observations […] A hypothesis serves the same purpose, but in a different way. It relates apparently diverse experiences, not by directly detecting a common quality in the experiences themselves, but by inventing a fictitious substance or process or idea, in terms of which the experience can be expressed. A hypothesis, in brief, correlates observations by adding something to them, while abstraction achieves the same end by subtracting something." (Herbert Dingle, Science and Human Experience, 1931)

"Science does not aim, primarily, at high probabilities. It aims at a high informative content, well backed by experience. But a hypothesis may be very probable simply because it tells us nothing, or very little." (Karl Popper, "The Logic of Scientific Discovery", 1934) 

"All the theories and hypotheses of empirical science share this provisional character of being established and accepted ‘until further notice’, whereas a mathematical theorem, once proved, is established once and for all; it holds with that particular certainty which no subsequent empirical discoveries, however unexpected and extraordinary, can ever affect to the slightest extent." (Carl G Hempel, "Geometry and Empirical Science", 1935)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher, "The Design of Experiments", 1935)

"The laws of science are the permanent contributions to knowledge - the individual pieces that are fitted together in an attempt to form a picture of the physical universe in action. As the pieces fall into place, we often catch glimpses of emerging patterns, called theories; they set us searching for the missing pieces that will fill in the gaps and complete the patterns. These theories, these provisional interpretations of the data in hand, are mere working hypotheses, and they are treated with scant respect until they can be tested by new pieces of the puzzle." (Edwin P Whipple, "Experiment and Experience", [Commencement Address, California Institute of Technology] 1938)

"When two hypotheses are possible, we provisionally choose that which our minds adjudge to the simpler on the supposition that this Is the more likely to lead in the direction of the truth." (James H Jeans, "Physics and Philosophy" 3rd Ed., 1943)

"We see what we want to see, and observation conforms to hypothesis." (Bergen Evans, "The Natural History of Nonsense", 1946)

"A successful hypothesis is not necessarily a permanent hypothesis, but it is one which stimulates additional research, opens up new fields, or explains and coordinates previously unrelated facts." (Farrington Daniels, "Outlines of Physical Chemistry", 1948)

"There would be cases where we would not want to accept an hypothesis even though the evidence gives a high d. c. [degree of confirmation] score, because we are fearful of the consequences of a wrong decision." (C West Churchman, "Theory of Experimental Inference", 1948) 

"Hypothesis is a tool which can cause trouble if not used properly. We must be ready to abandon out hypothesis as soon as it is shown to be inconsistent with the facts." (William I B Beveridge, "The Art of Scientific Investigation", 1950) 

"A collection of observable concepts in a purely formal hypothesis suggesting no analogy with anything would consequently not suggest either any directions for its own development." (Mary B Hesse, "Operational Definition and Analogy in Physical Theories", British Journal for the Philosophy of Science 2 (8), 1952)

"Whenever we attempt to test a hypothesis we naturally try to avoid errors in judging it. This seems to indicate the right way of proceeding: when choosing a test we should try to minimize the frequency of errors that may be committed in applying it." (Jerzy Neyman, "Lectures and Conferences on Mathematical Statistics", 1952) 

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"[…] the grand aim of all science […] is to cover the greatest possible number of empirical facts by logical deductions from the smallest possible number of hypotheses or axioms." (Albert Einstein, 1954)

"One must credit an hypothesis with all that has had to be discovered in order to demolish it." (Jean Rostand, "The substance of man", 1962)

"The formulation of a hypothesis carries with it an obligation to test it as rigorously as we can command skills to do so." (Peter Medawar, "Hypothesis and Imagination", 1963)

"Truth in science can be defined as the working hypothesis best suited to open the way to the next better one." (Konrad Lorenz, "On Aggression", 1963) 

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology", 1966)

"All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. And this system is not a more or less arbitrary and doubtful point of departure for all our arguments; no it belongs to the essence of what we call an argument. The system is not so much the point of departure, as the element in which our arguments have their life." (Ludwig Wittgenstein, "On Certainty", 1969) 

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"A hypothesis will in the end become a truth when all phenomena let themselves be derived from it in a natural and in an obvious manner, when all these consequences are connected with one another and with the general reasons, in short, when that hypothesis is consistent in all its parts with itself." (Johann H Lambert, 1976)

"The essential function of a hypothesis consists in the guidance it affords to new observations and experiments, by which our conjecture is either confirmed or refuted." (Ernst Mach, "Knowledge and Error: Sketches on the Psychology of Enquiry", 1976)

"Be suspicious of a theory if more and more hypotheses are needed to support it as new facts become available, or as new considerations are brought to bear." (Sir Fred Hoyle & Nalin C Wickramasinghe, "Evolution from Space", 1981)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"Don't just read it; fight it! Ask your own question, look for your own examples, dicover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis?" (Paul R Halmos, "I Want to be a Mathematician", 1985)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M Stigler, "Testing Hypotheses or fitting Models? Another Look at Mass Extinctions" [in "Neutral Models in Biology"], 1987)

"All science is based on models, and every scientific model comprises three distinct stages: statement of well-defined hypotheses; deduction of all the consequences of these hypotheses, and nothing but these consequences; confrontation of these consequences with observed data." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"Any physical theory is always provisional, in the sense that it is only a hypothesis: you can never prove it. No matter how many times the results of experiments agree with some theory, you can never be sure that the next time the result will not contradict the theory." (Stephen Hawking,  "A Brief History of Time", 1988)

"The heart of the scientific method is the problem-hypothesis-test process. And, necessarily, the scientific method involves predictions. And predictions, to be useful in scientific methodology, must be subject to test empirically." (Paul Davies, "The Cosmic Blueprint: New Discoveries in Nature's Creative Ability to, Order the Universe", 1988)

"The model and the theory it represents must be accepted, at least temporarily, or rejected, depending on the agreement or disagreement between observed data and the hypotheses and implications of the model. When neither the hypotheses nor the implications of a theory can be confronted with the real world, that theory is devoid of any scientific interest. Mere logical, even mathematical, deduction remains worthless in terms of the understanding of reality if it is not closely linked to that reality." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective." (Edward Teller, "Conversations on the Dark Secrets of Physics", 1991)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002) 

"[…] a conceptual model is a diagram connecting variables and constructs based on theory and logic that displays the hypotheses to be tested." (Mary W Celsi et al, "Essentials of Business Research Methods", 2011)

"Data science is an iterative process. It starts with a hypothesis (or several hypotheses) about the system we’re studying, and then we analyze the information. The results allow us to reject our initial hypotheses and refine our understanding of the data. When working with thousands of fields and millions of rows, it’s important to develop intuitive ways to reject bad hypotheses quickly." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Observation and experiment, without a rational hypothesis, is like a man groping at objects at random with his eyes shut." (Henry P Tappan, "Elements of Logic", 2015)

"A hypothesis is a starting point for an investigation. When you hypothesize, you make a claim about why something might be the case, based on limited data, to offer an explanation or a path forward. You wouldn’t make a proposition about something you are certain of. You may not have enough evidence yet to even convince you that it’s true. But making such a claim puts a stake in the ground that suggests a path for focused analysis." (Eben Hewitt, "Technology Strategy Patterns: Architecture as strategy" 2nd Ed., 2019)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

01 December 2018

🔭Data Science: Data Visualization (Just the Quotes)

"No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods." (William S Cleveland, "The Elements of Graphing Data", 1985)

"The greatest possibilities of visual display lie in vividness and inescapability of the intended message. A visual display can stop your mental flow in its tracks and make you think. A visual display can force you to notice what you never expected to see. One should see the intended at once; one should not even have to wait for it to appear." (John W Tukey, "Data-based graphics: Visual display in the decades to come", Statistical Science 5, 1990)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"Many of the applications of visualization in this book give the impression that data analysis consists of an orderly progression of exploratory graphs, fitting, and visualization of fits and residuals. Coherence of discussion and limited space necessitate a presentation that appears to imply this. Real life is usually quite different. There are blind alleys. There are mistaken actions. There are effects missed until the very end when some visualization saves the day. And worse, there is the possibility of the nearly unmentionable: missed effects." (William S Cleveland, "Visualizing Data", 1993)

"One important aspect of reality is improvisation; as a result of special structure in a set of data, or the finding of a visualization method, we stray from the standard methods for the data type to exploit the structure or the finding." (William S Cleveland, "Visualizing Data", 1993)

"There are two components to visualizing the structure of statistical data - graphing and fitting. Graphs are needed, of course, because visualization implies a process in which information is encoded on visual displays. Fitting mathematical functions to data is needed too. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of data undiscovered." (William S Cleveland, "Visualizing Data", 1993)

"Visualization is an approach to data analysis that stresses a penetrating look at the structure of data. No other approach conveys as much information. […] Conclusions spring from data when this information is combined with the prior knowledge of the subject under investigation." (William S Cleveland, "Visualizing Data", 1993)

"Visualization is an effective framework for drawing inferences from data because its revelation of the structure of data can be readily combined with prior knowledge to draw conclusions. By contrast, because of the formalism of probablistic methods, it is typically impossible to incorporate into them the full body of prior information." (William S Cleveland, "Visualizing Data", 1993)

"When visualization tools act as a catalyst to early visual thinking about a relatively unexplored problem, neither the semantics nor the pragmatics of map signs is a dominant factor. On the other hand, syntactics (or how the sign-vehicles, through variation in the visual variables used to construct them, relate logically to one another) are of critical importance." (Alan M MacEachren, "How Maps Work: Representation, Visualization, and Design", 1995)

"The nature of maps and of their use in science and society is in the midst of remarkable change - change that is stimulated by a combination of new scientific and societal needs for geo-referenced information and rapidly evolving technologies that can provide that information in innovative ways. A key issue at the heart of this change is the concept of ‘visualization’." (Alan M MacEachren, "Exploratory cartographic visualization: advancing the agenda", 1997)

"Visualization for large data is an oxymoron - the art is to reduce size before one visualizes. The contradiction (and challenge) is that we may need to visualize first in order to find out how to reduce size." (Peter Huber, "Massive datasets workshop: Four years after", Journal of Computational and Graphical Statistics Vol 8, 1999)

"Functional visualizations are more than innovative statistical analyses and computational algorithms. They must make sense to the user and require a visual language system that uses color, shape, line, hierarchy and composition to communicate clearly and appropriately, much like the alphabetic and character-based languages used worldwide between humans." (Matt Woolman, "Digital Information Graphics", 2002)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002) 

"Dashboards and visualization are cognitive tools that improve your 'span of control' over a lot of business data. These tools help people visually identify trends, patterns and anomalies, reason about what they see and help guide them toward effective decisions. As such, these tools need to leverage people's visual capabilities. With the prevalence of scorecards, dashboards and other visualization tools now widely available for business users to review their data, the issue of visual information design is more important than ever." (Richard Brath & Michael Peters, "Dashboard Design: Why Design is Important," DM Direct, 2004)

"Merely drawing a plot does not constitute visualization. Visualization is about conveying important information to the reader accurately. It should reveal information that is in the data and should not impose structure on the data." (Robert Gentleman, "Bioinformatics and Computational Biology Solutions using R and Bioconductor", 2005)

"Exploratory Data Analysis is more than just a collection of data-analysis techniques; it provides a philosophy of how to dissect a data set. It stresses the power of visualisation and aspects such as what to look for, how to look for it and how to interpret the information it contains. Most EDA techniques are graphical in nature, because the main aim of EDA is to explore data in an open-minded way. Using graphics, rather than calculations, keeps open possibilities of spotting interesting patterns or anomalies that would not be apparent with a calculation (where assumptions and decisions about the nature of the data tend to be made in advance)." (Alan Graham, "Developing Thinking in Statistics", 2006) 

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008) 

"The main goal of data visualization is its ability to visualize data, communicating information clearly and effectively. It doesn’t mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex dataset by communicating its key aspects in a more intuitive way. Yet designers often tend to discard the balance between design and function, creating gorgeous data visualizations which fail to serve its main purpose - communicate information." (Vitaly Friedman, "Data Visualization and Infographics", Smashing Magazine, 2008)

"The purpose of visualization is insight, not pictures." (Ben Shneiderman, "Extreme visualization: squeezing a billion records into a million pixels",  SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD, 2008)

"With the ever increasing amount of empirical information that scientists from all disciplines are dealing with, there exists a great need for robust, scalable and easy to use clustering techniques for data abstraction, dimensionality reduction or visualization to cope with and manage this avalanche of data."  (Jörg Reichardt, "Structure in Complex Networks", 2009)

"So what is the difference between a chart or graph and a visualization? […] a chart or graph is a clean and simple atomic piece; bar charts contain a short story about the data being presented. A visualization, on the other hand, seems to contain much more ʻchart junkʼ, with many sometimes complex graphics or several layers of charts and graphs. A visualization seems to be the super-set for all sorts of data-driven design." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers." (J Heer et al, "A tour through the visualization zoo", Queue 8, 2010) 

"All graphics present data and allow a certain degree of exploration of those same data. Some graphics are almost all presentation, so they allow just a limited amount of exploration; hence we can say they are more infographics than visualization, whereas others are mostly about letting readers play with what is being shown, tilting more to the visualization side of our linear scale. But every infographic and every visualization has a presentation and an exploration component: they present, but they also facilitate the analysis of what they show, to different degrees." (Alberto Cairo, "The Functional Art", 2011)

"Exploratory data visualizations are appropriate when you have a whole bunch of data and you’re not sure what’s in it. […] By contrast, explanatory data visualization is appropriate when you already know what the data has to say, and you are trying to tell that story to somebody else." (Noah Iliinsky & Julie Steele, "Designing Data Visualizations", 2011)

"In data visualization, the number one rule of thumb to bear is mind is: Function first, suave second." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"The first and main goal of any graphic and visualization is to be a tool for your eyes and brain to perceive what lies beyond their natural reach." (Alberto Cairo, "The Functional Art", 2011)

"Thinking of graphics as art leads many to put bells and whistles over substance and to confound infographics with mere illustrations." (Alberto Cairo, "The Functional Art", 2011)

"[...] the terms data visualization and information visualization (casually, data viz and info viz) are useful for referring to any visual representation of data that is: (•) algorithmically drawn (may have custom touches but is largely rendered with the help of computerized methods); (•) easy to regenerate with different data (the same form may be repurposed to represent different datasets with similar dimensions or characteristics); (•) often aesthetically barren (data is not decorated); and (•) relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics)." (Noah Iliinsky & Julie Steel, "Designing Data Visualizations", 2011)

"Visualizations act as a campfire around which we gather to tell stories." (Al Shalloway, 2011)

"Good infographic design is about storytelling by combining data visualization design and graphic design." (Randy Krum, "Good Infographics: Effective Communication with Data Visualization and Design", 2013)

"Good visualization is a winding process that requires statistics and design knowledge. Without the former, the visualization becomes an exercise only in illustration and aesthetics, and without the latter, one of only analyses. On their own, these are fine skills, but they make for incomplete data graphics. Having skills in both provides you with the luxury - which is growing into a necessity - to jump back and forth between data exploration and storytelling." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The biggest thing to know is that data visualization is hard. Really difficult to pull off well. It requires harmonization of several skills sets and ways of thinking: conceptual, analytic, statistical, graphic design, programmatic, interface-design, story-telling, journalism - plus a bit of 'gut feel'. The end result is often simple and beautiful, but the process itself is usually challenging and messy." (David McCandless, 2013)

"Visualization can be appreciated purely from an aesthetic point of view, but it’s most interesting when it’s about data that’s worth looking at. That’s why you start with data, explore it, and then show results rather than start with a visual and try to squeeze a dataset into it. It’s like trying to use a hammer to bang in a bunch of screws. […] Aesthetics isn’t just a shiny veneer that you slap on at the last minute. It represents the thought you put into a visualization, which is tightly coupled with clarity and affects interpretation." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Visualization is what happens when you make the jump from raw data to bar graphs, line charts, and dot plots. […] In its most basic form, visualization is simply mapping data to geometry and color. It works because your brain is wired to find patterns, and you can switch back and forth between the visual and the numbers it represents. This is the important bit. You must make sure that the essence of the data isn’t lost in that back and forth between visual and the value it represents because if you can’t map back to the data, the visualization is just a bunch of shapes." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"What is good visualization? It is a representation of data that helps you see what you otherwise would have been blind to if you looked only at the naked source. It enables you to see trends, patterns, and outliers that tell you about yourself and what surrounds you. The best visualization evokes that moment of bliss when seeing something for the first time, knowing that what you see has been right in front of you, just slightly hidden. Sometimes it is a simple bar graph, and other times the visualization is complex because the data requires it." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"Just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action. Exhibiting a healthy skepticism is almost always a good thing." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"To be sure, data doesn’t always need to be visualized, and many data visualizations just plain suck. Look around you. It’s not hard to find truly awful representations of information. Some work in concept but fail because they are too busy; they confuse people more than they convey information [...]. Visualization for the sake of visualization is unlikely to produce desired results - and this goes double in an era of Big Data. Bad is still bad, even and especially at a larger scale." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"We are all becoming more comfortable with data. Data visualization is no longer just something we have to do at work. Increasingly, we want to do it as consumers and as citizens. Put simply, visualizing helps us understand what’s going on in our lives - and how to solve problems." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Data visualization is marketed today as the miracle cure that will open the doors to success, whatever its shape. We have enough experience to realize that in reality it’s not always easy to distinguish between real usefulness and zealous marketing. After the initial excitement over the prospects of data visualization comes disillusionment, and after that the possibility of a balanced assessment. The key is to get to this point quickly, without disappointments and at a lower cost." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"[...] data visualization [is] a tool that, by applying perceptual mechanisms to the visual representation of abstract quantitative data, facilitates the search for relevant shapes, order, or exceptions. [...]  We must think of data visualization as a generic field where several (combinations of) perspectives, processes, technologies, and objectives (not forgetting the subjective component of personal style) can coexist. In this sense, data art, infographics, and business visualization are branches of data visualization." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"Data visualization is not a science; it is a crossroads at which certain scientific knowledge is used to justify and frame subjective choices. Ÿis doesn’t mean that rules don’t count. Rules exist and are effective when applied within the context for which they were designed." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"[…] no single visualization is ever quite able to show all of the important aspects of our data at once - there just are not enough visual encoding channels. […] designing effective visualizations to make sense of data is not an art - it is a systematic and repeatable process." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

 "[…] the data itself can lead to new questions too. In exploratory data analysis (EDA), for example, the data analyst discovers new questions based on the data. The process of looking at the data to address some of these questions generates incidental visualizations - odd patterns, outliers, or surprising correlations that are worth looking into further." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The field of [data] visualization takes on that goal more broadly: rather than attempting to identify a single metric, the analyst instead tries to look more holistically across the data to get a usable, actionable answer. Arriving at that answer might involve exploring multiple attributes, and using a number of views that allow the ideas to come together. Thus, operationalization in the context of visualization is the process of identifying tasks to be performed over the dataset that are a reasonable approximation of the high-level question of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Apart from the technical challenge of working with the data itself, visualization in big data is different because showing the individual observations is just not an option. But visualization is essential here: for analysis to work well, we have to be assured that patterns and errors in the data have been spotted and understood. That is only possible by visualization with big data, because nobody can look over the data in a table or spreadsheet." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"As a first principle, any visualization should convey its information quickly and easily, and with minimal scope for misunderstanding. Unnecessary visual clutter makes more work for the reader’s brain to do, slows down the understanding (at which point they may give up) and may even allow some incorrect interpretations to creep in." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Data storytelling can be defined as a structured approach for communicating data insights using narrative elements and explanatory visuals." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Data storytelling involves the skillful combination of three key elements: data, narrative, and visuals. Data is the primary building block of every data story. It may sound simple, but a data story should always find its origin in data, and data should serve as the foundation for the narrative and visual elements of your story." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"(1) Good data visualization is trustworthy: Is it reliable? Is the portrayal of the data and the subject faithful? Do the representation and presentation design have integrity? (2) Good data visualization is accessible: Is it usable? Is the portrayal of the data and the subject relevant? Is the representation and presentation design suitably understandable? (3) Good data visualization is elegant: Is it aesthetic? Is the representation and presentation design appealing?" (Andy Kirk, "Data Visualisation: A Handbook for Data Driven Design" 2nd Ed., 2019)

"In addition to managing how the data is visualized to reduce noise, you can also decrease the visual interference by minimizing the extraneous cognitive load. In these cases, the nonrelevant information and design elements surrounding the data can cause extraneous noise. Poor design or display decisions by the data storyteller can inadvertently interfere with the communication of the intended signal. This form of noise can occur at both a macro and micro level." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"One very common problem in data visualization is that encoding numerical variables to area is incredibly popular, but readers can’t translate it back very well." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"There is often no one 'best' visualization, because it depends on context, what your audience already knows, how numerate or scientifically trained they are, what formats and conventions are regarded as standard in the particular field you’re working in, the medium you can use, and so on. It’s also partly scientific and partly artistic, so you get to express your own design style in it, which is what makes it so fascinating." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019) 

"When visuals are applied to data, they can enlighten the audience to insights that they wouldn’t see without charts or graphs. Many interesting patterns and outliers in the data would remain hidden in the rows and columns of data tables without the help of data visualizations. They connect with our visual nature as human beings and impart knowledge that couldn’t be obtained as easily using other approaches that involve just words or numbers." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"While visuals are an essential part of data storytelling, data visualizations can serve a variety of purposes from analysis to communication to even art. Most data charts are designed to disseminate information in a visual manner. Only a subset of data compositions is focused on presenting specific insights as opposed to just general information. When most data compositions combine both visualizations and text, it can be difficult to discern whether a particular scenario falls into the realm of data storytelling or not." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Another problem is that while data visualizations may appear to be objective, the designer has a great deal of control over the message a graphic conveys. Even using accurate data, a designer can manipulate how those data make us feel. She can create the illusion of a correlation where none exists, or make a small difference between groups look big." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"As presenters of data visualizations, often we just want our audience to understand something about their environment – a trend, a pattern, a breakdown, a way in which things have been progressing. If we ask ourselves what we want our audience to do with that information, we might have a hard time coming up with a clear answer sometimes. We might just want them to know something." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"Data visualizations are either used (1) to help people complete a task, or (2) to give them a general awareness of the way things are, or (3) to enable them to explore the topic for themselves."  (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

"Much of the data visualization that bombards us today is decoration at best, and distraction or even disinformation at worst. The decorative function is surprisingly common, perhaps because the data visualization teams of many media organizations are part of the art departments. They are led by people whose skills and experience are not in statistics but in illustration or graphic design. The emphasis is on the visualization, not on the data. It is, above all, a picture." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"A data visualization, or dashboard, is great for summarizing or describing what has gone on in the past, but if people don’t know how to progress beyond looking just backwards on what has happened, then they cannot diagnose and find the ‘why’ behind it." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is for the masses, and data visualization is powerful to simplify what could be very complicated." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Understanding the entire data ecosystem, from the production of a data point to its consumption in a dashboard or a visualization, provides the ability to invoke action, which is more valuable than the mere sum of its parts." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Data visualization is a simplified approach to studying data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data visualization is a mix of science and art. Sometimes we want to be closer to the science side of the spectrum - in other words, use visualizations that allow readers to more accurately perceive the absolute values of data and make comparisons. Other times we may want to be closer to the art side of the spectrum and create visuals that engage and excite the reader, even if they do not permit the most accurate comparisons." (Jonathan Schwabish, "Better Data Visualizations: A guide for scholars, researchers, and wonks", 2021)

"I agree that data visualizations should be visually appealing, driving and utilizing the appeal and power for individuals to utilize it effectively, but sometimes this can take too much time, taking it away from more valuable uses in data. Plus, if the data visualization is not moving the needle of a business goal or objective, how effective is that visualization?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data becomes more useful once it’s transformed into a data visualization or used in a data story. Data storytelling is the ability to effectively communicate insights from a dataset using narratives and visualizations. It can be used to put data insights into context and inspire action from your audience. Color can be very helpful when you are trying to make information stand out within your data visualizations." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"Data visualization is the practice of taking insights found in data analysis and turning them into numbers, graphs, charts, and other visual concepts to make them easier to grasp, understand, learn from, and utilize.[...] The visualization of data can be thought of as both a science and an art in that the way it is displayed is often as important to its understanding as the actual information that is being displayed." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"The best approach is to build visualizations in the most digestible form, fitted to how that executive thinks. You will have to interact with executives, show them different visualizations, and see how they react in order to learn which forms work best for them. Be ready to fail often and learn fast, particularly with visualizations." (John Lucker)

"Visualisation is fundamentally limited by the number of pixels you can pump to a screen. If you have big data, you have way more data than pixels, so you have to summarise your data. Statistics gives you lots of really good tools for this." (Hadley Wickham)

"We often think of visualization as a design and programming task, but the process starts further back with the data. You have to understand the data - its trends and patterns, along with its flaws and imperfections - and the rest follows." (Nathan Yau)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.