Showing posts with label testing. Show all posts

17 March 2024

Business Intelligence: Data Products (Part II: The Complexity Challenge)

Business Intelligence Series

Creating data products within a data mesh comes down to "partitioning" a given set of inputs, outputs and transformations into something that looks like a Lego structure, in which each Lego piece represents a data product. The word "partition" is used loosely, as inputs, outputs and transformations can overlap, though in an ideal solution the outcome should be close to a partition.

The complexity of the inputs and outputs can be neglected, even when their number is large, but the same cannot be said about the transformations that must be performed in the process. Moreover, the transformations involve reengineering the logic built into the source systems, which is not a trivial task and requires adequate testing. The transformations are a must and there's no way to avoid them.

When designing a data warehouse or data mart, one of the goals is to keep the redundancy of the transformations and of the intermediary results to a minimum, to avoid unnecessary duplication of code and data. Code duplication usually becomes an issue when the logic needs to be changed, and in business contexts that can happen often enough to create other challenges. Data duplication becomes an issue when the copies are not in sync, which happens when the code behind them is not synchronized or when they are refreshed at different rates.

Building the transformations as SQL-based database objects has its advantages. There have been many attempts to provide non-SQL operators for the same purpose (in SSIS, Power Query), though the solutions built on them are difficult to troubleshoot and maintain, and the overall complexity increases with the volume of transformations that must be performed. In data meshes, the complexity also increases with the number of data products involved, especially when there are multiple stakeholders with different goals (see the challenges of developing data marts that are supposed to be domain-specific).

Organizations answer growing complexity with complexity. On one side are the teams of developers, business users and other members of the governance teams, who together with the solution form an ecosystem. On the other side are the inherent coordination and organization meetings, the management of proposals, the negotiation of scope for data products, their design, testing, etc. The more complex the whole ecosystem becomes, the higher the chances for systemic errors to occur and multiply, and for the parties involved to behave in unwanted ways. Ecosystems are challenging to monitor and manage.

The more complex the architecture, the higher the chances of failure. Even if some organizations might succeed, it doesn't mean that such an endeavor is for everybody - a certain maturity in building data architectures, data-based artefacts and managing projects must exist in the organization. Many organizations fail to address basic analytical requirements, so why would one think that they are capable of handling increased complexity? Even if one breaks the complexity of a data warehouse into more manageable units, the complexity is just moved to other levels that are more difficult to manage as a whole.

Being able to audit and test each data product individually has its advantages, though when a data product becomes part of an aggregate it can easily get lost in the bigger picture. Thus, a global observability framework is needed that allows monitoring the performance and health of each data product within the aggregate. Besides that, event brokers and other mechanisms are needed to handle failure, availability, security, etc.
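To make the point about aggregate health more concrete, here is a minimal Python sketch of how a product that passes its own checks can still be at risk because of its inputs. Everything in it is an assumption made for illustration: the `DataProduct` fields, the status labels, the freshness thresholds and the example product names do not come from any particular data mesh platform, and the dependency graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    # All fields are hypothetical and only serve the illustration.
    name: str
    hours_since_refresh: float   # observed freshness
    max_staleness_hours: float   # agreed freshness threshold
    failed_checks: int = 0       # failed data quality tests in the last run
    upstream: list[str] = field(default_factory=list)  # names of input products


def own_status(p: DataProduct) -> str:
    """Health of a product judged in isolation."""
    if p.failed_checks > 0:
        return "failing"
    if p.hours_since_refresh > p.max_staleness_hours:
        return "stale"
    return "healthy"


def mesh_status(products: list[DataProduct]) -> dict[str, str]:
    """Health of each product once its upstream inputs are considered.

    Assumes the dependencies form an acyclic graph.
    """
    by_name = {p.name: p for p in products}
    resolved: dict[str, str] = {}

    def resolve(name: str) -> str:
        if name not in resolved:
            p = by_name[name]
            status = own_status(p)
            # A product that looks fine alone is still at risk if an input is not.
            if status == "healthy" and any(resolve(u) != "healthy" for u in p.upstream):
                status = "at risk"
            resolved[name] = status
        return resolved[name]

    for p in products:
        resolve(p.name)
    return resolved


products = [
    DataProduct("orders", hours_since_refresh=2, max_staleness_hours=24),
    DataProduct("customers", hours_since_refresh=30, max_staleness_hours=24),
    DataProduct("sales_report", hours_since_refresh=1, max_staleness_hours=24,
                upstream=["orders", "customers"]),
]
status = mesh_status(products)
print(status)
```

In this example `sales_report` is fresh and passes its own checks, yet it is flagged because `customers` is stale. That is the kind of issue that individual audits of each product would miss.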

Data products make sense in certain scenarios, especially when the complexity of architectures is manageable, though attempting to redesign everything from their perspective is like having a hammer in one's hand and treating everything like a nail.

30 December 2018

Data Science: Testing (Just the Quotes)

"We must trust to nothing but facts: These are presented to us by Nature, and cannot deceive. We ought, in every instance, to submit our reasoning to the test of experiment, and never to search for truth but by the natural road of experiment and observation." (Antoine-Laurent de Lavoisier, "Elements of Chemistry", 1790)

"A law of nature, however, is not a mere logical conception that we have adopted as a kind of memoria technica to enable us to more readily remember facts. We of the present day have already sufficient insight to know that the laws of nature are not things which we can evolve by any speculative method. On the contrary, we have to discover them in the facts; we have to test them by repeated observation or experiment, in constantly new cases, under ever-varying circumstances; and in proportion only as they hold good under a constantly increasing change of conditions, in a constantly increasing number of cases with greater delicacy in the means of observation, does our confidence in their trustworthiness rise." (Hermann von Helmholtz, "Popular Lectures on Scientific Subjects", 1873)

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl Popper, "The Logic of Scientific Discovery", 1934)

"Science, in the broadest sense, is the entire body of the most accurately tested, critically established, systematized knowledge available about that part of the universe which has come under human observation. For the most part this knowledge concerns the forces impinging upon human beings in the serious business of living and thus affecting man’s adjustment to and of the physical and the social world. […] Pure science is more interested in understanding, and applied science is more interested in control […]" (Austin L Porterfield, "Creative Factors in Scientific Research", 1941)

"To a scientist a theory is something to be tested. He seeks not to defend his beliefs, but to improve them. He is, above everything else, an expert at ‘changing his mind’." (Wendell Johnson, 1946)

"As usual we may make the errors of I) rejecting the null hypothesis when it is true, II) accepting the null hypothesis when it is false. But there is a third kind of error which is of interest because the present test of significance is tied up closely with the idea of making a correct decision about which distribution function has slipped furthest to the right. We may make the error of III) correctly rejecting the null hypothesis for the wrong reason." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper", Journal of the American Statistical Association 44, 1949)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenshields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"The main purpose of a significance test is to inhibit the natural enthusiasm of the investigator." (Frederick Mosteller, "Selected Quantitative Techniques", 1954)

"The methods of science may be described as the discovery of laws, the explanation of laws by theories, and the testing of theories by new observations. A good analogy is that of the jigsaw puzzle, for which the laws are the individual pieces, the theories local patterns suggested by a few pieces, and the tests the completion of these patterns with pieces previously unconsidered." (Edwin P Hubble, "The Nature of Science and Other Lectures", 1954)

"Science is the creation of concepts and their exploration in the facts. It has no other test of the concept than its empirical truth to fact." (Jacob Bronowski, "Science and Human Values", 1956)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confident statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller, "A Comparison of Eight Models", Studies in Mathematical Learning Theory, 1959)

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"It is easy to obtain confirmations, or verifications, for nearly every theory - if we look for confirmations. Confirmations should count only if they are the result of risky predictions. […] A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice. Every genuine test of a theory is an attempt to falsify it, or refute it." (Karl R Popper, "Conjectures and Refutations: The Growth of Scientific Knowledge", 1963)

"The final test of a theory is its capacity to solve the problems which originated it." (George Dantzig, "Linear Programming and Extensions", 1963)

"The mediation of theory and praxis can only be clarified if to begin with we distinguish three functions, which are measured in terms of different criteria: the formation and extension of critical theorems, which can stand up to scientific discourse; the organization of processes of enlightenment, in which such theorems are applied and can be tested in a unique manner by the initiation of processes of reflection carried on within certain groups toward which these processes have been directed; and the selection of appropriate strategies, the solution of tactical questions, and the conduct of the political struggle. On the first level, the aim is true statements, on the second, authentic insights, and on the third, prudent decisions." (Jürgen Habermas, "Introduction to Theory and Practice", 1963)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology", 1966)

"Discovery always carries an honorific connotation. It is the stamp of approval on a finding of lasting value. Many laws and theories have come and gone in the history of science, but they are not spoken of as discoveries. […] Theories are especially precarious, as this century profoundly testifies. World views can and do often change. Despite these difficulties, it is still true that to count as a discovery a finding must be of at least relatively permanent value, as shown by its inclusion in the generally accepted body of scientific knowledge." (Richard J. Blackwell, "Discovery in the Physical Sciences", 1969)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"Science is systematic organisation of knowledge about the universe on the basis of explanatory hypotheses which are genuinely testable. Science advances by developing gradually more comprehensive theories; that is, by formulating theories of greater generality which can account for observational statements and hypotheses which appear as prima facie unrelated." (Francisco J Ayala, "Studies in the Philosophy of Biology: Reduction and Related Problems", 1974)

"A good scientific law or theory is falsifiable just because it makes definite claims about the world. For the falsificationist, it follows fairly readily from this that the more falsifiable a theory is the better, in some loose sense of more. The more a theory claims, the more potential opportunities there will be for showing that the world does not in fact behave in the way laid down by the theory. A very good theory will be one that makes very wide-ranging claims about the world, and which is consequently highly falsifiable, and is one that resists falsification whenever it is put to the test." (Alan F Chalmers, "What Is This Thing Called Science?", 1976)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, "Mind and Nature, A necessary unity", 1979)

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...] For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. [...] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"Theoretical scientists, inching away from the safe and known, skirting the point of no return, confront nature with a free invention of the intellect. They strip the discovery down and wire it into place in the form of mathematical models or other abstractions that define the perceived relation exactly. The now-naked idea is scrutinized with as much coldness and outward lack of pity as the naturally warm human heart can muster. They try to put it to use, devising experiments or field observations to test its claims. By the rules of scientific procedure it is then either discarded or temporarily sustained. Either way, the central theory encompassing it grows. If the abstractions survive they generate new knowledge from which further exploratory trips of the mind can be planned. Through the repeated alternation between flights of the imagination and the accretion of hard data, a mutual agreement on the workings of the world is written, in the form of natural law." (Edward O Wilson, "Biophilia", 1984)

"Models are often used to decide issues in situations marked by uncertainty. However statistical inferences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Freedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"Science has become a social method of inquiring into natural phenomena, making intuitive and systematic explorations of laws which are formulated by observing nature, and then rigorously testing their accuracy in the form of predictions. The results are then stored as written or mathematical records which are copied and disseminated to others, both within and beyond any given generation. As a sort of synergetic, rigorously regulated group perception, the collective enterprise of science far transcends the activity within an individual brain." (Lynn Margulis & Dorion Sagan, "Microcosmos", 1986)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M. Stigler, "Neutral Models in Biology", 1987)

"Science doesn't purvey absolute truth. Science is a mechanism. It's a way of trying to improve your knowledge of nature. It's a system for testing your thoughts against the universe and seeing whether they match. And this works, not just for the ordinary aspects of science, but for all of life. I should think people would want to know that what they know is truly what the universe is like, or at least as close as they can get to it." (Isaac Asimov, [Interview by Bill Moyers] 1988)

"The heart of the scientific method is the problem-hypothesis-test process. And, necessarily, the scientific method involves predictions. And predictions, to be useful in scientific methodology, must be subject to test empirically." (Paul Davies, "The Cosmic Blueprint: New Discoveries in Nature's Creative Ability to Order the Universe", 1988)

"Science doesn’t purvey absolute truth. Science is a mechanism, a way of trying to improve your knowledge of nature. It’s a system for testing your thoughts against the universe, and seeing whether they match." (Isaac Asimov, [interview with Bill Moyers in The Humanist] 1989)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)

"On this view, we recognize science to be the search for algorithmic compressions. We list sequences of observed data. We try to formulate algorithms that compactly represent the information content of those sequences. Then we test the correctness of our hypothetical abbreviations by using them to predict the next terms in the string. These predictions can then be compared with the future direction of the data sequence. Without the development of algorithmic compressions of data all science would be replaced by mindless stamp collecting - the indiscriminate accumulation of every available fact. Science is predicated upon the belief that the Universe is algorithmically compressible and the modern search for a Theory of Everything is the ultimate expression of that belief, a belief that there is an abbreviated representation of the logic behind the Universe's properties that can be written down in finite form by human beings." (John D Barrow, "New Theories of Everything", 1991)

"Scientists use mathematics to build mental universes. They write down mathematical descriptions - models - that capture essential fragments of how they think the world behaves. Then they analyse their consequences. This is called 'theory'. They test their theories against observations: this is called 'experiment'. Depending on the result, they may modify the mathematical model and repeat the cycle until theory and experiment agree. Not that it's really that simple; but that's the general gist of it, the essence of the scientific method." (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"The amount of understanding produced by a theory is determined by how well it meets the criteria of adequacy - testability, fruitfulness, scope, simplicity, conservatism - because these criteria indicate the extent to which a theory systematizes and unifies our knowledge." (Theodore Schick Jr., "How to Think about Weird Things: Critical Thinking for a New Age", 1995)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Science is distinguished not for asserting that nature is rational, but for constantly testing claims to that or any other effect by observation and experiment." (Timothy Ferris, "The Whole Shebang: A State-of-the-Universe(s) Report", 1996)

"There are two kinds of mistakes. There are fatal mistakes that destroy a theory; but there are also contingent ones, which are useful in testing the stability of a theory." (Gian-Carlo Rota, [lecture] 1996)

"Validation is the process of testing how good the solutions produced by a system are. The results produced by a system are usually compared with the results obtained either by experts or by other systems. Validation is an extremely important part of the process of developing every knowledge-based system. Without comparing the results produced by the system with reality, there is little point in using it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test." (Richard Feynman, "The Meaning of It All", 1998)

"Let us regard a proof of an assertion as a purely mechanical procedure using precise rules of inference starting with a few unassailable axioms. This means that an algorithm can be devised for testing the validity of an alleged proof simply by checking the successive steps of the argument; the rules of inference constitute an algorithm for generating all the statements that can be deduced in a finite number of steps from the axioms." (Edward Beltrami, "What is Random?: Chaos and Order in Mathematics and Life", 1999)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical Modeling: The two cultures", Statistical Science 16(3), 2001)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002)

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"In science, for a theory to be believed, it must make a prediction - different from those made by previous theories - for an experiment not yet done. For the experiment to be meaningful, we must be able to get an answer that disagrees with that prediction. When this is the case, we say that a theory is falsifiable - vulnerable to being shown false. The theory also has to be confirmable, it must be possible to verify a new prediction that only this theory makes. Only when a theory has been tested and the results agree with the theory do we advance the statement to the rank of a true scientific theory." (Lee Smolin, "The Trouble with Physics", 2006)

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"Each systems archetype embodies a particular theory about dynamic behavior that can serve as a starting point for selecting and formulating raw data into a coherent set of interrelationships. Once those relationships are made explicit and precise, the 'theory' of the archetype can then further guide us in our data-gathering process to test the causal relationships through direct observation, data analysis, or group deliberation." (Daniel H Kim, "Systems Archetypes as Dynamic Theories", The Systems Thinker Vol. 24 (1), 2013)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"Machine learning is a science and requires an objective approach to problems. Just like the scientific method, test-driven development can aid in solving a problem. The reason that TDD and the scientific method are so similar is because of these three shared characteristics: Both propose that the solution is logical and valid. Both share results through documentation and work over time. Both work in feedback loops." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Early stopping and regularization can ensure network generalization when you apply them properly. [...] With early stopping, the choice of the validation set is also important. The validation set should be representative of all points in the training set. When you use Bayesian regularization, it is important to train the network until it reaches convergence. The sum-squared error, the sum-squared weights, and the effective number of parameters should reach constant values when the network has converged. With both early stopping and regularization, it is a good idea to train the network starting from several different initial conditions. It is possible for either method to fail in certain circumstances. By testing several different initial conditions, you can verify robust network performance." (Mark H Beale et al, "Neural Network Toolbox™ User's Guide", 2017)

"Scientists generally agree that no theory is 100 percent correct. Thus, the real test of knowledge is not truth, but utility." (Yuval N Harari, "Sapiens: A brief history of humankind", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)

22 December 2018

Data Science: Significance (Just the Quotes)

"What the use of P [the significance level] implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." (Harold Jeffreys, "Theory of Probability", 1939)

"As usual we may make the errors of I) rejecting the null hypothesis when it is true, II) accepting the null hypothesis when it is false. But there is a third kind of error which is of interest because the present test of significance is tied up closely with the idea of making a correct decision about which distribution function has slipped furthest to the right. We may make the error of III) correctly rejecting the null hypothesis for the wrong reason." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper?", Journal of the American Statistical Association 44, 1949)

"It will, of course, happen but rarely that the proportions will be identical, even if no real association exists. Evidently, therefore, we need a significance test to reassure ourselves that the observed difference of proportion is greater than could reasonably be attributed to chance. The significance test will test the reality of the association, without telling us anything about the intensity of association. It will be apparent that we need two distinct things: (a) a test of significance, to be used on the data first of all, and (b) some measure of the intensity of the association, which we shall only be justified in using if the significance test confirms that the association is real." (Michael J Moroney, "Facts from Figures", 1951)

"The main purpose of a significance test is to inhibit the natural enthusiasm of the investigator." (Frederick Mosteller, "Selected Quantitative Techniques", 1954)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

"Science usually amounts to a lot more than blind trial and error. Good statistics consists of much more than just significance tests; there are more sophisticated tools available for the analysis of results, such as confidence statements, multiple comparisons, and Bayesian analysis, to drop a few names. However, not all scientists are good statisticians, or want to be, and not all people who are called scientists by the media deserve to be so described." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"The idea of statistical significance is valuable because it often keeps us from announcing results that later turn out to be nonresults. A significant result tells us that enough cases were observed to provide reasonable assurance of a real effect. It does not necessarily mean, though, that the effect is big enough to be important." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"A tendency to drastically underestimate the frequency of coincidences is a prime characteristic of innumerates, who generally accord great significance to correspondences of all sorts while attributing too little significance to quite conclusive but less flashy statistical evidence." (John A Paulos, "Innumeracy: Mathematical Illiteracy and its Consequences", 1988)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen,"Things I Have Learned (So Far)", American Psychologist, 1990)

"Statistical significance testing can involve a tautological logic in which tired researchers, having collected data on hundreds of subjects, then conduct a statistical test to evaluate whether there were a lot of subjects, which the researchers already know, because they collected the data and know they are tired. This tautology has created considerable damage as regards the cumulation of knowledge." (Bruce Thompson, "Two and One-Half Decades of Leadership in Measurement and Evaluation", Journal of Counseling & Development 70 (3), 1992)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"Graphic misrepresentation is a frequent misuse in presentations to the nonprofessional. The granddaddy of all graphical offenses is to omit the zero on the vertical axis. As a consequence, the chart is often interpreted as if its bottom axis were zero, even though it may be far removed. This can lead to attention-getting headlines about 'a soar' or 'a dramatic rise (or fall)'. A modest, and possibly insignificant, change is amplified into a disastrous or inspirational trend." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"If you flip a coin three times and it lands on heads each time, it's probably chance. If you flip it a hundred times and it lands on heads each time, you can be pretty sure the coin has heads on both sides. That's the concept behind statistical significance - it's the odds that the correlation (or other finding) is real, that it isn't just random chance." (T Colin Campbell, "The China Study", 2004)

"The dual meaning of the word significant brings into focus the distinction between drawing a mathematical inference and practical inference from statistical results." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"For the study of the topology of the interactions of a complex system it is of central importance to have proper random null models of networks, i.e., models of how a graph arises from a random process. Such models are needed for comparison with real world data. When analyzing the structure of real world networks, the null hypothesis shall always be that the link structure is due to chance alone. This null hypothesis may only be rejected if the link structure found differs significantly from an expectation value obtained from a random model. Any deviation from the random null model must be explained by non-random processes." (Jörg Reichardt, "Structure in Complex Networks", 2009)

"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using] models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)

"If the group is large enough, even very small differences can become statistically significant." (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers." (Gary Smith, "Standard Deviations", 2014)

"In short, statistical significance does not mean your result has any practical significance. As for statistical insignificance, it doesn’t tell you much. A statistically insignificant difference could be nothing but noise, or it could represent a real effect that can be pinned down only with more data." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship 'statistically' exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"Statistical significance refers to the probability that something is true. It’s a measure of how probable it is that the effect we’re seeing is real (rather than due to chance occurrence), which is why it’s typically measured with a p-value. P, in this case, stands for probability. If you accept p-values as a measure of statistical significance, then the lower your p-value is, the less likely it is that the results you’re seeing are due to chance alone." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

More quotes on "Significance" at the-web-of-knowledge.blogspot.com.

04 December 2018

Data Science: Hypothesis Testing (Just the Quotes)

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. And this system is not a more or less arbitrary and doubtful point of departure for all our arguments; no it belongs to the essence of what we call an argument. The system is not so much the point of departure, as the element in which our arguments have their life." (Ludwig Wittgenstein, "On Certainty", 1969)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"Decision-making problems (hypothesis testing) involve situations where it is desired to make a choice among various alternative decisions (hypotheses). Such problems can be viewed as generalized state estimation problems where the definition of state has simply been expanded." (Fred C Scweppe, "Uncertain dynamic systems", 1973)

"Hypothesis testing can introduce the need for multiple models for the multiple hypotheses and,' if appropriate, a priori probabilities. The one modeling aspect of hypothesis testing that has no estimation counterpart is the problem of specifying the hypotheses to be considered. Often this is a critical step which influences both performance arid the difficulty of implementation." (Fred C Scweppe, "Uncertain dynamic systems", 1973)

"Pattern recognition can be viewed as a special case of hypothesis testing. In pattern recognition, an observation z is to be used to decide what pattern caused it. Each possible pattern can be viewed as one hypothesis. The main problem in pattern recognition is the development of models for the z corresponding to each pattern (hypothesis)." (Fred C Scweppe, "Uncertain dynamic systems", 1973)

"The term hypothesis testing arises because the choice as to which process is observed is based on hypothesized models. Thus hypothesis testing could also be called model testing. Hypothesis testing is sometimes called decision theory. The detection theory of communication theory is a special case." (Fred C Scweppe, "Uncertain dynamic systems", 1973)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M. Stigler, "Neutral Models in Biology", 1987)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"The way we explore data today, we often aren't constrained by rigid hypothesis testing or statistical rigor that can slow down the process to a crawl. But we need to be careful with this rapid pace of exploration, too. Modern business intelligence and analytics tools allow us to do so much with data so quickly that it can be easy to fall into a pitfall by creating a chart that misleads us in the early stages of the process." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

Data Science: Null Hypothesis (Just the Quotes)

"The first step in beginning the scientific study of a problem is to collect the data, which are or ought to be 'facts'." (John A Thomson, "Introduction to Science", 1911)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher, "The Design of Experiments", 1935)

"The essential feature is that we express ignorance of whether the new parameter is needed by taking half the prior probability for it as concentrated in the value indicated by the null hypothesis and distributing the other half over the range possible." (Harold Jeffreys, "Theory of Probablitity", 1939)

"What the use of P [the significance level] implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." (Harold Jeffreys, "Theory of Probability", 1939)

"As usual we may make the errors of I) rejecting the null hypothesis when it is true, II) accepting the null hypothesis when it is false. But there is a third kind of error which is of interest because the present test of significance is tied up closely with the idea of making a correct decision about which distribution function has slipped furthest to the right. We may make the error of III) correctly rejecting the null hypothesis for the wrong reason." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"It is very easy to devise different tests which, on the average, have similar properties, [...] hey behave satisfactorily when the null hypothesis is true and have approximately the same power of detecting departures from that hypothesis. Two such tests may, however, give very different results when applied to a given set of data. The situation leads to a good deal of contention amongst statisticians and much discredit of the science of statistics. The appalling position can easily arise in which one can get any answer one wants if only one goes around to a large enough number of statisticians." (Frances Yates, "Discussion on the Paper by Dr. Box and Dr. Andersen", Journal of the Royal Statistical Society B Vol. 17, 1955)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963) 

"Operational research is the application of methods of the research scientist to various rather complex practical operations. [...] A paucity of numerical data with which to work is a usual characteristic of the operations to which operational research is applied." (John T Davies, "The Scientific Approach", 1965)

"[…] most of us still remain content to build our theoretical castles on the quicksand of merely rejecting the null hypothesis." (Marvin D Dunnette, "Fads, Fashions, and Folderol in Psychology", American Psychologist Vol. 21, 1966)

"What used to be called judgment is now called prejudice, and what used to be called prejudice is now called a null hypothesis." (Anthony W F Edwards. "Likelihood", 1972)

"Failing to reject a null hypothesis is distinctly different from proving a null hypothesis; the difference in these interpretations is not merely a semantic point. Rather, the two interpretations can lead to quite different biological conclusions." (David F Parkhurst, "Interpreting Failure to Reject a Null Hypothesis", Bulletin of the Ecological Society of America Vol. 66, 1985)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)

"The worst, i.e., most dangerous, feature of 'accepting the null hypothesis' is the giving up of explicit uncertainty. [...] Mathematics can sometimes be put in such black-and-white terms, but our knowledge or belief about the external world never can." (John Tukey, "The Philosophy of Multiple Comparisons", Statistical Science Vol. 6 (1), 1991)

"If the null hypothesis is not rejected, [Sir Ronald] Fisher's position was that nothing could be concluded. But researchers find it hard to go to all the trouble of conducting a study only to conclude that nothing can be concluded." (Frank L Schmidt, "Statistical Significance Testing and Cumulative Knowledge", "Psychology: Implications for Training of Researchers, Psychological Methods" Vol. 1 (2), 1996)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"For the study of the topology of the interactions of a complex system it is of central importance to have proper random null models of networks, i.e., models of how a graph arises from a random process. Such models are needed for comparison with real world data. When analyzing the structure of real world networks, the null hypothesis shall always be that the link structure is due to chance alone. This null hypothesis may only be rejected if the link structure found differs significantly from an expectation value obtained from a random model. Any deviation from the random null model must be explained by non-random processes." (Jörg Reichardt, "Structure in Complex Networks", 2009)

"Null hypothesis is something we attempt to find evidence against in the hypothesis tests. Null hypothesis is usually an initial claim that researchers make on the basis of previous knowledge or experience. Alternative hypothesis has a population parameter value different from that of null hypothesis. Alternative hypothesis is something you hope to come out to be true. Statistical tests are performed to decide which of these holds true in a hypothesis test. If the experiment goes in favor of the null hypothesis then we say the experiment has failed in rejecting the null hypothesis." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)

02 December 2018

Data Science: Hypothesis (Just the Quotes)

"[…] it is not necessary that these hypotheses should be true, or even probably; but it is enough if they provide a calculus which fits the observations […]" (Andrew Osiander, "On the Revolutions of the Heavenly Spheres", 1543)

"The art of discovering the causes of phenomena, or true hypothesis, is like the art of decyphering, in which an ingenious conjecture greatly shortens the road." (Gottfried W Leibniz, "New Essays Concerning Human Understanding", 1704) [published 1765]

"In order to shake a hypothesis, it is sometimes not necessary to do anything more than push it as far as it will go." (Denis Diderot, "On the Interpretation of Nature", 1753)

"No hypothesis can lay claim to any value unless it assembles many phenomena under one concept." (Johann Wolfgang von Goethe, [letter to Sommering] 1795)

"Induction, analogy, hypotheses founded upon facts and rectified continually by new observations, a happy tact given by nature and strengthened by numerous comparisons of its indications with experience, such are the principal means for arriving at truth." (Pierre-Simon Laplace, "A Philosophical Essay on Probabilities", 1814)

"The hypothesis is like the captain, and the observations like the soldiers of an army: while he appears to command them, and in this way to work his own will, he does in fact derive all his power of conquest from their obedience, and becomes helpless and useless if they mutiny." (William Whewell, "Philosophy of the Inductive Sciences", 1840)

"The process of scientific discovery is cautious and rigorous, not by abstaining from hypothesis, but by rigorously comparing hypotheses with facts, and by resolutely rejecting all which the comparison does not confirm." (William Whewell, "The Philosophy of the Inductive Sciences Founded Upon Their History" Vol. 2, 1840)

"When the hypothesis, of itself and without adjustment for the purpose, gives us the rule and reason of a class of facts not contemplated in its construction, we have a criterion of its reality, which has never yet been produced in favour of falsehood." (William Whewell, "The Philosophy of the Inductive Sciences", 1840) 

"An hypothesis being a mere supposition, there are no other limits to hypotheses than those of the human imagination; we may, if we please, imagine, by way of accounting for an effect, some cause of a kind utterly unknown, and acting according to a law altogether fictitious." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"It appears, then, to be a condition of a genuinely scientific hypothesis, that it be not destined always to remain an hypothesis, but be certain to be either proved or disproved by [...] comparison with observed facts." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The hypothesis, by suggesting observations and experiments, puts us upon the road to that independent evidence if it be really attainable; and till it be attained, the hypothesis ought not to count for more than a suspicion." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The rules of scientific investigation always require us, when we enter the domains of conjecture, to adopt that hypothesis by which the greatest number of known facts and phenomena may be reconciled." (Matthew F Maury, "The Physical Geography of the Sea", 1855) 

"An anticipative idea or an hypothesis is, then, the necessary starting point for all experimental reasoning. Without it, we could not make any investigation at all nor learn anything; we could only pile up sterile observations. If we experiment without a preconceived idea, we should move at random […]" (Claude Bernard, "An Introduction to the Study of Experimental Medicine", 1865)

"In scientific investigations, it is permitted to invent any hypothesis and, if it explains various large and independent classes of facts, it rises to the ranks of a well-grounded theory." (Charles Darwin, "The Variations of Animals and Plants Under Domestication" Vol. 1, 1868)

"The great tragedy of Science - the slaying of a beautiful hypothesis by an ugly fact." (Thomas H Huxley, "Biogenesis and abiogenesis", [address] 1870)

"[…] wrong hypotheses, rightly worked from, have produced more useful results than unguided observation." (Augustus de Morgan, "A Budget of Paradoxes", 1872)

"An hypothesis is only a habit - a habit of looking through a glass of one peculiar colour, which imparts its hue to all around it." (Frederick Marryat, "The King's Own", 1873) 

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"The scientific discovery appears first as the hypothesis of an analogy; and science tends to become independent of the hypothesis." (William K Clifford, "Lectures and Essays", 1879)

"Every hypothesis must derive indubitable results from mechanically well-defined assumptions by mathematically correct methods." (Ludwig Boltzmann, "Certain Questions of the Theory of Gasses", Nature Vol. 51 (1322), 1895) 

"For the truly scientific man, the hypothesis is destined solely to enable him to get the facts of nature in some definite order, an order which shall make apparent their connection with the great order and harmony which is believed to be present in the universe." (James M Baldwin, "The Processes of Life Revealed by the Microscope: A Plea for Physiological Histology", Science N.S. Vol. 2 (34), 1895)

"If the working hypothesis fails in any essential particular he [the scientist] is ready to modify or discard it. For the truly inspired investigator, one undoubted fact weighs more in the balance than a thousand theories." (James M Baldwin, "The Processes of Life Revealed by the Microscope: A Plea for Physiological Histology", Science N.S. Vol. 2 (34), 1895)

"In scientific investigations, it is permitted to invent any hypothesis and, if it explains various large and independent classes of facts, it rises to the ranks of a well-grounded theory." (Charles Darwin, "The Variations of Animals and Plants Under Domestication" Vol. 1, 1896)

"Entia non sunt multiplicanda praeter necessitatem. That is to say; before you try a complicated hypothesis, you should make quite sure that no simplification of it will explain the facts equally well." (Charles S Peirce," Pragmatism and Pragmaticism", [lecture] 1903)

"A false hypothesis, if it serve as a guide for further enquiry, may, at the right stage of science, be as useful as, or more useful than, a truer one for which acceptable evidence is not yet at hand." (William C Dampier, "Science and the Human Mind, Science in the Ancient World", 1912) 

"Without hypothesis there can be no progress in knowledge." (Max Verworn, "Irritability", 1913) 

"The great difference between induction and hypothesis is that the former infers the existence of phenomena such as we have observed in cases which are similar, while hypothesis supposes something of a different kind from what we have directly observed, and frequently something which it would be impossible for us to observe directly." (Charles S Peirce, "Chance, Love and Logic: Philosophical Essays, Deduction, Induction, Hypothesis", 1914)

"Theory is the best guide for experiment - that were it not for theory and the problems and hypotheses that come out of it, we would not know the points we wanted to verify, and hence would experiment aimlessly" (Henry Hazlitt,  "Thinking as a Science", 1916)

"A good hypothesis in science must have other properties than those of the phenomenon it is immediately invoked to explain, otherwise it is not prolific enough." (William James, "Selected Papers on Philosophy", 1918) 

"An indispensable hypothesis, even though still far from being a guarantee of success, is however the pursuit of a specific aim, whose lighted beacon, even by initial failures, is not betrayed." (Max Planck, [Nobel lecture] 1918) 

"A hypothesis or theory is clear, decisive, and positive, but it is believed by no one but the man who created it. Experimental findings, on the other hand, are messy, inexact things, which are believed by everyone except the man who did the work." (Harlow Shapley, "Review of Scientific Instruments" Vol. 6, 1922) 

"However successful a theory or law may have been in the past, directly it fails to interpret new discoveries its work is finished, and it must be discarded or modified. However plausible the hypothesis, it must be ever ready for sacrifice on the altar of observation." (Joseph W Mellor, "A Comprehensive Treatise on Inorganic and Theoretical Chemistry", 1922) 

"Hypothesis, however, is an inference based on knowledge which is insufficient to prove its high probability." (Frederick L Barry, "The Scientific Habit of Thought", 1927) 

"Abstraction is the detection of a common quality in the characteristics of a number of diverse observations […] A hypothesis serves the same purpose, but in a different way. It relates apparently diverse experiences, not by directly detecting a common quality in the experiences themselves, but by inventing a fictitious substance or process or idea, in terms of which the experience can be expressed. A hypothesis, in brief, correlates observations by adding something to them, while abstraction achieves the same end by subtracting something." (Herbert Dingle, Science and Human Experience, 1931)

"Science does not aim, primarily, at high probabilities. It aims at a high informative content, well backed by experience. But a hypothesis may be very probable simply because it tells us nothing, or very little." (Karl Popper, "The Logic of Scientific Discovery", 1934) 

"All the theories and hypotheses of empirical science share this provisional character of being established and accepted ‘until further notice’, whereas a mathematical theorem, once proved, is established once and for all; it holds with that particular certainty which no subsequent empirical discoveries, however unexpected and extraordinary, can ever affect to the slightest extent." (Carl G Hempel, "Geometry and Empirical Science", 1935)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher, "The Design of Experiments", 1935)

"The laws of science are the permanent contributions to knowledge - the individual pieces that are fitted together in an attempt to form a picture of the physical universe in action. As the pieces fall into place, we often catch glimpses of emerging patterns, called theories; they set us searching for the missing pieces that will fill in the gaps and complete the patterns. These theories, these provisional interpretations of the data in hand, are mere working hypotheses, and they are treated with scant respect until they can be tested by new pieces of the puzzle." (Edwin P Whipple, "Experiment and Experience", [Commencement Address, California Institute of Technology] 1938)

"When two hypotheses are possible, we provisionally choose that which our minds adjudge to the simpler on the supposition that this Is the more likely to lead in the direction of the truth." (James H Jeans, "Physics and Philosophy" 3rd Ed., 1943)

"We see what we want to see, and observation conforms to hypothesis." (Bergen Evans, "The Natural History of Nonsense", 1946)

"A successful hypothesis is not necessarily a permanent hypothesis, but it is one which stimulates additional research, opens up new fields, or explains and coordinates previously unrelated facts." (Farrington Daniels, "Outlines of Physical Chemistry", 1948)

"There would be cases where we would not want to accept an hypothesis even though the evidence gives a high d. c. [degree of confirmation] score, because we are fearful of the consequences of a wrong decision." (C West Churchman, "Theory of Experimental Inference", 1948) 

"Hypothesis is a tool which can cause trouble if not used properly. We must be ready to abandon out hypothesis as soon as it is shown to be inconsistent with the facts." (William I B Beveridge, "The Art of Scientific Investigation", 1950) 

"A collection of observable concepts in a purely formal hypothesis suggesting no analogy with anything would consequently not suggest either any directions for its own development." (Mary B Hesse, "Operational Definition and Analogy in Physical Theories", British Journal for the Philosophy of Science 2 (8), 1952)

"Whenever we attempt to test a hypothesis we naturally try to avoid errors in judging it. This seems to indicate the right way of proceeding: when choosing a test we should try to minimize the frequency of errors that may be committed in applying it." (Jerzy Neyman, "Lectures and Conferences on Mathematical Statistics", 1952) 

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"[…] the grand aim of all science […] is to cover the greatest possible number of empirical facts by logical deductions from the smallest possible number of hypotheses or axioms." (Albert Einstein, 1954)

"One must credit an hypothesis with all that has had to be discovered in order to demolish it." (Jean Rostand, "The substance of man", 1962)

"The formulation of a hypothesis carries with it an obligation to test it as rigorously as we can command skills to do so." (Peter Medawar, "Hypothesis and Imagination", 1963)

"Truth in science can be defined as the working hypothesis best suited to open the way to the next better one." (Konrad Lorenz, "On Aggression", 1963) 

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology", 1966)

"All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. And this system is not a more or less arbitrary and doubtful point of departure for all our arguments; no it belongs to the essence of what we call an argument. The system is not so much the point of departure, as the element in which our arguments have their life." (Ludwig Wittgenstein, "On Certainty", 1969) 

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"A hypothesis will in the end become a truth when all phenomena let themselves be derived from it in a natural and in an obvious manner, when all these consequences are connected with one another and with the general reasons, in short, when that hypothesis is consistent in all its parts with itself." (Johann H Lambert, 1976)

"The essential function of a hypothesis consists in the guidance it affords to new observations and experiments, by which our conjecture is either confirmed or refuted." (Ernst Mach, "Knowledge and Error: Sketches on the Psychology of Enquiry", 1976)

"Be suspicious of a theory if more and more hypotheses are needed to support it as new facts become available, or as new considerations are brought to bear." (Sir Fred Hoyle & Nalin C Wickramasinghe, "Evolution from Space", 1981)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"Don't just read it; fight it! Ask your own question, look for your own examples, dicover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis?" (Paul R Halmos, "I Want to be a Mathematician", 1985)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M Stigler, "Testing Hypotheses or fitting Models? Another Look at Mass Extinctions" [in "Neutral Models in Biology"], 1987)

"All science is based on models, and every scientific model comprises three distinct stages: statement of well-defined hypotheses; deduction of all the consequences of these hypotheses, and nothing but these consequences; confrontation of these consequences with observed data." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"Any physical theory is always provisional, in the sense that it is only a hypothesis: you can never prove it. No matter how many times the results of experiments agree with some theory, you can never be sure that the next time the result will not contradict the theory." (Stephen Hawking,  "A Brief History of Time", 1988)

"The heart of the scientific method is the problem-hypothesis-test process. And, necessarily, the scientific method involves predictions. And predictions, to be useful in scientific methodology, must be subject to test empirically." (Paul Davies, "The Cosmic Blueprint: New Discoveries in Nature's Creative Ability to, Order the Universe", 1988)

"The model and the theory it represents must be accepted, at least temporarily, or rejected, depending on the agreement or disagreement between observed data and the hypotheses and implications of the model. When neither the hypotheses nor the implications of a theory can be confronted with the real world, that theory is devoid of any scientific interest. Mere logical, even mathematical, deduction remains worthless in terms of the understanding of reality if it is not closely linked to that reality." (Maurice Allais, "An Outline of My Main Contributions to Economic Science", [Noble lecture] 1988)

"A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective." (Edward Teller, "Conversations on the Dark Secrets of Physics", 1991)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002) 

"[…] a conceptual model is a diagram connecting variables and constructs based on theory and logic that displays the hypotheses to be tested." (Mary W Celsi et al, "Essentials of Business Research Methods", 2011)

"Data science is an iterative process. It starts with a hypothesis (or several hypotheses) about the system we’re studying, and then we analyze the information. The results allow us to reject our initial hypotheses and refine our understanding of the data. When working with thousands of fields and millions of rows, it’s important to develop intuitive ways to reject bad hypotheses quickly." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Observation and experiment, without a rational hypothesis, is like a man groping at objects at random with his eyes shut." (Henry P Tappan, "Elements of Logic", 2015)

"A hypothesis is a starting point for an investigation. When you hypothesize, you make a claim about why something might be the case, based on limited data, to offer an explanation or a path forward. You wouldn’t make a proposition about something you are certain of. You may not have enough evidence yet to even convince you that it’s true. But making such a claim puts a stake in the ground that suggests a path for focused analysis." (Eben Hewitt, "Technology Strategy Patterns: Architecture as strategy" 2nd Ed., 2019)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

01 December 2018

Data Science: The Science in Data Science (Just the Quotes)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Compte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"Science is nothing but the finding of analogy, identity, in the most remote parts." (Ralph W Emerson, 1837)

"Therefore science always goes abreast with the just elevation of the man, keeping step with religion and metaphysics; or, the state of science is an index of our self-knowledge." (Ralph W Emerson, "The Poet", 1844)

"It may sound quite strange, but for me, as for other scientists on whom these kinds of imaginative images have a greater effect than other poems do, no science is at its very heart more closely related to poetry, perhaps, than is chemistry." (Just Liebig, 1854)

"Science is the systematic classification of experience." (George H Lewes, "The Physical Basis of Mind", 1877)

"Science is the observation of things possible, whether present or past; prescience is the knowledge of things which may come to pass, though but slowly." (Leonardo da Vinci, "The Notebooks of Leonardo da Vinci", 1883)

"While science is pursuing a steady onward movement, it is convenient from time to time to cast a glance back on the route already traversed, and especially to consider the new conceptions which aim at discovering the general meaning of the stock of facts accumulated from day to day in our laboratories." (Dmitry Mendeleyev, "The Periodic Law of the Chemical Elements", Journal of the Chemical Society Vol. 55, 1889)

"The aim of science is always to reduce complexity to simplicity." (William James, "The Principles of Psychology", 1890)

"Science is not the monopoly of the naturalist or the scholar, nor is it anything mysterious or esoteric. Science is the search for truth, and truth is the adequacy of a description of facts." (Paul Carus, "Philosophy as a Science", 1909)

"Science is reduction. Mathematics is its ideal, its form par excellence, for it is in mathematics that assimilation, identification, is most perfectly realized. The universe, scientifically explained, would be a certain formula, one and eternal, regarded as the equivalent of the entire diversity and movement of things." (Émile Boutroux, "Natural law in Science and Philosophy", 1914)

"Abstract as it is, science is but an outgrowth of life. That is what the teacher must continually keep in mind. […] Let him explain […] science is not a dead system - the excretion of a monstrous pedantism - but really one of the most vigorous and exuberant phases of human life." (George A L Sarton, "The Teaching of the History of Science", The Scientific Monthly, 1918)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"Science is simply setting out on a fishing expedition to see whether it cannot find some procedure which it can call measurement of space and some procedure which it can call the measurement of time, and something which it can call a system of forces, and something which it can call masses." (Alfred N Whitehead, "The Concept of Nature", 1920)

"Science is a magnificent force, but it is not a teacher of morals. It can perfect machinery, but it adds no moral restraints to protect society from the misuse of the machine. It can also build gigantic intellectual ships, but it constructs no moral rudders for the control of storm tossed human vessel. It not only fails to supply the spiritual element needed but some of its unproven hypotheses rob the ship of its compass and thus endangers its cargo." (William J Bryan, "Undelivered Trial Summation Scopes Trial", 1925)

"Science is but a method. Whatever its material, an observation accurately made and free of compromise to bias and desire, and undeterred by consequence, is science." (Hans Zinsser, "Untheological Reflections", The Atlantic Monthly, 1929)

"Although this may seem a paradox, all exact science is dominated by the idea of approximation. When a man tells you that he knows the exact truth about anything, you are safe in inferring that he is an inexact man." (Bertrand Russell, "The Scientific Outlook", 1931)

"The common view of science is that it is a sort of machine for increasing the race’s store of dependable facts. It is that only in part; in even larger part it is a machine for upsetting undependable facts." (Will Durant, 1931)

"One has to recognize that science is not metaphysics, and certainly not mysticism; it can never bring us the illumination and the satisfaction experienced by one enraptured in ecstasy. Science is sobriety and clarity of conception, not intoxicated vision."(Ludwig Von Mises, "Epistemological Problems of Economics", 1933)

"Modern positivists are apt to see more clearly that science is not a system of concepts but rather a system of statements." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"Science is a system of statements based on direct experience, and controlled by experimental verification. Verification in science is not, however, of single statements but of the entire system or a sub-system of such statements." (Rudolf Carnap, "The Unity of Science", 1934)

"Science is the attempt to discover, by means of observation, and reasoning based upon it, first, particular facts about the world, and then laws connecting facts with one another and (in fortunate cases) making it possible to predict future occurrences." (Bertrand Russell, "Religion and Science, Grounds of Conflict", 1935)

"[…] that all science is merely a game can be easily discarded as a piece of wisdom too easily come by. But it is legitimate to enquire whether science is not liable to indulge in play within the closed precincts of its own method. Thus, for instance, the scientist’s continuous penchant for systems tends in the direction of play." (Johan Huizinga, "Homo Ludens", 1938)

"Science makes no pretension to eternal truth or absolute truth; some of its rivals do. That science is in some respects inhuman may be the secret of its success in alleviating human misery and mitigating human stupidity." (Eric T Bell, "Mathematics: Queen and Servant of Science", 1938)

"Science is the attempt to make the chaotic diversity of our sense experience correspond to a logically uniform system of thought." (Albert Einstein, "Considerations Concerning the Fundaments of Theoretical Physics", Science Vol. 91 (2369), 1940)

"Science is the organised attempt of mankind to discover how things work as causal systems. The scientific attitude of mind is an interest in such questions. It can be contrasted with other attitudes, which have different interests; for instance the magical, which attempts to make things work not as material systems but as immaterial forces which can be controlled by spells; or the religious, which is interested in the world as revealing the nature of God." (Conrad H Waddington, "The Scientific Attitude", 1941)

"Science, in the broadest sense, is the entire body of the most accurately tested, critically established, systematized knowledge available about that part of the universe which has come under human observation. For the most part this knowledge concerns the forces impinging upon human beings in the serious business of living and thus affecting man’s adjustment to and of the physical and the social world. […] Pure science is more interested in understanding, and applied science is more interested in control […]" (Austin L Porterfield, "Creative Factors in Scientific Research", 1941)

"Science is an interconnected series of concepts and schemes that have developed as a result of experimentation and observation and are fruitful of further experimentation and observation."(James B Conant, "Science and Common Sense", 1951)

"[…] theoretical science is essentially disciplined exploitation of metaphor." (Anatol Rapoport, "Operational Philosophy", 1953)

"Prediction is all very well; but we must make sense of what we predict. The mainspring of science is the conviction that by honest, imaginative enquiry we can build up a system of ideas about Nature which has some legitimate claim to ‘reality’." (Stephen Toulmin, "The Philosophy of Science: An Introduction", 1953)

"An engineering science aims to organize the design principles used in engineering practice into a discipline and thus to exhibit the similarities between different areas of engineering practice and to emphasize the power of fundamental concepts. In short, an engineering science is predominated by theoretical analysis and very often uses the tool of advanced mathematics." (Qian Xuesen, "Engineering cybernetics", 1954))

"The true aim of science is to discover a simple theory which is necessary and sufficient to cover the facts, when they have been purified of traditional prejudices." (Lancelot L Whyte, "Accent on Form", 1954)

"Science is the creation of concepts and their exploration in the facts. It has no other test of the concept than its empirical truth to fact." (Jacob Bronowski, "Science and Human Values", 1956)

"The progress of science is the discovery at each step of a new order which gives unity to what had seemed unlike." (Jacob Bronowski, "Science and Human Values", 1956)

"[…] any serious examination of the basic concepts of any science is far more difficult than the elaboration of their ultimate consequences." (George F J Temple, "Turning Points in Physics", 1959)

"Science is usually understood to depict a universe of strict order and lawfulness, of rigorous economy - one whose currency is energy, convertible against a service charge into a growing common pool called entropy." (Paul A Weiss,"Organic Form: Scientific and Aesthetic Aspects", 1960)

"[…] the progress of science is a little like making a jig-saw puzzle. One makes collections of pieces which certainly fit together, though at first it is not clear where each group should come in the picture as a whole, and if at first one makes a mistake in placing it, this can be corrected later without dismantling the whole group." (Sir George Thomson, "The Inspiration of Science", 1961)

"Science is the reduction of the bewildering diversity of unique events to manageable uniformity within one of a number of symbol systems, and technology is the art of using these symbol systems so as to control and organize unique events. Scientific observation is always a viewing of things through the refracting medium of a symbol system, and technological praxis is always handling of things in ways that some symbol system has dictated. Education in science and technology is essentially education on the symbol level." (Aldous L Huxley, "Essay", Daedalus, 1962)

"The important distinction between science and those other systematizations [i.e., art, philosophy, and theology] is that science is self-testing and self-correcting. Here the essential point of science is respect for objective fact. What is correctly observed must be believed [...] the competent scientist does quite the opposite of the popular stereotype of setting out to prove a theory; he seeks to disprove it." (George G Simpson, "Notes on the Nature of Science", 1962)

"What, then, is science according to common opinion? Science is what scientists do. Science is knowledge, a body of information about the external world. Science is the ability to predict. Science is power, it is engineering. Science explains, or gives causes and reasons." (John Bremer "What Is Science?" [in "Notes on the Nature of Science"], 1962)

"Science is a matter of disinterested observation, patient ratiocination within some system of logically correlated concepts. In real-life conflicts between reason and passion the issue is uncertain. Passion and prejudice are always able to mobilize their forces more rapidly and press the attack with greater fury; but in the long run (and often, of course, too late) enlightened self-interest may rouse itself, launch a counterattack and win the day for reason." (Aldous L Huxley, "Literature and Science", 1963)

"Science is a way to teach how something gets to be known, what is not known, to what extent things are known (for nothing is known absolutely), how to handle doubt and uncertainty, what the rules of evidence are, how to think about things so that judgments can be made, how to distinguish truth from fraud, and from show." (Richard P Feynman, "The Problem of Teaching Physics in Latin America", Engineering and Science, 1963)

"The aim of science is to apprehend this purely intelligible world as a thing in itself, an object which is what it is independently of all thinking, and thus antithetical to the sensible world. [...] The world of thought is the universal, the timeless and spaceless, the absolutely necessary, whereas the world of sense is the contingent, the changing and moving appearance which somehow indicates or symbolizes it." (Robin G Collingwood, "Essays in the Philosophy of Art", 1964)

"The central task of a natural science is to make the wonderful commonplace: to show that complexity, correctly viewed, is only a mask for simplicity; to find pattern hidden in apparent chaos." (Herbert A Simon, "The Sciences of the Artificial", 1969)

"The central task of a natural science is to make the wonderful commonplace: to show that complexity, correctly viewed, is only a mask for simplicity; to find pattern hidden in apparent chaos." (Herbert A Simon, "The Sciences of the Artificial", 1969)

"Science is a product of man, of his mind; and science creates the real world in its own image." (Frank E Egler, "The Way of Science", 1970)

"To do science is to search for repeated patterns, not simply to accumulate facts [...]" (Robert H. MacArthur, "Geographical Ecology", 1972)

"Science is systematic organisation of knowledge about the universe on the basis of explanatory hypotheses which are genuinely testable. Science advances by developing gradually more comprehensive theories; that is, by formulating theories of greater generality which can account for observational statements and hypotheses which appear as prima facie unrelated." (Francisco J Ayala, "Studies in the Philosophy of Biology: Reduction and Related Problems", 1974)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"The very nature of science is such that scientists need the metaphor as a bridge between old and new theories." (Earl R MacCormac, "Metaphor and Myth in Science and Religion", 1976)

"Facts do not ‘speak for themselves’; they are read in the light of theory. Creative thought, in science as much as in the arts, is the motor of changing opinion. Science is a quintessentially human activity, not a mechanized, robot-like accumulation of objective information, leading by laws of logic to inescapable interpretation." (Stephen J Gould, "Ever Since Darwin", 1977)

"Science is not a heartless pursuit of objective information. It is a creative human activity, its geniuses acting more as artists than information processors. Changes in theory are not simply the derivative results of the new discoveries but the work of creative imagination influenced by contemporary social and political forces." (Stephen J Gould, "Ever Since Darwin: Reflections in Natural History", 1977)

"Engineering or Technology is the making of things that did not previously exist, whereas science is the discovering of things that have long existed." (David Billington, "The Tower and the Bridge: The New Art of Structural Engineering", 1983)

"Science is a process. It is a way of thinking, a manner of approaching and of possibly resolving problems, a route by which one can produce order and sense out of disorganized and chaotic observations. Through it we achieve useful conclusions and results that are compelling and upon which there is a tendency to agree." (Isaac Asimov, "‘X’ Stands for Unknown", 1984)

"If doing mathematics or science is looked upon as a game, then one might say that in mathematics you compete against yourself or other mathematicians; in physics your adversary is nature and the stakes are higher." (Mark Kac, "Enigmas Of Chance", 1985)

"Science is defined as a set of observations and theories about observations." (F Albert Matsen, "The Role of Theory in Chemistry", Journal of Chemical Education Vol. 62 (5), 1985)

"We expect to learn new tricks because one of our science based abilities is being able to predict. That after all is what science is about. Learning enough about how a thing works so you'll know what comes next. Because as we all know everything obeys the universal laws, all you need is to understand the laws." (James Burke, "The Day the Universe Changed", 1985)

"Science is human experience systematically extended (by intent, methodology and instrumentation) for the purpose of learning more about the natural world and for the critical empirical testing and possible falsification of all ideas about the natural world. Scientific hypotheses may incorporate only elements of the natural empirical world, and thus may contain no element of the supernatural." (Robert E Kofahl, Correctly Redefining Distorted Science: A Most Essential Task", Creation Research Society Quarterly Vol. 23, 1986)

"Science is not a given set of answers but a system for obtaining answers. The method by which the search is conducted is more important than the nature of the solution. Questions need not be answered at all, or answers may be provided and then changed. It does not matter how often or how profoundly our view of the universe alters, as long as these changes take place in a way appropriate to science. For the practice of science, like the game of baseball, is covered by definite rules." (Robert Shapiro, "Origins: A Skeptic’s Guide to the Creation of Life on Earth", 1986)

"Science doesn't purvey absolute truth. Science is a mechanism. It's a way of trying to improve your knowledge of nature. It's a system for testing your thoughts against the universe and seeing whether they match. And this works, not just for the ordinary aspects of science, but for all of life. I should think people would want to know that what they know is truly what the universe is like, or at least as close as they can get to it." (Isaac Asimov, [Interview by Bill Moyers] 1988)

"Science doesn’t purvey absolute truth. Science is a mechanism, a way of trying to improve your knowledge of nature. It’s a system for testing your thoughts against the universe, and seeing whether they match." (Isaac Asimov, [interview with Bill Moyers in The Humanist] 1989)

"The view of science is that all processes ultimately run down, but entropy is maximized only in some far, far away future. The idea of entropy makes an assumption that the laws of the space-time continuum are infinitely and linearly extendable into the future. In the spiral time scheme of the timewave this assumption is not made. Rather, final time means passing out of one set of laws that are conditioning existence and into another radically different set of laws. The universe is seen as a series of compartmentalized eras or epochs whose laws are quite different from one another, with transitions from one epoch to another occurring with unexpected suddenness." (Terence McKenna, "True Hallucinations", 1989)

"Science is (or should be) a precise art. Precise, because data may be taken or theories formulated with a certain amount of accuracy; an art, because putting the information into the most useful form for investigation or for presentation requires a certain amount of creativity and insight." (Patricia H Reiff, "The Use and Misuse of Statistics in Space Physics", Journal of Geomagnetism and Geoelectricity 42, 1990)

"In science if you know what you are doing you should not be doing it. In engineering if you do not know what you are doing you should not be doing it. Of course, you seldom, if ever, see either pure state." (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"On this view, we recognize science to be the search for algorithmic compressions. We list sequences of observed data. We try to formulate algorithms that compactly represent the information content of those sequences. Then we test the correctness of our hypothetical abbreviations by using them to predict the next terms in the string. These predictions can then be compared with the future direction of the data sequence. Without the development of algorithmic compressions of data all science would be replaced by mindless stamp collecting - the indiscriminate accumulation of every available fact. Science is predicated upon the belief that the Universe is algorithmically compressible and the modern search for a Theory of Everything is the ultimate expression of that belief, a belief that there is an abbreviated representation of the logic behind the Universe's properties that can be written down in finite form by human beings." (John D Barrow, New Theories of Everything", 1991)

"The goal of science is to make sense of the diversity of Nature." (John D Barrow, "Theories of Everything: The Quest for Ultimate Explanation", 1991)

"Science is not about control. It is about cultivating a perpetual condition of wonder in the face of something that forever grows one step richer and subtler than our latest theory about it. It is about  reverence, not mastery." (Richard Power, "Gold Bug Variations", 1993)

"Statistics as a science is to quantify uncertainty, not unknown." (Chamont Wang, "Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety", 1993)

"Clearly, science is not simply a matter of observing facts. Every scientific theory also expresses a worldview. Philosophical preconceptions determine where facts are sought, how experiments are designed, and which conclusions are drawn from them." (Nancy R Pearcey & Charles B. Thaxton, "The Soul of Science: Christian Faith and Natural Philosophy", 1994)

"Science is distinguished not for asserting that nature is rational, but for constantly testing claims to that or any other affect by observation and experiment." (Timothy Ferris, "The Whole Shebang: A State-of-the Universe’s Report", 1996)

"Science is more than a mere attempt to describe nature as accurately as possible. Frequently the real message is well hidden, and a law that gives a poor approximation to nature has more significance than one which works fairly well but is poisoned at the root." (Robert H March, "Physics for Poets", 1996)

"The art of science is knowing which observations to ignore and which are the key to the puzzle." (Edward W Kolb, "Blind Watchers of the Sky", 1996)

"Mathematics is the study of analogies between analogies. All science is. Scientists want to show that things that don’t look alike are really the same. That is one of their innermost Freudian motivations. In fact, that is what we mean by understanding." (Gian-Carlo Rota, "Indiscrete Thoughts", 1997)

"Religion is the antithesis of science; science is competent to illuminate all the deep questions of existence, and does so in a manner that makes full use of, and respects the human intellect. I see neither need nor sign of any future reconciliation." (Peter W Atkins, "Religion - The Antithesis to Science", 1997)

"[…] the pursuit of science is more than the pursuit of understanding. It is driven by the creative urge, the urge to construct a vision, a map, a picture of the world that gives the world a little more beauty and coherence than it had before." (John A Wheeler, "Geons, Black Holes, and Quantum Foam: A Life in Physics", 1998)

"The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test." (Richard Feynman, "The Meaning of It All", 1998)

"The passion and beauty and joy of science is that we humans have invented a process to understand the universe in a way that is true for everyone. We are finding universal truths." (Bill Nye, 2000)

"The poetry of science is in some sense embodied in its great equations, and these equations can also be peeled. But their layers represent their attributes and consequences, not their meanings." (Graham Farmelo, 2002)

"Science is the art of the appropriate approximation. While the flat earth model is usually spoken of with derision it is still widely used. Flat maps, either in atlases or road maps, use the flat earth model as an approximation to the more complicated shape." (Byron K. Jennings, "On the Nature of Science", Physics in Canada Vol. 63 (1), 2007)

"It is ironic but true: the one reality science cannot reduce is the only reality we will ever know. This is why we need art. By expressing our actual experience, the artist reminds us that our science is incomplete, that no map of matter will ever explain the immateriality of our consciousness." (Jonah Lehrer, "Proust Was a Neuroscientist", 2011)

"Science isn’t about being right. It is about convincing others of the correctness of an idea through a methodology all will accept using data everyone can trust. New ideas take time to be accepted because they compete with others that have already passed the test." (Tom Koch, "Commentary: Nobody loves a critic: Edmund A Parkes and John Snow’s cholera", International Journal of Epidemiology Vol. 42 (6), 2013)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

More quotes on "Science" at quotablemath.blogspot.com.

About Me

IT professional with more than 24 years of experience in IT, covering the full life cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.