Showing posts with label theories. Show all posts

12 March 2024

Systems Engineering: A Play of Problems (Much Ado about Nothing)

Disclaimer: This post was created just for fun. No problem was hurt or solved in the process! 
Updated: 12-Jun-2024

On Problems

Everybody has at least one problem. If somebody doesn’t have a problem, he’ll make one. If somebody can't make a problem, he can always find a problem. One doesn't need to search long to find a problem. Looking for a problem, one sees problems.

Not having a problem can easily become a problem. It’s better to have a problem than none. The none problem is undefinable, which makes it a problem. 

Avoiding a problem might lead you to another problem. Some problems are so old that it's easier to ignore them.

In every big problem there’s a small problem trying to come out. Most problems can be reduced to smaller problems. A small problem may hide a bigger problem. 

It’s better to solve a problem when it is still small; however, problems can be perceived only when they grow bigger.

In the neighborhood of a problem there’s another problem getting closer. Problems tend to attract each other. 

Between two problems there’s enough place for a third to appear. The shortest path between two problems is another problem. 

Two problems that appear together in successive situations might be the parts of the same problem. 

A problem is more than the sum of its parts.

Any problem can be simplified to the degree that it becomes another problem. 

The complement of a problem is another problem. At the intersection/union of two problems lies another problem.

The inverse of a problem is another problem more complex than the initial problem.

Defining a problem correctly is another problem. A known problem doesn’t make one problem less. 

When a problem seems to be enough, a second appears. A problem never comes alone.  The interplay of the two problems creates a third.

Sharing the problems with somebody else just multiplies the number of problems. 

Problems multiply beyond necessity. Problems multiply beyond our expectations. Problems multiply faster than we can solve. 

Having more than one problem is for many already too much. Between many big problems and an infinity of problems there seems to be no big difference.

Many small problems can converge toward a bigger problem. Many small problems can also diverge toward two bigger problems. 

When neighboring problems exist, people tend to isolate them. Isolated problems tend to find other ways to surprise.

Several problems aggregate and create bigger problems that tend to suck in the neighboring problems.

If one waits long enough, some problems will solve themselves or they will get bigger. Bigger problems exceed one's area of responsibility.

One can get credit for a self-created problem. It takes only a good problem to become famous.

A good problem can provide a lifetime. A good problem has the tendency to kick back where it hurts the most. One can fall in love with a good problem. 

One should not theorize before one has a (good) problem. A problem can lead to a new theory, while a theory brings with it many more problems. 

If the only tool you have is a hammer, every problem will look like a nail. (paraphrasing Abraham H Maslow)

Any field of knowledge can be covered by a set of problems. A field of knowledge should be learned by the problems it poses.

A problem thoroughly understood is always fairly simple, but unfairly complex. (paraphrasing Charles F Kettering)

The problem solver usually created the problem.

Problem Solving

Break a problem in two to solve it more easily. Finding how to break a problem is already another problem. Deconstructing a problem into its parts is no guarantee for solving the problem.

Every problem has at least two solutions from which at least one is wrong. It’s easier to solve the wrong problem. 

It’s easier to solve a problem if one knows the solution already. Knowing a solution is not a guarantee for solving the problem.

Sometimes a problem disappears faster than one can find a solution. 

If a problem has two solutions, more likely a third solution exists. 

Solutions can be used to generate problems. The design of a problem seldom lies in its solutions. 

The solution of a problem can create at least one more problem. 

One can solve only one problem at a time. 

Unsolvable problems lead to problematic approximations. There's always a better approximation, one just needs to find it. One needs to know when to stop searching for an approximation.

There's more than one way to solve a problem. Finding another way to solve a problem provides more insight into the problem. More insight complicates the problem unnecessarily.

Solving a problem is a matter of perspective. Finding the right perspective is another problem.

Solving a problem is a matter of tools. Searching for the right tool can be a laborious process. 

Solving a problem requires a higher level of consciousness than the level that created it. (see Einstein) With the increasing complexity of problems, one can run out of consciousness.

Trying to solve an old problem creates resistance against its solution(s). 

The premature optimization of a problem is the root of all evil. (paraphrasing Donald Knuth)

A great discovery solves a great problem but creates a few others on its way. (paraphrasing George Polya)

Solving the symptoms of a problem can prove more difficult than solving the problem itself.

A master is a person who knows the solutions to his problems. To learn the solutions to others' problems he needs a pupil. 

"The final test of a theory is its capacity to solve the problems which originated it." (George Dantzig) It's easier to theorize if one has a set of problems.

A problem is defined as a gap between where you are and where you want to be, though nobody knows exactly where he is or wants to be.

Complex problems are the problems that persist - so are minor ones.

"The problems are solved, not by giving new information, but by arranging what we have known since long." (Ludwig Wittgenstein, 1953) Some people are just lost in rearranging. 

Solving problems is a practical skill, but impractical endeavor. (paraphrasing George Polya) 

"To ask the right question is harder than to answer it." (Georg Cantor) So most people avoid asking the right question.

Solve more problems than you create.

They Said It

"A great many problems do not have accurate answers, but do have approximate answers, from which sensible decisions can be made." (Berkeley's Law)

"A problem is an opportunity to grow, creating more problems. [...] most important problems cannot be solved; they must be outgrown." (Wayne Dyer)

"A system represents someone's solution to a problem. The system doesn't solve the problem." (John Gall, 1975)

"As long as a branch of science offers an abundance of problems, so long is it alive." (David Hilbert)

"Complex problems have simple, easy to understand, wrong answers." [Grossman's Misquote]

"Every solution breeds new problems." [Murphy's laws]

"Given any problem containing n equations, there will be n+1 unknowns." [Snafu]

"I have not seen any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated." (Paul Anderson)

"If a problem causes many meetings, the meetings eventually become more important than the problem." (Hendrickson’s Law)

"If you think the problem is bad now, just wait until we’ve solved it." (Arthur Kasspe) [Epstein’s Law]

"Inventing is easy for staff outfits. Stating a problem is much harder. Instead of stating problems, people like to pass out half- accurate statements together with half-available solutions which they can't finish and which they want you to finish." [Katz's Maxims]

"It is better to do the right problem the wrong way than to do the wrong problem the right way." (Richard Hamming)

"Most problems have either many answers or no answer. Only a few problems have a single answer." [Berkeley's Law]

"Problems worthy of attack prove their worth by fighting back." (Piet Hein)

Rule of Accuracy: "When working toward the solution of a problem, it always helps if you know the answer."
Corollary: "Provided, of course, that you know there is a problem."

"Some problems are just too complicated for rational logical solutions. They admit of insights, not answers." (Jerome B Wiesner, 1963)

"Sometimes, where a complex problem can be illuminated by many tools, one can be forgiven for applying the one he knows best." [Screwdriver Syndrome]

"The best way to escape from a problem is to solve it." (Brendan Francis)

"The chief cause of problems is solutions." [Sevareid's Law]

"The first step of problem solving is to understand the existing conditions." (Kaoru Ishikawa)

"The human race never solves any of its problems, it only outlives them." (David Gerrold)

"The most fruitful research grows out of practical problems."  (Ralph B Peck)

"The problem-solving process will always break down at the point at which it is possible to determine who caused the problem." [Fyffe's Axiom]

"The worst thing you can do to a problem is solve it completely." (Daniel Kleitman)

"The easiest way to solve a problem is to deny it exists." (Isaac Asimov)

"The solution to a problem changes the problem." [Peers's Law]

"There is a solution to every problem; the only difficulty is finding it." [Evvie Nef's Law]

"There is no mechanical problem so difficult that it cannot be solved by brute strength and ignorance. [William's Law]

"Today's problems come from yesterday’s 'solutions'." (Peter M Senge, 1990)

"While the difficulties and dangers of problems tend to increase at a geometric rate, the knowledge and manpower qualified to deal with these problems tend to increase linearly." [Dror's First Law]

"You are never sure whether or not a problem is good unless you actually solve it." (Mikhail Gromov)

More quotes on Problem solving at QuotableMath.blogpost.com.

Resources:
Murphy's laws and corollaries (link)

30 December 2018

Data Science: Testing (Just the Quotes)

"We must trust to nothing but facts: These are presented to us by Nature, and cannot deceive. We ought, in every instance, to submit our reasoning to the test of experiment, and never to search for truth but by the natural road of experiment and observation." (Antoin-Laurent de Lavoisiere, "Elements of Chemistry", 1790)

"A law of nature, however, is not a mere logical conception that we have adopted as a kind of memoria technical to enable us to more readily remember facts. We of the present day have already sufficient insight to know that the laws of nature are not things which we can evolve by any speculative method. On the contrary, we have to discover them in the facts; we have to test them by repeated observation or experiment, in constantly new cases, under ever-varying circumstances; and in proportion only as they hold good under a constantly increasing change of conditions, in a constantly increasing number of cases with greater delicacy in the means of observation, does our confidence in their trustworthiness rise." (Hermann von Helmholtz, "Popular Lectures on Scientific Subjects", 1873)

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl Popper, "The Logic of Scientific Discovery", 1934)

"Science, in the broadest sense, is the entire body of the most accurately tested, critically established, systematized knowledge available about that part of the universe which has come under human observation. For the most part this knowledge concerns the forces impinging upon human beings in the serious business of living and thus affecting man’s adjustment to and of the physical and the social world. […] Pure science is more interested in understanding, and applied science is more interested in control […]" (Austin L Porterfield, "Creative Factors in Scientific Research", 1941)

"To a scientist a theory is something to be tested. He seeks not to defend his beliefs, but to improve them. He is, above everything else, an expert at ‘changing his mind’." (Wendell Johnson, 1946)

"As usual we may make the errors of I) rejecting the null hypothesis when it is true, II) accepting the null hypothesis when it is false. But there is a third kind of error which is of interest because the present test of significance is tied up closely with the idea of making a correct decision about which distribution function has slipped furthest to the right. We may make the error of III) correctly rejecting the null hypothesis for the wrong reason." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper?", Journal of the American Statistical Association 44, 1949)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"The main purpose of a significance test is to inhibit the natural enthusiasm of the investigator." (Frederick Mosteller, "Selected Quantitative Techniques", 1954)

"The methods of science may be described as the discovery of laws, the explanation of laws by theories, and the testing of theories by new observations. A good analogy is that of the jigsaw puzzle, for which the laws are the individual pieces, the theories local patterns suggested by a few pieces, and the tests the completion of these patterns with pieces previously unconsidered." (Edwin P Hubble, "The Nature of Science and Other Lectures", 1954)

"Science is the creation of concepts and their exploration in the facts. It has no other test of the concept than its empirical truth to fact." (Jacob Bronowski, "Science and Human Values", 1956)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confidential statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"It is easy to obtain confirmations, or verifications, for nearly every theory - if we look for confirmations. Confirmations should count only if they are the result of risky predictions. […] A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice. Every genuine test of a theory is an attempt to falsify it, or refute it." (Karl R Popper, "Conjectures and Refutations: The Growth of Scientific Knowledge", 1963)

"The final test of a theory is its capacity to solve the problems which originated it." (George Dantzig, "Linear Programming and Extensions", 1963)

"The mediation of theory and praxis can only be clarified if to begin with we distinguish three functions, which are measured in terms of different criteria: the formation and extension of critical theorems, which can stand up to scientific discourse; the organization of processes of enlightenment, in which such theorems are applied and can be tested in a unique manner by the initiation of processes of reflection carried on within certain groups toward which these processes have been directed; and the selection of appropriate strategies, the solution of tactical questions, and the conduct of the political struggle. On the first level, the aim is true statements, on the second, authentic insights, and on the third, prudent decisions." (Jürgen Habermas, "Introduction to Theory and Practice", 1963)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems.” (Richard Levins, "The Strategy of Model Building in Population Biology”, 1966)

"Discovery always carries an honorific connotation. It is the stamp of approval on a finding of lasting value. Many laws and theories have come and gone in the history of science, but they are not spoken of as discoveries. […] Theories are especially precarious, as this century profoundly testifies. World views can and do often change. Despite these difficulties, it is still true that to count as a discovery a finding must be of at least relatively permanent value, as shown by its inclusion in the generally accepted body of scientific knowledge." (Richard J. Blackwell, "Discovery in the Physical Sciences", 1969)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"Science is systematic organisation of knowledge about the universe on the basis of explanatory hypotheses which are genuinely testable. Science advances by developing gradually more comprehensive theories; that is, by formulating theories of greater generality which can account for observational statements and hypotheses which appear as prima facie unrelated." (Francisco J Ayala, "Studies in the Philosophy of Biology: Reduction and Related Problems", 1974)

"A good scientific law or theory is falsifiable just because it makes definite claims about the world. For the falsificationist, If follows fairly readily from this that the more falsifiable a theory is the better, in some loose sense of more. The more a theory claims, the more potential opportunities there will be for showing that the world does not in fact behave in the way laid down by the theory. A very good theory will be one that makes very wide-ranging claims about the world, and which is consequently highly falsifiable, and is one that resists falsification whenever it is put to the test." (Alan F Chalmers,  "What Is This Thing Called Science?", 1976)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, "Mind and Nature, A necessary unity", 1979)

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...]For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"Theoretical scientists, inching away from the safe and known, skirting the point of no return, confront nature with a free invention of the intellect. They strip the discovery down and wire it into place in the form of mathematical models or other abstractions that define the perceived relation exactly. The now-naked idea is scrutinized with as much coldness and outward lack of pity as the naturally warm human heart can muster. They try to put it to use, devising experiments or field observations to test its claims. By the rules of scientific procedure it is then either discarded or temporarily sustained. Either way, the central theory encompassing it grows. If the abstractions survive they generate new knowledge from which further exploratory trips of the mind can be planned. Through the repeated alternation between flights of the imagination and the accretion of hard data, a mutual agreement on the workings of the world is written, in the form of natural law." (Edward O Wilson, "Biophilia", 1984)

"Models are often used to decide issues in situations marked by uncertainty. However statistical differences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Greedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"Science has become a social method of inquiring into natural phenomena, making intuitive and systematic explorations of laws which are formulated by observing nature, and then rigorously testing their accuracy in the form of predictions. The results are then stored as written or mathematical records which are copied and disseminated to others, both within and beyond any given generation. As a sort of synergetic, rigorously regulated group perception, the collective enterprise of science far transcends the activity within an individual brain." (Lynn Margulis & Dorion Sagan, "Microcosmos", 1986)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M. Stigler, "Neutral Models in Biology", 1987)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, Mind and Nature: A necessary unity", 1988)

"Science doesn't purvey absolute truth. Science is a mechanism. It's a way of trying to improve your knowledge of nature. It's a system for testing your thoughts against the universe and seeing whether they match. And this works, not just for the ordinary aspects of science, but for all of life. I should think people would want to know that what they know is truly what the universe is like, or at least as close as they can get to it." (Isaac Asimov, [Interview by Bill Moyers] 1988)

"The heart of the scientific method is the problem-hypothesis-test process. And, necessarily, the scientific method involves predictions. And predictions, to be useful in scientific methodology, must be subject to test empirically." (Paul Davies, "The Cosmic Blueprint: New Discoveries in Nature's Creative Ability to, Order the Universe", 1988)

"Science doesn’t purvey absolute truth. Science is a mechanism, a way of trying to improve your knowledge of nature. It’s a system for testing your thoughts against the universe, and seeing whether they match." (Isaac Asimov, [interview with Bill Moyers in The Humanist] 1989)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)

"On this view, we recognize science to be the search for algorithmic compressions. We list sequences of observed data. We try to formulate algorithms that compactly represent the information content of those sequences. Then we test the correctness of our hypothetical abbreviations by using them to predict the next terms in the string. These predictions can then be compared with the future direction of the data sequence. Without the development of algorithmic compressions of data all science would be replaced by mindless stamp collecting - the indiscriminate accumulation of every available fact. Science is predicated upon the belief that the Universe is algorithmically compressible and the modern search for a Theory of Everything is the ultimate expression of that belief, a belief that there is an abbreviated representation of the logic behind the Universe's properties that can be written down in finite form by human beings." (John D Barrow, New Theories of Everything", 1991)

"Scientists use mathematics to build mental universes. They write down mathematical descriptions - models - that capture essential fragments of how they think the world behaves. Then they analyse their consequences. This is called 'theory'. They test their theories against observations: this is called 'experiment'. Depending on the result, they may modify the mathematical model and repeat the cycle until theory and experiment agree. Not that it's really that simple; but that's the general gist of it, the essence of the scientific method." (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"The amount of understanding produced by a theory is determined by how well it meets the criteria of adequacy - testability, fruitfulness, scope, simplicity, conservatism - because these criteria indicate the extent to which a theory systematizes and unifies our knowledge." (Theodore Schick Jr.,  "How to Think about Weird Things: Critical Thinking for a New Age", 1995)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Science is distinguished not for asserting that nature is rational, but for constantly testing claims to that or any other affect by observation and experiment." (Timothy Ferris, "The Whole Shebang: A State-of-the Universe’s Report", 1996)

"There are two kinds of mistakes. There are fatal mistakes that destroy a theory; but there are also contingent ones, which are useful in testing the stability of a theory." (Gian-Carlo Rota, [lecture] 1996)

"Validation is the process of testing how good the solutions produced by a system are. The results produced by a system are usually compared with the results obtained either by experts or by other systems. Validation is an extremely important part of the process of developing every knowledge-based system. Without comparing the results produced by the system with reality, there is little point in using it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test." (Richard Feynman, "The Meaning of It All", 1998)

"Let us regard a proof of an assertion as a purely mechanical procedure using precise rules of inference starting with a few unassailable axioms. This means that an algorithm can be devised for testing the validity of an alleged proof simply by checking the successive steps of the argument; the rules of inference constitute an algorithm for generating all the statements that can be deduced in a finite number of steps from the axioms." (Edward Beltrami, "What is Random?: Chaos and Order in Mathematics and Life", 1999)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical Modeling: The two cultures", Statistical Science 16(3), 2001)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002)

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"In science, for a theory to be believed, it must make a prediction - different from those made by previous theories - for an experiment not yet done. For the experiment to be meaningful, we must be able to get an answer that disagrees with that prediction. When this is the case, we say that a theory is falsifiable - vulnerable to being shown false. The theory also has to be confirmable, it must be possible to verify a new prediction that only this theory makes. Only when a theory has been tested and the results agree with the theory do we advance the statement to the rank of a true scientific theory." (Lee Smolin, "The Trouble with Physics", 2006)

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"Each systems archetype embodies a particular theory about dynamic behavior that can serve as a starting point for selecting and formulating raw data into a coherent set of interrelationships. Once those relationships are made explicit and precise, the 'theory' of the archetype can then further guide us in our data-gathering process to test the causal relationships through direct observation, data analysis, or group deliberation." (Daniel H Kim, "Systems Archetypes as Dynamic Theories", The Systems Thinker Vol. 24 (1), 2013)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"Machine learning is a science and requires an objective approach to problems. Just like the scientific method, test-driven development can aid in solving a problem. The reason that TDD and the scientific method are so similar is because of these three shared characteristics: Both propose that the solution is logical and valid. Both share results through documentation and work over time. Both work in feedback loops." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Early stopping and regularization can ensure network generalization when you apply them properly. [...] With early stopping, the choice of the validation set is also important. The validation set should be representative of all points in the training set. When you use Bayesian regularization, it is important to train the network until it reaches convergence. The sum-squared error, the sum-squared weights, and the effective number of parameters should reach constant values when the network has converged. With both early stopping and regularization, it is a good idea to train the network starting from several different initial conditions. It is possible for either method to fail in certain circumstances. By testing several different initial conditions, you can verify robust network performance." (Mark H Beale et al, "Neural Network Toolbox™ User's Guide", 2017)

"Scientists generally agree that no theory is 100 percent correct. Thus, the real test of knowledge is not truth, but utility." (Yuval N Harari, "Sapiens: A brief history of humankind", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)

29 December 2018

Data Science: Experience (Just the Quotes)

"[…] it is from long experience chiefly that we are to expect the most certain rules of practice, yet it is withal to be remembered, that observations, and to put us upon the most probable means of improving any art, is to get the best insight we can into the nature and properties of those things which we are desirous to cultivate and improve." (Stephen Hales, "Vegetable Staticks", 1727)

"In order to supply the defects of experience, we will have recourse to the probable conjectures of analogy, conclusions which we will bequeath to our posterity to be ascertained by new observations, which, if we augur rightly, will serve to establish our theory and to carry it gradually nearer to absolute certainty." (Johann H Lambert, "The System of the World", 1800)

"Induction, analogy, hypotheses founded upon facts and rectified continually by new observations, a happy tact given by nature and strengthened by numerous comparisons of its indications with experience, such are the principal means for arriving at truth." (Pierre-Simon Laplace, "A Philosophical Essay on Probabilities", 1814)

"Observation is so wide awake, and facts are being so rapidly added to the sum of human experience, that it appears as if the theorizer would always be in arrears, and were doomed forever to arrive at imperfect conclusion; but the power to perceive a law is equally rare in all ages of the world, and depends but little on the number of facts observed." (Henry D Thoreau, "A Week on the Concord and Merrimack Rivers", 1862)

"Science is the systematic classification of experience." (George H Lewes, "The Physical Basis of Mind", 1877)

"Experience teaches that one will be led to new discoveries almost exclusively by means of special mechanical models." (Ludwig Boltzmann, "Lectures on Gas Theory", 1896)

"Philosophy, like science, consists of theories or insights arrived at as a result of systemic reflection or reasoning in regard to the data of experience. It involves, therefore, the analysis of experience and the synthesis of the results of analysis into a comprehensive or unitary conception. Philosophy seeks a totality and harmony of reasoned insight into the nature and meaning of all the principal aspects of reality." (Joseph A Leighton, "The Field of Philosophy: An outline of lectures on introduction to philosophy", 1919)

"Abstraction is the detection of a common quality in the characteristics of a number of diverse observations […] A hypothesis serves the same purpose, but in a different way. It relates apparently diverse experiences, not by directly detecting a common quality in the experiences themselves, but by inventing a fictitious substance or process or idea, in terms of which the experience can be expressed. A hypothesis, in brief, correlates observations by adding something to them, while abstraction achieves the same end by subtracting something." (Herbert Dingle, Science and Human Experience, 1931)

"It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." (Albert Einstein, [lecture] 1933)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl Popper, "The Logic of Scientific Discovery", 1934)

"Science does not aim, primarily, at high probabilities. It aims at a high informative content, well backed by experience. But a hypothesis may be very probable simply because it tells us nothing, or very little." (Karl Popper, "The Logic of Scientific Discovery", 1934) 

"Science is a system of statements based on direct experience, and controlled by experimental verification. Verification in science is not, however, of single statements but of the entire system or a sub-system of such statements." (Rudolf Carnap, "The Unity of Science", 1934)

"Science is the attempt to make the chaotic diversity of our sense experience correspond to a logically uniform system of thought." (Albert Einstein, "Considerations Concerning the Fundaments of Theoretical Physics", Science Vol. 91 (2369), 1940)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"Every bit of knowledge we gain and every conclusion we draw about the universe or about any part or feature of it depends finally upon some observation or measurement. Mankind has had again and again the humiliating experience of trusting to intuitive, apparently logical conclusions without observations, and has seen Nature sail by in her radiant chariot of gold in an entirely different direction." (Oliver J Lee, "Measuring Our Universe: From the Inner Atom to Outer Space", 1950)

"Statistics is the name for that science and art which deals with uncertain inferences - which uses numbers to find out something about nature and experience." (Warren Weaver, 1952)

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"Mathematical statistics provides an exceptionally clear example of the relationship between mathematics and the external world. The external world provides the experimentally measured distribution curve; mathematics provides the equation (the mathematical model) that corresponds to the empirical curve. The statistician may be guided by a thought experiment in finding the corresponding equation." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Experience without theory teaches nothing." (William E Deming, "Out of the Crisis", 1986)

"A discovery in science, or a new theory, even where it appears most unitary and most all-embracing, deals with some immediate element of novelty or paradox within the framework of far vaster, unanalyzed, unarticulated reserves of knowledge, experience, faith, and presupposition. Our progress is narrow: it takes a vast world unchallenged and for granted." (James R Oppenheimer, "Atom and Void", 1989)

"It is ironic but true: the one reality science cannot reduce is the only reality we will ever know. This is why we need art. By expressing our actual experience, the artist reminds us that our science is incomplete, that no map of matter will ever explain the immateriality of our consciousness." (Jonah Lehrer, "Proust Was a Neuroscientist", 2011)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Statistical metrics can show us facts and trends that would be impossible to see in any other way, but often they’re used as a substitute for relevant experience, by managers or politicians without specific expertise or a close-up view." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The contradiction between what we see with our own eyes and what the statistics claim can be very real. […] The truth is more complicated. Our personal experiences should not be dismissed along with our feelings, at least not without further thought. Sometimes the statistics give us a vastly better way to understand the world; sometimes they mislead us. We need to be wise enough to figure out when the statistics are in conflict with everyday experience - and in those cases, which to believe." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

27 December 2018

Data Science: Experiment (Just the Quotes)

"Those who have not imbibed the prejudices of philosophers, are easily convinced that natural knowledge is to be founded on experiment and observation." (Colin Maclaurin, "An Account of Sir Isaac Newton’s Philosophical Discoveries", 1748)

"We have three principal means: observation of nature, reflection, and experiment. Observation gathers the facts reflection combines them, experiment verifies the result of the combination. It is essential that the observation of nature be assiduous, that reflection be profound, and that experimentation be exact. Rarely does one see these abilities in combination. And so, creative geniuses are not common." (Denis Diderot, "On the Interpretation of Nature", 1753)

"Facts, observations, experiments - these are the materials of a great edifice, but in assembling them we must combine them into classes, distinguish which belongs to which order and to which part of the whole each pertains." (Antoine L Lavoisier, "Mémoires de l’Académie Royale des Sciences", 1777)

"The art of drawing conclusions from experiments and observations consists in evaluating probabilities and in estimating whether they are sufficiently great or numerous enough to constitute proofs. This kind of calculation is more complicated and more difficult than it is commonly thought to be […]" (Antoine-Laurent Lavoisier, cca. 1790)

"We must trust to nothing but facts: These are presented to us by Nature, and cannot deceive. We ought, in every instance, to submit our reasoning to the test of experiment, and never to search for truth but by the natural road of experiment and observation." (Antoin-Laurent de Lavoisiere, "Elements of Chemistry", 1790)

"Conjecture may lead you to form opinions, but it cannot produce knowledge. Natural philosophy must be built upon the phenomena of nature discovered by observation and experiment." (George Adams, "Lectures on Natural and Experimental Philosophy" Vol. 1, 1794)

"[Precision] is the very soul of science; and its attainment afford the only criterion, or at least the best, of the truth of theories, and the correctness of experiments." (John F W Herschel, "A Preliminary Discourse on the Study of Natural Philosophy", 1830)

"The hypothesis, by suggesting observations and experiments, puts us upon the road to that independent evidence if it be really attainable; and till it be attained, the hypothesis ought not to count for more than a suspicion." (John S Mill, "A System of Logic, Ratiocinative and Inductive", 1843)

"The framing of hypotheses is, for the enquirer after truth, not the end, but the beginning of his work. Each of his systems is invented, not that he may admire it and follow it into all its consistent consequences, but that he may make it the occasion of a course of active experiment and observation. And if the results of this process contradict his fundamental assumptions, however ingenious, however symmetrical, however elegant his system may be, he rejects it without hesitation. He allows no natural yearning for the offspring of his own mind to draw him aside from the higher duty of loyalty to his sovereign, Truth, to her he not only gives his affections and his wishes, but strenuous labour and scrupulous minuteness of attention." (William Whewell, "Philosophy of the Inductive Sciences" Vol. 2, 1847)

"An anticipative idea or an hypothesis is, then, the necessary starting point for all experimental reasoning. Without it, we could not make any investigation at all nor learn anything; we could only pile up sterile observations. If we experiment without a preconceived idea, we should move at random […]" (Claude Bernard, "An Introduction to the Study of Experimental Medicine", 1865)

"Isolated facts and experiments have in themselves no value, however great their number may be. They only become valuable in a theoretical or practical point of view when they make us acquainted with the law of a series of uniformly recurring phenomena, or, it may be, only give a negative result showing an incompleteness in our knowledge of such a law, till then held to be perfect." (Hermann von Helmholtz, "The Aim and Progress of Physical Science", 1869)

"It is surprising to learn the number of causes of error which enter into the simplest experiment, when we strive to attain rigid accuracy." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"Even one well-made observation will be enough in many cases, just as one well-constructed experiment often suffices for the establishment of a law." (Émile Durkheim, "The Rules of Sociological Method", "The Rules of Sociological Method", 1895)

"Every experiment, every observation has, besides its immediate result, effects which, in proportion to its value, spread always on all sides into ever distant parts of knowledge." (Sir Michael Foster, "Annual Report of the Board of Regents of the Smithsonian Institution", 1898)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"An experiment is an observation that can be repeated, isolated and varied. The more frequently you can repeat an observation, the more likely are you to see clearly what is there and to describe accurately what you have seen. The more strictly you can isolate an observation, the easier does your task of observation become, and the less danger is there of your being led astray by irrelevant circumstances, or of placing emphasis on the wrong point. The more widely you can vary an observation, the more clearly will be the uniformity of experience stand out, and the better is your chance of discovering laws." (Edward B Titchener, "A Text-Book of Psychology", 1909)

"Theory is the best guide for experiment - that were it not for theory and the problems and hypotheses that come out of it, we would not know the points we wanted to verify, and hence would experiment aimlessly" (Henry Hazlitt,  "Thinking as a Science", 1916)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl Popper, "The Logic of Scientific Discovery", 1934)

"While it is true that theory often sets difficult, if not impossible tasks for the experiment, it does, on the other hand, often lighten the work of the experimenter by disclosing cogent relationships which make possible the indirect determination of inaccessible quantities and thus render difficult measurements unnecessary." (Georg Joos, "Theoretical Physics", 1934)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher, "The Design of Experiments", 1935)

"Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained from observation or experiment. The subject has a coherent structure based on the theory of Probability and includes many different procedures which contribute to research and development throughout the whole of Science and Technology." (Egon Pearson, 1936)

"Experiment as compared with mere observation has some of the characteristics of cross-examining nature rather than merely overhearing her." (Alan Gregg, "The Furtherance of Medical Research", 1941)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confidential statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"Mathematical statistics provides an exceptionally clear example of the relationship between mathematics and the external world. The external world provides the experimentally measured distribution curve; mathematics provides the equation (the mathematical model) that corresponds to the empirical curve. The statistician may be guided by a thought experiment in finding the corresponding equation." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Observation, reason, and experiment make up what we call the scientific method. (Richard Feynman, "Mainly mechanics, radiation, and heat", 1963)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"In moving from conjecture to experimental data, (D), experiments must be designed which make best use of the experimenter's current state of knowledge and which best illuminate his conjecture. In moving from data to modified conjecture, (A), data must be analyzed so as to accurately present information in a manner which is readily understood by the experimenter." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"Statistical methods are tools of scientific investigation. Scientific investigation is a controlled learning process in which various aspects of a problem are illuminated as the study proceeds. It can be thought of as a major iteration within which secondary iterations occur. The major iteration is that in which a tentative conjecture suggests an experiment, appropriate analysis of the data so generated leads to a modified conjecture, and this in turn leads to a new experiment, and so on." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"The essential function of a hypothesis consists in the guidance it affords to new observations and experiments, by which our conjecture is either confirmed or refuted." (Ernst Mach, "Knowledge and Error: Sketches on the Psychology of Enquiry", 1976)

"Theoretical scientists, inching away from the safe and known, skirting the point of no return, confront nature with a free invention of the intellect. They strip the discovery down and wire it into place in the form of mathematical models or other abstractions that define the perceived relation exactly. The now-naked idea is scrutinized with as much coldness and outward lack of pity as the naturally warm human heart can muster. They try to put it to use, devising experiments or field observations to test its claims. By the rules of scientific procedure it is then either discarded or temporarily sustained. Either way, the central theory encompassing it grows. If the abstractions survive they generate new knowledge from which further exploratory trips of the mind can be planned. Through the repeated alternation between flights of the imagination and the accretion of hard data, a mutual agreement on the workings of the world is written, in the form of natural law." (Edward O Wilson, "Biophilia", 1984)

"The only touchstone for empirical truth is experiment and observation." (Heinz Pagels, "Perfect Symmetry: The Search for the Beginning of Time", 1985)

"Any physical theory is always provisional, in the sense that it is only a hypothesis: you can never prove it. No matter how many times the results of experiments agree with some theory, you can never be sure that the next time the result will not contradict the theory." (Stephen Hawking,  "A Brief History of Time", 1988)

"Scientists use mathematics to build mental universes. They write down mathematical descriptions - models - that capture essential fragments of how they think the world behaves. Then they analyse their consequences. This is called 'theory'. They test their theories against observations: this is called 'experiment'. Depending on the result, they may modify the mathematical model and repeat the cycle until theory and experiment agree. Not that it's really that simple; but that's the general gist of it, the essence of the scientific method." (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"Clearly, science is not simply a matter of observing facts. Every scientific theory also expresses a worldview. Philosophical preconceptions determine where facts are sought, how experiments are designed, and which conclusions are drawn from them." (Nancy R Pearcey & Charles B. Thaxton, "The Soul of Science: Christian Faith and Natural Philosophy", 1994)

"Probability theory is an ideal tool for formalizing uncertainty in situations where class frequencies are known or where evidence is based on outcomes of a sufficiently long series of independent random experiments. Possibility theory, on the other hand, is ideal for formalizing incomplete information expressed in terms of fuzzy propositions." (George Klir, "Fuzzy sets and fuzzy logic", 1995)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996)

"[…] because observations are all we have, we take them seriously. We choose hard data and the framework of mathematics as our guides, not unrestrained imagination or unrelenting skepticism, and seek the simplest yet most wide-reaching theories capable of explaining and predicting the outcome of today’s and future experiments." (Brian Greene, "The Fabric of the Cosmos", 2004)

"In science, for a theory to be believed, it must make a prediction - different from those made by previous theories - for an experiment not yet done. For the experiment to be meaningful, we must be able to get an answer that disagrees with that prediction. When this is the case, we say that a theory is falsifiable - vulnerable to being shown false. The theory also has to be confirmable, it must be possible to verify a new prediction that only this theory makes. Only when a theory has been tested and the results agree with the theory do we advance the statement to the rank of a true scientific theory." (Lee Smolin, "The Trouble with Physics", 2006)

"Observation and experiment, without a rational hypothesis, is like a man groping at objects at random with his eyes shut." (Henry P Tappan, "Elements of Logic", 2015)

"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"If your experiment needs statistics, you ought to have done a better experiment." (Ernest Rutherford)

More quotes on "Experiment" at the-web-of-knowledge.blogspot.com

26 December 2018

Data Science: Precision (Just the Quotes)

"Simplicity and precision ought to be the characteristics of a scientific nomenclature: words should signify things, or the analogies of things, and not opinions." (Sir Humphry Davy, Elements of Chemical Philosophy", 1812)

"[Precision] is the very soul of science; and its attainment afford the only criterion, or at least the best, of the truth of theories, and the correctness of experiments." (John F W Herschel, "A Preliminary Discourse on the Study of Natural Philosophy", 1830)

"Numerical facts, like other facts, are but the raw materials of knowledge, upon which our reasoning faculties must be exerted in order to draw forth the principles of nature. [...] Numerical precision is the soul of science [...]" (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"One is almost tempted to assert that quite apart from its intellectual mission, theory is the most practical thing conceivable, the quintessence of practice as it were, since the precision of its conclusions cannot be reached by any routine of estimating or trial and error; although given the hidden ways of theory, this will hold only for those who walk them with complete confidence." (Ludwig E Boltzmann, "On the Significance of Theories", 1890)

"Physical research by experimental methods is both a broadening and a narrowing field. There are many gaps yet to be filled, data to be accumulated, measurements to be made with great precision, but the limits within which we must work are becoming, at the same time, more and more defined." (Elihu Thomson, "Annual Report of the Board of Regents of the Smithsonian Institution", 1899)

"The apodictic quality of mathematical thought, the certainty and correctness of its conclusions, are due, not to a special mode of ratiocination, but to the character of the concepts with which it deals. What is that distinctive characteristic? I answer: precision, sharpness, completeness of definition. But how comes your mathematician by such completeness? There is no mysterious trick involved; some ideas admit of such precision, others do not; and the mathematician is one who deals with those that do." (Cassius J Keyser, "The Universe and Beyond", Hibbert Journal Vol. 3, 1904–1905)

"It is difficult to find an intelligible account of the meaning of ‘probability’, or of how we are ever to determine the probability of any particular proposition; and yet treatises on the subject profess to arrive at complicated results of the greatest precision and the most profound practical importance." (John M Keynes, "A Treatise on Probability", 1921)

"It is never possible to predict a physical occurrence with unlimited precision." (Max Planck, "A Scientific Autobiography", 1949)

"Precision is expressed by an international standard, viz., the standard error. It measures the average of the difference between a complete coverage and a long series of estimates formed from samples drawn from this complete coverage by a particular procedure or drawing, and processed by a particular estimating formula." (W Edwards Deming, "On the Presentation of the Results of Sample Surveys as Legal Evidence", Journal of the American Statistical Association Vol 49 (268), 1954)

"Scientists whose work has no clear, practical implications would want to make their decisions considering such things as: the relative worth of (1) more observations, (2) greater scope of his conceptual model, (3) simplicity, (4) precision of language, (5) accuracy of the probability assignment." (C West Churchman, "Costs, Utilities, and Values", 1956)

"The precision of a number is the degree of exactness with which it is stated, while the accuracy of a number is the degree of exactness with which it is known or observed. The precision of a quantity is reported by the number of significant figures in it." (Edmund C Berkeley & Lawrence Wainwright, Computers: Their Operation and Applications", 1956)

"The two most important characteristics of the language of statistics are first, that it describes things in quantitative terms, and second, that it gives this description an air of accuracy and precision." (Ely Devons, "Essays in Economics", 1961)

"We all know that in economic statistics particularly, true precision, comparability and accuracy is extremely difficult to achieve, and it is for this reason that the language of economic statistics is so difficult to handle." (Ely Devons, "Essays in Economics", 1961)

"It is of course desirable to work with manageable models which maximize generality, realism, and precision toward the overlapping but not identical goals of understanding, predicting, and modifying nature. But this cannot be done." (Richard Levins, "The strategy of model building in population biology", American Scientist Vol. 54 (4), 1966) 

"In general, complexity and precision bear an inverse relation to one another in the sense that, as the complexity of a problem increases, the possibility of analysing it in precise terms diminishes. Thus 'fuzzy thinking' may not be deplorable, after all, if it makes possible the solution of problems which are much too complex for precise analysis." (Lotfi A Zadeh, "Fuzzy languages and their relation to human intelligence", 1972)

"As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics." (Lotfi A Zadeh, 1973)

"Simplicity is worth buying if we do not have to pay too great a loss of precision for it." (George Pólya, "Mathematical Methods in Science", 1977)

"Computational reducibility may well be the exception rather than the rule: Most physical questions may be answerable only through irreducible amounts of computation. Those that concern idealized limits of infinite time, volume, or numerical precision can require arbitrarily long computations, and so be formally undecidable." (Stephen Wolfram, Undecidability and intractability in theoretical physics", Physical Review Letters 54 (8), 1985)

"Negative feedback only improves the precision of goal-seeking, but does not determine it. Feedback devices are only executive mechanisms that operate during the translation of a program." (Ernst Mayr, "Toward a New Philosophy of Biology: Observations of an Evolutionist", 1988)

"A mathematical model uses mathematical symbols to describe and explain the represented system. Normally used to predict and control, these models provide a high degree of abstraction but also of precision in their application." (Lars Skyttner, "General Systems Theory: Ideas and Applications", 2001)

"Precision does not vary linearly with increasing sample size. As is well known, the width of a confidence interval is a function of the square root of the number of observations. But it is more complicate than that. The basic elements determining a confidence interval are the sample size, an estimate of variability, and a pivotal variable associated with the estimate of variability." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Statistics can certainly pronounce a fact, but they cannot explain it without an underlying context, or theory. Numbers have an unfortunate tendency to supersede other types of knowing. […] Numbers give the illusion of presenting more truth and precision than they are capable of providing." (Ronald J Baker, "Measure what Matters to Customers: Using Key Predictive Indicators", 2006)

"[myth:] Accuracy is more important than precision. For single best estimates, be it a mean value or a single data value, this question does not arise because in that case there is no difference between accuracy and precision. (Think of a single shot aimed at a target.) Generally, it is good practice to balance precision and accuracy. The actual requirements will differ from case to case." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Popular accounts of mathematics often stress the discipline’s obsession with certainty, with proof. And mathematicians often tell jokes poking fun at their own insistence on precision. However, the quest for precision is far more than an end in itself. Precision allows one to reason sensibly about objects outside of ordinary experience. It is a tool for exploring possibility: about what might be, as well as what is." (Donal O’Shea, “The Poincaré Conjecture”, 2007)

"Precision and recall are ways of monitoring the power of the machine learning implementation. Precision is a metric that monitors the percentage of true positives. […] Recall is the ratio of true positives to true positive plus false negatives." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

24 December 2018

Data Science: Phenomena (Just the Quotes)

"The word ‘chance’ then expresses only our ignorance of the causes of the phenomena that we observe to occur and to succeed one another in no apparent order. Probability is relative in part to this ignorance, and in part to our knowledge.” (Pierre-Simon Laplace, "Mémoire sur les Approximations des Formules qui sont Fonctions de Très Grands Nombres", 1783)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Compte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"The insights gained and garnered by the mind in its wanderings among basic concepts are benefits that theory can provide. Theory cannot equip the mind with formulas for solving problems, nor can it mark the narrow path on which the sole solution is supposed to lie by planting a hedge of principles on either side. But it can give the mind insight into the great mass of phenomena and of their relationships, then leave it free to rise into the higher realms of action." (Carl von Clausewitz, "On War", 1832)

"Theories usually result from the precipitate reasoning of an impatient mind which would like to be rid of phenomena and replace them with images, concepts, indeed often with mere words." (Johann Wolfgang von Goethe, "Maxims and Reflections", 1833)

"[…] in order to observe, our mind has need of some theory or other. If in contemplating phenomena we did not immediately connect them with principles, not only would it be impossible for us to combine these isolated observations, and therefore to derive profit from them, but we should even be entirely incapable of remembering facts, which would for the most remain unnoted by us." (Auguste Comte, "Cours de Philosophie Positive", 1830-1842)

"The dimmed outlines of phenomenal things all merge into one another unless we put on the focusing-glass of theory, and screw it up sometimes to one pitch of definition and sometimes to another, so as to see down into different depths through the great millstone of the world." (James C Maxwell, "Are There Real Analogies in Nature?", 1856) 

"Isolated facts and experiments have in themselves no value, however great their number may be. They only become valuable in a theoretical or practical point of view when they make us acquainted with the law of a series of uniformly recurring phenomena, or, it may be, only give a negative result showing an incompleteness in our knowledge of such a law, till then held to be perfect." (Hermann von Helmholtz, "The Aim and Progress of Physical Science", 1869)

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws." (Émile Cheysson, cca. 1877) 

"Most surprising and far-reaching analogies revealed themselves between apparently quite disparate natural processes. It seemed that nature had built the most various things on exactly the same pattern; or, in the dry words of the analyst, the same differential equations hold for the most various phenomena." (Ludwig Boltzmann, "On the methods of theoretical physics", 1892)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"The principle of complementarity states that no single model is possible which could provide a precise and rational analysis of the connections between these phenomena [before and after measurement]. In such a case, we are not supposed, for example, to attempt to describe in detail how future phenomena arise out of past phenomena. Instead, we should simply accept without further analysis the fact that future phenomena do in fact somehow manage to be produced, in a way that is, however, necessarily beyond the possibility of a detailed description. The only aim of a mathematical theory is then to predict the statistical relations, if any, connecting the phenomena." (David Bohm, "A Suggested Interpretation of the Quantum Theory in Terms of ‘Hidden’ Variables", 1952)

"The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work" (John Von Neumann, "Method in the Physical Sciences", 1955)

"As shorthand, when the phenomena are suitably simple, words such as equilibrium and stability are of great value and convenience. Nevertheless, it should be always borne in mind that they are mere shorthand, and that the phenomena will not always have the simplicity that these words presuppose." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"Can there be laws of chance? The answer, it would seem should be negative, since chance is in fact defined as the characteristic of the phenomena which follow no law, phenomena whose causes are too complex to permit prediction." (Félix E Borel, "Probabilities and Life", 1962)

"Theories are usually introduced when previous study of a class of phenomena has revealed a system of uniformities. […] Theories then seek to explain those regularities and, generally, to afford a deeper and more accurate understanding of the phenomena in question. To this end, a theory construes those phenomena as manifestations of entities and processes that lie behind or beneath them, as it were." (Carl G Hempel, "Philosophy of Natural Science", 1966)

"The less we understand a phenomenon, the more variables we require to explain it." (Russell L Ackoff, "Management Science", 1967)

 "As soon as we inquire into the reasons for the phenomena, we enter the domain of theory, which connects the observed phenomena and traces them back to a single ‘pure’ phenomena, thus bringing about a logical arrangement of an enormous amount of observational material." (Georg Joos, "Theoretical Physics", 1968)

"A model is an abstract description of the real world. It is a simple representation of more complex forms, processes and functions of physical phenomena and ideas." (Moshe F Rubinstein & Iris R Firstenberg, "Patterns of Problem Solving", 1975)

"A real change of theory is not a change of equations - it is a change of mathematical structure, and only fragments of competing theories, often not very important ones conceptually, admit comparison with each other within a limited range of phenomena." (Yuri I Manin, "Mathematics and Physics", 1981)

"In all scientific fields, theory is frequently more important than experimental data. Scientists are generally reluctant to accept the existence of a phenomenon when they do not know how to explain it. On the other hand, they will often accept a theory that is especially plausible before there exists any data to support it." (Richard Morris, 1983)

"Nature is disordered, powerful and chaotic, and through fear of the chaos we impose system on it. We abhor complexity, and seek to simplify things whenever we can by whatever means we have at hand. We need to have an overall explanation of what the universe is and how it functions. In order to achieve this overall view we develop explanatory theories which will give structure to natural phenomena: we classify nature into a coherent system which appears to do what we say it does." (James Burke, "The Day the Universe Changed", 1985) 

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"[…] the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth." (Richard Swinburne, "Simplicity as Evidence for Truth", 1997)

"The point is that scientific descriptions of phenomena in all of these cases do not fully capture reality they are models. This is not a shortcoming but a strength of science much of the scientist's art lies in figuring out what to include and what to exclude in a model, and this ability allows science to make useful predictions without getting bogged down by intractable details." (Philip Ball," The Self-Made Tapestry: Pattern Formation in Nature", 1998)

"A scientific theory is a concise and coherent set of concepts, claims, and laws (frequently expressed mathematically) that can be used to precisely and accurately explain and predict natural phenomena." (Mordechai Ben-Ari, "Just a Theory: Exploring the Nature of Science", 2005)

"Complexity arises when emergent system-level phenomena are characterized by patterns in time or a given state space that have neither too much nor too little form. Neither in stasis nor changing randomly, these emergent phenomena are interesting, due to the coupling of individual and global behaviours as well as the difficulties they pose for prediction. Broad patterns of system behaviour may be predictable, but the system's specific path through a space of possible states is not." (Steve Maguire et al, "Complexity Science and Organization Studies", 2006)

"Humans have difficulty perceiving variables accurately […]. However, in general, they tend to have inaccurate perceptions of system states, including past, current, and future states. This is due, in part, to limited ‘mental models’ of the phenomena of interest in terms of both how things work and how to influence things. Consequently, people have difficulty determining the full implications of what is known, as well as considering future contingencies for potential systems states and the long-term value of addressing these contingencies. " (William B. Rouse, "People and Organizations: Explorations of Human-Centered Design", 2007)

"A theory is a speculative explanation of a particular phenomenon which derives it legitimacy from conforming to the primary assumptions of the worldview of the culture in which it appears. There can be more than one theory for a particular phenomenon that conforms to a given worldview. […]  A new theory may seem to trigger a change in worldview, as in this case, but logically a change in worldview must precede a change in theory, otherwise the theory will not be viable. A change in worldview will necessitate a change in all theories in all branches of study." (M G Jackson, "Transformative Learning for a New Worldview: Learning to Think Differently", 2008)

"[...] construction of a data model is precisely the selective relevant depiction of the phenomena by the user of the theory required for the possibility of representation of the phenomenon."  (Bas C van Fraassen, "Scientific Representation: Paradoxes of Perspective", 2008)

"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)

"A theory is a set of deductively closed propositions that explain and predict empirical phenomena, and a model is a theory that is idealized." (Jay Odenbaugh, "True Lies: Realism, Robustness, and Models", Philosophy of Science, Vol. 78, No. 5, 2011)

"Mathematical modeling is the modern version of both applied mathematics and theoretical physics. In earlier times, one proposed not a model but a theory. By talking today of a model rather than a theory, one acknowledges that the way one studies the phenomenon is not unique; it could also be studied other ways. One's model need not claim to be unique or final. It merits consideration if it provides an insight that isn't better provided by some other model." (Reuben Hersh, ”Mathematics as an Empirical Phenomenon, Subject to Modeling”, 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"The first epistemic principle to embrace is that there is always a gap between our data and the real world. We fall headfirst into a pitfall when we forget that this gap exists, that our data isn't a perfect reflection of the real-world phenomena it's representing. Do people really fail to remember this? It sounds so basic. How could anyone fall into such an obvious trap?" (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

"Although to penetrate into the intimate mysteries of nature and hence to learn the true causes of phenomena is not allowed to us, nevertheless it can happen that a certain fictive hypothesis may suffice for explaining many phenomena." (Leonhard Euler)

Data Science: Data Mining (Just the Quotes)

"Data mining is the efficient discovery of valuable, nonobvious information from a large collection of data. […] Data mining centers on the automated discovery of new facts and relationships in data. The idea is that the raw material is the business data, and the data mining algorithm is the excavator, sifting through the vast quantities of raw data looking for the valuable nuggets of business information." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Data mining is more of an art than a science. No one can tell you exactly how to choose columns to include in your data mining models. There are no hard and fast rules you can follow in deciding which columns either help or hinder the final model. For this reason, it is important that you understand how the data behaves before beginning to mine it. The best way to achieve this level of understanding is to see how the data is distributed across columns and how the different columns relate to one another. This is the process of exploring the data." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Most mainstream data-mining techniques ignore the fact that real-world datasets are combinations of underlying data, and build single models from them. If such datasets can first be separated into the components that underlie them, we might expect that the quality of the models will improve significantly." (David Skillicorn, "Understanding Complex Datasets: Data Mining with Matrix Decompositions", 2007)

"The name ‘data mining’ derives from the metaphor of data as something that is large, contains far too much detail to be used as it is, but contains nuggets of useful information that can have value. So data mining can be defined as the extraction of the valuable information and actionable knowledge that is implicit in large amounts of data." (David Skillicorn, "Understanding Complex Datasets: Data Mining with Matrix Decompositions", 2007)

"Compared to traditional statistical studies, which are often hindsight, the field of data mining finds patterns and classifications that look toward and even predict the future. In summary, data mining can (1) provide a more complete understanding of data by finding patterns previously not seen and (2) make models that predict, thus enabling people to make better decisions, take action, and therefore mold future events." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Traditional statistical studies use past information to determine a future state of a system (often called prediction), whereas data mining studies use past information to construct patterns based not solely on the input data, but also on the logical consequences of those data. This process is also called prediction, but it contains a vital element missing in statistical analysis: the ability to provide an orderly expression of what might be in the future, compared to what was in the past (based on the assumptions of the statistical method)." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"The difference between human dynamics and data mining boils down to this: Data mining predicts our behaviors based on records of our patterns of activity; we don't even have to understand the origins of the patterns exploited by the algorithm. Students of human dynamics, on the other hand, seek to develop models and theories to explain why, when, and where we do the things we do with some regularity." (Albert-László Barabási, "Bursts: The Hidden Pattern Behind Everything We Do", 2010)

"Data mining is a craft. As with many crafts, there is a well-defined process that can help to increase the likelihood of a successful result. This process is a crucial conceptual tool for thinking about data science projects. [...] data mining is an exploratory undertaking closer to research and development than it is to engineering." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"There is another important distinction pertaining to mining data: the difference between (1) mining the data to find patterns and build models, and (2) using the results of data mining. Students often confuse these two processes when studying data science, and managers sometimes confuse them when discussing business analytics. The use of data mining results should influence and inform the data mining process itself, but the two should be kept distinct." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Unfortunately, creating an objective function that matches the true goal of the data mining is usually impossible, so data scientists often choose based on faith and experience." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Data Mining is the art and science of discovering useful innovative patterns from data." (Anil K. Maheshwari, "Business Intelligence and Data Mining", 2015)

"Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. Each of these is used by different communities and has different associations. Some have a long half-life, some less so." (Pedro Domingos, "The Master Algorithm", 2015)

"Today we routinely learn models with millions of parameters, enough to give each elephant in the world his own distinctive wiggle. It’s even been said that data mining means 'torturing the data until it confesses'." (Pedro Domingos, "The Master Algorithm", 2015)

"Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. 'Structure' has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Symmetries directly point to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. As our data models change, so too do our perspectives on analysing data." (Fionn Murtagh, "Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

About Me

IT Professional with more than 24 years of experience in IT, covering the full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.