Showing posts with label measurement. Show all posts

19 December 2018

Data Science: Errors in Statistics (Just the Quotes)

"[It] may be laid down as a general rule that, if the result of a long series of precise observations approximates a simple relation so closely that the remaining difference is undetectable by observation and may be attributed to the errors to which they are liable, then this relation is probably that of nature." (Pierre-Simon Laplace, "Mémoire sur les Inégalités Séculaires des Planètes et des Satellites", 1787)

"It is surprising to learn the number of causes of error which enter into the simplest experiment, when we strive to attain rigid accuracy." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"Some of the common ways of producing a false statistical argument are to quote figures without their context, omitting the cautions as to their incompleteness, or to apply them to a group of phenomena quite different to that to which they in reality relate; to take these estimates referring to only part of a group as complete; to enumerate the events favorable to an argument, omitting the other side; and to argue hastily from effect to cause, this last error being the one most often fathered on to statistics. For all these elementary mistakes in logic, statistics is held responsible." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (1) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." (William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"We know not to what are due the accidental errors, and precisely because we do not know, we are aware they obey the law of Gauss. Such is the paradox." (Henri Poincaré, "The Foundations of Science", 1913)

"No observations are absolutely trustworthy. In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. If a reading is found to lie a very long way from its fellows in a series of replicate observations, there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. [...] One sufficiently erroneous reading can wreck the whole of a statistical analysis, however many observations there are." (Francis J Anscombe, "Rejection of Outliers", Technometrics Vol. 2 (2), 1960)
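
Anscombe's last point can be illustrated with a minimal sketch. The readings below are invented for illustration and are not from his paper: a single gross error drags the mean far from the bulk of the data, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical replicate readings; the values are invented for illustration.
readings = [10.1, 9.9, 10.0, 10.2, 9.8]

# The same series with one gross error appended (e.g. a misplaced decimal point).
contaminated = readings + [1000.0]

clean_mean = mean(readings)
bad_mean = mean(contaminated)
clean_median = median(readings)
bad_median = median(contaminated)

print(f"mean:   {clean_mean:.2f} -> {bad_mean:.2f}")
print(f"median: {clean_median:.2f} -> {bad_median:.2f}")
```

The median is used here only as one familiar robust summary; the quote itself does not prescribe any particular remedy.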

"It might be reasonable to expect that the more we know about any set of statistics, the greater the confidence we would have in using them, since we would know in which directions they were defective; and that the less we know about a set of figures, the more timid and hesitant we would be in using them. But, in fact, it is the exact opposite which is normally the case; in this field, as in many others, knowledge leads to caution and hesitation, it is ignorance that gives confidence and boldness. For knowledge about any set of statistics reveals the possibility of error at every stage of the statistical process; the difficulty of getting complete coverage in the returns, the difficulty of framing answers precisely and unequivocally, doubts about the reliability of the answers, arbitrary decisions about classification, the roughness of some of the estimates that are made before publishing the final results. Knowledge of all this, and much else, in detail, about any set of figures makes one hesitant and cautious, perhaps even timid, in using them." (Ely Devons, "Essays in Economics", 1961)

"The art of using the language of figures correctly is not to be over-impressed by the apparent ai

"Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Linear regression assumes that in the population a normal distribution of error values around the predicted Y is associated with each X value, and that the dispersion of the error values for each X value is the same. The assumptions imply normal and similarly dispersed error distributions." (Fred C Pampel, "Linear Regression: A primer", 2000)

"Compound errors can begin with any of the standard sorts of bad statistics - a guess, a poor sample, an inadvertent transformation, perhaps confusion over the meaning of a complex statistic. People inevitably want to put statistics to use, to explore a number's implications. [...] The strengths and weaknesses of those original numbers should affect our confidence in the second-generation statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Trimming potentially theoretically meaningful variables is not advisable unless one is quite certain that the coefficient for the variable is near zero, that the variable is inconsequential, and that trimming will not introduce misspecification error." (James Jaccard, "Interaction Effects in Logistic Regression", 2001)

"The central limit theorem says that, under conditions almost always satisfied in the real world of experimentation, the distribution of such a linear function of errors will tend to normality as the number of its components becomes large. The tendency to normality occurs almost regardless of the individual distributions of the component errors. An important proviso is that several sources of error must make important contributions to the overall error and that no particular source of error dominate the rest." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"Two things explain the importance of the normal distribution: (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'. (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"There are many ways for error to creep into facts and figures that seem entirely straightforward. Quantities can be miscounted. Small samples can fail to accurately reflect the properties of the whole population. Procedures used to infer quantities from other information can be faulty. And then, of course, numbers can be total bullshit, fabricated out of whole cloth in an effort to confer credibility on an otherwise flimsy argument. We need to keep all of these things in mind when we look at quantitative claims. They say the data never lie - but we need to remember that the data often mislead." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Always expect to find at least one error when you proofread your own statistics. If you don’t, you are probably making the same mistake twice." (Cheryl Russell)

[Murphy’s Laws of Analysis:] "(1) In any collection of data, the figures that are obviously correct contain errors. (2) It is customary for a decimal to be misplaced. (3) An error that can creep into a calculation, will. Also, it will always be in the direction that will cause the most damage to the calculation." (G C Deakly)

Data Science: Sampling (Just the Quotes)

"By a small sample we may judge of the whole piece." (Miguel de Cervantes, "Don Quixote de la Mancha", 1605–1615)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (1) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." (William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"The postulate of randomness thus resolves itself into the question, 'of what population is this a random sample?' which must frequently be asked by every practical statistician." (Ronald Fisher, "On the Mathematical Foundation of Theoretical Statistics", Philosophical Transactions of the Royal Society of London Vol. A222, 1922)

"The principle underlying sampling is that a set of objects taken at random from a larger group tends to reproduce the characteristics of that larger group: this is called the Law of Statistical Regularity. There are exceptions to this rule, and a certain amount of judgment must be exercised, especially when there are a few abnormally large items in the larger group. With erratic data, the accuracy of sampling can often be tested by comparing several samples. On the whole, the larger the sample the more closely will it tend to resemble the population from which it is taken; too small a sample would not give reliable results." (Lewis R Connor, "Statistics in Theory and Practice", 1932)

"If the chance of error alone were the sole basis for evaluating methods of inference, we would never reach a decision, but would merely keep increasing the sample size indefinitely." (C West Churchman, "Theory of Experimental Inference", 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper?", Journal of the American Statistical Association 44, 1949)

"A good sample-design is lost if it is not carried out according to plans." (W Edwards Deming, "Some Theory of Sampling", 1950)

"Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability." (William E Deming, "Some Theory of Sampling", 1950)

"Almost any sort of inquiry that is general and not particular involves both sampling and measurement […]. Further, both the measurement and the sampling will be imperfect in almost every case. We can define away either imperfection in certain cases. But the resulting appearance of perfection is usually only an illusion." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"By sampling we can learn only about collective properties of populations, not about properties of individuals. We can study the average height, the percentage who wear hats, or the variability in weight of college juniors [...]. The population we study may be small or large, but there must be a population - and what we are studying must be a population characteristic. By sampling, we cannot study individuals as particular entities with unique idiosyncrasies; we can study regularities (including typical variabilities as well as typical levels) in a population as exemplified by the individuals in the sample." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"In many cases general probability samples can be thought of in terms of (1) a subdivision of the population into strata, (2) a self-weighting probability sample in each stratum, and (3) combination of the stratum sample means weighted by the size of the stratum." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)
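
The three steps in this quote can be sketched in a few lines, under some assumptions: the function name `stratified_mean`, the stratum sizes, and the sample values are all hypothetical, and each stratum sample is treated as self-weighting, so its plain average serves as the stratum mean.

```python
def stratified_mean(strata):
    """Combine stratum sample means, weighting each by its stratum size.

    strata: list of (stratum_population_size, sample_values) pairs.
    """
    total_size = sum(size for size, _ in strata)
    weighted_sum = sum(size * (sum(values) / len(values)) for size, values in strata)
    return weighted_sum / total_size


# Hypothetical population of 1000 units split into two strata.
strata = [
    (800, [2.0, 4.0]),          # large stratum, sample mean 3.0
    (200, [10.0, 12.0, 14.0]),  # small stratum, sample mean 12.0
]

estimate = stratified_mean(strata)

# For comparison: pooling all sampled values ignores the stratum sizes.
pooled = [v for _, values in strata for v in values]
naive = sum(pooled) / len(pooled)

print(f"stratified estimate: {estimate:.2f}, unweighted pooled mean: {naive:.2f}")
```

Because the small stratum is over-represented in this made-up sample, the pooled mean overstates the population mean, and weighting by stratum size corrects for that.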

"Precision is expressed by an international standard, viz., the standard error. It measures the average of the difference between a complete coverage and a long series of estimates formed from samples drawn from this complete coverage by a particular procedure or drawing, and processed by a particular estimating formula." (W Edwards Deming, "On the Presentation of the Results of Sample Surveys as Legal Evidence", Journal of the American Statistical Association Vol 49 (268), 1954)

"The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it." (Darrell Huff, "How to Lie with Statistics", 1954)

"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Darrell Huff, "How to Lie with Statistics", 1954)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"[A] sequence is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution." (Joel N Franklin, 1962)

"Weighting a sample appropriately is no more fudging the data than is correcting a gas volume for barometric pressure." (Frederick Mosteller, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"Entropy theory is indeed a first attempt to deal with global form; but it has not been dealing with structure. All it says is that a large sum of elements may have properties not found in a smaller sample of them." (Rudolf Arnheim, "Entropy and Art: An Essay on Disorder and Order", 1974) 

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...] For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"The law of truly large numbers states: With a large enough sample, any outrageous thing is likely to happen." (Frederick Mosteller, "Methods for Studying Coincidences", Journal of the American Statistical Association Vol. 84, 1989)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)

"When looking at the end result of any statistical analysis, one must be very cautious not to over interpret the data. Care must be taken to know the size of the sample, and to be certain the method for gathering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"Forget 'large-sample' methods. In the real world of experiments samples are so nearly always 'small' that it is not worth making any distinction, and small-sample methods are no harder to apply." (George Dyke, "How to avoid bad statistics", 1997)

"When the sample size is small or the study is of one organization, descriptive use of the thematic coding is desirable." (Richard Boyatzis, "Transforming qualitative information", 1998)

"Statisticians can calculate the probability that such random samples represent the population; this is usually expressed in terms of sampling error [...]. The real problem is that few samples are random. Even when researchers know the nature of the population, it can be time-consuming and expensive to draw a random sample; all too often, it is impossible to draw a true random sample because the population cannot be defined. This is particularly true for studies of social problems. [...] The best samples are those that come as close as possible to being random." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"There are two problems with sampling - one obvious, and the other more subtle. The obvious problem is sample size. Samples tend to be much smaller than their populations. [...] Obviously, it is possible to question results based on small samples. The smaller the sample, the less confidence we have that the sample accurately reflects the population. However, large samples aren't necessarily good samples. This leads to the second issue: the representativeness of a sample is actually far more important than sample size. A good sample accurately reflects (or 'represents') the population." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)

"Be careful not to confuse clustering and stratification. Even though both of these sampling strategies involve dividing the population into subgroups, both the way in which the subgroups are sampled and the optimal strategy for creating the subgroups are different. In stratified sampling, we sample from every stratum, whereas in cluster sampling, we include only selected whole clusters in the sample. Because of this difference, to increase the chance of obtaining a sample that is representative of the population, we want to create homogeneous groups for strata and heterogeneous (reflecting the variability in the population) groups for clusters." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Bias in sampling is the tendency for samples to differ from the corresponding population in some systematic way. Bias can result from the way in which the sample is selected or from the way in which information is obtained once the sample has been chosen. The most common types of bias encountered in sampling situations are selection bias, measurement or response bias, and nonresponse bias." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The goal of random sampling is to produce a sample that is likely to be representative of the population. Although random sampling does not guarantee that the sample will be representative, it does allow us to assess the risk of an unrepresentative sample. It is the ability to quantify this risk that will enable us to generalize with confidence from a random sample to the corresponding population." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"If you study one group and assume that your results apply to other groups, this is extrapolation. If you think you are studying one group, but do not manage to obtain a representative sample of that group, this is a different problem. It is a problem so important in statistics that it has a special name: selection bias. Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"There are many ways for error to creep into facts and figures that seem entirely straightforward. Quantities can be miscounted. Small samples can fail to accurately reflect the properties of the whole population. Procedures used to infer quantities from other information can be faulty. And then, of course, numbers can be total bullshit, fabricated out of whole cloth in an effort to confer credibility on an otherwise flimsy argument. We need to keep all of these things in mind when we look at quantitative claims. They say the data never lie - but we need to remember that the data often mislead." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

More quotes on "Sampling" at the-web-of-knowledge.blogspot.com.

17 December 2018

Data Science: Method (Just the Quotes)

"There are two aspects of statistics that are continually mixed, the method and the science. Statistics are used as a method, whenever we measure something, for example, the size of a district, the number of inhabitants of a country, the quantity or price of certain commodities, etc. […] There is, moreover, a science of statistics. It consists of knowing how to gather numbers, combine them and calculate them, in the best way to lead to certain results. But this is, strictly speaking, a branch of mathematics." (Alphonse P de Candolle, "Considerations on Crime Statistics", 1833)

"The process of discovery is very simple. An unwearied and systematic application of known laws to nature, causes the unknown to reveal themselves. Almost any mode of observation will be successful at last, for what is most wanted is method." (Henry D Thoreau, "A Week on the Concord and Merrimack Rivers", 1862)

"As systematic unity is what first raises ordinary knowledge to the rank of science, that is, makes a system out of a mere aggregate of knowledge, architectonic is the doctrine of the scientific in our knowledge, and therefore necessarily forms part of the doctrine of method." (Immanuel Kant, "Critique of Pure Reason", 1871)

"Nothing is more certain in scientific method than that approximate coincidence alone can be expected. In the measurement of continuous quantity perfect correspondence must be accidental, and should give rise to suspicion rather than to satisfaction." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"The object of statistical science is to discover methods of condensing information concerning large groups of allied facts into brief and compendious expressions suitable for discussion. The possibility of doing this is based on the constancy and continuity with which objects of the same species are found to vary." (Sir Francis Galton, "Inquiries into Human Faculty and Its Development, Statistical Methods", 1883)

"Physical research by experimental methods is both a broadening and a narrowing field. There are many gaps yet to be filled, data to be accumulated, measurements to be made with great precision, but the limits within which we must work are becoming, at the same time, more and more defined." (Elihu Thomson, "Annual Report of the Board of Regents of the Smithsonian Institution", 1899)

"A statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it is likely to be more accurate than a casual observer’s impression, and the nature of things can only be disproved by statistical methods." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"A method is a dangerous thing unless its underlying philosophy is understood, and none more dangerous than the statistical. […] Over-attention to technique may actually blind one to the dangers that lurk about on every side - like the gambler who ruins himself with his system carefully elaborated to beat the game. In the long run it is only clear thinking, experienced methods, that win the strongholds of science." (Edwin B Wilson, "The Statistical Significance of Experimental Data", Science, Volume 58 (1493), 1923)

"[…] the methods of statistics are so variable and uncertain, so apt to be influenced by circumstances, that it is never possible to be sure that one is operating with figures of equal weight." (Havelock Ellis, "The Dance of Life", 1923)

"Statistics may be regarded as (i) the study of populations, (ii) as the study of variation, and (iii) as the study of methods of the reduction of data." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"Science is but a method. Whatever its material, an observation accurately made and free of compromise to bias and desire, and undeterred by consequence, is science." (Hans Zinsser, "Untheological Reflections", The Atlantic Monthly, 1929)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"The fundamental difference between engineering with and without statistics boils down to the difference between the use of a scientific method based upon the concept of laws of nature that do not allow for chance or uncertainty and a scientific method based upon the concepts of laws of probability as an attribute of nature." (Walter A Shewhart, 1940)

"[Statistics] is both a science and an art. It is a science in that its methods are basically systematic and have general application; and an art in that their successful application depends to a considerable degree on the skill and special experience of the statistician, and on his knowledge of the field of application, e.g. economics." (Leonard H C Tippett, "Statistics", 1943)

"Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena. In this definition 'natural phenomena' includes all the happenings of the external world, whether human or not." (Sir Maurice G Kendall, "Advanced Theory of Statistics", Vol. 1, 1943)

"We can scarcely imagine a problem absolutely new, unlike and unrelated to any formerly solved problem; but if such a problem could exist, it would be insoluble. In fact, when solving a problem, we should always profit from previously solved problems, using their result or their method, or the experience acquired in solving them." (George Polya, 1945)

"The enthusiastic use of statistics to prove one side of a case is not open to criticism providing the work is honestly and accurately done, and providing the conclusions are not broader than indicated by the data. This type of work must not be confused with the unfair and dishonest use of both accurate and inaccurate data, which too commonly occurs in business. Dishonest statistical work usually takes the form of: (1) deliberate misinterpretation of data; (2) intentional making of overestimates or underestimates; and (3) biasing results by using partial data, making biased surveys, or using wrong statistical methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1951)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"The methods of science may be described as the discovery of laws, the explanation of laws by theories, and the testing of theories by new observations. A good analogy is that of the jigsaw puzzle, for which the laws are the individual pieces, the theories local patterns suggested by a few pieces, and the tests the completion of these patterns with pieces previously unconsidered." (Edwin P Hubble, "The Nature of Science and Other Lectures", 1954)

"We have to remember that what we observe is not nature herself, but nature exposed to our method of questioning." (Werner K Heisenberg, "Physics and Philosophy: The revolution in modern science", 1958)

"We are committed to the scientific method, and measurement is the foundation of that method; hence we are prone to assume that whatever is measurable must be significant and that whatever cannot be measured may as well be disregarded." (Joseph W Krutch, "Human Nature and the Human Condition", 1959)

"Scientific method is the way to truth, but it affords, even in principle, no unique definition of truth. Any so-called pragmatic definition of truth is doomed to failure equally." (Willard v O Quine, "Word and Object", 1960)

"Observation, reason, and experiment make up what we call the scientific method." (Richard Feynman, "Mainly mechanics, radiation, and heat", 1963)

"Engineering is the art of skillful approximation; the practice of gamesmanship in the highest form. In the end it is a method broad enough to tame the unknown, a means of combing disciplined judgment with intuition, courage with responsibility, and scientific competence within the practical aspects of time, of cost, and of talent." (Ronald B Smith, "Professional Responsibility of Engineering", Mechanical Engineering Vol. 86 (1), 1964)

"Statistics is a body of methods and theory applied to numerical evidence in making decisions in the face of uncertainty." (Lawrence Lapin, "Statistics for Modern Business Decisions", 1973)

"Statistical methods of analysis are intended to aid the interpretation of data that are subject to appreciable haphazard variability." (David V. Hinkley & David Cox, "Theoretical Statistics", 1974)

"Scientists use mathematics to build mental universes. They write down mathematical descriptions - models - that capture essential fragments of how they think the world behaves. Then they analyse their consequences. This is called 'theory'. They test their theories against observations: this is called 'experiment'. Depending on the result, they may modify the mathematical model and repeat the cycle until theory and experiment agree. Not that it's really that simple; but that's the general gist of it, the essence of the scientific method." (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"But our ways of learning about the world are strongly influenced by the social preconceptions and biased modes of thinking that each scientist must apply to any problem. The stereotype of a fully rational and objective ‘scientific method’, with individual scientists as logical (and interchangeable) robots, is self-serving mythology." (Stephen J Gould, "This View of Life: In the Mind of the Beholder", Natural History Vol. 103, No. 2, 1994)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996) 

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No matter what the data, and no matter how the values are arranged and presented, you must always use some method of analysis to come up with an interpretation of the data.
While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Scientists pursue ideas in an ill-defined but effective way that is often called the scientific method. There is no strict rule of procedure that will lead you from a good idea to a Nobel prize or even to a publishable discovery. Some scientists are meticulously careful; others are highly creative. The best scientists are probably both careful and creative. Although there are various scientific methods in use, a typical approach consists of a series of steps." (Peter Atkins et al, "Chemical Principles: The Quest for Insight" 6th ed., 2013)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

Data Science: Mathematical Models (Just the Quotes)

"Experience teaches that one will be led to new discoveries almost exclusively by means of special mechanical models." (Ludwig Boltzmann, "Lectures on Gas Theory", 1896)

"If the system exhibits a structure which can be represented by a mathematical equivalent, called a mathematical model, and if the objective can be also so quantified, then some computational method may be evolved for choosing the best schedule of actions among alternatives. Such use of mathematical models is termed mathematical programming."  (George Dantzig, "Linear Programming and Extensions", 1959)

“In fact, the construction of mathematical models for various fragments of the real world, which is the most essential business of the applied mathematician, is nothing but an exercise in axiomatics.” (Marshall Stone, cca 1960)

"[...] sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work - that is, correctly to describe phenomena from a reasonably wide area. Furthermore, it must satisfy certain aesthetic criteria - that is, in relation to how much it describes, it must be rather simple.” (John von Neumann, “Method in the physical sciences”, 1961)

“Mathematical statistics provides an exceptionally clear example of the relationship between mathematics and the external world. The external world provides the experimentally measured distribution curve; mathematics provides the equation (the mathematical model) that corresponds to the empirical curve. The statistician may be guided by a thought experiment in finding the corresponding equation.” (Marshall J Walker, “The Nature of Scientific Thought”, 1963)

"Thus, the construction of a mathematical model consisting of certain basic equations of a process is not yet sufficient for effecting optimal control. The mathematical model must also provide for the effects of random factors, the ability to react to unforeseen variations and ensure good control despite errors and inaccuracies." (Yakov Khurgin, "Did You Say Mathematics?", 1974)

"A mathematical model is any complete and consistent set of mathematical equations which are designed to correspond to some other entity, its prototype. The prototype may be a physical, biological, social, psychological or conceptual entity, perhaps even another mathematical model." (Rutherford Aris, "Mathematical Modelling", 1978)

"Mathematical model making is an art. If the model is too small, a great deal of analysis and numerical solution can be done, but the results, in general, can be meaningless. If the model is too large, neither analysis nor numerical solution can be carried out, the interpretation of the results is in any case very difficult, and there is great difficulty in obtaining the numerical values of the parameters needed for numerical results." (Richard E Bellman, "Eye of the Hurricane: An Autobiography", 1984)

“Theoretical scientists, inching away from the safe and known, skirting the point of no return, confront nature with a free invention of the intellect. They strip the discovery down and wire it into place in the form of mathematical models or other abstractions that define the perceived relation exactly. The now-naked idea is scrutinized with as much coldness and outward lack of pity as the naturally warm human heart can muster. They try to put it to use, devising experiments or field observations to test its claims. By the rules of scientific procedure it is then either discarded or temporarily sustained. Either way, the central theory encompassing it grows. If the abstractions survive they generate new knowledge from which further exploratory trips of the mind can be planned. Through the repeated alternation between flights of the imagination and the accretion of hard data, a mutual agreement on the workings of the world is written, in the form of natural law.” (Edward O Wilson, “Biophilia”, 1984)

“The usual approach of science of constructing a mathematical model cannot answer the questions of why there should be a universe for the model to describe. Why does the universe go to all the bother of existing?” (Stephen Hawking, "A Brief History of Time", 1988)

“Mathematical modeling is about rules - the rules of reality. What distinguishes a mathematical model from, say, a poem, a song, a portrait or any other kind of ‘model’, is that the mathematical model is an image or picture of reality painted with logical symbols instead of with words, sounds or watercolors.” (John L Casti, "Reality Rules, The Fundamentals", 1992)

“Pedantry and sectarianism aside, the aim of theoretical physics is to construct mathematical models such as to enable us, from the use of knowledge gathered in a few observations, to predict by logical processes the outcomes in many other circumstances. Any logically sound theory satisfying this condition is a good theory, whether or not it be derived from ‘ultimate’ or ‘fundamental’ truth.” (Clifford Truesdell & Walter Noll, “The Non-Linear Field Theories of Mechanics” 2nd Ed., 1992)

"Nature behaves in ways that look mathematical, but nature is not the same as mathematics. Every mathematical model makes simplifying assumptions; its conclusions are only as valid as those assumptions. The assumption of perfect symmetry is excellent as a technique for deducing the conditions under which symmetry-breaking is going to occur, the general form of the result, and the range of possible behaviour. To deduce exactly which effect is selected from this range in a practical situation, we have to know which imperfections are present." (Ian Stewart & Martin Golubitsky, "Fearful Symmetry", 1992)

“A model is an imitation of reality and a mathematical model is a particular form of representation. We should never forget this and get so distracted by the model that we forget the real application which is driving the modelling. In the process of model building we are translating our real world problem into an equivalent mathematical problem which we solve and then attempt to interpret. We do this to gain insight into the original real world situation or to use the model for control, optimization or possibly safety studies.” (Ian T Cameron & Katalin Hangos, “Process Modelling and Model Analysis”, 2001)

"Formulation of a mathematical model is the first step in the process of analyzing the behaviour of any real system. However, to produce a useful model, one must first adopt a set of simplifying assumptions which have to be relevant in relation to the physical features of the system to be modelled and to the specific information one is interested in. Thus, the aim of modelling is to produce an idealized description of reality, which is both expressible in a tractable mathematical form and sufficiently close to reality as far as the physical mechanisms of interest are concerned." (Francois Axisa, "Discrete Systems" Vol. I, 2001)

"[…] interval mathematics and fuzzy logic together can provide a promising alternative to mathematical modeling for many physical systems that are too vague or too complicated to be described by simple and crisp mathematical formulas or equations. When interval mathematics and fuzzy logic are employed, the interval of confidence and the fuzzy membership functions are used as approximation measures, leading to the so-called fuzzy systems modeling." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"Modeling, in a general sense, refers to the establishment of a description of a system (a plant, a process, etc.) in mathematical terms, which characterizes the input-output behavior of the underlying system. To describe a physical system […] we have to use a mathematical formula or equation that can represent the system both qualitatively and quantitatively. Such a formulation is a mathematical representation, called a mathematical model, of the physical system." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

“What is a mathematical model? One basic answer is that it is the formulation in mathematical terms of the assumptions and their consequences believed to underlie a particular ‘real world’ problem. The aim of mathematical modeling is the practical application of mathematics to help unravel the underlying mechanisms involved in, for example, economic, physical, biological, or other systems and processes.” (John A Adam, “Mathematics in Nature”, 2003)

“Mathematical modeling is as much ‘art’ as ‘science’: it requires the practitioner to (i) identify a so-called ‘real world’ problem (whatever the context may be); (ii) formulate it in mathematical terms (the ‘word problem’ so beloved of undergraduates); (iii) solve the problem thus formulated (if possible; perhaps approximate solutions will suffice, especially if the complete problem is intractable); and (iv) interpret the solution in the context of the original problem.” (John A Adam, “Mathematics in Nature”, 2003)

“Mathematical modeling is the application of mathematics to describe real-world problems and investigating important questions that arise from it.” (Sandip Banerjee, “Mathematical Modeling: Models, Analysis and Applications”, 2014)

“A mathematical model is a mathematical description (often by means of a function or an equation) of a real-world phenomenon such as the size of a population, the demand for a product, the speed of a falling object, the concentration of a product in a chemical reaction, the life expectancy of a person at birth, or the cost of emission reductions. The purpose of the model is to understand the phenomenon and perhaps to make predictions about future behavior. [...] A mathematical model is never a completely accurate representation of a physical situation - it is an idealization.” (James Stewart, “Calculus: Early Transcendentals” 8th Ed., 2016)

"Machine learning is about making computers learn and perform tasks better based on past historical data. Learning is always based on observations from the data available. The emphasis is on making computers build mathematical models based on that learning and perform tasks automatically without the intervention of humans." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"Mathematical modeling is the modern version of both applied mathematics and theoretical physics. In earlier times, one proposed not a model but a theory. By talking today of a model rather than a theory, one acknowledges that the way one studies the phenomenon is not unique; it could also be studied other ways. One's model need not claim to be unique or final. It merits consideration if it provides an insight that isn't better provided by some other model." (Reuben Hersh, ”Mathematics as an Empirical Phenomenon, Subject to Modeling”, 2017)

16 December 2018

Data Science: Laws (Just the Quotes)

"[…] we must not measure the simplicity of the laws of nature by our facility of conception; but when those which appear to us the most simple, accord perfectly with observations of the phenomena, we are justified in supposing them rigorously exact." (Pierre-Simon Laplace, "The System of the World", 1809)

"Primary causes are unknown to us; but are subject to simple and constant laws, which may be discovered by observation, the study of them being the object of natural philosophy." (Jean-Baptiste-Joseph Fourier, "The Analytical Theory of Heat", 1822)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Compte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"But law is no explanation of anything; law is simply a generalization, a category of facts. Law is neither a cause, nor a reason, nor a power, nor a coercive force. It is nothing but a general formula, a statistical table." (Florence Nightingale, "Suggestions for Thought", 1860)

"The process of discovery is very simple. An unwearied and systematic application of known laws to nature, causes the unknown to reveal themselves. Almost any mode of observation will be successful at last, for what is most wanted is method." (Henry D Thoreau, "A Week on the Concord and Merrimack Rivers", 1862)

"Isolated facts and experiments have in themselves no value, however great their number may be. They only become valuable in a theoretical or practical point of view when they make us acquainted with the law of a series of uniformly recurring phenomena, or, it may be, only give a negative result showing an incompleteness in our knowledge of such a law, till then held to be perfect." (Hermann von Helmholtz, "The Aim and Progress of Physical Science", 1869)

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws." (Émile Cheysson, cca. 1877)

"The history of thought should warn us against concluding that because the scientific theory of the world is the best that has yet been formulated, it is necessarily complete and final. We must remember that at bottom the generalizations of science or, in common parlance, the laws of nature are merely hypotheses devised to explain that ever-shifting phantasmagoria of thought which we dignify with the high-sounding names of the world and the universe." (Sir James G Frazer, "The Golden Bough: A Study in Magic and Religion", 1890)

"Even one well-made observation will be enough in many cases, just as one well-constructed experiment often suffices for the establishment of a law." (Émile Durkheim, "The Rules of Sociological Method", "The Rules of Sociological Method", 1895)

"An experiment is an observation that can be repeated, isolated and varied. The more frequently you can repeat an observation, the more likely are you to see clearly what is there and to describe accurately what you have seen. The more strictly you can isolate an observation, the easier does your task of observation become, and the less danger is there of your being led astray by irrelevant circumstances, or of placing emphasis on the wrong point. The more widely you can vary an observation, the more clearly will be the uniformity of experience stand out, and the better is your chance of discovering laws." (Edward B Titchener, "A Text-Book of Psychology", 1909)

"It is well to notice in this connection [the mutual relations between the results of counting and measuring] that a natural law, in the statement of which measurable magnitudes occur, can only be understood to hold in nature with a certain degree of approximation; indeed natural laws as a rule are not proof against sufficient refinement of the measuring tools." (Luitzen E J Brouwer, "Intuitionism and Formalism", Bulletin of the American Mathematical Society, Vol. 20, 1913)

"[…] as the sciences have developed further, the notion has gained ground that most, perhaps all, of our laws are only approximations." (William James, "Pragmatism: A New Name for Some Old Ways of Thinking", 1914)

"Scientific laws, when we have reason to think them accurate, are different in form from the common-sense rules which have exceptions: they are always, at least in physics, either differential equations, or statistical averages." (Bertrand A Russell, "The Analysis of Matter", 1927)

"Science is the attempt to discover, by means of observation, and reasoning based upon it, first, particular facts about the world, and then laws connecting facts with one another and (in fortunate cases) making it possible to predict future occurrences." (Bertrand Russell, "Religion and Science, Grounds of Conflict", 1935)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"The world is not made up of empirical facts with the addition of the laws of nature: what we call the laws of nature are conceptual devices by which we organize our empirical knowledge and predict the future." (Richard B Braithwaite, "Scientific Explanation", 1953)

"The methods of science may be described as the discovery of laws, the explanation of laws by theories, and the testing of theories by new observations. A good analogy is that of the jigsaw puzzle, for which the laws are the individual pieces, the theories local patterns suggested by a few pieces, and the tests the completion of these patterns with pieces previously unconsidered." (Edwin P Hubble, "The Nature of Science and Other Lectures", 1954)

"Can there be laws of chance? The answer, it would seem should be negative, since chance is in fact defined as the characteristic of the phenomena which follow no law, phenomena whose causes are too complex to permit prediction." (Félix E Borel, "Probabilities and Life", 1962)

"Each piece, or part, of the whole of nature is always merely an approximation to the complete truth, or the complete truth so far as we know it. In fact, everything we know is only some kind of approximation, because we know that we do not know all the laws as yet. Therefore, things must be learned only to be unlearned again or, more likely, to be corrected." (Richard Feynman, "The Feynman Lectures on Physics" Vol. 1, 1964)

"At each level of complexity, entirely new properties appear. [And] at each stage, entirely new laws, concepts, and generalizations are necessary, requiring inspiration and creativity to just as great a degree as in the previous one." (Herb Anderson, 1972)

"A good scientific law or theory is falsifiable just because it makes definite claims about the world. For the falsificationist, If follows fairly readily from this that the more falsifiable a theory is the better, in some loose sense of more. The more a theory claims, the more potential opportunities there will be for showing that the world does not in fact behave in the way laid down by the theory. A very good theory will be one that makes very wide-ranging claims about the world, and which is consequently highly falsifiable, and is one that resists falsification whenever it is put to the test." (Alan F Chalmers,  "What Is This Thing Called Science?", 1976)

"Scientific laws give algorithms, or procedures, for determining how systems behave. The computer program is a medium in which the algorithms can be expressed and applied. Physical objects and mathematical structures can be represented as numbers and symbols in a computer, and a program can be written to manipulate them according to the algorithms. When the computer program is executed, it causes the numbers and symbols to be modified in the way specified by the scientific laws. It thereby allows the consequences of the laws to be deduced." (Stephen Wolfram, "Computer Software in Science and Mathematics", 1984)

"The connection between a model and a theory is that a model satisfies a theory; that is, a model obeys those laws of behavior that a corresponding theory explicity states or which may be derived from it. [...[] Computers make possible an entirely new relationship between theories and models. [...] A theory written in the form of a computer program is [...] both a theory and, when placed on a computer and run, a model to which the theory applies." (Joseph Weizenbaum, "Computer Power and Human Reason", 1984)

"We expect to learn new tricks because one of our science based abilities is being able to predict. That after all is what science is about. Learning enough about how a thing works so you'll know what comes next. Because as we all know everything obeys the universal laws, all you need is to understand the laws." (James Burke, "The Day the Universe Changed", 1985)

"A law explains a set of observations; a theory explains a set of laws. […] Unlike laws, theories often postulate unobservable objects as part of their explanatory mechanism." (John L Casti, "Searching for Certainty", 1990)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996) 

"A scientific theory is a concise and coherent set of concepts, claims, and laws (frequently expressed mathematically) that can be used to precisely and accurately explain and predict natural phenomena." (Mordechai Ben-Ari, "Just a Theory: Exploring the Nature of Science", 2005)

"[...] things that seem hopelessly random and unpredictable when viewed in isolation often turn out to be lawful and predictable when viewed in aggregate." (Steven Strogatz, "The Joy of X: A Guided Tour of Mathematics, from One to Infinity", 2012)

14 December 2018

Data Science: Coincidence (Just the Quotes)

"It is no great wonder if in long process of time, while fortune takes her course hither and thither, numerous coincidences should spontaneously occur. If the number and variety of subjects to be wrought upon be infinite, it is all the more easy for fortune, with such an abundance of material, to effect this similarity of results." (Plutarch, Life of Sertorius, 1st century BC)

"Coincidences, in general, are great stumbling blocks in the way of that class of thinkers who have been educated to know nothing of the theory of probabilities - that theory to which the most glorious objects of human research are indebted for the most glorious of illustrations." (Edgar A Poe, "The Murders in the Rue Morgue", 1841)

"Nothing is more certain in scientific method than that approximate coincidence alone can be expected. In the measurement of continuous quantity perfect correspondence must be accidental, and should give rise to suspicion rather than to satisfaction." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"Before we can completely explain a phenomenon we require not only to find its true cause, its chief relations to other causes, and all the conditions which determine how the cause operates, and what its effect and amount of effect are, but also all the coincidences." (George Gore, "The Art of Scientific Discovery", 1878)

"As science progress, it becomes more and more difficult to fit in the new facts when they will not fit in spontaneously. The older theories depend upon the coincidences of so many numerical results which can not be attributed to chance. We should not separate what has been joined together." (Henri Poincaré, "The Ether and Matter", 1912)

"By the laws of statistics we could probably approximate just how unlikely it is that it would happen. But people forget - especially those who ought to know better, such as yourself - that while the laws of statistics tell you how unlikely a particular coincidence is, they state just as firmly that coincidences do happen." (Robert A Heinlein, "The Door Into Summer", 1957)

"There is no coherent knowledge, i.e. no uniform comprehensive account of the world and the events in it. There is no comprehensive truth that goes beyond an enumeration of details, but there are many pieces of information, obtained in different ways from different sources and collected for the benefit of the curious. The best way of presenting such knowledge is the list - and the oldest scientific works were indeed lists of facts, parts, coincidences, problems in several specialized domains." (Paul K Feyerabend, "Farewell to Reason", 1987)

"A tendency to drastically underestimate the frequency of coincidences is a prime characteristic of innumerates, who generally accord great significance to correspondences of all sorts while attributing too little significance to quite conclusive but less flashy statistical evidence." (John A Paulos, "Innumeracy: Mathematical Illiteracy and its Consequences", 1988)

"The law of truly large numbers states: With a large enough sample, any outrageous thing is likely to happen." (Frederick Mosteller, "Methods for Studying Coincidences", Journal of the American Statistical Association Vol. 84, 1989)

"Most coincidences are simply chance events that turn out to be far more probable than many people imagine." (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1997)

"Often, we use the word random loosely to describe something that is disordered, irregular, patternless, or unpredictable. We link it with chance, probability, luck, and coincidence. However, when we examine what we mean by random in various contexts, ambiguities and uncertainties inevitably arise. Tackling the subtleties of randomness allows us to go to the root of what we can understand of the universe we inhabit and helps us to define the limits of what we can know with certainty." (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"Coincidence surprises us because our intuition about the likelihood of an event is often wildly inaccurate." (Michael Starbird, "Coincidences, Chaos, and All That Math Jazz", 2005)

"With our heads spinning in the world of coincidence and chaos, we nevertheless must make decisions and take steps into the minefield of our future. To avoid explosive missteps, we rely on data and statistical reasoning to inform our thinking." (Michael Starbird, "Coincidences, Chaos, and All That Math Jazz", 2005)

"The human mind delights in finding pattern - so much so that we often mistake coincidence or forced analogy for profound meaning. No other habit of thought lies so deeply within the soul of a small creature trying to make sense of a complex world not constructed for it." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 2010)

More quotes on "Coincidence" at the-web-of-knowledge.blogspot.com.

13 December 2018

Data Science: Approximation (Just the Quotes)

"Man’s mind cannot grasp the causes of events in their completeness, but the desire to find those causes is implanted in man’s soul. And without considering the multiplicity and complexity of the conditions any one of which taken separately may seem to be the cause, he snatches at the first approximation to a cause that seems to him intelligible and says: ‘This is the cause!’" (Leo Tolstoy, "War and Peace", 1867)

"[It] may be laid down as a general rule that, if the result of a long series of precise observations approximates a simple relation so closely that the remaining difference is undetectable by observation and may be attributed to the errors to which they are liable, then this relation is probably that of nature." (Pierre-Simon Laplace, "Mémoire sur les Inégalites Séculaires des Planètes et des Satellites", 1787)

"Although this may seem a paradox, all exact science is dominated by the idea of approximation. When a man tells you that he knows the exact truth about anything, you are safe in inferring that he is an inexact man." (Bertrand Russell, "The Scientific Outlook", 1931)

"We live in a system of approximations. Every end is prospective of some other end, which is also temporary; a round and final success nowhere. We are encamped in nature, not domesticated." (Ralph W Emerson, "Essays", 1865)

"It is well to notice in this connection [the mutual relations between the results of counting and measuring] that a natural law, in the statement of which measurable magnitudes occur, can only be understood to hold in nature with a certain degree of approximation; indeed natural laws as a rule are not proof against sufficient refinement of the measuring tools." (Luitzen E J Brouwer, "Intuitionism and Formalism", Bulletin of the American Mathematical Society, Vol. 20, 1913)

"[…] as the sciences have developed further, the notion has gained ground that most, perhaps all, of our laws are only approximations." (William James, "Pragmatism: A New Name for Some Old Ways of Thinking", 1914)

"Science does not aim at establishing immutable truths and eternal dogmas; its aim is to approach the truth by successive approximations, without claiming that at any stage final and complete accuracy has been achieved." (Bertrand Russell, "The ABC of Relativity", 1925)

"[…] reality is a system, completely ordered and fully intelligible, with which thought in its advance is more and more identifying itself. We may look at the growth of knowledge […] as an attempt by our mind to return to union with things as they are in their ordered wholeness. […] and if we take this view, our notion of truth is marked out for us. Truth is the approximation of thought to reality […] Its measure is the distance thought has travelled […] toward that intelligible system […] The degree of truth of a particular proposition is to be judged in the first instance by its coherence with experience as a whole, ultimately by its coherence with that further whole, all comprehensive and fully articulated, in which thought can come to rest." (Brand Blanshard, "The Nature of Thought" Vol. II, 1939) 

"The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’ Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33, No. 1, 1962)

"Because engineering is science in action - the practice of decision making at the earliest moment - it has been defined as the art of skillful approximation. No situation in engineering is simple enough to be solved precisely, and none worth evaluating is solved exactly. Never are there sufficient facts, sufficient time, or sufficient money for an exact solution, for if by chance there were, the answer would be of academic and not economic interest to society. These are the circumstances that make engineering so vital and so creative." (Ronald B Smith, "Engineering Is…", Mechanical Engineering Vol. 86 (5), 1964)

"Each piece, or part, of the whole of nature is always merely an approximation to the complete truth, or the complete truth so far as we know it. In fact, everything we know is only some kind of approximation, because we know that we do not know all the laws as yet. Therefore, things must be learned only to be unlearned again or, more likely, to be corrected." (Richard Feynman, "The Feynman Lectures on Physics" Vol. 1, 1964)

"Engineering is the art of skillful approximation; the practice of gamesmanship in the highest form. In the end it is a method broad enough to tame the unknown, a means of combing disciplined judgment with intuition, courage with responsibility, and scientific competence within the practical aspects of time, of cost, and of talent." (Ronald B Smith, "Professional Responsibility of Engineering", Mechanical Engineering Vol. 86 (1), 1964)

"Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"One grievous error in interpreting approximations is to allow only good approximations." (Preston C Hammer, "Mind Pollution", Cybernetics, Vol. 14, 1971)

"The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful." (George Box, 1987)

"Science is more than a mere attempt to describe nature as accurately as possible. Frequently the real message is well hidden, and a law that gives a poor approximation to nature has more significance than one which works fairly well but is poisoned at the root." (Robert H March, "Physics for Poets", 1996)

"Most physical systems, particularly those complex ones, are extremely difficult to model by an accurate and precise mathematical formula or equation due to the complexity of the system structure, nonlinearity, uncertainty, randomness, etc. Therefore, approximate modeling is often necessary and practical in real-world applications. Intuitively, approximate modeling is always possible. However, the key questions are what kind of approximation is good, where the sense of 'goodness' has to be first defined, of course, and how to formulate such a good approximation in modeling a system such that it is mathematically rigorous and can produce satisfactory results in both theory and applications." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"Mathematical modeling is as much ‘art’ as ‘science’: it requires the practitioner to (i) identify a so-called ‘real world’ problem (whatever the context may be); (ii) formulate it in mathematical terms (the ‘word problem’ so beloved of undergraduates); (iii) solve the problem thus formulated (if possible; perhaps approximate solutions will suffice, especially if the complete problem is intractable); and (iv) interpret the solution in the context of the original problem." (John A Adam, "Mathematics in Nature", 2003)

"All models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind." (George E P Box & Norman R Draper, "Response Surfaces, Mixtures, and Ridge Analyses", 2007)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

"Science is about finding ever better approximations rather than pretending you have already found ultimate truth." (Friedrich Nietzsche)

More quotes on "Approximation" at the-web-of-knowledge.blogspot.com.

11 December 2018

Data Science: Measurement (Just the Quotes)

"Accurate and minute measurement seems to the nonscientific imagination a less lofty and dignified work than looking for something new. But nearly all the grandest discoveries of science have been but the rewards of accurate measurement and patient long contained labor in the minute sifting of numerical results." (William T Kelvin, "Report of the British Association For the Advancement of Science" Vol. 41, 1871)

"It is clear that one who attempts to study precisely things that are changing must have a great deal to do with measures of change." (Charles Cooley, "Observations on the Measure of Change", Journal of the American Statistical Association (21), 1893)

"Nothing is more certain in scientific method than that approximate coincidence alone can be expected. In the measurement of continuous quantity perfect correspondence must be accidental, and should give rise to suspicion rather than to satisfaction." (William S Jevons, "The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"Physical research by experimental methods is both a broadening and a narrowing field. There are many gaps yet to be filled, data to be accumulated, measurements to be made with great precision, but the limits within which we must work are becoming, at the same time, more and more defined." (Elihu Thomson, "Annual Report of the Board of Regents of the Smithsonian Institution", 1899)

"[…] statistics is the science of the measurement of the social organism, regarded as a whole, in all its manifestations." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Statistics may rightly be called the science of averages. […] Great numbers and the averages resulting from them, such as we always obtain in measuring social phenomena, have great inertia. […] It is this constancy of great numbers that makes statistical measurement possible. It is to great numbers that statistical measurement chiefly applies." (Sir Arthur L Bowley, "Elements of Statistics", 1901)

"Just as data gathered by an incompetent observer are worthless - or by a biased observer, unless the bias can be measured and eliminated from the result - so also conclusions obtained from even the best data by one unacquainted with the principles of statistics must be of doubtful value." (William F White, "A Scrap-Book of Elementary Mathematics: Notes, Recreations, Essays", 1908)

"Science begins with measurement and there are some people who cannot be measurers; and just as we distinguish carpenters who can work to this or that traction of an inch of accuracy, so we must distinguish ourselves and our acquaintances as able to observe and record to this or that degree of truthfulness." (John A Thomson, "Introduction to Science", 1911)

"Science depends upon measurement, and things not measurable are therefore excluded, or tend to be excluded, from its attention." (Arthur J Balfour, "Address", 1917)

"Make more measurements than necessary to obtain the result and see to what extent these measurements, which in a certain sense control one another, agree with one another. By looking at how the measures fit to one another one can gain a sort of indication of probability of how precise the single measurements are and within which margins the result reasonably has to be maintained." (Felix Klein, "Elementary Mathematics from a Higher Standpoint" Vol III: "Precision Mathematics and Approximation Mathematics", 1928)

"Search for measurable elements among your phenomena, and then search for relations between these measures of physical quantities." (Alfred N Whitehead, "Science and the Modern World", 1929)

"While it is true that theory often sets difficult, if not impossible tasks for the experiment, it does, on the other hand, often lighten the work of the experimenter by disclosing cogent relationships which make possible the indirect determination of inaccessible quantities and thus render difficult measurements unnecessary." (Georg Joos, "Theoretical Physics", 1934)

"It is important to realize that it is not the one measurement, alone, but its relation to the rest of the sequence that is of interest." (William E Deming, "Statistical Adjustment of Data", 1938)

"Probabilities must be regarded as analogous to the measurement of physical magnitudes; that is to say, they can never be known exactly, but only within certain approximation." (Emile Borel, "Probabilities and Life", 1943)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"Every bit of knowledge we gain and every conclusion we draw about the universe or about any part or feature of it depends finally upon some observation or measurement. Mankind has had again and again the humiliating experience of trusting to intuitive, apparently logical conclusions without observations, and has seen Nature sail by in her radiant chariot of gold in an entirely different direction." (Oliver J Lee, "Measuring Our Universe: From the Inner Atom to Outer Space", 1950)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"We are committed to the scientific method, and measurement is the foundation of that method; hence we are prone to assume that whatever is measurable must be significant and that whatever cannot be measured may as well be disregarded." (Joseph W Krutch, "Human Nature and the Human Condition", 1959)

"No observations are absolutely trustworthy. In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. If a reading is found to lie a very long way from its fellows in a series of replicate observations, there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. [...] One sufficiently erroneous reading can wreck the whole of a statistical analysis, however many observations there are." (Francis J Anscombe, "Rejection of Outliers", Technometrics Vol. 2 (2), 1960)

"Statistics provides a quantitative example of the scientific process usually described qualitatively by saying that scientists observe nature, study the measurements, postulate models to predict new measurements, and validate the model by the success of prediction." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"This other world is the so-called physical world image; it is merely an intellectual structure. To a certain extent it is arbitrary. It is a kind of model or idealization created in order to avoid the inaccuracy inherent in every measurement and to facilitate exact definition." (Max Planck, "The Philosophy of Physics", 1963)

"Measurement, we have seen, always has an element of error in it. The most exact description or prediction that a scientist can make is still only approximate." (Abraham Kaplan, "The Conduct of Inquiry: Methodology for Behavioral Science", 1964)

"Measurement is the link between mathematics and science." (Brian Ellis, "Basic Concepts of Measurement", 1966)

"The aim of science is not so much to search for truth, or even truths, as to classify our knowledge and to establish relations between observable phenomena in order to be able to predict the future in a certain measure and to explain the sequence of phenomena in relation to ourselves." (Pierre L du Noüy, "Between Knowing and Believing", 1967)

"[…] it is not enough to say: 'There's error in the data and therefore the study must be terribly dubious'. A good critic and data analyst must do more: he or she must also show how the error in the measurement or the analysis affects the inferences made on the basis of that data and analysis." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Typically, data analysis is messy, and little details clutter it. Not only confounding factors, but also deviant cases, minor problems in measurement, and ambiguous results lead to frustration and discouragement, so that more data are collected than analyzed. Neglecting or hiding the messy details of the data reduces the researcher's chances of discovering something new." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"Crude measurement usually yields misleading, even erroneous conclusions no matter how sophisticated a technique is used." (Henry T Reynolds, "Analysis of Nominal Data", 1977)

"But real-life situations often require us to measure probability in precisely this fashion - from sample to universe. In only rare cases does life replicate games of chance, for which we can determine the probability of an outcome before an event even occurs - a priori […] . In most instances, we have to estimate probabilities from what happened after the fact - a posteriori. The very notion of a posteriori implies experimentation and changing degrees of belief." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Measurement has meaning only if we can transmit the information without ambiguity to others." (Russell Fox & Max Gorbuny, "The Science of Science", 1997)

"Since the average is a measure of location, it is common to use averages to compare two data sets. The set with the greater average is thought to ‘exceed’ the other set. While such comparisons may be helpful, they must be used with caution. After all, for any given data set, most of the values will not be equal to the average." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"First, good statistics are based on more than guessing. [...] Second, good statistics are based on clear, reasonable definitions. Remember, every statistic has to define its subject. Those definitions ought to be clear and made public. [...] Third, good statistics are based on clear, reasonable measures. Again, every statistic involves some sort of measurement; while all measures are imperfect, not all flaws are equally serious. [...] Finally, good statistics are based on good samples." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using] models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"Repeated observations of the same phenomenon do not always produce the same results, due to random noise or error. Sampling errors result when our observations capture unrepresentative circumstances, like measuring rush hour traffic on weekends as well as during the work week. Measurement errors reflect the limits of precision inherent in any sensing device. The notion of signal to noise ratio captures the degree to which a series of observations reflects a quantity of interest as opposed to data variance. As data scientists, we care about changes in the signal instead of the noise, and such variance often makes this problem surprisingly difficult." (Steven S Skiena, "The Data Science Design Manual", 2017)

"It’d be nice to fondly imagine that high-quality statistics simply appear in a spreadsheet somewhere, divine providence from the numerical heavens. Yet any dataset begins with somebody deciding to collect the numbers. What numbers are and aren’t collected, what is and isn’t measured, and who is included or excluded are the result of all-too-human assumptions, preconceptions, and oversights." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"People do care about how they are measured. What can we do about this? If you are in the position to measure something, think about whether measuring it will change people’s behaviors in ways that undermine the value of your results. If you are looking at quantitative indicators that others have compiled, ask yourself: Are these numbers measuring what they are intended to measure? Or are people gaming the system and rendering this measure useless?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Premature enumeration is an equal-opportunity blunder: the most numerate among us may be just as much at risk as those who find their heads spinning at the first mention of a fraction. Indeed, if you’re confident with numbers you may be more prone than most to slicing and dicing, correlating and regressing, normalizing and rebasing, effortlessly manipulating the numbers on the spreadsheet or in the statistical package - without ever realizing that you don’t fully understand what these abstract quantities refer to. Arguably this temptation lay at the root of the last financial crisis: the sophistication of mathematical risk models obscured the question of how, exactly, risks were being measured, and whether those measurements were something you’d really want to bet your global banking system on." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The whole discipline of statistics is built on measuring or counting things. […] it is important to understand what is being measured or counted, and how. It is surprising how rarely we do this. Over the years, as I found myself trying to lead people out of statistical mazes week after week, I came to realize that many of the problems I encountered were because people had taken a wrong turn right at the start. They had dived into the mathematics of a statistical claim - asking about sampling errors and margins of error, debating if the number is rising or falling, believing, doubting, analyzing, dissecting - without taking the ti- me to understand the first and most obvious fact: What is being measured, or counted? What definition is being used?" (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

05 May 2018

Data Science: Clustering (Definitions)

"Grouping of similar patterns together. In this text the term 'clustering' is used only for unsupervised learning problems in which the desired groupings are not known in advance." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"The process of grouping similar input patterns together using an unsupervised training algorithm." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

"Clustering attempts to identify groups of observations with similar characteristics." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"The process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects, which are 'similar' between them and are 'dissimilar' to the objects belonging to other clusters." (Juan R González et al, "Nature-Inspired Cooperative Strategies for Optimization", 2008)

"Grouping the nodes of an ad hoc network such that each group is a self-organized entity having a cluster-head which is responsible for formation and management of its cluster." (Prayag Narula, "Evolutionary Computing Approach for Ad-Hoc Networks", 2009)

"The process of assigning individual data items into groups (called clusters) so that items from the same cluster are more similar to each other than items from different clusters. Often similarity is assessed according to a distance measure." (Alfredo Vellido & Iván Olie, "Clustering and Visualization of Multivariate Time Series", 2010)

"Verb. To output a smaller data set based on grouping criteria of common attributes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of partitioning the data attributes of an entity or table into subsets or clusters of similar attributes, based on subject matter or characteristic (domain)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A data mining technique that analyzes data to group records together according to their location within the multidimensional attribute space." (SQL Server 2012 Glossary, "Microsoft", 2012)

"Clustering aims to partition data into groups called clusters. Clustering is usually unsupervised in the sense that the training data is not labeled. Some clustering algorithms require a guess for the number of clusters, while other algorithms don't." (Ivan Idris, "Python Data Analysis", 2014)

"Form of data analysis that groups observations to clusters. Similar observations are grouped in the same cluster, whereas dissimilar observations are grouped in different clusters. As opposed to classification, there is not a class attribute and no predefined classes exist." (Efstathios Kirkos, "Composite Classifiers for Bankruptcy Prediction", 2014)

"Organization of data in some semantically meaningful way such that each cluster contains related data while the unrelated data are assigned to different clusters. The clusters may not be predefined." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"Techniques for organizing data into groups of similar cases." (Meta S Brown, "Data Mining For Dummies", 2014)

[cluster analysis:] "A technique that identifies homogenous subgroups or clusters of subjects or study objects." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Clustering is a classification technique where similar kinds of objects are grouped together. The similarity between the objects maybe determined in different ways depending upon the use case. Therefore, clustering in measurement space may be an indicator of similarity of image regions, and may be used for segmentation purposes." (Shiwangi Chhawchharia, "Improved Lymphocyte Image Segmentation Using Near Sets for ALL Detection", 2016)

"Clustering techniques share the goal of creating meaningful categories from a collection of items whose properties are hard to directly perceive and evaluate, which implies that category membership cannot easily be reduced to specific property tests and instead must be based on similarity. The end result of clustering is a statistically optimal set of categories in which the similarity of all the items within a category is larger than the similarity of items that belong to different categories." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

[cluster analysis:] "A statistical technique for finding natural groupings in data; it can also be used to assign new cases to groupings or categories." (Jonathan Ferrar et al, "The Power of People", 2017)

"Clustering or cluster analysis is a set of techniques of multivariate data analysis aimed at selecting and grouping homogeneous elements in a data set. Clustering techniques are based on measures relating to the similarity between the elements. In many approaches this similarity, or better, dissimilarity, is designed in terms of distance in a multidimensional space. Clustering algorithms group items on the basis of their mutual distance, and then the belonging to a set or not depends on how the element under consideration is distant from the collection itself." (Crescenzio Gallo, "Building Gene Networks by Analyzing Gene Expression Profiles", 2018)

"Unsupervised learning or clustering is a way of discovering hidden structure in unlabeled data. Clustering algorithms aim to discover latent patterns in unlabeled data using features to organize instances into meaningfully dissimilar groups." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"The term clustering refers to the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters." (Satyadhyan Chickerur et al, "Forecasting the Demand of Agricultural Crops/Commodity Using Business Intelligence Framework", 2019)

"In the machine learning context, clustering is the task of grouping examples into related groups. This is generally an unsupervised task, that is, the algorithm does not use preexisting labels, though there do exist some supervised clustering algorithms." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A cluster is a group of data objects which have similarities among them. It's a group of the same or similar elements gathered or occurring closely together." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)

"Clustering describes an unsupervised machine learning technique for identifying structures among unstructured data. Clustering algorithms group sets of similar objects into clusters, and are widely used in areas including image analysis, information retrieval, and bioinformatics." (Accenture)

"Describes an unsupervised machine learning technique for identifying structures among unstructured data. Clustering algorithms group sets of similar objects into clusters, and are widely used in areas including image analysis, information retrieval, and bioinformatics." (Accenture)

"The process of identifying objects that are similar to each other and cluster them in order to understand the differences as well as the similarities within the data." (Analytics Insight)
