29 November 2018

🔭Data Science: Change (Just the Quotes)

"A law of nature, however, is not a mere logical conception that we have adopted as a kind of memoria technical to enable us to more readily remember facts. We of the present day have already sufficient insight to know that the laws of nature are not things which we can evolve by any speculative method. On the contrary, we have to discover them in the facts; we have to test them by repeated observation or experiment, in constantly new cases, under ever-varying circumstances; and in proportion only as they hold good under a constantly increasing change of conditions, in a constantly increasing number of cases with greater delicacy in the means of observation, does our confidence in their trustworthiness rise." (Hermann von Helmholtz, "Popular Lectures on Scientific Subjects", 1873)

"It is clear that one who attempts to study precisely things that are changing must have a great deal to do with measures of change." (Charles Cooley, "Observations on the Measure of Change", Journal of the American Statistical Association (21), 1893)

"Given any object, relatively abstracted from its surroundings for study, the behavioristic approach consists in the examination of the output of the object and of the relations of this output to the input. By output is meant any change produced in the surroundings by the object. By input, conversely, is meant any event external to the object that modifies this object in any manner." (Arturo Rosenblueth, Norbert Wiener & Julian Bigelow, "Behavior, Purpose and Teleology", Philosophy of Science 10, 1943)

"The general method involved may be very simply stated. In cases where the equilibrium values of our variables can be regarded as the solutions of an extremum (maximum or minimum) problem, it is often possible regardless of the number of variables involved to determine unambiguously the qualitative behavior of our solution values in respect to changes of parameters." (Paul Samuelson, "Foundations of Economic Analysis", 1947)

"A common and very powerful constraint is that of continuity. It is a constraint because whereas the function that changes arbitrarily can undergo any change, the continuous function can change, at each step, only to a neighbouring value." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"As a simple trick, the discrete can often be carried over into the continuous, in a way suitable for practical purposes, by making a graph of the discrete, with the values shown as separate points. It is then easy to see the form that the changes will take if the points were to become infinitely numerous and close together." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"The discrete change has only to become small enough in its jump to approximate as closely as is desired to the continuous change. It must further be remembered that in natural phenomena the observations are almost invariably made at discrete intervals; the 'continuity' ascribed to natural events has often been put there by the observer's imagina- tion, not by actual observation at each of an infinite number of points. Thus the real truth is that the natural system is observed at discrete points, and our transformation represents it at discrete points. There can, therefore, be no real incompatibility." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"Prediction of the future is possible only in systems that have stable parameters like celestial mechanics. The only reason why prediction is so successful in celestial mechanics is that the evolution of the solar system has ground to a halt in what is essentially a dynamic equilibrium with stable parameters. Evolutionary systems, however, by their very nature have unstable parameters. They are disequilibrium systems and in such systems our power of prediction, though not zero, is very limited because of the unpredictability of the parameters themselves. If, of course, it were possible to predict the change in the parameters, then there would be other parameters which were unchanged, but the search for ultimately stable parameters in evolutionary systems is futile, for they probably do not exist… Social systems have Heisenberg principles all over the place, for we cannot predict the future without changing it." (Kenneth E Boulding, Evolutionary Economics, 1981)

"Model is used as a theory. It becomes theory when the purpose of building a model is to understand the mechanisms involved in the developmental process. Hence as theory, model does not carve up or change the world, but it explains how change takes place and in what way or manner. This leads to build change in the structures." (Laxmi K Patnaik, "Model Building in Political Science", The Indian Journal of Political Science Vol. 50 (2), 1989)

"A useful description relates the systematic variation to one or more factors; if the residuals dwarf the effects for a factor, we may not be able to relate variation in the data to changes in the factor. Furthermore, changes in the factor may bring no important change in the response. Such comparisons of residuals and effects require a measure of the variation of overlays relative to each other." (Christopher H Schrnid, "Value Splitting: Taking the Data Apart", 1991)

"[…] continuity appears when we try to mathematically express continuously changing phenomena, and differentiability is the result of expressing smoothly changing phenomena."  (Kenji Ueno & Toshikazu Sunada, "A Mathematical Gift, III: The Interplay Between Topology, Functions, Geometry, and Algebra", Mathematical World Vol. 23, 1996)

"How deep truths can be defined as invariants – things that do not change no matter what; how invariants are defined by symmetries, which in turn define which properties of nature are conserved, no matter what. These are the selfsame symmetries that appeal to the senses in art and music and natural forms like snowflakes and galaxies. The fundamental truths are based on symmetry, and there’s a deep kind of beauty in that." (K C Cole, "The Universe and the Teacup: The Mathematics of Truth and Beauty", 1997)

"There is a new science of complexity which says that the link between cause and effect is increasingly difficult to trace; that change (planned or otherwise) unfolds in non-linear ways; that paradoxes and contradictions abound; and that creative solutions arise out of diversity, uncertainty and chaos." (Andy P Hargreaves & Michael Fullan, "What’s Worth Fighting for Out There?", 1998)

"We analyze numbers in order to know when a change has occurred in our processes or systems. We want to know about such changes in a timely manner so that we can respond appropriately. While this sounds rather straightforward, there is a complication - the numbers can change even when our process does not. So, in our analysis of numbers, we need to have a way to distinguish those changes in the numbers that represent changes in our process from those that are essentially noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Each of the most basic physical laws that we know corresponds to some invariance, which in turn is equivalent to a collection of changes which form a symmetry group. […] whilst leaving some underlying theme unchanged. […] for example, the conservation of energy is equivalent to the invariance of the laws of motion with respect to translations backwards or forwards in time […] the conservation of linear momentum is equivalent to the invariance of the laws of motion with respect to the position of your laboratory in space, and the conservation of angular momentum to an invariance with respect to directional orientation… discovery of conservation laws indicated that Nature possessed built-in sustaining principles which prevented the world from just ceasing to be." (John D Barrow, "New Theories of Everything", 2007)

"The concept of symmetry is used widely in physics. If the laws that determine relations between physical magnitudes and a change of these magnitudes in the course of time do not vary at the definite operations (transformations), they say, that these laws have symmetry (or they are invariant) with respect to the given transformations. For example, the law of gravitation is valid for any points of space, that is, this law is in variant with respect to the system of coordinates." (Alexey Stakhov et al, "The Mathematics of Harmony", 2009)

"After you visualize your data, there are certain things to look for […]: increasing, decreasing, outliers, or some mix, and of course, be sure you’re not mixing up noise for patterns. Also note how much of a change there is and how prominent the patterns are. How does the difference compare to the randomness in the data? Observations can stand out because of human or mechanical error, because of the uncertainty of estimated values, or because there was a person or thing that stood out from the rest. You should know which it is." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"In negative feedback regulation the organism has set points to which different parameters (temperature, volume, pressure, etc.) have to be adapted to maintain the normal state and stability of the body. The momentary value refers to the values at the time the parameters have been measured. When a parameter changes it has to be turned back to its set point. Oscillations are characteristic to negative feedback regulation […]" (Gaspar Banfalvi, "Homeostasis - Tumor – Metastasis", 2014)

"Regression does not describe changes in ability that happen as time passes […]. Regression is caused by performances fluctuating about ability, so that performances far from the mean reflect abilities that are closer to the mean." (Gary Smith, "Standard Deviations", 2014)

"When memorization happens, you may have the illusion that everything is working well because your machine learning algorithm seems to have fitted the in sample data so well. Instead, problems can quickly become evident when you start having it work with out-of-sample data and you notice that it produces errors in its predictions as well as errors that actually change a lot when you relearn from the same data with a slightly different approach. Overfitting occurs when your algorithm has learned too much from your data, up to the point of mapping curve shapes and rules that do not exist [...]. Any slight change in the procedure or in the training data produces erratic predictions." (John P Mueller & Luca Massaron, Machine Learning for Dummies, 2016)

"Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. 'Structure' has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Symmetries directly point to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. As our data models change, so too do our perspectives on analysing data." (Fionn Murtagh, "Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics", 2018)

More quotes on "Change" at the-web-of-knowledge.blogspot.com

28 November 2018

🔭Data Science: Classification (Just the Quotes)

"Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts." (Horace Secrist, "An Introduction to Statistical Methods", 1917)


"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"A classification is a scheme for breaking a category into a set of parts, called classes, according to some precisely defined differing characteristics possessed by all the elements of the category." (Alva M Tuttle, "Elementary Business and Economic Statistics", 1957)

"It might be reasonable to expect that the more we know about any set of statistics, the greater the confidence we would have in using them, since we would know in which directions they were defective; and that the less we know about a set of figures, the more timid and hesitant we would be in using them. But, in fact, it is the exact opposite which is normally the case; in this field, as in many others, knowledge leads to caution and hesitation, it is ignorance that gives confidence and boldness. For knowledge about any set of statistics reveals the possibility of error at every stage of the statistical process; the difficulty of getting complete coverage in the returns, the difficulty of framing answers precisely and unequivocally, doubts about the reliability of the answers, arbitrary decisions about classification, the roughness of some of the estimates that are made before publishing the final results. Knowledge of all this, and much else, in detail, about any set of figures makes one hesitant and cautious, perhaps even timid, in using them." (Ely Devons, "Essays in Economics", 1961)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996) 

"While classification is important, it can certainly be overdone. Making too fine a distinction between things can be as serious a problem as not being able to decide at all. Because we have limited storage capacity in our brain (we still haven't figured out how to add an extender card), it is important for us to be able to cluster similar items or things together. Not only is clustering useful from an efficiency standpoint, but the ability to group like things together (called chunking by artificial intelligence practitioners) is a very important reasoning tool. It is through clustering that we can think in terms of higher abstractions, solving broader problems by getting above all of the nitty-gritty details." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"We build models to increase productivity, under the justified assumption that it's cheaper to manipulate the model than the real thing. Models then enable cheaper exploration and reasoning about some universe of discourse. One important application of models is to understand a real, abstract, or hypothetical problem domain that a computer system will reflect. This is done by abstraction, classification, and generalization of subject-matter entities into an appropriate set of classes and their behavior." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"Compared to traditional statistical studies, which are often hindsight, the field of data mining finds patterns and classifications that look toward and even predict the future. In summary, data mining can (1) provide a more complete understanding of data by finding patterns previously not seen and (2) make models that predict, thus enabling people to make better decisions, take action, and therefore mold future events." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"The well-known 'No Free Lunch' theorem indicates that there does not exist a pattern classification method that is inherently superior to any other, or even to random guessing without using additional information. It is the type of problem, prior information, and the amount of training samples that determine the form of classifier to apply. In fact, corresponding to different real-world problems, different classes may have different underlying data structures. A classifier should adjust the discriminant boundaries to fit the structures which are vital for classification, especially for the generalization capacity of the classifier." (Hui Xue et al, "SVM: Support Vector Machines", 2009)

"A problem in data mining when random variations in data are misclassified as important patterns. Overfitting often occurs when the data set is too small to represent the real world." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Choosing an appropriate classification algorithm for a particular problem task requires practice: each algorithm has its own quirks and is based on certain assumptions. To restate the 'No Free Lunch' theorem: no single classifier works best across all possible scenarios. In practice, it is always recommended that you compare the performance of at least a handful of different learning algorithms to select the best model for the particular problem; these may differ in the number of features or samples, the amount of noise in a dataset, and whether the classes are linearly separable or not." (Sebastian Raschka, "Python Machine Learning", 2015)

"The no free lunch theorem for machine learning states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class. [...] the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the 'real world' that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about." (Ian Goodfellow et al, "Deep Learning", 2015)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps4. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Decision trees are important for a few reasons. First, they can both classify and regress. It requires literally one line of code to switch between the two models just described, from a classification to a regression. Second, they are able to determine and share the feature importance of a given training set." (Russell Jurney, "Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark", 2017)

"Multilayer perceptrons share with polynomial classifiers one unpleasant property. Theoretically speaking, they are capable of modeling any decision surface, and this makes them prone to overfitting the training data."  (Miroslav Kubat," An Introduction to Machine Learning" 2nd Ed., 2017)

 "The main reason why pruning tends to improve classification performance on future examples is that the removal of low-level tests, which have poor statistical support, usually reduces the danger of overfitting. This, however, works only up to a certain point. If overdone, a very high extent of pruning can (in the extreme) result in the decision being replaced with a single leaf labeled with the majority class." (Miroslav Kubat," An Introduction to Machine Learning" 2nd Ed., 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The no free lunch theorems set limits on the range of optimality of any method. That is, each methodology has a ‘catchment area’ where it is optimal or nearly so. Often, intuitively, if the optimality is particularly strong then the effectiveness of the methodology falls off more quickly outside its catchment area than if its optimality were not so strong. Boosting is a case in point: it seems so well suited to binary classification that efforts to date to extend it to give effective classification (or regression) more generally have not been very successful. Overall, it remains to characterize the catchment areas where each class of predictors performs optimally, performs generally well, or breaks down." (Bertrand S Clarke & Jennifer L. Clarke, "Predictive Statistics: Analysis and Inference beyond Models", 2018)

"The premise of classification is simple: given a categorical target variable, learn patterns that exist between instances composed of independent variables and their relationship to the target. Because the target is given ahead of time, classification is said to be supervised machine learning because a model can be trained to minimize error between predicted and actual categories in the training data. Once a classification model is fit, it assigns categorical labels to new instances based on the patterns detected during training." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"A classification tree is perhaps the simplest form of algorithm, since it consists of a series of yes/no questions, the answer to each deciding the next question to be asked, until a conclusion is reached." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"An advantage of random forests is that it works with both regression and classification trees so it can be used with targets whose role is binary, nominal, or interval. They are also less prone to overfitting than a single decision tree model. A disadvantage of a random forest is that they generally require more trees to improve their accuracy. This can result in increased run times, particularly when using very large data sets." (Richard V McCarthy et al, "Applying Predictive Analytics: Finding Value in Data", 2019)

"The classifier accuracy would be extra ordinary when the test data and the training data are overlapping. But when the model is applied to a new data it will fail to show acceptable accuracy. This condition is called as overfitting." (Jesu V  Nayahi J & Gokulakrishnan K, "Medical Image Classification", 2019)

More quotes on "Classification" at the-web-of-knowledge.blogspot.com

🔭Data Science: Standard Deviation (Just the Quotes)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"The most important reason for portraying standard deviations is that they give us a sense of the relative variability of the points in different regions of the plot." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Many good things happen when data distributions are well approximated by the normal. First, the question of whether the shifts among the distributions are additive becomes the question of whether the distributions have the same standard deviation; if so, the shifts are additive. […] A second good happening is that methods of fitting and methods of probabilistic inference, to be taken up shortly, are typically simple and on well understood ground. […] A third good thing is that the description of the data distribution is more parsimonious." (William S Cleveland, "Visualizing Data", 1993)

"The bounds on the standard deviation are pretty crude but it is surprising how often the rule will pick up gross errors such as confusing the standard error and standard deviation, confusing the variance and the standard deviation, or reporting the mean in one scale and the standard deviation in another scale." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Data often arrive in raw form, as long lists of numbers. In this case your job is to summarize the data in a way that captures its essence and conveys its meaning. This can be done numerically, with measures such as the average and standard deviation, or graphically. At other times you find data already in summarized form; in this case you must understand what the summary is telling, and what it is not telling, and then interpret the information for your readers or viewers." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Roughly stated, the standard deviation gives the average of the differences between the numbers on the list and the mean of that list. If data are very spread out, the standard deviation will be large. If the data are concentrated near the mean, the standard deviation will be small." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"A feature shared by both the range and the interquartile range is that they are each calculated on the basis of just two values - the range uses the maximum and the minimum values, while the IQR uses the two quartiles. The standard deviation, on the other hand, has the distinction of using, directly, every value in the set as part of its calculation. In terms of representativeness, this is a great strength. But the chief drawback of the standard deviation is that, conceptually, it is harder to grasp than other more intuitive measures of spread." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Numerical precision should be consistent throughout and summary statistics such as means and standard deviations should not have more than one extra decimal place (or significant digit) compared to the raw data. Spurious precision should be avoided although when certain measures are to be used for further calculations or when presenting the results of analyses, greater precision may sometimes be appropriate." (Jenny Freeman et al, "How to Display Data", 2008)

"Need to consider outliers as they can affect statistics such as means, standard deviations, and correlations. They can either be explained, deleted, or accommodated (using either robust statistics or obtaining additional data to fill-in). Can be detected by methods such as box plots, scatterplots, histograms or frequency distributions." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Outliers or influential data points can be defined as data values that are extreme or atypical on either the independent (X variables) or dependent (Y variables) variables or both. Outliers can occur as a result of observation errors, data entry errors, instrument errors based on layout or instructions, or actual extreme values from self-report data. Because outliers affect the mean, the standard deviation, and correlation coefficient values, they must be explained, deleted, or accommodated by using robust statistics." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

[myth] "The standard deviation statistic is more efficient than the range and therefore we should use the standard deviation statistic when computing limits for a process behavior chart."(Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Outliers make it very hard to give an intuitive interpretation of the mean, but in fact, the situation is even worse than that. For a real‐world distribution, there always is a mean (strictly speaking, you can define distributions with no mean, but they’re not realistic), and when we take the average of our data points, we are trying to estimate that mean. But when there are massive outliers, just a single data point is likely to dominate the value of the mean and standard deviation, so much more data is required to even estimate the mean, let alone make sense of it." (Field Cady, "The Data Science Handbook", 2017)

"Theoretically, the normal distribution is most famous because many distributions converge to it, if you sample from them enough times and average the results. This applies to the binomial distribution, Poisson distribution and pretty much any other distribution you’re likely to encounter (technically, any one for which the mean and standard deviation are finite)." (Field Cady, "The Data Science Handbook", 2017)

"With time series though, there is absolutely no substitute for plotting. The pertinent pattern might end up being a sharp spike followed by a gentle taper down. Or, maybe there are weird plateaus. There could be noisy spikes that have to be filtered out. A good way to look at it is this: means and standard deviations are based on the naïve assumption that data follows pretty bell curves, but there is no corresponding 'default' assumption for time series data (at least, not one that works well with any frequency), so you always have to look at the data to get a sense of what’s normal. [...] Along the lines of figuring out what patterns to expect, when you are exploring time series data, it is immensely useful to be able to zoom in and out." (Field Cady, "The Data Science Handbook", 2017)

"With skewed data, quantiles will reflect the skew, while adding standard deviations assumes symmetry in the distribution and can be misleading." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"[…] whenever people make decisions after being supplied with the standard deviation number, they act as if it were the expected mean deviation." (Nassim N Taleb, "Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

27 November 2018

🔭Data Science: Data Science (Just the Quotes)

"Data is frequently missing or incongruous. If data is missing, do you simply ignore the missing points? That isn’t always possible. If data is incongruous, do you decide that something is wrong with badly behaved data (after all, equipment fails), or that the incongruous data is telling its own story, which may be more interesting? It’s reported that the discovery of ozone layer depletion was delayed because automated data collection tools discarded readings that were too low. In data science, what you have is frequently all you’re going to get. It’s usually impossible to get 'better' data, and you have no alternative but to work with the data at hand." (Mike Loukides, "What Is Data Science?", 2011).

"Data science isn’t just about the existence of data, or making guesses about what that data might mean; it’s about testing hypotheses and making sure that the conclusions you’re drawing from the data are valid." (Mike Loukides, "What Is Data Science?", 2011)

"The thread that ties most of these applications together is that data collected from users provides added value. Whether that data is search terms, voice samples, or product reviews, the users are in a feedback loop in which they contribute to the products they use. That’s the beginning of data science." (Mike Loukides, "What Is Data Science?", 2011)

"Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others" (Mike Loukides, "What Is Data Science?", 2011).

"Data science is an iterative process. It starts with a hypothesis (or several hypotheses) about the system we’re studying, and then we analyze the information. The results allow us to reject our initial hypotheses and refine our understanding of the data. When working with thousands of fields and millions of rows, it’s important to develop intuitive ways to reject bad hypotheses quickly." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Hollywood loves the myth of a lone scientist working late nights in a dark laboratory on a mysterious island, but the truth is far less melodramatic. Real science is almost always a team sport. Groups of people, collaborating with other groups of people, are the norm in science - and data science is no exception to the rule. When large groups of people work together for extended periods of time, a culture begins to emerge." (Mike Barlow, "Learning to Love Data Science", 2015) 

"One important thing to bear in mind about the outputs of data science and analytics is that in the vast majority of cases they do not uncover hidden patterns or relationships as if by magic, and in the case of predictive analytics they do not tell us exactly what will happen in the future. Instead, they enable us to forecast what may come. In other words, once we have carried out some modelling there is still a lot of work to do to make sense out of the results obtained, taking into account the constraints and assumptions in the model, as well as considering what an acceptable level of reliability is in each scenario." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. [...] The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement. [...] A third data science myth is that modern data science software is easy to use, and so data science is easy to do. [...] The last myth about data science [...] is the belief that data science pays for itself quickly. The truth of this belief depends on the context of the organization. Adopting data science can require significant investment in terms of developing data infrastructure and hiring staff with data science expertise. Furthermore, data science will not give positive results on every project." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope. (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The patterns that we extract using data science are useful only if they give us insight into the problem that enables us to do something to help solve the problem." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"We humans are reasonably good at defining rules that check one, two, or even three attributes (also commonly referred to as features or variables), but when we go higher than three attributes, we can start to struggle to handle the interactions between them. By contrast, data science is often applied in contexts where we want to look for patterns among tens, hundreds, thousands, and, in extreme cases, millions of attributes." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Even in an era of open data, data science and data journalism, we still need basic statistical principles in order not to be misled by apparent patterns in the numbers." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data scientists are advanced in their technical skills. They like to do coding, statistics, and so forth. In its purest form, data science is where an individual uses the scientific method on data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Aim for simplicity in Data Science. Real creativity won’t make things more complex. Instead, it will simplify them." (Damian D Mingle)

"Data Science is a series of failures punctuated by the occasional success." (Nigel C Lewis)

"Invite your Data Science team to ask questions and assume any system, rule, or way of doing things is open to further consideration." (Damian D Mingle)

📑Andrew Gelman - Collected Quotes

"The idea of optimization transfer is very appealing to me, especially since I have never succeeded in fully understanding the EM algorithm." (Andrew Gelman, "Discussion", Journal of Computational and Graphical Statistics vol 9, 2000)

"The difference between 'statistically significant' and 'not statistically significant' is not in itself necessarily statistically significant. By this, I mean more than the obvious point about arbitrary divisions, that there is essentially no difference between something significant at the 0.049 level or the 0.051 level. I have a bigger point to make. It is common in applied research–in the last couple of weeks, I have seen this mistake made in a talk by a leading political scientist and a paper by a psychologist–to compare two effects, from two different analyses, one of which is statistically significant and one which is not, and then to try to interpret/explain the difference. Without any recognition that the difference itself was not statistically significant." (Andrew Gelman, "The difference between ‘statistically significant’ and ‘not statistically significant’ is not in itself necessarily statistically significant", 2005)

"A naive interpretation of regression to the mean is that heights, or baseball records, or other variable phenomena necessarily become more and more 'average' over time. This view is mistaken because it ignores the error in the regression predicting y from x. For any data point xi, the point prediction for its yi will be regressed toward the mean, but the actual yi that is observed will not be exactly where it is predicted. Some points end up falling closer to the mean and some fall further." (Andrew Gelman & Jennifer Hill, "Data Analysis Using Regression and Multilevel/Hierarchical Models", 2007)

"You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model." (Andrew Gelman, "Some thoughts on the sociology of statistics", 2007)

"It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values." (Andrew Gelman, "The sample size is huge, so a p-value of 0.007 is not that impressive", 2009)

"The arguments I lay out are, briefly, that graphs are a distraction from more serious analysis; that graphs can mislead in displaying compelling patterns that are not statistically significant and that could easily enough be consistent with chance variation; that diagnostic plots could be useful in the development of a model but do not belong in final reports; that, when they take the place of tables, graphs place the careful reader one step further away from the numerical inferences that are the essence of rigorous scientific inquiry; and that the effort spent making flashy graphics would be better spent on the substance of the problem being studied." (Andrew Gelman et al, "Why Tables Are Really Much Better Than Graphs", Journal of Computational and Graphical Statistics, Vol. 20(1), 2011)

"Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It is basically a trade-off: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis does not need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display. And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate." (Andrew Gelman et al, "Why Tables Are Really Much Better Than Graphs", Journal of Computational and Graphical Statistics, Vol. 20(1), 2011)"

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Statisticians tend to use standard graphic forms (e.g., scatterplots and time series), which enable the experienced reader to quickly absorb lots of information but may leave other readers cold. We personally prefer repeated use of simple graphical forms, which we hope draw attention to the data rather than to the form of the display." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013) 

"Flaws can be found in any research design if you look hard enough. […] In our experience, it is good scientific practice to refine one's research hypotheses in light of the data. Working scientists are also keenly aware of the risks of data dredging, and they use confidence intervals and p-values as a tool to avoid getting fooled by noise. Unfortunately, a by-product of all this struggle and care is that when a statistically significant pattern does show up, it is natural to get excited and believe it. The very fact that scientists generally don't cheat, generally don't go fishing for statistical significance, makes them vulnerable to drawing strong conclusions when they encounter a pattern that is robust enough to cross the p < 0.05 threshold." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"There are many roads to statistical significance; if data are gathered with no preconceptions at all, statistical significance can obviously be obtained even from pure noise by the simple means of repeatedly performing comparisons, excluding data in different ways, examining different interactions, controlling for different predictors, and so forth. Realistically, though, a researcher will come into a study with strong substantive hypotheses, to the extent that, for any given data set, the appropriate analysis can seem evidently clear. But even if the chosen data analysis is a deterministic function of the observed data, this does not eliminate the problem posed by multiple comparisons." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"There is a growing realization that reported 'statistically significant' claims in statistical publications  are routinely mistaken. Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation. The value of p (for 'probability') is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p- value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear." (Andrew Gelman & Eric Loken, "The Statistical Crisis in Science", American Scientist Vol. 102(6), 2014)

"I agree with the general message: 'The right variables make a big difference for accuracy. Complex statistical methods, not so much.' This is similar to something Hal Stern told me once: the most important aspect of a statistical analysis is not what you do with the data, it’s what data you use." (Andrew Gelman, "The most important aspect of a statistical analysis is not what you do with the data, it’s what data you use", 2018)

"We thus echo the classical Bayesian literature in concluding that ‘noninformative prior information’ is a contradiction in terms. The flat prior carries information just like any other; it represents the assumption that the effect is likely to be large. This is often not true. Indeed, the signal-to-noise ratios is often very low and then it is necessary to shrink the unbiased estimate. Failure to do so by inappropriately using the flat prior causes overestimation of effects and subsequent failure to replicate them." (Erik van Zwet & Andrew Gelman, "A proposal for informative default priors scaled by the standard error of estimates", The American Statistician 76, 2022)

"Taking a model too seriously is really just another way of not taking it seriously at all." (Andrew Gelman)

🔭Data Science: Percentiles & Quantiles (Just the Quotes)

"When distributions are compared, the goal is to understand how the distributions shift in going from one data set to the next. […] The most effective way to investigate the shifts of distributions is to compare corresponding quantiles." (William S Cleveland, "Visualizing Data", 1993)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin, "Common Errors in Statistics (and How to Avoid Them)", 2003)

"A feature shared by both the range and the interquartile range is that they are each calculated on the basis of just two values - the range uses the maximum and the minimum values, while the IQR uses the two quartiles. The standard deviation, on the other hand, has the distinction of using, directly, every value in the set as part of its calculation. In terms of representativeness, this is a great strength. But the chief drawback of the standard deviation is that, conceptually, it is harder to grasp than other more intuitive measures of spread." (Alan Graham, "Developing Thinking in Statistics", 2006)

"A useful feature of a stem plot is that the values maintain their natural order, while at the same time they are laid out in a way that emphasises the overall distribution of where the values are concentrated (that is, where the longer branches are). This enables you easily to pick out key values such as the median and quartiles." (Alan Graham, "Developing Thinking in Statistics", 2006)

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Percentile points are used to define the percentage of cases equal to and below a certain point in a distribution or set of scores." (Neil J Salkind, "Statistics for People who (think They) Hate Statistics: Excel 2007 Edition", 2010)

"Had we started with this [quantile] plot, noticed that it looks straight and not looked further, we would have missed the important features of the data. The general lesson is important. Theoretical quantile -quantile plots are not a panacea and must be used in conjunction with other displays and analyses to get a full picture of the behavior of the data." (John M Chambers et al, "Graphical Methods for Data Analysis", 2011)

"[...] when measuring performance, it’s worth using percentiles rather than averages. The main advantage of the mean is that it’s easy to calculate, but percentiles are much more meaningful." (Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems", 2015)

"Many researchers have fallen into the trap of assuming percentiles are interval data and using them in Statistical procedures that require interval data. The results are somewhat distorted under these conditions since the scores are actually only ordinal data." (Martin L Abbott, "Using Statistics in the Social and Health Sciences with SPSS and Excel", 2016)

"The percentile or rank is the point in a distribution of scores below which a given percentage of scores fall. This is an indication of rank since it establishes score that is above the percentage of a set of scores. [...] Therefore, percentiles describe where a certain score is in relation to others in the distribution. [...] Statistically, it is important to remember that percentile ranks are ranks and therefore not interval data." (Martin L Abbott, "Using Statistics in the Social and Health Sciences with SPSS and Excel", 2016)

"It is not enough to give a single summary for a distribution - we need to have an idea of the spread, sometimes known as the variability. [...] The range is a natural choice, but is clearly very sensitive to extreme values [...] In contrast the inter-quartile range (IQR) is unaffected by extremes. This is the distance between the 25th and 75th percentiles of the data and so contains the ‘central half’ of the numbers [...] Finally the standard deviation is a widely used measure of spread. It is the most technically complex measure, but is only really appropriate for well-behaved symmetric data* since it is also unduly influenced by outlying values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"With skewed data, quantiles will reflect the skew, while adding standard deviations assumes symmetry in the distribution and can be misleading." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

🔭Data Science: Fuzziness (Just the Quotes)

"Today we preach that science is not science unless it is quantitative. We substitute correlation for causal studies, and physical equations for organic reasoning. Measurements and equations are supposed to sharpen thinking, but [...] they more often tend to make the thinking non-causal and fuzzy." (John R Platt, "Strong Inference", Science Vol. 146 (3641), 1964)

"Information that is only partially structured (and therefore contains some 'noise' is fuzzy, inconsistent, and indistinct. Such imperfect information may be regarded as having merit only if it represents an intermediate step in structuring the information into a final meaningful form. If the partially Structured information remains in fuzzy form, it will create a state of dissatisfaction in the mind of the originator and certainly in the mind of the recipient. The natural desire is to continue structuring until clarity, simplicity, precision, and definitiveness are obtained." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Mental models are fuzzy, incomplete, and imprecisely stated. Furthermore, within a single individual, mental models change with time, even during the flow of a single conversation. The human mind assembles a few relationships to fit the context of a discussion. As debate shifts, so do the mental models. Even when only a single topic is being discussed, each participant in a conversation employs a different mental model to interpret the subject. Fundamental assumptions differ but are never brought into the open. […] A mental model may be correct in structure and assumptions but, even so, the human mind - either individually or as a group consensus - is apt to draw the wrong implications for the future." (Jay W Forrester, "Counterintuitive Behaviour of Social Systems", Technology Review, 1971)

"In general, complexity and precision bear an inverse relation to one another in the sense that, as the complexity of a problem increases, the possibility of analysing it in precise terms diminishes. Thus 'fuzzy thinking' may not be deplorable, after all, if it makes possible the solution of problems which are much too complex for precise analysis." (Lotfi A Zadeh, "Fuzzy languages and their relation to human intelligence", 1972)

"Fuzziness, then, is a concomitant of complexity. This implies that as the complexity of a task, or of a system for performing that task, exceeds a certain threshold, the system must necessarily become fuzzy in nature. Thus, with the rapid increase in the complexity of the information processing tasks which the computers are called upon to perform, we are reaching a point where computers will have to be designed for processing of information in fuzzy form. In fact, it is the capability to manipulate fuzzy concepts that distinguishes human intelligence from the machine intelligence of current generation computers. Without such capability we cannot build machines that can summarize written text, translate well from one natural language to another, or perform many other tasks that humans can do with ease because of their ability to manipulate fuzzy concepts." (Lotfi A Zadeh, "The Birth and Evolution of Fuzzy Logic", 1989)

"Probability theory is an ideal tool for formalizing uncertainty in situations where class frequencies are known or where evidence is based on outcomes of a sufficiently long series of independent random experiments. Possibility theory, on the other hand, is ideal for formalizing incomplete information expressed in terms of fuzzy propositions." (George Klir, "Fuzzy sets and fuzzy logic", 1995)

"[…] interval mathematics and fuzzy logic together can provide a promising alternative to mathematical modeling for many physical systems that are too vague or too complicated to be described by simple and crisp mathematical formulas or equations. When interval mathematics and fuzzy logic are employed, the interval of confidence and the fuzzy membership functions are used as approximation measures, leading to the so-called fuzzy systems modeling." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"Fuzzy relations are developed by allowing the relationship between elements of two or more sets to take on an infinite number of degrees of relationship between the extremes of 'completely related' and 'not related', which are the only degrees of relationship possible in crisp relations. In this sense, fuzzy relations are to crisp relations as fuzzy sets are to crisp sets; crisp sets and relations are more constrained realizations of fuzzy sets and relations."  (Timothy J Ross & W Jerry Parkinson, "Fuzzy Set Theory, Fuzzy Logic, and Fuzzy Systems", 2002)

"The vast majority of information that we have on most processes tends to be nonnumeric and nonalgorithmic. Most of the information is fuzzy and linguistic in form." (Timothy J Ross & W Jerry Parkinson, "Fuzzy Set Theory, Fuzzy Logic, and Fuzzy Systems", 2002)

"Each fuzzy set is uniquely defined by a membership function. […] There are two approaches to determining a membership function. The first approach is to use the knowledge of human experts. Because fuzzy sets are often used to formulate human knowledge, membership functions represent a part of human knowledge. Usually, this approach can only give a rough formula of the membership function and fine-tuning is required. The second approach is to use data collected from various sensors to determine the membership function. Specifically, we first specify the structure of membership function and then fine-tune the parameters of membership function based on the data." (Huaguang Zhang & Derong Liu, "Fuzzy Modeling and Fuzzy Control", 2006)

"Granular computing is a general computation theory for using granules such as subsets, classes, objects, clusters, and elements of a universe to build an efficient computational model for complex applications with huge amounts of data, information, and knowledge. Granulation of an object a leads to a collection of granules, with a granule being a clump of points (objects) drawn together by indiscernibility, similarity, proximity, or functionality. In human reasoning and concept formulation, the granules and the values of their attributes are fuzzy rather than crisp. In this perspective, fuzzy information granulation may be viewed as a mode of generalization, which can be applied to any concept, method, or theory." (Salvatore Greco et al, "Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach", 2009)

"We use the term fuzzy logic to refer to all aspects of representing and manipulating knowledge that employ intermediary truth-values. This general, commonsense meaning of the term fuzzy logic encompasses, in particular, fuzzy sets, fuzzy relations, and formal deductive systems that admit intermediary truth-values, as well as the various methods based on them." (Radim Belohlavek & George J Klir, "Concepts and Fuzzy Logic", 2011)

🔭Data Science: Problems (Just the Quotes)

"The problems which arise in the reduction of data may thus conveniently be divided into three types: (i) Problems of Specification, which arise in the choice of the mathematical form of the population. (ii) When a specification has been obtained, problems of Estimation arise. These involve the choice among the methods of calculating, from our sample, statistics fit to estimate the unknow nparameters of the population. (iii) Problems of Distribution include the mathematical deduction of the exact nature of the distributions in random samples of our estimates of the parameters, and of other statistics designed to test the validity of our specification (tests of Goodness of Fit)." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"The most important maxim for data analysis to heed, and one which many statisticians seem to have shunned is this: ‘Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.’ Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (John W Tukey, "The Future of Data Analysis", Annals of Mathematical Statistics, Vol. 33, No. 1, 1962)

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology”, 1966)

"Statistical methods are tools of scientific investigation. Scientific investigation is a controlled learning process in which various aspects of a problem are illuminated as the study proceeds. It can be thought of as a major iteration within which secondary iterations occur. The major iteration is that in which a tentative conjecture suggests an experiment, appropriate analysis of the data so generated leads to a modified conjecture, and this in turn leads to a new experiment, and so on." (George E P Box & George C Tjao, "Bayesian Inference in Statistical Analysis", 1973)

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...]For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"Doing data analysis without explicitly defining your problem or goal is like heading out on a road trip without having decided on a destination." (Michael Milton, "Head First Data Analysis", 2009)

"Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: 'there’s a lot of data, what can you make from it?'" (Mike Loukides, "What Is Data Science?", 2011)

"Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The big problems with statistics, say its best practitioners, have little to do with computations and formulas. They have to do with judgment - how to design a study, how to conduct it, then how to analyze and interpret the results. Journalists reporting on statistics have many chances to do harm by shaky reporting, and so are also called on to make sophisticated judgments. How, then, can we tell which studies seem credible, which we should report?" (Victor Cohn & Lewis Cope, "News & Numbers: A writer’s guide to statistics" 3rd Ed, 2012)

"We have let ourselves become enchanted by big data only because we exoticize technology. We’re impressed with small feats accomplished by computers alone, but we ignore big achievements from complementarity because the human contribution makes them less uncanny. Watson, Deep Blue, and ever-better machine learning algorithms are cool. But the most valuable companies in the future won’t ask what problems can be solved with computers alone. Instead, they’ll ask: how can computers help humans solve hard problems?" (Peter Thiel & Blake Masters, "Zero to One: Notes on Startups, or How to Build the Future", 2014)

"Machine learning is a science and requires an objective approach to problems. Just like the scientific method, test-driven development can aid in solving a problem. The reason that TDD and the scientific method are so similar is because of these three shared characteristics: Both propose that the solution is logical and valid. Both share results through documentation and work over time. Both work in feedback loops." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"While Big Data, when managed wisely, can provide important insights, many of them will be disruptive. After all, it aims to find patterns that are invisible to human eyes. The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"The term [Big Data] simply refers to sets of data so immense that they require new methods of mathematical analysis, and numerous servers. Big Data - and, more accurately, the capacity to collect it - has changed the way companies conduct business and governments look at problems, since the belief wildly trumpeted in the media is that this vast repository of information will yield deep insights that were previously out of reach." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Your machine-learning algorithm should answer a very specific question that tells you something you need to know and that can be answered appropriately by the data you have access to. The best first question is something you already know the answer to, so that you have a reference and some intuition to compare your results with. Remember: you are solving a business problem, not a math problem."(Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Data scientists should have some domain expertise. Most data science projects begin with a real-world, domain-specific problem and the need to design a data-driven solution to this problem. As a result, it is important for a data scientist to have enough domain expertise that they understand the problem, why it is important, an dhow a data science solution to the problem might fit into an organization’s processes. This domain expertise guides the data scientist as she works toward identifying an optimized solution." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. [...] The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement. [...] A third data science myth is that modern data science software is easy to use, and so data science is easy to do. [...] The last myth about data science [...] is the belief that data science pays for itself quickly. The truth of this belief depends on the context of the organization. Adopting data science can require significant investment in terms of developing data infrastructure and hiring staff with data science expertise. Furthermore, data science will not give positive results on every project." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Many people have strong intuitions about whether they would rather have a vital decision about them made by algorithms or humans. Some people are touchingly impressed by the capabilities of the algorithms; others have far too much faith in human judgment. The truth is that sometimes the algorithms will do better than the humans, and sometimes they won’t. If we want to avoid the problems and unlock the promise of big data, we’re going to need to assess the performance of the algorithms on a case-by-case basis. All too often, this is much harder than it should be. […] So the problem is not the algorithms, or the big datasets. The problem is a lack of scrutiny, transparency, and debate." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The problem is the hype, the notion that something magical will emerge if only we can accumulate data on a large enough scale. We just need to be reminded: Big data is not better; it’s just bigger. And it certainly doesn’t speak for itself." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"The way we explore data today, we often aren't constrained by rigid hypothesis testing or statistical rigor that can slow down the process to a crawl. But we need to be careful with this rapid pace of exploration, too. Modern business intelligence and analytics tools allow us to do so much with data so quickly that it can be easy to fall into a pitfall by creating a chart that misleads us in the early stages of the process." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020) 

🔭Data Science: Facts (Just the Quotes)

"Isolated facts, those that can only be obtained by rough estimate and that require development, can only be presented in memoires; but those that can be presented in a body, with details, and on whose accuracy one can rely, may be expounded in tables." (E Duvillard, "Memoire sur le travail du Bureau de statistique", 1806)

"Facts, however numerous, do not constitute a science. Like innumerable grains of sand on the sea shore, single facts appear isolated, useless, shapeless; it is only when compared, when arranged in their natural relations, when crystallised by the intellect, that they constitute the eternal truths of science." (William Farr, "Observation", Br. Ann. Med. 1, 1837)

"From carefully compiled statistical facts more may be learned [about] the moral nature of Man than can be gathered from all the accumulated experiences of the preceding ages." (Henry Thomas Buckle, "A History of Civilization in England", 1857/1898)

"The graphical method has considerable superiority for the exposition of statistical facts over the tabular. A heavy bank of figures is grievously wearisome to the eye, and the popular mind is as incapable of drawing any useful lessons from it as of extracting sunbeams from cucumbers." (Arthur B Farquhar & Henry Farquhar, "Economic and Industrial Delusions", 1891)

"[…] to kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact." (Charles R Darwin, "More Letters of Charles Darwin", Vol 2, 1903)

"Entia non sunt multiplicanda praeter necessitatem. That is to say; before you try a complicated hypothesis, you should make quite sure that no simplification of it will explain the facts equally well." (Charles S Peirce," Pragmatism and Pragmaticism", [lecture] 1903)

"But, once again, what the physical states as the result of an experiment is not the recital of observed facts, but the interpretation and the transposing of these facts into the ideal, abstract, symbolic world created by the theories he regards as established." (Pierre-Maurice-Marie Duhem, "The Aim and Structure of Physical Theory", 1908)

"The facts of greatest outcome are those we think simple; may be they really are so, because they are influenced only by a small number of well-defined circumstances, may be they take on an appearance of simplicity because the various circumstances upon which they depend obey the laws of chance and so come to mutually compensate." (Henri Poincaré, "The Foundations of Science", 1913)

"Statistics may be defined as numerical statements of facts by means of which large aggregates are analyzed, the relations of individual units to their groups are ascertained, comparisons are made between groups, and continuous records are maintained for comparative purposes." (Melvin T Copeland. "Statistical Methods" [in: Harvard Business Studies, Vol. III, Ed. by Melvin T Copeland, 1917])

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"Observed facts must be built up, woven together, ordered, arranged, systematized into conclusions and theories by reflection and reason, if they are to have full bearing on life and the universe. Knowledge is the accumulation of facts. Wisdom is the establishment of relations. And just because the latter process is delicate and perilous, it is all the more delightful." (Gamaliel Bradford, "Darwin", 1926)

"In scientific thought we adopt the simplest theory which will explain all the facts under consideration and enable us to predict new facts of the same kind. The catch in this criterion lies in the world 'simplest'. It is really an aesthetic canon such as we find implicit in our criticisms of poetry or painting. The layman finds such a law as dx/dt = K(d2x/dy2) much less simple than 'it oozes', of which it is the mathematical statement. The physicist reverses this judgment, and his statement is certainly the more fruitful of the two, so far as prediction is concerned. It is, however, a statement about something very unfamiliar to the plainman, namely, the rate of change of a rate of change." (John B S Haldane, "Possible Worlds", 1927)

"We can invent as many theories we like, and any one of them can be made to fit the facts. But that theory is always preferred which makes the fewest number of assumptions." (Albert Einstein [interview] 1929)

"A system is said to be coherent if every fact in the system is related every other fact in the system by relations that are not merely conjunctive. A deductive system affords a good example of a coherent system." (Lizzie S Stebbing, "A modern introduction to logic", 1930)

"In experimental science facts of the greatest importance are rarely discovered accidentally: more frequently new ideas point the way towards them." (Erwin Schrödinger, "Science and the Human Temperament", 1935)

"Science is the attempt to discover, by means of observation, and reasoning based upon it, first, particular facts about the world, and then laws connecting facts with one another and (in fortunate cases) making it possible to predict future occurrences." (Bertrand Russell, "Religion and Science, Grounds of Conflict", 1935)

"With the help of physical theories we try to find our way through the maze of observed facts, to order and understand the world of our sense impressions." (Albert Einstein & Leopold Infeld, "The Evolution of Physics", 1938)

"Graphs are all inclusive. No fact is too slight or too great to plot to a scale suited to the eye. Graphs may record the path of an ion or the orbit of the sun, the rise of a civilization, or the acceleration of a bullet, the climate of a century or the varying pressure of a heart beat, the growth of a business, or the nerve reactions of a child." (Henry D Hubbard [foreword to Willard C Brinton, "Graphic Presentation", 1939)])

"The graphic art depicts magnitudes to the eye. It does more. It compels the seeing of relations. We may portray by simple graphic methods whole masses of intricate routine, the organization of an enterprise, or the plan of a campaign. Graphs serve as storm signals for the manager, statesman, engineer; as potent narratives for the actuary, statist, naturalist; and as forceful engines of research for science, technology and industry. They display results. They disclose new facts and laws. They reveal discoveries as the bud unfolds the flower."  (Henry D Hubbard [foreword to Willard C Brinton, "Graphic Presentation", 1939)])

"[…] the grand aim of all science […] is to cover the greatest possible number of empirical facts by logical deductions from the smallest possible number of hypotheses or axioms." (Albert Einstein, 1954)

"Science does not begin with facts; one of its tasks is to uncover the facts by removing misconceptions." (Lancelot L Whyte, "Accent on Form", 1954)

"Science is the creation of concepts and their exploration in the facts. It has no other test of the concept than its empirical truth to fact." (Jacob Bronowski, "Science and Human Values", 1956)

"When we meet a fact which contradicts a prevailing theory, we must accept the fact and abandon the theory, even when the theory is supported by great names and generally accepted." (Claude Bernard, "An Introduction to the Study of Experimental Medicine", 1957)

"Science aims at the discovery, verification, and organization of fact and information [...] engineering is fundamentally committed to the translation of scientific facts and information to concrete machines, structures, materials, processes, and the like that can be used by men." (Eric A Walker, "Engineers and/or Scientists", Journal of Engineering Education Vol. 51, 1961)

"A model is a useful (and often indispensable) framework on which to organize our knowledge about a phenomenon. […] It must not be overlooked that the quantitative consequences of any model can be no more reliable than the a priori agreement between the assumptions of the model and the known facts about the real phenomenon. When the model is known to diverge significantly from the facts, it is self-deceiving to claim quantitative usefulness for it by appeal to agreement between a prediction of the model and observation." (John R Philip, 1966)

"To do science is to search for repeated patterns, not simply to accumulate facts, and to do the science of geographical ecology is to search for patterns of plants and animal life that can be put on a map." (Robert H. MacArthur, "Geographical Ecology", 1972)

"No theory ever agrees with all the facts in its domain, yet it is not always the theory that is to blame. Facts are constituted by older ideologies, and a clash between facts and theories may be proof of progress. It is also a first step in our attempt to find the principles implicit in familiar observational notions." (Paul K Feyerabend, "Against Method: Outline of an Anarchistic Theory of Knowledge", 1975)

"Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance."  (Ronald P. Carver, "The case against statistical testing", Harvard Educational Review 48, 1978)

"Facts and theories are different things, not rungs in a hierarchy of increasing certainty. Facts are the world's data. Theories are structures of ideas that explain and interpret facts. Facts do not go away while scientists debate rival theories for explaining them." (Stephen J Gould "Evolution as Fact and Theory", 1981) 

"Facts do not 'speak for themselves'. They speak for or against competing theories. Facts divorced from theory or visions are mere isolated curiosities." (Thomas Sowell, "A Conflict of Visions: Ideological Origins of Political Struggles", 1987)

"[…] no good model ever accounted for all the facts, since some data was bound to be misleading if not plain wrong. A theory that did fit all the data would have been ‘carpentered’ to do this and would thus be open to suspicion." (Francis H C Crick, "What Mad Pursuit: A Personal View of Scientific Discovery", 1988)

"The common perception of science as a rational activity, in which one confronts the evidence of fact with an open mind, could not be more false. Facts assume significance only within a pre-existing intellectual structure, which may be based as much on intuition and prejudice as on reason." (Walter Gratzer, The Guardian, 1989)

"As a result, surprisingly enough, scientific advance rarely comes solely through the accumulation of new facts. It comes most often through the construction of new theoretical frameworks. [..]  To understand scientific development, it is not enough merely to chronicle new discoveries and inventions. We must also trace the succession of worldviews" (Nancy R Pearcey & Charles B Thaxton, "The Soul of Science: Christian Faith and Natural Philosophy", 1994)

"Modeling involves a style of scientific thinking in which the argument is structured by the model, but in which the application is achieved via a narrative prompted by an external fact, an imagined event or question to be answered." (Uskali Mäki, "Fact and Fiction in Economics: Models, Realism and Social Construction", 2002)

"Although fiction is not fact, paradoxically we need some fictions, particularly mathematical ideas and highly idealized models, to describe, explain, and predict facts.  This is not because the universe is mathematical, but because our brains invent or use refined and law-abiding fictions, not only for intellectual pleasure but also to construct conceptual models of reality." (Mario Bunge, "Chasing Reality: Strife over Realism", 2006)

"There are no surprising facts, only models that are surprised by facts; and if a model is surprised by the facts, it is no credit to that model." (Eliezer S Yudkowsky, "Quantum Explanations", 2008)

"Obviously, the final goal of scientists and mathematicians is not simply the accumulation of facts and lists of formulas, but rather they seek to understand the patterns, organizing principles, and relationships between these facts to form theorems and entirely new branches of human thought." (Clifford A Pickover, "The Math Book", 2009)

"Relevance is not something you can predict. It is something you discover after the fact." (Thomas Sowell, "The Thomas Sowell Reader", 2011)

"Science does not live with facts alone. In addition to facts, it needs models. Scientific models fulfill two main functions with respect to empirical facts." (Andreas Bartels [in "Models, Simulations, and the Reduction of Complexity", Ed. by Ulrich Gähde et al, 2013)

"The whole point of science is that most of it is uncertain. That’s why science is exciting–because we don’t know. Science is all about things we don’t understand. The public, of course, imagines science is just a set of facts. But it’s not. Science is a process of exploring, which is always partial. We explore, and we find out things that we understand. We find out things we thought we understood were wrong. That’s how it makes progress." (Freeman Dyson,  [interview] 2014) 

"A mental representation is a mental structure that corresponds to an object, an idea, a collection of information, or anything else, concrete or abstract, that the brain is thinking about. […] Because the details of mental representations can differ dramatically from field to field, it’s hard to offer an overarching definition that is not too vague, but in essence these representations are preexisting patterns of information - facts, images, rules, relationships, and so on - that are held in long-term memory and that can be used to respond quickly and effectively in certain types of situations." (Anders Ericsson & Robert Pool," Peak: Secrets from  the  New  Science  of  Expertise", 2016)

"Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. […] Statistics is the science of learning from data." (Moore McCabe & Alwan Craig, "The Practice of Statistics for Business and Economics" 4th Ed., 2016)

"That is the trouble with facts: they sometimes force you to conclusions that differ with your intuition." (Steven G Krantz, "A Primer of Mathematical Writing" 2nd Ed., 2016)

More quotes on "Facts" at the-web-of-knowledge.blogspot.com

🔭Data Science: Constraints (Just the Quotes)

"A common and very powerful constraint is that of continuity. It is a constraint because whereas the function that changes arbitrarily can undergo any change, the continuous function can change, at each step, only to a neighbouring value." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"A most important concept […] is that of constraint. It is a relation between two sets, and occurs when the variety that exists under one condition is less than the variety that exists under another. [...] Constraints are of high importance in cybernetics […] because when a constraint exists advantage can usually be taken of it." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"[…] as every law of nature implies the existence of an invariant, it follows that every law of nature is a constraint. […] Science looks for laws; it is therefore much concerned with looking for constraints. […] the world around us is extremely rich in constraints. We are so familiar with them that we take most of them for granted, and are often not even aware that they exist. […] A world without constraints would be totally chaotic." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"[...] the existence of any invariant over a set of phenomena implies a constraint, for its existence implies that the full range of variety does not occur. The general theory of invariants is thus a part of the theory of constraints. Further, as every law of nature implies the existence of an invariant, it follows that every law of nature is a constraint." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"Formulating consists of determining the system inputs, outputs, requirements, objectives, constraints. Structuring the system provides one or more methods of organizing the solution, the method of operation, the selection of parts, and the nature of their performance requirements. It is evident that the processes of formulating a system and structuring it are strongly related." (Harold Chestnut, "Systems Engineering Tools", 1965)

"In general, we can say that the larger the system becomes, the more the parts interact, the more difficult it is to understand environmental constraints, the more obscure becomes the problem of what resources should be made available, and deepest of all, the more difficult becomes the problem of the legitimate values of the system."  (C West Churchman, "The Systems Approach", 1968)

"A physical theory must accept some actual data as inputs and must be able to generate from them another set of possible data (the output) in such a way that both input and output match the assumptions of the theory - laws, constraints, etc. This concept of matching involves relevance: thus boundary conditions are relevant only to field-like theories such as hydrodynamics and quantum mechanics. But matching is more than relevance: it is also logical compatibility." (Mario Bunge, "Philosophy of Physics", 1973)

"Physics is like that. It is important that the models we construct allow us to draw the right conclusions about the behaviour of the phenomena and their causes. But it is not essential that the models accurately describe everything that actually happens; and in general it will not be possible for them to do so, and for much the same reasons. The requirements of the theory constrain what can be literally represented. This does not mean that the right lessons cannot be drawn. Adjustments are made where literal correctness does not matter very much in order to get the correct effects where we want them; and very often, as in the staging example, one distortion is put right by another. That is why it often seems misleading to say that a particular aspect of a model is false to reality: given the other constraints that is just the way to restore the representation." (Nancy Cartwright, "How the Laws of Physics Lie", 1983)

"Indeed, except for the very simplest physical systems, virtually everything and everybody in the world is caught up in a vast, nonlinear web of incentives and constraints and connections. The slightest change in one place causes tremors everywhere else. We can't help but disturb the universe, as T.S. Eliot almost said. The whole is almost always equal to a good deal more than the sum of its parts. And the mathematical expression of that property - to the extent that such systems can be described by mathematics at all - is a nonlinear equation: one whose graph is curvy." (M Mitchell Waldrop, "Complexity: The Emerging Science at the Edge of Order and Chaos", 1992)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"A conceptual model is a representation of the system expertise using this formalism. An internal model is derived from the conceptual model and from a specification of the system transactions and the performance constraints." (Zbigniew W. Ras & Andrzej Skowron [Eds.], Foundations of Intelligent Systems: 10th International Symposium Vol 10, 1997)

"Whereas formal systems apply inference rules to logical variables, neural networks apply evolutive principles to numerical variables. Instead of calculating a solution, the network settles into a condition that satisfies the constraints imposed on it." (Paul Cilliers, "Complexity and Postmodernism: Understanding Complex Systems", 1998)

"What it means for a mental model to be a structural analog is that it embodies a representation of the spatial and temporal relations among, and the causal structures connecting the events and entities depicted and whatever other information that is relevant to the problem-solving talks. […] The essential points are that a mental model can be nonlinguistic in form and the mental mechanisms are such that they can satisfy the model-building and simulative constraints necessary for the activity of mental modeling." (Nancy J Nersessian, "Model-based reasoning in conceptual change", 1999)

"To develop a Control, the designer should find aspect systems, subsystems, or constraints that will prevent the negative interferences between elements (friction) and promote positive interferences (synergy). In other words, the designer should search for ways of minimizing frictions that will result in maximization of the global satisfaction" (Carlos Gershenson, "Design and Control of Self-organizing Systems", 2007)

"[chaos theory] presents a universe that is at once deterministic and obeys the fundamental physical laws, but is capable of disorder, complexity, and unpredictability. It shows that predictability is a rare phenomenon operating only within the constraints that science has filtered out from the rich diversity of our complex world." (Ziauddin Sardar & Iwona Abrams, "Introducing Chaos: A Graphic Guide", 2008)

"Cybernetics is the art of creating equilibrium in a world of possibilities and constraints. This is not just a romantic description, it portrays the new way of thinking quite accurately. Cybernetics differs from the traditional scientific procedure, because it does not try to explain phenomena by searching for their causes, but rather by specifying the constraints that determine the direction of their development." (Ernst von Glasersfeld, "Partial Memories: Sketches from an Improbable Life", 2010)

"Optimization is more than finding the best simulation results. It is itself a complex and evolving field that, subject to certain information constraints, allows data scientists, statisticians, engineers, and traders alike to perform reality checks on modeling results." (Chris Conlan, "Automated Trading with R: Quantitative Research and Platform Development", 2016)

"Exponentially growing systems are prevalent in nature, spanning all scales from biochemical reaction networks in single cells to food webs of ecosystems. How exponential growth emerges in nonlinear systems is mathematically unclear. […] The emergence of exponential growth from a multivariable nonlinear network is not mathematically intuitive. This indicates that the network structure and the flux functions of the modeled system must be subjected to constraints to result in long-term exponential dynamics." (Wei-Hsiang Lin et al, "Origin of exponential growth in nonlinear reaction networks", PNAS 117 (45), 2020)

More quotes on "Constraints" at the-web-of-knowledge.blogspot.com
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.