SQL Troubles: classification

Showing posts with label classification. Show all posts

29 December 2018

🔭Data Science: Experience (Just the Quotes)

"[…] it is from long experience chiefly that we are to expect the most certain rules of practice, yet it is withal to be remembered, that observations, and to put us upon the most probable means of improving any art, is to get the best insight we can into the nature and properties of those things which we are desirous to cultivate and improve." (Stephen Hales, "Vegetable Staticks", 1727)

"In order to supply the defects of experience, we will have recourse to the probable conjectures of analogy, conclusions which we will bequeath to our posterity to be ascertained by new observations, which, if we augur rightly, will serve to establish our theory and to carry it gradually nearer to absolute certainty." (Johann H Lambert, "The System of the World", 1800)

"Induction, analogy, hypotheses founded upon facts and rectified continually by new observations, a happy tact given by nature and strengthened by numerous comparisons of its indications with experience, such are the principal means for arriving at truth." (Pierre-Simon Laplace, "A Philosophical Essay on Probabilities", 1814)

"Observation is so wide awake, and facts are being so rapidly added to the sum of human experience, that it appears as if the theorizer would always be in arrears, and were doomed forever to arrive at imperfect conclusion; but the power to perceive a law is equally rare in all ages of the world, and depends but little on the number of facts observed." (Henry D Thoreau, "A Week on the Concord and Merrimack Rivers", 1862)

"Science is the systematic classification of experience." (George H Lewes, "The Physical Basis of Mind", 1877)

"Experience teaches that one will be led to new discoveries almost exclusively by means of special mechanical models." (Ludwig Boltzmann, "Lectures on Gas Theory", 1896)

"Philosophy, like science, consists of theories or insights arrived at as a result of systemic reflection or reasoning in regard to the data of experience. It involves, therefore, the analysis of experience and the synthesis of the results of analysis into a comprehensive or unitary conception. Philosophy seeks a totality and harmony of reasoned insight into the nature and meaning of all the principal aspects of reality." (Joseph A Leighton, "The Field of Philosophy: An outline of lectures on introduction to philosophy", 1919)

"Abstraction is the detection of a common quality in the characteristics of a number of diverse observations […] A hypothesis serves the same purpose, but in a different way. It relates apparently diverse experiences, not by directly detecting a common quality in the experiences themselves, but by inventing a fictitious substance or process or idea, in terms of which the experience can be expressed. A hypothesis, in brief, correlates observations by adding something to them, while abstraction achieves the same end by subtracting something." (Herbert Dingle, Science and Human Experience, 1931)

"It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." (Albert Einstein, [lecture] 1933)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"Science does not aim, primarily, at high probabilities. It aims at a high informative content, well backed by experience. But a hypothesis may be very probable simply because it tells us nothing, or very little." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"Science is a system of statements based on direct experience, and controlled by experimental verification. Verification in science is not, however, of single statements but of the entire system or a sub-system of such statements." (Rudolf Carnap, "The Unity of Science", 1934)

"Science is the attempt to make the chaotic diversity of our sense experience correspond to a logically uniform system of thought." (Albert Einstein, "Considerations Concerning the Fundaments of Theoretical Physics", Science Vol. 91 (2369), 1940)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"Every bit of knowledge we gain and every conclusion we draw about the universe or about any part or feature of it depends finally upon some observation or measurement. Mankind has had again and again the humiliating experience of trusting to intuitive, apparently logical conclusions without observations, and has seen Nature sail by in her radiant chariot of gold in an entirely different direction." (Oliver J Lee, "Measuring Our Universe: From the Inner Atom to Outer Space", 1950)

"Statistics is the name for that science and art which deals with uncertain inferences - which uses numbers to find out something about nature and experience." (Warren Weaver, 1952)

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"Mathematical statistics provides an exceptionally clear example of the relationship between mathematics and the external world. The external world provides the experimentally measured distribution curve; mathematics provides the equation (the mathematical model) that corresponds to the empirical curve. The statistician may be guided by a thought experiment in finding the corresponding equation." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Experience without theory teaches nothing." (William E Deming, "Out of the Crisis", 1986)

"A discovery in science, or a new theory, even where it appears most unitary and most all-embracing, deals with some immediate element of novelty or paradox within the framework of far vaster, unanalyzed, unarticulated reserves of knowledge, experience, faith, and presupposition. Our progress is narrow: it takes a vast world unchallenged and for granted." (James R Oppenheimer, "Atom and Void", 1989)

"It is ironic but true: the one reality science cannot reduce is the only reality we will ever know. This is why we need art. By expressing our actual experience, the artist reminds us that our science is incomplete, that no map of matter will ever explain the immateriality of our consciousness." (Jonah Lehrer, "Proust Was a Neuroscientist", 2011)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

"Ideally, a decision maker or a forecaster will combine the outside view and the inside view - or, similarly, statistics plus personal experience. But it’s much better to start with the statistical view, the outside view, and then modify it in the light of personal experience than it is to go the other way around. If you start with the inside view you have no real frame of reference, no sense of scale - and can easily come up with a probability that is ten times too large, or ten times too small." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Statistical metrics can show us facts and trends that would be impossible to see in any other way, but often they’re used as a substitute for relevant experience, by managers or politicians without specific expertise or a close-up view." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The contradiction between what we see with our own eyes and what the statistics claim can be very real. […] The truth is more complicated. Our personal experiences should not be dismissed along with our feelings, at least not without further thought. Sometimes the statistics give us a vastly better way to understand the world; sometimes they mislead us. We need to be wise enough to figure out when the statistics are in conflict with everyday experience - and in those cases, which to believe." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

17 December 2018

🔭Data Science: Bias (Just the Quotes)

"The human mind can hardly remain entirely free from bias, and decisive opinions are often formed before a thorough examination of a subject from all its aspects has been made." (Helena P. Blavatsky, "The Secret Doctrine", 1888)

"The classification of facts, the recognition of their sequence and relative significance is the function of science, and the habit of forming a judgment upon these facts unbiased by personal feeling is characteristic of what may be termed the scientific frame of mind." (Karl Pearson, "The Grammar of Science", 1892)

"It may be impossible for human intelligence to comprehend absolute truth, but it is possible to observe Nature with an unbiased mind and to bear truthful testimony of things seen." (Sir Richard A Gregory, "Discovery, Or, The Spirit and Service of Science", 1916)

"Scientific discovery, or the formulation of scientific theory, starts in with the unvarnished and unembroidered evidence of the senses. It starts with simple observation - simple, unbiased, unprejudiced, naive, or innocent observation - and out of this sensory evidence, embodied in the form of simple propositions or declarations of fact, generalizations will grow up and take shape, almost as if some process of crystallization or condensation were taking place. Out of a disorderly array of facts, an orderly theory, an orderly general statement, will somehow emerge." (Sir Peter B Medawar, "Is the Scientific Paper Fraudulent?", The Saturday Review, 1964)

"Errors may also creep into the information transfer stage when the originator of the data is unconsciously looking for a particular result. Such situations may occur in interviews or questionnaires designed to gather original data. Improper wording of the question, or improper voice inflections. and other constructional errors may elicit nonobjective responses. Obviously, if the data is incorrectly gathered, any graph based on that data will contain the original error - even though the graph be most expertly designed and beautifully presented." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"Numbers have undoubted powers to beguile and benumb, but critics must probe behind numbers to the character of arguments and the biases that motivate them." (Stephen J Gould, "An Urchin in the Storm: Essays About Books and Ideas", 1987)

"But our ways of learning about the world are strongly influenced by the social preconceptions and biased modes of thinking that each scientist must apply to any problem. The stereotype of a fully rational and objective ‘scientific method’, with individual scientists as logical (and interchangeable) robots, is self-serving mythology." (Stephen J Gould, "This View of Life: In the Mind of the Beholder", "Natural History", Vol. 103, No. 2, 1994)

"Under conditions of uncertainty, both rationality and measurement are essential to decision-making. Rational people process information objectively: whatever errors they make in forecasting the future are random errors rather than the result of a stubborn bias toward either optimism or pessimism. They respond to new information on the basis of a clearly defined set of preferences. They know what they want, and they use the information in ways that support their preferences." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this called underfitting. Too many covariates yields high variance; this called overfitting. Good predictions result from achieving a good balance between bias and variance. […] fiding a good model involves trading of fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Self-selection bias occurs when people choose to be in the data - for example, when people choose to go to college, marry, or have children. […] Self-selection bias is pervasive in 'observational data', where we collect data by observing what people do. Because these people chose to do what they are doing, their choices may reflect who they are. This self-selection bias could be avoided with a controlled experiment in which people are randomly assigned to groups and told what to do." (Gary Smith, "Standard Deviations", 2014)

"Self-selection bias occurs when we compare people who made different choices without thinking about why they made these choices. […] Our conclusions would be more convincing if choice was removed […]" (Gary Smith, "Standard Deviations", 2014)

"We naturally draw conclusions from what we see […]. We should also think about what we do not see […]. The unseen data may be just as important, or even more important, than the seen data. To avoid survivor bias, start in the past and look forward." (Gary Smith, "Standard Deviations", 2014)

"We live in a world with a surfeit of information at our service. It is our choice whether we seek out data that reinforce our biases or choose to look at the world in a critical, rational manner, and allow reality to bend our preconceptions. In the long run, the truth will work better for us than our cherished fictions." (Razib Khan, "The Abortion Stereotype", The New York Times, 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Bias occurs normally when the model is underfitted and has failed to learn enough from the training data. It is the difference between the mean of the probability distribution and the actual correct value. Hence, the accuracy of the model is different for different data sets (test and training sets). To reduce the bias error, data scientists repeat the model-building process by resampling the data to obtain better prediction values." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"High-bias models typically produce simpler models that do not overfit and in those cases the danger is that of underfitting. Models with low-bias are typically more complex and that complexity enables us to represent the training data in a more accurate way. The danger here is that the flexibility provided by higher complexity may end up representing not only a relationship in the data but also the noise. Another way of portraying the bias-variance trade-off is in terms of complexity v simplicity." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"If either bias or variance is high, the model can be very far off from reality. In general, there is a trade-off between bias and variance. The goal of any machine-learning algorithm is to achieve low bias and low variance such that it gives good prediction performance. In reality, because of so many other hidden parameters in the model, it is hard to calculate the real bias and variance error. Nevertheless, the bias and variance provide a measure to understand the behavior of the machine-learning algorithm so that the model model can be adjusted to provide good prediction performance." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"The human brain always concocts biases to aid in the construction of a coherent mental life, exclusively suitable for an individual’s personal needs." (Abhijit Naskar, "We Are All Black: A Treatise on Racism", 2017)

"The tension between bias and variance, simplicity and complexity, or underfitting and overfitting is an area in the data science and analytics process that can be closer to a craft than a fixed rule. The main challenge is that not only is each dataset different, but also there are data points that we have not yet seen at the moment of constructing the model. Instead, we are interested in building a strategy that enables us to tell something about data from the sample used in building the model." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"When we have all the data, it is straightforward to produce statistics that describe what has been measured. But when we want to use the data to draw broader conclusions about what is going on around us, then the quality of the data becomes paramount, and we need to be alert to the kind of systematic biases that can jeopardize the reliability of any claims." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"We over-fit when we go too far in adapting to local circumstances, in a worthy but misguided effort to be ‘unbiased’ and take into account all the available information. Usually we would applaud the aim of being unbiased, but this refinement means we have less data to work on, and so the reliability goes down. Over-fitting therefore leads to less bias but at a cost of more uncertainty or variation in the estimates, which is why protection against over-fitting is sometimes known as the bias/variance trade-off." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Any machine learning model is trained based on certain assumptions. In general, these assumptions are the simplistic approximations of some real-world phenomena. These assumptions simplify the actual relationships between features and their characteristics and make a model easier to train. More assumptions means more bias. So, while training a model, more simplistic assumptions = high bias, and realistic assumptions that are more representative of actual phenomena = low bias." (Imran Ahmad, "40 Algorithms Every Programmer Should Know", 2020)

"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"If you study one group and assume that your results apply to other groups, this is extrapolation. If you think you are studying one group, but do not manage to obtain a representative sample of that group, this is a different problem. It is a problem so important in statistics that it has a special name: selection bias. Selection bias arises when the individuals that you sample for your study differ systematically from the population of individuals eligible for your study." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Machine learning bias is typically understood as a source of learning error, a technical problem. […] Machine learning bias can introduce error simply because the system doesn’t 'look' for certain solutions in the first place. But bias is actually necessary in machine learning - it’s part of learning itself." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"To accomplish their goals, what are now called machine learning systems must each learn something specific. Researchers call this giving the machine a 'bias'. […] A bias in machine learning means that the system is designed and tuned to learn something. But this is, of course, just the problem of producing narrow problem-solving applications." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Any time you run regression analysis on arbitrary real-world observational data, there’s a significant risk that there’s hidden confounding in your dataset and so causal conclusions from such analysis are likely to be (causally) biased." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"Science is the search for truth, that is the effort to understand the world: it involves the rejection of bias, of dogma, of revelation, but not the rejection of morality." (Linus Pauling)

"Facts and values are entangled in science. It's not because scientists are biased, not because they are partial or influenced by other kinds of interests, but because of a commitment to reason, consistency, coherence, plausibility and replicability. These are value commitments." (Alva Noë)

"A scientist has to be neutral in his search for the truth, but he cannot be neutral as to the use of that truth when found. If you know more than other people, you have more responsibility, rather than less." (Charles P Snow)

More quotes on "Bias" at the-web-of-knowledge.blogspot.com.

01 December 2018

🔭Data Science: The Science in Data Science (Just the Quotes)

"The aim of every science is foresight. For the laws of established observation of phenomena are generally employed to foresee their succession. All men, however little advanced make true predictions, which are always based on the same principle, the knowledge of the future from the past." (Auguste Compte, "Plan des travaux scientifiques nécessaires pour réorganiser la société", 1822)

"Science is nothing but the finding of analogy, identity, in the most remote parts." (Ralph W Emerson, 1837)

"Therefore science always goes abreast with the just elevation of the man, keeping step with religion and metaphysics; or, the state of science is an index of our self-knowledge." (Ralph W Emerson, "The Poet", 1844)

"It may sound quite strange, but for me, as for other scientists on whom these kinds of imaginative images have a greater effect than other poems do, no science is at its very heart more closely related to poetry, perhaps, than is chemistry." (Just Liebig, 1854)

"Science is the systematic classification of experience." (George H Lewes, "The Physical Basis of Mind", 1877)

"Science is the observation of things possible, whether present or past; prescience is the knowledge of things which may come to pass, though but slowly." (Leonardo da Vinci, "The Notebooks of Leonardo da Vinci", 1883)

"While science is pursuing a steady onward movement, it is convenient from time to time to cast a glance back on the route already traversed, and especially to consider the new conceptions which aim at discovering the general meaning of the stock of facts accumulated from day to day in our laboratories." (Dmitry Mendeleyev, "The Periodic Law of the Chemical Elements", Journal of the Chemical Society Vol. 55, 1889)

"The aim of science is always to reduce complexity to simplicity." (William James, "The Principles of Psychology", 1890)

"Science is not the monopoly of the naturalist or the scholar, nor is it anything mysterious or esoteric. Science is the search for truth, and truth is the adequacy of a description of facts." (Paul Carus, "Philosophy as a Science", 1909)

"Science is reduction. Mathematics is its ideal, its form par excellence, for it is in mathematics that assimilation, identification, is most perfectly realized. The universe, scientifically explained, would be a certain formula, one and eternal, regarded as the equivalent of the entire diversity and movement of things." (Émile Boutroux, "Natural law in Science and Philosophy", 1914)

"Abstract as it is, science is but an outgrowth of life. That is what the teacher must continually keep in mind. […] Let him explain […] science is not a dead system - the excretion of a monstrous pedantism - but really one of the most vigorous and exuberant phases of human life." (George A L Sarton, "The Teaching of the History of Science", The Scientific Monthly, 1918)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"Science is simply setting out on a fishing expedition to see whether it cannot find some procedure which it can call measurement of space and some procedure which it can call the measurement of time, and something which it can call a system of forces, and something which it can call masses." (Alfred N Whitehead, "The Concept of Nature", 1920)

"Science is a magnificent force, but it is not a teacher of morals. It can perfect machinery, but it adds no moral restraints to protect society from the misuse of the machine. It can also build gigantic intellectual ships, but it constructs no moral rudders for the control of storm tossed human vessel. It not only fails to supply the spiritual element needed but some of its unproven hypotheses rob the ship of its compass and thus endangers its cargo." (William J Bryan, "Undelivered Trial Summation Scopes Trial", 1925)

"Science is but a method. Whatever its material, an observation accurately made and free of compromise to bias and desire, and undeterred by consequence, is science." (Hans Zinsser, "Untheological Reflections", The Atlantic Monthly, 1929)

"Although this may seem a paradox, all exact science is dominated by the idea of approximation. When a man tells you that he knows the exact truth about anything, you are safe in inferring that he is an inexact man." (Bertrand Russell, "The Scientific Outlook", 1931)

"The common view of science is that it is a sort of machine for increasing the race’s store of dependable facts. It is that only in part; in even larger part it is a machine for upsetting undependable facts." (Will Durant, 1931)

"One has to recognize that science is not metaphysics, and certainly not mysticism; it can never bring us the illumination and the satisfaction experienced by one enraptured in ecstasy. Science is sobriety and clarity of conception, not intoxicated vision."(Ludwig Von Mises, "Epistemological Problems of Economics", 1933)

"Modern positivists are apt to see more clearly that science is not a system of concepts but rather a system of statements." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"Science is the attempt to discover, by means of observation, and reasoning based upon it, first, particular facts about the world, and then laws connecting facts with one another and (in fortunate cases) making it possible to predict future occurrences." (Bertrand Russell, "Religion and Science, Grounds of Conflict", 1935)

"[…] that all science is merely a game can be easily discarded as a piece of wisdom too easily come by. But it is legitimate to enquire whether science is not liable to indulge in play within the closed precincts of its own method. Thus, for instance, the scientist’s continuous penchant for systems tends in the direction of play." (Johan Huizinga, "Homo Ludens", 1938)

"Science makes no pretension to eternal truth or absolute truth; some of its rivals do. That science is in some respects inhuman may be the secret of its success in alleviating human misery and mitigating human stupidity." (Eric T Bell, "Mathematics: Queen and Servant of Science", 1938)

"Science is the organised attempt of mankind to discover how things work as causal systems. The scientific attitude of mind is an interest in such questions. It can be contrasted with other attitudes, which have different interests; for instance the magical, which attempts to make things work not as material systems but as immaterial forces which can be controlled by spells; or the religious, which is interested in the world as revealing the nature of God." (Conrad H Waddington, "The Scientific Attitude", 1941)

"Science, in the broadest sense, is the entire body of the most accurately tested, critically established, systematized knowledge available about that part of the universe which has come under human observation. For the most part this knowledge concerns the forces impinging upon human beings in the serious business of living and thus affecting man’s adjustment to and of the physical and the social world. […] Pure science is more interested in understanding, and applied science is more interested in control […]" (Austin L Porterfield, "Creative Factors in Scientific Research", 1941)

"Science is an interconnected series of concepts and schemes that have developed as a result of experimentation and observation and are fruitful of further experimentation and observation."(James B Conant, "Science and Common Sense", 1951)

"[…] theoretical science is essentially disciplined exploitation of metaphor." (Anatol Rapoport, "Operational Philosophy", 1953)

"Prediction is all very well; but we must make sense of what we predict. The mainspring of science is the conviction that by honest, imaginative enquiry we can build up a system of ideas about Nature which has some legitimate claim to ‘reality’." (Stephen Toulmin, "The Philosophy of Science: An Introduction", 1953)

"An engineering science aims to organize the design principles used in engineering practice into a discipline and thus to exhibit the similarities between different areas of engineering practice and to emphasize the power of fundamental concepts. In short, an engineering science is predominated by theoretical analysis and very often uses the tool of advanced mathematics." (Qian Xuesen, "Engineering cybernetics", 1954))

"The true aim of science is to discover a simple theory which is necessary and sufficient to cover the facts, when they have been purified of traditional prejudices." (Lancelot L Whyte, "Accent on Form", 1954)

"Science is the creation of concepts and their exploration in the facts. It has no other test of the concept than its empirical truth to fact." (Jacob Bronowski, "Science and Human Values", 1956)

"The progress of science is the discovery at each step of a new order which gives unity to what had seemed unlike." (Jacob Bronowski, "Science and Human Values", 1956)

"[…] any serious examination of the basic concepts of any science is far more difficult than the elaboration of their ultimate consequences." (George F J Temple, "Turning Points in Physics", 1959)

"Science is usually understood to depict a universe of strict order and lawfulness, of rigorous economy - one whose currency is energy, convertible against a service charge into a growing common pool called entropy." (Paul A Weiss,"Organic Form: Scientific and Aesthetic Aspects", 1960)

"[…] the progress of science is a little like making a jig-saw puzzle. One makes collections of pieces which certainly fit together, though at first it is not clear where each group should come in the picture as a whole, and if at first one makes a mistake in placing it, this can be corrected later without dismantling the whole group." (Sir George Thomson, "The Inspiration of Science", 1961)

"Science is the reduction of the bewildering diversity of unique events to manageable uniformity within one of a number of symbol systems, and technology is the art of using these symbol systems so as to control and organize unique events. Scientific observation is always a viewing of things through the refracting medium of a symbol system, and technological praxis is always handling of things in ways that some symbol system has dictated. Education in science and technology is essentially education on the symbol level." (Aldous L Huxley, "Essay", Daedalus, 1962)

"The important distinction between science and those other systematizations [i.e., art, philosophy, and theology] is that science is self-testing and self-correcting. Here the essential point of science is respect for objective fact. What is correctly observed must be believed [...] the competent scientist does quite the opposite of the popular stereotype of setting out to prove a theory; he seeks to disprove it." (George G Simpson, "Notes on the Nature of Science", 1962)

"What, then, is science according to common opinion? Science is what scientists do. Science is knowledge, a body of information about the external world. Science is the ability to predict. Science is power, it is engineering. Science explains, or gives causes and reasons." (John Bremer "What Is Science?" [in "Notes on the Nature of Science"], 1962)

"Science is a matter of disinterested observation, patient ratiocination within some system of logically correlated concepts. In real-life conflicts between reason and passion the issue is uncertain. Passion and prejudice are always able to mobilize their forces more rapidly and press the attack with greater fury; but in the long run (and often, of course, too late) enlightened self-interest may rouse itself, launch a counterattack and win the day for reason." (Aldous L Huxley, "Literature and Science", 1963)

"Science is a way to teach how something gets to be known, what is not known, to what extent things are known (for nothing is known absolutely), how to handle doubt and uncertainty, what the rules of evidence are, how to think about things so that judgments can be made, how to distinguish truth from fraud, and from show." (Richard P Feynman, "The Problem of Teaching Physics in Latin America", Engineering and Science, 1963)

"The aim of science is to apprehend this purely intelligible world as a thing in itself, an object which is what it is independently of all thinking, and thus antithetical to the sensible world. [...] The world of thought is the universal, the timeless and spaceless, the absolutely necessary, whereas the world of sense is the contingent, the changing and moving appearance which somehow indicates or symbolizes it." (Robin G Collingwood, "Essays in the Philosophy of Art", 1964)

"The central task of a natural science is to make the wonderful commonplace: to show that complexity, correctly viewed, is only a mask for simplicity; to find pattern hidden in apparent chaos." (Herbert A Simon, "The Sciences of the Artificial", 1969)

"Science is a product of man, of his mind; and science creates the real world in its own image." (Frank E Egler, "The Way of Science", 1970)

"To do science is to search for repeated patterns, not simply to accumulate facts [...]" (Robert H. MacArthur, "Geographical Ecology", 1972)

"Science is systematic organisation of knowledge about the universe on the basis of explanatory hypotheses which are genuinely testable. Science advances by developing gradually more comprehensive theories; that is, by formulating theories of greater generality which can account for observational statements and hypotheses which appear as prima facie unrelated." (Francisco J Ayala, "Studies in the Philosophy of Biology: Reduction and Related Problems", 1974)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"The very nature of science is such that scientists need the metaphor as a bridge between old and new theories." (Earl R MacCormac, "Metaphor and Myth in Science and Religion", 1976)

"Facts do not ‘speak for themselves’; they are read in the light of theory. Creative thought, in science as much as in the arts, is the motor of changing opinion. Science is a quintessentially human activity, not a mechanized, robot-like accumulation of objective information, leading by laws of logic to inescapable interpretation." (Stephen J Gould, "Ever Since Darwin", 1977)

"Science is not a heartless pursuit of objective information. It is a creative human activity, its geniuses acting more as artists than information processors. Changes in theory are not simply the derivative results of the new discoveries but the work of creative imagination influenced by contemporary social and political forces." (Stephen J Gould, "Ever Since Darwin: Reflections in Natural History", 1977)

"Engineering or Technology is the making of things that did not previously exist, whereas science is the discovering of things that have long existed." (David Billington, "The Tower and the Bridge: The New Art of Structural Engineering", 1983)

"Science is a process. It is a way of thinking, a manner of approaching and of possibly resolving problems, a route by which one can produce order and sense out of disorganized and chaotic observations. Through it we achieve useful conclusions and results that are compelling and upon which there is a tendency to agree." (Isaac Asimov, "‘X’ Stands for Unknown", 1984)

"If doing mathematics or science is looked upon as a game, then one might say that in mathematics you compete against yourself or other mathematicians; in physics your adversary is nature and the stakes are higher." (Mark Kac, "Enigmas Of Chance", 1985)

"Science is defined as a set of observations and theories about observations." (F Albert Matsen, "The Role of Theory in Chemistry", Journal of Chemical Education Vol. 62 (5), 1985)

"We expect to learn new tricks because one of our science based abilities is being able to predict. That after all is what science is about. Learning enough about how a thing works so you'll know what comes next. Because as we all know everything obeys the universal laws, all you need is to understand the laws." (James Burke, "The Day the Universe Changed", 1985)

"Science is human experience systematically extended (by intent, methodology and instrumentation) for the purpose of learning more about the natural world and for the critical empirical testing and possible falsification of all ideas about the natural world. Scientific hypotheses may incorporate only elements of the natural empirical world, and thus may contain no element of the supernatural." (Robert E Kofahl, Correctly Redefining Distorted Science: A Most Essential Task", Creation Research Society Quarterly Vol. 23, 1986)

"Science is not a given set of answers but a system for obtaining answers. The method by which the search is conducted is more important than the nature of the solution. Questions need not be answered at all, or answers may be provided and then changed. It does not matter how often or how profoundly our view of the universe alters, as long as these changes take place in a way appropriate to science. For the practice of science, like the game of baseball, is covered by definite rules." (Robert Shapiro, "Origins: A Skeptic’s Guide to the Creation of Life on Earth", 1986)

"Science doesn't purvey absolute truth. Science is a mechanism. It's a way of trying to improve your knowledge of nature. It's a system for testing your thoughts against the universe and seeing whether they match. And this works, not just for the ordinary aspects of science, but for all of life. I should think people would want to know that what they know is truly what the universe is like, or at least as close as they can get to it." (Isaac Asimov, [Interview by Bill Moyers] 1988)

"Science doesn’t purvey absolute truth. Science is a mechanism, a way of trying to improve your knowledge of nature. It’s a system for testing your thoughts against the universe, and seeing whether they match." (Isaac Asimov, [interview with Bill Moyers in The Humanist] 1989)

"The view of science is that all processes ultimately run down, but entropy is maximized only in some far, far away future. The idea of entropy makes an assumption that the laws of the space-time continuum are infinitely and linearly extendable into the future. In the spiral time scheme of the timewave this assumption is not made. Rather, final time means passing out of one set of laws that are conditioning existence and into another radically different set of laws. The universe is seen as a series of compartmentalized eras or epochs whose laws are quite different from one another, with transitions from one epoch to another occurring with unexpected suddenness." (Terence McKenna, "True Hallucinations", 1989)

"Science is (or should be) a precise art. Precise, because data may be taken or theories formulated with a certain amount of accuracy; an art, because putting the information into the most useful form for investigation or for presentation requires a certain amount of creativity and insight." (Patricia H Reiff, "The Use and Misuse of Statistics in Space Physics", Journal of Geomagnetism and Geoelectricity 42, 1990)

"In science if you know what you are doing you should not be doing it. In engineering if you do not know what you are doing you should not be doing it. Of course, you seldom, if ever, see either pure state." (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"On this view, we recognize science to be the search for algorithmic compressions. We list sequences of observed data. We try to formulate algorithms that compactly represent the information content of those sequences. Then we test the correctness of our hypothetical abbreviations by using them to predict the next terms in the string. These predictions can then be compared with the future direction of the data sequence. Without the development of algorithmic compressions of data all science would be replaced by mindless stamp collecting - the indiscriminate accumulation of every available fact. Science is predicated upon the belief that the Universe is algorithmically compressible and the modern search for a Theory of Everything is the ultimate expression of that belief, a belief that there is an abbreviated representation of the logic behind the Universe's properties that can be written down in finite form by human beings." (John D Barrow, "New Theories of Everything", 1991)

"The goal of science is to make sense of the diversity of Nature." (John D Barrow, "Theories of Everything: The Quest for Ultimate Explanation", 1991)

"Science is not about control. It is about cultivating a perpetual condition of wonder in the face of something that forever grows one step richer and subtler than our latest theory about it. It is about reverence, not mastery." (Richard Power, "Gold Bug Variations", 1993)

"Statistics as a science is to quantify uncertainty, not unknown." (Chamont Wang, "Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety", 1993)

"Clearly, science is not simply a matter of observing facts. Every scientific theory also expresses a worldview. Philosophical preconceptions determine where facts are sought, how experiments are designed, and which conclusions are drawn from them." (Nancy R Pearcey & Charles B. Thaxton, "The Soul of Science: Christian Faith and Natural Philosophy", 1994)

"Science is distinguished not for asserting that nature is rational, but for constantly testing claims to that or any other affect by observation and experiment." (Timothy Ferris, "The Whole Shebang: A State-of-the Universe’s Report", 1996)

"Science is more than a mere attempt to describe nature as accurately as possible. Frequently the real message is well hidden, and a law that gives a poor approximation to nature has more significance than one which works fairly well but is poisoned at the root." (Robert H March, "Physics for Poets", 1996)

"The art of science is knowing which observations to ignore and which are the key to the puzzle." (Edward W Kolb, "Blind Watchers of the Sky", 1996)

"Mathematics is the study of analogies between analogies. All science is. Scientists want to show that things that don’t look alike are really the same. That is one of their innermost Freudian motivations. In fact, that is what we mean by understanding." (Gian-Carlo Rota, "Indiscrete Thoughts", 1997)

"Religion is the antithesis of science; science is competent to illuminate all the deep questions of existence, and does so in a manner that makes full use of, and respects the human intellect. I see neither need nor sign of any future reconciliation." (Peter W Atkins, "Religion - The Antithesis to Science", 1997)

"[…] the pursuit of science is more than the pursuit of understanding. It is driven by the creative urge, the urge to construct a vision, a map, a picture of the world that gives the world a little more beauty and coherence than it had before." (John A Wheeler, "Geons, Black Holes, and Quantum Foam: A Life in Physics", 1998)

"The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test." (Richard Feynman, "The Meaning of It All", 1998)

"The passion and beauty and joy of science is that we humans have invented a process to understand the universe in a way that is true for everyone. We are finding universal truths." (Bill Nye, 2000)

"The poetry of science is in some sense embodied in its great equations, and these equations can also be peeled. But their layers represent their attributes and consequences, not their meanings." (Graham Farmelo, 2002)

"Science is the art of the appropriate approximation. While the flat earth model is usually spoken of with derision it is still widely used. Flat maps, either in atlases or road maps, use the flat earth model as an approximation to the more complicated shape." (Byron K. Jennings, "On the Nature of Science", Physics in Canada Vol. 63 (1), 2007)

"Science isn’t about being right. It is about convincing others of the correctness of an idea through a methodology all will accept using data everyone can trust. New ideas take time to be accepted because they compete with others that have already passed the test." (Tom Koch, "Commentary: Nobody loves a critic: Edmund A Parkes and John Snow’s cholera", International Journal of Epidemiology Vol. 42 (6), 2013)

More quotes on "Science" at quotablemath.blogspot.com.

28 November 2018

🔭Data Science: Classification (Just the Quotes)

"Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts." (Horace Secrist, "An Introduction to Statistical Methods", 1917)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"A classification is a scheme for breaking a category into a set of parts, called classes, according to some precisely defined differing characteristics possessed by all the elements of the category." (Alva M Tuttle, "Elementary Business and Economic Statistics", 1957)

"It might be reasonable to expect that the more we know about any set of statistics, the greater the confidence we would have in using them, since we would know in which directions they were defective; and that the less we know about a set of figures, the more timid and hesitant we would be in using them. But, in fact, it is the exact opposite which is normally the case; in this field, as in many others, knowledge leads to caution and hesitation, it is ignorance that gives confidence and boldness. For knowledge about any set of statistics reveals the possibility of error at every stage of the statistical process; the difficulty of getting complete coverage in the returns, the difficulty of framing answers precisely and unequivocally, doubts about the reliability of the answers, arbitrary decisions about classification, the roughness of some of the estimates that are made before publishing the final results. Knowledge of all this, and much else, in detail, about any set of figures makes one hesitant and cautious, perhaps even timid, in using them." (Ely Devons, "Essays in Economics", 1961)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996)

"While classification is important, it can certainly be overdone. Making too fine a distinction between things can be as serious a problem as not being able to decide at all. Because we have limited storage capacity in our brain (we still haven't figured out how to add an extender card), it is important for us to be able to cluster similar items or things together. Not only is clustering useful from an efficiency standpoint, but the ability to group like things together (called chunking by artificial intelligence practitioners) is a very important reasoning tool. It is through clustering that we can think in terms of higher abstractions, solving broader problems by getting above all of the nitty-gritty details." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"We build models to increase productivity, under the justified assumption that it's cheaper to manipulate the model than the real thing. Models then enable cheaper exploration and reasoning about some universe of discourse. One important application of models is to understand a real, abstract, or hypothetical problem domain that a computer system will reflect. This is done by abstraction, classification, and generalization of subject-matter entities into an appropriate set of classes and their behavior." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"Compared to traditional statistical studies, which are often hindsight, the field of data mining finds patterns and classifications that look toward and even predict the future. In summary, data mining can (1) provide a more complete understanding of data by finding patterns previously not seen and (2) make models that predict, thus enabling people to make better decisions, take action, and therefore mold future events." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"The well-known 'No Free Lunch' theorem indicates that there does not exist a pattern classification method that is inherently superior to any other, or even to random guessing without using additional information. It is the type of problem, prior information, and the amount of training samples that determine the form of classifier to apply. In fact, corresponding to different real-world problems, different classes may have different underlying data structures. A classifier should adjust the discriminant boundaries to fit the structures which are vital for classification, especially for the generalization capacity of the classifier." (Hui Xue et al, "SVM: Support Vector Machines", 2009)

"A problem in data mining when random variations in data are misclassified as important patterns. Overfitting often occurs when the data set is too small to represent the real world." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Choosing an appropriate classification algorithm for a particular problem task requires practice: each algorithm has its own quirks and is based on certain assumptions. To restate the 'No Free Lunch' theorem: no single classifier works best across all possible scenarios. In practice, it is always recommended that you compare the performance of at least a handful of different learning algorithms to select the best model for the particular problem; these may differ in the number of features or samples, the amount of noise in a dataset, and whether the classes are linearly separable or not." (Sebastian Raschka, "Python Machine Learning", 2015)

"The no free lunch theorem for machine learning states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points. In other words, in some sense, no machine learning algorithm is universally any better than any other. The most sophisticated algorithm we can conceive of has the same average performance (over all possible tasks) as merely predicting that every point belongs to the same class. [...] the goal of machine learning research is not to seek a universal learning algorithm or the absolute best learning algorithm. Instead, our goal is to understand what kinds of distributions are relevant to the 'real world' that an AI agent experiences, and what kinds of machine learning algorithms perform well on data drawn from the kinds of data generating distributions we care about." (Ian Goodfellow et al, "Deep Learning", 2015)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps4. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Decision trees are important for a few reasons. First, they can both classify and regress. It requires literally one line of code to switch between the two models just described, from a classification to a regression. Second, they are able to determine and share the feature importance of a given training set." (Russell Jurney, "Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark", 2017)

"Multilayer perceptrons share with polynomial classifiers one unpleasant property. Theoretically speaking, they are capable of modeling any decision surface, and this makes them prone to overfitting the training data." (Miroslav Kubat," An Introduction to Machine Learning" 2nd Ed., 2017)

"The main reason why pruning tends to improve classification performance on future examples is that the removal of low-level tests, which have poor statistical support, usually reduces the danger of overfitting. This, however, works only up to a certain point. If overdone, a very high extent of pruning can (in the extreme) result in the decision being replaced with a single leaf labeled with the majority class." (Miroslav Kubat," An Introduction to Machine Learning" 2nd Ed., 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The no free lunch theorems set limits on the range of optimality of any method. That is, each methodology has a ‘catchment area’ where it is optimal or nearly so. Often, intuitively, if the optimality is particularly strong then the effectiveness of the methodology falls off more quickly outside its catchment area than if its optimality were not so strong. Boosting is a case in point: it seems so well suited to binary classification that efforts to date to extend it to give effective classification (or regression) more generally have not been very successful. Overall, it remains to characterize the catchment areas where each class of predictors performs optimally, performs generally well, or breaks down." (Bertrand S Clarke & Jennifer L. Clarke, "Predictive Statistics: Analysis and Inference beyond Models", 2018)

"The premise of classification is simple: given a categorical target variable, learn patterns that exist between instances composed of independent variables and their relationship to the target. Because the target is given ahead of time, classification is said to be supervised machine learning because a model can be trained to minimize error between predicted and actual categories in the training data. Once a classification model is fit, it assigns categorical labels to new instances based on the patterns detected during training." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"A classification tree is perhaps the simplest form of algorithm, since it consists of a series of yes/no questions, the answer to each deciding the next question to be asked, until a conclusion is reached." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"An advantage of random forests is that it works with both regression and classification trees so it can be used with targets whose role is binary, nominal, or interval. They are also less prone to overfitting than a single decision tree model. A disadvantage of a random forest is that they generally require more trees to improve their accuracy. This can result in increased run times, particularly when using very large data sets." (Richard V McCarthy et al, "Applying Predictive Analytics: Finding Value in Data", 2019)

"The classifier accuracy would be extra ordinary when the test data and the training data are overlapping. But when the model is applied to a new data it will fail to show acceptable accuracy. This condition is called as overfitting." (Jesu V Nayahi J & Gokulakrishnan K, "Medical Image Classification", 2019)

More quotes on "Classification" at the-web-of-knowledge.blogspot.com.

18 May 2018

🔬Data Science: Boltzmann Machine (Definitions)

[Boltzmann machine (with learning):] "A net that adjusts its weights so that the equilibrium configuration of the net will solve a given problem, such as an encoder problem" (David H Ackley et al, "A learning algorithm for boltzmann machines", Cognitive Science Vol. 9 (1), 1985)

[Boltzmann machine (without learning):] "A class of neural networks used for solving constrained optimization problems. In a typical Boltzmann machine, the weights are fixed to represent the constraints of the problem and the function to be optimized. The net seeks the solution by changing the activations (either 1 or 0) of the units based on a probability distribution and the effect that the change would have on the energy function or consensus function for the net." (David H Ackley et al, "A learning algorithm for boltzmann machines", Cognitive Science Vol. 9 (1), 1985)

"neural-network model otherwise similar to a Hopfield network but having symmetric interconnects and stochastic processing elements. The input-output relation is optimized by adjusting the bistable values of its internal state variables one at a time, relating to a thermodynamically inspired rule, to reach a global optimum." (Teuvo Kohonen, "Self-Organizing Maps 3rd" Ed., 2001)

"A neural network model consisting of interacting binary units in which the probability of a unit being in the active state depends on its integrated synaptic inputs." (Terrence J Sejnowski, "The Deep Learning Revolution", 2018)

"An unsupervised network that maximizes the product of probabilities assigned to the elements of the training set." (Mário P Véstias, "Deep Learning on Edge: Challenges and Trends", 2020)

"Restricted Boltzmann machine (RBM) is an undirected graphical model that falls under deep learning algorithms. It plays an important role in dimensionality reduction, classification and regression. RBM is the basic block of Deep-Belief Networks. It is a shallow, two-layer neural networks. The first layer of the RBM is called the visible or input layer while the second is the hidden layer. In RBM the interconnections between visible units and hidden units are established using symmetric weights." (S Abirami & P Chitra, "The Digital Twin Paradigm for Smarter Systems and Environments: The Industry Use Cases", Advances in Computers, 2020)

"A deep Boltzmann machine (DBM) is a type of binary pairwise Markov random field (undirected probabilistic graphical model) with multiple layers of hidden random variables." (Udit Singhania & B. K. Tripathy, "Text-Based Image Retrieval Using Deep Learning", 2021)

"A Boltzmann machine is a neural network of symmetrically connected nodes that make their own decisions whether to activate. Boltzmann machines use a straightforward stochastic learning algorithm to discover “interesting” features that represent complex patterns in the database." (DeepAI) [source]

"Boltzmann Machines is a type of neural network model that was inspired by the physical process of thermodynamics and statistical mechanics. [...] Full Boltzmann machines are impractical to train, which is one of the reasons why a limited form, called the restricted Boltzmann machine, is used." (Accenture)

"RBMs [Restricted Boltzmann Machines] are a type of probabilistic graphical model that can be interpreted as a stochastic artificial neural network. RBNs learn a representation of the data in an unsupervised manner. An RBN consists of visible and hidden layer, and connections between binary neurons in each of these layers. RBNs can be efficiently trained using Contrastive Divergence, an approximation of gradient descent." (Wild ML)

16 May 2018

🔬Data Science: Training Set/Dataset (Definitions)

"set of data used as inputs in an adaptive process that teaches a neural network." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

"A set of observations that are used in creating a prediction model." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"the training set is composed by all labelled examples that are provided for constructing a classifier. The test set is composed by the new unlabelled patterns whose classes should be predicted by the classifier." (Óscar Pérez & Manuel Sánchez-Montañés, "Class Prediction in Test Sets with Shifted Distributions", 2009)

"A collection of data whose purpose is to be analyzed to discover patterns that can then be applied to other data sets." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A training set for supervised learning is taken from the labeled instances. The remaining instances are used for validation." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"A set of known and predictable data used to train a data mining model." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In data mining, a sample of data used at each iteration of the training process to evaluate the model fit." (Meta S Brown, "Data Mining For Dummies", 2014)

"Training Data is the data used to train a machine learning algorithm. Generally, data in machine learning is divided into three datasets: training, validation and testing data. In general, the more accurate and comprehensive training data is, the better the algorithm or classifier will perform." (Accenture)

10 May 2018

🔬Data Science: Support Vector Machines [SVM] (Definitions)

"A supervised machine learning classification approach with the objective to find the hyperplane maximizing the minimum distance between the plane and the training data points." (Xiaoyan Yu et al, "Automatic Syllabus Classification Using Support Vector Machines", 2009)

"Support vector machines [SVM] is a methodology used for classification and regression. SVMs select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible." (Yorgos Goletsis et al, "Bankruptcy Prediction through Artificial Intelligence", 2009)

"SVM is a data mining method useful for classification problems. It uses training data and kernel functions to build a model that can appropriately predict the class of an unclassified observation." (Indranil Bose, "Data Mining in Tourism", 2009)

"A modeling technique that assigns points to classes based on the assignment of previous points, and then determines the gap dividing the classes where the gap is furthest from points in both classes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A machine-learning technique that classifies objects. The method starts with a training set consisting of two classes of objects as input. The SVA computes a hyperplane, in a multidimensional space, that separates objects of the two classes. The dimension of the hyperspace is determined by the number of dimensions or attributes associated with the objects. Additional objects (i.e., test set objects) are assigned membership in one class or the other, depending on which side of the hyperplane they reside." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"A machine learning algorithm that works with labeled training data and outputs results to an optimal hyperplane. A hyperplane is a subspace of the dimension minus one (that is, a line in a plane)." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"A classification algorithm that finds the hyperplane dividing the training data into given classes. This division by the hyperplane is then used to classify the data further." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"Machine learning techniques that are used to make predictions of continuous variables and classifications of categorical variables based on patterns and relationships in a set of training data for which the values of predictors and outcomes for all cases are known." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"It is a supervised machine learning tool utilized for data analysis, regression, and classification." (Shradha Verma, "Deep Learning-Based Mobile Application for Plant Disease Diagnosis", 2019)

"It is a supervised learning algorithm in ML used for problems in both classification and regression. This uses a technique called the kernel trick to transform the data and then determines an optimal limit between the possible outputs, based on those transformations." (Mehmet A Cifci, "Optimizing WSNs for CPS Using Machine Learning Techniques", 2021)

"Support Vector Machines (SVM) are supervised machine learning algorithms used for classification and regression analysis. Employed in classification analysis, support vector machines can carry out text categorization, image classification, and handwriting recognition." (Accenture)

🔬Data Science: Cross-validation (Definitions)

"A method for assessing the accuracy of a regression or classification model. A data set is divided up into a series of test and training sets, and a model is built with each of the training set and is tested with the separate test set." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A method for assessing the accuracy of a regression or classification model." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2007)

"A statistical method derived from cross-classification which main objective is to detect the outlying point in a population set." (Tomasz Ciszkowski & Zbigniew Kotulski, "Secure Routing with Reputation in MANET", 2008)

"Process by which an original dataset d is divided into a training set t and a validation set v. The training set is used to produce an effort estimation model (if applicable), later used to predict effort for each of the projects in v, as if these projects were new projects for which effort was unknown. Accuracy statistics are then obtained and aggregated to provide an overall measure of prediction accuracy." (Emilia Mendes & Silvia Abrahão, "Web Development Effort Estimation: An Empirical Analysis", 2008)

"A method of estimating predictive error of inducers. Cross-validation procedure splits that dataset into k equal-sized pieces called folds. k predictive function are built, each tested on a distinct fold after being trained on the remaining folds." (Gilles Lebrun et al, EA Multi-Model Selection for SVM, 2009)

"Method to estimate the accuracy of a classifier system. In this approach, the dataset, D, is randomly split into K mutually exclusive subsets (folds) of equal size (D1, D2, …, Dk) and K classifiers are built. The i-th classifier is trained on the union of all Dj ¤ j¹i and tested on Di. The estimate accuracy is the overall number of correct classifications divided by the number of instances in the dataset." (M Paz S Lorente et al, "Ensemble of ANN for Traffic Sign Recognition" [in "Encyclopedia of Artificial Intelligence"], 2009)

"The process of assessing the predictive accuracy of a model in a test sample compared to its predictive accuracy in the learning or training sample that was used to make the model. Cross-validation is a primary way to assure that over learning does not take place in the final model, and thus that the model approximates reality as well as can be obtained from the data available." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Validating a scoring procedure by applying it to another set of data." (Dougal Hutchison, "Automated Essay Scoring Systems", 2009)

"A method for evaluating the accuracy of a data mining model." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Cross-validation is a method of splitting all of your data into two parts: training and validation. The training data is used to build the machine learning model, whereas the validation data is used to validate that the model is doing what is expected. This increases our ability to find and determine the underlying errors in a model." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"A technique used for validation and model selection. The data is randomly partitioned into K groups. The model is then trained K times, each time with one of the groups left out, on which it is evaluated." (Simon Rogers & Mark Girolami, "A First Course in Machine Learning", 2017)

"A model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set." (Adrian Carballal et al, "Approach to Minimize Bias on Aesthetic Image Datasets", 2019)

08 May 2018

🔬Data Science: Cluster Analysis (Definitions)

"Generally, cluster analysis, or clustering, comprises a wide array of mathematical methods and algorithms for grouping similar items in a sample to create classifications and hierarchies through statistical manipulation of given measures of samples from the population being clustered. (Hannu Kivijärvi et al, "A Support System for the Strategic Scenario Process", 2008)

"Defining groups based on the 'degree' to which an item belongs in a category. The degree may be determined by indicating a percentage amount." (Mary J Lenard & Pervaiz Alam, "Application of Fuzzy Logic to Fraud Detection", 2009)

"A technique that identifies homogenous subgroups or clusters of subjects or study objects." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A statistical technique for finding natural groupings in data; it can also be used to assign new cases to groupings or categories." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"Techniques for organizing data into groups of similar cases." (Meta S Brown, "Data Mining For Dummies", 2014)

"A statistical technique whereby data or objects are classified into groups (clusters) that are similar to one another but different from data or objects in other clusters." (Soraya Sedkaoui, "Big Data Analytics for Entrepreneurial Success", 2018)

"Clustering or cluster analysis is a set of techniques of multivariate data analysis aimed at selecting and grouping homogeneous elements in a data set. Clustering techniques are based on measures relating to the similarity between the elements. In many approaches this similarity, or better, dissimilarity, is designed in terms of distance in a multidimensional space. Clustering algorithms group items on the basis of their mutual distance, and then the belonging to a set or not depends on how the element under consideration is distant from the collection itself." (Crescenzio Gallo, "Building Gene Networks by Analyzing Gene Expression Profiles", 2018)

"A type of an unsupervised learning that aims to partition a set of objects in such a way that objects in the same group (called a cluster) are more similar, whereas characteristics of objects assigned into different clusters are quite distinct." (Timofei Bogomolov et al, "Identifying Patterns in Fresh Produce Purchases: The Application of Machine Learning Techniques", 2020)

"Cluster analysis is the process of identifying objects that are similar to each other and cluster them in order to understand the differences as well as the similarities within the data." (Analytics Insight)

05 May 2018

🔬Data Science: Classification (Definitions)

"The process of learning to distinguish and discriminate between different input patterns using a supervised training algorithm." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

"1.Generally, a set of discrete, exhaustive, and mutually exclusive observations that can be assigned to one or more variables to be measured in the collation and/or presentation of data. 2.In data modeling, the arrangement of entities into supertypes and subtypes. 3.In object-oriented design, the arrangement of objects into classes, and the assignment of objects to these categories." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Form of data analysis that models the relationships between a number of variables and a target feature. The target feature contains nominal values that indicate the class to which each observation belongs." (Efstathios Kirkos, "Composite Classifiers for Bankruptcy Prediction", 2014)

"Systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods, and procedural rules represented in a classification system. A coding of content items as members of a group for the purposes of cataloging them or associating them with a taxonomy." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"Techniques for organizing data into groups associated with a particular outcome, such as the likelihood to purchase a product or earn a college degree." (Meta S Brown, "Data Mining For Dummies", 2014)

"The systematic assignment of resources to a system of intentional categories, often institutional ones. Classification is applied categorization - the assignment of resources to a system of categories, called classes, using a predetermined set of principles." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"A systematic arrangement of objects into groups or categories according to a set of established criteria. Data and resources can be assigned a level of sensitivity as they are being created, amended, enhanced, stored, or transmitted. The classification level then determines the extent to which the resource needs to be controlled and secured, and is indicative of its value in terms of information assets." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known." (Soraya Sedkaoui, "Big Data Analytics for Entrepreneurial Success", 2018)

"It is task of classifying the data into predefined number of classes. It is a supervised approach. The tagged data is used to create classification model that will be used for classification on unknown data." (Siddhartha Kumar Arjaria & Abhishek S Rathore, "Heart Disease Diagnosis: A Machine Learning Approach", 2019)

"In a machine learning context, classification is the task of assigning classes to examples. The simplest form is the binary classification task where each example can have one of two classes. The binary classification task is a special case of the multiclass classification task where each example can have one of a fixed set of classes. There is also the multilabel classification task where each example can have zero or more labels from a fixed set of labels." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"The act of assigning a category to something" (ITIL)

16 April 2018

🔬Data Science: Classification Tree (Definitions)

"A decision tree that is used for prediction of categorical data." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"One of the main 'workhorse' techniques in data mining; used to predict membership of cases in the classes of a categorical dependent variable from their measurements predictor variables. Classification trees typically split the sample on simple rules and then resplit the subsamples, etc., until the data can’t sustain further complexity." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A machine learning approach that uses training data to create a model that can then be used for assigning cases (for example, workers) in a dataset to different possible groupings (for example, leavers or stayers)." (Jonathan Ferrar et al, "The Power of People", 2017)

"a form of classification algorithm in which features are examined in sequence, with the response indicating the next feature to examine, until a classification is made." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"A tree showing equivalence partitions hierarchically ordered, which is used to design test cases in the classification tree method. See also classification tree method." (SQA)

14 March 2018

🔬Data Science: Classifier (Definitions)

[pattern classifier:] "A neural net to determine whether an input pattern is or is not a member of a particular class. Training data consists of input patterns and the class to which each belongs, but does not require a description of each class; the net forms exemplar vectors for each class as it learns the training patterns." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

[Bayes classifier:] "statistical classification algorithm in which the class borders are determined decision-theoretically, on the basis of class distributions and misclassification costs." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

[nonparametric classifier:] "classification method that is not based on any mathematical functional form for the description of class regions, but directly refers to the available exemplary data." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

[parametric classifier:] "classification method in which the class regions are defined by specified mathematical functions involving free parameters." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

"A set of patterns and rules to assign a class to new examples." (Ching W Wang, "New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data", 2008)

"A structured model that maps unlabeled instances to finite set of classes." (Lior Rokach, "Incorporating Fuzzy Logic in Data Mining Tasks", Encyclopedia of Artificial Intelligence, 2009)

"A decision-supporting system that given an unseen (to-be-classified) input object yields a prediction, for instance, it classifies the given object to a certain class." (Ivan Bruha, "Knowledge Combination vs. Meta-Learning", 2009)

"Algorithm that produces class labels as output, from a set of features of an object. A classifier, for example, is used to classify certain features extracted from a face image and provide a label (an identity of the individual)." (Oscar D Suárez & Gloria B García, "Component Analysis in Artificial Vision" Encyclopedia of Artificial Intelligence, 2009)

"An algorithm to assign unknown object samples to their respective classes. The decision is made according to the classification feature vectors describing the object in question." (Michael Haefner, "Pit Pattern Classification Using Multichannel Features and Multiclassification", 2009)

"function that associates a class c to each input pattern x of interest. A classifier can be directly constructed from a set of pattern examples with their respective classes, or indirectly from a statistical model." (Óscar Pérez & Manuel Sánchez-Montañés, Class Prediction in Test Sets with Shifted Distributions, 2009)

[Naive Bayes classifier:] "A modeling technique where each attribute describes a class independent of any other attributes that also describe that class." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An algorithm that implements classification in the field of machine learning and statistical analysis." (Golnoush Abaei & Ali Selamat, "Important Issues in Software Fault Prediction: A Road Map", 2014)

"A computational method that can be trained using known labeled data for predicting the label of unlabeled data. If there's only two labels (also called classes), the method is called 'detector'." (Addisson Salazar et al, "New Perspectives of Pattern Recognition for Automatic Credit Card Fraud Detection", 2018)

[Naive Bayes classifier:] "A way to classify a data item using Bayes' theorem concerning the conditional probabilities P(A|B)=(P(B|A) * P(A))/P(B). It also assumes that variables in the data are independent, which means that no variable affects the probability of the remaining variables attaining a certain value." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A type of machine learning program that segments a set of cases into different classes or categorizations." (Shalin Hai-Jew, "Methods for Analyzing and Leveraging Online Learning Data", 2019)

"A supervised Data Mining algorithm used to categorize an instance into one of the two or more classes." (Mu L Wong & S Senthil "Development of Accurate and Timely Students' Performance Prediction Model Utilizing Heart Rate Data", 2020)

"A model that can be used to place objects into discrete categories based on some set of features. Classifiers are trained on datasets." (Laurel Powell et al, "Art Innovative Systems for Value Tagging", 2021)

11 February 2018

🔬Data Science: K-nearest neighbors (Definitions)

"A modeling technique that assigns values to points based on the values of the k nearby points, such as average value, or most common value." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A simple and popular classifier algorithm that assigns a class (in a preexisting classification) to an object whose class is unknown. [...] From a collection of data objects whose class is known, the algorithm computes the distances from the object of unknown class to k (a number chosen by the user) objects of known class. The most common class (i.e., the class that is assigned most often to the nearest k objects) is assigned to the object of unknown class." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"A method used for classification and regression. Cases are analyzed, and class membership is assigned based on similarity to other cases, where cases that are similar (or 'near' in characteristics) are known as neighbors." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"A prediction method, which uses a function of the k most similar observations from the training set to generate a prediction, such as the mean." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"K-Nearest Neighbors classification is an instance-based supervised learning method that works well with distance-sensitive data." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"An algorithm that estimates an unknown data item as being like the majority of the k-closest neighbors to that item." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"K-nearest neighbourhood is a algorithm which stores all available cases and classifies new cases based on a similarity measure. It is used in statistical estimation and pattern recognition." (Aman Tyagi, "Healthcare-Internet of Things and Its Components: Technologies, Benefits, Algorithms, Security, and Challenges", 2021)

07 June 2013

🎓Knowledge Management: Taxonomy (Definitions)

"A classification system." (Ruth C Clark & Chopeta Lyons, "Graphics for Learning", 2004)

"A hierarchical structure within which related items are organized, classified, or categorized, thus illustrating the relationships between them." (Richard Caladine, "Taxonomies for Technology", 2008)

"A taxonomy is a hierarchical structure displaying parent-child relationships (a classification). A taxonomy extends a vocabulary and is a special case of a the more general ontology." (Troels Andreasen & Henrik Bulskov, "Query Expansion by Taxonomy", 2008)

"An orderly classification that explicitly expresses the relationships, usually hierarchical (e.g., genus/species, whole/part, class/instance), between and among the things being classified." (J P Getty Trust, "Introduction to Metadata" 2nd Ed., 2008)

"This term traditionally refers to the study of the general principles of classification. It is widely used to describe computer-based systems that use hierarchies of topics to help users sift through information." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A kind of classification method which organizes all kinds of things into predefined hierarchical structure." (Yong Yu et al, "Social Tagging: Properties and Applications", 2010)

"Any system of categories used to organize something, including documents, often less comprehensive than a thesaurus." (Steven Woods et al, "Knowledge Dissemination in Portals", 2011)

"Generally, a collection of controlled vocabulary terms organized into a structure of parent-child relationships. Each term is in at least one relationship with another term in the taxonomy. Each parent's relationship with all of its children are of only one type (whole-part, genus-species, or type-instance). The addition of associative relationships creates a thesaurus." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A definitional hierarchy of concepts. Traditional taxonomies are tree-structured (a concept is assumed to have exactly one superconcept and multiple subconcepts). The higher a concept is positioned in the definitional hierarchy, the more individuals it describes (the comprehension of the concept), but the less definitional properties are needed (the meaning of a concept)." (Marcus Spies & Said Tabet, "Emerging Standards and Protocols for Governance, Risk, and Compliance Management", 2012)

"A hierarchical representation of metadata. The top level is the category, and each subsequent level provides a refinement (detail) of the top-level term." (Charles Cooper & Ann Rockley, "Managing Enterprise Content: A Unified Content Strategy" 2nd Ed., 2012)

"A hierarchical structure of information components, for example, a subject, business–unit, or functional taxonomy, any part of which can be used to classify a content item in relation to other items in the structure." (Robert F Smallwood, "Managing Electronic Records: Methods, Best Practices, and Technologies", 2013)

"A classification of text" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A hierarchical structure of information components (e.g., a subject, business unit, or functional taxonomy), any part of which can be used to classify a content item in relation to other items in the structure." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"provides context within the ontology. Taxonomies are used to capture hierarchical relationships between elements of interest. " (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"Taxonomy is the science and practice of classification. Taxonomies are used when categorizing real-life as well as artificial phenomenon and the aim is to make systematic studies easier." (Ulf Larson et al, "Guidance for Selecting Data Collection Mechanisms for Intrusion Detection", 2015)

"A taxonomy is a hierarchy that is created by a set of interconnected class inclusion relationship." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"A hierarchical structure of information components, for example, a subject, business unit, or functional taxonomy, any part of which can be used to classify a content item in relation to other items in the structure." (Robert F Smallwood, "Information Governance for Healthcare Professionals", 2018)

SQL Troubles

Pages