27 October 2018

Data Science: Research (Just the Quotes)

"The aim of research is the discovery of the equations which subsist between the elements of phenomena." (Ernst Mach, 1898)

"[…] scientific research is somewhat like unraveling complicated tangles of strings, in which luck is almost as vital as skill and accurate observation." (Ernst Mach, "Knowledge and Error: Sketches on the Psychology of Enquiry", 1905)

"Research is fundamentally a state of mind involving continual re­examination of doctrines and axioms upon which current thought and action are based. It is, therefore, critical of existing practices." (Theobald Smith, "The Influence of Research in Bringing into Closer Relationship the Practice of Medicine and Public Health Activities", American Journal of Medical Sciences, 1929)

"In every important advance the physicist finds that the fundamental laws are simplified more and more as experimental research advances. He is astonished to notice how sublime order emerges from what appeared to be chaos. And this cannot be traced back to the workings of his own mind but is due to a quality that is inherent in the world of perception." (Albert Einstein, 1932)

"Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained from observation or experiment. The subject has a coherent structure based on the theory of Probability and includes many different procedures which contribute to research and development throughout the whole of Science and Technology." (Egon Pearson, 1936)

"A successful hypothesis is not necessarily a permanent hypothesis, but it is one which stimulates additional research, opens up new fields, or explains and coordinates previously unrelated facts." (Farrington Daniels, "Outlines of Physical Chemistry", 1948)

"The hypothesis is the principal intellectual instrument in research. Its function is to indicate new experiments and observations and it therefore sometimes leads to discoveries even when not correct itself. We must resist the temptation to become too attached to our hypothesis, and strive to judge it objectively and modify it or discard it as soon as contrary evidence is brought to light. Vigilance is needed to prevent our observations and interpretations being biased in favor of the hypothesis. Suppositions can be used without being believed." (William I B Beveridge, "The Art of Scientific Investigation", 1950)

"Mathematical models for empirical phenomena aid the development of a science when a sufficient body of quantitative information has been accumulated. This accumulation can be used to point the direction in which models should be constructed and to test the adequacy of such models in their interim states. Models, in turn, frequently are useful in organizing and interpreting experimental data and in suggesting new directions for experimental research." (Robert R. Bush & Frederick Mosteller, "A Mathematical Model for Simple Learning", Psychological Review 58, 1951)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"In a general way it may be said that to think in terms of systems seems the most appropriate conceptual response so far available when the phenomena under study - at any level and in any domain--display the character of being organized, and when understanding the nature of the interdependencies constitutes the research task. In the behavioral sciences, the first steps in building a systems theory were taken in connection with the analysis of internal processes in organisms, or organizations, when the parts had to be related to the whole." (Fred Emery, "The Causal Texture of Organizational Environments", 1963)

"If the null hypothesis is not rejected, [Sir Ronald] Fisher's position was that nothing could be concluded. But researchers find it hard to go to all the trouble of conducting a study only to conclude that nothing can be concluded." (Frank L Schmidt, "Statistical Significance Testing and Cumulative Knowledge", "Psychology: Implications for Training of Researchers, Psychological Methods" Vol. 1 (2), 1996)

"Statisticians can calculate the probability that such random samples represent the population; this is usually expressed in terms of sampling error [...]. The real problem is that few samples are random. Even when researchers know the nature of the population, it can be time-consuming and expensive to draw a random sample; all too often, it is impossible to draw a true random sample because the population cannot be defined. This is particularly true for studies of social problems. [...] The best samples are those that come as close as possible to being random." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Meta-analytic thinking is the consideration of any result in relation to previous results on the same or similar questions, and awareness that combination with future results is likely to be valuable. Meta-analytic thinking is the application of estimation thinking to more than a single study. It prompts us to seek meta-analysis of previous related studies at the planning stage of research, then to report our results in a way that makes it easy to include them in future meta-analyses. Meta-analytic thinking is a type of estimation thinking, because it, too, focuses on estimates and uncertainty." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Statistical cognition is concerned with obtaining cognitive evidence about various statistical techniques and ways to present data. It’s certainly important to choose an appropriate statistical model, use the correct formulas, and carry out accurate calculations. It’s also important, however, to focus on understanding, and to consider statistics as communication between researchers and readers." (Geoff Cumming, "Understanding the New Statistics", 2012)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"How can we tell the difference between a good theory and quackery? There are two effective antidotes: common sense and fresh data. If it is a ridiculous theory, we shouldn’t be persuaded by anything less than overwhelming evidence, and even then be skeptical. Extraordinary claims require extraordinary evidence. Unfortunately, common sense is an uncommon commodity these days, and many silly theories have been seriously promoted by honest researchers." (Gary Smith, "Standard Deviations", 2014)

"These practices - selective reporting and data pillaging - are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher’s endurance. We cannot tell whether a data grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding. But more often than not, the tests stop there. After all, you won’t become a star by confirming other people’s research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged." (Gary Smith, "Standard Deviations", 2014)

"A conceptual model is a framework that is initially used in research to outline the possible courses of action or to present an idea or thought. When a conceptual model is developed in a logical manner, it will provide a rigor to the research process." (N Elangovan & R Rajendran, "Conceptual Model: A Framework for Institutionalizing the Vigor in Business Research", 2015)

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"Collecting data through sampling therefore becomes a never-ending battle to avoid sources of bias. [...] While trying to obtain a random sample, researchers sometimes make errors in judgment about whether every person or thing is equally likely to be sampled." (Daniel J Levitin, "Weaponized Lies", 2017)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.