25 November 2018

Data Science: Trust (Just the Quotes)

"We must trust to nothing but facts: These are presented to us by Nature, and cannot deceive. We ought, in every instance, to submit our reasoning to the test of experiment, and never to search for truth but by the natural road of experiment and observation." (Antoin-Laurent de Lavoisiere, "Elements of Chemistry", 1790)

"A law of nature, however, is not a mere logical conception that we have adopted as a kind of memoria technical to enable us to more readily remember facts. We of the present day have already sufficient insight to know that the laws of nature are not things which we can evolve by any speculative method. On the contrary, we have to discover them in the facts; we have to test them by repeated observation or experiment, in constantly new cases, under ever-varying circumstances; and in proportion only as they hold good under a constantly increasing change of conditions, in a constantly increasing number of cases with greater delicacy in the means of observation, does our confidence in their trustworthiness rise." (Hermann von Helmholtz, "Popular Lectures on Scientific Subjects", 1873)

"It is of the nature of true science to take nothing on trust or on authority. Every fact must be established by accurate observation, experiment, or calculation. Every law and principle must rest on inductive argument. The apostolic motto, ‘Prove all things, hold fast that which is good’, is thoroughly scientific. It is true that the mere reader of popular science must often be content to take that on testimony which he cannot personally verify; but it is desirable that even the most cursory reader should fully comprehend the modes in which facts are ascertained and the reasons on which the conclusions are based." (Sir John W Dawson, "The Chain of Life in Geological Time", 1880)

"The aim of science is to seek the simplest explanations of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be, ‘Seek simplicity and distrust it’." (Alfred N Whitehead, "The Concept of Nature", 1919)

"Every bit of knowledge we gain and every conclusion we draw about the universe or about any part or feature of it depends finally upon some observation or measurement. Mankind has had again and again the humiliating experience of trusting to intuitive, apparently logical conclusions without observations, and has seen Nature sail by in her radiant chariot of gold in an entirely different direction." (Oliver J Lee, "Measuring Our Universe: From the Inner Atom to Outer Space", 1950)

"Being built on concepts, hypotheses, and experiments, laws are no more accurate or trustworthy than the wording of the definitions and the accuracy and extent of the supporting experiments." (Gerald Holton, "Introduction to Concepts and Theories in Physical Science", 1952)

"No observations are absolutely trustworthy. In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. If a reading is found to lie a very long way from its fellows in a series of replicate observations, there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. [...] One sufficiently erroneous reading can wreck the whole of a statistical analysis, however many observations there are." (Francis J Anscombe, "Rejection of Outliers", Technometrics Vol. 2 (2), 1960)

"Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Science’s predictions are more trustworthy, but they are limited to what we can systematically observe and tractably model. Big data and machine learning greatly expand that scope. Some everyday things can be predicted by the unaided mind, from catching a ball to carrying on a conversation. Some things, try as we might, are just unpredictable. For the vast middle ground between the two, there’s machine learning." (Pedro Domingos, "The Master Algorithm", 2015)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"Are your insights based on data that is accurate and reliable? Trustworthy data is correct or valid, free from significant defects and gaps. The trustworthiness of your data begins with the proper collection, processing, and maintenance of the data at its source. However, the reliability of your numbers can also be influenced by how they are handled during the analysis process. Clean data can inadvertently lose its integrity and true meaning depending on how it is analyzed and interpreted." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"I believe that the backlash against statistics is due to four primary reasons. The first, and easiest for most people to relate to, is that even the most basic concepts of descriptive and inferential statistics can be difficult to grasp and even harder to explain. […] The second cause for vitriol is that even well-intentioned experts misapply the tools and techniques of statistics far too often, myself included. Statistical pitfalls are numerous and tough to avoid. When we can't trust the experts to get it right, there's a temptation to throw the baby out with the bathwater. The third reason behind all the hate is that those with an agenda can easily craft statistics to lie when they communicate with us  […] And finally, the fourth cause is that often statistics can be perceived as cold and detached, and they can fail to communicate the human element of an issue." (Ben Jones, "Avoiding Data Pitfalls: How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations", 2020)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.