01 November 2018

Data Science: Black Boxes (Just the Quotes)

"The terms 'black box' and 'white box' are convenient and figurative expressions of not very well determined usage. I shall understand by a black box a piece of apparatus, such as four-terminal networks with two input and two output terminals, which performs a definite operation on the present and past of the input potential, but for which we do not necessarily have any information of the structure by which this operation is performed. On the other hand, a white box will be similar network in which we have built in the relation between input and output potentials in accordance with a definite structural plan for securing a previously determined input-output relation." (Norbert Wiener, "Cybernetics: Or Control and Communication in the Animal and the Machine", 1948)

"The definition of a ‘good model’ is when everything inside it is visible, inspectable and testable. It can be communicated effortlessly to others. A ‘bad model’ is a model that does not meet these standards, where parts are hidden, undefined or concealed and it cannot be inspected or tested; these are often labelled black box models." (Hördur V Haraldsson & Harald U Sverdrup, "Finding Simplicity in Complexity in Biogeochemical Modelling" [in "Environmental Modelling: Finding Simplicity in Complexity", Ed. by John Wainwright and Mark Mulligan, 2004])

"Operational thinking is about mapping relationships. It is about capturing interactions, interconnections, the sequence and flow of activities, and the rules of the game. It is about how systems do what they do, or the dynamic process of using elements of the structure to produce the desired functions. In a nutshell, it is about unlocking the black box that lies between system input and system output." (Jamshid Gharajedaghi, "Systems Thinking: Managing Chaos and Complexity A Platform for Designing Business Architecture" 3rd Ed., 2011)

"The transparency of Bayesian networks distinguishes them from most other approaches to machine learning, which tend to produce inscrutable 'black boxes'. In a Bayesian network you can follow every step and understand how and why each piece of evidence changed the network’s beliefs." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"A recurring theme in machine learning is combining predictions across multiple models. There are techniques called bagging and boosting which seek to tweak the data and fit many estimates to it. Averaging across these can give a better prediction than any one model on its own. But here a serious problem arises: it is then very hard to explain what the model is (often referred to as a 'black box'). It is now a mixture of many, perhaps a thousand or more, models." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Deep neural networks have an input layer and an output layer. In between, are “hidden layers” that process the input data by adjusting various weights in order to make the output correspond closely to what is being predicted. [...] The mysterious part is not the fancy words, but that no one truly understands how the pattern recognition inside those hidden layers works. That’s why they’re called 'hidden'. They are an inscrutable black box - which is okay if you believe that computers are smarter than humans, but troubling otherwise." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The concept of integrated information is clearest when applied to networks. Imagine a black box with input and output terminals. Inside are some electronics, such as a network with logic elements (AND, OR, and so on) wired together. Viewed from the outside, it will usually not be possible to deduce the circuit layout simply by examining the cause–effect relationship between inputs and outputs, because functionally equivalent black boxes can be built from very different circuits. But if the box is opened, it’s a different story. Suppose you use a pair of cutters to sever some wires in the network. Now rerun the system with all manner of inputs. If a few snips dramatically alter the outputs, the circuit can be described as highly integrated, whereas in a circuit with low integration the effect of some snips may make no difference at all." (Paul Davies, "The Demon in the Machine: How Hidden Webs of Information Are Solving the Mystery of Life", 2019)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"If the data that go into the analysis are flawed, the specific technical details of the analysis don’t matter. One can obtain stupid results from bad data without any statistical trickery. And this is often how bullshit arguments are created, deliberately or otherwise. To catch this sort of bullshit, you don’t have to unpack the black box. All you have to do is think carefully about the data that went into the black box and the results that came out. Are the data unbiased, reasonable, and relevant to the problem at hand? Do the results pass basic plausibility checks? Do they support whatever conclusions are drawn?" (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.