SQL Troubles: 🔭Data Science: Success (Just the Quotes)

16 November 2018

🔭Data Science: Success (Just the Quotes)

"[…] the statistical prediction of the future from the past cannot be generally valid, because whatever is future to any given past, is in tum past for some future. That is, whoever continually revises his judgment of the probability of a statistical generalization by its successively observed verifications and failures, cannot fail to make more successful predictions than if he should disregard the past in his anticipation of the future. This might be called the ‘Principle of statistical accumulation’." (Clarence I Lewis, "Mind and the World-Order: Outline of a Theory of Knowledge", 1929)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"[Statistics] is both a science and an art. It is a science in that its methods are basically systematic and have general application; and an art in that their successful application depends to a considerable degree on the skill and special experience of the statistician, and on his knowledge of the field of application, e.g. economics." (Leonard H C Tippett, "Statistics", 1943)

"Statistics provides a quantitative example of the scientific process usually described qualitatively by saying that scientists observe nature, study the measurements, postulate models to predict new measurements, and validate the model by the success of prediction." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Changes of variables can be helpful for iterative and parametric solutions even if they do not linearize the problem. For example, a change of variables may change the 'shape' of J(x) into a more suitable form. Unfortunately there seems to be no general way to choose the 'right' change of variables. Success depends on the particular problem and the engineer's insight. However, the possibility of a change of variables should always be considered."(Fred C Scweppe, "Uncertain dynamic systems", 1973)

"There is a universality about mathematics; what was created to explain one phenomenon is very often later found to be useful in explaining other, apparently unrelated, phenomena. Theories that were developed to explain some poorly measured effects are often found to fit later, much more accurate measurements. Furthermore, from measurements over a limited range the theory is often found to fit a far wider range. Finally, and perhaps most unreasonably, quite regularly from the mathematics alone new phenomena, previously unknown and unsuspected, are successfully predicted. This universality of mathematics could, of course, be a reflection of the way the human mind works and not of the external world, but most people believe it reflects reality." (Richard W Hamming, "Methods of Mathematics Applied to Calculus, Probability, and Statistics", 1985)

"One of the critical success factors for any method and its application is its ability to facilitate communication, avoiding information overload. So for larger models, the question is how to guide the reader into different parts of the model." (Peter Coad & Edward Yourdon, "Object-Oriented Analysis" 2nd Ed., 1991)

"The ability of neural networks to operate successfully on inputs that did not form part of the training set is one of their most important characteristics. Networks are capable of finding common elements in all the training examples belonging to the same class, and will then respond appropriately when these elements are encountered again. Optimising this capability is an important consideration when designing a network." (Paul Cilliers, "Complexity and Postmodernism: Understanding Complex Systems", 1998)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"Information design, when successful - whether in print, on the web, or in the environment - represents the functional balance of the meaning of the information, the skills and inclinations of the designer, and the perceptions, education, experience, and needs of the audience." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Successful information design in movement systems gives the user the information he needs - and only the information he needs - at every decision point." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"You can give your data product a better chance of success by carefully setting the users’ expectations. [...] One under-appreciated facet of designing data products is how the user feels after using the product. Does he feel good? Empowered? Or disempowered and dejected?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data mining is a craft. As with many crafts, there is a well-defined process that can help to increase the likelihood of a successful result. This process is a crucial conceptual tool for thinking about data science projects. [...] data mining is an exploratory undertaking closer to research and development than it is to engineering." (Foster Provost, "Data Science for Business", 2013)

"Decision trees are an important tool for decision making and risk analysis, and are usually represented in the form of a graph or list of rules. One of the most important features of decision trees is the ease of their application. Being visual in nature, they are readily comprehensible and applicable. Even if users are not familiar with the way that a decision tree is constructed, they can still successfully implement it. Most often decision trees are used to predict future scenarios, based on previous experience, and to support rational decision making." (Jelena Djuris et al, "Neural computing in pharmaceutical products and process development", Computer-Aided Applications in Pharmaceutical Technology, 2013)

"We are seduced by patterns and we want explanations for these patterns. When we see a string of successes, we think that a hot hand has made success more likely. If we see a string of failures, we think a cold hand has made failure more likely. It is easy to dismiss such theories when they involve coin flips, but it is not so easy with humans. We surely have emotions and ailments that can cause our abilities to go up and down. The question is whether these fluctuations are important or trivial." (Gary Smith, "Standard Deviations", 2014)

"We emphasize that while there are some common techniques for feature learning one may want to try, the No-Free-Lunch theorem implies that there is no ultimate feature learner. Any feature learning algorithm might fail on some problem. In other words, the success of each feature learner relies (sometimes implicitly) on some form of prior assumption on the data distribution. Furthermore, the relative quality of features highly depends on the learning algorithm we are later going to apply using these features." (Shai Shalev-Shwartz & Shai Ben-David, "Understanding Machine Learning: From Theory to Algorithms", 2014)

"Whether or not a model works is also a matter of opinion. After all, a key component of every model, whether formal or informal, is its definition of success." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"All human storytellers bring their subjectivity to their narratives. All have bias, and possibly error. Acknowledging and defusing that bias is a vital part of successfully using data stories. By debating a data story collaboratively and subjecting it to critical thinking, organizations can get much higher levels of engagement with data and analytics and impact their decision making much more than with reports and dashboards alone." (James Richardson, 2017)

"Extracting good features is the most important thing for getting your analysis to work. It is much more important than good machine learning classifiers, fancy statistical techniques, or elegant code. Especially if your data doesn’t come with readily available features (as is the case with web pages, images, etc.), how you reduce it to numbers will make the difference between success and failure." (Field Cady, "The Data Science Handbook", 2017)

"The field of big-data analytics is still littered with a few myths and evidence-free lore. The reasons for these myths are simple: the emerging nature of technologies, the lack of common definitions, and the non-availability of validated best practices. Whatever the reasons, these myths must be debunked, as allowing them to persist usually has a negative impact on success factors and Return on Investment (RoI). On a positive note, debunking the myths allows us to set the right expectations, allocate appropriate resources, redefine business processes, and achieve individual/organizational buy-in." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Uncertainty is an adversary of coldly logical algorithms, and being aware of how those algorithms might break down in unusual circumstances expedites the process of fixing problems when they occur - and they will occur. A data scientist’s main responsibility is to try to imagine all of the possibilities, address the ones that matter, and reevaluate them all as successes and failures happen." (Brian Godsey, "Think Like a Data Scientist", 2017)

"The no free lunch theorems set limits on the range of optimality of any method. That is, each methodology has a ‘catchment area’ where it is optimal or nearly so. Often, intuitively, if the optimality is particularly strong then the effectiveness of the methodology falls off more quickly outside its catchment area than if its optimality were not so strong. Boosting is a case in point: it seems so well suited to binary classification that efforts to date to extend it to give effective classification (or regression) more generally have not been very successful. Overall, it remains to characterize the catchment areas where each class of predictors performs optimally, performs generally well, or breaks down." (Bertrand S Clarke & Jennifer L. Clarke, "Predictive Statistics: Analysis and Inference beyond Models", 2018)

"[...] the focus on Big Data AI seems to be an excuse to put forth a number of vague and hand-waving theories, where the actual details and the ultimate success of neuroscience is handed over to quasi- mythological claims about the powers of large datasets and inductive computation. Where humans fail to illuminate a complicated domain with testable theory, machine learning and big data supposedly can step in and render traditional concerns about finding robust theories. This seems to be the logic of Data Brain efforts today. (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"The idea that we can predict the arrival of AI typically sneaks in a premise, to varying degrees acknowledged, that successes on narrow AI systems like playing games will scale up to general intelligence, and so the predictive line from artificial intelligence to artificial general intelligence can be drawn with some confidence. This is a bad assumption, both for encouraging progress in the field toward artificial general intelligence, and for the logic of the argument for prediction." (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

SQL Troubles

Pages

16 November 2018

🔭Data Science: Success (Just the Quotes)

No comments:

About Me