Showing posts with label bayes. Show all posts
Showing posts with label bayes. Show all posts

13 December 2018

🔭Data Science: Bayesian Networks (Just the Quotes)

"The best way to convey to the experimenter what the data tell him about theta is to show him a picture of the posterior distribution." (George E P Box & George C Tiao, "Bayesian Inference in Statistical Analysis", 1973)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain 'exact' posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don't see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science, 1995)

"Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won't get anything out of it regardless of your political bent." (Brad Carlin, "Bayes offers a new way to make sense of numbers", Science, 1999)

"Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Bayesian inference is appealing when prior information is available since Bayes’ theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. […] In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The Bayesian approach is based on the following postulates: (B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. […] (B2) We can make probability statements about parameters, even though they are fixed constants. (B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"Bayesian networks can be constructed by hand or learned from data. Learning both the topology of a Bayesian network and the parameters in the CPTs in the network is a difficult computational task. One of the things that makes learning the structure of a Bayesian network so difficult is that it is possible to define several different Bayesian networks as representations for the same full joint probability distribution." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks provide a more flexible representation for encoding the conditional independence assumptions between the features in a domain. Ideally, the topology of a network should reflect the causal relationships between the entities in a domain. Properly constructed Bayesian networks are relatively powerful models that can capture the interactions between descriptive features in determining a prediction." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks use a graph-based representation to encode the structural relationships - such as direct influence and conditional independence - between subsets of features in a domain. Consequently, a Bayesian network representation is generally more compact than a full joint distribution (because it can encode conditional independence relationships), yet it is not forced to assert a global conditional independence between all descriptive features. As such, Bayesian network models are an intermediary between full joint distributions and naive Bayes models and offer a useful compromise between model compactness and predictive accuracy." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"Bayesian networks inhabit a world where all questions are reducible to probabilities, or (in the terminology of this chapter) degrees of association between variables; they could not ascend to the second or third rungs of the Ladder of Causation. Fortunately, they required only two slight twists to climb to the top." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The transparency of Bayesian networks distinguishes them from most other approaches to machine learning, which tend to produce inscrutable 'black boxes'. In a Bayesian network you can follow every step and understand how and why each piece of evidence changed the network’s beliefs." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"With Bayesian networks, we had taught machines to think in shades of gray, and this was an important step toward humanlike thinking. But we still couldn’t teach machines to understand causes and effects. [...] By design, in a Bayesian network, information flows in both directions, causal and diagnostic: smoke increases the likelihood of fire, and fire increases the likelihood of smoke. In fact, a Bayesian network can’t even tell what the 'causal direction' is." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

04 December 2018

🔭Data Science: Bayesian Statistics (Just the Quotes)

"Another reason for the applied statistician to care about Bayesian inference is that consumers of statistical answers, at least interval estimates, commonly interpret them as probability statements about the possible values of parameters. Consequently, the answers statisticians provide to consumers should be capable of being interpreted as approximate Bayesian statements." (Donald B Rubin, "Bayesianly justifiable and relevant frequency calculations for the applied statistician", Annals of Statistics 12(4), 1984)

"The practicing Bayesian is well advised to become friends with as many numerical analysts as possible." (James Berger, "Statistical Decision Theory and Bayesian Analysis", 1985)

"Subjective probability, also known as Bayesian statistics, pushes Bayes' theorem further by applying it to statements of the type described as 'unscientific' in the frequency definition. The probability of a theory (e.g. that it will rain tomorrow or that parity is not violated) is considered to be a subjective 'degree of belief - it can perhaps be measured by seeing what odds the person concerned will offer as a bet. Subsequent experimental evidence then modifies the initial degree of belief, making it stronger or weaker according to whether the results agree or disagree with the predictions of the theory in question." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain 'exact' posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don't see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science, 1995)

"Bayesian computations give you a straightforward answer you can understand and use. It says there is an X% probability that your hypothesis is true-not that there is some convoluted chance that if you assume the null hypothesis is true, you’ll get a similar or more extreme result if you repeated your experiment thousands of times. How does one interpret THAT!" (Steven Goodman, "Bayes offers a new way to make sense of numbers", Science 19, 1999)

"Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won’t get anything out of it regardless of your political bent." (Brad Carlin, "Bayes offers a new way to make sense of numbers", Science 19, 1999)

"I sometimes think that the only real difference between Bayesian and non-Bayesian hierarchical modelling is whether random effects are labeled with Greek or Roman letters." (Peter Diggle, "Comment on Bayesian analysis of agricultural field experiments", Journal of Royal Statistical Society B vol. 61, 1999)

"I believe that there are many classes of problems where Bayesian analyses are reasonable, mainly classes with which I have little acquaintance." (John Tukey, "The life and professional contributions of John W. Tukey, The Annals of Statistics", Vol 30, 2001)

"Bayesian statistics give us an objective way of combining the observed evidence with our prior knowledge (or subjective belief) to obtain a revised belief and hence a revised prediction of the outcome of the coin’s next toss. [...] This is perhaps the most important role of Bayes’s rule in statistics: we can estimate the conditional probability directly in one direction, for which our judgment is more reliable, and use mathematics to derive the conditional probability in the other direction, for which our judgment is rather hazy. The equation also plays this role in Bayesian networks; we tell the computer the forward  probabilities, and the computer tells us the inverse probabilities when needed." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"We thus echo the classical Bayesian literature in concluding that ‘noninformative prior information’ is a contradiction in terms. The flat prior carries information just like any other; it represents the assumption that the effect is likely to be large. This is often not true. Indeed, the signal-to-noise ratio s is often very low and then it is necessary to shrink the unbiased estimate. Failure to do so by inappropriately using the flat prior causes overestimation of effects and subsequent failure to replicate them." (Erik van Zwet & Andrew Gelman, "A proposal for informative default priors scaled by the standard error of estimates", The American Statistician 76, 2022)

23 April 2018

🔭Data Science: Independence (Just the Quotes)

"To apply the category of cause and effect means to find out which parts of nature stand in this relation. Similarly, to apply the gestalt category means to find out which parts of nature belong as parts to functional wholes, to discover their position in these wholes, their degree of relative independence, and the articulation of larger wholes into sub-wholes." (Kurt Koffka, 1931)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper?", Journal of the American Statistical Association 44, 1949)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"[A] sequence is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution." (Joel N Franklin, 1962)

"So we pour in data from the past to fuel the decision-making mechanisms created by our models, be they linear or nonlinear. But therein lies the logician's trap: past data from real life constitute a sequence of events rather than a set of independent observations, which is what the laws of probability demand. [...] It is in those outliers and imperfections that the wildness lurks." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"It is important that uncertainty components that are independent of each other are added quadratically. This is also true for correlated uncertainty components, provided they are independent of each other, i.e., as long as there is no correlation between the components." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"The fact that the same uncertainty (e.g., scale uncertainty) is uncorrelated if we are dealing with only one measurement, but correlated (i.e., systematic) if we look at more than one measurement using the same instrument shows that both types of uncertainties are of the same nature. Of course, an uncertainty keeps its characteristics (e.g., Poisson distributed), independent of the fact whether it occurs only once or more often." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"To fulfill the requirements of the theory underlying uncertainties, variables with random uncertainties must be independent of each other and identically distributed. In the limiting case of an infinite number of such variables, these are called normally distributed. However, one usually speaks of normally distributed variables even if their number is finite." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Bayesian networks provide a more flexible representation for encoding the conditional independence assumptions between the features in a domain. Ideally, the topology of a network should reflect the causal relationships between the entities in a domain. Properly constructed Bayesian networks are relatively powerful models that can capture the interactions between descriptive features in determining a prediction." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks use a graph-based representation to encode the structural relationships - such as direct influence and conditional independence - between subsets of features in a domain. Consequently, a Bayesian network representation is generally more compact than a full joint distribution (because it can encode conditional independence relationships), yet it is not forced to assert a global conditional independence between all descriptive features. As such, Bayesian network models are an intermediary between full joint distributions and naive Bayes models and offer a useful compromise between model compactness and predictive accuracy." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

06 March 2018

🔬Data Science: Bayesian Network (Definitions)

"A mathematic model in graphic form that represents a set of variables and their probabilistic independencies. It can be used, for example, to calculate the probability of a patient having a specific disease." (Attila Benko & Cecília S Lányi, "History of Artificial Intelligence", 2009) 

"A Bayesian network is a set of causally interrelated variables represented graphically in which the input information is generally subjective and can be updated in light of empirical data, by using Bayes’ theorem." (Herbert I Weisberg, "Bias and Causation: Models and Judgment for Valid Comparisons", 2010)

"A type of neural network. The Bayesian network is based on the fundamentals of probability theory." (Meta S Brown, "Data Mining For Dummies", 2014)

"A Bayesian network is a directed acyclical graph (there are no cycles in the graph) that is composed of three basic elements: 
nodes: each feature in a domain is represented by a single node in the graph.
edges: nodes are connected by directed links; the connectivity of the links in a graph encodes the influence and conditional independence relationships between nodes. 
conditional probability tables: each node has a conditional probability table (CPT) associated with it. A CPT lists the probability distribution of the feature represented by the node conditioned on the features represented by the other nodes to which a node is connected by edges." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"A representation of knowledge in the form of a directed acyclic graph representing random variables as nodes and their conditional dependencies as edges." (Petr Berka, "Machine Learning", 2015)

"They are acyclic graphical models that capture conditional dependence among random variables. Each node is associated with a function that gives the probability of finding the variable in a given state, given particular states of its parent variables." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"A graph model representing random variables with their conditional dependencies." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A particular type of statistical model that represents a set of variables and their conditional dependencies. It is usually used to make previsions in a great variety of events." (Gaetano B Ronsivalle & Arianna Boldi, "Artificial Intelligence Applied: Six Actual Projects in Big Organizations", 2019)

"A model that represents and calculates the probabilistic relationships between a set of random variables and an uncertain domain via a directed acyclic graph." (Accenture)

"Bayesian Neural Networks (BNNs) refers to extending standard networks with posterior inference in order to control over-fitting. From a broader perspective, the Bayesian approach uses the statistical methodology so that everything has a probability distribution attached to it, including model parameters (weights and biases in neural networks). In programming languages, variables that can take a specific value will turn the same result every-time you access that specific variable." (Databricks) [source]

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.