30 December 2018

Data Science: Matching (Just the Quotes)

"A physical theory must accept some actual data as inputs and must be able to generate from them another set of possible data (the output) in such a way that both input and output match the assumptions of the theory - laws, constraints, etc. This concept of matching involves relevance: thus boundary conditions are relevant only to field-like theories such as hydrodynamics and quantum mechanics. But matching is more than relevance: it is also logical compatibility." (Mario Bunge, "Philosophy of Physics", 1973)

"The matching procedure often helps inform the reader what is going on in the data […] Matching has some defects, chiefly that it is difficult to do a very good job of matching in complex situations without a large number of cases. […] One limitation of matching, then, is that quite often the match is not very accurate. A second limitation is that if we want to control for more than one variable using matching procedures, the tables begin to have combinations of categories without any cases at all in them, and they become somewhat more difficult for the reader to understand." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Generalization is the process of matching new, unknown input data with the problem knowledge in order to obtain the best possible solution, or one close to it. Generalization means reacting properly to new situations, for example, recognizing new images, or classifying new objects and situations. Generalization can also be described as a transition from a particular object description to a general concept description. This is a major characteristic of all intelligent systems." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"More than just a new computing architecture, neural networks offer a completely different paradigm for solving problems with computers. […] The process of learning in neural networks is to use feedback to adjust internal connections, which in turn affect the output or answer produced. The neural processing element combines all of the inputs to it and produces an output, which is essentially a measure of the match between the input pattern and its connection weights. When hundreds of these neural processors are combined, we have the ability to solve difficult problems such as credit scoring." (Joseph P Bigus, "Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)
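One way to read the "measure of the match" in the quote above is as a weighted sum of the inputs passed through a squashing function. The sketch below is only an illustration under that assumption: the function name `neuron_output`, the logistic activation, and the example weights are hypothetical choices, not taken from Bigus.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a logistic (sigmoid) activation.

    Assumption: a single processing element modeled as dot product + sigmoid.
    """
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical connection weights for one processing element.
weights = [1.0, -1.0, 1.0]

# An input pattern aligned with the weights yields an output near 1 ...
aligned = neuron_output([1.0, -1.0, 1.0], weights)
# ... while the opposite pattern yields an output near 0.
opposed = neuron_output([-1.0, 1.0, -1.0], weights)

print(aligned, opposed)
```

The closer the input pattern lies to the direction of the weight vector, the larger the weighted sum and the higher the output, which is the sense in which the output measures a match.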

"Because No Free Lunch theorems dictate that no optimization algorithm can be considered more efficient than any other when considering all possible functions, the desired function class plays a prominent role in the model. In particular, this provides a tractable way to answer the traditionally difficult question of what algorithm is best matched to a particular class of functions. Among the benefits of the model are the ability to specify the function class in a straightforward manner, a natural way to specify noisy or dynamic functions, and a new source of insight into No Free Lunch theorems for optimization." (Christopher K Monson, "No Free Lunch, Bayesian Inference, and Utility: A Decision-Theoretic Approach to Optimization", [thesis] 2006)

"Whereas regression is about attempting to specify the underlying relationship that summarises a set of paired data, correlation is about assessing the strength of that relationship. Where there is a very close match between the scatter of points and the regression line, correlation is said to be 'strong' or 'high'. Where the points are widely scattered, the correlation is said to be 'weak' or 'low'." (Alan Graham, "Developing Thinking in Statistics", 2006)
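The distinction in the quote above can be made concrete with a small sketch: a least-squares line specifies the relationship, and Pearson's r assesses how closely the points match that line. The helper names `fit_line` and `pearson_r` and the two toy data sets are assumptions made for illustration only.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
tight = [2.1, 3.9, 6.2, 7.8, 10.1]   # points hug a line: strong correlation
loose = [6.0, 1.0, 9.0, 3.0, 8.0]    # points widely scattered: weak correlation

print(fit_line(xs, tight), pearson_r(xs, tight))
print(fit_line(xs, loose), pearson_r(xs, loose))
```

Both data sets get a fitted line, but only the first has a correlation close to 1; the second shows that a line can always be fitted even when the match between line and points is poor.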

"A decision theory that rests on the assumptions that human cognitive capabilities are limited and that these limitations are adaptive with respect to the decision environments humans frequently encounter. Decision are thought to be made usually without elaborate calculations, but instead by using fast and frugal heuristics. These heuristics certainly have the advantage of speed and simplicity, but if they are well matched to a decision environment, they can even outperform maximizing calculations with respect to accuracy. The reason for this is that many decision environments are characterized by incomplete information and noise. The information we do have is usually structured in a specific way that clever heuristics can exploit." (E Ebenhoh, "Agent-Based Modeling with Boundedly Rational Agents", 2007)

"Learning a complicated function that matches the training data closely but fails to recognize the underlying process that generates the data. As a result of overfitting, the model performs poor on new input. Overfitting occurs when the training patterns are sparse in input space and/or the trained networks are too complex." (Frank Padberg, "Counting the Hidden Defects in Software Documents", 2010)

"Unfortunately, creating an objective function that matches the true goal of the data mining is usually impossible, so data scientists often choose based on faith and experience." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Matching is a family of methods for estimating causal effects by matching similar observations (or units) in the treatment and non-treatment groups. The goal of matching is to make comparisons between similar units in order to achieve as precise an estimate of the true causal effect as possible." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)
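As a rough illustration of the idea in the quote above, the sketch below pairs each treated unit with the control unit closest to it on a single covariate and averages the outcome differences. This is one simple member of the matching family (nearest-neighbor matching with replacement), not necessarily the method Molak uses; the function name `att_nearest_neighbor` and the toy data are hypothetical.

```python
def att_nearest_neighbor(treated, control):
    """Average treatment effect on the treated via 1-nearest-neighbor matching.

    Each unit is a (covariate, outcome) pair. Control units may be reused
    (matching with replacement). Assumes a single numeric covariate.
    """
    differences = []
    for x_t, y_t in treated:
        # Find the control unit most similar to this treated unit.
        _, y_c = min(control, key=lambda unit: abs(unit[0] - x_t))
        differences.append(y_t - y_c)
    return sum(differences) / len(differences)

# Hypothetical (age, outcome) data; the last control unit has no treated peer.
treated = [(25, 12.0), (40, 15.0), (55, 19.0)]
control = [(24, 10.0), (41, 13.0), (54, 17.0), (70, 30.0)]

matched_estimate = att_nearest_neighbor(treated, control)

# Naive comparison of group means, ignoring how similar the units are.
naive_estimate = (sum(y for _, y in treated) / len(treated)
                  - sum(y for _, y in control) / len(control))

print(matched_estimate, naive_estimate)
```

In this toy data the naive difference of means is negative because of one dissimilar control unit, while comparing only similar units gives a positive estimate, which is the motivation for matching that the quote describes.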
