SQL Troubles

16 April 2018

🔬Data Science: Classification Tree (Definitions)

"A decision tree that is used for prediction of categorical data." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"One of the main 'workhorse' techniques in data mining; used to predict membership of cases in the classes of a categorical dependent variable from their measurements predictor variables. Classification trees typically split the sample on simple rules and then resplit the subsamples, etc., until the data can’t sustain further complexity." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A machine learning approach that uses training data to create a model that can then be used for assigning cases (for example, workers) in a dataset to different possible groupings (for example, leavers or stayers)." (Jonathan Ferrar et al, "The Power of People", 2017)

"a form of classification algorithm in which features are examined in sequence, with the response indicating the next feature to examine, until a classification is made." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"A tree showing equivalence partitions hierarchically ordered, which is used to design test cases in the classification tree method. See also classification tree method." (SQA)

13 April 2018

🔬Data Science: Text Mining (Definitions)

"The application of data mining techniques to discover actionable and meaningful patterns, profiles, and trends from documents or other text data." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"The process of evaluating unstructured text for patterns, extract actionable data and sentiment via semantic analysis, statistical methods, etc." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Performing detailed full–text searches on the content of document." (Robert F Smallwood, "Managing Electronic Records: Methods, Best Practices, and Technologies", 2013)

"Data-mining techniques applied to text. Because these rely on the same underlying analytic approaches as text analysis, text mining is synonymous with text analysis, and the use of the term mining is primarily a matter of style and context." (Meta S Brown, "Data Mining For Dummies", 2014)

"Performing detailed full-text searches on the content of document." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"It is the process of extracting information from textual sources, via their grammatical and statistical properties. Applications of text mining include security monitoring and analysis of online texts such as blogs, web-pages, web-posts, etc." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"The analysis of raw data to produce results specific to a particular inquiry (e.g., how often a particular word is used, whether a particular product is in demand, how a particular consumer reacts to advertisements)." (James R Kalyvas & Michael R Overly, "Big Data: A Businessand Legal Guide", 2015)

"Performing detailed full-text searches on the content of document." (Robert F Smallwood, "Information Governance for Healthcare Professionals", 2018)

"The search and extraction of text, and its possible conversion to numerical data that is used for data analysis." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"The process of extracting information from collections of textual data and utilizing it for business objectives." (Gartner)

10 April 2018

🔬Data Science: Abstraction (Definitions)

"A broad and general term indicating (1) a less detailed model that conforms to (defines a subset of the properties of) another model, and (2) the process through which a less detailed but conforming model is made, that is, the process of removing details that are not relevant to the purpose of the model." (Anneke Kleppe et al, "MDA Explained: The Model Driven Architecture™: Practice and Promise", 2003)

"The process of ignoring or suppressing levels of detail to provide a simpler, more generalized view." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"The process of moving from the specific to the general by neglecting minor differences or stressing common elements. Also used as a synonym for summarization." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"Data abstraction means the storage details of the data are hidden from the user and the user is provided with the conceptual view of the database." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"In data modeling, the redefinition of data entities, attributes, and relationships by removing details to broaden the applicability of data structures to a wider class of situations, often by implementing supertypes rather than subtypes." (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2010)

[horizontal abstraction:] "The process of partitioning a model into smaller subparts for presentation. Used in data modeling to show related areas in a more readable scale." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[vertical abstraction:] "The presentation of all or part of a model detail. Used in data modeling to show higher levels of entities and relationships to illustrate the basic subject area contents." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The separation of the logical view of data from its implementation." (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"The separation of a data type’s logical properties from its implementation." (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

05 April 2018

🔬Data Science: Genetic Algorithms [GA] (Definitions)

"A method for solving optimization problems using parallel search, based on the biological paradigm of natural selection and 'survival of the fittest'." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

"Algorithms for solving complex combinatorial and organizational problems with many variants, by employing analogy with nature's evolution. The general steps a genetic algorithm cycles through are: generate a new population (crossover) starting at the beginning with initial one; select the best individuals; mutate, if necessary; repeat the same until a satisfactory solution is found according to a goodness (fitness) function." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The type of algorithm that locates optimal binary strings by processing an initially random population of strings using artificial mutation, crossover, and selection operators, in an analogy with the process of natural selection." (David E Goldberg, "Genetic Algorithms", 1989)

"A technique for estimating computer models (e.g., Machine Learning) based on methods adapted from the field of genetics in biology. To use this technique, one encodes possible model behaviors into a 'genes'. After each generation, the current models are rated and allowed to mate and breed based on their fitness. In the process of mating, the genes are exchanged, and crossovers and mutations can occur. The current population is discarded and its offspring forms the next generation." (William J Raynor Jr., "The International Dictionary of Artificial Intelligence", 1999)

"Genetic algorithms are problem-solving techniques that solve problems by evolving solutions as nature does, rather than by looking for solutions in a more principled way. Genetic algorithms, sometimes hybridized with other optimization algorithms, are the best optimization algorithms available across a wide range of problem types." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps" 2nd Ed., 2000)

"learning principle, in which learning results are foully from generations of solutions by crossing and eliminating their members. An improved behavior usually ensues from selective stochastic replacements in subsets of system parameters." (Teuvo Kohonen, "Self-Organizing Maps 3rd Ed.", 2001)

"A genetic algorithm is a search method used in computational intelligence to find true or approximate solutions to optimization and search problems." (Omar F El-Gayar et al, "Current Issues and Future Trends of Clinical Decision Support Systems", 2008)

"A method of evolutionary computation for problem solving. There are states also called sequences and a set of possibility final states. Methods of mutation are used on genetic sequences to achieve better sequences." (Attila Benko & Cecília S Lányi, "History of Artificial Intelligence", 2009)

"Genetic algorithms are derivative free, stochastic optimization methods based on the concepts of natural selection and evolutionary processes." (Yorgos Goletsis et al, Bankruptcy Prediction through Artificial Intelligence, 2009)

"Genetic Algorithms (GAs) are algorithms that use operations found in natural genetics to guide their way through a search space and are increasingly being used in the field of optimisation. The robust nature and simple mechanics of genetic algorithms make them inviting tools for search learning and optimization. Genetic algorithms are based on computational models of fundamental evolutionary processes such as selection, recombination and mutation." (Masoud Mohammadian, Supervised Learning of Fuzzy Logic Systems, 2009)

"The algorithms that are modelled on the natural process of evolution. These algorithms employ methods such as crossover, mutation and natural selection and provide the best possible solutions after analyzing a group of sub-optimal solutions which are provided as inputs." (Prayag Narula, "Evolutionary Computing Approach for Ad-Hoc Networks", 2009)

"These algorithms mimic the process of natural evolution and perform explorative search. The main component of this method is chromosomes that represent solutions to the problem. It uses selection, crossover, and mutation to obtain chromosomes of highest quality." (Indranil Bose, "Data Mining in Tourism", 2009)

"Search algorithms used in machine learning which involve iteratively generating new candidate solutions by combining two high scoring earlier (or parent) solutions in a search for a better solution." (Radian Belu, "Artificial Intelligence Techniques for Solar Energy and Photovoltaic Applications", 2013)

"Genetic algorithms (GAs) is a stochastic search methodology belonging to the larger family of artificial intelligence procedures and evolutionary algorithms (EA). They are used to generate useful solutions to optimization and search problems mimicking Darwinian evolution." (Niccolò Gordini, "Genetic Algorithms for Small Enterprises Default Prediction: Empirical Evidence from Italy", 2014)

"Genetic algorithms are based on the biological theory of evolution. This type of algorithms is useful for searching and optimization." (Ivan Idris, "Python Data Analysis", 2014)

"A Stochastic optimization algorithms based on the principles of natural evolution." (Harish Garg, "A Hybrid GA-GSA Algorithm for Optimizing the Performance of an Industrial System by Utilizing Uncertain Data", 2015)

"It is a stochastic but not random method of search used for optimization or learning. Genetic algorithm is basically a search technique that simulates biological evolution during optimization process." (Salim Lahmir, "Prediction of International Stock Markets Based on Hybrid Intelligent Systems", 2016)

"Machine learning algorithms inspired by genetic processes, for example, an evolution where classifiers with the greatest accuracy are trained further." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

04 April 2018

🔬Data Science: Fuzzy Logic (Definitions)

"[Fuzzy logic is] a logic whose distinguishing features are (1) fuzzy truth-values expressed in linguistic terms, e. g., true, very true, more or less true, or somewhat true, false, nor very true and not very false, etc.; (2) imprecise truth tables; and (3) rules of inference whose validity is relative to a context rather than exact." (Lotfi A. Zadeh, "Fuzzy logic and approximate reasoning", 1975)

"A logic using fuzzy sets, that is, in which elements can have partial set membership." (Bruce P Douglass, "Real-Time Agility", 2009)

"A mathematical technique that classifies subjective reasoning and assigns data to a particular group, or cluster, based on the degree of possibility the data has of being in that group." (Mary J Lenard & Pervaiz Alam, "Application of Fuzzy Logic to Fraud Detection", 2009)

"A type of logic that recognizes more than simple true and false values. With fuzzy logic, propositions can be represented with degrees of truthfulness and falsehood thus it can deal with imprecise or ambiguous data. Boolean logic is considered to be a special case of fuzzy logic." (Lior Rokach, "Incorporating Fuzzy Logic in Data Mining Tasks", 2009)

"Fuzzy logic is an application area of fuzzy set theory dealing with uncertainty in reasoning. It utilizes concepts, principles, and methods developed within fuzzy set theory for formulating various forms of sound approximate reasoning. Fuzzy logic allows for set membership values to range (inclusively) between 0 and 1, and in its linguistic form, imprecise concepts like 'slightly', 'quite' and 'very'. Specifically, it allows partial membership in a set." (Larbi Esmahi et al, Adaptive Neuro-Fuzzy Systems, 2009)

"It is a Knowledge representation technique and computing framework whose approach is based on degrees of truth rather than the usual 'true' or 'false' of classical logic." (Juan C González-Castolo & Ernesto López-Mellado, "Fuzzy Approximation of DES State", 2009)

"Fuzzy logic is a theory that deals with reasoning that is approximate rather than precisely deduced from classical predicate logic. In other words, fuzzy logic deals with well thought out real world expert values in relation to a complex problem." (Goh B Hua, "A BIM Based Application to Support Cost Feasible ‘Green Building' Concept Decisions", 2010)

"We use the term fuzzy logic to refer to all aspects of representing and manipulating knowledge that employ intermediary truth-values. This general, commonsense meaning of the term fuzzy logic encompasses, in particular, fuzzy sets, fuzzy relations, and formal deductive systems that admit intermediary truth-values, as well as the various methods based on them." (Radim Belohlavek & George J Klir, "Concepts and Fuzzy Logic", 2011)

"Fuzzy logic is a form of many-valued logic derived from fuzzy set theory to deal with uncertainty in subjective belief. In contrast with 'crisp logic', where binary sets have two-valued logic, fuzzy logic variables can have a value that ranges between 0 and 1. Furthermore, when linguistic variables are used, these unit-interval numerical values may be described by specific functions." (T T Wong & Loretta K W Sze, "A Neuro-Fuzzy Partner Selection System for Business Social Networks", 2012)

"Fuzzy logic is a problem-solving methodology that is inspired by human decision-making, taking advantage of our ability to reason with vague or approximate data." (Filipe Quinaz et al, Soft Methods for Automatic Drug Infusion in Medical Care Environment, 2013)

"Approach of using approximate reasoning based on degrees of truth for computation analysis." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"It is a type of reasoning designed to mathematically represent uncertainty and vagueness where logical statements are not only true or false. Fuzzy logic is a formalized mathematical tool which is useful to deal with imprecise problems." (Salim Lahmir, "Prediction of International Stock Markets Based on Hybrid Intelligent Systems", 2016)

"Fuzzy logic is a problem solving tool of artificial intelligence which deals with approximate reasoning rather than fixed and exact reasoning." (Narendra K Kamila & Pradeep K Mallick, "A Novel Fuzzy Logic Classifier for Classification and Quality Measurement of Apple Fruit", 2016)

"A form of many-valued logic. Fuzzy logic deals with reasoning that is approximate rather than fixed and exact. Compared to traditional true or false values, fuzzy logic variables may have a truth value that ranges in degree from 0 to 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false." (Roanna Lun & Wenbing Zhao, "Kinect Applications in Healthcare", 2018)

"Fuzzy logic is a computing approach based on multi-valued logic where the variable can take any real number between 0 and 1 as a value based on degree of truthness." (Kavita Pandey & Shikha Jain, "A Fuzzy-Based Sustainable Solution for Smart Farming", 2020)

"Fuzzy Logic is a form of mathematical logic in which the truth values of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1." (Alexander P Ryjov & Igor F Mikhalevich, "Hybrid Intelligence Framework for Improvement of Information Security of Critical Infrastructures", 2021)

"Fuzzy Logic is a form of logic system, where the distinction between truth and false values is not binary but multi valued, therefore allowing for a richer expression of logical statements. " (Accenture)

🔬Data Science: Normal Distribution (Definitions)

"A frequency distribution for a continuous variable, which exhibits a bell-shaped curve." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"The symmetric distribution of data about an average point. The normal distribution takes on the form of a bell-shaped curve. It is a graphic illustration of how randomly selected data points from a product or process response will mostly fall close to the average response, with fewer and fewer data points falling farther and farther away from the mean. The normal distribution can also be expressed as a mathematical function and is often called a Gaussian distribution." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A probability distribution forming a symmetrical bell-shaped curve." (Peter Oakander et al, "CPM Scheduling for Construction: Best Practices and Guidelines", 2014)

"Distribution of scores that are characterised by a bell-shaped curve in which the probability of a score drops off rapidly from the midpoint to the tails of the distribution. A true normal curve is defined by a mathematical equation and is a function of two variables (the mean and variance of the distribution)." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Also known as a bell-shaped curve or Gaussian curve, this is a distribution of data that is symmetrical around the mean: The mean, median, and mode are all equal, with more density in the center and less in the tails." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"Also known as normal or the bell curve, is a type of continuous probability distribution which is defined by two parameters, the mean µ, and the standard deviation s." (Accenture)

🔬Data Science: Graph (Definitions)

"Informally, a graph is a finite set of dots called vertices (or nodes) connected by links called edges (or arcs). More formally: a simple graph is a (usually finite) set of vertices V and set of unordered pairs of distinct elements of V called edges." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A computation object that is used to model relationships among things. A graph is defined by two finite sets: a set of nodes and a set of edges. Each node has a label to identify it and distinguish it from other nodes. Edges in a graph connect exactly two nodes and are denoted by the pair of labels of nodes that are related." (Clay Breshears, "The Art of Concurrency", 2009)

"A graph in mathematics is a set of nodes and a set of edges between pairs of those nodes; the edges are ordered or nonordered pairs, or a relation, that defines the pairs of nodes for which the relation being examined is valid. […] The edges can either be undirected or directed; directed edges depict a relation that requires the nodes to be ordered while an undirected edge defines a relation in which no ordering of the edges is implied." (Dennis M Buede, "The Engineering Design of Systems: Models and methods", 2009)

[undirected graph:] "A graph in which the nodes of an edge are unordered. This implies that the edge can be thought of as a two-way path." (Clay Breshears, "The Art of Concurrency", 2009)

[directed graph:] "A graph whose edges are ordered pairs of nodes; this allows connections between nodes in one direction. When drawn, the edges of a directed graph are commonly shown as arrows to indicate the “direction” of the edge." (Clay Breshears, "The Art of Concurrency", 2009)

"1.Generally, a set of homogeneous nodes (vertices) and edges (arcs) between pairs of nodes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[directed acyclic graph:] "A graph that defines a partial order so that nodes can be sorted into a linear sequence with references only going in one direction. A directed acyclic graph has, as its name suggests, directed edges and no cycles." (Michael McCool et al, "Structured Parallel Programming", 2012)

"A data structure that consists of a set of nodes and a set of edges that relate the nodes to each other" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[directed graph:] "A directed graph is one in which the edges have a specified direction from one vertex to another." (Dan Sullivan, "NoSQL for Mere Mortals", 2015)

[directed graph (digraph):] "A graph in which each edge is directed from one vertex to another (or the same) vertex" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[undirected graph:] "A graph in which the edges have no direction" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[undirected graph:] "An undirected graph is one in which the edges do not indicate a direction (such as from-to) between two vertices." (Dan Sullivan, "NoSQL for Mere Mortals®", 2015)

"Like a tree, a graph consists of a set of nodes connected by edges. These edges may or may not have a direction. If they do, the graph is referred to as a 'directed graph'. If a graph is directed, it may be possible to start at a node and follow edges in a path that leads back to the starting node. Such a path is called a 'cycle'. If a directed graph has no cycles, it is referred to as an 'acyclic graph'." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"In a computer science or mathematics context, a graph is a set of nodes and edges that connect the nodes." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

[undirected graph:] "A graph in which the edges have no direction" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)
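
The distinctions drawn above (directed versus undirected edges, and cycles in directed graphs) can be sketched with an adjacency-set representation. This is one common way to store a graph, not the only one; the function names are hypothetical.

```python
def build_graph(edges, directed=False):
    """Build an adjacency mapping {node: set of neighbours} from (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())  # make sure v is present even with no out-edges
        if not directed:
            adjacency[v].add(u)  # an undirected edge is a two-way path
    return adjacency


def has_cycle(adjacency):
    """Return True if a directed graph contains a cycle (depth-first search)."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in adjacency}

    def visit(node):
        state[node] = IN_PROGRESS
        for neighbour in adjacency[node]:
            if state[neighbour] == IN_PROGRESS:
                return True  # reached a node on the current path again
            if state[neighbour] == UNVISITED and visit(neighbour):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNVISITED and visit(node) for node in adjacency)


dag = build_graph([("a", "b"), ("b", "c"), ("a", "c")], directed=True)
cyclic = build_graph([("a", "b"), ("b", "c"), ("c", "a")], directed=True)
```

Here `dag` is a directed acyclic graph, while `cyclic` contains the path a, b, c, a that leads back to its starting node.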

🔬Data Science: Heuristic (Definitions)

"Problem solving or analysis by experimental and especially trial-and-error methods." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The mode of analysis in which the next step is determined by the results of the current step. Used for decision support processing." (Margaret Y Chu, "Blissful Data ", 2004)

"A type of analysis in which the next step is determined by the results of the current step of analysis." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

"The mode of analysis in which the next step is determined by the results of the current step of analysis. Used for decision-support processing." (William H Inmon, "Building the Data Warehouse", 2005)

"An algorithmic technique designed to solve a problem that ignores whether the solution can be proven to be correct." (Omar F El-Gayar et al, "Current Issues and Future Trends of Clinical Decision Support Systems", 2008)

"General advice that is usually efficient but sometimes cannot be used; also it is a validate function that adds a number to the state of the problem." (Attila Benko & Cecília S Lányi, "History of Artificial Intelligence", 2009)

"These methods, found through discovery and observation, are known to produce incorrect or inexact results at times but likely to produce correct or sufficiently exact results when applied in commonly occurring conditions." (Vineet R Khare & Frank Z Wang, "Bio-Inspired Grid Resource Management", Handbook of Research on Grid Technologies and Utility Computing, 2009)

"Refers to a search and discovery approach, in which we proceed gradually, without trying to find out immediately whether the partial result, which is only adopted on a provisional basis, is true or false. This method is founded on a gradual approach to a given question, using provisional hypotheses and successive evaluations." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

"'Rules of thumb' and approximation methods for obtaining a goal, a high quality solution, or improved performance. It sacrifices completeness to increase efficiency, as some potential solutions would not be practicable or acceptable due to their 'rareness' or 'complexity'. This method may not always find the best solution, but it will find an acceptable solution within a reasonable timeframe for problems that will require almost infinite or longer than acceptable times to compute." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An experience-based technique for solving problems that emphasizes personal knowledge and quick decision making." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

[heuristic process:] "An iterative process, where the next step of analysis depends on the results attained in the current level of analysis" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"An algorithm that gives a good solution for a problem but that doesn’t guarantee to give you the best solution possible." (Rod Stephens, "Beginning Software Engineering", 2015)

"Rules of thumb derived by experience, intuition, and simple logic." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Problem-solving technique that yields a sub-optimal solution judged to be sufficient." (Karl Beecher, "Computational Thinking - A beginner's guide to problem-solving and programming", 2017)

"An algorithm to solve a problem simply and quickly with an approximate solution, as compared to a complex algorithm that provides a precise solution, but may take a prohibitively long time." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

30 March 2018

🔬Data Science: Decision Tree (Definitions)

"Decision trees are a way of representing a series of rules that lead to a class or value. For example, the goal may be to classify a group of householders who have moved to a new house, based on their choice of type of the new dwelling. A simple decision tree can solve this problem and illustrate all the basic components of a decision tree (the decision nodes, branches, and leaves)." (William A V Clark & Marinus C Deurloo, "Categorical Modeling/Automatic Interaction Detection", Encyclopedia of Social Measurement, 2005)

"A decision tree is a graphical representation of various alternatives and sequence of events in these multi-stage decision problems." (P C Tulsian and Vishal Pandey, "Quantitative Techniques: Theory and Problems", 2006)

"A representation of a hierarchical set of rules that lead to sets of observations based on the class or value of the response variable." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A decision-making method that uses a branch diagram to portray different options and outcomes." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"It is technique for classifying data. The root node of a decision tree represents all examples. If these examples belong to two or more classes, then the most discriminating attribute is selected and the set is split into multiple classes." (Indranil Bose, "Data Mining in Tourism", 2009)

"A graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions. A decision tree is a special form of tree structure. Regression trees approximate real-valued functions (e.g., estimate the price of a house or a patient's length of stay in a hospital). Classification trees define the logic for categorization using Boolean variables such as gender (male or female) or game results (lose or win)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A treelike model of data produced by certain data mining methods. Decision trees can be used for prediction." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A graphic tool for specifying the action that will result from each combination of a set of conditions." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"An algorithm that focuses on maximizing group separation by iteratively splitting variables." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"Decision trees are decision support models that classify patterns using a sequence of well-defined rules. They are tree-like graphs in which each branch node represents an option between a number of alternatives, and each leaf node represents an outcome of the cumulative choices." (Joo Chuan Tong & Shoba Ranganathan, "Computational T cell vaccine design", Computer-Aided Vaccine Design, 2013)

"The Decision Tree is a form of flow diagram that helps to map out complicated decision-making processes, or the possible directions a conversation or interaction might take." (Kevin Duncan, "The Diagrams Book", 2013)

"A family of classification methods whose results are usually represented in a tree-like graph." (Meta S Brown, "Data Mining For Dummies", 2014)

"A tool to help make decisions based on a set of rules that help to navigate the tree along its branches." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"An algorithm that focuses on maximizing group separation by iteratively splitting variables." (Evan Stubbs, "Big Data, Big Innovation", 2014)

"A representation of knowledge in a tree-like form usually used for classification. The non-terminal nodes of the tree represent questions, the terminal nodes represent class labels and the edges represent answers to questions." (Petr Berka, "Machine Learning", 2015)

"Decision tree learning is a supervised machine learning technique for inducing a decision tree from training data. A decision tree (also referred to as a classification tree or a reduction tree) is a predictive model which is a mapping from observations about an item to conclusions about its target value." (Lin Tan, "The Art and Science of Analyzing Software Data", 2015)

"A simple decision tree is an algorithm for determining a decision by making a sequence of logical or property tests." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"An organised pathway of ideas leading to a defined goal, in which at various points, a decision is made about which of two ‘branches’ of ideas to follow to the next decision point." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A decision tree is a largely used non-parametric effective machine learning modeling technique for regression and classification problems." (Thomas Plapinger, "What is a Decision Tree", 2017)

"A decision tree is the arrangement of data in a tree structure where, at each node, data is separated into different branches according to the value of the attribute at the node." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A model classifying a data item into one of the classes at the leaf node, based on matching properties between the branches on the tree and the actual data item." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"Decision tree is a technique that helps us in deriving rules from data. A rule-based technique is very helpful in explaining how the model is supposed to work in estimating a dependent variable value." (V Kishore Ayyadevara et al, "Hands-On Machine Learning on Google Cloud Platform", 2018)

"Decision trees are a machine learning algorithm that predicts the value of a target variable based on decision rules learned from training data. The algorithm can be applied to both regression and classification problems by changing the objective function that governs how the tree learns the decision rules." (Stefan Jansen, "Hands-On Machine Learning for Algorithmic Trading", 2018)

"A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility." (James D Miller, "Hands-On Machine Learning with IBM Watson", 2019)

"In a machine learning context, a decision tree is a data structure that is built for classification or regression tasks. Each node in the tree splits on a particular feature." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision." (WhatIs)

"A tree and branch-based model, like a flow chart, used to map decisions and their possible consequences. The decision tree is widely used in machine learning for classification and regression algorithms." (Accenture)

"A treelike model of data produced by certain data mining methods." (Microsoft Technet)
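
Several of the definitions above describe the same mechanics: non-terminal nodes ask a question about one feature, edges carry the answers, and leaf nodes hold the class label. A minimal sketch of that traversal follows; the tree, feature names, and labels are hypothetical and hand-written for illustration, not learned from training data.

```python
# Hypothetical toy tree: internal nodes are dicts that name the feature to test,
# branches map each possible answer to a subtree, and leaves are plain labels.
TREE = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
    },
}


def classify(node, item):
    """Walk from the root to a leaf, following the branch that matches the item."""
    while isinstance(node, dict):
        answer = item[node["feature"]]   # the question asked at this node
        node = node["branches"][answer]  # the edge chosen by the answer
    return node                          # a leaf: the class label


print(classify(TREE, {"outlook": "sunny", "humidity": "normal"}))
```

Learning such a tree from data (choosing which feature to split on at each node) is a separate step that this sketch does not cover.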

🔬Data Science: Forecast/Forecasting (Definitions)

"1. A projection or an estimate of future sales, revenue, earnings, or costs. 2. A projection of future financial position and operating results of an organization." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"The outcome of a series of exercises and analysis that helps a company, division, or product group to predict the number of units they might sell or produce, or the market share they could attain." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"An estimate or prediction of conditions and events in the project's future, based on information and knowledge available at the time of the forecast. The information is based on the project's past performance and expected future performance, and includes information that could impact the project in the future, such as estimate at completion and estimate to complete." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"Refers to the operation responding to a wish to 'see in advance' what will happen later in a given field. Forecasting methods typically rely on data from the past to make forward-looking extrapolations; they assume continuity with possible inflections based on expert opinion(s)." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

"Anticipating the future using quantitative techniques, such as mathematical and statistical rules and analysis of past data to predict the future, plus qualitative techniques, such as expert judgment and opinions to validate or adjust predictions." (Joan C Dessinger, "Fundamentals of Performance Improvement" 3rd Ed., 2012)

"A numerical prediction of a future value for a time series. Forecasting techniques are used to identify previously unseen trends and anticipate fluctuations to facilitate better planning." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"The practice of predicting or estimating a future event or trend, typically from historical data." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"A planning tool to help management to cope with the uncertainty of the future. It is based on certain assumptions based on management’s experience, knowledge and judgment and these estimates are projected into the future using techniques such as Box-Jenkins models, Delphi method, exponential smoothing, moving averages, regression analysis and trend projection. The technique of sensitivity analysis is also often used which assigns a range of values to uncertain variables in order to reduce potential errors." (Duncan Angwin & Stephen Cummings, "The Strategy Pathfinder" 3rd Ed., 2017)

"Estimates or predictions of conditions and events in the project's future based on information and knowledge available at the time of the forecast. Forecasts are updated and reissued based on work performance information provided as the project is executed." (Project Management Institute, "Practice Standard for Scheduling" 3rd Ed., 2019)

"Forecast usually refers to a projected value for a metric. Organizations will often create a forecast that is different than their target for a given metric. There are multiple types of forecasting methods for creating forecasts based on past data and usage of them varies widely across organizations." (Intrafocus)
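
Two of the techniques named above, moving averages and exponential smoothing, are simple enough to sketch directly. The function names and the sample series below are assumptions made for illustration; real forecasting work would also validate the forecasts against held-out data.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing: each new observation updates the level
    as level = alpha * observation + (1 - alpha) * previous level."""
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for observation in history[1:]:
        level = alpha * observation + (1 - alpha) * level
    return level  # the final level is the one-step-ahead forecast


# Hypothetical monthly sales figures.
sales = [100, 110, 120, 130]
print(moving_average_forecast(sales, window=2))
print(exponential_smoothing_forecast(sales, alpha=0.5))
```

Both methods extrapolate from past data only, which is the continuity assumption the definitions above mention.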

29 March 2018

🔬Data Science: Mining Model (Definitions)

"An object that contains the definition of a data mining process and the results of the training activity." (Microsoft Technet)

"Built from a mining structure, the mining model applies an algorithm to the data and processes it so that predictions can be made." (Sara Morgan & Tobias Thernstrom, "MCITP Self-Paced Training Kit: Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"An object that contains the definition of a data mining process and the results of the training activity. For example, a data mining model may specify the input, output, algorithm, and other properties of the process and hold the information gathered during the training activity, such as a decision tree." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The output of a data mining function that describes patterns and relationships that are discovered in historical data. A data mining model can be applied to new data for predicting likely new outcomes." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

23 March 2018

🔬Data Science: Self-Similarity (Definitions)

"A process is said to be self-similar if its behavior is roughly the same across different spatial or time scales." (Artur Ziviani, "Internet Measurements", 2008)

"Self-similarity implies that a change of the time scale is equivalent to a change in state space scale. For discrete processes, self-similarity can be described as distributional invariance upon aggregation and scaling." (Federico Montesino Pouzols et al, "Performance Measurement of Computer Networks", 2008)

"When applied to stochastic processes, it indicates that the process follows the same distribution on all time scales." (David Rincón & Sebastià Sallent, "Scaling Properties of Network Traffic", 2008)

18 March 2018

🔬Data Science: Linear Regression (Definitions)

"A regression model that uses the equation for a straight line." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A quantitative model building tool that relates one or more independent variables (Xs) to a single dependent variable (Y)." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"A regression that deals with a straight-line relationship between variables. It is in the form of Y = a + bX, whereas nonlinear regression involves curvilinear relationships, such as exponential and quadratic functions." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"In statistics, a method of modeling the relationship between dependent and independent variables. Linear regression creates a model by fitting a straight line to the values in a dataset." (Meta S Brown, "Data Mining For Dummies", 2014)

"Linear regression is a statistical technique for modeling the relationship between a single variable and one or more other variables. In a machine learning context, linear regression refers to a regression model based on this statistical technique." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"is an area of unsupervised machine learning that uses linear predictor functions to understand the relationship between a scalar dependent variable and one or more explanatory variables." (Accenture)
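
The straight-line form Y = a + bX quoted above can be fitted by ordinary least squares. The sketch below handles the single-predictor case only; the function name and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and of cross deviations of X and Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```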

16 March 2018

🔬Data Science: Monte Carlo Simulation (Definitions)

"A computer-simulation technique that uses sampling from a random number sequence to simulate characteristics or events or outcomes with multiple possible values." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R&D Executives, Technical Leaders, and Engineering Managers", 2006)

"A simulation in which random events are modeled using pseudo random number generators so that many replications of the random events may be evaluated statistically." (Norman Pendegraft & Mark Rounds, "Dynamic System Simulation for Decision Support", 2008)

"A range of computational algorithms that generates random samples from distributions with known overall properties that is used, for example, to explore potential future behaviours of financial instruments on the basis of historic properties." (Bin Li & Lee Gillam, "Grid Service Level Agreements Using Financial Risk Analysis Techniques", 2010)

"A process which generates hundreds or thousands of probable performance outcomes based on probability distributions for cost and schedule on individual tasks. The outcomes are then used to generate a probability distribution for the project as a whole." (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"Monte Carlo is able to discover practical solutions to otherwise intractable problems because the most efficient search of an unmapped territory takes the form of a random walk. Today’s search engines, long descended from their ENIAC-era ancestors, still bear the imprint of their Monte Carlo origins: random search paths being accounted for, statistically, to accumulate increasingly accurate results. The genius of Monte Carlo - and its search-engine descendants - lies in the ability to extract meaningful solutions, in the face of overwhelming information, by recognizing that meaning resides less in the data at the end points and more in the intervening paths." (George B Dyson, "Turing's Cathedral: The Origins of the Digital Universe", 2012)

"The technique used by project management applications to estimate the likely range of outcomes from a complex random process by simulating the process a large number of times." (Christopher Carson et al, "CPM Scheduling for Construction: Best Practices and Guidelines", 2014)

"A method for estimating uncertainty in a variable which is a complex function of one or more probability distributions; it uses random numbers to provide an estimate of the distribution and a random number generator to produce random samples from the probabilistic levels." (María C Carnero, "Benchmarking of the Maintenance Service in Health Care Organizations", 2017)

"An analysis technique where a computer model is iterated many times, with the input values chosen at random for each iteration driven by the input data, including probability distributions and probabilistic branches. Outputs are generated to represent the range of possible outcomes for the project." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide )", 2017)

"A computerized simulation technique which is usually used for analyzing the behaviour of a system or a process involving uncertainties." (Henry Xu & Renae Agrey, "Major Techniques and Current Developments of Supply Chain Process Modelling", 2019)

"'What if' analysis of the future project scenarios, provided a mathematical/ logical model of the project implemented on a computer." (Franco Caron, "Project Control Using a Bayesian Approach", 2019)
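
The project-management definitions above describe sampling task durations from probability distributions many times and summarizing the resulting totals. A minimal sketch under stated assumptions: the three tasks and their three-point estimates are hypothetical, tasks are assumed to run strictly one after another, and a triangular distribution is chosen only because it is easy to parameterize.

```python
import random

# Hypothetical tasks: (optimistic, most likely, pessimistic) durations in days.
TASKS = [(2, 4, 8), (1, 2, 5), (3, 5, 10)]


def simulate_project(tasks, runs, seed=0):
    """Return the sorted total durations from `runs` simulated projects."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    totals = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode), in that order.
        total = sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
        totals.append(total)
    return sorted(totals)


totals = simulate_project(TASKS, runs=10_000)
mean = sum(totals) / len(totals)
p80 = totals[int(0.8 * len(totals))]  # duration not exceeded in about 80% of runs
print(f"mean: {mean:.2f} days, 80th percentile: {p80:.2f} days")
```

The sorted totals approximate the probability distribution for the project as a whole, from which percentiles such as the 80% figure can be read.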

15 March 2018

🔬Data Science: Neural Network [NN] (Definitions)

"Information processing systems, inspired by biological neural systems but not limited to modeling such systems. Neural networks consist of many simple processing elements joined by weighted connection paths." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"A computing model based on the architecture of the brain consisting of multiple simple processing units connected by adaptive weights." (Joseph P Bigus, "Data Mining with Neural Networks", 1996)

[Feedback neural network:] "A network in which there are connections from output to input neurons." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

[Feedforward neural network:] "A neural network in which there are no connections back from output to input neurons." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

[Fuzzy neural network (FNN):] "Neural network designed to realize a fuzzy system, consisting of fuzzy rules, fuzzy variables, and fuzzy values defined for them and the fuzzy inference method." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

[Probabilistic neural network (PNN):] "A feedforward neural network trained using supervised learning that allocates a hidden unit for each input pattern." (Joseph P Bigus, "Data Mining with Neural Networks", 1996)

"A system that applies neural computation. An adaptive, nonlinear dynamical system. Its equilibrium states can recall or recognize a stored pattern or can solve a mathematical or computational problem." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps" 2nd Ed., 2000)

"A nonlinear modeling technique comprising of a series of interconnected nodes with weights, which are adjusted as the network learns." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A network modelled after the neurons in a biological nervous system with multiple synapses and layers. It is designed as an interconnected system of processing elements organized in a layered parallel architecture. These elements are called neurons and have a limited number of inputs and outputs. NNs can be trained to find nonlinear relationships in data, enabling specific input sets to lead to given target outputs." (Ioannis Papaioannou et al, "A Survey on Neural Networks in Automated Negotiations", Encyclopedia of Artificial Intelligence, 2009)

"A network of many simple processors ('units' or 'neurons') that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis." (Fernando Mateo et al, "A 2D Positioning Application in PET Using ANNs", Encyclopedia of Artificial Intelligence, 2009)

[Probabilistic Neural Network (PNN):] "A neural network using kernel-based approximation to form an estimate of the probability density functions of classes in a classification problem." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Structure composed of a group of interconnected artificial neurons or units. The objective of a NN is to transform the inputs into meaningful outputs." (M Paz S Lorente et al, "Ensemble of ANN for Traffic Sign Recognition", Encyclopedia of Artificial Intelligence, 2009)

"Techniques modeled after the (hypothesized) processes of learning in the cognitive system and the neurological functions of the brain and capable of predicting new observations (on specific variables) from other observations (on the same or other variables) after inducing a model from existing data. These techniques are also sometimes described as flexible nonlinear regression models, discriminant models, data reduction models, and multilayer nonlinear models." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A dynamic system in which outputs are calculated by a summation of weighted functions operating on inputs. Weights for the individual functions are determined by a learning process, simulating the learning process hypothesized for human neurons. In the computer model, individual functions that contribute to a correct output (based on the training data) have their weights increased (strengthening their influence to the calculated output)." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"An algorithm that conceptually mimics the learning patterns of biological neural networks by adaptively adjusting a series of classification functions in a nonlinear nature to maximize predictive accuracy, given a series of inputs." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A family of model types capable of simulating some very complex systems." (Meta S Brown, "Data Mining For Dummies", 2014)

"A neural network is a network of neurons - units with inputs and outputs. The output of a neuron can be passed to a neuron and so on, thus creating a multilayered network. Neural networks contain adaptive elements, making them suitable to deal with nonlinear models and pattern recognition problems." (Ivan Idris, "Python Data Analysis", 2014)

"Neural network algorithms are designed to emulate human/animal brains. The network consists of input nodes, hidden layers, and output nodes. Each of the units is assigned a weight. Using an iterative approach, the algorithm continuously adjusts the weights until it reaches a specific stopping point." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"A model composed of a network of simple processing units called neurons and connections between neurons called synapses. Each synapse has a direction and a weight, and the weight defines the effect of the neuron before on the neuron after." (Ethem Alpaydın, "Machine learning : the new AI", 2016)

"A powerful set of algorithms whose objective is to find a pattern of behavior. They are called neural because they are based on how biological neurons work when processing information. These networks try to simulate the way the neural network of a live being processes, recognizes and transmits the information. The implementation of neural networks in very different fields is due to their good performance relative to other methods." (Felix Lopez-Iturriaga & Iván Pastor-Sanz, "Using Self Organizing Maps for Banking Oversight: The Case of Spanish Savings Banks", 2016)

"Neural networks are learning algorithms that mimic the human brain in learning mechanics and complexity." (Davy Cielen et al, "Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools", 2016)

"A machine learning algorithm consisting of a network of simple classifiers that make decisions based on the input or the results of the other classifiers in the network." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A type of machine-learning model that is implemented as a network of simple processing units called neurons. It is possible to create a variety of different types of neural networks by modifying the topology of the neurons in the network. A feed-forward, fully connected neural network is a very common type of network that can be trained using backpropagation." (John D Kelleher & Brendan Tierney, "Data science", 2018)

"Neural networks refer to a family of models that are defined by an input layer (a vectorized representation of input data), a hidden layer that consists of neurons and synapses, and an output layer with the predicted values. Within the hidden layer, synapses transmit signals between neurons, which rely on an activation function to buffer incoming signals. The synapses apply weights to incoming values, and the activation function determines if the weighted inputs are sufficiently high to activate the neuron and pass the values on to the next layer of the network." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Fully connected network with minimum of three layers namely input layer, output layer and hidden layer." (S Kayalvizhi & D Thenmozhi, "Deep Learning Approach for Extracting Catch Phrases from Legal Documents", 2020)

"An artificial network of nodes, used for predictive modelling. It is generally used to tackle classification problems and AI related applications." (R Karthik et al, "Performance Analysis of GAN Architecture for Effective Facial Expression Synthesis", 2021)

"A neural network (NN) is a network of many simple processors ('units'), each possibly having a small amount of local memory. The units are connected by communication channels ('connections') which usually carry numeric (as opposed to symbolic) data, encoded by any of various means. The units operate only on their local data and on the inputs they receive via the connections." (Statistics.com)

"Are a very advanced and elegant form of computing system. Machine learning neural networks consist of an interconnected set of 'nodes' which mimic the network of neurons in a biological brain. Common applications include optical character recognition and facial recognition." (Accenture)
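
Many of the definitions above share one computation: each unit forms a weighted sum of its inputs and passes it through an activation function, and layers of such units feed forward into one another. A minimal sketch of that forward pass follows; the weights and biases are hypothetical hand-picked numbers, the sigmoid is one common choice of activation, and training (adjusting the weights, for example by backpropagation) is not shown.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each unit outputs sigmoid(weighted sum of the inputs + its bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output unit.
HIDDEN_W = [[0.5, -0.5], [1.0, 1.0]]  # one row of weights per hidden unit
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [0.0]


def predict(inputs):
    """Feed the inputs forward through the hidden layer to the output unit."""
    hidden = layer_forward(inputs, HIDDEN_W, HIDDEN_B)
    return layer_forward(hidden, OUTPUT_W, OUTPUT_B)[0]


print(predict([1.0, 1.0]))
```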