30 March 2018

Data Science: Decision Tree (Definitions)

"Decision trees are a way of representing a series of rules that lead to a class or value. For example, the goal may be to classify a group of householders who have moved to a new house, based on their choice of type of the new dwelling. A simple decision tree can solve this problem and illustrate all the basic components of a decision tree (the decision nodes, branches, and leaves)." (William A V Clark & Marinus C Deurloo, "Categorical Modeling/Automatic Interaction Detection", Encyclopedia of Social Measurement, 2005)

"A decision tree is a graphical representation of various alternatives and sequence of events in these multi-stage decision problems." (P C Tulsian and Vishal Pandey, "Quantitative Techniques: Theory and Problems", 2006)

"A representation of a hierarchical set of rules that lead to sets of observations based on the class or value of the response variable." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A decision-making method that uses a branch diagram to portray different options and outcomes." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"It is [a] technique for classifying data. The root node of a decision tree represents all examples. If these examples belong to two or more classes, then the most discriminating attribute is selected and the set is split into multiple classes." (Indranil Bose, "Data Mining in Tourism", 2009)

"A graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions. A decision tree is a special form of tree structure. Regression trees approximate real-valued functions (e.g., estimate the price of a house or a patient's length of stay in a hospital). Classification trees define the logic for categorization using Boolean variables such as gender (male or female) or game results (lose or win)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A treelike model of data produced by certain data mining methods. Decision trees can be used for prediction." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A graphic tool for specifying the action that will result from each combination of a set of conditions." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"An algorithm that focuses on maximizing group separation by iteratively splitting variables." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"Decision trees are decision support models that classify patterns using a sequence of well-defined rules. They are tree-like graphs in which each branch node represents an option between a number of alternatives, and each leaf node represents an outcome of the cumulative choices." (Joo Chuan Tong & Shoba Ranganathan, "Computational T cell vaccine design", Computer-Aided Vaccine Design, 2013)

"The Decision Tree is a form of flow diagram that helps to map out complicated decision-making processes, or the possible directions a conversation or interaction might take." (Kevin Duncan, "The Diagrams Book", 2013)

"A family of classification methods whose results are usually represented in a tree-like graph." (Meta S Brown, "Data Mining For Dummies", 2014)

"A tool to help make decisions based on a set of rules that help to navigate the tree along its branches." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"An algorithm that focuses on maximizing group separation by iteratively splitting variables." (Evan Stubbs, "Big Data, Big Innovation", 2014)

"A representation of knowledge in a tree-like form usually used for classification. The non-terminal nodes of the tree represent questions, the terminal nodes represent class labels and the edges represent answers to questions." (Petr Berka, "Machine Learning", 2015)

"Decision tree learning is a supervised machine learning technique for inducing a decision tree from training data. A decision tree (also referred to as a classification tree or a reduction tree) is a predictive model which is a mapping from observations about an item to conclusions about its target value." (Lin Tan, "The Art and Science of Analyzing Software Data", 2015)

"A simple decision tree is an algorithm for determining a decision by making a sequence of logical or property tests." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"An organised pathway of ideas leading to a defined goal, in which at various points, a decision is made about which of two ‘branches’ of ideas to follow to the next decision point." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A decision tree is a largely used non-parametric effective machine learning modeling technique for regression and classification problems." (Thomas Plapinger, "What is a Decision Tree", 2017)

"A decision tree is the arrangement of data in a tree structure where, at each node, data is separated into different branches according to the value of the attribute at the node." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A model classifying a data item into one of the classes at the leaf node, based on matching properties between the branches on the tree and the actual data item." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"Decision tree is a technique that helps us in deriving rules from data. A rule-based technique is very helpful in explaining how the model is supposed to work in estimating a dependent variable value." (V Kishore Ayyadevara et al, "Hands-On Machine Learning on Google Cloud Platform", 2018)

"Decision trees are a machine learning algorithm that predicts the value of a target variable based on decision rules learned from training data. The algorithm can be applied to both regression and classification problems by changing the objective function that governs how the tree learns the decision rules." (Stefan Jansen, "Hands-On Machine Learning for Algorithmic Trading", 2018)

"A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility." (James D Miller, "Hands-On Machine Learning with IBM Watson", 2019)

"In a machine learning context, a decision tree is a data structure that is built for classification or regression tasks. Each node in the tree splits on a particular feature." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision." (WhatIs)

"A tree and branch-based model, like a flow chart, used to map decisions and their possible consequences. The decision tree is widely used in machine learning for classification and regression algorithms." (Accenture)

"A treelike model of data produced by certain data mining methods." (Microsoft Technet)
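
Several definitions above (Bose, Natingga, Jansen) describe a tree as being grown by repeatedly picking the most discriminating attribute and splitting the data on it. A minimal Python sketch of that idea follows, assuming Gini impurity as the split criterion (one common choice; entropy is another) and numeric features split at a threshold. The householder data, with hypothetical features (household size, income in thousands) and dwelling types, is invented for illustration only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send data down both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, no split helps, or max_depth is hit."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the branches from the root until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical data: (household_size, income_in_thousands) -> type of new dwelling.
rows = [(1, 30), (2, 35), (1, 60), (4, 40), (5, 80), (3, 90)]
labels = ["apartment", "apartment", "apartment", "house", "house", "house"]

tree = build_tree(rows, labels)  # root splits on household_size <= 2
print(predict(tree, (2, 50)), predict(tree, (4, 50)))  # apartment house
```

Each internal node is a rule (decision node), each dictionary key "left"/"right" is a branch, and each "leaf" holds the class, matching the components Clark & Deurloo list.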

Data Science: Forecast/Forecasting (Definitions)

"1. A projection or an estimate of future sales, revenue, earnings, or costs. 2. A projection of future financial position and operating results of an organization." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"The outcome of a series of exercises and analysis that helps a company, division, or product group to predict the number of units they might sell or produce, or the market share they could attain." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"An estimate or prediction of conditions and events in the project's future, based on information and knowledge available at the time of the forecast. The information is based on the project's past performance and expected future performance, and includes information that could impact the project in the future, such as estimate at completion and estimate to complete." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"Refers to the operation responding to a wish to 'see in advance' what will happen later in a given field. Forecasting methods typically rely on data from the past to make forward-looking extrapolations; they assume continuity with possible inflections based on expert opinion(s)." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

"Anticipating the future using quantitative techniques, such as mathematical and statistical rules and analysis of past data to predict the future, plus qualitative techniques, such as expert judgment and opinions to validate or adjust predictions." (Joan C Dessinger, "Fundamentals of Performance Improvement" 3rd Ed., 2012)

"A numerical prediction of a future value for a time series. Forecasting techniques are used to identify previously unseen trends and anticipate fluctuations to facilitate better planning." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"The practice of predicting or estimating a future event or trend, typically from historical data." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"A planning tool to help management to cope with the uncertainty of the future. It is based on certain assumptions based on management’s experience, knowledge and judgment and these estimates are projected into the future using techniques such as Box-Jenkins models, Delphi method, exponential smoothing, moving averages, regression analysis and trend projection. The technique of sensitivity analysis is also often used which assigns a range of values to uncertain variables in order to reduce potential errors." (Duncan Angwin & Stephen Cummings, "The Strategy Pathfinder" 3rd Ed., 2017)

"Estimates or predictions of conditions and events in the project's future based on information and knowledge available at the time of the forecast. Forecasts are updated and reissued based on work performance information provided as the project is executed." (Project Management Institute, "Practice Standard for Scheduling" 3rd Ed., 2019)

"Forecast usually refers to a projected value for a metric. Organizations will often create a forecast that is different than their target for a given metric. There are multiple types of forecasting methods for creating forecasts based on past data and usage of them varies widely across organizations." (Intrafocus)
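
Angwin & Cummings list moving averages and exponential smoothing among the techniques used to project estimates into the future. As a minimal sketch, assuming a plain list of past values and a one-step-ahead forecast, both fit in a few lines of Python; the function names, the smoothing constant alpha and the sales figures are illustrative assumptions.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing: each new level blends the latest value (weight alpha)
    with the previous level (weight 1 - alpha); the final level is the next-period forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise with the first observation (one common convention)
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

sales = [100, 120, 110, 130]  # hypothetical monthly unit sales
print(moving_average_forecast(sales, 3))           # 120.0
print(exponential_smoothing_forecast(sales, 0.5))  # 120.0
```

A larger alpha reacts faster to recent changes; a smaller one smooths out noise, which is the trade-off behind "continuity with possible inflections" in the Lesca definition.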

18 March 2018

Data Science: Linear Regression (Definitions)

"A regression model that uses the equation for a straight line." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A quantitative model building tool that relates one or more independent variables (Xs) to a single dependent variable (Y)." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"A regression that deals with a straight-line relationship between variables. It is in the form of Y = a + bX, whereas nonlinear regression involves curvilinear relationships, such as exponential and quadratic functions." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"In statistics, a method of modeling the relationship between dependent and independent variables. Linear regression creates a model by fitting a straight line to the values in a dataset." (Meta S Brown, "Data Mining For Dummies", 2014)

"Linear regression is a statistical technique for modeling the relationship between a single variable and one or more other variables. In a machine learning context, linear regression refers to a regression model based on this statistical technique." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"is an area of unsupervised machine learning that uses linear predictor functions to understand the relationship between a scalar dependent variable and one or more explanatory variables." (Accenture)
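
Shim & Siegel's form Y = a + bX can be fitted with ordinary least squares, the usual (though not the only) criterion behind "fitting a straight line". A minimal sketch, assuming least squares and a single independent variable: the slope is b = Sxy / Sxx and the intercept is a = mean(Y) - b * mean(X). The function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation of X and Y
    sxx = sum((x - mean_x) ** 2 for x in xs)                         # variation of X
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```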

16 March 2018

Data Science: Monte Carlo Simulation (Definitions)

"A computer-simulation technique that uses sampling from a random number sequence to simulate characteristics or events or outcomes with multiple possible values." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A simulation in which random events are modeled using pseudo random number generators so that many replications of the random events may be evaluated statistically." (Norman Pendegraft & Mark Rounds, "Dynamic System Simulation for Decision Support", 2008)

"A range of computational algorithms that generates random samples from distributions with known overall properties that is used, for example, to explore potential future behaviours of financial instruments on the basis of historic properties." (Bin Li & Lee Gillam, "Grid Service Level Agreements Using Financial Risk Analysis Techniques", 2010)

"A process which generates hundreds or thousands of probable performance outcomes based on probability distributions for cost and schedule on individual tasks. The outcomes are then used to generate a probability distribution for the project as a whole." (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"The technique used by project management applications to estimate the likely range of outcomes from a complex random process by simulating the process a large number of times." (Christopher Carson et al, "CPM Scheduling for Construction: Best Practices and Guidelines", 2014)

"A method for estimating uncertainty in a variable which is a complex function of one or more probability distributions; it uses random numbers to provide an estimate of the distribution and a random number generator to produce random samples from the probabilistic levels." (María C Carnero, "Benchmarking of the Maintenance Service in Health Care Organizations", 2017)

"An analysis technique where a computer model is iterated many times, with the input values chosen at random for each iteration driven by the input data, including probability distributions and probabilistic branches. Outputs are generated to represent the range of possible outcomes for the project." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide)", 2017)

"A computerized simulation technique which is usually used for analyzing the behaviour of a system or a process involving uncertainties." (Henry Xu & Renae Agrey, "Major Techniques and Current Developments of Supply Chain Process Modelling", 2019)

"'What if' analysis of the future project scenarios, provided a mathematical/ logical model of the project implemented on a computer." (Franco Caron, "Project Control Using a Bayesian Approach", 2019)
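
Stackpole's description (many simulated outcomes drawn from per-task distributions, combined into a distribution for the whole project) can be sketched in Python as below. The three tasks, their (optimistic, most likely, pessimistic) duration estimates and the 10,000 runs are hypothetical, and the triangular distribution is one common choice for such estimates, not one prescribed by the definitions.

```python
import random
import statistics

# Hypothetical project: (optimistic, most likely, pessimistic) duration in days per task.
TASKS = [(2, 4, 8), (3, 5, 10), (1, 2, 4)]

def simulate_project(tasks, runs=10_000, seed=42):
    """Sample every task's duration once per run and sum them; return all run totals."""
    rng = random.Random(seed)  # seeded so the results are reproducible
    totals = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode)
        totals.append(sum(rng.triangular(low, high, mode) for low, mode, high in tasks))
    return totals

totals = simulate_project(TASKS)
mean_duration = statistics.mean(totals)
p80 = statistics.quantiles(totals, n=10)[7]  # 80th percentile: 80% of runs finish by then
print(f"mean {mean_duration:.1f} days, P80 {p80:.1f} days")
```

The list of totals is the probability distribution "for the project as a whole"; reading off a percentile such as P80 turns it into a schedule commitment with a stated confidence level.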

Data Science: Data Pipeline/Pipelining (Definitions)

"A series of operations in an aggregation process." (MongoDb, "Glossary", 2008)

"A series of processes all in a row, linked by pipes, where each passes its output stream to the next." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"Description of the process workflow in sequential order." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"In data processing, a pipeline is a sequence of processing steps combined into a single object. In Spark MLlib, a pipeline is a sequence of stages. A Pipeline is an estimator containing transformers, estimators, and evaluators. When it is trained, it produces a PipelineModel containing transformers, models, and evaluators." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"Abstract concept used to describe where work is broken into several steps which enable multiple tasks to be in progress at the same time. Pipelining is applied in processors to increase processing of machine language instructions and is also a category of functional decomposition that reduces the synchronization cost while maintaining many of the benefits of concurrent execution." (Max Domeika, "Software Development for Embedded Multi-core Systems", 2011)

"A technique that breaks an instruction into smaller steps that can be overlapped" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[pipeline pattern:] "A set of data processing elements connected in series, generally so that the output of one element is the input of the next one. The elements of a pipeline are often executed concurrently. Describing many algorithms, including many signal processing problems, as pipelines is generally quite natural and lends itself to parallel execution. However, in order to scale beyond the number of pipeline stages, it is necessary to exploit parallelism within a single pipeline stage." (Michael McCool et al, "Structured Parallel Programming", 2012)

"A data pipeline is a general term for a process that moves data from a source to a destination. ETL (extract, transform, and load) uses a data pipeline to move the data it extracts from a source to the destination, where it loads the data." (Jake Stein)

"A data pipeline is a piece of infrastructure responsible for routing data from where it is to where it needs to go and provide any necessary transformations through that process." (Precisely)

"A data pipeline is a service or set of actions that process data in sequence. This means that the results or output from one segment of the system become the input for the next. The usual function of a data pipeline is to move data from one state or location to another." (SnapLogic)

"A data pipeline is a software process that takes data from sources and pushes it to a destination. Most modern data pipelines are automated with an ETL (Extract, Transform, Load) platform." (Xplenty)

"A data pipeline is a set of actions that extract data (or directly analytics and visualization) from various sources. It is an automated process: take these columns from this database, merge them with these columns from this API, subset rows according to a value, substitute NAs with the median and load them in this other database." (Alan Marazzi)

"A source and all the transformations and targets that receive data from that source. Each mapping contains one or more pipelines." (Informatica)

"An ETL Pipeline refers to a set of processes extracting data from an input source, transforming the data, and loading into an output destination such as a database, data mart, or a data warehouse for reporting, analysis, and data synchronization." (Databricks)

"Data pipeline consists of a set of actions performed in real-time or in batches, that captures data from various sources, sorting it and then moving that data through applications, filters, and APIs for storage and analysis." (EAI) 
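
The definitions by Orwant et al and McCool et al (each stage's output is the next stage's input) and Marazzi's example (subset rows by a value, substitute missing values with the median, load the result) can be sketched as a chain of plain Python functions. The stage names, column names and the in-memory "warehouse" list are hypothetical stand-ins for real sources and destinations.

```python
from statistics import median

def run_pipeline(data, stages):
    """Feed the output of each stage into the next, in order."""
    for stage in stages:
        data = stage(data)
    return data

def extract(source):
    """Copy the source records so later stages never mutate the original."""
    return [dict(record) for record in source]

def filter_rows(column, value):
    """Keep only the records whose `column` equals `value`."""
    def stage(rows):
        return [r for r in rows if r[column] == value]
    return stage

def fill_missing_with_median(column):
    """Replace None in `column` with the median of the non-missing values."""
    def stage(rows):
        fill = median(r[column] for r in rows if r[column] is not None)
        return [{**r, column: fill if r[column] is None else r[column]} for r in rows]
    return stage

def load_into(target):
    """Append the records to a destination (here an in-memory list)."""
    def stage(rows):
        target.extend(rows)
        return rows
    return stage

source = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 50},
    {"region": "EU", "amount": None},
    {"region": "EU", "amount": 30},
]
warehouse = []
run_pipeline(source, [extract, filter_rows("region", "EU"),
                      fill_missing_with_median("amount"), load_into(warehouse)])
print(warehouse)
```

Because every stage has the same shape (records in, records out), stages can be reordered, tested in isolation or replaced; production pipelines add scheduling, retries and streaming on top of this basic composition.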

Data Science: Regression Analysis (Definitions)

"A set of statistical operations that helps to predict the value of the dependent variable from the values of one or more independent variables." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A statistical tool that measures the strength of relationship between one or more independent variables with a dependent variable. It builds upon the correlation concepts to develop an empirical, databased model. Correlation describes the X and Y relationship with a single number (the Pearson’s Correlation Coefficient (r)), whereas regression summarizes the relationship with a line - the regression line." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"A statistical procedure for estimating mathematically the average relationship between the dependent variable (e.g., sales) and one or more independent variables (e.g., price and advertising)." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"Regression analysis is a statistical technique for estimating the relationship between a set of predictors (independent variables) and an outcome variable (dependent variable). Linear least-squares regression, in which the relationship is expressed in a linear form, is the most common type of regression analysis. The mathematical model used in least-squares linear regression is often called the general linear model (GLM)." (Herbert I Weisberg, "Bias and Causation: Models and Judgment for Valid Comparisons", 2010)

"A statistical technique which seeks to find a line which best fits through a set of data as plotted on a graph, seeking to find the cleanest path which deviates the least from any instance within the set." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[regression] "Using one data set to predict the results of a second." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The statistical process of predicting one or more continuous variables, such as profit or loss, based on other attributes in the dataset." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A family of methods for fitting a line or curve to a dataset, used to simplify or make sense of a number of apparently random data points." (Meta S Brown, "Data Mining For Dummies", 2014)

"An analytic technique where a series of input variables are examined in relation to their corresponding output results in order to develop a mathematical or statistical relationship." (For Dummies, "PMP Certification All-in-One For Dummies" 2nd Ed., 2013)

"A statistical technique for estimating relationships between variables." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"Process to statistically estimate the relationship between different attributes." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"Plotting pairs of independent and dependent variables in an XY chart and then finding a linear or exponential equation that best describes the plotted data." (E C Nelson & Stephen L Nelson, "Excel Data Analysis For Dummies", 2015)

"A statistical procedure that produces an equation for predicting a variable (the criterion measure) from one or more other variables (the predictor measures)." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A statistical technique used to estimate the mathematical relationship between a dependent variable, such as quantity demanded, and one or more explanatory variables, such as price and income." (Jeffrey M Perloff & James A Brander, "Managerial Economics and Strategy" 2nd Ed., 2016)

"A statistical process for estimating the relationships between variables, often used to forecast the change in a variable based on changes in other variables. Linear regression is used to analyze continuous variables, and logistic regression is used for discrete variables." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"In a machine learning context, regression is the task of assigning scalar value to examples." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"Algorithms used to predict values for new data based on training data fed into the system. Areas where regression in machine learning is used to predict future values include drug response modeling, marketing, real estate and financial forecasting." (Accenture)

"To define the dependency between variables. It assumes a one-way causal effect from one variable to the response of another variable." (Analytics Insight)
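
Shim & Siegel's example, sales explained by price and advertising, is a multiple regression. A minimal Python sketch follows, assuming ordinary least squares solved through the normal equations (X^T X) b = X^T y with Gaussian elimination. The data is synthetic, generated from sales = 50 - 2*price + 3*advertising, so the fit should recover those coefficients; the function names are hypothetical. For real work a numerical library's least-squares solver is the safer choice, since the normal equations can be ill-conditioned.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0, *row] for row in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

price = [10, 12, 11, 14, 13]
advertising = [2, 1, 4, 3, 5]
sales = [36, 29, 40, 31, 39]  # synthetic: 50 - 2*price + 3*advertising
coef = fit_multiple_regression(list(zip(price, advertising)), sales)
print([round(c, 6) for c in coef])
```

Each coefficient estimates the average change in the dependent variable per unit change in one predictor with the others held fixed, which is the "average relationship" in the Shim & Siegel definition; as the Analytics Insight entry hints, reading it causally is an extra assumption.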

15 March 2018

Data Science: Neural Network (Definitions)

"Information processing systems, inspired by biological neural systems but not limited to modeling such systems. Neural networks consist of many simple processing elements joined by weighted connection paths." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"A computing model based on the architecture of the brain consisting of multiple simple processing units connected by adaptive weights." (Joseph P Bigus, "Data Mining with Neural Networks", 1996)

[Feedback neural network:] "A network in which there are connections from output to input neurons." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

[Feedforward neural network:] "A neural network in which there are no connections back from output to input neurons." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

[Fuzzy neural network (FNN):] "Neural network designed to realize a fuzzy system, consisting of fuzzy rules, fuzzy variables, and fuzzy values defined for them and the fuzzy inference method." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

[probabilistic neural network (PNN):] "A feedforward neural network trained using supervised learning that allocates a hidden unit for each input pattern." (Joseph P Bigus, "Data Mining with Neural Networks", 1996)

"A system that applies neural computation. An adaptive, nonlinear dynamical system. Its equilibrium states can recall or recognize a stored pattern or can solve a mathematical or computational problem." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps" 2nd Ed., 2000)

"A nonlinear modeling technique comprising of a series of interconnected nodes with weights, which are adjusted as the network learns." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A network modelled after the neurons in a biological nervous system with multiple synapses and layers. It is designed as an interconnected system of processing elements organized in a layered parallel architecture. These elements are called neurons and have a limited number of inputs and outputs. NNs can be trained to find nonlinear relationships in data, enabling specific input sets to lead to given target outputs." (Ioannis Papaioannou et al, "A Survey on Neural Networks in Automated Negotiations", Encyclopedia of Artificial Intelligence, 2009)

"A network of many simple processors ('units' or 'neurons') that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis." (Fernando Mateo et al, "A 2D Positioning Application in PET Using ANNs", Encyclopedia of Artificial Intelligence, 2009)

[Probabilistic Neural Network (PNN):] "A neural network using kernel-based approximation to form an estimate of the probability density functions of classes in a classification problem." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Structure composed of a group of interconnected artificial neurons or units. The objective of a NN is to transform the inputs into meaningful outputs." (M Paz S Lorente et al, "Ensemble of ANN for Traffic Sign Recognition", Encyclopedia of Artificial Intelligence, 2009)

"Techniques modeled after the (hypothesized) processes of learning in the cognitive system and the neurological functions of the brain and capable of predicting new observations (on specific variables) from other observations (on the same or other variables) after inducing a model from existing data. These techniques are also sometimes described as flexible nonlinear regression models, discriminant models, data reduction models, and multilayer nonlinear models." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A dynamic system in which outputs are calculated by a summation of weighted functions operating on inputs. Weights for the individual functions are determined by a learning process, simulating the learning process hypothesized for human neurons. In the computer model, individual functions that contribute to a correct output (based on the training data) have their weights increased (strengthening their influence to the calculated output)." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"An algorithm that conceptually mimics the learning patterns of biological neural networks by adaptively adjusting a series of classification functions in a nonlinear nature to maximize predictive accuracy, given a series of inputs." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A family of model types capable of simulating some very complex systems." (Meta S Brown, "Data Mining For Dummies", 2014)

"A neural network is a network of neurons - units with inputs and outputs. The output of a neuron can be passed to a neuron and so on, thus creating a multilayered network. Neural networks contain adaptive elements, making them suitable to deal with nonlinear models and pattern recognition problems." (Ivan Idris, "Python Data Analysis", 2014)

"Neural network algorithms are designed to emulate human/animal brains. The network consists of input nodes, hidden layers, and output nodes. Each of the units is assigned a weight. Using an iterative approach, the algorithm continuously adjusts the weights until it reaches a specific stopping point." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"A model composed of a network of simple processing units called neurons and connections between neurons called synapses. Each synapse has a direction and a weight, and the weight defines the effect of the neuron before on the neuron after." (Ethem Alpaydın, "Machine Learning: The New AI", 2016)

"A powerful set of algorithms whose objective is to find a pattern of behavior. They are called neural because they are based on how biological neurons work when processing information. These networks try to simulate the way the neural network of a live being processes, recognizes and transmits the information. The implementation of neural networks in very different fields is due to their good performance relative to other methods" (Felix Lopez-Iturriaga & Iván Pastor-Sanz, "Using Self Organizing Maps for Banking Oversight: The Case of Spanish Savings Banks", 2016)

"Neural networks are learning algorithms that mimic the human brain in learning mechanics and complexity." (Davy Cielen et al, "Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools", 2016)

"A machine learning algorithm consisting of a network of simple classifiers that make decisions based on the input or the results of the other classifiers in the network." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A type of machine-learning model that is implemented as a network of simple processing units called neurons. It is possible to create a variety of different types of neural networks by modifying the topology of the neurons in the network. A feed-forward, fully connected neural network is a very common type of network that can be trained using backpropagation." (John D Kelleher & Brendan Tierney, "Data science", 2018)

"Neural networks refer to a family of models that are defined by an input layer (a vectorized representation of input data), a hidden layer that consists of neurons and synapses, and an output layer with the predicted values. Within the hidden layer, synapses transmit signals between neurons, which rely on an activation function to buffer incoming signals. The synapses apply weights to incoming values, and the activation function determines if the weighted inputs are sufficiently high to activate the neuron and pass the values on to the next layer of the network." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"Fully connected network with minimum of three layers namely input layer, output layer and hidden layer." (S Kayalvizhi & D Thenmozhi, "Deep Learning Approach for Extracting Catch Phrases from Legal Documents", 2020)

"An artificial network of nodes, used for predictive modelling. It is generally used to tackle classification problems and AI related applications." (R Karthik et al, "Performance Analysis of GAN Architecture for Effective Facial Expression Synthesis", 2021)

"A neural network (NN) is a network of many simple processors ('units'), each possibly having a small amount of local memory. The units are connected by communication channels ('connections') which usually carry numeric (as opposed to symbolic) data, encoded by any of various means. The units operate only on their local data and on the inputs they receive via the connections." (Statistics.com)

"Are a very advanced and elegant form of computing system. Machine learning neural networks consist of an interconnected set of 'nodes' which mimic the network of neurons in a biological brain. Common applications include optical character recognition and facial recognition." (Accenture)

Data Science: Logistic Regression (Definitions)

"A regression equation used to predict a binary variable." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A regression model where the dependent variable takes on a limited number of discrete values, often two values representing yes and no." (Peter L Stenberg & Mitchell Morehart, "Characteristics of Farm and Rural Internet Use in the USA", 2008)

"Technique for making predictions when a dependent variable is a categorical dichotomy, and the independent variable(s) are continuous and/or categorical." (Ken J Farion et al, "Clinical Decision Making by Emergency Room Physicians and Residents", 2008)

"A form of regression analysis in which the target variable (response variable) is a binary-level or ordinal-level response and the target estimate is bounded at the extremes." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A modeling technique where unknown values are predicted by known values of other variables where the dependent variable is binary type." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Logistic regression is a method of statistical modeling appropriate for categorical outcome variables. It describes the relationship between a categorical response variable and a set of explanatory variables." (Leping Liu & Livia D’Andrea, "Initial Stages to Create Online Graduate Communities: Assessment and Development", 2011)

"Like linear regression, a statistical method of modeling the relationship between dependent and independent variables based on probability. However, in binary logistic regression, the dependent variable (the effect, or outcome) can have only one of two values, as in, say, a baby’s sex or the results of an election. (Multinomial logistic regression allows for more than two possible values.) A logistic regression model is formed by fitting data to a logit function. (The dependent variable is a 0 or 1, and the regression curve is shaped something like the letter 's'.)" (Meta S Brown, "Data Mining For Dummies", 2014)
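The 'S'-shaped logit curve described above can be illustrated with a small, self-contained fit. This is a sketch, not the procedure from the cited book: it uses plain batch gradient ascent on the log-likelihood, a learning rate and iteration count picked for illustration, and hypothetical toy data. Statistical packages usually rely on Newton-type solvers (iteratively reweighted least squares) instead.

```python
import math

def sigmoid(z):
    """Logistic (inverse logit) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent
    on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)   # residual on the probability scale
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data: hours of study (x) vs. pass (1) / fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
for h in (1.0, 2.75, 4.5):
    print(h, round(sigmoid(b0 + b1 * h), 3))   # fitted probability of passing
```

The fitted values are probabilities bounded between 0 and 1; turning them into a yes/no prediction means applying a threshold, commonly 0.5.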

"Logistic regression is a statistical method for determining the relationship between independent predictor variables (such as financial ratios) and a dichotomously coded dependent variable (such as default or non-default)." (Niccolò Gordini, "Genetic Algorithms for Small Enterprises Default Prediction: Empirical Evidence from Italy", 2014)

"Logistic regression is a predictive analytic method for describing and explaining the relationships between a categorical dependent variable and one or more continuous or categorical independent variables in the recent and past existing data in efforts to build predictive models for predicting a membership of individuals or products into two groups or categories." (Sema A Kalaian & Rafa M Kasim, "Predictive Analytics", 2015)

"Form of regression analysis where the dependent variable is a category rather than a continuous variable. An example of a continuous variable is sales or profit. In order to understand customer retention, regression analysis would calculate the effects of variables such as age, demographics, products purchased, and competitor information on two categories: retaining the customer and losing the customer." (Brittany Bullard, "Style and Statistics", 2016)

"A regression model that is used when the dependent variable is qualitative and a probability is assigned to an observation for the likelihood that the target variable has a value of 1." (Alan Olinsky et al, "Visualization of Predictive Modeling for Big Data Using Various Approaches When There Are Rare Events at Differing Levels", 2018)

"Logistic regression analysis is mainly used in epidemiology. The most common case is to explore the risk factors of a certain disease and predict the probability of the occurrence of a certain disease according to the risk factors." (Chunfa Xu et al, "Crime Hotspot Prediction Using Big Data in China", 2020)

"Logistic regression is a classification algorithm that comes under supervised learning and is used for predictive learning. Logistic regression is used to describe data. It works best for dichotomous (binary) classification." (Astha Baranwal et al, "Machine Learning in Python: Diabetes Prediction Using Machine Learning", 2020)

"Logistic regression is a statistical technique for modeling the probability of an event. In a machine learning context, logistic regression refers to a classification model based on this statistical technique." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"This is a kind of regression analysis often used when the outcome variable is dichotomous and scored 0, 1. Logistic regression is also known as logit regression and when the dependent variable has more than two categories it is called multinomial. Logistic regression is used when predicting whether an event will happen or not." (John K Rugutt & Caroline C Chemosit, "Student Collaborative Learning Strategies: A Logistic Regression Analysis Approach", 2021)

Data Science: Training (Definitions)

"A step by step procedure for adjusting the weights in a neural net." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

[supervised training:] "Process of adjusting the weights in a neural net using a learning algorithm; the desired output for each of a set of training input vectors is presented to the net. Many iterations through the training data may be required." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

[unsupervised training:] "A training procedure in which only input vectors x are supplied to a neural network; the network learns some internal features of the whole set of all the input vectors presented to it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The process of adjusting the connection weights in a neural network under the control of a learning algorithm." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

[supervised training:] "Training of a neural network when the training examples comprise input vectors x and the desired output vectors y; training is performed until the neural network 'learns' to associate each input vector x with its corresponding and desired output vector y." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Exposing a neural computing system to a set of example stimuli to achieve a particular user-defined goal." (Guido J Deboeck and Teuvo Kohonen, "Visual explorations in finance with self-organizing maps", 2000)

"The process used to configure an artificial neural network by repeatedly exposing it to sample data. In feed-forward networks, as each incoming vector or individual input is processed, the network produces an output for that case. With each pass of every case vector in a sample (see epoch), connection weights between neurons are modified. A typical training regime may require tens to thousands of complete epochs before the network converges (see convergence)." (David Scarborough & Mark J Somers, "Neural Networks in Organizational Research: Applying Pattern Recognition to the Analysis of Organizational Behavior", 2006)
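The epoch-based training regime described above can be illustrated with the simplest possible case: a single neuron (a perceptron) rather than a full feed-forward network. This is a sketch under that simplifying assumption. The learning rate, the epoch cap and the logical-AND dataset are illustrative choices, and multi-layer networks are usually trained with backpropagation instead of the perceptron rule.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum plus bias exceeds zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, lr=0.1, max_epochs=100):
    """Perceptron learning rule. One epoch = one pass over every sample;
    weights are adjusted after each misclassified case."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in samples:
            err = target - predict(weights, bias, x)
            if err != 0:
                errors += 1
                # Adjust connection weights in proportion to the error.
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
        if errors == 0:          # converged: a full epoch without mistakes
            return weights, bias, epoch
    return weights, bias, max_epochs

# Logical AND is linearly separable, so the perceptron rule converges on it.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b, epochs = train_perceptron(AND)
print(w, b, epochs)
```

Real networks need far more epochs, which is why the definition speaks of "tens to thousands"; the stopping test here (one clean epoch) is the simplest possible notion of convergence.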

"The process a data mining model uses to estimate model parameters by evaluating a set of known and predictable data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In data mining, the process of fitting a model to data. This is an iterative process and may involve thousands of iterations or more." (Meta S Brown, "Data Mining For Dummies", 2014)

"The process of adjusting the weights and threshold values in a neural net to get a desired outcome." (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"Model training is the process of fitting a model to data." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"Model Training is how artificial intelligence (AI) is taught to perform its tasks, and in many ways follows the same process that new human recruits must also undergo. AI training data needs to be unbiased and comprehensive to ensure that the AI’s actions and decisions do not unintentionally disadvantage a set of people. A key feature of responsible AI is the ability to demonstrate how an AI has been trained." (Accenture)

14 March 2018

Data Science: Deep Learning (Definitions)

"Deep learning is an area of machine learning that emerged from the intersection of neural networks, artificial intelligence, graphical modeling, optimization, pattern recognition and signal processing." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Methods that are used to train models with several levels of abstraction from the raw input to the output. For example, in visual recognition, the lowest level is an image composed of pixels. In layers as we go up, a deep learner combines them to form strokes and edges of different orientations, which can then be combined to detect longer lines, arcs, corners, and junctions, which in turn can be combined to form rectangles, circles, and so on. The units of each layer may be thought of as a set of primitives at a different level of abstraction." (Ethem Alpaydın, "Machine Learning: The New AI", 2016)

"A branch of machine learning to whose architectures belong deep ANNs. The term 'deep' denotes the application of multiple layers with a complex structure." (Iva Mihaylova, "Applications of Artificial Neural Networks in Economics and Finance", 2018)

"A deep-learning model is a neural network that has multiple (more than two) layers of hidden units (or neurons). Deep networks are deep in terms of the number of layers of neurons in the network. Today many deep networks have tens to hundreds of layers. The power of deep-learning models comes from the ability of the neurons in the later layers to learn useful attributes derived from attributes that were themselves learned by the neurons in the earlier layers." (John D Kelleher & Brendan Tierney, "Data science", 2018)
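To make "multiple layers of hidden units" concrete, the sketch below stacks several fully connected layers so that each one consumes the features computed by the previous one. It is untrained: the weights are random (with a fixed seed), and the layer sizes and ReLU activation are arbitrary illustrative choices. The outputs therefore carry no meaning; the sketch only shows the structure.

```python
import random

def relu(v):
    """Rectified linear activation, applied element-wise."""
    return [max(0.0, a) for a in v]

def layer(x, W, b):
    """Fully connected layer: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def init_network(sizes, seed=0):
    """Random weights for consecutive layer sizes, e.g. [4, 8, 8, 8, 2]."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(x, network):
    """Each hidden layer transforms the features produced by the layer before it."""
    *hidden, (W_out, b_out) = network
    for W, b in hidden:
        x = relu(layer(x, W, b))
    return layer(x, W_out, b_out)   # linear output layer

# Hypothetical architecture: 4 inputs, three hidden layers of 8 units, 2 outputs.
sizes = [4, 8, 8, 8, 2]
net = init_network(sizes)
print(len(net) - 1, "hidden layers")
print(forward([0.2, -1.0, 0.5, 0.3], net))
```

Making the network deeper only means adding entries to `sizes`; the forward pass stays the same loop.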

"Also known as deep structured learning or hierarchical learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms." (Soraya Sedkaoui, "Big Data Analytics for Entrepreneurial Success", 2018)

"Deep learning broadly describes the large family of neural network architectures that contain multiple, interacting hidden layers." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"It is a part of machine learning approach used for learning data representations." (Dharmendra S Rajput et al, "Investigation on Deep Learning Approach for Big Data: Applications and Challenges", 2018)

"The ability of a neural network to improve its learning process." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A learning algorithm using a number of layers for extracting and learning feature hierarchies before providing an output for any input." (Tanu Wadhera & Deepti Kakkar, "Eye Tracker: An Assistive Tool in Diagnosis of Autism Spectrum Disorder", 2019)

"A part of a broader family of machine learning methods based on learning data representations." (Nil Goksel & Aras Bozkurt, "Artificial Intelligence in Education: Current Insights and Future Perspectives", 2019)

"A recent method of machine learning based on neural networks with more than one hidden layer." (Samih M Jammoul et al, "Open Source Software Usage in Education and Research: Network Traffic Analysis as an Example", 2019)

"A subbranch of machine learning which inspires from the artificial neural network. It has eliminated the need to design handcrafted features as in deep learning features are automatically learned by the model from the data." (Aman Kamboj et al, "Ear Localizer: A Deep-Learning-Based Ear Localization Model for Side Face Images in the Wild", 2019)

"It is class of one machine learning algorithms that can be supervised, unsupervised, or semi-supervised. It uses multiple layers of processing units for feature extraction and transformation." (Siddhartha Kumar Arjaria & Abhishek S Rathore, "Heart Disease Diagnosis: A Machine Learning Approach", 2019)

"Is the complex, unsupervised processing of unstructured data in order to create patterns used in decision making, patterns that are analogous to those of the human brain." (Samia H Rizk, "Risk-Benefit Evaluation in Clinical Research Practice", 2019)

"The ability for machines to autonomously mimic human thought patterns through artificial neural networks composed of cascading layers of information." (Kirti R Bhatele et al, "The Role of Artificial Intelligence in Cyber Security", 2019)

"The method for solving problems that have more probabilistic calculations based on artificial neural networks." (Tolga Ensari et al, "Overview of Machine Learning Approaches for Wireless Communication", 2019)

"A category of machine learning methods which is inspired by the artificial neural networks." (Shouvik Chakraborty & Kalyani Mali, "An Overview of Biomedical Image Analysis From the Deep Learning Perspective", 2020)

"A sub-field of machine learning which is based on the algorithms and layers of artificial networks." (S Kayalvizhi & D Thenmozhi, "Deep Learning Approach for Extracting Catch Phrases from Legal Documents", 2020)

"A type of machine learning based on artificial neural networks. It can be supervised, unsupervised, or semi-supervised, and it uses an artificial neural network with multiple layers between the input and output layers." (Timofei Bogomolov et al, "Identifying Patterns in Fresh Produce Purchases: The Application of Machine Learning Techniques", 2020)

"An extension of machine learning approach, which uses neural network." (Neha Garg & Kamlesh Sharma, "Machine Learning in Text Analysis", 2020)

"Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised." (R Murugan, "Implementation of Deep Learning Neural Network for Retinal Images", 2020)

"Deep learning is a collection of algorithms used in machine learning, used to model high-level abstractions in data through the use of model architectures, which are composed of multiple nonlinear transformations. It is part of a broad family of methods used for machine learning that are based on learning representations of data." (Edward T Chen, "Deep Learning and Sustainable Telemedicine", 2020)

"Deep learning is a collection of neural-network techniques that generally use multiple layers." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"Deep learning is a kind of machine learning technique with automatic image interpretation and feature learning facility. The different deep learning algorithms are convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), [generative] adversarial networks (GAN), etc." (Rajandeep Kaur & Rajneesh Rani, "Comparative Study on ASD Identification Using Machine and Deep Learning", 2020)

"Deep learning is a subset of machine learning that models high-level abstractions in data by means of network architectures, which are composed of multiple nonlinear transformations." (Loris Nanni et al, "Digital Recognition of Breast Cancer Using TakhisisNet: An Innovative Multi-Head Convolutional Neural Network for Classifying Breast Ultrasonic Images", 2020)

"In contradistinction to surface or superficial learning, deep learning is inextricably associated with long-term retention of pertinent and solid knowledge, based on a thorough and critical understanding of the object of study, be it curricular content or not." (Leonor M Martínez-Serrano, "The Pedagogical Potential of Design Thinking for CLIL Teaching: Creativity, Critical Thinking, and Deep Learning", 2020)

"Is a group of methods that allow multilayer computing models to work with data that has an abstraction hierarchy." (Heorhii Kuchuk et al, "Application of Deep Learning in the Processing of the Aerospace System's Multispectral Images", 2020)

"It is a part of machine learning intended for learning from large amounts of data, as in the case of experience-based learning. It can be considered that feature engineering in deep learning-based models is partly left to the machine. In the case of artificial neural networks, deep neural networks are expected to have various layers within architectures for solving complex problems with higher accuracy compared to traditional machine learning. Moreover, high performance automatic results are expected without human intervention." (Ana Gavrovska & Andreja Samčović, "Intelligent Automation Using Machine and Deep Learning in Cybersecurity of Industrial IoT", 2020)

"Is a subset of AI and machine learning that uses multi-layered artificial neural networks to learn from data that is unstructured or unlabeled." (Lejla Banjanović-Mehmedović & Fahrudin Mehmedović, "Intelligent Manufacturing Systems Driven by Artificial Intelligence in Industry 4.0", 2020)

"This method is also called as hierarchical learning or deep structured learning. It is one of the machine learning method that is based on learning methods like supervised, semi-supervised or unsupervised. The only difference between deep learning and other machine learning algorithm is that deep learning method uses big data as input." (Anumeera Balamurali & Balamurali Ananthanarayanan, "Develop a Neural Model to Score Bigram of Words Using Bag-of-Words Model for Sentiment Analysis", 2020)

"A form of machine learning which uses multi-layered architectures to automatically learn complex representations of the input data. Deep models deliver state-of-the-art results across many fields, e.g. computer vision and NLP." (Vincent Karas & Björn W Schuller, "Deep Learning for Sentiment Analysis: An Overview and Perspectives", 2021)

"A sub branch of Artificial intelligence in which we built the DL model and we don’t need to specify any feature to the learning model. In case of DL the model will classify the data based on the input data." (Ajay Sharma, "Smart Agriculture Services Using Deep Learning, Big Data, and IoT", 2021)

"A sub-set of machine learning in artificial intelligence (AI) with network capabilities supporting learning unsupervised from unstructured data." (Mark Schofield, "Gamification Tools to Facilitate Student Learning Engagement in Higher Education: A Burden or Blessing?", 2021)

"A subarea of machine learning, which adopts a deeper and more complex neural structure to reach state-of-the-art accuracy in a given problem. Commonly applied in machine learning areas, such as classification and prediction." (Jinnie Shin et al, "Automated Essay Scoring Using Deep Learning Algorithms", 2021)

"A subset of a broader family of machine learning methods that makes use of multiple layers to extract data from raw input in order to learn its features." (R Karthik et al, "Performance Analysis of GAN Architecture for Effective Facial Expression Synthesis", 2021)

"An artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in decision making." (Wissam Abbass et al, "Internet of Things Application for Intelligent Cities: Security Risk Assessment Challenges", 2021)

"Another term for unsupervised learning that includes reinforcement learning in which the machine responds to reaching goals given input data and constraints. Deep learning deals with multiple layers simulating neural networks with ability to process immense amount of data." (Sujata Ramnarayan, "Marketing and Artificial Intelligence: Personalization at Scale", 2021)

"Application of multi neuron, multi-layer neural networks to perform learning tasks." (Revathi Rajendran et al, "Convergence of AI, ML, and DL for Enabling Smart Intelligence: Artificial Intelligence, Machine Learning, Deep Learning, Internet of Things", 2021)

"Deep learning approach is a subfield of the machine learning technique. The concepts of deep learning influenced by neuron and brain structure based on ANN (Artificial Neural Network)." (Sayani Ghosal & Amita Jain, "Research Journey of Hate Content Detection From Cyberspace", 2021)

"Deep learning is a compilation of algorithms used in machine learning, and used to model high-level abstractions in data through the use of model architectures." (M Srikanth Yadav & R Kalpana, "A Survey on Network Intrusion Detection Using Deep Generative Networks for Cyber-Physical Systems", 2021)

"Deep learning is a subfield of machine learning that uses artificial neural networks to predict, classify, and generate data." (Usama A Khan & Josephine M Namayanja, "Reevaluating Factor Models: Feature Extraction of the Factor Zoo", 2021)

"Deep learning is a subset of machine learning to solve complex problems/datasets." (R Suganya et al, "A Literature Review on Thyroid Hormonal Problems in Women Using Data Science and Analytics: Healthcare Applications", 2021)

"Deep learning is a type of machine learning that can process a wider range of data resources, requires less data preprocessing by humans, and can often produce more accurate results than traditional machine-learning approaches. In deep learning, interconnected layers of software-based calculators known as 'neurons' form a neural network. The network can ingest vast amounts of input data and process them through multiple layers that learn increasingly complex features of the data at each layer. The network can then make a determination about the data, learn if its determination is correct, and use what it has learned to make determinations about new data. For example, once it learns what an object looks like, it can recognize the object in a new image." (Bistra K Vassileva, "Artificial Intelligence: Concepts and Notions", 2021)

"Deep learning refers to artificial neural networks that mimic the workings of the human brain in the formation of patterns used in data processing and decision-making. Deep learning is a subset of machine learning. They are artificial intelligence networks capable of learning from unstructured or unlabeled data." (Atakan Gerger, "Technologies for Connected Government Implementation: Success Factors and Best Practices", 2021)

"It is a machine learning method using multiple layers of nonlinear processing units to extract features from data." (Sercan Demirci et al, "Detection of Diabetic Retinopathy With Mobile Application Using Deep Learning", 2021)

"It is a subarea of machine learning, where the models are built using multiple layers of artificial neural networks for learning useful patterns from raw data." (Gunjan Ansari et al, "Natural Language Processing in Online Reviews", 2021)

"It is an artificial intelligence technology that imitates the role of the human brain in data processing and the development of decision-making patterns." (Mehmet A Cifci, "Optimizing WSNs for CPS Using Machine Learning Techniques", 2021)

"One part of the broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised." (Jan Bosch et al, "Engineering AI Systems: A Research Agenda", 2021)

"Part of Machine Learning, where methods of higher complexity are used for training data representation." (Andrej Zgank et al, "Embodied Conversation: A Personalized Conversational HCI Interface for Ambient Intelligence", 2021)

"Sub-domain in the field of machine learning that deals with the use of algorithms inspired by human brain cells to solve complex real-world problems." (Shatakshi Singh et al, "A Survey on Intelligence Tools for Data Analytics", 2021)

"This is also a subset of AI where unstructured data is processed using layers of neural networks to identify, predict and detect patterns. Deep learning is used when there is a large amount of unlabeled data and problem is too complex to be solved using machine learning algorithms. Deep learning algorithms are used in computer vision and facial recognition systems." (Vijayaraghavan Varadharajan & Akanksha Rajendra Singh, "Building Intelligent Cities: Concepts, Principles, and Technologies", 2021)

"A rapidly evolving machine learning technique used to build, train, and test neural networks that probabilistically predict outcomes and/or classify unstructured data." (Forrester)

"Deep Learning is a subset of machine learning concerned with large amounts of data with algorithms that have been inspired by the structure and function of the human brain, which is why deep learning models are often referred to as deep neural networks. It is a part of a broader family of machine learning methods based on learning data representations, as opposed to traditional task-specific algorithms." (Databricks)

"Deep Learning refers to complex multi-layer neural nets. They are especially suitable for image and voice recognition, and for unsupervised tasks with complex, unstructured data." (Statistics.com)

"Is a machine learning methodology where a system learns the patterns in data by automatically learning a hierarchical layer of features." (Accenture)

Data Science: Classifier (Definitions)

[pattern classifier:] "A neural net to determine whether an input pattern is or is not a member of a particular class. Training data consists of input patterns and the class to which each belongs, but does not require a description of each class; the net forms exemplar vectors for each class as it learns the training patterns." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

[Bayes classifier:] "statistical classification algorithm in which the class borders are determined decision-theoretically, on the basis of class distributions and misclassification costs." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)
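The decision-theoretic idea in this definition can be sketched as choosing, for a given input, the class with the lowest expected misclassification cost. The sketch assumes the class posterior probabilities are already known (in practice they would be estimated from the class distributions). The classes, probabilities and cost values are hypothetical.

```python
def bayes_decision(posteriors, cost):
    """Pick the class with minimum expected misclassification cost.

    posteriors: {class: P(class | x)}
    cost[decided][true]: cost of deciding `decided` when the true class is `true`.
    """
    def expected_cost(decided):
        return sum(cost[decided][true] * p for true, p in posteriors.items())
    return min(posteriors, key=expected_cost)

# Hypothetical fraud-screening case: missing a fraud costs 10x a false alarm.
posteriors = {"legit": 0.8, "fraud": 0.2}
cost = {
    "legit": {"legit": 0.0, "fraud": 10.0},
    "fraud": {"legit": 1.0, "fraud": 0.0},
}
print(bayes_decision(posteriors, cost))   # -> fraud
```

Deciding "legit" has expected cost 10 × 0.2 = 2.0, while deciding "fraud" costs 1 × 0.8 = 0.8, so the asymmetric costs flip the decision even though "legit" is four times more probable. With equal (0-1) costs the rule reduces to picking the most probable class.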

[nonparametric classifier:] "classification method that is not based on any mathematical functional form for the description of class regions, but directly refers to the available exemplary data." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)
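k-nearest neighbours is a standard example of a classifier that is nonparametric in this sense: it assigns a class to a new point by consulting the stored exemplars directly, with no functional form for the class regions. The sketch below is one such method, not one the definition prescribes; k=3, the Euclidean distance and the toy points are assumptions.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """k-nearest-neighbour vote: classify by distance to stored exemplars."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D exemplars with class labels.
examples = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
]
print(knn_classify((1.2, 1.4), examples))   # near the "A" cluster -> A
print(knn_classify((6.8, 6.1), examples))   # near the "B" cluster -> B
```

`math.dist` requires Python 3.8 or later. A parametric classifier such as logistic regression would instead summarize the data in a few fitted coefficients and discard the exemplars.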

[parametric classifier:] "classification method in which the class regions are defined by specified mathematical functions involving free parameters." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

"A set of patterns and rules to assign a class to new examples." (Ching W Wang, "New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data", 2008)

"A structured model that maps unlabeled instances to finite set of classes." (Lior Rokach, "Incorporating Fuzzy Logic in Data Mining Tasks", Encyclopedia of Artificial Intelligence, 2009)

"A decision-supporting system that given an unseen (to-be-classified) input object yields a prediction, for instance, it classifies the given object to a certain class." (Ivan Bruha, "Knowledge Combination vs. Meta-Learning", 2009)

"Algorithm that produces class labels as output, from a set of features of an object. A classifier, for example, is used to classify certain features extracted from a face image and provide a label (an identity of the individual)." (Oscar D Suárez & Gloria B García, "Component Analysis in Artificial Vision", Encyclopedia of Artificial Intelligence, 2009)

"An algorithm to assign unknown object samples to their respective classes. The decision is made according to the classification feature vectors describing the object in question." (Michael Haefner, "Pit Pattern Classification Using Multichannel Features and Multiclassification", 2009)

"function that associates a class c to each input pattern x of interest. A classifier can be directly constructed from a set of pattern examples with their respective classes, or indirectly from a statistical model." (Óscar Pérez & Manuel Sánchez-Montañés, "Class Prediction in Test Sets with Shifted Distributions", 2009)

[Naive Bayes classifier:] "A modeling technique where each attribute describes a class independent of any other attributes that also describe that class." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An algorithm that implements classification in the field of machine learning and statistical analysis." (Golnoush Abaei & Ali Selamat, "Important Issues in Software Fault Prediction: A Road Map", 2014)

"A computational method that can be trained using known labeled data for predicting the label of unlabeled data. If there's only two labels (also called classes), the method is called 'detector'." (Addisson Salazar et al, "New Perspectives of Pattern Recognition for Automatic Credit Card Fraud Detection", 2018)

[Naive Bayes classifier:] "A way to classify a data item using Bayes' theorem concerning the conditional probabilities P(A|B)=(P(B|A) * P(A))/P(B). It also assumes that variables in the data are independent, which means that no variable affects the probability of the remaining variables attaining a certain value." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)
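The formula in the definition above translates almost directly into code. A minimal sketch for categorical attributes, assuming add-one (Laplace) smoothing, a common addition that the definition does not mention, and a hypothetical toy dataset. Because P(B) is the same for every class, the sketch compares only the numerators P(B|A) * P(A), with P(B|A) factored into per-attribute terms under the independence assumption.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    values_per_attr = defaultdict(set)    # attribute index -> distinct values seen
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(c, i)][v] += 1
            values_per_attr[i].add(v)
    return class_counts, value_counts, values_per_attr

def nb_predict(model, row):
    """Return argmax_c P(c) * prod_i P(x_i | c) and the unnormalized scores.

    P(x) is identical for every class, so the denominator of Bayes' theorem is dropped.
    """
    class_counts, value_counts, values_per_attr = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                       # prior P(c)
        for i, v in enumerate(row):
            # Add-one smoothing so an unseen value does not zero the whole product.
            score *= (value_counts[(c, i)][v] + 1) / (n_c + len(values_per_attr[i]))
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: (outlook, wind) -> play outside?
rows = [("sunny", "weak"), ("sunny", "weak"), ("overcast", "weak"),
        ("overcast", "strong"), ("rainy", "strong"), ("rainy", "strong"),
        ("sunny", "strong")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no"]

model = train_naive_bayes(rows, labels)
print(nb_predict(model, ("overcast", "weak"))[0])   # -> yes
print(nb_predict(model, ("rainy", "strong"))[0])    # -> no
```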

"A type of machine learning program that segments a set of cases into different classes or categorizations." (Shalin Hai-Jew, "Methods for Analyzing and Leveraging Online Learning Data", 2019)

"A supervised Data Mining algorithm used to categorize an instance into one of the two or more classes." (Mu L Wong & S Senthil, "Development of Accurate and Timely Students' Performance Prediction Model Utilizing Heart Rate Data", 2020)

"A model that can be used to place objects into discrete categories based on some set of features. Classifiers are trained on datasets." (Laurel Powell et al, "Art Innovative Systems for Value Tagging", 2021)

09 March 2018

Data Science: Simulation (Definitions)

"A computer model of part of a real-world system." (Jesse Liberty, "Sams Teach Yourself C++ in 24 Hours" 3rd Ed., 2001)

"An interactive environment in which features in the environment behave similarly to real-world events." (Ruth C Clark & Chopeta Lyons, "Graphics for Learning", 2004)

"An attempt to represent a real life system via a model to determine how a change in one or more variable affects the rest of the system. It is also called 'what-if' analysis." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"An interactive environment that models a real-world system. Simulations may be conceptual, such as a simulation of genetic inheritance, or operational, such as a flight simulator." ( Ruth C Clark, "Building Expertise: Cognitive Methods for Training and Performance Improvement", 2008)

"A simulation uses a project model that translates the uncertainties specified at a detailed level into their potential impact on objectives that are expressed at the level of the total project. Project simulations use computer models and estimates of risk, usually expressed as a probability distribution of possible costs or durations at a detailed work level, and are typically performed using Monte Carlo analysis." (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"An interactive environment in which features in the virtual environment behave similarly to real-world events. Simulations may be conceptual, such as a simulation of genetic inheritance, or operational, such as a flight simulator." (Ruth C Clark & Richard E Mayer, "e-Learning and the Science of Instruction", 2011)

"A process by which processes or models are run repeatedly using a variety of inputs. The outputs are normally captured and analyzed to conduct sensitivity analysis, provide insight around likely potential outcomes, and identify bottlenecks and constraints within existing processes or models." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"The practice of building models based on experts’ views on how the parts of a complicated system work." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"Developing a model of a complex system and experimenting with the model to observe the results" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"An analytical technique that models the combined effect of uncertainties to evaluate their potential impact on objectives." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide)", 2017)

"The representation of selected behavioral characteristics of one physical or abstract system by another system." (ISO 2382/1)

Data Science: Mathematical Modeling (Definitions)

"[mathematical] modeling is an activity, a cognitive activity in which we think about and make models to describe how devices or objects of interest behave." (Clive L Dym & Elizabeth S Ivey, "Principles of Mathematical Modeling", 2004)

"A representation of the essential aspects of an existing system (or a system to be constructed) which presents knowledge of that system in usable form and expressed using a Mathematical language. Mathematical models can take many forms, including but not limited to dynamical systems, statistical models, differential equations, or game theoretic models." (Ignacio Blanquer & Vicente Hernandez, "Grid Technologies in Epidemiology", 2009)

[conventional *]: "The applied science of creating computerized models. That is a theoretical construct that represents a system composed by set of region of interest, with a set of parameters, both variables together with logical and quantitative relationships between them, by means of mathematical language to describe the behavior of the system." (Gloria Bueno García et al, "Energy Minimizing Active Models in Artificial Vision", Encyclopedia of Artificial Intelligence, 2009) 

"Description of a system using mathematical concepts and language." (Oscar Tamburis et al, "A Mathematical Model to Plan the Adoption of EHR Systems", 2014)

"Mathematical modeling is the application of mathematics to describe real-world problems and investigating important questions that arise from it." (Sandip Banerjee, "Mathematical Modeling: Models, Analysis and Applications", 2014)

"A process that gives a result to a representation of a physical phenomenon using mathematics." (Luis R S González & Avenilde Romo Vázquez, "Didactic Sequences Teaching Mathematics for Engineers With Focus on Differential Equations", 2017)

"Converting real life situations into mathematical concepts and symbols and thereby converting real life problems into mathematical problems." (G Udhaya Sankar & C Ganesa Moorthy, "Network Modelling on Tropical Diseases vs. Climate Change", 2020)

08 March 2018

Data Science: Mathematical Model (Definitions)

"A mathematical model is any complete and consistent set of mathematical equations which are designed to correspond to some other entity, its prototype. The prototype may be a physical, biological, social, psychological or conceptual entity, perhaps even another mathematical model."  (Rutherford Aris, "Mathematical Modelling", 1978)

"The identification and selection of important descriptor variables to be used within an equation or process that can generate useful predictions." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"Mathematical model is an abstract model that describes a problem, environment, or system using a mathematical language." (Giusseppi Forgionne & Stephen Russell, "Unambiguous Goal Seeking Through Mathematical Modeling", 2008)

"A set of equations, usually ordinary differential equations, the solution of which gives the time course behaviour of a dynamical system." (Peter Wellstead et al, "Systems and Control Theory for Medical Systems Biology", 2009)

"An abstract model that uses mathematical language to describe the behaviour of a system. Mathematical models are used particularly in the natural sciences and engineering disciplines (such as physics, biology, and electrical engineering) but also in the social sciences (such as economics, sociology and political science). It can be defined as the representation of the essential aspects of an existing system (or a system to be constructed) which presents knowledge of that system in usable form." (Roberta Alfieri & Luciano Milanesi, "Multi-Level Data Integration and Data Mining in Systems Biology", Handbook of Research on Systems Biology Applications in Medicine, 2009)

"Mathematical description of a physical system. In the framework of this work mathematical models pursue the descriptions of mechanisms underlying stuttering, putting emphasis in the dynamics of neuronal regions involved in the disorder." (Manuel Prado-Velasco & Carlos Fernández-Peruchena "An Advanced Concept of Altered Auditory Feedback as a Prosthesis-Therapy for Stuttering Founded on a Non-Speech Etiologic Paradigm", 2011)

"Simplified description of a real world system in mathematical terms, e. g., by means of differential equations or other suitable mathematical structures." (Benedetto Piccoli, Andrea Tosin, "Vehicular Traffic: A Review of Continuum Mathematical Models" [Mathematics of Complexity and Dynamical Systems, 2012])

"Stated loosely, models are simplified, idealized and approximate representations of the structure, mechanism and behavior of real-world systems. From the standpoint of set-theoretic model theory, a mathematical model of a target system is specified by a nonempty set - called the model’s domain, endowed with some operations and relations, delineated by suitable axioms and intended empirical interpretation." (Zoltan Domotor, "Mathematical Models in Philosophy of Science" [Mathematics of Complexity and Dynamical Systems, 2012])

"The standard view among most theoretical physicists, engineers and economists is that mathematical models are syntactic (linguistic) items, identified with particular systems of equations or relational statements. From this perspective, the process of solving a designated system of (algebraic, difference, differential, stochastic, etc.) equations of the target system, and interpreting the particular solutions directly in the context of predictions and explanations are primary, while the mathematical structures of associated state and orbit spaces, and quantity algebras – although conceptually important, are secondary." (Zoltan Domotor, "Mathematical Models in Philosophy of Science" [Mathematics of Complexity and Dynamical Systems, 2012])

"They are a set of mathematical equations that explain the behaviour of the system under various operating conditions, and determine the dominant factors that govern the rules of the process. Mathematical modeling is also associated with data collection, data interpretation, parameter estimation, optimization, and provide tools for identifying possible approaches to control and for assessing the potential impact of different intervention measures." (Eldon R Rene et al, "ANNs for Identifying Shock Loads in Continuously Operated Biofilters", 2012)

"An abstract representation of the real-world system using mathematical concepts." (R Sridharan & Vinay V Panicker, "Ant Colony Algorithm for Two Stage Supply Chain", 2014)

"Is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modelling. Mathematical models can take many forms, including but not limited to dynamical systems, statistical models, differential equations, or game theoretic models. A model may help to explain a system and to study the effects of different components, and to make predictions about behaviour." (M T Benmessaoud et al, "Modeling and Simulation of a Stand-Alone Hydrogen Photovoltaic Fuel Cell Hybrid System", 2014)

"A mathematical model is a model built using the language and tools of mathematics. A mathematical model is often constructed with the aim to provide predictions on the future ‘state’ of a phenomenon or a system." (Crescenzio Gallo, "Artificial Neural Networks Tutorial", 2015)

"A mathematical model consists of an equation or a set of equations belonging to a certain class of mathematical models to describe the dynamic behavior of the corresponding system. The parameters involved in this mathematical model are related to a certain mathematical structure. This mathematical model is characterized by its class, its structure and its parameters." (Houda Salhi & Samira Kamoun, "State and Parametric Estimation of Nonlinear Systems Described by Wiener Sate-Space Mathematical Models", 2015)

"Description of a system using mathematical concepts and language." (Tomaž Kramberger, "A Contribution to Better Organized Winter Road Maintenance by Integrating the Model in a Geographic Information System", 2015)

"A description of a system using mathematical concepts and language." (Corrado Falcolini, "Algorithms for Geometrical Models in Borromini's San Carlino alle Quattro Fontane", 2016)

"A mathematical model is a mathematical description (often by means of a function or an equation) of a real-world phenomenon such as the size of a population, the demand for a product, the speed of a falling object, the concentration of a product in a chemical reaction, the life expectancy of a person at birth, or the cost of emission reductions. The purpose of the model is to understand the phenomenon and perhaps to make predictions about future behavior. [...] A mathematical model is never a completely accurate representation of a physical situation - it is an idealization." (James Stewart, "Calculus: Early Transcedentals" 8th Ed., 2016)

"Mathematical representation of a system to describe the behavior of certain variables for an indeterminate time." (Sergio S Juárez-Gutiérrez et al, "Temperature Modeling of a Greenhouse Environment", 2016)

"A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used not only in the natural sciences (such as physics, biology, earth science, meteorology) and engineering disciplines (e.g., computer science, artificial intelligence), but also in the social sciences (such as economics, psychology, sociology, and political science); physicists, engineers, statisticians, operations research analysts, and economists use mathematical models most extensively. A model may help to explain a system and to study the effects of different components, and to make predictions about behavior." (Addepalli V N Krishna & M Balamurugan, "Security Mechanisms in Cloud Computing-Based Big Data", 2019)

"A description of a system using mathematical symbols." (José I Gomar-Madriz et al, "An Analysis of the Traveling Speed in the Traveling Hoist Scheduling Problem for Electroplating Processes", 2020)

"An abstract mathematical representation of a process, device, or concept; it uses a number of variables to represent inputs, outputs and internal states, and sets of equations and inequalities to describe their interaction." (Alisher F Narynbaev, "Selection of an Information Source and Methodology for Calculating Solar Resources of the Kyrgyz Republic", 2020)

Data Science: Semantic Network (Definitions)

"We define a semantic network as 'the collection of all the relationships that concepts have to other concepts, to percepts, to procedures, and to motor mechanisms' of the knowledge." (John F Sowa, "Conceptual Structures", 1984)

"A graph for knowledge representation where concepts are represented as nodes in a graph and the binary semantic relations between the concepts are represented by named and directed edges between the nodes. All semantic networks have a declarative graphical representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge." (László Kovács et al, "Ontology-Based Semantic Models for Databases", 2009)

"A graph structure useful to represent the knowledge of a domain. It is composed of a set of objects, the graph nodes, which represent the concepts of the domain, and relations among such objects, the graph arches, which represent the domain knowledge. The semantic networks are also a reasoning tool as it is possible to find relations among the concepts of a semantic network that do not have a direct relation among them. To this aim, it is enough 'to follow the arrows' of the network arches that exit from the considered nodes and find in which node the paths meet." (Mario Ceresa, "Clinical and Biomolecular Ontologies for E-Health", Handbook of Research on Distributed Medical Informatics and E-Health, 2009)

"A form of visualization consisting of vertices (concepts) and directed or undirected edges (relationships)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A term used in computer language processing and in RF and OWL to refer to concepts linked by relationships. Memory maps are an informal example of a semantic network." (Kate Taylor, "A Common Sense Approach to Interoperability", 2011)

"nodes, encapsulating data and information, are connected by edges which include information about how these nodes are related to one another." (Simon Boese et al, "Semantic Document Networks to Support Concept Retrieval", 2014)

"A knowledge representation technique that represents the relationships among objects" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A knowledge base that represents semantic relations between concepts. Formally, the underlying representation model is a directed graph consisting of nodes, which represent concepts, and links, which represent semantic relations between concepts, mapping or connecting semantic fields." (Dmitry Korzun et al, "Semantic Methods for Data Mining in Smart Spaces", 2019)

"A knowledge base that represents semantic relations between concepts in a network. The model of knowledge representation is based on a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields." (Svetlana E Yalovitsyna et al, "Smart Museum: Semantic Approach to Generation and Presenting Information of Museum Collections", 2020)

06 March 2018

Data Science: Bayesian Network (Definitions)

"A mathematic model in graphic form that represents a set of variables and their probabilistic independencies. It can be used, for example, to calculate the probability of a patient having a specific disease." (Attila Benko & Cecília S Lányi, "History of Artificial Intelligence", 2009) 

"A Bayesian network is a set of causally interrelated variables represented graphically in which the input information is generally subjective and can be updated in light of empirical data, by using Bayes’ theorem." (Herbert I Weisberg, "Bias and Causation: Models and Judgment for Valid Comparisons", 2010)

"A type of neural network. The Bayesian network is based on the fundamentals of probability theory." (Meta S Brown, "Data Mining For Dummies", 2014)

"A Bayesian network is a directed acyclical graph (there are no cycles in the graph) that is composed of three basic elements: 
nodes: each feature in a domain is represented by a single node in the graph.
edges: nodes are connected by directed links; the connectivity of the links in a graph encodes the influence and conditional independence relationships between nodes. 
conditional probability tables: each node has a conditional probability table (CPT) associated with it. A CPT lists the probability distribution of the feature represented by the node conditioned on the features represented by the other nodes to which a node is connected by edges." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 
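The three elements listed by Kelleher et al (nodes, directed edges and conditional probability tables) map directly onto code. In this minimal sketch, the classic rain/sprinkler/wet-grass network and all of its probabilities are hypothetical. The joint probability is factorised along the edges, and a posterior is obtained by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical DAG: Rain -> WetGrass <- Sprinkler. Each node has its own CPT.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# CPT for WetGrass: P(WetGrass = True | Rain, Sprinkler)
P_WET_TRUE = {
    (True, True): 0.99,
    (True, False): 0.80,
    (False, True): 0.90,
    (False, False): 0.00,
}

def joint(rain, sprinkler, wet):
    """Joint probability factorised along the edges: P(R) * P(S) * P(W | R, S)."""
    p_wet = P_WET_TRUE[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)

def prob_rain_given_wet():
    """P(Rain = True | WetGrass = True), summing the joint over the unobserved Sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence

if __name__ == "__main__":
    print(f"{prob_rain_given_wet():.3f}")  # 0.695
```

In this example, observing wet grass raises the probability of rain from its prior of 0.2 to about 0.695.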

"A representation of knowledge in the form of a directed acyclic graph representing random variables as nodes and their conditional dependencies as edges." (Petr Berka, "Machine Learning", 2015)

"They are acyclic graphical models that capture conditional dependence among random variables. Each node is associated with a function that gives the probability of finding the variable in a given state, given particular states of its parent variables." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"A graph model representing random variables with their conditional dependencies." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A particular type of statistical model that represents a set of variables and their conditional dependencies. It is usually used to make previsions in a great variety of events." (Gaetano B Ronsivalle & Arianna Boldi, "Artificial Intelligence Applied: Six Actual Projects in Big Organizations", 2019)

"A model that represents and calculates the probabilistic relationships between a set of random variables and an uncertain domain via a directed acyclic graph." (Accenture)

"Bayesian Neural Networks (BNNs) refers to extending standard networks with posterior inference in order to control over-fitting. From a broader perspective, the Bayesian approach uses the statistical methodology so that everything has a probability distribution attached to it, including model parameters (weights and biases in neural networks). In programming languages, variables that can take a specific value will turn the same result every-time you access that specific variable." (Databricks)

04 March 2018

Data Science: Delphi Method (Definitions)

 "A qualitative forecasting method that seeks to use the judgment of experts systematically in arriving at a forecast of what future events will be or when they may occur. It brings together a group of experts who have access to each other's opinions in an environment where no majority opinion is disclosed." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"A systematic forecasting practice that seeks input or advice from a panel of experts. Each expert provides their forecast input in a successive series of rounds, until consensus is achieved." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"A systematic, interactive forecasting method that relies on a panel of experts. The experts answer questionnaires in two or more rounds. After each round, a facilitator provides an anonymous summary of the experts’ forecasts from the previous round as well as the reasons they provided for their judgments." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"Data collection method that happens in an anonymous fashion." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A structured communication technique used to conduct interactive forecasting. It involves a panel of experts." (IQBBA)
