14 March 2018

🔬Data Science: Generalization (Definitions)

"The ability of a neural net to produce reasonable responses to input patterns that are similar, but not identical, to training patterns. A balance between memorization and generalization is usually desired." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"The ability of an information system to process new, unknown input data in order to obtain the best possible solution, or one close to it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The ability of a neural computing system to generalize from the input/output examples it was trained on to produce a sensible output to a previously unseen input. Compromise of the variance-bias dilemma." (Guido J Deboeck and Teuvo Kohonen, "Visual explorations in finance with self-organizing maps", 2000)

"way of responding ill the same way to a class of inputs, some of which do not belong to the training set of the same class." (Teuvo Kohonen, "Self-Organizing Maps 3rd Ed.", 2001)

"The process of creating a model based on specific instances that is an acceptable predictor of other instances." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

09 March 2018

🔬Data Science: Simulation (Definitions)

"A computer model of part of a real-world system." (Jesse Liberty, "Sams Teach Yourself C++ in 24 Hours" 3rd Ed., 2001)

"An interactive environment in which features in the environment behave similarly to real-world events." (Ruth C Clark & Chopeta Lyons, "Graphics for Learning", 2004)

"An attempt to represent a real life system via a model to determine how a change in one or more variable affects the rest of the system. It is also called 'what-if' analysis." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"An interactive environment that models a real-world system. Simulations may be conceptual, such as a simulation of genetic inheritance, or operational, such as a flight simulator." ( Ruth C Clark, "Building Expertise: Cognitive Methods for Training and Performance Improvement", 2008)

"A simulation uses a project model that translates the uncertainties specified at a detailed level into their potential impact on objectives that are expressed at the level of the total project. Project simulations use computer models and estimates of risk, usually expressed as a probability distribution of possible costs or durations at a detailed work level, and are typically performed using Monte Carlo analysis." (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"An interactive environment in which features in the virtual environment behave similarly to real-world events. Simulations may be conceptual, such as a simulation of genetic inheritance, or operational, such as a flight simulator." (Ruth C Clark & Richard E Mayer, "e-Learning and the Science of Instruction", 2011)

"A process by which processes or models are run repeatedly using a variety of inputs. The outputs are normally captured and analyzed to conduct sensitivity analysis, provide insight around likely potential outcomes, and identify bottlenecks and constraints within existing processes or models." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"The practice of building models based on experts’ views on how the parts of a complicated system work." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"Developing a model of a complex system and experimenting with the model to observe the results" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"An analytical technique that models the combined effect of uncertainties to evaluate their potential impact on objectives." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide)", 2017)

"The representation of selected behavioral characteristics of one physical or abstract system by another system." (ISO 2382/1)

🔬Data Science: Mathematical Modeling (Definitions)

"[mathematical] modeling is an activity, a cognitive activity in which we think about and make models to describe how devices or objects of interest behave." (Clive L Dym & Elizabeth S Ivey, "Principles of Mathematical Modeling", 2004)

"A representation of the essential aspects of an existing system (or a system to be constructed) which presents knowledge of that system in usable form and expressed using a Mathematical language. Mathematical models can take many forms, including but not limited to dynamical systems, statistical models, differential equations, or game theoretic models." (Ignacio Blanquer & Vicente Hernandez, "Grid Technologies in Epidemiology", 2009)

[conventional *]: "The applied science of creating computerized models. That is a theoretical construct that represents a system composed by set of region of interest, with a set of parameters, both variables together with logical and quantitative relationships between them, by means of mathematical language to describe the behavior of the system." (Gloria Bueno García et al, "Energy Minimizing Active Models in Artificial Vision", Encyclopedia of Artificial Intelligence, 2009) 

"Description of a system using mathematical concepts and language." (Oscar Tamburis et al, "A Mathematical Model to Plan the Adoption of EHR Systems", 2014)

"Mathematical modeling is the application of mathematics to describe real-world problems and investigating important questions that arise from it." (Sandip Banerjee, "Mathematical Modeling: Models, Analysis and Applications", 2014)

"A process that gives a result to a representation of a physical phenomenon using mathematics." (Luis R S González & Avenilde Romo Vázquez, "Didactic Sequences Teaching Mathematics for Engineers With Focus on Differential Equations", 2017)

"Converting real life situations into mathematical concepts and symbols and thereby converting real life problems into mathematical problems." (G Udhaya Sankar & C Ganesa Moorthy, "Network Modelling on Tropical Diseases vs. Climate Change", 2020)

08 March 2018

🔬Data Science: Mathematical Model (Definitions)

"A mathematical model is any complete and consistent set of mathematical equations which are designed to correspond to some other entity, its prototype. The prototype may be a physical, biological, social, psychological or conceptual entity, perhaps even another mathematical model."  (Rutherford Aris, "Mathematical Modelling", 1978)

"The identification and selection of important descriptor variables to be used within an equation or process that can generate useful predictions." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"Mathematical model is an abstract model that describes a problem, environment, or system using a mathematical language." (Giusseppi Forgionne & Stephen Russell, "Unambiguous Goal Seeking Through Mathematical Modeling", 2008)

"A set of equations, usually ordinary differential equations, the solution of which gives the time course behaviour of a dynamical system." (Peter Wellstead et al, "Systems and Control Theory for Medical Systems Biology", 2009)

"An abstract model that uses mathematical language to describe the behaviour of a system. Mathematical models are used particularly in the natural sciences and engineering disciplines (such as physics, biology, and electrical engineering) but also in the social sciences (such as economics, sociology and political science). It can be defined as the representation of the essential aspects of an existing system (or a system to be constructed) which presents knowledge of that system in usable form." (Roberta Alfieri & Luciano Milanesi, "Multi-Level Data Integration and Data Mining in Systems Biology", Handbook of Research on Systems Biology Applications in Medicine, 2009)

"Mathematical description of a physical system. In the framework of this work mathematical models pursue the descriptions of mechanisms underlying stuttering, putting emphasis in the dynamics of neuronal regions involved in the disorder." (Manuel Prado-Velasco & Carlos Fernández-Peruchena "An Advanced Concept of Altered Auditory Feedback as a Prosthesis-Therapy for Stuttering Founded on a Non-Speech Etiologic Paradigm", 2011)

"Simplified description of a real world system in mathematical terms, e. g., by means of differential equations or other suitable mathematical structures." (Benedetto Piccoli, Andrea Tosin, "Vehicular Traffic: A Review of Continuum Mathematical Models" [Mathematics of Complexity and Dynamical Systems, 2012])

"Stated loosely, models are simplified, idealized and approximate representations of the structure, mechanism and behavior of real-world systems. From the standpoint of set-theoretic model theory, a mathematical model of a target system is specified by a nonempty set - called the model’s domain, endowed with some operations and relations, delineated by suitable axioms and intended empirical interpretation." (Zoltan Domotor, "Mathematical Models in Philosophy of Science" [Mathematics of Complexity and Dynamical Systems, 2012])

"The standard view among most theoretical physicists, engineers and economists is that mathematical models are syntactic (linguistic) items, identified with particular systems of equations or relational statements. From this perspective, the process of solving a designated system of (algebraic, difference, differential, stochastic, etc.) equations of the target system, and interpreting the particular solutions directly in the context of predictions and explanations are primary, while the mathematical structures of associated state and orbit spaces, and quantity algebras – although conceptually important, are secondary." (Zoltan Domotor, "Mathematical Models in Philosophy of Science" [Mathematics of Complexity and Dynamical Systems, 2012])

"They are a set of mathematical equations that explain the behaviour of the system under various operating conditions, and determine the dominant factors that govern the rules of the process. Mathematical modeling is also associated with data collection, data interpretation, parameter estimation, optimization, and provide tools for identifying possible approaches to control and for assessing the potential impact of different intervention measures." (Eldon R Rene et al, "ANNs for Identifying Shock Loads in Continuously Operated Biofilters", 2012)

"An abstract representation of the real-world system using mathematical concepts." (R Sridharan & Vinay V Panicker, "Ant Colony Algorithm for Two Stage Supply Chain", 2014)

"Is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modelling. Mathematical models can take many forms, including but not limited to dynamical systems, statistical models, differential equations, or game theoretic models. A model may help to explain a system and to study the effects of different components, and to make predictions about behaviour." (M T Benmessaoud et al, "Modeling and Simulation of a Stand-Alone Hydrogen Photovoltaic Fuel Cell Hybrid System", 2014)

"A mathematical model is a model built using the language and tools of mathematics. A mathematical model is often constructed with the aim to provide predictions on the future ‘state’ of a phenomenon or a system." (Crescenzio Gallo, "Artificial Neural Networks Tutorial", 2015)

"A mathematical model consists of an equation or a set of equations belonging to a certain class of mathematical models to describe the dynamic behavior of the corresponding system. The parameters involved in this mathematical model are related to a certain mathematical structure. This mathematical model is characterized by its class, its structure and its parameters." (Houda Salhi & Samira Kamoun, "State and Parametric Estimation of Nonlinear Systems Described by Wiener Sate-Space Mathematical Models", 2015)

"Description of a system using mathematical concepts and language." (Tomaž Kramberger, "A Contribution to Better Organized Winter Road Maintenance by Integrating the Model in a Geographic Information System", 2015)

"A description of a system using mathematical concepts and language." (Corrado Falcolini, "Algorithms for Geometrical Models in Borromini's San Carlino alle Quattro Fontane", 2016)

"A mathematical model is a mathematical description (often by means of a function or an equation) of a real-world phenomenon such as the size of a population, the demand for a product, the speed of a falling object, the concentration of a product in a chemical reaction, the life expectancy of a person at birth, or the cost of emission reductions. The purpose of the model is to understand the phenomenon and perhaps to make predictions about future behavior. [...] A mathematical model is never a completely accurate representation of a physical situation - it is an idealization." (James Stewart, "Calculus: Early Transcedentals" 8th Ed., 2016)

"Mathematical representation of a system to describe the behavior of certain variables for an indeterminate time." (Sergio S Juárez-Gutiérrez et al, "Temperature Modeling of a Greenhouse Environment", 2016)

"A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modeling. Mathematical models are used not only in the natural sciences (such as physics, biology, earth science, meteorology) and engineering disciplines (e.g., computer science, artificial intelligence), but also in the social sciences (such as economics, psychology, sociology, and political science); physicists, engineers, statisticians, operations research analysts, and economists use mathematical models most extensively. A model may help to explain a system and to study the effects of different components, and to make predictions about behavior." (Addepalli V N Krishna & M Balamurugan, "Security Mechanisms in Cloud Computing-Based Big Data", 2019)

"A description of a system using mathematical symbols." (José I Gomar-Madriz et al, "An Analysis of the Traveling Speed in the Traveling Hoist Scheduling Problem for Electroplating Processes", 2020)

"An abstract mathematical representation of a process, device, or concept; it uses a number of variables to represent inputs, outputs and internal states, and sets of equations and inequalities to describe their interaction." (Alisher F Narynbaev, "Selection of an Information Source and Methodology for Calculating Solar Resources of the Kyrgyz Republic", 2020)

🔬Data Science: Semantic Network [SN] (Definitions)

"We define a semantic network as 'the collection of all the relationships that concepts have to other concepts, to percepts, to procedures, and to motor mechanisms' of the knowledge." (John F Sowa, "Conceptual Structures", 1984)

"A graph for knowledge representation where concepts are represented as nodes in a graph and the binary semantic relations between the concepts are represented by named and directed edges between the nodes. All semantic networks have a declarative graphical representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge." (László Kovács et al, "Ontology-Based Semantic Models for Databases", 2009)

"A graph structure useful to represent the knowledge of a domain. It is composed of a set of objects, the graph nodes, which represent the concepts of the domain, and relations among such objects, the graph arches, which represent the domain knowledge. The semantic networks are also a reasoning tool as it is possible to find relations among the concepts of a semantic network that do not have a direct relation among them. To this aim, it is enough 'to follow the arrows' of the network arches that exit from the considered nodes and find in which node the paths meet." (Mario Ceresa, "Clinical and Biomolecular Ontologies for E-Health", Handbook of Research on Distributed Medical Informatics and E-Health, 2009)

"A form of visualization consisting of vertices (concepts) and directed or undirected edges (relationships)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A term used in computer language processing and in RF and OWL to refer to concepts linked by relationships. Memory maps are an informal example of a semantic network." (Kate Taylor, "A Common Sense Approach to Interoperability", 2011)

"nodes, encapsulating data and information, are connected by edges which include information about how these nodes are related to one another." (Simon Boese et al, "Semantic Document Networks to Support Concept Retrieval", 2014)

"A knowledge representation technique that represents the relationships among objects" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A knowledge base that represents semantic relations between concepts. Formally, the underlying representation model is a directed graph consisting of nodes, which represent concepts, and links, which represent semantic relations between concepts, mapping or connecting semantic fields." (Dmitry Korzun et al, "Semantic Methods for Data Mining in Smart Spaces", 2019)

"A knowledge base that represents semantic relations between concepts in a network. The model of knowledge representation is based on a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields." (Svetlana E Yalovitsyna et al, "Smart Museum: Semantic Approach to Generation and Presenting Information of Museum Collections", 2020)

06 March 2018

🔬Data Science: Bayesian Network (Definitions)

"A mathematic model in graphic form that represents a set of variables and their probabilistic independencies. It can be used, for example, to calculate the probability of a patient having a specific disease." (Attila Benko & Cecília S Lányi, "History of Artificial Intelligence", 2009) 

"A Bayesian network is a set of causally interrelated variables represented graphically in which the input information is generally subjective and can be updated in light of empirical data, by using Bayes’ theorem." (Herbert I Weisberg, "Bias and Causation: Models and Judgment for Valid Comparisons", 2010)

"A type of neural network. The Bayesian network is based on the fundamentals of probability theory." (Meta S Brown, "Data Mining For Dummies", 2014)

"A Bayesian network is a directed acyclical graph (there are no cycles in the graph) that is composed of three basic elements: 
nodes: each feature in a domain is represented by a single node in the graph.
edges: nodes are connected by directed links; the connectivity of the links in a graph encodes the influence and conditional independence relationships between nodes. 
conditional probability tables: each node has a conditional probability table (CPT) associated with it. A CPT lists the probability distribution of the feature represented by the node conditioned on the features represented by the other nodes to which a node is connected by edges." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 
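The three elements listed above (nodes, directed edges, and conditional probability tables) can be sketched with a three-node network in which Rain and Sprinkler both point to WetGrass. The variables and every probability value are made-up assumptions for illustration, and the query is answered by brute-force enumeration rather than by any specific library.

```python
from itertools import product

# Root nodes have unconditional tables (hypothetical values).
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}

# CPT of WetGrass: P(WetGrass=True | Rain, Sprinkler), one row per
# combination of parent states (hypothetical values).
p_wet_given = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factored along the edges of the graph."""
    p_wet = p_wet_given[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (p_wet if wet else 1 - p_wet)


def posterior_rain_given_wet():
    """P(Rain=True | WetGrass=True), summing out Sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(f"P(rain | wet grass) = {posterior_rain_given_wet():.3f}")
```

Enumeration is only practical for tiny networks, but it shows how the tables attached to each node combine into answers to probabilistic queries.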

"A representation of knowledge in the form of a directed acyclic graph representing random variables as nodes and their conditional dependencies as edges." (Petr Berka, "Machine Learning", 2015)

"They are acyclic graphical models that capture conditional dependence among random variables. Each node is associated with a function that gives the probability of finding the variable in a given state, given particular states of its parent variables." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"A graph model representing random variables with their conditional dependencies." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A particular type of statistical model that represents a set of variables and their conditional dependencies. It is usually used to make previsions in a great variety of events." (Gaetano B Ronsivalle & Arianna Boldi, "Artificial Intelligence Applied: Six Actual Projects in Big Organizations", 2019)

"A model that represents and calculates the probabilistic relationships between a set of random variables and an uncertain domain via a directed acyclic graph." (Accenture)

"Bayesian Neural Networks (BNNs) refers to extending standard networks with posterior inference in order to control over-fitting. From a broader perspective, the Bayesian approach uses the statistical methodology so that everything has a probability distribution attached to it, including model parameters (weights and biases in neural networks). In programming languages, variables that can take a specific value will turn the same result every-time you access that specific variable." (Databricks) [source]

05 March 2018

🔬Data Science: Business Analysis (Definitions)

 "(1) The study of business processes, practices and business systems requirements. (2) The application of information to better understand business opportunities and challenges." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A set of tools and methods used for execrating business insight making from the available data or system structure. It provide meaningful information with dynamic and sophisticate methods of problem solving such as optimization." (Shokoufeh Mirzaei, Defining a Business-Driven Optimization Problem, 2014)

"Business analytics is the combination of skills, technologies, applications, and processes used by organizations to gain insight into their business-based data and statistics to drive business planning." (K Hariharanath, "BIG Data: An Enabler in Developing Business Models in Cloud Computing Environments", 2019)

"It is the process of working with factual information in organizations, using suitable tools and techniques to identify the nuggets of wisdom (insights) from them that can have direct impact on influencing good decision making." (Tanushri Banerjee & Arindam Banerjee, "Designing a Business Analytics Culture in Organizations in India", 2021)

"Business analysis is the means through which operational problems and issues are systematically identified and investigated, different approaches are evaluated, and optimal solutions are determined." (Qlik) [source]

"The set of tasks, knowledge, tools and techniques required to identify business needs and determine solutions to business problems" (Business Analysis BOK) 

04 March 2018

🔬Data Science: Delphi Method (Definitions)

"A qualitative forecasting method that seeks to use the judgment of experts systematically in arriving at a forecast of what future events will be or when they may occur. It brings together a group of experts who have access to each other's opinions in an environment where no majority opinion is disclosed." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"A systematic forecasting practice that seeks input or advice from a panel of experts. Each expert provides their forecast input in a successive series of rounds, until consensus is achieved." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"A systematic, interactive forecasting method that relies on a panel of experts. The experts answer questionnaires in two or more rounds. After each round, a facilitator provides an anonymous summary of the experts’ forecasts from the previous round as well as the reasons they provided for their judgments." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"Data collection method that happens in an anonymous fashion." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A structured communication technique used to conduct interactive forecasting. It involves a panel of experts." (IQBBA)

🔬Data Science: Descriptive Statistics (Definitions)

"Numbers that summarize how questionnaire items were answered. Descriptive statistics include frequency, percentage, cumulative frequency, and cumulative percentage." (Teri Lund & Susan Barksdale, "10 Steps to Successful Strategic Planning", 2006)

"Statistics that characterize the central tendency, variability, and shape of a variable." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"Describe the values in a set. For example, if you sum a set of values, that sum is a descriptive statistic. If you find the largest value or the smallest value in a set of numbers, that’s also a descriptive statistic." (E C Nelson & Stephen L Nelson, "Excel Data Analysis For Dummies ", 2015)

"Those statistics or statistical procedures that summarise and/or describe the characteristics of a sample of scores." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

🔬Data Science: Fuzzy Rule (Definitions)

"A conditional of the form IF X IS A, THEN Y IS B where A and B are fuzzy sets. In mathematical terms a rule is a relation between fuzzy sets. Each rule defines a fuzzy patch (the product A x B) in the system 'state space'. The wider the fuzzy sets A and B, the wider and more uncertain the fuzzy patch. Fuzzy rules are the knowledge-building blocks in a fuzzy system. In mathematical terms each fuzzy rule acts as an associative memory that associates the fuzzy response B with the fuzzy stimulus A." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps" 2nd Ed., 2000)

"In general, in rule-based systems, rules look something like: If A1 and A2 and … An then C1 and C2 and … Cm; where the Ai are the antecedents (conditions) on the left hand side (LHS) of the rule and the Cj are the consequents (conclusions) on the right hand side (RHS) of the rule. In this format, if all of the antecedents on the LHS of the rule are true then the rule will fire and the consequents will be asserted / executed. With Fuzzy rules both antecedents and conclusions can be of fuzzy nature." (Juan R González et al, Nature-Inspired Cooperative Strategies for Optimization, 2008)

"Fuzzy If-Then or fuzzy conditional statements are expressions of the form 'If A Then B', where A and B are labels of fuzzy sets characterised by appropriate membership functions. Due to their concise form, fuzzy If-Then rules are often employed to capture the imprecise modes of reasoning that play an essential role in the human ability to make decision in an environment of uncertainty and imprecision. The set of If-Then rules relate to a fuzzy logic system that are stored together is called a Fuzzy Rule Base." (Masoud Mohammadian, Supervised Learning of Fuzzy Logic Systems, 2009)

02 March 2018

🔬Data Science: Hash Function (Definitions)

"A function that maps a set of keys onto a set of addresses." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A function that maps a string of arbitrary length to a fixed size value in a deterministic manner. Such a function may or may not have cryptographic applications." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

[cryptographic hash function:] "A function that takes an input string of arbitrary length and produces a fixed-size output for which it is unfeasible to find two inputs that map to the same output, and it is unfeasible to learn anything about the input from the output." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

[one-way hash function:] "A hash function for which it is computationally unfeasible to determine anything about the input from the output." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"A function that operates on an arbitrary-length input value and returns a fixed-length hash value." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

[one-way hash:] "A one-way hash is an algorithm that transforms one string into another string (a fixed-length sequence of seemingly random characters) in such a way that the original string cannot be calculated by operations on the one-way hash value (i.e., the calculation is one way only). One-way hash values can be calculated for any string, including a person’s name, a document, or an image. For any input string, the resultant one-way hash will always be the same. If a single byte of the input string is modified, the resulting one-way hash will be changed and will have a totally different sequence than the one-way hash sequence calculated for the unmodified string. One-way hash values can be made sufficiently long (e.g., 256 bits) that a hash string collision (i.e., the occurrence of two different input strings with the same one-way hash output value) is negligible." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)
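The properties described above (fixed-length output, the same output for the same input, and a different output after a small change) can be observed with SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the sample strings are assumptions for illustration; the helper name `one_way_hash` is hypothetical.

```python
import hashlib


def one_way_hash(text):
    """Return the SHA-256 digest of text as 64 hexadecimal characters (256 bits)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = one_way_hash("Jules H Berman")
again = one_way_hash("Jules H Berman")
altered = one_way_hash("Jules H Bermam")  # one character changed

print(first)
print(first == again)    # same input, same hash
print(first == altered)  # modified input, different hash
```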

"A hash function is an algorithm that maps from an input, for example, a string of characters, to an output string. The size of the input can vary, but the size of the output is always the same." (Dan Sullivan, "NoSQL for Mere Mortals®", 2015)

[one-way hash:] "Cryptographic process that takes an arbitrary amount of data and generates a fixed-length value. Used for integrity protection." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A function that takes as input the key of an element and produces an integer as output" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

"encryption methods that use no keys." (Manish Agrawal, "Information Security and IT Risk Management", 2014)

"A function that operates on an arbitrary-length input value and returns a fixed-length hash value." (Oracle, "Oracle Database Concepts")

28 February 2018

🔬Data Science: Inference (Definitions)

"Drawing some form of conclusion about a measurable functional response based on representative or sample experimental data. Sample size, uncertainty, and the laws of probability play a major role in making inferences." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"Reasoning from known propositions." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"In general, inference is the act or process of deriving new facts from facts known or assumed to be true. In Artificial Intelligence, researchers develop automated inference engines to automate human inference." (Michael Fellmann et al, "Supporting Semantic Verification of Process Models", 2012)

[statistical inference:] "A method that uses sample data to draw conclusions about a population." (Geoff Cumming, "Understanding The New Statistics", 2013)
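As a minimal sketch of using sample data to draw a conclusion about a population, the following estimates a population mean with an approximate 95% confidence interval. The sample values and the function name `mean_confidence_interval` are hypothetical, and the interval uses a normal approximation as a simplifying assumption; for a sample this small, a t-based interval would usually be preferred.

```python
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, mean, high) using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean, mean + z * standard_error


sample = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3]  # hypothetical measurements
low, mean, high = mean_confidence_interval(sample)
print(f"population mean estimated as {mean:.2f} ({low:.2f} to {high:.2f})")
```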

"Any conclusion drawn on the basis of some set of information. In research, we draw inferences on the basis of empirical data we collect and ideas we construct." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

[causal inference:] "Conclusion that changes in the independent variable resulted in a change in the dependent variable. It may be drawn only if all potential confounding variables are properly controlled." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"The process of using a probabilistic model to answer a query, given evidence." (Avi Pfeffer, "Practical Probabilistic Programming", 2016)

[inductive inference:] "A machine learning method for learning the rules that produced the actual data." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"The ability to derive information not explicitly available." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

27 February 2018

🔬Data Science: Data Modeling (Definitions)

"The task of developing a data model that represents the persistent data of some enterprise." (Keith Gordon, "Principles of Data Management", 2007)

"An analysis and design method, building data models to 
a) define and analyze data requirements,
b) design logical and physical data structures that support these requirements, and
c) define business and technical meta-data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of creating a data model by applying formal data model descriptions using data modeling techniques." (Christian Galinski & Helmut Beckmann, "Concepts for Enhancing Content Quality and eAccessibility: In General and in the Field of eProcurement", 2012)

"The process of creating the abstract representation of a subject so that it can be studied more cheaply (a scale model of an airplane in a wind tunnel), at a particular moment in time (weather forecasting), or manipulated, modified, and altered without disrupting the original (economic model)." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition", 2017)

"A method used to define and analyze the data requirements needed to support an entity’s business processes, defining the relationship between data elements and structures." (Solutions Review)

"A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and data structures." (Microstrategy)

"A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and structures." (Information Management)

"Refers to the process of defining, analyzing, and structuring data within data models." (Insight Software)

"Data modeling is a way of mapping out and visualizing all the different places that a software or application stores information, and how these sources of data will fit together and flow into one another." (Sisense) [source]

"Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow. The diagram can be used to ensure efficient use of data, as a blueprint for the construction of new software or for re-engineering a legacy application." (Techtarget) [source]

24 February 2018

💎SQL Reloaded: Misusing Views and Pseudo-Constants

Views, as virtual tables, can be misused to replace tables in certain circumstances by storing values within one or multiple rows, as in the examples below:

-- parameters for a BI solution
CREATE VIEW dbo.vLoV_Parameters
AS
SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId
 , Cast(GetDate() as Date) AS CurrentDate 
 , Cast(100 as int) AS BatchCount 

GO

SELECT *
FROM dbo.vLoV_Parameters

GO

-- values for a dropdown 
CREATE VIEW dbo.vLoV_DataAreas
AS
SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId
, Cast('Company ABC' as nvarchar(50)) AS Description 
UNION ALL
SELECT 'XYZ' DataAreaId 
, 'Company XYZ'

GO

SELECT *
FROM dbo.vLoV_DataAreas

GO

These solutions aren’t elegant and are typically not recommended because they go against one of the principles of good database design, namely “data belong in tables”, though they do the trick when needed. Personally, I have used them only in a handful of cases, e.g. when creating tables wasn’t allowed, when something needed to be tested for a short period of time, or when creating a table for 2-3 values involved too much overhead. Because of their scarce use, I hadn’t given them much thought until I discovered Jared Ko’s blog post on pseudo-constants. He considers the values from the first view as pseudo-constants and advocates their use, especially for easier dependency tracking, easier code refactoring, avoiding implicit data conversion and easier maintenance of values.
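To make the idea more concrete, here is a minimal sketch of how the values from the first view could be consumed as pseudo-constants. It assumes the view dbo.vLoV_Parameters created above; the inline ORD set and its columns are hypothetical and serve only for illustration. Because the view returns exactly one row, the CROSS JOIN doesn't multiply the records.

```sql
-- sketch: consuming the parameters view as pseudo-constants
-- (the inline ORD set and its columns are hypothetical)
SELECT ORD.OrderId
, ORD.DataAreaId
, ORD.OrderDate
FROM (VALUES (1, Cast('ABC' as nvarchar(20)), Cast('2018-02-01' as date))
, (2, Cast('XYZ' as nvarchar(20)), Cast('2018-02-10' as date))
, (3, Cast('ABC' as nvarchar(20)), Cast('2018-02-20' as date))
) ORD(OrderId, DataAreaId, OrderDate)
     CROSS JOIN dbo.vLoV_Parameters PRM -- single-row view
WHERE ORD.DataAreaId = PRM.DataAreaId
  AND ORD.OrderDate <= PRM.CurrentDate
```

The explicit casts in the view and in the inline set keep the compared columns on the same data types, which is one way of avoiding implicit conversions.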

All these are good reasons to consider them, therefore I tried to take the idea further to see if it survives a reality check. For this I took Dynamics AX as the testing environment, as it makes extensive use of enumerations (aka base enums) to store the lists of values needed throughout the application. Behind each table there are one or more enumerations, and the tables storing master data abound in them. As an example, let’s consider InventTrans, the table that stores the inventory transactions; the logic behind the receipt and issue transactions is governed by three enumerations: StatusIssue, StatusReceipt and Direction.

-- Status Issue Enumeration 
CREATE VIEW dbo.vLoV_StatusIssue
AS
SELECT cast(0 as int) AS None
, cast(1 as int) AS Sold
, cast(2 as int) AS Deducted
, cast(3 as int) AS Picked
, cast(4 as int) AS ReservPhysical
, cast(5 as int) AS ReservOrdered
, cast(6 as int) AS OnOrder
, cast(7 as int) AS QuotationIssue

GO

-- Status Receipt Enumeration 
CREATE VIEW dbo.vLoV_StatusReceipt
AS
SELECT cast(0 as int) AS None
, cast(1 as int) AS Purchased
, cast(2 as int) AS Received
, cast(3 as int) AS Registered
, cast(4 as int) AS Arrived
, cast(5 as int) AS Ordered
, cast(6 as int) AS QuotationReceipt

GO

-- Inventory Direction Enumeration 
CREATE VIEW dbo.vLoV_InventDirection
AS
SELECT cast(0 as int) AS None
, cast(1 as int) AS Receipt
, cast(2 as int) AS Issue

To see these views at work let’s construct the InventTrans table on the fly:

-- creating an ad-hoc table  
SELECT *
INTO  dbo.InventTrans
FROM (VALUES (1, 1, 0, 2, -1, 'A0001')
, (2, 1, 0, 2, -10, 'A0002')
, (3, 2, 0, 2, -6, 'A0001')
, (4, 2, 0, 2, -3, 'A0002')
, (5, 3, 0, 2, -2, 'A0001')
, (6, 1, 0, 1, 1, 'A0001')
, (7, 0, 1, 1, 50, 'A0001')
, (8, 0, 2, 1, 100, 'A0002')
, (9, 0, 3, 1, 30, 'A0003')
, (10, 0, 3, 1, 20, 'A0004')
, (11, 0, 1, 2, 10, 'A0001')
) A(TransId, StatusIssue, StatusReceipt, Direction, Qty, ItemId)

Here are two sets of examples using literals vs. pseudo-constants:

--example issued with literals 
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusIssue = 1 
  AND ITR.Direction = 2

GO
--example issued with pseudo-constants
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
      JOIN dbo.vLoV_StatusIssue SI
        ON ITR.StatusIssue = SI.Sold
      JOIN dbo.vLoV_InventDirection ID
        ON ITR.Direction = ID.Issue

GO

--example receipt with literals 
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt= 1
   AND ITR.Direction = 1

GO

--example receipt with pseudo-constants
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
      JOIN dbo.vLoV_StatusReceipt SR
        ON ITR.StatusReceipt= SR.Purchased
      JOIN dbo.vLoV_InventDirection ID
        ON ITR.Direction = ID.Receipt

As can be seen, the queries using pseudo-constants make the code somewhat more readable, though the gain is only relative, as each enumeration implies an additional join. In addition, when further business tables are added (e.g. items, purchases or sales orders), the joins complicate the logic, making it more difficult to separate the essential from the nonessential. Imagine a translation of the following query:

-- complex query 
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
              <several tables here>
WHERE ((ITR.StatusReceipt<=3 AND ITR.Direction = 1)
  OR (ITR.StatusIssue<=3 AND ITR.Direction = 2))
  AND (<more constraints here>)


The more complex the constraints in the WHERE clause, the less likely it is that the literals can be translated into pseudo-constants. Considering that an average query contains 5-10 tables, each of them with 1-3 enumerations, queries using pseudo-constants would become impractical, and their execution plans quite difficult to troubleshoot.
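For illustration, here is one possible translation of the two status constraints from the complex query, a sketch that assumes the three enumeration views and the ad-hoc InventTrans table defined earlier and leaves out the further tables and constraints. Since the conditions are range comparisons combined with OR, the views are attached via CROSS JOIN (each returns a single row) and the pseudo-constants move into the WHERE clause:

```sql
-- sketch: status constraints of the complex query with pseudo-constants
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
     CROSS JOIN dbo.vLoV_StatusReceipt SR   -- single-row view
     CROSS JOIN dbo.vLoV_StatusIssue SI     -- single-row view
     CROSS JOIN dbo.vLoV_InventDirection ID -- single-row view
WHERE (ITR.StatusReceipt <= SR.Registered AND ITR.Direction = ID.Receipt) -- Registered = 3
   OR (ITR.StatusIssue <= SI.Picked AND ITR.Direction = ID.Issue)         -- Picked = 3
```

Even for a single business table, three additional joins are needed, and the comparison against Registered or Picked still requires knowing the order of the values within the enumeration.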

The more I think about it, the more an enumeration data type available as a global variable in SQL Server (like the ones available in VB) would be welcome, especially because the values are used over and over again across queries. Imagine, for example, the possibility of writing code as follows:

-- hypothetical query
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt = @@StatusReceipt.Purchased
  AND ITR.Direction = @@InventDirection.Receipt

From my point of view this would make the code more readable and easier to maintain. Instead, to make the code more readable, one is usually forced to add comments to it. This works as well, though the code can become cluttered with comments.

-- query with commented literals
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt <=3 -- Purchased, Received, Registered 
   AND ITR.Direction = 1 -- Receipt

In conclusion, the usefulness of pseudo-constants is limited, and their usage goes against developers’ common sense. However, a data type in SQL Server with similar functionality would make code more readable and easier to maintain.

Notes:
1) It is possible to simulate an enumeration data type in a table’s definition by using a CHECK constraint.
2) The queries also work in SQL databases in Microsoft Fabric (see file in GitHub repository). You might want to use another schema (e.g. Test) so as not to interfere with the existing code.
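The first note can be sketched as follows. The table name dbo.InventTransChecked and the constraint names are hypothetical; the allowed values are the ones from the enumeration views above.

```sql
-- sketch: simulating enumerations via CHECK constraints
-- (table and constraint names are hypothetical)
CREATE TABLE dbo.InventTransChecked (
  TransId int NOT NULL
, StatusIssue int NOT NULL
, StatusReceipt int NOT NULL
, Direction int NOT NULL
, Qty int NOT NULL
, ItemId nvarchar(20) NOT NULL
, CONSTRAINT CK_InventTransChecked_StatusIssue
    CHECK (StatusIssue BETWEEN 0 AND 7)   -- None .. QuotationIssue
, CONSTRAINT CK_InventTransChecked_StatusReceipt
    CHECK (StatusReceipt BETWEEN 0 AND 6) -- None .. QuotationReceipt
, CONSTRAINT CK_InventTransChecked_Direction
    CHECK (Direction IN (0, 1, 2))        -- None, Receipt, Issue
)

GO

-- accepted: Direction = 2 (Issue)
INSERT INTO dbo.InventTransChecked (TransId, StatusIssue, StatusReceipt, Direction, Qty, ItemId)
VALUES (1, 1, 0, 2, -1, 'A0001')

GO

-- rejected: Direction = 3 violates the CHECK constraint
INSERT INTO dbo.InventTransChecked (TransId, StatusIssue, StatusReceipt, Direction, Qty, ItemId)
VALUES (2, 1, 0, 3, -1, 'A0001')
```

The constraints only restrict the values that can be stored; they don't give the values names, so the queries still have to rely on literals or comments.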

Happy coding!

19 February 2018

🔬Data Science: Data Exploration (Definitions)

Data exploration: "The process of examining data in order to determine ranges and patterns within the data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

Data Exploration: "The part of the data science process where a scientist will ask basic questions that helps her understand the context of a data set. What you learn during the exploration phase will guide more in-depth analysis later. Further, it helps you recognize when a result might be surprising and warrant further investigation." (KDnuggets)

"Data exploration is the first step of data analysis used to explore and visualize data to uncover insights from the start or identify areas or patterns to dig into more." (Tibco) [source]

"Data exploration is the initial step in data analysis, where users explore a large data set in an unstructured way to uncover initial patterns, characteristics, and points of interest. This process isn’t meant to reveal every bit of information a dataset holds, but rather to help create a broad picture of important trends and major points to study in greater detail." (Sisense) [source]

"Data exploration is the process through which a data analyst investigates the characteristics of a dataset to better understand the data contained within and to define basic metadata before building a data model. Data exploration helps the analyst choose the most appropriate tool for data processing and analysis, and leverages the innate human ability to recognize patterns in data that may not be captured by analytics tools." (Qlik) [source]

"Data exploration provides a first glance analysis of available data sources. Rather than trying to deliver precise insights such as those that result from data analytics, data exploration focuses on identifying key trends and significant variables." (Xplenty) [source]
