"The set of tasks, knowledge, tools and techniques required to identify business needs and determine solutions to business problems" (Business Analysis BOK)
A Software Engineer and data professional's blog on SQL, data, databases, data architectures, data management, programming, Software Engineering, Project Management, ERP implementation and other IT related topics.
Pages
- 🏠Home
- 🗃️Definitions
- 🔢SQL Server
- 🎞️SQL Server: VoD
- 🏭Fabric
- 🎞️Fabric: VoD
- ⚡Power BI
- 🎞️Power BI: VoD
- 📚Data
- 📚Engineering
- 📚Management
- 📚SQL Server
- 🎞️D365: VoD
- 📚Systems Thinking
- ✂...Quotes
- 🧾D365: GL
- 💸D365: AP
- 💰D365: AR
- 🏠D365: FA
- 👥D365: HR
- ⛓️D365: SCM
- 🔤Acronyms
- 🪢Experts
- 🗃️Quotes
- 🔠Dataviz & BI
- 🔠D365
- 🔠Fabric
- 🔠Engineering
- 🔠Management
- 🔡Glossary
- 🌐Resources
- 🏺Dataviz
- 🗺️Social
- 📅Events
- ℹ️ About
05 March 2018
🔬Data Science: Business Analysis (Definitions)
04 March 2018
🔬Data Science: Delphi Method (Definitions)
"A qualitative forecasting method that seeks to use the judgment of experts systematically in arriving at a forecast of what future events will be or when they may occur. It brings together a group of experts who have access to each other's opinions in an environment where no majority opinion is disclosed." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)
"A systematic forecasting practice that seeks input or advice from a panel of experts. Each expert provides their forecast input in a successive series of rounds, until consensus is achieved." (Steven Haines, "The Product Manager's Desk Reference", 2008)
"A systematic, interactive forecasting method that relies on a panel of experts. The experts answer questionnaires in two or more rounds. After each round, a facilitator provides an anonymous summary of the experts’ forecasts from the previous round as well as the reasons they provided for their judgments." (Project Management Institute, "Practice Standard for Project Estimating", 2010)
"Data collection method that happens in an anonymous fashion." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)
"A structured communication technique used to conduct interactive forecasting. It involves a panel of experts." (IQBBA)
🔬Data Science: Descriptive Statistics (Definitions)
"Numbers that summarize how questionnaire items were answered. Descriptive statistics include frequency, percentage, cumulative frequency, and cumulative percentage." (Teri Lund & Susan Barksdale, "10 Steps to Successful Strategic Planning", 2006)
"Statistics that characterize the central tendency, variability, and shape of a variable." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)
"Describe the values in a set. For example, if you sum a set of values, that sum is a descriptive statistic. If you find the largest value or the smallest value in a set of numbers, that’s also a descriptive statistic." (E C Nelson & Stephen L Nelson, "Excel Data Analysis For Dummies ", 2015)
"Those statistics or statistical procedures that summarise and/or describe the characteristics of a sample of scores." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)
🔬Data Science: Fuzzy Rule (Definitions)
"A conditional of the form IF X IS A, THEN Y IS B where A and B are fuzzy sets. In mathematical terms a rule is a relation between fuzzy sets. Each rule defines a fuzzy patch (the product A x B) in the system 'state space'. The wider the fuzzy sets A and B, the wider and more uncertain the fuzzy patch. Fuzzy rules are the knowledge-building blocks in a fuzzy system. In mathematical terms each fuzzy rule acts as an associative memory that associates the fuzzy response B with the fuzzy stimulus A." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps" 2nd Ed., 2000)
"In general, in rule-based systems, rules look something like: If A1 and A2 and … An then C1 and C2 and … Cm; where the Ai are the antecedents (conditions) on the left hand side (LHS) of the rule and the Cj are the consequents (conclusions) on the right hand side (RHS) of the rule. In this format, if all of the antecedents on the LHS of the rule are true then the rule will fire and the consequents will be asserted / executed. With Fuzzy rules both antecedents and conclusions can be of fuzzy nature." (Juan R González et al, Nature-Inspired Cooperative Strategies for Optimization, 2008)
"Fuzzy If-Then or fuzzy conditional statements are expressions of the form 'If A Then B', where A and B are labels of fuzzy sets characterised by appropriate membership functions. Due to their concise form, fuzzy If-Then rules are often employed to capture the imprecise modes of reasoning that play an essential role in the human ability to make decision in an environment of uncertainty and imprecision. The set of If-Then rules relate to a fuzzy logic system that are stored together is called a Fuzzy Rule Base." (Masoud Mohammadian, Supervised Learning of Fuzzy Logic Systems, 2009)
02 March 2018
🔬Data Science: Hash Function (Definition)
"A function that maps a set of keys onto a set of addresses." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)
"A function that maps a string of arbitrary length to a fixed size value in a deterministic manner. Such a function may or may not have cryptographic applications." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)
[cryptographic hash function:] "A function that takes an input string of arbitrary length and produces a fixed-size output for which it is unfeasible to find two inputs that map to the same output, and it is unfeasible to learn anything about the input from the output." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)
[one-way hash function:] "A hash function for which it is computationally unfeasible to determine anything about the input from the output." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)
"A function that operates on an arbitrary-length input value and returns a fixed-length hash value." (Oracle, "Database SQL Tuning Guide Glossary", 2013)
[one-way hash:] "A one-way hash is an algorithm that transforms one string into another string (a fixed-length sequence of seemingly random characters) in such a way that the original string cannot be calculated by operations on the one-way hash value (i.e., the calculation is one way only). One-way hash values can be calculated for any string, including a person’s name, a document, or an image. For any input string, the resultant one-way hash will always be the same. If a single byte of the input string is modified, the resulting one-way hash will be changed and will have a totally different sequence than the one-way hash sequence calculated for the unmodified string. One-way hash values can be made sufficiently long (e.g., 256 bits) that a hash string collision (i.e., the occurrence of two different input strings with the same one-way hash output value) is negligible." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)
"A hash function is an algorithm that maps from an input, for example, a string of characters, to an output string. The size of the input can vary, but the size of the output is always the same." (Dan Sullivan, "NoSQL for Mere Mortals®", 2015)
[one-way hash:] "Cryptographic process that takes an arbitrary amount of data and generates a fixed-length value. Used for integrity protection." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)
"A function that takes as input the key of an element and produces an integer as output" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)
"encryption methods that use no keys." (Manish Agrawal, "Information Security and IT Risk Management", 2014)
"A function that operates on an arbitrary-length input value and returns a fixed-length hash value." (Oracle, "Oracle Database Concepts")
28 February 2018
🔬Data Science: Inference (Definitions)
"Drawing some form of conclusion about a measurable functional response based on representative or sample experimental data. Sample size, uncertainty, and the laws of probability play a major role in making inferences." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)
"Reasoning from known propositions." (DAMA International, "The DAMA Dictionary of Data Management", 2011)
"In general, inference is the act or process of deriving new facts from facts known or assumed to be true. In Artificial Intelligence, researchers develop automated inference engines to automate human inference." (Michael Fellmann et al, "Supporting Semantic Verification of Process Models", 2012)
[statistical inference:] "A method that uses sample data to draw conclusions about a population." (Geoff Cumming, "Understanding The New Statistics", 2013)
"Any conclusion drawn on the basis of some set of information. In research, we draw inferences on the basis of empirical data we collect and ideas we construct." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)
[causal inference:] "Conclusion that changes in the independent variable resulted in a change in the dependent variable. It may be drawn only if all potential confounding variables are properly controlled." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)
"The process of using a probabilistic model to answer a query, given evidence." (Avi Pfeffer, "Practical Probabilistic Programming", 2016)
[inductive inference] "A machine learning method for learning the rules that produced the actual data." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)
"The ability to derive information not explicitly available." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)
27 February 2018
🔬Data Science: Data Modeling (Definitions)
24 February 2018
💎SQL Reloaded: Misusing Views and Pseudo-Constants
Views as virtual tables can be misused to replace tables in certain circumstances, either by storing values within one or multiple rows, like in the below examples:
-- parameters for a BI solution CREATE VIEW dbo.vLoV_Parameters AS SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId , Cast(GetDate() as Date) AS CurrentDate , Cast(100 as int) AS BatchCount GO SELECT * FROM dbo.vLoV_Parameters GO -- values for a dropdown CREATE VIEW dbo.vLoV_DataAreas AS SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId , Cast('Company ABC' as nvarchar(50)) AS Description UNION ALL SELECT 'XYZ' DataAreaId , 'Company XYZ' GO SELECT * FROM dbo.vLoV_DataAreas GO
These solutions aren’t elegant, and typically not recommended because they go against one of the principles of good database design, namely “data belong in tables”, though they do the trick when needed. Personally, I used them only in a handful of cases, e.g. when it wasn’t allowed to create tables, when it was needed testing something for a short period of time, or when there was some overhead of creating a table for 2-3 values. Because of their scarce use, I haven’t given them too much thought, not until I discovered Jared Ko’s blog posting on pseudo-constants. He considers the values from the first view as pseudo-constants, and advocates for their use especially for easier dependency tracking, easier code refactoring, avoiding implicit data conversion and easier maintenance of values.
All these are good reasons to consider them, therefore I tried to take further the idea to see if it survives a reality check. For this I took Dynamics AX as testing environment, as it makes extensive use of enumerations (aka base enums) to store list of values needed allover through the application. Behind each table there are one or more enumerations, the tables storing master data abounding of them.
For exemplification let’s consider InventTrans, table that stores the inventory transactions, the logic that governs the receipt and issued transactions are governed by three enumerations: StatusIssue, StatusReceipt and Direction.
-- Status Issue Enumeration CREATE VIEW dbo.vLoV_StatusIssue AS SELECT cast(0 as int) AS None , cast(1 as int) AS Sold , cast(2 as int) AS Deducted , cast(3 as int) AS Picked , cast(4 as int) AS ReservPhysical , cast(5 as int) AS ReservOrdered , cast(6 as int) AS OnOrder , cast(7 as int) AS QuotationIssue GO -- Status Receipt Enumeration CREATE VIEW dbo.vLoV_StatusReceipt AS SELECT cast(0 as int) AS None , cast(1 as int) AS Purchased , cast(2 as int) AS Received , cast(3 as int) AS Registered , cast(4 as int) AS Arrived , cast(5 as int) AS Ordered , cast(6 as int) AS QuotationReceipt GO -- Inventory Direction Enumeration CREATE VIEW dbo.vLoV_InventDirection AS SELECT cast(0 as int) AS None , cast(1 as int) AS Receipt , cast(2 as int) AS Issue
To see these views at work let’s construct the InventTrans table on the fly:
-- creating an ad-hoc table SELECT * INTO dbo.InventTrans FROM (VALUES (1, 1, 0, 2, -1, 'A0001') , (2, 1, 0, 2, -10, 'A0002') , (3, 2, 0, 2, -6, 'A0001') , (4, 2, 0, 2, -3, 'A0002') , (5, 3, 0, 2, -2, 'A0001') , (6, 1, 0, 1, 1, 'A0001') , (7, 0, 1, 1, 50, 'A0001') , (8, 0, 2, 1, 100, 'A0002') , (9, 0, 3, 1, 30, 'A0003') , (10, 0, 3, 1, 20, 'A0004') , (11, 0, 1, 2, 10, 'A0001') ) A(TransId, StatusIssue, StatusReceipt, Direction, Qty, ItemId)
Here are two sets of examples using literals vs. pseudo-constants:
--example issued with literals SELECT top 100 ITR.* FROM dbo.InventTrans ITR WHERE ITR.StatusIssue = 1 AND ITR.Direction = 2 GO --example issued with pseudo-constants SELECT top 100 ITR.* FROM dbo.InventTrans ITR JOIN dbo.vLoV_StatusIssue SI ON ITR.StatusIssue = SI.Sold JOIN dbo.vLoV_InventDirection ID ON ITR.Direction = ID.Issue GO --example receipt with literals SELECT top 100 ITR.* FROM dbo.InventTrans ITR WHERE ITR.StatusReceipt= 1 AND ITR.Direction = 1 GO --example receipt with pseudo-constants SELECT top 100 ITR.* FROM dbo.InventTrans ITR JOIN dbo.vLoV_StatusReceipt SR ON ITR.StatusReceipt= SR.Purchased JOIN dbo.vLoV_InventDirection ID ON ITR.Direction = ID.Receipt
As can be seen the queries using pseudo-constants make the code somehow readable, though the gain is only relative, each enumeration implying an additional join. In addition, when further business tables are added to the logic (e.g. items, purchases or sales orders) it complicates the logic, making it more difficult to separate the essential from nonessential. Imagine a translation of the following query:
-- complex query SELECT top 100 ITR.* FROM dbo.InventTrans ITR <several tables here> WHERE ((ITR.StatusReceipt<=3 AND ITR.Direction = 1) OR (ITR.StatusIssue<=3 AND ITR.Direction = 2)) AND (<more constraints here>)
The more difficult the constraints in the WHERE clause, the more improbable is a translation of the literals into pseudo-constraints. Considering that an average query contains 5-10 tables, each of them with 1-3 enumerations, the queries would become impracticable by using pseudo-constants and quite difficult to troubleshoot their execution plans.
The more I’m thinking about, an enumeration data type as global variable in SQL Server (like the ones available in VB) would be more than welcome, especially because values are used over and over again through the queries. Imagine, for example, the possibility of writing code as follows:
-- hypothetical query SELECT top 100 ITR.* FROM dbo.InventTrans ITR WHERE ITR.StatusReceipt = @@StatusReceipt .Purchased AND ITR.Direction = @@InventDirection.Receipt
From my point of view this would make the code more readable and easier to maintain. Instead, in order to make the code more readable, one’s usually forced to add some comments in the code. This works as well, though the code can become full of comments.
-- query with commented literals SELECT top 100 ITR.* FROM dbo.InventTrans ITR WHERE ITR.StatusReceipt <=3 -- Purchased, Received, Registered AND ITR.Direction = 1-- Receip
In conclusion, pseudo-constants’ usefulness is only limited, and their usage is against developers’ common sense, however a data type in SQL Server with similar functionality would make code more readable and easier to maintain.
Notes:
1) It is possible to simulate an enumeration data type in tables’ definition by using a CHECK constraint.
2) The queries work also in SQL databases in Microsoft Fabric (see file in GitHub repository). You might want to use another schema (e.g. Test), not to interfere with the existing code.
Happy coding!
19 February 2018
🔬Data Science: Data Exploration (Definitions)
15 February 2018
🔬Data Science: Data Preparation (Definitions)
🔬Data Science: Data Augmentation (Definitions)
"1.The process of adding to something to make it more or greater than the original. 2.In logic, a relationship where if X leads to Y, then XZ will lead to YZ." (DAMA International, "The DAMA Dictionary of Data Management", 2011)
"A technique for improving the performance of a model by enriching the training data, e.g. by generating additional instances of minority classes." (Vincent Karas & Björn W Schuller, "Deep Learning for Sentiment Analysis: An Overview and Perspectives", 2021)
🔬Data Science: Feature Extraction (Definitions)
"A technique that attempts to combine or transform predictors to make clear the information contained within them. Feature extraction methods include factor analysis, principal components analysis, correspondence analysis, multidimensional scaling, partial least square methods, and singular value decomposition." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)
"Extracting or deriving some useful information from the initially obtained data." (Shouvik Chakraborty & Kalyani Mali, "An Overview of Biomedical Image Analysis From the Deep Learning Perspective", 2020)
"A process of finding features of words and map them to vector space." (Neha Garg & Kamlesh Sharma, "Machine Learning in Text Analysis", 2020)
"A digital signal processing algorithm, which extracts distinctive values from the input signal." (Andrej Zgank et al, "Embodied Conversation: A Personalized Conversational HCI Interface for Ambient Intelligence", 2021)
"Feature extraction is a procedure in dimensionality reduction of extracting principal variables (features) from some random variables under consideration, usually achieved by extracting one principal variable (feature) as mapping from multiple random variables." (Usama A Khan & Josephine M Namayanja, "Reevaluating Factor Models: Feature Extraction of the Factor Zoo", 2021)
🔬Data Science: Feature Selection (Definitions)
"A method by which to decide on which features (columns) to keep in the analysis that will be done by the data mining algorithms. One of the first things to be done in a data mining project; this uncovers the most important variables among the set of predictor variables. Many of the predictor variables in a data set may not really be important for making an accurate predictive model, and only dilute/reduce the accuracy score of the model if included." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)
"The process a cybersecurity engineer uses to choose the features in which a given attack may manifest." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)
"Feature selection is the process of selecting important principal variables (features) from some random variables under consideration, usually achieved by selecting a principal variable (feature) as one of the random variables." (Usama A Khan & Josephine M Namayanja, "Reevaluating Factor Models: Feature Extraction of the Factor Zoo", 2021)
"It is used to select appropriate features from the available data for improving efficiency of machine learning algorithms." (Gunjan Ansari et al, "Natural Language Processing in Online Reviews", 2021)
🔬Data Science: Optimization (Definitions)
"Term used to describe analytics that calculate and determine the most ideal scenario to meet a specific target. Optimization procedures analyze each scenario and supply a score. An optimization analytic can run through hundreds, even thousands, of scenarios and rank each one based on a target that is being achieved." (Brittany Bullard, "Style and Statistics", 2016)
"Optimization is the process of finding the most efficient algorithm for a given task." (Edward T Chen, "Deep Learning and Sustainable Telemedicine", 2020)
🔬Data Science: Speech Recognition (Definitions)
"Automatic decoding of a sound pattern into phonemes or words." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps" 2nd Ed., 2000)
"Speech recognition is a process through which machines convert words or phrases spoken into a machine-readable format." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)
About Me

- Adrian
- Koeln, NRW, Germany
- IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.