Showing posts with label hierarchies. Show all posts
Showing posts with label hierarchies. Show all posts

29 March 2024

🗄️🗒️Data Management: Data [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. 
Last updated: 29-Mar-2024

Data

  • {definition} raw, unrelated numbers or entries that represent facts, concepts,  events,  and/or associations 
  • categorized by
    • domain
      • {type} transactional data
      • {type} master data
      • {type} configuration data
        • {subtype}hierarchical data
        • {subtype} reference data
        • {subtype} setup data
        • {subtype} policy
      • {type} analytical data
        • {subtype} measurements
        • {subtype} metrics
        • {subtype} 
    • structuredness
      • {type} structured data
      • {type} semi-structured data
      • {type} unstructured data
    • statistical usage as variable
      • {type} categorical data (aka qualitative data)
        • {subtype} nominal data
        • {subtype} ordinal data
        • {subtype} binary data
      • {type} numerical data (aka quantitative data)
        • {subtype} discrete data
        • {subtype} continuous data
    • size
      • {type} small data
      • {type} big data
  • {concept} transactional data
    • {definition} data that describe business transactions and/or events
    • supports the daily operations of an organization
    • commonly refers to data created and updated within operational systems
    • support applications that automated key business processes 
    • usually stored in normalized tables
  • {concept} master data
    • {definition}"data that provides the context for business activity data in the form of common and abstract concepts that relate to the activity" [2]
      • the key business entities on which transaction are executed
    • the dimensions around on which analysis is conducted
      • used to categorize, evaluate and aggregate transactional data
    • can be shared across more than one transactional applications
    • there are master data similar to most organizations, but also master data specific to certain industries 
    • often appear in more than one area within the business
    • represent one version of the truth
    • can be further divided into specialized subsets
    • {concept} master data entity
      • core business entity used in different applications across the organization, together with their associated metadata, attributes, definitions, roles, connections and taxonomies 
      • may be classified within a hierarchy
        • the way they describe, characterize and classify business concepts may actually cross multiple hierarchies in different ways
          • e.g. a party can be an individual, customer, employee, while a customer might be an individual, party or organization
    • do not change as frequent like transactional data
      • less volatile than transactional data 
      • there are master data that don’t change at all 
        • e.g. geographic locations
    • strategic asset of the business 
    • needs to be managed with the same diligence as other strategic assets
  • {concept} metadata 
    • {definition} "data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes" [2]
      • data about data
    • refers to 
      • database schemas for OLAP & OLTP systems
      • XML document schemas
      • report definitions
      • additional database table and column descriptions stored with extended properties or custom tables provided by SQL Server
      • application configuration data
  • {concept} analytical data
    • {definition} data that supports analytical activities 
      • e.g. decision making, reporting queries and analysis 
    • comprises
      • numerical values
      • metrics
      • measurements
    • stored in OLAP repositories 
      • optimized for decision support 
      • enterprise data warehouses
      • departmental data marts
      • within table structures designed to support aggregation, queries and data mining 
  • {concept} hierarchical data 
    • {definition} data that reflects a hierarchy 
      • relationships between data are represented in hierarchies
    • typically appears in analytical applications
    • {concept} hierarchy
      • "a classification structure arranged in levels of detail from the broadest to the most detailed level" [2]
      • {concept} natural hierarchy
        • stem from domain-based attributes
        • represent an intrinsic structure of the dat
          • they are natural for the data
            • e.g. product taxonomy (categories/subcategories)
        • useful for drilling down from a general to a detailed level in order to find reasons, patterns, and problems
          • common way of analyzing data in OLAP applications
          • common way of filtering data in OLTP applications
      • {concept} explicit hierarchy
        • organize data according to business needs
        • entity members can be organized in any way
        • can be ragged
          • the hierarchy can end at different levels
      • {concept} derived hierarchy
        • domain-based attributes form natural hierarchies 
        • relationships between entities must already exist in a model
        • can be recursive
  • {concept} structured data
    • {definition} "data that has a strict metadata defined"
  • {concept} unstructured data 
    • {definition} data that doesn't follow predefined metadata
    • involves all kinds of documents 
    • can appear in a database, in a file, or even in printed material
  • {concept} semi-structured data 
    • {definition} structured data stored within unstructured data,
    • data typically in XML form
      • XML is widely used for data exchange
    • can appear in stand-alone files or as part of a database (as a column in a table)
    • useful when metadata (the schema) changes frequently, or there’s no need for a detailed relational schema
References:
[1] The Art of Service (2017) Master Data Management Course 
[2] DAMA International (2011) "The DAMA Dictionary of Data Management", 

30 March 2018

🔬Data Science: Decision Tree (Definitions)

"Decision trees are a way of representing a series of rules that lead to a class or value. For example, the goal may be to classify a group of householders who have moved to a new house, based on their choice of type of the new dwelling. A simple decision tree can solve this problem and illustrate all the basic components of a decision tree (the decision nodes, branches, and leaves)." (William A V Clark & Marinus C Deurloo, "Categorical Modeling/Automatic Interaction Detection", Encyclopedia of Social Measurement, 2005)

"A decision tree is a graphical representation of various alternatives and sequence of events in these multi-stage decision problems." (P C Tulsian and Vishal Pandey, "Quantitative Techniques: Theory and Problems", 2006)

"A representation of a hierarchical set of rules that lead to sets of observations based on the class or value of the response variable." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A decision-making method that uses a branch diagram to portray different options and outcomes." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"It is technique for classifying data. The root node of a decision tree represents all examples. If these examples belong to two or more classes, then the most discriminating attribute is selected and the set is split into multiple classes." (Indranil Bose, "Data Mining in Tourism", 2009)

"A graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions. A decision tree is a special form of tree structure. Regression trees approximate real-valued functions (e.g., estimate the price of a house or a patient's length of stay in a hospital). Classification trees define the logic for categorization using Boolean variables such as gender (male or female) or game results (lose or win)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A treelike model of data produced by certain data mining methods. Decision trees can be used for prediction." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A graphic tool for specifying the action that will result from each combination of a set of conditions." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"An algorithm that focuses on maximizing group separation by iteratively splitting variables." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"Decision trees are decision support models that classify patterns using a sequence of well-defined rules. They are tree-like graphs in which each branch node represents an option between a number of alternatives, and each leaf node represents an outcome of the cumulative choices." (Joo Chuan Tong & Shoba Ranganathan, "Computational T cell vaccine design", Computer-Aided Vaccine Design, 2013)

"The Decision Tree is a form of flow diagram that helps to map out complicated decision-making processes, or the possible directions a conversation or interaction might take." (Kevin Duncan, "The Diagrams Book", 2013)

"A family of classification methods whose results are usually represented in a tree-like graph." (Meta S Brown, "Data Mining For Dummies", 2014)

"A tool to help make decisions based on a set of rules that help to navigate the tree along its branches." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"An algorithm that focuses on maximizing group separation by iteratively splitting variables." (Evan Stubbs, "Big Data, Big Innovation", 2014)

"A representation of knowledge in a tree-like form usually used for classification. The non-terminal nodes of the tree represent questions, the terminal nodes represent class labels and the edges represent answers to questions." (Petr Berka, "Machine Learning", 2015)

"Decision tree learning is a supervised machine learning technique for inducing a decision tree from training data. A decision tree (also referred to as a classification tree or a reduction tree) is a predictive model which is a mapping from observations about an item to conclusions about its target value." (Lin Tan, "The Art and Science of Analyzing Software Data", 2015)

"A simple decision tree is an algorithm for determining a decision by making a sequence of logical or property tests." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"An organised pathway of ideas leading to a defined goal, in which at various points, a decision is made about which of two ‘branches’ of ideas to follow to the next decision point." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A decision tree is a largely used non-parametric effective machine learning modeling technique for regression and classification problems." (Thomas Plapinger, "What is a Decision Tree", 2017)

"A decision tree is the arrangement of data in a tree structure where, at each node, data is separated into different branches according to the value of the attribute at the node." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"A model classifying a data item into one of the classes at the leaf node, based on matching properties between the branches on the tree and the actual data item." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"Decision tree is a technique that helps us in deriving rules from data. A rule-based technique is very helpful in explaining how the model is supposed to work in estimating a dependent variable value." (V Kishore Ayyadevara et al, "Hands-On Machine Learning on Google Cloud Platform", 2018)

"Decision trees are a machine learning algorithm that predicts the value of a target variable based on decision rules learned from training data. The algorithm can be applied to both regression and classification problems by changing the objective function that governs how the tree learns the decision rules." (Stefan Jansen, "Hands-On Machine Learning for Algorithmic Trading", 2018)

"A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility." (James D Miller, "Hands-On Machine Learning with IBM Watson", 2019)

"In a machine learning context, a decision tree is a data structure that is built for classification or regression tasks. Each node in the tree splits on a particular feature." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision." (WhatIs) [source]

"A tree and branch-based model, like a flow chart, used to map decisions and their possible consequences. The decision tree is widely used in machine learning for classification and regression algorithms." (Accenture)

"A treelike model of data produced by certain data mining methods." (Microsoft Technet)

23 December 2016

♟️Strategic Management: Organization Charts (Just the Quotes)

"The writer has found, in analyzing and diagnosing organization and accounting work, that charts can express more on one page than is sometimes expressed in several chapters of writing, and has been the author and originator of many methods of charting industrial expressions. It is necessary, as a first step, for analytical and other purposes, to make a chart expressing all of the relations governing the organization of a business so as to show the very foundation upon which all authorities, accounting, and business transactions are based and conducted. There have been more failures scored both personally and financially for lack of these very elements in a business than by reason of any other one thing. As well try to build a house without a foundation as to try to conduct a business, especially a manufacturing business, without proper organization." (Clinton E. Woods, "Organizing a factory", 1905)

"An Organization Chart is a cross section picture covering every relationship in the bank. It is a schematic survey showing department functions and interrelations, lines of authority, responsibility, communication and counsel. Its purpose is 'to bring the various human parts of the organization into effective correlation and co-operation'." (John W Schulze, "Office Administration", 1919)

"The frame work of the entire organization should be sketched, and the particular place in the scheme of things which his department and his position occupy should be explained. Almost any one can be shown a particular location on a map. An organization chart is a map." (John W Schulze, "Office Administration", 1919)

"The most elementary aspect of administration is organization the structure of social institutions and their constituent parts, the composition of economic enterprises and their various branches, the organization of governmental agencies and their numerous departments. As it is mainly a matter of structure, organization bears the same rudimentary relationship to administration as does the science of anatomy or skeletology to the field of medicine. An administrative organization can be sketched and charted just as the human body can be physically depicted. Apart from its graphic convenience and its 'teachable' quality, however, what intrinsic relationship does organization bear to administration?" (Albert Lepawsky, "Administration: the art and science of organization and management", 1949)

"Although organization charts are useful, necessary, and often revealing tools, they are subject to many important limitations. In the first place, a chart shows only formal authority relationships and omits the many significant informal and informational relationships that exist in a living organization. Moreover, it does not picture how much authority exists at any point in the organization." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Essential to organization planning, then, is the search for an ideal form of organization to reflect the basic goals of the enterprise. This entails not only charting the main lines of organization and reflecting the organizational philosophy of the enterprise leaders (e.g., shall authority be as centralized as possible, or should the company try to break its operations down into semiautonomous product or territorial divisions?), but also a sketching out of authority relationships throughout the structure." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"If charts do not reflect actual organization and if the organization is intended to be as charted, it is the job of effective management to see that actual organization conforms with that desired. Organization charts cannot supplant good organizing, nor can a chart take the place of spelling out authority relationships clearly and completely, of outlining duties of managers and their subordinates, and of defining responsibilities." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"One of the tools for making organization principles work is the organization chart. Any organization which exists can be charted, for a chart is nothing more than an indication of how departments are tied together along their principal lines of authority." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Since a chart maps lines of authority, sometimes the mere charting of an organization will show inconsistencies and complexities and lead to their correction. A chart also acts as a guide for managers and new personnel in an organization, revealing how they tie into the entire structure. Charts are, therefore, not only evidences of organization planning but also road maps for decision making, and training devices for those who would learn how a company is organized." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"While good charting will attempt, as far as possible, to make levels on the chart conform to levels of importance in the business enterprise, it cannot always do so. This problem can be handled by clearly spelling out authority relationships." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"The relations outlined on an organization chart provide a framework within which fuller and more spontaneous human behavior takes place. The formal system may draw upon that behavior for added strength; it will in its turn be subordinated to personal and group egotism." (Philip Selznick, "Leadership in Administration: A Sociological Interpretation", 1957)

"It is probable that one day we shall begin to draw organization charts as a series of linked groups rather than as a hierarchical structure of individual 'reporting' relationships." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"In some firms role relationships prescribed by the chart seemed to be of secondary importance to personal relationships between individuals. (Joan Woodward, "Industrial Organization: Theory and practice", 1965)

"Every organization structure, even a poor one, can be charted, for a chart merely indicates how departments are tied together along the principal lines of authority. It is therefore somewhat surprising to find top managers occasionally taking pride in the fact that they do not have an organization chart or, if they do have one, feeling that the chart should be kept a secret." (Harold Koontz, "Principles of management", 1968)

"[…] the organization chart will initially reflect the first system design, which is almost surely not the right one […] as one learns, he changes the design […]. Management structures also need to be changed as the system changes […]" (Fred Brooks, "The Mythical Man-Month: Essays on Software Engineering", 1975)

"An organization chart is a graphic device that uses pictorial methods to show qualitative information about an organization. [...] The organization chart can be used to show one or more of three things: (1) What the various staff positions in the organization are, how they are structurally related to each other and the span of control and chain of command within the organization. (2) What the different units of the organization are and how they are arranged and related to each other. (3) What the various functions are within the organization and how they are organized and related." (Robert Lefferts, "Elements of Graphics: How to prepare charts and graphs for effective reports", 1981)

"Every company has two organizational structures: the formal one is written on the charts; the other is the everyday living relationship of the men and women in the organization." (Harold Geneen & Alvin Moscow, "Managing", 1984)

"Organization charts and fancy titles count for next to nothing." (Colin Powell, "My American Journey", 1995)

"The premise here is that the hierarchy lines on the chart are also the only communication conduit. Information can flow only along the lines. [...] The hierarchy lines are paths of authority. When communication happens only over the hierarchy lines, that's a priori evidence that the managers are trying to hold on to all control. This is not only inefficient but an insult to the people underneath." (Tom DeMarco, "Slack: Getting Past Burnout, Busywork, and the Myth of Total Efficiency", 2001)

"Traditional organizational charts tend to institutionalize inefficiency, and misalignment of natural talents to job descriptions." (John Hoover, "Unleashing Leadership", 2005)

"Organization charts are subject to important limitations. A chart shows only formal authority relationships and omits the many significant informal and informational relationships." (Harold Koontz and Heinz Weihrich, "Essentials Of Management", 2006)

"When you accepted your job, you were not chosen solely to fill a position on the organization chart; you were chosen to fill a responsibility." (David Cottrell, "Monday Morning Mentoring: Ten Lessons to Guide You Up the Ladder", 2009)

"[…] decisions based on org-chart structure tend to optimize for only part of the organization, ignoring upstream and downstream effects. Local optimizations help the teams directly involved, but they don’t necessarily help improve the overall delivery of value to customers. Their impact might be negligent if there are larger bottlenecks in the stream of work." (Matthew Skelton, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019)

"However, in a highly collaborative context filled with uncertainty over outcomes, relying on the org chart as a principal mechanism of splitting the work to be done leads to unrealistic expectations." (Matthew Skelton & Manuel Pais, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019)

"Organizations that rely too heavily on org charts and matrixes to split and control work often fail to create the necessary conditions to embrace innovation while still delivering at a fast pace. In order to succeed at that, organizations need stable teams and effective team patterns and interactions. They need to invest in empowered, skilled teams as the foundation for agility and adaptability. To stay alive in ever more competitive markets, organizations need teams and people who are able to sense when context changes and evolve accordingly." (Matthew Skelton, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019

07 December 2016

♟️Strategic Management: Hierarchies (Just the Quotes)

"Formal theories of organization have been taught in management courses for many years, and there is an extensive literature on the subject. The textbook principles of organization — hierarchical structure, authority, unity of command, task specialization, division of staff and line, span of control, equality of responsibility and authority, etc. - comprise a logically persuasive set of assumptions which have had a profound influence upon managerial behavior." (Douglas McGregor, 'The Human Side of Enterprise", 1960)

"It is probable that one day we shall begin to draw organization charts as a series of linked groups rather than as a hierarchical structure of individual 'reporting' relationships." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"General systems theory is the scientific exploration of 'wholes' and 'wholeness' which, not so long ago, were considered metaphysical notions transcending the boundaries of science. Hierarchic structure, stability, teleology, differentiation, approach to and maintenance of steady states, goal-directedness - these are a few of such general system properties." (Ervin László, "Introduction to Systems Philosophy", 1972)

"In the objectives system, the corporation's aims, or plans, are broken down into a hierarchy of lesser aims or plans; and the grand total of all those objectives adds up to those of the corporation. Then all the executives have to do is meet their planned and agreed objectives and - hey presto - the corporation does the same. Perfection in management, at last, has arrived, except that it hasn't and won't." (Robert Heller, "The Naked Manager: Games Executives Play", 1972)

"With the changes in technological complexity, especially in information technology, the leadership task has changed. Leadership in a networked organization is a fundamentally different thing from leadership in a traditional hierarchy." (Edgar Schein, "Organizational Culture and Leadership", 1985)

"Ethical pressures and decisions are viewed through the prism of one's own personal values. The distinction between personal and organizational values, however, often becomes blurred, especially the longer one stays with a particular organization and/or advances up the hierarchial ladder." (Warren H Schmidt & Barry Z Posner, Public Administration Review, 1986)

"Reengineering posits a radical new principle: that the design of work must be based not on hierarchical management and the specialization of labor but on end-to-end processes and the creation of value for the customer." (James A Champy & Michael M Hammer, "Reengineering the Corporation", 1993)

"The premise here is that the hierarchy lines on the chart are also the only communication conduit. Information can flow only along the lines. [...] The hierarchy lines are paths of authority. When communication happens only over the hierarchy lines, that's a priori evidence that the managers are trying to hold on to all control. This is not only inefficient but an insult to the people underneath." (Tom DeMarco, "Slack: Getting Past Burnout, Busywork, and the Myth of Total Efficiency", 2001)

"An ecology provides the special formations needed by organizations. Ecologies are: loose, free, dynamic, adaptable, messy, and chaotic. Innovation does not arise through hierarchies. As a function of creativity, innovation requires trust, openness, and a spirit of experimentation - where random ideas and thoughts can collide for re-creation." (George Siemens, "Knowing Knowledge", 2006)

"Hierarchy adapts knowledge to the organization; a network adapts the organization to the knowledge." (George Siemens, "Knowing Knowledge", 2006)

"DevOps recognizes the importance of culture. The acronym CAMS (culture, automation, measurement, and sharing) is used to encapsulate its key themes. Culture is acknowledged as all important in making development and IT operations work together effectively. But what is culture in this context? It is not so much about an informal dress code, flexible hours, or a free in-house cafeteria as it is about how decisions are taken, norms of behavior, protocols of communication, and the ways of navigating hierarchy and bureaucracy to get things done." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Some hierarchy is essential for the effective functioning of an organization. Eliminating hierarchy has the frequent side effect of slowing down decision making and diffusing accountability." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

30 August 2011

📈Graphical Representation: Tree (Definitions)

"A complex data structure built from nodes, each of which points to two or more other nodes." (Jesse Liberty, "Sams Teach Yourself C++ in 24 Hours" 3rd Ed., 2001)

"In the hierarchical data mode, a single entity hierarchy." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A structure for data relationships where all relationships are one-to-many and no child entity may have more than one parent entity." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A graph in which child nodes do not have more than one parent. SEE ALSO chart; graph; structure, tree." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A hierarchy of things from the same population. The things could be a) instances from a population represented by a single type icon representing the population of instances, and a reflexive relationship on that type, or b) types from the set of types defined in a database represented by a tree structure where each node of the tree is a population of instances of the same type. In the first case, it is the instances that form a tree structure, and in the second, it is the types that form a tree structure. The latter is called a hierarchical data structure." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A linked data structure that forms a hierarchy where nodes at higher levels know about a subset of the nodes in the level below them. Each node in a tree can only be reached from a single node in the level above it." (Mark C Lewis, "Introduction to the Art of Programming Using Scala", 2012)

"A structure with a unique starting node (the root), in which each node is capable of having multiple child nodes, and in which a unique path exists from the root to every other node" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

"A tree is a constrained graph. Trees are directed graphs because the 'parent of' relationship between nodes is asymmetric: the edges are arrows that point in a certain direction. Trees are acyclic graphs, because if you follow the directed edges from one node to another, you can never encounter the same node twice. Finally, trees have the constraint that every node (except the root) must have exactly one parent." (Robert J Glushko, "The Discipline of Organizing: Professional Edition, 4th Ed", 2016)

"Trees consist of nodes joined by edges, recursively nested. When a single, root dictionary is connected to child nodes that are themselves dictionaries, we say that the dictionaries are nested into a kind of tree structure." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"Hierarchical data structure where each node may have any number of child nodes, but only one parent node (with the exception of the root node, which has no parent)." (Karl Beecher, "Computational Thinking - A beginner's guide to problem-solving and programming", 2017)

12 February 2010

💎SQL Reloaded: Ways of Looking at Data V (Hierarchies)


While commenting on a post on User Defined Functions from SQL Server Programmers Blog I realized that I could use a multi-statement table-valued function in order to simulate recursive processing for ‘Identifying the top-most parent’ problem presented in Reports - Ways of Looking at Data – Part IV: Hierarchies. Some years back, I needed to create the BOM structure for a report based on IFS-iV ERP system, the data being stored in a SQL Server 2000-based data warehouse. Then, given the fact that common table expressions were introduced only on SQL Server 2005, I used a similar technique as the below one though based on temporary tables, if I’m remembering correctly. Then, like now, I was just iterating through the various levels of the BOM inserting data in the temporary table, this approach having better performance than creating a query self-referencing the same logic multiple times, the logic scaling quite well on multiple millions of records.

 -- BOM Top Most Items 
CREATE FUNCTION dbo.BOMTopMostItems(  
   @ProductID  int , 
   @Date datetime)  
RETURNS  
@BOM TABLE( 
    BillOfMaterialsID int NULL,  
   ProductAssemblyID int NULL,  
   ComponentID int NOT NULL, 
   ModifiedDate datetime NOT NULL, 
   Level smallint NOT NULL)  
AS 
BEGIN 
DECLARE @NoRecords int -- stored the number of Products part of an Assembly  

DECLARE @Level smallint  

SET @Date = ISNull(@Date, GetDate())  

SET @Level = 0 -- initializing the routine 
INSERT @BOM(BillOfMaterialsID, ProductAssemblyID, ComponentID, ModifiedDate, Level)  
SELECT BillOfMaterialsID, ProductAssemblyID, ComponentID, ModifiedDate, @Level  
FROM Production.BillOfMaterials  
WHERE ComponentID = @ProductID  
AND DATEDIFF(d, IsNull(EndDate, GetDate()), @Date)<=0  

SELECT @NoRecords = count(1) 
FROM @BOM  
WHERE ProductAssemblyID IS NOT NULL  

 
WHILE  (IsNull(@NoRecords, 0)>1)  
BEGIN  

    -- inserting next level 
    INSERT @BOM(BillOfMaterialsID, ProductAssemblyID, ComponentID, ModifiedDate, Level) 
    SELECT BOM.BillOfMaterialsID, BOM.ProductAssemblyID, BOM.ComponentID, BOM.ModifiedDate, @Level+1 
    FROM @BOM B  
         JOIN Production.BillOfMaterials BOM 
           ON B.ProductAssemblyID = BOM.ComponentID 
          AND DATEDIFF(d, IsNull(BOM.EndDate, GetDate()), @Date)<=0 
    WHERE Level = @Level 

    SET @Level = @Level+1  

    SELECT @NoRecords = count(1) 
    FROM @BOM 
    WHERE ProductAssemblyID IS NOT NULL 
      AND Level = @Level 
 
END -- deleting the intermediate levels 
 
DELETE   
FROM @BOM  
WHERE ProductAssemblyID IS NOT NULL 

RETURN 
END 
 
As can be seen I’m first inserting the BOM information of the Product searched, then based on these data I’m taking and inserting the next level, stopping when the top-most items where identified, here ProductAssemblyID IS NOT NULL, in the end deleting the intermediate levels. The same approach could be used to generate the whole hierarchy, in this case starting from the top.

Here is an example based on the above function, I used NULL in the second parameter because the @date parameter is set to the current date if not provided.
    

SELECT * 
FROM dbo.BOMTopMostItems(486, NULL)  

 When checking the data returned I observed that for the given ProductID the same BillOfMaterialsID appears more than once, this because, as can be seen by generating the whole BOM for one of the respective duplicated ComponentID, the same ProductID could appear in a BOM at different levels or at the same level but belonging to different assemblies. Therefore in order to get only the list of top-most items you could use a DISTINCT:

SELECT DISTINCT ComponentID  
FROM dbo.BOMTopMostItems(486, NULL) 

On the other side if we want to remove the duplicates resulted from Products appearing at the same level of the same BOM but for different assemblies, could be used a DISTINCT when inserting into the @BOM table or, even if not recommended, use a DISTINCT when calling the function:
 
SELECT DISTINCT BillOfMaterialsID
, ProductAssemblyID
, ComponentID
, ModifiedDate
, Level  
FROM dbo.BOMTopMostItems(486, NULL)

Happy coding!

Previous Post <<||>> Next Post

10 February 2010

💎SQL Reloaded: Hierarchies (The Oracle Solution)

    In yesterday’s post on hierarchy-based reports I was showing, using a BOM example, how hierarchy-related problems could be approached using common table expressions introduced starting with SQL Server 2005. Oracle’s approach to hierarchical queries is based on CONNECT BY clause that allows to specify the join constraint between the table(s) used in the recursive logic and previous output. If in SQL Server the logic is split in two parts – the anchor member sub-query representing the starting point of the query, respectively the recursive member, the query on which the recursive technique is applied, in Oracle the two are combined in only one query with the help of START WITH clause. Here’s the basic structure for building a hierarchy based on BillOfMaterials table coming with AdventureWorks.
    The START WITH ASS.productassemblyid IS NULL and TRUNC(NVL(ass.enddate, SYSDATE))>=TRUNC(SYSDATE) constraints function as initialization for the recursive logic, being ignore in next iterations, while the CONNECT BY PRIOR ass.componentid = ass.productassemblyid and TRUNC(NVL(ass.enddate, SYSDATE))>=TRUNC(SYSDATE) constraints function as join constraints between the prior result set and current logic. Maybe somebody is asking why the ‘Active BOM’ constraint if specified in both START WITH, respectively CONNECT BY clauses and not in the WHERE clause? If we are interested only in the Active BOMs then the corresponding constraint must be added to the initiating clause, and is added to the CONNECT BY clause because we need to assure that we take only the active children. If we move the ‘Active BOM’ constrain from the CONNECT BY to WHERE clause then most probably we’ll get a totally different number of records, respectively 9525 vs. 8730 how much the above query returns. The easiest way to check whether the logic is correct is to do include the Activer ecords in an inline view and construct with it the BOM.
    As the above query returns 8730 records it seems that the above logic is correct.

    In order to match the ‘CTE-Based BOM structure’ query from the previous post, the Oracle query needs several further changes, namely to construct the Path, add the top-most ComponentID, calculate the TotalQty, and add the Product details. The Path can be constructed using the SYS_CONNECT_BY_PATH function using as parameters the attribute needed to be concatenated (e.g. componentid) and the delimiter (e.g. ‘\’). The top-most ComponentID can be retrieved using the CONNECT_BY_ROOT operator. Unfortunately Oracle doesn’t provides a straight way to calculate the TotalQty, though there is an alternative, using again the SYS_CONNECT_BY_PATH function with the PerAssemblyQty attribute and multiplication sign ‘*’ as parameters, this smart trick allowing to evaluate the final expression (e.g. ‘1*1*2*5’ will be evaluated to 10); because the extract is typically saved to excel, the final concatenation could be written as an Excel formula expression (e.g. ‘=1*1*2*5’). Here is the final query:
    The specification of an ORDER BY is not always required because typically the output of a hierarchical query in Oracle is sorted as if it were used the depth-first search, though if I remember correctly there are exceptions too.

    Sometimes it’s useful to create a concatenation of Products in order to see the actual structure of the BOM (e.g. ‘\BK-R93R-62\BB-9108\BE-2349\BA-8327’) or to use to use incremental spaces instead (see. IndentPath), for this needing to bring the Product join for ComponentID into the recursive logic.
    By comparing the SQL Server vs. Oracle implementation for hierarchical queries, I would say that Oracle’s implementation is much simpler to use from the creational point of view with a trade off in functionality, several functions being required in order to extract information from top-most record or to concatenate attributes. On the other side SQL Server provides a more elegant solution that allows more functional flexibility in the detriment of code duplication – the anchor vs. the recursive member.

Note:
    In case you need to load the Product and BillOfMaterials tables in Oracle, you could do this by using the ‘SQL Server Import and Export Wizard’, for the steps see the SQL Server to Oracle Data Export – Second Magic Class tutorial.

09 February 2010

💎SQL Reloaded: Ways of Looking at Data IV (Hierarchies)

Introduction

There is a special type of entities that when related to each other, the relations put together form a hierarchy. It’s the case of assemblies and the components they are made of, a component could be at its turn an assembly formed from smaller components, resulting thus a structure known as Bill of Materials (BoM). Take for example a laptop made of parts like the keyboard, modem, hard disk, CD-ROM, microprocessor, volatile memory, graphic card, display, to name just a few of the main components, each of these parts are at their turn are made of smaller components that at their turn could be made from smaller components, and so on. Another popular example is the one of employees in an organization, following the line of command from CEO to the lowest person as importance, a person could be chief for more employees and have only one chief, resulting thus another hierarchical structure.

Most of the hierarchies I met were having maximum 7 levels, where a level represents a parent-child relation, typically there is no need for higher hierarchies. In a RDBMS such relations are represented with self-referencing tables or self-referencing structures in case more than one table is used to represent the relation. The problem with such structures is that in order to get the whole hierarchy a join must be created for each relation, though as the number of levels could vary, such queries can become quite complex. Therefore RDBMS came with their own solutions in order to work with hierarchies

There are several types of problems specific to hierarchies: 1. Building the hierarchy behind of an item in the structure
2. Identifying the top-most parent of an item in the structure
3. Identifying the top-most parent of an item in the structure and building the hierarchy behind it – a combination of the previous mentioned problems

Building the hierarchy

SQL Server 2005 introduced the common table expressions (CTE), a powerful feature that allows building whole hierarchies in a simplistic manner. From a structural point of view a CTE has two elements – the anchor member defining the starting point of the query, respectively the recursive member that builds upon the anchor member. The starting point of the query in this case is the entity or entities for which the structure is built, and here there are 2 situations – we are looking for all topmost entities or for the entities matching a certain criteria, typically looking for one or more Product Numbers. The difference between the two scopes could lead to different approaches – whether to include the Product information in the CTE or to use the CTE only to built the hierarchical structure, and add only in the end the production information. Before things would become fuzzier let’s start with an example, here are the two approaches for defining the anchor member, the first query identifying the top-most assemblies, while the second searches for the components in the BOM that matches a criteria (e.g. the Product’s name starts with BK-R93R).

So we have the starting point, how about the recursive member? Without using a CTE, normally we would have to join either of the two queries again with the BillofMaterials table in order to retrieve the components for the Products already retrieved, then the final result must be joined again with the BillofMaterials, and repeat this step over and over again until a predefined number of iterations are met, or until no components are retrieved anymore. Actually that’s what a CTE does too, and in order to make things simpler, a CTE has a name that it’s reused in the CTE in order to handle the recursive part, and outside of the CTE in order to use the final result. Supposing that CTE’s name is BOM here is the recursive member:
    The anchor and recursive member are merged in the CTE with the help of a UNION therefore the number of attributes must be the same and their data type must be compatible. Here is the final query:
    The TotalQty attribute has been introduced in order to show the cumulated quantity for each component, for example if the Assembly has the PerAssemblyQty 2 and the Component has the PerAssemblyQty 10 then the actual total quantity needed is 2*10.
    The Path attribute has been introduced in order to sort the hierarchy lexically based on the sequence formed from the ComponentID, it could have been used instead the Product Number.

Identifying the top-most parent

    The CTE can be used also for identifying the top-most parent of a given Product or set of Items, the second anchor query member could be used for this, while the recursive member would involve the reverse join between BOM and BillOfMaterials. Here is the final query:
Given the fact that the final CTE output includes the output from each iteration when only the data from the last iteration are needed, this might not be the best solution. I would expect that the simulation of recursion with the help of a temporary table would scale better in this case.

Another interesting fact to note is that in this case more assemblies are found to match the criteria of top-most item, meaning that the BOMs are not partitioned – two BOMs could have common elements.

Happy coding!

16 March 2009

🛢DBMS: Network Model (Definitions)

[network database model:] "Essentially a refinement of the hierarchical database model. The network model allows child tables to have more than one parent, thus creating a networked-like table structure. Multiple parent tables for each child allow for many-to-many relationships, in addition to one-to-many relationships." (Gavin Powell, "Beginning Database Design", 2006)

[complex network data model:] "A navigational data model that supports direct many-to-many relationships." (Jan L Harrington, "Relational Database Dessign: Clearly Explained" 2nd Ed., 2002)

[simple network data model:] "A navigational data model that supports only one-to-many relationships but allows an entity to have an unlimited number of parent entities." (Jan L Harrington, "Relational Database Dessign: Clearly Explained" 2nd Ed., 2002)

[complex network data model:] "A navigational data model that permits direct many-to-many relationships as well as one-to-many and one-to-one relationships." (Jan L Harrington, "Relational Database Design and Implementation: Clearly explained" 3rd Ed., 2009)

[simple network data model:"A legacy data model where all relationships are one-to-many or one-toone; a navigational data model where relationships are represented with physical data structures such as pointers." (Jan L Harrington, "Relational Database Design and Implementation: Clearly explained" 3rd Ed., 2009)

"A data model standard created by the CODASYL Data Base Task Group in the late 1960s. It represented data as a collection of record types and relationships as predefined sets with an owner record type and a member record type in a 1:M relationship." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A DBMS architecture where record types are organized in a many-to-many structure consisting of multiple parent-child sets." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"A network model is a database model that is designed as a flexible approach to representing objects and their relationships. A unique feature of the network model is its schema, which is viewed as a graph where relationship types are arcs and object types are nodes." (Techopedia) [source]


03 March 2007

🌁Software Engineering: Hierarchy (Definitions)

"A structure in which components are ranked into levels of subordination; each component has zero, one, or more subordinates; and no component has more than one superordinate component."  (IEEE," IEEE Standard Glossary of Software Engineering Terminology", 1990)

"An organizational structure that defines the relationship between master data elements." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"The principle of hierarchy advocates the creation of a hierarchical organization of abstractions using techniques such as classification, generalization, substitutability, and ordering." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"An organisation structure wherein the authority available to a role increases upwards through the structure, increasing as work complexity increases. The authority structure of the organisation is made visible and accessible by means of role titles. In a correctly structured organisation each role has the authority that is necessary to perform the work assigned to the role and this provides the connection between role authority and work." (Catherine Burke et al, "Systems Leadership, 2nd Ed,", 2018)


Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.