Showing posts with label data structure. Show all posts

07 August 2024

Business Intelligence: Data Modeling (Part II: From Data to Data Models)

Business Intelligence Series

A data model can be defined as an abstract, self-contained, logical definition of the data structures available in a database or similar repository. It’s typically an abstraction of the data structures underpinning a set of processes, procedures and business logic used for a predefined purpose. A data model can also be formed of unrelated micromodels, each depicting a different aspect of a business.

The association between data and data models is bidirectional. Given a set of data, a data model can be built to underpin the respective data. Conversely, one can create or generate data based on a data model. In business setups the relationship usually runs in both directions at once, as the data and the data model(s) underpinning them evolve together with the business. In extremis, the data model can be used to reflect a business’ needs, at least when the respective needs are addressed accordingly by extending the data model(s).

Given a set of data (e.g. the data stored in one or more spreadsheets or other types of files), multiple data models can in theory be defined to reflect the respective data. Within a data model, the fields (aka attributes) are partitioned into a set of data entities, where a data entity is a nonunique grouping of attributes that attempts to define one unitary aspect of the world. Customers, Vendors, Products, Invoices or Sales Orders are examples of such data entities, though entities can have a broader granularity (e.g. Customers can be modeled over several tables like Entity, Addresses, Contact information, etc.).
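As a rough illustration of partitioning attributes into entities, the sketch below splits a flat, spreadsheet-like set of rows into a Customer and an Invoice entity. The column names, the sample rows and the split_into_entities helper are all hypothetical, and this is only one of the several models the same data would allow.

```python
from dataclasses import dataclass

# Hypothetical flat rows, as they might come from a spreadsheet export.
flat_rows = [
    {"customer_id": 1, "customer_name": "Acme", "city": "Berlin", "invoice_no": "INV-001", "amount": 120.0},
    {"customer_id": 1, "customer_name": "Acme", "city": "Berlin", "invoice_no": "INV-002", "amount": 80.0},
    {"customer_id": 2, "customer_name": "Globex", "city": "Vienna", "invoice_no": "INV-003", "amount": 45.5},
]


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    city: str


@dataclass(frozen=True)
class Invoice:
    invoice_no: str
    customer_id: int  # reference back to the Customer entity
    amount: float


def split_into_entities(rows):
    """Partition the flat attributes into Customer and Invoice entities."""
    customers = {}
    invoices = []
    for row in rows:
        # Customer attributes repeat on every row, so keep one copy per key.
        customers[row["customer_id"]] = Customer(
            row["customer_id"], row["customer_name"], row["city"]
        )
        invoices.append(Invoice(row["invoice_no"], row["customer_id"], row["amount"]))
    return list(customers.values()), invoices


customers, invoices = split_into_entities(flat_rows)
```

A different partitioning (e.g. moving the city into a separate Addresses entity) would be equally valid, which is the sense in which the grouping is nonunique.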

From an operational database’s perspective, a data entity is based on one or more tables, though several entities can share some of the tables. From a BI artifact’s perspective, an entity should be easy to create from the underlying tables, with a minimal set of transformations. Ideally, the BI data model should be as close as possible to the entity needed for reporting; however, an optimal solution usually lies somewhere in between. In this resides the complexity of modeling BI solutions: providing an optimal data model which can be easily built on the source tables, and which allows addressing all or at least most of the BI requirements.

In other words, we deal with two optimization problems of two distinct data models. On one side the business data model must be flexible enough to provide fast read/write operations while keeping the referential data’s granularity efficient. Conversely, a BI data model needs to abstract these entities and provide a fast way of processing the data, while making data reads extremely efficient. These perspectives must apply when we move to Microsoft Fabric too. 

The operational data layer must provide this abstraction, and in this resides the complexity of building optimal BI solutions. This is the layer at which the modeling problems need to be tackled. The challenge of BI and Analytics resides in finding an optimal data model that allows us to address most or ideally all the BI requirements. Several overlapping layers of abstraction may be built in the process.

Looking at the data modeling techniques used in notebooks and other similar solutions, data modeling risks becoming a redundant, error-prone practice. Moreover, data models tend to be multilayered and to be based on certain perspectives into the processes they model. Providing reliable, flexible models involves finding the right view into the data for modeling aspects of the business. Database views allow us to easily model such perspectives, often in a unique way. Moving away from them just shifts the burden onto the multiple solutions built around the base data, which can create other important challenges.
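To make the point about views more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, the sample data and the view name v_customer_revenue are assumptions made purely for illustration; the idea is only that the perspective (revenue per customer) is defined once, in the database, rather than repeated in every notebook or report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 45.5);

    -- The view encapsulates one perspective on the base tables: revenue per customer.
    CREATE VIEW v_customer_revenue AS
    SELECT c.customer_id, c.name, SUM(i.amount) AS revenue
    FROM customers c
    JOIN invoices i ON i.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;
""")

# Consumers query the view and do not need to know how the entity is assembled.
rows = conn.execute(
    "SELECT name, revenue FROM v_customer_revenue ORDER BY name"
).fetchall()
```

If the join or the aggregation logic changes, only the view definition needs to change, while the consuming solutions keep querying the same object.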


27 November 2020

Data Warehousing: ETL - An Introduction

 


ETL (Extract, Transform, Load) processes, technologies or tools are about extracting data from one or more data sources via a set of queries, performing changes on the data via conversions, aggregations, mappings or other types of transformations, and loading the data into target tables or other types of repositories. Thus, an ETL process allows moving and transforming data between predefined data structures on an ad-hoc basis or as part of stable repetitive processes, which makes ETL ideal for data warehousing, data integrations, data migrations or similar scenarios.

ETL Data Flow

Extract: The extraction of data is typically done based on SQL queries from relational databases or any OLEDB or ODBC-based data repositories, including flat or MS Office files, though modern ETL tools can support other types of queries (CAML, XQuery, DAX) or even NoSQL architectures (Hadoop). This allows addressing a wide range of requirements, the complexity of the logic depending on the functionality provided by the query languages and on the extraction functionality available.

Transform: The transformation logic can be implemented based on the functionality provided by the ETL tool and can involve, depending on the case, any combination of aggregates, conditional splits, merges, lookups, multicasts, pivoting/unpivoting, cleansing, data conversions, sampling, mapping or any other transformations that can be performed on an in-transit dataset. On the other hand, quite often the same can be achieved with the help of SQL-based manipulations directly in the extraction logic or later in the process. SQL can occasionally prove faster and more flexible than the transformations provided by the ETL tool; however, despite the overlaps, the two approaches can complement each other when used adequately.

Load: The load is usually just a dump of the data into one or more final or intermediary tables with predefined structures. Unless the data don’t match the data type, format or other defined constraints, the load seldom involves further challenges as long as the solution was designed adequately.
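The three steps can be sketched in a few lines of Python on top of the built-in sqlite3 module. This is only a toy illustration under stated assumptions: the src_orders and tgt_sales tables, their columns and the sample rows are hypothetical, and a real ETL tool would provide the transformations as configurable components rather than hand-written functions.

```python
import sqlite3


def extract(conn):
    """Extract: pull the raw rows with a SQL query."""
    return conn.execute("SELECT order_id, country, amount FROM src_orders").fetchall()


def transform(rows):
    """Transform: cleanse country codes, convert amounts, aggregate per country."""
    totals = {}
    for _order_id, country, amount in rows:
        code = country.strip().upper()                # cleansing
        value = float(amount)                         # data conversion
        totals[code] = totals.get(code, 0.0) + value  # aggregation
    return sorted(totals.items())


def load(conn, rows):
    """Load: write the result into a target table with a predefined structure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tgt_sales (country TEXT PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO tgt_sales VALUES (?, ?)", rows)
    conn.commit()


# A single in-memory database plays both source and target here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, country TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, " de", "10.5"), (2, "DE ", "4.5"), (3, "at", "7")],
)

load(conn, transform(extract(conn)))
result = conn.execute("SELECT country, total FROM tgt_sales ORDER BY country").fetchall()
```

The same aggregation could have been pushed into the extraction query with a GROUP BY, which is the overlap between SQL-based and tool-based transformations mentioned above.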

Within the logical model, extract, transform and load can be considered as processes in themselves. Within the object model provided by the ETL tool, they are considered in the mentioned sequence within a data flow, which, within a set of workflow constraints, defines how the data move through the pipeline, i.e. the sequence of processing steps considered. The basic unit of work is the data flow together with the workflow it belongs to, a unit that can be encapsulated in one container for easier management or simply for convenience. Several containers can be linked within a workflow to create more complex behavior.

The data flows and workflow constraints, together with the supporting connections and containers form an ETL package, the main unit of work for encapsulating and running ETL logic. ETL packages are scheduled and run as fit for the purpose.
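One way to picture a package is as a set of data flows plus the precedence constraints between them. The sketch below is an assumption-laden toy, not the object model of any particular ETL tool: the three flow functions are hypothetical placeholders, and the ordering relies on graphlib.TopologicalSorter from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

log = []


# Hypothetical data flows; each would normally wrap an extract-transform-load sequence.
def load_customers():
    log.append("customers")


def load_products():
    log.append("products")


def load_sales():
    log.append("sales")


# Precedence constraints: each key runs only after all the flows in its set.
package = {
    load_customers: set(),
    load_products: set(),
    load_sales: {load_customers, load_products},
}


def run_package(flows):
    """Run the data flows in an order that honors the precedence constraints."""
    for flow in TopologicalSorter(flows).static_order():
        flow()


run_package(package)
```

The order of the two independent flows is left to the sorter, while the dependent flow always runs last; scheduling, logging and error handling are deliberately left out.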

With the right design, these building blocks allow enough flexibility for handling ad-hoc requests or building complex solutions. This involves decisions on how to partition the ETL packages and the data flows, in which order they should be run, where and in which sequence the data should be transformed, how to handle exceptions, how to build intermediary data repositories where needed, how to handle audit requirements, and so on. Each of these choices can prove to be important.

The knowledge of the ETL architecture and functionality is essential in providing the right solution for the problem considered; however, once the basics are understood, the challenges typically reside in understanding the source and/or target structures and the logical and physical entities available, in identifying how the data can be partitioned horizontally or vertically, and in determining what types of transformations are required for moving the data, as required by the solution.


25 July 2019

IT: Blockchain (Definitions)

"A block chain is a perfect place to store value, identities, agreements, property rights, credentials, etc. Once you put something like a Bit coin into it, it will stay there forever. It is decentralized, disinter mediated, cheap, and censorship-resistant." (Kirti R Bhatele et al, "The Role of Artificial Intelligence in Cyber Security", 2019)

"A system made-up of blocks that are used to record transactions in a peer-to-peer cryptocurrency network such as bitcoins." (Murad Al Shibli, "Hybrid Artificially Intelligent Multi-Layer Blockchain and Bitcoin Cryptology", 2020)

"A chain of blocks containing data that is bundled together. This database is shared across a network of computers (so-called distributed ledger network). Each data block links to the previous block in the blockchain through a cryptographic hash of the previous block, a timestamp, and transaction data. The blockchain only allows data to be written, and once that data has been accepted by the network, it cannot be changed." (Jurij Urbančič et al, "Expansion of Technology Utilization Through Tourism 4.0 in Slovenia", 2020)

"A system in which a record of transactions made in Bitcoin or another cryptocurrency is maintained across several computers that are linked in a peer-to-peer network. Amany M Alshawi, "Decentralized Cryptocurrency Security and Financial Implications: The Bitcoin Paradigm", 2020)

"An encrypted ledger that protects transaction data from modification." (David T A Wesley, "Regulating the Internet, Encyclopedia of Criminal Activities and the Deep Web", 2020)

"Blockchain is a decentralized, immutable, secure data repository or digital ledger where the data is chronologically recorded. The initial block named as Genesis. It is a chain of immutable data blocks what has anonymous individuals as nodes who can transact securely using cryptology. Blockchain technology is subset of distributed ledger technology." (Umit Cali & Claudio Lima, "Energy Informatics Using the Distributed Ledger Technology and Advanced Data Analytics", 2020)

"Blockchain is a meta-technology interconnected with other technologies and consists of several architectural layers: a database, a software application, a number of computers connected to each other, peoples’ access to the system and a software ecosystem that enables development. The blockchain runs on the existing stack of Internet protocols, adding an entire new tier to the Internet to ensure economic transactions, both instant digital currency payments and complicated financial contracts." (Aslı Taşbaşı et al, "An Analysis of Risk Transfer and Trust Nexus in International Trade With Reference to Turkish Data", 2020) 

"Is a growing list of records, called blocks, which are linked using cryptography. Each block contains a cryptographic hash of the previous block a timestamp, and transaction data. (Vardan Mkrttchian, "Perspective Tools to Improve Machine Learning Applications for Cyber Security", 2020)

"This is viewed as a mechanism to provide further protection and enhance the security of data by using its properties of immutability, auditability and encryption whilst providing transparency amongst parties who may not know each other, so operating in a trustless environment." (Hamid Jahankhani & Ionuț O Popescu, "Millennials vs. Cyborgs and Blockchain Role in Trust and Privacy", 2020)

"A blockchain is a data structure that represents the record of each accounting move. Each account transaction is signed digitally to protect its authenticity, and no one can intervene in this transaction." (Ebru E Saygili & Tuncay Ercan, "An Overview of International Fintech Instruments Using Innovation Diffusion Theory Adoption Strategies", 2021)

"A system in which a record of transactions made in bitcoin or another cryptocurrency are maintained across several computers that are linked in a peer-to-peer network." (Silvije Orsag et al, "Finance in the World of Artificial Intelligence and Digitalization", 2021)

"It is a decentralized computation and information sharing platform that enables multiple authoritative domains, who don’t trust each other, to cooperate, coordinate and collaborate in a rational decision-making process." (Vinod Kumar & Gotam Singh Lalotra, "Blockchain-Enabled Secure Internet of Things", 2021)

"A concept consisting of the methods, technologies, and tool sets to support a distributed, tamper-evident, and reliable way to ensure transaction integrity, irrefutability, and non-repudiation. Blockchains are write-once, append-only data stores that include validation, consensus, storage, replication, and security for transactions or other records." (Forrester)

[hybrid blockchain:] "A network with a combination of characteristics of public and private blockchains where a blockchain may incorporate select privacy, security and auditability elements required by the implementation." (AICPA)

[private blockchain:] "A restricted access network controlled by an entity or group which is similar to a traditional centralized network." (AICPA)

"A technology that records a list of records, referred to as blocks, that are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp and transaction data." (AICPA)

[public blockchain:] "An open network where participants can view, read and write data, and no one participant has control (e.g., Bitcoin, Ethereum)." (AICPA)
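Several of the definitions above describe the same core mechanism: each block stores a cryptographic hash of the previous block, a timestamp and transaction data. A minimal sketch of that linkage, using only Python's hashlib and json, could look as follows. The function names and the block layout are assumptions for illustration, and everything that makes a real blockchain distributed (consensus, signatures, peer-to-peer replication) is left out.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON serialization of the block's contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data, timestamp):
    """Append a block that references the hash of the current last block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis block has no predecessor
    chain.append({"prev_hash": prev, "timestamp": timestamp, "data": data})


def is_valid(chain):
    """Every block must reference the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


chain = []
append_block(chain, "genesis", 0)
append_block(chain, "A pays B 5", 1)
append_block(chain, "B pays C 2", 2)
valid_before = is_valid(chain)

chain[1]["data"] = "A pays B 500"  # tamper with an earlier block
valid_after = is_valid(chain)
```

Changing an earlier block changes its hash, so the reference stored in the following block no longer matches. This is what makes tampering evident rather than impossible, in line with the "tamper-evident" wording of the Forrester definition.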

04 April 2018

Data Science: Graph (Definitions)

"Informally, a graph is a finite set of dots called vertices (or nodes) connected by links called edges (or arcs). More formally: a simple graph is a (usually finite) set of vertices V and set of unordered pairs of distinct elements of V called edges." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A computation object that is used to model relationships among things. A graph is defined by two finite sets: a set of nodes and a set of edges. Each node has a label to identify it and distinguish it from other nodes. Edges in a graph connect exactly two nodes and are denoted by the pair of labels of nodes that are related." (Clay Breshears, "The Art of Concurrency", 2009)

"A graph in mathematics is a set of nodes and a set of edges between pairs of those nodes; the edges are ordered or nonordered pairs, or a relation, that defines the pairs of nodes for which the relation being examined is valid. […] The edges can either be undirected or directed; directed edges depict a relation that requires the nodes to be ordered while an undirected edge defines a relation in which no ordering of the edges is implied." (Dennis M Buede, "The Engineering Design of Systems: Models and methods", 2009)

[undirected graph:] "A graph in which the nodes of an edge are unordered. This implies that the edge can be thought of as a two-way path." (Clay Breshears, "The Art of Concurrency", 2009)

[directed graph:] "A graph whose edges are ordered pairs of nodes; this allows connections between nodes in one direction. When drawn, the edges of a directed graph are commonly shown as arrows to indicate the “direction” of the edge." (Clay Breshears, "The Art of Concurrency", 2009)

"1.Generally, a set of homogeneous nodes (vertices) and edges (arcs) between pairs of nodes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[directed acyclic graph:] "A graph that defines a partial order so that nodes can be sorted into a linear sequence with references only going in one direction. A directed acyclic graph has, as its name suggests, directed edges and no cycles." (Michael McCool et al, "Structured Parallel Programming", 2012)

"A data structure that consists of a set of nodes and a set of edges that relate the nodes to each other" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[directed graph:] "A directed graph is one in which the edges have a specified direction from one vertex to another." (Dan Sullivan, "NoSQL for Mere Mortals", 2015)

[directed graph (digraph):] "A graph in which each edge is directed from one vertex to another (or the same) vertex" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[undirected graph:] "A graph in which the edges have no direction" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

[undirected graph:] "An undirected graph is one in which the edges do not indicate a direction (such as from-to) between two vertices." (Dan Sullivan, "NoSQL for Mere Mortals®", 2015)

"Like a tree, a graph consists of a set of nodes connected by edges. These edges may or may not have a direction. If they do, the graph is referred to as a 'directed graph'. If a graph is directed, it may be possible to start at a node and follow edges in a path that leads back to the starting node. Such a path is called a 'cycle'. If a directed graph has no cycles, it is referred to as an 'acyclic graph'." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"In a computer science or mathematics context, a graph is a set of nodes and edges that connect the nodes." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

[undirected graph:] "A graph in which the edges have no direction" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)
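The distinctions made in the definitions above (directed vs. undirected edges, cycles, acyclic graphs) can be illustrated with a small adjacency-list sketch. The build_graph and has_cycle helpers are hypothetical names, and the cycle check is a plain depth-first search meant for directed graphs only.

```python
def build_graph(edges, directed):
    """Adjacency-list representation; undirected edges are stored in both directions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())  # make sure v is present as a node
        if not directed:
            adj[v].add(u)
    return adj


def has_cycle(adj):
    """Depth-first search for a cycle in a directed graph."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in adj}

    def visit(node):
        color[node] = GREY
        for nxt in adj[node]:
            # Reaching a node that is still on the current path closes a cycle.
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in adj)


dag = build_graph([("a", "b"), ("b", "c"), ("a", "c")], directed=True)
cyclic = build_graph([("a", "b"), ("b", "c"), ("c", "a")], directed=True)
undirected = build_graph([("a", "b")], directed=False)
```

Here dag is a directed acyclic graph, cyclic contains the cycle a, b, c, a, and in undirected the single edge can be followed in both directions.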

20 March 2017

Data Management: Data Structure (Definitions)

"A logical relationship among data elements that is designed to support specific data manipulation functions (trees, lists, and tables)." (William H Inmon, "Building the Data Warehouse", 2005)

"Data stored in a computer in a way that (usually) allows efficient retrieval of the data. Arrays and hashes are examples of data structures." (Michael Fitzgerald, "Learning Ruby", 2007)

"A data structure in computer science is a way of storing data to be used efficiently." (Sahar Shabanah, "Computer Games for Algorithm Learning", 2011)

"Data structure is a general term referring to how data is organized. In modeling, it refers more specifically to the model itself. Tables are referred to as 'structures'." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

[probabilistic *] "A data structure which exploits randomness to boost its efficiency, for example skip lists and Bloom filters. In the case of Bloom filters, the results of certain operations may be incorrect with a small probability." (Wei-Chih Huang & William J Knottenbelt, "Low-Overhead Development of Scalable Resource-Efficient Software Systems", 2014)

"A collection of methods for storing and organizing sets of data in order to facilitate access to them. More formally data structures are concise implementations of abstract data types, where an abstract data type is a set of objects together with a collection of operations on the elements of the set." (Ioannis Kouris et al, "Indexing and Compressing Text", 2015)

"A representation of the logical relationship between elements of data." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"Is a schematic organization of data and relationship to express a reality of interest, usually represented in a diagrammatic form." (Maria T Artese  Isabella Gagliardi, "UNESCO Intangible Cultural Heritage Management on the Web", 2015)

"The implementation of a composite data field in an abstract data type" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A way of organizing data so that it can be efficiently accessed and updated." (Vasileios Zois et al, "Querying of Time Series for Big Data Analytics", 2016)

"A particular way of storing information, allowing to a high level approach on the software implementation." (Katia Tannous & Fillipe de Souza Silva, "Particle Shape Analysis Using Digital Image Processing", 2018)

"It is a particular way of organizing data in a computer so that they can be used efficiently." (Edgar C Franco et al, "Implementation of an Intelligent Model Based on Machine Learning in the Application of Macro-Ergonomic Methods...", 2019)

"Way information is represented and stored." (Shalin Hai-Jew, "Methods for Analyzing and Leveraging Online Learning Data", 2019)

"A physical or logical relationship among a collection of data elements." (IEEE 610.5-1990)

29 January 2017

Data Management: Data Dictionary (Definitions)

"The system tables that contain descriptions of the database objects and how they are structured." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A set of system tables stored in a catalog. A data dictionary includes definitions of database structures and related information, such as permissions." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Software in which metadata is stored, manipulated and defined – a data dictionary is normally associated with a tool used to support software engineering." (Keith Gordon, "Principles of Data Management", 2007)

"A list of descriptions of data items to help developers stay on the same track." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The place where information about data that exists in the organization is stored. This should include both technical and business details about each data element." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"Data dictionary are mini database management systems that manages metadata. It is a repository of information about a database that documents data elements of a database. The data dictionary is an integral part of the database management systems and stores metadata or information about the database, attribute names and definitions for each table in the database." (Vijay K Pallaw, "Database Management Systems" 2nd Ed., 2010)

"In the days of mainframe computers, this was a listing of record layouts, describing each field in each type of file." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Software coupled with a data store for managing data definitions." (Craig S Mullins, "Database Administration", 2012)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations. They also contain information about which programs use which data and which users are interested in which reports." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A reference by which a team can understand what data assets they have, how those assets were created, what they mean, and where to find them." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A repository of the metadata useful to the corporation" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A comprehensive record of business and technical definitions of the elements within a dataset. Also referred to as a business glossary." (Jonathan Ferrar et al, "The Power of People", 2017)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations." (BAAN)

"A read-only collection of database tables and views containing reference information about the database, its structures, and its users." (Oracle)

"A set of system tables, stored in a catalog, that includes definitions of database structures and related information, such as permissions." (Microsoft Technet)

"A set of tables that keep track of the structure of both the database and the inventory of database objects." (IBM)

"A specialized type of database containing metadata; a repository of information describing the characteristics of data used to design, monitor, document, protect, and control data in information systems and databases; an application system supporting the definition and management of database metadata." (TOGAF)

"Metadata that keeps track of database objects such as tables, indexes, and table columns." (MySQL)

02 January 2017

Data Management: Information (Definitions)

"Information is data that increases the knowledge of the person who consumes it. Information is distinguished from data in that data may or may not be meaningful whereas information is always meaningful. For example, the numeric portion of an address is data, but it is not information." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"Usable, processed data, typically output from a computer program." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours" 2nd Ed., 2001)

"Data that has been processed in such a way that it can increase the knowledge of the person who receives it." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"data that human beings assimilate and evaluate to solve a problem or make a decision." (William H Inmon, "Building the Data Warehouse", 2005)

"Information is data with context. It can be externally validated and is independent of applications." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Information can be defined as all inputs that people process to gain understanding." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"Sets of data presented in a context. Information about a business and its environment." (Steve Williams & Nancy Williams, "The Profit Impact of Business Intelligence", 2007)

"1.Generally, understanding concerning any objects such as facts, events, things, processes, or ideas, including concepts that, within a certain context and timeframe, have a particular meaning." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data that have been organized so they have meaning and value to the recipient." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"Refers to all or part of a raw data item, which, on examination, turns out to be of interest. Such interest can be justified by means of explicit criteria. Also denotes an observation conducted in the field." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

"The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management 9th Ed", 2011)

"Data with additional context in the form of metadata, including definition and relationships between data and possibly other information. Data in context with metadata makes information." (Craig S Mullins, "Database Administration", 2012)

"In the context of this book, a collection of descriptors derived from observation, measurement, calculation, inference, or imagination in a form that can be shared with or communicated to others, or both. The format can be tangible or intangible or some combination of both." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"An organised and formatted collection of data" (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

"Any communication on or representation of facts or data in all forms (textual, graphical, audiovisual, digital)." (Gilbert Raymond & Philippe Desfray, "Modeling Enterprise Architecture with TOGAF", 2014)

"Data that has been organized or processed in a useful manner" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"One view understands information to be content- and purpose- specific knowledge, which is exchanged during human communication. Another takes the view of a purely informational processing perspective, according to which data is the building blocks for information. Accordingly, data is processed into information." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Data that has been processed to create meaning. Information is intended to expand the knowledge of the person who receives it. Information is the output of decision support systems and information systems." (Ciara Heavin & Daniel J Power, "Decision Support, Analytics, and Business Intelligence 3rd Ed.", 2017)

"Organized or structured data, processed for a specific purpose to make it meaningful, valuable, and useful in specific contexts." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide)", 2017)

"A structured collection of data presented in a form that people can understand and process. Information is converted into knowledge when it is contextualised with the rest of a person’s knowledge and world model." (Open Data Handbook)

01 February 2010

Data Warehousing: Cube (Definitions)

"A subset of data, usually constructed from a data warehouse, that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures. A cube's data is stored in one or more partitions." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Name for a dimensional structure on a multidimensional or online analytical processing (OLAP) database platform, originally referring to the simple three-dimension case of product, market, and time." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed, 2002)

"Proprietary data structure used to store data for an online analytical processing (OLAP) end user data access and analysis tool." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A multidimensional data structure that represents the intersections of each unique combination of dimensions. At each intersection there is a cell that contains a data value." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"Used with online analytical processing (OLAP), data cubes are multidimensional structures built from one or more tables in a relational database(s)." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"A multidimensional structure that contains dimensions and measures." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"A multidimensional structure that contains dimensions and measures. Cubes are a denormalized version of either the entire database or part of the database and are used within SQL Server Analysis Services (SSAS)." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"A set of data that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A database object that organizes data for accessibility in an OLAP database." (Ken Withee, "Microsoft® Business Intelligence For Dummies®", 2010)

"A multi-dimensional data structure that contains an aggregate value at each point, i.e., the result of applying an aggregate function to an underlying relation. Data cubes are used to implement OLAP." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Refers to the multidimensional data structure used to store and manipulate data in a multidimensional DBMS. The location of each data value in the data cube is based on the x-, y-, and z-axes of the cube. Data cubes are static (must be created before they are used), so they cannot be created by an ad hoc query." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)

"A set of data that is organized and summarized into a multidimensional structure that is defined by a set of dimensions and measures." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A multidimensional representation of data needed for online analytical processing, multidimensional reporting, or multidimensional planning applications." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Cubes, also known as OLAP cubes, are preprocessed and presummarized collections of data that drastically improve query time. [...] OLAP cubes are logical structures as defined by the metadata." (Piethein Strengholt, "Data Management at Scale", 2020)

29 March 2009

DBMS: Data Model (Definitions)

"A method of organizing data into two-dimensional tables made up of rows and columns. The model is based on the mathematical theory of relations, a part of set theory." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A representation, usually graphical, of objects and their relationships, generally undertaken as part of designing an Oracle database application." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A formal way of describing the relationship between entities in a database to a database management system." (Jan L Harrington, "Relational Database Dessign: Clearly Explained" 2nd Ed., 2002)

"A data model is an abstraction or representation of the data in a given environment. It is a collection and subsequent verification and communication method for fully documenting the data requirements used in the creation of accurate, effective, and efficient physical databases. The data model consists of entities, attributes, and relationships." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A data model is a schematic showing the data in the warehouse, how the data relate to other data, and how the data should be structured. It is used to ensure that the data warehouse can substantiate all business requirements." (Margaret Y Chu, "Blissful Data", 2004)

"An integrated collection of concepts for describing data, relationships between data, and constraints on the data used by an organization." (Thomas M Connolly & Carolyn E Begg, "Database Solutions: A step-by-step guide to building databases", 2004)

"The specification of data structures and business rules needed to support a defined set of functions (sometimes called an Information Model); usually depicted in a diagram consisting of entities and relationships." (Margaret Y Chu, "Blissful Data ", 2004)

"(1) A data model is an abstract, self-contained, logical definition of the data structures, data operators, and so forth, that together make up the abstract machine with which users interact. (2) A data model is a model of the persistent data of some particular enterprise." (Christopher J Date, "Database in Depth", 2005)

"A data model is the specification of data structures and business rules to represent business requirements. This is an abstraction that describes one or more aspects of a problem or a potential solution addressing a problem." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A model that provides a two-dimensional structure to data." (Gavin Powell, "Beginning Database Design", 2006)

[object database model:] "A model that provides a three-dimensional structure to data where any item in a database can be retrieved from any point very rapidly." (Gavin Powell, "Beginning Database Design", 2006)

"(1) The logical data structures, including operations and constraints provided by a DBMS for effective database processing; (2) the system used for the representation of data (for example, the ERD or relational model). " (William H Inmon & Anthony Nesavich, "Tapping into Unstructured Data", 2007)

"A formal description of data managed by a business process. In most cases, these data are stored via a Database Management System (DBMS), and are also referenced by an Information System (IS) and, possibly, by a Decision Support Systems (DSS)" (C Combi & G Pozzi, "Workflow Management Systems for Healthcare Processes", 2008)

[Entity Data Model] "An EDM is an abstract logical representation of a physical database, used to implement database connectivity in the middle or client tiers." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

[navigational data model:] "A data model where relationships between entities are represented by physical data structures (for example, pointers or indexes) that provide the only paths for data access." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A formal description language to describe and to manipulate the investigated data instances. It contains three components: a static structural part, an integrity part and a manipulation part." (László Kovács & Tanja Sieber, "Multi-Layered Semantic Data Models" [in "Encyclopedia of Artificial Intelligence"], 2009)

"A paradigm for describing the structure of a database in which entities are represented as tables, and relationships between the entities are represented by matching data." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"An abstraction of how individual data elements relate to each other. It visually depicts how the data is to be organized and stored in a database. A data model provides the mechanism to document and understand how data is organized. (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"The formal way of expressing relationships in a database." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A representation of the structure of data. As used in this book, the term refers to a conceptual data model, which describes data in terms of their inherent semantics, without regard to how they might be organized in a physical database. Some use the term to describe a logical data model that organizes data in terms of a specific data management technology, such as relational tables and columns, object-oriented classes, or ISAM hierarchies." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A model that includes formal data names, comprehensive data definitions, proper data structures, and precise data integrity rules. A complete data model must include all four of these components." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A representation, usually graphic, of a complex 'real-world' data structure. Data models are used in the database design phase of the database life cycle." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management 9th Ed", 2011)

"A data model is a visual representation of data content and the relationships, created for purposes of understanding how data is or might be organized, and for ensuring the comprehensibility and usability of that way of organizing data." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"A representation, usually graphic, of a complex 'real-world' data structure. Data models are used in the database design phase of the database life cycle." (Carlos Coronel & Steven Morris, "Database Systems: Design, Implementation, & Management" 11th Ed., 2014)

[Entity Data Model (EDM):] "An abstract logical representation of a physical database, used to implement database connectivity in the middle or client tiers." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide 4th Ed", 2015)

"Represents data objects and their relationships with each other. Data models form the basis for data integration at the conceptual level as well as the improvement of data quality, such as with regard to the reduction of data redundancy.  Data models are one component of the data architecture." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"A visual means of depicting data and its relationship to other data." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"A description of the objects represented by a computer system together with their properties and relationships." (Besma Khalfi et al, "Enhanced F-Perceptory Approach for Dealing with Geographic Data Imprecision from the Conceptual Modeling to the Fuzzy Geographical Database Building", 2017)

"1. A representation, using text and/or graphics, of the definition, characterization, and relationships of data in a given environment. 2. No longer used, the DBMS architecture (hierarchical, network, relational, etc.)." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"In a data-centric benchmark, a database schema and a protocol for instantiating this schema, i.e. , generating synthetic data or reusing real-life data." (Jérôme Darmont, "Data-Centric Benchmarking", Encyclopedia of Information Science and Technology, Fourth Edition, 2018)

"An abstract model that describes how data is presented and used." (Piethein Strengholt, "Data Management at Scale", 2020)

"A description of data that consists of all entities represented in a data structure or database and the relationships that exist among them." (IEEE 610.5-1990)

16 March 2009

DBMS: Hash Table (Definitions)

"A data structure used internally by Perl for implementing associative arrays (hashes) efficiently. See also bucket." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

[hash cluster:] "A type of table cluster that is similar to an indexed cluster, except the index key is replaced with a hash function. No separate cluster index exists. In a hash cluster, the data is the index." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"An in-memory data structure that associates join keys with rows in a hash join. For example, in a join of the employees and departments tables, the join key might be the department ID. A hash function uses the join key to generate a hash value. This hash value is an index in an array, which is the hash table." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"The data structure used to store elements using hashing" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

"An object that is like a dictionary or an associative array. A hash table stores and retrieves elements using key values called hashcodes. See also hashcode." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)

[sorted hash cluster:] "A hash cluster that stores the rows corresponding to each value of the hash function in such a way that the database can efficiently return them in sorted order. The database performs the optimized sort internally." (Oracle, "Oracle Database Concepts")

"An in-memory data structure that associates join keys with rows in a hash join. For example, in a join of the employees and departments tables, the join key might be the department ID. A hash function uses the join key to generate a hash value. This hash value is an index in an array, which is the hash table." (Oracle, "Oracle Database Concepts")

"A two-dimensional table of items in which a hash function is applied to the key of each item to determine its hash value. The hash value identifies each item's primary position in the table, and if this position is already occupied, the item is inserted either in an overflow table or in another available position in the table." (IEEE 610.5-1990)

27 December 2007

Software Engineering: Data Structures (Just the Quotes)

"At the present time, choosing a programming language is equivalent to choosing a data structure, and if that data structure does not fit the data you want to manipulate then it is too bad. It would, in a sense, be more logical first to choose a data structure appropriate to the problem and then look around for, or construct with a kit of tools provided, a language suitable for manipulating that data structure." (Maurice V Wilkes, "Computers Then and Now", 1968)

"Choosing a better data structure is often an art, which we cannot teach. Often you must write a preliminary draft of the code before you can determine what changes in the data structure will help simplify control. [...] Choose a data representation that makes the program simple." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Let the data structure the program." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Use recursive procedures for recursively-defined data structures." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"The programmer's primary weapon in the never-ending battle against slow system is to change the intramodular structure. Our first response should be to reorganize the modules' data structures." (Fred Brooks, "The Mythical Man-Month: Essays on Software Engineering", 1975)

"The representation of knowledge in symbolic form is a matter that has pre-occupied the world of documentation since its origin. The problem is now relevant in many situations other than documents and indexes. The structure of records and files in databases: data structures in computer programming; the syntactic and semantic structure of natural language; knowledge representation in artificial intelligence; models of human memory: in all these fields it is necessary to decide how knowledge may be represented so that the representations may be manipulated." (Brian C Vickery, "Concepts of documentation", 1978)

"Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures." (Rob Pike, "Notes on Programming in C" , 1989)

"Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." (Rob Pike, "Notes on Programming in C", 1989)

"If a programmer designs a program, only half the job is done if they have only designed the data structures. They also have to design the procedures for operating on the structures. (Specifically, a programmer designs abstract data types.) Without the appropriate procedures for operating on data structures, a computer would literally get lost in the structures, even supposing it could start executing anything sensible." (Yin L Theng et al," 'Lost in hhyperspace': Psychological problem or bad design?", 1996)

"Often you'll see the same three or four data items together in lots of places: fields in a couple of classes, parameters in many method signatures. Bunches of data that hang around together really ought to be made into their own object." (Kent Beck, "Refactoring: Improving the Design of Existing Code", 1999)

"Smart data structures and dumb code works a lot better than the other way around." (Eric S Raymond, "The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary", 2001)

"In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful. […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." (Linus Torvalds, [email] 2006)

"Computation at its root consists of a data structure (for input, output, and perhaps something being stored in between) and some process. One cannot talk about the process without describing the data structure. More importantly, different data structures enable certain computations to be done easily, whereas other data structures support other computations. Thus, the choice of data structure (representation) helps explain why a problem-solver does or does not successfully engage in a given process (cognition/behavior) or perhaps why a process takes as long or as short as it does." (Christian D Schunn et al, "Complex Visual Data Analysis, Uncertainty, and Representation", 2007)

"One of the essential parts of a formal training in programming is a long and demanding study of the large collection of algorithms that have already been discovered and analyzed, together with the Data Structures (carefully tailored, seemingly unnatural ways of organizing data for effective access) that go with them. As with any other engineering profession, it is impossible to do a good job without a thorough knowledge of what has been tried before. If a programmer starts the job fully armed with what is already known, they will have some chance of finding something new. Inventiveness is important: not all problems have been seen before. A programmer who does not already know the standard algorithms and data structures is doomed to nothing more than rediscovering the basics." (Robert Plant & Stephen Murrell, "An Executive’s Guide to Information Technology: Principles, Business Models, and Terminology", 2007)

"A modeling language is usually based on some kind of computational model, such as a state machine, data flow, or data structure. The choice of this model, or a combination of many, depends on the modeling target. Most of us make this choice implicitly without further thinking: some systems call for capturing dynamics and thus we apply for example state machines, whereas other systems may be better specified by focusing on their static structures using feature diagrams or component diagrams. For these reasons a variety of modeling languages are available." (Steven Kelly & Juha-Pekka Tolvanen, "Domain-specific Modeling", 2008)

"Clearly, the search for a dividing line between code and data is fruitless—and not particularly flattering to our egos. Let’s abandon any attempt to find a higher truth here, and settle for a pragmatic definition. If a piece of generated text simply instantiates and provides values for a data structure, it’s data; otherwise, it’s code." (Steven Kelly & Juha-Pekka Tolvanen, "Domain-specific Modeling", 2008)

"Generally, the craft of programming is the factoring of a set of requirements into a a set of functions and data structures." (Douglas Crockford, "JavaScript: The Good Parts", 2008)

"If the data structure can’t be explained on a beer coaster, it’s too complex." (Felix von Leitner, "Source Code Optimization", 2009)

04 May 2006

Programming: Array (Definitions)

"A group of cells arranged by dimensions. A table is a two-dimensional array in which the cells are arranged in rows and columns, with one dimension forming the rows and the other dimension forming the columns. A cube is a three-dimensional array and can be visualized as a cube, with each dimension of the array forming one edge of the cube." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A collection of objects all of the same type." (Jesse Liberty, "Sams Teach Yourself C++ in 24 Hours 3rd Ed.", 2001)

"A list of variables that have the same name and data type." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours" 2nd Ed., 2001)

"Values whose members, called elements, are accessed by an index rather than by name. An array has a rank that specifies the number of indices needed to locate an element (sometimes called the number of dimensions) within the array. It may have either zero or nonzero lower bounds in each dimension." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"A collection of data items, all of the same type, in which each item is uniquely addressed by a 32-bit integer index. Java arrays behave like objects but have some special syntax. Java arrays begin with the index value 0." (Marcus Green & Bill Brogden, "Java 2™ Programmer Exam Cram™ 2 (Exam CX-310-035)", 2003)

"A device that aggregates large collections of hard drives into a logical whole." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"An arithmetically derived matrix or table of rows and columns that is used to impose an order for efficient experimentation. The rows contain the individual experiments. The columns contain the experimental factors and their individual levels or set points." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A data structure containing an ordered list of elements - any Ruby object - starting with an index of 0. Compare hash." (Michael Fitzgerald, "Learning Ruby", 2007)

"An arithmetically derived matrix or table of rows and columns that is used to impose an order for efficient experimentation. The rows contain the individual experiments. The columns contain the experimental factors and their individual levels or set points." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"In a SQL database, an ordered collection of elements of the same data type stored in a single column and row of a table." (Jan L Harrington, "SQL Clearly Explained 3rd Ed. ", 2010)

"A group of values stored together in a single variable and accessed by index." (Rod Stephens, "Stephens' Visual Basic® Programming 24-Hour Trainer", 2011)

"A grouping of similar items of the same storage type in a sequential pattern, and referenced by a sequential index value." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A variable that holds a series of values with the same data type. An index into the array lets the program select a particular value." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"A basic collection of values that is a sequence represented by a single block of memory. Arrays have efficient direct access, but do not easily grow or shrink." (Mark C Lewis, "Introduction to the Art of Programming Using Scala", 2012)

"An ordered sequence of values, stored such that you can easily access any of the values using an integer subscript that specifies the value’s offset in the sequence." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"A group of variables stored under a single name." (Matt Telles, "Beginning Programming", 2014)

"A structure composed of multiple identical variables that can be individually addressed." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A structure that contains an ordered collection of elements of the same data type in which each element can be referenced by its index value or ordinal position in the collection." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)
