15 January 2009

🛢DBMS: Tables (Definitions)

"A collection of rows (records) that have associated columns (fields). The logical equivalent of a database file." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A collection of rows that have associated columns." (Patrick Dalton, "Microsoft SQL Server Black Book", 1997)

"The structure within a database that holds the data. A table is made iup of row and columns." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"An object in a database that stores data as a collection of rows and columns." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

"A two-dimensional object, consisting of rows and columns, used to store data in a relational database. Each table stores information about one of the types of objects modeled by the database." (Microsoft Corporation, "Microsoft SQL Server 2000 System Administration", 2001)

"The basic construct of a relational database that contains rows and columns of related data." (Bob Bryla, "Oracle Database Foundations", 2004)

"A collection of rows holding similar columns of data. The corresponding formal database term is relation." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"A two-dimensional storage location of data. Databases within SQL Server 2005 are composed of multiple related tables." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A two-dimensional object, consisting of rows and columns, used to store data in a relational database. Each table stores information about one of the types of objects modeled by the database, such as information about sales orders." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A two-dimensional array of data, consisting of one or more rows representing instances of the thing the table describes and one or more columns each containing a kind of data describing that thing." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"In a relational database, the collection of rows (or records) of a single type (similar to a file)." (Toby J Teorey, ", Database Modeling and Design 4th Ed", 2010)

"A two-dimensional arrangement of data into rows and columns." (Craig S Mullins, "Database Administration", 2012)

"A database object that stores data in records (rows) and fields (columns). The data is usually about a particular category of things, such as employees or orders." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In relational databases and also in Excel, where information is stored. Tables are essentially spreadsheets, or lists, that store database information." (E C Nelson & Stephen L Nelson, "Excel Data Analysis For Dummies ", 2015)

"In a relational database, a database object that consists of a specific number of columns and is used to store an unordered set of rows." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

🛢DBMS: Indexes (Definitions)

"A database object that consists of key values from the data tables, and pointers to the pages that contain those values. Indexes speed up access to data rows." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A set of pointers that are logically ordered by the values of a key. An index is a database object that provides access to data in the rows of a table, based on key values. Indexes provide quick access to data and can enforce uniqueness on the rows in a table. SQL Server supports clustered and nonclustered indexes. In a clustered index, data is stored in the same order as the index; in a nonclustered index, data is stored differently from the index." (Patrick Dalton, "Microsoft SQL Server Black Book", 1997)

"A database object that provides fast access to data in the rows of a table, based on key values. Indexes can also enforce uniqueness on the rows in a table. SQL Server supports clustered and nonclustered indexes." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A database object designed to reduce the amount of time it takes to retrieve rows from a table. An index is created based on one or more columns in the table." (Bob Bryla, "Oracle Database Foundations", 2004)

"In a relational database, a database object that provides fast access to data in the rows of a table, based on key values. Indexes can also enforce uniqueness on the rows in a table. SQL Server supports clustered and non-clustered indexes. The primary key/unique constraint automatically causes an index to be built. In full-text searches, a full-text index stores information about significant words and their location within a given column." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A data structure that uses one or more columns to make looking up values on those columns faster." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"In a relational database, a database object that provides fast access to data in the rows of a table, based on key values. The primary key of a table is automatically indexed." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A physical database object used to quickly access data in a database." (Craig S Mullins, "Database Administration", 2012)

10 January 2009

🛢DBMS: Aggregate Function (Definitions)

"A function that works on a set of cells to produce a single answer or set of answers, one for each subset of cells. The aggregate functions available in Transact-SQL are: average (avg), maximum (max), minimum (min), sum (sum), and count of the number of items (count)." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Include SUM, AVG, COUNT, COUNT(*), MAX, and MIN. These functions generate summary values that appear as new columns in query results. They act on the whole table or on a subset of the rows." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"Functions that calculate summary values, such as averages and sums, from the values in a particular column and return a single value for each set of rows to which the function applies. The aggregate functions are AVG, COUNT, COUNT(*), MAX, MIN, SUM, STDEV, STDEVP, VAR, and VARP. Aggregate functions can be applied either to all rows in a table, to a subset of table rows specified by a WHERE clause, or to one or more groups of table rows specified by the GROUP BY clause." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"An aggregate function performs a calculation on a column in a set of rows and returns a single value." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A type of function in Oracle SQL that performs a calculation or transformation across multiple rows in a table, rather than just on a single row." (Bob Bryla, "Oracle Database Foundations", 2004)

"Functions that provide summary data over sets returning a singular value." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A function that performs a calculation at the column level on a set of rows to return a single value. Examples of T-SQL aggregate functions include AVG, MIN, and SUM." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"A function that produces a single result based on the contents of an entire set of table rows." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A function that performs a summary calculation on a series of data and returns a single value. Each aggregate function uses the Scope parameter, which defines the scope (such as grouping, data set, or data region) in which the aggregate function is performed." (Jim Joseph et al, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A SQL function - for example, AVG and SUM - that computes a variety of measures based on values in one or more numeric columns." (Jan L Harrington, "SQL Clearly Explained 3rd Ed. ", 2010)

"A function that performs a calculation on multiple values and returns a single value." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A function that optionally accepts arguments and returns a single scalar value that is the result of an evaluation of a set of like values, such as those in a column within a set of one or more rows. See also function, routine." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A function that generates a single value from a group of values, often used with GROUP BY and HAVING clauses. Aggregate functions include AVG, COUNT, MAX, MIN, and SUM. Also known as set functions. See also scalar function." (Microsoft, "ODBC Glossary")

"A function that performs a calculation on multiple values and returns a single summary value." (Microsoft Technet)

"In a query, a function such as COUNT, AVG, or STDEV that calculates a value using all the rows in a column of a table. In writing expressions and in programming, you can use SQL aggregate functions (including the three listed above) and domain aggregate functions to determine various statistics." (Microsoft, "ADO Glossary Terms")

🛢DBMS: Function (Definitions)

"A set of instructions that operates as a single logical unit, can be called by name, accepts input parameters, and returns information. In programming languages such as C, a function is a named subroutine of a program that encapsulates some logic. The function can be called by name, using parameters to pass data into the function and retrieve data produced by the function. In Transact-SQL, a function is a unit of syntax consisting of a keyword and, usually, a set of parameters. There are several categories of Transact-SQL functions: string, math, system, niladic, text and image, date, aggregate, and conversion functions." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A program that returns a value to the program or environment from which it is called." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A section of code that operates as a single logical unit. Transact-SQL supports both built-in functions and user-defined functions. The built-in functions cannot be modified by SQL Server 2000 users, whereas users can create and modify user- defined functions." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A named set of predefined programming language commands that performs a specific task given zero, one, or more arguments and returns a value." (Bob Bryla, "Oracle Database Foundations", 2004)

"A programming unit or expression returning a single value, also allowing determinant values to be passed in as parameters. Thus, parameter values can change the outcome or return result of a function. The beauty of a function is that it is self-contained and can thus be embedded into an expression." (Gavin Powell, "Beginning Database Design", 2006)

"A set of instructions that operates as a single logical unit." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A centralized, server-based routine that can be included as part of your Transact-SQL statements. Typically used to streamline logic and reduce the amount of required programming effort, you can build your own functions." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"Functions are routines that can accept parameters, perform an action, and return the result of that action. SQL Server includes many built-in functions. User- defined functions can be created to meet specific needs." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A piece of code that operates as a single logical unit. A function is called by name, accepts optional input parameters, and returns a status and optional output parameters. Many programming languages support functions, including C, Visual Basic, and Transact-SQL. Transact-SQL supplies built-in functions, which cannot be modified, and supports user-defined functions, which can be created and modified by users." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A small program that performs one task and returns a single value. It may be built into the SQL language or written by a user, database administrator, or application programmer." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A piece of code that operates as a single logical unit. A function is called by name, accepts optional input parameters, and returns a status and optional output parameters. Many programming languages support functions." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A predefined algorithm supported by the DBMS for converting, manipulating, or calculating data based upon input. Functions can be used within SQL statements any place an expression can be used." (Craig S Mullins, "Database Administration", 2012)

"A relationship between a set of input data values and a set of result values that is used to extend and customize SQL or XQuery. Functions are invoked from elements of SQL statements such as the select list or the FROM clause and from XQuery as primary expressions." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

04 January 2009

🛢DBMS: User-defined Functions [UDFs] (Definitions)

"A Transact- SQL function defined by a user. Functions encapsulate frequently performed logic in a named entity that can be called by Transact-SQL statements instead of recoding the logic in each statement." (Microsoft Corporation, Microsoft SQL Server 2000 system administration, 2001)

"Functions that you write as opposed to built-in functions supplied by a programming language." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours" 2nd Ed., 2001)

"A user-defined function is a Transact-SQL function defined by a user. Functions encapsulate frequently performed logic in a named entity that can be called by Transact-SQL statements instead of recoding the logic in each statement." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A function that is written by an analyst, user, or database administrator and does not come as part of the default installation of the Oracle server software." (Bob Bryla, "Oracle Database Foundations", 2004)

"A collection of T-SQL statements with a well-defined set of input parameters, but only one output which can be a scalar value or a table. User-defined functions allow the encapsulation of various logical and database operations, but cannot be used to affect changes to a database." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A custom function written by developers in SQL Server." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance 70-444" , 2007)

"A Transact-SQL function defined by a user. Functions encapsulate frequently performed logic in a named entity that can be called by Transact-SQL statements instead of recoding the logic in each statement." (Microsoft Docs, SQL Server 2005)

"Extensible code added to the DBMS to be callable like built-in database functions." (Craig S Mullins, "Database Administration" 2nd Ed, 2012)

"A function that is defined to the DB2 database system by using the CREATE FUNCTION statement and that can be referenced thereafter in SQL statements." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"In SQL Server, a Transact-SQL function defined by a user." (Microsoft Technet)

"A user-defined routine that returns at least one value. (IBM)

03 January 2009

🛢DBMS: Stored Procedures [SP] (Definitions)

"A collection of SQL statements and optional control-of-flow statements stored under a name. SQL Server-supplied stored procedures are called system procedures." (Karen Paulsell et al,  "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Transact-SQL statements stored under a name and processed as a unit. Stored procedures are stored within a database and can be executed with one request from an application. A stored procedure can also allow user-declared variables, conditional execution, and other powerful programming features." (Patrick Dalton, Microsoft SQL Server Black Book, 1997)

"Transact-SQL programs stored on the server in a compiled form. Processing for these objects is done at the server level, making them as fast as any other tables the procedure references on the local machine. They can be used to enforce entity integrity, referential integrity, and domain integrity." (Rob Scrimger et al, "MCSE TestPrep: SQL Server 6.5  Administration", 1998)

"A precompiled collection of Transact-SQL statements and optional control-of-flow statements stored under a name and processed as a unit. Stored procedures are stored within a database, can be executed with one call from an application, and allow user-declared variables, conditional execution, and other powerful programming features." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

"A stored procedure is a group of Transact-SQL statements compiled into a single execution plan." (Microsoft Corporation, "SQL Server 2000 Books Online", 2000) 

"A program that resides and executes inside a database system. In Oracle, the term generally encompasses not only procedures but also functions and packages." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A software program stored in the database itself to be executed on the server based on stipulated conditions." (Paulraj Ponniah, "Data Warehousing Fundamentals", 2001)

"A precompiled collection of Transact-SQL statements stored under a name and processed as a unit." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A collection of T-SQL statements with a well-defined set of inputs, called input parameters, and a well-defined set of outputs, which may be output parameters, return values, or cursors. Stored procedures allow the encapsulation of various database operations." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A batch of SQL commands that are precompiled and saved as a procedure in SQL Server" (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"A collection of T-SQL statements with a well-defined set of inputs, called input parameters, and a well-defined set of outputs, which may be output parameters, return values, or cursors. Stored procedures allow the encapsulation of various database operations." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"Also called a database procedure, a chunk of code stored within and executed from within a database, typically on data stored in a database (but not always)." (Gavin Powell, "Beginning Database Design", 2006)

"An executable code module in SQL Server." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance 70-444" , 2007)

"A piece of code stored in the database that can be executed by various pieces of code such as check constraints or application code. Stored procedures are a good place to store business logic that should not be built into the database's structure." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Centralized, server-based application code. Typically used to standardize business logic and reduce the amount of required programming effort, you can build your own stored procedures or leverage the many built-in stored procedures offered by SQL Server." (Robert D. Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"A precompiled collection of Transact-SQL statements stored under a name and processed as a unit." (Jim Joseph et al, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A set of SQL statements (subroutine) available to applications accessing a relational database system. Stored procedures are physically stored in the database." (John Goodson & Robert A Steward, "The Data Access Handbook", 2009)

"A software program stored in the database itself to be executed on the server based on stipulated conditions." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"An executable module stored in a database instance, to be invoked as part of the verification process for particular data elements." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A SQL program module that is invoked by an application program using the SQL CALL command." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"(1) A named collection of procedural and SQL statements. (2) Business logic stored on a server in the form of SQL code or some other DBMS-specific procedural language." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A precompiled code routine stored within a Database Management System." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A precompiled routine or program stored within the DBMS and executable using DBMS commands." (Craig S Mullins, "Database Administration" 2nd Ed, 2012)

"A precompiled collection of SQL statements and optional control-of-flow statements stored under a name and processed as a unit. They are stored in an SQL database and can be run with one call from an application." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A database object that can contain executable database server logic. Permissions can be assigned to stored procedures in order to prevent unwanted data modifications and to provide more granular control of security." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"A precompiled collection of SQL statements and optional control-of-flow statements stored under a name and processed as a unit. They are stored in an SQL database and can be run with one call from an application." (Microsoft Docs, SQL Server 2014)

"A block of code that is stored in an SQL database and can be called by applications. In some systems the code is written in an extended SQL dialect, in others it is written in a standard programming language and contains Embedded SQL. Stored procedures are used for one or more of the following reasons: to improve performance; to provide greater security; or to improve manageability." (Microfocus)

"A block of procedural constructs and embedded SQL statements that is stored in a database and that can be called by name. Stored procedures allow an application program to be run in two parts, one on the client and the other on the server, so that one call can produce several accesses to the database." (IBM)

"A named PL/SQL block or Java program that Oracle Database stores in the database. Applications can call stored procedures by name." (Oracle)

"A precompiled collection of code such as SQL statements and optional control-of-flow statements stored under a name and processed as a unit. Stored procedures are stored within a database; they can be executed with one call from an application and allow user-declared variables, conditional execution, and other powerful programming features." (Microsoft)

"A precompiled collection of Transact-SQL statements that are stored under a name and processed as a unit." (Microsoft Technet)

"A stored procedure is a combination of SQL statements and control and conditional handling statements that provides an interface to the database." (Teradata)

"A stored procedure is a precompiled executable object that contains one or more Transact-SQL statements." (Microsoft) [source

"A stored procedure is an executable object stored in a database." (Microsoft) [source]

02 January 2009

🛢DBMS: Views (Definitions)

"An alternative way of looking at the data in one or more tables. Usually created as a subset of columns from one or more tables." (Karen Paulsell et al,  "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Predefined query that provides an alternative method of looking at the data in one or more underlying tables. Typically views are used to focus a users view of data and for security." (Rob Scrimger et al, "MCSE TestPrep: SQL Server 6.5  Administration", 1998)

"An alternate way of looking at data from one or more tables in the database. A view is a virtual table, usually created as a subset of columns from one or more tables, which is expressed as a SELECT statement. Depending on the definition, data in base tables can be modified through views." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

"A 'virtual table', usually referring to a database object that has been named and created with SQL's CREATE VIEW statement. Usually created for read-only purposes, it is possible to update the database through some views; as of Oracle8, it is also possible to associate an INSTEAD OF trigger with a view to allow INSERT, UPDATE, and DELETE operations on the view." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"SQL statement that creates logical copies of a table or a complete query that can be used separately in a SELECT statement. Views are semantically independent, so the separate roles of a dimension usually are implemented as views." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"A database object that can be referenced the same way as a table in SQL statements. Views are defined using a SELECT statement and are analogous to an object that contains the result set of this statement." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A view is a relational database object that can be referenced and built by using SELECT statements to join data from one or more base tables. Views are similar to tables in that data can be retrieved and modified and indexes can be built." (Thomas Moore, "MCSE Training Guide 70-229: Designing and Implementing Databases with Microsoft® SQL Server™ 2000 Enterprise Edition", 2003)

"A database object that is based on a SELECT statement against one or more tables or other views in the database. A regular view does not store any data in the database; only the definition is stored. Views are also known as stored queries." (Bob Bryla, "Oracle Database Foundations", 2004)

"A relational database object that can be referenced and built by using SELECT statements to join data from one or more base tables. Views are similar to tables in that data can be retrieved and modified and indexes can be built." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006) 

"A query that is stored in the data dictionary and is rerun when called for. A view appears to a user to be a table." (Sikha S Bagui & Richard W Earp, "Learning SQL on SQL Server 2005", 2006)

"virtual representations of a table whose contents are defined by a query" (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

 "A database object in SQL Server used to encapsulate a query. Commonly referred to as a virtual table." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance 70-444" , 2007)

"A virtual table which is derived from base table using a query." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A virtual grouping of one or more tables, often done to reduce complexity while increasing security and reliability." (Robert D. Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"A stored SQL query from which a virtual table is created for use each time the name of the view is used." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A virtual table consisting of columns and selected instances from other tables assembled in a systematic way." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A presentation of a set of data from one or more physical tables as one logical table. A view can include some or all the rows and columns from each contributing table, and can be defined as the result table from a SELECT statement." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A virtual table, defined as a SQL SELECT statement, to provide a subset of data from one or more tables." (Craig S Mullins, "Database Administration" 2nd Ed, 2012)

"A virtual table that uses the result set of a saved query." (Faisal Akkawi et al, "MCSA Guide to Microsoft SQL Server 2012 - Exam #70-462", 2014)

"A database object that is created from one or more tables and can be used the same as a table. A view is a virtual table that has no storage requirements of its own." (Arie D Jones, "SQL in 24 Hours" 6th Ed., 2015)

"A virtual relation defined by the database administrator in order to keep subjects from viewing certain data." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK 4th Ed.", 2015)

01 January 2009

🛢DBMS: Database Object (Definitions)

"One of the components of a database: table, view, index, procedure, trigger, column, default, or rule." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"One of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Any structure or entity that exists in an Oracle database, such as a table, index, PL/SQL program, or view. For a list of database objects owned by the current user, look in the data dictionary's USEROBJECTS view." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"Any database component. It could be a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure in a database." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Any of the various items included in a database including tables, views, diagrams, and so on." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"Any object in a database, such as a table, a view, an index, a stored procedure, or a trigger." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"An object that exists in an installation of a database system, such as an instance, a database, a database partition group, a buffer pool, a table, or an index." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

02 December 2008

🧭Business Intelligence: Perspectives (Part I: General Issues)

Business Intelligence
Business Intelligence Series

Introduction

BI projects are noble in intent though many managers and data professionals ignore their implications and prerequisites – data quality (incl. availability), cooperation, maturity, infrastructure, adequate tools and knowledge.

Data Quality

The problem with data starts usually at the source - ERP and other information systems (IS). In theory the system should cover all the basic reporting requirements existing in an enterprise, though that's seldom the case. Therefore, basic reporting needs arrive to be covered by ad-hoc developed tools which often include MS Excel/Access solutions, which are difficult to integrate and manage across organization.

Data Quality (DQ) is maybe the most ignored component in the attempt to build flexible, secure and reliable BI solutions. DQ is based on the validation implemented in source systems and the mechanisms used to cleanse the data before being reported, respectively on the efficiency and effectiveness of existing business processes and best practices.

DQ must be guaranteed for accurate decisions. If the quality is not validated and reviewed periodically, users will be reluctant in using the reports! The reports must be validated as part of the UAT process. Aggregated BI reports need detailed reports that can be used for validation, while the logic and data need to be synchronized accordingly.

The quality of decisions is based on the degree to which data were understood and presented to the decisional factors, though that’s not enough; it's need also a complete perspective, and maybe that’s why some business users prefer to prepare and aggregate data by themselves, the process allowing them in theory to get a deeper understanding of what’s happening.

Cooperation

A BI initiative doesn’t depend only on the effort of a department (usually IT), but on the business as a whole. Unfortunately, the so called partnership is more a theoretical term than a fact, while managers’ and business users' involvement is often suboptimal. 

BI implementations are also dependent on consultants’ skills and the degree to which they understood business’ requirements, on team’s cohesion and other project (management) related prerequisites, respectively on knowledge transfer and training. 

Tools

Most of the BI tools available on the market don’t satisfy all business, respectively users’ requirements. Even if they excel in some features, they lack in others. Usually, more than one BI tool is needed to cover (most of) the requirements. When features are not available, or they are not mature enough, or they are difficult to learn, users will prefer to use tools they already know.

Another important consideration is that BI tools rely on data models, often inflexible from the point of the data they provide, lacking integrating additional datasets, algorithms and customizations. The overall requirements need to be considered more recently from the point of cloud computing technologies, which becomes steadily a requirement for nowadays business dynamics. 

Maturity 

Besides the fact that Capability Maturity Models (CMMs) are difficult to implement, organizations lack the knowledge of transforming data into knowledge, respectively in understanding data and evolving it further in wisdom and competitive advantage. 

Most of the fancy words used by salesmen to sell a product don’t become reality overnight. Of course, a BI tool might have the potentiality of fulfilling the various technical and nontechnical goals, though between a theoretical potentiality and harnessing the respective potential is a long road that need to be addressed at strategical, tactical and operational levels.

Infrastructure

Infrastructure refers to human and technical components and the way they interact in getting the job done. It's not only about "breaking habits" and using the best tools, but in aligning people and technologies to the desired level of performance, of retaining and diffusing knowledge. 

11 November 2008

🗄️Data Management: Data Quality (Part I: Information Systems' Perspective)

Data Management
Data Management Series

One LinkedIn user brought to attention the fact that according to top IT managers the top two reasons why CRM investments fail is: (1) managing resistance within the organization; (2) bad data quality.

The two reasons are common not only to CRM or BI solutions but also to other Information Systems, though from the two data quality has usually the biggest impact. Especially in ERP systems the data quality continues to be a problem and here are a few reasons:
  • Processes span different functions and/or roles, each of them maintaining the data they are interested in, without any agreement or coordination on the ownership. The lack of ownership is in general management’s fault.
  • Within an enterprise many systems arrive to be integrated, the quality of the data depending on the quality and scope of the integrations, whether they were addressed fully or only superficially. Few integrations are stable and properly designed. If stability can be obtained in time, scope is seldom changed as it involves further investments, and thus the remaining data need to be maintained manually, respectively the issues need to be troubleshooted or let accumulate in the backlog.
  • There are systems which are not integrated but use the same data, users needing to duplicate their effort, so they often focus on their immediate needs. Moreover, the lack of mappings between systems makes data analysis and review difficult. 
  • The lack of knowledge about the systems used in terms of processes, procedures, best practices, policies, etc. Users usually try to do their best based on the knowledge they have, and despite their best intent, the systems arrive to be misused just to get things done. 
  • Basic or inexistent validation for data entry in each important entry point (UI, integration interfaces, bulk upload functionality), system permissiveness (allowing workarounds), stability and reliability (bugs/defects).
  • Inexistence of data quality control mechanisms or quality methodologies, respectively a Data and/or Quality Management strategy. If the data quality is not kept under review, it can easily decrease over time. 
  • The lack of a data culture and processes that support data quality.
  • People lack consistency and/or the self-discipline to follow the processes and update the data as the processes requires it and not only the data to move to the next or final step. Therefore, the gap between reality and the one presented by the system is considerable.
  • People are not motivated to improve data quality even if they may recognize the importance of doing that.
Data quality is usually ignored in BI projects, and this is because few are the ones that go and search for the causes, making it easier to blame the BI solution or the technical team than to do something. This is one of the reasons for which users are reticent in using a BI solution, to which add up solution’s flexibility and the degree up to which the solution satisfies users’ needs. On the other side BI solutions are often abused, including also reports which have OLTP characteristics or of providing too much unstructured or inadequate content that needs to be further reworked.

Data quality comes on the managers' agenda, especially during ERP implementations. Unfortunately, as soon as that happens, it also disappears, despite being warned of the consequences poor data quality might have on the implementation and further data use. An ERP implementation is supposed to be an opportunity for improving the data quality, though for many organizations it remains in this state. Once this opportunity passes, organizations need more financial and human resources to reach a fraction from the opportunity missed.

The above topics are complex and need further discussion (see [1], [2]).


Written: Nov-2008, Last Reviewed: Mar-2024

Resources:
[1] SQL-Troubles (2010) Data Management: Data Quality - An Introduction (link)
[2] SQL-Troubles (2012) Data Migration: Data Quality’s Perspective I - A Bird’s-Eye View (link)

08 November 2008

💎SQL Reloaded: Dealing with data duplicates on SQL Server

Duplicates or duplications can be defined as "result rows that are identical to one or more other rows" [1]. They occur frequently in data collections (e.g. Excel, Access, etc.) with concurrent access that have minimum or no validation at all. More users and/or more frequent the updates on the same information, higher the risks of having duplicated data. Duplicates occur also in modern architectures when validation wasn’t addressed correspondingly or from usage related issues, for example, in information systems the huge amount of data entry makes it hard to overlook documents already processed (e.g. purchase orders, invoices, payments, etc.).

Subject to duplication are whole records, a group of attributes (fields) or only single attributes. I depends from case to case. Often duplicates are easy to identify - it’s enough to let somebody who has the proper knowledge to look over them. But what you do when the volume of data is too large or when is need to automate the process as much as possible? Using the DISTINCT keyword in a SELECT statement might do the trick, while other times it requires more complicated validation, ranging from simple checks to Data Mining techniques.

I will try to exemplify the techniques I use to deal with duplicates with the help of a simple example based on table that tracks information about Assets:

-- create test table
CREATE TABLE [dbo].[Assets](
 [ID] [int] NOT NULL,
 [CreationDate] smalldatetime NOT NULL,
 [Vendor] [varchar](50) NULL,
 [Asset] [varchar](50) NULL,
 [Model] [varchar](50) NULL,
 [Owner] [varchar](50) NULL,
 [Tag] [varchar](50) NULL,
 [Quantity] [decimal](13, 2) NULL
) ON [PRIMARY]

Here's some test data:

-- insert test data (SQL Server 2000+)
INSERT INTO dbo.Assets
VALUES ('1', DATEADD(d,-5, GetDate()), 'IBM','Laptop 1','Model 1','Owner 1','XX0001','1')
INSERT INTO dbo.Assets
VALUES  ('2', DATEADD(d,-4, GetDate()),'IBM','Laptop 2','Model 2','Owner 2','XX0002','1')
INSERT INTO dbo.Assets
VALUES  ('3', DATEADD(d,-3, GetDate()),'Microsoft','Laptop 3','Model 3','Owner 2','WX0001','1')
INSERT INTO dbo.Assets
VALUES ('4', DATEADD(d,-3, GetDate()),'Microsoft','Laptop 3','Model 3','Owner 2','WX0001','1')
INSERT INTO dbo.Assets
VALUES  ('5', DATEADD(d,-3, GetDate()),'Dell','Laptop 4','Model 4','Owner 3','DD0001','1')
INSERT INTO dbo.Assets
VALUES  ('6', DATEADD(d,-1, GetDate()),'Dell','Laptop 4','Model 4','Owner 4','DD0001','1')

    
Let’s check table’s content:

-- review the data
SELECT ID, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity
FROM dbo.Assets

Output:
ID CreationDate Vendor Asset Model Owner Tag Quantity
1 1/29/2024 10:46:00 PM IBM Laptop 1 Model 1 Owner 1 XX0001 1
2 1/30/2024 10:46:00 PM IBM Laptop 2 Model 2 Owner 2 XX0002 1
3 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
4 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
5 1/31/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 3 DD0001 1
6 2/2/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 4 DD0001 1

Normally, a Tag or a Serial Number should uniquely identify a product coming from the same Vendor, so in this case duplicates’ identification will resume to the records in which the Tag and Vendor combination appears more than once:

-- retrieve the duplicates
SELECT Vendor, Tag
FROM dbo.Assets A    
GROUP BY Vendor, Tag
HAVING COUNT(*)>1

Output:
Vendor Tag
Dell DD0001
Microsoft WX0001

In many cases showing only the duplicated values would be enough, however there are cases in which is required to see the whole record in which the duplicates appeared, for example to clarify the context in which the issues appear. A simple JOIN with the base table would provide the affected records:

-- retrieve duplicates with details
SELECT A.Id, A.CreationDate, A.Vendor, A.Asset, A.Model, A.Owner, A.Tag, A.Quantity
FROM dbo.Assets A
     JOIN (-- duplicates
	  SELECT Vendor, Tag
	  FROM dbo.Assets A    
	  GROUP BY Vendor, Tag
	  HAVING COUNT(*)>1
     ) B
      ON A.Vendor = B.Vendor 
     AND A.Tag = B.Tag

Output:
Id CreationDate Vendor Asset Model Owner Tag Quantity
5 1/31/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 3 DD0001 1
6 2/2/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 4 DD0001 1
3 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
4 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1

There are two scenarios in which we need to remove the duplicates – in a result set or within an existing table.

In a result set normally it's enough to use the DISTINCT keyword to remove duplicated rows:

-- select unique records
SELECT DISTINCT CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity
FROM dbo.Assets

Output:
CreationDate Vendor Asset Model Owner Tag Quantity
1/29/2024 10:46:00 PM IBM Laptop 1 Model 1 Owner 1 XX0001 1
1/30/2024 10:46:00 PM IBM Laptop 2 Model 2 Owner 2 XX0002 1
1/31/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 3 DD0001 1
1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
2/2/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 4 DD0001 1

In our example only some combinations are duplicated while the other attributes might slightly differ, and therefore is needed another approach. First of all we need to identify which one is the most reliable record, in some cases the latest records entry should be the most accurate or closer to reality, but that’s not necessarily the truth. There are also cases in which we don’t care which the record that is selected is, but from experience these cases are few.

Oracle and SQL Server introduced the dense_rank() analytic function, which returns the rank of rows within the partition of a result set, without any gaps in the ranking. In our case the partition is determined by Vendor and Tag, following to identify which the logic used for raking. Supposing that we are always interested in the last record entered, the query would look like this:

-- retrieve duplicates via ranking functions 
SELECT Id, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity
FROM (--subquery 
 SELECT Id, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity
 , dense_rank() OVER(PARTITION BY Vendor, Tag ORDER BY CreationDate DESC , Id DESC) RANKING 
 FROM dbo.Assets 
) A 
WHERE RANKING = 1

Output:
CreationDate Vendor Asset Model Owner Tag Quantity
1/29/2024 10:46:00 PM IBM Laptop 1 Model 1 Owner 1 XX0001 1
1/30/2024 10:46:00 PM IBM Laptop 2 Model 2 Owner 2 XX0002 1
1/31/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 3 DD0001 1
1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
2/2/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 4 DD0001 1

Things are simple when records’ creation date is available and it's unique for the key used to validate the data, because a descending order would allow selecting the last record first. An Id can be added to the clause for the cases in which multiple records have the same date. At least in this example the same output is obtained by using the row_number analytic function, which returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. Actually, the dense_rank function assigns the same rank for the same occurrence of the values appearing in ORDER BY clause (within the same partition), and thus forces the developer to choose the exact attributes that make a field unique within a partition, while row_number() will return a sequence no matter of the attributes used in ORDER BY clause. 

Both functions should provide similar performance. Conversely, when the selection of records within a partition is not important, it’s better to use the row_number() function, which would need to do less processing.

Unfortunately, this technique doesn’t work in SQL Server 2000, where a different approach is needed. In most of the cases the unique identifier for a record is a sequential unique number, the highest id corresponding to the latest entered record. This would allow selecting the latest entered record, by using the Max function:

-- nonduplicated records (SQL server 2000+)
SELECT A.Id, A.CreationDate, A.Vendor, A.Asset, A.Model, A.Owner, A.Tag, A.Quantity
FROM dbo.Assets A
     JOIN ( -- last entry
	  SELECT Vendor, Tag, MAX(Id) MaxId
	  FROM dbo.Assets A    
	  GROUP BY Vendor, Tag
	 -- HAVING count(*)>1
     ) B
      ON A.Vendor = B.Vendor 
     AND A.Tag = B.Tag 
     AND A.ID = B.MaxId

Output:
Id CreationDate Vendor Asset Model Owner Tag Quantity
4 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
2 1/30/2024 10:46:00 PM IBM Laptop 2 Model 2 Owner 2 XX0002 1
1 1/29/2024 10:46:00 PM IBM Laptop 1 Model 1 Owner 1 XX0001 1
6 2/2/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 4 DD0001 1

The same technique can be used to delete the duplicates from a table:

-- nonduplicated records (SQL server 2000+)
SELECT A.Id, A.CreationDate, A.Vendor, A.Asset, A.Model, A.Owner, A.Tag, A.Quantity
FROM dbo.Assets A
     JOIN ( -- last entry
	  SELECT Vendor, Tag, MAX(Id) MaxId
	  FROM dbo.Assets A    
	  GROUP BY Vendor, Tag
	 -- HAVING count(*)>1
     ) B
      ON A.Vendor = B.Vendor 
     AND A.Tag = B.Tag 
     AND A.ID = B.MaxId

When an Id is not available, then we have most probably to create a temporary table with a sequence (numeric unique identifier), sort the records based on the criteria we chose for selection, and then apply the technique based on Id as above.

Notes:
1. In other scenarios it’s important to select all the records matching extreme values (first, last), the dense_rank function becoming handy, however for versions that doesn’t supports it, a creation date attribute saves the day, when available, and it's unique:

-- nonduplicated records (SQL server 2000+)  
SELECT A.Id, A.CreationDate, A.Vendor, A.Asset, A.Model, A.Owner, A.Tag, A.Quantity
FROM dbo.Assets A
     JOIN (-- last entry
	  SELECT Vendor, Tag, MAX(CreationDate) LastCreationDate
	  FROM dbo.Assets A    
	  GROUP BY Vendor, Tag
	  -- HAVING count(*)>1
     ) B
      ON A.Vendor = B.Vendor 
     AND A.Tag = B.Tag 
     AND DateDiff(d, A.CreationDate, B.LastCreationDate)=0

Output:
Id CreationDate Vendor Asset Model Owner Tag Quantity
6 2/2/2024 10:46:00 PM Dell Laptop 4 Model 4 Owner 4 DD0001 1
1 1/29/2024 10:46:00 PM IBM Laptop 1 Model 1 Owner 1 XX0001 1
2 1/30/2024 10:46:00 PM IBM Laptop 2 Model 2 Owner 2 XX0002 1
3 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1
4 1/31/2024 10:46:00 PM Microsoft Laptop 3 Model 3 Owner 2 WX0001 1

2. Duplicated records can be created by merging two sets of data with UNION ALL, in such cases a simple UNION would remove the duplicates introduced by the queries coming after the first one however this will burden the database with more checks. It is recommended to find other solutions, when possible!

3. Instead of using a single multi-row insertion I used multiple insertion statements because I preferred to make the tutorial usable also on SQL Server 2000. Here’s the single multi-row insertion statement:

-- insert test data (SQL Server 2005+)
INSERT INTO dbo.Assets
VALUES ('1', DATEADD(d,-5, GetDate()), 'IBM','Laptop 1','Model 1','Owner 1','XX0001','1')
, ('2', DATEADD(d,-4, GetDate()),'IBM','Laptop 2','Model 2','Owner 2','XX0002','1')
, ('3', DATEADD(d,-3, GetDate()),'Microsoft','Laptop 3','Model 3','Owner 2','WX0001','1')
, ('4', DATEADD(d,-3, GetDate()),'Microsoft','Laptop 3','Model 3','Owner 2','WX0001','1')
, ('5', DATEADD(d,-3, GetDate()),'Dell','Laptop 4','Model 4','Owner 3','DD0001','1')
, ('6', DATEADD(d,-1, GetDate()),'Dell','Laptop 4','Model 4','Owner 4','DD0001','1')

4. The above techniques should work also in Oracle with two amendments, attributes’ type must be adapted to Oracle ones, while instead of SQL Server GetDate() function should be used the corresponding Oracle SYSDATE function, as below:
 
-- insert test data (Oracle)
INSERT INTO dbo.Assets<br />
VALUES ('1', SYSDATE-1, 'IBM','Laptop 1','Model 1','Owner 1','XX0001','1')

References:
[1] S Hull (1998) Business Objects Glossary
[2] Microsoft Learn (2023) dense_rank() (link)
[3] Microsoft Learn (2023) row_number() (link)

ERP Systems: Learning about Oracle APPS internals I

How can I learn more about Oracle APPS (Oracle Applications) and from where? a normal question for every beginner but also for experienced professionals.

Oracle made available documentation about their products through Oracle Technology Network and Metalink. The first source contains documents mainly as pdf files, while Metalink provides richer content and it’s easier to use, however in order to access it, your company has to purchase an Oracle Support Identifier.

In Metalink, Oracle Applications’ documentation is grouped under eTRM (Electronic Technical Reference Manuals) section, while the pdf documents can be found under Oracle 11i Documentation Library, and many of them, especially for older versions, can be found also on the web, and revealed with a simple search by using tables' name or file’s name.
Both sources are by far incomplete, there are many gaps, not to forget that many of the Oracle implementations involve also some customization, information about these changes could find maybe in the documentation made during implementation/customization process.

Lately have appeared many blogs on Oracle Applications internals, and even if many of them resume by copying some material from Metalink or other documents, there are also professionals who respect themselves.

People can learn a lot by checking the objects that unveils the APPS internals, APPS.FND_TABLES providing the list of tables used, while APPS.FND_VIEWS provides the list of views, the problem with the later being that can't be done a search using the field that stores views' script, but the data can be exported to a text file and do the search in there (it won’t work to export the data completely to Excel). In time developers arrive to intuit how the views could be named, so a search on their name could help narrowing down the search.

Other professionals might be willing to help, so often it's a good idea to post questions on blogs, forums or social networks for professionals. Not all the questions get answered so rather than waiting for indirect enlightment, it’s better to do some research in parallel too.

There will be cases in which none of the specified sources will help you, most probably you'll have to reengineer Oracle Applications' internals by studying various business scenarios, and in this case the experimented users could help a lot.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.