Showing posts with label column. Show all posts
Showing posts with label column. Show all posts

27 October 2018

💎🏭SQL Reloaded: Drop If Exists (Before and After)

    One of the activities of a database developer/administrator is to create and drop objects on the fly. If in objects' creation there are always some aspects to take into account that are implied by object's definition, the verbose syntax for their destruction seemed to be an unnecessary thing. For example for dropping a table, view, stored procedure, function or index, the most used objects, one would need to write such statements:

-- dropping a table 
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestTable]') AND type in (N'U'))
DROP TABLE [dbo].[TestTable]

-- dropping a view 
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[TestView]'))
DROP VIEW [dbo].[TestView]

-- dropping a stored procedure 
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestProcedure]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[TestProcedure]

-- dropping a fucntion
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestFunction]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[TestFunction]
 
-- dropping an index
IF EXISTS (SELECT Name FROM sysindexes WHERE Name = 'IX_TestTable') 
DROP INDEX dbo.TestTable.IX_TestTable

   Even if Copy-Paste does its magic and SQL Server allows generating scripts for existing objects, there’s still some work do be done in order to drop an object. Fortunately, with SQL Server 2016 Microsoft introduced a simplified syntax for dropping an object, namely DROP IF EXISTS.

   The general syntax:

DROP <object_type> [ IF EXISTS ] <object_name>

  The above statements can be written as follows:

-- dropping the table 
DROP TABLE IF EXISTS dbo.TestTable

-- dropping the view 
DROP VIEW IF EXISTS dbo.TestView 

-- dropping the procedure 
DROP PROCEDURE IF EXISTS dbo.TestProcedure

-- dropping the function 
DROP FUNCTION IF EXISTS dbo.TestFunction
 
-- dropping the index 
DROP INDEX IF EXISTS dbo.TestTable.IX_TestTable

  Similarly can be dropped aggregates, assemblies, roles, triggers, rules, databases, schemas, users, sequences, synonyms, etc. The scripts will run also when the objects don’t exist.

  An object can't be dropped if explicit dependencies exist on them, e.g. when the table is referenced by a  FOREIGN KEY constraint. For each object there are specific rules that apply, therefore for more details check the documentation.

  To explore the functionality here are the definitions of the above objects and the further scripts to test them:

-- creating the test table
CREATE TABLE dbo.TestTable(City nvarchar(50)
, PostalCode nvarchar(50))

-- creating the test view 
CREATE VIEW dbo.TestView 
AS
SELECT 'Test' as Result

-- creating the test stored procedure  
CREATE PROCEDURE dbo.TestProcedure
AS
BEGIN
SELECT 'Test' as Result
END

-- creating the test function
CREATE FUNCTION dbo.TestFunction()
RETURNS nvarchar(50)
BEGIN
    RETURN 'Test'
END
 
-- creating the test index
CREATE NONCLUSTERED INDEX [IX_TestTable] ON [dbo].[TestTable]
(
 [PostalCode] ASC
)


--testing the table
SELECT *
FROM dbo.TestTable

-- testing the view 
SELECT *
FROM dbo.TestView 

-- testing the procedure 
EXEC dbo.TestProcedure

-- testing the function
SELECT dbo.TestFunction() as Result

   Moreover, the IF EXISTS can be used when dropping the constraint or column of a table:

-- adding a new column 
ALTER TABLE dbo.TestTable
ADD DateFrom datetime2(0)

-- adding a constraint on it
ALTER TABLE dbo.TestTable
ADD CONSTRAINT [DF_DateFrom_Default]  DEFAULT (GetDate()) FOR [DateFrom]

-- inserting test data
INSERT INTO dbo.TestTable(City, PostalCode)
VALUES ('New York', 'XZY')

--testing the changes
SELECT *
FROM dbo.TestTable

-- dropping the constraint
ALTER TABLE dbo.TestTable
DROP CONSTRAINT IF EXISTS DF_DateFrom_Default

-- dropping a column from a table
ALTER TABLE dbo.TestTable
DROP COLUMN IF EXISTS DateFrom

--testing the changes
SELECT *
FROM dbo.TestTable

    If a constraint exists on the column first must be dropped the constraint and after that the column, like in the above example.  

Happy coding!

28 February 2017

🧊Data Warehousing: Data Load Optimization (Part I: A Success Story)

Data Warehousing
Data Warehousing Series

Introduction

This topic has been waiting in the queue for almost two years already - since I finished optimizing an already existing relational data warehouse within a SQL Server 2012 Enterprise Edition environment. Through various simple techniques I managed then to reduce the running time for the load process by more than 65%, from 9 to 3 hours. It’s a considerable performance gain, considering that I didn’t have to refactor any business logic implemented in queries.

The ETL (Extract, Transform, Load) solution was making use of SSIS (SQL Server Integration Services) packages to load data sequentially from several sources into staging tables, and from stating further into base tables. Each package was responsible for deleting the data from the staging tables via TRUNCATE, extracting the data 1:1 from the source into the staging tables, then loading the data 1:1 from the staging table to base tables. It’s the simplest and a relatively effective ETL design I also used with small alterations for data warehouse solutions. For months the data load worked smoothly, until data growth and eventually other problems increased the loading time from 5 to 9 hours.

Using TABLOCK Hint

Using SSIS to bulk load data into SQL Server provides an optimum of performance and flexibility. Within a Data Flow, when “Table Lock” property on the destination is checked, it implies that the insert records are minimally logged, speeding up the load by a factor of two. The TABLOCK hint can be used also for other insert operations performed outside of SSIS packages. At least in this case the movement of data from staging into base tables was performed in plain T-SQL, outside of SSIS packages. Also further data processing had benefitted from this change. Only this optimization step alone provided 30-40% performance gain.

Drop/Recreating the Indexes on Big Tables

As the base tables were having several indexes each, it proved beneficial to drop the indexes for the big tables (e.g. with more than 1000000 records) before loading the data into the base tables, and recreate the indexes afterwards. This was done within SSIS, and provided an additional 20-30% performance gain from the previous step.

Consolidating the Indexes

Adding missing indexes, removing or consolidating (overlapping) indexes are typical index maintenance tasks, apparently occasionally ignored. It doesn’t always bring much performance as compared with the previous methods, though dropping and consolidating some indexes proved to be beneficial as fewer data were maintained. Data processing logic benefited from the creation of new indexes as well.

Running Packages in Parallel

As the packages were run sequentially (one package at a time), the data load was hardly taking advantage of the processing power available on the server. Even if queries could use parallelism, the benefit was minimal. Enabling packages run in parallel added additional performance gain, however this minimized the availability of processing resources for other tasks. When the data load is performed overnight, this causes minimal overhead, however it should be avoided when the data are loading to business hours.

Using Nonclustered Indexes

In my analysis I found out that many tables, especially the ones storing prepared data, were lacking a clustered index, even if further indexes were built on them. I remember that years back there was a (false) myth that fact and/or dimension tables don’t need clustered indexes in SQL Server. Of course clustered indexes have downsides (e.g. fragmentation, excessive key-lookups) though their benefits exceed by far the downsides. Besides missing clustered index, there were cases in which the tables would have benefited from having a narrow clustered index, instead of a multicolumn wide clustered index. Upon case also such cases were addressed.

Removing the Staging Tables

Given the fact that the source and target systems are in the same virtual environment, and the data are loaded 1:1 between the various layers, without further transformations and conversions, one could load the data directly into the base tables. After some tests I came to the conclusion that the load from source tables into the staging table, and the load from staging table into base table (with TABLOCK hint) were taking almost the same amount of time. This means that the base tables will be for the same amount of the time unavailable, if the data were loaded from the sources directly into the base tables. Therefore one could in theory remove the staging tables from the architecture. Frankly, one should think twice when doing such a change, as there can be further implications in time. Even if today the data are imported 1:1, in the future this could change.

Reducing the Data Volume

Reducing the data volume was identified as a possible further technique to reduce the amount of time needed for data loading. A data warehouse is built based on a set of requirements and presumptions that change over time. It can happen for example that even if the reports need only 1-2 years’ worth of data, the data load considers a much bigger timeframe. Some systems can have up to 5-10 years’ worth of data. Loading all data without a specific requirement leads to waste of resources and bigger load times. Limiting the transactional data to a given timeframe can make a considerable difference. Additionally, there are historical data that have the potential to be archived.

There are also tables for which a weekly or monthly refresh would suffice. Some tables or even data sources can become obsolete, however they continue to be loaded in the data warehouse. Such cases occur seldom, though they occur. Also some unused or redundant column could have been removed from the packages.

Further Thoughts

There are further techniques to optimize the data load within a data warehouse like partitioning large tables, using columnstore indexes or optimizing the storage, however my target was to provide maximum sufficient performance gain with minimum of effort and design changes. Therefore I stopped when I considered that the amount of effort is considerable higher than the performance gain.

Further Reading:
[1] TechNet (2009) The Data Loading Performance Guide, by Thomas Kejser, Peter Carlin & Stuart Ozer (link)
[2] MSDN (2010) Best Practices for Data Warehousing with SQL Server 2008 R2, by Mark Whitehorn, Keith Burns & Eric N Hanson (link)
[3] MSDN (2012) Whitepaper: Fast Track Data Warehouse Reference Guide for SQL Server 2012, by Eric Kraemer, Mike Bassett, Eric Lemoine & Dave Withers (link)
[4] MSDN (2008) Best Practices for Data Warehousing with SQL Server 2008, by Mark Whitehorn & Keith Burns (link)
[5] TechNet (2005) Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server, by Gandhi Swaminathan (link)
[6] SQL Server Customer Advisory Team (2013) Top 10 Best Practices for Building a Large Scale Relational Data Warehouse (link)

15 April 2009

🛢DBMS: Computed Column (Definitions)

 "A virtual column in a table whose value is computed at run time. The values in the column are not stored in the table, but are computed based on the expression that defines the column. An example of the definition of a computed column is: Cost as Price * Quantity." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A virtual column defined at the table level through a Transact-SQL expression." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"A column in a table that displays the result of an expression instead of stored data. For example, InventoryCost = QuantityOnHand * ProductCost. A calculated column could be calculated on-the-fly with the results not being stored, or the data can be persisted, where the computed data is held within the table." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"If a table represents data about something, a column is the place in a table to keep information about some aspect of that thing. Each row in the table contains one value in each column." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A virtual column in a table whose value is computed at run time." (SQL Server 2012 Glossary, "Microsoft", 2012)

"In SQL, a relationship between the value of one column and the value of another column." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

10 April 2009

🛢DBMS: Surrogate Key (Definitions)

"A unique identifier for a row within a database table. A surrogate, or candidate, key can be made up of one or more columns. By definition, every table must have at least one surrogate key (in which case it becomes the primary key for a table automatically). However, it is possible for a table to have more than one surrogate key (in which case one of them must be designated as the primary key). Any surrogate key that is not the primary key is called the alternate key." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A primary key that is typically invisible to the end user. Normally, surrogate keys are used where end users have their own pre-existing identification schemes (such as an ISBN in a database of books), so the users can modify their existing identifiers." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"Integer keys that are sequentially assigned as needed in the staging area to populate a dimension table and join to the fact table. In the dimension table, the surrogate key is the primary key. In the fact table, the surrogate key is a foreign key to a specific dimension and may be part of the fact table’s primary key, although this is not required. A surrogate key usually cannot be interpreted by itself. That is, it is not a smart key in any way. Surrogate keys are required in many data warehouse situations to handle slowly changing dimensions, as well as missing or inapplicable data. Also known as artificial keys, integer keys, meaningless keys, non-natural keys, and synthetic keys." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"A surrogate key is a substitute key that is usually an arbitrary numeric value assigned by the load process or the database system. The advantage of the surrogate key is that it can be structured so that it is always unique throughout the span of integration for the data warehouse." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A single-part, artificially established identifier for an entity. Surrogate key assignment is a special case of derived data - one where the primary key is derived. A common way of deriving surrogate key values is to assign integer values sequentially." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

[artificial key:] "A system-generated, nonsignificant, surrogate identifier or globally unique identifier (GUID) used to uniquely identify a row in a table. This is also known as a surrogate key." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A redundant, unique key generated for a record in a data warehouse table to allow integration of data from multiple source systems and to support changing data over time." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"The primary key column of a dimension table. The surrogate key is unique to the data warehouse. Key values have no intrinsic meaning, and are assigned as part of the ETL process. By avoiding the use of a natural key, the data warehouse is able to handle changes to operational data in a different manner from transaction systems. The use of a surrogate key also eliminates the need to join fact and dimension tables via multi-part keys." (Christopher Adamson, "Mastering Data Warehouse Aggregates", 2006)

"Used as a replacement or substitute for a descriptive primary key, allowing for better control, better structure, less storage space, more efficient indexing, and absolute surety of uniqueness. Surrogate keys are usually integers, and usually automatically generated using auto counters or sequences." (Gavin Powell, "Beginning Database Design", 2006)

"An artificial key field, usually with system-assigned sequential numbers, used in the dimensional model to link a dimension table to the fact table. In a dimension table, the surrogate key is the primary key which becomes a foreign key in the fact table." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"A single-part, artificially established, physical identifier for a data set, usually not visible to business users, and used for database management and performance. Surrogate key assignment is a special case of derived data - one where the primary key is derived. A common way of deriving surrogate key values is to assign integer values sequentially. Sometimes referred to as a dummy key, sequential key, or auto-number field." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A system-assigned primary key, generally numeric and auto-incremented." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

🛢DBMS: Alternate Key (Definitions)

"Column or combination of columns, not the primary key columns, whose values uniquely identify a row in a table." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

"A candidate key that is not used as the table's primary key." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Unique identifier for an entity instance other than the primary key." (Craig S Mullins, "Database Administration", 2012)

"Other columns that could be used as a primary key but are not implemented as such." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"In relational theory, any unique key that is not a primary key." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

05 April 2009

🛢DBMS: Composite Key (Definitions)

"An index key that includes two or more columns; for example, authors(au_lname, au_fname)." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A key composed of two or more columns. A drawback of composite keys is that they require more complex joins when two or more tables are joined." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Key in a database table made up of several columns. Same as Concatenated key. The overall key in a typical fact table is a subset of the foreign keys in the fact table. In other words, it usually does not require every foreign key to guarantee uniqueness of a fact table row." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"A key composed of two or more columns." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A candidate key made up of more than one attribute or column." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A primary key, unique key, or foreign key consisting of more than one field." (Gavin Powell, "Beginning Database Design", 2006)

"Multiple key columns used to uniquely identify a record." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"A candidate key comprising more than one attribute." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A key that includes two or more fields. Also called a compound key or concatenated key." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"A key for a database table made up of more than one attribute or field." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"A key whose definition consists of two or more fields in a file, columns in a table, or attributes in a relation." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A key with more than one attribute." (Craig S Mullins, "Database Administration", 2012)

"An ordered set of key columns or expressions where the referenced column names are from the same table. See also key." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

04 April 2009

🛢DBMS: Candidate Key (Definitions)

"A unique identifier for a row within a database table. A candidate, or surrogate, key can be made up of one or more columns. In a normalized database, every table must have at least one candidate key, in which case it is considered the primary key for a table automatically. However, it is possible for a table to have more than one candidate key, in which case one of them must be designated as the primary key. Any candidate key that is not the primary key is called the alternate key." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Any attribute or group of attributes that uniquely identifies each instance of an entity and that’s a possible choice for the primary key." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"Also known as a potential key, or permissible key. A field or combination of fields, which can act as a primary key field for a table. A candidate key uniquely identifies each record in the table." (Gavin Powell, "Beginning Database Design", 2006)

"Any data item or group of data items which identify uniquely tuples in a relation." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A minimal superkey. In other words, the fields in a candidate key uniquely define the records in a table and no subset of those fields also uniquely defines the records." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Any subset of the attributes (data items) in a superkey that is also a superkey and is not reducible to another superkey." (Toby J Teorey, ", Database Modeling and Design" 4th Ed., 2010)

"A column or set of columns that have a unique value for each row in a table." (Microsoft, "SQL Server 2012 Glossary", 2012)

"An attribute or set of attributes that can be used to uniquely identify an occurrence of the entity. Each entity may have one or more candidate keys. One of these candidate keys is selected as the table primary key." (Craig S Mullins, "Database Administration", 2012)

"A column or combination of columns that can be used as the primary key of a relation." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

02 April 2009

🛢DBMS: Foreign Key [FK] (Definitions)

"A key column in a table that logically depends on a primary key column in another table. Also, a column (or combination of columns) whose values are required to match a primary key in some other table." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"The column or combination of columns whose values match the primary key (PK) or unique key in the same or another table. A foreign key does not have to be unique. A foreign key is often in a many-to-one relationship with a primary key. Foreign key values should be copies of the primary key values; no value in the foreign key except NULL should ever exist unless the same value exists in the primary key. A foreign key may be NULL; if any part of a composite foreign key is NULL, the entire foreign key must be NULL." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A type of constraintthat enforces a logical relationshipbetween a row in one table and one or more rows in another table. Often, this is a master-detail relationship, where the primary key value of a row in one table appears in the "foreign key" column of one or more rows in another table." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A column in a relational database table whose values are drawn from the values of a primary key in another table. In a star-join schema, the components of a composite fact table key are foreign keys with respect to each of the dimension tables." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"A column or combination of columns whose values match the primary key (PK) or unique key in another table. The foreign key is also called the referencing key in some literature." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A column (or columns) in a table that draws its values from a primary or unique key column in another table. A foreign key assists in ensuring the data integrity of a table." (Bob Bryla, "Oracle Database Foundations", 2004)

"A column or multiple columns whose values match the Primary Key of another table. Foreign Keys help in the relational process between two entities by connecting the foreign attribute in the child entity to a Primary Key in a parent entity." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"Column or combination of columns whose values are required to match a primary key in some other table." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A type of constraint where columns contain copies of primary key values, uniquely identified in parent entities, representing the child or sibling side of what is most commonly a one-to-many relationship." (Gavin Powell, "Beginning Database Design", 2006)

"Attribute or set of attributes that identifies the entity with which another entity is associated." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Used to maintain relationships between entities. Implemented as a field in a child table that has a corresponding primary key that it references." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"An FK is used to create a relationship between two tables, and typically points to a PK (PRIMARY KEY) in another table. The relationship enforces integrity between the two tables, allowing only entries in the FK table that exist in the PK table, and preventing deletions from the PK table if a related entry exists in the FK table." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"Information that establishes a relationship between two tables. By preventing erroneous data modifications, this association helps preserve data integrity." (Robert D. Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"One or more columns that are related to values in corresponding columns in another table." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"An attribute (or combination of attributes) in a relation that is the same as the primary key of another relation. A foreign key may be a non-key attribute in its own relation, or it may be part of a concatenated primary key." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"The column or combination of columns whose values match the primary key (PK) or unique key in the same or another table." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"An attribute in a relational table used for establishing the direct relationship with another table, known as the parent table. The values for the foreign key attribute are drawn from the primary key values of the parent table." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"Any attribute in a SQL table (key or nonkey) that is taken from the same domain of values as the primary key in another SQL table and can be used to join the two tables (without loss of data integrity) as part of a SQL query." (Toby J Teorey, ", Database Modeling and Design" 4th Ed., 2010)

"1.An attribute (or attributes) in a relational table which is from the same domain as the identifier of the same or another table; can be thought of as a logical pointer from the “referencing” entity table (with the foreign key) to the “referenced” entity table (with the identifier). It is used to represent a many-to-one relationship between the referencing and referenced tables. It is not necessary for a foreign key to have a value; that is determined by the independently defined dependency characteristic. 2.The preferred primary key of a parent data subject that is placed in a subordinate data subject to identify the relevant parent data occurrence in that parent data subject." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Field whose purpose is to link two or more tables together." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A key in a database table that comes from another table (also know as the "referenced table") and whose values match the primary key (PK) or unique key in the referenced table." (Microsoft, "SQL Server 2012 Glossary", 2012)

"An attribute or set of attributes that identify relationships between entity occurrences." (Craig S Mullins, "Database Administration", 2012)

"One or more attributes that are included in one entity for the purpose of identifying another." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"An attribute used for distinguishing a record that participates in a relationship with another table" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A field that has a relationship to the primary key field in another table." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"An attribute of one table that is related to the primary key of another table." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"In a relational database, a set of one or more fields in one table with values that uniquely define a record in another table. The table containing the foreign key is the child table, and the table that contains the uniquely identified record is the parent table." (Rod Stephens, "Beginning Software Engineering", 2015)

"A column or set of columns that refers to a parent key. In a relational database, a key in one table that references the primary key in another table. See also constraint, primary key, unique key." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

01 April 2009

🛢DBMS: Key (Definitions)

"A field used to identify a record, often used as the index field for a table." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A column or group of columns that uniquely identifies a row (PRIMARY KEY), defines the relationship between two tables (FOREIGN KEY), or is used to build an index." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A data element (attribute or column) or the combination of several used to identify a member of a set, instance of an entity, or record in a table." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A data item or combination of data items used to identify or locate a record instance (or other similar data groupings)." (William H Inmon, "Building the Data Warehouse", 2005)

"A specialized field determining uniqueness, or application of referential integrity through use of primary and foreign keys." (Gavin Powell, "Beginning Database Design", 2006)

"Key is a data item that helps to identify individual occurrences of an entity type." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A column or group of columns that uniquely identifies a row (primary key), defines the relationship between two tables (foreign key), or is used to build an index." (Jim Joseph et al, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A generic term for a set of one or more attributes (data items) that, taken collectively, enables one to identify uniquely an entity or a record in a SQL table; a superkey." (Toby J Teorey, ", Database Modeling and Design" 4th Ed., 2010)

"A data item or combination of data items designated to uniquely identify a particular entity instance or table row." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An entity identifier based on the concept of functional dependence; may be classified as follows: Superkey - an attribute (or combination of attributes) that uniquely identifies each entity in a table. Candidate key - a minimal superkey, that is, one that does not contain a subset of attributes that is itself a superkey. Primary key (PK) - a candidate key selected as a unique entity identifier." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A column or group of columns that uniquely identifies a row (primary key), defines the relationship between two tables (foreign key), or is used to build an index." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The attributes that identify entity occurrences and define relationships between entities." (Craig S Mullins, "Database Administration", 2012)

"An identifying attribute of data" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"In a relational database, a set of one or more fields that uniquely identifies a record." (Rod Stephens, "Beginning Software Engineering", 2015)

"In key-value databases, a key is a reference to a value. In relational databases, a key is also a way to reference a row in a table. Primary keys refer to the way of uniquely identifying a row. Foreign keys refer to keys stored in one table that are used to reference rows in other tables." (Dan Sullivan, "NoSQL for Mere Mortals®", 2015)

"One or more fields of a database record that uniquely identifies it among all other records in the table." (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"Sequence of bits that are used as instructions that govern the acts of cryptographic functions within an algorithm." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A column or an ordered collection of columns that is identified in the description of an index, unique constraint, or referential constraint. An index key can also be an expression, an ordered collection of expressions, or an ordered collection of columns and expressions." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A subset of row data that uniquely identifies a row. Key data uniquely describes the current row in an open cursor." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

🛢DBMS: Column (Definitions)

"The logical equivalent of a field. A column contains an individual data item within a row or record." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"In a SQL database table, the area, sometimes called a field, in each row that stores the data about an attribute of the object modeled by the table (for example, the ContactName column in the Customers table of the Northwind database). Individual columns are characterized by their maximum length and the type of data that can be placed in them. A column contains an individual data item within a row." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A named portion of a table that can hold some number of data values of a particular datatype. Columns usually have some maximum width or number of bytes. The names and datatypes of a table's columns typically do not vary over time." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"The area in each row that stores the data value for some attribute of the object modeled by the table." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"The component of a database table that contains all of the data of the same name and type across all rows." (Bob Bryla, "Oracle Database Foundations", 2004)

"A piece of data that may be recorded for each row in a table. The corresponding formal database term is attribute." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Stored within tables, a column contains a particular piece of information." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"In a table, the area in each row that stores the data value for some attribute of the object presented in the table. For example, in an Employee table, a FirstName column would contain the first name of an employee." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A data attribute as implemented in a relational database as a vertical component of a table, similar to a field in a flat file record." (Craig S Mullins, "Database Administration", 2012)

"The area in each row of a database table that stores the data value for some attribute of the object modeled by the table." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A column is the data structure for storing a single value in a column family database." (Dan Sullivan, "NoSQL for Mere Mortals", 2015)

"The vertical component of a database table. A column has a name and a particular data type (for example, character, decimal, or integer)." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

03 March 2009

🛢DBMS: Attribute (Definitions)

"A qualifier of an entity or a relation describing its character quantity, quality, degree, or extent. In database design, tables represent entities and columns represent attributes of those entities. For example, the title column represents an attribute of the entity titles." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A column (field) in a dimension table." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"An attribute is the lowest level of information relating to any entity. It models a specific piece of information or a property of a specific entity. Dimensional modeling has a more restrictive definition; it refers to information that describes the characteristics of a dimension." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A data item that has been 'attached' to an entity. By doing this, a distinction can be made between the generic characteristics of the data item itself (for instance, data type and default documentation) and the entity-specific characteristics (for example, identifying and entity-specific documentation). It’s a distinct characteristic of an entity for which data is maintained. An attribute is a value that describes or identifies an entity, and an entity contains one or more attributes that characterize the entity as a whole. An entity example is Employee, and an attribute example is Employee Last Name." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A property that can assume values for entities or relationships. Entities can be assigned several attributes (for example, a tuple in a relationship consists of values). Some systems also allow relationships to have attributes as well." (William H Inmon, "Building the Data Warehouse", 2005)

"Information about a specific dimension member." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"The equivalent of a relational database field, used more often to describe a similar low-level structure in object structures." (Gavin Powell, "Beginning Database Design", 2006)

"The differing data items within a relation. An attribute is a named column of a relation." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"The formal database term for column." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Individual data element that is represented and stored in a dimension. Each attribute contains data relating to that dimension." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A primitive data element that provides descriptive detail about an entity; a data field or data item in a record. For example, lastname would be an attribute for the entity customer. Attributes may also be used as descriptive elements for certain relationships among entities." (Toby J Teorey, "Database Modeling and Design 4th Ed", 2010)

"A characteristic of an entity or object. An attribute has a name and a data type." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Characteristic describing an entity. Also known as a field." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A single characteristic or additional piece of information (financial or non-financial) that exists in a database." (Microsoft, "SQL Server 2012 Glossary", 2012)

"An inherent fact, property, or characteristic describing an entity. Every attribute does one of three things: describes, identifies, or relates." (Craig S Mullins, "Database Administration", 2012)

"In modeling, an attribute represents a characteristic of an entity. Because of this use, attribute is sometimes understood as a data element (which is a component piece of a data used to represent an entity), or a field (which is part of a system used to display or intake data), or a column (which is a place in a table to store a defined characteristic of a represented entity, that is, to store values associated with data elements)." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"A data element that describes an entity or a relationship. Each attribute applies to every occurrence of its entity or relationship." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"In the context of information, a descriptor that is not usually associated with a numerical value. Some examples are bad, excellent, red, green, tall, small, wide, far, heavy, fast, portrait, and scenic." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"A value of data that is distinguishable from other values" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The property or characteristic of an object that can be distinguished quantitatively or qualitatively by human or automated means." (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

"Characteristics of an object we capture in a catalog or model for data management purposes. Example: last name is an attribute of a person." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

04 February 2009

🛢DBMS: Composite Index (Definitions)

"Indexes which involve more than one column. Use composite indexes when two or more columns are best searched as a unit because of their logical relationship." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Has two or more columns as index key columns in the same table." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"An index that uses more than one column in a table to index data." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

[compound index:] "An index whose keys contain values derived from more than one data column." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"An index that is created on two or more columns in a table." (Bob Bryla, "Oracle Database Foundations", 2004)

"Indexes that can be built on more than a single field." (Gavin Powell, "Beginning Database Design", 2006)

"An index that includes two or more fields." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"An index that is composed of more than one column. Both clustered and nonclustered indexes can be created as composite indexes." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"An index that is made of two or more columns." (Robert D. Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"An index that uses more than one column in a table to index data." (Microsoft, "SQL Server 2012 Glossary", 2012)

20 January 2009

🛢DBMS: Non-Clustered Index [NCI] (Definitions)

"An index that stores key values and pointers to data. The leaf level points to data pages rather than containing the data itself." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A nonclustered index does not physically arrange data, but points to the data in a table. The pointers are themselves sorted, making it easy to quickly locate data (search) within a table." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"An index with a structure completely separate from the data rows it indexes; normally a B-tree. The lowest rows of a non-clustered index contain the index key values, with each key value pointing to the data rows that contain that key. The data rows are not stored in an order based on the index key, but rather in a heap." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"An index in which the logical order of the index is different than the physical, stored order of the rows on disk." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"An index in which the logical order of the index is different than the physical, stored order of the rows on disk. In contrast to clustered indexes, non-clustered indexes are totally separated from the actual data rows, causing an unsorted order of data based on non-clustered keys. Non-clustered indexes differ from the clustered indexes at the leaf level. The leaf level of a non-clustered index contains the key value and the row locator. The row locator is either the physical row address (if there is no clustered index) or the clustered index key value (if a clustered index exists)." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"This is an index that does not physically rearrange the data that is inserted into your tables. This index contains pointers to the data that is stored in the table or clustered index." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"An index in which the logical order of the index is different from the physical, stored order of the rows on disk. In contrast to clustered indexes, non-clustered indexes are totally separated from the actual data rows, causing an unsorted order of data based on non-clustered keys. Non-clustered indexes differ from clustered indexes at the leaf level. The leaf level of a non-clustered index contains the key value and the row locator. The row locator is either the physical row address (if there is no clustered index) or the clustered index key value (if a clustered index exists)." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"A SQL Server index that is separate from the table." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"An index whose leaf level stores only the columns included in the index (key columns and included columns) and holds pointers to the rows in the actual table." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"A non-clustered index is added to optimize queries. Tables can have multiple non-clustered indexes. A non-clustered index is similar to the index in the back of a book. By finding it in the index, you know specifically where to look in the book for the information." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A non-clustered index is an index that stores the clustering key or row ID to the row data in its leaf nodes, depending on whether the table is a clustered table or a heap." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"An index in which the logical order of the index key values is different than the physical order of the corresponding rows in a table. The index contains row locators that point to the storage location of the table data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A non-clustered index is an index that stores the clustering key or row ID to the row data in its leaf nodes, depending on whether the table is a clustered table or a heap." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"An index that stores the clustering key or row ID of the row data in its leaf nodes, depending on whether the table is a clustered table or a heap." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.