Showing posts with label foreign key. Show all posts
Showing posts with label foreign key. Show all posts

05 December 2009

Database Management: Databases (An Introduction)

Database Management

So, you’ve heard the reporting guy or somebody else talking about getting some data or a report from the database, or you found out that you can’t use one of the fancy applications your company has in place just because the database is not available. Dam, that database must be something important! 
Thus you may wonder what a database is, and, with a few clicks, you find out in Wikipedia what all is about, a database is “an integrated collection of logically related records or files consolidated into a common pool that provides data for one or more multiple uses” [1]. 

This definition doesn’t clear up things at all, isn’t it? Terms like “integrated collection of logical related records” or “files consolidated into a common pool” even if they seem semantically right seems to be hard to digest. Without pretending to give a better definition, I would define a database as a logical and physical structure used to contain data in a consistent form. For sure, this definition won’t revolutionize the world of databases, and most probably might be other similar definitions out there, better formulated and sustained. 

Actually also this definition might need some clarifications, I’m talking about a logical structure because there is a logic on how the data are stored, physical structure because the data are stored on a physical device (e.g. computer memory, hard disk or any other type of storage device). I used contained and not stored, because containment imply certain control over the structure in comparison with the simple storage of data. Ok, this being said, somebody would question: hey, also a delimited text file can store data in a consistent form, what’s the difference between a database and a delimited text file? 

At a first view, I would say the difference resides in context, in the fact that a database “exists” in the context of a database management system (DBMS), a (software) system that provides a mechanism for storage, retrieval, modification and management of data. There are DBMS that store their data as delimited flat files, known also as flat file database, however such simple structures hardly cope with the requirements of modern DBMS, that need to provide a scalable, reliable and secure storage system. Why is a delimited flat file a simple structure?! This affirmation needs indeed some further explanation…

A delimited text file is a text file in which the chunks of data are delimited by special characters and stored in a tabular format with rows and columns, much like in Excel if you want, though Excel offers a better visual structure that facilitates data visualization and manipulation. The delimited text file supposes the existence of at least two types of delimiters, one to delimit the columns, usually a colon, semicolon or pipe, and one to delimit the rows, usually a combination of carriage return and line feed. Unfortunately such a structure imposes one important problem: what happens with the chunks of text containing a character or a set of characters used already as delimiters? 

Therefore the text is typically encompassed between two quotes, and in case a double quote exists already in a text, then the contained double quote it’s duplicated facilitating thus text’s interpretation. There could be even more problems, given the two systems of writing a number, comma versus dot, could be tricky to use a comma delimited file, semicolon or tab would be a better delimiter. An application that reads such a file would need thus to understand the delimiters used, and even more the format in which numeric and date values are stored, of whether the first row contains the column names, being required to store such metadata in a second file.

In contrast, a database stores its data in multiple tables designed around logical entities (e.g. Vendors, Customers, etc.) with attributes (e.g. Name, Address, City, Country, etc) translated into columns and the actual values forming a record in a table, resulting thus again a row/column structure for each table. In a flat file database, each table is stored in one file, while in a typical database the tables are stored together in one file, this implying the existence of another delimiter for tables themselves, not to mention that each table has its own structure with different column names, though the columns names, their data types and several types of metadata on columns and tables are stored in auxiliary tables.

And that’s not all, normally a simple search functionality might be enough in order to find a value in a text file, though for databases that’s something complicated to achieve because first of all a “search” is performed against one or more tables, and even against two or more databases, thus first of all, for efficiency, is needed a mechanism that reflects where a table begins, secondly it makes sense to have an explicit and/or implicit unique identifier (UID) for a record, that would allow identifying a record in a table in a unique manner. 

Such unique identifiers might be a combination of one or more columns, the best candidates from a performance standpoint being (positive) integer values, though text and as types of values could qualify as such too. A table could have more than one such unique identifiers, for example an integer running number (sequence) used especially for fast retrieval of records, and another attribute (e.g. Material Number) or a combination of several attributes (e.g. Material Number, Vendor, Serial Number) from a entity standpoint. One of unique identifiers, typically single attributes, is a candidate for the primary key, used by other tables to reference a record from the respective table. 

Why is needed such a mechanism when in theory one table can store all the data, much like in Excel fashion?! It’s mainly a question of design, storage and maintenance efficiency, being more feasible of storing redundant data in a table of their own and referencing the corresponding record from the main table, having thus a relationship between the two tables, the identifier that references the primary key of the table thus formed being called the foreign key

Even if not always feasible, the foreign key could be based on multiple attributes, fact that increases relation’s complexity, because when the data are retrieve, the primary/foreign key columns are used to merge the results from the two tables into a common result set. Even if logical, the mechanism can be become complicated, the relationship needing to be stored together with the actual data in a structure of its own, even more a reference constrain can be enforced in order to assure that references are not invalidated by the deletion of a record.

Tables can become really big, ranging from a few thousand of records to millions and even milliards records, making data retrieval quite complex. Therefore it makes sense to have in place a mechanism that allows faster data retrieval at least for the most used attributes in searching, and that can be achieved with the help of indexes created based on one or more columns, they are stored in a dedicated structure and include a reference to the actual record. Indexes allow a DBMS to perform a search based on the columns used in an index on an optimized index structure rather than performing the search on the table itself, and this, from performance point of view, can make quite a difference for big tables.

Data are retrieved from a database with the help of a query, a well structured statement based on SQL (Structured Language Query) standard, specifying typically the columns to be retrieved, together with the tables they belong and the primary/foreign columns used to build the relation (join) between tables, and the columns on which the search will be performed on. A query can be more complex than that, it can include other statement modifiers, subqueries, views and functions. 

A subquery is a query nested inside of another query, referred also as nested query, the nesting can go even several levels, making queries hard to “read” and maintain. Fortunately a database allows encapsulating a query in database objects like views, functions and stored procedures, enabling a better management of queries, facilitating also code reuse. 

A database view can be seen as a virtual table based on a single statement query, storing no data on its own and being used to limit the number of columns or records retrieved. In contrast with views, user-defined functions and stored procedures can be based on multi-statement code, the main difference between the two residing in the fact that functions can be used in queries, while stored procedures are executed individually, this implying also some differences on the manner they are executed; there are many more other differences between these two types of database objects, mainly DBMS vendor related.

 Another important topic in the world of databases is concurrency, handling “simultaneous” requests coming from multiple “users” and this involves access of the same piece of data by multiple users. A DBMS has to take care in the background of such scenarios, avoid when possible and address locks, restrict users the access to data they are not entitled to see or modify. The access to data and database objects is based on accounts and roles, and even if a user has accidental access to the server on which the database stored, above the binary encoding of data, a DBMS might even encrypt the data and database objects.

As can be seen, there is a whole arsenal of database objects associated with a database, typically stored independently of the actual database that holds the data, in tables spanning over one or more databases that belong to the internal kitchen of the DBMS. This arsenal makes the difference between a simple delimited text file and a database, DBMS vendors building in their solutions even more concepts mechanism in order to address issues like reliability, accessibility, scalability, performance, security, heterogeneity and so on.

Now, after this being said, the Wikipedia definition for a database seems to be relatively accurate, and that up to a point because relatively recently appeared the concept of in-memory database (IMDB) that primarily relies on the main memory for data storage. This doesn’t make the respective definition less valuable, though frankly I prefer my own definition, hopefully I haven’t left anything important out of it.

References:
[1] Wikipedia. 2009. Database. [Online] Available from: http://en.wikipedia.org/wiki/Database (Accessed: 5 December 2009)

02 April 2009

🛢DBMS: Foreign Key [FK] (Definitions)

"A key column in a table that logically depends on a primary key column in another table. Also, a column (or combination of columns) whose values are required to match a primary key in some other table." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"The column or combination of columns whose values match the primary key (PK) or unique key in the same or another table. A foreign key does not have to be unique. A foreign key is often in a many-to-one relationship with a primary key. Foreign key values should be copies of the primary key values; no value in the foreign key except NULL should ever exist unless the same value exists in the primary key. A foreign key may be NULL; if any part of a composite foreign key is NULL, the entire foreign key must be NULL." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A type of constraintthat enforces a logical relationshipbetween a row in one table and one or more rows in another table. Often, this is a master-detail relationship, where the primary key value of a row in one table appears in the "foreign key" column of one or more rows in another table." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A column in a relational database table whose values are drawn from the values of a primary key in another table. In a star-join schema, the components of a composite fact table key are foreign keys with respect to each of the dimension tables." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"A column or combination of columns whose values match the primary key (PK) or unique key in another table. The foreign key is also called the referencing key in some literature." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A column (or columns) in a table that draws its values from a primary or unique key column in another table. A foreign key assists in ensuring the data integrity of a table." (Bob Bryla, "Oracle Database Foundations", 2004)

"A column or multiple columns whose values match the Primary Key of another table. Foreign Keys help in the relational process between two entities by connecting the foreign attribute in the child entity to a Primary Key in a parent entity." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"Column or combination of columns whose values are required to match a primary key in some other table." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A type of constraint where columns contain copies of primary key values, uniquely identified in parent entities, representing the child or sibling side of what is most commonly a one-to-many relationship." (Gavin Powell, "Beginning Database Design", 2006)

"Attribute or set of attributes that identifies the entity with which another entity is associated." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Used to maintain relationships between entities. Implemented as a field in a child table that has a corresponding primary key that it references." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"An FK is used to create a relationship between two tables, and typically points to a PK (PRIMARY KEY) in another table. The relationship enforces integrity between the two tables, allowing only entries in the FK table that exist in the PK table, and preventing deletions from the PK table if a related entry exists in the FK table." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"Information that establishes a relationship between two tables. By preventing erroneous data modifications, this association helps preserve data integrity." (Robert D. Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"One or more columns that are related to values in corresponding columns in another table." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"An attribute (or combination of attributes) in a relation that is the same as the primary key of another relation. A foreign key may be a non-key attribute in its own relation, or it may be part of a concatenated primary key." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"The column or combination of columns whose values match the primary key (PK) or unique key in the same or another table." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"An attribute in a relational table used for establishing the direct relationship with another table, known as the parent table. The values for the foreign key attribute are drawn from the primary key values of the parent table." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"Any attribute in a SQL table (key or nonkey) that is taken from the same domain of values as the primary key in another SQL table and can be used to join the two tables (without loss of data integrity) as part of a SQL query." (Toby J Teorey, ", Database Modeling and Design" 4th Ed., 2010)

"1.An attribute (or attributes) in a relational table which is from the same domain as the identifier of the same or another table; can be thought of as a logical pointer from the “referencing” entity table (with the foreign key) to the “referenced” entity table (with the identifier). It is used to represent a many-to-one relationship between the referencing and referenced tables. It is not necessary for a foreign key to have a value; that is determined by the independently defined dependency characteristic. 2.The preferred primary key of a parent data subject that is placed in a subordinate data subject to identify the relevant parent data occurrence in that parent data subject." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Field whose purpose is to link two or more tables together." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A key in a database table that comes from another table (also know as the "referenced table") and whose values match the primary key (PK) or unique key in the referenced table." (Microsoft, "SQL Server 2012 Glossary", 2012)

"An attribute or set of attributes that identify relationships between entity occurrences." (Craig S Mullins, "Database Administration", 2012)

"One or more attributes that are included in one entity for the purpose of identifying another." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"An attribute used for distinguishing a record that participates in a relationship with another table" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A field that has a relationship to the primary key field in another table." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"An attribute of one table that is related to the primary key of another table." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"In a relational database, a set of one or more fields in one table with values that uniquely define a record in another table. The table containing the foreign key is the child table, and the table that contains the uniquely identified record is the parent table." (Rod Stephens, "Beginning Software Engineering", 2015)

"A column or set of columns that refers to a parent key. In a relational database, a key in one table that references the primary key in another table. See also constraint, primary key, unique key." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.