SQL Troubles

22 September 2009

⌛SQL Reloaded: Incremental Update Technique(s) in Oracle

In Cognos Enterprise Business Intelligence Developer Guides mentioned in the previous post is repeated over and over again in the WHERE clause a set of constraints like in the below example(e.g. Accounts Payable Analysis for Oracle e-Business Suite, p. 21), involving more or less than 3 tables:

(A.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND A.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (B.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND B.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (C.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND C.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))

For 3 tables it's not a lot of work, though the set of constraints becomes more complicated with the increase in the number of tables. It can be simplified by using the LEAST & GREATEST Oracle functions, resulting the equivalent writing:

(LEAST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))

The second formulation is simpler and should have the same performance as the first formulation, however it has a small impediment – the LEFT JOINs and LEAST/GREATEST behaviour when working with NULL values. If one of the parameters provided to the two functions is NULL then the result will be NULL, thing which can mess up a query. That’s not difficult to avoid by using for example the NVL function. Supposing that there is a LEFT JOIN between A and two other tables B & C, the above constraints can be rewritten as:

(LEAST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))

One might arrive to write such constraints for example when attempting to do an incremental update in a data warehouse based on the Creation Date from multiple source tables, loading the records having a Creation Date greater than the date of the previous update (of course the records existing already in data warehouse needs to be deleted or updated). The Creation Date works only for transactional data which don’t change after their creation, for the other types of cases it should be considered the Last Update Date (Date of the Last Update for a record). Also this approach has its flows, working well when records are not deleted from the table; this issue could be fixed for example by creating a trigger on deletion and save the ID of the record deleted in a third party table or actually delete the record from the third party database when the deletion happens in the source table.

💫ERP Systems: Learning about Oracle APPS internals II

In a previous post I mentioned the FND_TABLES & FND _VIEWS as sources that can be used to find more about Oracle APPS internals. The two tables are pretty useful though they don't contain all the tables/views created in Oracle unless they were registered in them. So some of the objects developed for example during the various customization or integration phases might not be found in them, therefore it makes sense to check also ALL_ALL_TABLES & ALL_VIEWS tables, providing all the tables available for a certain schema.

SELECT *
FROM ALL_ALL_TABLES
WHERE OWNER = :OWNER

SELECT *
FROM ALL_VIEWS
WHERE OWNER = :OWNER

Getting the differences between the two sources it’s pretty easy:

SELECT A.*
FROM ALL_ALL_TABLES A
    LEFT JOIN APPS.FND_TABLES B
        ON A.TABLE_NAME = B.TABLE_NAME
WHERE A.OWNER = :OWNER
AND B.TABLE_NAME IS NULL

If you are interested on Oracle internals you can find more goodies by querying the Oracle dictionary table:

SELECT *
FROM DICT
WHERE TABLE_NAME LIKE '%VIEW%'

One of such goodies is the view that stores the list of Materialized views:

SELECT *
FROM DBA_MVIEWS

PS: I know that these are pretty basic stuff for APPS DBAs though not all of us discovered them from beginning.

Previous Post <<||>> Next Post

15 September 2009

🛢DBMS: Performance (Definitions)

"The speed with which SQL Server processes queries and returns results." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Accomplishment of project expectations; ability to attain goals by using resources efficiently and effectively." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

"A measure of the degree to which a system or component accomplishes designated functions within given constraints, such as accuracy, time, or resource usage." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"Performance is a measure of how fast a database services applications, and ultimately end-users." (Gavin Powell, "Beginning Database Design", 2006)

"quantitative measures of the execution of some element or set of elements, such as average execution time or memory usage." (Bruce P Douglass, "Real-Time Agility", 2009)

"Key characteristic of a computer system upon which it is compared to other systems." (Max Domeika, "Software Development for Embedded Multi-core Systems", 2011)

"The quality of outputs, outcomes, benefits, and results achieved." (Paul C Dinsmore et al, "Enterprise Project Governance", 2012)

"The manner in which organizations achieve results (i.e., the way they behave and operate to effect actions), as well as the outputs and outcomes of these actions (i.e., the results they achieve)." (Sally-Anne Pitt, "Internal Audit Quality", 2014)

"1)Output/results (or measures of same) The relationship between targeted output and achieved output Output or results can, and must, be measured. 2)Work Performance. How well a person has done in producing the results (output) taking into consideration all relevant circumstances. How well has the person carried out the work of the role? Work performance cannot be measured; it must be judged by a manager based on how well the person has worked to achieve the assigned output, or result, in the situation in which the work was performed." (Catherine Burke et al, "Systems Leadership, 2nd Ed,", 2018)

"a measure of what is achieved or delivered" (ITIL)

"The degree to which a system or component accomplishes its designated functions within given constraints regarding processing time and throughput rate. (IEEE 610)

13 September 2009

💎SQL Reloaded: Randomized Data Selection

A few years back I was asked to provide a random set of data from a data collection. Dealing only with a small number of records and as they were in Excel, it was much easier to use the Excel’s RAND function. Even if SQL Server provides a RAND function too, it returns the same value within a data set when the same parameter is provided. In order to avoid this behavior it would be useful if could be provided a distinct parameter value for each row, for example using the data set's integer unique identifier or eventually create one, thus if ID is such a column the query would look something like that:

SELECT TOP 10 RAND(ID) PERC, ID, N1, N2

FROM

ORDER BY RAND(ID)

The fact that RAND will return always the same value for the same parameter value might be a problem because the above query will always return the same records. In exchange you can use a multiplier for the parameter provided to RAND function - for example in a first attempt you can use RAND(id), in a second RAND(id*2), in a third RAND(id*3), etc.

This approach offers unfortunately a relatively "weak" randomization which might be not sufficient for repetitive selections, therefore under SQL Server 2005+ could be used instead a CLR-based UDF (user defined function) and use the random functions available in VB, C# or any other CLR programming language, or even the .Net Random class designed to produce a sequence of numbers that meet certain statistical requirements for randomness.

12 September 2009

💎SQL Reloaded: Number of Records I (via COUNT)

Getting the number of records from a table is straightforward, a simple SELECT doing the job:

-- general syntax 
SELECT count(*) NoRecords
FROM <schema>.<table>

-- example 
SELECT count(*) NoRecords
FROM [Person].[Address]

Some prefer using a count(1) instead of a count(*), though it doesn't make a difference considering how SQL Server handles the two scripts in the background.

-- general syntax 
SELECT count(1) NoRecords
FROM <schema>.<table>

-- example 
SELECT count(1) NoRecords
FROM [Person].[Address]

Same can be achieved by using a SUM:

-- general syntax 
SELECT SUM(1) NoRecords
FROM <schema>.<table>

-- example 
SELECT SUM(1) NoRecords
FROM [Person].[Address]

However the SUM is typically used when counting only certain records within the same line (the attributes must be know however beforehand):

-- counting the occurrences for a certain values via SUM
SELECT SUM(CASE WHEN Class IS NULL THEN 1 ELSE 0 END) [Nulls]
, SUM(CASE WHEN Class = 'H' THEN 1 ELSE 0 END) [H]
, SUM(CASE WHEN Class = 'L' THEN 1 ELSE 0 END) [L]
, SUM(CASE WHEN Class = 'M' THEN 1 ELSE 0 END) [M]
, count(*) NoRecords
FROM [Production].[Product]

The sane output can be used via a GROUP BY based on the grouping attributes:

-- counting the occurrences for a certain attribute via COUNT
SELECT Class
, count(*) NoRecords
FROM [Production].[Product]
GROUP BY Class

-- counting the occurrences for a group of attributes via COUNT
SELECT Class
, Color
, count(*) NoRecords
FROM [Production].[Product]
GROUP BY Class
, Color

If you want to implement the functionality in a function or stored procedure and provide the name of the table as parameter, then things get a little more complicated as the query needs to be built dynamically on runtime. In stored procedures a query built dynamically can be usually executed using EXEC or EXECUTE stored procedures, though none of them will do as we need to reuse the actual value returned by the query. This can be achieved with the help of a third stored procedure called sp_executesql, and the work can be done relatively easily with the below stored procedure:

-- dropping the stored procedure
--DROP PROCEDURE IF EXISTS dbo.pGetNumberRecords

-- creating the stored procedure
CREATE PROCEDURE dbo.pGetNumberRecords(
  @Table varchar(100) 
, @NumberRecords int OUTPUT 
)
AS 
DECLARE @SglQuery nvarchar(250) 

IF IsNull(@table,'')<>'' 
BEGIN 
  SET @SglQuery = N'SELECT @NumberRecords = count(*) FROM ' + @table 

  --get the number of records 
  EXEC sp_executesql @Query = @SglQuery
                 , @params = N'@NumberRecords INT OUTPUT' 
                 , @NumberRecords = @NumberRecords OUTPUT 
END 

-- testing the stored procedure 
DECLARE @NumberRecords int 
EXEC dbo.pGetNumberRecords '[Production].[Product]', @NumberRecords OUTPUT 
SELECT @NumberRecords

The overhead of using a stored procedure for getting the records count is bearable though the solution is not necessarily elegant, a nicer approach being the use of functions for the same purpose:

-- dropping the function
--DROP FUNCTION IF EXISTS dbo.GetNumberRecords

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecords(
@table as nvarchar(100)) 
RETURNS int 
AS 
BEGIN 
DECLARE @sql varchar(200) 
DECLARE @NumberRecords int

IF IsNull(@table,'')<>'' 
BEGIN 
  SET @sql = N'SELECT @NumberRecords = count(*) FROM ' + @table 

  --get the number of records 
  EXEC sp_executesql @Query = @sql
                 , @params = N'@NumberRecords INT OUTPUT' 
                 , @NumberRecords = @NumberRecords OUTPUT 
END 

RETURN @NumberRecords
END

-- testing the function
SELECT dbo.GetNumberRecords('[AdventureWorks2014].[Production].[Product]')

Unfortunately UDFs have the restriction that they can’t execute stored procedures, and thus the above technique can’t be used. Even if the function is created, when attempting to run it, the following error occurs:

Msg 557, Level 16, State 2, Line 185
Only functions and some extended stored procedures can be executed from within a function.

There is actually an alternative for executing pass-through queries in functions by using OPENQUERY, OPENROWSET or OPENDATASOURCE functions, allowing thus to execute DML statements against a given linked server, as is the case of first two functions, respectively using ad-hoc connection information as is the case of the third function. For exemplification I chose OPENQUERY function, this being the code created to return the number of records from a specified table (you'll need to replace with your server, see below):

--dropping the test function 
-- DROP FUNCTION IF EXISTS dbo.GetNumberRecordsTest

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecordsTest() 
RETURNS int 
AS 
BEGIN 
  RETURN ( 
     SELECT NoRecords 
  FROM OPENQUERY( <server_name>, 'SET NOCOUNT ON; SELECT count(*) NoRecords FROM [AdventureWorks2014].[Production].[Product]') AS O) 
END 

-- retrieving the number of records
SELECT dbo.GetNumberRecordsTest()

For this piece of code to work is first necessary to refer the server as a linked server:

EXEC sp_serveroption <server_name>, 'Data Access', true

Withought doing this first you’ll get the error:

Msg 7411, Level 16, State 1, Procedure GetNumberRecords, Line 4 Server '>server_name<' is not configured for DATA ACCESS.

You can use sp_helpserver to get server's name:

-- getting server's name
EXEC sp_helpserver

The above function doesn’t help much. It would be helpful if we could provide the table’s name as parameter and store the query string in a variable, something like:

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecords(
@TableName as nvarchar(100)) 
RETURNS int 
AS 
BEGIN 
  DECLARE @sql varchar(200) 
  SET @sql = 'SET NOCOUNT ON; SELECT count(*) NoRecords FROM ' + @TableName

  RETURN ( 
     SELECT NoRecords 
  FROM OPENQUERY( , @sql) AS O) 
END 

-- retrieving the number of records
SELECT dbo.GetNumberRecords()

Unfortunately OPENQUERY function same as OPENROWSET and OPENDATASOURCE expect a constant parameter as query and therefore the error (as per SQL Server 2005):

Msg 102, Level 15, State 1, Procedure GetNumberRecords, Line 12 Incorrect syntax near '@sql'. Expecting STRING or TEXT_LEX

I wonder why the SQL Server team hasn’t allowed the use of variables as parameters for OPENQUERY functions, if not in the previous SQL Server versions at least in SQL Server 2008! Anyway, another impediment would be raised by the deterministic nature of functions.

Notes:

1) There are further alternatives, e.g. sys.partitions DMV provides similar information (see post) ideal for scenarios in which only an approximative value is needed.
2) For other scenarios one can use the sp_MSForEachTable undocumented stored procedure (see post)

Happy coding!

Previous Post <<||>> Next Post

💎SQL Reloaded: Deterministic Functions (GetDate SQL Server 2000/2005)

In an answer on LinkedIn, one of the users made me attentive that in SQL Server 2005 and later versions the GetDate() and other non-deterministic UDFs (user-defined functions) can be used inside functions and views, thing which was not possible under SQL Server 2000. This lack of functionality involved considerable overhead when was needed to use current date in table-valued functions. The solution was to send the date as parameter to the respective objects.

For exemplification the following piece of code would run successfully on SQL Server 2005 but fail on SQL Server 2000, returning the below error message.

-- creating the test function
CREATE FUNCTION dbo.fGetCurrentDate() 
RETURNS smalldatetime AS 
BEGIN 
    RETURN GetDate()
END

-- testing the function
SELECT dbo.fGetCurrentDate()

Error message:
Server: Msg 443, Level 16, State 1, Procedure GetCurrentDate, Line 6 Invalid use of 'getdate' within a function.

A similar error message resulted when using the GetDate() function inside of a view:

-- creating the test view
CREATE VIEW dbo.vGetCurrentDate
AS
SELECT GetDate() AS CurrentDate

-- testing the view
SELECT * 
FROM dbo.vGetCurrentDate

The alternative was to use an UDF instead, which functions like a parameterized view, something like:

-- creating the test UDF
CREATE FUNCTION dbo.udfFilterByDate(
 @date datetime)
 RETURNS TABLE
 AS 
 RETURN(
 SELECT * 
 FROM [AdventureWorks2014].Production.Product
 WHERE [SellStartDate]>=@Date)

 -- testing the UDF
 SELECT *
 FROM dbo.udfFilterByDate(GetDate())

 -- testing the UDF (alternative)
 DECLARE @date as datetime
 SET @date = GetDate()
 SELECT *
 FROM dbo.udfFilterByDate(@date)

Using an UDF can still be necessary when is needed to pass dates and other types of parameters used inside of a correlated query or within the logic.

29 August 2009

🛢DBMS: Lock Granularity (Definitions)

"The size of the locked area - database, file, table, page, row, or column." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"The size of the database element on which a lock is placed (usually a table or a row within a table)." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"Indicates the level of lock use. Locking can take place at the following levels: database, table, page, row, and field (attribute)." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)

"The level of data that is involved in a lock. Lock granularity can be at any level of physical data: database, block, set of tables, row, or column level, and include any amount of data contained within those objects." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

🛢DBMS: Extensible Markup Language (Definitions)

"A standard for a markup language, similar to HTML, that allows tags to be defined to describe any kind of data you have, making it very popular as a format for data feeds." (Mike Moran & Bill Hunt , "Search Engine Marketing, Inc", 2005)

"Facilitates the assignment of meaningful structures and definitions of data and services for use by multiple systems. XML simplifies the ability to transmit and share data." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"Simple and flexible text format used to represent data. XML was designed by the World Wide Web Consortium (W3C)." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"separates content from format, thus letting the browser decide how and where content gets displayed. XML is not a language, but a system for defining other languages so that they understand their vocabulary." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A platform-independent markup language for specifying the structure of data in a text document used for both data storage and the transfer of data." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A way of representing data and data relationships in text files, typically for data exchange between software of different types." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed. , 2010)

"A metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document’s data elements. XML is designed to facilitate the exchange of structured documents such as orders and invoices over the Internet." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A specification for creating text files that contain hierarchical data." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"Has been created to overcome some difficulties proper to HTML (Hypertext Markup Language) that – developed as a means for instructing the Web browsers how to display a given Web page – is a ‘presentation-oriented’ markup tool. XML is called ‘extensible’ because, at the difference of HTML, is not characterized by a fixed format, but it lets the user design its own customized markup languages (using, e.g., a specific DTD, Document Type Description) for limitless different types of documents; XML is then a ‘content-oriented’ markup tool." (Gian P Zarri, "RDF and OWL for Knowledge Management", 2011)

"A set of rules for encoding documents electronically. XML was chosen as the standard message format because of its widespread use and open source development efforts." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"A standard metalanguage for defining markup languages that is based on Standard Generalized Markup Language (SGML)." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Extensible markup language (XML) is a simple, very flexible text format derived from SGML (standard generalized markup language). While XML was originally designed to meet the challenges of large-scale electronic publishing, it plays an increasingly significant role in the exchange of a wide variety of data on the web." (Kamalendu Pal, "Integrating Heterogeneous Enterprise Data Using Ontology in Supply Chain Management", 2019)

"A universal markup language for text and data, using nested tags to add structure and meta-information to the content." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)

"A 'best practices' subset of SGML that has been designed by the Worldwide Web Consortium (W3C) for use on the Internet." (Microfocus)

"A notation in which you describe the structure of information in a text document by enclosing information in user-defined tags that define the syntactic elements. A flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere. J2EE deployment descriptors are expressed in XML." (Microfocus)

22 August 2009

🛢DBMS: Concurrency Control (Definitions)

"The control of concurrent access. SQL Server uses locking to allow multiple users to access and change shared data at the same time without conflicting with one another." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Concurrency control is the control on the database and transactions which are executed concurrently to ensure that each transaction completed healthy." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Concurrency control ensures that database operations can be executed concurrently without compromising correctness. Pessimistic concurrency control, such as used in systems with locks, will block any potentially conflicting operations even if they may not turn out to actually conflict. Optimistic concurrency control, the approach used by WiredTiger, will delay checking until after a conflict may have occurred, aborting and retrying one of the operations involved in any write conflict that arises." (MongoDb, "Glossary", 2008)

"Mechanisms to ensure that a database remains consistent and accurate during concurrent use." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A DBMS feature that is used to coordinate the simultaneous execution of transactions in a multiprocessing database system while preserving data integrity." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"The control of process contention for resources within multi-process systems." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

20 August 2009

🛢DBMS: Database-as-a-Service (Definitions)

"Database-as-a-service (DBaaS) is a cloud computing service model that provides users with some form of access to a database without the need for setting up physical hardware, installing software or configuring for performance." (Kashif Munir, "Security model for cloud database as a service", 2015)

"A database hosted in the cloud on a pay per use basis, for example Amazon Web Services." (Analytics Insight)

"An on-demand, secure, scalable, and self-service database that automates the database provisioning and administration to support new and existing business applications and operational systems." (Forrester)

"Database as a service (DBaaS) is a cloud computing managed service offering that provides access to a database without requiring the setup of physical hardware, the installation of software or the need to configure the database. Most database administration and maintenance tasks are handled by the service provider, enabling users to quickly benefit from the database service." (TechTarget) [source]

"Database as a service (DBaaS) is a cloud database offering that provides customers with access to a database without having to deploy and manage the underlying infrastructure. DBaaS is delivered as a managed database service, which means that the provider takes care of patching, upgrading, and backing up the database." (Oracle) [source]

"DBaaS (also known as managed database service) is a cloud computing service that lets users access and use a cloud database system without purchasing and setting up their own hardware, installing their own database software, or managing the database themselves (not to mention hiring the high-priced talent required to do so). The cloud provider takes care of everything from periodic upgrades to backups to ensuring that the database system remains available and secure 24/7." (IBM) [source]

🛢DBMS: CRUD (Definitions)

"Create, Read, Update, and Delete. Refers to the major functions that need to be implemented in a database solution." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"Acronym for the four main database operations: Create, Read, Update, Delete. These operations correspond to the SQL statements INSERT, SELECT, UPDATE, and DELETE." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"An acronym for the fundamental operations of a database: Create, Read, Update, and Delete. See MongoDB CRUD Operations." (MongoDb, "Glossary", 2008)

"A list of the only functions of data in persistent storage, in a convenient acronym form." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"This acronym is commonly used to describe basic database functions." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"A current-state analysis exercise that inventories the systems, applications, and business processes that can create, read, update, and delete critical enterprise data and attributes. It's used to identify the processes and data access points that are affecting data quality or allowing improper usage of the data without suitable validation and governance." (Forrester)

"CRUD is an acronym that comes from the world of computer programming and refers to the four functions that are considered necessary to implement a persistent storage application: create, read, update and delete." (Sumo Logic) [source]

"CRUD (acronym for Create, Read, Update, Delete) are considered to be basic functions regarding stored data. In computer programming, possible actions are often mapped to these standard CRUD functions in order to clarify the actions. For example, standard HTTP(S) actions GET and POST refer to Read and Create functions regarding stored data." (Innopay)

🛢DBMS: Change Data Capture [CDC] (Definitions)

"As changes are made to a production data source, change data capture reads the source database log. This information can be used to prepare a batch to update the data warehouse, or it can update the data warehouse on a transaction-by-transaction basis. With SQL Server 7.0, transactional replication is an example of change data capture." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The process of capturing changes made to a production data source. Change data capture is typically performed by reading the log file of the Database Management System of the source database. Change data capture consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a Data Warehousing environment." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of capturing changes made to a production data source; typically used in data warehousing environments." (Craig S Mullins, "Database Administration", 2012)

"Change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. CDC often uses the database transaction log to populate the deltas, although it can also query the database directly." (Piethein Strengholt, "Data Management at Scale", 2020)

"CDC is the process of capturing changes that were made in the source systems and applying these changes throughout the enterprise for both decision support systems such as information warehouse and operational data stores as well as other downstream consuming applications." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"An automated approach for ensuring that data changes are synchronized across an enterprise by replicating data changes from a source system to other systems." (Jonathan Ferrar et al, "The Power of People", 2017)

🛢DBMS: Clustering (Definitions)

"Technology that enables you to create a hot spare. That is a server that is actually running and can take over immediately. This technology enables you to mirror an entire server to another computer." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"The use of multiple computers to provide increased reliability, capacity, and management capabilities." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

[federated cluster:] "A grouping of SQL servers used together to achieve scalability by employing a distributed partition view. A federated cluster is not used for availability, only for achieving scalability through scale out." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"Any collection of distinct computers that are connected and used as a parallel computer, or to form a redundant system for higher availability. The computers in a cluster are not specialized to cluster computing and could, in principle, be used in isolation as standalone computers. In other words, the components making up the cluster, both the computers and the networks connecting them, are not custom-built for use in the cluster." (Beverly A Sanders, "Patterns for Parallel Programming", 2004)

"Connecting two or more computers in such a way that they behave like a single computer to an application or client. Clustering is used for parallel processing, load balancing, and fault tolerance." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"A method of keeping database files physically close to one another on the storage media for improving performance through sequential pre-fetch operations." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"(1) The condition whereby data is physically ordered contiguously by a specified key (usually implemented by means of an index). (2) The use of multiple, 'independent' computing systems working together to form what appears to users as a single highly available system." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed, 2012)

"The tendency of elements to become unevenly distributed in the hash table, with many adjacent locations containing elements" (Nell Dale et al, "Object-Oriented Data Structures Using Java 4th Ed.", 2016)

19 August 2009

🛢DBMS: Shared Lock (Definitions)

"A lock created by non-update ('read') operations. Other users may read the data concurrently, but no transaction can acquire an exclusive lock on the data until all the shared locks have been released." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A lock used during READ operations." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"Lock that allows concurrent transactions to read a resource." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Control over a portion of the database given to one or more transactions that prevents other transactions from modifying the data while the locks are in place." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A lock on a database element that prevents other transactions from updating the database element while the lock is held." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A lock that is issued when a transaction requests permission to read data from a database and no exclusive locks are held on that data by another transaction. A shared lock allows other read-only transactions to access the database." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A lock created by non-update (read) operations." (SQL Server 2012 Glossary, "Microsoft", 2012)

"The type of lock taken by the DBMS when data is read with no intent to update it." (Craig S Mullins, "Database Administration", 2012)

"A lock that limits concurrently running application processes to read-only operations on database data." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

18 August 2009

🛢DBMS: Intent Lock (Definitions)

"Indicates the intention to acquire a share or exclusive lock on a data page." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Specifies the intent to gain a shared or exclusive lock." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"An lock that indicates that SQL Server wants to acquire a shared or exclusive lock on a more specific resource. An intent lock prevents another transaction from acquiring an exclusive lock on the resource containing that page or row." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A lock of a larger object related to a locked small object. Implicitly, any small-grain lock implies a shared big-grain lock. If there is one or more locks on pages belonging to a table, then there will also be a lock on the table itself, as a separate lock record." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A lock placed on one level of a resource hierarchy to protect shared or exclusive locks on lower-level resources." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A lock on a resource that indicates that the holder of the lock will read (intent shared) or write (intent exclusive) the resource using concurrency control at a finer granularity than that of the resource with the intent lock. Intent locks allow concurrent readers and writers of a resource. See What type of locking does MongoDB use?." (MongoDb, "Glossary", 2008)

"A lock that is placed on one level of a resource hierarchy to protect shared or exclusive locks on lower-level resources." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A type of lock placed on higher-level database objects when a user or process acquires locks on data pages or rows. An intent lock stays in place for the life of the lower-level locks." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed., 2012)