SQL Troubles

31 October 2009

Database Management: Views I

Database Management Series

Typically the RDBMS vendors offer on top of their table-based storage 3 types of database objects used for data access: stored procedures, functions and views, each with their pluses and minuses. The views, same as the functions and stored procedures, can be seen as abstraction layers that stand between the physical database and users, allowing functionality reuse, logical structuring and better code maintenance. As Tony Regerson mentions, logically (and it only is logically) a view can be visualized as a virtual table, but from an optimization perspective a view is not a virtual table – "this is extremely important to remember when designing queries that you want to scale and perform well in a real environment" (T. Regerson, 2007). The view as virtual table is maybe the most realistic definition for a view, without reusing the term of view redundantly, as in “a view offers a view within a data set based on one or more tables”.

Over the years I saw several pros and cons against using views, even several fights on whether to use views or not in database-based development. I’m not an SQL guru, though I often dealt with various applications making heavily use of database side programming, and I’m using the old fashioned views on a daily basis because of one of the below benefits.

Structure the code in logical table-like query-able units that can cover many possibilities of querying the same data and which can be reused in other database objects, including views (nesting views). It can be thus created an abstract model on top of the existing table model, with multiple levels of details and perspectives.

Hide implementation complexity from users, being easier to use than the base tables. There are situations in which users want to query the data by themselves, instead of using an existing report, a simple view can reduce the complexity of a query by hiding the implementation details and it simplifies data manipulation by focusing on a restrained scope specific to the problem is supposed to model. Making use of a often met quip, the users don’t have to “reinvent the wheel”, a view reducing concomitantly the effort and potential mistakes which can occur in queries’ construction; as somebody was remarking sarcastically, “not everybody is a Bill Gates”, so you can’t expect from users to be good query developers and know in and outs of a database model.

When accessing data over MS Access or other similar tools, views might allow reducing the network traffic and increase query performance compared with the queries built on top of linked tables, in theory the fewer linked tables, the less network traffic and better the performance as the heavy processing is moved from the client to the server. Based on their use, the reduction of network traffic and the increased in performance can be relative.

Stored as database objects, views keep code duplication and maintenance at minimum, offering better code traceability and testability. Imagine creating a set of queries for each possible scenario the users might use to query a set of tables, the simplest such scenario being the one in which the only difference resides in the attributes the users need; it’s must easier to create a view with all the attributes, the users selecting then only the attributes they need. In some situations might be necessary to create several views for the same set of tables, when are targeted different levels of details or views. There is another important aspect, the more queries you use on the same topic, the more maintenance work is needed when changes occurred in their logic; not to mention that often queries are run directly by users, the synchronization and testing of the various versions of the same query can become a nightmare. Even more, working with a named entity facilitates the communication between users and developers, allowing to easier identifying the piece of code in discussion.

Views allow changing the order of attributes, adding formatting, calculations and scalar functions on top of the existing attributes, using table-valued functions and views.

Views allow enforcing security at object, vertical (column) and horizontal (rows) level, partitioning thus data access based on the required security levels.

Another “unorthodox” feature of views is that they allow developers to directly insert, update or delete data from tables they are based on, however this can be done under certain circumstances (e.g. such views need to reference directly and individually table’s columns, thus no aggregations and transformations can be performed).

There are several cases against using views, the most important of them stresses the lower views’ performance when compared with the one of queries or stored procedures, however the benefit of caching query-plans can be relative, and the balancing between the cost of flexibility and maintainability vs. the cost of performance can favour the views. I find somehow paradoxical that in applications programming everybody goes for OOP weighting reusability more than performance, often creating allegorical monsters deviating from their purpose, and when it comes to databases, everyone is against views though the performance decrease is not so big, while the structures built derive intrinsically from data. On the other side it’s true that databases have a higher workload and are more sensitive to it, but I think people should learn to make use of tools and knowledge more wisely than hiding behind philosophies.

Tony Regerson considers that views are more difficult to understand when nested, this making tuning and debugging more difficult. I can partially agree with him; from my point of view the code becomes easier to read, understand, maintain and test when logic is split in smaller fragments, and even if it can be more difficult to identify the source of a field from the final query, for this needing to traverse multiple views, the benefits of views can diminish the impact of this downside. It remains the problem of tuning, though actually this reflects the need for smarter database engines, while the other understanding related issues reflect the need for development tools that can identify the source tables and fields from nested objects. Documentation like ERDs and other types of mappings can improve the understanding of nested views.

Several applications I worked with, for example Oracle APPS, built the data access on top of views, each form being based on one or more views, which encapsulate the basic layer of business logic, further processing and customization being done in the form itself. This architecture facilitates to some degree developers’ work. The same views can be used also for the various reports, however such views often have the inconvenience they resume only on the attributes needed by a form, thus the adding of other attributes resume at extending the views, the creation of similar views that include the needing attributes, or at the recombination of views. The extension of standard views is not recommended because there is the risk the vendor might decline the responsibility for the errors occurring in a view, another important risk residing the fact that the views might be replaced with the installation of patches or during version upgrade. The combination of views might lead to the unnecessary reuse of the same table more than once, resulting in performance decrease which can be substantial. The remaining approach, often the best, is to create another view replicating the needed logic, though the developed views need to be synchronized with the standard views in case changes occur.

A similar scenario occurs when users recombine views in order to obtain the attributes needed in their analysis, again is recommended to provide a view which offers the same needed behaviour. The problem is that it’s difficult to identify such issues, because usually the developer has low visibility on how the user is using the tables. This could be theoretically avoided by better communication between users and developers, by moving the code, when possible, from personal databases to production databases.

Doing changes through (updatable) views diminishes the chances of trapping or raising customized errors from code, letting the application or DBMS to handle the issue and all the implications derived from it. This eventual issue could be partially solved by view triggers, however this functionality is provided only by a few vendors. In web or desktop applications, such direct data access increases also the risk for SQL injection attacks, stored procedure based access being more recommended for such architectures.

There are several other aspects which need to be considered with views, facts resulting mainly from the various types of views; I’ll try to summarize them in a second post.

References:
T. Regerson (2007) Views – they offer no optimisation benefits they are simply inline macros use sparingly (link)

22 September 2009

⌛SQL Reloaded: Incremental Update Technique(s) in Oracle

In Cognos Enterprise Business Intelligence Developer Guides mentioned in the previous post is repeated over and over again in the WHERE clause a set of constraints like in the below example(e.g. Accounts Payable Analysis for Oracle e-Business Suite, p. 21), involving more or less than 3 tables:

(A.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND A.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (B.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND B.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (C.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND C.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))

For 3 tables it's not a lot of work, though the set of constraints becomes more complicated with the increase in the number of tables. It can be simplified by using the LEAST & GREATEST Oracle functions, resulting the equivalent writing:

(LEAST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))

The second formulation is simpler and should have the same performance as the first formulation, however it has a small impediment – the LEFT JOINs and LEAST/GREATEST behaviour when working with NULL values. If one of the parameters provided to the two functions is NULL then the result will be NULL, thing which can mess up a query. That’s not difficult to avoid by using for example the NVL function. Supposing that there is a LEFT JOIN between A and two other tables B & C, the above constraints can be rewritten as:

(LEAST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))

One might arrive to write such constraints for example when attempting to do an incremental update in a data warehouse based on the Creation Date from multiple source tables, loading the records having a Creation Date greater than the date of the previous update (of course the records existing already in data warehouse needs to be deleted or updated). The Creation Date works only for transactional data which don’t change after their creation, for the other types of cases it should be considered the Last Update Date (Date of the Last Update for a record). Also this approach has its flows, working well when records are not deleted from the table; this issue could be fixed for example by creating a trigger on deletion and save the ID of the record deleted in a third party table or actually delete the record from the third party database when the deletion happens in the source table.

💫ERP Systems: Learning about Oracle APPS internals II

In a previous post I mentioned the FND_TABLES & FND _VIEWS as sources that can be used to find more about Oracle APPS internals. The two tables are pretty useful though they don't contain all the tables/views created in Oracle unless they were registered in them. So some of the objects developed for example during the various customization or integration phases might not be found in them, therefore it makes sense to check also ALL_ALL_TABLES & ALL_VIEWS tables, providing all the tables available for a certain schema.

SELECT *
FROM ALL_ALL_TABLES
WHERE OWNER = :OWNER

SELECT *
FROM ALL_VIEWS
WHERE OWNER = :OWNER

Getting the differences between the two sources it’s pretty easy:

SELECT A.*
FROM ALL_ALL_TABLES A
    LEFT JOIN APPS.FND_TABLES B
        ON A.TABLE_NAME = B.TABLE_NAME
WHERE A.OWNER = :OWNER
AND B.TABLE_NAME IS NULL

If you are interested on Oracle internals you can find more goodies by querying the Oracle dictionary table:

SELECT *
FROM DICT
WHERE TABLE_NAME LIKE '%VIEW%'

One of such goodies is the view that stores the list of Materialized views:

SELECT *
FROM DBA_MVIEWS

PS: I know that these are pretty basic stuff for APPS DBAs though not all of us discovered them from beginning.

Previous Post <<||>> Next Post

15 September 2009

🛢DBMS: Performance (Definitions)

"The speed with which SQL Server processes queries and returns results." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Accomplishment of project expectations; ability to attain goals by using resources efficiently and effectively." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

"A measure of the degree to which a system or component accomplishes designated functions within given constraints, such as accuracy, time, or resource usage." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"Performance is a measure of how fast a database services applications, and ultimately end-users." (Gavin Powell, "Beginning Database Design", 2006)

"quantitative measures of the execution of some element or set of elements, such as average execution time or memory usage." (Bruce P Douglass, "Real-Time Agility", 2009)

"Key characteristic of a computer system upon which it is compared to other systems." (Max Domeika, "Software Development for Embedded Multi-core Systems", 2011)

"The quality of outputs, outcomes, benefits, and results achieved." (Paul C Dinsmore et al, "Enterprise Project Governance", 2012)

"The manner in which organizations achieve results (i.e., the way they behave and operate to effect actions), as well as the outputs and outcomes of these actions (i.e., the results they achieve)." (Sally-Anne Pitt, "Internal Audit Quality", 2014)

"1)Output/results (or measures of same) The relationship between targeted output and achieved output Output or results can, and must, be measured. 2)Work Performance. How well a person has done in producing the results (output) taking into consideration all relevant circumstances. How well has the person carried out the work of the role? Work performance cannot be measured; it must be judged by a manager based on how well the person has worked to achieve the assigned output, or result, in the situation in which the work was performed." (Catherine Burke et al, "Systems Leadership, 2nd Ed,", 2018)

"a measure of what is achieved or delivered" (ITIL)

"The degree to which a system or component accomplishes its designated functions within given constraints regarding processing time and throughput rate. (IEEE 610)

13 September 2009

💎SQL Reloaded: Randomized Data Selection

A few years back I was asked to provide a random set of data from a data collection. Dealing only with a small number of records and as they were in Excel, it was much easier to use the Excel’s RAND function. Even if SQL Server provides a RAND function too, it returns the same value within a data set when the same parameter is provided. In order to avoid this behavior it would be useful if could be provided a distinct parameter value for each row, for example using the data set's integer unique identifier or eventually create one, thus if ID is such a column the query would look something like that:

SELECT TOP 10 RAND(ID) PERC, ID, N1, N2

FROM

ORDER BY RAND(ID)

The fact that RAND will return always the same value for the same parameter value might be a problem because the above query will always return the same records. In exchange you can use a multiplier for the parameter provided to RAND function - for example in a first attempt you can use RAND(id), in a second RAND(id*2), in a third RAND(id*3), etc.

This approach offers unfortunately a relatively "weak" randomization which might be not sufficient for repetitive selections, therefore under SQL Server 2005+ could be used instead a CLR-based UDF (user defined function) and use the random functions available in VB, C# or any other CLR programming language, or even the .Net Random class designed to produce a sequence of numbers that meet certain statistical requirements for randomness.

12 September 2009

💎SQL Reloaded: Number of Records I (via COUNT)

Getting the number of records from a table is straightforward, a simple SELECT doing the job:

-- general syntax 
SELECT count(*) NoRecords
FROM <schema>.<table>

-- example 
SELECT count(*) NoRecords
FROM [Person].[Address]

Some prefer using a count(1) instead of a count(*), though it doesn't make a difference considering how SQL Server handles the two scripts in the background.

-- general syntax 
SELECT count(1) NoRecords
FROM <schema>.<table>

-- example 
SELECT count(1) NoRecords
FROM [Person].[Address]

Same can be achieved by using a SUM:

-- general syntax 
SELECT SUM(1) NoRecords
FROM <schema>.<table>

-- example 
SELECT SUM(1) NoRecords
FROM [Person].[Address]

However the SUM is typically used when counting only certain records within the same line (the attributes must be know however beforehand):

-- counting the occurrences for a certain values via SUM
SELECT SUM(CASE WHEN Class IS NULL THEN 1 ELSE 0 END) [Nulls]
, SUM(CASE WHEN Class = 'H' THEN 1 ELSE 0 END) [H]
, SUM(CASE WHEN Class = 'L' THEN 1 ELSE 0 END) [L]
, SUM(CASE WHEN Class = 'M' THEN 1 ELSE 0 END) [M]
, count(*) NoRecords
FROM [Production].[Product]

The sane output can be used via a GROUP BY based on the grouping attributes:

-- counting the occurrences for a certain attribute via COUNT
SELECT Class
, count(*) NoRecords
FROM [Production].[Product]
GROUP BY Class

-- counting the occurrences for a group of attributes via COUNT
SELECT Class
, Color
, count(*) NoRecords
FROM [Production].[Product]
GROUP BY Class
, Color

If you want to implement the functionality in a function or stored procedure and provide the name of the table as parameter, then things get a little more complicated as the query needs to be built dynamically on runtime. In stored procedures a query built dynamically can be usually executed using EXEC or EXECUTE stored procedures, though none of them will do as we need to reuse the actual value returned by the query. This can be achieved with the help of a third stored procedure called sp_executesql, and the work can be done relatively easily with the below stored procedure:

-- dropping the stored procedure
--DROP PROCEDURE IF EXISTS dbo.pGetNumberRecords

-- creating the stored procedure
CREATE PROCEDURE dbo.pGetNumberRecords(
  @Table varchar(100) 
, @NumberRecords int OUTPUT 
)
AS 
DECLARE @SglQuery nvarchar(250) 

IF IsNull(@table,'')<>'' 
BEGIN 
  SET @SglQuery = N'SELECT @NumberRecords = count(*) FROM ' + @table 

  --get the number of records 
  EXEC sp_executesql @Query = @SglQuery
                 , @params = N'@NumberRecords INT OUTPUT' 
                 , @NumberRecords = @NumberRecords OUTPUT 
END 

-- testing the stored procedure 
DECLARE @NumberRecords int 
EXEC dbo.pGetNumberRecords '[Production].[Product]', @NumberRecords OUTPUT 
SELECT @NumberRecords

The overhead of using a stored procedure for getting the records count is bearable though the solution is not necessarily elegant, a nicer approach being the use of functions for the same purpose:

-- dropping the function
--DROP FUNCTION IF EXISTS dbo.GetNumberRecords

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecords(
@table as nvarchar(100)) 
RETURNS int 
AS 
BEGIN 
DECLARE @sql varchar(200) 
DECLARE @NumberRecords int

IF IsNull(@table,'')<>'' 
BEGIN 
  SET @sql = N'SELECT @NumberRecords = count(*) FROM ' + @table 

  --get the number of records 
  EXEC sp_executesql @Query = @sql
                 , @params = N'@NumberRecords INT OUTPUT' 
                 , @NumberRecords = @NumberRecords OUTPUT 
END 

RETURN @NumberRecords
END

-- testing the function
SELECT dbo.GetNumberRecords('[AdventureWorks2014].[Production].[Product]')

Unfortunately UDFs have the restriction that they can’t execute stored procedures, and thus the above technique can’t be used. Even if the function is created, when attempting to run it, the following error occurs:

Msg 557, Level 16, State 2, Line 185
Only functions and some extended stored procedures can be executed from within a function.

There is actually an alternative for executing pass-through queries in functions by using OPENQUERY, OPENROWSET or OPENDATASOURCE functions, allowing thus to execute DML statements against a given linked server, as is the case of first two functions, respectively using ad-hoc connection information as is the case of the third function. For exemplification I chose OPENQUERY function, this being the code created to return the number of records from a specified table (you'll need to replace with your server, see below):

--dropping the test function 
-- DROP FUNCTION IF EXISTS dbo.GetNumberRecordsTest

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecordsTest() 
RETURNS int 
AS 
BEGIN 
  RETURN ( 
     SELECT NoRecords 
  FROM OPENQUERY( <server_name>, 'SET NOCOUNT ON; SELECT count(*) NoRecords FROM [AdventureWorks2014].[Production].[Product]') AS O) 
END 

-- retrieving the number of records
SELECT dbo.GetNumberRecordsTest()

For this piece of code to work is first necessary to refer the server as a linked server:

EXEC sp_serveroption <server_name>, 'Data Access', true

Withought doing this first you’ll get the error:

Msg 7411, Level 16, State 1, Procedure GetNumberRecords, Line 4 Server '>server_name<' is not configured for DATA ACCESS.

You can use sp_helpserver to get server's name:

-- getting server's name
EXEC sp_helpserver

The above function doesn’t help much. It would be helpful if we could provide the table’s name as parameter and store the query string in a variable, something like:

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecords(
@TableName as nvarchar(100)) 
RETURNS int 
AS 
BEGIN 
  DECLARE @sql varchar(200) 
  SET @sql = 'SET NOCOUNT ON; SELECT count(*) NoRecords FROM ' + @TableName

  RETURN ( 
     SELECT NoRecords 
  FROM OPENQUERY( , @sql) AS O) 
END 

-- retrieving the number of records
SELECT dbo.GetNumberRecords()

Unfortunately OPENQUERY function same as OPENROWSET and OPENDATASOURCE expect a constant parameter as query and therefore the error (as per SQL Server 2005):

Msg 102, Level 15, State 1, Procedure GetNumberRecords, Line 12 Incorrect syntax near '@sql'. Expecting STRING or TEXT_LEX

I wonder why the SQL Server team hasn’t allowed the use of variables as parameters for OPENQUERY functions, if not in the previous SQL Server versions at least in SQL Server 2008! Anyway, another impediment would be raised by the deterministic nature of functions.

Notes:

1) There are further alternatives, e.g. sys.partitions DMV provides similar information (see post) ideal for scenarios in which only an approximative value is needed.
2) For other scenarios one can use the sp_MSForEachTable undocumented stored procedure (see post)

Happy coding!

Previous Post <<||>> Next Post

💎SQL Reloaded: Deterministic Functions (GetDate SQL Server 2000/2005)

In an answer on LinkedIn, one of the users made me attentive that in SQL Server 2005 and later versions the GetDate() and other non-deterministic UDFs (user-defined functions) can be used inside functions and views, thing which was not possible under SQL Server 2000. This lack of functionality involved considerable overhead when was needed to use current date in table-valued functions. The solution was to send the date as parameter to the respective objects.

For exemplification the following piece of code would run successfully on SQL Server 2005 but fail on SQL Server 2000, returning the below error message.

-- creating the test function
CREATE FUNCTION dbo.fGetCurrentDate() 
RETURNS smalldatetime AS 
BEGIN 
    RETURN GetDate()
END

-- testing the function
SELECT dbo.fGetCurrentDate()

Error message:
Server: Msg 443, Level 16, State 1, Procedure GetCurrentDate, Line 6 Invalid use of 'getdate' within a function.

A similar error message resulted when using the GetDate() function inside of a view:

-- creating the test view
CREATE VIEW dbo.vGetCurrentDate
AS
SELECT GetDate() AS CurrentDate

-- testing the view
SELECT * 
FROM dbo.vGetCurrentDate

The alternative was to use an UDF instead, which functions like a parameterized view, something like:

-- creating the test UDF
CREATE FUNCTION dbo.udfFilterByDate(
 @date datetime)
 RETURNS TABLE
 AS 
 RETURN(
 SELECT * 
 FROM [AdventureWorks2014].Production.Product
 WHERE [SellStartDate]>=@Date)

 -- testing the UDF
 SELECT *
 FROM dbo.udfFilterByDate(GetDate())

 -- testing the UDF (alternative)
 DECLARE @date as datetime
 SET @date = GetDate()
 SELECT *
 FROM dbo.udfFilterByDate(@date)

Using an UDF can still be necessary when is needed to pass dates and other types of parameters used inside of a correlated query or within the logic.

29 August 2009

🛢DBMS: Lock Granularity (Definitions)

"The size of the locked area - database, file, table, page, row, or column." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"The size of the database element on which a lock is placed (usually a table or a row within a table)." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"Indicates the level of lock use. Locking can take place at the following levels: database, table, page, row, and field (attribute)." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)

"The level of data that is involved in a lock. Lock granularity can be at any level of physical data: database, block, set of tables, row, or column level, and include any amount of data contained within those objects." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

🛢DBMS: Extensible Markup Language (Definitions)

"A standard for a markup language, similar to HTML, that allows tags to be defined to describe any kind of data you have, making it very popular as a format for data feeds." (Mike Moran & Bill Hunt , "Search Engine Marketing, Inc", 2005)

"Facilitates the assignment of meaningful structures and definitions of data and services for use by multiple systems. XML simplifies the ability to transmit and share data." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"Simple and flexible text format used to represent data. XML was designed by the World Wide Web Consortium (W3C)." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"separates content from format, thus letting the browser decide how and where content gets displayed. XML is not a language, but a system for defining other languages so that they understand their vocabulary." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A platform-independent markup language for specifying the structure of data in a text document used for both data storage and the transfer of data." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A way of representing data and data relationships in text files, typically for data exchange between software of different types." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed. , 2010)

"A metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document’s data elements. XML is designed to facilitate the exchange of structured documents such as orders and invoices over the Internet." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A specification for creating text files that contain hierarchical data." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"Has been created to overcome some difficulties proper to HTML (Hypertext Markup Language) that – developed as a means for instructing the Web browsers how to display a given Web page – is a ‘presentation-oriented’ markup tool. XML is called ‘extensible’ because, at the difference of HTML, is not characterized by a fixed format, but it lets the user design its own customized markup languages (using, e.g., a specific DTD, Document Type Description) for limitless different types of documents; XML is then a ‘content-oriented’ markup tool." (Gian P Zarri, "RDF and OWL for Knowledge Management", 2011)

"A set of rules for encoding documents electronically. XML was chosen as the standard message format because of its widespread use and open source development efforts." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"A standard metalanguage for defining markup languages that is based on Standard Generalized Markup Language (SGML)." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Extensible markup language (XML) is a simple, very flexible text format derived from SGML (standard generalized markup language). While XML was originally designed to meet the challenges of large-scale electronic publishing, it plays an increasingly significant role in the exchange of a wide variety of data on the web." (Kamalendu Pal, "Integrating Heterogeneous Enterprise Data Using Ontology in Supply Chain Management", 2019)

"A universal markup language for text and data, using nested tags to add structure and meta-information to the content." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)

"A 'best practices' subset of SGML that has been designed by the Worldwide Web Consortium (W3C) for use on the Internet." (Microfocus)

"A notation in which you describe the structure of information in a text document by enclosing information in user-defined tags that define the syntactic elements. A flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere. J2EE deployment descriptors are expressed in XML." (Microfocus)

22 August 2009

🛢DBMS: Concurrency Control (Definitions)

"The control of concurrent access. SQL Server uses locking to allow multiple users to access and change shared data at the same time without conflicting with one another." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Concurrency control is the control on the database and transactions which are executed concurrently to ensure that each transaction completed healthy." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Concurrency control ensures that database operations can be executed concurrently without compromising correctness. Pessimistic concurrency control, such as used in systems with locks, will block any potentially conflicting operations even if they may not turn out to actually conflict. Optimistic concurrency control, the approach used by WiredTiger, will delay checking until after a conflict may have occurred, aborting and retrying one of the operations involved in any write conflict that arises." (MongoDb, "Glossary", 2008)

"Mechanisms to ensure that a database remains consistent and accurate during concurrent use." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A DBMS feature that is used to coordinate the simultaneous execution of transactions in a multiprocessing database system while preserving data integrity." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"The control of process contention for resources within multi-process systems." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

20 August 2009

🛢DBMS: Database-as-a-Service (Definitions)

"Database-as-a-service (DBaaS) is a cloud computing service model that provides users with some form of access to a database without the need for setting up physical hardware, installing software or configuring for performance." (Kashif Munir, "Security model for cloud database as a service", 2015)

"A database hosted in the cloud on a pay per use basis, for example Amazon Web Services." (Analytics Insight)

"An on-demand, secure, scalable, and self-service database that automates the database provisioning and administration to support new and existing business applications and operational systems." (Forrester)

"Database as a service (DBaaS) is a cloud computing managed service offering that provides access to a database without requiring the setup of physical hardware, the installation of software or the need to configure the database. Most database administration and maintenance tasks are handled by the service provider, enabling users to quickly benefit from the database service." (TechTarget) [source]

"Database as a service (DBaaS) is a cloud database offering that provides customers with access to a database without having to deploy and manage the underlying infrastructure. DBaaS is delivered as a managed database service, which means that the provider takes care of patching, upgrading, and backing up the database." (Oracle) [source]

"DBaaS (also known as managed database service) is a cloud computing service that lets users access and use a cloud database system without purchasing and setting up their own hardware, installing their own database software, or managing the database themselves (not to mention hiring the high-priced talent required to do so). The cloud provider takes care of everything from periodic upgrades to backups to ensuring that the database system remains available and secure 24/7." (IBM) [source]

🛢DBMS: CRUD (Definitions)

"Create, Read, Update, and Delete. Refers to the major functions that need to be implemented in a database solution." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"Acronym for the four main database operations: Create, Read, Update, Delete. These operations correspond to the SQL statements INSERT, SELECT, UPDATE, and DELETE." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"An acronym for the fundamental operations of a database: Create, Read, Update, and Delete. See MongoDB CRUD Operations." (MongoDb, "Glossary", 2008)

"A list of the only functions of data in persistent storage, in a convenient acronym form." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"This acronym is commonly used to describe basic database functions." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"A current-state analysis exercise that inventories the systems, applications, and business processes that can create, read, update, and delete critical enterprise data and attributes. It's used to identify the processes and data access points that are affecting data quality or allowing improper usage of the data without suitable validation and governance." (Forrester)

"CRUD is an acronym that comes from the world of computer programming and refers to the four functions that are considered necessary to implement a persistent storage application: create, read, update and delete." (Sumo Logic) [source]

"CRUD (acronym for Create, Read, Update, Delete) are considered to be basic functions regarding stored data. In computer programming, possible actions are often mapped to these standard CRUD functions in order to clarify the actions. For example, standard HTTP(S) actions GET and POST refer to Read and Create functions regarding stored data." (Innopay)

🛢DBMS: Change Data Capture [CDC] (Definitions)

"As changes are made to a production data source, change data capture reads the source database log. This information can be used to prepare a batch to update the data warehouse, or it can update the data warehouse on a transaction-by-transaction basis. With SQL Server 7.0, transactional replication is an example of change data capture." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The process of capturing changes made to a production data source. Change data capture is typically performed by reading the log file of the Database Management System of the source database. Change data capture consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a Data Warehousing environment." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of capturing changes made to a production data source; typically used in data warehousing environments." (Craig S Mullins, "Database Administration", 2012)

"Change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. CDC often uses the database transaction log to populate the deltas, although it can also query the database directly." (Piethein Strengholt, "Data Management at Scale", 2020)

"CDC is the process of capturing changes that were made in the source systems and applying these changes throughout the enterprise for both decision support systems such as information warehouse and operational data stores as well as other downstream consuming applications." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"An automated approach for ensuring that data changes are synchronized across an enterprise by replicating data changes from a source system to other systems." (Jonathan Ferrar et al, "The Power of People", 2017)

🛢DBMS: Clustering (Definitions)

"Technology that enables you to create a hot spare. That is a server that is actually running and can take over immediately. This technology enables you to mirror an entire server to another computer." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"The use of multiple computers to provide increased reliability, capacity, and management capabilities." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

[federated cluster:] "A grouping of SQL servers used together to achieve scalability by employing a distributed partition view. A federated cluster is not used for availability, only for achieving scalability through scale out." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"Any collection of distinct computers that are connected and used as a parallel computer, or to form a redundant system for higher availability. The computers in a cluster are not specialized to cluster computing and could, in principle, be used in isolation as standalone computers. In other words, the components making up the cluster, both the computers and the networks connecting them, are not custom-built for use in the cluster." (Beverly A Sanders, "Patterns for Parallel Programming", 2004)

"Connecting two or more computers in such a way that they behave like a single computer to an application or client. Clustering is used for parallel processing, load balancing, and fault tolerance." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"A method of keeping database files physically close to one another on the storage media for improving performance through sequential pre-fetch operations." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"(1) The condition whereby data is physically ordered contiguously by a specified key (usually implemented by means of an index). (2) The use of multiple, 'independent' computing systems working together to form what appears to users as a single highly available system." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed, 2012)

"The tendency of elements to become unevenly distributed in the hash table, with many adjacent locations containing elements" (Nell Dale et al, "Object-Oriented Data Structures Using Java 4th Ed.", 2016)

19 August 2009

🛢DBMS: Shared Lock (Definitions)

"A lock created by non-update ('read') operations. Other users may read the data concurrently, but no transaction can acquire an exclusive lock on the data until all the shared locks have been released." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A lock used during READ operations." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"Lock that allows concurrent transactions to read a resource." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Control over a portion of the database given to one or more transactions that prevents other transactions from modifying the data while the locks are in place." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A lock on a database element that prevents other transactions from updating the database element while the lock is held." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A lock that is issued when a transaction requests permission to read data from a database and no exclusive locks are held on that data by another transaction. A shared lock allows other read-only transactions to access the database." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A lock created by non-update (read) operations." (SQL Server 2012 Glossary, "Microsoft", 2012)

"The type of lock taken by the DBMS when data is read with no intent to update it." (Craig S Mullins, "Database Administration", 2012)

"A lock that limits concurrently running application processes to read-only operations on database data." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)