SQL Troubles: SQL Server 2005

Showing posts with label SQL Server 2005. Show all posts

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

{def} system database available as a global resource to all users connected to a Database Engine instance [1]

does not persist after SQL Server shuts down

created at restart using the specifications from the model database
doesn’t need crash recovery
requires rollback capabilities

{scenario} when updating a row in a global temporary table and rolling back the transaction [12]

there's no way to undo this change unless was logged the ‘before value’ of the update [12]
there is no need to log the ‘after value’ of the update because this value is only needed when the transaction needs to be ‘redone’ which happens during database recovery [12]

{scenario} during an Insert operation

only redo information is needed
when insert a row in say a global temporary table, the actual ‘row value’ is not logged because SQL Server does not need the ‘row’ to undo except it needs to set the offsets within the page appropriately or if this insert had caused a new page to be allocate, de-allocate the page [12]

only redo information is needed

not all objects in TempDB are subject to logging [18]

{characteristic} critical database

⇒ needs to be configured adequately
has the highest level of create and drop actions [5]
under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
workers use tempdb like any other database

any worker can issue I/O to and from tempdb as needed [5]

if the tempdb database cannot be created, SQL Server will not start [7]

{recommendation} start SQL Server by using the (-f) startup parameter

files

data file:

{default} dempdb.mdf

log file

{default} templog.ldf

{operation} changing the tempdb database

{step} use ALTER database statement and MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
{step} stop and restart SQL Server

{operation} create file group

only one file group for data and one file group for logs are allowed.
{default} the number of files is set to 1
can be a high contention point for internal tracking structures

often it's better to create multiple files for tempdb

multiple files can be created for each file group

{recommendation} set the number of files to match the number of CPUs that are configured for the instance

⇐ if the number of logical processors is less than or equal to eight [1]

otherwise, use eight data files [1]
contention is still observed, increase the number of data files by multiples of four until the contention decreases to acceptable levels, or make changes to the workload [1]

⇐ it’s not imperative

{scenario}add a file per CPU (aka multiple files)

allows the file group manager to

pick the next file

there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]

then use the target location to start locating free space [4]

⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)

allows each of the logical schedulers to loosely align with a file [4]

since can be only 1 active worker per scheduler this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention [4]

internal latching and other activities achieve better separation (per file)

⇒ the workers don’t cause a resource contention on a single tempdb file

rare type of contention [5]

{misconception}

leads to new I/O threads per file [5]

{scenario} add more data files

may help to solve potential performance problems that are due to I/O operations.
helps to avoid a latch contention on allocation pages

⇐ manifested as a UP-latch)

{downside} having too many files

increases the cost of file switching
requires more IAM pages
increases the manageability overhead.

{recommendation} configure the size of the files to be equal

⇐ to better use the allocation mechanism (proportional fill),

{recommendation} create tempdb files striped across fast disks

the size of the tempdb affect the performance of the database
{event} server restarts

the tempdb file size is reset to the configured value (the default is 8 MB).

{feature} auto grow

temporary

⇐ unlike other types of databases)

when the tempdb grows, all transactional activity may come to an halt

because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
{recommendation} pre-allocate the size of the TempDB that matches the needs of workload [12]

{recommendation} should only be used more for exceptions rather than as a strategy [12]

transactions

lose the durability attribute
can be rolled back

there is no need to REDO them because the contents of tempdb do not persist across server restarts

because the transaction log does not need to be flushed, transactions are committed faster in tempdb than in user databases. In tempdb, transactions

most internal operations on tempdb do not generate log records because there is no need to roll back

{restriction} add filegroups [1]
{restriction} remove the primary filegroup, primary data file, or log file [1]
{restriction} renaming the database or primary filegroup [1]
{restriction} back up or restore the database [1]
{restriction} change collation [1]

the default collation is the server collation [1]

{restriction} change the database owner [1]

owned by sa

{restriction} create a database snapshot [1]
{restriction} drop the database [1]
{restriction} drop the quest user from the database [1]
{restriction} enable CDC
{restriction} participate in database mirroring [1]
{restriction} can only be configured in the simple recovery model

⇒ the value can’t be changed

{restriction} auto shrink is not allowed for tempdb

database shrink and file shrink capabilities are limited

⇐ because many of the hidden objects stored in tempdb cannot be moved by shrink operations

{restriction} the database CHECKSUM option cannot be enabled
{restriction} database snapshots cannot be created
{restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.

only offline checking for DBCC CHECKTABLE is performed

⇒ TAB-S lock is needed
here are internal consistency checks that occur when tempdb is in use

if the checks fail, the user connection is broken and the tempdb space used by the connection is freed.

{restriction} set the database to OFFLINE
{restriction} set the database or primary filegroup to READ_ONLY [1]

used by

{feature} query
{feature} temporary tables
{feature} table variables
{feature} table-valued functions
{feature} user-defined functions
{feature} online/offline index creation
{feature} triggers
{feature} cursors
{feature} RCSI
{feature} MARS
{feature} DBCC CHECK
{feature} LOB parameters
{feature} Service Broker and event notification
{feature} XML and LOB variable
{feature} query notifications
{feature} database mail

used to store

user objects

objects explicitly created

user objects that can be created in a user database can also be created in tempdb [1]

{limitation} there's no durability guarantee [1]
{limitation} dropped when the Database Engine instance restarts [1]

{type} global and local temporary tables

correspond to ## and # tables and table variables created explicitly by the application [12]

REDO information is not logged

{type} indexes on global and local temporary tables
{type} table variables
{type} tables returned in table-valued functions
{type} cursors
improved caching for temporary objects

temp objects

only cached when none of the following conditions is violated: [12]

named constraints are not created

DDL statements that affect the table are not run after the temp table has been created

e.g. CREATE INDEX or CREATE STATISTICS statements

not created by using dynamic SQ

e.g. sp_executesql N'create table #t(a int)'

internal objects

objects created internally by SQL Server

each object uses a minimum of nine pages [1]

an IAM page and an eight-page extent [1]

{type} work tables

store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]

{type} work files

used for for hash join or hash aggregate operations [1]

{type} intermediate sort results

used for operations

creating or rebuilding indexes

if SORT_IN_TEMPDB is specified

certain GROUP BY, ORDER BY
UNION queries

{restriction} applications cannot directly insert into or delete rows from them
{restriction} metadata is stored in memory
{restriction} metadata does not appear in system catalog views such as sys.all_objects.
{restriction} considered to be hidden objects
{restriction} updates to them do not generate log records
{restriction} page allocations do not generate log records unless on a sort unit. If the statement fails, these objects are deallocated.
each object occupies at least nine pages (one IAM page and eight data pages) in tempdb
used

to store intermediate runs for sort
to store intermediate results for hash joins and hash aggregates.
to store XML variables or other LOB data type variables

e.g. text, image, ntext, varchar(max), varbinary(max), and all others

by queries that need a spool to store intermediate results
by keyset cursors to store the keys
by static cursors to store a query result
by Service Broker to store messages in transit
by INSTEAD OF triggers to store data for internal processing
by any feature that uses the above mentioned operations

version stores

collections of data pages that hold the data rows that support row versioning [1]
{type} common version store
{type} online index build version store

{} storage

I/O characteristics

{requirement} read after rights

{def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]

{recommendation} writer ordering

{def} the ability of the subsystem to maintain the correct order of write operations [7]

{recommendation} torn I/O prevention

{def} the ability of the system to avoid splitting individual I/O requests [7]

{requirement} physical sector alignment and size

devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]

can be put on specialty systems

⇐ they can’t be used for other databases
RAM disks

non-durable media
support double RAM cache

one in the buffer pool and one on the RAM disk

⇐ directly takes away from the buffer pool’s total possible size and generally decreases the performance [7]

give up RAM

implementations of RAM disks and RAM-based files caches

solid state

high speed subsystems
{recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]

sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]

{feature} logging optimization

avoids logging the "after value" in certain log records in tempdb

when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file

can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device

{feature} instant data file initialization

isn’t zeroing out the NTFS file when the file is created or when the size of the file is increased
{benefit} minimizes overhead significantly when tempdb needs to auto grow.

without this, auto grow could take a long time and lead to application timeout.

reduces the impact of database creation [5]

because zero’s don’t have to be stamped in all bytes of a database file, only the log files [5]

reduces the gain from using multiple threads during database creation. [5]

{feature} proportional fill optimization

reduces UP latch contention in tempdb
when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time

⇐ accomplished by removing a latch that was taken during proportional fill.

{feature} deferred drop in tempdb

when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait

⇒ faster response time to applications.

{feature} worktable caching

{improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated

⇐ in addition, the first nine pages for the work table are kept.

{feature} caches temporary objects

when a table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time consuming

⇐ this can cause contentions on tempdb system catalog tables and allocation pages

Previous Post <<||>> Next Post

References

[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005

[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]

[6] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]

[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]

[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]

[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]

[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
DB - database
DDL - Data Definition Language
I/O - Input/Output

LOB - large object
MARS - Multiple Active Result Sets

NTFS - New Technology File System

RAM - Random-Access Memory

RCSI - Read Committed Snapshot Isolation

22 March 2025

💠🛠️🗒️SQL Server: Indexed Views [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation.

Last updated: 22-Mar-2024

[SQL Server 2005] Indexed View

{def} a materialized view

materializes the data from the view queries, storing it in the database in a way similar to tables [6]

⇒ its definition is computed and the resulting data stored just like a table [3]
the view is indexed by creating a unique clustered index on it

the resulting structure is physically identical to a table with a clustered index

⇐ nonclustered indexes also are supported on this structure

can be created on a partitioned table, respectively can be partitioned [1]

{benefit} can improve the performance of some types of queries [3]

e.g. queries that aggregate many rows
⇐ because the view is stored the same way a table with a clustered index is stored [1]
⇐ not well-suited for underlying data that are frequently updated [3]
more expensive to use and maintain than filtered indexes [5]

[query optimizer]

can use it to speed up the query execution [1]

the view doesn't have to be referenced in the query for the optimizer to consider it for a substitution [1]
{downside} DML query performance can degrade significantly [1]

⇐ in some cases, a query plan can't even be produced [1]
when executing UPDATE, DELETE or INSERT on the base table referenced, the indexed views must be updated as well [1]
{recommendation} test DML queries before production use [1]

analyze the query plan and tune/simplify the DML statemen [1]

can use the structure to return results more efficiently to the user

contains logic to use this index in either of the cases

the original query text referenced the view explicitly [2]
the user submits a query that uses the same components as the view (in any equivalent order) [2]
⇐ the query processor expands indexed views early in the query pipeline and always uses the same matching code for both cases [2]

the WITH(NOEXPAND) hint tells the query processor not to expand the view definition [2]
also instructs the query processor to perform an index scan of the indexed view rather than expand it into its component parts [5]
any extra rows in the indexed view are reported as 8907 errors [5]
any missing rows are reported as 8908 errors [5]

expose some of the benefits of view materialization while retaining the benefits of global reasoning about query operations [2]
expanded (aka in-lined) before optimization begins

gives the Query Optimizer opportunities to optimize queries globally [2]
makes it difficult for the (query) optimizer to consider plans that perform the view evaluation first, then process the rest of the query [2]

arbitrary tree matching is a computationally complex problem, and the feature set of views is too large to perform this operation efficiently [2]

cases in which it does not match the view

indexed views are inserted into the Memo and evaluated against other plan choices

while they are often the best plan choice, this is not always the case [2]
the Query Optimizer can detect logical contradictions between the view definition and the query that references the view [2]

there are also some cases where the Query Optimizer does not recognize an indexed view even when it would be a good plan choice [2]

often, these cases deal with complex interactions between high-level features within the query processor (e.g. computed column matching, the algorithm to explore join orders) [2]
consider the WITH (NOEXPAND) hint to force the query processor to pick that indexed view [2]

this usually is enough to get the plan to include the indexed view [2]

indexed view alternatives

are generated and stored in the Memo
are compared using costing equations against other possible plans
partial matches cost the residual operations as well

an indexed-view plan can be generated but not picked when the Query Optimizer considers other plans to have lower costs [2]

maintained as part of the update processing for tables on which the view is based

this makes sure that the view provides a consistent result if it is selected by the Query Optimizer for any query plan [2]
some query operations are incompatible with this design guarantee

restrictions are placed on the set of supported constructs in indexed views to make sure that the view can be created, matched, and updated efficiently [2]

{operation} updating indexed views

the core question behind the restrictions is “Can the query processor compute the necessary changes to the Indexed View clustered and nonclustered indexes without having to recompute the whole indexed view?” [2]

if so, the query processor can perform these changes efficiently as part of the maintenance of the base tables that are referenced in the view[2]

this property is relatively easy for filters, projections (compute scalar), and inner joins on keys[2]
operators that destroy or create data are more difficult to maintain, so often these are restricted from use in indexed views. [2]

matching indexed views is supported in cases beyond exact matches of the query text to the view definition [2]

it also supports using an indexed view for inexact matches where the definition of the view is broader than the query submitted by the user [2]

then applies residual filters, projections (columns in the select list), and even aggregates to use the view as a partial precomputation of the query result [2]

{concept} statistics on indexed views

normally statistics aren't needed

because the substitution of the indexed views into the query plan is considered only after all the statistics for the underlying tables and indexes are attached to the query plan [3]
used if the view is directly referenced by the NOEXPAND hint in a FROM clause

an error is generated and the plan is not created if the NOEXPAND hint is used on a view that does not also contain an index [3]

can’t be created by using sp_createstats or updated by using sp_updatestats.
auto update and auto create statistics features work for indexed views

created manually

via CREATE STATISTICS on the indexed view columns
via UPDATE STATISTICS to update column or index statistics on indexed views

{operation} creating a view

requires that the underlying object’s schema can’t change
requires WITH SCHEMABINDING option [5]
⇒ must include the two-part names of all referenced tables [5]
⇐ the tables can't be dropped and the columns can't be altetd when participate in a view unless the view is tropped [5]
⇐ an error is raised [5]
the user must hold

the CREATE VIEW permission in the database [1]
ALTER permission on the schema in which the view is being created [1]
if the base table resides within a different schema, the REFERENCES permission on the table is required as a minimum [1]
if the user creating the index differs from the users who created the view, for the index creation alone the ALTER permission on the view is required [1]

{operation} creating an index on the view[

indexes can only be created on views that have the same owner as the referenced table or tables (aka intact ownership chain between the view and the tables) [1]

{operation} dropping a view

makes all indexes on the view to be dropped [1]

⇐ all nonclustered indexes and auto-created statistics on the view are dropped when the clustered index is dropped [1]

{exception} ser-created statistics on the view are maintained [1]

nonclustered indexes can be individually dropped [1]
dropping the clustered index on the view

removes the stored result set [1]
the optimizer returns to processing the view like a standard view [1]

{operation} disable indexes on tables and views

when a clustered index on a table is disabled, indexes on views associated with the table are also disabled [1]

{option} EXPAND VIEWS

allows to prevent the Database Engine from using indexed views [1]

if any of the listed options are incorrectly set, this option prevents the optimizer from using the indexes on the views [1]
via OPTION (EXPAND VIEWS) hint

{recommendation} when using datetime and smalldatetime string literals in indexed views, explicitly convert the literal to the date type by using a deterministic date format style [1]
{limitation} AVG is not allowed {workaround} use SUM and COUNT_BIG (5]
{limitation} impacted by SET options [1]

{restriction} require fixed values for several SET options [1]
{recommendation} set the ARITHABORT user option to ON server-wide as soon as the first indexed view or index on a computed column is created in any database on the server [1]

{limitation} further requirements apply (see [1])
{limitation} aren't supported on top of temporal queries

⇐ queries that use FOR SYSTEM_TIME clause).

{scenario}simplifying SQL queries
{scenario} abstracting data models from user models
{scenario} enforcing user security

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server: Create indexed views [link]
[2] Kalen Delaney et all (2009) Microsoft® SQL Server® 2008 Internals
[3] Microsoft Learn (2024) SQL Server: Views [link]
[4] Microsoft Learn (2024) SQL Server: CREATE INDEX (Transact-SQL) [link]

[5] Kalen Delaney et all (2012) Microsoft® SQL Server® 2012 Internals

[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.

Resources:
[R1] Microsoft Learn (2024) SQL Server: Optimize index maintenance to improve query performance and reduce resource consumption [link]

Acronyms:
DML - Data Manipulation Language
QO - Query Optimizer

19 March 2025

💠🛠️🗒️SQL Server: Views [Notes]

Last updated: 19-Mar-2024

[SQL Server 2005] View (aka virtual table)

{def} a database object that encapsulates a SQL statement and that can be used as a virtual table in further SQL statements

cannot be executed by itself

⇐ it must be used within a query [15]

doesn't store any data

except index views
data is dynamically produced from the underlying table when the view is used [32]

views depend on the underlying tables and act like a filter on the underlying tables [32]

used just like regular tables without incurring additional cost

unless the view is indexed [25]

turning a query into a view

remove the ORDER BY clause
assure there are no name duplicates
assure that each column has a name

projected columns

columns included in the view

view’s column list

renames every output column just as if every column had those alias names in the SELECT statement
a view is more self-documenting if the column names of the view are specified in the SELECT statement and not listed separately in the view [27]

{restriction} sorting is not allowed in a view

unless the view includes a TOP predicate

ORDER BY clause serves only to define which rows qualify for the TOP predicate [15]

the only way to logically guarantee sorted results is to define the ORDER BY clause in the executing query [15]

[SQL Server 2005] had a bug in the Query Optimizer that would enable an ORDER BY in a view using a top 100 percent predicate [15]

the behavior was never documented or officially supported [15]

OFFSET FETCH clause

{restriction} parameters can’t be passed to a view [100]

{alternative} use an inline table-valued function

{restriction} cannot reference a variable inside the SELECT statement [100]
{restriction} cannot create a table, whether permanent or temporary

⇒ cannot use the SELECT/INTO syntax in a view

{restriction} can reference only permanent tables

⇒ cannot reference a temporary table [100]

{benefit} present the correct fields to the user
{benefit} enforce security

by specifying

only needed columns

projects a predefined set of columns [15]
hides sensitive, irrelevant, or confusing columns [15]
should be used in parallel with SQL Server–enforced security [15]

only needed records

by allowing users access to the view without the need to give access to the used tables

grant users read permission from only the views, and restrict access to the physical tables [15]

{benefit} maintainability
{benefit} provides a level of abstraction

hides the complexity of the underlying data structures
encapsulates (business)logic
denormalize or flatten complex joins
can consolidate data across databases/servers
can be used as single version of truth

{benefit} allow changing data in the base tables
{downside} layers of nested views require needless overhead for views’ understanding
{downside} single-purpose views quickly become obsolete and clutter the database [15]
{downside} complex views are perceived as having poor performance [15]
{best practice} use generic/standard naming conventions
{best practice} use aliases for cryptic/recurring column names
{best practice} consider only the requested columns
{best practice} group specific purpose view under own schema
{best practice} avoid hardcoding values
{best practice} use views for column-level security together with SQL Server–enforced security
{best practice} limit views to ad-hoc queries and reports

for extensibility and control [15]
⇐ performance isn’t the reason [15]

{poor practices} create views for single-purpose queries (aka one time requests)
{operation} create a view
{operation} drop a view
{operation} alter a view
{operation} select data
{operation} update data

unless the view is a simple single table view, it’s difficult to update the underlying data through the view [15]

{type} inline views

exist only during the execution of a query [32]
simplify the development of a one-time query [32]

allows creating queries in steps

enables troubleshooting

can replace inline UDFs
alternatives

inline UDFs
temporary tables

{type} indexed views

materialize the data, storing the results of the view in a clustered index on disk [15]
similar to a covering index

but with greater control

can include data from multiple data sources [15]
no need to include the clustered index keys [15]

designing an indexed view is more like designing an indexing structure than creating a view [15]

can cause deadlock when two or more of the participating tables is updated/inserted/deleted from two or more sessions in parallel such that they block each other causing a deadlock scenario [29]

{type} compatibility views

allow accessing a subset of the SQL Server 2000 system tables

don’t contain any metadata related to features added after

views have the same names as many of the system tables in previous version, as well as the same column names

⇒ any code that uses the SQL Server 2000 system tables won’t break [16]
there’s no guarantee that will be returned exactly the same results as the ones from the corresponding tables in SQL Server 2000 [16]

accessible from any database
hidden in the resource database

e.g. sysobjects, sysindexes, sysusers, sysdatabases

{type} [SQL Server 2015] catalog views

general interface to the persisted system metadata
built on an inheritance model

⇒ no need to redefine internally sets of attributes common to many objects

available over sys schema

must be included in object’s reference

some of the names are easy to remember because they are similar to the SQL Server 2000 system table names [16]
the columns displayed are very different from the columns in the compatibility views
some metadata appears only in the master database

keeps track of system-wide data (e.g. databases and logins)
other metadata is available in every database (e.g. objects and permissions)
metadata appearing only in the msdb database isn’t available through catalog views but is still available in system tables, in the schema dbo (e.g. backup and restore, replication, Database Maintenance Plans, Integration Services, log shipping, and SQL Server Agent)

{type} partitioned views

allow the data in a large table to be split into smaller member tables

the data is partitioned between the member tables based on ranges of data values in one of the columns [4]
the data ranges for each member table are defined in a CHECK constraint specified on the partitioning column [4]
a view that uses UNION ALL to combine selects of all the member tables into a single result set is then defined [4]
when SELECT statements referencing the view specify a search condition on the partition column, the query optimizer uses the CHECK constraint definitions to determine which member table contains the rows [4]

{type} distributed partition views (DPV)

local partitioned views

a single table is horizontally split into multiple tables, usually all have the same structure [30]

cross database partitioned views

tables are split among different databases on the same server instance

distributed (across server or instance) partitioned views

tables participating in the view reside in different databases which reside ondifferent servers or different instances

{type} nested views

views referred by other views [15]
can lead to an abstraction layer with nested views several layers deep

too difficult to diagnose and maintain [15]

{type} updatable view

view that allows updating the underlying tables

only one table may be updated
if the view includes joins, then the UPDATE statement that references the view must change columns in only one table [15]

typically not a recommended solution for application design
WITH CHECK OPTION causes the WHERE clause of the view to check the data being inserted or updated through the view in addition to the data being retrieved [15]

it makes the WHERE clause a two-way restriction [15]

⇒ can protect the data from undesired inserts and updates [15]

⇒ useful when the view should limit inserts and updates with the same restrictions applied to the WHERE clause [15]
when CHECK OPTION isn’t use, records inserted in the view that don’t match the WHERE constraints will disappear (aka disappearing rows) [15]

{type} non-updatable views

views that don’t allow updating the underlying tables
{workaround} build an INSTEAD OF trigger that inspects the modified data and then performs a legal UPDATE operation based on that data [15]

{type} horizontally positioned views.

used s to enforce row-level security with the help of a WITH CHECK option

{downside} has a high maintenance cost [15]
{alternative} row-level security can be designed using user-access tables and stored procedures [15]

{type} schema-bound views

the SELECT statement must include the schema name for any referenced objects [15]

SELECT * (all columns) is not permitted [15]

{type} subscription views

a view used to export Master Data Services data to subscribing systems

Previous Post <<||>> Next Post

References:
[4] Microsoft (2013) SQL Server 2000 Documentation

[15] Adam Jorgensen et al (2012) Microsoft® SQL Server® 2012 Bible

[16] Bob Beauchemin et al (2012) Microsoft SQL Server 2012 Internals
[25] Basit A Masood-Al-Farooq et al (2014) SQL Server 2014 Development Essentials: Design, implement, and deliver a successful database solution with Microsoft SQL Server 2014

[30] Kevin Cox (2007) Distributed Partitioned Views / Federated Databases: Lessons Learned

[32] Sikha S Bagui & Richard W Earp (2006) Learning SQL on SQL Server 2005

[100] Itzik Ben-Gan et al (2012) Exam 70-461: Querying Microsoft SQL Server 201

Acronyms:
DPV - Distributed Partition Views
UDF - User-Defined Function

💠🛠️🗒️SQL Server: Stored procedures (Notes)

Last updated: 19-Mar-2024

[SQL Server 2005] Stored procedure

{def} a database object that encapsulates one or more statements and compiled when used

is a saved batch

whatever a batch can do, a stored procedure can do

{characteristic} abstraction layer

provides the means of abstracting/decoupling a database [1]

{characteristic} performant

the fastest possible code, when well-written

keeps the execution of data-centric code close to the data

easier to index tune a database with stored procedures

{characteristics} usability

easier to write, consume and troubleshoot stored procedures
less likely to contain data integrity errors
easier to unit test, than ad-hoc SQL code

{characteristic} secure

{best practice} locking down the tables and providing access only through stored procedures is a standard for database development [1]
minimizes the risk of sql injection attacks
provides an alternative for passing a dataset to SQL Server

can have zero or more input parameters
can have zero or more output parameters
highly dependent on the objects it calls
[SQL Server 2008] provides enhanced ways to view these dependencies

managed by means of the DDL commands

{operation} create stored procedure

via CREATE STORED PROCEDURE <schema.name>
CREATE must be the first command in a batch;
the termination of the batch ends the creation of the stored procedure
when created, its text is saved in a system table

like other database objects
the text is only stored as definition

⇐ the text is not stored for the execution of the stored procedure

{best practice}

never use "sp_" to prefix the name of a stored procedure
reserved for system stored procedures

{best practice} use a standard prefix for the stored procedure name (e.g. usp, Proc)

it helps identify an object as a stored procedure when reviewing and troubleshooting code [15]

{best practice} always use a two-part naming convention

ensures that the stored procedure is added to the appropriate schema [15]

{best practice} use descriptive names
{best practice} implement error handling

syntax and logic errors should be gracefully handled, with meaningful information sent back to the calling application [15]

{operation} alter stored procedure

replaces the entire existing stored procedure with new code
preferable to dropping and recreating it

because the latter method removes any permissions [1]

{operation} drop stored procedure

removes it from the database

{operation} execute stored procedure

via EXECUTE or EXEC

can be executed individually also without EXECUTE when the stored procedure is the first line of a batch

{concept} compilation

automatic process that takes place the first time the code is executed
{option} WITH ENCRYPTION

obfuscates the code in the object

is not to prevent a user from reading the code

the stored procedure text is not directly readable
there is no routine to hide the code

SQL Server applies a bitwise OR to the code in the object

anyone with VIEW DEFINITION authority on an object can see the code though
carried from early versions of SQL Server [2]
{best practice} there must be a compelling and carefully considered justification for encrypting a stored procedure [15]

e.g. third-party software

{type} system stored procedures

stored in master database

{type} extended stored procedures

routines residing in DLLs that function similarly to regular stored procedures [33]

usually written in C or C++

receive parameters and return results via SQL Server's Open Data Services API [33]
reside in the master database [33]
run within the SQL Server process space [33]
aren't automatically located in the master database [33]
don't assume the context of the current database when executed [33]
fully qualify the reference to execute an extended procedure from a database other than the master [33]

{workaround} wrapping the extended stored procedure into a system stored procedure

can be called from any database without requiring the master prefix
technique used with a number of SQL Server's own extended procedures

{type} internal stored procedures

system-supplied stored procedures implemented internally by SQL Server [33]
have stubs in master..sysobjects
are neither true system procedures nor extended procedures [33]

listed as extended procedures, but they are actually implemented internally by the server [33]

cannot be dropped or replaced with updated DLLs

normally this happens when a service pack is applied [33]

examples:

sp_executesql
sp_xml_preparedocument
most of the sp_cursor routines
sp_reset_connection

{type} user-defined stored procedures
{type} remote stored procedures

may only be remotely called

⇐ may not be remotely created [15]

require that the remote server be a linked server

namely, a four-part name reference

via EXECUTE Server.Database.Schema.StoredProcedureName;

a distributed query

OpenQuery(LinkedServerName, 'EXECUTE Schema.StoredProcedureName');

{type} recursive stored procedures

stored procedures that call themselves
perform numeric computations that lend themselves to repetitive evaluation by the same processing steps [33]
calls can be nested up to 32 levels deep

{type} nested stored procedure

calls can be nested up to 32 levels deep

{advantage} execution plan retention and reuse
{advantage} query auto-parameterization
{advantage} allow encapsulation of business rules and policies
{advantage} allow application modularization
{advantage} allow sharing of application logic between applications
{advantage} allow access to database objects that is both secure and uniform
{advantage} allow consistent, safe data modification
{advantage} allow network bandwidth conservation
{advantage} support for automatic execution at system start-up
{limitation} cannot be schema bound

[SQL Server 2012] {feature} Result Sets

can guarantee the structure of the returned results at run time

{myth} a stored procedure provide a performance benefit because the execution plan is cached and stored for reuse

[SQL Server 2000] all execution plans are cached, regardless of whether they’re the result of inline T-SQL or a stored procedure call
{corollary} all T-SQL must be encapsulated into stored procedures

{myth} stored procedures are hard to manage

Previous Post <<||>> Next Post

References:
[1] Paul Nielsen et al (2009 SQL Server 2008 Bible
[2] Tobias Thernström et al (2009) MCTS Exam 70-433 Microsoft SQL Server 2008 – Database Development. Self-Paced Training Kit
[7] Michael Lee & Gentry Bieker (2008) Mastering SQL Server® 2008
[8] Joseph Sack (2008) SQL Server 2008 Transact-SQL Recipes
[15] Adam Jorgensen et al (2012) Microsoft® SQL Server® 2012 Bible
[26] Patrick LeBlanc (2013) Microsoft SQL Server 2012: Step by Step, Microsoft Press
[33] Ken Henderson (2001) The Guru's Guide to SQL Server™ Stored Procedures, XML, and HTML
[42] Dušan Petkovic (2008) Microsoft® SQL Server™ 2008: A Beginner’s Guide
[51] Michael Lee & Gentry Bieker (2009) Mastering SQL Server® 2008
[64] Robert D Schneider and Darril Gibson (2008) Microsoft® SQL Server® 2008 All-In-One Desk Reference for Dummies
[100] Itzik Ben-Gan et al (2012) Exam 70-461: Querying Microsoft SQL Server 2012

Acronyms:
DDL - Data Definition Language
DLL - Dynamic Link Library

💠🛠️🗒️SQL Server: Statistics [Notes]

Last updated: 19-Mar-2024

[SQL Server 2005] Statistics

{def} objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view [2]

used by query optimizer to estimate the cardinality (aka number of rows) in the query result
created on any data type that supports comparison operations [4]
metadata maintained about index keys and, optionally, nonindexed column values

implemented via statistics objects
can be created over most types

generally data types that support comparisons (such as >, =, and so on) support the creation of statistics [20]

the DBA is responsible for keeping the statistics objects up to date in the system [20]
{restriction} the combined width of all columns constituting a single statistics set must not be greater than 900 bytes [21]

statistics collected

time of the last statistics collection (inside STATBLOB) [21]
number of rows in the table or index (rows column in SYSINDEXES) [21]
number of pages occupied by the table or index (dpages column in SYSINDEXES) [21]
number of rows used to produce the histogram and density information (inside STATBLOB, described below) [21]
average key length (inside STATBLOB) [21]
histogram

measures the frequency of occurrence for each distinct value in a data set [31]
computed by the query optimizer on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view [31]

when created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers [31]
it sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps [31]

each step includes a range of column values followed by an upper bound column value [31]
the range includes all possible column values between boundary values, excluding the boundary values themselves [21]
the lowest of the sorted column values is the upper boundary value for the first histogram step [31]

defines the histogram steps according to their statistical significance [31]

uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values

the maximum number of steps is 200 [31]
the number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points [31]

stored only for the first column of a composite index

the most selective columns in a multicolumn index should be selected first [2]

the histogram will be more useful to the optimizer [2]

single column histogram [21]
splitting up composite indexes into multiple single-column indexes is sometimes advisable [2]

used to estimate the selectivity of nonequality selection predicates, joins, and other operators [3]
contains a sampling of up to 200 values for the index's first key column

the values in a given column are sorted in ordered sequence

divided into up to 199 intervals

so that the most statistically significant information is captured
in general, of nonequal size

{type} table-level statistics

statistics maintained on each table
include:

number of rows in the table [8]
number of pages used by the table [8]
number of modifications made to the keys of the table since the last update to the statistics [8]

{type} index statistics

created with the indexes
updated with fullscan when indexes are rebuild the index [13]

{exception} [SQL Server 2012] for partitioned indexes when the number of partitions >1000 it uses default sampling [13]

{type} column statistics (aka non-index statistics)

statistics on non-indexed columns
determine the likelihood that a given value might occur in a column [2]

gives the optimizer valuable information in determining how best to service a query [2]

allows the optimizer to estimate the number of rows that will qualify from a given table involved in a join [2]

allows to more accurately select join order [2]

if automatic creation or updating of statistics is disabled, the Query Optimizer returns a warning in the showplan output when compiling a query where it thinks it needs this information [20]

used by optimizer to provide histogram-type information for the other columns in a multicolumn index [2]

the more information the optimizer has about data, the better [2]
queries asking for data that is outside the bounds of the histogram will estimate 1 row and typically end up with a bad, serial plan [15]

automatic created

when nonindexed column is queried while AUTO_CREATE_STATISTICS is enabled for the database

aren’t updated when an index is reorganized or rebuild [10]

{exception} [SQL Server 2005] indexes rebuilt with DBCC dbreindex [10]

updates index statistics as well column statistics [10]
this feature will be removed in a future version of Microsoft SQL Server

{type} [SQL Server 2008] filtered statistics

useful when dealing with partitioned data or data that is wildly skewed due to wide ranging data or lots of nulls
scope defined with the help of a statistic filter

a condition that is evaluated to determine whether a row must be part of the filtered statistics
the predicate appears in the WHERE clause of the CREATE STATISTICS or CREATE INDEX statements (in the case when statistics are automatically created as a side effect of creating an index)

{type} custom statistics

{limitation}[SQL Server] not supported

{type} temporary statistics

when statistics on a read-only database or read-only snapshot are missing or stale, the Database Engine creates and maintains temporary statistics in tempdb [22]

a SQL Server restart causes all temporary statistics to disappear [22]
the statistics name is appended with the suffix _readonly_database_statistic

to differentiate the temporary statistics from the permanent statistics [22]
reserved for statistics generated by SQL Server [2]
scripts for the temporary statistics can be created and reproduced on a read-write database [22]

when scripted, SSMS changes the suffix of the statistics name from _readonly_database_statistic to _readonly_database_statistic_scripted [22]

{restriction} can be created and updated only by SQL Server [22]

can be deleted and monitored [22]

{operation} create statistics

auto-create (aka auto-create statistics)

on by default

samples the rows by default

{exception} when statistics are created as a by-product of index creation, then a full scan is used [3]

{exception} statistics may not be created

for tables where the cost of the plan execution would be lower than the statistics creation itself [21]
when the server is too busy [21]

e.g. too many outstanding compilations in progress [21]

manually

via CREATE STATISTICS

only generates the statistics for a given column or combination of columns [21]
{recommendation} keep the AUTO_CREATE_STATISTICS option on so that the query optimizer continues to routinely create single-column statistics for query predicate columns [22]

{operation} update statistics

covered by the same SQL Server Profiler event as statistics creation
ensures that queries compile with up-to-date statistics [22]
{recommendation} statistics should not be updated too frequently [22]

because there is a performance tradeoff between improving query plans and the time it takes to recompile queries [22]

tradeoffs depend from application to application [22]

triggered by the executions of either commands:

CREATE INDEX ... WITH DROP EXISTING

scans the whole data set

the index statistics are initially created without sampling

allows to set the sample size in the WITH clause either by specifying

FULLSCAN
percentage of data to scan
interpreted as an approximation

sp_createstats stored procedure
sp_updatestats stored procedure
DBCC DBREINDEX

rebuilds one or more indexes for a table in the specified database [21]
DBCC INDEXDEFRAG or ALTER INDEX REORGANIZE operations don’t update the statistics [22]

undocumented options

meant for testing and debugging purposes

should never be used on production systems [29]

STATS_STREAM = stats_stream
ROWCOUNT = numeric_constant

incl PAGECOUNT, alter the internal metadata of the specified table or index by overriding the counters containing the row and page counts of the object [29]

read by the Query Optimizer when processing queries that access the table and/or index in question [29]

cheat the Optimizer into thinking that a table or index is extremely large [29]

the content of the actual tables and indexes will remain intact [29]

PAGECOUNT = numeric contant

auto-update (aka auto statistics update)

on by default

samples the rows by default

always performed by sampling the index or table [21]

triggered by either

query optimization
execution of a compiled plan

involves only a subset of the columns referred to in the query
occurs

before query compilation if AUTO_UPDATE_STATISTCS_ASYNC is OFF
asynchronously if AUTO_UPDATE_STATISTCS_ASYNC is ON

the query that triggered the update proceeds using the old statistics

provides more predictable query response time for some workloads [3]

particularly the workload with short running queries and very large tables [3]

it tracks changes to columns in the statistics

{limitation} it doesn’t track changes to columns in the predicate [3]

{recommendation} if there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes [3]

enable/disable statistics update

database level

ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS OFF
{limitation} it’s not possible to override the database setting of OFF for auto update statistics by setting it ON at the statistics object level [3]

table level

NORECOMPUTE option of the UPDATE STATISTICS command
CREATE STATISTICS command
sp_autostats

index

sp_autostats

statistics object
sp_autostats

[SQL Server 2005] asynchronous statistics update

allows the statistics update operation to be performed on a background thread in a different transaction context

avoids the repeating rollback issue [20]
the original query continues and uses out-of-date statistical information to compile the query and return it to be executed [20]
when the statistics are updated, plans based on those statistics objects are invalidated and are recompiled on their next use [20]

{command}

ALTER DATABASE... SET AUTO_UPDATE_STATISTICS_ASYNC {ON | OFF}

manually

{recommendation} don’t update statistics after index defragmentation

update eventually only the column statistics

[system tables] it might be needed to update statistics also on system tables when many objects were created

triggers a recompilation of the queries [22]

{exception} when a plan is trivial [1]

won’t be generated a better or different plan [1]

{exception} when the plan is non-trivial but no row modifications since last statistics update [1]

no insert, delete or update since last statistics update [1]
manually free procedure cache in case is needed to generate a new plan [1]

schema change

dropping an index defined on a table or an indexed view

only if the index is used by the query plan in question

[SQL Server 2000] manually updating or dropping a statistic (not creating!) on a table will cause a recompilation of any query plans that use that table

the recompilation happens the next time the query plan in question begins execution

[SQL Server 2005] dropping a statistic (not creating or updating!) defined on a table will cause a correctness-related recompilation of any query plans that use that table

the recompilations happen the next time the query plan in question begins execution

updating a statistic (both manual and auto-update) will cause an optimality-related (data related) recompilation of any query plans that uses this statistic

update scenarios

{scenario} query execution times are slow [22]

troubleshooting: ensure that queries have up-to-date statistics before performing additional analysis [22]

{scenario} insert operations occur on ascending or descending key columns [22]

statistics on ascending or descending key columns (e.g. IDENTITY or real-time timestamp) might require more frequent statistics updates than the query optimizer performs [22]
if statistics are not up-to-date and queries select from the most recently added rows, the current statistics will not have cardinality estimates for these new values [22]

inaccurate cardinality estimates and slow query performance [22]

after maintenance operations [22]

{recommendation} consider updating statistics after performing maintenance procedures that change the distribution of data (e.g. truncating a table, bulk inserts) [22]

this can avoid future delays in query processing while queries wait for automatic statistics updates [22]
rebuilding, defragmenting, or reorganizing an index do not change the distribution of data [22]

no need to update statistics after performing ALTER INDEX REBUILD, DBCC REINDEX, DBCC INDEXDEFRAG, or ALTER INDEX REORGANIZE operations [22]

{operation} dropping statistics

via DROP STATISTICS command
{limitation} it’s not possible to drop statistics that are a byproduct of an index [21]

such statistics are removed only when the index is dropped [21]

aging

[SQL Server 2000] ages the automatically created statistics (only those that are not a byproduct of the index creation)
after several automatic updates the column statistics are dropped rather than updated [21]

if they are needed in the future, they may be created again [21]
there is no substantial cost difference between statistics that are updated and created [21]

does not affect user-created statistics [21]

{operation} statistics import)

performed by running generated scripts (see statistics export )

{operation}export (aka statistics export)

performed via “Generate Scripts” task

WITH STATS_STREAM

value can be generated by using DBCC SHOW_STATISTICS WITH STATS_STREAM

{operation} save/restore statistics

not supported

{operation} monitoring

via dbcc show_statistics
via SQL Server query profiler

{option} AUTO_CREATE_STATISTICS

when enabled the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan [22]

these single-column statistics are created on columns that do not already have a histogram in an existing statistics object [22]
it applies strictly to single-column statistics for the full table [22]
statistics name starts with _WA

does not determine whether statistics get created for indexes [22]
does not generate filtered statistics [22]

call: SELECT DATABASEPROPERTY(<database_name>','IsAutoCreateStatistics')
call: select is_auto_create_stats_on from sys.databases where name = '<database_name'

{option}AUTO_UPDATE_STATISTICS option

when enabled the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query [22]

statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view [22]
determined by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold [22]

the threshold is based on the number of rows in the table or indexed view [22]

the query optimizer checks for out-of-date statistics

before compiling a query

the query optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date [22]

before executing a cached query plan

the Database Engine verifies that the query plan references up-to-date statistics [22]

applies to

statistics objects created for indexes, single-columns in query predicates [22]
statistics created with the CREATE STATISTICS statement [22]
filtered statistics [22]

call: SELECT DATABASEPROPERTY(<database_name>','IsAutoUpdateStatistics')
call: select is_auto_update_stats_on from sys.databases where name = '<database_name'

{option} AUTO_UPDATE_STATISTICS_ASYNC

determines whether the query optimizer uses synchronous or asynchronous statistics updates
{default} off

the query optimizer updates statistics synchronously

queries always compile and execute with up-to-date statistics
when statistics are out-of-date, the query optimizer waits for updated statistics before compiling and executing the query [22]
{recommendation} use when performing operations that change the distribution of data [22]

e.g. truncating a table or performing a bulk update of a large percentage of the rows [22]
if the statistics are not updated after completing the operation, using synchronous statistics will ensure statistics are up-to-date before executing queries on the changed data [22]

applies to statistics objects created for [22]

indexes
single columns in query predicates
statistics created with the CREATE STATISTICS statement

asynchronous statistics updates

queries compile with existing statistics even if the existing statistics are out-of-date

the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles [22]
queries that compile after the asynchronous updates have completed will benefit from using the updated statistics [22]

{warning} it will do nothing if the “Auto Update Statistics” database option isn’t enabled [30]
{recommendation} scenarios

to achieve more predictable query response times
an application frequently executes the same query, similar queries, or similar cached query plans

the query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the query optimizer can execute incoming queries without waiting for up-to-date statistics [22]

this avoids delaying some queries and not others [22]

an application experienced client request time outs caused by one or more queries waiting for updated statistics [22]

in some cases waiting for synchronous statistics could cause applications with aggressive time outs to fail [22]

call: select is_auto_update_stats_async_on from sys.databases where name = '<database_name'

[SQL Server 2014] Auto Create Incremental Statistics

when active (ON), the statistics created are per partition statistics [22]
{default} when disabled (OFF), the statistics tree is dropped and SQL Server re-computes the statistics [22]
this setting overrides the database level INCREMENTAL property [22]

[partitions] when new partitions are added to a large table, statistics should be updated to include the new partitions [22]

however the time required to scan the entire table (FULLSCAN or SAMPLE option) might be quite long [22]
scanning the entire table isn't necessary because only the statistics on the new partitions might be needed [22]
the incremental option creates and stores statistics on a per partition basis, and when updated, only refreshes statistics on those partitions that need new statistics [22]
if per partition statistics are not supported the option is ignored and a warning is generated [22]
not supported for several statistics types

statistics created with indexes that are not partition-aligned with the base table [22]
statistics created on AlwaysOn readable secondary databases [22]
statistics created on read-only databases [22]
statistics created on filtered indexes [22]
statistics created on views [22]
statistics created on internal tables [22]
statistics created with spatial indexes or XML indexes [22]

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2025) SQL Server: Statistics [link]
[2] Microsoft Learn (2025) SQL Server: Configure auto statistics [link]
[3] Microsoft Learn (2009) SQL Server: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 [link]

[4] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[8] Microsoft (2014) Statistical maintenance functionality (autostats) in SQL Server (kb 195565)

[10] CSS SQL Server Engineers (2015) Does rebuild index update statistics? [link]

[16] Thomas Kejser's Database Blog (2011) The Ascending Key Problem in Fact Tables – Part one: Pain! by Thomas Kejser
[20] Kalen Delaney et al (2013) Microsoft SQL Server 2012 Internals, 2013

[22] MSDN (2016) Statistics [link]
[29] Tips, Tricks, and Advice from the SQL Server Query Optimization Team (2006) UPDATE STATISTICS undocumented options [link]
[31] Microsoft Learn (2025) SQL Server: sys.dm_db_stats_histogram (Transact-SQL) [link]

15 March 2025

💠🛠️🗒️SQL Server: Schemas [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources.

Last updated: 15-Mar-2024

[SQL Server 2005] Schemas

{def} a collection of database objects that are owned by a single user and form a single namespace

a named container for database objects

allows to group objects into separate namespaces
collection of like objects which provide maintenance and security to those objects as a whole, without affecting objects within other schemas [1]

reside within databases
fulfilling a common purpose [1]
each schema can contain zero or more data structures (aka objects) [1]
all objects within a schema share

a common naming context

a common security context [10]

behavior of schema changed

⇐ compared to SQL Server 2000
schemas are no longer equivalent to database users

each schema is a distinct namespace that exists independently of the database user who created it

used as a prefix to the object name
schema is simply a container of objects [3]

code written for earlier releases of SQL Server may return incorrect results, if the code assumes that schemas are equivalent to database users [3]

can be owned by any database principal

this includes roles and application roles [3]
its ownership is transferable [3]
every object is contained by a schema [6]
anything contained by it has the same owner [6]

separation of ownership [3]

ownership of schemas and schema-scoped securables is transferable [3]
objects can be moved between schemas [3]
a single schema can contain objects owned by multiple database users [3]
multiple database users can share a single default schema [3]
permissions on schemas and schema-contained securables can be managed with greater precision than in earlier releases [3]
each user has a default schema [3]
user’s default schema is used for name resolution during object creation or object reference [7]

{warning} a user might not have permission to create objects in the dbo schema, even if that is the user’s default schema [7]

when a login in the sysadmin role creates an object with a single part name, the schema is always dbo [7]
a database user can be dropped without dropping objects in a corresponding schema [3]
catalog views designed for earlier releases of SQL Server may return incorrect results

⇐ includes sysobjects
more than 250 new catalog views were introduced to reflect the changes

when creating a database object, if you specify a valid domain principal (user or group) as the object owner, the domain principal will be added to the database as a schema. The new schema will be owned by that domain principal [3]

schema-qualified object name (aka two-part object name)

if schema is omitted, a schema resolution is performed (aka implicit resolution)

checks whether the object exists in the user's default schema
if it doesn't, checks whether it exists in the dbo schema [5]

extra costs are involved in resolving the object name (aka name resolution) [5]

uses a spinlock [8]

in rare occasions a spinlock could not be acquired immediately on such an operation

this may occur on a system under significant load [8]
the contention appears on the SOS_CACHESTORE spinlock type [8]
{resolution} ensure that you always fully qualify your table names [8]

if multiple objects with the same name exist in different schemas, the wrong object might be retrieved [5]

improves readability

{recommendation} always use two-part object names in queries (aka schema-qualify objects)
{poor practice} partition data and objects by using only schemas

⇐ instead of creating multiple databases [1]

{poor practice} complex schemas

developing a row-based security schema for an entire database using dozens or hundreds of views can create maintenance issues [6]

{benefit} simplify database object management

groups of tables can be managed from a single point [4]

by creation of categories of tables [4]

helps navigation through database [4]
allow control permissions at schema level

{benefit} provide separation of ownership

allows to manage user permissions at the schema level, and then enhance them or override them at the object level as appropriate [10]
{recommendation} manage database object security by using ownership and permissions at the schema level [2]
{recommendation} have distinct owners for schemas or use a user without a login as a schema owner [2]
{recommendation} not all schemas should be owned by dbo [2]
{recommendation} minimize the number of owners for each schema [2]

{benefit} enhance security

by minimizing the risk of SQL injection

by assigning objects to schema it is possible to drop users without rewriting your applications as the name resolution is no longer depend upon the user or principals names

used as an extra hierarchical layer for solution and security management [1]

gives architects and developers the ability to choose between the types of logical separation of objects they have created, as well as benefit from having a combination of multiple databases and multiple schemas within them [1]

{type} system schemas

can't be dropped
[default schema] dbo

included in each database
if an application needs to create objects in the under the dbo schema then by granting dbo privileges to the application [12]

increases the attack surface of the application [12]
increases the severity if the application is vulnerable to SQL Injection attacks [12]

can be set and changed by using DEFAULT_SCHEMA option of [3]

e.g. CREATE USER <user_name> WITH DEFAULT_SCHEMA = <schema_name>
e.g. ALTER USER <user_name> WITH DEFAULT_SCHEMA = <schema_name>

if DEFAULT_SCHEMA is left undefined, the database user will have dbo as its default schema [3]

[SQL Server 2005] Windows Groups are not allowed to have this property [11]
[SQL Server 2012] Windows Groups can also have a defined default schema [1]

streamlines the process of creating users

if no default schema is specified for a new user, instead is used the default schema of a group where the user is a member [9]

{warning} not to be confused with the dbo role [6]

INFORMATION_SCHEMA schema

an internal, system table-independent view of the SQL Server metadata
enable applications to work correctly although significant changes have been made to the underlying system tables

guest schema

sys schema

provides a way to access all the system tables and views [7]

{type} user-defined schemas

{best practice} assign objects to user-defined schemas

leaving everything in the dbo schema is like putting everything in the root directory of your hard drive [8]
it saves the Query Processor a step from having to resolve the schema name out by itself [8]

avoid ambiguity

{best practice} assign each user a default schema

ensures that if they create an object without specifying a schema, it will automatically go into their assigned container [8]

{type} role-based schemas

[SQL Server 2012] every fixed database role has a schema of the same name [7]

{exception} public role5

{action} create objects in schema

{prerequisite}

schema must exist
the user creating the object must have permission to create the object, either directly or through role membership [7]
the user creating the object must either [7]

be the owner of the schema
be a member of the role that owns the schema
have ALTER rights on the schema
have the ALTER ANY SCHEMA permission in the database

{recommendation} group like objects together into the same schema [2]

{operation} create schema

{recommendation} use two-part names for database object creation and access [2]

{operation} change schema (aka modify schema)

when applying schema changes to an object and try to manipulate the object data in the same batch, SQL Server may not be aware of the schema changes yet and fail the data manipulation statement with a resolution error [5]

the parsing does not check any object names or schemas because a schema may change by the time the statement executes [6]

triggers a database lock
invalidates existing query plans

a new plan will need to be recompiled for the queries as soon as they are run anew

not allowed on

[SQL Server 2014] [memory-optimized tables]
[table variables]

{best practice} explicitly list column names in statements in case a schema changes

{operation} schema dropping
[windows groups]

an exception in the SQL Server security model [11]
a secondary identity with additional capabilities that are traditionally reserved only for primary identities [11]

require handling not seen in any other security system [11]

can simplify management but due to their hybrid nature, they come with some restrictions [11]
{recommendation} for users mapped to Windows groups, try and limit each Windows user to one Windows group that has database access [2]

Previous Post <<||>> Next Post

References:

[1] 40074A: Microsoft SQL Server 2014 for Oracle DBAs, Microsoft, 2015

[2] Bob Beauchemin et al (2012) SQL Server 2012 Security Best Practices - Operational and Administrative Tasks [whitepaper]

[3] MSDN (2005)User-Schema Separation [link]

[4] Solid Quality Learning (2007) Microsoft SQL Server 2005: Database Essentials Step by Step

[5] Itzik Ben-Gan (2008) Microsoft® SQL Server® 2008 T-SQL Fundamentals

[6] Adam Jorgensen et al (2012) Microsoft® SQL Server® 2012 Bible

[7] Kalen Delaney et al (2013) Microsoft SQL Server 2012 Internals

[8] Buck Woody (2009) SQL Server Best Practices: User-Defined Schemas [link obsolet]

[9] Microsoft (2014) 10977B: Updating Your SQL Server Skills to SQL Server 2014 (Trainer Handbook)

[10] Microsoft (2012) 20465B Designing Database Solutions for SQL Server 2012 (Trainer Handbook

[11] Laurentiu Cristofor (2008) SQL Server: Windows Groups, default schemas, and other properties [link]

[12] Dan Sellers's WebLog (2006) Post Webcast’s Notes: Securing SQL Server 2005 for Developers [link]

[13] Microsoft Learn (2024) SQL Server 2022: System Information Schema Views (Transact-SQL)

[14]

Acronyms:
SQL - Structured Query Language

SQL Troubles

Pages

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

22 March 2025

💠🛠️🗒️SQL Server: Indexed Views [Notes]

19 March 2025

💠🛠️🗒️SQL Server: Views [Notes]

💠🛠️🗒️SQL Server: Stored procedures (Notes)

💠🛠️🗒️SQL Server: Statistics [Notes]

15 March 2025

💠🛠️🗒️SQL Server: Schemas [Notes]

About Me