SQL Troubles: January 2024

31 January 2024

Microsoft Fabric: Parquet Format (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources.

Last updated: 31-Jan-2024

Parquet format

{definition} open source, column-oriented data file format designed for efficient data storage and retrieval [1]

provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk [1]
designed to be a common interchange format for both batch and interactive workloads [1]

{characteristic} open source file format

similar to other columnar-storage file formats available in Hadoop [1]

e.g. RCFile, ORC

became an industry standard

{benefit} provides interoperability across multiple tools

{characteristic} language agnostic [1]

different programming languages can be used to manipulate the data

{characteristic} column-based format [1]

files are organized by column

⇐ rather than by row
⇒ saves storage space and speeds up analytics queries [1]

reads only the needed columns

⇐ non-relevant data are skipped
⇒ greatly minimizes the IO [1]

aggregation queries are less time-consuming compared to row-oriented databases [1]

{benefit} increased data throughput and performance [1]

⇒ recommended for analytical workloads

{characteristic} highly efficient data compression/decompression [1]

supports flexible compression options and efficient encoding schemes [1]

data can be compressed by using one of the several codecs available [1]

⇒ different data files can be compressed differently [1]

reduced storage requirements [1]

by at least one-third on large datasets
⇒ {benefit} saves on cloud storage space

greatly improves scan and deserialization time [1]

⇒ {benefit} reduces the processing costs

{downside} can be slower to write than row-based file formats

primarily because they contain metadata about the file contents
though have fast read times

{characteristic} supports complex data types and advanced nested data structures [1]

implemented using the record-shredding and assembly algorithm

accommodates complex data structures that can be used to store the data [1]
optimized to work with complex data in bulk and features different ways for efficient data compression and encoding types [1]

the approach is best especially for those queries that need to read certain columns from a large table [1]

{characteristic} cloud-ready

works best with interactive and serverless technologies [1]

{characteristic} immutable

a file can't be update to modify the column name, reorder or drop columns [2]

⇐ requires rewriting the whole file [2]

{characteristic} binary-based file

⇒ not easily readable (by humans)

{characteristic} self-describing

contains metadata about schema and structure
{concept} row groups (aka segments)

contains data from the same columns

{constraint} column names are case sensitive

{concept} file footer

stores metadata statistics for each row group [2]

min/max statistics
the number of rows
can be leveraged by data processing engines to run queries more efficiently [2]

⇐ depending on the query, entire row group can be skipped [2]

{concept} file header

large datasets can be split across multiple parquet files

⇐ the structure can be flat or hierarchical
managing multiple files has several challenges
the files can be used to define a table (aka parquet table)

⇐ {constraint} the files must have the same definition

⇐ schema enforcement must be coded manually [2]

{limitation} [Data Lake] no support for ACID transactions [2]

⇒ easy to corrupt [2]

partially written files will break any subsequent read operations

the compute engine will try to read in the corrupt files and error out [2]
corrupted files must be manually identified and deleted manually to fix the issues [2]

{limitation} it's not easy to delete rows from it [2]

requires reading all the data, filtering out the data not needed, and then rewriting the entire table [2]

{limitation} doesn't support DML transactions [2]
{limitation} there is no change data feed [2]
{limitation} slow file listing [2]

small files require excessive I/O overhead

ideally the files should be between 64 MB and 1 GB
ideally the files should be compacted into larger files (aka small file compaction, bin-packing)

{limitation} expensive footer reads to gather statistics for file skipping [2]

fetching all the footers and building the file-level metadata for the entire table is slow [2]

⇐ it requires a file-listing operation [2]

the effectiveness of data skipping depends on how many files can be can skipped when performing a query [2]

{limitation} doesn't support schema enforcement [2]
{limitation} doesn't support check constraints [2]
{limitation} doesn't support data versioning [2]

{concept} table partitioning

{definition} common optimization approach used to store the data of the same table in different directories, with partitioning column values encoded in the path of each partition directory [6]
{recommendation} avoid partitioning by columns with very high cardinality

{concept} bin-packing (aka compaction, bin-compaction)

aims to produce evenly-balanced data files with respect to their size on disk,

⇐ but not necessarily in respect to the number of tuples per file [7]

requires an algorithm that efficiently organizes the files into equal size containers [6]
{characteristic} idempotent

if it is run twice on the same dataset, the second run has no effect [7]

{feature} [Microsoft Fabric] V-order

{definition} write time optimization to the parquet file format that enables lightning-fast reads under the MF compute engines [3]
applies special sorting, row group distribution, dictionary encoding and compression on parquet files [3]

requires less compute engines resources in to read it [3]

provides further cost efficiency and performance

has a 15% impact on average write times but provides up to 50% more compression [3]

{characteristic} open-source parquet format compliant

all parquet engines can read it as a regular parquet file [3]
⇐ table properties and optimization commands can be used on control V-Order on its partitions [3]
compatible with other features [3]

applied at parquet file level [3]
enabled by default

{command} OPTIMIZE

merges all changes into bigger, consolidated parquet files (aka bin-compaction) [3]
[Spark] dynamically optimizes partitions while generating files with a default 128 MB size [5]

the target file size may be changed per workload requirements using configurations [5]

properly designing the table physical structure based on the ingestion frequency and expected read patterns is likely more important than running the optimization command [3]
running the compaction operation brings the data lake in an unusable state for readers [7]
{warning} manually compacting the files is inefficient and error prone [7]

no way to differentiate files that contain new data from files that contain existing data that was just compacted into new files [7]

[Delta Lake] when ZORDER and VORDER are used together, Apache Spark performs bin-compaction, ZORDER, VORDER sequentially [3]

Previous Post <<||>> Next Post

Acronyms:
ACID - atomicity, consistency, isolation, durability
IO - Input/Output

MF - Microsoft Fabric
ORC - Optimized Row Columnar
RCFile - Record Columnar File

Resources:
[1] Databricks (2023) What is Parquet? (link)
[2] Data Lake (2023) Delta Lake vs. Parquet Comparison (link)
[3] Data Mozart (2023) Parquet file format – everything you need to know! (link)
[4] Microsoft Learn (2023) Query Parquet files using serverless SQL pool in Azure Synapse Analytics (link)
[5] Microsoft Learn (2023) Lakehouse tutorial: Prepare and transform data in the lakehouse (link)
[6] Apache Spark (2023) Spark SQL Guide (link)
[7] Delta Lake (2023) Delta Lake Small File Compaction with OPTIMIZE (link)
[8] Delta Lake (2023) Optimizations (link)

27 January 2024

Data Science: Back to the Future I (About Beginnings)

Data Science Series

I've attended again, after several years, a webcast on performance improvement in SQL Server with Claudio Silva, “Writing T-SQL code for the engine, not for you”. The session was great and I really enjoyed it! I recommend it to any data(base) professional, even if some of the scenarios presented should be known already.

It's strange to see the same topics from 20-25 years ago reappearing over and over again despite the advancements made in the area of database engines. Each version of SQL Server brought something new in what concerns the performance, though without some good experience and understanding of the basic optimization and troubleshooting techniques there's little overall improvement for the average data professional in terms of writing and tuning queries!

Especially with the boom of Data Science topics, the volume of material on SQL increased considerably and many discover how easy is to write queries, even if the start might be challenging for some. Writing a query is easy indeed, though writing a performant query requires besides the language itself also some knowledge about the database engine and the various techniques used for troubleshooting and optimization. It's not about knowing in advance what the engine will do - the engine will often surprise you - but about knowing what techniques work, in what cases, which are their advantages and disadvantages, respectively on how they might impact the processing.

Making a parable with writing literature, it's not enough to speak a language; one needs more for becoming a writer, and there are so many levels of mastery! However, in database world even if creativity is welcomed, its role is considerable diminished by the constraints existing in the database engine, the problems to be solved, the time and the resources available. More important, one needs to understand some of the rules and know how to use the building blocks to solve problems and build reliable solutions.

The learning process for newbies focuses mainly on the language itself, while the exposure to complexity is kept to a minimum. For some learners the problems start when writing queries based on multiple tables - what joins to use, in what order, how to structure the queries, what database objects to use for encapsulating the code, etc. Even if there are some guidelines and best practices, the learner must walk the path and experiment alone or in an organized setup.

In university courses the focus is on operators algebras, algorithms, on general database technologies and architectures without much hand on experience. All is too theoretical and abstract, which is acceptable for research purposes, but not for the contact with the real world out there! Probably some labs offer exposure to real life scenarios, though what to cover first in the few hours scheduled for them?

This was the state of art when I started to learn SQL a quarter century ago, and besides the current tendency of cutting corners, the increased confidence from doing some tests, and the eagerness of shouting one’s shaking knowledge and more or less orthodox ideas on the various social networks, nothing seems to have changed! Something did change – the increased complexity of the problems to solve, and, considering the recent technological advances, one can afford now an AI learn buddy to write some code for us based on the information provided in the prompt.

This opens opportunities for learning and growth. AI can be used in the learning process by providing additional curricula for learners to dive deeper in some topics. Moreover, it can help us in time to address the challenges of the ever-increase complexity of the problems.

21 January 2024

SQL Server: Clustered & Non-clustered Indexes (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.
Columnstore indexes will be considered separately.

Last updated: 22-Jan-2024

Indexes

{definition} a database object associated with a table or view that provides efficient access path to data in the base object, based on one or more columns

contains keys built from one or more columns stored physically as B-trees

⇒ provides the path that is necessary to get to the data in the quickest and most efficient manner possible [7]
⇒ can dramatically speed up the process of finding data as well as the process of joining tables together [2]

on-disk structure

index placement

placed on either

particular filegroup
partitioned according to a predefined partition scheme

{default} placed on the same filegroup as the base object [5]

must be maintained and updated as data are added or changed in the base object

each index brings with it a certain amount of overhead

⇐ the updates, inserts, and deletes are slower

{type} clustered index (CI)

based on a clustering key

tells SQL Server how to order the table's data pages [11]
static columns (aka unchanging columns, non-volatile columns)

the clustering key is never updated [11]

nonstatic columns

when the clustering key value is updated, the data may need to be moved elsewhere (aka record relocation) in the clustered index so that the clustering order is maintained [11]

triggers a nonclustered index update as well

so that the value of the clustering key is correct for the relevant nonclustered index rows [5]

wastes time and space
causes page splits and fragmentation
adds unnecessary overhead to every modification of the column(s) that make up the clustering key [5]

updating the clustering key is clearly more expensive than updating a non-key column [11]

can be either

a single column key
a composite key

[SQL Server 2005] up to 16 columns
[SQL Server 2016] up to 32 columns can be combined into a single composite index key
all the columns in a composite index key must be in the same table or view [8]
the most redundant data within the entire table

{restriction}a table can have only one clustered index [5]
{restriction} maximum allowable size of the combined index values is 900 bytes
{restriction} LOB data types can’t be specified as columns
{subtype} unique clustered index (UCI)

created with the UNIQUE keyword

{subtype} non-unique clustered indexes

created without the UNIQUE keyword
SQL Server does not require a clustered index to be unique

⇒ must have some means of uniquely identifying every row
forces the index to be unique by appending a 4-byte value (aka uniqueifier) to key values as necessary to differentiate identical key values from one another [2]

stored in every level of the B-tree [11]

in both clustered and non-clustered indexes

stored as a variable-length column [11]

if a table does not already contain any other variable-length columns, each duplicate value is actually consuming 8-bytes of overhead:

4 bytes for the uniqueifier value
4 bytes to manage variable-length columns on the row [11]
the overhead for rows with a NULL uniqueifier value is zero bytes [11]

added everywhere the clustering key is stored [11]
if there are many rows using the same clustering key value, the operation can become quite expensive [11]
is not readily visible and cannot be queried [11]
it is impossible to estimate how much additional storage overhead will result from the addition of a uniqueifier, without first having a thorough understanding of the data being stored [11]
meaningless operation in the context of the data [11]
all clustered indexes must be unique so that nonclustered index entries can point to exactly one specific row [5]

{type} nonclustered index (NCI)

the index is a completely separate structure from the data itself [5]

⇒ its presence or absence doesn’t affect how the data pages are organized [5]
heap or a table with a clustered index affects the structure of nonclustered indexes [5]
best at singleton selects

⇐ queries that return a single row.
once the nonclustered B-tree is navigated, the actual data can be accessed with just one page I/O, that is, the read of the page from the underlying table

each table can have up to 999 nonclustered indexes [8]
{subtype} nonclustered indexes with included columns

uses the INCLUDE keyword along with a list of one or more columns to indicate the columns to be saved in the leaf level of an index [1]

the data for these columns are duplicated in index leaf-level pages

⇒ increases the disk space requirement [1]
⇒ fewer index rows can fit on index pages [1]

⇒ might increase the I/O and decrease the database cache efficiency [1]

⇒ data is stored and updated twice [1]

⇒ index maintenance may increase in terms of the time that it takes [1]

can be specified up to 1,023 columns [1]
the included columns are not part of the index key

⇒ keeps the key size small and efficient [1]

having the columns available at the leaf pages with the index means avoid querying the base table if the included columns can satisfy the query [1]

⇒ increases the chances that the query will find all it needs in the index pages itself, without any further lookups [1]

provides the benefits of a covering index [1]
results in smaller, efficient, and narrow keys [1]
removes the 900 bytes/16 columns key size restriction [1]
{restriction} columns of data type text, ntext, and image are not allowed as non-key included columns [1]

{option} FILLFACTOR

its value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth [16]
it's applied when the index is first created but is not enforced afterward

⇒ isn't maintained over time

affects only the leaf-level pages in the index

normally reserves enough empty space on intermediate index pages to store at least one row of the index's maximum size [2]

the empty space is reserved between the index rows rather than at the end of the index [16]
its value is a percentage from 1 to 100

{default} 0 which means that the leaf-level pages are filled to capacity [16]

provided for fine-tuning index data storage and performance [16]
affects performance

creating an index with a relatively low fillfactor

helps avoid page splits during inserts

with pages only partially full, the potential for needing to split one of them in order to insert new rows is lower than it would be with completely full pages
can be good for performance if the new data is evenly distributed throughout the table [16]

if all the data is added to the end of the table, the empty space in the index pages will not be filled [16]

the index will require more storage space and can decrease read performance [16]

⇒ can decrease database read performance by an amount inversely proportional to the fill-factor setting [16]

a high fillfactor can help compact pages so that less I/O is required to service a query

common technique with data warehouses
retrieving pages that are only partially full wastes I/O bandwidth

{option} PAD_INDEX option

instructs SQL Server to apply the FILLFACTOR to the intermediate-level pages of the index [2]
if FILLFACTOR setting is so high that there isn't room on the intermediate pages for even a single row (e.g., a FILLFACTOR of 100%), the percentage is overridden so that at least one row fits [2]
if FILLFACTOR setting is so low that the intermediate pages cannot store at least two rows, SQL Server will override the fillfactor percentage on the intermediate pages so that at least two rows fit on each page [2]

{option} IGNORE_DUP_KEY

{enabled} when a UNIQUE index is created the option is used a duplicate key error on a multiple-row INSERT won’t cause the entire statement to be rolled back [5]

⇒ the nonunique row is discarded, and all other rows are inserted [5]

⇒ it makes a violation in a multiple-row data modification nonfatal to all the nonviolating rows [5]

{disabled} during changes even if one row is found that would cause duplicates of keys defined as unique, the entire statement is aborted and no rows are affected [5]

{option} STATISTICS_NORECOMPUTE

determines whether the statistics on the index should be updated automatically [5]
typically enabled
used to set a specific statistic or index to not update automatically [5]
overrides an ON value for the AUTO_UPDATE_STATISTICS database option [5]
if the database option is set to OFF, that behavior for a particular index can’t be overridden

⇒ all statistics in the database must be updated manually

via UPDATE STATISTICS or sp_updatestats [5]

{option} MAXDOP

controls the maximum number of processors that can be used for index creation [5]
can override the server configuration option [max degree of parallelism for index building] [5]
allows multiple processors to be used for index creation can greatly enhance the performance of index build operations [5]

each processor builds an equal-sized chunk of the index in parallel

each parallel thread builds a separate tree

the math used to determine the theoretical minimum number of required pages varies from the actual number [5]
the tree might not be perfectly balanced
after each thread is completed, the trees are essentially concatenated [5]

extra page space reserved during this parallel process can be used for later modifications [5]

{operation} index tuning

iterative process
{operation}index consolidation

typically performed when indexes overlap

includes duplicated indexes

{caution} consider the various aspects when modifying indexes
the number and design of indexes impact performance

⇐ the goal is to obtain the best performance with the smallest number of indexes

{operation} creation (aka index creation)

[clustered index]

the data becomes the leaf level of the clustered index [5]

data in the table is moved to new pages and ordered by the clustering key
maintained logically rather than physically

the order is maintained through a doubly linked list (aka page chain)

based on the definition of the clustered index

same as the order of rows on the data pages

deciding on which column(s) to cluster is an important performance consideration [5]

can be ordered in only one way

although heap pages do have pointers to next and previous pages, the ordering is arbitrary and doesn’t represent any ordering in the table itself [5]

{operation} [SQL Server 2005] rebuilding (aka rebuilding indexes)

{goal} generate new statistics [18]
{goal} change the number of pages used by the index (fill factor) [18]
{goal} reorganize the data on database pages in relative proximity to each other [18]
introduced mainly for SQL Server's own internal purposes when applying upgrades and service packs [6]
drops and re-creates the index

always rebuilds the entire index [3]

regardless of the extent of fragmentation [3]
an index rebuild for a lightly fragmented index is really overkill [3]

there must be enough free space in the database to accommodate the new index [3]

otherwise the database will grow to provide the required free space [3]

this can be problematic for large indexes [3]

reclaims disk space by compacting the pages based on the specified or existing fill factor setting [15]
reorders the index rows in contiguous pages [15]

removes fragmentation

automatically rebuilds all index statistics [18]

columns statistics aren’t touched
usually faster than reorganizing indexes if the logical fragmentation is high [18]

when ALL is specified, all indexes on the table are dropped and rebuilt in a single transaction
indexes with more than 128 extents are rebuilt in two separate phases [15]

logical phase

the existing allocation units used by the index are marked for deallocation [15]
the data rows are copied and sorted [15]
then moved to new allocation units created to store the rebuilt index [15]

physical phase

the allocation units previously marked for deallocation are physically dropped in short transactions that happen in the background, and do not require many locks [15]

serial index rebuild

forces an allocation order scan, ignoring the key placement and scanning the object from first IAM to last IAM order [13]

via WITH(NOLOCK, INDEX=0)

takes longer than a [parallel index rebuild]

parallel index rebuild

during rebuild a portion of the data (range) is assigned to each of the parallel workers [13]

for each parallel worker is assigned its own CBulkAllocator when saving the final index pages [13]

a leap frog behavior of values, across extents occurs as the workers copy the final keys into place [13]

each worker spreads 1/nth of the data across the entire file instead of packing the key values together in a specific segment of the file [13]

can use minimal-logging to reduce transaction log growth [3]
may require long-term locks on the table [3]

⇒ can limit concurrent operations [3]

replaces

DBCC DBREINDEX command
DROP_EXISTING option of CREATE INDEX command

can be executed

online (aka online operation)

table and indexes are available while the operation is performed

offline (aka offline operation)

permissions requirements

ALTER permission on the table or view

users must be members of either:

sysadmin
db_ddladmin
db_owner

{operation} reorganize (aka reorganize indexes, index reorganization)

doesn’t update statistics at all [3]

always single-threaded [3]

always fully logged [3]

⇐ doesn’t prevent transaction log clearing [3]

doesn’t hold blocking locks

always executed online
won’t update statistics at all [3]
uses minimal system resources

requires 8KB of additional space in the database [3]
doesn’t hold blocking locks [3]
takes care of the existing fragmentation [3]

⇒ makes it a better choice for removing fragmentation from a lightly fragmented index [3]
⇒ makes it a poor choice for removing fragmentation from a heavily fragmented index [3]

{limitation} can’t use index options
replaces the DBCC INDEXDEFRAG command [6]
removes some of the fragmentation from an index

it’s not guaranteed to remove all the fragmentation

just like DBCC INDEXDEFRAG

doesn’t have a corresponding option in the CREATE INDEX command [6]

because when creating an index there is nothing to reorganize [6]

it defragments the leaf level of clustered and nonclustered indexes on tables and views by physically reordering the leaf-level pages to match the logical, left to right, order of the leaf nodes [15]
compacts the index pages

based on the existing fill factor value

{operation} disabling (aka index disabling)
- makes an index completely unavailable
- disables one or ALL indexes with a single command
- indexes must be completely rebuilt to make them useful again
- {warning} disabling a [clustered index] on a table, makes the data unavailable [6]
{operation} dropping (aka index dropping)
{operation} changing index options

via ALTER INDEX command

{warning} regular rebuilding of all the indexes in a database is a poor way to manage indexes

leads to resources waste by rebuilding indexes that are not fragmented
it can affect the availability of a system when the indexes can’t be rebuilt online

{concept} index extent

a group of eight index pages

{concept} narrow index

in terms of the number of bytes it stores [11]

usually has few columns in the index key
can accommodate more rows into an 8KB index page than a wide index [1]

⇒ {benefit} reduces I/O
⇒ {benefit} reduces storage requirements
⇒ {benefit} improves database caching

{benefit} reduces maintenance overhead

small indexes might not have an intermediate level at all
the goal when defining indexes isn’t to have only very narrow indexes [5]

extremely narrow indexes usually have fewer uses than slightly wider indexes [5]

{concept} wide index

typically an index with many columns in the index key

wider keys cause more I/O and permit fewer key rows to fit on each B-tree page

indexes require a larger number of pages than it otherwise would and causes it to take up more disk space a lot of space (and potentially buffer pool memory) is wasted

large indexes often have more than one intermediate level

{benefit} covers more queries
{downside} increases maintenance overhead
{downside} can adversely affect disk space

{concept} ever increasing index

e.g. a numeric value can be continuously incremented by the value defined at creation

e.g. identity integers

{benefit} avoids fragmentation

results in sequential IO and maximizes the amount of data stored per page

the most efficient use of system resources

{benefit} improve write performance

SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page [11]

non-sequential key column can result in a much higher overhead during insertion [11]

SQL Server has to find the correct page to write to and pull it into memory

if the page is full, SQL Server will need to perform a page split to create more space [11]
during a page split, a new page is allocated, and half the records are moved from the old page to the newly-allocated page [11]
each page has a pointer to the previous and next page in the index, so those pages will also need to be updated [11]

{concept} fragmentation results from data modifications

can take the form of gaps in data pages, so wasting space, and a logical ordering of the data that no longer matches the physical ordering [11]

{concept} covering index

nonclustered index that contains all the columns requested by the query without going to the base table

allows it to skip the bookmark lookup step and simply return the data the query seeks from its own B-tree [2]
when a clustered index is present, a query can be covered using a combination of nonclustered and clustered key columns since the clustered key is the nonclustered index's bookmark [2]
the next best thing to having multiple clustered indexes on the same table [2]

included columns

columns added in the INCLUDE clause of the CREATE INDEX command
allow exceeding the 900-byte or 16-key column limits in the leaf level of a nonclustered index [5]
appear only in the leaf level
don’t affect the sort order of the index rows in any way [5]
SQL Server can silently add an included column to indexes [5]

e.g. when an index is created on a partitioned table and no ON filegroup or ON partition_scheme_name is specified [5]

when partitioning a non-unique, nonclustered index, the Database Engine adds the partitioning column as a non-key (included) column of the index, if it is not already specified [8]

column names cannot be repeated in the list [8]
column namescannot be used simultaneously as both key and non-key columns [8]
{restriction} all data types are allowed except text, ntext, and image [8
{restriction} index must be created or rebuilt offline if any one of the specified non-key columns are varchar(max), nvarchar(max), or varbinary(max) data types [8]
[computed columns] that are deterministic and either precise or imprecise can be included columns [8]
[computed columns] derived from image, ntext, text, varchar(max), nvarchar(max), varbinary(max), and xml data types can be included in non-key columns as long as the computed column data types is allowable as an included column [8]

[SQL Server 2005] can be stored in the same filegroup as the underlying table, or on a different filegroup [12]

{concept} [SQL Server 2008] filtered indexes

indexes with simple predicates that restrict the set of rows included in the index [4]
leaf level contains an index row only for keys that match the filter definition [5]
created using a new WHERE clause on a CREATE INDEX statement
alternative to indexed views

which are more expensive to use and maintain
their matching capability is not supported in all editions
tend to be more useful for the more classical relational query precomputation scenarios

{benefit} have fewer rows and are also narrower than the base table

require fewer pages as well
specific constraints that are used on queries with large tables where space is an issue, this kind of index can be quite useful

usage scenarios

fields are used only occasionally

resulting in many NULL entries for that column

a traditional index stores a lot of NULLs and wastes a lot of storage space
updates to the table have to maintain this index for every row

querying a table with a small number of distinct values and are using a multicolumn predicate where some of the elements are fixed

this might be useful for a regular report with special purpose

it speeds up a small set of queries while not slowing down updates as much for everyone else

when there is a known query condition on an expensive query on a large table

{concept} indexed views

similar to materialized views in Oracle
without indexes, views are purely logical

⇒ the data involved has no physical storage [5]

the first index built must be a clustered index

data is stored in the leaf level of the index [5]
further nonclustered indexes can be built

{constraint} view’s columns must be deterministic
{constraint} session-level restrictions

options must be set to a specific value (see )

if these options aren’t set as specified, an error message is raised when attempted to create an indexed view [5]

{constraint} only deterministic functions are allowed

check via SELECT OBJECTPROPERTY (object_id('<function_name>'), 'IsDeterministic')
imprecise values (float or real values) can be used only if the computed value is persisted [5]

{constraint} the definition of any underlying object’s schema can’t change [5]

must be enforced through WITH SCHEMABINDING option

{constraint} if a view's definition contains GROUP BY

the SELECT list must include the aggregate COUNT_BIG (*)

COUNT_BIG returns a BIGINT, a 8-byte integer

can’t contain HAVING, CUBE, ROLLUP, or GROUP BY ALL
all GROUP BY columns must appear in the SELECT list
constraints apply only to view definition, not to the queries that might use the indexed views [5]

{benefit}materialize summary aggregates of large tables [5]

eliminates the need of scanning the underlying, large tables [5]

{activity} check whether a view is indexable

via SELECT OBJECTPROPERTY (OBJECT_ID ('<view_name>'), 'IsIndexable');

{activity} check whether a view is indexed

via SELECT OBJECTPROPERTY (OBJECT_ID ('<view_name>'), 'IsIndexed');

alternative solutions

temporary tables
persistent tables

[Query Optimizer] doesn’t always choses an indexed view for query’s execution plan [5]

{restriction} considered only in [Enterprise], [Developer] and [Evaluation] editions
the base tables might be accessed directly [5]
use NOEXPAND hint in FROM clause to make sure that the indexed view isn’t expanded into its underlying SELECT statement [5]

{recommendation} NOEXPAND hint should be used only when improves query’s performance [5]

indexes on computed columns

without indexes, computed columns are purely logical

the data involved has no physical storage [5]

recomputed every time a row is accessed

{exception}the computed column is marked as PERSISTED [5]

{restriction} see the requirements for indexed views

{exception} table-level restrictions don’t apply [5]

{restriction} allowed on deterministic, precise (and persisted imprecise) computed columns where the resulting data type is otherwise indexable [5]

{activity} check whether a column is deterministic

via SELECT COLUMNPROPERTY (OBJECT_ID('<tabelle>'), '<column>', 'IsDeterministic');

{activity} check whether a column is precise

via SELECT COLUMNPROPERTY (OBJECT_ID('<tabelle>'), '<column>', 'IsPrecise');

{activity} check if a column is persisted

SELECT is_persisted
FROM sys.computed_columns
WHERE object_id = object_Id('<tabelle>')
AND name = '<column_name>'

{concept} missing index

index that would have lead to a optimal query plan
[query optimizer] when generating a query plan, it analyzes what are the best indexes for a particular filter condition [17]

if the best indexes do not exist it generates a suboptimal query plan [17]
stores information about the missing indexes
{activity} review missing indexes

via sys.dm_db_missing_index_group_stats

when table metadata changes, all missing index information about the table is deleted [17]

e.g. column added/dropped, index adde

via MissingIndexes element in XML Showplans

correlate indexes that the query optimizer considers missing with the queries for which they are missing [17]
turn on via SET STATISTICS XML ON option

it’s not intended to fine tune an indexing configuration [17]

it cannot gather statistics for more than 500 missing index groups [17]

after this threshold is reached, no more missing index group data is gathered [17]

the threshold is not a tunable parameter and cannot be changed [17]

it does not specify an order for columns to be used in an index [17]
for queries involving only inequality predicates, it returns less accurate cost information [17]
it reports only include columns for some queries

index key columns must be manually selected [17]

it returns only raw information about columns on which indexes might be missing [17]

information returned might require additional processing before being used to create an index [17]

it does not suggest [filtered indexes] [17]
it can return different costs for the same missing index group that appears multiple times in XML Showplans [17]

it can occur when different parts of a single query benefit differently from the same missing index group [17]

it does not consider [trivial query plans] [17]

{concept} fragmented index:

index that is not in the same logic order as it is stored in memory or on storage [19]
caused by

modifications to data can cause the information in the index to become scattered in the database (fragmented) [15]
autoshrink option

heavily fragmented indexes can degrade query performance and cause your application to respond slowly [15]
index fragmentation

exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file [15]
{type} extent fragmentation (aka external index fragmentation)

the pages get out of physical order, as a result of data modifications [11]

can result in random I/O

does not perform as well as sequential I/O [11]

is truly bad only when SQL Server is doing an ordered scan of all or part of a table or an index [6]

only when the pages need to be fetched in logical order is needed to follow the page chain.

if the pages are heavily fragmented, this operation is more expensive than if there were no fragmentation [6]

if seeking individual rows through an index, it doesn't matter where those rows are physically located [6]

{type} page fragmentation (aka internal index fragmentation)

occurs when there are gaps in the data pages

reduces the amount of data that can be stored on each page

increases the overall amount of space needed to store the data [11]

[scanning] the entire table or index involves more read operations than if no free space were available on your pages [6]

some fragmentation is desirable in order to minimize the [page splits] [6]

fragmentation on small indexes is often not controllable

the pages of small indexes are stored on mixed extents [15]

mixed extents are shared by up to eight objects [15]

the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index [15]

{affect} primarily impacts range-scan queries

singleton queries would not notice much impact [11]
can benefit from routine defragmentation efforts [11]

{affect} disk performance and the effectiveness of the SQL Server read-ahead manager [10]
disk latency

[small-scale environment] only the highest fragmentation levels had a significant negative impact on disk latency [10]
[large-scale environment] significantly lower and never became an issue

due to increased I/O performance provided by the SAN [10]

{action} detect fragmentation

analyze the index to determine the degree of fragmentation

via sys.dm_db_index_physical_stats

controlled through

fillfactor setting
regular defrag operations

{remedy} reorganize indexes
{remedy} rebuild indexes

{concept} page splits

when a new row is added to a full index page, half the rows are moved to a new page to make room for the new row

the new page must be linked into the indexes page chain, and usually the new page is not contiguous to the page being split [6]

resource intensive operation
occurs frequently
can lead to external fragmentation [6]

{concept} interleaving

occurs when index extents are not completely contiguous within the data file, leaving extents from one or more indexes intermingled in the file [18]
can occur even when there is no logical fragmentation

because all index pages are not necessarily contiguous, even when logical ordering matches physical ordering [18]

{concept} leap frog behavior

behavior occurs during [parallel index rebuild]
when the key range is leap frogged the fragmentation limits [SQL Server]’s I/O size to 160K instead of 508K and drives the number of I/O requests much higher [13]
{workaround} [partitioned table] partition the table on separate files matching the DOP used to build the index

allows better alignment of parallel workers to specific partitions, avoiding the [leap frog behavior]

{workaround} [non-partitioned table] aligning the number of files with the DOP may be helpful [13]

with reasonably even distribution of free space in each file the allocation behavior is such that alike keys will be placed near each other [13]

{workaround} for single partition rebuild operations consider serial index building behaviors to minimize fragmentation behaviors [13]

{concept} defragmentation

does not yield performance gains in every case [10]
steps

{activity} ensure there are no resource issues

common resource issues are related to

I/O subsystem performance
memory usage
CPU utilization

physical disk fragmentation [10]

{recommendation} [small-scale environments] correct disk fragmentation before running index defragmentation [10]

not necessary on SAN environments [10]

inappropriate schemas [10]
out-of-date statistics

{recommendation} resolve any out-of-date statistics before defragmenting [10]

{activity} determine the amount of index fragmentation

focus on the larger indexes

their pages are less likely to be cached [10]
{best practice} monitor regularly the fragmentation levels on indexes [10]

{activity} determine workload type

not all workload types benefit from defragmenting [10]

read-intensive workload types that do significant disk I/O benefit the most [10]
DSS workload types benefit much more than OLTP workload types [10]

{activity} determine queries that perform poorly [10]

determine the amount of I/O performed by a query [10]
queries that scan large ranges of index pages are affected most by fragmentation and gain the most from defragmenting [10]

{activity} understand the effect of fragmentation on disk throughput and the read-ahead manager [10]

for queries that scan one or more indexes, the read-ahead manager is responsible for scanning ahead through the index pages and bringing additional data pages into the SQL Server data cache [10]

the read-ahead manager dynamically adjusts the size of reads it performs based on the physical ordering of the underlying pages [10]
when there is low fragmentation, the read-ahead manager can read larger blocks of data at a time, more efficiently using the I/O subsystem [10]
as the data becomes fragmented, the read-ahead manager must read smaller blocks of data [10]
the amount of read-ahead operations that can be issued is independent of the physical ordering of the data; however, smaller read requests take more CPU resources per block, resulting in less overall disk throughput [10]

when fragmentation exists and the read-ahead manager is unable to read the larger block sizes, it can lead to a decrease in overall disk throughput [10]

{constraint}UNIQUE

allows the columns that make up the constraint to allow NULLs, but it doesn’t allow all key columns to be NULL for more than one row [5]
unique indexes created using the CREATE INDEX command are no different from indexes created to support constraints [5]

[Query Optimizer] makes decisions based on the presence of the unique index rather than on whether the column was declared as a constraint [5]

⇐ is irrelevant how the index was created [5]

{constraint} PRIMARY KEY

all the columns involved in the PRIMARY KEY don’t allow NULL values

if any of the columns allow NULL values, the PRIMARY KEY constraint can’t be created [5]

its value is unique within the table

uniqueness defined via the UNIQUE constraint
⇒ a unique index is created on the columns that make up the constraint [5]

when a PRIMARY KEY or UNIQUE constraint is created, the underlying index structure that’s created is the same as though you had used the CREATE INDEX command directly [5]

⇐ however usage and features have some differences

a constraint-based index can’t have other features added (e.g. included columns, filters), while a UNIQUE index can have these features while still enforcing uniqueness over the key definition of the index [5]
when referencing a UNIQUE index - which doesn’t support a constraint - through a FOREIGN KEY constraint, indexes can’t be referenced with filters [5]

⇐ an index that doesn’t use filters or an index that uses included columns can be referenced [5]

⇐ the names of the indexes built to support these constraints are the same as the constraint names [5]

{concept} index design

designing and using proper indexes is one of the keys to maximizing query performance [1]

over-indexing

can be worse than under-indexing
extreme case: index every column

under-indexing

extreme case: having no index

very complex, time-consuming, and error-prone process

even for moderately complex databases and workloads [12]

requires knowing

the data
the workload
how SQL Server works [5]

index internals
statistics
query optimization
maintenance

{activity} understand the characteristics of the database itself [12]

e.g. characteristics of OLAP vs OLTP

{activity}understand the characteristics of the most frequently used queries [12]
{activity} understand the characteristics of the columns used in the queries [12]

data type
column uniqueness

unique indexes are more useful for the query optimizer than nonunique indexes [12]

data distribution in the column
NOT NULL values percentage
columns with categories of values
columns with distinct ranges of values
sparse columns
computed columns

{activity} determine which index options might enhance performance when the index is created or maintained [12]

e.g. creating a clustered index on an existing large table would benefit from the ONLINE index option

{activity} determine the optimal storage location for the index [12]

storage location of indexes can improve query performance by increasing disk I/O performance [12]

⇐ storing a nonclustered index on a filegroup that is on a different disk than the table filegroup can improve performance because multiple disks can be read at the same time [12]

[partitioning] clustered and nonclustered indexes can use a partition scheme across multiple filegroups

⇐ [partitioning] makes large tables or indexes more manageable by letting you access or manage subsets of data quickly and efficiently, while maintaining the integrity of the overall collection
{default} indexes created on a partitioned table will also use the same partitioning scheme and partitioning column [9]

⇐ the index is aligned with the table [9]
allows for easier management and administration, particularly with the sliding-window scenario [9]

{guideline} avoid over-indexing heavily updated tables

large numbers of indexes on a table affect the CRUD statements [12]

{guideline} use many indexes to improve query performance on tables with low update requirements, but large volumes of data [12]
{guideline} indexing small tables may not be optimal

because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan [12]

{guideline} indexes on views can provide significant performance gains when the view contains

aggregations
table joins
combination of aggregations and joins
views does not have to be explicitly referenced in the query for the query optimizer to use it [12]

{guideline} keep your indexes as compact and narrow as possible while still meeting the business needs your system was designed to address [2]

⇐ the index needs to be efficient

{guideline}use the Database Engine Tuning Advisor to analyze your database and make index recommendations [12]
{guideline} create nonclustered indexes on the columns that are frequently used in predicates and join conditions in queries [12]
{guideline} write queries that insert or modify as many rows as possible in a single statement

instead of using multiple queries to update the same rows
exploits optimized index maintenance

{guideline} consider using filtered indexes on columns that have well-defined subsets [12]

e.g. sparse columns
e.g. columns with mostly NULL values
e.g. columns with categories of values
e.g. columns with distinct ranges of values
e.g. consider indexing computed columns

{recommendation} use a couple of relatively narrow columns that, together, can form a unique key [11]

saves the cost of the uniqueifier in the data pages of the leaf level of your clustered index [11]
still results in an increase increase of the row size for your clustering key in the index pages of both your clustered and nonclustered indexes

{recommendation} carefully design and determine when and what columns should be included [1]

determine whether the gains in query performance outweigh the effect on performance during data modification and the additional disk space requirements [1]

{recommendation} keep the index size small

tends to be efficient [1]
features such as INCLUDE and filtered indexes can profoundly affect the index in both size and usefulness [5]

{recommendation} have all the data required by the query obtained from the covering index without touching the clustered index or table data page

results in lesser I/O and better query throughput
poorly designed indexes and a lack of indexes are primary sources of database application bottlenecks [12]
the right index, created for the right query, can reduce query execution time from hours down to seconds [5]

without a useful index the entire table is scanned to retrieve the data
in joins some tables may be scanned multiple times to find all the data needed to satisfy the query [2]

{recommendation} create a clustered index on all user tables [14]

the performance benefits of having a clustered index on a table outweigh the negatives ones [14]

tables with clustered indexes have a higher 'page splits/sec' values [14]
heaps can have higher throughput (rows/sec) for individual or batch CRUD operations [14]
concurrent insert operations performed are often susceptible to encountering hot spots [14]

the contention at the particular insert location in the table adversely affects performance [14]

disk space required is smaller than for a heap [14]
deleting rows frees up more disk space when compared to a heap table

[clustered indexes] empty extents are automatically deallocated from the object [14]
[heaps] extents are not deallocated automatically [14]

extents are held on for later reuse [14]

results in table size bloating [14]

the residual disk space for heap tables could be reclaimed by using maintenance tasks [14]

⇐ requires additional work and processing [14]

{recommendation} choose the optimum sort order by examining the individual workload queries [17]
{recommendation} limit changes to no more than 1-2 indexes

metadata

⇐{recommendation}for the data not persisted make periodic copies to use the data for further analysis
[SQL Server 2000] sys.sysindexes

system table that stores system-level information about indexes
a pointer to each index's root node is stored in sysindexes' root column
{limitation} doesn't support partition tables

use sys.indexes instead

{warning} susceptible to be removed in further versions

[SQL Server 2005] sys.indexes

contains a row per index or heap of a tabular object, such as a table, view, or table-valued function
metadata's visibility is limited to the securables a user either owns or to which has been granted permissions

[SQL Server 2005] sys.index_columns

contains one row per column that is part of an index or unordered table (heap).

[SQL Server 2005] sys.dm_db_index_physical_stats

returns the size and fragmentation information for the data and indexes of the specified table or view
useful for determining the size and health of indexes [5]
replaces DBCC SHOWCONTIG
when the function is called, SQL Server traverses the page chains for the allocated pages for the specified partitions of the table or index [5]

has similar parameters as sys.dm_db_index_physical_stats

[SQL Server 2008] sys.dm_db_missing_index_group_stats

returns summary information about missing index groups [17]
updated by every query execution

⇒ not by every query compilation or recompilation

not persisted

[SQL Server 2008] sys.dm_db_missing_index_groups

returns information about a specific group of missing indexes [17]

⇐ the group identifier and the identifiers of all missing indexes that are contained in that group [17]

not persisted
{required permissions} VIEW SERVER STATE
{required permissions}[SQL Server 2022] VIEW SERVER PERFORMANCE

[SQL Server 2008] sys.dm_db_missing_index_details

DMV that returns detailed information about missing indexes [17]
updated when a query is optimized by the query optimizer
not persisted
{limitation} the result set is limited to 600 rows

⇒ address the missing indexes issues to see further values

{required permissions} VIEW SERVER STATE
{required permissions}[SQL Server 2022] VIEW SERVER PERFORMANCE STATE on the server

[SQL Server 2008] sys.dm_db_missing_index_columns

returns information about the database table columns that are missing an index [17]
not persisted
{required permissions} VIEW SERVER STATE
{required permissions}[SQL Server 2022] VIEW SERVER PERFORMANCE

{undocumented} [SQL Server 2012] sys.dm_db_database_page_allocations

returns a row for every page used or allocated based on the given parameters [5]
replaces the DBCC IND undocumented command [5]

[SQL Server 2019] sys.dm_db_page_info

function that returns information about a page in a database

returns one row that contains the header information from the page, including the object_id, index_id, and partition_id

replaces DBCC PAGE in most cases

{command} DBCC SHOWCONTIG
- determines how full the pages in a table and/or index really are
- shows three types of fragmentation:
- scan density
- {concept} average page density
- key indicators
- {option} WITH FAST
{command} DBCC INDEXDEFRAG

defragments index pages
defragments only individual indexes [10]

⇒ cannot rebuild all indexes with a single statement [10]

performs an in-place update of the pages to defragment them [20]

⇐ does not require locking the table for the entirety of the process, [20]

⇒ allows users to access the table [20]

only defragments the pages within the extents [20]

⇐ it does not defragment the extents [20]

⇐ doesn’t affect extent scan fragmentation

⇒ reduces logical scan fragmentation

online operation
skips locked pages as it encounters them

⇒ may not be able to completely eliminate fragmentation [18] [6]

each unit of work performed occurs as a separate transaction [18]

⇒ can be stopped and restarted without losing any work [18]

{drawback} does not help to untangle indexes that have become interleaved within a data file [10]

even if indexes can be rebuilt so that there is minimal interleaving, this does not have a significant effect on performance [10]

{drawback} does not correct extent fragmentation on indexes [10]
{drawback} can produce large transaction log files [20]

{workaround} decrease the backup interval for transaction backups while DBCC INDEXDEFRAG is running [20]
⇐ keeps the transaction log files from growing too large [20]

{drawback} can make extent switching worse [20]

because a page order scan, which might be in order after DBCC INDEXDEFRAG, might switch between more extents [20]
{phase: 1} compacts the pages and attempt to adjust the page density to the fillfactor that was specified when the index was created

attempts to raise the page-density level of pages to the original fillfactor [10]

{phase: 2} defragments the index by shuffling the pages so that the physical ordering matches the logical ordering of the leaf nodes of the index [10]

⇐ performed as a series of small discrete transactions

has a small impact to overall system performance [10]

{command} DBCC DBREINDEX

rebuilds an index or all indexes on a table

⇐ builds indexes the same way as CREATE INDEX [10]

restores the page density levels to the original fillfactor or to the provided value [10]
rebuilds statistics automatically during the rebuild

have dramatic improvements on workload performance [10]

can take advantage of multiple-processor computers [10]

can be significantly faster when rebuilding large or heavily fragmented indexes [10]

rearranges the extents and the pages within a table so that they are in index order [19]

⇐ pages scan faster when they have been written in heap memory in the same order as their next page's pointers (index order) [19]
extents can be paged from disk faster during a scan operation when they're written to storage in the same order as their next extent pointers (index order) [19]

offline operation
its effectiveness can be influenced by the available free space [10]

⇐ without large enough contiguous blocks of free space, DBREINDEX may be forced to reuse other areas of free space within the data files, resulting in indexes being rebuilt with a small amount of logical fragmentation [10]
the amount of free space needed varies and is dependent on the number of indexes being created in the transaction [10]

{guideline} [clustered indexes] required free space = 1.2 * (average row size) * (number of rows) [10]
[nonunique clustered index] when rebuilt is also need free space for both the clustered and any nonclustered indexes

implicitly rebuilt because SQL Server must generate new unique identifiers for the rows [10]
{guideline} [nonclustered indexes] average row size *number of rows

average row size of each row in the nonclustered index considers the length of the nonclustered key plus the length of clustering key or row ID [10]

runs as one atomic transaction [18]

when stopped the changes are rolled back [18]

[clustered indexes][locking] puts an exclusive table lock on the table [19]

prevents any users from accessing the table [19]

[nonclustered indexes][locking] puts a shared table lock

prevents all but SELECT operations from being performed on it [19]

{good practice} specify individual indexes for defragmentation [10]

⇐ gives more control over the operations being performed [10]
⇐ can help to avoid unnecessary work [10]

{limitation} does not reduce page density levels on pages that currently have a higher page density than the original fillfactor [10]
{advantage} as the fragmentation level and size of the indexes increase, DBCC DBREINDEX can rebuild the indexes much faster than DBCC INDEXDEFRAG [10]
via DBCC DBREINDEX (‘<table_name>')

Acronyms:
CI - clustered index
CRUD - Create, Read, Update, Delete
DBCC - Database Console Command
DMV - Dynamic Management View
IAM - Index Allocation Map
NCI - nonclustered index
UCI - unique clustered index

Resources:
[1] Scalability Experts (2005) "Microsoft® SQL Server 2005: Changing the Paradigm",
[2] Ken Henderson (2003) "Guru's Guide to SQL Server Architecture and Internals"
[3] Paul S Randal (2011) SQL Q&A: The Lore of Logs (link)
[4] Kalen Delaney et al (2009) "Microsoft® SQL Server® 2008 Internals"
[5] Kalen Delaney (2013) Microsoft SQL Server 2012 Internals
[6] Kalen Delaney (2006) "Inside Microsoft® SQL Server™ 2005: The Storage Engine"
[7] Jason Strate & Grant Fritchey (2015) "Expert Performance Indexing in SQL Server"
[8] Microsoft Learn (2023) CREATE INDEX (link)
[9] Kimberly L Tripp (2005) Partitioned Tables and Indexes in SQL Server 2005 (link)
[10] Mike Ruthruff (2009) Microsoft SQL Server 2000 Index Defragmentation Best Practices [white paper]
[11] Michelle Ufford (2011) Effective Clustered Indexes, (link)
[12] Technet (2016) SQL Server Index Design Guide (link)
[13] Bob Dorr (2015) How It Works: MAX DOP Level and Parallel Index Builds (old link)
[14] Stuart Ozer et al (2007) SQL Server Best Practices Article (link)
[15] Microsoft Learn (2023) Optimize index maintenance to improve query performance and reduce resource consumption (link)
[16] MSDN (2015) Specify Fill Factor for an Index (link)
[17] Technet (2015) Finding Missing Indexes (link)
[18] MSDN Blogs (2010) Notes - SQL Server Index Fragmentation, Types and Solutions, by Pankaj Mittal (old link)
[19] Wayne W Berry (2010) The DBCC DBREINDEX Command
[20] Wayne W Berry (2010) The DBCC INDEXDEFRAG Command
[21] Microsoft Learn (2023) SQL Server and Azure SQL index architecture and design guide (link)

14 January 2024

SQL Server: The Query Plan (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources (some unfortunately not available anymore).

Query plan (aka query execution plan, execution plan)

read-only reentrant structure that is shared by multiple users [1]

optimized version of the query tree [7]

defines

the sequence in which the source tables are accessed [7]
the methods used to extract data from each table [7]

whether it uses an index or a table scan [7]

the methods used to compute calculations [7]
the methods used to filter, aggregate, and sort data from each table [7]

built by the Query Optimizer

it chooses a query plan using a set of heuristics to balance compilation time and plan optimality in order to find a good query plan [7]

stored in the plan cache

remain in the plan cache as long as there is enough memory to store them

under memory pressure exists, a cost-based approach is used to determine which execution plans to remove from the plan cache [7]

execution plans are examined multiple times until enough have been removed to satisfy memory requirements [7]

⇒ an execution plan may have its cost increased and decreased more than once [7]

the database engine uses the resource monitor and user worker threads to free memory from the plan cache in response to memory pressure [7]

examine plans run concurrently to decrease the current cost for each unused execution plan [7]
the resource monitor removes execution plans from the plan cache when global memory pressure exists [7]

it frees memory to enforce policies for system memory, process memory, resource pool memory, and maximum size for all caches [7]

the user worker threads remove execution plans from the plan cache when single cache memory pressure exists [7]

it enforce policies for maximum single cache size and maximum single cache entries [7]

when memory pressure no longer exists, the decreasing of current cost of unused execution plans is stopped [7]

all execution plans remain in the plan cache, even if their cost is zero [7]

one or all plans can be removed from the plan cache [7]

via DBCC FREEPROCCACHE
via DBCC FREESYSTEMCACHE can also be used to clear any cache, including plan cache
[SQL Server 2016] via ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE

clears the procedure (plan) cache for the database in scope [7]

changes in some configuration settings via sp_configure and reconfigure cause plans to be removed from plan cache [7]

agnostic to both parameter values and user-specific information

no user context is stored in the query plan [5]

the text of each newly submitted query is hashed and cached in memory, along with the original query text and a link to the execution context and the execution plan that the original query resulted in [2]

allows avoid redundantly parsing and optimizing queries [2]

must first go through a conversion to be passed to the LPE process to be carried out

⇐ regardless of the stage that actually produces the execution plan
this is handled by the conversion stage of the optimization process
each optimized query plan is processed by this stage before being returned for execution [3]

[SQL Server 2005] is put in the plan cache when a cacheable batch is submitted for execution and compiled [1]

a portion from the buffer pool is used for caching query plans (aka plan cache) [3]

[SQL Server 2005] there are at most two instances of a query plan at any time in plan cache [1]

one for all of the serial executions
one for all of the parallel executions

common for all parallel executions, regardless of their degrees of parallelism [5]
a necessary condition to compile a parallel query plan is that the minimum of the number of processors must be more than one after having survived:

the processor affinity mask
the 'max degree of parallelism' server-wide option (possibly set using the "sp_configure" stored procedure)
[SQL Server 2008] the MAXDOP limits imposed on the resource pool that the query is executing within, by the Resource Governor
any MAXDOP hint on the query

if two identical queries posed by the same user using two different sessions with the same session options arrive simultaneously, two query plans exists while they execute [3]

at the end of their executions, the plan for only one of them is retained in the plan cache [3]

can be cached (aka query plan caching)

query plans are cashed for many types of batches [3]

an exact text match for two ad-hoc queries before reuse can occur [3]

the text match is both case- and space-sensitive, even on case-insensitive servers [3]

if a query plan is not cached, its reuse opportunity is zero [3]

such a plan will be compiled every time it is executed, potentially resulting in poor performance [3]
in some cases, non-caching is a desirable outcome [3]

can be reused (aka query plan reuse)

when the plan cache is searched for plan reuse opportunities, the comparisons are against query plans, not against execution contexts [3]
once a reusable query plan is found, an available execution context is found (causing execution context reuse) or freshly generated [3]
does not necessarily imply execution context reuse [3]
query plans should be reused whenever possible and optimal to avoid unnecessary query compilation costs [3]
factors that affect plan-reuse:

[stored procedures and triggers] the execution plan is executed separately from the execution plan for the batch calling the stored procedure or firing the trigger [7]

allows for greater reuse of their execution plans [7]
a query plan for a stored procedure is not reused when executed across different databases

doesn’t apply to ad-hoc queries, prepared queries, or dynamic SQL

trigger execution

the number of rows affected by that execution (1 versus n) is a distinguishing factor in determining a plan cache hit
⇐ specific to triggers, and does not apply to stored procedures.
[SQL Server 2005] INSTEAD OF triggers

the "1-plan" is shared by executions that affect both 0 and 1 row

[SQL Server 2005] non-INSTEAD OF ("after") triggers

"1-plan" is only used by executions that affect 1 row and "n-plan" is used by executions that affect both 0 and n rows (n > 1)

triggers associated with bulk inserts are cached.

bulk insert statements are never cached
a batch that contains any one literal longer than 8 KB is not cached

a literal's length is measured after constant folding is applied

batches flagged with the "replication flag" associated with a replication user are not matched with batches without that flag.
batch called from CLR is not matched with the same batch submitted from outside of CLR

⇐ applies to CLR triggers and non-CLR triggers
two CLR-submitted batches can reuse the same plan.

notification queries and non-notification queries
query plans for queries submitted via sp_resyncquery are not cached

if the query is resubmitted (via sp_resyncquery or otherwise), it needs to be compiled again

batches within a cursor with non-cursor batches
certain SET options (see [3])

{best practice} to avoid SET option-related recompilations, establish SET options at connection time, and ensure that they do not change for the duration of the connection [3]

batches with unqualified object names result in non-reuse of query plans.
stored procedure created with "CREATE PROCEDURE ...WITH RECOMPILE
stored procedure executed using "EXEC ... WITH RECOMPILE"

trace flag 8675

determines when the optimizer times out while generating a query plan [2]
tells when the optimizer reaches the memory limit to which SQL Server restricts it [2]

⇐ about 80% of the BPool

{concept} trivial plan

results when the query optimizer determines that given the tables referenced in the query and the indexes existing on them, only one plan is possible [3]

usually a single table is referenced
doesn't require any cost estimation or comparisons between plans [2]

in XML plan will appear statementOptmLevel="TRIVIAL"
a query that has generated a trivial plan may not always generate a trivial plan

e.g. new indexes might be created on the underlying tables, and so multiple access paths become available to the query optimizer

may be generated for certain types of covered queries, joins between tables with constraint relationships, DML statements [2]
a query that contains a subquery is not eligible for a trivial plan [2]
using a variable in the query doesn't impair the optimizer's ability to select a trivial plan [12]

regardless of variable's value at runtime, a simple equality operator is being used to compare the variable with a uniquely indexed column, and the optimizer intrinsically knows that a trivial plan is the best option [2]

8759 trace flag

allows detecting when a trivial plan has been chosen [2]
{enabled} the optimizer writes the first portion of a query for which a trivial plan is created to the error log [2]

{concept} execution context

data structure that holds parameter values and user-specific information specific to the executed query
derived from a query plan 'on the fly' [3]

before a batch execution begins, a skeleton execution context is generated

as execution proceeds, the necessary execution context pieces are generated and put into the skeleton

⇒ two execution contexts need not be identical even after user-specific information and query parameters are deleted from them

because structures of execution contexts derived from the same query plan can differ from one another, the execution context used for a particular execution has slight impact on performance [3]

the impact of the differences diminishes over time as the plan cache gets 'hot' and as a steady state is reached [3]

holds the values that are needed for a specific execution of a query plan [3]
multiple execution contexts may exist for a query plan
⇒ it cannot exist in the plan cache without an associated query plan [3]

⇒ whenever a query plan is removed from plan cache, all of the associated execution contexts are also removed along with it [3]

cached (aka execution context caching)
can be reused (aka execution context reuse)
each user concurrently executing a batch will have an execution context
is not reentrant

⇐ at any point of time, an execution context can execute only one batch submitted by a session

⇐ while the execution is happening, the context is not given to any other session or user [3]

if a batch execution generates an error of severity 11 or higher, the execution context is destroyed [3]

⇒ even in the absence of memory pressure the number of execution contexts (for a given query plan) cached in plan cache can go up and down [3]

if a parallel query plan is compiled, a serial execution context may be derived from the parallel plan at execution time by disabling parallelism-specific plan operators [3]
execution contexts that are associated with parallel branches of the parallel plan are not cached, however serial branches of the parallel plan are cached [3]

{concept} plan cost

associated with query plans and execution contexts
partially controls how long the plan or context will live in the plan cache
the cost of an ad-hoc query is zero

[SQL Server 2000] for a query plan, the cost is a measure of the server resources (CPU time and I/O) it takes the query optimizer to optimize the batch

for ad-hoc queries, the cost is zero

[SQL Server 2000] for an execution context, the cost is a measure of the server resources (CPU time and I/O) it takes for the server to initialize the execution context so that the individual statements become ready for execution

doesn't include costs (CPU and I/O) incurred during batch executions
typically, execution context costs are lower than query plan costs

[SQL Server 2005] the cost of a query plan is a measure of the amount of resources required to produce it

the cost is calculated in 'number of ticks' with a maximum value of 31

the individual parts of the cost are calculated as follows [3]

two I/Os cost 1 tick, with a maximum of 19 ticks
two context switches cost 1 tick, with a maximum of 8 ticks
sixteen memory pages (128 KB) cost 1 tick, with a maximum of 4 ticks

as soon as the size of the plan cache reaches 50% of the buffer pool size, the next plan cache access decrements the ticks of all of the plans by 1 each [3]

because this decrement is piggybacked on a thread that accesses the plan cache for plan lookup purpose, the decrement can be considered to occur in a lazy fashion [3]

if the sum of the sizes of all of the caches reaches or exceeds 75% of the buffer pool size, a dedicated resource monitor thread gets activated, and it decrements tick counts of all of the objects in all of the caches

this thread's behavior approximates that of the lazy-writer thread in SQL Server 2000

cost = I/O cost + context switch cost (a measure of CPU cost) + memory cost
the plan cache is distinct from the data cache
there are other functionality-specific caches
the lazy-writer process does not decrement costs

query plan reuse causes the query plan cost to be reset to its initial value [3]

{concept} query optimization

the multi-phased process of selecting a 'good-enough' query execution plan from potentially many possible plans [7]
involves the overhead of selecting the query and selecting a plan

the overhead is typically saved several-fold when the query optimizer picks an efficient execution plan [7]

{concept} [SQL Server 2022] optimized plan forcing

part of intelligent query processing family of features
reduces compilation overhead for repeating forced queries [7]

requires the Query Store to be enabled and in 'read write' mode
once the query execution plan is generated, specific compilation steps are stored for reuse as an optimization replay script [7]

stored as part of the compressed showplan XML in Query Store, in a hidden OptimizationReplay attribute [7]

{concept} estimated execution plan

the compiled plan, as produced by the query optimizer [7]

{concept} actual execution plan

the compiled plan plus its execution context [7]
includes runtime information available after the execution completes [7]

e.g. execution warnings, or in newer versions of the Database Engine, the elapsed and CPU time used during execution [7]

{concept} Live Query Statistics

is the same as the compiled plan plus its execution context
includes runtime information during execution progress [7]

e.g. the actual number of rows flowing through the operators

updated every second. Runtime information includes for example [7]

{concept} query plan hash

a binary hash value calculated on the execution plan for a given query [7]
used to uniquely identify similar execution plans [7]

{concept} query hash

a binary hash value calculated on the Transact-SQL text of a query [7]
used to uniquely identify queries [7]

{concept} plan cache

part of the memory pool that is used to store execution plans [7]
has two stores for all compiled plans

Object Plans cache store (OBJCP)

used for plans related to persisted objects (stored procedures, functions, and triggers) [7]

SQL Plans cache store (SQLCP)

used for plans related to auto-parameterized, dynamic, or prepared queries [7]

has two additional stores that don't store compiled plans

Bound Trees cache store (PHDR)

used for data structures (aka bound trees, algebraizer trees) used during plan compilation for views, constraints, and defaults [7]

Extended Stored Procedures cache store (XPROC)

used for predefined system procedures that are defined using a DLL, not using Transact-SQL statements [7]

e.g. sp_executeSql or xp_cmdshell

the cached structure contains only the function name and the DLL name in which the procedure is implemented [7]

{concept} plan handle

transient identifier that allows to retrieve a compiled plan from the plan cache [7]

remains constant only while the plan remains in the cache [7]

hash value derived from the compiled plan of the entire batch [7]
remains the same even if one or more statements in the batch get recompiled [7]
if a plan was compiled for a batch instead of a single statement, the plan for individual statements in the batch can be retrieved using the plan handle and statement offsets [7]

{concept} recompile execution plans
- certain changes in a database can cause an execution plan to be either inefficient or invalid, based on the new state of the database [7]
- most recompilations are required either for statement correctness or to obtain potentially faster query execution plans [7]

Acronyms
MAXDOP - Maximum Degree of Parallelism

Resources:
[1] Scalability Experts, "Microsoft® SQL Server 2005: Changing the Paradigm", 2005
[2] Ken Henderson (2004) The Guru's Guide to SQL Server Architecture and Internals 55/12
[3] Greg Low, Plan Caching and Recompilation in SQL Server 2012, 2013 [3]
[4] MCITP Self-Paced Training Kit Exam 70-442
[5] Microsoft Learn (2012) Execution Plan Caching and Reuse (link)
[6] Microsoft Learn (2023) Execution plan overview (link)
[7] Microsoft Learn (2023) Optimized plan forcing with Query Store (link)
[8] Microsoft Learn (2023) Query processing architecture guide (link)

SQL Troubles

Pages