Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation.
Last updated: 19-Mar-2024
[SQL Server 2005] Statistics
- {def} objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view [2]
- used by query optimizer to estimate the cardinality (aka number of rows) in the query result
- created on any data type that supports comparison operations [4]
- metadata maintained about index keys and, optionally, nonindexed column values
- implemented via statistics objects
- can be created over most types
- generally data types that support comparisons (such as >, =, and so on) support the creation of statistics [20]
- the DBA is responsible for keeping the statistics objects up to date in the system [20]
- {restriction} the combined width of all columns constituting a single statistics set must not be greater than 900 bytes [21]
- statistics collected
- time of the last statistics collection (inside STATBLOB) [21]
- number of rows in the table or index (rows column in SYSINDEXES) [21]
- number of pages occupied by the table or index (dpages column in SYSINDEXES) [21]
- number of rows used to produce the histogram and density information (inside STATBLOB, described below) [21]
- average key length (inside STATBLOB) [21]
- histogram
- measures the frequency of occurrence for each distinct value in a data set [31]
- computed by the query optimizer on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view [31]
- when created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers [31]
- it sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps [31]
- each step includes a range of column values followed by an upper bound column value [31]
- the range includes all possible column values between boundary values, excluding the boundary values themselves [21]
- the lowest of the sorted column values is the upper boundary value for the first histogram step [31]
- defines the histogram steps according to their statistical significance [31]
- uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values
- the maximum number of steps is 200 [31]
- the number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points [31]
- stored only for the first column of a composite index
- the most selective columns in a multicolumn index should be selected first [2]
- the histogram will be more useful to the optimizer [2]
- single column histogram [21]
- splitting up composite indexes into multiple single-column indexes is sometimes advisable [2]
- used to estimate the selectivity of nonequality selection predicates, joins, and other operators [3]
- contains a sampling of up to 200 values for the index's first key column
- the values in a given column are sorted in ordered sequence
- divided into up to 199 intervals
- so that the most statistically significant information is captured
- in general, of nonequal size
- {type} table-level statistics
- statistics maintained on each table
- include:
- number of rows in the table [8]
- number of pages used by the table [8]
- number of modifications made to the keys of the table since the last update to the statistics [8]
- {type} index statistics
- created with the indexes
- updated with fullscan when indexes are rebuild the index [13]
- {exception} [SQL Server 2012] for partitioned indexes when the number of partitions >1000 it uses default sampling [13]
- {type} column statistics (aka non-index statistics)
- statistics on non-indexed columns
- determine the likelihood that a given value might occur in a column [2]
- gives the optimizer valuable information in determining how best to service a query [2]
- allows the optimizer to estimate the number of rows that will qualify from a given table involved in a join [2]
- allows to more accurately select join order [2]
- if automatic creation or updating of statistics is disabled, the Query Optimizer returns a warning in the showplan output when compiling a query where it thinks it needs this information [20]
- used by optimizer to provide histogram-type information for the other columns in a multicolumn index [2]
- the more information the optimizer has about data, the better [2]
- queries asking for data that is outside the bounds of the histogram will estimate 1 row and typically end up with a bad, serial plan [15]
- automatic created
- when nonindexed column is queried while AUTO_CREATE_STATISTICS is enabled for the database
- aren’t updated when an index is reorganized or rebuild [10]
- {exception} [SQL Server 2005] indexes rebuilt with DBCC dbreindex [10]
- updates index statistics as well column statistics [10]
- this feature will be removed in a future version of Microsoft SQL Server
- {type} [SQL Server 2008] filtered statistics
- useful when dealing with partitioned data or data that is wildly skewed due to wide ranging data or lots of nulls
- scope defined with the help of a statistic filter
- a condition that is evaluated to determine whether a row must be part of the filtered statistics
- the predicate appears in the WHERE clause of the CREATE STATISTICS or CREATE INDEX statements (in the case when statistics are automatically created as a side effect of creating an index)
- {type} custom statistics
- {limitation}[SQL Server] not supported
- {type} temporary statistics
- when statistics on a read-only database or read-only snapshot are missing or stale, the Database Engine creates and maintains temporary statistics in tempdb [22]
- a SQL Server restart causes all temporary statistics to disappear [22]
- the statistics name is appended with the suffix _readonly_database_statistic
- to differentiate the temporary statistics from the permanent statistics [22]
- reserved for statistics generated by SQL Server [2]
- scripts for the temporary statistics can be created and reproduced on a read-write database [22]
- when scripted, SSMS changes the suffix of the statistics name from _readonly_database_statistic to _readonly_database_statistic_scripted [22]
- {restriction} can be created and updated only by SQL Server [22]
- can be deleted and monitored [22]
- {operation} create statistics
- auto-create (aka auto-create statistics)
- on by default
- samples the rows by default
- {exception} when statistics are created as a by-product of index creation, then a full scan is used [3]
- {exception} statistics may not be created
- for tables where the cost of the plan execution would be lower than the statistics creation itself [21]
- when the server is too busy [21]
- e.g. too many outstanding compilations in progress [21]
- manually
- via CREATE STATISTICS
- only generates the statistics for a given column or combination of columns [21]
- {recommendation} keep the AUTO_CREATE_STATISTICS option on so that the query optimizer continues to routinely create single-column statistics for query predicate columns [22]
- {operation} update statistics
- covered by the same SQL Server Profiler event as statistics creation
- ensures that queries compile with up-to-date statistics [22]
- {recommendation} statistics should not be updated too frequently [22]
- because there is a performance tradeoff between improving query plans and the time it takes to recompile queries [22]
- tradeoffs depend from application to application [22]
- triggered by the executions of either commands:
- CREATE INDEX ... WITH DROP EXISTING
- scans the whole data set
- the index statistics are initially created without sampling
- allows to set the sample size in the WITH clause either by specifying
- FULLSCAN
- percentage of data to scan
- interpreted as an approximation
- sp_createstats stored procedure
- sp_updatestats stored procedure
- DBCC DBREINDEX
- rebuilds one or more indexes for a table in the specified database [21]
- DBCC INDEXDEFRAG or ALTER INDEX REORGANIZE operations don’t update the statistics [22]
- undocumented options
- meant for testing and debugging purposes
- should never be used on production systems [29]
- STATS_STREAM = stats_stream
- ROWCOUNT = numeric_constant
- incl PAGECOUNT, alter the internal metadata of the specified table or index by overriding the counters containing the row and page counts of the object [29]
- read by the Query Optimizer when processing queries that access the table and/or index in question [29]
- cheat the Optimizer into thinking that a table or index is extremely large [29]
- the content of the actual tables and indexes will remain intact [29]
- PAGECOUNT = numeric contant
- auto-update (aka auto statistics update)
- on by default
- samples the rows by default
- always performed by sampling the index or table [21]
- triggered by either
- query optimization
- execution of a compiled plan
- involves only a subset of the columns referred to in the query
- occurs
- before query compilation if AUTO_UPDATE_STATISTCS_ASYNC is OFF
- asynchronously if AUTO_UPDATE_STATISTCS_ASYNC is ON
- the query that triggered the update proceeds using the old statistics
- provides more predictable query response time for some workloads [3]
- particularly the workload with short running queries and very large tables [3]
- it tracks changes to columns in the statistics
- {limitation} it doesn’t track changes to columns in the predicate [3]
- {recommendation} if there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes [3]
- enable/disable statistics update
- database level
- ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS OFF
- {limitation} it’s not possible to override the database setting of OFF for auto update statistics by setting it ON at the statistics object level [3]
- table level
- NORECOMPUTE option of the UPDATE STATISTICS command
- CREATE STATISTICS command
- sp_autostats
- index
- sp_autostats
- statistics object
- sp_autostats
- [SQL Server 2005] asynchronous statistics update
- allows the statistics update operation to be performed on a background thread in a different transaction context
- avoids the repeating rollback issue [20]
- the original query continues and uses out-of-date statistical information to compile the query and return it to be executed [20]
- when the statistics are updated, plans based on those statistics objects are invalidated and are recompiled on their next use [20]
- {command}
- ALTER DATABASE... SET AUTO_UPDATE_STATISTICS_ASYNC {ON | OFF}
- manually
- {recommendation} don’t update statistics after index defragmentation
- update eventually only the column statistics
- [system tables] it might be needed to update statistics also on system tables when many objects were created
- triggers a recompilation of the queries [22]
- {exception} when a plan is trivial [1]
- won’t be generated a better or different plan [1]
- {exception} when the plan is non-trivial but no row modifications since last statistics update [1]
- no insert, delete or update since last statistics update [1]
- manually free procedure cache in case is needed to generate a new plan [1]
- schema change
- dropping an index defined on a table or an indexed view
- only if the index is used by the query plan in question
- [SQL Server 2000] manually updating or dropping a statistic (not creating!) on a table will cause a recompilation of any query plans that use that table
- the recompilation happens the next time the query plan in question begins execution
- [SQL Server 2005] dropping a statistic (not creating or updating!) defined on a table will cause a correctness-related recompilation of any query plans that use that table
- the recompilations happen the next time the query plan in question begins execution
- updating a statistic (both manual and auto-update) will cause an optimality-related (data related) recompilation of any query plans that uses this statistic
- update scenarios
- {scenario} query execution times are slow [22]
- troubleshooting: ensure that queries have up-to-date statistics before performing additional analysis [22]
- {scenario} insert operations occur on ascending or descending key columns [22]
- statistics on ascending or descending key columns (e.g. IDENTITY or real-time timestamp) might require more frequent statistics updates than the query optimizer performs [22]
- if statistics are not up-to-date and queries select from the most recently added rows, the current statistics will not have cardinality estimates for these new values [22]
- inaccurate cardinality estimates and slow query performance [22]
- after maintenance operations [22]
- {recommendation} consider updating statistics after performing maintenance procedures that change the distribution of data (e.g. truncating a table, bulk inserts) [22]
- this can avoid future delays in query processing while queries wait for automatic statistics updates [22]
- rebuilding, defragmenting, or reorganizing an index do not change the distribution of data [22]
- no need to update statistics after performing ALTER INDEX REBUILD, DBCC REINDEX, DBCC INDEXDEFRAG, or ALTER INDEX REORGANIZE operations [22]
- {operation} dropping statistics
- via DROP STATISTICS command
- {limitation} it’s not possible to drop statistics that are a byproduct of an index [21]
- such statistics are removed only when the index is dropped [21]
- aging
- [SQL Server 2000] ages the automatically created statistics (only those that are not a byproduct of the index creation)
- after several automatic updates the column statistics are dropped rather than updated [21]
- if they are needed in the future, they may be created again [21]
- there is no substantial cost difference between statistics that are updated and created [21]
- does not affect user-created statistics [21]
- {operation} statistics import)
- performed by running generated scripts (see statistics export )
- {operation}export (aka statistics export)
- performed via “Generate Scripts” task
- WITH STATS_STREAM
- value can be generated by using DBCC SHOW_STATISTICS WITH STATS_STREAM
- {operation} save/restore statistics
- not supported
- {operation} monitoring
- via dbcc show_statistics
- via SQL Server query profiler
- {option} AUTO_CREATE_STATISTICS
- when enabled the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan [22]
- these single-column statistics are created on columns that do not already have a histogram in an existing statistics object [22]
- it applies strictly to single-column statistics for the full table [22]
- statistics name starts with _WA
- does not determine whether statistics get created for indexes [22]
- does not generate filtered statistics [22]
- call: SELECT DATABASEPROPERTY(<database_name>','IsAutoCreateStatistics')
- call: select is_auto_create_stats_on from sys.databases where name = '<database_name'
- {option}AUTO_UPDATE_STATISTICS option
- when enabled the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query [22]
- statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view [22]
- determined by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold [22]
- the threshold is based on the number of rows in the table or indexed view [22]
- the query optimizer checks for out-of-date statistics
- before compiling a query
- the query optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date [22]
- before executing a cached query plan
- the Database Engine verifies that the query plan references up-to-date statistics [22]
- applies to
- statistics objects created for indexes, single-columns in query predicates [22]
- statistics created with the CREATE STATISTICS statement [22]
- filtered statistics [22]
- call: SELECT DATABASEPROPERTY(<database_name>','IsAutoUpdateStatistics')
- call: select is_auto_update_stats_on from sys.databases where name = '<database_name'
- {option} AUTO_UPDATE_STATISTICS_ASYNC
- determines whether the query optimizer uses synchronous or asynchronous statistics updates
- {default} off
- the query optimizer updates statistics synchronously
- queries always compile and execute with up-to-date statistics
- when statistics are out-of-date, the query optimizer waits for updated statistics before compiling and executing the query [22]
- {recommendation} use when performing operations that change the distribution of data [22]
- e.g. truncating a table or performing a bulk update of a large percentage of the rows [22]
- if the statistics are not updated after completing the operation, using synchronous statistics will ensure statistics are up-to-date before executing queries on the changed data [22]
- applies to statistics objects created for [22]
- indexes
- single columns in query predicates
- statistics created with the CREATE STATISTICS statement
- asynchronous statistics updates
- queries compile with existing statistics even if the existing statistics are out-of-date
- the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles [22]
- queries that compile after the asynchronous updates have completed will benefit from using the updated statistics [22]
- {warning} it will do nothing if the “Auto Update Statistics” database option isn’t enabled [30]
- {recommendation} scenarios
- to achieve more predictable query response times
- an application frequently executes the same query, similar queries, or similar cached query plans
- the query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the query optimizer can execute incoming queries without waiting for up-to-date statistics [22]
- this avoids delaying some queries and not others [22]
- an application experienced client request time outs caused by one or more queries waiting for updated statistics [22]
- in some cases waiting for synchronous statistics could cause applications with aggressive time outs to fail [22]
- call: select is_auto_update_stats_async_on from sys.databases where name = '<database_name'
- [SQL Server 2014] Auto Create Incremental Statistics
- when active (ON), the statistics created are per partition statistics [22]
- {default} when disabled (OFF), the statistics tree is dropped and SQL Server re-computes the statistics [22]
- this setting overrides the database level INCREMENTAL property [22]
- [partitions] when new partitions are added to a large table, statistics should be updated to include the new partitions [22]
- however the time required to scan the entire table (FULLSCAN or SAMPLE option) might be quite long [22]
- scanning the entire table isn't necessary because only the statistics on the new partitions might be needed [22]
- the incremental option creates and stores statistics on a per partition basis, and when updated, only refreshes statistics on those partitions that need new statistics [22]
- if per partition statistics are not supported the option is ignored and a warning is generated [22]
- not supported for several statistics types
- statistics created with indexes that are not partition-aligned with the base table [22]
- statistics created on AlwaysOn readable secondary databases [22]
- statistics created on read-only databases [22]
- statistics created on filtered indexes [22]
- statistics created on views [22]
- statistics created on internal tables [22]
- statistics created with spatial indexes or XML indexes [22]
References:
[1] Microsoft Learn (2025) SQL Server: Statistics [link]
[2] Microsoft Learn (2025) SQL Server: Configure auto statistics [link]
[3] Microsoft Learn (2009) SQL Server: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 [link]
[1] Microsoft Learn (2025) SQL Server: Statistics [link]
[2] Microsoft Learn (2025) SQL Server: Configure auto statistics [link]
[3] Microsoft Learn (2009) SQL Server: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 [link]
[4] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[8] Microsoft (2014) Statistical maintenance functionality (autostats) in SQL Server (kb 195565)
[8] Microsoft (2014) Statistical maintenance functionality (autostats) in SQL Server (kb 195565)
[10] CSS SQL Server Engineers (2015) Does rebuild index update statistics?
[link]
[16] Thomas Kejser's Database Blog (2011) The Ascending Key Problem in Fact
Tables – Part one: Pain! by Thomas Kejser
[20] Kalen Delaney et al (2013) Microsoft SQL Server 2012 Internals, 2013
[20] Kalen Delaney et al (2013) Microsoft SQL Server 2012 Internals, 2013
No comments:
Post a Comment