Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation.
Last updated: 22-Mar-2024
[SQL Server 2005] Statistics
- {def} objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view [2] [see notes]
- {issue} inaccurate statistics (aka bad statistics)
- the primary source for cardinality estimation errors [6]
- guessed selectivity can lead to a less-than-optimal query plan
- {cause} missing statistics
- if an error or other event prevents the query optimizer from creating statistics, the query optimizer creates the query plan without using statistics [22]
- the query optimizer marks the statistics as missing and attempts to regenerate the statistics the next time the query is executed [22]
- [SSMS] indicated as warnings when the execution plan of a query is graphically displayed [22]
- [SQL Server Profiler] monitoring the Missing Column Statistics event class indicates when statistics are missing [22]
- troubleshooting
- {step} verify that AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on [22]
- verify that the database is not read-only [22]
- the query optimizer cannot save statistics in read-only databases [22]
- create the missing statistics by using the CREATE STATISTICS statement [22]
- {cause} use of local variables
- {cause} non-constant-foldable expressions
- {cause} complex expressions
- {cause} ascending key columns
- can cause inaccurate statistics in tables with frequent INSERTs because new values all lie outside the histogram [3]
- it is not necessary for statistics to be highly accurate to be effective [18]
- {issue} stale statistics (aka outdated statistics)
- statistics are outdated based on the number of modifications of the statistics (and index) columns [6]
- SQL Server checks to see if the statistics are outdated when it looks up a plan from the cache [6]
- it recompiles the query if they are [6]
- ⇐ triggers a statistics update [6]
- [temporary tables] can increase the number of recompilations triggered by outdated statistics [6]
- [query hint] KEEPFIXED PLAN
- prevents query recompilation in cases of outdated statistics [6]
- queries are recompiled only when the schemas of the underlying tables are changed or the recompilation is forced
- e.g. when a stored procedure is called using the WITH RECOMPILE clause [6]
- can be identified based on
- Stats_Date(object_id, stats_id) returns the last update date
- Rowmodctr from sys.sysindexes
- has negative impact on performance
- almost always outweighs the performance benefits of avoiding automatic statistics updates/creation [2]
- ⇐ applies also to missing statistics [2]
- auto update/create stats should be disabled only when thorough testing has shown that there's no other way to achieve the performance or scalability you require [2]
- without correct and up-to-date statistical information, can be challenging to determine the best execution plan for a particular query [8]
- {issue} missing statistics
- {symptom} excessive number of logical reads
- often indicates suboptimal execution plans
- due to missing indexes and/or suboptimal join strategies selected
- because of incorrect cardinality estimation [6]
- shouldn't be used as the only criteria during optimization
- further factors should be considered
- resource usage, parallelism, related operators in the execution plan [6]
- {issue} unused statistics
- table can end up having man statistics that serve no purpose
- it makes sense to review and clean up the statistics as part of general maintenance
- based on the thresholds for the automatic update
- {issue} skewed statistics
- may happen when the sample rate doesn’t reflect the characteristics of values’ distribution
- {solution} using a higher sample rate
- update statistics manually with fullscan
- {issue} statistics accumulate over time
- statistics objects are persistent and will remain in the database forever until explicitly dropped
- this becomes expensive over time
- {solution} drop all auto stats
- will place some temporary stress on the system [11]
- all queries that adheres to a certain pattern that requires a statistics to be created, will wait [11]
- typically in a matter of minutes most of the missing statistics will be already back in place and the temporary stress will be over [11]
- {tip} the cleanup can be performed at off-peak hours and ‘warm-up’ the database by capturing and replaying a trace of the most common read only queries [11]
- create many of the required statistics objects without impacting your ‘real’ workload [11]
- {downside} statements may deadlock on schema locks with concurrent sessions [11]
- {issue} overlapping statistics
- column statistics created by SQL Server based on queries that get executed can overlap with statistics for indexes created by users on the same column [2]
Previous Post <<||>> Next Post
References:
[2] Ken Henderson (2003) Guru's Guide to SQL Server Architecture and Internals
[2] Ken Henderson (2003) Guru's Guide to SQL Server Architecture and Internals
[3] Kendal Van Dyke (2010) More On Overlapping Statistics [link]
[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[18] Joe Chang’s Blog (2012) Decoding STATS_STREAM, by Joe Chang
[22] MSDN (2016) Statistics [link]Resources:
Acronyms:
SSMS - SQL Server Management Studio
No comments:
Post a Comment