Showing posts with label query optimizer. Show all posts
Showing posts with label query optimizer. Show all posts

22 March 2025

💠🛠️🗒️SQL Server: Indexed Views [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation. 

Last updated: 22-Mar-2024

[SQL Server 2005] Indexed View

  • {def} a materialized view
    • materializes the data from the view queries, storing it in the database in a way similar to tables [6]
      • ⇒ its definition is computed and the resulting data stored just like a table [3]
      • the view is indexed by creating a unique clustered index on it
        • the resulting structure is physically identical to a table with a clustered index
          • ⇐ nonclustered indexes also are supported on this structure
      • can be created on a partitioned table, respectively can be partitioned [1]
    • {benefit} can improve the performance of some types of queries [3]
      • e.g. queries that aggregate many rows
      • ⇐ because the view is stored the same way a table with a clustered index is stored [1]
      • ⇐ not well-suited for underlying data that are frequently updated [3]
      •  more expensive to use and maintain than filtered indexes [5]
    • [query optimizer] 
      • can use it to speed up the query execution [1]
        • the view doesn't have to be referenced in the query for the optimizer to consider it for a substitution [1]
        • {downside} DML query performance can degrade significantly [1]
          • ⇐ in some cases, a query plan can't even be produced [1]
          • when executing UPDATE, DELETE or INSERT on the base table referenced, the indexed views must be updated as well [1]
          • {recommendation} test DML queries before production use [1]
            • analyze the query plan and tune/simplify the DML statemen [1]
      • can use the structure to return results more efficiently to the user
        • contains logic to use this index in either of the cases 
          • the original query text referenced the view explicitly [2]
          • the user submits a query that uses the same components as the view (in any equivalent order) [2]
          • ⇐ the query processor expands indexed views early in the query pipeline and always uses the same matching code for both cases [2]
            • the WITH(NOEXPAND) hint tells the query processor not to expand the view definition [2]
            • also instructs the query processor to perform an index scan of the indexed view rather than expand it into its component parts [5]
            • any extra rows in the indexed view are reported as 8907 errors [5]
            • any missing rows are reported as 8908 errors [5]
      • expose some of the benefits of view materialization while retaining the benefits of global reasoning about query operations [2]
      • expanded (aka in-lined) before optimization begins
        • gives the Query Optimizer opportunities to optimize queries globally [2]
        • makes it difficult for the (query) optimizer to consider plans that perform the view evaluation first, then process the rest of the query [2]
          • arbitrary tree matching is a computationally complex problem, and the feature set of views is too large to perform this operation efficiently [2]
      • cases in which it does not match the view
        • indexed views are inserted into the Memo and evaluated against other plan choices
          • while they are often the best plan choice, this is not always the case [2]
          • the Query Optimizer can detect logical contradictions between the view definition and the query that references the view [2]
        • there are also some cases where the Query Optimizer does not recognize an indexed view even when it would be a good plan choice [2]
          • often, these cases deal with complex interactions between high-level features within the query processor (e.g. computed column matching, the algorithm to explore join orders) [2]
          • consider the WITH (NOEXPAND) hint to force the query processor to pick that indexed view [2]
            •  this usually is enough to get the plan to include the indexed view [2]
        • indexed view alternatives 
          • are generated and stored in the Memo 
          • are compared using costing equations against other possible plans
          • partial matches cost the residual operations as well
            • an indexed-view plan can be generated but not picked when the Query Optimizer considers other plans to have lower costs [2]
        • maintained as part of the update processing for tables on which the view is based
          • this makes sure that the view provides a consistent result if it is selected by the Query Optimizer for any query plan [2]
          • some query operations are incompatible with this design guarantee
            • restrictions are placed on the set of supported constructs in indexed views to make sure that the view can be created, matched, and updated efficiently [2]
        • {operation} updating indexed views
          • the core question behind the restrictions is “Can the query processor compute the necessary changes to the Indexed View clustered and nonclustered indexes without having to recompute the whole indexed view?” [2]
            • if so, the query processor can perform these changes efficiently as part of the maintenance of the base tables that are referenced in the view[2]
              • this property is relatively easy for filters, projections (compute scalar), and inner joins on keys[2]
              • operators that destroy or create data are more difficult to maintain, so often these are restricted from use in indexed views. [2]
        • matching indexed views is supported in cases beyond exact matches of the query text to the view definition [2]
          • it also supports using an indexed view for inexact matches where the definition of the view is broader than the query submitted by the user [2]
            • then applies residual filters, projections (columns in the select list), and even aggregates to use the view as a partial precomputation of the query result [2]
    • {concept} statistics on indexed views
      • normally statistics aren't needed
        • because the substitution of the indexed views into the query plan is considered only after all the statistics for the underlying tables and indexes are attached to the query plan [3]
        • used if the view is directly referenced by the NOEXPAND hint in a FROM clause 
          • an error is generated and the plan is not created if the NOEXPAND hint is used on a view that does not also contain an index [3]
      • can’t be created by using sp_createstats or updated by using sp_updatestats. 
      • auto update and auto create statistics features work for indexed views
        • created manually
          • via CREATE STATISTICS on the indexed view columns
          • via UPDATE STATISTICS to update column or index statistics on indexed views
    • {operation} creating a view
      • requires that the underlying object’s schema can’t change
      • requires WITH SCHEMABINDING option [5]
      • ⇒ must include the two-part names of all referenced tables [5]
      • ⇐ the tables can't be dropped and the columns can't be altetd when participate in a view unless the view is tropped [5]
      • ⇐ an error is raised [5]
      • the user must hold 
        • the CREATE VIEW permission in the database [1]
        • ALTER permission on the schema in which the view is being created [1]
        • if the base table resides within a different schema, the REFERENCES permission on the table is required as a minimum [1]
        • if the user creating the index differs from the users who created the view, for the index creation alone the ALTER permission on the view is required [1]
    • {operation} creating an index on the view[
      • indexes can only be created on views that have the same owner as the referenced table or tables (aka intact ownership chain between the view and the tables) [1]
    • {operation} dropping a view
      • makes all indexes on the view to be dropped  [1]
        • ⇐ all nonclustered indexes and auto-created statistics on the view are dropped when the clustered index is dropped [1]
          • {exception} ser-created statistics on the view are maintained [1]
      • nonclustered indexes can be individually dropped [1]
      • dropping the clustered index on the view 
        • removes the stored result set [1]
        • the optimizer returns to processing the view like a standard view [1]
    • {operation} disable indexes on tables and views
      • when a clustered index on a table is disabled, indexes on views associated with the table are also disabled [1]
    • {option} EXPAND VIEWS
      • allows to prevent the Database Engine from using indexed views [1]
        • if any of the listed options are incorrectly set, this option prevents the optimizer from using the indexes on the views [1]
        • via OPTION (EXPAND VIEWS) hint
    • {recommendation} when using datetime and smalldatetime string literals in indexed views, explicitly convert the literal to the date type by using a deterministic date format style [1]
    • {limitation} AVG is not allowed {workaround} use SUM and COUNT_BIG (5]
    • {limitation} impacted by SET options [1]
      • {restriction} require fixed values for several SET options [1]
      • {recommendation} set the ARITHABORT user option to ON server-wide as soon as the first indexed view or index on a computed column is created in any database on the server [1]
    • {limitation} further requirements apply (see [1])
    • {limitation} aren't supported on top of temporal queries
      • ⇐ queries that use FOR SYSTEM_TIME clause).
    • {scenario}simplifying SQL queries
    • {scenario} abstracting data models from user models
    • {scenario} enforcing user security


References:
[1] Microsoft Learn (2024) SQL Server: Create indexed views [link]
[2] Kalen Delaney et all (2009) Microsoft® SQL Server® 2008 Internals
[3] Microsoft Learn (2024) SQL Server: Views [link]
[4] Microsoft Learn (2024) SQL Server: CREATE INDEX (Transact-SQL) [link]
[5] Kalen Delaney et all (2012) Microsoft® SQL Server® 2012 Internals
[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.

Resources:
[R1] Microsoft Learn (2024) SQL Server: Optimize index maintenance to improve query performance and reduce resource consumption [link]

Acronyms:
DML - Data Manipulation Language
QO - Query Optimizer

20 February 2025

💠🛠️🗒️SQL Server: Folding [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 20-Feb-2024

[SQL Server] Folding

  • {def} the process by which the optimizer is able to properly determine certain types of indexable expressions even when the column in the expression is involved in a subexpression or nestled in a function 
    • is an optimization over older versions of SQL Server in which the optimizer was unable to use an index to service a query clause when the table column was involved in an expression or buried in a function [1]
    • {type} constant folding 
      • some constant expression is evaluated early
      • foldable constant expressions [2]
        • arithmetical expressions 
        • logical expressions 
        • built-in functions whose input doesn’t depend of contextual information, 
          • e. g. SET options, language settings, database options, encryption keys
          • deterministic built-in functions are foldable, with some exceptions
        • certain forms of the LIKE predicate
        • [SQL Server 2012+] deterministic methods of CLR user-defined types [3]
        • [SQL Server 2012+] deterministic scalar-valued CLR user-defined functions [3]
      • nonfoldable expressions [2]
        • expressions whose results depend on a local variable or parameter
        • user-defined functions
          • both T-SQL and CLR
        • expressions whose results depend on language settings.
        • expressions whose results depend on SET options.
        • expressions whose results depend on server configuration options.
        • nonconstant expressions such as an expression whose result depends on the value of a column.
        • nondeterministic functions
        • if the output is a large object type, then the expressions are not folded 
          • e.g. text, image, nvarchar(max), varchar(max), varbinary(max), XML
      • {benefit} the expression does not have to be evaluated repeatedly at run time [2]
      • {benefit} the value of the expression after it is evaluated is used by the query optimizer to estimate the size of the result set of the portion of the query [2]
        • e.g. TotalDue > 117.00 + 1000.00
    • {type} nonconstant folding
      • some expressions that are not constant folded but whose arguments are known at compile time, whether the arguments are parameters or constants, are evaluated by the result-set size (cardinality) estimator that is part of the optimizer during optimization [2]
      • deterministic functions: 
        • e.g. UPPER, LOWER, RTRIM
        • e.g. DATEPART( YY only ), GetDate, CAST, CONVERT
      • operators 
        • arithmetic operators: +, -, *, /, unary -, 
        • logical Operators: AND, OR, NOT
        • comparison operators: <, >, <=, >=, <>, LIKE, IS NULL, IS NOT NULL


    References:
    [1] Ken Henderson (2003) Guru's Guide to SQL Server Architecture and Internals
    [2] Microsoft Learn (2012) SQL Server: Troubleshooting Poor Query Performance: Constant Folding and Expression Evaluation During Cardinality Estimation [link]
    [3] Microsoft Learn (2025) SQL Server: Query processing architecture guide [link]
    [4] SQLShack (2021) Query Optimization in SQL Server for beginners, by Esat Erkec [link]

    17 August 2009

    🛢DBMS: Query Optimizer (Definitions)

    "SQL Server code that analyzes queries and database objects and selects the appropriate query plan. The SQL Server optimizer is a cost-based optimizer. It estimates the cost of each permutation of table accesses in terms of CPU cost and I/O cost." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

    "A SQL server tool that formulates an optimum execution plan for a query." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

    "The SQL Server component responsible for generating the optimum execution plan for a query." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

    "The SQL Server database engine component responsible for generating efficient execution plans for SQL statements." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

    "A term applied to a process, within a database engine, that attempts to find the fastest method of executing a SQL command against a database." (Gavin Powell, "Beginning Database Design", 2006)

    "This is the component in SQL Server that analyzes your queries, compares them with available indexes, and decides which index will return a result set the fastest." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

    "An optimization process running within SQL Server. Any queries submitted to SQL Server are first processed by the query optimizer. It determines the best way to run the query, including what indexes to use and what types of joins to use. The output is a query execution plan, sometimes called a query plan or just a plan." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

    "A process that generates query plans. For each query, the optimizer generates a plan that matches the query to the index that will return results as efficiently as possible. The optimizer reuses the query plan each time the query runs. If a collection changes significantly, the optimizer creates a new query plan." (MongoDb, "Glossary", 2008)

    "The Optimizer is an internal technology that is responsible for selecting the most efficient means to accessing or altering information. It uses detailed statistics about the database to make the right decision." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

    "A part of a DBMS that examines a nonprocedural data manipulation request and makes a determination of the most efficient way to process that request." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

    "The component of a relational database system responsible for analyzing SQL queries and producing optimal access paths for retrieving data from the database." (Craig S Mullins, "Database Administration", 2012)

    "A component of the SQL and XQuery compiler that chooses an access plan for a data manipulation language statement by modeling the execution cost of many alternative access plans and choosing the one with the minimal estimated cost." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

    "Built-in database software that determines the most efficient way to execute a SQL statement by considering factors related to the objects referenced and the conditions specified in the statement." (Oracle)

    "The MySQL component that determines the best indexes and join order to use for a query, based on characteristics and data distribution of the relevant tables." (MySQL)

    16 March 2009

    🛢DBMS: Query Plan [QP] (Definitions)

    "The ordered set of steps required to carry out a query, complete with the access methods chosen for each table." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

    "A portion of a DBMS that determines the most efficient sequence of relational algebra operations to use to satisfy a query." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

    "The plan produced by an optimizer for processing a query." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

    "A query plan is a sequence of logical and physical operators and data flows that the SQL query optimizer returns for use by the query processor to retrieve or modify data." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

    "Once the query optimizer determines the best way to execute a query, it creates a query plan. This identifies all the elements of the query, including what indexes are used, what types of joins are employed, and more." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

    "A sequence of logical and physical operators and data flows that the SQL query optimizer returns for use by the query processor to retrieve or modify data." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

    [adaptive query plan:] "An execution plan that changes after optimization because run-time conditions indicate that optimizer estimates are inaccurate. An adaptive query plan has different built-in plan options. During the first execution, before a specific subplan becomes active, the optimizer makes a final decision about which option to use. The optimizer bases its choice on observations made during the execution up to this point. Thus, an adaptive query plan enables the final plan for a statement to differ from the default plan." (Oracle)

    [default plan:] "For an adaptive plan, the execution plan initially chosen by the optimizer using the statistics from the data dictionary. The default plan can differ from the final plan." (Oracle)

    [execution plan:] "The combination of steps used by the database to execute a SQL statement. Each step either retrieves rows of data physically from the database or prepares them for the session issuing the statement." (Oracle)

    [query execution plan:] "The set of decisions made by the optimizer about how to perform a query most efficiently, including which index or indexes to use, and the order in which to join tables." (MySQL)

    Related Posts Plugin for WordPress, Blogger...

    About Me

    My photo
    Koeln, NRW, Germany
    IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.