SQL Troubles: query optimizer


06 April 2025

🏭🗒️Microsoft Fabric: Query Optimizer in Warehouse [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 6-Apr-2025

[Microsoft Fabric] Hints in Warehouse

{def} keywords that users can add to SQL statements to provide additional information or instructions to the query optimizer [2]

options or strategies specified for enforcement by the SQL Server query processor [1]

applicable to SELECT, INSERT, UPDATE, or DELETE statements [1]

{benefit} help improve the performance, scalability, or consistency of queries by overriding the default behavior of the query optimizer [2]
{type} join hints

specify which join strategy/algorithm is used between two tables [3]

improves the performance of queries that involve large or complex joins [2]
specified in the FROM clause of a query [1]
if a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query [3]

based on the position of the ON keywords [3]

when a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order [3]

looping

via LOOP JOIN
{restriction} can't be specified together with RIGHT or FULL as a join type [3]

hashing

via HASH JOIN

merging

via MERGE JOIN

REPLICATE

causes a broadcast move operation

⇒ a specific table is replicated across all distribution nodes [2]

with INNER or LEFT join

the broadcast move operation will replicate the right side of the join to all nodes [2]

with RIGHT join

the broadcast move operation will replicate the left side of the join to all nodes [2]

with FULL join

an estimated plan cannot be created [2]

REDISTRIBUTE [(columns_count)]

ensures two data sources are distributed based on JOIN clause columns [2]
handles multiple join conditions, specified by the first n columns in both tables, where n is the columns_count argument [2]
redistributing data optimizes query performance by evenly spreading data across nodes during intermediate steps of execution [2]
the (columns_count) argument is only supported in MF warehouse [2]
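
A minimal sketch of how the join hints above are written in the FROM clause. The tables dbo.FactSales and dbo.DimCustomer and their columns are hypothetical, and the optional (columns_count) argument of REDISTRIBUTE is deliberately not shown:

```sql
-- Hypothetical tables: dbo.FactSales (large) and dbo.DimCustomer (small).

-- HASH: request a hash join between the two tables.
SELECT c.CustomerName, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
     INNER HASH JOIN dbo.DimCustomer AS c
        ON c.CustomerKey = s.CustomerKey
GROUP BY c.CustomerName;

-- REPLICATE: with an INNER join, the right side of the join
-- (here dbo.DimCustomer) is broadcast to all nodes.
SELECT c.CustomerName, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
     INNER REPLICATE JOIN dbo.DimCustomer AS c
        ON c.CustomerKey = s.CustomerKey
GROUP BY c.CustomerName;

-- REDISTRIBUTE: both inputs are distributed on the join column.
SELECT c.CustomerName, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
     INNER REDISTRIBUTE JOIN dbo.DimCustomer AS c
        ON c.CustomerKey = s.CustomerKey
GROUP BY c.CustomerName;
```

Because a join hint also enforces the join order for the whole query, it is worth comparing the plan against the unhinted version before keeping the hint.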

{type} query hint

specify that the indicated hints are used in the scope of a query [3]

affect all operators in the statement

[UNION] only the last query involving a UNION operation can have the OPTION clause [3]

specified as part of the OPTION clause [3]
Error 8622 occurs if one or more query hints cause the Query Optimizer not to generate a valid plan [3]

used via the OPTION clause at the end of a query [2]

followed by the name of the query hint and its optional parameters in parentheses [2]
multiple hints can be used in the same query, separated by commas

e.g. OPTION (FORCE ORDER, MAX_GRANT_PERCENT = 20)

instruct the QO to preserve the join order specified in the query and to limit the memory grant to 20 percent of the available memory
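
As an illustration, the two hints can be combined in a single OPTION clause at the end of the statement. The table and column names below are hypothetical:

```sql
-- Hypothetical tables; the join order written here is the one to preserve.
SELECT c.CustomerName, p.ProductName, SUM(s.SalesAmount) AS TotalSales
FROM dbo.DimCustomer AS c
     INNER JOIN dbo.FactSales AS s
        ON s.CustomerKey = c.CustomerKey
     INNER JOIN dbo.DimProduct AS p
        ON p.ProductKey = s.ProductKey
GROUP BY c.CustomerName, p.ProductName
OPTION (FORCE ORDER,              -- keep the join order as written
        MAX_GRANT_PERCENT = 20);  -- cap the memory grant at 20 percent
```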

{hint} HASH GROUP

specifies that the QO should use a hash-based algorithm for the GROUP BY operation [2]
{benefit} can improve the performance of queries that involve large or complex grouping sets [2]

{hint} ORDER GROUP

specifies that the QO should use a sort-based algorithm for the GROUP BY operation [2]
{benefit} can improve the performance of queries that involve small or simple grouping sets [2]
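
A short sketch of the two grouping hints, assuming a hypothetical dbo.FactSales table. Only one of them can be meaningfully applied to a given statement:

```sql
-- Hash-based aggregation for the GROUP BY.
SELECT s.ProductKey, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
GROUP BY s.ProductKey
OPTION (HASH GROUP);

-- Sort-based (stream) aggregation for the same GROUP BY.
SELECT s.ProductKey, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
GROUP BY s.ProductKey
OPTION (ORDER GROUP);
```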

{hint} MERGE UNION

specifies that the QO should use a merge-based algorithm for the UNION or UNION ALL operation [2]
{benefit} can improve the performance of queries that involve sorted inputs [2]

{hint} HASH UNION

specifies that the query optimizer should use a hash-based algorithm for the UNION or UNION ALL operation [2]
{benefit} can improve the performance of queries that involve unsorted or large inputs [2]

{hint} CONCAT UNION

specifies that the QO should use a concatenation-based algorithm for the UNION or UNION ALL operation [2]
{benefit} can improve the performance of queries that involve distinct or small inputs [2]
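
The UNION hints are applied the same way; a sketch with hypothetical tables dbo.OnlineSales and dbo.StoreSales. Note that the OPTION clause appears only once, after the last query of the UNION:

```sql
-- MERGE UNION: merge-based algorithm, useful when both inputs
-- can be delivered sorted on CustomerKey.
SELECT o.CustomerKey
FROM dbo.OnlineSales AS o
UNION
SELECT t.CustomerKey
FROM dbo.StoreSales AS t
OPTION (MERGE UNION);

-- HASH UNION: hash-based algorithm for unsorted or large inputs.
SELECT o.CustomerKey
FROM dbo.OnlineSales AS o
UNION
SELECT t.CustomerKey
FROM dbo.StoreSales AS t
OPTION (HASH UNION);
```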

{hint} FORCE ORDER

specifies that the QO should preserve the join order specified in the query [2]
{benefit} can improve the performance or consistency of queries that involve complex join conditions or hints [2]

{hint} FORCE SINGLE NODE PLAN/FORCE DISTRIBUTED PLAN

allows choosing whether to force a single-node plan or a distributed plan for the query's execution [2]
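
A hedged sketch of how these two hints would be written, assuming they follow the usual OPTION clause syntax; the table name is hypothetical:

```sql
-- Request a single-node plan, e.g. for a small lookup query.
SELECT COUNT(*) AS CustomerCount
FROM dbo.DimCustomer
OPTION (FORCE SINGLE NODE PLAN);

-- Request a distributed plan for the same statement.
SELECT COUNT(*) AS CustomerCount
FROM dbo.DimCustomer
OPTION (FORCE DISTRIBUTED PLAN);
```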

{hint} USE HINT

adds one or more extra hints to the query processor, where the hints are specified with a hint name inside single quotation marks inside the OPTION clause [2]

e.g. OPTION (USE HINT ('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'))

used with several hint names, changing the behavior of cardinality estimation (CE) derivation

{hint name} ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES

applies when calculating cardinality estimates for AND predicates for filters [2]
with this hint, MF assumes full correlation among filters; it can help produce a better estimate when a high level of underestimation on AND predicates for filters is observed [2]
[SQL Server] equivalent TF 4137 [4]

{hint name} ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES

applies when calculating cardinality estimates for AND predicates for filters [2]
MF assumes full independence among filters [2]

if a high level of overestimation on AND predicates for filters is observed, this hint can help produce a better estimate [2]

{hint name} ASSUME_PARTIAL_CORRELATION_FOR_FILTER_ESTIMATES

applies when calculating cardinality estimates for AND predicates for filters [2]
{default} MF assumes partial correlation among filters [2]

⇐ it is unlikely that this hint will help improve the estimates [2]

{hint name} ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS

applies when calculating cardinality estimate for joins [2]
uses Simple Containment assumption instead of the default Base Containment assumption [2]
[SQL Server] equivalent TF 9476 [4]
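
A minimal sketch of USE HINT with the hint names listed above. The table and the filter values are hypothetical; hint names go in straight single quotes and several names can be listed, separated by commas:

```sql
-- Two correlated filters (city and postal code) on a hypothetical table:
-- ask the CE to assume full correlation (minimum selectivity).
SELECT c.CustomerKey, c.CustomerName
FROM dbo.DimCustomer AS c
WHERE c.City = 'Seattle'
  AND c.PostalCode = '98101'
OPTION (USE HINT ('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));

-- Combining a filter-related and a join-related hint name.
SELECT c.CustomerName, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
     INNER JOIN dbo.DimCustomer AS c
        ON c.CustomerKey = s.CustomerKey
WHERE c.City = 'Seattle'
  AND c.PostalCode = '98101'
GROUP BY c.CustomerName
OPTION (USE HINT ('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES',
                  'ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS'));
```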

{type} table hints

none enforced currently


References:
[1] Microsoft Learn (2025) SQL: Hints (T-SQL) [link]

[2] Microsoft Fabric Updates Blog (2025) Hints in Fabric Data Warehouse [link]
[3] Microsoft Learn (2025) SQL: OPTION clause (T-SQL) [link]
[4] Microsoft Support (2016) KB3189813 - Update introduces USE HINT query hint argument in SQL Server 2016 [link]

Resources:
[R1] Microsoft Learn (2025) SQL Server: Query hints (T-SQL) [link]
[R2] Most Useful Query Hints [link]

Acronyms:
MF - Microsoft Fabric
TF - Trace Flag
QO - Query Optimizer

22 March 2025

💠🛠️🗒️SQL Server: Indexed Views [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation.

Last updated: 22-Mar-2024

[SQL Server 2005] Indexed View

{def} a materialized view

materializes the data from the view queries, storing it in the database in a way similar to tables [6]

⇒ its definition is computed and the resulting data stored just like a table [3]
the view is indexed by creating a unique clustered index on it

the resulting structure is physically identical to a table with a clustered index

⇐ nonclustered indexes also are supported on this structure

can be created on a partitioned table and can itself be partitioned [1]

{benefit} can improve the performance of some types of queries [3]

e.g. queries that aggregate many rows
⇐ because the view is stored the same way a table with a clustered index is stored [1]
⇐ not well-suited for underlying data that are frequently updated [3]
more expensive to use and maintain than filtered indexes [5]

[query optimizer]

can use it to speed up the query execution [1]

the view doesn't have to be referenced in the query for the optimizer to consider it for a substitution [1]
{downside} DML query performance can degrade significantly [1]

⇐ in some cases, a query plan can't even be produced [1]
when executing UPDATE, DELETE or INSERT on the base table referenced, the indexed views must be updated as well [1]
{recommendation} test DML queries before production use [1]

analyze the query plan and tune/simplify the DML statement [1]

can use the structure to return results more efficiently to the user

contains logic to use this index in either of the following cases:

the original query text referenced the view explicitly [2]
the user submits a query that uses the same components as the view (in any equivalent order) [2]
⇐ the query processor expands indexed views early in the query pipeline and always uses the same matching code for both cases [2]

the WITH(NOEXPAND) hint tells the query processor not to expand the view definition [2]
also instructs the query processor to perform an index scan of the indexed view rather than expand it into its component parts [5]
any extra rows in the indexed view are reported as 8907 errors [5]
any missing rows are reported as 8908 errors [5]

expose some of the benefits of view materialization while retaining the benefits of global reasoning about query operations [2]
expanded (aka in-lined) before optimization begins

gives the Query Optimizer opportunities to optimize queries globally [2]
makes it difficult for the (query) optimizer to consider plans that perform the view evaluation first, then process the rest of the query [2]

arbitrary tree matching is a computationally complex problem, and the feature set of views is too large to perform this operation efficiently [2]

cases in which it does not match the view

indexed views are inserted into the Memo and evaluated against other plan choices

while they are often the best plan choice, this is not always the case [2]
the Query Optimizer can detect logical contradictions between the view definition and the query that references the view [2]

there are also some cases where the Query Optimizer does not recognize an indexed view even when it would be a good plan choice [2]

often, these cases deal with complex interactions between high-level features within the query processor (e.g. computed column matching, the algorithm to explore join orders) [2]
consider the WITH (NOEXPAND) hint to force the query processor to pick that indexed view [2]

this usually is enough to get the plan to include the indexed view [2]
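
For illustration, the hint is written as a table hint on the view reference. The view dbo.vSalesByProduct and its columns are hypothetical, and the view is assumed to already have a unique clustered index:

```sql
-- Read directly from the indexed view instead of expanding its definition.
SELECT v.ProductID, v.TotalQty
FROM dbo.vSalesByProduct AS v WITH (NOEXPAND)
WHERE v.ProductID = 42;
```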

indexed view alternatives

are generated and stored in the Memo
are compared using costing equations against other possible plans
partial matches cost the residual operations as well

an indexed-view plan can be generated but not picked when the Query Optimizer considers other plans to have lower costs [2]

maintained as part of the update processing for tables on which the view is based

this makes sure that the view provides a consistent result if it is selected by the Query Optimizer for any query plan [2]
some query operations are incompatible with this design guarantee

restrictions are placed on the set of supported constructs in indexed views to make sure that the view can be created, matched, and updated efficiently [2]

{operation} updating indexed views

the core question behind the restrictions is “Can the query processor compute the necessary changes to the Indexed View clustered and nonclustered indexes without having to recompute the whole indexed view?” [2]

if so, the query processor can perform these changes efficiently as part of the maintenance of the base tables that are referenced in the view [2]

this property is relatively easy for filters, projections (compute scalar), and inner joins on keys [2]
operators that destroy or create data are more difficult to maintain, so often these are restricted from use in indexed views [2]

matching indexed views is supported in cases beyond exact matches of the query text to the view definition [2]

it also supports using an indexed view for inexact matches where the definition of the view is broader than the query submitted by the user [2]

then applies residual filters, projections (columns in the select list), and even aggregates to use the view as a partial precomputation of the query result [2]

{concept} statistics on indexed views

normally statistics aren't needed

because the substitution of the indexed views into the query plan is considered only after all the statistics for the underlying tables and indexes are attached to the query plan [3]
used if the view is directly referenced by the NOEXPAND hint in a FROM clause

an error is generated and the plan is not created if the NOEXPAND hint is used on a view that does not also contain an index [3]

can’t be created by using sp_createstats or updated by using sp_updatestats.
auto update and auto create statistics features work for indexed views

created manually

via CREATE STATISTICS on the indexed view columns
via UPDATE STATISTICS to update column or index statistics on indexed views
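
A sketch of the manual route, assuming a hypothetical indexed view dbo.vSalesByProduct with a TotalQty column; the statistics name is also made up:

```sql
-- Create column statistics on the indexed view.
CREATE STATISTICS st_vSalesByProduct_TotalQty
ON dbo.vSalesByProduct (TotalQty);

-- Refresh all statistics that exist on the indexed view.
UPDATE STATISTICS dbo.vSalesByProduct;
```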

{operation} creating a view

requires that the underlying object’s schema can’t change
requires WITH SCHEMABINDING option [5]
⇒ must include the two-part names of all referenced tables [5]
⇐ the tables can't be dropped and the columns can't be altered while they participate in a view, unless the view is dropped [5]
⇐ an error is raised [5]
the user must hold

the CREATE VIEW permission in the database [1]
ALTER permission on the schema in which the view is being created [1]
if the base table resides within a different schema, the REFERENCES permission on the table is required as a minimum [1]
if the user creating the index differs from the users who created the view, for the index creation alone the ALTER permission on the view is required [1]

{operation} creating an index on the view

indexes can only be created on views that have the same owner as the referenced table or tables (aka intact ownership chain between the view and the tables) [1]
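
Putting the two operations together, a minimal sketch with hypothetical objects (dbo.SalesOrderDetail with a non-nullable OrderQty column). The SET statements reflect the fixed SET option values that indexed views require:

```sql
-- SET options required when creating indexed views.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL,
    ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

-- The view must be schema-bound and use two-part names.
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT d.ProductID,
       SUM(d.OrderQty) AS TotalQty, -- OrderQty assumed NOT NULL
       COUNT_BIG(*)    AS RowCnt    -- required when GROUP BY is used
FROM dbo.SalesOrderDetail AS d
GROUP BY d.ProductID;
GO

-- The unique clustered index materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
ON dbo.vSalesByProduct (ProductID);
GO

-- Nonclustered indexes can be added afterwards.
CREATE NONCLUSTERED INDEX IX_vSalesByProduct_TotalQty
ON dbo.vSalesByProduct (TotalQty);
GO
```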

{operation} dropping a view

causes all indexes on the view to be dropped [1]

⇐ all nonclustered indexes and auto-created statistics on the view are dropped when the clustered index is dropped [1]

{exception} user-created statistics on the view are maintained [1]

nonclustered indexes can be individually dropped [1]
dropping the clustered index on the view

removes the stored result set [1]
the optimizer returns to processing the view like a standard view [1]
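
The drop operations above can be sketched as follows; the view and index names are hypothetical:

```sql
-- Drop a single nonclustered index; the view stays materialized.
DROP INDEX IX_vSalesByProduct_TotalQty ON dbo.vSalesByProduct;

-- Drop the clustered index: the stored result set is removed and
-- the view is processed like a standard view again.
DROP INDEX IX_vSalesByProduct ON dbo.vSalesByProduct;

-- Drop the view itself, together with any remaining indexes on it.
DROP VIEW dbo.vSalesByProduct;
```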

{operation} disable indexes on tables and views

when a clustered index on a table is disabled, indexes on views associated with the table are also disabled [1]

{option} EXPAND VIEWS

prevents the Database Engine from using indexed views [1]

if any of the required SET options are incorrectly set, the optimizer is likewise prevented from using the indexes on the views [1]
via OPTION (EXPAND VIEWS) hint
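
A short sketch of the hint in use, with a hypothetical base table. The query could otherwise be matched to an indexed view that aggregates the same data:

```sql
-- Expand indexed views: the optimizer does not substitute
-- an indexed view for any part of this query.
SELECT d.ProductID, SUM(d.OrderQty) AS TotalQty
FROM dbo.SalesOrderDetail AS d
GROUP BY d.ProductID
OPTION (EXPAND VIEWS);
```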

{recommendation} when using datetime and smalldatetime string literals in indexed views, explicitly convert the literal to the date type by using a deterministic date format style [1]
{limitation} AVG is not allowed {workaround} use SUM and COUNT_BIG [5]
{limitation} impacted by SET options [1]

{restriction} require fixed values for several SET options [1]
{recommendation} set the ARITHABORT user option to ON server-wide as soon as the first indexed view or index on a computed column is created in any database on the server [1]

{limitation} further requirements apply (see [1])
{limitation} aren't supported on top of temporal queries

⇐ queries that use the FOR SYSTEM_TIME clause
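
The AVG workaround mentioned above can be sketched as follows. It assumes a hypothetical indexed view dbo.vSalesByProduct that stores TotalQty = SUM(OrderQty) and RowCnt = COUNT_BIG(*) per product; the average is then derived at query time:

```sql
-- Derive the average from the stored SUM and COUNT_BIG columns.
-- The cast avoids integer division when TotalQty is an integer type.
SELECT v.ProductID,
       CAST(v.TotalQty AS decimal(19, 4)) / v.RowCnt AS AvgQty
FROM dbo.vSalesByProduct AS v WITH (NOEXPAND);
```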

{scenario} simplifying SQL queries
{scenario} abstracting data models from user models
{scenario} enforcing user security


References:
[1] Microsoft Learn (2024) SQL Server: Create indexed views [link]
[2] Kalen Delaney et al. (2009) Microsoft® SQL Server® 2008 Internals
[3] Microsoft Learn (2024) SQL Server: Views [link]
[4] Microsoft Learn (2024) SQL Server: CREATE INDEX (Transact-SQL) [link]

[5] Kalen Delaney et al. (2012) Microsoft® SQL Server® 2012 Internals

[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.

Resources:
[R1] Microsoft Learn (2024) SQL Server: Optimize index maintenance to improve query performance and reduce resource consumption [link]

Acronyms:
DML - Data Manipulation Language
QO - Query Optimizer

20 February 2025

💠🛠️🗒️SQL Server: Folding [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.

Last updated: 20-Feb-2024

[SQL Server] Folding

{def} the process by which the optimizer is able to properly determine certain types of indexable expressions even when the column in the expression is involved in a subexpression or nestled in a function

is an optimization over older versions of SQL Server in which the optimizer was unable to use an index to service a query clause when the table column was involved in an expression or buried in a function [1]

{type} constant folding

some constant expression is evaluated early
foldable constant expressions [2]

arithmetical expressions
logical expressions
built-in functions whose input doesn't depend on contextual information

e.g. SET options, language settings, database options, encryption keys
deterministic built-in functions are foldable, with some exceptions

certain forms of the LIKE predicate
[SQL Server 2012+] deterministic methods of CLR user-defined types [3]
[SQL Server 2012+] deterministic scalar-valued CLR user-defined functions [3]

nonfoldable expressions [2]

expressions whose results depend on a local variable or parameter
user-defined functions

both T-SQL and CLR

expressions whose results depend on language settings.
expressions whose results depend on SET options.
expressions whose results depend on server configuration options.
nonconstant expressions such as an expression whose result depends on the value of a column.
nondeterministic functions
if the output is a large object type, then the expressions are not folded

e.g. text, image, nvarchar(max), varchar(max), varbinary(max), XML

{benefit} the expression does not have to be evaluated repeatedly at run time [2]
{benefit} the value of the expression after it is evaluated is used by the query optimizer to estimate the size of the result set of the portion of the query [2]

e.g. TotalDue > 117.00 + 1000.00
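
To make the example concrete, a sketch against an AdventureWorks-style Sales.SalesOrderHeader table (assumed here). The first predicate is a candidate for constant folding, the second is not because it depends on a local variable:

```sql
-- Foldable: 117.00 + 1000.00 can be evaluated once at compile time,
-- so the predicate is treated as TotalDue > 1117.00.
SELECT h.SalesOrderID, h.TotalDue
FROM Sales.SalesOrderHeader AS h
WHERE h.TotalDue > 117.00 + 1000.00;

-- Not foldable: the result depends on a local variable.
DECLARE @threshold money = 117.00;

SELECT h.SalesOrderID, h.TotalDue
FROM Sales.SalesOrderHeader AS h
WHERE h.TotalDue > @threshold + 1000.00;
```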

{type} nonconstant folding

some expressions that are not constant folded but whose arguments are known at compile time, whether the arguments are parameters or constants, are evaluated by the result-set size (cardinality) estimator that is part of the optimizer during optimization [2]
built-in functions:

e.g. UPPER, LOWER, RTRIM
e.g. DATEPART (YY only), GETDATE, CAST, CONVERT

operators

arithmetic operators: +, -, *, /, unary -
logical operators: AND, OR, NOT
comparison operators: <, >, <=, >=, <>, LIKE, IS NULL, IS NOT NULL
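
A sketch of an expression that is not constant folded but whose argument is known at compile time. The procedure name is hypothetical and the table is an AdventureWorks-style Sales.SalesOrderHeader (assumed):

```sql
-- @d + 1 is not a constant, but when the procedure is compiled the
-- value of the parameter is known, so the cardinality estimator can
-- evaluate the expression and estimate the size of the result set.
CREATE PROCEDURE dbo.GetRecentOrderCount
    @d datetime
AS
BEGIN
    SELECT COUNT(*) AS OrderCount
    FROM Sales.SalesOrderHeader
    WHERE OrderDate > @d + 1;
END;
GO

EXECUTE dbo.GetRecentOrderCount @d = '20240101';
```

Had @d been reassigned inside the procedure before the query, the value seen at compile time could differ from the one used at run time, so estimates based on it would no longer be reliable.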


References:

[1] Ken Henderson (2003) Guru's Guide to SQL Server Architecture and Internals

[2] Microsoft Learn (2012) SQL Server: Troubleshooting Poor Query Performance: Constant Folding and Expression Evaluation During Cardinality Estimation [link]
[3] Microsoft Learn (2025) SQL Server: Query processing architecture guide [link]
[4] SQLShack (2021) Query Optimization in SQL Server for beginners, by Esat Erkec [link]

17 August 2009

🛢DBMS: Query Optimizer (Definitions)

"SQL Server code that analyzes queries and database objects and selects the appropriate query plan. The SQL Server optimizer is a cost-based optimizer. It estimates the cost of each permutation of table accesses in terms of CPU cost and I/O cost." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A SQL server tool that formulates an optimum execution plan for a query." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"The SQL Server component responsible for generating the optimum execution plan for a query." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The SQL Server database engine component responsible for generating efficient execution plans for SQL statements." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A term applied to a process, within a database engine, that attempts to find the fastest method of executing a SQL command against a database." (Gavin Powell, "Beginning Database Design", 2006)

"This is the component in SQL Server that analyzes your queries, compares them with available indexes, and decides which index will return a result set the fastest." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"An optimization process running within SQL Server. Any queries submitted to SQL Server are first processed by the query optimizer. It determines the best way to run the query, including what indexes to use and what types of joins to use. The output is a query execution plan, sometimes called a query plan or just a plan." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A process that generates query plans. For each query, the optimizer generates a plan that matches the query to the index that will return results as efficiently as possible. The optimizer reuses the query plan each time the query runs. If a collection changes significantly, the optimizer creates a new query plan." (MongoDb, "Glossary", 2008)

"The Optimizer is an internal technology that is responsible for selecting the most efficient means to accessing or altering information. It uses detailed statistics about the database to make the right decision." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"A part of a DBMS that examines a nonprocedural data manipulation request and makes a determination of the most efficient way to process that request." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"The component of a relational database system responsible for analyzing SQL queries and producing optimal access paths for retrieving data from the database." (Craig S Mullins, "Database Administration", 2012)

"A component of the SQL and XQuery compiler that chooses an access plan for a data manipulation language statement by modeling the execution cost of many alternative access plans and choosing the one with the minimal estimated cost." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Built-in database software that determines the most efficient way to execute a SQL statement by considering factors related to the objects referenced and the conditions specified in the statement." (Oracle)

"The MySQL component that determines the best indexes and join order to use for a query, based on characteristics and data distribution of the relevant tables." (MySQL)

16 March 2009

🛢DBMS: Query Plan [QP] (Definitions)

"The ordered set of steps required to carry out a query, complete with the access methods chosen for each table." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A portion of a DBMS that determines the most efficient sequence of relational algebra operations to use to satisfy a query." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"The plan produced by an optimizer for processing a query." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A query plan is a sequence of logical and physical operators and data flows that the SQL query optimizer returns for use by the query processor to retrieve or modify data." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"Once the query optimizer determines the best way to execute a query, it creates a query plan. This identifies all the elements of the query, including what indexes are used, what types of joins are employed, and more." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A sequence of logical and physical operators and data flows that the SQL query optimizer returns for use by the query processor to retrieve or modify data." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

[adaptive query plan:] "An execution plan that changes after optimization because run-time conditions indicate that optimizer estimates are inaccurate. An adaptive query plan has different built-in plan options. During the first execution, before a specific subplan becomes active, the optimizer makes a final decision about which option to use. The optimizer bases its choice on observations made during the execution up to this point. Thus, an adaptive query plan enables the final plan for a statement to differ from the default plan." (Oracle)

[default plan:] "For an adaptive plan, the execution plan initially chosen by the optimizer using the statistics from the data dictionary. The default plan can differ from the final plan." (Oracle)

[execution plan:] "The combination of steps used by the database to execute a SQL statement. Each step either retrieves rows of data physically from the database or prepares them for the session issuing the statement." (Oracle)

[query execution plan:] "The set of decisions made by the optimizer about how to perform a query most efficiently, including which index or indexes to use, and the order in which to join tables." (MySQL)
