12 April 2025

🏭🗒️Microsoft Fabric: Copy job in Data Factory [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 11-Apr-2025

[Microsoft Fabric] Copy job in Data Factory 
  • {def} a Data Factory item that moves data from any source to any destination without requiring pipeline creation [1]
    • {benefit} simplifies data ingestion with built-in patterns for batch and incremental copy [1]
      • across cloud data stores [1]
      • from on-premises data stores behind a firewall [1]
      • within a virtual network via a gateway [1]
  • elevates the data ingestion experience to a more streamlined and user-friendly process from any source to any destination [1]
  • {benefit} provides seamless data integration 
    • through over 100 built-in connectors [3]
    • provides essential tools for data operations [3]
  • {benefit} provides intuitive experience
    • easy configuration and monitoring [1]
  • {benefit} efficiency
    • enables incremental copying effortlessly, reducing manual intervention [1]
  • {benefit} less resource utilization and faster copy durations [1]
  • {benefit} flexibility to control data movement [1]
    • choose which tables and columns to copy
    • map the data
    • define read/write behavior
    • set schedules that fit the requirements, whether for one-time or recurring jobs [1]
  • {benefit} robust performance
    • the serverless setup enables data transfer with large-scale parallelism
    • maximizes data movement throughput [1]
      • fully utilizes network bandwidth and data store IOPS for optimal performance [3]
  • {feature} monitoring
    • once a job is executed, users can monitor its progress and metrics through either [1]
      • the Copy job panel
        • shows data from the most recent runs [1]
        • reports several metrics
          • status
          • rows read
          • rows written
          • throughput
      • the Monitoring hub
        • acts as a centralized portal for reviewing runs across various items [4]
  • {mode} full copy
    • copies all data from the source to the destination at once
  • {mode|preview} incremental copy
    • the initial job run copies all data, and subsequent job runs only copy changes since the last run [1]
    • an incremental column must be selected for each table to identify changes [1]
      • used as a watermark
        • allows comparing its value with the value from the last run, in order to copy only the new or updated data [1] (see the sketch after these notes)
        • the incremental column can be a timestamp or an increasing INT [1]
      • {scenario} copying from a database
        • new or updated rows will be captured and moved to the destination [1]
      • {scenario} copying from a storage store
        • new or updated files identified by their LastModifiedTime are captured and moved to the destination [1]
      • {scenario} copy data to storage store
        • new rows from the tables or files are copied to new files in the destination [1]
          • files with the same name are overwritten [1]
      • {scenario} copy data to database
        • new rows from the tables or files are appended to destination tables [1]
          • ⇐ the update method can be set to merge or overwrite [1]
  • {default} appends data to the destination [1]
    • the update method can be adjusted to 
      • {operation} merge
        • a key column must be provided
          • {default} the primary key is used, if available [1]
      • {operation} overwrite
  • availability 
    • the same regional availability as the pipeline [1]
  • billing meter
    • Data Movement, with an identical consumption rate [1]
  • {feature} robust Public API
    • {benefit} allows automating and managing Copy jobs efficiently [2]
  • {feature} Git Integration
    • {benefit} allows leveraging Git repositories in Azure DevOps or GitHub [2]
    • {benefit} allows deploying Copy jobs seamlessly with Fabric’s built-in CI/CD workflows [2]
  • {feature|preview} VNET gateway support
    • enables secure connections to data sources within a virtual network or behind firewalls
      • Copy Job can be executed directly on the VNet data gateway, ensuring seamless and secure data movement [2]
  • {feature} Upsert to Azure SQL Database
  • {feature} overwrite to Fabric Lakehouse
  • {enhancement} column mapping for simple data modifications when storage is the destination store [2]
  • {enhancement} data preview to help select the right incremental column  [2]
  • {enhancement} search functionality to quickly find tables or columns  [2]
  • {enhancement} real-time monitoring with an in-progress view of running Copy Jobs  [2]
  • {enhancement} customizable update methods & schedules before job creation [2]
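
The watermark mechanics described under incremental copy can be sketched in T-SQL. This is only a conceptual illustration of what the Copy job automates once an incremental column is selected; the table dbo.Orders and column ModifiedDate are hypothetical:

-- the value persisted from the previous run acts as the watermark
DECLARE @LastWatermark datetime2 = '2025-04-01T00:00:00';

-- only rows changed since the last run are picked up
SELECT *
FROM dbo.Orders
WHERE ModifiedDate > @LastWatermark;

-- after a successful run, the highest value seen becomes the next watermark
SELECT MAX(ModifiedDate) AS NewWatermark
FROM dbo.Orders;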

References:
[1] Microsoft Learn (2025) Fabric: What is the Copy job in Data Factory for Microsoft Fabric? [link]
[2] Microsoft Fabric Updates Blog (2025) Recap of Data Factory Announcements at Fabric Conference US 2025 [link]
[3] Microsoft Fabric Updates Blog (2025) Fabric: Announcing Public Preview: Copy Job in Microsoft Fabric [link]
[4] Microsoft Learn (2025) Fabric: Learn how to monitor a Copy job in Data Factory for Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: Learn how to create a Copy job in Data Factory for Microsoft Fabric [link]

Acronyms:
API - Application Programming Interface
CI/CD - Continuous Integration and Continuous Deployment
DevOps - Development & Operations
DF - Data Factory
IOPS - Input/Output Operations Per Second
VNet - Virtual Network

09 April 2025

💠🛠️🗒️SQL Server: Tempdb Database [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms please refer to the documentation.

Last updated: 9-Apr-2025

[SQL Server 2005] Tempdb database

  • {def} system database available as a global resource to all users connected to a Database Engine instance [1]
    • does not persist after SQL Server shuts down 
      • created at restart using the specifications from the model database 
      • doesn’t need crash recovery
      • requires rollback capabilities
        • {scenario} when updating a row in a global temporary table and rolling back the transaction [12]
          • there's no way to undo this change unless the ‘before value’ of the update was logged [12]
          • there is no need to log the ‘after value’ of the update, because this value is only needed when the transaction must be ‘redone’, which happens during database recovery [12]
        • {scenario} during an INSERT operation
          • only undo information is needed
            • when inserting a row into, say, a global temporary table, the actual ‘row value’ is not logged, because SQL Server does not need the row to undo the insert; it only needs to set the offsets within the page appropriately or, if the insert caused a new page to be allocated, to deallocate the page [12]
      • not all objects in tempdb are subject to logging [18]
    • {characteristic} critical database
      • ⇒ needs to be configured adequately 
      •  has the highest level of create and drop actions [5]
      • under high stress the allocation pages, syscolumns and sysobjects can become bottlenecks [5]
      • workers use tempdb like any other database
        • any worker can issue I/O to and from tempdb as needed [5]
    • if the tempdb database cannot be created, SQL Server will not start [7]
      • {recommendation} start SQL Server by using the -f startup parameter (minimal configuration) and then move tempdb to a valid location
  • files
    • data file:
      • {default} tempdb.mdf
    • log file
      • {default} templog.ldf
  • {operation} changing the tempdb database
    • {step} use ALTER database statement and MODIFY FILE clause to change the physical file names of each file in the tempdb database to refer to the new physical location [7]
    • {step} stop and restart SQL Server
  • {operation} create file group
    • only one file group for data and one file group for logs are allowed.
    • {default} the number of files is set to 1
      • a single file can become a high contention point for internal tracking structures
        • ⇒ often it's better to create multiple files for tempdb
    • multiple files can be created for each file group
      • {recommendation} set the number of files to match the number of CPUs that are configured for the instance
        • ⇐ if the number of logical processors is less than or equal to eight [1]
          • otherwise, use eight data files [1]
          • if contention is still observed, increase the number of data files by multiples of four until it decreases to acceptable levels, or make changes to the workload [1]
        • ⇐ it’s not imperative
      • {scenario} add a file per CPU (aka multiple files)
        • allows the file group manager to 
          • pick the next file 
            • there are some optimizations here for tempdb to avoid contention and skip to the next file when under contention [4]
          •  then use the target location to start locating free space [4]                
        • ⇐ SQL Server creates a logical scheduler for each of the CPUs presented to it (logical or physical)
          • allows each of the logical schedulers to loosely align with a file [4]
            • since there can be only one active worker per scheduler, this allows each worker to have its own tempdb file (at that instant) and avoid allocation contention [4]
        • internal latching and other activities achieve better separation (per file) 
          • ⇒ the workers don’t cause a resource contention on a single tempdb file
            • rare type of contention [5]
        • {misconception} adding a file per CPU leads to new I/O threads per file [5]
      • {scenario} add more data files 
        • may help to solve potential performance problems that are due to I/O operations. 
        • helps to avoid latch contention on allocation pages
          • ⇐ manifested as UP-latch waits (see the diagnosis query after these notes)
      • {downside} having too many files 
        • increases the cost of file switching
        • requires more IAM pages
        • increases the manageability overhead. 
      • {recommendation} configure the size of the files to be equal
        • ⇐ to better use the allocation mechanism (proportional fill) (see the sketch after these notes)
      • {recommendation} create tempdb files striped across fast disks 
    • the size of tempdb affects the performance of the database
    • {event} server restarts
      • the tempdb file size is reset to the configured value (the default is 8 MB).
    • {feature} auto grow 
      • any growth is temporary
        • ⇐ unlike for other databases, the grown size is not persisted across restarts
      • when tempdb grows, all transactional activity may come to a halt
        • because TempDB is used by most operations and these activities will get blocked until more disk space gets allocated to the TempDB [12]
        • {recommendation} pre-allocate the size of the TempDB that matches the needs of workload [12]
      • {recommendation} auto grow should be used for exceptions rather than as a strategy [12]
    • transactions
      • lose the durability attribute
        • ⇐ the transaction log does not need to be flushed, so transactions are committed faster in tempdb than in user databases
      • can be rolled back
        • there is no need to REDO them because the contents of tempdb do not persist across server restarts
      • most internal operations on tempdb do not generate log records because there is no need to roll back
    • {restriction} add filegroups [1]
    • {restriction} remove the primary filegroup, primary data file, or log file [1]
    • {restriction} renaming the database or primary filegroup [1]
    • {restriction} back up or restore the database [1]
    • {restriction} change collation [1]
      • the default collation is the server collation [1]
    • {restriction} change the database owner [1]
      • owned by sa
    • {restriction} create a database snapshot [1]
    • {restriction} drop the database [1]
    • {restriction} drop the guest user from the database [1]
    • {restriction} enable CDC
    • {restriction} participate in database mirroring [1]
    • {restriction} can only be configured in the simple recovery model
      • ⇒ the value can’t be changed
    • {restriction} auto shrink is not allowed for tempdb
      • database shrink and file shrink capabilities are limited
        •  because many of the hidden objects stored in tempdb cannot be moved by shrink operations
    • {restriction} the database CHECKSUM option cannot be enabled
    • {restriction} database snapshots cannot be created
    • {restriction} DBCC CHECKALLOC and DBCC CHECKCATALOG are not supported.
      • only offline checking for DBCC CHECKTABLE is performed
        • ⇒ a TAB-S lock is needed
      • there are internal consistency checks that occur when tempdb is in use
        • if the checks fail, the user connection is broken and the tempdb space used by the connection is freed
    • {restriction} set the database to OFFLINE
    • {restriction} set the database or primary filegroup to READ_ONLY [1]
  • used by
    • {feature} query
    • {feature} temporary tables
    • {feature} table variables
    • {feature} table-valued functions
    • {feature} user-defined functions
    • {feature} online/offline index creation
    • {feature} triggers 
    • {feature} cursors
    • {feature} RCSI
    • {feature} MARS
    • {feature} DBCC CHECK
    • {feature} LOB parameters 
    • {feature} Service Broker and event notification
    • {feature} XML and LOB variable
    • {feature} query notifications
    • {feature} database mail
  • used to store
    • user objects
      • objects explicitly created
        • user objects that can be created in a user database can also be created in tempdb [1]
          • {limitation} there's no durability guarantee [1]
          • {limitation} dropped when the Database Engine instance restarts [1]
      • {type} global and local temporary tables 
        • correspond to ## and # tables and table variables created explicitly by the application [12]
          • REDO information is not logged
      • {type} indexes on global and local temporary tables
      • {type} table variables
      • {type} tables returned in table-valued functions
      • {type} cursors
    • temp objects
      • benefit from improved caching
      • cached only if none of the following applies: [12] (see the examples after these notes)
        • named constraints are created
        • DDL statements that affect the table are run after the temp table has been created
          • e.g. CREATE INDEX or CREATE STATISTICS statements
        • the table is created by using dynamic SQL
          • e.g. sp_executesql N'create table #t(a int)'
    • internal objects
      • objects created internally by SQL Server
        • each object uses a minimum of nine pages [1]
          • an IAM page and an eight-page extent [1]
      • {type} work tables
        • store intermediate results for spools, cursors, sorts, and temporary LOB storage [1]
      • {type} work files
        • used for hash join or hash aggregate operations [1]
      • {type} intermediate sort results
        • used for operations 
          • creating or rebuilding indexes
            • if SORT_IN_TEMPDB is specified
          • certain GROUP BY, ORDER BY, or UNION queries
      • {restriction} applications cannot directly insert into or delete rows from them
      • {restriction} metadata is stored in memory
      • {restriction} metadata does not appear in system catalog views such as sys.all_objects.
      • {restriction} considered to be hidden objects
      • {restriction} updates to them do not generate log records
      • {restriction} page allocations do not generate log records unless on a sort unit
        • if the statement fails, these objects are deallocated
      • used
        • to store intermediate runs for sort
        • to store intermediate results for hash joins and hash aggregates.
        • to store XML variables or other LOB data type variables
          • e.g. text, image, ntext, varchar(max), varbinary(max), and all others
        • by queries that need a spool to store intermediate results
        • by keyset cursors to store the keys
        • by static cursors to store a query result
        • by Service Broker to store messages in transit
        • by INSTEAD OF triggers to store data for internal processing
        • by any feature that uses the above mentioned operations 
    • version stores
      • collections of data pages that hold the data rows that support row versioning [1]
      • {type} common version store
      • {type} online index build version store
  • storage
    • I/O characteristics
      • {requirement} read after writes
        • {def} the ability of the subsystem to service read requests with the latest data image when the read is issued after any write is successfully completed [7]
      • {recommendation} write ordering
        • {def} the ability of the subsystem to maintain the correct order of write operations [7]
      • {recommendation} torn I/O prevention
        • {def} the ability of the system to avoid splitting individual I/O requests [7]
      • {requirement} physical sector alignment and size
        • devices are required to support sector attributes permitting SQL Server to perform writes on physical sector-aligned boundaries and in multiples of the sector size [7]
    • can be put on specialty systems
      • ⇐ even when they can’t be used for other databases
      • RAM disks
        • non-durable media
        • introduce a double RAM cache
          • one in the buffer pool and one on the RAM disk
            • ⇐ directly takes away from the buffer pool’s total possible size and generally decreases performance [7]
        • give up RAM to the RAM disk implementation [7]
          • ⇐ applies to RAM disk and RAM-based file cache implementations
      • solid state
        • high speed subsystems
        • {recommendation} confirm with the product vendor to guarantee full compliance with SQL Server I/O needs [7]
    • sort errors were frequently solved by moving tempdb to a non-caching local drive or by disabling the read caching mechanism [8]
  • {feature} logging optimization 
    • avoids logging the "after value" in certain log records in tempdb
      • when an UPDATE operation is performed without this optimization, the before and after values of the data are recorded in the log file 
    • can significantly reduce the size of the tempdb log as well as reduce the amount of I/O traffic on the tempdb log device
  • {feature} instant data file initialization 
    • skips zeroing out the NTFS file when the file is created or when its size is increased
    • {benefit} minimizes the overhead significantly when tempdb needs to auto grow
      • without it, auto grow could take a long time and lead to application timeouts
    • reduces the impact of database creation [5]
      • because zeros don’t have to be stamped in all bytes of a database file, only in the log files [5]
        • ⇒ reduces the gain from using multiple threads during database creation [5]
  • {feature} proportional fill optimization
    • reduces UP latch contention in tempdb
    • when there are multiple data files in tempdb, each file is filled in proportion to the free space that is available in the file so that all of the files fill up at about the same time
      •  accomplished by removing a latch that was taken during proportional fill.
  • {feature} deferred drop in tempdb
    • when a large temporary table is dropped by an application, it is handled by a background task and the application does not have to wait
      • ⇒ faster response time to applications.
  • {feature} worktable caching 
    • {improvement} when a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated
      • ⇐ in addition, the first nine pages for the work table are kept.
  • {feature} caches temporary objects
    • when table-valued functions, table variables, or local temporary tables are used in a stored procedure, function, or trigger, the frequent drop and create of these temporary objects can be time-consuming
      • ⇐ this can cause contention on tempdb system catalog tables and allocation pages
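
The "changing the tempdb database" steps above translate into a short script. A minimal sketch, assuming the default logical file names (tempdev, templog) and illustrative target paths; verify the names via tempdb.sys.database_files before running:

-- check the current logical and physical file names
SELECT name, physical_name FROM tempdb.sys.database_files;

-- point each tempdb file to the new physical location [7]
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'E:\SQLData\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'E:\SQLLog\templog.ldf');

-- stop and restart SQL Server for the change to take effect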
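
Likewise, the multiple-files recommendation can be sketched as follows; the sizes, growth increments, paths, and the target of four files are assumptions to adapt to the number of logical processors and the workload:

-- make the primary file equal in size to the new files so that
-- proportional fill spreads allocations evenly across all of them
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 1024MB, FILEGROWTH = 256MB);

ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'E:\SQLData\tempdb2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'E:\SQLData\tempdb3.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'E:\SQLData\tempdb4.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);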
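
The allocation-page latch contention mentioned above (UP-latch waits) can be spotted while it happens. A hedged diagnosis sketch using sys.dm_os_waiting_tasks; database_id 2 is tempdb, and resource_description has the form db_id:file_id:page_id:

SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH_%'
  AND resource_description LIKE '2:%';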
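
The temp-object caching conditions can be made concrete with a few contrasting examples (a sketch; the table and index names are arbitrary):

-- cacheable: no named constraints, no post-creation DDL, no dynamic SQL
CREATE TABLE #cached (id int PRIMARY KEY);
DROP TABLE #cached;

-- not cacheable: the constraint is explicitly named
CREATE TABLE #notCached (id int CONSTRAINT PK_notCached PRIMARY KEY);
DROP TABLE #notCached;

-- not cacheable: DDL affecting the table runs after creation
CREATE TABLE #alsoNotCached (id int);
CREATE INDEX ix_id ON #alsoNotCached (id);
DROP TABLE #alsoNotCached;

-- not cacheable: created via dynamic SQL [12]
EXEC sp_executesql N'CREATE TABLE #t (a int)';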

References
[1] Microsoft Learn (2024) SQL Server: Tempdb database [link]
[2] Wei Xiao et al (2006) Working with tempdb in SQL Server 2005
[4] CSS SQL Server Engineers (2009) SQL Server TempDB – Number of Files – The Raw Truth [link]
[5] SQL Server Support Blog (2007) SQL Server Urban Legends Discussed [link]
[7] Microsoft Learn (2023) SQL Server: Microsoft SQL Server I/O subsystem requirements for the tempdb database [link]
[8] Microsoft Learn (2023) SQL Server: SQL Server diagnostics detects unreported I/O problems due to stale reads or lost writes [link]
[12a] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: Simple Example), by Sunil Agarwal [link]
[12b] SQL Server Blog (2008) Managing TempDB in SQL Server: TempDB Basics (Version Store: logical structure), by Sunil Agarwal [link]
[18] Simple Talk (2020) Temporary Tables in SQL Server, by Phil Factor [link]
[19] Microsoft Learn (2023) SQL Server: Recommendations to reduce allocation contention in SQL Server tempdb database [link]

Acronyms:
CDC - Change Data Capture
DB - database
DDL - Data Definition Language
I/O - Input/Output
IAM - Index Allocation Map
LOB - large object
MARS - Multiple Active Result Sets
NTFS - New Technology File System
RAM - Random-Access Memory
RCSI - Read Committed Snapshot Isolation

07 April 2025

🏭🗒️Microsoft Fabric: User Data Functions (UDFs) in Python [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 7-Apr-2025

[Microsoft Fabric] User Data Functions (UDFs)
  • {def} a platform that allows customers to host and run custom logic on Fabric from different types of items and data sources [1]
  • empower data developers to write custom logic and embed it into their Fabric ecosystem [1]
    • the logic can include internal algorithms and libraries [1]
    • {benefit} allows handling complex transformations, optimizing performance, and integrating diverse data sources beyond the capabilities of low-code tools [5]
      • via public libraries from PyPI
      • via native Fabric integrations to 
        • connect to Fabric data sources
          • e.g. warehouse, lakehouse or SQL Database
        • invoke functions from other Fabric items
          • e.g. notebooks, Power BI reports, data pipelines
      • via editing functions directly in the Fabric portal [1]
        • with in-browser tooling or the VS Code extension
    • supports the Python 3.11 runtime [1]
    • {goal} reusability
      • by creating libraries of standardized functionality [1]
        • can be used in many solutions across the organization [1]
    • {goal} customization
      • the applications are tailored to customers’ needs [1]
    • {goal} encapsulation
      • functions can perform various tasks to build sophisticated workflows [1]
    • {goal} external connectivity
      • data functions can be invoked from external client applications using a REST endpoint [1]
      • allows integrations with external systems [1]
  • {feature} programming model
    • SDK that provides the necessary functionality to author and publish runnable functions in Fabric [4]
    • {benefit} allows to seamlessly integrate with other items in the Fabric ecosystem [4]
      • e.g. Fabric data sources
    • {concept} user data functions item 
        • contains one or many functions that can be invoked from the Fabric portal, from another Fabric item, or from an external application using the provided REST endpoint [4]
      • each function is a method in Python script that allows passing parameters and returning an output to the invoker [4]
    • {component} fabric.functions library 
      • provides the code needed to create user data functions in Python [4]
      • imported in the template by default [4]
    • {method} fn.UserDataFunctions() 
      • provides the execution context [4]
      • added at the beginning of the code file in all new user data functions items, before any function definitions [4]
    • {concept} functions
      • every function is identified with a @udf.function() decorator (see the sketch after these notes)
        • can be invoked individually from the portal or an external invoker [4]
        • functions without the decorator can't be invoked directly ⇐ they serve as helpers called from decorated functions [4]
  • {feature} invocation logs
    • function invocations are logged
    • {benefit} allows to check the status or perform debugging  [3]
    • {limitation} logs can take a few minutes to appear [3]
      • {recommendation} refresh the page after a few minutes [3]
    • {limitation} daily ingestion limit: 250 MB [3]
      • the limit is reset the next day, when a new log is made available [3]
    • {limitation} logs are sampled while preserving a statistically correct analysis of application data [3]
      • the sampling is done by User data functions to reduce the volume of logs ingested [3]
      • if some logs are partially missing, it might be because of sampling [3]
    • {limitation} supported log types: information, error, warning, trace [3]
  • {feature} Manage connections
    • {limitation} only supports connecting to Fabric-native data sources [2]
  • {limitation} only available in a subset of Fabric regions [2]
  • {limitation} editable by the owner only [2]
    • can only be modified and published by the user who is the owner of the User Data Functions Fabric item [2]
  • {limitation} adds further keywords to the list of reserved keywords
    • namely: req, context, reqInvocationId
    • can't be used as parameter names or function names [2]
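
A minimal sketch of the programming model described above, mirroring the default template: the fabric.functions import, the fn.UserDataFunctions() context, and the @udf.function() decorator are the documented building blocks [4], while the function names and logic are illustrative:

import fabric.functions as fn

# execution context; placed before any function definitions [4]
udf = fn.UserDataFunctions()

# only decorated functions are exposed for invocation (portal, Fabric items, REST)
@udf.function()
def greet(name: str) -> str:
    return f"Hello, {normalize(name)}, from Fabric User Data Functions!"

# helper without the decorator: callable from other functions, not invokable directly
def normalize(name: str) -> str:
    return name.strip().title()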

References:
[1] Microsoft Learn (2025) Microsoft Fabric: What is Fabric User data functions (Preview)? [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Service details and limitations of Fabric User Data Functions [link]
[3] Microsoft Learn (2025) Microsoft Fabric: User data functions invocation logs (Preview) [link]
[4] Microsoft Learn (2025) Microsoft Fabric: Fabric User data functions programming model overview (Preview) [link]
[5] Microsoft Fabric Updates Blog (2025) Announcing Fabric User Data Functions (Preview) [link]
[6] Microsoft Learn (2025) Microsoft Fabric: Use the Functions activity to run Fabric user data functions and Azure Functions [link]

Resources:
[R1] Microsoft Fabric Updates Blog (2025) Utilize User Data Functions in Data pipelines with the Functions activity (Preview) [link]

Acronyms:
PyPI - Python Package Index
REST - Representational State Transfer
SDK - Software Development Kit
UDF - User Data Function
VS - Visual Studio

06 April 2025

🏭🗒️Microsoft Fabric: Query Optimizer in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 6-Apr-2025

[Microsoft Fabric] Hints in Warehouse
  • {def} keywords that users can add to SQL statements to provide additional information or instructions to the query optimizer [2]
    • options or strategies specified for enforcement by the SQL Server query processor [1]
      • applicable to SELECT, INSERT, UPDATE, or DELETE statements [1]
  • {benefit} help improve the performance, scalability, or consistency of queries by overriding the default behavior of the query optimizer [2]
  • {type} join hints
    • specify which join strategy/algorithm should be used between two tables [3] (see the examples after these notes)
      • improves the performance of queries that involve large or complex joins [2]
      • specified in the FROM clause of a query [1]
      • if a join hint is specified for any two tables, the query optimizer automatically enforces the join order for all joined tables in the query [3]
        • based on the position of the ON keywords [3]
          • when a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order [3]
    • looping
      • via LOOP JOIN
      • {restriction} can't be specified together with RIGHT or FULL as a join type [3]
    • hashing
      • via HASH JOIN
    • merging
      • via MERGE JOIN
    • REPLICATE
      • causes a broadcast move operation
        • ⇐ a specific table is replicated across all distribution nodes [2]
      • with INNER or LEFT join
        • the broadcast move operation will replicate the right side of the join to all nodes [2]
      • with RIGHT join
        • the broadcast move operation will replicate the left side of the join to all nodes [2]
      • with FULL join
        • an estimated plan cannot be created [2]
    • REDISTRIBUTE [(columns_count)]
      • ensures two data sources are distributed based on the JOIN clause columns [2]
      • handles multiple join conditions, specified by the first n columns in both tables, where n is the columns_count argument [2]
      • redistributing data optimizes query performance by evenly spreading data across nodes during intermediate steps of execution [2]
      • the (columns_count) argument is only supported in MF warehouse [2]
  • {type} query hint
    • specify that the indicated hints are used in the scope of a query [3]
      • affect all operators in the statement
        • ⇐ for UNION statements, only the last query involving a UNION operation can have the OPTION clause [3]
      • specified as part of the OPTION clause [3]
      • Error 8622 occurs if one or more query hints cause the Query Optimizer not to generate a valid plan [3]
    • used via the OPTION clause at the end of a query [2] (see the example after these notes)
      • followed by the name of the query hint and its optional parameters in parentheses [2]
      • multiple hints can be used in the same query, separated by commas
        • e.g. OPTION (FORCE ORDER, MAX_GRANT_PERCENT = 20)
          • instructs the QO to preserve the join order specified in the query and to limit the memory grant to 20 percent of the available memory
    • {hint} HASH GROUP
      • specifies that the QO should use a hash-based algorithm for the GROUP BY operation [2]
      • {benefit} can improve the performance of queries that involve large or complex grouping sets [2]
    • {hint} ORDER GROUP
      • specifies that the QO should use a sort-based algorithm for the GROUP BY operation [2]
      • {benefit} can improve the performance of queries that involve small or simple grouping sets [2]
    • {hint} MERGE UNION
      • specifies that the QO should use a merge-based algorithm for the UNION or UNION ALL operation [2]
      • {benefit} can improve the performance of queries that involve sorted inputs [2]
    • {hint} HASH UNION
      • specifies that the query optimizer should use a hash-based algorithm for the UNION or UNION ALL operation [2]
      • {benefit} can improve the performance of queries that involve unsorted or large inputs [2]
    • {hint} CONCAT UNION
      • specifies that the QO should use a concatenation-based algorithm for the UNION or UNION ALL operation [2]
      • {benefit} can improve the performance of queries that involve distinct or small inputs [2]
    • {hint} FORCE ORDER
      • specifies that the QO should preserve the join order specified in the query [2]
      • {benefit} can improve the performance or consistency of queries that involve complex join conditions or hints [2]
    • {hint} FORCE SINGLE NODE PLAN/FORCE DISTRIBUTED PLAN
      • allows choosing whether to force a single-node plan or a distributed plan for the query’s execution [2]
    • {hint} USE HINT
      • adds one or more extra hints to the query processor; the hints are specified by name inside single quotation marks within the OPTION clause [2]
        • e.g. OPTION (USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'))
      • used with several hint names, changing the behavior of CE derivation
        • {hint name} ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES
          • applies when calculating cardinality estimates for AND predicates for filters [2]
          • MF assumes full correlation among filters; helpful when a high level of underestimation on AND predicates for filters is observed [2]
          • [SQL Server] equivalent TF 4137 [4]
        • {hint name} ASSUME_FULL_INDEPENDENCE_FOR_FILTER_ESTIMATES
          • applies when calculating cardinality estimates for AND predicates for filters  [2]
          • MF assumes full independence among filters [2]
            • if a high level of overestimation on AND predicates for filters is observed, this hint can help produce a better estimate [2]
        • {hint name} ASSUME_PARTIAL_CORRELATION_FOR_FILTER_ESTIMATES
          • applies when calculating cardinality estimates for AND predicates for filters [2]
          • {default} MF assumes partial correlation among filters [2]
            • ⇐ it is unlikely that this hint will help improve the estimates [2]
        • {hint name} ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS
          • applies when calculating cardinality estimate for joins [2]
          • uses Simple Containment assumption instead of the default Base Containment assumption [2]
          • [SQL Server] equivalent TF 9476 [4]
  • {type} table hints
    • none are currently enforced
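
A sketch of the join hints on hypothetical warehouse tables (FactSales and DimCustomer are illustrative):

-- force a hash join; with a join hint present, the optimizer also enforces
-- the join order implied by the query text [3]
SELECT c.Region, SUM(f.Amount) AS Total
FROM dbo.FactSales AS f
INNER HASH JOIN dbo.DimCustomer AS c
    ON f.CustomerKey = c.CustomerKey
GROUP BY c.Region;

-- the Fabric-specific REPLICATE/REDISTRIBUTE hints take the same position in the
-- FROM clause per [2], e.g. INNER REPLICATE JOIN (syntax assumed analogous)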
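
And a query-hint sketch combining the OPTION-clause pieces from the notes (same illustrative tables):

SELECT c.Region, SUM(f.Amount) AS Total
FROM dbo.FactSales AS f
INNER JOIN dbo.DimCustomer AS c
    ON f.CustomerKey = c.CustomerKey
WHERE f.Amount > 0
GROUP BY c.Region
OPTION (FORCE ORDER,               -- preserve the join order written in the query
        MAX_GRANT_PERCENT = 20,    -- cap the memory grant at 20 percent
        USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));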

References:
[1] Microsoft Learn (2025) SQL: Hints (T-SQL) [link]
[2] Microsoft Fabric Updates Blog (2025) Hints in Fabric Data Warehouse [link]
[3] Microsoft Learn (2025) SQL: OPTION clause (T-SQL) [link]
[4] Microsoft Support (2016) KB3189813 - Update introduces USE HINT query hint argument in SQL Server 2016 [link]

Resources:
[R1] Microsoft Learn (2025) SQL Server: Query hints (T-SQL) [link]
[R2] Most Useful Query Hints [link]

Acronyms:
CE - Cardinality Estimation
MF - Microsoft Fabric
QO - Query Optimizer
TF - Trace Flag