19 March 2025

💠🛠️🗒️SQL Server: Views [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from various sources. The content has yet to be reviewed against the current documentation.

Last updated: 19-Mar-2025

[SQL Server 2005] View (aka virtual table)

  • {def} a database object that encapsulates a SQL statement and that can be used as a virtual table in further SQL statements
    • cannot be executed by itself
      • it must be used within a query [15]
    • doesn't store any data 
      • {exception} indexed views
      • data is dynamically produced from the underlying table when the view is used [32]
        • views depend on the underlying tables and act like a filter on the underlying tables [32]
    • used just like regular tables without incurring additional cost
      • unless the view is indexed [25]
    • turning a query into a view
      • remove the ORDER BY clause
      • ensure there are no duplicate column names
      • ensure that each column has a name
    • projected columns
      • columns included in the view 
    • view’s column list 
      • renames every output column just as if every column had those alias names in the SELECT statement
      • a view is more self-documenting if the column names of the view are specified in the SELECT statement and not listed separately in the view [27]
    • {restriction} sorting is not allowed in a view
      • unless the view includes a TOP predicate 
        • the ORDER BY clause serves only to define which rows qualify for the TOP predicate [15]
          • the only way to logically guarantee sorted results is to define the ORDER BY clause in the executing query [15]
        • [SQL Server 2005] had a bug in the Query Optimizer that would enable an ORDER BY in a view using a TOP 100 PERCENT predicate [15]
          • the behavior was never documented or officially supported [15]
      • unless the view includes an OFFSET-FETCH clause
    • {restriction} parameters can’t be passed to a view [100]
      • {alternative} use an inline table-valued function 
    • {restriction} cannot reference a variable inside the SELECT statement [100]
    • {restriction} cannot create a table, whether permanent or temporary
      • ⇒ cannot use the SELECT/INTO syntax in a view
    • {restriction} can reference only permanent tables
      • ⇒ cannot reference a temporary table [100]
    • {benefit} present the correct fields to the user
    • {benefit} enforce security 
        • by specifying 
          • only needed columns
            • projects a predefined set of columns [15]
            • hides sensitive, irrelevant, or confusing columns [15]
            • should be used in parallel with SQL Server–enforced security [15]
          • only needed records
        • by allowing users access to the view without the need to give access to the used tables
          • grant users read permission from only the views, and restrict access to the physical tables [15]
    • {benefit} maintainability
    • {benefit} provides a level of abstraction
      • hides the complexity of the underlying data structures 
      • encapsulates (business) logic
      • denormalize or flatten complex joins 
      • can consolidate data across databases/servers
      • can be used as single version of truth
    • {benefit} allow changing data in the base tables
    • {downside} layers of nested views add needless overhead to understanding the views
    • {downside} single-purpose views quickly become obsolete and clutter the database [15]
    • {downside} complex views are perceived as having poor performance [15]
    • {best practice} use generic/standard naming conventions
    • {best practice} use aliases for cryptic/recurring column names
    • {best practice} consider only the requested columns
    • {best practice} group special-purpose views under their own schema
    • {best practice} avoid hardcoding values 
    • {best practice} use views for column-level security together with SQL Server–enforced security
    • {best practice} limit views to ad-hoc queries and reports
      • for extensibility and control [15]
      • ⇐ performance isn’t the reason [15]
    • {poor practice} creating views for single-purpose queries (aka one-time requests)
    • {operation} create a view (see the example below)
    • {operation} drop a view
    • {operation} alter a view
    • {operation} select data
    • {operation} update data
      • unless the view is a simple single table view, it’s difficult to update the underlying data through the view [15]
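
A minimal sketch of the operations above, using a hypothetical dbo.Orders table (the object names are for illustration only):

    -- create a view over an aggregated query (no ORDER BY, each column named)
    CREATE VIEW dbo.vOrderTotals
    AS
    SELECT CustomerId
         , COUNT(*) AS OrderCount
         , SUM(Amount) AS TotalAmount
    FROM dbo.Orders
    GROUP BY CustomerId;
    GO

    -- select data (the sort belongs to the executing query, not to the view)
    SELECT *
    FROM dbo.vOrderTotals
    ORDER BY TotalAmount DESC;

    -- alter the view (replaces the stored definition)
    ALTER VIEW dbo.vOrderTotals
    AS
    SELECT CustomerId
         , COUNT(*) AS OrderCount
    FROM dbo.Orders
    GROUP BY CustomerId;
    GO

    -- drop the view
    DROP VIEW dbo.vOrderTotals;
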
    • {type} inline views
      • exist only during the execution of a query [32]
      • simplify the development of a one-time query [32]
        • allows creating queries in steps
          • enables troubleshooting 
      • can replace inline UDFs
      • alternatives
        • inline UDFs
        • temporary tables
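
A short sketch of an inline view (derived table), reusing the hypothetical dbo.Orders table; the derived table exists only for the duration of the query:

    SELECT OT.CustomerId
         , OT.TotalAmount
    FROM ( -- inline view
        SELECT CustomerId
             , SUM(Amount) AS TotalAmount
        FROM dbo.Orders
        GROUP BY CustomerId
    ) OT
    WHERE OT.TotalAmount > 1000;
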
    • {type} indexed views
      • materialize the data, storing the results of the view in a clustered index on disk [15]
      • similar to a covering index 
        • but with greater control 
          • can include data from multiple data sources [15]
          • no need to include the clustered index keys [15]
        • designing an indexed view is more like designing an indexing structure than creating a view [15]
      • can cause deadlocks when two or more of the participating tables are modified (insert/update/delete) from two or more sessions in parallel such that they block each other [29]
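
A minimal sketch of an indexed view over a hypothetical dbo.Sales table; the view must be schema-bound and, when grouping, must include COUNT_BIG(*) (the Amount column is assumed NOT NULL, a requirement for SUM in indexed views):

    CREATE VIEW dbo.vSalesByProduct
    WITH SCHEMABINDING                  -- mandatory for indexed views
    AS
    SELECT ProductId
         , SUM(Amount) AS TotalAmount   -- Amount assumed NOT NULL
         , COUNT_BIG(*) AS SalesCount   -- required when GROUP BY is used
    FROM dbo.Sales                      -- two-part names are mandatory
    GROUP BY ProductId;
    GO

    -- materializes the view's result set on disk
    CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
    ON dbo.vSalesByProduct (ProductId);
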
    • {type} compatibility views
      • allow accessing a subset of the SQL Server 2000 system tables
        • don’t contain any metadata related to features added after SQL Server 2000
      • the views have the same names as many of the system tables in previous versions, as well as the same column names
        • ⇒ any code that uses the SQL Server 2000 system tables won’t break [16]
        • there’s no guarantee that exactly the same results will be returned as from the corresponding tables in SQL Server 2000 [16]
      • accessible from any database
      • hidden in the resource database
        • e.g. sysobjects, sysindexes, sysusers, sysdatabases
    • {type} [SQL Server 2005] catalog views
      • general interface to the persisted system metadata
      • built on an inheritance model
        • ⇒ no need to redefine internally sets of attributes common to many objects
      • available over the sys schema
        • the schema must be included in the object’s reference
      • some of the names are easy to remember because they are similar to the SQL Server 2000 system table names [16]
      • the columns displayed are very different from the columns in the compatibility views
      • some metadata appears only in the master database 
        • keeps track of system-wide data (e.g. databases and logins)
        • other metadata is available in every database (e.g. objects and permissions)
        • metadata appearing only in the msdb database isn’t available through catalog views but is still available in system tables, in the schema dbo (e.g. backup and restore, replication, Database Maintenance Plans, Integration Services, log shipping, and SQL Server Agent)
    • {type} partitioned views 
      • allow the data in a large table to be split into smaller member tables
        • the data is partitioned between the member tables based on ranges of data values in one of the columns [4]
        • the data ranges for each member table are defined in a CHECK constraint specified on the partitioning column [4]
        • a view that uses UNION ALL to combine selects of all the member tables into a single result set is then defined [4]
        • when SELECT statements referencing the view specify a search condition on the partition column, the query optimizer uses the CHECK constraint definitions to determine which member table contains the rows [4]
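
A minimal sketch of a local partitioned view, with hypothetical member tables partitioned by year; the CHECK constraints let the optimizer touch only the relevant member table:

    CREATE TABLE dbo.Orders2024 (
      OrderId INT NOT NULL
    , OrderYear INT NOT NULL CHECK (OrderYear = 2024)
    , Amount MONEY NOT NULL
    , CONSTRAINT PK_Orders2024 PRIMARY KEY (OrderId, OrderYear));

    CREATE TABLE dbo.Orders2025 (
      OrderId INT NOT NULL
    , OrderYear INT NOT NULL CHECK (OrderYear = 2025)
    , Amount MONEY NOT NULL
    , CONSTRAINT PK_Orders2025 PRIMARY KEY (OrderId, OrderYear));
    GO

    CREATE VIEW dbo.vOrders
    AS
    SELECT OrderId, OrderYear, Amount FROM dbo.Orders2024
    UNION ALL
    SELECT OrderId, OrderYear, Amount FROM dbo.Orders2025;
    GO

    -- only dbo.Orders2025 is accessed, thanks to the CHECK constraints
    SELECT * FROM dbo.vOrders WHERE OrderYear = 2025;
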
    • {type} distributed partitioned views (DPV)
      • local partitioned views
        • a single table is horizontally split into multiple tables, usually all have the same structure [30]
      • cross database partitioned views 
        • tables are split among different databases on the same server instance
      • distributed (across server or instance) partitioned views
        • tables participating in the view reside in different databases which reside on different servers or different instances
    • {type} nested views
      • views referred by other views [15]
      • can lead to an abstraction layer with nested views several layers deep 
        • too difficult to diagnose and maintain [15]
    • {type} updatable view
      • view that allows updating the underlying tables
        • only one table may be updated
        • if the view includes joins, then the UPDATE statement that references the view must change columns in only one table [15]
      • typically not a recommended solution for application design
      • WITH CHECK OPTION causes the WHERE clause of the view to check the data being inserted or updated through the view in addition to the data being retrieved [15]
        • it makes the WHERE clause a two-way restriction [15]
          • ⇒  can protect the data from undesired inserts and updates [15]
        • ⇒  useful when the view should limit inserts and updates with the same restrictions applied to the WHERE clause [15]
        • when CHECK OPTION isn’t used, records inserted in the view that don’t match the WHERE constraints will disappear (aka disappearing rows) [15]
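
For illustration, a sketch of WITH CHECK OPTION over a hypothetical dbo.Customers table:

    CREATE VIEW dbo.vActiveCustomers
    AS
    SELECT CustomerId, Name, IsActive
    FROM dbo.Customers
    WHERE IsActive = 1
    WITH CHECK OPTION;
    GO

    -- succeeds: the modified row still satisfies the view's WHERE clause
    UPDATE dbo.vActiveCustomers SET Name = 'Contoso' WHERE CustomerId = 1;

    -- fails: the row would fall outside the WHERE clause; without
    -- CHECK OPTION it would succeed and the row would 'disappear'
    UPDATE dbo.vActiveCustomers SET IsActive = 0 WHERE CustomerId = 1;
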
    • {type} non-updatable views
      • views that don’t allow updating the underlying tables
      • {workaround} build an INSTEAD OF trigger that inspects the modified data and then performs a legal UPDATE operation based on that data [15]
    • {type} horizontally positioned views
      • used to enforce row-level security with the help of the WITH CHECK OPTION
        • {downside} has a high maintenance cost [15] 
        •  {alternative} row-level security can be designed using user-access tables and stored procedures [15]
    • {type} schema-bound views
      • the SELECT statement must include the schema name for any referenced objects [15]
        • SELECT * (all columns) is not permitted [15]
    • {type} subscription views 
      • a view used to export Master Data Services data to subscribing systems

References:
[4] Microsoft (2013) SQL Server 2000 Documentation
[15] Adam Jorgensen et al (2012) Microsoft® SQL Server® 2012 Bible
[16] Bob Beauchemin et al (2012) Microsoft SQL Server 2012 Internals
[25] Basit A Masood-Al-Farooq et al (2014) SQL Server 2014 Development Essentials: Design, implement, and deliver a successful database solution with Microsoft SQL Server 2014
[30] Kevin Cox (2007) Distributed Partitioned Views / Federated Databases: Lessons Learned
[32] Sikha S Bagui & Richard W Earp (2006) Learning SQL on SQL Server 2005
[100] Itzik Ben-Gan et al (2012) Exam 70-461: Querying Microsoft SQL Server 2012

Acronyms:
DPV - Distributed Partitioned Views
UDF - User-Defined Function

💠🛠️🗒️SQL Server: Stored procedures (Notes)

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from various sources. The content has yet to be reviewed against the current documentation.

Last updated: 19-Mar-2025

[SQL Server 2005] Stored procedure

  • {def} a database object that encapsulates one or more statements and is compiled when used
    • is a saved batch
      • whatever a batch can do, a stored procedure can do
    • {characteristic} abstraction layer
      • provides the means of abstracting/decoupling a database [1]
    • {characteristic} performant
      • the fastest possible code, when well-written
        • keeps the execution of data-centric code close to the data
      • easier to index tune a database with stored procedures
    • {characteristic} usability
      • easier to write, consume and troubleshoot stored procedures
      • less likely to contain data integrity errors
      • easier to unit test, than ad-hoc SQL code
    • {characteristic} secure
      • {best practice} locking down the tables and providing access only through stored procedures is a standard for database development [1]
      • minimizes the risk of SQL injection attacks
      • provides an alternative for passing a dataset to SQL Server
    • can have zero or more input parameters
    • can have zero or more output parameters
    • highly dependent on the objects it calls
    • [SQL Server 2008] provides enhanced ways to view these dependencies
      • managed by means of the DDL commands
  • {operation} create stored procedure
    • via CREATE PROCEDURE <schema>.<name>
    • CREATE PROCEDURE must be the first statement in a batch
    • the termination of the batch ends the creation of the stored procedure
    • when created, its text is saved in a system table
      • like other database objects
      • the text is only stored as definition 
        • ⇐  the text is not stored for the execution of the stored procedure
    • {best practice} never use "sp_" to prefix the name of a stored procedure
      • the prefix is reserved for system stored procedures
    • {best practice} use a standard prefix for the stored procedure name (e.g. usp, Proc)
      • it helps identify an object as a stored procedure when reviewing and troubleshooting code [15]
    • {best practice} always use a two-part naming convention
      • ensures that the stored procedure is added to the appropriate schema [15]
    • {best practice} use descriptive names
    • {best practice} implement error handling
      • syntax and logic errors should be gracefully handled, with meaningful information sent back to the calling application [15]
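
A minimal sketch of the creation practices above, using hypothetical object names; errors are handled with TRY/CATCH (THROW requires SQL Server 2012+, older versions can use RAISERROR):

    CREATE PROCEDURE dbo.uspGetOrdersByCustomer
      @CustomerId INT                  -- input parameter
    , @OrderCount INT OUTPUT           -- output parameter
    AS
    BEGIN
      SET NOCOUNT ON;

      BEGIN TRY
        SELECT OrderId, OrderDate, Amount
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId;

        SET @OrderCount = @@ROWCOUNT;
      END TRY
      BEGIN CATCH
        -- send meaningful information back to the calling application
        THROW;
      END CATCH
    END
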
  • {operation} alter stored procedure
    • replaces the entire existing stored procedure with new code
    • preferable to dropping and recreating it
      • because the latter method removes any permissions [1]
  • {operation} drop stored procedure
    •  removes it from the database
  • {operation} execute stored procedure
    • via EXECUTE or EXEC
      • ⇐ EXECUTE can be omitted when the stored procedure is the first statement in a batch
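
Executing the sketch above and retrieving the output parameter:

    DECLARE @OrderCount INT;

    EXECUTE dbo.uspGetOrdersByCustomer
      @CustomerId = 1
    , @OrderCount = @OrderCount OUTPUT;

    SELECT @OrderCount AS OrderCount;
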
  • {concept} compilation
    • automatic process that takes place the first time the code is executed 
    • {option} WITH ENCRYPTION
      • obfuscates the code in the object 
        • ⇐ not meant to prevent a determined user from reading the code
      • the stored procedure text is not directly readable
      • there is no routine to hide the code
        • SQL Server applies a bitwise OR to the code in the object
      • anyone with VIEW DEFINITION authority on an object can see the code though
      • carried from early versions of SQL Server [2]
      • {best practice} there must be a compelling and carefully considered justification for encrypting a stored procedure [15]
        • e.g.  third-party software
  • {type} system stored procedures
    • stored in the master database
  • {type} extended stored procedures
    • routines residing in DLLs that function similarly to regular stored procedures [33]
      • usually written in C or C++
    • receive parameters and return results via SQL Server's Open Data Services API [33]
    • reside in the master database [33]
    • run within the SQL Server process space [33]
    • ⇐ unlike system procedures, calls aren't automatically resolved to the master database [33]
    • don't assume the context of the current database when executed [33]
    • the reference must be fully qualified to execute an extended procedure from a database other than master [33]
      • {workaround} wrapping the extended stored procedure into a system stored procedure
        • can be called from any database without requiring the master prefix
        • technique used with a number of SQL Server's own extended procedures
    • {type} internal stored procedures 
      • system-supplied stored procedures implemented internally by SQL Server [33]
      • have stubs in master..sysobjects
      • are neither true system procedures nor extended procedures [33]
        • listed as extended procedures, but they are actually implemented internally by the server [33]
      • cannot be dropped or replaced with updated DLLs
        • normally this happens when a service pack is applied [33]
      • examples: 
        • sp_executesql
        • sp_xml_preparedocument
        • most of the sp_cursor routines
        • sp_reset_connection
  • {type} user-defined stored procedures
  • {type} remote stored procedures
    • may only be remotely called
      • ⇐ may not be remotely created [15]
    • require that the remote server be a linked server
      • namely, a four-part name reference 
        • via EXECUTE Server.Database.Schema.StoredProcedureName;
    • {alternative} a distributed query
      • via OpenQuery(LinkedServerName, 'EXECUTE Schema.StoredProcedureName');
  • {type} recursive stored procedures 
    • stored procedures that call themselves (see the sketch below)
    • perform numeric computations that lend themselves to repetitive evaluation by the same processing steps [33]
    • calls can be nested up to 32 levels deep
  • {type} nested stored procedures
    • stored procedures that call other stored procedures
    • calls can be nested up to 32 levels deep
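
A minimal sketch of a recursive stored procedure (a factorial, for illustration); note that expressions can't be passed as EXECUTE arguments, hence the intermediate variable:

    CREATE PROCEDURE dbo.uspFactorial
      @N INT
    , @Result BIGINT OUTPUT
    AS
    BEGIN
      IF @N <= 1
        SET @Result = 1;
      ELSE
      BEGIN
        DECLARE @M INT, @Prev BIGINT;
        SET @M = @N - 1;
        EXECUTE dbo.uspFactorial @M, @Prev OUTPUT;  -- recursive call
        SET @Result = @N * @Prev;
      END
    END
    GO

    DECLARE @Result BIGINT;
    EXECUTE dbo.uspFactorial 5, @Result OUTPUT;
    SELECT @Result AS Factorial;  -- returns 120
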
  • {advantage} execution plan retention and reuse
  • {advantage} query auto-parameterization
  • {advantage} allow encapsulation of business rules and policies
  • {advantage} allow application modularization
  • {advantage} allow sharing of application logic between applications
  • {advantage} allow access to database objects that is both secure and uniform
  • {advantage} allow consistent, safe data modification
  • {advantage} allow network bandwidth conservation
  • {advantage} support for automatic execution at system start-up
  • {limitation} cannot be schema bound
    • [SQL Server 2012] {feature} Result Sets (EXECUTE ... WITH RESULT SETS)
      • can guarantee the structure of the returned results at run time
  • {myth} a stored procedure provides a performance benefit because the execution plan is cached and stored for reuse
    • [SQL Server 2000] all execution plans are cached, regardless of whether they’re the result of inline T-SQL or a stored procedure call
    • {false corollary} all T-SQL must be encapsulated into stored procedures
  • {myth} stored procedures are hard to manage


References:
[1] Paul Nielsen et al (2009) SQL Server 2008 Bible
[2] Tobias Thernström et al (2009) MCTS Exam 70-433 Microsoft SQL Server 2008 – Database Development. Self-Paced Training Kit
[7] Michael Lee & Gentry Bieker (2008) Mastering SQL Server® 2008
[8] Joseph Sack (2008) SQL Server 2008 Transact-SQL Recipes
[15] Adam Jorgensen et al (2012) Microsoft® SQL Server® 2012 Bible
[26] Patrick LeBlanc (2013) Microsoft SQL Server 2012: Step by Step, Microsoft Press
[33] Ken Henderson (2001) The Guru's Guide to SQL Server™ Stored Procedures, XML, and HTML
[42] Dušan Petkovic (2008) Microsoft® SQL Server™ 2008: A Beginner’s Guide
[51] Michael Lee & Gentry Bieker (2009) Mastering SQL Server® 2008
[64] Robert D Schneider and Darril Gibson (2008) Microsoft® SQL Server® 2008 All-In-One Desk Reference for Dummies
[100] Itzik Ben-Gan et al (2012) Exam 70-461: Querying Microsoft SQL Server 2012

Acronyms:
DDL - Data Definition Language
DLL - Dynamic Link Library

💠🛠️🗒️SQL Server: Statistics [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from various sources. The content has yet to be reviewed against the current documentation.

Last updated: 19-Mar-2025

[SQL Server 2005] Statistics

  • {def} objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view [2]
    • used by the query optimizer to estimate the cardinality (aka number of rows) in the query result
    • metadata maintained about index keys and, optionally, nonindexed column values
      • implemented via statistics objects 
      • can be created over most types 
        • generally data types that support comparisons (such as >, =, and so on) support the creation of statistics [20]
      • the DBA is responsible for keeping the statistics objects up to date in the system [20]
      • {restriction} the combined width of all columns constituting a single statistics set must not be greater than 900 bytes [21]
    • statistics collected
      • time of the last statistics collection (inside STATBLOB) [21]
      • number of rows in the table or index (rows column in SYSINDEXES) [21]
      • number of pages occupied by the table or index (dpages column in SYSINDEXES) [21]
      • number of rows used to produce the histogram and density information (inside STATBLOB, described below) [21]
      • average key length (inside STATBLOB) [21]
      • histogram 
        • measures the frequency of occurrence for each distinct value in a data set [31]
        • computed by the query optimizer on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view [31]
          • when created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers [31]
          • it sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps [31]
            • each step includes a range of column values followed by an upper bound column value [31]
            • the range includes all possible column values between boundary values, excluding the boundary values themselves [21]
            • the lowest of the sorted column values is the upper boundary value for the first histogram step [31]
        • defines the histogram steps according to their statistical significance [31] 
          • uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values
            • the maximum number of steps is 200 [31]
            • the number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points [31]
    • stored only for the first column of a composite index
      • the most selective columns in a multicolumn index should be selected first [2]
        • the histogram will be more useful to the optimizer [2]
      • single column histogram [21]
      • splitting up composite indexes into multiple single-column indexes is sometimes advisable [2]
    • used to estimate the selectivity of nonequality selection predicates, joins, and other operators  [3]
    • contains a sampling of up to 200 values for the index's first key column
      • the values in a given column are sorted in ordered sequence
        • divided into up to 199 intervals 
          • so that the most statistically significant information is captured
          • in general, of nonequal size
  • {type} table-level statistics
    • statistics maintained on each table
    • include: 
      • number of rows in the table [8]
      • number of pages used by the table [8]
      • number of modifications made to the keys of the table since the last update to the statistics [8]
  • {type} index statistics 
    • created with the indexes
    • updated with fullscan when the index is rebuilt [13]
      • {exception} [SQL Server 2012] for partitioned indexes, when the number of partitions is >1000, default sampling is used [13]
  • {type} column statistics (aka non-index statistics)
    • statistics on non-indexed columns
    • determine the likelihood that a given value might occur in a column [2]
      • gives the optimizer valuable information in determining how best to service a query [2]
        • allows the optimizer to estimate the number of rows that will qualify from a given table involved in a join [2]
          • allows the optimizer to more accurately select the join order [2]
      • if automatic creation or updating of statistics is disabled, the Query Optimizer returns a warning in the showplan output when compiling a query where it thinks it needs this information [20]
    • used by optimizer to provide histogram-type information for the other columns in a multicolumn index [2]
      • the more information the optimizer has about data, the better [2]
      • queries asking for data that is outside the bounds of the histogram will estimate 1 row and typically end up with a bad, serial plan [15]
    • automatically created 
      • when a nonindexed column is queried while AUTO_CREATE_STATISTICS is enabled for the database
    • aren’t updated when an index is reorganized or rebuilt [10]
      • {exception} [SQL Server 2005] indexes rebuilt with DBCC DBREINDEX [10]
        • updates index statistics as well column statistics [10]
        • this feature will be removed in a future version of Microsoft SQL Server
  • {type} [SQL Server 2008] filtered statistics
    • useful when dealing with partitioned data or data that is wildly skewed due to wide ranging data or lots of nulls
    • scope defined with the help of a statistic filter
      • a condition that is evaluated to determine whether a row must be part of the filtered statistics
      • the predicate appears in the WHERE clause of the CREATE STATISTICS or CREATE INDEX statements (in the case when statistics are automatically created as a side effect of creating an index)
  • {type} custom statistics
    • {limitation}[SQL Server] not supported 
  • {type} temporary statistics
    • when statistics on a read-only database or read-only snapshot are missing or stale, the Database Engine creates and maintains temporary statistics in tempdb  [22]
      • a SQL Server restart causes all temporary statistics to disappear [22]
      • the statistics name is appended with the suffix _readonly_database_statistic 
        • to differentiate the temporary statistics from the permanent statistics [22]
        • reserved for statistics generated by SQL Server [2] 
        • scripts for the temporary statistics can be created and reproduced on a read-write database [22]
          • when scripted, SSMS changes the suffix of the statistics name from _readonly_database_statistic to _readonly_database_statistic_scripted [22]
    • {restriction} can be created and updated only by SQL Server [22]
      • can be deleted and monitored [22]
  • {operation} create statistics
    • auto-create (aka auto-create statistics)
      • on by default 
        • samples the rows by default 
          • {exception} when statistics are created as a by-product of index creation, then a full scan is used [3]
      • {exception} statistics may not be created
        • for tables where the cost of the plan execution would be lower than the statistics creation itself [21]
        • when the server is too busy [21]
          • e.g. too many outstanding compilations in progress [21]
    • manually 
      • via CREATE STATISTICS 
        • only generates the statistics for a given column or combination of columns [21]
        • {recommendation} keep the AUTO_CREATE_STATISTICS option on so that the query optimizer continues to routinely create single-column statistics for query predicate columns [22]
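
A minimal sketch of manual statistics creation over a hypothetical dbo.Orders table:

    -- single-column statistics with a full scan
    CREATE STATISTICS stAmount
    ON dbo.Orders (Amount) WITH FULLSCAN;

    -- multicolumn statistics based on a sample
    CREATE STATISTICS stCustomerDate
    ON dbo.Orders (CustomerId, OrderDate) WITH SAMPLE 25 PERCENT;

    -- [SQL Server 2008] filtered statistics
    CREATE STATISTICS stOpenOrders
    ON dbo.Orders (Amount)
    WHERE Status = 'Open';
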
  • {operation} update statistics
    • covered by the same SQL Server Profiler event as statistics creation
    • ensures that queries compile with up-to-date statistics [22]
    • {recommendation} statistics should not be updated too frequently [22]
      • because there is a performance tradeoff between improving query plans and the time it takes to recompile queries [22]
        • tradeoffs depend from application to application [22]
    • triggered by the executions of either commands:
      • CREATE INDEX ... WITH DROP_EXISTING
        • scans the whole data set
          • the index statistics are initially created without sampling
        • allows setting the sample size in the WITH clause either by specifying
          • FULLSCAN 
          • a percentage of data to scan
            • interpreted as an approximation 
      • sp_createstats stored procedure
      • sp_updatestats stored procedure
      • DBCC DBREINDEX
        • rebuilds one or more indexes for a table in the specified database [21]
        • DBCC INDEXDEFRAG or ALTER INDEX REORGANIZE operations don’t update the statistics [22]
    • undocumented options
      • meant for testing and debugging purposes
        • should never be used on production systems [29]
      • STATS_STREAM = stats_stream 
      • ROWCOUNT = numeric_constant 
        • together with PAGECOUNT, alters the internal metadata of the specified table or index by overriding the counters containing the row and page counts of the object [29]
          • read by the Query Optimizer when processing queries that access the table and/or index in question [29]
            • ⇒ can cheat the Optimizer into thinking that a table or index is extremely large [29]
              • the content of the actual tables and indexes will remain intact [29]
      • PAGECOUNT = numeric_constant 
    • auto-update (aka auto statistics update)
      • on by default 
        • samples the rows by default 
          • always performed by sampling the index or table [21]
      • triggered by either
        • query optimization 
        • execution of a compiled plan
      • involves only a subset of the columns referred to in the query 
      • occurs
        • before query compilation if AUTO_UPDATE_STATISTICS_ASYNC is OFF
        • asynchronously if AUTO_UPDATE_STATISTICS_ASYNC is ON
          • the query that triggered the update proceeds using the old statistics
            • provides more predictable query response time for some workloads [3]
              • particularly the workload with short running queries and very large tables [3]
      • it tracks changes to columns in the statistics
        • {limitation} it doesn’t track changes to columns in the predicate [3]
          • {recommendation} if there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes [3]
      • enable/disable statistics update 
        • database level
          • ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS OFF
          • {limitation} it’s not possible to override the database setting of OFF for auto update statistics by setting it ON at the statistics object level [3]
        • table level
          • NORECOMPUTE option of the UPDATE STATISTICS command 
          • CREATE STATISTICS command
          • sp_autostats
        • index level
          • sp_autostats
        • statistics object level
          • sp_autostats
      • [SQL Server 2005] asynchronous statistics update
        • allows the statistics update operation to be performed on a background thread in a different transaction context
          • avoids the repeating rollback issue [20]
          • the original query continues and uses out-of-date statistical information to compile the query and return it to be executed [20]
          • when the statistics are updated, plans based on those statistics objects are invalidated and are recompiled on their next use [20]
        • {command}
          • ALTER DATABASE... SET AUTO_UPDATE_STATISTICS_ASYNC {ON | OFF}
    • manually
      • {recommendation} don’t update statistics after index defragmentation 
        • if needed, update only the column statistics
      • [system tables] statistics might need to be updated also on the system tables when many objects were created
    • triggers a recompilation of the queries [22]
      • {exception} when a plan is trivial [1]
        • won’t be generated a better or different plan [1]
      • {exception} when the plan is non-trivial but no row modifications since last statistics update [1]
        • no insert, delete or update since last statistics update [1]
        • the procedure cache can be freed manually in case a new plan needs to be generated [1]
      • schema change 
        • dropping an index defined on a table or an indexed view 
          • only if the index is used by the query plan in question
        • [SQL Server 2000] manually updating or dropping a statistic (not creating!) on a table will cause a recompilation of any query plans that use that table
          • the recompilation happens the next time the query plan in question begins execution
        • [SQL Server 2005] dropping a statistic (not creating or updating!) defined on a table will cause a correctness-related recompilation of any query plans that use that table
          • the recompilations happen the next time the query plan in question begins execution
        • updating a statistic (both manual and auto-update) will cause an optimality-related (data related) recompilation of any query plans that uses this statistic 
      • update scenarios 
        • {scenario} query execution times are slow [22]
          • troubleshooting: ensure that queries have up-to-date statistics before performing additional analysis [22] 
        • {scenario} insert operations occur on ascending or descending key columns [22]
          • statistics on ascending or descending key columns (e.g. IDENTITY or real-time timestamp) might require more frequent statistics updates than the query optimizer performs [22]
          • if statistics are not up-to-date and queries select from the most recently added rows, the current statistics will not have cardinality estimates for these new values [22] 
            • inaccurate cardinality estimates and slow query performance [22]
        • after maintenance operations [22]
          • {recommendation} consider updating statistics after performing maintenance procedures that change the distribution of data (e.g. truncating a table, bulk inserts) [22]
            • this can avoid future delays in query processing while queries wait for automatic statistics updates [22]
            • rebuilding, defragmenting, or reorganizing an index do not change the distribution of data [22]
              • no need to update statistics after performing ALTER INDEX REBUILD, DBCC REINDEX, DBCC INDEXDEFRAG, or ALTER INDEX REORGANIZE operations [22]
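
A short sketch of manual statistics updates, reusing the hypothetical dbo.Orders table and statistics objects from above:

    -- update all statistics on a table (sampled by default)
    UPDATE STATISTICS dbo.Orders;

    -- update a single statistics object with a full scan
    UPDATE STATISTICS dbo.Orders stAmount WITH FULLSCAN;

    -- exclude the object from further automatic updates
    UPDATE STATISTICS dbo.Orders stAmount WITH FULLSCAN, NORECOMPUTE;

    -- update the statistics that need updating in the current database
    EXECUTE sp_updatestats;
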
  • {operation} dropping statistics
    • via DROP STATISTICS command
    • {limitation} it’s not possible to drop statistics that are a byproduct of an index [21]
      • such statistics are removed only when the index is dropped [21]
    • aging 
      • [SQL Server 2000] ages the automatically created statistics (only those that are not a byproduct of the index creation)
      • after several automatic updates the column statistics are dropped rather than updated [21]
        • if they are needed in the future, they may be created again [21]
        • there is no substantial cost difference between statistics that are updated and created [21]
      • does not affect user-created statistics  [21]
  • {operation} import (aka statistics import)
    • performed by running generated scripts (see statistics export)
  • {operation} export (aka statistics export)
    • performed via “Generate Scripts” task
      • WITH STATS_STREAM
        • value can be generated by using DBCC SHOW_STATISTICS WITH STATS_STREAM
  • {operation} save/restore statistics
    • not supported
  • {operation} monitoring
    • via DBCC SHOW_STATISTICS
    • via SQL Server Profiler 
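
A sketch of the monitoring options above, reusing the hypothetical dbo.Orders table (sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2/2012 SP1 or later):

    -- header, density vector and histogram of a statistics object
    DBCC SHOW_STATISTICS ('dbo.Orders', stAmount);

    -- only the histogram steps
    DBCC SHOW_STATISTICS ('dbo.Orders', stAmount) WITH HISTOGRAM;

    -- last update date, sampled rows, modification counter
    SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
    FROM sys.stats s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
    WHERE s.object_id = OBJECT_ID('dbo.Orders');
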
  • {option} AUTO_CREATE_STATISTICS
    • when enabled the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan [22]
      • these single-column statistics are created on columns that do not already have a histogram in an existing statistics object [22]
      • it applies strictly to single-column statistics for the full table [22]
      • statistics name starts with _WA
    • does not determine whether statistics get created for indexes [22]
    • does not generate filtered statistics [22]
    • call: SELECT DATABASEPROPERTY('<database_name>','IsAutoCreateStatistics')
    • call: SELECT is_auto_create_stats_on FROM sys.databases WHERE name = '<database_name>'
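
The statistics-related database options (including the update options discussed below) can be set via ALTER DATABASE, e.g. for a hypothetical SalesDb database:

    ALTER DATABASE SalesDb SET AUTO_CREATE_STATISTICS ON;
    ALTER DATABASE SalesDb SET AUTO_UPDATE_STATISTICS ON;
    -- has no effect unless AUTO_UPDATE_STATISTICS is ON
    ALTER DATABASE SalesDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;
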
  • {option} AUTO_UPDATE_STATISTICS
    • when enabled the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query [22]
      • statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view [22]
      • determined by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold [22]
        • the threshold is based on the number of rows in the table or indexed view [22]
      • the query optimizer checks for out-of-date statistics 
        • before compiling a query 
          • the query optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date [22]
        • before executing a cached query plan
          • the Database Engine verifies that the query plan references up-to-date statistics [22]
    • applies to 
      • statistics objects created for indexes, single-columns in query predicates [22]
      • statistics created with the CREATE STATISTICS statement [22]
      • filtered statistics [22]
    • call: SELECT DATABASEPROPERTY('<database_name>','IsAutoUpdateStatistics')
    • call: SELECT is_auto_update_stats_on FROM sys.databases WHERE name = '<database_name>'
  • {option} AUTO_UPDATE_STATISTICS_ASYNC
    • determines whether the query optimizer uses synchronous or asynchronous statistics updates 
    • {default} off
      • the query optimizer updates statistics synchronously
        • queries always compile and execute with up-to-date statistics
        • when statistics are out-of-date, the query optimizer waits for updated statistics before compiling and executing the query [22]
        • {recommendation} use when performing operations that change the distribution of data [22]
          • e.g. truncating a table or performing a bulk update of a large percentage of the rows [22]
          • if the statistics are not updated after completing the operation, using synchronous statistics will ensure statistics are up-to-date before executing queries on the changed data [22]
    • applies to statistics objects created for  [22]
      • indexes
      • single columns in query predicates
      • statistics created with the CREATE STATISTICS statement
    • asynchronous statistics updates
      • queries compile with existing statistics even if the existing statistics are out-of-date
        • the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles [22]
        • queries that compile after the asynchronous updates have completed will benefit from using the updated statistics [22]
      • {warning} it will do nothing if the “Auto Update Statistics” database option isn’t enabled [30]
      • {recommendation} scenarios
        • to achieve more predictable query response times
        • an application frequently executes the same query, similar queries, or similar cached query plans
          • the query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the query optimizer can execute incoming queries without waiting for up-to-date statistics [22]
            • this avoids delaying some queries and not others [22]
        • an application experienced client request time outs caused by one or more queries waiting for updated statistics [22]
          • in some cases waiting for synchronous statistics could cause applications with aggressive time outs to fail [22]
      • call: SELECT is_auto_update_stats_async_on FROM sys.databases WHERE name = '<database_name>'
  • [SQL Server 2014] Auto Create Incremental Statistics
    • when active (ON), the statistics created are per partition statistics [22]
    • {default} when disabled (OFF), the statistics tree is dropped and SQL Server re-computes the statistics [22]
    • this setting overrides the database level INCREMENTAL property [22]
  • [partitions] when new partitions are added to a large table, statistics should be updated to include the new partitions [22]
    • however the time required to scan the entire table (FULLSCAN or SAMPLE option) might be quite long [22]
    • scanning the entire table isn't necessary because only the statistics on the new partitions might be needed [22]
    • the incremental option creates and stores statistics on a per partition basis, and when updated, only refreshes statistics on those partitions that need new statistics [22]
    • if per partition statistics are not supported the option is ignored and a warning is generated [22]
    • not supported for several statistics types
      • statistics created with indexes that are not partition-aligned with the base table [22]
      • statistics created on AlwaysOn readable secondary databases [22]
      • statistics created on read-only databases [22]
      • statistics created on filtered indexes [22]
      • statistics created on views [22]
      • statistics created on internal tables [22]
      • statistics created with spatial indexes or XML indexes [22]
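
A minimal sketch of incremental statistics over a hypothetical partitioned table dbo.OrdersPartitioned (the partition numbers and the SalesDb database are for illustration):

    -- per-partition statistics
    CREATE STATISTICS stAmount
    ON dbo.OrdersPartitioned (Amount)
    WITH FULLSCAN, INCREMENTAL = ON;

    -- refresh only the partitions that changed
    UPDATE STATISTICS dbo.OrdersPartitioned (stAmount)
    WITH RESAMPLE ON PARTITIONS (3, 4);

    -- database-level default for new statistics
    ALTER DATABASE SalesDb SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON);
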

References:
[1] Microsoft Learn (2025) SQL Server: Statistics [link]
[2] Microsoft Learn (2025) SQL Server: Configure auto statistics [link]
[3] Microsoft Learn (2009) SQL Server: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 [link]
[8] Microsoft (2014) Statistical maintenance functionality (autostats) in SQL Server (kb 195565)
[10] CSS SQL Server Engineers (2015) Does rebuild index update statistics? [link]
[16] Thomas Kejser's Database Blog (2011) The Ascending Key Problem in Fact Tables – Part one: Pain!, by Thomas Kejser
[20] Kalen Delaney et al (2013) Microsoft SQL Server 2012 Internals, 2013
[22] MSDN (2016) Statistics https://msdn.microsoft.com/en-us/library/ms190397.aspx
[29] Tips, Tricks, and Advice from the SQL Server Query Optimization Team (2006) UPDATE STATISTICS undocumented options [link]
[31] Microsoft Learn (2025) SQL Server: sys.dm_db_stats_histogram (Transact-SQL) [link]

18 March 2025

🏭🗒️Microsoft Fabric: Statistics in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 18-Mar-2025

[Microsoft Fabric] Statistics
  • {def} objects that contain relevant information about data, to allow query optimizer to estimate plans' costs [1]
    • critical for the warehouse and lakehouse SQL endpoint for executing queries quickly and efficiently [3]
      • when a query is executed, the engine tries to collect existing statistics for certain columns in the query and use that information to assist in choosing an optimal execution plan [3]
      • inaccurate statistics can lead to unoptimized query plans and execution times [5]
  • {type} user-defined statistics
    • statistics defined manually by the users via DDL statement [1]
    • users can create, update and drop statistics
      • via CREATE|UPDATE|DROP STATISTICS
    • users can review the contents of histogram-based single-column statistics [1]
      • via DBCC SHOW_STATISTICS
        • ⇐ only a limited version of these statements is supported [1]
    • {recommendation} focus on columns heavily used in query workloads
      • e.g. GROUP BYs, ORDER BYs, filters, and JOINs
    • {recommendation} consider updating column-level statistics regularly [1]
      • e.g. after data changes that significantly change rowcount or distribution of the data [1]
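
A hedged sketch of the user-defined statistics operations above, assuming a hypothetical dbo.Orders warehouse table:

    -- single-column histogram statistics on a frequently joined column
    CREATE STATISTICS stCustomerId
    ON dbo.Orders (CustomerId) WITH FULLSCAN;

    -- refresh after significant data changes
    UPDATE STATISTICS dbo.Orders (stCustomerId) WITH FULLSCAN;

    -- review the histogram (only a limited version of the statement is supported)
    DBCC SHOW_STATISTICS ('dbo.Orders', stCustomerId);

    -- drop the statistics object
    DROP STATISTICS dbo.Orders.stCustomerId;
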
  • {type} automatic statistics
    • statistics created and maintained automatically by the query engine at query time [1]
    • when a query is issued and query optimizer requires statistics for plan exploration, MF automatically creates those statistics if they don't already exist [1]
      • then the query optimizer can utilize them in estimating the plan costs of the triggering query [1]
      • if the query engine determines that existing statistics relevant to query no longer accurately reflect the data, those statistics are automatically refreshed [1]
        • these automatic operations are done synchronously [1]
          • the query duration includes this time [1]
  • {object type} histogram statistics
    • created per column needing histogram statistics at query time [1]
    • contains histogram and density information regarding the distribution of a particular column [1]
    • similar to the statistics automatically created at query-time in Azure Synapse Analytics dedicated pools [1]
    • name begins with _WA_Sys_.
    • contents can be viewed with DBCC SHOW_STATISTICS
  • {object type} average column length statistics
    • created for variable character (varchar) columns greater than 100 in length that need average column length at query-time [1]
    • contain a value representing the average row size of the varchar column at the time of statistics creation [1]
    • name begins with ACE-AverageColumnLength_
    • contents cannot be viewed and are nonactionable by users [1]
  • {object type} table-based cardinality statistics
    • created per table needing cardinality estimation at query-time [1]
    • contain an estimate of the rowcount of a table [1]
    • named ACE-Cardinality [1]
    • contents cannot be viewed and are nonactionable by user [1]
  • [lakehouse] SQL analytics endpoint
    • uses the same engine as the warehouse to serve high performance, low latency SQL queries [4]
    • {feature} automatic metadata discovery
      • a seamless process that reads the delta logs and the files folder, ensuring that the SQL metadata for tables is always up to date [4]
        • e.g. statistics [4]
  • {limitation} only single-column histogram statistics can be manually created and modified [1]
  • {limitation} multi-column statistics creation is not supported [1]
  • {limitation} other statistics objects might appear in sys.stats
    • besides the statistics created manually/automatically [1]
      • ⇐ the objects are not used for query optimization [1]
  • {limitation} if a transaction has data insertion into an empty table and issues a SELECT before rolling back, the automatically generated statistics can still reflect the uncommitted data, causing inaccurate statistics [5]
    • {recommendation} update statistics for the columns mentioned in the SELECT [5]
  • {recommendation} ensure all table statistics are updated after large DML transactions [2]


References:
[1] Microsoft Learn (2025) Fabric: Statistics in Fabric data warehousing [link]
[2] Microsoft Learn (2025) Fabric: Troubleshoot the Warehouse [link]
[3] Microsoft Fabric Updates Blog (2023) Microsoft Fabric July 2023 Update [link]
[4] Microsoft Learn (2024) Fabric: Better together: the lakehouse and warehouse [link]
[5] Microsoft Learn (2024) Fabric: Transactions in Warehouse tables in Microsoft Fabric [link]

Acronyms:
DDL - Data Definition Language
MF - Microsoft Fabric

17 March 2025

🏭🗒️Microsoft Fabric: Z-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2025

[Microsoft Fabric] Z-Order
  • {def} technique to collocate related information in the same set of files [2]
    • ⇐ reorganizes the layout of each data file so that similar column values are strategically collocated near one another for maximum efficiency [1]
    • {benefit} efficient query performance
      • reduces the amount of data to read for certain queries [2]
        • when the data is appropriately ordered, more files can be skipped [3]
        • particularly important for the ordering of multiple columns [3]
    • {benefit} data skipping
      • automatically skips irrelevant data, further enhancing query speeds
        • via data-skipping algorithms [2]
    • {benefit} flexibility
      • can be applied to multiple columns, making it versatile for various data schemas
    • aims to produce evenly-balanced data files with respect to the number of tuples
      • ⇐ but not necessarily data size on disk [2]
        • ⇐ the two measures are most often correlated [2]
          • ⇐ but there can be situations when that is not the case, leading to skew in optimize task times [2]
    • via ZORDER BY clause 
      • applicable to columns with high cardinality commonly used in query predicates [2]
      • multiple columns can be specified as a comma-separated list
        • {warning} the effectiveness of the locality drops with each extra column [2]
          • has tradeoffs
            • it’s important to analyze query patterns and select the right columns when Z Ordering data [3]
        • {warning} using columns that do not have statistics collected on them is ineffective and wastes resources [2]
          • statistics collection can be configured on certain columns by reordering columns in the schema, or by increasing the number of columns to collect statistics on [2]
      • {characteristic} not idempotent
        • each time it is executed, it will try to create a new clustering of data in all files in a partition [2]
          • it includes new and existing files that were part of previous z-ordering [2]
      • compatible with V-Order (see the sketch below)
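
A hedged sketch in Spark SQL over a hypothetical sales Delta table (the optional WHERE predicate must reference partition columns):

    -- colocate data on high-cardinality columns used in predicates
    OPTIMIZE sales
    ZORDER BY (customerId, orderDate);

    -- optionally restrict the operation to a subset of partitions
    OPTIMIZE sales
    WHERE orderYear = 2025
    ZORDER BY (customerId);
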
    • {concept} [Databricks] liquid clustering 
      • replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance [4] [6]
        • not compatible with the respective features [4] [6]
      • tables created with liquid clustering enabled have numerous Delta table features enabled at creation [4] [6]
      • provides flexibility to redefine clustering keys without rewriting existing data [4] [6]
        • ⇒ allows data layout to evolve alongside analytic needs over time [4] [6]
      • applies to 
        • streaming tables 
        • materialized views
      • {scenario} tables often filtered by high cardinality columns [4] [6]
      • {scenario} tables with significant skew in data distribution [4] [6]
      • {scenario} tables that grow quickly and require maintenance and tuning effort [4] [6]
      • {scenario} tables with concurrent write requirements [4] [6]
      • {scenario} tables with access patterns that change over time [4] [6]
      • {scenario} tables where a typical partition key could leave the table with too many or too few partitions [4] [6]

References:
[1] Bennie Haelen & Dan Davis (2024) Delta Lake Up & Running: Modern Data Lakehouse Architectures with Delta Lake
[2] Delta Lake (2023) Optimizations [link]
[3] Delta Lake (2023) Delta Lake Z Order, by Matthew Powers [link]
[4] Delta Lake (2025) Use liquid clustering for Delta tables [link]
[5] Databricks (2025) Delta Lake table format interoperability [link]
[6] Microsoft Learn (2025) Use liquid clustering for Delta tables [link]

Resources:
[R1] Azure Guru (2024) Z Order in Delta Lake - Part 1 [link]

Acronyms:
MF - Microsoft Fabric 

🏭🗒️Microsoft Fabric: V-Order [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2025

[Microsoft Fabric] V-Order
  • {def} write-time optimization to the parquet file format that enables fast reads under the MF compute engine [2]
    • all parquet engines can read the files as regular parquet files [2]
    • results in smaller and therefore faster files to read [5]
      • {benefit} improves read performance 
      • {benefit} decreases storage requirements
      • {benefit} optimizes resource usage
        • reduces the compute resources required for reading data
          • e.g. network bandwidth, disk I/O, CPU usage
    • still conforms to the open-source Parquet file format [5]
      • the files can be read by non-Fabric tools [5]
    • delta tables created and loaded by Fabric items automatically apply V-Order
      • e.g. data pipelines, dataflows, notebooks [5]
    • delta tables and their features are orthogonal to V-Order [2]
      • e.g. Z-Order, compaction, vacuum, time travel
      • table properties and optimization commands can be used to control the V-Order of the partitions [2]
    • compatible with Z-Order [2]
    • not all files have this optimization applied [5]
      • e.g. Parquet files uploaded to a Fabric lakehouse, or files that are referenced by a shortcut 
      • the files can still be read, but the read performance likely won't be as fast as that of an equivalent Parquet file that's had V-Order applied [5]
    • required by certain features
      • [hash encoding] to assign a numeric identifier to each unique value contained in the column [5]
    • {command} OPTIMIZE (see the sketch below)
      • optimizes a Delta table to coalesce smaller files into larger ones [5]
      • can apply V-Order to compact and rewrite the Parquet files [5]
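
A hedged sketch in Spark SQL over a hypothetical sales Delta table (the table property name may vary across runtime versions):

    -- compact small files and apply V-Order while rewriting them
    OPTIMIZE sales VORDER;

    -- can be combined with Z-Order in the same operation
    OPTIMIZE sales ZORDER BY (customerId) VORDER;

    -- control V-Order at table level via a table property
    ALTER TABLE sales SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true');
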
  • [warehouse] 
    • works by applying certain operations on Parquet files
      • special sorting
      • row group distribution
      • dictionary encoding
      • compression 
    • enabled by default
    • ⇒ compute engines require less network, disk, and CPU resources to read data from storage [1]
      • provides cost efficiency and performance [1]
        • the effect of V-Order on performance can vary depending on tables' schemas, data volumes, query, and ingestion patterns [1]
      • fully-compliant to the open-source parquet format [1]
        • ⇐ all parquet engines can read it as regular parquet files [1]
    • required by certain features
      • [Direct Lake mode] depends on V-Order
    • {operation} disable V-Order
      • causes any new Parquet files produced by the warehouse engine to be created without V-Order optimization [3]
      • irreversible operation
        • ⇐ once disabled, it cannot be enabled again [3]
      • {scenario} write-intensive warehouses
        • warehouses dedicated to staging data as part of a data ingestion process [1]
      • {warning} consider the effect of V-Order on performance before deciding to disable it [1]
        • {recommendation} test how V-Order affects the performance of data ingestion and queries before deciding to disable it [1]
      • via ALTER DATABASE CURRENT SET VORDER = OFF; [3]
    • {operation} check current status
      • via SELECT name, is_vorder_enabled FROM sys.databases; [post]
  • {feature} [lakehouse] Load to Table
    • allows loading a single file or a folder of files to a table [6]
    • tables are always loaded using the Delta Lake table format with V-Order optimization enabled [6]
  • [Direct Lake semantic model] 
    • data is prepared for fast loading into memory [5]
      • makes fewer demands on capacity resources [5]
      • results in faster query performance [5]
        • because less memory needs to be scanned [5]

References:
[1] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
[2] Microsoft Learn (2024) Delta Lake table optimization and V-Order [link]
[3] Microsoft Learn (2024) Disable V-Order on Warehouse in Microsoft Fabric [link]
[4] Miles Cole (2024) To V-Order or Not: Making the Case for Selective Use of V-Order in Fabric Spark [link]
[5] Microsoft Learn (2024) Understand storage for Direct Lake semantic models [link]
[6] Microsoft Learn (2025) Fabric: Load to Delta Lake table [link]

Resources:
[R1] Serverless.SQL (2024) Performance Analysis of V-Ordering in Fabric Warehouse: On or Off?, by Andy Cutler [link]
[R2] Redgate (2023) Microsoft Fabric: Checking and Fixing Tables V-Order Optimization, by Dennes Torres [link]
[R3] Sandeep Pawar (2023) Checking If Delta Table in Fabric is V-order Optimized [link]

Acronyms:
MF - Microsoft Fabric

    🏭🗒️Microsoft Fabric: Caching in Warehouse [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 17-Mar-2024

    [Microsoft Fabric] Caching
    • {def} technique that improves the performance of data processing by storing frequently accessed data and metadata in a faster storage layer [1]
      • e.g. local memory, local SSD disk
      • ⇐ subsequent requests can be served faster, directly from the cache [1]
        • if a set of data has been previously accessed by a query, any subsequent queries will retrieve that data directly from the in-memory cache [1]
        • local memory operations are notably faster compared to fetching data from remote storage [1]
        • ⇐ significantly diminishes IO latency [1]
      • fully transparent to the user
      • consistently active and operates seamlessly in the background [1]
      • orchestrated and maintained by MF
        • it doesn't offer users the capability to manually clear the cache [1] 
      • provides transactional consistency
        • ensures that any modifications to the data in storage after it has been initially loaded into the in-memory cache will result in consistent data [1]
      • when the cache reaches its capacity threshold and fresh data is being read for the first time, objects that have remained unused for the longest duration will be removed from the cache [1]
        • the process creates space for the influx of new data and maintains an optimal cache utilization strategy [1]
    • {type} in-memory cache
      • data in cache is organized in a compressed columnar format (aka columnar storage) [1]
        • ⇐ optimized for analytical queries
        • each column of data is stored separately [1]
          • {benefit} allows for better compression [1]
            • since similar data values are stored together [1]
          • {benefit} reduces the memory footprint
      • when queries need to perform operations on a specific column, the engine can work more efficiently, speeding up query execution [1]
        • it doesn't have to process unnecessary data from other columns
        • can perform operations on multiple columns simultaneously [1]
          • taking advantage of modern multi-core processors [1]
      • when the engine retrieves data from storage, the data is transformed from its original file-based format into highly optimized structures in the in-memory cache [1]
      • {scenario} analytical workloads where queries involve scanning large amounts of data to perform aggregations, filtering, and other data manipulations [1]
    • {type} disk cache
      • complementary extension to the in-memory cache
      • any data loaded into the in-memory cache is also serialized to the SSD cache [1]
        • data removed from the in-memory cache remains within the SSD cache for an extended period
        • when a subsequent query requests the data, it is retrieved from the SSD cache into the in-memory cache more quickly than from remote storage [1]
    • {issue} cold run (aka cold cache) performance
      • the first 1-3 executions of a query perform noticeably slower than subsequent executions [2]
        • if the first run's performance is crucial, try manually creating statistics (aka pre-warming the cache) [2] (see the sketch after this list)
        • otherwise, one can rely on the automatic statistics generated in the first query run and leveraged in subsequent runs [2]
          • ⇐ as long as the underlying data does not change significantly [2]
    • differentiated from
      • [Kusto] caching policy [link]
      • [Apache Spark] intelligent cache [link]
      • [Power BI] query caching [link]
      • [Azure] caching [link]
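
    A minimal T-SQL sketch for pre-warming via manually created statistics, per [2] (dbo.FactSales and SalesAmount are hypothetical names):

    -- create statistics upfront on a frequently queried column (hypothetical names)
    CREATE STATISTICS stats_FactSales_SalesAmount
    ON dbo.FactSales (SalesAmount);

    -- alternatively, a cheap first query triggers the automatic statistics creation
    SELECT COUNT(*)
    FROM dbo.FactSales;
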
      References:
      [1] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]
      [2] Microsoft Learn (2024) Fabric Data Warehouse performance guidelines [link]
      [3] Sandeep Pawar (2023) Pre-Warming The Direct Lake Dataset For Warm Cache Import-Like Performance [link]

      Acronyms:
      IO - Input/Output
      MF - Microsoft Fabric
      SSD - Solid State Drive

      16 March 2025

      💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part XII: Databases)

      After reviewing the server and database properties, the next step is to review the database configurations via the sys.databases metadata object. Such an object exists even for warehouses and lakehouses, as well as for mirrored databases.

      Except for is_vorder_enabled, the last attribute selected in the query, all the other attributes are supported by all the above-mentioned repositories.

      -- database metadata
      SELECT db.name
      , db.database_id
      --, db.source_database_id
      , db.owner_sid
      , db.create_date
      , db.compatibility_level
      , db.collation_name
      --, db.user_access
      , db.user_access_desc
      --, db.is_read_only
      --, db.is_auto_close_on
      --, db.is_auto_shrink_on
      --, db.state
      , db.state_desc
      --, db.is_in_standby
      --, db.is_cleanly_shutdown
      --, db.is_supplemental_logging_enabled
      --, db.snapshot_isolation_state
      , db.snapshot_isolation_state_desc
      --, db.is_read_committed_snapshot_on
      --, db.recovery_model
      , db.recovery_model_desc
      , db.page_verify_option
      , db.page_verify_option_desc
      , db.is_auto_create_stats_on
      --, db.is_auto_create_stats_incremental_on
      , db.is_auto_update_stats_on
      --, db.is_auto_update_stats_async_on
      --, db.is_ansi_null_default_on
      --, db.is_ansi_nulls_on
      --, db.is_ansi_padding_on
      --, db.is_ansi_warnings_on
      --, db.is_arithabort_on
      --, db.is_concat_null_yields_null_on
      --, db.is_numeric_roundabort_on
      --, db.is_quoted_identifier_on
      --, db.is_recursive_triggers_on
      --, db.is_cursor_close_on_commit_on
      --, db.is_local_cursor_default
      , db.is_fulltext_enabled
      --, db.is_trustworthy_on
      , db.is_db_chaining_on
      --, db.is_parameterization_forced
      , db.is_master_key_encrypted_by_server
      --, db.is_query_store_on
      --, db.is_published
      --, db.is_subscribed
      --, db.is_merge_published
      --, db.is_distributor
      --, db.is_sync_with_backup
      , db.service_broker_guid
      --, db.is_broker_enabled
      --, db.log_reuse_wait
      , db.log_reuse_wait_desc
      --, db.is_date_correlation_on
      --, db.is_cdc_enabled
      --, db.is_encrypted
      --, db.is_honor_broker_priority_on
      --, db.replica_id
      --, db.group_database_id
      --, db.resource_pool_id
      --, db.default_language_lcid
      --, db.default_language_name
      --, db.default_fulltext_language_lcid
      --, db.default_fulltext_language_name
      --, db.is_nested_triggers_on
      --, db.is_transform_noise_words_on
      --, db.two_digit_year_cutoff
      --, db.containment
      --, db.containment_desc
      , db.target_recovery_time_in_seconds
      --, db.delayed_durability
      --, db.delayed_durability_desc
      --, db.is_memory_optimized_elevate_to_snapshot_on
      --, db.is_federation_member
      --, db.is_remote_data_archive_enabled
      --, db.is_mixed_page_allocation_on
      , db.is_temporal_history_retention_enabled
      --, db.catalog_collation_type
      , db.catalog_collation_type_desc
      , db.physical_database_name
      --, db.is_result_set_caching_on
      , db.is_accelerated_database_recovery_on
      --, db.is_tempdb_spill_to_remote_store
      --, db.is_stale_page_detection_on
      , db.is_memory_optimized_enabled
      , db.is_data_retention_enabled
      --, db.is_ledger_on
      --, db.is_change_feed_enabled
      --, db.is_data_lake_replication_enabled
      --, db.is_change_streams_enabled
      --, db.data_lake_log_publishing
      , db.data_lake_log_publishing_desc
      , db.is_vorder_enabled
      --, db.is_optimized_locking_on
      FROM sys.databases db
      

      Output (consolidated):

      | Attribute | Warehouse | Mirrored Db | Lakehouse | SQL database |
      |---|---|---|---|---|
      | name | Warehouse Test 001 | MirroredDatabase_1 | Lakehouse_DWH | SQL DB Test... |
      | database_id | 5 | 6 | 7 | 28 |
      | owner_sid | 0x01010000000000051... | 0x01010000000000051... | 0x01010000000000051... | AAAAAAWQAAAAAA... |
      | create_date | 2025-02-22T18:56:28.700 | 2025-03-16T14:57:56.600 | 2025-03-16T15:07:59.563 | 2025-02-22T03:01:53.7130000 |
      | compatibility_level | 160 | 160 | 160 | 160 |
      | collation_name | Latin1_General_100_BIN2_UTF8 | Latin1_General_100_BIN2_UTF8 | Latin1_General_100_BIN2_UTF8 | SQL_Latin1_General_CP1_CI_AS |
      | user_access_desc | MULTI_USER | MULTI_USER | MULTI_USER | MULTI_USER |
      | state_desc | ONLINE | ONLINE | ONLINE | ONLINE |
      | snapshot_isolation_state_desc | ON | ON | ON | ON |
      | recovery_model_desc | SIMPLE | SIMPLE | SIMPLE | FULL |
      | page_verify_option | 0 | 0 | 0 | 2 |
      | page_verify_option_desc | NONE | NONE | NONE | CHECKSUM |
      | is_auto_create_stats_on | 1 | 1 | 1 | 1 |
      | is_auto_update_stats_on | 1 | 1 | 1 | 1 |
      | is_fulltext_enabled | 1 | 1 | 1 | 1 |
      | is_db_chaining_on | 0 | 0 | 0 | 0 |
      | is_master_key_encrypted_by_server | 0 | 0 | 0 | 0 |
      | service_broker_guid | 1F2261FC-5031-... | 7D882362-567E-... | 0D8938AB-BA79-... | 2b74bed3-4405-... |
      | log_reuse_wait_desc | NOTHING | NOTHING | NOTHING | NOTHING |
      | target_recovery_time_in_seconds | 60 | 60 | 60 | 60 |
      | is_temporal_history_retention_enabled | 1 | 1 | 1 | 1 |
      | catalog_collation_type_desc | DATABASE_DEFAULT | DATABASE_DEFAULT | DATABASE_DEFAULT | SQL_Latin1_General_CP1_CI_AS |
      | physical_database_name | Warehouse Test 001 | MirroredDatabase_1 | Lakehouse_DWH | 3f4a3e79-e53e-... |
      | is_accelerated_database_recovery_on | 1 | 1 | 1 | 1 |
      | is_memory_optimized_enabled | 1 | 1 | 1 | 1 |
      | is_data_retention_enabled | 1 | 1 | 1 | 1 |
      | data_lake_log_publishing_desc | AUTO | UNSUPPORTED | UNSUPPORTED | UNSUPPORTED |
      | is_vorder_enabled | 1 | 1 | 1 | 0 |

      The output for the SQL database was formatted slightly differently, and is_vorder_enabled is not available. Otherwise, the above query can be used for all environments.

      All attributes except the last two are known from earlier versions of SQL Server. is_vorder_enabled reflects the current status of V-Order [2], while data_lake_log_publishing_desc reflects the current state of Delta Lake log publishing [3].
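
      For a quick check of just these two Fabric-specific attributes, the above query can be reduced to the following (keeping in mind that is_vorder_enabled is not available for SQL databases):

      -- Fabric-specific attributes
      SELECT db.name
      , db.is_vorder_enabled
      , db.data_lake_log_publishing_desc
      FROM sys.databases db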

      Consolidating the output from the different sources helps identify the differences; one can easily use Excel formulas for this.

      Previous Post <<||>> Next Post

      References:
      [1] Microsoft Learn (2025) SQL Server 2022: sys.databases (Transact-SQL) [link]
      [2] Microsoft Learn (2025) Disable V-Order on Warehouse in Microsoft Fabric [link]
      [3] Microsoft Learn (2025) Delta Lake logs in Warehouse in Microsoft Fabric [link]
