Showing posts with label SQL Reloaded. Show all posts
Showing posts with label SQL Reloaded. Show all posts

06 February 2025

🌌🏭KQL Reloaded: First Steps (Part V: Database Metadata)

When working with a new data repository, one of the first things to do is to look at database's metadata, when available, and try to get a birds eye view of what's available, how big is the databases in terms of size, tables and user-defined objects, how the schema was defined, how the data are stored, eventually how often backup are taken, what users have access and to what, etc. 

So, after creating some queries in KQL and figuring out how things work, I tried to check what metadata are available, how it can be accessed, etc. The target is not to provide a full list of the available metadata, but to understand what information is available, in what format, how easy is to extract the important metadata, etc. 

So, the first set of metadata is related to database:

// get database metadata metadata
.show databases (ContosoSales)

// get database metadata metadata (multiple databases)
.show databases (ContosoSales, Samples)

// get database schema metadata
.show databases (ContosoSales) schema

// get database schema metadata (multiple databases) 
.show databases (ContosoSales, Samples) schema

// get database schema violations metadata
.show database ContosoSales schema violations

// get database entities metadata
.show databases entities with (showObfuscatedStrings=true)
| where DatabaseName == "ContosoSales"

// get database metadata 
.show databases entities with (resolveFunctionsSchema=true)
| where DatabaseName == "ContosoSales" and EntityType == "Table"
//| summarize count () //get the number of tables

// get a function's details
.show databases entities with (resolveFunctionsSchema=true)
| where DatabaseName == "ContosoSales" 
    and EntityType == "Function" 
    and EntityName == "SalesWithParams"

// get external tables metadata
.show external tables

// get materialized views metadata
.show materialized-views

// get query results metadata
.show stored_query_results

// get entities groups metadata
.show entity_groups

Then, it's useful to look at the database objects. 

// get all tables 
.show tables 
//| count

// get tables metadata
.show tables (Customers, NewSales)

// get tables schema
.show table Customers cslschema

// get schema as json
.show table Customers schema as json

// get table size: Customers
Customers
| extend sizeEstimateOfColumn = estimate_data_size(*)
| summarize totalSize_MB=round(sum(sizeEstimateOfColumn)/1024.00/1024.00,2)

Unfortunately, the public environment has restrictions in what concerns the creation of objects, while for the features available one needs to create some objects to query the corresponding metadata.

Furthermore, it would be interesting to understand who has access to the various repositories, what policies were defined, and so on. 

// get principal roles
.show database ContosoSales principal roles

// get principal roles for table
.show table Customers principal roles

// get principal roles for function:
.show function SalesWithParams principal roles

// get retention policies
.show table Customers policy retention

// get sharding policies
.show table Customers policy sharding

There are many more objects one can explore. It makes sense to document the features, respectively the objects used for the various purposes.

In addition, one should check also the best practices available for the data repository (see [2]).

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Management commands overview [link]
[2] Microsoft Learn (2024) Kusto: Best practices for schema management [link]

23 January 2025

💎SQL Reloaded: Number of Records VI (via sp_MSForEachTable Undocumented Stored Procedure)

Starting with SQL Server 2000 it's possible to execute a command via the undocumented stored procedure sp_MSForEachTable for each table available in a database, respectively for subsets of the tables. In a previous post I shown how the stored procedure can be used in several scenarios, including how to get the total number of records in each set of tables. However, the code used generates a result set for each table, which makes it difficult to aggregate the information for further processing. In many scenarios, it would be useful to store the result as a temporary or even persisted table.

-- dropping the tables
DROP TABLE IF EXISTS #Tables
DROP TABLE IF EXISTS #TablesRecordCount

-- create a temporary table to store the input list
SELECT TableName
INTO #Tables 
FROM (VALUES ('Person.Address')
, ('Person.AddressType')
, ('Person.BusinessEntity')) DAT(TableName)


-- create a temporary table to store the results
CREATE TABLE dbo.#TablesRecordCount (
  table_name nvarchar(150) NOT NULL
, number_records bigint
, run_date datetime2(0)
, comment nvarchar(255)
)

-- getting the number of records for the list of tables into the result table
INSERT INTO #TablesRecordCount
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records, GetDate() run_date, ''testing round 1'' comment FROM ?'
, @whereand = ' And Object_id In (Select Object_id(TableName) FROM #Tables)'

-- reviewing the result
SELECT *
FROM #TablesRecordCount
ORDER BY number_records DESC

The above solution uses two temporary tables, though it can be easily adapted to persist the result in a standard table: just replace the "#" with the schema part (e.g. "dbo."). This can be useful in troubleshooting scenarios, when the code is run at different points in time, eventually for different sets of tables. 

The code is pretty simple and can be extended as needed. Unfortunately, there's no guarantee that the sp_MSForEachTable stored procedure will be supported in the next versions of the SQL Server. For example, the stored procedure is not available in SQL databases, respectively in Fabric warehouses. In SQL databases the following error is thrown:

"Msg 2812, Level 16, State 62, Line 1, Could not find stored procedure 'sys.sp_MSForEachTable'."

To test whether the feature works in your environment, it's enough to run a call to the respective stored procedure:

-- retrieve the record count for all tables
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records FROM ?'

Or, you can check whether it works for one table (replace the Person.AddressType table with one from your environment):

-- getting the number of records for the list of tables into another table
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records FROM ?'
, @whereand = ' And Object_id = Object_id(''Person.AddressType'')'

The solution could prove to be useful in multiple scenarios, though one should consider also the risk of being forced to rewrite the code when the used stored procedure becomes unavailable. Even if it takes more time to write, a solution based on cursors can be more feasible (see previous post).

Update 29-Jan-2025: Probably, despite their usefulness, the undocumented features will not be brought to SQL databases (see [1], 47:30). So, be careful about using the respective features as standard solutions in production environments!

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]

💎🏭SQL Reloaded: Number of Records V (via Cursors, a Solution for Warehouses in Microsoft Fabric)

After deploying the sample warehouse available in Microsoft Fabric, I tried to check the number of records available in the deployed tables under the dbo schema. Surprisingly, the sys.partitions.count column has 0 values for all the tables associated with the respective schema (see post). 

There are only a few tables available, and taking a record count for each table should be enough, which is relatively simple with the undocumented sp_MSForEachTable. Unfortunately, this approach doesn't work neither, so one needs to revert to the use of old-fashioned cursors (as I used to do in SQL Server 2000):

-- number of records via cursor
DECLARE @table_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_tables int, @iterator int

DROP TABLE IF EXISTS dbo.#tables;

CREATE TABLE dbo.#tables (
  ranking int NOT NULL
, table_name nvarchar(150) NOT NULL
, number_records bigint
)

INSERT INTO #tables
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) table_name
, NULL number_records
FROM sys.tables obj
WHERE obj.schema_id = schema_id('dbo')
ORDER BY table_name

SET @iterator = 1
SET @number_tables = IsNull((SELECT count(*) FROM #tables), 0)

WHILE (@iterator <= @number_tables)
BEGIN 
    SET @table_name = (SELECT table_name FROM #tables WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @table_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @Query = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		IF IsNull(@number_records, 0)> 0  
		BEGIN
                SET @sql = 'UPDATE #tables' 
             + ' SET number_records = ' + Str(@number_records)
             + ' WHERE table_name = ''' + @table_name + '''';

		 EXEC(@sql)
		END 
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error
        END CATCH;

	SET @iterator = @iterator + 1
END

SELECT *
FROM dbo.#tables;

--DROP TABLE IF EXISTS dbo.#tables;
Results:
ranking table_name number_records
1 dbo.Date 5844
2 dbo.Geography 305179
3 dbo.HackneyLicense 42958
4 dbo.Time 86400
5 dbo.Weather 526330
6 dbo.Trip 2838927
7 dbo.Medallion 13668

Comments:
1) It's a lot of code for a simple task, though the code can be easily duplicated and adapted for similar requirements. Unfortunately, it can lead in time also to many instances of the same code. When possible, one should consider maybe encapsulating the logic in a stored procedure. 
2) It's usually a good idea to check how many records are available in the tables used for testing, as this can impact queries' performance and tables' appropriateness for the tests performed. Moreover, it's a good idea to understand the volume of data when taking over or working with a database. 
3) If one removes the row_number function, the code should run also in SQL Server 2000. Similar solutions were used then for retrieving the record count.
4) Microsoft recommends not to drop the temporary tables explicitly, but let SQL Server handle this cleanup automatically and take thus advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1]..
5) There are others who stumbled over this issue (see [1]).
6) The solution has been tested successfully also in SQL databases.
7) The whole code must be run together because the temporary table seems to have only a transitory scope! An attempt to rerun the last SELECT from #tables raises the error: "Invalid object name '#tables'"

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Koen Verbeeck (2024) Get row counts of all tables in a Microsoft Fabric warehouse [link]
[2] Haripriya SB (2024) Do NOT drop #temp tables (link)

17 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VIII: Permissions) [new feature]

Data-based solutions usually target a set of users who (ideally) have restricted permissions to the functionality. Therefore, as part of the process are defined several personas that target different use cases, for which the permissions must be restricted accordingly. 

In the simplest scenario the user must have access to the underlying objects for querying the data. Supposing that an Entra User was created already, the respective user must be given access also in the Fabric database (see [1], [2]). From database's main menu follow the path to assign read permissions:
Security >> Manage SQL Security >> (select role: db_datareader)

Manage SQL Security
Manage SQL Security

Manage access >> Add >> (search for User)

Manage access
Manage access

(select user) >> Share database >> (select additional permissions) >> Save

Manage additional permissions
Manage additional permissions

The easiest way to test whether the permissions work before building the functionality is to login over SQL Server Management Studio (SSMS) and check the access using the Microsoft Entra MFA. Ideally, one should have a User's credentials that can be used only for testing purposes. After the above setup was done, the new User was able to access the data. 

A second User can be created for testing with the maximum of permissions allowed on the SQL database side, which is useful for troubleshooting. Alternatively, one can use only one User for testing and assign or remove the permissions as needed by the test scenario. 

It's a good idea to try to understand what's happening in the background. For example, the expectation was that for the Entra User created above also a SQL user is created, which doesn't seem to be the case, at least per current functionality available. 

 Before diving deeper, it's useful to retrieve User's details: 

-- retrieve current user
SELECT SUser_Name() sys_user_name
, User_Id() user_id 
, USER_NAME() user_name
, current_user [current_user]
, user [user]; 
Output:
sys_user_name user_id user_name current_user user
JamesClavell@[domain].onmicrosoft.com 0 JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com

Retrieving the current User is useful especially when testing in parallel functionality with different Users. Strangely, User's ID is 0 when only read permissions were assigned. However, a valid User identifier is added for example when to the User is assigned also the db_datawriter role. Removing afterwards the db_datawriter role to the User keeps as expected User's ID. For troubleshooting purposes, at least per current functionality, it might be a good idea to create the Users with a valid User ID (e.g. by assigning temporarily the db_datawriter role to the User). 

The next step is to look at the Users with access to the database:

-- database access 
SELECT USR.uid
, USR.name
--, USR.sid 
, USR.hasdbaccess 
, USR.islogin
, USR.issqluser
--, USR.createdate 
--, USR.updatedate 
FROM sys.sysusers USR
WHERE USR.hasdbaccess = 1
  AND USR.islogin = 1
ORDER BY uid
Output:
uid name hasdbaccess islogin issqluser
1 dbo 1 1 1
6 CharlesDickens@[...].onmicrosoft.com 1 1 0
7 TestUser 1 1 1
9 JamesClavell@[...].onmicrosoft.com 1 1 0

For testing purposes, besides the standard dbo role and two Entra-based roles, it was created also a SQL role to which was granted access to the SalesLT schema (see initial post):

-- create the user
CREATE USER TestUser WITHOUT LOGIN;

-- assign access to SalesLT schema 
GRANT SELECT ON SCHEMA::SalesLT TO TestUser;
  
-- test impersonation (run together)
EXECUTE AS USER = 'TestUser';

SELECT * FROM SalesLT.Customer;

REVERT; 

Notes:
1) Strangely, even if access was given explicitly only to the SalesLT schema, the TestUser User has access also to sys.sysusers and other DMVs. That's valid also for the access over SSMS
2) For the above created User there are no records in the sys.user_token and sys.login_token DMVs, in contrast with the user(s) created for administering the SQL database. 

Let's look at the permissions granted explicitly:

-- permissions granted explicitly
SELECT DPR.principal_id
, DPR.name
, DPR.type_desc
, DPR.authentication_type_desc
, DPE.state_desc
, DPE.permission_name
FROM sys.database_principals DPR
     JOIN sys.database_permissions DPE
	   ON DPR.principal_id = DPE.grantee_principal_id
WHERE DPR.principal_id != 0 -- removing the public user
ORDER BY DPR.principal_id
, DPE.permission_name;
Result:
principal_id name type_desc authentication_type_desc state_desc permission_name
1 dbo SQL_USER INSTANCE GRANT CONNECT
6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT AUTHENTICATE
6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT
7 TestUser SQL_USER NONE GRANT CONNECT
7 TestUser SQL_USER NONE GRANT SELECT
9 JamesClavell@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT

During troubleshooting it might be useful to check current user's permissions at the various levels via sys.fn_my_permissions:

-- retrieve database-scoped permissions for current user
SELECT *
FROM sys.fn_my_permissions(NULL, 'Database');

-- retrieve schema-scoped permissions for current user
SELECT *
FROM sys.fn_my_permissions('SalesLT', 'Schema');

-- retrieve object-scoped permissions for current user
SELECT *
FROM sys.fn_my_permissions('SalesLT.Customer', 'Object')
WHERE permission_name = 'SELECT';

Notes:
1) See also [1] and [4] in what concerns the limitations that apply to managing permissions in SQL databases.

Happy coding!

Previous Post <<||>> Previous Post

References:
[1] Microsoft Learn (2024) Microsoft Fabric: Share your SQL database and manage permissions [link]
[2] Microsoft Learn (2024) Microsoft Fabric: Share data and manage access to your SQL database in Microsoft Fabric  [link]
[3] Microsoft Learn (2024) Authorization in SQL database in Microsoft Fabric [link]
[4] Microsoft Learn (2024) Authentication in SQL database in Microsoft Fabric [link]

[5] Microsoft Fabric Learn (2025) Manage access for SQL databases in Microsoft Fabric with workspace roles and item permissions [link

06 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VII: Things That Don't Work) [new feature]

Microsoft does relatively a good job in documenting what doesn't work in Microsoft Fabric's SQL Databases. There's a good overview available already in the documentation, though beyond this the current post lists my finding while testing the previously written code on this blog,

USE Database

The standard syntax allows to change via USE the database context to the specified database or database snapshot. Unfortunately, this syntax doesn't seem to be supported currently and unfortunately many scripts seem to abuse of it. Thus, the following line of code throws an error:

-- changing the context
USE master;
GO
USE tempdb;

"Msg 40508, Level 16, State 1, Line 1, USE statement is not supported to switch between databases. Use a new connection to connect to a different database"

However, one can use the 3-part naming convention to reference the various objects:

-- sys metadata - retrieving the database files

SELECT *
FROM tempdb.sys.database_files dbf
ORDER BY name;

Even if the tempdb is not listed in the sys.databases table, it's still available for querying, which can prove helpful for troubleshooting. 

DBCC commands 

The documentation warns that some DBCC commands won't work, though in some cases there are also alternatives. For example:

-- clearing the procedure cache via DBCC
DBCC FREEPROCCACHE;
Output:
"Msg 2571, Level 14, State 9, Line 1, User '<user>' does not have permission to run DBCC freeproccache."

Alternatively, one can use the following command, which seems to work:

-- clearing the procedure cash via ALTER
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

CHECKDB, which checks the logical and physical integrity of all the objects in the specified database, can't be used as well:
 
-- Checking the logical and physical integrity of a database
DBCC CHECKDB();
Output:
"Msg 916, Level 14, State 2, Line 1, The server principal "..." is not able to access the database "..." under the current security context."
The same error message is received for CHECKTABLE, utility which checks the integrity of all the pages and structures that make up the table (or indexed view):

-- checking a table's integrity
DBCC CHECKTABLE ('SalesLT.Address');
Output:
"Msg 916, Level 14, State 2, Line 2, The server principal "..." is not able to access the database "..." under the current security context."
A similar error messages is received for SQLPERF, which provides transaction log space usage statistics for all databases:

-- retrieving the LOGSPACE information for all databases
DBCC SQLPERF (LOGSPACE);
Output: 
"Msg 297, Level 16, State 10, Line 1, The user does not have permission to perform this action."
There are however DBCC commands like SHOW_STATISTICS or SHRINKDATABASE which do work. 
 
-- current query optimization statistics
DBCC SHOW_STATISTICS('SalesLT.Address','PK_Address_AddressID');
Output:
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows Persisted Sample Percent
PK_Address_AddressID Dec 21 2024 3:02AM 450 450 197 1 4 NO 450 0

SHRINKDATABASE shrinks the size of the data and log files in the specified database:

-- shrinking database
DBCC SHRINKDATABASE([AdventureWorks01-...]) WITH NO_INFOMSGS;

Update 29-Jan-2025: According to an answer from Ask the Expert session on Fabric Database [3], Microsoft seems to be working in bringing more DBCC features to SQL databases.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server: USE <database> [link]
[2] Microsoft Learn (2024) Database console commands [link]
[3] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VI: Index Usage Analysis) [new feature]

There are several system dynamic management views (DMV) available in SQL Server, Azure SQL Server and now in SQL databases that allow to gather more information about indexes' fragmentation and usage. Let's look at the most important information available based on the indexes create in the previous posts. As the data were probably purged from the views, it's needed to run first the select queries based on the SalesLT.Product from the previous post. This step is important, otherwise the DMVs might return no records!

One starting point is to use the sys.dm_db_index_physical_stats DMV to look at the indexes' size and fragmentation information for a given table (or view). The table is used usually as starting point for analyzing indexes' fragmentation and then defragment the indexes with high fragmentation.

-- sys metadata - index & data size and fragmentation information for the data and indexes of the specified table or view
SELECT --db_name() db_name
--, object_name(IND.object_id) table_name
 IND.name index_name
, IND.type_desc
, IPS.page_count
, IPS.record_count
, IPS.index_level
, Cast(IPS.avg_fragmentation_in_percent as decimal(10,2)) avg_fragmentation_perc
, Cast(IPS.avg_page_space_used_in_percent as decimal(10,2)) space_used_perc
--, IPS.*
FROM sys.indexes IND
     CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), IND.object_id, IND.index_id, NULL, 'DETAILED') IPS
WHERE IND.object_id = OBJECT_ID(N'SalesLT.Product');
Output:
index_name type_desc page_count record_count index_level avg_fragmentation_perc space_used_perc
PK_Product_ProductID CLUSTERED 101 295 0 0.99 87.90
PK_Product_ProductID CLUSTERED 1 101 1 0.00 16.20
AK_Product_rowguid NONCLUSTERED 2 295 0 50.00 74.69
AK_Product_rowguid NONCLUSTERED 1 2 1 0.00 0.59
AK_Product_ProductNumber NONCLUSTERED 2 295 0 50.00 85.79
AK_Product_ProductNumber NONCLUSTERED 1 2 1 0.00 0.49
AK_Product_Name NONCLUSTERED 3 295 0 33.33 87.32
AK_Product_Name NONCLUSTERED 1 3 1 0.00 1.67
IX_SalesLT_Product_Color NONCLUSTERED 1 295 0 0.00 79.24
IX_SalesLT_Product_Color_Size NONCLUSTERED 1 295 0 0.00 94.12
IX_SalesLT_Product_ListPrice_IC NONCLUSTERED 4 295 0 0.00 86.60
IX_SalesLT_Product_ListPrice_IC NONCLUSTERED 1 4 1 0.00 1.01

In a second step one can look at the sys.dm_db_index_usage_stats DMV which provides the counts of the different types of index operations and the time each type of operation was last performed:

-- sys metadata - counts of different types of index operations and the time each type of operation was last performed.
SELECT -- db_name() db_name
--, object_name(IND.object_id) table_name
 IND.name
, IND.type_desc
, IUS.user_seeks 
, IUS.user_scans
, IUS.user_lookups 
, IUS.user_updates
, IUS.last_user_seek
, IUS.last_user_scan 
, IUS.last_user_lookup
, IUS.last_user_update
FROM sys.dm_db_index_usage_stats IUS
     JOIN sys.indexes IND
       ON IUS.index_id = IND.index_id
WHERE IND.object_id = OBJECT_ID(N'SalesLT.Product');
Output:
name type_desc user_seeks user_scans user_lookups user_updates last_user_seek last_user_scan last_user_lookup
PK_Product_ProductID CLUSTERED 0 10 15 0 2025-01-06T14:23:54 2025-01-06T14:23:54
IX_SalesLT_Product_Color_Size NONCLUSTERED 11 0 0 0 2025-01-06T14:23:54
IX_SalesLT_Product_ListPrice_IC NONCLUSTERED 8 0 0 0 2025-01-06T13:38:03

Finally, it might be useful to look also at the sys.dm_db_index_operational_stats DMV which returns the current lower-level I/O, locking, latching, and access method activity for each partition of a table or index in the database (see the documentation for the full list of attrbutes):

-- sys metadata - index operations stats
SELECT -- db_name() db_name
--, object_name(IND.object_id) table_name
 IND.name index_name
, IND.type_desc
, IOS.range_scan_count
, IOS.singleton_lookup_count
, IOS.leaf_insert_count
, IOS.leaf_delete_count
, IOS.leaf_update_count
, IOS.nonleaf_insert_count
, IOS.nonleaf_delete_count
, IOS.nonleaf_update_count
FROM sys.indexes IND
     CROSS APPLY sys.dm_db_index_operational_stats(DB_ID(), IND.object_id, IND.index_id, NULL) IOS
WHERE IND.object_id = OBJECT_ID(N'SalesLT.Product')
 AND IOS.range_scan_count<>0
ORDER BY IND.name;
Output:
index_name type_desc range_scan_count singleton_lookup_count leaf_insert_count leaf_delete_count leaf_update_count nonleaf_insert_count nonleaf_delete_count nonleaf_update_count
IX_SalesLT_Product_Color_Size NONCLUSTERED 11 0 0 0 0 0 0 0
IX_SalesLT_Product_ListPrice_IC NONCLUSTERED 8 0 0 0 0 0 0 0
PK_Product_ProductID CLUSTERED 10 64 0 0 0 0 0 0

For more information on these DMVs check the documentation.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) SQL Server: sys.dm_db_index_physical_stats [link]
[2] Microsoft Learn (2024) SQL Server: sys.dm_db_index_usage_stats [link]
[3] Microsoft Learn (2024) SQL Server: sys.dm_db_index_operational_stats [link]

04 January 2025

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part V: Manual Index Maintenance) [new feature]

Indexes' maintenance in Microsoft Fabric's SQL databases is supposed to happen automatically in the background via automatic tuning options feature, though the whole functionality is still in its early phases, and therefore many questions regarding the whole process may arise. Probably the most important question is whether indexes can still be created, respectively maintained manually. That's useful for temporary or even periodic workloads, where maybe organizations might still want to maintain indexes manually. 

The tests made below are based on the SalesLT.Product from AdventureWorkds database available in Microsoft Fabric. The target was to create several indexes that could be used for the various testing purposes. Each set of the below scripts was run 5-10 times until records appeared in the sys.dm_db_missing_index_details table for each test case (see further below):

-- batch 1: filter on single column (to be run 5-10 times)
SELECT *
FROM SalesLT.Product 
WHERE Color = 'Red'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'Black'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'White'

-- batch 2: filter on two columns (to be run 5-10 times)
SELECT *
FROM SalesLT.Product 
WHERE Color = 'Red'
  AND Size = '58'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'Black'
  AND Size = '58'

SELECT *
FROM SalesLT.Product 
WHERE Color = 'White'
     AND Size = '58'

-- batch 3: filter with column selection (to be run 5-10 times)
SELECT ProductNumber, Name, Color, ListPrice
FROM SalesLT.Product 
WHERE ListPrice BETWEEN 50 AND 55

SELECT ProductNumber, Name, Color, ListPrice
FROM SalesLT.Product 
WHERE ListPrice BETWEEN 100 and 105

Once the scripts run, one can look at the records created in the above considered dynamic management view:

-- sys metadata -  missing indexes
SELECT MID.statement AS table_name
, MID.equality_columns
, MID.inequality_columns
, MID.included_columns
--, MIG.index_group_handle
--, MIG.index_handle
FROM sys.dm_db_missing_index_details MID 
    JOIN sys.dm_db_missing_index_groups MIG 
     ON MID.index_handle =  MIG.index_handle
ORDER BY MIG.index_group_handle
, MIG.index_handle
Output:
table_name equality_columns inequality_columns included_columns
[AdventureWorks01-...].[SalesLT].[Product] [Color]
[AdventureWorks01-...].[SalesLT].[Product] [Color], [Size]
[AdventureWorks01-...].[SalesLT].[Product] [ListPrice] [Name], [ProductNumber], [Color]

The next step is to create one of the indexes (please note that database's name must be replaced accordingly or used only the 2-part naming convention - schema & table name ):

-- create index on Color
CREATE INDEX IX_SalesLT_Product_Color 
ON [AdventureWorks01-...].[SalesLT].[Product] (Color);

Once the script was run, all the records related to the SalesLT.Product disappeared from the dynamic management view. Therefore, it might be a good idea to take a snapshot with view's data before creating any indexes manually. Probably the same behavior should be expected when the indexes are created by the system.

-- create index on Color & Size
CREATE INDEX IX_SalesLT_Product_Color_Size
ON [SalesLT].[Product] (Color, Size);

-- create index on ListPrice with included columns
CREATE INDEX IX_SalesLT_Product_ListPrice_IC
ON [SalesLT].[Product] (ListPrice) INCLUDE(ProductNumber, Name, Color);

One can use the following query based on the meta.vIndexes (created in a previous post) to look at the indexes created:

-- sys metadata - index columns
SELECt IND.db_name
, IND.schema_name
, IND.table_name
, IND.index_name
, IND.index_type
, IND.principal_type
, IND.auto_created
FROM meta.vIndexes IND
WHERE IND.schema_name = 'SalesLT'
  AND IND.table_name = 'Product'
  AND IND.index_name IN ('IX_SalesLT_Product_Color ','IX_SalesLT_Product_Color_Size'
,'IX_SalesLT_Product_ListPrice_IC')
ORDER BY IND.table_name
, IND.index_name
Output:
db_name schema_name table_name index_name index_type principal_type auto_created
AdventureWorks01-... SalesLT Product IX_SalesLT_Product_Color NONCLUSTERED S False
AdventureWorks01-... SalesLT Product IX_SalesLT_Product_Color_Size NONCLUSTERED S False
AdventureWorks01-... SalesLT Product IX_SalesLT_Product_ListPrice_IC NONCLUSTERED S False

After this model can be created further indexes as needed. It's always a good idea to take a "copy" of the indexes created (or keep a history of the scripts run for indexes' maintenance). This best practice is now more important, when the system can drop indexes as it considers fit. 

Don't forget to clean up the changes made if the indexes aren't needed anymore:

-- cleaning after
DROP INDEX IF EXISTS SalesLT.IX_SalesLT_Product_Color;
DROP INDEX IF EXISTS SalesLT.IX_SalesLT_Product_Color_Size;
DROP INDEX IF EXISTS SalesLT.IX_SalesLT_Product_ListPrice_IC;

So, after these tests, the standard syntax for index's maintenance seems to work also on SQL databases, with all the implications deriving from this (e.g. porting of scripts, database objects, etc.)

Happy coding!

Previous Post <<||>> Next Post

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.