SQL Troubles: undocumented feature

Showing posts with label undocumented feature. Show all posts

01 March 2025

💎🤖💫SQL Reloaded: Copilot Stories (Part VII: Using the sp_MSForEachTable Undocumented Stored Procedures in SQL Server)

Probably few know that SQL Server makes available a set of undocumented stored procedures that can be useful in certain scenarios that don't involve code that will deployed into production. One of them is sp_MSForEachTable on which I wrote several posts until now. The stored procedure executes one or more scripts based on a set of constraints against the subset of available tables. The first surprising fact is that Copilot does know the respective stored procedure and seems to correctly generate code for it!

Disclaimer: The below code was generated by Copilot via prompts used for demonstration purposes!

Please note that the following code is provided as it is and you can use it on your own risk! Don't run the code in a productive environment and make sure that you have a usable backup! Please consider all caution when you use a feature marked as undocumented!

Let's start with a simple script and increase the complexity of the queries as it goes!

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure in SQL Server

Output:

-- query that uses this stored procedure to update a specific column in all tables:

EXEC sp_MSForEachTable 'UPDATE ? SET YourColumn = ''NewValue'' WHERE YourCondition'

/*
? is a placeholder for the table name.
YourColumn is the column you want to update.
NewValue is the new value you want to set.
YourCondition is the condition that must be met for the update to occur.*/

As the description says, "?" acts as placeholder for table's name while the other values need to be provided explicitly. The code can be used to overwrite the value of a column that exists in some of the tables. For example, when the data come over an interface from various system, it may happen that there is a mix of upper and lower case values that designate the same value. This might not be an issue for Dynamics AX but can create issues further on.

Here's the actual code for performing the update:

-- update DataAreaId for DMEMF company
EXEC sp_MSForEachTable 'UPDATE ? SET DataAreaId = Upper(DataAreaId) WHERE DataAreaId = ''demf'''

Executing the code blindly can create serious damage, so always test first the code in a test environment and have a backup available! One needs to be careful if there are fields which have same name(s) but different meaning! It's important to perform a thorough analysis in this direction.

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to select records from a set of tables.

Output:

-- return all columns
EXEC sp_MSForEachTable 'SELECT * FROM ?'

-- return specific columns based on condition
EXEC sp_MSForEachTable 'SELECT Column1, Column2 FROM ? WHERE YourCondition'

The second query will result in errors if the columns don't exist, though the execution runs further! The errors can be filtered out in Excel by sorting them accordingly.
The query works in general for combinations of columns available in many tables. It's the case of DataAreaId and RecId available in Dynamics 365 for Finance and Operations for a high percentage of the tables. RecId should be available for all the tables.

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to count all the records from a table

Output:

-- retrieving the number of records for all tables
EXEC sp_MSForEachTable 'SELECT ''?'' AS TableName, COUNT(*) AS RecordCount FROM ?'

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to select all the records from tables containing the word invent.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF ''?'' LIKE ''%invent%'' BEGIN SET @sql = @sql + ''SELECT * FROM '' + ''?'' + ''; '' END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

Here's the working version of the query:

-- retrieve the records for tables having 'invent' intheir name
EXEC sp_MSForEachTable 
    @command1 = 'DECLARE  @sql nvarchar(max) = ''''
		IF ''?'' LIKE ''%invent%'' BEGIN SET @sql = @sql + ''SELECT * FROM '' + ''?'' + ''; '' END;
                    EXEC sp_executesql @sql;'

A few days later, using the same prompt in Edge I got the following answer, simpler and correct:

-- second version run on Edge Copilot
EXEC sp_MSForEachTable 
    @command1 = 'IF ''?'' LIKE ''%invent%'' 
                BEGIN 
                    PRINT ''?'' 
                    EXEC(''SELECT * FROM ?'') 
                END'

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure when the DataAreaId field exists in the table.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = PARSENAME(''?'', 1) AND COLUMN_NAME = ''DataAreaId'') 
                 BEGIN 
                     SET @sql = @sql + ''SELECT * FROM '' + ''?'' + ''; '' 
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

The query uses the metadata available to retrieve the tables in scope.

EXEC sp_MSForEachTable 
    @command1 = ' DECLARE  @sql nvarchar(max) = ''''
	IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = PARSENAME(''?'', 1) AND COLUMN_NAME = ''DataAreaId'') 
        BEGIN 
            SET @sql = @sql + ''SELECT * FROM '' + ''?'' + ''; '' 
			EXEC sp_executesql @sql
        END'

Here's the answer in Edge:

EXEC sp_MSForEachTable 
    @command1 = 'IF EXISTS (SELECT * FROM sys.columns WHERE object_id = OBJECT_ID(''?'' ) AND name = ''DataAreaId'') 
                BEGIN 
                    PRINT ''?'' 
                    EXEC(''SELECT * FROM ?'') 
                END'

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to select records from a set of tables while it checks whether the fields exist in the table.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'DECLARE @table NVARCHAR(128)
                 SET @table = PARSENAME(''?'', 1)
                 IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @table AND COLUMN_NAME = ''Field1'')
                 AND EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @table AND COLUMN_NAME = ''Field2'')
                 BEGIN
                     SET @sql = @sql + ''SELECT Field1, Field2 FROM '' + ''?'' + ''; ''
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

The query used the metadata available in the INFORMATION_SCHHEMA.COLUMNS. Unfortunately, the code doesn't work as expected. The second command is not necessary for performing the task, so the code can be written as follows, and now it works:

EXEC sp_MSForEachTable 
    @command1 = 'DECLARE @table NVARCHAR(128), @sql nvarchar(max) = ''''
                 SET @table = PARSENAME(''?'', 1)
                 IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @table AND COLUMN_NAME = ''DataAreaId'')
                 AND EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @table AND COLUMN_NAME = ''ItemId'')
                 BEGIN
                     SET @sql = @sql + ''SELECT DataAreaId, ItemId FROM '' + ''?'' + ''; ''
                 END
		 EXEC sp_executesql @sql'

The query is easy to modify to include more columns after the same pattern, if such tables exist after all (e.g. in Dynamics 365 one can search for the combination ItemId, DataAreaId and one or more Inventory dimension like InventSiteId, InventLocationId, etc.).

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to count all the records for a subset of tables.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF ''?'' LIKE ''%subset%'' 
                 BEGIN 
                     SET @sql = @sql + ''SELECT ''''?'''' AS TableName, COUNT(*) AS RecordCount FROM '' + ''?'' + ''; '' 
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

Unfortunately, that's not what I was expecting. So, let's give it a second try:

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to count all the records for a subset of the tables available in the SYS.TABLES.

Output:

EXEC sp_MSForEachTable 
    @command1 = 'IF OBJECT_ID(''?'' , ''U'') IN (SELECT OBJECT_ID FROM SYS.TABLES WHERE NAME IN (''YourTable1'', ''YourTable2'', ''YourTable3'')) 
                BEGIN 
                    PRINT ''?'' 
                    EXEC(''SELECT COUNT(*) FROM ?'') 
                END'

Replacing the tables names accordingly, the query becomes

EXEC sp_MSForEachTable 
    @command1 = 'IF OBJECT_ID(''?'' , ''U'') IN (SELECT OBJECT_ID FROM SYS.TABLES WHERE NAME IN (''InventTable'', ''InventDim'', ''InventDimCombination'')) 
                BEGIN 
                    PRINT ''?'' 
                    EXEC(''SELECT COUNT(*) FROM ?'') 
                END'

Comments:
1) Edge Copilot returned the expected solution directly and I wonder what the reason could be? It can be the fact that I run the tests on different machines at different points in time, though more likely the reason is that I used two different Copilots - from Edge, respectively Microsoft 365. The expectation was to get better answers in M365 Copilot, which wasn't the case in the above tests!

The two Copilots have different architectures and were trained with different sets of data, respectively have different sources. One important difference is that Edge Copilot primarily relies on web content and users' interactions within the browser, while M365 Copilot relies on data from the D365 services. Both use LLM and NLP, while M365 Copilot uses Microsoft Graph and Syntex.

One thing I remarked during the tests is that sometimes I get different answers to the same question also within the same Copilot. However, that's expected given that there are multiple ways of addressing the same problem! It would still be interesting to understand how a solution is chosen.

2) I seldom used sp_MSForEachTable with more than one parameter, preferring to prepare the needed statement beforehand, as in the working solutions above.

Happy coding!

Refer4ences:
[1] SQLShack (2017) An introduction to sp_MSforeachtable; run commands iteratively through all tables in a database [link]

Acronyms:
LLM - large language model
M365 - Microsoft 365
NLP - natural language processing

Previous Post <<||>> Next Post

23 January 2025

💎SQL Reloaded: Number of Records VI (via sp_MSForEachTable Undocumented Stored Procedure)

Starting with SQL Server 2000 it's possible to execute a command via the undocumented stored procedure sp_MSForEachTable for each table available in a database, respectively for subsets of the tables. In a previous post I shown how the stored procedure can be used in several scenarios, including how to get the total number of records in each set of tables. However, the code used generates a result set for each table, which makes it difficult to aggregate the information for further processing. In many scenarios, it would be useful to store the result as a temporary or even persisted table.

-- dropping the tables
DROP TABLE IF EXISTS #Tables
DROP TABLE IF EXISTS #TablesRecordCount

-- create a temporary table to store the input list
SELECT TableName
INTO #Tables 
FROM (VALUES ('Person.Address')
, ('Person.AddressType')
, ('Person.BusinessEntity')) DAT(TableName)


-- create a temporary table to store the results
CREATE TABLE dbo.#TablesRecordCount (
  table_name nvarchar(150) NOT NULL
, number_records bigint
, run_date datetime2(0)
, comment nvarchar(255)
)

-- getting the number of records for the list of tables into the result table
INSERT INTO #TablesRecordCount
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records, GetDate() run_date, ''testing round 1'' comment FROM ?'
, @whereand = ' And Object_id In (Select Object_id(TableName) FROM #Tables)'

-- reviewing the result
SELECT *
FROM #TablesRecordCount
ORDER BY number_records DESC

The above solution uses two temporary tables, though it can be easily adapted to persist the result in a standard table: just replace the "#" with the schema part (e.g. "dbo."). This can be useful in troubleshooting scenarios, when the code is run at different points in time, eventually for different sets of tables.

The code is pretty simple and can be extended as needed. Unfortunately, there's no guarantee that the sp_MSForEachTable stored procedure will be supported in the next versions of the SQL Server. For example, the stored procedure is not available in SQL databases, respectively in Fabric warehouses. In SQL databases the following error is thrown:

"Msg 2812, Level 16, State 62, Line 1, Could not find stored procedure 'sys.sp_MSForEachTable'."

To test whether the feature works in your environment, it's enough to run a call to the respective stored procedure:

-- retrieve the record count for all tables
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records FROM ?'

Or, you can check whether it works for one table (replace the Person.AddressType table with one from your environment):

-- getting the number of records for the list of tables into another table
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records FROM ?'
, @whereand = ' And Object_id = Object_id(''Person.AddressType'')'

The solution could prove to be useful in multiple scenarios, though one should consider also the risk of being forced to rewrite the code when the used stored procedure becomes unavailable. Even if it takes more time to write, a solution based on cursors can be more feasible (see previous post).

Update 29-Jan-2025: Probably, despite their usefulness, the undocumented features will not be brought to SQL databases (see [1], 47:30). So, be careful about using the respective features as standard solutions in production environments!

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]

💎🏭SQL Reloaded: Number of Records V (via Cursors, a Solution for Warehouses in Microsoft Fabric)

After deploying the sample warehouse available in Microsoft Fabric, I tried to check the number of records available in the deployed tables under the dbo schema. Surprisingly, the sys.partitions.count column has 0 values for all the tables associated with the respective schema (see post).

There are only a few tables available, and taking a record count for each table should be enough, which is relatively simple with the undocumented sp_MSForEachTable. Unfortunately, this approach doesn't work neither, so one needs to revert to the use of old-fashioned cursors (as I used to do in SQL Server 2000):

-- number of records via cursor
DECLARE @table_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_tables int, @iterator int

DROP TABLE IF EXISTS dbo.#tables;

CREATE TABLE dbo.#tables (
  ranking int NOT NULL
, table_name nvarchar(150) NOT NULL
, number_records bigint
)

INSERT INTO #tables
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) table_name
, NULL number_records
FROM sys.tables obj
WHERE obj.schema_id = schema_id('dbo')
ORDER BY table_name

SET @iterator = 1
SET @number_tables = IsNull((SELECT count(*) FROM #tables), 0)

WHILE (@iterator <= @number_tables)
BEGIN 
    SET @table_name = (SELECT table_name FROM #tables WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @table_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @Query = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		IF IsNull(@number_records, 0)> 0  
		BEGIN
                SET @sql = 'UPDATE #tables' 
             + ' SET number_records = ' + Str(@number_records)
             + ' WHERE table_name = ''' + @table_name + '''';

		 EXEC(@sql)
		END 
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error
        END CATCH;

	SET @iterator = @iterator + 1
END

SELECT *
FROM dbo.#tables;

--DROP TABLE IF EXISTS dbo.#tables;

Results:

ranking	table_name	number_records
1	dbo.Date	5844
2	dbo.Geography	305179
3	dbo.HackneyLicense	42958
4	dbo.Time	86400
5	dbo.Weather	526330
6	dbo.Trip	2838927
7	dbo.Medallion	13668

Comments:
1) It's a lot of code for a simple task, though the code can be easily duplicated and adapted for similar requirements. Unfortunately, it can lead in time also to many instances of the same code. When possible, one should consider maybe encapsulating the logic in a stored procedure.
2) It's usually a good idea to check how many records are available in the tables used for testing, as this can impact queries' performance and tables' appropriateness for the tests performed. Moreover, it's a good idea to understand the volume of data when taking over or working with a database.
3) If one removes the row_number function, the code should run also in SQL Server 2000. Similar solutions were used then for retrieving the record count.
4) Microsoft recommends not to drop the temporary tables explicitly, but let SQL Server handle this cleanup automatically and take thus advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1]..
5) There are others who stumbled over this issue (see [1]).
6) The solution has been tested successfully also in SQL databases.
7) The whole code must be run together because the temporary table seems to have only a transitory scope! An attempt to rerun the last SELECT from #tables raises the error: "Invalid object name '#tables'"

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Koen Verbeeck (2024) Get row counts of all tables in a Microsoft Fabric warehouse [link]
[2] Haripriya SB (2024) Do NOT drop #temp tables (link)

20 January 2023

💎SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamics Management Views II

Identifying the SQL Server DMVs which are accessible for the Serverless SQL pool (see previous post), allowed me to identify besides sys.dm_exec_requests_history three more DMVs with statistics on the statements executed on the server: sys.dm_request_phases, sys.dm_request_phases_task_group_stats and sys.dm_request_phases_exec_task_stats. Untofurtunately, there seems to be no documentation available on these DMVs, and, at the time the post was written, there were also no further hits on google.com or bing.com found on the same.

sys.dm_request_phases

sys.dm_request_phases provides insights in the phases an execution statement goes through, and seems to summarize the other two views:

-- Azure Serverless SQL pool: request phases
SELECT TOP (100) dist_statement_id
, RPH.dist_request_id
, TRY_CAST(RPH.id as bigint) id
, TRY_CAST(RPH.parent_ids as bigint) parent_ids
, RPH.start_time
, RPH.end_time
--, RPH.total_elapsed_time_ms
--, RPH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
--, RPH.min_time_ms
--, RPH.min_time_ms/1000.0 min_time_sec
--, RPH.max_time_ms
--, RPH.max_time_ms/1000.0 max_time_sec
--, RPH.avg_time_ms
, RPH.avg_time_ms/1000.0 avg_time_sec
--, RPH.stdev_time_ms
--, RPH.stdev_time_ms/1000.0 stdev_time_sec -- it has no values
--, RPH.min_rows
--, RPH.max_rows
--, RPH.avg_rows
--, RPH.stdev_rows -- it has no values
, RPH.total_rows
--, RPH.total_bytes_processed
, RPH.total_bytes_processed/1028.0 total_kb_processed
, RPH.state_desc
, RPH.operation_type
, RPH.input_dop
, RPH.output_dop
, RPH.task_retries
, RPH.error_id
FROM sys.dm_request_phases RPH
ORDER BY Id

dist_statement_id	dist_request_id	id	parent_ids	start_time	end_time	avg_time_sec	total_rows	total_kb_processed	state_desc	operation_type	input_dop	output_dop	task_retries	error_id
8C4386DC...	820E9FC6...	1	2	...09:58:34.213	...09:58:36.337	1.031	2030	2343.310311	succeeded	Shuffle	1	1	0	0
8C4386DC...	820E9FC6...	2	0	...09:58:36.447	...09:58:39.713	1.891	9	7145.193579	succeeded	Return	1	1	0	0
C9524971...	680DCB55...	3	4	...10:05:46.747	...10:05:47.057	0.203	2030	2343.310311	succeeded	Shuffle	1	1	0	0
C9524971...	680DCB55...	4	0	...10:05:47.057	...10:05:48.480	1.406	0	6630.101167	succeeded	Return	1	1	0	0
FD2D17AD...	C9453EF2...	5	6	...11:58:54.060	...11:58:55.297	0.547	10	1534.098249	succeeded	ComputeToControlNode	1	1	0	0
FD2D17AD...	C9453EF2...	6	0	...11:58:55.297	...11:58:55.420	0.125	10	4.074902	succeeded	Return	1	1	0	0
9FB0A268...	CAA533DE...	7	8	...11:59:16.483	...11:59:16.700	0.203	2030	2343.310311	succeeded	Shuffle	1	1	0	0
9FB0A268...	CAA533DE...	8	0	...11:59:16.700	...11:59:18.640	1.922	6	7143.673151	succeeded	Return	1	1	0	0
1732AB0D...	AC1A4F10...	9	10	...11:59:25.950	...11:59:26.140	0.172	2030	2343.310311	succeeded	Shuffle	1	1	0	0
1732AB0D...	AC1A4F10...	10	0	...11:59:26.140	...11:59:27.450	1.297	9	6635.185797	succeeded	Return	1	1	0	0

Notes:
1) The foreign keys and dates (in the above and below queries) were truncated to accomodate all the important attributes in the snapshot of the values returned.
2) Based on the exisitng queries, there are two records for each executed statement, a Shuffle or ComputeToControlNode followed by a Return (see operation_type). In more complex scenario there are several Shuffles and Broadcasts and a Return. According to the Microsoft team, even if for serverless SQL pools there's no Data Movement Service (DMS), there's a similar algorithm responsible for moving the data between the nodes.
3) Because in serverless SQL pool each query has its own distribution statement id, the min, max, avg and total values will have the sames values across the columns. Therefore, the columns with redundant values were commented.
4) The Id of the request phase seems to have numeric values despite being defined as alphanumeric. I tried to cast the values to bigint for sorting purposes.

sys.dm_request_phases_task_group_stats

sys.dm_request_phases_task_group_stats stores metadata about the requests breakdown at task group:

-- Azure Serverless SQL pool: request phases breakdown at task group
SELECT TOP (100) RPT.dist_request_id
, TRY_CAST(RPT.id as bigint) id
, TRY_CAST(RPT.parent_ids as bigint) parent_ids
, RPT.dist_statement_id
, RPT.state_desc
, RPT.start_time
, RPT.end_time
, RPT.input_dop
, RPT.output_dop
, RPT.operation_type
, RPT.task_retries
FROM sys.dm_request_phases_task_group_stats RPT
ORDER BY id

dist_request_id	id	parent_ids	dist_statement_id	state_desc	start_time	end_time	input_dop	output_dop	operation_type	task_retries
820E9FC6...	1	2	8C4386DC...	succeeded	638098055142132551	638098055163382693	1	1	Shuffle	0
820E9FC6...	2	0	8C4386DC...	succeeded	638098055164476163	638098055197133001	1	1	Return	0
680DCB55...	3	4	C9524971...	succeeded	638098059467450021	638098059470574953	1	1	Shuffle	0
680DCB55...	4	0	C9524971...	succeeded	638098059470574953	638098059484793682	1	1	Return	0
C9453EF2...	5	6	FD2D17AD...	succeeded	638098127340607112	638098127352951067	1	1	ComputeToControlNode	0
C9453EF2...	6	0	FD2D17AD...	succeeded	638098127352951067	638098127354202970	1	1	Return	0
CAA533DE...	7	8	9FB0A268...	succeeded	638098127564826084	638098127567013504	1	1	Shuffle	0
CAA533DE...	8	0	9FB0A268...	succeeded	638098127567013504	638098127586388549	1	1	Return	0
AC1A4F10...	9	10	1732AB0D...	succeeded	638098127659513620	638098127661388514	1	1	Shuffle	0
AC1A4F10...	10	0	1732AB0D...	succeeded	638098127661388514	638098127674513601	1	1	Return	0

Notes:
1) The DVM seems to return the same number of records as sys.dm_request_phases.
2) Observe the format of the start_time and end_time, probably the timestamps come from the Spark cluster and were not translated into an SQL Server data type.

sys.dm_request_phases_exec_task_stats

sys.dm_request_phases_exec_task_stats stores metadata about the requests breakdown at task level:

-- Azure Serverless SQL pool: request phases breakdown at task
SELECT TOP (100) RPE.dist_request_id
, TRY_CAST(RPE.id as bigint) id
--, RPE.min_time_ms
--, RPE.max_time_ms
, RPE.avg_time_ms/1000.0 avg_time_sec
--, RPE.stdev_time_ms
, RPE.total_bytes_processed
--, RPE.min_rows
--, RPE.max_rows
--, RPE.avg_rows
--, RPE.stdev_rows
, RPE.total_rows
, RPE.error_id
FROM sys.dm_request_phases_exec_task_stats RPE
ORDER BY id

dist_request_id	id	avg_time_sec	total_kb_processed	total_rows	error_id
820E9FC6...	1	1.031	2343.310311	2030	0
820E9FC6...	2	1.891	7145.193579	9	0
680DCB55...	3	0.203	2343.310311	2030	0
680DCB55...	4	1.406	6630.101167	0	0
C9453EF2...	5	0.547	1534.098249	10	0
C9453EF2...	6	0.125	4.074902	10	0
CAA533DE...	7	0.203	2343.310311	2030	0
CAA533DE...	8	1.922	7143.673151	6	0
AC1A4F10...	9	0.172	2343.310311	2030	0
AC1A4F10...	10	1.297	6635.185797	9	0

What does all this mean?

The lack of documentation makes it challenging to interpret the values of the views besides the data and metadata they offer. In a paper on POLARIS, the code given to the serveless SQL pool engine, a taks is defined as "a careful packaging of data and query processing into units [...] that can be readily moved across compute nodes and re-started at the task level" [1]. Therefore, one can assume that this is the level targetted by the sys.dm_request_phases_exec_task_stats DMV. Further on, the tasks are grouped at phase level according to the sys.dm_request_phases_task_group_stats, the metadata from the two DMVs being further combined into sys.dm_request_phases DMV.

If the meaning is kept from dedicated SQL pools, a shuffle operation indicates that data is moved between the frontend and backend nodes to satisfy a request, while a Result represents the operation of returning the result selt to client. The "ComputeToControlNode" operation involves a simple select (e.g. SELECT top 10) from a CETA and therefore no "Shuffle" is needed.

Requests' history

Further on, one can use the "Distributed statement id" to join the execution request phases with the request history, however matches will be found only for a small subset of the records (probably the executions since the pool started):

-- Azure Serverless SQL pool: requests history with request phase info
SELECT top 100 ERH.status
, ERH.transaction_Id
, ERH.distributed_statement_Id 
, ERH.query_hash 
--, ERH.login_name 
, ERH.start_time
, ERH.end_time 
, ERH.command 
, ERH.query_text 
--, ERH.total_elapsed_time_ms
, ERH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
--, ERH.data_processed_mb
, ERH.data_processed_mb
, RPH.avg_time_ms/1000.0 avg_time_sec
, RPH.total_rows
, RPH.total_bytes_processed/1028.0/1028.0 total_mb_processed
, RPH.state_desc
, RPH.operation_type
, RPH.input_dop
, RPH.output_dop
, RPH.task_retries
, RPH.error_id
, ERH.error
, ERH.error_code 
FROM sys.dm_exec_requests_history ERH
     JOIN sys.dm_request_phases RPH
	   ON ERH.distributed_statement_Id = RPH.dist_statement_id
	  --AND RPH.parent_ids = 0 -- only the parent
ORDER BY RPH.Id DESC

Here's a subset of the result set focusing only on the statistical values:

distr_statement_Id	start_time	end_time	total_elapsed_time_sec	data_processed_mb	avg_time_sec	total_rows	total_mb_processed	operation_type	id	parent_ids
{8C4386D...	...8:24.4300000	...8:39.8266666	15.396	10	1.031	2030	2.279484738326	Shuffle	1	2
{8C4386D...	...8:24.4300000	...8:39.8266666	15.396	10	1.891	9	6.950577411478	Return	2	0
{C952497...	...5:45.2100000	...5:48.4933333	3.283	10	0.203	2030	2.279484738326	Shuffle	3	4
{C952497...	...5:45.2100000	...5:48.4933333	3.283	10	1.406	0	6.449514753891	Return	4	0
{FD2D17A...	...8:52.1400000	...8:55.4166666	3.276	10	0.547	10	1.492313471789	ComputeToControlNode	5	6
{FD2D17A...	...8:52.1400000	...8:55.4166666	3.276	10	0.125	10	0.003963912451	Return	6	0
{9FB0A26...	...9:15.1300000	...9:18.6366666	3.506	10	0.203	2030	2.279484738326	Shuffle	7	8
{9FB0A26...	...9:15.1300000	...9:18.6366666	3.506	10	1.922	6	6.949098395914	Return	8	0
{1732AB0...	...9:24.6900000	...9:27.4500000	2.76	10	0.172	2030	2.279484738326	Shuffle	9	10
{1732AB0...	...9:24.6900000	...9:27.4500000	2.76	10	1.297	9	6.454460892023	Return	10	0

Notes:
As can be seen, the volume of data processed and the elapsed time values don't match between the two tables, though they are close. The differences probably result from further steps occuring in the process.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Josep Aguilar-Saborit, Raghu Ramakrishnan et al, "POLARIS: The Distributed SQL Engine in Azure Synapse", VLDB Conferences. PVLDB, 13(12): 3204 – 3216, 2020, DOI: https://doi.org/10.14778/3415478.3415545

13 January 2021

💠🛠️SQL Server: Undocumented (Part IV: DBCC SQLPERF)

Besides the documented LOGSPACE parameter (see previous post), DBCC SQLPERF utility has several undocumented parameters which allow providing statistics about schedulers, threads, spinlocks, IO, network, read-aheads, respectively waits.

Scheduler Statistics

By providing 'umsstats' as parameter, the utility returns as result the visible UMS schedulers on the system:

-- visible UMS schedulers
DBCC SQLPERF(umsstats)

Output:

Statistic	Value
Node Id	0
Avg Sched Load	5
Sched Switches	6903
Sched Pass	6721358
IO Comp Passes	11334
Scheduler ID	0
online	1
num tasks	6
num runnable	0
num workers	9
active workers	6
work queued	0
cntxt switches	7125444
cntxt switches(idle)	9898903
preemptive switches	2304
Scheduler ID	1
online	1
num tasks	6
num runnable	0
num workers	9
active workers	5
work queued	0
cntxt switches	3370432
cntxt switches(idle)	4427991
preemptive switches	22729

The fields have the following meaning:

Statistic	Description
Node Id
Avg Sched Load
Sched Switches	The number of switches between schedulers
Sched Pass
IO Comp Passes
Scheduler ID	The scheduler's zero-based ID number
num tasks	The number of tasks associated with the scheduler
num runnable	The number of workers on the runnable list
num workers	The total number of workers associated with the scheduler
idle workers	The number of idle workers
work queued	The number of items waiting to be processed in the work queue
cntxt switches	The number of switches between workers for the scheduler
cntxt switches(idle)	The number of times the idle loop was switched into

The functionality is useful when one suspects that there are issues related to the visible schedulers.

Detailed information about the schedulers can be found also via the sys.dm_os_schedulers DMV.

Threads Statistics

By providing 'threads' as parameter, the utility returns as result the threads created in the system:

--  thread statistics
DBCC SQLPERF(threads)

Output (just the first records):

Spid	Thread ID	Status	LoginName	IO	CPU	MemUsage
1	11228	background	sa	0	0	0
2	12572	background	sa	0	0	0
3	12576	background	sa	0	0	0
4	11884	background	sa	0	0	0
5	12964	background	sa	0	0	0
6	12960	background	sa	0	0	4
7	12968	background	sa	0	0	0

Detailed information about the schedulers can be found also via the sys.dm_os_threads DMV.

IO Statistics

By providing 'iostats' as parameter, the utility returns as result a count of the outstanding reads, respectively writes:

--  IO statistics
DBCC SQLPERF(iostats)

Output:

Statistic	Value
Reads Outstanding	0
Writes Outstanding	0

Network Statistics

By providing 'netstats' as parameter, the utility returns as result network-related statistics:

--  network statistics
DBCC SQLPERF(netstats)

Output:

Statistic	Value
Network Reads	6976
Network Writes	9036
Network Bytes Read	5318957
Network Bytes Written	2,222512E+07
Command Queue Length	0
Max Command Queue Length	0
Worker Threads	0
Max Worker Threads	0
Network Threads	0
Max Network Threads	0

Read Ahead Statistics

By providing 'rastats' as parameter, the utility returns read-ahead statistics:

--  read ahead statistics
DBCC SQLPERF(rastats)

Output

Statistic	Value
RA Pages Found in Cache	0
RA Pages Placed in Cache	0
RA Physical IO	0
Used Slots	0

Spinlock Statistics

By providing 'spinlockstats' as parameter, the utility returns the spinlock statistics, where a spinlock is a a lightweight synchronization object used to serialize access to data structures which are typically held for a short period of time:

--  spinlock statistics
DBCC SQLPERF(spinlockstats)

Output:

Spinlock Name	Collisions	Spins	Spins/Collision	Sleep Time (ms)	Backoffs
LOCK_RW_TEST	0	0	0	0	0
LOCK_RW_SECURITY_CACHE	711	210500	296,0619	0	54
LOCK_RW_CMED_HASH_SET	5	1750	350	0	1
LOCK_RW_ABTX_HASH_SET	0	0	0	0	0
LOCK_RW_RBIO_REQ	0	0	0	0	0

Detailed information about the spinlock stats can be found also via the sys.dm_os_spinlock_stats DMV.

Wait Statistics

By providing 'waitstats' as parameter, the utility returns the available wait statistics:

-- wait statistics 
DBCC SQLPERF(waitstats)

Output (only a few records):

Wait Type	Requests	Wait Time	Signal Wait Time
LCK_M_SCH_M	18	597	3
LCK_M_S	13	16895	0
PAGEIOLATCH_SH	1789	38568	25
PAGEIOLATCH_UP	118	101	0
PAGEIOLATCH_EX	736	10490	16

Detailed information about the wait stats can be found also via sys.dm_os_wait_stats DMV.

Notes:
As Microsoft warns, the undocumented features shouldn't be used into production environments as they will be deprecated in future versions. Instead should be used the documented DMVs, when available.
All objects mentioned above require VIEW SERVER STATE permissions.
The DBCC SQLPERF utility allows resetting the latch, spinlock, respectively the wait statistics by providing the following parameters (see the SQL Docs for more information):

-- resetting the latch statistics
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)

-- resetting the spinlock statistics
DBCC SQLPERF ('sys.dm_os_spinlock_stats', CLEAR);  

-- resetting the wait statistics
DBCC SQLPERF('sys.dm_os_latch_stats', CLEAR)

DBCC PERFMON provides in a single call the IO, network, read ahead, spinlocks, respectively the wait statistics in distinct resultsets:

-- performance statistics
DBCC PERFMON

Pages

01 March 2025

💎🤖💫SQL Reloaded: Copilot Stories (Part VII: Using the sp_MSForEachTable Undocumented Stored Procedures in SQL Server)

23 January 2025

💎SQL Reloaded: Number of Records VI (via sp_MSForEachTable Undocumented Stored Procedure)

💎🏭SQL Reloaded: Number of Records V (via Cursors, a Solution for Warehouses in Microsoft Fabric)

20 January 2023

💎SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamics Management Views II

13 January 2021

💠🛠️SQL Server: Undocumented (Part IV: DBCC SQLPERF)

About Me