SQL Troubles: January 2023

28 January 2023

💎SQL Reloaded: Temporary Tables and Tempdb in Serverless SQL Pool

In SQL Server, temporary tables are stored in tempdb database and the front end (FE) SQL Server from a serverless SQL Server pool makes no exception from it, however there's no such database listed in the list of databases available. Watching today Brian Bønk's session on "Azure Synapse serverless - CETAS vs Views vs Parquet" at Data Toboggan winter edition 2023, the speaker pointed out that the tables available in tempdb database can be actually queried:

-- Azure Serverless SQL pool: tempdb tables
SELECT top 10 *
FROM tempdb.information_schema.tables

TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	TABLE_TYPE
tempdb	dbo	#dm_continuous_copy_status_..._0000000029C7	BASE TABLE
tempdb	dbo	diff_MonDmSbsConnections	BASE TABLE
tempdb	dbo	diff_MonTceMasterKeys	BASE TABLE
tempdb	dbo	diff_MonTceColEncryptionKey	BASE TABLE
tempdb	dbo	diff_MonAutomaticTuningState	BASE TABLE
tempdb	dbo	diff_MonDmTranActiveTransactions	BASE TABLE
tempdb	dbo	diff_MonTceEnclaveUsageInfo	BASE TABLE
tempdb	dbo	diff_MonTceEnclaveColEncryptionKey	BASE TABLE
tempdb	dbo	diff_MonTceEnclaveMasterKeys	BASE TABLE
tempdb	dbo	diff_MonTceColumnInfo	BASE TABLE

Moreover, the content of the system tables available can be queried, even if some of the tables might have no data:

-- Azure Serverless SQL pool: checking a table's content
SELECT top 10 *
FROM tempdb.dbo.dmv_view_run_history

How about the temporary tables created? To check this, let's reuse a temp table created recently for listing the DMVs available in the serverlss SQL pool:

-- dropping the temp table
DROP TABLE IF EXISTS dbo.#views;

-- create the temp table
CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
)

-- inserting a few records
INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  AND obj.name LIKE 'dm_exec_requests%'

-- checking temp table's content
SELECT *
FROM dbo.#views

ranking	view_name
1	sys.dm_exec_requests
2	sys.dm_exec_requests_history

Here's temp table's metadata:

-- checking temp table's metadata
SELECT *
FROM tempdb.information_schema.tables
WHERE table_name like '#views%'

TABLE_CATALOG	TABLE_SCHEMA	TABLE_NAME	TABLE_TYPE
tempdb	dbo	#views_____..._____00000000295B	BASE TABLE

Of course, you can query also the tempdb.sys.all_objects:

-- checking temp table's metadata
SELECT *
FROM tempdb.sys.all_objects
WHERE name like '#views%'

You can use now the table name returned by any of the two queries to call the table (just replace the name accordingly):

-- querying the table via tempdb (name will differ)
SELECT *
FROM tempdb.dbo.[#views______________________________________________________________________________________________________________00000000295B]

ranking	view_name
1	sys.dm_exec_requests
2	sys.dm_exec_requests_history

The benefit of knowing that is neglectable, however the topic is interesting more for the ones who want to know how the pools in Azure Synapse work, what was kept or removed compared with the standard editions of SQL Server.

Thus, another topic interesting for DBAs would be how many files are created for the tempdb, given that the database is used to store intermediate data for the various operations. Finding the optimal number of tempdb files and configuring them was and still is one of the main concerns when configuring an SQL Server instance for optimal performance.

The old query developed for standard SQL Server seems to work as well:

-- Azure Serverless SQL pool: tempdb files 
SELECT dbf.file_id
, dbf.name file_name
--, dbf.physical_name
, dsp.name file_group
--, type 
, dbf.type_desc file_type
--, dbf.growth growth_kb
, Cast(dbf.growth/128.0  as decimal(18,2)) growth_mb
, dbf.is_percent_growth
--, dbf.max_size max_size_kb
, Cast(NullIf(dbf.max_size, -1)/128.0  as decimal(18,2)) max_size_mb
--, dbf.size file_size_kb
, Cast(dbf.size/128.0 as decimal(18,2)) file_size_mb
, dbf.state_desc 
, dbf.is_read_only 
FROM tempdb.sys.database_files dbf
     LEFT JOIN tempdb.sys.data_spaces dsp
       ON dbf.data_space_id = dsp.data_space_id
ORDER BY dbf.Name

file_id	file_name	file_group	file_type	growth_mb	is_percent_growth	max_size_mb	file_size_mb	state_desc	is_read_only
1	tempdev	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
3	tempdev2	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
4	tempdev3	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
5	tempdev4	PRIMARY	ROWS	32	0	102670	16	ONLINE	0
2	templog	NULL	LOG	64	0	16384	255	ONLINE	0

So, there were created 4 data files of 16 MB each, with a fix growth of 32 MB, and the files can grow up to about 100 GB. What would be interesting to check is how the files grow under heavy workload, what kind of issues this raises, etc. At least in serverless SQL pools many of the views one used to use for troubleshooting aren't accessible. Maybe one doesn't have to go that far, given that the resources are managed automatically.

Notes:

(1) Microsoft recommends not to drop the temporary tables explicitly, but let SQL Server handle this cleanup automatically and take thus advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1].

Happy coding!

Last updated: Oct-2024

Previous Post <<||>> Next Post

References:
[1] Haripriya SB (2024) Do NOT drop #temp tables (link)

25 January 2023

💎SQL Reloaded: Documenting External Tables

When using serverless SQL pool, CETAS (aka Create External Table as Select) are the main mechanism of making data available from the Data Lake for queries. In case is needed to document them, sys.external_tables DMV can be used to export their metadata, much like sys.tables or sys.views can be used for the same:

-- CETAS metadata
SELECT TOP (50) ext.object_id
, ext.name
, schema_name(ext.schema_id) [schema_name]
, ext.type_desc
, ext.location
, ext.data_source_id
, ext.file_format_id
, ext.max_column_id_used
, ext.uses_ansi_nulls
, ext.create_date
, ext.modify_date
FROM sys.external_tables ext

It's interesting that CETAS have type_desc = 'USER_TABLE' in sys.all_objects, same like user-defined tables in SQL Server have:

-- CETAS' metadata via sys.all_objects
SELECT *
FROM sys.all_objects
WHERE object_id = object_id('<schema_name>.<CETAS name>')

The data source and file format can be retrieved via the sys.external_data_sources and sys.external_file_formats DMVs. Moreover, it's useful to include the logic into a view, like the one below:

-- drop view
--DROP VIEW IF EXISTS dbo.vAdminExternalTables

-- create view
CREATE VIEW dbo.vAdminExternalTables
AS
-- external tables - metadata 
SELECT ext.object_id
, sch.name + '.' + ext.name [unique_identifier]
, sch.name [schema]
, ext.name [object]
, ext.type_desc [type]
, ext.max_column_id_used 
, ext.location
, eds.name data_source 
, eff.name file_format 
, ext.create_date 
, ext.modify_date
FROM sys.external_tables ext
     JOIN sys.schemas sch
       ON ext.schema_id = sch.schema_id 
     JOIN sys.external_data_sources eds 
       ON ext.data_source_id = eds.data_source_id
     JOIN sys.external_file_formats eff
       ON ext.file_format_id = eff.file_format_id

-- testing the view
SELECT top 10 *
FROM dbo.vAdminExternalTables

The view can be used then for further queries, for example checking the CETAS created or modified starting with a given date:

-- external tables created after a certain date
SELECT *
FROM dbo.vAdminExternalTables ext
WHERE ext.create_date >= '20230101'
  OR ext.modify_date >= '20230101';

Or, when the CETAS are deployed from one environment to another, one can compare the datasets returned by the same view between environments, something like in the below query:

-- comparison external tables metadata between two databases
SELECT *
FROM (
    SELECT *
    FROM <test_database>.dbo.vAdminExternalTables
    WHERE [Schema] = '<schema_name>'
	) PRD
	FULL OUTER JOIN (
    SELECT *
    FROM <prod_database>.dbo.vAdminExternalTables
    WHERE [Schema] = '<schema_name>'
	) UAT
	ON PRD.[unique_identifier] = UAT.[unique_identifier]
-- WHERE PRD.[unique_identifier] IS NULL OR UAT.[unique_identifier] IS NULL

The definitions for multiple CETAS can be exported from the source database in one step via the Object Explorer Details >> Tables >> External tables >> (select CETAS) >> Script Table as >> ... .

Happy coding!

Previous Post <<||>> Next Post

20 January 2023

💎SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamics Management Views II

Identifying the SQL Server DMVs which are accessible for the Serverless SQL pool (see previous post), allowed me to identify besides sys.dm_exec_requests_history three more DMVs with statistics on the statements executed on the server: sys.dm_request_phases, sys.dm_request_phases_task_group_stats and sys.dm_request_phases_exec_task_stats. Untofurtunately, there seems to be no documentation available on these DMVs, and, at the time the post was written, there were also no further hits on google.com or bing.com found on the same.

sys.dm_request_phases

sys.dm_request_phases provides insights in the phases an execution statement goes through, and seems to summarize the other two views:

-- Azure Serverless SQL pool: request phases
SELECT TOP (100) dist_statement_id
, RPH.dist_request_id
, TRY_CAST(RPH.id as bigint) id
, TRY_CAST(RPH.parent_ids as bigint) parent_ids
, RPH.start_time
, RPH.end_time
--, RPH.total_elapsed_time_ms
--, RPH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
--, RPH.min_time_ms
--, RPH.min_time_ms/1000.0 min_time_sec
--, RPH.max_time_ms
--, RPH.max_time_ms/1000.0 max_time_sec
--, RPH.avg_time_ms
, RPH.avg_time_ms/1000.0 avg_time_sec
--, RPH.stdev_time_ms
--, RPH.stdev_time_ms/1000.0 stdev_time_sec -- it has no values
--, RPH.min_rows
--, RPH.max_rows
--, RPH.avg_rows
--, RPH.stdev_rows -- it has no values
, RPH.total_rows
--, RPH.total_bytes_processed
, RPH.total_bytes_processed/1028.0 total_kb_processed
, RPH.state_desc
, RPH.operation_type
, RPH.input_dop
, RPH.output_dop
, RPH.task_retries
, RPH.error_id
FROM sys.dm_request_phases RPH
ORDER BY Id

dist_statement_id	dist_request_id	id	parent_ids	start_time	end_time	avg_time_sec	total_rows	total_kb_processed	state_desc	operation_type	input_dop	output_dop	task_retries	error_id
8C4386DC...	820E9FC6...	1	2	...09:58:34.213	...09:58:36.337	1.031	2030	2343.310311	succeeded	Shuffle	1	1	0	0
8C4386DC...	820E9FC6...	2	0	...09:58:36.447	...09:58:39.713	1.891	9	7145.193579	succeeded	Return	1	1	0	0
C9524971...	680DCB55...	3	4	...10:05:46.747	...10:05:47.057	0.203	2030	2343.310311	succeeded	Shuffle	1	1	0	0
C9524971...	680DCB55...	4	0	...10:05:47.057	...10:05:48.480	1.406	0	6630.101167	succeeded	Return	1	1	0	0
FD2D17AD...	C9453EF2...	5	6	...11:58:54.060	...11:58:55.297	0.547	10	1534.098249	succeeded	ComputeToControlNode	1	1	0	0
FD2D17AD...	C9453EF2...	6	0	...11:58:55.297	...11:58:55.420	0.125	10	4.074902	succeeded	Return	1	1	0	0
9FB0A268...	CAA533DE...	7	8	...11:59:16.483	...11:59:16.700	0.203	2030	2343.310311	succeeded	Shuffle	1	1	0	0
9FB0A268...	CAA533DE...	8	0	...11:59:16.700	...11:59:18.640	1.922	6	7143.673151	succeeded	Return	1	1	0	0
1732AB0D...	AC1A4F10...	9	10	...11:59:25.950	...11:59:26.140	0.172	2030	2343.310311	succeeded	Shuffle	1	1	0	0
1732AB0D...	AC1A4F10...	10	0	...11:59:26.140	...11:59:27.450	1.297	9	6635.185797	succeeded	Return	1	1	0	0

Notes:
1) The foreign keys and dates (in the above and below queries) were truncated to accomodate all the important attributes in the snapshot of the values returned.
2) Based on the exisitng queries, there are two records for each executed statement, a Shuffle or ComputeToControlNode followed by a Return (see operation_type). In more complex scenario there are several Shuffles and Broadcasts and a Return. According to the Microsoft team, even if for serverless SQL pools there's no Data Movement Service (DMS), there's a similar algorithm responsible for moving the data between the nodes.
3) Because in serverless SQL pool each query has its own distribution statement id, the min, max, avg and total values will have the sames values across the columns. Therefore, the columns with redundant values were commented.
4) The Id of the request phase seems to have numeric values despite being defined as alphanumeric. I tried to cast the values to bigint for sorting purposes.

sys.dm_request_phases_task_group_stats

sys.dm_request_phases_task_group_stats stores metadata about the requests breakdown at task group:

-- Azure Serverless SQL pool: request phases breakdown at task group
SELECT TOP (100) RPT.dist_request_id
, TRY_CAST(RPT.id as bigint) id
, TRY_CAST(RPT.parent_ids as bigint) parent_ids
, RPT.dist_statement_id
, RPT.state_desc
, RPT.start_time
, RPT.end_time
, RPT.input_dop
, RPT.output_dop
, RPT.operation_type
, RPT.task_retries
FROM sys.dm_request_phases_task_group_stats RPT
ORDER BY id

dist_request_id	id	parent_ids	dist_statement_id	state_desc	start_time	end_time	input_dop	output_dop	operation_type	task_retries
820E9FC6...	1	2	8C4386DC...	succeeded	638098055142132551	638098055163382693	1	1	Shuffle	0
820E9FC6...	2	0	8C4386DC...	succeeded	638098055164476163	638098055197133001	1	1	Return	0
680DCB55...	3	4	C9524971...	succeeded	638098059467450021	638098059470574953	1	1	Shuffle	0
680DCB55...	4	0	C9524971...	succeeded	638098059470574953	638098059484793682	1	1	Return	0
C9453EF2...	5	6	FD2D17AD...	succeeded	638098127340607112	638098127352951067	1	1	ComputeToControlNode	0
C9453EF2...	6	0	FD2D17AD...	succeeded	638098127352951067	638098127354202970	1	1	Return	0
CAA533DE...	7	8	9FB0A268...	succeeded	638098127564826084	638098127567013504	1	1	Shuffle	0
CAA533DE...	8	0	9FB0A268...	succeeded	638098127567013504	638098127586388549	1	1	Return	0
AC1A4F10...	9	10	1732AB0D...	succeeded	638098127659513620	638098127661388514	1	1	Shuffle	0
AC1A4F10...	10	0	1732AB0D...	succeeded	638098127661388514	638098127674513601	1	1	Return	0

Notes:
1) The DVM seems to return the same number of records as sys.dm_request_phases.
2) Observe the format of the start_time and end_time, probably the timestamps come from the Spark cluster and were not translated into an SQL Server data type.

sys.dm_request_phases_exec_task_stats

sys.dm_request_phases_exec_task_stats stores metadata about the requests breakdown at task level:

-- Azure Serverless SQL pool: request phases breakdown at task
SELECT TOP (100) RPE.dist_request_id
, TRY_CAST(RPE.id as bigint) id
--, RPE.min_time_ms
--, RPE.max_time_ms
, RPE.avg_time_ms/1000.0 avg_time_sec
--, RPE.stdev_time_ms
, RPE.total_bytes_processed
--, RPE.min_rows
--, RPE.max_rows
--, RPE.avg_rows
--, RPE.stdev_rows
, RPE.total_rows
, RPE.error_id
FROM sys.dm_request_phases_exec_task_stats RPE
ORDER BY id

dist_request_id	id	avg_time_sec	total_kb_processed	total_rows	error_id
820E9FC6...	1	1.031	2343.310311	2030	0
820E9FC6...	2	1.891	7145.193579	9	0
680DCB55...	3	0.203	2343.310311	2030	0
680DCB55...	4	1.406	6630.101167	0	0
C9453EF2...	5	0.547	1534.098249	10	0
C9453EF2...	6	0.125	4.074902	10	0
CAA533DE...	7	0.203	2343.310311	2030	0
CAA533DE...	8	1.922	7143.673151	6	0
AC1A4F10...	9	0.172	2343.310311	2030	0
AC1A4F10...	10	1.297	6635.185797	9	0

What does all this mean?

The lack of documentation makes it challenging to interpret the values of the views besides the data and metadata they offer. In a paper on POLARIS, the code given to the serveless SQL pool engine, a taks is defined as "a careful packaging of data and query processing into units [...] that can be readily moved across compute nodes and re-started at the task level" [1]. Therefore, one can assume that this is the level targetted by the sys.dm_request_phases_exec_task_stats DMV. Further on, the tasks are grouped at phase level according to the sys.dm_request_phases_task_group_stats, the metadata from the two DMVs being further combined into sys.dm_request_phases DMV.

If the meaning is kept from dedicated SQL pools, a shuffle operation indicates that data is moved between the frontend and backend nodes to satisfy a request, while a Result represents the operation of returning the result selt to client. The "ComputeToControlNode" operation involves a simple select (e.g. SELECT top 10) from a CETA and therefore no "Shuffle" is needed.

Requests' history

Further on, one can use the "Distributed statement id" to join the execution request phases with the request history, however matches will be found only for a small subset of the records (probably the executions since the pool started):

-- Azure Serverless SQL pool: requests history with request phase info
SELECT top 100 ERH.status
, ERH.transaction_Id
, ERH.distributed_statement_Id 
, ERH.query_hash 
--, ERH.login_name 
, ERH.start_time
, ERH.end_time 
, ERH.command 
, ERH.query_text 
--, ERH.total_elapsed_time_ms
, ERH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
--, ERH.data_processed_mb
, ERH.data_processed_mb
, RPH.avg_time_ms/1000.0 avg_time_sec
, RPH.total_rows
, RPH.total_bytes_processed/1028.0/1028.0 total_mb_processed
, RPH.state_desc
, RPH.operation_type
, RPH.input_dop
, RPH.output_dop
, RPH.task_retries
, RPH.error_id
, ERH.error
, ERH.error_code 
FROM sys.dm_exec_requests_history ERH
     JOIN sys.dm_request_phases RPH
	   ON ERH.distributed_statement_Id = RPH.dist_statement_id
	  --AND RPH.parent_ids = 0 -- only the parent
ORDER BY RPH.Id DESC

Here's a subset of the result set focusing only on the statistical values:

distr_statement_Id	start_time	end_time	total_elapsed_time_sec	data_processed_mb	avg_time_sec	total_rows	total_mb_processed	operation_type	id	parent_ids
{8C4386D...	...8:24.4300000	...8:39.8266666	15.396	10	1.031	2030	2.279484738326	Shuffle	1	2
{8C4386D...	...8:24.4300000	...8:39.8266666	15.396	10	1.891	9	6.950577411478	Return	2	0
{C952497...	...5:45.2100000	...5:48.4933333	3.283	10	0.203	2030	2.279484738326	Shuffle	3	4
{C952497...	...5:45.2100000	...5:48.4933333	3.283	10	1.406	0	6.449514753891	Return	4	0
{FD2D17A...	...8:52.1400000	...8:55.4166666	3.276	10	0.547	10	1.492313471789	ComputeToControlNode	5	6
{FD2D17A...	...8:52.1400000	...8:55.4166666	3.276	10	0.125	10	0.003963912451	Return	6	0
{9FB0A26...	...9:15.1300000	...9:18.6366666	3.506	10	0.203	2030	2.279484738326	Shuffle	7	8
{9FB0A26...	...9:15.1300000	...9:18.6366666	3.506	10	1.922	6	6.949098395914	Return	8	0
{1732AB0...	...9:24.6900000	...9:27.4500000	2.76	10	0.172	2030	2.279484738326	Shuffle	9	10
{1732AB0...	...9:24.6900000	...9:27.4500000	2.76	10	1.297	9	6.454460892023	Return	10	0

Notes:
As can be seen, the volume of data processed and the elapsed time values don't match between the two tables, though they are close. The differences probably result from further steps occuring in the process.

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Josep Aguilar-Saborit, Raghu Ramakrishnan et al, "POLARIS: The Distributed SQL Engine in Azure Synapse", VLDB Conferences. PVLDB, 13(12): 3204 – 3216, 2020, DOI: https://doi.org/10.14778/3415478.3415545

15 January 2023

💎🏭SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamics Management Views I

I feel sometimes flying blind when I build or troubleshoot SQL queries and I don't have the query plan and/or further statistics to understand how the database engine works, why some queries take longer than expected, etc. Unfortunately, Synapse serverless SQL pool doesn't seem to support showing exection plans in SQL Server Management Studio as per now (SHOWPLAN_XML is not supported for SET). I looked at my old queries based on the the sys.dm_exec_requests and sys.dm_exec_query_stats DMVs, however the results didn't proved to be what I was searching for. (

This weekend, I found Sidney Cirqueira's post on monitoring Synapse serverless SQL pools where he describes how to do that via the Monitoring hub, DMVs, QPI library, respectively Log Analytics. (You should check regularly the Azure Synapse Analytics Blog as it's full of goodies!)

Thus, I found out that there's a new DMV called sys.dm_exec_requests_history which provides at least the duration and the volume of data processes by each statement run on the service:

-- Azure Serverless SQL pool: requests' history 
SELECT top 100 ERH.status
, ERH.transaction_Id
, ERH.distributed_statement_Id 
, ERH.query_hash 
, ERH.login_name 
, ERH.start_time
, ERH.end_time 
, ERH.command 
, ERH.query_text 
--, ERH.total_elapsed_time_ms
, ERH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec

--, ERH.data_processed_mb
, ERH.data_processed_mb/1028.0 data_processed_gb
, ERH.error
, ERH.error_code 
FROM sys.dm_exec_requests_history ERH
ORDER BY ERH.data_processed_mb DESC

It isn't much information, compared with the columns returned by sys.dm_exec_requests, but it's something to start with. At least it allows focusing on the queries with the longest duration (use the above query sorting the records based on the total_elapsed_time_ms descending) or highest volume of data processed:

-- Azure Serverless SQL pool: queries with most data processed
SELECT TOP 50 ERH.query_text  
, COUNT(*) no_runs
, SUM(ERH.total_elapsed_time_ms) total_elapsed_time_ms
, SUM(ERH.data_processed_mb) data_processed_mb
, SUM(ERH.data_processed_mb/1028.0) data_processed_gb
, MIN(ERH.start_time) first_run_date
, MAX(ERH.start_time) last_run_date
FROM sys.dm_exec_requests_history ERH
GROUP BY ERH.query_text
HAVING COUNT(*)>1
ORDER BY data_processed_mb DESC

The same query can be slightly changed to retrieve the volume of data processed by month:

-- Azure Serverless SQL pool: data processed by month
SELECT Convert(nvarchar(7), ERH.start_time, 23) [period]
, COUNT(*) no_runs
, SUM(ERH.total_elapsed_time_ms) total_elapsed_time_ms
, SUM(ERH.data_processed_mb) data_processed_mb
, SUM(ERH.data_processed_mb/1028.0) data_processed_gb
, MIN(ERH.start_time) first_run_date
, MAX(ERH.start_time) last_run_date
FROM sys.dm_exec_requests_history ERH
GROUP BY Convert(nvarchar(7), ERH.start_time, 23)
HAVING COUNT(*)>1
ORDER BY data_processed_mb DESC

One can add in the grouping also the login name to break down the analysis by the login that issued the query. Organization's domain can be used to differentiate between system or organization-baed queries.

The volume of data processed is stored also in the sys.dm_external_data_processed DMV aggregated for the current day, week, respectively month as part of the cost control related feature:

-- Azure Serverless SQL pool: volume of data processed
SELECT type 
, data_processed_mb 
, data_processed_mb/1028.0 data_processed_gb
FROM sys.dm_external_data_processed

And here's how the output looks like:

type	data_processed_mb	data_processed_gb
daily	230	0.223735
weekly	377	0.366731
monthly	223522	217.433852

Notes:

1) I still need to play with the DMVs to understand their scope and limitations.

2) The view appears also in the list of DMVs I idenfitied to be supported the by Synapse serverless SQL pool. As I discoered later, 3 more DMVs are available with useful statistics.

3) ~~The queries based on sys.dm_exec_requests and sys.dm_exec_query_stats DMVs seem to return only the running query based on them.~~ (Actually, the DMVs seem to work.)

4) The view is available also in SQL Server 2022, though it doesn't seem to be used.

5) According to the above-mentioned source, the view is provided for ticket purposes to help customers better troubleshooting the SQL requests. Use the distributed_statement_id in the tickets raised with Microsoft to troubleshoot any issues with Synapse.

6) Unfortunately, also the useful Query Store feature is not yet supported, even if the DMVs related to it seem to be available. Attempting to enable it results in the error:

"Msg 15869, Level 16, State 9, Line 1
QUERY_STORE is not supported for ALTER DATABASE"

Previous Post <<||>> Next Post

Happy coding!

💎🏭SQL Reloaded: Data Management Views for the Synapse serverless SQL pool (& Microsoft Fabric Warehouse)

Unfortunately, the Dynamic Management Views (DMVs) for serverless SQL Server pools don't seem to be documented (or at least I haven't found them in the standard SQL Server documentation). I was thinking some weeks back how I could retrieve them easily as cursors aren't supported in serverless. In the end the old-fashioned loop got the job done (even if might not be the best way to do it):

-- retrieving the data management views in use with the number of records they held
DECLARE @view_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_views int, @iterator int

DROP TABLE IF EXISTS dbo.#views;

CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
)

INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  --AND obj.name LIKE 'dm_exec_requests%'
ORDER BY view_name
SET @iterator = 1
SET @number_views = IsNull((SELECT count(*) FROM #views), 0)

WHILE (@iterator <= @number_views)
BEGIN 
    SET @view_name = (SELECT view_name FROM #views WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @view_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @Query = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		IF IsNull(@number_records, 0)> 0  
		BEGIN
		  SELECT @view_name, @number_records
		END 
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error
        END CATCH;

	SET @iterator = @iterator + 1
END

DROP TABLE IF EXISTS dbo.#views;

As can be seen the code above retrieves the system views and dumps them in a temporary table, then loops through each record and for each record retrieves the number of records available with the sp_executesql. The call to the stored procedure is included in a TRY/CATCH block to surpress the error messages, considering that many standard SQL Server DMVs are not supported. The error messages follow the same pattern:

"Msg 15871, Level 16, State 9, Line 187
DMV (Dynamic Management View) 'dm_resource_governor_resource_pool_volumes' is not supported."

On the instance I tested the code, from a total of 729 DMVs only 171 records were returned, though maybe there are some views not shown because the feature related to them was not yet configured:

View name	Description
INFORMATION_SCHEMA.COLUMNS	Returns one row for each column (*)
INFORMATION_SCHEMA.PARAMETERS	Returns one row for each parameter of a user-defined function or stored procedure (*)
INFORMATION_SCHEMA.ROUTINE_COLUMNS	Returns one row for each column returned by the table-valued functions (*)
INFORMATION_SCHEMA.ROUTINES	Returns one row for each stored procedure and function (*)
INFORMATION_SCHEMA.SCHEMATA	Returns one row for each schema in the current database
INFORMATION_SCHEMA.TABLES	Returns one row for each table or view in the current database (*)
INFORMATION_SCHEMA.VIEW_COLUMN_USAGE	Returns one row for each column in the current database that is used in a view definition
INFORMATION_SCHEMA.VIEW_TABLE_USAGE	Returns one row for each table in the current database that is used in a view
INFORMATION_SCHEMA.VIEWS	Returns one row for each view that can be accessed by the current user in the current database
sys.all_columns
sys.all_objects
sys.all_parameters
sys.all_sql_modules
sys.all_views
sys.allocation_units
sys.assemblies
sys.assembly_files
sys.assembly_types
sys.columns
sys.configurations
sys.credentials
sys.data_spaces
sys.database_automatic_tuning_options
sys.database_automatic_tuning_options_internal
sys.database_credentials
sys.database_files
sys.database_filestream_options
sys.database_mirroring
sys.database_mirroring_endpoints
sys.database_permissions
sys.database_principals
sys.database_query_store_internal_state
sys.database_query_store_options
sys.database_recovery_status
sys.database_resource_governor_workload_groups
sys.database_role_members
sys.database_scoped_configurations
sys.database_scoped_credentials
sys.databases
sys.dm_exec_connections
sys.dm_exec_query_stats
sys.dm_exec_requests	Returns information about each request that is executing in SQL Server.
sys.dm_exec_requests_history	Returns information about each request that executed in SQL Server; provided by Microsoft for troubleshooting.
sys.dm_exec_sessions
sys.dm_external_data_processed
sys.dm_os_host_info
sys.dm_request_phases	Returns information about each request phase performed in request's execution.
sys.dm_request_phases_exec_task_stats	Returns information about each task performed in request's execution.
sys.dm_request_phases_task_group_stats	Returns information aggregated at task group level about each task performed in request's execution.
sys.endpoints
sys.event_notification_event_types
sys.extended_properties
sys.external_data_sources
sys.external_file_formats
sys.external_language_files
sys.external_languages
sys.external_table_columns
sys.external_tables
sys.filegroups
sys.fulltext_document_types
sys.fulltext_languages
sys.fulltext_system_stopwords
sys.identity_columns
sys.index_columns
sys.indexes
sys.internal_tables
sys.key_encryptions
sys.linked_logins
sys.login_token
sys.master_files
sys.messages
sys.objects
sys.parameters
sys.partitions
sys.procedures
sys.query_store_databases_health
sys.query_store_global_health
sys.resource_governor_configuration
sys.resource_governor_external_resource_pools
sys.resource_governor_resource_pools
sys.resource_governor_workload_groups
sys.routes
sys.schemas
sys.securable_classes
sys.server_audit_specification_details
sys.server_audit_specifications
sys.server_audits
sys.server_event_session_actions
sys.server_event_session_events
sys.server_event_session_fields
sys.server_event_session_targets
sys.server_event_sessions
sys.server_memory_optimized_hybrid_buffer_pool_configuration
sys.server_permissions
sys.server_principals
sys.server_role_members
sys.servers
sys.service_contract_message_usages
sys.service_contract_usages
sys.service_contracts
sys.service_message_types
sys.service_queue_usages
sys.service_queues
sys.services
sys.spatial_reference_systems
sys.sql_dependencies
sys.sql_expression_dependencies
sys.sql_logins
sys.sql_modules
sys.stats
sys.stats_columns
sys.symmetric_keys
sys.sysaltfiles
sys.syscacheobjects
sys.syscharsets
sys.syscolumns
sys.syscomments
sys.sysconfigures
sys.syscurconfigs
sys.sysdatabases
sys.sysdepends
sys.sysfilegroups
sys.sysfiles
sys.sysindexes
sys.sysindexkeys
sys.syslanguages
sys.syslockinfo
sys.syslogins
sys.sysmembers
sys.sysmessages
sys.sysobjects
sys.sysoledbusers
sys.sysperfinfo
sys.syspermissions
sys.sysprocesses
sys.sysprotects
sys.sysservers
sys.system_columns
sys.system_components_surface_area_configuration
sys.system_internals_allocation_units
sys.system_internals_partition_columns
sys.system_internals_partitions
sys.system_objects
sys.system_parameters
sys.system_sql_modules
sys.system_views
sys.systypes
sys.sysusers
sys.tables
sys.tcp_endpoints
sys.time_zone_info
sys.trace_categories
sys.trace_columns
sys.trace_event_bindings
sys.trace_events
sys.trace_subclass_values
sys.trigger_event_types
sys.type_assembly_usages
sys.types
sys.user_token
sys.via_endpoints
sys.views
sys.xml_schema_attributes
sys.xml_schema_collections
sys.xml_schema_component_placements
sys.xml_schema_components
sys.xml_schema_facets
sys.xml_schema_model_groups
sys.xml_schema_namespaces
sys.xml_schema_types
sys.xml_schema_wildcards

Notes:
1) As can be seen, also the INFORMATION_SCHEMA views don't seem to be fully supprted.
2) "(*)" in description marks the views that can be accessed by the current user in the current database.

3) I removed the number of records as they are instance specific.
4) The code should work also on a dedicated SQL Server pool.
5) I hope to come back and showcase the usage of some of the most important views.

6) The script can be used for the Microsoft Fabric Warehouse, however each record will be shown in a different panel! One can use an additional temporary table to save the results or extend the views table and update the table with the result, like in the following script:

-- retrieving the data management views in use with the number of records they held
DECLARE @view_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_views int, @iterator int

DROP TABLE IF EXISTS dbo.#views;

CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
, record_count bigint NULL
)

INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
, NULL record_count
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  --AND obj.name LIKE 'dm_exec_requests%'
ORDER BY view_name

SET @iterator = 1
SET @number_views = IsNull((SELECT count(*) FROM #views), 0)

WHILE (@iterator <= @number_views)
BEGIN 
    SET @view_name = (SELECT view_name FROM #views WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @view_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @Query = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		IF IsNull(@number_records, 0)>= 0  
		BEGIN
		  UPDATE #views
                  SET record_count = @number_records
                  WHERE view_name = @view_name
		END 
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error
    END CATCH;

	SET @iterator = @iterator + 1
END

SELECT *
FROM dbo.#views;

DROP TABLE IF EXISTS dbo.#views;

Happy coding!

Previous Post <<||>> Next Post

SQL Troubles

Pages

28 January 2023

💎SQL Reloaded: Temporary Tables and Tempdb in Serverless SQL Pool

25 January 2023

💎SQL Reloaded: Documenting External Tables

20 January 2023

💎SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamics Management Views II

15 January 2023

💎🏭SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamics Management Views I

💎🏭SQL Reloaded: Data Management Views for the Synapse serverless SQL pool (& Microsoft Fabric Warehouse)

About Me