SQL Troubles

15 January 2023

💎🏭SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamic Management Views I

Sometimes I feel like I'm flying blind when I build or troubleshoot SQL queries without the query plan and/or further statistics that help me understand how the database engine works, why some queries take longer than expected, etc. Unfortunately, the Synapse serverless SQL pool doesn't seem to support showing execution plans in SQL Server Management Studio for now (SHOWPLAN_XML is not supported for SET). I looked at my old queries based on the sys.dm_exec_requests and sys.dm_exec_query_stats DMVs, however the results didn't prove to be what I was searching for.

This weekend, I found Sidney Cirqueira's post on monitoring Synapse serverless SQL pools, in which he describes how to do that via the Monitoring hub, DMVs, the QPI library, respectively Log Analytics. (You should regularly check the Azure Synapse Analytics Blog, as it's full of goodies!)

This is how I found out about a new DMV called sys.dm_exec_requests_history, which provides at least the duration and the volume of data processed by each statement run on the service:

-- Azure Serverless SQL pool: requests' history 
SELECT top 100 ERH.status
, ERH.transaction_Id
, ERH.distributed_statement_Id 
, ERH.query_hash 
, ERH.login_name 
, ERH.start_time
, ERH.end_time 
, ERH.command 
, ERH.query_text 
--, ERH.total_elapsed_time_ms
, ERH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec

--, ERH.data_processed_mb
, ERH.data_processed_mb/1024.0 data_processed_gb
, ERH.error
, ERH.error_code 
FROM sys.dm_exec_requests_history ERH
ORDER BY ERH.data_processed_mb DESC

It isn't much information compared with the columns returned by sys.dm_exec_requests, but it's something to start with. At least it allows focusing on the queries with the longest duration (sort the records of the above query by total_elapsed_time_ms descending) or with the highest volume of data processed:

-- Azure Serverless SQL pool: queries with most data processed
SELECT TOP 50 ERH.query_text  
, COUNT(*) no_runs
, SUM(ERH.total_elapsed_time_ms) total_elapsed_time_ms
, SUM(ERH.data_processed_mb) data_processed_mb
, SUM(ERH.data_processed_mb/1024.0) data_processed_gb
, MIN(ERH.start_time) first_run_date
, MAX(ERH.start_time) last_run_date
FROM sys.dm_exec_requests_history ERH
GROUP BY ERH.query_text
HAVING COUNT(*)>1
ORDER BY data_processed_mb DESC

The same query can be slightly changed to retrieve the volume of data processed by month:

-- Azure Serverless SQL pool: data processed by month
SELECT Convert(nvarchar(7), ERH.start_time, 23) [period]
, COUNT(*) no_runs
, SUM(ERH.total_elapsed_time_ms) total_elapsed_time_ms
, SUM(ERH.data_processed_mb) data_processed_mb
, SUM(ERH.data_processed_mb/1024.0) data_processed_gb
, MIN(ERH.start_time) first_run_date
, MAX(ERH.start_time) last_run_date
FROM sys.dm_exec_requests_history ERH
GROUP BY Convert(nvarchar(7), ERH.start_time, 23)
HAVING COUNT(*)>1
ORDER BY data_processed_mb DESC

One can also add the login name to the grouping to break down the analysis by the login that issued the query. The organization's domain can be used to differentiate between system and organization-based queries.
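A sketch of such a breakdown by month and login follows. It assumes that the organization's logins have the form user@domain; contoso.com is a hypothetical placeholder for the organization's domain and needs to be replaced:

```sql
-- Azure Serverless SQL pool: data processed by month and login (sketch)
-- 'contoso.com' is a hypothetical placeholder for the organization's domain
SELECT Convert(nvarchar(7), ERH.start_time, 23) [period]
, ERH.login_name
, CASE WHEN ERH.login_name LIKE '%@contoso.com' THEN 'organization' ELSE 'system/other' END login_type
, COUNT(*) no_runs
, SUM(ERH.total_elapsed_time_ms) total_elapsed_time_ms
, SUM(ERH.data_processed_mb) data_processed_mb
, SUM(ERH.data_processed_mb/1024.0) data_processed_gb
FROM sys.dm_exec_requests_history ERH
GROUP BY Convert(nvarchar(7), ERH.start_time, 23)
, ERH.login_name
ORDER BY [period] DESC
, data_processed_mb DESC
```

The login_type column is just a derived label; grouping on it instead of login_name would aggregate the volume per category.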

The volume of data processed is also stored in the sys.dm_external_data_processed DMV, aggregated for the current day, week, respectively month, as part of the cost control feature:

-- Azure Serverless SQL pool: volume of data processed
SELECT type 
, data_processed_mb 
, data_processed_mb/1024.0 data_processed_gb
FROM sys.dm_external_data_processed

And here's what the output looks like:

type	data_processed_mb	data_processed_gb
daily	230	0.224609
weekly	377	0.368164
monthly	223522	218.283203

Notes:

1) I still need to play with the DMVs to understand their scope and limitations.

2) The view also appears in the list of DMVs I identified as supported by the Synapse serverless SQL pool. As I discovered later, three more DMVs with useful statistics are available.

3) ~~The queries based on sys.dm_exec_requests and sys.dm_exec_query_stats DMVs seem to return only the running query based on them.~~ (Actually, the DMVs seem to work.)

4) The view is also available in SQL Server 2022, though it doesn't seem to be used there.

5) According to the above-mentioned source, the view is provided for support-ticket purposes, to help customers troubleshoot their SQL requests. Include the distributed_statement_id in the tickets raised with Microsoft for any issues with Synapse.

6) Unfortunately, the useful Query Store feature isn't supported yet either, even if the DMVs related to it seem to be available. Attempting to enable it results in the error:

"Msg 15869, Level 16, State 9, Line 1
QUERY_STORE is not supported for ALTER DATABASE"
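As per note 5, the distributed_statement_id of a problematic request can be looked up in the same DMV before raising a ticket. A minimal sketch, in which the filter on query_text is a hypothetical placeholder to be replaced with a fragment of the query being investigated:

```sql
-- Azure Serverless SQL pool: looking up the distributed_statement_id of a request (sketch)
-- '%sales_orders%' is a hypothetical filter, replace it with a fragment of the query in question
SELECT TOP 20 ERH.distributed_statement_id
, ERH.start_time
, ERH.end_time
, ERH.status
, ERH.error_code
, ERH.error
, ERH.query_text
FROM sys.dm_exec_requests_history ERH
WHERE ERH.query_text LIKE N'%sales_orders%'
ORDER BY ERH.start_time DESC
```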


Happy coding!

💎🏭SQL Reloaded: Data Management Views for the Synapse serverless SQL pool (& Microsoft Fabric Warehouse)

Unfortunately, the Dynamic Management Views (DMVs) for serverless SQL pools don't seem to be documented (or at least I haven't found them in the standard SQL Server documentation). Some weeks back I was wondering how I could easily retrieve them, as cursors aren't supported in serverless. In the end, an old-fashioned WHILE loop got the job done (even if it might not be the best way to do it):

-- retrieving the data management views in use with the number of records they hold
DECLARE @view_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_views int, @iterator int

DROP TABLE IF EXISTS dbo.#views;

CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
)

INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  --AND obj.name LIKE 'dm_exec_requests%'
ORDER BY view_name

SET @iterator = 1
SET @number_views = IsNull((SELECT count(*) FROM #views), 0)

WHILE (@iterator <= @number_views)
BEGIN 
    SET @view_name = (SELECT view_name FROM #views WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @view_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @stmt = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		IF IsNull(@number_records, 0)> 0  
		BEGIN
		  SELECT @view_name, @number_records
		END 
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error (e.g. the DMV is not supported)
	END CATCH;

	SET @iterator = @iterator + 1
END

DROP TABLE IF EXISTS dbo.#views;

As can be seen, the code above retrieves the system views into a temporary table, then loops through the records and, for each view, retrieves the number of records it holds via sp_executesql. The call to the stored procedure is wrapped in a TRY/CATCH block to suppress the error messages, given that many standard SQL Server DMVs are not supported. The error messages follow the same pattern:

"Msg 15871, Level 16, State 9, Line 187
DMV (Dynamic Management View) 'dm_resource_governor_resource_pool_volumes' is not supported."
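The CATCH block in the script above swallows every error. If one prefers to suppress only this specific error and still see any other failure, the CATCH block can check ERROR_NUMBER(). A minimal sketch for a single view, assuming that every unsupported DMV raises error 15871 as in the message above:

```sql
-- testing a single view, suppressing only the "DMV is not supported" error (sketch)
DECLARE @number_records bigint;

BEGIN TRY
    EXEC sp_executesql @stmt = N'SELECT @NumberRecords = count(*) FROM sys.dm_resource_governor_resource_pool_volumes'
    , @params = N'@NumberRecords bigint OUTPUT'
    , @NumberRecords = @number_records OUTPUT;

    SELECT 'sys.dm_resource_governor_resource_pool_volumes' view_name
    , @number_records number_records;
END TRY
BEGIN CATCH
    -- 15871: DMV (Dynamic Management View) '...' is not supported
    IF ERROR_NUMBER() <> 15871
        THROW; -- re-raise any other error

    PRINT 'View not supported: ' + ERROR_MESSAGE();
END CATCH;
```

The same check can be placed in the loop's CATCH block, so that permission or syntax errors don't go unnoticed.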

On the instance I tested the code on, only 171 of a total of 729 system views returned records, though some views might not be shown because the features related to them were not yet configured:

View name	Description
INFORMATION_SCHEMA.COLUMNS	Returns one row for each column (*)
INFORMATION_SCHEMA.PARAMETERS	Returns one row for each parameter of a user-defined function or stored procedure (*)
INFORMATION_SCHEMA.ROUTINE_COLUMNS	Returns one row for each column returned by the table-valued functions (*)
INFORMATION_SCHEMA.ROUTINES	Returns one row for each stored procedure and function (*)
INFORMATION_SCHEMA.SCHEMATA	Returns one row for each schema in the current database
INFORMATION_SCHEMA.TABLES	Returns one row for each table or view in the current database (*)
INFORMATION_SCHEMA.VIEW_COLUMN_USAGE	Returns one row for each column in the current database that is used in a view definition
INFORMATION_SCHEMA.VIEW_TABLE_USAGE	Returns one row for each table in the current database that is used in a view
INFORMATION_SCHEMA.VIEWS	Returns one row for each view that can be accessed by the current user in the current database
sys.all_columns
sys.all_objects
sys.all_parameters
sys.all_sql_modules
sys.all_views
sys.allocation_units
sys.assemblies
sys.assembly_files
sys.assembly_types
sys.columns
sys.configurations
sys.credentials
sys.data_spaces
sys.database_automatic_tuning_options
sys.database_automatic_tuning_options_internal
sys.database_credentials
sys.database_files
sys.database_filestream_options
sys.database_mirroring
sys.database_mirroring_endpoints
sys.database_permissions
sys.database_principals
sys.database_query_store_internal_state
sys.database_query_store_options
sys.database_recovery_status
sys.database_resource_governor_workload_groups
sys.database_role_members
sys.database_scoped_configurations
sys.database_scoped_credentials
sys.databases
sys.dm_exec_connections
sys.dm_exec_query_stats
sys.dm_exec_requests	Returns information about each request that is executing in SQL Server.
sys.dm_exec_requests_history	Returns information about each request that executed in SQL Server; provided by Microsoft for troubleshooting.
sys.dm_exec_sessions
sys.dm_external_data_processed
sys.dm_os_host_info
sys.dm_request_phases	Returns information about each request phase performed in request's execution.
sys.dm_request_phases_exec_task_stats	Returns information about each task performed in request's execution.
sys.dm_request_phases_task_group_stats	Returns information aggregated at task group level about each task performed in request's execution.
sys.endpoints
sys.event_notification_event_types
sys.extended_properties
sys.external_data_sources
sys.external_file_formats
sys.external_language_files
sys.external_languages
sys.external_table_columns
sys.external_tables
sys.filegroups
sys.fulltext_document_types
sys.fulltext_languages
sys.fulltext_system_stopwords
sys.identity_columns
sys.index_columns
sys.indexes
sys.internal_tables
sys.key_encryptions
sys.linked_logins
sys.login_token
sys.master_files
sys.messages
sys.objects
sys.parameters
sys.partitions
sys.procedures
sys.query_store_databases_health
sys.query_store_global_health
sys.resource_governor_configuration
sys.resource_governor_external_resource_pools
sys.resource_governor_resource_pools
sys.resource_governor_workload_groups
sys.routes
sys.schemas
sys.securable_classes
sys.server_audit_specification_details
sys.server_audit_specifications
sys.server_audits
sys.server_event_session_actions
sys.server_event_session_events
sys.server_event_session_fields
sys.server_event_session_targets
sys.server_event_sessions
sys.server_memory_optimized_hybrid_buffer_pool_configuration
sys.server_permissions
sys.server_principals
sys.server_role_members
sys.servers
sys.service_contract_message_usages
sys.service_contract_usages
sys.service_contracts
sys.service_message_types
sys.service_queue_usages
sys.service_queues
sys.services
sys.spatial_reference_systems
sys.sql_dependencies
sys.sql_expression_dependencies
sys.sql_logins
sys.sql_modules
sys.stats
sys.stats_columns
sys.symmetric_keys
sys.sysaltfiles
sys.syscacheobjects
sys.syscharsets
sys.syscolumns
sys.syscomments
sys.sysconfigures
sys.syscurconfigs
sys.sysdatabases
sys.sysdepends
sys.sysfilegroups
sys.sysfiles
sys.sysindexes
sys.sysindexkeys
sys.syslanguages
sys.syslockinfo
sys.syslogins
sys.sysmembers
sys.sysmessages
sys.sysobjects
sys.sysoledbusers
sys.sysperfinfo
sys.syspermissions
sys.sysprocesses
sys.sysprotects
sys.sysservers
sys.system_columns
sys.system_components_surface_area_configuration
sys.system_internals_allocation_units
sys.system_internals_partition_columns
sys.system_internals_partitions
sys.system_objects
sys.system_parameters
sys.system_sql_modules
sys.system_views
sys.systypes
sys.sysusers
sys.tables
sys.tcp_endpoints
sys.time_zone_info
sys.trace_categories
sys.trace_columns
sys.trace_event_bindings
sys.trace_events
sys.trace_subclass_values
sys.trigger_event_types
sys.type_assembly_usages
sys.types
sys.user_token
sys.via_endpoints
sys.views
sys.xml_schema_attributes
sys.xml_schema_collections
sys.xml_schema_component_placements
sys.xml_schema_components
sys.xml_schema_facets
sys.xml_schema_model_groups
sys.xml_schema_namespaces
sys.xml_schema_types
sys.xml_schema_wildcards

Notes:

1) As can be seen, the INFORMATION_SCHEMA views don't seem to be fully supported either.

2) "(*)" in the description stands for "that can be accessed by the current user in the current database".

3) I removed the number of records, as they are instance-specific.

4) The code should also work on a dedicated SQL pool.

5) I hope to come back and showcase the usage of some of the most important views.

6) The script can also be used for the Microsoft Fabric Warehouse, however each record will be shown in a separate results panel! One can use an additional temporary table to save the results, or extend the views table and update it with the record counts, as in the following script:

-- retrieving the data management views in use with the number of records they hold
DECLARE @view_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_views int, @iterator int

DROP TABLE IF EXISTS dbo.#views;

CREATE TABLE dbo.#views (
  ranking int NOT NULL
, view_name nvarchar(150) NOT NULL
, record_count bigint NULL
)

INSERT INTO #views
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) view_name
, NULL record_count
FROM sys.all_views obj
WHERE obj.Type = 'V'
  AND obj.is_ms_shipped = 1
  --AND obj.name LIKE 'dm_exec_requests%'
ORDER BY view_name

SET @iterator = 1
SET @number_views = IsNull((SELECT count(*) FROM #views), 0)

WHILE (@iterator <= @number_views)
BEGIN 
    SET @view_name = (SELECT view_name FROM #views WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @view_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @stmt = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		--save the number of records (reached only when the query succeeded)
		UPDATE #views
		SET record_count = @number_records
		WHERE ranking = @iterator
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error (e.g. the DMV is not supported)
	END CATCH;

	SET @iterator = @iterator + 1
END

SELECT *
FROM dbo.#views;

DROP TABLE IF EXISTS dbo.#views;

Happy coding!


08 January 2023

💠🛠️SQL Server: DELETE vs. TRUNCATE TABLE Cheat Sheet

Comparing the DELETE and TRUNCATE TABLE commands involves more than saying that one method is faster than the other, or that one should always use TRUNCATE TABLE when deleting all the records from a table, which is typically not advisable in production environments. I tried to provide an overview of the two commands, though it should be considered "work in progress".

Disclaimer: please refer to the SQL Server Docs for the complete set of features broken down by version.

	DELETE	TRUNCATE TABLE
Definition	DML command that removes one or more rows from a table or view	DDL command that removes all rows from a table or specified partitions of a table, without logging the individual row deletions
Scope	tables, views, memory-optimized tables, common table expressions, MERGE, sp_MSforEachTable, linked servers	tables, partitions
Behavior	• removes rows one at a time and records an entry in the transaction log for each deleted row - for big tables the transaction log fills fast and may reach its limit • fully logged ⇒ rollback supported • executed using row locks, though locks might be escalated to a larger scope ⇒ performance may degrade • [views] deletes the records only from the base table • allows outputting the deleted records (via the OUTPUT clause) • [heaps] pages made empty may remain allocated ⇒ can’t be reused by other objects	• equivalent of a DROP & CREATE TABLE • uses an optimized logging mode: - removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log - a deferred-drop mechanism unhooks the allocations for the table and puts them on the ‘deferred-drop queue’, where a background task later deallocates all the pages and extents • fully logged, however rollback is supported only within explicit transactions • leaves zero pages in the table • resets the identity counter to the seed value
Syntax (simple form)	DELETE [FROM] <table_name> [WHERE <search_condition>] [OPTION (<query_hint>)]	TRUNCATE TABLE <table_name> [WITH (PARTITIONS ({ <partition_number_expression> | <range> } [, ...n]))]
Performance	• degrades with the number of records • [large tables] can cause the transaction log to become full	• the operation completes almost instantaneously • best practice when all rows need to be removed, because it's faster and uses fewer system and transaction log resources
Constraints	• a DELETE may fail if it - violates a trigger - tries to remove a row referenced by data in another table with a FOREIGN KEY constraint • TOP can’t be used in a DELETE statement against partitioned views • doesn’t reset the identity counter	• can’t be used on tables: - referenced by a FOREIGN KEY constraint, except self-references - participating in an indexed view - published using transactional or merge replication - system-versioned temporal - referenced by an EDGE constraint • can’t activate a trigger • can’t be used on views & memory-optimized tables
Permissions	• [minimum] DELETE permission on the target table, and SELECT permission if it includes a WHERE clause • granted by default to - table owner - members of the sysadmin fixed server role - db_owner and db_datawriter fixed database roles • table owners & members of the sysadmin, db_owner & db_securityadmin roles can transfer permissions to other users	• [minimum] ALTER permission on the target table • granted by default to - table owner - members of the sysadmin fixed server role - db_owner and db_ddladmin fixed database roles • permissions are not transferable • can’t be granted directly (workaround: use TRUNCATE TABLE in a stored procedure and grant the required permission to it using the EXECUTE AS clause)
Scenarios	• delete a set of records from a table - based on fixed conditions - based on records from another table - based on a join with a source table • empty a set of tables from a database
Recommendations	• use TRUNCATE TABLE when it's safe to delete all the records ⇒ make sure that a backup or copy of the data is available • [large tables] consider dropping the indexes before performing a DELETE that covers all or most of the data, and recreate them afterwards • [large tables] if the volume of data to be deleted is big compared with the remaining data, consider moving the remaining data to a table with a similar structure, perform a TRUNCATE and then move the data back (see [5]) • [large tables] consider deleting data in batches with log truncation in single-user mode (be careful in production environments) • [heaps] specify the TABLOCK hint in the DELETE statement • [heaps] create a clustered index on the heap before deleting the rows
Myths	• TRUNCATE TABLE is a non-logged operation (see [4]) • TRUNCATE TABLE is a minimally logged operation (see [3])
Related concepts	non-logged/minimally-logged/fully-logged operations, deferred-drop mechanism, sp_MSforEachTable stored procedure, transaction log, MERGE, DDL, DML
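The identity and rollback entries of the cheat sheet can be checked with a small experiment on a temporary table. A sketch (the #demo table and its values are hypothetical), which should return 4, 1 and 1:

```sql
-- DELETE vs. TRUNCATE TABLE: identity counter and rollback (sketch on a temporary table)
DROP TABLE IF EXISTS #demo;

CREATE TABLE #demo (
  id int IDENTITY(1,1) NOT NULL
, val nvarchar(10) NOT NULL
);

INSERT INTO #demo (val) VALUES (N'a'), (N'b'), (N'c');

-- DELETE removes the rows, but the identity counter continues from the last value
DELETE FROM #demo;
INSERT INTO #demo (val) VALUES (N'd');
DECLARE @id_after_delete int = (SELECT MAX(id) FROM #demo);

-- TRUNCATE TABLE resets the identity counter to the seed
TRUNCATE TABLE #demo;
INSERT INTO #demo (val) VALUES (N'e');
DECLARE @id_after_truncate int = (SELECT MAX(id) FROM #demo);

-- within an explicit transaction, TRUNCATE TABLE can be rolled back
BEGIN TRANSACTION;
    TRUNCATE TABLE #demo;
ROLLBACK TRANSACTION;
DECLARE @rows_after_rollback int = (SELECT COUNT(*) FROM #demo);

SELECT @id_after_delete id_after_delete
, @id_after_truncate id_after_truncate
, @rows_after_rollback rows_after_rollback;

DROP TABLE IF EXISTS #demo;
```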

Resources:
[1] Microsoft SQL Docs (2022) DELETE [link]
[2] Microsoft SQL Docs (2022) TRUNCATE TABLE [link]
[3] Microsoft TechNet (2017) SQL Server: Understanding Minimal Logging Under Bulk-Logged Recovery Model vs. Logging in Truncate Operation [link]
[4] Paul Randal (2013) The Myth that DROP and TRUNCATE TABLE are Non-Logged [link]
[5] SQL-troubles (2018) ERP Systems: Dynamics AX 2009 – Deleting Obsolete Companies [link]
[6] Microsoft TechNet (2014) SQL Server: An Examination of Logging in Truncate Table Statement and Its Comparison with Delete Statement [link]

19 November 2022

💎🏭SQL Reloaded: Tricks with Strings via STRING_SPLIT, PATINDEX and TRANSLATE

Searching for a list of words within a column can be easily achieved by using the LIKE operator:

-- searching for several words via LIKE (SQL Server 2000+)
SELECT * 
FROM Production.Product 
WHERE Name LIKE '%chain%'
   OR Name LIKE '%lock%'
   OR Name LIKE '%rim%'
   OR Name LIKE '%spindle%'

The search is quite efficient, though even if an index is defined on the column, the leading wildcards prevent an index seek, so an index scan (most likely a clustered index scan) will be chosen.

If the list of strings to search for grows, the query becomes at least harder to maintain. Regular expressions could be a solution; unfortunately, SQL Server has its limitations when working with patterns. For example, it doesn't have a REGEXP_LIKE function like the one available in Oracle, which would be used as follows (not tested):

-- Oracle 
SELECT * 
FROM Production.Product 
WHERE REGEXP_LIKE(lower(Name), 'chain|lock|rim|spindle')

However, there's a PATINDEX function which returns the position of a pattern within a string, and which uses the same wildcards that can be used with the LIKE operator:

-- searching for a value via PATINDEX (SQL Server 2000+)
SELECT * 
FROM [Production].[Product] 
WHERE PATINDEX('%rim%', Name)>0
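A small self-contained check of PATINDEX's behavior, using sample strings instead of Production.Product (the values are lowercase, so the result doesn't depend on the collation's case sensitivity):

```sql
-- PATINDEX returns the 1-based position of the first match, or 0 when there is none
SELECT DAT.txt
, PATINDEX('%rim%', DAT.txt) pos_rim
, PATINDEX('%[0-9]%', DAT.txt) pos_first_digit -- LIKE-style character class
FROM (VALUES ('hl road rim')
, ('chain 10')
, ('lock')) DAT(txt);
```

On these values, the query should return 9 and 0 for 'hl road rim', 0 and 7 for 'chain 10', respectively 0 and 0 for 'lock'.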

Even if PATINDEX accepts only one pattern at a time, retrieving the values from a table or a table-valued function (TVF) does the trick. If the values need to be reused in several places, they can be stored in a table or view; if they are needed only once, a common table expression is more appropriate:

-- filtering for several words via PATINDEX (SQL Server 2008+)
WITH CTE
AS (
  -- table from list of values (SQL Server 2008+)
  SELECT *
  FROM (VALUES ('chain')
  , ('lock')
  , ('rim')
  , ('spindle')) DAT(words)
) 
SELECT * 
FROM Production.Product PRD
WHERE EXISTS (
	SELECT *
	FROM CTE
	WHERE PATINDEX('%'+ CTE.words +'%', PRD.Name)>0
	)

The query should return the same records as the first query!

Besides one's own UDFs (see SplitListWithIndex or SplitList), starting with SQL Server 2016 one can use the STRING_SPLIT function to return the same values as a TVF:

-- filtering for several words via PATINDEX & STRING_SPLIT (SQL Server 2016+)
SELECT * 
FROM Production.Product PRD
WHERE EXISTS (
	SELECT *
	FROM STRING_SPLIT('chain|lock|rim|spindle', '|') SPL
	WHERE PATINDEX('%'+ SPL.value +'%', PRD.Name)>0
	)

A dynamic list of values can be built as well. For example, the list of words can be obtained from a table and the STRING_SPLIT function:

-- listing the words appearing in a column (SQL Server 2016+)
SELECT DISTINCT SPL.value
FROM Production.Product PRD
     CROSS APPLY STRING_SPLIT(Name, ' ') SPL
ORDER BY SPL.value

One can also remove the special characters, the numeric values, respectively the 1- and 2-letter words:

-- listing the words appearing in a column (SQL Server 2016+)
SELECT DISTINCT SPL.value
FROM Production.Product PRD
     CROSS APPLY STRING_SPLIT(Replace(Replace(Replace(Replace(Name, '-', ' '), ',', ' '), '/', ' '), '''', ' '), ' ') SPL
WHERE IsNumeric(SPL.value) = 0 -- removing numbers
  AND Len(SPL.value)>2 -- removing single/double letters
ORDER BY SPL.value

The output looks better, though the more complex the text, the more replacements need to be made. An alternative to a UDF (see ReplaceSpecialChars) is the TRANSLATE function, which replaces each character from a list with the character found at the same position in a second list. One needs to be careful to have a 1:1 mapping (TRANSLATE raises an error when the two lists have different lengths), the REPLICATE function doing the trick:

-- replacing special characters via TRANSLATE (SQL Server 2017+)
SELECT TRANSLATE(Name, '-,/''', Replicate(' ', 4))
FROM Production.Product PRD

Now the query becomes:

-- listing the words appearing in a column using TRANSLATE (SQL Server 2017+)
SELECT DISTINCT SPL.value
FROM Production.Product PRD
     CROSS APPLY STRING_SPLIT(TRANSLATE(Name, '-,/''', Replicate(' ', 4)), ' ') SPL
WHERE IsNumeric(SPL.value) = 0 -- removing numbers
  AND Len(SPL.value)>2 -- removing single/double letters
ORDER BY SPL.value
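To see what TRANSLATE and the filters do on their own, without the AdventureWorks data, here's a sketch on a single hypothetical product name:

```sql
-- TRANSLATE replaces each character of the 2nd argument with the character at the same position in the 3rd
-- Replicate(N' ', 4) builds a replacement list with the same length as N'-,/''' (4 characters)
DECLARE @name nvarchar(50) = N'Chain-Ring, 10/12''s'; -- hypothetical sample value

SELECT TRANSLATE(@name, N'-,/''', Replicate(N' ', 4)) translated;

SELECT SPL.value
FROM STRING_SPLIT(TRANSLATE(@name, N'-,/''', Replicate(N' ', 4)), ' ') SPL
WHERE IsNumeric(SPL.value) = 0 -- removing numbers (and empty strings)
  AND Len(SPL.value)>2; -- removing single/double letters
```

The first query should return 'Chain Ring  10 12 s' (the comma followed by a space becomes two spaces), while the second should return only 'Chain' and 'Ring', as the empty string produced by the double space, the numbers and the single 's' are filtered out.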

Notes:
1) The SQL Server-based queries also work in SQL databases in Microsoft Fabric. Just replace the Production schema with SalesLT (see post, respectively GitHub repository with the changed code).

Happy coding!

05 November 2022

💎SQL Reloaded: STRING_AGG and STRING_SPLIT at Work, and a Bit of Pivoting

For a long time, working with strings across records was a nightmare for SQL developers, until Microsoft introduced STRING_SPLIT in SQL Server 2016, respectively STRING_AGG in SQL Server 2017. Before that, one was forced to write procedural code or use workarounds, at least until SQL Server 2005, when recursive CTEs (common table expressions), ranking functions and PIVOT were introduced, which allowed handling many scenarios.

Microsoft provides several examples for the usage of the STRING_SPLIT and STRING_AGG functions based on the AdventureWorks database, though let's look at another example based on the same database.

Let's say we want to show the concatenated contacts for each store, a result that can now be easily obtained with STRING_AGG:

-- concatenating names per store via STRING_AGG (SQL Server 2017+)
SELECT BusinessEntityID
, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
FROM Sales.vStoreWithContacts
GROUP BY BusinessEntityID
HAVING count(*)>1

Observe that a GROUP BY is needed to show one record per store. Unfortunately, STRING_AGG can't (yet) be used as a window function.

The inverse operation can be performed with the help of the STRING_SPLIT table-valued function (TVF). (If you wonder why a TVF is needed, it's because each initial record needs to be multiplied by the number of values generated.)
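Since the order in which STRING_AGG concatenates the values isn't guaranteed, the WITHIN GROUP clause can be used to make it deterministic. A sketch on hypothetical sample data (the store IDs and names are made up), which should return 'Anna Brown;Peter Smith' for store 1 and 'John Doe' for store 2:

```sql
-- STRING_AGG with a deterministic order of the concatenated values (SQL Server 2017+)
SELECT DAT.StoreID
, STRING_AGG(DAT.Contact, ';') WITHIN GROUP (ORDER BY DAT.Contact) Contacts
FROM (VALUES (1, 'Peter Smith')
, (1, 'Anna Brown')
, (2, 'John Doe')) DAT(StoreID, Contact)
GROUP BY DAT.StoreID;
```

In the query based on Sales.vStoreWithContacts, the same clause could be added, e.g. ordering by LastName.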

-- reversing the concatenation (SQL Server 2022+, because of STRING_SPLIT's enable_ordinal argument)
WITH CTE
AS (
	-- concatenating names per store
	SELECT BusinessEntityID
	, STRING_AGG(Concat(FirstName, ' ', LastName), ';') Contacts
	FROM Sales.vStoreWithContacts
	GROUP BY BusinessEntityID
	HAVING count(*)>1
) 
SELECT CTE.BusinessEntityID
, DAT.Value
, DAT.Ordinal 
FROM CTE
    CROSS APPLY STRING_SPLIT(Contacts, ';', 1) DAT

STRING_SPLIT also provides an ordinal column (returned when the third argument, enable_ordinal, is set to 1, which is supported starting with SQL Server 2022), which in theory can be used for pivoting the values, though that would take us back to where we started. Instead of using the query just generated, let's look at an alternative solution for concatenating strings across records, which is available starting with SQL Server 2005:

-- concatenating names per store via PIVOT (SQL Server 2012+)
SELECT BusinessEntityID
, [1] Contact1
, [2] Contact2
, [3] Contact3
, [4] Contact4
, Concat([1], IsNull(';' + [2], ''), IsNull(';' + [3], ''), IsNull(';' + [4], '')) Contacts
FROM (
	-- concatenating names and adding a rank
	SELECT BusinessEntityID
	, Concat(FirstName, ' ', LastName) Contact
	, ROW_NUMBER() OVER(PARTITION BY BusinessEntityID ORDER BY FirstName) Ranking
	FROM Sales.vStoreWithContacts
) PER
PIVOT (
    Max(Contact)
	FOR Ranking IN ([1], [2], [3], [4])
) AS DAT

To port the code to SQL Server 2005, though, the Concat function needs to be rewritten, as CONCAT was introduced only in SQL Server 2012.
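A sketch of such a rewrite, replacing Concat with the + operator and IsNull (CONCAT treats NULLs as empty strings, while + propagates them). It hasn't been tested on SQL Server 2005 and assumes a database with the same Sales.vStoreWithContacts view and columns; the view's columns might differ in older AdventureWorks versions:

```sql
-- concatenating names per store via PIVOT without CONCAT (sketch for SQL Server 2005+)
SELECT BusinessEntityID
, [1] Contact1
, [2] Contact2
, [3] Contact3
, [4] Contact4
-- [1] is never NULL, as each store has at least one ranked contact
, [1] + IsNull(';' + [2], '') + IsNull(';' + [3], '') + IsNull(';' + [4], '') Contacts
FROM (
	-- concatenating names and adding a rank (IsNull mimics CONCAT's NULL handling)
	SELECT BusinessEntityID
	, IsNull(FirstName, '') + ' ' + IsNull(LastName, '') Contact
	, ROW_NUMBER() OVER(PARTITION BY BusinessEntityID ORDER BY FirstName) Ranking
	FROM Sales.vStoreWithContacts
) PER
PIVOT (
    Max(Contact)
	FOR Ranking IN ([1], [2], [3], [4])
) AS DAT
```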

Talking about workarounds for splitting strings, in certain scenarios I used a combination of the CutLeft & CutRight functions, which proved to be useful in data migrations, or my own version of STRING_SPLIT (see SplitListWithIndex or SplitList). For concatenations I mainly used CTEs (see example), or cursors in exceptional cases (see example).

Happy coding!
