Showing posts sorted by relevance for query sp_MSForEachTable. Sort by date Show all posts
Showing posts sorted by relevance for query sp_MSForEachTable. Sort by date Show all posts

01 March 2025

💎🤖SQL Reloaded: Copilot Stories (Part VII: Using the sp_MSForEachTable Undocumented Stored Procedures in SQL Server)

Probably few know that SQL Server makes available a set of undocumented stored procedures that can be useful in certain scenarios that don't involve code that will deployed into production. One of them is sp_MSForEachTable on which I wrote several posts until now. The stored procedure executes one or more scripts based on a set of constraints against the subset of available tables. The first surprising fact is that Copilot does know the respective stored procedure and seems to correctly generate code for it!

Disclaimer: The below code was generated by Copilot via prompts used for demonstration purposes!

Please not that the following code is provided as it is and you can use it on your own risk! Don't run the code in a productive environment and make sure that you have a usable backup! Please consider all caution when you use a feature marked as undocumented!

Let's start with a simple scripts and increase the complexity of the queries as it goes!

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure in SQL Server

Output:

-- query that uses this stored procedure to update a specific column in all tables:

EXEC sp_MSForEachTable 'UPDATE ? SET YourColumn = ''NewValue'' WHERE YourCondition'

/*
? is a placeholder for the table name.
YourColumn is the column you want to update.
NewValue is the new value you want to set.
YourCondition is the condition that must be met for the update to occur.*/

The code can be used to overwrite the value of a column that exists in some of the tables. One needs to be careful if there are tables which have same names but different meaning! Review the tables and their values first!

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to select records from a set of tables.

Output:

-- return all columns
EXEC sp_MSForEachTable 'SELECT * FROM ?'

-- return specific columns based on condition
EXEC sp_MSForEachTable 'SELECT Column1, Column2 FROM ? WHERE YourCondition'

The second query will result in errors if the columns don't exist!

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to count all the records from a table

Output:

-- retrieving the number of records for all tables
EXEC sp_MSForEachTable 'SELECT ''?'' AS TableName, COUNT(*) AS RecordCount FROM ?'

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to select all the records from tables containing the word invent.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF ''?'' LIKE ''%invent%'' BEGIN SET @sql = @sql + ''SELECT * FROM '' + ''?'' + ''; '' END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql


Prompt:
Write a query that uses the sp_MSForEachTable stored procedure when the DataAreaId field exists in the table.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = PARSENAME(''?'', 1) AND COLUMN_NAME = ''DataAreaId'') 
                 BEGIN 
                     SET @sql = @sql + ''SELECT * FROM '' + ''?'' + ''; '' 
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

As can be seen, the query uses the metadata available to retrieve the tables in scope.

Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to count all the records for a subset of tables.

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF ''?'' LIKE ''%subset%'' 
                 BEGIN 
                     SET @sql = @sql + ''SELECT ''''?'''' AS TableName, COUNT(*) AS RecordCount FROM '' + ''?'' + ''; '' 
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql


Prompt:
Write a query that uses the sp_MSForEachTable stored procedure to count all the records for a subset of tables based on a different column from each table. 

Output:

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'IF ''?'' LIKE ''%subset%'' 
                 BEGIN 
                     IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = PARSENAME(''?'', 1) AND COLUMN_NAME = ''Column1'')
                     BEGIN
                         SET @sql = @sql + ''SELECT ''''?'''' AS TableName, COUNT(Column1) AS RecordCount FROM '' + ''?'' + ''; ''
                     END
                     ELSE IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = PARSENAME(''?'', 1) AND COLUMN_NAME = ''Column2'')
                     BEGIN
                         SET @sql = @sql + ''SELECT ''''?'''' AS TableName, COUNT(Column2) AS RecordCount FROM '' + ''?'' + ''; ''
                     END
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

The output is not what I intended, though it looks interesting enough to give it a try...

Prompt:
Create a test table that contains a list of tables and corresponding fields from a table. There can be multiple columns available for each table. Write a query that uses the sp_MSForEachTable stored procedure to count all the distinct value for each combination of values available in the test table

Output:

-- Step 1: Create the Test Table
-- Create the test table
CREATE TABLE TestTable (
    TableName NVARCHAR(128),
    ColumnName NVARCHAR(128)
);

-- Insert sample data into the test table
INSERT INTO TestTable (TableName, ColumnName) VALUES ('Table1', 'ColumnA');
INSERT INTO TestTable (TableName, ColumnName) VALUES ('Table1', 'ColumnB');
INSERT INTO TestTable (TableName, ColumnName) VALUES ('Table2', 'ColumnC');
INSERT INTO TestTable (TableName, ColumnName) VALUES ('Table2', 'ColumnD');
INSERT INTO TestTable (TableName, ColumnName) VALUES ('Table3', 'ColumnE');

--Step 2: Query to Count Distinct Values
DECLARE @sql NVARCHAR(MAX)

SET @sql = ''
EXEC sp_MSForEachTable 
    @command1 = 'DECLARE @table NVARCHAR(128), @column NVARCHAR(128)
                 SELECT @table = TableName, @column = ColumnName
                 FROM TestTable
                 WHERE TableName = PARSENAME(''?'', 1)
                 IF @column IS NOT NULL
                 BEGIN
                     SET @sql = @sql + ''SELECT ''''?'''' AS TableName, '''' + @column + '''' AS ColumnName, COUNT(DISTINCT '' + @column + '') AS DistinctCount FROM '' + ''?'' + ''; ''
                 END',
    @replacechar = '?',
    @command2 = 'EXEC sp_executesql @sql'

PRINT @sql

Use the following code to create a table for testing the above scripts. The scripts should be run in a small, ideally empty database in a non-productive environment!

Prompt:
Create a table that contains the DataAreaId and CompanyName and populate it with 5 records.

Output:

-- Create the table
CREATE TABLE Company (
    DataAreaId INT,
    CompanyName NVARCHAR(100)
);

-- Insert 5 records into the table
INSERT INTO Company (DataAreaId, CompanyName) VALUES (1, 'Company A');
INSERT INTO Company (DataAreaId, CompanyName) VALUES (2, 'Company B');
INSERT INTO Company (DataAreaId, CompanyName) VALUES (3, 'Company C');
INSERT INTO Company (DataAreaId, CompanyName) VALUES (4, 'Company D');
INSERT INTO Company (DataAreaId, CompanyName) VALUES (5, 'Company E');

The queries should be run in SQL Server environment where the sp_MSForEachTable is available. In SQL databases the code will result in the following error:

"Msg 2812, Level 16, State 62, Line 1, Could not find stored procedure 'sp_MSForEachTable'."

Previous Post <<||>> Next Post

27 June 2020

💠🛠️SQL Server: Undocumented (Part II: Execute Command for Each Table)

One neat undocumented feature useful in the administration of the SQL Server is the sp_MSforeachtable stored procedure which allows executing a set of several commands against each table available into the current database, or, with a few changes, against a set of predefined tables.  It allows running up to 3 commands as part of the main processing, as well a pre-command, typically used to set variable environments or perform initializations, respectively a post-command, typically used for control and cleanup processes. It replaces thus the need to run the commands within a cursor, even if the latter can occasionally provide more flexibility (see example).

The stored procedures provides the following parameters:
@command1, @command2, @command3: the main commands to be executed (at least the first command must be executed)
@precommand: a command to be executed before the @command1,
@postcommand: a command to be executed after all the commands were executed successfully,
@replacechar: the character used to represent the table within the commands, by default a question mark (?) character
@whereand: used to limit the scope only to a set of tables 

Before the introduction of Data Management Views with SQL Server 2005, the stored procedure was used to get the number of records for all tables:

-- getting the number of records for each table
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) NoRecords FROM ?'

To restrict running the command(s) only for a table, the object_id function can be used within the @whereand parameter:

-- getting the number of records for a table
EXEC sp_MSforeachtable @command1='SELECT ''?'' [Table], COUNT(*) NoRecords FROM ?' --
      , @whereand = ' And Object_id = Object_id(''[Person].[Address]'')'

In AX 2009 and even the later versions including Dynamics 365 (as long the direct access to the database is possible) the number of records could be restricted to a given company (aka business unit):

-- getting the number of records from a mandant
sp_MSforEachTable @command1 = 'SELECT ''?'' [Table], COUNT(*) NoRecords FROM WHERE DataAreaId IN (''DAT'')'

The procedude can be used to perform CRUD operayions on each table. During a data migration it was possible to clean out the business units not needed with just a simple script:

--delete the data for multiple data areas
 sp_MSforEachTable @command1 = 'DELETE FROM ? WHERE DataAreaId IN (''m01'')'

Administrations tasks can be performed as well, e.g. updating statistics, performing consistency checks for each table, etc.:

-- for each table 
EXEC sp_MSforeachtable 'EXEC sp_spaceused ''?''' -- displaying the disk space researved/used
EXEC sp_MSforeachtable 'UPDATE statistics ? WITH ALL' -- updating statistics
EXEC sp_MSforeachtable 'DBCC CHECKTABLE (''?'') WITH ALL_ERRORMSGS, NO_INFOMSGS' -- check table considerncy 

A set of tasks can be performed for a single table by providing the constraint within the @whereand parameter:

-- executing multiple commands for a table
EXEC sp_MSforeachtable @command1='UPDATE statistics ? WITH ALL' -- updating statistics
   , @command2= 'DBCC CHECKTABLE (''?'') WITH ALL_ERRORMSGS, NO_INFOMSGS' -- check table considerncy 
   , @whereand = ' And Object_Name(Object_id) = ''[Person].[Address]'''

Executing the commands against a subset of tables involves using a persisted or temporary table to store the name of the tables which shoul be considered, eventually with further metadata to allow filtering:

-- dropping the table
-- DROP TABLE IF EXISTS #Tables

-- create a temporary table
SELECT TableName
INTO #Tables 
FROM (VALUES ('[Person].[Address]')
, ('[Person].[AddressType]')
, ('[Person].[BusinessEntity]')) DAT(TableName)

-- getting the number of records for the list of tables
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) NoRecords FROM ?'
, @whereand = ' And Object_id In (Select Object_id(TableName) FROM #Tables)'

Warnings:
1) Do not forget to dump the temporary table when finished!
2) The code is provided only for exemplification purposes. You can use the above code on your own risk!
3) Undocumented features can be deprecated in future versions of SQL Server, therefore they should be used with precaution in long-term solutions. It doesn't run for example in Microsoft Fabric warehouses.

Happy coding!

23 January 2025

💎SQL Reloaded: Number of Records VI (via sp_MSForEachTable Undocumented Stored Procedure)

Starting with SQL Server 2000 it's possible to execute a command via the undocumented stored procedure sp_MSForEachTable for each table available in a database, respectively for subsets of the tables. In a previous post I shown how the stored procedure can be used in several scenarios, including how to get the total number of records in each set of tables. However, the code used generates a result set for each table, which makes it difficult to aggregate the information for further processing. In many scenarios, it would be useful to store the result as a temporary or even persisted table.

-- dropping the tables
DROP TABLE IF EXISTS #Tables
DROP TABLE IF EXISTS #TablesRecordCount

-- create a temporary table to store the input list
SELECT TableName
INTO #Tables 
FROM (VALUES ('Person.Address')
, ('Person.AddressType')
, ('Person.BusinessEntity')) DAT(TableName)


-- create a temporary table to store the results
CREATE TABLE dbo.#TablesRecordCount (
  table_name nvarchar(150) NOT NULL
, number_records bigint
, run_date datetime2(0)
, comment nvarchar(255)
)

-- getting the number of records for the list of tables into the result table
INSERT INTO #TablesRecordCount
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records, GetDate() run_date, ''testing round 1'' comment FROM ?'
, @whereand = ' And Object_id In (Select Object_id(TableName) FROM #Tables)'

-- reviewing the result
SELECT *
FROM #TablesRecordCount
ORDER BY number_records DESC

The above solution uses two temporary tables, though it can be easily adapted to persist the result in a standard table: just replace the "#" with the schema part (e.g. "dbo."). This can be useful in troubleshooting scenarios, when the code is run at different points in time, eventually for different sets of tables. 

The code is pretty simple and can be extended as needed. Unfortunately, there's no guarantee that the sp_MSForEachTable stored procedure will be supported in the next versions of the SQL Server. For example, the stored procedure is not available in SQL databases, respectively in Fabric warehouses. In SQL databases the following error is thrown:

"Msg 2812, Level 16, State 62, Line 1, Could not find stored procedure 'sys.sp_MSForEachTable'."

To test whether the feature works in your environment, it's enough to run a call to the respective stored procedure:

-- retrieve the record count for all tables
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records FROM ?'

Or, you can check whether it works for one table (replace the Person.AddressType table with one from your environment):

-- getting the number of records for the list of tables into another table
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) numer_records FROM ?'
, @whereand = ' And Object_id = Object_id(''Person.AddressType'')'

The solution could prove to be useful in multiple scenarios, though one should consider also the risk of being forced to rewrite the code when the used stored procedure becomes unavailable. Even if it takes more time to write, a solution based on cursors can be more feasible (see previous post).

Update 29-Jan-2025: Probably, despite their usefulness, the undocumented features will not be brought to SQL databases (see [1], 47:30). So, be careful about using the respective features as standard solutions in production environments!

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft Reactor (2025) Ask The Expert - Fabric Edition - Fabric Databases [link]

20 June 2018

💫ERP Systems: Dynamics AX 2009 (Part I: Deleting Obsolete Companies)

Introduction   

During implementations, migrations and other projects are created in Dynamics AX temporary companies (aka legal entities, data areas) that aren’t needed anymore once they fulfilled their purpose. Excepting the fact that obsolete companies occupy space in the data center, under certain circumstances they can lead to performance problems. The logical thing to do would be to delete the obsolete companies as long there’s no further demand from the business.    

In what follows we will look at several methods for deleting obsolete companies. The scripts were tested in Dynamics AX 2009, and more likely they’ll work in coming versions as long the data model behind was kept.

Warning:
Please note that the scripts are provided “AS IS” only to exemplify a technique and they come without any warranty! Before attempting any of the methods described here, review the comments from “Further Considerations” section!

Method 1: Using DynamcsAX Built-In Functionality   

Dynamics AX 2009 provides built-in functionality for deleting a company, however when the volume of data in the system goes above a certain limit the functionality starts to perform poorly, even when run directly on an AOS. (It is recommended to run long-running administration jobs directly on the AOS rather than clients.)    For example, it was attempted to use this method to delete several companies in Dynamics AX Test environment. By the first company the deletion job needed a few hours, while by the second company the job hasn’t finished after two days, being thus forced to stop it. After two further failed attempts it came the time to look for another solution.

Warning:
It seems that this solution can lead to orphaned data (see [1]). So, even if you are using this method, you might need to consider one of the following methods as well.

Method 2: Using sp_MSforEachTable   

In almost all tables in AX the company is stored in a DataAreaId attribute. Over this attribute the records belonging to a company are logically partitioned. This allows writing a script via the undocumented sp_MSforEachTable stored procedure:

--delete the data for one data area
sp_MSforEachTable @command1 = 'DELETE FROM ? WHERE DataAreaId = ''m01'''

An error with be thrown for the tables that don’t contain the DataAreaId attribute:

"Msg 207, Level 16, State 1, Line 1"

Invalid column name 'DataAreaId'.The script can be extended to delete in the same step two or more companies:

--delete the data for multiple data areas
 sp_MSforEachTable @command1 = 'DELETE FROM ? WHERE DataAreaId IN (''m01'', ''m02'')'


During the first test the script needed half of hour to run, however a few tables  in which the company is stored in other attributes remained untouched. One can either search for such tables manually, via a script, or run the built-in AX functionality. We opted for running the built-in functionality, which managed to delete the remaining data relatively fast.

Warning:
Microsoft doesn’t support this method and can be used when the volume of obsolete data is relatively small!    What does it mean relatively small? The most important limitation of this method is the transaction log, considering that the deleted data are logged. One can either change log’s size to accommodate the volume of data to be deleted or run the deletion only for a subset of the tables. (Changing the recovery model to “simple” or “bulk-logged” won’t make a difference.)

The second important limitation is the available memory, once the available memory is reached SQL Server having to paginate the data, fact that could lead to further disk space consumed.    Other limitations have more with the performance to do, e.g. each deletion is reflected also in the indexes. One might consider for example dropping the indexes before deletion and recreating them afterwards.

Method 3: Using a Cursor    

Instead of using the undocumented sp_MSforEachTable stored procedure, the loop can be performed via a cursor (see [1]). This method is advantageous when the deletion needs to be performed only for a subset of tables one could use a cursor. The deletion can be grouped together with other activities and run together.

Method 4: Using „Shadow“ Tables    

When the volume of data available is huge, and the volume of data that remain in the table is small compared with the overall data, it might be useful to consider using “shadow” tables. One can take advantage of the fact that a truncate command performs incomparable better than a delete command.  To use a truncate on a table, the records that need to be kept could be saved temporarily to a copy (aka “shadow”) of the table, the truncate then applied, and the copied records could be moved back. The following scripts exemplify the logic needed to delete the records from InventDim (inventory dimensions) table:

-- (optional) prove the number of records
SELECT count(*) 
FROM dbo.InventDim 
WHERE DataAreaId = 'm01'

-- create the “shadow” table
CREATE TABLE [dbo].[INVENTDIM_Dump](
[INVENTDIMID] [nvarchar](30) NOT NULL,
[INVENTBATCHID] [nvarchar](21) NOT NULL,
[WMSLOCATIONID] [nvarchar](12) NOT NULL,
[INVENTSERIALID] [nvarchar](21) NOT NULL,
[INVENTLOCATIONID] [nvarchar](10) NOT NULL,
[CONFIGID] [nvarchar](10) NOT NULL,
[INVENTSIZEID] [nvarchar](10) NOT NULL,
[INVENTCOLORID] [nvarchar](10) NOT NULL,
[INVENTSITEID] [nvarchar](10) NOT NULL,
[DATAAREAID] [nvarchar](4) NOT NULL,
[RECVERSION] [int] NOT NULL,
[RECID] [bigint] NOT NULL,
[WMSPALLETID] [nvarchar](18) NOT NULL,
[INVENTSTYLEID] [nvarchar](10) NOT NULL
) ON [PRIMARY]

-- copy the data into the “shadow” table
INSERT INTO [dbo].[InventDim_Dump] WITH (TABLOCK)
SELECT *
FROM [dbo].[InventDim] 
WHERE DataAreaId = 'm01'

-- truncate the data frome the main table 
--TRUNCATE TABLE [dbo].[InventDim]

-- copy the data back
INSERT INTO [dbo].[InventDim] WITH (TABLOCK)
SELECT *
FROM [dbo].[InventDim_Dump]

-- (optional) prove whether the IDs were correctly copied 
SELECT count(*)
FROM [dbo].[InventDim] A
JOIN [dbo].[InventDim_Dump] B
ON A.recid = B.RECID 
AND A. DATAAREAID = B.DATAAREAID 
WHERE A.DataAreaId = 'm01'

-- drop the „shadow“ table 
--DROP TABLE[dbo].[InventDim_Dump]

 As can be seen the “shadow” tables are simplified versions of the original tables, without constraints or indexes. They can be eventually created in another schema or even other database.   

Except the script for table’s creation in the other scripts table’s name can be easily replaced in the editor via the search and replace functionality, trick that reduces considerably the time needed for development. I needed on average 5 minutes for each table, plus 3-4 hours for further tests.    

The optional steps are more for exemplification and can be eventually removed.  

The Tablock hint used in inserts provides better performance and minimizes the volume of data logged.    

I used this method only for the tables having more than 3 million records, around 50 tables in total. Between them there were a few tables having 20-200 GB worth of records. I started with these big tables and figured out that also smaller tables could benefit from this method. A few minutes gained for each small table resulted in the end in a gain of a couple of hours.

The remained records were 0-25% of the initial tables.   

In theory, these steps could be performed within a cursor in which the creation of the “shadow” tables could be automated via table metadata as well. This approach will pay-off especially when the schema is not fixed, or the procedure needs to be repeated on different schemas.

Method 5: Delete Records in Batches    

There will be a point beyond which the performance provided by the fourth method will deprecate considerably. This point is based on the volume of records available in the table, and the records needed to be inserted back and forth. Without further tests, I suppose that this point lies in the 50-75% interval. Beyond this point for big tables in range of 10x or 100x GB it might be useful to delete the data in batches. A push in this direction might be constrained by the need to shrink the transaction log in between the deletes. The query could be written as follow:

-- deleting top x records 
DELETE top 10000
FROM dbo.InventDim WITH (TABLOCK)
WHERE DataAreaId = m01

The query can be included in a loop or run manually until no records are returned. It can be tested with different batch sizes to determine the best solution. In between is recommended to check also the growth of the log file and truncate it accordingly when needed.

Method 6: Using X++ code

For those having some basic knowledge of X++ and Dynamics AX classes, a solution based on deleting data via AX code could prove to be a better solution as standard functionality can be leveraged, functionality that eventually considers also the business logic implemented. The downside is the code that need to be written for this purpose, however there are already some examples available on the web (see [4]).

Hint:
In AX 2012 built-in support for batch deletes was added via the delete_from statement (see [3]).

Further Aspects     

Before attempting a deletion, it might be useful to analyze how many records will be deleted from each table, and eventually devise different scenarios for specific table categories. To get the number of records one can use either the built-in functionality from AX or use the sp_MSforEachTable stored procedure and export the results to text, following to overwork the data further in Excel:

-- listing the number of records per company 
sp_MSforEachTable @command1 = 'SELECT dataareaid, ''?'' table_name, count(*) no_records FROM ? WHERE DataAreaId IN (''m01'', ''m02'') GROUP BY dataareaid'

The results can be used also to approximate the space occupied by the data.   

Independently of the method used it is recommended to restrict users‘ access to the system and to deactivate the scheduled AX or SQL Server jobs. This will ensure that no blockings will occur in the system during the respective time.    

As data are synchronized between the AOS’s and the database, it is recommended to shut down the not needed AOS services before the deletions are performed, and restart them once all activities were performed.   

To minimize the risks associated with the loss of data it’s recommended to perform a backup of the database(s) before performing any changes.    

By deleting the data directly on the database, the business logic from AX (including customizations) is skipped. In theory this can lead to logical inconsistencies, however considering that all the data for a company are deleted, the risks are very small, unless intercompanies are involved.   

After the data are deleted it is recommended to recreate the indexes and update the statistics on the tables.  

Check whether the transaction log can accommodate the volume of records to be deleted! In extreme cases your SQL Server might crash! From this consideration it might be advantageous to delete only a company at a time.    

Based on the volume of data available in the transaction log it might be needed to truncate the log(s) between the steps, as well at the end.  

After the principle “better safe than sorry”, it might be a good idea to check the physical and logical consistency of the data before letting the users in.   

To minimize the impact on the business, it is recommended to perform the deletion outside the working hours, otherwise the action can lead to blocking and even deadlocks in the system.     Always attempt to use standard functionality and resort to other methods only when there’s no way around it.

It is recommended to always test the scripts thoroughly in the test environment before attempting their productive usage!

Previous Post <<||>> Next Post

References:
[1] Microsoft Dynamics AX Technical Support Blog (2010) How to delete orphaned data remained from deleted company?, by Martin Falta [link
[2] Art of Creation (2010) Delete an AX company on SQL [link]
[3] MSDN (2012) delete_from Statement [link]
4] Kevin’s blog (2017) Dynamics Ax 2012 History cleanup, by Kevin Roos [link

25 December 2024

💎🏭SQL Reloaded: Number of Records IV (via sys.partitions DMV)

To get the exact number of records in a table one can use the COUNT (see post) or the more recent COUNT_BIG function, though for big tables this can be an inefficient operation for the database engine:

-- number of records via COUNT
SELECT count(*) row_count
FROM SalesLT.Product

Moreover, sometimes the operation needs to be repeated for a number of tables, e.g. dropdown tables in Dynamics 365 for Finance and Operations (D365 F&O). Writing the query as a UNION allows to export the data as a single table and do comparisons (e.g. in Excel). The same approach can be used also when multiple columns are used for grouping, though one must account for the additional columns in the other subqueries. However, the more tables are involved, the more difficult it becomes to maintain the query over time. 
 
-- number of records via COUNT for multiple tables
SELECT 'SalesLT.Product' table_name
, count(*) row_count
FROM SalesLT.Product
UNION ALL
SELECT 'SalesLT.ProductDescription' table_name
, count(*) row_count
FROM SalesLT.ProductDescription
UNION ALL
SELECT 'SalesLT.ProductModel' table_name
, count(*) row_count
FROM SalesLT.ProductModel

There are many scenarios in which it's needed to get an approximate of the number of records available in a table and doing a record count might prove to be too expensive. For a quick and dirty solution one can use the sys.partitions DMV  instead:

-- number of records via DMV for single object
SELECT object_id
, OBJECT_NAME(object_id) object_name
, OBJECT_SCHEMA_NAME(object_id) schema_name
, SUM(Rows) AS row_count
, data_compression_desc AS compression_type
, COUNT(*) partitions_count
FROM sys.partitions 
WHERE index_id < 2 --ignore the partitions from the non-clustered index if any
  AND OBJECT_ID('SalesLT.Product') = object_id
GROUP BY object_id
, data_compression_desc
ORDER BY row_count DESC;

The query is based on sys.partitions table [1] which contains a row for each partition of all the tables and most types of indexes in the database. The documentation mentions that "rows" indicates the approximate number of rows in the considered partition.

Alternatively, one can bring more tables into the query to extend its range of applicability. 

-- number of records via DMVs
SELECT S.name + '.' + T.name SearchName
, S.Name SchemaName
, T.name TableName
, P.row_count
, P.compression_type
, P.partitions_count
FROM sys.tables T
     LEFT JOIN (
        SELECT object_id
        , SUM(Rows) AS row_count
        , data_compression_desc AS compression_type
        , COUNT(*) partitions_count
        FROM sys.partitions 
        WHERE index_id < 2 --ignore the partitions from the non-clustered index if any
        --AND OBJECT_ID('SalesLT.Product') = object_id
        GROUP BY object_id
        , data_compression_desc
     ) P
    ON T.object_id = P.object_id
     JOIN sys.schemas as S
	   on S.schema_id = T.schema_id
WHERE S.Name = 'SalesLT'
  AND T.Name LIKE 'Product%'
ORDER BY row_count DESC;

The data can be exported regularly to give an idea how tables' cardinality changes over time. One can find this useful as part of the loading process in data warehouses or other solutions (e.g. data migrations). 

By using a FULL JOIN instead of a LEFT JOIN one can retrieve only the tables that have records. 

One should consider only the tables in scope, and eventually remove the records associated with the system objects (e.g. sys or information_schema upon case).

 -- constraints to be added in the WHERE clause to remove the records related to system objects
 AND OBJECT_NAME(object_id) NOT LIKE 'sys%'
 AND OBJECT_NAME(object_id) NOT LIKE 'queue_%' 
 AND OBJECT_NAME(object_id) NOT LIKE 'filestream_tombstone%' 

There are also scenarios in which the count is needed only for a subset of the data. It's the case of D365 F&O (in which the number of records is needed by DataAreaId (aka company) or another field. A solution can be built using the sp_MSForEachTable stored procedure (see the last query from this post) and a cursor.

Notes:
1) Unfortunately, in Microsoft Fabric warehouses the sys.partitions.rows is 0 for all user tables and currently also the sp_MSForEachTable stored procedure can't be used to retrieve the number of records for all tables. However, one can create an old-fashioned cursor for iterating though the collection of tables in scope.
2) The code used in this post is available also in the GitHub repository.

Happy coding and Merry Christmas!

Previous Post <<||>> Next Post

Resources:
[1] Microsoft Learn (2024) sys.partitions (Transact-SQL) [link]
[2] Microsoft Learn (2024) COUNT_BIG (Transact-SQL) [link]

08 January 2023

💠🛠️SQL Server: DELETE vs. TRUNCATE TABLE Cheat Sheet

The comparison between the DELETE and TRUNCATE TABLE commands resumes to more than saying that one method is faster than the other or that one should always use TRUNCATE TABLE when deleting all the records from a table, which typically is not advisable in production environments. I tried to provide an overview of the two commands, though this should be considered as "work in progress".
 
Disclaimer: please refer to the SQL Server Docs for the complete set of features broken down on version.
 
DELETETRUNCATE TABLE
DefinitionDML command that removes one or more rows from a table or viewDDL command that removes all rows from a table or specified partitions of a table, without logging the individual row deletions
Scopetables, views, memory-optimized tables, common table expressions, MERGE, sp_MSforEachTable, linked serverstables, partitions
Behavior• removes rows one at a time and records an entry in the transaction log for each deleted row
  - for big tables the transaction log fills fast and may reach its limit
• fully logged ⇒rollback supported
• executed using a row lock, though locks might be escalated to a larger scope ⇒ performance may degrade
• [views] deletes the records only from the base table
• allows outputting the deleted records (via OUTPUT clause)
• [heaps] pages made empty may remain allocated ⇒ can’t be reused by other objects
• equivalent of a DROP & CREATE TABLE
• uses an optimized logging mode:
   - removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log
   - a deferred-drop mechanism unhooks the allocations for the table and putting them on the ‘deferred-drop queue’ for later processing by a background task deallocates all the pages and extents
• fully logged, however rollback supported only with explicit transactions
• leaves zero pages in the table
• resets the identity property
Syntax (simple form)
DELETE <table_name>
FROM <database>
[WHERE <search_condition>]   
[OPTION (<query_options>)]
TRUNCATE TABLE <table_name>
[ WITH ( PARTITIONS ( { 
<partition_number_expression> 
| <range> }))]
Performance• degrades with the numbers of records
• [large tables] can cause the transaction log to become full
• the operation completes almost instantaneously
• best practice because is faster and uses fewer system and transaction log resources
Constraints• a DELETE may fail if - violates a trigger - tries to remove a row referenced by data in another table with a FOREIGN KEY constraint
• TOP can’t be used in a DELETE statement against partitioned views • doesn’t reset the identity property
• can’t be used with views
• can’t be used on tables:
   - referenced by a FOREIGN KEY constraint, except self-references
   - participate in an indexed view - published using transactional or merge replication - system-versioned temporal.
   - referenced by an EDGE constraint
• can’t activate a trigger
• can’t be used on views & memory-optimized tables
Permissions• [minimum] DELETE permission on target table, and SELECT permission, it if includes a WHERE clause
• default to
   - table owner
   - members of the sysadmin fixed server role   
   - db_owner and db_datawriter fixed database roles
• table owners & members of sysadmin, db_owner & db_securityadmin roles can transfer permissions to other users
• [minimum] ALTER permission on target table
• default to
   - table owner - members of the sysadmin fixed server role
   - db_owner and db_ddladmin fixed database roles
• permissions are not transferable
• doesn’t support direct permissions (workaround: use TRUNCATE in stored procedure, and assign the required permission to it using the EXECUTE AS clause)
Scenarios• delete a set of records from a table
   - based on fix constraints
   - based on records from another table
   - based on a join with a source table
• empty a set of tables from a database
Recommendations• use a TRUNCATE when is safe to delete all the records ⇒ make sure that a backup or copy of the data is available
• [large tables] consider dropping the indexes before performing a DELETE when this covers all or most of the data, and recreate them afterwards
• [large tables] if the volume of data to be deleted is big compared with the remaining data, consider moving the data to a table with a similar structure, perform a TRUNCATE and then move the data back (see [5])
• [large tables] consider deleting data in batches with log truncation in single-user mode (be careful in production environments)
• [heaps] specify the TABLOCK hint in the DELETE statement
• [heaps] create a clustered index on the heap before deleting the row
Myths•TRUNCATE TABLE is a non-logged operation (see [4])
•TRUNCATE TABLE is a minimally logged operation (see [3])
Related conceptsnon-logged/minimally-logged/fully-logged operations, deferred-drop mechanism, sp_MSforEachTable stored procedure, transaction log, MERGE, DDL, DML

Resources:
[1] Microsoft SQL Docs (2022) DELETE [link]
[2] Microsoft SQL Docs (2022) TRUNCATE TABLE [link]
[3] Microsoft TechNet (2017) SQL Server: Understanding Minimal Logging Under Bulk-Logged Recovery Model vs. Logging in Truncate Operation [link]
[4] Paul Randal (2013) The Myth that DROP and TRUNCATE TABLE are Non-Logged [link]
[5] SQL-troubles (2018) ERP Systems: Dynamics AX 2009 – Deleting Obsolete Companies [link]
[6] Microsoft TechNet (2014) SQL Server: An Examination of Logging in Truncate Table Statement and Its Comparison with Delete Statement [link]

23 January 2025

💎🏭SQL Reloaded: Number of Records V (via Cursors, a Solution for Warehouses in Microsoft Fabric)

After deploying the sample warehouse available in Microsoft Fabric, I tried to check the number of records available in the deployed tables under the dbo schema. Surprisingly, the sys.partitions.count column has 0 values for all the tables associated with the respective schema (see post). 

There are only a few tables available, and taking a record count for each table should be enough, which is relatively simple with the undocumented sp_MSForEachTable. Unfortunately, this approach doesn't work neither, so one needs to revert to the use of old-fashioned cursors (as I used to do in SQL Server 2000):

-- number of records via cursor
DECLARE @table_name nvarchar(150)
DECLARE @sql nvarchar(250)
DECLARE @number_records bigint 
DECLARE @number_tables int, @iterator int

DROP TABLE IF EXISTS dbo.#tables;

CREATE TABLE dbo.#tables (
  ranking int NOT NULL
, table_name nvarchar(150) NOT NULL
, number_records bigint
)

INSERT INTO #tables
SELECT row_number() OVER(ORDER BY object_id) ranking
, concat(schema_name(schema_id),'.', name) table_name
, NULL number_records
FROM sys.tables obj
WHERE obj.schema_id = schema_id('dbo')
ORDER BY table_name

SET @iterator = 1
SET @number_tables = IsNull((SELECT count(*) FROM #tables), 0)

WHILE (@iterator <= @number_tables)
BEGIN 
    SET @table_name = (SELECT table_name FROM #tables WHERE ranking = @iterator)
    SET @sql = CONCAT(N'SELECT @NumberRecords = count(*) FROM ', @table_name)

	BEGIN TRY
		--get the number of records
		EXEC sp_executesql @Query = @sql
		, @params = N'@NumberRecords bigint OUTPUT'
		, @NumberRecords = @number_records OUTPUT

		IF IsNull(@number_records, 0)> 0  
		BEGIN
                SET @sql = 'UPDATE #tables' 
             + ' SET number_records = ' + Str(@number_records)
             + ' WHERE table_name = ''' + @table_name + '''';

		 EXEC(@sql)
		END 
	END TRY
	BEGIN CATCH  
	 -- no action needed in case of error
        END CATCH;

	SET @iterator = @iterator + 1
END

SELECT *
FROM dbo.#tables;

--DROP TABLE IF EXISTS dbo.#tables;
Results:
ranking table_name number_records
1 dbo.Date 5844
2 dbo.Geography 305179
3 dbo.HackneyLicense 42958
4 dbo.Time 86400
5 dbo.Weather 526330
6 dbo.Trip 2838927
7 dbo.Medallion 13668

Comments:
1) It's a lot of code for a simple task, though the code can be easily duplicated and adapted for similar requirements. Unfortunately, it can lead in time also to many instances of the same code. When possible, one should consider maybe encapsulating the logic in a stored procedure. 
2) It's usually a good idea to check how many records are available in the tables used for testing, as this can impact queries' performance and tables' appropriateness for the tests performed. Moreover, it's a good idea to understand the volume of data when taking over or working with a database. 
3) If one removes the row_number function, the code should run also in SQL Server 2000. Similar solutions were used then for retrieving the record count.
4) Microsoft recommends not to drop the temporary tables explicitly, but let SQL Server handle this cleanup automatically and take thus advantage of the Optimistic Latching Algorithm, which helps prevent contention on TempDB [1]..
5) There are others who stumbled over this issue (see [1]).
6) The solution has been tested successfully also in SQL databases.
7) The whole code must be run together because the temporary table seems to have only a transitory scope! An attempt to rerun the last SELECT from #tables raises the error: "Invalid object name '#tables'"

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Koen Verbeeck (2024) Get row counts of all tables in a Microsoft Fabric warehouse [link]
[2] Haripriya SB (2024) Do NOT drop #temp tables (link)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.