To get the exact number of records in a table one can use the COUNT (see post) or the more recent COUNT_BIG function, though for big tables this can be an inefficient operation for the database engine:
-- number of records via COUNT SELECT count(*) row_count FROM SalesLT.Product
-- number of records via COUNT for multiple tables SELECT 'SalesLT.Product' table_name , count(*) row_count FROM SalesLT.Product UNION ALL SELECT 'SalesLT.ProductDescription' table_name , count(*) row_count FROM SalesLT.ProductDescription UNION ALL SELECT 'SalesLT.ProductModel' table_name , count(*) row_count FROM SalesLT.ProductModel
There are many scenarios in which it's needed to get an approximate of the number of records available in a table and doing a record count might prove to be too expensive. For a quick and dirty solution one can use the sys.partitions DMV instead:
-- number of records via DMV for single object SELECT object_id , OBJECT_NAME(object_id) object_name , OBJECT_SCHEMA_NAME(object_id) schema_name , SUM(Rows) AS row_count , data_compression_desc AS compression_type , COUNT(*) partitions_count FROM sys.partitions WHERE index_id < 2 --ignore the partitions from the non-clustered index if any AND OBJECT_ID('SalesLT.Product') = object_id GROUP BY object_id , data_compression_desc ORDER BY row_count DESC;
The query is based on sys.partitions table [1] which contains a row for each partition of all the tables and most types of indexes in the database. The documentation mentions that "rows" indicates the approximate number of rows in the considered partition.
Alternatively, one can bring more tables into the query to extend its range of applicability.
-- number of records via DMVs SELECT S.name + '.' + T.name SearchName , S.Name SchemaName , T.name TableName , P.row_count , P.compression_type , P.partitions_count FROM sys.tables T LEFT JOIN ( SELECT object_id , SUM(Rows) AS row_count , data_compression_desc AS compression_type , COUNT(*) partitions_count FROM sys.partitions WHERE index_id < 2 --ignore the partitions from the non-clustered index if any --AND OBJECT_ID('SalesLT.Product') = object_id GROUP BY object_id , data_compression_desc ) P ON T.object_id = P.object_id JOIN sys.schemas as S on S.schema_id = T.schema_id WHERE S.Name = 'SalesLT' AND T.Name LIKE 'Product%' ORDER BY row_count DESC;
The data can be exported regularly to give an idea how tables' cardinality changes over time. One can find this useful as part of the loading process in data warehouses or other solutions (e.g. data migrations).
By using a FULL JOIN instead of a LEFT JOIN one can retrieve only the tables that have records.
One should consider only the tables in scope, and eventually remove the records associated with the system objects (e.g. sys or information_schema upon case).
-- constraints to be added in the WHERE clause to remove the records related to system objects AND OBJECT_NAME(object_id) NOT LIKE 'sys%' AND OBJECT_NAME(object_id) NOT LIKE 'queue_%' AND OBJECT_NAME(object_id) NOT LIKE 'filestream_tombstone%'
There are also scenarios in which the count is needed only for a subset of the data. It's the case of D365 F&O (in which the number of records is needed by DataAreaId (aka company) or another field. A solution can be built using the sp_MSForEachTable stored procedure (see the last query from this post) and a cursor.
Notes:
1) Unfortunately, in Microsoft Fabric warehouses the sys.partitions.rows is 0 for all user tables and currently also the sp_MSForEachTable stored procedure can't be used to retrieve the number of records for all tables. However, one can create an old-fashioned cursor for iterating though the collection of tables in scope.
2) The code used in this post is available also in the GitHub repository.
Happy coding and Merry Christmas!
Previous Post <<||>> Next Post
Resources:
[1] Microsoft Learn (2024) sys.partitions (Transact-SQL) [link]
[2] Microsoft Learn (2024) COUNT_BIG (Transact-SQL) [link]
No comments:
Post a Comment