While reading the respective post I thought: what if the inner query is encapsulated in a inline table-valued function?! Would the performance remain the same or deprecate? Does it make sense to add the overhead of creating a view for each such scenario in order to gain some performance? Then, reading the comments, one of them was pointing something so obvious: the inner query can be encapsulated within a common table expression, following to perform on it the deletion. An elegant solution I haven’t had available under SQL Server 2000 at the times I had to solve the same problem. Actually my approach was a little different, at that time identifying and storing the greatest value of a new data set, building thus a collection of values that would allow me to select a given range. This approach was quite useful in the respective scenarios and quite acceptable as performance, plus the fact that I was reusing the greatest value in further logic – two birds with one shot.
This being said, I was thinking that except the methods provided by LuborK, I could provide in this post several other simple techniques. Please note that I’m not focusing on performance, but on the techniques. Eventually you can display the statistics related to the disk activity and time required to parse, compile and execute each batch of statements. So, let’s start with the creation and population of the table! In order to minimize the differences, I took the same example and added just a little make-up – comments and a small change in the way the values of the second column are inserted.
-- dropping the table -- DROP TABLE dbo.TestTable --declaring the table CREATE TABLE dbo.TestTable ( id int primary key , value varchar (50)) -- inserting the records in a loop DECLARE @index int SET @index=1 SET NOCOUNT ON WHILE (@index<100000) BEGIN INSERT INTO dbo.TestTable VALUES (@index,'test '+cast(@index as varchar(10))) SET @index=@index+1 END
Supposing we are interested in the first 10000 of records of a sequential data set, it would be much easier if we would know the maximum value from the respective data set. Then we could use the value to perform a simple deletion:
DECLARE @max as int
-- deletion based on stored max value DECLARE @max int SELECT @max = MAX(ID) FROM( SELECT top(10000) id FROM TestTable ORDER BY ID ) A --deleting the data DELETE FROM TestTable WHERE id <= @max
If we don’t need the maximum value for further logic, the two statements in can be combined in one query:
-- deletion based on max value - within one query DELETE FROM TestTable WHERE id <= ( SELECT MAX(ID) FROM ( SELECT top(10000) id FROM dbo.TestTable ORDER BY ID ) A ) --deleting the data DELETE FROM TestTable WHERE id <= @max
When dealing with huge volumes of data, and not only then, in case of sequential processing we can store the maximum value in a given table, and pull the values as required. This approach allows us to process any interval within the data set, in this case the delete resumes to a simple delete statement in which @start and @end were retrieved from the respective table:
-- deleting the data within a given interval DELETE FROM dbo.TestTable WHERE id BETWEEN @start AND @end
If this seems trivial, let’s spice up things! As I observed, few people know that it’s possible to delete data from a given table using an inline table-valued function. Huh, what’s that?! If it seems impossible to you, take a look at documentation! I’m referring here to the GRANT object permissions section in which of interest is the following statement that refers to the ANSI 92 permissions applicable to the various database objects, for table-valued function being listed the following permissions: DELETE, INSERT, REFERENCES, SELECT, UPDATE. In order to exemplify the idea, let’s create the UDF:
-- creating the UDF function CREATE FUNCTION dbo.TestUDF1( @start int , @end int) RETURNS TABLE AS RETURN ( SELECT id, value FROM dbo.TestTable WHERE id BETWEEN @start AND @end ) --deleting the data DELETE FROM dbo.TestUDF1(20001, 30000)
The example is supposed to work, unless you are having a problem with the permissions. Let’s consider this functionality from the perspective of our problem! What LuborK does is to include the inner query in a view and then delete the data:
-- creating the view CREATE VIEW dbo.vTestView AS SELECT top(10000) id FROM dbo.TestTable ORDER BY ID --deleting the data DELETE FROM dbo.vTestView
One of the problems with this approach is that the value representing the number of records to be returned is hardcoded. And here comes to help an inline table-valued function, also referred as a “parameterized view”:
-- creating the UDF function CREATE FUNCTION dbo.TestUDF2( @number_records int) RETURNS TABLE AS RETURN ( SELECT top (@number_records) id, value FROM dbo.TestTable ORDER BY id ) --deleting the data DELETE FROM dbo.TestUDF2(10001)
And, as closure, the example based on the common table expression:
-- deletion using a CTE ;WITH CTE AS ( SELECT top(10000) id FROM dbo.TestTable ORDER BY ID ) DELETE FROM CTE
You can review the record count after each operation:
-- review record count SELECT count(*) FROM dbo.TestTable;
The queries work also in SQL databases in Microsoft Fabric (see file in GitHub repository). You might want to use another schema (e.g. Test), not to interfere with the existing code.
Happy coding!
No comments:
Post a Comment