Last week I run into an interesting solution to a simple problem, problem I dealt with in several occasions too: how to delete (efficiently) sequential data, or how LuborK calls it in his post, “a problem of efficient ordered delete”. In this case he’s talking about the deletion of the first n records from a dataset of sequential data. It sounds like a simple problem considering that we can easily select in a first step the first n records using the TOP function and then delete the respective matched records. That’s also what LuborK does, however the novelty of the solution proposed it’s a little unexpected, and this not from the point of view of the approach, but of the gain in performance. The bottom line: by encapsulating the inner query within a view, the deletion of sequential data becomes more efficient.
While reading the respective post I thought: what if the inner query is encapsulated in a inline table-valued function?! Would the performance remain the same or deprecate? Does it make sense to add the overhead of creating a view for each such scenario in order to gain some performance? Then, reading the comments, one of them was pointing something so obvious: the inner query can be encapsulated within a common table expression, following to perform on it the deletion. An elegant solution I haven’t had available under SQL Server 2000 at the times I had to solve the same problem. Actually my approach was a little different, at that time identifying and storing the greatest value of a new data set, building thus a collection of values that would allow me to select a given range. This approach was quite useful in the respective scenarios and quite acceptable as performance, plus the fact that I was reusing the greatest value in further logic – two birds with one shot.
This being said, I was thinking that except the methods provided by LuborK, I could provide in this post several other simple techniques. Please note that I’m not focusing on performance, but on the techniques. Eventually you can display the statistics related to the disk activity and time required to parse, compile and execute each batch of statements. So, let’s start with the creation and population of the table! In order to minimize the differences, I took the same example and added just a little make-up – comments and a small change in the way the values of the second column are inserted.
Supposing we are interested in the first 10000 of records of a sequential data set, it would be much easier if we would know the maximum value from the respective data set. Then we could use the value to perform a simple deletion:
If we don’t need the maximum value for further logic, the two statements in can be combined in one query:
When dealing with huge volumes of data, and not only then, in case of sequential processing we can store the maximum value in a given table, and pull the values as required. This approach allows us to process any interval within the data set, in this case the delete resumes to a simple delete statement in which @start and @end were retrieved from the respective table:
If this seems trivial, let’s spice up things! As I observed, few people know that it’s possible to delete data from a given table using an inline table-valued function. Huh, what’s that?! If it seems impossible to you, take a look at documentation! I’m referring here to the GRANT object permissions section in which of interest is the following statement that refers to the ANSI 92 permissions applicable to the various database objects, for table-valued function being listed the following permissions: DELETE, INSERT, REFERENCES, SELECT, UPDATE. In order to exemplify the idea, let’s create the UDF:
Voila! The example is supposed to work, unless you are having a problem with the permissions. Let’s consider this functionality from the perspective of our problem! What LuborK does is to include the inner query in a view and then delete the data:
One of the problems with this approach is that the value representing the number of records to be returned is hardcoded. And here comes to help an inline table-valued function, also referred as a “parameterized view”:
And, as closure, the example based on the common table expression: