Two days ago I was looking on the Web on random posts to see whether I missed something important related to CASE function and I stumble over an old post of Craig S. Mullins, in one of the examples (See SQL Statement #5) there is a query similar with the one below:
As can be seen on more than one branch of a CASE there is a correlated sub-query based on the same table, the aggregated value being used for further calculations. Most probably the respective query had only the purpose to demonstrate a technique, though I’m having mainly two observations related to it:
1. The query returns a label of whether the Stock is High, Normal, Low or there is no Stock, though I have no measure of what the Stock actually is, thus in a first phase I can’t validate whether the query shows the correct data. In case a report is based on the respective query this reduces considerably also report's usability.
2. A select is replicated across several CASE branches fact that I find it complicates query’s complexity reflected in query’s understanding and maintainability, while considering that the logic needs to be kept in synchronization increases the chances for making a mistake. At least in this situation, being involved only one table, the logic is quite simple though what do you do when the correlated sub-query is more complex?
How about encapsulating query’s logic in a sub-query and join it to the main tables using a left join? This approach would address both issues mentioned above:
When creating a query there is always an important facts that needs to be considered - query’s performance. The second query is simpler and in theory it should be easier to process, expecting to have at least similar performance as the first version. In this case the ProductID is a foreign key in Production.ProductInventory, the search performed on the respective table having minimum impact on performance. If no index is available on the searched attribute, more likely first query’s performance will decrease considerably. The best approach for mitigating the performance differences between the two queries is to look at Client Statistics and Execution Plan. In what concerns the Client Statistics both queries are having similar performance, while the Execution Plan of the second query excepting the fact that is simpler and, without going into details, it seems the second plan is better.
Maybe it makes sense to use correlated sub-queries in a CASE only when the correlated sub-queries have distinct logic, though even then I would recommend using a left join instead, this technique allowing more flexibility being possible to show the actual values in the query and even reuse them if needed in additional calculations.
Talking about correlated queries, I found cases in which two distinct correlated sub-queries with the same logic were used to pull two distinct attributes available in the same table, something like in the below example.
What if the correlated sub-queries are used again with the CASE function like in the below example?
Even if maybe the number of calls to the correlated sub-queries is reduced by using a CASE, I would recommend using a left join instead, this technique offering more flexibility, the logic becoming also much easier to maintain and debug.
A Software Engineer and data professional's blog on SQL, data, databases, data architectures, data management, programming, Software Engineering, Project Management, ERP implementation and other IT related topics.
05 February 2010
Subscribe to:
Post Comments (Atom)
About Me
- Adrian
- Koeln, NRW, Germany
- IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.
No comments:
Post a Comment