About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Thursday, February 04, 2010

Just In CASE - Part II: Clauses and Joins

    The CASE function could appear in all typical clauses – SELECT, WHERE, GROUP BY, ORDER BY and even in JOIN predicates. If the impact on performance is quite small when CASE is used inside of a SELECT clause, the decrease in performance could be quite high when the CASE is used in searches – WHERE constraints or JOIN predicates. Oracle provides function-based indexes, the smart use of such functions could improve in theory searches performed with CASE/DECODE functions, unfortunately as far I know there is no similar feature available on MS SQL Server family.

    The use of CASE functions in WHERE clause is very rare because it could be replaced with several OR operators, the only benefit results maybe from syntax’ clearness or by checking values’ type before using them – the case of numeric and date values. Even if the below example doesn’t make sense from a business point of view, it shows a possible use of CASE within the WHERE clause.
    There are systems that misuse an alphanumeric attribute in order to store a numeric or date value, therefore before using the respective values in computations it’s necessary/recommended to test whether the values have the expected data type. I met also cases in which the range of dates covered by a date type in a RDBMS is narrower than a similar date type of other RDBMS (e.g. SQL Server vs. Oracle), in such case the test for valid dates is also necessary. For this purpose the CASE could be used together with the IsDate, respectively IsNumeric, functions as in below examples.
    The same data type handling could be used also in joins when with the misuse of alphanumeric data types, the numeric values represent references to other tables, or the date values could be used in the logic of a join. When the misused attribute is on the main table then the impact on performance is minimal, though when the misused attribute appears in the referenced table then the performance could be impacted considerably because is less likely than an index is built on the respective alphanumeric attribute, a full table scan being done. In order to exemplify this situation I will use the dbo.SplitList table-valued function, which takes a list of value and creates a table out of it, and join it with the Production.Product table. I used to Cast the Value attribute from dbo.SplitList to an integer in order to simulate the handling of numeric values.
    Even if this use might be look strange for some people, it isn’t utopic, I met many cases in which Users wanted to pull for example all the Products available in the system having the values for a given attribute in a given list, and they wanted to see also the values that had no match. In theory could be easier to rewrite the query as follows, and use in Excel a VLOOKUP function to determine the attributes with no match.
    Of course, in this case it would have been much easier to just use the IN operator and provide the list of values.

No comments: