09 April 2011

💎SQL Reloaded: Searching a Value within a Database

Some years back, in a Management meeting related to one of the projects I was working on, one of the Managers made a quite interesting statement, something like: "it should be easy to pool in an Excel sheet from the whole database all the information I am interested in". I kind of smiled, because the information he was requiring was spanning within different modules, having different levels of detail and thus requiring several different output datasets in order to answer a simple question like "how is the manufacturing cost varying over time?". After some meetings we figured out that it’s not so easy to pull the data for the respective question, lacking some data which were not available in the respective database. It’s funny how IS (Information Systems) are getting more complex day by day but fail to capture and indirectly answer simple questions, this also because they were not designed to address such questions. Anyway, this post is not about this.

Some days back I spent a couple of hours trying to identify the tables from which the IDs stored in a log where coming from, in the log being saved no reference to the table, just the IDs. Another example of "good" design! While working diligently, I remembered that I could have easily solved the problem if I were using one of the CLR functions created in a post on Data Profiling Using CLR Functions. Unfortunately, because I was working in a production environment with strong security constraints, I wasn’t allowed registering and use any of the respective functions. Still, considering the easiness and flexibility of a the solution I thought that would be interesting to put it into practice. Who knows, maybe someday I will find the opportunity to use it!

Let’s suppose I am searching for an integer ID value within a given schema of AdventureWorks database. So, in theory, I need to do a search on each integer column of all the tables from the respective schema. The tables can be found in a matter of seconds using the following query based on the dbo.ExecuteScalarToInt CLR function introduced in the previous mention post and dbo.vTableColumns view.
 
-- searching a value in whole database 
WITH CTE 
AS (  
    SELECT [Schema_Name]  
    , Table_Name  
    , Column_Name 
    , system_type 
    , dbo.ExecuteScalarToInt('SELECT count(*) FROM ' + Quotename([Schema_Name]) + '.' + Quotename(Table_Name) + ' WHERE ' + Quotename(Column_Name) + ' = ' + Cast(1001 as varchar(20))) NumberRecords 
    FROM dbo.vTableColumns 
    WHERE [Schema_Name]= 'Sales' 
     --AND Table_Name = 'SalesOrderDetail' 
     --AND Column_Name = 'SalesOrderDetailID' 
     AND system_type = 'int' ) 
SELECT * 
FROM CTE 
WHERE NumberRecords>0 
ORDER BY [Schema_Name] 
, Table_Name  

Output:
Schema_Name Table_Name Column_Name system_type NumberRecords
Sales ContactCeditCard CreditCardId int 1
Sales CreditCard CreditCardId int 1
Sales CurrencyRate CurrencyRateId int 1
Sales CustomerAddress AddressId int 1
Sales SalesOrderDetail SalesOrderDetailId int 1
Sales SalesOrderHeader BillToAddressId int 8
Sales SalesOrderHeader CreditCardId int 1
Sales SalesOrderHeader ShipToAddressId int 8

As can be seen the dynamically built query, based on table metadata and searched value, is provided as parameter to dbo.ExecuteScalarToInt function. The query calculates the number of occurrences (see NumberRecords column) of the same value within a given column. Some attention must be given to the search column’s data type, as the data type needs to be compatible with the searched value, otherwise a cast being necessary. I haven’t tried the query in a big database and I would advice you to be cautious when running it in a production environment. In case you arrive to use the query in a big database, please let me know how it behaved!

In case of the problem I had a few days back, the above query it’s only a first step in identifying the tables in which the values occur. Because multiple tables are returned, it’s needed to investigate which of the tables has the highest probability to be the source for the ID saved in log. Of course, some DM (data mining) algorithm could be probably used for this purpose, however some programming is needed and also some amount of diligent work to prove the data. It would be nice to have a tool that performs such analysis or that creates a complex query using any set of attributes within a given database.

💎SQL Reloaded: Table Value Constructors at Work

    Paradoxically, the first new SQL Server 2008 feature I learned, soon after the first CTP was made available, was the use of table value constructors, however I managed all this time without using it. This happened because I was either working on previous SQL Server versions or because other techniques were more appropriate to the problems I had to solve. So, a few days ago, the use of table value constructors came again on the table when I was weighting what’s the best approach to a problem posed by a colleague. Actually, the problem is quite simple – given a pair of values (id, new value) is needed to update a table with the “new value” corresponding to the given “id”. Previously, in similar cases, I preferred to load the data in a staging table because this was allowing me to quickly perform several checks on the set of values (e.g. check for duplicates, identify how many records will be updated). But what do you do when you are working directly in a production or any other type of environment in which you don’t have the possibility of using a staging table?

    For such cases you can either create a dataset on the fly with a set of UNIONS and reuse it in queries or, create an UPDATE statement for each pair of values. You can even automate the creation of UNION and UPDATE statements, however their use doesn’t seem so elegant, especially when you can use table value constructors. The feature seems to be hidden in documentation, as a matter of fact I found it only in the online documentation, the local help having no reference to it. Anyway, excepting a few posts on the same subject I can’t say I have seen any code until now making use of it. What’s interesting to note, before going into details, is that the ANSI SQL 94 (the earliest version of documentation I have) mentions vaguely table value constructors (p. 221), therefore I expect they are part of ANSI standard.

   According to MSDN a table value constructor allows a set of row value expressions to be constructed into a table within a single DML statement. As can be seen from the MSDN examples the list of values can contain a combination of list of values and queries, as long the data type matches and only a single scalar value is provided as a row value expression Another limitation concerns the maximum number of rows constructed, the maximum value being 1000. It’s important to know these limitations, because they might force you use other approaches. Before sketching a solution for this post’s problem, let’s look how a table can be constructed on the fly given a list of pair-values:

-- example table value constructor 
SELECT EmployeeId 
, VacationHours 
FROM ( VALUES (101, '23') 
     , (102, '82') 
     , (103, '66') 
     , (104, '17') 
      , (105, '64') 
      , (106, '56') 
      , (107, '107') 
      , (108, '50') 
      , (109, '109') 
      , (110, '48')) AS Data(EmployeeId, VacationHours) 

    The table thus constructed can be further used in an UPDATE statement, the following example being based on HumanResources.Employee from AdventureWorks database.

-- example table value constructor update UPDATE HumanResources.Employee 
SET VacationHours = Data.VacationHours 
FROM ( VALUES (101, '23') 
      , (102, '82') 
      , (103, '66') 
      , (104, '17') 
      , (105, '64') 
      , (106, '56') 
      , (107, '107') 
      , (108, '50') 
      , (109, '109') 
      , (110, '48')) AS Data(EmployeeId, VacationHours) 
WHERE HumanResources.Employee.EmployeeId = Data.EmployeeId 

    A similar construct can be be used to perform a SELECT, UPDATE or DELETE in which multiple columns participate in the JOIN. For example let’s consider a SELECT based on a set of pairs of values from the Sales.SalesOrderHeader table:

-- example table value constructor
SELECT SOH.* 
FROM Sales.SalesOrderHeader SOH 
     JOIN ( -- set of pairs 
        VALUES(10, 5069) 
         , (10, 6070) 
         , (10, 7060) 
         , (11, 407) 
         , (11, 1603) 
         , (11, 2737)) AS Data(CustomerId, CurrencyRateID) 
      ON SOH.CustomerId = Data.CustomerID 
    AND SOH.CurrencyRateID = Data.CurrencyRateID 

    As can be seen table value constructors are pretty simple to use, especially when dealing with a small number of values. If they are the best choice, that depends on the problem you are trying to solve and its specific constraints!

04 April 2011

🔹SQL Server: Server Role (Definitions)

"A named set of security accounts. A SQL Server role can contain Windows NT users, Windows NT groups, SQL Server users, or other SQL Server roles from the same database." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A predefined role defined at the server level and existing outside individual databases." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A predefined role that exists at the server level. The scope of the role is limited to the SQL Server instance in which it is defined. Fixed server roles allow you to quickly assign permissions on the server to specific individuals. Roles can be compared to groups in the Window NT/2000 operating systems." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A predefined role at the server level that is used to control administrative access to the SQL Server instance." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"A predefined role that exists at the server level. The scope of the role is limited to the SQL Server instance in which it is defined." (Microsoft, "SQL Server 2012 Glossary", 2012)

"These are special groups in SQL Server that are used to limit the amount of administrative access that a user has once logged on to SQL Server." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

🔹SQL Server: Fixed Database Role (Definitions)

"A predefined role defined at the database level and existing in each database." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A predefined role that exists in each database. The scope of the role is limited to the database in which it is defined. You use fixed database roles to quickly define permissions within a particular database. Roles can be compared to groups in the Window NT/2000 operating systems." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"These are special groups in each SQL Server database that already have permissions applied and are used to limit the permissions that a user has inside the database." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"A role that is created by SQL Server 2005 in each database with a set of permissions that are predetermined and cannot be altered. Users can be added to one or more fixed database roles." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"A predefined role that exists in each database. The scope of the role is limited to the database in which it is defined." (Microsoft, "SQL Server 2012 Glossary", 2012)

30 March 2011

🔹SQL Server: Full-Text Catalog (Definitions)

"A catalog that stores a database's full-text index." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The full-text catalog stores all of the full-text indexes for tables within a database." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A Full-Text catalog is a special storage space used to house Full-Text indexes. By default, all Full-Text indexes are housed in a single catalog." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A special storage space used to house Full-Text indexes. By default, all Full-Text indexes are housed in a single catalog." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"A full-text catalog is a logical grouping of SQL Server full-text indexes for management purposes." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"A collection of full-text index components and other files that are organized in a specific directory structure and contain the data that is needed to perform queries." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A full-text catalog is a logical grouping of SQL Server full-text indexes for management purposes." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"A logical grouping of SQL Server full-text indexes for management purposes." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

🔹SQL Server: Filegroup (Definitions)

"A named collection of one or more database files that forms a single unit of allocation and administration. Filegroups enable the creation of objects in a specific place - for example, placing a heavily accessed table on a very fast drive. They also provide the ability to back up specific objects." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A named collection of one or more files that forms a single unit of allocation or for administration of a database. Filegroups allows you to engage in advanced administration techniques including the specific storage of particular objects on particular hard drives." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A named collection of one or more disk files that represent a single allocation on or for administration of a database. A database can have more than one filegroup." (Allan Hirt et al, "Microsoft SQL Server 2000 High Availability", 2004)

"In SQL Server, a named collection of one or more files that forms a single unit of allocation. Also for administration of a database." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"This is a logical grouping of files used to segregate database objects for storage and performance reasons." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"A logical means of controlling the placement of database objects on a file or set of files." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server 2005 Optimization and Maintenance 70-444", 2007)

"A method of optimizing the performance of a database by controlling the placement of database files and database objects. By default, all data and database objects are placed into a single file, in a single filegroup. Unlike files, which can be viewed on the disk, a filegroup is conceptual." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"Collections of SQL Server data files. For performance and administrative reasons, you can place user objects into dedicated filegroups." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"A named collection of one or more data files that forms a single unit of data allocation or for administration of a database." (Microsoft, "SQL Server 2012 Glossary", 2012)

25 March 2011

🧭Business Intelligence: Troubleshooting (Part II: Approaching a Query)

Business Intelligence Series
Business Intelligence Series

Introduction

You received a (long) query for troubleshooting, reviewing, conversion or any similar tasks. In addition, you don’t know much about the underlying table structure or business logic. So, what do you do then? For sure two things are intuitively clear: you don’t need to panic and, understanding the query may help you in your task. Understanding the query, it seems such a simple statement, though there is more to it. Here are some points on how to approach a query.

State your problem
 
“A problem well stated is a problem half solved” (Charles F. Kettering). Before performing any work, check what’s requested from you, whether you are having the information required for the task(s) ahead, for example documentation, valid examples, all code, etc. If something is missing, don’t hesitate to request all the information you need. While waiting for information, you can continue with next steps. As we don’t live in a perfect world, there will be also cases in which you’ll have to fill the gaps by yourself by performing additional research/work. When troubleshooting is important to understand what’s wrong and, when possible, have data against which to compare query’s output.

Save the work

Even if you are having a copy of the query somewhere on the server, save the previous version of the query and, when possible, use versioning. It might seem a redundant task, however the fact is that you never know when you need to refer to it and, as you’ll see next, it can/should be used as a baseline for validating the changes. In case you haven’t saved the query, check whether your RDBMS is tracking metadata about the queries run and, if the metadata were not reset in the meantime, you might be lucky enough to find a copy of your query.

I found that is important to save the daily work, the various analysis performed in order to understand a query, the various versions and even the data used for testing. All this work could help you letter to review what you made, the steps you missed, you can reuse one of the queries for further work, etc.

Break down

When the query is too complex, it could be useful to break the query into chunks that could be run and understood in isolation. Typically such chunks derive from query’s structure (e.g. inline queries, subqueries derived from unions). I found that often, focusing only a chunk of a query help isolating issues.

Restructure

Many programmers still write queries using the old non-ANSI joining syntax in which the join constraints appear in the WHERE clause, making the understanding and troubleshooting of a query more difficult. Often I found myself in the position of transforming first a query to ANSI SQL syntax, before performing further work on it. It’s actually a good occasion to gain a first understanding of query’s structure, but I’d prefer not to do it so often. In addition, during restructure phase it makes sense to differentiate between the join and filter constraints, this helping isolating the issue(s).

Check cardinalities

Wrong join constraints lead to duplicates or fewer records than expected, such differences being difficult to track when the variances in the numbers of records are quite small. Even if RDBMS come in developers’ help by providing metadata about the join relations, the columns and predicates participating in a join are not always so easy to identify. Therefore, in order to address this issue, it’s needed to check the constraints between any two tables between participating in a join. Sometimes, when the query is based on the table with the lowest level of detail, it can be enough to check the variations of the number of records.

Check filter constraints

Filter constraints are maybe more difficult to identify, especially when is needed to reengineer the logic built in applications. Many of the filter constraints are logical, though when you have no documentation about the schemas, is like rambling in the dark, having to check real examples and identify the various values and the impact they have on the behavior of your report.

Validate changes
 
So, you made the changes, everything looks perfect. Is it so? Often your intuition might tell you that the logic of a query is correct, though as software is not based on magic, at least not all the time, check some of the records to assure that the data are rendered as expected, check totals, compare the current with previous version, identify variations, etc. You don’t need to use all the technique you know, but to choose the best and minimal set of tools that allows you to validate the query.

Perform refactoring
    
Refactoring, the way to (continuous) code improvement, should become part of each developer’s philosophy about programming. A query, as any other piece of code, is rarely perfect as technical and factual knowledge is relative, features get deprecated and new techniques are introduced. On the other side, there is an old saying in IT – don’t change something that’s already working, so, there should be kept a balance between the two – the apparent and needed for change.

Document
    
I hope it’s not the case to stress the importance of documentation. From versioning to logic description, it’s a good practice to document the important parts of your work, especially the parts that will facilitate later work.

🔹SQL Server: Application Role (Definitions)

"A SQL Server role created to support the security needs of an application. Such a role is activated by a password and the use of the spsetapprole system stored procedure." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A SQL Server role created to support the security needs of an application. Using application roles is an alternative to allowing users access to SQL Server 2000. You can create an application role and assign it to a particular application, allowing users who use the application to access SQL Server." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is a special type of role that requires activation using the sp_setapprole stored procedure. This is primarily used to keep users from accessing a database with anything other than a custom application." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"A SQL Server role used by the application, instead of the user, to authenticate against a database solution." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"A SQL Server role used by the application, instead of the user, to authenticate against a database solution." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"A SQL Server role created to support the security needs of an application." (Microsoft, "SQL Server 2012 Glossary", 2012)

11 March 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VIII (Insertions, Deletions and Replacements)

Until now, the operations with strings resumed to concatenation and its reverse operation(s) - extracting a substring or splitting a string into substrings. It was just the warm up! There are several other important operations that involve the internal manipulation of strings – insertion, deletion and replacement of a substring in a given string, operations performed using the Replace and Stuff functions.

Replace function, as its name denotes, replaces all occurrences of a specified string value with another. Several scenarios in which the function is quite useful: the replacement of delimiters, special characters, correcting misspelled words or any other chunks of text. Here are some basic simple examples, following to consider the before mentioned applications in other posts:

-- examples with replace 
DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT replace(@str, ' ', ',') Example1 
, replace(@str, ' ', ' ') Example2 
, replace(@str, ' ', '') Example3 
, replace(@str, 'is', 'as') Example4  
Output:
Example1 Example2 Example3 Example4
this,is,a,test,string this is a test string thisisateststring thas as a test string

When there are good chances that the searched string won’t appear in the “searched” string, and especially when additional logic is depending on the replacement, logic that could be included in the same expression with the replacement, then maybe it makes sense to check first if the searched character is present:

-- replacement with check 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
SET @str = 'this is a test string' 
SET @search = 'this string' 
SET @replacememt = 'other string' 
SELECT CASE            
    WHEN CharIndex(@search, @str)>0 THEN Replace(@str, @search, @replacememt)             
    ELSE @str        
END result 

Unfortunately the function doesn’t have the flexibility of the homonym functions provided by the languages from the family of VB (VBScript, VB.NET), which allow to do the replacement starting with a given position, and/or for a given number of occurrences. This type of behavior could be obtained with a simple trick – splitting the string into two other strings, performing the replacement on the second string, and then concatenating the first string and the result of the replacement:

-- replacement starting with a given position 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
DECLARE @start int 
SET @str = 'this is a test string' 
SET @search = 's' 
SET @replacememt = 'x' 
SET @start = 7 
SELECT Left(@str, @start-1) FirstPart 
, RIGHT(@str, Len(@str)-@start+1) SecondPart 
, CASE        
    WHEN @start <= LEN(@str) THEN Left(@str, @start-1) + Replace(RIGHT(@str, Len(@str)-@start+1), @search, @replacememt)        
     ELSE @str  
END Replacement 
Output:
FirstPart SecondPart Replacement
this i s a test string this ix a text xtring

The logic can be encapsulated in a function together with additional validation logic.

Stuff function inserts a string into another string starting with a given position and deleting a specified number of characters. Even if seldom used, the function it’s quite powerful allowing to insert a string in another, to remove a part of a string or more general, to replace a single occurrence of a string with another string, as can be seen from the below examples:
 
-- Stuff-based examples DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT STUFF(@str, 6, 2, 'was ') Example1 
, STUFF(@str, 1, 0, 'and ') Example2 
, STUFF(@str, 1, 0, 'that') Example3 
, STUFF(@str, LEN(@str) + 1, 0, '!') Example4 
Output:
Example1 Example2 Example3 Example4
this was a test string and this is a test string thatthis is a test string NULL

If in the first example is done a replacement of a text from a fix position, in the next examples are attempted insert on a first, middle respectively end position. As can be seen, the last example doesn’t work as expected, this because the insert position can’t go over the length of the target string. Actually, if the insert needs to be done at the beginning, respectively the end of a string, a concatenation can be much easier to use. A such example is the padding of strings with leading or trailing characters, typically in order to arrive to a given length. SQL Server doesn’t provide such a function, however the function is quite easy to build.
 
-- left/right padding DECLARE @str varchar(30) 
DECLARE @length int  
DECLARE @padchar varchar(1) 
SET @str = '12345'  
SET @length = 10 
SET @padchar = '0' 
SELECT @str StringToPad  
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding 
 
Output:
StringToPad LeftPadding RightPadding
12345 0000012345 1234500000

Happy Coding!

24 February 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VI (Subparts of a String)

No matter how normalized a database is, there will always be the need to encode multiple semantic entities in a string, needing thus to extract them later. For example data like the Street Name and Number, or the Country Phone Prefix and Phone Number, the First and the Last Name, etc. Another frequent scenario is the misuse of a long-sized string data types to store multiple delimited attributes or simply a whole text like a Comment or Description. The easiest scenarios to deal with are the ones when you know the rules behind your encoding, preferably dealing with a fix length encoding or only a given delimiter. An example of fix length encoding is the IBAN Number, the country specific VAT Number or any other artificial constructed/standardized encoding. According to Wikipedia, IBAN (International Bank Account) Number consists of  ISO 3166-1 alpha-2 country code, followed by two check digits that are calculated using a mod-97 technique, and Basic Bank Account Number (BBAN) with up to thirty alphanumeric characters. Taking the IBAN example provided for Switzerland, it could be stored as “CH93 0076 2011 6238 5295 7” or “CH9300762011623852957”, in either case when the country is not provided explicitly it would be interesting to extract it from the IBAN together with the BBAN. How would we do that?

Many of the programming languages I worked with provide a function for extracting a substring from a string - Mid (VB/VBScript), Substring in C# Substr in Oracle, etc. SQL Server is actually providing three functions for this purpose: Substring, Left, respectively Right. The Substring function extracts a substring of a given length starting from a given position. The Left and Right functions return, as their name suggests, the left part, respectively the right part of a character string with the specified number of characters. The use of Left and Right functions seems to be redundant, as they are a particular case of Substring, however they can simplify sometimes the code, as can be seen from below example based on the above IBAN example.

-- parsing a VAT Number 
DECLARE @VAT varchar(50) 
SET @VAT = 'CH9300762011623852957' 
SELECT @VAT VATNumber 
, LEFT(@VAT, 2) CountryCode1 
, SUBSTRING(@VAT, 1, 2) CountryCode2 
, SUBSTRING(@VAT, 3, 2) CheckDigits 
, RIGHT(@VAT, Len(@VAT)-4) BBAN1 
, SUBSTRING(@VAT, 5, Len(@VAT)-4) BBAN2 
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding 

Substrings Example 1

Even if the IBAN has a variable-length component (the BBAN) given the fact that the other two components are fixed, this allows us to clearly extract each component. The example shows also the equivalent call of Substring function for Left (Country Code extraction), respectively Right (BBAN extraction).

What happens if there are more than one variable-length components? For such scenarios it’s useful to introduce a delimiter, it could be a comma, a dash, space, etc. It’s the case of a First and Last Name stored in the same attribute. Normally only one component qualifies as Last Name, and for simplicity let’s consider it as being stored first and space as delimiter. In order to identify the components, it’s enough to identify the first occurrence of the delimiter by using the CharIndex function.
 
-- parsing a Person's Name 
DECLARE @Name varchar(50) 
SET @Name = 'Stone Sharon' 
SELECT @Name Name 
, LEFT(@Name, CHARINDEX(' ', @Name)) LastName 
, RIGHT(@Name, LEN(@Name) - CHARINDEX(' ', @Name)) FirstName 
Output:
Name LastName FirstName
Stone Sharon Stone Sharon

The code for cutting the left, respectively right part of a string is pretty simple and over the years I used it quite often, so it makes sense to encapsulate it in a function., like I did in an older post.

When a delimiter is used repeatedly in a string, normally we need to identify each component in the string. Sometimes the number of components can be given, other times not. For this purpose can be used a common table expression, and here is another example in which the space is used as delimiter – extracting the words from a given sentence.

-- extracting the words from a sentence 
DECLARE @str nvarchar(100) 
SET @str = 'This is just a test' 
;WITH CTE (PrevString, Position, Word) 
AS ( 
    SELECT LTrim(RTrim( CASE  
         WHEN CharIndex(' ', @str)>0 THEN Right(@str, Len(@str)-CharIndex(' ', @str)) 
         ELSE '' 
     END)) PrevString 
, 1 Position  
, LTrim(RTrim(CASE  
     WHEN CharIndex(' ', @str)>0 THEN LEFT(@str, CharIndex(' ', @str)) 
     ELSE @str 
END)) Word 
UNION ALL  
SELECT LTrim(RTrim(CASE  
     WHEN CharIndex(' ', PrevString)>0 THEN Right(PrevString, Len(PrevString)-CharIndex(' ', PrevString)) 
     ELSE '' 
END)) PrevString 
, Position + 1 Position  
, LTrim(RTrim(CASE  
     WHEN CharIndex(' ', PrevString)>0 THEN LEFT(PrevString, CharIndex(' ', PrevString)) 
     ELSE PrevString 
END)) Word 
FROM CTE  
WHERE Len(PrevString)>0 
)  
SELECT PrevString 
, Word  
, Position 
FROM CTE 
OPTION (maxrecursion 100)
Output
PrevString Word Position
is just a test This 1
just a test is 2
a test just 3
test a 4
test 5

The logic works for a sentence, and if we ignore the fact that some punctuation signs are appearing at the end of the words, it might work  as well for a whole phrase, considering that the punctuation signs can be replaced from the end result. It would be useful for example to generalize the logic for a set of delimiters, in this case the other punctuation signs (e.g. “,”, “;”, “!”, etc.), however this would mean to identify which of the delimiters is used first or to apply the the same logic for the first delimiter, then for the second and so on. In addition, if the number of encoded elements within a value remain the same, a pivot can be applied on the final result and have thus all values’ elements within the same row.

Happy Coding!   

05 February 2011

💎SQL Reloaded: Deleting Sequential Data From a Table

    Last week I run into an interesting solution to a simple problem, problem I dealt with in several occasions too: how to delete (efficiently) sequential data, or how LuborK calls it in his post, “a problem of efficient ordered delete”.  In this case he’s talking about the deletion of the first n records from a dataset of sequential data. It sounds like a simple problem considering that we can easily select in a first step the first n records using the TOP function and then delete the respective matched records. That’s also what LuborK does, however the novelty of the solution proposed it’s a little unexpected, and this not from the point of view of the approach, but of the gain in performance. The bottom line: by encapsulating the inner query within a view, the deletion of sequential data becomes more efficient.

    While reading the respective post I thought: what if the inner query is encapsulated in a inline table-valued function?! Would the performance remain the same or deprecate? Does it make sense to add the overhead of creating a view for each such scenario in order to gain some performance? Then, reading the comments, one of them was pointing something so obvious: the inner query can be encapsulated within a common table expression, following to perform on it the deletion. An elegant solution I haven’t had available under SQL Server 2000 at the times I had to solve the same problem. Actually my approach was a little different, at that time identifying and storing the greatest value of a new data set, building thus a collection of values that would allow me to select a given range. This approach was quite useful in the respective scenarios and quite acceptable as performance, plus the fact that I was reusing the greatest value in further logic – two birds with one shot.

    This being said, I was thinking that except the methods provided by LuborK, I could provide in this post several other simple techniques. Please note that I’m not focusing on performance, but on the techniques. Eventually you can display the statistics related to the disk activity and time required to parse, compile and execute each batch of statements. So, let’s start with the creation and population of the table! In order to minimize the differences, I took the same example and added just a little make-up – comments and a small change in the way the values of the second column are inserted.  

-- dropping the table 
DROP TABLE dbo.TestTable 

--declaring the table 
CREATE TABLE dbo.TestTable ( 
  id int primary key 
, value varchar (50))  

-- inserting the records in a loop 
DECLARE @index int  
SET @index=1  
SET NOCOUNT ON  
WHILE (@index<100000)  
BEGIN 
    INSERT INTO dbo.TestTable  
    VALUES (@index,'test '+cast(@index as varchar(10))) 
    SET @index=@index+1  
END  

    Supposing we are interested in the first 10000 of records of a sequential data set, it would be much easier if we would know the maximum value from the respective data set. Then we could use the value to perform a simple deletion:
 
-- deletion based on stored max value DECLARE @max int 
SELECT @max = MAX(ID) 
FROM( 
    SELECT top(10000) id 
    FROM TestTable  
    ORDER BY ID 
) A 

--deleting the data  
DELETE FROM TestTable 
WHERE id <= @max 

    If we don’t need the maximum value for further logic, the two statements in can be combined in one query:

-- deletion based on max value - within one query DELETE FROM TestTable 
WHERE id <= ( 
     SELECT MAX(ID) 
     FROM ( 
          SELECT top(10000) id 
          FROM dbo.TestTable  
          ORDER BY ID 
     ) A 
) 

--deleting the data  
DELETE FROM TestTable 
WHERE id <= @max 
       
    When dealing with huge volumes of data, and not only then, in case of sequential processing we can store the maximum value in a given table, and pull the values as required. This approach allows us to process any interval within the data set, in this case the delete resumes to a simple delete statement in which @start and @end were retrieved from the respective table:
 
-- deleting the data within a given interval 
DELETE FROM dbo.TestTable 
WHERE id BETWEEN @start AND @end 

   If this seems trivial, let’s spice up things! As I observed, few people know that it’s possible to delete data from a given table using an inline table-valued function. Huh, what’s that?! If it seems impossible to you, take a look at documentation! I’m referring here to the GRANT object permissions section in which of interest is the following statement that refers to the ANSI 92 permissions applicable to the various database objects, for table-valued function being listed the following permissions: DELETE, INSERT, REFERENCES, SELECT, UPDATE. In order to exemplify the idea, let’s create the UDF:
 
-- creating the UDF function 
CREATE FUNCTION dbo.TestUDF1( 
@start int 
, @end int) 
RETURNS TABLE 
AS RETURN ( 
    SELECT id, value  
    FROM dbo.TestTable  
    WHERE id BETWEEN @start AND @end 
)  

--deleting the data 
 DELETE FROM dbo.TestUDF1(20001, 30000) 

    Voila! The example is supposed to work, unless you are having a problem with the permissions. Let’s consider this functionality from the perspective of our problem! What LuborK does is to include the inner query in a view and then delete the data:
 
-- creating the view 

CREATE VIEW dbo.vTestView 
AS 
SELECT top(10000) id 
FROM dbo.TestTable  
ORDER BY ID  


--deleting the data 
DELETE FROM dbo.vTestView 

    One of the problems with this approach is that the value representing the number of records to be returned is hardcoded. And here comes to help an inline table-valued function, also referred as a “parameterized view”: 

 -- creating the UDF function 
 CREATE FUNCTION dbo.TestUDF2( 
@number_records int) 
RETURNS TABLE 
AS RETURN ( 
    SELECT top (@number_records) id, value  
    FROM dbo.TestTable  
    ORDER BY id 
)        

--deleting the data 
DELETE FROM dbo.TestUDF2(10001)  

    And, as closure, the example based on the common table expression:

 -- deletion using a CTE 
;WITH CTE  
AS ( 
     SELECT top(10000) id 
     FROM dbo.TestTable  
     ORDER BY ID 
)  
DELETE FROM CTE 
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.