Showing posts with label recursion. Show all posts
Showing posts with label recursion. Show all posts

05 December 2010

SQL Reloaded: A Look into Reverse

Some time back I was mentioning Stuff as one of the functions rarely used in SQL Server. Here’s another one: Reverse. I remember I used such a function in other programming languages and maybe 1-2 times in SQL Server, and I can’t say I seen it in action in other posts. As it name states and the MSDN documentation confirms, Reverse returns the reverse of a string value. The function takes as parameter any string value or value that could be implicitly conversed to a string, otherwise you’ll have to do an explicit conversion. As can be seen from the below first example the reverse of the “string” string is “gnirts”, and, as per the second example, the reverse of the revers of a string is the initial string itself. The third and fourth example use in exchange a numeric value.
 
-- reversing a string using Reverse function 
SELECT Reverse('string') Example1 
, Reverse(Reverse('string')) Example2 
, Reverse(1234567890) Example3 
, Reverse(Reverse(1234567890)) Example4 
Example1 Example2 Example3 Example4
gnirts string 987654321 1234567890

It seems there is not much to be said about Reverse function, and in the few situations that you need it, it’s great to have it, otherwise you would have to built it yourself. Let’s try to imagine there is no Reverse function in SQL Server, how would we provide such functionality by using existing functionality? In general such an exercise might look like lost time, though it could be useful in order to present several basic techniques. In this case the Reverse functionality could be provided by using a simple loop that iterates through string’s characters changing their order. The below piece of code should return the same output as above:

-- reversing a string using looping 
DECLARE @String varchar(50) 
DECLARE @Retval varchar(50) 
DECLARE @Index int 
SET @String = 'string' 
SET @Index = 1 
SET @Retval = '' 
WHILE @Index <= Len(@String) 
BEGIN  
   SET @Retval = Substring(@String, @Index, 1) + @Retval   
   SET @Index = @Index + 1  
END 
SELECT @Retval 

Some type of problems whose solution involve the sequential processing of values within loops could be solved also using recursive techniques (recursion) that involve the sequential processing of values in which the output of a given intermediary step is based on the previous step. Recursion imply a slightly different view of the same problem, it could prove itself to be maybe less efficient in some cases and quite a useful technique in the others. SQL Server provides two types of recursion, one at function level involving functions, stored procedures are triggers, respectively at query level implemented in common table expressions. Recursive functions, functions that call themselves, have been for quite a while in SQL Server though few of the database books I read really talk about them. Probably I will detail the topic in another post, though for the moment I will show only the technique at work. In this case the recursive approach is quite simple:

-- reversing a string using a recursive function 
CREATE FUNCTION dbo.ReverseString( 
@string varchar(50)) 
RETURNS varchar(50) 
AS 
BEGIN           
     RETURN (CASE WHEN Len(@string)>1 THEN dbo.ReverseString( Right(@string, len(@string)-1)) + Left(@string, 1) ELSE @string END) 
END 
 
-- testing 
SELECT dbo.ReverseString('string') Example1 
, dbo.ReverseString(dbo.ReverseString('string')) Example2 

Common table expressions (CTE), since their introduction with SQL Server 2005, became quite an important technique in processing hierarchical structures like BOMs, which involve the traversal of the whole tree. A sequence is actually a tree having the width of 1 node, so it seems like a CTE’s job:
  
-- reversing a string using common table expression 
DECLARE @String varchar(50) 
SET @String = 'string' 
;WITH CTE(String, [Index]) 
AS (     SELECT Cast('' as varchar(50)) String 
    , 1 [Index] 
    UNION ALL 
    SELECT Cast(Substring(@String, [Index], 1) + String as varchar(50)) String 
    , [Index] + 1 [Index] 
    FROM CTE 
    WHERE [Index]<=Len(@String) 
) 
SELECT @String Reversed 
FROM CTE 
WHERE [Index]-1=Len(@String) 

Unlike the previous two solutions, the CTE approach involves more work, work that maybe isn’t required just for the sake of using the technique. When multiple alternative approaches exist, I prefer using the (simplest) solution that offers the highest degree of flexibility. In this case is the loop, though in others it could be a recursive function or a CTE.

09 February 2010

SQL Reloaded: Ways of Looking at Data IV (Hierarchies)

Introduction

There is a special type of entities that when related to each other, the relations put together form a hierarchy. It’s the case of assemblies and the components they are made of, a component could be at its turn an assembly formed from smaller components, resulting thus a structure known as Bill of Materials (BoM). Take for example a laptop made of parts like the keyboard, modem, hard disk, CD-ROM, microprocessor, volatile memory, graphic card, display, to name just a few of the main components, each of these parts are at their turn are made of smaller components that at their turn could be made from smaller components, and so on. Another popular example is the one of employees in an organization, following the line of command from CEO to the lowest person as importance, a person could be chief for more employees and have only one chief, resulting thus another hierarchical structure.

Most of the hierarchies I met were having maximum 7 levels, where a level represents a parent-child relation, typically there is no need for higher hierarchies. In a RDBMS such relations are represented with self-referencing tables or self-referencing structures in case more than one table is used to represent the relation. The problem with such structures is that in order to get the whole hierarchy a join must be created for each relation, though as the number of levels could vary, such queries can become quite complex. Therefore RDBMS came with their own solutions in order to work with hierarchies

There are several types of problems specific to hierarchies: 1. Building the hierarchy behind of an item in the structure
2. Identifying the top-most parent of an item in the structure
3. Identifying the top-most parent of an item in the structure and building the hierarchy behind it – a combination of the previous mentioned problems

Building the hierarchy

SQL Server 2005 introduced the common table expressions (CTE), a powerful feature that allows building whole hierarchies in a simplistic manner. From a structural point of view a CTE has two elements – the anchor member defining the starting point of the query, respectively the recursive member that builds upon the anchor member. The starting point of the query in this case is the entity or entities for which the structure is built, and here there are 2 situations – we are looking for all topmost entities or for the entities matching a certain criteria, typically looking for one or more Product Numbers. The difference between the two scopes could lead to different approaches – whether to include the Product information in the CTE or to use the CTE only to built the hierarchical structure, and add only in the end the production information. Before things would become fuzzier let’s start with an example, here are the two approaches for defining the anchor member, the first query identifying the top-most assemblies, while the second searches for the components in the BOM that matches a criteria (e.g. the Product’s name starts with BK-R93R).

So we have the starting point, how about the recursive member? Without using a CTE, normally we would have to join either of the two queries again with the BillofMaterials table in order to retrieve the components for the Products already retrieved, then the final result must be joined again with the BillofMaterials, and repeat this step over and over again until a predefined number of iterations are met, or until no components are retrieved anymore. Actually that’s what a CTE does too, and in order to make things simpler, a CTE has a name that it’s reused in the CTE in order to handle the recursive part, and outside of the CTE in order to use the final result. Supposing that CTE’s name is BOM here is the recursive member:
    The anchor and recursive member are merged in the CTE with the help of a UNION therefore the number of attributes must be the same and their data type must be compatible. Here is the final query:
    The TotalQty attribute has been introduced in order to show the cumulated quantity for each component, for example if the Assembly has the PerAssemblyQty 2 and the Component has the PerAssemblyQty 10 then the actual total quantity needed is 2*10.
    The Path attribute has been introduced in order to sort the hierarchy lexically based on the sequence formed from the ComponentID, it could have been used instead the Product Number.

Identifying the top-most parent

    The CTE can be used also for identifying the top-most parent of a given Product or set of Items, the second anchor query member could be used for this, while the recursive member would involve the reverse join between BOM and BillOfMaterials. Here is the final query:
Given the fact that the final CTE output includes the output from each iteration when only the data from the last iteration are needed, this might not be the best solution. I would expect that the simulation of recursion with the help of a temporary table would scale better in this case.

Another interesting fact to note is that in this case more assemblies are found to match the criteria of top-most item, meaning that the BOMs are not partitioned – two BOMs could have common elements.

Happy coding!

07 December 2007

Software Engineering: Recursion (Just the Quotes)

"A powerful tool for reducing apparent complexity is recursion. In a recursive procedure, the method of solution is defined in terms of itself. That is, each part of the routine handles only a small piece of the strategy, then calls the other parts of the routine as needed to handle the rest. The trick is to reduce each hard case to one that is handled simply elsewhere." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Recursion represents no saving of time or storage. Somewhere in the computer must be maintained a list of all the places a recursive routine is called, so the program can eventually find its way back. But the storage for that list is shared among many different uses. More important, it is managed automatically; many of the burdens of storage management and control flow are placed on the compiler, not on the programmer. And since bookkeeping details are hidden, the program can be much easier to understand. Learning to think recursively takes some effort, but it is repaid with smaller and simpler programs." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Use recursive procedures for recursively-defined data structures." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Recursion is the root of computation since it trades description for time." (Alan J Perlis, "Epigrams on Programming", 1982)

"A deterministic process may be defined in terms of a mathematical function from its input channels to its output channels. Each channel is identified with the indefinitely extensible sequence of messages which pass along it. Such functions are defined in the usual way by recursion on the structure of the input sequences, except that the case of an empty input sequence is not considered." (C Anthony R Hoare, "Communicating Sequential Processes", 1985)

"Recursion permits the definition of a single process as the solution of a single equation. The technique is easily generalised to the solution of sets of simultaneous equations in more than one unknown. For this to work properly, all the right-hand sides must be guarded, and each unknown process must appear exactly once on the left-hand side of one of the equations." (C Anthony R Hoare, "Communicating Sequential Processes", 1985)

"Apart from power laws, iteration is one of the prime sources of self-similarity. Iteration here means the repeated application of some rule or operation - doing the same thing over and over again. […] A concept closely related to iteration is recursion. In an age of increasing automation and computation, many processes and calculations are recursive, and if a recursive algorithm is in fact repetitious, self-similarity is waiting in the wings."(Manfred Schroeder, "Fractals, Chaos, Power Laws Minutes from an Infinite Paradise", 1990)

"When we talk about software architecture, software is recursive and fractal in nature, etched and sketched in code. Everything is details. Interlocking levels of detail also contribute to a building’s architecture, but it doesn’t make sense to talk about physical scale in software. Software has structure - many structures and many kinds of structures-but its variety eclipses the range of physical structure found in buildings. You can even argue quite convincingly that there is more design activity and focus in software than in building architecture - in this sense, it’s not unreasonable to consider software architecture more architectural than building architecture!" (Robert C Martin, "Clean Architecture: A Craftsman's Guide to Software Structure and Design", 2017)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.