SQL Troubles: recursion

Showing posts with label recursion. Show all posts

03 March 2025

💎🤖SQL Reloaded: Copilot Stories (Part VIII: Recursive Functions)

In theory, recursive functions prove useful in generating a list of numbers based on a formula in which the final result is based on the result of the previous step. In other words, a problem is solved by calculating the previous iteration of the same problem. In practice, they are limited by the number of iterations allowed, and there is a hard limit in SQL Server, the available resources (usually memory), and probably a few other factors. Of course, some algorithms can be implemented to overcome the limitations, while other times there are quite simple linear implementations based on a loop, like in the below examples. Therefore, recursive functions are more of a theoretical concept.

Disclaimer: The below code was generated by Copilot via prompts and is used for demonstration purposes!

This post evaluates whether Microsoft 365 and Edge Copilot can be used to generate the code for some recursive functions.

Case 1: Factorials

Let's start with the factorial that calculates the product of the n consecutive numbers.

Prompt:
Generate a recursive function in SQL Server to calculate the factorial of a number.

Output (M365 Copilot):

-- factorial function in Microsoft 365 Copilot
CREATE FUNCTION dbo.Factorial (@n INT)
RETURNS BIGINT
AS
BEGIN
    IF @n <= 1
        RETURN 1
    ELSE
        RETURN @n * dbo.Factorial(@n - 1)
END

Output (Edge Copilot):

-- factorial function in Edge Copilot
CREATE FUNCTION dbo.Factorial (@n INT)
RETURNS BIGINT
AS
BEGIN
    IF @n = 0 OR @n = 1
        RETURN 1
    ELSE
        RETURN @n * dbo.Factorial(@n - 1)
END;

The structure seems correct, and the returning value was identified correctly as bigint, which allows the maximum value in working with integer values. Unfortunately, both functions lead to the same error in a SQL database:

"Msg 455, Level 16, State 2, Line 9, The last statement included within a function must be a return statement."

The formulas ca be consolidated in a single statement as follows:

-- factorial function adapted (minimal)
CREATE FUNCTION dbo.Factorial (@n INT)
RETURNS BIGINT
AS
BEGIN
    RETURN 
        CASE 
            WHEN @n = 0 OR @n = 1 THEN 1
        ELSE
            @n * dbo.Factorial(@n - 1) END
END;

Especially when formulas become complex, it might be a good idea to store the result into a variable and use it in the RETURN statement.

-- factorial function adapted
CREATE FUNCTION dbo.Factorial (@n INT)
RETURNS BIGINT
AS
BEGIN
    DECLARE @retval as bigint;

    IF @n = 0 OR @n = 1
        SET @retval = 1
    ELSE
        SET @retval = @n * dbo.Factorial(@n - 1)

    RETURN @retval
END;

And, here are some tests for the function:

-- testing the functions for multiple values
SELECT dbo.Factorial (0) -- returns 1
SELECT dbo.Factorial (1) -- returns 1
SELECT dbo.Factorial (2) -- returns 2
SELECT dbo.Factorial (3) -- returns 6
SELECT dbo.Factorial (4) -- returns 24
SELECT dbo.Factorial (5) -- returns 120
SELECT dbo.Factorial (20) -- returns 2432902008176640000
SELECT dbo.Factorial (21) -- returns an error for all bigger values

As can be seen, the function will return an error for values bigger than 20, output limited by the data type used. One can build further constraints to test for the maximum value.

Use the following code to drop the function as needed:

-- dropping the function
DROP FUNCTION IF EXISTS dbo.Factorial

Function's implementation has however another important caveat. If negative numbers are provided as parameters, the following error message is thrown:

"Msg 217, Level 16, State 1, Line 1, Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32)."

The final function can be defined as:

-- factorial function adapted (final)
CREATE FUNCTION dbo.Factorial (@n INT)
RETURNS BIGINT
AS
BEGIN
    DECLARE @retval as bigint;

    IF @n BETWEEN 0 AND 20
        IF @n = 0 OR @n = 1
            SET @retval = 1
        ELSE
            SET @retval = @n * dbo.Factorial(@n - 1)

    RETURN @retval
END;

One can linearize the function as a loop, though the limitation with the maximum value still exists:

Prompt:
Rewrite the factorial function using a loop.

Output (M365 Copilot):

-- non-recursive function to calculate the factorial of a number using a loop:
CREATE FUNCTION dbo.Factorial (@n INT)
RETURNS BIGINT
AS
BEGIN
    DECLARE @result BIGINT = 1;
    DECLARE @i INT = 1;

    WHILE @i <= @n
    BEGIN
        SET @result = @result * @i;
        SET @i = @i + 1;
    END

    RETURN @result;
END

-- test the code
SELECT dbo.Factorial(5) AS FactorialResult; -- result 120

One can obtain some improvement by changing the return type and @retsult from bigint to float, though the results are always approximations. Here are the approximated results:

-- test code
SELECT dbo.Factorial(20) AS FactorialResult; -- result 2432902008176640000
SELECT dbo.Factorial(21) AS FactorialResult; -- result 5.109094217170944E+19
SELECT dbo.Factorial(22) AS FactorialResult; -- result 1.1240007277776077E+21
SELECT dbo.Factorial(23) AS FactorialResult; -- result 2.585201673888498E+22
SELECT dbo.Factorial(24) AS FactorialResult; -- result 6.204484017332394E+23
SELECT dbo.Factorial(25) AS FactorialResult; -- result 1.5511210043330986E+25
SELECT dbo.Factorial(26) AS FactorialResult; -- result 4.0329146112660565E+26
SELECT dbo.Factorial(27) AS FactorialResult; -- result 1.0888869450418352E+28
SELECT dbo.Factorial(28) AS FactorialResult; -- result 3.0488834461171384E+29
SELECT dbo.Factorial(29) AS FactorialResult; -- result 8.841761993739701E+30
SELECT dbo.Factorial(30) AS FactorialResult; -- result 2.6525285981219103E+32
SELECT dbo.Factorial(31) AS FactorialResult; -- result 8.222838654177922E+33
SELECT dbo.Factorial(32) AS FactorialResult; -- result 2.631308369336935E+35
SELECT dbo.Factorial(170) AS FactorialResult; -- result 7.257415615307994E+306
SELECT dbo.Factorial(171) AS FactorialResult; -- results in arithmetic overflow

Another structure that allows creating recursive functions are the common table expressions (CTEs).

Prompt:
Create a recursive CTE in SQL Server for calculating n factorial.

Output (M365 Copilot):

-- n factorial as CTE
WITH FactorialCTE AS (
    -- Anchor member: factorial of 0 is 1
    SELECT 0 AS n, 1 AS factorial
    UNION ALL
    -- Recursive member: factorial of n is n * factorial of (n-1)
    SELECT n + 1, (n + 1) * factorial
    FROM FactorialCTE
    WHERE n < 10 -- Change this value to calculate factorial for larger numbers
)
SELECT n, factorial
FROM FactorialCTE;

One can cast the value in the anchor member as bigint or float as above with all the constraints resulting from this:

-- n factorial as CTE
WITH FactorialCTE AS (
    -- Anchor member: factorial of 0 is 1
    SELECT 0 AS n, Cast(1 as bigint) AS factorial
    UNION ALL
    -- Recursive member: factorial of n is n * factorial of (n-1)
    SELECT n + 1, (n + 1) * factorial
    FROM FactorialCTE
    WHERE n < 20 -- Change this value to calculate factorial for larger numbers
)
SELECT n, factorial
FROM FactorialCTE;

Case 2: Approximation of π

Power laws can be used to approximate the value of π and other irrational constants. Writing such a function via recursive functions has the same constraint as above, though there are alternatives (including storing the actual value as a constant).

Prompt:
Generate a recursive function in SQL Server to approximate the value of π using power laws.

Output (M365 Copilot):

-- approximate the value of π using this series:
CREATE FUNCTION dbo.ApproximatePi (@n INT)
RETURNS FLOAT
AS
BEGIN
    IF @n = 0
        RETURN 4.0
    ELSE
        RETURN 4.0 * POWER(-1, @n) / (2 * @n + 1) + dbo.ApproximatePi(@n - 1)
END

-- test the function
SELECT dbo.ApproximatePi(1000) AS PiApproximation;

Function's definition has the same issue as the previous example. It's needed to rewrite the function as a single return:

-- approximate the value of π using this series (rewritten)
CREATE FUNCTION dbo.ApproximatePi (@n INT)
RETURNS FLOAT
AS
BEGIN
    RETURN
        CASE 
            WHEN @n = 0 THEN 4.0
            ELSE
                4.0 * POWER(-1, @n) / (2 * @n + 1) + dbo.ApproximatePi(@n - 1)
        END
END

-- testing the function
SELECT dbo.ApproximatePi (0) -- output: 4
SELECT dbo.ApproximatePi (1) -- output: 2.666666666667
SELECT dbo.ApproximatePi (2) -- output: 3.466666666667
SELECT dbo.ApproximatePi (3) -- output: 2.8952380952389998
SELECT dbo.ApproximatePi (10) -- output: 3.232315809407
SELECT dbo.ApproximatePi (20) -- output: 3.189184782279
SELECT dbo.ApproximatePi (30) -- output: 3.1738423371919993
SELECT dbo.ApproximatePi (31) -- output: 3.110350273699999
SELECT dbo.ApproximatePi (32) -- output: 3.171888735237999
SELECT dbo.ApproximatePi (33) -- throws error

After 32 iterations the call throws the same error as above. Fortunately, such functions can be (re)written using non-recursive implementations, which is the proper way to implementation such functions:

Prompt:
Implement the function without using recursive functions.

Output (M365 Copilot):

-- approximation within loop
CREATE FUNCTION dbo.ApproximatePi (@terms INT)
RETURNS FLOAT
AS
BEGIN
    DECLARE @pi FLOAT = 0.0;
    DECLARE @i INT = 0;

    WHILE @i < @terms
    BEGIN
        SET @pi = @pi + 4.0 * POWER(-1, @i) / (2 * @i + 1);
        SET @i = @i + 1;
    END

    RETURN @pi;
END

This implementation is more flexible and a similar approach can be used for similar series. And, here are the results for various parameters:

-- testing the function
SELECT dbo.ApproximatePi(1000) AS PiApproximation; -- result: 3.140592653840002
SELECT dbo.ApproximatePi(10000) AS PiApproximation; -- result: 3.141492653571029
SELECT dbo.ApproximatePi(100000) AS PiApproximation; -- result: 3.141582653466945
SELECT dbo.ApproximatePi(1000000) AS PiApproximation; -- result: 3.1415916535590713

The last call is close, but probably not close enough, especially when one needs to use the function in multiple places. There are more complex power series which provide better performance faster. The direct use of the pi() function is usually recommended:

-- using the pi function
SELECT pi(); -- 3.141592653589793

Moreover, some series can be reduced to formulas based on constants like π, e or Φ.

And, a final piece of code:

-- drop the function
DROP FUNCTION IF EXISTS dbo.ApproximatePi

One can use a CTE also in this case:

Prompt:
Generate a recursive function in SQL Server to approximate the value of π using power laws.

Output (M365 Copilot):

-- series via recursive CTE

WITH PiApproximationCTE AS (
    -- Anchor member: start with k = 0
    SELECT 0 AS k, CAST(4.0 AS FLOAT) AS term, CAST(4.0 AS FLOAT) AS pi_approx
    UNION ALL
    -- Recursive member: calculate the next term and add it to the approximation
    SELECT 
        k + 1,
        CAST(4.0 * POWER(-1, k + 1) / (2 * (k + 1) + 1) AS FLOAT),
        pi_approx + CAST(4.0 * POWER(-1, k + 1) / (2 * (k + 1) + 1) AS FLOAT)
    FROM PiApproximationCTE
    WHERE k < 1000 -- Change this value to increase the number of terms
)
SELECT k, pi_approx
FROM PiApproximationCTE
OPTION (MAXRECURSION 0); -- Allow for more recursion levels if needed

This time Copilot generated a different faction for calculating the same. (The output is too long to be added in here as one record is generated for each iteration.)

Happy coding!

Previous Post <<||>> Next Post

05 December 2010

💎🏭SQL Reloaded: A Look into Reverse

Some time back I was mentioning Stuff as one of the functions rarely used in SQL Server. Here’s another one: Reverse. I remember I used such a function in other programming languages and maybe 1-2 times in SQL Server, and I can’t say I seen it in action in other posts. As it name states and the MSDN documentation confirms, Reverse returns the reverse of a string value. The function takes as parameter any string value or value that could be implicitly conversed to a string, otherwise you’ll have to do an explicit conversion. As can be seen from the below first example the reverse of the “string” string is “gnirts”, and, as per the second example, the reverse of the revers of a string is the initial string itself. The third and fourth example use in exchange a numeric value.

-- reversing a string using Reverse function 
SELECT Reverse('string') Example1 
, Reverse(Reverse('string')) Example2 
, Reverse(1234567890) Example3 
, Reverse(Reverse(1234567890)) Example4

Example1	Example2	Example3	Example4
gnirts	string	987654321	1234567890

It seems there is not much to be said about Reverse function, and in the few situations that you need it, it’s great to have it, otherwise you would have to built it yourself. Let’s try to imagine there is no Reverse function in SQL Server, how would we provide such functionality by using existing functionality? In general such an exercise might look like lost time, though it could be useful in order to present several basic techniques. In this case the Reverse functionality could be provided by using a simple loop that iterates through string’s characters changing their order. The below piece of code should return the same output as above:

-- reversing a string using looping 
DECLARE @String varchar(50) 
DECLARE @Retval varchar(50) 
DECLARE @Index int 
SET @String = 'string' 
SET @Index = 1 
SET @Retval = '' 
WHILE @Index <= Len(@String) 
BEGIN  
   SET @Retval = Substring(@String, @Index, 1) + @Retval   
   SET @Index = @Index + 1  
END 
SELECT @Retval

Some type of problems whose solution involve the sequential processing of values within loops could be solved also using recursive techniques (recursion) that involve the sequential processing of values in which the output of a given intermediary step is based on the previous step. Recursion imply a slightly different view of the same problem, it could prove itself to be maybe less efficient in some cases and quite a useful technique in the others. SQL Server provides two types of recursion, one at function level involving functions, stored procedures are triggers, respectively at query level implemented in common table expressions. Recursive functions, functions that call themselves, have been for quite a while in SQL Server though few of the database books I read really talk about them. Probably I will detail the topic in another post, though for the moment I will show only the technique at work. In this case the recursive approach is quite simple:

-- reversing a string using a recursive function 
CREATE FUNCTION dbo.ReverseString( 
@string varchar(50)) 
RETURNS varchar(50) 
AS 
BEGIN           
     RETURN (CASE WHEN Len(@string)>1 THEN dbo.ReverseString( Right(@string, len(@string)-1)) + Left(@string, 1) ELSE @string END) 
END

-- testing 
SELECT dbo.ReverseString('string') Example1 
, dbo.ReverseString(dbo.ReverseString('string')) Example2

Common table expressions (CTE), since their introduction with SQL Server 2005, became quite an important technique in processing hierarchical structures like BOMs, which involve the traversal of the whole tree. A sequence is actually a tree having the width of 1 node, so it seems like a CTE’s job:

-- reversing a string using common table expression 
DECLARE @String varchar(50) 
SET @String = 'string' 
;WITH CTE(String, [Index]) 
AS (     SELECT Cast('' as varchar(50)) String 
    , 1 [Index] 
    UNION ALL 
    SELECT Cast(Substring(@String, [Index], 1) + String as varchar(50)) String 
    , [Index] + 1 [Index] 
    FROM CTE 
    WHERE [Index]<=Len(@String) 
)

SELECT @String Reversed 
FROM CTE 
WHERE [Index]-1=Len(@String)

Unlike the previous two solutions, the CTE approach involves more work, work that maybe isn’t required just for the sake of using the technique. When multiple alternative approaches exist, I prefer using the (simplest) solution that offers the highest degree of flexibility. In this case is the loop, though in others it could be a recursive function or a CTE.

Notes:

The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

09 February 2010

💎SQL Reloaded: Ways of Looking at Data IV (Hierarchies)

Introduction

There is a special type of entities that when related to each other, the relations put together form a hierarchy. It’s the case of assemblies and the components they are made of, a component could be at its turn an assembly formed from smaller components, resulting thus a structure known as Bill of Materials (BoM). Take for example a laptop made of parts like the keyboard, modem, hard disk, CD-ROM, microprocessor, volatile memory, graphic card, display, to name just a few of the main components, each of these parts are at their turn are made of smaller components that at their turn could be made from smaller components, and so on. Another popular example is the one of employees in an organization, following the line of command from CEO to the lowest person as importance, a person could be chief for more employees and have only one chief, resulting thus another hierarchical structure.

Most of the hierarchies I met were having maximum 7 levels, where a level represents a parent-child relation, typically there is no need for higher hierarchies. In a RDBMS such relations are represented with self-referencing tables or self-referencing structures in case more than one table is used to represent the relation. The problem with such structures is that in order to get the whole hierarchy a join must be created for each relation, though as the number of levels could vary, such queries can become quite complex. Therefore RDBMS came with their own solutions in order to work with hierarchies

There are several types of problems specific to hierarchies: 1. Building the hierarchy behind of an item in the structure
2. Identifying the top-most parent of an item in the structure
3. Identifying the top-most parent of an item in the structure and building the hierarchy behind it – a combination of the previous mentioned problems

Building the hierarchy

SQL Server 2005 introduced the common table expressions (CTE), a powerful feature that allows building whole hierarchies in a simplistic manner. From a structural point of view a CTE has two elements – the anchor member defining the starting point of the query, respectively the recursive member that builds upon the anchor member. The starting point of the query in this case is the entity or entities for which the structure is built, and here there are 2 situations – we are looking for all topmost entities or for the entities matching a certain criteria, typically looking for one or more Product Numbers. The difference between the two scopes could lead to different approaches – whether to include the Product information in the CTE or to use the CTE only to built the hierarchical structure, and add only in the end the production information. Before things would become fuzzier let’s start with an example, here are the two approaches for defining the anchor member, the first query identifying the top-most assemblies, while the second searches for the components in the BOM that matches a criteria (e.g. the Product’s name starts with BK-R93R).

So we have the starting point, how about the recursive member? Without using a CTE, normally we would have to join either of the two queries again with the BillofMaterials table in order to retrieve the components for the Products already retrieved, then the final result must be joined again with the BillofMaterials, and repeat this step over and over again until a predefined number of iterations are met, or until no components are retrieved anymore. Actually that’s what a CTE does too, and in order to make things simpler, a CTE has a name that it’s reused in the CTE in order to handle the recursive part, and outside of the CTE in order to use the final result. Supposing that CTE’s name is BOM here is the recursive member:

The anchor and recursive member are merged in the CTE with the help of a UNION therefore the number of attributes must be the same and their data type must be compatible. Here is the final query:

    The TotalQty attribute has been introduced in order to show the cumulated quantity for each component, for example if the Assembly has the PerAssemblyQty 2 and the Component has the PerAssemblyQty 10 then the actual total quantity needed is 2*10.
    The Path attribute has been introduced in order to sort the hierarchy lexically based on the sequence formed from the ComponentID, it could have been used instead the Product Number.

Identifying the top-most parent

    The CTE can be used also for identifying the top-most parent of a given Product or set of Items, the second anchor query member could be used for this, while the recursive member would involve the reverse join between BOM and BillOfMaterials. Here is the final query:

Given the fact that the final CTE output includes the output from each iteration when only the data from the last iteration are needed, this might not be the best solution. I would expect that the simulation of recursion with the help of a temporary table would scale better in this case.

Another interesting fact to note is that in this case more assemblies are found to match the criteria of top-most item, meaning that the BOMs are not partitioned – two BOMs could have common elements.

Happy coding!

Previous Post <<||>> Next Post

07 December 2007

🏗️Software Engineering: Recursion (Just the Quotes)

"A powerful tool for reducing apparent complexity is recursion. In a recursive procedure, the method of solution is defined in terms of itself. That is, each part of the routine handles only a small piece of the strategy, then calls the other parts of the routine as needed to handle the rest. The trick is to reduce each hard case to one that is handled simply elsewhere." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Recursion represents no saving of time or storage. Somewhere in the computer must be maintained a list of all the places a recursive routine is called, so the program can eventually find its way back. But the storage for that list is shared among many different uses. More important, it is managed automatically; many of the burdens of storage management and control flow are placed on the compiler, not on the programmer. And since bookkeeping details are hidden, the program can be much easier to understand. Learning to think recursively takes some effort, but it is repaid with smaller and simpler programs." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Use recursive procedures for recursively-defined data structures." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Recursion is the root of computation since it trades description for time." (Alan J Perlis, "Epigrams on Programming", 1982)

"A deterministic process may be defined in terms of a mathematical function from its input channels to its output channels. Each channel is identified with the indefinitely extensible sequence of messages which pass along it. Such functions are defined in the usual way by recursion on the structure of the input sequences, except that the case of an empty input sequence is not considered." (C Anthony R Hoare, "Communicating Sequential Processes", 1985)

"Recursion permits the definition of a single process as the solution of a single equation. The technique is easily generalised to the solution of sets of simultaneous equations in more than one unknown. For this to work properly, all the right-hand sides must be guarded, and each unknown process must appear exactly once on the left-hand side of one of the equations." (C Anthony R Hoare, "Communicating Sequential Processes", 1985)

"Apart from power laws, iteration is one of the prime sources of self-similarity. Iteration here means the repeated application of some rule or operation - doing the same thing over and over again. […] A concept closely related to iteration is recursion. In an age of increasing automation and computation, many processes and calculations are recursive, and if a recursive algorithm is in fact repetitious, self-similarity is waiting in the wings."(Manfred Schroeder, "Fractals, Chaos, Power Laws Minutes from an Infinite Paradise", 1990)

"When we talk about software architecture, software is recursive and fractal in nature, etched and sketched in code. Everything is details. Interlocking levels of detail also contribute to a building’s architecture, but it doesn’t make sense to talk about physical scale in software. Software has structure - many structures and many kinds of structures-but its variety eclipses the range of physical structure found in buildings. You can even argue quite convincingly that there is more design activity and focus in software than in building architecture - in this sense, it’s not unreasonable to consider software architecture more architectural than building architecture!" (Robert C Martin, "Clean Architecture: A Craftsman's Guide to Software Structure and Design", 2017)

SQL Troubles

Pages

03 March 2025

💎🤖SQL Reloaded: Copilot Stories (Part VIII: Recursive Functions)

05 December 2010

💎🏭SQL Reloaded: A Look into Reverse

09 February 2010

💎SQL Reloaded: Ways of Looking at Data IV (Hierarchies)

07 December 2007

🏗️Software Engineering: Recursion (Just the Quotes)

About Me