Showing posts with label numeric functions. Show all posts
Showing posts with label numeric functions. Show all posts

18 October 2022

SQL Reloaded: Successive Price Increases/Discounts via Windowing Functions and CTEs

I was trying today to solve a problem that apparently requires recursive common table expressions, though they are not (yet) available in Azure Synapse serverless SQL pool. The problem can be summarized in the below table definition, in which given a set of Products with an initial Sales price, is needed to apply Price Increases successively for each Cycle. The cumulated increase is simulated in the last column for each line. 

Unfortunately, there is no SQL Server windowing function that allows multiplying incrementally the values of a column (similar as the running total works). However, there’s a mathematical trick that can be used to transform a product into a sum of elements by applying the Exp (exponential) and Log (logarithm) functions (see Solution 1), and which frankly is more elegant than applying CTEs (see Solution 2). 

-- create table with test data
SELECT *
INTO dbo.ItemPrices
FROM (VALUES ('ID001', 1000, 1, 1.02, '1.02')
, ('ID001', 1000, 2, 1.03, '1.02*1.03')
, ('ID001', 1000, 3, 1.03, '1.02*1.03*1.03')
, ('ID001', 1000, 4, 1.04, '1.02*1.03*1.03*1.04')
, ('ID002', 100, 1, 1.02, '1.02')
, ('ID002', 100, 2, 1.03, '1.02*1.03')
, ('ID002', 100, 3, 1.04, '1.02*1.03*1.04')
, ('ID002', 100, 4, 1.05, '1.02*1.03*1.04*1.05')
) DAT (ItemId, SalesPrice, Cycle, PriceIncrease, CumulatedIncrease)

-- reviewing the data
SELECT *
FROM dbo.ItemPrices

-- Solution 1: new sales prices with log & exp
SELECT ItemId
, SalesPrice
, Cycle
, PriceIncrease
, EXP(SUM(Log(PriceIncrease)) OVER(PARTITION BY Itemid ORDER BY Cycle)) CumulatedIncrease
, SalesPrice * EXP(SUM(Log(PriceIncrease)) OVER(PARTITION BY Itemid ORDER BY Cycle)) NewSalesPrice
FROM dbo.ItemPrices

-- Solution 2: new sales prices with recursive CTE
;WITH CTE 
AS (
-- initial record
SELECT ITP.ItemId
, ITP.SalesPrice
, ITP.Cycle
, ITP.PriceIncrease
, cast(ITP.PriceIncrease as decimal(38,6)) CumulatedIncrease
FROM dbo.ItemPrices ITP
WHERE ITP.Cycle = 1
UNION ALL
-- recursice part
SELECT ITP.ItemId
, ITP.SalesPrice
, ITP.Cycle
, ITP.PriceIncrease
, Cast(ITP.PriceIncrease * ITO.CumulatedIncrease as decimal(38,6))  CumulatedIncrease
FROM dbo.ItemPrices ITP
    JOIN CTE ITO
	  ON ITP.ItemId = ITO.ItemId
	 AND ITP.Cycle-1 = ITO.Cycle
)
-- final result
SELECT ItemId
, SalesPrice
, Cycle
, PriceIncrease
, CumulatedIncrease
, SalesPrice * CumulatedIncrease NewSalesPrice
FROM CTE
ORDER BY ItemId
, Cycle


-- validating the cumulated price increases (only last ones)
SELECT 1.02*1.03*1.03*1.04 
, 1.02*1.03*1.04*1.05

-- cleaning up
DROP TABLE IF EXISTS dbo.ItemPrices

Notes:
1. The logarithm in SQL Server’s implementation works only with positive numbers!
2. For simplification I transformed percentages (e.g. 1%) in values that are easier to multitply with (e.g. 1.01). The solution can be easily modified to consider discounts.
3. When CTEs are not available, one is forced to return to the methods used in SQL Server 2000 (I've been there) and thus use temporary tables or table variables with loops. Moreover, the logic can be encapsulated in multi-statement table-valued functions (see example), unfortunately, another feature not (yet) supported by serverless SQL pools. 
4. Unfortunately, STRING_AGG, which concatenates values across rows, works only with a GROUP BY clause. Anyway, its result is useless without the availability of a Eval function in SQL (see example), however the Expr function available in data flows could be used as workaround.
4. Even if I thought about the use of logarithms for transforming the product into a sum, I initially ignored the idea, thinking that the solution would be too complex to implement. So, the credit goes to another blogpost. Kudos!

Happy coding!

04 February 2010

SQL Reloaded: Just In CASE II (Clauses and Joins)

The CASE function could appear in all typical clauses – SELECT, WHERE, GROUP BY, ORDER BY and even in JOIN predicates. If the impact on performance is quite small when CASE is used inside of a SELECT clause, the decrease in performance could be quite high when the CASE is used in searches – WHERE constraints or JOIN predicates. Oracle provides function-based indexes, the smart use of such functions could improve in theory searches performed with CASE/DECODE functions, unfortunately as far I know there is no similar feature available on MS SQL Server family.

The use of CASE functions in WHERE clause is very rare because it could be replaced with several OR operators, the only benefit results maybe from syntax’ clearness or by checking values’ type before using them – the case of numeric and date values. Even if the below example doesn’t make sense from a business point of view, it shows a possible use of CASE within the WHERE clause. 

 
There are systems that misuse an alphanumeric attribute in order to store a numeric or date value, therefore before using the respective values in computations it’s necessary/recommended to test whether the values have the expected data type. I met also cases in which the range of dates covered by a date type in a RDBMS is narrower than a similar date type of other RDBMS (e.g. SQL Server vs. Oracle), in such case the test for valid dates is also necessary. For this purpose the CASE could be used together with the IsDate, respectively IsNumeric, functions as in below examples. 


The same data type handling could be used also in joins when with the misuse of alphanumeric data types, the numeric values represent references to other tables, or the date values could be used in the logic of a join. When the misused attribute is on the main table then the impact on performance is minimal, though when the misused attribute appears in the referenced table then the performance could be impacted considerably because is less likely than an index is built on the respective alphanumeric attribute, a full table scan being done. In order to exemplify this situation I will use the dbo.SplitList table-valued function, which takes a list of value and creates a table out of it, and join it with the Production.Product table. I used to Cast the Value attribute from dbo.SplitList to an integer in order to simulate the handling of numeric values. 


 Even if this use might be look strange for some people, it isn’t utopic, I met many cases in which Users wanted to pull for example all the Products available in the system having the values for a given attribute in a given list, and they wanted to see also the values that had no match. In theory could be easier to rewrite the query as follows, and use in Excel a VLOOKUP function to determine the attributes with no match. 


Of course, in this case it would have been much easier to just use the IN operator and provide the list of values.

14 September 2007

SQL Reloaded: Useful Functions in Data Conversion

    When working with data, even if it's about reporting, data conversion or data analysis, there are a few useful functions which makes life easier. They deal with three types of basic variables and their variations: strings (e.g. char, varchar), numeric (e.g. int, decimal) and date (e.g. smalldatetime, datetime).  

1. Case Function

CASE input_expression 
        WHEN when_expression 
        THEN result_expression [ ...n ] 
       [ELSE else_result_expression] 
END   
        
Judging from experience, CASE is the most used function in data conversions, when the output value is based on on one or multiple constraints over the already existing values. 
  
CASE WHEN column1 = 'yes' OR column2 = 'other' THEN 'yes/other' END    

 Just be careful, when the ELSE is not provided, a null will be returned! 

Another important thing, the values returned must have the same base type, dates, strings or numeric values can't be mixed without conversion! 

--example 1 
CASE 
    WHEN DateDiff(d, DateValue, GetDate())>0 THEN 'past' WHEN DateDiff(d, DateValue, GetDate())=0 THEN 'today' 
    ELSE 'future' 
END 

--example 2 
CASE 
     WHEN column1 BETWEEN 1 AND 100 THEN '1st range' 
     WHEN column1 BETWEEN 101 AND 1000 THEN '2nd range' 
     ELSE '3rd range' 
END 

--example 3 
CASE 
    WHEN column1 IN ('A', 'B', 'C') THEN '1st set' 
    WHEN column1 IN ('C', 'D', 'E') THEN '2nd set' 
    ELSE '3rd set' 
END      

 There is a saying for DBAs and not only, "Know your data!". As each statement is checked until comparison operation evaluated True, it is better to arrange the WHEN expression in the descending order of highest occurrence. 

 2. IsNull Function

IsNull(expression)   
  
 It's always a good idea to check for NULL values, they could occur anytime when a default value is not provided for a value. One of the cases I often found was when joining two tables when one of the values from JOIN was NULL. 

-- example for string values 
SELECT * 
FROM table1 A 
         LEFT JOIN table2 B 
             ON A.column1 = B.column1 
           AND IsNull(A.column2, ' ') = IsNull(B.column2, ' ') 
WHERE B.column1 IS NULL  

-- example for numeric values 
SELECT * 
FROM table1 A 
      LEFT JOIN table2 B 
          ON A.column1 = B.column1  
        AND IsNull(A.column2, 0) = IsNull(B.column2, 0) 
WHERE B.column1 IS NULL 

The ' ' and 0 values should be used when they aren't found in the list of possible values or there is no impact on the result. Another used value for IsNull function is 'N/A', to point out the missing value. 

 3. Coalesce Function

COALESCE ( expression [ ,...n ] )  
       
There are cases when is needed to take the first not null value from a list of columns, could be dates or any type of other value. Instead of writting something like IsNull(Date1, IsNull(Date2, IsNull(Date3, GetDate()) it's much easier to write COALESCE(Date1, Date2, Date3, GetDate()) 

-- example coalesce 
SELECT * 
FROM table1 
WHERE COALESCE(Column1, Column2, Column3) IS NOT NULL 

-- example IsNull alternative 
SELECT * 
FROM table1 
WHERE Column1 IS NOT NULL 
        OR Column2 IS NOT NULL 
        OR Column3 IS NOT NULL 

 4. NullIf Function
  
NullIf ( expression , expression )    

NullIf function is handy especially when is not needed a certain value in output, for example 0 or the .  

-- example numeric values 
SELECT column1
, NullIf(column2, 0) column2
FROM table_name     

-- example string or date values 
SELECT column1
, NullIf(column2, '') column2
FROM table_name     

When multiple values are not needed, it's easier maybe to write 

-- example multiple checks 
 SELECT column1 
, CASE 
     WHEN column2 IN ('A', 'B', 'V') THEN NULL 
     ELSE column1 END 
FROM table_name 

-- alternative 
SELECT column1, NullIf(NullIf(NullIf(column2, 'A'), 'B'), 'V') 
FROM table_name

 5. Convert Function

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )       

 I used to use CAST instead CONVERT, however CONVERT is more powerful, especially when it comes to dates. SQL Server comes with a few dates formats obtain by providing a value for style parameter: 

SELECT CONVERT(varchar(10), GetDate(),101) -- US format: mm/dd/yyyy
SELECT CONVERT(varchar(10), GetDate(),102) -- ANSI format: yyyy.mm.dd
SELECT CONVERT(varchar(10), GetDate(),103) -- British/French dd/mm/yyyy
SELECT CONVERT(varchar(10), GetDate(),104) -- German format: dd.mm.yyyy
SELECT CONVERT(varchar(12), GetDate(),105) -- Italian format: dd-mm-yyyy
SELECT CONVERT(varchar(12), GetDate(),106) -- dd mon yyyy
SELECT Upper(CONVERT(varchar(12), GetDate(),106))   -- upper case of: dd mon yyyy
SELECT CONVERT(varchar(8), GetDate(),112) -- ISO format (preferred): yyyymmdd
SELECT CONVERT(varchar(10), GetDate(), 23) -- yyyy-mm-dd
SELECT CONVERT(varchar(23), GetDate(),126) -- yyyy-mm-ddThh:mi:ss.mmm

The conversion of a date to a string is mainly required by the need to have the date in a special format or not allow Excel to format the date to its default. 

 6. IsNumeric Function

IsNumeric ( expression )     

This function shouldn't miss when casting string values to numeric values and doing operations with them. 

SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name 
WHERE IsNumeric(column1)=1     

Please note that when multiple constraints are used, the IsNumeric function should appear first after the WHERE clause, a query like the next one might raise an error if one of the column1 values is not numeric. 

-- example 
SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name 
WHERE column2 = 'value' 
AND IsNumeric(column1)=1

-- alternative writing (recommended)
SELECT CAST(column1 as decimal(12,2)) + 101 
FROM table_name WHERE IsNumeric(column1)=1 
   AND column2 = 'value' 

 7. IsDate Function

IsDate( expression )          

 It works the same way as IsNumeric, but on date data types. Thus if the provided values is a date, it will return 1, otherwise 0. 

  SELECT IsDate(GetDate()) 
  SELECT IsDate('11/31/2007') --november has only 30 days 
  SELECT CASE IsDate('11/31/2007') WHEN 1 THEN 'Yes' ELSE 'No' END 

 Note:     
For more information on the function is always a good idea to consult Microsoft Technical documentation.
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.