
06 July 2020

🪄SSRS (& Paginated Reports): Design (Part II: Parameter Dependencies in Dropdowns)

Introduction

Quite often there are dependencies between the values of two distinct dropdowns, typically the values representing the different levels of a direct or implied hierarchy. This means that the choice at the upper level determines the values shown at a lower level, the corresponding query needing to handle the upper value as a parameter. There are several ways to handle this, the choice depending also on the data provider considered for the report.

Because the AdventureWorks database is used within SQL Server, the available choices are MSOLEDBSQL, SQLOLEDB, ODBC or SQL Server Native Client. It should be noted that the SQLOLEDB and Native Client providers have been deprecated and shouldn't be used for new developments! In this project the Microsoft OLE DB Provider for SQL Server (MSOLEDBSQL) was used.

For exemplification, the reports built previously will be used. The hierarchy will be built based on State and Territory: a Country belongs to a Territory, while a State belongs to a Country. To begin, we will build the dependency existing between Country and State.

First Level of Dependency 

The StateProvinces dataset, which will feed the StateProvince text parameter, is defined similarly to the one considered for the Country, having as source the following query:

-- State Provinces
SELECT '' Id
, '(All)' Name
UNION ALL
SELECT StateProvinceCode
, StateProvinceCode + ' ' + Name
FROM [Person].[StateProvince]
WHERE CountryRegionCode = ISNULL(NullIf(?, ''), CountryRegionCode)
ORDER BY Id

The Country parameter is handled in the WHERE constraint - that's the role of the question mark. When an empty string is selected, the equivalent of ignoring the parameter, the formula results in an identity, which however makes sense only if there's always a value for the Country Code in the Person.StateProvince table. Otherwise the same line needs to be written as follows:

WHERE IsNull(CountryRegionCode, '') = COALESCE(NullIf(?, ''), CountryRegionCode, '')

This might look strange, though it does the trick of handling the NULLs. According to the rules of defensive programming, one should treat the eventuality of NULLs. However, this depends also on the application's design. If one develops reports with this supposition, then the supposition should be made explicit in the documentation!
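As a quick illustration of the trick, with literal values standing in for the parameter and the column:

-- how the fallback behaves 
SELECT ISNULL(NULLIF('', ''), 'CA') Ignored -- empty string: falls back to the column value 
, ISNULL(NULLIF('US', ''), 'CA') Applied -- a selected value wins 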

Once the query is provided, SSRS recognizes that a parameter was supplied and requires mapping the question mark to the respective report parameter. Within the dataset's properties click on the "Parameters" tab and select @Country as Parameter Value:

Parameters in Dataset Properties

A question mark needs to be considered in the query for each parameter, the parameters from the Dataset Properties being mapped in the order in which the question marks appear in the query.
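To see how the positional mapping works with more than one parameter, here's a sketch with a hypothetical second filter on TerritoryID (not part of the report as built): the first question mark would be mapped to the first entry in the Parameters tab, the second to the second:

-- sketch: two positional parameters 
SELECT StateProvinceCode 
, StateProvinceCode + ' ' + Name 
FROM [Person].[StateProvince] 
WHERE CountryRegionCode = ISNULL(NULLIF(?, ''), CountryRegionCode) 
  AND TerritoryID = ISNULL(NULLIF(?, 0), TerritoryID) 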

Once the dataset is defined, one can go on and define the StateProvince parameter similarly to the Country parameter, considering the StateProvinces dataset as source. The StateProvince parameter needs to be added also to the report's header:
= "State Province: " & Parameters!StateProvince.Label

Within the Addresses main dataset the declaration for the StateProvince parameter needs to be added, while in the WHERE clause the corresponding constraint:

-- SQL Parameter
& " DECLARE @StateProvince as nvarchar(3) = '" & Parameters!StateProvince.Value & "'" & vbcrlf 

-- constraint in WHERE clause
& IIf(Parameters!StateProvince.Value<> "", "     AND SIC.StateProvinceCode = @StateProvince", "") & vbcrlf 

With these changes the report should be ready to run.

Notes:
The above query for the StateProvinces dataset can be used inside a formula with a few small changes. This should work independently of the provider used. Just replace the query with the formula and delete the parameters previously created for the dataset via the Dataset Properties:

= " -- State Provinces " & vbCrLf
 & " DECLARE @Country as nvarchar(3) = '" & Parameters!Country.Value & "'" & vbcrlf 
 & " SELECT '' Id " & vbCrLf
 & " , '(All)' Name " & vbCrLf
 & " UNION ALL " & vbCrLf
 & " SELECT StateProvinceCode " & vbCrLf
 & " , StateProvinceCode + ' ' + Name " & vbCrLf
 & " FROM [Person].[StateProvince] " & vbCrLf
 & IIf(Parameters!Country.Value<>"", "     WHERE CountryRegionCode = @Country", "") & vbcrlf 
 & " ORDER BY Id " & vbCrLf

Second Level of Dependency

The second dependency is based on the Territory Group formed of North America, Pacific and Europe. There is however no Territory Group table, the relations between Countries and Territory Groups being stored in the Sales.SalesTerritory table. Therefore, the query for the corresponding dropdown can be written via a DISTINCT:

-- Territories
SELECT '' Id
, '(All)' Name
UNION ALL
SELECT DISTINCT [Group] Id
, [Group] Name
FROM Sales.SalesTerritory
ORDER BY Id

When the user selects a Territory, the Country and State Province dropdowns are supposed to show only the values corresponding to the selection. We thus need to modify both queries via formulas.
The query for the Country dataset includes the Territory parameter and the corresponding constraint via an EXISTS:

= "  -- Countries  " & vbCrLf 
& " DECLARE @Territory as nvarchar(50) = '" & Parameters!Territory.Value & "'" & vbcrlf 
& " SELECT '' Id  " & vbCrLf 
& " , '(All)' Name  " & vbCrLf 
& " UNION ALL  " & vbCrLf 
& " SELECT SSP.CountryRegionCode Id  " & vbCrLf 
& " , SSP.Name   " & vbCrLf 
& " FROM [Person].[CountryRegion] SSP " & vbCrLf 
& IIf(Parameters!Territory.Value<> "", "     WHERE EXISTS (SELECT * FROM [Sales].[SalesTerritory] SST WHERE SSP.CountryRegionCode = SST.CountryRegionCode AND SST.[Group] = @Territory)", "") & vbcrlf 
& " ORDER BY Id " & vbCrLf

In exchange, the formula for the StateProvince dataset is slightly more complex:

= " -- State Provinces " & vbCrLf
 & " DECLARE @Country as nvarchar(3) = '" & Parameters!Country.Value & "'" & vbcrlf 
 & " DECLARE @Territory as nvarchar(50) = '" & Parameters!Territory.Value & "'" & vbcrlf 
 & " SELECT '' Id " & vbCrLf
 & " , '(All)' Name " & vbCrLf
 & " UNION ALL " & vbCrLf
 & " SELECT PSP.StateProvinceCode " & vbCrLf
 & " , PSP.StateProvinceCode + ' ' + PSP.Name " & vbCrLf
 & " FROM [Person].[StateProvince] PSP" & vbCrLf
 & "      JOIN [Sales].[SalesTerritory] PST" & vbCrLf
 & "       ON PSP.TerritoryId = PST.TerritoryId " & vbCrLf
 & " WHERE 0=0 " & vbCrLf
 & IIf(Parameters!Country.Value<> "", "     AND PSP.CountryRegionCode = @Country", "") & vbcrlf 
 & IIf(Parameters!Territory.Value<> "", "     AND PST.[Group] = @Territory", "") & vbcrlf 
 & " ORDER BY Id " & vbCrLf

Once the datasets are changed, the order of the parameters still needs to be modified to reflect the order in which they are referenced within the dependencies, respectively Territory, Country, State Province. In the end the report looks like this:


Notes:
(1) If the order of the parameters doesn't reflect the order in which they are referenced, you'll get a generic error similar to the following one (the error appears when there are problems with the query):
An error occurred during local report processing. The definition of the report '/Customer Addresses' is invalid. The CommandText expression for the query 'Countries' contains an error: [BC30277] Type character '&' does not match declared data type 'String'.
(2) When designing parameters one should also consider the best practices for avoiding poor parameter design.


25 October 2018

💎SQL Reloaded: Cursor and Linked Server for Data Import

There are times when one needs to pull some data (repeatedly) from one or more databases for analysis and SSIS is not available, or there's not much time to create individual packages via data imports. In such scenarios one needs to rely on the SQL Server engine's built-in support. In this case the data can be easily imported via a linked server into ad-hoc created tables in a local database. In fact, the process can be partially automated with the use of a cursor that iterates through a predefined set of tables. For exemplification I will use a SELECT instead of an EXEC just to render the results:
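The linked server itself, referenced below through the [server].[database] placeholders, can be created via sp_addlinkedserver; here's a minimal sketch with placeholder names, the security mapping reduced to the caller's own credentials:

-- creating the linked server (placeholder names) 
EXEC sp_addlinkedserver 
     @server = N'server' -- the name used in [server].[database] below 
   , @srvproduct = N'' 
   , @provider = N'SQLNCLI' -- or another SQL Server provider 
   , @datasrc = N'SourceHost\SourceInstance' 

-- mapping the logins to the caller's credentials 
EXEC sp_addlinkedsrvlogin 
     @rmtsrvname = N'server' 
   , @useself = N'TRUE' 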

-- cleaning up
-- DROP TABLE dbo.LoadTables 

-- defining the scope
SELECT *
INTO dbo.LoadTables
FROM (VALUES ('dbo.InventTable')
           , ('dbo.InventDimCombination')
           , ('dbo.InventDim')
           , ('dbo.InventItemLocation')) DAT ([Table])


-- creating the stored procedure 
CREATE PROCEDURE dbo.pLoadData(
    @Table as nvarchar(50))
AS
/* loads the set of tables defined in dbo.LoadTables */
BEGIN
   DECLARE @cTable varchar(50)

   -- creating the cursor
   DECLARE TableList CURSOR FOR
   SELECT [Table]
   FROM dbo.LoadTables
   WHERE [Table] = IsNull(@Table, [Table])
   ORDER BY [Table]

   -- opening the cursor
   OPEN TableList 

   -- fetching next record 
   FETCH NEXT FROM TableList
   INTO @cTable

   -- looping through each record 
   WHILE @@FETCH_STATUS = 0 
   BEGIN
        -- preparing the DROP TABLE statement 
        SELECT (' DROP TABLE IF EXISTS ' + @cTable)

        -- preparing the SELECT INTO STATEMENT
        SELECT( ' SELECT *' +
         ' INTO ' + @cTable +
                ' FROM [server].[database].[' + @cTable + ']')

        -- fetching next record 
        FETCH NEXT FROM TableList
        INTO @cTable
   END

   --closing the cursor
   CLOSE TableList 
   -- deallocating the cursor
   DEALLOCATE TableList 
END
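To actually run the statements instead of rendering them, the two SELECTs inside the loop can be replaced with something along these lines (a sketch; the dynamic strings stay the same):

-- executing instead of rendering the statements 
DECLARE @sql nvarchar(max) 
SET @sql = N'DROP TABLE IF EXISTS ' + @cTable 
         + N'; SELECT * INTO ' + @cTable 
         + N' FROM [server].[database].[' + @cTable + ']' 
EXEC (@sql) 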

Running the stored procedure for all the tables:

 -- Testing the procedure 
 EXEC dbo.pLoadData NULL -- loading all tables 

-- output 
 DROP TABLE IF EXISTS dbo.InventDim
 SELECT * INTO dbo.InventDim FROM [server].[database].[dbo.InventDim]

 DROP TABLE IF EXISTS dbo.InventDimCombination
 SELECT * INTO dbo.InventDimCombination FROM [server].[database].[dbo.InventDimCombination]

 DROP TABLE IF EXISTS dbo.InventItemLocation
 SELECT * INTO dbo.InventItemLocation FROM [server].[database].[dbo.InventItemLocation]

 DROP TABLE IF EXISTS dbo.InventTable
 SELECT * INTO dbo.InventTable FROM [server].[database].[dbo.InventTable]

Running the stored procedure for a specific table:

-- Testing the procedure 
EXEC dbo.pLoadData 'dbo.InventTable' -- loading a specific table

-- output 
DROP TABLE IF EXISTS dbo.InventTable
SELECT * INTO dbo.InventTable FROM [server].[database].[dbo.InventTable]

Notes:
Having an old example of using a cursor (see Cursor and Lists), the whole mechanism for loading the data was available in 30 minutes or so.
Tables can be added or removed as needed, and the loading can be made more flexible by adding other parameters to the logic.
The solution is really easy to use and its performance is acceptable in comparison to SSIS packages.
Probably you already observed the use of DROP TABLE IF EXISTS, introduced with SQL Server 2016 (see also this post); a pre-2016 equivalent is sketched below.
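On older versions the same effect can be obtained by checking the object's existence first:

-- pre-SQL Server 2016 equivalent 
IF OBJECT_ID('dbo.InventTable', 'U') IS NOT NULL 
   DROP TABLE dbo.InventTable 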

Advantages:
The stored procedure can be extended to any database for which a linked server can be created.
Structural changes of the source tables are reflected in each load.
Tables can be quickly updated when needed just by executing the stored procedure.

Disadvantages:
Such solutions are more for personal use and their use should be avoided in a production environment.
The data will be temporarily unavailable while the procedure runs. Indexes need to be recreated after each load, as SELECT ... INTO doesn't copy them (see the sketch below).
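For example, a needed index can be recreated right after the load (index name and column are placeholders here):

-- re-creating an index after the load (placeholder name and column) 
CREATE INDEX IX_InventTable_ItemId 
    ON dbo.InventTable (ItemId) 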

Happy Coding!

18 June 2011

💎SQL Reloaded: Pulling the Strings of SQL Server X (Dynamic Queries)

A dynamic query is a query constructed at runtime, a technique often indispensable in situations that require a certain flexibility in a query's creation. The creation of a dynamic query is nothing but a set of operations with strings, so many of the techniques mentioned before become handy. SQL Server provides two mechanisms for the execution of dynamic queries, namely the EXECUTE statement (or its shortened form EXEC) and the sp_executesql stored procedure. Even if the latter is more flexible, allowing parameters to be passed from and to the caller and execution plans to be reused (see Using sp_executesql), only EXEC will be used for the following examples. But first let's look at how a static query can become dynamic. For this let's consider the following query based on the AdventureWorks database:
 
-- example 1 - simple query   
SELECT *  
FROM Person.Address  
WHERE AddressID = 1 

-- example 2 - query encapsulated in a string: 
EXEC ('SELECT * FROM Person.Address WHERE AddressID = 1') 

-- example 3 - query stored into a string variable      
DECLARE @sql varchar(100) 
SET @sql = 'SELECT * FROM Person.Address WHERE AddressID = 1' 
EXEC (@sql) 

Supposing that the AddressID is considered as a parameter, we can write:

-- example 4 - static query     
DECLARE @AddressID int 
SET @AddressID = 1 
SELECT *  
FROM Person.Address  
WHERE AddressID = @AddressID  

-- example 5 - dynamic query  
DECLARE @sql varchar(100) 
DECLARE @AddressID int 
SET @AddressID = 1 
SET @sql = 'SELECT * FROM Person.Address WHERE AddressID = ' + CAST(@AddressID as varchar (10)) 
EXEC (@sql) 
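For comparison, here's a sketch of the same query run via sp_executesql, which keeps the value as a real parameter instead of concatenating it into the string:

-- example 5b (sketch) - dynamic query via sp_executesql 
DECLARE @AddressID int 
SET @AddressID = 1 
EXECUTE sp_executesql N'SELECT * FROM Person.Address WHERE AddressID = @AddressID' 
      , N'@AddressID int' 
      , @AddressID = @AddressID 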

Until here there is no important conceptual difference. What if multiple AddressIDs need to be passed? We could create a parameter for each expected value, though that's not a reasonable solution as the number of values can vary. A more elegant solution would be to build a list of values, provide it as a string parameter, and then concatenate the original query and the string parameter like below. We just need to accommodate the length of the string variable to the expected size of the list of values.
 
-- example 6 (dynamic query) 
DECLARE @sql varchar(100) 
DECLARE @AddressIDs varchar(50) -- supposed parameter 
SET @AddressIDs = '1, 2, 4, 5, 6, 10'  
SET @sql = 'SELECT * FROM Person.Address WHERE AddressID IN (' + @AddressIDs + ')' 
EXEC (@sql) 

There is actually a third solution. As the dbo.StringToTable function was introduced in the previous post on lists of values, it can be used to transform the list into a table:
 
-- example 7 (list of values) 
DECLARE @AddressIDs varchar(50) -- supposed parameter 
SET @AddressIDs = '1,2,4,5,6,10'  
SELECT *  
FROM Person.Address  
WHERE AddressID IN ( 
      SELECT value  
      FROM dbo.StringToTable(@AddressIDs, ',')) 
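For completeness, here's a minimal sketch of such a split function, in case the previous post is not at hand (only the signature used above is assumed; the actual definition there may differ):

-- minimal dbo.StringToTable sketch 
CREATE FUNCTION dbo.StringToTable( 
  @List varchar(500) 
, @Separator char(1)) 
RETURNS @Values TABLE (Value varchar(50)) 
AS 
BEGIN 
    DECLARE @Index int 
    SET @Index = CharIndex(@Separator, @List) 

    WHILE @Index > 0 
    BEGIN 
        INSERT @Values 
        VALUES (Left(@List, @Index-1)) 

        SET @List = Right(@List, Len(@List)-@Index) 
        SET @Index = CharIndex(@Separator, @List) 
    END 

    IF Len(@List) > 0 
       INSERT @Values VALUES (@List) 

    RETURN 
END 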

In the same post the DoubleList list of values was constructed, which can be used in a dynamic query for bulk inserts or table-value constructors. The list needs to be slightly modified by replacing each single quote with two single quotes in order to accommodate the values' storage in a string. Considering that there are no integrity constraints on the targeted table, the query for the bulk insert can be written as follows:
 
-- example 8 (list of values & bulk insert) 
DECLARE @sql varchar(200) 
DECLARE @AddressTypes varchar(150)  
SET @AddressTypes = '(6,''Archive''), (1,''Billing''), (2,''Home''), (3,''Main Office''), (4,''Primary''), (5,''Shipping'')' 
SET @sql = 'INSERT Person.AddressType (AddressTypeID, Name) VALUES ' + @AddressTypes  
EXEC (@sql) 

  The same technique can be used with a table-value constructor:
 
-- example 9 (list of values & table-value constructor) 
DECLARE @sql varchar(400) 
DECLARE @AddressTypes varchar(150)  
SET @AddressTypes = '(6,''Archive''), (1,''Billing''), (2,''Home''), (3,''Main Office''), (4,''Primary''), (5,''Shipping'')' 
SET @sql = 'SELECT VA.AddressID, VA.AddressTypeID, AT.NAME FROM Purchasing.VendorAddress VA JOIN ( VALUES ' + @AddressTypes + ') AS AT(AddressTypeID, Name) ON VA.AddressTypeID = AT.AddressTypeID' 
EXEC (@sql) 

The above examples are basic; in daily problems such queries can involve multiple parameters and operations. In addition, in the last examples the step of building the list through concatenation was left out.

 
Happy coding!

11 July 2010

💎SQL Reloaded: Evaluating Textual Expressions with CLR UDFs

Developers having experience with scripting languages like VBScript or JavaScript are most probably familiar with the Eval (VBScript/JavaScript) or Execute (VBScript) functions, which parse a string and evaluate its content as an expression, respectively execute a given set of instructions, thus allowing code to be created and executed at runtime. On several occasions I needed similar functionality in T-SQL too, having to evaluate a formula stored in a text field or text variable. The latest situation of this type I can recall was when importing a set of BOM (Bill of Materials) data from Oracle into SQL Server. I had two problems then, one related to Oracle and the second to SQL Server.

As each level in the (Oracle) BOM has a Quantity which needs to be multiplied with the Quantities from the child levels in order to get the total Required Quantity for each level, I needed a method to aggregate the respective values. Unlike in SQL Server, in Oracle (as far as I know) there is no built-in functionality for that; however the SYS_CONNECT_BY_PATH function (see example) allows building a string with the total quantities available in the BOM structure, and it is even possible to evaluate the resulting string using a function. Though, as I was not allowed to create a function in the production environment then, I had to solve the issue in SQL Server (2000) itself. The concatenated string of quantities at each BOM level looked something like '1*1*2*4*1' (here for the 5th BOM level), so I needed a way to easily evaluate the respective product.

In SQL Server the Cast and Convert functions allow casting a string holding a numeric value to a numeric data type, but their utility stops there.

-- Cast/Convert (working) example 
SELECT Cast('1' as int) * 25 
, Convert(int, '1') * 10 
 
Many years ago, I remember I "tried my luck" and attempted to run a script similar to the one below:

-- Cast/Convert (not working) example 
SELECT Cast('1*2' as int) * 25 
, Convert(int, '1*2') * 10 
 
    No wonder that I got an error:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value '1*2' to data type int.

I attempted to find, again without luck, a built-in solution for my problem. For dynamically created scripts, the sp_executesql stored procedure can generally be used:

-- expression evaluation using sp_executesql (output: 6) 
DECLARE @retval int; 
DECLARE @param int; 
SET @param = 3 
EXECUTE sp_executesql N'SELECT @retval = 1*2*@param', N'@param int, @retval int output', @param=@param, @retval = @retval output;  
SELECT @retval 

Unfortunately sp_executesql can't be used from UDFs, therefore the functionality can't be used in queries involving a select from a given object. However, given its simplicity, the string can be regarded as a list of numbers delimited by the "*" (multiplication) sign, so a loop or, starting with SQL Server 2005, a common table expression could solve the problem (a CTE-based sketch follows the function below), though it will work only for this type of multiplication, unless the expression is parsed using specific mathematical algorithms. Then I wrote a function similar to the one below:

CREATE FUNCTION dbo.EvalProduct( 
@ListValues varchar(max)) 
RETURNS decimal(18,2) 
/* 
Purpose: evaluates the product of multiple numbers written as a text expression 
Parameters: @ListValues varchar(max) - list of values to be multiplied 
Sample calls: SELECT dbo.EvalProduct('1*2*3*4') 
SELECT dbo.EvalProduct('1.2*2.3*3.4') 
SELECT dbo.EvalProduct('1.2') 
SELECT dbo.EvalProduct('NULL') 
*/ 
AS 
BEGIN 
    DECLARE @Index int 
    DECLARE @Product decimal(18,2) 

    SET @Index = CharIndex('*', @ListValues) 
    SET @Product = CASE 
        WHEN @Index>1 OR (@Index = 0 AND IsNumeric(@ListValues) = 1) THEN 1 
        ELSE NULL 
    END 

    WHILE @Index > 0  
    BEGIN 
        SET @Product = @Product * Cast(Left(@ListValues, @Index-1) as decimal(18,2)) 

        SET @ListValues = Right(@ListValues, Len(@ListValues)-@Index) 
        SET @Index = CharIndex('*', @ListValues) 
    END  

    IF IsNumeric(@ListValues)=1 
       SET @Product = @Product * Cast(@ListValues as decimal(18,2))  

    RETURN @Product 
END 
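And here's the CTE-based alternative mentioned above - a sketch, assuming a well-formed list of positive numbers (the product is rebuilt from logarithms, so the result is subject to floating-point rounding):

-- sketch: evaluating the product via a recursive CTE (SQL Server 2005+) 
DECLARE @list varchar(100) 
SET @list = '1*1*2*4*1' 

;WITH Split 
AS ( 
    SELECT Cast(NULL as decimal(18,2)) Value 
    , @list + '*' Remainder 
    UNION ALL 
    SELECT Cast(Left(Remainder, CharIndex('*', Remainder)-1) as decimal(18,2)) Value 
    , Right(Remainder, Len(Remainder) - CharIndex('*', Remainder)) Remainder 
    FROM Split 
    WHERE Len(Remainder) > 0 
) 
SELECT Exp(Sum(Log(Value))) Product -- output: 8 
FROM Split 
WHERE Value IS NOT NULL 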

      
The function, even if handy, has its limitations: first of all it works only with products of positive numeric values, and secondly there is no chance of using it for more complex expressions. Fortunately, starting with SQL Server 2005 there is an alternative. In the last post, Data Profiling Using CLR Functions, I introduced the ExecuteScalarToInt and ExecuteScalarToDecimal CLR functions, which can be used to evaluate a text expression to an integer, respectively a decimal value, and thus they can also be used to solve the above problem:

-- CLR functions approach (Output: 120, respectively 236.48) 
SELECT dbo.ExecuteScalarToInt('SELECT 1*2*3*4*5') IntExample 
, dbo.ExecuteScalarToDecimal('SELECT 1.2*2.3*3.4*4.5*5.6') DecimalExample  

The functions can actually be used with any type of expression that returns a numeric value; the expressions can also include built-in or user-defined functions:
 
-- CLR with built-in mathematical functions (Output: 21.76)
SELECT dbo.ExecuteScalarToDecimal('SELECT Pi()*SQRT(47.98)') 

--CLR with UDF (Output: 9.38) 
SELECT dbo.ExecuteScalarToDecimal('SELECT dbo.EvalProduct(''1.2*2.3*3.4'')') 

This means that the two functions can be used, for example, in queries in which the formulas to be executed are stored in tables or built dynamically. Let's look at an example based on the values stored in a table:

-- creating a simple table 
CREATE TABLE dbo.TestTextExpressions ( 
  Expression varchar(max)) 

--inserting the values 
INSERT INTO dbo.TestTextExpressions 
VALUES ('1*2*3*4') 
, ('Pi()*SQRT(47.98)') 
, ('Cos(10)') 
, ('Year(GetDate())') 
, ('Square(10.45)') 

-- running the formulas 
SELECT Expression 
, dbo.ExecuteScalarToDecimal('SELECT ' + Expression) [Output]      
FROM dbo.TestTextExpressions   


CLR UDF - expression evaluation 
   
Notes:
1. Instead of adding the SELECT in the parameter string, a CLR function (e.g. EvalToInt) could be created that includes it already. Not sure how much benefit could be obtained from this; however, I prefer to have generalized functionality rather than providing specific functions for each scenario.
2. Most probably there are also exceptions in which this approach can't be used. As the two CLR functions don't perform any validation, the usefulness of such functions depends also on the validity of the input. If the input is invalid, an error will be thrown; on the other hand, even if it's possible to catch the error in the CLR function itself, it could prove difficult to alter the function's behavior (for example returning a NULL value) without impacting the result.
3. In order to run UDFs from CLR UDFs, SystemDataAccess:=SystemDataAccessKind.Read must be added to the CLR functions' definitions, otherwise the "This statement has attempted to access data whose access is restricted by the assembly" error message will be thrown.

PS:  It would be nice to receive some feedback on how the above functions helped you to solve a similar problem!

04 July 2010

🗄️Data Management: Data Profiling II (Using CLR functions)

Data Management
Data Management Series

In the previous post on Data Profiling I introduced basic data profiling techniques, the queries used for exemplification being quite simple. Probably many people will ask themselves whether it is possible to automate data profiling for a whole range of tables/datasets and the attributes they contain. The answer is simple: this relies entirely on the database's capability of running dynamic scripts in an elegant manner - many such solutions rely on the use of cursors in which the dynamically created scripts are run by a stored procedure (e.g. EXEC or sp_executesql in SQL Server). There are also solutions in which the dynamic queries are created, run and managed by third-party tools. In what concerns SQL Server, CLR functions could prove to be an elegant solution for this purpose.

In a previous post, "Number of Records - The CLR Version", I described the steps I followed in order to create a CLR function that gets the number of records for a set of tables, the name of the table being provided as parameter. Instead of giving as parameter only the name of the table, the whole query string could be provided instead, thus making it possible to run multiple types of queries that return only a scalar.

The problem is that a CLR function returns a value only of a certain data type, thus we could either transform all the values to strings or create a function for each data type. I will try to describe both approaches in this post, a reason for that being also the fact that initially I thought the second solution is more practical, and probably easier to debug, though as I discovered the first technique is more flexible, allowing the logic to be encapsulated in a single query.

Let's consider the following functions, which could be added to a new Database Project or to the project created in the previous post on CLR functions. As mentioned above, a function is created for each important data type: thus ExecuteScalarToInt returns an integer value, ExecuteScalarToDate a date value, ExecuteScalarToString a string value, ExecuteScalarToDecimal a decimal value, and ExecuteScalarToBoolean a boolean value:

Imports Microsoft.SqlServer.Server
Imports System.Data.SqlClient

Partial Public Class UserDefinedFunctions

    <SqlFunction(DataAccess:=DataAccessKind.Read, SystemDataAccess:=SystemDataAccessKind.Read)> _
    Public Shared Function ExecuteScalarToInt(ByVal SqlScript As String) As Integer
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            Dim cmd As New SqlCommand(SqlScript, conn)
            Return CType(cmd.ExecuteScalar(), Integer)
        End Using
    End Function

    <SqlFunction(DataAccess:=DataAccessKind.Read, SystemDataAccess:=SystemDataAccessKind.Read)> _
    Public Shared Function ExecuteScalarToDate(ByVal SqlScript As String) As Date
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            Dim cmd As New SqlCommand(SqlScript, conn)
            Return CType(cmd.ExecuteScalar(), Date)
        End Using
    End Function

    <SqlFunction(DataAccess:=DataAccessKind.Read, SystemDataAccess:=SystemDataAccessKind.Read)> _
    Public Shared Function ExecuteScalarToString(ByVal SqlScript As String) As String
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            Dim cmd As New SqlCommand(SqlScript, conn)
            Return CType(cmd.ExecuteScalar(), String)
        End Using
    End Function

    <SqlFunction(DataAccess:=DataAccessKind.Read, SystemDataAccess:=SystemDataAccessKind.Read)> _
    Public Shared Function ExecuteScalarToDecimal(ByVal SqlScript As String) As Decimal
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            Dim cmd As New SqlCommand(SqlScript, conn)
            Return CType(cmd.ExecuteScalar(), Decimal)
        End Using
    End Function

    <SqlFunction(DataAccess:=DataAccessKind.Read, SystemDataAccess:=SystemDataAccessKind.Read)> _
    Public Shared Function ExecuteScalarToBoolean(ByVal SqlScript As String) As Boolean
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            Dim cmd As New SqlCommand(SqlScript, conn)
            Return CType(cmd.ExecuteScalar(), Boolean)
        End Using
    End Function

End Class

After compiling the project and re-linking the assembly, the following functions must be declared:

-- returns int values 
CREATE FUNCTION dbo.ExecuteScalarToInt(@sql nvarchar(max)) 
RETURNS int  
AS  
EXTERNAL NAME SqlServerProject1.[SqlServerProject1.UserDefinedFunctions].ExecuteScalarToInt 
GO 

-- returns decimal values 
CREATE FUNCTION dbo.ExecuteScalarToDecimal(@sql nvarchar(max)) 
RETURNS decimal(18,2)  
AS  
EXTERNAL NAME SqlServerProject1.[SqlServerProject1.UserDefinedFunctions].ExecuteScalarToDecimal 
GO 

-- returns nvarchar values 
CREATE FUNCTION dbo.ExecuteScalarToString(@sql nvarchar(max)) 
RETURNS nvarchar(max)  
AS  
EXTERNAL NAME SqlServerProject1.[SqlServerProject1.UserDefinedFunctions].ExecuteScalarToString 
GO 

-- returns date values 
CREATE FUNCTION dbo.ExecuteScalarToDate(@sql nvarchar(max)) 
RETURNS date  
AS  
EXTERNAL NAME SqlServerProject1.[SqlServerProject1.UserDefinedFunctions].ExecuteScalarToDate 
GO 

-- returns boolean values 
CREATE FUNCTION dbo.ExecuteScalarToBoolean(@sql nvarchar(max)) 
RETURNS bit  
AS  
EXTERNAL NAME SqlServerProject1.[SqlServerProject1.UserDefinedFunctions].ExecuteScalarToBoolean 
GO 


-- testing the functions    
SELECT dbo.ExecuteScalarToInt('SELECT count(*) FROM Production.Product') NumberRecords 
SELECT dbo.ExecuteScalarToString('SELECT Max(ProductNumber) FROM Production.Product') GreatestStringValue 
SELECT dbo.ExecuteScalarToString('SELECT Max(ListPrice) FROM Production.Product') GreatestNumericValue 
SELECT dbo.ExecuteScalarToDate('SELECT Max(SellStartDate) FROM Production.Product') GreatestDateValue 
SELECT dbo.ExecuteScalarToString('SELECT CASE WHEN count(1)>0 THEN 1 ELSE 0 END FROM Production.Product') HasRecords
 
Data Profiling - testing CLR functions

The first example, which returns the number of records, can easily be adapted in order to return the respective value for a given set of tables:

-- number of records 
SELECT S.name [schema_name] 
, T.name table_name 
, dbo.ExecuteScalarToInt('SELECT count(*) NumberRecords FROM AdventureWorks.' + S.name + '.' + T.name) NumberRecords 
FROM AdventureWorks.sys.tables T 
    JOIN AdventureWorks.sys.schemas S 
      ON T.schema_id = S.schema_id 
WHERE S.name = 'Sales' 
ORDER BY S.name  
, T.name 
 
Data Profiling - number of records

The attribute-related metrics require a lower level of detail, adding the sys.columns table to the above query; for this purpose I will use the dbo.vTableColumns view created in the previous post on data profiling. And here's the query based on the AdventureWorks database:

-- data profiling attribute level 
SELECT [Schema_Name]  
, Table_Name  
, Column_Name  
, user_type  
, system_type  
-- number distinct values 
 , CASE  
    WHEN system_type NOT IN ('xml') THEN dbo.ExecuteScalarToInt('SELECT count(DISTINCT [' + Column_Name + ']) NumberRecords FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name)  
     ELSE -1 
END NumberDistinctValues 
-- number/percentage not null values 
, dbo.ExecuteScalarToInt('SELECT count(1) NumberRecords FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name + ' WHERE [' + Column_Name + '] IS NOT NULL') NumberNotNullValues  
, CASE  
    WHEN system_type NOT IN ('xml') THEN dbo.ExecuteScalarToDecimal('SELECT Cast(CASE WHEN count(1)>0 THEN Cast(count([' + Column_Name + ']) as decimal(18,2))/count(1) ELSE 0 END as decimal(5,2)) PercentageNotNullValues FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name) 
    ELSE -1 
END PercentageNotNullValues 
-- min/max length of values 
, CASE  
    WHEN system_type NOT IN ('xml') THEN dbo.ExecuteScalarToInt('SELECT IsNull(Min(Len([' + Column_Name + '])), 0) MinLengthValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name) 
    ELSE -1 
END MinLengthValue 
, CASE  
     WHEN system_type NOT IN ('xml') THEN dbo.ExecuteScalarToInt('SELECT IsNull(Max(Len([' + Column_Name + '])), 0) MaxLengthValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name) 
    ELSE -1 
END MaxLengthValue 
-- Min/Max values 
, CASE  
    WHEN system_type NOT IN ('varbinary', 'uniqueidentifier', 'xml', 'sysname') THEN  
dbo.ExecuteScalarToString('SELECT IsNull(Cast(Min('  
    + CASE WHEN system_type = 'bit' THEN 'Cast(' ELSE '' END 
    + '[' + Column_Name + ']' 
    + CASE WHEN system_type = 'bit' THEN ' as smallint)' ELSE '' END 
     + ') as nvarchar(max)), ''NULL'') MinValue FROM AdventureWorks.' 
     + [Schema_Name] + '.' + Table_Name) 
   ELSE CAST('' as nvarchar(max))  
END MinValue 
, CASE  
     WHEN system_type NOT IN ('varbinary', 'uniqueidentifier', 'xml', 'sysname') THEN  
dbo.ExecuteScalarToString('SELECT IsNull(Cast(Max('  
    + CASE WHEN system_type = 'bit' THEN 'Cast(' ELSE '' END 
    + '[' + Column_Name + ']' 
    + CASE WHEN system_type = 'bit' THEN ' as smallint)' ELSE '' END 
+ ') as nvarchar(max)), ''NULL'') MaxValue FROM AdventureWorks.'  
    + [Schema_Name] + '.' + Table_Name )  
    ELSE CAST('' as nvarchar(max))  
END MaxValue  
FROM dbo.vTableColumns 
WHERE [Schema_Name] = 'Production' 
AND Table_Name = 'Product' 
ORDER BY [Schema_Name]  
, Table_Name  
, Column_Name  
, column_id 

Here's the output; the query ran in about 30 seconds for the given table:
  
Data Profiling - attribute level

Most likely the above query requires some explanations. First of all, some CASEs have been introduced in order to treat the exceptions, given the fact that not all data types support all the aggregate functions, while the xml data type doesn't support any aggregate function at all. Hopefully I treated all the scenarios; if not, the query can be easily updated. In other cases I applied a simple conversion, as in the case of the bit data type cast to a smallint. In case the dynamic queries return a value other than integer, the output value has been cast to nvarchar(max). Most probably, for an easier understanding of the dynamic queries a review of the corresponding static queries presented in the previous post is also necessary.

In case we use a specific function for each data type, we either choose to work with a query for each data type or, in case we want to combine the results via a UNION (ALL), we'll have to cast the output to nvarchar. So whatever method we choose, in order to create only one dataset out of our profiling implementation, a cast to nvarchar is required. I will confine myself to providing the queries for each data type, more for exemplifying the use of the above CLR functions:

-- getting Min/Max integer values 
SELECT [Schema_Name] 
, Table_Name  
, Column_Name 
, user_type 
, dbo.ExecuteScalarToInt('SELECT IsNull(Min(' + Column_Name + '), 0) MinValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MinValue 
, dbo.ExecuteScalarToInt('SELECT IsNull(Max(' + Column_Name + '), 0) MaxValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MaxValue 
FROM dbo.vTableColumns 
WHERE [Schema_Name] = 'Production' 
AND Table_Name = 'Product' 
AND user_type IN ('int', 'bigint') 

-- getting Min/Max String values 
SELECT [Schema_Name] 
, Table_Name  
, Column_Name 
, user_type 
, dbo.ExecuteScalarToString('SELECT IsNull(Min(' + Column_Name + '), '''') MinValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MinValue 
, dbo.ExecuteScalarToString('SELECT IsNull(Max(' + Column_Name + '), '''') MaxValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MaxValue 
FROM dbo.vTableColumns 
WHERE [Schema_Name] = 'Production' AND Table_Name = 'Product' 
AND user_type IN ('nvarchar', 'varchar', 'char', 'nchar') 

-- getting Min/Max Date values 
SELECT [Schema_Name] 
, Table_Name  
, Column_Name 
, user_type 
, dbo.ExecuteScalarToDate('SELECT IsNull(Min(' + Column_Name + '), Cast(''1800-01-01'' as date)) MinDateValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MinDateValue 
, dbo.ExecuteScalarToDate('SELECT IsNull(Max(' + Column_Name + '), Cast(''1800-01-01'' as date)) MaxDateValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MaxDateValue 
FROM dbo.vTableColumns 
WHERE [Schema_Name] = 'Production' 
AND Table_Name = 'Product' 
AND user_type IN ('datetime', 'date') 

-- getting Min/Max Boolean values 
SELECT [Schema_Name] 
, Table_Name  
, Column_Name 
, user_type 
, dbo.ExecuteScalarToBoolean('SELECT IsNull(Min(Cast(' + Column_Name + ' as smallint)), 0) MinValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MinValue 
, dbo.ExecuteScalarToBoolean('SELECT IsNull(Max(Cast(' + Column_Name + ' as smallint)), 0) MaxValue FROM AdventureWorks.' + [Schema_Name] + '.' + Table_Name ) MaxValue 
FROM dbo.vTableColumns 
WHERE [Schema_Name] = 'Production' 
--AND Table_Name = 'Product' 
AND user_type IN ('bit') 

Notes:
1. The problem with the actual form of the functions is that when the query returns a NULL value, the database engine throws the following error message:
Msg 6522, Level 16, State 1, Line 2
A .NET Framework error occurred during execution of user-defined routine or aggregate "ExecuteScalarToDate":
System.InvalidCastException: Conversion from type 'DBNull' to type 'Date' is not valid.
System.InvalidCastException:
   at Microsoft.VisualBasic.CompilerServices.Conversions.ToDate(Object Value)
   at SqlServerProject1.UserDefinedFunctions.ExecuteScalarToDate(String SqlScript)

    
The error can be avoided by handling the NULL values with a default value, for example 0 for numeric values, the empty string for string values, a minimal date for date values, and 0 for bit values. If for numeric and string values this translation has a minimum of impact, for date and bit values it somewhat reduces the functionality.

2. SQL Server 2008 also provides a Data Profiling Task as part of its SSIS solution, quite easy to use; however, in comparison with it, the above functions allow profiling not only a table but a set of tables, the statistics about attributes can easily be brought together in a tabular format, and the logic can be further encapsulated in a UDF or view for reuse.
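For example, the record-count query above could be encapsulated as follows (a sketch; the view name is a placeholder):

-- encapsulating the record-count profiling in a view for reuse 
CREATE VIEW dbo.vTableRecordCounts 
AS 
SELECT S.name [schema_name] 
, T.name table_name 
, dbo.ExecuteScalarToInt('SELECT count(*) FROM AdventureWorks.' + S.name + '.' + T.name) NumberRecords 
FROM AdventureWorks.sys.tables T 
    JOIN AdventureWorks.sys.schemas S 
      ON T.schema_id = S.schema_id 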

