28 June 2020

Strategic Management: Simplicity II (A System's View)

Strategic Management

Each time one discusses in IT about (software and hardware) components interacting with each other, one talks about a composite referred as a system. Even if the term of Information System (IS) is related to it, a system is defined as a set of interrelated and interconnected components that can be considered together for specific purpose or simple convenience.

A component can be a piece of software or hardware, as well persons or groups if we extend the definition. The consideration of people becomes relevant especially in the context of ecologies, in which systems are placed in a broader context that consider people’s interaction with them, as this raises to important behavior that impacts system’s functioning.

Within a system each part has a role or function determined in respect to the whole as well to the other parts. The role or function of the component is typically fixed, actually predefined, though there are also exceptions especially when the scope of a component is enlarged, respectively reduced to the degree that the component can be removed or ignored. What one considers or not considers as part of system defines a system’s boundaries; it’s what distinguishes it from other systems within the environment(s) considered.

The interaction between the components resume in the exchange, transmission and processing of data found in different aggregations ranging from signals to complex data structures. If in non-IT-based systems the changes are determined by inflow, respectively outflow of energy, in IT the flow is considered in terms of data in its various aggregations (information, knowledge).  The data flow (also information flow) represents the ‘fluid’ that nourishes a system’s ‘organism’.

One can grasp the complexity in the moment one attempts to describe a system in terms of components, respectively the dependencies existing between them in term of data and processes. If in nature the processes are extrapolated, in IT they are predefined (even if the knowledge about them is not available). In addition, the less knowledge one has about the infrastructure, the higher the apparent complexity. Even if the system is not necessarily complex, the lack of knowledge and certainty about it makes it complex. The more one needs to dig for information and knowledge to get an acceptable level of knowledge and logical depth, the more time is needing for designing a solution.

Saint Exupéry’s definition of simplicity applies from a system’s functional point of view, though it doesn’t addresses the relative knowledge about the system, which often is implicit (in people’s heads). People have only fragmented knowledge about the system which makes it difficult creating the whole picture. It’s typically the role of system or process operational manuals, respectively of data descriptions, to make that knowledge explicit, establishing also a fundament for common knowledge and further communication and understanding.

Between the apparent (perceived) and real complexity of a system there’s an important gap that needs to be addressed if one wants to manage the systems adequately, respectively to simplify the systems. Often simplification happens when components or whole systems are replaced, consolidated or migrated, a mix between theses approaches existing as well. Simplifications at data level (aka data harmonization) or process level (aka process optimization and redesign) can have an important impact, being inherent to the good (optimal) functioning of systems.

Whether these changes occur in big-bang or gradual iterations it’s a question of available resources, organization capabilities, including the ability to handle such projects, respectively the impact, opportunities and risks associated with such endeavors. Beyond this, it’s important to regard the problems from a systemic and systematic point of view, in which ecology’s role is important.

27 June 2020

SQL Server Administration: Undocumented II (Execute Command for Each Table)

One neat undocumented feature useful in the administration of the SQL Server is the sp_MSforeachtable stored procedure which allows executing a set of several commands against each table available into the current database, or, with a few changes, against a set of predefined tables.  It allows running up to 3 commands as part of the main processing, as well a pre-command, typically used to set variable environments or perform initializations, respectively a post-command, typically used for control and cleanup processes. It replaces thus the need to run the commands within a cursor, even if the latter can occasionally provide more flexibility (see example).

The stored procedures provides the following parameters:
@command1, @command2, @command3: the main commands to be executed (at least the first command must be executed)
@precommand: a command to be executed before the @command1,
@postcommand: a command to be executed after all the commands were executed successfully,
@replacechar: the character used to represent the table within the commands, by default a question mark (?) character
@whereand: used to limit the scope only to a set of tables 

Before the introduction of Data Management Views with SQL Server 2005, the stored procedure was used to get the number of records for all tables:

-- getting the number of records for each table
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) NoRecords FROM ?'

To restrict running the command(s) only for a table, the object_id function can be used within the @whereand parameter:

-- getting the number of records for a table
EXEC sp_MSforeachtable @command1='SELECT ''?'' [Table], COUNT(*) NoRecords FROM ?' --
      , @whereand = ' And Object_id = Object_id(''[Person].[Address]'')'

In AX 2009 and even the later versions including Dynamics 365 (as long the direct access to the database is possible) the number of records could be restricted to a given company (aka business unit):

-- getting the number of records from a mandant
sp_MSforEachTable @command1 = 'SELECT ''?'' [Table], COUNT(*) NoRecords FROM WHERE DataAreaId IN (''DAT'')'

The procedude can be used to perform CRUD operayions on each table. During a data migration it was possible to clean out the business units not needed with just a simple script:

--delete the data for multiple data areas
 sp_MSforEachTable @command1 = 'DELETE FROM ? WHERE DataAreaId IN (''m01'')'

Administrations tasks can be performed as well, e.g. updating statistics, performing consistency checks for each table, etc.:

-- for each table 
EXEC sp_MSforeachtable 'EXEC sp_spaceused ''?''' -- displaying the disk space researved/used
EXEC sp_MSforeachtable 'UPDATE statistics ? WITH ALL' -- updating statistics
EXEC sp_MSforeachtable 'DBCC CHECKTABLE (''?'') WITH ALL_ERRORMSGS, NO_INFOMSGS' -- check table considerncy 

A set of tasks can be performed for a single table by providing the constraint within the @whereand parameter:

-- executing multiple commands for a table
EXEC sp_MSforeachtable @command1='UPDATE statistics ? WITH ALL' -- updating statistics
   , @command2= 'DBCC CHECKTABLE (''?'') WITH ALL_ERRORMSGS, NO_INFOMSGS' -- check table considerncy 
   , @whereand = ' And Object_Name(Object_id) = ''[Person].[Address]'''

Executing the commands against a subset of tables involves using a persisted or temporary table to store the name of the tables which shoul be considered, eventually with further metadata to allow filtering:

-- dropping the table
-- DROP TABLE IF EXISTS #Tables

-- create a temporary table
SELECT TableName
INTO #Tables 
FROM (VALUES ('[Person].[Address]')
, ('[Person].[AddressType]')
, ('[Person].[BusinessEntity]')) DAT(TableName)

-- getting the number of records for the list of tables
EXEC sp_MSForEachTable @command1='SELECT ''?'' [Table], COUNT(*) NoRecords FROM ?'
, @whereand = ' And Object_id In (Select Object_id(TableName) FROM #Tables)'

Warnings:
Do not forget to dump the temporary table when finished!
The code is provided only for exemplification purposes. You can use the above code on your own risk!
Undocumented features can be deprecated in future versions of SQL Server, therefore they should be used with precaution in long-term solutions.

Happy coding!:

25 June 2020

SQL Server Administration: Undocumented I (Reading the SQL Server Log)

Starting SQL Server 2005 Microsoft made available the sys.xp_readerrorlog system stored procedure, which allows reading the SQL Server as well the SQL Server Agent logs one by one. Since I learned about it, I prefer using it to search for errors in the log instead of using the Management Studio because it allows more flexibility and often seems to be much faster than the Log File Viewer.

The stored procedure exposes the following parameters:
p1: numeric value identifying the index of the log file: 0 = current, 1 = archive #1, 2 = archive #2, ...
p2: log file type: 1 or NULL = SQL Server log, 2 = SQL Agent log
p3: the string  to search for (e.g. Error)
p4: a second string t to search for (allows refining the results)
p5: start time of the events to be considered
p6: end time of the events to be considered
p7: sorting order for results: N'asc' = ascending, N'desc' = descending

The only required parameter is the first, thus the stored procedure can be called as follows:

-- reading the current SQL Server log (to be run individually)
EXEC xp_readerrorlog 0
EXEC xp_readerrorlog 0, 1
EXEC xp_readerrorlog 0, 1, N'Login failed' -- filters for failed logins 
EXEC xp_readerrorlog 0, 1, N'DBCC CHECKDB' -- filters for database consistency checks 
EXEC xp_readerrorlog 0, 1, N'Error' -- filters the Errors
EXEC xp_readerrorlog 0, 1, N'Error', N'Severity: 14' -- filters the Errors with severity 14
EXEC xp_readerrorlog 0, 1, N'', N'', '20200601', NULL, NULL -- start date
EXEC xp_readerrorlog 0, 1, N'', N'', '20200601', '20200607', NULL -- start date & end date 
EXEC xp_readerrorlog 0, 1, N'', N'', NULL, NULL, N'asc' -- ascending sorting

-- reading the last SQL Server log
EXEC xp_readerrorlog 1, 1


-- reading the current SQL Agent log
EXEC xp_readerrorlog 0, 2

Especially when filtering for certain information, the output can be displayed easily into a SSRS report. Besides errors typically I'm looking for failed logins or the results of consistency checks.

The number of log files available depends on how the SQL Server Logs were configured.

Even if the stored procedure provides the needed information fast, the error and the corresponding description are shown in separate records with the same timestamp. To process the records one can use a temporary table matching output's definition. To facilitate the selection of consecutive records a primary key can be added as well. One can dump into the table some or all of the log files and perform various searches or analyses.

-- dropping the table
--DROP TABLE IF EXISTS #tbl 

-- creating a temporary table
CREATE TABLE #tbl (
  id int IDENTITY(1,1) NOT NULL
, LogDate datetime2(3)
, ProcessInfo nvarchar(50)
, LogText varchar(max))    
  
-- reading the current log into it
INSERT INTO #tbl        
EXEC sys.xp_readerrorlog 0, 1

-- reading the last archive log into it
INSERT INTO #tbl        
EXEC sys.xp_readerrorlog 1, 1

-- retrieving the errors together with their descriptions 
SELECT A.LogDate
, A.ProcessInfo
, A.LogText
, B.LogText
FROM #tbl A
     JOIN #tbl B
    ON A.Id = B.Id - 1
WHERE A.LogText LIKE '%Error:%'

Here's the output of the last query (it will look differently on your server):


The temporary table can be used for further analyses (e.g. displaying the first, respectively last occurrence of the error together with a count):

-- first/last occurence of an error 
SELECT Min(A.LogDate) FirstOccurence
, Max(A.LogDate) LastOccurence
, count(*) NoRecords 
, A.LogText
, B.LogText
FROM #tbl A
     JOIN #tbl B
    ON A.Id = B.Id - 1
WHERE A.LogText LIKE '%Error:%'
GROUP BY A.LogText
, B.LogText


Warnings:
Do not forget to dump the temporary table when finished!
The code is provided only for exemplification purposes. You can use the above code on your own risk!
Be careful when the number of records in the logs is too big, because reading all the records can create temporary performance issues!
Undocumented features can be deprecated in future versions of SQL Server, therefore they should be used with precaution in long-term solutions.

Happy coding!

24 June 2020

Strategic Management: Simplicity I (Simple, but not that Simple)

Strategic Management

Simplicity of design has been for centuries the wholly grail of architects, while software designers seem somehow to situate themselves in opposition with the trend, as they aim using a mix of technologies that usually increase architecture’s complexity (sometimes the many, the newer and fancier, the better). Unfortunately, despite the implied but not necessarily reachable potential, each component added to an information system or infrastructure has the potential of increasing the overall complexity by a factor proportional to the degree of interactions it creates, respectively by the number of issues it creates or allows to propagate through these interactions.

On the other side one talks about simplicity in IT without actually stating what is intended by it. Quite often the aim is packed within the ‘keep it simple stupid’ (aka KISS) mantra, a modern and pejorative alternative of Occam’s razor. KISS became kind of principle in software architecture design and it can mean that a simple solution works better than a complex one, or that pursuing something in the simplest manner possible is usually better. The nuances are wide enough to cover a wide spectrum of solutions, arriving to statements that the simplest choice to make is the most appropriate one to make, thing that’s not necessarily true in IT, where complexity finds itself home.

Starting with the important number of technologies coexisting in integrations and ending with the exceptions existing in processes or the quality of data, things are almost never as simple as one may wish. An IT infrastructure’s complexity is dependent on the number of existing components, on whether they come from different generations or come from different vendor, on whether are deployed on different operating systems or are supported by different service providers, on the number of customizations made, on the degree of overlapping of the data and integrations needed to keep the data in synch, respectively of the differences existing in data models, quality and use. In general, the more variance, randomness and challenges one has, the higher the overall complexity.

Paraphrasing Saint Exupéry, in IT simplicity is reached when there is no longer anything to add or anything to take away, or in Hans Hofmann’s words, simplicity is reflected in ‘the ability to simplify means to eliminate the unnecessary so that the necessary may speak’. This refers to the features, what a piece of software can do, respectively the functionality, how a certain outcome is reached, which arrive to be packed in various logical aggregations (function point, functional requirement, story, epic, model, product, etc.) or physical aggregations (classes, components, packages, services, models, etc.). These are the levels at which one needs to address the simplicity adequately.

To make something simple one must be able either to design a solution up to the detail that there’s nothing to add or remove, or to start with something and remove or things to reach the simplicity. Both approaches involve a considerable effort and time, however the first approach can easily become utopian as some architectures are so complex that sooner or later the second approach comes into play. Therefore, one needs in general to focus on what seems an optimal solution and optimize it continuously in further iterations. Aiming for perfection from the beginning or also later in the improvement process is a foolhardy wish.

Even if simplicity is hard to achieve, one can still talk about the elegance of a solution, scenarios in which the various components fit together like the pieces of a puzzle, or about robustness, reliability, correctness, maintainability, usability or learnability. These latter characteristics are known in Software Engineering as (software) quality attributes.

21 June 2020

SSRS: Poor Design of Parameters

Dropdowns Constrained on the Source Query 

Typically the values needed in dropdown parameters are stored in tables or are predefined like in the previous post though it's not always the case. Supposing that there's no table to store the countries, I have seen reports in which the developers identified the values by using queries like the following:

-- Countries
SELECT DISTINCT SIC.CountryRegionCode Id
, SIC.CountryRegionName Name 
FROM Sales.vIndividualCustomer SIC 

It is true that the query limits the displayed values only to the ones used, however as soon the volume of data increases this can slow down the report considerably to the degree that the reports becomes unusable. Unfortunately, some developers favorize this approach when there is a table available. Just imagine that to run a report with complex logic the logic will be run a few times to populate the controls before actually being able to run the report. This slowly puts a burden on the source system and when similar reports are run, this can bring the source system to its knees.

This adds also some synchronization issues between the parameters, that are not so easy trackable as the values are not available for selection.

If no tables are available in the source system for the dropdowns then it's maybe a good idea to create some tables independently of the source system and populate them periodically. This works best for data warehouses.
If the users needed indeed only the used values, one can introduce flags in each table which will show whether the value is used or not. 

Too Many Values

If a dropdown has more than a few hundred values then consider having the free text instead of a dropdown. It's much easier to type some values that searching for them in a list. It's true that SSRS lacks a search functionality, which would make such searches easier. One can use in theory a passthrough report, however this has limited functionality. 

Wildcards Everywhere

Implementing wildcards in each text attribute is in general not a good idea. Especially the wildcards used in front of a search value doesn't make use of the available indexes. 

Too Many Dependent Dropdowns

There are structures which involve more than two dependencies of dropdowns on one another. Each dropdown implies a roundtrip to the data source and back, and upon case the user has to wait a considerable time until the parameters are filled. This in combination with the first mentioned issues can increase the damage. 


SSRS: Report Formatting (Third Magic Class)

Introduction

In the previous post were defined the parameters for the Customer Addresses report, however the design looks pretty basic as it was generated by the 'Reporting Wizard'. This can be slightly enhanced with a few changes. The order in which they are applied usually doesn't matter much.

Changing Columns' length

In Design mode drag of the columns to match the length of the attributes stored in them. This might require several tries.

Adding a Header/Footer

In Within reports body right click and from the floating menu choose 'Add Page Header', respectively 'Add Page Footer' like in the below screenshot: 

Adding the header/footer

Usually in the head I consider report's name as well two text boxes containing the date at which the report was added, respectively the number of pages the report displays. 

Cut report's header textbox from report's body into the header and move it to match the leftmost side of the report. In the header add two text boxes and provide the following formulas for the current date, respectively the number of pages: 
= Now().Date().ToString("dd.MM.yyyy"))
= Globals!PageNumber & " from " & Globals!TotalPages

Then drag of controls to match the dimensions of the blow grid like in the screen below:
Header

Note:
Then It's important to align the various controls so their margins match together, otherwise will be added empty columns or rows in the Excel exports, users having to delete the empty cells manually each time they want to use the data.

Adding the Parameters:

Even if the parameters are displayed within browser, as soon the data are exported to Excel they are lost. Therefore, to identify with which values the report was used, it is useful to add the parameters in table's header. Add one or two lines above the header, and merge a few cells to accommodate the parameters as below:


The merged cells contain the following formulas:
= "Country: " & Parameters!Country.Label
= "Customer Name: " & Parameters!CustomerName.Value
= "Postal Code: " & Parameters!PostalCode.Value
= "Phone Number:" & Parameters!PhoneNumber.Value
= "Address Type: " & Parameters!AddressType.Label
= "Person Type: " & Parameters!PersonType.Label


Formatting Table

One can go with the available formatting, however table's header could be changed of 10 pt font size. One can add also a background color for the header, format the top and bottom border and align columns' name in the center. The background color for the rows with the parameters must be non-transparent (e.g. white) if the header is fixed.

As concerns the data rows, one can add different colors for alternate rows. For this select the row and as background color provide the following formula:
=IIf(RowNumber(Nothing) Mod 2 = 0, "White", "LightSteelBlue")

Freezing the Header

One can freeze the header during scrolling by setting the property FixedData='True' for each static row. For this you need to change to Advance mode and set FixedData  property for the first three static rows as follows:


Thus, when scrolling, the header will remain fix:


Probably the choice of collars is not the best, though it can be for example adapted to organizations' colors. One can add also a logo, the name of the user who run the report, or whatever information are still needed.

SSRS: Report Parameters (Second Magic Class)

Introduction

One of the advantages of SQL Server Reporting Services (SSRS) is that once  a query available, in just a few minutes one can display the data into a report. This post shows how to add parameters to a report. For exemplification is used the same view from the first post of the series.

As prerequisite for this post one needs to have ideally the Reporting Services Extension installed. 

Identifying the Parameters

The first step is deciding which parameters are required or make sense to add to the report. The parameters together with their type can be documented in a list similar to the one below:

Parameter list

SSRS differentiates between five types of parameters:  Text, Boolean, Date/Time, Integer and Float. 

Reviewing the Source Query

Before adding the parameters to the report is recommended to look at the source query to check whether it allows the efficient use of parameters. Based on the above list of parameters the standard view needs to be modified as follow (the new columns are marked with a 'new' comment):

-- Individual Customers view
ALTER VIEW [Sales].[vIndividualCustomer] 
AS 
SELECT p.[BusinessEntityID]
, p.PersonType PersonTypeId--new
, CASE p.PersonType 
	WHEN 'SC' THEN 'Store Contact'
    WHEN 'IN' THEN 'Individual (retail) customer'
    WHEN 'SP' THEN 'Sales person'
    WHEN 'EM' THEN 'Employee (non-sales)'
    WHEN 'VC' THEN 'Vendor contact'
    WHEN 'GC' THEN 'General contact'
 END PersonType --new
, p.[Title]
, p.[FirstName]
, p.[MiddleName]
, p.[LastName]
, p.[Suffix]
, pp.[PhoneNumber]
, pnt.[Name] AS [PhoneNumberType]
, ea.[EmailAddress]
, p.[EmailPromotion]
, at.[AddressTypeID] --new
, at.[Name] AS [AddressType]
, a.[AddressLine1]
, a.[AddressLine2]
, a.[City]
, sp.StateProvinceCode --new
, [StateProvinceName] = sp.[Name]
, a.[PostalCode]
, sp.CountryRegionCode -- new
, [CountryRegionName] = cr.[Name]
, p.[Demographics]
FROM [Person].[Person] p
    INNER JOIN [Person].[BusinessEntityAddress] bea 
    ON bea.[BusinessEntityID] = p.[BusinessEntityID] 
    INNER JOIN [Person].[Address] a 
    ON a.[AddressID] = bea.[AddressID]
    INNER JOIN [Person].[StateProvince] sp 
    ON sp.[StateProvinceID] = a.[StateProvinceID]
    INNER JOIN [Person].[CountryRegion] cr 
    ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
    INNER JOIN [Person].[AddressType] at 
    ON at.[AddressTypeID] = bea.[AddressTypeID]
	INNER JOIN [Sales].[Customer] c
	ON c.[PersonID] = p.[BusinessEntityID]
	LEFT OUTER JOIN [Person].[EmailAddress] ea
	ON ea.[BusinessEntityID] = p.[BusinessEntityID]
	LEFT OUTER JOIN [Person].[PersonPhone] pp
	ON pp.[BusinessEntityID] = p.[BusinessEntityID]
	LEFT OUTER JOIN [Person].[PhoneNumberType] pnt
	ON pnt.[PhoneNumberTypeID] = pp.[PhoneNumberTypeID]
WHERE c.StoreID IS NULL;

Preparing the Source Query

Based on this, the source query can be defined as follows:

-- Customer Addresses
SELECT SIC.Title   
, SIC.PersonType   
, SIC.FirstName   
, SIC.LastName   
, SIC.AddressType   
, SIC.AddressLine1   
, SIC.City   
, SIC.PostalCode   
, SIC.CountryRegionName   
, SIC.PhoneNumber   
, SIC.EmailAddress   
FROM Sales.vIndividualCustomer SIC 
WHERE CountryRegionName = 'Germany'
  AND PostalCode = '42651'
ORDER BY SIC.FirstName  
, SIC.LastName   
, SIC.AddressLine1   

Note:
When preparing a source query for your report is important to limit the number of the records returned to reduce the waiting time until the data are displayed and minimize the traffic to the database. 

Generating the Report

Creating a report is straightforward in SSRS because the 'Report Wizard' takes care of everything in the background. 

Generated report with the Report Wizard

  After testing the report, right click on the created dataset (Dataset1) in design mode and change its name to be meaningful (e.g. Addresses).
Dataset properties

   In this step I prefer transforming the query into a formula because it allows more flexibility in handling the parameters dynamically. Click on the formula and replace the query with the following text: 

 = " -- Customer Addresses " & vbCrLf
 & " SELECT SIC.Title    " & vbCrLf
 & " , SIC.PersonType    " & vbCrLf
 & " , SIC.FirstName    " & vbCrLf
 & " , SIC.LastName    " & vbCrLf
 & " , SIC.AddressType    " & vbCrLf
 & " , SIC.AddressLine1    " & vbCrLf
 & " , SIC.City    " & vbCrLf
 & " , SIC.PostalCode    " & vbCrLf
 & " , SIC.CountryRegionName    " & vbCrLf
 & " , SIC.PhoneNumber    " & vbCrLf
 & " , SIC.EmailAddress    " & vbCrLf
 & " FROM Sales.vIndividualCustomer SIC  " & vbCrLf
 & " WHERE CountryRegionName = 'Germany' " & vbCrLf
 & "   AND PostalCode = '42651' " & vbCrLf
 & " ORDER BY SIC.FirstName   " & vbCrLf
 & " , SIC.LastName    " & vbCrLf
 & " , SIC.AddressLine1 " & vbCrLf

This can be obtained pretty easy by using a formula in Excel like below:

Preparing the dynamic query statement

Formula: = " & "" " & A1 & " "" & vbCrLf"

Note:
In previous versions of reporting environment there must be an empty space after each vbCrLf except the last line, otherwise an error will be shown and the report can't be used at all. When the empty space misses is tricky to find the line where this happened. To validate this one can put the mouse at the end of each line and see whether it lands after the vbCrLf or after an empty space.

Preparing the Queries for Dropdowns

Before adding the parameters it might be a good idea to prepare the queries for the dropdowns. For this let's consider the simplest parameters: 

-- Address Types
SELECT 0 Id
, '(All)' Name
UNION ALL
SELECT AddressTypeID
, Name
FROM [Person].[AddressType]
ORDER BY Id

The query has two parts, one made from the database query to which is added a row marking the '(All)' scenario in which the parameter is actually ignored, and thus all the records for this parameter will be shown. This is typically the default value. 

SSRS has another solution for showing all the values, though I prefer this approach. 

In all the dropdown queries I prefer to use the same names ('Id', respectively 'Name'), independently on how the attributes are called, as they make the purpose explicit. Because the data type of the parameter is integer and is a reference key, the default value can be 0 or also -1 if 0 values are available in the dataset. 

For Text parameters one can use the empty string as parameter, like in the below example for Country Codes: 

-- Countries
SELECT '' Id
, '(All)' Name
UNION ALL
SELECT CountryRegionCode Id
, Name 
FROM [Person].[CountryRegion]
ORDER BY Id

Adding the Parameters

To add a Parameter right click on 'Parameters' within the 'Report Data' and then klick on 'Add Parameter'. This will open the 'Report Parameter Properties' which by default looks like this:

Report Parameter Properties

Each parameter type requires different settings to be considered. 

Defining Freetext parameters

Customer Name, Postal Code and Phone Number are Text parameters shown in Freetext controls and all three are setup similarly (see Parameter List above): 

General settings for text parameters

Defining Dropdowns with Fix Values

The General setting for the Person Type are similar with the ones considered for Freetext, however it has a set of predefined values which can be added in the Available Values as follows: 

Available Values

The default value can be provided into the 'Default Values' section. Just click on 'Add' and enter an empty space:

Default Values for fix dropdown


Defining Dropdowns with Dynamics Values

Country parameter is defined similarly, however as Available Values is selected the 'Get values from a query' option. The Countries dataset is selected as source while 'Id' and 'Name' are provided as below: 

Available Values for dynamic dropdown

Same can be done also for the AddressType parameter having the AddressTypes datasource as source. 

After adding all the parameters and choosing the order in which they should be displayed, the Report Data section will look like this:

Report's parameters

Adding the Parameters to the Query

For this the query needs to be added as follows: 

= "-- Customer Addresses " & vbCrLf
 & " DECLARE @Country as nvarchar(3) = '" & Parameters!Country.Value & "'" & vbcrlf 
 & " DECLARE @CustomerName as nvarchar(100) = '" & Parameters!CustomerName.Value & "'" & vbcrlf 
 & " DECLARE @PostalCode as nvarchar(15) = '" & Parameters!PostalCode.Value & "'" & vbcrlf 
 & " DECLARE @PhoneNumber as nvarchar(25) = '" & Parameters!PhoneNumber.Value & "'" & vbcrlf 
 & " DECLARE @AddressType as int = " & Parameters!AddressType.Value & vbcrlf 
 & " DECLARE @PersonType as nvarchar(3) = '" & Parameters!PersonType.Value & "'" & vbcrlf 
 & " SELECT SIC.Title   " & vbCrLf
 & ", SIC.PersonType   " & vbCrLf
 & ", SIC.FirstName   " & vbCrLf
 & ", SIC.LastName   " & vbCrLf
 & ", SIC.AddressType   " & vbCrLf
 & ", SIC.AddressLine1   " & vbCrLf
 & ", SIC.City   " & vbCrLf
 & ", SIC.PostalCode   " & vbCrLf
 & ", SIC.CountryRegionName   " & vbCrLf
 & ", SIC.PhoneNumber   " & vbCrLf
 & ", SIC.EmailAddress   " & vbCrLf
 & " FROM Sales.vIndividualCustomer SIC " & vbCrLf
 & " WHERE 0=0 " & vbCrLf
 & IIf(Parameters!Country.Value<> "", " 		  AND SIC.CountryRegionCode = @Country", "") & vbcrlf 
 & IIf(Parameters!CustomerName.Value<> "", " 		  AND (SIC.FirstName  LIKE '%' + @CustomerName + '%' OR SIC.LastName  LIKE '%' + @CustomerName + '%')", "") & vbcrlf 
 & IIf(Parameters!PostalCode.Value<> "", " 		  AND SIC.PostalCode = @PostalCode", "") & vbcrlf 
 & IIf(Parameters!PhoneNumber.Value<> "", " 		  AND SIC.PhoneNumber LIKE '%' + @PhoneNumber + '%'", "") & vbcrlf
 & IIf(Parameters!PersonType.Value<> "", " 		  AND SIC.PersonTypeId = @PersonType", "") & vbcrlf
 & IIf(Parameters!AddressType.Value<> 0, " 		  AND SIC.AddressTypeId = @AddressType", "") & vbcrlf 
 & " ORDER BY SIC.FirstName  " & vbCrLf
 & ", SIC.LastName   " & vbCrLf
 & ", SIC.AddressLine1   " & vbCrLf

For each report parameter is defined a SQL parameter in the header with the appropriate data type and length, and a constraint within the WHERE clause. The parameter will be considered only if a valid value was provided. One has to take into consideration the type of the parameter. 

Testing the Report

If everything worked correctly the report will run without problems:

Testing the report

For testing the report it makes sense to choose the parameters one by one from the general to the more particular case. For example for a first test I would choose 'Germany' as Country, then in a second run i would choose one of the values from the resulting dataset, e.g. 'Arthur' as Customer Name, while in a thirst run I would pick a value for the Postal Code. This approach allows minimizing the volume of time required for testing. 

Troubleshooting

Errors in formulas are sometimes difficult to troubleshoot especially when the report refuses to run. The SQL errors are easier to troubleshoot into Management Studio (SSMS). Just copy the query in SSMS and replace the added formatting.

Errors into the formula require unfortunately more time, however most of the errors I had were related to the missing empty space, or forgetting to close a quote or incorrect formatting. 

20 June 2020

SQL Unloaded: When Queries Look Like Ducks

There’s an expression of abductive reasoning “if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck” used to identify something indefinite based on certain (definite) characteristics. When one looks at a query written by someone else, one tries to identify such characteristics. However, there are also situations in which simple queries are hard to interpret. For example, each time I see a SELECT DISTINCT in a query I ask myself whether the developer wanted to shortcut a GROUP BY, or just wanted to avoid duplicated data from the output. Without looking at the data in detail it’s almost impossible deciding between the two scenarios.

Let’s consider the following simple example based on a set of orders sent over the interface between two systems. Because of poor design, the first position was caught by the logic before entering the full order (the price from it is incorrect):

-- dropping the table
--DROP TABLE IF EXISTS dbo.T_Orders 

-- inserting the records 
SELECT *
INTO dbo.T_Orders 
FROM (VALUES ('P0001', 1,'20200618', '100001','S','black',20, 1.23)
, ('P0001', 1,'20200618', '100001','S','black',20, 1.22)
, ('P0001', 2,'20200618', '100001','S','black',5, 1.23)
, ('P0001', 3,'20200618', '100001','M','black',10, 1.24)
, ('P0002', 1,'20200619', '100001','S','black',5, 1.23)
, ('P0002', 2,'20200619', '100001','S','black',10, 1.22)) DAT (PurchId, Pos, PurchDate, ItemId, SizeId, ColorId, Qty, Price)

-- adding a primary key
ALTER TABLE dbo.T_Orders 
ADD Id int identity(1,1) NOT NULL

-- reviewing the data
SELECT *
FROM dbo.T_Orders 
ORDER BY PurchId
, Pos

The following queries return the same results :


-- simple SELECT DISTINCT 
SELECT DISTINCT PurchId, Pos, ItemId, SizeId, ColorId, Qty
FROM dbo.T_Orders

-- simple GROUP BY
SELECT PurchId, Pos, ItemId, SizeId, ColorId, Qty
FROM dbo.T_Orders
GROUP BY PurchId, Pos, ItemId, SizeId, ColorId, Qty

Both queries hide the fact that a duplicate exists. However, with a small change in the last query (adding a count) one can highlight that, while the first query doesn't allow this flexibility:

-- simple GROUP BY with Count
SELECT PurchId, Pos, ItemId, SizeId, ColorId, Qty
, count(*) NoRecords
FROM dbo.T_Orders
GROUP BY PurchId, Pos, ItemId, SizeId, ColorId, Qty
-- HAVING count(*)>1

This small change can make an important difference when one needs to analyze the data, so at least from this perspective it might be worth to write the query as a GROUP BY from the beginning, especially when writing complex queries.

There are also nested SELECT DISTINCTS that attempt refining the selection in successive steps. Such curiosities seldom make sense:

-- nested SELECT DISTINCT
SELECT DISTINCT PurchId, Pos, ItemId
FROM (
 SELECT DISTINCT PurchId, Pos, ItemId, SizeId, ColorId, Qty
 FROM dbo.T_Orders
) DAT

To remove the duplicates and perform a sum one may choose another curiosity of query - the inner query removes the duplicates (some write it also as a DISTINCT), while in the outer query is done the sum:

-- double grouping to remove duplicates
SELECT PurchId, Pos, ItemId, SizeId, ColorId
, SUM(Qty) Total
FROM (--inner query
 SELECT PurchId, Pos, ItemId, SizeId, ColorId, Qty
 FROM dbo.T_Orders
 GROUP BY PurchId, Pos, ItemId, SizeId, ColorId, Qty
) DAT
GROUP BY PurchId, Pos, ItemId, SizeId, ColorId

Even if such queries (miraculously) work upon case, sooner or later they are predestined to fail. When doing operations on data whose quality is doubtful, one needs to select a strategy for removing the duplicates. The first step is to identify the attributes which make a record unique (e.g. PurchId and Pos), and using a sorting criteria, one can remove the duplicates via a window function like Rank, Dense_Rank or Row_Number (with small exceptions their use is interchangeable). Supposing that the Id attribute determinates the order in which the data were added, one can write a similar query:

-- removing duplicates via ranking partition window function 
SELECT *
FROM (
 SELECT PurchId, Pos, PurchDate, ItemId, SizeId, ColorId, Qty, Price
 , RANK() OVER (PARTITION BY PurchId, Pos ORDER BY Id DESC) Ranking 
 FROM dbo.T_Orders
) DAT 
WHERE Ranking=1

The problem is that such queries need to be repeated in each use. A simpler method is running a regular check for duplicates and marking the duplicates as inactive or pushing them into a separate table. Deleting the duplicates can work as well, though this approach can hit back like a boomerang if not addressed correctly.

To identify the duplicates one of the following approaches can be used:

-- duplicates via grouping in inner query
SELECT DAT.*
, DUP.NoRecords 
FROM dbo.T_Orders DAT
     JOIN (-- duplicates
 SELECT PurchId, Pos
 , count(*) NoRecords
 FROM dbo.T_Orders DUP
 GROUP BY PurchId, Pos
 HAVING count(*)>1 
  ) DUP 
   ON DAT.PurchId = DUP.Purchid 
   AND DAT.Pos = DUP.Pos 

-- duplicates via grouping in correlated query 
SELECT DAT.*
, DUP.NoRecords 
FROM dbo.T_Orders DAT
     CROSS APPLY (-- duplicates
 SELECT count(*) NoRecords
 FROM dbo.T_Orders DUP
 WHERE DAT.PurchId = DUP.Purchid 
   AND DAT.Pos = DUP.Pos 
 GROUP BY PurchId, Pos
 HAVING count(*)>1 
  ) DUP 


-- duplicates via count within aggregate window function 
SELECT *
FROM (
 SELECT PurchId, Pos, PurchDate, ItemId, SizeId, ColorId, Qty, Price
 , count(*) OVER (PARTITION BY PurchId, Pos) NoRecords 
 FROM dbo.T_Orders
) DAT 
WHERE NoRecords>1


Happy coding!
Related Posts Plugin for WordPress, Blogger...