
27 January 2024

Data Science: Back to the Future I (About Beginnings)

Data Science
Data Science Series

After several years, I attended again a webcast on performance improvement in SQL Server, Claudio Silva's "Writing T-SQL code for the engine, not for you". The session was great and I really enjoyed it! I recommend it to any data(base) professional, even if some of the scenarios presented should already be known.

It's strange to see the same topics from 20-25 years ago reappearing over and over again despite the advancements made in the area of database engines. Each version of SQL Server brought something new in terms of performance, though without solid experience in and understanding of the basic optimization and troubleshooting techniques, there's little overall improvement for the average data professional in terms of writing and tuning queries!

Especially with the boom of Data Science topics, the volume of material on SQL has increased considerably, and many discover how easy it is to write queries, even if the start might be challenging for some. Writing a query is easy indeed, though writing a performant query requires, besides the language itself, some knowledge about the database engine and the various techniques used for troubleshooting and optimization. It's not about knowing in advance what the engine will do - the engine will often surprise you - but about knowing which techniques work, in which cases, what their advantages and disadvantages are, and how they might impact the processing.

To draw a parallel with literature, it's not enough to speak a language; one needs more to become a writer, and there are so many levels of mastery! However, in the database world, even if creativity is welcomed, its role is considerably diminished by the constraints of the database engine, the problems to be solved, and the time and resources available. More importantly, one needs to understand some of the rules and know how to use the building blocks to solve problems and build reliable solutions.

The learning process for newbies focuses mainly on the language itself, while the exposure to complexity is kept to a minimum. For some learners the problems start when writing queries based on multiple tables - which joins to use, in what order, how to structure the queries, what database objects to use for encapsulating the code, etc. Even if there are some guidelines and best practices, the learner must walk the path and experiment, alone or in an organized setup.

In university courses the focus is on relational algebra and its operators, on algorithms, and on general database technologies and architectures, without much hands-on experience. All is too theoretical and abstract, which is acceptable for research purposes, but not for the contact with the real world out there! Probably some labs offer exposure to real-life scenarios, though what to cover first in the few hours scheduled for them?

This was the state of the art when I started to learn SQL a quarter century ago, and besides the current tendency of cutting corners, the increased confidence gained from doing a few tests, and the eagerness to shout one's shaky knowledge and more or less orthodox ideas on the various social networks, nothing seems to have changed! Something did change – the increased complexity of the problems to solve – and, considering the recent technological advances, one can now afford an AI learning buddy to write some code for us based on the information provided in the prompt.

This opens opportunities for learning and growth. AI can be used in the learning process by providing additional curricula for learners to dive deeper into some topics. Moreover, it can help us in time to address the challenges of the ever-increasing complexity of the problems.

27 February 2021

Python: Installing PySpark and GraphFrames on a Windows 10 Machine

One of the To-Dos for this week was to set up the environment so I can start learning PySpark and GraphFrames based on the examples from Needham & Hodler's free book on Graph Algorithms. Therefore, I downloaded and installed the Java SDK 8 from the Oracle website (requires an Oracle account) and the latest stable version of Python (Python 3.9.2), then downloaded and unzipped the Apache Spark package locally on a Windows 10 machine, as well as the Winutils tool, as described here.

The setup requires several environment variables to be created, and the Path variable needs to be extended with further values (delimited by ";"). In the end I added the following values:

Variable      Value
HADOOP_HOME   D:\Programs\spark-3.0.2-bin-hadoop2.7
SPARK_HOME    D:\Programs\spark-3.0.2-bin-hadoop2.7
JAVA_HOME     D:\Programs\Java\jdk1.8.0_281
PYTHONPATH    D:\Programs\Python\Python39\
PYTHONPATH    ;%SPARK_HOME%\python
PYTHONPATH    %SPARK_HOME%\python\lib\py4j-0.10.9-src.zip
PATH          %HADOOP_HOME%\bin
PATH          %SPARK_HOME%\bin
PATH          %PYTHONPATH%
PATH          %PYTHONPATH%\DLLs
PATH          %PYTHONPATH%\Lib
PATH          %JAVA_HOME%\bin

I tried then running the first example from Chapter 3 using the Spyder IDE, though the environment didn't seem to recognize the 'graphframes' library. As long as it's not already available, the graphframes .jar file (e.g. graphframes-0.8.1-spark3.0-s_2.12.jar) corresponding to the installed Spark version must be downloaded and copied into the Spark folder where the other .jar files are available (e.g. .\spark-3.0.2-bin-hadoop2.7\jars). With this change I could finally run my example, though it took me several tries to get this right.
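For reference, a minimal sketch that can be used to check the setup once the environment variables above are set and the .jar was copied as described (the tiny graph is purely illustrative, not the book's example):

# sanity check for the PySpark + GraphFrames setup
# (assumes the graphframes .jar is available in %SPARK_HOME%\jars)
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graphframes-check").getOrCreate()

# two vertices and one edge are enough to confirm the library is found
vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "knows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()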

During Python's installation I had to change the value for the LongPathsEnabled setting from 0 to 1 via regedit to allow path lengths longer than 260 characters, as mentioned in the documentation. The setting is available via the following path:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

In the process I also tried installing ‘pyspark’ and ‘graphframes’ via the Anaconda tool with the following commands:

pip3 install --user pyspark
pip3 install --user graphframes

From Anaconda’s point of view the installation was correct, a fact which pointed me back to the missing 'graphframes' library as the culprit.

It took me 4-5 hours of troubleshooting and searching until I got my environment set up. I still have two more warnings to solve, though I will look into them later:
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped

Notes:
Spaces in folder names might create issues. Therefore, I used 'Programs' instead of 'Program Files' as the main folder.
There seems to be some confusion about which environment variables are needed and how they need to be configured.
Unfortunately, the troubleshooting involved in setting up an environment and getting a simple example to work seems to be a recurring story over the years. It was the same situation with the programming languages of 15-20 years ago.

04 February 2021

Data Migrations (DM): Conceptualization VI (Data Migration Layer)

Data Migration
Data Migrations Series

Besides migrating the master and transactional data from the legacy systems, there are usually three additional important business requirements for a Data Migration (DM) – migrate the data within the expected timeline, with minimal disruption for the business, and within the expected quality levels. Hence, the DM's timeline must match and synchronize with the main project's timeline in terms of main milestones, though the DM typically needs to be executed within a small timeframe of a few days during the Go-Live. As for the third requirement, even if the data have high quality as available in the source systems or as provided by the business, there are aspects like integration and consistency that rely primarily on the DM logic.

To address these requirements the DM logic must reach a certain level of performance and quality that allows importing the data as expected. From the project's beginning until UAT, the DM team will integrate the various pieces of information iteratively, test the changes several times, and troubleshoot the deviations from expectations. The volume of effort required for these activities can be overwhelming. It's not only important for the whole solution to be performant, but each step must be designed so that, besides fast execution, the changes and the troubleshooting involve a minimum of overhead.

To better understand the importance, imagine a quest game in which the character has to go through a labyrinth with traps. If the player makes a mistake, he needs to restart from a certain distant point or even from the beginning. Now imagine that for each mistake he has the possibility of going one step back, trying a new option and moving forward. For some it may look like cheating, though in this way one can finish the game relatively quickly. It would be great if executing a DM allowed the same flexibility.

Unfortunately, unless the data are stored between steps or each step is a different package, an ETL solution doesn't provide the flexibility of changing the code, moving one step back, rerunning the step and performing troubleshooting, over and over again as in the quest game. To better illustrate the impact of such an approach, let's consider that the DM has about 40 entities and one needs to perform on average 20 changes per entity. If one is able to move forwards and backwards, each change will probably take only a few minutes to execute. Otherwise, rerunning a whole package can take 5-10 times longer or even more, depending on the package's size and the data volume. For 800 changes, just one additional minute per change equates to 800 minutes (about 13 hours).

By contrast, storing the data for an entity in a database at the important points of the processing and implementing the logic as a succession of SQL scripts allows this flexibility. The most important downside is that the steps need to be executed manually, though this is a small price to pay for the flexibility and control gained. Moreover, with a few tricks one can load deltas, as in the case of a phased DM (see the sketch below).
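A minimal sketch of such a delta load between two processing steps, assuming hypothetical staging tables (all names are illustrative):

-- loading only the new records (delta) into the prepared table
INSERT INTO dbo.CustomersPrepared (CustomerNumber, CustomerName)
SELECT CSR.CustomerNumber
, CSR.CustomerName
FROM dbo.CustomersRaw CSR
WHERE NOT EXISTS ( -- ignore the records already loaded in previous runs
	SELECT 1
	FROM dbo.CustomersPrepared CSP
	WHERE CSP.CustomerNumber = CSR.CustomerNumber
	)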

To assure that the consistency of the data is kept, one needs to build for each entity a set of validation queries that check for duplicates, special cases, data integrity, incorrect formats, etc. The queries can be included in the sequence of logic used for the DM. Thus, one can react promptly to each unexpected value. When required, the validation rules can be built into reports and used by the users in the data cleaning process, or even logged periodically per entity for tracking the progress.
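For example, a duplicate check for a hypothetical Customers entity could look as follows (table and column names are illustrative):

-- validation query: duplicates by business key
SELECT CSP.CustomerNumber
, count(*) NoRecords
FROM dbo.CustomersPrepared CSP
GROUP BY CSP.CustomerNumber
HAVING count(*) > 1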

Previous Post <<||>> Next Post

21 June 2020

SSRS (& Paginated Reports): Report Parameters (Second Magic Class)

Introduction

One of the advantages of SQL Server Reporting Services (SSRS) is that once a query is available, in just a few minutes one can display the data in a report. This post shows how to add parameters to a report. For exemplification, the same view from the first post of the series is used.

As a prerequisite for this post, one ideally needs to have the Reporting Services Extension installed.

Identifying the Parameters

The first step is deciding which parameters are required or make sense to add to the report. The parameters together with their type can be documented in a list similar to the one below:

Parameter list

SSRS differentiates between five types of parameters: Text, Boolean, Date/Time, Integer and Float.

Reviewing the Source Query

Before adding the parameters to the report, it is recommended to look at the source query to check whether it allows the efficient use of parameters. Based on the above list of parameters, the standard view needs to be modified as follows (the new columns are marked with a 'new' comment):

-- Individual Customers view
ALTER VIEW [Sales].[vIndividualCustomer] 
AS 
SELECT p.[BusinessEntityID]
, p.PersonType PersonTypeId--new
, CASE p.PersonType 
	WHEN 'SC' THEN 'Store Contact'
    WHEN 'IN' THEN 'Individual (retail) customer'
    WHEN 'SP' THEN 'Sales person'
    WHEN 'EM' THEN 'Employee (non-sales)'
    WHEN 'VC' THEN 'Vendor contact'
    WHEN 'GC' THEN 'General contact'
 END PersonType --new
, p.[Title]
, p.[FirstName]
, p.[MiddleName]
, p.[LastName]
, p.[Suffix]
, pp.[PhoneNumber]
, pnt.[Name] AS [PhoneNumberType]
, ea.[EmailAddress]
, p.[EmailPromotion]
, at.[AddressTypeID] --new
, at.[Name] AS [AddressType]
, a.[AddressLine1]
, a.[AddressLine2]
, a.[City]
, sp.StateProvinceCode --new
, [StateProvinceName] = sp.[Name]
, a.[PostalCode]
, sp.CountryRegionCode -- new
, [CountryRegionName] = cr.[Name]
, p.[Demographics]
FROM [Person].[Person] p
    INNER JOIN [Person].[BusinessEntityAddress] bea 
    ON bea.[BusinessEntityID] = p.[BusinessEntityID] 
    INNER JOIN [Person].[Address] a 
    ON a.[AddressID] = bea.[AddressID]
    INNER JOIN [Person].[StateProvince] sp 
    ON sp.[StateProvinceID] = a.[StateProvinceID]
    INNER JOIN [Person].[CountryRegion] cr 
    ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
    INNER JOIN [Person].[AddressType] at 
    ON at.[AddressTypeID] = bea.[AddressTypeID]
	INNER JOIN [Sales].[Customer] c
	ON c.[PersonID] = p.[BusinessEntityID]
	LEFT OUTER JOIN [Person].[EmailAddress] ea
	ON ea.[BusinessEntityID] = p.[BusinessEntityID]
	LEFT OUTER JOIN [Person].[PersonPhone] pp
	ON pp.[BusinessEntityID] = p.[BusinessEntityID]
	LEFT OUTER JOIN [Person].[PhoneNumberType] pnt
	ON pnt.[PhoneNumberTypeID] = pp.[PhoneNumberTypeID]
WHERE c.StoreID IS NULL;

Preparing the Source Query

Based on this, the source query can be defined as follows:

-- Customer Addresses
SELECT SIC.Title   
, SIC.PersonType   
, SIC.FirstName   
, SIC.LastName   
, SIC.AddressType   
, SIC.AddressLine1   
, SIC.City   
, SIC.PostalCode   
, SIC.CountryRegionName   
, SIC.PhoneNumber   
, SIC.EmailAddress   
FROM Sales.vIndividualCustomer SIC 
WHERE CountryRegionName = 'Germany'
  AND PostalCode = '42651'
ORDER BY SIC.FirstName  
, SIC.LastName   
, SIC.AddressLine1   

Note:
When preparing a source query for a report, it is important to limit the number of records returned, to reduce the waiting time until the data are displayed and to minimize the traffic to the database.

Generating the Report

Creating a report is straightforward in SSRS because the 'Report Wizard' takes care of everything in the background. 

Generated report with the Report Wizard

After testing the report, right-click the created dataset (Dataset1) in design mode and change its name to something meaningful (e.g. Addresses).

Dataset properties

In this step I prefer transforming the query into a formula because it allows more flexibility in handling the parameters dynamically. Click on the formula and replace the query with the following text:

 = " -- Customer Addresses " & vbCrLf
 & " SELECT SIC.Title    " & vbCrLf
 & " , SIC.PersonType    " & vbCrLf
 & " , SIC.FirstName    " & vbCrLf
 & " , SIC.LastName    " & vbCrLf
 & " , SIC.AddressType    " & vbCrLf
 & " , SIC.AddressLine1    " & vbCrLf
 & " , SIC.City    " & vbCrLf
 & " , SIC.PostalCode    " & vbCrLf
 & " , SIC.CountryRegionName    " & vbCrLf
 & " , SIC.PhoneNumber    " & vbCrLf
 & " , SIC.EmailAddress    " & vbCrLf
 & " FROM Sales.vIndividualCustomer SIC  " & vbCrLf
 & " WHERE CountryRegionName = 'Germany' " & vbCrLf
 & "   AND PostalCode = '42651' " & vbCrLf
 & " ORDER BY SIC.FirstName   " & vbCrLf
 & " , SIC.LastName    " & vbCrLf
 & " , SIC.AddressLine1 " & vbCrLf

This can be obtained pretty easily by using a formula in Excel like the one below:

Preparing the dynamic query statement

Formula: = "& "" " & A1 & " "" & vbCrLf "

Note:
In previous versions of the reporting environment there had to be an empty space after each vbCrLf except on the last line, otherwise an error was shown and the report couldn't be used at all. When the empty space is missing, it's tricky to find the line where this happened. To validate this, one can place the cursor at the end of each line and see whether it lands after the vbCrLf or after an empty space.

Preparing the Queries for Dropdowns

Before adding the parameters it might be a good idea to prepare the queries for the dropdowns. For this let's consider the simplest parameters: 

-- Address Types
SELECT 0 Id
, '(All)' Name
UNION ALL
SELECT AddressTypeID
, Name
FROM [Person].[AddressType]
ORDER BY Id

The query has two parts: the database query, to which a row is added marking the '(All)' scenario, in which the parameter is actually ignored and thus all the records are shown. This is typically the default value.

SSRS has another solution for showing all the values, though I prefer this approach. 

In all the dropdown queries I prefer to use the same names ('Id' and 'Name' respectively), independently of how the attributes are actually called, as they make the purpose explicit. Because the data type of the parameter is an integer and a reference key, the default value can be 0, or -1 if 0 values are available in the dataset.

For Text parameters one can use the empty string as the '(All)' value, like in the below example for Country Codes:

-- Countries
SELECT '' Id
, '(All)' Name
UNION ALL
SELECT CountryRegionCode Id
, Name 
FROM [Person].[CountryRegion]
ORDER BY Id

Adding the Parameters

To add a parameter, right-click on 'Parameters' within the 'Report Data' pane and then click on 'Add Parameter'. This will open the 'Report Parameter Properties' dialog, which by default looks like this:

Report Parameter Properties

Each parameter type requires different settings to be considered. 

Defining Freetext parameters

Customer Name, Postal Code and Phone Number are Text parameters shown in freetext controls, and all three are set up similarly (see the Parameter List above):

General settings for text parameters

Defining Dropdowns with Fixed Values

The General settings for the Person Type parameter are similar to the ones considered for the freetext parameters, however it has a set of predefined values which can be added under Available Values as follows:

Available Values

The default value can be provided in the 'Default Values' section. Just click on 'Add' and enter an empty space:

Default Values for fixed dropdown


Defining Dropdowns with Dynamic Values

The Country parameter is defined similarly, however for Available Values the 'Get values from a query' option is selected. The Countries dataset is selected as source, while 'Id' and 'Name' are provided as below:

Available Values for dynamic dropdown

The same can be done for the AddressType parameter, using the AddressTypes dataset as source.

After adding all the parameters and choosing the order in which they should be displayed, the Report Data section will look like this:

Report's parameters

Note:
When designing parameters one should also consider the best practices for avoiding poor parameter design.

Adding the Parameters to the Query

For this, the query needs to be adapted as follows:

= "-- Customer Addresses " & vbCrLf
 & " DECLARE @Country as nvarchar(3) = '" & Parameters!Country.Value & "'" & vbcrlf 
 & " DECLARE @CustomerName as nvarchar(100) = '" & Parameters!CustomerName.Value & "'" & vbcrlf 
 & " DECLARE @PostalCode as nvarchar(15) = '" & Parameters!PostalCode.Value & "'" & vbcrlf 
 & " DECLARE @PhoneNumber as nvarchar(25) = '" & Parameters!PhoneNumber.Value & "'" & vbcrlf 
 & " DECLARE @AddressType as int = " & Parameters!AddressType.Value & vbcrlf 
 & " DECLARE @PersonType as nvarchar(3) = '" & Parameters!PersonType.Value & "'" & vbcrlf 
 & " SELECT SIC.Title   " & vbCrLf
 & ", SIC.PersonType   " & vbCrLf
 & ", SIC.FirstName   " & vbCrLf
 & ", SIC.LastName   " & vbCrLf
 & ", SIC.AddressType   " & vbCrLf
 & ", SIC.AddressLine1   " & vbCrLf
 & ", SIC.City   " & vbCrLf
 & ", SIC.PostalCode   " & vbCrLf
 & ", SIC.CountryRegionName   " & vbCrLf
 & ", SIC.PhoneNumber   " & vbCrLf
 & ", SIC.EmailAddress   " & vbCrLf
 & " FROM Sales.vIndividualCustomer SIC " & vbCrLf
 & " WHERE 0=0 " & vbCrLf
 & IIf(Parameters!Country.Value<> "", " 		  AND SIC.CountryRegionCode = @Country", "") & vbcrlf 
 & IIf(Parameters!CustomerName.Value<> "", " 		  AND (SIC.FirstName  LIKE '%' + @CustomerName + '%' OR SIC.LastName  LIKE '%' + @CustomerName + '%')", "") & vbcrlf 
 & IIf(Parameters!PostalCode.Value<> "", " 		  AND SIC.PostalCode = @PostalCode", "") & vbcrlf 
 & IIf(Parameters!PhoneNumber.Value<> "", " 		  AND SIC.PhoneNumber LIKE '%' + @PhoneNumber + '%'", "") & vbcrlf
 & IIf(Parameters!PersonType.Value<> "", " 		  AND SIC.PersonTypeId = @PersonType", "") & vbcrlf
 & IIf(Parameters!AddressType.Value<> 0, " 		  AND SIC.AddressTypeId = @AddressType", "") & vbcrlf 
 & " ORDER BY SIC.FirstName  " & vbCrLf
 & ", SIC.LastName   " & vbCrLf
 & ", SIC.AddressLine1   " & vbCrLf

For each report parameter, a SQL parameter with the appropriate data type and length is declared in the header, together with a constraint within the WHERE clause. Each constraint is considered only if a valid value was provided for the respective parameter, and the parameter's type has to be taken into consideration (e.g. the text values are quoted, while the integer one is compared against 0).
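For illustration, with 'Germany' (code 'DE') selected as Country and all other parameters left at their defaults, the expression above would evaluate to a query of the following shape (column list shortened here, values illustrative):

-- Customer Addresses (evaluated expression)
DECLARE @Country as nvarchar(3) = 'DE'
DECLARE @CustomerName as nvarchar(100) = ''
DECLARE @PostalCode as nvarchar(15) = ''
DECLARE @PhoneNumber as nvarchar(25) = ''
DECLARE @AddressType as int = 0
DECLARE @PersonType as nvarchar(3) = ''
SELECT SIC.Title
, SIC.FirstName
, SIC.LastName
, SIC.City
, SIC.PostalCode
, SIC.CountryRegionName
FROM Sales.vIndividualCustomer SIC
WHERE 0=0
  AND SIC.CountryRegionCode = @Country -- the only constraint added
ORDER BY SIC.FirstName
, SIC.LastName
, SIC.AddressLine1

The evaluated text in this form can also be pasted into SSMS for testing, as described in the Troubleshooting section below.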

Testing the Report

If everything worked correctly the report will run without problems:

Testing the report

For testing the report it makes sense to choose the parameters one by one, from the general to the more particular case. For example, for a first test I would choose 'Germany' as Country, then in a second run I would choose one of the values from the resulting dataset, e.g. 'Arthur' as Customer Name, while in a third run I would pick a value for the Postal Code. This approach allows minimizing the time required for testing.

Troubleshooting

Errors in formulas are sometimes difficult to troubleshoot, especially when the report refuses to run. The SQL errors are easier to troubleshoot in Management Studio (SSMS): just copy the query into SSMS and remove the added formatting.

Errors in the formula unfortunately require more time, however most of the errors I had were related to a missing empty space, forgetting to close a quote, or incorrect formatting.

29 April 2019

Data Management: Data Integration (Part I: From Disintegration to Integration)

Data Management

No matter how tight the integration between the various systems or processes, there will always be gaps that need to be addressed in one way or another. The problems are in general caused by design errors rooted in the complexity of the logic from the integration layer or from the integrated systems. The errors can range from missing or incorrect validation rules, mappings and parameters to data quality issues.

A unidirectional integration involves distributing data from one system (aka the publisher) to one or more systems (aka the subscribers), while in bidirectional integrations systems can act as both publishers and subscribers, thus resulting in complex data flows with multiple endpoints. In the simplest integrations the records flow one-to-one between systems, though more complex scenarios can involve logic based on business rules, mappings and other types of transformations. The challenge is to reflect the states as needed by the systems with minimal involvement from the users.

Typically it falls within the responsibilities of the application/process owners or key users to make sure that the integration works smoothly. When the integration makes use of interface or staging tables, these can be used as a starting point for the troubleshooting, however even then the troubleshooting can be troublesome and involve a considerable manual effort. When possible, the data can be exported manually from the various systems and matched in Excel or similar solutions. This often leads to personal or departmental solutions that are hard to maintain, control and support.

A better approach is to automate the process by importing the data from the integrated systems at regular points in time into the same database (much like in a data warehouse), modeling the entities and the needed logic in there, and reporting the differences (see the sketch below). Even if this approach involves a small investment in the beginning and some optimization of the logic or performance over time, it can become a useful tool for troubleshooting the differences. Such solutions can be used successfully in multiple integration scenarios (e.g. web shop or ERP integrations).
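As a minimal sketch, a difference report between two staged product tables could look as follows (table and column names are illustrative):

-- products missing on one side or having price differences
SELECT Coalesce(ERP.ItemNumber, SHP.ItemNumber) ItemNumber
, ERP.Price ERPPrice
, SHP.Price ShopPrice
FROM dbo.ERP_Products ERP
     FULL OUTER JOIN dbo.Shop_Products SHP
       ON ERP.ItemNumber = SHP.ItemNumber
WHERE ERP.ItemNumber IS NULL -- missing in the ERP system
   OR SHP.ItemNumber IS NULL -- missing in the web shop
   OR IsNull(ERP.Price, 0) <> IsNull(SHP.Price, 0) -- price differences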

A set of reports for each entity can help identify the differences between the various systems. Starting from the reported differences, the users can identify, categorize and devise specific countermeasures for the various issues. The best time to have such a solution is shortly before or during UAT. This allows making sure that the integration layer really works and helps correct the issues while they still have a small impact on the systems. Some integration issues might even lead to a postponement of the Go-Live. The second best time is when the first important issues are found, as these issues can be used to support a Business Case for implementing this type of solution.

In general, it’s recommended to fix the problems in the integration layer and use the reports only for troubleshooting and for assuring that the integration runs smoothly. There are however situations in which the integration problems can’t be fixed without creating more issues. It’s the case in which multiple systems are involved and integrated over an integration bus.

One extreme approach, though not advisable, is to build a second integration to correct the issues of the first. This solution might work in theory, however the risk of multiplying the issues is really high, and the complexity of troubleshooting increases with the degree of dependency between the two integrations. It would be more advisable to rebuild the integration anew, however this approach too has its advantages and disadvantages.

The bottom line is that integration issues should be addressed while they are small, and that an automated solution for comparing the data can help in the process.

30 October 2018

SQL Server Troubleshooting: Login Failed for User

Since the installation of a SQL Server 2017 instance on a virtual machine (VM) in the Microsoft cloud, records with the following message started to appear in the error log:

Login failed for user '<domain>\<computer>$'. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]
Error: 18456, Severity: 14, State: 5.


From the text it seemed like a permission problem, a fact confirmed by the documentation (see [1]): the Error Number and State correspond to a 'User Id is not valid' situation. In a first step I attempted to give permissions to the local account (dollar sign included). The account wasn't found in the Active Directory (AD), though by typing the account directly in the 'Login name' field I managed to temporarily give sysadmin permissions to the account. The error continued to appear in the error log. I then looked at the accounts under which the SQL Services run - nothing suspect there.

Apart from the error message, which was appearing with an alarming frequency (a few seconds apart), everything seemed to be working on the server. The volume of records (a few hundred thousand over a few days) bloating the error log, as well as the fact that I didn't know what was going on, made me take the time to investigate the issue further.

Looking today at the Windows Logs for Applications, I observed that the error is caused by an account used for the Microsoft SQL Server IaaS Agent and IaaS Query Service. Once I gave permissions to the account, the error disappeared (see the sketch below).
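For reference, the fix reduces to creating a login for the account reported in the error log and granting it the needed permissions - a minimal sketch, using the same placeholders as in the message above:

-- creating a login for the account from the error log
USE master;
CREATE LOGIN [<domain>\<computer>$] FROM WINDOWS;

-- granting the permissions (sysadmin shown for simplicity; consider granting
-- only the permissions actually needed in your environment)
ALTER SERVER ROLE sysadmin ADD MEMBER [<domain>\<computer>$];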

The search for a best practice on what permissions to give to the IaaS Agent and IaaS Query Service led me to [2]. To quote, the "Agent Service needs Local System rights to be able to install and configure SQL Server, attach disks and enable storage pool and manage automated security patching of Windows and SQL server", while the "IaaS Query Service is started with an NT Service account which is a Sys Admin on the SQL Server". In fact, this was the only resource I found that made a reference to the IaaS Query Service.

This was just one of the many scenarios in which the above error appears. For more information see, for example, [3], [4] or [5].

References:
[1] Microsoft (2017) MSSQLSERVER_18456 [Online] Available from: https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-18456-database-engine-error?view=sql-server-2017
[2] SQL Database Engine Blog (2018) SQL Server IaaS Extension Query Service for SQL Server on Azure VM, by Mine Tokus Altug [Online] Available from:  https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/10/25/sql-server-iaas-extension-query-service-for-sql-server-on-azure-vm/
[3] Microsoft Support (2018) "Login failed for user" error message when you log on to SQL Server [Online] Available from: https://support.microsoft.com/en-sg/help/555332/login-failed-for-user-error-message-when-you-log-on-to-sql-server
[4] Microsoft Technet (2018) How to Troubleshoot Connecting to the SQL Server Database Engine [Online] Available from: https://social.technet.microsoft.com/wiki/contents/articles/2102.how-to-troubleshoot-connecting-to-the-sql-server-database-engine.aspx
[5] Microsoft Blogs (2011) Troubleshoot Connectivity/Login failures (18456 State x) with SQL Server, by Sakthivel Chidambaram [Online] Available from: https://blogs.msdn.microsoft.com/sqlsakthi/2011/02/06/troubleshoot-connectivitylogin-failures-18456-state-x-with-sql-server/

29 October 2018

SQL Server Troubleshooting: Searching the Error Log

Searching for a needle in a haystack is an achievable task, though it may turn out to be daunting. The same can be said about searching for a piece of information in the SQL Server error log. Fortunately, there is xp_readerrorlog, an undocumented (extended) stored procedure, which helps in the process. The stored procedure makes available the content of the error log and provides basic search capabilities via a small set of parameters. For example, it can be used to search for errors, warnings, failed backups, consistency checks, failed logins, database instant file initialization events, and so on. It helps identify whether an event occurred and the time at which it occurred.

The following are the parameters available with the stored procedure:

Parameter  Name          Type          Description
1          FileToRead    int           0 = current log, 1, 2, 3, … n = archive number
2          Logtype       int           1 = SQL Server error log, 2 = SQL Server Agent log
3          String1       varchar(255)  the string to match the log entries on
4          String2       varchar(255)  a second string to match in combination with String1 (AND)
5          StartDate     datetime      beginning date to look from
6          EndDate       datetime      ending date to look up to
7          ResultsOrder                ASC or DESC sorting


Note:
If the SQL Server Agent hasn’t been active, then there will be no Agent log and the call to the stored procedure will return an error.

   Here are a few examples of using the stored procedure:

-- listing the content of the current SQL Server error log
EXEC xp_readerrorlog 0, 1

-- listing the content of the second SQL Server error log
EXEC xp_readerrorlog 1, 1

-- listing the content of the current SQL Server Agent log
EXEC xp_readerrorlog 0, 2

-- searching for errors 
EXEC xp_readerrorlog 0, 1, N'error'

-- searching for errors that have to do with consistency checks
EXEC xp_readerrorlog 0, 1, N'error', N'CHECKDB'

-- searching for failed backups
EXEC xp_readerrorlog 0, 1, N'failed', N'backups'

-- searching for warnings 
EXEC xp_readerrorlog 0, 1, N'warning'

-- searching who killed a session
EXEC xp_readerrorlog 0, 1, N'kill'

-- searching for I/O information
EXEC xp_readerrorlog 0, 1, N'I/O'

-- searching for consistency checks 
EXEC xp_readerrorlog 0, 1, N'CHECKDB'

-- searching for consistency checks performed via DBCC
EXEC xp_readerrorlog 0, 1, N'DBCC CHECKDB'

-- searching for failed logins  
EXEC xp_readerrorlog 0, 1, N'Login failed'

-- searching for entries marked with [INFO]
EXEC xp_readerrorlog 0, 1, N'[INFO]'

-- searching for shutdowns 
EXEC xp_readerrorlog 0, 1, N'shutdown'

-- searching for a database instant file initialization event  
EXEC xp_readerrorlog 0, 1, N'database instant file initialization'

If the error log is too big, it's important to narrow the search to a given time interval:

-- searching for errors starting with a given date 
DECLARE @StartDate as Date = DateAdd(d, -1, GetDate())
EXEC xp_readerrorlog 0, 1, N'error', N'', @StartDate

-- searching for errors within a time interval 
DECLARE @StartDate as Date = DateAdd(d, -14, GetDate())
DECLARE @EndDate as Date = DateAdd(d, -7, GetDate())
EXEC xp_readerrorlog 0, 1, N'', N'', @StartDate, @EndDate, N'desc' 

The output can be dumped into a table, especially when a detailed analysis of the error log is needed. One can thus take advantage of more complex pattern searching. For example, it might be interesting to check how often an error message occurred, like in the example below.

-- creating the error log table 
CREATE TABLE dbo.ErrorLogMessages (
    LogDate datetime2(0) 
  , ProcessInfo nvarchar(255)
  , [Text] nvarchar(max))

-- loading the errors 
INSERT INTO dbo.ErrorLogMessages
EXEC xp_readerrorlog 0, 1

-- checking the results 
SELECT *
FROM dbo.ErrorLogMessages

-- checking messages frequency 
SELECT [Text]
, count(*) NoOccurrences
, Min(LogDate) FirstOccurrence
FROM dbo.ErrorLogMessages
GROUP BY [Text]
HAVING count(*)>1
ORDER BY NoOccurrences DESC

-- getting the errors and their information 
SELECT *
FROM (
 SELECT *
 , Lead([Text], 1) OVER (PARTITION BY LogDate, ProcessInfo ORDER BY LogDate) NextMessage
 FROM dbo.ErrorLogMessages
 ) DAT
WHERE [Text] LIKE '%error:%[0-9]%'

-- cleaning up 
--DROP TABLE IF EXISTS dbo.ErrorLogMessages 

   For those who don’t have admin permissions it is necessary to explicitly give execute permissions on the xp_readerrorlog stored procedure:

-- giving explicit permissions to account
GRANT EXECUTE ON xp_readerrorlog TO [<account_name>]

Personally, I’ve been using the stored procedure mainly to check whether error messages were logged for a given time interval and whether the consistency checks ran without problems. Occasionally, I used it to check for failed logins or session terminations (aka kills).

Notes:
Microsoft warns that undocumented objects might change in future releases. Fortunately, xp_readerrorlog has made it from SQL Server 2005 to SQL Server 2017, so it might make it further…
The above code was also tested on SQL Server 2017.

Happy coding!

22 October 2018

SQL Server Troubleshooting: Root Blocking Sessions


Unless special monitoring tools like Redgate’s SQL Monitor are available, when it comes to troubleshooting blocking sessions one can use the sp_who2 and sp_whoisactive stored procedures or the various DMVs available, which can prove useful depending on the case. If one just needs to find the root session that blocks other sessions, as happens in poorly designed and/or heavily used databases, one can follow the chain of blockings provided by sp_who2, though this requires time. Therefore, based on the sp_who2 logic, I put together a script that, with the help of common table expressions, walks through the hierarchical chain to identify the root blocking session(s):

 ;WITH CTERequests
 AS ( -- taking a snapshot of requests  
	 SELECT ER.session_id  
	 , ER.request_id
	 , ER.blocking_session_id  
	 , ER.database_id   	  
	 , ER.status      
	 , ER.sql_handle   
	 , ER.command 
	 , ER.start_time
	 , ER.total_elapsed_time/(1000*1000) total_elapsed_time
	 , ER.cpu_time/(1000*1000) cpu_time
	 , ER.wait_type 
	 , ER.wait_time/(1000*1000) wait_time
	 , ER.reads 
	 , ER.writes 
	 , ER.granted_query_memory/128.0 granted_query_memory_mb
	 FROM sys.dm_exec_requests ER with (nolock)
	 WHERE ER.session_id > 50 -- user sessions 
 ),  
 CTE  
 AS ( -- building the tree  
	 SELECT ER.session_id  
	 , ER.request_id
	 , ER.blocking_session_id  
	 , ER.database_id 
	 , ER.status  
	 , ER.command   
	 , ER.sql_handle   
	 , Cast(NULL as varbinary(64)) blocking_sql_handle   
	 , 0 level  
	 , CAST(ER.session_id as nvarchar(max)) sessions_path 
	 , ER.start_time 
	 , ER.total_elapsed_time
	 , ER.cpu_time
	 , ER.wait_type 
	 , ER.wait_time
	 , ER.reads 
	 , ER.writes 
	 , ER.granted_query_memory_mb
	 FROM CTERequests ER
	 -- WHERE ER.blocking_session_id <>0 -- only blocked sessions 
	 UNION ALL  
	 SELECT ER.session_id  
	 , ER.request_id
	 , BR.blocking_session_id  
	 , ER.database_id  
	 , ER.status  
	 , ER.command   
	 , ER.sql_handle   
	 , BR.sql_handle blocking_sql_handle   
	 , ER.level + 1 level  
	 , ER.sessions_path + '/' + Cast(BR.session_id as nvarchar(10)) sessions_path 
	 , ER.start_time
	 , ER.total_elapsed_time
	 , ER.cpu_time
	 , ER.wait_type 
	 , ER.wait_time
	 , ER.reads 
	 , ER.writes 
	 , ER.granted_query_memory_mb
	 FROM CTERequests br  
		  JOIN CTE ER 
			ON BR.session_id = ER.blocking_session_id   
		   AND BR.session_id <> ER.session_id  
 )  
 -- the final query   
 SELECT CTE.session_id 
 , CTE.blocking_session_id   
 , CTE.request_id
 , DB_NAME(CTE.database_id) database_name  
 , CTE.status   
 , CTE.start_time
 , BES.last_request_end_time   
 , CTE.level   
 , CTE.sessions_path   
 , CTE.command   
 , CTE.sql_handle   
 , est.text sql_text  
 , CTE.blocking_sql_handle   
 , best.text blocking_sql_text 
 , CTE.total_elapsed_time
 , CTE.cpu_time
 , CTE.wait_type 
 , CTE.wait_time
 , CTE.reads 
 , CTE.writes 
 , CTE.granted_query_memory_mb
 , TSU.request_internal_objects_alloc_page_count/128.0 allocated_mb
 , TSU.request_internal_objects_dealloc_page_count/128.0 deallocated_mb 
 , ES.login_name   
 , ES.[host_name]  
 , BES.[host_name] blocking_host_name  
 FROM CTE 
      LEFT JOIN sys.dm_exec_sessions ES with (nolock)
        ON CTE.session_id = ES.session_id   
      LEFT JOIN sys.dm_exec_sessions BES with (nolock)
        ON CTE.blocking_session_id = BES.session_id        
      LEFT JOIN ( -- allocated space
	  SELECT session_id
	  , request_id
	  , SUM(internal_objects_alloc_page_count) AS request_internal_objects_alloc_page_count
	  , SUM(internal_objects_dealloc_page_count)AS request_internal_objects_dealloc_page_count 
	  FROM sys.dm_db_task_space_usage with (nolock)
	  WHERE session_id > 50 
	  GROUP BY session_id
	  , request_id
      ) TSU
        ON TSU.request_id = CTE.request_id
       AND TSU.session_id = CTE.session_id 
      OUTER APPLY sys.dm_exec_sql_text(CTE.sql_handle) as est  
      OUTER APPLY sys.dm_exec_sql_text(CTE.blocking_sql_handle) as best 
 WHERE CTE.level IN (SELECT MAX(B.level) -- longest path per session
                 FROM CTE B  
                 WHERE CTE.session_id = B.session_id  
                 GROUP by B.session_id)  
 ORDER BY CTE.session_id

Notes:
The script considers only the user sessions (session_id > 50), however the constraint can be left out in order to get the system sessions as well.
Over the years I also discovered 1-2 exceptions where the logic didn’t work as expected. Unfortunately, not having historical data made it difficult to find the cause.
Killing the root blocking session (see the sketch after these notes) usually makes the other blockings disappear, however it’s not always the case.
There can be multiple root blocking sessions as well.
For the meaning of the above attributes see the MSDN documentation: sys.dm_exec_requests, sys.dm_exec_sessions, sys.dm_db_task_space_usage, sys.dm_exec_sql_text.
Be careful when killing the sessions (see my other blog post)!
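
A minimal sketch for terminating a root blocking session identified with the above script (the session id is illustrative; make sure it's not a system session and that rolling back its open transaction is acceptable):

-- terminating the root blocking session
KILL 65;

-- checking the rollback progress, if any
KILL 65 WITH STATUSONLY;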

Happy coding!

