Showing posts with label SQL Server Troubleshooting. Show all posts
Showing posts with label SQL Server Troubleshooting. Show all posts

30 October 2018

💠🛠️SQL Server: Administration (Troubleshooting Login Failed for User)

    Since the installation of an SQL Server 2017 on a virtual machine (VM) in the Microsoft Cloud started to appear in the error log records with the following message:

Login failed for user '<domain>\<computer>$'. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]
Error: 18456, Severity: 14, State: 5.


   From the text it seemed like a permission problem, thing confirmed by the documentation (see [1]), the Error Number and State correspond to a „User Id is not valid“ situation. In a first step I attempted to give permissions to the local account (dollar sign included). The account wasn’t found in the Active Directory (AD), though by typing the account directly in the “Login name” I managed to give temporarily sysadmin permission to the account. The error continued to appear in the error log. I looked then at the accounts under which the SQL Services run - nothing suspect in there.

   Except the error message, which was appearing with an alarming frequency (a few seconds apart), everything seemed to be working on the server. The volume of  records (a few hundred thousands over a few days) bloating the error log, as well the fact that I didn’t knew what’s going on made me take the time and further investigate the issue.

  Looking today at the Windows Logs for Applications I observed that the error is caused by an account used for the Microsoft SQL Server IaaS Agent and IaaS Query Service. Once I gave permissions to the account the error disappeared.

   The search for a best practice on what permissions to give to the IaaS Agent and IaaS Query Service lead me to [2]. To quote, the “Agent Service needs Local System rights to be able to install and configure SQL Server, attach disks and enable storage pool and manage automated security patching of Windows and SQL server”, while the “IaaS Query Service is started with an NT Service account which is a Sys Admin on the SQL Server”. In fact, this was the only resource I found that made a reference to the IaaS Query Service.

   This was just one of the many scenarios in which the above error appears. For more information see for example  [3], [4] or [5].

References:
[1] Microsoft (2017) MSSQLSERVER_18456 [Online] Available from: https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/mssqlserver-18456-database-engine-error?view=sql-server-2017
[2] SQL Database Engine Blog (2018) SQL Server IaaS Extension Query Service for SQL Server on Azure VM, by Mine Tokus Altug [Online] Available from:  https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/10/25/sql-server-iaas-extension-query-service-for-sql-server-on-azure-vm/
[3] Microsoft Support (2018) "Login failed for user" error message when you log on to SQL Server [Online] Available from: https://support.microsoft.com/en-sg/help/555332/login-failed-for-user-error-message-when-you-log-on-to-sql-server
[4] Microsoft Technet (2018) How to Troubleshoot Connecting to the SQL Server Database [Online] Available from: Engine https://social.technet.microsoft.com/wiki/contents/articles/2102.how-to-troubleshoot-connecting-to-the-sql-server-database-engine.aspx 
[5] Microsoft Blogs (2011)Troubleshoot Connectivity/Login failures (18456 State x) with SQL Server, by Sakthivel Chidambaram [Online] Available from: https://blogs.msdn.microsoft.com/sqlsakthi/2011/02/06/troubleshoot-connectivitylogin-failures-18456-state-x-with-sql-server/

29 October 2018

💠🛠️SQL Server: Administration (Searching the Error Log)

    Searching for a needle in a haystack is an achievable task though may turn to be daunting. Same can be said about searching for a piece of information in the SQL error log. Fortunately, there is xp_readerrorlog, an undocumented (extended) stored procedure, which helps in the process. The stored procedure makes available the content of the error log and provides basic search capabilities via a small set of parameters. For example, it can be used to search for errors, warnings, failed backups, consistency checks, failed logins, databases instant file initializations, and so on. It helps identify whether an event occurred and the time at which the event occurred.

   The following are the parameter available with the stored procedure:

Parameter
Name
Type
Description
1FileToReadint0 = Current, 1 or 2, 3, … n Archive Number
2Logtypeint1 = SQL Error Log and 2 = SQL Agent log
3String1varchar(255)the string to match the logs on
4String2varchar(255)a second string to match in combination with String1 (AND)
5StartDatedatetimebeginning date to look from
6EndDatedatetimeending date to look up to
7ResultsOrderASC or DESC sorting


Note:
If the SQL Server Agent hasn’t been active, then there will be no Agent log and the call to the stored procedure will return an error.

   Here are a few examples of using the stored procedure:

-- listing the content of the current SQL Server error log
EXEC xp_readerrorlog 0, 1

-- listing the content of the second SQL Server error log
EXEC xp_readerrorlog 1, 1

-- listing the content of the current SQL Server Agent log
EXEC xp_readerrorlog 0, 2

-- searching for errors 
EXEC xp_readerrorlog 0, 1, N'error'

-- searching for errors that have to do with consistency checks
EXEC xp_readerrorlog 0, 1, N'error', N'CHECKDB'

-- searching for errors that have to do with consistency checks
EXEC xp_readerrorlog 0, 1, N'failed', N'backups'

-- searching for warnings 
EXEC xp_readerrorlog 0, 1, N'warning'

-- searching who killed a session
EXEC xp_readerrorlog 0, 1, N'kill'

-- searching for I/O information
EXEC xp_readerrorlog 0, 1, N'I/O'

-- searching for consistency checks 
EXEC xp_readerrorlog 0, 1, N'CHECKDB'

-- searching for consistency checks performed via DBCC
EXEC xp_readerrorlog 0, 1, N'DBCC CHECKDB'

-- searching for failed logins  
EXEC xp_readerrorlog 0, 1, N'Login failed'

-- searching for 
EXEC xp_readerrorlog 0, 1, N'[INFO]'

-- searching for shutdowns 
EXEC xp_readerrorlog 0, 1, N'shutdown'

-- searching for a database instant file initialization event  
EXEC xp_readerrorlog 0, 1, N'database instant file initialization'

   If the error log is too big it’s important to narrow the search for a given time interval:

-- searching for errors starting with a given date 
DECLARE @StartDate as Date = DateAdd(d, -1, GetDate())
EXEC xp_readerrorlog 0, 1, N'error', N'', @StartDate

-- searching for errors within a time interval 
DECLARE @StartDate as Date = DateAdd(d, -14, GetDate())
DECLARE @EndDate as Date = DateAdd(d, -7, GetDate())
EXEC xp_readerrorlog 0, 1, N'', N'', @StartDate, @EndDate, N'desc' 

   The output can be dumped into a table especially when is needed to perform a detailed analysis on the error log. It might be interesting to check how often an error message occurred, like in the below example. One can take thus advantage of more complex pattern searching.

-- creating the error log table 
CREATE TABLE dbo.ErrorLogMessages (
    LogDate datetime2(0) 
  , ProcessInfo nvarchar(255)
  , [Text] nvarchar(max))

-- loading the errors 
INSERT INTO dbo.ErrorLogMessages
EXEC xp_readerrorlog 0, 1

-- checking the results 
SELECT *
FROM dbo.ErrorLogMessages

-- checking messages frequency 
SELECT [Text]
, count(*) NoOccurrences
, Min(LogDate) FirstOccurrence
FROM dbo.ErrorLogMessages
GROUP BY [Text]
HAVING count(*)>1
ORDER BY NoOccurrences DESC

-- getting the errors and their information 
SELECT *
FROM (
 SELECT *
 , Lead([Text], 1) OVER (PARTITION BY LogDate, ProcessInfo ORDER BY LogDate) PrevMessage
 FROM dbo.ErrorLogMessages
 ) DAT
WHERE [Text] LIKE '%error:%[0-9]%'

-- cleaning up 
--DROP TABLE IF EXISTS dbo.ErrorLogMessages 

   For those who don’t have admin permissions it is necessary to explicitly give execute permissions on the xp_readerrorlog stored procedure:

-- giving explicit permissions to account
GRANT EXECUTE ON xp_readerrorlog TO [<account_name>]

   Personally, I’ve been using the stored procedure mainly to check whether error messages were logged for a given time interval and whether the consistency checks run without problems. Occasionally, I used it to check for failed logins or sessions terminations (aka kills).

Notes:
Microsoft warns that undocumented objects might change in future releases. Fortunately, xp_readerrorlog made it since SQL Server 2005 to SQL Server 2017, so it might make it further…
The above code was tested also on SQL Server 2017.

Happy coding!

22 October 2018

💠🛠️SQL Server Administration: Troubleshooting Root Blocking Sessions


Unless special monitoring tools like Redgate’s SQL Monitor are available, when it comes to troubleshooting blocking sessions one can use the sp_who2 and sp_whoisactive stored procedures or the further available DMVs, which can prove useful upon case. In the case one just needs to find the root session that blocks other sessions, as it happens in poorly designed and/or heavily used databases, one can follow the chain of blockings provided by the sp_who2, though it requires time. Therefore, based on sp_who2 logic I put together a script that with the help of common tables expressions walks through the hierarchical chain to identify the root blocking session(s):

 ;WITH CTERequests
 AS ( -- taking a snapshot of requests  
	 SELECT ER.session_id  
	 , ER.request_id
	 , ER.blocking_session_id  
	 , ER.database_id   	  
	 , ER.status      
	 , ER.sql_handle   
	 , ER.command 
	 , ER.start_time
	 , ER.total_elapsed_time/(1000*1000) total_elapsed_time
	 , ER.cpu_time/(1000*1000) cpu_time
	 , ER.wait_type 
	 , ER.wait_time/(1000*1000) wait_time
	 , ER.reads 
	 , ER.writes 
	 , ER.granted_query_memory/128.0 granted_query_memory_mb
	 FROM sys.dm_exec_requests ER with (nolock)
	 WHERE ER.session_id > 50user sessions 
 ),  
 CTE  
 AS ( -- building the tree  
	 SELECT ER.session_id  
	 , ER.request_id
	 , ER.blocking_session_id  
	 , ER.database_id 
	 , ER.status  
	 , ER.command   
	 , ER.sql_handle   
	 , Cast(NULL as varbinary(64)) blocking_sql_handle   
	 , 0 level  
	 , CAST(ER.session_id as nvarchar(max)) sessions_path 
	 , ER.start_time 
	 , ER.total_elapsed_time
	 , ER.cpu_time
	 , ER.wait_type 
	 , ER.wait_time
	 , ER.reads 
	 , ER.writes 
	 , ER.granted_query_memory_mb
	 FROM CTERequests ER
	 -- WHERE ER.blocking_session_id <>0 -- only blocked sessions 
	 UNION ALL  
	 SELECT ER.session_id  
	 , ER.request_id
	 , BR.blocking_session_id  
	 , ER.database_id  
	 , ER.status  
	 , ER.command   
	 , ER.sql_handle   
	 , BR.sql_handle blocking_sql_handle   
	 , ER.level + 1 level  
	 , ER.sessions_path + '/' + Cast(BR.session_id as nvarchar(10)) sessions_path 
	 , ER.start_time
	 , ER.total_elapsed_time
	 , ER.cpu_time
	 , ER.wait_type 
	 , ER.wait_time
	 , ER.reads 
	 , ER.writes 
	 , ER.granted_query_memory_mb
	 FROM CTERequests br  
		  JOIN CTE ER 
			ON BR.session_id = ER.blocking_session_id   
		   AND BR.session_id <> ER.session_id  
 )  
 -- the final query   
 SELECT CTE.session_id 
 , CTE.blocking_session_id   
 , CTE.request_id
 , DB_NAME(CTE.database_id) database_name  
 , CTE.status   
 , CTE.start_time
 , BES.last_request_end_time   
 , CTE.level   
 , CTE.sessions_path   
 , CTE.command   
 , CTE.sql_handle   
 , est.text sql_text  
 , CTE.blocking_sql_handle   
 , best.text blocking_sql_text 
 , CTE.total_elapsed_time
 , CTE.cpu_time
 , CTE.wait_type 
 , CTE.wait_time
 , CTE.reads 
 , CTE.writes 
 , CTE.granted_query_memory_mb
 , TSU.request_internal_objects_alloc_page_count/128.0 allocated_mb
 , TSU.request_internal_objects_dealloc_page_count/128.0 deallocated_mb 
 , ES.login_name   
 , ES.[host_name]  
 , BES.[host_name] blocking_host_name  
 FROM CTE 
      LEFT JOIN sys.dm_exec_sessions ES with (nolock)
        ON CTE.session_id = ES.session_id   
      LEFT JOIN sys.dm_exec_sessions BES with (nolock)
        ON CTE.blocking_session_id = BES.session_id        
      LEFT JOIN ( -- allocated space
	  SELECT session_id
	  , request_id
	  , SUM(internal_objects_alloc_page_count) AS request_internal_objects_alloc_page_count
	  , SUM(internal_objects_dealloc_page_count)AS request_internal_objects_dealloc_page_count 
	  FROM sys.dm_db_task_space_usage with (nolock)
	  WHERE session_id > 50 
	  GROUP BY session_id
	  , request_id
      ) TSU
        ON TSU.request_id = CTE.request_id
       AND TSU.session_id = CTE.session_id 
      OUTER APPLY sys.dm_exec_sql_text(CTE.sql_handle) as est  
      OUTER APPLY sys.dm_exec_sql_text(CTE.blocking_sql_handle) as best 
 WHERE CTE.level IN (SELECT MAX(B.level) -- longest path per session
                 FROM CTE B  
                 WHERE CTE.session_id = B.session_id  
                 GROUP by B.session_id)  
 ORDER BY CTE.session_id

Notes:
The script considers only the user sessions (session_id>50), however the constraint can be left out in order to get also the system sessions.
Over the years I discovered also 1-2 exceptions where the logic didn’t work as expected. Unfortunately, not having historical data made it difficult to find the cause.
Killing the root blocking session usually makes the other blockings disappear, however it’s not always the case.
There can multiple root blocking sessions as well.
For the meaning of the above attributes see the MSDN documentation: sys.dm_exec_requests, sys.dm_exec_sessions, sys.dm_db_task_space_usage, sys.dm_exec_sql_text.
Be careful when killing the sessions (see my other blog post)!

Happy coding!

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.