Showing posts with label recovery. Show all posts

18 June 2017

SQL Server Administration: Database Recovery on SQL Server 2017

Today I installed SQL Server 2017 CTP 2.1 on my lab PC without any apparent problems. It was time to recreate some of the databases I used for testing. I previously had an evaluation version of SQL Server 2016, which expired before I had made a backup of one of the databases. I could have recreated the database from scripts and reloaded the data from various text files. This would have been a relatively laborious task (estimated time > 1 hour), though the chances were pretty high that everything would go smoothly. As the database is relatively small (about 2 GB) and possible data loss was negligible, I thought it would be possible to recover the data from the database with minimal loss in less than half an hour. I knew this was possible, as I had been forced a few times in the past to recover data from damaged databases in SQL Server 2005, 2008 and 2012 environments, though being in a new environment I wasn’t sure how smoothly it would go and how long it would take.

Plan A - Create the database with the ATTACH_REBUILD_LOG option:

The option appears to be available in SQL Server 2017, so I attempted to create the database via the following script:
 
CREATE DATABASE <database_name> ON 
(FILENAME='I:\Data\<database_name>.mdf') 
FOR ATTACH_REBUILD_LOG 

As expected, I ran into the first error:
Msg 5120, Level 16, State 101, Line 1
Unable to open the physical file "I:\Data\<database_name>.mdf". Operating system error 5: "5(Access is denied.)".
Msg 1802, Level 16, State 7, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
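
Operating system error 5 means that a Windows account was denied access to the file. The following is a minimal sketch, not part of the original troubleshooting, for listing the accounts that are usually worth checking: the accounts running the SQL Server services and the login of the current connection. Which of them actually needs access to the folder depends on the setup, so treat the result only as a starting point. Querying sys.dm_server_services requires the VIEW SERVER STATE permission.

```sql
-- Windows accounts under which the SQL Server services run.
SELECT servicename, service_account, status_desc
FROM sys.dm_server_services;

-- Login used by the current connection (relevant when connecting
-- with Windows authentication).
SELECT SUSER_SNAME() AS current_login, ORIGINAL_LOGIN() AS original_login;
```

The accounts returned can then be compared against the security settings of the folder that holds the data file.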

It looked like a permissions problem, though I wasn’t entirely sure which account was causing it. In the past I had problems with the Administrator account, so that was the first thing to try. I removed the Administrator account's permissions on the folder containing the database, gave it full control permissions again, and reran the above script, only to run into the next error:

File activation failure. The physical file name "D:\Logs\<database_name>_log.ldf" may be incorrect. The log cannot be rebuilt because there were open transactions/users when the database was shutdown, no checkpoint occurred to the database, or the database was read-only. This error could occur if the transaction log file was manually deleted or lost due to a hardware or environment failure.
Msg 1813, Level 16, State 2, Line 1
Could not open new database '<database_name>'. CREATE DATABASE is aborted.

This approach seemed to lead nowhere, so it was time for Plan B.

Plan B - Recover the database into an empty database with the same name:

Step 1: Create a new database with the same name, stop SQL Server, copy the old data file over the new one, and delete the new log file manually. Then restart the server. After the restart the database appears in Management Studio in the SUSPECT state.
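
The state can also be confirmed from a query window instead of Management Studio. This is a minimal sketch that is not part of the original steps; <database_name> is a placeholder for the actual name. Depending on the situation the reported state may be SUSPECT or RECOVERY_PENDING.

```sql
-- <database_name> is a placeholder; replace it with the actual database name.
SELECT name, state_desc, user_access_desc
FROM sys.databases
WHERE name = N'<database_name>';
```

The same query can be rerun after each of the following steps to follow the change from EMERGENCY back to ONLINE.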

Step 2: Set the database in EMERGENCY mode:

ALTER DATABASE <database_name> SET EMERGENCY, SINGLE_USER

Step 3: Rebuild the log file:

ALTER DATABASE <database_name> 
REBUILD LOG ON (Name='<database_name>_Log', 
FileName='D:\Logs\<database_name>.ldf')

The rebuild worked without problems.

Step 4: Set the database in MULTI_USER mode:

ALTER DATABASE <database_name> SET MULTI_USER 

Step 5: Perform a consistency check:

DBCC CHECKDB (<database_name>) WITH ALL_ERRORMSGS, NO_INFOMSGS 

After 15 minutes of work the database was back online.

Warnings:
Always attempt to recover the data for production databases from the backup files! Use the above steps only if there is no other alternative!
The consistency check might return errors. In this case one might need to run CHECKDB with REPAIR_ALLOW_DATA_LOSS several times [2], until the database is repaired.
After recovery there can be problems with user access. It may be necessary to delete the users from the recovered database and reassign their permissions!
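
The last two warnings can be sketched as follows. This is an illustration under stated assumptions, not the script used for the recovery described above: <database_name>, <user_name> and <login_name> are placeholders, and remapping a user with ALTER USER is shown as an alternative to deleting and recreating it. REPAIR_ALLOW_DATA_LOSS can discard data, so it remains a last resort.

```sql
-- Sketch only; <database_name>, <user_name> and <login_name> are placeholders.

-- 1. Repair options require the database to be in single-user mode.
ALTER DATABASE <database_name> SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (<database_name>, REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS, NO_INFOMSGS;
ALTER DATABASE <database_name> SET MULTI_USER;

-- 2. List SQL users whose SID has no matching server login (orphaned users).
USE <database_name>;
SELECT dp.name AS orphaned_user, dp.sid
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp ON sp.sid = dp.sid
WHERE dp.type = 'S'               -- SQL users
  AND dp.authentication_type = 1  -- authenticated at instance level
  AND sp.sid IS NULL;

-- 3. Remap an orphaned user to an existing login.
ALTER USER <user_name> WITH LOGIN = <login_name>;
```

Rerunning the consistency check from Step 5 after each repair pass shows whether further passes are needed.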

Resources:
[1] In Recovery (2008) Creating, detaching, re-attaching, and fixing a SUSPECT database, by Paul S Randal [Online] Available from: https://www.sqlskills.com/blogs/paul/creating-detaching-re-attaching-and-fixing-a-suspect-database/ 
[2] In Recovery (2009) Misconceptions around database repair, by Paul S Randal [Online] Available from: https://www.sqlskills.com/blogs/paul/misconceptions-around-database-repair/
[3] Microsoft Blogs (2013) Recovering from Log File Corruption, by Glen Small [Online] Available from: https://blogs.msdn.microsoft.com/glsmall/2013/11/14/recovering-from-log-file-corruption/

11 June 2016

Strategic Management: Resilience (Definitions)

"The ability to recover from challenges or to overcome obstacles. In a social-ecological context this refers to the innovation capacity of the organization to successfully address societal and environmental challenges." (Rick Edgeman & Jacob Eskildsen, "Social-Ecological Innovation", 2014)

"The quality of being able to absorb systemic 'shocks' without being destroyed even if recovery produces an altered state to that of the status quo ante." (Philip Cooke, "Regional Innovation Systems in Centralised States: Challenges, Chances, and Crossovers", 2015)

"The ability of an organization to quickly adapt to disruptions while maintaining continuous business operations and safeguarding people, assets, and overall brand equity. Business resilience goes a step beyond disaster recovery, by offering post-disaster strategies to avoid costly downtime, shore up vulnerabilities, and maintain business operations in the face of additional, unexpected breaches." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"A capability to anticipate, prepare for, respond to, and recover from significant multi-hazard threats with minimum damage to social well-being, the economy, and the environment." (Carolyn N Stevenson, "Addressing the Sustainable Development Goals Through Environmental Education", 2019)

"The ability of a project to readily resume from unexpected events, threats or actions." (Phil Crosby, "Shaping Mega-Science Projects and Practical Steps for Success", 2019)

"The ability of an infrastructure to resist, respond and overcome adverse events" (Konstantinos Apostolou et al, "Business Continuity of Critical Infrastructures for Safety and Security Incidents", 2020)

"The capacity to respond to, adapt and learn from stressors and changing conditions." (Naomi Borg & Nader Naderpajouh, "Strategies for Business Sustainability in a Collaborative Economy", 2020)

"The word resilience refers to the ability to overcome critical moments and adapt after experiencing some unusual and unexpected situation. It also indicates return to normal." (José G Vargas-Hernández, "Urban Socio-Ecosystems Green Resilience", 2021)

"Operational resilience is a set of techniques that allow people, processes and informational systems to adapt to changing patterns. It is the ability to alter operations in the face of changing business conditions. Operationally resilient enterprises have the organizational competencies to ramp up or slow down operations in a way that provides a competitive edge and enables quick and local process modification." (Gartner)

[Operational resilience:] "The ability of an organization to absorb the impact of any unexpected event without failing to deliver on its brand promise." (Forrester)

[Business resilience:] "The ability to thrive in the face of unpredictable events and circumstances without deteriorating customer experience or sacrificing the long-term viability of the company." (Forrester)

08 April 2016

Strategic Management: Disaster Recovery Plan (Definitions)

"A plan that establishes technical and organizational measures in order to face events or incidents with potentially huge impact that could even lead to the unavailability of data centers. The DRP development defines and ensures IT emergency procedures that intervene and protect the data relevant for the company activities and services. DRP is usually considered as the only part of the BCP in banking business continuity initiatives." (Vincenzo Morabito & Gianluigi Viscusi, "Information Technology Business Continuity", 2009)

"Generally a plan for enabling an organization to move to alternate system, network, and operational facilities in the event of an incident making the primary facilities unusable." (C Warren Axelrod, "Responsibilities and Liabilities with Respect to Catastrophes", 2009)

"A contingency plan that goes into effect after a full disaster occurs, used to reestablish basic capabilities and resources." (Annetta Cortez & Bob Yehling, "The Complete Idiot's Guide® To Risk Management", 2010)

"A written plan that explains how a company will recover its IT operations after a natural or man-made disaster that causes data or hardware loss." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"A plan developed to help a company recover from a disaster. It provides procedures for emergency response, extended backup operations, and post-disaster recovery when an organization suffers a loss of computer processing capability or resources and physical facilities." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"Plans that document the steps you can take to replace damaged or destroyed components due to a disaster to restore the integrity of your IT infrastructure." (Weiss, "Auditing IT Infrastructures for Compliance" 2nd Ed., 2015)

"A written plan for processing critical applications in the event of a major hardware or software failure or destruction of facilities." (NIST SP 800-82 Rev. 2)

"A written plan for recovering one or more information systems at an alternate facility in response to a major hardware or software failure or destruction of facilities." (NIST SP 800-34 Rev. 1)

"Management policy and procedures used to guide an enterprise response to a major loss of enterprise capability or damage to its facilities. The DRP is the second plan needed by the enterprise risk managers and is used when the enterprise must recover (at its original facilities) from a loss of capability over a period of hours or days." (CNSSI 4009-2015)

07 April 2016

Strategic Management: Disaster Recovery (Definitions)

"The ability of an organization to respond to a disaster or an interruption in services by implementing a disaster recovery plan to stabilize and restore the organization’s critical functions." (Disaster Recovery Journal & DRI, 2007)

"A process that is required after a major business disruption caused by the occurrence of a disaster." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"The process of regaining access to data, hardware, or software after a computer based human or natural disaster." (Dwayne Stevens & David T Green, "A Strategy for Enterprise VoIP Security", 2009)

"This is a process that describes how to recover the IT environment after a disaster such as a fire destroying the IT building." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"the ability of an infrastructure to resume operations after a disaster. Disaster Recovery differentiates from Business Continuity Planning in that Disaster Recovery is primarily associated with resources and facilities, while BCP is primarily associated with processes." (Bill Holtsnider & Brian D Jaffe, "IT Manager's Handbook" 3rd Ed., 2012)

"The coordinated activity to enable the recovery of IT (and other) systems due to a disruption." (Sally-Anne Pitt, "Internal Audit Quality", 2014)

"The planning, preparation, and testing set of activities used to help a business plan for and recover from any major business interruption and to resume normal business operations." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"the process adopted by the IT organization in order to bring systems back up and running." (Manish Agrawal, "Information Security and IT Risk Management", 2014)

"An area of security planning that aims to protect an organization from the effects of significant negative events. DR allows an organization to maintain or quickly resume mission-critical functions following a disaster." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"The planning for and/or the implementation of a strategy to respond to such failures as a total infrastructure loss, or the failure of computers (CommServe server, MediaAgent, client, or application), networks, storage hardware, or media. A disaster recovery strategy typically involves the creation and maintenance of a secure disaster recovery site, and the day-to-day tasks of running regular disaster recovery backups." (CommVault, "Documentation 11.20", 2018)

"Is an organization's method of regaining access and functionality to its IT infrastructure, to continue the delivery of services that support business processes, after a disruptive incident." (Nelson Russo & Leonilde Reis, "Methodological Approach to Systematization of Business Continuity in Organizations", 2021)

10 February 2016

Strategic Management: Recovery Time Objective (RTO)

"Following a disaster, the amount of time that a system may be offline before it must be up and running." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"The period of time within which systems, applications, or functions must be recovered after an outage (e.g., one business day). RTOs are often used as the basis for the development of recovery strategies, and as a determinant as to whether or not to implement the recovery strategies during a disaster situation." (Disaster Recovery Journal & DRI, 2007)

"This is a measure indicating how quickly after an outage IT infrastructure needs to be recovered to continue operations. The smaller the number, the quicker the solution must be able to be recovered." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"The intent to recover lost applications, within specific time limitations, to assure a certain level of operational continuity. Expresses the amount of time a business will tolerate the computing system (hardware, software, services) to be offline." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An expression of the amount of time a business will tolerate the computing system (hardware, software, DBMS, services) to be offline." (Craig S Mullins, "Database Administration", 2012)

"in disaster recovery planning, the expected amount of time between the disaster, and when services are restored." (Bill Holtsnider & Brian D Jaffe, "IT Manager's Handbook" 3rd Ed., 2012)

"In disaster recovery planning, the total time one can allow for their systems to be offline." (IBM, "Informix Servers 12.1", 2014)

"The earliest time period and a service level within which a business process must be restored after a disaster to avoid unacceptable consequences." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"The target time set for resumption of product, service, or activity delivery after an incident. It is the maximum allowable downtime that can occur without severely impacting the recovery of operations or the time in which systems, applications, or business functions must be recovered after an outage (for example, the point in time at which a process can no longer be inoperable)." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

22 February 2014

Systems Engineering: Resilience (Definitions)

"The ability of a system, community, or society exposed to hazards to resist, absorb, accommodate to and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions." (ISDR, 2009)

"The quality of being able to absorb systemic 'shocks' without being destroyed even if recovery produces an altered state to that of the status quo ante." (Philip Cooke, "Regional Innovation Systems in Centralised States: Challenges, Chances, and Crossovers", 2015)

"A swarm is resilient if the loss of individual agents has little impact on the success of the task of the swarm." (Thalia M Laing et al, "Security in Swarm Robotics", 2016)

"Resilience is the capacity of organism or system to withstand stress and catastrophe." (Sunil L Londhe, "Climate Change and Agriculture: Impacts, Adoption, and Mitigation", 2016)

"System resilience is an ability of the system to withstand a major disruption within acceptable degradation parameters and to recover within an acceptable time." (Denis Čaleta, "Cyber Threats to Critical Infrastructure Protection: Public Private Aspects of Resilience", 2016) 

"The capacity for self-organization, and to adapt to impact factors." (Ahmed Karmaoui, "Environmental Vulnerability to Climate Change in Mediterranean Basin: Socio-Ecological Interactions between North and South", 2016)

"The capacity of ecosystem to absorb disturbance, reorganize and return to an equilibrium or steady-state while undergoing some change or perturbation so that still retain essentially the same function, structure, identity, and feedbacks." (Susmita Lahiri et al, "Role of Microbes in Eco-Remediation of Perturbed Aquatic Ecosystem", 2017)

"A capability to anticipate, prepare for, respond to, and recover from significant multi-hazard threats with minimum damage to social well-being, the economy, and the environment." (Carolyn N Stevenson, "Addressing the Sustainable Development Goals Through Environmental Education", 2019)

"The conventional understanding of resilience applied to socioeconomic studies regards the bouncing-back ability of a socioeconomic system to recover from a shock or disruption. Today resilience is being influenced by an evolutionary perspective, underlining it as the bouncing-forward ability of the system to undergo anticipatory or reactionary reorganization to minimize the impact of destabilizing shocks and create new growth trajectories." (Hugo Pinto & André Guerreiro, "Resilience, Innovation, and Knowledge Transfer: Conceptual Considerations and Future Research Directions", 2019)

"Is the system capacity to rebalance after a perturbation." (Ahmed Karmaoui et al, "Composite Indicators as Decision Support Method for Flood Analysis: Flood Vulnerability Index Category", 2020)

"The ability of human or natural systems to cope with adverse events and be able to effect a quick recovery." (Maria F Casado-Claro, "Fostering Resilience by Empowering Entrepreneurs and Small Businesses in Local Communities in Post-Disaster Scenarios", 2021)

"The word resilience refers to the ability to overcome critical moments and adapt after experiencing some unusual and unexpected situation. It also indicates return to normal." (José G Vargas-Hernández, "Urban Socio-Ecosystems Green Resilience", 2021)

About Me

IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.