Disaster Recovery & Planning

Disaster recovery comprises the processes, policies and procedures for preparing for the recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster. Disaster recovery is a subset of business continuity. While business continuity involves planning to keep all aspects of a business functioning in the midst of disruptive events, disaster recovery focuses on the IT or technology systems that support business functions.


Control measures in recovery plan
Control measures are steps or mechanisms that can reduce or eliminate threats to an organization. Different types of measures can be included in a business continuity or disaster recovery plan.

Disaster recovery planning is a subset of a larger process known as business continuity planning and should cover the resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure. A business continuity plan (BCP) covers non-IT aspects such as key personnel, facilities, crisis communication and reputation protection, and should refer to the disaster recovery plan (DRP) for the recovery and continuity of IT infrastructure. This article focuses on disaster recovery planning as it relates to IT infrastructure.

Types of measures:

    Preventive measures - These controls are aimed at preventing an event from occurring.
    Detective measures - These controls are aimed at detecting or discovering unwanted events.
    Corrective measures - These controls are aimed at correcting or restoring the system after a disaster or event.

These controls should always be documented and tested regularly.
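
As a rough illustration of keeping controls documented and tested, the sketch below records each control with its type and flags any that are undocumented or have not been tested recently. The `Control` record, the `needs_attention` helper, the sample entries and the one-year testing interval are all hypothetical choices made for this example, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import List, Optional


class ControlType(Enum):
    PREVENTIVE = "preventive"   # aims to stop an event from occurring
    DETECTIVE = "detective"     # aims to discover an unwanted event
    CORRECTIVE = "corrective"   # aims to restore the system after an event


@dataclass
class Control:
    name: str
    kind: ControlType
    documented: bool
    last_tested: Optional[date]  # None means the control was never tested


def needs_attention(controls: List[Control], today: date,
                    max_age: timedelta = timedelta(days=365)) -> List[str]:
    """Return names of controls that are undocumented or overdue for a test.

    The one-year default for max_age is an assumption for this sketch.
    """
    flagged = []
    for c in controls:
        overdue = c.last_tested is None or today - c.last_tested > max_age
        if not c.documented or overdue:
            flagged.append(c.name)
    return flagged


# Hypothetical sample data
controls = [
    Control("UPS", ControlType.PREVENTIVE, True, date(2024, 1, 10)),
    Control("Smoke alarm", ControlType.DETECTIVE, True, None),
    Control("Restore from off-site backup", ControlType.CORRECTIVE, False, date(2024, 5, 1)),
]
print(needs_attention(controls, today=date(2024, 6, 1)))
# prints ['Smoke alarm', 'Restore from off-site backup']
```

Here the alarm is flagged because it was never tested, and the restore procedure because it is undocumented, even though it was tested recently.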


Strategies
Prior to selecting a disaster recovery strategy, a disaster recovery planner should refer to their organization's business continuity plan, which should indicate the key metrics of recovery point objective (RPO) and recovery time objective (RTO) for various business processes (such as running payroll or generating an order). The metrics specified for the business processes must then be mapped to the underlying IT systems and infrastructure that support those processes.
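
One possible way to carry out this mapping is sketched below. It assumes that a system shared by several processes inherits the tightest (smallest) RTO and RPO among them; that rule, the system names and all hour values are illustrative assumptions rather than figures from any real plan.

```python
from typing import Dict, List, Tuple

# (RTO hours, RPO hours) per business process: illustrative values only
process_objectives: Dict[str, Tuple[float, float]] = {
    "run payroll": (24.0, 4.0),
    "generate an order": (2.0, 0.5),
}

# IT systems each process depends on (hypothetical names)
process_systems: Dict[str, List[str]] = {
    "run payroll": ["hr-db", "auth"],
    "generate an order": ["orders-db", "auth"],
}


def map_to_systems(objectives: Dict[str, Tuple[float, float]],
                   dependencies: Dict[str, List[str]]) -> Dict[str, Tuple[float, float]]:
    """Give each system the tightest RTO and RPO of any process it supports."""
    result: Dict[str, Tuple[float, float]] = {}
    for process, (rto, rpo) in objectives.items():
        for system in dependencies[process]:
            cur_rto, cur_rpo = result.get(system, (float("inf"), float("inf")))
            result[system] = (min(cur_rto, rto), min(cur_rpo, rpo))
    return result


system_objectives = map_to_systems(process_objectives, process_systems)
for name, (rto, rpo) in sorted(system_objectives.items()):
    print(f"{name}: RTO {rto} h, RPO {rpo} h")
```

In this example the shared `auth` system ends up with the order process's stricter targets (2 hours and 0.5 hours), because an outage there would block both processes.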

Once the RTO and RPO metrics have been mapped to IT infrastructure, the DR planner can determine the most suitable recovery strategy for each system. Note, however, that the business ultimately sets the IT budget, so the RTO and RPO metrics need to fit within the available budget. While most business unit heads would like zero data loss and zero time loss, the cost of that level of protection may make the desired high-availability solutions impractical.

The following is a list of the most common strategies for data protection.

    Backups made to tape and sent off-site at regular intervals
    Backups made to disk on-site and automatically copied to off-site disk, or made directly to off-site disk
    Replication of data to an off-site location, which removes the need to restore the data (only the systems then need to be restored or synchronized). This generally makes use of storage area network (SAN) technology.
    High-availability systems that keep both the data and system replicated off-site, enabling continuous access to systems and data
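
The trade-off between recovery objectives and budget can be sketched as a simple selection over the strategies listed above. The RTO, RPO and cost figures attached to each strategy here are placeholders invented for the example (real values depend heavily on the environment), and `choose_strategy` is a hypothetical helper, not an established method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_hours: float    # achievable recovery time (placeholder value)
    rpo_hours: float    # achievable recovery point (placeholder value)
    annual_cost: float  # cost in arbitrary units (placeholder value)


# Placeholder figures for illustration only
STRATEGIES: List[Strategy] = [
    Strategy("tape backup sent off-site", 48.0, 24.0, 10.0),
    Strategy("disk backup copied off-site", 12.0, 4.0, 25.0),
    Strategy("off-site data replication", 4.0, 0.25, 60.0),
    Strategy("high availability", 0.1, 0.0, 150.0),
]


def choose_strategy(rto: float, rpo: float, budget: float) -> Optional[Strategy]:
    """Return the cheapest strategy meeting both objectives within budget, else None."""
    candidates = [
        s for s in STRATEGIES
        if s.rto_hours <= rto and s.rpo_hours <= rpo and s.annual_cost <= budget
    ]
    return min(candidates, key=lambda s: s.annual_cost) if candidates else None


print(choose_strategy(rto=8.0, rpo=1.0, budget=100.0))
print(choose_strategy(rto=0.0, rpo=0.0, budget=100.0))  # prints None
```

With these placeholder numbers, the first call selects off-site data replication, while the second returns `None`: a demand for zero time loss and zero data loss cannot be met by any option within the budget, so the objectives or the budget would have to be renegotiated.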


In many cases, an organization may elect to use an outsourced disaster recovery provider to provide a stand-by site and systems rather than using their own remote facilities.

In addition to preparing for the need to recover systems, organizations must also implement precautionary measures with an objective of preventing a disaster in the first place. These may include some of the following:

    Local mirrors of systems and/or data and use of disk protection technology such as RAID
    Surge protectors: to minimize the effect of power surges on delicate electronic equipment
    Uninterruptible power supply (UPS) and/or backup generator to keep systems going in the event of a power failure
    Fire protection: alarms, fire extinguishers
    Anti-virus software and other security measures