Business Office

DRAPE (Business Continuity)       PMBP_DRA

Disaster Recovery Activities Planning & Execution

The DRAPE process manages business continuity by planning appropriate action for Disaster Recovery to contain a disaster’s impact and to initiate and track appropriate recovery activities. Instead of just focusing on project risk, DRAPE is focused on the operational risk of deploying applications, as well as the assessment of what needs to be done in the event of a disaster to restore the functionality to an operational (backup) state. Any change in process needs an assessment of business interruption exposure to identify the response processes that restore operations, using TaskMaster to plan and track effort executed in the correct prerequisite order, with the ability to monitor and report on the recovery status. This course is an extension of the TaskMaster scheduling capability described in:

TaskMaster Scheduling

The unique capabilities of TaskMaster-DRAPE include the use of “what if” logic to drill down on the level of destruction and the actions to be taken to restore operations in a controlled order. Disaster recovery has its own mindset: to achieve faster results, the client usually has to spend a lot more than a “business as usual” mindset is prepared to invest. With simulation you can both prepare for the disaster recovery and evaluate the cost impact of different scenarios, to assess what stakeholders can accept during a more or less serious outage. This makes it a serious tool for evaluating tradeoffs as well as for managing the actions required for an orderly disaster recovery. Using the same tool for regular planning and scheduling has the advantage that people are already familiar with it and are therefore more adept at managing the recovery.

What makes DRAPE unique is that it employs the power of TaskMaster to manage the total recovery process from a laptop: there is no need for special infrastructure to accommodate the DR needs, and copies of the system can be carried by members of a designated DR organization to minimize the risk that the DRAPE facilities might not be available when absolutely needed.

Independent of the TaskMaster capabilities, the most important part of DRAPE is business continuity planning: preventing disaster events from having an adverse impact on the survival of the business.

Personal Safety

Regardless of the TaskMaster capabilities for managing a disaster, it is important to stress that the underlying methodology emphasizes that the safety of individuals trumps everything in the preparation process. It is easy to replace equipment; you cannot replace people lost due to the disaster event. The focus of what we explore will be on material things that can be replaced, and the process by which we ensure replacement is a viable option.

Step 1: Risk Identification

Risks are things that can happen and that may or may not affect our ability to continue to provide a service. Risks are not specific to our operations. In fact, we will expect many more risks to be identified than we have to compensate for, to provide an audit trail of our analysis efforts. If we can mitigate a risk so that no further exposure exists, it is no longer an issue. For example, we carry a spare tire to eliminate the exposure of a flat tire in the parking lot, but we cannot eliminate the exposure against a blowout simply by carrying a spare tire. The latter then needs to be further analyzed.

Searching for potential risks is not a matter of fear or uncertainty. It is a matter of educating both yourself and your peers of what sort of things to be prepared for, and how to respond in case an event occurs that requires you to deal with a major business interruption. The only way you can prepare for possible risks is to recognize the existence of these risks. What is more, this is only a small sampling of potential risks that seem to be somewhat more likely to occur.

At the earliest stages you will want to create an outline of what the Business Continuity Plan for each site looks like. A detailed outline is provided in this course. At this early stage the intent is simply to attest to the fact that we have a BCP initiative underway, and that we can identify the people currently working on the BCP. It is important to focus on priorities that make the difference in a disaster situation. From the start you have to establish the credibility of the BCP effort by demonstrating that you are not tilting at windmills.

Step 2: Loss Exposure & Consequences

An exposure assessment explains how selected risks may impact on our operations. The exposures are what we really want to track from a risk management point of view. We need to know the frequency and likelihood of exposures in order to establish a statistical value of potential costs due to these exposures. The intent is to then prioritize the exposures in order to deal with the major and/or most likely disruptions. The purpose of the exposures inventory is to demonstrate that all exposures were taken into account.
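As a rough illustration of how likelihood and cost can be combined into a statistical value for ranking exposures, the following Python sketch multiplies an assumed annual probability by an assumed cost per event. The class, the field names, and all figures are hypothetical; they are not taken from TaskMaster or from the course materials.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    name: str
    annual_probability: float  # assumed chance of occurrence in any one year (0..1)
    cost_per_event: float      # assumed loss if the event occurs

    @property
    def expected_annual_cost(self) -> float:
        # Statistical value of the exposure: probability times cost.
        return self.annual_probability * self.cost_per_event


def prioritize(exposures):
    """Return exposures sorted from highest to lowest expected annual cost."""
    return sorted(exposures, key=lambda e: e.expected_annual_cost, reverse=True)


# Hypothetical exposures inventory for one site.
exposures = [
    Exposure("Burst water pipe", 0.05, 100_000),
    Exposure("Server room fire", 0.01, 2_000_000),
    Exposure("Regional power outage", 0.20, 50_000),
]
ranked = prioritize(exposures)
```

A ranking like this only orders the financial exposures; it says nothing about exposures that involve personal safety, which are treated separately.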

The next step is to determine how likely it is that the business interruption will occur as a result of a particular exposure. This is often a subjective assessment, and it will vary from site to site. You can use a relative scale of 1 to 5, but it is best to put a more definitive measure on a risk, expressed as “1%/year” for example. We understand that such a risk could occur in any year, not only after 99 years, but the figure puts the risk into a clearer perspective. Why is an up-front assessment such an important first step? The U.K. National Audit Office states that 80% of all companies that undergo a major fire will never actually recover.
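To put a per-year figure into perspective over a planning horizon, one simple approach is to compute the chance of at least one occurrence within a number of years. The sketch below assumes that each year is independent and carries the same probability, which is a simplification; the function name is hypothetical.

```python
def probability_within(annual_probability: float, years: int) -> float:
    """Chance of at least one occurrence within the given number of years.

    Assumes each year is independent and has the same probability.
    """
    return 1.0 - (1.0 - annual_probability) ** years


# Hypothetical example: an annual probability of 1% (0.01).
over_ten_years = probability_within(0.01, 10)
over_hundred_years = probability_within(0.01, 100)
```

Under these assumptions, an annual probability of 1% gives roughly a 9.6% chance of at least one event within 10 years and roughly a 63% chance within 100 years.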

The analysis process can be facilitated using a simple exposures calculator system that we impose on the TaskMaster scheduling tool. This is a basic Excel™ workbook in which each worksheet can show a different part of the exposure analysis results (consistent with our suggestion to break up large parts of the operation into its component parts that can be restored separately). A typical breakdown is one location, or even down to one floor if practical, to prepare for a future disaster recovery schedule. With the worksheet we can simulate the effect of outages of key components in the process so that we can calculate the relative cost for each contributing factor that we want to consider.
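The kind of calculation such a worksheet performs can be sketched in a few lines of Python. This is not the course's Excel workbook; the component names, hourly costs, and outage durations below are invented for illustration, and the function is a hypothetical stand-in for one worksheet.

```python
def outage_cost_breakdown(hourly_cost, outage_hours):
    """Return (cost per component, share of total cost per component).

    hourly_cost: {component: assumed loss per hour of outage}
    outage_hours: {component: assumed hours out of service in this scenario}
    """
    costs = {name: hourly_cost[name] * hours for name, hours in outage_hours.items()}
    total = sum(costs.values())
    shares = {name: (cost / total if total else 0.0) for name, cost in costs.items()}
    return costs, shares


# Hypothetical hourly loss per key component on one floor.
hourly_cost = {"order entry": 1_000, "printing": 400, "shipping": 600}
# One "what if" scenario: hours each component is assumed to be out.
scenario = {"order entry": 8, "printing": 24, "shipping": 4}

costs, shares = outage_cost_breakdown(hourly_cost, scenario)
```

Running the same function against several scenarios shows which component contributes most to the cost in each case.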

Step 3: Impact Analysis

As a result of actual exposures, there will be potential consequences to your operations. This is not simply a cost analysis: there may be people’s lives involved in a number of the potential risks. Consequences set the priority for further analysis, because if the consequences are minor it is generally cheaper to live with these consequences. More to the point of the planning exercise, we need to focus on those consequences that require a specific mitigation plan, and explain why the other exposures will not be mitigated. It is important to acknowledge that simply implementing a response process cannot mitigate all consequences, and that consequences may be itemized not only for financial exposure reasons but also for personal injury exposure reasons. It is difficult to estimate the value of an injury, so it will be assumed that all such exposures will lead to a prevention analysis by default. Therefore, only the financial exposures will be further examined.

Determining consequences and how to deal with the consequences sounds easier than it is. People view the potential consequences differently and, as illustrated, you may only have little pieces of solid ground to depend on as you try to make your way to building a “DRAPE” case. The most difficult task is to ferret out those events that merit further attention and effort to establish mitigation plans, knowing full well that there is always a risk that the event that seemed unlikely in theory is the one that ends up sending an unsinkable ship to the bottom. It is also very difficult to determine the benefit of risk mitigation.

An impact analysis of all business units that are part of a common environment enables the BCP team to identify critical systems, processes, and functions, so that the appropriate mitigation can be put in place. By looking at the total picture you can develop a much more accurate picture of the anticipated economic impact of incidents and disasters. The consequence is that you must establish how you are going to mitigate the problems identified in step 2 by considering priorities based on the overall business needs of the organization. You need to get an accurate picture of how long different business units can live without critical systems in order to consider how you can restore the most critical systems first, keeping in mind that many recovery efforts will be asynchronous initiatives. The exposures assessment results in a profile of recovery requirements that form the basis for considering alternative recovery strategies.
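One simple way to turn "how long can each unit live without its systems" into a restoration priority is to sort the units by tolerable downtime. The sketch below assumes a single figure in hours per unit; the unit names and values are hypothetical.

```python
def restoration_order(tolerable_downtime_hours):
    """Return business units ordered from least to most tolerable downtime."""
    return sorted(tolerable_downtime_hours, key=tolerable_downtime_hours.get)


# Hypothetical tolerable downtime per business unit, in hours.
tolerable = {
    "payroll": 72,
    "call center": 4,
    "order processing": 8,
    "archives": 240,
}
order = restoration_order(tolerable)
```

The result is a priority list, not a strict sequence: because many recovery efforts run asynchronously, several units may be restored in parallel.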

Step 4: Mitigation Opportunities

This is an opportunities evaluation, to see how various consequences can be mitigated. This is the core of the Business Continuation Planning efforts, to see what can be done and how we can minimize the probability of an event occurring, as well as to determine what needs to be done should an event occur anyway. These follow-up activities will be detailed in steps 5 and 6 with prevention projects and action plans for cases where the impact is significant. The purpose of an impact analysis is to confirm that a recommended solution will work. The analysis is performed using simulation tools that follow the alternative workflows under a number of different stress conditions. The idea is to follow the workflow and to calculate the resources required at each step along the way. The next step is then to establish what happens if a resource becomes unavailable.
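The idea of following a workflow, calculating the resources needed at each step, and then removing a resource can be sketched as follows. This is a deliberately simplified model and not the simulation tool used in the course: each step is checked on its own, resources are not shared between steps, and all names and quantities are hypothetical.

```python
def resource_shortfalls(steps, volume, available):
    """Walk the workflow and report, per step, any resource that falls short.

    steps: list of (step name, {resource: units needed per job})
    volume: number of jobs to push through the workflow
    available: {resource: units on hand}; a missing resource counts as 0
    """
    report = {}
    for step, needs in steps:
        short = {}
        for resource, per_job in needs.items():
            required = per_job * volume
            on_hand = available.get(resource, 0)
            if required > on_hand:
                short[resource] = required - on_hand
        if short:
            report[step] = short
    return report


# Hypothetical three-step workflow and resource pool.
steps = [
    ("intake", {"staff_hours": 1}),
    ("processing", {"staff_hours": 2, "machine_hours": 1}),
    ("delivery", {"vehicle_hours": 1}),
]
available = {"staff_hours": 100, "machine_hours": 40, "vehicle_hours": 40}

normal = resource_shortfalls(steps, 40, available)    # normal volume
stressed = resource_shortfalls(steps, 60, available)  # stress condition
# "What if" the machines become unavailable at normal volume?
no_machines = resource_shortfalls(
    steps, 40, {"staff_hours": 100, "vehicle_hours": 40}
)
```

An empty report means the workflow copes with the scenario; any entry names the step that stalls and the size of the gap.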

The BCP analysis effort can become very complex very quickly, so an automation tool is required. In particular when you have multiple sites that act as a (partial) backup for each other, the impact must be assessed across all unaffected sites. In the process overview the impact analysis is, therefore, depicted as a separate activity. Step 4 is where we match wishful thinking with the reality of what can likely be accomplished in terms of restoring processing abilities. Before we consider the use of TaskMaster as a tool we have to understand the basics of how to analyze the process using a DRAPE approach.
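For sites that back each other up, a first-pass check is whether the surviving sites have enough spare capacity to absorb the failed site's work. The sketch below fills spare capacity in the order the sites are listed, which is an assumption made for brevity; site names, capacities, and loads are hypothetical.

```python
def redistribute(sites, failed):
    """Spread the failed site's load over the spare capacity of the other sites.

    sites: {site name: (capacity, current load)} in the same work units
    Returns (extra load assigned per surviving site, load left unplaced).
    """
    remaining = sites[failed][1]
    assigned = {}
    for name, (capacity, load) in sites.items():
        if name == failed:
            continue
        # A surviving site can take at most its spare capacity.
        take = min(max(capacity - load, 0), remaining)
        assigned[name] = take
        remaining -= take
    return assigned, remaining


# Hypothetical sites: (capacity, current load).
sites = {"north": (100, 80), "south": (120, 70), "west": (60, 50)}
assigned, unplaced = redistribute(sites, "north")
```

Any unplaced load is the gap between what is hoped for and what the remaining sites can actually process, and it has to be covered by another recovery option.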

After you have identified critical and necessary business functions and you know what it takes to recover these functions, you must figure out how to accomplish the recovery. The analysis of consequences will provide the information you need to assess available recovery options and the costs associated with different options. The impact analysis will evaluate whether recovery is feasible within the specific time limits or whether consequential costs are involved. Each consequence must be further evaluated in terms of business continuity (setting aside for now the question of personal injury prevention, which is assumed to be a given). The purpose of a detailed analysis is to verify whether the planned intervention (the solution design) will be adequate to deal with the expected disaster scenarios. The business process analysis is focused on how services are maintained from a customer perspective. Typically the customer could not care less where and how you provide the service, so long as it meets the SLA criteria.

Step 5: Prevention

The purpose of the impact analysis is to determine what can be done to prevent the impact by incorporating early warning opportunities. While not strictly part of the response in the sense of a traditional Business Interruption plan, it makes sense to do what is reasonably possible to prevent the consequences from impacting the operations in the first place. Prevention does not necessarily mean that an event response is no longer a priority, but it may lower the probability of occurrence and thus the imputed event cost. It is one thing to decide that we have a good plan, but it is still necessary to determine how we can reduce the likelihood that the plan will ever be required: prevention is still the preferred option no matter what. It is important to focus on the many small things that you typically can do to prevent a problem from escalating and impacting on your operations.
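The effect of prevention on the imputed event cost can be illustrated with a small calculation. The sketch assumes that the imputed cost is the annual probability multiplied by the cost of the event, and that a prevention measure has a known yearly cost; both functions and all figures are hypothetical.

```python
def imputed_cost(annual_probability, event_cost):
    """Imputed yearly cost of an event: probability times cost (assumed model)."""
    return annual_probability * event_cost


def prevention_net_benefit(p_before, p_after, event_cost, annual_prevention_cost):
    """Yearly reduction in imputed cost minus the yearly cost of prevention."""
    saving = imputed_cost(p_before, event_cost) - imputed_cost(p_after, event_cost)
    return saving - annual_prevention_cost


# Hypothetical measure: lowers the annual probability from 5% to 1%
# for an event costing 500,000, at a yearly cost of 8,000.
net = prevention_net_benefit(0.05, 0.01, 500_000, 8_000)
```

A positive result suggests the measure pays for itself in purely financial terms; the response plan for the event is still needed, because the probability has been lowered, not removed.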

The prime objective of recovery planning is to enable an organization to survive a disaster and to continue normal operations. We have noted that a BCP that becomes outdated is of little use to the organization. This raises the question of who will maintain the BCP once the original development project is completed. It may be clear to management that the BCP is an important aspect of operations, but that does not necessarily translate into a career opportunity for an employee assigned to maintain this critical function, which consists of periodically reviewing the analysis to determine if any aspects have changed. This support role is perhaps the most critical of all the preventive initiatives, but it is also vulnerable to being discontinued whenever there is a change in leadership (or ownership) in the organization. In that respect it is similar to data security, quality assurance, and other audit functions that are scrutinized as not necessarily contributing to the bottom line.

When all preventive actions have been considered, and hopefully scheduled for implementation, it is important to consider how an incident will be managed. The BCP itself explains the strategy for the recovery, but the tactical aspects of how to achieve that BCP are still to be worked out in a plan that is not unlike any project plan. The difference is that this project plan works not in days but in hours or parts thereof. This calls for a special scheduling tool that is unlike any regular project management system, but the first step is to establish a detailed plan of actions to achieve a full recovery in the least amount of time possible. We will look at software aspects of this effort at the end of the Step 6 description (Incident Management Execution).
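An hour-based action plan with prerequisites can be sketched with the Python standard library. This is not how TaskMaster works internally; it is a minimal forward pass that assumes unlimited staff, durations in hours, and prerequisites that all appear as tasks in the same plan. The task names and durations are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def earliest_schedule(tasks):
    """Compute earliest start and finish (in hours) for each recovery task.

    tasks: {task name: (duration in hours, [prerequisite task names])}
    Returns {task name: (earliest start, earliest finish)}.
    """
    graph = {name: set(prereqs) for name, (_, prereqs) in tasks.items()}
    schedule = {}
    # static_order() yields every task after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        duration, prereqs = tasks[name]
        start = max((schedule[p][1] for p in prereqs), default=0.0)
        schedule[name] = (start, start + duration)
    return schedule


# Hypothetical recovery plan.
tasks = {
    "assess damage": (0.5, []),
    "activate backup site": (2.0, ["assess damage"]),
    "notify staff": (1.0, ["assess damage"]),
    "restore data": (4.0, ["activate backup site"]),
    "resume operations": (0.5, ["restore data", "notify staff"]),
}
schedule = earliest_schedule(tasks)
total_hours = max(finish for _, finish in schedule.values())
```

The latest finish time is the shortest possible recovery time for this plan; tasks with no common prerequisite chain, such as notifying staff and activating the backup site, can run in parallel.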

Step 6: Incident Management Execution

Hopefully we will never have to execute recovery processes in response to a major disaster. If it is necessary, however, we need to have a predefined response plan that can be invoked with minimal effort to start the corrective actions immediately. This is the output of the DRAPE process that everyone must become familiar with, so that if an event occurs people will know what the appropriate response should be. The emphasis must be on making sure all aspects of the response execution are fully understood, in order to keep the impact of the event to a minimum if possible. It is important to be prepared to invoke the execution of a Business Continuity Plan. Each site must understand what to do in order to cope with events that affect that site, as well as what is expected to help another site that is impacted by an emergency event. There will be enough confusion that it is not reasonable to expect people to think about the things they have to do next. The execution plan will tell them what to do next. It is also important to focus on keeping a business alive, to provide time for people to come up with more permanent solutions.

Based on the detailed information preceding this section it should be clear that an Emergency Control Center (ECC) is no luxury, and that an Incident Commander has a lot to be concerned with. For this reason the Incident Management process is summarized in this section (for the purpose of our guide it makes no difference if there is an automated process involved). This is part 2 of the Incident Management process, continuing where we left off after we prepared for this eventuality, to adapt the action plan to the exact conditions and to execute the plan implementation. In most cases this comes down to rating the disaster as explained before:

  • Destruction: In this case there will not be a salvage of work in progress, but an emphasis to divert the work to a backup site and to get emergency supplies that allow the backup site to finish the work. An example of destruction is a major fire that engulfs the site.
  • Major Outage: In a major outage the plant is not destroyed but inoperable, so it is possible to salvage the work in progress and move it to the backup site to finish the work. Also move new work to the backup site. An example of a major outage is a flood that shuts the plant.

There will be 3 sets of action plans for each Business Continuity Plan. While there are several highly sophisticated software systems available for this purpose, the example we use is based on a simple utility that handles the most important aspects of such action plans. We assume that an ECC may have to operate under sparse conditions, and we do not want to have to depend on any complex systems that cannot be operational without a sufficient infrastructure. In other words, if it does not run on a stand-alone laptop it is too complex to be a practical tool for actual use.

Learning Formats       PMBP_DRA

This course is currently available in a classroom setting (public or company private) with approximately 60 contact hours (10 days).

PDF – Certificate Of Completion

Each course offers a certificate of completion that identifies the course, the student, and a brief description of the course. To receive a certificate the student must have attended at least 80% of the course sessions. This personalized certificate is forwarded to the student by Email.

PDF – Course Notebook

Each course includes a notebook in PDF format that provides the minimum knowledge the student must master in order to obtain the certificate. In the notebook you will find references to other study materials. Students receive the notebook by Email when their registration is confirmed.

PDF – Program Overview

An overview of this study program can be downloaded from the website by right-clicking on the program link on the enquiry page.

PDF – Current Training Schedule

A list of upcoming training sessions can be downloaded from the website by right-clicking on the schedule link on the enquiry page.

Registration – Service Providers

To register for any training course please look on the enquiry link page of your service provider (from where you accessed this website). On the page you will find a registration request form where you can order the course that you are interested in. The availability dates will be provided to you, along with payment instructions if you decide to go ahead.