Most organizations recognize the need for disaster recovery plans (DRP) and business continuity plans (BCP), but how many can claim that their plans are up to the task? If a disaster or disruption were to occur tomorrow, would your plans provide the guidance to keep your business functioning?
According to a StorageCraft report, a third of surveyed businesses were unable to adequately respond to a disruption, despite having plans in place. Are yours any better? The probability that your plans will help you weather a disruption and survive it is directly related to how well you have tested your plans. An untested plan is little more than a wish list.
Testing recovery plans accomplishes two things at once: it validates the plans, and it trains personnel on what to do during a disruption. Effective plans are those that are fresh, pertinent, actionable and well understood.
Getting to that point takes some work, but it is worth the effort. Let’s look at some simple strategies that help make sure your plans are ready when a crisis hits.
Planning to test
Every business needs both a BCP and a DRP. The BCP contains the tasks for continuing business operations through service disruptions, while the DRP guides personnel through the process of recovering from a disaster and returning the business to an operational state.
A common perception is that both plans should be created and then just filed away until needed. This is far from the truth. Recovery plans begin to lose their value almost immediately after being completed. The longer they sit unattended, the less value they bring to any recovery effort.
Because these plans consist of steps to recover from a disaster or service interruption, they are only as good as the accuracy of their contents. Every time your environment changes, the plans have to change too or they will become increasingly obsolete. Although a general periodic review of plans is a very good practice, you really have to test them to ensure that they are effective. There are three primary types of recovery tests: checklist, full interruption and tabletop simulation.
Checklist test
A checklist test is a read-through of each plan to ensure the contents are valid. This can be done by a single individual or in a group. This type of test helps ensure that all the information in the plan is up to date and applicable to the current environment. It requires only minimal resources and can be repeated frequently enough to keep the contents fresh. The main drawbacks to a checklist test are that it can be repetitious and boring, and that it only checks what is already written down: additions to an environment, such as a new server that the plan never mentions, can be overlooked.
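One way to reduce the risk of overlooked additions is to compare the systems named in the plan against a current inventory before the read-through. The following is a minimal sketch of that idea, not a prescribed method: the function name `find_plan_drift` and the host names are hypothetical, and it assumes you can export both lists from your own plan and inventory records.

```python
def find_plan_drift(plan_hosts, current_hosts):
    """Compare the systems a plan documents with the systems that exist today."""
    plan = set(plan_hosts)
    current = set(current_hosts)
    return {
        # Present in the environment but never mentioned in the plan
        "missing_from_plan": sorted(current - plan),
        # Mentioned in the plan but no longer in the environment
        "no_longer_present": sorted(plan - current),
    }


# Hypothetical example data; replace with your own plan and inventory exports.
plan_hosts = ["db01", "web01", "web02", "mail01"]
current_hosts = ["db01", "web01", "web02", "web03"]

drift = find_plan_drift(plan_hosts, current_hosts)
for host in drift["missing_from_plan"]:
    print(f"Not covered by the plan: {host}")
for host in drift["no_longer_present"]:
    print(f"In the plan but no longer in the environment: {host}")
```

A comparison like this only flags differences in the lists you give it; someone still has to decide how the plan should change for each item.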
Full interruption test
At the opposite end of the spectrum from a checklist test is a full interruption test. This type of test is the most thorough exercise of any plan, but it is also the most disruptive. A true full interruption test requires that service literally be interrupted for a BCP or that the primary infrastructure be disconnected to test a DRP. Either of these disruptions is likely to have a substantial impact on your business operations, even when the plan works perfectly, and if your plan is ineffective in any way, the disruption will likely be worse. Although this is a great test of plans, it is almost always too risky to carry out.
Tabletop simulation
A great compromise between the severity of the full interruption test and the limited value of a checklist test is the tabletop simulation. In effect, this type of test is a role-playing game. Tabletop simulations are group exercises in which personnel “play the game” as themselves. The facilitator leads the group through scenarios as participants react as they would when facing real situations.
Regardless of which type of test you choose, use it to both exercise your plans and train personnel. When you plan the test event, encourage all participants to attend and engage in the process. Provide ample time and motivation for participants to review their parts of the plan and be completely familiar with its contents; test day is not the time to learn what is in the plan. Prepared personnel make a smooth recovery from a disruption possible.
For example, on September 11, 2001, the U.S. Federal Aviation Administration (FAA) ordered all planes in or approaching U.S. airspace to land immediately. An Aviation International News article reported that it took 17,500 air traffic controllers to get some 4,300 aircraft on the ground in an orderly manner. They were able to do this because the FAA had plans for clearing the airspace that dated back decades. Controllers were required to walk through these plans periodically, even though most of them never expected to use them.
Plan to test your DRP and BCP frequently enough to make sure that the plans and personnel are ready to go when the need arises.
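How frequent is "frequently enough" depends on how fast your environment changes, but whatever interval you choose, it helps to track it. The sketch below flags plans whose last test is older than a chosen threshold. The function name `overdue_plans`, the dates and the 180-day default are all assumptions for illustration, not recommendations from any standard.

```python
from datetime import date, timedelta


def overdue_plans(last_tested, today, max_age_days=180):
    """Return the names of plans last tested more than max_age_days ago."""
    limit = timedelta(days=max_age_days)
    return sorted(
        name for name, tested_on in last_tested.items()
        if today - tested_on > limit
    )


# Hypothetical test dates; replace with your own records.
last_tested = {
    "DRP": date(2023, 1, 15),
    "BCP": date(2023, 9, 1),
}

overdue = overdue_plans(last_tested, today=date(2023, 10, 1))
for name in overdue:
    print(f"{name} is overdue for a test")
```

With these sample dates only the DRP is flagged, because its last test was more than 180 days before the chosen "today".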
Conducting the test
When the day of the actual test arrives, encourage every participant to engage in the whole process, not just their part.
Assuming that you’ve chosen the tabletop exercise format for your test, you’ll have a room with personnel from various areas in your company. Try to have a representative from every main area. Include legal, human resources, public relations and any other support organizations. The goal is to present realistic scenarios and have each person react as they would in real life.
One scenario may include a fire or accident that requires contacting emergency personnel. Who makes the decision to call for help? Who makes the call? Should anyone else be notified? These are just some of the questions that can foster valuable discussions. Use the draft of your plan as a guide and document any gaps you encounter. (It is a good idea to have at least one person documenting the process who is not involved in leading the simulation.)
The simulation facilitator should introduce each scenario, being sure to make each one realistic and believable. The purpose is to foster communication among all participants and cooperation when resolving each challenge. As in many role-playing games, the facilitator, or game master, will introduce the scenario and then ask the group, “Now, what do you do?” This is the real value of the simulation: It provides each person with the opportunity to think through difficult situations in a controlled environment. Participation is its own training.
Learning from the experience
Although much of the value of tabletop simulations comes from the experience of participating, the process itself can continue to deliver value. Take careful notice of the parts of the plans that worked well and the ones that did not. If parts of the plan were awkward, changes or additions may make the process smoother. In short, do more of the things that went well, and find ways of limiting or changing the things that weren’t so smooth.
Continuously testing your DRP and BCP in this manner will provide you with trained personnel who are ready to act — and plans that you can be sure will be helpful in a crisis.
Article written by Michael Solomon PhD CISSP PMP CISM, Professor of Information Systems Security and Information Technology at University of the Cumberlands.