Remember back in school, when every once in a while a drill would wake you up in class?
I’m sure you at least had fire drills (maybe tornado drills, too). Your teacher probably had specific instructions on what to do. In the event of a fire, the sure trigger to take the evacuation route was some really annoying alarm that made everyone in the building ready to leave. You were probably directed out of the building in a single orderly line to a pre-designated meeting point.
These drills were planned to ensure everyone’s safety.
Instead of simply sounding an alarm and hoping for the best, fire drills are designed specifically to ensure that students and staff are accustomed to evacuating the building and to ensure that everyone knows exactly what procedure to follow. An exacting evacuation procedure—in theory—would save lives by minimizing risks.
IT recovery drills do the same thing.
Maybe you call it a disaster recovery plan (DRP) or a business continuity plan (BCP). Whatever its name, the bottom line is that recovery plans are an incredible tool, like the fire evacuation procedure back in school, for creating order and familiar routines out of the chaos of an emergency.
Your recovery plan will help you stay proactive about data security; I’ve had to guide and use these plans in various situations. BUT a recovery plan is only good if you and your team test it.
One of the biggest misconceptions within organizations, still in 2019, is that they can set and forget their recovery plans. As your organization changes (new team members, new clients, new focus areas, and new technology), how can you ensure that a recovery plan established 2, 5, or 10 years ago will still work?
Your disaster recovery plan is only usable if it’s tested and working.
Your disaster recovery plan should have defined goals. I’m not talking about “have the server up and running in 20 minutes.” When I test recovery plans, I prefer to ask questions like “how good are communications between departments or team members?” or “how does added stress affect the way your team members interact with each other, especially those on your IT team?”
My goal would be to answer questions like these: execution-related questions that really get your team prepared and thinking about what they are doing before they walk into a simulated disaster.
Strategic questions will give you an idea of how well you are prepared—whether your plan will work in real life or whether you have to go back to the drawing board.
Make sure you test a variety of variables to see how they affect your recovery plan’s execution. Coming back to that fire drill example above, let’s say half of your school’s students had some kind of hearing disability.
Your plan (derived from another school) probably just instructs students and staff to react to the ringing alarm. If you went ahead with a test of that plan, would it be as effective with your school? Probably not. A strategic question you might have asked is “will our school community be able to respond to traditional fire alarms?” Make sure to address the specific strategic questions pertinent to you and your team.
Certainly, if your server crashes, your IT team will be trying to get it back up as fast as possible (maybe you already have a drill in place for your server-crash response plan).
Instead of looking at just the speed at which your IT crew gets the server up and running (or has a replacement going in the case of a complete meltdown), make sure you observe how well communication is working. Do they ask for help when they need it? Will they include your team in regular updates as to what has been done and next steps? As you go through different scenarios and angles in your recovery, make sure you test each piece—not just the technology, but the softer sides of recovery (client communication, internal team preparedness, alternative work modes—just to name a few).
Get everyone on the same page.
Maybe this is a no-brainer, but get your entire team involved, technical and non-technical alike, so that everyone is on the same page from the initial response through completion of the plan and everything runs smoothly.
Make sure you and your team have the appropriate documentation so that they can refer to specifics as needed. One warning: if your team is performing disaster recovery purely from documentation, it will probably be difficult even without the chaos of a real disaster. The more you test and prime your team for events (natural disaster, ransomware, or other outages), the better equipped they will be to respond and react.
Make sure you practice disaster recovery often.
If it’s been over a year since you last ran a test, will that plan still work? How much changes in your organization within 90 days (let alone over a year)? Figure out how often your plan needs to be tested, and stick to that interval. Most experts advise testing at least the critical parts of your plan quarterly and reviewing the plan in its entirety once a year.
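If you want something more concrete than a calendar reminder, the quarterly and annual rule of thumb can be sketched in a few lines of Python. This is only a rough illustration: the component names, the `PlanComponent` structure, and the exact 90-day and 365-day cutoffs are my own assumptions, so adjust them to whatever interval fits your organization.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed cutoffs, based on the rule of thumb above:
# critical parts quarterly, everything else at least once a year.
CRITICAL_INTERVAL = timedelta(days=90)
FULL_REVIEW_INTERVAL = timedelta(days=365)


@dataclass
class PlanComponent:
    name: str          # hypothetical label for one piece of the plan
    critical: bool     # True if this piece belongs in the quarterly test
    last_tested: date  # date of the most recent drill for this piece


def overdue_components(components, today):
    """Return the names of components whose last test is older than allowed."""
    overdue = []
    for component in components:
        limit = CRITICAL_INTERVAL if component.critical else FULL_REVIEW_INTERVAL
        if today - component.last_tested > limit:
            overdue.append(component.name)
    return overdue


if __name__ == "__main__":
    # Made-up example data, not taken from any real plan.
    plan = [
        PlanComponent("server restore", True, date(2019, 1, 15)),
        PlanComponent("client communication", True, date(2019, 4, 20)),
        PlanComponent("alternative work modes", False, date(2018, 3, 1)),
    ]
    for name in overdue_components(plan, date(2019, 6, 1)):
        print(f"Overdue for a drill: {name}")
```

A list like this will not run the drill for you, but it makes it harder to lose track of how long ago each piece of the plan was last exercised.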
Take good notes.
As you test your current plan, make sure you document the process as you go. If there are uncertainties or points needing clarification, jot down how to improve the plan for a smoother recovery the next time around. Make sure everyone is on the same page and moving in unison as the recovery process proceeds.
Reassess.
After the test is complete, get back together with your team for a debrief meeting. Get their insights on what went well and what could have run better. Realize that each team member has a unique perspective on the recovery problem and that they might see things you or your IT team are not attuned to. The more input you get on your test runs, the more smoothly the real deal will run if something ever does happen.
Bottom line: your plan is only as good as the testing behind it. If you have good communication and a team that clearly understands what needs to be done, a tested plan will put you leaps and bounds ahead of where you would end up if you never tested or updated your disaster recovery plan.
One last note: Drills are not only good for disaster recovery, but also for compliance. Make sure you are actively participating and engaging your team in drills to ensure your business is protecting its data and keeping you running.