IT disasters may be caused by nature, IT equipment failure, or human factors, and all of these factors must be considered in disaster planning. But while 80% of IT organizations have disaster recovery (DR) plans, only 40% of them test their plans regularly.
Testing of IT disaster recovery involves examining and practicing each step in the DR plan so that critical data can be recovered, business applications can be restored, and normal operations can resume as quickly as possible after the disaster.
Testing to Identify and Correct Gaps in the Recovery Process
Think about baseball fielding practice for a minute. The team rehearses their actions against line drives, pop-ups, fielder’s choice situations, left-handed batters, and other scenarios so that in a real game, everyone knows what to do when faced with these situations. Disaster recovery testing is similar. Testing your DR plan helps you identify problem areas so you can correct them and be prepared when the real thing happens.
Disaster recovery testing has to be repeated regularly, because infrastructure, business processes, and personnel change, and these changes must be integrated into the DR plan. After each test, document successes, failures, and other findings so you can improve the DR plan and everyone is ready for the next test or an actual disaster. DR plans are complex, and all the “moving parts” need to be tested periodically. Disasters are chaotic, and a team that has rehearsed a rational response is less likely to panic or make avoidable mistakes.
A Checklist for DR Testing
If you’re not sure where to start, here is a checklist for planning your DR test or drill.
• Spell out the objectives of the drill
• Create a drill scenario
• Develop and document a drill process
• Notify all personnel involved, including end-users and management
• Define responsibilities of IT team members
• Include all infrastructure in the drill planning process: databases, firewalls, networks, applications, hardware, etc.
• Explain to management why you’re doing the test. Management buy-in is important.
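If you track drill planning in a script or internal tool, the checklist above can be sketched as a simple record that reports which items are still open. This is only an illustration: the class name, field names, and sample values are hypothetical, and it assumes each checklist item can be captured as a short text entry or a yes/no flag.

```python
from dataclasses import dataclass, field


@dataclass
class DrillPlan:
    """Hypothetical record of one DR drill; all field names are illustrative."""
    objectives: list[str] = field(default_factory=list)
    scenario: str = ""
    process_doc: str = ""  # where the documented drill process lives
    notified: list[str] = field(default_factory=list)  # groups told about the drill
    responsibilities: dict[str, str] = field(default_factory=dict)  # team member -> duty
    infrastructure: list[str] = field(default_factory=list)  # systems in scope
    management_briefed: bool = False

    def missing_items(self) -> list[str]:
        """Return the checklist items that have not been filled in yet."""
        checks = {
            "objectives": bool(self.objectives),
            "scenario": bool(self.scenario),
            "process_doc": bool(self.process_doc),
            "notified": bool(self.notified),
            "responsibilities": bool(self.responsibilities),
            "infrastructure": bool(self.infrastructure),
            "management_briefed": self.management_briefed,
        }
        return [name for name, done in checks.items() if not done]


# Example with made-up values: only the first two checklist items are done.
plan = DrillPlan(
    objectives=["Restore the order database from backup"],
    scenario="Primary data center loses power",
)
print(plan.missing_items())
```

Running the example lists the five items that still need attention before the drill can go ahead.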
After your first DR drill, you’ll probably discover several things you didn’t think of beforehand. This is good, because you can now improve your DR plan and cover these things in your next DR drill.
Best Practices for Disaster Drills
Focus on people. It’s critical to assign responsibilities and make sure everyone knows who is responsible for what. Your plans should include contingencies for staff problems. For example, if there’s a hurricane and some of your team members are dealing with storm damage at home, you have to plan how you’ll carry out your DR plan with a reduced team.
Include unannounced drills. Regularly scheduled drills are great, but people are prepared for them. The occasional unannounced drill (known about only by key personnel, and perhaps conducted after hours) is important too, because it lets you see how people react when they don’t have time to prepare.
Follow-up is essential. You may be discouraged if someone drops the ball during a drill, but now you know what doesn’t work and can address the problem. Documentation should involve analyzing each phase of the plan that was rehearsed and determining what went right and what went wrong. With this information, you can modify your DR plan accordingly. If you don’t follow up after a drill, you can’t expect improvement next time around.
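One possible way to structure that follow-up is to record each rehearsed phase with its outcome, then pull out the phases that failed as candidates for DR plan changes. The sketch below is a minimal illustration: the `PhaseResult` type, the `follow_up_items` helper, and the sample phases and notes are all hypothetical, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class PhaseResult:
    """Hypothetical outcome of one rehearsed phase of the DR plan."""
    phase: str
    succeeded: bool
    notes: str = ""


def follow_up_items(results: list[PhaseResult]) -> list[tuple[str, str]]:
    """Return (phase, notes) for every phase that went wrong during the drill."""
    return [(r.phase, r.notes) for r in results if not r.succeeded]


# Made-up drill results for illustration.
results = [
    PhaseResult("Restore database backup", True),
    PhaseResult("Fail over network", False,
                "Firewall rules at the recovery site were out of date"),
]

for phase, notes in follow_up_items(results):
    print(f"Revise plan: {phase} ({notes})")
```

Keeping the notes next to each failed phase makes it easier to turn the drill report directly into changes to the plan.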
Mobile Devices and DR
Know how mobile devices fit into your disaster planning. A mobile device could itself cause a disaster (for example, if someone leaves a laptop full of mission-critical data in the trunk of a rental car), and you have to be prepared. Your mobile device or BYOD policy should fully address this type of situation, covering both prompt reporting of incidents and the actions taken as a result (such as remote wiping of lost devices).
Mobile devices may also be part of your recovery plan. If offices are flooded, employee mobile devices could be used to resume some business processes (like cloud-based processes) so business is not halted completely.
Testing DR procedures is a proactive step that matters as much for minimizing the impact of a disaster as having a DR plan in the first place. With the speed at which business operates today, even a brief interruption in operations can be extremely costly. How your organization reacts can affect both the company bottom line and the company reputation. Having a disaster plan is necessary but not sufficient. You also need to put that plan to the test regularly to maximize preparedness.
When you use Samanage as your IT service management solution, your IT asset management program and IT service desk run in the cloud. This means they can be resumed quickly after a disaster, giving your team one less thing to worry about.