We face a constant two-pronged threat: on one side, the increasing number of climate- and energy-related events; on the other, the growing menace of ransomware and malware.
Both have the potential to knock your business off its rails and jeopardize future activity.
It’s time to take a serious look at the quality of your Disaster Recovery Plan.
More incidents mean greater risks
In recent months and years, we have witnessed a rise in the number of random incidents impacting IT production. Among these incidents:
• Meteorological events: floods, hurricanes, storms, wildfires…
• Power blackouts
• Virus/malware/ransomware incidents targeting a growing number of businesses of all sizes
These incidents generate IT downtime and impact business activity. Each business must analyze the risks and focus on impact mitigation. Concentrating IT infrastructure and data, often for economic reasons, into a single physical location can considerably increase the losses should a major incident occur in that location. Putting all your eggs in the same basket can have unfortunate consequences.
Increased risks weigh heavily on our global economy. We know that when a major IT equipment failure causes downtime and data loss lasting beyond 48 hours, around 40% of the companies affected fail in the short term.
What strategy should I adopt to reduce this company-threatening risk?
• Is my organization fully convinced of the risks entailed?
o How can I get the board to buy in and invest?
• Are we sufficiently protected against this type of risk?
o Sufficient versioned data copies? Spare equipment?
• What solution guarantees production restart within a 48-hour limit?
Get written sign-offs from management teams on how long it would take to get production up and running again, based on user availability requirements. Ensure each type of incident is detailed: total site loss, partial site loss, full cyberattack, targeted cyberattack, etc.
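As an illustration only, the sign-off review described above could be tracked with a small script like the one below. It is a minimal sketch: the `SignOff` record, the `review_sign_offs` helper and every figure in the sample data are hypothetical, and the 48-hour limit is simply the threshold discussed in this article.

```python
from dataclasses import dataclass

# Assumption: the limit is the 48-hour threshold discussed in the article.
MAX_RESTART_HOURS = 48


@dataclass
class SignOff:
    incident_type: str    # e.g. "total site loss"
    restart_hours: float  # restart time agreed in writing with management
    signed_by: str        # empty string means no written sign-off yet


def review_sign_offs(sign_offs, limit_hours=MAX_RESTART_HOURS):
    """Return (incident_type, problem) pairs that need attention."""
    problems = []
    for s in sign_offs:
        if not s.signed_by:
            problems.append((s.incident_type, "no written sign-off"))
        elif s.restart_hours > limit_hours:
            problems.append((s.incident_type, "restart time exceeds limit"))
    return problems


# Illustrative values only; real figures come from your own management teams.
sign_offs = [
    SignOff("total site loss", 72, "operations director"),
    SignOff("partial site loss", 24, "operations director"),
    SignOff("full cyberattack", 48, ""),
    SignOff("targeted cyberattack", 12, "IT manager"),
]

for incident, problem in review_sign_offs(sign_offs):
    print(f"{incident}: {problem}")
```

With the sample data, the review flags the total site loss scenario (agreed restart time above the limit) and the full cyberattack scenario (no written sign-off yet).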
Many firms have set about implementing partial local incident recovery plans. Many rely on redundant architecture with high availability and local backup. Local backup is important for rapid recovery, but beware of losing the backup along with production. Ensure your data protection solution includes viable off-site targets such as tape and cloud. Fireproof and floodproof racks exist for servers and backup appliances, and these can be considered off-site destinations even if they are installed on the same physical site.
When the incident takes on regional proportions (a targeted malware attack, for example), the lack of planning, with understandable stress thrown into the mix, will make it difficult to answer these simple questions: who does what, when and how?
The solution lies in how you plan for crisis and then how you execute this crisis management.
Defining available means and delimiting each person’s responsibility and role are key upfront considerations. Anticipate the incident by ensuring that a full Disaster Recovery Plan (DRP) and Business Continuity Plan (BCP) are in place. These plans need to be tailored to the type of incident encountered.
An emergency recovery plan for a malware/ransomware attack will obviously not have the same characteristics as an emergency flood or lightning contingency plan.
– Some will say: “We’re good. We’ve got bunkered DR sites in the USA and United Kingdom”.
Fine. But this does not yet constitute an authentic recovery plan.
A recovery plan is not just a compilation of technical means (servers, storage, network equipment, access to mail, power etc).
Are you sure that minimal production status will be met? Ask yourself this question: when did you last perform a successful test recovery of 500 GB of critical data (including file restores, critical applications and servers)?
– Others will say: “My service provider and/or data centre guarantees my data backups”.
Good. But the same rules apply.
– When was the last time you asked this service provider to do a full-scale test of your critical servers going down and being returned to full working order?
– Including the recovery of 3TB of critical financial and sales data?
– And even if they do recover correctly, how long will I need to wait to get full service restored or recover data locally?
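To put rough numbers on that waiting time, here is a back-of-the-envelope sketch. The `restore_hours` function and the throughput figures are assumptions chosen for illustration; only the 3 TB volume and the 48-hour limit come from this article. The estimate covers raw data transfer only and ignores hardware provisioning, application restarts and verification.

```python
def restore_hours(data_gb, throughput_mb_per_s):
    """Estimate the hours needed to restore data_gb at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_gb * 1000  # decimal units: 1 GB = 1000 MB
    seconds = total_mb / throughput_mb_per_s
    return seconds / 3600


CRITICAL_DATA_GB = 3000  # the 3 TB of critical data mentioned above
LIMIT_HOURS = 48         # the restart limit discussed in the article

# Assumed sustained throughputs, for illustration only.
scenarios = {
    "local restore at 100 MB/s": 100,
    "remote restore over a 100 Mbit/s link (12.5 MB/s)": 12.5,
}

for label, throughput in scenarios.items():
    hours = restore_hours(CRITICAL_DATA_GB, throughput)
    verdict = "within" if hours <= LIMIT_HOURS else "beyond"
    print(f"{label}: about {hours:.1f} hours, {verdict} the {LIMIT_HOURS}-hour limit")
```

Under these assumed throughputs, the local restore takes roughly 8 hours while the remote restore takes roughly 67 hours for the transfer alone, which is why a full-scale test with your real data volumes matters more than a contractual guarantee.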
A DRP must include a crisis management role with different actions based on formal written procedures. Most importantly, these procedures must be fully tested prior to a real-life incident.
None of this is set in stone and then filed away. A company is a living entity which evolves daily. Colleagues move on, infrastructures change, applications are updated, and data volumes grow. These changes must be integrated into a constant review of your DRP/BCPs.
Your DRP must reflect these evolutions
Your DRP is a body of documents and procedures describing complex operations. The plan must be checked and tested regularly by a colleague rather than the author of the procedure, who knows everything perfectly but might be ill or on vacation the day trouble strikes!
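One way to keep this rule visible is to record, for each procedure, who wrote it and who last tested it. The sketch below is hypothetical throughout: the `Procedure` record, the `needs_retest` helper, the sample entries and the 180-day retest interval are assumptions for illustration, not a recommendation from any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Procedure:
    name: str
    author: str
    last_tested_on: Optional[date]  # None if the procedure was never tested
    last_tested_by: Optional[str]


def needs_retest(proc, today, max_age_days=180):
    """True if the procedure was never tested, was only tested by its own
    author, or was last tested more than max_age_days ago."""
    if proc.last_tested_on is None or proc.last_tested_by is None:
        return True
    if proc.last_tested_by == proc.author:
        return True
    return (today - proc.last_tested_on) > timedelta(days=max_age_days)


# Illustrative entries only.
today = date(2024, 6, 1)
procedures = [
    Procedure("restore mail server", "alice", date(2024, 3, 1), "bob"),
    Procedure("restore finance database", "alice", date(2024, 5, 1), "alice"),
    Procedure("fail over to DR site", "carol", date(2023, 1, 10), "dave"),
    Procedure("rebuild file server", "dave", None, None),
]

for proc in procedures:
    if needs_retest(proc, today):
        print(f"Retest needed: {proc.name}")
```

In this sample, only the mail server procedure passes: it was tested recently and by someone other than its author.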
In the event of an incident, everything must be extremely clear for every player
This plan will save your company if it is correctly engineered and executed. It is a multi-faceted and time-consuming project which can represent up to 10% of your IT budget. You need healthy doses of methodology and experience to construct the plan: methodology so as not to lose track of the escalation procedures and specific application-related choices, and experience to rapidly detect whether a colleague can fully perform each procedure.
Perhaps it is more important to ensure the telephone and mail server are up and running first rather than looking to restore all IT applications.
Emergency plans will be documented and interlinked and ideally attached to emergency management software.
Investing in this type of software can help reduce timescales and costs and enable you to pilot your DRP in the event of an incident. It can also help you run your validation exercises efficiently, improve their results, and even reduce the number of tests to perform as part of your strategy.
Feel the fear
Gartner and others remind us regularly of the frightening consequences of emergency plans which do not work or only work partially because they are ill-designed and/or untested or poorly tested.
An insufficiently planned test is a little like a message in a bottle. Throw and hope! It can jeopardize your firm because it gives the illusion of security. The main reason plans go untested is, of course, the budget this type of project requires.
But what a waste of budget if the plan is never tested or updated and thus becomes unusable in the event of a real incident. Given this, can we sit back and do nothing? If this project represents 5 to 10% of the IT budget, what capacity do you have to manage the plan over time?
Regional IT incidents are on the rise, rules and regulations are getting stiffer, and clients are applying more pressure to ensure continuity as malware and ransomware incidents increase.
The implementation and maintenance of an operational DRP should become a priority for all firms. Make sure you implement a robust backup and recovery plan for your organisation’s data. Machines can always be replaced; it is the data fueling your business that is most critical.