Not all IT failures count as a disaster. If a hard drive fails on your workstation, for example, you should be totally fine; you talked the boss into upgrading your rig to a RAID 10 last year, so you just need to swap a drive and let it rebuild while you monitor the network. (You did talk the boss into that upgrade, right? Right??)
To truly count as a disaster, an IT failure has to have a substantial—potentially catastrophic—impact on the company. Losing power to a call center for a couple hours may not be a disaster; losing electrical equipment because hurricane waters have engulfed the office would be.
Disaster isn’t always the result of a tornado or flood. Hackers might gain entry to your system by exploiting a weak password. A call center employee could breach the network by plugging in a flash drive he found.
Cyberattacks have been escalating in recent years—2015 was the second highest year on record for data breaches in the U.S.—and they’re becoming increasingly expensive. Cybercrime cost U.S. companies an average of $15 million in 2015. Add in reduced innovation, recovery and other indirect costs, and you’re facing an even bigger expenditure.
The best defense against an IT disaster such as a data breach is—as the Boy Scouts suggest—to be prepared and anticipate a crisis before it hits.
Thorough policies and procedures can help minimize the risk of any one issue becoming a catastrophe.
If you’re in an area prone to natural disasters, you probably already have mitigation policies in place, like out-of-state backups to a cloud or data mirroring on a remote server in case of evacuation.
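One way to picture that kind of mitigation is a simple mirroring job. The sketch below copies a local data directory to a second location, preserving the folder layout; in practice the destination would be a mounted remote share or a cloud sync folder rather than a local path. The function name and paths are illustrative, not from any particular tool.

```python
import shutil
from pathlib import Path

def mirror_directory(source: str, destination: str) -> int:
    """Copy every file under `source` into `destination`, preserving
    the directory layout. Returns the number of files copied.
    In a real deployment, `destination` would point at a remote
    server or cloud-synced mount, not a folder on the same machine."""
    src, dst = Path(source), Path(destination)
    copied = 0
    for file in src.rglob("*"):
        if file.is_file():
            target = dst / file.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)  # copy2 preserves timestamps
            copied += 1
    return copied
```

Run it from a nightly scheduled task and the off-site copy stays reasonably fresh without any exotic hardware.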
You’re probably also already taking basic measures to protect against data thieves.
Basic security practices will help protect your systems. Additional, more involved steps can help you proactively prepare for—and more easily recover from—a large-scale disaster.
It’s easy, in our risk-averse society, to get carried away with security and redundancy. (Some might even say we’re paranoid.) If you find yourself making backups of your backups, you may be overcomplicating your recovery strategy.
There is a maxim among old-school engineers: Each new component introduces a new point of failure. By extension, every bit of tech you add becomes another element that has to be configured, maintained and repaired.
Make a habit of looking for simple, effective solutions in your disaster recovery plan—ideally ones that serve several purposes at once.
A RAID 10 array is great; striping makes it fast, and mirroring gives you maximum uptime. However, you’re buying four drives to hedge against the rare failure of one. (Correction—your boss is spending money on four drives. We won’t say anything if you don’t.)
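The math behind that objection is easy to spell out. Because RAID 10 mirrors every striped pair, usable capacity is half the raw capacity, so the cost per usable terabyte doubles. The drive prices below are made-up round numbers, just to show the arithmetic:

```python
def cost_per_usable_tb(drive_count: int, drive_tb: float, drive_cost: float) -> float:
    """Cost per usable terabyte in a RAID 10 array.
    Mirroring halves the raw capacity, which is what drives the cost up."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    usable_tb = drive_count * drive_tb / 2  # half the raw capacity is mirror copies
    return (drive_count * drive_cost) / usable_tb
```

Four hypothetical 2 TB drives at $100 each give 4 TB usable at $100 per terabyte—twice the $50 per terabyte of a bare drive.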
Instead of RAID, you could look at running daily backups to an in-network server—or consider regularly mirroring your data or system image to a cloud. If a drive fails, just swap it out and migrate the data. What’s an hour of your time waiting for a reimage compared to the cost of three extra hard drives?
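A daily backup job like that can be a few lines of script. Here’s one possible sketch: it archives a data directory into a date-stamped tarball and prunes old archives beyond a retention count. The paths and retention policy are placeholders; in practice the backup directory would sit on an in-network server, not the same drive you’re protecting.

```python
import tarfile
import time
from pathlib import Path

def daily_backup(data_dir: str, backup_dir: str, keep: int = 7) -> Path:
    """Archive `data_dir` into a date-stamped tarball under `backup_dir`,
    then delete the oldest archives beyond `keep`. Returns the new archive
    path. Point `backup_dir` at an in-network server for real protection."""
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=Path(data_dir).name)
    # Prune: keep only the most recent `keep` archives (names sort by date)
    for stale in sorted(dest.glob("backup-*.tar.gz"))[:-keep]:
        stale.unlink()
    return archive
```

Schedule it once a day and a failed drive costs you at most a day of changes plus the reimage wait.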
This is also known as “don’t manage to the exception.” If you’ve been in IT for any length of time, you know weird stuff can happen—like the laptop display that resets to 640 x 480 when the fan is spinning at shutdown. (True story. It ended up being an integrated graphics card.)
When disaster strikes, it’s natural for everyone to dig into root causes and implement policies to prevent future IT failures. For some scenarios, like a failed hard drive, that’s perfectly acceptable. (And a good excuse to remind people to back up their data regularly.)
But if it’s something weird, unusual and unpredictable, sometimes it’s smarter to let it fail—and then let it be. For example, if the power goes out, waiting for the utility company to restore electricity can be more cost-effective than buying UPS devices or a generator. If a laptop fails, a user can be re-tasked until it’s repaired, or you can make loaner systems available.
(Of course, there are less risky ways to manage these scenarios. One option is to switch to low-cost zero clients that use only solid-state components and connect to a network to load the OS and remote desktop environment. With no moving parts, there’s a much lower incidence of failure. Or use lots of nice, fast SSDs.)
Sometimes, we can fall into the trap of mistaking a random event for something that could have been prevented. However, the network may not go down again. The laptop may not have failed because of a major ongoing issue. A simple cost-benefit analysis can help you avoid misidentifying an IT failure.
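That cost-benefit check can be as simple as comparing expected annual loss against what the safeguard costs. The sketch below is one way to frame it; the probabilities and dollar figures in the comments are illustrative, not real estimates.

```python
def worth_mitigating(annual_probability: float, incident_cost: float,
                     mitigation_cost: float) -> bool:
    """Return True if the safeguard costs less than the expected annual loss.
    Expected loss = probability of the incident in a year * cost if it happens.
    All inputs are your own estimates; the function just does the comparison."""
    expected_annual_loss = annual_probability * incident_cost
    return mitigation_cost < expected_annual_loss
```

For example, a power outage you expect once a decade (probability 0.1) that costs $2,000 in lost work has an expected loss of $200 a year, so a $5,000 generator doesn’t pass the test. A breach with a 50 percent annual chance and a $15 million price tag easily justifies a $100,000 security program.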
You can’t change a hacker’s decision to try to breach your system. But you can put a set of systems and policies in place that make hacking you difficult, time-consuming and grindingly boring to hopefully nudge the hacker to move on to an easier target.
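One concrete way to make brute-forcing tedious is account lockout after repeated failures. The class below is a minimal in-memory sketch of that idea, with made-up defaults (five failures in five minutes); a production system would persist state, log attempts and alert an administrator.

```python
import time
from collections import defaultdict

class LoginThrottle:
    """Lock an account out after `max_attempts` failed logins within
    `window` seconds -- enough friction to make guessing slow and boring.
    In-memory only; a real system would persist and log this state."""

    def __init__(self, max_attempts: int = 5, window: float = 300.0):
        self.max_attempts = max_attempts
        self.window = window
        self.failures = defaultdict(list)  # username -> failure timestamps

    def record_failure(self, username: str, now: float = None) -> None:
        now = time.monotonic() if now is None else now
        self.failures[username].append(now)

    def is_locked(self, username: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop failures that have aged out of the window, then count the rest
        recent = [t for t in self.failures[username] if now - t < self.window]
        self.failures[username] = recent
        return len(recent) >= self.max_attempts
```

Three quick wrong passwords and the attacker is staring at a locked account instead of your data.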
To cover your bases, you should evaluate each of your system’s potential entry points.
A little research can help you determine which solutions would best protect your system’s potential entry points. For example, if your organization deals with credit card transactions, PCI compliance, developed to help prevent payment card fraud, would be a relevant option.
In a more general sense, system monitoring tools can offer notification about an impending issue before it turns into a disaster.
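Even a homegrown check can give you that early warning. The sketch below flags any filesystem that’s more than 90 percent full—one common precursor to an outage. The threshold is an arbitrary example, and in a real setup you’d wire the output to email or chat rather than just returning it.

```python
import shutil

def disk_alerts(paths, threshold: float = 0.9):
    """Return a warning line for each filesystem fuller than `threshold`.
    This is the kind of signal a monitoring tool raises before a full
    disk becomes a disaster; hook the result to your alerting channel."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)
        used_fraction = usage.used / usage.total
        if used_fraction > threshold:
            alerts.append(f"{path}: {used_fraction:.0%} full")
    return alerts
```

Run it from a scheduled task across your servers and you hear about the problem days before your users do.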
Disaster protection might seem like a big investment of time and money. In truth, it can require a fair amount of both, but it’s a worthwhile effort. If you fail to plan for potential issues, or to determine where and when to manage risks, the costs you incur could be considerable—potentially far more than you’d have spent safeguarding your system.
flickr photo by Pasi Mammela https://www.flickr.com/photos/75247711@N05/26796454495/ shared under a Creative Commons (BY) license
flickr photo by Andrew Smith https://www.flickr.com/photos/andrewasmith/6157407129/ shared under a Creative Commons (BY) license