Data centres are such a critical element of the modern world that just a few hours offline can create massive disruptions, both for the operator and its customers. These issues, whether they stem from external factors like fires or - in the recent case of Texas data centre operators - storms, improperly maintained hardware, cyber breaches, or something else, are more common than the average data centre owner would probably care to believe. Back in 2018, the Uptime Institute collected data showing that over 30% of respondents had experienced a downtime incident or a severe degradation of their services in the previous year, and 48% reported experiencing at least one outage during the preceding three years.
For a data centre operator, a full service outage costs, according to research by Gartner, an average of $5,600 per minute once you take into account lost sales, a damaged reputation, compensation paid to customers, lost data, and the cost of repairs. Given those stakes, it pays dividends as a data centre operator to have a rock-solid risk management plan in place; the consequences of going without one could be dire.
10: Understand different types of risk
The threats facing the continued operation of a data centre are myriad. From external factors like cyber attacks, natural disasters and failures of the local power grid, to internal ones like equipment failure and human error, a data centre risk management plan should explore a wide range of possibilities and plan accordingly. Engaging the services of a consulting firm or outsourcing your cybersecurity needs can help increase resilience and reduce costs in both the short and long term.
Good Habits: conduct reviews to understand your vulnerabilities; consult the experts; integrate risk management practices from day one.
Bad Habits: base your assessments on generic industry standards, rather than an in-depth examination of your own facility.
09: Build redundant
A UPS system failure is by far the most common cause of a data centre outage (ahead of human error and cyber attack). Securing multiple sources of power, as well as a backup generator that is capable of keeping your data centre running for more than 24 hours without access to the local grid, can be the difference between a few sleepless nights and disaster. When you understand the risk to your data centre, decide whether N+1, 2N, or 2N+1 redundancy is right for you.
Good Habits: build multiple points of redundancy in case of a power outage; develop a pool of multiple backup power suppliers.
Bad Habits: rely entirely on one type of power backup.
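The difference between N+1, 2N, and 2N+1 comes down to a simple capacity calculation: N is the number of UPS units needed to carry the full critical load, and each scheme layers spares on top. A minimal sketch (the load and unit-capacity figures below are hypothetical examples):

```python
import math

def ups_units_required(load_kw: float, unit_capacity_kw: float, scheme: str) -> int:
    """Total UPS units a given redundancy scheme calls for.

    load_kw          -- total critical IT load to support
    unit_capacity_kw -- capacity of a single UPS unit
    scheme           -- one of "N+1", "2N", "2N+1"
    """
    n = math.ceil(load_kw / unit_capacity_kw)  # units needed just to carry the load
    if scheme == "N+1":
        return n + 1      # one spare unit on top of the baseline
    if scheme == "2N":
        return 2 * n      # a fully mirrored second system
    if scheme == "2N+1":
        return 2 * n + 1  # mirrored system plus one extra spare
    raise ValueError(f"unknown scheme: {scheme}")

# Example: a 900 kW load on 250 kW units needs N = 4 units at baseline.
print(ups_units_required(900, 250, "N+1"))   # 5
print(ups_units_required(900, 250, "2N"))    # 8
print(ups_units_required(900, 250, "2N+1"))  # 9
```

The trade-off is cost against resilience: 2N+1 survives the loss of an entire power path plus one more unit, but at more than double the hardware of a bare N configuration.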
08: Keep it secret, keep it safe
Ensuring the physical security of a data centre is just as important as an effective firewall. One of the easiest and cheapest ways in which to keep your data centre safe is to keep its location a secret. Hyperscalers like AWS and Google are very cagey about the precise locations of their campuses; if you were to drive past one, there’s very little chance you would see the company’s name in big glowing letters on the roof. By removing obvious signage and maintaining a relatively inconspicuous presence in the local community, you eliminate a good deal of the physical risk to your customers’ data.
Good Habits: treat physical security every bit as seriously as you treat digital threats.
Bad Habits: advertise your location more than is strictly necessary.
07: Ensure you have an effective alert notification process
If and when something does go wrong with your data centre, ensuring a short response time is critical. Given that data centre downtime can cost as much as $5,600 every single minute, you need to make sure that, in the event of a breach or outage, the right staff receive an alert, along with details of their responsibilities during the emergency, as quickly as possible. A good data centre risk management plan needs to account for the best way to alert employees (both on and off site) in the event of a disaster.
Good Habits: regularly update your contact information for key employees.
Bad Habits: create a plan that rests solely on the shoulders of one staff member, who may be unavailable to respond; send out confusing or mixed messages in a crisis.
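One way to avoid both single points of failure and mixed messages is to fan a single, role-specific alert out to a full on-call roster and flag anyone who could not be reached. A minimal sketch, where the contacts, roles, and `send_alert` delivery stub are all hypothetical placeholders for a real paging integration:

```python
from dataclasses import dataclass, field

@dataclass
class Contact:
    name: str
    role: str                             # their responsibility in an emergency
    channels: list = field(default_factory=list)  # e.g. ["sms", "email"]

def send_alert(contact: Contact, message: str) -> bool:
    """Placeholder for a real paging system (SMS gateway, pager API, etc.).

    Returns True if at least one channel is on file to accept the message."""
    for channel in contact.channels:
        print(f"[{channel}] -> {contact.name}: {message}")
    return bool(contact.channels)

def notify_roster(roster, incident: str):
    """Alert every contact with their role; return anyone unreachable for escalation."""
    unreachable = []
    for contact in roster:
        message = f"{incident} -- your role: {contact.role}"
        if not send_alert(contact, message):
            unreachable.append(contact.name)
    return unreachable

roster = [
    Contact("A. Ops", "isolate affected racks", ["sms", "email"]),
    Contact("B. Net", "fail over to backup uplink", ["sms"]),
    Contact("C. Mgmt", "coordinate customer comms"),  # no channels on file!
]
print(notify_roster(roster, "UPS failure, hall 2"))  # ['C. Mgmt']
```

Because each message carries the recipient's own responsibility, everyone gets the same incident facts but unambiguous, individual instructions.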
06: Establish off-site backup
Should your data centre be taken offline by a fire, storm, flood or cyber attack, making sure that your clients’ most valuable data is backed up in a secure off-site location can be key, both to maintaining customer trust, and to resuming service as quickly as possible. It is also worth noting that the typical hurricane is around 300 miles wide, so choosing an off-site backup in the same city - or even region - can limit its effectiveness.
Good Habits: identify a backup facility that isn’t exposed to the same risks as your primary data centre.
Bad Habits: keep all your data in a single location.
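Since a typical hurricane is around 300 miles wide, one quick sanity check is the great-circle distance between your primary and backup sites. A sketch using the haversine formula (the coordinates below are arbitrary illustrative points, roughly Dallas and Houston):

```python
import math

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

HURRICANE_WIDTH_MILES = 300  # width of a typical storm, per the point above

def sites_sufficiently_separated(primary, backup):
    """True if the two (lat, lon) sites are farther apart than one storm's width."""
    return distance_miles(*primary, *backup) > HURRICANE_WIDTH_MILES

# Two sites in the same region fail the check.
print(sites_sufficiently_separated((32.78, -96.80), (29.76, -95.37)))  # False
```

Distance is only a proxy, of course: the backup site should also sit on a different power grid and flood plain, not merely a different postcode.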
05: Trust in zero
The idea of “Zero Trust” has gained a lot of ground in cybersecurity circles over the past few years and you should absolutely embrace it in your data centre. In short, zero trust means that your network doesn’t trust any data traffic unless a security policy specifically allows it. This does require a full understanding of your network and the writing of smart policies that allow your business to be safe and function at the same time, but achieving microsegmentation across all the applications in your data centre is one of the most effective ways to reduce cyber risk.
Good Habits: divide up functions throughout your network with a distributed internal firewall.
Bad Habits: throw up the external firewall and call it a day.
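The default-deny principle behind zero trust can be sketched as a tiny policy engine: traffic between segments is dropped unless an explicit rule allows it. The segment names, ports, and rules below are illustrative, not a real policy:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    src_segment: str  # workload segment the traffic originates from
    dst_segment: str  # segment it wants to reach
    port: int

# Explicit allow-list -- everything not named here is implicitly denied.
ALLOW = {
    Rule("web-tier", "app-tier", 8443),
    Rule("app-tier", "db-tier", 5432),
}

def is_allowed(src: str, dst: str, port: int) -> bool:
    """Zero trust: deny by default, permit only what a policy explicitly names."""
    return Rule(src, dst, port) in ALLOW

print(is_allowed("web-tier", "app-tier", 8443))  # True: explicitly permitted
print(is_allowed("web-tier", "db-tier", 5432))   # False: no direct web-to-db rule
```

Note how the web tier cannot reach the database directly even though both individual hops are legitimate; that is the microsegmentation payoff, since a compromised front end cannot simply pivot to the data.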
04: Visibility is everything
As mentioned above, the key to establishing the kinds of security protocols that effectively minimise risks to your data centre is visibility. If you can’t see something, you have no way of knowing whether it can hurt you. Investing in security platforms that give end-to-end visibility throughout a data centre’s network can make the difference between a neutralised cyber attack and a high-profile breach. At the same time, constant surveillance of your physical facility, to ensure that all systems are functioning as intended and that the building is secure, is equally important.
Good Habits: watch out for unsecured devices like laptops and IoT sensors connecting to your network.
Bad Habits: allow blind spots to persist throughout your security system.
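One concrete habit for spotting unsecured laptops and IoT sensors is diffing what is actually on the network against an approved asset inventory. A minimal sketch (the MAC addresses and labels are made up; a real deployment would pull observed devices from switch MAC/ARP tables or a network scanner):

```python
# Approved asset inventory, keyed by MAC address.
KNOWN_DEVICES = {
    "aa:bb:cc:00:00:01": "rack-7 PDU controller",
    "aa:bb:cc:00:00:02": "cooling-loop sensor",
}

def find_unknown(observed_macs):
    """Return MACs seen on the network but absent from the inventory, sorted."""
    return sorted(mac for mac in observed_macs if mac not in KNOWN_DEVICES)

# Pretend these were harvested from the access switches.
observed = ["aa:bb:cc:00:00:01", "de:ad:be:ef:00:99"]
print(find_unknown(observed))  # ['de:ad:be:ef:00:99']
```

Run on a schedule and wired into the alerting process from tip 07, a check like this turns an invisible rogue device into a ticket within minutes of it joining the network.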
03: Location, location, location
The storms that ravaged Texas earlier this year threw up several glaring issues with data centres located in the state. A mixture of unavailable power and susceptibility to severe weather had the potential to severely damage operators’ facilities throughout the state. As climate change results in increasingly severe weather, choosing the right location for your data centre - in a stable environment with access to a strong electrical grid and, preferably, renewable power - is a vital step in preventing a disastrous service disruption down the road.
Good Habits: examine past meteorological data, as well as the service history of the local grid.
Bad Habits: build your data centre next to a chemical plant, airport or in a flood plain.
02: Make people your greatest asset.
… not your greatest vulnerability. After UPS failures, human error accounts for the highest number of data centre outages. Alarmingly, with the industry understaffed and more and more of the workforce reaching retirement age, the problem is only likely to get worse over the coming decade. Invest in your staff and invest now. Data centre operators that attract top talent, develop existing staff and incorporate constant training and development into their DNA will benefit, not only from safer data centres, but from a competitive advantage over their peers.
Good Habits: create partnerships with universities that support scholarships and grants to attract more students to the industry.
Bad Habits: lean too heavily on automation at the expense of a skilled human workforce.
01: Test your plan.
Good. Now, test it again. Crafting an elegant, comprehensive risk management plan is one thing. Testing it, making changes, and testing it again is another entirely. The global data centre threat landscape is constantly evolving, and so should you. According to experts at vXchnge, “Changes in data availability needs or business growth are two of the primary reasons why disaster recovery plans need to be re-evaluated on a regular basis. As part of that reassessment, the plan itself should be tested frequently as part of ongoing disaster mitigation services.”