Outages remain a major concern, says Uptime Institute
On Monday, the Uptime Institute - the organisation responsible for the industry standard Tier I-IV ranking system for data centre resilience - released its third Annual Outage Analysis report.
While the report notes that technological and data centre management advances have improved the ability of data centres to avoid unplanned outages, these costly events remain “a major industry, customer, and regulatory concern.”
The report also notes that, while outages are becoming less common, the centralisation of the data centre industry and the increased reliance on data centres as critical infrastructure during the COVID-19 crisis have resulted in “the overall impact and direct and indirect cost of outages” continuing to increase.
"Resiliency remains near the top of management priorities when delivering business services," said Andy Lawrence, executive director of research at the Uptime Institute.
He added: "Overall, the causes of outages are changing: software and IT configuration issues are becoming more common, while power issues are now less likely to cause a major IT service outage. The fact is outages remain common and justify the increased concern and investment in preventing them. Because of the disruption and high costs that result from disrupted IT services, identifying and analysing the root causes of failures is a critical step in avoiding more expensive problems."
Some key findings from the report: almost half (44%) of data centre operators said their concern over outages had risen in the past year, and around 75% of operators and IT managers surveyed said they’d experienced an outage in the past three years, with around 30% describing those outages as having a “significant impact” on their business.
The Human Element
Human error continues to play a significant role in most data centre outages, the report found. In the Uptime Institute’s 2020 survey, three-quarters of respondents who had experienced an outage in the past three years attributed their issue to human error, claiming that the downtime incident could have been prevented with “better management or processes.”