Why data centers must commit to liquid cooling

By Jason Matteson

October 03, 2021

Jason Matteson, Iceotope's director of product strategy, explores how data centres can bridge the gap between the present and the future of liquid cooling

When it comes to data center cooling, operators have long known that liquid is a far more efficient medium than air for removing heat. However, to date there have been many barriers to adopting and benefiting from liquid cooling strategies, which goes some way to explaining why the data center industry remains in transition today.

The leading obstacle to the adoption of liquid cooling is the perception of risk. In a historically conservative industry where reliability trumps efficiency, risk aversion has opened up a chasm between the understanding of the potential benefits of liquid cooling and the business decisions that could deliver those gains. How can the industry finally bridge this gap?

Perception of Risk #1: Leaking Liquids in the Data Center

One classic perceived risk is that water or other liquids could leak and interfere with the operation of servers or other IT equipment. This has been especially true of water-based direct-to-chip solutions, where cold plates are installed inside the servers. Unfortunately, the risk of leakage with water as the coolant has proven to be a real concern and a source of actual damage.

While it is not openly admitted, it cannot be denied that water damage or leakage has been a major cause of downtime over the years. Putting running water alongside electrical equipment is not only risky, but potentially dangerous. Understandably, some operators have felt that the potential benefits of deploying liquid cooling don’t outweigh the apparent risk to valuable IT loads, especially where costs can accumulate from lost data, damaged equipment and downtime.

It is not unusual for a customer to simply say that they don’t want water in the data center. Despite this, the customer’s site will almost certainly have liquid circulating under the floor of the technical space and through its air handlers. Typically, a chilled water loop is in place, cooling the air that moves around the room. That reality, however, is often not reflected in how customers perceive the risk.

Perception of Risk #2: Isn’t Air-based Data Center Cooling Sufficient?

The next question is often whether the data center operator really needs liquid cooling: isn't air sufficient? The answer is that while we may have managed with air for decades, the reality today is that air cooling is no longer sufficient to ensure the reliable operation of data center loads. A recent paper published by ASHRAE TC9.9, The Emergence of Liquid Cooling in Mainstream Data Centers, highlights exactly this point.

New technologies that require liquid cooling direct to the rack and chip are just on the horizon, says the ASHRAE TC9.9 technical committee, which is generally considered to be a leading global authority on data center power and cooling trends and best practices. The need for liquid cooling is being driven by chip density as well as application performance, it says, and there is also an urgent need for the industry to prepare for liquid technologies now.

By way of example, the high-performance computing (HPC) community has for some time deployed liquid cooling as an industry norm. With no apparent ill effect on uptime or availability, the technology has enabled CPUs and GPUs to run reliably at maximum performance while minimizing their leakage power. This is in a market sector where data processing speed and volume matter, and even fractional percentage improvements can make a real difference.

Among most industry players, there is growing awareness of the significant value that liquid cooling brings. For example, it allows higher-density racks to be deployed, making white space more productive; it facilitates greater data processing capacity and performance (the real work of the data center), increased energy efficiency, lower carbon impact and the opportunity for heat recovery. Pilots of technologies such as precision immersion (chassis-level liquid cooling) suggest that IT equipment is also more reliable and requires fewer manual service interventions when installed in dielectric-cooled environments.

The future doesn't have to resemble the past

Technology has evolved, as it does. At the same time, data center cooling has reached an inflection point: the change that was already happening is now gathering momentum. The newest chipsets and related solutions being launched by all the major vendors increasingly require liquid cooling. Many websites and much of the accompanying documentation already state this need.

The underlying challenge is that superficial changes to the air cooling system, like adding more fans or reducing the hardware density, will no longer be enough. IBM, for example, has just announced 2nm chipsets promising 45% higher performance than today's most advanced 7nm processors, further empowering the rise of analytics, AI and machine learning.

Enterprise CIOs and IT strategists are increasingly being tasked with solving newer challenges, from advanced analytics, machine learning and artificial intelligence to 5G, the internet of things (IoT) and edge compute. That means the data center operators and digital infrastructure that underpin them must support much higher power demands and rack densities too. For many companies in the data center sector, the question is no longer “if”, but “when” and “how” liquid cooling will become ubiquitous.

The cooling stakes in the data center, not just for hyperscalers and university supercomputers and other pioneers but for colo providers and 'standard' enterprise-level server rooms, have risen. Cooling and dissipating heat from increasingly hot and hungry systems and hardware has suddenly become much more challenging. This is largely because this slow-moving and conservative industry remains wedded to the use of hugely inefficient air-based cooling systems.

Solutions for the liquid risk-reward equation

Power users like HPC specialists and high-end researchers have been immersed in the world of liquid cooling for years now. Partly as a result, the range of liquid cooled solutions has developed and grown. Latterly, more specific innovations to suit the requirements of mainstream and colocation data centers have also become available.

Today, there are a few key liquid-cooled technologies, each with its own pros and cons. While thermal engineers argue that direct-to-chip is the superior approach, not every customer will need that level of performance right down to CPU or chip level. For others, an immersion cooling solution might deliver sufficient thermal improvement.

One question not yet addressed is the serviceability of IT equipment and the ongoing process of IT moves, adds and replacements in the white space. If your servers are submerged in an immersion tank, how do you access them safely for essential maintenance and repairs, and what might that mean for warranties?

The latest innovation, chassis-level precision immersion, is essentially a hybrid of liquid cooling approaches, combining the best features of full immersion and direct-to-chip. Optimised at chassis level, precision cooling is focussed on user convenience; it can be retrofitted into the data center using standard equipment racks, reducing risk and complexity whilst simplifying deployment.

Bridging the budget gap

A recent white paper by data center physical infrastructure leader Schneider Electric investigated the capital costs of immersive liquid cooling versus air cooling in a large data center, and found that the typical cost differential is not as large as one might expect. In fact, Schneider were able to demonstrate that at a like-for-like rack density of 10 kW in a 2 MW data center, the CAPEX requirement is broadly the same.

Because compaction is a key benefit of liquid cooling, Schneider have also quantified the CAPEX difference when liquid cooling is deployed at 20 kW per rack and 40 kW per rack for the same capacity data center, achieving 10% and 14% CAPEX savings respectively. More recently, a white paper published by multi-disciplinary engineering company Cundall suggests that up to 20% savings in CAPEX should be expected.
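To make the compaction argument concrete, the short Python sketch below works through the rack counts implied by the figures quoted above. It is illustrative only: it assumes the full 2 MW is spread evenly across racks, and it treats the "broadly the same" CAPEX at 10 kW per rack as a 0% saving. Neither assumption comes from the white paper itself, and the function and variable names are hypothetical.

```python
import math

# Figures quoted in the text: a 2 MW data center, with CAPEX savings relative
# to the 10 kW-per-rack case. "Broadly the same" at 10 kW is modeled here as a
# 0% saving, which is a simplifying assumption.
CAPACITY_KW = 2_000
QUOTED_CAPEX_SAVING = {10: 0.00, 20: 0.10, 40: 0.14}


def racks_needed(capacity_kw: float, density_kw_per_rack: float) -> int:
    """Racks required if the whole capacity is spread evenly across racks."""
    return math.ceil(capacity_kw / density_kw_per_rack)


def summarize() -> list[tuple[int, int, float]]:
    """Return (density in kW, rack count, CAPEX relative to baseline) rows."""
    return [
        (density, racks_needed(CAPACITY_KW, density), 1.0 - saving)
        for density, saving in QUOTED_CAPEX_SAVING.items()
    ]


if __name__ == "__main__":
    for density, racks, relative_capex in summarize():
        print(f"{density:>2} kW/rack: {racks:>3} racks, CAPEX x{relative_capex:.2f}")
```

Under these assumptions the same 2 MW fits into 200 racks at 10 kW, 100 racks at 20 kW and 50 racks at 40 kW, which is the white-space compaction that sits behind the quoted savings.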

The choice to bridge the gap and deploy liquid cooling involves a complex equation with a lot of moving parts, but it will usually end up coming down to the dollar, pound or euro. Newer chassis-level technologies will continue to meet the cooling requirements of high-density CPUs and GPUs for the foreseeable future. What’s more, the technology can deliver space savings, efficiency savings and lower TCO. For the astute data center operator, chassis-level precision liquid cooling makes a sound engineering and business case.