The liquid-cooled future of high performance compute
There’s a journey that new technology takes, from the imagination to cold, hard reality; from the near-fantastical to the mundane; from science fiction to fact of life. The cell phone went from Star Trek prop to the 21st century’s most ubiquitous consumer electronic device within a generation. But that transition is never immediate or smooth. There are challenges that need to be overcome.
A decade ago, High Performance Computing (HPC) was solely the purview of research institutions and cutting edge technology companies. AI/HPC — specifically the kind of computing capable of the workloads we’re seeing today — only made the leap from science fiction to reality relatively recently with the rise of artificial intelligence (AI) powered by Big Data. It certainly wasn't seen as something that the enterprise or mid-market could hope to leverage.
Today, we're witnessing another watershed moment with the rise of generative artificial intelligence. According to Christian Cantrell, VP of Product at Stability AI, the biggest change on the horizon is that “we're now able to use technology to assist in more subjective tasks like writing copy and generating images,” as opposed to previous applications of AI, which sought objective, predictable results that already exist within the data.
“Nobody wants their spreadsheets to produce different results every time a formula gets executed … But AI is now enabling enterprises to unlock massive value in non-deterministic workflows, like generating marketing collateral, writing job descriptions, or assisting in the arduous process of legal discovery,” Cantrell adds. “That doesn’t mean AI will somehow preempt creativity. In my opinion, exactly the opposite. Just as calculators and computer simulations made the jobs of mathematicians and physicists easier — and gave them far more capabilities — I believe generative AI will empower creatives both inside and outside the enterprise.”
Alongside more established uses for AI, the technology, in one form or another, permeates close to every facet of modern life and business. Something that was once the purview of science fiction, and then of cutting edge researchers and the world's leading tech firms, is now exploding in terms of accessibility. Now, powerful analytical and generative tools help research institutions, deep tech startups, enterprises, and anyone with a ChatGPT, Stability AI or Midjourney subscription to turn vast amounts of raw data into insights, efficiencies, and striking imagery.
As a specialist provider of AI/HPC services and sustainable data centre solutions, we at engineroom have harnessed our collective expertise to increase Australian primary producers’ crop yields, predict consumer credit defaults, and identify dark matter and gravitational waves. The applications for AI/HPC are only growing, along with the number of organisations looking to adopt.
This generational leap in the accessibility of AI is being driven by advancements in digital infrastructure — specifically that which supports AI/HPC — and the next generation hardware underpinning those data centres, whether that means GPUs or FPGAs, faster processors, or custom silicon. As denser, more powerful hardware becomes more widely available, alongside access to the AI/HPC applications it supports, another technology is making its way out of science fiction and the cloistered halls of research institutes, into common use — with just as much potential for changing our lives as the launch of the first iPhone.
A Thermodynamic Wall — The Barrier to Mass AI/HPC Adoption
There's an issue, however. Over time, as the density and computing power of this hardware has increased, so too has its thermal footprint. More powerful, more densely arrayed hardware consumes more energy and produces more heat, which means it takes even more power to cool it down.
We have reached a tipping point. Our existing data centre ecosystem has an upper limit when it comes to the amount and density of infrastructure it can cool, with the average data centre physically unable to cool racks with a density greater than 20 kilowatts.
As the Uptime Institute identifies, conventional air cooling ceases to be an economical or effective solution “when rack densities are higher than 20-25 kW.” This wasn’t perceived as an especially pressing issue even a couple of years ago, when the industry as a whole sat comfortably in the 10-19 kW range; facilities with rack densities of 30 kW and above were even excluded from the results of the Uptime Institute’s survey as “high performance outliers”.
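The 20-25 kW ceiling follows from basic thermodynamics. A rough sketch of the airflow a rack needs, using the sensible-heat relation Q = ṁ·cp·ΔT (the temperature rise and air properties below are illustrative assumptions, not figures from the Uptime Institute survey):

```python
# Back-of-envelope airflow needed to air-cool a server rack.
# Assumed values (illustrative textbook figures, not vendor specs):
AIR_DENSITY = 1.2   # kg/m^3 at ~20 degrees C
AIR_CP = 1005.0     # J/(kg*K), specific heat of air
DELTA_T = 15.0      # K, allowable air temperature rise across the rack

def airflow_m3_per_s(rack_kw: float) -> float:
    """Volumetric airflow required to remove rack_kw of heat: Q = m_dot * cp * dT."""
    mass_flow = rack_kw * 1000.0 / (AIR_CP * DELTA_T)  # kg/s of air
    return mass_flow / AIR_DENSITY                     # convert to m^3/s

for kw in (10, 20, 40):
    flow = airflow_m3_per_s(kw)
    print(f"{kw:>2} kW rack: {flow:.2f} m^3/s (~{flow * 2119:.0f} CFM)")
```

Doubling rack density doubles the airflow required, and somewhere past 20-25 kW the fan power and ducting needed to move that much air stop being economical — which is the wall the survey figures describe.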
What happens, then, when a new generation of hardware alongside rapidly rising demand for AI/HPC pushes those outliers towards the middle of the bell curve?
With the generative AI industry alone predicted to be worth about A$22 trillion by 2030, according to the CSIRO, a shift towards higher density digital infrastructure is imminent. The chips that are being released right now — the ones that will support the transition of generative AI, advanced GPU processing, and all the other technologies that are making their way from science fiction to the mainstream — are producing twice as much heat as the previous generation. There’s little to suggest the leap to the generation after that will be any smaller.
More heat means more power consumed to cool server racks, more carbon emissions, more wear and tear on hardware, and fewer opportunities for enterprises to capitalise on the power of AI/HPC.
AI/HPC is poised to move into common use, powering the next generation of Deep Tech and enterprise innovation. We have to ask ourselves, if we are going to leverage all this higher power computing to fuel AI/HPC applications, how are we going to cool it all down?
Our customers are already consuming AI/HPC resources on a regular basis to solve their most pressing and immediate problems. AI/HPC in the enterprise space has become normalised, accepted, and expected.
We need a new breed of facility that's capable of accommodating the hardware powering this revolution. The industry needs a new breed of company that’s able to give organisations access to the potential of AI/HPC.
The Liquid Cooled Future of AI/HPC
When rack densities grow to the point we’re starting to see in dedicated AI/HPC facilities, air ceases to be a viable medium for cooling those servers. Liquids, on the other hand, can absorb a lot of heat very quickly and carry it away, making water or dielectric cooling fluid much more effective than air as a cooling agent. Cooling a server using a dielectric cooling fluid is 1,500 times more efficient than traditional HVAC cooling.
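The advantage comes down to volumetric heat capacity: how much heat a cubic metre of coolant carries away per degree of temperature rise. A quick comparison (the fluid properties are textbook approximations, and the dielectric figures are typical of single-phase immersion fluids generally, not HYDRA™ specifications):

```python
# Volumetric heat capacity in J/(m^3 * K) = density * specific heat.
# Property values are room-temperature textbook approximations.
coolants = {
    "air":              (1.2,   1005.0),   # kg/m^3, J/(kg*K)
    "dielectric fluid": (800.0, 2000.0),   # typical single-phase immersion fluid
    "water":            (998.0, 4186.0),
}

air_capacity = 1.2 * 1005.0  # baseline: heat one m^3 of air carries per K
for name, (density, cp) in coolants.items():
    capacity = density * cp
    print(f"{name:<17} {capacity / 1e3:>6.0f} kJ/(m^3*K)  ({capacity / air_capacity:>5.0f}x air)")
```

Cubic metre for cubic metre, a typical dielectric fluid carries on the order of a thousand times more heat than air, and water several times more again — the same order of magnitude as the efficiency figure quoted above.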
As Paul Finch, CEO of Kao Data said recently, “if you are in the data centre colocation market, and not planning for your facility to implement liquid-cooled compute capability, then you are way behind the curve.”
Enter HYDRA™: our solution to the thermodynamic impasse we see rapidly approaching for the AI/HPC sector. It's an alternative approach to cooling high density infrastructure. It's our answer to an existential problem in the world of high performance computing. By passing dielectric cooling fluid over server components as part of a closed unit or “Tank”, HYDRA™ significantly increases the output and energy efficiency of data-intensive compute.
Compared to a typical data centre, a facility cooled by engineroom’s sustainable HYDRA™ solution generates 45% fewer emissions and requires 50% less floor space, halving its embedded emissions. It also produces half the e-waste, because equipment lasts twice as long in a liquid cooled environment.
Our facilities are not only more efficient and more sustainable; those savings and efficiencies are passed on to our customers, making our services cheaper as well.
Liquid Cooled, Colocated AI/HPC
Unfortunately, not every company can shoulder the complexity, the level of investment, and the knowledge base needed to successfully implement AI/HPC and create value with it.
The data centre industry is waking up to the fact that immersion cooling is no longer a niche solution, but rather the enabler that will make AI/HPC accessible to a rapidly expanding user market; that's where colocated HPC, or HPC-as-a-service, comes into play. Those users may not have the capital, in-house expertise, or time to build their own liquid-cooled environments, but engineroom can provide space inside a purpose built immersion colocation facility, with fully managed installation, modification, maintenance, and support for clients looking to build an immersion-ready IT solution. Customers can cool up to 100 kW per rack, boost their compute performance by up to 20%, reduce contracted capacity by around 15% by removing the need for power-hungry fans, improve hardware reliability, and benefit from world-leading energy efficiencies.
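One way to see where a contracted-capacity reduction of that order can come from: in an air-cooled server, internal fans typically draw a low double-digit percentage of the machine's power, and immersion removes them entirely. A hedged illustration (the fan fraction and rack loads below are assumptions for the sake of the arithmetic, not measured engineroom data):

```python
# Illustrative contracted-capacity comparison, air-cooled vs immersion.
# Assumption: server fans account for ~13% of air-cooled power draw.
FAN_FRACTION = 0.13

def immersion_capacity_kw(air_cooled_rack_kw: float) -> float:
    """Power the same IT load draws once internal fans are removed."""
    return air_cooled_rack_kw * (1.0 - FAN_FRACTION)

racks = [20.0, 50.0, 100.0]  # hypothetical air-cooled rack loads in kW
total_air = sum(racks)
total_immersion = sum(immersion_capacity_kw(r) for r in racks)
saving_pct = 100.0 * (1.0 - total_immersion / total_air)
print(f"air-cooled: {total_air:.0f} kW, immersion: {total_immersion:.0f} kW "
      f"({saving_pct:.0f}% less contracted capacity)")
```

The exact percentage depends on the hardware, but the mechanism is simple: power that no longer spins fans is power the customer no longer has to contract for.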
By leveraging the knowledge and capital-intensive infrastructure of a partner like engineroom, organisations can tap into the experience required to implement AI/HPC and quickly start creating value.
Just as liquid cooling solutions like HYDRA™ make the benefits of AI/HPC more accessible and sustainable in a world of increasing rack densities, engineroom’s immersion colocation offering and relevant expertise make liquid cooling an option for any organisation.
That's what we see when customers come to engineroom. They have a particular problem or a question, and they leverage our expertise to build a technical solution that can deliver an answer at a lower cost than if they tried to do it internally.
Problem Solvers, Not Hardware Vendors
Our true differentiator, however, is being a problem solver, not an HPC vendor. We work with each customer to solve a particular problem, utilising the strengths of a diverse team experienced in mathematics, statistics, medicine, engineering, data science, aerospace, law enforcement, and astrophysics. We’re a collective of problem solving experts with a deep understanding of AI/HPC.
We're very hands on in the way we approach our customers’ problems and consider ourselves very lucky to be able to work with some very clever people from a wide variety of fields.
Our outstanding liquid cooling technology, in the form of HYDRA™, combined with our deep reserves of HPC expertise, makes us uniquely placed as an organisation to unlock the potential of HPC for the enterprise. We consume far less energy for the same amount of compute than an air-cooled data centre. Our equipment lasts longer, we need a smaller physical footprint (which translates to smaller contracted capacity for the customer), and our deep-seated knowledge and experience of HPC workloads, the platform, and problem solving all play a part in delivering a better service at a lower cost than our competitors.
HPC is supporting a new chapter in the use of cutting edge technology to create value for the enterprise, and we’re here to support every enterprise that wants to be a part of that journey from science fiction to reality.