SoCs are making an appearance in the data centre industry
If you’ve been following the technology or automotive industries lately, you will most likely have heard of the 2021 chip shortage, when businesses around the world struggled to order chips, including systems-on-a-chip (SoCs), amid the ongoing fallout of COVID-19.
Thankfully, chip production is (somewhat) returning to normal, although it’s not all guns blazing quite yet. Other industries, including the data centre sector, have been taking note of the advantages of using SoCs, with some companies now offering bespoke chips and circuitry. According to McKinsey, though, data centres that want to reap these benefits need to understand the use cases first.
Stephen Simpson, Senior Principal at QuantumBlack, a McKinsey company, points out that “hardware architectures have used a combination of central processing units (CPUs), memory, external storage, and network in a uniform way”, adding that the “substantial investment” such innovation requires has led to restrictive commoditisation. This has left chip manufacturers with neither the incentive to provide custom-made solutions nor the ability to identify specific use cases.
Simpson notes: “By taking a systems-on-a-chip (SoC) approach, data centre computer manufacturers can now optimise performance, cost, and power consumption simultaneously by tailoring the electronics to the needs of the business and optimising the design for specific calculations. Several industries, such as the data centre sector, have improved performance by 15 to 20% while significantly reducing cost and production time for the fabrication of bespoke chips.”
The challenges of commissioning chips
As with any foray into a new industry, there are challenges to overcome: for the companies commissioning the chips, chiefly a legal one. As Simpson explains, the process requires the identification, assembly, and licensing of all the patented technologies needed for a composite design, so care must be taken to ensure that integrating hardware across different vendors’ intellectual property does not significantly hinder performance.
Today, these lower-cost design alternatives are predominantly based on the licensed ARM architecture, with the RISC-V open-standard instruction set emerging as a viable alternative. Both cloud vendors and specialised CPU and computer-hardware providers are taking advantage of this SoC approach for their traditional servers and, by innovating quickly, are starting to enjoy significant success.
SoCs are the new motherboard
According to Analytics India Magazine (AIM), the majority of cloud providers are turning to customised chips, including Google Cloud, which has referred to SoCs as the ‘new motherboard’. In 2015, the company developed Tensor Processing Units (TPUs), AI accelerator application-specific integrated circuits (ASICs) designed for neural-network machine learning, which became available for third-party use in 2018. Google has also sold smaller versions of the chip.
Google’s TPUs have become a powerhouse for a range of services, including real-time voice search, photo object recognition, and interactive language translation. According to Amin Vahdat of Google Cloud, the tech giant prefers to focus on SoC designs, in which multiple functions sit on the same chip or on multiple chips inside one package, rather than integrating components on a motherboard.
AWS is another cloud provider showing interest in the chip market. Last year, the company launched its custom-built AWS Inferentia chips as part of a move towards hardware specialisation. Inferentia’s performance convinced AWS to deploy the chips for its popular Alexa service, which relies on machine learning for functions such as speech processing.
Amazon’s EC2 Inf1 instances are powered by AWS Inferentia chips, which can deliver up to 30% higher throughput and up to 45% lower cost per inference. By contrast, Amazon EC2 F1 instances use field programmable gate arrays (FPGAs) to deliver custom hardware acceleration, according to Analytics India Magazine.
How is the semiconductor industry coping with the chip shortage?
Whilst semiconductor companies have managed to increase chip throughput since the pandemic, production is not out of the woods yet. Lead times can currently exceed four months, and can stretch to ten months if a product is moved to another manufacturing site. Changing manufacturer altogether can add at least another 12 months, and some chips contain manufacturer-specific intellectual property that may require alterations or licensing.
Despite the promising innovation currently being seen in core data centre servers, Stephen Simpson argues that it is also important to address latency in data-intensive network communication. He says that the emerging trend is to “offload these responsibilities – including encryption and data loading – to a dedicated processing unit that also provides advanced security and accelerated data-movement capabilities”.
A range of companies offer such technologies, which vary considerably in sophistication and price. They are usually positioned as SmartNICs (cards that combine wired networking and computational resources to offload tasks from server CPUs) and may be based on field programmable gate array (FPGA), application-specific integrated circuit (ASIC), or SoC technology. Data-processing units (DPUs), a specific type of SoC, have also been employed in several new chip designs.
Simpson goes on to say: “Some products go significantly further and offer important capabilities in data-pipeline management to significantly reduce network latency, support inline crypto acceleration, enable the highly secure ‘enclave’ isolation of different organisations’ data sets, and provide the ability to feed network data directly into graphics processing units (GPUs) for machine learning predictions. Of course, the capabilities of these devices need to match those of the next-generation servers.
“Organisations must carefully evaluate their current cloud-deployment architecture and evaluate how to best harness new setups proposed by cloud vendors based on proprietary hardware. In addition, they should assess the cost and timeline of these contracts to optimise new technologies. Since the hardware and possible efficient solution of the aforementioned use cases now become available as a service, leveraging these new services when encountering specific optimisation problems will be key,” he said.
In short, the cloud infrastructure industry is changing every day; it’s up to data centre companies and cloud providers to make use of the advantages offered by SoCs so that they can keep up for years to come.