Nvidia Google Cloud Partnership Targets Regulated Sectors

Nvidia and Google Cloud have expanded their partnership through the deployment of Gemini AI models on Blackwell GPU architecture.
The collaboration targets healthcare, financial services and government organisations that require data sovereignty or operate within air-gapped environments.
Google Cloud has become the first cloud service provider to offer Nvidia's HGX B200 and GB200 NVL72 processors through its A4 and A4X virtual machines. The partnership extends beyond infrastructure provision to include engineering optimisation of the computing stack supporting AI applications.
This comes at a time when traditional cloud-based AI services often cannot accommodate organisations with stringent compliance mandates. Regulated sectors face increasing pressure to implement AI whilst maintaining control over data location and access.
Through the partnership, both companies aim to address these deployment requirements by enabling organisations to keep data within their own infrastructure.
Combining compute capabilities
Google Cloud's A4 virtual machines, accelerated by Nvidia HGX B200 processors, are now available for commercial use. These systems operate within Google's AI Hypercomputer architecture, which combines processors, networking and storage optimised for machine learning workloads.
The A4X virtual machines deliver over one exaflop of computational capacity per rack, equivalent to one quintillion floating-point operations per second. These systems support scaling to tens of thousands of graphics processing units through Google's Jupiter network fabric and Nvidia ConnectX-7 network interface cards.
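To put the per-rack figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the count of 72 GPUs per rack is an assumption taken from the "NVL72" product name, the one-exaflop figure is treated as an exact lower bound, and the numeric precision behind the headline number is not stated in the article. The function name is hypothetical.

```python
# Rough unit arithmetic for the "over one exaflop per rack" figure.
# Assumptions (not stated in the article): 72 GPUs per GB200 NVL72 rack,
# inferred from the product name, and exactly 1 exaflop as a lower bound.

EXAFLOP = 10**18   # one quintillion floating-point operations per second
PETAFLOP = 10**15  # one quadrillion floating-point operations per second
GPUS_PER_RACK = 72 # assumed from the "NVL72" name


def per_gpu_petaflops(rack_flops: float, gpus: int) -> float:
    """Return the average share of rack throughput per GPU, in petaflops."""
    return rack_flops / gpus / PETAFLOP


if __name__ == "__main__":
    share = per_gpu_petaflops(EXAFLOP, GPUS_PER_RACK)
    print(f"Average per-GPU share: about {share:.1f} petaflops")
```

Under these assumptions, each GPU contributes roughly 13.9 petaflops to the rack total. The real figure depends on the precision format and workload, so it should be read as an order-of-magnitude guide.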
Additionally, Google's third-generation liquid cooling infrastructure maintains performance for large-scale AI workloads by managing heat generation from high-performance processors. This cooling system enables sustained operation of dense computing configurations required for training and running large language models.
Organisations can access these virtual machines through managed services including Vertex AI and Google Kubernetes Engine. The infrastructure enables development and deployment of agentic AI applications across various computing scales.
Gemini on-premises
Google Distributed Cloud, the tech giant’s managed solution for on-premises and air-gapped environments, will support Nvidia Blackwell platforms to enable secure deployment of Gemini models within customer data centres.
Air-gapped environments operate without external network connections to maintain security isolation.
This capability is intended to address requirements from public sector, healthcare and financial services organisations that must comply with data residency regulations or maintain security controls. These sectors often cannot use cloud-based AI services because of regulatory constraints on data location and access.
In addition to this, Nvidia Blackwell's confidential computing capabilities protect user prompts and model fine-tuning data during processing. Confidential computing uses hardware-based security features to encrypt data whilst it remains in use, preventing unauthorised access even by system administrators or cloud providers.
The on-premises deployment option expands access to Google's Gemini models for organisations that previously could not use cloud-based AI services due to compliance requirements. This enables customers to implement agentic AI applications whilst maintaining control over their data and meeting privacy standards.
Nvidia optimisation to improve AI performance
The Gemini family of models represents Google's approach to multimodal AI systems that can process text, images and other data types within a single model architecture. These models demonstrate capabilities in complex reasoning, code generation and understanding relationships between different types of information.
Nvidia and Google have implemented performance optimisations so that Gemini inference workloads run efficiently on Nvidia GPUs, particularly within Google Cloud's Vertex AI platform.
These optimisations enable Google to serve high volumes of user queries for Gemini models on Nvidia-accelerated infrastructure across Vertex AI and Google Distributed Cloud.
Likewise, the Gemma family of open models has been optimised for inference using Nvidia's TensorRT-LLM library. TensorRT-LLM accelerates large language model inference by optimising neural network operations for Nvidia hardware.
These models are expected to become available as Nvidia NIM microservices, which package AI models as containerised applications for simplified deployment.
Such optimisation could enable deployment across various architectures, from data centres to local Nvidia RTX-powered personal computers and workstations, allowing developers to run AI workloads on infrastructure that matches their performance requirements and deployment constraints.

