Maia 200 to Boost Inference Power in Azure Data Centres

By Megan Baggiony-Taylor

January 27, 2026

Microsoft unveils Maia 200, a high-performance AI accelerator built for low-latency inference and already running in Azure data centres in the US

Microsoft has introduced Maia 200, its next-generation in-house AI accelerator, designed specifically for inference and now running in Azure data centres.

Built on TSMC’s 3-nanometre process and featuring a redesigned memory architecture, the chip is tailored for large-scale AI workloads, offering high efficiency and performance per dollar.

Maia 200 is equipped with 216GB of HBM3e memory delivering 7 TB/s of bandwidth, 272MB of on-chip SRAM, and data movement engines designed to keep large language models fed with data and highly utilised.
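To put the memory figures in perspective, the sketch below does some back-of-envelope arithmetic. The 216GB capacity and 7 TB/s bandwidth come from the article; decimal units, the model sizes and the one-byte (FP8) and half-byte (FP4) weight sizes are assumptions for illustration. The decode bound assumes a single request whose every weight is read from HBM once per generated token, ignoring the KV cache, activations and any reuse from SRAM, so real throughput will differ.

```python
# Back-of-envelope memory arithmetic for a single Maia 200 chip.
# Capacity and bandwidth are from the article; decimal units (1 GB = 1e9 bytes)
# and the example model sizes are assumptions for illustration only.
HBM_BYTES = 216e9  # 216 GB of HBM3e
HBM_BW = 7e12      # 7 TB/s of HBM bandwidth

# Bytes used to store one weight at each precision (scale factors ignored).
BYTES_PER_PARAM = {"fp8": 1.0, "fp4": 0.5}


def weight_bytes(params: float, dtype: str) -> float:
    """Memory needed to hold a model's weights at the given precision."""
    return params * BYTES_PER_PARAM[dtype]


def fits_in_hbm(params: float, dtype: str) -> bool:
    """True if the weights alone fit in HBM (KV cache and activations ignored)."""
    return weight_bytes(params, dtype) <= HBM_BYTES


def decode_tokens_per_s_bound(params: float, dtype: str) -> float:
    """Upper bound for batch-1 decoding: every weight is read from HBM once per token."""
    return HBM_BW / weight_bytes(params, dtype)


if __name__ == "__main__":
    print(f"Reading all of HBM once: {HBM_BYTES / HBM_BW * 1e3:.1f} ms")
    for dtype in ("fp8", "fp4"):
        # Hypothetical 400B-parameter model for capacity, 70B for the decode bound.
        print(dtype, fits_in_hbm(400e9, dtype), decode_tokens_per_s_bound(70e9, dtype))
```

Under these assumptions a hypothetical 400-billion-parameter model fits on one chip only at FP4, and halving the bytes per weight doubles the bandwidth-bound token rate, which is one reason low-precision formats matter so much for inference.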

It targets inference workloads such as token generation and synthetic data pipelines, including workloads running OpenAI’s GPT-5.2 models and internal model development by Microsoft’s Superintelligence team.

Maia 200, Microsoft's in-house AI accelerator built for inference (Credit: Microsoft)

Writing on LinkedIn, Scott Guthrie, Executive Vice President of Microsoft's Cloud + AI Group, says: “As AI workloads get bigger and more complex, we are engineering the full stack from our custom-built silicon all the way to the data centre. Today we launched Maia 200, our next-generation AI accelerator chip.

“Maia 200 is an AI inference powerhouse: the most performant first‑party silicon from any hyperscaler, with three times the FP4 performance of Amazon’s third‑generation Trainium and FP8 performance above Google’s seventh‑generation TPU. It’s also the most efficient inference system we’ve ever deployed, delivering 30% better performance per dollar than the latest hardware in our fleet.

“Already running in our Iowa data centre with impressive throughput, Maia 200 is accelerating today’s multimodal, multicall AI workloads with faster inference and higher output at scale.”
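A 30% gain in performance per dollar does not mean a 30% cut in cost. The short calculation below converts the figure into the relative cost of a fixed amount of work, assuming "performance" means throughput on the same workload and that the baseline is whatever fleet hardware Microsoft compared against.

```python
def cost_per_unit_work(perf_per_dollar_gain: float) -> float:
    """Cost of a fixed amount of work relative to the baseline (baseline = 1.0)."""
    # Work per dollar rises by (1 + gain), so dollars per unit of work fall by its inverse.
    return 1.0 / (1.0 + perf_per_dollar_gain)


relative = cost_per_unit_work(0.30)
print(f"cost per unit of work vs. baseline: {relative:.3f} ({(1 - relative) * 100:.1f}% lower)")
```

On that reading, the same number of tokens would cost roughly 23% less than on the baseline hardware.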

Deployed and integrated at data centre scale

Maia 200 is now deployed in Microsoft’s US Central data centre region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, next in line.

Additional regions are scheduled to follow. The chip is fully integrated with Azure’s control plane and services, with native support for security, telemetry and diagnostics at the chip and rack levels.

Each Maia 200 chip contains more than 140 billion transistors and delivers over 10 petaFLOPS at 4-bit precision (FP4) and over 5 petaFLOPS at 8-bit precision (FP8), all within a 750 W SoC thermal envelope.
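A simple roofline model shows how these compute figures relate to the 7 TB/s memory bandwidth. It treats the article's "over 10" and "over 5" petaFLOPS as exact values, and the FLOPs-per-watt figure divides peak compute by the 750 W envelope, so it is a theoretical ceiling rather than a measured efficiency. The batch-1 example assumes a matrix-vector product with FP8 weights performs about two FLOPs (one multiply, one add) per weight byte.

```python
# Simple roofline model for one Maia 200 chip.
# Peak figures are the article's "over 10 / over 5 petaFLOPS", treated as exact;
# HBM bandwidth (7 TB/s) and the 750 W SoC envelope are also from the article.
PEAK_FLOPS = {"fp4": 10e15, "fp8": 5e15}
HBM_BW = 7e12      # bytes per second
SOC_WATTS = 750.0


def ridge_point(dtype: str) -> float:
    """Arithmetic intensity (FLOPs per byte) at which the chip becomes compute-bound."""
    return PEAK_FLOPS[dtype] / HBM_BW


def attainable_flops(intensity: float, dtype: str) -> float:
    """Roofline: performance is capped by peak compute or by bandwidth * intensity."""
    return min(PEAK_FLOPS[dtype], intensity * HBM_BW)


def peak_flops_per_watt(dtype: str) -> float:
    """Theoretical peak FLOPs per watt of the SoC envelope (not a measured value)."""
    return PEAK_FLOPS[dtype] / SOC_WATTS


if __name__ == "__main__":
    for dtype in ("fp4", "fp8"):
        print(f"{dtype}: ridge {ridge_point(dtype):.0f} FLOP/byte, "
              f"{peak_flops_per_watt(dtype) / 1e12:.1f} TFLOPS/W peak")
    # Batch-1 FP8 matrix-vector product: ~2 FLOPs per weight byte, far below the ridge.
    print(f"batch-1 FP8 GEMV: {attainable_flops(2.0, 'fp8') / 1e12:.0f} TFLOPS attainable")
```

The ridge point of roughly 1,400 FLOPs per byte at FP4 means small-batch decoding is firmly memory-bound, which is why the chip's data movement design matters as much as its peak compute.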

These capabilities are optimised for low-precision compute used in modern inference models, while still allowing headroom for larger models.
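The article does not say which 4-bit format Maia 200 implements. As an illustration only, the sketch below assumes the E2M1 FP4 element format from the OCP Microscaling (MX) specification, whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, and quantizes a block of values that share one scale factor. Two simplifications are labelled in the code: the scale is an ordinary float rather than the MX shared power-of-two exponent, and ties round toward the smaller magnitude instead of to nearest-even.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (assumed format, per OCP MX).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0


def quantize_block_fp4(block: list[float]) -> tuple[float, list[float]]:
    """Quantize a block of values to E2M1 codes sharing one scale factor.

    Simplified: the scale is a plain float (not a power of two) and ties
    round toward the smaller magnitude rather than to nearest-even.
    """
    amax = max((abs(x) for x in block), default=0.0)
    # Map the largest magnitude in the block onto the largest E2M1 value.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # min() keeps the first (smaller) candidate when two are equally close.
        nearest = min(E2M1_MAGNITUDES, key=lambda v: abs(v - mag))
        codes.append(math.copysign(nearest, x))
    return scale, codes


def dequantize_block_fp4(scale: float, codes: list[float]) -> list[float]:
    """Recover approximate original values from the shared scale and FP4 codes."""
    return [scale * c for c in codes]


scale, codes = quantize_block_fp4([3.0, -1.5, 0.6])
print(scale, codes, dequantize_block_fp4(scale, codes))
```

Here 3.0 and -1.5 survive exactly, while 0.6 comes back as 0.5: each value costs only four bits plus a share of the block scale, at the price of coarse rounding.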

Microsoft also tackles data movement, a common bottleneck in AI performance. The chip’s memory subsystem is built around narrow-precision datatypes, dedicated DMA engines and an on-die network-on-chip (NoC) fabric to increase throughput.

This architecture improves the rate at which tokens are processed and models are fed with new inputs.
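The article does not describe how Maia's DMA engines are programmed. The sketch below models the generic double-buffering technique that dedicated DMA engines make possible: while the compute units work on one tile held in on-chip SRAM, a transfer fills a second buffer with the next tile. The tile count and timings are hypothetical.

```python
def serial_time(n_tiles: int, t_load: float, t_compute: float) -> float:
    """Load each tile, then compute on it, with no overlap."""
    return n_tiles * (t_load + t_compute)


def double_buffered_time(n_tiles: int, t_load: float, t_compute: float) -> float:
    """Two on-chip buffers: the load of tile i+1 overlaps compute on tile i.

    The first load and the last compute cannot be hidden; every step in
    between costs whichever of the two operations is slower.
    """
    if n_tiles <= 0:
        return 0.0
    return t_load + (n_tiles - 1) * max(t_load, t_compute) + t_compute


if __name__ == "__main__":
    # Hypothetical workload: 100 tiles, 2 us to load a tile, 1.5 us to compute on it.
    n, load_us, compute_us = 100, 2.0, 1.5
    print("serial:", serial_time(n, load_us, compute_us), "us")
    print("double-buffered:", double_buffered_time(n, load_us, compute_us), "us")
```

With these numbers the overlapped schedule takes 201.5 us instead of 350 us. Once transfers are hidden, the remaining time is set by the slower of loading and computing, which is why raw bandwidth still matters.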

Maia 200 infographic from Microsoft Azure showing the chip's capabilities (Credit: Microsoft)

System-level innovation and network design

At the system level, Maia 200 introduces a two-tier scale-up network architecture using standard Ethernet rather than proprietary fabrics.

This choice enables broad scalability and cost efficiency, while maintaining high performance and reliability.

Each Maia accelerator connects with 2.8 TB/s of dedicated, bidirectional scale-up bandwidth and supports collective operations across clusters of up to 6,144 accelerators.

Inside each tray, four accelerators are linked directly with non-switched connections, ensuring high-bandwidth local communication.
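The sketch below turns the scale-up figures into a rough collective-communication estimate. Several assumptions are labelled in the code: the 2.8 TB/s bidirectional figure is split evenly into 1.4 TB/s each way, the full per-direction bandwidth is available to the collective, and the largest cluster is built entirely from four-chip trays. Ring all-reduce is a textbook algorithm chosen for illustration; the article does not say which collective algorithms Maia 200 uses.

```python
# Scale-up network arithmetic from the article's figures.
# Assumption: 2.8 TB/s bidirectional splits evenly into 1.4 TB/s per direction.
BIDIR_BW = 2.8e12              # bytes/s per accelerator, both directions combined
PER_DIRECTION_BW = BIDIR_BW / 2
MAX_CLUSTER = 6_144            # accelerators in the largest collective domain
CHIPS_PER_TRAY = 4


def ring_allreduce_bytes_sent(n_ranks: int, message_bytes: float) -> float:
    """Bytes each rank sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (n_ranks - 1) / n_ranks * message_bytes


def ring_allreduce_seconds(n_ranks: int, message_bytes: float,
                           bw: float = PER_DIRECTION_BW) -> float:
    """Bandwidth-only estimate; ignores per-hop latency and link contention."""
    return ring_allreduce_bytes_sent(n_ranks, message_bytes) / bw


if __name__ == "__main__":
    # Assumes the largest cluster is made entirely of full four-chip trays.
    print("trays in a full cluster:", MAX_CLUSTER // CHIPS_PER_TRAY)
    # Hypothetical 1 GB all-reduce among the four chips of one tray.
    print(f"1 GB all-reduce in a tray: {ring_allreduce_seconds(4, 1e9) * 1e3:.2f} ms")
```

Because each rank sends close to twice the message size regardless of ring length, the estimate is dominated by per-chip bandwidth; latency and topology effects, which this sketch ignores, grow in importance at the 6,144-accelerator scale.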

The same transport protocols are used across trays, racks and entire clusters, creating a consistent and programmable fabric for inference workloads.

This unified networking design simplifies cluster management, minimises network latency and reduces power consumption, lowering total cost of ownership across Microsoft’s Azure fleet.

Maia 200 server blade (Credit: Microsoft)

Software stack and data centre readiness

Alongside the hardware rollout, Microsoft is previewing the Maia software development kit (SDK).

The SDK includes integration with PyTorch, a Triton compiler, optimised kernel libraries and access to a low-level programming language designed for Maia.

Developers can use the SDK to port models across hardware targets or tune performance for specific use cases.

To shorten deployment time, Microsoft began validating its silicon and systems before fabrication.

Maia 200 was developed using a pre-silicon modelling environment that simulates LLM workloads in detail, enabling optimisation across silicon, networking and software before production.

The company also built out core data centre systems during this phase, including second-generation, liquid-cooled heat exchanger units.

As a result, Maia 200 was production-ready within days of the silicon's arrival and was installed in data centres in less than half the time of prior infrastructure programmes.

This co-engineered approach, combining chip design with system software and data centre integration, allows Microsoft to deliver higher utilisation, lower cost per watt and faster time to production at global scale.
