Maia 200 to Boost Inference Power in Azure Data Centres

Microsoft has introduced Maia 200, its first in-house AI accelerator focused on inference, now running in Azure data centres.
Built on TSMC's 3-nanometre process and featuring a redesigned memory architecture, the chip is tailored for large-scale AI workloads, offering high efficiency and performance per dollar.
Maia 200 is equipped with 216GB of HBM3e memory delivering 7 TB/s of bandwidth, 272MB of on-chip SRAM, and data movement engines designed to keep large language models constantly active.
It targets inference use cases such as token generation and synthetic data pipelines, including work with OpenAI's GPT-5.2 models and internal model development by Microsoft's Superintelligence team.
Writing on LinkedIn, Scott Guthrie, Executive Vice President at Microsoft, says: "As AI workloads get bigger and more complex, we are engineering the full stack from our custom-built silicon all the way to the data centre. Today we launched Maia 200, our next-generation AI accelerator chip.
"Maia 200 is an AI inference powerhouse: the most performant first-party silicon from any hyperscaler, with three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance above Google's seventh-generation TPU. It's also the most efficient inference system we've ever deployed, delivering 30% better performance per dollar than the latest hardware in our fleet.
"Already running in our Iowa data centre with impressive throughput, Maia 200 is accelerating today's multimodal, multicall AI workloads with faster inference and higher output at scale."
Deployed and integrated at data centre scale
Maia 200 is now deployed in Microsoft's US Central data centre region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, next in line.
Additional regions are scheduled to follow. The chip is fully integrated with Azure's control plane and services, with native support for security, telemetry and diagnostics at the chip and rack levels.
Each Maia 200 chip houses more than 140 billion transistors and delivers over 10 petaFLOPS of 4-bit precision (FP4) and over 5 petaFLOPS of 8-bit (FP8) performance, all within a 750W SoC thermal envelope.
These capabilities are optimised for low-precision compute used in modern inference models, while still allowing headroom for larger models.
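Some back-of-envelope arithmetic helps put these headline figures in context. The sketch below is not from Microsoft: it treats the quoted "over" values as exact and the 750W thermal envelope as power draw (both assumptions), and the function names are purely illustrative. It derives peak compute per watt and the roofline ridge point, meaning the number of operations a workload must perform per byte read from HBM before compute, rather than memory bandwidth, becomes the limit.

```python
# Figures quoted in the article; "over X" lower bounds are treated as exact (assumption).
FP4_FLOPS = 10e15        # over 10 petaFLOPS at FP4
FP8_FLOPS = 5e15         # over 5 petaFLOPS at FP8
HBM_BYTES_PER_S = 7e12   # 7 TB/s of HBM3e bandwidth
ENVELOPE_WATTS = 750     # SoC thermal envelope, used here as a stand-in for power draw


def tflops_per_watt(flops: float, watts: float) -> float:
    """Peak compute per watt, in teraFLOPS per watt."""
    return flops / watts / 1e12


def ridge_point(flops: float, bytes_per_s: float) -> float:
    """Roofline ridge point: operations per HBM byte needed to become compute-bound."""
    return flops / bytes_per_s


if __name__ == "__main__":
    for name, flops in (("FP4", FP4_FLOPS), ("FP8", FP8_FLOPS)):
        print(
            f"{name}: {tflops_per_watt(flops, ENVELOPE_WATTS):.1f} TFLOPS/W, "
            f"ridge point {ridge_point(flops, HBM_BYTES_PER_S):.0f} ops/byte"
        )
```

On these assumptions the chip peaks at roughly 13 teraFLOPS per watt at FP4 and about half that at FP8, and a workload needs on the order of 1,400 FP4 operations per byte fetched from HBM before compute becomes the limiting factor.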
Microsoft also addresses data movement, a common bottleneck in AI performance. The chip's memory system uses narrow-precision datatypes, dedicated DMA engines and an on-die network-on-chip fabric to increase throughput.
This architecture improves the rate at which tokens are processed and models are fed with new inputs.
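A rough calculation shows why memory bandwidth and narrow datatypes matter for token generation. The sketch below is a simplified model, not Microsoft's methodology: it assumes a dense model running on a single chip whose weights are all read from HBM once per decoding step, and it ignores the KV cache, on-chip SRAM reuse and multi-chip sharding. The 200-billion-parameter model is hypothetical, and capacity is taken as decimal gigabytes.

```python
HBM_CAPACITY_BYTES = 216e9   # 216GB of HBM3e (decimal gigabytes assumed)
HBM_BYTES_PER_S = 7e12       # 7 TB/s of bandwidth


def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Memory footprint of the weights alone."""
    return params * bits_per_weight / 8


def fits_in_hbm(params: float, bits_per_weight: int) -> bool:
    """True if the weights alone fit in one chip's HBM (no KV cache or activations)."""
    return weight_bytes(params, bits_per_weight) <= HBM_CAPACITY_BYTES


def max_decode_steps_per_s(params: float, bits_per_weight: int) -> float:
    """Upper bound on decoding steps per second if every weight is read once per step."""
    return HBM_BYTES_PER_S / weight_bytes(params, bits_per_weight)


if __name__ == "__main__":
    params = 200e9  # hypothetical dense model size
    for bits in (16, 8, 4):
        print(
            f"{bits}-bit weights: {weight_bytes(params, bits) / 1e9:.0f} GB, "
            f"fits: {fits_in_hbm(params, bits)}, "
            f"at most {max_decode_steps_per_s(params, bits):.1f} steps/s"
        )
```

Under this model, halving the weight precision halves the bytes moved per step and doubles the bandwidth-bound ceiling: the hypothetical model does not fit on one chip at 16 bits, fits at 8 bits with a ceiling of 35 steps per second, and reaches 70 steps per second at 4 bits. Batching several requests per step raises tokens per second further until compute becomes the limit.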
System-level innovation and network design
At the system level, Maia 200 introduces a two-tier scale-up network architecture using standard Ethernet rather than proprietary fabrics.
This choice enables broad scalability and cost efficiency, while maintaining high performance and reliability.
Each Maia accelerator connects with 2.8 TB/s of dedicated, bidirectional scale-up bandwidth and supports collective operations across clusters of up to 6,144 accelerators.
Inside each tray, four accelerators are linked directly with non-switched connections, ensuring high-bandwidth local communication.
The same transport protocols are used across trays, racks and entire clusters, creating a consistent and programmable fabric for inference workloads.
This unified networking design simplifies cluster management, minimises network latency and reduces power consumption, lowering total cost of ownership across Microsoft's Azure fleet.
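The cluster figures above can be combined into a few aggregate numbers. The following is an illustrative calculation only: it assumes a maximum-size cluster made up entirely of four-accelerator trays, and it assumes the four accelerators in a tray are linked pairwise in a full mesh, which the description of direct, non-switched connections suggests but does not confirm. The aggregate bandwidth is a simple sum of per-chip figures, not a measure of bisection bandwidth.

```python
ACCELERATORS_PER_CLUSTER = 6144        # maximum cluster size quoted above
ACCELERATORS_PER_TRAY = 4              # directly linked accelerators per tray
HBM_GB_PER_ACCELERATOR = 216           # HBM3e capacity per chip
SCALE_UP_TBPS_PER_ACCELERATOR = 2.8    # bidirectional scale-up bandwidth per chip


def trays_per_cluster() -> int:
    """Number of four-accelerator trays in a maximum-size cluster."""
    return ACCELERATORS_PER_CLUSTER // ACCELERATORS_PER_TRAY


def direct_links_per_tray() -> int:
    """Point-to-point links in a tray, assuming a full mesh (an assumption)."""
    n = ACCELERATORS_PER_TRAY
    return n * (n - 1) // 2


def aggregate_hbm_tb() -> float:
    """Total HBM across the cluster, in decimal terabytes."""
    return ACCELERATORS_PER_CLUSTER * HBM_GB_PER_ACCELERATOR / 1000


def aggregate_scale_up_tbps() -> float:
    """Sum of per-chip scale-up bandwidth across the cluster, in TB/s."""
    return ACCELERATORS_PER_CLUSTER * SCALE_UP_TBPS_PER_ACCELERATOR


if __name__ == "__main__":
    print(f"Trays per cluster: {trays_per_cluster()}")
    print(f"Direct links per tray (full mesh assumed): {direct_links_per_tray()}")
    print(f"Aggregate HBM: {aggregate_hbm_tb():,.1f} TB")
    print(f"Summed scale-up bandwidth: {aggregate_scale_up_tbps():,.1f} TB/s")
```

On these assumptions, a full cluster comprises 1,536 trays and holds about 1.3 petabytes of HBM in total.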
Software stack and data centre readiness
Microsoft previews the Maia software development kit (SDK) alongside the hardware rollout.
The SDK includes integration with PyTorch, a Triton compiler, optimised kernel libraries and access to a low-level programming language designed for Maia.
Developers can use the SDK to port models across hardware targets or tune performance for specific use cases.
To reduce deployment time, Microsoft began validating its silicon and systems before fabrication.
Maia 200 was developed using a pre-silicon modelling environment that simulates LLM workloads in detail, enabling optimisation across silicon, networking and software before production.
The company also built out core data centre systems during this phase, including second-generation, liquid-cooled heat exchanger units.
As a result, Maia 200 was production-ready within days of silicon arrival and was installed in data centres in less than half the time of prior infrastructure programmes.
This co-engineered approach, combining chip design with system software and data centre integration, allows Microsoft to deliver higher utilisation, lower cost per watt and faster time to production at global scale.


