SLMs vs LLMs: Which is the Greener Choice for Data Centres?

The rapid rise of AI is reshaping data centre operations, but the environmental impact of large language models (LLMs) is proving hard to ignore.
Models such as OpenAI’s GPT-4 and Google’s Gemini deliver impressive capabilities, yet they consume vast amounts of electricity and water.
Training GPT-3 alone required an estimated 1,287 MWh, the equivalent of powering 120 US homes for a year, while AI-linked cooling demands have driven up water consumption at major tech firms.
In response, small language models (SLMs) are emerging as a more efficient and sustainable alternative.
With parameter counts ranging from a few million to around 10 billion, SLMs deliver targeted functionality with significantly lower resource demands, making them well suited to data centre environments focused on efficiency and cost control.
SLMs vs LLMs in data centre strategy
SLMs are built on the same transformer architecture as LLMs but are optimised using techniques such as knowledge distillation, pruning and quantisation. This allows them to retain high performance for specific tasks while using less compute, storage and memory.
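To make the effect of quantisation concrete, the rough sketch below estimates the memory needed just to hold a model's weights at different numeric precisions. The function name `weight_memory_gb` is hypothetical, the 10-billion-parameter figure is simply the upper end of the SLM range quoted above, and the estimate deliberately ignores activations, caches and runtime overhead, so real deployments will need more.

```python
# Illustrative estimate only: weights-only footprint, ignoring activations,
# key-value caches and framework overhead.

# Storage width of one parameter at common precisions (bytes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 10e9  # assumed model size: upper end of the SLM range
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, precision)
        print(f"{precision}: {gb:.1f} GB")
```

On this simplified view, moving the same model from 16-bit to 4-bit weights cuts weight storage to a quarter, which is one reason quantised SLMs can fit on edge hardware.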
Unlike LLMs, SLMs can often run on edge devices or minimal on-premises infrastructure, reducing dependence on energy-intensive centralised facilities. This aligns with the principles of Green AI – prioritising efficiency, environmental responsibility and inclusivity.
Their smaller size also makes SLMs easier to audit and explain, which is critical for regulated sectors like banking or healthcare.
In data centres, their compact design enables deployment wherever latency, privacy or compliance demands it, rather than relying solely on the public cloud.
Microsoft’s Phi-4 series
Microsoft is among the leaders in SLM development, with its Phi-4 series designed to reduce AI’s environmental impact while maintaining performance.
“The energy intensity of advanced cloud and AI services has driven us to accelerate our efforts to drive efficiencies and energy reductions,” says Melanie Nakagawa, Microsoft’s Chief Sustainability Officer.
“As AI scenarios increase in complexity, we’re empowering developers to build and optimise AI models that can achieve similar outcomes while requiring fewer resources.”
Phi-4-multimodal contains 5.6 billion parameters and handles speech, vision and text. Its word error rate of 6.14% set a new record on Hugging Face’s OpenASR leaderboard.
“Phi-4-multimodal marks a new milestone in Microsoft’s AI development as our first multimodal language model,” says Weizhu Chen, Technical Fellow, CVP, Gen AI at Microsoft.
“Whether interpreting spoken language, analysing images or processing text, it delivers highly efficient, low-latency inference – all while optimising for on-device execution and reduced computational overhead.”
Phi-4-mini, with 3.8 billion parameters, is tuned for reasoning, maths and code generation. It supports sequences of up to 128,000 tokens, making it capable of handling long documents efficiently – an advantage for enterprise-scale data processing in data centres.
- 5.6B - Parameters in the Phi-4-multimodal model, fewer than most competing multimodal systems
- 6.14% - Word error rate on the Hugging Face OpenASR leaderboard, representing a new benchmark record
- 128,000 - Maximum token sequence length supported by the Phi-4-mini model, enabling processing of extensive text
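For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the leaderboard's own scoring code; it assumes plain whitespace tokenisation, whereas public benchmarks typically normalise casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty transcript
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word inserted in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one dropped word ("the"): 2 errors in 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A score of 6.14% therefore corresponds to roughly six word-level errors per hundred reference words.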
IBM’s Granite 3.2 models
IBM is also investing heavily in efficient AI. Its Granite 3.2 family is designed for business use, balancing capability with low resource requirements.
“The next era of AI is about efficiency, integration and real-world impact – where enterprises can achieve powerful outcomes without excessive spend on compute,” says Sriram Raghavan, Vice President of IBM AI Research.
Granite Vision 3.2 2B, a compact vision-language model, was trained on more than 85 million PDFs using IBM’s Docling toolkit. Despite its smaller size, it competes with larger models such as Meta’s Llama 3.2 11B in extracting and reasoning over complex documents.
Other releases include the Granite Guardian 3.2 safety model, now 30% smaller, and the TinyTimeMixers long-range forecaster, which extends predictive capability while maintaining efficiency.
Balancing performance and sustainability
For data centres, the appeal of SLMs lies in their ability to support enterprise AI needs while keeping energy and water consumption in check.
Hybrid strategies – pairing the broad capabilities of LLMs with the efficiency of SLMs – could become standard, allowing operators to allocate workloads dynamically for sustainability, compliance and speed.
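One way such dynamic allocation might look in practice is a simple routing rule that sends each request to the smallest model able to handle it. The sketch below is purely illustrative: the `Request` fields, the task categories, the tier names and the policy order are all assumptions made for this example, and the 128,000-token limit is borrowed from the Phi-4-mini figure above as a stand-in for an SLM's context window.

```python
from dataclasses import dataclass


@dataclass
class Request:
    task: str                        # hypothetical task label
    prompt_tokens: int               # size of the input in tokens
    must_stay_on_prem: bool = False  # compliance or privacy constraint


# Assumed set of narrow tasks an SLM handles well (illustrative only).
SLM_TASKS = {"classification", "extraction", "summarisation"}
# Assumed SLM context window, taken from the Phi-4-mini figure in the article.
SLM_CONTEXT_LIMIT = 128_000


def route(req: Request) -> str:
    """Pick a model tier for a request under the assumed policy."""
    # Compliance comes first: regulated data never leaves local infrastructure.
    if req.must_stay_on_prem:
        return "slm-on-prem"
    # Narrow tasks that fit the SLM's context go to the low-footprint model.
    if req.task in SLM_TASKS and req.prompt_tokens <= SLM_CONTEXT_LIMIT:
        return "slm"
    # Everything else falls back to the larger general-purpose model.
    return "llm"


if __name__ == "__main__":
    print(route(Request("extraction", 2_000)))
    print(route(Request("open-ended research", 2_000)))
    print(route(Request("extraction", 2_000, must_stay_on_prem=True)))
```

A production router would weigh more signals, such as current grid carbon intensity or queue depth, but the principle of defaulting to the smaller model and escalating only when needed stays the same.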
While LLMs will continue to define the outer limits of AI capability, SLMs are set to drive practical, low-footprint adoption at scale.
In an era where environmental performance is becoming a competitive differentiator, the efficiency gains from SLMs could prove decisive.
“Sustainability is good business. Sustainable business practices drive innovation,” concludes Melanie.



