Engines & LLMs
Microsoft Researchers Squeeze AI onto CPUs with Tiny 1-bit Model
In a significant step towards running powerful AI locally, Microsoft researchers have developed an incredibly efficient 1-bit large language model (LLM). Dubbed BitNet b1.58, this 2-billion-parameter model is reportedly lightweight enough to run effectively on standard CPUs, potentially even on chips like the Apple M2, without needing specialized GPUs or NPUs.
The key innovation lies in its “1-bit” architecture. While it technically uses 1.58 bits per weight to represent three values (-1, 0, +1), this is drastically smaller than the typical 16-bit or 32-bit formats used in most LLMs. That massive reduction in data size dramatically cuts both the memory and the computational power needed for inference.
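To see where the "1.58" comes from and how ternary weights work, here is a minimal illustrative sketch of an absmean-style ternary quantizer. This is an assumption-laden toy, not Microsoft's actual training or inference code, which uses a different, learned quantization pipeline:

```python
import math

def quantize_ternary(weights):
    """Toy absmean ternary quantizer (illustrative, not BitNet's real code).

    Scales weights by their mean absolute value, then rounds each
    one to the nearest value in {-1, 0, +1}.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

# Three possible states carry log2(3) ≈ 1.58 bits of information
# per weight -- hence the "1.58-bit" name.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f} bits per weight")

ternary, scale = quantize_ternary([0.42, -1.3, 0.05, 0.9, -0.02])
print(ternary)  # [1, -1, 0, 1, 0]
```

Because every weight is -1, 0, or +1, matrix multiplications reduce to additions and subtractions, which is why such models map well onto ordinary CPUs.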
Published as open source on Hugging Face, BitNet b1.58 was trained on a hefty 4 trillion tokens. While smaller models often sacrifice accuracy, Microsoft claims this BitNet variant holds its own against comparable, larger models like Meta’s Llama and Google’s Gemma on several benchmarks, even topping a few. Crucially, it requires only around 400MB of memory (excluding embeddings) – a fraction of what similar-sized models need.
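The ~400MB figure lines up with a simple back-of-envelope calculation (illustrative arithmetic only; the exact footprint depends on packing, embeddings, and runtime overhead):

```python
params = 2_000_000_000   # 2-billion-parameter model
bits_ternary = 1.58      # bits per ternary weight
bits_fp16 = 16           # bits per weight in a typical 16-bit model

ternary_mb = params * bits_ternary / 8 / 1e6
fp16_mb = params * bits_fp16 / 8 / 1e6

print(f"ternary: {ternary_mb:.0f} MB")  # 395 MB -- matches the reported ~400MB
print(f"fp16:    {fp16_mb:.0f} MB")     # 4000 MB, roughly 10x larger
```

The order-of-magnitude gap against 16-bit weights is what puts a 2B-parameter model within reach of laptop-class memory.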
To achieve these efficiency gains, the model must be run using Microsoft’s custom bitnet.cpp inference framework, available on GitHub. Standard frameworks won’t deliver the same performance benefits.
This research tackles the high energy consumption and hardware demands often associated with AI. Developing models that can run efficiently on everyday hardware like CPUs could democratize AI access, reduce reliance on large data centers, and bring advanced AI capabilities to a wider range of devices.
Our Take
Okay, a 1-bit (ish) AI model from Microsoft that can run on a regular CPU? That’s pretty cool. It tackles one of the biggest AI hurdles: the need for beefy, power-hungry hardware. Making AI this lightweight could seriously shake things up.
Imagine capable AI running locally on phones or laptops without killing the battery or needing an expensive GPU. While there’s usually a trade-off between size and smarts, Microsoft seems to be closing that gap here. This kind of efficiency focus is exactly what we need to make powerful AI more accessible and maybe even a bit more sustainable.
This story was originally featured on Tom’s Hardware.