NVIDIA's new AI model combines text, vision, and speech
The model has around 30 billion parameters

Apr 29, 2026
02:29 pm

What's the story

NVIDIA has unveiled a new artificial intelligence (AI) model, Nemotron 3 Nano Omni, which combines text, vision, and speech capabilities in a single model. With around 30 billion parameters, it uses a mixture-of-experts (MoE) architecture, in which only a subset of the parameters is active for each input, to deliver very low latency while giving developers flexibility and control.

Innovative design

Touted to be up to 9 times faster than rivals

The Nemotron 3 Nano Omni model pairs vision and audio encoders with NVIDIA's 30B-A3B hybrid MoE architecture. This design removes the need for separate perception modules, letting a single model handle every input type. According to NVIDIA, the result is better efficiency at scale and up to nine times higher throughput than other open omni models currently available.
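The article does not describe how NVIDIA's router works. The sketch below is a generic, hypothetical top-k mixture-of-experts layer in Python, meant only to illustrate why an MoE model can hold around 30 billion parameters yet still run quickly: a gating function scores every expert for each token, and only the top-k experts are actually computed. The names `moe_forward` and `gate_weights`, and the toy "experts" that simply scale their input, are illustrative assumptions, not part of Nemotron 3 Nano Omni.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, gate_weights, experts, k=2):
    """Route one token vector through the top-k experts of a toy MoE layer.

    token        -- list of floats (the token's hidden state)
    gate_weights -- one gate vector per expert, same length as token
    experts      -- callables mapping a vector to a vector of the same length
    Returns the mixed output vector and the indices of the experts used.
    """
    # Router: score every expert with a dot product against its gate vector.
    scores = [sum(t * w for t, w in zip(token, gw)) for gw in gate_weights]
    # Keep only the k best-scoring experts; the others are never evaluated,
    # which is why only a fraction of the parameters is active per token.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Normalize the selected scores into mixing weights that sum to 1.
    mix = softmax([scores[i] for i in top])
    out = [0.0] * len(token)
    for weight, i in zip(mix, top):
        y = experts[i](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Hypothetical toy setup: 4 experts, each just scales its input.
    experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
    gates = [[0.1, 0.0], [0.9, 0.0], [0.5, 0.0], [0.7, 0.0]]
    out, used = moe_forward([1.0, 0.0], gates, experts, k=2)
    print("experts used:", used)  # prints: experts used: [1, 3]
    print("output:", out)
```

Because only k experts run per token, the compute per token grows with k rather than with the total number of experts. Production MoE layers use learned gate matrices and full feed-forward networks as experts, which this toy version omits.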

Enhanced performance

The new model can help improve agentic AI applications

The new model is expected to significantly improve the performance of agentic AI applications. "To build useful agents, you can't wait seconds for a model to interpret a screen," said Gautier Cloix, CEO of H Company. He added, "By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings, something that wasn't practical before."

Versatile integration

The smaller size of the model makes it more versatile

The smaller size of the Nemotron 3 Nano Omni model also lets it run on higher-end consumer hardware as well as efficiently in enterprise cloud deployments. It is designed to work alongside proprietary cloud models or NVIDIA's other open Nemotron models, such as Nemotron 3 Super for high-frequency execution or Nemotron 3 Ultra for complex planning.

User-friendly deployment

The Nemotron 3 Nano Omni is available on Hugging Face

The new model can quickly interpret documents, computer screens, voice activity, video, and more, which makes it well suited as an interface for human-machine interaction. NVIDIA has made Nemotron 3 Nano Omni available on Hugging Face and OpenRouter, and on build.nvidia.com as an NVIDIA NIM microservice.
