Sarvam AI launches Vision, Bulbul V3 for Indic languages
Sarvam AI just dropped two models, Sarvam Vision and Bulbul V3, focused on documents and speech. Sarvam Vision supports 22 Indian languages, while Bulbul V3 currently supports 11 and is expected to expand to 22.
Sarvam Vision posted top scores for reading complex layouts, tables, and multilingual content in scanned documents, while Bulbul V3 is all about natural-sounding voices.
Both models can handle mixed-language content
Sarvam Vision is a 3-billion-parameter model that understands both text and layout in real-world Indic documents.
Bulbul V3 can generate over 35 unique voices and handles mixed-language sentences, numbers, and names with impressive accuracy.
Bonus: their Document Intelligence APIs are free for February 2026.
New models beat out big names in several tests
Sarvam Vision outscored Google Gemini 3 Pro and ChatGPT at recognizing words across tricky document formats.
Meanwhile, Bulbul V3 outperformed Cartesia and other competitors in 8 kHz (telephony) evaluations and showed strong stability metrics, while ElevenLabs led on general (full-band) audio quality. That makes these releases tough to ignore if you care about local-language tech.