Sarvam Vision is India's first vision-language model
Sarvam AI just dropped Sarvam Vision, a new vision-language model built for optical character recognition (OCR), image captioning, and chart and table understanding in 22 Indian languages.
It's designed to work even with old or scanned docs, and the company says it's already beating global rivals on key tests.
It beat global rivals on key tests
Sarvam Vision scored 84.3% on the olmOCR-Bench, ahead of Gemini 3 Pro (80.2%) and GPT-5.2 (69.8%).
On OmniDocBench v1.5, it hit 93.28%, with strong results for tables and math too.
It was also tested on the Sarvam Indic OCR Bench, which contains over 20,000 real-world samples from a range of document domains, such as scientific literature, financial documents, government bulletins, historical manuscripts, textbooks, magazines, and newspapers.
It's built specifically for Indian languages and scripts
Unlike general-purpose models such as ChatGPT or Gemini, Sarvam Vision is trained specifically on Indian languages and scripts, which helps it handle tricky documents that other models often get wrong.
Sarvam AI's Document Intelligence APIs are currently free
Sarvam AI was selected last year under the IndiaAI Mission to build the country's own large language models, including Sarvam-Large and Sarvam-Edge. The company is now offering its Document Intelligence APIs for free, though no end date for the free access is specified.