Date: 2026-02-11

Sarvam AI just dropped Sarvam Vision, a new vision-language model built for recognizing text in images (OCR), generating captions, and handling charts and tables, all in 22 Indian languages. It's designed to work even with old or scanned docs, and the company says it's already beating global rivals on key tests.

It beat global rivals on key tests

Sarvam Vision scored 84.3% on the olmOCR-Bench, ahead of Gemini 3 Pro (80.2%) and GPT-5.2 (69.8%).

On OmniDocBench v1.5, it hit 93.28%, with strong results for tables and math too.

It was also tested on more than 20,000 real-world samples on the Sarvam Indic OCR Bench, drawn from a range of document domains (e.g., scientific literature, financial documents, government bulletins, historical manuscripts, textbooks, magazines, newspapers).

It's built specifically for Indian languages and scripts

Unlike general-purpose AIs like ChatGPT or Gemini, Sarvam Vision is trained specifically for Indian languages and scripts, which helps it nail tricky documents that others often miss.