Why an Indian Startup’s AI Performed Better Than Gemini & OpenAI on Indian Language Tasks

Sarvam AI started with Indian languages and documents. Its recent test results suggest that starting point can make a real difference.

By Khushi Arora

Sarvam AI is a Bengaluru-based startup working on AI systems focused on Indian languages and documents. (Pic source: India Today and Sarvam AI)

A bank form printed in English, filled partly in Hindi. A property paper with stamps and smudges. A school circular typed in Marathi with an address in Roman letters. A scanned bill that has been photocopied twice.

These documents carry daily life in India. They also carry small hurdles for technology, especially when scripts mix on the same page, scans come in low resolution, and handwriting slips into the margins. Teams building language technology in India have started treating this as a practical design challenge, rather than a corner case.

One company working in this direction is Sarvam AI, a Bengaluru-based startup founded in August 2023 by Dr Vivek Raghavan and Dr Pratyush Kumar.

The question behind the work

Sarvam describes its approach as part of the “sovereign AI” conversation, which usually refers to building AI capabilities within a country’s own ecosystem, with local control over infrastructure and choices.

The idea is easier to grasp when it is framed around everyday tasks:

How language-first AI helps

  • digitising scanned forms and notices across Indian scripts

  • extracting information from documents that mix scripts and layouts

  • generating speech in Indian languages for voice-based services

This is not one neat problem. India has 22 scheduled languages, many scripts, and a strong culture of code-mixing, including Hinglish (Hindi-English mix) and similar blends across regions.

Why document reading remains a real-world bottleneck

Many digital services still depend on human effort to read documents, check details, and correct errors. That work shows up in banks, schools, clinics, courts, and government offices. It also shows up in small businesses that handle invoices and receipts through messaging apps.

Reading Indian documents means understanding layout, not just text. (AI-generated image)

OCR, short for optical character recognition, sits at the centre of this. OCR tries to convert the text on a page into usable digital text. Document understanding goes a step further and tries to make sense of structure, such as tables, stamps, and multi-column layouts.

The task becomes harder when:

  • scans look faded or skewed

  • forms contain tables, stamps, and signatures

  • pages include multiple scripts in one flow

  • handwriting appears alongside printed text
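
To make the "multiple scripts in one flow" point concrete, here is a minimal Python sketch that flags mixed-script text. It is an illustration only, not Sarvam's method: the function names `scripts_in` and `is_mixed_script` are hypothetical, and taking the first word of a character's Unicode name is a rough heuristic for its script, not a full script classifier.

```python
import unicodedata


def scripts_in(text):
    """Return rough script labels for the letters found in text.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. "DEVANAGARI LETTER KA" -> "DEVANAGARI".
    Digits, punctuation, spaces and vowel signs are skipped.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


def is_mixed_script(text):
    """True when letters from more than one script appear in text."""
    return len(scripts_in(text)) > 1


# A form label in Roman letters followed by a value in Devanagari.
print(scripts_in("Name: राम"))
print(is_mixed_script("Name: राम"))
```

A line that comes back as mixed could, for example, be routed to a reader that handles both scripts instead of one tuned to a single script.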

What the recent tests suggest

Sarvam’s document-reading model, often referred to in reports as Sarvam Vision, has received attention because of how it performed on two public benchmarks.

Here’s what those reports say:

  • It scored 84.3% on olmOCR-Bench

  • It scored 93.28% on OmniDocBench v1.5

You can think of these tests as exam papers for AI. They check how well a system can read and understand tricky documents, such as scanned pages, forms with tables, and layouts that are not straightforward.
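
One generic way to grade such an "exam paper" is to compare the text a system reads against the correct text, character by character. The Python sketch below shows that idea using edit distance. It is an assumption for illustration: the two benchmarks named above define their own scoring, which may differ, and the function names `edit_distance` and `char_accuracy` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Share of the reference text recovered, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# One misread character ("i" read as "1") in a seven-letter word.
print(char_accuracy("invoice", "invo1ce"))
```

A score built this way rewards getting most of a page right, which is why a single smudged stamp or skewed table can pull a result down without the whole page being unreadable.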

At the same time, these scores come with an important caveat. Real documents vary a lot. A clean scan and a blurred photocopy behave very differently. Different fonts, stamps, handwriting, and page quality can also change outcomes.

The safest way to read these numbers is this: they suggest the model performs strongly in controlled testing, especially on complex documents, and they underline how much attention Indian teams are now giving to the document-heavy, mixed-layout work that is common in India.

OCR benchmarks test how accurately AI systems read complex document layouts. (AI-generated infographic)

Why Sarvam performs better on Indian language tasks

Sarvam’s stronger performance on these benchmarks comes from focus, not size. The company trains and tests its models mainly on Indian documents, which often place multiple scripts, languages, stamps, layouts, and bits of handwriting on the same page.

Global models such as Gemini and OpenAI systems are built to work across many countries and use cases. Their training spreads across a wide range of languages and document types. On very specific tasks, such as reading mixed-script Indian documents, a model designed for that context can perform better than a general one.

The benchmarks where Sarvam scored highly also reflect this. They rely on dense, scanned forms and complex layouts, which match the kind of documents Sarvam’s models are trained on, rather than cleaner or more standard global formats.

Voice tools are part of the picture

Sarvam also provides tools for speech and language, mainly in two parts.

One is speech-to-text, where the system listens to you and turns your voice into written words, even when you shift between languages mid-sentence the way many of us do. Sarvam says this part supports 22 Indian languages, along with automatic language detection. 

Language-first AI focuses on how documents are actually created and used. (Pic source: Sarvam AI)

The second is text-to-speech, where written text becomes a voice you can hear. In Sarvam’s case, its Bulbul v3 documentation lists 11 languages, including English (en-IN) and Indian languages such as Hindi, Tamil, Telugu, Kannada, and Odia. Keeping these two separate helps readers understand what the tools actually do.

Where voice capability can be useful

  • helplines that serve callers in regional languages

  • education tools that read lessons aloud in a familiar language

  • accessibility features for people who prefer listening over reading

Keeping comparisons fair

Global AI tools, including ChatGPT and Gemini, continue to improve across languages and tasks, and many people use them effectively in India. Sarvam’s work, as described in its own materials and recent reporting, emphasises a narrower focus on Indian languages and India-shaped inputs, such as mixed-script documents and code-mixed speech.

This frame keeps the conversation grounded. Different tools often serve different contexts, and performance depends heavily on the task and the setting.

Why this matters now

India’s shift towards digital delivery has raised expectations around speed and ease. People want forms processed faster, documents digitised cleanly, and services available in languages that feel natural. When language support improves, it can reduce the burden on both citizens and frontline staff.

According to recent reports, Sarvam has also said it intends to open-source the models it is training under the IndiaAI Mission, which places its work within a broader ecosystem approach rather than a single-product story.

The story worth following

The most meaningful test for language-first AI will come from daily use. It will come from how well systems handle blurred scans, regional spelling variations, mixed scripts, and real conversations that switch languages mid-sentence.

This is a long, careful build. It also aligns with a simple expectation that many Indians already hold: technology should understand the languages people live in, and it should work with the documents people actually use.