India’s AI still doesn’t speak India. Can it?

Feb 2, 2026

A test of ChatGPT's Punjabi turns up spelling errors and Hindi bleeding into its responses. The episode explores how Hindi dominates datasets while many regional languages and dialects are ignored. It contrasts fast, private datasets with underused government corpora and explains why multimodal data and legal costs matter. The conversation warns that AI is flattening India’s linguistic diversity.


INSIGHT

The Two Worlds Of Vernacular AI

Vernacular AI sits in two fragmented universes: private datasets optimised for speed and public corpora focused on nuance.
The two rarely intersect, leaving major gaps in Indian-language AI.

ANECDOTE

Hindi Bots Boost Tax Collection

Gurugram used Hindi AI calls to remind taxpayers and collected about Rs 200 crore.
The success hinged on Hindi's strong representation in datasets, not on broad linguistic coverage.

INSIGHT

Performance Trumps Inclusivity

Private-sector language datasets prioritise speed and accuracy over inclusivity and dialectal nuance.
That optimisation flattens languages into standardised forms that miss how people actually speak in each region.
