Daybreak

India’s AI still doesn’t speak India. Can it?

Feb 2, 2026
They test ChatGPT's Punjabi and find spelling errors and Hindi bleeding into the responses. The episode explores how Hindi dominates datasets while many regional languages and dialects are ignored. It contrasts fast, private datasets with underused government corpora and explains why multimodal data and legal costs matter. The conversation warns that AI is flattening India’s linguistic diversity.
INSIGHT

The Two Worlds Of Vernacular AI

  • Vernacular AI sits in two fragmented universes: private datasets optimised for speed and public corpora focused on nuance.
  • These datasets rarely cross, leaving major gaps for Indian-language AI.
ANECDOTE

Hindi Bots Boost Tax Collection

  • Gurugram used Hindi AI calls to remind taxpayers and collected about Rs 200 crore.
  • The success hinged on Hindi's strong representation in datasets, not on broad linguistic coverage.
INSIGHT

Performance Trumps Inclusivity

  • Private-sector language datasets prioritise speed and accuracy over inclusivity and dialectal nuance.
  • That optimisation flattens languages into standardised forms that miss real regional speech.