
Vanishing Gradients Episode 34: The AI Revolution Will Not Be Monopolized
Aug 22, 2024
Guests Ines Montani and Matthew Honnibal, founders of Explosion AI and creators of the widely used spaCy library, discuss the evolution of natural language processing (NLP) in industry. They share insights on balancing large and small AI models, challenges in modularity and privacy, and the impact of regulation on innovation. They also reflect on their transition to a smaller company and the lessons it taught them about the AI startup world. The conversation touches on the importance of data quality and open-source tools, with an emphasis on practical applications of AI for data scientists and enthusiasts alike.
AI Snips
Use LLMs For Development, Distill For Runtime
- Use large generative models in development only and distill to smaller models for runtime to save cost and protect privacy.
- Treat models as components you can replace rather than a single runtime dependency for everything.
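A minimal sketch of the develop-with-a-large-model, run-with-a-small-model idea, using only the Python standard library. Everything here is an illustrative assumption rather than anything shown in the episode: `teacher_label` is a hypothetical stand-in for a large-model call (a keyword rule keeps the sketch runnable), and `TinyNaiveBayes` is a toy student model, not spaCy's API.

```python
from collections import Counter, defaultdict
import math


def teacher_label(text: str) -> str:
    """Hypothetical stand-in for a large-model call used only during development."""
    # Assumption: a real setup would prompt an LLM here; a keyword rule keeps this runnable.
    keywords = ("broken", "refund", "late")
    return "complaint" if any(w in text.lower() for w in keywords) else "other"


class TinyNaiveBayes:
    """Toy bag-of-words student model that runs locally on CPU."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Development time: the teacher labels raw text once.
unlabeled = [
    "my order arrived broken",
    "i want a refund now",
    "the delivery was late again",
    "thanks for the quick reply",
    "what are your opening hours",
    "great service as always",
]
silver_labels = [teacher_label(t) for t in unlabeled]

# Runtime: only the small student is called, so no text leaves the machine.
student = TinyNaiveBayes().fit(unlabeled, silver_labels)
print(student.predict("the package was broken"))
```

Because the student only depends on the labeled data, the teacher can be swapped for a different model (or for human annotation) without touching the runtime component.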
Focus On Non–AI-Complete Problems
- Many practical NLP problems are not 'AI-complete' and can be solved using surface linguistic structure.
- Framing tasks to avoid AI-complete requirements lets you build accurate, small CPU-friendly models.
Few Hundred Examples Often Suffice
- You often need only a few hundred labeled examples to beat zero-shot or few-shot LLM classifiers on many tasks.
- Prioritize small, focused data collection and stable evaluation over chasing larger models.
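One way to picture "stable evaluation" is a small, frozen set of gold-labeled examples that every candidate is scored against, so that numbers stay comparable as models change. The sketch below is an assumption-laden illustration: the evaluation examples and both predictors (`always_other`, `keyword_rule`) are invented for the example and are not from the episode.

```python
def accuracy(predict, eval_set):
    """Fraction of examples in eval_set where predict(text) matches the gold label."""
    correct = sum(1 for text, gold in eval_set if predict(text) == gold)
    return correct / len(eval_set)


# A frozen tuple: the evaluation data does not change between experiments.
EVAL_SET = (
    ("refund please", "complaint"),
    ("hello there", "other"),
    ("item arrived broken", "complaint"),
    ("see you soon", "other"),
)


def always_other(text):
    """Trivial baseline that ignores its input."""
    return "other"


def keyword_rule(text):
    """Hypothetical candidate model; any callable from text to label fits here."""
    return "complaint" if "refund" in text or "broken" in text else "other"


for name, model in (("always_other", always_other), ("keyword_rule", keyword_rule)):
    print(name, accuracy(model, EVAL_SET))
```

Any predictor, whether a prompted large model or a small trained classifier, can be passed to `accuracy` as long as it maps a text to a label, which makes the comparison in the bullet above straightforward to run.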
