
Multi-Vector Search with Amélie Chatelain and Antoine Chaffin - Weaviate Podcast #134
Mar 23, 2026
Antoine Chaffin, an ML researcher at LightOn working on ColBERT and efficient multi-vector search, joins Amélie Chatelain, a LightOn engineer focused on multi-vector models and PyLate. They dive into late-interaction versus single-vector trade-offs, and cover the code-focused ColGrep, reasoning-intensive retrieval, multimodal search, and scaling approaches like PLAID and MuVERA.
Token-Level Updates Improve Training Dynamics
- Late interaction factorizes training updates: the max (MaxSim) operator selects token-level matches, so gradients flow mainly to the relevant token embeddings, making fine-tuning less noisy and faster to converge than training a single monolithic vector.
- Antoine notes this targeted update avoids collapsing the whole representation into one vector and lets small models reach cross-encoder-like expressivity when trained properly.
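The MaxSim scoring at the heart of late interaction can be sketched as follows. This is a toy pure-Python illustration of the operator, not LightOn's or ColBERT's actual implementation (which operates on batched tensors):

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late-interaction score between a query and a document.

    query_vecs / doc_vecs: lists of token embeddings (lists of floats).
    For each query token, take the max dot product over all document
    tokens, then sum these per-token maxima. Because each query token
    picks a single best-matching document token, a training update
    mostly affects that matched pair rather than one global vector.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

In practice the embeddings are L2-normalized and the computation is a batched matrix product, but the max-then-sum structure is the same.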
Cascade Dense Then Late Interaction Then Cross Encoder
- Use cascaded retrieval: fast dense retrieval for recall, then late interaction re-ranking, then cross-encoder re-rank on a small candidate set to balance latency and accuracy.
- Antoine and Amélie recommend ensuring the first-stage pool has high enough recall that relevant items ColBERT could surface are not filtered out before re-ranking.
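The cascade described above can be sketched as a generic pipeline. The scoring functions here are placeholders for the real models (dense bi-encoder, ColBERT, cross-encoder); the pool sizes are illustrative, not recommendations from the episode:

```python
def cascade_retrieve(query, corpus, dense_score, late_score, cross_score,
                     k_dense=100, k_late=10, k_final=3):
    """Three-stage retrieval cascade.

    Stage 1: cheap dense scoring over the whole corpus (optimize recall).
    Stage 2: late-interaction (ColBERT-style) re-ranking of the pool.
    Stage 3: expensive cross-encoder re-ranking on a small candidate set.
    Each stage narrows the pool, trading a little accuracy for latency.
    """
    pool = sorted(corpus, key=lambda d: dense_score(query, d),
                  reverse=True)[:k_dense]
    pool = sorted(pool, key=lambda d: late_score(query, d),
                  reverse=True)[:k_late]
    return sorted(pool, key=lambda d: cross_score(query, d),
                  reverse=True)[:k_final]
```

The key tuning knob is `k_dense`: if the first stage misses a relevant document, no amount of downstream re-ranking can recover it.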
ColGrep Uses Tiny ColBERT To Outperform Larger Models
- LightOn released small ColBERT code models (70M and ~150M parameters) trained on public code corpora that outperform much larger models on code retrieval.
- Antoine describes ColGrep: a grep-like tool backed by a tiny ColBERT model that runs locally, speeds up agent code search, and reduces LLM token usage.
