Weaviate Podcast

Multi-Vector Search with Amélie Chatelain and Antoine Chaffin - Weaviate Podcast #134!

Mar 23, 2026
Antoine Chaffin, an ML researcher at LightOn working on ColBERT and efficient multi-vector search, and Amélie Chatelain, a LightOn engineer focused on multi-vector models and PyLate, join the podcast. They dive into the trade-offs between late-interaction and single-vector retrieval, and cover the code-focused ColGrep, reasoning-intensive retrieval, multimodal search, and scaling approaches such as PLAID and MuVERA.
INSIGHT

Token-Level Updates Improve Training Dynamics

  • Late interaction scores each query token by its best-matching document token (the max operator), so each training update touches only those matched token pairs. This makes fine-tuning less noisy and faster to converge than training a single monolithic vector.
  • Antoine notes that this targeted update avoids collapsing the whole representation and, with proper training, lets small models approach cross-encoder-level expressivity.
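To make the "max operator" concrete, here is a minimal sketch of ColBERT-style MaxSim scoring in plain Python. It assumes each query and document has already been encoded into per-token vectors; the toy vectors below are made up for illustration and are not produced by any real model. The `best_matches` helper (hypothetical, for illustration) shows which document token each query token selects; because the max keeps only one term per query token, the gradient of the score with respect to the document tokens reaches only those selected tokens.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two token embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token keep only its best-matching
    document token (the max), then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def best_matches(query_tokens: List[Vector], doc_tokens: List[Vector]) -> List[int]:
    """Index of the document token selected by each query token (hypothetical helper).
    Only these document tokens contribute to the score, so only they receive gradient."""
    return [
        max(range(len(doc_tokens)), key=lambda j: dot(q, doc_tokens[j]))
        for q in query_tokens
    ]


# Toy 2-d token embeddings, invented for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.75], [1.0, 0.0], [0.0, -1.0]]

print(maxsim_score(query, doc))  # 1.0 + 0.75 = 1.75
print(best_matches(query, doc))  # [1, 0]
```

Note that the third document token never wins a max, so under this scoring it would not be updated by this query at all, which is the "targeted update" behaviour described above.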
ADVICE

Cascade Dense Then Late Interaction Then Cross Encoder

  • Use cascaded retrieval to balance latency and accuracy: fast dense retrieval for recall, then late-interaction re-ranking, then cross-encoder re-ranking on a small candidate set.
  • Antoine and Amélie recommend making sure the first-stage pool has high recall, since later stages can only reorder what it returns and would otherwise miss items ColBERT could have surfaced.
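The cascade can be sketched as a chain of re-ranking stages, each keeping fewer candidates than the one before. This is a minimal control-flow sketch under stated assumptions: the three scorers are plug-in functions, the `toy_overlap` scorer is a hypothetical stand-in for real dense, late-interaction, and cross-encoder models, and the cut-off sizes are illustrative defaults rather than recommendations from the episode. A real first stage would query an approximate nearest-neighbour index instead of scoring every document.

```python
from typing import Callable, List

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def rerank(query: str, candidates: List[str], scorer: Scorer, keep: int) -> List[str]:
    """Score every candidate with `scorer` and keep the `keep` best."""
    return sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)[:keep]


def cascade(query: str, corpus: List[str], dense: Scorer, late_interaction: Scorer,
            cross_encoder: Scorer, k_dense: int = 1000, k_late: int = 100,
            k_final: int = 10) -> List[str]:
    # Stage 1: cheap scorer over the corpus (brute force here, standing in for an
    # ANN index). Keep k_dense large: later stages cannot recover what is dropped.
    pool = rerank(query, corpus, dense, k_dense)
    # Stage 2: late-interaction (MaxSim-style) re-ranking of the recall pool.
    pool = rerank(query, pool, late_interaction, k_late)
    # Stage 3: the most expensive model only sees the small remaining set.
    return rerank(query, pool, cross_encoder, k_final)


def toy_overlap(query: str, doc: str) -> float:
    """Hypothetical toy scorer: number of shared words."""
    return float(len(set(query.split()) & set(doc.split())))


corpus = [
    "late interaction with colbert",
    "single vector dense retrieval",
    "colbert late interaction for code search",
    "cooking pasta at home",
]
top = cascade("colbert code search", corpus, toy_overlap, toy_overlap, toy_overlap,
              k_dense=3, k_late=2, k_final=1)
print(top)  # ['colbert late interaction for code search']
```

Keeping each stage as a function with the same signature makes it easy to swap models or tune the cut-offs per stage, which is where the latency/accuracy trade-off is actually set.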
ANECDOTE

ColGrep Uses a Tiny ColBERT to Outperform Larger Models

  • LightOn released small ColBERT code models (70M and about 150M parameters), trained on publicly available code, that outperform much larger models on code retrieval.
  • Antoine describes ColGrep: a grep-like tool backed by a tiny ColBERT model that runs locally, speeds up agent search, and reduces LLM token usage.