
Multi-Vector Search with Amélie Chatelain and Antoine Chaffin - Weaviate Podcast #134
Mar 23, 2026
Antoine Chaffin, an ML researcher at LightOn working on ColBERT and efficient multi-vector search, joins Amélie Chatelain, a LightOn engineer focused on multi-vector models and PyLate. They dive into late-interaction versus single-vector trade-offs, and cover the code-focused ColGrep, reasoning-intensive retrieval, multimodal search, and scaling approaches like PLAID and MuVERA.
Token-Level Updates Improve Training Dynamics
- Late interaction factorizes training updates: the max (MaxSim) operator selects token-level matches, so gradients flow mainly to the relevant token embeddings, making fine-tuning less noisy and faster to converge than training a single monolithic vector.
- Antoine notes this targeted update avoids collapsing the whole representation into one vector and lets small models reach cross-encoder-like expressivity when trained properly.
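The MaxSim scoring at the heart of late interaction can be sketched as follows. This is a toy pure-Python illustration of the operator, not LightOn's or ColBERT's actual implementation (which operates on batched tensors):

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late-interaction score between a query and a document.

    query_vecs / doc_vecs: lists of token embeddings (lists of floats).
    For each query token, take the max dot product over all document
    tokens, then sum these per-token maxima. Because each query token
    picks a single best-matching document token, a training update
    mostly affects that matched pair rather than one global vector.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

In practice the embeddings are L2-normalized and the computation is a batched matrix product, but the max-then-sum structure is the same.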
Cascade Dense Then Late Interaction Then Cross Encoder
- Use cascaded retrieval: fast dense retrieval for recall, then late interaction re-ranking, then cross-encoder re-rank on a small candidate set to balance latency and accuracy.
- Antoine and Amélie recommend ensuring the first-stage pool has high enough recall that relevant items ColBERT could surface are not filtered out before re-ranking.
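The cascade described above can be sketched as a generic pipeline. The scoring functions here are placeholders for the real models (dense bi-encoder, ColBERT, cross-encoder); the pool sizes are illustrative, not recommendations from the episode:

```python
def cascade_retrieve(query, corpus, dense_score, late_score, cross_score,
                     k_dense=100, k_late=10, k_final=3):
    """Three-stage retrieval cascade.

    Stage 1: cheap dense scoring over the whole corpus (optimize recall).
    Stage 2: late-interaction (ColBERT-style) re-ranking of the pool.
    Stage 3: expensive cross-encoder re-ranking on a small candidate set.
    Each stage narrows the pool, trading a little accuracy for latency.
    """
    pool = sorted(corpus, key=lambda d: dense_score(query, d),
                  reverse=True)[:k_dense]
    pool = sorted(pool, key=lambda d: late_score(query, d),
                  reverse=True)[:k_late]
    return sorted(pool, key=lambda d: cross_score(query, d),
                  reverse=True)[:k_final]
```

The key tuning knob is `k_dense`: if the first stage misses a relevant document, no amount of downstream re-ranking can recover it.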
ColGrep Uses Tiny ColBERT To Outperform Larger Models
- LightOn released small ColBERT code models (70M and ~150M parameters) trained on public code corpora that outperform much larger models on code retrieval.
- Antoine describes ColGrep: a grep-like tool backed by a tiny ColBERT model that runs locally, speeds up agent code search, and reduces LLM token usage.
