How AI Is Built

#026 Embedding Numbers, Categories, Locations, Images, Text, and The World

Oct 10, 2024
Mór Kapronczay, Head of ML at Superlinked, unpacks the nuances of embeddings beyond just text. He emphasizes that traditional text embeddings fall short, especially with complex data. Mór introduces multi-modal embeddings that integrate various data types, improving search relevance and user experiences. He also discusses challenges in embedding numerical data, suggesting innovative methods like logarithmic transformations. The conversation delves into balancing speed and accuracy in vector searches, highlighting the dynamic nature of real-time data prioritization.
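The logarithmic-transformation idea for numerical data can be sketched as follows. This is an illustrative example only, not Superlinked's exact method: it maps a positive number (say, a price) onto a bounded log scale so that relative differences (10 vs. 100) carry more weight than absolute gaps at the high end.

```python
import math

def embed_number(x: float, x_max: float) -> float:
    """Map a positive number to [0, 1] on a log scale.

    Hypothetical helper: log1p compresses large values so that
    relative differences dominate, then we normalize by the
    corpus maximum to keep the feature comparable across items.
    """
    return math.log1p(x) / math.log1p(x_max)

embed_number(10, 1000)   # small but nonzero
embed_number(1000, 1000) # maps the maximum to 1.0
```

The resulting scalar can then be concatenated (with a weight) alongside text-embedding dimensions in a single vector.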
ADVICE

Optimize Size And Iterate With Query Weights

  • Keep vectors only as large as needed and use quantization (e.g. float16 or 8-bit) to save memory in vector DBs.
  • Iterate quickly by changing query-side weights instead of re-embedding the corpus.
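Both tips above can be sketched in a few lines of NumPy. The dimension split (text in dims 0–127, a numeric modality in dims 128–255) is a made-up layout for illustration; the point is that casting to float16 halves storage, and re-weighting the query vector changes modality importance without touching the stored corpus.

```python
import numpy as np

# Hypothetical corpus of full-precision embeddings from any encoder.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 256)).astype(np.float32)

# Quantize to float16: half the memory, usually little retrieval loss.
corpus_fp16 = corpus.astype(np.float16)
print(corpus.nbytes // corpus_fp16.nbytes)  # → 2

def weight_query(q: np.ndarray, w_text: float, w_num: float) -> np.ndarray:
    """Re-weight modality blocks on the QUERY side only.

    Assumes (for illustration) dims 0:128 hold the text part and
    dims 128:256 the numeric part of a concatenated embedding.
    """
    out = q.copy()
    out[:128] *= w_text
    out[128:] *= w_num
    return out
```

Because dot-product similarity is linear in the query, scaling a query block scales that modality's contribution to every score, which is why no re-embedding is needed.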
ADVICE

Set Modality Weights Per Query

  • Dynamically set modality weights per query by detecting intent in the natural-language query.
  • Train user-specific weight predictors if you have labels to personalize modality importance.
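A minimal sketch of per-query intent detection, using a toy keyword heuristic (a real system might use a trained classifier or an LLM, as the snip suggests; the keywords and weight values here are invented for illustration):

```python
def modality_weights(query: str) -> dict[str, float]:
    """Toy intent detector: boost the numeric modality when the
    query signals price sensitivity, otherwise favor text.
    Illustrative only; weights and triggers are made up.
    """
    q = query.lower()
    if any(tok in q for tok in ("cheap", "price", "under", "$")):
        return {"text": 0.4, "number": 0.6}
    return {"text": 0.8, "number": 0.2}

modality_weights("cheap laptops under $500")  # boosts the number modality
modality_weights("sci-fi novels about AI")    # favors the text modality
```

The returned weights would then feed a query-side re-weighting step, so no corpus re-embedding is required per query.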
ADVICE

Evaluate With Labels Or A/B Tests

  • Evaluate new embeddings with IR metrics when labels exist and run A/B tests when they don't.
  • Start by eyeballing common queries to justify larger experiments before full A/B testing.
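When labels exist, a standard IR metric such as nDCG@k makes embedding comparisons concrete. A self-contained sketch (binary relevance for simplicity; libraries like scikit-learn offer graded versions):

```python
import math

def ndcg_at_k(ranked_ids: list, relevant: set, k: int = 10) -> float:
    """nDCG@k with binary relevance: discounted gain of relevant
    docs in the top k, normalized by the best achievable ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

ndcg_at_k([1, 2, 3], {1}, k=3)  # relevant doc ranked first → 1.0
```

Comparing mean nDCG@k across a labeled query set for the old vs. new embeddings gives the offline signal; the A/B test then confirms it on live traffic.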