
How AI Is Built #026 Embedding Numbers, Categories, Locations, Images, Text, and The World
Oct 10, 2024

Mór Kapronczay, Head of ML at Superlinked, unpacks the nuances of embeddings beyond just text. He emphasizes that traditional text embeddings fall short, especially with complex data. Mór introduces multi-modal embeddings that integrate various data types, improving search relevance and user experiences. He also discusses challenges in embedding numerical data, suggesting innovative methods like logarithmic transformations. The conversation delves into balancing speed and accuracy in vector searches, highlighting the dynamic nature of real-time data prioritization.
AI Snips
Optimize Size And Iterate With Query Weights
- Keep vectors as small as needed and use quantization (e.g., float16 or int8) to save memory in vector DBs.
- Iterate quickly by changing query-side weights instead of re-embedding the corpus.
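The memory-saving side of this can be sketched as symmetric int8 quantization: store one scale per matrix and 1-byte codes instead of 4-byte floats. The function names and the single-scale scheme are illustrative assumptions, not the API of any particular vector DB.

```python
# Sketch: shrink stored vectors with int8 quantization (illustrative,
# not a specific vector DB's API). One global scale keeps it simple;
# real systems often use per-vector or per-block scales.
import numpy as np

def quantize_int8(vecs: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 vectors to int8 codes plus a single scale factor."""
    scale = np.abs(vecs).max() / 127.0
    codes = np.round(vecs / scale).astype(np.int8)
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 vectors from the int8 codes."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 256)).astype(np.float32)
codes, scale = quantize_int8(vecs)
approx = dequantize(codes, scale)

# int8 codes take 4x less memory than the float32 originals.
print(vecs.nbytes // codes.nbytes)  # 4
```

The rounding error is bounded by half the scale, which is usually small relative to typical embedding magnitudes; float16 is an even simpler drop-in (`vecs.astype(np.float16)`) at 2x savings.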
Set Modality Weights Per Query
- Dynamically set modality weights per query by detecting intent in the natural-language query.
- Train user-specific weight predictors if you have labels to personalize modality importance.
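A minimal sketch of the first bullet: detect intent from cues in the query string, turn that into per-modality weights, and scale each modality's sub-vector before concatenating, so dot products against stored concatenated vectors respect the weights. The modalities, keyword rules, and function names are hypothetical illustrations; a labeled setup would replace the rules with a trained predictor.

```python
# Sketch: set modality weights per query from simple intent cues.
# Modalities ("text", "price", "recency") and keyword rules are
# illustrative assumptions, not a fixed schema.

def modality_weights(query: str) -> dict[str, float]:
    q = query.lower()
    w = {"text": 1.0, "price": 0.2, "recency": 0.2}
    if "cheap" in q or "budget" in q:
        w["price"] = 1.0      # price intent detected
    if "latest" in q or "new" in q:
        w["recency"] = 1.0    # freshness intent detected
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

def weighted_query_vector(parts: dict[str, list[float]],
                          w: dict[str, float]) -> list[float]:
    """Concatenate each modality's sub-vector scaled by its weight."""
    return [w[k] * x for k, vec in parts.items() for x in vec]

parts = {"text": [1.0] * 4, "price": [1.0] * 2, "recency": [1.0] * 2}
w = modality_weights("cheap running shoes")
qvec = weighted_query_vector(parts, w)  # price dims now outweigh recency dims
```

Because only the query vector changes, trying a new weighting is a cheap query-time experiment rather than a corpus re-embedding.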
Evaluate With Labels Or A/B Tests
- Evaluate new embeddings with IR metrics when labels exist and run A/B tests when they don't.
- Start by eyeballing common queries to justify larger experiments before full A/B testing.
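When labels exist, the IR metrics mentioned above can be computed directly; recall@k and (binary) NDCG@k are standard definitions, and the ranked list and relevance labels below are a toy example.

```python
# Sketch: score a new embedding against human relevance labels with
# standard IR metrics. The documents and labels here are toy data.
import math

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: log-discounted gain over the ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant))))
    return dcg / ideal

ranked = ["d3", "d1", "d7", "d2"]   # retrieval order from the new embedding
relevant = {"d1", "d2"}             # human labels for this query
print(recall_at_k(ranked, relevant, 3))  # 0.5
```

Averaging these per-query scores over a labeled query set gives the offline number to compare embeddings; without labels, the same comparison falls back to the A/B test.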
