
How AI Is Built #021 The Problems You Will Encounter With RAG At Scale And How To Prevent (or fix) Them
Sep 12, 2024

Nirant Kasliwal, an author known for his expertise in metadata extraction and evaluation strategies, shares invaluable insights on scaling Retrieval-Augmented Generation (RAG) systems. He dives into common pitfalls such as the challenges posed by naive RAG and the sensitivity of LLMs to input. Strategies for query profiling, user personalization, and effective metadata extraction are discussed. Nirant emphasizes the importance of understanding user context to deliver precise information, ultimately aiming to enhance the efficiency of RAG implementations.
Debug With Real User Logs First
- Start debugging by reviewing user complaints and the logged retrieval → prompt → answer chains to identify clear failures.
- Fixes for the first 30–50% of failures are quick; use better prompts, examples, and diagnostics to capture the low-hanging gains.
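A first triage pass over logged chains can be sketched as below. The trace fields and failure buckets are hypothetical, chosen only to illustrate sorting logs into obvious, quick-to-fix failure modes before anything fancier:

```python
from dataclasses import dataclass

@dataclass
class TraceRecord:
    """One logged retrieval -> prompt -> answer chain (hypothetical schema)."""
    query: str
    retrieved: list        # ids of retrieved chunks
    prompt: str
    answer: str
    user_complaint: str = ""  # complaint text, if the user filed one

def triage(records):
    """Sort traces into coarse failure buckets: empty retrieval and
    complained-about answers are the clear failures to review first."""
    buckets = {"empty_retrieval": [], "complained": [], "ok": []}
    for r in records:
        if not r.retrieved:
            buckets["empty_retrieval"].append(r)
        elif r.user_complaint:
            buckets["complained"].append(r)
        else:
            buckets["ok"].append(r)
    return buckets

logs = [
    TraceRecord("reset password", ["doc_12"], "p1", "Go to Settings > Security."),
    TraceRecord("refund policy", [], "p2", "I don't know."),
    TraceRecord("api rate limits", ["doc_3"], "p3", "Unlimited.",
                user_complaint="wrong, the limit is 100 rps"),
]
result = triage(logs)
```

Reviewing the `empty_retrieval` and `complained` buckets by hand is usually enough to find the prompt and retrieval fixes that capture the first tranche of gains.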
Use Balanced Synthetic And Human Eval
- Build an eval set from synthetic QA plus real user complaints, but keep the synthetic volume at or below the number of human-derived cases.
- Use human verification for domain-expert areas and avoid letting synthetic data dominate evaluation.
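The balancing rule above can be sketched as a small helper. The case format is hypothetical; the point is simply capping synthetic cases at the human-derived count so they never dominate the eval:

```python
import random

def build_eval_set(human_cases, synthetic_cases, seed=0):
    """Combine human-derived and synthetic eval cases, downsampling the
    synthetic pool so it never exceeds the human-derived count."""
    rng = random.Random(seed)  # fixed seed for a reproducible eval set
    cap = len(human_cases)
    if len(synthetic_cases) > cap:
        synthetic_cases = rng.sample(synthetic_cases, cap)
    return human_cases + synthetic_cases

humans = [{"q": f"user question {i}", "source": "complaint"} for i in range(20)]
synth = [{"q": f"generated QA {i}", "source": "synthetic"} for i in range(200)]
eval_set = build_eval_set(humans, synth)
```

Domain-expert verification still applies to the human-derived side; the cap only guards against synthetic data swamping the metrics.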
Generate Hard Negatives From Baselines
- Derive hard negatives by running known good queries against a strong baseline (e.g., BM25/Elastic) and flagging mismatches as negatives.
- Use domain experts to verify hard negatives when user feedback shows failures without clear ground truth.
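The mining step can be sketched as follows. A plain BM25 scorer stands in for the Elastic/BM25 baseline, and the corpus, gold labels, and `mine_hard_negatives` helper are illustrative assumptions: top-ranked documents that are not the known-good answer become hard-negative candidates for expert review.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Textbook BM25 scoring over pre-tokenized docs (baseline stand-in)."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(t for d in docs_tokens for t in set(d))  # document frequencies
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def mine_hard_negatives(query, gold_id, corpus, top_k=3):
    """Run a known-good query against the baseline; docs that rank high
    but are not the gold document become hard-negative candidates."""
    ids = list(corpus)
    tokenized = [corpus[i].lower().split() for i in ids]
    scores = bm25_scores(query.lower().split(), tokenized)
    ranked = sorted(zip(ids, scores), key=lambda p: -p[1])[:top_k]
    return [doc_id for doc_id, s in ranked if doc_id != gold_id and s > 0]

corpus = {
    "d1": "how to reset your account password in settings",
    "d2": "password strength requirements and rotation policy",
    "d3": "shipping and refund policy for hardware orders",
}
negatives = mine_hard_negatives("reset password", gold_id="d1", corpus=corpus)
```

Candidates mined this way still need the expert verification the snip calls for, especially where user feedback flags failures without clear ground truth.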
