MLOps.community

Making AI Reliable is the Greatest Challenge of the 2020s // Alon Bochman // #312

May 6, 2025
Alon Bochman, CEO of RagMetrics and AI veteran, discusses what it takes to make AI reliable. He favors empirical evaluation over influencer advice and argues that technical and domain experts should collaborate, with subject matter experts involved throughout development and solutions tailored to the use case. The conversation also covers fine-tuning language models with expert feedback and the challenges of applying AI in finance, where effective knowledge-sharing is needed to make decisions more accurate.
ADVICE

Balance Eval Quantity and Quality

  • Write as many evals as necessary to keep users satisfied, balancing test scope and complexity.
  • Focus on key critical cases rather than exhaustive coverage to optimize effort and value.
ADVICE

Start with Synthetic Evals

  • Use synthetically generated evals early to speed up iteration and increase coverage quickly.
  • Transition to user data-driven evals over time for more relevance and robustness.
INSIGHT

Power of LLM and Human Judge Loop

  • LLM judges paired with human domain experts form a powerful feedback loop improving AI reliability.
  • Expert corrections refine the judge's accuracy, and the actionable feedback they receive keeps domain experts engaged.
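
One way to picture the judge-and-expert loop above is as a comparison of verdicts: measure how often the LLM judge agrees with the domain expert, then send the disagreements back to the expert for review. The sketch below is illustrative only and is not taken from the episode. The `Verdict` record, the pass/fail labels, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One eval case scored twice (hypothetical record layout)."""
    case_id: str
    llm_pass: bool    # verdict from the LLM judge
    human_pass: bool  # verdict from the domain expert


def agreement_rate(verdicts):
    """Fraction of cases where the LLM judge matches the expert."""
    if not verdicts:
        return 0.0
    agree = sum(v.llm_pass == v.human_pass for v in verdicts)
    return agree / len(verdicts)


def disagreements(verdicts):
    """Case ids to route back to the expert for review."""
    return [v.case_id for v in verdicts if v.llm_pass != v.human_pass]


# Made-up sample data, for illustration only.
sample = [
    Verdict("q1", True, True),
    Verdict("q2", True, False),
    Verdict("q3", False, False),
    Verdict("q4", False, True),
]

rate = agreement_rate(sample)
to_review = disagreements(sample)
print(f"agreement: {rate:.0%}, review: {to_review}")
```

In a setup like this, the disagreement list is what the expert sees, and the agreement rate is the number to watch as the judge prompt is revised between rounds.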