Eval-Driven Development and Testing

Samraj and Ben describe treating evals like unit tests and using judges, CI, and telemetry to ensure quality.
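As a rough illustration of the "evals as unit tests" idea, the sketch below wraps a small eval set in a test function that a CI job could run. It is not taken from the episode: `toy_agent`, `keyword_judge`, `EvalCase`, and the 0.8 threshold are all hypothetical names and values, and the keyword check is a simple stand-in for whatever judge (LLM-based or otherwise) a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    prompt: str
    must_mention: List[str]  # keywords a good answer is assumed to contain


def toy_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real agent call.
    canned = {
        "How do I reset my password?":
            "Open Settings, choose Security, then click Reset password.",
    }
    return canned.get(prompt, "I am not sure.")


def keyword_judge(answer: str, case: EvalCase) -> float:
    # Assumed scoring rule: fraction of required keywords found in the
    # answer, case-insensitive. A real judge would likely be richer.
    if not case.must_mention:
        return 1.0
    text = answer.lower()
    hits = sum(1 for kw in case.must_mention if kw.lower() in text)
    return hits / len(case.must_mention)


def run_evals(
    agent: Callable[[str], str],
    cases: List[EvalCase],
    threshold: float = 0.8,  # assumed quality bar
) -> Tuple[float, bool]:
    # Score every case and compare the mean score against the threshold.
    scores = [keyword_judge(agent(c.prompt), c) for c in cases]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold


CASES = [EvalCase("How do I reset my password?", ["settings", "reset"])]


def test_agent_meets_quality_bar() -> None:
    # Written like a unit test so a runner such as pytest can pick it up.
    mean, passed = run_evals(toy_agent, CASES)
    assert passed, f"mean eval score {mean:.2f} is below the threshold"
```

Run under a test runner in CI, a drop in the mean score fails the build the same way a broken unit test would.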

Episode notes

Databricks Roundtable episode: Operationalizing AI Agents: From Experimentation to Production.

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

MLOps GPU Guide: https://go.mlops.community/gpuguide

Big shout-out to Databricks for the collaboration!

// Abstract

This panel discusses the real-world challenges of deploying AI agents at scale. The conversation explores technical and operational barriers that slow production adoption, including reliability, cost, governance, and security.

The panelists also examine how LLMOps, AIOps, and AgentOps differ from traditional MLOps, and why new approaches are required for generative and agent-based systems. Finally, experts define success criteria for GenAI frameworks, with a focus on robust evaluation, observability, and continuous monitoring across development and staging environments.

// Bio

Samraj Moorjani

Samraj is a software engineer working on the Agent Quality team. Previously, Samraj worked at Meta on ads/product classification research and at AppLovin on MLOps. Samraj graduated with a BS+MS in Computer Science from UIUC, advised by Professor Hari Sundaram, where he worked on controllable natural language generation to produce appealing, interpretable science to combat the spread of misinformation. He also worked with Professor Wen-mei Hwu on accelerating LLM inference through extreme sparsification.

Apurva Misra

Apurva is an AI Consultant at Sentick, focusing on assisting startups with their AI strategy and building solutions. She leverages her extensive experience in machine learning and a Master's degree from the University of Waterloo, where her research bridged driving and machine learning, to offer valuable insights. Apurva's keen interest in the startup world fuels her passion for helping emerging companies incorporate AI effectively. In her free time, she is learning Spanish, and she also enjoys exploring hidden gem eateries, always eager to hear about new favourite spots!

Ben Epstein

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now the Co-founder and CTO at GrottoAI, focused on supercharging multifamily teams and reducing vacancy loss with AI-powered guidance for leasing and renewals. Ben also works as an adjunct professor at Washington University in St. Louis, teaching concepts in cloud computing and big data analytics.

Hosted by Adam Becker

// Related Links

Websites: https://www.databricks.com/ and https://mlflow.org/

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community: https://go.mlops.community/slack

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: https://shop.mlops.community/

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Samraj on LinkedIn: /samrajmoorjani/

Connect with Apurva on LinkedIn: /apurva-misra/

Connect with Ben on LinkedIn: /ben-epstein/