MLOps.community

Operationalizing AI Agents: From Experimentation to Production // Databricks Roundtable

Mar 30, 2026
Samraj Moorjani, an MLflow engineer focused on agent quality and observability; Apurva Misra, an AI consultant who helps startups scope POCs and automation; and Ben Epstein, a CTO building LLM-driven internal tools for property teams, discuss scaling agent reliability, observability, and testing strategies. The conversation covers eval-driven development, sandboxing, and production-grade monitoring for agent workflows.
ANECDOTE

Slack Agents Replaced A Team Of Engineers

  • Ben describes internal Slack agents that handle many requests previously routed to engineers, providing data, spreadsheets, and analysis on demand.
  • He says a team of six now does work that previously required 40 people, because the agents give everyone access to company data and context.
ADVICE

Build Narrow Composable Agents First

  • Start with narrow, composable agents rather than one monolithic assistant to reduce failure modes and speed iteration.
  • Break problems into state transitions or classification/regression tasks so you can build test sets and validate behavior.
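The framing above can be made concrete: once a step in an agent workflow is cast as a classification task, it gets a labeled test set and a measurable accuracy, just like any supervised problem. A minimal sketch, where `classify_intent` is a hypothetical stand-in for whatever model or prompt performs that step:

```python
# Hedged sketch: treating one narrow agent step as a classification task
# with a small labeled test set. `classify_intent` is a toy rule-based
# stand-in; in practice this would call an LLM or a trained classifier.

def classify_intent(message: str) -> str:
    """Route an incoming message to a handler category."""
    text = message.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "account"
    return "general"

# A labeled test set makes the step's behavior verifiable in isolation,
# so regressions surface before the step is composed into a larger agent.
TEST_SET = [
    ("I want a refund for last month", "billing"),
    ("I forgot my password", "account"),
    ("What are your opening hours?", "general"),
]

def accuracy(cases) -> float:
    correct = sum(1 for text, label in cases if classify_intent(text) == label)
    return correct / len(cases)
```

Because each narrow step has its own test set, a failure points at one component rather than at a monolithic assistant.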
INSIGHT

Eval-Driven Development Is TDD For GenAI

  • Treat eval-driven development like TDD: write unit-like evaluations, integration tests, and production telemetry for GenAI systems.
  • Use evaluations as verifiable goals so agents can self-check and improve via feedback loops.
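The TDD analogy above can be sketched in code: an evaluation written up front acts like a failing unit test, and the agent loop retries until it passes. All names here (`run_agent`, `evaluate`, `improve_until_passing`) are hypothetical placeholders, not any specific framework's API:

```python
# Hedged sketch of eval-driven development: a verifiable, binary evaluation
# defined before the agent is built, used as the goal of a feedback loop.

def run_agent(prompt: str, hint: str = "") -> str:
    # Stand-in for a real agent call; the hint crudely simulates the agent
    # incorporating feedback from a failed evaluation.
    return "Summary: " + prompt[:20] + ((" " + hint) if hint else "")

def evaluate(output: str) -> bool:
    # Unit-like evaluation: a pass/fail criterion the agent can self-check
    # against, analogous to an assertion in TDD.
    return output.startswith("Summary:") and "sources" in output

def improve_until_passing(prompt: str, max_iters: int = 3) -> str:
    hint = ""
    for _ in range(max_iters):
        output = run_agent(prompt, hint)
        if evaluate(output):
            return output
        hint = "cite sources"  # feedback loop: eval failure informs the retry
    return output  # last attempt, even if still failing (telemetry would flag it)
```

In production, the same evaluations run as telemetry over live traffic, so the "tests" keep executing after deployment rather than only in CI.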