AI Engineering Podcast

Right-Sizing AI: Small Language Models for Real-World Production

Sep 20, 2025
In this discussion, Steven Huels, VP of AI Engineering at Red Hat, makes the case for small language models (SLMs) in real-world applications. He highlights that SLMs can fit on a single enterprise GPU and the operational advantages that come with that. The conversation covers self-hosting models versus relying on APIs, organizational readiness, and innovations in agentic systems. Steven shares real-world examples such as scam detection and emphasizes the importance of customization, automated evaluation, and continuous retraining for efficient AI deployment.
ADVICE

Match Hosting To Operational Maturity

  • Before self-hosting models, evaluate whether your IT organization already operates platforms.
  • If not, consider an integrated AI platform to extend existing operational skills and reduce maintenance burden.
INSIGHT

Model Selection Driven By Constraints And Pace

  • Model choice depends more on geopolitical constraints, popularity, and use-case fit than on raw hype.
  • Enterprises test multiple models with custom evaluation suites and must cope with rapid model churn.
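
A custom evaluation suite of the kind mentioned above could be sketched roughly as follows. This is a minimal illustration, not something shown in the episode: the test cases, the stub models, and the `score_model` and `rank_models` helpers are all hypothetical, and a real suite would call actual model endpoints and use task-specific scoring.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation cases: (prompt, text expected in a good answer).
CASES: List[Tuple[str, str]] = [
    ("Is 'send gift cards to claim your prize' a scam? Answer yes or no.", "yes"),
    ("Is 'your package was delivered' a scam? Answer yes or no.", "no"),
]

def score_model(model: Callable[[str], str], cases=CASES) -> float:
    """Return the fraction of cases whose expected text appears in the output."""
    hits = sum(1 for prompt, expected in cases if expected in model(prompt).lower())
    return hits / len(cases)

def rank_models(models: Dict[str, Callable[[str], str]]) -> List[Tuple[str, float]]:
    """Score every candidate and sort best-first."""
    scores = [(name, score_model(fn)) for name, fn in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Stub "models" standing in for real endpoints (assumptions for illustration).
def always_yes(prompt: str) -> str:
    return "Yes"

def keyword_model(prompt: str) -> str:
    return "Yes" if "gift cards" in prompt else "No"

ranking = rank_models({"always_yes": always_yes, "keyword_model": keyword_model})
```

Because candidates are just entries in a dictionary, a newly released model can be added and compared against the same cases, which is one way to cope with rapid model churn.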
INSIGHT

Agentic Systems Are Becoming Service-Oriented

  • Agentic systems trend toward a service-oriented architecture of specialized agents coordinating to solve complex tasks.
  • An ensemble of smaller, task-specific agents can beat a single general-purpose model on cost and fit.
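
One way to picture the service-oriented pattern described above is a small registry that routes each task to a specialized agent. The sketch below is an assumption-laden illustration rather than anything presented in the episode: the agent functions, the `AGENTS` registry, and the `dispatch` helper are hypothetical, and in practice each agent might wrap its own small model behind a network service.

```python
from typing import Callable, Dict

# Hypothetical specialized agents; each stands in for a small task-specific model.
def scam_agent(text: str) -> str:
    return "scam-check: flagged" if "gift card" in text.lower() else "scam-check: ok"

def summary_agent(text: str) -> str:
    return "summary: " + text[:40]

# Service registry mapping a task name to the agent that handles it.
AGENTS: Dict[str, Callable[[str], str]] = {
    "scam_detection": scam_agent,
    "summarize": summary_agent,
}

def dispatch(task: str, payload: str) -> str:
    """Route a request to the agent registered for the task."""
    agent = AGENTS.get(task)
    if agent is None:
        raise KeyError(f"no agent registered for task {task!r}")
    return agent(payload)

result = dispatch("scam_detection", "Pay the fee with a gift card today")
```

Keeping agents behind a common call signature means one agent can be retrained or swapped without touching the others.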