
Igor Šušić
Founding Machine Learning Engineer at CAST AI specializing in inference and training optimization for large-scale NLP and recommender systems, with deep expertise in model serving, quantization, and kernel/hardware-level performance tuning.
Best podcasts with Igor Šušić
Ranked by the Snipd community

Feb 19, 2026 • 1h 6min
Serving LLMs in Production: Performance, Cost & Scale // CAST AI Roundtable
Igor Šušić is a founding ML engineer focused on large-scale inference and performance tuning. Ioana Apetrei is a senior product manager building accessible, cost-effective LLM deployment. They debate why deployments fail at scale, cover model routing and the trade-off between cost and accuracy, and explain time-sharing GPUs, quantization, prefill vs. decode separation, and when self-hosting or managed endpoints make sense.


