
Maher Hanafi
SVP of Engineering who led self-hosting LLMs at enterprise scale, optimizing GPU costs and inference performance (including with TensorRT-LLM) and building an AI platform for HR tech.
Best podcasts with Maher Hanafi
Ranked by the Snipd community

26 snips
Apr 10, 2026 • 1h 5min
We Cut LLM Latency by 70% in Production
Maher Hanafi, SVP of Engineering who led self-hosting LLMs at enterprise scale and optimized GPU inference, shares practical production stories. He describes cutting latency by 50–70% with TensorRT-LLM, and explains cold-start fixes, KV-cache reuse and in-flight batching, scaling strategies that lower GPU spend, and how vertical features evolve into a reusable AI platform.
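The in-flight (continuous) batching mentioned above is a scheduling idea: instead of waiting for a whole batch of requests to finish before admitting new ones, the server refills open batch slots on every decode step. A minimal toy sketch of that idea, with hypothetical request sizes and no real serving framework (this is not TensorRT-LLM's actual API):

```python
# Toy model of in-flight (continuous) batching: requests join the running
# batch as soon as a slot frees, rather than waiting for the batch to drain.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    remaining_tokens: int  # decode steps this request still needs

def serve(requests, max_batch: int):
    """Run a toy decode loop; returns (total steps, completion order)."""
    queue = deque(requests)
    active: list[Request] = []
    steps = 0
    completions: list[int] = []
    while queue or active:
        # In-flight batching: refill open slots on every step.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        steps += 1  # one decode iteration advances every active request
        for r in active:
            r.remaining_tokens -= 1
        for r in [r for r in active if r.remaining_tokens == 0]:
            active.remove(r)
            completions.append(r.rid)
    return steps, completions

steps, order = serve([Request(0, 5), Request(1, 1), Request(2, 3)], max_batch=2)
# 5 decode steps total; static batching on the same workload would take 8,
# since request 2 could not start until both earlier requests finished.
```

The latency win comes from the short request (id 1) freeing its slot after one step, letting request 2 start immediately instead of queueing behind the longest request in the batch.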
