
Maher Hanafi
SVP of Engineering who led self-hosting LLMs at enterprise scale, optimizing GPU costs and inference performance (including with TensorRT-LLM) and building an AI platform for HR tech.
Best podcasts with Maher Hanafi
Ranked by the Snipd community

26 snips
Apr 10, 2026 • 1h 5min
We Cut LLM Latency by 70% in Production
Maher Hanafi, SVP of Engineering who led self-hosting LLMs at enterprise scale and optimized GPU inference, shares practical production stories. He describes cutting latency by 50–70% with TensorRT-LLM, and explains cold-start fixes, KV-cache reuse and in-flight batching, scaling strategies that lower GPU spend, and how vertical features evolve into a reusable AI platform.
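The in-flight (continuous) batching mentioned above is a scheduling idea: instead of waiting for a whole batch of requests to finish before admitting new ones, the server refills open batch slots on every decode step. A minimal toy sketch of that idea, with hypothetical request sizes and no real serving framework (this is not TensorRT-LLM's actual API):

```python
# Toy model of in-flight (continuous) batching: requests join the running
# batch as soon as a slot frees, rather than waiting for the batch to drain.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    remaining_tokens: int  # decode steps this request still needs

def serve(requests, max_batch: int):
    """Run a toy decode loop; returns (total steps, completion order)."""
    queue = deque(requests)
    active: list[Request] = []
    steps = 0
    completions: list[int] = []
    while queue or active:
        # In-flight batching: refill open slots on every step.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        steps += 1  # one decode iteration advances every active request
        for r in active:
            r.remaining_tokens -= 1
        for r in [r for r in active if r.remaining_tokens == 0]:
            active.remove(r)
            completions.append(r.rid)
    return steps, completions

steps, order = serve([Request(0, 5), Request(1, 1), Request(2, 3)], max_batch=2)
# 5 decode steps total; static batching on the same workload would take 8,
# since request 2 could not start until both earlier requests finished.
```

The latency win comes from the short request (id 1) freeing its slot after one step, letting request 2 start immediately instead of queueing behind the longest request in the batch.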
