

Mentioned in 2 episodes
Inference Engineering
AI Model Serving Optimization
Book
Inference Engineering is a 300-page technical guide that maps the technologies and techniques powering inference across runtime, infrastructure, and tooling layers.
The book covers model architecture and optimization, GPU hardware specifications, software frameworks and inference engines, production optimization techniques including quantization and speculative decoding, and operational considerations for running AI models at scale.
It serves as a practical resource for engineers, executives, and technical leaders seeking to understand how to deploy and manage generative AI models efficiently.
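Of the optimization techniques the description names, quantization is the easiest to show in a few lines. The sketch below is not from the book; it is a minimal illustration, assuming NumPy, of symmetric per-tensor int8 weight quantization, the basic idea behind the production quantization schemes the book covers.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # rounding error is bounded by scale / 2
```

Real inference engines typically use finer granularity (per-channel or per-group scales) and calibrated activation quantization, but the trade-off is the same: 4x less memory per weight in exchange for a bounded rounding error.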
Mentioned by Alex and Philip when introducing Philip's new book about running and engineering inference systems end-to-end.

📅 ThursdAI - Feb 26 - The Pentagon wants War Claude, every benchmark collapsed, and a solo founder hit $700K ARR with AI agents
Mentioned by Philip Kiely as his recently published technical book about the practice of inference engineering and the inference stack.
Inference engineering and the real-world deployment of LLMs, with Philip Kiely


