

Mentioned in 2 episodes
Inference Engineering
AI Model Serving Optimization
Book
Inference Engineering is a 300-page technical guide that maps the technologies and techniques powering inference across runtime, infrastructure, and tooling layers.
The book covers model architecture and optimization, GPU hardware specifications, software frameworks and inference engines, production optimization techniques including quantization and speculative decoding, and operational considerations for running AI models at scale.
It serves as a practical resource for engineers, executives, and technical leaders seeking to understand how to deploy and manage generative AI models efficiently.
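Of the optimization techniques the description names, quantization is the easiest to show in a few lines. The sketch below is not from the book; it is a minimal illustration, assuming NumPy, of symmetric per-tensor int8 weight quantization, the basic idea behind the production quantization schemes the book covers.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # rounding error is bounded by scale / 2
```

Real inference engines typically use finer granularity (per-channel or per-group scales) and calibrated activation quantization, but the trade-off is the same: 4x less memory per weight in exchange for a bounded rounding error.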
Mentioned by Alex and Philip when introducing Philip's new book about running and engineering inference systems end-to-end.

📅 ThursdAI - Feb 26 - The Pentagon wants War Claude, every benchmark collapsed, and a solo founder hit $700K ARR with AI agents
Mentioned by Philip Kiely as his recently published technical book about the practice of inference engineering and the inference stack.
Inference engineering and the real-world deployment of LLMs, with Philip Kiely


