

#29309
Mentioned in 2 episodes
AI Systems Performance Engineering
Optimizing Model Training and Inference Workloads with GPUs, CUDA, and PyTorch
Book • 2025
AI Systems Performance Engineering equips professionals with actionable strategies to maximize efficiency across every layer of AI infrastructure.
The book provides step-by-step methodologies for fine-tuning GPU CUDA kernels, PyTorch-based algorithms, and multinode training and inference systems, along with techniques for scaling GPU clusters and implementing cutting-edge inference strategies.
It includes a 175+ item performance checklist covering the entire AI system lifecycle, from hardware planning and GPU programming to distributed training and efficient inference serving.
Mentioned by 

and described by the author as his comprehensive O'Reilly book distilling GPU/CUDA/PyTorch performance engineering knowledge.


Jon Krohn

59 snips
973: AI Systems Performance Engineering, with Chris Fregly
Mentioned by

as his new O'Reilly book covering co-design across hardware, CUDA, PyTorch, and algorithms for AI performance.

Chris Fregly

58 snips
Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs



