
The Stack Overflow Podcast
Even the chip makers are making LLMs
Mar 10, 2026
Kari Briski, VP of Generative AI Software for Enterprise at NVIDIA, leads the Nemotron open-model family and links model design to hardware. She talks about NVIDIA's hardware-software co-design, reduced-precision training (FP8/FP4), and memory trade-offs. The conversation covers scalable context memory, hybrid architectures, agentic systems, and why open weights and datasets matter for enterprises.
AI Snips
Chipmaker Builds Models To Inform Hardware Design
- NVIDIA develops models to drive hardware co-design and surface difficult GPU workloads.
- Briski explained that NVIDIA has worked on deep learning since 2018 to identify workloads that inform GPU, network, and storage design.
Train In Lower Precision To Save Memory Without Losing Accuracy
- Training in reduced precision preserves accuracy while saving memory and improving scalability.
- Briski cited NVFP4 (introduced with Blackwell) and FP8 as formats that cut model memory and help fit large models on fewer GPUs.
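The memory savings Briski describes are easy to estimate: weight memory scales linearly with bits per parameter. A minimal back-of-the-envelope sketch (the 70B parameter count is an illustrative assumption, not a figure from the episode):

```python
# Rough weight-memory footprint at different precisions.
# Parameter count and formats are illustrative assumptions.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}

def weight_memory_gb(n_params: float, precision: str) -> float:
    """GB needed just to store the weights at a given precision."""
    return n_params * BITS_PER_PARAM[precision] / 8 / 1e9

params = 70e9  # e.g. a hypothetical 70B-parameter model
for p in ("FP16", "FP8", "FP4"):
    print(f"{p}: {weight_memory_gb(params, p):.0f} GB")
```

Halving the bits halves the weight footprint, which is why dropping from FP16 to FP8 or FP4 lets the same model fit across fewer GPUs. (This counts only weights; optimizer state, activations, and KV cache add more on top.)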
Match Models To Hardware Constraints For Efficiency
- Match model architecture and precision to hardware constraints to improve latency and efficiency.
- Briski recommended reducing memory needs so models can run on smaller nodes (e.g., 8 GPUs) instead of large multi-node setups.
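The "fit on a smaller node" advice above can be turned into a quick feasibility check. A hedged sketch, assuming 80 GB GPUs and a rough 1.2× multiplier for non-weight memory (KV cache, activations) — both numbers are assumptions for illustration, not values from the episode:

```python
def fits_on_node(n_params: float, bits: int,
                 gpu_mem_gb: float = 80, n_gpus: int = 8,
                 overhead: float = 1.2) -> bool:
    """True if weights (plus a rough overhead factor) fit in one node's GPU memory.

    gpu_mem_gb, n_gpus, and overhead are illustrative assumptions.
    """
    weights_gb = n_params * bits / 8 / 1e9
    return weights_gb * overhead <= gpu_mem_gb * n_gpus

# A hypothetical 405B-parameter model on a single 8x80GB node:
print(fits_on_node(405e9, bits=16))  # FP16 weights alone exceed the node
print(fits_on_node(405e9, bits=4))   # FP4 brings it within a single node
```

This is the efficiency argument in miniature: lowering precision can be the difference between a multi-node deployment and a single 8-GPU server, which also improves latency by removing cross-node communication.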
