The Stack Overflow Podcast

Even the chip makers are making LLMs

Mar 10, 2026
Kari Briski, VP of Generative AI Software for Enterprise at NVIDIA, leads the Nemotron open-model family and connects model design to hardware. She talks about NVIDIA’s hardware-software co-design, precision training (FP8/FP4), and memory trade-offs. The conversation covers scalable context memory, hybrid architectures, agentic systems, and why open weights and datasets matter for enterprises.
AI Snips
INSIGHT

Chipmaker Builds Models To Inform Hardware Design

  • NVIDIA develops models to drive hardware co-design and surface difficult GPU workloads.
  • Briski explained that NVIDIA has worked on deep learning models since 2018 to identify the workloads that inform GPU, network, and storage design.
INSIGHT

Train In Lower Precision To Save Memory Without Losing Accuracy

  • Training in reduced precision preserves accuracy while saving memory and improving scalability.
  • Briski cited NVFP4 (introduced with Blackwell) and FP8 as formats that cut model memory and help fit large models across fewer GPUs; see the back-of-envelope sketch after this list.
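As a rough illustration of why precision drives memory (not from the episode; the parameter count and the exact byte-per-parameter figures are illustrative assumptions), here is a minimal Python sketch comparing the weight footprint of a hypothetical 70B-parameter model across formats:

```python
# Back-of-envelope memory estimate for model weights at different
# precisions. Parameter count and format list are illustrative
# assumptions, not figures from the episode.

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "BF16": 2.0,
    "FP8": 1.0,
    "FP4": 0.5,  # e.g., NVFP4 on Blackwell
}

def weight_memory_gib(num_params: float, fmt: str) -> float:
    """Memory for the weights alone (excludes optimizer state,
    activations, and KV cache)."""
    return num_params * BYTES_PER_PARAM[fmt] / 2**30

if __name__ == "__main__":
    params = 70e9  # hypothetical 70B-parameter model
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: {weight_memory_gib(params, fmt):,.0f} GiB")
```

Weights alone roughly halve with each step down in precision, which is the mechanism behind fitting a model that needed multiple nodes at BF16 onto far fewer GPUs at FP8 or FP4.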
ADVICE

Match Models To Hardware Constraints For Efficiency

  • Match model architecture and precision to hardware constraints to improve latency and efficiency.
  • Briski recommended reducing memory needs so models can run on smaller nodes (e.g., 8 GPUs) instead of large multi-node setups; a feasibility sketch follows this list.
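Continuing the same back-of-envelope style, this is a minimal sketch of the node-fit check that advice implies. All numbers are assumptions (80 GiB GPUs, a 70% usable-memory budget, weights only); real deployments also budget for KV cache, activations, and framework overhead:

```python
# Quick feasibility check: can a model's weights be sharded across a
# single node instead of a multi-node cluster? Numbers are illustrative.

def fits_on_node(num_params: float, bytes_per_param: float,
                 gpus: int = 8, gpu_mem_gib: float = 80.0,
                 usable_fraction: float = 0.7) -> bool:
    """True if evenly sharded weights fit in the usable memory
    of one node; ignores activations, KV cache, and overhead."""
    weights_gib = num_params * bytes_per_param / 2**30
    per_gpu_gib = weights_gib / gpus
    return per_gpu_gib <= gpu_mem_gib * usable_fraction

# A hypothetical 405B-parameter model: too big for one 8-GPU node
# at BF16, comfortable at FP4.
print(fits_on_node(405e9, 2.0))  # BF16 -> False (~94 GiB per GPU)
print(fits_on_node(405e9, 0.5))  # FP4  -> True  (~24 GiB per GPU)
```

The design point is that precision is a deployment lever, not just a training detail: dropping bytes per parameter is often what moves a model from a multi-node setup onto a single smaller node.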