The Stack Overflow Podcast

Even the chip makers are making LLMs

Mar 10, 2026
Kari Briski, VP of Generative AI Software for Enterprise at NVIDIA, leads the Nemotron open-model family and connects model design to hardware. She talks about NVIDIA’s hardware-software co-design, precision training (FP8/FP4), and memory trade-offs. The conversation covers scalable context memory, hybrid architectures, agentic systems, and why open weights and datasets matter for enterprises.
AI Snips
INSIGHT

Chipmaker Builds Models To Inform Hardware Design

  • NVIDIA develops models to drive hardware co-design and surface difficult GPU workloads.
  • Briski explained that NVIDIA has worked on deep learning models since 2018 to identify the workloads that inform GPU, network, and storage design.
INSIGHT

Train In Lower Precision To Save Memory Without Losing Accuracy

  • Training in reduced precision preserves accuracy while saving memory and improving scalability.
  • Briski cited NVFP4 (introduced with Blackwell) and FP8 as formats that cut model memory and help fit large models across fewer GPUs; see the back-of-envelope sketch after this list.
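As a rough illustration of why precision drives memory (not from the episode; the parameter count and the exact byte-per-parameter figures are illustrative assumptions), here is a minimal Python sketch comparing the weight footprint of a hypothetical 70B-parameter model across formats:

```python
# Back-of-envelope memory estimate for model weights at different
# precisions. Parameter count and format list are illustrative
# assumptions, not figures from the episode.

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "BF16": 2.0,
    "FP8": 1.0,
    "FP4": 0.5,  # e.g., NVFP4 on Blackwell
}

def weight_memory_gib(num_params: float, fmt: str) -> float:
    """Memory for the weights alone (excludes optimizer state,
    activations, and KV cache)."""
    return num_params * BYTES_PER_PARAM[fmt] / 2**30

if __name__ == "__main__":
    params = 70e9  # hypothetical 70B-parameter model
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: {weight_memory_gib(params, fmt):,.0f} GiB")
```

Weights alone roughly halve with each step down in precision, which is the mechanism behind fitting a model that needed multiple nodes at BF16 onto far fewer GPUs at FP8 or FP4.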
ADVICE

Match Models To Hardware Constraints For Efficiency

  • Match model architecture and precision to hardware constraints to improve latency and efficiency.
  • Briski recommended reducing memory needs so models can run on smaller nodes (e.g., 8 GPUs) instead of large multi-node setups; a feasibility sketch follows this list.
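Continuing the same back-of-envelope style, this is a minimal sketch of the node-fit check that advice implies. All numbers are assumptions (80 GiB GPUs, a 70% usable-memory budget, weights only); real deployments also budget for KV cache, activations, and framework overhead:

```python
# Quick feasibility check: can a model's weights be sharded across a
# single node instead of a multi-node cluster? Numbers are illustrative.

def fits_on_node(num_params: float, bytes_per_param: float,
                 gpus: int = 8, gpu_mem_gib: float = 80.0,
                 usable_fraction: float = 0.7) -> bool:
    """True if evenly sharded weights fit in the usable memory
    of one node; ignores activations, KV cache, and overhead."""
    weights_gib = num_params * bytes_per_param / 2**30
    per_gpu_gib = weights_gib / gpus
    return per_gpu_gib <= gpu_mem_gib * usable_fraction

# A hypothetical 405B-parameter model: too big for one 8-GPU node
# at BF16, comfortable at FP4.
print(fits_on_node(405e9, 2.0))  # BF16 -> False (~94 GiB per GPU)
print(fits_on_node(405e9, 0.5))  # FP4  -> True  (~24 GiB per GPU)
```

The design point is that precision is a deployment lever, not just a training detail: dropping bytes per parameter is often what moves a model from a multi-node setup onto a single smaller node.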