Super Data Science: ML & AI Podcast with Jon Krohn

692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU

Jun 30, 2023

Discover the SpQR approach to lossless LLM weight compression, which makes it possible to run large models on a single GPU. Learn about QLoRA, which combines low-rank adaptation and quantization for better performance.

Chapters

Lossless LLM Weight Compression and Fine-Tuning Large Models on a Single GPU

00:00 • 8min