
692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Lossless LLM Weight Compression and Fine-Tuning Large Models on a Single GPU
Explore how the SpQR method enables near-lossless LLM weight compression, allowing large models to run on a single GPU without compromising accuracy. The chapter also introduces QLoRA, a method combining quantization with low-rank adaptation to fine-tune open-source large language models on a single GPU.
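To make the low-rank adaptation idea behind QLoRA concrete, here is a minimal, hypothetical NumPy sketch (sizes and names are illustrative, not from the episode): the base weight matrix stays frozen (in QLoRA it would also be quantized), and only two small low-rank factors are trained.

```python
import numpy as np

# Illustrative sketch of low-rank adaptation (LoRA), the adapter idea QLoRA builds on:
# instead of updating a full weight matrix W (d_out x d_in), train two small
# matrices A (r x d_in) and B (d_out x r) with rank r << min(d_out, d_in).
rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4               # illustrative sizes, not from the episode
W = rng.standard_normal((d_out, d_in))   # frozen (in QLoRA: quantized) base weights
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so the adapter is a no-op at init

def adapted_forward(x, alpha=8.0):
    """Forward pass with the LoRA update W + (alpha / r) * B @ A applied to input x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# At initialization B == 0, so the adapted output equals the base output.
print(np.allclose(adapted_forward(x), W @ x))

# Parameter savings: a full update trains d_out * d_in values,
# while LoRA trains only r * (d_in + d_out).
print(d_out * d_in, r * (d_in + d_out))
```

Because only `A` and `B` receive gradients, the trainable parameter count drops from 4096 to 512 in this toy example, which is why the combination with 4-bit quantized base weights fits large-model fine-tuning on a single GPU.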
Transcript


