692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU

Super Data Science: ML & AI Podcast with Jon Krohn

Lossless LLM Weight Compression and Fine-Tuning Large Models on a Single GPU

Explore how the SpQR method enables near-lossless LLM weight compression, making it possible to run large models on a single GPU with little to no loss in accuracy. The chapter also introduces QLoRA, a method that combines low-rank adaptation with quantization to fine-tune large open-source models on a single GPU.
