
Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
Researchers who developed S-LoRA, a system for efficiently serving a large number of Low-Rank Adaptation (LoRA) language model adapters by storing them in memory and using optimized memory management and computation strategies.
Best podcasts with Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
Ranked by the Snipd community

Nov 21, 2023 • 3min
ArXiv Preprint - S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Researchers discuss S-LoRA system for efficiently serving a large number of Low-Rank Adaptation language model adapters by using optimized memory management and computation strategies. They explain the concept of unified paging for memory management and batched inference to minimize communication and memory overheads.


