CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
| Title: | CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving |
|---|---|
| Authors: | Liu, Yuhan; Li, Hanchen; Cheng, Yihua; Ray, Siddhant; Huang, Yuyang; Zhang, Qizheng; Du, Kuntai; Yao, Jiayi; Lu, Shan; Ananthanarayanan, Ganesh; Maire, Michael; Hoffmann, Henry; Holtzman, Ari; Jiang, Junchen |
| Source: | Proceedings of the ACM SIGCOMM 2024 Conference, pp. 38-56 |
| Availability: | http://dl.acm.org/doi/10.1145/3651890.3672274 |
| Database: | ACM Full-Text Collection |