IC-Cache: Efficient Large Language Model Serving via In-context Caching
| Title: | IC-Cache: Efficient Large Language Model Serving via In-context Caching |
|---|---|
| Authors: | Yu, Yifan; Gan, Yu; Sarda, Nikhil; Tsai, Lillian; Shen, Jiaming; Zhou, Yanqi; Krishnamurthy, Arvind; Lai, Fan; Levy, Hank; Culler, David |
| Source: | Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, pp. 375-398 |
| Availability: | http://dl.acm.org/doi/10.1145/3731569.3764829 |
| Database: | ACM Full-Text Collection |