SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference
| Title: | SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference |
|---|---|
| Authors: | Zhang, Ziyi; Jiang, Ziheng; Jiang, Chengquan; Yu, Menghan; Zheng, Size; Lin, Haibin; Liu, Xin; Hoffmann, Henry |
| Source: | Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 2197-2211 |
| Availability: | http://dl.acm.org/doi/10.1145/3779212.3790246 |
| Database: | ACM Full-Text Collection |