MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference
| Title: | MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference |
|---|---|
| Authors: | Rhee, M.; Choi, S.; Kim, E.; Sim, J.; Joo, Y.; Kim, H. |
| Source: | IEEE Computer Architecture Letters (IEEE Comput. Arch. Lett.), 24(2):365-368, Dec. 2025 |
| Database: | IEEE Xplore Digital Library |