Efficient MoE Inference on Single Consumer-grade GPU with Dynamic Expert Caching
| Title: | Efficient MoE Inference on Single Consumer-grade GPU with Dynamic Expert Caching |
|---|---|
| Authors: | Zhang, Rui; Yang, Boxuan; Wang, Rongji; Peng, Xuemei; Wen, Zeyi |
| Source: | 2026 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1219-1232, May 2026 |
| Relation: | 2026 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |
| Database: | IEEE Xplore Digital Library |