Deep Optimizer States: Towards Scalable Training of Transformer Models using Interleaved Offloading
| Title: | Deep Optimizer States: Towards Scalable Training of Transformer Models using Interleaved Offloading |
|---|---|
| Authors: | Maurya, Avinash; Ye, Jie; Rafique, M. Mustafa; Cappello, Franck; Nicolae, Bogdan |
| Source: | Proceedings of the 25th International Middleware Conference, pp. 404-416 |
| Availability: | http://dl.acm.org/doi/10.1145/3652892.3700781 |
| Database: | ACM Full-Text Collection |