DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models Paper • 2505.22549 • Published May 28
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates Paper • 2510.05361 • Published Oct 6