- Article
Mercury: Accelerating 3D Parallel Training with an AWGR-WSS-Based All-Optical Reconfigurable Network
- Shi Feng,
- Jiawei Zhang and
- Yuefeng Ji
- + 2 authors
The network traffic of 3D parallel training in large-scale deep learning is bursty, concentrates in hot spots, and follows periodic large-bandwidth patterns. These characteristics severely challenge network efficiency and call for a high-performance, flexible optical network. To address this, this paper proposes Mercury, a hybrid optical network built from physical optical components. Its optical timeslot switching (OTS) subnet uses an arrayed waveguide grating router (AWGR) and tunable lasers to carry dynamic traffic, while its optical circuit switching (OCS) subnet relies on wavelength selective switches (WSSs) for low-latency, high-bandwidth transmission, with the network coordinated by selective valiant load balancing (S-VLB) and most efficient path configuration (MEPC) mechanisms. In simulations and FPGA-based testbed experiments, Mercury outperforms the Sirius network: it reduces epoch training time (e.g., 179 s with five jobs) and relieves OTS congestion by offloading large flows to the OCS subnet. This work demonstrates that Mercury provides a flexible, high-performance physical optical solution for 3D parallel training of large-scale deep learning models.
16 March 2026







