¹ State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China
² State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, China
³ Huawei Technologies Co. Ltd, Nanjing, China
High-radix hierarchical structures, such as the dragonfly, fat-tree, and torus, are cost-effective topologies for high-performance computing (HPC) networks. Among them, the dragonfly outperforms the more traditional fat-tree and torus in cost and scalability. However, imbalanced traffic patterns cause network congestion, which degrades performance. The routing algorithm influences the performance of the dragonfly topology in several ways: it can be designed to avoid saturating global or local links and to keep the network free of deadlock.
In this letter, we introduce an adaptive multi-level routing (AMLR) algorithm for dragonfly networks. AMLR uses three levels of routes. By dividing these routes carefully, all paths in the network can be used more effectively, and traffic between groups becomes more balanced. In particular, we propose a congestion control scheme that cooperates with AMLR during data transmission.
Furthermore, congestion detection and notification are used to identify congested packets and inform the network. Evaluations show that the proposed adaptive multi-level routing and congestion control mechanism relieve the imbalance between groups in a 100-node dragonfly topology. As a result, AMLR achieves 26%, 98%, 78%, and 99% lower latency and 13%, 87%, 13%, and 128% higher throughput than shortest-path routing under uniform, adv+i, hotspot, and permutation traffic, respectively.
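To make the link-saturation problem concrete, the following Python sketch models a small dragonfly and counts how many flows minimal (shortest-path) routing places on each global link. It is only an illustration of the baseline behaviour and does not implement AMLR or its congestion control. The parameters `A` and `H`, the relative global-link arrangement in `gateway`, and the two traffic patterns are assumptions chosen for illustration; they are not the 100-node configuration used in the evaluation.

```python
from collections import Counter

# Hypothetical parameters, not the configuration evaluated in this letter.
A = 4            # routers per group
H = 2            # global links per router
G = A * H + 1    # groups in a maximal dragonfly (one global link per group pair)


def gateway(src_group, dst_group):
    """Router in src_group that owns the global link to dst_group.

    Assumes a relative arrangement: the link with group offset k
    (1 <= k <= A*H) is attached to router (k - 1) // H.
    """
    offset = (dst_group - src_group) % G
    return (offset - 1) // H


def minimal_route(src, dst):
    """Minimal path as a list of (group, router) hops: local, global, local."""
    (sg, sr), (dg, dr) = src, dst
    path = [src]
    if sg != dg:
        out_r, in_r = gateway(sg, dg), gateway(dg, sg)
        if out_r != sr:
            path.append((sg, out_r))   # local hop to the gateway router
        path.append((dg, in_r))        # global hop into the destination group
        if in_r != dr:
            path.append(dst)           # local hop to the destination router
    elif sr != dr:
        path.append(dst)               # same group: a single local hop
    return path


def global_link_load(pairs):
    """Count flows per directed global link (src_group, dst_group)."""
    load = Counter()
    for src, dst in pairs:
        path = minimal_route(src, dst)
        for (g1, _), (g2, _) in zip(path, path[1:]):
            if g1 != g2:
                load[(g1, g2)] += 1
    return load


routers = [(g, r) for g in range(G) for r in range(A)]

# Adversarial pattern (adv+1): every router in group g sends to group g+1.
adv_plus_1 = [((g, r), ((g + 1) % G, r)) for g, r in routers]

# Balanced stand-in: each router in a group targets a different group.
spread = [((g, r), ((g + 1 + r) % G, r)) for g, r in routers]

print("max global-link load, adv+1: ", max(global_link_load(adv_plus_1).values()))
print("max global-link load, spread:", max(global_link_load(spread).values()))
```

Under these assumptions, the adversarial pattern funnels all `A` flows of a group onto the single global link to the next group, while the balanced pattern places one flow on each link it uses. This is the kind of imbalance between groups that adaptive, non-minimal routing is intended to relieve.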