Characterizing the Impact of Congestion in Modern HPC Interconnects

Title: Characterizing the Impact of Congestion in Modern HPC Interconnects
Authors: Lorenzo Piarulli; Marco Faltelli; Dirk Pleiter; Karthee Sivalingam; Dancheng Zhang; Kexue Zhao; Matteo Turisini; Francesco Iannone; Aldo Artigiani; Daniele De Sensi
Contributors: Piarulli, Lorenzo; Faltelli, Marco; Pleiter, Dirk; Sivalingam, Karthee; Zhang, Dancheng; Zhao, Kexue; Turisini, Matteo; Iannone, Francesco; Artigiani, Aldo; De Sensi, Daniele
Publication Year: 2026
Collection: Sapienza Università di Roma: CINECA IRIS
Subject Terms: congestion; hpc
Description: High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations. On modern supercomputers, however, network congestion has emerged as a major limitation, driven by heterogeneous traffic patterns resulting from diverse workload mixes. As system scale and active users continue to grow, understanding how today’s interconnect technologies respond to congestion is essential for establishing realistic performance expectations and informing future system design. This paper presents a comprehensive characterization of congestion behavior across four major HPC fabrics: EDR InfiniBand, HDR InfiniBand, NDR InfiniBand, Cray Slingshot, and emerging Ethernet fabrics. These fabrics span high-performance proprietary interconnects as well as adaptive Ethernet-based designs aligned with emerging standards such as Ultra Ethernet. We evaluate their responses to both steady congestion and a wide range of bursty patterns that vary in duration, intensity, and pause length, capturing the bursty communication typical of AI workloads. Our study covers multiple scales, examining how congestion manifests differently as system size increases and identifying scale-dependent behaviors that influence collective performance. By analyzing the challenges that arise under these controlled stress conditions, we aim to provide a practical overview of congestion issues and possible optimizations. The insights derived from this evaluation can guide researchers and HPC architects in designing more effective congestion-control mechanisms and network load-balancing strategies.
Document Type: conference object
Language: English
Relation: ispartofbook:ISC High Performance 2026 Research Paper Proceedings (41st International Conference), Hamburg, Germany, June 22-26, 2026; ISC High Performance (was International Supercomputing Conference); https://hdl.handle.net/11573/1763625
Availability: https://hdl.handle.net/11573/1763625
Accession Number: edsbas.4AD4369
Database: BASE