Decentralized Asynchronous Gradient Sharing For Bandwidth-Efficient Collaborative Model Training

Authors

  • Vinitha M, Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • Antonibiya S, Assistant Professor, Department of Mathematics, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • Sayfiddinova Muniskhon Fakhriddin Kizi, Turan International University, Namangan, Uzbekistan.
  • Dr. Kanchan Thakur, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.

Keywords:

Decentralized Training, Asynchronous SGD, Gradient Sparsification, Gossip Protocol, Bandwidth Efficiency, Distributed Deep Learning, Peer-to-Peer Training.

Abstract

Centralized parameter-server topologies for distributed model training suffer from two problems: a communication bottleneck at the aggregation point, and synchronization barriers at which every worker must wait for the slowest ones (the "stragglers"). Decentralized training over a peer-to-peer topology removes the central aggregation point, but asynchronous updates introduce stale gradients and gossip-based parameter sharing incurs heavy communication overhead. This work proposes DAGrad, a decentralized asynchronous gradient sharing system for bandwidth-efficient collaborative training, built on three components: (i) gossip-based partial gradient exchange, in which pairs of peers exchange only the top 1% of gradient entries by magnitude; (ii) an age-weighted update strategy that penalizes stale gradients; and (iii) dynamic peer selection, which prioritizes exchanges with peers whose gradients are maximally complementary to a worker's own. Experiments with ResNet-50/ImageNet and BERT-base/GLUE in 32- and 128-worker setups show that DAGrad reduces inter-worker communication bandwidth to 29% of that of synchronous dense training at 91.9% accuracy (i.e., within 0.2% of synchronous dense training), and that it scales to 128 workers with 87% parallel efficiency.
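Components (i) and (ii) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the dictionary encoding of a sparse gradient, and in particular the staleness weight 1 / (1 + decay * age) are assumptions chosen for illustration, since the abstract does not specify the weighting function.

```python
import heapq


def top_k_sparsify(grad, fraction=0.01):
    """Keep only the largest-magnitude entries of a dense gradient.

    Returns a dict {index: value}. At least one entry is always kept.
    """
    k = max(1, int(len(grad) * fraction))
    # Indices of the k entries with the largest absolute value.
    top = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return {i: grad[i] for i in top}


def staleness_weight(age, decay=1.0):
    """Hypothetical age penalty (an assumption, not taken from the paper).

    `age` is the number of local steps elapsed since the peer computed
    the gradient; older gradients receive a smaller weight.
    """
    return 1.0 / (1.0 + decay * age)


def apply_sparse_update(params, sparse_grad, age, lr=0.1):
    """Apply a peer's sparse gradient in place, discounted by its age."""
    w = staleness_weight(age)
    for i, g in sparse_grad.items():
        params[i] -= lr * w * g
    return params


if __name__ == "__main__":
    grad = [0.1, -3.0, 0.5, 2.0]
    # fraction=0.5 is used only to keep the toy example readable;
    # the abstract describes exchanging the top 1%.
    message = top_k_sparsify(grad, fraction=0.5)
    params = apply_sparse_update([1.0, 1.0, 1.0, 1.0], message, age=1)
    print(message, params)
```

Only the selected indices and values cross the network, which is where the bandwidth saving comes from. The age discount limits the influence of gradients that were computed against an outdated copy of the parameters.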

Published

2026-06-01

How to Cite

M, V., S, A., Kizi, S. M. F., & Thakur, D. K. (2026). Decentralized Asynchronous Gradient Sharing For Bandwidth-Efficient Collaborative Model Training. International Journal of Artificial Intelligence and Machine Learning, 6(4s), 458–462. Retrieved from https://www.svedbergopen.com/index.php/ijaiml/article/view/474