Sub-Linear Gradient Estimation Algorithms for Training Massive-Scale Sparse Models
Keywords:
Sub-linear gradient estimation, Sparse model training, Carbon-aware optimization, Communication efficiency, Decentralized learning.
Abstract
Training massive-scale sparse models on decentralized platforms is constrained by computational load, limited communication bandwidth, and high energy consumption. Classical methods such as full gradient descent require many passes over the data and exchange a volume of parameters that scales linearly or super-linearly with model size, which increases the carbon footprint of distributed training. To address this challenge, this paper presents a sub-linear gradient estimation approach for training massive-scale sparse models in energy-aware edge networks. Experiments were conducted in a distributed simulation setup using real-world edge IoT performance datasets, measuring training accuracy and energy efficiency. The results indicate that the sub-linear approach reduces average communication cost by 42.6% and cumulative carbon emissions by 38.4% relative to full-gradient optimization methods. These efficiency gains come with only a marginal reduction of 0.75% in model accuracy, so classification performance remains high. The study indicates that sub-linear approaches can support progress toward carbon-neutral AI training across massive, resource-constrained network architectures.




