Sustainability-Driven Neural Network Compression For Efficient Large-Scale Model Serving

Dr. Ponmurugan Panneerselvam; N. Nivetha; Dr. Utkarsh Anand; Sardorbek Isroilov

Authors

Dr. Ponmurugan Panneerselvam Professor & Dean-Doctoral Studies & IPR, Department of Research, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
N. Nivetha Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
Dr. Utkarsh Anand Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.
Sardorbek Isroilov Vice-Rector for Strategic Development and International Cooperation, Faculty of Business Administration, Turan International University, Namangan, Uzbekistan.

Keywords:

Neural Network Compression, Sustainable AI, Knowledge Distillation, Model Pruning, Quantization, Green Computing, Large-Scale Model Serving, Carbon Footprint.

Abstract

Large-scale deep learning models are proliferating rapidly, placing heavy computational and environmental loads on modern inference infrastructure. Training and serving billion-parameter models consumes massive amounts of energy, which can account for a significant share of an organization's total carbon emissions and makes its sustainability goals harder to meet. In this paper, we propose SuComp, a sustainable neural network compression framework that minimizes the energy of large-scale model serving with negligible loss of task accuracy. SuComp combines three compression methods (structured pruning, post-training quantization, and knowledge distillation) in one unified framework governed by a Sustainability-Aware Compression Scheduler (SACS), which trades off accuracy constraints against energy and carbon costs. Experiments on benchmark models (ResNet-50, BERT-base, and GPT-2) show that SuComp achieves an average compression ratio of 9.7x, a 61.6% reduction in inference energy usage, and a 61.8% decrease in normalized CO₂ emissions, while retaining on average 99.4% of baseline model accuracy. The proposed framework offers a systematic and pragmatic approach to responsible AI deployment that is aligned with environmental concerns.
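The abstract describes SACS only at a high level. As a rough illustration of the kind of accuracy-constrained, energy-minimizing trade-off it names, the Python sketch below assumes that each candidate compressed variant (some combination of pruning and quantization) has already been profiled for accuracy and inference energy. It keeps only candidates that retain at least 99% of baseline accuracy and returns the lowest-energy one. The `Candidate` type, the functions `select_config`, `compression_ratio` and `carbon_kg`, the 0.99 retention threshold, and every number used are hypothetical; this is not the authors' SACS algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A compressed model variant; all fields are hypothetical profiling inputs."""
    name: str
    kept_fraction: float  # fraction of weights kept after structured pruning, in (0, 1]
    bits: int             # weight bit-width after post-training quantization
    accuracy: float       # task accuracy measured on a validation set
    energy_kwh: float     # inference energy measured for a fixed request workload


def compression_ratio(c: Candidate, base_bits: int = 32) -> float:
    # Dense fp32 weight storage divided by compressed storage.
    # Ignores overheads such as quantization scales or sparsity indices.
    return base_bits / (c.kept_fraction * c.bits)


def carbon_kg(energy_kwh: float, grid_intensity: float) -> float:
    # Operational emissions = energy (kWh) x grid carbon intensity (kg CO2e/kWh).
    return energy_kwh * grid_intensity


def select_config(
    candidates: Sequence[Candidate],
    baseline_accuracy: float,
    min_retention: float = 0.99,
) -> Optional[Candidate]:
    # Keep only variants that retain at least min_retention of baseline accuracy,
    # then return the one with the lowest measured energy (and therefore the
    # lowest carbon when all variants run on the same grid).
    # Returns None if no variant satisfies the accuracy constraint.
    floor = min_retention * baseline_accuracy
    feasible = [c for c in candidates if c.accuracy >= floor]
    return min(feasible, key=lambda c: c.energy_kwh, default=None)


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT results reported in the paper.
    base_acc, base_energy = 0.760, 10.0
    pool = [
        Candidate("prune50-int8", 0.5, 8, 0.757, 4.2),
        Candidate("prune70-int4", 0.3, 4, 0.741, 2.5),
        Candidate("int8-only", 1.0, 8, 0.759, 6.1),
    ]
    best = select_config(pool, base_acc)
    if best is not None:
        saved = 1 - best.energy_kwh / base_energy
        print(f"{best.name}: {compression_ratio(best):.1f}x, energy -{saved:.0%}")
```

Here the aggressive `prune70-int4` variant uses the least energy but falls below the 99% accuracy floor, so the sketch picks `prune50-int8` instead. A hard accuracy floor is only one possible design; a weighted objective combining energy and accuracy loss would be an equally plausible reading of "trade-off".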
