Sustainability-Driven Neural Network Compression For Efficient Large-Scale Model Serving

Authors

  • Dr. Ponmurugan Panneerselvam, Professor & Dean-Doctoral Studies & IPR, Department of Research, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • N. Nivetha, Assistant Professor, Department of Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • Dr. Utkarsh Anand, Assistant Professor, Kalinga University, Naya Raipur, Chhattisgarh, India.
  • Sardorbek Isroilov, Vice-Rector for Strategic Development and International Cooperation, Faculty of Business Administration, Business Administration, Turan International University, Namangan, Uzbekistan.

Keywords:

Neural Network Compression, Sustainable AI, Knowledge Distillation, Model Pruning, Quantization, Green Computing, Large-Scale Model Serving, Carbon Footprint.

Abstract

Large-scale deep learning models are proliferating rapidly, placing heavy computational and environmental loads on modern inference infrastructure. Training and serving billion-parameter models consumes large amounts of energy, which can account for a significant portion of total carbon emissions and makes it difficult for organizations to meet their sustainability goals. In this paper, we propose SuComp, a sustainability-driven neural network compression framework that reduces the energy cost of large-scale model serving while keeping task accuracy close to the baseline. SuComp combines three compression methods (structured pruning, post-training quantization, and knowledge distillation) in one unified framework managed by a Sustainability-Aware Compression Scheduler (SACS), which trades off accuracy constraints against energy and carbon costs. Experiments on benchmark models (ResNet-50, BERT-base, and GPT-2) show that SuComp achieves an average compression ratio of 9.7x, a 61.6% reduction in inference energy usage, and a 61.8% decrease in normalized CO₂ emissions, while retaining on average 99.4% of baseline model accuracy. The proposed framework offers a systematic and pragmatic approach to responsible AI deployment that is aligned with environmental concerns.
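
The trade-off that a scheduler such as SACS manages can be illustrated with a minimal sketch. The abstract does not specify how SACS works, so the code below is not the authors' method: it simply assumes a set of already-profiled compression configurations and picks the lowest-energy one that still retains a required fraction of baseline accuracy. The names `Candidate`, `pick_config`, and `co2_grams`, the configuration labels, and every number are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical compression configuration and its measured profile."""
    name: str
    accuracy: float   # task accuracy of the compressed model, in [0, 1]
    energy_j: float   # inference energy per request, in joules


def co2_grams(energy_j: float, grid_g_per_kwh: float) -> float:
    """Convert energy to emissions: joules -> kWh, scaled by grid carbon intensity."""
    return energy_j / 3.6e6 * grid_g_per_kwh


def pick_config(
    candidates: Iterable[Candidate],
    baseline_accuracy: float,
    min_retention: float,
) -> Optional[Candidate]:
    """Return the lowest-energy candidate that keeps enough baseline accuracy.

    Returns None when no candidate satisfies the accuracy constraint.
    """
    floor = baseline_accuracy * min_retention
    feasible = [c for c in candidates if c.accuracy >= floor]
    return min(feasible, key=lambda c: c.energy_j, default=None)


# Placeholder numbers for illustration only; they are NOT results from the paper.
candidates = [
    Candidate("prune_only", accuracy=0.760, energy_j=8.0),
    Candidate("prune_quantize", accuracy=0.757, energy_j=5.0),
    Candidate("prune_quantize_distill", accuracy=0.740, energy_j=3.0),
]

best = pick_config(candidates, baseline_accuracy=0.761, min_retention=0.99)
if best is not None:
    # Grid intensity of 400 gCO2/kWh is an assumed value for the example.
    print(best.name, co2_grams(best.energy_j, grid_g_per_kwh=400.0))
```

In this toy setup the most aggressive configuration uses the least energy but falls below the accuracy floor, so the selection settles on the middle option. A real scheduler would presumably also search over per-layer pruning ratios and bit widths rather than a fixed list.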

Published

2026-06-01

How to Cite

Panneerselvam, P., Nivetha, N., Anand, U., & Isroilov, S. (2026). Sustainability-Driven Neural Network Compression For Efficient Large-Scale Model Serving. International Journal of Artificial Intelligence and Machine Learning, 6(4s), 392–399. Retrieved from https://www.svedbergopen.com/index.php/ijaiml/article/view/466