Efficient Model Compression for Resource-Constrained AI Systems Using Complexity-Driven Pruning

Authors

  • Mukesh Kumar, Associate Professor, Department of Computer Science and Engineering, Faculty of Engineering and Technology, Parul Institute of Technology, Parul University, Vadodara, Gujarat, India.
  • Hannah Jessie Rani R, Assistant Professor, Department of Electrical and Electronics Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Bengaluru, Karnataka, India.
  • G. Karthika, Assistant Professor, Department of Electronics and Communication Engineering, Saveetha Engineering College, Thandalam, Chennai – 602105, India.
  • R. Suresh, Assistant Professor, Department of Electronics and Communication Engineering, Sona College of Technology, Salem, Tamil Nadu, India.
  • Jai Kumar B, Assistant Professor, Department of Computer Science and Engineering, Presidency University, Bangalore, Karnataka, India.
  • Arivukkodi R, Assistant Professor, Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, India.
  • Seethaladevi S, Assistant Professor, Department of Mathematics, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, India.

Keywords:

Model Compression, Network Pruning, AI, IoT, Resource-Constrained Systems, Efficient Deep Learning.

Abstract

The rapid advancement of edge AI has enabled intelligent Internet of Things (IoT) applications such as smart healthcare, industrial automation, and agriculture. However, deploying deep learning (DL) models on resource-constrained devices is challenging because of their high computation, memory, and energy demands. To address this, the research uses a dataset of 6784 layer-level records capturing computational cost, memory usage, and operational impact, and proposes a complexity-driven, pruning-based model compression framework for efficient deployment. Unlike traditional approaches that rely on iterative cycles of training, pruning, and retraining, the proposed Kookaburra Optimized Stacked Long Short-Term Memory (KO-StackedLSTM) network performs layer-wise complexity analysis to selectively remove less significant filters, reducing computational overhead without expensive fine-tuning. KO-StackedLSTM uses bio-inspired Kookaburra optimization to remove redundant parameters and a stacked LSTM structure to improve temporal learning while keeping inference efficient and accurate. To further enhance performance, Min–Max normalization is applied to improve data scaling and convergence, while principal component analysis (PCA) reduces input dimensionality, preserving essential features and minimizing processing cost. The research also introduces three adaptive compression modes: FLOPs-aware (FA), parameter-aware (PA), and memory-aware (MA), which enable flexible optimization according to specific resource constraints. It further presents a trade-off analysis between resource use and performance, offering practical insights for real-world deployment. Implemented in Python, the model achieves an accuracy of 96.38% while reducing FLOPs by 83.40%, demonstrating its effectiveness for AI deployment in resource-constrained environments. Overall, the research provides a scalable and efficient solution for real-time inference under limited resources.
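The abstract names the FLOPs-aware, parameter-aware, and memory-aware modes but does not state their selection rule. The following is a minimal, hypothetical sketch of how a resource-aware mode could steer filter pruning; it is not the KO-StackedLSTM method and does not use Kookaburra optimization. It assumes each filter carries FLOPs, parameter, and memory costs plus a saliency score, and it greedily removes the filters with the lowest importance per unit of the targeted resource. `FilterRecord`, `MODE_COST`, and `select_filters_to_prune` are illustrative names, and the numbers are made up.

```python
from dataclasses import dataclass

# Hypothetical per-filter record. Field names are illustrative assumptions,
# not the schema of the layer-level dataset described in the abstract.
@dataclass(frozen=True)
class FilterRecord:
    name: str
    flops: float       # computational cost of the filter
    params: float      # number of parameters it holds
    memory: float      # memory footprint (arbitrary units)
    importance: float  # assumed saliency score, higher = more useful

# Each compression mode targets one resource field.
MODE_COST = {"FA": "flops", "PA": "params", "MA": "memory"}


def select_filters_to_prune(records, mode, target_fraction):
    """Greedily mark filters for removal until the resource chosen by
    `mode` has been reduced by at least `target_fraction` of its total.

    Filters with the lowest importance per unit of cost go first, so the
    same network can yield different pruning sets in FA, PA, and MA modes.
    """
    field = MODE_COST[mode]
    total = sum(getattr(r, field) for r in records)

    def score(r):
        cost = getattr(r, field)
        # Removing a zero-cost filter saves nothing, so rank it last.
        return r.importance / cost if cost > 0 else float("inf")

    pruned, removed = [], 0.0
    for r in sorted(records, key=score):
        if removed >= target_fraction * total:
            break
        pruned.append(r.name)
        removed += getattr(r, field)
    return pruned, removed / total


# Toy example with made-up numbers (not results from the paper).
records = [
    FilterRecord("conv1.f0", flops=100, params=10, memory=40, importance=0.9),
    FilterRecord("conv1.f1", flops=100, params=10, memory=40, importance=0.1),
    FilterRecord("conv2.f0", flops=400, params=50, memory=20, importance=0.2),
    FilterRecord("conv2.f1", flops=400, params=50, memory=20, importance=0.8),
]

for mode in ("FA", "PA", "MA"):
    names, achieved = select_filters_to_prune(records, mode, 0.3)
    print(mode, names, round(achieved, 3))
```

With a 30% reduction target on these toy records, the FA and PA modes both remove `conv2.f0`, while the MA mode removes `conv1.f1`. This shows how the chosen resource constraint, rather than importance alone, determines which filters are pruned.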

Published

2026-05-24

How to Cite

Kumar, M., Rani R, H. J., Karthika, G., Suresh, R., B, J. K., R, A., & S, S. (2026). Efficient Model Compression for Resource-Constrained AI Systems Using Complexity-Driven Pruning. International Journal of Artificial Intelligence and Machine Learning, 6(3s), 798–806. Retrieved from https://www.svedbergopen.com/index.php/ijaiml/article/view/407