A Hybrid Deep Learning Framework for Multimodal Rice Growth Stage Classification Using Image and IoT Sensor Fusion

Authors

  • Riki Ruli A. Siregar, School of Data Science, Mathematics, and Informatics, IPB University, Bogor, Indonesia; Faculty of Energy Telematics, Institut Teknologi PLN, Jakarta, Indonesia.
  • Kudang Boro Seminar, Department of Engineering and Biosystems, IPB University, Bogor, Indonesia.
  • Sri Wahjuni, School of Data Science, Mathematics, and Informatics, IPB University, Bogor, Indonesia.
  • Edi Santosa, Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, Indonesia.

Keywords:

Multimodal data fusion, Spatio-temporal analysis, CNN-LSTM, Rice phenology classification, IoT-based vertical farming

Abstract

IoT-based vertical farming is a promising way to increase rice production in limited urban spaces, but reliable, automated monitoring of crop phenology remains an open challenge. Previous studies have generally relied on unimodal data or simulated environments, which limits accuracy and discriminative power, especially for growth phases with similar visual characteristics. This research aims to develop a multimodal deep learning framework for precise classification of rice growth phases in actual IoT-based vertical farming systems. The proposed method integrates RGB canopy images with temporal environmental sensor data (temperature, humidity, light intensity, soil moisture, and pH) through end-to-end spatio-temporal feature fusion: one of several CNN architectures (MobileNet, ResNet-50, VGG-19, or Xception) is combined with an LSTM branch before the classification stage. Evaluation was conducted on a real-world, expert-annotated dataset spanning eight rice growth phases, measured in days after planting. Experimental results show that the proposed model significantly outperforms unimodal and CNN-only approaches, achieving a macro-averaged F1 score of 0.96 with the VGG-19–LSTM variant and maintaining performance above 0.89 in the most challenging intermediate growth phases, where visual information alone is insufficient. In addition, the lightweight MobileNet–LSTM variant maintains high accuracy while supporting real-time inference, making it potentially suitable for deployment on edge computing devices in operational vertical farming systems.
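
The headline metric in the abstract, the macro-averaged F1 score, is the unweighted mean of the per-class F1 scores, so each growth phase counts equally regardless of how many samples it has. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `macro_f1`, the toy labels, and the convention of scoring an empty class as 0 are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but was wrong
            fn[t] += 1      # class t was the truth but was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumed convention: a class with no true or predicted samples scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three hypothetical phases (the paper uses eight).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
print(round(score, 4))
```

In this toy case the per-class F1 scores are 1, 2/3, and 4/5, so a single confusion between two adjacent phases pulls the macro average down noticeably. This is why the metric is informative for visually similar intermediate phases.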

Published

2026-05-24

How to Cite

Siregar, R. R. A., Seminar, K. B., Wahjuni, S., & Santosa, E. (2026). A Hybrid Deep Learning Framework for Multimodal Rice Growth Stage Classification Using Image and IoT Sensor Fusion. International Journal of Artificial Intelligence and Machine Learning, 6(3s), 1–21. Retrieved from https://www.svedbergopen.com/index.php/ijaiml/article/view/281