A Hybrid Deep Learning Framework for Multimodal Rice Growth Stage Classification Using Image and IoT Sensor Fusion
Keywords:
Multimodal data fusion, Spatio-temporal analysis, CNN-LSTM, Rice phenology classification, IoT-based vertical farming
Abstract
IoT-based vertical farming is a promising way to increase rice production in limited urban spaces, but reliable, automated monitoring of crop phenology remains an open challenge. Previous studies have generally relied on unimodal data or simulated environments, which limits accuracy and discriminative power, especially between growth phases with similar visual characteristics. This research develops a multimodal deep learning framework for precise classification of rice growth phases in a real IoT-based vertical farming system. The proposed method integrates RGB canopy images with temporal environmental sensor data (temperature, humidity, light intensity, soil moisture, and pH) through end-to-end spatio-temporal feature fusion: a CNN branch, evaluated with MobileNet, ResNet-50, VGG-19, and Xception backbones, is combined with an LSTM branch before the classification stage. Evaluation was conducted on a real-world, expert-annotated dataset spanning eight rice growth phases, with phases expressed in days after planting. Experimental results show that the proposed model significantly outperforms unimodal and CNN-only approaches, achieving a macro-averaged F1 score of 0.96 with the VGG-19–LSTM variant and maintaining performance above 0.89 in the most challenging intermediate growth phases, where visual information alone is insufficient. In addition, the lightweight MobileNet–LSTM variant maintains high accuracy while supporting real-time inference, making it a potentially effective option for edge computing devices in operational vertical farming systems.
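The image-sensor fusion described above can be illustrated with a minimal, dependency-free Python sketch. It rests on assumptions that the abstract does not state: the CNN backbone has already reduced each canopy image to a fixed-length embedding, the sensor stream arrives as a window of five-value readings, and fusion is a simple concatenation of the image embedding with the final LSTM hidden state, followed by a linear softmax layer over the eight phases. All names and sizes (`FusionClassifier`, `LSTMEncoder`, `HIDDEN`, `IMAGE_EMBED`, the window length) are hypothetical. The weights are random and untrained, so the output only demonstrates the data flow, not accuracy. The `macro_f1` helper shows how a macro-averaged F1 score is computed: it is the unweighted mean of per-class F1 scores, so each growth phase counts equally regardless of how many samples it has.

```python
import math
import random

# Illustrative sizes (assumptions; the abstract does not state them).
SENSOR_FEATURES = 5  # temperature, humidity, light intensity, soil moisture, pH
HIDDEN = 8           # hypothetical LSTM hidden size
IMAGE_EMBED = 16     # hypothetical size of the pooled CNN image embedding
NUM_CLASSES = 8      # eight rice growth phases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Compute W @ v + b using plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMEncoder:
    """Single-layer LSTM over a sensor window; returns the last hidden state."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, each applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifco"}
        self.b = {g: [0.0] * n_hidden for g in "ifco"}

    def encode(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x_t in sequence:
            z = list(x_t) + h
            i = [sigmoid(v) for v in affine(self.W["i"], self.b["i"], z)]        # input gate
            f = [sigmoid(v) for v in affine(self.W["f"], self.b["f"], z)]        # forget gate
            cand = [math.tanh(v) for v in affine(self.W["c"], self.b["c"], z)]   # candidate cell
            o = [sigmoid(v) for v in affine(self.W["o"], self.b["o"], z)]        # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FusionClassifier:
    """Hypothetical fusion head: [CNN image embedding ; LSTM sensor state] -> 8 phases."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.sensor_encoder = LSTMEncoder(SENSOR_FEATURES, HIDDEN, rng)
        self.W_out = rand_matrix(NUM_CLASSES, IMAGE_EMBED + HIDDEN, rng)
        self.b_out = [0.0] * NUM_CLASSES

    def predict_proba(self, image_embedding, sensor_window):
        sensor_state = self.sensor_encoder.encode(sensor_window)
        fused = list(image_embedding) + sensor_state  # concatenation fusion (assumed)
        return softmax(affine(self.W_out, self.b_out, fused))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for k in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    model = FusionClassifier(seed=0)
    image_vec = [0.1] * IMAGE_EMBED                        # stand-in for a CNN embedding
    window = [[0.5] * SENSOR_FEATURES for _ in range(24)]  # stand-in for 24 sensor readings
    probs = model.predict_proba(image_vec, window)
    print("predicted phase index:", probs.index(max(probs)))
    print("macro F1:", macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

In a real system the image embedding would come from the pooled output of one of the evaluated backbones, and both branches would be trained jointly so that gradients flow through the fusion layer into the CNN and the LSTM; this sketch omits training entirely.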




