Abstract
This study addressed the core challenge of intelligent pest and disease monitoring and early warning in smart horticultural production by proposing a multimodal deep learning framework based on multi-parameter environmental sensor arrays. The framework integrates visual information with electrical signals to overcome the limitations of conventional single-modality approaches in real-time capability, stability, and early detection performance. A long-term field experiment was conducted over 18 months in the Hetao Irrigation District of Bayannur, Inner Mongolia, on three representative horticultural crops: grape (Vitis vinifera), tomato (Solanum lycopersicum), and sweet pepper (Capsicum annuum). The experiment produced a multimodal dataset comprising illumination intensity, temperature, humidity, gas concentration, and high-resolution imagery, with a total of more than recorded samples. The proposed framework consists of a lightweight convolution–Transformer hybrid encoder for electrical signal representation, a cross-modal feature alignment module, and an early-warning decision module, which together enable dynamic spatiotemporal modeling and complementary feature fusion under complex field conditions. Experimental results demonstrated that the proposed model significantly outperformed both unimodal and traditional fusion methods, achieving an accuracy of , a precision of , a recall of , an F1-score of , and an area under the curve (AUC) of , confirming its superior recognition stability and early-warning capability. Ablation experiments further showed that the electrical feature encoder, the cross-modal alignment module, and the early-warning module each contributed critically to performance. This research provides a low-cost, scalable, and energy-efficient solution for precise pest and disease management in intelligent horticulture, supporting efficient monitoring and predictive decision-making in greenhouses, orchards, and facility-based production systems. It also offers a novel technological pathway and theoretical foundation for artificial-intelligence-driven sustainable horticultural production.