Horticulturae
  • Article
  • Open Access

21 December 2025

A Multimodal Deep Learning Framework for Intelligent Pest and Disease Monitoring in Smart Horticultural Production Systems

1 China Agricultural University, Beijing 100083, China
2 National School of Development, Peking University, Beijing 100871, China
3 Materials Science and Technology, Beijing Forestry University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Horticulturae 2026, 12(1), 8; https://doi.org/10.3390/horticulturae12010008
This article belongs to the Special Issue Artificial Intelligence in Horticulture Production

Abstract

This study addressed the core challenge of intelligent pest and disease monitoring and early warning in smart horticultural production by proposing a multimodal deep learning framework based on multi-parameter environmental sensor arrays. The framework integrates visual information with electrical signals to overcome the inherent limitations of conventional single-modality approaches in real-time capability, stability, and early detection performance. A long-term field experiment was conducted over 18 months in the Hetao Irrigation District of Bayannur, Inner Mongolia, using three representative horticultural crops—grape (Vitis vinifera), tomato (Solanum lycopersicum), and sweet pepper (Capsicum annuum)—to construct a multimodal dataset comprising illumination intensity, temperature, humidity, gas concentration, and high-resolution imagery, with a total of more than 2.6 × 10⁶ recorded samples. The proposed framework consists of a lightweight convolution–Transformer hybrid encoder for electrical signal representation, a cross-modal feature alignment module, and an early-warning decision module, enabling dynamic spatiotemporal modeling and complementary feature fusion under complex field conditions. Experimental results demonstrated that the proposed model significantly outperformed both unimodal and traditional fusion methods, achieving an accuracy of 0.921, a precision of 0.935, a recall of 0.912, an F1-score of 0.923, and an area under the curve (AUC) of 0.957, confirming its superior recognition stability and early-warning capability. Ablation experiments further revealed that the electrical feature encoder, cross-modal alignment module, and early-warning module each played a critical role in enhancing performance.
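The evaluation metrics above are standard confusion-matrix quantities. As a quick illustration, the minimal Python sketch below shows how accuracy, precision, recall, and F1-score are derived from true/false positive and negative counts. The counts used here are hypothetical, chosen only to approximately match the values reported in the abstract; they are not the study's actual confusion matrix, and AUC is omitted because it additionally requires the model's ranking scores.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from
    confusion-matrix counts (true/false positives and negatives)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # fraction of alarms that were real
    recall = tp / (tp + fn)             # fraction of real cases detected
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, chosen to roughly reproduce the reported values.
acc, prec, rec, f1 = classification_metrics(tp=912, fp=63, fn=88, tn=848)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

For an early-warning system, the precision/recall split matters more than accuracy alone: recall bounds how many infestations are missed, while precision bounds how many false alarms growers must triage.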
This research provides a low-cost, scalable, and energy-efficient solution for precise pest and disease management in intelligent horticulture, supporting efficient monitoring and predictive decision-making in greenhouses, orchards, and facility-based production systems. It offers a novel technological pathway and theoretical foundation for artificial-intelligence-driven sustainable horticultural production.
