Semantic Alignment and Knowledge Injection for Cross-Modal Reasoning in Intelligent Horticultural Decision Support Systems

Yuhan Cao; Yawen Zhu; Hanwen Zhang; Yuxuan Jiang; Ke Chen; Haoran Tang; Zhewei Wang; Yihong Song

doi:10.3390/horticulturae12010023

¹ China Agricultural University, Beijing 100083, China
² National School of Development, Peking University, Beijing 100871, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Horticulturae 2026, 12(1), 23; https://doi.org/10.3390/horticulturae12010023 (registering DOI)

This article belongs to the Special Issue Artificial Intelligence in Horticulture Production

Abstract

This study was conducted to address the demand for interpretable intelligent recognition of fruit tree diseases in smart horticultural environments. A KAD-Former framework integrating an agricultural knowledge graph with a visual Transformer was proposed and systematically validated through extensive cross-regional, multi-variety, and multi-disease experiments. The primary objective of this work was to overcome the limitations of conventional deep models, including insufficient interpretability, unstable recognition of weak disease features, and poor cross-regional generalization. In the experimental evaluation, the model achieved significant advantages across multiple representative tasks: in the overall performance comparison, KAD-Former reached an accuracy of 0.946, an F1-score of 0.933, and a mAP of 0.938, outperforming classical models such as ResNet50, EfficientNet, and Swin-T. In the cross-regional generalization assessment, a DGS of 0.933 was obtained, notably surpassing competing models. In terms of explainability consistency, a Consistency@5 score of 0.826 indicated strong alignment between the model’s attention regions and expert annotations. The ablation experiments further demonstrated that the three core modules, AKG (agricultural knowledge graph), SAM (semantic alignment module), and KGA (knowledge-guided attention), each contributed substantially to final performance, with the complete model exhibiting the best results. These findings collectively demonstrate the comprehensive advantages of KAD-Former in disease classification, symptom localization, model interpretability, and cross-domain transfer. The proposed method not only achieved state-of-the-art performance in pure visual tasks but also advanced knowledge-enhanced and interpretable reasoning by emulating the diagnostic logic employed by agricultural experts in real orchard scenarios. Through the integration of the agricultural knowledge graph, semantic alignment, and knowledge-guided attention, the model maintained stable performance under challenging conditions such as complex illumination, background noise, and weak lesion features, while exhibiting strong robustness in cross-region and cross-variety transfer tests. Furthermore, the experimental results indicated that the approach enhanced fine-grained recognition capabilities for various fruit tree diseases, including apple ring rot, brown spot, powdery mildew, and downy mildew.
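The abstract reports Consistency@5 without defining it. As a rough illustration only, the sketch below assumes one plausible reading: the fraction of the k highest-attention image patches that fall inside expert-annotated lesion regions. The function name `consistency_at_k`, the patch-level representation, and the toy values are all hypothetical and are not taken from the article, whose actual definition may differ.

```python
from typing import Sequence


def consistency_at_k(
    attention: Sequence[float],
    expert_mask: Sequence[bool],
    k: int = 5,
) -> float:
    """Fraction of the k highest-attention patches that lie inside
    expert-marked regions (an assumed definition, not the paper's)."""
    if len(attention) != len(expert_mask):
        raise ValueError("attention and expert_mask must have the same length")
    if not 0 < k <= len(attention):
        raise ValueError("k must be between 1 and the number of patches")
    # Patch indices sorted by attention score, highest first.
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    top_k = ranked[:k]
    # Count how many of the top-k patches the expert also marked.
    hits = sum(1 for i in top_k if expert_mask[i])
    return hits / k


# Toy example with 8 patches; the numbers are made up for illustration.
attention = [0.05, 0.30, 0.02, 0.25, 0.10, 0.08, 0.15, 0.05]
expert_mask = [False, True, False, True, False, False, True, False]
score = consistency_at_k(attention, expert_mask, k=5)
```

Under this reading, a dataset-level score would be the mean of the per-image values, so a figure such as 0.826 would mean that roughly four of the five most-attended patches per image overlap expert annotations on average.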

Keywords: fruit disease recognition; agricultural knowledge graph; Vision Transformer; Explainable AI; smart horticulture; cross-modal reasoning
