Abstract
This study was conducted to address the demand for interpretable, intelligent recognition of fruit tree diseases in smart horticultural environments. A KAD-Former framework that integrates an agricultural knowledge graph with a vision Transformer was proposed and systematically validated through extensive cross-regional, multi-variety, and multi-disease experiments. The primary objective of this work was to overcome the limitations of conventional deep models, namely insufficient interpretability, unstable recognition of weak disease features, and poor cross-regional generalization. In the experimental evaluation, the model achieved significant advantages across multiple representative tasks: in the overall performance comparison, KAD-Former reached an accuracy of , an F1-score of , and a mAP of , outperforming classical models such as ResNet50, EfficientNet, and Swin-T. In the cross-regional generalization assessment, it obtained a DGS of , markedly surpassing competing models. In terms of explainability consistency, a Consistency@5 score of indicated strong alignment between the model's attention regions and expert annotations. The ablation experiments further demonstrated that the three core modules, namely AKG (agricultural knowledge graph), SAM (semantic alignment module), and KGA (knowledge-guided attention), each contributed substantially to final performance, with the complete model exhibiting the best results. These findings collectively demonstrate the comprehensive advantages of KAD-Former in disease classification, symptom localization, model interpretability, and cross-domain transfer. The proposed method not only achieved state-of-the-art performance on purely visual tasks but also advanced knowledge-enhanced, interpretable reasoning by emulating the diagnostic logic that agricultural experts apply in real orchard scenarios.
Through the integration of the agricultural knowledge graph, semantic alignment, and knowledge-guided attention, the model maintained stable performance under challenging conditions such as complex illumination, background noise, and weak lesion features, while exhibiting strong robustness in cross-region and cross-variety transfer tests. Furthermore, the experimental results indicated that the approach enhanced fine-grained recognition capabilities for various fruit tree diseases, including apple ring rot, brown spot, powdery mildew, and downy mildew.
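The abstract does not specify how the knowledge-guided attention (KGA) module injects graph knowledge into the Transformer. As a minimal illustrative sketch, one common formulation adds a knowledge-derived bias to the attention logits before the softmax, so that patches semantically related to a disease concept receive extra weight. All names here (`knowledge_guided_attention`, `know_bias`, `alpha`) are hypothetical and not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_guided_attention(q, k, v, know_bias, alpha=1.0):
    """Scaled dot-product attention with an additive knowledge bias.

    q: (n_q, d) query vectors (e.g. disease-concept tokens)
    k, v: (n_k, d) key/value vectors (e.g. image patch embeddings)
    know_bias: (n_q, n_k) scores derived from knowledge-graph
        embeddings (e.g. concept-to-patch semantic similarity);
        alpha scales the influence of the prior on the logits.
    Returns the attended values and the attention weights.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + alpha * know_bias
    weights = softmax(logits, axis=-1)
    return weights @ v, weights
```

With `alpha = 0` (or a zero bias) this reduces to ordinary scaled dot-product attention; a large bias on a particular key shifts attention mass toward the corresponding patch, which is the intuition behind steering the model's focus toward lesion regions named in the knowledge graph.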