Semantic Alignment and Knowledge Injection for Cross-Modal Reasoning in Intelligent Horticultural Decision Support Systems
Abstract
1. Introduction
1. A fruit-disease-oriented agricultural knowledge graph (AKG) is constructed to structurally represent symptom–disease–stage semantic relationships, providing a foundation for explainability rather than simple classification.
2. A Semantic Alignment Module (SAM) is proposed to bridge the gap between low-level visual tokens and high-level symptom nodes, ensuring that visual features are grounded in structured agricultural semantics.
3. A Knowledge-Guided Attention (KGA) module is designed to replace data-driven attention with knowledge-anchored weights, directly addressing the black-box nature of standard Transformers by forcing the model to attend to regions consistent with expert diagnostic logic.
4. Using the multi-source, multi-region dataset constructed here, we demonstrate that KAD-Former outperforms conventional deep models in accuracy and cross-regional generalization, providing a transparent and trustworthy solution for real-world horticultural environments.
2. Materials and Method
2.1. Data Collection
2.2. Dataset Enhancement
2.2.1. Basic Image Enhancement
2.2.2. Cross-Domain Simulation Enhancement
2.2.3. Pseudo-Lesion Simulation Enhancement
2.3. Proposed Method
2.3.1. Overall
2.3.2. Agricultural Knowledge Graph (AKG)
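To make the symptom–disease–stage structure concrete, the following is a minimal sketch that stores a knowledge graph as subject–relation–object triples and queries it. The relation names (`indicates`, `occurs_at`), the example entities, and the helpers `build_index` and `diseases_for_symptoms` are hypothetical and chosen only for illustration; they are not taken from the AKG built in this work.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object); illustrative only,
# not the actual AKG content.
TRIPLES = [
    ("brown_sunken_lesion", "indicates", "apple_ring_rot"),
    ("concentric_rings", "indicates", "apple_ring_rot"),
    ("olive_velvety_spot", "indicates", "apple_scab"),
    ("apple_ring_rot", "occurs_at", "ripening_stage"),
    ("apple_scab", "occurs_at", "young_fruit_stage"),
]


def build_index(triples):
    """Index triples as index[relation][subject] -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        index[rel][subj].add(obj)
    return index


def diseases_for_symptoms(index, symptoms):
    """Rank diseases by how many observed symptoms point to them."""
    votes = defaultdict(int)
    for symptom in symptoms:
        for disease in index["indicates"].get(symptom, ()):
            votes[disease] += 1
    # Most-supported disease first; ties broken alphabetically.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(TRIPLES)
ranked = diseases_for_symptoms(index, ["brown_sunken_lesion", "concentric_rings"])
stages = index["occurs_at"]["apple_ring_rot"]
```

Because every prediction can be traced back to the triples that voted for it, a structure of this kind lends itself to explanation rather than bare classification.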
2.3.3. Semantic Alignment Module (SAM)
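One generic way to ground visual tokens in symptom nodes is to score each token against each node embedding and normalize the scores. The sketch below assumes cosine similarity followed by a temperature-scaled softmax; the function names, the temperature value, and the toy 3-dimensional embeddings are assumptions for illustration and do not reproduce the SAM formulation itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def align_tokens(tokens, symptom_nodes, temperature=0.1):
    """Return, for each visual token, a distribution over symptom nodes."""
    names = list(symptom_nodes)
    aligned = []
    for token in tokens:
        sims = [cosine(token, symptom_nodes[name]) for name in names]
        aligned.append(dict(zip(names, softmax(sims, temperature))))
    return aligned


# Toy embeddings; in practice both sides would be learned representations.
symptom_nodes = {"ring_lesion": [1.0, 0.0, 0.0], "velvety_spot": [0.0, 1.0, 0.0]}
tokens = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
alignment = align_tokens(tokens, symptom_nodes)
best = [max(dist, key=dist.get) for dist in alignment]
```

A lower temperature sharpens each token's assignment toward a single symptom node; a higher one spreads the mass across nodes.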
2.3.4. Knowledge-Guided Attention Module (KGA)
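The idea of anchoring attention weights in knowledge can be illustrated with a simplified stand-in: scaled dot-product attention whose logits are shifted by the log of a per-token knowledge prior. Note that this sketch biases the data-driven logits rather than replacing them, and that the prior values, the `strength` parameter, and the function name are assumptions, not the KGA definition.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_biased_attention(query, keys, values, prior, strength=1.0):
    """Single-query attention with logits shifted by a knowledge prior.

    prior[i] in (0, 1] is an assumed relevance of token i to symptom nodes;
    strength=0 recovers plain scaled dot-product attention.
    """
    scale = math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) / scale + strength * math.log(p)
        for key, p in zip(keys, prior)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prior = [0.9, 0.1, 0.5]  # hypothetical knowledge relevance per token

plain, _ = knowledge_biased_attention(query, keys, values, prior, strength=0.0)
guided, _ = knowledge_biased_attention(query, keys, values, prior, strength=1.0)
```

In this toy case the first two tokens have identical keys, so plain attention cannot separate them; with the prior applied, the token judged more symptom-relevant receives the larger weight.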
3. Results and Discussion
3.1. Experimental Configuration
3.1.1. Hardware and Software Platform
3.1.2. Baseline Models and Evaluation Metrics
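Accuracy and F1-score appear among the reported metrics. As a reference point, the sketch below computes accuracy and a macro-averaged F1 from label lists. Macro averaging is an assumption here, since the averaging scheme is not stated in this text, and the class names are hypothetical. Consistency@5 and DGS are not sketched because their definitions are not given here.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four images.
y_true = ["scab", "scab", "ring_rot", "healthy"]
y_pred = ["scab", "ring_rot", "ring_rot", "healthy"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```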
3.2. Comparison of KAD-Former with Baseline Models
3.3. Cross-Regional Generalization Performance
3.4. Ablation Study
3.5. Discussion
3.6. Limitation and Future Work
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest







| Data Source | Number of Disease Types | Number of Images | Time Range |
|---|---|---|---|
| Real Orchard Collection | 12 | 18,420 | 2023.03–2024.10 |
| Agricultural Platform Data | 10 | 6,850 | 2023.01–2024.08 |
| Public Online Datasets | 14 | 12,300 | – |
| Expert Knowledge Texts | – | 2,150 | 2023.05–2024.10 |
| Total | 36 | 37,570 | – |

| Model | Accuracy | F1-Score | mAP | Consistency@5 | DGS |
|---|---|---|---|---|---|
| ResNet50 | | | | | |
| EfficientNet-B3 | | | | | |
| ConvNeXt-T | | | | | |
| ViT-B (w/o KG) | | | | | |
| DeiT-B | | | | | |
| Swin-T | | | | | |
| Pure KG | | | | | |
| K-ViT | | | | | |
| KAD-Former | | | | | |

| Model | Source (Inner Mongolia) | Target-1 (Hebei) | Target-2 (Yunnan) | Target-3 (Internet) | DGS |
|---|---|---|---|---|---|
| ResNet50 | | | | | |
| EfficientNet-B3 | | | | | |
| ViT-B | | | | | |
| Swin-T | | | | | |
| KAD-Former | | | | | |

| Model Variant | Accuracy | F1-Score | mAP | Consistency@5 | DGS |
|---|---|---|---|---|---|
| ViT-B baseline | | | | | |
| ViT-B + AKG | | | | | |
| ViT-B + AKG + SAM | | | | | |
| KAD-Former | | | | | |
| KAD-Former w/o AKG | | | | | |
| KAD-Former w/o SAM | | | | | |
| KAD-Former w/o KGA | | | | | |
| KAD-Former w/o PLS | | | | | |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cao, Y.; Zhu, Y.; Zhang, H.; Jiang, Y.; Chen, K.; Tang, H.; Wang, Z.; Song, Y. Semantic Alignment and Knowledge Injection for Cross-Modal Reasoning in Intelligent Horticultural Decision Support Systems. Horticulturae 2026, 12, 23. https://doi.org/10.3390/horticulturae12010023
