Abstract
Early and accurate detection of date palm diseases is key to a sustainable smart-farming ecosystem. In this paper, we introduce DoST-DPD, a new dual-stream Transformer architecture for multimodal disease diagnosis using RGB, thermal, and NIR imaging. In contrast to standard deep learning approaches, our model receives ontology-based semantic supervision (via per-dataset OWL ontologies), enabling knowledge injection through SPARQL-driven reasoning during training. This structured knowledge layer not only improves multimodal feature correspondence but also enforces label consistency, improving generalization, particularly for early disease diagnosis. We evaluated the proposed method on a comprehensive set of five benchmarks (PlantVillage, PlantDoc, Figshare, Mendeley, and Kaggle Date Palm), each paired with a domain-specific ontology. An ablation study validates the effectiveness of ontology supervision, which consistently improves Accuracy, Precision, Recall, F1-Score, and AUC. DoST-DPD achieves state-of-the-art performance against five widely recognized baselines, including PlantXViT, Multi-ViT, ERCP-Net, and ResNet, reaching the highest Accuracy of 99.3% and AUC of 98.2% on the PlantVillage dataset. In addition, ontology-driven attention maps and semantic consistency contribute to high interpretability and robustness across crops and imaging modalities. This work presents a scalable roadmap for ontology-integrated AI systems in agriculture and shows how structured semantic reasoning can directly benefit multimodal plant disease detection, highlighting the unique advantage of ontology-guided supervision in multimodal crop disease detection.
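As a rough illustration of the idea summarized above, the sketch below pairs a two-stream token encoder (RGB plus stacked thermal/NIR channels) with a simple ontology-consistency penalty. The module names, the two-channel thermal+NIR stacking, fusion by token concatenation, and the `related` class-relation matrix (which would be built offline from the OWL ontologies, e.g. via SPARQL queries) are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch, not the DoST-DPD implementation: dual-stream tokenization,
# transformer fusion, and an assumed ontology-consistency penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamClassifier(nn.Module):
    def __init__(self, num_classes: int, embed_dim: int = 128):
        super().__init__()
        # Stream 1: RGB image; Stream 2: thermal + NIR stacked as 2 channels (assumption).
        self.rgb_encoder = nn.Sequential(nn.Conv2d(3, embed_dim, 16, 16), nn.Flatten(2))
        self.tn_encoder = nn.Sequential(nn.Conv2d(2, embed_dim, 16, 16), nn.Flatten(2))
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, rgb, thermal_nir):
        # Tokenize each modality, concatenate the token sequences, then fuse jointly.
        tokens = torch.cat([self.rgb_encoder(rgb).transpose(1, 2),
                            self.tn_encoder(thermal_nir).transpose(1, 2)], dim=1)
        fused = self.transformer(tokens).mean(dim=1)  # mean-pool fused tokens
        return self.head(fused)

def ontology_consistency_loss(logits, related):
    # `related[i, j] = 1` if classes i and j are semantically linked in the ontology
    # (hypothetical encoding); penalize probability mass on classes unrelated to the
    # top prediction, nudging outputs toward ontology-consistent label sets.
    probs = F.softmax(logits, dim=-1)
    top = probs.argmax(dim=-1)
    unrelated_mass = (probs * (1.0 - related[top])).sum(dim=-1)
    return unrelated_mass.mean()
```

In a training loop, such a penalty would typically be added to the standard cross-entropy term with a small weight, e.g. `loss = ce + 0.1 * ontology_consistency_loss(logits, related)`; the weight and the construction of `related` are likewise assumptions here.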