This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Hybrid ConvNeXtV2–ViT Architecture with Ontology-Driven Explainability and Out-of-Distribution Awareness for Transparent Chest X-Ray Diagnosis

Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
*
Author to whom correspondence should be addressed.
Diagnostics 2026, 16(2), 294; https://doi.org/10.3390/diagnostics16020294
Submission received: 15 December 2025 / Revised: 9 January 2026 / Accepted: 13 January 2026 / Published: 16 January 2026
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Abstract

Background: Chest X-ray (CXR) is widely used for the assessment of thoracic diseases, yet automated multi-label interpretation remains challenging due to subtle visual patterns, overlapping anatomical structures, and frequent co-occurrence of abnormalities. While recent deep learning models have shown strong performance, limitations in interpretability, anatomical awareness, and robustness continue to hinder their clinical adoption. Methods: The proposed framework employs a hybrid ConvNeXtV2–Vision Transformer (ViT) architecture that combines convolutional feature extraction for capturing fine-grained local patterns with transformer-based global reasoning to model long-range contextual dependencies. The model is trained exclusively using image-level annotations. In addition to classification, three complementary post hoc components are integrated to enhance model trust and interpretability. A segmentation-aware gradient-weighted class activation mapping (Grad-CAM) module leverages CheXmask lung and heart segmentations to highlight anatomically relevant regions and quantify predictive evidence inside and outside the lungs. An ontology-driven neuro-symbolic reasoning layer translates Grad-CAM activations into structured, rule-based explanations aligned with clinical concepts such as “basal effusion” and “enlarged cardiac silhouette”. Furthermore, a lightweight out-of-distribution (OOD) detection module based on confidence scores, energy scores, and Mahalanobis distance scores is employed to identify inputs that deviate from the training distribution. Results: On the VinBigData test set, the model achieved a macro-AUROC of 0.9525 and a micro-AUROC of 0.9777 when trained solely with image-level annotations. External evaluation further demonstrated strong generalisation, yielding macro-AUROC scores of 0.9106 on NIH ChestXray14 and 0.8487 on CheXpert (frontal views). Both Grad-CAM visualisations and ontology-based reasoning remained coherent on unseen data, while the OOD module successfully flagged non-thoracic images. Conclusions: Overall, the proposed approach demonstrates that hybrid convolutional neural network (CNN)–ViT architectures, combined with anatomy-aware explainability and symbolic reasoning, can support automated chest X-ray diagnosis in a manner that is accurate, transparent, and safety-aware.
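
To make the idea of quantifying predictive evidence inside and outside the lungs concrete, the following is a minimal sketch, not the article's implementation. It assumes a Grad-CAM heatmap and a binary lung mask of the same shape are already available, and it measures the share of positive activation that falls inside the mask. The function name `cam_mass_fractions` and the toy arrays are hypothetical; the published module relies on CheXmask segmentations and may define its measure differently.

```python
import numpy as np


def cam_mass_fractions(cam, lung_mask, eps=1e-8):
    """Return (inside, outside) fractions of positive Grad-CAM mass.

    Assumption: 'evidence' is taken to be the sum of non-negative heatmap
    values; this is an illustrative choice, not the article's definition.
    """
    cam = np.clip(np.asarray(cam, dtype=np.float64), 0.0, None)
    mask = np.asarray(lung_mask).astype(bool)
    if cam.shape != mask.shape:
        raise ValueError("cam and lung_mask must have the same shape")
    total = cam.sum()
    if total < eps:
        # No positive activation anywhere: report zero evidence on both sides.
        return 0.0, 0.0
    inside = cam[mask].sum() / total
    return float(inside), float(1.0 - inside)


# Toy example: four activated pixels inside the mask, one outside.
cam = np.zeros((4, 4))
cam[1:3, 1:3] = 1.0
cam[0, 0] = 1.0
lung_mask = np.zeros((4, 4), dtype=bool)
lung_mask[1:3, 1:3] = True

inside, outside = cam_mass_fractions(cam, lung_mask)
```

A high inside fraction would suggest that the model's evidence for a lung finding is anatomically plausible, whereas a low one could flag reliance on regions outside the lungs.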
Keywords: hybrid CNN–ViT; interpretable model; Grad-CAM; neuro-symbolic reasoning; thoracic diseases; out-of-distribution detection
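
The three OOD signals named in the abstract (confidence, energy, and Mahalanobis distance) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the article's code: confidence is taken as the maximum per-class sigmoid probability, energy as the negative log-sum-exp of the logits, and Mahalanobis distance is computed against a single Gaussian fitted to training features. All function names, the ridge term, and the synthetic data are hypothetical.

```python
import numpy as np


def max_confidence(logits):
    """Maximum per-class sigmoid probability (assumed multi-label confidence)."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs.max(axis=-1)


def energy_score(logits, temperature=1.0):
    """Energy = -T * logsumexp(logits / T); lower values suggest in-distribution."""
    z = logits / temperature
    m = z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse


def fit_gaussian(features, ridge=1e-6):
    """Fit a mean and precision matrix to training features (one shared Gaussian)."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis(features, mean, precision):
    """Mahalanobis distance of each feature row from the fitted mean."""
    d = features - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, precision, d))


# Synthetic stand-ins for penultimate-layer features (not real data).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(500, 4))
mean, precision = fit_gaussian(train_features)

d_near = mahalanobis(np.zeros((1, 4)), mean, precision)
d_far = mahalanobis(np.full((1, 4), 8.0), mean, precision)

e_confident = energy_score(np.array([[10.0, 0.0]]))
e_uncertain = energy_score(np.array([[0.0, 0.0]]))
```

In practice, each score would be compared against a threshold chosen on held-out in-distribution data; how the article combines the three scores is not specified in the abstract.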

Share and Cite

MDPI and ACS Style

Almughamisi, N.; Abosamra, G.; Albar, A.; Saleh, M. Hybrid ConvNeXtV2–ViT Architecture with Ontology-Driven Explainability and Out-of-Distribution Awareness for Transparent Chest X-Ray Diagnosis. Diagnostics 2026, 16, 294. https://doi.org/10.3390/diagnostics16020294


