Search Results (74)

Search Parameters:
Keywords = ConvNeXtV2

22 pages, 16470 KB  
Article
A Multi-Temporal Instance Segmentation Framework and Exhaustively Annotated Tree Crown Dataset for a Subtropical Urban Forest Case
by Weihong Lin, Hao Jiang, Mengjun Ku, Jing Zhang and Baomin Wang
Remote Sens. 2026, 18(7), 1082; https://doi.org/10.3390/rs18071082 - 3 Apr 2026
Viewed by 197
Abstract
Accurate individual tree crown identification is essential for urban forestry, yet existing datasets often lack exhaustive annotations and multi-temporal diversity. To address this limitation, an exhaustively annotated dataset was curated for crown instance segmentation, comprising 47,754 labeled individual crowns from approximately 110 species across three temporal phases. Annotation was anchored in a “crown geometry” labeling criterion focusing on upper-canopy individuals visible in the imagery, and the high-resolution imagery captured seasonal variations in shape, color, and texture, providing an empirical basis for within-site robustness. Utilizing this dataset, this study (1) compared five instance segmentation models; (2) evaluated their generalization capabilities across different temporal phases; and (3) tested a multi-temporal joint training strategy and a non-maximum suppression (NMS)-based fusion. The experiments revealed significant overfitting in single-temporal models. While ConvNeXt-V2 achieved a high segmentation mean Average Precision (Segm_mAP) of 0.852 within the same temporal phase, its performance dropped sharply to 0.361 across phases. Bi-temporal joint training significantly mitigated this issue, improving cross-temporal performance to 0.665 and further increasing within-phase accuracy to 0.874. In contrast, tri-temporal training reduced accuracy (0.748), demonstrating that effective generalizability depends on the strategic selection of complementary temporal phases rather than the mere accumulation of data. The multi-temporal training framework provided in this study could serve as a practical reference and a foundational benchmark for further urban forest structural monitoring research. Full article
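
The NMS-based fusion step can be pictured with a short sketch. This is a minimal illustration using torchvision's standard NMS, assuming (x1, y1, x2, y2) boxes and per-box confidence scores pooled across temporal phases; the function name and IoU threshold are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch: fuse crown detections from several temporal phases
# with standard NMS, one plausible reading of the NMS fusion step above.
import torch
from torchvision.ops import nms

def fuse_temporal_detections(per_phase_boxes, per_phase_scores, iou_thresh=0.5):
    """per_phase_boxes: list of (N_i, 4) tensors in (x1, y1, x2, y2) format;
    per_phase_scores: list of (N_i,) confidence tensors, one pair per phase."""
    boxes = torch.cat(per_phase_boxes, dim=0)
    scores = torch.cat(per_phase_scores, dim=0)
    keep = nms(boxes, scores, iou_thresh)  # indices of surviving detections
    return boxes[keep], scores[keep]

# Example: the same crown seen in two phases is kept only once.
b1 = torch.tensor([[10., 10., 50., 50.]]); s1 = torch.tensor([0.9])
b2 = torch.tensor([[12., 11., 52., 49.]]); s2 = torch.tensor([0.8])
fused_boxes, fused_scores = fuse_temporal_detections([b1, b2], [s1, s2])
```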

33 pages, 4952 KB  
Article
Modified RefineNet with Attention-Based Fusion for Multi-Class Classification of Corn and Pepper Plant Diseases
by Maramreddy Srinivasulu and Sandipan Maiti
AgriEngineering 2026, 8(4), 122; https://doi.org/10.3390/agriengineering8040122 - 30 Mar 2026
Viewed by 208
Abstract
Early and precise detection of plant diseases is essential for safeguarding crop yield and ensuring sustainable agricultural practices. In this study, we propose the Modified RefineNet with Attention-Based Fusion (MoRefNet-AF), a Modified RefineNet architecture enhanced with attention-based fusion for multi-class classification of corn (maize) and pepper leaf diseases. Unlike the original RefineNet, which was segmentation-oriented and computationally heavy, MoRefNet-AF is redesigned for lightweight and discriminative classification. The modifications include replacing standard convolutions with depthwise separable convolutions for efficiency, adopting the Mish activation function for smoother gradient flow, redesigning the multi-resolution fusion module with concatenation and shared convolution for richer cross-scale integration, and incorporating Squeeze-and-Excitation (SE) blocks for adaptive channel recalibration. Additionally, Chained Residual Pooling (CRP) with atrous convolutions enhances contextual representation, while global average pooling with dense layers improves classification readiness. When evaluated on a curated six-class dataset combining PlantVillage and Mendeley leaf disease repositories, MoRefNet-AF achieved 99.88% accuracy, 99.74% precision, 99.73% recall, 99.95% F1-score, and 99.73% specificity. These results outperform strong baselines including ResNet152V2, DenseNet201, EfficientNet-B0, and ConvNeXt-Tiny, while maintaining only 0.3 M parameters. With its compact design and TensorFlow Lite (v2.13) compatibility, MoRefNet-AF offers a robust, lightweight, and real-time deployable solution for precision agriculture and smart plant disease monitoring. Full article
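
Two of the building blocks named above are standard and easy to sketch. The following PyTorch fragment is written for illustration only (the paper itself targets TensorFlow Lite): a depthwise separable convolution with Mish activation and an SE block; all channel counts and the reduction ratio are assumptions.

```python
# Illustrative PyTorch sketch of two MoRefNet-AF ingredients: a depthwise
# separable convolution with Mish, and an SE block for channel recalibration.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)
        self.act = nn.Mish()  # smoother gradient flow than ReLU

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class SEBlock(nn.Module):
    def __init__(self, c, r=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # squeeze
            nn.Linear(c, c // r), nn.ReLU(),
            nn.Linear(c // r, c), nn.Sigmoid(),      # excitation
        )

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)     # per-channel weights
        return x * w                                 # recalibrate channels

x = torch.randn(1, 32, 64, 64)
y = SEBlock(64)(DepthwiseSeparableConv(32, 64)(x))   # -> (1, 64, 64, 64)
```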

14 pages, 3023 KB  
Article
Lightweight Stereo Vision for Obstacle Detection and Range Estimation in Micro-Mobility Vehicles
by Jiansheng Ruan, Hui Weng, Zhaojun Yuan, Guangyuan Jin and Liang Zhou
Sensors 2026, 26(6), 1988; https://doi.org/10.3390/s26061988 - 23 Mar 2026
Viewed by 260
Abstract
Micro-mobility vehicles operating in closed, low-speed environments (e.g., parks) require reliable obstacle detection and accurate range estimation under strict constraints on cost, power, and onboard computation. This paper proposes HAGVNet, a lightweight stereo matching network for embedded ranging and validates its practical deployability in a target-level ranging pipeline with YOLO11n as the front-end detector. HAGVNet builds a hierarchical attention-guided cost volume (HAGV) that uses coarse-scale geometric priors to modulate fine-scale cost modeling and adopts ConvNeXtV2-style 2D cost aggregation blocks to improve stability and boundary consistency with controlled complexity. For ranging, depth statistics within detected regions are used to estimate target distance and 3D position. The model is pre-trained on SceneFlow and evaluated on KITTI. On SceneFlow, HAGVNet reaches 0.73 px EPE with 20.08 G FLOPs, indicating a favorable accuracy–complexity trade-off under low computation budgets. On an embedded Jetson Orin Nano Super platform, HAGVNet achieves 46.3 FPS under TensorRT FP16, and field tests indicate relative ranging errors of 0.5–8.6% within 2–10 m, demonstrating its practical feasibility for low-speed target-level ranging. Full article
(This article belongs to the Section Sensing and Imaging)
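
The target-level ranging step described above reduces to the stereo relation Z = f·B/d inside each detected box. A minimal sketch, assuming a pixel focal length, a baseline, and median aggregation, none of which are taken from the paper:

```python
# Convert a predicted disparity map to per-target distance via Z = f * B / d.
import numpy as np

def box_range(disparity, box, focal_px=700.0, baseline_m=0.12):
    """disparity: (H, W) array in pixels; box: (x1, y1, x2, y2) from the detector."""
    x1, y1, x2, y2 = box
    d = disparity[y1:y2, x1:x2]
    d = d[d > 0.5]                       # drop invalid / near-zero disparities
    depth = focal_px * baseline_m / d    # stereo relation
    return float(np.median(depth))       # robust per-target distance in meters

disp = np.full((480, 640), 20.0)         # toy disparity map
print(box_range(disp, (100, 100, 200, 200)))  # -> 4.2 m for f=700 px, B=0.12 m
```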

22 pages, 2426 KB  
Article
MidFusionEfficientV2: Improving Ophthalmic Diagnosis with Mid-Level RGB–LBP Fusion and SE Attention
by Julide Kurt Keles, Soner Kiziloluk, Eser Sert, Furkan Talo and Muhammed Yildirim
J. Clin. Med. 2026, 15(6), 2352; https://doi.org/10.3390/jcm15062352 - 19 Mar 2026
Viewed by 394
Abstract
Background/Objectives: Early diagnosis of eye diseases is critically important for enhancing individuals’ quality of life and reducing the risk of vision loss. In this study, a deep learning-based hybrid model called MidFusionEfficientV2 has been proposed to classify eye diseases, including uveitis, conjunctivitis, cataract, eyelid drooping, and normal conditions. Methods: The model presents a dual-branch architecture that combines an RGB image branch built on an EfficientNetV2-S architecture with a specialized texture branch based on a Local Binary Pattern (LBP) transformation, fused at an intermediate level. Squeeze-and-Excitation (SE) blocks integrated into the LBP branch apply channel-wise attention, enhancing the prominence of textural features. The features obtained from the RGB and LBP branches were combined at an intermediate level and transferred to the classification stage. Results: Experimental studies on the five-class eye disease dataset from the Mendeley Data platform have shown that the proposed model outperformed six strong models (ResNetV2, ConvNeXt, DenseNet-121, EfficientNet-B1, MobileNetV3 Large, and EfficientNetV2-S) with an accuracy of 98%. Especially in the difficult-to-diagnose uveitis class, recall and F1 scores of 97% and 94%, respectively, were achieved. Conclusions: The results show that a mid-level combination of color and texture features significantly improves classification performance, and that MidFusionEfficientV2 offers a reliable and effective solution for the automatic diagnosis of eye diseases. Full article
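
The texture branch rests on the LBP transform, which scikit-image provides directly. A hedged sketch of preparing the LBP input; the radius, number of points, and "uniform" method are assumptions, and as the comment notes, the actual mid-level fusion happens between network feature maps rather than at the input.

```python
# Sketch: compute an LBP texture map that would feed the second branch.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_map(gray_u8, points=8, radius=1):
    """gray_u8: (H, W) uint8 grayscale image; returns an LBP map in [0, 1]."""
    lbp = local_binary_pattern(gray_u8, points, radius, method="uniform")
    return lbp / lbp.max()

rgb = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
gray = rgb.mean(axis=2).astype(np.uint8)
texture = lbp_map(gray)  # input to the texture branch
# The mid-level fusion itself concatenates feature maps from the RGB and
# LBP branches inside the network, before the classification stage.
```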

30 pages, 3316 KB  
Article
A Novel Hybrid CNN-ViT-Based Bi-Directional Cross-Guidance Fusion-Driven Breast Cancer Detection Model
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Life 2026, 16(3), 474; https://doi.org/10.3390/life16030474 - 14 Mar 2026
Viewed by 397
Abstract
Accurate and early identification of breast cancer from mammography is key to reducing breast cancer mortality, and automated analysis is challenging due to subtle lesion appearances, heterogeneous breast density, and the variance caused by modality. Standard Convolutional Neural Networks (CNNs) are excellent at capturing localized textures, whereas Vision Transformers (ViTs) capture long-range dependencies; however, both often struggle to produce a unified representation that consistently supports diagnostic decision-making. To address these limitations, this study presents a dual-stream framework integrating ConvNeXt for high-fidelity local feature extraction with Swin Transformer V2 for hierarchical global context modeling. A Bi-Directional Cross-Guidance (BDCG) mechanism is added to harmonize interactions between the two feature domains and ensure mutual information learning in the representations. Furthermore, a Prototype-Anchored Similarity Head (PASH) is used to stabilize classification using distance-based reasoning instead of using linear separation. Comprehensive experiments show the effectiveness of the proposed method using two benchmark datasets. On Dataset 1, the model achieves accuracy: 98.8%, precision: 98.7%, recall: 98.6%, and F1 score: 97.2%, outperforming existing models based on CNN, ViTs, and hybrid architectures, and provides a lower inference time (8.3 ms/image). On the more heterogeneous Dataset 2, the model maintains strong performance, with an accuracy of 97.0%, precision of 95.4%, recall of 94.8%, and F1-score of 95.1%, demonstrating its resilience to domain shift and imaging variability. These results underscore the value of structural multi-scale feature interaction and prototype-driven classification for robust mammographic analysis. The consistent performance across internal and external evaluations indicates the potential for the proposed framework to be reliably applied in computer-aided screening systems. Full article
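
Distance-based classification with class prototypes, as in the PASH idea described above, can be sketched in a few lines. This is a generic prototype head, not the paper's PASH; the distance metric and scaling are assumptions.

```python
# Sketch: classify by (negative) distance to learned class prototypes
# instead of a linear layer.
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, feats):                    # feats: (B, feat_dim)
        d = torch.cdist(feats, self.prototypes)  # (B, n_classes) distances
        return -d                                # logits: closer => larger

head = PrototypeHead(feat_dim=256, n_classes=2)
logits = head(torch.randn(4, 256))               # feed to cross-entropy as usual
```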

25 pages, 2809 KB  
Article
Multi-Architecture Deep Learning for Early Alzheimer’s Detection in MRI: Slice- and Scan-Level Analysis
by Isabelle Bricaud and Giovanni Luca Masala
Int. J. Environ. Res. Public Health 2026, 23(3), 322; https://doi.org/10.3390/ijerph23030322 - 5 Mar 2026
Viewed by 708
Abstract
Alzheimer’s disease (AD), the most common form of dementia, is a progressive and irreversible neurodegenerative disorder. Structural MRI is widely used for diagnosis, revealing brain changes associated with AD. However, these alterations are often subtle and difficult to detect manually, particularly at early stages. Early intervention during prodromal stages, such as mild cognitive impairment (MCI), can help slow disease progression, highlighting the need for reliable automated methods. In this work, we introduce a dual-level evaluation framework comparing fifteen deep learning architectures, including convolutional neural networks (CNNs), Transformers, and hybrid models, for classifying AD, MCI, and cognitively normal (CN) subjects using the ADNI dataset. A central focus of our work is the impact of robust and standardized preprocessing pipelines, which we identified as a critical yet underexplored factor influencing model reliability. By evaluating performance at both slice-level and scan-level, we reveal that multi-slice aggregation affects architectures asymmetrically. By systematically optimizing preprocessing steps to reduce data variability and enhance feature consistency, we established preprocessing quality as an essential determinant of deep learning performance in neuroimaging. Experimental results show that CNNs and hybrid pre-trained models outperform Transformer-based models in both slice-level and scan-level classification. ConvNeXtV2-L achieved the best scan-level performance (91.07%), EfficientNetV2-L the highest slice-level accuracy (86.84%), and VGG19 balanced results (86.07%/88.52%). ConvNeXtV2-L and SwinV1-L exhibited scan-level improvements of 7.60% and 9.04% respectively, while EfficientNetV2-L experienced degradation of 2.66%, demonstrating that architectural selection and aggregation strategy are interdependent factors. These findings suggest that carefully designed preprocessing not only improves classification accuracy but may also serve as a foundation for more reproducible and interpretable Alzheimer’s disease detection pipelines. Full article
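
Scan-level results come from aggregating slice-level predictions. A minimal sketch of one common rule, mean softmax followed by argmax; whether the paper uses this rule, majority voting, or another aggregation is not stated here, so treat the choice as an assumption.

```python
# Sketch: aggregate per-slice class probabilities into one scan-level decision.
import torch

def scan_level_prediction(slice_logits):
    """slice_logits: (n_slices, n_classes) raw model outputs for one scan."""
    probs = torch.softmax(slice_logits, dim=1)
    return probs.mean(dim=0).argmax().item()   # aggregate, then decide

logits = torch.randn(32, 3)                    # 32 slices; classes AD / MCI / CN
print(scan_level_prediction(logits))
```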

24 pages, 4796 KB  
Article
Multi-Scale Feature Learning for Farmland Segmentation Under Complex Spatial Structures
by Yongqi Han, Yuqing Wang, Yun Zhang, Hongfu Ai, Chuan Qin and Xinle Zhang
Entropy 2026, 28(2), 242; https://doi.org/10.3390/e28020242 - 19 Feb 2026
Viewed by 470
Abstract
Fragmented, irregular, and scale-heterogeneous farmland parcels introduce high spatial complexity into high-resolution remote sensing imagery, leading to boundary ambiguity and inter-class spectral confusion that hinder effective feature discrimination in semantic segmentation. To address these challenges, we propose CSMNet, which adopts a ConvNeXt V2 encoder for hierarchical representation learning and a multi-scale fusion architecture with redesigned skip connections and lateral outputs to reduce semantic gaps and preserve cross-scale information. An adaptive multi-head attention module dynamically integrates channel-wise, spatial, and global contextual cues through a lightweight gating mechanism, enhancing boundary awareness in structurally complex regions. To further improve robustness, a hybrid loss combining Binary Cross-Entropy and Dice loss is employed to alleviate class imbalance and ensure reliable extraction of small and fragmented parcels. Experimental results from Nong’an County demonstrate that the proposed model achieves superior performance compared with several state-of-the-art segmentation methods, attaining a Precision of 95.91%, a Recall of 93.95%, an F1-score of 94.92%, and an IoU of 90.85%. The IoU exceeds that of Unet++ by 8.92% and surpasses PSPNet, SegNet, DeepLabv3+, TransUNet, SeaFormer and SegMAN by more than 15%, 10%, 7%, 6%, 5% and 2%, respectively. These results indicate that CSMNet effectively improves information utilization and boundary delineation in complex agricultural landscapes. Full article
(This article belongs to the Section Multidisciplinary Applications)
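
The hybrid loss named above combines Binary Cross-Entropy with Dice loss. A minimal sketch with an assumed 1:1 weighting and smoothing constant:

```python
# Sketch: BCE + Dice hybrid loss for binary farmland masks.
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, eps=1.0):
    """logits, target: (B, 1, H, W); target is a binary farmland mask."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    p = torch.sigmoid(logits)
    inter = (p * target).sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (p.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3)) + eps)
    return bce + (1 - dice).mean()

loss = bce_dice_loss(torch.randn(2, 1, 64, 64),
                     torch.randint(0, 2, (2, 1, 64, 64)).float())
```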

17 pages, 3661 KB  
Article
Wavefront Prediction for Adaptive Optics Without Wavefront Sensing Based on EfficientNetV2-S
by Zhiguang Zhang, Zelu Huang, Jiawei Wu, Zhaojun Yan, Xin Li, Chang Liu and Huizhen Yang
Photonics 2026, 13(2), 144; https://doi.org/10.3390/photonics13020144 - 2 Feb 2026
Viewed by 691
Abstract
Adaptive optics (AO) aims to counteract wavefront distortions caused by atmospheric turbulence and inherent system errors. Aberration recovery accuracy and computational speed play crucial roles in its correction capability. To address the issues of slow wavefront aberration detection speed and low measurement accuracy in current wavefront sensorless adaptive optics, this paper proposes a wavefront correction method based on the EfficientNetV2-S model. The method utilizes paired focal plane and defocused plane intensity images to directly extract intensity features and reconstruct phase information in a non-iterative manner. This approach enables the direct prediction of wavefront Zernike coefficients from the measured intensity images, specifically for orders 3 to 35, significantly enhancing the real-time correction capability of the AO system. Simulation results show that the root mean square errors (RMSEs) of the predicted Zernike coefficients for D/r0 values of 5, 10, and 15 are 0.038λ, 0.071λ, and 0.111λ, respectively, outperforming conventional convolutional neural network (CNN), ResNet50/101 and ConvNeXt-T models. The experimental results demonstrate that the EfficientNetV2-S model maintains good wavefront reconstruction and prediction capabilities at D/r0 = 5 and 10, highlighting its high precision and robust wavefront prediction ability. Compared to traditional iterative algorithms, the proposed method offers advantages such as high precision, fast computation, no need for iteration, and avoidance of local minima in processing wavefront aberrations. Full article
(This article belongs to the Special Issue Adaptive Optics: Recent Technological Breakthroughs and Applications)
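
The quoted figures are RMSEs over the 33 predicted Zernike coefficients (orders 3 to 35) in units of wavelength. A sketch of that metric, with random tensors standing in for the network output:

```python
# Sketch: RMSE between predicted and true Zernike coefficient vectors.
import torch

def zernike_rmse(pred, true):
    """pred, true: (B, 33) coefficient tensors in units of lambda."""
    return torch.sqrt(torch.mean((pred - true) ** 2))

pred = torch.randn(8, 33) * 0.05
true = pred + 0.01 * torch.randn(8, 33)
print(f"RMSE = {zernike_rmse(pred, true):.3f} lambda")
```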

35 pages, 5337 KB  
Article
Enhancing Glioma Classification in Magnetic Resonance Imaging Using Vision Transformers and Convolutional Neural Networks
by Marco Antonio Gómez-Guzmán, José Jaime Esqueda-Elizondo, Laura Jiménez-Beristain, Gilberto Manuel Galindo-Aldana, Oscar Adrian Aguirre-Castro, Edgar Rene Ramos-Acosta, Cynthia Torres-Gonzalez, Enrique Efren García-Guerrero and Everardo Inzunza-Gonzalez
Electronics 2026, 15(2), 434; https://doi.org/10.3390/electronics15020434 - 19 Jan 2026
Viewed by 570
Abstract
Brain tumors, encompassing subtypes with distinct progression and risk profiles, are a serious public health concern. Magnetic resonance imaging (MRI) is the primary imaging modality for non-invasive assessment, providing the contrast and detail necessary for diagnosis, subtype classification, and individualized care planning. In this paper, we evaluate the capability of modern deep learning models to classify gliomas as high-grade (HGG) or low-grade (LGG) using reduced training data from MRI scans. Utilizing the BraTS 2019 best-slice dataset (2185 images in two classes, HGG and LGG), divided into training and testing folders containing images from different patients, we created subsets including 10%, 25%, 50%, 75%, and 100% of the dataset. Six deep learning architectures, DeiT3_base_patch16_224, Inception_v4, Xception41, ConvNextV2_tiny, swin_tiny_patch4_window7_224, and EfficientNet_B0, were evaluated utilizing three-fold cross-validation (k = 3) and increasingly large training datasets. Explainability was assessed using Grad-CAM. With 25% of the training data, DeiT3_base_patch16_224 achieved an accuracy of 99.401% and an F1-Score of 99.403%. Under the same conditions, Inception_v4 achieved an accuracy of 99.212% and an F1-Score of 99.222%. Considering how the models performed across the data subsets and their compute demands, Inception_v4 struck the best balance for MRI-based glioma classification. Both convolutional networks and vision transformers achieved superior discrimination between HGGs and LGGs, even under data-limited conditions. Architectural disparities became increasingly apparent as training data diminished, highlighting unique inductive biases and efficiency characteristics. Even with a relatively limited amount of training data, current deep learning (DL) methods can achieve reliable performance in classifying gliomas from MRI scans. Among the architectures evaluated, Inception_v4 offered the most consistent balance between accuracy, F1-Score, and computational cost, making it a strong candidate for integration into MRI-based clinical workflows. Full article
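
The protocol of data-fraction subsets with three-fold cross-validation can be sketched with scikit-learn. Arrays X and y are placeholders for the image data and HGG/LGG labels, and stratified subsetting is an assumption.

```python
# Sketch: stratified 10-100% training subsets, each under 3-fold CV.
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

X = np.random.rand(400, 16)           # placeholder features
y = np.random.randint(0, 2, 400)      # placeholder HGG (1) / LGG (0) labels

for frac in (0.10, 0.25, 0.50, 0.75, 1.00):
    if frac < 1.0:
        X_sub, _, y_sub, _ = train_test_split(
            X, y, train_size=frac, stratify=y, random_state=0)
    else:
        X_sub, y_sub = X, y
    for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X_sub, y_sub):
        pass  # fit the network on X_sub[train_idx], score on X_sub[val_idx]
```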

28 pages, 5548 KB  
Article
CVMFusion: ConvNeXtV2 and Visual Mamba Fusion for Remote Sensing Segmentation
by Zelin Wang, Li Qin, Cheng Xu, Dexi Liu, Zeyu Guo, Yu Hu and Tianyu Yang
Sensors 2026, 26(2), 640; https://doi.org/10.3390/s26020640 - 18 Jan 2026
Viewed by 523
Abstract
In recent years, extracting coastlines from high-resolution remote sensing imagery has proven difficult due to complex details and variable targets. Current methods struggle with the fact that CNNs cannot model long-range dependencies, while Transformers incur high computational costs. To address these issues, we propose CVMFusion: a land–sea segmentation network based on a U-shaped encoder–decoder structure, whereby both the encoder and decoder are hierarchically organized. This architecture integrates the local feature extraction capabilities of CNNs with the global interaction efficiency of Mamba. The encoder uses parallel ConvNeXtV2 and VMamba branches to capture fine-grained details and long-range context, respectively. This network incorporates Dynamic Multi-Scale Attention (DyMSA) and Dynamic Weighted Cross-Attention (DyWCA) modules, which replace the traditional concatenation with an adaptive fusion mechanism to effectively fuse the features from the dual-branch encoder and utilize skip connections to complete the fusion between the encoder and decoder. Experiments on two public datasets demonstrate that CVMFusion attained MIoU accuracies of 98.05% and 96.28%, outperforming existing methods. It performs particularly well in segmenting small objects and intricate boundary regions. Full article
(This article belongs to the Special Issue Smart Remote Sensing Images Processing for Sensor-Based Applications)
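
The abstract replaces plain concatenation with adaptive fusion of the two encoder branches. A heavily simplified, hypothetical gated fusion in that spirit; it does not reproduce DyMSA or DyWCA.

```python
# Sketch: adaptively mix CNN-branch and Mamba-branch feature maps with a
# learned per-pixel gate instead of concatenating them.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.Sigmoid())

    def forward(self, f_cnn, f_mamba):           # both (B, C, H, W)
        g = self.gate(torch.cat([f_cnn, f_mamba], dim=1))
        return g * f_cnn + (1 - g) * f_mamba     # per-pixel adaptive mix

fused = GatedFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```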

19 pages, 3156 KB  
Article
Detecting Escherichia coli on Conventional Food Processing Surfaces Using UV-C Fluorescence Imaging and Deep Learning
by Zafar Iqbal, Thomas F. Burks, Snehit Vaddi, Pappu Kumar Yadav, Quentin Frederick, Satya Aakash Chowdary Obellaneni, Jianwei Qin, Moon Kim, Mark A. Ritenour, Jiuxu Zhang and Fartash Vasefi
Appl. Sci. 2026, 16(2), 968; https://doi.org/10.3390/app16020968 - 17 Jan 2026
Viewed by 559
Abstract
Detecting Escherichia coli on food preparation and processing surfaces is critical for ensuring food safety and preventing foodborne illness. This study focuses on detecting E. coli contamination on common food processing surfaces using UV-C fluorescence imaging and deep learning. Four concentrations of E. coli (0, 10⁵, 10⁷, and 10⁸ colony forming units (CFU)/mL) and two egg solutions (white and yolk) were applied to stainless steel and white rubber to simulate realistic contamination with organic interference. For each concentration level, 256 droplets were inoculated in 16 groups, and fluorescence videos were captured. Droplet regions were extracted from the video frames, subdivided into quadrants, and augmented to generate a robust dataset, ensuring 3–4 droplets per sample. Wavelet-based denoising further improved image quality, with Haar wavelets producing the highest Peak Signal-to-Noise Ratio (PSNR) values, up to 51.0 dB on white rubber and 48.2 dB on stainless steel. Using this dataset, multiple deep learning (DL) models, including ConvNeXtBase, EfficientNetV2L, and five YOLO11-cls variants, were trained to classify E. coli concentration levels. Additionally, Eigen-CAM heatmaps were used to visualize model attention to bacterial fluorescence regions. Across four dataset groupings, YOLO11-cls models achieved consistently high performance, with peak test accuracies of 100% on white rubber and 99.60% on stainless steel, even in the presence of egg substances. YOLO11s-cls provided the best balance of accuracy (up to 98.88%) and inference speed (4–5 ms) whilst having a compact size (11 MB), outperforming larger models such as EfficientNetV2L. Classical machine learning models lagged significantly behind, with Random Forest reaching 89.65% accuracy and SVM only 67.62%. Overall, the results highlight the potential of combining UV-C fluorescence imaging with deep learning for rapid and reliable detection of E. coli on stainless steel and rubber conveyor belt surfaces. Additionally, this approach could support the design of effective interventions to remove E. coli from food processing environments. Full article
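
The Haar-wavelet denoising step with a PSNR check can be sketched with PyWavelets. The decomposition level and soft-threshold value below are assumptions, not the paper's settings.

```python
# Sketch: Haar wavelet soft-threshold denoising plus PSNR evaluation.
import numpy as np
import pywt

def haar_denoise(img, level=2, thresh=0.05):
    """img: (H, W) float array in [0, 1]."""
    coeffs = pywt.wavedec2(img, "haar", level=level)
    den = [coeffs[0]] + [
        tuple(pywt.threshold(c, thresh, mode="soft") for c in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(den, "haar")

def psnr(clean, noisy):
    mse = np.mean((clean - noisy) ** 2)
    return 10 * np.log10(1.0 / mse)    # peak value 1.0 for [0, 1] images

img = np.clip(np.random.rand(128, 128) * 0.1 + 0.5, 0, 1)
noisy = np.clip(img + 0.02 * np.random.randn(128, 128), 0, 1)
print(f"PSNR after denoising: {psnr(img, haar_denoise(noisy)):.1f} dB")
```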

38 pages, 16831 KB  
Article
Hybrid ConvNeXtV2–ViT Architecture with Ontology-Driven Explainability and Out-of-Distribution Awareness for Transparent Chest X-Ray Diagnosis
by Naif Almughamisi, Gibrael Abosamra, Adnan Albar and Mostafa Saleh
Diagnostics 2026, 16(2), 294; https://doi.org/10.3390/diagnostics16020294 - 16 Jan 2026
Cited by 1 | Viewed by 691
Abstract
Background: Chest X-ray (CXR) is widely used for the assessment of thoracic diseases, yet automated multi-label interpretation remains challenging due to subtle visual patterns, overlapping anatomical structures, and frequent co-occurrence of abnormalities. While recent deep learning models have shown strong performance, limitations in interpretability, anatomical awareness, and robustness continue to hinder their clinical adoption. Methods: The proposed framework employs a hybrid ConvNeXtV2–Vision Transformer (ViT) architecture that combines convolutional feature extraction for capturing fine-grained local patterns with transformer-based global reasoning to model long-range contextual dependencies. The model is trained exclusively using image-level annotations. In addition to classification, three complementary post hoc components are integrated to enhance model trust and interpretability. A segmentation-aware Gradient-weighted class activation mapping (Grad-CAM) module leverages CheXmask lung and heart segmentations to highlight anatomically relevant regions and quantify predictive evidence inside and outside the lungs. An ontology-driven neuro-symbolic reasoning layer translates Grad-CAM activations into structured, rule-based explanations aligned with clinical concepts such as “basal effusion” and “enlarged cardiac silhouette”. Furthermore, a lightweight out-of-distribution (OOD) detection module based on confidence scores, energy scores, and Mahalanobis distance scores is employed to identify inputs that deviate from the training distribution. Results: On the VinBigData test set, the model achieved a macro-AUROC of 0.9525 and a Micro AUROC of 0.9777 when trained solely with image-level annotations. External evaluation further demonstrated strong generalisation, yielding macro-AUROC scores of 0.9106 on NIH ChestXray14 and 0.8487 on CheXpert (frontal views). Both Grad-CAM visualisations and ontology-based reasoning remained coherent on unseen data, while the OOD module successfully flagged non-thoracic images. Conclusions: Overall, the proposed approach demonstrates that hybrid convolutional neural network (CNN)–vision transformer (ViT) architectures, combined with anatomy-aware explainability and symbolic reasoning, can support automated chest X-ray diagnosis in a manner that is accurate, transparent, and safety-aware. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
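
The energy score used by the OOD module is a standard quantity, E(x) = -T·logsumexp(z/T) over the logits, with higher energy indicating out-of-distribution inputs. A minimal sketch; the threshold is hypothetical and would be calibrated on validation data.

```python
# Sketch: energy-score OOD detection over a logit vector.
import torch

def energy_score(logits, temperature=1.0):
    """logits: (B, n_classes); returns (B,) energies E = -T * logsumexp(z/T)."""
    return -temperature * torch.logsumexp(logits / temperature, dim=1)

logits = torch.randn(4, 14)            # e.g. 14 thoracic finding logits
is_ood = energy_score(logits) > -1.0   # hypothetical calibrated threshold
```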

20 pages, 2070 KB  
Article
Automated Detection of Normal, Atrial, and Ventricular Premature Beats from Single-Lead ECG Using Convolutional Neural Networks
by Dimitri Kraft and Peter Rumm
Sensors 2026, 26(2), 513; https://doi.org/10.3390/s26020513 - 12 Jan 2026
Viewed by 825
Abstract
Accurate detection of premature atrial contractions (PACs) and premature ventricular contractions (PVCs) in single-lead electrocardiograms (ECGs) is crucial for early identification of patients at risk for atrial fibrillation, cardiomyopathy, and other adverse outcomes. In this work, we present a fully convolutional one-dimensional U-Net that reframes beat classification as a segmentation task and directly detects normal beats, PACs, and PVCs from raw ECG signals. The architecture employs a ConvNeXt V2 encoder with simple decoder blocks and does not rely on explicit R-peak detection, handcrafted features, or fixed-length input windows. The model is trained on the Icentia11k database and an in-house single-lead ECG dataset that emphasizes challenging, noisy recordings, and is validated on the CPSC2020 database. Generalization is assessed across several benchmark and clinical datasets, including MIT-BIH Arrhythmia (ADB), MIT 11, AHA, NST, SVDB, CST STRIPS, and CPSC2020. The proposed method achieves near-perfect QRS detection (sensitivity and precision up to 0.999) and competitive PVC performance, with sensitivity ranging from 0.820 (AHA) to 0.986 (MIT 11) and precision up to 0.993 (MIT 11). PAC detection is more variable, with sensitivities between 0.539 and 0.797 and precisions between 0.751 and 0.910, yet the resulting F1-score of 0.72 on SVDB exceeds that of previously published approaches. Model interpretability is addressed using Layer-wise Gradient-weighted Class Activation Mapping (LayerGradCAM), which confirms physiologically plausible attention to QRS complexes for PVCs and to P-waves for PACs. Overall, the proposed framework provides a robust, interpretable, and hardware-efficient solution for joint PAC and PVC detection in noisy, single-lead ECG recordings, suitable for integration into Holter and wearable monitoring systems. Full article
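
Reframing beat classification as segmentation means turning beat annotations into per-sample class labels. A sketch of that target construction, with an assumed fixed window width around each annotated beat:

```python
# Sketch: expand beat annotations into a sample-wise label mask that a
# 1-D U-Net can be trained against with per-sample cross-entropy.
import numpy as np

def beats_to_mask(n_samples, beat_pos, beat_cls, half_w=40):
    """beat_pos: sample indices; beat_cls: 1=normal, 2=PAC, 3=PVC; 0=background."""
    mask = np.zeros(n_samples, dtype=np.int64)
    for p, c in zip(beat_pos, beat_cls):
        mask[max(0, p - half_w):min(n_samples, p + half_w)] = c
    return mask

mask = beats_to_mask(2500, beat_pos=[400, 1200, 2000], beat_cls=[1, 3, 1])
```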

21 pages, 3769 KB  
Article
Benchmarking Robust AI for Microrobot Detection with Ultrasound Imaging
by Ahmed Almaghthawi, Changyan He, Suhuai Luo, Furqan Alam, Majid Roshanfar and Lingbo Cheng
Actuators 2026, 15(1), 16; https://doi.org/10.3390/act15010016 - 29 Dec 2025
Viewed by 820
Abstract
Microrobots are emerging as transformative tools in minimally invasive medicine, with applications in non-invasive therapy, real-time diagnosis, and targeted drug delivery. Effective use of these systems critically depends on accurate detection and tracking of microrobots within the body. Among commonly used imaging modalities, including MRI, CT, and optical imaging, ultrasound (US) offers an advantageous balance of portability, low cost, non-ionizing safety, and high temporal resolution, making it particularly suitable for real-time microrobot monitoring. This study reviews current detection strategies and presents a comparative evaluation of six advanced AI-based multi-object detectors, including ConvNeXt, Res2NeXt-101, ResNeSt-269, U-Net, and the latest YOLO variants (v11, v12), applied to microrobot detection in US imaging. Performance is assessed using standard metrics (AP50–95, precision, recall, F1-score) and robustness to four visual perturbations: blur, brightness variation, occlusion, and speckle noise. Additionally, feature-level sensitivity analyses are conducted to identify the contributions of different visual cues. Computational efficiency is also measured to assess suitability for real-time deployment. Results show that ResNeSt-269 achieved the highest detection accuracy, followed by Res2NeXt-101 and ConvNeXt, while YOLO-based detectors provided superior computational efficiency. These findings offer actionable insights for developing robust and efficient microrobot tracking systems with strong potential in diagnostic and therapeutic healthcare applications. Full article
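
The four perturbations in the robustness benchmark are straightforward to reproduce in outline. Kernel size, brightness offset, occlusion patch, and speckle variance below are assumptions, not the paper's settings.

```python
# Sketch: the four test-time perturbations applied to an ultrasound frame.
import numpy as np
import cv2

def perturb(img, kind):
    """img: (H, W) float ultrasound frame in [0, 1]."""
    if kind == "blur":
        return cv2.GaussianBlur(img, (7, 7), 0)
    if kind == "brightness":
        return np.clip(img + 0.2, 0, 1)
    if kind == "occlusion":
        out = img.copy(); out[20:60, 20:60] = 0; return out
    if kind == "speckle":                        # multiplicative noise
        return np.clip(img * (1 + 0.2 * np.random.randn(*img.shape)), 0, 1)
    raise ValueError(kind)

frame = np.random.rand(128, 128)
degraded = [perturb(frame, k) for k in ("blur", "brightness", "occlusion", "speckle")]
```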

23 pages, 2688 KB  
Article
RGSGAN–MACRNet: A More Accurate Recognition Method for Imperfect Corn Kernels Under Sample-Size-Limited Conditions
by Chenxia Wan, Wenzheng Li, Qinghui Zhang, Le Xiao, Pengtao Lv, Huiyi Zhao and Shihua Jing
Foods 2025, 14(24), 4356; https://doi.org/10.3390/foods14244356 - 18 Dec 2025
Viewed by 660
Abstract
Under sample-size-limited conditions, the recognition accuracy of imperfect corn kernels is severely degraded. To address this issue, a recognition framework that integrates a Residual Generative Spatial–Channel Synergistic Attention Generative Adversarial Network (RGSGAN) with a Multi-Scale Asymmetric Convolutional Residual Network (MACRNet) was proposed. First, residual structures and a spatial–channel synergistic attention mechanism are incorporated into the RGSGAN generator, and the Wasserstein distance with gradient penalty is integrated to produce high-quality samples and expand the dataset. On this basis, the MACRNet employs a multi-branch asymmetric convolutional residual module to perform multi-scale feature fusion, thereby substantially enhancing its ability to capture subtle textural and local structural variations in imperfect corn kernels. The experimental results demonstrated that the proposed method attains a classification accuracy of 98.813%, surpassing ResNet18, EfficientNet-v2, ConvNeXt-T, and ConvNeXt-v2 by 8.3%, 6.16%, 3.01%, and 4.09%, respectively, and outperforms the model trained on the original dataset by 5.29%. These results confirm the superior performance of the proposed approach under sample-size-limited conditions, effectively alleviating the adverse impact of data scarcity on the recognition accuracy of imperfect corn kernels. Full article
(This article belongs to the Section Food Analytical Methods)
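
The Wasserstein distance with gradient penalty mentioned above is the standard WGAN-GP term. A minimal sketch; the critic D and the penalty weight of 10 are placeholders, not the paper's configuration.

```python
# Sketch: WGAN gradient penalty on interpolates between real and fake samples.
import torch

def gradient_penalty(D, real, fake, weight=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_out = D(x_hat)
    grads, = torch.autograd.grad(d_out.sum(), x_hat, create_graph=True)
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return weight * gp

D = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 1))
gp = gradient_penalty(D, torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
```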
