Search Results (60)

Search Parameters:
Keywords = ConvNeXt V2

21 pages, 3769 KB  
Article
Benchmarking Robust AI for Microrobot Detection with Ultrasound Imaging
by Ahmed Almaghthawi, Changyan He, Suhuai Luo, Furqan Alam, Majid Roshanfar and Lingbo Cheng
Actuators 2026, 15(1), 16; https://doi.org/10.3390/act15010016 - 29 Dec 2025
Viewed by 263
Abstract
Microrobots are emerging as transformative tools in minimally invasive medicine, with applications in non-invasive therapy, real-time diagnosis, and targeted drug delivery. Effective use of these systems critically depends on accurate detection and tracking of microrobots within the body. Among commonly used imaging modalities, including MRI, CT, and optical imaging, ultrasound (US) offers an advantageous balance of portability, low cost, non-ionizing safety, and high temporal resolution, making it particularly suitable for real-time microrobot monitoring. This study reviews current detection strategies and presents a comparative evaluation of six advanced AI-based multi-object detectors (ConvNeXt, Res2NeXt-101, ResNeSt-269, U-Net, and the latest YOLO variants v11 and v12) applied to microrobot detection in US imaging. Performance is assessed using standard metrics (AP50–95, precision, recall, F1-score) and robustness to four visual perturbations: blur, brightness variation, occlusion, and speckle noise. Additionally, feature-level sensitivity analyses are conducted to identify the contributions of different visual cues. Computational efficiency is also measured to assess suitability for real-time deployment. Results show that ResNeSt-269 achieved the highest detection accuracy, followed by Res2NeXt-101 and ConvNeXt, while YOLO-based detectors provided superior computational efficiency. These findings offer actionable insights for developing robust and efficient microrobot tracking systems with strong potential in diagnostic and therapeutic healthcare applications.
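For context, the four robustness perturbations named in the abstract can be reproduced in rough form with standard image operations; the kernel size, brightness factor, occluded region, and noise level below are illustrative assumptions, not the study's settings.

```python
import torch
import torchvision.transforms.functional as TF

def perturb(img: torch.Tensor, kind: str) -> torch.Tensor:
    """Apply one of the four perturbation types to a float image tensor (C, H, W) in [0, 1]."""
    if kind == "blur":
        return TF.gaussian_blur(img, kernel_size=9, sigma=3.0)
    if kind == "brightness":
        return TF.adjust_brightness(img, brightness_factor=1.5)
    if kind == "occlusion":
        out = img.clone()
        _, h, w = out.shape
        out[:, h // 3:2 * h // 3, w // 3:2 * w // 3] = 0.0   # black out the central region
        return out
    if kind == "speckle":
        return (img * (1.0 + 0.2 * torch.randn_like(img))).clamp(0.0, 1.0)  # multiplicative noise
    raise ValueError(f"unknown perturbation: {kind}")
```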

23 pages, 2688 KB  
Article
RGSGAN–MACRNet: A More Accurate Recognition Method for Imperfect Corn Kernels Under Sample-Size-Limited Conditions
by Chenxia Wan, Wenzheng Li, Qinghui Zhang, Le Xiao, Pengtao Lv, Huiyi Zhao and Shihua Jing
Foods 2025, 14(24), 4356; https://doi.org/10.3390/foods14244356 - 18 Dec 2025
Viewed by 401
Abstract
Under sample-size-limited conditions, the recognition accuracy of imperfect corn kernels is severely degraded. To address this issue, a recognition framework that integrates a Residual Generative Spatial–Channel Synergistic Attention Generative Adversarial Network (RGSGAN) with a Multi-Scale Asymmetric Convolutional Residual Network (MACRNet) was proposed. First, residual structures and a spatial–channel synergistic attention mechanism are incorporated into the RGSGAN generator, and the Wasserstein distance with gradient penalty is integrated to produce high-quality samples and expand the dataset. On this basis, the MACRNet employs a multi-branch asymmetric convolutional residual module to perform multi-scale feature fusion, thereby substantially enhancing its ability to capture subtle textural and local structural variations in imperfect corn kernels. The experimental results demonstrated that the proposed method attains a classification accuracy of 98.813%, surpassing ResNet18, EfficientNet-v2, ConvNeXt-T, and ConvNeXt-v2 by 8.3%, 6.16%, 3.01%, and 4.09%, respectively, and outperforms the model trained on the original dataset by 5.29%. These results confirm the superior performance of the proposed approach under sample-size-limited conditions, effectively alleviating the adverse impact of data scarcity on the recognition accuracy of imperfect corn kernels.
(This article belongs to the Section Food Analytical Methods)
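The abstract mentions a Wasserstein distance with gradient penalty; a minimal PyTorch sketch of the standard WGAN-GP penalty term (not the paper's exact training code, and `discriminator` is a placeholder) is:

```python
import torch

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """Standard WGAN-GP term: lambda * E[(||grad D(x_hat)||_2 - 1)^2] on interpolated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_hat = discriminator(x_hat)
    grads = torch.autograd.grad(outputs=d_hat.sum(), inputs=x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```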

27 pages, 9435 KB  
Article
Research on an Intelligent Grading Method for Beef Freshness in Complex Backgrounds Based on the DEVA-ConvNeXt Model
by Xiuling Yu, Yifu Xu, Chenxiao Qu, Senyue Guo, Shuo Jiang, Linqiang Chen and Yang Zhou
Foods 2025, 14(24), 4178; https://doi.org/10.3390/foods14244178 - 5 Dec 2025
Viewed by 392
Abstract
This paper presents a novel DEVA-ConvNeXt model to address challenges in beef freshness grading, including data collection difficulties, complex backgrounds, and model accuracy issues. The Alpha-Background Generation Shift (ABG-Shift) technology enables rapid generation of beef image datasets with complex backgrounds. By incorporating the Dynamic Non-Local Coordinate Attention (DNLC) and Enhanced Depthwise Convolution (EDW) modules, the model enhances feature extraction in complex environments. Additionally, Varifocal Loss (VFL) accelerates key feature learning, reducing training time and improving convergence speed. Experimental results show that DEVA-ConvNeXt outperforms models like ResNet101 and ShuffleNet V2 in terms of overall performance. Compared to the baseline model ConvNeXt, it achieves significant improvements in recognition Accuracy (94.8%, a 6.2% increase), Precision (94.8%, a 5.4% increase), Recall (94.6%, a 5.9% increase), and F1 score (94.7%, a 6.0% increase). Furthermore, real-world deployment and testing on embedded devices confirm the feasibility of this method in terms of accuracy and speed, providing valuable technical support for beef freshness grading and equipment design.
(This article belongs to the Section Food Engineering and Technology)
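The Varifocal Loss (VFL) cited above is defined in the VarifocalNet paper; a short sketch of that standard formulation follows. How DEVA-ConvNeXt wires it into classification is not detailed in the abstract, so this is the generic form with assumed alpha/gamma defaults.

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target, alpha=0.75, gamma=2.0):
    """Varifocal Loss: positives are weighted by the soft target q, negatives by alpha * p^gamma."""
    p = pred_logits.sigmoid()
    pos = (target > 0).float()
    weight = pos * target + (1.0 - pos) * alpha * p.detach().pow(gamma)
    bce = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    return (weight * bce).sum()
```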

17 pages, 1722 KB  
Article
Detection in Road Crack Images Based on Sparse Convolution
by Yang Li, Xinhang Li, Ke Shen, Yacong Li, Dong Sui and Maozu Guo
Math. Comput. Appl. 2025, 30(6), 132; https://doi.org/10.3390/mca30060132 - 3 Dec 2025
Viewed by 408
Abstract
Ensuring the structural integrity of road infrastructure is vital for transportation safety and long-term sustainability. This study presents a lightweight and accurate pavement crack detection framework named SpcNet, which integrates a Sparse Encoding Module, ConvNeXt V2-based decoder, and a Binary Attention Module (BAM) within an asymmetric encoder–decoder architecture. The proposed method first applies a random masking strategy to generate sparse pixel inputs and employs sparse convolution to enhance computational efficiency. A ConvNeXt V2 decoder with Global Response Normalization (GRN) and GELU activation further stabilizes feature extraction, while the BAM, in conjunction with Channel and Spatial Attention Bridge (CAB/SAB) modules, strengthens global dependency modeling and multi-scale feature fusion. Comprehensive experiments on four public datasets demonstrate that SpcNet achieves state-of-the-art performance with significantly fewer parameters and lower computational cost. On the Crack500 dataset, the method achieves a precision of 91.0%, recall of 85.1%, F1 score of 88.0%, and mIoU of 79.8%, surpassing existing deep-learning-based approaches. These results confirm that SpcNet effectively balances detection accuracy and efficiency, making it well-suited for real-world pavement condition monitoring.
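The Global Response Normalization (GRN) layer used in the ConvNeXt V2 decoder has a compact definition; a sketch following the formulation in the ConvNeXt V2 paper (channels-last layout) looks like this.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization (ConvNeXt V2) for channels-last tensors of shape (N, H, W, C)."""
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))

    def forward(self, x):
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)      # global L2 response per channel
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)        # divisive normalization across channels
        return self.gamma * (x * nx) + self.beta + x            # recalibrate features, keep residual path
```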

20 pages, 1535 KB  
Article
ConvNeXt-Driven Detection of Alzheimer’s Disease: A Benchmark Study on Expert-Annotated AlzaSet MRI Dataset Across Anatomical Planes
by Mahdiyeh Basereh, Matthew Alexander Abikenari, Sina Sadeghzadeh, Trae Dunn, René Freichel, Prabha Siddarth, Dara Ghahremani, Helen Lavretsky and Vivek P. Buch
Diagnostics 2025, 15(23), 2997; https://doi.org/10.3390/diagnostics15232997 - 25 Nov 2025
Cited by 1 | Viewed by 585
Abstract
Background: Alzheimer’s disease (AD) is a leading worldwide cause of cognitive impairment, necessitating accurate, inexpensive diagnostic tools to enable early recognition. Methods: In this study, we present a robust deep learning approach for AD classification from structural MRI scans based on ConvNeXt, an emergent convolutional architecture inspired by vision transformers. We introduce AlzaSet, a clinically curated, neuroradiologist-labeled T1-weighted MRI dataset of 79 subjects (63 with Alzheimer’s disease [AD], 16 cognitively normal controls [NC]) acquired on a 1.5 T Siemens Aera in axial, coronal, and sagittal planes (12,947 slices in total). Results are reported per plane, with awareness of the class imbalance at the subject level. Three ConvNeXt sizes (Tiny, Small, Base) were compared and benchmarked against existing state-of-the-art CNN models (VGG16, VGG19, InceptionV3, DenseNet121). Results: ConvNeXt-Base consistently outperformed the other models on coronal slices with an accuracy of 98.37% and an AUC of 0.992. Coronal views were determined to be most diagnostically informative, with emphasis on visualization of the medial temporal lobe. Moreover, comparison with recent ensemble-based techniques showed superior performance with comparable computational efficiency. Conclusions: These results indicate that ConvNeXt-based models applied to clinically curated datasets have strong potential to provide scalable, real-time AD screening in both high-resource and resource-constrained settings.
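For readers who want to reproduce a comparable setup, torchvision's pretrained ConvNeXt-Base can be adapted to a two-class (AD vs. NC) head in a few lines; the weight choice and head replacement below are a plausible sketch, not the authors' training configuration.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ConvNeXt-Base and swap the final classifier layer for 2 classes.
model = models.convnext_base(weights=models.ConvNeXt_Base_Weights.IMAGENET1K_V1)
in_features = model.classifier[2].in_features          # last Linear layer of the classifier head
model.classifier[2] = nn.Linear(in_features, 2)        # AD vs. cognitively normal
```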

22 pages, 1557 KB  
Article
AI-Driven Damage Detection in Wind Turbines: Drone Imagery and Lightweight Deep Learning Approaches
by Ahmed Hamdi and Hassan N. Noura
Future Internet 2025, 17(11), 528; https://doi.org/10.3390/fi17110528 - 19 Nov 2025
Viewed by 604
Abstract
Wind power plays an increasingly vital role in sustainable energy production, yet the harsh environments in which turbines operate often lead to mechanical or structural degradation. Detecting such faults early is essential to reducing maintenance expenses and extending operational lifetime. In this work, we propose a deep learning-based image classification framework designed to assess turbine condition directly from drone-acquired imagery. Unlike object detection pipelines, which require locating specific damage regions, the proposed strategy focuses on recognizing global visual cues that indicate the overall turbine state. A comprehensive comparison is performed among several lightweight and transformer-based architectures, including MobileNetV3, ResNet, EfficientNet, ConvNeXt, ShuffleNet, ViT, DeiT, and DINOv2, to identify the most suitable model for real-time deployment. The MobileNetV3-Large network achieved the best trade-off between performance and efficiency, reaching 98.9% accuracy while maintaining a compact size of 5.4 million parameters. These results highlight the capability of compact CNNs to deliver accurate and efficient turbine monitoring, paving the way for autonomous, drone-based inspection solutions at the edge.
(This article belongs to the Special Issue Navigation, Deployment and Control of Intelligent Unmanned Vehicles)
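The reported 5.4 million parameters matches the scale of torchvision's MobileNetV3-Large; a rough sketch of loading it, checking the parameter count, and attaching a binary damaged/healthy head (class count and weight choice are assumptions) is:

```python
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.IMAGENET1K_V2)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")  # ~5.5M with the ImageNet head
model.classifier[3] = nn.Linear(model.classifier[3].in_features, 2)           # e.g. damaged vs. healthy
```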

22 pages, 9577 KB  
Article
YOLOv11-4ConvNeXtV2: Enhancing Persimmon Ripeness Detection Under Visual Challenges
by Bohan Zhang, Zhaoyuan Zhang and Xiaodong Zhang
AI 2025, 6(11), 284; https://doi.org/10.3390/ai6110284 - 1 Nov 2025
Viewed by 886
Abstract
Reliable and efficient detection of persimmons provides the foundation for precise maturity evaluation. Persimmon ripeness detection remains challenging due to small target sizes, frequent occlusion by foliage, and motion- or focus-induced blur that degrades edge information. This study proposes YOLOv11-4ConvNeXtV2, an enhanced detection framework that integrates a ConvNeXtV2 backbone with Fully Convolutional Masked Auto-Encoder (FCMAE) pretraining, Global Response Normalization (GRN), and Single-Head Self-Attention (SHSA) mechanisms. We present a comprehensive persimmon dataset featuring sub-block segmentation that preserves local structural integrity while expanding dataset diversity. The model was trained on 4921 annotated images (original 703 + 6 × 703 augmented) collected under diverse orchard conditions and optimized for 300 epochs using the Adam optimizer with early stopping. Comprehensive experiments demonstrate that YOLOv11-4ConvNeXtV2 achieves 95.9% precision and 83.7% recall, with mAP@0.5 of 88.4% and mAP@0.5:0.95 of 74.8%, outperforming state-of-the-art YOLO variants (YOLOv5n, YOLOv8n, YOLOv9t, YOLOv10n, YOLOv11n, YOLOv12n) by 3.8–6.3 percentage points in mAP@0.5:0.95. The model demonstrates superior robustness to blur, occlusion, and varying illumination conditions, making it suitable for deployment in challenging maturity detection environments.
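The FCMAE pretraining referenced above relies on randomly masking image patches; a simplified, purely illustrative sketch of such patch masking (the patch size and mask ratio are assumptions) is shown below.

```python
import torch

def random_patch_mask(images, patch=32, mask_ratio=0.6):
    """Zero out a random fraction of non-overlapping patches; images: (N, C, H, W), H and W divisible by patch."""
    n, _, h, w = images.shape
    keep = (torch.rand(n, 1, h // patch, w // patch, device=images.device) > mask_ratio).float()
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return images * mask, mask
```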

22 pages, 10489 KB  
Article
From Contemporary Datasets to Cultural Heritage Performance: Explainability and Energy Profiling of Visual Models Towards Textile Identification
by Evangelos Nerantzis, Lamprini Malletzidou, Eleni Kyratzopoulou, Nestor C. Tsirliganis and Nikolaos A. Kazakis
Heritage 2025, 8(11), 447; https://doi.org/10.3390/heritage8110447 - 24 Oct 2025
Viewed by 754
Abstract
The identification and classification of textiles play a crucial role in archaeometric studies, given their technological, economic, and cultural significance. Traditional textile analysis relies largely on optical microscopy and observation, while other microscopic, analytical, and spectroscopic techniques dominate fiber identification for compositional purposes. This protocol can be invasive and destructive for the artifacts under study, is time-consuming, and often relies on personal expertise. In this preliminary study, an alternative, macroscopic approach is proposed, based on texture and surface textile characteristics, using low-magnification images and deep learning models. Under this scope, a publicly available, imbalanced textile image dataset was used to pretrain and evaluate six computer vision architectures (ResNet50, EfficientNetV2, ViT, ConvNeXt, Swin Transformer, and MaxViT). In addition to accuracy, the energy efficiency and ecological footprint of the process were assessed using the CodeCarbon tool. The results indicate that Swin Transformer and EfficientNetV2 both deliver competitive accuracies together with low carbon emissions in comparison to the other transformer and hybrid models. This promising, sustainable, and non-invasive alternative for textile classification demonstrates the feasibility of developing a custom, heritage-based image dataset.
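Emissions accounting with CodeCarbon, as used in the study, wraps the training or evaluation loop in a tracker; a minimal usage sketch (the project name is illustrative) is:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="textile-classification")
tracker.start()
# ... train or evaluate a model here ...
emissions_kg = tracker.stop()   # estimated emissions for the tracked block, in kg CO2-eq
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```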

20 pages, 2565 KB  
Article
GBV-Net: Hierarchical Fusion of Facial Expressions and Physiological Signals for Multimodal Emotion Recognition
by Jiling Yu, Yandong Ru, Bangjun Lei and Hongming Chen
Sensors 2025, 25(20), 6397; https://doi.org/10.3390/s25206397 - 16 Oct 2025
Viewed by 1048
Abstract
A core challenge in multimodal emotion recognition lies in the precise capture of the inherent multimodal interactive nature of human emotions. Addressing the limitation of existing methods, which often process visual signals (facial expressions) and physiological signals (EEG, ECG, EOG, and GSR) in isolation and thus fail to exploit their complementary strengths effectively, this paper presents a new multimodal emotion recognition framework called the Gated Biological Visual Network (GBV-Net). This framework enhances emotion recognition accuracy through deep synergistic fusion of facial expressions and physiological signals. GBV-Net integrates three core modules: (1) a facial feature extractor based on a modified ConvNeXt V2 architecture incorporating lightweight Transformers, specifically designed to capture subtle spatio-temporal dynamics in facial expressions; (2) a hybrid physiological feature extractor combining 1D convolutions, Temporal Convolutional Networks (TCNs), and convolutional self-attention mechanisms, adept at modeling local patterns and long-range temporal dependencies in physiological signals; and (3) an enhanced gated attention fusion module capable of adaptively learning inter-modal weights to achieve dynamic, synergistic integration at the feature level. A thorough investigation of the publicly accessible DEAP and MAHNOB-HCI datasets reveals that GBV-Net surpasses contemporary methods. Specifically, on the DEAP dataset, the model attained classification accuracies of 95.10% for Valence and 95.65% for Arousal, with F1-scores of 95.52% and 96.35%, respectively. On MAHNOB-HCI, the accuracies achieved were 97.28% for Valence and 97.73% for Arousal, with F1-scores of 97.50% and 97.74%, respectively. These experimental findings substantiate that GBV-Net effectively captures deep-level interactive information between multimodal signals, thereby improving emotion recognition accuracy.
(This article belongs to the Section Biomedical Sensors)
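As a generic illustration of gated feature-level fusion (not GBV-Net's exact module), two modality embeddings can be combined with a learned sigmoid gate as follows; the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse a facial embedding and a physiological embedding via a learned sigmoid gate."""
    def __init__(self, dim_face, dim_physio, dim_out):
        super().__init__()
        self.proj_face = nn.Linear(dim_face, dim_out)
        self.proj_physio = nn.Linear(dim_physio, dim_out)
        self.gate = nn.Linear(dim_face + dim_physio, dim_out)

    def forward(self, face_feat, physio_feat):
        g = torch.sigmoid(self.gate(torch.cat([face_feat, physio_feat], dim=-1)))
        return g * self.proj_face(face_feat) + (1.0 - g) * self.proj_physio(physio_feat)
```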

22 pages, 3267 KB  
Article
A Comparative Evaluation of Meta-Learning Models for Few-Shot Chest X-Ray Disease Classification
by Luis-Carlos Quiñonez-Baca, Graciela Ramirez-Alonso, Fernando Gaxiola, Alain Manzo-Martinez, Raymundo Cornejo and David R. Lopez-Flores
Diagnostics 2025, 15(18), 2404; https://doi.org/10.3390/diagnostics15182404 - 21 Sep 2025
Viewed by 1154
Abstract
Background/Objectives: The limited availability of labeled data, particularly in the medical domain, poses a significant challenge for training accurate diagnostic models. While deep learning techniques have demonstrated notable efficacy in image-based tasks, they require large annotated datasets. In data-scarce scenarios—especially involving rare diseases—their performance deteriorates significantly. Meta-learning offers a promising alternative by enabling models to adapt quickly to new tasks using prior knowledge and only a few labeled examples. This study aims to evaluate the effectiveness of representative meta-learning models for thoracic disease classification in chest X-rays. Methods: We conduct a comparative evaluation of four meta-learning models: Prototypical Networks, Relation Networks, MAML, and FoMAML. First, we assess five backbone architectures (ConvNeXt, DenseNet-121, ResNet-50, MobileNetV2, and ViT) using a Prototypical Network. The best-performing backbone is then used across all meta-learning models for fair comparison. Experiments are performed on the ChestX-ray14 dataset under a 2-way setting with multiple k-shot configurations. Results: Prototypical Networks combined with DenseNet-121 achieved the best performance, with a recall of 68.1%, an F1-score of 67.4%, and a precision of 69.3% in the 2-way, 10-shot configuration. In a disease-specific analysis, Hernia obtains the best classification results. Furthermore, Prototypical and Relation Networks demonstrate significantly higher computational efficiency, requiring fewer FLOPs and shorter execution times than MAML and FoMAML. Conclusions: Prototype-based meta-learning, particularly with DenseNet-121, proves to be a robust and computationally efficient approach for few-shot chest X-ray disease classification. These findings highlight its potential for real-world clinical applications, especially in scenarios with limited annotated medical data.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
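The core of a Prototypical Network is small enough to sketch directly: class prototypes are the mean support embeddings, and queries are scored by negative Euclidean distance. This is a generic sketch, with the backbone embedding assumed to be computed elsewhere.

```python
import torch

def prototypical_logits(support_emb, support_lbl, query_emb, n_way):
    """support_emb: (n_support, d); support_lbl: (n_support,) with values in [0, n_way); query_emb: (n_query, d)."""
    prototypes = torch.stack([support_emb[support_lbl == c].mean(dim=0) for c in range(n_way)])
    return -torch.cdist(query_emb, prototypes)   # (n_query, n_way) logits; nearest prototype wins
```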

26 pages, 5592 KB  
Article
AGRI-YOLO: A Lightweight Model for Corn Weed Detection with Enhanced YOLO v11n
by Gaohui Peng, Kenan Wang, Jianqin Ma, Bifeng Cui and Dawei Wang
Agriculture 2025, 15(18), 1971; https://doi.org/10.3390/agriculture15181971 - 18 Sep 2025
Cited by 3 | Viewed by 1344
Abstract
Corn, as a globally significant food crop, faces substantial yield reductions due to competitive growth from weeds. Precise detection and efficient control of weeds are critical technical components for ensuring high and stable corn yields. Traditional deep learning object detection models generally suffer from issues such as large parameter counts and high computational complexity, making them unsuitable for deployment on resource-constrained devices such as agricultural drones and portable detection devices. Based on this, this paper proposes a lightweight corn weed detection model, AGRI-YOLO, based on the YOLO v11n architecture. First, the DWConv (Depthwise Separable Convolution) module from InceptionNeXt is introduced to reconstruct the C3k2 feature extraction module, enhancing the feature extraction capabilities for corn seedlings and weeds. Second, the ADown (Adaptive Downsampling) module replaces the Conv layer to address the issue of redundant model parameters. Third, the LADH (Lightweight Asymmetric Detection) head is adopted to achieve dynamic weight adjustment while ensuring multi-branch output optimization for target localization and classification precision. Experimental results show that the AGRI-YOLO model achieves a precision rate of 84.7%, a recall rate of 73.0%, and a mAP50 value of 82.8%. Compared to the baseline architecture YOLO v11n, these results are largely consistent, while the number of parameters, GFLOPs, and model size are reduced by 46.6%, 49.2%, and 42.31%, respectively. The AGRI-YOLO model significantly reduces model complexity while maintaining high recognition precision, providing technical support for deployment on resource-constrained edge devices, thereby promoting agricultural intelligence, maintaining ecological balance, and ensuring food security.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
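A standard depthwise separable convolution, of the kind the DWConv-based C3k2 redesign draws on, factorizes a convolution into a per-channel spatial filter followed by a 1x1 pointwise mix; a generic PyTorch sketch:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) spatial convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```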

26 pages, 3973 KB  
Article
ViT-DCNN: Vision Transformer with Deformable CNN Model for Lung and Colon Cancer Detection
by Aditya Pal, Hari Mohan Rai, Joon Yoo, Sang-Ryong Lee and Yooheon Park
Cancers 2025, 17(18), 3005; https://doi.org/10.3390/cancers17183005 - 15 Sep 2025
Cited by 2 | Viewed by 1155
Abstract
Background/Objectives: Lung and colon cancers remain among the most prevalent and fatal diseases worldwide, and their early detection is a serious challenge. The data used in this study was obtained from the Lung and Colon Cancer Histopathological Images Dataset, which comprises five different classes of image data, namely colon adenocarcinoma, colon normal, lung adenocarcinoma, lung normal, and lung squamous cell carcinoma, split into training (80%), validation (10%), and test (10%) subsets. In this study, we propose the ViT-DCNN (Vision Transformer with Deformable CNN) model, with the aim of improving cancer detection and classification using medical images. Methods: The combination of the ViT’s self-attention capabilities with deformable convolutions allows for improved feature extraction, while also enabling the model to learn both holistic contextual information as well as fine-grained localized spatial details. Results: On the test set, the model performed remarkably well, with an accuracy of 94.24%, an F1 score of 94.23%, recall of 94.24%, and precision of 94.37%, confirming its robustness in detecting cancerous tissues. Furthermore, our proposed ViT-DCNN model outperforms several state-of-the-art models, including ResNet-152, EfficientNet-B7, Swin Transformer, DenseNet-201, ConvNeXt, TransUNet, CNN-LSTM, MobileNetV3, and NASNet-A, across all major performance metrics. Conclusions: By using deep learning and advanced image analysis, this model enhances the efficiency of cancer detection, thus representing a valuable tool for radiologists and clinicians. This study demonstrates that the proposed ViT-DCNN model can reduce diagnostic inaccuracies and improve detection efficiency. Future work will focus on dataset enrichment and enhancing the model’s interpretability to evaluate its clinical applicability. This paper demonstrates the promise of artificial-intelligence-driven diagnostic models in transforming lung and colon cancer detection and improving patient diagnosis.
(This article belongs to the Special Issue Image Analysis and Machine Learning in Cancers: 2nd Edition)
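Deformable convolutions of the kind combined with the ViT here are available in torchvision; a minimal sketch in which a plain convolution predicts the sampling offsets (the layer sizes are placeholders) is:

```python
import torch.nn as nn
from torchvision import ops

class DeformableBlock(nn.Module):
    """3x3 deformable convolution; a plain conv predicts (dx, dy) offsets for each kernel tap."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = ops.DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset_conv(x))
```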

12 pages, 2536 KB  
Article
Interpreting Venous and Arterial Ulcer Images Through the Grad-CAM Lens: Insights and Implications in CNN-Based Wound Image Classification
by Hannah Neuwieser, Naga Venkata Sai Jitin Jami, Robert Johannes Meier, Gregor Liebsch, Oliver Felthaus, Silvan Klein, Stephan Schreml, Mark Berneburg, Lukas Prantl, Heike Leutheuser and Sally Kempa
Diagnostics 2025, 15(17), 2184; https://doi.org/10.3390/diagnostics15172184 - 28 Aug 2025
Viewed by 1206
Abstract
Background/Objectives: Chronic wounds of the lower extremities, particularly arterial and venous ulcers, represent a significant and costly challenge in medical care. To assist in differential diagnosis, we aim to evaluate various advanced deep-learning models for classifying arterial and venous ulcers and visualize their decision-making processes. Methods: A retrospective dataset of 607 images (198 arterial and 409 venous ulcers) was used to train five convolutional neural networks: ResNet50, ResNeXt50, ConvNeXt, EfficientNetB2, and EfficientNetV2. Model performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC. Grad-CAM was applied to visualize image regions contributing to classification decisions. Results: The models demonstrated high classification performance, with accuracy ranging from 72% (ConvNeXt) to 98% (ResNeXt50). Precision and recall values indicated strong discrimination between arterial and venous ulcers, with EfficientNetV2 achieving the highest precision. Conclusions: AI-assisted classification of venous and arterial ulcers offers a valuable method for enhancing diagnostic efficiency.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
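Grad-CAM itself follows a fixed recipe: gradients of the class score are average-pooled into channel weights, the weighted activation maps are summed, and the result is passed through a ReLU. A compact sketch for a single image (the target layer is chosen by the user) might look like this.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Return a normalized Grad-CAM heatmap for one image of shape (1, C, H, W)."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model.zero_grad()
    model(image)[0, class_idx].backward()
    h1.remove(); h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)            # GAP over the gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))   # weighted sum of activation maps
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```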

11 pages, 2134 KB  
Proceeding Paper
Determination of Anteroposterior and Posteroanterior Imaging Positions on Chest X-Ray Images Using Deep Learning
by Fatih Gökçimen, Alpaslan Burak İnner and Özgür Çakır
Eng. Proc. 2025, 104(1), 58; https://doi.org/10.3390/engproc2025104058 - 28 Aug 2025
Viewed by 1406
Abstract
This study proposes a deep learning framework to classify anteroposterior (AP) and posteroanterior (PA) chest X-ray projections automatically. Multiple convolutional neural networks (CNNs), including ResNet18, ResNet34, ResNet50, DenseNet121, EfficientNetV2-S, and ConvNeXt-Tiny, were utilized. The NIH Chest X-ray Dataset, with 112,120 images, was used with strict patient-wise splitting to prevent data leakage. ResNet34 achieved the highest performance: 99.65% accuracy, 0.9956 F1 score, and 0.9994 ROC-AUC. Grad-CAM visualized model decisions, and expert-reviewed misclassified samples were removed to enhance dataset quality. This methodology highlights the importance of robust preprocessing, model interpretability, and clinical applicability in radiographic view classification tasks.
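The patient-wise splitting described above is what prevents images from the same patient landing in both training and test sets; with scikit-learn this can be done with a group-aware splitter. The data below is a dummy stand-in for illustration only.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Dummy stand-ins: one row per image, with the patient ID as the grouping key.
image_ids = np.arange(10)
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
patient_ids = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

# All images that share a patient ID stay on the same side of the split.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(gss.split(image_ids, labels, groups=patient_ids))
```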

37 pages, 3806 KB  
Article
Comparative Evaluation of CNN and Transformer Architectures for Flowering Phase Classification of Tilia cordata Mill. with Automated Image Quality Filtering
by Bogdan Arct, Bartosz Świderski, Monika A. Różańska, Bogdan H. Chojnicki, Tomasz Wojciechowski, Gniewko Niedbała, Michał Kruk, Krzysztof Bobran and Jarosław Kurek
Sensors 2025, 25(17), 5326; https://doi.org/10.3390/s25175326 - 27 Aug 2025
Viewed by 1462
Abstract
Understanding and monitoring the phenological phases of trees is essential for ecological research and climate change studies. In this work, we present a comprehensive evaluation of state-of-the-art convolutional neural networks (CNNs) and transformer architectures for the automated classification of the flowering phase of Tilia cordata Mill. (small-leaved lime) based on a large set of real-world images acquired under natural field conditions. The study introduces a novel, automated image quality filtering approach using an XGBoost classifier trained on diverse exposure and sharpness features to ensure robust input data for subsequent deep learning models. Seven modern neural network architectures, including VGG16, ResNet50, EfficientNetB3, MobileNetV3 Large, ConvNeXt Tiny, Vision Transformer (ViT-B/16), and Swin Transformer Tiny, were fine-tuned and evaluated under a rigorous cross-validation protocol. All models achieved excellent performance, with cross-validated F1-scores exceeding 0.97 and balanced accuracy up to 0.993. The best results were obtained for ResNet50 and ConvNeXt Tiny (F1-score: 0.9879 ± 0.0077 and 0.9860 ± 0.0073, balanced accuracy: 0.9922 ± 0.0054 and 0.9927 ± 0.0042, respectively), indicating outstanding sensitivity and specificity for both flowering and non-flowering classes. Classical CNNs (VGG16, ResNet50, and ConvNeXt Tiny) demonstrated slightly superior robustness compared to transformer-based models, though all architectures maintained high generalization and minimal variance across folds. The integrated quality assessment and classification pipeline enables scalable, high-throughput monitoring of flowering phases in natural environments. The proposed methodology is adaptable to other plant species and locations, supporting future ecological monitoring and climate studies. Our key contributions are as follows: (i) introducing an automated exposure-quality filtering stage for field imagery; (ii) publishing a curated, season-long dataset of Tilia cordata images; and (iii) providing the first systematic cross-validated benchmark that contrasts classical CNNs with transformer architectures for phenological phase recognition.
(This article belongs to the Special Issue Application of UAV and Sensing in Precision Agriculture)
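The automated quality-filtering stage pairs simple exposure/sharpness descriptors with an XGBoost classifier; the features and hyperparameters below are illustrative stand-ins for the paper's actual feature set.

```python
import cv2
import numpy as np
from xgboost import XGBClassifier

def quality_features(path):
    """Three simple descriptors: brightness, contrast, and variance-of-Laplacian sharpness."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return np.array([gray.mean(), gray.std(), cv2.Laplacian(gray, cv2.CV_64F).var()])

# Fit on per-image feature rows labelled usable (1) / rejected (0), then filter the full set.
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
# clf.fit(X_train, y_train); keep_mask = clf.predict(X_all) == 1   # X_*, y_* are placeholders
```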
