Search Results (182)

Search Parameters:
Keywords = ResNeXt50

18 pages, 7321 KiB  
Article
Fault Diagnosis of Wind Turbine Gearbox Based on Mel Spectrogram and Improved ResNeXt50 Model
by Xiaojuan Zhang, Feixiang Jia and Yayu Chen
Appl. Sci. 2025, 15(15), 8563; https://doi.org/10.3390/app15158563 - 1 Aug 2025
Abstract
In response to the complex and variable loads on wind turbine gearbox bearings under working conditions, and to the limited amount of sound data that makes fault identification difficult, this study focuses on sound signals and proposes an intelligent diagnostic method based on deep learning. Adding the CBAM module to ResNeXt enhances the model’s attention to important features, and combining it with the ArcLoss loss function drives the model to learn more discriminative features, strengthening its generalization ability. We used a fine-tuning transfer learning strategy, transferring pre-trained parameters to the CBAM-ResNeXt50-ArcLoss model and training on Mel spectrograms extracted from the sound signals to extract and classify audio features of the wind turbine gearbox. Experimental validation on collected sound signals showed the method’s effectiveness and superiority: compared to CNN, ResNet50, ResNeXt50, and CBAM-ResNet50, the CBAM-ResNeXt50-ArcLoss model achieved improvements of 13.3, 3.6, 2.4, and 1.3, respectively. Comparison with classical algorithms demonstrated that the proposed method exhibits better diagnostic capability in classifying wind turbine gearbox sound signals. Full article
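
A minimal sketch of the pipeline this abstract describes: audio is converted to a Mel spectrogram and classified by a pretrained ResNeXt50 with a new head. The CBAM branch and ArcLoss term are omitted, and the sample rate, spectrogram settings, and class count are assumed placeholders, not values from the paper.

```python
import torch
import torchaudio
import torchvision

SAMPLE_RATE = 16_000      # assumed microphone sample rate
NUM_CLASSES = 4           # assumed number of gearbox health states

# Mel-spectrogram front end for the raw sound signal
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=256, n_mels=128
)
to_db = torchaudio.transforms.AmplitudeToDB()

# ImageNet-pretrained ResNeXt50 with the classifier head swapped out,
# mirroring the fine-tuning transfer learning strategy described above
model = torchvision.models.resnext50_32x4d(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)

waveform = torch.randn(1, SAMPLE_RATE)        # stand-in for 1 s of audio
spec = to_db(mel(waveform))                   # (1, 128, frames)
spec = spec.unsqueeze(0).repeat(1, 3, 1, 1)   # tile to 3 channels
logits = model(spec)                          # fault-class scores
```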

24 pages, 9593 KiB  
Article
Deep Learning Approaches for Skin Lesion Detection
by Jonathan Vieira, Fábio Mendonça and Fernando Morgado-Dias
Electronics 2025, 14(14), 2785; https://doi.org/10.3390/electronics14142785 - 10 Jul 2025
Viewed by 320
Abstract
Recently, there has been a rise in skin cancer cases, for which early detection is highly relevant, as it increases the likelihood of a cure. In this context, this work presents a benchmarking study of standard Convolutional Neural Network (CNN) architectures for automated skin lesion classification. A total of 38 CNN architectures from ten families (ConvNeXt, DenseNet, EfficientNet, Inception, InceptionResNet, MobileNet, NASNet, ResNet, VGG, and Xception) were evaluated using transfer learning on the HAM10000 dataset for seven-class skin lesion classification, namely, actinic keratoses, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, and vascular lesions. The comparative analysis used standardized training conditions, with all models utilizing frozen pre-trained weights. Cross-database validation was then conducted using the ISIC 2019 dataset to assess generalizability across different data distributions. The ConvNeXtXLarge architecture achieved the best performance, despite having one of the lowest performance-to-number-of-parameters ratios, with 87.62% overall accuracy and 76.15% F1 score on the test set, demonstrating competitive results within the established performance range of existing HAM10000-based studies. A proof-of-concept multiplatform mobile application was also implemented using a client–server architecture with encrypted image transmission, demonstrating the viability of integrating high-performing models into healthcare screening tools. Full article
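
The standardized protocol described here (frozen pre-trained weights, trainable head) can be sketched as below. torchvision offers ConvNeXt-Large rather than the ConvNeXtXLarge the paper names, so it stands in here; the optimizer and learning rate are also assumptions.

```python
import torch
import torchvision

# load an ImageNet-pretrained backbone and freeze every weight
model = torchvision.models.convnext_large(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False

# replace the classifier with a trainable 7-way layer (HAM10000 classes)
in_features = model.classifier[2].in_features
model.classifier[2] = torch.nn.Linear(in_features, 7)

# only the new head receives gradient updates
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```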

17 pages, 1532 KiB  
Article
RADAI: A Deep Learning-Based Classification of Lung Abnormalities in Chest X-Rays
by Hanan Aljuaid, Hessa Albalahad, Walaa Alshuaibi, Shahad Almutairi, Tahani Hamad Aljohani, Nazar Hussain and Farah Mohammad
Diagnostics 2025, 15(13), 1728; https://doi.org/10.3390/diagnostics15131728 - 7 Jul 2025
Viewed by 505
Abstract
Background: Chest X-rays are rapidly gaining prominence as a prevalent diagnostic tool, as recognized by the World Health Organization (WHO). However, interpreting chest X-rays can be demanding and time-consuming, even for experienced radiologists, leading to potential misinterpretations and delays in treatment. Method: The purpose of this research is the development of the RadAI model, which can accurately detect four types of lung abnormalities in chest X-rays and generate a report on each identified abnormality. Deep learning algorithms, particularly convolutional neural networks (CNNs), have demonstrated remarkable potential in automating medical image analysis, including chest X-rays. This work addresses the challenge of chest X-ray interpretation by fine-tuning the following three advanced deep learning models: Feature-selective and Spatial Receptive Fields Network (FSRFNet50), ResNeXt50, and ResNet50. These models are compared based on accuracy, precision, recall, and F1-score. Results: The outstanding performance of RadAI shows its potential to assist radiologists in interpreting detected chest abnormalities accurately. Conclusions: RadAI helps enhance the accuracy and efficiency of chest X-ray interpretation, ultimately supporting the timely and reliable diagnosis of lung abnormalities. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
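
The four comparison metrics named in the abstract can be computed as in this small sketch; the label arrays are hypothetical stand-ins for the four abnormality classes.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

y_true = [0, 1, 2, 3, 1, 0]   # ground-truth abnormality labels (toy data)
y_pred = [0, 1, 2, 2, 1, 0]   # model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
```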

23 pages, 7163 KiB  
Article
Entropy-Regularized Attention for Explainable Histological Classification with Convolutional and Hybrid Models
by Pedro L. Miguel, Leandro A. Neves, Alessandra Lumini, Giuliano C. Medalha, Guilherme F. Roberto, Guilherme B. Rozendo, Adriano M. Cansian, Thaína A. A. Tosta and Marcelo Z. do Nascimento
Entropy 2025, 27(7), 722; https://doi.org/10.3390/e27070722 - 3 Jul 2025
Viewed by 395
Abstract
Deep learning models such as convolutional neural networks (CNNs) and vision transformers (ViTs) perform well in histological image classification, but often lack interpretability. We introduce a unified framework that adds an attention branch and CAM Fostering, an entropy-based regularizer, to improve Grad-CAM visualizations. Six backbone architectures (ResNet-50, DenseNet-201, EfficientNet-b0, ResNeXt-50, ConvNeXt, CoatNet-small) were trained, with and without our modifications, on five H&E-stained datasets. We measured explanation quality using coherence, complexity, confidence drop, and their harmonic mean (ADCC). Our method increased the ADCC in five of the six backbones; ResNet-50 saw the largest gain (+15.65%), and CoatNet-small achieved the highest overall score (+2.69%), peaking at 77.90% on the non-Hodgkin lymphoma set. The classification accuracy remained stable or improved in four models. These results show that combining attention and entropy produces clearer, more informative heatmaps without degrading performance. Our contributions include a modular architecture for both convolutional and hybrid models and a comprehensive, quantitative explainability evaluation suite. Full article
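
One plausible reading of an entropy-based regularizer like CAM Fostering is sketched below: the activation map is treated as a spatial distribution and its Shannon entropy enters the training loss. The sign convention and weighting are assumptions, not the paper's exact formulation.

```python
import torch

def cam_entropy(cam: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean Shannon entropy of a batch of activation maps, shape (B, H, W)."""
    p = cam.clamp_min(0).flatten(1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)   # normalize to a distribution
    return -(p * (p + eps).log()).sum(dim=1).mean()

# assumed composite objective: classification loss plus an entropy bonus
# total_loss = ce_loss - 0.1 * cam_entropy(cam)   # weight 0.1 is illustrative
```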

27 pages, 5780 KiB  
Article
Utilizing GCN-Based Deep Learning for Road Extraction from Remote Sensing Images
by Yu Jiang, Jiasen Zhao, Wei Luo, Bincheng Guo, Zhulin An and Yongjun Xu
Sensors 2025, 25(13), 3915; https://doi.org/10.3390/s25133915 - 23 Jun 2025
Viewed by 523
Abstract
The technology of road extraction serves as a crucial foundation for urban intelligent renewal and green sustainable development. Its outcomes can optimize transportation network planning, reduce resource waste, and enhance urban resilience. Deep learning-based approaches have demonstrated outstanding performance in road extraction, particularly excelling in complex scenarios. However, extracting roads from remote sensing data remains challenging due to several factors that limit accuracy: (1) Roads often share similar visual features with the background, such as rooftops and parking lots, leading to ambiguous inter-class distinctions; (2) Roads in complex environments, such as those occluded by shadows or trees, are difficult to detect. To address these issues, this paper proposes an improved model based on Graph Convolutional Networks (GCNs), named FR-SGCN (Hierarchical Depth-wise Separable Graph Convolutional Network Incorporating Graph Reasoning and Attention Mechanisms). The model is designed to enhance the precision and robustness of road extraction through intelligent techniques, thereby supporting precise planning of green infrastructure. First, high-dimensional features are extracted using ResNeXt, whose grouped convolution structure balances parameter efficiency and feature representation capability, significantly enhancing the expressiveness of the data. These high-dimensional features are then segmented, and enhanced channel and spatial features are obtained via attention mechanisms, effectively mitigating background interference and intra-class ambiguity. Subsequently, a hybrid adjacency matrix construction method is proposed, based on gradient operators and graph reasoning. This method integrates similarity and gradient information and employs graph convolution to capture the global contextual relationships among features. To validate the effectiveness of FR-SGCN, we conducted comparative experiments using 12 different methods on both a self-built dataset and a public dataset. The proposed model achieved the highest F1 score on both datasets. Visualization results from the experiments demonstrate that the model effectively extracts occluded roads and reduces the risk of redundant construction caused by data errors during urban renewal. This provides reliable technical support for smart cities and sustainable development. Full article
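
At the core of any GCN-based model such as FR-SGCN is the symmetric-normalized graph convolution (Kipf–Welling form), sketched here on toy data; the paper's hybrid adjacency construction and attention branches are omitted.

```python
import torch

def gcn_layer(x: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor):
    """x: (N, F) node features, adj: (N, N) adjacency, weight: (F, F_out)."""
    a_hat = adj + torch.eye(adj.size(0))            # add self-loops
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)         # D^(-1/2)
    norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
    return torch.relu(norm @ x @ weight)            # relu(D^-1/2 A D^-1/2 X W)

x = torch.randn(5, 16)                       # 5 nodes, 16-d features (toy)
adj = (torch.rand(5, 5) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()          # symmetrize the toy graph
out = gcn_layer(x, adj, torch.randn(16, 8))  # (5, 8) updated node features
```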

17 pages, 6780 KiB  
Article
A Metric Learning-Based Improved Oriented R-CNN for Wildfire Detection in Power Transmission Corridors
by Xiaole Wang, Bo Wang, Peng Luo, Leixiong Wang and Yurou Wu
Sensors 2025, 25(13), 3882; https://doi.org/10.3390/s25133882 - 22 Jun 2025
Viewed by 358
Abstract
Wildfire detection in power transmission corridors is essential for providing timely warnings and ensuring the safe and stable operation of power lines. However, this task faces significant challenges due to the large number of smoke-like samples in the background, the complex and diverse target morphologies, and the difficulty of detecting small-scale smoke and flame objects. To address these issues, this paper proposes an improved Oriented R-CNN model enhanced with metric learning for wildfire detection in power transmission corridors. Specifically, a multi-center metric loss (MCM-Loss) module based on metric learning was introduced to enhance the model’s ability to differentiate features of similar targets, thereby improving the recognition accuracy in the presence of interference. Experimental results showed that the introduction of the MCM-Loss module increased the average precision (AP) for smoke targets by 2.7%. In addition, the group convolution-based network ResNeXt was adopted to replace the original backbone network ResNet, broadening the channel dimensions of the feature extraction network and enhancing the model’s capability to detect flame and smoke targets with diverse morphologies. This substitution led to a 0.6% improvement in mean average precision (mAP). Furthermore, an FPN-CARAFE module was designed by incorporating the content-aware up-sampling operator CARAFE, which improved multi-scale feature representation and significantly boosted performance in detecting small targets. In particular, the proposed FPN-CARAFE module improved the AP for fire targets by 8.1%. Experimental results demonstrated that the proposed model achieved superior performance in wildfire detection within power transmission corridors, achieving a mAP of 90.4% on the test dataset, an improvement of 6.4% over the baseline model. Compared with other commonly used object detection algorithms, the model developed in this study exhibited improved detection performance on the test dataset, offering research support for wildfire monitoring in power transmission corridors. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)
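
A plausible form of a multi-center metric loss is sketched below: each class owns several learnable centers, and an embedding is pulled toward the nearest center of its own class, which would help separate smoke from smoke-like background. This is an illustrative reconstruction, not the paper's exact MCM-Loss.

```python
import torch

class MultiCenterLoss(torch.nn.Module):
    def __init__(self, num_classes: int, centers_per_class: int, dim: int):
        super().__init__()
        self.centers = torch.nn.Parameter(
            torch.randn(num_classes, centers_per_class, dim))

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        own = self.centers[labels]                      # (B, K) own-class centers
        dist = (emb.unsqueeze(1) - own).pow(2).sum(-1)  # squared distance to each
        return dist.min(dim=1).values.mean()            # pull to nearest center

loss_fn = MultiCenterLoss(num_classes=2, centers_per_class=3, dim=128)
loss = loss_fn(torch.randn(8, 128), torch.randint(0, 2, (8,)))
```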

25 pages, 8202 KiB  
Article
Research on Identification Method of Transformer Windings’ Loose Vibration Spectrum Considering a Multi-Load Current Condition
by Jin Fang, Xudong Deng, Yuancan Xia, Chen Wu, Yuehua Li, Xin Li, Kaixin Chen, Fan Wang and Zhanlong Zhang
Appl. Sci. 2025, 15(12), 6949; https://doi.org/10.3390/app15126949 - 19 Jun 2025
Viewed by 507
Abstract
During transformer operation, long-term vibration causes the winding to loosen axially, and when subjected to a short circuit, the loosened winding deforms to varying extents. Identifying early looseness faults in transformer windings is therefore vital for power system stability. To address issues including scarce vibration data across multiple load conditions for transformer winding looseness faults, inadequate extraction of two-dimensional spectrogram features, and the inability to boost recognition accuracy caused by overfitting during fault recognition model training, this study constructed a 10 kV power transformer vibration test platform. It measured the vibration signals on the box surface under various winding looseness conditions and built a time–frequency-domain vibration spectrum library for different load currents. A fault identification model based on vibration spectra and ConvNeXt was then constructed, and model verification and analysis were carried out. The results indicate that, after training, fault recognition accuracy on spectra covering three load conditions is comparable to that obtained under a single load condition. The average recognition accuracy at six box-surface measuring points reaches 97.9%, and the ConvNeXt model outperforms the traditional ResNet50 by 1.2%. This new model effectively addresses overfitting and offers strong technical support for detecting different transformer winding looseness faults. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
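
A short sketch of the kind of time–frequency spectrogram such a model consumes, built here with a short-time Fourier transform; the sampling rate, window length, and toy signal are placeholders rather than the paper's acquisition settings.

```python
import numpy as np
from scipy import signal

fs = 8_000                                  # assumed accelerometer rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
vib = np.sin(2 * np.pi * 100 * t) + 0.3 * np.random.randn(t.size)  # toy signal

f, frames, Zxx = signal.stft(vib, fs=fs, nperseg=512)
spectrogram = 20 * np.log10(np.abs(Zxx) + 1e-12)    # dB image for the CNN
```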

23 pages, 8979 KiB  
Article
Beef Carcass Grading with EfficientViT: A Lightweight Vision Transformer Approach
by Hyunwoo Lim and Eungyeol Song
Appl. Sci. 2025, 15(11), 6302; https://doi.org/10.3390/app15116302 - 4 Jun 2025
Viewed by 781
Abstract
Beef carcass grading plays a pivotal role in determining market value and consumer preferences. While traditional visual inspection by experts remains the industry standard, it suffers from subjectivity and inconsistencies, particularly in high-throughput slaughterhouse environments. To address these limitations, we propose a one-stage automated grading model based on EfficientViT, a lightweight vision transformer architecture. Unlike conventional two-stage methods that require prior segmentation of the loin region, our model directly predicts beef quality grades from raw RGB images, significantly simplifying the pipeline and reducing computational overhead. We evaluate the proposed model against representative convolutional neural networks (VGG-16, ResNeXt-50, DenseNet-121) as well as two-stage combinations of segmentation and classification models. Experiments were conducted on a publicly available beef carcass dataset consisting of over 77,000 labeled images. EfficientViT achieves the highest accuracy (98.46%) and F1-score (0.9867) among all evaluated models while maintaining low inference latency (3.92 ms) and compact parameter size (36.4 MB). In particular, it outperforms CNNs in predicting the top grade (1++), where global visual patterns such as marbling distribution are crucial. Furthermore, we employ Grad-CAM and attention map visualizations to analyze the model’s focus regions and demonstrate that EfficientViT captures holistic contextual features better than CNNs. The model also exhibits robustness across varying loin area proportions. Our findings suggest that EfficientViT is not only accurate but also efficient and interpretable, making it a strong candidate for real-time industrial applications in beef quality grading. Full article
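
The Grad-CAM visualization the authors use to compare focus regions can be sketched as below. torchvision has no EfficientViT, so a ResNet50 and its last convolutional stage stand in, and the input is random.

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
feats, grads = {}, {}
layer = model.layer4   # last conv stage; the layer choice is an assumption
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)            # stand-in carcass image
score = model(x)[0].max()                  # top-class logit
score.backward()

w = grads["a"].mean(dim=(2, 3), keepdim=True)   # channel importance weights
cam = torch.relu((w * feats["a"]).sum(dim=1))   # (1, 7, 7) raw heatmap
cam = cam / (cam.max() + 1e-8)                  # normalize to [0, 1]
```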

18 pages, 3721 KiB  
Article
Haptic–Vision Fusion for Accurate Position Identification in Robotic Multiple Peg-in-Hole Assembly
by Jinlong Chen, Deming Luo, Zhigang Xiao, Minghao Yang, Xingguo Qin and Yongsong Zhan
Electronics 2025, 14(11), 2163; https://doi.org/10.3390/electronics14112163 - 26 May 2025
Viewed by 499
Abstract
Multi-peg-hole assembly is a fundamental process in robotic manufacturing, particularly for circular aviation electrical connectors (CAECs) that require precise axial alignment. However, CAEC assembly poses significant challenges due to small apertures, posture disturbances, and the need for high error tolerance. This paper proposes a dual-stream Siamese network (DSSN) framework that fuses visual and tactile modalities to achieve accurate position identification in six-degree-of-freedom robotic connector assembly tasks. The DSSN employs ConvNeXt for visual feature extraction and SE-ResNet-50 with integrated attention mechanisms for tactile feature extraction, while a gated attention module adaptively fuses multimodal features. A bidirectional long short-term memory (Bi-LSTM) recurrent neural network is introduced to jointly model spatiotemporal deviations in position and orientation. Compared with state-of-the-art methods, the proposed DSSN achieves improvements of approximately 7.4%, 5.7%, and 5.4% in assembly success rates after 1, 5, and 10 buckling iterations, respectively. Experimental results validate that the integration of multimodal adaptive fusion and sequential spatiotemporal learning enables robust and precise robotic connector assembly under high-tolerance conditions. Full article
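
The gated attention fusion described here is commonly realized as a learned per-feature soft selection between the two streams, as in this sketch; the embedding size and exact gating form are assumptions.

```python
import torch

class GatedFusion(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = torch.nn.Linear(2 * dim, dim)

    def forward(self, vis: torch.Tensor, tac: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([vis, tac], dim=-1)))
        return g * vis + (1 - g) * tac   # weight visual vs. tactile per feature

fusion = GatedFusion(dim=256)
fused = fusion(torch.randn(8, 256), torch.randn(8, 256))  # (8, 256) features
```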

19 pages, 5522 KiB  
Article
Performance of Fine-Tuning Techniques for Multilabel Classification of Surface Defects in Reinforced Concrete Bridges
by Benyamin Pooraskarparast, Son N. Dang, Vikram Pakrashi and José C. Matos
Appl. Sci. 2025, 15(9), 4725; https://doi.org/10.3390/app15094725 - 24 Apr 2025
Cited by 1 | Viewed by 625
Abstract
Machine learning models often face challenges in bridge inspections, especially in handling complex surface features and overlapping defects that make accurate classification difficult. These challenges are common for image-based monitoring, which has become increasingly popular for inspecting and assessing the structural condition of reinforced concrete bridges with automated possibilities. Despite advances in defect detection using convolutional neural networks (CNNs), challenges such as overlapping defects, complex surface textures, and data imbalance remain; full fine-tuning of deep learning models helps them better adapt to these conditions by updating all layers for domain-specific learning. The aim of this study is to demonstrate how fine-tuning several deep learning architectures for bridge damage classification yields robust performance, and how these methods can best be utilized. Six CNN architectures, ResNet-18, ResNet-50, ResNet-101, ResNeXt-50, ResNeXt-101 and EfficientNet-B3, were fine-tuned using the CODEBRIM dataset. Their performance was evaluated using Precision, Recall, F1 Score, Balanced Accuracy and AUC-ROC metrics to ensure a robust evaluation framework. The results indicate that the EfficientNet-B3 and ResNeXt-101 models outperformed the other models and achieved the highest classification accuracy across all defect categories. EfficientNet-B3 showed the best-balanced Precision (0.935) and perfect Recall (1.000) in background classification, indicating its ability to distinguish defect-free areas from structural damage. These results highlight the potential of these models to improve automated bridge inspection systems, increasing accuracy and efficiency in real-world applications, and provide guidance for selecting a method based on whether accuracy or overall consistency matters more for a specific application. Full article
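
Full fine-tuning, in contrast to frozen-feature transfer, keeps every pre-trained layer trainable; a common refinement is a smaller learning rate for the pre-trained body than for the new head. The optimizer, rates, and six-class multilabel head below are illustrative assumptions, not the paper's settings.

```python
import torch
import torchvision

model = torchvision.models.resnext101_32x8d(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 6)   # assumed CODEBRIM classes

# nothing is frozen; the pre-trained body just moves more slowly than the head
optimizer = torch.optim.AdamW([
    {"params": (p for n, p in model.named_parameters()
                if not n.startswith("fc")), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
criterion = torch.nn.BCEWithLogitsLoss()   # multilabel surface defects
```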

28 pages, 4033 KiB  
Article
Advancing Prostate Cancer Diagnostics: A ConvNeXt Approach to Multi-Class Classification in Underrepresented Populations
by Declan Ikechukwu Emegano, Mubarak Taiwo Mustapha, Ilker Ozsahin, Dilber Uzun Ozsahin and Berna Uzun
Bioengineering 2025, 12(4), 369; https://doi.org/10.3390/bioengineering12040369 - 1 Apr 2025
Cited by 2 | Viewed by 737
Abstract
Prostate cancer is a leading cause of cancer-related morbidity and mortality worldwide, with diagnostic challenges magnified in underrepresented regions like sub-Saharan Africa. This study introduces a novel application of ConvNeXt, an advanced convolutional neural network architecture, for multi-class classification of prostate histopathological images into normal, benign, and malignant categories. The dataset, sourced from a tertiary healthcare institution in Nigeria, represents a typically underserved African population, addressing critical disparities in global diagnostic research. We also used the ProstateX dataset (2017) from The Cancer Imaging Archive (TCIA) to validate our results. A comprehensive pipeline was developed, leveraging advanced data augmentation, Grad-CAM for interpretability, and an ablation study to enhance model optimization and robustness. The ConvNeXt model achieved an accuracy of 98%, surpassing the performance of traditional CNNs (ResNet50, 93%; EfficientNet, 94%; DenseNet, 92%) and transformer-based models (ViT, 88%; CaiT, 86%; Swin Transformer, 95%; RegNet, 94%). On the ProstateX dataset, the ConvNeXt model achieved validation results of 87.2% accuracy, 85.7% recall, an 86.4% F1 score, and an AUC of 0.92. Its hybrid architecture combines the strengths of CNNs and transformers, enabling superior feature extraction. Grad-CAM visualizations further enhance explainability, bridging the gap between computational predictions and clinical trust. Ablation studies demonstrated the contributions of data augmentation, optimizer selection, and learning rate tuning to model performance, highlighting its robustness and adaptability for deployment in low-resource settings. This study advances equitable health care by addressing the lack of regional representation in diagnostic datasets and employing a clinically aligned three-class classification approach. Combining high performance, interpretability, and scalability, this work establishes a foundation for future research on diverse and underrepresented populations, fostering global inclusivity in cancer diagnostics. Full article
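
The advanced data augmentation the pipeline leverages typically looks like the sketch below; the specific transforms and magnitudes are illustrative choices, not the paper's exact list.

```python
import torchvision.transforms as T

train_tf = T.Compose([
    T.Resize((224, 224)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(15),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])
```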

26 pages, 4369 KiB  
Article
Encoder–Decoder Variant Analysis for Semantic Segmentation of Gastrointestinal Tract Using UW-Madison Dataset
by Neha Sharma, Sheifali Gupta, Dalia H. Elkamchouchi and Salil Bharany
Bioengineering 2025, 12(3), 309; https://doi.org/10.3390/bioengineering12030309 - 18 Mar 2025
Cited by 1 | Viewed by 846
Abstract
The gastrointestinal (GI) tract, an integral part of the digestive system, absorbs nutrients from ingested food, starting from the mouth to the anus. GI tract cancer significantly impacts global health, necessitating precise treatment methods. Radiation oncologists use X-ray beams to target tumors while avoiding the stomach and intestines, making the accurate segmentation of these organs crucial. This research explores various combinations of encoders and decoders to segment the small bowel, large bowel, and stomach in MRI images, using the UW-Madison GI tract dataset consisting of 38,496 scans. Encoders tested include ResNet50, EfficientNetB1, MobileNetV2, ResNeXt50, and Timm_Gernet_S, paired with decoders UNet, FPN, PSPNet, PAN, and DeepLab V3+. The study identifies ResNet50 with DeepLab V3+ as the most effective combination, assessed using the Dice coefficient, Jaccard index, and model loss. The proposed model, a combination of DeepLab V3+ and ResNet50, obtained a Dice value of 0.9082, an IoU value of 0.8796, and a model loss of 0.117. The findings demonstrate the method’s potential to improve radiation therapy for GI cancer, aiding radiation oncologists in accurately targeting tumors while avoiding healthy organs. The results of this study will assist healthcare professionals involved in biomedical image analysis. Full article
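
The encoder–decoder grid search described here maps naturally onto the segmentation_models_pytorch API, which lets any supported encoder back any decoder; whether the authors used this library is an assumption, but the winning combination would be instantiated roughly as:

```python
import segmentation_models_pytorch as smp

# best-performing pairing reported: DeepLab V3+ decoder on a ResNet50 encoder
model = smp.DeepLabV3Plus(
    encoder_name="resnet50",
    encoder_weights="imagenet",
    in_channels=1,     # single-channel MRI slices (assumed input format)
    classes=3,         # small bowel, large bowel, stomach
)
```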

18 pages, 976 KiB  
Article
TipSegNet: Fingertip Segmentation in Contactless Fingerprint Imaging
by Laurenz Ruzicka, Bernhard Kohn and Clemens Heitzinger
Sensors 2025, 25(6), 1824; https://doi.org/10.3390/s25061824 - 14 Mar 2025
Cited by 1 | Viewed by 976
Abstract
Contactless fingerprint recognition systems offer a hygienic, user-friendly, and efficient alternative to traditional contact-based methods. However, their accuracy heavily relies on precise fingertip detection and segmentation, particularly under challenging background conditions. This paper introduces TipSegNet, a novel deep learning model that achieves state-of-the-art performance in segmenting fingertips directly from grayscale hand images. TipSegNet leverages a ResNeXt-101 backbone for robust feature extraction, combined with a Feature Pyramid Network (FPN) for multi-scale representation, enabling accurate segmentation across varying finger poses and image qualities. Furthermore, we employ an extensive data augmentation strategy to enhance the model’s generalizability and robustness. This model was trained and evaluated using a combined dataset of 2257 labeled hand images. TipSegNet outperforms existing methods, achieving a mean intersection over union (mIoU) of 0.987 and an accuracy of 0.999, representing a significant advancement in contactless fingerprint segmentation. This enhanced accuracy has the potential to substantially improve the reliability and effectiveness of contactless biometric systems in real-world applications. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
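
The headline metric, mean intersection over union, reduces to a few lines for binary fingertip masks; the arrays here are random stand-ins.

```python
import numpy as np

def miou(preds: np.ndarray, gts: np.ndarray, eps: float = 1e-8) -> float:
    """preds, gts: (N, H, W) boolean mask stacks -> mean IoU over N images."""
    inter = np.logical_and(preds, gts).sum(axis=(1, 2))
    union = np.logical_or(preds, gts).sum(axis=(1, 2))
    return float(((inter + eps) / (union + eps)).mean())

pred = np.random.rand(4, 64, 64) > 0.5   # toy predicted masks
gt = np.random.rand(4, 64, 64) > 0.5     # toy ground-truth masks
print(miou(pred, gt))
```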

17 pages, 8138 KiB  
Article
Deep Learning Models Based on Pretreatment MRI and Clinicopathological Data to Predict Responses to Neoadjuvant Systemic Therapy in Triple-Negative Breast Cancer
by Zhan Xu, Zijian Zhou, Jong Bum Son, Haonan Feng, Beatriz E. Adrada, Tanya W. Moseley, Rosalind P. Candelaria, Mary S. Guirguis, Miral M. Patel, Gary J. Whitman, Jessica W. T. Leung, Huong T. C. Le-Petross, Rania M. Mohamed, Bikash Panthi, Deanna L. Lane, Huiqin Chen, Peng Wei, Debu Tripathy, Jennifer K. Litton, Vicente Valero, Lei Huo, Kelly K. Hunt, Anil Korkut, Alastair Thompson, Wei Yang, Clinton Yam, Gaiane M. Rauch and Jingfei Ma
Cancers 2025, 17(6), 966; https://doi.org/10.3390/cancers17060966 - 13 Mar 2025
Cited by 4 | Viewed by 1486
Abstract
Purpose: To develop deep learning models for predicting the pathologic complete response (pCR) to neoadjuvant systemic therapy (NAST) in patients with triple-negative breast cancer (TNBC) based on pretreatment multiparametric breast MRI and clinicopathological data. Methods: The prospective institutional review board-approved study [NCT02276443] included 282 patients with stage I–III TNBC who had multiparametric breast MRI at baseline and underwent NAST and surgery during 2016–2021. Dynamic contrast-enhanced MRI (DCE), diffusion-weighted imaging (DWI), and clinicopathological data were used for the model development and internal testing. Data from the I-SPY 2 trial (2010–2016) were used for external testing. Four variables with a potential impact on model performance were systematically investigated: 3D model frameworks, tumor volume preprocessing, tumor ROI selection, and data inputs. Results: Forty-eight models with different variable combinations were investigated. The best-performing model in the internal testing dataset used DCE, DWI, and clinicopathological data with the originally contoured tumor volume, the tight bounding box of the tumor mask, and ResNeXt50, and achieved an area under the receiver operating characteristic curve (AUC) of 0.76 (95% CI: 0.60–0.88). The best-performing models in the external testing dataset achieved an AUC of 0.72 (95% CI: 0.57–0.84) using only DCE images (originally contoured tumor volume, enlarged bounding box of tumor mask, and ResNeXt50) and an AUC of 0.72 (95% CI: 0.56–0.86) using only DWI images (originally contoured tumor volume, enlarged bounding box of tumor mask, and ResNet18). Conclusions: We developed 3D deep learning models based on pretreatment data that could predict pCR to NAST in TNBC patients. Full article
(This article belongs to the Special Issue Advances in Triple-Negative Breast Cancer)
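
One of the systematically varied inputs, the tight (or enlarged) bounding box of the tumor mask, can be derived from a 3D segmentation as sketched here; the margin used for the enlarged variant is an assumed parameter.

```python
import numpy as np

def bbox_3d(mask: np.ndarray, margin: int = 0):
    """mask: (Z, Y, X) boolean tumor mask -> slices covering it plus margin."""
    bounds = []
    for ax in range(3):
        proj = mask.any(axis=tuple(i for i in range(3) if i != ax))
        idx = np.where(proj)[0]
        lo = max(int(idx[0]) - margin, 0)
        hi = min(int(idx[-1]) + 1 + margin, mask.shape[ax])
        bounds.append(slice(lo, hi))
    return tuple(bounds)

mask = np.zeros((32, 64, 64), dtype=bool)
mask[10:14, 20:30, 25:40] = True          # toy tumor volume
roi = bbox_3d(mask, margin=2)             # "enlarged" box, 2-voxel margin
```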

30 pages, 34873 KiB  
Article
Text-Guided Synthesis in Medical Multimedia Retrieval: A Framework for Enhanced Colonoscopy Image Classification and Segmentation
by Ojonugwa Oluwafemi Ejiga Peter, Opeyemi Taiwo Adeniran, Adetokunbo MacGregor John-Otumu, Fahmi Khalifa and Md Mahmudur Rahman
Algorithms 2025, 18(3), 155; https://doi.org/10.3390/a18030155 - 9 Mar 2025
Cited by 1 | Viewed by 1382
Abstract
The lack of extensive, varied, and thoroughly annotated datasets impedes the advancement of artificial intelligence (AI) for medical applications, especially colorectal cancer detection. Models trained with limited diversity often display biases, especially when utilized on disadvantaged groups. Generative models (e.g., DALL-E 2, Vector-Quantized Generative Adversarial Network (VQ-GAN)) have been used to generate images but not colonoscopy data for intelligent data augmentation. This study developed an effective method for producing synthetic colonoscopy image data, which can be used to train advanced medical diagnostic models for robust colorectal cancer detection and treatment. Text-to-image synthesis was performed using fine-tuned Visual Large Language Models (LLMs). Stable Diffusion and DreamBooth Low-Rank Adaptation produce images that look authentic, with an average Inception score of 2.36 across three datasets. The validation accuracies of the classification models Big Transfer (BiT), Fixed Resolution Residual Next Generation Network (FixResNeXt), and Efficient Neural Network (EfficientNet) were 92%, 91%, and 86%, respectively; Vision Transformer (ViT) and Data-Efficient Image Transformers (DeiT) each reached 93%. Secondly, for the segmentation of polyps, ground-truth masks were generated using the Segment Anything Model (SAM). Then, five segmentation models (U-Net, Pyramid Scene Parsing Network (PSPNet), Feature Pyramid Network (FPN), Link Network (LinkNet), and Multi-scale Attention Network (MANet)) were adopted. FPN produced excellent results, with an Intersection over Union (IoU) of 0.64, an F1 score of 0.78, a recall of 0.75, and a Dice coefficient of 0.77. This demonstrates strong performance in terms of both segmentation accuracy and overlap metrics, with particularly robust results in balanced detection capability, as shown by the high F1 score and Dice coefficient. This highlights how AI-generated medical images can improve colonoscopy analysis, which is critical for early colorectal cancer detection. Full article
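
The text-guided synthesis step, a fine-tuned Stable Diffusion checkpoint with DreamBooth/LoRA weights, would be driven roughly as below using the diffusers library; the model ID, LoRA path, and prompt are placeholders, not the authors' artifacts.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/dreambooth-lora-colonoscopy")  # hypothetical

image = pipe(
    "endoscopic view of a colon polyp, colonoscopy image",
    num_inference_steps=30,
).images[0]
image.save("synthetic_polyp.png")
```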