Search Results (1,897)

Search Parameters:
Keywords = pretrained deep learning

28 pages, 2594 KB  
Article
Comparative Evaluation of Parallel and Sequential Hybrid CNN–ViT Models for Wrist X-Ray Anomaly Detection
by Brian Mahlatse Malau and Micheal O. Olusanya
Appl. Sci. 2025, 15(22), 11865; https://doi.org/10.3390/app152211865 - 7 Nov 2025
Abstract
Medical anomaly detection is challenged by limited labeled data and domain shifts, which reduce the performance and generalization of deep learning (DL) models. Hybrid convolutional neural network–Vision Transformer (CNN–ViT) architectures have shown promise, but they often rely on large datasets. Multistage transfer learning (MTL) provides a practical strategy to address this limitation. In this study, we evaluated parallel hybrids, where convolutional neural network (CNN) and Vision Transformer (ViT) features are fused after independent extraction, and sequential hybrids, where CNN features are passed through the ViT for integrated processing. Models were pretrained on non-wrist musculoskeletal radiographs (MURA), fine-tuned on the MURA wrist subset, and evaluated for cross-domain generalization on an external wrist X-ray dataset from the Al-Huda Digital X-ray Laboratory. Parallel hybrids (Xception–DeiT, a data-efficient image transformer) achieved the strongest internal performance (accuracy 88%), while sequential DenseNet–ViT generalized best in zero-shot transfer. After light fine-tuning, parallel hybrids achieved near-perfect accuracy (98%) and recall (1.00). Statistical analyses showed no significant difference between the parallel and sequential models (McNemar’s test), while backbone selection played a key role in performance. The Wilcoxon test found no significant difference in recall and F1-score between image and patient-level evaluations, suggesting balanced performance across both levels. Sequential hybrids achieved up to 7× faster inference than parallel models on the MURA test set while maintaining similar GPU memory usage (3.7 GB). Both fusion strategies produced clinically meaningful saliency maps that highlighted relevant wrist regions. These findings present the first systematic comparison of CNN–ViT fusion strategies for wrist anomaly detection, clarifying trade-offs between accuracy, generalization, interpretability, and efficiency in clinical AI. Full article
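
As a rough illustration of the parallel fusion strategy described in this abstract (not the authors' code), the sketch below concatenates pooled features from a pretrained CNN and a pretrained ViT backbone from timm before a shared head; the specific backbones ("xception", "deit_small_patch16_224") and the binary head are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import timm  # pretrained CNN and ViT backbones

class ParallelCNNViT(nn.Module):
    """Parallel fusion: CNN and ViT features are extracted independently,
    concatenated, and passed to a shared classification head."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # num_classes=0 makes timm return pooled feature vectors
        self.cnn = timm.create_model("xception", pretrained=True, num_classes=0)
        self.vit = timm.create_model("deit_small_patch16_224", pretrained=True, num_classes=0)
        self.head = nn.Linear(self.cnn.num_features + self.vit.num_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.cnn(x), self.vit(x)], dim=1)
        return self.head(fused)

model = ParallelCNNViT(num_classes=2)
logits = model(torch.randn(1, 3, 224, 224))  # dummy wrist-radiograph tensor
```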

18 pages, 2651 KB  
Article
Deploying Neural Networks at Sea: Condition Monitoring of the Ropes on the Amerigo Vespucci
by Letizia Rosseti, Mattia Frascio, Massimiliano Avalle and Francesco Grella
J. Mar. Sci. Eng. 2025, 13(11), 2101; https://doi.org/10.3390/jmse13112101 - 4 Nov 2025
Viewed by 190
Abstract
Monitoring the condition of ropes aboard historic ships is crucial for both safety and preservation. This study introduces a portable, low-cost imaging device designed for deployment on the Italian training ship Amerigo Vespucci, enabling autonomous acquisition of high-quality images of onboard ropes. The device, built around a Raspberry Pi 3 and enclosed in a 3D-printed protective case, allows the crew to label the state of ropes using colored markers and capture standardized visual data. From 207 collected recordings, a curated and balanced dataset was created through frame extraction, blur filtering using Laplacian variance, and image preprocessing. This dataset was used to train and evaluate convolutional neural networks (CNNs) for binary classification of rope conditions. Both custom CNN architectures and pre-trained models (MobileNetV2 and EfficientNetB0) were tested. Results show that color images outperform grayscale in all cases, and that EfficientNetB0 achieved the best performance, with 97.74% accuracy and an F1-score of 0.9768. The study also compares model sizes and inference times, confirming the feasibility of real-time deployment on embedded hardware. These findings support the integration of deep learning techniques into field-deployable inspection tools for preventive maintenance in maritime environments. Full article
(This article belongs to the Section Ocean Engineering)
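
The Laplacian-variance blur filtering mentioned in this abstract can be sketched in a few lines with OpenCV; the threshold value and frame paths below are placeholders, not values from the paper.

```python
import cv2  # OpenCV

def is_sharp(image_path: str, threshold: float = 100.0) -> bool:
    """Keep a frame only if the variance of its Laplacian exceeds the threshold."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var() > threshold

frames = ["frame_000.png", "frame_001.png"]   # placeholder frame paths
kept = [p for p in frames if is_sharp(p)]
```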

19 pages, 1672 KB  
Article
Deep Learning-Based Method for a Ground-State Solution of Bose-Fermi Mixture at Zero Temperature
by Xianghong He, Jidong Gao, Rentao Wu, Yuhan Wang and Rongpei Zhang
Big Data Cogn. Comput. 2025, 9(11), 279; https://doi.org/10.3390/bdcc9110279 - 4 Nov 2025
Viewed by 208
Abstract
A Bose-Fermi mixture, consisting of both bosons and fermions, exhibits distinctive quantum coherence and phase transitions, offering valuable insights into many-body quantum systems. The ground state, as the system’s lowest energy configuration, is essential for understanding its overall behavior. In this study, we introduce the Bose-Fermi Energy-based Deep Neural Network (BF-EnDNN), a novel deep learning approach designed to solve the ground-state problem of Bose-Fermi mixtures at zero temperature through energy minimization. This method incorporates three key innovations: point sampling pre-training, a Dynamic Symmetry Layer (DSL), and a Positivity Preserving Layer (PPL). These features significantly improve the network’s accuracy and stability in quantum calculations. Our numerical results show that BF-EnDNN achieves accuracy comparable to traditional finite difference methods, with effective extension to two-dimensional systems. The method demonstrates high precision across various parameters, making it a promising tool for investigating complex quantum systems. Full article
(This article belongs to the Special Issue Application of Deep Neural Networks)
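
The toy sketch below is only loosely analogous to BF-EnDNN: it minimizes a Rayleigh-quotient energy for a single particle in a 1-D harmonic trap, with a softplus output standing in for a positivity-preserving layer; the Bose-Fermi coupling, the dynamic symmetry layer, and the point-sampling pre-training are not modeled.

```python
import torch
import torch.nn as nn

class PositiveAmplitudeNet(nn.Module):
    """Softplus output keeps the amplitude nonnegative, loosely mirroring
    a positivity-preserving output layer."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.softplus(self.body(x))

x = torch.linspace(-6.0, 6.0, 401).unsqueeze(1)
dx = float(x[1] - x[0])
V = 0.5 * x ** 2                                  # toy harmonic trap potential
net = PositiveAmplitudeNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    psi = net(x)
    dpsi = (psi[1:] - psi[:-1]) / dx              # forward-difference derivative
    kinetic = 0.5 * (dpsi ** 2).sum() * dx
    potential = (V * psi ** 2).sum() * dx
    energy = (kinetic + potential) / ((psi ** 2).sum() * dx)   # Rayleigh quotient
    opt.zero_grad(); energy.backward(); opt.step()
print(f"estimated ground-state energy: {energy.item():.3f}")   # exact value is 0.5
```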

20 pages, 2659 KB  
Article
Twin-Space Decoupling and Interaction for Efficient Vision-Language Transfer
by Wei Liang, Junqiang Li, Zhengkai Guo, Zhiwei Peng, Xiaocui Li, Junfeng Yang, Chuang Li and Wei Long
Electronics 2025, 14(21), 4314; https://doi.org/10.3390/electronics14214314 - 3 Nov 2025
Viewed by 154
Abstract
Pre-trained vision-language models have become strong foundation models for many downstream transfer-learning tasks. However, because downstream datasets are far smaller than the large-scale data used for pre-training, transfer to downstream tasks faces a trade-off between discriminability and generalization. It is therefore necessary to learn task-specific knowledge while retaining general knowledge, yet accurately identifying and separating these two types of representations remains challenging. This paper proposes a dual-subspace-driven framework for cross-modal semantic interaction and dynamic feature fusion. A decentralized-covariance dual-subspace decomposition decouples visual and text features by constructing task and general-knowledge subspaces, and a cross-modal semantic interaction adapter performs refined modal interactions on the decoupled general and task features. Finally, a gating-based cross-level semantic fusion module dynamically fuses semantics from shallow to deep layers. We verify the effectiveness of the method on three tasks: generalization to novel classes, transfer to novel target datasets, and domain generalization. Compared with a variety of advanced methods, the proposed approach achieves excellent performance across all evaluation tasks. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
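
A rough analogue (not the paper's decomposition) of splitting features into a general and a task-specific component and re-mixing them with a learned gate is sketched below; the feature dimension and subspace rank are assumptions.

```python
import torch
import torch.nn as nn

def split_subspaces(feats: torch.Tensor, k: int = 32):
    """Rough analogue of a dual-subspace split: project centered features onto the
    top-k covariance directions ('general') and keep the residual as 'task-specific'."""
    centered = feats - feats.mean(dim=0, keepdim=True)
    _, _, V = torch.pca_lowrank(centered, q=k)
    general = centered @ V @ V.T
    return general, centered - general

class GatedSemanticFusion(nn.Module):
    """Input-dependent gate that dynamically mixes the two decoupled components."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, general: torch.Tensor, task: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([general, task], dim=-1))
        return g * task + (1.0 - g) * general

feats = torch.randn(256, 512)                 # e.g. pooled image features
general, task = split_subspaces(feats, k=32)
fused = GatedSemanticFusion(dim=512)(general, task)
```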

26 pages, 5481 KB  
Article
MCP-X: An Ultra-Compact CNN for Rice Disease Classification in Resource-Constrained Environments
by Xiang Zhang, Lining Yan, Belal Abuhaija and Baha Ihnaini
AgriEngineering 2025, 7(11), 359; https://doi.org/10.3390/agriengineering7110359 - 1 Nov 2025
Viewed by 196
Abstract
Rice, a dietary staple for over half of the global population, is highly susceptible to bacterial and fungal diseases such as bacterial blight, brown spot, and leaf smut, which can severely reduce yields. Traditional manual detection is labor-intensive and often results in delayed intervention and excessive chemical use. Although deep learning models like convolutional neural networks (CNNs) achieve high accuracy, their computational demands hinder deployment in resource-limited agricultural settings. We propose MCP-X, an ultra-compact CNN with only 0.21 million parameters for real-time, on-device rice disease classification. MCP-X integrates a shallow encoder, multi-branch expert routing, a bi-level recurrent simulation encoder–decoder (BRSE), an efficient channel attention (ECA) module, and a lightweight classifier. Trained from scratch, MCP-X achieves 98.93% accuracy on PlantVillage and 96.59% on the Rice Disease Detection Dataset, without external pretraining. Mechanistically, expert routing diversifies feature branches, ECA enhances channel-wise signal relevance, and BRSE captures lesion-scale and texture cues—yielding complementary, stage-wise gains confirmed through ablation studies. Despite slightly higher FLOPs than MobileNetV2, MCP-X prioritizes a minimal memory footprint (~1.01 MB) and deployability over raw speed, running at 53.83 FPS (2.42 GFLOPs) on an RTX A5000. It achieves 16.7×, 287×, 420×, and 659× fewer parameters than MobileNetV2, ResNet152V2, ViT-Base, and VGG-16, respectively. When integrated into a multi-resolution ensemble, MCP-X attains 99.85% accuracy, demonstrating exceptional robustness across controlled and field datasets while maintaining efficiency for real-world agricultural applications. Full article
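
The efficient channel attention (ECA) module cited in this abstract is a published building block; a standard PyTorch version is shown below, using the common default kernel size rather than MCP-X's specific setting.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a 1-D
    convolution across channels, with no dimensionality reduction."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:        # x: (B, C, H, W)
        y = self.pool(x)                                        # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))          # (B, 1, C)
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))     # (B, C, 1, 1)
        return x * y

attn = ECA(k_size=3)
out = attn(torch.randn(2, 64, 32, 32))
```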

26 pages, 13046 KB  
Article
WeedNet-ViT: A Vision Transformer Approach for Robust Weed Classification in Smart Farming
by Ahmad Hasasneh, Rawan Ghannam and Sari Masri
Geographies 2025, 5(4), 64; https://doi.org/10.3390/geographies5040064 - 1 Nov 2025
Viewed by 204
Abstract
Weeds continue to pose a serious challenge to agriculture, reducing both the productivity and quality of crops. In this paper, we explore how modern deep learning, specifically Vision Transformers (ViTs), can help address this issue through fast and accurate weed classification. We developed a transformer-based model trained on the DeepWeeds dataset, which contains images of nine different weed species collected under various environmental conditions, such as changes in lighting and weather. By leveraging the ViT architecture, the model is able to capture complex patterns and spatial details in high-resolution images, leading to improved prediction accuracy. We also examined the effects of model optimization techniques, including fine-tuning and the use of pre-trained weights, along with different strategies for handling class imbalance. While traditional oversampling actually hurt performance, dropping accuracy to 94%, using class weights alongside strong data augmentation boosted accuracy to 96.9%. Overall, our ViT model outperformed standard Convolutional Neural Networks, achieving 96.9% accuracy on the held-out test set. Attention-based saliency maps were inspected to confirm that predictions were driven by weed regions, and model consistency under location shift and capture perturbations was assessed using the diverse acquisition sites in DeepWeeds. These findings show that with the right combination of model architecture and training strategies, Vision Transformers can offer a powerful solution for smarter weed detection and more efficient farming practices. Full article
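
A minimal sketch of the class-weighting strategy described above, assuming a timm ViT and a placeholder label file; the exact ViT variant and the augmentation pipeline used in the paper are not reproduced here.

```python
import numpy as np
import torch
import torch.nn as nn
import timm
from sklearn.utils.class_weight import compute_class_weight

train_labels = np.load("deepweeds_train_labels.npy")   # placeholder label array
weights = compute_class_weight("balanced", classes=np.unique(train_labels), y=train_labels)
criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))

# nine DeepWeeds classes; the specific ViT variant is an assumption
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=9)
```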

20 pages, 918 KB  
Article
MVIB-Lip: Multi-View Information Bottleneck for Visual Speech Recognition via Time Series Modeling
by Yuzhe Li, Haocheng Sun, Jiayi Cai and Jin Wu
Entropy 2025, 27(11), 1121; https://doi.org/10.3390/e27111121 - 31 Oct 2025
Viewed by 318
Abstract
Lipreading, or visual speech recognition, is the task of interpreting utterances solely from visual cues of lip movements. While early approaches relied on Hidden Markov Models (HMMs) and handcrafted spatiotemporal descriptors, recent advances in deep learning have enabled end-to-end recognition using large-scale datasets. However, such methods often require millions of labeled or pretraining samples and struggle to generalize under low-resource or speaker-independent conditions. In this work, we revisit lipreading from a multi-view learning perspective. We introduce MVIB-Lip, a framework that integrates two complementary representations of lip movements: (i) raw landmark trajectories modeled as multivariate time series, and (ii) recurrence plot (RP) images that encode structural dynamics in a texture form. A Transformer encoder processes the temporal sequences, while a ResNet-18 extracts features from RPs; the two views are fused via a product-of-experts posterior regularized by the multi-view information bottleneck. Experiments on the OuluVS and a self-collected dataset demonstrate that MVIB-Lip consistently outperforms handcrafted baselines and improves generalization to speaker-independent recognition. Our results suggest that recurrence plots, when coupled with deep multi-view learning, offer a principled and data-efficient path forward for robust visual speech recognition. Full article
(This article belongs to the Special Issue The Information Bottleneck Method: Theory and Applications)
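
A recurrence plot of the kind described above can be computed in a few lines; the sine trace is a synthetic stand-in for a lip-landmark trajectory, and the eps threshold is an assumption.

```python
import numpy as np

def recurrence_plot(series: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Binary recurrence plot: R[i, j] = 1 when |x_i - x_j| < eps."""
    dist = np.abs(series[:, None] - series[None, :])
    return (dist < eps).astype(np.uint8)

t = np.linspace(0.0, 4.0 * np.pi, 200)
rp = recurrence_plot(np.sin(t), eps=0.2)   # RP image that could feed a ResNet-18
```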

16 pages, 2422 KB  
Article
Enhancing Binary Security Analysis Through Pre-Trained Semantic and Structural Feature Matching
by Chen Yi, Wei Dai, Yiqi Deng, Liang Bao and Guoai Xu
Appl. Sci. 2025, 15(21), 11610; https://doi.org/10.3390/app152111610 - 30 Oct 2025
Viewed by 313
Abstract
Binary code similarity detection serves as a critical front-line defense mechanism in cybersecurity, playing an indispensable role in identifying known vulnerabilities, detecting emergent malware families, and preventing intellectual property theft via code plagiarism. However, existing methods based on Control Flow Graphs (CFGs) often suffer from two major limitations: the inadequate capture of deep semantic information within CFG nodes, and the neglect of structural relationships across different functions. To address these issues, we propose Breg, a novel framework that synergistically integrates pre-trained semantic features with cross-graph structural features. Breg employs a BERT model pre-trained on a large-scale binary corpus to capture nuanced semantic relationships, and introduces a Cross-Graph Neural Network (CGNN) to explicitly model topological correlations between two CFGs, thereby generating highly discriminative embeddings. Extensive experimental validation demonstrates that Breg achieves leading F1-scores of 0.8682 and 0.8970 on Dataset3. In real-world vulnerability search tasks on Dataset4, Breg achieves an MRR@10 of 0.9333 in the challenging MIPS32-to-x64 search task, a clear improvement over the 0.8533 scored by the strongest baseline. This underscores its superior effectiveness and robustness across diverse compilation environments and architectures. To the best of our knowledge, this is the first work to integrate a pre-trained language model with cross-graph structural learning for binary code similarity detection, offering enhanced effectiveness, generalization, and practical applicability in real-world security scenarios. Full article
(This article belongs to the Special Issue Cyberspace Security Technology in Computer Science)
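
As a hedged sketch of embedding-based similarity (not Breg itself: its binary-corpus BERT and cross-graph neural network are not reproduced), a generic BERT can mean-pool token embeddings of two instruction sequences and compare them by cosine similarity.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# generic BERT as a stand-in for an assembly-specific pre-trained encoder
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(instr_seq: str) -> torch.Tensor:
    batch = tok(instr_seq, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state   # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)          # mean-pooled sequence embedding

a = embed("push rbp ; mov rbp , rsp ; sub rsp , 0x20")
b = embed("push ebp ; mov ebp , esp ; sub esp , 0x20")
score = torch.cosine_similarity(a, b, dim=0)      # higher means more similar functions
```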

17 pages, 2569 KB  
Article
Automated Multi-Class Classification of Retinal Pathologies: A Deep Learning Approach to Unified Ophthalmic Screening
by Uğur Şevik and Onur Mutlu
Diagnostics 2025, 15(21), 2745; https://doi.org/10.3390/diagnostics15212745 - 29 Oct 2025
Viewed by 521
Abstract
Background/Objectives: The prevailing paradigm in ophthalmic AI involves siloed, single-disease models, which fails to address the complexity of differential diagnosis in clinical practice. This study aimed to develop and validate a unified deep learning framework for the automated multi-class classification of a wide spectrum of retinal pathologies from fundus photographs, moving beyond the single-disease paradigm to create a comprehensive screening tool. Methods: A publicly available dataset was manually curated by an ophthalmologist, resulting in 1841 images across nine classes, including Diabetic Retinopathy, Glaucoma, and Healthy retinas. After extensive data augmentation to mitigate class imbalance, three pre-trained CNN architectures (ResNet-152, EfficientNetV2, and a YOLOv11-based classifier) were comparatively evaluated. The models were trained using transfer learning and their performance was assessed on an independent test set using accuracy, macro-averaged F1-score, and Area Under the Curve (AUC). Results: The YOLOv11-based classifier demonstrated superior performance over the other architectures on the validation set. On the final independent test set, it achieved a robust overall accuracy of 0.861 and a macro-averaged F1-score of 0.861. The model yielded a validation set AUC of 0.961, which was statistically superior to both ResNet-152 (p < 0.001) and EfficientNetV2 (p < 0.01) as confirmed by the DeLong test. Conclusions: A unified deep learning framework, leveraging a YOLOv11 backbone, can accurately classify nine distinct retinal conditions from a single fundus photograph. This holistic approach moves beyond the limitations of single-disease algorithms, offering considerable promise as a comprehensive AI-driven screening tool to augment clinical decision-making and enhance diagnostic efficiency in ophthalmology. Full article
(This article belongs to the Special Issue Artificial Intelligence in Eye Disease, 4th Edition)
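
The reported metrics (accuracy, macro-averaged F1, multi-class AUC) can be computed with scikit-learn as below; the label and probability files are placeholders, not artifacts from the study.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.load("test_labels.npy")   # placeholder: integer labels, 9 classes
y_prob = np.load("test_probs.npy")    # placeholder: (N, 9) softmax outputs
y_pred = y_prob.argmax(axis=1)

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
macro_auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
```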

16 pages, 1268 KB  
Article
A Community Benchmark for the Automated Segmentation of Pediatric Neuroblastoma on Multi-Modal MRI: Design and Results of the SPPIN Challenge at MICCAI 2023
by Myrthe A. D. Buser, Dominique C. Simons, Matthijs Fitski, Marc H. W. A. Wijnen, Annemieke S. Littooij, Annemiek H. ter Brugge, Iris N. Vos, Markus H. A. Janse, Mathijs de Boer, Rens ter Maat, Junya Sato, Shoji Kido, Satoshi Kondo, Satoshi Kasai, Marek Wodzinski, Henning Müller, Jin Ye, Junjun He, Yannick Kirchhoff, Maximilian R. Rokkus, Gao Haokai, Matías Fernández-Patón, Diana Veiga-Canuto, David G. Ellis, Michele Aizenberg, Bas H. M. van der Velden, Hugo Kuijf, Alberto de Luca and Alida F. W. van der Steeg
Bioengineering 2025, 12(11), 1157; https://doi.org/10.3390/bioengineering12111157 - 26 Oct 2025
Viewed by 407
Abstract
Surgery plays a key role in treating neuroblastoma. To assist surgical planning, anatomical 3D models derived from the segmentation of anatomical structures on MRI scans are often used. Automation using deep learning can make segmentations less time-consuming and more reliable. We organized the Surgical Planning in PedIatric Neuroblastoma (SPPIN) challenge, to stimulate developments and benchmarking of automatic segmentation of neuroblastoma on MRI. SPPIN is the first segmentation challenge in extracranial pediatric oncology. Nine teams provided a valid submission. Evaluation was based on the Dice similarity coefficient (Dice score), the 95th percentile of the Hausdorff distance (HD95), and the volumetric similarity (VS). A combination of these scores determined the ranking of the teams. The spread in the median evaluation scores per team was large (Dice: 0.21–0.82; HD95: 63.31–7.69; VS: 0.31–0.91). The top-performing team achieved a median Dice score of 0.82 (with an HD95 of 7.69 mm and a VS of 0.91) using a large, pre-trained model. However, in the pre-operative segmentations, significantly lower evaluation scores were observed. Our results indicate that pre-training might be useful in small, pediatric datasets. Although the general results of the winning team were high, they were insufficient to use for surgical planning in small, pre-operative tumors. Full article
(This article belongs to the Special Issue Machine Learning and Artificial Intelligence in Pediatric Healthcare)
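
Two of the challenge metrics, the Dice score and volumetric similarity, have simple closed forms, sketched below with voxel counts as volumes; HD95 is usually taken from a dedicated library such as MedPy and is omitted here.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient of two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum() + 1e-8)

def volumetric_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """VS = 1 - |Vp - Vg| / (Vp + Vg), with volumes taken as voxel counts."""
    vp, vg = pred.sum(), gt.sum()
    return 1.0 - abs(vp - vg) / (vp + vg + 1e-8)
```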

22 pages, 10489 KB  
Article
From Contemporary Datasets to Cultural Heritage Performance: Explainability and Energy Profiling of Visual Models Towards Textile Identification
by Evangelos Nerantzis, Lamprini Malletzidou, Eleni Kyratzopoulou, Nestor C. Tsirliganis and Nikolaos A. Kazakis
Heritage 2025, 8(11), 447; https://doi.org/10.3390/heritage8110447 - 24 Oct 2025
Viewed by 360
Abstract
The identification and classification of textiles play a crucial role in archaeometric studies, given their technological, economic, and cultural significance. Traditional textile analysis relies heavily on optical microscopy and visual observation, while other microscopic, analytical, and spectroscopic techniques are preferred for fiber identification and compositional analysis. This protocol can be invasive and destructive for the artifacts under study, time-consuming, and often dependent on personal expertise. In this preliminary study, an alternative, macroscopic approach is proposed, based on textile texture and surface characteristics, using low-magnification images and deep learning models. Under this scope, a publicly available, imbalanced textile image dataset was used to pretrain and evaluate six computer vision architectures (ResNet50, EfficientNetV2, ViT, ConvNeXt, Swin Transformer, and MaxViT). In addition to accuracy, the energy efficiency and ecological footprint of the process were assessed using the CodeCarbon tool. The results indicate that two of the models, Swin Transformer and EfficientNetV2, deliver competitive accuracies together with low carbon emissions compared with the remaining transformer and hybrid models. This promising, sustainable, and non-invasive alternative for textile classification demonstrates the feasibility of developing a custom, heritage-based image dataset. Full article
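
The CodeCarbon energy and emissions profiling mentioned above wraps a training or evaluation block roughly like this; the project name is illustrative and the training loop itself is omitted.

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="textile-classification")  # illustrative name
tracker.start()
# ... fine-tune / evaluate the chosen backbone here (training loop omitted) ...
emissions_kg = tracker.stop()   # estimated kg CO2-eq for the tracked block
print(f"estimated emissions: {emissions_kg:.4f} kg CO2-eq")
```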

14 pages, 6970 KB  
Article
Rehearsal-Free Continual Learning for Emerging Unsafe Behavior Recognition in Construction Industry
by Tao Wang, Saisai Ye, Zimeng Zhai, Weigang Lu and Cunling Bian
Sensors 2025, 25(21), 6525; https://doi.org/10.3390/s25216525 - 23 Oct 2025
Viewed by 299
Abstract
In the realm of Industry 5.0, the incorporation of Artificial Intelligence (AI) in overseeing workers, machinery, and industrial systems is essential for fostering a human-centric, sustainable, and resilient industry. Despite technological advancements, the construction industry remains largely labor-intensive, with site management and interventions predominantly reliant on manual judgments, leading to inefficiencies and various challenges. This research focuses on identifying unsafe behaviors and risks within construction environments by employing AI. Given the continual emergence of new unsafe behaviors that require caution, it is imperative to adapt to these novel categories while retaining knowledge of existing ones. Although deep convolutional neural networks have shown excellent performance in behavior recognition, they traditionally function as predefined multi-way classifiers, which exhibit limited flexibility in accommodating emerging unsafe behavior classes. To address this issue, this study proposes a versatile and efficient recognition model capable of expanding the range of recognized unsafe behaviors while maintaining recognition of both new and existing categories. Adhering to the continual learning paradigm, this method integrates two types of complementary prompts into the pre-trained model: task-invariant prompts that encode knowledge shared across tasks, and task-specific prompts that adapt the model to individual tasks. These prompts are injected into specific layers of the frozen backbone to guide learning without requiring a rehearsal buffer, enabling effective recognition of both new and previously learned unsafe behaviors. Additionally, this paper introduces a benchmark dataset, Split-UBR, specifically constructed for continual unsafe behavior recognition on construction sites. To rigorously evaluate the proposed model, we conducted comparative experiments using average accuracy and forgetting as metrics, and benchmarked against state-of-the-art continual learning baselines. Results on the Split-UBR dataset demonstrate that our method achieves superior performance in terms of both accuracy and reduced forgetting across all tasks, highlighting its effectiveness in dynamic industrial environments. Full article
(This article belongs to the Section Intelligent Sensors)
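
A simplified sketch in the spirit of prompt-based continual learning with a frozen backbone (not the authors' model): learnable task-invariant and task-specific prompt tokens are prepended to the patch tokens of a frozen timm ViT, with positional-embedding handling deliberately simplified.

```python
import torch
import torch.nn as nn
import timm

class PromptedViT(nn.Module):
    """Task-invariant and task-specific prompt tokens prepended to the patch tokens
    of a frozen ViT; only the prompts and the head are trained."""
    def __init__(self, num_classes: int, n_inv: int = 5, n_task: int = 5):
        super().__init__()
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
        for p in self.vit.parameters():
            p.requires_grad = False                 # backbone stays frozen
        d = self.vit.embed_dim
        self.invariant = nn.Parameter(torch.zeros(1, n_inv, d))   # shared across tasks
        self.task = nn.Parameter(torch.zeros(1, n_task, d))       # one set per task in practice
        self.head = nn.Linear(d, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.vit.patch_embed(x)                          # (B, N, D)
        B = tokens.shape[0]
        cls = self.vit.cls_token.expand(B, -1, -1)
        prompts = torch.cat([self.invariant, self.task], dim=1).expand(B, -1, -1)
        tokens = torch.cat([cls, prompts, tokens], dim=1)
        tokens = self.vit.norm(self.vit.blocks(tokens))           # positional handling simplified
        return self.head(tokens[:, 0])
```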

23 pages, 2701 KB  
Article
Grad-CAM-Assisted Deep Learning for Mode Hop Localization in Shearographic Tire Inspection
by Manuel Friebolin, Michael Munz and Klaus Schlickenrieder
AI 2025, 6(10), 275; https://doi.org/10.3390/ai6100275 - 21 Oct 2025
Viewed by 564
Abstract
In shearography-based tire testing, so-called “Mode Hops”, abrupt phase changes caused by laser mode changes, can lead to significant disturbances in the interference image analysis. These artifacts distort defect assessment, lead to retesting or false-positive decisions, and, thus, represent a significant hurdle for the automation of the shearography-based tire inspection process. This work proposes a deep learning workflow that combines a pretrained, optimized ResNet-50 classifier with Grad-CAM, providing a practical and explainable solution for the reliable detection and localization of Mode Hops in shearographic tire inspection images. We trained the algorithm on an extensive, cross-machine dataset comprising more than 6.5 million test images. The final deep learning model achieves a classification accuracy of 99.67%, a false-negative rate of 0.48%, and a false-positive rate of 0.24%. Applying a probability-based quadrant-repeat decision rule within the inspection process effectively reduces process-level false positives to zero, with an estimated probability of repetition of ≤0.084%. This statistically validated approach increases the overall inspection accuracy to 99.83%. The method allows the robust detection and localization of relevant Mode Hops and represents a significant contribution to explainable, AI-supported tire testing. It fulfills central requirements for the automation of shearography-based tire testing and contributes to the possible certification process of non-destructive testing methods in safety-critical industries. Full article
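
Grad-CAM itself is a standard technique; a minimal hook-based version for a ResNet-50 is sketched below, using an ImageNet-pretrained stand-in rather than the paper's tuned inspection model and a random tensor in place of a shearogram.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()    # stand-in, not the tuned classifier
feats, grads = {}, {}
target_layer = model.layer4[-1]
target_layer.register_forward_hook(lambda m, i, o: feats.update(act=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(act=go[0]))

x = torch.randn(1, 3, 224, 224)                     # placeholder shearogram tensor
model(x)[0].max().backward()                        # gradient of the top class score

w = grads["act"].mean(dim=(2, 3), keepdim=True)     # channel-wise gradient weights
cam = F.relu((w * feats["act"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # 0..1 heat map for overlay
```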

22 pages, 5641 KB  
Article
A Globally Optimal Alternative to MLP
by Zheng Li, Jerry Cheng and Huanying Helen Gu
Information 2025, 16(10), 921; https://doi.org/10.3390/info16100921 - 21 Oct 2025
Viewed by 267
Abstract
In deep learning, achieving the global minimum poses a significant challenge, even for relatively simple architectures such as Multi-Layer Perceptrons (MLPs). To address this challenge, we visualized model states at both local and global optima, thereby identifying the factors that impede the transition of models from local to global minima when employing conventional model training methodologies. Based on these insights, we propose the Lagrange Regressor (LReg), a framework that is mathematically equivalent to MLPs. Rather than updates via optimization techniques, LReg employs a Mesh-Refinement–Coarsening (discrete) process to ensure the convergence of the model’s loss function to the global minimum. LReg achieves faster convergence and overcomes the inherent limitations of neural networks in fitting multi-frequency functions. Experiments conducted on large-scale benchmarks including ImageNet-1K (image classification), GLUE (natural language understanding), and WikiText (language modeling) show that LReg consistently enhances the performance of pre-trained models, significantly lowers test loss, and scales effectively to big data scenarios. These results underscore LReg’s potential as a scalable, optimization-free alternative for deep learning in large and complex datasets, aligning closely with the goals of innovative big data analytics. Full article
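
The refine/coarsen idea can be illustrated with a deliberately simple 1-D analogue (refinement only, and not the paper's Lagrange Regressor): piecewise-linear interpolation on a node mesh that is refined wherever the residual is largest, with no gradient-based optimization.

```python
import numpy as np

def fit_adaptive_mesh(x: np.ndarray, y: np.ndarray, n_rounds: int = 50, tol: float = 1e-3):
    """Toy refine-only regressor: interpolate piecewise-linearly on a node mesh
    and insert a node where the residual is largest (coarsening omitted)."""
    nodes = np.linspace(x.min(), x.max(), 5)
    for _ in range(n_rounds):
        node_vals = np.interp(nodes, x, y)            # node values read off the data
        residual = np.abs(np.interp(x, nodes, node_vals) - y)
        if residual.max() < tol:
            break
        nodes = np.sort(np.append(nodes, x[residual.argmax()]))
    return nodes, np.interp(nodes, x, y)

x = np.linspace(0.0, 1.0, 400)
y = np.sin(2.0 * np.pi * 5.0 * x)                     # multi-frequency-style target
nodes, node_vals = fit_adaptive_mesh(x, y)
```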

18 pages, 2025 KB  
Article
A Priori Prediction of Neoadjuvant Chemotherapy Response in Breast Cancer Using Deep Features from Pre-Treatment MRI and CT
by Deok Hyun Jang, Laurentius O. Osapoetra, Lakshmanan Sannachi, Belinda Curpen, Ana Pejović-Milić and Gregory J. Czarnota
Cancers 2025, 17(20), 3394; https://doi.org/10.3390/cancers17203394 - 21 Oct 2025
Viewed by 511
Abstract
Background: Response to neoadjuvant chemotherapy (NAC) is a key prognostic indicator in breast cancer, yet current assessment relies on postoperative pathology. This study investigated the use of deep features derived from pre-treatment MRI and CT scans, in conjunction with clinical variables, to predict treatment response a priori. Methods: Two response endpoints were analyzed: pathologic complete response (pCR) versus non-pCR, and responders versus non-responders, with response defined as a reduction in tumor size of at least 30%. Intratumoral and peritumoral segmentations were generated on contrast-enhanced T1-weighted (CE-T1) and T2-weighted MRI, as well as contrast-enhanced CT images of tumors. Deep features were extracted from these regions using ResNet10, ResNet18, ResNet34, and ResNet50 architectures pre-trained with MedicalNet. Handcrafted radiomic features were also extracted for comparison. Feature selection was conducted with minimum redundancy maximum relevance (mRMR) followed by recursive feature elimination (RFE), and classification was performed using XGBoost across ten independent data partitions. Results: A total of 177 patients were analyzed in this study. ResNet34-derived features achieved the highest overall classification performance under both criteria, outperforming handcrafted features and deep features from other ResNet architectures. For distinguishing pCR from non-pCR, ResNet34 achieved a balanced accuracy of 81.6%, whereas handcrafted radiomics achieved 77.9%. For distinguishing responders from non-responders, ResNet34 achieved a balanced accuracy of 73.5%, compared with 70.2% for handcrafted radiomics. Conclusions: Deep features extracted from routinely acquired MRI and CT, when combined with clinical information, improve the prediction of NAC response in breast cancer. This multimodal framework demonstrates the value of deep learning-based approaches as a complement to handcrafted radiomics and provides a basis for more individualized treatment strategies. Full article
(This article belongs to the Special Issue CT/MRI/PET in Cancer)
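
A hedged sketch of the selection-plus-classification stage only (feature extraction and the mRMR pre-filter are omitted; file names are placeholders): recursive feature elimination wrapped around an XGBoost estimator, followed by an XGBoost classifier on the retained features.

```python
import numpy as np
from sklearn.feature_selection import RFE
from xgboost import XGBClassifier

X = np.load("deep_features.npy")       # placeholder: pooled deep features per patient
y = np.load("response_labels.npy")     # placeholder: 1 = pCR, 0 = non-pCR

# mRMR pre-filtering omitted (it needs an extra package such as mrmr-selection);
# recursive feature elimination with an XGBoost estimator keeps the top features
selector = RFE(XGBClassifier(eval_metric="logloss"), n_features_to_select=20).fit(X, y)
clf = XGBClassifier(eval_metric="logloss").fit(X[:, selector.support_], y)
```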
