Search Results (345)

Search Parameters:
Keywords = inception architecture

33 pages, 4016 KiB  
Article
Integrated Deep Learning Framework for Cardiac Risk Stratification and Complication Analysis in Leigh’s Disease
by Md Aminul Islam, Jayasree Varadarajan, Md Abu Sufian, Bhupesh Kumar Mishra and Md Ruhul Amin Rasel
Cardiogenetics 2025, 15(3), 19; https://doi.org/10.3390/cardiogenetics15030019 - 15 Jul 2025
Viewed by 62
Abstract
Background: Leigh’s Disease is a rare mitochondrial disorder primarily affecting the central nervous system, with frequent secondary cardiac manifestations such as hypertrophic and dilated cardiomyopathies. Early detection of cardiac complications is crucial for patient management, but manual interpretation of cardiac MRI is labour-intensive and subject to inter-observer variability. Methodology: We propose an integrated deep learning framework using cardiac MRI to automate the detection of cardiac abnormalities associated with Leigh’s Disease. Four CNN architectures—Inceptionv3, a custom 3-layer CNN, DenseNet169, and EfficientNetB2—were trained on preprocessed MRI data (224 × 224 pixels), including left ventricular segmentation, contrast enhancement, and gamma correction. Morphological features (area, aspect ratio, and extent) were also extracted to aid interpretability. Results: EfficientNetB2 achieved the highest test accuracy (99.2%) and generalization performance, followed by DenseNet169 (98.4%), 3-layer CNN (95.6%), and InceptionV3 (94.2%). Statistical morphological analysis revealed significant differences in cardiac structure between Leigh’s and non-Leigh’s cases, particularly in area (212,097 vs. 2247 pixels) and extent (0.995 vs. 0.183). The framework was validated using ROC (AUC = 1.00), Brier Score (0.000), and cross-validation (mean sensitivity = 1.000, std = 0.000). Feature embedding visualisation using PCA, t-SNE, and UMAP confirmed class separability. Grad-CAM heatmaps localised relevant myocardial regions, supporting model interpretability. Conclusions: Our deep learning-based framework demonstrated high diagnostic accuracy and interpretability in detecting Leigh’s disease-related cardiac complications. Integrating morphological analysis and explainable AI provides a robust and scalable tool for early-stage detection and clinical decision support in rare diseases. Full article
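
For readers curious about the heatmaps mentioned above, the following is a minimal Grad-CAM sketch in the same spirit (not the authors' code; the EfficientNet-B2 backbone, target layer, and random input are illustrative assumptions):

```python
# Minimal Grad-CAM sketch over a pretrained EfficientNet-B2 classifier.
# Backbone, target layer, and the stand-in input are assumptions for illustration.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.efficientnet_b2(weights="IMAGENET1K_V1").eval()
target_layer = model.features[-1]                      # last convolutional block

store = {}
def _save(module, inputs, output):
    store["act"] = output
    output.register_hook(lambda g: store.update(grad=g))  # gradient w.r.t. the feature map
target_layer.register_forward_hook(_save)

x = torch.randn(1, 3, 224, 224)                        # stand-in for a preprocessed 224x224 slice
score = model(x)[0].max()                              # predicted-class score
score.backward()

weights = store["grad"].mean(dim=(2, 3), keepdim=True) # channel-wise gradient average
cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap normalised to [0, 1]
```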

19 pages, 3165 KiB  
Article
Majority Voting Ensemble of Deep CNNs for Robust MRI-Based Brain Tumor Classification
by Kuo-Ying Liu, Nan-Han Lu, Yung-Hui Huang, Akari Matsushima, Koharu Kimura, Takahide Okamoto and Tai-Been Chen
Diagnostics 2025, 15(14), 1782; https://doi.org/10.3390/diagnostics15141782 - 15 Jul 2025
Viewed by 143
Abstract
Background/Objectives: Accurate classification of brain tumors is critical for treatment planning and prognosis. While deep convolutional neural networks (CNNs) have shown promise in medical imaging, few studies have systematically compared multiple architectures or integrated ensemble strategies to improve diagnostic performance. This study aimed to evaluate various CNN models and optimize classification performance using a majority voting ensemble approach on T1-weighted MRI brain images. Methods: Seven pretrained CNN architectures were fine-tuned to classify four categories: glioblastoma, meningioma, pituitary adenoma, and no tumor. Each model was trained using two optimizers (SGDM and ADAM) and evaluated on a public dataset split into training (70%), validation (10%), and testing (20%) subsets, and further validated on an independent external dataset to assess generalizability. A majority voting ensemble was constructed by aggregating predictions from all 14 trained models. Performance was assessed using accuracy, Kappa coefficient, true positive rate, precision, confusion matrix, and ROC curves. Results: Among individual models, GoogLeNet and Inception-v3 with ADAM achieved the highest classification accuracy (0.987). However, the ensemble approach outperformed all standalone models, achieving an accuracy of 0.998, a Kappa coefficient of 0.997, and AUC values above 0.997 for all tumor classes. The ensemble demonstrated improved sensitivity, precision, and overall robustness. Conclusions: The majority voting ensemble of diverse CNN architectures significantly enhanced the performance of MRI-based brain tumor classification, surpassing that of any single model. These findings underscore the value of model diversity and ensemble learning in building reliable AI-driven diagnostic tools for neuro-oncology. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
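
A minimal sketch of the hard majority-voting idea described above follows (the list of trained models and the input batch are placeholders, not the paper's code):

```python
# Hard majority voting across several trained CNN classifiers.
import torch

@torch.no_grad()
def majority_vote(models, images):
    """Return, for each sample, the class index chosen by most of the individual models.
    Ties are broken toward the smallest class index (torch.mode behaviour)."""
    votes = torch.stack([m(images).argmax(dim=1) for m in models])  # (n_models, batch)
    return votes.mode(dim=0).values                                  # per-sample majority class
```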

24 pages, 9593 KiB  
Article
Deep Learning Approaches for Skin Lesion Detection
by Jonathan Vieira, Fábio Mendonça and Fernando Morgado-Dias
Electronics 2025, 14(14), 2785; https://doi.org/10.3390/electronics14142785 - 10 Jul 2025
Viewed by 152
Abstract
Recently, there has been a rise in skin cancer cases, for which early detection is highly relevant, as it increases the likelihood of a cure. In this context, this work presents a benchmarking study of standard Convolutional Neural Network (CNN) architectures for automated skin lesion classification. A total of 38 CNN architectures from ten families (ConvNeXt, DenseNet, EfficientNet, Inception, InceptionResNet, MobileNet, NASNet, ResNet, VGG, and Xception) were evaluated using transfer learning on the HAM10000 dataset for seven-class skin lesion classification, namely, actinic keratoses, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, and vascular lesions. The comparative analysis used standardized training conditions, with all models utilizing frozen pre-trained weights. Cross-database validation was then conducted using the ISIC 2019 dataset to assess generalizability across different data distributions. The ConvNeXtXLarge architecture achieved the best performance, despite having one of the lowest performance-to-number-of-parameters ratios, with 87.62% overall accuracy and 76.15% F1 score on the test set, demonstrating competitive results within the established performance range of existing HAM10000-based studies. A proof-of-concept multiplatform mobile application was also implemented using a client–server architecture with encrypted image transmission, demonstrating the viability of integrating high-performing models into healthcare screening tools. Full article
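
The frozen-backbone transfer-learning setup described above can be sketched as follows (the ConvNeXt backbone choice and head layout are illustrative assumptions; only the new 7-class head is trained):

```python
# Transfer learning with frozen pretrained weights and a new classification head.
import torch.nn as nn
from torchvision import models

backbone = models.convnext_large(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False                           # freeze all pretrained weights

in_features = backbone.classifier[2].in_features      # final linear layer of ConvNeXt
backbone.classifier[2] = nn.Linear(in_features, 7)    # 7 HAM10000 lesion classes, trainable
# Only backbone.classifier[2].parameters() are passed to the optimiser.
```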

23 pages, 3645 KiB  
Article
Color-Guided Mixture-of-Experts Conditional GAN for Realistic Biomedical Image Synthesis in Data-Scarce Diagnostics
by Patrycja Kwiek, Filip Ciepiela and Małgorzata Jakubowska
Electronics 2025, 14(14), 2773; https://doi.org/10.3390/electronics14142773 - 10 Jul 2025
Viewed by 112
Abstract
Background: Limited availability of high-quality labeled biomedical image datasets presents a significant challenge for training deep learning models in medical diagnostics. This study proposes a novel image generation framework combining conditional generative adversarial networks (cGANs) with a Mixture-of-Experts (MoE) architecture and color histogram-aware loss functions to enhance synthetic blood cell image quality. Methods: RGB microscopic images from the BloodMNIST dataset (eight blood cell types, resolution 3 × 128 × 128) underwent preprocessing with k-means clustering to extract the dominant colors and UMAP for visualizing class similarity. Spearman correlation-based distance matrices were used to evaluate the discriminative power of each RGB channel. A MoE–cGAN architecture was developed with residual blocks and LeakyReLU activations. Expert generators were conditioned on cell type, and the generator’s loss was augmented with a Wasserstein distance-based term comparing red and green channel histograms, which were found most relevant for class separation. Results: The red and green channels contributed most to class discrimination; the blue channel had minimal impact. The proposed model achieved 0.97 classification accuracy on generated images (ResNet50), with 0.96 precision, 0.97 recall, and a 0.96 F1-score. The best Fréchet Inception Distance (FID) was 52.1. Misclassifications occurred mainly among visually similar cell types. Conclusions: Integrating histogram alignment into the MoE–cGAN training significantly improves the realism and class-specific variability of synthetic images, supporting robust model development under data scarcity in hematological imaging. Full article
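
An illustrative sketch of a histogram-based Wasserstein term between real and generated channels follows. This is an assumption about the form of such a loss, not the paper's exact formulation; note that torch.histc is not differentiable, so a trainable version would need soft binning:

```python
# 1-D Wasserstein distance between per-channel intensity histograms (evaluation-style sketch).
import torch

def channel_hist(imgs, channel, bins=64):
    """Hard-binned histogram of one RGB channel, normalised to sum to 1 (values in [0, 1])."""
    values = imgs[:, channel].flatten()
    hist = torch.histc(values, bins=bins, min=0.0, max=1.0)
    return hist / (hist.sum() + 1e-8)

def wasserstein_1d(p, q):
    """W1 between two normalised histograms on a shared grid: L1 distance of their CDFs."""
    return (torch.cumsum(p, 0) - torch.cumsum(q, 0)).abs().sum()

def color_term(real, fake):
    # Red (0) and green (1) channels only, per the class-separation finding above.
    return sum(wasserstein_1d(channel_hist(real, c), channel_hist(fake, c)) for c in (0, 1))
```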

18 pages, 13103 KiB  
Article
ILViT: An Inception-Linear Attention-Based Lightweight Vision Transformer for Microscopic Cell Classification
by Zhangda Liu, Panpan Wu, Ziping Zhao and Hengyong Yu
J. Imaging 2025, 11(7), 219; https://doi.org/10.3390/jimaging11070219 - 1 Jul 2025
Viewed by 283
Abstract
Microscopic cell classification is a fundamental challenge in both clinical diagnosis and biological research. However, existing methods still struggle with the complexity and morphological diversity of cellular images, leading to limited accuracy or high computational costs. To overcome these constraints, we propose an efficient classification method that balances strong feature representation with a lightweight design. Specifically, an Inception-Linear Attention-based Lightweight Vision Transformer (ILViT) model is developed for microscopic cell classification. The ILViT integrates two innovative modules: Dynamic Inception Convolution (DIC) and Contrastive Omni-Kolmogorov Attention (COKA). DIC combines dynamic and Inception-style convolutions to replace large kernels with fewer parameters. COKA integrates Omni-Dimensional Dynamic Convolution (ODC), linear attention, and a Kolmogorov-Arnold Network (KAN) structure to enhance feature learning and model interpretability. With only 1.91 GFLOPs and 8.98 million parameters, ILViT achieves high efficiency. Extensive experiments on four public datasets are conducted to validate the effectiveness of the proposed method. It achieves an accuracy of 97.185% on the BioMediTech dataset for classifying retinal pigment epithelial cells, 97.436% on the ICPR-HEp-2 dataset for diagnosing autoimmune disorders via HEp-2 cell classification, 90.528% on the Hematological Malignancy Bone Marrow Cytology Expert Annotation dataset for categorizing bone marrow cells, and 99.758% on a white blood cell dataset for distinguishing leukocyte subtypes. These results show that ILViT outperforms state-of-the-art models in both accuracy and efficiency, demonstrating strong generalizability and practical potential for cell image classification. Full article
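
For reference, a generic Inception-style block (parallel branches with different receptive fields, concatenated along the channel dimension) looks like the sketch below; this is a standard construction for illustration, not the paper's Dynamic Inception Convolution:

```python
# Generic Inception-style multi-branch convolution block.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1),
                                nn.Conv2d(branch_ch, branch_ch, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1),
                                nn.Conv2d(branch_ch, branch_ch, 5, padding=2))
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, branch_ch, 1))

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)
```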

23 pages, 8902 KiB  
Article
2D Prediction of the Nutritional Composition of Dishes from Food Images: Deep Learning Algorithm Selection and Data Curation Beyond the Nutrition5k Project
by Rachele Bianco, Sergio Coluccia, Michela Marinoni, Alex Falcon, Federica Fiori, Giuseppe Serra, Monica Ferraroni, Valeria Edefonti and Maria Parpinel
Nutrients 2025, 17(13), 2196; https://doi.org/10.3390/nu17132196 - 30 Jun 2025
Viewed by 319
Abstract
Background/Objectives: Deep learning (DL) has shown strong potential in analyzing food images, but few studies have directly predicted mass, energy, and macronutrient content from images. In addition to the importance of high-quality data, differences in country-specific food composition databases (FCDBs) can hinder model generalization. Methods: We assessed the performance of several standard DL models using four ground truth datasets derived from Nutrition5k—the largest image–nutrition dataset with ~5000 complex US cafeteria dishes. With a view to developing an Italian dietary assessment tool, these datasets varied by FCDB alignment (Italian vs. US) and data curation (ingredient–mass correction and frame filtering on the test set). We evaluated combinations of four feature extractors [ResNet-50 (R50), ResNet-101 (R101), InceptionV3 (IncV3), and Vision Transformer-B-16 (ViT-B-16)] with two regression networks (2+1 and 2+2), using IncV3_2+2 as the benchmark. Descriptive statistics (percentages of agreement, unweighted Cohen’s kappa, and Bland–Altman plots) and standard regression metrics were used to compare predicted and ground truth nutritional composition. Dishes mispredicted by ≥7 algorithms were analyzed separately. Results: R50, R101, and ViT-B-16 consistently outperformed the benchmark across all datasets. Specifically, replacing it with these top algorithms yielded reductions in median Mean Absolute Percentage Errors of 6.2% for mass, 6.4% for energy, 12.3% for fat, and 33.1% and 40.2% for protein and carbohydrates. Ingredient–mass correction substantially improved prediction metrics (6–42% when considering the top algorithms), while frame filtering had a more limited effect (<3%). Performance was consistently poor across most models for complex salads, chicken-based or egg-based dishes, and Western-inspired breakfasts. Conclusions: The R101 and ViT-B-16 architectures will be prioritized in future analyses, where ingredient–mass correction and automated frame filtering methods will be considered. Full article
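
A hedged sketch of the feature-extractor-plus-regression-head pattern described above follows; the backbone, layer sizes, and exact "2+2" topology are assumptions, not the authors' configuration:

```python
# Pretrained backbone feeding a small regression head that predicts nutritional targets.
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()                    # expose the 2048-d pooled features

regressor = nn.Sequential(
    nn.Linear(2048, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 5),                         # mass, energy, fat, protein, carbohydrates
)
model = nn.Sequential(backbone, regressor)     # trained with a standard regression loss
```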

16 pages, 3892 KiB  
Article
Fault Diagnosis Method for Shearer Arm Gear Based on Improved S-Transform and Depthwise Separable Convolution
by Haiyang Wu, Hui Zhou, Chang Liu, Gang Cheng and Yusong Pang
Sensors 2025, 25(13), 4067; https://doi.org/10.3390/s25134067 - 30 Jun 2025
Viewed by 248
Abstract
To address the limitations in time–frequency feature representation of shearer arm gear faults and the issues of parameter redundancy and low training efficiency in standard convolutional neural networks (CNNs), this study proposes a diagnostic method based on an improved S-transform and a Depthwise Separable Convolutional Neural Network (DSCNN). First, the improved S-transform is employed to perform time–frequency analysis on the vibration signals, converting the original one-dimensional signals into two-dimensional time–frequency images to fully preserve the fault characteristics of the gear. Then, a neural network model combining standard convolution and depthwise separable convolution is constructed for fault identification. The experimental dataset includes five gear conditions: tooth deficiency, tooth breakage, tooth wear, tooth crack, and normal. The performance of various frequency-domain and time-frequency methods—Wavelet Transform, Fourier Transform, S-transform, and Gramian Angular Field (GAF)—is compared using the same network model. Furthermore, Grad-CAM is applied to visualize the responses of key convolutional layers, highlighting the regions of interest related to gear fault features. Finally, four typical CNN architectures are analyzed and compared: Deep Convolutional Neural Network (DCNN), InceptionV3, Residual Network (ResNet), and Pyramid Convolutional Neural Network (PCNN). Experimental results demonstrate that frequency–domain representations consistently outperform raw time-domain signals in fault diagnosis tasks. Grad-CAM effectively verifies the model’s accurate focus on critical fault features. Moreover, the proposed method achieves high classification accuracy while reducing both training time and the number of model parameters. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
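
A minimal depthwise separable convolution block, the building element of the DSCNN described above, can be sketched as follows (channel counts and normalisation choices are illustrative assumptions):

```python
# Depthwise separable convolution: per-channel depthwise conv followed by a 1x1 pointwise conv.
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2,
                  groups=in_ch, bias=False),       # depthwise: one filter per input channel
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),   # pointwise: mixes information across channels
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )
```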

30 pages, 30354 KiB  
Article
Typological Transcoding Through LoRA and Diffusion Models: A Methodological Framework for Stylistic Emulation of Eclectic Facades in Krakow
by Zequn Chen, Nan Zhang, Chaoran Xu, Zhiyu Xu, Songjiang Han and Lishan Jiang
Buildings 2025, 15(13), 2292; https://doi.org/10.3390/buildings15132292 - 29 Jun 2025
Viewed by 301
Abstract
The stylistic emulation of historical building facades presents significant challenges for artificial intelligence (AI), particularly for complex and data-scarce styles like Krakow’s Eclecticism. This study aims to develop a methodological framework for a “typological transcoding” of style that moves beyond mere visual mimicry, which is crucial for heritage preservation and urban renewal. The proposed methodology integrates architectural typology with Low-Rank Adaptation (LoRA) for fine-tuning a Stable Diffusion (SD) model. This process involves a typology-guided preparation of a curated dataset (150 images) and precise control of training parameters. The resulting typologically guided LoRA-tuned model demonstrates significant performance improvements over baseline models. Quantitative analysis shows a 24.6% improvement in Fréchet Inception Distance (FID) and a 7.0% improvement in Learned Perceptual Image Patch Similarity (LPIPS). Furthermore, qualitative evaluations by 68 experts confirm superior realism and stylistic accuracy. The findings indicate that this synergy enables data-efficient, typology-grounded stylistic emulation, highlighting AI’s potential as a creative partner for nuanced reinterpretation. However, achieving deeper semantic understanding and robust 3D inference remains an ongoing challenge. Full article
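
The LoRA mechanism underlying the fine-tuning above can be sketched conceptually as below; this is an assumption for illustration only (actual Stable Diffusion fine-tuning would typically go through a library such as diffusers/peft), showing the frozen base weight augmented by a trainable low-rank update scaled by alpha / r:

```python
# Conceptual LoRA adapter around a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)             # keep pretrained weight frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus trainable low-rank update (B @ A), scaled.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```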

25 pages, 4471 KiB  
Article
A Novel Lightweight Framework for Non-Contact Broiler Face Identification in Intensive Farming
by Bin Gao, Yongmin Guo, Pengshen Zheng, Kaisi Yang and Changxi Chen
Sensors 2025, 25(13), 4051; https://doi.org/10.3390/s25134051 - 29 Jun 2025
Viewed by 319
Abstract
Efficient individual identification is essential for advancing precision broiler farming. In this study, we propose YOLO-IFSC, a high-precision and lightweight face recognition framework specifically designed for dense broiler farming environments. Building on the YOLOv11n architecture, the proposed model integrates four key modules to overcome the limitations of traditional methods and recent CNN-based approaches. The Inception-F module employs a dynamic multi-branch design to enhance multi-scale feature extraction, while the C2f-Faster module leverages partial convolution to reduce computational redundancy and parameter count. Furthermore, the SPPELANF module reinforces cross-layer spatial feature aggregation to alleviate the adverse effects of occlusion, and the CBAM module introduces a dual-domain attention mechanism to emphasize critical facial regions. Experimental evaluations on a self-constructed dataset demonstrate that YOLO-IFSC achieves a mAP@0.5 of 91.5%, alongside a 40.8% reduction in parameters and a 24.2% reduction in FLOPs compared to the baseline, with a consistent real-time inference speed of 36.6 FPS. The proposed framework offers a cost-effective, non-contact alternative for broiler face recognition, significantly advancing individual tracking and welfare monitoring in precision farming. Full article

23 pages, 5745 KiB  
Article
BDSER-InceptionNet: A Novel Method for Near-Infrared Spectroscopy Model Transfer Based on Deep Learning and Balanced Distribution Adaptation
by Jianghai Chen, Jie Ling, Nana Lei and Lingqiao Li
Sensors 2025, 25(13), 4008; https://doi.org/10.3390/s25134008 - 27 Jun 2025
Viewed by 292
Abstract
Near-Infrared Spectroscopy (NIRS) analysis technology faces numerous challenges in industrial applications. Firstly, the generalization capability of models is significantly affected by instrumental heterogeneity, environmental interference, and sample diversity. Traditional modeling methods exhibit certain limitations in handling these factors, making it difficult to achieve effective adaptation across different scenarios. Specifically, data distribution shifts and mismatches in multi-scale features hinder the transferability of models across different crop varieties or instruments from different manufacturers. As a result, the large amount of previously accumulated NIRS and reference data cannot be effectively utilized in modeling for new instruments or new varieties, thereby limiting improvements in modeling efficiency and prediction accuracy. To address these limitations, this study proposes a novel transfer learning framework integrating multi-scale network architecture with Balanced Distribution Adaptation (BDA) to enhance cross-instrument compatibility. The key contributions include: (1) RX-Inception multi-scale structure: Combines Xception’s depthwise separable convolution with ResNet’s residual connections to strengthen global–local feature coupling. (2) Squeeze-and-Excitation (SE) attention: Dynamically recalibrates spectral band weights to enhance discriminative feature representation. (3) Systematic evaluation of six transfer strategies: Comparative analysis of their impacts on model adaptation performance. Experimental results on open corn and pharmaceutical datasets demonstrate that BDSER-InceptionNet achieves state-of-the-art performance on primary instruments. Notably, the proposed Method 6 successfully enables NIRS model sharing from primary to secondary instruments, effectively mitigating spectral discrepancies and significantly improving transfer efficacy. Full article
(This article belongs to the Section Physical Sensors)
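
The squeeze-and-excitation (SE) recalibration mentioned above can be sketched for 1-D spectral features as follows (the reduction ratio and tensor layout are illustrative assumptions):

```python
# Squeeze-and-excitation block for 1-D spectra: learn per-channel weights and rescale.
import torch.nn as nn

class SEBlock1d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool1d(1)              # global average over the wavelength axis
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                                   # x: (batch, channels, length)
        w = self.excite(self.squeeze(x).flatten(1))         # per-channel weights in (0, 1)
        return x * w.unsqueeze(-1)                          # recalibrate channel responses
```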

14 pages, 1438 KiB  
Article
CDBA-GAN: A Conditional Dual-Branch Attention Generative Adversarial Network for Robust Sonar Image Generation
by Wanzeng Kong, Han Yang, Mingyang Jia and Zhe Chen
Appl. Sci. 2025, 15(13), 7212; https://doi.org/10.3390/app15137212 - 26 Jun 2025
Viewed by 238
Abstract
The acquisition of real-world sonar data necessitates substantial investments of manpower, material resources, and financial capital, rendering it challenging to obtain sufficient authentic samples for sonar-related research tasks. Consequently, sonar image simulation technology has become increasingly vital in the field of sonar data analysis. Traditional sonar simulation methods predominantly focus on low-level physical modeling, which often suffers from limited image controllability and diminished fidelity in multi-category and multi-background scenarios. To address these limitations, this paper proposes a Conditional Dual-Branch Attention Generative Adversarial Network (CDBA-GAN). The framework comprises three key innovations: The conditional information fusion module, dual-branch attention feature fusion mechanism, and cross-layer feature reuse. By integrating encoded conditional information with the original input data of the generative adversarial network, the fusion module enables precise control over the generation of sonar images under specific conditions. A hierarchical attention mechanism is implemented, sequentially performing channel-level and pixel-level attention operations. This establishes distinct weight matrices at both granularities, thereby enhancing the correlation between corresponding elements. The dual-branch attention features are fused via a skip-connection architecture, facilitating efficient feature reuse across network layers. The experimental results demonstrate that the proposed CDBA-GAN generates condition-specific sonar images with a significantly lower Fréchet inception distance (FID) compared to existing methods. Notably, the framework exhibits robust imaging performance under noisy interference and outperforms state-of-the-art models (e.g., DCGAN, WGAN, SAGAN) in fidelity across four categorical conditions, as quantified by FID metrics. Full article
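
For reference, the Fréchet inception distance used for evaluation above is computed from Gaussian fits to Inception activations; a standard implementation (not the authors' code) is:

```python
# FID between activation statistics of real and generated image sets.
import numpy as np
from scipy.linalg import sqrtm

def fid(act_real, act_fake):
    """act_*: (n_samples, feature_dim) arrays of Inception-v3 pooled features."""
    mu_r, mu_f = act_real.mean(0), act_fake.mean(0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_f = np.cov(act_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):            # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2.0 * covmean))
```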

33 pages, 5602 KiB  
Article
CELM: An Ensemble Deep Learning Model for Early Cardiomegaly Diagnosis in Chest Radiography
by Erdem Yanar, Fırat Hardalaç and Kubilay Ayturan
Diagnostics 2025, 15(13), 1602; https://doi.org/10.3390/diagnostics15131602 - 25 Jun 2025
Viewed by 463
Abstract
Background/Objectives: Cardiomegaly—defined as the abnormal enlargement of the heart—is a key radiological indicator of various cardiovascular conditions. Early detection is vital for initiating timely clinical intervention and improving patient outcomes. This study investigates the application of deep learning techniques for the automated diagnosis of cardiomegaly from chest X-ray (CXR) images, utilizing both convolutional neural networks (CNNs) and Vision Transformers (ViTs). Methods: We assembled one of the largest and most diverse CXR datasets to date, combining posteroanterior (PA) images from PadChest, NIH CXR, VinDr-CXR, and CheXpert. Multiple pre-trained CNN architectures (VGG16, ResNet50, InceptionV3, DenseNet121, DenseNet201, and AlexNet), as well as Vision Transformer models, were trained and compared. In addition, we introduced a novel stacking-based ensemble model—Combined Ensemble Learning Model (CELM)—that integrates complementary CNN features via a meta-classifier. Results: The CELM achieved the highest diagnostic performance, with a test accuracy of 92%, precision of 99%, recall of 89%, F1-score of 0.94, specificity of 92.0%, and AUC of 0.90. These results highlight the model’s high agreement with expert annotations and its potential for reliable clinical use. Notably, Vision Transformers offered competitive performance, suggesting their value as complementary tools alongside CNNs. Conclusions: With further validation, the proposed CELM framework may serve as an efficient and scalable decision-support tool for cardiomegaly screening, particularly in resource-limited settings such as intensive care units (ICUs) and emergency departments (EDs), where rapid and accurate diagnosis is imperative. Full article
(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)
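
A hedged sketch of the stacking idea behind an ensemble of this kind follows; the base models, their probability wrappers, and the logistic-regression meta-classifier are assumptions, not the paper's implementation:

```python
# Stacking: concatenate per-model class probabilities, then fit a meta-classifier on a held-out set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stacked_features(base_models, images):
    """Each base model is assumed to expose a sklearn-style predict_proba()."""
    return np.hstack([m.predict_proba(images) for m in base_models])

# meta = LogisticRegression(max_iter=1000).fit(stacked_features(base_models, X_val), y_val)
# y_pred = meta.predict(stacked_features(base_models, X_test))
```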

27 pages, 3134 KiB  
Article
A Hybrid Deep Learning Approach for Cotton Plant Disease Detection Using BERT-ResNet-PSO
by Chetanpal Singh, Santoso Wibowo and Srimannarayana Grandhi
Appl. Sci. 2025, 15(13), 7075; https://doi.org/10.3390/app15137075 - 23 Jun 2025
Viewed by 308
Abstract
Cotton is one of the most valuable non-food agricultural products in the world. However, cotton production is often hampered by disease. In most cases, these plant diseases are a result of insect or pest infestations, which can have a significant impact on production if not addressed promptly. It is, therefore, crucial to accurately identify leaf diseases in cotton plants to prevent any negative effects on yield. This paper presents a hybrid deep learning approach based on Bidirectional Encoder Representations from Transformers with a Residual Network and Particle Swarm Optimization (BERT-ResNet-PSO) for detecting cotton plant diseases. The approach starts with image pre-processing, after which the image patches are linearly embedded and passed to a BERT-like encoder that segregates diseased regions. The encoded features are then passed to a ResNet-based architecture for feature extraction and further optimized by PSO to increase the classification accuracy. The approach is tested on a cotton dataset from the Plant Village dataset, where the experimental results show the effectiveness of this hybrid deep learning approach, achieving an accuracy of 98.5%, precision of 98.2%, and recall of 98.7%, compared to existing deep learning approaches such as ResNet50, VGG19, InceptionV3, and ResNet152V2. This study shows that the hybrid deep learning approach can deal with the cotton plant disease detection problem effectively, and it suggests that the proposed approach can help avoid crop losses on a large scale and support effective farming management practices. Full article

27 pages, 2049 KiB  
Article
Optimizing Tumor Detection in Brain MRI with One-Class SVM and Convolutional Neural Network-Based Feature Extraction
by Azeddine Mjahad and Alfredo Rosado-Muñoz
J. Imaging 2025, 11(7), 207; https://doi.org/10.3390/jimaging11070207 - 21 Jun 2025
Viewed by 347
Abstract
The early detection of brain tumors is critical for improving clinical outcomes and patient survival. However, medical imaging datasets frequently exhibit class imbalance, posing significant challenges for traditional classification algorithms that rely on balanced data distributions. To address this issue, this study employs a One-Class Support Vector Machine (OCSVM) trained exclusively on features extracted from healthy brain MRI images, using both deep learning architectures—such as DenseNet121, VGG16, MobileNetV2, InceptionV3, and ResNet50—and classical feature extraction techniques. Experimental results demonstrate that combining Convolutional Neural Network (CNN)-based feature extraction with OCSVM significantly improves anomaly detection performance compared with simpler handcrafted approaches. DenseNet121 achieved an accuracy of 94.83%, a precision of 99.23%, and a sensitivity of 89.97%, while VGG16 reached an accuracy of 95.33%, a precision of 98.87%, and a sensitivity of 91.32%. MobileNetV2 showed a competitive trade-off between accuracy (92.83%) and computational efficiency, making it suitable for resource-constrained environments. Additionally, the pure CNN model—trained directly for classification without OCSVM—outperformed hybrid methods with an accuracy of 97.83%, highlighting the effectiveness of deep convolutional networks in directly learning discriminative features from MRI data. This approach enables reliable detection of brain tumor anomalies without requiring labeled pathological data, offering a promising solution for clinical contexts where abnormal samples are scarce. Future research will focus on reducing inference time, expanding and diversifying training datasets, and incorporating explainability tools to support clinical integration and trust in AI-based diagnostics. Full article
(This article belongs to the Section Medical Imaging)
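
The CNN-features-into-OCSVM pipeline described above can be sketched as follows (the DenseNet121 backbone and the nu/gamma values are illustrative assumptions, not the authors' exact settings):

```python
# Deep features from a pretrained CNN (healthy images only) fed to a One-Class SVM.
import torch
from torchvision import models
from sklearn.svm import OneClassSVM

extractor = models.densenet121(weights="IMAGENET1K_V1").features.eval()

@torch.no_grad()
def features(batch):                                  # batch: (N, 3, 224, 224) float tensor
    fmap = extractor(batch)                           # (N, 1024, 7, 7) feature maps
    return fmap.mean(dim=(2, 3)).numpy()              # global average pooling -> (N, 1024)

# ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(features(healthy_batch))
# labels = ocsvm.predict(features(test_batch))        # +1 = normal, -1 = anomaly (possible tumor)
```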

32 pages, 8835 KiB  
Article
SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification
by Xuan Liu, Zhenyu Lu, Bingjian Lu, Zhuang Li, Zhongfeng Chen and Yongjie Ma
Remote Sens. 2025, 17(12), 2034; https://doi.org/10.3390/rs17122034 - 12 Jun 2025
Viewed by 1440
Abstract
Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly under the coexistence of multiple weather systems. However, most existing models—such as those based on convolutional neural networks (CNNs), Transformer architectures, and their variants like Swin Transformer—primarily focus on spatial modeling of static images and do not explicitly incorporate temporal information, thereby limiting their ability to effectively integrate spatiotemporal features. To address this limitation, we propose SIG-ShapeFormer, a novel classification model specifically designed for satellite cloud images with temporal continuity. To the best of our knowledge, this work is the first to transform satellite cloud data into multivariate time series and introduce a unified framework for multi-scale and multimodal feature fusion. SIG-ShapeFormer consists of three core components: (1) a Shapelet-based module that captures discriminative and interpretable local temporal patterns; (2) a multi-scale Inception module combining 1D convolutions and Transformer encoders to extract temporal features across different scales; and (3) a differentially enhanced Gramian Angular Summation Field (GASF) module that converts time series into 2D texture representations, significantly improving the recognition of cloud internal structures. Experimental results demonstrate that SIG-ShapeFormer achieves a classification accuracy of 99.36% on the LSCIDMR-S dataset, outperforming the original ShapeFormer by 2.2% as well as other CNN- or Transformer-based models. Moreover, the model exhibits strong generalization performance on the UCM remote sensing dataset and several benchmark tasks from the UEA time-series archive. SIG-ShapeFormer is particularly suitable for remote sensing applications involving continuous temporal sequences, such as extreme weather warnings and dynamic cloud system monitoring. However, it relies on temporally coherent input data and may perform suboptimally when applied to datasets with limited or irregular temporal resolution. Full article
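
For reference, the standard Gramian Angular Summation Field encoding mentioned above (not the authors' differentially enhanced variant) maps a series to an image by rescaling to [-1, 1] and taking pairwise angular sums:

```python
# Gramian Angular Summation Field (GASF) of a 1-D time series.
import numpy as np

def gasf(series):
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))                    # angular encoding
    return np.cos(phi[:, None] + phi[None, :])                # GASF[i, j] = cos(phi_i + phi_j)
```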
