Search Results (1,653)

Search Parameters:
Keywords = ResNet-50 CNN

25 pages, 6894 KB  
Article
Visualizing the Machine Learning Process in Multichannel Time Series Classification
by Edgar Acuña and Roxana Aparicio
Analytics 2026, 5(1), 15; https://doi.org/10.3390/analytics5010015 (registering DOI) - 12 Mar 2026
Abstract
This paper uses visualization techniques to analyze the learning process of six machine learning classifiers for multichannel time series classification (MTSC), including five deep learning models—1D CNN, CNN-LSTM, ResNet, InceptionTime, and Transformer—and one non-deep learning method, ROCKET. Sixteen datasets from the University of East Anglia (UEA) multivariate time series repository were employed to assess and compare classifier performance. To explore how data characteristics influence accuracy, we applied channel selection, feature selection, and similarity analysis between training and testing sets. Visualization techniques were used to examine the temporal and structural patterns of each dataset, offering insight into how feature relevance, channel informativeness, and group separability affect model performance. The experimental results show that ROCKET achieves the most consistent accuracy across datasets, although its performance decreases with a very large number of channels. Conversely, the Transformer model underperforms in datasets with limited training instances per class. Overall, the findings highlight the importance of visual exploration in understanding MTSC behavior and indicate that channel relevance and data separability have a greater impact on classification accuracy than feature-level patterns. Full article
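The abstract above attributes accuracy differences largely to channel relevance and group separability. As an illustrative sketch of channel selection (the paper's exact criterion is not given in the abstract; a Fisher-score ranking over per-channel summary values is assumed here):

```python
def fisher_score(channel_values, labels):
    """Fisher score of one channel: between-class separation over within-class spread."""
    classes = sorted(set(labels))
    overall_mean = sum(channel_values) / len(channel_values)
    num, den = 0.0, 0.0
    for c in classes:
        xs = [v for v, y in zip(channel_values, labels) if y == c]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        num += len(xs) * (mu - overall_mean) ** 2
        den += len(xs) * var
    return num / den if den > 0 else float("inf")

def rank_channels(X, labels):
    """X: one row per instance, each a list of per-channel summary values (e.g. channel means).
    Returns channel indices sorted by decreasing Fisher score (most informative first)."""
    n_channels = len(X[0])
    scores = [fisher_score([inst[ch] for inst in X], labels) for ch in range(n_channels)]
    return sorted(range(n_channels), key=lambda ch: -scores[ch])
```

Channels ranked this way can then be truncated to a top-k subset before training, which is one common way to probe how channel informativeness drives MTSC accuracy.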

29 pages, 4003 KB  
Article
Real-Time Detection of Blowing Snow Events on Rural Mountainous Freeways Using Existing Webcam Infrastructure and Convolutional Neural Networks
by Ahmed Mohamed, Md Nasim Khan and Mohamed M. Ahmed
Electronics 2026, 15(6), 1188; https://doi.org/10.3390/electronics15061188 - 12 Mar 2026
Abstract
The main objective of this study is to automatically detect real-time snow-related road surface conditions using imagery captured from existing roadside webcams along interstate freeways. Blowing snow is considered one of the most hazardous roadway weather phenomena because it significantly reduces driver visibility and adversely affects vehicle operation. A comprehensive image preprocessing and reduction process was conducted to construct two reference datasets. The first dataset consisted of two categories (blowing snow and no blowing snow), while the second dataset included five surface condition categories: blowing snow, dry, slushy, snow covered, and snow patched. Eight pre-trained convolutional neural networks (CNNs), including AlexNet, SqueezeNet, ShuffleNet, ResNet18, GoogleNet, ResNet50, MobileNet-V3, and EfficientNet-B0, were evaluated for roadway surface condition classification. For Dataset 1, ResNet50 achieved the highest detection accuracy of 97.88%, while AlexNet demonstrated competitive performance with 97.56% accuracy and significantly shorter training time. Among the lightweight architectures, MobileNet-V3 achieved 95.56% accuracy, demonstrating strong computational efficiency. EfficientNet-B0 achieved 93.56% accuracy while maintaining reduced model complexity. For Dataset 2, ResNet18 achieved the highest multi-class detection accuracy of 96.10%, while AlexNet required the shortest training time among the evaluated models. A comparative analysis between deep CNN models and traditional machine learning approaches showed that deep CNNs significantly outperform feature-based methods in detecting blowing snow conditions. The proposed framework provides an automated, accurate, and scalable solution for roadway surface condition monitoring and supports real-time applications in intelligent transportation systems. Full article
(This article belongs to the Section Artificial Intelligence)

31 pages, 11795 KB  
Article
Empirical Evaluation of a CNN-ResNet-RF Hybrid Model for Occupancy Rate Prediction in Passive Ultra-Low-Energy Buildings
by Yiwen Liu, Yibing Xue, Chunlu Liu and Runyu Wang
Urban Sci. 2026, 10(3), 150; https://doi.org/10.3390/urbansci10030150 - 11 Mar 2026
Abstract
Accurate occupancy information is critical for optimizing energy efficiency in buildings. Hybrid machine learning models have demonstrated great potential in previous studies; however, their application in passive ultra-low-energy buildings remains underexplored. This study conducts an empirical evaluation of real-time occupancy rate prediction using a CNN-ResNet-RF hybrid model based on multi-source environmental and behavioral data from a passive ultra-low-energy educational building. The model integrates Convolutional Neural Networks (CNN) for local feature extraction, Residual Networks (ResNet) to enhance deep feature representation, and Random Forests (RF) for ensemble-based generalization. Indoor CO2 concentration exhibits the strongest linear correlation with occupancy rate (r = 0.54), indicating a meaningful association with occupancy dynamics. The model demonstrates strong predictive performance on the test set, with a coefficient of determination (R2) of 0.964, a root mean square error (RMSE) of 0.054, and a residual prediction deviation (RPD) exceeding 5. Compared with baseline models such as CNN, RF, and CNN-RF, the proposed framework exhibits generally lower prediction errors and improved stability. Further lightweight compression experiments reveal that the structured compact CNN-ResNet-RF-25 variant achieves even better accuracy (R2 = 0.9748, RMSE = 0.0449, RPD = 6.327) while substantially reducing model complexity, demonstrating strong deployment potential in resource-constrained environments. Full article
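The abstract reports R², RMSE, and residual prediction deviation (RPD). A minimal sketch of these three regression metrics under their standard definitions (RPD taken as SD of the targets over RMSE; note that some conventions use the n−1 sample SD instead):

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired target/prediction lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    sd = math.sqrt(ss_tot / n)  # population SD; RPD = SD / RMSE
    return r2, rmse, sd / rmse
```

An RPD above 5, as reported for the hybrid model, means the spread of the observed occupancy rates is more than five times the typical prediction error.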
(This article belongs to the Topic Geospatial AI: Systems, Model, Methods, and Applications)

30 pages, 2375 KB  
Article
Deep Learning Based Computer-Aided Detection of Prostate Cancer Metastases in Bone Scintigraphy: An Experimental Analysis
by Eslam Jabali, Omar Almomani, Louai Qatawneh, Sinan Badwan, Yazan Almomani, Mohammad Al-soreeky, Alia Ibrahim and Natalie Khalil
J. Imaging 2026, 12(3), 121; https://doi.org/10.3390/jimaging12030121 - 11 Mar 2026
Abstract
Bone scintigraphy is a widely available and cost-effective modality for detecting skeletal metastases in prostate cancer, yet visual interpretation can be challenging due to heterogeneous uptake patterns, benign mimickers, and a high reporting workload, motivating robust computer-aided decision support. In this study, we present an experimental evaluation of fourteen convolutional neural network (CNN) architectures for binary metastasis classification in planar bone scintigraphy using a unified protocol. Fourteen models, CNN (baseline), AlexNet, VGG16, VGG19, ResNet18, ResNet34, ResNet50, ResNet50-attention, DenseNet121, DenseNet169, DenseNet121-attention, WideResNet50_2, EfficientNet-B0, and ConvNeXt-Tiny, were trained and tested on 600 scan images (300 normal, 300 metastatic) from the Jordanian Royal Medical Services under identical preprocessing and augmentation with stratified five-fold cross-validation. We report mean ± SD for AUC-ROC, accuracy, precision, sensitivity (recall), F1-score, specificity, and Cohen’s κ, alongside calibration via the Brier score and deployment indicators (parameters, FLOPs, model size, and inference time). DenseNet121 achieved the best overall balance of diagnostic performance and reliability, reaching AUC-ROC 96.0 ± 1.2, accuracy 89.2 ± 2.2, sensitivity 83.7 ± 3.4, specificity 94.7 ± 2.2, F1-score 88.5 ± 2.5, κ = 0.783 ± 0.045, and the strongest calibration (Brier 0.080 ± 0.013), with stable fold-to-fold behaviour. DenseNet121-attention produced the highest AUC-ROC (96.3 ± 1.1) but exhibited greater variability in specificity, indicating less consistent false-alarm control. Complexity analysis supported DenseNet121 as deployable (~7.0 M parameters, ~26.9 MB, ~92 ms/image), whereas heavier models yielded only limited additional clinical value. 
These results support DenseNet121 as a reliable backbone for automated metastasis detection in planar scintigraphy, with future work focusing on external validation, threshold optimisation, interpretability, and model compression for clinical adoption. Full article
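The study above reports calibration via the Brier score and agreement via Cohen's κ. Both have simple closed forms for binary classification, sketched here from their standard definitions:

```python
def brier_score(probs, labels):
    """Mean squared error between the predicted positive-class probability and the 0/1 label.
    Lower is better-calibrated; 0.080 as reported for DenseNet121 is strong."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def cohens_kappa(y_true, y_pred):
    """Binary agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true = sum(y_true) / n
    p_pred = sum(y_pred) / n
    expected = p_true * p_pred + (1 - p_true) * (1 - p_pred)
    return (observed - expected) / (1 - expected)
```

Reporting κ alongside accuracy matters in medical imaging because a balanced 300/300 dataset can still hide chance-level agreement that accuracy alone would mask.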
(This article belongs to the Section AI in Imaging)

25 pages, 11205 KB  
Article
Remote Sensing Image Captioning via Self-Supervised DINOv3 and Transformer Fusion
by Maryam Mehmood, Ahsan Shahzad, Farhan Hussain, Lismer Andres Caceres-Najarro and Muhammad Usman
Remote Sens. 2026, 18(6), 846; https://doi.org/10.3390/rs18060846 - 10 Mar 2026
Abstract
Effective interpretation of coherent and usable information from aerial images (e.g., satellite imagery or high-altitude drone photography) can greatly reduce human effort in many situations, both natural (e.g., earthquakes, forest fires, tsunamis) and man-made (e.g., highway pile-ups, traffic congestion), particularly in disaster management. This research proposes a novel encoder–decoder framework for captioning of remote sensing images that integrates self-supervised DINOv3 visual features with a hybrid Transformer–LSTM decoder. Unlike existing approaches that rely on supervised CNN-based encoders (e.g., ResNet, VGG), the proposed method leverages DINOv3’s self-supervised learning capabilities to extract dense, semantically rich features from aerial images without requiring domain-specific labeled pretraining. The proposed hybrid decoder combines Transformer layers for global context modeling with LSTM layers for sequential caption generation, producing coherent and context-aware descriptions. Feature extraction is performed using the DINOv3 model, which employs the gram-anchoring technique to stabilize dense feature maps. Captions are generated through a hybrid of Transformer with Long Short-Term Memory (LSTM) layers, which adds contextual meaning to captions through sequential hidden layer modeling with gated memory. The model is first evaluated on two traditional remote sensing image captioning datasets: RSICD and UCM-Captions. Multiple evaluation metrics like Bilingual Evaluation Understudy (BLEU), Consensus-based Image Description Evaluation (CIDEr), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), and Metric for Evaluation of Translation with Explicit Ordering (METEOR), are used to quantify the performance and robustness of the proposed DINOv3 hybrid model. The proposed model outperforms conventional Convolutional Neural Network (CNN) and Vision Transformers (ViT)-based models by approximately 9–12% across most evaluation metrics. 
Attention heatmaps are also employed to qualitatively validate the proposed model when identifying and describing key spatial elements. In addition, the proposed model is evaluated on advanced remote sensing datasets, including RSITMD, DisasterM3, and GeoChat. The results demonstrate that self-supervised vision transformers are robust encoders for multi-modal understanding in remote sensing image analysis and captioning. Full article

41 pages, 7209 KB  
Article
Towards the Development of a Deep Learning Framework Using Adaptive and Non-Adaptive Time-Frequency Features for EEG-Based Depression Therapy Prediction
by Hesam Akbari, Sara Bagherzadeh, Javid Farhadi Sedehi, Rab Nawaz, Reza Rostami, Reza Kazemi, Sadiq Muhammad, Haihua Chen and Mutlu Mete
Brain Sci. 2026, 16(3), 301; https://doi.org/10.3390/brainsci16030301 - 9 Mar 2026
Abstract
Background/Objectives: Predicting individual response to depression therapy prior to treatment initiation remains a critical clinical challenge, as the response rate to both selective serotonin reuptake inhibitors (SSRIs) and repetitive transcranial magnetic stimulation (rTMS) is approximately 50%, leaving treatment selection largely trial-based. This study presents a computer-aided decision (CAD) framework that predicts depression therapy outcomes from pre-treatment electroencephalogram (EEG) signals using advanced time-frequency representations and pretrained convolutional neural networks (CNNs). Methods: EEG signals from 30 SSRI patients and 46 rTMS patients are transformed into time-frequency images using Continuous Wavelet Transform (CWT), Variational Mode Decomposition (VMD), and their pixel-level fusion. Four pretrained CNN architectures, including ResNet-18, MobileNet-V3, EfficientNet-B0, and TinyViT-Hybrid, are fine-tuned and evaluated under both image-independent and subject-independent 6-fold cross-validation (CV). Results: Results reveal a clear therapy-specific pattern: CWT-based representations yield superior discrimination for SSRI outcome prediction, with ResNet-18 achieving 99.43% image-level accuracy, while VMD-based representations are statistically superior for rTMS outcome prediction, with ResNet-18 reaching 98.77%. Pixel-level fusion of CWT and VMD does not consistently improve performance over the best individual representation in either therapy context. Pairwise Wilcoxon signed-rank tests confirm a two-tier architectural hierarchy in which ResNet-18 and TinyViT-Hybrid significantly outperform MobileNet-V3 and EfficientNet-B0 across all conditions, while remaining statistically indistinguishable from each other. At the subject level, the framework achieves 82.50% and 83.53% accuracy for SSRI and rTMS, respectively, under strict subject-independent evaluation. 
Per-channel analysis reveals occipital dominance for SSRI under CWT and frontotemporal dominance for rTMS under VMD, consistent with known neurophysiological mechanisms. Conclusions: These findings demonstrate that the choice of time-frequency representation is therapy-specific and at least as important as architectural complexity, and that competitive performance can be achieved without recurrent or attention layers by combining well-designed spectral images with a simple pretrained residual network. Full article
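The pipeline above fuses CWT- and VMD-derived time-frequency images at the pixel level. The exact fusion rule is not given in the abstract; a minimal sketch assuming min-max normalization followed by an alpha-weighted average:

```python
def normalize(img):
    """Min-max normalize a 2-D image (list of rows) to [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo or 1.0  # guard against a constant image
    return [[(v - lo) / span for v in row] for row in img]

def fuse(img_a, img_b, alpha=0.5):
    """Pixel-level fusion: alpha-weighted average of the two normalized images."""
    a, b = normalize(img_a), normalize(img_b)
    return [[alpha * x + (1 - alpha) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]
```

Normalizing before averaging keeps one representation's amplitude scale from dominating the fused image, which is the usual motivation for this preprocessing step.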

24 pages, 14350 KB  
Article
Adaptive Logit Fusion for Mitigating Class Imbalance in Multi-Category Sperm Morphology Assessment
by Emin Can Özge, Hamza Osman Ilhan, Gorkem Serbes, Hakkı Uzun, Ali Can Karaca and Merve Huner Yigit
Life 2026, 16(3), 438; https://doi.org/10.3390/life16030438 - 9 Mar 2026
Abstract
Sperm morphology is one of the most critical indicators of male fertility. This paper presents a deep learning-based approach to classify sperm cells into 18 morphological classes, including one normal and 17 abnormal types. Two state-of-the-art convolutional neural networks, EfficientNetV2-S and ResNet50V2, are employed and fine-tuned using a class-weighted loss function together with extensive data augmentation to improve generalization under class imbalance. Automatic mixed precision training is adopted to reduce memory consumption and accelerate the training process. An ensemble strategy is subsequently constructed by linearly fusing the logits of both architectures, where the fusion weight is optimized to maximize recall, precision, and overall F1-score. Experimental results show that the proposed ensemble achieves an overall accuracy of 70.94%, consistently outperforming the individual models. Sperm cells with pronounced structural abnormalities, such as PinHead and DoubleTail, are classified with high accuracy, whereas less visually distinctive defects result in comparatively lower performance. These findings demonstrate the potential of CNN-based ensemble models to provide consistent and reliable automated sperm morphology classification. Full article
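The ensemble above linearly fuses the logits of two networks with a weight optimized on validation data. A sketch of that search, scored by accuracy for brevity (the paper optimizes recall, precision, and F1-score):

```python
def predict(logits):
    """Argmax class index per row of logits."""
    return [row.index(max(row)) for row in logits]

def best_fusion_weight(logits_a, logits_b, labels, steps=101):
    """Grid-search w in [0, 1] for fused = w * logits_a + (1 - w) * logits_b,
    keeping the weight with the best validation accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps):
        w = i / (steps - 1)
        fused = [[w * x + (1 - w) * y for x, y in zip(ra, rb)]
                 for ra, rb in zip(logits_a, logits_b)]
        acc = sum(p == t for p, t in zip(predict(fused), labels)) / len(labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Fusing at the logit level (rather than after softmax) preserves each model's confidence margins, which is why logit fusion is a common choice for this kind of two-model ensemble.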
(This article belongs to the Section Physiology and Pathology)

24 pages, 4693 KB  
Article
A Short-Term Photovoltaic Power Prediction Based on Multidimensional Feature Fusion of Satellite Cloud Images
by Lingling Xie, Chunhui Li, Yanjing Luo and Long Li
Processes 2026, 14(5), 846; https://doi.org/10.3390/pr14050846 - 5 Mar 2026
Abstract
Clouds are a key factor affecting solar radiation, and their dynamic variations directly cause uncertainty and fluctuations in photovoltaic (PV) power output. To improve PV power prediction accuracy, this paper proposes an enhanced short-term photovoltaic power forecasting approach based on a hybrid neural network architecture using features extracted from satellite cloud images. First, a dual-layer image fusion method is developed for satellite cloud images from different wavelengths and spectral bands, effectively improving fusion accuracy. Second, texture descriptors derived from the Gray-Level Co-occurrence Matrix and multiscale information obtained via the wavelet transform are employed for feature extraction from fused images. Combined with a residual network (ResNet), an optical flow method, as well as an LSTM-based temporal modeling module, multidimensional features of the predicted cloud images are obtained. An improved Bayesian optimization (IBO) algorithm is then employed to derive the optimal fused features, thereby improving the matching between cloud image features and PV power. Third, an enhanced hybrid architecture integrating a convolutional neural network and long short-term memory units with a multi-head self-attention mechanism is developed. Numerical weather prediction (NWP) meteorological features are incorporated, and a tilted irradiance model is introduced to calculate the solar irradiance received by PV modules for use in near-term photovoltaic power forecasting. Finally, measurements collected at a photovoltaic power plant located in Hebei Province are used to validate the proposed method. The results show that, relative to the SA-CNN-MSA-LSTM and BO-CNN-LSTM models, the developed approach lowers the RMSE to an extent of 22.56% and 4.32%, while decreasing the MAE by 24.84% and 5.91%, respectively. 
Overall, the proposed model accurately captures the characteristics of predicted cloud images and effectively improves PV power prediction accuracy. Full article
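The texture descriptors above come from the Gray-Level Co-occurrence Matrix (GLCM). A minimal sketch of building a GLCM for one pixel offset and deriving two standard texture features, contrast and homogeneity (the paper's full descriptor set is not listed in the abstract):

```python
def glcm(img, levels, dx=1, dy=0):
    """Co-occurrence probabilities of quantized gray levels at offset (dx, dy).
    img: 2-D list of integer gray levels in [0, levels)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[img[y][x]][img[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """Large when co-occurring levels differ strongly (sharp texture)."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))

def homogeneity(p):
    """Large when co-occurring levels are similar (smooth texture)."""
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(len(p)) for j in range(len(p)))
```

On fused cloud images these statistics summarize cloud texture compactly, which is what makes them useful inputs alongside wavelet and optical-flow features.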
(This article belongs to the Special Issue Process Safety and Control Strategies for Urban Clean Energy Systems)

24 pages, 3943 KB  
Article
A Convolutional Neural Network (CNN)–Residual Network (ResNet)-Based Faulted Line Selection Method for Single-Phase Ground Faults in Distribution Network
by Qianqiu Shao, Zhen Yu and Shenfa Yin
Electronics 2026, 15(5), 1090; https://doi.org/10.3390/electronics15051090 - 5 Mar 2026
Abstract
Single-phase ground faults account for more than 80% of total faults in distribution networks. However, the introduction of distributed generation complicates power grid topology, leading to strong nonlinearity and non-stationarity in the zero-sequence current. This limits the accuracy of traditional faulted line selection methods. To address this problem, a CNN–ResNet-based method for faulted line selection for single-phase ground faults in distribution networks is proposed. Firstly, a 10 kV arc ground fault simulation test platform is built to analyze the nonlinear distortion characteristics of fault current. The WOA–VMD algorithm, optimized by permutation entropy, is used to denoise the zero-sequence current signal. The Gram Angular Difference Field (GADF) is then adopted to convert the one-dimensional signal into a two-dimensional image that retains its temporal characteristics. A hybrid deep learning model is constructed by fusing the one-dimensional time-domain features extracted by CNN and the two-dimensional time-frequency image features extracted by ResNet34. Matlab/Simulink simulations and physical experimental verification demonstrate that the proposed method achieves a training accuracy of over 97%, with zero misjudgments recorded in 15 arc grounding fault tests, representing a significant improvement in accuracy compared with existing diagnostic algorithms. It can adapt to complex scenarios such as high-resistance grounding and changes in neutral point grounding mode, effectively improving the accuracy and robustness of faulted line selection and providing technical support for the safe operation of distribution networks. Full article
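The GADF step above converts a 1-D signal into a 2-D image while preserving temporal structure. A sketch of the standard Gramian Angular Difference Field transform (rescale to [−1, 1], map samples to angles, then take pairwise sine differences):

```python
import math

def gadf(series):
    """Gramian Angular Difference Field of a 1-D signal.
    G[i][j] = sin(phi_i - phi_j) with phi = arccos of the rescaled sample."""
    lo, hi = min(series), max(series)
    span = hi - lo or 1.0
    x = [2 * (v - lo) / span - 1 for v in series]           # rescale to [-1, 1]
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]    # clamp guards rounding
    n = len(series)
    return [[math.sin(phi[i] - phi[j]) for j in range(n)] for i in range(n)]
```

The resulting matrix is antisymmetric with a zero diagonal, and each entry encodes the angular relationship between two time points, which is what lets the 2-D ResNet branch recover temporal dependencies from the image.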
(This article belongs to the Section Artificial Intelligence)

22 pages, 3598 KB  
Article
Fractional Tchebichef-ResNet-SE: A Hybrid Deep Learning Framework Integrating Fractional Tchebichef Moments with Attention Mechanisms for Enhanced IoT Intrusion Detection
by Islam S. Fathi, Ahmed R. El-Saeed, Mohammed Tawfik and Gaber Hassan
Fractal Fract. 2026, 10(3), 172; https://doi.org/10.3390/fractalfract10030172 - 5 Mar 2026
Abstract
The Internet of Things (IoT) faces critical security challenges stemming from resource-constrained devices and inadequate intrusion detection capabilities. Traditional machine learning approaches struggle with high-dimensional network traffic data due to the curse of dimensionality, severe class imbalance between benign and malicious traffic, and dependence on manual feature engineering that fails to capture complex non-linear attack patterns. Although deep neural networks offer automatic feature extraction, they suffer from two fundamental limitations: the degradation problem, where increasing network depth paradoxically raises training error rather than improving performance, and uniform channel weighting, which prevents the network from adaptively emphasizing attack-relevant features while suppressing irrelevant noise. This research proposes a novel hybrid framework integrating Fractional Tchebichef moment-based feature preprocessing with deep Residual Networks enhanced by Squeeze-and-Excitation (ResNet-SE) attention mechanisms. Fractional Tchebichef moments provide compact, noise-resistant representations by operating directly in the discrete domain, eliminating discretization errors inherent in continuous moment approaches. Network traffic features are transformed into 232 × 232 moment-based matrices capturing discriminative patterns across multiple scales. Comprehensive evaluation on Bot-IoT and Leopard Mobile IoT datasets demonstrates superior performance, achieving 99.78% accuracy and a 99.37% F1-score, substantially outperforming K-Nearest Neighbors (84.7%), Support Vector Machines (87.5%), and baseline CNNs (99.3%). Ablation studies confirm synergistic contributions, with residual connections contributing 0.18% and SE attention adding 0.14% improvements. Cross-dataset evaluation achieves 96.34% and 97.12% accuracy on UNSW-NB15 and IoT-Bot datasets without retraining, while the framework processes 127.9 samples per second across diverse attack taxonomies. 
Full article
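The Squeeze-and-Excitation mechanism described above can be sketched in plain Python (weight matrices `w1` and `w2` are illustrative placeholders; a real SE block learns them and typically uses a reduction-ratio bottleneck):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation over feature_maps[c][i][j].
    Squeeze: global average per channel. Excite: dense -> ReLU -> dense -> sigmoid.
    Scale: reweight each channel by its gate, emphasizing attack-relevant features."""
    squeezed = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
                for fm in feature_maps]
    hidden = [max(0.0, sum(wr[c] * squeezed[c] for c in range(len(squeezed))))
              for wr in w1]
    gates = [sigmoid(sum(wr[h] * hidden[h] for h in range(len(hidden))))
             for wr in w2]
    return [[[g * v for v in row] for row in fm]
            for g, fm in zip(gates, feature_maps)]
```

Because each gate lies in (0, 1), SE can only attenuate or pass channels, which is the "adaptive channel weighting" the abstract contrasts with uniform weighting.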
(This article belongs to the Section Optimization, Big Data, and AI/ML)

17 pages, 1775 KB  
Article
Evaluation of Maxillary Sinus Membrane Morphology Using a Novel Hybrid CNN-ViT-Based Deep Learning Model: An Automated Classification Study
by Nurullah Duger, Furkan Talo, Gulucag Giray Tekin, Burak Dagtekin, Mucahit Karaduman, Muhammed Yildirim and Tuba Talo Yildirim
Diagnostics 2026, 16(5), 777; https://doi.org/10.3390/diagnostics16050777 - 5 Mar 2026
Abstract
Objectives: This study aimed to develop and validate a hybrid deep learning model combining Convolutional Neural Networks (CNN) and Vision Transformers (ViT) to automatically classify maxillary sinus membrane morphologies on Cone-Beam Computed Tomography (CBCT) images, distinguishing between Normal, Flat, Polypoid, and Obstruction types. Methods: A dataset of 959 CBCT images was collected and categorized into four morphological classes: Normal, Flat, Polypoid and Obstruction. A custom hybrid model was developed, integrating a lightweight residual CNN for local feature extraction, learnable weighted feature fusion with a bidirectional feature pyramid network and a Transformer encoder for global context modeling. The performance of proposed model was compared against six different architectures, including ResNet50, MobileNetV3L and standard ViT models, using accuracy, precision, recall and F1-score metrics. Results: The proposed hybrid model achieved the highest overall accuracy of 98.44%, outperforming six strong CNN and ViT models including ResNet50 (97.92%) and ViT-B16 (86.46%) models. In class-wise analysis, the model demonstrated superior diagnostic capability, particularly for the “Obstruction” class, achieving 100% accuracy. High discrimination was also observed for “Flat” (98.21%) and “Polypoid” (98.04%) morphologies, confirming the model’s sensitivity to shape-based features. Conclusions: The proposed hybrid CNN-ViT model successfully classifies maxillary sinus membrane morphologies with high accuracy, effectively overcoming the limitations of standard ViT models on limited datasets. Detection of membrane morphology is vital for predicting surgical risks like membrane perforation and post-operative sinusitis. This model serves as a reliable clinical decision support tool, enabling clinicians to objectively assess specific risk factors before implant surgery and sinus floor elevation. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

33 pages, 8140 KB  
Article
Diagnosing Shortcut Learning in CNN-Based Photovoltaic Fault Recognition from RGB Images: A Multi-Method Explainability Audit
by Bogdan Marian Diaconu
AI 2026, 7(3), 94; https://doi.org/10.3390/ai7030094 - 4 Mar 2026
Abstract
Convolutional neural networks (CNNs) can achieve high accuracy in photovoltaic (PV) fault recognition from RGB imagery, yet their decisions may rely on shortcut cues induced by heterogeneous backgrounds, viewpoints, and class imbalance. This work presents a multi-method explainability audit on the Kaggle PV Panel Defect Dataset (six classes), comparing five architectures (Baseline CNN, VGG16, ResNet50, InceptionV3, EfficientNetB0). Explanations are obtained with LIME superpixel surrogates (reported together with kernel-weighted surrogate fidelity), occlusion sensitivity (quantified via IoU@Top10% against consistent proxy masks, Shannon entropy, and Hoyer sparsity), and Integrated Gradients evaluated by deletion–insertion faithfulness and a Faithfulness Gap. While ResNet50 yields the best predictive performance, EfficientNetB0 shows the most consistent faithfulness evidence and stable panel-centered attributions. The analysis highlights class-dependent vulnerability to context cues, especially for the Clean and damaged classes, and supports using quantitative explainability diagnostics during model selection and dataset curation to mitigate shortcuts in vision-based PV monitoring. Full article
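The IoU@Top10% diagnostic above compares the hottest attribution pixels against a proxy object mask. A minimal sketch, assuming a simple threshold at the top `fraction` of attribution values (the audit's exact binarization may differ):

```python
def iou_top_fraction(attribution, mask, fraction=0.10):
    """Binarize an attribution map at its top `fraction` of pixels,
    then return intersection-over-union with a 0/1 proxy mask."""
    flat = sorted((v for row in attribution for v in row), reverse=True)
    k = max(1, int(len(flat) * fraction))
    thresh = flat[k - 1]
    inter = union = 0
    for arow, mrow in zip(attribution, mask):
        for a, m in zip(arow, mrow):
            hot = a >= thresh
            inter += hot and m
            union += hot or m
    return inter / union if union else 0.0
```

A low score flags shortcut learning: the model's most influential pixels fall outside the panel region, i.e. on background context rather than the fault itself.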

25 pages, 1420 KB  
Article
Identification of Retinal Diseases Using Light Convolutional Neural Networks and Intrinsic Mode Function Technique
by Preethi Kulkarni and Konda Srinivasa Reddy
Diagnostics 2026, 16(5), 773; https://doi.org/10.3390/diagnostics16050773 - 4 Mar 2026
Viewed by 198
Abstract
Background/Objectives: Fundus imaging provides a detailed view of the interior surface of the eye and plays a crucial role in the early diagnosis of retinal diseases. However, automated interpretation of fundus images remains challenging due to variations in illumination, noise, and structural complexity. Methods: A novel hybrid model that integrates the Intrinsic Mode Function (IMF) filter, derived from Empirical Mode Decomposition (EMD), with a Light Convolutional Neural Network (LightCNN) for enhanced fundus image classification was proposed. The IMF filter effectively decomposes the input signal into intrinsic components, isolating high-frequency noise and preserving critical retinal patterns. These refined components are subsequently processed by the LightCNN architecture, which offers lightweight yet highly discriminative feature extraction and classification capabilities. Results: Experimental results on DIARETDB fundus datasets demonstrate that the proposed IMF + LightCNN model achieves 99.4% accuracy, 99.1% precision, 98.87% recall, and a 98.31% F1-score, significantly outperforming conventional CNN and ResNet-based models. Conclusions: Integrating advanced signal processing with lightweight deep learning improves both diagnostic accuracy and computational efficiency. This hybrid framework establishes a promising pathway for reliable and real-time clinical screening of retinal diseases. Full article
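The EMD idea behind the IMF filter is to extract the fastest-oscillating component first by repeatedly subtracting the mean of the upper and lower extrema envelopes (sifting); removing that first IMF denoises the signal. The toy 1D sketch below illustrates the sifting step only and is not the authors' implementation; the signal, iteration count, and helper name are assumptions for this example.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_first_imf(x, t, n_sifts=8):
    """Approximate the first IMF by iteratively subtracting the mean of the
    cubic-spline envelopes through local maxima and minima (EMD sifting)."""
    h = x.copy()
    for _ in range(n_sifts):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            break  # too few extrema to fit stable cubic envelopes
        upper = CubicSpline(t[maxima], h[maxima])(t)
        lower = CubicSpline(t[minima], h[minima])(t)
        h = h - 0.5 * (upper + lower)  # remove the local trend
    return h

t = np.linspace(0, 2, 2000)
low = np.sin(2 * np.pi * 1 * t)          # slow structural content (analogue)
high = 0.5 * np.sin(2 * np.pi * 12 * t)  # high-frequency noise (analogue)
imf1 = sift_first_imf(low + high, t)     # ~ the high-frequency component
denoised = (low + high) - imf1           # signal with the first IMF removed
```

On a 2D fundus image the same decomposition is applied per dimension or via bidimensional EMD; the first IMF captures high-frequency noise, and the remaining components feed the classifier.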
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
17 pages, 4773 KB  
Article
Optimizing Radiographic Diagnosis Through Signal-Balanced Convolutional Models
by Sakina Juzar Neemuchwala, Raja Hashim Ali, Qamar Abbas, Talha Ali Khan, Ambreen Shahnaz and Iftikhar Ahmed
J. Imaging 2026, 12(3), 108; https://doi.org/10.3390/jimaging12030108 - 4 Mar 2026
Viewed by 118
Abstract
Accurate interpretation of chest radiographs is central to the early diagnosis and management of pulmonary disorders. This study introduces an explainable deep learning framework that integrates biomedical signal fidelity analysis with transfer learning to enhance diagnostic reliability and transparency. Using the publicly available COVID-19 Radiography Dataset (21,165 chest X-ray images across four classes: COVID-19, Viral Pneumonia, Lung Opacity, and Normal), three architectures, namely baseline Convolutional Neural Network (CNN), ResNet-50, and EfficientNetB3, were trained and evaluated under varied class-balancing and hyperparameter configurations. Signal preservation was quantitatively verified using the Structural Similarity Index Measure (SSIM = 0.93 ± 0.02), ensuring that preprocessing retained key diagnostic features. Among all models, ResNet-50 achieved the highest classification accuracy (93.7%) and macro-AUC = 0.97 (class-balanced), whereas EfficientNetB3 demonstrated superior generalization with reduced parameter overhead. Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations confirmed anatomically coherent activations aligned with pathological lung regions, substantiating clinical interpretability. The integration of signal fidelity metrics with explainable deep learning presents a reproducible and computationally efficient framework for medical image analysis. These findings highlight the potential of signal-aware transfer learning to support reliable, transparent, and resource-efficient diagnostic decision-making in radiology and other imaging-based medical domains. Full article
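The SSIM check described above verifies that preprocessing preserves diagnostic structure. As an illustration only (the study presumably uses a windowed SSIM as in standard toolkits), a single-window global SSIM is enough for a quick sanity check; the function name and constants below follow the common SSIM formula and are not taken from the paper.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image. The standard metric averages
    SSIM over a sliding Gaussian window; this global form suffices as a
    coarse fidelity check between an original and a preprocessed image."""
    c1 = (0.01 * data_range) ** 2  # stabilizers from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score 1.0; a reported SSIM of 0.93 ± 0.02 after preprocessing indicates that resizing and normalization left the radiographic structure largely intact.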
(This article belongs to the Section AI in Imaging)
25 pages, 1853 KB  
Article
Deep Learning for Process Monitoring and Defect Detection of Laser-Based Powder Bed Fusion of Polymers
by Mohammadali Vaezi, Victor Klamert and Mugdim Bublin
Polymers 2026, 18(5), 629; https://doi.org/10.3390/polym18050629 - 3 Mar 2026
Viewed by 428
Abstract
Maintaining consistent part quality remains a critical challenge in industrial additive manufacturing, particularly in laser-based powder bed fusion of polymers (PBF-LB/P), where crystallization-driven thermal instabilities, governed by isothermal crystallization within a narrow sintering window, precipitate defects such as curling, warping, and delamination. In contrast to metal-based systems dominated by melt-pool hydrodynamics, polymer PBF-LB/P requires monitoring strategies capable of resolving subtle spatio-temporal thermal deviations under realistic industrial operating conditions. Although machine learning, particularly convolutional neural networks (CNNs), has demonstrated efficacy in defect detection, a structured evaluation of heterogeneous modeling paradigms and their deployment feasibility in polymer PBF-LB/P remains limited. This study presents a systematic cross-paradigm assessment of unsupervised anomaly detection (autoencoders and generative adversarial networks), supervised CNN classifiers (VGG-16, ResNet50, and Xception), hybrid CNN-LSTM architectures, and physics-informed neural networks (PINNs) using 76,450 synchronized thermal and RGB images acquired from a commercial industrial system operating under closed control constraints. CNN-based models enable frame- and sequence-level defect classification, whereas the PINN component complements detection by providing physically consistent thermal-field regression. The results reveal quantifiable trade-offs between detection performance, temporal robustness, physical consistency, and algorithmic complexity. Pre-trained CNNs achieve up to 99.09% frame-level accuracy but impose a substantial computational burden for edge deployment. The PINN model attains an RMSE of approximately 27 K under quasi-isothermal process conditions, supporting trend-level thermal monitoring. A lightweight hybrid CNN achieves 99.7% validation accuracy with 1860 parameters and a CPU-benchmarked forward-pass inference time of 1.6 ms (excluding sensor acquisition latency). Collectively, this study establishes a rigorously benchmarked, scalable, and resource-efficient deep-learning framework tailored to crystallization-dominated polymer PBF-LB/P, providing a technically grounded basis for real-time industrial quality monitoring. Full article
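The autoencoder branch of the study scores frames by reconstruction error: a model trained only on nominal layers reconstructs defective frames poorly. The sketch below illustrates that scoring scheme with a linear (PCA) stand-in for the autoencoder; the functions, component count, and thresholding policy are assumptions for this example, not the paper's architecture.

```python
import numpy as np

def fit_linear_ae(frames, n_components=8):
    """Fit a PCA basis on flattened nominal frames; acts as a linear
    stand-in for an autoencoder trained on defect-free process images."""
    X = frames.reshape(len(frames), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def anomaly_score(frame, mean, basis):
    """Per-frame reconstruction MSE under the nominal basis; frames far
    outside the nominal subspace score high."""
    x = frame.ravel().astype(float) - mean
    recon = basis.T @ (basis @ x)  # project onto and back from the basis
    return float(np.mean((x - recon) ** 2))
```

In use, a threshold is set from nominal data (e.g. the 99th percentile of training scores) and frames above it are flagged, which is the generic pattern behind unsupervised defect detection in such monitoring pipelines.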
(This article belongs to the Special Issue Artificial Intelligence in Polymers)