Search Results (583)

Search Parameters:
Keywords = EfficientNet-B0 model

27 pages, 5763 KB  
Article
SatNet-B3: A Lightweight Deep Edge Intelligence Framework for Satellite Imagery Classification
by Tarbia Hasan, Jareen Anjom, Md. Ishan Arefin Hossain and Zia Ush Shamszaman
Future Internet 2025, 17(12), 579; https://doi.org/10.3390/fi17120579 - 16 Dec 2025
Abstract
Accurate weather classification plays a vital role in disaster management and minimizing economic losses. However, satellite-based weather classification remains challenging due to high inter-class similarity; the computational complexity of existing deep learning models, which limits real-time deployment on resource-constrained edge devices; and the limited interpretability of model decisions in practical environments. To address these challenges, this study proposes SatNet-B3, a quantized, lightweight deep learning framework that integrates an EfficientNetB3 backbone with custom classification layers to enable accurate and edge-deployable weather event recognition from satellite imagery. SatNet-B3 is evaluated on the LSCIDMR dataset and demonstrates high-precision performance, achieving 98.20% accuracy and surpassing existing benchmarks. Ten CNN models, including SatNet-B3, were evaluated for classifying eight weather conditions (Tropical Cyclone, Extratropical Cyclone, Snow, Low Water Cloud, High Ice Cloud, Vegetation, Desert, and Ocean), with SatNet-B3 yielding the best results. The model addresses class imbalance and inter-class similarity through extensive preprocessing and augmentation, and the pipeline supports the efficient handling of high-resolution geospatial imagery. Post-training quantization reduced the model size by 90.98% while retaining accuracy, and deployment on a Raspberry Pi 4 achieved a 0.3 s inference time. Integrating explainable AI tools such as LIME and CAM enhances interpretability for intelligent climate monitoring. Full article
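
As a rough illustration of the pipeline this abstract describes (EfficientNetB3 backbone, custom classification head, post-training quantization for edge deployment), here is a minimal sketch assuming a TensorFlow/Keras implementation and TensorFlow Lite dynamic-range quantization; the head layers, input size, and eight-class setup are illustrative, not the authors' exact configuration.

```python
# Sketch: EfficientNetB3 backbone + custom classification head, then post-training quantization.
# Assumes TensorFlow/Keras; layer sizes and the 8-class head are illustrative only.
import tensorflow as tf

NUM_CLASSES = 8  # Tropical Cyclone, Extratropical Cyclone, Snow, ... (per the abstract)

backbone = tf.keras.applications.EfficientNetB3(
    include_top=False, weights="imagenet", input_shape=(300, 300, 3))
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                      # illustrative regularization
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Post-training quantization for edge deployment (e.g., Raspberry Pi 4).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # dynamic-range quantization
tflite_model = converter.convert()
with open("satnet_b3_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```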

18 pages, 3003 KB  
Article
Vineyard Groundcover Biodiversity: Using Deep Learning to Differentiate Cover Crop Communities from Aerial RGB Imagery
by Isabella Ghiglieno, Girma Tariku Woldesemayat, Andres Sanchez Morchio, Celine Birolleau, Luca Facciano, Fulvio Gentilin, Salvatore Mangiapane, Anna Simonetto and Gianni Gilioli
AgriEngineering 2025, 7(12), 434; https://doi.org/10.3390/agriengineering7120434 - 16 Dec 2025
Abstract
Monitoring groundcover diversity in vineyards is a complex task, often limited by the time and expertise required for accurate botanical identification. Remote sensing technologies and AI-based tools are still underutilized in this context, particularly for classifying herbaceous vegetation in inter-row areas. In this study, we introduce a novel approach to classify the groundcover into one of nine categories, in order to simplify this task. Using UAV images to train a convolutional neural network through a deep learning methodology, this study evaluates the effectiveness of different backbone structures applied to a UNet network for the classification of pixels into nine classes of groundcover: vine canopy, bare soil, and seven distinct cover crop community types. Our results demonstrate that the UNet model, especially when using an EfficientNetB0 backbone, significantly improves classification performance, achieving 85.4% accuracy, 59.8% mean Intersection over Union (IoU), and a Jaccard index of 73.0%. Although this study demonstrates the potential of integrating remote sensing and deep learning for vineyard biodiversity monitoring, its applicability is limited by the small image coverage, as data were collected from a single vineyard and only one drone flight. Future work will focus on expanding the model’s applicability to a broader range of vineyard systems, soil types, and geographic regions, as well as testing its performance on lower-resolution multispectral imagery to reduce data acquisition costs and time, enabling large-scale and cost-effective monitoring. Full article
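
A minimal sketch of the kind of encoder-backbone U-Net compared in this study, assuming the segmentation_models_pytorch library (not stated in the abstract); the tile size is arbitrary and the training loop is omitted.

```python
# Sketch: U-Net with an EfficientNet-B0 encoder for 9-class groundcover segmentation.
# Assumes segmentation_models_pytorch; the paper's exact training setup may differ.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b0",   # best-performing backbone per the abstract
    encoder_weights="imagenet",       # transfer learning from ImageNet
    in_channels=3,                    # RGB UAV imagery
    classes=9,                        # vine canopy, bare soil, 7 cover-crop communities
)

x = torch.randn(2, 3, 256, 256)       # dummy batch of image tiles
logits = model(x)                     # (2, 9, 256, 256) per-pixel class scores
print(logits.shape)
```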

18 pages, 3112 KB  
Article
Denatured Recognition of Biological Tissue Using Ultrasonic Phase Space Reconstruction and CBAM-EfficientNet-B0 During HIFU Therapy
by Bei Liu, Haitao Zhu and Xian Zhang
Fractal Fract. 2025, 9(12), 819; https://doi.org/10.3390/fractalfract9120819 - 15 Dec 2025
Abstract
This study proposes an automatic method for recognizing denatured biological tissue during high-intensity focused ultrasound (HIFU) therapy. The technique integrates ultrasonic phase space reconstruction (PSR) with a convolutional block attention mechanism-enhanced EfficientNet-B0 model (CBAM-EfficientNet-B0). Ultrasonic echo signals are first transformed into high-dimensional phase space reconstruction trajectory diagrams using PSR, which reveal distinct fractal and chaotic characteristics to analyze tissue complexity. The CBAM module is incorporated into EfficientNet-B0 to enhance feature extraction from these nonlinear dynamic representations by focusing on critical channels and spatial regions. The network is further optimized with Dropout and Scaled Exponential Linear Units (SeLUs) to prevent overfitting, alongside a cosine annealing learning rate scheduler. Experimental results demonstrate the superior performance of the proposed CBAM-EfficientNet-B0 model, achieving a high recognition accuracy of 99.57% and outperforming five benchmark CNN models (EfficientNet-B0, ResNet101, DenseNet201, ResNet18, and VGG16). The method avoids the subjectivity and uncertainty inherent in traditional manual feature extraction, enabling effective identification of HIFU-induced tissue denaturation. This work confirms the significant potential of combining nonlinear dynamics, fractal analysis, and deep learning for accurate, real-time monitoring in HIFU therapy. Full article
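
For readers unfamiliar with CBAM, the sketch below shows a compact convolutional block attention module appended to EfficientNet-B0 features in PyTorch; the placement of the block, the two-class head, and the hyperparameters are assumptions, not the authors' exact architecture.

```python
# Sketch: a CBAM block (channel + spatial attention) applied to EfficientNet-B0 feature maps.
# Placement, head, and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))              # ... and from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),    # spatial attention from channel statistics
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

backbone = efficientnet_b0(weights="IMAGENET1K_V1").features   # 1280-channel feature maps
model = nn.Sequential(backbone, CBAM(1280), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Dropout(0.3), nn.Linear(1280, 2))  # denatured vs. normal
print(model(torch.randn(1, 3, 224, 224)).shape)        # torch.Size([1, 2])
```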

22 pages, 9457 KB  
Article
Enhancing Document Classification Through Multimodal Image-Text Classification: Insights from Fine-Tuned CLIP and Multimodal Deep Fusion
by Hosam Aljuhani, Mohamed Yehia Dahab and Yousef Alsenani
Sensors 2025, 25(24), 7596; https://doi.org/10.3390/s25247596 - 15 Dec 2025
Abstract
Foundation models excel on general benchmarks but often underperform in clinical settings due to domain shift between internet-scale pretraining data and medical data. Multimodal deep learning, which jointly leverages medical images and clinical text, is promising for diagnosis, yet it remains unclear whether domain adaptation is better achieved by fine-tuning large vision–language models or by training lighter, task-specific architectures. We address this question by introducing PairDx, a balanced dataset of 22,665 image–caption pairs spanning six medical document classes, curated to reduce class imbalance and support fair, reproducible comparisons. Using PairDx, we develop and evaluate two approaches: (i) PairDxCLIP, a fine-tuned CLIP (ViT-B/32), and (ii) PairDxFusion, a custom hybrid model that combines ResNet-18 visual features and GloVe text embeddings with attention-based fusion. Both adapted models substantially outperform a zero-shot CLIP baseline (61.18% accuracy) and a specialized model, BiomedCLIP, which serves as an additional baseline and achieves 66.3% accuracy. Our fine-tuned CLIP (PairDxCLIP) attains 93% accuracy and our custom fusion model (PairDxFusion) reaches 94% accuracy on a held-out test set. Notably, PairDxFusion achieves this high accuracy with 17 min, 55 s of training time, nearly four times faster than PairDxCLIP (65 min, 52 s), highlighting a practical efficiency–performance trade-off for clinical deployment. The testing time also outperforms the specialized model—BiomedCLIP (0.387 s/image). Our results demonstrate that carefully constructed domain-specific datasets and lightweight multimodal fusion can close the domain gap while reducing computational cost in healthcare decision support. Full article
(This article belongs to the Special Issue Transforming Healthcare with Smart Sensing and Machine Learning)
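
The zero-shot CLIP baseline the paper improves upon can be reproduced in outline as follows, assuming the Hugging Face transformers implementation of CLIP ViT-B/32; the class prompt wording and dummy image are hypothetical, and fine-tuning (as in PairDxCLIP) would continue from these same weights on the PairDx pairs.

```python
# Sketch: zero-shot scoring of a document image against six class prompts with CLIP ViT-B/32.
# Prompts are hypothetical; PairDx's real classes and fine-tuning loop are not shown.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class_prompts = ["a chest X-ray", "a CT scan", "an MRI scan",
                 "an ultrasound image", "a histopathology slide", "a clinical photograph"]
image = Image.new("RGB", (224, 224), color="white")   # stand-in for a document page image

inputs = processor(text=class_prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)       # one probability per class prompt
print(probs)
```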

36 pages, 7233 KB  
Article
Deep Learning for Tumor Segmentation and Multiclass Classification in Breast Ultrasound Images Using Pretrained Models
by K. E. ArunKumar, Matthew E. Wilson, Nathan E. Blake, Tylor J. Yost and Matthew Walker
Sensors 2025, 25(24), 7557; https://doi.org/10.3390/s25247557 - 12 Dec 2025
Viewed by 169
Abstract
Early detection of breast cancer commonly relies on imaging technologies such as ultrasound, mammography and MRI. Among these, breast ultrasound is widely used by radiologists to identify and assess lesions. In this study, we developed image segmentation techniques and multiclass classification artificial intelligence (AI) tools based on pretrained models to segment lesions and detect breast cancer. The proposed workflow includes both the development of segmentation models and the development of a series of classification models to classify ultrasound images as normal, benign or malignant. The pretrained models were trained and evaluated on the Breast Ultrasound Images (BUSI) dataset, a publicly available collection of grayscale breast ultrasound images with corresponding expert-annotated masks. For segmentation, images and ground-truth masks were used to train pretrained encoder (ResNet18, EfficientNet-B0 and MobileNetV2)–decoder (U-Net, U-Net++ and DeepLabV3) models, including the DeepLabV3 architecture integrated with a Frequency-Domain Feature Enhancement Module (FEM). The proposed FEM improves spatial and spectral feature representations using Discrete Fourier Transform (DFT), GroupNorm, dropout regularization and adaptive fusion. For classification, each image was assigned a label (normal, benign or malignant). Optuna, an open-source software framework, was used for hyperparameter optimization and for the testing of various pretrained models to determine the best encoder–decoder segmentation architecture. Five different pretrained models (ResNet18, DenseNet121, InceptionV3, MobileNetV3 and GoogleNet) were optimized for multiclass classification. DeepLabV3 outperformed other segmentation architectures, with consistent performance across training, validation and test images, achieving Dice Similarity Coefficient (DSC, a metric describing the overlap between predicted and true lesion regions) values of 0.87, 0.80 and 0.83 on the training, validation and test sets, respectively. ResNet18:DeepLabV3 achieved an Intersection over Union (IoU) score of 0.78 during training, while ResNet18:U-Net++ achieved the best Dice coefficient (0.83), IoU (0.71) and area under the curve (AUC, 0.91) scores on the test (unseen) dataset when compared to other models. However, the proposed ResNet18:FrequencyAwareDeepLabV3 (FADeepLabV3) achieved a DSC of 0.85 and an IoU of 0.72 on the test dataset, demonstrating improvements over standard DeepLabV3. Notably, the frequency-domain enhancement substantially improved the AUC from 0.90 to 0.98, indicating enhanced prediction confidence and clinical reliability. For classification, ResNet18 produced an F1 score (a measure combining precision and recall) of 0.95 and an accuracy of 0.90 on the training dataset, while InceptionV3 performed best on the test dataset, with an F1 score of 0.75 and accuracy of 0.83. We demonstrate a comprehensive approach to automating the segmentation and multiclass classification of breast ultrasound images as benign, malignant or normal using transfer learning models on an imbalanced ultrasound image dataset. Full article
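
A minimal sketch of the Optuna-driven search over encoder/decoder pairings described for the segmentation stage; the search space mirrors the abstract, but the objective below returns a random stand-in score so the sketch runs end-to-end, whereas the study optimizes validation Dice/IoU on BUSI.

```python
# Sketch: Optuna hyperparameter search over encoder/decoder combinations.
# The objective's score is a random stand-in; in practice it would be validation Dice.
import random
import optuna
import segmentation_models_pytorch as smp

def objective(trial):
    encoder = trial.suggest_categorical("encoder", ["resnet18", "efficientnet-b0", "mobilenet_v2"])
    arch = trial.suggest_categorical("decoder", ["unet", "unetplusplus", "deeplabv3"])
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)

    # Build the candidate architecture (weights omitted to keep the sketch light).
    model = smp.create_model(arch, encoder_name=encoder, encoder_weights=None,
                             in_channels=1, classes=1)
    # ... train on BUSI images/masks with learning rate `lr`, then evaluate ...
    val_dice = random.random()   # stand-in for validation Dice so the sketch runs end-to-end
    return val_dice

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```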

14 pages, 1452 KB  
Article
Ensemble Method of Pre-Trained Models for Classification of Skin Lesion Images
by Umadevi V, Joshi Manisha Shivaram, Shankru Guggari and Kingsley Okoye
Appl. Sci. 2025, 15(24), 13083; https://doi.org/10.3390/app152413083 - 12 Dec 2025
Viewed by 172
Abstract
Human beings are affected by different types of skin diseases worldwide. Automatic identification of skin disease from dermoscopy images has proved effective for diagnosis and treatment, helping to reduce the fatality rate. The objective of this work is to demonstrate the efficiency of three pre-trained deep learning models, namely MobileNet, EfficientNetB0, and DenseNet121, combined with ensembling techniques for the classification of skin lesion images. This study considers the HAM10000 dataset, which consists of n = 10,015 images across seven classes with substantial class imbalance. The study makes a two-fold contribution to skin lesion classification methodology. First, three pre-trained deep learning models are modified for grouping skin lesions into seven types. Second, a Weighted Grid Search algorithm is proposed to address the class imbalance problem and improve the accuracy of the base classifiers. The results showed that the weighted ensembling method achieved a 3.67% average improvement in Accuracy, Precision, and Recall, a 3.33% average improvement in F1-Score, and a 7% average improvement in Matthews Correlation Coefficient (MCC) when compared to the base classifiers. Evaluation of model efficiency and performance shows that the modified MobileNet model obtained the highest ROC-AUC score of 92.5% for skin lesion categorization, compared with EfficientNetB0 and DenseNet121. The implications of the results are that deep learning and classification techniques are effective for the diagnosis and treatment of skin lesion diseases, helping to reduce fatality rates and detect early warning signs. Full article
(This article belongs to the Special Issue Process Mining: Theory and Applications)
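
A minimal sketch of a weighted grid search over ensemble weights for three base classifiers, in the spirit of the proposed Weighted Grid Search algorithm; the step size, toy predictions, and accuracy objective are illustrative assumptions.

```python
# Sketch: grid search over convex combinations of three models' class probabilities
# (MobileNet, EfficientNetB0, DenseNet121), keeping the weight triple with best accuracy.
import itertools
import numpy as np

def weighted_grid_search(probs_list, y_true, step=0.1):
    """probs_list: per-model class-probability arrays of shape (n_samples, n_classes)."""
    best_w, best_acc = None, -1.0
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1, w2 in itertools.product(grid, grid):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:
            continue
        w3 = max(w3, 0.0)
        blended = w1 * probs_list[0] + w2 * probs_list[1] + w3 * probs_list[2]
        acc = (blended.argmax(axis=1) == y_true).mean()
        if acc > best_acc:
            best_w, best_acc = (w1, w2, w3), acc
    return best_w, best_acc

# Toy example with 7 lesion classes (HAM10000 has seven categories).
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(7), size=100) for _ in range(3)]
y = rng.integers(0, 7, size=100)
print(weighted_grid_search(probs, y))
```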

22 pages, 3733 KB  
Article
LightEdu-Net: Noise-Resilient Multimodal Edge Intelligence for Student-State Monitoring in Resource-Limited Environments
by Chenjia Huang, Yanli Chen, Bocheng Zhou, Xiuqi Cai, Ziying Zhai, Jiarui Zhang and Yan Zhan
Sensors 2025, 25(24), 7529; https://doi.org/10.3390/s25247529 - 11 Dec 2025
Viewed by 170
Abstract
Multimodal perception for student-state monitoring is difficult to deploy in rural classrooms because sensors are noisy and computing resources are highly constrained. This work targets these challenges by enabling noise-resilient, multimodal, real-time student-state recognition on low-cost edge devices. We propose LightEdu-Net, a sensor-noise-adaptive Transformer-based multimodal network that integrates visual, physiological, and environmental signals in a unified lightweight architecture. The model incorporates three key components: a sensor noise adaptive module (SNAM) to suppress degraded sensor inputs, a cross-modal attention fusion module (CMAF) to capture complementary temporal dependencies across modalities, and an edge-aware knowledge distillation module (EAKD) to transfer knowledge from high-capacity teachers to an embedded-friendly student network. We construct a multimodal behavioral dataset from several rural schools and formulate student-state recognition as a multimodal classification task with explicit evaluation of noise robustness and edge deployability. Experiments show that LightEdu-Net achieves 92.4% accuracy with an F1-score of 91.4%, outperforming representative lightweight CNN and Transformer baselines. Under a noise level of 0.3, accuracy drops by only 1.1%, indicating strong robustness to sensor degradation. Deployment experiments further show that the model operates in real time on Jetson Nano with a latency of 42.8 ms (23.4 FPS) and maintains stable high accuracy on Raspberry Pi 4B and Intel NUC platforms. Beyond technical performance, the proposed system provides a low-cost and quantifiable mechanism for capturing fine-grained learning process indicators, offering new data support for educational economics studies on instructional efficiency and resource allocation in underdeveloped regions. Full article
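
A minimal sketch of the knowledge-distillation objective behind a teacher-to-student module such as EAKD, where a compact student matches softened teacher logits; the temperature, loss weighting, and four-class toy setup are assumptions, not the paper's configuration.

```python
# Sketch: standard soft-target distillation loss (teacher -> student) blended with hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Blend KL divergence on temperature-softened logits with the usual cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: 4 student-state classes, batch of 8.
s = torch.randn(8, 4)
t = torch.randn(8, 4)
y = torch.randint(0, 4, (8,))
print(distillation_loss(s, t, y))
```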

23 pages, 2303 KB  
Article
Explainable Deep Learning for Breast Lesion Classification in Digital and Contrast-Enhanced Mammography
by Samara Acosta-Jiménez, Miguel M. Mendoza-Mendoza, Carlos E. Galván-Tejada, José M. Celaya-Padilla, Jorge I. Galván-Tejada and Manuel A. Soto-Murillo
Diagnostics 2025, 15(24), 3143; https://doi.org/10.3390/diagnostics15243143 - 10 Dec 2025
Viewed by 169
Abstract
Background: Artificial intelligence (AI) emerges as a powerful tool to assist breast cancer screening; however, its integration into different mammographic modalities remains insufficiently explored. Digital Mammography (DM) is widely accessible but presents limitations in dense breast tissue, whereas Contrast-Enhanced Spectral Mammography (CESM) provides functional information that enhances lesion visualization. Understanding how deep learning models behave across these modalities, and determining whether their decision-making patterns remain consistent, is essential for equitable clinical adoption. Methods: This study evaluates three convolutional neural network (CNN) architectures, ResNet-18, DenseNet-121, and EfficientNet-B0, for binary classification of breast lesions using DM and CESM images from the public CDD-CESM dataset (2006 images, three diagnostic classes). The models are trained separately on DM and CESM using three classification tasks: Normal vs. Benign, Benign vs. Malignant, and Normal vs. Malignant. A 3-fold cross-validation scheme and an independent test set are employed. Training uses transfer learning with ImageNet weights, weighted binary cross-entropy (BCE) loss, and SHapley Additive exPlanations (SHAP) analysis to visualize pixel-level relevance of model decisions. Results: CESM yields higher performance in the Normal vs. Benign and Benign vs. Malignant tasks, whereas DM achieves the highest discriminative ability in the Normal vs. Malignant comparison (EfficientNet-B0: AUC = 97%, Accuracy = 93.15%), surpassing the corresponding CESM results (AUC = 93%, Accuracy = 85.66%). SHAP attribution maps reveal anatomically coherent decision patterns in both modalities, with CESM producing sharper and more localized relevance regions due to contrast uptake, while DM exhibits broader yet spatially aligned attention. Across architectures, EfficientNet-B0 demonstrates the most stable performance and interpretability. Conclusions: CESM enhances subtle lesion discrimination through functional contrast, whereas DM, despite its simpler acquisition and wider availability, provides highly accurate and explainable outcomes when combined with modern CNNs. The consistent SHAP-based relevance observed across modalities indicates that both preserve clinically meaningful information. To the best of our knowledge, this study is the first to directly compare DM and CESM under identical preprocessing, training, and evaluation conditions using explainable deep learning models. Full article
(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)
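
A minimal sketch of one binary task (e.g., Benign vs. Malignant) using an ImageNet-pretrained EfficientNet-B0 with class-weighted BCE, matching the transfer-learning and weighted-loss setup the abstract describes; the pos_weight value and single-logit head are illustrative.

```python
# Sketch: ImageNet-pretrained EfficientNet-B0 with a single-logit head and weighted BCE.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

model = efficientnet_b0(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 1)  # single-logit output

# pos_weight > 1 up-weights the minority (e.g., malignant) class in the BCE loss.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([2.0]))

images = torch.randn(4, 3, 224, 224)                 # stand-in for DM or CESM crops
labels = torch.tensor([[0.], [1.], [1.], [0.]])
loss = criterion(model(images), labels)
loss.backward()
print(loss.item())
```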

23 pages, 3030 KB  
Article
DualStream-AttnXGS: An Attention-Enhanced Dual-Stream Model Based on Human Keypoint Recognition for Driver Distraction Detection
by Zhuo He, Chengming Chen and Xiaoyi Zhou
Appl. Sci. 2025, 15(24), 12974; https://doi.org/10.3390/app152412974 - 9 Dec 2025
Viewed by 147
Abstract
Driver distraction remains one of the leading causes of traffic accidents. Although deep learning approaches such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers have been extensively applied for distracted driving detection, their performance is often hindered by limited real-time efficiency and high false detection rates. To address these challenges, this paper proposes an efficient dual-stream neural architecture, termed DualStream-AttnXGS, which jointly leverages visual and pose information to improve distraction recognition accuracy. In the RGB stream, an enhanced EfficientNetB0 backbone is employed, where Ghost Convolution and Coordinate Attention modules are integrated to strengthen feature representation while maintaining lightweight computation. A compound loss function combining Center Loss and Focal Loss is further introduced to promote inter-class separability and stabilize training. In parallel, the keypoint stream extracts human skeletal features using YOLOv8-Pose, which are subsequently classified through a compact ensemble model based on XGBoost v2.1.4 and Gradient Boosting. Finally, a Softmax-based probabilistic fusion strategy integrates the outputs of both streams for the final prediction. The proposed model achieved 99.59% accuracy on the SFD3 dataset while attaining 99.12% accuracy on the AUCD2 dataset, demonstrating that the proposed dual-stream architecture provides a more effective solution than single-stream models by leveraging complementary visual and pose information. Full article
(This article belongs to the Section Transportation and Future Mobility)
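
A minimal sketch of the Softmax-based probabilistic fusion that combines the two streams' outputs for the final prediction; the equal stream weighting and the 10-class toy example are assumptions.

```python
# Sketch: late fusion of RGB-stream logits and keypoint-stream probabilities at the softmax level.
import torch
import torch.nn.functional as F

def fuse_streams(rgb_logits, keypoint_probs, w_rgb=0.5):
    """rgb_logits: CNN-stream logits; keypoint_probs: boosting-ensemble class probabilities."""
    rgb_probs = F.softmax(rgb_logits, dim=1)
    fused = w_rgb * rgb_probs + (1.0 - w_rgb) * keypoint_probs
    return fused.argmax(dim=1)

# Toy usage with 10 distraction classes.
rgb = torch.randn(4, 10)
kp = F.softmax(torch.randn(4, 10), dim=1)
print(fuse_streams(rgb, kp))
```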

21 pages, 765 KB  
Article
DERI1000: A New Benchmark for Dataset Explainability Readiness
by Andrej Pisarcik, Robert Hudec and Roberta Hlavata
AI 2025, 6(12), 320; https://doi.org/10.3390/ai6120320 - 8 Dec 2025
Viewed by 305
Abstract
Deep learning models are increasingly evaluated not only for predictive accuracy but also for their robustness, interpretability, and data quality dependencies. However, current benchmarks largely isolate these dimensions, lacking a unified evaluation protocol that integrates data-centric and model-centric properties. To bridge the gap between data quality assessment and eXplainable Artificial Intelligence (XAI), we introduce DERI1000—the Dataset Explainability Readiness Index—a benchmark that quantifies how suitable and well-prepared a dataset is for explainable and trustworthy deep learning. DERI1000 combines eleven measurable factors—sharpness, noise artifacts, exposure, resolution, duplicates, diversity, separation, imbalance, label noise proxy, XAI overlay, and XAI stability—into a single normalized score calibrated around a reference baseline of 1000. Using five MedMNIST datasets (PathMNIST, ChestMNIST, BloodMNIST, OCTMNIST, OrganCMNIST) and five convolutional neural architectures (DenseNet121, ResNet50, ResNet18, VGG16, EfficientNet-B0), we fitted factor weights through multi-dataset impact analysis. The results indicate that imbalance (0.3319), separation (0.1377), and label noise proxy (0.2161) are the dominant contributors to explainability readiness. Experiments demonstrate that DERI1000 effectively distinguishes models with superficially high accuracy (ACC) but poor interpretability or robustness. The framework thereby enables cross-domain, reproducible evaluation of model performance and data quality under unified metrics. We conclude that DERI1000 provides a scalable, interpretable, and extensible foundation for benchmarking deep learning systems across both data-centric and explainability-driven dimensions. Full article
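
A minimal sketch of how eleven factor scores could be aggregated into a single index calibrated around a baseline of 1000; only the three weights reported in the abstract (imbalance, separation, label noise proxy) are taken from the paper, while the remaining weights and the aggregation formula itself are placeholders, not DERI1000's exact definition.

```python
# Sketch: weighted aggregation of per-factor readiness scores into a baseline-1000 index.
# Weights other than imbalance/separation/label-noise-proxy are placeholders.
FACTOR_WEIGHTS = {
    "imbalance": 0.3319, "label_noise_proxy": 0.2161, "separation": 0.1377,
    "sharpness": 0.04, "noise_artifacts": 0.04, "exposure": 0.04, "resolution": 0.04,
    "duplicates": 0.04, "diversity": 0.04, "xai_overlay": 0.04, "xai_stability": 0.04,
}

def deri_score(factors, baseline=1000.0):
    """factors: per-factor scores in [0, 1], where 1 matches the reference baseline."""
    weighted = sum(FACTOR_WEIGHTS[name] * value for name, value in factors.items())
    total_weight = sum(FACTOR_WEIGHTS[name] for name in factors)
    return baseline * (weighted / total_weight)

example = {name: 0.9 for name in FACTOR_WEIGHTS}   # a dataset scoring 0.9 on every factor
print(round(deri_score(example), 1))               # 900.0
```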

22 pages, 2302 KB  
Article
MAF-GAN: A Multi-Attention Fusion Generative Adversarial Network for Remote Sensing Image Super-Resolution
by Zhaohe Wang, Hai Tan, Zhongwu Wang, Jinlong Ci and Haoran Zhai
Remote Sens. 2025, 17(24), 3959; https://doi.org/10.3390/rs17243959 - 7 Dec 2025
Viewed by 208
Abstract
Existing Generative Adversarial Networks (GANs) frequently yield remote sensing images with blurred fine details, distorted textures, and compromised spatial structures when applied to super-resolution (SR) tasks. This study therefore proposes a Multi-Attention Fusion Generative Adversarial Network (MAF-GAN) to address these limitations. The generator of MAF-GAN is built on a U-Net backbone, which incorporates Oriented Convolutions (OrientedConv) to enhance the extraction of directional features and textures, while a novel co-calibration mechanism incorporating channel, spatial, gating, and spectral attention is embedded in the encoding path and skip connections, supplemented by an adaptive weighting strategy to enable effective multi-scale feature fusion. A composite loss function is further designed to integrate adversarial loss, perceptual loss, hybrid pixel loss, total variation loss, and feature consistency loss for optimizing model performance. Extensive experiments on the GF7-SR4×-MSD dataset demonstrate that MAF-GAN achieves state-of-the-art performance, delivering a Peak Signal-to-Noise Ratio (PSNR) of 27.14 dB, Structural Similarity Index (SSIM) of 0.7206, Learned Perceptual Image Patch Similarity (LPIPS) of 0.1017, and Spectral Angle Mapper (SAM) of 1.0871, significantly outperforming mainstream models including SRGAN, ESRGAN, SwinIR, HAT, and ESatSR, and exceeding traditional interpolation methods (e.g., Bicubic) by a substantial margin. Notably, MAF-GAN maintains an excellent balance between reconstruction quality and inference efficiency, further reinforcing its advantages over competing methods. Additionally, ablation studies validate the individual contribution of each proposed component to the model’s overall performance. The method generates super-resolution remote sensing images with more natural visual perception, clearer spatial structures, and superior spectral fidelity, thus offering a reliable technical solution for high-precision remote sensing applications. Full article
(This article belongs to the Section Environmental Remote Sensing)
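
A minimal sketch of two ingredients of the generator objective, a total-variation term and the weighted composite loss; the loss weights are placeholders, not the paper's tuned values.

```python
# Sketch: total-variation loss plus a weighted composite generator loss.
import torch

def tv_loss(img):
    """Total-variation loss: penalizes differences between neighboring pixels (B, C, H, W)."""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def generator_loss(adv, perceptual, pixel, tv, feat, w=(1e-3, 1.0, 1.0, 1e-4, 0.1)):
    """Each argument is a scalar loss tensor already computed for the current batch."""
    return w[0] * adv + w[1] * perceptual + w[2] * pixel + w[3] * tv + w[4] * feat

sr = torch.rand(2, 3, 64, 64, requires_grad=True)     # dummy super-resolved patch
print(tv_loss(sr))
```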

25 pages, 2667 KB  
Article
Dual-Attention EfficientNet Hybrid U-Net for Segmentation of Rheumatoid Arthritis Hand X-Rays
by Madallah Alruwaili, Mahmood A. Mahmood and Murtada K. Elbashir
Diagnostics 2025, 15(24), 3105; https://doi.org/10.3390/diagnostics15243105 - 6 Dec 2025
Viewed by 154
Abstract
Background: Accurate segmentation in radiographic imaging remains difficult due to heterogeneous contrast, acquisition artifacts, and fine-scale anatomical boundaries. Objective: This paper presents a Hybrid Attention U-Net that pairs an EfficientNet-B3 encoder with a lightweight decoder featuring complementary CBAM and SCSE modules for channel-wise and spatial-wise recalibration and sharper boundary recovery. Methods: The preprocessing phase uses percentile windowing, N4 bias compensation, per-image normalization, and geometric standardization, together with sparse geometric augmentations, to reduce domain shift and keep the pipeline viable. Results: For hand X-ray segmentation, the model achieves Dice = 0.8426, IoU around 0.78, pixel accuracy = 0.9058, ROC-AUC = 0.9074, and PR-AUC = 0.8452, converging quickly in early epochs and remaining stable in late epochs. Controlled ablation shows that the EfficientNet-B3 encoder is the main driver of overlap quality and that smaller batches (bs = 16) consistently outperform larger ones, benefiting from gradient noise and implicit regularization. Qualitative overlays complement the quantitative gains, revealing more distinct cortical profiles and lower background leakage. Conclusions: The model is computationally moderate, end-to-end trainable, and easily extended to multi-class problems through a softmax head and class-balanced objectives, rendering it a powerful, deployable option for musculoskeletal radiograph segmentation as well as an effective baseline for future clinical translation analyses. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
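
A minimal sketch of a U-Net with an EfficientNet-B3 encoder and SCSE attention in the decoder, close in spirit to the hybrid model described; it assumes the segmentation_models_pytorch library and omits the CBAM additions and the preprocessing pipeline.

```python
# Sketch: EfficientNet-B3 encoder U-Net with SCSE decoder attention for binary mask prediction.
# The CBAM branch and the paper's preprocessing are not reproduced here.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b3",
    encoder_weights="imagenet",
    decoder_attention_type="scse",   # spatial + channel squeeze-and-excitation in the decoder
    in_channels=1,                   # grayscale hand radiographs
    classes=1,                       # binary foreground/background mask
)

x = torch.randn(1, 1, 512, 512)
mask_logits = model(x)               # (1, 1, 512, 512)
print(torch.sigmoid(mask_logits).shape)
```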

28 pages, 3650 KB  
Article
Gastrointestinal Lesion Detection Using Ensemble Deep Learning Through Global Contextual Information
by Vikrant Aadiwal, Vishesh Tanwar, Bhisham Sharma and Dhirendra Prasad Yadav
Bioengineering 2025, 12(12), 1329; https://doi.org/10.3390/bioengineering12121329 - 5 Dec 2025
Viewed by 363
Abstract
The presence of subtle mucosal abnormalities makes small bowel Crohn’s disease (SBCD) and other gastrointestinal lesions difficult to detect, as these features are often very subtle and can closely resemble other disorders. Although the Kvasir and Esophageal Endoscopy datasets offer high-quality visual representations of various parts of the GI tract, their manual interpretation and analysis by clinicians remain labor-intensive, time-consuming, and prone to subjective variability. To address this, we propose a generalizable ensemble deep learning framework for gastrointestinal lesion detection, capable of identifying pathological patterns such as ulcers, polyps, and esophagitis that visually resemble SBCD-associated abnormalities. Further, the classical convolutional neural network (CNN) extracts shallow high-dimensional features; due to this, it may miss the edges and complex patterns of the gastrointestinal lesions. To mitigate these limitations, this study introduces a deep learning ensemble framework that combines the strengths of EfficientNetB5, MobileNetV2, and multi-head self-attention (MHSA). EfficientNetB5 extracts detailed hierarchical features that help distinguish fine-grained mucosal structures, while MobileNetV2 enhances spatial representation with low computational overhead. The MHSA module further improves the model’s global correlation of the spatial features. We evaluated the model on two publicly available DBE datasets and compared the results with four state-of-the-art methods. Our model achieved classification accuracies of 99.25% and 98.86% on the Kvasir and Kaither datasets. Full article
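
A minimal sketch of fusing EfficientNetB5 and MobileNetV2 features with multi-head self-attention before classification, as the ensemble design describes; the projection sizes, two-token construction, and eight-class head are illustrative assumptions.

```python
# Sketch: dual-backbone feature extraction + multi-head self-attention fusion + classifier.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b5, mobilenet_v2

class EnsembleMHSA(nn.Module):
    def __init__(self, num_classes=8, dim=512):
        super().__init__()
        self.eff = efficientnet_b5(weights="IMAGENET1K_V1").features   # 2048-channel maps
        self.mob = mobilenet_v2(weights="IMAGENET1K_V1").features      # 1280-channel maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj_eff = nn.Linear(2048, dim)
        self.proj_mob = nn.Linear(1280, dim)
        self.mhsa = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        f1 = self.proj_eff(self.pool(self.eff(x)).flatten(1))
        f2 = self.proj_mob(self.pool(self.mob(x)).flatten(1))
        tokens = torch.stack([f1, f2], dim=1)          # one token per backbone branch
        fused, _ = self.mhsa(tokens, tokens, tokens)   # global correlation across branches
        return self.head(fused.mean(dim=1))

print(EnsembleMHSA()(torch.randn(1, 3, 456, 456)).shape)   # torch.Size([1, 8])
```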

24 pages, 3036 KB  
Article
MPG-SwinUMamba: High-Precision Segmentation and Automated Measurement of Eye Muscle Area in Live Sheep Based on Deep Learning
by Zhou Zhang, Yaojing Yue, Fuzhong Li, Leifeng Guo and Svitlana Pavlova
Animals 2025, 15(24), 3509; https://doi.org/10.3390/ani15243509 - 5 Dec 2025
Viewed by 210
Abstract
Accurate eye muscle area (EMA) assessment in live sheep is crucial for genetic breeding and production management within the meat sheep industry. However, the segmentation accuracy and reliability of existing automated methods are limited by challenges inherent to B-mode ultrasound images, such as low contrast and noise interference. To address these challenges, we present MPG-SwinUMamba, a novel deep learning-based segmentation network. This model uniquely combines the state-space model with a U-Net architecture. It also integrates an edge-enhancement multi-scale attention module (MSEE) and a pyramid attention refinement module (PARM) to improve the detection of indistinct boundaries and better capture global context. The global context aggregation decoder (GCAD) is employed to precisely reconstruct the segmentation mask, enabling automated measurement of the EMA. Compared to 12 other leading segmentation models, MPG-SwinUMamba achieved superior performance, with an intersection-over-union of 91.62% and a Dice similarity coefficient of 95.54%. Additionally, automated measurements show excellent agreement with expert manual assessments (correlation coefficient r = 0.9637), with a mean absolute percentage error of only 4.05%. This method offers a non-invasive, efficient, and objective evaluation of carcass performance in live sheep, with the potential to reduce measurement costs and enhance breeding efficiency. Full article
(This article belongs to the Section Animal System and Management)
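
For reference, the two overlap metrics reported here (Dice similarity coefficient and intersection-over-union) can be computed from binary masks as below; thresholding predicted probabilities at 0.5 is an assumption.

```python
# Sketch: Dice and IoU computed over a batch of binary segmentation masks.
import torch

def dice_and_iou(pred_probs, target, eps=1e-7):
    """pred_probs, target: tensors of shape (B, 1, H, W); target is a 0/1 mask."""
    pred = (pred_probs > 0.5).float()
    inter = (pred * target).sum(dim=(1, 2, 3))
    pred_area = pred.sum(dim=(1, 2, 3))
    target_area = target.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (pred_area + target_area + eps)
    iou = (inter + eps) / (pred_area + target_area - inter + eps)
    return dice.mean(), iou.mean()

p = torch.rand(2, 1, 128, 128)
t = (torch.rand(2, 1, 128, 128) > 0.5).float()
print(dice_and_iou(p, t))
```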

27 pages, 6664 KB  
Article
Advancing Multi-Label Tomato Leaf Disease Identification Using Vision Transformer and EfficientNet with Explainable AI Techniques
by Md. Nurullah, Rania Hodhod, Hyrum Carroll and Yi Zhou
Electronics 2025, 14(23), 4762; https://doi.org/10.3390/electronics14234762 - 3 Dec 2025
Viewed by 374
Abstract
Plant diseases pose a significant threat to global food security, affecting crop yield, quality, and overall agricultural productivity. Traditionally, diagnosing plant diseases has relied on time-consuming visual inspections by experts, which can often lead to errors. Machine learning (ML) and artificial intelligence (AI), particularly Vision Transformers (ViTs), and Convolutional Neural Networks, offer a faster, automated alternative for identifying plant diseases through leaf image analysis. However, these models are often criticized for their “black box” nature, limiting trust in their predictions due to a lack of transparency. Our findings show that incorporating Explainable AI (XAI) techniques, such as Grad-CAM, Integrated Gradients, and LIME, significantly improves model interpretability, making it easier for practitioners to identify the underlying symptoms of plant diseases. This study not only contributes to the field of plant disease detection but also offers a novel perspective on improving AI transparency in real-world agricultural applications through the use of XAI techniques. With training accuracies of 100.00% for ViT, 96.88% for EfficientNetB7, 93.75% for EfficientNetB0, and 87.50% for ResNet50, and corresponding validation accuracies of 96.39% for ViT, 86.98% for EfficientNetB7, and 82.00% for EfficientNetB0, our proposed models outperform earlier research on the same dataset. This demonstrates a notable improvement in model performance while maintaining transparency and trustworthiness through interpretable and reliable decision-making. Full article
(This article belongs to the Special Issue Artificial Intelligence and Image Processing in Smart Agriculture)
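
A minimal Grad-CAM pass of the kind used for interpretability in this study, written with plain PyTorch hooks over an EfficientNet-B0; the target layer, backbone choice, and single-image usage are assumptions rather than the authors' exact setup.

```python
# Sketch: Grad-CAM from forward/backward hooks on the last convolutional block.
import torch
import torch.nn.functional as F
from torchvision.models import efficientnet_b0

model = efficientnet_b0(weights="IMAGENET1K_V1").eval()
feats, grads = {}, {}
layer = model.features[-1]                              # final conv block (1280 channels)
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)     # stand-in for a leaf image
score = model(x)[0].max()                               # top predicted class score
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)     # global-average-pooled gradients
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to a [0, 1] heatmap
print(cam.shape)                                         # torch.Size([1, 1, 224, 224])
```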
