Search Results (214)

Search Parameters:
Keywords = interclass evaluation

23 pages, 4788 KB  
Article
Leakage-Free Evaluation and Multi-Prototype Contrastive Learning for Hyperspectral Classification of Vegetation
by Tong Jia and Haiyong Ding
Appl. Sci. 2026, 16(7), 3543; https://doi.org/10.3390/app16073543 - 4 Apr 2026
Viewed by 119
Abstract
Hyperspectral image (HSI) classification of vegetation is hampered by strong intra-class spectral variability and inter-class similarity, and commonly used random pixel splits can introduce spatial-context leakage that inflates test accuracy in patch-based models. To address these issues, we propose a classification framework that couples a leakage-free block partition (LFBP) strategy with a class-aware multi-prototype contrastive loss (CAMP-CL). LFBP assigns non-overlapping spatial blocks to training/validation/test sets and reserves a buffer matched to the patch radius to prevent contextual overlap while keeping class distributions balanced. CAMP-CL represents each class with multiple learnable prototypes and performs supervised contrastive learning at the prototype level, encouraging compact yet multimodal intra-class embeddings and improved inter-class separation. Experiments conducted on the Matiwan Village airborne HSI dataset under the LFBP protocol show that the proposed method achieves 91.51% overall accuracy (OA) and 91.49% average accuracy (AA). Compared with the strongest baseline, supervised contrastive learning (SupCon), the proposed method yields consistent gains of 1.07 percentage points (pp) in both OA and AA, while improving OA by 5.76 pp over the cross-entropy baseline. The results suggest that CAMP-CL is beneficial for fine-grained vegetation HSI classification, and that leakage-free evaluation protocols are important for obtaining reliable performance estimates in practical settings.
(This article belongs to the Section Computing and Artificial Intelligence)
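
The prototype-level supervised contrastive objective described in this abstract lends itself to a compact illustration. The PyTorch sketch below is a minimal rendering of the idea, not the authors' code: the class count, prototype count, temperature, and the max-over-prototypes positive selection are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPrototypeContrastiveLoss(nn.Module):
    """Sketch of a class-aware multi-prototype contrastive loss: each class
    owns K learnable prototypes; an embedding is pulled toward its best
    same-class prototype and pushed from all other prototypes
    (an InfoNCE-style objective at the prototype level)."""
    def __init__(self, num_classes, num_prototypes, dim, tau=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, num_prototypes, dim))
        self.tau = tau

    def forward(self, z, y):
        z = F.normalize(z, dim=-1)                                 # (B, D) embeddings
        protos = F.normalize(self.prototypes, dim=-1)              # (C, K, D)
        sims = torch.einsum("bd,ckd->bck", z, protos) / self.tau   # (B, C, K)
        # positive: best-matching prototype of the true class
        pos = sims[torch.arange(z.size(0)), y].max(dim=-1).values  # (B,)
        # denominator: all prototypes of all classes
        denom = torch.logsumexp(sims.flatten(1), dim=-1)           # (B,)
        return (denom - pos).mean()

# usage with stand-in embeddings and labels
loss_fn = MultiPrototypeContrastiveLoss(num_classes=9, num_prototypes=4, dim=128)
loss = loss_fn(torch.randn(32, 128), torch.randint(0, 9, (32,)))
```
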
38 pages, 1145 KB  
Article
Transfer Learning Strategies for Comic Character Recognition in Low-Data Regimes: A Comparative Study
by Marco Parrillo, Luigi Laura and Alessandro Manna
Future Internet 2026, 18(4), 192; https://doi.org/10.3390/fi18040192 - 2 Apr 2026
Viewed by 207
Abstract
Image classification in low-data regimes remains a challenging problem, particularly in stylized visual domains where intra-class similarity and inter-class feature overlap limit discriminative capacity. This study presents a systematic evaluation of regularization and transfer learning strategies for multi-class comic character recognition under constrained data conditions. Four convolutional architectures are compared: (i) a baseline CNN trained from scratch, (ii) a regularized CNN incorporating data augmentation, dropout, and early stopping, (iii) a pretrained ResNet-50 used as a fixed feature extractor, and (iv) a partially fine-tuned ResNet-50 with selective layer unfreezing. Experiments are conducted on a custom four-class dataset exhibiting moderate class imbalance, evaluated using both a fixed 70/20/10 split and 5-fold cross-validation to assess generalization stability. Results indicate that shallow CNN architectures suffer from substantial overfitting, even when regularization is applied, whereas transfer learning significantly improves macro-averaged F1-score and out-of-distribution detection performance. Cross-validated results, the primary basis for inference given the dataset scale, show that both ResNet-50 strategies achieve equivalent mean accuracy of 95.0% (SD: ±0.4% for feature extraction, ±0.8% for fine-tuning; paired t = 0.00, p = 1.000), while shallow CNN architectures reach only 81–87%. Under a single fixed 70/20/10 partition (n = 69 test samples, 95% CI: ±9–12%), fine-tuning nominally reaches 98.5%; crucially, cross-validation deflates this figure to parity with feature extraction, confirming it reflects favorable partitioning rather than genuine architectural superiority. The primary finding is therefore that frozen ResNet-50 feature extraction is the recommended strategy: it matches fine-tuning in cross-validated generalization while requiring 15× fewer trainable parameters and exhibiting lower fold-to-fold variance. The findings demonstrate that pretrained deep residual representations transfer effectively to stylized comic imagery and that evaluation protocol selection critically impacts perceived performance in small datasets. These results provide practical guidelines for robust model selection in domain-specific, limited-data image classification tasks.
(This article belongs to the Special Issue Innovations in Artificial Intelligence and Neural Networks)
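
The recommended strategy from this study, frozen ResNet-50 feature extraction with a small trainable head, is straightforward to set up. Below is a minimal sketch using torchvision's pretrained-weights API (assumed v0.13+); the 4-class head size and learning rate are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 and freeze the whole backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False

# Replace the classifier head for a hypothetical 4-class character task;
# the new layer is created unfrozen, so only it will be trained.
model.fc = nn.Linear(model.fc.in_features, 4)

# Only the head's parameters reach the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# For the partially fine-tuned variant, also unfreeze the last stage:
# for p in model.layer4.parameters():
#     p.requires_grad = True
```
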

24 pages, 4742 KB  
Article
Comparative Evaluation of YOLOv8 and YOLO11 for Image-Based Classification of Sugar Beet Seed Treatment Levels
by Cihan Unal, Ilkay Cinar, Zulfi Saripinar and Murat Koklu
Sensors 2026, 26(7), 2137; https://doi.org/10.3390/s26072137 - 30 Mar 2026
Viewed by 282
Abstract
This study addresses the automatic classification of sugar beet seeds according to their spraying levels using RGB images, aiming to enable a fast, practical, and non-destructive early warning system without chemical analysis. A dataset of 16,519 seed images acquired under controlled lighting conditions was used to evaluate YOLOv8-CLS and YOLO11-CLS architectures, including the n, s, m, l, and x scale variants within the Ultralytics framework. All experiments were conducted using a 10-fold cross-validation strategy, with models trained under different batch size and learning rate configurations. The results indicate that both architectures achieve reliable performance, with accuracies of approximately 78–83% for YOLOv8-CLS and 80–82% for YOLO11-CLS models. ROC-AUC scores consistently above 0.94 demonstrate strong inter-class discrimination. Misclassification analysis shows that errors mainly occur between visually similar intermediate treatment levels, particularly 25% and 50%. Despite this challenge, low log-loss values and balanced precision–recall profiles indicate stable decision behavior. Overall, the findings confirm that sugar beet seed treatment levels can be effectively distinguished using only RGB imagery, providing a potentially low-cost and scalable approach for early warning and quality control in seed treatment processes.
(This article belongs to the Section Smart Agriculture)
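
Both model families evaluated here are exposed through the Ultralytics Python API, so a comparable experiment can be sketched in a few lines. The dataset folder name below is hypothetical, and the epoch/batch/learning-rate values stand in for the configurations the authors searched.

```python
from ultralytics import YOLO

# Classification datasets are plain image folders:
# seeds/train/<class>/*.jpg and seeds/val/<class>/*.jpg (names hypothetical).
model = YOLO("yolov8n-cls.pt")               # YOLOv8 nano classification weights
model.train(data="seeds", epochs=100,        # epochs / batch / lr0 stand in for
            batch=32, lr0=0.01, imgsz=224)   # the grid the authors evaluated
metrics = model.val()                        # top-1 / top-5 accuracy on val split

# Swapping in "yolo11n-cls.pt" (or the s/m/l/x variants of either family)
# reproduces the architecture comparison at other scales.
```
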

18 pages, 1175 KB  
Article
Cross-Modal Few-Shot Learning via Siamese Similarity Networks on CLIP Embeddings for Fine-Grained Image Classification
by Julius Olaniyan, Silas Formunyuy Verkijika and Ibidun C. Obagbuwa
Appl. Sci. 2026, 16(7), 3181; https://doi.org/10.3390/app16073181 - 26 Mar 2026
Viewed by 256
Abstract
Fine-grained image classification under few-shot learning conditions remains a significant challenge due to limited labeled data and high intra-class similarity. This paper proposes a novel cross-modal framework that integrates Contrastive Language-Image Pretraining (CLIP) embeddings within a Siamese similarity network to enable robust and label-efficient classification. By leveraging the semantic alignment between textual class descriptions and visual representations, the model forms hybrid similarity pairs of image-to-image and image-to-text within a shared latent space, facilitating discriminative learning under low-shot scenarios. The architecture employs a dual-branch CLIP encoder and a contrastive loss function to optimize intra-class compactness and inter-class separability. Experiments conducted on benchmark datasets including miniImageNet and CUB-200-2011 demonstrate substantial improvements over zero-shot and few-shot baselines, achieving 70.32% accuracy, 71.15% F1-score, and 68.47% mAP on 5-way 1-shot and 78.41% accuracy, 79.02% F1-score, and 76.83% mAP on 5-way 5-shot tasks (averaged over 600 episodes with 95% confidence intervals on the CUB-200-2011 dataset). Extended evaluations under 10-way settings show similarly strong performance. Ablation studies further validate the critical roles of contrastive learning, normalization, and cross-modal embeddings in enhancing generalization. This work presents a scalable and interpretable paradigm for fine-grained classification in data-scarce domains.
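
The hybrid image-to-image and image-to-text pairing described here can be illustrated with a Siamese head over precomputed CLIP embeddings. The sketch below assumes 512-dimensional embeddings and uses a generic margin-based contrastive loss; the projection sizes, margin, and pairing scheme are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseSimilarityHead(nn.Module):
    """Shared projection over precomputed CLIP embeddings; a pair (image-image
    or image-text) is scored by cosine similarity in the projected space."""
    def __init__(self, in_dim=512, out_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                  nn.Linear(out_dim, out_dim))

    def forward(self, a, b):
        za = F.normalize(self.proj(a), dim=-1)
        zb = F.normalize(self.proj(b), dim=-1)
        return (za * zb).sum(-1)                 # cosine similarity per pair

def contrastive_loss(sim, same_class, margin=0.5):
    # pull matching pairs toward similarity 1, push mismatches below margin
    return (same_class * (1.0 - sim) +
            (1.0 - same_class) * F.relu(sim - margin)).mean()

head = SiameseSimilarityHead()
img = torch.randn(16, 512)                  # stand-in CLIP image embeddings
txt = torch.randn(16, 512)                  # stand-in CLIP text embeddings
same = torch.randint(0, 2, (16,)).float()   # 1 = pair shares a class
loss = contrastive_loss(head(img, txt), same)
```
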

30 pages, 18176 KB  
Article
CRECA-Net: Class Representation-Enhanced Class-Aware Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Ruolan Liu, Bingcai Chen, Lin Yu and Shaodong Zhang
Remote Sens. 2026, 18(6), 950; https://doi.org/10.3390/rs18060950 - 21 Mar 2026
Viewed by 218
Abstract
High-resolution remote sensing (RS) images exhibit complex backgrounds, large intra-class variability, and low inter-class differences, posing substantial challenges for semantic segmentation. Although existing class-level contextual modeling methods partially alleviate these issues, they often overlook the importance of accurate and discriminative class representations and fail to effectively handle hard samples during training. To address these limitations, we propose CRECA-Net, a class representation-enhanced class-aware network designed from two complementary perspectives: class prototype refinement and difficulty-aware learning. Specifically, we introduce a class prototype refinement (CPR) module that improves class representations through pixel selection, confidence-aware contribution weighting, and an inter-class prototype separation loss, yielding more reliable and discriminative class centers. In addition, class-level context aggregation (CLCA) modules capture pixel-to-class prototype correlations via cross-attention to inject class-aware semantics into decoder features, thereby reducing interference from cluttered backgrounds and visually similar categories. Furthermore, a difficulty-aware (DA) loss dynamically estimates pixel-wise difficulty and redistributes the loss weights within each image, gradually shifting the learning focus from easy to hard samples while maintaining training stability. Extensive experiments on two benchmark RS segmentation datasets demonstrate that CRECA-Net consistently outperforms state-of-the-art methods across multiple evaluation metrics.
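
Of the three components, the difficulty-aware loss is the easiest to convey in code: estimate per-pixel difficulty from the predicted probability of the true class and renormalize loss weights within each image. The sketch below illustrates that mechanism only; the gamma exponent and the renormalization rule are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def difficulty_aware_loss(logits, target, gamma=1.0):
    """Difficulty-aware segmentation loss sketch.
    logits: (B, C, H, W), target: (B, H, W) integer labels."""
    ce = F.cross_entropy(logits, target, reduction="none")        # (B, H, W)
    with torch.no_grad():
        # probability assigned to the true class at every pixel
        p_true = F.softmax(logits, dim=1).gather(
            1, target.unsqueeze(1)).squeeze(1)                    # (B, H, W)
        difficulty = (1.0 - p_true) ** gamma
        # renormalize weights per image so the loss scale stays stable
        w = difficulty / difficulty.flatten(1).mean(dim=1).clamp_min(1e-8)[:, None, None]
    return (w * ce).mean()

loss = difficulty_aware_loss(torch.randn(2, 6, 64, 64),
                             torch.randint(0, 6, (2, 64, 64)))
```
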

34 pages, 12105 KB  
Article
A Hybrid MIL Architecture for Multi-Class Classification of Bacterial Microscopic Images
by Aisulu Ismailova, Gulbanu Yessenbayeva, Kuanysh Kadirkulov, Raushan Moldasheva, Elmira Eldarova, Gulnaz Zhilkishbayeva, Shynar Kodanova, Shynar Yelezhanova, Valentina Makhatova and Alexander Nedzved
Computers 2026, 15(3), 180; https://doi.org/10.3390/computers15030180 - 10 Mar 2026
Viewed by 352
Abstract
This paper addresses the problem of multi-class classification of bacterial microscopic images using a rigorous experimental protocol designed to prevent information leakage and improve performance. The dataset consists of 2034 images representing 33 taxa, organized by class. Data integrity checks confirmed the absence of corrupted or unreadable files. To formalize image characteristics and ensure quality control, indirect geometric and textural features were calculated, including minimum frame size, brightness statistics (mean and standard deviation), Shannon entropy, Laplace variance, and Sobel gradient energy. Quality checks revealed a small proportion of images with extreme brightness (2.5074%), while no samples with critically low sharpness were detected under the selected criteria. Statistical analysis of interclass differences using the Kruskal–Wallis test with multiple comparison correction demonstrated the high discriminatory power of texture features, specifically gradient energy (ε2 = 0.819987) and Laplace variance (ε2 = 0.709904). Feature correlations were consistent with their physical interpretation, revealing a strong positive relationship between sharpness and gradient energy. Principal component analysis confirmed a strong structural pattern, with the first two components explaining 75.5766% of the total variance. For a unified comparison, classical machine learning, transfer learning, and modern deep architectures were evaluated within a single protocol.
(This article belongs to the Special Issue Machine Learning: Innovation, Implementation, and Impact)
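
The quality-control descriptors named here (brightness statistics, Shannon entropy, Laplacian variance, Sobel gradient energy) and the Kruskal–Wallis test are all standard and reproducible with NumPy/SciPy. A sketch, with stand-in random data where the bacterial images and per-taxon feature samples would go:

```python
import numpy as np
from scipy import ndimage, stats

def quality_features(img: np.ndarray) -> dict:
    """Indirect quality descriptors for a grayscale image in [0, 255]."""
    img = img.astype(float)
    hist, _ = np.histogram(img, bins=256, range=(0, 256), density=True)
    p = hist[hist > 0]                                   # nonzero bin probabilities
    gx, gy = ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1)
    return {
        "mean_brightness": float(img.mean()),
        "std_brightness": float(img.std()),
        "shannon_entropy": float(-(p * np.log2(p)).sum()),
        "laplace_variance": float(ndimage.laplace(img).var()),
        "sobel_gradient_energy": float((gx ** 2 + gy ** 2).mean()),
    }

feats = quality_features(np.random.rand(256, 256) * 255)   # stand-in image

# Kruskal-Wallis test of one feature across taxa groups (stand-in samples):
groups = [np.random.rand(30) for _ in range(3)]
h_stat, p_value = stats.kruskal(*groups)
```
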

21 pages, 6787 KB  
Article
Seeing What’s on the Plate: Composition-Aware Fine-Grained Food Recognition for Dietary Analysis
by Linghui Ye, Qingbing Sang and Zhiyong Xiao
Foods 2026, 15(5), 931; https://doi.org/10.3390/foods15050931 - 6 Mar 2026
Viewed by 471
Abstract
Reliable visual characterization of food composition is a fundamental prerequisite for image-based dietary assessment and health-oriented food analysis. In fine-grained food recognition, models often suffer from large intra-class variation and small inter-class differences: visually similar dishes differ only subtly in ingredient composition, spatial distribution, and structural organization, yet these differences are closely associated with distinct nutritional characteristics and health relevance. Capturing such composition-related visual structures in a non-invasive manner remains challenging. In this work, we propose a fine-grained food classification framework that enhances spatial relation modeling and key-region awareness to improve discriminative feature representation. The proposed approach strengthens sensitivity to composition-related visual cues while effectively suppressing background interference. A lightweight multi-branch fusion strategy is further introduced for the stable integration of heterogeneous features. Moreover, to support reliable classification under large intra-class variation, a token-aware subcenter-based classification head is designed. The proposed framework is evaluated on the public FoodX-251 and UEC Food-256 datasets, achieving accuracies of 82.28% and 82.64%, respectively. Beyond benchmark performance, the framework is designed to support practical image-based dietary analysis under real-world dining conditions, where variations in appearance, viewpoint, and background are common. By enabling stable recognition of the same food category across diverse acquisition conditions and accurate discrimination among visually similar dishes with different ingredient compositions, the proposed approach provides reliable food characterization for dietary interpretation, supporting practical dietary monitoring and health-oriented food analysis.
(This article belongs to the Special Issue Digital, Computational, and Learning Technologies for Food Analysis)
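
The subcenter-based classification head can be sketched independently of the rest of the framework: each class owns several weight vectors, and the class logit is the best cosine match among them, which accommodates multimodal intra-class appearance. The subcenter count and scale below are assumptions, and the token-aware part of the paper's head is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubcenterHead(nn.Module):
    """Subcenter classification head sketch: K weight vectors per class,
    class logit = scaled best cosine similarity over its subcenters."""
    def __init__(self, dim, num_classes, k=3, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, k, dim))
        self.scale = scale

    def forward(self, x):
        x = F.normalize(x, dim=-1)                      # (B, D) features
        w = F.normalize(self.weight, dim=-1)            # (C, K, D)
        cos = torch.einsum("bd,ckd->bck", x, w)         # (B, C, K)
        return self.scale * cos.max(dim=-1).values      # (B, C) logits

head = SubcenterHead(dim=768, num_classes=251)          # e.g., FoodX-251 classes
logits = head(torch.randn(8, 768))
```
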

20 pages, 4711 KB  
Article
SBMN: Similarity-Based Memory Network for the Diagnosis of Vertical Root Fracture in Dental Imaging
by Jie Wang, Xin Yan Jin, Yi Fan Zhang, Jie Yuan, Zi Tong Lin and Ying Chen
Diagnostics 2026, 16(5), 710; https://doi.org/10.3390/diagnostics16050710 - 27 Feb 2026
Viewed by 238
Abstract
Background/Objectives: Medical image analysis of vertical root fractures (VRFs) is challenged by limited annotated data, class imbalance, and subtle inter-class differences. To address these issues, we propose SBMN, a Similarity-Based Memory Network that integrates Category Memory with the Basic SBMN Module and a similarity-based classifier. Methods: SBMN stores representative features for each class and leverages similarity-based gating to enhance feature discrimination. Experiments were conducted on a CBCT dataset of fractured and non-fractured teeth to evaluate performance. Results: SBMN achieved up to 97.1% and 99.7% classification accuracy on automatically and manually segmented images, respectively. Memory manipulation experiments confirm the critical role of Category Memory in controlling classification outcomes. Conclusions: These results indicate that SBMN offers an effective and interpretable approach for small-sample medical image classification and diagnosis.
(This article belongs to the Special Issue Application of Artificial Intelligence to Oral Diseases)
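
The similarity-based classification step, matching a query feature against stored per-class representatives, can be shown in isolation. This toy sketch covers only that step; SBMN's similarity-based gating inside the network is not reproduced, and the feature dimensions and exemplar counts are arbitrary.

```python
import torch
import torch.nn.functional as F

class CategoryMemoryClassifier:
    """Toy category-memory classifier: store a few representative feature
    vectors per class; label a query by its highest cosine similarity to
    any stored entry."""
    def __init__(self):
        self.memory = {}                       # class id -> (N_c, D) tensor

    def store(self, label: int, feats: torch.Tensor):
        self.memory[label] = F.normalize(feats, dim=-1)

    def predict(self, query: torch.Tensor) -> int:
        q = F.normalize(query, dim=-1)
        scores = {c: (m @ q).max().item() for c, m in self.memory.items()}
        return max(scores, key=scores.get)

clf = CategoryMemoryClassifier()
clf.store(0, torch.randn(5, 128))              # stand-in non-fractured exemplars
clf.store(1, torch.randn(5, 128))              # stand-in fractured exemplars
pred = clf.predict(torch.randn(128))
```
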

28 pages, 2771 KB  
Article
Improving Tree-Based Lung Disease Classification from Chest X-Ray Images Using Deep Feature Representations
by Abdulaziz A. Alsulami, Qasem Abu Al-Haija, Rayed Alakhtar, Huda Alsobhi, Rayan A. Alsemmeari, Badraddin Alturki and Ahmad J. Tayeb
Bioengineering 2026, 13(3), 267; https://doi.org/10.3390/bioengineering13030267 - 25 Feb 2026
Viewed by 527
Abstract
Healthcare systems worldwide face increasing pressure to deliver accurate, affordable, and scalable diagnostic services while maintaining long-term sustainability. Chest X-ray screening is considered one of the most cost-effective methods for detecting lung disease. However, many deep learning approaches are computationally intensive and difficult to interpret, which limits their adoption in high-throughput, resource-constrained clinical settings. This study proposes a hybrid CNN–tree framework for automated lung disease classification from chest X-ray images, which targets COVID-19, pneumonia, tuberculosis, lung cancer, and normal cases. To ensure robustness and generalization, four publicly available chest X-ray datasets from different sources are merged into a unified five-class dataset, which introduces realistic variations in imaging conditions and patient populations. A ResNet-18 model is fine-tuned to extract domain-specific deep feature representations. Feature dimensionality and redundancy are reduced using Principal Component Analysis, while class imbalance is addressed through the Synthetic Minority Over-sampling Technique. The resulting compact feature vectors are used to train interpretable tree-based classifiers, which include Decision Tree, Random Forest, and XGBoost. Experiments conducted using five-fold stratified cross-validation demonstrate substantial and consistent performance gains: when trained on fine-tuned and preprocessed deep features, all evaluated tree-based classifiers achieve weighted F1-scores between 0.977 and 0.982, with a significant reduction in inter-class confusion. In addition, the proposed framework maintains low per-sample inference latency, which supports energy-efficient and scalable deployment. These results indicate that combining deep feature learning with interpretable tree-based models provides a practical and reliable solution for sustainable chest X-ray screening in real-world clinical environments.
(This article belongs to the Section Biosignal Processing)
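
The feature-level pipeline described (deep features, then PCA, then SMOTE, then a tree ensemble under stratified five-fold CV) maps directly onto scikit-learn and imbalanced-learn. A sketch with random stand-in features; using imblearn's Pipeline ensures SMOTE resamples only the training folds.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline          # resamples only inside training folds
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X would be deep features from a fine-tuned ResNet-18; random stand-ins here.
X = np.random.rand(500, 512)
y = np.random.randint(0, 5, 500)                # five classes, as in the study

pipe = Pipeline([
    ("pca", PCA(n_components=0.95)),            # keep 95% of explained variance
    ("smote", SMOTE(random_state=0)),           # oversample minority classes
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_weighted")
```
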

28 pages, 4533 KB  
Article
SFCF-Net: Spatial-Frequency Synergistic Learning for Casting Defect Segmentation of Pre-Service Aircraft Engine Blades in Industrial Radiographic Inspection
by Shun Wang, Zhiying Sun, Xifeng Fang and Dejun Cheng
Sensors 2026, 26(5), 1416; https://doi.org/10.3390/s26051416 - 24 Feb 2026
Viewed by 395
Abstract
Turbine blades serve as critical components in aircraft engines, yet casting defects inevitably arise during manufacturing. Therefore, accurate pre-service turbine blade defect detection is critical for aircraft engine safety. However, existing deep learning-based detection methods face several challenges: poor image quality, intraclass variance, interclass similarity, and irregular defect geometries. Moreover, most existing defect detection methods rely primarily on spatial-domain features, which are insufficient for capturing fine-grained texture information, limiting their ability to discriminate complex defect patterns. To address these challenges, we propose a novel Spatial-Frequency Complementary Fusion Network (SFCF-Net) that synergistically integrates spatial and frequency-domain features through complementary cross-modal fusion for accurate defect segmentation. First, we introduce a Selective Cross-modal Calibration (SCC) module that selectively calibrates spatial-frequency features through gated cross-modal interactions, effectively preserving fine-grained details under poor image conditions. Next, we propose a Cross-modal Refinement and Complementation (CRC) module that employs dual-stage attention mechanisms to model intra- and inter-modal feature dependencies, enabling robust discrimination between similar defect categories while maintaining consistency within the same defect class. Finally, we propose an Asymmetric Window Attention (AWA) module that employs bidirectional rectangular windows for accurate defect geometric characterization. Comprehensive experiments on the Aero-engine Turbine Blade Casting Defect Segmentation (ATBCD-Seg) dataset and a public benchmark demonstrate that SFCF-Net consistently outperforms state-of-the-art methods across multiple evaluation metrics, meeting practical requirements for automated quality control in blade manufacturing.
(This article belongs to the Section Fault Diagnosis & Sensors)
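
A toy version of spatial-frequency fusion can be built from a 2-D FFT branch and a learned per-location gate. This is only a stand-in for the paper's SCC module, with the log-magnitude frequency representation and 1×1 convolutions chosen for simplicity, not taken from the paper.

```python
import torch
import torch.nn as nn

class SpatialFrequencyGate(nn.Module):
    """Gated spatial/frequency fusion sketch: a frequency branch built on the
    2-D FFT magnitude is combined with the spatial feature map through a
    learned per-location gate."""
    def __init__(self, channels: int):
        super().__init__()
        self.freq_conv = nn.Conv2d(channels, channels, 1)
        self.gate = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        # frequency-domain view: FFT magnitude, log-scaled for stability
        freq = torch.log1p(torch.fft.fft2(x, norm="ortho").abs())
        freq = self.freq_conv(freq)
        g = torch.sigmoid(self.gate(torch.cat([x, freq], dim=1)))
        return g * x + (1.0 - g) * freq        # per-pixel convex combination

fused = SpatialFrequencyGate(64)(torch.randn(2, 64, 32, 32))
```
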

23 pages, 26789 KB  
Article
DermaCalibra: A Robust and Explainable Multimodal Framework for Skin Lesion Diagnosis via Bayesian Uncertainty and Dynamic Modulation
by Ben Wang, Qingjun Niu, Chengying She, Jialu Zhang, Wei Gao and Lizhuang Liu
Diagnostics 2026, 16(4), 630; https://doi.org/10.3390/diagnostics16040630 - 21 Feb 2026
Viewed by 411
Abstract
Background: Accurate and timely diagnosis of skin lesions, including Melanoma (MEL), Basal Cell Carcinoma (BCC), Squamous Cell Carcinoma (SCC), Actinic Keratosis (ACK), Seborrheic Keratosis (SEK), and Nevus (NEV), is often hindered by the severe class imbalance and high morphological similarity among pathologies in clinical practice. Although multimodal learning has shown potential in resolving these issues, existing approaches often fail to address predictive uncertainty or effectively integrate heterogeneous clinical metadata. Therefore, this study proposes DermaCalibra, a robust and explainable multimodal framework optimized for small-scale, imbalanced clinical datasets. Methods: The proposed framework integrates three essential modules: First, the Attention-Based Multimodal Channel Recalibration (AMCR) module introduces a probabilistic Bayesian uncertainty estimation mechanism via Monte Carlo dropout to adjust focal loss weights, prioritizing features from underrepresented classes. Second, the Metadata-Driven Dynamic Feature Modulation and Cross-Attention Fusion (MDFM-CAF) module, designed to resolve inter-class visual ambiguity, dynamically rescales dermoscopic feature maps using non-linear clinical context transformations. Lastly, the Gradient Feature Attribution (GFA) module is implemented to provide pixel-level diagnostic heatmaps and metadata importance scores. Results: Evaluated on the PAD-UFES-20 dataset, DermaCalibra achieves a balanced accuracy (BACC) of 84.2%, outperforming current state-of-the-art (SOTA) methods by 3.6%, and a Macro Area Under the Receiver Operating Characteristic Curve (Macro AUC) of 96.9%. Extensive external validation on unseen hospital and synthetic datasets confirms robust generalizability across diverse clinical settings without the need for retraining. Conclusions: DermaCalibra effectively bridges the gap between deep learning complexity and clinical intuition through uncertainty-aware reasoning and transparent interpretability. The framework provides a reliable and scalable computer-aided diagnostic tool for early skin lesion detection, particularly in resource-limited clinical environments.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
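
The AMCR ingredient, Monte Carlo dropout uncertainty steering focal-loss weights, can be sketched generically. The pass count, entropy-based weighting rule, and toy model below are assumptions; the paper's exact recalibration is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mc_dropout_uncertainty(model, x, passes=10):
    """Predictive uncertainty via Monte Carlo dropout: keep dropout active,
    average several stochastic softmax passes, and use the predictive
    entropy as the per-sample uncertainty score."""
    model.train()                                # keeps dropout layers active
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(passes)])
    mean_p = probs.mean(0)                       # (B, C) predictive distribution
    entropy = -(mean_p * mean_p.clamp_min(1e-8).log()).sum(-1)   # (B,)
    return mean_p, entropy

def uncertainty_weighted_focal_loss(logits, target, entropy, gamma=2.0):
    # Scale focal loss by normalized per-sample uncertainty, so ambiguous
    # (often minority-class) samples receive larger gradients.
    ce = F.cross_entropy(logits, target, reduction="none")
    p_t = torch.exp(-ce)
    focal = (1 - p_t) ** gamma * ce
    w = 1.0 + entropy / entropy.max().clamp_min(1e-8)
    return (w * focal).mean()

# usage with a toy 6-class model (six lesion classes, as in the paper)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 6))
x, y = torch.randn(8, 32), torch.randint(0, 6, (8,))
_, ent = mc_dropout_uncertainty(model, x)
loss = uncertainty_weighted_focal_loss(model(x), y, ent)
```
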

24 pages, 6691 KB  
Article
GISLC: Gated-Inception Model for Skin Lesion Classification
by Tamam Alsarhan, Mohammad Kamal Abdulaziz, Ahmad Ali, Ayoub Alsarhan, Sami Aziz Alshammari, Rahaf R. Alshammari, Nayef H. Alshammari and Khalid Hamad Alnafisah
Electronics 2026, 15(4), 861; https://doi.org/10.3390/electronics15040861 - 18 Feb 2026
Viewed by 373
Abstract
Skin-lesion recognition from clinical photographs is clinically valuable yet computationally challenging due to large intra-class variation, subtle inter-class boundaries, class imbalance, and heterogeneous acquisition conditions. To address these constraints under realistic compute budgets, we investigate Inception-family convolutional baselines and propose GISLC, a Gated-Inception model that augments a GoogLeNet/Inception-V1 backbone with a lightweight spatial gating head inspired by ConvLSTM. Unlike static fusion (concatenation/summation) of multi-branch features, the proposed gated head performs per-location, learnable regulation of feature flow across branches, prioritizing diagnostically salient patterns while suppressing redundant activations. Experiments were conducted on the clinical-images subset of the Multimodal Augmented Skin Lesion Dataset (MASLD), an augmented derivative of HAM10000, using stratified train/validation/test splits, clinically motivated augmentation, and class-weighted optimization to mitigate skewed label frequencies. A controlled ablation study evaluates backbone choices and optimization settings and isolates the contribution of gated fusion relative to standard Inception heads. Across runs, the gated fusion strategy improves discriminative performance while remaining parameter-efficient, supporting the view that spatially adaptive regulation can enhance robustness on non-dermatoscopic clinical imagery. We further outline practical steps for calibration analysis and compression-aware deployment in clinical and edge settings.
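
The contrast the paper draws between static concatenation and gated fusion is easy to see in code: compute a per-location sigmoid gate for every branch from the concatenated features, then reweight each branch before merging. A minimal sketch; the branch shapes and single-convolution gate are illustrative, and the ConvLSTM-inspired details are omitted.

```python
import torch
import torch.nn as nn

class GatedBranchFusion(nn.Module):
    """Spatially gated multi-branch fusion sketch: each Inception-style
    branch gets a per-location sigmoid gate computed from all branches,
    so informative activations pass and redundant ones are suppressed."""
    def __init__(self, channels_per_branch: int, num_branches: int):
        super().__init__()
        total = channels_per_branch * num_branches
        self.gate = nn.Conv2d(total, num_branches, kernel_size=1)

    def forward(self, branches):                     # list of (B, C, H, W)
        stacked = torch.cat(branches, dim=1)         # (B, n*C, H, W)
        gates = torch.sigmoid(self.gate(stacked))    # (B, n, H, W), one gate/branch
        gated = [g.unsqueeze(1) * b
                 for g, b in zip(gates.unbind(dim=1), branches)]
        return torch.cat(gated, dim=1)               # gated concatenation

branches = [torch.randn(2, 32, 14, 14) for _ in range(4)]
out = GatedBranchFusion(32, 4)(branches)             # (2, 128, 14, 14)
```
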

23 pages, 5641 KB  
Article
Lightweight Multi-Scale Framework for Human Pose and Action Classification
by Alireza Saber, Mohammad-Mehdi Hosseini, Amirreza Fateh, Mansoor Fateh and Vahid Abolghasemi
Sensors 2026, 26(4), 1102; https://doi.org/10.3390/s26041102 - 8 Feb 2026
Viewed by 467
Abstract
Human pose classification, along with related tasks such as action recognition, is a crucial area in deep learning due to its wide range of applications in assisting human activities. Despite significant progress, it remains a challenging problem because of high inter-class similarity, dataset noise, and the large variability in human poses. In this paper, we propose a lightweight yet highly effective modular attention-based architecture for human pose classification, built upon a Swin Transformer backbone for robust multi-scale feature extraction. The proposed design integrates the Spatial Attention module, the Context-Aware Channel Attention Module, and a novel Dual Weighted Cross Attention module, enabling effective fusion of spatial and channel-wise cues. Additionally, explainable AI techniques are employed to improve the reliability and interpretability of the model. We train and evaluate our approach on two distinct datasets: Yoga-82 (in both main-class and subclass configurations) and Stanford 40 Actions. Experimental results show that our model outperforms state-of-the-art baselines across accuracy, precision, recall, F1-score, and mean average precision, while maintaining an extremely low parameter count of only 0.79 million. Specifically, our method achieves accuracies of 90.40% and 87.44% for the 6-class and 20-class Yoga-82 configurations, respectively, and 94.28% for the Stanford 40 Actions dataset.
(This article belongs to the Section Sensing and Imaging)
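
Two of the named ingredients, channel attention from pooled statistics and a convolutional spatial attention map, follow well-known squeeze-and-excitation/CBAM patterns. The sketch below shows only those basics; the paper's Dual Weighted Cross Attention module and Swin backbone are not reproduced, and the reduction ratio and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Channel attention from globally pooled statistics followed by a
    convolutional spatial attention map (CBAM-style sketch)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                               # reweight channels
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * self.spatial(pooled)                       # reweight locations

out = SpatialChannelAttention(96)(torch.randn(2, 96, 28, 28))
```
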

19 pages, 3447 KB  
Article
Hybrid Decoding with Co-Occurrence Awareness for Fine-Grained Food Image Segmentation
by Shenglong Wang and Guorui Sheng
Foods 2026, 15(3), 534; https://doi.org/10.3390/foods15030534 - 3 Feb 2026
Viewed by 330
Abstract
Fine-grained food image segmentation is essential for accurate dietary assessment and nutritional analysis, yet remains highly challenging due to ambiguous boundaries, inter-class similarity, and the dense layouts of meals containing many different ingredients in real-world settings. Existing methods based solely on CNNs, Transformers, or Mamba architectures often fail to simultaneously preserve fine-grained local details and capture contextual dependencies over long distances. To address these limitations, we propose HDF (Hybrid Decoder for Food Image Segmentation), a novel decoding framework built upon the MambaVision backbone. Our approach first employs a convolution-based feature pyramid network (FPN) to extract multi-stage features from the encoder. These features are then thoroughly fused across scales using a Cross-Layer Mamba module that models inter-level dependencies with linear complexity. Subsequently, an Attention Refinement module integrates global semantic context through spatial–channel reweighting. Finally, a Food Co-occurrence Module explicitly enhances food-specific semantics by learning dynamic co-occurrence patterns among categories, improving segmentation of visually similar or frequently co-occurring ingredients. Evaluated on two standard benchmarks for fine-grained food segmentation, FoodSeg103 and UEC-FoodPIX Complete, HDF achieves a 52.25% mean Intersection-over-Union (mIoU) on FoodSeg103 and a 76.16% mIoU on UEC-FoodPIX Complete, outperforming current state-of-the-art methods by a clear margin. These results demonstrate that HDF’s hybrid design and explicit co-occurrence awareness effectively address key challenges in food image segmentation, providing a robust foundation for practical applications in dietary logging, nutritional estimation, and food safety inspection.
(This article belongs to the Section Food Analytical Methods)
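
The co-occurrence idea, letting evidence for one food category boost categories that frequently appear with it, can be expressed as a learnable class-by-class matrix applied to pooled logits. This is a deliberately simplified stand-in for the paper's Food Co-occurrence Module; the pooling, mixing weight, and matrix parameterization are assumptions.

```python
import torch
import torch.nn as nn

class CoOccurrenceModule(nn.Module):
    """Co-occurrence-aware refinement sketch: a learnable class-by-class
    matrix propagates image-level evidence between categories that tend to
    appear together, nudging segmentation logits accordingly."""
    def __init__(self, num_classes: int, alpha: float = 0.1):
        super().__init__()
        self.cooc = nn.Parameter(torch.zeros(num_classes, num_classes))
        self.alpha = alpha

    def forward(self, logits):
        # logits: (B, C, H, W); image-level class evidence via spatial pooling
        evidence = torch.softmax(logits.mean(dim=(2, 3)), dim=-1)   # (B, C)
        boost = evidence @ self.cooc                                # (B, C)
        return logits + self.alpha * boost[:, :, None, None]

refined = CoOccurrenceModule(num_classes=104)(torch.randn(2, 104, 64, 64))
```
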

20 pages, 59693 KB  
Article
GPRAformer: A Geometry-Prior Rational-Activation Transformer for Denoising Multibeam Sonar Point Clouds of Exposed Subsea Pipelines
by Jingyao Zhang, Song Dai, Weihua Jiang, Xuerong Cui and Juan Li
Remote Sens. 2026, 18(3), 439; https://doi.org/10.3390/rs18030439 - 30 Jan 2026
Viewed by 400
Abstract
The detection of exposed subsea pipelines is a key task in current marine remote sensing, and multibeam echosounders (MBESs) are a primary instrument for detecting exposed pipelines. However, complex seabed environments interfere with acoustic echoes, introducing substantial noise points into MBES point-cloud data and substantially degrading its quality. Conventional point-cloud denoising methods struggle to suppress noise while simultaneously preserving pipeline integrity, whereas point-cloud noise-segmentation methods can better address this challenge. Nevertheless, noise-segmentation methods remain constrained by the lack of geometric priors and the presence of class imbalance. To address these issues, this paper proposes GPRAformer, a geometry-prior, rational-activation Transformer for MBES point-cloud denoising of exposed subsea pipelines. The method comprises three core designs: a pipeline-informed prior encoder (PIPE) sampling module that enhances the separability between pipeline points and noise points; a rational-activated Kolmogorov–Arnold network transformer (RaKANsformer) feature extraction module that couples gated self-attention with KAN structures using rational-function activations for joint feature extraction, thereby strengthening global dependency modeling and nonlinear expressivity; and a class-adaptive loss (CAL)-constrained noise-segmentation module that introduces intra-class consistency and inter-class separation constraints to mitigate the false and missed detections arising from class imbalance. Evaluations on measured MBES point-cloud datasets show that, compared with the second-best model under each metric, GPRAformer achieves improvements of 6.83%, 1.78%, 5.12%, and 6.20% in mean intersection over union (mIoU), Accuracy, F1-score, and Recall, respectively. These results indicate a significant enhancement in overall segmentation performance. Therefore, GPRAformer can achieve high-precision and robust MBES point-cloud noise segmentation in complex seabed environments.
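
The CAL module's stated ingredients (class-imbalance weighting, intra-class consistency, inter-class separation) can be combined into a small loss sketch for the two-class pipeline-vs-noise setting. The weighting formula, margin, and term balance below are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def class_adaptive_loss(feats, logits, target, beta=0.1, margin=0.5):
    """Sketch: inverse-frequency weighted cross-entropy plus an intra-class
    consistency term (features pulled toward their class mean) and an
    inter-class separation hinge (class means pushed at least `margin`
    apart). Assumes both classes appear in every batch.
    feats: (N, D) point features, logits: (N, 2), target: (N,) in {0, 1}."""
    counts = torch.bincount(target, minlength=2).float()
    weights = counts.sum() / (counts.clamp_min(1) * 2)     # inverse class frequency
    ce = F.cross_entropy(logits, target, weight=weights)

    means = torch.stack([feats[target == c].mean(0) for c in (0, 1)])
    intra = ((feats - means[target]) ** 2).sum(-1).mean()  # intra-class consistency
    inter = F.relu(margin - (means[0] - means[1]).norm())  # inter-class separation
    return ce + beta * (intra + inter)

feats, logits = torch.randn(200, 32), torch.randn(200, 2)
target = (torch.rand(200) > 0.8).long()    # imbalanced: ~20% pipeline points
loss = class_adaptive_loss(feats, logits, target)
```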