Search Results (241)

Search Parameters:
Keywords = inter-class similarity

24 pages, 14242 KB  
Article
DBA-YOLO: A Dense Target Detection Model Based on Lightweight Neural Networks
by Zhiyong He, Jiahong Yang, Hongtian Ning, Chengxuan Li and Qiang Tang
J. Imaging 2025, 11(10), 345; https://doi.org/10.3390/jimaging11100345 - 4 Oct 2025
Abstract
Current deep learning-based dense target detection models face dual challenges in industrial scenarios: high computational complexity leading to insufficient inference efficiency on mobile devices, and missed/false detections caused by dense small targets, high inter-class similarity, and complex background interference. To address these issues, this paper proposes DBA-YOLO, a lightweight model based on YOLOv10, which significantly reduces computational complexity through model compression and algorithm optimization while maintaining high accuracy. Key improvements include the following: (1) a C2f PA module for enhanced feature extraction, (2) a parameter-refined BIMAFPN neck structure to improve small target detection, and (3) a DyDHead module integrating scale, space, and task awareness for spatial feature weighting. To validate DBA-YOLO, we constructed a real-world dataset from cigarette package images. Experiments on SKU-110K and our dataset show that DBA-YOLO achieves 91.3% detection accuracy (1.4% higher than baseline), with mAP and mAP75 improvements of 2–3%. Additionally, the model reduces parameters by 3.6%, balancing efficiency and performance for resource-constrained devices.
(This article belongs to the Section Computer Vision and Pattern Recognition)

18 pages, 2628 KB  
Article
Importance-Weighted Locally Adaptive Prototype Extraction Network for Few-Shot Detection
by Haibin Wang, Yong Tao, Zhou Zhou, Yue Wang, Xu Fan and Xiangjun Wang
Sensors 2025, 25(19), 5945; https://doi.org/10.3390/s25195945 - 23 Sep 2025
Abstract
Few-Shot Object Detection (FSOD) aims to identify new object categories with a limited amount of labeled data, which holds broad application prospects in real-life scenarios. Previous approaches usually ignore attention to critical information, which leads to the generation of low-quality prototypes and suboptimal performance in few-shot scenarios. To overcome this defect, an improved FSOD network is proposed in this paper, which mimics the human visual attention mechanism by emphasizing areas that are semantically important and rich in spatial information. Specifically, an Importance-Weighted Local Adaptive Prototype module is first introduced, which highlights key local features of support samples and generates more expressive class prototypes by assigning greater weights to salient regions, so that generalization ability is effectively enhanced under few-shot settings. Secondly, an Imbalanced Diversity Sampling module is utilized to select diverse and challenging negative sample prototypes, which enhances inter-class separability and reduces confusion among visually similar categories. Moreover, a Weighted Non-Linear Fusion module is designed to integrate various forms of feature interaction, whose contributions are modulated by learnable importance weights, improving the effect of feature fusion. Extensive experiments on the PASCAL VOC and MS COCO benchmarks validate the effectiveness of our method: mean average precision improves by 2.84% on PASCAL VOC compared with Fine-Grained Prototypes Distillation (FPD), and AP surpasses the FPD baseline by 0.8% and 1.8% on MS COCO.
(This article belongs to the Section Intelligent Sensors)
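
The core of the prototype module is a weighted average over support features, with larger weights on salient regions. Below is a minimal sketch of importance-weighted prototype extraction, assuming softmax-normalized saliency scores stand in for the paper's learned importance weights; function and variable names are illustrative, not from the paper.

```python
import torch

def weighted_prototype(support_feats: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
    """
    support_feats: (K, H*W, C) local features of K support images for one class
    saliency:      (K, H*W)   importance score for each local feature
    Returns a (C,) class prototype that emphasizes salient regions.
    """
    # Normalize importance over all locations of all support images.
    w = torch.softmax(saliency.flatten(), dim=0).view_as(saliency)
    # Importance-weighted average of local features -> class prototype.
    proto = (support_feats * w.unsqueeze(-1)).sum(dim=(0, 1))
    return proto
```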

22 pages, 3632 KB  
Article
RFR-YOLO-Based Recognition Method for Dairy Cow Behavior in Farming Environments
by Congcong Li, Jialong Ma, Shifeng Cao and Leifeng Guo
Agriculture 2025, 15(18), 1952; https://doi.org/10.3390/agriculture15181952 - 15 Sep 2025
Abstract
Cow behavior recognition constitutes a fundamental element of effective cow health monitoring and intelligent farming systems. Within large-scale cow farming environments, several critical challenges persist, including the difficulty in accurately capturing behavioral feature information, substantial variations in multi-scale features, and high inter-class similarity among different cow behaviors. To address these limitations, this study introduces an enhanced target detection algorithm for cow behavior recognition, termed RFR-YOLO, which is developed upon the YOLOv11n framework. A well-structured dataset encompassing nine distinct cow behaviors (lying, standing, walking, eating, drinking, licking, grooming, estrus, and limping) is constructed, comprising a total of 13,224 labeled samples. The proposed algorithm incorporates three major technical improvements. First, an Inverted Dilated Convolution module (Region Semantic Inverted Convolution, RsiConv) is designed and seamlessly integrated with the C3K2 module to form the C3K2_Rsi module, which effectively reduces computational overhead while enhancing feature representation. Second, a Four-branch Multi-Scale Dilated Attention mechanism (FMSDA) is incorporated into the network architecture, aligning scale-specific features with the corresponding receptive fields and thereby improving the model's capacity to capture multi-scale characteristics. Third, a Reparameterized Generalized Residual Feature Pyramid Network (RepGRFPN) is introduced as the neck component, allowing features to propagate through differentiated pathways and enabling flexible control over multi-scale feature expression, which facilitates efficient feature fusion and mitigates the impact of behavioral similarity. The experimental results demonstrate that RFR-YOLO achieves precision, recall, mAP50, and mAP50:95 values of 95.9%, 91.2%, 94.9%, and 85.2%, respectively, representing gains of 5.5%, 5.0%, 5.6%, and 3.5% over the baseline model. Despite a marginal 1.4 G increase in computational complexity, the algorithm retains a high detection speed of 147.6 frames per second. The proposed RFR-YOLO algorithm significantly improves the accuracy and robustness of target detection in group cow farming scenarios.
(This article belongs to the Section Farm Animal Production)
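
The abstract does not spell out the FMSDA internals, but a four-branch multi-scale dilated design generally amounts to parallel 3x3 convolutions with growing dilation rates, fused back to the input width. A generic PyTorch sketch under that assumption (not the paper's actual module):

```python
import torch
import torch.nn as nn

class MultiScaleDilated(nn.Module):
    """Four parallel 3x3 convs with increasing dilation; a 1x1 conv fuses the branches."""
    def __init__(self, channels: int, dilations=(1, 2, 3, 4)):
        super().__init__()
        # padding == dilation keeps the spatial size constant for a 3x3 kernel.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different receptive field; concatenate and fuse.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```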

14 pages, 11856 KB  
Article
Few-Shot Fine-Grained Image Classification with Residual Reconstruction Network Based on Feature Enhancement
by Ying Liu, Haibin Zhang and Weidong Zhang
Appl. Sci. 2025, 15(18), 9953; https://doi.org/10.3390/app15189953 - 11 Sep 2025
Abstract
In recent years, few-shot fine-grained image classification has shown great potential in addressing data scarcity and distinguishing highly similar categories. However, existing unidirectional reconstruction methods, while enhancing inter-class differences, fail to effectively suppress intra-class variations; bidirectional reconstruction methods, although alleviating intra-class variations, inevitably introduce background noise. To overcome these limitations, this paper proposes a Bidirectional Feature Reconstruction Network that incorporates a Feature Enhancement Attention Module (FEAM) to highlight discriminative regions and suppress background interference, while integrating a Channel-Aware Spatial Attention (CASA) module to strengthen local feature modeling and compensate for the Transformer's tendency to overemphasize global information. This joint design not only enhances inter-class separability but also effectively reduces intra-class variation. Extensive experiments on the CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches, validating its effectiveness and robustness in few-shot fine-grained image classification.
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
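
Reconstruction-based few-shot classifiers of this family typically score a query by how well each class's support features reconstruct it in closed form. A minimal ridge-regression sketch of that idea, not the paper's exact FEAM/CASA pipeline:

```python
import torch

def reconstruct(query: torch.Tensor, support: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """
    query:   (Nq, C) query local features
    support: (Ns, C) support local features of a single class
    Returns the query features reconstructed from this class's support pool.
    """
    # Solve W = argmin ||Q - W S||^2 + lam ||W||^2  =>  W = Q S^T (S S^T + lam I)^-1
    gram = support @ support.t()
    W = query @ support.t() @ torch.linalg.inv(gram + lam * torch.eye(gram.size(0)))
    return W @ support

# The class whose support set reconstructs the query with the lowest error wins:
# err_c = ((query - reconstruct(query, support_c)) ** 2).mean()
```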

26 pages, 32504 KB  
Article
Smart Tourism Landmark Recognition: A Multi-Threshold Enhancement and Selective Ensemble Approach Using YOLO11
by Ulugbek Hudayberdiev, Junyeong Lee and Odil Fayzullaev
Sustainability 2025, 17(17), 8081; https://doi.org/10.3390/su17178081 - 8 Sep 2025
Abstract
Automated landmark recognition represents a cornerstone technology for advancing smart tourism systems, cultural heritage documentation, and enhanced visitor experiences. Contemporary deep learning methodologies have substantially transformed the accuracy and computational efficiency of destination classification tasks. Addressing critical gaps in existing approaches, we introduce an enhanced Samarkand_v2 dataset encompassing twelve distinct historical landmark categories with comprehensive environmental variability. Our methodology incorporates a systematic multi-threshold pixel intensification strategy, applying graduated enhancement transformations at intensity levels of 100, 150, and 225 to accentuate diverse architectural characteristics, from fine-grained textural elements to prominent reflective components. Four independent YOLO11 architectures were trained using original imagery alongside the systematically enhanced variants, with optimal epoch preservation based on validation performance criteria. A key innovation lies in our selective ensemble mechanism, which exhaustively evaluates model combinations and identifies optimal configurations through data-driven selection rather than conventional uniform weighting schemes. Experimental validation demonstrates substantial performance gains over established baseline architectures and traditional ensemble approaches: 99.24% accuracy, 99.36% precision, 99.40% recall, and 99.36% F1-score. Statistical analysis via paired t-tests validates the significance of the enhancement strategies, particularly the effectiveness of lower-threshold transformations in capturing architectural nuances. The framework exhibits remarkable resilience across challenging conditions, including illumination variations, structural occlusions, and inter-class architectural similarities. These results establish the methodology's potential for practical smart tourism deployment, automated heritage preservation initiatives, and real-time mobile landmark recognition systems.
(This article belongs to the Special Issue Smart and Responsible Tourism: Innovations for a Sustainable Future)
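
The exact enhancement transform is not given in the abstract; one plausible reading of graduated multi-threshold intensification is to amplify only the pixels at or above each threshold, producing one enhanced variant per threshold (the thresholds 100, 150, and 225 are from the abstract; `gain` is an assumed parameter). A hedged NumPy sketch:

```python
import numpy as np

def intensify(img: np.ndarray, threshold: int, gain: float = 1.3) -> np.ndarray:
    """Amplify pixels at or above `threshold`; leave the rest untouched.

    img: uint8 grayscale or RGB image.
    """
    out = img.astype(np.float32)
    mask = out >= threshold
    out[mask] = np.clip(out[mask] * gain, 0, 255)
    return out.astype(np.uint8)

# Three enhanced variants, one per threshold used in the paper,
# each feeding its own YOLO11 training run:
# variants = [intensify(img, t) for t in (100, 150, 225)]
```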

32 pages, 14316 KB  
Article
FewMedical-XJAU: A Challenging Benchmark for Fine-Grained Medicinal Plant Classification
by Tao Zhang, Sheng Huang, Gulimila Kezierbieke, Yeerjiang Halimu and Hui Li
Sensors 2025, 25(17), 5499; https://doi.org/10.3390/s25175499 - 4 Sep 2025
Abstract
Fine-grained plant image classification (FPIC) aims to distinguish plant species with subtle visual differences, but existing datasets often suffer from limited category diversity, homogeneous backgrounds, and insufficient environmental variation, limiting their effectiveness in complex real-world scenarios. To address these challenges, a novel dataset, FewMedical-XJAU, is presented, focusing on rare medicinal plants native to Xinjiang, China. This dataset offers higher intra-class variability, more complex and diverse natural backgrounds, varied shooting angles and lighting conditions, and more rigorous expert annotations, providing a realistic testbed for FPIC tasks. Building on this, an improved method called BDCC (Bilinear Deep Cross-modal Composition) is proposed, which incorporates textual priors into a deep metric learning framework to enhance semantic discrimination. A Class-Aware Structured Text Prompt Construction strategy is introduced to improve the model's semantic understanding, along with a dynamic fusion mechanism to address high inter-class similarity and intra-class variability. In few-shot classification experiments, the method demonstrates superior accuracy and robustness under complex environmental conditions, offering strong support for practical applications of fine-grained classification.
(This article belongs to the Section Smart Agriculture)

18 pages, 1767 KB  
Article
A Blind Few-Shot Learning for Multimodal-Biological Signals with Fractal Dimension Estimation
by Nadeem Ullah, Seung Gu Kim, Jung Soo Kim, Min Su Jeong and Kang Ryoung Park
Fractal Fract. 2025, 9(9), 585; https://doi.org/10.3390/fractalfract9090585 - 3 Sep 2025
Abstract
Improving the decoding accuracy of biological signals has been a research focus for decades to advance the health, automation, and robotic industries. However, challenges like inter-subject variability, data scarcity, and multifunctional variability cause low decoding accuracy, thus hindering the practical deployment of biological signal paradigms. This paper proposes a multifunctional biological signals network (Multi-BioSig-Net) that addresses these issues by devising a novel blind few-shot learning (FSL) technique to quickly adapt to multiple target domains without needing a pre-trained model. Specifically, the proposed multimodal similarity extractor (MMSE) and self-multiple domain adaptation (SMDA) modules address data scarcity and inter-subject variability by exploiting and enhancing the similarity between multimodal samples and by quickly adapting to the target domains through adaptive adjustment of the parameters' weights and positions. For multifunctional learning, we propose an inter-function discriminator (IFD) that discriminates between classes by extracting inter-class common features and then subtracting them from both classes, preventing false predictions caused by overfitting on the common features. Furthermore, we propose a holistic-local fusion (HLF) module that exploits contextual-detailed features to adapt the scale-varying features across multiple functions. In addition, fractal dimension estimation (FDE) was employed for the classification of left-hand motor imagery (LMI) and right-hand motor imagery (RMI), confirming that the proposed method can effectively extract the discriminative features for this task. The effectiveness of the proposed algorithm was assessed quantitatively and statistically against state-of-the-art (SOTA) algorithms on three public datasets, demonstrating that it outperforms the SOTA algorithms.
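
Fractal dimension estimation for signals is commonly done by box counting: cover the normalized curve with grids of increasing resolution and fit the log-log slope. A generic sketch of that standard estimator, not necessarily the paper's FDE formulation:

```python
import numpy as np

def box_counting_dimension(signal: np.ndarray, grid_sizes=(4, 8, 16, 32, 64)) -> float:
    """Box-counting estimate of the fractal dimension of a 1-D signal.

    The signal is scaled into the unit square; for each grid resolution g
    we count the g x g cells the curve visits, then fit log N(g) ~ D log g,
    so the slope of the fit is the dimension estimate D.
    """
    s = signal.astype(float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)
    n = len(s)
    counts = []
    for g in grid_sizes:
        # Cell indices (time bin, amplitude bin) visited by the curve.
        cells = {(i * g // n, min(int(s[i] * g), g - 1)) for i in range(n)}
        counts.append(len(cells))
    D, _ = np.polyfit(np.log(grid_sizes), np.log(counts), 1)
    return float(D)
```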

23 pages, 4776 KB  
Article
Category-Guided Transformer for Semantic Segmentation of High-Resolution Remote Sensing Images
by Yue Ni, Jiahang Liu, Hui Zhang, Weijian Chi and Ji Luan
Remote Sens. 2025, 17(17), 3054; https://doi.org/10.3390/rs17173054 - 2 Sep 2025
Abstract
High-resolution remote sensing images suffer from large intra-class variance, high inter-class similarity, and significant scale variations, leading to incomplete segmentation and imprecise boundaries. Transformer-based methods, despite their strong global modeling capability, often suffer from feature confusion, weak detail representation, and high computational cost. Moreover, existing multi-scale fusion mechanisms are prone to semantic misalignment across levels, hindering effective information integration and reducing boundary clarity. To address these issues, a Category-Guided Transformer (CIGFormer) is proposed. Specifically, the Category-Information-Guided Transformer Module (CIGTM) integrates global and local branches: the global branch combines window-based self-attention (WSAM) and window adaptive pooling self-attention (WAPSAM), using class predictions to enhance global context modeling and reduce intra-class and inter-class confusion, while the local branch extracts multi-scale structural features to refine semantic representation and boundaries. In addition, an Adaptive Wavelet Fusion Module (AWFM) is designed, which leverages wavelet decomposition and channel-spatial joint attention for dynamic multi-scale fusion while preserving structural details. Extensive experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate that CIGFormer, with only 21.50 M parameters, achieves outstanding performance in small object recognition, boundary refinement, and complex scene parsing, showing strong potential for practical applications.
(This article belongs to the Section AI Remote Sensing)
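
The AWFM couples wavelet decomposition with learned channel-spatial attention; a stripped-down static analogue fuses two feature maps by blending their low-frequency bands and keeping the stronger high-frequency details. A sketch with PyWavelets, where the fixed `alpha` blend stands in for the paper's learned attention:

```python
import numpy as np
import pywt

def wavelet_fuse(feat_a: np.ndarray, feat_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Fuse two single-channel feature maps in the wavelet domain.

    Low-frequency (approximation) bands are blended; high-frequency (detail)
    bands take the elementwise maximum so structure from either input survives.
    """
    a_lo, a_hi = pywt.dwt2(feat_a, "haar")
    b_lo, b_hi = pywt.dwt2(feat_b, "haar")
    lo = alpha * a_lo + (1 - alpha) * b_lo
    hi = tuple(np.maximum(da, db) for da, db in zip(a_hi, b_hi))
    return pywt.idwt2((lo, hi), "haar")
```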

19 pages, 4057 KB  
Article
Few-Shot Target Detection Algorithm Based on Adaptive Sampling Meta-DETR
by Zihao Ma, Gang Liu, Zhaoya Tong and Xiaoliang Fan
Electronics 2025, 14(17), 3506; https://doi.org/10.3390/electronics14173506 - 2 Sep 2025
Abstract
Meta-DETR is a few-shot target detection algorithm that combines meta-learning and a transformer architecture to solve the problem of data sample scarcity. The algorithm uses deformable attention to focus the feature learning process more accurately on the target and its surroundings. However, the number of sampling points in the deformable attention is fixed, which limits the effective information involved in feature extraction, resulting in insufficient feature extraction of the target and degraded detection performance. To solve this problem, a Meta-DETR few-shot target detection algorithm based on adaptive sampling deformable attention is proposed. First, the cosine similarity between feature points is calculated from query features that are integrated with support features. Second, the number of related features of each feature point is counted using a similarity threshold. Third, the final number of sampling points for the feature map is calculated using the idea of maximum inter-class variance, achieving adaptive sampling. Finally, the adaptive sampling deformable attention is integrated into Meta-DETR to achieve few-shot target detection. The attention activation maps show that deformable attention based on adaptive sampling pays more attention to the target itself. Compared with Meta-DETR, the proposed algorithm improves the detection accuracy of novel classes by 0.9%, 0.7%, 1.4%, and 2.1% for shots 1, 2, 3, and 10 in partition 1 on the PASCAL VOC dataset; by 3.5%, 0.1%, 5.5%, and 5.7% for shots 2, 3, 5, and 10 in partition 2; and by 1.9%, 1.0%, 2.1%, and 0.1% for shots 2, 3, 5, and 10 in partition 3. Compared with MPF-Net, CRK-Net, and FSCE, the proposed algorithm achieves the best performance and can effectively realize detection under few-shot conditions. In addition, experiments on a self-made infrared dataset further validate the effectiveness of the proposed algorithm.
(This article belongs to the Section Artificial Intelligence)
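
The "maximum inter-class variance" step is Otsu's criterion applied to the distribution of cosine similarities. A sketch of that thresholding step, with usage comments showing how a per-point related-feature count could follow; the surrounding Meta-DETR wiring is omitted:

```python
import torch

def otsu_threshold(values: torch.Tensor, bins: int = 64) -> float:
    """Maximum inter-class variance (Otsu) threshold over a 1-D set of scores."""
    lo, hi = float(values.min()), float(values.max())
    hist = torch.histc(values, bins=bins, min=lo, max=hi)
    p = hist / hist.sum()
    centers = torch.linspace(lo, hi, bins)
    best_t, best_var = lo, -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()       # class weights below/above the cut
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:i] * centers[:i]).sum() / w0  # class means
        mu1 = (p[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2        # inter-class variance
        if var > best_var:
            best_var, best_t = var, float(centers[i])
    return best_t

# f: (N, C) feature points; count related features per point:
# sims = torch.nn.functional.cosine_similarity(f.unsqueeze(1), f.unsqueeze(0), dim=-1)
# related = (sims > otsu_threshold(sims.flatten())).sum(-1)
```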

31 pages, 3554 KB  
Article
FFFNet: A Food Feature Fusion Model with Self-Supervised Clustering for Food Image Recognition
by Zhejun Kuang, Haobo Gao, Jian Zhao, Liu Wang and Lei Sun
Appl. Sci. 2025, 15(17), 9542; https://doi.org/10.3390/app15179542 - 29 Aug 2025
Abstract
With the growing emphasis on healthy eating and nutrition management in modern society, food image recognition has become increasingly important. However, it faces challenges such as large intra-class differences and high inter-class similarities. To tackle these issues, we present a Food Feature Fusion Network (FFFNet), which leverages a multi-head cross-attention mechanism to integrate the local detail-capturing capability of Convolutional Neural Networks with the global modeling capacity of Vision Transformers. This enables the model to capture key discriminative features in challenging food recognition tasks. FFFNet also introduces self-supervised clustering, generating pseudo-labels from the feature space distribution and employing a clustering objective derived from Kullback–Leibler divergence to optimize the feature space. By maximizing similarity between features and their corresponding cluster centers, and minimizing similarity with non-corresponding centers, it promotes intra-class compactness and inter-class separability. We evaluated FFFNet on the ISIA Food-500, ETHZ Food-101, and UEC Food256 datasets, attaining Top-1/Top-5 accuracies of 65.31%/88.94%, 89.98%/98.37%, and 80.91%/94.92%, respectively, outperforming existing approaches.
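
The self-supervised clustering objective described here matches the classic DEC recipe: soft-assign features to cluster centers with a Student's t kernel, sharpen the assignments into a target distribution, and minimize the KL divergence between the two. A sketch under that assumption; FFFNet's exact formulation is not given in the abstract:

```python
import torch
import torch.nn.functional as F

def soft_assign(z: torch.Tensor, centers: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Student's t soft assignment of features z (N, D) to cluster centers (K, D)."""
    d2 = torch.cdist(z, centers) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def clustering_loss(q: torch.Tensor) -> torch.Tensor:
    """KL(P || Q) against the sharpened target distribution P (DEC-style)."""
    p = (q ** 2) / q.sum(dim=0)           # sharpen: emphasize confident assignments
    p = p / p.sum(dim=1, keepdim=True)    # renormalize per sample
    return F.kl_div(q.log(), p.detach(), reduction="batchmean")
```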

25 pages, 3904 KB  
Article
Physics-Guided Multi-Representation Learning with Quadruple Consistency Constraints for Robust Cloud Detection in Multi-Platform Remote Sensing
by Qing Xu, Zichen Zhang, Guanfang Wang and Yunjie Chen
Remote Sens. 2025, 17(17), 2946; https://doi.org/10.3390/rs17172946 - 25 Aug 2025
Abstract
With the rapid expansion of multi-platform remote sensing applications, cloud contamination significantly impedes cross-platform data utilization. Current cloud detection methods face critical technical challenges in cross-platform settings, including neglect of atmospheric radiative transfer mechanisms, inadequate multi-scale structural decoupling, high intra-class variability coupled with inter-class similarity, cloud boundary ambiguity, cross-modal feature inconsistency, and noise propagation in pseudo-labels within semi-supervised frameworks. To address these issues, we introduce a Physics-Guided Multi-Representation Network (PGMRN) that adopts a student–teacher architecture and fuses tri-modal representations (Pseudo-NDVI, structural, and textural features) via atmospheric priors and intrinsic image decomposition. Specifically, PGMRN first incorporates an InfoNCE contrastive loss to enhance intra-class compactness and inter-class discrimination while preserving physical consistency; subsequently, a boundary-aware regional adaptive weighted cross-entropy loss integrates PA-CAM confidence with distance transforms to refine edge accuracy; furthermore, an Uncertainty-Aware Quadruple Consistency Propagation (UAQCP) enforces alignment across structural, textural, RGB, and physical modalities; and finally, a dynamic confidence-screening mechanism coupling PA-CAM with information entropy and percentile-based thresholding robustly refines pseudo-labels. Extensive experiments on four benchmark datasets demonstrate that PGMRN achieves state-of-the-art performance, with Mean IoU values of 70.8% on TCDD, 79.0% on HRC_WHU, and 83.8% on SWIMSEG, outperforming existing methods.
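
The InfoNCE loss used for intra-class compactness has a standard form: each anchor is classified against its positive among in-batch negatives. A minimal PyTorch version; PGMRN's physics-consistency terms are not reproduced here:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """
    InfoNCE over a batch: row i of `anchor` and row i of `positive` form the
    positive pair; all other rows serve as in-batch negatives.
    anchor, positive: (N, D) embeddings.
    """
    a = F.normalize(anchor, dim=1)
    b = F.normalize(positive, dim=1)
    logits = a @ b.t() / tau                            # (N, N) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)   # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```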

23 pages, 3739 KB  
Article
FedDPA: Dynamic Prototypical Alignment for Federated Learning with Non-IID Data
by Oussama Akram Bensiah and Rohallah Benaboud
Electronics 2025, 14(16), 3286; https://doi.org/10.3390/electronics14163286 - 19 Aug 2025
Abstract
Federated learning (FL) has emerged as a powerful framework for decentralized model training, preserving data privacy by keeping datasets localized on distributed devices. However, data heterogeneity, characterized by significant variations in size, statistical distribution, and composition across client datasets, presents a persistent challenge that impairs model performance, compromises generalization, and delays convergence. To address these issues, we propose FedDPA, a novel framework that utilizes dynamic prototypical alignment. FedDPA operates in three stages. First, it computes class-specific prototypes for each client to capture local data distributions, integrating them into an adaptive regularization mechanism. Next, a hierarchical aggregation strategy clusters and combines prototypes from similar clients, which reduces communication overhead and stabilizes model updates. Finally, a contrastive alignment process refines the global model by enforcing intra-class compactness and inter-class separation in the feature space. These mechanisms work in concert to mitigate client drift and enhance global model performance. We conducted extensive evaluations on standard classification benchmarks (EMNIST, FEMNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet-200) under various non-identically and independently distributed (non-IID) scenarios. The results demonstrate the superiority of FedDPA over state-of-the-art methods, including FedAvg, FedNH, and FedROD, highlighting its effectiveness, stability, and adaptability as a scalable solution to data heterogeneity in federated learning.
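
The first stage, class-specific prototypes feeding an adaptive regularizer, can be sketched as per-class feature means plus a penalty pulling local prototypes toward the aggregated global ones. Names and the fixed weight `mu` are illustrative, not FedDPA's actual hyperparameters:

```python
import torch

def class_prototypes(feats: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class mean embeddings on one client: returns (num_classes, D)."""
    protos = torch.zeros(num_classes, feats.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(dim=0)
    return protos

def alignment_penalty(local_protos: torch.Tensor,
                      global_protos: torch.Tensor,
                      mu: float = 0.1) -> torch.Tensor:
    """Regularizer pulling a client's class prototypes toward the aggregated global ones."""
    return mu * ((local_protos - global_protos) ** 2).sum(dim=1).mean()
```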

24 pages, 2115 KB  
Article
MHD-Protonet: Margin-Aware Hard Example Mining for SAR Few-Shot Learning via Dual-Loss Optimization
by Marii Zayani, Abdelmalek Toumi and Ali Khalfallah
Algorithms 2025, 18(8), 519; https://doi.org/10.3390/a18080519 - 16 Aug 2025
Abstract
Synthetic aperture radar (SAR) image classification under limited data conditions faces two major challenges: inter-class similarity, where distinct radar targets (e.g., tanks and armored trucks) have nearly identical scattering characteristics, and intra-class variability, caused by speckle noise, pose changes, and differences in depression angle. To address these challenges, we propose MHD-ProtoNet, a meta-learning framework that extends prototypical networks with two key innovations: margin-aware hard example mining, which better separates confusable classes by enforcing prototype distance margins, and dual-loss optimization, which refines embeddings and improves robustness to noise-induced variations. Evaluated on the MSTAR dataset in a five-way one-shot task, MHD-ProtoNet achieves 76.80% accuracy, outperforming the Hybrid Inference Network (HIN) (74.70%), standard few-shot methods such as prototypical networks (69.38%) and ST-PN (72.54%), and graph-based models such as ADMM-GCN (61.79%) and DGP-NET (68.60%). By explicitly mitigating inter-class ambiguity and intra-class noise, the proposed model enables robust SAR target recognition with minimal labeled data.
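
Margin-aware hard example mining over prototypes amounts to a hinge on distance differences: a query must be at least a margin closer to its own prototype than to the nearest confusable one. A sketch of such a loss; the paper's dual-loss pairing and mining schedule are omitted:

```python
import torch
import torch.nn.functional as F

def margin_proto_loss(query: torch.Tensor, protos: torch.Tensor,
                      label: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """
    query:  (N, D) query embeddings
    protos: (K, D) class prototypes
    label:  (N,)   true class indices
    Hinge loss: distance to the true prototype must undercut the distance to
    the nearest wrong (i.e., hardest) prototype by at least `margin`.
    """
    d = torch.cdist(query, protos)                        # (N, K) distances
    pos = d.gather(1, label.unsqueeze(1)).squeeze(1)      # distance to true prototype
    d_masked = d.scatter(1, label.unsqueeze(1), float("inf"))
    neg = d_masked.min(dim=1).values                      # nearest confusable prototype
    return F.relu(pos - neg + margin).mean()
```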

20 pages, 4191 KB  
Article
A Deep Transfer Contrastive Learning Network for Few-Shot Hyperspectral Image Classification
by Gan Yang and Zhaohui Wang
Remote Sens. 2025, 17(16), 2800; https://doi.org/10.3390/rs17162800 - 13 Aug 2025
Abstract
Over recent decades, the hyperspectral image (HSI) classification landscape has undergone significant transformations driven by advances in deep learning (DL). Despite substantial progress, few-shot scenarios remain a significant challenge, primarily due to the high cost of manual annotation and the unreliability of visual interpretation. Traditional DL models require massive datasets to learn sophisticated feature representations, hindering their full potential in data-scarce contexts. To tackle this issue, a deep transfer contrastive learning network is proposed. A spectral data augmentation module is incorporated to expand limited sample pairs. Subsequently, a spatial–spectral feature extraction module is designed to fuse the learned feature information. The weights of the spatial feature extraction network are initialized with knowledge transferred from source-domain pretraining, while the spectral residual network acquires rich spectral information. Furthermore, contrastive learning is integrated to enhance discriminative representation learning from scarce samples, effectively mitigating obstacles arising from the high inter-class similarity and large intra-class variance inherent in HSIs. Experiments on four public HSI datasets demonstrate that our method achieves competitive performance against state-of-the-art approaches.

35 pages, 13933 KB  
Article
EndoNet: A Multiscale Deep Learning Framework for Multiple Gastrointestinal Disease Classification via Endoscopic Images
by Omneya Attallah, Muhammet Fatih Aslan and Kadir Sabanci
Diagnostics 2025, 15(16), 2009; https://doi.org/10.3390/diagnostics15162009 - 11 Aug 2025
Abstract
Background: Gastrointestinal (GI) disorders present significant healthcare challenges, requiring rapid, accurate, and effective diagnostic methods to improve treatment outcomes and prevent complications. Wireless capsule endoscopy (WCE) is an effective tool for diagnosing GI abnormalities; however, precisely identifying diverse lesions with similar visual patterns remains difficult. Methods: Many existing computer-aided diagnostic (CAD) systems rely on manually crafted features or single deep learning (DL) models, which often fail to capture the complex and varied characteristics of GI diseases. In this study, we proposed "EndoNet," a multi-stage hybrid DL framework for eight-class GI disease classification using WCE images. Features were extracted from two different layers of three pre-trained convolutional neural networks (CNNs) (Inception, Xception, ResNet101), with both inter-layer and inter-model feature fusion performed. Dimensionality reduction was achieved using Non-Negative Matrix Factorization (NNMF), followed by selection of the most informative features via the Minimum Redundancy Maximum Relevance (mRMR) method. Results: Two datasets, Kvasir v2 and HyperKvasir, were used to evaluate the performance of EndoNet. Classification using seven different machine learning algorithms achieved maximum accuracies of 97.8% and 98.4% on the Kvasir v2 and HyperKvasir datasets, respectively. Conclusions: By integrating transfer learning with feature engineering, dimensionality reduction, and feature selection, EndoNet provides high accuracy, flexibility, and interpretability. This framework offers a powerful and generalizable artificial intelligence solution suitable for clinical decision support systems.
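
The NNMF-then-mRMR stage maps onto standard tooling: scikit-learn's NMF for the reduction, plus a small greedy mRMR using mutual information for relevance and feature correlation for redundancy. This greedy variant is an assumption; the paper may use a different mRMR estimator:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_selection import mutual_info_classif

def nnmf_reduce(X: np.ndarray, k: int = 256) -> np.ndarray:
    """Non-negative matrix factorization: X (n_samples, n_feats) -> (n_samples, k).

    NMF requires non-negative input; CNN features after ReLU already are,
    np.abs is only a guard for other feature sources.
    """
    return NMF(n_components=k, init="nndsvda", max_iter=400).fit_transform(np.abs(X))

def mrmr_select(X: np.ndarray, y: np.ndarray, n_keep: int = 64) -> list:
    """Greedy mRMR: maximize relevance (MI with labels), penalize redundancy (mean |corr|)."""
    relevance = mutual_info_classif(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_keep:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

# Z = nnmf_reduce(fused_features); idx = mrmr_select(Z, labels)
# The selected columns Z[:, idx] then feed the downstream ML classifiers.
```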
