Search Results (241)

Search Parameters:
Keywords = inter-class similarity

24 pages, 14242 KB  
Article
DBA-YOLO: A Dense Target Detection Model Based on Lightweight Neural Networks
by Zhiyong He, Jiahong Yang, Hongtian Ning, Chengxuan Li and Qiang Tang
J. Imaging 2025, 11(10), 345; https://doi.org/10.3390/jimaging11100345 - 4 Oct 2025
Abstract
Current deep learning-based dense target detection models face dual challenges in industrial scenarios: high computational complexity leading to insufficient inference efficiency on mobile devices, and missed/false detections caused by dense small targets, high inter-class similarity, and complex background interference. To address these issues, this paper proposes DBA-YOLO, a lightweight model based on YOLOv10, which significantly reduces computational complexity through model compression and algorithm optimization while maintaining high accuracy. Key improvements include the following: (1) a C2f PA module for enhanced feature extraction, (2) a parameter-refined BIMAFPN neck structure to improve small target detection, and (3) a DyDHead module integrating scale, space, and task awareness for spatial feature weighting. To validate DBA-YOLO, we constructed a real-world dataset from cigarette package images. Experiments on SKU-110K and our dataset show that DBA-YOLO achieves 91.3% detection accuracy (1.4% higher than baseline), with mAP and mAP75 improvements of 2–3%. Additionally, the model reduces parameters by 3.6%, balancing efficiency and performance for resource-constrained devices.
(This article belongs to the Section Computer Vision and Pattern Recognition)

18 pages, 2628 KB  
Article
Importance-Weighted Locally Adaptive Prototype Extraction Network for Few-Shot Detection
by Haibin Wang, Yong Tao, Zhou Zhou, Yue Wang, Xu Fan and Xiangjun Wang
Sensors 2025, 25(19), 5945; https://doi.org/10.3390/s25195945 - 23 Sep 2025
Abstract
Few-Shot Object Detection (FSOD) aims to identify new object categories with a limited amount of labeled data, which holds broad application prospects in real-life scenarios. Previous approaches usually ignore attention to critical information, which leads to the generation of low-quality prototypes and suboptimal performance in few-shot scenarios. To overcome this defect, an improved FSOD network is proposed in this paper, which mimics the human visual attention mechanism by emphasizing areas that are semantically important and rich in spatial information. Specifically, an Importance-Weighted Local Adaptive Prototype module is first introduced, which highlights key local features of support samples and generates more expressive class prototypes by assigning greater weights to salient regions, so that generalization ability is effectively enhanced under few-shot settings. Secondly, an Imbalanced Diversity Sampling module is utilized to select diverse and challenging negative sample prototypes, which enhances inter-class separability and reduces confusion among visually similar categories. Moreover, a Weighted Non-Linear Fusion module is designed to integrate various forms of feature interaction, whose contributions are modulated by learnable importance weights, improving the effect of feature fusion. Extensive experiments on the PASCAL VOC and MS COCO benchmarks validate the effectiveness of our method: mean average precision improves by 2.84% on PASCAL VOC compared with Fine-Grained Prototypes Distillation (FPD), and AP surpasses the FPD baseline by 0.8% and 1.8% on MS COCO.
(This article belongs to the Section Intelligent Sensors)
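
The core of the prototype module is a weighted average over support features, with larger weights on salient regions. Below is a minimal sketch of importance-weighted prototype extraction, assuming softmax-normalized saliency scores stand in for the paper's learned importance weights; function and variable names are illustrative, not from the paper.

```python
import torch

def weighted_prototype(support_feats: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
    """
    support_feats: (K, H*W, C) local features of K support images for one class
    saliency:      (K, H*W)   importance score for each local feature
    Returns a (C,) class prototype that emphasizes salient regions.
    """
    # Normalize importance over all locations of all support images.
    w = torch.softmax(saliency.flatten(), dim=0).view_as(saliency)
    # Importance-weighted average of local features -> class prototype.
    proto = (support_feats * w.unsqueeze(-1)).sum(dim=(0, 1))
    return proto
```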

22 pages, 3632 KB  
Article
RFR-YOLO-Based Recognition Method for Dairy Cow Behavior in Farming Environments
by Congcong Li, Jialong Ma, Shifeng Cao and Leifeng Guo
Agriculture 2025, 15(18), 1952; https://doi.org/10.3390/agriculture15181952 - 15 Sep 2025
Abstract
Cow behavior recognition constitutes a fundamental element of effective cow health monitoring and intelligent farming systems. Within large-scale cow farming environments, several critical challenges persist, including the difficulty in accurately capturing behavioral feature information, substantial variations in multi-scale features, and high inter-class similarity among different cow behaviors. To address these limitations, this study introduces an enhanced target detection algorithm for cow behavior recognition, termed RFR-YOLO, which is developed upon the YOLOv11n framework. A well-structured dataset encompassing nine distinct cow behaviors (lying, standing, walking, eating, drinking, licking, grooming, estrus, and limping) is constructed, comprising a total of 13,224 labeled samples. The proposed algorithm incorporates three major technical improvements. First, an Inverted Dilated Convolution module (Region Semantic Inverted Convolution, RsiConv) is designed and seamlessly integrated with the C3K2 module to form the C3K2_Rsi module, which effectively reduces computational overhead while enhancing feature representation. Second, a Four-branch Multi-Scale Dilated Attention mechanism (FMSDA) is incorporated into the network architecture, aligning scale-specific features with the corresponding receptive fields and thereby improving the model's capacity to capture multi-scale characteristics. Third, a Reparameterized Generalized Residual Feature Pyramid Network (RepGRFPN) is introduced as the neck component, allowing features to propagate through differentiated pathways and enabling flexible control over multi-scale feature expression, which facilitates efficient feature fusion and mitigates the impact of behavioral similarity. The experimental results demonstrate that RFR-YOLO achieves precision, recall, mAP50, and mAP50:95 values of 95.9%, 91.2%, 94.9%, and 85.2%, respectively, representing gains of 5.5%, 5.0%, 5.6%, and 3.5% over the baseline model. Despite a marginal 1.4 G increase in computational complexity, the algorithm retains a high detection speed of 147.6 frames per second. The proposed RFR-YOLO algorithm significantly improves the accuracy and robustness of target detection in group cow farming scenarios.
(This article belongs to the Section Farm Animal Production)
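
The abstract does not spell out the FMSDA internals, but a four-branch multi-scale dilated design generally amounts to parallel 3x3 convolutions with growing dilation rates, fused back to the input width. A generic PyTorch sketch under that assumption (not the paper's actual module):

```python
import torch
import torch.nn as nn

class MultiScaleDilated(nn.Module):
    """Four parallel 3x3 convs with increasing dilation; a 1x1 conv fuses the branches."""
    def __init__(self, channels: int, dilations=(1, 2, 3, 4)):
        super().__init__()
        # padding == dilation keeps the spatial size constant for a 3x3 kernel.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different receptive field; concatenate and fuse.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```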

14 pages, 11856 KB  
Article
Few-Shot Fine-Grained Image Classification with Residual Reconstruction Network Based on Feature Enhancement
by Ying Liu, Haibin Zhang and Weidong Zhang
Appl. Sci. 2025, 15(18), 9953; https://doi.org/10.3390/app15189953 - 11 Sep 2025
Abstract
In recent years, few-shot fine-grained image classification has shown great potential in addressing data scarcity and distinguishing highly similar categories. However, existing unidirectional reconstruction methods, while enhancing inter-class differences, fail to effectively suppress intra-class variations; bidirectional reconstruction methods, although alleviating intra-class variations, inevitably introduce background noise. To overcome these limitations, this paper proposes a Bidirectional Feature Reconstruction Network that incorporates a Feature Enhancement Attention Module (FEAM) to highlight discriminative regions and suppress background interference, while integrating a Channel-Aware Spatial Attention (CASA) module to strengthen local feature modeling and compensate for the Transformer's tendency to overemphasize global information. This joint design not only enhances inter-class separability but also effectively reduces intra-class variation. Extensive experiments on the CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches, validating its effectiveness and robustness in few-shot fine-grained image classification.
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
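
Reconstruction-based few-shot classifiers of this family typically score a query by how well each class's support features reconstruct it in closed form. A minimal ridge-regression sketch of that idea, not the paper's exact FEAM/CASA pipeline:

```python
import torch

def reconstruct(query: torch.Tensor, support: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """
    query:   (Nq, C) query local features
    support: (Ns, C) support local features of a single class
    Returns the query features reconstructed from this class's support pool.
    """
    # Solve W = argmin ||Q - W S||^2 + lam ||W||^2  =>  W = Q S^T (S S^T + lam I)^-1
    gram = support @ support.t()
    W = query @ support.t() @ torch.linalg.inv(gram + lam * torch.eye(gram.size(0)))
    return W @ support

# The class whose support set reconstructs the query with the lowest error wins:
# err_c = ((query - reconstruct(query, support_c)) ** 2).mean()
```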

26 pages, 32504 KB  
Article
Smart Tourism Landmark Recognition: A Multi-Threshold Enhancement and Selective Ensemble Approach Using YOLO11
by Ulugbek Hudayberdiev, Junyeong Lee and Odil Fayzullaev
Sustainability 2025, 17(17), 8081; https://doi.org/10.3390/su17178081 - 8 Sep 2025
Abstract
Automated landmark recognition represents a cornerstone technology for advancing smart tourism systems, cultural heritage documentation, and enhanced visitor experiences. Contemporary deep learning methodologies have substantially transformed the accuracy and computational efficiency of destination classification tasks. Addressing critical gaps in existing approaches, we introduce an enhanced Samarkand_v2 dataset encompassing twelve distinct historical landmark categories with comprehensive environmental variability. Our methodology incorporates a systematic multi-threshold pixel intensification strategy, applying graduated enhancement transformations at intensity levels of 100, 150, and 225 to accentuate diverse architectural characteristics, from fine-grained textural elements to prominent reflective components. Four independent YOLO11 architectures were trained using original imagery alongside the systematically enhanced variants, with optimal epoch preservation based on validation performance criteria. A key innovation lies in our selective ensemble mechanism, which exhaustively evaluates model combinations and identifies optimal configurations through data-driven selection rather than conventional uniform weighting schemes. Experimental validation demonstrates substantial performance gains over established baseline architectures and traditional ensemble approaches: 99.24% accuracy, 99.36% precision, 99.40% recall, and 99.36% F1-score. Statistical analysis via paired t-tests validates the significance of the enhancement strategies, particularly the effectiveness of lower-threshold transformations in capturing architectural nuances. The framework exhibits remarkable resilience across challenging conditions, including illumination variations, structural occlusions, and inter-class architectural similarities. These results establish the methodology's potential for practical smart tourism deployment, automated heritage preservation initiatives, and real-time mobile landmark recognition systems.
(This article belongs to the Special Issue Smart and Responsible Tourism: Innovations for a Sustainable Future)
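
The exact enhancement transform is not given in the abstract; one plausible reading of graduated multi-threshold intensification is to amplify only the pixels at or above each threshold, producing one enhanced variant per threshold (the thresholds 100, 150, and 225 are from the abstract; `gain` is an assumed parameter). A hedged NumPy sketch:

```python
import numpy as np

def intensify(img: np.ndarray, threshold: int, gain: float = 1.3) -> np.ndarray:
    """Amplify pixels at or above `threshold`; leave the rest untouched.

    img: uint8 grayscale or RGB image.
    """
    out = img.astype(np.float32)
    mask = out >= threshold
    out[mask] = np.clip(out[mask] * gain, 0, 255)
    return out.astype(np.uint8)

# Three enhanced variants, one per threshold used in the paper,
# each feeding its own YOLO11 training run:
# variants = [intensify(img, t) for t in (100, 150, 225)]
```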

32 pages, 14316 KB  
Article
FewMedical-XJAU: A Challenging Benchmark for Fine-Grained Medicinal Plant Classification
by Tao Zhang, Sheng Huang, Gulimila Kezierbieke, Yeerjiang Halimu and Hui Li
Sensors 2025, 25(17), 5499; https://doi.org/10.3390/s25175499 - 4 Sep 2025
Abstract
Fine-grained plant image classification (FPIC) aims to distinguish plant species with subtle visual differences, but existing datasets often suffer from limited category diversity, homogeneous backgrounds, and insufficient environmental variation, limiting their effectiveness in complex real-world scenarios. To address these challenges, a novel dataset, FewMedical-XJAU, is presented, focusing on rare medicinal plants native to Xinjiang, China. This dataset offers higher intra-class variability, more complex and diverse natural backgrounds, varied shooting angles and lighting conditions, and more rigorous expert annotations, providing a realistic testbed for FPIC tasks. Building on this, an improved method called BDCC (Bilinear Deep Cross-modal Composition) is proposed, which incorporates textual priors into a deep metric learning framework to enhance semantic discrimination. A Class-Aware Structured Text Prompt Construction strategy is introduced to improve the model's semantic understanding, along with a dynamic fusion mechanism to address high inter-class similarity and intra-class variability. In few-shot classification experiments, the method demonstrates superior accuracy and robustness under complex environmental conditions, offering strong support for practical applications of fine-grained classification.
(This article belongs to the Section Smart Agriculture)

18 pages, 1767 KB  
Article
A Blind Few-Shot Learning for Multimodal-Biological Signals with Fractal Dimension Estimation
by Nadeem Ullah, Seung Gu Kim, Jung Soo Kim, Min Su Jeong and Kang Ryoung Park
Fractal Fract. 2025, 9(9), 585; https://doi.org/10.3390/fractalfract9090585 - 3 Sep 2025
Abstract
Improving the decoding accuracy of biological signals has been a research focus for decades to advance the health, automation, and robotic industries. However, challenges like inter-subject variability, data scarcity, and multifunctional variability cause low decoding accuracy, thus hindering the practical deployment of biological signal paradigms. This paper proposes a multifunctional biological signals network (Multi-BioSig-Net) that addresses these issues by devising a novel blind few-shot learning (FSL) technique to quickly adapt to multiple target domains without needing a pre-trained model. Specifically, the proposed multimodal similarity extractor (MMSE) and self-multiple domain adaptation (SMDA) modules address data scarcity and inter-subject variability by exploiting and enhancing the similarity between multimodal samples and by quickly adapting to the target domains through adaptive adjustment of the parameters' weights and positions. For multifunctional learning, we propose an inter-function discriminator (IFD) that discriminates between classes by extracting inter-class common features and then subtracting them from both classes, preventing false predictions caused by overfitting on the common features. Furthermore, we propose a holistic-local fusion (HLF) module that exploits contextual-detailed features to adapt the scale-varying features across multiple functions. In addition, fractal dimension estimation (FDE) was employed for the classification of left-hand motor imagery (LMI) and right-hand motor imagery (RMI), confirming that the proposed method can effectively extract the discriminative features for this task. The effectiveness of the proposed algorithm was assessed quantitatively and statistically against state-of-the-art (SOTA) algorithms on three public datasets, demonstrating that it outperforms the SOTA algorithms.
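
Fractal dimension estimation for signals is commonly done by box counting: cover the normalized curve with grids of increasing resolution and fit the log-log slope. A generic sketch of that standard estimator, not necessarily the paper's FDE formulation:

```python
import numpy as np

def box_counting_dimension(signal: np.ndarray, grid_sizes=(4, 8, 16, 32, 64)) -> float:
    """Box-counting estimate of the fractal dimension of a 1-D signal.

    The signal is scaled into the unit square; for each grid resolution g
    we count the g x g cells the curve visits, then fit log N(g) ~ D log g,
    so the slope of the fit is the dimension estimate D.
    """
    s = signal.astype(float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)
    n = len(s)
    counts = []
    for g in grid_sizes:
        # Cell indices (time bin, amplitude bin) visited by the curve.
        cells = {(i * g // n, min(int(s[i] * g), g - 1)) for i in range(n)}
        counts.append(len(cells))
    D, _ = np.polyfit(np.log(grid_sizes), np.log(counts), 1)
    return float(D)
```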

23 pages, 4776 KB  
Article
Category-Guided Transformer for Semantic Segmentation of High-Resolution Remote Sensing Images
by Yue Ni, Jiahang Liu, Hui Zhang, Weijian Chi and Ji Luan
Remote Sens. 2025, 17(17), 3054; https://doi.org/10.3390/rs17173054 - 2 Sep 2025
Abstract
High-resolution remote sensing images suffer from large intra-class variance, high inter-class similarity, and significant scale variations, leading to incomplete segmentation and imprecise boundaries. Transformer-based methods, despite their strong global modeling capability, often suffer from feature confusion, weak detail representation, and high computational cost. Moreover, existing multi-scale fusion mechanisms are prone to semantic misalignment across levels, hindering effective information integration and reducing boundary clarity. To address these issues, a Category-Guided Transformer (CIGFormer) is proposed. Specifically, the Category-Information-Guided Transformer Module (CIGTM) integrates global and local branches: the global branch combines window-based self-attention (WSAM) and window adaptive pooling self-attention (WAPSAM), using class predictions to enhance global context modeling and reduce intra-class and inter-class confusion, while the local branch extracts multi-scale structural features to refine semantic representation and boundaries. In addition, an Adaptive Wavelet Fusion Module (AWFM) is designed, which leverages wavelet decomposition and channel-spatial joint attention for dynamic multi-scale fusion while preserving structural details. Extensive experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate that CIGFormer, with only 21.50 M parameters, achieves outstanding performance in small object recognition, boundary refinement, and complex scene parsing, showing strong potential for practical applications.
(This article belongs to the Section AI Remote Sensing)
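
The AWFM couples wavelet decomposition with learned channel-spatial attention; a stripped-down static analogue fuses two feature maps by blending their low-frequency bands and keeping the stronger high-frequency details. A sketch with PyWavelets, where the fixed `alpha` blend stands in for the paper's learned attention:

```python
import numpy as np
import pywt

def wavelet_fuse(feat_a: np.ndarray, feat_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Fuse two single-channel feature maps in the wavelet domain.

    Low-frequency (approximation) bands are blended; high-frequency (detail)
    bands take the elementwise maximum so structure from either input survives.
    """
    a_lo, a_hi = pywt.dwt2(feat_a, "haar")
    b_lo, b_hi = pywt.dwt2(feat_b, "haar")
    lo = alpha * a_lo + (1 - alpha) * b_lo
    hi = tuple(np.maximum(da, db) for da, db in zip(a_hi, b_hi))
    return pywt.idwt2((lo, hi), "haar")
```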

19 pages, 4057 KB  
Article
Few-Shot Target Detection Algorithm Based on Adaptive Sampling Meta-DETR
by Zihao Ma, Gang Liu, Zhaoya Tong and Xiaoliang Fan
Electronics 2025, 14(17), 3506; https://doi.org/10.3390/electronics14173506 - 2 Sep 2025
Abstract
Meta-DETR is a few-shot target detection algorithm that combines meta-learning and a transformer architecture to solve the problem of data sample scarcity. The algorithm uses deformable attention to focus the feature learning process more accurately on the target and its surroundings. However, the number of sampling points in the deformable attention is fixed, which limits the effective information involved in feature extraction, resulting in insufficient feature extraction of the target and degraded detection performance. To solve this problem, a Meta-DETR few-shot target detection algorithm based on adaptive sampling deformable attention is proposed. First, the cosine similarity between feature points is calculated from query features that are integrated with support features. Second, the number of related features of each feature point is counted using a similarity threshold. Third, the final number of sampling points for the feature map is calculated using the idea of maximum inter-class variance, achieving adaptive sampling. Finally, the adaptive sampling deformable attention is integrated into Meta-DETR to achieve few-shot target detection. The attention activation maps show that deformable attention based on adaptive sampling pays more attention to the target itself. Compared with Meta-DETR, the proposed algorithm improves the detection accuracy of novel classes by 0.9%, 0.7%, 1.4%, and 2.1% for shots 1, 2, 3, and 10 in partition 1 on the PASCAL VOC dataset; by 3.5%, 0.1%, 5.5%, and 5.7% for shots 2, 3, 5, and 10 in partition 2; and by 1.9%, 1.0%, 2.1%, and 0.1% for shots 2, 3, 5, and 10 in partition 3. Compared with MPF-Net, CRK-Net, and FSCE, the proposed algorithm achieves the best performance and can effectively realize detection under few-shot conditions. In addition, experiments on a self-made infrared dataset further validate the effectiveness of the proposed algorithm.
(This article belongs to the Section Artificial Intelligence)
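
The "maximum inter-class variance" step is Otsu's criterion applied to the distribution of cosine similarities. A sketch of that thresholding step, with usage comments showing how a per-point related-feature count could follow; the surrounding Meta-DETR wiring is omitted:

```python
import torch

def otsu_threshold(values: torch.Tensor, bins: int = 64) -> float:
    """Maximum inter-class variance (Otsu) threshold over a 1-D set of scores."""
    lo, hi = float(values.min()), float(values.max())
    hist = torch.histc(values, bins=bins, min=lo, max=hi)
    p = hist / hist.sum()
    centers = torch.linspace(lo, hi, bins)
    best_t, best_var = lo, -1.0
    for i in range(1, bins):
        w0, w1 = p[:i].sum(), p[i:].sum()       # class weights below/above the cut
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:i] * centers[:i]).sum() / w0  # class means
        mu1 = (p[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2        # inter-class variance
        if var > best_var:
            best_var, best_t = var, float(centers[i])
    return best_t

# f: (N, C) feature points; count related features per point:
# sims = torch.nn.functional.cosine_similarity(f.unsqueeze(1), f.unsqueeze(0), dim=-1)
# related = (sims > otsu_threshold(sims.flatten())).sum(-1)
```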

31 pages, 3554 KB  
Article
FFFNet: A Food Feature Fusion Model with Self-Supervised Clustering for Food Image Recognition
by Zhejun Kuang, Haobo Gao, Jian Zhao, Liu Wang and Lei Sun
Appl. Sci. 2025, 15(17), 9542; https://doi.org/10.3390/app15179542 - 29 Aug 2025
Abstract
With the growing emphasis on healthy eating and nutrition management in modern society, food image recognition has become increasingly important. However, it faces challenges such as large intra-class differences and high inter-class similarities. To tackle these issues, we present a Food Feature Fusion Network (FFFNet), which leverages a multi-head cross-attention mechanism to integrate the local detail-capturing capability of Convolutional Neural Networks with the global modeling capacity of Vision Transformers. This enables the model to capture key discriminative features in challenging food recognition tasks. FFFNet also introduces self-supervised clustering, generating pseudo-labels from the feature space distribution and employing a clustering objective derived from Kullback–Leibler divergence to optimize the feature space. By maximizing similarity between features and their corresponding cluster centers, and minimizing similarity with non-corresponding centers, it promotes intra-class compactness and inter-class separability. We evaluated FFFNet on the ISIA Food-500, ETHZ Food-101, and UEC Food256 datasets, attaining Top-1/Top-5 accuracies of 65.31%/88.94%, 89.98%/98.37%, and 80.91%/94.92%, respectively, outperforming existing approaches.
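
The self-supervised clustering objective described here matches the classic DEC recipe: soft-assign features to cluster centers with a Student's t kernel, sharpen the assignments into a target distribution, and minimize the KL divergence between the two. A sketch under that assumption; FFFNet's exact formulation is not given in the abstract:

```python
import torch
import torch.nn.functional as F

def soft_assign(z: torch.Tensor, centers: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Student's t soft assignment of features z (N, D) to cluster centers (K, D)."""
    d2 = torch.cdist(z, centers) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def clustering_loss(q: torch.Tensor) -> torch.Tensor:
    """KL(P || Q) against the sharpened target distribution P (DEC-style)."""
    p = (q ** 2) / q.sum(dim=0)           # sharpen: emphasize confident assignments
    p = p / p.sum(dim=1, keepdim=True)    # renormalize per sample
    return F.kl_div(q.log(), p.detach(), reduction="batchmean")
```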

25 pages, 3904 KB  
Article
Physics-Guided Multi-Representation Learning with Quadruple Consistency Constraints for Robust Cloud Detection in Multi-Platform Remote Sensing
by Qing Xu, Zichen Zhang, Guanfang Wang and Yunjie Chen
Remote Sens. 2025, 17(17), 2946; https://doi.org/10.3390/rs17172946 - 25 Aug 2025
Abstract
With the rapid expansion of multi-platform remote sensing applications, cloud contamination significantly impedes cross-platform data utilization. Current cloud detection methods face critical technical challenges in cross-platform settings, including neglect of atmospheric radiative transfer mechanisms, inadequate multi-scale structural decoupling, high intra-class variability coupled with inter-class similarity, cloud boundary ambiguity, cross-modal feature inconsistency, and noise propagation in pseudo-labels within semi-supervised frameworks. To address these issues, we introduce a Physics-Guided Multi-Representation Network (PGMRN) that adopts a student–teacher architecture and fuses tri-modal representations (Pseudo-NDVI, structural, and textural features) via atmospheric priors and intrinsic image decomposition. Specifically, PGMRN first incorporates an InfoNCE contrastive loss to enhance intra-class compactness and inter-class discrimination while preserving physical consistency; subsequently, a boundary-aware regional adaptive weighted cross-entropy loss integrates PA-CAM confidence with distance transforms to refine edge accuracy; furthermore, an Uncertainty-Aware Quadruple Consistency Propagation (UAQCP) enforces alignment across structural, textural, RGB, and physical modalities; and finally, a dynamic confidence-screening mechanism coupling PA-CAM with information entropy and percentile-based thresholding robustly refines pseudo-labels. Extensive experiments on four benchmark datasets demonstrate that PGMRN achieves state-of-the-art performance, with Mean IoU values of 70.8% on TCDD, 79.0% on HRC_WHU, and 83.8% on SWIMSEG, outperforming existing methods.
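
The InfoNCE loss used for intra-class compactness has a standard form: each anchor is classified against its positive among in-batch negatives. A minimal PyTorch version; PGMRN's physics-consistency terms are not reproduced here:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """
    InfoNCE over a batch: row i of `anchor` and row i of `positive` form the
    positive pair; all other rows serve as in-batch negatives.
    anchor, positive: (N, D) embeddings.
    """
    a = F.normalize(anchor, dim=1)
    b = F.normalize(positive, dim=1)
    logits = a @ b.t() / tau                            # (N, N) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)   # diagonal entries are positives
    return F.cross_entropy(logits, labels)
```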

23 pages, 3739 KB  
Article
FedDPA: Dynamic Prototypical Alignment for Federated Learning with Non-IID Data
by Oussama Akram Bensiah and Rohallah Benaboud
Electronics 2025, 14(16), 3286; https://doi.org/10.3390/electronics14163286 - 19 Aug 2025
Abstract
Federated learning (FL) has emerged as a powerful framework for decentralized model training, preserving data privacy by keeping datasets localized on distributed devices. However, data heterogeneity, characterized by significant variations in size, statistical distribution, and composition across client datasets, presents a persistent challenge that impairs model performance, compromises generalization, and delays convergence. To address these issues, we propose FedDPA, a novel framework that utilizes dynamic prototypical alignment. FedDPA operates in three stages. First, it computes class-specific prototypes for each client to capture local data distributions, integrating them into an adaptive regularization mechanism. Next, a hierarchical aggregation strategy clusters and combines prototypes from similar clients, which reduces communication overhead and stabilizes model updates. Finally, a contrastive alignment process refines the global model by enforcing intra-class compactness and inter-class separation in the feature space. These mechanisms work in concert to mitigate client drift and enhance global model performance. We conducted extensive evaluations on standard classification benchmarks (EMNIST, FEMNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet-200) under various non-identically and independently distributed (non-IID) scenarios. The results demonstrate the superiority of FedDPA over state-of-the-art methods, including FedAvg, FedNH, and FedROD, highlighting its effectiveness, stability, and adaptability as a scalable solution to data heterogeneity in federated learning.
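
The first stage, class-specific prototypes feeding an adaptive regularizer, can be sketched as per-class feature means plus a penalty pulling local prototypes toward the aggregated global ones. Names and the fixed weight `mu` are illustrative, not FedDPA's actual hyperparameters:

```python
import torch

def class_prototypes(feats: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class mean embeddings on one client: returns (num_classes, D)."""
    protos = torch.zeros(num_classes, feats.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(dim=0)
    return protos

def alignment_penalty(local_protos: torch.Tensor,
                      global_protos: torch.Tensor,
                      mu: float = 0.1) -> torch.Tensor:
    """Regularizer pulling a client's class prototypes toward the aggregated global ones."""
    return mu * ((local_protos - global_protos) ** 2).sum(dim=1).mean()
```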

24 pages, 2115 KB  
Article
MHD-Protonet: Margin-Aware Hard Example Mining for SAR Few-Shot Learning via Dual-Loss Optimization
by Marii Zayani, Abdelmalek Toumi and Ali Khalfallah
Algorithms 2025, 18(8), 519; https://doi.org/10.3390/a18080519 - 16 Aug 2025
Abstract
Synthetic aperture radar (SAR) image classification under limited data conditions faces two major challenges: inter-class similarity, where distinct radar targets (e.g., tanks and armored trucks) have nearly identical scattering characteristics, and intra-class variability, caused by speckle noise, pose changes, and differences in depression angle. To address these challenges, we propose MHD-ProtoNet, a meta-learning framework that extends prototypical networks with two key innovations: margin-aware hard example mining, which better separates confusable classes by enforcing prototype distance margins, and dual-loss optimization, which refines embeddings and improves robustness to noise-induced variations. Evaluated on the MSTAR dataset in a five-way one-shot task, MHD-ProtoNet achieves 76.80% accuracy, outperforming the Hybrid Inference Network (HIN) (74.70%), standard few-shot methods such as prototypical networks (69.38%) and ST-PN (72.54%), and graph-based models such as ADMM-GCN (61.79%) and DGP-NET (68.60%). By explicitly mitigating inter-class ambiguity and intra-class noise, the proposed model enables robust SAR target recognition with minimal labeled data.
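
Margin-aware hard example mining over prototypes amounts to a hinge on distance differences: a query must be at least a margin closer to its own prototype than to the nearest confusable one. A sketch of such a loss; the paper's dual-loss pairing and mining schedule are omitted:

```python
import torch
import torch.nn.functional as F

def margin_proto_loss(query: torch.Tensor, protos: torch.Tensor,
                      label: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """
    query:  (N, D) query embeddings
    protos: (K, D) class prototypes
    label:  (N,)   true class indices
    Hinge loss: distance to the true prototype must undercut the distance to
    the nearest wrong (i.e., hardest) prototype by at least `margin`.
    """
    d = torch.cdist(query, protos)                        # (N, K) distances
    pos = d.gather(1, label.unsqueeze(1)).squeeze(1)      # distance to true prototype
    d_masked = d.scatter(1, label.unsqueeze(1), float("inf"))
    neg = d_masked.min(dim=1).values                      # nearest confusable prototype
    return F.relu(pos - neg + margin).mean()
```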

20 pages, 4191 KB  
Article
A Deep Transfer Contrastive Learning Network for Few-Shot Hyperspectral Image Classification
by Gan Yang and Zhaohui Wang
Remote Sens. 2025, 17(16), 2800; https://doi.org/10.3390/rs17162800 - 13 Aug 2025
Abstract
Over recent decades, the hyperspectral image (HSI) classification landscape has undergone significant transformations driven by advances in deep learning (DL). Despite substantial progress, few-shot scenarios remain a significant challenge, primarily due to the high cost of manual annotation and the unreliability of visual interpretation. Traditional DL models require massive datasets to learn sophisticated feature representations, hindering their full potential in data-scarce contexts. To tackle this issue, a deep transfer contrastive learning network is proposed. A spectral data augmentation module is incorporated to expand limited sample pairs. Subsequently, a spatial–spectral feature extraction module is designed to fuse the learned feature information. The weights of the spatial feature extraction network are initialized with knowledge transferred from source-domain pretraining, while the spectral residual network acquires rich spectral information. Furthermore, contrastive learning is integrated to enhance discriminative representation learning from scarce samples, effectively mitigating obstacles arising from the high inter-class similarity and large intra-class variance inherent in HSIs. Experiments on four public HSI datasets demonstrate that our method achieves competitive performance against state-of-the-art approaches.

35 pages, 13933 KB  
Article
EndoNet: A Multiscale Deep Learning Framework for Multiple Gastrointestinal Disease Classification via Endoscopic Images
by Omneya Attallah, Muhammet Fatih Aslan and Kadir Sabanci
Diagnostics 2025, 15(16), 2009; https://doi.org/10.3390/diagnostics15162009 - 11 Aug 2025
Abstract
Background: Gastrointestinal (GI) disorders present significant healthcare challenges, requiring rapid, accurate, and effective diagnostic methods to improve treatment outcomes and prevent complications. Wireless capsule endoscopy (WCE) is an effective tool for diagnosing GI abnormalities; however, precisely identifying diverse lesions with similar visual patterns remains difficult. Methods: Many existing computer-aided diagnostic (CAD) systems rely on manually crafted features or single deep learning (DL) models, which often fail to capture the complex and varied characteristics of GI diseases. In this study, we proposed "EndoNet," a multi-stage hybrid DL framework for eight-class GI disease classification using WCE images. Features were extracted from two different layers of three pre-trained convolutional neural networks (CNNs) (Inception, Xception, ResNet101), with both inter-layer and inter-model feature fusion performed. Dimensionality reduction was achieved using Non-Negative Matrix Factorization (NNMF), followed by selection of the most informative features via the Minimum Redundancy Maximum Relevance (mRMR) method. Results: Two datasets, Kvasir v2 and HyperKvasir, were used to evaluate the performance of EndoNet. Classification using seven different machine learning algorithms achieved maximum accuracies of 97.8% and 98.4% on the Kvasir v2 and HyperKvasir datasets, respectively. Conclusions: By integrating transfer learning with feature engineering, dimensionality reduction, and feature selection, EndoNet provides high accuracy, flexibility, and interpretability. This framework offers a powerful and generalizable artificial intelligence solution suitable for clinical decision support systems.
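
The NNMF-then-mRMR stage maps onto standard tooling: scikit-learn's NMF for the reduction, plus a small greedy mRMR using mutual information for relevance and feature correlation for redundancy. This greedy variant is an assumption; the paper may use a different mRMR estimator:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_selection import mutual_info_classif

def nnmf_reduce(X: np.ndarray, k: int = 256) -> np.ndarray:
    """Non-negative matrix factorization: X (n_samples, n_feats) -> (n_samples, k).

    NMF requires non-negative input; CNN features after ReLU already are,
    np.abs is only a guard for other feature sources.
    """
    return NMF(n_components=k, init="nndsvda", max_iter=400).fit_transform(np.abs(X))

def mrmr_select(X: np.ndarray, y: np.ndarray, n_keep: int = 64) -> list:
    """Greedy mRMR: maximize relevance (MI with labels), penalize redundancy (mean |corr|)."""
    relevance = mutual_info_classif(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_keep:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

# Z = nnmf_reduce(fused_features); idx = mrmr_select(Z, labels)
# The selected columns Z[:, idx] then feed the downstream ML classifiers.
```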
