Search Results (165)

Search Parameters:
Keywords = few-shot image classification

24 pages, 2508 KiB  
Article
Class-Discrepancy Dynamic Weighting for Cross-Domain Few-Shot Hyperspectral Image Classification
by Chen Ding, Jiahao Yue, Sirui Zheng, Yizhuo Dong, Wenqiang Hua, Xueling Chen, Yu Xie, Song Yan, Wei Wei and Lei Zhang
Remote Sens. 2025, 17(15), 2605; https://doi.org/10.3390/rs17152605 - 27 Jul 2025
Viewed by 256
Abstract
In recent years, cross-domain few-shot learning (CDFSL) has demonstrated remarkable performance in hyperspectral image classification (HSIC), partially alleviating the distribution shift problem. However, most domain adaptation methods rely on similarity metrics to establish cross-domain class matching, making it difficult to simultaneously account for intra-class sample size variations and inherent inter-class differences. To address this problem, existing studies have introduced a class weighting mechanism within the prototype network framework, determining class weights by calculating inter-sample similarity through distance metrics. However, this method suffers from a dual limitation: susceptibility to noise interference and insufficient capacity to capture global class variations, which may lead to distorted weight allocation and consequently result in alignment bias. To solve these issues, we propose a novel class-discrepancy dynamic weighting-based cross-domain FSL (CDDW-CFSL) framework. It integrates three key components: (1) the class-weighted domain adaptation (CWDA) method dynamically measures cross-domain distribution shifts using global class mean discrepancies. It employs discrepancy-sensitive weighting to strengthen the alignment of critical categories, enabling accurate domain adaptation while maintaining feature topology; (2) the class mean refinement (CMR) method incorporates class covariance distance to compute distribution discrepancies between support set samples and class prototypes, enabling the precise capture of cross-domain feature internal structures; (3) a novel multi-dimensional feature extractor that captures both local spatial details and continuous spectral characteristics simultaneously, facilitating deep cross-dimensional feature fusion. Results on three publicly available HSIC datasets demonstrate the effectiveness of CDDW-CFSL. Full article
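
As a rough illustration of the class-weighted alignment idea in this abstract, the NumPy sketch below weights per-class mean discrepancies more heavily when the cross-domain shift for that class is larger; the softmax weighting and function names are illustrative assumptions, not the paper's exact CWDA formulation.

```python
import numpy as np

def class_weighted_alignment(src_feats, src_labels, tgt_feats, tgt_labels, temperature=1.0):
    """Illustrative class-weighted domain alignment: classes whose global means
    differ more across domains receive larger weights in the alignment loss."""
    classes = np.intersect1d(np.unique(src_labels), np.unique(tgt_labels))
    discrepancies = []
    for c in classes:
        mu_src = src_feats[src_labels == c].mean(axis=0)
        mu_tgt = tgt_feats[tgt_labels == c].mean(axis=0)
        discrepancies.append(np.linalg.norm(mu_src - mu_tgt))
    d = np.array(discrepancies)
    # discrepancy-sensitive weights: larger cross-domain shift -> larger weight
    w = np.exp(d / temperature) / np.exp(d / temperature).sum()
    return float((w * d ** 2).sum()), dict(zip(classes.tolist(), w.tolist()))

# toy usage with random features standing in for source/target embeddings
rng = np.random.default_rng(0)
src_x, src_y = rng.normal(size=(100, 16)), rng.integers(0, 5, 100)
tgt_x, tgt_y = rng.normal(size=(60, 16)), rng.integers(0, 5, 60)
loss, weights = class_weighted_alignment(src_x, src_y, tgt_x, tgt_y)
```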

19 pages, 28897 KiB  
Article
MetaRes-DMT-AS: A Meta-Learning Approach for Few-Shot Fault Diagnosis in Elevator Systems
by Hongming Hu, Shengying Yang, Yulai Zhang, Jianfeng Wu, Liang He and Jingsheng Lei
Sensors 2025, 25(15), 4611; https://doi.org/10.3390/s25154611 - 25 Jul 2025
Viewed by 237
Abstract
Recent advancements in deep learning have spurred significant research interest in fault diagnosis for elevator systems. However, conventional approaches typically require substantial labeled datasets that are often impractical to obtain in real-world industrial environments. This limitation poses a fundamental challenge for developing robust diagnostic models capable of performing reliably under data-scarce conditions. To address this critical gap, we propose MetaRes-DMT-AS (Meta-ResNet with Dynamic Meta-Training and Adaptive Scheduling), a novel meta-learning framework for few-shot fault diagnosis. Our methodology employs Gramian Angular Fields to transform 1D raw sensor data into 2D image representations, followed by episodic task construction through stochastic sampling. During meta-training, the system acquires transferable prior knowledge through optimized parameter initialization, while an adaptive scheduling module dynamically configures support/query sets. Subsequent regularization via prototype networks ensures stable feature extraction. Comprehensive validation using the Case Western Reserve University bearing dataset and proprietary elevator acceleration data demonstrates the framework’s superiority: MetaRes-DMT-AS achieves state-of-the-art few-shot classification performance, surpassing benchmark models by 0.94–1.78% in overall accuracy. For critical few-shot fault categories—particularly emergency stops and severe vibrations—the method delivers significant accuracy improvements of 3–16% and 17–29%, respectively. Full article
(This article belongs to the Special Issue Signal Processing and Sensing Technologies for Fault Diagnosis)
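
The Gramian Angular Field step mentioned above is standard and easy to reproduce; the sketch below encodes a 1D signal as a 2D image (summation-field variant), independent of the paper's specific preprocessing choices.

```python
import numpy as np

def gramian_angular_field(x, kind="summation"):
    """Encode a 1D signal as a 2D Gramian Angular Field image.

    The signal is rescaled to [-1, 1], mapped to polar angles phi = arccos(x),
    and the field is cos(phi_i + phi_j) (summation) or sin(phi_i - phi_j) (difference).
    """
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))
    if kind == "summation":
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])

# toy usage: a 128-sample vibration-like signal becomes a 128x128 image
t = np.linspace(0, 4 * np.pi, 128)
image = gramian_angular_field(np.sin(t) + 0.1 * np.random.randn(128))
```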

23 pages, 10648 KiB  
Article
Meta-Learning-Integrated Neural Architecture Search for Few-Shot Hyperspectral Image Classification
by Aili Wang, Kang Zhang, Haibin Wu, Haisong Chen and Minhui Wang
Electronics 2025, 14(15), 2952; https://doi.org/10.3390/electronics14152952 - 24 Jul 2025
Viewed by 182
Abstract
To address the limited number of labeled samples available in practical classification scenarios, and the overfitting and insufficient generalization that few-shot learning (FSL) suffers from in hyperspectral image classification (HSIC), this paper designs and implements a neural architecture search (NAS) method for few-shot HSI classification that incorporates meta-learning. Firstly, a multi-source domain learning framework is constructed to integrate heterogeneous natural images and homogeneous remote sensing images, broadening the information available to few-sample learning and enabling the final network to enhance its generalization ability under limited labeled samples by learning the similarity between different data sources. Secondly, by constructing precise and robust search spaces and deploying different units at different locations, the classification accuracy and transfer robustness of the final network are improved. This method fully utilizes the spatial texture information and rich category information of multi-source data and, through precise and robust search space design, transfers the learned meta-knowledge to the optimal architecture for HSIC, achieving classification with limited samples. Experimental results show that the proposed method achieves an overall accuracy (OA) of 98.57%, 78.39%, and 98.74% on the Pavia Center, Indian Pines, and WHU-Hi-LongKou datasets, respectively, demonstrating that the spatial texture and category information of multi-source data, combined with the search space design, allows the learned meta-knowledge to transfer effectively to the optimal HSIC architecture under few-shot conditions. Full article

15 pages, 3893 KiB  
Article
Exploration of 3D Few-Shot Learning Techniques for Classification of Knee Joint Injuries on MR Images
by Vinh Hiep Dang, Minh Tri Nguyen, Ngoc Hoang Le, Thuan Phat Nguyen, Quoc-Viet Tran, Tan Ha Mai, Vu Pham Thao Vy, Truong Nguyen Khanh Hung, Ching-Yu Lee, Ching-Li Tseng, Nguyen Quoc Khanh Le and Phung-Anh Nguyen
Diagnostics 2025, 15(14), 1808; https://doi.org/10.3390/diagnostics15141808 - 18 Jul 2025
Viewed by 396
Abstract
Accurate diagnosis of knee joint injuries from magnetic resonance (MR) images is critical for patient care. Background/Objectives: While deep learning has advanced 3D MR image analysis, its reliance on extensive labeled datasets is a major hurdle for diverse knee pathologies. Few-shot learning (FSL) addresses this by enabling models to classify new conditions from minimal annotated examples, often leveraging knowledge from related tasks. However, creating robust 3D FSL frameworks for varied knee injuries remains challenging. Methods: We introduce MedNet-FS, a 3D FSL framework that effectively classifies knee injuries by utilizing domain-specific pre-trained weights and generalized end-to-end (GE2E) loss for discriminative embeddings. Results: MedNet-FS, with knee-MRI-specific pre-training, significantly outperformed models using generic or other medical pre-trained weights and approached supervised learning performance on internal datasets with limited samples (e.g., achieving an area under the curve (AUC) of 0.76 for ACL tear classification with k = 40 support samples on the MRNet dataset). External validation on the KneeMRI dataset revealed challenges in classifying partially torn ACL (AUC up to 0.58) but demonstrated promising performance for distinguishing intact versus fully ruptured ACLs (AUC 0.62 with k = 40). Conclusions: These findings demonstrate that tailored FSL strategies can substantially reduce data dependency in developing specialized medical imaging tools. This approach fosters rapid AI tool development for knee injuries and offers a scalable solution for data scarcity in other medical imaging domains, potentially democratizing AI-assisted diagnostics, particularly for rare conditions or in resource-limited settings. Full article
(This article belongs to the Special Issue New Technologies and Tools Used for Risk Assessment of Diseases)
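
A simplified sketch of a GE2E-style embedding loss as referenced above, assuming support embeddings grouped by class; the fixed scale/offset constants and the omission of GE2E's leave-one-out centroid refinement are simplifications rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ge2e_style_loss(embeddings, n_classes, n_per_class, w=10.0, b=-5.0):
    """Simplified GE2E-style loss on L2-normalised embeddings arranged as
    [class0 x n_per_class, class1 x n_per_class, ...]: each embedding should be
    most similar to its own class centroid (softmax variant of GE2E)."""
    e = F.normalize(embeddings, dim=-1).view(n_classes, n_per_class, -1)
    centroids = F.normalize(e.mean(dim=1), dim=-1)            # (C, D) class centroids
    sim = w * torch.einsum("cnd,kd->cnk", e, centroids) + b    # (C, N, C) scaled cosine sims
    labels = torch.arange(n_classes).repeat_interleave(n_per_class)
    return F.cross_entropy(sim.reshape(n_classes * n_per_class, n_classes), labels)

# toy usage: 5 classes, 4 support embeddings each, 64-dim features
loss = ge2e_style_loss(torch.randn(20, 64), n_classes=5, n_per_class=4)
```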

16 pages, 2355 KiB  
Article
Generalising Stock Detection in Retail Cabinets with Minimal Data Using a DenseNet and Vision Transformer Ensemble
by Babak Rahi, Deniz Sagmanli, Felix Oppong, Direnc Pekaslan and Isaac Triguero
Mach. Learn. Knowl. Extr. 2025, 7(3), 66; https://doi.org/10.3390/make7030066 - 16 Jul 2025
Viewed by 278
Abstract
Generalising deep-learning models to perform well on unseen data domains with minimal retraining remains a significant challenge in computer vision. Even when the target task—such as quantifying the number of elements in an image—stays the same, data quality, shape, or form variations can deviate from the training conditions, often necessitating manual intervention. As a real-world industry problem, we aim to automate stock level estimation in retail cabinets. As technology advances, new cabinet models with varying shapes emerge alongside new camera types. This evolving scenario poses a substantial obstacle to deploying long-term, scalable solutions. To surmount the challenge of generalising to new cabinet models and cameras with minimal amounts of sample images, this research introduces a new solution. This paper proposes a novel ensemble model that combines DenseNet-201 and Vision Transformer (ViT-B/8) architectures to achieve generalisation in stock-level classification. The novelty aspect of our solution comes from the fact that we combine a transformer with a DenseNet model in order to capture both the local, hierarchical details and the long-range dependencies within the images, improving generalisation accuracy with less data. Key contributions include (i) a novel DenseNet-201 + ViT-B/8 feature-level fusion, (ii) an adaptation workflow that needs only two images per class, (iii) a balanced layer-unfreezing schedule, (iv) a publicly described domain-shift benchmark, and (v) a 47 pp accuracy gain over four standard few-shot baselines. Our approach leverages fine-tuning techniques to adapt two pre-trained models to the new retail cabinets (i.e., standing or horizontal) and camera types using only two images per class. Experimental results demonstrate that our method achieves high accuracy rates of 91% on new cabinets with the same camera and 89% on new cabinets with different cameras, significantly outperforming standard few-shot learning methods. Full article
(This article belongs to the Section Data)
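
A minimal sketch of the DenseNet-201 + ViT-B/8 feature-level fusion described above, assuming torchvision and timm backbones and a plain concatenation head; the paper's fine-tuning and layer-unfreezing schedule are not reproduced here.

```python
import torch
import torch.nn as nn
import timm
from torchvision import models

class FusionClassifier(nn.Module):
    """Concatenate pooled DenseNet-201 features with ViT-B/8 features, then classify.
    Backbone names and the simple concatenation head are assumptions."""
    def __init__(self, n_classes=5):
        super().__init__()
        densenet = models.densenet201(weights=None)
        self.cnn = nn.Sequential(densenet.features, nn.ReLU(inplace=True),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 1920-d
        self.vit = timm.create_model("vit_base_patch8_224", pretrained=False,
                                     num_classes=0)                           # -> 768-d
        self.head = nn.Linear(1920 + 768, n_classes)

    def forward(self, x):                       # x: (B, 3, 224, 224)
        fused = torch.cat([self.cnn(x), self.vit(x)], dim=1)
        return self.head(fused)

# toy usage on a random batch
logits = FusionClassifier(n_classes=5)(torch.randn(2, 3, 224, 224))
```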

29 pages, 5825 KiB  
Article
BBSNet: An Intelligent Grading Method for Pork Freshness Based on Few-Shot Learning
by Chao Liu, Jiayu Zhang, Kunjie Chen and Jichao Huang
Foods 2025, 14(14), 2480; https://doi.org/10.3390/foods14142480 - 15 Jul 2025
Viewed by 308
Abstract
Deep learning approaches for pork freshness grading typically require large datasets, which limits their practical application due to the high costs associated with data collection. To address this challenge, we propose BBSNet, a lightweight few-shot learning model designed for accurate freshness classification with a limited number of images. BBSNet incorporates a batch channel normalization (BCN) layer to enhance feature distinguishability and employs BiFormer for optimized fine-grained feature extraction. Trained on a dataset of 600 pork images graded by microbial cell concentration, BBSNet achieved an average accuracy of 96.36% in a challenging 5-way 80-shot task. This approach significantly reduces data dependency while maintaining high accuracy, presenting a viable solution for cost-effective real-time pork quality monitoring. This work introduces a novel framework that connects laboratory freshness indicators to industrial applications in data-scarce conditions. Future research will investigate its extension to various food types and optimization for deployment on portable devices. Full article

22 pages, 3279 KiB  
Article
HA-CP-Net: A Cross-Domain Few-Shot SAR Oil Spill Detection Network Based on Hybrid Attention and Category Perception
by Dongmei Song, Shuzhen Wang, Bin Wang, Weimin Chen and Lei Chen
J. Mar. Sci. Eng. 2025, 13(7), 1340; https://doi.org/10.3390/jmse13071340 - 13 Jul 2025
Viewed by 295
Abstract
Deep learning models have clear advantages in detecting oil spills, but their training heavily depends on a large number of high-quality samples. Due to the accidental, unpredictable, and urgent nature of oil spill incidents, however, it is difficult to obtain many labeled samples in real oil spill monitoring scenarios. Few-shot learning, by contrast, can achieve excellent classification performance with only a small number of labeled samples. In this context, a new cross-domain few-shot SAR oil spill detection network is proposed in this paper. The network is embedded with a hybrid attention feature extraction block, which consists of a coordinate attention module that perceives channel and spatial location information, a global self-attention transformer module that captures global dependencies, and a multi-scale self-attention module that depicts local detailed features, thereby achieving deep mining and accurate characterization of image features. In addition, because suspected oil films (look-alikes) in seawater and real oil films differ only slightly in their features and are hard to separate with few samples, this paper proposes a double-loss-function category determination block consisting of two parts: a well-designed category-perception loss function and a traditional cross-entropy loss function. The category-perception loss function optimizes the spatial distribution of sample features by shortening the distance between similar samples while expanding the distance between different samples. By combining the category-perception loss with the cross-entropy loss, the network’s ability to discriminate between real and suspected oil films is maximized. The experimental results demonstrate that this study provides an effective solution for high-precision oil spill detection under few-shot conditions, supporting rapid identification of oil spill accidents. Full article
(This article belongs to the Section Marine Environmental Science)
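
The category-perception loss is described only qualitatively above; the sketch below pairs one plausible pull/push formulation (margin value and weighting are assumptions) with the cross-entropy term, as a reading of the double-loss idea rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def category_perception_loss(features, labels, margin=1.0):
    """Illustrative 'category-perception' term: shrink pairwise distances within a
    class and push pairs from different classes apart by at least `margin`."""
    d = torch.cdist(features, features)                       # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pull = d[same & ~eye].pow(2).mean()                        # intra-class compactness
    push = F.relu(margin - d[~same]).pow(2).mean()             # inter-class separation
    return pull + push

def total_loss(logits, features, labels, lam=0.5):
    """Combined objective: cross-entropy plus the category-perception term."""
    return F.cross_entropy(logits, labels) + lam * category_perception_loss(features, labels)

# toy usage: 16 samples, 2 classes (real vs. suspected oil film)
feats = torch.randn(16, 32)
labels = torch.tensor([0] * 8 + [1] * 8)
logits = torch.randn(16, 2)
loss = total_loss(logits, feats, labels)
```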

22 pages, 3025 KiB  
Article
A Novel Hybrid Technique for Detecting and Classifying Hyperspectral Images of Tomato Fungal Diseases Based on Deep Feature Extraction and Manhattan Distance
by Guifu Ma, Seyed Mohamad Javidan, Yiannis Ampatzidis and Zhao Zhang
Sensors 2025, 25(14), 4285; https://doi.org/10.3390/s25144285 - 9 Jul 2025
Viewed by 301
Abstract
Accurate and early detection of plant diseases is essential for effective management and the advancement of sustainable smart agriculture. However, building large annotated datasets for disease classification is often costly and time-consuming, requiring expert input. To address this challenge, this study explores the integration of few-shot learning with hyperspectral imaging to detect four major fungal diseases in tomato plants: Alternaria alternata, Alternaria solani, Botrytis cinerea, and Fusarium oxysporum. Following inoculation, hyperspectral images were captured every other day from Day 1 to Day 7 post inoculation. The proposed hybrid method includes three main steps: (1) preprocessing of hyperspectral image cubes, (2) deep feature extraction using the EfficientNet model, and (3) classification using Manhattan distance within a few-shot learning framework. This combination leverages the strengths of both spectral imaging and deep learning for robust detection with minimal data. The few-shot learning approach achieved high detection accuracies of 85.73%, 80.05%, 90.33%, and 82.09% for A. alternata, A. solani, B. cinerea, and F. oxysporum, respectively, based on data collected on Day 7 post inoculation using only three training images per class. Accuracy improved over time, reflecting the progressive nature of symptom development and the model’s adaptability with limited data. Notably, A. alternata and B. cinerea were reliably detected by Day 3, while A. solani and F. oxysporum reached dependable detection levels by Day 5. Routine visual assessments showed that A. alternata and B. cinerea developed visible symptoms by Day 5, whereas A. solani and F. oxysporum remained asymptomatic until Day 7. The model’s ability to detect infections up to two days before visual symptoms emerged highlights its value for pre-symptomatic diagnosis. These findings support the use of few-shot learning and hyperspectral imaging for early, accurate disease detection, offering a practical solution for precision agriculture and timely intervention. Full article
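
The classification step above (deep features plus Manhattan-distance matching against per-class prototypes built from three support images) maps onto a short nearest-prototype routine; the EfficientNet variant and the 3-channel inputs below are assumptions, since the actual pipeline works on preprocessed hyperspectral cubes.

```python
import torch
from torchvision import models

# Feature extractor: an EfficientNet with its classification head removed.
# The exact EfficientNet variant and preprocessing used in the paper are assumptions.
backbone = models.efficientnet_b0(weights=None)
backbone.classifier = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def embed(images):                       # images: (N, 3, 224, 224)
    return backbone(images)              # -> (N, 1280) deep features

@torch.no_grad()
def classify_manhattan(query, support, support_labels):
    """Nearest-prototype classification with Manhattan (L1) distance:
    prototypes are mean embeddings of the few support images per class."""
    q, s = embed(query), embed(support)
    classes = support_labels.unique()
    protos = torch.stack([s[support_labels == c].mean(0) for c in classes])
    dists = torch.cdist(q, protos, p=1)            # Manhattan distance to each prototype
    return classes[dists.argmin(dim=1)]

# toy usage: 4 disease classes x 3 support images, 2 query images
support = torch.randn(12, 3, 224, 224)
labels = torch.arange(4).repeat_interleave(3)
pred = classify_manhattan(torch.randn(2, 3, 224, 224), support, labels)
```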

18 pages, 580 KiB  
Article
Feature Transformation-Based Few-Shot Class-Incremental Learning
by Xubo Zhang and Yang Luo
Algorithms 2025, 18(7), 422; https://doi.org/10.3390/a18070422 - 9 Jul 2025
Viewed by 326
Abstract
In the process of few-shot class-incremental learning, the limited number of samples for newly introduced classes makes it difficult to adequately adapt model parameters, resulting in poor feature representations for these classes. To address this issue, this paper proposes a feature transformation method that mitigates feature degradation in few-shot incremental learning. The transformed features better align with the ideal feature distribution required by an optimal classifier, thereby alleviating performance decline during incremental updates. Before classification, the method learns a well-conditioned linear mapping from the available base classes. After classification, both class prototypes and query samples are projected into the transformed feature space to improve the overall feature distribution. Experimental results on three benchmark datasets demonstrate that the proposed method achieves strong performance: it reduces performance degradation to 24.85 percentage points on miniImageNet, 24.45 on CIFAR100, and 24.14 on CUB, consistently outperforming traditional methods such as iCaRL (44.13–50.71 points degradation) and recent techniques like FeTrIL and PL-FSCIL. Further analysis shows that the transformed features bring class prototypes significantly closer to the theoretically optimal equiangular configuration described by neural collapse, highlighting the effectiveness of the proposed approach. Full article
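
The abstract does not specify how the linear mapping is learned, so the sketch below uses a whitening transform fitted on base-class features as one plausible "well-conditioned" choice, then applies it to both prototypes and queries before nearest-prototype classification.

```python
import numpy as np

def fit_whitening_map(base_feats, eps=1e-5):
    """Learn one well-conditioned linear mapping from base-class features
    (a whitening transform here; the paper's actual objective may differ)."""
    mu = base_feats.mean(axis=0)
    cov = np.cov(base_feats - mu, rowvar=False) + eps * np.eye(base_feats.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return lambda f: (f - mu) @ W

def nearest_prototype(transform, prototypes, queries):
    """Project prototypes and queries into the transformed space, then classify
    each query by its closest transformed prototype."""
    p, q = transform(prototypes), transform(queries)
    d = np.linalg.norm(q[:, None, :] - p[None, :, :], axis=-1)
    return d.argmin(axis=1)

# toy usage with random embeddings
rng = np.random.default_rng(1)
transform = fit_whitening_map(rng.normal(size=(500, 64)))        # base-class features
preds = nearest_prototype(transform, rng.normal(size=(5, 64)),   # 5 novel-class prototypes
                          rng.normal(size=(10, 64)))             # 10 query embeddings
```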

35 pages, 2865 KiB  
Article
eyeNotate: Interactive Annotation of Mobile Eye Tracking Data Based on Few-Shot Image Classification
by Michael Barz, Omair Shahzad Bhatti, Hasan Md Tusfiqur Alam, Duy Minh Ho Nguyen, Kristin Altmeyer, Sarah Malone and Daniel Sonntag
J. Eye Mov. Res. 2025, 18(4), 27; https://doi.org/10.3390/jemr18040027 - 7 Jul 2025
Viewed by 448
Abstract
Mobile eye tracking is an important tool in psychology and human-centered interaction design for understanding how people process visual scenes and user interfaces. However, analyzing recordings from head-mounted eye trackers, which typically include an egocentric video of the scene and a gaze signal, is a time-consuming and largely manual process. To address this challenge, we develop eyeNotate, a web-based annotation tool that enables semi-automatic data annotation and learns to improve from corrective user feedback. Users can manually map fixation events to areas of interest (AOIs) in a video-editing-style interface (baseline version). Further, our tool can generate fixation-to-AOI mapping suggestions based on a few-shot image classification model (IML-support version). We conduct an expert study with trained annotators (n = 3) to compare the baseline and IML-support versions. We measure the perceived usability, annotations’ validity and reliability, and efficiency during a data annotation task. We asked our participants to re-annotate data from a single individual using an existing dataset (n = 48). Further, we conducted a semi-structured interview to understand how participants used the provided IML features and assessed our design decisions. In a post hoc experiment, we investigate the performance of three image classification models in annotating data of the remaining 47 individuals. Full article

30 pages, 11197 KiB  
Article
Few-Shot Unsupervised Domain Adaptation Based on Refined Bi-Directional Prototypical Contrastive Learning for Cross-Scene Hyperspectral Image Classification
by Xuebin Tang, Hanyi Shi, Chunchao Li, Cheng Jiang, Xiaoxiong Zhang, Lingbin Zeng and Xiaolei Zhou
Remote Sens. 2025, 17(13), 2305; https://doi.org/10.3390/rs17132305 - 4 Jul 2025
Viewed by 517
Abstract
Hyperspectral image cross-scene classification (HSICC) tasks face tremendous challenges due to spectral shift across scenes and the difficulty of obtaining labels. Unsupervised domain adaptation has proven effective at tackling this issue, but it has a fundamental limitation: it narrows the disparity between source and target domains by relying on fully labeled source data and unlabeled target data. In many cases, however, labels are costly to obtain even in the source domain, making the ample labeling assumed in prior work impractical. In this work, we investigate an extreme yet realistic scenario in which unsupervised domain adaptation methods must handle HSICC tasks with only sparsely labeled source data, namely, few-shot unsupervised domain adaptation. We propose an end-to-end refined bi-directional prototypical contrastive learning (RBPCL) framework for overcoming the HSICC problem with only a few labeled samples in the source domain. RBPCL captures category-level semantic features of hyperspectral data and performs feature alignment through in-domain refined prototypical self-supervised learning and bi-directional cross-domain prototypical contrastive learning. Furthermore, our framework introduces a class-balanced multicentric dynamic prototype strategy to generate more robust and representative prototypes. To facilitate prototype contrastive learning, we employ a Siamese-style distance metric loss function that aggregates intra-class features while increasing the discrepancy between inter-class features. Finally, extensive experiments and ablation analysis on two public cross-scene data pairs and three pairs of self-collected ultralow-altitude hyperspectral datasets captured under different illumination conditions verify the effectiveness of our method, which will further enhance the practicality of hyperspectral intelligent sensing technology. Full article
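
One plausible reading of the bi-directional prototypical contrastive term is sketched below as an InfoNCE-style loss applied in both source-to-target and target-to-source directions; the prototype refinement and class-balanced multicentric strategy are omitted, and the temperature is an assumption.

```python
import torch
import torch.nn.functional as F

def proto_contrastive(feats, labels, prototypes, tau=0.1):
    """InfoNCE-style prototypical contrastive term: each normalised feature is
    pulled toward the prototype of its class and pushed from the others."""
    f = F.normalize(feats, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    logits = f @ p.t() / tau
    return F.cross_entropy(logits, labels)

def bidirectional_loss(src_feats, src_labels, src_protos, tgt_feats, tgt_pseudo, tgt_protos):
    """Bi-directional variant: source samples against target prototypes and
    target samples (with pseudo-labels) against source prototypes."""
    return (proto_contrastive(src_feats, src_labels, tgt_protos)
            + proto_contrastive(tgt_feats, tgt_pseudo, src_protos))

# toy usage: 6 classes, 64-d features; target labels would be pseudo-labels in practice
loss = bidirectional_loss(torch.randn(32, 64), torch.randint(0, 6, (32,)), torch.randn(6, 64),
                          torch.randn(32, 64), torch.randint(0, 6, (32,)), torch.randn(6, 64))
```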

23 pages, 3791 KiB  
Article
A Method for Few-Shot Radar Target Recognition Based on Multimodal Feature Fusion
by Yongjing Zhou, Yonggang Li and Weigang Zhu
Sensors 2025, 25(13), 4162; https://doi.org/10.3390/s25134162 - 4 Jul 2025
Viewed by 362
Abstract
Enhancing generalization capabilities and robustness in scenarios with limited sample sizes, while simultaneously decreasing reliance on extensive and high-quality datasets, represents a significant area of inquiry within the domain of radar target recognition. This study introduces a few-shot learning framework that leverages multimodal feature fusion. We develop a cross-modal representation optimization mechanism tailored for the target recognition task by incorporating natural resonance frequency features that elucidate the target’s scattering characteristics. Furthermore, we establish a multimodal fusion classification network that integrates bi-directional long short-term memory and residual neural network architectures, facilitating deep bimodal fusion through an encoding-decoding framework augmented by an energy embedding strategy. To optimize the model, we propose a cross-modal equilibrium loss function that amalgamates similarity metrics from diverse features with cross-entropy loss, thereby guiding the optimization process towards enhancing metric spatial discrimination and balancing classification performance. Empirical results derived from simulated datasets indicate that the proposed methodology achieves a recognition accuracy of 95.36% in the 5-way 1-shot task, surpassing traditional unimodal image and concatenation fusion feature approaches by 2.26% and 8.73%, respectively. Additionally, the inter-class feature separation is improved by 18.37%, thereby substantiating the efficacy of the proposed method. Full article
(This article belongs to the Section Radar Sensors)

25 pages, 34278 KiB  
Article
Complementary Local–Global Optimization for Few-Shot Object Detection in Remote Sensing
by Yutong Zhang, Xin Lyu, Xin Li, Siqi Zhou, Yiwei Fang, Chenlong Ding, Shengkai Gao and Jiale Chen
Remote Sens. 2025, 17(13), 2136; https://doi.org/10.3390/rs17132136 - 21 Jun 2025
Viewed by 580
Abstract
Few-shot object detection (FSOD) in remote sensing remains challenging due to the scarcity of annotated samples and the complex background environments in aerial images. Existing methods often struggle to capture fine-grained local features or suffer from bias during global adaptation to novel categories, leading to misclassification as background. To address these issues, we propose a framework that simultaneously enhances local feature learning and global feature adaptation. Specifically, we design an Extensible Local Feature Aggregator Module (ELFAM) that reconstructs object structures via multi-scale recursive attention aggregation. We further introduce a Self-Guided Novel Adaptation (SGNA) module that employs a teacher-student collaborative strategy to generate high-quality pseudo-labels, thereby refining the semantic feature distribution of novel categories. In addition, a Teacher-Guided Dual-Branch Head (TG-DH) is developed to supervise both classification and regression using pseudo-labels generated by the teacher model to further stabilize and enhance the semantic features of novel classes. Extensive experiments on DIOR and iSAID datasets demonstrate that our method achieves superior performance compared to existing state-of-the-art FSOD approaches and simultaneously validate the effectiveness of all proposed components. Full article
(This article belongs to the Special Issue Efficient Object Detection Based on Remote Sensing Images)
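
The teacher-student pseudo-labeling used by SGNA/TG-DH can be caricatured with an EMA teacher and a confidence threshold, as below; the momentum, threshold, and the linear stand-in for the detector head are all assumptions, not the paper's design.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Teacher-student collaboration: the teacher tracks an exponential moving
    average of the student's weights (momentum value is an assumption)."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1 - momentum)

def select_pseudo_labels(scores, threshold=0.7):
    """Keep only predictions whose teacher confidence clears the threshold;
    these become pseudo-labels for the novel classes."""
    conf, cls = scores.softmax(dim=-1).max(dim=-1)
    keep = conf >= threshold
    return keep, cls[keep]

# toy usage: a linear "head" standing in for the full detector
teacher, student = torch.nn.Linear(256, 21), torch.nn.Linear(256, 21)
ema_update(teacher, student)
keep, pseudo = select_pseudo_labels(teacher(torch.randn(100, 256)))
```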

18 pages, 11805 KiB  
Article
VL-PAW: A Vision–Language Dataset for Pear, Apple and Weed
by Gwang-Hyun Yu, Le Hoang Anh, Dang Thanh Vu, Jin Lee, Zahid Ur Rahman, Heon-Zoo Lee, Jung-An Jo and Jin-Young Kim
Electronics 2025, 14(10), 2087; https://doi.org/10.3390/electronics14102087 - 21 May 2025
Viewed by 514
Abstract
Vision–language models (VLMs) have achieved remarkable success in natural image domains, yet their potential remains underexplored in agriculture due to the lack of high-quality, joint image–text datasets. To address this limitation, we introduce VL-PAW (Vision–Language dataset for Pear, Apple, and Weed), a dataset comprising 3.9 K image–caption pairs for two key agricultural tasks: weed species classification and fruit inspection. We fine-tune the CLIP model on VL-PAW and gain several insights. First, the model demonstrates impressive zero-shot performance, achieving 98.21% accuracy in classifying coarse labels. Second, for fine-grained categories, the vision–language model outperforms vision-only models in both few-shot settings and entire dataset training (1-shot: 56.79%; 2-shot: 72.82%; 3-shot: 74.49%; 10-shot: 83.85%). Third, using intuitive captions enhances fine-grained fruit inspection performance compared to using class names alone. These findings demonstrate the applicability of VLMs in future agricultural querying systems. Full article
(This article belongs to the Collection Image and Video Analysis and Understanding)
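
A zero-shot CLIP classification call of the kind evaluated above can be written with the Hugging Face transformers API; the checkpoint, prompts, and image path below are placeholders rather than the fine-tuned VL-PAW setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot classification with a stock CLIP checkpoint; the model name,
# prompt wording, and class list are illustrative assumptions.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a pear", "a photo of an apple", "a photo of a weed"]
image = Image.open("sample.jpg")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs.squeeze().tolist())))
```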

24 pages, 1736 KiB  
Article
ProFusion: Multimodal Prototypical Networks for Few-Shot Learning with Feature Fusion
by Jia Zhao, Ziyang Cao, Huiling Wang, Xu Wang and Yingzhou Chen
Symmetry 2025, 17(5), 796; https://doi.org/10.3390/sym17050796 - 20 May 2025
Viewed by 791
Abstract
Existing few-shot learning models leverage vision–language pre-trained models to alleviate the data scarcity problem. However, such models usually process visual and textual information separately, which leaves inherent disparities between cross-modal features. Therefore, we propose the ProFusion model, which leverages multimodal pre-trained models and prototypical networks to construct multiple prototypes. Specifically, ProFusion generates image and text prototypes symmetrically using the visual encoder and text encoder, while integrating visual and textual information through a fusion module to create more expressive multimodal feature-fusion prototypes. Additionally, we introduce an alignment module to ensure consistency between image and text prototypes. During inference, ProFusion calculates the similarity of a test image to the three types of prototypes separately and applies a weighted sum to generate the final prediction. Experiments demonstrate that ProFusion achieves outstanding classification performance on 15 benchmark datasets. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)
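
The weighted-sum inference step is concrete enough to sketch: similarities of a query to the image, text, and fused prototypes are combined with fixed weights. The weights and embedding size below are illustrative, not the paper's values.

```python
import torch
import torch.nn.functional as F

def profusion_predict(query, img_protos, txt_protos, fusion_protos, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of a query's cosine similarities to image, text, and
    fused prototypes, followed by an argmax over classes."""
    q = F.normalize(query, dim=-1)
    scores = sum(w * (q @ F.normalize(p, dim=-1).t())
                 for w, p in zip(weights, (img_protos, txt_protos, fusion_protos)))
    return scores.argmax(dim=-1)

# toy usage: 10 queries, 5 classes, 512-d CLIP-style embeddings
pred = profusion_predict(torch.randn(10, 512), torch.randn(5, 512),
                         torch.randn(5, 512), torch.randn(5, 512))
```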
