Search Results (197)

Search Parameters:
Keywords = few-shot processing

24 pages, 2308 KB  
Review
Review on Application of Machine Vision-Based Intelligent Algorithms in Gear Defect Detection
by Dehai Zhang, Shengmao Zhou, Yujuan Zheng and Xiaoguang Xu
Processes 2025, 13(10), 3370; https://doi.org/10.3390/pr13103370 - 21 Oct 2025
Abstract
Gear defect detection directly affects the operational reliability of critical equipment in fields such as automotive and aerospace. Machine vision-based gear defect detection, leveraging the advantages of non-contact measurement, high efficiency, and cost-effectiveness, has become a key support for quality control in intelligent manufacturing. However, it still faces challenges including difficulties in semantic alignment of multimodal data, the imbalance between real-time detection requirements and computational resources, and poor model generalization in few-shot scenarios. This paper traces the paradigm evolution of gear defect detection technology, systematically reviewing its development from traditional image processing to deep learning, with a focus on the innovative application of intelligent algorithms. A research framework of “technical bottleneck-breakthrough path-application verification” is constructed: for multimodal fusion, the cross-modal feature alignment mechanism based on Transformer networks is analyzed in depth, clarifying how establishing global correlation mappings realizes joint embedding of visual and vibration signals; for resource constraints, lightweight models such as MobileNet and ShuffleNet are quantitatively compared, verifying that these models reduce parameter counts by 40–60% while keeping mean Average Precision essentially unchanged; for few-shot scenarios, few-shot generation models based on contrastive learning are systematically surveyed, confirming that their accuracy in the 10-shot setting can reach 90% of that of fully supervised models, thus enhancing generalization ability. Future research can focus on the collaboration between few-shot generation and physical simulation, edge-cloud dynamic scheduling, defect evolution modeling driven by multiphysics fields, and standardization of explainable artificial intelligence.
The review aims to inform gear detection systems with autonomous perception capabilities, promoting the development of industrial quality inspection toward high-precision, high-robustness, and low-cost intelligence.

25 pages, 2968 KB  
Article
ECSA: Mitigating Catastrophic Forgetting and Few-Shot Generalization in Medical Visual Question Answering
by Qinhao Jia, Shuxian Liu, Mingliang Chen, Tianyi Li and Jing Yang
Tomography 2025, 11(10), 115; https://doi.org/10.3390/tomography11100115 - 20 Oct 2025
Abstract
Objective: Medical Visual Question Answering (Med-VQA), a key technology that integrates computer vision and natural language processing to assist in clinical diagnosis, possesses significant potential for enhancing diagnostic efficiency and accuracy. However, its development is constrained by two major bottlenecks: weak few-shot generalization capability stemming from the scarcity of high-quality annotated data and the problem of catastrophic forgetting when continually learning new knowledge. Existing research has largely addressed these two challenges in isolation, lacking a unified framework. Methods: To bridge this gap, this paper proposes a novel Evolvable Clinical-Semantic Alignment (ECSA) framework, designed to synergistically solve these two challenges within a single architecture. ECSA is built upon powerful pre-trained vision (BiomedCLIP) and language (Flan-T5) models, with two innovative modules at its core. First, we design a Clinical-Semantic Disambiguation Module (CSDM), which employs a novel debiased hard negative mining strategy for contrastive learning. This enables the precise discrimination of “hard negatives” that are visually similar but clinically distinct, thereby significantly enhancing the model’s representation ability in few-shot and long-tail scenarios. Second, we introduce a Prompt-based Knowledge Consolidation Module (PKC), which acts as a rehearsal-free non-parametric knowledge store. It consolidates historical knowledge by dynamically accumulating and retrieving task-specific “soft prompts,” thus effectively circumventing catastrophic forgetting without relying on past data. Results: Extensive experimental results on four public benchmark datasets, VQA-RAD, SLAKE, PathVQA, and VQA-Med-2019, demonstrate ECSA’s state-of-the-art or highly competitive performance. 
Specifically, ECSA achieves excellent overall accuracies of 80.15% on VQA-RAD and 85.10% on SLAKE, while also showing strong generalization with 64.57% on PathVQA and 82.23% on VQA-Med-2019. More critically, in continual learning scenarios, the framework achieves a low forgetting rate of just 13.50%, showcasing its significant advantages in knowledge retention. Conclusions: These findings validate the framework’s substantial potential for building robust and evolvable clinical decision support systems.
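The debiased hard-negative mining at the heart of CSDM can be illustrated with a toy contrastive loss. The weighting scheme below is a hypothetical simplification for illustration, not the paper's actual objective:

```python
import numpy as np

def contrastive_loss_hard_neg(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss that upweights harder negatives (those most similar
    to the anchor) -- a toy stand-in for CSDM's debiased hard-negative mining."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / tau)
    sims = np.array([cos(anchor, n) for n in negatives])
    weights = np.exp(sims) / np.exp(sims).sum()   # harder negative -> larger weight
    neg = (len(negatives) * weights * np.exp(sims / tau)).sum()
    return float(-np.log(pos / (pos + neg)))

# A visually close pair should incur lower loss than a mismatched one.
anchor = np.array([1.0, 0.0])
negs = [np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
loss_match = contrastive_loss_hard_neg(anchor, np.array([1.0, 0.1]), negs)
loss_mismatch = contrastive_loss_hard_neg(anchor, np.array([0.0, 1.0]), negs)
```

"Hard negatives that are visually similar but clinically distinct" then simply correspond to negatives with high cosine similarity to the anchor, which this weighting emphasizes.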

20 pages, 695 KB  
Article
Threshold Dynamic Multi-Source Decisive Prototypical Network
by Qibing Ma, Guangyang Pang and Xinyue Liu
Electronics 2025, 14(20), 4077; https://doi.org/10.3390/electronics14204077 - 17 Oct 2025
Viewed by 220
Abstract
To address the performance limitations that prototypical networks in existing few-shot text classification methods suffer due to prototype shift and metric constraints, this paper proposes a meta-learning-based few-shot text classification method: the Threshold Dynamic Multi-Source Decisive Prototypical Network (TDMP-Net). This method designs two core components: a threshold dynamic data augmentation module and a multi-source information Decider. Specifically, the threshold dynamic data augmentation module optimizes the prototype estimation process by leveraging multi-source information from query set samples, thereby alleviating the prototype shift problem; meanwhile, the multi-source information Decider performs classification by relying on the multi-source information of the query set, thus alleviating the metric constraint problem. The effectiveness of the proposed method is verified on four benchmark datasets: under the five-way one-shot and five-way five-shot settings, TDMP-Net achieves average accuracies of 78.3% and 86.5%, respectively, an average improvement of 3.3 percentage points over current state-of-the-art methods. Experimental results show that TDMP-Net can effectively alleviate the prototype shift and metric constraint problems and has stronger generalization ability.
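The prototypical-network baseline that TDMP-Net builds on can be sketched in a few lines, assuming pre-computed embeddings. TDMP-Net's threshold-dynamic augmentation and multi-source Decider are not reproduced here:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_way):
    """Class prototype = mean of that class's support embeddings."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_way)])

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy 2-way 2-shot episode in a 2-D embedding space.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_way=2)
pred = classify(np.array([[0.1, 0.1], [4.9, 5.1]]), protos)
print(pred)  # -> [0 1]
```

Prototype shift is exactly the failure mode where the support mean is a poor estimate of the true class center; the paper's augmentation module refines this estimate using query-set information.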

23 pages, 2173 KB  
Article
Prototype-Enhanced Few-Shot Relation Extraction Method Based on Cluster Loss Optimization
by Shenyi Qian, Bowen Fu, Chao Liu, Songhe Jin, Tong Sun, Zhen Chen, Daiyi Li, Yifan Sun, Yibing Chen and Yuheng Li
Symmetry 2025, 17(10), 1673; https://doi.org/10.3390/sym17101673 - 7 Oct 2025
Viewed by 299
Abstract
The purpose of few-shot relation extraction (RE) is to recognize the relationship between specific entity pairs in text when only a limited number of labeled samples is available. Few-shot RE methods based on prototype networks, which construct relation prototypes from the support set to assign labels to query samples, inherently leverage the symmetry between support and query processing. Although these methods have achieved remarkable results, they still face challenges such as misjudging noisy samples or outliers and distinguishing semantically similar relations. To address these challenges, we propose a novel semantic-enhanced prototype network that integrates the semantic information of relations more effectively to produce more expressive representations of instances and relation prototypes, thereby improving few-shot RE performance. Firstly, we design a prompt encoder to uniformly process different prompt templates for instance and relation information, and then utilize the semantic understanding and generation capabilities of large language models (LLMs) to obtain precise semantic representations of instances, their prototypes, and conceptual prototypes. Secondly, graph attention learning is introduced to extract relation-specific features between conceptual prototypes and isomorphic instances while maintaining structural symmetry. Meanwhile, a prototype-level contrastive learning strategy with bidirectional feature symmetry is proposed to predict query instances by integrating the interpretable features of conceptual prototypes and the intra-class shared features captured by instance prototypes. In addition, a clustering loss function is designed to guide the model toward a discriminative metric space with improved relational symmetry, effectively improving the accuracy of relation recognition.
Finally, the experimental results on the FewRel1.0 and FewRel2.0 datasets show that the proposed approach delivers improved performance compared to existing advanced models in the task of few-shot RE.
(This article belongs to the Section Computer)

16 pages, 2489 KB  
Article
Sentence-Level Silent Speech Recognition Using a Wearable EMG/EEG Sensor System with AI-Driven Sensor Fusion and Language Model
by Nicholas Satterlee, Xiaowei Zuo, Kee Moon, Sung Q. Lee, Matthew Peterson and John S. Kang
Sensors 2025, 25(19), 6168; https://doi.org/10.3390/s25196168 - 5 Oct 2025
Viewed by 807
Abstract
Silent speech recognition (SSR) enables communication without vocalization by interpreting biosignals such as electromyography (EMG) and electroencephalography (EEG). Most existing SSR systems rely on high-density, non-wearable sensors and focus primarily on isolated word recognition, limiting their practical usability. This study presents a wearable SSR system capable of accurate sentence-level recognition using single-channel EMG and EEG sensors with real-time wireless transmission. A moving window-based few-shot learning model, implemented with a Siamese neural network, segments and classifies words from continuous biosignals without requiring pauses or manual segmentation between word signals. A novel sensor fusion model integrates both EMG and EEG modalities, enhancing classification accuracy. To further improve sentence-level recognition, a statistical language model (LM) is applied as post-processing to correct syntactic and lexical errors. The system was evaluated on a dataset of four military command sentences containing ten unique words, achieving 95.25% sentence-level recognition accuracy. These results demonstrate the feasibility of sentence-level SSR using wearable sensors through a window-based few-shot learning model, sensor fusion, and an LM applied to limited simultaneous EMG and EEG signals.
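The distance-based few-shot matching that a Siamese network performs can be sketched with a fixed stand-in embedding. The feature map and the example command words below are hypothetical, not the system's learned network or vocabulary:

```python
import numpy as np

def embed(x):
    """Stand-in for the Siamese branch: a fixed feature map (mean, std, energy).
    In the paper this is a learned network shared by both inputs."""
    return np.array([x.mean(), x.std(), (x ** 2).sum()])

def nearest_word(window, support_windows, support_words):
    """Classify a signal window as the word whose labeled support example has
    the closest embedding -- the core of distance-based few-shot matching."""
    dists = [np.linalg.norm(embed(window) - embed(s)) for s in support_windows]
    return support_words[int(np.argmin(dists))]

# One labeled example per word (toy signals standing in for EMG/EEG windows).
support = [np.zeros(8), np.full(8, 2.0)]
words = ["halt", "advance"]
pred = nearest_word(np.full(8, 1.9), support, words)
print(pred)  # -> advance
```

The moving-window design then slides this classifier along the continuous biosignal, which is what removes the need for pauses between words.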
(This article belongs to the Special Issue Advanced Sensing Techniques in Biomedical Signal Processing)

21 pages, 9052 KB  
Article
SAM–Attention Synergistic Enhancement: SAR Image Object Detection Method Based on Visual Large Model
by Yirong Yuan, Jie Yang, Lei Shi and Lingli Zhao
Remote Sens. 2025, 17(19), 3311; https://doi.org/10.3390/rs17193311 - 26 Sep 2025
Viewed by 581
Abstract
The object detection model for synthetic aperture radar (SAR) images needs strong generalization ability and stable detection performance because of SAR’s complex scattering mechanisms, high sensitivity to orientation angle, and susceptibility to speckle noise. Visual large models possess strong generalization capabilities for natural image processing, but their application to SAR imagery remains rare. This paper introduces a visual large model into the SAR object detection task, aiming to alleviate the weak cross-domain generalization and poor few-shot adaptability that SAR image characteristics cause in existing models. The proposed model comprises an image encoder, an attention module, and a detection decoder. The image encoder leverages the pre-trained Segment Anything Model (SAM) for effective feature extraction from SAR images. An Adaptive Channel Interactive Attention (ACIA) module is introduced to suppress SAR speckle noise. Further, a Dynamic Tandem Attention (DTA) mechanism is proposed in the decoder to integrate scale perception, spatial focusing, and task adaptation, while decoupling classification from detection for improved accuracy. Leveraging the strong representational and few-shot adaptation capabilities of large pre-trained models, this study evaluates their cross-domain and few-shot detection performance on SAR imagery. For cross-domain detection, the model was trained on AIR-SARShip-1.0 and tested on SSDD, achieving an mAP50 of 0.54. For few-shot detection on SAR-AIRcraft-1.0, using only 10% of the training samples, the model reached an mAP50 of 0.503.
(This article belongs to the Special Issue Big Data Era: AI Technology for SAR and PolSAR Image)

26 pages, 1333 KB  
Article
Category Name Expansion and an Enhanced Multimodal Fusion Framework for Few-Shot Learning
by Tianlei Gao, Lei Lyu, Xiaoyun Xie, Nuo Wei, Yushui Geng and Minglei Shu
Entropy 2025, 27(9), 991; https://doi.org/10.3390/e27090991 - 22 Sep 2025
Viewed by 432
Abstract
With the advancement of image processing techniques, few-shot learning (FSL) has gradually become a key approach to addressing the problem of data scarcity. However, existing FSL methods often rely on unimodal information under limited sample conditions, making it difficult to capture fine-grained differences between categories. To address this issue, we propose a multimodal few-shot learning method based on category name expansion and image feature enhancement. By integrating the expanded category text with image features, the proposed method enriches the semantic representation of categories and enhances the model’s sensitivity to detailed features. To further improve the quality of cross-modal information transfer, we introduce a cross-modal residual connection strategy that aligns features across layers through progressive fusion. This approach enables the fused representations to maximize mutual information while reducing redundancy, effectively alleviating the information bottleneck caused by uneven entropy distribution between modalities and enhancing the model’s generalization ability. Experimental results demonstrate that our method achieves superior performance on both natural image datasets (CIFAR-FS and FC100) and a medical image dataset.

18 pages, 2229 KB  
Article
Large Language Models for Construction Risk Classification: A Comparative Study
by Abdolmajid Erfani and Hussein Khanjar
Buildings 2025, 15(18), 3379; https://doi.org/10.3390/buildings15183379 - 18 Sep 2025
Viewed by 1032
Abstract
Risk identification is a critical concern in the construction industry. In recent years, there has been a growing trend of applying artificial intelligence (AI) tools to detect risks from unstructured data sources such as news articles, social media, contracts, and financial reports. The rapid advancement of large language models (LLMs) in text analysis, summarization, and generation offers promising opportunities to improve construction risk identification. This study conducts a comprehensive benchmarking of natural language processing (NLP) and LLM techniques for automating the classification of risk items into a generic risk category. Twelve model configurations are evaluated, ranging from classical NLP pipelines using TF-IDF and Word2Vec to advanced transformer-based models such as BERT and GPT-4 with zero-shot, instruction, and few-shot prompting strategies. The results reveal that LLMs, particularly GPT-4 with few-shot prompts, achieve a competitive performance (F1 = 0.81) approaching that of the best classical model (BERT + SVM; F1 = 0.86), all without the need for training data. Moreover, LLMs exhibit a more balanced performance across imbalanced risk categories, showcasing their adaptability in data-sparse settings. These findings contribute theoretically by positioning LLMs as scalable plug-and-play alternatives to NLP pipelines, offering practical value by highlighting how LLMs can support early-stage project planning and risk assessment in contexts where labeled data and expert resources are limited.
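Few-shot prompting of this kind reduces to plain prompt assembly before the model is called. The risk categories and example items below are invented for illustration and are not taken from the study's dataset:

```python
def build_few_shot_prompt(risk_item, examples, categories):
    """Assemble a few-shot classification prompt of the kind fed to an LLM:
    an instruction, labeled examples, then the unlabeled item to classify."""
    parts = ["Classify the construction risk item into exactly one category: "
             + ", ".join(categories) + "."]
    for text, label in examples:
        parts.append(f'Risk item: "{text}"\nCategory: {label}')
    parts.append(f'Risk item: "{risk_item}"\nCategory:')
    return "\n\n".join(parts)

# Hypothetical labeled examples and categories.
prompt = build_few_shot_prompt(
    "Utility relocation delays on the north segment",
    examples=[("Unexpected rock during excavation", "Geotechnical"),
              ("Permit approval backlog at the city", "Regulatory")],
    categories=["Geotechnical", "Regulatory", "Financial"],
)
print(prompt)
```

The completion the model returns after the final `Category:` is taken as the predicted label, which is why no training data is needed.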

25 pages, 2210 KB  
Article
KG-SR-LLM: Knowledge-Guided Semantic Representation and Large Language Model Framework for Cross-Domain Bearing Fault Diagnosis
by Chengyong Xiao, Xiaowei Liu, Aziguli Wulamu and Dezheng Zhang
Sensors 2025, 25(18), 5758; https://doi.org/10.3390/s25185758 - 16 Sep 2025
Viewed by 735
Abstract
Bearing fault diagnosis is crucial for stable operation and safe manufacturing as industrial intelligence becomes increasingly advanced. However, under complicated non-linear vibration modes and multiple operating conditions, most current diagnostic methods are limited in cross-domain generalization. To address these issues, this study develops a generalized diagnostic framework leveraging Large Language Models (LLMs), integrating multiple enhancements to improve both accuracy and adaptability. First, a structured representation approach is designed to transform raw vibration time series into interpretable text sequences by extracting physically meaningful features in both the time and frequency domains, bridging the gap between sequential sensor data and semantic understanding. Furthermore, to explicitly incorporate bearings’ structural parameters and operating-condition information, a knowledge-guided prompt tuning strategy based on Low-Rank Adaptation (LoRA-Prompt) is introduced; this mechanism embeds expert prior knowledge directly into the learning process, enabling the model to adapt more effectively to varying fault scenarios. Finally, a generalized fault diagnosis method named Knowledge-Guided Semantic Representation and Large Language Model (KG-SR-LLM) is established. Large-scale experiments on 11 public datasets from the industrial, aerospace, and energy fields extensively evaluate its performance. Based on the experimental analysis and comparison of results, KG-SR-LLM surpasses classical deep learning models by 9.22%, reaching an average diagnostic accuracy of 98.36%, and is effective for few-shot transfer and cross-condition adaptation tasks. These results illustrate the theoretical significance and practical benefit of KG-SR-LLM for intelligent fault diagnosis of bearings.
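The structured representation step, turning a vibration window into interpretable text, can be sketched as follows. The exact feature set and wording are assumptions for illustration, not the paper's template:

```python
import numpy as np

def vibration_to_text(signal, fs):
    """Summarize a raw vibration window as a short text sequence of physically
    meaningful time- and frequency-domain features."""
    rms = np.sqrt(np.mean(signal ** 2))
    kurt = np.mean((signal - signal.mean()) ** 4) / (signal.std() ** 4)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    dom = freqs[int(np.argmax(spectrum[1:]) + 1)]   # skip the DC bin
    return (f"RMS amplitude {rms:.3f}, kurtosis {kurt:.2f}, "
            f"dominant frequency {dom:.1f} Hz")

# A pure 60 Hz tone sampled at 1 kHz for one second.
fs = 1000
t = np.arange(0, 1, 1 / fs)
text = vibration_to_text(np.sin(2 * np.pi * 60 * t), fs)
print(text)
```

An LLM can then reason over sentences like this one, with the knowledge-guided prompt supplying bearing geometry and operating conditions alongside it.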
(This article belongs to the Section Fault Diagnosis & Sensors)

19 pages, 611 KB  
Article
Prompt-Driven and Kubernetes Error Report-Aware Container Orchestration
by Niklas Beuter, André Drews and Nane Kratzke
Future Internet 2025, 17(9), 416; https://doi.org/10.3390/fi17090416 - 11 Sep 2025
Viewed by 411
Abstract
Background: Container orchestration systems like Kubernetes rely heavily on declarative manifest files, which serve as orchestration blueprints. However, managing these manifest files is often complex and requires substantial DevOps expertise. Methodology: This study investigates the use of Large Language Models (LLMs) to automate the creation of Kubernetes manifest files from natural language specifications, utilizing prompt engineering techniques within an innovative error- and warning-report–aware refinement process. We assess the capabilities of these LLMs using Zero-Shot, Few-Shot, Prompt-Chaining, and Self-Refine methods to address DevOps needs and support fully automated deployment pipelines. Results: Our findings show that LLMs can generate Kubernetes manifests with varying levels of manual intervention. Notably, GPT-4 and GPT-3.5 demonstrate strong potential for deployment automation. Interestingly, smaller models sometimes outperform larger ones, challenging the assumption that larger models always yield better results. Conclusions: This research highlights the crucial impact of prompt engineering on LLM performance for Kubernetes tasks and recommends further exploration of prompt techniques and model comparisons, outlining a promising path for integrating LLMs into automated deployment workflows.
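An error-report-aware refinement loop of the Self-Refine kind can be sketched as a skeleton. `generate` stands in for an LLM call and `validate` for a `kubectl apply --dry-run`-style check; the toy stubs are hypothetical and nothing here is the paper's actual pipeline:

```python
def refine_manifest(generate, validate, spec, max_rounds=3):
    """Generate a manifest from a natural-language spec, validate it, and feed
    the error report back into the next prompt until validation passes."""
    manifest = generate(spec)
    for _ in range(max_rounds):
        errors = validate(manifest)
        if not errors:
            return manifest
        prompt = f"{spec}\n\nPrevious attempt:\n{manifest}\n\nErrors:\n{errors}"
        manifest = generate(prompt)
    return manifest

# Toy stubs: the "model" fixes its typo once it sees the error report.
def fake_llm(prompt):
    return "kind: Deployment" if "Errors:" in prompt else "kind: Deploymnt"

def fake_validator(manifest):
    return "" if manifest == "kind: Deployment" else 'unknown kind "Deploymnt"'

result = refine_manifest(fake_llm, fake_validator, "an nginx deployment")
print(result)  # -> kind: Deployment
```

Bounding the loop with `max_rounds` keeps a model that never converges from stalling an automated deployment pipeline.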
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))

22 pages, 47099 KB  
Article
Deciphering Emotions in Children’s Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
by Bushra Asseri, Estabrag Abaker, Maha Al Mogren, Tayef Alhefdhi and Areej Al-Wabil
AI 2025, 6(9), 211; https://doi.org/10.3390/ai6090211 - 2 Sep 2025
Viewed by 954
Abstract
Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies yet remain underexplored for Arabic language contexts, where culturally appropriate learning tools are critically needed. This study evaluated the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children’s storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik’s emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting compared to Gemini’s best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models’ cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners. Full article
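The macro F1 metric used to score the models weights every class equally, which matters when some emotions are rare. A minimal sketch, with hypothetical annotations over three Plutchik-style classes:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute per-class F1, then average with equal weight
    per class, so rare emotion categories count as much as frequent ones."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return sum(f1s) / len(f1s)

truth = ["joy", "joy", "fear", "sadness"]
preds = ["joy", "fear", "fear", "joy"]
score = macro_f1(truth, preds, labels=["joy", "fear", "sadness"])
```

Here the never-predicted "sadness" class contributes an F1 of zero, dragging the macro average down even though overall accuracy looks moderate.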
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)

19 pages, 4057 KB  
Article
Few-Shot Target Detection Algorithm Based on Adaptive Sampling Meta-DETR
by Zihao Ma, Gang Liu, Zhaoya Tong and Xiaoliang Fan
Electronics 2025, 14(17), 3506; https://doi.org/10.3390/electronics14173506 - 2 Sep 2025
Viewed by 651
Abstract
Meta-DETR is a few-shot target detection algorithm that combines meta-learning and a transformer architecture to address data-sample scarcity. The algorithm uses deformable attention to focus the feature-learning process more accurately on the target and its surroundings. However, the number of sampling points in the deformable attention is fixed, which limits the effective information involved in feature extraction, resulting in insufficient feature extraction of the target and degraded detection performance. To solve this problem, a Meta-DETR few-shot target detection algorithm based on adaptive-sampling deformable attention is proposed. Firstly, the cosine similarity between feature points is calculated from query features fused with support features. Secondly, the number of related features of each feature point is counted against a similarity threshold. Thirdly, the final number of sampling points for the feature map is calculated using the maximum inter-class variance criterion to achieve adaptive sampling. Finally, adaptive-sampling deformable attention is integrated into Meta-DETR to achieve few-shot target detection. The attention activation maps show that deformable attention based on adaptive sampling attends more closely to the target itself. Compared with Meta-DETR, the proposed algorithm improves the detection accuracy of novel classes by 0.9%, 0.7%, 1.4%, and 2.1% for shots 1, 2, 3, and 10 in partition 1 on the PASCAL VOC dataset; by 3.5%, 0.1%, 5.5%, and 5.7% for shots 2, 3, 5, and 10 in partition 2; and by 1.9%, 1.0%, 2.1%, and 0.1% for shots 2, 3, 5, and 10 in partition 3. Compared with MPF-Net, CRK-Net, and FSCE, the proposed algorithm achieves the best performance and can effectively realize detection under few-shot conditions. In addition, experiments on a self-built infrared dataset further validate the effectiveness of the proposed algorithm.
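The "maximum inter-class variance" criterion is Otsu's method. A standalone 1-D version, simplified from the in-network use described above, looks like:

```python
import numpy as np

def otsu_split(values, n_bins=16):
    """Pick the threshold maximizing between-class variance over a 1-D array,
    the criterion the algorithm applies to per-point related-feature counts
    when setting the number of sampling points adaptively."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    total = hist.sum()
    best_var, best_t = -1.0, edges[0]
    for i in range(1, n_bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:i] * centers[:i]).sum() / w0   # mean of the lower class
        m1 = (hist[i:] * centers[i:]).sum() / w1   # mean of the upper class
        var = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, edges[i]
    return best_t

# Bimodal counts: the split should land between the two clusters.
counts = np.array([1, 1, 1, 2, 2, 9, 9, 10, 10, 10])
t = otsu_split(counts)
```

Splitting the related-feature counts this way separates feature points that need many sampling points from those that need few, without a hand-tuned cutoff.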
(This article belongs to the Section Artificial Intelligence)

21 pages, 2799 KB  
Article
Few-Shot Leukocyte Classification Algorithm Based on Feature Reconstruction Network with Improved EfficientNetV2
by Xinzheng Wang, Cuisi Ou, Guangjian Pan, Zhigang Hu and Kaiwen Cao
Appl. Sci. 2025, 15(17), 9377; https://doi.org/10.3390/app15179377 - 26 Aug 2025
Viewed by 615
Abstract
Deep learning has excelled in image classification largely thanks to large, professionally labeled datasets. In medical imaging, however, data annotation often relies on experienced experts, especially in tasks such as white blood cell classification, where staining methods vary greatly between cells and some categories have relatively few samples. To evaluate leukocyte classification performance with limited labeled samples, a few-shot learning method based on a Feature Reconstruction Network with Improved EfficientNetV2 (FRNE) is proposed. Firstly, this paper presents a feature extractor based on the improved EfficientNetV2 architecture. To enlarge the receptive field and extract multi-scale features effectively, the network incorporates an ASPP module with dilated convolutions at different dilation rates, improving the model’s spatial reconstruction capability during feature extraction. The support set and query set are then processed by the feature extractor to obtain their respective feature maps, and a feature reconstruction-based classification method is applied: ridge regression reconstructs the query feature map from the support-set features, and the reconstruction error determines the likelihood that the query sample belongs to a particular class, without requiring additional modules or extensive parameter tuning. Evaluated on the LDWBC and Raabin datasets, the proposed method improves accuracy by 3.67% and 1.27%, respectively, over the strongest overall-accuracy performer among all compared approaches.
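The ridge-regression reconstruction step can be sketched on toy features. The closed-form solve below is a generic formulation of the idea, not the paper's implementation:

```python
import numpy as np

def recon_error(query, support, lam=0.1):
    """Reconstruct query feature rows as a linear combination of one class's
    support features via ridge regression (closed form), and return the
    residual norm: a low error suggests the query belongs to that class."""
    gram = support @ support.T + lam * np.eye(support.shape[0])
    weights = query @ support.T @ np.linalg.inv(gram)
    return float(np.linalg.norm(query - weights @ support))

def classify_by_reconstruction(query, class_supports, lam=0.1):
    """Assign the query to the class whose support set reconstructs it best."""
    errors = [recon_error(query, s, lam) for s in class_supports]
    return int(np.argmin(errors))

# Toy feature maps: class 0 spans one direction, class 1 another.
class0 = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
class1 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
query = np.array([[3.0, 0.0, 0.0]])
label = classify_by_reconstruction(query, [class0, class1])
print(label)  # -> 0
```

Because the ridge solution is closed form, the classifier adds no trainable parameters beyond the feature extractor, which is the "no additional modules" point above.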

24 pages, 1651 KB  
Article
Attentive Neural Processes for Few-Shot Learning Anomaly-Based Vessel Localization Using Magnetic Sensor Data
by Luis Fernando Fernández-Salvador, Borja Vilallonga Tejela, Alejandro Almodóvar, Juan Parras and Santiago Zazo
J. Mar. Sci. Eng. 2025, 13(9), 1627; https://doi.org/10.3390/jmse13091627 - 26 Aug 2025
Viewed by 759
Abstract
Underwater vessel localization using passive magnetic anomaly sensing is a challenging problem due to the variability in vessel magnetic signatures and operational conditions. Data-based approaches may fail to generalize even to slightly different conditions. We therefore propose an Attentive Neural Process (ANP) approach, exploiting its few-shot generalization capabilities for robust localization of underwater vessels from magnetic anomaly measurements. Our ANP models the mapping from multi-sensor magnetic readings to position as a stochastic function: it cross-attends to a variable-size set of context points and fuses these with a global latent code that captures trajectory-level factors. The decoder outputs a Gaussian over coordinates, providing both point estimates and well-calibrated predictive variance. We validate our approach on a comprehensive dataset of magnetic disturbance fields covering 64 distinct vessel configurations (combinations of hull sizes, submersion depths (water-column height over a seabed array), and numbers of available sensors). Six magnetometers in a fixed circular arrangement record the magnetic field perturbations as a vessel traverses sinusoidal trajectories. We compare the ANP against baseline multilayer perceptron (MLP) models: (1) base MLPs trained separately on each vessel configuration, and (2) a domain-randomized search (DRS) MLP trained on the aggregate of all configurations to evaluate generalization across domains. The results demonstrate that the ANP achieves superior generalization to new vessel conditions, matching the accuracy of configuration-specific MLPs while providing well-calibrated uncertainty quantification. This uncertainty-aware prediction capability is crucial for real-world deployments, as it can inform adaptive sensing and decision-making. Across various in-distribution scenarios, the ANP halves the mean absolute error relative to the domain-randomized MLP (0.43 m vs. 0.84 m). The model even generalizes to out-of-distribution data, suggesting that our approach can facilitate transfer from offline training to real-world conditions. Full article
(This article belongs to the Section Ocean Engineering)
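Two ingredients of the ANP described above can be sketched in a few lines: cross-attention over a variable-size context set, and a Gaussian likelihood head whose negative log-likelihood training objective yields calibrated variance. This is a minimal numpy sketch, not the authors' implementation; shapes and function names are assumptions.

```python
import numpy as np

def cross_attend(query, keys, values):
    """Scaled dot-product cross-attention: each target input attends over
    a variable-size set of context points (the ANP's deterministic path)."""
    d = keys.shape[-1]
    scores = query @ keys.T / np.sqrt(d)                      # (q, c)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over context
    return weights @ values                                   # (q, dv)

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of a diagonal Gaussian prediction (up to a
    constant); minimizing this trains both the mean and the predictive
    variance, which is what makes the uncertainty estimates calibrated."""
    return float(np.mean(0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var))))
```

Because the attention pools a set, the same trained model accepts any number of context sensors, which is what allows the configurations with different sensor counts to share one model.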

20 pages, 1320 KB  
Article
A Method for Few-Shot Modulation Recognition Based on Reinforcement Metric Meta-Learning
by Fan Zhou, Xiao Han, Jinyang Ren, Wei Wang, Yang Wang, Peiying Zhang and Shaolin Liao
Computers 2025, 14(9), 346; https://doi.org/10.3390/computers14090346 - 22 Aug 2025
Viewed by 575
Abstract
In response to the problem that neural network models fail to fully learn signal features when signal samples are scarce, which degrades their ability to recognize signal modulation methods, a few-shot signal modulation recognition method based on reinforcement metric meta-learning (RMML) is proposed. Grounded in meta-learning techniques, the approach employs transfer learning to build a feature extraction network that effectively extracts data features under few-shot conditions. Building on this, the metric network's target loss function is optimized by jointly measuring the similarity among features of same-class samples and the differences between features of different classes, thereby improving the network's ability to distinguish the features of different modulation methods. The experimental results demonstrate that this method performs well on new signal classes that were not seen during training. Under the 5-way 5-shot condition, at a signal-to-noise ratio (SNR) of 0 dB the method achieves an average recognition accuracy of 91.8%, which is 2.8% higher than the best-performing baseline, and at an SNR of 18 dB the average recognition accuracy rises to 98.5%. Full article
(This article belongs to the Special Issue Wireless Sensor Networks in IoT)
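The metric-learning idea in this abstract — pulling same-class features together while pushing different-class features apart, then classifying by distance in the learned space — can be illustrated with a prototypical-network-style sketch. This is a generic stand-in, not the paper's RMML loss; the margin value and helper names are hypothetical.

```python
import numpy as np

def prototypes(support_feats, support_labels):
    """Mean embedding per class from the few-shot support set."""
    labels = np.asarray(support_labels)
    classes = sorted(set(support_labels))
    return classes, np.stack([support_feats[labels == c].mean(axis=0)
                              for c in classes])

def metric_loss(feats, labels, margin=1.0):
    """Toy metric objective: squared distance for same-class pairs (pull
    together), hinge on `margin` for different-class pairs (push apart).
    Hypothetical stand-in for the paper's optimized target loss."""
    loss, n = 0.0, 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            d = np.linalg.norm(feats[i] - feats[j])
            loss += d ** 2 if labels[i] == labels[j] else max(0.0, margin - d) ** 2
            n += 1
    return loss / n

def predict(classes, protos, query):
    """Nearest-prototype classification of a query embedding."""
    dists = np.linalg.norm(protos - query, axis=1)
    return classes[int(np.argmin(dists))]
```

In an N-way K-shot episode (5-way 5-shot in the paper's experiments), the prototypes are recomputed from each episode's support set, so unseen modulation classes need no retraining of the feature extractor.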
