Search Results (147)

Search Parameters:
Keywords = few-shot recognition

23 pages, 8644 KB  
Article
Understanding What the Brain Sees: Semantic Recognition from EEG Responses to Visual Stimuli Using Transformer
by Ahmed Fares
AI 2025, 6(11), 288; https://doi.org/10.3390/ai6110288 - 7 Nov 2025
Abstract
Understanding how the human brain processes and interprets multimedia content represents a frontier challenge in neuroscience and artificial intelligence. This study introduces a novel approach to decoding semantic information from electroencephalogram (EEG) signals recorded during visual stimulus perception. We present DCT-ViT, a spatial–temporal transformer architecture that pioneers automated semantic recognition from brain activity patterns, advancing beyond conventional brain state classification to interpret higher-level cognitive understanding. Our methodology rests on three fundamental innovations. First, we develop a topology-preserving 2D electrode mapping that, combined with temporal indexing, generates 3D spatial–temporal representations capturing both anatomical relationships and dynamic neural correlations. Second, we integrate discrete cosine transform (DCT) embeddings with standard patch and positional embeddings in the transformer architecture, enabling frequency-domain analysis that quantifies activation variability across spectral bands and enhances attention mechanisms. Third, we introduce the Semantics-EEG dataset comprising ten semantic categories extracted from visual stimuli, providing a benchmark for brain-perceived semantic recognition research. The proposed DCT-ViT model achieves 72.28% recognition accuracy on Semantics-EEG, substantially outperforming LSTM-based and attention-augmented recurrent baselines. Ablation studies demonstrate that DCT embeddings contribute meaningfully to model performance, validating their effectiveness in capturing frequency-specific neural signatures. Interpretability analyses reveal neurobiologically plausible attention patterns, with visual semantics activating occipital–parietal regions and abstract concepts engaging frontal–temporal networks, consistent with established cognitive neuroscience models. To address systematic misclassification between perceptually similar categories, we develop a hierarchical classification framework with boundary refinement mechanisms. This approach substantially reduces confusion between overlapping semantic categories, elevating overall accuracy to 76.15%. Robustness evaluations demonstrate superior noise resilience, effective cross-subject generalization, and few-shot transfer to novel categories. This work establishes the technical foundation for brain–computer interfaces capable of decoding semantic understanding, with implications for assistive technologies, cognitive assessment, and human–AI interaction. Both the Semantics-EEG dataset and the DCT-ViT implementation are publicly released to facilitate reproducibility and advance research in neural semantic decoding.
(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)
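
The second innovation, fusing frequency-domain DCT embeddings with the standard patch and positional embeddings, can be illustrated with a minimal PyTorch sketch. The layer shapes, the scipy-based DCT along the patch feature axis, and the additive fusion are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from scipy.fft import dct

class DCTPatchEmbedding(nn.Module):
    """Patch + positional embeddings augmented with DCT (frequency) embeddings."""

    def __init__(self, patch_dim: int, embed_dim: int, num_patches: int):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, embed_dim)  # standard patch embedding
        self.dct_proj = nn.Linear(patch_dim, embed_dim)    # embedding of DCT coefficients
        self.pos = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_dim) flattened spatial-temporal patches.
        # DCT-II coefficients are computed off-graph (no gradient through the DCT).
        coeffs = torch.from_numpy(
            dct(patches.detach().cpu().numpy(), axis=-1, norm="ortho")
        ).to(patches.device, patches.dtype)
        return self.patch_proj(patches) + self.dct_proj(coeffs) + self.pos
```

The transformer encoder then attends over these fused tokens, so spectral-band variability enters the attention computation alongside spatial-temporal structure.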

13 pages, 624 KB  
Article
Contrastive Learning with Gaussian Embeddings and Self-Attention for Few-Shot Named Entity Recognition
by Yihao Zhang, Wei Chen and Lei Ma
Appl. Sci. 2025, 15(21), 11819; https://doi.org/10.3390/app152111819 - 6 Nov 2025
Abstract
Named entity recognition (NER) in few-shot scenarios plays a critical role in entity annotation for low-resource domains. However, existing methods are often limited to learning semantic features and intermediate representations specific to the source domain, which restricts their generalization to unseen target domains and leads to pronounced performance degradation. To address this issue, we propose a novel few-shot NER model based on contrastive learning. Specifically, the model enhances token representations through Gaussian distribution embedding and a self-attention mechanism, while adaptively optimizing the weighting parameters of the contrastive loss. This design effectively mitigates overfitting and enhances the model's generalization ability. Experiments on multiple datasets (including CoNLL2003, GUM, and Few-NERD) demonstrate that our approach achieves performance gains of 2.05% to 15.89% over state-of-the-art methods. These results confirm the effectiveness of our model on few-shot NER tasks and suggest its potential for broader application in low-resource information extraction scenarios.
(This article belongs to the Section Computing and Artificial Intelligence)
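
The Gaussian-embedding idea can be sketched compactly: each token is mapped to a diagonal Gaussian rather than a point vector, and a divergence between two tokens' distributions can serve as the distance inside the contrastive objective. The layer sizes and the use of KL divergence are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Projects token encodings to a diagonal Gaussian (mean, log-variance)."""

    def __init__(self, hidden: int, dim: int):
        super().__init__()
        self.mu = nn.Linear(hidden, dim)
        self.logvar = nn.Linear(hidden, dim)

    def forward(self, h: torch.Tensor):
        return self.mu(h), self.logvar(h)

def kl_gaussians(mu1, logvar1, mu2, logvar2):
    # KL(N1 || N2) for diagonal Gaussians, summed over embedding dimensions.
    v1, v2 = logvar1.exp(), logvar2.exp()
    return 0.5 * (v1 / v2 + (mu2 - mu1).pow(2) / v2 - 1.0 + (logvar2 - logvar1)).sum(-1)
```

In the contrastive objective, token pairs sharing an entity type would be pulled toward low KL while mismatched pairs are pushed apart.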

21 pages, 17739 KB  
Article
Re_MGFE: A Multi-Scale Global Feature Embedding Spectrum Sensing Method Based on Relation Network
by Jiayi Wang, Fan Zhou, Jinyang Ren, Lizhuang Tan, Jian Wang, Peiying Zhang and Shaolin Liao
Computers 2025, 14(11), 480; https://doi.org/10.3390/computers14110480 - 4 Nov 2025
Abstract
The growing number of Internet of Things devices is making the shortage of spectrum resources increasingly acute. Spectrum sensing technology can effectively address this problem by monitoring the spectrum in real time. In practical applications, however, large numbers of labeled samples are difficult to obtain, which leaves neural network models under-trained and degrades performance. Moreover, existing few-shot methods focus on capturing spatial features while ignoring how features are represented at different scales, which reduces feature diversity. To address these issues, this paper proposes a few-shot spectrum sensing method based on multi-scale global features. To enhance feature diversity, the method employs a multi-scale feature extractor to extract features at multiple scales, which improves the model's ability to distinguish signals and avoids overfitting. In addition, to make full use of the frequency features at different scales, a learnable weight feature reinforcer is constructed to enhance the frequency features. Simulation results show that, for SNRs from 0 to 10 dB, the network's recognition accuracy exceeds 81% across all task modes, outperforming existing methods and realizing accurate spectrum sensing under few-shot conditions.
(This article belongs to the Section Internet of Things (IoT) and Industrial IoT)
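
The pairing of a multi-scale extractor with a learnable weight reinforcer might look like the following PyTorch sketch; the Conv1d branches, kernel sizes, and softmax-normalized scale weights are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    """Parallel branches at different receptive fields, reweighted by learnable scale weights."""

    def __init__(self, in_ch: int, out_ch: int, scales=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in scales
        )
        self.scale_weights = nn.Parameter(torch.zeros(len(scales)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_ch, length) signal sequence.
        w = torch.softmax(self.scale_weights, dim=0)  # learnable per-scale weights
        feats = [wi * branch(x) for wi, branch in zip(w, self.branches)]
        return torch.stack(feats, dim=0).sum(dim=0)
```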

27 pages, 4104 KB  
Article
CropCLR-Wheat: A Label-Efficient Contrastive Learning Architecture for Lightweight Wheat Pest Detection
by Yan Wang, Chengze Li, Chenlu Jiang, Mingyu Liu, Shengzhe Xu, Binghua Yang and Min Dong
Insects 2025, 16(11), 1096; https://doi.org/10.3390/insects16111096 - 25 Oct 2025
Abstract
To address prevalent challenges in field-based wheat pest recognition, namely viewpoint perturbations, sample scarcity, and heterogeneous data distributions, a pest identification framework named CropCLR-Wheat is proposed, integrating self-supervised contrastive learning with an attention-enhancement mechanism. By incorporating a viewpoint-invariant feature encoder and a diffusion-based feature filtering module, the model significantly enhances pest damage localization and feature consistency, enabling high-accuracy recognition under limited-sample conditions. In 5-shot classification tasks, CropCLR-Wheat achieves a precision of 89.4%, a recall of 87.1%, and an accuracy of 88.2%; these metrics improve to 92.3%, 90.5%, and 91.2%, respectively, under the 10-shot setting. In semantic segmentation of wheat pest damage regions, the model attains a mean intersection over union (mIoU) of 82.7%, with precision and recall reaching 85.2% and 82.4%, respectively, markedly outperforming advanced models such as SegFormer and Mask R-CNN. In robustness evaluation under viewpoint disturbances, a prediction consistency rate of 88.7%, a confidence variation of only 7.8%, and a prediction consistency score (PCS) of 0.914 are recorded, indicating strong stability and adaptability. Deployment results further demonstrate the framework's practical viability: on the Jetson Nano device, an inference latency of 84 ms, a frame rate of 11.9 FPS, and an accuracy of 88.2% are achieved, confirming the efficiency of the proposed approach in edge computing environments. By balancing generalization performance with deployability, the proposed method provides robust support for intelligent agricultural terminal systems and holds substantial potential for wide-scale application.
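
Self-supervised contrastive pre-training of the kind described here is commonly implemented with an NT-Xent loss over paired augmented views (e.g., two viewpoints of the same plant). The sketch below is that generic loss, with the temperature as an assumed hyperparameter, not CropCLR-Wheat's exact objective.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent contrastive loss; z1[i] and z2[i] embed two views of image i."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N unit-norm embeddings
    sim = z @ z.t() / tau                               # scaled cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    # The positive for row i is its counterpart view in the other half of the batch.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```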

19 pages, 1781 KB  
Article
HiSeq-TCN: High-Dimensional Feature Sequence Modeling and Few-Shot Reinforcement Learning for Intrusion Detection
by Yadong Pei, Yanfei Tan, Wei Gao, Fangwei Li and Mingyue Wang
Electronics 2025, 14(21), 4168; https://doi.org/10.3390/electronics14214168 - 25 Oct 2025
Abstract
Intrusion detection is essential to cybersecurity. However, the curse of dimensionality and class imbalance limit detection accuracy and impede the identification of rare attacks. To address these challenges, this paper proposes the high-dimensional feature sequence temporal convolutional network (HiSeq-TCN) for intrusion detection. The proposed HiSeq-TCN transforms high-dimensional feature vectors into pseudo-temporal sequences, enabling the network to capture contextual dependencies across feature dimensions. This enhances feature representation and detection robustness. In addition, a few-shot reinforcement strategy adaptively assigns larger loss weights to minority classes, mitigating class imbalance and improving the recognition of rare attacks. Experiments on the NSL-KDD dataset show that HiSeq-TCN achieves an overall accuracy of 99.44%, outperforming support vector machines, deep neural networks, and long short-term memory models. More importantly, it significantly improves the detection of rare attack types such as remote-to-local and user-to-root attacks. These results highlight the potential of HiSeq-TCN for robust and reliable intrusion detection in practical cybersecurity environments.
(This article belongs to the Special Issue Applications of Deep Learning in Cyber Threat Detection)
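
Two of the ingredients are easy to sketch: reshaping a fixed-length feature vector into a pseudo-temporal sequence for a TCN, and giving rare classes larger loss weights. The 122-dimensional input and the class counts below are illustrative (the counts approximate NSL-KDD's training distribution), and plain inverse-frequency weighting stands in for the paper's adaptive few-shot reinforcement strategy.

```python
import torch
import torch.nn as nn

# Treat each encoded feature vector as a 1-channel pseudo-temporal sequence so a
# temporal convolutional network can model dependencies across feature dimensions.
x = torch.randn(32, 122)      # batch of one-hot-encoded NSL-KDD feature vectors
seq = x.unsqueeze(1)          # (batch, channels=1, "time"=122), ready for a TCN

# Upweight minority classes (normal, DoS, probe, R2L, U2R) so rare attacks
# contribute more to the loss; counts are illustrative.
class_counts = torch.tensor([67343.0, 45927.0, 11656.0, 995.0, 52.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```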

23 pages, 16607 KB  
Article
Few-Shot Class-Incremental SAR Target Recognition with a Forward-Compatible Prototype Classifier
by Dongdong Guan, Rui Feng, Yuzhen Xie, Xiaolong Zheng, Bangjie Li and Deliang Xiang
Remote Sens. 2025, 17(21), 3518; https://doi.org/10.3390/rs17213518 - 23 Oct 2025
Abstract
In practical Synthetic Aperture Radar (SAR) applications, new-class objects can appear at any time as large-scale, high-quantity SAR imagery accumulates rapidly, and they are usually supported by only a few instances in most cooperative scenarios. Hence, it is important to equip advanced deep-learning (DL)-based SAR Automatic Target Recognition (SAR ATR) systems with the ability to continuously learn new concepts from few-shot samples without forgetting old ones. In this paper, we tackle the Few-Shot Class-Incremental Learning (FSCIL) problem in the SAR ATR field and propose a Forward-Compatible Prototype Classifier (FCPC) that emphasizes the model's forward compatibility with incoming targets before and after deployment. Specifically, the classifier's sensitivity to diversified cues of emerging targets is improved in advance by a Virtual-class Semantic Synthesizer (VSS), which considers the class-agnostic scattering parts of targets in SAR imagery and the semantic patterns of the DL paradigm. After deployment in dynamic worlds, since novel target patterns from few-shot samples are highly biased and unstable, the model's representability for general patterns and its adaptability to class-discriminative ones are balanced by a Decoupled Margin Adaptation (DMA) strategy, in which only the model's high-level semantic parameters are tuned, increasing the similarity of few-shot boundary samples to their class prototypes and their dissimilarity to interclass ones. For inference, a Nearest-Class-Mean (NCM) classifier predicts by comparing the semantics of unknown targets with the prototypes of all classes under the cosine criterion. In experiments, the contributions of the proposed modules are verified by ablation studies, and our method achieves considerable performance on three SAR ATR FSCIL datasets, i.e., SAR-AIRcraft-FSCIL, MSTAR-FSCIL, and FUSAR-FSCIL, compared with numerous benchmarks, demonstrating its superiority and effectiveness for FSCIL in SAR ATR.
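
The NCM inference step is compact enough to sketch directly; shapes and names below are illustrative.

```python
import torch
import torch.nn.functional as F

def ncm_predict(query_emb: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each query to the class whose prototype is most cosine-similar."""
    q = F.normalize(query_emb, dim=1)   # (num_queries, dim)
    p = F.normalize(prototypes, dim=1)  # (num_classes, dim), per-class mean embeddings
    return (q @ p.t()).argmax(dim=1)    # cosine criterion = dot product of unit vectors
```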

22 pages, 573 KB  
Article
Federated Self-Supervised Few-Shot Face Recognition
by Nursultan Makhanov, Beibut Amirgaliyev, Talgat Islamgozhayev and Didar Yedilkhan
J. Imaging 2025, 11(10), 370; https://doi.org/10.3390/jimaging11100370 - 18 Oct 2025
Abstract
This paper presents a systematic framework that combines the federated learning, self-supervised learning, and few-shot learning paradigms for privacy-preserving face recognition. We use the large-scale CASIA-WebFace dataset for self-supervised SimCLR pre-training in a federated setting, followed by federated few-shot fine-tuning on the LFW dataset using prototypical networks. Through comprehensive evaluation across six state-of-the-art architectures (ResNet, DenseNet, MobileViT, ViT-Small, CvT, and CoAtNet), we demonstrate that while our federated approach successfully preserves data privacy, it comes with significant performance trade-offs. Our results show 12–30% accuracy degradation compared to centralized methods, representing the substantial cost of privacy preservation. We find that traditional CNNs are more robust to federated constraints than transformer-based architectures, and that five-shot configurations provide an optimal balance between data efficiency and performance. This work provides important empirical insights and establishes benchmarks for federated few-shot face recognition, quantifying the privacy–utility trade-offs that practitioners must consider when deploying such systems in real-world applications.
(This article belongs to the Section Computer Vision and Pattern Recognition)
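
The federated half of the pipeline aggregates client updates by weighted parameter averaging. Below is a minimal FedAvg-style sketch; plain state_dict averaging weighted by client dataset size is an assumption, not the paper's exact protocol.

```python
import torch

def fedavg(client_states: list, client_sizes: list) -> dict:
    """Weighted average of client model state_dicts, as in FedAvg."""
    total = float(sum(client_sizes))
    avg = {}
    for key in client_states[0]:
        # Cast to float so integer buffers (e.g., batch-norm counters) average too.
        avg[key] = sum(s[key].float() * (n / total)
                       for s, n in zip(client_states, client_sizes))
    return avg
```

Each round, the server broadcasts the averaged weights back to the clients, so raw face images never leave the client devices.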

21 pages, 3081 KB  
Article
Lightweight CNN–Transformer Hybrid Network with Contrastive Learning for Few-Shot Noxious Weed Recognition
by Ruiheng Li, Boda Yu, Boming Zhang, Hongtao Ma, Yihan Qin, Xinyang Lv and Shuo Yan
Horticulturae 2025, 11(10), 1236; https://doi.org/10.3390/horticulturae11101236 - 13 Oct 2025
Abstract
In resource-constrained edge agricultural environments, accurate recognition of toxic weeds poses the dual challenges of lightweight model design and few-shot generalization. To address these challenges, a multi-strategy recognition framework is proposed that integrates a lightweight backbone network, a pseudo-labeling guidance mechanism, and a contrastive boundary enhancement module. The approach is designed to improve deployment efficiency on low-power devices while ensuring high accuracy in identifying rare toxic weed categories. The proposed model achieves a real-time inference speed of 18.9 FPS on the Jetson Nano platform, with a compact model size of 18.6 MB and power consumption below 5.1 W, demonstrating its efficiency for edge deployment. In standard classification tasks, the model attains 89.64%, 87.91%, 88.76%, and 88.43% in precision, recall, F1-score, and accuracy, respectively, outperforming existing mainstream lightweight models such as ResNet18, MobileNetV2, and MobileViT across all evaluation metrics. In few-shot classification tasks targeting rare toxic weed species, the complete model achieves an accuracy of 80.32%, an average improvement of over 13 percentage points compared to ablation variants that exclude the pseudo-labeling and self-supervised modules or adopt a CNN-only architecture. The experimental results indicate that the proposed model not only delivers strong overall classification performance but also exhibits superior deployment adaptability and robustness in low-data regimes, offering an effective solution for the precise identification and ecological control of toxic weeds within intelligent agricultural perception systems.
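
A pseudo-labeling guidance mechanism is typically implemented by keeping only high-confidence predictions on unlabeled images for retraining; the sketch below shows that selection step, with the 0.95 threshold as an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def select_pseudo_labels(logits: torch.Tensor, threshold: float = 0.95):
    """Return indices and hard labels of unlabeled samples predicted confidently."""
    probs = F.softmax(logits, dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold          # discard uncertain predictions
    return keep.nonzero(as_tuple=True)[0], labels[keep]
```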

23 pages, 2173 KB  
Article
Prototype-Enhanced Few-Shot Relation Extraction Method Based on Cluster Loss Optimization
by Shenyi Qian, Bowen Fu, Chao Liu, Songhe Jin, Tong Sun, Zhen Chen, Daiyi Li, Yifan Sun, Yibing Chen and Yuheng Li
Symmetry 2025, 17(10), 1673; https://doi.org/10.3390/sym17101673 - 7 Oct 2025
Abstract
The purpose of few-shot relation extraction (RE) is to recognize the relationship between specific entity pairs in text when only a limited number of labeled samples are available. Few-shot RE methods based on prototype networks construct relation prototypes from the support set to assign labels to query samples, inherently leveraging the symmetry between support and query processing. Although these methods have achieved remarkable results, they still misjudge noisy samples and outliers and struggle to distinguish semantically similar relations. To address these challenges, we propose a novel semantic-enhanced prototype network that integrates the semantic information of relations more effectively to promote more expressive representations of instances and relation prototypes, thereby improving few-shot RE performance. Firstly, we design a prompt encoder to uniformly process different prompt templates for instance and relation information, and then utilize the powerful semantic understanding and generation capabilities of large language models (LLMs) to obtain precise semantic representations of instances, their prototypes, and conceptual prototypes. Secondly, graph attention learning techniques are introduced to effectively extract relation-specific features between conceptual prototypes and isomorphic instances while maintaining structural symmetry. Meanwhile, a prototype-level contrastive learning strategy with bidirectional feature symmetry is proposed to predict query instances by integrating the interpretable features of conceptual prototypes and the intra-class shared features captured by instance prototypes. In addition, a clustering loss function is designed to guide the model to learn a discriminative metric space with improved relational symmetry, effectively improving the accuracy of the model's relation recognition. Finally, experimental results on the FewRel1.0 and FewRel2.0 datasets show that the proposed approach outperforms existing advanced models on the few-shot RE task.
(This article belongs to the Section Computer)
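
A clustering loss of this general kind combines an intra-class pull toward prototypes with an inter-prototype push; the margin formulation below is one plausible form, not the paper's exact loss.

```python
import torch

def cluster_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                 prototypes: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Pull instances toward their class prototype; keep prototypes `margin` apart."""
    pull = (embeddings - prototypes[labels]).pow(2).sum(dim=1).mean()
    pdist = torch.cdist(prototypes, prototypes)          # pairwise prototype distances
    off_diag = ~torch.eye(len(prototypes), dtype=torch.bool, device=pdist.device)
    push = torch.clamp(margin - pdist[off_diag], min=0).mean()
    return pull + push
```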

20 pages, 1972 KB  
Article
Few-Shot Identification of Individuals in Sports: The Case of Darts
by Val Vec, Anton Kos, Rongfang Bie, Libin Jiao, Haodi Wang, Zheng Zhang, Sašo Tomažič and Anton Umek
Information 2025, 16(10), 865; https://doi.org/10.3390/info16100865 - 5 Oct 2025
Abstract
This paper analyzes methods for person classification based on signals from wearable IMU sensors during sports. While this problem has been investigated in prior work, existing approaches have not addressed it in few-shot or minimal-data scenarios. A few-shot scenario is especially relevant because the main use case for person identification in sports is integration into personalised biofeedback systems, which should provide personalised feedback that helps athletes learn faster; when introducing a new user, it is impractical to expect them to first collect many recordings. We demonstrate that the problem can be solved with over 90% accuracy in both open-set and closed-set scenarios using established methods. The challenge arises when applying few-shot methods, which do not require retraining the model to recognise new people: most perform poorly because their feature extractors learn dataset-specific representations, limiting generalizability. To overcome this, we propose a combination of an unsupervised feature extractor and a prototypical network. This approach achieves 91.8% accuracy in the five-shot closed-set setting and 81.5% accuracy in the open-set setting, with a 99.6% rejection rate for unknown athletes.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)
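
Open-set operation with a prototypical network typically adds a distance threshold on top of nearest-prototype classification; the sketch below shows that rejection rule (Euclidean distance and the -1 "unknown" label are assumptions).

```python
import torch

def classify_with_rejection(query: torch.Tensor, prototypes: torch.Tensor,
                            threshold: float) -> torch.Tensor:
    """Nearest-prototype classification that rejects distant queries as unknown."""
    dists = torch.cdist(query, prototypes)      # (num_queries, num_people)
    best_dist, best_cls = dists.min(dim=1)
    best_cls[best_dist > threshold] = -1        # -1 marks an unknown athlete
    return best_cls
```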

16 pages, 2489 KB  
Article
Sentence-Level Silent Speech Recognition Using a Wearable EMG/EEG Sensor System with AI-Driven Sensor Fusion and Language Model
by Nicholas Satterlee, Xiaowei Zuo, Kee Moon, Sung Q. Lee, Matthew Peterson and John S. Kang
Sensors 2025, 25(19), 6168; https://doi.org/10.3390/s25196168 - 5 Oct 2025
Abstract
Silent speech recognition (SSR) enables communication without vocalization by interpreting biosignals such as electromyography (EMG) and electroencephalography (EEG). Most existing SSR systems rely on high-density, non-wearable sensors and focus primarily on isolated word recognition, limiting their practical usability. This study presents a wearable SSR system capable of accurate sentence-level recognition using single-channel EMG and EEG sensors with real-time wireless transmission. A moving window-based few-shot learning model, implemented with a Siamese neural network, segments and classifies words from continuous biosignals without requiring pauses or manual segmentation between word signals. A novel sensor fusion model integrates the EMG and EEG modalities, enhancing classification accuracy. To further improve sentence-level recognition, a statistical language model (LM) is applied as post-processing to correct syntactic and lexical errors. The system was evaluated on a dataset of four military command sentences containing ten unique words, achieving 95.25% sentence-level recognition accuracy. These results demonstrate the feasibility of sentence-level SSR with wearable sensors through a window-based few-shot learning model, sensor fusion, and an LM applied to limited simultaneous EMG and EEG signals.
(This article belongs to the Special Issue Advanced Sensing Techniques in Biomedical Signal Processing)
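
The moving-window idea, embedding successive segments of the continuous stream with a shared-weight (Siamese) encoder and scoring them against per-word reference embeddings, can be sketched as follows. The window and hop sizes, the cosine scoring, and the assumption that `encoder` maps a segment to a single embedding vector are all illustrative.

```python
import torch
import torch.nn.functional as F

def sliding_word_scores(signal: torch.Tensor, encoder, templates: torch.Tensor,
                        win: int = 256, hop: int = 64) -> torch.Tensor:
    """Score each window of a continuous biosignal against per-word templates.

    signal: (channels, T) fused EMG/EEG stream; templates: (num_words, dim).
    """
    scores = []
    for start in range(0, signal.size(-1) - win + 1, hop):
        emb = encoder(signal[..., start:start + win])          # (dim,) embedding
        scores.append(F.cosine_similarity(emb, templates, dim=-1))
    return torch.stack(scores)  # (num_windows, num_words), passed to the LM afterwards
```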

20 pages, 1325 KB  
Article
Intelligent Fault Diagnosis for Cross-Domain Few-Shot Learning of Rotating Equipment Based on Mixup Data Augmentation
by Kun Yu, Yan Li, Qiran Zhan, Yongchao Zhang and Bin Xing
Machines 2025, 13(9), 807; https://doi.org/10.3390/machines13090807 - 3 Sep 2025
Abstract
Existing fault diagnosis methods assume that training and test data are identically distributed, failing to adapt to source–target domain differences in industrial scenarios and limiting generalization. They also struggle to exploit inter-domain correlations when labeled target samples are scarce, leading to poor convergence and generalization. To address this, our paper proposes a cross-domain few-shot intelligent fault diagnosis method based on Mixup data augmentation. Firstly, a Mixup data augmentation method linearly combines source-domain and target-domain data in a specific proportion to generate mixed-domain data, enabling the model to learn correlations and features across domains and improving its generalization in cross-domain few-shot learning tasks. Secondly, a feature decoupling module based on the self-attention mechanism is proposed to extract domain-independent and domain-related features, allowing the model to further reduce the domain distribution gap and effectively generalize source-domain knowledge to the target domain. Then, the model parameters are optimized through a multi-task learning mechanism consisting of sample classification and domain classification tasks. Finally, applications to classification tasks on multiple equipment fault datasets show that the proposed method significantly improves the fault recognition ability of the diagnosis model under large target-domain distribution differences and scarce labeled samples.
(This article belongs to the Section Machines Testing and Maintenance)
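
Cross-domain Mixup itself is a one-liner on paired batches; the sketch below mixes source- and target-domain samples with a Beta-distributed coefficient (alpha = 0.2 is an assumed hyperparameter, and labels are treated as soft/one-hot vectors).

```python
import torch

def domain_mixup(x_src, y_src, x_tgt, y_tgt, alpha: float = 0.2):
    """Linearly combine source- and target-domain batches into mixed-domain data."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x_src + (1.0 - lam) * x_tgt
    y_mix = lam * y_src + (1.0 - lam) * y_tgt   # labels as one-hot/soft vectors
    return x_mix, y_mix
```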

22 pages, 47099 KB  
Article
Deciphering Emotions in Children’s Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
by Bushra Asseri, Estabrag Abaker, Maha Al Mogren, Tayef Alhefdhi and Areej Al-Wabil
AI 2025, 6(9), 211; https://doi.org/10.3390/ai6090211 - 2 Sep 2025
Abstract
Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies, yet they remain underexplored for Arabic language contexts, where culturally appropriate learning tools are critically needed. This study evaluated the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children's storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik's emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting, compared to Gemini's best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models' cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners.
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)
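
The three prompting strategies can be made concrete with hypothetical templates; the wording below is illustrative, not the authors' exact prompts.

```python
EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]  # Plutchik's basic emotions

def build_prompt(strategy: str, examples=None) -> str:
    """Assemble a zero-shot, few-shot, or chain-of-thought emotion prompt."""
    base = ("Identify the primary emotion expressed in this storybook illustration. "
            f"Choose one of: {', '.join(EMOTIONS)}.")
    if strategy == "zero-shot":
        return base
    if strategy == "few-shot":
        shots = "\n".join(f"Illustration: {desc}\nEmotion: {emo}"
                          for desc, emo in (examples or []))
        return f"{base}\n{shots}\nIllustration: <image>\nEmotion:"
    if strategy == "chain-of-thought":
        return base + (" First describe the characters' facial expressions, body "
                       "language, and scene context step by step, then state the emotion.")
    raise ValueError(f"unknown strategy: {strategy}")
```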

22 pages, 1057 KB  
Article
Relation-Guided Embedding Transductive Propagation Network with Residual Correction for Few-Shot SAR ATR
by Xuelian Yu, Hailong Yu, Yan Peng, Lei Miao and Haohao Ren
Remote Sens. 2025, 17(17), 2980; https://doi.org/10.3390/rs17172980 - 27 Aug 2025
Abstract
Deep learning-based methods have shown great promise for synthetic aperture radar (SAR) automatic target recognition (ATR) in recent years, demonstrating superior performance to traditional approaches across various recognition tasks. However, they often face significant challenges due to the limited availability of labeled samples, a common issue in SAR image analysis owing to the high cost and difficulty of data annotation. To address this, a variety of few-shot learning approaches have been proposed and have demonstrated promising results under data-scarce conditions. Nonetheless, a notable limitation of many existing few-shot methods is that their performance tends to plateau when more labeled samples become available: optimized for scenarios with extremely limited data, they often fail to leverage larger datasets, yielding suboptimal recognition performance compared to conventional deep learning techniques when sufficient training data is available. There is therefore a pressing need for approaches that not only excel in few-shot scenarios but also maintain robust performance as the number of labeled samples increases. To this end, we propose a novel method, termed relation-guided embedding transductive propagation network with residual correction (RGE-TPNRC), specifically designed for few-shot SAR ATR tasks. By leveraging relation node modeling, relation-guided embedding propagation, and residual correction, RGE-TPNRC fully utilizes limited labeled samples by deeply exploring inter-sample relations, enabling better scalability as the support set grows and effectively addressing the plateauing performance of existing few-shot learning methods. Firstly, input samples are transformed into support-query relation nodes, explicitly capturing the dependencies between support and query samples. Secondly, the known relations among support samples guide the propagation of embeddings within the network, enabling manifold smoothing and allowing the model to generalize effectively to unseen target classes. Finally, a residual correction propagation classifier refines predictions by correcting potential errors and smoothing decision boundaries, ensuring robust and accurate classification. Experimental results on the moving and stationary target acquisition and recognition (MSTAR) and OpenSARShip datasets demonstrate that our method achieves state-of-the-art performance in few-shot SAR ATR scenarios.
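
Transductive propagation of this general kind spreads support labels to query samples over a similarity graph. The sketch below is classic label propagation (Gaussian kernel, symmetric normalization, closed-form solve) as a reference point; RGE-TPNRC's relation-guided, residual-corrected variant builds on this idea.

```python
import torch

def propagate_labels(embeddings: torch.Tensor, y_support: torch.Tensor,
                     alpha: float = 0.99, sigma: float = 1.0) -> torch.Tensor:
    """Label propagation: embeddings stack support then query rows; y_support is one-hot."""
    d2 = torch.cdist(embeddings, embeddings).pow(2)
    w = torch.exp(-d2 / (2 * sigma ** 2))        # Gaussian-kernel affinities
    w.fill_diagonal_(0)
    deg = w.sum(dim=1).clamp_min(1e-8).rsqrt()   # D^{-1/2}
    s = deg.unsqueeze(1) * w * deg.unsqueeze(0)  # symmetric normalization
    y = torch.zeros(len(embeddings), y_support.size(1))
    y[: y_support.size(0)] = y_support
    f = torch.linalg.solve(torch.eye(len(embeddings)) - alpha * s, y)
    return f.argmax(dim=1)                       # predicted class per sample
```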

20 pages, 1320 KB  
Article
A Method for Few-Shot Modulation Recognition Based on Reinforcement Metric Meta-Learning
by Fan Zhou, Xiao Han, Jinyang Ren, Wei Wang, Yang Wang, Peiying Zhang and Shaolin Liao
Computers 2025, 14(9), 346; https://doi.org/10.3390/computers14090346 - 22 Aug 2025
Abstract
When signal samples are insufficient, neural network models cannot fully learn signal sample features, which degrades their ability to recognize signal modulation methods. To address this problem, a few-shot signal modulation recognition method based on reinforcement metric meta-learning (RMML) is proposed. Grounded in meta-learning techniques, this approach employs transfer learning to build a feature extraction network that effectively extracts data features under few-shot conditions. Building on this, the metric network's target loss function is optimized by jointly measuring the similarity of features within a class and the differences between features of different classes, thereby improving the network's ability to distinguish between the features of different modulation methods. The experimental results demonstrate that this method performs well on new signal classes that were not seen during training. Under the 5-way 5-shot condition, at a signal-to-noise ratio (SNR) of 0 dB, the method achieves an average recognition accuracy of 91.8%, which is 2.8% higher than that of the best-performing baseline method, while at an SNR of 18 dB the average recognition accuracy rises to 98.5%.
(This article belongs to the Special Issue Wireless Sensor Networks in IoT)
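
Metric meta-learning of this kind trains on episodes; the sketch below builds one 5-way 5-shot episode from class-indexed features (the 15-query split is an assumed convention).

```python
import torch

def sample_episode(features_by_class: list, n_way: int = 5,
                   k_shot: int = 5, q_query: int = 15):
    """Sample an n-way k-shot episode: support/query sets from random classes."""
    classes = torch.randperm(len(features_by_class))[:n_way]
    support, query, q_labels = [], [], []
    for episode_label, c in enumerate(classes):
        pool = features_by_class[int(c)]                     # (num_samples, dim)
        idx = torch.randperm(len(pool))[: k_shot + q_query]
        support.append(pool[idx[:k_shot]])
        query.append(pool[idx[k_shot:]])
        q_labels += [episode_label] * q_query
    return torch.stack(support), torch.cat(query), torch.tensor(q_labels)
```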