Search Results (1,352)

Search Parameters:
Keywords = few shot learning

29 pages, 44276 KB  
Article
MSFFDet: A Meta-Learning-Based Support-Guided Feature Fusion Detector for Few-Shot Remote Sensing Detection
by Haoxiang Qi, Wenzhe Zhao, Ting Zhang and Guangyao Zhou
Appl. Sci. 2026, 16(2), 917; https://doi.org/10.3390/app16020917 - 15 Jan 2026
Abstract
Few-shot object detection in remote sensing imagery faces significant challenges, including limited labeled samples, complex scene backgrounds, and subtle inter-class differences. To tackle these issues, we design a novel detection framework that effectively transfers supervision from a few annotated support examples to the query domain. We introduce a feature enhancement mechanism that injects fine-grained support cues into the query representation, helping the model focus on relevant regions and suppress background noise. This allows the model to generate more accurate proposals and perform robust classification, especially for visually confusing or small objects. Additionally, our method enhances feature interaction between support and query images through a nonlinear combination strategy, which captures both semantic similarity and discriminative differences. The proposed framework is fully end-to-end and jointly optimizes the feature fusion and detection processes. Experiments on three challenging benchmarks, NWPU VHR-10, iSAID and DIOR, demonstrate that our method consistently achieves state-of-the-art results under different few-shot settings and category splits. Compared with other advanced methods, it yields superior performance, highlighting its strong generalization ability in low-data remote sensing scenarios.
(This article belongs to the Special Issue AI in Object Detection)
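The abstract above does not spell out its nonlinear support-query combination; a common pattern in few-shot detection fuses the element-wise product (semantic similarity) and difference (discriminative contrast) with the raw query features. A minimal NumPy sketch under that assumption, with illustrative shapes:

```python
import numpy as np

def fuse_support_query(query_feat, support_feat):
    """Combine a query feature map with a class support vector.

    Concatenates the element-wise product (a similarity cue) and the
    element-wise difference (a contrast cue) with the raw query feature.
    Shapes: query_feat (C, H, W), support_feat (C,). Illustrative only.
    """
    s = support_feat[:, None, None]          # broadcast support over H, W
    similarity = query_feat * s              # channel-wise correlation cue
    contrast = query_feat - s                # channel-wise difference cue
    return np.concatenate([query_feat, similarity, contrast], axis=0)

q = np.random.rand(256, 32, 32)   # query backbone features (assumed size)
s = np.random.rand(256)           # pooled support-class embedding
fused = fuse_support_query(q, s)
print(fused.shape)  # (768, 32, 32)
```

A detection head consuming `fused` then sees both raw query evidence and its agreement/disagreement with the support class.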

28 pages, 30101 KB  
Article
Machine Learning-Driven Soil Fungi Identification Using Automated Imaging Techniques
by Karol Struniawski, Ryszard Kozera, Aleksandra Konopka, Lidia Sas-Paszt and Agnieszka Marasek-Ciolakowska
Appl. Sci. 2026, 16(2), 855; https://doi.org/10.3390/app16020855 - 14 Jan 2026
Abstract
Soilborne fungi (Fusarium, Trichoderma, Verticillium, Purpureocillium) critically impact agricultural productivity, disease dynamics, and soil health, requiring rapid identification for precision agriculture. Current diagnostics require labor-intensive microscopy or expensive molecular assays (up to 10 days), while existing ML studies suffer from small datasets (<500 images), expert selection bias, and lack of public availability. A fully automated identification system integrating robotic microscopy (Keyence VHX-700) with deep learning was developed. The Soil Fungi Microscopic Images Dataset (SFMID) comprises 20,151 images (11,511 no-water, 8640 water-based)—the largest publicly available soil fungi dataset. Four CNN architectures (InceptionResNetV2, ResNet152V2, DenseNet121, DenseNet201) were evaluated with transfer learning and three-shot majority voting. Grad-CAM analysis validated biological relevance. ResNet152V2 conv2 achieved optimal SFMID-NW performance (precision: 0.6711; AUC: 0.8031), with real-time inference (20 ms, 48–49 images/second). Statistical validation (McNemar’s test: χ² = 27.34, p < 0.001) confirmed that three-shot classification significantly outperforms single-image prediction. Confusion analysis identified Fusarium–Trichoderma (no-water) and Fusarium–Verticillium (water-based) challenges, indicating morphological ambiguities. The publicly available SFMID provides a scalable foundation for AI-enhanced agricultural diagnostics.
(This article belongs to the Special Issue Latest Research on Computer Vision and Image Processing)
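The three-shot majority voting and McNemar comparison reported above can be sketched as follows; the disagreement counts are illustrative, and the continuity-corrected form of the statistic is an assumption (the abstract reports only χ² = 27.34):

```python
from collections import Counter

def three_shot_vote(predictions):
    """Majority label over model predictions for three images of one sample."""
    return Counter(predictions).most_common(1)[0][0]

def mcnemar_chi2(b, c):
    """McNemar's test statistic with continuity correction.

    b: samples the first classifier got right and the second got wrong;
    c: the reverse. A large chi2 indicates systematic disagreement.
    """
    return (abs(b - c) - 1) ** 2 / (b + c)

print(three_shot_vote(["Fusarium", "Trichoderma", "Fusarium"]))  # Fusarium
print(round(mcnemar_chi2(40, 10), 2))  # 16.82
```

Comparing single-image against three-shot predictions over the test set yields the b/c counts fed to `mcnemar_chi2`.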

36 pages, 4355 KB  
Article
The Comparison of Human and Machine Performance in Object Recognition
by Gokcek Kul and Andy J. Wills
Behav. Sci. 2026, 16(1), 109; https://doi.org/10.3390/bs16010109 - 13 Jan 2026
Abstract
Deep learning models have advanced rapidly, leading to claims that they now match or exceed human performance. However, such claims are often based on closed-set conditions with fixed labels and extensive supervised training, and do not consider differences between the two systems. Recent findings also indicate that some models align more closely with human categorisation behaviour, whereas other studies argue that even highly accurate models diverge from human behaviour. Following principles from comparative psychology and imposing similar constraints on both systems, this study investigates whether these models can achieve human-level accuracy and human-like categorisation through three experiments using subsets of the ObjectNet dataset. Experiment 1 examined performance under varying presentation times and task complexities, showing that while recent models can match or exceed humans under conditions optimised for machines, they struggle to generalise to certain real-world categories without fine-tuning or task-specific zero-shot classification. Experiment 2 tested whether human performance remains stable when shifting from N-way categorisation to a free-naming task, while machine performance declines without fine-tuning; the results supported this prediction. Additional analyses separated detection from classification, showing that object isolation improved performance for both humans and machines. Experiment 3 investigated individual differences in human performance and whether models capture the qualitative ordinal relationships characterising human categorisation behaviour; only the multimodal CoCa model achieved this. These findings clarify the extent to which current models approximate human categorisation behaviour beyond mere accuracy and highlight the importance of incorporating principles from comparative psychology while considering individual differences.
(This article belongs to the Special Issue Advanced Studies in Human-Centred AI)

22 pages, 30575 KB  
Article
Dual-Domain Seismic Data Reconstruction Based on U-Net++
by Enkai Li, Wei Fu, Feng Zhu, Bonan Li, Xiaoping Fan, Tuo Zheng, Peng Zhang, Tiantian Hu, Ziming Zhou, Chongchong Wang and Pengcheng Jiang
Processes 2026, 14(2), 263; https://doi.org/10.3390/pr14020263 - 12 Jan 2026
Abstract
Missing seismic data in reflection seismology, which frequently arises from a variety of operational and natural limitations, immediately impairs the quality of ensuing imaging and calls into question the validity of geological interpretation. Traditional techniques for reconstructing seismic data frequently rely significantly on parameter choices and prior assumptions. Even while these methods work well for partially missing traces, reconstructing whole shot gathers is still a difficult task that has not been thoroughly studied. Data-driven approaches that summarize and generalize patterns from massive amounts of data have become more and more common in seismic data reconstruction research in recent years. This work builds on earlier research by proposing an enhanced technique that can reconstruct whole shot gathers as well as partially missing traces. During model training, we first implement a Moveout-window selective slicing method for reconstructing missing traces. By creating training datasets inside a high signal-to-noise ratio (SNR) window, this method improves the model’s capacity for learning. Additionally, a technique is presented for the receiver domain reconstruction of missing shot data. A dual-domain reconstruction method is used to successfully recover the seismic data in order to handle situations where there is simultaneous missing data in both domains.

19 pages, 528 KB  
Article
On Cost-Effectiveness of Language Models for Time Series Anomaly Detection
by Ali Yassine, Luca Cagliero and Luca Vassio
Information 2026, 17(1), 72; https://doi.org/10.3390/info17010072 - 12 Jan 2026
Abstract
Detecting anomalies in time series data is crucial across several domains, including healthcare, finance, and automotive. Large Language Models (LLMs) have recently shown promising results by leveraging robust model pretraining. However, fine-tuning LLMs with several billion parameters requires a large number of training samples and significant training costs. Conversely, LLMs under a zero-shot learning setting require lower overall computational costs, but can fall short in handling complex anomalies. In this paper, we explore the use of lightweight language models for Time Series Anomaly Detection, either zero-shot or via fine-tuning them. Specifically, we leverage lightweight models that were originally designed for time series forecasting, benchmarking them for anomaly detection against both open-source and proprietary LLMs across different datasets. Our experiments demonstrate that lightweight models (<1 billion parameters) provide a cost-effective solution, as they achieve performance that is competitive and sometimes even superior to that of larger models (>70 billion).
(This article belongs to the Special Issue Deep Learning Approach for Time Series Forecasting)
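The paper benchmarks forecasting models for anomaly detection; a common way to turn any forecaster into a detector is residual thresholding. A minimal sketch, assuming a robust MAD-based threshold, which the abstract does not specify:

```python
import numpy as np

def residual_anomalies(series, forecast, k=3.0):
    """Flag points whose forecast residual exceeds k robust std devs.

    A forecasting model (LLM or lightweight) predicts each point; large
    residuals are treated as anomalies. The MAD-based threshold is an
    illustrative choice, not a rule taken from the paper.
    """
    resid = np.abs(series - forecast)
    scale = np.median(np.abs(resid - np.median(resid))) * 1.4826 + 1e-9
    return np.flatnonzero(resid > k * scale)

t = np.arange(200)
series = np.sin(t / 10.0)
series[120] += 5.0                  # injected anomaly
forecast = np.sin(t / 10.0)         # idealized forecaster output
print(residual_anomalies(series, forecast))  # [120]
```

The detector's quality then reduces to the forecaster's quality, which is exactly the axis the paper varies (lightweight models vs. large LLMs).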

15 pages, 1363 KB  
Article
Hierarchical Knowledge Distillation for Efficient Model Compression and Transfer: A Multi-Level Aggregation Approach
by Titinunt Kitrungrotsakul and Preeyanuch Srichola
Information 2026, 17(1), 70; https://doi.org/10.3390/info17010070 - 12 Jan 2026
Abstract
The success of large-scale deep learning models in remote sensing tasks has been transformative, enabling significant advances in image classification, object detection, and image–text retrieval. However, their computational and memory demands pose challenges for deployment in resource-constrained environments. Knowledge distillation (KD) alleviates these issues by transferring knowledge from a strong teacher to a student model, which can be compact for efficient deployment or architecturally matched to improve accuracy under the same inference budget. In this paper, we introduce Hierarchical Multi-Segment Knowledge Distillation (HIMS_KD), a multi-stage framework that sequentially distills knowledge from a teacher into multiple assistant models specialized in low-, mid-, and high-level representations, and then aggregates their knowledge into the final student. We integrate feature-level alignment, auxiliary similarity-logit alignment, and supervised loss during distillation. Experiments on benchmark remote sensing datasets (RSITMD and RSICD) show that HIMS_KD improves retrieval performance and enhances zero-shot classification; when a compact student is used, it reduces deployment cost while retaining strong accuracy.
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
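HIMS_KD's exact alignment losses are not given in the abstract; for orientation, the standard soft-target distillation term that such frameworks typically build on looks like the following (the temperature and logits are illustrative, not the paper's values):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Classic soft-target KD term: T^2 * KL(teacher || student).

    Softening both distributions with T exposes the teacher's relative
    class preferences ("dark knowledge") to the student.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.0, 1.5, 0.5])
print(distill_loss(student, teacher) >= 0.0)  # True (KL is non-negative)
```

In a multi-stage setup like HIMS_KD, each assistant would play teacher to the next stage, with feature-level terms added alongside this logit term.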

31 pages, 3343 KB  
Article
GridFM: A Physics-Informed Foundation Model for Multi-Task Energy Forecasting Using Real-Time NYISO Data
by Ali Sayghe, Mohammed Ahmed Mousa, Salem Batiyah, Abdulrahman Husawi and Mansour Almuwallad
Energies 2026, 19(2), 357; https://doi.org/10.3390/en19020357 - 11 Jan 2026
Abstract
The rapid integration of renewable energy sources and increasing complexity of modern power grids demand advanced forecasting tools capable of simultaneously predicting multiple interconnected variables. While time series foundation models (TSFMs) have demonstrated remarkable zero-shot forecasting capabilities across diverse domains, their application in power grid operations remains limited due to complex coupling relationships between load, price, emissions, and renewable generation. This paper proposes GridFM, a novel physics-informed foundation model specifically designed for multi-task energy forecasting in power systems. GridFM introduces four key innovations: (1) a FreqMixer adaptation layer that transforms pre-trained foundation model representations to power-grid-specific patterns through frequency domain mixing without modifying base weights; (2) a physics-informed constraint module embedding power balance equations and zonal grid topology using graph neural networks; (3) a multi-task learning framework enabling joint forecasting of load demand, location-based marginal prices (LBMP), carbon emissions, and renewable generation with uncertainty-weighted loss functions; and (4) an explainability module utilizing SHAP values and attention visualization for interpretable predictions. We validate GridFM using over 10 years of real-time data from the New York Independent System Operator (NYISO) at 5 min resolution, comprising more than 10 million data points across 11 load zones. Comprehensive experiments demonstrate that GridFM achieves state-of-the-art performance with an 18.5% improvement in load forecasting MAPE (achieving 2.14%), a 23.2% improvement in price forecasting (achieving 7.8% MAPE), and a 21.7% improvement in emission prediction compared to existing TSFMs including Chronos, TimesFM, and Moirai-MoE. Ablation studies confirm the contribution of each proposed component. The physics-informed constraints reduce physically inconsistent predictions by 67%, while the multi-task framework improves individual task performance by exploiting inter-variable correlations. The proposed model provides interpretable predictions supporting the Climate Leadership and Community Protection Act (CLCPA) 2030/2040 compliance objectives, enabling grid operators to make informed decisions for sustainable energy transition and carbon reduction strategies.
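The uncertainty-weighted multi-task loss mentioned above is commonly the homoscedastic-uncertainty formulation of Kendall et al.; whether GridFM uses exactly this form is an assumption. A minimal sketch with illustrative per-task losses:

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learnable homoscedastic uncertainty.

    L = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is a
    learnable parameter: tasks the model is uncertain about are
    down-weighted, with the +s_i term preventing s_i from growing freely.
    """
    total = 0.0
    for L, s in zip(task_losses, log_vars):
        total += np.exp(-s) * L + s
    return total

# Illustrative losses for load, price, emissions, renewables
losses = [0.8, 2.1, 0.3, 1.4]
print(round(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0, 0.0]), 2))  # 4.6
```

With all `log_vars` at zero the weights are uniform; during training the optimizer adjusts them jointly with the network weights.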

17 pages, 5916 KB  
Article
Three-Dimensional Shape Estimation of a Soft Finger Considering Contact States
by Naoyuki Matsuyama, Weiwei Wan and Kensuke Harada
Appl. Sci. 2026, 16(2), 717; https://doi.org/10.3390/app16020717 - 9 Jan 2026
Abstract
To achieve precise in-hand manipulation and feedback control using soft robotic fingers, it is essential to accurately measure their deformable structures. In particular, estimating the three-dimensional shape of a soft finger under contact conditions is a critical challenge, as the deformation state directly affects manipulation reliability. However, nonlinear deformations and occlusions arising from interactions with external objects make the estimation difficult. To address these issues, we propose a soft finger structure that integrates small magnets and magnetic sensors inside the body, enabling the acquisition of rich deformation information in both contact and non-contact states. The design provides a 15-dimensional time-series signal composed of motor angles, motor currents, and magnetic sensor outputs as inputs for shape estimation. Built on the sensing signals, we propose a mode-selection-based learning approach that outputs multiple candidate shapes and selects the correct one. The proposed network predicts the three-dimensional positions of four external markers attached to the finger, which serve as a proxy representation of the finger’s shape. The network is trained in a supervised manner using ground-truth marker positions measured by a motion capture system. The experimental results under both contact and non-contact conditions demonstrate that the proposed method achieves an average estimation error of approximately 4 mm, outperforming conventional one-shot regression models that output coordinates directly. The integration of magnetic sensing is shown to enable accurate recognition of contact states and to significantly improve stability in shape estimation.

24 pages, 1916 KB  
Article
ServiceGraph-FM: A Graph-Based Model with Temporal Relational Diffusion for Root-Cause Analysis in Large-Scale Payment Service Systems
by Zhuoqi Zeng and Mengjie Zhou
Mathematics 2026, 14(2), 236; https://doi.org/10.3390/math14020236 - 8 Jan 2026
Abstract
Root-cause analysis (RCA) in large-scale microservice-based payment systems is challenging due to complex failure propagation along service dependencies, limited availability of labeled incident data, and heterogeneous service topologies across deployments. We propose ServiceGraph-FM, a pretrained graph-based model for RCA, where “foundation” denotes a self-supervised graph encoder pretrained on large-scale production cluster traces and then adapted to downstream diagnosis. ServiceGraph-FM introduces three components: (1) masked graph autoencoding pretraining to learn transferable service-dependency embeddings for cross-topology generalization; (2) a temporal relational diffusion module that models anomaly propagation as graph diffusion on dynamic service graphs (i.e., Laplacian-governed information flow with learnable edge propagation strengths); and (3) a causal attention mechanism that leverages multi-hop path signals to better separate likely causes from correlated downstream effects. Experiments on the Alibaba Cluster Trace and synthetic PayPal-style topologies show that ServiceGraph-FM outperforms state-of-the-art baselines, improving Top-1 accuracy by 23.7% and Top-3 accuracy by 18.4% on average, and reducing mean time to detection by 31.2%. In zero-shot deployment on unseen architectures, the pretrained model retains 78.3% of its fully fine-tuned performance, indicating strong transferability for practical incident management.
(This article belongs to the Section E1: Mathematics and Computer Science)
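The Laplacian-governed diffusion mentioned above can be illustrated with an explicit Euler step, x ← x − αLx, on a toy service graph; the step size, step count, and fixed (rather than learnable) edge strengths are simplifications of the paper's module:

```python
import numpy as np

def diffuse(anomaly, adj, alpha=0.2, steps=3):
    """Propagate an anomaly signal over a service dependency graph.

    Repeated explicit Euler steps of Laplacian diffusion: x <- x - alpha*L*x,
    with L = D - A. Mass is conserved, so the signal spreads from the
    faulty service to its neighbors without vanishing.
    """
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    x = anomaly.astype(float)
    for _ in range(steps):
        x = x - alpha * lap @ x
    return x

# 3 services in a chain 0 -- 1 -- 2, with a fault injected at service 0
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = diffuse(np.array([1.0, 0.0, 0.0]), adj)
print(np.round(x, 3))  # signal spreads: service 0 still highest
```

An RCA model then works backwards: given the diffused observation, it scores which node most plausibly originated the signal.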

20 pages, 4726 KB  
Article
Enhancing SeeGround with Relational Depth Text for 3D Visual Grounding
by Hyun-Sik Jeon, Seong-Hui Kang and Jong-Eun Ha
Appl. Sci. 2026, 16(2), 652; https://doi.org/10.3390/app16020652 - 8 Jan 2026
Abstract
Three-dimensional visual grounding is a core technology that identifies specific objects within complex 3D scenes based on natural language instructions, enhancing human–machine interactions in robotics and augmented reality domains. Traditional approaches have focused on supervised learning, which relies on annotated data; however, zero-shot methodologies are emerging due to the high costs of data construction and limitations in generalization. SeeGround achieves state-of-the-art performance by integrating 2D rendered images and spatial text descriptions. Nevertheless, SeeGround exhibits vulnerabilities in clearly discerning relative depth relationships owing to its implicit depth representations in 2D views. This study proposes the relational depth text (RDT) technique to overcome these limitations, utilizing a Monocular Depth Estimation model to extract depth maps from rendered 2D images and applying the K-Nearest Neighbors algorithm to convert inter-object relative depth relations into natural language descriptions, thereby incorporating them into Vision–Language Model (VLM) prompts. This method distinguishes itself by augmenting spatial reasoning capabilities while preserving SeeGround’s existing pipeline, demonstrating a 3.54% improvement in the Acc@0.25 metric on the Nr3D dataset in a 7B VLM environment that is approximately 10.3 times lighter than the original model, along with a 6.74% increase in Unique cases on the ScanRefer dataset, albeit with a 1.70% decline in Multiple cases. The proposed technique enhances the robustness of grounding through viewpoint anchoring and candidate discrimination in complex query scenarios, and is expected to improve efficiency in practical applications through future multi-view fusion and conditional execution optimizations.
(This article belongs to the Special Issue Advances in Computer Graphics and 3D Technologies)
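The relational depth text idea (a depth map plus K-nearest neighbors, turned into sentences for the VLM prompt) can be sketched as below; the function name and phrasing are illustrative, not RDT's or SeeGround's exact prompt format:

```python
import numpy as np

def relational_depth_text(names, depths, k=2):
    """Turn inter-object depth relations into sentences for a VLM prompt.

    For each object, find its k nearest neighbors in estimated depth and
    state which is closer to the camera. Depths would come from a
    monocular depth model applied to the rendered 2D view.
    """
    depths = np.asarray(depths, dtype=float)
    sentences = []
    for i, name in enumerate(names):
        order = np.argsort(np.abs(depths - depths[i]))
        for j in order[1:k + 1]:  # skip the object itself at index 0
            rel = ("closer to the camera than" if depths[i] < depths[j]
                   else "farther from the camera than")
            sentences.append(f"The {name} is {rel} the {names[j]}.")
    return sentences

txt = relational_depth_text(["chair", "table", "lamp"], [1.2, 2.5, 2.6], k=1)
print(txt[0])  # The chair is closer to the camera than the table.
```

Appending such sentences to the prompt makes the depth ordering explicit instead of leaving it implicit in the 2D rendering.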

25 pages, 1075 KB  
Article
Prompt-Based Few-Shot Text Classification with Multi-Granularity Label Augmentation and Adaptive Verbalizer
by Deling Huang, Zanxiong Li, Jian Yu and Yulong Zhou
Information 2026, 17(1), 58; https://doi.org/10.3390/info17010058 - 8 Jan 2026
Abstract
Few-Shot Text Classification (FSTC) aims to classify text accurately into predefined categories using minimal training samples. Recently, prompt-tuning-based methods have achieved promising results by constructing verbalizers that map input data to the label space, thereby maximizing the utilization of pre-trained model features. However, existing verbalizer construction methods often rely on external knowledge bases, which require complex noise filtering and manual refinement, making the process time-consuming and labor-intensive, while approaches based on pre-trained language models (PLMs) frequently overlook inherent prediction biases. Furthermore, conventional data augmentation methods focus on modifying input instances while overlooking the integral role of label semantics in prompt tuning. This disconnection often leads to a trade-off where increased sample diversity comes at the cost of semantic consistency, resulting in marginal improvements. To address these limitations, this paper first proposes a novel Bayesian Mutual Information-based method that optimizes label mapping to retain general PLM features while reducing reliance on irrelevant or unfair attributes to mitigate latent biases. Based on this method, we propose two synergistic generators that synthesize semantically consistent samples by integrating label word information from the verbalizer to effectively enrich data distribution and alleviate sparsity. To guarantee the reliability of the augmented set, we propose a Low-Entropy Selector that serves as a semantic filter, retaining only high-confidence samples to safeguard the model against ambiguous supervision signals. Furthermore, we propose a Difficulty-Aware Adversarial Training framework that fosters generalized feature learning, enabling the model to withstand subtle input perturbations. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on most few-shot and full-data splits, with F1 score improvements of up to +2.8% on the standard AG’s News benchmark and +1.0% on the challenging DBPedia benchmark.

31 pages, 14010 KB  
Article
Deep Reinforcement Learning for Financial Trading: Enhanced by Cluster Embedding and Zero-Shot Prediction
by Haoran Zhang, Xiaofei Li, Tianjiao Wan and Junjie Du
Symmetry 2026, 18(1), 112; https://doi.org/10.3390/sym18010112 - 7 Jan 2026
Abstract
Deep reinforcement learning (DRL) plays a pivotal role in decision-making within financial markets. However, DRL models are highly reliant on raw market data and often overlook the impact of future trends on model performance. To address these challenges, we propose a novel framework named Cluster Embedding-Proximal Policy Optimization (CE-PPO) for trading decision-making in financial markets. Specifically, the framework groups feature channels with intrinsic similarities and enhances the original model by leveraging clustering information instead of features from individual channels. Meanwhile, zero-shot prediction for unseen samples is achieved by assigning them to appropriate clusters. Future Open, High, Low, Close, and Volume (OHLCV) data predicted from observed values are integrated with actually observed OHLCV data, forming the state space inherent to reinforcement learning. Experiments conducted on five real-world financial datasets demonstrate that the time series model integrated with Cluster Embedding (CE) achieves significant improvements in predictive performance: in short-term prediction, the Mean Absolute Error (MAE) is reduced by an average of 20.09% and the Mean Squared Error (MSE) by 30.12%; for zero-shot prediction, the MAE and MSE decrease by an average of 21.56% and 31.71%, respectively. Through data augmentation using real and predicted data, the framework substantially enhances trading performance, achieving a cumulative return rate of 137.94% on the S&P 500 Index. Beyond its empirical contributions, this study also highlights the conceptual relevance of symmetry in the domain of algorithmic trading. The constructed deep reinforcement learning framework is capable of capturing the inherent balanced relationships and nonlinear interaction characteristics embedded in financial market behaviors.
(This article belongs to the Special Issue Machine Learning and Data Analysis III)

40 pages, 2728 KB  
Article
From Manned to Unmanned Helicopters: A Transformer-Driven Cross-Scale Transfer Learning Framework for Vibration-Based Anomaly Detection
by Geuncheol Jang and Yongjin Kwon
Actuators 2026, 15(1), 38; https://doi.org/10.3390/act15010038 - 6 Jan 2026
Abstract
Unmanned helicopters play a critical role in various fields including defense, disaster response, and infrastructure inspection. Military platforms such as the MQ-8C Fire Scout represent high-value assets exceeding $40 million per unit including development costs, particularly when compared to expendable multicopter drones costing approximately $500–2000 per unit. Unexpected failures of these high-value assets can lead to substantial economic losses and mission failures, making the implementation of Health and Usage Monitoring Systems (HUMS) essential. However, the scarcity of failure data in unmanned helicopters presents significant challenges for HUMS development, while the economic feasibility of investing resources comparable to manned helicopter programs remains questionable. This study presents a novel cross-scale transfer learning framework for vibration-based anomaly detection in unmanned helicopters. The framework successfully transfers knowledge from a source domain (Airbus large manned helicopter) using publicly available data to a target domain (Stanford small RC helicopter), achieving excellent anomaly detection performance without labeled target domain data. The approach consists of three key processes. First, we developed a multi-task learning transformer model achieving an F-β score of 0.963 (β = 0.3) using only Airbus vibration data. Second, we applied CORAL (Correlation Alignment) domain adaptation techniques to reduce the distribution discrepancy between source and target domains by 79.7%. Third, we developed a Control Effort Score (CES) based on control input data as a proxy labeling metric for 20 flight maneuvers in the target domain, achieving a Spearman correlation coefficient ρ of 0.903 between the CES and the Anomaly Index measured by the transfer-learned model. This represents a 95.5% improvement compared to the non-transfer learning baseline of 0.462.
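CORAL domain adaptation, as cited above, aligns second-order feature statistics by whitening the source covariance and re-coloring it with the target covariance (Sun et al., 2016). A minimal NumPy sketch; the regularization constant and feature dimensions are illustrative:

```python
import numpy as np

def coral(source, target, eps=1e-3):
    """CORAL: align source feature covariance to the target domain.

    Whitens centered source features with Cs^{-1/2}, then re-colors
    with Ct^{1/2}. Shapes: (n_samples, n_features).
    """
    def sqrt_psd(m):
        # Symmetric matrix square root via eigendecomposition
        w, v = np.linalg.eigh(m)
        return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    whiten = np.linalg.inv(sqrt_psd(cs))
    return (source - source.mean(0)) @ whiten @ sqrt_psd(ct) + target.mean(0)

rng = np.random.default_rng(0)
src = rng.normal(size=(500, 4))                               # source features
tgt = rng.normal(size=(500, 4)) @ np.diag([1.0, 2.0, 0.5, 3.0])  # target features
aligned = coral(src, tgt)  # aligned covariance now tracks the target's
```

After alignment, a detector trained on source-domain statistics sees target-like second-order structure, which is the mechanism behind the 79.7% discrepancy reduction reported above.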

17 pages, 5623 KB  
Article
Prompt-Contrastive Learning for Zero-Shot Relation Extraction
by Xueyi Zhong, Liye Zhao, Licheng Peng, Guodong Yang, Kun Hu and Wansen Wu
Entropy 2026, 28(1), 69; https://doi.org/10.3390/e28010069 - 6 Jan 2026
Abstract
Relation extraction, an essential task for knowledge acquisition and management, is defined as determining the relation between two annotated entities in a piece of text. In recent years, zero-shot learning has been introduced to train relation extraction models because of the high cost of continually annotating newly emerging relations. Current methods attempt to transfer knowledge of seen relations to predictions of unseen relations by recasting relation extraction as different tasks. However, the divergence in task formulations prevents relation extraction models from acquiring informative semantic representations, resulting in inferior performance. In this paper, we exploit the relational knowledge contained in pre-trained language models, which can provide informative cues for representing unseen relations based on seen ones. To this end, we investigate a Prompt-Contrastive learning perspective for Relation Extraction under a zero-shot setting, namely PCRE. Specifically, leveraging semantic knowledge from pre-trained language models via prompt tuning, we augment each instance with different prompt templates to construct two views for an instance-level contrastive objective. Additionally, we devise an instance-description contrastive objective to elicit relational knowledge from relation descriptions. With joint optimization, the relation extraction model learns to separate relations. Experimental results show that PCRE outperforms state-of-the-art baselines in zero-shot relation extraction, and further analysis verifies that it remains robust across different datasets, numbers of seen relations, and numbers of training instances. Full article
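The instance-level contrastive objective described here treats two prompt-augmented views of the same instance as a positive pair and other instances in the batch as negatives. As a hedged sketch (the function name, the temperature `tau`, and the one-directional formulation are illustrative assumptions, not details from the paper), a minimal NumPy InfoNCE loss:

```python
import numpy as np

def info_nce(view_a, view_b, tau=0.1):
    """InfoNCE over two views of a batch; row i of each view is a positive pair,
    all other rows act as in-batch negatives."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / tau                                # cosine sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))             # positives on the diagonal
```

Minimising this loss pulls the two prompt views of each instance together while pushing apart views of different instances, which is the mechanism the abstract relies on to separate relations.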

17 pages, 1388 KB  
Article
HISF: Hierarchical Interactive Semantic Fusion for Multimodal Prompt Learning
by Haohan Feng and Chen Li
Multimodal Technol. Interact. 2026, 10(1), 6; https://doi.org/10.3390/mti10010006 - 6 Jan 2026
Abstract
Recent vision-language pre-training models, like CLIP, have been shown to generalize well across a variety of multimodal tasks. Nonetheless, their ability to adapt to downstream tasks remains limited. As a lightweight adaptation approach, prompt learning enables task transfer by optimizing only a few learnable vectors, making it a flexible option for pre-trained models. However, current methods mainly concentrate on the design of unimodal prompts and overlook effective mechanisms for multimodal semantic fusion and label alignment, which limits their representational power. To tackle these problems, this paper designs a Hierarchical Interactive Semantic Fusion (HISF) framework for multimodal prompt learning. On top of frozen CLIP backbones, HISF simultaneously injects visual and textual signals into intermediate Transformer layers through a cross-attention mechanism together with learnable category embeddings. This architecture achieves hierarchical, modality-level semantic fusion while preserving structural consistency at each layer. In addition, a Label Embedding Constraint and a Semantic Alignment Loss are proposed to promote category consistency while alleviating semantic drift during training. Extensive experiments across 11 few-shot image classification benchmarks show that HISF improves average accuracy by around 0.7% over state-of-the-art methods and shows strong robustness in cross-domain transfer tasks. Ablation studies also verify the effectiveness of each proposed component and their combination: the hierarchical structure, cross-modal attention, and semantic alignment jointly enrich representational capacity. In conclusion, HISF offers a new hierarchical perspective on multimodal prompt learning and a lightweight, generalizable paradigm for adapting vision-language pre-trained models. Full article
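The layer-wise fusion described in this abstract lets tokens of one modality attend over tokens of the other. As a hedged, heavily simplified sketch (single head, no learned Q/K/V projections, and the token-count/dimension choices are assumptions; HISF's actual layers would use learned per-layer weights), minimal cross-attention in NumPy:

```python
import numpy as np

def cross_attention(queries, context):
    """Single-head cross-attention: `queries` (e.g. text prompt tokens)
    attend over `context` (e.g. visual tokens). Without learned projections,
    keys and values are the raw context tokens themselves."""
    d_k = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d_k)     # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over context tokens
    return weights @ context                        # fused query representations
```

Each output row is a convex combination of context tokens, which is how information from one modality is injected into the other's representation at a given layer.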
