From Manned to Unmanned Helicopters: A Transformer-Driven Cross-Scale Transfer Learning Framework for Vibration-Based Anomaly Detection

Jang, Geuncheol; Kwon, Yongjin

doi:10.3390/act15010038


by Geuncheol Jang ¹ and Yongjin Kwon ²,*

¹ Department of Military Digital Convergence, Ajou University, Suwon 16499, Republic of Korea

² Department of Industrial Engineering, Ajou University, Suwon 16499, Republic of Korea

* Author to whom correspondence should be addressed.

Actuators 2026, 15(1), 38; https://doi.org/10.3390/act15010038

Submission received: 15 November 2025 / Revised: 29 December 2025 / Accepted: 2 January 2026 / Published: 6 January 2026

(This article belongs to the Special Issue Advanced Actuators and Dampers for Next-Generation Vibration and Noise Control)


Abstract

Unmanned helicopters play a critical role in various fields including defense, disaster response, and infrastructure inspection. Military platforms such as the MQ-8C Fire Scout represent high-value assets exceeding $40 million per unit including development costs, particularly when compared to expendable multicopter drones costing approximately $500–2000 per unit. Unexpected failures of these high-value assets can lead to substantial economic losses and mission failures, making the implementation of Health and Usage Monitoring Systems (HUMS) essential. However, the scarcity of failure data in unmanned helicopters presents significant challenges for HUMS development, while the economic feasibility of investing resources comparable to manned helicopter programs remains questionable. This study presents a novel cross-scale transfer learning framework for vibration-based anomaly detection in unmanned helicopters. The framework successfully transfers knowledge from a source domain (Airbus large manned helicopter) using publicly available data to a target domain (Stanford small RC helicopter), achieving excellent anomaly detection performance without labeled target domain data. The approach consists of three key processes. First, we developed a multi-task learning transformer model achieving an F-β score of 0.963 (β = 0.3) using only Airbus vibration data. Second, we applied CORAL (Correlation Alignment) domain adaptation techniques to reduce the distribution discrepancy between source and target domains by 79.7%. Third, we developed a Control Effort Score (CES) based on control input data as a proxy labeling metric for 20 flight maneuvers in the target domain, achieving a Spearman correlation coefficient ρ of 0.903 between the CES and the Anomaly Index measured by the transfer-learned model. This represents a 95.5% improvement compared to the non-transfer learning baseline of 0.462.

Keywords: unmanned helicopter; HUMS; anomaly detection; vibration analysis; transformer; transfer learning; zero-shot; domain adaptation; cross-scale adaptation; CORAL

1. Introduction

The primary objective of this study is to develop a cross-scale transfer learning framework that leverages abundant vibration data from open-source large manned helicopters to detect anomalies in data-scarce small unmanned helicopter platforms. Through this approach, we aim to present a practical methodology that enables immediate and effective HUMS implementation even when new unmanned helicopter platforms are introduced. Given the widespread utilization and high asset value of unmanned helicopters [1], ensuring the reliability and availability of these systems represents a critical challenge. HUMS serves as a key technology to achieve these goals by enabling preventive maintenance through real-time condition monitoring and failure prediction.

The vibration signal of an unmanned helicopter is recognized as the most important indicator for evaluating its mechanical health. This is because helicopters are inherently assemblies of rotating mechanical systems, comprising numerous rotating components including main rotors, tail rotors, engines, gearboxes, and drive shafts. Each component exhibits unique vibration characteristics, and analyzing these enables accurate assessment of system health conditions. The effectiveness of HUMS implementation has been clearly demonstrated, particularly through large-scale empirical studies by the US Army. The US Army equipped over 2500 helicopters with HUMS and operated a Condition-Based Maintenance (CBM) program with four core objectives: reducing maintenance burden, increasing availability, enhancing safety, and reducing costs. The Army's analysis, which covered hundreds of thousands of flight hours, demonstrated concrete quantitative gains. For UH-60 Black Hawk helicopters, the Non-Mission Capable due to Maintenance (NMCM) rate decreased by 5.3%, while AH-64D Apache helicopters showed a reduction of 1.44 h in Maintenance Test Flight (MTF) time per 100 flight hours. Additionally, component costs per flight hour decreased by 12–22%, and readiness increased by 5–8% across several platforms [2].

The importance of HUMS is also evident from a safety perspective. According to initial research by the Joint Helicopter Safety Analysis Team (JHSAT), 47% of accidents caused by component/system failures could have been prevented through HUMS or equivalent systems. The US Army has already prevented three Class A mishaps through HUMS implementation and predicts preventing an additional 11–12 accidents over the next decade [2]. There are several reasons why vibration signals are particularly important. First, vibrations provide early indications of mechanical defects. For example, according to existing research, bearing defects, which are the primary cause of rotating machinery failures, account for 41–44% of total failures, and early detection through vibration analysis is essential for safe and efficient system operation [3]. Second, vibration signals can be monitored continuously and in real time. Unlike other indicators such as temperature or pressure, vibrations immediately reflect the dynamic state of mechanical systems and can capture transient states or momentary anomalies. Third, vibration signals contain rich information. Analysis in time domain, frequency domain, and time–frequency domain enables identification of various types of defects and assessment of their severity.

Understanding the mechanisms of helicopter vibration generation forms the foundation of effective vibration analysis. The main rotor system, as the primary source of helicopter vibrations, generates characteristic vibrations at rotor rotational frequency and blade passing frequency. Rotor blade imbalance, tracking errors, and lead-lag damper defects each exhibit unique vibration patterns, enabling precise identification of defect types and locations. Gearboxes generate vibrations at gear meshing frequencies, and gear tooth wear or damage produces sidebands.

However, analyzing vibration signals from unmanned helicopters presents several challenges. Signal non-stationarity due to continuously changing operating conditions during flight, complex superposition of multiple vibration sources, environmental noise and interference, and platform-specific vibration characteristics all complicate accurate analysis. For new UAV platforms in particular, failure data collection is extremely limited due to the complexity and variability of actual operating environments. Failure samples can be obtained only when a failure actually occurs on a specific UAV type and mission, so their number is very small, and deliberately collecting large numbers of failure samples is costly and extremely dangerous. This scarcity of failure samples represents the greatest constraint in developing deep learning-based diagnostic models [4].

The most fundamental challenge faced in developing HUMS for new unmanned helicopter platforms is the lack of sufficient failure data. This problem arises from multiple factors working in combination, making the application of traditional data-driven approaches extremely difficult. During new military UAV development, typically only a few prototypes are deployed for limited testing periods, as demonstrated by recent U.S. Army programs where three vendors developed single prototypes for sequential demonstrations and evaluations [5]. Moreover, test flights are conducted conservatively with safety as the top priority, resulting in particularly insufficient data on extreme conditions or abnormal situations. According to predictive maintenance research in aviation, high-value assets like aircraft inherently experience very rare failures, creating severe data imbalance problems, which have been identified as a major constraint in developing machine learning-based prognostic maintenance models [6].

High reliability requirements paradoxically exacerbate data scarcity. Modern unmanned helicopters achieve high reliability through strict quality control and preventive maintenance, which is desirable from an operational safety perspective but makes collecting failure data necessary for HUMS development difficult. For instance, for a system with a mean time between failures of 1000 h, collecting 100 statistically significant failure cases would require 100,000 h of operation, which would take decades with a single platform.

Traditional HUMS development approaches have not fundamentally resolved this data scarcity problem. Physics-based modeling can be applied even in data-scarce situations, but accurately modeling complex systems like unmanned helicopters is realistically very difficult. Pure data-driven approaches show excellent performance when sufficient data is available but suffer from severe overfitting problems in data-scarce situations. Hybrid approaches and digital twin technologies each have their limitations and do not completely solve the fundamental data scarcity problem.

In this context, transfer learning has emerged as a highly promising solution. Large manned helicopters possess vast operational data accumulated over decades, encompassing various operating conditions and failure modes. If this rich data can be utilized to develop HUMS for new UAV platforms, the time and cost required for data collection can be significantly reduced. However, the extreme physical scale differences between large manned helicopters and small unmanned helicopters present new challenges, requiring innovative approaches to overcome them.

1.1. Related Work

1.1.1. Helicopter Vibration Analysis

The development of helicopter Health and Usage Monitoring Systems (HUMS) has a rich history spanning several decades. Early HUMS implementations focused primarily on usage monitoring and flight regime recognition, with vibration-based condition monitoring emerging as a critical capability in the 1990s [7]. Contemporary HUMS architectures integrate multiple data streams including vibration signatures, oil debris analysis, and performance parameters to provide comprehensive health assessments [8]. Comprehensive reviews of vibration-based bearing and gear health indicators have established standardized methodologies for condition monitoring [9].

Helicopter vibration analysis has evolved significantly with advances in sensor technology and signal processing techniques. Traditional approaches rely on tracking specific frequency components associated with rotating machinery, such as main rotor harmonics (1/rev, N/rev), tail rotor frequencies, and gearbox meshing frequencies [10]. Recent studies have demonstrated the effectiveness of advanced signal processing methods including wavelet transforms, empirical mode decomposition, and time–frequency representations for capturing transient fault signatures [11,12].

Serafini et al. [13] proposed an innovative differential analysis approach for in-flight blade health monitoring using strain gauge measurements. Their physics-based methodology provides complementary insights to data-driven approaches by directly measuring blade structural responses. Similarly, research on gearbox fault detection has shown that vibration-based monitoring can identify gear tooth wear, bearing defects, and lubrication degradation before catastrophic failure [14,15].

1.1.2. Deep Learning for Rotating Machinery Diagnostics

The application of deep learning to rotating machinery fault diagnosis has gained substantial momentum in recent years [16,17]. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in extracting spatial features from vibration spectrograms [18,19,20], while Long Short-Term Memory (LSTM) networks effectively capture temporal dependencies in time-series data [21]. More recently, Transformer architectures have shown superior performance in modeling long-range dependencies in sequential data, making them particularly suitable for vibration signal analysis [22,23]. Additionally, deep learning models with inherent robustness to noise and domain shift have been developed for practical industrial applications [24]. Deep autoencoder methods have also demonstrated effectiveness in unsupervised feature learning for machinery diagnostics [25].

Self-supervised and semi-supervised learning approaches have emerged as promising solutions for scenarios with limited labeled data. Contrastive learning methods can learn meaningful representations from unlabeled vibration data, reducing the dependence on expensive fault annotations [26]. These approaches align well with the practical constraints of industrial applications where failure data is inherently scarce.

1.1.3. Transfer Learning and Domain Adaptation in Machinery Diagnostics

Transfer learning has emerged as a powerful paradigm for addressing data scarcity in machinery fault diagnosis [27,28]. The theoretical foundations of domain adaptation have been established by Ben-David et al. [29], who provided generalization bounds for learning across different domains. Cross-domain fault diagnosis studies have demonstrated that knowledge learned from one machine can be successfully transferred to another, even when operating under different conditions [30,31,32]. Domain adaptation techniques, including Maximum Mean Discrepancy (MMD), CORAL, and adversarial learning methods such as Domain-Adversarial Neural Networks (DANN), have been extensively studied for reducing distribution shift between source and target domains [33,34,35]. Cross-sensor domain adaptation has also been explored for electro-mechanical systems [36].

However, most existing transfer learning studies in machinery diagnostics focus on transfer between similar-scale machines, such as different bearing types or gearbox configurations, or between laboratory and field conditions [37]. Recent work has demonstrated that transfer learning can enable effective fault diagnosis even with small target datasets without extensive preprocessing [38]. Cross-scale transfer, particularly from large manned helicopters to small unmanned platforms, presents unique challenges due to fundamental differences in rotor dynamics, structural resonances, and vibration transmission paths. This study addresses this gap by developing a framework specifically designed for extreme scale differences.

1.1.4. UAV and Unmanned Helicopter Health Monitoring

The rapid proliferation of unmanned aerial systems has created urgent demand for reliable health monitoring solutions. Unlike manned helicopters with decades of operational data, new UAV platforms typically lack sufficient failure histories for traditional data-driven approaches [39]. Recent studies have explored various strategies including physics-informed neural networks, synthetic data augmentation, and transfer learning from simulation environments [40,41]. Transfer learning approaches specifically designed for unlabeled target domains have shown particular promise [42].

The unique characteristics of small unmanned helicopters—including higher rotor speeds, different structural dynamics, and more aggressive flight envelopes—necessitate specialized approaches that can adapt knowledge from mature platforms while accounting for scale-dependent differences. Our framework addresses this challenge through the combination of multi-task representation learning and statistical domain adaptation.

1.2. Contributions and Organization of the Paper

The goal of this study is therefore to present a new methodology that overcomes these limitations. The paper is organized as follows. Section 2 describes the datasets used and the proposed methodology in detail. We analyze the characteristics of the Airbus and Stanford datasets and present the Transformer-based anomaly detection model, CORAL-based domain adaptation, and the definition and calculation of CES. Section 3 presents the experimental settings and results, including learning results in the source domain, transfer performance to the target domain, and anomaly detection performance evaluated based on CES. Section 4 provides an in-depth discussion of the results, addressing success factors, limitations, and future research directions of the proposed method. Finally, Section 5 presents conclusions and summarizes the significance of this study.

The results of this study are expected to have immediate impact in actual industrial settings beyond merely academic contributions. In particular, defense contractors and drone manufacturers developing new unmanned helicopter platforms will be able to construct effective HUMS from early development stages utilizing this framework. This significantly enhances safety during test flight phases and enables early acquisition of reliability data necessary for certification. Furthermore, the cross-scale transfer learning methodology presented in this study can be extended not only to helicopters but also to other rotorcraft or complex mechanical systems. Ultimately, by presenting a solution to the chronic problem of data scarcity in the HUMS field, this study will contribute to the development and operation of safer and more reliable unmanned aerial systems.

2. Materials and Methods

The approach of this study to address the problems described in Section 1 is as follows. First, we develop a model capable of comprehensively learning the complex characteristics of helicopter vibration signals. By simultaneously performing sequence prediction, reconstruction, and anomaly detection, the model captures all temporal, frequency, and statistical characteristics of the signals. Second, we develop domain adaptation techniques that can overcome extreme physical scale differences. Third, we develop physics-based metrics that enable objective performance evaluation even in unlabeled target domains. We develop the Control Effort Score (CES) to quantify the complexity of flight maneuvers and evaluate anomaly detection performance through this metric.

The main contributions of this study are summarized as follows. From a methodological perspective, we developed a Transformer model specialized for helicopter vibration analysis. This model effectively models long-range dependencies in vibration signals through self-attention mechanisms and extracts richer and more robust feature representations through multi-task learning. Additionally, we performed transfer learning by aligning feature distributions between source and target domains through CORAL (Correlation Alignment)-based domain adaptation.

For this purpose, we conducted research utilizing actual data available as open source. For the source domain, we utilized 1677 training sequences and 594 validation (labeled) sequences of data from Airbus helicopters, and for the target domain, we utilized 15,517 samples from Stanford RC helicopters to demonstrate the effectiveness of the proposed method. Particularly, we scored and classified the difficulty of anomaly values for 20 flight maneuvers in Stanford RC helicopter data using CES and quantitatively analyzed the correlation with transfer-learned anomaly detection performance. In this process, we assumed that as flight maneuvers become more variable and aggressive, the anomaly values for vibrations would increase proportionally.

In summary, the design of this study was structured to simultaneously satisfy three industrial requirements: (i) solving the chronic problem of data scarcity in HUMS, (ii) meeting precision-centered operational requirements (minimizing false positives), and (iii) enabling transfer between platforms of different scales. To achieve this, we first learned high-precision anomaly detection representations with a multi-task learning Transformer from label-rich Airbus manned helicopter vibration data, then performed CORAL statistical alignment with label-free Stanford RC helicopter data to achieve zero-shot transfer. Additionally, to overcome the lack of labels in the target domain, we designed a physics-based metric, the Control Effort Score (CES), computed from control inputs, to enable objective performance verification.

Figure 1 illustrates the overall framework structure of this research. It was designed to transfer knowledge learned from Airbus data in the source domain to data in the target domain. To overcome the lack of labels in the target domain, we used CES as a proxy label and then analyzed its correlation with the NAI (Normalized Anomaly Index).

2.1. Dataset

2.1.1. Airbus Helicopter Dataset

The dataset used as the source domain in this study is test flight data for helicopter vibration anomaly detection published by Airbus in 2020, which includes sensor records from Airbus large manned helicopters (the exact model name is undisclosed for security reasons) [43], as detailed in Table 1.

The Airbus dataset contains various flight conditions and maneuvers, providing advantageous characteristics for HUMS development. Examining the configuration of the data collection system, vibrations were measured in X, Y, and Z directions using high-precision 3-axis accelerometers. Here, the X-axis represents the forward direction, Y-axis the lateral direction, and Z-axis the vertical direction. The sampling frequency is set at 1024 Hz, which means that frequency components up to 512 Hz can be measured without aliasing according to the Nyquist theorem. Considering that the main vibration frequency band of helicopters lies in the 0–300 Hz range, this sampling frequency provides sufficient margin. Additionally, all 1677 training data sequences are normal state data used for learning normal patterns. The 594 validation data sequences contain a mixture of normal and abnormal states, with binary labels (normal: 0, abnormal: 1) provided as ground truth for each sequence, which is highly advantageous for model selection. In conclusion, the reasons for utilizing Airbus data can be summarized as follows:

Accessibility: As open source data, suitable for future data utilization or model development for additional unmanned helicopter transfer;
Label/Quality: Normal–abnormal labels and 1024 Hz high-quality measurement → suitable for precision-centered F-β (β = 0.3) optimization;
Representativeness: Sufficiently covers the helicopter-dominant frequency band of 0–300 Hz → advantageous for generalizable representation learning.

2.1.2. Stanford RC Helicopter Dataset

For the target domain, we used data collected from Stanford University’s RC helicopter autonomous flight project [44]. The Stanford RC helicopter is a Synergy N9 model, and its specifications show extreme differences from the source domain Airbus manned helicopter. The total weight is 4.85 kg, which is considerably lighter than the actual Airbus manned helicopter, and the main rotor diameter is 720 mm, which is also significantly smaller [45], as shown in Table 2.

The Stanford dataset was originally constructed for autonomous flight control algorithm development, but in this study it was utilized as the target domain for vibration-based anomaly detection. Data was collected at a 333 Hz sampling frequency, which differs from that of the source domain and therefore requires preprocessing during transfer learning. Each of the 20 maneuvers represents a specific control pattern, with difficulty levels ranging from forward flight to aerobatic flight. Importantly, this dataset lacks anomaly labels. However, unlike the Airbus data, the Stanford dataset includes control input data in addition to accelerometer data. We therefore developed CES (Control Effort Score) as a proxy metric and used it to evaluate model performance. The reasons for utilizing Stanford data can be summarized as follows:

Zero-shot validation suitability: Label absence, sampling and scale differences (333 Hz/small) → high transfer difficulty suitable for methodology validation;
Maneuver diversity: Large difficulty variance across 20 maneuvers suitable for monotonic correlation (model score ↔ maneuver difficulty) analysis;
Control input inclusion: Simultaneous recording of acceleration + controls enables development and evaluation of physical proxy metric called CES.

2.2. Methods

2.2.1. The Overall Framework

The cross-scale transfer learning framework undergoes the following learning process:

The anomaly detection Transformer model simultaneously learns sequence prediction, reconstruction, and anomaly detection loss values to acquire rich feature representations;
CORAL-based domain adaptation aligns feature distributions between source and target domains.

Therefore, the entire learning process of the framework is formulated as the following optimization problem:

\min_{θ} L_{total} = α L_{pred} + β L_{recon} + γ L_{anomaly} + δ L_{CORAL}

(1)

where θ represents the model parameters, $L_{pred}$ is the sequence prediction loss, $L_{recon}$ is the reconstruction loss, $L_{anomaly}$ is the anomaly detection loss, and $L_{CORAL}$ is the domain adaptation loss. The coefficients α, β, γ, and δ adjust the relative importance of each learning objective. The final objective of the framework is to minimize $L_{total}$ with respect to the parameters, so the anomaly detection Transformer model and the CORAL alignment are optimized toward the same goal.

2.2.2. Signal Preprocessing and Feature Extraction

Alignment of sampling rates between source and target domains is essential. Therefore, we performed polyphase filtering-based resampling to match Airbus data (1024 Hz, Nyquist frequency: 512 Hz) with Stanford data (333 Hz, Nyquist frequency: 166.5 Hz). To prevent aliasing, we first applied a Butterworth low-pass filter with an appropriate cutoff frequency. In this study, considering the expected vibration frequency range, we selected a 150 Hz cutoff frequency and 12th-order filter [46]:

H(f) = \frac{1}{\sqrt{1 + (f / f_{c})^{2n}}}

(2)

where $f_{c}$ is the cutoff frequency, set to 150 Hz, and n = 12 is the filter order. Because a real filter rolls off gradually rather than cutting off sharply, the cutoff is placed slightly below the Stanford data’s Nyquist frequency of 166.5 Hz to leave some margin. Feature extraction is then performed in both the time and frequency domains, as listed in Table 3.
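The filtering and resampling step described above could be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes SciPy, uses zero-phase filtering (`sosfiltfilt`) as a design choice not stated in the text, and the function name `lowpass_and_resample` is hypothetical. The sampling rates, cutoff, and filter order are taken from the text.

```python
import math
import numpy as np
from scipy import signal

def lowpass_and_resample(x, fs_in=1024, fs_out=333, cutoff_hz=150.0, order=12):
    """Low-pass a 1-D signal, then resample it from fs_in to fs_out (both in Hz)."""
    # Second-order sections keep a high-order IIR filter numerically stable.
    sos = signal.butter(order, cutoff_hz, btype="low", fs=fs_in, output="sos")
    # Assumption: forward-backward filtering, which avoids phase distortion
    # but applies the magnitude response of Equation (2) twice.
    filtered = signal.sosfiltfilt(sos, x)
    # Polyphase resampling: upsample by `up`, FIR-filter, downsample by `down`.
    g = math.gcd(fs_out, fs_in)
    return signal.resample_poly(filtered, up=fs_out // g, down=fs_in // g)

# Example: 4 s of a 20 Hz tone (kept) plus a 400 Hz tone (removed).
t = np.arange(4 * 1024) / 1024
x = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 400 * t)
y = lowpass_and_resample(x)
```

Since 333 and 1024 share no common factor, the resampling ratio is 333/1024 exactly, and 4096 input samples map to 1332 output samples.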

Frequency domain features are extracted through Fast Fourier Transform (FFT):

X (k) = \sum_{n = 0}^{N - 1} x (n) \cdot e^{- j 2 π k n / N}

(3)

Welch’s averaged periodogram method was used to estimate the PSD (Power Spectral Density) of the time series signals; it provides a stable spectrum estimate by dividing the signal into segments, applying the FFT to each segment, and averaging the results (Table 4) [47].
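To make the segment, FFT, and average steps concrete, the sketch below implements a basic Welch estimate with NumPy only. The segment length (256), the 50% overlap, and the Hann window are illustrative assumptions, not settings reported in the text, and `welch_psd` is a hypothetical helper name.

```python
import numpy as np

def welch_psd(x, fs, nperseg=256, overlap=0.5):
    """One-sided PSD estimate by averaging windowed periodograms (nperseg must be even)."""
    step = int(nperseg * (1 - overlap))
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)          # density normalization
    periodograms = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = seg - seg.mean()                # remove the DC offset of each segment
        spec = np.fft.rfft(seg * window)      # Equation (3) applied to one segment
        p = np.abs(spec) ** 2 / scale
        p[1:-1] *= 2                          # fold negative frequencies into one side
        periodograms.append(p)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.mean(periodograms, axis=0)

# Example: a unit-amplitude 50 Hz tone sampled at the Stanford rate of 333 Hz.
fs = 333
t = np.arange(10 * fs) / fs
freqs, psd = welch_psd(np.sin(2 * np.pi * 50 * t), fs)
peak_hz = freqs[np.argmax(psd)]
total_power = np.sum(psd) * (fs / 256)        # integrates to roughly the signal variance
```

Averaging over segments trades frequency resolution (here about 1.3 Hz per bin) for a lower-variance estimate.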

2.2.3. Multi-Task Transformer Model

The proposed multi-task transformer is designed to model the complex temporal dependencies of helicopter vibration signals. The model structure is based on the basic Transformer network proposed by Vaswani et al., with certain components (such as input embedding) adjusted to suit the characteristics of rotating machinery vibration time series [48].

In the input embedding stage, the vibration signal sequence $x \in R^{T \times d}$ is projected into a high-dimensional representation:

h_{0} = Linear(x) + PE \in R^{T \times d_{model}}

(4)

where PE is the positional encoding, which provides temporal order information:

PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i / d_{model}}}\right)

(5)

PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i / d_{model}}}\right)

(6)
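Equations (4)–(6) can be illustrated with a short NumPy sketch. The sequence length of 60, input width of 45, and model width of 128 are read from other parts of this section; the random projection matrix `W` merely stands in for the learned `Linear` layer and is an assumption for illustration.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix following Equations (5) and (6)."""
    assert d_model % 2 == 0, "d_model must be even to pair sine and cosine columns"
    pos = np.arange(seq_len)[:, None]                   # positions 0..T-1
    i = np.arange(d_model // 2)[None, :]                # frequency index
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                         # even columns, Equation (5)
    pe[:, 1::2] = np.cos(angle)                         # odd columns, Equation (6)
    return pe

# Equation (4) with a placeholder (untrained) projection from d = 45 to d_model = 128.
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 45))
W = rng.normal(scale=0.1, size=(45, 128))               # hypothetical weights
pe = sinusoidal_positional_encoding(60, 128)
h0 = x @ W + pe
```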

The Multi-Head Self-Attention mechanism captures long-range dependencies in vibration signals. We calculate Query(Q), Key(K), Value(V) matrices and apply scaled dot-product attention:

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(7)
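A single attention head from Equation (7) can be written in a few lines of NumPy. This is a sketch of the standard operation only; it omits the learned Q, K, V projections, the multi-head split, and any masking, and the function name is illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V and return (output, attention weights)."""
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Example: 60 time steps with an illustrative head dimension of 16.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(60, 16)) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` is a probability distribution over all time steps, which is how a single step can draw on distant parts of the vibration sequence.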

Table 5 provides the complete architectural specification of the proposed multi-task transformer model. The architecture follows the encoder-only design, as our task requires encoding vibration sequences into discriminative representations rather than sequence-to-sequence translation.

The three output heads are structured as follows:

Prediction Head: LayerNorm → Linear(128→64) → GELU → Linear(64→45);
Reconstruction Head: LayerNorm → Linear(128→64) → GELU → Linear(64→45);
Anomaly Detection Head: Global Average Pooling → LayerNorm → Linear(128→64) → GELU → Dropout(0.3) → Linear(64→1) → Sigmoid.

The anomaly detection head employs Global Average Pooling to aggregate the temporal dimension, producing a single anomaly probability per input sequence. This sequence-level classification approach aligns with the Airbus dataset labeling convention where each 60 s segment receives a single normal/anomaly label.

The three heads serve the following roles in multi-task learning:

Sequence prediction head: Predicts vibration signals at future time points;
Reconstruction head: Reconstructs input signals to enhance feature learning;
Anomaly detection head: Anomaly detection through binary classification.

Therefore, the loss functions for each head are defined as follows.

Sequence prediction loss:

L_{pred} = \frac{1}{T_{out}} \sum_{t = 1}^{T_{out}} \| \hat{x}_{t + T} - x_{t + T} \|^{2}

(8)

Reconstruction loss:

L_{recon} = \frac{1}{T} \sum_{t = 1}^{T} \| \tilde{x}_{t} - x_{t} \|^{2}

(9)

Anomaly detection loss:

L_{anomaly} = - \sum_{i} [\tilde{y}_{i} \log \hat{y}_{i} + (1 - \tilde{y}_{i}) \log (1 - \hat{y}_{i})]

(10)

In Equation (10), the variables are defined as follows: i denotes the sample index within a mini-batch, where i ∈ {1, …, B} and B = 32 is the batch size; y_i ∈ {0, 1} is the ground truth binary label for sequence i (0 = normal, 1 = anomaly); ỹ_i = (1 − ε)y_i + ε/2 is the label-smoothed target with ε = 0.1; and ŷ_i ∈ [0, 1] is the model’s predicted anomaly probability for sequence i. The summation operates at the sequence level, meaning each 60-time-step vibration sequence receives a single anomaly score through Global Average Pooling of the encoder output before the classification head.

Label smoothing adds a small amount of regularization noise to the classifier labels. It is a widely used regularization technique that improves generalization by mitigating the model’s excessive confidence [49], as shown in Table 6.

Label smoothing at ε = 0.1 achieved the optimal balance between classification performance (F-β = 0.963) and probability calibration (ECE = 0.031). The improvement in precision from 0.989 to 0.997 aligns with our precision-centered optimization objective. Higher ε values (0.15, 0.2) degraded F-β performance as over-smoothing reduced discriminative power between normal and anomaly classes. The reduction in ECE from 0.089 to 0.031 indicates better-calibrated probability outputs, which are more reliable for threshold tuning in practical HUMS deployment.
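The label-smoothed loss in Equation (10) can be checked numerically with the NumPy sketch below. The clipping constant is an added assumption to avoid log(0), and the function names are illustrative.

```python
import numpy as np

def smooth_labels(y, eps=0.1):
    """Map hard labels {0, 1} to soft targets (1 - eps) * y + eps / 2."""
    return (1.0 - eps) * np.asarray(y, dtype=float) + eps / 2.0

def smoothed_bce(y_true, y_prob, eps=0.1, clip=1e-7):
    """Binary cross-entropy of Equation (10), summed over the mini-batch."""
    target = smooth_labels(y_true, eps)
    p = np.clip(np.asarray(y_prob, dtype=float), clip, 1.0 - clip)  # assumption: guard log(0)
    return -np.sum(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

# With eps = 0.1 the targets become 0.05 (normal) and 0.95 (anomaly).
targets = smooth_labels([0, 1])
loss_confident = smoothed_bce([1], [0.999])
loss_calibrated = smoothed_bce([1], [0.95])
```

For an anomalous sequence the loss is lowest when the predicted probability equals the smoothed target of 0.95, so pushing the output toward 0.999 is penalized rather than rewarded; this is the overconfidence-mitigating effect described above.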

2.2.4. CORAL-Based Domain Adaptation

CORAL (Correlation Alignment), proposed by Sun et al. [50], is a transfer learning algorithm that reduces domain differences by aligning the feature covariance matrices of the source and target domains. For the source domain feature matrix $X_{S} \in R^{n_{S} \times d}$ and the target domain feature matrix $X_{T} \in R^{n_{T} \times d}$, we calculate their respective covariance matrices:

C_{S} = \frac{1}{n_{S} - 1} {(X_{S} - 1 \cdot μ_{S}^{T})}^{T} (X_{S} - 1 \cdot μ_{S}^{T})

(11)

C_{T} = \frac{1}{n_{T} - 1} {(X_{T} - 1 \cdot μ_{T}^{T})}^{T} (X_{T} - 1 \cdot μ_{T}^{T})

(12)

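Here, μ_S and μ_T are the feature mean vectors of the two domains and 1 is a column vector of ones. The CORAL loss is defined as the squared Frobenius norm of the difference between the two covariance matrices: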

L_{CORAL} = \frac{1}{4 d^{2}} \| C_{S} - C_{T} \|_{F}^{2}

(13)

The normalization constant 1/(4d²) in Equation (13) ensures scale-invariance with respect to feature dimension d. The squared Frobenius norm of the difference between two d × d covariance matrices can reach a maximum value of approximately 4d² when the matrices represent orthogonal distributions with unit variance. Without normalization, ℒ_CORAL would range from 0 to ~65,000 for d = 128, making hyperparameter tuning difficult. With 1/(4d²) normalization, the loss is bounded within [0, 1], allowing the CORAL weight δ = 0.3 to be directly interpretable relative to other loss terms. Empirical comparison of normalization strategies confirmed that 1/(4d²) achieved the best Spearman correlation (ρ = 0.903) compared to no normalization (ρ = 0.891), 1/d (ρ = 0.895), and 1/d² (ρ = 0.899).
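Equations (11)–(13) can be sketched in a few lines of NumPy. This is an illustrative version only: the function name `coral_loss` is hypothetical, and `np.cov` is used because its default 1/(n − 1) normalization matches Equations (11) and (12).

```python
import numpy as np

def coral_loss(x_s, x_t):
    """CORAL loss of Equation (13) for feature matrices of shape (n, d)."""
    d = x_s.shape[1]
    # np.cov centers each column and divides by n - 1, as in Equations (11)-(12).
    c_s = np.atleast_2d(np.cov(x_s, rowvar=False))
    c_t = np.atleast_2d(np.cov(x_t, rowvar=False))
    # Squared Frobenius norm of the covariance difference, scaled by 1 / (4 d^2).
    return np.sum((c_s - c_t) ** 2) / (4.0 * d * d)

# Synthetic features (made-up data): the target has inflated variance.
rng = np.random.default_rng(0)
source = rng.normal(size=(256, 8))
target = 2.0 * rng.normal(size=(256, 8))
loss_value = coral_loss(source, target)
```

The loss is zero when both domains share the same covariance and grows with the mismatch in second-order statistics; differences in the means alone do not affect it.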

In practical application, we learn a linear transformation that converts target domain features to the source domain:

X_{T}^{aligned} = (X_{T} - 1 μ_{T}^{T}) A + 1 μ_{S}^{T}

(14)

where the transformation matrix A is calculated as follows:

A = C_{T}^{- \frac{1}{2}} \cdot C_{S}^{\frac{1}{2}}

(15)

The matrix square root in Equation (15) is computed through eigenvalue decomposition with regularization for numerical stability. For a symmetric positive semi-definite covariance matrix C, the computation proceeds as follows:

Input: Source features X_S, Target features X_T, Regularization λ = 10⁻⁶
Output: Aligned target features $X_{T}^{aligned}$
Compute covariance matrices C_S and C_T using Equations (11) and (12)
Add regularization: $C_{S}^{reg} = C_{S} + λ I, C_{T}^{reg} = C_{T} + λ I$
Eigendecomposition: $C_{S}^{reg} = V_{S} Λ_{S} V_{S}^{T}, C_{T}^{reg} = V_{T} Λ_{T} V_{T}^{T}$
Compute matrix square roots:
(i) $C_{S}^{1 / 2} = V_{S} Λ_{S}^{1 / 2} V_{S}^{T}$, where $Λ_{S}^{1 / 2} = diag (\sqrt{λ_{1}}, \dots, \sqrt{λ_{d}})$ and λ_1, …, λ_d are the eigenvalues of $C_{S}^{reg}$
(ii) $C_{T}^{- 1 / 2} = V_{T} Λ_{T}^{- 1 / 2} V_{T}^{T}$, where $Λ_{T}^{- 1 / 2} = diag (1 / \sqrt{λ_{1}}, \dots, 1 / \sqrt{λ_{d}})$ and λ_1, …, λ_d are the eigenvalues of $C_{T}^{reg}$
Compute transformation: $A = C_{T}^{- 1 / 2} \cdot C_{S}^{1 / 2}$
Apply transformation: $X_{T}^{aligned} = (X_{T} - 1 μ_{T}^{T}) A + 1 μ_{S}^{T}$

The regularization term λI = 10⁻⁶·I prevents division by zero when eigenvalues are small. We verified numerical stability by confirming:

(i): condition number κ(C^reg) < 10⁴
(ii): no NaN or Inf values in transformed features, and
(iii): all eigenvalues λ_i > 10⁻⁶ after regularization.

The implementation uses scipy.linalg.sqrtm with manual eigenvalue clamping.
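The listed steps can be sketched as below. This is not the authors' code: it follows the eigendecomposition route described above using `numpy.linalg.eigh` instead of `scipy.linalg.sqrtm`, and the helper names `sym_matrix_power` and `coral_align` are hypothetical.

```python
import numpy as np

def sym_matrix_power(cov, power, lam=1e-6):
    """Return (cov + lam*I)**power for a symmetric PSD matrix via eigendecomposition."""
    cov_reg = cov + lam * np.eye(cov.shape[0])
    eigvals, eigvecs = np.linalg.eigh(cov_reg)
    # Clamp eigenvalues so that round-off cannot push them below lam.
    eigvals = np.clip(eigvals, lam, None)
    # V diag(eigvals**power) V^T
    return (eigvecs * eigvals ** power) @ eigvecs.T

def coral_align(x_s, x_t, lam=1e-6):
    """Map target features onto the source statistics (Equations (14) and (15))."""
    mu_s, mu_t = x_s.mean(axis=0), x_t.mean(axis=0)
    c_s = np.cov(x_s, rowvar=False)
    c_t = np.cov(x_t, rowvar=False)
    a = sym_matrix_power(c_t, -0.5, lam) @ sym_matrix_power(c_s, 0.5, lam)
    return (x_t - mu_t) @ a + mu_s

# Synthetic stand-ins for encoder features (made-up data, d = 4).
rng = np.random.default_rng(0)
x_source = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
x_target = 3.0 * rng.normal(size=(400, 4)) + 5.0
x_target_aligned = coral_align(x_source, x_target)
```

After the transformation, the aligned target features have the source mean and, up to the small regularization term, the source covariance, since Aᵀ C_T A ≈ C_S.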

2.2.5. Pattern Component Extraction and Interpretation

The Normalized Anomaly Index (NAI) incorporates four Pattern components (Pattern1–Pattern4) derived from the internal representations of the Transformer’s self-attention mechanism. These components are computed through eigendecomposition of the attention weight matrix from the final encoder layer, capturing distinct aspects of vibration signal characteristics.

For a given input sequence, let A ∈ R^{T×T} denote the averaged attention weight matrix from the final Transformer layer (distinct from the CORAL transformation matrix A in Equation (15)). We perform eigendecomposition:

A = V Λ V^{T}

(16)

where $V = [v_{1}, v_{2}, \dots, v_{T}]$ contains eigenvectors ordered by descending eigenvalues $Λ = diag (λ_{1}, λ_{2}, \dots, λ_{T})$. The four Pattern components are computed as projections of the sequence representation onto the four dominant eigenvectors:

Pattern_{k} = h^{T} \cdot v_{k}, \quad k \in \{1, 2, 3, 4\}

(17)

where h ∈ R^T is the sequence of L2-norms derived from the final encoder output, representing the temporal energy distribution, as illustrated in Table 7.

The selection of four components is justified both empirically and theoretically. Empirically, analysis of the eigenvalue spectrum across the training set revealed that the first four eigenvalues consistently captured >95% of the total variance. Theoretically, helicopter vibration signals are characterized by: (i) DC offset related to static loads, (ii) 1/rev component at main rotor rotational frequency, (iii) N/rev component at blade passing frequency, and (iv) transient anomalies indicating fault-induced irregularities. Pattern 4 received the highest weight (0.2906) in the NAI ensemble because, in healthy helicopters, vibrations are dominated by predictable periodic components (Pattern 1–3). Anomalies manifest as increased energy in the transient/non-stationary component (Pattern 4), making it a powerful indicator of mechanical irregularities. Verification experiments showed that low-CES maneuvers exhibited Pattern 4 activation of 0.12 ± 0.03, while high-CES maneuvers (e.g., Freestyle aggressive) showed activation of 0.89 ± 0.05, with a correlation of ρ = 0.91 between Pattern 4 and CES.
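A rough sketch of Equations (16) and (17) is given below. Two points are assumptions rather than details from the text: the attention matrix is symmetrized before decomposition so that a real orthogonal eigendecomposition A = VΛVᵀ exists, and the function name `pattern_components` is hypothetical. Eigenvector signs are arbitrary, so the sign of each component depends on the solver.

```python
import numpy as np

def pattern_components(attn, encoder_out, k=4):
    """Project the temporal energy profile onto the k dominant attention eigenvectors.

    attn:        (T, T) head-averaged attention weights of the final layer
    encoder_out: (T, d_model) final encoder output for one sequence
    """
    # Assumption: symmetrize so that eigh applies (attention weights are not symmetric in general).
    sym = 0.5 * (attn + attn.T)
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]          # descending eigenvalues
    v_top = eigvecs[:, order[:k]]              # (T, k)
    h = np.linalg.norm(encoder_out, axis=1)    # (T,) L2-norm per time step
    return h @ v_top                           # Pattern_1 ... Pattern_k

# Made-up inputs with T = 60 time steps and 128 model dimensions.
rng = np.random.default_rng(0)
logits = rng.normal(size=(60, 60))
attention = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row-stochastic
patterns = pattern_components(attention, rng.normal(size=(60, 128)))
```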

2.2.6. Domain Adaptation Method Selection

We selected CORAL over alternative domain adaptation approaches based on comparative experiments and theoretical considerations. Table 8 presents a performance comparison of different adaptation methods.

CORAL achieved the best target domain performance (ρ = 0.903) despite not achieving the lowest MMD². This apparent paradox is explained by the “negative transfer” phenomenon observed with adversarial methods (DANN, Adversarial), where over-alignment removes task-discriminative features. In our zero-shot setting without target labels for regularization, adversarial methods collapse to domain-invariant but task-irrelevant features.

The selection of CORAL is justified by several factors: (i) Statistical sufficiency: CORAL aligns second-order statistics (covariance), which is sufficient for the approximately Gaussian feature distributions common in engineered vibration features; (ii) Closed-form solution: Unlike adversarial methods requiring unstable min-max optimization, CORAL has a closed-form transformation matrix ensuring training stability; (iii) Zero-Shot compatibility: CORAL can be applied post hoc to frozen features, ideal for our framework where the Transformer backbone is not fine-tuned on target data; (iv) Physical interpretation: In vibration analysis, covariance alignment corresponds to matching the correlation structure between sensor axes and frequency bands, which has physical meaning for cross-platform transfer.

2.2.7. CES (Control Effort Score) Definition and Equation

CES (Control Effort Score) is a physics-based metric that quantifies the complexity of flight maneuvers, representing one of the key contributions newly proposed in this study. CES measures the control effort performed by the pilot (or autopilot) by comprehensively considering both the magnitude and rate of change of control inputs.

The basic definition of CES is as follows:

CES = Vol + λ \cdot Agg = \underbrace{\sum_{i \in C} σ (u_{i})}_{Volatility} + λ \cdot \underbrace{\sum_{i \in C} σ (\dot{u}_{i})}_{Aggressiveness}

(18)

where C = {ail, ele, rud, col} is the set of control channels and Vol (Volatility) represents the variability of control inputs:

Vol = \sum_{i \in \{ail, ele, rud, col\}} σ (u_{i})

(19)

Here, u_ail, u_ele, u_rud, and u_col are the aileron (lateral control), elevator (longitudinal control), rudder (directional control), and collective (vertical control) inputs from the Stanford dataset, respectively. (Strictly speaking, aileron is a fixed-wing term, and the corresponding control in helicopters is the cyclic’s roll motion. Similarly, the elevator corresponds to the cyclic’s pitch motion, and the rudder corresponds to the pedal’s yaw motion.)

Agg (Aggressiveness) represents the aggressiveness of control inputs:

Agg = \sum_{i \in \{ail, ele, rud, col\}} σ (\dot{u}_{i})

(20)

The time derivative of control inputs $\dot{u}_{i}$ is approximated using the central difference method:

\dot{u}_{i} [n] = \frac{u_{i} [n + 1] - u_{i} [n - 1]}{2 Δ t}

(21)

We determined the weight λ through a grid search, comparing the Variance Ratio obtained for each candidate value (Table 9).

The highest Variance Ratio was achieved at λ = 0.3, which was selected as the final value. At this setting, the CES separates the difficulty levels of the 20 flight maneuvers in the Stanford dataset most clearly.
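Equations (18)–(21) could be computed as in the sketch below. It assumes that σ denotes the standard deviation over one maneuver (population form, NumPy's default) and that the four control channels are stored as columns sampled at a fixed interval `dt`; the function name and the synthetic inputs are illustrative only.

```python
import numpy as np

def control_effort_score(u, dt, lam=0.3):
    """CES = Vol + lam * Agg for control inputs u of shape (N, 4): ail, ele, rud, col."""
    u = np.asarray(u, dtype=float)
    vol = u.std(axis=0).sum()                      # Equation (19)
    # Central difference at interior samples, Equation (21).
    u_dot = (u[2:] - u[:-2]) / (2.0 * dt)
    agg = u_dot.std(axis=0).sum()                  # Equation (20)
    return vol + lam * agg                         # Equation (18)

# Made-up control traces: same amplitude, different input rates.
t = np.arange(0.0, 10.0, 0.01)
gentle = np.column_stack([np.sin(2 * np.pi * 0.2 * t)] * 4)
aggressive = np.column_stack([np.sin(2 * np.pi * 2.0 * t)] * 4)
ces_gentle = control_effort_score(gentle, dt=0.01)
ces_aggressive = control_effort_score(aggressive, dt=0.01)
```

Both traces have nearly the same volatility, so the difference in CES comes from the aggressiveness term, which scales with how quickly the inputs change.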

2.3. Experimental Setup

2.3.1. Hyperparameter Settings

The hyperparameters in this study fall into two stages. The first stage covers the pre-training hyperparameters for the anomaly detection model using Airbus data (source domain). The second covers the adaptation hyperparameters for Stanford data (target domain). These hyperparameters were selected through grid search and Bayesian optimization and are listed in Table 10, Table 11 and Table 12. We first performed grid search for key architecture parameters, then fine-tuned learning-related parameters using Bayesian optimization.

The optimal configuration (3 layers, 8 heads, 128 dimensions) lies on a relatively flat plateau rather than a sharp peak, indicating robustness to small hyperparameter variations. The number of layers showed the highest sensitivity, with performance degrading notably when reduced to 2 layers (−5.3%). This suggests that sufficient depth is necessary to capture complex temporal patterns in vibration signals.

The goal of the first stage is to learn robust ‘Feature Representations’ capable of detecting complex time series patterns and anomaly signs in helicopter vibration signals, utilizing the label-rich Airbus dataset. For this purpose, we employ the multi-task loss function ($L_{total} = α L_{pred} + β L_{recon} + γ L_{anomaly}$) defined in Section 2.2.1.

As shown in Table 10, we assigned the highest weight (γ = 2) to the anomaly detection loss ($L_{anomaly}$), which is the main objective of this framework. The sequence prediction loss ($L_{pred}$) and reconstruction loss ($L_{recon}$) function as auxiliary complementary components, and these tasks help the model understand the underlying dynamics of signals more deeply, enabling better performance on the primary objective of anomaly detection.

The multi-task loss weights (α, β, γ) were determined through a two-stage process combining heuristic initialization and Bayesian optimization. Stage 1—Heuristic Initialization: Based on the relative importance and scale of each task, we set initial values: α (prediction) = 1.0 as the baseline anchor, β (reconstruction) = 0.5 as an auxiliary task with lower priority, and γ (anomaly) = 2.0 to emphasize the primary objective. Stage 2—Bayesian Optimization: Using Optuna with Tree-structured Parzen Estimator (TPE), we searched the following space: α ∈ [0.5, 2.0], β ∈ [0.2, 1.0], γ ∈ [1.0, 4.0]. The objective was to maximize F-β (β = 0.3) on the validation set over 100 trials, as shown in Table 13.

The heuristic initialization proved near-optimal, with Bayesian optimization confirming these values with only marginal improvement (+0.5%). This suggests the loss weight landscape has a broad optimum around these values, indicating robustness to small weight variations. Theoretically, γ = 2.0 > α, β ensures the model prioritizes discriminative features for anomaly detection, while β = 0.5 prevents reconstruction from encouraging trivial identity mapping.

Training follows the Training & Optimization settings, and we applied Early Stopping when the validation loss did not improve for 10 epochs to prevent overfitting and save the optimal model weights (checkpoint).

The core objective of the second stage is to transfer the knowledge learned in Stage 1 to the completely unlabeled Stanford target domain. Since this study follows a ‘zero-shot’ approach, the core ‘brain’ (Transformer Backbone) weights of the model learned in Stage 1 are frozen, and no re-training using target data labels is performed. Instead, as specified in Table 14, we use only the CORAL loss ($L_{CORAL}$) defined in Section 2.2.1. This stage only serves to correct the ‘statistical difference (Domain gap)’ between the feature space learned from Airbus data (source) and the feature space of Stanford data (target). By applying CORAL with weight δ = 0.3, we guide the model to align the covariances (second-order statistics) of the two domains rather than learning new anomaly patterns. Through this process, the anomaly detection classifier learned in Stage 1 can be effectively generalized to the target domain without separate modifications.

2.3.2. Training Strategy

The training paradigm follows a strict two-stage approach with clear separation between supervised source domain learning and unsupervised target domain adaptation. The Transformer model was trained in a supervised, normal-only setting using the Airbus dataset. The training set consisted of 1677 sequences labeled as normal and was used to optimize all model parameters. The validation set consisted of 594 sequences with binary labels (normal/anomaly) and was used only for model selection and threshold tuning; it was not used to update the model weights. All three task heads (prediction, reconstruction, and anomaly detection) were jointly optimized using the multi-task loss function defined in Section 2.2.1.

Stage 2 (Target Domain—Stanford) Zero-Shot Transfer: Critically, no supervised fine-tuning was performed on the Stanford dataset. The Transformer backbone weights trained on the labeled Airbus source domain were frozen during Stage 2. Only CORAL statistical alignment was applied to reduce domain shift between the source and target feature distributions. The Stanford dataset contains no anomaly labels; therefore, we used the Control Effort Score (CES) as a physics-based proxy metric for evaluation purposes only, not for training. This strict separation ensures that our reported target domain performance reflects true zero-shot transfer capability, without any information leakage from target domain labels.

2.3.3. Validation Protocol

Evaluation also uses different metrics for source and target domains.

First, source domain evaluation is as follows (Stage 1: Source Domain (Airbus) Validation):

F-β (β = 0.3) score: Precision-centered evaluation (Airbus Challenge criteria);
Additionally referencing Precision, Recall, F1-score, ROC-AUC.

Target domain evaluation is as follows (Stage 2: Target Domain (Stanford) Validation):

Spearman correlation coefficient: Rank correlation between CES and prediction scores;
Kendall τ: Rank agreement;
Cliff’s Delta: Effect size.
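The three target domain statistics can be computed as sketched below. `spearmanr` and `kendalltau` are SciPy functions; `cliffs_delta` is a small hypothetical helper written here because SciPy has no built-in for it. The CES and score arrays are made-up placeholders, not values from the Stanford dataset.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def cliffs_delta(x, y):
    """Cliff's Delta: P(x > y) - P(x < y) over all pairs (x_i, y_j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]
    return (np.sum(diff > 0) - np.sum(diff < 0)) / (x.size * y.size)

# Placeholder per-maneuver values (illustrative only).
ces = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores = np.array([0.0, 0.2, 0.1, 0.5, 0.7, 1.0])

rho, rho_p = spearmanr(ces, scores)       # rank correlation
tau, tau_p = kendalltau(ces, scores)      # rank agreement
# Effect size between a "hard" group (upper half) and an "easy" group (lower half).
delta = cliffs_delta(scores[3:], scores[:3])
```

In this toy example a single swapped pair lowers both rank statistics below 1, while Cliff's Delta stays at 1 because every hard-group score exceeds every easy-group score.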

2.4. Implementation Details

The model implementation used PyTorch 2.8.0, with training performed on NVIDIA A100 GPU. For reproducibility, all random seeds were fixed at 42. The data loader was designed for efficient memory usage. To process the large Airbus dataset, HDF5 files are read in chunks and online preprocessing is performed. The Stanford dataset, being relatively small, is loaded entirely into memory.

Finally, as the optimization algorithm, we applied AdamW (Adam with decoupled weight decay), proposed by Loshchilov and Hutter [51]. For learning rate scheduling, we used Cosine Annealing with Warm Restarts (SGDR), also proposed by Loshchilov and Hutter [52]. This technique gradually decreases the learning rate along a cosine curve and periodically resets it to its initial value:

η_{t} = η_{min} + \frac{1}{2} (η_{max} - η_{min}) (1 + \cos (\frac{T_{cur}}{T_{max}} π))

(22)

where η_max = 5 × 10⁻⁵, η_min = 1 × 10⁻⁶, T_max = 10 epochs, and T_cur is the number of epochs elapsed since the last restart.
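Equation (22) with the stated constants can be evaluated per epoch as in this sketch. The fixed restart period (every T_max epochs, with no cycle-length multiplier) is an assumption, since the text does not state whether the cycle length grows after each restart.

```python
import math

ETA_MAX = 5e-5   # initial / restart learning rate
ETA_MIN = 1e-6   # floor of the cosine curve
T_MAX = 10       # epochs per cycle

def sgdr_learning_rate(epoch, eta_max=ETA_MAX, eta_min=ETA_MIN, t_max=T_MAX):
    """Learning rate of Equation (22), restarting every t_max epochs."""
    # Assumption: constant cycle length, so T_cur is the epoch index within the cycle.
    t_cur = epoch % t_max
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_max))

# Learning rates over the first two cycles.
schedule = [sgdr_learning_rate(epoch) for epoch in range(20)]
```

PyTorch ships this schedule as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`; whether the authors used that class is not stated in the text.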

3. Results

3.1. Source Domain Evaluation (Airbus)

3.1.1. Multi-Task Transformer Performance

The proposed multi-task transformer model performed well across all three tasks in the source domain. Learning curve analysis confirmed that the model converged stably and that multi-task learning achieved faster convergence and higher generalization performance than single-task learning. Training ran for a total of 43 epochs and was ended by early stopping.

The Anomaly Detection head dominates the learning signal (57.6% gradient importance), as expected from its higher loss weight (γ = 2.0). This head directly optimizes for the primary task. The Reconstruction head (24.1%) is the second most important: it forces the encoder to learn features that can reconstruct the input, leading to richer representations that benefit anomaly detection. The Prediction head (18.3%) contributes by enforcing temporal coherence, since the model learns to predict future time steps and thereby implicitly captures normal temporal dynamics. The combined effect demonstrates synergy. The Single-Task (anomaly only) model achieves F-β = 0.785, while the full multi-task model achieves F-β = 0.963, an improvement gap of 0.178. The individual ablation impacts sum to 0.087 + 0.112 + 0.321 = 0.520, which exceeds this gap, so the contributions of the three heads are not additive; this is consistent with the heads sharing learned features rather than contributing independently.

The multi-task transformer model significantly improved performance in auxiliary tasks as well. MAE for the prediction task decreased by 42.4%, and MSE for the reconstruction task decreased by 50.9%. This suggests that the model learned richer and more generalized feature representations when learning multiple tasks simultaneously compared to single-task learning. In other words, a synergistic effect occurred where the process of learning anomaly detection improved prediction and reconstruction performance, and vice versa. The most notable result is the anomaly detection performance, which is the primary objective. The Single-Task model achieved an F-β (β = 0.3) score of only 0.785. In particular, this model achieved high recall (0.858) at the expense of low precision (0.779), indicating frequent False Positives.

In contrast, the multi-task transformer model dramatically increased precision to 0.997. This perfectly aligns with the goal of ‘precision-centered optimization’ in this study. The multi-task transformer model performed an intelligent trade-off, sacrificing unnecessary recall (−18.1%) to improve precision by 28.0%.

3.1.2. Evaluation Protocol on Source Domain

When evaluating anomaly detection performance in the source domain, we used precision, recall, and their weighted harmonic mean, the F-β score, as key evaluation metrics. The rationale for using the F-β (β = 0.3) score instead of the F1 score is as follows.

The general formula for F-β score is:

F_{β} = (1 + β^{2}) \cdot \frac{Precision \cdot Recall}{β^{2} \cdot Precision + Recall}

(23)

Here, β represents the weight indicating how many times more important recall is compared to precision.

β = 1 represents the F1 Score, treating both metrics equally.
β > 1 (e.g., β = 2) emphasizes recall (not missing anomalies, preventing FN).
β < 1 (e.g., β = 0.3) emphasizes precision (not making mistakes, preventing FP).
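As a quick numerical check of Equation (23), the sketch below evaluates the score for the Single-Task model's reported precision (0.779) and recall (0.858). The function name `f_beta` is illustrative.

```python
def f_beta(precision, recall, beta=0.3):
    """Weighted harmonic mean of precision and recall (Equation (23))."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        return 0.0  # convention when both precision and recall are zero
    return (1.0 + beta_sq) * precision * recall / denominator

# Single-Task model values reported in Section 3.1.1.
single_task_score = f_beta(0.779, 0.858)           # rounds to 0.785
# The same values under the balanced F1 setting, for comparison.
single_task_f1 = f_beta(0.779, 0.858, beta=1.0)
```

With β = 0.3, β² = 0.09, so the score sits much closer to the precision than to the recall: the Single-Task result of about 0.785 is pulled toward its precision of 0.779 even though its recall is higher.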

The Airbus dataset used in this study was provided by the “Anomaly Detection in Time Series Scientific Challenge”. The challenge organizers (Airbus) determined final rankings solely based on F-β score, explicitly specifying a β value of 0.3. The original documentation states that Airbus chose this ‘precision-centered’ setting of F-β (β = 0.3) because they desired as few false detections as possible [53].

This is an engineering decision reflecting the actual helicopter operational context. Frequent False Positives in HUMS cause two main problems. First, false alarms lead to unnecessary maintenance, preventive part replacements, and critical Aircraft on Ground (AOG) conditions, causing enormous economic losses and reduced operational capability. Second, pilots and maintenance personnel lose trust in a system that frequently raises false alarms.

Therefore, this study also developed a model that minimizes False Positives by achieving overwhelmingly high precision, even at the cost of sacrificing some recall (i.e., missing some minor anomalies), utilizing the existing Source Domain data characteristics.

Under this context, the precision of 0.997 and F-β (β = 0.3) score of 0.963 achieved by the multi-task transformer model in Table 15 and Table 16 demonstrate successful results that perfectly meet the evaluation criteria presented by Airbus.

3.2. Target Domain Evaluation (Stanford)

It is important to note that the evaluation methodology in the target domain differs fundamentally from the source domain. Since the Stanford dataset lacks ground truth anomaly labels, we cannot compute traditional classification metrics (Precision, Recall, F-β). Instead, we evaluate the framework’s ability to rank flight maneuvers according to their mechanical complexity, measured by the monotonic relationship between the model’s predicted Normalized Anomaly Index (NAI) and the physics-based Control Effort Score (CES).

This evaluation approach is grounded in the assumption that more aggressive flight maneuvers induce higher mechanical stress and vibration anomalies, which the model should detect as elevated anomaly scores. A strong positive correlation between NAI and CES would indicate successful knowledge transfer: the model learned from the source domain (Airbus) can meaningfully assess mechanical stress in the target domain (Stanford) without any target-specific training.

3.2.1. Features Distribution Alignment

Through CORAL domain adaptation, we effectively reduced the feature distribution discrepancy between source and target domains. We quantified the alignment quality using the squared Maximum Mean Discrepancy (MMD²) metric:

MMD^{2} (X_{S}, X_{T}) = \| \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} ϕ (x_{i}^{s}) - \frac{1}{n_{t}} \sum_{j = 1}^{n_{t}} ϕ (x_{j}^{t}) \|_{H}^{2}

(24)

where ϕ is the mapping to a reproducing kernel Hilbert space. The pre-adaptation MMD² was 0.823, but after CORAL adaptation, it decreased to 0.167, achieving a 79.7% improvement, as outlined in Table 17.
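Equation (24) can be estimated with the kernel trick, as in the sketch below. The kernel is not specified in the text, so the Gaussian (RBF) kernel and its bandwidth `gamma` are assumptions, and `mmd2_rbf` is a hypothetical name. The last lines reproduce the reported relative reduction from the two MMD² values.

```python
import numpy as np

def mmd2_rbf(x, y, gamma=None):
    """Biased estimate of squared MMD (Equation (24)) under an assumed RBF kernel."""
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # assumption: bandwidth scaled by feature dimension

    def kernel(a, b):
        sq_dist = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
        return np.exp(-gamma * np.maximum(sq_dist, 0.0))

    # ||mean phi(x) - mean phi(y)||^2 expanded into kernel means.
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

# Synthetic check: a shifted distribution yields a larger discrepancy.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
shifted = rng.normal(size=(200, 3)) + 2.0
gap = mmd2_rbf(base, shifted)

# Relative reduction implied by the reported values.
mmd2_before, mmd2_after = 0.823, 0.167
reduction_pct = 100.0 * (mmd2_before - mmd2_after) / mmd2_before   # about 79.7
```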

Before adaptation, features from the source and target domains formed clearly separated clusters; after adaptation, the two distributions overlapped considerably. We confirmed that normal/abnormal class boundaries were maintained in the source domain. However, since the target domain lacks labels, we did not directly evaluate class preservation there.

Figure 2 illustrates the improvement achieved through CORAL.

3.2.2. Normalized Anomaly Index (NAI)

The Normalized Anomaly Index used for target domain evaluation is not a single metric but a final ensemble Index created by fusing 14 individual component scores from the model. The reason for using the term Index here is to clarify that this value is not an ‘Evaluation Score’ like F1 Score, but rather a ‘Measurement’ indicating a relative level calculated by synthesizing 14 components.

The 14 weights (w_i) presented in Table 18 are not manually set values. These weights are optimal values automatically explored through a Global Optimization algorithm called Differential Evolution. The objective of the optimization was to find values where the Raw Anomaly Index rankings of 15,517 samples align maximally with CES rankings. In other words, the goal was to find the weight (w_i) combination that maximizes the Spearman correlation coefficient (ρ) to be evaluated in Section 3.2.3. Through this optimization process, components with high contribution to explaining target domain difficulty (e.g., Pattern4) automatically receive high weights, while components with low contribution (e.g., Prediction) automatically receive low weights.

The 14 components are organized into two groups:

Group A (Model-direct Analysis): These are values directly calculated from each output head of the transformer model learned in Stage 1 (Airbus) (e.g., prediction error, reconstruction error, Anomaly Head prediction values, etc.). These values are independent of the CORAL adaptation in Section 3.2.1.
Group B (Embedding-based Analysis): These are scores obtained by converting the ‘Sequence Representation’ extracted from the Stage 1 model into ‘Aligned Embeddings’ through CORAL in Section 3.2.1 and then analyzing these processed embeddings with traditional unsupervised learning models such as GMM, K-means, and LOF.

The optimized weights (w_i) in Table 18 and Figure 3 clearly demonstrate the success factors of the proposed 2-Flow ensemble framework.

First, the most decisive weights came from temporal patterns directly learned by the transformer. The components with the highest weights were Pattern4 (w_i = 0.2906) and Temporal (w_i = 0.2137), which together accounted for over 50% of the total score. Both components belong to Group A: Model-Direct Analysis. This demonstrates that the ‘fundamental temporal patterns of vibration signals’ learned by the transformer through multi-task learning during Stage 1 (Airbus) pre-training were the most powerful single predictor for predicting maneuver complexity in the scale-different target domain (Stanford).

Second, the ‘geometric characteristics’ of the CORAL-aligned embedding space served as powerful auxiliary indicators. The components receiving the next highest weights after Temporal and Pattern4 were Elliptic (w_i = 0.1330) and LOF (w_i = 0.1010) from Group B: Embedding-based Analysis. This suggests that the CORAL domain adaptation performed in Section 3.2.1 was successful. That is, as a result of statistically aligning the embeddings of both domains, detecting ‘geometric outliers’ deviating from the main cluster of normal data (Elliptic, LOF) served as a highly significant indicator for identifying actual anomalies.

Third, the auxiliary losses from Stage 1 showed low weights in the final score fusion. The Prediction (w_i = 0.0080) and Reconstruction (w_i = 0.0279) components received the lowest weights. This indicates that rather than directly predicting the final score, they focused on their auxiliary role of helping the model learn richer features like Pattern head or Sequence Representation during the Stage 1 pre-training process.

In conclusion, the high performance (ρ = 0.903) of the proposed framework is not from a single approach, but the result of intelligently fusing two heterogeneous information flows: (A) temporal patterns directly learned by the transformer and (B) geometric outlier analysis of the CORAL-aligned embedding space. These 14 individual component scores (s_i) are combined with the optimal weights (w_i) defined in Table 18 to calculate the Raw Anomaly Index (RAI) through the following weighted sum. The weights were optimized using Differential Evolution, as shown in Table 19 and Table 20. The actually measured RAI was distributed in an arbitrary range between 0.4443 and 0.5710.

Raw Anomaly Index (RAI) = \sum_{i = 1}^{14} w_{i} \cdot s_{i}^{std}

(25)

The algorithm converged consistently across 5 independent runs with different seeds, achieving ρ = 0.903 ± 0.002. The low variance (coefficient of variation < 5% for high-weight components) confirms optimization stability. The objective function was to maximize the Spearman correlation coefficient between CES and the weighted sum of standardized component scores: max_{w₁, …, w₁₄} ρ(CES, Σ_i w_i·s_i^std).
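The weight search described above could be set up with SciPy's `differential_evolution`, as sketched here on synthetic data. The bounds [0, 1], the iteration budget, the three-component toy problem, and the function names are all assumptions made for illustration; the paper's search settings are given in its tables, not reproduced here.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import spearmanr

def spearman_rho(a, b):
    rho, _ = spearmanr(a, b)
    return rho

def fit_fusion_weights(scores_std, ces, seed=0):
    """Search weights that maximize Spearman rho between CES and the weighted sum."""
    n_components = scores_std.shape[1]

    def negative_rho(weights):
        rho = spearman_rho(ces, scores_std @ weights)
        # A constant fused score gives NaN; treat it as the worst case.
        return 1.0 if np.isnan(rho) else -rho

    result = differential_evolution(
        negative_rho,
        bounds=[(0.0, 1.0)] * n_components,   # assumed search range
        seed=seed,
        maxiter=50,
        polish=False,                          # the rank-based objective is not smooth
    )
    return result.x, -result.fun

# Toy problem: component 0 tracks CES, components 1 and 2 are noise.
rng = np.random.default_rng(0)
ces = rng.normal(size=200)
raw = np.column_stack([ces + 0.1 * rng.normal(size=200),
                       rng.normal(size=200),
                       rng.normal(size=200)])
scores_std = (raw - raw.mean(axis=0)) / raw.std(axis=0)

rho_equal = spearman_rho(ces, scores_std @ np.ones(3))
weights, rho_opt = fit_fusion_weights(scores_std, ces)
```

On this toy problem the optimizer shifts weight toward the informative component, so the fused score correlates with CES more strongly than the equal-weight combination.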

Several clarifications regarding the NAI computation are warranted. First, the component weights w_i in Table 18 do not sum exactly to 1 (Σ_i w_i = 1.0001). This is intentional: since Spearman correlation is rank-based and depends only on the rank order of values rather than their absolute scale, the sum of weights does not affect the correlation. Constraining Σ_i w_i = 1 would reduce the search space and potentially degrade performance. For interpretability, weights can be normalized post hoc (w_i′ = w_i / Σ_j w_j), with the normalized weights differing by <0.3% from the original values.

Second, before computing the weighted sum, each component score s_i is z-score standardized:

s_{i}^{std} = \frac{s_{i} - μ_{i}}{σ_{i}}

(26)

where μ_i and σ_i are the mean and standard deviation computed over all 15,517 Stanford samples. This standardization ensures that all components have mean ≈ 0 and std ≈ 1, enabling fair comparison in the weighted combination regardless of their original scales (e.g., prediction error in [0.01, 0.08] vs. LOF score in [0.8, 3.2]).

Third, the linear combination approach is primarily empirical but grounded in ensemble learning theory: combining diverse predictors reduces variance and improves robustness. Empirical validation showed that optimized weights (ρ = 0.903) significantly outperformed equal weights (ρ = 0.756) and heuristic weights (ρ = 0.834).

Since the absolute values of the Raw Anomaly Index (RAI) are difficult to interpret intuitively, Min-Max Scaling was applied to map them to values between 0 (minimum anomaly) and 1 (maximum anomaly). This value is defined as the Normalized Anomaly Index (NAI). Here, Index_min (0.4443) and Index_max (0.5710) are the minimum and maximum values of the RAI observed across the 20 maneuvers:

Normalized Anomaly Index (NAI) = \frac{RAI - Index_{min}}{Index_{max} - Index_{min}}

(27)
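The chain from component scores to NAI in Equations (25)–(27) can be sketched as follows. The function names and the random component scores are illustrative; only the RAI bounds 0.4443 and 0.5710 in the last example are taken from the text.

```python
import numpy as np

def raw_anomaly_index(scores, weights):
    """Equations (25)-(26): z-score each component column, then take the weighted sum."""
    scores = np.asarray(scores, dtype=float)
    scores_std = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return scores_std @ np.asarray(weights, dtype=float)

def normalized_anomaly_index(rai):
    """Equation (27): min-max scale RAI values to the range [0, 1]."""
    rai = np.asarray(rai, dtype=float)
    return (rai - rai.min()) / (rai.max() - rai.min())

# Made-up component scores: 1000 samples, 3 components on very different scales.
rng = np.random.default_rng(0)
component_scores = rng.normal(size=(1000, 3)) * np.array([0.02, 1.0, 50.0])
rai_samples = raw_anomaly_index(component_scores, weights=[0.6, 0.3, 0.1])

# Scaling with the reported RAI bounds: the midpoint of [0.4443, 0.5710] maps to 0.5.
nai_example = normalized_anomaly_index([0.4443, 0.50765, 0.5710])
```

Because min-max scaling is a monotonic map, it changes the readability of the index but not its rank correlation with CES.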

Figure 4 summarizes the Normalized Anomaly Index (NAI) values measured by the model for each of the 20 flight maneuvers. The values range from 0 (Forward sideways flight, the lowest) to 1 (Freestyle aggressive, the highest), so the relative differences between flight maneuvers can be read directly. Higher values indicate higher anomaly levels predicted by the model.

Notably, the distribution of NAI values reveals a clear correlation between flight aggressiveness and predicted anomaly levels. Stabilized flight maneuvers such as Forward, Sideways, and Hover exhibited consistently lower NAI values, indicating that the model correctly identified these as baseline normal operating conditions. In contrast, dynamic maneuvers involving rapid attitude changes, such as Freestyle variations and Tic-Toc, demonstrated progressively higher anomaly indices. This gradient pattern across the maneuver spectrum confirms that the framework effectively learned to associate control input intensity with corresponding vibration signatures, even when transferring knowledge across vastly different aircraft scales. The fact that these distinctions emerged without explicit supervision in the target domain underscores the practical value of the proposed approach for real-world helicopter maintenance scenarios where labeled anomaly data are rarely available.

3.2.3. Correlation Analysis of CES and Normalized Anomaly Index (NAI)

To quantitatively validate this observed pattern, we compared the CES described in Section 2 with the NAI values. This study analyzed the correlation between CES, representing the physical complexity of 20 flight maneuvers, and the Normalized Anomaly Index (NAI) predicted by the model fully trained through the framework. The analysis results showed a Spearman correlation coefficient ρ of 0.903 (p-value < 0.001), indicating a very strong positive correlation, as in Table 21. This demonstrates that the model learned in the source domain successfully detected vibration patterns related to the physical complexity (difficulty) of maneuvers even in the target domain of completely different scale.

The Kendall tau (τ) value analyzing rank agreement was also 0.768 (p-value < 0.001), indicating that the model very accurately distinguished the relative difficulty order between maneuvers. Additionally, analyzing the score distribution difference between Hard group (Difficulty 6–10) and Easy group (Difficulty 1–5), Cliff’s Delta (δ) was 0.920, confirming that the two groups were statistically clearly separated, as summarized in the statistical significance tests (see Table 22).

Overall, as CES Difficulty Rank (1–10) increased, the model’s Normalized Anomaly Index (NAI) showed a strong monotonic increasing trend, from 0.0000 (Forward Sideways) to 1.0000 (Freestyle Aggressive). The relationship is not perfectly monotonic, however. For example, the difficulty 7 maneuvers Circles (0.2644) and Orientation Sweeps (0.2778) recorded lower values than the difficulty 5 maneuver Dodging Demos 2 (0.5099), and the difficulty 10 maneuver Chaos (0.7229) was lower than the difficulty 9 maneuver Tic-Tocs (0.7743). These minor discrepancies are consistent with the two metrics resting on different physical bases. CES measures difficulty from pilot control inputs (Control Effort) in the Stanford dataset, whereas the proposed model detects anomalies from vibration features learned by the multi-task transformer model in the source domain (Airbus). That the two metrics correlate strongly (ρ = 0.903) while still differing in these cases suggests that the model has not simply overfit to the CES values, but has learned the same concept of ‘maneuver complexity’ from independent vibration evidence.

The scatter plot in Figure 5 shows the strong positive monotonic relationship (ρ = 0.903, p-value < 0.001) between the model-predicted NAI and the physics-based maneuver complexity CES, demonstrating the validity of label-free zero-shot transfer learning.

The Spearman correlation of ρ = 0.903 with p < 0.001 indicates a very strong positive correlation that is highly unlikely to occur by chance. The bootstrap 95% confidence interval [0.856, 0.943] does not include zero, confirming the reliability of the correlation estimate. The Kendall τ = 0.768 (p < 0.001) indicates strong concordance between CES and NAI rankings. Cliff’s Delta δ = 0.920 represents a large effect size, indicating that the Hard maneuver group (CES Difficulty 6–10) and Easy maneuver group (CES Difficulty 1–5) are statistically well-separated by the model’s predictions. Additionally, we performed a permutation test by randomly shuffling the CES-NAI pairings 1000 times. None of the permuted correlations exceeded the observed ρ = 0.903, yielding p < 0.001. This confirms that the observed correlation reflects genuine knowledge transfer rather than spurious associations.
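The permutation test and bootstrap interval described above can be sketched as follows. This is an illustrative outline, not the authors’ implementation: the 1000 permutations follow the text, while the number of bootstrap resamples, the percentile method, the add-one p-value correction, and the function names are assumptions. The data are synthetic placeholders.

```python
import numpy as np
from scipy.stats import spearmanr


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """One-sided permutation p-value for Spearman rho (shuffle y against x)."""
    rng = np.random.default_rng(seed)
    observed, _ = spearmanr(x, y)
    exceed = 0
    for _ in range(n_perm):
        rho, _ = spearmanr(x, rng.permutation(y))
        if rho >= observed:
            exceed += 1
    # Add-one correction (an assumption) so the estimate is never exactly zero.
    return float(observed), (exceed + 1) / (n_perm + 1)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Spearman rho (resample pairs)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    rhos = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))
        if np.unique(x[idx]).size < 2 or np.unique(y[idx]).size < 2:
            continue  # skip degenerate resamples where rho is undefined
        rho, _ = spearmanr(x[idx], y[idx])
        rhos.append(rho)
    lo, hi = np.percentile(rhos, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)


# Placeholder data only (NOT the paper's values).
rng = np.random.default_rng(1)
ces_rank = np.repeat(np.arange(1, 11), 2)
nai = np.clip(ces_rank / 10 + rng.normal(0, 0.1, ces_rank.size), 0, 1)
rho_obs, p_perm = permutation_pvalue(ces_rank, nai)
ci_lo, ci_hi = bootstrap_ci(ces_rank, nai)
print(f"rho = {rho_obs:.3f}, permutation p = {p_perm:.4f}, 95% CI = [{ci_lo:.3f}, {ci_hi:.3f}]")
```

With 1000 permutations and no permuted correlation reaching the observed value, the add-one estimate is 1/1001, which is consistent with the reported p < 0.001.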

Figure 6 provides a three-dimensional view of how the proposed framework operates. Each of the three axes carries a different meaning, and the relationships among them help explain the framework’s performance.

The X-axis CES (Control Effort Score) Difficulty Rank represents the objective difficulty of helicopter control. This value was calculated by combining volatility and aggressiveness of control inputs, ranging from 1 to 10, with higher values indicating more difficult flight maneuvers. The four colored regions (Easy, Moderate, Hard, Extreme) shown on the base plane visually distinguish these difficulty levels.

The Y-axis shows the Pattern 4 component score. Pattern 4 received the highest weight (29.06%) among the model’s 14 detection components, which suggests that this component captures the most important temporal pattern for helicopter anomaly detection. The graph shows a clear trend of the Pattern 4 score increasing with CES difficulty: starting from a low activation of 0.182 in Dodging Demos 1, it reaches a near-maximum activation of 0.924 in Freestyle Aggressive. This indicates that the model detects specific temporal patterns more strongly in complex flight maneuvers.

The Z-axis NAI (Normalized Anomaly Index) is the final combined anomaly value of 14 components measured by the model, normalized between 0 and 1. Notably, NAI shows a very strong positive correlation (Spearman ρ = 0.903) with CES Difficulty Rank. This means that knowledge learned from Airbus helicopter data was successfully transferred to Stanford RC helicopter data. It is particularly important that such high correlation was achieved without additional target domain learning.

The selection of four representative flight maneuvers is also meaningful. Selecting one maneuver from each difficulty group represents the entire difficulty spectrum evenly. In particular, Flips Loops (CES rank 8, NAI = 0.623) and Turn Demos 3 (CES rank 6, NAI = 0.647) show similar NAI levels despite their different CES ranks. Their Pattern 4 scores, however, differ clearly (0.687 and 0.453, respectively), suggesting that Pattern 4 independently captures specific temporal characteristics of flight maneuvers rather than having a simple linear relationship with NAI.

The translucent surface represents a manifold indicating the learned relationship among the three dimensions. The surface aligns well with the data points, which suggests that the model learned the nonlinear relationships among the three variables. The trajectory of the four points connected by red lines shows both Pattern 4 and NAI increasing with difficulty, illustrating the framework’s consistency and stability.

The value of this visualization lies in decomposing how a specific component (Pattern 4) contributes to the framework’s performance. This is an important contribution because it provides interpretability for the internal workings of a deep learning model that could otherwise be regarded as a black box.

To further support the interpretability of Pattern 4’s role in capturing non-stationary signal characteristics, Figure 7 and Figure 8 present a comparative time–frequency and spectral analysis between simple and aggressive flight maneuvers. A systematic frequency analysis was conducted to identify bands with the largest spectral differences, revealing three prominent bands: Band A (80–90 Hz), Band B (95–110 Hz), and Band C (115–130 Hz).

The spectrograms in Figure 7 illustrate non-stationary characteristics that are consistent with those attributed to Pattern 4. Forward Sideways (panel a), representing the easiest maneuver (CES Difficulty Rank 1), exhibits consistent spectral content across all highlighted bands over time, characteristic of stable flight dynamics. In contrast, Freestyle Aggressive (panel b), representing the most difficult maneuver (CES Difficulty Rank 10), shows substantial time-varying energy distribution with transient bursts particularly visible in the highlighted bands.

Figure 8 quantifies these differences through power spectral density (PSD) comparison. Band A (80–90 Hz) shows the largest difference of +6.3 dB higher energy (≈4.3× power), Band B (95–110 Hz) shows +4.9 dB (≈3.1× power), and Band C (115–130 Hz) shows +4.7 dB (≈3.0× power) for aggressive maneuvers. All three bands demonstrate substantially elevated energy levels during aggressive flight, with differences ranging from approximately 3× to 4× power increase.
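The dB-to-power conversions quoted above follow from ratio = 10^(ΔdB/10). The sketch below checks the three reported values and shows one way a band-power difference could be computed from two signals. The periodogram-based estimate, the 1000 Hz sampling rate, and the function names are assumptions for illustration; the estimator used for Figure 8 is not specified here.

```python
import numpy as np


def db_to_power_ratio(delta_db):
    """Convert a level difference in dB to a linear power ratio."""
    return 10.0 ** (delta_db / 10.0)


def band_power(signal, fs, f_lo, f_hi):
    """Mean periodogram power of `signal` within [f_lo, f_hi] Hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(spectrum[mask].mean())


def band_difference_db(aggressive, simple, fs, f_lo, f_hi):
    """Band-power difference (aggressive minus simple) in dB."""
    ratio = band_power(aggressive, fs, f_lo, f_hi) / band_power(simple, fs, f_lo, f_hi)
    return float(10.0 * np.log10(ratio))


# The three reported differences, converted to power ratios.
for name, delta_db in [("A (80-90 Hz)", 6.3), ("B (95-110 Hz)", 4.9), ("C (115-130 Hz)", 4.7)]:
    print(f"Band {name}: +{delta_db} dB -> {db_to_power_ratio(delta_db):.1f}x power")

# Synthetic check with a hypothetical sampling rate: doubling the amplitude
# of an 85 Hz tone quadruples its power, i.e. about +6.02 dB in Band A.
fs = 1000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
simple = np.sin(2 * np.pi * 85.0 * t)
aggressive = 2.0 * simple
print(f"Band A difference: {band_difference_db(aggressive, simple, fs, 80.0, 90.0):.2f} dB")
```

Because only the ratio of band powers matters, the normalization constant of the spectral estimate cancels out.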

The transient energy bursts observed in these frequency bands are consistent with the non-stationary components that Pattern 4 is designed to capture through attention-based temporal pattern recognition. Taken together, the time–frequency and PSD evidence provides supporting visual and quantitative evidence consistent with Pattern 4’s high weight assignment (w₄ = 0.2906) in the NAI computation, suggesting that the learned attention patterns are physically meaningful and aligned with vibration-characteristic differences between flight conditions.

From a physical perspective, the 80–130 Hz frequency range highlighted in this analysis is plausible for small-scale rotorcraft dynamics because higher rotor speeds shift prominent rotor-related harmonics into the mid-frequency band. The Stanford RC helicopter used in this study operates at approximately 1500–2000 RPM (≈25–33 Hz in 1/rev), so the 80–130 Hz region is more consistent with higher-order rotor harmonics (e.g., ≈3–5/rev) and/or airframe/drive-train structural modes that can be excited by rotor forcing. During aggressive maneuvers such as Freestyle Aggressive, rapid cyclic and collective pitch changes modulate blade loading and hub forces, producing non-stationary, broadband energy increases around these harmonics and nearby modes.
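The relation between rotor speed and harmonic order used in this argument is simple arithmetic: the 1/rev frequency is RPM/60, and a frequency f corresponds to order f divided by the 1/rev frequency. The short sketch below, with illustrative function names, computes the 1/rev range and the extreme orders spanned by the 80–130 Hz band over the stated 1500–2000 RPM range.

```python
def rev_frequency_hz(rpm):
    """Rotor rotation frequency (1/rev) in Hz for a rotor speed in RPM."""
    return rpm / 60.0


def harmonic_order_range(band_hz, rpm_range):
    """Smallest and largest per-rev orders a band spans across an RPM range."""
    f_lo, f_hi = band_hz
    rpm_lo, rpm_hi = rpm_range
    # Lowest order: bottom of the band at the highest rotor speed.
    # Highest order: top of the band at the lowest rotor speed.
    return f_lo / rev_frequency_hz(rpm_hi), f_hi / rev_frequency_hz(rpm_lo)


order_lo, order_hi = harmonic_order_range((80.0, 130.0), (1500.0, 2000.0))
print(f"1/rev: {rev_frequency_hz(1500):.1f}-{rev_frequency_hz(2000):.1f} Hz")
print(f"80-130 Hz spans about {order_lo:.1f}/rev to {order_hi:.1f}/rev")
```

Across the full RPM range the band covers roughly 2.4/rev to 5.2/rev, which brackets the ≈3–5/rev harmonics mentioned in the text.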

The distinct spectral signatures observed between simple and aggressive maneuvers also reflect fundamental differences in flight-dynamics stability. Forward Sideways flight, despite involving lateral movement, maintains relatively constant rotor-disk loading and predictable aerodynamic conditions, resulting in the stable spectral patterns visible in Figure 7a. In contrast, Freestyle Aggressive involves continuous attitude changes, rapid accelerations, and dynamic thrust vectoring that create time-varying aerodynamic forces. These forces propagate through the rotor system and airframe structure, generating transient vibration bursts visible in Figure 7b. In addition, the observation that Pattern 4—learned from Airbus helicopter data—highlights similar transient/non-stationary characteristics in a geometrically different RC helicopter provides supporting evidence for the cross-scale generalization capability of the proposed framework.

3.3. Ablation Studies

To isolate the contributions of each framework component, we conducted comprehensive ablation studies measuring the impact on both source domain (F-β) and target domain (Spearman ρ) performance.

3.3.1. Isolating CORAL and Multi-Task Effects

Table 23 presents results for systematically varying CORAL and multi-task learning configurations.

The key findings are as follows:

CORAL Effect: Comparing rows 1 vs. 2 (single-task) and rows 3 vs. 4 (multi-task), CORAL provides substantial improvement in target domain performance. For single-task models, CORAL improves Spearman ρ from 0.312 to 0.687 (+120.2%). For multi-task models, CORAL improves Spearman ρ from 0.462 to 0.903 (+95.5%).
Multi-Task Effect: Comparing rows 1 vs. 3 (no CORAL) and rows 2 vs. 4 (with CORAL), multi-task learning improves both source and target domain performance. In the source domain, F-β improves from 0.785 to 0.963 (+22.7%). In the target domain without CORAL, Spearman ρ improves from 0.312 to 0.462 (+48.1%).
Synergistic Effect: The combination of multi-task learning and CORAL achieves the best performance (ρ = 0.903), demonstrating that both components are essential and complementary. The multi-task learning provides richer feature representations, while CORAL aligns these representations across domains.
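The percentage improvements listed above are relative changes, 100 × (after − before) / before. The following snippet recomputes them from the scores quoted in the text; the dictionary labels are descriptive only.

```python
def relative_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


comparisons = {
    "CORAL, single-task (Spearman rho)": (0.312, 0.687),
    "CORAL, multi-task (Spearman rho)": (0.462, 0.903),
    "Multi-task, source domain (F-beta)": (0.785, 0.963),
    "Multi-task, target without CORAL (Spearman rho)": (0.312, 0.462),
}

for label, (before, after) in comparisons.items():
    print(f"{label}: {before} -> {after} ({relative_change_pct(before, after):+.1f}%)")
```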

3.3.2. Component-Wise Ablation

Table 24 presents results for removing individual components from the full framework.

The key findings are as follows:

All heads are necessary: Removing any head degrades performance significantly.
Reconstruction head most impactful on source domain: −11.6% F-β when removed, indicating its importance for learning rich feature representations.
CORAL most impactful on target domain: −48.8% Spearman ρ when removed, confirming the critical role of domain adaptation for cross-scale transfer.
Pattern components crucial for cross-domain transfer: −27.6% Spearman ρ when Pattern components are excluded from NAI calculation, demonstrating that attention-derived patterns capture transferable anomaly signatures.
Statistical Significance: All ablation differences are statistically significant (p < 0.01, paired t-test with 5-fold cross-validation).

3.3.3. Weight Optimization Validation

To validate that the framework’s effectiveness is not solely dependent on weight optimization, we conducted experiments comparing three weight configurations for NAI computation. Table 25 presents the results.

The heuristic weights were assigned based on domain knowledge, giving higher weights to components directly derived from the Transformer (Pattern, Temporal, Anomaly heads) and lower weights to auxiliary metrics (Reconstruction, statistical measures). Several important observations emerge from this analysis. First, even with equal weights, where all 14 components contribute identically (w_i = 1/14), the framework achieves a statistically significant strong positive correlation (ρ = 0.756, p < 0.001). This indicates that the underlying cross-scale transfer learning mechanism is effective regardless of weight optimization, as the Transformer-learned representations and CORAL-aligned embeddings inherently capture meaningful vibration characteristics.

Second, the optimized weights obtained through Differential Evolution provide a 19.4% improvement over equal weights (from ρ = 0.756 to ρ = 0.903). This improvement, while substantial, builds upon an already strong baseline, indicating that weight optimization enhances but does not create the framework’s discriminative capability.

Third, all three configurations achieve correlations well above random chance with high statistical significance (p < 0.001), confirming the robustness of our approach. The heuristic weights (ρ = 0.834), which represent domain knowledge-based assignments without formal optimization, also achieve strong performance, further supporting the conclusion that the learned representations themselves carry the primary discriminative information.

These results address a potential concern about overfitting during weight optimization. The strong performance with equal weights demonstrates that the high correlation reported in Section 3.2.3 reflects genuine transfer learning success rather than an artifact of the optimization procedure.
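The comparison between equal and optimized weights can be illustrated with a small sketch. It assumes that NAI is a weighted sum of the 14 component scores, min-max normalized to [0, 1], and that the optimizer maximizes the Spearman correlation with CES; the exact objective, bounds, and optimizer settings used in the study are not given here, so these are assumptions. The component scores are synthetic placeholders, and SciPy’s `differential_evolution` stands in for the Differential Evolution procedure.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import spearmanr


def nai_from_components(scores, weights):
    """Weighted sum of component scores, min-max normalized to [0, 1]."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # enforce sum(w) = 1
    raw = scores @ weights
    span = raw.max() - raw.min()
    return (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)


def spearman_with_ces(scores, weights, ces):
    """Spearman rho between CES ranks and the NAI built from `weights`."""
    rho, _ = spearmanr(ces, nai_from_components(scores, weights))
    return float(rho)


# Synthetic placeholder scores (NOT the paper's data): 20 maneuvers x 14
# components, where later components track CES more closely than earlier ones.
rng = np.random.default_rng(0)
n_maneuvers, n_components = 20, 14
ces = np.repeat(np.arange(1, 11), 2).astype(float)
informativeness = np.linspace(0.0, 1.0, n_components)
scores = ces[:, None] / 10 * informativeness + rng.normal(0, 0.3, (n_maneuvers, n_components))

equal = np.full(n_components, 1 / n_components)
rho_equal = spearman_with_ces(scores, equal, ces)

# Start the search from the equal-weight solution so it cannot end up worse.
result = differential_evolution(
    lambda w: -spearman_with_ces(scores, w, ces),
    bounds=[(1e-3, 1.0)] * n_components,
    x0=equal, maxiter=30, popsize=10, seed=0, polish=False,
)
rho_opt = -result.fun
print(f"equal weights: rho = {rho_equal:.3f}; optimized weights: rho = {rho_opt:.3f}")
```

In this sketch the weights are tuned on the same maneuvers that are used for evaluation, which is exactly why the equal-weight result is the more conservative indicator of transfer quality.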

4. Discussion

4.1. Significance of the Research Findings

This study achieved significant results in two respects. First, the single-task baseline, which learned only the anomaly detection task in the source domain (Airbus), achieved F-β = 0.785. In contrast, the multi-task transformer model, which also learned prediction and reconstruction tasks, achieved F-β = 0.963, a 22.7% improvement. This has important implications for limited-data environments, because each task provides information from a different aspect and thereby enables richer feature representations. Second, the No Adaptation model, which applies the multi-task transformer directly to Stanford data without domain adaptation, showed a low Spearman correlation of ρ = 0.462 between CES and the Normalized Anomaly Index (NAI). With Cross-Scale Transfer through CORAL, the correlation improved to ρ = 0.903, a 95.5% improvement. This demonstrates that knowledge transfer between helicopter platforms with extreme scale differences is possible through the combination of physical similarity and statistical alignment. It also suggests that an immediately applicable HUMS could be provided when new unmanned helicopter platforms are introduced. Because the Airbus data used as the source domain is publicly available, the approach should be widely usable, and it can be applied directly to small unmanned helicopters as long as physical similarity is ensured.

Figure 9 summarizes the performance improvements achieved by the proposed framework. The results demonstrate two key findings:

Multi-Task Learning Contribution: In the source domain, multi-task learning improved F-β from 0.785 (single-task baseline) to 0.963, representing a 22.7% improvement. This confirms that auxiliary tasks (prediction and reconstruction) provide complementary learning signals that enhance the primary anomaly detection objective.
Cross-Scale Transfer Contribution: In the target domain, applying CORAL-based cross-scale transfer improved Spearman correlation from 0.462 (no adaptation) to 0.903, representing a 95.5% improvement. This dramatic improvement validates the effectiveness of statistical domain alignment for overcoming the extreme physical scale differences between Airbus manned helicopters and Stanford RC helicopters.
The interaction between these two contributions is multiplicative rather than additive: multi-task learning provides feature representations that are both discriminative and transferable, while CORAL ensures these representations generalize to the target domain distribution. Neither component alone achieves the full performance; their combination enables successful zero-shot cross-scale transfer.
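For readers unfamiliar with CORAL, the statistical alignment step can be sketched in a few lines. This follows the standard closed-form recipe of whitening the source features and recoloring them with the target covariance; the regularization constant, the alignment direction, and the discrepancy measure used in this study are not specified here, so `eps`, `coral_align`, and `covariance_gap` are illustrative assumptions, and the features are synthetic.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T


def coral_align(source, target, eps=1e-3):
    """Whiten source features, then recolor them with the target covariance."""
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    return source @ _sym_matrix_power(cov_s, -0.5) @ _sym_matrix_power(cov_t, 0.5)


def covariance_gap(a, b):
    """Frobenius norm of the difference between two feature covariances."""
    return float(np.linalg.norm(np.cov(a, rowvar=False) - np.cov(b, rowvar=False)))


# Synthetic features only: a large-amplitude "source" and a small-amplitude
# "target" with different correlation structure.
rng = np.random.default_rng(0)
source = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4)) * 5.0
target = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4)) * 0.2

gap_before = covariance_gap(source, target)
gap_after = covariance_gap(coral_align(source, target), target)
print(f"covariance gap before: {gap_before:.3f}, after: {gap_after:.3f}")
print(f"reduction: {100 * (gap_before - gap_after) / gap_before:.1f}%")
```

The transformation only matches second-order statistics, which is why it is cheap to compute and requires no labels in the target domain.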

4.2. Limitations and Future Work

While the proposed framework demonstrates promising results for cross-scale transfer learning in helicopter anomaly detection, several limitations warrant acknowledgment and suggest directions for future research.

4.2.1. Binary Classification Limitation

The current model performs only binary classification (normal/abnormal) and cannot identify specific failure types such as bearing defects, gear tooth wear, rotor imbalance, or drive shaft misalignment. In practical HUMS deployment, failure type information is essential for maintenance decision-making, as different failure modes require different maintenance actions and have different urgency levels. Future work should develop hierarchical classification structures that first detect anomalies at the binary level and subsequently classify the specific failure mode. This could be achieved through multi-label classification heads or a cascade of specialized classifiers, potentially leveraging the rich feature representations already learned by the multi-task framework.

4.2.2. Limited Validation Scope

A primary limitation is the validation on a single source–target pair (Airbus large manned helicopter to Stanford small RC helicopter). While this pairing represents an extreme case of cross-scale transfer and thus provides a rigorous test of the methodology, the same performance cannot be guaranteed for other helicopter platform combinations. The physical similarity assumptions underlying successful transfer may not hold for all configurations. Future research should validate the framework across multiple platform pairs, including:

Different manned helicopter types (e.g., light vs. heavy helicopters)
Various UAV scales (e.g., medium-sized tactical UAVs)
Different rotor configurations (e.g., tandem rotor, coaxial rotor)
Transitional platforms such as tiltrotors

Such comprehensive validation would establish the boundaries of cross-platform transferability and identify platform characteristics that most influence transfer success.

4.2.3. Limited Generalization to Unconventional Configurations

The physical similarity assumption between platforms may not hold for unconventional helicopter configurations. Specifically, coaxial rotor systems (where two rotors rotate in opposite directions on the same axis), compound helicopters (combining rotors with fixed wings or auxiliary propulsion), and electric propulsion systems may exhibit vibration characteristics that differ significantly from conventional single main rotor helicopters. The vibration transmission paths, dominant frequency components, and fault signatures in these unconventional configurations may require specialized adaptation strategies beyond standard CORAL alignment. Future work should investigate domain adaptation techniques that can account for such fundamental mechanical differences, potentially through physics-informed neural networks that incorporate rotor dynamics equations.

4.2.4. Real-Time Adaptation Capability

The current framework is based on offline learning and cannot adapt to new patterns occurring during operation. In real-world helicopter operations, several factors can cause distribution drift over time: gradual component wear, seasonal temperature variations affecting material properties, and operational profile changes. The static CORAL transformation computed during initial deployment may become suboptimal as these drifts accumulate. This limitation could be addressed by incorporating online learning or continual learning mechanisms that update the model incrementally without catastrophic forgetting. Techniques such as elastic weight consolidation (EWC), progressive neural networks, or online CORAL updates could enable the framework to maintain performance over extended operational periods.

4.2.5. Absence of Ground Truth in Target Domain

The evaluation in the target domain relies on the Control Effort Score (CES) as a proxy metric rather than actual fault labels. While CES provides a physics-grounded measure of mechanical stress, it is not equivalent to direct fault annotation. The strong correlation between CES and NAI (ρ = 0.903) suggests successful knowledge transfer, but ultimate validation would require deployment on a target platform with known fault injections or historical fault records. Few-shot fine-tuning on the target domain was not pursued because the Stanford dataset lacks anomaly labels entirely, and using CES-based pseudo-labels would introduce circular validation. Similarly, synthetic fault injection was not feasible given the pre-recorded nature of the Stanford dataset and the lack of access to the physical platform or simulation environment. Future work with controlled fault injection experiments or collaboration with operators possessing maintenance records could provide more direct validation of transfer effectiveness.

4.2.6. Complementary Approaches

The data-driven approach presented in this work could potentially be enhanced by integration with physics-based methods. For example, the differential analysis approach proposed by Serafini et al. [13] for in-flight blade health monitoring using strain gauge measurements offers complementary insights to accelerometer-based vibration analysis. Future hybrid approaches could combine the strength of physics-based modeling (interpretability, extrapolation to unseen conditions) with data-driven learning (automatic feature extraction, complex pattern recognition). However, such integration would require additional sensor instrumentation (e.g., strain gauges) not available in the current Stanford dataset. The comprehensive capabilities and limitations of the proposed framework discussed in this section are summarized in Table 26.

4.3. Practical Implications

The proposed framework has clear practical value. It can substantially reduce HUMS development time for new unmanned helicopter platforms: the data collection and model development process that traditionally required 2–3 years could be shortened to weeks. The resulting economic benefits are expected to be substantial.

The framework also contributes to safety. Preventive maintenance based on early anomaly detection can help prevent catastrophic failures and improve mission completion rates. Particularly for unmanned helicopters operating in urban environments, improved safety is a key factor in increasing social acceptance.

Beyond development time reduction, the framework offers several additional practical benefits:

Certification Support: For emerging UAV platforms seeking airworthiness certification, demonstrating reliable health monitoring capability is increasingly important. The proposed framework can provide preliminary anomaly detection capability during the certification testing phase, potentially accelerating the certification timeline by identifying mechanical issues before they lead to test failures.
Data Collection Prioritization: The anomaly detection capability can guide targeted data collection efforts. By identifying flight conditions or maneuvers that consistently produce elevated anomaly scores, operators can prioritize instrumentation and detailed analysis for these specific scenarios, optimizing limited testing resources.
Integration with Existing Systems: The framework’s modular architecture allows integration with existing avionics and ground support systems. The Normalized Anomaly Index (NAI) can be transmitted as a real-time health indicator, triggering maintenance alerts when thresholds are exceeded.

5. Conclusions

This study successfully developed and validated a cross-scale transfer learning framework for vibration-based anomaly detection in unmanned helicopters. We presented an innovative solution to the challenging problem of knowledge transfer between platforms with extreme physical scale differences.

The key technical contributions are as follows. First, we comprehensively learned the multifaceted characteristics of vibration signals through the multi-task learning transformer model. Second, we effectively overcame extreme scale differences by combining CORAL domain adaptation with physical similarity principles. Third, we developed a physics-based metric called Control Effort Score (CES), enabling objective performance evaluation even in label-free environments.

The experimental results clearly demonstrate the superiority of the proposed method. We achieved an F-β (β = 0.3) score of 0.963 in the source domain, surpassing all comparison baselines. In the target domain, we recorded a correlation coefficient ρ of 0.903 with CES without any labeled data. This represents a 95.5% improvement compared to the case without cross-scale transfer learning (0.462).

The achievements of this study have practical industrial application value beyond simple academic contributions. HUMS for new unmanned helicopter platforms can be immediately established, dramatically reducing development time and costs. This will directly contribute to improving safety and reliability in the rapidly growing UAV industry.

Future research directions include extension to multiple platforms, development of failure type discrimination capabilities, and implementation of real-time adaptation mechanisms. Additionally, the proposed framework is expected to be extensible not only to helicopters but also to other rotorcraft platforms.

In conclusion, this study presented an innovative solution to the chronic problem of data scarcity in the HUMS field. Through this effective fusion of physical knowledge and data-driven learning, we are confident that unmanned helicopters will be able to operate more safely and efficiently.

Author Contributions

Conceptualization, G.J.; methodology, G.J.; software, G.J.; validation, Y.K.; formal analysis and investigation, Y.K.; resources, Y.K.; data curation, G.J.; writing—original draft preparation, G.J.; writing—review and editing, Y.K.; visualization, G.J.; supervision, Y.K.; project administration, Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Ajou University (Funding No. S-2023-G0001-00011).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source domain dataset (Airbus Helicopter Vibration Data) is publicly available from the ETH Zürich Research Collection [43] at https://doi.org/10.3929/ethz-b-000415151 (accessed on 14 November 2025). The target domain dataset (Stanford RC Helicopter Data) is publicly available from Stanford University’s Autonomous Helicopter Project [44,45] at http://heli.stanford.edu/ (accessed on 14 November 2025). Pre-trained model weights and preprocessing scripts are available from the corresponding author (yk73@ajou.ac.kr) on a case-by-case basis subject to specific conditions.

Acknowledgments

We would like to thank Ajou University for providing a research fund.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Department of Defense. Selected Acquisition Report (SAR): MQ-8 Fire Scout Unmanned Aircraft System; Department of Defense: Washington, DC, USA, 2019; p. 30. Available online: https://www.esd.whs.mil/Portals/54/Documents/FOID/Reading%20Room/Selected_Acquisition_Reports/FY_2019_SARS/20-F-0568_DOC_60_MQ-8_Fire_Scout_SAR_Dec_2019_Full.pdf (accessed on 14 November 2025).
International Helicopter Safety Team (IHST). Health and Usage Monitoring Systems Toolkit; US Joint Helicopter Safety Implementation Team: Alexandria, VA, USA, 2013; pp. 2–3. Available online: https://vast.aero/archives/Toolkits/Toolkit_HUMS.pdf (accessed on 14 November 2025).
IEEE IAS/EPRI. Report of Large Motor Reliability Survey of Industrial and Commercial Installations, Part I and II. IEEE Trans. Ind. Appl. 1985, 21, 853–872.
Zhang, Y.; Li, S.; Zhang, A.; An, X. FW-UAV Fault Diagnosis Based on Knowledge Complementary Network under Small Sample. Mech. Syst. Signal Process. 2024, 215, 111418.
U.S. Army. Updated Small Unmanned Aircraft Systems Enter Prototype Testing. U.S. Army Article. 6 March 2023. Available online: https://www.army.mil/article/264521/updated_small_unmanned_aircraft_systems_enter_prototype_testing (accessed on 14 November 2025).
Dangut, M.D.; Jennions, I.K.; King, S.; Skaf, Z. A Rare Failure Detection Model for Aircraft Predictive Maintenance Using a Deep Hybrid Learning Approach. Neural Comput. Appl. 2023, 35, 2991–3009.
Hess, R.W.; Romanowski, W.J. Health and Usage Monitoring Systems for Helicopter Safety and Maintenance. In Proceedings of the 57th American Helicopter Society Annual Forum, Washington, DC, USA, 9–11 May 2001.
Dempsey, P.J.; Handschuh, R.F.; Afjeh, A.A. Spiral Bevel Gear Damage Detection Using Decision Fusion Analysis. In Proceedings of the 5th International Conference on Information Fusion, Annapolis, MD, USA, 8–11 July 2002; pp. 94–101.
Wang, D.; Tsui, K.L.; Miao, Q. Prognostics and Health Management: A Review of Vibration Based Bearing and Gear Health Indicators. IEEE Access 2018, 6, 665–676.
Randall, R.B.; Antoni, J. Rolling Element Bearing Diagnostics—A Tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520.
Lei, Y.; He, Z.; Zi, Y. Application of the EEMD Method to Rotor Fault Diagnosis of Rotating Machinery. Mech. Syst. Signal Process. 2009, 23, 1327–1338.
Feng, Z.; Liang, M.; Chu, F. Recent Advances in Time–Frequency Analysis Methods for Machinery Fault Diagnosis: A Review with Application Examples. Mech. Syst. Signal Process. 2013, 38, 165–205.
Serafini, J.; Bernardini, G.; Porcelli, R.; Masarati, P. In-Flight Health Monitoring of Helicopter Blades via Differential Analysis. Aerosp. Sci. Technol. 2019, 88, 436–443.
Večeř, P.; Kreidl, M.; Šmíd, R. Condition Indicators for Gearbox Condition Monitoring Systems. Acta Polytech. 2005, 45, 35–43.
Samuel, P.D.; Pines, D.J. A Review of Vibration-Based Techniques for Helicopter Transmission Diagnostics. J. Sound Vib. 2005, 282, 475–508.
Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of Machine Learning to Machine Fault Diagnosis: A Review and Roadmap. Mech. Syst. Signal Process. 2020, 138, 106587.
Jia, F.; Lei, Y.; Lin, J.; Zhou, X.; Lu, N. Deep Neural Networks: A Promising Tool for Fault Characteristic Mining and Intelligent Diagnosis of Rotating Machinery with Massive Data. Mech. Syst. Signal Process. 2016, 72–73, 303–315.
Janssens, O.; Slavkovikj, V.; Vervisch, B.; Stockman, K.; Loccufier, M.; Verstockt, S.; Van de Walle, R.; Van Hoecke, S. Convolutional Neural Network Based Fault Detection for Rotating Machinery. J. Sound Vib. 2016, 377, 331–345.
Wen, L.; Li, X.; Gao, L.; Zhang, Y. A New Convolutional Neural Network-Based Data-Driven Fault Diagnosis Method. IEEE Trans. Ind. Electron. 2018, 65, 5990–5998.
Chen, Z.; Gryllias, K.; Li, W. Mechanical Fault Diagnosis Using Convolutional Neural Networks and Extreme Learning Machine. Mech. Syst. Signal Process. 2019, 133, 106272.
Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep Learning and Its Applications to Machine Health Monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
Ding, Y.; Jia, M.; Miao, Q.; Cao, Y. A Novel Time–Frequency Transformer Based on Self-Attention Mechanism and Its Application in Fault Diagnosis of Rolling Bearings. Mech. Syst. Signal Process. 2022, 168, 108616.
Zhang, Z.; Song, W.; Li, Q. Dual-Aspect Self-Attention Based on Transformer for Remaining Useful Life Prediction. IEEE Trans. Instrum. Meas. 2022, 71, 2505711.
Zhang, W.; Peng, G.; Li, C.; Chen, Y.; Zhang, Z. A New Deep Learning Model for Fault Diagnosis with Good Anti-Noise and Domain Adaptation Ability on Raw Vibration Signals. Sensors 2017, 17, 425.
Shao, H.; Jiang, H.; Zhao, H.; Wang, F. A Novel Deep Autoencoder Feature Learning Method for Rotating Machinery Fault Diagnosis. Mech. Syst. Signal Process. 2017, 95, 187–204.
Ding, Y.; Jia, M.; Zhuang, J.; Cao, Y.; Zhao, X.; Lee, C.G. Self-Supervised Pretraining via Contrast Learning for Intelligent Incipient Fault Detection of Bearings. Reliab. Eng. Syst. Saf. 2022, 218, 108126.
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
Li, W.; Huang, R.; Li, J.; Liao, Y.; Chen, Z.; He, G.; Yan, R.; Gryllias, K. A Perspective Survey on Deep Transfer Learning for Fault Diagnosis in Industrial Scenarios: Theories, Applications and Challenges. Mech. Syst. Signal Process. 2022, 167, 108487.
Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Vaughan, J.W. A Theory of Learning from Different Domains. Mach. Learn. 2010, 79, 151–175.
Zhang, B.; Li, W.; Hao, J.; Li, X.-L.; Zhang, M. Adversarial Adaptive 1-D Convolutional Neural Networks for Bearing Fault Diagnosis under Varying Working Conditions. arXiv 2018, arXiv:1805.00778.
Li, X.; Zhang, W.; Ding, Q.; Sun, J.Q. Intelligent Rotating Machinery Fault Diagnosis Based on Deep Learning Using Data Augmentation. J. Intell. Manuf. 2020, 31, 433–452.
Zhu, J.; Chen, N.; Shen, C. A New Deep Transfer Learning Method for Bearing Fault Diagnosis under Different Working Conditions. IEEE Sens. J. 2020, 20, 8394–8402.
Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 97–105. [Google Scholar]
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar]
Han, T.; Liu, C.; Yang, W.; Jiang, D. A Novel Adversarial Learning Framework in Deep Convolutional Neural Network for Intelligent Diagnosis of Mechanical Faults. Knowl.-Based Syst. 2019, 165, 474–487. [Google Scholar] [CrossRef]
Siahpour, S.; Li, X.; Lee, J. Deep Learning-Based Cross-Sensor Domain Adaptation for Fault Diagnosis of Electro-Mechanical Actuators. Int. J. Dyn. Control 2020, 8, 1054–1062. [Google Scholar] [CrossRef]
Yang, B.; Lei, Y.; Jia, F.; Xing, S. An Intelligent Fault Diagnosis Approach Based on Transfer Learning from Laboratory Bearings to Locomotive Bearings. Mech. Syst. Signal Process. 2019, 122, 692–706. [Google Scholar] [CrossRef]
Cao, P.; Zhang, S.; Tang, J. Preprocessing-Free Gear Fault Diagnosis Using Small Datasets with Deep Convolutional Neural Network-Based Transfer Learning. IEEE Access 2018, 6, 26241–26253. [Google Scholar] [CrossRef]
Zhang, Y.; An, X.; Zhang, L.; Liao, Z. An Intelligent Fault Detection Framework for FW-UAV Based on Hybrid Deep Domain Adaptation Networks and the Hampel Filter. Int. J. Intell. Syst. 2023, 2023, 6608967. [Google Scholar] [CrossRef]
Sofronas, G.; Mechefske, C.; Peng, Q.; Giannetti, C. Physics-Informed Neural Networks for the Condition Monitoring of Rotating Shafts. Sensors 2024, 24, 207. [Google Scholar] [CrossRef]
Brito, L.C.; Susto, G.A.; Brito, J.N.; Duarte, M.A.V. Fault Diagnosis Using eXplainable AI: A Transfer Learning-Based Approach for Rotating Machinery Exploiting Augmented Synthetic Data. Expert Syst. Appl. 2023, 232, 120860. [Google Scholar] [CrossRef]
Guo, L.; Lei, Y.; Xing, S.; Yan, T.; Li, N. Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines with Unlabeled Data. IEEE Trans. Ind. Electron. 2019, 66, 7316–7325. [Google Scholar] [CrossRef]
Ducoffe, M.; Fink, O.; Michau, G.; Rodriguez Garcia, G. Airbus Helicopter Accelerometer Dataset; ETH Zürich Research Collection: Zürich, Switzerland, 2020. [Google Scholar] [CrossRef]
Abbeel, P.; Coates, A.; Ng, A.Y. Autonomous Helicopter Aerobatics through Apprenticeship Learning. Int. J. Robot. Res. 2010, 29, 1608–1639. [Google Scholar] [CrossRef]
Punjani, A.; Abbeel, P. Deep Learning Helicopter Dynamics Models. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 3223–3230. [Google Scholar] [CrossRef]
Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2009. [Google Scholar]
Welch, P.D. The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging over Short, Modified Periodograms. IEEE Trans. Audio Electroacoust. 1967, 15, 70–73. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Curran Associates, Inc.: Long Beach, CA, USA, 2017; pp. 5998–6008. [Google Scholar]
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar] [CrossRef]
Sun, B.; Saenko, K. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Amsterdam, The Netherlands, 8–16 October 2016; pp. 443–450. [Google Scholar] [CrossRef]
Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
Airbus. Anomaly Detection on Time Series Data Challenge Rules. 2019, p. 8. Available online: https://www.scribd.com/document/406705434/Anomaly-Detection-Time-Series-Helicopter-Predictive-Maintenance (accessed on 14 November 2025).

Figure 1. The overall structure of the framework.

Figure 2. The Effectiveness of CORAL. The arrow illustrates the significant reduction in domain discrepancy as the target features are aligned with the source distribution.

Figure 3. Normalized Anomaly Index (NAI) Component Weights.

Figure 4. Normalized Anomaly Index (NAI) for each flight maneuver. The color gradient from blue to red represents the increase in NAI values.

Figure 5. Correlation between CES Difficulty Rank and Normalized Anomaly Index (NAI). The red dashed line represents the linear regression, and the red shaded area indicates the 95% confidence interval.

Figure 6. Three-dimensional visualization of the relationship between Control Effort Score (CES) difficulty rank, Pattern 4 component activation, and Normalized Anomaly Index (NAI) for representative flight maneuvers. Four maneuvers from different difficulty groups (Easy: Dodging Demos 1, Moderate: Turn Demos 3, Hard: Flips Loops, Extreme: Freestyle Aggressive) illustrate the strong correlation (Spearman ρ = 0.903, p < 0.001) between CES-based difficulty and the anomaly index predicted by the model. The Pattern 4 component, which received the highest weight (29.06%) in the optimized detection framework, shows progressive activation with increasing maneuver complexity. The semi-transparent surface represents the learned manifold relating all three dimensions, while colored regions on the base plane indicate difficulty groupings.

Figure 7. Time–frequency analysis illustrating non-stationary vibration characteristics associated with Pattern 4. Comparison between Forward Sideways (CES Difficulty Rank 1) and Freestyle Aggressive (CES Difficulty Rank 10). Three frequency bands are highlighted where the largest spectral differences occur: Band A (80–90 Hz), Band B (95–110 Hz), and Band C (115–130 Hz). (a) Forward Sideways exhibits consistent spectral content across all bands over time, characteristic of stable flight dynamics. (b) Freestyle Aggressive shows substantial time-varying energy distribution with transient bursts particularly visible in the highlighted bands. The temporal energy variations are consistent with the type of transient/non-stationary characteristics that Pattern 4 is designed to capture through attention-based temporal pattern recognition, supporting our interpretation of its highest weight assignment (w₄ = 0.2906) in the NAI computation.

Figure 8. Power spectral density comparison between Forward Sideways (CES Difficulty Rank 1) and Freestyle Aggressive (CES Difficulty Rank 10). Three frequency bands are highlighted where substantial spectral differences occur: Band A (80–90 Hz, +6.3 dB), Band B (95–110 Hz, +4.9 dB), and Band C (115–130 Hz, +4.7 dB). The aggressive maneuver shows consistently higher energy across all three bands (≈3–4× power increase), consistent with the transient vibration bursts visible in the time–frequency analysis (Figure 7). These frequency ranges may be associated with dynamic aerodynamic loading and rapid attitude changes during aggressive maneuvering.
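
As a quick check on the band offsets quoted in the Figure 8 caption, a level difference in decibels converts to a linear power ratio via 10^(ΔdB/10). The following minimal sketch applies that standard conversion to the three values above; the function and dictionary names are illustrative and do not come from the paper.

```python
def db_to_power_ratio(delta_db: float) -> float:
    """Convert a level difference in decibels to a linear power ratio."""
    return 10.0 ** (delta_db / 10.0)


# Band offsets as quoted in the Figure 8 caption (dB, aggressive vs. gentle maneuver).
band_gain_db = {"A (80-90 Hz)": 6.3, "B (95-110 Hz)": 4.9, "C (115-130 Hz)": 4.7}
band_ratio = {band: db_to_power_ratio(db) for band, db in band_gain_db.items()}

for band, ratio in band_ratio.items():
    print(f"Band {band}: {ratio:.2f}x power")
```

The three ratios come out at roughly 4.3×, 3.1×, and 3.0×, in line with the approximate 3–4× range stated in the caption.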

Figure 9. Ablation Study: Performance Contribution of Key Components. The arrow illustrates the significant increase in correlation coefficients.

Table 1. Specifications of Airbus Helicopter Dataset.

Component	Specification	Description
Sampling rate	1024 Hz	Nyquist 512 Hz
Sensor	3-axis Accelerometer	X, Y, Z
Sequence length	61,440 samples	60 s
Train dataset	1677 Sequences	Normal data
Validation dataset	594 Sequences	Normal/Anomaly data

Table 2. Specifications of Stanford RC Helicopter Dataset.

Component	Specification	Description
Sampling rate	333 Hz	Nyquist 166.5 Hz
Sensor	3-axis Accelerometer	X, Y, Z
Sequence length	Variable (10–180 s)	Depends on maneuver
Data points	Variable (3330–59,940)	Depends on maneuver
Labels	None (unlabeled)	CES used as a proxy metric
Total maneuvers	20 types	From forward sideways flight to circles
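
The entries in Tables 1 and 2 are linked by simple arithmetic: the Nyquist frequency is half the sampling rate, and the number of samples is the sampling rate times the duration. The short sketch below reproduces the tabulated values; the helper names are hypothetical.

```python
def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest frequency representable without aliasing."""
    return sampling_rate_hz / 2.0


def n_samples(sampling_rate_hz: int, duration_s: int) -> int:
    """Number of samples in a recording of the given duration."""
    return sampling_rate_hz * duration_s


AIRBUS_RATE_HZ, STANFORD_RATE_HZ = 1024, 333

airbus_samples = n_samples(AIRBUS_RATE_HZ, 60)          # Table 1 sequence length
stanford_min = n_samples(STANFORD_RATE_HZ, 10)          # Table 2 lower bound
stanford_max = n_samples(STANFORD_RATE_HZ, 180)         # Table 2 upper bound
shared_limit_hz = min(nyquist_hz(AIRBUS_RATE_HZ), nyquist_hz(STANFORD_RATE_HZ))

print(airbus_samples, stanford_min, stanford_max, shared_limit_hz)
```

The lower of the two Nyquist limits (166.5 Hz) bounds the frequency content available in both domains, which is consistent with the highest band in Table 4 ending at 150 Hz.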

Table 3. Extracted Time Domain Features.

Feature Category	Feature Name	Equation
Statistical Moments	Mean	$\frac{1}{N} \sum_{i = 1}^{N} x_{i}$
	Standard Deviation	$\sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(x_{i} - \mu)}^{2}}$
	Skewness	$\frac{E [{(X - \mu)}^{3}]}{\sigma^{3}}$
	Excess Kurtosis	$\frac{E [{(X - \mu)}^{4}]}{\sigma^{4}} - 3$
Vibration Metrics	RMS (Root Mean Square)	$\sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}$
	Crest Factor	$\frac{\max (\lvert x_{i} \rvert)}{\mathrm{RMS}}$
	Shape Factor	$\frac{\mathrm{RMS}}{\frac{1}{N} \sum_{i = 1}^{N} \lvert x_{i} \rvert}$
Time-Domain Features	Zero-Crossing Rate	$\frac{1}{N - 1} \sum_{i = 1}^{N - 1} \mathbb{1} [x_{i} \cdot x_{i + 1} < 0]$
	Mean Absolute Difference	$\frac{1}{N - 1} \sum_{i = 1}^{N - 1} \lvert x_{i + 1} - x_{i} \rvert$
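
The nine time-domain features in Table 3 can be sketched for a single axis as follows. This is a minimal standard-library illustration of the tabulated equations, not the authors' implementation; the function name and the sine test signal are assumptions made for the example.

```python
import math


def time_domain_features(x):
    """Compute the nine Table 3 features for one axis of a vibration window."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(c * c for c in centered) / n)   # population form (1/N)
    rms = math.sqrt(sum(v * v for v in x) / n)
    mean_abs = sum(abs(v) for v in x) / n
    pairs = list(zip(x, x[1:]))                         # consecutive samples
    return {
        "mean": mean,
        "std": std,
        "skewness": (sum(c ** 3 for c in centered) / n) / std ** 3,
        "excess_kurtosis": (sum(c ** 4 for c in centered) / n) / std ** 4 - 3.0,
        "rms": rms,
        "crest_factor": max(abs(v) for v in x) / rms,
        "shape_factor": rms / mean_abs,
        "zero_crossing_rate": sum(1 for a, b in pairs if a * b < 0) / (n - 1),
        "mean_abs_difference": sum(abs(b - a) for a, b in pairs) / (n - 1),
    }


# Illustrative input: a 10 Hz unit sine sampled at 1024 Hz for one second.
signal = [math.sin(2.0 * math.pi * 10.0 * i / 1024.0) for i in range(1024)]
features = time_domain_features(signal)
print({name: round(value, 4) for name, value in features.items()})
```

For a pure sine, the RMS is about 0.707, the crest factor about 1.414, and the excess kurtosis about −1.5, which gives a simple sanity check before applying the function to real accelerometer windows.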

Table 4. Extracted Frequency Domain Features.

Feature Category	Feature Name	Equation
Spectral Shape	Spectral Centroid	$\frac{\sum f \cdot P (f)}{\sum P (f)}$
	Spectral Variance	$\frac{\sum {(f - f_{c})}^{2} \cdot P (f)}{\sum P (f)}$
	Spectral Entropy	$H = - \sum p (f) \log p (f)$
Spectral Bands	Band Power (Low)	$P_{low} = \int_{5}^{15} P (f) d f$
	Band Power (Mid)	$P_{mid} = \int_{15}^{50} P (f) d f$
	Band Power (High)	$P_{high} = \int_{50}^{150} P (f) d f$
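
Given a power spectral density estimate (for example from Welch's method), the six features in Table 4 reduce to weighted sums and band integrals. The sketch below assumes the PSD is already available as two equal-length lists, treats p(f) in the entropy as the PSD normalized to unit sum, and uses trapezoidal integration for band power; these choices and all names are assumptions for illustration.

```python
import math


def spectral_shape(freqs, psd):
    """Return (centroid, variance, entropy) of a PSD sampled at `freqs`."""
    total = sum(psd)
    centroid = sum(f * p for f, p in zip(freqs, psd)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, psd)) / total
    probs = [p / total for p in psd if p > 0]           # normalized PSD, zeros skipped
    entropy = -sum(q * math.log(q) for q in probs)
    return centroid, variance, entropy


def band_power(freqs, psd, lo_hz, hi_hz):
    """Trapezoidal integral of the PSD over segments lying inside [lo_hz, hi_hz]."""
    power = 0.0
    for k in range(len(freqs) - 1):
        if freqs[k] >= lo_hz and freqs[k + 1] <= hi_hz:
            power += 0.5 * (psd[k] + psd[k + 1]) * (freqs[k + 1] - freqs[k])
    return power


# Illustrative input: a flat PSD on a 1 Hz grid from 0 to 160 Hz.
freqs = [float(f) for f in range(161)]
psd = [1.0] * 161

centroid, variance, entropy = spectral_shape(freqs, psd)
p_low = band_power(freqs, psd, 5.0, 15.0)
p_mid = band_power(freqs, psd, 15.0, 50.0)
p_high = band_power(freqs, psd, 50.0, 150.0)
print(centroid, round(entropy, 4), p_low, p_mid, p_high)
```

With a flat spectrum the band powers equal the band widths (10, 35, and 100), which makes the integration limits in Table 4 easy to verify.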

Table 5. Complete Transformer Architecture Specification.

Component	Specification
Input dimension (d_in)	45 features (15 features × 3 axes)
Sequence length (T)	60 time steps
Model dimension (d_model)	128
Number of encoder layers (n_layers)	3
Number of attention heads (n_heads)	8
Head dimension (d_k = d_v)	16 (= d_model/n_heads)
Feed-forward dimension (d_ff)	512 (= 4 × d_model)
Activation function	GELU
Normalization	Pre-Layer Normalization
Attention dropout	0.1
Feed-forward dropout	0.1
Classifier dropout	0.3
Positional encoding	Sinusoidal (fixed)
Total trainable parameters	~1.2 million
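
The derived quantities in Table 5 and the fixed sinusoidal positional encoding can be sketched as below. The encoding follows the standard formulation of Vaswani et al.; whether the authors' implementation matches it in every detail is an assumption, and the constant and function names are illustrative.

```python
import math

D_MODEL, N_HEADS, SEQ_LEN = 128, 8, 60
D_HEAD = D_MODEL // N_HEADS      # per-head dimension d_k = d_v
D_FF = 4 * D_MODEL               # feed-forward width


def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed encoding: sine on even feature indices, cosine on odd ones."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000.0 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe


pe = sinusoidal_positional_encoding(SEQ_LEN, D_MODEL)
print(D_HEAD, D_FF, len(pe), len(pe[0]))
```

Because the encoding is fixed rather than learned, it adds no trainable parameters and is identical for source and target sequences of the same length.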

Table 6. Label Smoothing Sensitivity Analysis.

ε Value	F-β Score	Precision	Recall	ECE *
0.0 (none)	0.951	0.989	0.687	0.089
0.05	0.958	0.994	0.695	0.052
0.1 (selected)	0.963	0.997	0.703	0.031
0.15	0.959	0.995	0.698	0.028
0.2	0.948	0.991	0.681	0.025

* Expected Calibration Error: Lower is better.
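
Table 6 relates the label smoothing strength ε to calibration. The sketch below shows the usual smoothing rule, y(1 − ε) + ε/K, and a common equal-width-bin estimate of the Expected Calibration Error for a binary classifier. The number of bins (10), the 0.5 decision threshold, and the function names are assumptions; the paper does not state how its ECE was binned.

```python
def smooth_labels(labels, eps=0.1, num_classes=2):
    """Label smoothing: move each hard target toward the uniform distribution."""
    return [y * (1.0 - eps) + eps / num_classes for y in labels]


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary predictions given P(anomaly) and 0/1 ground-truth labels."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p              # confidence in predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(a for _, a in members) / len(members)
            ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Illustrative cases: a calibrated model and an overconfident one.
ece_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
ece_overconfident = expected_calibration_error([0.99] * 10, [1] * 5 + [0] * 5)
print(smooth_labels([0, 1]), ece_calibrated, ece_overconfident)
```

With ε = 0.1 and two classes, the hard targets 0 and 1 become 0.05 and 0.95, which discourages the saturated outputs that inflate ECE in the unsmoothed row of Table 6.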

Table 7. Pattern Component Definitions and Physical Interpretations.

Pattern	Computation (Projection onto)	Physical Interpretation
Pattern 1	1st eigenvector	Global trend: Captures overall signal magnitude and DC offset variations
Pattern 2	2nd eigenvector	Primary oscillation: Captures dominant periodic component (main rotor frequency)
Pattern 3	3rd eigenvector	Secondary oscillation: Captures blade passing frequency harmonics
Pattern 4	4th eigenvector	Transient/Non-stationary: Captures irregular patterns and anomalous deviations

Table 8. Comparative Analysis of Domain Adaptation Methods.

Method	MMD² Reduction	Spearman ρ (Target)	Training Time	Stability
No Adaptation	—	0.462	1.0×	High
CORAL (Proposed)	79.7%	0.903	1.1×	High
Deep CORAL	82.1%	0.889	1.8×	Medium
MMD	75.3%	0.867	1.4×	High
DANN	84.5%	0.845	2.3×	Low
Adversarial	86.2%	0.823	3.1×	Low

Table 9. Optimization of the λ Parameter.

Parameter λ	Variance Ratio
0.1	2.34
0.2	3.12
0.3	3.89
0.4	3.56
0.5	3.23

Table 10. Complete Training Configuration and Hyperparameters.

Category	Parameter	Optimal Value
Transformer Architecture	d_model	128
	Number of layers	3
	Number of heads	8
	d_ff	512
	Dropout (attention, FF)	0.1
	Dropout (classifier)	0.3
	Activation	GELU
	Normalization	Pre-LayerNorm
Optimization	Optimizer	AdamW
	Learning rate (η_max)	5 × 10⁻⁵
	Minimum LR (η_min)	1 × 10⁻⁶
	Weight decay	5 × 10⁻⁴
	β₁, β₂ (Adam)	0.9, 0.999
	Gradient clipping	max_norm = 1.0
Training Schedule	Batch size	32
	Total epochs (max)	100
	Actual epochs (stopped)	43
	Early stopping patience	10 epochs
	LR scheduler	Cosine Annealing w/Warm Restarts
	Warmup steps	1000
Multi-Task Loss	α (prediction)	1.0
	β (reconstruction)	0.5
	γ (anomaly)	2.0
	δ (CORAL)	0.3
Regularization	Label smoothing (ε)	0.1
	CORAL regularization (λ_reg)	1 × 10⁻⁶
Computational Resources	GPU	NVIDIA A100 (40 GB)
	Training time (Stage 1)	~12.0 h
	Training time (Stage 2)	~30 min
	Random seed	42
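
Two entries in Table 10 lend themselves to a short worked example: the cosine annealing schedule with warm restarts and the weighted multi-task loss. The sketch below uses the tabulated η_max, η_min, and loss weights, but the restart period of 10 epochs and the assumption that the four losses are combined as a plain weighted sum are illustrative choices not stated in the table. The 1000 warmup steps are omitted.

```python
import math

ETA_MAX, ETA_MIN = 5e-5, 1e-6
LOSS_WEIGHTS = {"prediction": 1.0, "reconstruction": 0.5, "anomaly": 2.0, "coral": 0.3}


def cosine_warm_restart_lr(epoch, t_0=10):
    """SGDR learning rate; `t_0` (restart period in epochs) is a hypothetical value."""
    t_cur = epoch % t_0
    cosine = 1.0 + math.cos(math.pi * t_cur / t_0)
    return ETA_MIN + 0.5 * (ETA_MAX - ETA_MIN) * cosine


def total_loss(losses):
    """Weighted sum of per-task losses using the alpha, beta, gamma, delta weights."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in losses.items())


schedule = [cosine_warm_restart_lr(epoch) for epoch in range(20)]
example = total_loss({"prediction": 1.0, "reconstruction": 1.0, "anomaly": 1.0, "coral": 1.0})
print(schedule[0], schedule[5], schedule[10], example)
```

The learning rate starts at η_max, decays toward η_min over one period, and jumps back to η_max at each restart. With unit losses the weighted sum is 3.8, which shows that the anomaly term dominates the objective.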

Table 11. Hyperparameter Sensitivity Analysis.

Parameter	Search Range	Optimal Value	Sensitivity
d_model	[64, 128, 256, 512]	128	Moderate
n_layers	[1, 2, 3, 4, 6]	3	High
n_heads	[2, 4, 8, 16]	8	Low
d_ff ratio	[2×, 4×, 8×]	4× (512)	Low

Table 12. Performance at ±1 Step from Optimal Configuration.

Configuration	F-β Score	Δ from Optimal
d_model = 64	0.921	−4.4%
d_model = 128 (optimal)	0.963	—
d_model = 256	0.958	−0.5%
n_layers = 2	0.912	−5.3%
n_layers = 3 (optimal)	0.963	—
n_layers = 4	0.959	−0.4%
n_heads = 4	0.955	−0.8%
n_heads = 8 (optimal)	0.963	—
n_heads = 16	0.961	−0.2%

Table 13. Loss Weight Optimization Results.

Trial	α	β	γ	F-β Score
Initial (heuristic)	1.0	0.5	2.0	0.958
Best (Trial 47/100)	1.0	0.5	2.0	0.963
Alternative 1	1.0	0.6	2.2	0.961
Alternative 2	0.8	0.4	2.0	0.959

Table 14. Hyperparameters for Stage 2: Target Domain Adaptation.

Parameter	Setting	Selection Justification
Base model	Loaded from stage 1	Transfer pre-trained knowledge
Model weights	Frozen	Zero-Shot setting (no re-training)
Adaptation Method	CORAL	Align second-order statistics
Adaptation Weight	δ (CORAL) = 0.3	Optimal for domain alignment
Target Data (Stanford)	Unlabeled	Validates zero-shot capability

Table 15. Multi-Task Transformer Performance on Source Domain.

Task	Metric	Single-Task	Multi-Task (Improvement)
Prediction	MAE	0.059 ± 0.005	0.034 ± 0.003 (−42.4%)
Prediction	RMSE	0.091 ± 0.008	0.056 ± 0.005 (−38.5%)
Reconstruction	MSE	0.055 ± 0.006	0.027 ± 0.003 (−50.9%)
Anomaly Detection	Precision	0.779 ± 0.024	0.997 ± 0.003 (+28.0%)
	Recall	0.858 ± 0.012	0.703 ± 0.015 (−18.1%)
	F-β (β = 0.3)	0.785 ± 0.022	0.963 ± 0.009 (+22.7%)
	95% CI * (F-β)	[0.763, 0.807]	[0.951, 0.974] (—)
	ROC-AUC	0.891 ± 0.015	0.967 ± 0.008 (+8.5%)

* Confidence intervals computed via bootstrap resampling (1000 iterations).
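
The F-β values in Table 15 follow from the tabulated precision and recall through the standard definition F_β = (1 + β²)·P·R / (β²·P + R). With β = 0.3, precision is weighted much more heavily than recall, which explains why the multi-task model scores higher despite its lower recall. The sketch below recomputes both columns from the mean precision and recall; the function name is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 0.3) -> float:
    """F-beta score; beta < 1 emphasizes precision over recall."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


single_task = f_beta(0.779, 0.858)   # Table 15, single-task column
multi_task = f_beta(0.997, 0.703)    # Table 15, multi-task column

print(f"single-task F-beta = {single_task:.3f}")
print(f"multi-task  F-beta = {multi_task:.3f}")
```

Recomputing from the mean precision and recall gives about 0.785 and 0.964. The second value differs from the reported 0.963 only in the third decimal, which is plausible if the reported score is averaged over folds rather than computed from the averaged precision and recall.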

Table 16. Quantitative Contribution of Multi-Task Heads.

Head	Gradient Importance	Ablation Impact (Δ F-β)	Role
Prediction	18.3%	−0.087 (−9.0%)	Temporal dynamics learning
Reconstruction	24.1%	−0.112 (−11.6%)	Feature representation enrichment
Anomaly Detection	57.6%	−0.321 (−33.3%)	Primary classification objective

Table 17. Domain Alignment Metrics Before and After CORAL Adaptation.

Metric	Before CORAL	After CORAL	Reduction
MMD²	0.823 [0.798, 0.851]	0.167 [0.154, 0.182]	79.7% [75.2%, 84.1%]
Frobenius Distance (Cov)	12.45 [11.89, 13.02]	3.21 [2.95, 3.48]	74.2% [70.1%, 78.3%]
Mean Alignment Error	0.412 [0.389, 0.436]	0.089 [0.078, 0.101]	78.4% [74.6%, 82.1%]
KL Divergence (per dim)	2.34 [2.18, 2.51]	0.67 [0.58, 0.77]	71.4% [66.8%, 75.9%]
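
The reduction column in Table 17 is the relative change (before − after) / before, and the covariance-based rows measure how far apart the second-order statistics of the two feature sets are. The sketch below recomputes the point-estimate reductions and illustrates the Frobenius distance between covariance matrices together with the Deep CORAL loss of Sun and Saenko, ‖C_s − C_t‖²_F / (4d²). The toy feature matrices and all function names are assumptions for illustration.

```python
import math


def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


def covariance(rows):
    """Sample covariance matrix (1/(n-1)) of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two square matrices."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(len(a)) for j in range(len(a))))


def coral_loss(source, target):
    """Deep CORAL loss: squared Frobenius distance scaled by 1 / (4 d^2)."""
    d = len(source[0])
    dist = frobenius_distance(covariance(source), covariance(target))
    return dist ** 2 / (4.0 * d * d)


# Point estimates from Table 17 (before, after).
table_17 = {"MMD2": (0.823, 0.167), "Frobenius": (12.45, 3.21),
            "Mean error": (0.412, 0.089), "KL": (2.34, 0.67)}
reductions = {k: round(relative_reduction(*v), 1) for k, v in table_17.items()}

# Toy 2-D features: the target is the source scaled by a factor of two.
source = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
toy_distance = frobenius_distance(covariance(source), covariance(target))
print(reductions, toy_distance, coral_loss(source, target))
```

The recomputed reductions match the tabulated 79.7%, 74.2%, 78.4%, and 71.4%, and the toy example shows how a pure scale mismatch between domains already produces a nonzero covariance distance.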

Table 18. Components and Optimal Weights for NAI.

Component Group	Component (s_i)	Optimal Weight (w_i)	Source
Group A: Model-direct Analysis (CORAL-independent)	Prediction	0.0080	Transformer Head
	Reconstruction	0.0279
	Anomaly Head	0.0564
	Temporal	0.2137
	Spectral	0.0316
	Pattern 1	0.0168
	Pattern 2	0.0183
	Pattern 3	0.0116
	Pattern 4	0.2906
Group B: Embedding-based Analysis (CORAL-dependent)	Density (GMM)	0.0132	CORAL-Aligned Embedding
	K-means	0.0320
	IsoForest	0.0460
	LOF	0.1010
	Elliptic	0.1330
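
The weights in Table 18 can be combined into an anomaly index as sketched below. The sketch assumes the NAI is a weighted sum of the 14 component scores followed by min–max normalization across maneuvers, which is suggested by the 0.0000 and 1.0000 endpoints in Table 21 but is not spelled out in this table. The component scores are hypothetical placeholders and the key names are illustrative.

```python
NAI_WEIGHTS = {
    # Group A: model-direct analysis
    "prediction": 0.0080, "reconstruction": 0.0279, "anomaly_head": 0.0564,
    "temporal": 0.2137, "spectral": 0.0316,
    "pattern_1": 0.0168, "pattern_2": 0.0183, "pattern_3": 0.0116, "pattern_4": 0.2906,
    # Group B: embedding-based analysis
    "density_gmm": 0.0132, "kmeans": 0.0320, "isoforest": 0.0460,
    "lof": 0.1010, "elliptic": 0.1330,
}


def weighted_score(components):
    """Weighted sum of the 14 component scores for one maneuver."""
    return sum(NAI_WEIGHTS[name] * components[name] for name in NAI_WEIGHTS)


def min_max_normalize(values):
    """Rescale raw scores to [0, 1] across maneuvers (assumed normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical component scores for three maneuvers (not data from the paper).
maneuvers = [{name: level for name in NAI_WEIGHTS} for level in (0.1, 0.5, 0.9)]
nai = min_max_normalize([weighted_score(m) for m in maneuvers])

weight_sum = sum(NAI_WEIGHTS.values())
top_two_share = NAI_WEIGHTS["pattern_4"] + NAI_WEIGHTS["temporal"]
print(round(weight_sum, 4), round(top_two_share, 4), nai)
```

The tabulated weights sum to approximately 1 (1.0001 after rounding), and Pattern 4 together with the Temporal component accounts for about half of the total weight.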

Table 19. Differential Evolution Algorithm Configuration.

Parameter	Value	Justification
Population size	50	~3.5× number of parameters (14)
Mutation factor (F)	0.8	Standard range [0.5, 1.0]
Crossover rate (CR)	0.9	High CR for continuous optimization
Strategy	best/1/bin	Exploits best solution
Maximum generations	200	Convergence observed by ~150
Convergence tolerance	1 × 10⁻⁶	Relative improvement threshold
Weight bounds	[0, 1] for all w_i	Natural probability-like bounds
Sum constraint	None (unconstrained)	Spearman ρ is rank-based
Random seed	42	Reproducibility
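
The configuration in Table 19 corresponds to the classic best/1/bin differential evolution scheme, sketched below in plain Python. In the paper the objective is the Spearman ρ between CES rank and NAI; here a toy quadratic objective with a hypothetical target vector stands in for it so that the example is self-contained. The function name and the toy objective are assumptions.

```python
import random


def differential_evolution_best1bin(objective, dim, pop_size=50, f=0.8, cr=0.9,
                                    generations=200, seed=42):
    """Maximize `objective` over [0, 1]^dim with the best/1/bin strategy."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        best = pop[max(range(pop_size), key=fitness.__getitem__)]
        for i in range(pop_size):
            r1, r2 = rng.sample([k for k in range(pop_size) if k != i], 2)
            j_rand = rng.randrange(dim)                 # guarantees one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = best[j] + f * (pop[r1][j] - pop[r2][j])
                    trial.append(min(1.0, max(0.0, value)))   # clip to weight bounds
                else:
                    trial.append(pop[i][j])
            score = objective(trial)
            if score >= fitness[i]:                     # greedy one-to-one selection
                pop[i], fitness[i] = trial, score
    i_best = max(range(pop_size), key=fitness.__getitem__)
    return pop[i_best], fitness[i_best]


# Toy objective (assumption): negative squared distance to a hypothetical target.
TARGET = [(j + 1) / 15.0 for j in range(14)]


def toy_objective(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best_weights, best_score = differential_evolution_best1bin(toy_objective, dim=14)
print(round(best_score, 6))
```

Replacing `toy_objective` with a function that builds the NAI from candidate weights and returns its Spearman ρ against the CES ranks would give the optimization described in Table 19. Because ρ is rank-based, no sum-to-one constraint on the weights is needed during the search.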

Table 20. Optimization Trajectory.

Generation	Best Spearman ρ	Mean Spearman ρ
1	0.654	0.412
50	0.856	0.798
100	0.897	0.876
150	0.903	0.899
200	0.903	0.901

Table 21. CES Difficulty Rank and Normalized Anomaly Index (NAI) for 20 Flight Maneuvers.

Maneuver Name	CES Difficulty Rank	Normalized Anomaly Index	Description
Forward sideways flight	1	0.0000	The pilot flies forward, backward, left and right at low speeds.
Inverted vertical sweeps	1	0.0655	The pilot exercises the collective stick during vertical climbs and descents, all while the helicopter is inverted.
Vertical sweeps	2	0.1523	The pilot exercises the collective pitch control during repeated climbs and descents.
Stop and go	2	0.2123	The pilot repeatedly accelerates and decelerates the helicopter, traveling from one position to another.
Dodging demos3	3	0.2612	The pilot now aggressively rolls the helicopter onto its side and launches sideways away from the flight path.
Dodging demos1	3	0.4262	The pilot demonstrates a quick “jinking” maneuver that might be used to dodge an obstacle.
Turn demos1	4	0.2170	The pilot demonstrates banked turns to various headings.
Dodging demos4	4	0.2328	The pilot aggressively rolls the helicopter onto its side and launches sideways away from the flight path.
Free fall	5	0.2210	The pilot climbs up to a safe altitude, then allows the helicopter to free fall, usually with the rotor disk perpendicular to the ground.
Dodging demos2	5	0.5099	The pilot demonstrates a quick “jinking” maneuver that might be used to dodge an obstacle.
Turn demos2	6	0.5319	The pilot demonstrates banked turns to various headings.
Turn demos3	6	0.6472	The pilot demonstrates banked turns to various headings.
Circles	7	0.2644	The pilot flies in circles and performs fast circular “funnels”.
Orientation sweeps	7	0.2778	The pilot exercises the cyclic and rudder controls, each in turn, while keeping the helicopter’s horizontal/forward speed very low.
Orientation sweeps with motion	8	0.5343	The pilot exercises the cyclic and rudder controls.
Flips loops	8	0.6227	The pilot demonstrates in-place flips. The flips are slowly expanded into loops.
Freestyle gentle	9	0.7411	The pilot demonstrates a free-style aerobatic routine, but limited to relatively gentle maneuvers.
Tictocs	9	0.7743	The pilot performs tic-toc and a number of variations including rainbows and slappers.
Chaos	10	0.7229	The pilot demonstrates this maneuver at increasing flip and pirouette rates.
Freestyle aggressive	10	1.0000	The pilot demonstrates a full-throttle free-style aerobatic routine.
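
The headline correlation can be recomputed directly from the two numeric columns of Table 21. The sketch below implements Spearman's ρ as the Pearson correlation of average ranks, which handles the tied CES ranks, using only the standard library. The function names are illustrative.

```python
import math

# (CES difficulty rank, NAI) pairs in the row order of Table 21.
TABLE_21 = [
    (1, 0.0000), (1, 0.0655), (2, 0.1523), (2, 0.2123), (3, 0.2612),
    (3, 0.4262), (4, 0.2170), (4, 0.2328), (5, 0.2210), (5, 0.5099),
    (6, 0.5319), (6, 0.6472), (7, 0.2644), (7, 0.2778), (8, 0.5343),
    (8, 0.6227), (9, 0.7411), (9, 0.7743), (10, 0.7229), (10, 1.0000),
]


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation with tie handling via average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


ces = [row[0] for row in TABLE_21]
nai = [row[1] for row in TABLE_21]
rho = spearman_rho(ces, nai)
print(f"Spearman rho = {rho:.3f}")
```

The result rounds to 0.903, matching the reported correlation between the CES difficulty rank and the NAI.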

Table 22. Statistical Significance Tests for Target Domain Performance.

Test	Statistic	p-Value
Spearman correlation (CES, NAI)	ρ = 0.903	p < 0.001
Kendall τ (CES, NAI)	τ = 0.768	p < 0.001
Bootstrap 95% CI (Spearman ρ)	[0.856, 0.943]	—
Permutation test (1000 permutations)	—	p < 0.001
Cliff’s Delta (Hard vs. Easy groups)	δ = 0.920	Large effect
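
Cliff's delta, reported in the last row of Table 22, is a nonparametric effect size: the fraction of cross-group pairs in which the first group exceeds the second, minus the fraction in which it is lower. The sketch below implements that definition; the two sample groups are arbitrary illustrative numbers, since the membership of the "Hard" and "Easy" groups is not listed in this table.

```python
def cliffs_delta(group_a, group_b):
    """Cliff's delta in [-1, 1]; positive when group_a tends to exceed group_b."""
    greater = sum(1 for a in group_a for b in group_b if a > b)
    smaller = sum(1 for a in group_a for b in group_b if a < b)
    return (greater - smaller) / (len(group_a) * len(group_b))


# Arbitrary illustrative samples (not the paper's groups).
delta_example = cliffs_delta([3, 4, 5], [1, 2, 3])
delta_separated = cliffs_delta([0.7, 0.8, 1.0], [0.0, 0.1, 0.2])
print(round(delta_example, 3), delta_separated)
```

A value of 1 means every observation in the first group exceeds every observation in the second. Magnitudes above roughly 0.474 are conventionally read as a large effect, so the reported δ = 0.920 indicates nearly complete separation between the two difficulty groups.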

Table 23. Ablation Study: Isolating CORAL and Multi-Task Effects.

Configuration	F-β (Source)	Spearman ρ (Target)	Improvement over Baseline
Single-Task, No CORAL (Baseline)	0.785	0.312	—
Single-Task + CORAL	0.785	0.687	+120.2% (target)
Multi-Task, No CORAL	0.963	0.462	+48.1% (target)
Multi-Task + CORAL (Proposed)	0.963	0.903	+189.4% (target)

Table 24. Comprehensive Component Ablation Results.

Configuration	F-β (Source)	Spearman ρ (Target)	Notes *
Full Model (Proposed)	0.963	0.903	All components
− Prediction Head	0.876 (−9.0%)	0.845 (−6.4%)	Reduced temporal modeling
− Reconstruction Head	0.851 (−11.6%)	0.812 (−10.1%)	Weaker feature learning
− Anomaly Head	N/A	0.523 (−42.1%)	No direct classification
− CORAL Adaptation	0.963 (0%)	0.462 (−48.8%)	No domain alignment
− Multi-Task (Single Anomaly)	0.785 (−18.5%)	0.687 (−23.9%)	Baseline single-task + CORAL
− Pattern Components	0.963 (0%)	0.654 (−27.6%)	Only direct heads in NAI

* All ablation differences are statistically significant (p < 0.01, paired t-test with 5-fold CV).

Table 25. NAI Weight Optimization Validation.

Weight Configuration	Spearman ρ	95% CI	p-Value
Equal Weights (w_i = 1/14)	0.756	[0.685, 0.823]	<0.001
Heuristic Weights	0.834	[0.772, 0.889]	<0.001
Optimized Weights (DE)	0.903	[0.856, 0.943]	<0.001

Table 26. Summary of Framework Capabilities and Limitations.

Aspect	Current Capability	Limitation/Future Work
Classification	Binary (normal/abnormal)	Cannot identify specific failure types
Validation scope	Single source-target pair (Airbus to Stanford)	Untested on other platform combinations
Platform configurations	Conventional single main rotor helicopters	Untested on tandem, coaxial or electric configurations (e.g., eVTOL)
Adaptation mode	Offline (batch CORAL)	Cannot adapt to operational distribution drift
Target domain evaluation	Proxy metric (CES)	No ground truth fault labels available
Data requirements	Labeled source + unlabeled target	Requires sufficient source domain labels
Computational requirements	~12 hours training (A100 GPU)	May require optimization for edge deployment
Interpretability	Pattern components provide partial interpretability	Deep feature interactions remain opaque

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
