4.5.1. Overview of the Flight-State-Driven Threshold Optimization Method
The proposed workflow is summarized below and illustrated as a flowchart in
Figure 8. Building on the statistics-based step, the procedure first performs proactive spike filtering and then refines thresholds using flight-state context. In the first step, a proactive statistical filtering and baseline thresholding process is applied. A Z-score rule is used to remove upper-tail spikes exceeding
, and, based on the cleaned distribution, a baseline CI threshold is established at
. In the second step, candidate fault operations are identified. Flight operations with CI values exceeding the
baseline are marked as candidate fault operations, and evaluation windows are defined around the corresponding alarms or exceedances. In the third step, a Principal Component Analysis (PCA) space is constructed using flight parameter vectors from the remaining operations, excluding these candidate fault operations. The number of retained components is the minimum required to reach 95% cumulative explained variance. In the fourth step, reconstruction error screening is applied to the evaluation windows. The flight parameter sets from candidate fault operations are projected into the PCA space, and the reconstruction error is computed as described in
Section 4.5.2.
If the reconstruction error is large, the corresponding maneuver deviates meaningfully from nominal flight behavior. In this case, the associated CI variation is interpreted as being driven by the flight state, and these samples are retained in the threshold-estimation dataset. Conversely, if the reconstruction error is small, the flight profile is nominal while only the CI exhibits a transient spike. These samples are treated as spike-like artifacts and excluded from threshold estimation to mitigate the influence of false alarms.
Finally, the CI threshold is re-estimated using the filtered dataset, and the resulting performance is evaluated.
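The first two steps of the workflow can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the cutoff multiplier `k = 3`, the mean-plus-`k`-standard-deviations form of the baseline threshold, and the function names `zscore_filter` and `baseline_threshold` are assumptions introduced here, and the CI values are synthetic.

```python
import numpy as np

def zscore_filter(ci, k=3.0):
    """Drop upper-tail samples whose Z-score exceeds k (k is an assumed value)."""
    ci = np.asarray(ci, dtype=float)
    mu, sigma = ci.mean(), ci.std()
    if sigma == 0.0:
        return ci  # constant series: nothing to filter
    z = (ci - mu) / sigma
    return ci[z <= k]

def baseline_threshold(ci_clean, k=3.0):
    """Baseline CI threshold, assumed here to be mean + k * std of the cleaned data."""
    ci_clean = np.asarray(ci_clean, dtype=float)
    return ci_clean.mean() + k * ci_clean.std()

# Synthetic CI series: 500 nominal samples plus two injected spikes.
rng = np.random.default_rng(0)
ci = np.concatenate([rng.normal(1.0, 0.1, 500), [5.0, 6.0]])

clean = zscore_filter(ci)                    # step 1: proactive spike filtering
threshold = baseline_threshold(clean)        # step 1: baseline CI threshold
candidates = np.flatnonzero(ci > threshold)  # step 2: candidate fault operations
```

Filtering before estimating the threshold keeps the two spikes from inflating the standard deviation, so the baseline reflects the nominal CI spread.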
4.5.2. Principal Component Analysis
Principal Component Analysis (PCA) is a standard technique for dimensionality reduction that projects high-dimensional data onto a lower-dimensional subspace spanned by the principal components [
30,
31]. PCA iteratively finds orthogonal axes that (equivalently) minimize reconstruction error or maximize preserved variance. The first component explains the largest variance in the data, the second component is orthogonal to the first component and explains the next largest variance, and so on. Through projection and reconstruction, the reconstruction error can be computed and subsequently used for outlier screening and anomaly detection [
32].
Figure 9 illustrates the core concept of PCA. The left panel presents a scatter plot of two variables
where the arrows indicate the first component (PC1), representing the direction that preserves the largest overall variance, and the second component (PC2), which is orthogonal to PC1 and accounts for the second-largest variance. The right panel shows the same samples projected onto three axes: PC1 (blue), an arbitrary axis (yellow) rotated about PC1, and PC2 (green). This visualization demonstrates that variance is maximized along PC1 and minimized along PC2, which is consistent with PCA’s objective of finding “maximum-variance” directions [
23,
33]. In the following, the term “principal component” (PC) refers to the axis or direction, whereas the corresponding scalar projection of each sample onto a PC is referred to as the principal component score (or simply “PCA score”).
In the proposed workflow, flight parameter vectors from candidate fault operations are projected into the PCA space derived from non-candidate operations, and then reconstructed using only the retained components. In this study, the “flight state” of each operation is represented by a small set of PCA-based indices computed from the selected flight parameters. For every operation where a CI is evaluated, the corresponding values of the selected flight parameters (
Section 3.1) are assembled into a single feature vector. PCA is trained on these vectors from nominal operations and produces a few principal components (PCs) that capture the dominant patterns of variation. Each operation is then described by its principal component scores, i.e., the numerical values obtained by projecting the feature vector onto the retained PCs. This short vector of scores is treated as a compact, continuous representation of the flight state, without introducing any discrete state labels or clustering.
The resulting reconstruction error is interpreted qualitatively: large errors indicate deviations driven by flight-state changes, which are included for threshold estimation, whereas small errors indicate spike-like CI excursions under nominal flight profiles and are excluded. Details of the computation are described in the next section; here, the focus is on how reconstruction error guides the decisions of inclusion and exclusion within the flight-state-driven threshold optimization process.
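A minimal NumPy sketch of the PCA representation described above is given below. It assumes mean-centered data and an SVD-based fit; the function names (`fit_pca`, `scores`, `reconstruct`) and the synthetic six-parameter data are hypothetical and serve only to illustrate how the retained components, the principal component scores, and the reconstruction relate to one another.

```python
import numpy as np

def fit_pca(X, var_target=0.95):
    """Fit PCA by SVD and keep the fewest components reaching var_target."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)   # cumulative explained variance
    k = min(int(np.searchsorted(ratio, var_target)) + 1, len(s))
    return mean, vt[:k]                      # rows of vt[:k] are the retained PCs

def scores(X, mean, pcs):
    """Principal component scores: projections onto the retained PCs."""
    return (np.asarray(X, dtype=float) - mean) @ pcs.T

def reconstruct(X, mean, pcs):
    """Map the scores back to the original flight parameter space."""
    return scores(X, mean, pcs) @ pcs + mean

# Synthetic "flight parameter" vectors: 6 parameters driven by 2 latent factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
X += 0.01 * rng.normal(size=(200, 6))

mean, pcs = fit_pca(X)
Z = scores(X, mean, pcs)           # compact flight-state representation
X_hat = reconstruct(X, mean, pcs)  # reconstruction from retained PCs only
```

Because the retained components explain at least 95% of the variance, the total squared residual of `X_hat` on the training data is at most 5% of the total variance.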
4.5.3. PCA Reconstruction-Error Computation
In the proposed workflow, the PCA model is used neither to construct additional health indicators nor to assign discrete flight-state classes. Instead, it serves as a compact representation of nominal flight behavior in the selected flight parameter space. Flight parameter vectors from normal operations are expected to lie close to the PCA subspace and to be well reconstructed from the retained components, whereas unusual operating conditions manifest as larger reconstruction errors. The PCA reconstruction error is therefore used as a scalar measure of how far each operation departs from the normal flight-state manifold. This quantity is then used to distinguish flight-state-driven CI variations from spike-like artifacts that occur under otherwise nominal flight profiles.
The procedure for computing PCA-based reconstruction errors on flight parameters is summarized schematically in
Figure 10 and proceeds as follows:
- (i) Baseline matrix construction: A baseline matrix is built from nominal operations. One hundred flight operations regarded as nominal are randomly sampled, and for each flight operation where a CI is recorded, a flight parameter vector is extracted that consists of the average value of each flight parameter within the ±10 s window corresponding to that CI. These flight parameter vectors are then assembled to form matrix A.
- (ii) PCA fitting and component selection: PCA is applied to A, and the minimum number of principal components required to explain at least 95% of the cumulative variance is retained. This basis defines the nominal PCA space.
- (iii) Nominal reconstruction and error quantification: Using the retained components, the nominal data are projected and reconstructed to obtain , and reconstruction errors are computed for . The resulting nominal error distribution serves as the reference for subsequent evaluation.
- (iv) Evaluation of candidate fault operations: For each candidate fault operation (
Section 4.5.1), the corresponding flight parameter vector is extracted to form vector
. The flight parameter vector
is then projected into the nominal PCA space and reconstructed to obtain
, yielding the reconstruction error for each candidate operation.
- (v) One-sided decision rule: From the nominal error distribution in step (iii), a one-sided cutoff is defined at the 97.5th percentile (i.e., the upper 2.5%), as illustrated by the red dashed line in
Figure 10 [
34,
35]. If a candidate window’s reconstruction error exceeds this cutoff, the corresponding flight profile is interpreted as meaningfully deviating from nominal (e.g., due to aggressive maneuvering or an operating-condition shift). The associated CI peak is then treated as flight-state-driven, and the window is included in the dataset for threshold estimation. Conversely, if the reconstruction error does not exceed the cutoff, the flight profile is considered nominal while only the CI exhibits a transient spike. Such windows are treated as spike-like artifacts (e.g., brief sensor disturbances) and excluded from threshold estimation.
Figure 10.
Conceptual diagram of PCA-based reconstruction error of abnormal operation evaluation.
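Steps (ii) to (v) can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the data are only mean-centered (no per-parameter scaling), the reconstruction error is taken as the mean squared residual per operation, and all function names are hypothetical. The nominal matrix is synthetic and much smaller than the 45-parameter vectors used in this study.

```python
import numpy as np

def fit_nominal_pca(A, var_target=0.95):
    """Step (ii): fit PCA on nominal data, keep components up to var_target."""
    A = np.asarray(A, dtype=float)
    mean = A.mean(axis=0)
    _, s, vt = np.linalg.svd(A - mean, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, var_target)) + 1, len(s))
    return mean, vt[:k]

def reconstruction_error(X, mean, pcs):
    """Mean squared residual per row after projection onto the retained PCs
    (assumed error definition)."""
    centered = np.atleast_2d(np.asarray(X, dtype=float)) - mean
    residual = centered - centered @ pcs.T @ pcs
    return np.mean(residual**2, axis=1)

def screen_candidates(A, candidates, pct=97.5):
    """Steps (iii)-(v): True marks a flight-state-driven window (retained)."""
    mean, pcs = fit_nominal_pca(A)
    cutoff = np.percentile(reconstruction_error(A, mean, pcs), pct)
    errors = reconstruction_error(candidates, mean, pcs)
    return errors > cutoff, errors, cutoff

# Synthetic nominal matrix: 100 operations, 8 parameters, 3 latent factors.
rng = np.random.default_rng(2)
A = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 8))
A += 0.05 * rng.normal(size=(100, 8))

nominal_like = A.mean(axis=0)        # sits exactly at the nominal centre
off_nominal = A.mean(axis=0) + 10.0  # every parameter shifted far from nominal
keep, errors, cutoff = screen_candidates(A, np.vstack([nominal_like, off_nominal]))
```

In this toy case the first candidate falls below the cutoff and would be treated as a spike-like artifact, whereas the second exceeds it and would be retained as flight-state-driven.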
Figure 11 illustrates two examples across all 45 flight parameters: In
Figure 11a, several variables exceed the one-sided cutoff, indicating broad deviations driven by changes in operating conditions. In contrast,
Figure 11b remains below the cutoff, suggesting a localized anomaly that is likely unrelated to a true change in operating state.
In this study, the PCA reconstruction error is used not only to determine inclusion or exclusion during threshold re-estimation but also as a signal to adjust the threshold itself to the operational context. For operations with reconstruction errors exceeding the 97.5th percentile, a positive weight proportional to the degree of exceedance is applied to raise the threshold for those windows. This adjustment reflects the fact that, under flight states deviating from nominal, CI values may legitimately increase within the normal range. By adapting the threshold upward in such cases, unnecessary alarms that would otherwise occur under fixed legacy thresholds can be reduced.
Conversely, when the reconstruction error is at or below the 97.5th percentile, the corresponding CI peaks are interpreted as spike-like events without operational context, and no weighting or downward adjustment is applied. This design choice reflects the fact that lowering the threshold solely on the basis of a small reconstruction error could lead to incidental noise being interpreted as alarms, while the proposed procedure already maintains robust detection performance for genuine anomalies. Therefore, the potential risks of unnecessary alarms outweigh the marginal benefits of downward threshold adjustment.
Threshold adjustment proceeds as follows. For each operation, let
denote the flight parameter vector and
its reconstruction from the retained principal components. The PCA reconstruction error is defined as
where
is the dimension of the flight parameter vector. Let
denote the 97.5th percentile of the nominal reconstruction error distribution. The exceedance score is computed as
where
is the PCA reconstruction error for the given operation,
is the reference cutoff for the reconstruction error, and
is the normalized exceedance beyond the cutoff, indicating how much
exceeds
in relative terms. The exceedance score is then scaled into a weight
according to
where
is the weight applied to the corresponding operation (
indicates no adjustment, while larger values of
produce a stronger upward shift). The coefficient
α controls the strength of the upward adjustment. In this study, α = 0.5
was used based on the sensitivity analysis presented later in
Section 4.5.4 (
Table 8), which indicated that this value offers an optimal balance between sensitivity and specificity. Finally, the operation-specific threshold is computed as
By calculating the reconstruction error, the deviation of the current flight state from the nominal manifold can be quantified using Equation (4). This deviation is then mapped to a weight by a scaling function, which dynamically adjusts the detection threshold using Equation (5). This mechanism allows the detection boundary to adapt to the energy of the flight maneuver, thereby stabilizing the false alarm rate even under highly dynamic flight conditions.
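The chain from reconstruction error to operation-specific threshold can be illustrated with the sketch below. The functional forms are assumptions chosen only to match the verbal description (no adjustment at or below the cutoff, a relative exceedance score, and a smooth saturating weight with an upper gain of 0.5): the `tanh` mapping, the multiplicative application to the base threshold, and the function names are hypothetical and may differ from the equations used in this study.

```python
import numpy as np

def exceedance_score(err, cutoff):
    """Relative exceedance of the reconstruction error beyond the cutoff
    (zero when the error is at or below the cutoff)."""
    err = np.asarray(err, dtype=float)
    return np.maximum(0.0, (err - cutoff) / cutoff)

def weight(score, alpha=0.5):
    """Assumed saturating map: 1 at zero exceedance, approaching 1 + alpha."""
    return 1.0 + alpha * np.tanh(score)

def adjusted_threshold(base_threshold, err, cutoff, alpha=0.5):
    """Operation-specific threshold, assumed here to scale the base threshold."""
    return base_threshold * weight(exceedance_score(err, cutoff), alpha)

# Three operations: below the cutoff, at the cutoff, and well above it.
thr = adjusted_threshold(2.0, err=[0.5, 1.0, 3.0], cutoff=1.0)
```

Under these assumptions the first two thresholds stay at the base value of 2.0, while the third is raised but remains below 2.0 × (1 + α) = 3.0 because the weight saturates.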
This scheme classifies CI elevations driven by operational context as normal, while suppressing alarms caused by context-free spikes without compromising sensitivity. As described in the previous sections, the include/exclude filtering is repeated across all windows, and thresholds are recalculated using based on the filtered CI distribution within the overall flight-state-driven threshold optimization framework.
In the implementation, all scalar constants in the threshold optimization workflow were fixed a priori and applied uniformly across all CI component pairs. In the spike filtering step, CI samples exceeding (computed from all operations of a given CI) were treated as outliers and removed. For the density-based method, only samples at or below the 97.5th percentile of the CI amplitude distribution were retained, and the density-based CI threshold was set to of this trimmed distribution; the same was used for the manual-based and flight-state-driven thresholds after their respective filtering steps. For the flight-state-driven method, the PCA reconstruction error cutoff was likewise fixed at the 97.5th percentile of the nominal reconstruction error distribution. Operation-wise exceedances of this cutoff were normalized and converted into weights through a smooth saturating function with an upper gain of 0.5, and the resulting weights were smoothed across neighboring operations using a five-point Gaussian kernel.
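The final smoothing step, a five-point Gaussian kernel applied across neighboring operations, can be sketched as follows. The kernel width `sigma = 1.0` and the edge-padding strategy are assumptions, since the text specifies only the number of kernel points; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(n_points=5, sigma=1.0):
    """Normalized Gaussian kernel with n_points taps (sigma is an assumed value)."""
    offsets = np.arange(n_points) - (n_points - 1) / 2
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def smooth_weights(weights, n_points=5, sigma=1.0):
    """Smooth operation-wise weights; edges are padded by repeating end values."""
    w = np.asarray(weights, dtype=float)
    padded = np.pad(w, n_points // 2, mode="edge")
    return np.convolve(padded, gaussian_kernel(n_points, sigma), mode="valid")

# One operation with a raised weight among otherwise unadjusted neighbors.
raw = np.array([1.0, 1.0, 1.0, 1.5, 1.0, 1.0, 1.0])
smoothed = smooth_weights(raw)
```

Smoothing spreads an isolated weight increase over adjacent operations, so the adjusted threshold changes gradually rather than jumping for a single operation.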
4.5.4. Results of the Flight-State-Driven Threshold Optimization
Applying the flight-state-driven outlier removal to HUMS data from in-service rotorcraft enabled finer-grained filtering that reflects flight parameters and operating states, unlike purely statistical or density-based methods. When CI thresholds are estimated using samples that exceed the threshold for reasons unrelated to flight state, CI variance can become overstated and thresholds inflated, which may degrade detection specificity. The proposed procedure retains CI variations explainable by flight state and removes unexplained spike-like excursions, thereby reducing unnecessary alarms while preserving fault-detection performance.
Figure 12 illustrates threshold-optimization examples for each CI, conditioned on whether the PCA reconstruction error (
Section 4.5.3) exceeds the one-sided cutoff. Each panel presents results for the same CI measured at different locations of the component that experienced the actual failure, allowing direct comparison with the legacy thresholding methods shown in
Figure 7. As observed previously in
Figure 7, common outliers appear over similar operation intervals only for the Left Ancillary Intermediate Gear, which was the failed component, and only for the CIs associated with that failure.
Figure 13 provides concrete examples of applying the PCA-reconstruction-error based weighting (
Section 4.5.3) to re-estimate thresholds, demonstrating the method’s effectiveness in distinguishing artifacts from true faults.
Figure 13a illustrates a case of a premature alarm in a non-failing component. Traditional methods (black dashed/solid lines) would have raised a false alarm due to the spike; however, the proposed method (pink line) correctly identifies this as a flight-state-driven variation and adjusts the threshold upward, thereby suppressing the nuisance alarm. Conversely,
Figure 13b presents the real fault case (Left Ancillary Intermediate Gear). Crucially, the proposed method successfully captures the anomaly leading up to the maintenance event. Despite the adaptive thresholding capability, the fault-induced vibration energy sufficiently exceeds the limit, confirming that the method preserves detection sensitivity for actual defects while filtering out noise.
Because spurious alarms can trigger repeated inspections and checks, they impose additional operational and maintenance costs. These examples demonstrate that the proposed flight-state-driven threshold optimization effectively suppresses such unnecessary alarms while preserving accurate detection of true faults. By excluding spike-like segments unrelated to the failures and allowing normal, flight-state-driven amplitude variations through PCA-informed threshold adjustment, the proposed method enhances the overall reliability and interpretability of the alarm system.
After resetting thresholds with the flight-state-driven outlier removal, the total number of alarms decreased while fault detection capability was maintained.
Figure 14 compares the normalized magnitudes of three thresholds: (①) after Z-score-based statistical filtering, (②) after density-based filtering, and (③) after the proposed flight-state-driven filtering. All three optimized thresholds fall within a similar range, indicating that the proposed approach establishes a baseline performance comparable to that of existing methods.
Furthermore,
Figure 15 contrasts the proposed approach (right) with the statistical and density-based methods (left). All three methods detect anomalies at similar times around the true-fault interval, while the proposed flight-state-driven method produces fewer spurious alarms during nominal periods, resulting in higher specificity while maintaining comparable sensitivity.
The practical impact of threshold optimization is quantified from two perspectives: minimizing unnecessary alarms during nominal operation and concentrating alarms immediately before a fault. Two evaluation metrics are defined based on previous studies [
12,
36,
37]. Let $M$ denote the interval exhibiting fault-like behavior (anchored at the maintenance operation), and let $B$ represent the background interval comprising all remaining operations. For each operation $i$, define the alarm indicator $a_i = 1$ if at least one alarm occurs during the operation, and $a_i = 0$ otherwise.
The Background Alarm Rate (BAR), which measures the frequency of unnecessary alarms during nominal operation, is defined as
$$\mathrm{BAR} = \frac{1}{|B|} \sum_{i \in B} a_i,$$
where $i$ denotes an operation, $a_i$ is the binary alarm indicator for operation $i$, and $B$ represents the background set of operations, that is, all operations outside the maintenance window $M$. A smaller BAR indicates fewer spurious alarms during nominal operation.
The In-window Alarm Concentration (IAC), which measures the degree to which alarms are concentrated within the maintenance window $M$, is defined as
$$\mathrm{IAC} = \frac{1}{|M|} \sum_{i \in M} a_i,$$
where $M$ denotes the in-window interval of operations used to assess alarm concentration near the fault. Larger IAC values indicate stronger concentration of alarms within the fault-related interval.
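The two metrics can be computed as in the sketch below, which follows the verbal definitions: each is the proportion of operations in its set that contain at least one alarm. The function name `alarm_rates`, the boolean-mask inputs, and the ten-operation example are hypothetical.

```python
import numpy as np

def alarm_rates(alarm, in_window):
    """Return (IAC, BAR) from per-operation alarm indicators.

    alarm     : True where the operation raised at least one alarm
    in_window : True where the operation lies in the maintenance window
    """
    alarm = np.asarray(alarm, dtype=bool)
    in_window = np.asarray(in_window, dtype=bool)
    window, background = alarm[in_window], alarm[~in_window]
    iac = window.mean() if window.size else float("nan")
    bar = background.mean() if background.size else float("nan")
    return iac, bar

# Ten operations; the last three form the maintenance window.
alarm = [0, 0, 1, 0, 0, 0, 0, 0, 1, 1]
in_window = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
iac, bar = alarm_rates(alarm, in_window)
```

Here two of the three in-window operations raise an alarm (IAC = 2/3), and one of the seven background operations does (BAR = 1/7).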
Figure 16 compares the performance of the three methods on two axes: the IAC (y-axis) and the BAR (x-axis). The IAC represents the proportion of operations within M that contain at least one alarm, where higher values indicate stronger fault-related concentration, while the BAR represents the proportion of background operations (those outside M) that contain at least one alarm, where lower values indicate fewer unnecessary alarms.
For the Z-score-based statistical filtering method (①, shown in orange) and the density-based filtering method (②, shown in yellow), several instances appear in the high-BAR region, where the BAR rises to 0.3–0.8, indicating intermittent spurious alarms during nominal periods. In contrast, the proposed flight-state-driven threshold optimization (③, shown in purple) generally occupies the high-IAC, low-BAR region, with no isolated high-BAR spikes. In other words, it maintains detection performance comparable to the alternative methods while reducing both the variance and the upper bound of the BAR, thereby suppressing unnecessary alarms in many cases.
For the flight-state-driven method, three hyperparameters are considered in the sensitivity analysis: the percentile used to set the PCA reconstruction error cutoff, the percentile used to trim the upper tail of the CI density distribution, and the maximum weighting factor α applied when adjusting thresholds according to flight state deviations. The comparative results indicate that the flight-state-driven threshold optimization achieves a more favorable balance between sensitivity and specificity than the legacy statistical and density-based schemes. Across the verified fault cases, it preserves or slightly improves the IAC while reducing the BAR, so that alarms are more tightly focused around true fault intervals with fewer nuisance events in nominal operation. Moreover, varying the PCA cutoff, CI density trimming level, and weighting gain for the reconstruction error-based adjustment leads only to modest changes in these metrics, indicating that the proposed method is robust to reasonable hyperparameter choices rather than being tuned to a single configuration. The numerical results summarized in
Table 8 therefore provide additional evidence that embedding flight state information into CI threshold review can enhance HUMS decision confidence under practical operational constraints.
Table 8.
Sensitivity of IAC and BAR to hyperparameters in the flight-state-driven threshold optimization. Baseline and alternative hyperparameter settings for the proposed flight-state-driven method and the corresponding overall IAC and BAR are summarized. For the sub-rows grouped under “Flight-state-driven” in the first column, each hyperparameter is perturbed one at a time from the baseline configuration while the others are held fixed. The gray-shaded cells indicate the perturbed hyperparameter value in each setting, while the baseline row reports the unperturbed configuration. The “Before Optimization”, “Z-score-based”, and “Density-based” rows use the baseline configuration of the legacy methods for comparison.
| Method | Config. | PCA Reconstruction Error Cutoff [%] | CI Density Trimming Percentile [%] | Flight State Weight (Alpha) | IAC | BAR |
|---|---|---|---|---|---|---|
| Before Optimization | Baseline | - | - | - | 0.243 | 0.202 |
| Z-score-based | Baseline | - | - | - | 0.033 | 0.030 |
| Density-based | Baseline | - | - | - | 0.064 | 0.050 |
| Flight-state-driven | Baseline | 97.5 | 97.5 | 0.50 | 0.036 | 0.030 |
| Flight-state-driven | PRC95 | 95.0 | 97.5 | 0.50 | 0.023 | 0.023 |
| Flight-state-driven | PRC99 | 99.0 | 97.5 | 0.50 | 0.046 | 0.035 |
| Flight-state-driven | DENS95 | 97.5 | 95.0 | 0.50 | 0.036 | 0.030 |
| Flight-state-driven | Alpha = 0.7 | 97.5 | 97.5 | 0.70 | 0.034 | 0.029 |
Table 9 summarizes the CI-wise mean IAC and BAR for six representative gear-related CIs (SO1, SO2, GE2, M6, WEA, and STD) across the four thresholding schemes. For all CIs, the “Before Optimization” row exhibits the highest background alarm rates, confirming that the legacy thresholds generate frequent nuisance alarms during nominal operation. Both the Z-score-based and density-based optimizations substantially reduce the BAR, but in some cases this reduction is accompanied by a noticeable decrease in the IAC, indicating a loss of sensitivity. In contrast, the proposed flight-state-driven threshold optimization consistently maintains high IAC values that are comparable to or higher than those of the density-based method, while achieving similar or lower BAR values for most CIs, particularly SO2, WEA, and STD. These CI-level results are consistent with the aggregate behavior shown in
Figure 16 and
Table 8, and they confirm that embedding flight state information into the threshold review process improves alarm specificity across multiple gearbox CIs without sacrificing fault detection.