Article

Data-Driven Estimation of Helicopter Engine Power Using Regular Flight Data: A Machine Learning Approach

1 Intelligent Systems, Afeka Academic College of Engineering, Mivtsa Kadesh St 38, Tel Aviv 6910717, Israel
2 Israeli Air Force, Kaplan 23, Tel Aviv 6473424, Israel
* Author to whom correspondence should be addressed.
Electronics 2026, 15(1), 141; https://doi.org/10.3390/electronics15010141
Submission received: 21 November 2025 / Revised: 21 December 2025 / Accepted: 24 December 2025 / Published: 28 December 2025

Abstract

The accurate estimation of helicopter engine power is crucial for ensuring operational performance and maintaining safety. Current methods, such as Maximum Power Checks (MPCs), are effective but resource-intensive and infrequent. This paper presents a novel machine learning-based framework tailored for operational helicopter fleets to estimate Engine Torque Factor (ETF) values from routine flight data obtained via Health and Usage Monitoring Systems (HUMS). The novelty lies in combining a statistically validated labeling strategy that links MPC-derived ETF values to regular flights with a dual-stage preprocessing pipeline, consisting of steady-state filtering and data consolidation, which is designed to produce high-quality, representative training data from noisy operational logs. Regression models, including XGBoost, CatBoost, and Random Forest, were trained and evaluated using HUMS data from AH-64A helicopters. Results demonstrate that focusing on specific ETF ranges significantly improves model performance, achieving R2 values of up to 0.94. While the current implementation operates post-flight, the approach enables continuous monitoring between scheduled MPCs, potentially reducing unnecessary checks and providing earlier indications of power degradation.

1. Introduction

Helicopter engine power is a critical determinant of overall aircraft performance across diverse operating conditions, including hovering, vertical takeoff and landing, and forward flight. Sufficient engine output is vital for generating the necessary lift and thrust, but power availability and efficiency can degrade due to environmental variables such as air density, temperature, humidity, and altitude. Engine power is also influenced by baseline performance upon entering service, progressive deterioration from wear and fouling of turbomachinery components (e.g., blades, vanes, and seals), emergent faults, and routine maintenance efficacy [1]. In harsh environments like desert operations, sand ingestion exacerbates erosion and compressor fouling, accelerating degradation and reducing performance and available power [2,3].
Accurate estimation of available engine power under varying flight regimes and conditions is essential for flight safety, mission planning, performance optimization, and timely maintenance. The General Electric T700 turboshaft engine family, widely adopted by manufacturers such as Bell, Boeing, Korean Aerospace Industries (KAI), and Sikorsky, powers various helicopter platforms [4]. Traditionally, engine power assessment relies on the Maximum Power Check (MPC), a manual procedure executed by pilots to compute the Engine Torque Factor (ETF), which represents the engine’s actual power output relative to its specified maximum.
The MPC is a manual procedure conducted during a dedicated maintenance flight. It involves reducing power on one engine while loading the other until it reaches an operating limit (e.g., turbine gas temperature or torque). The pilot stabilizes altitude, airspeed, and power to record the relevant parameters, then repeats the procedure for the second engine. Using charts in the maintenance manual [5], the pilot corrects for ambient conditions such as altitude and temperature to calculate the ETF. MPCs are performed during periodic maintenance, after test failures, or following new engine installations, with protocols detailed in the T700 manual.
Although MPCs provide reliable ETF values, they are resource-intensive, time-consuming, and infrequent, risking undetected degradation that could impair readiness and safety. These limitations hinder continuous ETF monitoring, potentially delaying maintenance and increasing risks.
To address this issue, this research introduces a machine learning (ML)-driven framework for ETF estimation using routine flight data from Health and Usage Monitoring Systems (HUMS). HUMS record operational parameters during standard flights and have been used for predictive maintenance (PdM) tasks like gearbox assessment and vibration detection, but less so for engine power estimation [1,6]. Unlike MPCs, routine flights offer variable profiles and dual-engine data, providing a larger dataset.
Our approach uses HUMS data via a pipeline: statistically validated labeling for reference ETFs, steady-state filtering to isolate stable segments, and data consolidation. This enables post-flight monitoring of power trends, adaptable for near-real-time use. By supporting continuous surveillance between MPCs, it promotes predictive maintenance, reduces manual check dependency, and enhances reliability and safety. As an AI-integrated solution, it handles noisy, high-dimensional data for improved scalability and precision.
The paper is structured as follows: Section 2 reviews related work on engine monitoring and ML in aerospace. Section 3 details the methodology, including labeling, filtering, consolidation, and modeling. Section 4 presents results across ETF ranges. Section 5 discusses implementation aspects and limitations and concludes with insights and future directions.

2. Prior Work

In recent years, predictive maintenance (PdM) has become central to aircraft operations, enabling data-driven strategies for identifying faults, forecasting failures, and optimizing maintenance schedules. This transition is fueled by the increasing availability of high-resolution operational data collected via onboard systems such as Flight Data Monitoring (FDM) and Health and Usage Monitoring Systems (HUMS). As modern aircraft can include tens of thousands of sensors, PdM methods are particularly suited to the complexity and scale of aviation systems [7]. Machine learning techniques have played a key role in this transformation. ML models are widely used for fault classification, anomaly detection, and remaining useful life (RUL) prediction across diverse aircraft systems. Karaoğlu et al. provide a comprehensive review of ML applications in aircraft maintenance, covering tasks such as component diagnostics, spare parts forecasting, and defect prioritization using models like Random Forest, Support Vector Regression, and Neural Networks [8]. Similarly, Helgo emphasizes the role of deep learning models including CNNs, LSTMs, and Autoencoders for aircraft prognosis, highlighting their ability to automatically extract features from flight data and support real-time decision-making [9]. For helicopters specifically, ML techniques have been applied to a range of PdM tasks, including gearbox operating condition estimation [10], gearbox health diagnostics [11], main gearbox failure prediction [12], helicopter-wide predictive analytics using flight and maintenance records [13], bolt loosening detection in Apache helicopters using vibration signals and unsupervised anomaly detection [14], and temporal causality-based feature selection for fault prediction in flight control systems [15]. HUMS has been a cornerstone technology in these developments, offering continuous collection of flight and engine parameters in real operational environments.
Prior studies have applied HUMS data to detect component wear, monitor gearbox condition, and analyze vibration signatures for anomaly detection, and recent work has shown that both probabilistic indicators and attention-based neural architectures can provide robust assessments of helicopter turbine engine health directly from HUMS measurements, either by modeling torque-margin–based health indices or by using multi-head attention to diagnose engine faults from multivariate operational data [16,17]. However, applications that directly estimate engine power parameters, such as the ETF, remain relatively scarce compared to the breadth of gearbox- and structure-focused HUMS research. This underlines a clear research gap and the opportunity to develop HUMS-based methods for direct engine performance monitoring without relying on dedicated test flights.
Engine health monitoring is a core element of aircraft maintenance and safety assurance. Traditional methods have long relied on model-based approaches such as Gas Path Analysis (GPA), Kalman filtering, and performance trend monitoring to detect deviations in critical engine parameters. These physics-based approaches use performance baselines to track deviations in pressure, temperature, and rotational speed, offering interpretable fault diagnostics. However, they struggle with nonlinearities and sensor uncertainties, and they require detailed engine models, which are often proprietary or incomplete in operational settings [18].
To overcome these constraints, machine learning (ML) methods have increasingly been applied to engine health monitoring tasks. These include supervised learning algorithms trained on historical sensor data to identify anomalous behavior and classify fault types. In particular, tree-based models such as Gradient Boosting and Random Forests have gained traction due to their robustness, interpretability, and suitability for tabular multivariate data, a common format in engine telemetry. Such models have demonstrated success in classifying engine faults, detecting performance anomalies, and supporting maintenance decision-making [19].
Across many fields, signal processing provides structured representations of raw signals, while machine learning builds data-driven models on top of them for robust inference. The applications range from remote sensing [20,21] to biomedical signal analysis [22,23] and speech and audio processing [24,25]. Recent studies applying multivariate time series analysis have shown that even without deep temporal models, changes in correlated sensor patterns can be effectively exploited for early fault detection using statistical and tree-based approaches. For example, a sensor fusion approach leveraging multivariate input features was proposed for fault detection in aircraft engines, showing high sensitivity and generalization to different flight conditions [26].
Moreover, practical implementations have focused on ensuring low computational overhead and high explainability, two strengths of ensemble tree methods, which make them particularly suitable for integration with onboard or near-real-time diagnostic systems. As noted in [27], while deep learning approaches show high accuracy, their deployment often faces hurdles in terms of interpretability and real-time feasibility, reinforcing the relevance of more lightweight alternatives in many operational contexts.
Estimating engine torque and power under real operational conditions is essential for assessing engine health and maintaining mission readiness. In helicopter operations, this is typically achieved through the Maximum Power Check (MPC), a dedicated and manually executed test flight that provides a quantitative measure known as the Engine Torque Factor (ETF). While accurate, the MPC is resource-intensive, infrequent, and cannot support continuous monitoring of engine degradation.
To address these limitations, Simon and Litt proposed a data-driven alternative that enables ETF estimation using operational flight data collected by Health and Usage Monitoring Systems (HUMS) [1]. Their approach relies on three key components: (1) a steady-state filter that extracts suitable engine operating points from raw flight data; (2) a trend-monitoring method that updates performance baselines over time; and (3) an algorithm for calculating ETF from filtered observations. The proposed system was validated on UH-60L helicopters with T700-GE-701C engines and showed strong alignment with manually derived ETF values, establishing an important baseline for automated engine power assessment.
While our research builds on the steady-state filtering principle used by Simon and Litt, it differs in several important aspects. First, our dataset is drawn from a large set of routine operational flights in an active fleet, rather than from a smaller set of controlled test conditions. Second, we introduce a statistically validated labeling strategy that links MPC-derived ETF values to regular flights over defined operational intervals, enabling supervised learning without manual labeling for every flight. Third, we integrate the steady-state filter with a consolidation process to remove redundant operating points, creating a high-quality dataset for training machine learning models. These adaptations are designed to improve robustness under diverse flight regimes and make the approach scalable for continuous post-flight monitoring. Beyond helicopters, machine learning approaches for power and torque estimation have been widely explored in automotive domains. Vong et al. used least squares support vector machines (LS-SVM) combined with Bayesian inference to model engine output from dynamometer data, reducing the need for repeated physical tests [28]. Similarly, Zeng et al. demonstrated that Extreme Learning Machines (ELMs) could deliver accurate torque predictions for gasoline engines across diverse operational conditions [29]. Comparable data-driven approaches have also been applied to turboshaft engines using supervised learning models trained on dedicated flight-test datasets collected under controlled conditions, for dynamic torque modeling [30].
While these studies highlight the potential of machine learning in engine modeling, they primarily rely on idealized or laboratory-collected datasets. In contrast, our approach focuses on real-world HUMS data collected during regular flights and employs tree-based ensemble models (XGBoost, CatBoost, Random Forest). By leveraging labeled ETF values from MPC flights and combining them with steady-state filtered routine flight data, and by applying a preprocessing pipeline that improves data quality and reduces redundancy, the proposed method enables continuous, interpretable ETF estimation under regular operational conditions, offering a scalable alternative to scheduled special flights and supporting predictive maintenance with minimal operational overhead.

3. Materials and Methods

The proposed Engine Torque Factor (ETF) estimation process involves key stages: data extraction and labeling (detailed in Section 3.1), preprocessing via steady-state filtering (Section 3.2) and consolidation (Section 3.3), and model training/evaluation. The complete data processing and modeling workflow is illustrated in Figure 1, which provides an algorithm-level overview of the proposed method, explicitly illustrating the sequential processing stages from HUMS data extraction and MPC-based labeling to steady-state filtering, data consolidation, and supervised model training and evaluation.
Flight data is extracted from the HUMS database, including Maximum Power Check (MPC) flights for ETF computation and regular operational flights. ETFs from MPCs are assigned to subsequent regular flights using a statistically validated labeling strategy, forming the supervised dataset.
The labeled data is cleaned, filtered for steady-state points to reduce noise, and consolidated to aggregate overlaps and prevent leakage. The final dataset is partitioned into training, validation, and test subsets.
Tree-based ensemble models (XGBoost, CatBoost, Random Forest) are trained and tuned via hyperparameter optimization for robustness to noisy HUMS data, multivariate suitability, and interpretability in maintenance contexts. Performance is assessed with regression metrics like R-squared and Root Mean Squared Error (RMSE).
Explainability techniques quantify contributions of key features (e.g., altitude, airspeed, engine parameters) to ETF estimates, supporting practical maintenance decisions.

3.1. Labeling Strategy for Engine Performance Analysis

The labeling process was designed to account for the operational differences between regular flights and MPC flights. Unlike MPC flights, which isolate a single engine under controlled conditions, regular flights involve both engines running simultaneously and vary widely in altitude, temperature, and speed. These differences lead to distinct engine feature distributions.
To enable labeling, we assume that an engine’s power output remains stable over time unless impacted by significant events such as faults or maintenance. This assumption is supported by an analysis of 56 MPCs across 22 engines, which showed minimal ETF variation within 60-day intervals (see Appendix A). The labeling window of T = 60 days was selected as a practical balance between data availability and engine stability. Shorter windows (e.g., 30 or 45 days) included fewer routine flights within each labeling interval, resulting in fewer labeled samples and degraded model accuracy. The 60-day interval therefore provided sufficient data for effective learning while remaining consistent with the observed engine stability across the 56 MPCs analyzed.
The 22 engines analyzed represent a diverse cross-section of the fleet, spanning five production series, 13–34 years of service, and 3500–7100 operational hours. This diversity in age, production series, and maintenance history supports the conclusion that the 60-day power stability is not engine-specific and generalizes across the fleet. To ensure labeling integrity, “significant events” were defined as any occurrence that could immediately or cumulatively affect engine power or stability or require unscheduled maintenance, even if minor. Routine flights following such events were excluded from labeling to maintain consistency and reliability of the assigned ETF values. Algorithm 1 formalizes the MPC-based labeling procedure used to assign ETF values to routine flights under these assumptions.
Algorithm 1. MPC-based labeling of routine flights (per engine)
Input:
   MPCRecords: List of MPC entries containing engine ID, MPC time, and MPC-derived ETF value.
   RoutineFlights: List of routine flight records containing engine ID, flight ID, and flight start time.
   EventLog: List of significant events containing engine ID and event time.
   T: Labeling window in days (T = 60).
Output:
   LabeledFlights: List of labeled routine flights, each containing engine ID, flight ID, assigned ETF label, and the MPC time from which the label was derived.
Procedure:
   1. Group MPC records, routine flights, and significant events by engine ID.
   2. For each engine, sort MPC records by MPC time in ascending order, routine flights by start time, and significant events by event time.
   3. Initialize an empty list of labeled routine flights.
   4. For each routine flight of a given engine:
  • Let t_f denote the start time of the routine flight.
  • Identify the most recent MPC performed at time t_mpc ≤ t_f. If no such MPC exists, discard the flight.
  • If the time difference t_f − t_mpc exceeds T days, discard the flight.
  • Check whether any significant event occurred between t_mpc and t_f. If such an event exists, discard the flight.
  • Otherwise, assign the ETF value obtained from the MPC at time t_mpc as the label of the routine flight and store the labeled flight.
   5. Return the list of labeled routine flights.
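As an illustration, the labeling procedure of Algorithm 1 can be sketched in Python. The record types and field names below are hypothetical stand-ins for the HUMS schema, not the implementation used in this study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative record types; field names are assumptions, not the authors' schema.
@dataclass
class MPC:
    engine_id: str
    time: datetime
    etf: float

@dataclass
class Flight:
    engine_id: str
    flight_id: str
    start: datetime

def label_flights(mpcs, flights, events, T=60):
    """Assign each routine flight the ETF of the most recent MPC on the same
    engine, provided the MPC lies within T days and no significant event
    occurred in between (per Algorithm 1).

    events: list of (engine_id, event_time) tuples.
    Returns (flight_id, etf_label, mpc_time) triples for labeled flights.
    """
    labeled = []
    for f in flights:
        # Most recent MPC for this engine at or before the flight start.
        prior = [m for m in mpcs if m.engine_id == f.engine_id and m.time <= f.start]
        if not prior:
            continue                                  # no prior MPC -> discard
        mpc = max(prior, key=lambda m: m.time)
        if f.start - mpc.time > timedelta(days=T):
            continue                                  # label too old -> discard
        # A significant event between the MPC and the flight invalidates the label.
        if any(e_id == f.engine_id and mpc.time < e_t <= f.start
               for e_id, e_t in events):
            continue
        labeled.append((f.flight_id, mpc.etf, mpc.time))
    return labeled
```

In this sketch the per-engine grouping and sorting of steps 1–2 are replaced by direct filtering for clarity; a production version would pre-index records by engine ID.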

3.2. Steady-State Filter

According to [5], the MPC is conducted when the helicopter is in a stable condition, allowing the features to stabilize before the measurements are taken. A steady-state filter for the T700 engine was introduced in [1] and further developed in [6], which proposed an “algorithm that automatically identifies and extracts steady-state engine operating points from engine flight data”.
This process is achieved by comparing the standard deviation (σ) of selected input features in the data. Data points that fall under a defined threshold indicate a steady-state operating point, enabling the identification of relevant patterns within the dataset. The filter’s architecture consists of three main stages, as presented in [6]:
  • Initial Buffer Window: This stage captures an initial segment of data of size T_minbuff for further analysis. It serves as the starting point for identifying steady-state conditions.
  • Sliding Buffer of Non-Steady-State Data: This buffer moves across the dataset, identifying segments where steady-state conditions have not yet been established, ensuring only stable data points are retained.
  • Expanding Buffer of Steady-State Data: As the filter progresses, this buffer expands to accumulate steady-state data points, progressively refining the dataset by excluding non-relevant data.
The output of the steady-state filter will contain significantly fewer data points than the original dataset. However, these retained points will be much more stable and cleaner, which is crucial for improving the accuracy and reliability of machine learning models. By reducing noise and focusing on steady-state conditions, the filter enables better prediction of engine performance, ultimately leading to more precise and insightful results. In this study, the filter implementation was adapted for offline processing by replacing recursive mean and standard deviation calculations with standard statistical formulas, reducing computational complexity while preserving accuracy.
The steady-state detection algorithm and its parameterization were adopted directly from the configuration described in [6], which defines criteria for identifying stable operating points in helicopter engine data. The same threshold values were applied in this study, including σ ≤ 0.5% for torque, σ ≤ 0.2% for Ng and Np, σ ≤ 1.5% for TGT, σ ≤ 30 ft for barometric altitude, σ ≤ 4 knots for airspeed, and a minimum buffer window of 15 s. These parameters, validated through prior NASA testing, were retained without modification to maintain consistency with established filtering practices in HUMS-based engine performance analysis. After applying the Steady-State filter, the dataset was reduced to approximately 9000 points derived from 211 flights conducted by 16 helicopters.
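For illustration, the offline variant of the filter can be approximated with fixed, non-overlapping windows as sketched below. This simplifies away the buffer management of the original algorithm [6] and keeps only the per-channel standard-deviation thresholds and the 15 s minimum window stated above; channel names and the assumption that percent-valued thresholds apply directly to percent-valued signals are illustrative.

```python
import numpy as np

# Threshold values quoted in the text: torque/Ng/Np/TGT thresholds are in % of
# the signal (assumed here to apply directly to percent-valued channels),
# barometric altitude in ft, airspeed in knots.
THRESHOLDS = {
    "torque": 0.5, "ng": 0.2, "np": 0.2, "tgt": 1.5,
    "baro_alt": 30.0, "airspeed": 4.0,
}

def steady_state_mask(data, fs=10, window_s=15):
    """Return start indices of windows that are steady for all channels.

    data: dict of channel name -> 1-D numpy array, all equal length.
    fs: sampling rate in Hz (HUMS data is sampled at 10 Hz).
    window_s: minimum buffer window in seconds.
    """
    n = len(next(iter(data.values())))
    w = fs * window_s
    steady = []
    for start in range(0, n - w + 1, w):   # fixed non-overlapping windows
        ok = True
        for ch, thr in THRESHOLDS.items():
            seg = data[ch][start:start + w]
            ok &= seg.std() <= thr          # sigma test per channel
        if ok:
            steady.append(start)
    return steady
```

The original filter instead grows an expanding buffer of steady data and slides a buffer over non-steady data, so it can capture steady segments of arbitrary length; the fixed-window version here only demonstrates the thresholding principle.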

3.3. Steady-States Consolidator

The steady-states consolidator, introduced in this study, is an advanced preprocessing tool designed to mitigate overlapping steady-state data points in the dataset, which can lead to data leakage and artificially high model performance. The consolidator addresses this issue by grouping identical or near-identical (in time) steady-state points and aggregating them into a single representative occurrence. This tool ensures a streamlined dataset, reducing the risk of data leakage and improving the reliability of machine learning model training.
The steady-states consolidator operates in three key stages:
  • Preprocessing: After ensuring timestamps are time zone-free, the data is organized based on the flight filename and the engine’s operational side. This ensures that steady states are processed within their respective operational contexts.
  • Row Grouping: Identifies clusters of data points based on temporal continuity. Rows are grouped together if their time values fall within a predefined threshold (default: 2 s), ensuring that only closely occurring events are aggregated.
  • Feature Aggregation: For each identified group, key features are summarized using statistical metrics such as mean and standard deviation. A single consolidated row is generated for each group, capturing its representative values while removing redundant rows.
The output of the consolidator is a refined dataset with significantly fewer rows, each representing a unique steady-state condition. By consolidating overlapping and redundant data points, the dataset is reduced in size while retaining its essential patterns. In this study, the consolidator reduced the dataset from approximately 9000 data points to 1400, resulting in a cleaner, smaller, and more representative dataset. This preprocessing step minimizes noise and redundancy, mitigates the risk of data leakage, and contributes to more realistic and generalizable model performance.
The 2 s window was selected empirically after testing alternative intervals (1 s and 3 s). Shorter windows yielded negligible reduction in redundancy, while longer windows began merging distinct steady-state events. The 2 s threshold therefore provided an optimal balance between data compactness and preservation of transient variations. Algorithm 2 summarizes the consolidation procedure used to merge temporally adjacent steady-state points (within Δt = 2 s) and to generate a compact, non-redundant dataset for model training.
Algorithm 2. Steady-state consolidation of filtered points (per flight and engine side)
Input:
   SteadyStatePoints: List of steady-state points produced by the steady-state filter. Each point includes flight ID, engine side, timestamp, the selected input features, and the assigned ETF label.
   Δt: Temporal grouping threshold in seconds (Δt = 2 s).
Output:
   ConsolidatedPoints: List of consolidated steady-state samples, where each sample represents a unique steady-state occurrence and contains aggregated feature values and an ETF label.
Procedure:
   1. Standardize timestamps (convert to a uniform format and remove timezone offsets, if present).
   2. Sort all steady-state points by flight ID, engine side, and timestamp in ascending order.
   3. Initialize an empty list of consolidated points.
   4. For each unique pair (flight ID, engine side):
  • Extract the ordered list of steady-state points associated with this pair.
  • Initialize a current group with the first point in the list.
  • For each subsequent point in the list:
    • Let t_prev denote the timestamp of the last point in the current group and let t_curr denote the timestamp of the new point.
    • If t_curr − t_prev ≤ Δt, add the new point to the current group.
    • Otherwise, finalize the current group and create one consolidated sample as follows:
      - Set the representative timestamp to the first timestamp in the group.
      - For each feature, compute the mean value over all points in the group (and optionally the standard deviation, if retained as an additional feature).
      - Assign the ETF label of the group (the label is expected to be identical for all points because labeling is performed per flight; if inconsistent labels are detected, discard the group).
      - Append the consolidated sample to the consolidated points list.
      - Start a new current group with the new point.
  • After processing the final point, finalize and append the last group using the same aggregation rule.
   5. Return the list of consolidated steady-state samples.
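A minimal Python sketch of the consolidation procedure in Algorithm 2, assuming a simple dictionary-based record layout (the field names are illustrative, not the authors' schema):

```python
def consolidate(points, dt=2.0):
    """Merge temporally adjacent steady-state points per (flight, engine side).

    points: list of dicts with keys 'flight_id', 'side', 't' (seconds),
            'features' (dict of name -> float), and 'etf'.
    dt: temporal grouping threshold in seconds (the paper uses 2 s).
    """
    points = sorted(points, key=lambda p: (p["flight_id"], p["side"], p["t"]))
    out, group = [], []

    def flush():
        if not group:
            return
        labels = {p["etf"] for p in group}
        if len(labels) == 1:      # inconsistent labels -> discard the group
            feats = group[0]["features"].keys()
            out.append({
                "flight_id": group[0]["flight_id"],
                "side": group[0]["side"],
                "t": group[0]["t"],   # representative timestamp: first in group
                "features": {f: sum(p["features"][f] for p in group) / len(group)
                             for f in feats},
                "etf": labels.pop(),
            })
        group.clear()

    prev_key = None
    for p in points:
        key = (p["flight_id"], p["side"])
        # Close the group when the (flight, side) context changes or the
        # gap to the last grouped point exceeds dt.
        if group and (key != prev_key or p["t"] - group[-1]["t"] > dt):
            flush()
        group.append(p)
        prev_key = key
    flush()                        # finalize the last group
    return out
```

Per-feature standard deviations could be retained alongside the means, as the algorithm optionally allows.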

3.4. Tree-Based Ensemble Models

Tree-based ensemble methods were selected as the primary modeling approach due to their suitability for structured, multivariate HUMS telemetry and their ability to capture nonlinear interactions among engine, environmental, and aerodynamic variables. These models combine multiple decision trees to produce stable and accurate predictions: Random Forest aggregates independently trained trees to reduce variance, while gradient-boosting approaches such as XGBoost and CatBoost build trees sequentially, each correcting the residual errors of the previous ones. Boosting methods are particularly effective in learning complex patterns in tabular data, handling noise, and modeling nonlinearities without requiring explicit feature transformations. Given the nonlinear relationship between HUMS parameters and ETF, these ensemble techniques provide a robust and interpretable framework for engine power estimation. Recent predictive maintenance studies have shown that tree-based ensemble methods such as XGBoost and Random Forest provide competitive or superior performance on structured telemetry data compared to alternative modeling approaches, particularly in regression-based health monitoring tasks where robustness and interpretability are critical [31,32].
From a computational perspective, the proposed pipeline is efficient and scalable. The preprocessing stages, including steady-state filtering and data consolidation, operate with linear time complexity O(N), where N denotes the number of HUMS samples. Model training is performed offline and follows the standard computational complexity of tree-based ensemble methods, approximately O(T·D·n log n), where T is the number of trees, D is the maximum tree depth, and n is the number of consolidated samples (n << N). During deployment, inference requires only the evaluation of a limited number of decision trees per steady-state point, approximately O(T·D), resulting in low computational overhead and making the approach suitable for post-flight processing and near-real-time operational use.
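To make the training and evaluation stage concrete, the following sketch fits a Random Forest regressor on synthetic data shaped like the consolidated dataset (roughly 1400 samples, seven input features) and reports R², RMSE, and per-feature importances. The data-generating function is an assumption for illustration only, not the HUMS–ETF relationship.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the consolidated dataset; the target mixes an
# interaction term and a mild nonlinearity plus measurement noise.
rng = np.random.default_rng(42)
n = 1400                                  # roughly the consolidated sample count
X = rng.uniform(0.0, 1.0, size=(n, 7))    # seven input features, as in Section 4.1
y = (0.85 + 0.1 * X[:, 3] * X[:, 5]
     + 0.05 * np.sin(3 * X[:, 0])
     + rng.normal(0.0, 0.005, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_tr, y_tr)

pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5   # version-safe RMSE
r2 = r2_score(y_te, pred)
importances = model.feature_importances_       # impurity-based contribution scores
```

The `feature_importances_` attribute gives a first-pass explainability view of which inputs drive the predictions; SHAP-style attributions could be layered on top for per-sample explanations.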

4. Experiments and Results

To evaluate the feasibility and effectiveness of predicting Engine Torque Factor (ETF) values from routine flight data, we designed a series of experiments on real-world helicopter telemetry, applying machine learning models across operationally meaningful ETF ranges. The goal is to assess whether routine HUMS data, after steady-state filtering and consolidation, can reliably estimate ETF with error levels that are actionable for maintenance.

4.1. Data Description and ETF Analysis

The dataset for this research was provided by the Israeli Air Force (IAF) and is derived from the Health and Usage Monitoring System (HUMS) installed on AH-64A helicopters, sampling data at 10 Hz. The initial dataset comprises over 2750 regular flights and 36 Maximum Power Check (MPC) flights conducted across 34 engines, with MPC procedures carried out according to the guidelines defined in [5]. These flights span three years and encompass diverse operating regimes.
The number of regular flights ultimately used as input for the machine learning models is determined during the nonstandard labeling stage. This process involves the parameter T, which defines the maximum allowable number of days between a regular flight and the most recent MPC flight for the same engine, provided no maintenance activities occurred during that interval. Based on an analysis of ETF stability over time, T = 60 days was selected as the benchmark. This choice is supported by Appendix A, which shows minimal ETF drift within 60 days, enabling label stability without re-labeling each flight. After applying the steady-state filter, this yielded a usable subset of 211 regular flights.
Figure 2 provides a comparison between the number of regular flights and MPC flights conducted for each helicopter. The blue bars represent MPC flights, while the orange bars correspond to regular flights. A key observation from this figure is the significant disparity in the number of regular flights versus MPC flights. This disparity highlights the underutilization of MPC data relative to the abundant operational data available from regular flights. By leveraging regular flights to predict ETF values, the proposed methodology addresses this imbalance, maximizing the utility of existing data for engine performance analysis.
After applying data preparation and filtering steps, the final dataset used for training the machine learning models consists of 1396 samples. These samples were derived from 211 regular flights, spanning 32 distinct engines.
The input features tested in the machine learning models include Airspeed (knots), Ambient Temperature, Altitude, Engine Torque, Gas Generator Speed (Ng), Power Turbine Speed (Np), and Turbine Gas Temperature (TGT). These features were selected to closely mirror the parameters used in the Maximum Power Check (MPC), as they are integral to the calculation of the Engine Torque Factor (ETF). By aligning the input features with those used in the ETF computation, the study ensures consistency and relevance to real-world engine performance evaluation processes.
The operational envelope of the analyzed flights encompasses significant ranges across key aerodynamic and performance parameters. The dataset captures aircraft operations across airspeeds ranging from approximately 3.5 to 148 knots, altitude variations between 3500 and 8775 feet, and engine torque measurements spanning from 38% to 101%. These ranges effectively characterize typical mission profiles while encompassing diverse operational conditions, thereby ensuring robust model training across representative flight regimes.
Figure 3 illustrates the distribution of ETF values across the dataset. The x-axis represents ETF values, while the y-axis shows the frequency of samples. Most samples cluster near nominal ETF, which justifies focusing models on narrow operational ranges to reduce variance and improve accuracy.
A notable insight is the concentration of ETF values within specific operational ranges, reflecting the consistency of engine performance under normal conditions. This distribution serves as the foundation for selecting focused ETF ranges in the subsequent machine learning methodology, enabling better model accuracy and reliability. Table 1 presents the data distribution across training, validation, and test sets for each ETF range.
The splits are stratified by ETF range to preserve distributional balance and avoid leakage across training/validation/test.

4.2. Best Model Selection

To address the regression challenge of predicting ETF values, we evaluated XGBoost, CatBoost, and Random Forest models, using a 60/20/20 train/validation/test split. To confirm that the HUMS–ETF relationship cannot be captured by a linear model, we also evaluated a linear regression baseline, which showed markedly lower performance (R2 ≈ 0.46 on the test set) and confirms that the relationship between HUMS parameters and ETF is strongly nonlinear. This nonlinearity originates from several coupled physical and operational effects. Torque and turbine gas temperature depend nonlinearly on altitude, airspeed, and ambient temperature because of changes in air density and compressor efficiency. In addition, the interaction between gas-generator speed (Ng), power-turbine speed (Np), and torque involves multiplicative and saturation effects that a linear model cannot represent. Operational data also contain threshold behaviors, for example, temperature limits near maximum torque, which further distort linearity. Consequently, linear regression fails to generalize across flight regimes, whereas nonlinear ensemble models are able to capture these complex dependencies and yield substantially higher accuracy.
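The baseline comparison can be reproduced in miniature on synthetic data. The sketch below is illustrative only: the features and the multiplicative/saturating target are stand-ins for the HUMS signals (not the actual dataset), and scikit-learn's Random Forest stands in for the gradient-boosted models. It also shows the 60/20/20 split: the test set is carved off first, then the remainder is split 75/25 into train and validation.

```python
# Minimal sketch: a linear model underfits a target with multiplicative and
# saturating effects, while a tree ensemble captures them. Synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 3))                 # stand-ins for e.g. Ng, Np, torque
y = X[:, 0] * X[:, 1] + np.sin(4 * X[:, 2])     # multiplicative + saturating effects
y += 0.05 * rng.normal(size=len(y))             # measurement noise

# 60/20/20 split: carve off the 20% test set, then split the rest 75/25
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

r2_lin = r2_score(y_test, lin.predict(X_test))
r2_rf = r2_score(y_test, rf.predict(X_test))
# the ensemble clearly outperforms the linear baseline on this nonlinear target
```

On this toy target the linear baseline lands well below the ensemble, mirroring the gap reported for the real HUMS data.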
The following section presents experimental results, beginning with challenges encountered when training across the full ETF range, and detailing the improvements achieved by narrowing the target range, rounding values, and applying frequency-based filtering. Initial experiments we conducted on the complete ETF range (0.886–1.078) revealed significant overfitting, particularly with the XGBoost model. Despite extensive optimization efforts, including hyperparameter tuning, feature engineering, and the addition of regularization techniques, the model struggled to generalize (see Appendix B). These results underscored the high variability in the full ETF range, which limited the model’s ability to capture meaningful relationships between features and target values and led us to shift our focus toward narrower ETF ranges and additional filtering.
To mitigate overfitting observed during initial experiments on the full ETF range (0.886–1.078), we systematically evaluated multiple narrower ETF intervals. These included: (1) 0.90–0.98, representing lower-than-nominal power outputs; (2) 0.90–0.99, offering broader coverage while maintaining manageable variability; (3) 0.97–1.05, centered around nominal engine conditions; and (4) 0.97–1.078, capturing higher-end torque values with a modest tail. Across all intervals, XGBoost and CatBoost consistently outperformed other models, achieving higher R2 values and lower RMSEs. The most significant improvement was observed within the 0.97–1.05 range, where model accuracy reached up to R2 = 0.94. These findings highlight the importance of training within operationally consistent ETF bands to minimize noise and enhance predictive reliability.
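Range-focused training reduces to a simple band filter applied before fitting. The sketch below uses synthetic samples and assumed column names; the interval bounds are the ones evaluated in the text.

```python
# Illustrative range-focused selection: restrict labeled samples to one ETF
# band before training, as done for the evaluated intervals. Synthetic data;
# column names are assumptions for this example.
import pandas as pd

samples = pd.DataFrame({
    "etf":    [0.92, 0.95, 0.98, 1.00, 1.03, 1.07],
    "torque": [55,   60,   72,   80,   88,   95],
})

RANGES = {
    "lower_power": (0.90, 0.99),   # lower-than-nominal power outputs
    "nominal":     (0.97, 1.05),   # centered around nominal engine conditions
}

def select_range(df: pd.DataFrame, name: str) -> pd.DataFrame:
    lo, hi = RANGES[name]
    return df[df["etf"].between(lo, hi)]  # inclusive bounds

nominal = select_range(samples, "nominal")
# ETF 0.98, 1.00, and 1.03 fall inside the nominal band
```

Training one model per band, rather than one model over 0.886–1.078, is what drove the R2 improvement reported above.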
To further improve data consistency, we applied filtering strategies such as including ETF values with more than 10 or 30 occurrences and rounding ETF values to three decimal places. However, these actions did not result in significant improvements.
Based on these findings, we focus on two ETF ranges.

4.2.1. Focused Range: 0.97–1.05

Narrowing the ETF range to 0.97–1.05, which represents a subset of major operational engine conditions, led to significant improvements. The reduced variability in this range allowed the models to focus on consistent patterns, resulting in higher accuracy (see Figure 4):
  • XGBoost: Achieved the highest R2 values (≈0.93–0.94 on validation/test) and RMSE as low as 0.005, with minimal drop from training performance, indicating strong generalization.
  • CatBoost: Slightly lower R2 (≈0.90–0.92) but comparably small train–test gaps, providing a reliable alternative with smoother residual trends.
  • Random Forest: Reached R2 of ≈ 0.90, with higher RMSE than XGBoost and CatBoost, but still demonstrated consistent performance across splits.
At RMSE ≈ 0.005–0.007 ETF, the prediction error corresponds to less than 1% deviation from the nominal ETF value, which is sufficiently precise for monitoring trends between MPCs and for providing early indications of emerging engine power degradation.
Q-Q (quantile-quantile) plots assess whether model residuals follow a theoretical distribution, typically the normal, by comparing empirical quantiles against theoretical ones. This diagnostic enables systematic evaluation of the distributional assumptions underlying the modeling process.
Q-Q plots of the residuals were examined to assess the normality and distribution characteristics of model predictions:
  • XGBoost (Figure 5 left): The Q-Q plot exhibits strong alignment in the central region (−1 to 1), with some expected deviations in the tails. The pattern suggests good predictive performance across typical ETF values, with only minor variations in handling extreme cases.
  • CatBoost (Figure 5 right): The residual distribution demonstrates improved alignment with theoretical quantiles, particularly in the central region, while maintaining good consistency even towards the edges. The symmetric nature of the plot indicates CatBoost’s strong capability to handle both positive and negative residuals uniformly across this operational range.
Residuals are well behaved in the nominal band, indicating stable generalization without heavy tails that could bias maintenance decisions.
These observations suggest that CatBoost provides a slight advantage in handling residual distributions within the 0.97–1.05 range, demonstrating enhanced reliability and consistency when predicting ETF values in this operational window.
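The residual diagnostic above can be computed with `scipy.stats.probplot`, which returns the ordered quantile pairs and the correlation of the fit line. The residuals below are synthetic, drawn at roughly RMSE scale for illustration; in practice the plot would be built from the model's validation residuals.

```python
# Sketch of the residual Q-Q diagnostic used in Figure 5: compare empirical
# residual quantiles against a normal reference. Synthetic residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(scale=0.005, size=300)   # roughly RMSE-scale errors

# osm: theoretical quantiles, osr: ordered residuals, r: fit-line correlation
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
# r close to 1 indicates near-normal residuals; heavy tails would appear as
# curvature at the extremes of the plot
```

A fit-line correlation near 1 corresponds to the strong central-region alignment described above; tail departures would pull it down.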
After performing 5-fold cross-validation on this range (0.97–1.05), we evaluated the model’s consistency and generalization. As shown in Figure 6, cross-validation revealed a close alignment between the training and validation RMSE curves across boosting rounds, indicating robust performance with no overfitting. These results validated the optimized hyperparameters and preprocessing pipeline, affirming the model’s reliability within the operational ETF range.
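A minimal version of this cross-validation check is sketched below. Scikit-learn's gradient boosting stands in for XGBoost/CatBoost, and the data are synthetic; the point is the structure of the check, comparing train and validation RMSE per fold to detect overfitting.

```python
# Minimal 5-fold cross-validation sketch mirroring the check in Figure 6.
# GradientBoostingRegressor is a stand-in for XGBoost/CatBoost; data synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
y = X[:, 0] * X[:, 1] + 0.3 * X[:, 2] + 0.02 * rng.normal(size=500)

train_rmse, val_rmse = [], []
for tr_idx, va_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
    model.fit(X[tr_idx], y[tr_idx])
    train_rmse.append(mean_squared_error(y[tr_idx], model.predict(X[tr_idx])) ** 0.5)
    val_rmse.append(mean_squared_error(y[va_idx], model.predict(X[va_idx])) ** 0.5)

gap = np.mean(val_rmse) - np.mean(train_rmse)
# a small, stable gap across folds indicates healthy generalization
```

A large or fold-dependent gap would signal the overfitting seen in the full-range experiments of Appendix B.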

4.2.2. Focused Range: 0.90–0.99

The range 0.90–0.99 captures another operationally relevant subset of ETF values, representing lower power outputs. Training models on this range revealed strong performance, comparable to the results for the 0.97–1.05 range (see Figure 7):
  • XGBoost and CatBoost: achieved similar results with R2 values of ~0.89–0.90 and RMSE as low as 0.009 on validation sets.
  • Random Forest: delivered slightly lower R2 of ~0.85 with higher RMSE than XGBoost and CatBoost.
RMSE ≈ 0.008–0.011 ETF in this lower-power band supports monitoring of sub-nominal engine performance and early alerts when ETF trends decline.
Q-Q plots of the residuals were also analyzed to validate model performance further:
  • XGBoost (Figure 8 left): The Q-Q plot shows strong alignment with a minor deviation in the upper tail, suggesting the model struggles to fully capture extreme deviations in the actual data. This leads to larger residuals (errors) in these cases, as the model predicts values that are closer to the mean or expected trend, thereby underestimating the true magnitude of the observed deviations.
  • CatBoost (Figure 8 right): The residuals show slightly smoother alignment with the theoretical quantiles across most of the range, with only minor deviations in the extremes. While CatBoost’s maximum errors are larger, the residual distribution appears more consistent and evenly spread.
These observations suggest that CatBoost may offer a slight advantage in capturing the overall residual distribution. However, XGBoost demonstrates stronger overall performance metrics, particularly in minimizing high residuals and achieving better accuracy when predicting ETF values with lower variability.
A 5-fold cross-validation conducted on the ETF range of 0.90–0.99, similar to the previous range, suggests strong model generalization. As shown in Figure 9, both the training and test RMSE curves exhibit consistent improvement across boosting rounds, with a minimal gap of 0.008 at convergence. The small difference between training and test performance, alongside the stability of the test RMSE curve in later boosting rounds, indicates healthy generalization with no significant overfitting. These results confirm that the model maintains robust predictive capabilities within the 0.90–0.99 range.
The experiments demonstrated that narrowing ETF ranges significantly enhances model performance. Among the tested ranges, 0.97–1.05 yielded the best results, with XGBoost achieving the highest R2 and demonstrating strong generalizability for operational use. XGBoost also performed well for the ETF range 0.90–0.99, yet CatBoost slightly outperformed it there, with Random Forest following as a reliable alternative (see Table 2):
For nominal operations (0.97–1.05), XGBoost is preferred due to the best R2/RMSE. For lower-power operations (0.90–0.99), CatBoost offers slightly smoother residuals, which can be preferable for trend analytics. In contrast, the markedly inferior performance of linear regression across both ranges (R2 < 0.46) indicates that the underlying relationships between HUMS parameters and ETF are highly nonlinear. This finding justifies the use of advanced nonlinear ensemble models, which can capture complex feature interactions that simple linear approaches fail to model effectively.
All evaluated models underwent optimization through a uniform automated procedure to ensure comparability. For conciseness, the optimized hyperparameters for CatBoost are reported as an illustrative example. Within the nominal ETF range (0.97–1.05), the configuration featured a tree depth of 8, a learning rate of 0.115, 850 boosting iterations, a subsample ratio of 0.55, L2 regularization of 1.19, and ordered boosting. In contrast, for the lower-power range (0.90–0.99), the optimal setup adapted to a shallower tree depth of 4, a reduced learning rate of 0.088, elevated subsampling (0.84), intensified L2 regularization (2.83), and plain boosting. These variations underscore the regime-dependent nature of model regularization and learning dynamics, highlighting the necessity for tailored hyperparameter tuning across distinct operational contexts.
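The two regime-specific configurations reported above can be written out as parameter dictionaries (keys follow CatBoost's naming convention; the iteration count for the lower-power range was not reported and is therefore omitted). This is a reference snippet rather than a full training script.

```python
# Regime-specific CatBoost configurations as reported in the text.
# Keys use CatBoost's parameter names; intended as e.g.
# CatBoostRegressor(**catboost_params["nominal_0.97_1.05"]).
catboost_params = {
    "nominal_0.97_1.05": {
        "depth": 8,                 # deeper trees for the nominal band
        "learning_rate": 0.115,
        "iterations": 850,
        "subsample": 0.55,
        "l2_leaf_reg": 1.19,        # light regularization
        "boosting_type": "Ordered",
    },
    "lower_power_0.90_0.99": {
        "depth": 4,                 # shallower trees
        "learning_rate": 0.088,
        "subsample": 0.84,          # elevated subsampling
        "l2_leaf_reg": 2.83,        # intensified regularization
        "boosting_type": "Plain",
        # iteration count for this range was not reported
    },
}
```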
The findings serve as proof of concept for predicting ETF values and underscore the importance of tailoring training strategies to specific operational conditions. Range-focused training delivers operationally useful accuracy without added sensors or flight tests. Model choice can be tailored by bands (XGBoost for nominal, CatBoost for lower-power), providing a practical recipe for fleet deployment with post-flight HUMS data.

5. Discussion and Conclusions

5.1. Discussion

This research introduces a novel approach for estimating helicopter engine power using regular flight data, addressing a critical need in helicopter maintenance and operations. To our knowledge, this is the first study to combine nonstandard HUMS data labeling, steady-state filtering, and gradient-boosted tree models for power prediction, achieving accuracy suitable for operational deployment.
The methodology demonstrates how modern machine learning techniques can enhance maintenance procedures and power estimation, offering several key advantages:
  • Enhanced maintenance planning through continuous power monitoring, reducing the need for frequent power check flights.
  • Improved operational efficiency with near real-time (post-flight) insights into engine performance.
  • Better understanding of engine degradation by analyzing power trends over time.
  • Informed decision-making regarding operability, maintenance, and replacement scheduling.
Integrating various technologies was key to achieving reliable power estimation. The nonstandard labeling process was crucial for leveraging regular flight data for power prediction, bridging the gap between routine operations and dedicated power check flights. The steady-state filter and consolidator complemented each other by enhancing data quality. The steady-state filter identified stable operating conditions, while the consolidator minimized data leakage by aggregating temporally proximate measurements. This dual-stage preparation reduced the dataset from around 9000 points to 1400 high-quality samples, providing a strong foundation for machine learning training. Notably, the resulting model demonstrates robust performance within validated physical parameter ranges, encompassing specified operational envelopes for airspeed, altitude, and engine torque measurements. This validation within defined operational envelopes reinforces the practical applicability and reliability of the proposed methodology in real-world scenarios. Expanding the dataset further could improve performance and broaden the model’s applicability.
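The dual-stage preparation can be sketched as a rolling-variance steady-state filter followed by time-block averaging. Window size, variance threshold, block length, and column names below are illustrative assumptions, not the parameters used in the study.

```python
# Hedged sketch of the dual-stage preparation: (1) a steady-state filter that
# keeps samples where a key signal has low local variability, and (2) a
# consolidator that averages temporally adjacent samples to reduce redundancy
# and leakage. All thresholds and names are illustrative assumptions.
import numpy as np
import pandas as pd

def steady_state_mask(signal: pd.Series, window: int = 50, max_std: float = 1.0) -> pd.Series:
    """True where the centered rolling standard deviation stays below a threshold."""
    return signal.rolling(window, center=True, min_periods=window).std() < max_std

def consolidate(df: pd.DataFrame, block_s: float = 30.0) -> pd.DataFrame:
    """Average samples within fixed time blocks."""
    blocks = (df["t"] // block_s).astype(int)
    return df.groupby(blocks).mean()

# 10 Hz toy signal: a steady segment followed by an oscillatory transient
t = np.arange(0, 120, 0.1)
torque = np.where(t < 60, 70.0, 70.0 + 15 * np.sin(t))
df = pd.DataFrame({"t": t, "torque": torque})

steady = df[steady_state_mask(df["torque"])]
consolidated = consolidate(steady)
# only the steady first half survives, then is averaged into 30 s blocks
```

On real HUMS data the filter would combine several signals (torque, Ng, Np, TGT) and the consolidator would operate per flight, but the structure is the same.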
The comparative analysis revealed that XGBoost and CatBoost are particularly effective when trained on focused ETF ranges. The superior performance within these ranges highlights that operationally segmented models can outperform general-purpose models in high-noise, full-spectrum conditions—an insight with direct implications for operational deployment.

5.2. Limitations and Future Research

The present study achieves promising results within the focused ETF ranges (0.90–0.99 and 0.97–1.05); however, several limitations should be acknowledged, together with the steps planned to mitigate them. The model’s performance on the full ETF spectrum remains limited due to increased variability, data sparsity, and higher noise near range boundaries. A practical alternative involves developing a modular ensemble of range-specific models, each optimized for a defined ETF interval, combined with a preliminary classifier to estimate the applicable range before regression based on key steady-state parameters. This two-stage system would retain model precision while improving coverage across the entire engine-performance envelope.
The dataset’s limited size and diversity constrain model generalization. The data used in this study cover a restricted number of engines and lack extreme-weather operational conditions, which may reduce robustness under broader scenarios. Future research will incorporate larger multi-year HUMS datasets representing more diverse mission profiles and environmental conditions. It will also include a systematic sensitivity analysis of the labeling window (T) to quantify its influence on labeling accuracy and model performance.
Additionally, expanding the feature space to incorporate environmental and configuration-specific variables, along with transfer learning strategies, may improve generalization. These considerations are critical for enabling robust real-world deployment of the proposed methodology in support of condition-based maintenance programs.
Future research directions include:
  • Expanding the dataset to incorporate a broader range of operational conditions and environmental factors.
  • Exploring transfer learning to adapt models to different helicopter types.
  • Developing ensemble methods that combine predictions from range-specific models.
  • Integrating additional sensor data or operational parameters to improve prediction accuracy.
Future research could also adapt the methodology for real-time processing on legacy helicopters, which often lack a full avionics suite. Preprocessing techniques, such as steady-state detection, could handle high-frequency unstable data dynamically. Onboard edge processing tailored to legacy avionics would reduce latency and reliance on external systems, providing immediate insights. Alert mechanisms for ETF deviations would enhance safety, while incremental learning and scalability testing would ensure practicality for these older platforms.
This work represents a significant advancement in data-driven approaches to helicopter engine monitoring. By showing that accurate power estimation can be achieved using routine flight data without the need for dedicated test flights, this study paves the way for continuous, cost-effective engine health assessment. The methodology could be adapted for use with other aircraft types and engine configurations, providing a template for similar applications across the industry.

Author Contributions

Conceptualization, L.D., A.D. and Y.A.; methodology, L.D., A.D. and Y.A.; software, L.D.; validation, L.D., A.D. and Y.A.; formal analysis, L.D. and Y.A.; investigation, L.D. and Y.A.; data curation, L.D.; writing—original draft preparation, L.D., A.D. and Y.A.; writing—review and editing, L.D., A.D. and Y.A.; visualization, L.D. and Y.A.; supervision, A.D. and Y.A.; project administration, Y.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ETF – Engine Torque Factor
HUMS – Health and Usage Monitoring System
MPC – Maximum Power Check
PdM – Predictive Maintenance
FDM – Flight Data Monitoring
GBDT – Gradient Boosted Decision Trees
RMSE – Root Mean Squared Error

Appendix A. Change in Engine’s Power over Time

As stated earlier, it is assumed that the power of an engine remains stable as long as no external event, such as a malfunction, environmental impact, or corrective maintenance, occurs. To validate this assumption, data from 56 MPC flights were analyzed, creating 28 MPC pairs for comparison. The ratio between the ETF values of each pair was calculated to determine the relative change in the engine’s power over time.
Figure A1 illustrates the relative changes observed in the 28 pairs. The results show a mean of 1.006 and a standard deviation of 0.02, indicating that, on average, the engine’s power remains stable over time, with only minor variations. These findings provide confidence in the assumption and the labeling process while acknowledging a small deviation that will be taken into account when evaluating the model’s performance.
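The pairwise stability check reduces to computing per-pair ETF ratios and their dispersion. The ETF values below are synthetic stand-ins (the actual 28 pairs are not published); the paper's reported statistics are a mean of 1.006 and a standard deviation of 0.02.

```python
# Sketch of the Appendix A stability check: for consecutive MPC pairs on the
# same engine, take the ratio of the later ETF to the earlier one. Values
# below are synthetic; the study analyzed 28 real pairs.
import numpy as np

# hypothetical (earlier, later) ETF values from consecutive MPCs
pairs = np.array([
    [0.98, 0.99],
    [1.00, 1.00],
    [1.02, 1.01],
    [0.95, 0.96],
])
ratios = pairs[:, 1] / pairs[:, 0]
mean_ratio = ratios.mean()
std_ratio = ratios.std(ddof=1)
# a mean near 1 with small spread supports the assumption that engine power
# is stable between external events, justifying the T = 60 day window
```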
Figure A1. Relative change in power over time.
This analysis supports the choice of T = 60 as the window for assigning ETF values to regular flights following an MPC flight, ensuring that the labeling process is both reliable and consistent with observed engine behavior.

Appendix B. Exploring Narrow ETF Range

The initial experiments conducted on the full ETF range (0.886–1.078) revealed significant overfitting, particularly with the XGBoost model. Despite extensive optimization efforts, including hyperparameter tuning, feature engineering, and the addition of regularization techniques, the results were suboptimal:
  • R-Squared: Approximately 0.35–0.41 for validation and test sets, compared to ~0.8 for the training set.
  • PCA: While dimensionality reduction using PCA helped simplify the data, it failed to address overfitting effectively, providing only marginal improvements.
  • Stacking Regressor: Similar results were observed when experimenting with a stacking regressor, highlighting the need for further refinement.
This analysis suggested that the high variability across the full ETF range hindered the model’s ability to capture meaningful relationships. The wide range introduced high variability, reducing the generalizability of predictions.
As a result, the focus shifted to narrower ETF ranges to reduce variability and improve model performance. This adjustment aligned the dataset with specific operational conditions, enabling the model to learn more robust and consistent patterns.

References

  1. Simon, D.L.; Litt, J.S. Automated power assessment for helicopter turboshaft engines. In Proceedings of the 64th Annual Forum and Technology Display (AHS Forum 64), Montréal, QC, Canada, 29–31 July 2008. NASA/TM-2008-215270. [Google Scholar]
  2. Litt, J.S.; Simon, D.L. Toward a real-time measurement-based system for estimation of helicopter engine degradation due to compressor erosion. J. Am. Helicopter Soc. 2009, 54, 12008. [Google Scholar] [CrossRef]
  3. Scala, S.; Konrad, M.; Mason, R.; Skelton, D. Predicting the performance of a gas turbine engine undergoing compressor blade erosion. In Proceedings of the 39th AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, Huntsville, AL, USA, 20–23 July 2003; p. 5259. [Google Scholar]
  4. General Electric Aviation. T700 Turboshaft Engine. Available online: https://www.geaerospace.com/propulsion/military/t700 (accessed on 27 August 2023).
  5. U.S. Department of the Army; Air Force Headquarters. Technical Manual, Aviation Unit and Intermediate Maintenance Manual: Engine, Aircraft Turboshaft Models T700–GE–700, T700–GE–701, T700–GE–701C, T700–GE–701D; Army TM 1-2840-248-23&P. 2019. Available online: https://www.nsndepot.com/ (accessed on 27 August 2023).
  6. Simon, D.L.; Litt, J.S. A data filter for identifying steady-state operating points in engine flight data for condition monitoring applications. J. Eng. Gas Turbines Power. 2011, 133, 071603. [Google Scholar] [CrossRef]
  7. Stanton, I.; Munir, K.; Ikram, A.; El-Bakry, M. Predictive maintenance analytics and implementation for aircraft: Challenges and opportunities. Syst. Eng. 2023, 26, 216–237. [Google Scholar] [CrossRef]
  8. Karaoğlu, U.; Mbah, O.; Zeeshan, Q. Applications of machine learning in aircraft maintenance. J. Eng. Manag. Syst. Eng. 2023, 2, 76–95. [Google Scholar] [CrossRef]
  9. Helgo, M. Deep learning and machine learning algorithms for enhanced aircraft maintenance and flight data analysis. J. Robot. Spectrum 2023, 1, 090–099. [Google Scholar] [CrossRef]
  10. Gildish, E.; Grebshtein, M.; Aperstein, Y.; Makienko, I. Vibration-based estimation of gearbox operating conditions: Machine learning approach. In Proceedings of the 2023 International Conference on Control, Automation and Diagnosis (ICCAD), Paris, France, 4–6 May 2023; pp. 1–6. [Google Scholar]
  11. Moreira, G.; Pereira, A.; Nabarrete, A.; Gomes, W. Addressing gearbox health monitoring challenges for helicopters: A machine learning approach. An. Acad. Bras. Ciênc. 2024, 96, e20240404. [Google Scholar] [CrossRef] [PubMed]
  12. Daouayry, N.; Maisonneuve, P.L.; Mechouche, A.; Scuturici, V.M.; Petit, J.M. Predictive maintenance for helicopter from usage data: Application to main gear box. In Proceedings of the 44th European Rotorcraft Forum, Delft, The Netherlands, 19–20 September 2018. [Google Scholar]
  13. Mechouche, A.; Daouayry, N.; Camerini, V. Helicopter big data processing and predictive analytics: Feedback and perspectives. In Proceedings of the 45th European Rotorcraft Forum, Warsaw, Poland, 17–20 September 2019; p. 143. [Google Scholar]
  14. Gildish, E.; Grebshtein, M.; Aperstein, Y.; Kushnirski, A.; Makienko, I. Helicopter bolt loosening monitoring using vibrations and machine learning. In Proceedings of the PHM Society European Conference, Turin, Italy, 27–30 June 2022; Volume 7, pp. 146–155. [Google Scholar]
  15. Shavit, D.; Davidovits, M.; Kushnirsky, A.; Aperstein, Y. Temporal causality-based feature selection for fault prediction in rotorcraft flight controls. IFAC Pap. 2022, 55, 235–239. [Google Scholar] [CrossRef]
  16. Han, P.; Liang, Q.; Vanem, E.; Knutsen, K.E.; Zhang, H. Assessing helicopter turbine engine health: A simple yet robust probabilistic approach. In Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM, Nashville, TN, USA, 9–10 November 2024. [Google Scholar] [CrossRef]
  17. Park, Y.H.; Oh, H.I.; Kim, I.T.; Lee, S.J.; Moon, S.H.; Park, G.J.; Park, J.K.; Jung, J.H. Intelligent helicopter turbine engine fault diagnosis using multi-head attention. In Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM, Nashville, TN, USA, 9–10 November 2024. [Google Scholar] [CrossRef]
  18. Fentaye, A.D.; Zaccaria, V.; Kyprianidis, K. Aircraft engine performance monitoring and diagnostics based on deep convolutional neural networks. Machines 2021, 9, 337. [Google Scholar] [CrossRef]
  19. Adryan, F.A.; Sastra, K.W. Predictive maintenance for aircraft engine using machine learning: Trends and challenges. Int. J. Aviat. Sci. Eng. 2021, 3, 37–44. [Google Scholar] [CrossRef]
  20. Apartsin, A.; Cooper, L.N.; Intrator, N. Semi-coherent time of arrival estimation using regression. J. Acoust. Soc. Am. 2012, 132, 832–837. [Google Scholar] [CrossRef] [PubMed]
  21. Apartsin, A.; Cooper, L.N.; Intrator, N. Time-of-flight estimation in the presence of outliers. Part I: Single echo processing. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3382–3392. [Google Scholar] [CrossRef]
  22. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adam, M.; Gertych, A.; Tan, R.S. A deep convolutional neural network model for automated diagnosis of epilepsy using EEG signals. Comput. Biol. Med. 2018, 100, 270–278. [Google Scholar] [CrossRef] [PubMed]
  23. Faust, O.; Hagiwara, Y.; Hong, T.J.; Lih, O.S.; Acharya, U.R. Deep learning for healthcare applications based on physiological signals: A review. Comput. Methods Programs Biomed. 2018, 161, 1–13. [Google Scholar] [CrossRef] [PubMed]
  24. Deng, L.; Li, X. Machine learning paradigms for speech recognition: An overview. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 1060–1089. [Google Scholar] [CrossRef]
  25. Purwins, H.; Li, B.; Virtanen, T.; Schlüter, J.; Chang, S.-Y.; Sainath, T. Deep learning for audio signal processing. IEEE J. Sel. Top. Signal Process. 2019, 13, 206–219. [Google Scholar] [CrossRef]
  26. Bian, H.; Zou, Q.; Kong, X. Aircraft engine fault detection algorithm based on multivariate time series sensor data. In Proceedings of the 2024 Second International Conference on Inventive Computing and Informatics (ICICI), Kuala Lumpur, Malaysia, 2–4 June 2024; pp. 622–629. [Google Scholar]
  27. Suliman, S.I.; Yusof, Y.W.M.; Rahman, F.Y.A.; Izran, M.H.B.S. Enhancing aviation safety: A deep learning-based fault detection system for jet engines. In Proceedings of the 2024 IEEE 14th Symposium on Computer Applications and Industrial Electronics (ISCAIE), Penang, Malaysia, 24–26 May 2024; pp. 560–566. [Google Scholar]
  28. Vong, C.M.; Wong, P.K.; Wu, J. Prediction of automotive engine power and torque using least squares support vector machines and Bayesian inference. Eng. Appl. Artif. Intell. 2006, 19, 277–287. [Google Scholar] [CrossRef]
  29. Zeng, Y.; Liang, L.; Sun, J.; Jiang, H. A study on extreme learning machine for gasoline engine torque prediction. Int. J. Automot. Technol. 2020, 21, 245–252. [Google Scholar] [CrossRef]
  30. Paniccia, D.; Tucci, F.A.; Guerrero, J.; Capone, L.; Sanguini, N.; Benacchio, T.; Bottasso, L. A supervised machine-learning approach for turboshaft engine dynamic modeling under real flight conditions. arXiv 2025, arXiv:2502.14120. [Google Scholar] [CrossRef]
  31. Abdulkareem, A.O.; Jimada-Ojuolape, B.; Balogun, M.O.; Adesina, L.M. Comparative analysis of XGBoost and Random Forest for transformer failure prediction in predictive maintenance. Ann. Fac. Eng. Hunedoara 2024, 22, 135–142. [Google Scholar]
  32. Noura, H.N.; Chu, T.; Allal, Z.; Salman, O.; Chahine, K. A comparative study of ensemble methods and multi-output classifiers for predictive maintenance of hydraulic systems. Results Eng. 2024, 24, 102900. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed ETF estimation pipeline using routine HUMS data.
Figure 2. Flight counts per helicopter (MPC vs. regular). Bars compare MPC and routine flights used in this study. The abundance of routine flights relative to MPCs motivates learning ETF from HUMS to enable continuous monitoring between scheduled checks.
Figure 3. ETF distribution in the final dataset. Histogram of labeled ETF after steady-state filtering and consolidation. Concentration near nominal ETF supports training range-focused models for robust accuracy.
Figure 4. Model performance for ETF range 0.97–1.05 across training, validation, and test sets. Nonlinear ensemble methods (XGBoost, CatBoost, Random Forest) outperform linear regression, indicating the nonlinear nature of the relationship between HUMS features and ETF in the nominal power band.
Figure 5. Residual Q–Q plots in ETF 0.97–1.05. Residual distributions for XGBoost (left) and CatBoost (right). Blue dots show empirical residual quantiles and the red line denotes the theoretical normal reference. Alignment in the central region indicates approximately normal error behavior under typical operating conditions.
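A Q–Q plot like the one in Figure 5 pairs the sorted residuals with matching standard-normal quantiles; points falling on the reference line indicate approximately normal errors. A stdlib-only sketch, using the common plotting positions (i + 0.5)/n (the exact convention in the paper's plots is not stated):

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair sorted residuals with standard-normal quantiles for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    nd = NormalDist()
    # (i + 0.5) / n keeps plotting positions strictly inside (0, 1),
    # avoiding the infinite quantiles at 0 and 1.
    theoretical = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))
```

Plotting the returned (theoretical, empirical) pairs against the identity line reproduces the diagnostic shown in Figures 5 and 8.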
Figure 6. 5-fold cross-validation (ETF 0.97–1.05). Training and validation RMSE vs. boosting rounds. Small gaps indicate good generalization.
Figure 7. Model performance in ETF 0.90–0.99. CatBoost and XGBoost show comparable accuracy, with CatBoost exhibiting slightly smoother residuals.
Figure 8. Residual Q–Q plots for XGBoost and CatBoost within ETF range 0.90–0.99. Blue dots show empirical residual quantiles and the red line denotes the theoretical normal reference.
Figure 9. 5-fold cross-validation (ETF 0.90–0.99). Training and test RMSE versus boosting rounds. The blue curve denotes mean training RMSE across folds, while the orange curve represents mean test RMSE. Shaded regions indicate ±1 standard deviation across folds, reflecting variability of model performance. Stable convergence with a small train–test gap supports healthy generalization.
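The fold-wise statistics behind Figures 6 and 9 (mean RMSE across folds with a ±1 standard deviation band) can be sketched as follows. The `fit`/`predict` callables and the interleaved fold split are hypothetical simplifications, not the paper's actual XGBoost configuration:

```python
import math
import random

def kfold_rmse(xs, ys, fit, predict, k=5, seed=0):
    """Mean and standard deviation of per-fold test RMSE for a k-fold split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # interleaved assignment of shuffled indices
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in idx if i not in held_out]
        train_y = [ys[i] for i in idx if i not in held_out]
        model = fit(train_x, train_y)
        sq_errs = [(ys[i] - predict(model, xs[i])) ** 2 for i in fold]
        scores.append(math.sqrt(sum(sq_errs) / len(sq_errs)))
    mu = sum(scores) / k
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / k)
    return mu, sd
```

Tracking `mu` and `sd` per boosting round yields the solid curves and shaded bands described in the caption; a small, stable gap between training and test curves is the generalization signal the figure is meant to convey.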
Table 1. Sample Distribution Across Train, Validation, and Test Sets for ETF Ranges.
ETF Range    Data    Sample Count
0.97–1.05    Train   436
             Val     146
             Test    146
0.90–0.99    Train   476
             Val     159
             Test    159
Table 2. Model Evaluation Results for ETF Prediction in Ranges 0.97–1.05 and 0.90–0.99.
                 ETF Range 0.97–1.05    ETF Range 0.90–0.99
Model            RMSE      R2           RMSE      R2
XGBoost          0.006     0.939        0.009     0.898
CatBoost         0.007     0.905        0.008     0.903
Random Forest    0.007     0.906        0.011     0.846
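The RMSE and R2 scores reported in Table 2 follow their standard definitions; a minimal stdlib-only sketch:

```python
import math

def rmse(y, yhat):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

On the narrow ETF scale, an RMSE of 0.006 corresponds to roughly half a percent of nominal torque factor, which is why R2 values near 0.9 coexist with such small absolute errors.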
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Darhi, L.; Dvorjetski, A.; Aperstein, Y. Data-Driven Estimation of Helicopter Engine Power Using Regular Flight Data: A Machine Learning Approach. Electronics 2026, 15, 141. https://doi.org/10.3390/electronics15010141
