A Hybrid Degradation Modeling and Prognostic Method for the Multi-Modal System

: Engineering systems typically go through complicated degradation processes, partially due to their multiple operating modes. Therefore, how to accurately estimate their remaining useful life is a critical issue. To address this challenge, a hybrid degradation modeling and prognostic method for the multi-modal system is proposed. Firstly, the cumulative dynamic differential health indicator is constructed for the multi-modal switching system using a multi-objective optimization approach. The long-term cumulative degradation assessment model is constructed based on the gated recurrent unit. Then, considering that the damage in the latest stage has a signiﬁcant impact on the remaining useful life, the time window is used to extract local features of the sequence, including energy features and statistical features. The latest stage degradation is predicted based on the light gradient boosting machine. Finally, model averaging is used to integrate the two predicted results, which is expected to improve the prognostic robustness. The proposed model is evaluated with synthetic analysis and NASA turbofan aero-engine datasets. Extensive experimental results demonstrate the proposed method provides a better characterization of the degradation status of the system and provides a higher estimation accuracy than existing methods.


Introduction
Prognostic health management is a critical piece of technology for providing early warnings of system failure [1], aiming to minimize economic loss and potentially catastrophic accidents. As a vital issue of prognostic health management, the remaining useful life (RUL), which is defined as the duration from the current time to the failure time, should be predicted accurately. Therefore, remaining useful life prediction has drawn much attention from both industry and academia.
An engineering system can work under a single operating condition or multiple operating conditions depending the application scenario. The sensor data often have stationary aging trends or features when the system works under a simple condition. Most researchers concentrate on the prognostic health management of system with the single operation condition [2][3][4][5][6][7][8][9][10]. However, in practice, the engineering system often works under multiple operating conditions [11] and switches operation modes according to the changes in the operating environment and its states. For example, the information gain and find the optimal partition point by scanning all samples. Efficiency and scalability are difficult to satisfy when calculating large amounts of data. The light gradient boosting machine (LightGBM) has proven to have a faster and more accurate performance in [31]. In addition, the light gradient boosting machine has great potential in remaining useful life estimation when the sensor data contains a lot of noise, which is very common in industrial applications.
After solving the above problems, a decision-level fusion model is needed to get the final aging prediction result. Most previous researchers intended to remedy the lack of performance of one of the algorithms, such as convergence and computational complexity [32,33], rather than focusing on the physical characteristics of the system, which are important for remaining useful life prognostic and are analyzed in this paper. The averaging method has the advantages of low computational complexity and the ability to adjust parameters online. In [34], the strategy, which uses different methods to construct the degradation models and combines them for remaining useful life prediction, is defined as the hybrid prognostics approach.
In this paper, a hybrid degradation modeling and prognostic method for multi-modal systems is proposed to address shortcomings mentioned above. First, a cumulative dynamic differential health indicator (CDD-HI) is proposed based on the physical characteristics with the existence of mode switching. The proposed health indicator is used to predict long-term cumulative degradation by the gated recurrent unit network. Then, local features are constructed to predict latest-stage degradation by light gradient boosting machine. Finally, based on averaging method, the results of the gated recurrent unit and light gradient boosting machine are integrated to obtain remaining useful life. In conclusion, this paper aims to construct a hybrid degradation modeling and prognostic method for multi-modal systems, which combines the long-term cumulative degradation assessment and latest-stage degradation assessment, and takes into account changes in system capabilities to withstand damage under different aging conditions. The main contributions of this paper are as follows: 1) The cumulative dynamic physical difference feature is constructed to deal with the superposition effect of different operation modes, and gains the physical characteristics with the existence of mode switching, which is hidden in the inter-frame data. The health indicators are constructed based on the physical properties to improve the interpretability and robustness of a prediction model. The combination of a cumulative dynamic differential health indicator and the gated recurrent unit will further enhance the accuracy of cumulative degradation prediction.
2) During the latest stage of system operation, the system's ability to resist damage is usually the worst in its life time. The time window is used to extract local features, including statistical and energy features, which could mine aging information more fully to predict latest-stage degradation with the light gradient boosting machine.
3) The experiments on theoretical examples and the accessible turbofan aero-engine degradation dataset from NASA, and comparisons with the related model validate the effectiveness and superiority of the proposed method.
The rest of this paper is organized as follows. The proposed model is detailed in Section 2. Section 3 evaluates the proposed model on the NASA degradation dataset and presents an analysis of the results. The conclusion and prospects are given in Section 4.

Degradation Modeling and Prognostic Hybrid
This paper develops a hybrid degradation modeling and prognostic method for the multi-modal system to characterize the underlying degradation process accurately. Considering that different combinations of modes have different effects on the degradation process, the method predicts remaining useful life from two aspects: the long-term cumulative degradation assessment based on a cumulative dynamic differential health indicator, and the latest stage degradation assessment based on local features.
As shown in Figure 1, the proposed method works in an offline stage and an online stage. In the offline stage, the degradation model is learned from history monitoring data. The learned model is used to predict the remaining useful life in real-time in the online stage. S is defined as the input dataset. s i,k (t) is the data sequence of kth sensor data of the ith sample (system) to represent the change in data as the number of actions in time t changes. Then, the remaining useful life estimation process is as follows. For the single mode system, static features are extracted from the raw sensor data to represent the degradation trend. For multi-modal systems, different combinations of switching order have different effects on the degradation. So for multi-modal systems, the whole degradation cannot be built only based on stationary features. To solve it, the proposed cumulative dynamic difference features consist of four parts, which are sthe mooth feature (F 1 ), cumulative modes (F 2 ), difference times (F 3 ) and difference data (F 4 ).
In order to leave out the noise in the original signal S, smoothing is used to obtain the smoothed feature F 1 . The size of F 1 is the number of sensor K. F 2 represents the cumulative action times of each operation mode. The size of F 2 equals the number of modes M. To present the information of mode switching, F 3 and F 4 are constructed. F 3 represents the forward differences of action times of each operation mode, each of which is gained by calculating the time interval between the current action time and the last action time of each mode. F 4 represents the forward differences of the sensor data of each operation mode (d). The differences of sensor data between the current action time and the last action time of each mode are calculated to characterize data changes between different modes.
The cumulative dynamic physical difference feature consists of the above four parts for the multi-modal system to extract the switching order of modes. Figure 2 illustrates an example of the cumulative dynamic difference feature extraction process for a sensor. The number of modes is m = 6; the length of data sequence is 40. The cumulative dynamic difference feature is the F t = [F 1 , ] for the system in tth action time.

Composite Health Indicator Construction
To avoid the redundancy among different features, this paper develops a feature-level fusion method to combine cumulative dynamic physical difference features to get a composite health indicator based on the multi-objective optimization model. The health indicator H(t) is regarded as a linear combination of each cumulative dynamic physical difference feature f o , as Equation (1), where o is the dimension of cumulative dynamic physical difference feature, and w is the weight coefficient of each cumulative dynamic difference feature, −1 < w < 1. To represent the actual degradation trend of system, the interpretable health indicator is constructed based on three properties.

Property 1:
The degradation of the system should be monotonic in general without external maintenance.
To establish the monotonic health indicator, the slack variable ε i,t is introduced to measure the violation of monotonicity of health indicator. The ε i,t of system i and action time t is ε i,t = max(H i,t − H i,t+1 , 0). Assuming that the indicator is monotonically increasing, Equation (2) tries to minimize the total weighted sum of violations to ensure the monotonicity of the health indicator.
where p is number of system samples, and q is the maximum observation epochs (action times) of each sample. M is a diagonal matrix to express the degradation trend information. For example, if the feature shows an increasing (decreasing) trend, the diagonal entry of M is 1 (−1). The different system i varies in the life cycle length q i . Property 2: The first hitting time of failure is regarded as the failure threshold θ i of each system. The systems with the same functions and components should have similar fault thresholds under the same environmental conditions. The predictability is expressed by minimizing the variance of θ i , as Assuming A ∈ R p×o is the matrix recording the feature of the last observation epoch in full life-cycle of each system, the variance can be translated as the quadratic term w A BAw based on Equation (3): where B is a symmetric matrix; B = (I − O/p)/(p − 1). I is the identity matrix, and O is the matrix which each entry equals 1. Above all, Property 2 could be formulated with an optimization problem, as Equation (4): Property 3: For multi-modal systems, although different mode combinations would result in different short-term fluctuations of each sample, there are still similar degradation trends for those samples from the long-term perspective.
Samples of multi-modal systems with identical structures usually have different lengths of life cycle. Therefore, it is necessary to calculate the similarities between different lengths of degradation sequences. In this paper, the dynamic time warping (DTW) method is used to measure the similarities of different health indicator series.
The similarity between The obtained warp path D indicates the correspondence between H α (t α ) and H β (t β ), permissible point-to-point, and many-to-one point correspondences.
The obtained warp path D should satisfy three constraints: 1) The constraint of endpoint ensures that the starting point and the endpoint are consistent when the sequences are aligned, as D 1 = (1, 1), D N path = (q α , q β ). 2) The continuity constraint is used to limit the excessive expansion of the dynamic time warping and limit the jump correspondence, as Then, the minimum distance of dynamic time warping is revised as Equation (5), where d(H α (t αd ), H β (t β d )) is the euclidean distance between two points. To reduce the amount of calculation, the upper triangular matrix of the DTW matrix which consists of DTW(H α , H β ), 1 ≤ α ≤ p, 1 ≤ β ≤ p is used to calculate the distance sum of samples. Then, the Property 3 could be formulated as Equation (6), Above all, the optimization problem of composite health indicator construction model could be formulated as Equation (7), where λ is the parameter adjusting the relative importance of the three properties. Based on solving the multi-objective programming equation to calculate (w 1 , w 2 , ...w o ), the composite health indicator H(t) with good monotonicity and generalization will be calculated from cumulative dynamic physical difference feature with Equation (1).

Cumulative Degradation Assessment with Gated Recurrent Unit
The recurrent neural network has good performance when processing arbitrary time-sequences, based on the network structure in which neurons could connect to themselves across time. Because the problem of long-term dependencies, it is hard for the recurrent neural network to learn to store information for very long [35]. To solve the above problems, this paper adopts the gated recurrent unit, which is another recurrent neural network variant, to estimate remaining useful life with a long-term cumulative degradation health indicator.
In the gated recurrent unit, the gradient disappearance or explosion problem is alleviated by introducing an additive structure of the gate. There are two gates to process data information. The reset gate r(t) is used to adjust the incorporation of current input with the previous memory. The preservation of the previous memory is controlled by the update gate u(t). The transition function is formulated as Equations (8) and (9), where W and V represent the weight matrices of H(t) andŷ(t − 1), respectively, and b is the bias vector for the update gate and reset gate. Since the reset gate controls the degree of ignoring the information and the update gate controls the impact of the previous information, the output of current status can be represented as Equation (10): Regarding the cumulative dynamic differential health indicator H(t) as input, the corresponding forecast resultŷ(t) could be obtained, which is the long-term cumulative degradation assessmentŷ(t) c .

Latest-Term Degradation Assessment Based on Local Features
The aging of an engineering system is usually slow in the early stage and rapid in the later stage. Therefore, it is necessary to study the latest-stage aging data of the system in detail.

Construction of Local Aging Features
Constructing local aging features of these complex signal series is significant for obtaining useful aging information. In order to predict latest stage degradation, the overlapped time window is used to decompose the time series. The size of the time window is L. Two kinds of features, statistical features and energy features, are extracted, as shown in Figure 3.
The statistical features are used to represent the mathematical rules of sequence, which are the average value v 1 , variance v 2 and skewness v 3 . For multi-modal systems, the ensemble empirical mode decomposition (EEMD) algorithm is used to decompose the original signal to intrinsic mode function (IMF) components based on the local characteristic time scale of the raw signal . Among them, the white Gaussian noise with uniform distribution is added in the time-frequency space to reduce the mixing degree of IMF component model and achieve signal continuity in different frequency regions. The algorithm's steps are as follows.  Firstly, the number of empirical mode decomposition executions is N and the amplitude coefficient of white noise. In execution epoch n, the white Gaussian noise series g(t) is added to the signal data s(t), as the s noise n = s(t) + g n (t). In order to obtain IMF components of s noise n , the mean of the upper and lower envelope m(t) is obtained between the local maximum and minimum of s noise n . Let l(t) = s noise n − m(t); if l n (t) satisfies the condition of intrinsic mode function component, it means that it is the first IMF. When it does not, replace s noise n with l n (t) and repeat calculation l n (t) until the IMF conditions are satisfied. The l n (t) from s noise n is subtracted to obtain the remainder. The process of extracting an IMF component is repeated until all components are separated from the s noise n . The remainder becomes a monotonic function r n (t), Equation (11): where Z is the number of IMFs which contains the local characteristics of original signal. In each empirical mode's decomposition execution time, a different white noise series is added to s(t).
After that, we calculate the mean value of all IMFs in N times as in Equation (12):

Latest-Term Degradation Assessment Based on Local Features and LightGBM
In present research, the gradient boosting decision tree has good performance when predicting short sequences, but the computational complexity needs to be improved. In this paper, the improved gradient boosting decision tree, the light gradient boosting machine, is used to predict the remaining useful life of latest stage series based on local features. LightGBM used the leaf-wise growth with depth-limiting strategy instead of the traditional level-wise strategy to train decision trees to reduce errors and improve accuracy. The leaf-wise strategy is used to find the leaf with the largest split gain from all current leaves, and then split each time to reduce more errors and obtain higher accuracy at the same split time. To ensure high efficiency and prevent over-fitting, the maximum depth limit is added to the leaf-wise strategy.
For the local features (v 1 , v 2 , v 3 , e 0 , e 1 ...e z ), J decision trees are trained to predict the residuals of the prior models. If ϕ j is the learning function of the jth decision tree and theŷ (j) i is the prediction of for ith sample at the jth iteration; the process can be expressed as Equation (13): In each iteration, the current modelŷ (j) i is reserved and the learning residual function ϕ is learned by minimizing the objective function, as in Equation (14): The function consists of two parts. One is a loss function regarding the difference between the predictionŷ (J) i and the actual y i . The other is the regular term J ∑ j=1 Ω(ϕ j ) that penalizes the complexity of model, as in Equation (15): where µ is the penalty parameter for the number of leaves Z, ω is the weight of a leaf and τ is the penalty parameter. Considering that the samples with larger gradients play a more important role in calculating information gain, the gradient-based one-side sampling (GOSS) strategy [31] is used to segment the data. Firstly, the local feature dataset is sorted in descending order based on absolute values of gradients. Then, data instances with large gradients are retained, and samples with small gradients are randomly sampled while splitting out the set according to the estimated variance gain.
As Equation (16) shows, predictions of all decision trees are added to obtain the latest stage prediction. S tree is a space that contains all possible structures of trees.

Decision-Level Fusion Based on Model Averaging
The system's ability to withstand the damage is different for the system in different aging states. In order to integrate the long-term cumulative degradation and the latest stage degradation, the fusion model is used in this paper. The fusion model includes two kinds of methods, one is based on a bagging, boosting or stacking method, which consumes a lot of training time. Therefore, the averaging method is used to make a final decision in this paper. The long-term cumulative predictionŷ c is obtained as in Section 2.2. The latest stage predictionŷ l is obtained as in Section 2.2.2. Finally, the final resultŷ is calculated as in Equation (17) using the averaging method.
where coefficient γ is set as in [36].

Experiment and Discussion
In this section, we investigate the performance of proposed method using a synthetic dataset and an aero-engine turbofan dataset from NASA. The synthetic dataset is used to theoretical analysis. The turbofan aero-engine degradation dataset was published publicly in NASA's Prognostics Repository [37]; it simulates the degradation of engines in four working environments, so should verify the generality of the proposed model.

Evaluation Metrics
Samples working in different environments have different life cycle lengths. In this paper, therefore, the remaining useful life is set as the percentage of useful times. A linear function represents the degradation trajectory of system, and the remaining useful life y t is expressed by Equation (18).
y and y represent the predicted result and the real result. Ifŷ < y, the prediction result is considered as an early prediction. Ifŷ > y, the prediction result is considered as a late prediction. Considering that late predictions have more serious consequences, two evaluation metrics, the root mean squared error (RMSE) and scores are used to evaluate the performance of the proposed method. Root mean squared error gives an equal penalty to late and early predictions, as in Equation (19): which is y (t) =ŷ(t) − y(t). The score metric was proposed in the PHM 2008 Data Challenge [38] and defined as an asymmetric function that penalizes late predictions more than the early predictions. The expression of score function is a weighted sum of remaining useful life errors, as in Equation (20):

Theoretical Analysis
This section is used to discuss the performance of the proposed method based on different trend synthetic data. The unchanged data has constant outputs throughout the life of the system, which cannot provide useful information to facilitate prediction [1]. Most of the systems show a slow aging trend in the early stage and a fast aging trend in the later stage. Therefore, the synthetic data is constructed with exponential functional form (Equation (21)).

S syn (t) = exp(t) + Gaussian(t),
where, exp(t) = e a exp t+b exp is the exponential function to generate exponential data for different trends by adjusting parameter a exp and b exp . Gaussian(t) is the Gaussian random variables with the mean µ g and the variance σ 2 to simulate noise. The probability density of Gaussian noise follows the standard normal distribution (Equation (22)).
The normal distribution is symmetrical about t = µ g . µ g and σ are used to adjust the data fluctuation. Therefore, different parameter settings, (a exp , b exp , µ g , σ 2 ), are used to generate synthetic data for different trends. In the theoretical analysis, small scale samples are used for simulation analysis. The parameter settings of synthetic data samples are shown in Table 1. The dataset has 40 synthetic data samples, which have four different exponential forms and forty different Gaussian random variables. When a exp > 0, the trend is ascending and the trend is descending for a exp < 0. Then, the synthetic dataset is used to study the sensitivity of the proposed method to different types of input data. The results are shown in . Experimental results show that the method has good prediction performance on samples with different trends. The conclusions in the section are used as the basis for sensor selection during the experimental phase. More discussion on the method performance will be given in the experimental simulation section based on NASA's dataset.

Benchmark Data Description
The commercial modular aero-propulsion system simulation (C-MAPSS) turbofan aero-engine dataset [39] is widely used in prognostic studies. This dataset simulated the degradation of aero-engine in four working environments to get the corresponding sub-datasets FD001, FD002, FD003 and FD004. Each subset consists of a training set, a test set and corresponding remaining useful life values. The training samples degrade until the system failure, and the last time is regarded as the failure time of the engine unit. For test samples, the sensor data recording is stopped before the system fails and the failure period is recorded for verification. The details and statistical data of each subset are listed in Table 2. The monitoring data in each sample consist of 21 sensors (e.g., total temperature, pressure at fan inlet, physical core speed, etc.) and three operational settings which jointly determine the system's working mode [40].  The sample of FD001 suffered high-pressure compressor failure under a single operating condition. For FD002, the sample suffered the high pressure compressor failure under six operating conditions. In FD003, the sample suffered high pressure compressor and fan failure under a single operating condition, while in FD004 the sample suffered high pressure compressor and fan failure under six operating conditions. The proposed method was applied to four sub-datasets which have different working environments to verify the generality of the proposed model.

Data Preprocessing
Different sensors have different physical meanings and numerical ranges. In order to eliminate the influence of ranges of value on the degree of contribution, a standardized method is used to adjust the range of each sensor. The standardized rules are learned in the training set. Then, the sensors are selected based on the trends of the raw data.
To the system working under the single operating condition, sensors can be divided into three categories, which are ascending, descending and unchanged, according to the trend of the data, as shown in Figure 5a. Data without obvious changes throughout the life time of the system do not contribute to predicting the remaining useful life of system. Thus, 13 kinds of sensors are selected as the raw data of the aging status, including sensor data with ascending trends (2,3,4,8,9,11,13,15,17) and descending trends (7,12,20,21) for FD001, and 11 kinds of sensors (the 3, 4, 8, 13, 15, 17 have an ascending trend and the 7, 11, 12, 20, 21 have a descending trend) are selected for FD003. For the system working under multiple operating conditions, the trend of sensor data depends mainly on the switching of operating modes, so the degradation trend does not show a noticeable monotonic trend. As shown in Figure 5b, the same sensors 11, 16, 20 and 21 show irregular trends in FD002 (working in multi-modal). Therefore, principal component analysis method is used to reconstruct features based on the principle of maximum difference. The threshold value of total contribution is set to 95% to filter cumulative dynamic difference features of irregular sensors. More details about principal component analysis may be found in [41]. The working mode of the sample in the dataset is represented by three operational settings. To determine the mode switching sequence, k-means clustering method is used to label six working modes. The results are as shown in Figure 6.

Discussion of Results
The detailed description of the proposed method is given in the Section 2. Here, the processed data features are input into the framework. Then, the long-term cumulative degradation prediction and the latest stage degradation prediction are performed separately and the results are fused. After that, the model performance is evaluated and analyzed according to the evaluation metrics.
Since each sub-dataset has larger feature dimensions, considering that there may be a linear correlation between multi-dimensional features, the principal component analysis is used to reconstruct the data to reduce unnecessary calculations. Based that, FD001 retains 27 dimensional features, FD002 retains 14 dimensional features, FD003 retains 23 dimensional features and FD004 retains 14 dimensional features. Then, the multi-objective plan is used to establish health indicators based on monotonicity, threshold similarity and trend similarity. Figure 7 illustrates the comparison of a health indicator and three representative reconstructed features of random samples from different sub-datasets. The points show the data and the solid line is the smoothing data based on moving average smooth method to show the trend of the data more clearly. The experimental results show that the proposed health indicator is more monotonic and better model fitting.   Then, the gated recurrent unit model is used to predict remaining useful life for long-term cumulative degradation. Long short term memory network (LSTM) and stacked autoencoders (SAEs) are used to compare predictions. The results are shown in Figure 8, which indicates the proposed CDDHI-GRU mode has a lower RMSE and score values than other algorithms. In addition, it was found that the values of evaluation metrics of FD002 and FD004 are higher than those of FD001 and FD003, indicating that the samples working in multi-modal switching are more challenging for obtaining accurate prediction results than those working in single mode. On the other hand, the time window is used to extract energy and statistical features. It is worth noting that during decomposing the data of the time window based on ensemble empirical mode decomposition, the number of IMF components N im f is different for different local data. The reason is that the local data of the sample have various fluctuations, and there is noise in the data, so N im f is added as an energy feature.
The size of latest stage sequence L is a key parameter affecting the performance of model. For the sample with periodic mode switching, a multiple of the period is selected as the time window length. For samples with irregular mode switching, the time window length is selected by traversing. In order to avoid late prediction as much as possible, we use the scores to choose the length of time window. Figure 9 shows the effects of L on prediction accuracy, and the sizes of the time windows for the sub-datasets are 30, 55, 40 and 45 respectively. The reason is that a larger time window can cover more information to get better results, but extending the window length too much could adversely affect near-term forecasts. Finally, the model averaging is used to fuse the prediction results. After several experiments, the parameters of each sub-dataset are finally determined. Figure 10 illustrates the comparison of results for long-term cumulative degradation prediction, latest stage degradation prediction and fusion prediction for the same sample. The experimental results show that the long-term cumulative prediction model is more accurate in predicting the remaining useful life of the system at the early stage of aging, and the latest stage degradation prediction model is more accurate at predicting the system in the late stage of aging. The hybrid prediction model combines the advantages of the two and has better overall prediction performance. No matter whether the system is in an early or late stage of the fault, it has a stabler prediction effect on the life cycle of the system.
The estimated results based on the proposed model and the actual values of remaining useful life for the four sub-datasets' test samples are compared in Figure 11. In order to observe the performance of the prediction model intuitively, the test samples are arranged in descending order of the actual values .   0  50  100  150  200  250  Test sample with descending RUL   0  20  40  60  80  100  120  Test sample with descending RUL   0  20  40  60  80  100  0  50  100  150  200  250  Test sample with descending RUL  Test sample with Figure 11 shows that the prediction results of the proposed model are close to the actual value of remaining useful life, and it is worth noting that the prediction ability of the model in the early stage of aging is weaker than that in the late stage of aging. The result is that the fault characteristics in the early stage of aging are not obvious, and with the aging of the system, the prediction accuracy is gradually enhanced. In addition, with the increase in the number of faults and the number of operational modes, it is more difficult to obtain an accurate prediction. However, the overall prediction performance of the model remains stable. When it is difficult to predict accurately, the prediction results tend to an early prediction as opposed to a late prediction.
The proposed method is compared with other popular methods in this dataset to verify the superiority of the proposed method. The recurrent neural network (RNN), the long-short term memory network (LSTM), stacked autoencodesrs (SAEs), convolutional neural networks (CNN), the gradient boosting decision tree (GBDT) and support vector regression (SVR) as representative methods are evaluated in Table 3. The tabular data shows that the model proposed in this paper has better evaluation metric values and has been proven to perform well in both overall prediction and late prediction. Comparing the predictions between four sub-datasets, it was found that to accurately predict the remaining useful life of the system working in multiple modes is more challenging than that of the system working in single mode.

Conclusions
In this paper, a hybrid degradation modeling and prognostic method is proposed for a multi-modal system. Firstly, the cumulative dynamic differential feature is constructed to describe the physical characteristics with the existence of mode switching. The constructed health indicator is based on the three properties, including monotonicity, threshold similarity, and trend similarity. Based on the cumulative dynamic differential health indicator, the long-term cumulative degradation assessment model is constructed by gated recurrent unit. Then, the time window is used to extract energy and statistical features to obtain local features. The latest degradation effect is predicted based on light gradient boosting machine. Finally, the model is averaged to construct a decision-level fusion model.
In the experimental analysis section, the theoretical experiment uses synthetic data to study the sensitivity of the proposed method to different types of input data. Most of systems show a slow aging trend in the early stage and a fast aging trend in the later stage. The synthetic data with exponential functional form are constructed according to based on exponential functional and Gaussian random variable. The experimental result shows the proposed method has good performances on samples with different trends. In addition, experimental analysis on the C-MAPSS turbofan aero-engine dataset, which includes four different working environment sub-datasets, demonstrates the effectiveness and generality of the proposed method. Compared with other popular methods, the proposed model has shown very promising prognostic performance.
In the future, the proposed method will be extended in the following ways: 1) How can one reconstruct data samples while working with different mode switchings from historic data samples? The accuracy of prediction depends on the completeness of the dataset. An effective reconstruction sample model could supplement the dataset to improve the accuracy of prediction. 2) How can one construct the health indicators which have better prediction performance? This paper constructed a health indicator based on three properties, including monotonicity, threshold similarity and trend similarity. Exploring more properties of the health indicators and studying optimization solution of complex multi-objective programming are the challenges.