An Ensemble Prognostic Method of Francis Turbine Units Using Low-Quality Data under Variable Operating Conditions

The prognostic is the key to the state-based maintenance of Francis turbine units (FTUs), which consists of performance state evaluation and degradation trend prediction. In practical engineering environments, there are three significant difficulties: low data quality, complex variable operation conditions, and prediction model parameter optimization. In order to effectively solve the above three problems, an ensemble prognostic method of FTUs using low-quality data under variable operation conditions is proposed in this study. Firstly, to consider the operation condition parameters, the running data set of the FTU is constructed by the water head, active power, and vibration amplitude of the top cover. Then, to improve the robustness of the proposed model against anomaly data, the density-based spatial clustering of applications with noise (DBSCAN) is introduced to clean outliers and singularities in the raw running data set. Next, considering the randomness of the monitoring data, the healthy state model based on the Gaussian mixture model is constructed, and the negative log-likelihood probability is calculated as the performance degradation indicator (PDI). Furthermore, to predict the trend of PDIs with confidence interval and automatically optimize the prediction model on both accuracy and certainty, the multiobjective prediction model is proposed based on the non-dominated sorting genetic algorithm and Gaussian process regression. Finally, monitoring data from an actual large FTU was used for effectiveness verification. The stability and smoothness of the PDI curve are improved by 3.2 times and 1.9 times, respectively, by DBSCAN compared with 3-sigma. The root-mean-squared error, the prediction interval normalized average, the prediction interval coverage probability, the mean absolute percentage error, and the R2 score of the proposed method achieved 0.223, 0.289, 1.000, 0.641%, and 0.974, respectively. The comparison experiments demonstrate that the proposed method is more robust to low-quality data and has better accuracy, certainty, and reliability for the prognostic of the FTU under complex operating conditions.


Introduction
With the optimization of global energy structure, hydropower has been vigorously developed [1]. As critical equipment of hydropower utilization, it is important to ensure the safe and stable operation of Francis turbine units (FTUs) [2]. In order to transform the service mode of the FTUs from traditional schedule-based maintenance to state-based maintenance, the prognostic of FTUs has received more and more attention [3]. There are two main parts in the prognostic of FTUs, including performance state evaluation and degradation trend prediction [4]. Based on this, the current and future states of FTUs can be determined so as to formulate a targeted maintenance strategy. However, in practical engineering environments, there are three significant difficulties, including low data quality, complex variable operation conditions, and prediction model parameter optimization.
Thus the NSGA-II is adopted to optimize the GPR on the two objectives of accuracy and certainty to construct the multiobjective GPR (MOGPR) for degradation trend prediction of FTUs.
According to the above discussion, in the relative field of the prognostic of FTUs, there are few targeted processing methods for low-quality data obtained in practical engineering environments. Meanwhile, existing evaluation methods of FTU under complex conditions rarely consider the randomness of monitoring data. Moreover, in the interval prediction of PDIs, it is still difficult to optimize model parameters automatically while considering both the accuracy and certainty of results.
In this paper, an ensemble prognostic method of FTUs using low-quality data under variable operation conditions is proposed. The major contributions are outlined as follows: (1) A monitoring data set cleaning approach of FTUs based on DBSCAN is proposed to identify both the singulars and outliers, which enhances the stability and smoothness of the obtained PDI curve. (2) The running data set of the FTU is constructed by fusing the operation condition parameters and the monitoring data. The HSM is established based on the GMM to realize accurate performance evaluation of the FTU under complex operating conditions to improve the robustness of the performance evaluation model against data missing and data randomness. (3) Coupling NSGA-II and GPR, the MOGPR is constructed to automatically tune model parameters, avoiding the dependence on prior knowledge, and the accuracy and certainty of probability prediction are improved synchronously.
The remainder of this paper is organized as follows: In Section 2, the proposed approach framework and related theories are expounded. In Section 3, an engineering application of the proposed method is presented. In Section 4, different data cleaning methods, HSM construction, and trend prediction are compared and discussed. Finally, a summary is presented in Section 5.

Proposed Approach
To evaluate and predict the performance state of FTUs more accurately under the circumstance of low-quality data and variable operation conditions, an ensemble prognostic method of FTUs using low-quality data under variable operation conditions is proposed in this paper, which mainly includes four steps: data acquisition, data cleaning, performance state evaluation, and degradation trend prediction. The proposed framework flow chart is shown in Figure 1. First, the monitoring data of the water head, active power, and vibration amplitude of the top cover are integrated to construct the running data set of the FTU. Second, aiming at data loss and data anomaly, the data cleaning operation based on DBSCAN is implemented. Third, to solve the problem of monitoring data fluctuating with operation conditions, the HSM is established based on GMM. Then the negative log-likelihood probability (NLLP) is calculated as the PDI of the FTU. The relative trend of PDIs over time reflects the process of performance degradation. Fourth, the MOGPR model is constructed to predict the performance degradation trend of the FTU and takes both prediction accuracy and confidence interval into consideration. Finally, the validity of the performance evaluation model and degradation trend prediction model is evaluated by multi-criterions.

Data Acquisition
Due to their task of regulating the power grid, the operation condition parameters of FTUs change more frequently than other kinds of rotating machines. Therefore, unlike most traditional methods, operation condition parameters are taken into full consideration in the performance state evaluation in this study. The major operation condition parameters of FTUs include rotation speed (n), water head (H), active power (P), flow rate (Q), guide vanes opening degree (α), etc. Since the duration of transient operation conditions is particularly short compared with the total working time of FTUs, only steady operation conditions are considered in this research. So n can be considered equal to the rated value of the FTU. For a specific FTU, there is a certain relationship between H, P, Q, and α. If two of them are identified, others can be inquired from the comprehensive operation curve of the FTU [30]. Thus H and P are chosen as the studied operation condition parameters. As a critical component of the FTU, the top cover is used to seal the runner and support the main shaft. Its vibration amplitude (V) can reflect the performance state of the FTU. In conclusion, the running sample set of the FTU is formed by (H, P, V), including both operation condition parameters and monitoring data.

Data Cleaning Based on DBSCAN
To solve the data anomaly in the raw running sample set, a data cleaning approach based on DBSCAN is proposed. Traditional statistics-based methods such as the 3-sigma principle are effective in detecting singular points, but they can hardly identify outliers whose value is

Data Acquisition
Due to their task of regulating the power grid, the operation condition parameters of FTUs change more frequently than other kinds of rotating machines. Therefore, unlike most traditional methods, operation condition parameters are taken into full consideration in the performance state evaluation in this study. The major operation condition parameters of FTUs include rotation speed (n), water head (H), active power (P), flow rate (Q), guide vanes opening degree (α), etc. Since the duration of transient operation conditions is particularly short compared with the total working time of FTUs, only steady operation conditions are considered in this research. So n can be considered equal to the rated value of the FTU. For a specific FTU, there is a certain relationship between H, P, Q, and α. If two of them are identified, others can be inquired from the comprehensive operation curve of the FTU [30]. Thus H and P are chosen as the studied operation condition parameters. As a critical component of the FTU, the top cover is used to seal the runner and support the main shaft. Its vibration amplitude (V) can reflect the performance state of the FTU. In conclusion, the running sample set of the FTU is formed by (H, P, V), including both operation condition parameters and monitoring data.

Data Cleaning Based on DBSCAN
To solve the data anomaly in the raw running sample set, a data cleaning approach based on DBSCAN is proposed. Traditional statistics-based methods such as the 3-sigma principle are effective in detecting singular points, but they can hardly identify out- liers whose value is within the normal range [31]. DBSCAN clustering algorithm has the advantages of high adaptability, extensible dimension, and applicability to arbitrary data shape [32]. It can automatically identify the singularities and outliers in the multidimensional data set. Its schematic diagram is shown in Figure 2, and the main steps are as follows [33]: within the normal range [31]. DBSCAN clustering algorithm has the advantages of high adaptability, extensible dimension, and applicability to arbitrary data shape [32]. It can automatically identify the singularities and outliers in the multi-dimensional data set. Its schematic diagram is shown in Figure 2, and the main steps are as follows [33]: Step 1: For the sample set , the region whose Euclidean distance from the sample point i ϕ is less than ε is defined as the ε neighborhood of i ϕ . If the sample number in the neighborhood of i ϕ is greater than the threshold M, i ϕ is defined as the core point. Samples that are not core points but are in the ε neighborhood of a core point are defined as boundary points. Samples that are not core points and are not boundary points are defined as noise points. The ε neighborhood of a core point is defined as a temporary cluster j C′ .
Step 2: Traversal the sample set Φ , if the sample point i ϕ in the temporary cluster m C′ is also a core point in another temporary cluster n C′ , the union set m n C C ′ ′  is defined as a new temporary cluster.
Step 3: Repeat Step 2 until the sample points in each temporary cluster are all core points or boundary points, then each temporary cluster is determined as a cluster k C′ .

Performance State Evaluation Based on GMM
The performance degradation process of the FTU is reflected in the PDD variation of the running sample set (H, P, V). Thus the GMM is adopted to fit the three-dimensional PDD function of the healthy data as the HSM. In GMM, the population distribution of the sample set is assumed to be a combination of a series of Gaussian distributions: where X represents the healthy data set, k w represents the weight coefficient.
is the PDD function of the k th Gaussian component given by: The expectation-maximization algorithm is used to estimate k w , and k θ of the GMM [34]: Step 1: For the sample set Φ = {ϕ 1 , ϕ 2 , · · · , ϕ N }, the region whose Euclidean distance from the sample point ϕ i is less than ε is defined as the ε neighborhood of ϕ i . If the sample number in the neighborhood of ϕ i is greater than the threshold M, ϕ i is defined as the core point. Samples that are not core points but are in the ε neighborhood of a core point are defined as boundary points. Samples that are not core points and are not boundary points are defined as noise points. The ε neighborhood of a core point is defined as a temporary cluster C j .
Step 2: Traversal the sample set Φ, if the sample point ϕ i in the temporary cluster C m is also a core point in another temporary cluster C n , the union set C m ∪ C n is defined as a new temporary cluster.
Step 3: Repeat Step 2 until the sample points in each temporary cluster are all core points or boundary points, then each temporary cluster is determined as a cluster C k .

Performance State Evaluation Based on GMM
The performance degradation process of the FTU is reflected in the PDD variation of the running sample set (H, P, V). Thus the GMM is adopted to fit the three-dimensional PDD function of the healthy data as the HSM. In GMM, the population distribution of the sample set is assumed to be a combination of a series of Gaussian distributions: where X represents the healthy data set, w k represents the weight coefficient.
is the PDD function of the k th Gaussian component given by: The expectation-maximization algorithm is used to estimate w k , and θ k of the GMM [34]: whereθ is the maximum likelihood estimate value of θ. The NLLP represents the probability that current data is observed based on the prior given by the GMM. It represents the difference between the PDD of running data X and the constructed HSM. So the NLLP is calculated as the PDI of the FTU.

Degradation Trend Prediction Based on MOGPR
Through the above procedures, the performance state of the FTU is quantified as PDIs. The performance degradation prediction of the FTU is transformed to a time series prediction task. In this section, the GPR and NSGA-II are combined to construct the MOGPR model for the degradation trend prediction.

GPR Algorithm
As a nonparametric Bayesian inference model, GPR is widely used in probability interval prediction [35,36]. In GPR, the distribution of possible values at each time point is assumed to obey the Gaussian distribution, which can be expressed in terms of the expectation function µ f and the covariance function κ(·) as: where X is the independent variables, κ(·) is also called the kernel function, κ f f = κ(X, X). The joint distribution of the actual observed values Y * and Y also follows the Gaussian distribution.
Finally, the GPR model can be expressed as: The kernel functions κ(·) are used to enhance the representation of relationships between input samples. Various kernel functions have different properties and characteristics. The commonly used kernel functions include radial basis function (RBF), matern (MA), rational quadratic (RQ). The kernel function adopted in this study is composed of them.
These kernel functions all contain a length scale l. To improve the performance of the model, the common approach is to adjust l manually, which is less efficient. Aiming at this problem, the NSGA-II algorithm is introduced to automatically optimize the GPR model. Since GPR outputs the PDD of predicted values, quantiles and confidence intervals (CIs) can be calculated directly. Interval prediction results are usually evaluated by multiple criteria, including the root-mean-square error (RMSE), which reflects the accuracy, and the prediction interval normalized average (PINAW), which reflects certainty, defined as: (12) where N is the number of series data,ŷ and y * represents the prediction value and the actual value, respectively. ∆y * is the difference between the maximum and minimum values of the actual series, u i and l i represents the upper and lower boundary of 95% CI. Lower RMSEs and PINAWs indicate better accuracy and certainty of the prediction model.

NSGA-II Algorithm
NSGA-II makes excellent improvements on NSGA, and it has faster computational efficiency and population diversity [37]. There are two basic concepts in NSGA-II: nondominated sorting and crowding distance. The procedure of non-dominated sorting begins with the identification of non-dominated solutions. As shown in Figure 3, for two members m i and m j , if all the objectives of m i are better than m j , m i is defined to dominate m j . Then, the members which are not dominated by others constitute the current front. Next, the members of the current front are removed, and the sorting is performed on the remained population. The procedure is repeated until all the members are distributed to different fronts. prediction interval normalized average (PINAW), which reflects certainty, defined as: where N is the number of series data, ŷ and * y represents the prediction value and the actual value, respectively. * y Δ is the difference between the maximum and minimum values of the actual series, i u and i l represents the upper and lower boundary of 95% CI.
Lower RMSEs and PINAWs indicate better accuracy and certainty of the prediction model.

NSGA-II Algorithm
NSGA-II makes excellent improvements on NSGA, and it has faster computational efficiency and population diversity [37]. There are two basic concepts in NSGA-II: nondominated sorting and crowding distance. The procedure of non-dominated sorting begins with the identification of non-dominated solutions. As shown in Figure  ( ) where K is the dimension number of the objectives. Selecting members with a high crowding distance can improve the diversity of the population.  The crowding distance is used to measure the density of members. It is defined as the sum of the side length of the cuboid shown in Figure 3.
where K is the dimension number of the objectives. Selecting members with a high crowding distance can improve the diversity of the population.
The brief schematic of NSGA-II is illustrated in Figure 4, and the main steps are as follows: Step 1: The population is initialized. Then, the offspring population Q t is generated from the parental population P t through crossover and mutation operations. The population sizes of P t and Q t are both N.
Step 2: The P t and Q t are merged to form the R t . The non-dominated sorting is performed on R t , and a series of fronts F i are obtained. Step 3: The fronts are selected in sort order to form the P t+1 until the population size of P t+1 exceeds N. The members of the last front F l are sorted by the crowding distance. The members are selected in order until the population size of P t+1 equals N.
Step 4: The above steps are repeated until the maximum number of evolution is reached. Then the first front is selected as the Pareto front.
The brief schematic of NSGA-II is illustrated in Figure 4, and the main steps are as follows: Step 1: The population is initialized. Then, the offspring population Qt is generated from the parental population Pt through crossover and mutation operations. The population sizes of Pt and Qt are both N.
Step 2: The Pt and Qt are merged to form the Rt. The non-dominated sorting is performed on Rt, and a series of fronts Fi are obtained.
Step 3: The fronts are selected in sort order to form the Pt+1 until the population size of Pt+1 exceeds N. The members of the last front Fl are sorted by the crowding distance. The members are selected in order until the population size of Pt+1 equals N.
Step 4: The above steps are repeated until the maximum number of evolution is reached. Then the first front is selected as the Pareto front.

MOGPR Model
The GPR and the NSGA-II are coupled to construct the MOGPR model. The threedimensional decision vector is formed by the length scales of RBF, MA, and RQ in the GPR model. The multiple objectives include both RMSE and PINAW. The schematic diagram of the MOGPR is illustrated in Figure 5. The GPR model sets parameter values according to the decision vector generated by NSGA-II and calculates the prediction results. Then the results are compared with the actual values to obtain multi objectives. NSGA-II updates population location according to the multi objectives. Through several epochs of evolution, the Pareto front of the optimal result is finally output.

Engineering Application
In this section, the long-term monitoring data of a large actual FTU is adopted to conduct the experiments. The basic information of the FTU and the data source are described first. Then the data cleaning based on DBSCAN is implemented on the raw run-

MOGPR Model
The GPR and the NSGA-II are coupled to construct the MOGPR model. The threedimensional decision vector is formed by the length scales of RBF, MA, and RQ in the GPR model. The multiple objectives include both RMSE and PINAW. The schematic diagram of the MOGPR is illustrated in Figure 5. The GPR model sets parameter values according to the decision vector generated by NSGA-II and calculates the prediction results. Then the results are compared with the actual values to obtain multi objectives. NSGA-II updates population location according to the multi objectives. Through several epochs of evolution, the Pareto front of the optimal result is finally output.
The brief schematic of NSGA-II is illustrated in Figure 4, and the main steps are as follows: Step 1: The population is initialized. Then, the offspring population Qt is generated from the parental population Pt through crossover and mutation operations. The population sizes of Pt and Qt are both N.
Step 2: The Pt and Qt are merged to form the Rt. The non-dominated sorting is performed on Rt, and a series of fronts Fi are obtained.
Step 3: The fronts are selected in sort order to form the Pt+1 until the population size of Pt+1 exceeds N. The members of the last front Fl are sorted by the crowding distance. The members are selected in order until the population size of Pt+1 equals N.
Step 4: The above steps are repeated until the maximum number of evolution is reached. Then the first front is selected as the Pareto front.

MOGPR Model
The GPR and the NSGA-II are coupled to construct the MOGPR model. The threedimensional decision vector is formed by the length scales of RBF, MA, and RQ in the GPR model. The multiple objectives include both RMSE and PINAW. The schematic diagram of the MOGPR is illustrated in Figure 5. The GPR model sets parameter values according to the decision vector generated by NSGA-II and calculates the prediction results. Then the results are compared with the actual values to obtain multi objectives. NSGA-II updates population location according to the multi objectives. Through several epochs of evolution, the Pareto front of the optimal result is finally output.

Engineering Application
In this section, the long-term monitoring data of a large actual FTU is adopted to conduct the experiments. The basic information of the FTU and the data source are described first. Then the data cleaning based on DBSCAN is implemented on the raw run-

Engineering Application
In this section, the long-term monitoring data of a large actual FTU is adopted to conduct the experiments. The basic information of the FTU and the data source are described first. Then the data cleaning based on DBSCAN is implemented on the raw running data set. Next, the HSM of the FTU under variable operation conditions is constructed based on GMM. Finally, the performance degradation trend of the FTU is predicted by the proposed MOGPR.

Data Description
The studied FTU is located in the Dadu River basin, Sichuan, China. Its basic parameters are listed in Table 1. A set of PSTA-2100 state monitoring systems, including on-site monitoring cabinet and upper computer system, is configured on the FTU. The on-site monitoring cabinet is formed by a sensor power module, data acquisition module, synchronous clock module, and industry cabinet. It is located near the FTU. The top cover vibration (V) is monitored by the acceleration sensor (PCB 352A60). The water head (H) and the active power (P) are obtained through communication with the supervisory control system of the plant following the modbus 485 protocol. The collected data are transmitted to the upper computer through the communication link following TCP/IP protocol. The actual length of the communication link, which consists of optical cables, switches, routers, repeaters, and transceivers, is above 1000 m. The analyzed sample set (H, P, V) is constructed by the data records exported from the upper computer. The overall structure of the FTU and the data acquisition system are illustrated in Figure 6. To sum up, the data acquisition process involves a series of devices and processes, which are prone to data loss and data abnormality.  The exported data includes 104,454 historical records from 20 January 2019 to 11 October 2019 for a total of 264 days, as shown in Figure 7. The on-site dataset is quite different from the high-quality data obtained in the ideal experimental environment, such as the bearing life cycle dataset published by the Center for Intelligent Maintenance Systems, University of Cincinnati [38]. Most experimental data sets are less disturbed by external interference and have stable sampling frequency. However, it can be seen that the data loss and data anomaly are very obvious in the raw on-site data set. Data loss leads to unequal time stamps of samples. As shown in Figure 8, the time intervals between adja-  Figure 7. The on-site dataset is quite different from the high-quality data obtained in the ideal experimental environment, such as the bearing life cycle dataset published by the Center for Intelligent Maintenance Systems, University of Cincinnati [38]. Most experimental data sets are less disturbed by external interference and have stable sampling frequency. However, it can be seen that the data loss and data anomaly are very obvious in the raw on-site data set. Data loss leads to unequal time stamps of samples. As shown in Figure 8, the time intervals between adjacent samples ranged from 10 s to 10 3 s, which makes it difficult to analyze the frequency domain features of signals. In addition, because of the task of grid regulation and seasonal changes in the water head, the operation condition parameters of the FTU change frequently and drastically. Low data quality and variable operation conditions greatly influence the performance state evaluation and degradation trend prediction of the FTU. The exported data includes 104,454 historical records from 20 January 2019 to 11 October 2019 for a total of 264 days, as shown in Figure 7. The on-site dataset is quite different from the high-quality data obtained in the ideal experimental environment, such as the bearing life cycle dataset published by the Center for Intelligent Maintenance Systems, University of Cincinnati [38]. Most experimental data sets are less disturbed by external interference and have stable sampling frequency. However, it can be seen that the data loss and data anomaly are very obvious in the raw on-site data set. Data loss leads to unequal time stamps of samples. As shown in Figure 8, the time intervals between adjacent samples ranged from 10 s to 10 3 s, which makes it difficult to analyze the frequency domain features of signals. In addition, because of the task of grid regulation and seasonal changes in the water head, the operation condition parameters of the FTU change frequently and drastically. Low data quality and variable operation conditions greatly influence the performance state evaluation and degradation trend prediction of the FTU.

Data Cleaning
The raw running sample set (H, P, V) is exhibited in Figure 9. Due to the characteristic of the FTU, there is a specific limited operation region within the operating conditions. In the limited area, the stability and efficiency of the FTU decrease. In practice, the FTU is avoided from working in the limited area, so the sample points are concentrated in two regions. Because of various interference factors, the data anomaly is very obvious in the raw data set. The data anomaly mainly includes singular points whose values deviate significantly from the normal level and outliers whose values are within the normal scope, but their distribution deviates from the valid samples. To reduce the impact of data anomaly, DBSCAN was adopted to identify the singulars and outliers in the original sample set. The ε and M were set as 7.5 and 200, respectively. The processing result is shown in

Data Cleaning
The raw running sample set (H, P, V) is exhibited in Figure 9. Due to the characteristic of the FTU, there is a specific limited operation region within the operating conditions. In the limited area, the stability and efficiency of the FTU decrease. In practice, the FTU is avoided from working in the limited area, so the sample points are concentrated in two regions. Because of various interference factors, the data anomaly is very obvious in the raw data set. The data anomaly mainly includes singular points whose values deviate significantly from the normal level and outliers whose values are within the normal scope, but their distribution deviates from the valid samples. To reduce the impact of data anomaly, DBSCAN was adopted to identify the singulars and outliers in the original sample set. The ε and M were set as 7.5 and 200, respectively. The processing result is shown in Figure 10. The two sample concentrated regions are marked as cluster 1 and cluster 2. Singulars and outliers are both marked as anomaly data automatically. Since the DBSCAN considers the distribution density of samples rather than the value of data, it performs well in identifying both outliers and singularities in the low-quality data set. Singulars and outliers are both marked as anomaly data automatically. Since the DBSCAN considers the distribution density of samples rather than the value of data, it performs well in identifying both outliers and singularities in the low-quality data set.

Performance State Evaluation of the FTU
Routine maintenance was implemented before 20 January 2019 on the studied FTU, and the restart test run performed normally. Moreover, as shown in Figure 7, the period from 20 January to 6 May 2019 includes most of the possible operating conditions of the FTU. Therefore, a total of 58,175 valid data points during this period were selected to construct the healthy sample set.
The GMM was adopted to fit the three-dimensional PDD function of the healthy sample set for the HSM construction of the FTU, as illustrated in Figure 11a

Performance State Evaluation of the FTU
Routine maintenance was implemented before 20 January 2019 on the studied FTU, and the restart test run performed normally. Moreover, as shown in Figure 7, the period from 20 January to 6 May 2019 includes most of the possible operating conditions of the FTU. Therefore, a total of 58,175 valid data points during this period were selected to construct the healthy sample set.
The GMM was adopted to fit the three-dimensional PDD function of the healthy sample set for the HSM construction of the FTU, as illustrated in Figure 11a The valid data after 6 May 2019 were selected to construct the evaluation sample set. All 300 evaluation samples were input into the HSM as a group to calculate the PDIs. The timestamp corresponding to the last evaluation sample in each group was taken as the evaluation time. The obtained performance degradation curve, including 135 PDIs, is demonstrated in Figure 12. Although the PDI curve fluctuates locally due to low data quality, the curve has an obvious upward trend reflecting the performance degradation process of the FTU. The valid data after 6 May 2019 were selected to construct the evaluation sample set. All 300 evaluation samples were input into the HSM as a group to calculate the PDIs. The timestamp corresponding to the last evaluation sample in each group was taken as the evaluation time. The obtained performance degradation curve, including 135 PDIs, is demonstrated in Figure 12. Although the PDI curve fluctuates locally due to low data quality, the curve has an obvious upward trend reflecting the performance degradation process of the FTU.
The valid data after 6 May 2019 were selected to construct the evaluation sample set. All 300 evaluation samples were input into the HSM as a group to calculate the PDIs. The timestamp corresponding to the last evaluation sample in each group was taken as the evaluation time. The obtained performance degradation curve, including 135 PDIs, is demonstrated in Figure 12. Although the PDI curve fluctuates locally due to low data quality, the curve has an obvious upward trend reflecting the performance degradation process of the FTU.

Degradation Trend Prediction of the FTU
To forecast the performance degradation trend of the FTU, a rolling prediction model based on MOGPR was established. Every seven PDIs, as the time window, were input into the prediction model, and the next PDI was predicted. The sliding step was set as one PDI point. As the time window slides, the prediction model is constantly updated.
The RBF, MA, and RQ were chosen to construct the kernel function of the GPR model.  Table 2. The prediction result that is closest to the origin in the Pareto front of multiobjective optimization is exhibited in Figure 13. The proposed MOGPR takes both accuracy and certainty of interval prediction into account. The maximum error does not exceed 0.47, and the maximum width of CI does not exceed 3.41.

Degradation Trend Prediction of the FTU
To forecast the performance degradation trend of the FTU, a rolling prediction model based on MOGPR was established. Every seven PDIs, as the time window, were input into the prediction model, and the next PDI was predicted. The sliding step was set as one PDI point. As the time window slides, the prediction model is constantly updated.
The RBF, MA, and RQ were chosen to construct the kernel function of the GPR model. The length scales l RBF , l MA and l RQ were optimized with the NSGA-II according to two objectives of RMSE and PINAW. The search scopes of the above three length scales were all set as [10 −5 , 1]. The parameter configuration of the NSGA-II is listed in Table 2. The prediction result that is closest to the origin in the Pareto front of multiobjective optimization is exhibited in Figure 13. The proposed MOGPR takes both accuracy and certainty of interval prediction into account. The maximum error does not exceed 0.47, and the maximum width of CI does not exceed 3.41.

Comparison Analyses
To validate the effectiveness of the proposed approach, different methods of data cleaning, HSM construction, and trend prediction were compared. The experiments were

Comparison Analyses
To validate the effectiveness of the proposed approach, different methods of data cleaning, HSM construction, and trend prediction were compared. The experiments were conducted by Python 3.7.9 in a calculation station with an Intel Core i9-10900K CPU, an NVIDIA GeForce RTX 2080 Super GPU, and 64 GB RAM.

Comparison of Different Data Cleaning Methods
To validate the effectiveness of the proposed data cleaning method, the DBSCAN and 3-sigma principle are compared in this section. Different data sets were adopted to establish HSMs, including the raw data set Φ 0 , the 3-sigma-processed data set Φ 1 , and the DBSCAN-processed data set Φ 2 . The processing result of the 3-sigma principle is shown in Figure 14. Compared with Figure 10, it can be discovered that the 3-sigma principle can recognize partial singularities but can hardly identify outliers.
Then the PDI curves were calculated in the same way as Section 3.3. The STD and S were adopted as the criteria of stability and smoothness, defined as follows: where I represents the PDI and N is the number of PDIs in the curve. Smaller S and STD indicate that the curve is smoother and the represented performance degradation trend is clearer.
conducted by Python 3.7.9 in a calculation station with an Intel Core i9-10900K CPU, an NVIDIA GeForce RTX 2080 Super GPU, and 64 GB RAM.

Comparison of Different Data Cleaning Methods
To validate the effectiveness of the proposed data cleaning method, the DBSCAN and 3-sigma principle are compared in this section. Different data sets were adopted to establish HSMs, including the raw data set 0 Φ , the 3-sigma-processed data set 1 Φ , and the DBSCAN-processed data set 2 Φ . The processing result of the 3-sigma principle is shown in Figure 14. Compared with Figure 10, it can be discovered that the 3-sigma principle can recognize partial singularities but can hardly identify outliers. Then the PDI curves were calculated in the same way as Section 3.3. The STD and S were adopted as the criteria of stability and smoothness, defined as follows: ( ) where I represents the PDI and N is the number of PDIs in the curve. Smaller S and STD indicate that the curve is smoother and the represented performance degradation trend is clearer. The comparison results are shown in Table 3 and Figure 15. It can be seen that: (1) Due to the influence of data anomaly, the performance degradation curve obtained by the raw data set presents many mutation points, which is difficult to accurately reflect the real performance state of the FTU.
(2) There are 28.7% and 50.2% decreases in the STD and S of the raw PDI curve, respectively, using 3-sigma, while there are 93.1% and 97.2% decreases in the STD and S, respectively, using DBSCAN. The stability and smoothness of PDI curves are improved by 3.2 times and 1.9 times, respectively, by DBSCAN compared with 3-sigma (3) After data cleaning based on DBSCAN, the calculated PDI curve has the lowest STD and S (1.42 and 0.24), which indicates the curve reveals the performance degradation trend more clearly. The comparison results are shown in Table 3 and Figure 15. It can be seen that: (1) Due to the influence of data anomaly, the performance degradation curve obtained by the raw data set presents many mutation points, which is difficult to accurately reflect the real performance state of the FTU. (2) There are 28.7% and 50.2% decreases in the STD and S of the raw PDI curve, respectively, using 3-sigma, while there are 93.1% and 97.2% decreases in the STD and S, respectively, using DBSCAN. The stability and smoothness of PDI curves are improved by 3.2 times and 1.9 times, respectively, by DBSCAN compared with 3-sigma (3) After data cleaning based on DBSCAN, the calculated PDI curve has the lowest STD and S (1.42 and 0.24), which indicates the curve reveals the performance degradation trend more clearly.

Comparison of Different Methods for HSM Construction
To compare the robustness of HSMs towards data loss, different ratios (r) of randomly selected healthy samples were adopted to build HSMs based on GMM, SVM, and . The major parameters of the models mentioned above are listed in Table 4. Then the entire healthy sample set was input into the HSMs to calculate PDIs. Each experiment was repeated 100 times, and the STDs of results were calculated. In order to make the PDI results of different HSMs comparable, the maximum-minimum normalization was implemented to transform the value range of PDIs to [0, 1]. Unlike GMM, SVM and BPNN establish the mapping relationship between operation condition parameters and vibration amplitude values. Thus the PDI of HSMs based on SVM and BPNN was defined as: where * V , * H , and * P represent the real values of vibration amplitude, water head, and active power. V denotes the mapping value outputted by the HSM. The comparison results are illustrated in Figure 16. As the sample quantity decreases, the results' STD based on GMM increases slowly when r is greater than 0.6. The results' STDs based on SVM and BPNN are relatively higher and have no significant relationship with the data quantity of HSM construction. Since the HSM based on GMM looks at the PDD of samples rather than the specific values of individual samples, the actual distribution of the population can be estimated from the sample distribution as long as the total quantity of samples is enough. Therefore, the proposed HSM is less affected by data loss.

Comparison of Different Methods for HSM Construction
To compare the robustness of HSMs towards data loss, different ratios (r) of randomly selected healthy samples were adopted to build HSMs based on GMM, SVM, and BPNN, respectively, r ∈ [0.2, 0.4, 0.6, 0.8, 1]. The major parameters of the models mentioned above are listed in Table 4. Then the entire healthy sample set was input into the HSMs to calculate PDIs. Each experiment was repeated 100 times, and the STDs of results were calculated. In order to make the PDI results of different HSMs comparable, the maximum-minimum normalization was implemented to transform the value range of PDIs to [0, 1]. Unlike GMM, SVM and BPNN establish the mapping relationship between operation condition parameters and vibration amplitude values. Thus the PDI of HSMs based on SVM and BPNN was defined as: where V * , H * , and P * represent the real values of vibration amplitude, water head, and active power.V denotes the mapping value outputted by the HSM. The comparison results are illustrated in Figure 16. As the sample quantity decreases, the results' STD based on GMM increases slowly when r is greater than 0.6. The results' STDs based on SVM and BPNN are relatively higher and have no significant relationship with the data quantity of HSM construction. Since the HSM based on GMM looks at the PDD of samples rather than the specific values of individual samples, the actual distribution of the population can be estimated from the sample distribution as long as the total quantity of samples is enough. Therefore, the proposed HSM is less affected by data loss.

Comparison of Different Prediction Methods
To verify the effectiveness of the proposed MOGPR, different prediction models are compared, including two multiobjective optimization machine learning models named MOQRLSTM and MOQRNN, two machine learning models without parameter optimization named QRLSTM and QRNN [39], three nonparametric statistical regression models named locally weighted linear regression (LWLR) [40] and kernel regression (KR) [41]. The loss functions of QRLSTM and QRNN are defined as quantile regression functions [42], which makes them predict conditional quantiles. According to the two objectives of RMSE and PINAW, the node numbers of each hidden layer of QRLSTM and QRNN are optimized by NSGA-II to construct the MOQRLSTM and MOQRNN. The NSGA-II is set as the same as Section 3.4. LWLR and KR are widely used to predict time series data with trend terms of different scales because their algorithms are simple, and the time costs are low. The main parameters of the mentioned models are listed in Table 5.

Comparison of Different Prediction Methods
To verify the effectiveness of the proposed MOGPR, different prediction models are compared, including two multiobjective optimization machine learning models named MOQRLSTM and MOQRNN, two machine learning models without parameter optimization named QRLSTM and QRNN [39], three nonparametric statistical regression models named locally weighted linear regression (LWLR) [40] and kernel regression (KR) [41]. The loss functions of QRLSTM and QRNN are defined as quantile regression functions [42], which makes them predict conditional quantiles. According to the two objectives of RMSE and PINAW, the node numbers of each hidden layer of QRLSTM and QRNN are optimized by NSGA-II to construct the MOQRLSTM and MOQRNN. The NSGA-II is set as the same as Section 3.4. LWLR and KR are widely used to predict time series data with trend terms of different scales because their algorithms are simple, and the time costs are low. The main parameters of the mentioned models are listed in Table 5. For a more complete performance comparison of the various models, in addition to the two objectives of RMSE and PINAW, the other fourcriteria named prediction interval coverage probability (PICP), mean absolute percentage error (MAPE), R2_score, and probability integral transform (PIT) are introduced [43]. The RMSE represents the accuracy of prediction results; the PINAW indicates the certainty of the prediction interval. Their definitions are given in Equations (11) and (12), respectively. The MAPE is complementary to RMSE and reflects the relative accuracy of the prediction, defined as: The R2_score represents the proportion that can be explained by the fitted regression relationship in the overall change trend of the dependent variable, defined as: where y is the mean of the actual value y * . A R2_score close to 1 indicates that the prediction model is more reliable.
The PICP represents the probability that the predicted CI contains the observed values, expressed as: where f i indicates whether the observed value is within the predicted CI. A higher PICP indicates the interval prediction result is more accurate.
The PIT values reflect the correlation between the prediction distribution and the actual distribution, defined as: (20) whereP is the prediction distribution and y * is the observed value. If the distribution of PIT values is similar to the uniform distribution, the prediction reliability is better. The Pareto fronts of optimization results of MOGPR, MOQRLSTM, and MOQRNN are illustrated in Figure 17. The results of MOGPR are generally closer to the origin. The lowest PINAW values reached by the 3 models are similar (about 0.243~0.271), while the lowest RMSE value obtained by MOGPR (about 0.199) is smaller than MOQRLSTM's (about 0.262) and MOQRNN's (about 0.273). As two independent optimization objectives, RMSE and PINAW have the same dimension of quantity and similar value range. In this study, the importance of the two objectives is considered equal. So the result closest to the origin in the Pareto front is selected as the final result of three multiobjective optimization prediction models.
The multi metrics of different models are listed in Table 6, including RMSE, PINAW, PICP, MAPE, R2_score, and cost time. The prediction results are shown in Figure 18, and the quantile-quantile (QQ) plots of PITs are illustrated in Figure 19. The areas between the purple dotted lines in QQ plots represent PITs that follow the uniform distribution at a 5% significance level. The analysis of the results is as follows: (1) The RMSE and the MAPE of MOGPR are the lowest (0.223 and 0.641%), and the R2_score of MOGPR is the highest (0.974). As shown in Figure 18a, MOGPR has a good prediction effect on both long-term trends and local fluctuation of the PDI curve. It proves that the prediction accuracy of MOGPR is the best. (2) The PINAW of LWLR is the lowest (0.080), but the RMSE of LWLR is the highest (0.472). As can be seen from Figure 18g, LWLR cannot effectively forecast the local fluctuation of data. Except for LWLR, the PINAW of MOGPR is lower than other models, which demonstrates that MOGPR has better certainty.
(3) The RMSEs, PINAWs, MAPEs, and R2_scores of multiobjective optimization models (MOGPR, MOQRLSTM, and MOQRNN) are improved compared with original models (GPR, QRLSTM, and QRNN), which indicates that the multiobjective optimization process can effectively improve the performance of the original models. (4) The PICPs of most models are higher than 0.95, apart from LWLR. The PICPs of MOGPR, GPR, and KR achieve 1.0, which means all the actual points fall within the prediction 95% CIs. It also indicates that compared with other criteria, PICP is easier to reach the maximum value, so it is not suitable to be an optimization objective. (5) Because of the process of parameter optimization, the cost time of the three multiobjective optimization prediction models (MOGPR, MOQRLSTM, and MOQRNN) is obviously higher. Among these three models, MOGPR has the fastest calculation speed. Because compared with neural network algorithms, the GPR has fewer parameters. (6) As shown in Figure 19, the PIT values' distribution of MOGPR is closest to the uniform distribution, which is displayed as the red diagonal in the QQ plot. In addition, all PIT points are located within the 5% significance band, which indicates that the prediction CI of MOGPR is reliable. The PITs of the LWLR distribute around 0 or 1, which indicates that the LWLR is most unsuitable for interval prediction of PDI compared with other models. The multi metrics of different models are listed in Table 6, including RMSE, PINAW, PICP, MAPE, R2_score, and cost time. The prediction results are shown in Figure 18, and the quantile-quantile (QQ) plots of PITs are illustrated in Figure 19. The areas between the purple dotted lines in QQ plots represent PITs that follow the uniform distribution at a 5% significance level. The analysis of the results is as follows: (1) The RMSE and the MAPE of MOGPR are the lowest (0.223 and 0.641%), and the R2_score of MOGPR is the highest (0.974). As shown in Figure 18a, MOGPR has a good prediction effect on both long-term trends and local fluctuation of the PDI curve. It proves that the prediction accuracy of MOGPR is the best.
(2) The PINAW of LWLR is the lowest (0.080), but the RMSE of LWLR is the highest (0.472). As can be seen from Figure 18g, LWLR cannot effectively forecast the local fluctuation of data. Except for LWLR, the PINAW of MOGPR is lower than other models, which demonstrates that MOGPR has better certainty.
(3) The RMSEs, PINAWs, MAPEs, and R2_scores of multiobjective optimization models (MOGPR, MOQRLSTM, and MOQRNN) are improved compared with original models (GPR, QRLSTM, and QRNN), which indicates that the multiobjective optimization process can effectively improve the performance of the original models.
(4) The PICPs of most models are higher than 0.95, apart from LWLR. The PICPs of MOGPR, GPR, and KR achieve 1.0, which means all the actual points fall within the prediction 95% CIs. It also indicates that compared with other criteria, PICP is easier to reach the maximum value, so it is not suitable to be an optimization objective.
(5) Because of the process of parameter optimization, the cost time of the three multiobjective optimization prediction models (MOGPR, MOQRLSTM, and MOQRNN) is obviously higher. Among these three models, MOGPR has the fastest calculation speed. Be-

Conclusions
In this study, aiming at the three major practical engineering difficulties of low data quality, complex variable operation conditions, and prediction model parameter optimization, an ensemble prognostic method of FTUs using low-quality data under variable operation conditions is proposed. Firstly, to comprehensively reflect the performance of the FTU under complex operation conditions, the running data set is constructed by combining operation condition parameters and monitoring data. Secondly, to reduce the impact of anomaly data, the DBSCAN is adopted to clean both outliers and singulars in the raw running data set. Thirdly, based on the GMM and the probability theory, the HSM is established, which improves the robustness of the evaluation model against data missing and data randomness. Fourthly, the MOGPR is proposed to predict the performance degradation trend with a confidence interval and to automatically optimize model parameters on both accuracy and certainty. Finally, a series of comparison experiments were implemented on practical data set from an actual large FTU.
The experimental results demonstrate that: (1) The data cleaning approach based on DBSCAN performs better in identifying both outliers and singularities. The stability and smoothness of PDI curves are improved by 3.2 times and 1.9 times, respectively, by DBSCAN compared with 3-sigma. (2) Compared with SVM and BPNN, the HSM based on GMM has better robustness against data loss. (3) The proposed MOGPR has better accuracy, certainty, and reliability of probability prediction. The root-mean-squared error, the prediction interval normalized average, the prediction interval coverage probability, the mean absolute percentage error, and the R2 score of the proposed method achieved 0.223, 0.289, 1.000, 0.641%, and 0.974, respectively.
Thus, the proposed method can be applied to the performance evaluation and degradation trend prediction of FTUs in a practical engineering environment. In further research, if the running data across multiple maintenance periods are available, the corresponding relationship between PDI values and actual state can be established according to maintenance records. On this basis, the multi-level degradation state of the FTU can be divided, and corresponding maintenance strategies can be formulated to provide technical support for state-based maintenance.  In summary, among the eight compared models, MOGPR has better accuracy, certainty, and reliability of probability prediction. Besides, the calculation effectiveness of MOGPR is higher than other multiobjective optimization prediction models.

Conclusions
In this study, aiming at the three major practical engineering difficulties of low data quality, complex variable operation conditions, and prediction model parameter optimization, an ensemble prognostic method of FTUs using low-quality data under variable operation conditions is proposed. Firstly, to comprehensively reflect the performance of the FTU under complex operation conditions, the running data set is constructed by combining operation condition parameters and monitoring data. Secondly, to reduce the impact of anomaly data, the DBSCAN is adopted to clean both outliers and singulars in the raw running data set. Thirdly, based on the GMM and the probability theory, the HSM is established, which improves the robustness of the evaluation model against data missing and data randomness. Fourthly, the MOGPR is proposed to predict the performance degradation trend with a confidence interval and to automatically optimize model parameters on both accuracy and certainty. Finally, a series of comparison experiments were implemented on practical data set from an actual large FTU.
The experimental results demonstrate that: (1) The data cleaning approach based on DBSCAN performs better in identifying both outliers and singularities. The stability and smoothness of PDI curves are improved by 3.2 times and 1.9 times, respectively, by DBSCAN compared with 3-sigma. (2) Compared with SVM and BPNN, the HSM based on GMM has better robustness against data loss. (3) The proposed MOGPR has better accuracy, certainty, and reliability of probability prediction. The root-mean-squared error, the prediction interval normalized average, the prediction interval coverage probability, the mean absolute percentage error, and the R2 score of the proposed method achieved 0.223, 0.289, 1.000, 0.641%, and 0.974, respectively.
Thus, the proposed method can be applied to the performance evaluation and degradation trend prediction of FTUs in a practical engineering environment. In further research, if the running data across multiple maintenance periods are available, the corresponding relationship between PDI values and actual state can be established according to maintenance records. On this basis, the multi-level degradation state of the FTU can be divided, and corresponding maintenance strategies can be formulated to provide technical support for state-based maintenance.