A Data-Driven Framework for Small Hydroelectric Plant Prognosis Using Tsfresh and Machine Learning Survival Models

Maintenance in small hydroelectric plants (SHPs) is essential for securing the expansion of clean energy sources and supplying the energy estimated to be required for the coming years. Identifying failures in SHPs before they happen is crucial for allowing better management of asset maintenance, lowering operating costs, and enabling the expansion of renewable energy sources. Most fault prognosis models proposed thus far for hydroelectric generating units are based on signal decomposition and regression models. In the specific case of SHPs, there is a high occurrence of censored data, since the operation is not consistently steady and can be repeatedly interrupted by transmission problems or scarcity of water resources. To overcome this, we propose a two-step, data-driven framework for SHP prognosis based on time series feature engineering and survival modeling. We compared two feature engineering strategies: one using higher-order statistics and the other using the Tsfresh algorithm. We fitted three machine learning survival models—CoxNet, random survival forests, and gradient boosting survival analysis—and evaluated each approach using the concordance index. The best model achieved a concordance index of 77.44%. We further investigated and discussed the importance of the monitored sensors and the feature extraction aggregations. Kurtosis and variance were the most relevant aggregations in the higher-order statistics domain, while the fast Fourier transform and continuous wavelet transform were the most frequent transformations when using Tsfresh. The most important sensors were related to the temperature at several points, such as the generator bearing, the hydraulic unit oil, and the turbine radial bushing.


Introduction
The expansion of renewable energy sources is vital for ensuring the energy supply of a fast-growing market, with demand expected to double by 2060 [1]. Clean energy already accounts for three quarters of newly installed capacity annually [1], and sources related to water resources are the most widely used among them. The building of small hydroelectric plants (SHPs), which account for a significant share of this group, has increased worldwide due to their lower initial investment, lower operating costs, and the expanding regulation of energy markets. The potential total generation capacity of these SHPs is twice the total capacity of the currently installed plants [2].
The maintenance of a hydropower plant is a complex task. It demands a specific level of skill to ensure an adequate level of dependability of the asset through its useful life. There are three kinds of maintenance. The first and most basic is corrective maintenance, in which a component is replaced after a failure occurs. The second is preventive maintenance, which estimates the service life of a component and replaces it once the operating lifetime is reached. Finally, there is predictive maintenance, in which the system condition is assessed from data periodically or continuously acquired from various sensors [3,4]. A predictive or condition-based maintenance system consists of two stages. The first stage is diagnosis, which incorporates the detection of faults or anomalous operating conditions, fault isolation by subcomponents, and identification of the character and degree of the failure [4]. The second stage is prognosis, which involves using statistical and machine learning models to estimate the remaining useful life of the assets and the confidence interval of the estimation [5], foresee maintenance, and increase the dependability and availability of the generating units.
Many data-driven models have been proposed for fault detection and diagnosis in hydroelectric plants. These models include principal component analysis (PCA) [6], independent component analysis (ICA) [7], and least square support vector machines [8][9][10][11]. PCA decomposition is used to assist specialists in determining and selecting the principal features contributing to cavitation in hydro-turbines [12]. Recent studies have presented a monitoring method based on ICA-PCA that can extract both non-Gaussian and Gaussian information from operating data for fault detection and diagnosis [13]. This ICA-PCA method has been extended with a nonlinear kernel transformation applied prior to the decomposition, which has become known as kernel ICA-PCA [14]. Zhu et al. applied this method in the hydropower generation context with higher success rates and lower fault detection delays than either the PCA or ICA-PCA applications. While most models rely on signal processing, De Souza Gomes et al. proposed functional analysis and computational intelligence models for fault classification in power transmission lines [15]. Santis and Costa proposed the application of the Isolation Forest (iForest) algorithm for small hydroelectric monitoring, where iForest isolates anomalous sensor readings by creating a health index based on the average distance of the points to the tree root [16]. Hara et al. extended iForest's performance by implementing a preliminary feature selection step using the Hilbert-Schmidt independence criterion [17]. It is worth emphasizing that in addition to data-driven models, there are analytic model-based methods, which have presented significant results in the context of fault diagnosis in power systems, such as in [18].
For prognosis, the techniques generally applied for estimating the useful life are classified into statistical techniques, comprising regression techniques [19] and Wiener-, Gamma-, and Markovian-based processes; machine learning techniques, comprising neural networks, support vector machines, electrical signature analysis [20], and principal component analysis [21]; and, more recently, deep learning techniques, comprising auto-encoder, recurrent, and convolutional neural networks [22].
Reports on the prognosis of hydroelectric generating units are scarcer than publications related to their diagnosis [23]. A great challenge in the area is proposing procedures that contemplate faults across different generating units and auxiliary interconnected systems [23]. An et al. presented a prognosis model based on the application of Shepard's interpolation of three variables: bearing vibration, apparent power, and working head [24]. The signal is decomposed into a limited number of rotating components by applying intrinsic time-scale decomposition, and an artificial neural network is trained for each of the temporal components of the signal. Subsequent models follow a similar framework, varying the individual methods for signal decomposition and regression. Fu et al. applied variational mode decomposition for signal decomposition and a least square support vector machine regression model fine-tuned using an adaptive sine cosine algorithm [8,25]. Zhou et al. combined empirical wavelet decomposition with Gram-Schmidt orthogonal process feature selection and kernel extreme learning machine regression [26].
Since feature extraction is a key factor in the success of data-driven diagnosis and prognosis systems, the Time Series Feature Extraction Based on Scalable Hypothesis Tests (TSFRESH, or TSF for short) algorithm has gained prominent attention in the literature, leading to better results than physical and statistical features alone [27]. The algorithm is capable of generating hundreds of new features while reducing collinearity through its hypothesis test-integrated selection procedure. Tan et al. adopted TSF along with a probability-based forest for bearing diagnosis [28]. A two-stage feature learning approach combining TSF and a multi-layer perceptron classifier was adopted for anomaly detection in machinery processes by Tnani et al. [44] and for earthquake detection by Khan et al. [29].
Finally, the random survival forest (RSF) is a survival analysis model that has recently been adapted for data-driven maintenance prognosis systems. Voronov et al. proposed the application of RSF for the prognosis of heavy vehicle batteries [30], an important part of the electrical system whose lead-acid cells are most stressed during engine starts. Gurung adopted the RSF along with histogram data for interpretive modeling and prediction of the remaining survival time of components of heavy-duty trucks, aiming to improve operation and maintenance processes [31]. Snider and McBean proposed an RSF-based model for water main pipe replacement planning, with expected savings of USD 26 million, or 14% of the total cost of the ductile iron pipe, over the next 50 years [32].
In this context, the present paper innovates by proposing a framework for the prognosis of hydroelectric plants based on the TSF feature extraction and selection algorithm and survival analysis models. The authors did not find any study that has adopted a similar approach in the literature thus far. We compare different feature engineering strategies combined with three survival analysis models, evaluating the models using the concordance index metric. The main findings and contributions of the current paper are the following:
• The proposal of a data-oriented framework, including feature engineering strategies and machine learning survival models, for the intelligent fault prognosis of SHP generating units;
• Evaluation of the importance of attributes using the permutation importance method associated with the RSF survival model;
• Demonstration that the RSF survival analysis model associated with the TSF feature engineering strategy obtained the highest concordance index score (77.44%).
The remainder of the present article is organized as follows. Section 2 defines the study methodology, describing the methods, algorithms, and dataset applied. Section 3 presents the results and discussions of the simulations of the models, in addition to the outputs of the feature engineering strategies and survival analysis models, with illustrative examples of those models' inference. Finally, Section 4 presents the conclusions and recommendations for future work.

Problem Formulation
The prognosis problem was formulated as an inference problem based on historical data, specialist knowledge, external factors, and future usage profiles. Prognosis is a condition-based maintenance (CBM) practice widely applied to reduce the costs incurred by inefficient schedule-based maintenance. In mechanical systems, the repetitive stresses from rotating machinery vibration and temperature cycles lead to structural failures. Since mechanical parts commonly degrade slowly toward a critical level, monitoring the growth of these failures permits evaluating the degradation and estimating the remaining component life over a period of time [33].
The current study was developed in Ado Popinhak, an SHP situated in the southern region of Brazil. With an installed capacity of 22.6 MW, the plant supplies energy to 50,000 residences. Monitoring data from the main single hydro generator unit were registered every 5 min, and the study period was from 13 August 2018 to 9 August 2019. Table 1 describes the number of runs by the generators contained in the dataset, the number of runs that ended due to failure, the average cycle time per run, and the longest cycle time.
The objective was to predict the remaining useful life (RUL) of a power system based on multiple component-level sensor and event data. The RUL information allows decision makers to better plan maintenance and interventions, improving availability and reducing costs. Raw sensor readings and event data, such as interventions, shutdowns, and planned and corrective maintenance, were curated and merged. The registered data were classified and split into runs, which are periods from the moment the generating unit is turned on until it is shut down, whether due to failure or not. Each run was labeled with a Boolean flag indicating whether it ended in failure. The distribution of times until the end of the runs that ended in failure and of those interrupted for another reason is shown in Figure 1. The problem is interpreted as one of survival analysis, given that the system does not run to failure and can be shut down due to a lack of water resources for generation, failures in the transmission system, or the execution of scheduled maintenance. A summary of the characteristics of the problem and dataset is as follows:
• Data were collected for four generator units of the same manufacturer, model, and age;
• Fifty-four variables were monitored, and readings were registered in the transactional database every 5 min;
• Data were heterogeneous, including both control settings and monitored variables, both numerical and categorical;
• Missing data represented around 5% of total registrations, mostly caused by loss of the network connection between the remote plants and the operations center;
• The runs whose subsequent state was a forced stop were labeled as failures, with the last reading registered as the logged time of failure;
• There were many runs in which no failure was registered during the observed time (i.e., the data were right-censored).
The runs were considered independent, given that the systems could be turned off for a long time and undergo modifications, such as routine maintenance, during this period, and because machine start-up is the biggest cause of system deterioration. For this reason, the Kaplan-Meier model was adjusted to describe the survival function of each run of the four generators, as presented in Figure 2.

The data transformation workflow is described in Figure 3. Sensor and event data were collected from the transactional database of the telemetric systems and stored in text files. In the data-wrangling phase, the record tables were parsed and joined with the event tables, and the records were resampled into 5 min periods. Still in this stage, missing data were imputed, and the runs were classified according to whether they ended due to failure or due to programmed shutdowns (censoring).
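As an illustration of the survival framing above, the Kaplan-Meier product-limit estimator can be sketched in a few lines of Python. The run durations below are hypothetical, not taken from the SHP dataset; censored runs (planned shutdowns, loss of water resources) reduce the at-risk count without contributing a failure.

```python
# Minimal Kaplan-Meier sketch over hypothetical (duration, failed) run pairs.
def kaplan_meier(runs):
    """Product-limit estimator S(t) = prod(1 - d_i / n_i) over failure times."""
    runs = sorted(runs)
    n_at_risk = len(runs)
    survival, s, i = [], 1.0, 0
    while i < len(runs):
        t = runs[i][0]
        deaths = sum(1 for d, failed in runs if d == t and failed)
        total = sum(1 for d, _ in runs if d == t)
        if deaths > 0:
            s *= 1.0 - deaths / n_at_risk
            survival.append((t, s))
        n_at_risk -= total
        i += total
    return survival

# Hypothetical durations in 5 min units; True = run ended in failure,
# False = run censored (planned shutdown, scarcity of water, etc.)
runs = [(120, True), (300, False), (450, True), (450, False), (800, True), (1000, False)]
print(kaplan_meier(runs))
```

Plotting the resulting step function per generator would reproduce the kind of curves shown in Figure 2.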
In the feature extraction and selection step, features were extracted from each sensor time series over the first 30 5 min time units (150 min of operation) using the feature engineering strategies described in Section 3.1. This fixed window at the start of each run was used both to extract features and to adjust the survival models, and runs lasting fewer than 30 time units were excluded from the training base. This approach was adopted to avoid data leakage in model training, where size-related features could allow the models to readily predict the total time to failure. The extracted features were recorded in a text file and zipped due to the size of the generated tables.
In the next step, the runs were randomly divided into training and test sets, using proportions of 90% of the runs for training and 10% for testing. In each of the simulations, the partitioning was performed using a different random seed. The models were fitted to the training set, and metrics were calculated on the test set. The computational time was recorded for each model fit and saved for later analysis. Finally, the model metrics were compared using a set of statistical tests in order to identify whether there was a difference in the average scores across groups of models or feature strategies.
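The repeated-partition scheme described above can be sketched as follows; the run identifiers are illustrative, and the model fitting and scoring step is left as a placeholder.

```python
import random

def split_runs(run_ids, test_frac=0.1, seed=0):
    """Randomly partition run identifiers into train/test sets (90%/10%)."""
    rng = random.Random(seed)
    ids = list(run_ids)
    rng.shuffle(ids)
    n_test = max(1, int(len(ids) * test_frac))
    return ids[n_test:], ids[:n_test]

# Each of the 100 simulations repartitions the runs with a different seed,
# fits the survival models on the training runs, and scores the test runs.
run_ids = [f"run_{i}" for i in range(50)]       # hypothetical run identifiers
for seed in range(100):
    train_ids, test_ids = split_runs(run_ids, seed=seed)
    assert not set(train_ids) & set(test_ids)   # runs never leak across sets
    # fit model on train_ids, compute C-index on test_ids (placeholder)
print(len(train_ids), len(test_ids))            # 45 5
```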

Higher-Order Statistics (HOS)
Higher-order statistics (HOS) have been applied in different fields which require separation and characterization of non-Gaussian signals against a Gaussian background. Moments and cumulants are widely used to quantify certain probability distributions, such as location (first moment) and scale (second moment). Several authors have used HOS in signal processing. For example, De La Rosa and Muñoz reported the application of higher-order cumulants via signal processing using HOS for early detection of subterranean termites [34], while Nemer et al. presented an algorithm for robust voice activity based on third-and fourth-order cumulants of speech [35].
Let X = {x(t)}, t = 0, 1, 2, 3, ..., be a real stationary discrete-time signal whose moments up to order p exist. Its pth-order moment is given by [35,36]

m_p(τ_1, τ_2, ..., τ_{p−1}) = E[x(t) x(t + τ_1) ⋯ x(t + τ_{p−1})],

depending solely on the time differences τ_1, τ_2, ..., τ_{p−1} for all i, where E(·) represents the statistical expectation for a deterministic signal. If the signal also has zero mean, then its cumulant functions are given by [35]

second-order cumulant: C_2(τ_1) = E[x(t) x(t + τ_1)],
third-order cumulant: C_3(τ_1, τ_2) = E[x(t) x(t + τ_1) x(t + τ_2)],
fourth-order cumulant: C_4(τ_1, τ_2, τ_3) = E[x(t) x(t + τ_1) x(t + τ_2) x(t + τ_3)] − C_2(τ_1) C_2(τ_2 − τ_3) − C_2(τ_2) C_2(τ_3 − τ_1) − C_2(τ_3) C_2(τ_1 − τ_2).

By setting all the lags to zero in the cumulant expressions and normalizing the input data to have unity variance, we obtain the variance, normalized skewness, and normalized kurtosis:

variance: γ_2 = E[x²(t)],
normalized skewness: γ_3 = E[x³(t)],
normalized kurtosis: γ_4 = E[x⁴(t)] − 3.

The skewness indicates to which side of the distribution the data are concentrated for unimodal distributions, so a positive skew indicates that the tail is to the right, and a negative skew indicates that it is to the left. The kurtosis is usually associated with the measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means that more of the variance is due to infrequent extreme deviations, as opposed to frequent, modestly sized deviations. The first four moments were calculated for each of the runs in order to extract the basic descriptive variables of the sensor signals.
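A minimal sketch of the four moment-based HOS aggregations, computed for one sensor signal in plain Python (the input values are hypothetical):

```python
def hos_features(x):
    """Mean, variance, normalized skewness, and normalized (excess) kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # variance (second central moment)
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5                      # gamma_3 after unity-variance scaling
    kurt = m4 / m2 ** 2 - 3.0                  # gamma_4, zero for a Gaussian
    return {"mean": mean, "variance": m2, "skewness": skew, "kurtosis": kurt}

# Hypothetical sensor readings for one run
print(hos_features([1.0, 2.0, 3.0, 4.0]))
```

In the HOS strategy, this aggregation would be applied per sensor over the first 30 time units of each run.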

Tsfresh (TSF)
TSF is an algorithm for time series feature engineering that accelerates the procedure by combining 63 time series characterization methods. Features are then selected based on automatically configured hypothesis tests [37].
Given a set of time series {X_i}_{i=1}^{N}, each time series X_i is mapped into a feature space with a problem-specific dimensionality M and feature vector x⃗_i = (x_{i,1}, x_{i,2}, ..., x_{i,M}). The feature vector x⃗_i is built by applying time series characterization methods f_j : X_i → x_{i,j} to the respective time series X_i [37]. The feature vector might be extended by additional univariate attributes {a_{i,1}, a_{i,2}, ..., a_{i,U}}_{i=1}^{N} and feature vectors from other kinds of time series. For a machine learning system with K different time series and U univariate variables per sample i, the resulting design matrix has N rows and (K · M + U) columns [37].
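The design-matrix construction can be illustrated with a toy version of the TSF mapping; the three characterization functions below are stand-ins for tsfresh's 63 built-in methods, and the sample values are hypothetical.

```python
# M = 3 stand-in characterization methods f_j (tsfresh ships 63 of these)
feature_funcs = {
    "mean": lambda s: sum(s) / len(s),
    "maximum": max,
    "abs_energy": lambda s: sum(v * v for v in s),
}

def design_matrix(samples, univariate):
    """samples: list of {series_name: [values]} dicts (K series per sample);
    univariate: list of U extra attributes per sample. Returns N x (K*M + U)."""
    rows = []
    for series_dict, attrs in zip(samples, univariate):
        row = [f(s) for s in series_dict.values() for f in feature_funcs.values()]
        rows.append(row + list(attrs))
    return rows

samples = [
    {"temp": [1.0, 2.0, 3.0], "vibration": [0.5, 0.5, 1.0]},   # K = 2 series
    {"temp": [2.0, 2.0, 2.0], "vibration": [1.0, 0.0, 1.0]},
]
X = design_matrix(samples, univariate=[[22.6], [22.6]])        # U = 1 attribute
print(len(X), len(X[0]))   # N = 2 rows, K*M + U = 2*3 + 1 = 7 columns
```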
From the set of 63 characterization methods f_j available in the algorithm, we illustrate two of the most important ones based on our feature analysis: the fast Fourier transform (FFT) and the continuous wavelet transform (CWT). Both are time-frequency decomposition methods often applied in signal analysis.
The discrete Fourier transform (DFT) is a signal decomposition technique adequate for discrete and periodic signals. Let a_n, for n = 0, ..., N − 1, be a signal with a_n = a_{n+jN} for all n and j. The discrete Fourier transform of a, also known as the spectrum of a, is described by [38]

A_k = Σ_{n=0}^{N−1} a_n W_N^{kn}, k = 0, 1, ..., N − 1,

where W_N = e^{−i 2π/N} and the W_N^k are called the Nth roots of unity. The sequence A_k is the DFT of the sequence a_n, and each is a sequence of N complex numbers. The FFT is a fast algorithm for computing the DFT in log_2 N stages, each of which consists of fewer computations [38].
The CWT of a signal a with the wavelet ψ is defined as [39]

W_ψ a(s, t) = (1/√s) ∫ a(x) ψ*((x − t)/s) dx,

where the scale s is inversely proportional to the central frequency of the rescaled wavelet ψ_s(x) = ψ(x/s), which is a bandpass function, and t represents the time location of the analysis. The larger the scale s, the wider the analyzing function ψ_s(x), and therefore the smaller the corresponding examined frequency. The main advantage over Fourier transform methods is that the frequency description is localized in time and the window size varies. This gives more flexibility and effectiveness than fixed-size analysis, since low frequencies can be analyzed over wide time windows, while high frequencies can be analyzed over narrow time windows [39].

Finally, the feature selection of TSF filters out irrelevant features based on automated statistical hypothesis tests [37]. Feature selection is crucial to reducing the number of variables, which increases generalization and prevents overfitting, in addition to bringing speed gains and less complexity to the estimator [40].
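Both transformations can be sketched directly. The naive O(N²) DFT below follows the definition above (the FFT computes the same spectrum faster), and the single-scale CWT convolves the signal with a Ricker ("Mexican hat") wavelet, one common choice of analyzing function ψ; the test signal is hypothetical.

```python
import cmath, math

def dft(a):
    """Naive DFT: A_k = sum_n a_n * exp(-i*2*pi*k*n/N)."""
    N = len(a)
    return [sum(a[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def ricker(points, width):
    """Ricker ('Mexican hat') wavelet sampled at `points` locations."""
    A = 2.0 / (math.sqrt(3.0 * width) * math.pi ** 0.25)
    centers = (i - (points - 1) / 2.0 for i in range(points))
    return [A * (1 - (t / width) ** 2) * math.exp(-t * t / (2 * width ** 2))
            for t in centers]

def cwt_coeffs(a, width):
    """Single-scale CWT: convolve the signal with the rescaled wavelet."""
    w = ricker(min(10 * width, len(a)), width)
    half = len(w) // 2
    return [sum(a[n + k - half] * w[k] for k in range(len(w))
                if 0 <= n + k - half < len(a)) for n in range(len(a))]

# A 4-cycle sinusoid over 16 samples: its spectrum peaks at bins k = 4 and 12.
signal = [math.sin(2 * math.pi * 4 * n / 16) for n in range(16)]
mags = [abs(A) for A in dft(signal)]
coeffs = cwt_coeffs(signal, width=2)
print(round(mags[4], 6))   # magnitude N/2 = 8.0 at the matching frequency bin
print(len(coeffs))         # one coefficient per time location -> 16
```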

Evaluation Metrics
The most employed evaluation metric for survival models is the concordance index (C-index or C-statistic) [41]. It reflects a model's capacity to rank the survival times based on the individual risk scores, and it can be expressed by the formula [42]

C = ( Σ_{i,j} 1_{T_j < T_i} · 1_{η_j > η_i} · δ_j ) / ( Σ_{i,j} 1_{T_j < T_i} · δ_j ),

where η_i is the risk score of a unit, 1_{T_j < T_i} = 1 if T_j < T_i and 0 otherwise, 1_{η_j > η_i} = 1 if η_j > η_i and 0 otherwise, and δ_j indicates whether the event was observed for unit j. A C-index score of 1 corresponds to a perfect estimator, while a C-index score of 0.5 represents a random estimator [42]. The C-index counts comparable pairs in which the predictions and outcomes are concordant, meaning that the unit with the higher risk score has the shorter actual survival time. Two instances are comparable if they experience events at different times, or if one experiences an event and is outlived by the other. In contrast, a pair is not comparable when both experience events at the same time [41,42].
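The C-index as defined above can be computed with a double loop over pairs (fine for illustration; production implementations sort instead); the times, event flags, and risk scores below are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.
    A pair (i, j) is comparable when the unit with the shorter time failed."""
    concordant, comparable = 0.0, 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[j] < times[i] and events[j]:   # j failed before i's time
                comparable += 1
                if risk_scores[j] > risk_scores[i]:
                    concordant += 1
                elif risk_scores[j] == risk_scores[i]:
                    concordant += 0.5               # tied risks count half
    return concordant / comparable

# Hypothetical runs: duration (5 min units), failure flag, predicted risk
times = [100, 250, 400, 500]
events = [True, True, False, True]
risks = [0.9, 0.7, 0.2, 0.4]
print(concordance_index(times, events, risks))   # perfectly ranked -> 1.0
```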

CoxNet (CN)
The Cox proportional hazards (CPH) model assumes that covariates act multiplicatively on the hazard, i.e., the instantaneous probability of an event at a particular time. The effect of the covariates is to multiply a baseline hazard function by a function of the explanatory covariates. This means that any two units of observation have a constant hazard ratio over time, which depends only on their covariate values [43].
Let X_i = (X_{i1}, ..., X_{ip}) be the realized values of the covariates for subject i. The hazard function for the CPH model is described by [44]

λ(t|X_i) = λ_0(t) exp(β_1 X_{i1} + ⋯ + β_p X_{ip}),

where λ(t|X_i) is the hazard function at time t for subject i with covariate vector X_i, λ_0(t) is the baseline hazard, and the β_i represent the effect parameters. The CPH model is especially interpretable, since the regression coefficients represent log hazard ratios, providing useful insights into the problem. However, in applications with a large set of features, the standard CPH model fails because its convergence relies on inverting a matrix that becomes singular due to correlation among features [45].
CoxNet (CN) overcomes these problems by implementing an Elastic Net regression with a weighted combination of the ℓ1 and ℓ2 penalties, solving [45]

arg max_β log PL(β) − α ( r Σ_{i=1}^{p} |β_i| + (1 − r)/2 Σ_{i=1}^{p} β_i² ),

where PL is the partial likelihood function of the Cox model, β_1, ..., β_p are the coefficients for the p features, α ≥ 0 is a hyperparameter that controls the amount of shrinkage, and r ∈ [0, 1] is the relative weight of the ℓ1 and ℓ2 penalties. The ℓ1 penalty helps the model select only a subset of features, while the ℓ2 penalty leads to better stability through regularization. In this paper, we adopted the proposed default value r = 0.5 and an automatic procedure for selecting α, with a minimum value of 0.01.
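Two properties discussed above can be checked numerically: under the CPH model the hazard ratio between two subjects is constant over time, and the Elastic Net penalty combines the ℓ1 and ℓ2 terms. The baseline hazard and coefficients below are hypothetical, not the fitted SHP model.

```python
import math

def hazard(t, x, beta, baseline):
    """CPH hazard: lambda(t | X) = lambda_0(t) * exp(beta . x)."""
    return baseline(t) * math.exp(sum(b * v for b, v in zip(beta, x)))

def elastic_net_penalty(beta, alpha, r):
    """Weighted l1/l2 penalty subtracted from the log partial likelihood."""
    return alpha * (r * sum(abs(b) for b in beta)
                    + (1 - r) / 2 * sum(b * b for b in beta))

baseline = lambda t: 0.01 + 0.001 * t   # hypothetical baseline hazard lambda_0
beta = [0.8, -0.5]                      # hypothetical effect parameters
x_a, x_b = [1.0, 0.0], [0.0, 1.0]

# Proportionality: the hazard ratio between two subjects does not depend on t.
ratios = [hazard(t, x_a, beta, baseline) / hazard(t, x_b, beta, baseline)
          for t in (1, 10, 100)]
print(ratios[0])                        # equals exp(0.8 - (-0.5)) at every t
print(elastic_net_penalty(beta, alpha=0.1, r=0.5))
```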

Random Survival Forest (RSF)
The random survival forest (RSF) model is an adaptation of the random forest regressor for the analysis of right-censored survival data. The main components of the RSF algorithm are the growing of the survival trees and the forming of the ensemble cumulative hazard function. Survival trees are binary trees grown by the recursive splitting of tree nodes using a predetermined survival criterion. The splitting maximizes the survival difference between the child nodes, and eventually each terminal node of the tree becomes homogeneous, populated by cases with similar survival. Once training is complete, the cumulative hazard function estimate of terminal node h in a survival tree is given by [46]

λ_h(t|X_i) = Σ_{t_{l,h} ≤ t} d_{l,h} / Y_{l,h},

where d_{l,h} is the number of failures and Y_{l,h} is the number of runs at risk at time t_{l,h}. The ensemble cumulative hazard function estimate λ*_e(t|X_i) is the simple average over the M base estimators and is given by [46]

λ*_e(t|X_i) = (1/M) Σ_{b=1}^{M} λ*_b(t|X_i),

where λ*_b(t|X_i) represents the cumulative hazard function estimate of the bth survival tree in the ensemble and M is the total number of survival trees. The base estimator hyperparameters were chosen from the convergence analysis performed on our data (shown in Figure 4). We set the number of base estimators to M = 100, a value close to the smallest error observed. For the minimum number of samples in each node, the value adopted was 15, selected because it presented the lowest error for ensembles with 100 trees.
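The per-node Nelson-Aalen estimate and the ensemble averaging can be sketched as follows. For simplicity, tied failure times are handled one event at a time, and the node membership and times are hypothetical.

```python
def node_cumulative_hazard(event_times, t):
    """Nelson-Aalen estimate inside one terminal node: the sum of
    d / Y over failure times up to t, with Y the runs still at risk."""
    chf, n_at_risk = 0.0, len(event_times)
    for time, failed in sorted(event_times):   # (duration, failure flag) pairs
        if time > t:
            break
        if failed:
            chf += 1.0 / n_at_risk
        n_at_risk -= 1
    return chf

def ensemble_chf(per_tree_estimates):
    """RSF prediction: simple average of the M per-tree estimates."""
    return sum(per_tree_estimates) / len(per_tree_estimates)

# Hypothetical runs falling into one terminal node
node = [(5, True), (8, False), (12, True), (20, False)]
print(node_cumulative_hazard(node, t=12))          # 1/4 + 1/2 = 0.75
print(round(ensemble_chf([0.75, 0.60, 0.90]), 10)) # average over 3 trees
```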

Gradient Boosting Survival Analysis (GBS)
The gradient boosting survival analysis (GBS) model was constructed using the gradient boosting framework to optimize a specified loss function. The model builds on the principle of additively combining the predictions of multiple base learners into a powerful overall model [47]. GBS is an ensemble model similar to RSF, since it relies on multiple base learners to produce an overall prediction. The main difference between the two approaches is that while RSF fits the base learners independently and averages their predictions, the GBS model is assembled sequentially in a greedy, stage-wise manner. The GBS overall additive model f can be described by [47]

f(X) = Σ_{m=1}^{M} β_m g(X; θ_m),

where M > 0 represents the number of base learners, β_m is the weighting term of the mth learner, the function g refers to a base learner, and θ_m is its parameter vector. The loss function set for GBS is the partial likelihood loss of the CPH model. Therefore, the model maximizes the log partial likelihood with the additive model f(X), i.e., it solves [48]

arg min_f − Σ_{i=1}^{n} δ_i [ f(x_i) − log ( Σ_{j: T_j ≥ T_i} exp(f(x_j)) ) ],

where δ_i indicates whether the event was observed for instance i. The base estimator of GBS, as in the RSF model, is the survival tree. Accordingly, we adopted the same control parameters for the number of estimators M and the minimum number of samples in each node for both models.
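The greedy, stage-wise assembly can be illustrated with a small boosting loop. Note that this sketch uses squared loss and regression stumps as an analogy: the actual GBS model uses the Cox partial likelihood loss and survival trees, and the data here are hypothetical.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump (threshold with left/right means)."""
    best = None
    for thr in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr] or [0.0]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm

def boost(x, y, n_stages=50, lr=0.1):
    """Greedy stage-wise additive model f = sum_m beta_m * g_m (squared loss);
    the learning rate lr plays the role of the weighting term beta_m."""
    f = [0.0] * len(x)
    learners = []
    for _ in range(n_stages):
        residuals = [yi - fi for yi, fi in zip(y, f)]   # negative gradient
        g = fit_stump(x, residuals)
        learners.append(g)
        f = [fi + lr * g(xi) for fi, xi in zip(f, x)]
    return lambda v: sum(lr * g(v) for g in learners)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]          # step-function target
model = boost(x, y)
print(round(model(1.5), 2), round(model(5.5), 2))   # approaches 1.0 and 5.0
```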
The specifications of the hardware used to perform the simulations were as follows: Intel Core i9 CPU at 2.30 GHz, 16 GB of RAM, and the macOS v.12.5 operating system. The approximate amount of time necessary to perform the data preparation, feature selection, and all 100 simulations was around three hours (one hour for feature extraction and two hours for simulation) without any parallelization. All scripts are available from the researchers' public repository (GitHub repository: https://github.com/rodrigosantis1/shp_prognosis, accessed on 1 December 2022) for reproducibility and replicability. Data have not been made publicly available by the SHP but can be shared upon request.

Figure 5 shows the C-index scores calculated for each of the 100 randomized simulations with different training and testing sets. This visualization format provides a better understanding of the metric distribution of each of the CN, RSF, and GBS survival analysis models when combined with HOS and TSF feature engineering. From the box plot analysis, we observed that the HOS-CN group obtained the lowest accuracy, while the TSF-RSF and TSF-GBS groups obtained the highest accuracies. The variance of CN was higher than those of the other survival models, especially when combined with TSF feature engineering.

Simulation Results
There were a few outliers in all models, mostly in the lower bound, indicating possible convergence problems. A suggestion for both improving the variances and reducing outliers is to adopt a model selection schema for tuning and adjusting the models. The TSF-RSF and TSF-GBS groups presented close distributions in terms of both median and variance. In general, most of the RSF and GBS groups presented close variance.

Table 2 presents the average and standard deviation of the C-index score and fitting time, highlighting in bold the model with the highest score and the one with the lowest fitting time. The nonlinear models RSF and GBS, which require more computational time for training, achieved better accuracy scores than the linear model with regularization (CN). This trade-off between accuracy and computational time is expected in machine learning applications. When comparing RSF and GBS, RSF required up to 10 times more fitting time than GBS. However, it is worth mentioning that RSF, a bagging ensemble, can be parallelized more easily than GBS, a boosting ensemble. The difference in fitting time between TSF-CN and the nonlinear models was also significant, with TSF-CN requiring more than 1000 times less training time than TSF-RSF.

Table 3 presents the total time necessary to execute both feature engineering strategies. As this is a step preceding model adjustment, its time is worth considering when evaluating the models.
The computational time required to extract and select attributes using the TSF method was about 20 times greater than that required using HOS. This is a significant difference that must be taken into account, especially for real-time applications of the prognosis model. However, it is worth pointing out that the TSF library offers the possibility of cluster parallel computing. Furthermore, the time required for inference was lower, given that only the features previously selected by the feature hypothesis tests and applied in the model training needed to be calculated.

One-way ANOVA [57] was applied to test the null hypothesis that the groups had the same mean C-index score. Table 4 displays the F statistic, which represents the ratio of the variance among score means to the average variance within groups, and the p-value calculated for the statistic. Adopting a confidence level of 0.95, we rejected the hypothesis that the score was equal between all groups, since the p-value was lower than α = 0.05. Normality was checked using a Q-Q plot, and the homogeneity of variance was supported by the ratio of the largest to the smallest sample standard deviation being less than 2 (1.44).

Table 4. One-way ANOVA F statistic and p-value for the null hypothesis of scores being equal for all groups. The hypothesis H_0: μ_1 = μ_2 = ⋯ = μ_6 was rejected at the 0.95 confidence level.

Subsequently, a pairwise Tukey test [58] was applied, and the results are presented in Table 5. The pairs of groups for which the mean difference of the scores was not significant at the 0.95 level are highlighted in bold. The results show that the mean score metrics of the HOS-GBS, TSF-GBS, and TSF-RSF groups were not statistically different. These results indicate that, from our experimentation, it is not possible to verify significant differences among the scores of these models.
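The one-way ANOVA F statistic can be computed from scratch as a sanity check; the C-index samples below are hypothetical stand-ins for the simulation scores, not the paper's data.

```python
def one_way_anova_f(groups):
    """F = mean square between group means / mean square within groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical C-index samples for three model groups
scores = [
    [0.70, 0.71, 0.69, 0.70],   # e.g., HOS-CN
    [0.77, 0.78, 0.76, 0.77],   # e.g., TSF-RSF
    [0.77, 0.77, 0.78, 0.76],   # e.g., TSF-GBS
]
f_stat = one_way_anova_f(scores)
print(f_stat > 4.26)   # exceeds F_crit(2, 9) at alpha = 0.05 -> True
```

A significant F statistic only says that at least one group mean differs, which is why the pairwise Tukey test follows it.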
A reasonable model for the dataset we simulated was the HOS-GBS group, since it presented the least computational time for both preprocessing and fitting and was among the top three models.

Feature Importance Analysis
Feature importance was evaluated using the permutation importance method, which measures how the score decreases when a feature is not available [59]. The score adopted for evaluation was the C-index, the base estimator was the RSF model, and the number of permutation iterations was equal to 15. Table 6 presents the 20 most important features of the HOS-RSF combination, detailed by the sensor and the statistic used for aggregation into the feature used to train the RSF model.
The features associated with the speed registered in the speed regulator and the temperature of the oil were the most important features, contributing to an average increase of 2.8% in the C-index score. The features related to the coupled side bearing temperature of the generator were the most frequent ones (4/20), the oil temperature of the hydraulic unit was the second-most-frequent feature (3/20), and the uncoupled side bearing temperature was the third-most-frequent feature (2/20). When analyzing the type of aggregation, kurtosis and variance were the most frequent types (6/20), and skewness and average were the least frequent types (4/20), although there was a balance among all four types. Table 7 presents the top 20 most important features of the TSF-RSF combination, detailed by the sensor and the type of feature extraction technique applied.
The most important features were the CWT coefficient of the radial bushing temperature in the turbine and the absolute energy of the voltage in the bar, which contributed an increase of 1.1% to the C-index score. Since more features were extracted with the TSF strategy than with HOS, it was expected that the weight of each individual feature would be lower. The features related to the radial bushing temperature in the turbine were the most frequent (4/20), followed by those related to the coupled bearing temperature in the generator (2/20) and the bar voltage (2/20). The features originating from the CWT were the most frequent (9/20), followed by those from the FFT (3/20). The dominance of the CWT and FFT indicates the importance and efficiency of time-frequency decomposition methods in this type of application. It is important to note that the TSF algorithm includes the statistical aggregations of kurtosis, skewness, mean, and variance from the HOS feature engineering strategy; notably, none of these aggregations remained among the 20 most important attributes once more complex features, such as the FFT and CWT, were included.
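To illustrate the kind of spectral features TSF extracts, the sketch below computes the absolute energy and FFT coefficient magnitudes of a synthetic signal with NumPy, analogous to tsfresh's abs_energy and fft_coefficient calculators; the 1.5 Hz tone and noise level are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01                                   # sampling step (s)
t = np.arange(0, 10, dt)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.1 * rng.normal(size=t.size)

abs_energy = float(np.sum(signal ** 2))     # tsfresh-style "abs_energy"
fft_mag = np.abs(np.fft.rfft(signal))       # magnitudes behind "fft_coefficient"
freqs = np.fft.rfftfreq(t.size, d=dt)
dominant = float(freqs[np.argmax(fft_mag[1:]) + 1])  # skip the DC bin
# dominant recovers the 1.5 Hz component buried in the noise
```

A single FFT or CWT coefficient like this can isolate a narrow-band signature of wear that time-domain moments average away, which is one plausible reason the time-frequency features dominate Table 7.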

Model Application Analysis
In this section, we present a deeper look at the model which presented the highest mean score in the simulation: TSF-RSF. The C-index of the model was 0.8139. It is worth noting that the maximum value of the C-index is 1, which indicates that the order of observed events matched the order of all predicted events, while a C-index of 0.5 indicates a prediction no better than a random guess [32]. For comparative purposes, the application of the RSF method to the remaining service life of water mains obtained a C-index of 0.88 [32], while for modeling the disruption durations of a subway service, the metric was 0.672 [60].
Figure 6 presents the reliability plots, and Figure 7 presents the cumulative hazard function plots predicted by the model for 20 instances randomly selected from the test set. When analyzing these curves, we can identify three operation cycles with a reliability pitfall in the earliest minutes of operation. These indicate cases in which there was an intrinsic problem in the generator-turbine system prior to or during start-up, and such systems must be stopped as soon as possible for maintenance. There was a second group of four instances in which the reliability dropped by half within the first 1000 five-minute time units. This behavior might be related to some conditions observed in the operation of the machine. Finally, there was a third group containing the remaining instances, with a steadier rhythm of reliability decay, in which more than half of the systems were expected to fail after 2000 five-minute time units.
In practical applications, the survival model can be used to evaluate both the current and previous runs of a generator unit, returning both the risks and the expected remaining useful life.
Maintenance teams may want to keep all their systems with a reliability function closer to the third group described before, especially right before the rainy periods. During these periods, the generation is higher, and the stopped periods are rarer, making it more difficult and less desirable to execute maintenance procedures on the machines, which may lead to a loss in power generation.
The model can also be extended for a prescriptive perspective combined with the parameters of the start-up process, aiming to optimize the start-up process in order to achieve the highest reliability level possible. With this, a longer lifetime of the assets and greater time between failures can be expected.
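The relationship between the two plotted quantities, reliability S(t) and cumulative hazard H(t), with S(t) = exp(−H(t)), can be sketched with a simple Nelson-Aalen-style estimate over a handful of runs; the run durations and censoring flags below are invented, and ties are ignored for brevity:

```python
import numpy as np

# Hypothetical run durations (in 5 min units); False marks a censored run
# (interrupted by transmission problems or water scarcity, not failure)
time = np.array([120, 300, 450, 500, 800, 1000, 1500, 2200])
event = np.array([True, True, False, True, True, False, True, True])

order = np.argsort(time)
time, event = time[order], event[order]

n_at_risk = np.arange(len(time), 0, -1)   # runs still ongoing just before each time
increments = event / n_at_risk            # Nelson-Aalen increments d_i / n_i
cum_hazard = np.cumsum(increments)        # H(t), non-decreasing step function
reliability = np.exp(-cum_hazard)         # S(t) = exp(-H(t)), non-increasing
```

Censored runs still shrink the risk set but add nothing to the hazard, which is precisely how survival models exploit the heavily censored SHP data instead of discarding it.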

Conclusions
In the present paper, we presented a structured modeling pipeline for survival analysis and remaining useful life estimation in a small hydroelectric plant under condition-based maintenance (CBM). The available period of operation was approximately 1 year, with 54 variables monitored across 4 generating units of the same model and manufacturer. The HOS-GBS, TSF-RSF, and TSF-GBS models presented the highest C-index scores in our simulation, and all three are suitable for production deployment.
Identifying failures before they happen is crucial for allowing better management of asset maintenance, lowering operating costs, and, in the case of SHPs, promoting the expansion of renewable energy sources in the energy matrix [61]. A framework applying time series feature engineering and machine learning survival models, such as the one proposed here, aims to enhance the health of the equipment and decrease power generation downtime.
Regarding variable importance, variance and kurtosis were the most frequent aggregation functions in HOS feature engineering, while the FFT and CWT were the most frequent transformations in TSF feature engineering. The sensors that contributed the most to the model accuracy were the generator bearing temperature, the hydraulic unit oil temperature, and the turbine radial bushing temperature. The data-driven framework is general and can thus be reused to model generating units with different types of sensors.
Future studies should examine feature and model selection through exhaustive searching and Bayesian or evolutionary optimization, as the parameters were manually adjusted; fine-tuning the models can contribute even more to improving their accuracy. Regarding the modeling assumptions, runs are treated as independent, but features can be crafted to include times from other runs and from the last imperfect or perfect repairs. Additionally, the predictive model opens a path for prescriptive optimization of the machine operation parameters, aiming to minimize wear and risk over time. Reinforcement learning approaches are a prominent course of action, since they have been adopted for dynamically developing maintenance policies for multi-component systems such as the power system of our object of study [62].
Finally, the present study contributes to the advancement of SHP maintenance, a crucial renewable power resource with enormous potential for supplying energy worldwide. By detecting faults before failure, management can carry out actions to avoid additional damage caused to combined systems and further aggravation of the components, thus reducing the operating costs of power plants.

Data Availability Statement:
Data have not been made publicly available by the SHP but can be shared upon request.

Acknowledgments:
The authors would like to thank Ado Popinhak for making available the failure monitoring database used in this study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: