UBO-EREX: Uncertainty Bayesian-Optimized Extreme Recurrent EXpansion for Degradation Assessment of Wind Turbine Bearings

Abstract: Maintenance planning is crucial for efficient operation of wind turbines, particularly in harsh conditions where degradation of critical components, such as bearings, can lead to costly downtimes and safety threats. In this context, prognostics of degradation play a vital role, enabling timely interventions to prevent failures and optimize maintenance schedules. Learning systems-based vibration analysis of bearings stands out as one of the primary methods for assessing wind turbine health. However, data complexity and challenging conditions pose significant challenges to accurate degradation assessment. This paper proposes a novel approach, Uncertainty Bayesian-Optimized Extreme Recurrent EXpansion (UBO-EREX), which combines Extreme Learning Machines (ELM), a lightweight neural network, with Recurrent Expansion algorithms, a recently advanced representation learning technique. The UBO-EREX algorithm leverages Bayesian optimization to optimize its parameters, targeting uncertainty as an objective function to be minimized. We conducted a comprehensive study comparing UBO-EREX with basic ELM and a set of time-series adaptive deep learners, all optimized using Bayesian optimization with prediction errors as the main objective. Our results demonstrate the superior performance of UBO-EREX in terms of approximation and generalization. Specifically, UBO-EREX shows improvements of approximately 5.1460 ± 2.1338% in the coefficient of determination of generalization over deep learners and 5.7056% over ELM, respectively. Moreover, the objective search time is significantly reduced with UBO-EREX, by 99.7884 ± 0.2404% relative to deep learners, highlighting its effectiveness in real-time degradation assessment of wind turbine bearings.
Overall, our findings underscore the significance of incorporating uncertainty-aware UBO-EREX in predictive maintenance strategies for wind turbines, offering enhanced accuracy, efficiency, and robustness in degradation assessment.


Introduction
Wind turbines play a pivotal role in the renewable energy landscape, offering a sustainable solution to power generation [1,2]. However, ensuring the reliable operation of wind turbines is crucial for maximizing energy output and minimizing maintenance costs [3,4]. Among the various components of wind turbines, bearings are particularly susceptible to degradation, which can lead to costly downtimes and safety risks [5,6]. Therefore, effective prognostics of bearing health are essential for predictive maintenance and planning, enabling timely interventions to prevent failures and optimize operational efficiency [7-9]. While techniques such as deep learning have demonstrated promise in analyzing complex vibration data to detect early signs of deterioration in wind turbine bearings, the field still faces persistent challenges and research gaps. Despite advancements in prognostics and predictive maintenance techniques, recent state-of-the-art works highlight several unresolved challenges in assessing wind turbine bearing degradation. In this section, we provide an overview of notable works and their contributions, culminating in an exploration of overarching research gaps.
For instance, the authors in [10] developed a sophisticated prognostic method for bearing health management by introducing a new health indicator that leverages multi-scale, distribution-similarity-based features, optimized using a multi-objective grasshopper optimization algorithm. This targets data quality in the life cycle record using metrics such as robustness, monotonicity, trendability, and prognosability. This health indicator is integrated with a Gated Recurrent Unit (GRU) network that adaptively determines hyperparameters to predict the remaining useful life (RUL) of bearings using the same optimization process for hyperparameters. Although the process of enhancing data quality and modeling effectively handles high-dimensional data and complex relationships within the data, it remains computationally complex due to its multi-layered optimization and feature fusion processes. Moreover, the study does not specifically target uncertainty quantification within the predictive model, focusing more on accuracy and robustness against noise than explicit uncertainty management. The authors of [11] developed a novel prognostic strategy for predicting the RUL of rolling element bearings, focusing on integrating robust anomaly detection and multi-step estimation techniques. They employed support vector data description for anomaly detection in noisy data and moving horizon estimation for multi-step estimation, allowing for consideration of multiple previous states rather than just the immediate past. This approach was enhanced by extracting advanced entropy and sparsity-based health indicators from signals filtered across different frequency bands, with the most predictive health indicator selected based on a specific criterion. The methodology aimed to improve the accuracy and reliability of RUL predictions, addressing complexities in both data handling and model formulation. By focusing on improving data quality and employing simpler learning methods like support vector data description instead of more complex deep learning architectures, the authors effectively reduced modeling complexity. However, explicit quantification of uncertainty was not specifically targeted in this study. The authors in [12] developed a graph domain adaptation method for predicting the RUL of rolling bearings. They constructed a dynamic model to simulate bearing degradation and generate extensive data, which was used to train a multi-layered cross-domain gated graph convolutional network. The designed network model enhances the ability to discern graph domain differences and adapt features from twin data to real data, optimizing prediction accuracy. However, the authors did not explicitly address uncertainty quantification within their predictive modeling framework. This omission could be a limitation, as uncertainty quantification is crucial in reliability engineering for understanding the variability in data and model outputs and for making informed decisions under uncertainty. Regarding model complexity, the use of advanced methods might lead to a model that is computationally expensive and requires substantial computational resources for training and inference. Additionally, the complexity might make the model more challenging to interpret and maintain, potentially limiting its applicability in scenarios where simpler models might suffice or where computational resources are constrained. The authors in [13] developed a parallel neural network architecture designed to estimate the RUL of rolling element bearings. This architecture integrates parallel processing pathways with advanced deep learning techniques such as time transformers and convolutional long short-term memory networks. To address data complexity, the researchers implemented a variational stride temporal window strategy that dynamically adjusts data extraction based on the degradation stage of the components. This strategy, along with the parallel network, ensures that large volumes of data can be
processed simultaneously with less information loss. While several techniques, such as positional encoding and self-attention mechanisms, are detailed, the study does not explicitly discuss uncertainty quantification in the context of RUL predictions. A limitation or area for future work in similar studies could focus on the explicit quantification of uncertainty in predictions to enhance the reliability and robustness of the predictive models used in industrial applications. Another area could involve simplifying the architecture to reduce computational demands while maintaining high accuracy levels. In [14], the authors developed a method to predict the RUL of rolling bearings by integrating a framework that combines multi-domain mixed features with a temporal convolutional network. For effective handling of complex and noisy data, they utilized the dung beetle algorithm to optimize the variational mode decomposition method, enabling superior noise reduction. This optimization was crucial for enhancing the quality of input data through improved feature extraction, which included time-domain, frequency-domain, and entropy features. Additionally, the model complexity was addressed by incorporating a multi-head attention mechanism and a bidirectional gated recurrent unit into the temporal convolutional network, enhancing its ability to process and predict complex datasets. Although the study extensively tackled noise and feature extraction complexities, it did not explicitly address uncertainty quantification within the predictions. The optimized approach ensured effective handling of data and model complexities, facilitating accurate RUL predictions without reducing the deep learning architecture's complexity. In [15], the authors present a framework that utilizes regression models to accurately forecast the RUL of bearings. The models are trained using operational data collected via a supervisory control and data acquisition (SCADA) system. The system begins by carefully filtering the data and then constructs a deterioration profile by analyzing the behavior of temperature time series. Furthermore, it utilizes a cross-validation technique to tackle the issue of limited data, improving the reliability of the model by using subsets of data from other turbines that are accessible. Multiple models were created, with an average RUL estimate of 20 days. The work presented in [16] introduces a mechanism that can accurately detect faults in the inner race of bearings and predict their RUL under various conditions. The model combines time- and frequency-domain vibration signal analysis to extract characteristics, leverages a stacked variational denoising autoencoder to create a health indicator, and employs a bidirectional long short-term memory neural network to forecast the remaining useful lifetime of the bearings.
Overall, these algorithms contribute to common perspectives, as all of these works address data complexity at the primary stage with a specific set of data processing techniques. When data complexity is reduced as expected, the complex architecture of deep learning, with multiple layers, parallel structures, and multiple nonlinear abstractions, comes into play. Generally, the problem of hyperparameters is addressed most of the time via optimization algorithms. On the other hand, these works leave behind some important research gaps, providing an important opportunity for new research contributions. The complexity of deep learning algorithms poses a significant barrier, including the following:
- Extensive computational resources and expertise are required for implementation and optimization;
- The computational time associated with deep learning models can be prohibitive, particularly for real-time applications where timely decision-making is crucial;
- The inherent complexity of vibration data collected from wind turbines, coupled with the uncertainties introduced by harsh environmental conditions, further exacerbates the challenge of accurate degradation assessment.
In this context, our contributions aim to address the aforementioned challenges by proposing a novel approach, Uncertainty Bayesian-Optimized Extreme Recurrent EXpansion (UBO-EREX), for wind turbine bearing degradation assessment. Similar to previous works, after exposing data to a well-designed data preprocessing pipeline, including denoising, feature extraction, and outlier removal, a new learning scheme comes into play. Our approach combines the strengths of ELM, a lightweight neural network, with recurrent expansion algorithms, a recently advanced representation learning technique [17,18]. By leveraging Bayesian optimization, we optimize the parameters of the UBO-EREX algorithm, with a focus on minimizing uncertainty as the objective function [19]. Our solution offers several key advantages over existing approaches:
- UBO-EREX provides a more computationally efficient alternative to traditional deep learning models, enabling faster model training and inference.
- REX also integrates principal component analysis, controlled by a specific variance-retained ratio hyperparameter, allowing for the reduction of REX mapping size and optimization of learning performance.


- By targeting uncertainty in the optimization process, UBO-EREX enhances the robustness and reliability of degradation assessment, particularly in the face of data complexity and environmental uncertainties.
- Additionally, our approach simplifies the model architecture while improving approximation and generalization performance, making it suitable for real-time applications in wind turbine maintenance and planning.
While it is true that many works combine ELM theories or machine learning methods in general with Bayesian optimization, such as those found in [20-22], this work, to the best of our knowledge, is the first to combine Bayesian optimization with ELM specifically for the purpose of uncertainty reduction, not just for enhancing generalization capability. This distinct focus on uncertainty reduction sets the current research apart from previous studies. Additionally, although some scientists familiar with the field might consider this combination a traditional contribution, our work introduces a significant novelty by integrating both ELM and Bayesian optimization with the innovative learning rules of Recurrent EXpansion (REX) [17]. REX is a cutting-edge technique currently in its early development stage, and its combination with the UBO-EREX model makes this approach particularly novel and unique. The integration of REX involves iterative learning, where the model not only learns from additional mappings of labels but also enhances its understanding of input-label interactions over multiple rounds. This iterative process significantly improves the model's approximation and generalization capabilities.
Within the REX framework, the utilization of principal component analysis (PCA) for dimensionality reduction enhances the learning process by effectively managing the large size of the hidden layers. This methodology stands out due to its comprehensive approach to uncertainty quantification. Through the incorporation of confidence intervals and the utilization of metrics such as stability, coverage probability, and interval width, our model offers a robust evaluation of prediction uncertainty. This meticulous attention to uncertainty quantification represents a significant advancement over traditional ELM implementations, which typically prioritize generalization without adequately addressing prediction confidence. Bayesian optimization was employed, with careful consideration given to defining the hyperparameter space and ensuring convergence within computational constraints.
For a more in-depth view of our methodology, the flowchart in Figure 1 provides a concise summary of our contributions in order. Additionally, this flowchart offers a general overview of the dataset used and simplifies the understanding of our approach. Subsequent sections provide detailed explanations of each step. In summary, our contributions are expected to offer a promising solution to the challenges in wind turbine bearing degradation assessment, paving the way for more effective predictive maintenance strategies and enhanced operational efficiency in the renewable energy sector.
The remainder of this paper is organized as follows: Section 2 describes the data utilized in this study, explores its complexity, and details the various data processing techniques applied, accompanied by illustrative examples. Section 3 focuses on the methods employed, emphasizing the overall architecture of UBO-EREX. Section 4 presents the results and discussions, where several approximation metrics and uncertainty quantification methods are applied and thoroughly discussed. Finally, Section 5 concludes the paper with key insights and future perspectives.

Materials
In order to derive more accurate conclusions regarding bearing deterioration, this study integrates a realistic dataset obtained from real-world conditions [23]. In a previous study, a comprehensive run-to-failure experiment was conducted to monitor real-time health indicators of a high-speed shaft equipped with a 20-tooth pinion gear, driven by a 2 MW wind turbine. The data collection process involved meticulous measurements to account for environmental and operational fluctuations affecting turbine performance [23]. Accelerometers mounted on the turbine's shaft captured vibrations at a high sampling rate of 97,656 Hz within 6-s time windows. This setup effectively captured subtle variations in vibrations caused by changes in wind speed and mechanical loads.
To address the non-stationarity inherent in the data due to frequent fluctuations in shaft speed during operation, synchronous resampling techniques were employed. These variations stem from factors such as fluctuations in wind speed, torque ripple effects from the tower, and other operational loads influencing the turbine's mechanical stability. Synchronous resampling aligned the vibration data to a consistent reference frame, improving the reliability of the spectral analyses used for detecting potential bearing faults and ensuring the accuracy of the monitoring process.
This meticulous approach facilitated early detection of bearing failures, thereby enhancing maintenance and operation strategies for wind turbines. The data collection process specifically targeted failures associated with high-speed shaft bearings in wind turbines. The analysis revealed inner race failures as a prevalent fault type due to the significant stress and load endured by these components, as illustrated in Figure 2. The application of synchronous resampling enhanced fault detection by improving the resolution of frequency analysis, allowing for clearer differentiation between fault-related frequencies and normal operational frequencies. This method effectively identified deviations in vibration patterns indicative of bearing degradation, such as increases in inner race energy, which directly correlate with the presence of faults. This approach not only aids in early fault detection but also contributes to a more targeted and efficient maintenance regime, reducing downtime and enhancing turbine efficiency.
Throughout the 50-day observation period under normal operating conditions, 50 profiles were stored separately, with each file containing approximately 585,936 samples treated as a single health indicator. It was observed that the collected data exhibited exponential variations over time due to changes in the physical health conditions of the bearings, as depicted in Figure 3a. Consequently, after 50 days of operation, the bearings ceased functioning due to the occurrence of an inner race fault.
To summarize, detailed information about the dataset is presented below. As addressed in Figure 3b-g, specific steps are followed in this work in order to reveal the primary degradation patterns inherent in the signals before feeding the learning systems. These steps include denoising, feature extraction, outlier removal, and linear filtering. Before and after each step, the data are scaled to the interval [0, 1]. The data processing steps are explained as follows.

Denoising
Vibration signals collected from wind turbines often contain noise due to many factors, such as environmental conditions, mechanical vibrations, electrical interference, sensor imperfections, and transmission and signal processing. In this work, denoising is executed through several steps of wavelet denoising algorithms. These algorithms, including Beylkin, Best-localized Daubechies, Symlets, Coiflets, Daubechies, Fejer-Korovkin, Morris minimum-bandwidth orthogonal, and Vaidyanathan, are applied to each vibration signal independently [25,26]. During the denoising process, each algorithm operates by decomposing the vibration signal into its constituent wavelet components, effectively separating the signal from noise. By leveraging the unique properties of wavelets, such as localization in the time and frequency domains, these algorithms identify and suppress noise components while retaining signal features of interest. It is worth mentioning that the default parameters for wavelet denoising as per the MATLAB 23.2.0.2515942 (R2023b) Update 7 documentation, including automatic determination of the decomposition level based on signal length, soft thresholding for noise suppression, universal thresholding for noise estimation, level-independent thresholding, and automatic rescaling of coefficients, are applied, ensuring efficient and reliable denoising of signals. The denoising process, as depicted in Figure 3b, effectively mitigates the fluctuations in signal amplitudes experienced by the data. This reduction in fluctuations is particularly notable within the time range of (30, 40) days, where the denoising process distinguishes these fluctuations from actual failure patterns. In contrast, the time range of (40, 50) days exhibits significant growth in signal amplitude, along with larger envelopes and perturbations. Consequently, degradation patterns become more discernible after denoising compared with the raw data of Figure 3a, as the process clarifies the underlying trends and features within the data, making fault patterns more evident and facilitating accurate fault detection and analysis.
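As an illustrative sketch of the wavelet-denoising idea described above, the following applies a single-level Haar decomposition with soft thresholding and a universal threshold. The paper relies on MATLAB's multi-family wavelet denoising with its documented defaults; the wavelet family, decomposition level, and threshold rule here are simplified assumptions for demonstration only.

```python
# Sketch: one-level Haar wavelet denoising with soft thresholding.
# The wavelet, level, and threshold rule are simplifying assumptions.
import math


def haar_denoise(signal):
    """Denoise a signal via one-level Haar decomposition (even length used)."""
    n = len(signal) - len(signal) % 2          # truncate to even length
    s2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s2 for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s2 for i in range(0, n, 2)]

    # Robust noise estimate from detail coefficients (MAD / 0.6745),
    # then the universal threshold t = sigma * sqrt(2 ln n).
    abs_d = sorted(abs(d) for d in detail)
    sigma = abs_d[len(abs_d) // 2] / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(max(n, 2)))

    # Soft thresholding: shrink detail coefficients toward zero.
    detail = [math.copysign(max(abs(d) - t, 0.0), d) for d in detail]

    # Inverse Haar transform.
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s2)
        out.append((a - d) / s2)
    return out
```

In practice, multi-level decompositions and the richer wavelet families listed above would be used, but the shrink-and-reconstruct structure is the same.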

Variance Extraction
After denoising, variance extraction follows to provide insights into the spread of signal values within each window and to capture the variability of signals over time [27]. In this implementation, a window of size 300 samples is moved along the signal, and at each position, the variance of the signal within that window is computed. As depicted in Figure 3c, degradation patterns now become clearer, describing the health of the turbine. However, a slight issue that could potentially be deemed a problem is the presence of anomalies in the data, particularly in the initial primary recorded samples. These anomalies manifest as massive variability in the data, which may not accurately reflect the health of the turbine. Instead, they are more likely attributable to measurement errors caused by various factors. Notably, the amplitudes of these anomalies are observed to equal or exceed the amplitudes of variance observed towards the end of life of the turbine, which is logically inconsistent. Therefore, to address this issue and mitigate the influence of these misrepresented samples, the next step of outlier removal becomes necessary. Outlier removal aims to identify and eliminate these anomalous data points, thus refining the dataset and improving the accuracy of subsequent analysis and interpretation.
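The sliding-window variance extraction above can be sketched as follows; the 300-sample window comes from the text, while the one-sample hop is an assumption, since the paper does not state the window overlap.

```python
# Sketch: sliding-window variance extraction (window size from the text;
# the hop of 1 sample is an assumption).
from statistics import pvariance


def sliding_variance(signal, window=300, hop=1):
    """Population variance of each window position along the signal."""
    return [pvariance(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, hop)]
```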

Envelope Analysis
Before proceeding to outlier removal, another essential feature extraction step, signal envelope extraction, is considered necessary [28]. In this implementation, a time window of a specified size (i.e., 200 samples) is slid along the signal, and at each position, the envelope values are computed. These envelopes provide valuable information about the signal's behavior, facilitating further analysis and interpretation, such as identifying trends, periodicities, and anomalies. In Figure 3d, the obtained results of envelope extraction showcase a reduction in fluctuations in measurements. However, the anomalies identified in the previous step of variance extraction persist, highlighting the need for further discussion and recommendations regarding the use of outlier removal.
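A minimal sketch of windowed envelope extraction with the 200-sample window mentioned above: here the envelope is taken as the per-window maximum of the absolute signal, which is an assumption; the actual implementation (e.g., Hilbert-transform-based envelopes) may differ.

```python
# Sketch: upper-envelope estimate over a sliding window. Using max|x| per
# window is an assumed simplification of envelope extraction.
def sliding_envelope(signal, window=200, hop=1):
    """Upper envelope estimate: max |x| over each sliding window."""
    return [max(abs(v) for v in signal[i:i + window])
            for i in range(0, len(signal) - window + 1, hop)]
```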

Outlier Detection and Removal
Outliers, defined as data points that significantly deviate from the majority of the dataset, have the potential to distort analysis results and lead to inaccurate conclusions [29,30]. In this work, robust outlier detection techniques are employed to identify and remove such outliers, thereby ensuring the integrity of the dataset. Utilizing various statistical models and algorithms, including median analysis, Grubbs' test, mean analysis, and quartiles analysis, the process systematically identifies outliers that deviate significantly from the underlying data distribution. Figure 3e illustrates the results obtained after outlier removal, particularly highlighting the elimination of further pulses observed at the end of the turbine's life span, within the range of 40 to 50 days. This indicates that the outlier removal process has effectively identified and removed anomalous data points or pulses that occurred towards the end of the turbine's operation. By eliminating these outliers, the dataset is refined and the integrity of the data is enhanced, allowing for more accurate and reliable analysis. However, there still remains a challenge posed by anomalies at the beginning of the turbine's life. Consequently, we are compelled to explore trend analysis methods, such as linear filtering, to address this issue.
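Of the detectors listed above (median, Grubbs' test, mean, quartiles), the quartiles-based variant can be sketched as follows; the fence factor k = 1.5 is the conventional interquartile-range rule and is an assumption, as the paper does not specify it.

```python
# Sketch: quartiles-based outlier removal (IQR rule). The fence factor
# k = 1.5 is an assumed convention, not stated in the paper.
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]
```

The study combines several such detectors; this shows only one of them in isolation.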

Trend Analysis
Linear regression filtering is a crucial technique utilized to uncover underlying trends or patterns within datasets. By applying this method, we aim to identify long-term changes or abnormalities that may serve as indicators of impending bearing faults or degradation [31]. In this particular study, the linear regression filtering process is employed to smooth the signal and uncover significant trends or anomalies within the data. Specifically, a window size of approximately 9800 data points is utilized for the filtering operation to encompass a substantial range of observations; the window size is kept odd to maintain the integrity of the filtering process, preserving symmetry and accuracy in trend analysis. Figure 3f illustrates the outcomes derived from the filtering process, demonstrating a reduction in anomalies present at the beginning of the life cycle. This reduction signifies an improved depiction of the degradation trend, as the filtering process effectively mitigates noise and fluctuations, thereby facilitating a clearer understanding of the data and enhancing the ability to discern meaningful patterns indicative of bearing health.
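The moving linear-regression filter can be sketched as below: each interior sample is replaced by the centre value of a least-squares line fitted over the surrounding window. Handling the window edges by keeping the original samples is an assumption, and the window size is forced odd as the text requires.

```python
# Sketch: moving linear-regression (trend) filter. Edge handling (keeping
# the original samples) is an assumption.
def linreg_filter(signal, window=9801):
    """Replace each sample by the centre of a least-squares line fitted
    over the surrounding window; edge samples are left unchanged."""
    if window % 2 == 0:
        window += 1                      # keep the window size odd
    half = window // 2
    xs = list(range(window))
    mean_x = sum(xs) / window
    sxx = sum((x - mean_x) ** 2 for x in xs)
    out = list(signal)
    for c in range(half, len(signal) - half):
        seg = signal[c - half:c + half + 1]
        mean_y = sum(seg) / window
        slope = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(xs, seg)) / sxx
        out[c] = mean_y + slope * (half - mean_x)   # fitted value at centre
    return out
```

A useful sanity check is that a perfectly linear signal passes through the filter unchanged.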

RUL Label Generation
Subsequently, a linearly spaced array is created, spanning from 50 to 0 days.This array represents the RUL values, where 50 days corresponds to the initial state (full health) and 0 days represents the end of the bearing's operational life (failure).The length of this array matches the length of the dataset, ensuring that each data point is associated with a corresponding RUL label, as addressed in Figure 3h.
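The linear RUL labelling described above amounts to a linearly spaced ramp from full health to failure, one label per sample:

```python
# Sketch: linearly spaced RUL labels from 50 days (full health) to
# 0 days (failure), one label per data point.
def rul_labels(n_samples, start_days=50.0, end_days=0.0):
    """Return n_samples linearly spaced RUL labels."""
    if n_samples == 1:
        return [start_days]
    step = (end_days - start_days) / (n_samples - 1)
    return [start_days + i * step for i in range(n_samples)]
```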
It is worth highlighting that our approach to data quality analysis relies heavily on visual inspection at each step of the process. This means that human intervention is necessary to determine whether degradation signals are detectable in the data. While this visual inspection has proven effective in achieving our objectives thus far, it is essential to acknowledge that this approach has limitations. Visual inspection may not always capture subtle patterns or anomalies in the data, potentially leading to overlooked insights or inaccuracies in the analysis. Therefore, there is a need for future research to explore and develop more analytical and precise methods for data analysis. By incorporating advanced analytical techniques, such as statistical methods, we can enhance the robustness and accuracy of our data analysis processes, ultimately leading to more reliable insights and conclusions.

Methods
This study integrates the ELM [32] and REX [17] learning methodologies to formulate the proposed UBO-EREX model. This unified framework optimizes all hyperparameters via Bayesian optimization [33] while focusing on uncertainty quantification as the core objective function. The uncertainty quantification is conducted through the confidence interval philosophy, allowing for the estimation of the likely range of predictions. Thus, this section is dedicated to elucidating this approach, with an additional emphasis on the philosophy of uncertainty quantification.

UBO-EREX
As depicted in the flow diagram of Figure 4, the proposed UBO-EREX architecture involves the utilization of both the ELM network architecture and REX. In this work, ELM is trained by generating random input weights a and biases b for the hidden layer H, which is then activated by an activation function σ for specific inputs x, as in (1). After that, the learning weights β, which are the output weights of H, are computed using the Moore-Penrose pseudo-inverse of the matrix involving a regularization parameter C, the transpose of the hidden layer H^T, the desired outputs y, and the identity matrix I, as in (2). Here, σ, the number of neurons N in H, and C are hyperparameters of the currently used basic ELM architecture.

H = σ(ax + b)  (1)

β = (H^T H + I/C)^(-1) H^T y  (2)

In the REX philosophy, the learning model is expected to learn both model representations and behavior by merging the entire ELM neural network outcome, including H, x, and the estimated outputs ŷ, into another ELM network for multiple rounds r = 1, …, R. By repeating the process over time for multiple rounds, as addressed in (3), the model's approximation and generalization are expected to improve over time. This improvement occurs because the model first learns from additional mappings of labels, serving as a source of transductive learning. On the other hand, from each input and response, the model in round r + 1 is able to gain a better sense of the interaction between inputs and labels from previous rounds r.
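The basic ELM training rule in (1)-(2) can be sketched as follows. The hidden-layer size, the tanh activation, and the regularization value are illustrative assumptions; only the structure (random input weights, closed-form regularized readout) follows the equations.

```python
# Sketch of ELM training per Eqs. (1)-(2): random input weights/biases,
# a tanh hidden layer, and a regularized least-squares readout.
# n_hidden, the activation, and c are illustrative choices.
import numpy as np


def elm_train(x, y, n_hidden=50, c=1e3, rng=None):
    """Return (a, b, beta): random input weights/biases and learned readout."""
    rng = np.random.default_rng(rng)
    a = rng.standard_normal((x.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    h = np.tanh(x @ a + b)                               # Eq. (1)
    beta = np.linalg.solve(h.T @ h + np.eye(n_hidden) / c,
                           h.T @ y)                      # Eq. (2)
    return a, b, beta


def elm_predict(x, a, b, beta):
    """Forward pass through the trained ELM."""
    return np.tanh(x @ a + b) @ beta
```

Because the readout is solved in closed form, training cost is a single linear solve, which is the source of ELM's speed advantage over iteratively trained deep networks.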

x_r = [x, φ(H_{r−1}, ŷ_{r−1})]  (3)

In the formula of REX in (3), φ represents a data processing function suggested to process feature maps and estimated targets, especially due to the expected large size of the hidden layer, which would complexify the REX of the next rounds. Accordingly, this work defines φ as a dimensionality reduction algorithm based on principal component analysis (PCA) [34]. PCA is controlled by the amount of retained variance ratio (α), which is given as a hyperparameter of REX in this case, along with the number of rounds R. The φ algorithm can then be defined as follows: let X_red = φ(X, α) be the reduced feature matrix, where X is the original (mean-centered) feature matrix and α is the desired explained variance ratio. We can express the PCA reduction by first computing the covariance matrix V, as in (4). After that, we perform a Singular Value Decomposition (SVD) on the covariance matrix V, as in (5).

V = (1/(n − 1)) X^T X  (4)

V = U S U^T, with S = diag(s_1, …, s_d)  (5)

The next step consists of computing the total variance T_total, as in (6), after which we compute the target variance to retain, T_target, as in (7).

T_total = Σ_{i=1}^{d} s_i  (6)

T_target = α · T_total  (7)

The last step consists of determining the number of principal components to retain, k, as in (8).

k = min{ k : Σ_{i=1}^{k} s_i ≥ T_target }  (8)

Matrix dimensions are then reduced using the retained principal components, as in (9), where U_k consists of the first k columns of U.

X_red = X U_k  (9)
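The PCA-based reduction φ of Equations (4)-(9) can be sketched as:

```python
# Sketch of the variance-retained PCA reduction phi of Eqs. (4)-(9):
# keep the fewest principal components whose cumulative variance reaches
# the retained ratio alpha.
import numpy as np


def pca_reduce(x, alpha=0.95):
    """Project x onto the first k principal components covering alpha
    of the total variance."""
    xc = x - x.mean(axis=0)                   # mean-center the features
    v = (xc.T @ xc) / (x.shape[0] - 1)        # covariance matrix, Eq. (4)
    u, s, _ = np.linalg.svd(v)                # SVD of covariance, Eq. (5)
    total = s.sum()                           # total variance, Eq. (6)
    target = alpha * total                    # variance to retain, Eq. (7)
    k = int(np.searchsorted(np.cumsum(s), target) + 1)   # Eq. (8)
    return xc @ u[:, :k]                      # reduced matrix, Eq. (9)
```

With a strongly dominant direction in the data, a 95% retained ratio collapses the feature matrix to a single component, which is exactly the size reduction REX relies on between rounds.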

Uncertainty Quantification Objective Function
In this study, an objective function based on uncertainty quantification is proposed when searching for optimal hyperparameters of the UBO-EREX algorithm using the Bayesian approach. Accordingly, a confidence interval (CI) is utilized, as in Equation (10) [35], where x̄ denotes the sample mean and z represents the score associated with a given confidence level. In this case, a confidence level of 99% is utilized, which results in z approximately equaling 2.5758. Furthermore, s signifies the standard deviation of the samples, and n denotes the sample size.

CI = x̄ ± z · (s / √n)  (10)

Formula (10) defines a range of values indicating the confidence level regarding the population mean. It is worth noting that CI analysis in this work focuses on residuals e = y − ŷ, as described in (11). Emphasizing residual analysis, this approach offers direct insights into the uncertainty in predictions, thus providing valuable evaluations of the model's dependability and efficiency. Opting for a 99% confidence level not only ensures a high degree of certainty but also proves beneficial in scenarios necessitating crucial decision-making.

e = y − ŷ  (11)
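The residual confidence interval of Equations (10) and (11) can be sketched directly, with z = 2.5758 for the 99% level used in this work:

```python
# Sketch: 99% confidence interval on the mean residual, per Eqs. (10)-(11):
# CI = mean(e) +/- z * s / sqrt(n), with e = y - y_hat and z = 2.5758.
import math
from statistics import mean, stdev


def residual_ci(y_true, y_pred, z=2.5758):
    """Return (lower, upper) confidence bounds on the mean residual."""
    e = [t - p for t, p in zip(y_true, y_pred)]   # residuals, Eq. (11)
    m, s, n = mean(e), stdev(e), len(e)
    half = z * s / math.sqrt(n)                   # margin of error
    return m - half, m + half
```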
This investigation relies on extracting pivotal features from the confidence interval to improve the overall certainty of the predictions. Consequently, metrics such as the CI stability $S$, the coverage probability $P_c$, and the interval width $W$ are established. Initially, $S$ is determined by evaluating the consistency of the confidence interval. This involves comparing the lower and upper bounds to identify at least two comparable subsets of similar length. Following this, the medians of these subsets are computed, along with the absolute deviations from those medians. Subsequently, the Levene test is applied to these absolute deviations, comparing the resulting statistic to critical values derived from the Fisher-Snedecor F-distribution at a predetermined significance level, as outlined in Equation (12). If the test statistic exceeds the critical value, the null hypothesis $H_0$ (the CI is non-stable) is rejected, confirming stability ($S = 1$, $S^{-1} = 0$); conversely, a lower value indicates instability. The interval width $W$ is determined from the margin of error, as described in Equation (13). Next, $P_c$ is computed using the coverage count and the sample size, as depicted in Equation (14), where the coverage count represents the number of confidence intervals encompassing the true parameter and $P_c$ is the proportion of confidence intervals covering it. This metric offers valuable insight into the reliability of the estimation process. This study then introduces an uncertainty quantification (UQ) formula, outlined in Equation (15), in which elevated values indicate increased uncertainty in the predictions. The inverse of $S$, denoted $S^{-1}$, equals 1 when the confidence interval is deemed unstable and 0 otherwise. For the stability test, a 99% confidence level is used by default.
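One plausible reading of these metrics can be sketched as follows. Here the interval is taken over the residual spread so that coverage lands near the nominal level, the halves-based Levene split and the final combination formula are our assumptions (not the paper's Equations (12)-(15)), and SciPy's `levene` is used for the stability test:

```python
import numpy as np
from scipy.stats import levene

def uq_metrics(y_true, y_pred, z=2.5758, alpha=0.01):
    """Interval width W, coverage probability P_c, and CI stability of the
    prediction residuals. The combined UQ score at the end mirrors the
    paper's intent (lower = less uncertain); its exact closed form is an
    assumption of this sketch."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    half = z * e.std(ddof=1)                  # interval over the residual spread
    lo, hi = e.mean() - half, e.mean() + half
    width = hi - lo                           # interval width W
    coverage = np.mean((e >= lo) & (e <= hi)) # fraction of residuals covered, P_c
    # Stability: median-centred Levene test on two halves of the residuals;
    # rejecting the null at level `alpha` flags the interval as unstable.
    n = e.size
    _, p = levene(e[: n // 2], e[n // 2:], center="median")
    s_inv = 1.0 if p < alpha else 0.0         # S^-1 = 1 when unstable
    uq = width * (1.0 + s_inv) / max(coverage, 1e-12)  # assumed combination
    return {"width": width, "coverage": coverage, "unstable": s_inv, "UQ": uq}
```

Minimizing such a score simultaneously rewards narrow, stable, high-coverage intervals, which is the stated goal of the UQ objective.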
The uncertainty quantification (UQ) formula serves as the primary objective for hyperparameter tuning in this scenario. In other words, the aim is to decrease the interval width and its instability while enhancing its coverage probability.
In summary, Algorithm 1 illustrates the UBO-EREX algorithm proposed in this work along with its primary learning rules. The evaluation relies on the RMSE, MAE, and MSE error metrics and the coefficient of determination ($R^2$) for both the training and evaluation datasets. Additionally, the percentage of $R^2$ improvement and the percentage of objective-search-time improvement are further utilized to estimate the amount of UBO-EREX improvement in both accuracy and computational cost for comparison purposes. These metrics are given in Formulas (16)-(21), respectively, where $\bar{y}$ is the mean of the observed values $y$ and $n$ is the number of samples.
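The standard definitions behind these metrics can be sketched as follows; since Formulas (16)-(21) are not reproduced in this excerpt, these are the conventional versions rather than the paper's exact expressions:

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination R^2."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def pct_gain(new, old):
    """Percentage improvement for higher-is-better metrics such as R^2."""
    return 100.0 * (new - old) / abs(old)

def pct_time_saving(new, old):
    """Percentage reduction for lower-is-better costs such as search time."""
    return 100.0 * (old - new) / abs(old)
```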
Furthermore, the results are strengthened and validated through visual illustrations. Moreover, to provide a comprehensive comparison, UBO-EREX is benchmarked against several other algorithms, including the original ELM and a selection of deep learning time-series models, namely long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and gated recurrent unit (GRU) networks. It is worth noting that all compared algorithms undergo Bayesian optimization with RMSE as the objective function, unlike UBO-EREX, which incorporates the UQ objective. Accordingly, this section first explores the UBO-EREX results independently, as it represents our primary algorithm, and then elaborates on the results obtained with the compared methods.
Firstly, by delving into the performance analysis of UBO-EREX, we aim to demonstrate that learning from maps and labels as additional sources alongside the input data enhances the learning process over time, leading to a deeper understanding of data representation and model behavior.
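The idea of feeding feature maps and estimated targets back as additional inputs can be sketched as follows, with a toy ELM and a PCA-style map reduction (all names, sizes, and activation choices are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(Z, y, n_hidden=50):
    """Single-hidden-layer ELM: random features plus a least-squares readout."""
    W = rng.standard_normal((Z.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(Z @ W + b)                    # random hidden feature map
    beta = np.linalg.pinv(H) @ y              # analytic output weights
    return H, H @ beta

def rex_train(X, y, rounds=10, var_ratio=0.9):
    """Each round retrains an ELM on the inputs augmented with the
    PCA-reduced previous feature map and the previous predictions."""
    Z, y_hat = X, np.zeros_like(y, dtype=float)
    for _ in range(rounds):
        H, y_hat = elm_fit_predict(Z, y)
        Hc = H - H.mean(axis=0)               # centre the map before reduction
        U, S, _ = np.linalg.svd(Hc, full_matrices=False)
        var = S ** 2                          # variance carried by each component
        p = int(np.searchsorted(np.cumsum(var), var_ratio * var.sum()) + 1)
        Z = np.column_stack([X, U[:, :p] * S[:p], y_hat])  # expanded inputs
    return y_hat
```

Each iteration corresponds to one REX round $k$; the number of rounds and `var_ratio` are the hyperparameters the Bayesian search tunes.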
In this context, Figure 5 presents the most crucial metrics related to approximation errors and prediction variability, namely the RMSE and the interval width, respectively. These metrics are gathered from each training round $k$ of the REX process. Notably, they begin to demonstrate improved learning performance after roughly $k = 5$ and appear to stabilize at better performance around $k = 10$. This indicates that in the first rounds the learning model struggles to grasp the maps and targets with respect to the inputs of each round, which is why we observe a plateau initially. Then, as the model discovers improved representations and begins to understand the relationships between inputs, maps, and targets, it starts to tune the ELM weights correctly, resulting in a clear increase in performance. It is worth noting that the number of rounds is itself tuned via the Bayesian objective optimization. This observed learning behavior validates the REX theory and underscores its applicability to the lightweight ELM network. We now move to the numerical evaluation and comparison of these approximation metrics, as well as the uncertainty quantification metrics, in a comparative analysis for a better understanding of UBO-EREX learning performance. Tables 1 and 2 are dedicated to this matter, respectively.
Table 1 provides a comprehensive comparison of evaluation metrics across the different methods, focusing on both the training and testing phases. Each method, including BiLSTM, ELM, GRU, LSTM, and UBO-EREX, is evaluated based on the aforementioned key metrics. In terms of training performance, UBO-EREX demonstrates promising results, with the lowest errors and highest accuracy in predicting target values compared with the other methods. Additionally, UBO-EREX achieves a high $R^2$ value of 0.8710, suggesting a strong correlation between predicted and actual values and signifying the effectiveness of the learning process. Upon transitioning to the testing phase, UBO-EREX maintains its competitive edge, showcasing performance metrics comparable to those observed during training. The RMSE (0.1037), MAE (0.0760), MSE (0.0107), and $R^2$ (0.8730) values remain consistent, reinforcing the robustness of the UBO-EREX algorithm in generalizing to unseen data.
In terms of computational efficiency, UBO-EREX also stands out, with a remarkably low search time of 0.0499 s. This indicates that UBO-EREX efficiently explores the search space to optimize model parameters, resulting in expedited convergence and reduced computational overhead. Furthermore, the percentage improvements in $R^2$ over BiLSTM, ELM, GRU, and LSTM (7.6050%, 4.0509%, 3.7822%, and 5.7056%, respectively) clearly indicate its superior performance. Additionally, the percentage improvement in search time shows that UBO-EREX demonstrates the most efficient search process among the evaluated methods, except for ELM, whose lightweight architecture incurs less computational cost.
Overall, Table 1 highlights UBO-EREX as a robust and efficient method for training neural networks, offering superior predictive accuracy, strong generalization capability, and minimal computational overhead. These findings underscore the potential of UBO-EREX to advance machine learning applications in the field of predictive maintenance of rotating machinery, specifically wind turbine bearing degradation. The visual illustrations of the curve fits in Figure 6, for both the training data (Figure 6a) and the testing data (Figure 6b), also showcase the performance of UBO-EREX and confirm the results presented in Table 2, while revealing further details about the smoothness and accuracy of the UBO-EREX fit. The reason UBO-EREX behaves differently and approaches the target function better than all the compared learners lies in its objective-function minimization: by specifically minimizing the UQ objective, UBO-EREX effectively narrows the confidence-interval width and reduces its variability. In this case, BiLSTM demonstrates the worst behavior, as it follows a divergent path for Remaining Useful Life (RUL) prediction. This divergence indicates that BiLSTM struggles to model the RUL trajectory effectively or fails to capture the underlying patterns in the data. As a result, its predictions deviate significantly from the actual RUL values, leading to poorer performance compared with the other methods.

Moving to uncertainty quantification, Table 2 provides a detailed breakdown of the uncertainty quantification metrics for the various methods. These metrics offer insights into the reliability and stability of the uncertainty estimates produced by each method. The first metric, the interval width, measures the range of the uncertainty interval generated by each method; a narrower interval signifies a more precise estimation of uncertainty. In the table, we observe that ELM, GRU, and UBO-EREX exhibit similar and comparatively narrower interval widths, indicating more accurate uncertainty estimates than BiLSTM and LSTM. Moving on to the coverage probability, this metric evaluates the proportion of true values that fall within the uncertainty interval; a coverage probability close to 0.99 implies that the method accurately captures the true uncertainty. Remarkably, all methods demonstrate highly accurate coverage probabilities, suggesting reliable uncertainty estimates across the board. The interval stability, the third metric, assesses how consistent the uncertainty intervals remain across different observations or instances; a stability value of 1 indicates perfect consistency, implying that the interval width remains constant. In this context, ELM, GRU, and UBO-EREX exhibit perfect stability, while BiLSTM and LSTM display varying degrees of instability. Lastly, the uncertainty metric provides an overall assessment of uncertainty estimation by combining the interval width, coverage probability, and stability; lower values indicate more accurate and reliable uncertainty estimates. Notably, UBO-EREX achieves the lowest uncertainty value among all methods, indicating its superior performance in uncertainty quantification.

The confidence interval (CI) plots depicted in Figure 7 reinforce the findings presented in Table 1 while also providing a visual representation of the uncertainty associated with the predictions. These plots demonstrate that UBO-EREX consistently exhibits less variability and tighter CIs, even at a 99% confidence level, compared with the other methods evaluated. Specifically, the CI plots of UBO-EREX indicate a higher level of confidence in its predictions, with narrower intervals around the predicted values. This suggests that UBO-EREX provides more precise and reliable estimates of uncertainty, offering greater confidence in its predictions than the other methods. Conversely, BiLSTM consistently displays higher levels of variability in its CI plots, indicating less confidence in its predictions and a wider range of possible outcomes, which suggests that its predictions may be less reliable and more uncertain than those of UBO-EREX. Additionally, the CI plots highlight the instability of the CI for LSTM. This instability is reflected in the fluctuation of the confidence intervals across different observations or instances, indicating inconsistencies in uncertainty estimation and further underscoring the superior performance of UBO-EREX in providing stable and reliable uncertainty estimates. In summary, the CI plots in Figure 7 provide visual evidence supporting the findings of Table 1, demonstrating that UBO-EREX consistently outperforms the other methods by exhibiting less variability and tighter confidence intervals, even at a higher confidence level. Figure 8 summarizes the Bayesian optimization in terms of objective-function behavior and computational time, while revealing further details about the computational efficiency and convergence behavior of the learning models.
Firstly, in Figure 8a, it is evident that the time consumed during the objective-function search increases significantly for the deep neural networks, particularly for architectures such as LSTM, GRU, and BiLSTM, each consuming progressively more time than the last. Conversely, UBO-EREX and ELM require remarkably less computational time, with ELM being the least time-consuming. This observation highlights the advantage of these lighter architectures over traditional deep learning methods, as they maintain superior accuracy while requiring significantly less computational time.
Secondly, Figure 8b addresses the behavior of the RMSE objective minimization. It is evident that GRU exhibits signs of overfitting, indicating that the model may fit too closely to the training data and struggle to generalize to unseen data. On the other hand, LSTM and ELM show moderate and somewhat stable convergence behavior, suggesting that they are better able to adapt to the data without overfitting or underfitting. BiLSTM, however, clearly underfits, indicating that it fails to capture the complexity of the data and may produce overly simplistic models.
Simultaneously, Figure 8c illustrates the uncertainty quantification objective of UBO-EREX in terms of the interval width. The convergence pattern in Figure 8c is comparable to those observed in Figure 5a,b, suggesting that UBO-EREX exhibits stable convergence behavior in uncertainty quantification, in line with its ability to provide accurate and reliable uncertainty estimates. Finally, Table 3 showcases the hyperparameters obtained via Bayesian optimization. It should be noted that the Bayesian optimization process in this work involved defining a hyperparameter space encompassing parameters such as learning rates, regularization parameters, activation functions, and network architecture (neurons). The optimization process was guided by a Gaussian process regression model, assuming a Gaussian prior distribution over the objective function, with hyperparameters such as the length scale and noise level determined iteratively. Regarding the stopping criterion, termination was based on a predefined number of iterations (i.e., 50 iterations) or function evaluations, ensuring convergence to a satisfactory solution within computational constraints. While default settings were utilized for simplicity, it is recognized that further exploration of the impact of varying these parameters on optimization performance and model outcomes is important. The table provides valuable insights, particularly concerning the number of neurons utilized by each method. Notably, UBO-EREX requires fewer neurons than the other methods, even when employing a large number of hidden layers and maps (e.g., a retained variance ratio of 70%). This observation carries significant implications, indicating that UBO-EREX effectively captures complex patterns within the data while requiring fewer neurons, and suggesting that it can achieve comparable or even superior performance with a more efficient and streamlined neural network architecture. By leveraging Bayesian optimization to fine-tune hyperparameters, UBO-EREX optimally balances model complexity and predictive accuracy, resulting in a more efficient and effective learning process.

In summary, UBO-EREX demonstrates a significant impact on predictive maintenance, particularly in addressing wind turbine bearing degradation. Through comprehensive evaluations and comparisons, UBO-EREX consistently outperforms the alternative methods in predictive accuracy, adaptability, and computational efficiency. The technique exhibits robust performance indicators, including minimal errors, high precision, and a strong correlation between predicted and actual values. Moreover, UBO-EREX provides reliable estimates of uncertainty and consistent predictions, further underscoring its relevance in practical applications. More precisely, the incorporation of UBO significantly enhances the predictive performance of the EREX model. By integrating uncertainty quantification objectives into the optimization process, UBO-EREX achieves improved predictive accuracy and reliability. This approach not only fine-tunes model parameters to produce more precise forecasts but also provides valuable insights into the confidence level of the predictions, enhancing decision-making processes. Moreover, UBO-EREX demonstrates robustness to data variability and outliers while optimizing computational efficiency. Overall, the integration of the UBO methodology elevates the effectiveness of the EREX model in real-world applications, ensuring accurate and reliable predictions with enhanced confidence. These findings underscore the substantial impact of UBO-EREX in advancing machine learning applications, especially predictive maintenance for rotating machinery.
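The budget-limited hyperparameter search described above can be sketched as follows; plain random sampling stands in for the Gaussian-process surrogate of full Bayesian optimization, and the search space shown is a hypothetical illustration, not the paper's exact ranges:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical search space echoing the text: neurons, REX rounds, retained variance.
SPACE = {
    "n_hidden":  lambda: int(rng.integers(10, 200)),
    "rounds":    lambda: int(rng.integers(1, 15)),
    "var_ratio": lambda: float(rng.uniform(0.5, 0.99)),
}

def search(objective, n_calls=50):
    """Hyperparameter search stopping after a fixed evaluation budget
    (50 calls, mirroring the paper's stopping criterion)."""
    best, best_val = None, np.inf
    for _ in range(n_calls):
        params = {name: draw() for name, draw in SPACE.items()}
        val = objective(params)               # e.g., the UQ objective of UBO-EREX
        if val < best_val:
            best, best_val = params, val
    return best, best_val
```

In the full method, the `objective` evaluated here is the UQ score rather than RMSE, which is the key difference between UBO-EREX and the compared learners.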

Conclusions
This work introduces a novel representation learning architecture named UBO-EREX, which combines ELM and REX methodologies to address challenges in wind turbine health degradation prognosis.The model is augmented by Bayesian optimization methods with an objective function targeting uncertainties in the data.Applied to a realistic dataset that has undergone thorough preprocessing stages, including denoising, outlier removal, filtering, scaling, and more, the algorithm demonstrates strong performance across a wide range of metrics.Through comprehensive evaluation utilizing error metrics, uncertainty quantification metrics, and various illustrative visualizations and curves, the algorithm exhibits remarkable performance.Particularly noteworthy is its superiority over existing streamlined time series deep learning models, positioning it as a preferred choice for degradation analysis throughout the turbine lifecycle.Future opportunities in this domain will focus on refining and expanding uncertainty quantification approaches, aiming to further enhance the robustness and reliability of prognostic models in wind turbine health monitoring and maintenance.

Figure 1 .
Figure 1.Overview of methodology and contributions.

Figure 2 .
Figure 2. Real-world display of an inner race fault on the high-speed shaft: Following data collection, a bearing inspection revealed a cracked inner race.Reproduced from [24]: MDPI 2021.

Figure 3 .
Figure 3. Vibration raw data and processing stages: (a) Raw data; (b) Denoising of raw signals; (c) Extraction of variance from denoised vibration signals; (d) Extraction of envelopes from variance signals; (e) Outlier removal from the envelopes; (f) Linear filtering; (g) RUL labels.

Figure 4 .
Figure 4. Architecture of the proposed approach: (a) ELM network(s); (b) recurrent expansion of the ELM network(s).

Figure 8 .
Figure 8. Bayesian optimization result characteristics: (a) elapsed time versus number of evaluations; (b) RMSE versus number of evaluations; (c) UQ objective versus number of evaluations.

Run-to-failure experiment with high-frequency vibration measurement for early bearing inner race fault identification:
- 50-day observation period;
- Exponential data variation over time;
- Sampling rate: 97,656 Hz;
- Time interval per window: 6 seconds;
- Approximately 585,936 samples per file;
- 50 profiles stored separately.

Data preprocessing:
- Wavelet denoising (Beylkin, Daubechies, Symlets, etc.);
- Variance extraction with a sliding window of 300 samples;
- Envelope analysis with a sliding window of 200 samples;
- Outlier removal via median analysis, Grubbs' test, mean analysis, and quartile analysis;
- Trend analysis with linear regression filtering over a 9800-data-point window;
- RUL label generation as a linearly spaced array over 50 days.

ELM and REX integration:
- Extreme Learning Machine;
- Recurrent Expansion;
- PCA for dimensionality reduction;
- Uncertainty quantification via confidence-interval analysis of prediction residuals;
- Bayesian optimization for hyperparameter tuning with uncertainty quantification as the objective function.
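Two of the listed preprocessing steps, sliding-window variance extraction (300 samples) and envelope extraction (200 samples), can be sketched as follows; the rolling-maximum envelope is our stand-in, since the exact envelope method is not detailed in this excerpt:

```python
import numpy as np

def sliding_variance(x, win=300):
    """Variance of the vibration signal over a sliding window of 300 samples,
    as in the preprocessing stage (Figure 3c)."""
    x = np.asarray(x, float)
    return np.array([x[i:i + win].var() for i in range(x.size - win + 1)])

def upper_envelope(x, win=200):
    """Rolling-maximum envelope over a 200-sample window (Figure 3d)."""
    x = np.asarray(x, float)
    return np.array([x[i:i + win].max() for i in range(x.size - win + 1)])
```

Applied in sequence (variance, then envelope), these transforms turn the raw high-frequency vibration stream into the smoother degradation trend that the RUL labels track.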

Table 1 .
Error evaluation metrics for training and testing and improvement ratio of UBO-EREX.

Table 3 .
List of tuned hyperparameters.