Data Science for Vibration Heteroscedasticity and Predictive Maintenance of Rotary Bearings

Abstract: Electric motors are widely used in applications such as cars, household appliances, and industrial equipment. Costly failures can be avoided by establishing predictive maintenance (PdM) policies or mechanisms for the repair or replacement of the components in electric motors. Bearings are one of the key components in motors, and it is critical to measure their key features to support maintenance decisions. This paper proposes a data science approach with embedded statistical data mining and a machine learning algorithm to predict the remaining useful life (RUL) of the bearings in a motor. The vibration signals of the bearings are collected from the experimental platform, and fault detection devices are developed to extract the important features of bearings in the time domain and the frequency domain. Regression-based models are developed to predict the RUL, and weighted least squares regression (WLS) and feasible generalized least squares regression (FGLS) are used to address the heteroscedasticity problem in the vibration dataset. Support vector regression (SVR) is also applied for prediction benchmarking. Case studies show that the proposed data science approach handled large datasets with ease and predicted the RUL of the bearings with accuracy. The features extracted from the time domain are more significant than those extracted from the frequency domain, and they benefit engineering knowledge. According to the RUL results, the PdM policy is developed for component replacement at the right moment to avoid catastrophic equipment failure.


Introduction
Electric motors, which convert electrical energy into mechanical energy, are used in pumps, compressors, gear sets, drive belts, machine tools, etc. The electricity consumption of electric motors accounts for over twenty-five percent of global electricity use. Generally, they have about 20 years of useful life under normal usage and maintenance [1]. Over time, however, heat, vibration, torque, load, power supply, wear and pollution shorten their lifecycles in the manufacturing process. Predictive maintenance (PdM), which monitors machinery and operations continuously, alerts management to an impending failure. Generally, motor vibration analysis and current signature analysis [2] are conducted. However, the complicated manufacturing environment causes a higher false alarm rate because of disturbance from the initial irregular signal and outside noise during detection. To address the motor monitoring and maintenance problem, the literature introduces "diagnostics" methods and "prognostics" methods [3,4]. The former identify the root cause when failures are present in equipment, while the latter predict potential failures so that parts can be replaced before the failure happens. Prevention is better than cure, and prognostics and health management (PHM) has therefore attracted considerable attention recently, as evidenced by the literature. In particular, condition-based maintenance (CBM), which is usually applied in practice [3], consists of data collection, data processing, and maintenance decision-making [4]. In the data collection phase, a sensor collects actionable condition information from the machine or parts being monitored. The data processing phase consists of data preprocessing, feature extraction, and feature selection. This phase identifies the key parameters/variables which significantly affect the machine performance, and then develops the prediction model for prognostics analysis [5]. The maintenance decision-making phase decides whether to replace or repair the machine [6].
Studies of remaining useful life (RUL) predictions use signal data in different ways [7,8]. In [3], the features or other information in the frequency domain, time domain, or time-frequency domain have been extracted to represent the health status of the components inside the equipment for RUL prediction. In [3], different mathematical models or artificial intelligence (AI) algorithms have been developed for different conditions. The taxonomies of the physical, statistical, and machine learning methods [9,10] are summarized in Figure 1.

Figure 1. Summary of RUL prediction methods
In [3], a comprehensive review and list of the critical components of PHM with corresponding models and algorithms are provided. Jardine et al. [4] and Heng et al. [11] have reviewed machine learning and statistical techniques of PHM. Keogh et al. [12] and Fu [13] have addressed large-scale signal datasets and discussed signal processing, segmentation methods, and time series data mining techniques for dimensionality reduction. To avoid ignoring the process characteristics and the variable interpretation, this study constructs statistical regression-based models and a support vector machine with the features extracted from the time domain and frequency domain. The proposed data science approach can handle large-scale raw data, identify key factors, and predict the RUL of the bearings installed in electric motors.
The remainder of this paper is organized as follows: Section 2 explains the fundamental methods and techniques. Section 3 proposes the data science approach, including the data collection, data preprocessing, feature extraction, feature selection and RUL prediction models. An empirical study and experiments are conducted to validate the proposed approach in Section 4. Section 5 concludes and suggests future research.

Fundamental Methods and Techniques
This section introduces the fundamental methods and techniques used in the study.

Sliding Window with Rolling and Tumbling Aggregates
Sliding window (SW) [14,15] is a method commonly used to extract important features (e.g., mean or extreme value) from a large-scale raw dataset (e.g., current or vibration) and to reduce the data to a workable volume. The idea of SW is similar to the moving average (MA). The following parameters of SW can be adjusted for different datasets such as vibration, current, etc.:
(1) Window size: a vector or list of offsets relative to the current time.
(2) Step size: the number of data points by which the window moves, rather than moving at every point.
(3) Selected statistic in the window: statistics such as mean, maximum, minimum, etc. can be calculated in the window.
In general, there are two common types of SW: tumbling aggregates and rolling aggregates. The former uses non-overlapping windows, i.e., the window size equals the step size. The latter uses overlapping windows, i.e., the window size is larger than the step size. Given a sequence $\{x_t\}$, the new sequence $\{SW_k\}$ can be derived by mean value:
$SW_k = \frac{1}{w} \sum_{i=t-w+1}^{t} x_i$ (1)
and by extreme value:
$SW_k = \max\{x_{t-w+1}, \ldots, x_t\}$ (2)
where $w$ denotes the window size, $m$ denotes the step size, $k$ denotes the index of the new sequence $\{SW_k\}$, and $k = 1 + \frac{t-w}{m}$, $\forall t \geq w$ and $w \geq m$. SW can effectively reduce the large-scale signal dataset into a smaller volume (called the aggregated dataset hereafter). The aggregated dataset is used in the next process.
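The two window types can be illustrated with a minimal Python sketch. The function name `sliding_window`, the helper `mean`, and the toy signal are hypothetical and are not taken from the study; the windows are indexed from zero here, so window k covers the points k·m to k·m + w − 1.

```python
def sliding_window(x, w, m, stat=max):
    """Aggregate sequence x with window size w and step size m (w >= m)."""
    if not (w >= m >= 1):
        raise ValueError("require w >= m >= 1")
    # Window k (0-based) covers x[k*m : k*m + w]; incomplete tail windows are dropped.
    return [stat(x[start:start + w]) for start in range(0, len(x) - w + 1, m)]


def mean(values):
    return sum(values) / len(values)


# Toy signal (illustrative values only).
signal = [1, -3, 2, 5, -4, 0, 6, -1]
abs_signal = [abs(v) for v in signal]  # flip negative amplitudes to positive

# Tumbling aggregate: window size equals step size, so windows do not overlap.
tumbling_max = sliding_window(abs_signal, w=4, m=4, stat=max)

# Rolling aggregate: window size exceeds step size, so windows overlap.
rolling_mean = sliding_window(abs_signal, w=4, m=2, stat=mean)

print(tumbling_max)   # [5, 6]
print(rolling_mean)   # [2.75, 2.75, 2.75]
```

With w = m = 4 the eight points collapse to two aggregated values, which is the kind of volume reduction the aggregated dataset relies on.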

Stepwise Regression
Stepwise regression is a variable selection technique which evaluates the subsets of variables by either adding or deleting variables one at a time according to a specific statistic (e.g., F-value) in linear regression models [16].
There are three main types of stepwise selection methods: forward selection, backward selection, and both-sided selection. Forward selection begins with no variable in the regression model and adds one variable at a time until no remaining variable meets the entry criterion. Backward selection begins with all of the variables and removes one insignificant variable at a time until it reaches the stopping criterion. Both-sided selection integrates the forward and backward selections to obtain a balance between them [16,17].

Ordinary Least Squares (OLS) and Heteroscedasticity
OLS provides the best linear unbiased estimator (BLUE) of the coefficients, given three strong assumptions: (a) normality, (b) homoscedasticity, and (c) serial independence. The linear regression model has the following form:
$y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_p x_{pt} + \varepsilon_t$ (3)
where index $t$ represents time, the dataset consists of $n$ observations with $p$ independent variables, the response variable $y_t \in \mathbb{R}^{n \times 1}$, the independent variables $x_{pt} \in \mathbb{R}^{n \times (p+1)}$, the coefficients $\beta_p \in \mathbb{R}^{(p+1) \times 1}$ of the independent variables, and the error term $\varepsilon_t \in \mathbb{R}^{n \times 1}$.
The vibration signals violate the homoscedasticity assumption since the amplitude grows exponentially as the bearing wears over time. Theoretically, homoscedasticity can be defined for simple linear regression as:
$\mathrm{Var}(\varepsilon_t \mid x_t) = \sigma^2$ (4)
If heteroscedasticity exists, the variance of the errors is not identical with respect to $x_t$:
$\mathrm{Var}(\varepsilon_t \mid x_t) = \sigma_t^2$ (5)
In practice, the White test is used to detect the heteroscedasticity effect. In this study, weighted least squares (WLS) and feasible generalized least squares (FGLS) are used to correct heteroscedasticity.
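As an illustration of how the White test statistic can be obtained, the following NumPy sketch regresses the squared OLS residuals on the regressors, their squares and cross-products, and returns the Lagrange multiplier statistic n·R². The function name `white_lm_statistic` and the simulated data are assumptions made for this example, not the study's implementation; the statistic is to be compared with a chi-square quantile with `df` degrees of freedom, and the degrees of freedom assume that none of the auxiliary regressors are collinear.

```python
import numpy as np


def white_lm_statistic(X, y):
    """Return (LM, df) for White's test; X is (n, p) without an intercept column."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid_sq = (y - Z @ beta) ** 2

    # Auxiliary regressors: levels, squares and pairwise cross-products.
    cols = [X[:, i] for i in range(p)]
    cols += [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]
    A = np.column_stack([np.ones(n)] + cols)

    gamma, *_ = np.linalg.lstsq(A, resid_sq, rcond=None)
    fitted = A @ gamma
    ss_res = np.sum((resid_sq - fitted) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return n * r2, A.shape[1] - 1


# Simulated example: the error standard deviation grows with x (heteroscedastic).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=500)
y = 2.0 + 3.0 * x + x * rng.standard_normal(500)

lm, df = white_lm_statistic(x.reshape(-1, 1), y)
# Reject homoscedasticity at the 5% level if lm exceeds the chi-square
# critical value with df degrees of freedom (about 5.99 for df = 2).
print(lm, df)
```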

Weighted Least Squares (WLS)
WLS addresses heteroscedasticity by applying weights to the independent variables $x_t$ to adjust for the effect of the variance. In general, we can decide the form of the weights and add the pre-determined form to the linear equation. If there is heteroscedasticity, the error variance can be formulated as $\mathrm{Var}(\varepsilon_t \mid x_t) = \sigma^2 h_t$, where $h_t$ denotes the variance function that scales $\sigma^2$. In this way, the problem of heteroscedasticity can be accommodated by dividing the original regression function by the square root of $h_t$, and the variance finally becomes homoscedastic as in Equations (6) and (7):
$\frac{y_t}{\sqrt{h_t}} = \frac{\beta_0}{\sqrt{h_t}} + \beta_1 \frac{x_t}{\sqrt{h_t}} + \frac{\varepsilon_t}{\sqrt{h_t}}$ (6)
$\mathrm{Var}\left(\frac{\varepsilon_t}{\sqrt{h_t}} \,\middle|\, x_t\right) = \frac{1}{h_t} \mathrm{Var}(\varepsilon_t \mid x_t) = \sigma^2$ (7)
where $\sqrt{h_t}$ are the pre-determined weights used to correct the variance of the errors.
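The transformation in Equations (6) and (7) can be sketched in a few lines of NumPy: every row of the design matrix and the response is divided by the square root of the variance function, and ordinary least squares is run on the transformed data. The function name `wls`, the choice h_t = x_t², and the simulated data are assumptions for illustration only.

```python
import numpy as np


def wls(X, y, h):
    """WLS with known variance function values h_t > 0, Var(eps_t | x_t) = sigma^2 * h_t."""
    Z = np.column_stack([np.ones(len(y)), X])
    scale = 1.0 / np.sqrt(h)                     # divide each row by sqrt(h_t)
    beta, *_ = np.linalg.lstsq(Z * scale[:, None], y * scale, rcond=None)
    return beta                                   # [intercept, slope(s)]


# Simulated example in which the error standard deviation is proportional to x,
# so h_t = x_t^2 is the (assumed known) variance function.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=2000)
y = 1.0 + 2.0 * x + x * rng.standard_normal(2000)

beta_wls = wls(x, y, h=x ** 2)
print(beta_wls)  # close to the true values [1.0, 2.0]
```

Observations with a larger h_t are scaled down more strongly, which is how higher-variance points receive less influence on the fit.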

Feasible Generalized Least Squares (FGLS)
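In practice, the form of the variance function with heteroscedasticity is usually unknown, and the weights are difficult to define. FGLS estimates the best weights to address heteroscedasticity. The variance function $h_t$ is always positive, so the variance with heteroscedasticity can be formulated as:
$\mathrm{Var}(\varepsilon_t \mid x_t) = \sigma^2 h_t = \sigma^2 \exp(\delta_0 + \delta_1 x_{1t} + \cdots + \delta_p x_{pt})$ (8)
where the variance function is written as an exponential function to be positive, $\delta_0$ is the intercept term of the function, and $\delta_p$ is the coefficient of $x_{pt}$. The estimated linear regression is:
$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_{1t} + \cdots + \hat{\beta}_p x_{pt}, \quad \hat{\varepsilon}_t = y_t - \hat{y}_t$ (9)
The regression on $\hat{\varepsilon}_t^2$ is run to estimate $h_t(x)$, and the log function is used to remove the exponential term (the constant $\log \sigma^2$ is absorbed into the intercept) to obtain:
$\log \hat{\varepsilon}_t^2 = \delta_0 + \delta_1 x_{1t} + \cdots + \delta_p x_{pt} + e_t$ (10)
The $\log \hat{\varepsilon}_t^2$ term is set to be $g_t(x)$, and the regression is run to estimate $\hat{g}_t(x)$, i.e., the variance function, to obtain:
$\hat{g}_t = \hat{\delta}_0 + \hat{\delta}_1 x_{1t} + \cdots + \hat{\delta}_p x_{pt}$ (11)
Deriving $\exp(\hat{g}_t) = \exp(\log \hat{\varepsilon}_t^2) = \hat{\varepsilon}_t^2 = \hat{h}_t$ obtains:
$\frac{y_t}{\hat{h}_t^{1/2}} = \frac{\beta_0}{\hat{h}_t^{1/2}} + \beta_1 \frac{x_{1t}}{\hat{h}_t^{1/2}} + \cdots + \beta_p \frac{x_{pt}}{\hat{h}_t^{1/2}} + \frac{\varepsilon_t}{\hat{h}_t^{1/2}}$ (12)
where $\hat{h}_t^{1/2}$ are the estimated weights used to correct the variance of the errors. The estimated weights are then used in WLS; this two-stage procedure is FGLS, in which the coefficients are estimated twice. In this way, the heteroscedasticity problem is solved and the variance of the coefficients of the independent variables is effectively reduced [18].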
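The FGLS procedure described above can be sketched as four short steps in NumPy. The function names `ols` and `fgls`, the small offset added before taking the logarithm, and the simulated data are assumptions made for this illustration rather than details given in the study.

```python
import numpy as np


def ols(Z, y):
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta


def fgls(X, y):
    """Two-stage FGLS with an exponential variance function."""
    Z = np.column_stack([np.ones(len(y)), X])
    resid = y - Z @ ols(Z, y)                # step 1: OLS residuals
    g = np.log(resid ** 2 + 1e-12)           # step 2: log squared residuals
                                             # (the 1e-12 offset guards log(0); assumption)
    h_hat = np.exp(Z @ ols(Z, g))            # step 3: fitted variance function, always > 0
    scale = 1.0 / np.sqrt(h_hat)             # step 4: WLS with the estimated weights
    beta = ols(Z * scale[:, None], y * scale)
    return beta, h_hat


# Simulated example: Var(eps | x) = exp(x), which is unknown to the estimator.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 3.0, size=2000)
y = 1.0 + 2.0 * x + np.exp(0.5 * x) * rng.standard_normal(2000)

beta_fgls, h_hat = fgls(x, y)
print(beta_fgls)  # close to the true values [1.0, 2.0]
```

Because the weights are themselves estimated from the data, the coefficients are estimated twice: once by OLS to obtain the residuals and once by WLS with the fitted weights.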

Partial Least Squares (PLS) Regression
PLS, which is a regression-based approach based on principal component analysis (PCA), uses a set of latent or unobserved variables to identify the relation between the principal components (PCs) and the response variable [19]. A typical PCA method transforms the original variables into several orthogonal PCs for dimensionality reduction and eliminates the multi-collinearity problem. Instead of finding hyperplanes of maximum variance of the independent variables as in PCA, PLS extracts a small number of latent variables by projecting the independent variables and the response variable to a new space simultaneously. Thus, PLS retains the merits of PCA and, unlike OLS, is not limited to uncorrelated variables. PLS handles noise and collinearity, and there is no requirement of independence between observations. PLS can also model several response variables simultaneously [19].

Support Vector Regression (SVR)
SVR, which is a regression variant of the support vector machine (SVM), is commonly used for prediction rather than classification [20]. SVR maps the nonlinear function with independent variables into a high dimensional kernel feature space, and then constructs a linear hyperplane with the biggest margin in the kernel space. SVR uses a new type of loss function, i.e., the ε-insensitive loss function, and minimizes the norm of the normal vector of the hyperplane, which reduces the complexity of the prediction model. SVR is formulated as a quadratic program:
$\min_{w, b} \ \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_t - w^T x_t - b \leq \varepsilon, \quad w^T x_t + b - y_t \leq \varepsilon, \quad \forall t$ (13)
where a pair $(x_t, y_t)$ is an observation with independent variable $x_t$ and response variable $y_t$ at period $t$, $w$ is the normal vector of the hyperplane, and the error $\varepsilon \geq 0$ is a pre-determined parameter.
The goal is to find a hyperplane $w^T x_t + b$ with at most ε deviation between the targets $y_t$ and the predicted values [20]. Variants of SVR models with different parameters result in different tradeoffs between model complexity and degree of deviation, and the kernel function can be chosen to best represent the distribution of the dataset.
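The ε-insensitive loss mentioned above can be written out directly: deviations inside the ε tube cost nothing, and deviations outside it are penalized linearly by the amount they exceed ε. The function name and the toy numbers below are hypothetical and serve only to illustrate the definition.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Per-observation epsilon-insensitive loss: max(0, |y - f(x)| - eps)."""
    return [max(0.0, abs(a - b) - eps) for a, b in zip(y_true, y_pred)]


# Toy targets and predictions (illustrative values only).
targets = [1.0, 2.0, 3.0]
predictions = [1.05, 2.5, 1.0]

losses = eps_insensitive_loss(targets, predictions, eps=0.1)
# The first prediction lies inside the tube (|error| = 0.05 <= 0.1), so its loss is zero;
# the other two are penalized by how far they fall outside the tube.
print(losses)
```

A wider tube (larger ε) tolerates more deviation and tends to give a simpler model, which is the tradeoff between model complexity and degree of deviation noted in the text.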

Data Science
This section describes the proposed approach to extract the key features and predict the RUL of bearings. The approach includes the data collection, data preprocessing, feature extraction, feature selection and prediction models, as shown in Figures 2 and 3. The vibration data are collected from an operational real-world induction motor. Besides vibration signals, the proposed approach can be applied to other similar-type variables such as voltage, current, loading, speed, etc.

Experimental Platform and Data Collection
In an experimental rotary system, the facilities consist of a three-phase induction motor, tachometer, torsion meter, pump, coupling, magnetic powder brake and sensors. For RUL analysis, the motor runs and the data are collected until bearing failure through the electrical discharge fatigue destruction, which shortens the time-to-failure. The steps are as follows:

Step 1: Replace the bearings with the hydraulic oil inside to shorten their lifetimes.
Step 2: Use laser shaft alignment to adjust the platform for steadiness and accuracy.
Step 3: Open the motor power switch and the threshold switch and start the data-acquisition program.
Step 4: After the rotating motor reaches a steady state, check whether there is anything abnormal that would signal a wrong setup or a false positive.
Step 5: Open the self-coupling switch and discharge switch and adjust the current and voltage.
Step 6: Check the controlling program and then start the experiment.
Step 7: Collect data until the vibration signal reaches the pre-determined threshold of failure.
Step 8: Rerun the experiment several times (i.e., iterations) to generate more observations.

Data Preprocessing
The data collection process produces a large-scale dataset which is computationally burdensome. To reduce the volume of data while retaining the characteristics of the original dataset, the steps are as follows:

Step 1: Remove unrelated and redundant information such as serial number, sampling rate, etc.
Step 2: Merge the data files from the same experiment into a single table for processing.
Step 3: Use SW to reduce the size of the dataset and retain the main characteristics of the original dataset.
Step 4: Flip the negative amplitudes to positive and take the maximal value in each window to generate the new dataset (called aggregated data).
Note that, in a real factory setting, a machine has rarely just started running when data collection begins. The motors may have been working for a long while. To mimic the real-world condition of the system, a random interval sampling method is proposed that picks a time point and collects the data within a fixed time window during the lifetime of the system's induction motor; the sampled data are used for RUL analysis.

Feature Extraction
As mentioned, the data with a fixed time interval are randomly sampled. Each sample size (i.e., fixed interval) equals twenty seconds of data collection. Randomly sampling the intervals establishes the uncertain time periods of the rotary motor. The features are extracted from the time domain and the frequency domain. The time domain is split into a time-series dimension (i.e., the aggregated dataset in the time domain) and a change-point dimension (i.e., the time index of the extreme values of amplitude to create the change-point dimension by piecewise linear segmentation). The details are as follows:

Time-series Dimension
OLS is used to identify statistical features from the random interval sampling dataset. Let $s$ be the sample standard deviation and $n$ the sample size. There are six statistical features: (1) mean squared error [...] $^{4}/s^{4}$; (6) max = $\max\{|x_t|\}$ in the random interval.

Change-point Dimension
The time index of the change-point in the time-series dimension represents the value in the change-point dimension. An illustrative example is shown in Figure 4, where the sequence $\{c_j\}$ represents the time index of the change-point. Let $s_c$ be the sample standard deviation and $n_c$ the sample size with respect to the sequence $\{c_j\}$. There are four more features: (7) standard deviation ($s_c$) = [...]

Frequency Domain
The fast Fourier transform (FFT) is applied to each random interval to transform the amplitude (i.e., power) from the time domain into the frequency domain. In general, FFT obtains the spectrum (i.e., the power at each frequency revealed by the data points) and the corresponding frequencies. The power-frequency relation of a random interval is shown in Figure 5. Feature extraction uses the frequencies corresponding to the maximum amplitude and the second strongest amplitude as features. The first-to-fourth moments of the spectrum in each random time interval are the mean, variance, skewness and kurtosis. The steps of the feature extraction from a random time interval sampling are as follows:

Step 1: Set the size of the fixed time interval.
Step 2: Choose a start-point randomly and calculate the RUL of this sampling (RUL = failure time minus the right-side index of the window).
Step 3: Extract the features in time-series, change-point and frequency dimensions.
Step 4: Repeat Steps 1 to 3 several times according to the physical meaning.
Step 5: Create one large table containing the features and the RUL.
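The sampling and frequency-domain steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the synthetic two-tone signal, the 1000 Hz sampling rate and the one-second window are hypothetical and much smaller than the experimental setting, and only the two dominant-frequency features are extracted.

```python
import random

import numpy as np


def sample_interval(signal, window, failure_index, rng):
    """Pick a random window; RUL = failure time minus the right-side index of the window."""
    start = rng.randrange(0, failure_index - window + 1)
    end = start + window
    return signal[start:end], failure_index - end


def dominant_frequencies(segment, sampling_rate):
    """Frequencies of the strongest and second strongest spectral amplitudes."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sampling_rate)
    order = np.argsort(spectrum[1:])[::-1] + 1   # skip the zero-frequency (DC) bin
    return float(freqs[order[0]]), float(freqs[order[1]])


# Synthetic signal: a 30 Hz component plus a weaker 60 Hz component (illustrative only).
fs = 1000                                        # samples per second (assumption)
t = np.arange(5 * fs) / fs
signal = np.sin(2 * np.pi * 30 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)

rng = random.Random(0)
segment, rul = sample_interval(signal, window=fs, failure_index=len(signal), rng=rng)
f1, f2 = dominant_frequencies(segment, fs)
print(f1, f2, rul)  # f1 is 30 Hz and f2 is 60 Hz for this synthetic signal
```

Repeating the sampling step and stacking the resulting feature rows together with their RUL values would give the kind of feature table described in Step 5.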

Feature Selection
When the regression-based method is used to predict the RUL, a variance inflation factor (VIF) detects the multi-collinearity issue of the features before running a regression model [21]:
$VIF_i = \frac{1}{1 - R_i^2}$ (14)
where $R_i^2$ is the coefficient of determination of the regression model using the $i$th predictor $x_i$ as the response variable on the remaining predictors. If VIF exceeds 10, significant multi-collinearity exists [20], and features with the least information loss need to be removed. After correcting for multi-collinearity, stepwise regression is used to obtain the features that significantly influence the RUL (i.e., selected features hereafter) and to remove the less important ones.
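The VIF definition can be checked with a short NumPy sketch that regresses each feature on the remaining ones. The function name `vif` and the small hand-built feature matrix are assumptions for illustration; the third column is deliberately constructed to be almost identical to the first.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (shape n x p, no intercept column)."""
    n, p = X.shape
    factors = []
    for i in range(p):
        y = X[:, i]
        Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors


# Hand-built features: a and b are orthogonal, c is nearly a copy of a.
a = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
d = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])
c = a + 0.1 * d

vifs = vif(np.column_stack([a, b, c]))
print(vifs)  # a and c far exceed the threshold of 10, b stays near 1
```

In this example either a or c would be a candidate for removal before stepwise regression, while b carries independent information.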

Model Adjustment and Prediction
Based on the significant features obtained from random interval sampling, the prediction models are applied to identify unstable signals and predict the RUL of the motor's bearings and other components. The OLS model is applied first. The basic concept of OLS is to minimize the sum of squared errors, but the errors should follow assumptions such as normality, homoscedasticity and independence. If heteroscedasticity is detected by the White test, WLS is applied to eliminate the effect and correct the variances of the coefficients of the independent variables (i.e., selected features). If the weights are unknown, FGLS is used to find the proper weights. For comparison of prediction accuracy, we also suggest PLS and SVR. The procedure is as follows:

Step 1: Fit OLS regression on the dataset.
Step 2: Apply White test.
Step 3: If the hypothesis of homoscedasticity is rejected, go to Step 4; otherwise, stop and use OLS to predict the RUL.
Step 4: If the structure of the weights for the variance adjustment is known, add the weights to the original OLS model (WLS); otherwise, estimate the weights and add them to the original OLS model (FGLS).

Empirical Study and Experiments
An empirical study of the accelerated electrical discharge destruction experiment is conducted to validate the proposed approach. The data and signals are collected from four accelerometers, three-phase voltage, three-phase current, a rotary encoder, and a torque meter. The data collection interval is 10 min for each twenty-second time span. The rotation speed is 30 Hz (1800 rpm).

Data collection and data preprocessing
The experiment data are collected from the facilities and a LabVIEW program with a queued message handler (QMH) structure. The SW algorithm is used to reduce the raw data in the very large dataset (see Figure 6), while maintaining the characteristics of the raw data with sufficient information. The aggregated dataset is shown in Figure 7. Some of the findings and the data source (see Table 1) are as follows:
• The sensors recorded the data at 25,600 data-points per second.
• Redundant information was removed and the data files were compiled into one large data table.

Model Adjustment and Prediction
To investigate the heteroscedasticity issue, the White test is applied and the p-value is $1.899 \times 10^{-10}$ (<0.05), which rejects the homoscedasticity assumption. WLS or FGLS regression models are used to adjust for the influence of heteroscedasticity (non-constant variances) caused by the differences in the amplitude of the bearing signals. Higher variance terms are given lower weights to reduce their influence on the fitted value. When nothing is known about the variance of the residuals, the weights are estimated first and then added to the original OLS model, which is called FGLS regression.
First, FGLS is compared to OLS. We focus on the coefficient estimates and their interpretation. Second, to ensure the prediction models' robustness, the aggregated dataset is divided into ten sub-datasets and 10-fold cross validation (i.e., K = 10) is implemented. In each round, nine of the ten folds are used as the training dataset and the remaining fold as the testing dataset. The procedure repeats until every fold has been taken as the testing dataset.
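The 10-fold procedure can be sketched with NumPy as follows. The function names, the shuffling of observations before splitting, the use of root mean squared error as the score, and the toy data are assumptions for this illustration; the study does not state these details here.

```python
import numpy as np


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)


def cross_validate_ols(X, y, k=10):
    """Return the test RMSE of an OLS fit for each of the k folds."""
    folds = k_fold_indices(len(y), k)
    Z = np.column_stack([np.ones(len(y)), X])
    scores = []
    for i, test in enumerate(folds):
        # Train on the other k - 1 folds, test on fold i.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, *_ = np.linalg.lstsq(Z[train], y[train], rcond=None)
        err = y[test] - Z[test] @ beta
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores


# Toy data with an exact linear relation, so every fold predicts almost perfectly.
x_demo = np.arange(50.0)
y_demo = 1.0 + 2.0 * x_demo
scores = cross_validate_ols(x_demo, y_demo, k=10)
print(len(scores), max(scores))
```

The same loop could wrap the FGLS or SVR fit instead of OLS so that all models are compared on identical folds.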

OLS & FGLS
The coefficient estimates of the selected features by OLS and FGLS with respect to the first testing dataset (k = 1) are shown in Tables 6 and 7, respectively. FGLS adjusts and reduces the variance of the coefficients (the standard errors of the coefficients are lower than those of the OLS estimates), which implies a robust estimation. In particular, the FGLS model places lower weights on the higher-variance observations to reduce their effect, and thus the standard errors of the coefficient estimates are reduced. For the slope interpretation, using the feature "Max.ts" as an example, the FGLS coefficient estimate −40,658.3 implies that increasing the maximal value of amplitude in the sampling interval by one unit decreases the RUL by 40,658.3 points in the aggregated data, which is nearly 11.2 h of lifetime reduction. In practice, the regression-based approach provides a clear causal interpretation, and thus engineers can refer to the selected features and make the maintenance decision (i.e., PdM).
The dataset is split into training and testing sets, and the 10-fold cross validation (CV) of OLS and FGLS is conducted. The prediction results are shown in Tables 8 and 9. While the prediction performance of OLS appears "better" than that of FGLS, the presence of heteroscedasticity violates the underlying assumptions of OLS. The prediction results of OLS and FGLS with the real value and the fitted value are shown in Figures 8 and 9, respectively. The x-axis is the sample index ordered by time, and the y-axis represents the RUL prediction. The intervals between ordered indices on the x-axis may not be equal because of random interval sampling.
Generally, the RUL values predicted by FGLS are lower than those predicted by OLS. The underestimated RUL by FGLS implies the possibility of an early warning signal prior to actual machine fatigue or failure, and thus it may increase the maintenance cost for bearing replacement. In both figures, though it is still linear and unbiased (but no longer efficient), the OLS estimate is no longer a robust estimate and shows worsening performance in the later stage because of heteroscedasticity. FGLS corrects the irregular variances by estimating the parameters twice to obtain the proper weights for adjusting the variances, though its estimate is biased (but efficient). The results should be interpreted with care so that maintenance is not postponed.
In addition, both FGLS and OLS underestimate the RUL in the early stage of the data. In practice, poor prediction results are common at the early stages because machines operate erratically (we generally collect data when the motors have been working for a long while). Though FGLS underestimates the RUL more in the beginning, OLS overestimates it later, which delays the warning (called a type-II error) and may cause a catastrophic failure. The results indicate that FGLS provides a more conservative and secure decision, whereas OLS may overestimate the RUL, and postponing equipment maintenance may lead to equipment failure, serious capacity loss, or defects. Therefore, this study suggests applying FGLS because it incurs less loss than OLS.
Although FGLS appears to provide a better result, its prediction performance is not good enough, as shown in Table 9, which implies model misspecification and that the linear regression is not justified. To overcome the problem, PLS and SVR are suggested.

PLS & SVR
The prediction performance of SVR is shown in Figure 10 and the 10-fold CV is shown in Table 10. Unlike OLS and FGLS, SVR handles prediction bias in the early stage, thus giving a more accurate fitted result. An R-squared of approximately 0.82 demonstrates its accurate prediction of RUL. Engineers or staff can make the PdM decision to maintain and repair the components of the equipment according to the result. A comparison of the prediction performances of OLS, FGLS, and SVR is shown in Figure 11. Again, however, the results must be treated with care, because SVR maps the nonlinear function with features into a high dimensional kernel feature space and the interpretation is limited. SVR needs more computation time (in this case, more than 2 h) because it is difficult to find the best or proper combination of the parameters (platform: OS Windows 7, CPU Intel Core i7-4790 @ 3.60 GHz, RAM 8 GB). Note that the result of PLS for this case is omitted because the number of its components equals the number of selected features (i.e., PLS and OLS results are very similar).

Conclusions
This study proposed a data science approach, which improved the data preprocessing procedure, identified key features, and built several prediction models to predict the RUL. FGLS was developed to address heteroscedasticity. If the RUL is long, we keep operating the equipment; otherwise, we may replace or repair the parts before equipment failure. The numerical study showed that OLS and FGLS obtained prediction results quickly, whereas SVR provided more accurate prediction values although incurring a computational burden.
The contribution of this study suggests: (1) using a sliding window algorithm to handle large-scale datasets, (2) using random interval sampling to collect signal data at any time interval, (3) [...]

Table 5. Coefficients of the selected features.
Table 6. Coefficient estimation of OLS of testing dataset k = 1.
Table 7. Coefficient estimation of FGLS of testing dataset k = 1.