A Tutorial for Feature Engineering in the Prognostics and Health Management of Gears and Bearings

: Gears and bearings are one of the major components of many machines, which can result in operation downtime or even catastrophic failure of a whole system. This paper addresses a tutorial for the features extraction and selection of the gears and bearings, which is known as feature engineering, a prerequisite step for the prognostics and health management (PHM) of these components. While there have been many new developments in this ﬁeld, no studies have addressed the tutorial aspects of features engineering to aid engineers in solving problems by their own e ﬀ ort, which is of practical importance for successful PHM. The paper aims at helping beginners learn the basic concepts, and implement the algorithms using the public datasets as well as those made by the authors. Matlab codes are provided for them to implement the process by their own hands


Introduction
Prognostics and health management (PHM) is an engineering discipline that identifies fault severity and predicts the remaining useful life (RUL) of the target system.PHM is the enabling technology towards condition-based maintenance (CBM), which is the future maintenance strategy as opposed to corrective maintenance (CM) and periodic maintenance (PM).Numerous books have been published recently addressing various aspects such as signal processing [1], data driven diagnostics [2,3], prognostics [4], and practical applications [5,6].In general, PHM consists of three steps: (1) data acquisition and features extraction, (2) fault diagnosis, and (3) failure prognosis [7][8][9].As PHM is performed from signals that are obtained from sensors such as vibration, acoustic emissions, or oil debris, it is crucial to remove undesired noise and extract the valuable information called features from the raw signals in the first step.Based on the extracted features, the fault mode and its severity are identified through the diagnosis, and RUL is predicted using the prognosis algorithm.While there are many useful features for this purpose today, good features vary depending on PHM steps and applications.For example, in the diagnosis, features should show a clear difference between the normal and fault states to distinguish the health condition.In the prognosis, on the other hand, features showing a monotonic trend over time are considered good indicators for RUL prediction.In this context, the process to obtain such good features is also called "feature engineering".
In rotating machinery, gears and bearings are important components because they transmit power while supporting the applied load in the system.Therefore, their unexpected failure and degradation during operation lead to economic loss and catastrophic accidents.In fact, the failure of these components accounts for a large proportion of whole system failures [10], which is the reason why many are focusing on bearing and gear PHM.In this context, the feature engineering of these

PHM Framework and Datasets for the Tutorial
The PHM framework can be divided into several steps as shown in Figure 1.After the data acquisition, the signal processing step is performed which aims to remove noise from the raw signal to obtain only the necessary information for the diagnosis and prognosis.Discrete signal separation techniques remove the periodic signals that are not related to the fault such as the ones transmitted from nearby equipment.Residual fault signals are then enhanced by signal enhancement techniques.If necessary, the signals, whether in their raw form or after processing, can also be decomposed into various components to facilitate the exploration of fault information.Detailed information about the techniques is addressed in the previous tutorial study [13].Once the signal processing is done, the feature engineering process is conducted next, where the features that represent the fault or degradation of the system can be extracted, evaluated, and selected as shown in Figure 1.Finally, by exploiting these features, fault diagnosis is performed to classify the state of the target system or failure prognosis is performed to predict when the failure will occur in the future.Note that feature engineering requires substantial skills and expertise, which is the key to the success of the subsequent steps: diagnosis and prognosis.Recently, there is a new trend that takes advantage of deep learning-based neural network approaches to avoid the feature engineering Note that feature engineering requires substantial skills and expertise, which is the key to the success of the subsequent steps: diagnosis and prognosis.Recently, there is a new trend that takes advantage of deep learning-based neural network approaches to avoid the feature engineering process Appl.Sci.2020, 10, 5639 3 of 20 as illustrated in Figure 2 [11].The deep learning-based method utilizes the raw data directly in the diagnosis right after the signal processing, as opposed to the traditional data-driven method that underlies the feature engineering process.However, since the deep learning-based method is not mature yet and poses several challenges to be addressed in the future, this tutorial presents the traditional feature engineering method, aided by Matlab implementations.
Appl.Sci.2020, 10, x FOR PEER REVIEW 3 of 20 process as illustrated in Figure 2 [11].The deep learning-based method utilizes the raw data directly in the diagnosis right after the signal processing, as opposed to the traditional data-driven method that underlies the feature engineering process.However, since the deep learning-based method is not mature yet and poses several challenges to be addressed in the future, this tutorial presents the traditional feature engineering method, aided by Matlab implementations.In the PHM study of gears and bearings, the general method of data acquisition is to measure vibration by attaching an accelerometer to the housing of gears or bearings.Gathering the data is, however, too expensive, requiring a test rig, sensors, and time-consuming tests.To avoid this, several universities and research institutions have released their data to the public, which are summarized in reference [13].They are classified into diagnostic and prognostic data.Diagnostic data are obtained by running the components under normal and fault conditions, respectively.Prognostic data are obtained by running the components constantly from the normal to the failure state.Among them, the data used in this paper are presented in Table 1 and the details are as follows: HS (High speed) gear dataset [14] is the vibration data obtained for 6 s with a sampling frequency of 97,656 Hz from the 3 blades upwind V90 wind generator, which produces an output of 3 MW and operates at 30 Hz.The gear has 32 teeth and the gear mesh frequency (GMF) is 960 Hz.The data are collected in the normal and fault conditions, in which the natural faults are applied in the pinion gear.Among 17 files, 11 are fault and 6 are normal.The next dataset is KAUG (Korea Aerospace University Gear), obtained from the gearbox testbed made by the authors.The encoder signals are acquired from the testbed at a 10 kHz sampling rate.There are 31 data files, of which 10 are normal, 10 spall, and 11 crack.The KAUG datasets are given on the authors' homepage www.kau-sdol.com.The CWRU bearing dataset is provided by Case Western Reverse University [15].Two bearings are installed at the motor drive-and fan-end, which are operated under various motor loads and speeds.Among the many cases, this study considers the fault data at the inner and outer race of the drive-end with the hole diameter of 0.021 inches.The operating load is 3 HP, speed is 1730 rpm, and sampling frequency is 12 kHz.In the case of CWRU, a single dataset in each fault is divided into 10 segments to create 10 sets of data for the diagnostic study.The last dataset is the run-to-fail data for the prognostic study provided by the Center for Intelligent Maintenance Systems (IMS) at University of Cincinnati [16].The IMS bearing dataset has been collected from 4 bearings operating at 2000 rpm under 6000 lbs radial load applied to the shaft and bearing, with a sampling rate of 20,480 Hz.There are three datasets provided by the compressed file format.Each dataset has the damage announcement at the end of the experiment: Dataset 1 with inner race fault at bearing #3 and rolling element fault at bearing #4, dataset 2 with outer race fault at bearing #1, and dataset 3 with outer race fault at bearing #3.In the PHM study of gears and bearings, the general method of data acquisition is to measure vibration by attaching an accelerometer to the housing of gears or bearings.Gathering the data is, however, too expensive, requiring a test rig, sensors, and time-consuming tests.To avoid this, several universities and research institutions have released their data to the public, which are summarized in reference [13].They are classified into diagnostic and prognostic data.Diagnostic data are obtained by running the components under normal and fault conditions, respectively.Prognostic data are obtained by running the components constantly from the normal to the failure state.Among them, the data used in this paper are presented in Table 1 and the details are as follows: HS (High speed) gear dataset [14] is the vibration data obtained for 6 s with a sampling frequency of 97,656 Hz from the 3 blades upwind V90 wind generator, which produces an output of 3 MW and operates at 30 Hz.The gear has 32 teeth and the gear mesh frequency (GMF) is 960 Hz.The data are collected in the normal and fault conditions, in which the natural faults are applied in the pinion gear.Among 17 files, 11 are fault and 6 are normal.The next dataset is KAUG (Korea Aerospace University Gear), obtained from the gearbox testbed made by the authors.The encoder signals are acquired from the testbed at a 10 kHz sampling rate.There are 31 data files, of which 10 are normal, 10 spall, and 11 crack.The KAUG datasets are given on the authors' homepage www.kau-sdol.com.The CWRU bearing dataset is provided by Case Western Reverse University [15].Two bearings are installed at the motor drive-and fan-end, which are operated under various motor loads and speeds.Among the many cases, this study considers the fault data at the inner and outer race of the drive-end with the hole diameter of 0.021 inches.The operating load is 3 HP, speed is 1730 rpm, and sampling frequency is 12 kHz.In the case of CWRU, a single dataset in each fault is divided into 10 segments to create 10 sets of data for the diagnostic study.The last dataset is the run-to-fail data for the prognostic study provided by the Center for Intelligent Maintenance Systems (IMS) at University of Cincinnati [16].The IMS bearing dataset has been collected from 4 bearings operating at 2000 rpm under 6000 lbs radial load applied to the shaft and bearing, with a sampling rate of 20,480 Hz.There are three datasets provided by the compressed file format.Each dataset has the damage announcement at the end of the experiment: Dataset 1 with inner race fault at bearing #3 and rolling element fault at bearing #4, dataset 2 with outer race fault at bearing #1, and dataset 3 with outer race fault at bearing #3.

Feature Extraction
Feature engineering begins with feature extraction from data in its raw form or after going through noise removal by signal processing.According to Caesarendra et al. [17], the features are usually divided into three categories: time domain, frequency domain, and time-frequency domain.Among these, time-domain features are the simplest and most widely used, and are not just limited to gears and bearings.On the other hand, there are some unique features specially developed for gears and bearings, respectively.These are addressed in the following sections.

Features for Gears
In feature engineering for gears, several signal processing steps are usually taken and appropriate features are extracted from each step that enable fault identification from the normal.This is explained in Figure 3, which indicates that the raw data are processed step by step, and after each step, relevant features are extracted.The signal in the time domain and its Fourier transform in the frequency domain are also illustrated.Note that the features are divided into two groups: general features, which are the time domain features shown by the blue dotted box, and specific features for the gears as shown by the red dotted box.Note that the latter have been developed specifically for gears, for improved capability of fault diagnosis as found in many articles in the literature (e.g., [18][19][20][21]).[16] Prognosis Acceleration 3

Feature Extraction
Feature engineering begins with feature extraction from data in its raw form or after going through noise removal by signal processing.According to Caesarendra et al. [17], the features are usually divided into three categories: time domain, frequency domain, and time-frequency domain.Among these, time-domain features are the simplest and most widely used, and are not just limited to gears and bearings.On the other hand, there are some unique features specially developed for gears and bearings, respectively.These are addressed in the following sections.

Features for Gears
In feature engineering for gears, several signal processing steps are usually taken and appropriate features are extracted from each step that enable fault identification from the normal.This is explained in Figure 3, which indicates that the raw data are processed step by step, and after each step, relevant features are extracted.The signal in the time domain and its Fourier transform in the frequency domain are also illustrated.Note that the features are divided into two groups: general features, which are the time domain features shown by the blue dotted box, and specific features for the gears as shown by the red dotted box.Note that the latter have been developed specifically for gears, for improved capability of fault diagnosis as found in many articles in the literature (e.g., [18][19][20][21]).As shown in Figure 3, the raw signal includes the shaft and GMF frequencies (and their harmonics) as well as the noise existing at the low frequencies in the frequency spectrum.The first step is to remove these unnecessary signals, by the time synchronous averaging (TSA) and high pass filtering (HPF).TSA is to average out the signal over each revolution to remove random noise As shown in Figure 3, the raw signal includes the shaft and GMF frequencies (and their harmonics) as well as the noise existing at the low frequencies in the frequency spectrum.The first step is to remove these unnecessary signals, by the time synchronous averaging (TSA) and high pass filtering (HPF).TSA is to average out the signal over each revolution to remove random noise occurring during the rotation, which can be executed by the Matlab built-in function x = tsa(x, fs, tach, PulsePerRotation , ppr), where fs is the sampling frequency, tach is the tachometer signal, and ppr is PulsePerRotation.HPF is done to remove the low frequency components including the shaft and its harmonic frequencies, which are irrelevant to the gear fault.HPF first determines the filter parameters which are executed by the Matlab built-in function [b,a] = butter(ord, Wn, high ), where ord is the filter order, and Wn = sf/(fs/2) is the Nyquist frequency, which is the shaft frequency sf divided by fs/2.The filtering is then performed by x = filter(b, a, x).In this study, the filter order is given by 1.As the order increases, the filtering becomes sharp, which improves the passing performance, but decreases group delay characteristics, causing phase distortion.Hence, the filter order should be properly selected.In this paper, filter order is given by 1 among the values 1 to 3 often used in the filtering.
After this step, 11 time domain features are extracted, which are explained in Table 2. Recall that the time domain features can be extracted either from the raw signal directly or after taking this step depending on the problem.The Matlab function to extract the time domain features is given in Appendix A as [feature, feature_name] = TimeFeatures(x), where feature and feature_name are the array of feature values and their names, respectively.Table 2. General time domain features [17,22].

Abbreviation
Full Name Brief Explanation Formula Matlab Functions

MEAN
Mean Average

RMS
Root mean square Value that generally tends to get bigger as the degree of fault in the bearing increases  2. Note that each set of feature data is normalized by mean and standard deviation, respectively.In the result, most of the features classify the fault (x) from the normal (o) well, but some (SK and SF) do not, which means that further processing is necessary to select only the useful features.
At this step, the gear specific features are extracted as shown in Figure 3, which are the figure of merits zero (FM0) and sideband energy ratio (SER).FM0 serves to detect changes in gear engagement patterns as an indicator of the gear's main fault: where PP x is the difference between the maximum and minimum of the time domain signal, P i is the amplitude of the ith harmonic frequencies of GMF, and N har is the number of harmonic frequencies.
The GMF is the frequency caused by the engagement of gear teeth, given by the product of shaft frequency and number of teeth.In the frequency domain of gear signal, sidebands occur at both sides of the GMF and its harmonics with the interval of shaft frequency.SER is the ratio of the sum of sideband frequency amplitudes to that of the first harmonic of GMF: where P 1 is the amplitude at the first harmonic of GMF, N sb is the number of sidebands, which is usually 3 [23], and S + i and S − i denote the amplitudes of the ith sideband at the first harmonic of GMF.The next step is to obtain the residual signal by removing the components of GMF and their harmonics, which are those not related with the fault.From the residual signal, the feature NA4 is obtained, which is to detect progress of defects in gears.NA4 is obtained by dividing the fourth statistical moment of the residual signal (res) by the averaged variance of the residual signal over the last M revolutions, raised to the second power: where N is the number of data points in one revolution, and res is the mean of res.NB4 is similar to NA4 except that instead of the residual signal, NB4 uses the envelope of band-pass filtered (BPF) signal centered at the GMF.The BPF is to leave the signals in the band while removing those outside.
The feature was devised from the idea that a few damaged gear teeth will cause transient load fluctuations different from the normal fluctuations, which can be identified by the envelope (s) of BPF signal.NB4 is given by Regarding the envelope, more detail is given in the next section for the bearing features extraction.
Appl.Sci.2020, 10, x FOR PEER REVIEW 5 of 20 occurring during the rotation, which can be executed by the Matlab built-in function x = tsa(x, fs, tach, ′PulsePerRotation′, ppr), where fs is the sampling frequency, tach is the tachometer signal, and ppr is PulsePerRotation.HPF is done to remove the low frequency components including the shaft and its harmonic frequencies, which are irrelevant to the gear fault.HPF first determines the filter parameters which are executed by the Matlab built-in function [b,a] = butter(ord, Wn, ′high′), where ord is the filter order, and Wn = sf/(fs/2) is the Nyquist frequency, which is the shaft frequency sf divided by fs/2.The filtering is then performed by x = filter(b, a, x).In this study, the filter order is given by 1.As the order increases, the filtering becomes sharp, which improves the passing performance, but decreases group delay characteristics, causing phase distortion.Hence, the filter order should be properly selected.In this paper, filter order is given by 1 among the values 1 to 3 often used in the filtering.After this step, 11 time domain features are extracted, which are explained in Table 2. Recall that the time domain features can be extracted either from the raw signal directly or after taking this step depending on the problem.The Matlab function to extract the time domain features is given in Appendix A as [feature, feature_name] = TimeFeatures(x), where feature and feature_name are the array of feature values and their names, respectively.
Figure 4 is the result of extracted time domain features for the example of HS gear dataset: 6 normal and 11 fault data, in which the acronyms are found in Table 2.Note that each set of feature data is normalized by mean and standard deviation, respectively.In the result, most of the features classify the fault (x) from the normal (o) well, but some (SK and SF) do not, which means that further processing is necessary to select only the useful features.The third step is to obtain the difference signal by further removing the sideband frequencies of the GMF from the residual signal.From the difference signal, FM4, M6A, M8A, and energy ratio (ER) are extracted.FM4 is the feature to detect the pattern changes resulting from the damage on a few gear teeth.FM4 is defined as the kurtosis of the difference signal (di f f ): This indicates that if di f f is from the normal gear, it will follow Gaussian noise so that the FM4 should be 3, whereas it will be greater than 3 if defective.M6A and M8A were developed to detect surface damage on machinery components.They are similar to FM4 except that M6A and M8A are more sensitive to peaks of the di f f signal with higher power with 6 and 8, respectively: ER was proposed to define the RMS of di f f signal divided by the amplitudes of the GMF and its harmonics with further addition by their respective sidebands: The Matlab functions to obtain the residual and difference signal are given by res_sig = res_gear(x,fs,gmf, cutoff,ord), and diff_sig = diff_gear (x,fs,gmf,sf,cutoff,ord), respectively.In the functions, cutoff is the bandwidth to filter out, and ord is the filter order.Within each function, the removal of the frequency component for the bandwidth [f 0 -cutoff, f 0 +cutoff] is carried out by the Matlab built-in function (b,a) = butter(ord,[f 0 -cutoff f 0 +cutoff], stop ), followed by x = filter(b, a, x).This is also called notch filtering.Note that the cutoff is imposed to account for the inaccurate GMF, which usually occurs in practice.In the HS gear dataset, gmf = 960 Hz, cutoff = 2, ord = 1 are used.The Matlab function to extract all the above-mentioned gear specific features is given by [feature, feature_name] = Gear_feat(tsa_sig, res_sig, diff_sig, gmf, sf, fs). Figure 5 represents the result of extracted features for the HS gear.As in Figure 4, some are good classifiers, while others such as SER and M6A are not.Methods to select useful classifiers will be explained in Section 4.

Features for Bearings
Figure 6 illustrates the typical features extraction process in the bearings PHM, of which the basic philosophy is the same: carry out the signal processing steps to remove unnecessary signal or noise.In general, the bearing signal consists of the discrete (predictable) part, which is irrelevant to

Features for Bearings
Figure 6 illustrates the typical features extraction process in the bearings PHM, of which the basic philosophy is the same: carry out the signal processing steps to remove unnecessary signal or noise.In general, the bearing signal consists of the discrete (predictable) part, which is irrelevant to the fault since it is from the other components such as the shaft or gears, and the remaining (unpredictable) part.With this in mind, the first step is to remove the discrete signal by using the autoregressive (AR) filter, which is to obtain the discrete part of the signal based on the past data for a certain period.Then the residual part of the signal, which may include the fault information, is obtained by subtracting this from the raw data.In the figure, this usually corresponds to the removal of the low frequency components in the frequency domain.The discrete signal is made by the AR model: where x is the raw signal, x p is the discrete (predicted) signal, n and k are the indices in time, a(k) and p are the parameters and order of the AR model, respectively.The residual signal is then obtained by The AR model is obtained by Matlab built-in functions a = aryule(x,p) followed by xp = filter([0 -a(2:end)],1,x).Note here that the order p should be assigned carefully since it affects the performance greatly: too high may include even the fault signal, too low may lose the periodicity in the prediction.In general, the order p is determined such that the kurtosis of the residual signal is maximized.General time domain features are extracted from the residual signal as depicted by the blue dotted box.After removing the discrete signal with the AR filter, the next step is the demodulation of the signal.Whenever the fault exists in the race or elements, the bearing produces impact (fault) signals with a certain period called bearing fault frequencies.However, it is usually amplitude-modulated by much higher resonant frequencies, as found in Figure 6, which means that the fault signal is buried by the resonance signals.In order to separate the fault signal from this, envelope analysis, also called the demodulation process [13], is carried out to extract the amplitudes modulated by the carrier (resonance) signal.To this end a Hilbert transform is conducted, to shift the phase by −90 degrees.Then the so-called analytic signal is defined by extending the real signal to the imaginary dimension as follows: where ( ) is the Hilbert transformed signal.The envelope signal is then obtained by calculating the After removing the discrete signal with the AR filter, the next step is the demodulation of the signal.Whenever the fault exists in the race or elements, the bearing produces impact (fault) signals with a certain period called bearing fault frequencies.However, it is usually amplitude-modulated by much higher resonant frequencies, as found in Figure 6, which means that the fault signal is buried by the resonance signals.In order to separate the fault signal from this, envelope analysis, also called the demodulation process [13], is carried out to extract the amplitudes modulated by the carrier (resonance) signal.To this end a Hilbert transform is conducted, to shift the phase by −90 degrees.Then the so-called analytic signal is defined by extending the real signal to the imaginary dimension as follows: x analytic (t) = x(t) + j x(t), (11) where x(t) is the Hilbert transformed signal.The envelope signal is then obtained by calculating the magnitude x analytic (t) by using the Matlab built-in function x = abs(Hilbert(x)).As a result, the signals of resonant (higher) frequencies are removed in the envelope signal, and only those of the bearing fault (lower) frequencies remain in the frequency domain as shown in the figure, from which the bearing-specific features can be extracted.
The bearing fault frequencies can be identified by the bearing geometry as shown in Figure 7. Bearings consists of inner, outer race and balls or rollers.If one of them includes the defect, the bearing produces signals at the specific frequencies while it rotates and passes through the defect.They are the ball pass frequency of outer race (BPFO), ball pass frequency of inner race (BPFI) and ball spin frequency (BSF), which are defined as follows [24]: where D, B D , N B and α are the bearing diameter, ball diameter, number of the balls, and the contact angle, and w is the rotating frequency of the shaft, respectively.By examining the amplitudes at these frequencies-BPFO, BPFI, and BSF, the fault can be identified in the frequency domain as shown in Figure 6.For example, Figure 8 represents the frequency spectrum after performing fast Fourier transform (FFT) for the envelope signal of CWRU dataset where the outer race was artificially damaged.
The peak is apparent at the f BPFO , proving the presence of fault at the outer race.The corresponding Matlab function to extract the fault frequency features is [feature, feature_name] = Bear_feat(x,fs,bff,cutoff) where bff is the vector of bearing fault frequencies given by Equations ( 12)-( 14), and cutoff is to account for the inaccuracy of these frequencies in the real bearing.After removing the discrete signal with the AR filter, the next step is the demodulation of the signal.Whenever the fault exists in the race or elements, the bearing produces impact (fault) signals with a certain period called bearing fault frequencies.However, it is usually amplitude-modulated by much higher resonant frequencies, as found in Figure 6, which means that the fault signal is buried by the resonance signals.In order to separate the fault signal from this, envelope analysis, also called the demodulation process [13], is carried out to extract the amplitudes modulated by the carrier (resonance) signal.To this end a Hilbert transform is conducted, to shift the phase by −90 degrees.Then the so-called analytic signal is defined by extending the real signal to the imaginary dimension as follows: where ( ) is the Hilbert transformed signal.The envelope signal is then obtained by calculating the magnitude ( ) by using the Matlab built-in function x = abs(Hilbert(x)).As a result, the signals of resonant (higher) frequencies are removed in the envelope signal, and only those of the bearing fault (lower) frequencies remain in the frequency domain as shown in the figure, from which the bearing-specific features can be extracted.The bearing fault frequencies can be identified by the bearing geometry as shown in Figure 7. Bearings consists of inner, outer race and balls or rollers.If one of them includes the defect, the bearing produces signals at the specific frequencies while it rotates and passes through the defect.They are the ball pass frequency of outer race (BPFO), ball pass frequency of inner race (BPFI) and ball spin frequency (BSF), which are defined as follows [24]:    8 represents the frequency spectrum after performing fast Fourier transform (FFT) for the envelope signal of CWRU dataset where the outer race was artificially damaged.The peak is apparent at the fBPFO, proving the presence of fault at the outer race.The corresponding Matlab function to extract the fault frequency features is [feature, feature_name] = Bear_feat(x,fs,bff,cutoff) where bff is the vector of bearing fault frequencies given by Equations ( 12)-( 14), and cutoff is to account for the inaccuracy of these frequencies in the real bearing.Figure 9 shows the extracted time domain and bearing-specific features altogether for the CWRU bearing dataset: 10 normal, 10 outer race fault, and 10 inner race fault, with each feature normalized by their mean and SD, respectively.While the gear example handled two classes: only the normal and fault, this is a three-class classification: normal, outer and inner, which is more complex for purposes of distinction.Nevertheless, the visual inspection of the results indicates that the RMS, STD, and SF are good classifiers whereas the others are not.

Feature Evaluation and Selection for Diagnosis
Feature evaluation is to assess how well the features can classify between the normal and faults.Once this is done, a few numbers of the most significant features can be selected for the efficient fault Figure 9 shows the extracted time domain and bearing-specific features altogether for the CWRU bearing dataset: 10 normal, 10 outer race fault, and 10 inner race fault, with each feature normalized by their mean and SD, respectively.While the gear example handled two classes: only the normal and fault, this is a three-class classification: normal, outer and inner, which is more complex for purposes of distinction.Nevertheless, the visual inspection of the results indicates that the RMS, STD, and SF are good classifiers whereas the others are not.Figure 6.For example, Figure 8 represents the frequency spectrum after performing fast Fourier transform (FFT) for the envelope signal of CWRU dataset where the outer race was artificially damaged.The peak is apparent at the fBPFO, proving the presence of fault at the outer race.The corresponding Matlab function to extract the fault frequency features is [feature, feature_name] = Bear_feat(x,fs,bff,cutoff) where bff is the vector of bearing fault frequencies given by Equations ( 12)-( 14), and cutoff is to account for the inaccuracy of these frequencies in the real bearing.Figure 9 shows the extracted time domain and bearing-specific features altogether for the CWRU bearing dataset: 10 normal, 10 outer race fault, and 10 inner race fault, with each feature normalized by their mean and SD, respectively.While the gear example handled two classes: only the normal and fault, this is a three-class classification: normal, outer and inner, which is more complex for purposes of distinction.Nevertheless, the visual inspection of the results indicates that the RMS, STD, and SF are good classifiers whereas the others are not.

Feature Evaluation and Selection for Diagnosis
Feature evaluation is to assess how well the features can classify between the normal and faults.Once this is done, a few numbers of the most significant features can be selected for the efficient fault

Feature Evaluation and Selection for Diagnosis
Feature evaluation is to assess how well the features can classify between the normal and faults.Once this is done, a few numbers of the most significant features can be selected for the efficient fault classification.Although there are many feature evaluation methods for the diagnosis such as Kullback Leibler (KL) divergence, Bhattacharyya distance, features selection by adjusted rand index and standard deviation ratio (FSASR), and support margin local Fisher discriminant ratio (SM-LFDA) [25,26], two of the most popular metrics are introduced in this study to evaluate the performance: Fisher's discriminant ratio (FDR) and J 3 , which share the same mathematical background where the former is for the two classes, whereas the latter is for the higher dimensions.Since the HS gear has two classes, it is evaluated by the FDR.The KAUG and CWRU have three classes and are evaluated by the J 3 .

Fisher's Discriminant Ratio
Suppose that the two classes are Gaussian distributed with the mean and standard deviation being µ 1 , µ 2 , and σ 1 , σ 2 respectively.Then, the FDR is defined as follows where the numerator (µ 1 − µ 2 ) 2 indicates how far the distance is between the centers of two classes, and the denominator σ 2 1 + σ 2 2 indicates how large the dispersion of the two classes is.Table 3 shows the top 5 and bottom 5 FDR values for the total 19 features of HS gear obtained in Figures 4 and 5. Figure 10 shows the probability density functions (PDF) of the MEAN and SK for the two classes, which are those with the highest and lowest FDR values, respectively.Obviously, the MEAN distinguishes the fault from the normal well, whereas the SK does not.These Gaussian PDFs have been obtained by using the mean and standard deviation of the normalized features.Fisher's discriminant ratio (FDR) and , which share the same mathematical background where the former is for the two classes, whereas the latter is for the higher dimensions.Since the HS gear has two classes, it is evaluated by the FDR.The KAUG and CWRU have three classes and are evaluated by the .

Fisher's Discriminant Ratio
Suppose that the two classes are Gaussian distributed with the mean and standard deviation being , , and , respectively.Then, the FDR is defined as follows [25] where the numerator ( − ) indicates how far the distance is between the centers of two classes, and the denominator + indicates how large the dispersion of the two classes is.Table 3 shows the top 5 and bottom 5 FDR values for the total 19 features of HS gear obtained in Figures 4 and 5.
Figure 10 shows the probability density functions (PDF) of the MEAN and SK for the two classes, which are those with the highest and lowest FDR values, respectively.Obviously, the MEAN distinguishes the fault from the normal well, whereas the SK does not.These Gaussian PDFs have been obtained by using the mean and standard deviation of the normalized features.

Scatter Matrices
In the more than two classes problem, the following process should be considered.Denoting the number of classes as M and the prior probability of class as ≅ / where ni is the number of

Scatter Matrices
In the more than two classes problem, the following process should be considered.Denoting the number of classes as M and the prior probability of class as P i n i /N where n i is the number of samples in the ith class and N is the number of total samples, the Within-class scatter matrix, which represents the degree of dispersion within each class, is defined as follows: where S i is the covariance of the feature vector x with the mean µ i at the ith class.The Between-class scatter matrix, which represents the distance of the mean of each individual class from the global mean, is then defined as The mixture scatter matrix is then defined as With these matrices, a new criterion for feature evaluation is defined as follows [25]: Similar to the FDR, the smaller the S w , namely the smaller dispersion within the class, and the larger the S m , meaning the larger distance between the different classes, the larger the J 3 value we obtain, which means a better classification between the classes.The corresponding Matlab function is in Appendix A as J 3 = ScattMat(data,label) where data are the feature vectors obtained in Section 4 and label is the fault mode.In this study, J 3 is calculated to select the two features out of the total features as an illustration.It is applied to the two examples: KAUG and CWRU, which are the three class problems as shown in Table 1.Note in the KAUG problem that the encoder signals are those preprocessed by the TSA.Hence, the 11 time domain features are extracted after applying the HPF only to the signal.Then the J 3 are calculated for any two features combinations from 11 features which amount to 55 cases.The features for top 5 and bottom 5 results are listed in Table 4.The scatter plot of the best (PEAK & CL) and worst (STD & SK) features are also given in Figure 11a,b, respectively.The differences are outstanding: the features with larger J 3 classify the faults much better.In the CWRU problem, the 14 features as shown in Figure 9 are used to find out the two best features based on the J 3 .The top 5 and bottom 5 results are given in Table 5.The scatter plots of the two features with best (SF & P2P) and worst (SK & IF) are also given in Figure 12a,b.The same observations are found in this case as well.

Feature Evaluation and Selection for Prognosis
This section addresses how to evaluate the predictability performance of the features and select useful ones for the purpose of prognosis.Note that the principle for this is very different from those of diagnosis as the words suggest.The procedure is described via the IMS bearing datasets mentioned in Section 3. Bear in mind that the IMS bearing datasets contain the vibration signals for four bearings, in which dataset 1 contains those of the both (horizontal and vertical) directions for each bearing, whereas datasets 2 and 3 contain only one direction.Among these, the horizontal signal for the bearing #3 of dataset 1 is used for the demonstration, where the inner race is damaged during the experiment.As opposed to the diagnosis where the data are acquired for each discrete class such as the normal, inner race fault and so on, the data in the prognosis are acquired with a certain interval over time for the purpose of monitoring the degradation of the fault.In the IMS bearing, the duration for a single measurement is one second, which are stored in an individual file.They are done by every

Feature Evaluation and Selection for Prognosis
This section addresses how to evaluate the predictability performance of the features and select useful ones for the purpose of prognosis.Note that the principle for this is very different from those of diagnosis as the words suggest.The procedure is described via the IMS bearing datasets mentioned in Section 3. Bear in mind that the IMS bearing datasets contain the vibration signals for four bearings, in which dataset 1 contains those of the both (horizontal and vertical) directions for each bearing, whereas datasets 2 and 3 contain only one direction.Among these, the horizontal signal for the bearing #3 of dataset 1 is used for the demonstration, where the inner race is damaged during the experiment.As opposed to the diagnosis where the data are acquired for each discrete class such as the normal, inner race fault and so on, the data in the prognosis are acquired with a certain interval over time for the purpose of monitoring the degradation of the fault.In the IMS bearing, the duration for a single measurement is one second, which are stored in an individual file.They are done by every 5 min for the first 43 numbers, which is followed by every 10 min until failure.In the prognosis, construction of a suitable health index (HI) is necessary, which represents the current health state and enables its monitoring against the failure.The HI can be constructed using the diverse features extracted in the previous section.According to the literature, a good HI is characterized by three metrics: correlation, monotonicity, and robustness in terms of time (or cycles).Higher values of the three metrics give better performance in the prognosis.Therefore, the criterion for feature selection is given by the average of the three metrics in this study.
The first metric for the prognosis is the correlation, which represents the linearity between the features and time: where T is time, X is the feature, and σ is the standard deviation over the period.The value toward 1 or −1 means that it has near the perfect linear relationship.The Matlab built-in function for this is Rho = corr(X,Y).Note that this is also called the Pearson correlation.The second metric is the monotonicity, which evaluates the degree of continuous increase or decrease of the feature over time.
It is also called Spearman correlation, which is obtained by replacing the variables of the Pearson correlation by its rank variable that represents the standing of the variable in the increasing order: Mon(T, X) = cov(rg T , rg X ) where, rg is the rank of the variables, and σ is its standard deviation.The Matlab built-in function is Rho = corr(X,Y, Spearman ).Monotonicity also has a value between −1 and 1, of which the absolute value near 1 means that the feature is good for the prognosis.The third metric is the robustness, which has to do with the measurement noise arising in the data acquisition.Because the larger noise can cause poorer performance in the prognosis, selecting features robust to this noise is important.The robustness for this objective is defined as [8] Rob where X is the smoothed value of X in terms of time.Smoothing is generally conducted by the moving average, which is to average out the current value by the finite number of recent data.It can be obtained by the Matlab built-in function xt = smooth(x).Robustness can be easily calculated by substituting the smoothed data xt in Equation ( 22).The three metrics are calculated and averaged for each of the 11 time domain-and 3 bearingspecific features for dataset 1, IMS bearing #3.The results are given in Figure 13a, in which the P2P shows the highest value with 0.749, whereas the SK is the lowest with 0.182.While there are further issues of how to construct a single HI out of these features, this paper ends by representing the trend of the two features over time in Figure 13b,c, in which the P2P shows the distinct increase at around 1800 cycles whereas the SK does not show any trend at all.From this result, it can be concluded that exploiting the metric values in constructing the HI is important for good prognostic performance.

Conclusions
While feature engineering is the crucial and practical step for successful PHM, no paper has addressed the basic concepts along with providing the codes for engineers to implement by themselves.The authors have published other tutorial papers with this objective for signal processing [13], which is a very first step, and the particle filter [4], which is the last step of the PHM, respectively.This is another effort towards this end, regarding the feature engineering of the and bearings,

Figure 3 .
Figure 3. Features extraction process for gears.

Figure 3 .
Figure 3. Features extraction process for gears.

Figure 4
Figure 4  is the result of extracted time domain features for the example of HS gear dataset: 6 normal and 11 fault data, in which the acronyms are found in Table2.Note that each set of feature data is normalized by mean and standard deviation, respectively.In the result, most of the features classify the fault (x) from the normal (o) well, but some (SK and SF) do not, which means that further processing is necessary to select only the useful features.At this step, the gear specific features are extracted as shown in Figure3, which are the figure of merits zero (FM0) and sideband energy ratio (SER).FM0 serves to detect changes in gear engagement patterns as an indicator of the gear's main fault:

Figure 4 .
Figure 4. Normalized time domain features of high speed (HS) gears.Figure 4. Normalized time domain features of high speed (HS) gears.

Figure 4 .
Figure 4. Normalized time domain features of high speed (HS) gears.Figure 4. Normalized time domain features of high speed (HS) gears.

20 Figure 5 .
Figure 5. Normalized gear specific features of HS gear.

Figure 5 .
Figure 5. Normalized gear specific features of HS gear.

Figure 6 .
Figure 6.Features extraction process for bearing.

Figure 6 .
Figure 6.Features extraction process for bearing.

Figure 6 .
Figure 6.Features extraction process for bearing.

Figure 6 .
Figure 6.For example, Figure8represents the frequency spectrum after performing fast Fourier transform (FFT) for the envelope signal of CWRU dataset where the outer race was artificially damaged.The peak is apparent at the fBPFO, proving the presence of fault at the outer race.The corresponding Matlab function to extract the fault frequency features is [feature, feature_name] = Bear_feat(x,fs,bff,cutoff) where bff is the vector of bearing fault frequencies given by Equations (12)-(14), and cutoff is to account for the inaccuracy of these frequencies in the real bearing.

Figure 8 .
Figure 8. Frequency domain data of CWRU outer race fault bearing.

Figure 9 .
Figure 9.General time domain features and bearing specific features of CWRU bearing.

Figure 8 .
Figure 8. Frequency domain data of CWRU outer race fault bearing.

Figure 8 .
Figure 8. Frequency domain data of CWRU outer race fault bearing.

Figure 9 .
Figure 9.General time domain features and bearing specific features of CWRU bearing.

Figure 9 .
Figure 9.General time domain features and bearing specific features of CWRU bearing.

Figure 10 .
Figure 10.PDF of (a) MEAN, and (b) SK, representing the max and min FDR, respectively, for HS gear.

Figure 10 .
Figure 10.PDF of (a) MEAN, and (b) SK, representing the max and min FDR, respectively, for HS gear.

Figure 11 .
Figure 11.Scatter plot of two features with (a) max and (b) min for KAUG.

3Figure 11 .
Figure 11.Scatter plot of two features with (a) max J 3 and (b) min J 3 for KAUG.

Figure 11 .Figure 12 .
Figure 11.Scatter plot of two features with (a) max and (b) min for KAUG.

Figure 12 .
Figure 12.Scatter plot of two features with (a) max J 3 and (b) min J 3 for CWRU bearing.

Figure 13 .
Figure 13.Feature evaluation of IMS bearing dataset 1 bearing #3, (a) calculated feature selection criteria, (b) best feature for prognosis, (c) worst feature for prognosis.

Figure 13 .
Figure 13.Feature evaluation of IMS bearing dataset 1 bearing #3, (a) calculated feature selection criteria, (b) best feature for prognosis, (c) worst feature for prognosis.

Table 3 .
Rank of FDR value for HS gear.

Table 3 .
Rank of FDR value for HS gear.

Table 4 .
Rank of J 3 value for KAUG.

Table 5 .
Rank of J 3 value for CWRU.

Table 5 .
Rank of value for CWRU.