Diagnostics 101: A Tutorial for Fault Diagnostics of Rolling Element Bearing Using Envelope Analysis in MATLAB

: This paper presents a MATLAB-based tutorial to conduct fault diagnosis of a rolling element bearing. While there have been so many new developments in this ﬁeld, no studies have addressed the tutorial aspects in this ﬁeld to help the engineers learn the concept and implement by their own e ﬀ ort. The three most common techniques—the autoregressive model, spectral kurtosis, and envelope analysis—are selected to demonstrate the bearing diagnosis process. Simulation signal is introduced to help understand the characteristics of fault signal and carry out the process toward the fault identiﬁcation. The techniques are then applied to the two real datasets to demonstrate the practical applications, one made by the authors and the other by the Case Western Reserve University, which is known as a standard reference in testing the diagnostic algorithms


Introduction
Prognostics and Health Management (PHM), which enables life extension while preventing unexpected failures in various engineering systems, have received great attention over the last decades for more efficient operation and maintenance. Generally, there are three main steps in the PHM development: (1) signal processing, (2) diagnosis, and (3) prognosis. Signal processing is the step in which the valuable information called features are extracted from noisy sensor signals such as vibrations or acoustic emissions. Diagnosis focuses on detecting the faults, isolating their locations, and identifying their severity based on the extracted features. Lastly, the prognosis is to predict the Remaining Useful Life (RUL) based on the accumulated diagnostic information, which indicates how long it will take until the system's health reaches the failure threshold. Though the most important step is the prognosis, its performance is of no doubt dependent on the quality of prior steps: the signal processing and the diagnosis.
To date, the PHM has mainly focused on the vibration signal-based techniques for rotating mechanical components such as gears [1,2] and bearings [3][4][5][6][7], since they usually lead to the critical failures [1,8]. Among them, the rolling element bearings, which support the shaft during the rotation, have been of greatest importance in view of its widespread use and often cause of another failures.
where D, d, α, n and f s are the pitch diameter, ball diameter, contact angle between the ball and the cage, number of rolling elements, and the rotating speed of bearing (Hz), respectively. Usually, the impact signals at these fault frequencies are modulated by the much higher resonant frequencies of the bearings, which results in the signal like the one in Figure 1, the exponentially decaying ringing occurring periodically at the fault frequencies [16].
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 23 where , , , and are the pitch diameter, ball diameter, contact angle between the ball and the cage, number of rolling elements, and the rotating speed of bearing (Hz), respectively. Usually, the impact signals at these fault frequencies are modulated by the much higher resonant frequencies of the bearings, which results in the signal like the one in Figure 1, the exponentially decaying ringing occurring periodically at the fault frequencies [16]. This phenomenon is mathematically the same as the amplitude modulation (AM), of which the concept is illustrated by a simple example in Figure 2. The fault and the resonance signal of the bearings are assumed to be Figure 2a,b which are the modulating and carrier signal in the signal processing theory, respectively. Then, the AM signal becomes the carrier signal with variable amplitude given by the modulating signal as shown in Figure 2c. After taking the FFT, the frequency domains of modulating, carrier, and the AM signals are given in Figure 2e-g, respectively. The result shows that the AM spectrum has peaks at the resonance (carrier) frequency ( ) with the sidebands spaced apart by the fault (modulating) frequency ( ) [17]. The AM signal can be expressed by the following equation [18]: where f(t) is the fault signal (plus DC offset to ensure that the mean amplitude of AM signals is zero) with the amplitude M and frequency , r(t) is the resonance signal with the amplitude and frequency , and ( ) is the AM signal. Equation (5) can be transformed to which shows that the spectrum of the AM signals consists of three components , ± , with the amplitudes and · /2, respectively, as shown in Figure 2f. Therefore, it can be concluded that, when a bearing has fault, it appears as the sideband around the resonance signal. Then, the remaining task is to extract or separate the original fault signal Figure 2a or its FFT (e) from the AM signal Figure  2c or its FFT (g). This is realized by the envelope analysis, which is an amplitude demodulation technique as will be addressed in the subsequent section. This phenomenon is mathematically the same as the amplitude modulation (AM), of which the concept is illustrated by a simple example in Figure 2. The fault and the resonance signal of the bearings are assumed to be Figure 2a,b which are the modulating and carrier signal in the signal processing theory, respectively. Then, the AM signal becomes the carrier signal with variable amplitude given by the modulating signal as shown in Figure 2c. After taking the FFT, the frequency domains of modulating, carrier, and the AM signals are given in Figure 2e-g, respectively. The result shows that the AM spectrum has peaks at the resonance (carrier) frequency ( f r ) with the sidebands spaced apart by the fault (modulating) frequency ( f f ) [17]. The AM signal can be expressed by the following equation [18]: x(t) = [1 + f (t)] × r(t) = 1 + Msin 2π f f t Acos(2π f r t) (5) where f (t) is the fault signal (plus DC offset to ensure that the mean amplitude of AM signals is zero) with the amplitude M and frequency f f , r(t) is the resonance signal with the amplitude A and frequency f r , and x(t) is the AM signal. Equation (5) can be transformed to x(t) = Acos(2π f r t) Appl. Sci. 2020, 10, 7302 4 of 23 which shows that the spectrum of the AM signals consists of three components f r , f r ± f f , with the amplitudes A and A·M/2, respectively, as shown in Figure 2f. Therefore, it can be concluded that, when a bearing has fault, it appears as the sideband around the resonance signal. Then, the remaining task is to extract or separate the original fault signal Figure 2a or its FFT (e) from the AM signal Figure 2c or its FFT (g). This is realized by the envelope analysis, which is an amplitude demodulation technique as will be addressed in the subsequent section.

Simulation of Fault Signal
With this understanding, more complex signal closer to the reality is considered. Toward this end, a virtual signal is generated using the MATLAB code given in Appendix A. A fault signal is

Simulation of Fault Signal
With this understanding, more complex signal closer to the reality is considered. Toward this end, a virtual signal is generated using the MATLAB code given in Appendix A. A fault signal is created as shown in Figure 3a by an exponential function, x 1 (t) = A · exp(−αt ) to represent the transient impact signal with the frequency f f = 10 Hz, where t = mod t, 1/ f f and A and α represent the magnitude of impulse signal and the damping parameter of the bearing, respectively. They are given by 5 and 75.4 in this example. The mod stands for the modulus after division, which is to make repeated signal over time [19]. The FFT of this signal becomes Figure 3d, where the fundamental frequency is 10 Hz with amplitude 1.0 with the subsequent ones being its harmonics. This is modulated by a carrier signal given by the sinusoidal function x 2 (t) = sin(2π f r t) with frequency f r = 600 Hz which is much higher than f f , as shown in Figure 3b-c. This corresponds to the resonance of the bearing. Then, the modulated signal becomes x 3 (t) = x 1 (t) * x 2 (t), and is shown in Figure 3c in which the carrier contains the fault in its amplitude [20]. Its FFT becomes Figure 3f, where the sidebands are present around 600 Hz. represent the magnitude of impulse signal and the damping parameter of the bearing, respectively. They are given by 5 and 75.4 in this example. The mod stands for the modulus after division, which is to make repeated signal over time [19]. The FFT of this signal becomes Figure 3d, where the fundamental frequency is 10 Hz with amplitude 1.0 with the subsequent ones being its harmonics. This is modulated by a carrier signal given by the sinusoidal function ( ) = (2 ) with frequency = 600 which is much higher than f , as shown in Figure 3b-c. This corresponds to the resonance of the bearing. Then, the modulated signal becomes ( ) = ( ) * ( ), and is shown in Figure 3c in which the carrier contains the fault in its amplitude [20]. Its FFT becomes Figure 3f, where the sidebands are present around 600 Hz. Added to this, the signal may include a certain discrete signals from other components such as gears and shafts as well as the background noise. In this example, they are represented by another low-frequency sinusoidal function ( ) = (2 ) with amplitude 2 and frequency = 13 Hz and Gaussian white noise ( )~(0, ) with variance = 0.5 , which are given by Added to this, the signal may include a certain discrete signals from other components such as gears and shafts as well as the background noise. In this example, they are represented by another low-frequency sinusoidal function x 4 (t) = sin(2π f d t) with amplitude 2 and frequency f d = 13 Hz and Gaussian white noise x 5 (t) ∼ N 0, σ 2 with variance σ 2 = 0.5 2 , which are given by Figure 4a,b, and their FFTs (d) and (e), respectively. Putting all these together, the resulting raw signal x(t) = x 3 (t) + x 4 (t) + x 5 (t) becomes Figure 4c with its FFT being (f). Note that the governing sinusoidal wave shown in Figure 4c is the discrete signal x 4 . The amplitudes at the high-frequency region in Figure 4f represent the carrier at 600 Hz (resonance) and surrounding sidebands (faults), which is a more complex version of the AM mentioned in Figure 2. The frequency amplitudes of the discrete and AM signal are 2 and 0.6, respectively, which means that the fault is buried by the carrier and another discrete signals. This agrees with the finding that the signals from the gears and shafts often show higher amplitudes than those of faulty bearing; typically, the amplitudes of the shaft rotation, gear meshing and the bearing fault signals are on the order of 0.1, 10 and 0.001 g [21,22].
Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 23 Figure 4a,b, and their FFTs (d) and (e), respectively. Putting all these together, the resulting raw signal ( ) = ( ) + ( ) + ( ) becomes Figure 4c with its FFT being (f). Note that the governing sinusoidal wave shown in Figure 4c is the discrete signal . The amplitudes at the high-frequency region in Figure 4f represent the carrier at 600 Hz (resonance) and surrounding sidebands (faults), which is a more complex version of the AM mentioned in Figure 2. The frequency amplitudes of the discrete and AM signal are 2 and 0.6, respectively, which means that the fault is buried by the carrier and another discrete signals. This agrees with the finding that the signals from the gears and shafts often show higher amplitudes than those of faulty bearing; typically, the amplitudes of the shaft rotation, gear meshing and the bearing fault signals are on the order of 0.1, 10 and 0.001 g [21,22].

Diagnosis Using Simulation Data
The goal of the diagnosis is to isolate the fault information from the raw signal mixed with other noisy sources. The steps are taken in reverse order of the way the virtual raw signal is made and are summarized in Figure 5. It consists of three steps: First is the discrete signal separation to remove signals from the other components. Second is the demodulation band selection to find out the band

Diagnosis Using Simulation Data
The goal of the diagnosis is to isolate the fault information from the raw signal mixed with other noisy sources. The steps are taken in reverse order of the way the virtual raw signal is made and are summarized in Figure 5. It consists of three steps: First is the discrete signal separation to remove signals from the other components. Second is the demodulation band selection to find out the band containing resonance frequencies, using which the band pass filtering is carried out to obtain only the signal in the band while removing those elsewhere. Last is the envelope analysis to restore the original fault signal out of the modulated signal around the resonance frequencies. Usually, the first two steps are denoted as the signal pre-processing. Note that the raw signal includes the mixture of four signals: bearing fault signal, resonance signal, and discrete signal with frequencies f f , f r and f d and the background noise.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 23 containing resonance frequencies, using which the band pass filtering is carried out to obtain only the signal in the band while removing those elsewhere. Last is the envelope analysis to restore the original fault signal out of the modulated signal around the resonance frequencies. Usually, the first two steps are denoted as the signal pre-processing. Note that the raw signal includes the mixture of four signals: bearing fault signal, resonance signal, and discrete signal with frequencies , and and the background noise. Regarding the algorithms in each step, autoregressive (AR) models are applied [14,23] for the discrete signal separation at first. Spectral kurtosis (SK) and Short Time Fourier Transform (STFT) are employed in the second step to select frequency band where the bearing resonance is dominant. In the third step, Hilbert transform is applied for the envelope analysis to obtain the analytic signal. Then, its magnitude becomes the envelope signal. By taking the FFT of the envelope signal, the fault information is retrieved from the amplitudes at the fault frequencies. More details are explained in the following sub-sections. MATLAB code with 63-lines is provided in the Appendix B to implement the process, which consists of four parts: (1) problem definition, (2) discrete signal separation, (3) demodulation band selection, and (4) envelope analysis.

Problem Definition (Lines 1-7)
In the code, six parameters are required in the problem definition. The first four are related to the data acquisition and bearing dimensions, and thus should be fixed for the same applications. Raw data are stored in the variable rawData. Simulation data generated by Appendix A are loaded into this variable. The parameters sampRate, rpm and bearFreq denote the sampling rate in Hz, shaft rotation speed in rpm and bearing fault frequency. Note that there are four fault frequencies in the bearing, which are calculated by the Equations (1)-(4) based on the shaft frequency in rpm and bearing specifications. In the simulation, however, a single fault frequency is directly assigned, which is = 10Hz as shown in Figure 3a. Therefore, the rpm is just set at 60 here. The last two parameters maxP and windLeng are to control the performance of the algorithm: maxP is the maximum order of the AR model, which is given by 300 from the ratio of the sampRate = 3000 and bearFreq = 10. This is based on the criteria that the model order should not be greater than the number of data within a fault interval [23]. The parameter windLeng is a row vector of the lengths in the window function of STFT. In this example, four different window lengths [2^4 2^5 2^6 2^7] are used. More details about how to set optimum window length can be referred to in Ref. [24].

Discrete Signal Separation (Lines 8-22)
The first step is the discrete signal separation that removes the signals from the other sources such as gears and shafts, which we call the discrete (deterministic) signal. The discrete signal can be obtained by the autoregressive (AR) model, which is to estimate the current value of time series data Regarding the algorithms in each step, autoregressive (AR) models are applied [14,23] for the discrete signal separation at first. Spectral kurtosis (SK) and Short Time Fourier Transform (STFT) are employed in the second step to select frequency band where the bearing resonance is dominant. In the third step, Hilbert transform is applied for the envelope analysis to obtain the analytic signal. Then, its magnitude becomes the envelope signal. By taking the FFT of the envelope signal, the fault information is retrieved from the amplitudes at the fault frequencies. More details are explained in the following sub-sections. MATLAB code with 63-lines is provided in the Appendix B to implement the process, which consists of four parts: (1) problem definition, (2) discrete signal separation, (3) demodulation band selection, and (4) envelope analysis.

Problem Definition (Lines 1-7)
In the code, six parameters are required in the problem definition. The first four are related to the data acquisition and bearing dimensions, and thus should be fixed for the same applications. Raw data are stored in the variable rawData. Simulation data generated by Appendix A are loaded into this variable. The parameters sampRate, rpm and bearFreq denote the sampling rate in Hz, shaft rotation speed in rpm and bearing fault frequency. Note that there are four fault frequencies in the bearing, which are calculated by the Equations (1)-(4) based on the shaft frequency in rpm and bearing specifications. In the simulation, however, a single fault frequency is directly assigned, which is f f = 10 Hz as shown in Figure 3a. Therefore, the rpm is just set at 60 here. The last two parameters maxP and windLeng are to control the performance of the algorithm: maxP is the maximum order of the AR model, which is given by 300 from the ratio of the sampRate = 3000 and bearFreq = 10. This is based on the criteria that the model order should not be greater than the number of data within a fault interval [23]. The parameter windLeng is a row vector of the lengths in the window function of STFT. In this example, four different window lengths [2ˆ4 2ˆ5 2ˆ6 2ˆ7] are used. More details about how to set optimum window length can be referred to in Ref. [24].

Discrete Signal Separation (Lines 8-22)
The first step is the discrete signal separation that removes the signals from the other sources such as gears and shafts, which we call the discrete (deterministic) signal. The discrete signal can be obtained by the autoregressive (AR) model, which is to estimate the current value of time series data via weighted sum of the past values. Then, the residual signal is obtained by subtracting the AR model, i.e., the discrete part, from the raw signal, as shown in Figure 6. via weighted sum of the past values. Then, the residual signal is obtained by subtracting the AR model, i.e., the discrete part, from the raw signal, as shown in Figure 6. Let us consider a time series x(n) with n = 1, N, which is the raw vibration signal acquired with sampRate per seconds. The AR model is then constructed to predict the next value in a time series by using a series of previous observations, which is described by the following equation [22] (line 15): where ( ) is the predicted value at nth time step as a weighted sum of its p previous values in the sequence, p is the order of the model, a(k) is the weighting coefficients, and x(n-k) is n-kth data. The matrix is an × matrix consisting of the past series values of (line [13][14]. The coefficient ( ) can be obtained by solving the Yule-Walker equation [14,22] (line 12): where ( ) is the autocorrelation function of x(n), which is defined by In order to illustrate the process, an example is given when p = 2. Then, Equation (8) becomes where In the equations, the items exceeding the range [1, N] do not exist, hence, are assigned zero. Then, the AR model at n th step becomes by Equation (7) Once the deterministic signal is obtained from the AR model, the residual signal is obtained by subtracting the prediction from this (line 16).
Since the order p affects the prediction performance, it should be determined to make the residual signal represent the fault as best as it can. Depending on its uses, there are several different Let us consider a time series x(n) with n = 1, N, which is the raw vibration signal acquired with sampRate per seconds. The AR model is then constructed to predict the next value in a time series by using a series of previous observations, which is described by the following equation [22] (line 15): where x p (n) is the predicted value at nth time step as a weighted sum of its p previous values in the sequence, p is the order of the model, a(k) is the weighting coefficients, and x(n-k) is n-kth data.
The matrix X is an N × p matrix consisting of the past series values of x (line [13][14]. The coefficient a(k) can be obtained by solving the Yule-Walker equation [14,22] (line 12): where r xx (k) is the autocorrelation function of x(n), which is defined by In order to illustrate the process, an example is given when p = 2. Then, Equation (8) becomes where In the equations, the items x exceeding the range [1, N] do not exist, hence, are assigned zero. Then, the AR model at nth step becomes by Equation (7) x Once the deterministic signal x p is obtained from the AR model, the residual signal is obtained by subtracting the prediction from this (line 16).
Since the order p affects the prediction performance, it should be determined to make the residual signal represent the fault as best as it can. Depending on its uses, there are several different criteria to determine the order p, such as Akaike information criteria (AIC) [25][26][27][28], Bayesian information criterion (BIC) [29], Minimum description length (MDL) [30], and Partial Autocorrelation Function (PACF) [31]. In the bearing fault diagnosis, the order p is usually determined such that the kurtosis of the residual signal is maximized, as are addressed in many literatures (e.g., [23]). The latter is used here because it is simpler to calculate and the kurtosis is the impulsiveness indicator of fault signal. For this objective, the kurtosis is calculated by increasing the order p from 1 to maxP = 300 (lines 10-18). The result is in Figure 7a, where the maximum is at p = 82, hence, is used for optP (line 19). Then, the AR model is constructed again using this, which is to repeat lines 12-15. In this case, however, it is implemented by the MATLAB function filter, which does the same but in more simple way (line 21). Residual signal thus obtained can be plotted by entering plot(rawData.t (optP+1:end),e) in the MATLAB command window, and is shown in Figure 7b along with its kurtosis value. Additionally, the raw signal and its kurtosis are given in Figure 7c for comparison. Kurtosis is higher by about four times after removing the discrete signals, reflecting the fault presence more clearly. For more advanced results, however, another criteria as mentioned above may be applied to determine the model order p and choose the one with better performances after comparison. This may be beyond the scope of this paper.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 23 criteria to determine the order p, such as Akaike information criteria (AIC) [25][26][27][28], Bayesian information criterion (BIC) [29], Minimum description length (MDL) [30], and Partial Autocorrelation Function (PACF) [31]. In the bearing fault diagnosis, the order p is usually determined such that the kurtosis of the residual signal is maximized, as are addressed in many literatures (e.g., [23]). The latter is used here because it is simpler to calculate and the kurtosis is the impulsiveness indicator of fault signal. For this objective, the kurtosis is calculated by increasing the order p from 1 to maxP = 300 (lines [10][11][12][13][14][15][16][17][18]. The result is in Figure 7a, where the maximum is at p = 82, hence, is used for optP (line 19). Then, the AR model is constructed again using this, which is to repeat lines 12-15. In this case, however, it is implemented by the MATLAB function filter, which does the same but in more simple way (line 21). Residual signal thus obtained can be plotted by entering plot(rawData.t (optP+1:end),e) in the MATLAB command window, and is shown in Figure 7b along with its kurtosis value. Additionally, the raw signal and its kurtosis are given in Figure 7c for comparison.
Kurtosis is higher by about four times after removing the discrete signals, reflecting the fault presence more clearly. For more advanced results, however, another criteria as mentioned above may be applied to determine the model order p and choose the one with better performances after comparison. This may be beyond the scope of this paper.

Demodulation Band Selection (Lines 23-49)
As the second step of the signal pre-processing, a frequency band enclosing the resonance frequencies is sought from the residual signal. It is determined based on the spectral kurtosis (SK), which is the kurtosis at each frequency line in a time-frequency diagram. When the fault occurs, the

Demodulation Band Selection (Lines 23-49)
As the second step of the signal pre-processing, a frequency band enclosing the resonance frequencies is sought from the residual signal. It is determined based on the spectral kurtosis (SK), which is the kurtosis at each frequency line in a time-frequency diagram. When the fault occurs, the SK becomes higher around the region of resonance frequencies onto which the fault signal is modulated. The SK is defined by the functions of short time Fourier transform (STFT) [14,32] (line 39): where · is the time-averaging operator, and S(t,f) is the STFT of the residual signal. The STFT is a local Fourier transform for a finite time window at time t, which varies from t = 0 to t end (lines 27-37).
The process is illustrated in Figure 8 Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 23 SK becomes higher around the region of resonance frequencies onto which the fault signal is modulated. The SK is defined by the functions of short time Fourier transform (STFT) [14,32] (line 39): where 〈•〉 is the time-averaging operator, and S(t,f) is the STFT of the residual signal. The STFT is a local Fourier transform for a finite time window at time t, which varies from = 0 to (lines 27-37). The process is illustrated in Figure 8  The time interval of Fourier transform is given by the half of the window length. Remembering that the sampling rate of the signal is 3000 Hz, which means 3000 data are in a second, when the window length is 2 7 = 128; for example, the first signal at t 0 = 0 in the middle figure resides in the range of 0≤ t ≤ 0.0427 (1 ≤ n ≤ 128), and the second at t 1 = 0.0213 (n = 65), in the range of 0.0213 ≤ t ≤ 0.0640 (65 ≤ n ≤ 192), etc.
The results of FFT in Figure 8 can also be plotted in the 2D form called spectrogram, as shown in Figure 9a, which shows the frequency spectrum where the x and y axes are the time and frequency. Finally, the SK can be computed using Equations (1) and (2) to obtain as a function of frequency, as shown in Figure 9b (lines 38-40). Based on the figure, the frequency range for high SK can be selected visually, which is around 400-800 Hz in this case (lines 43-47). The frequency band thus obtained is used to band-pass filter the residual signal, which leaves only the spectrum in the band (lines 48-49).
Appl. Sci. 2020, 10, x FOR PEER REVIEW 11 of 23 The results of FFT in Figure 8 can also be plotted in the 2D form called spectrogram, as shown in Figure 9a, which shows the frequency spectrum where the x and y axes are the time and frequency. Finally, the SK can be computed using Equations (1) and (2) to obtain as a function of frequency, as shown in Figure 9b (lines 38-40). Based on the figure, the frequency range for high SK can be selected visually, which is around 400-800 Hz in this case (lines 43-47). The frequency band thus obtained is used to band-pass filter the residual signal, which leaves only the spectrum in the band (lines 48-49). By the way, the results of STFT may vary by different window lengths, which leads to the different SK. For better results, it is necessary to set a proper window length in the STFT. In case the fault interval can be identified visibly as is found as 0.1 s ( = 300) in Figure 7b, it is advised that the length be smaller to some degree than the number of data within the interval, which is the same as maxP. In practice, however, since the signal is not so clear, several window lengths are usually attempted [14], which are in this example given arbitrarily by [2^4 2^5 2^6 2^7] (line 7). Upon obtaining the SK with each window lengths (lines 26-42), they are plotted in superposition (line 45), from which the frequency band for higher SK is visually determined (lines 47). The result is shown in Figure 10a, where demodulation bands are set between [500 700] for the range of higher kurtosis in common. This should be input in the command window as shown in Figure 10b. Then, the residual signal goes through band-pass filtering to result in the signal in Figure 10c of which the kurtosis increases by twice to 23.1865 from 10.5885. By the way, the results of STFT may vary by different window lengths, which leads to the different SK. For better results, it is necessary to set a proper window length in the STFT. In case the fault interval can be identified visibly as is found as 0.1 s (n = 300) in Figure 7b, it is advised that the length be smaller to some degree than the number of data within the interval, which is the same as maxP. In practice, however, since the signal is not so clear, several window lengths are usually attempted [14], which are in this example given arbitrarily by [2ˆ4 2ˆ5 2ˆ6 2ˆ7] (line 7). Upon obtaining the SK with each window lengths (lines 26-42), they are plotted in superposition (line 45), from which the frequency band for higher SK is visually determined (lines 47). The result is shown in Figure 10a, where demodulation bands are set between [500 700] for the range of higher kurtosis in common. This should be input in the command window as shown in Figure 10b. Then, the residual signal goes through band-pass filtering to result in the signal in Figure 10c

Envelope Analysis (Lines 50-64)
After two pre-processing techniques-discrete signal separation and band-pass filtering-one obtains the AM signal that includes only components of resonance (carrier) and fault (modulation) frequency. Then, the envelope analysis is used as the last step for the diagnosis of bearing faults, which consists of three sub-steps: First, Hilbert transform is applied to the AM signal to obtain the analytic signal [14,15,17] (line 51). Next, its magnitude is calculated to obtain the envelope signal, which is the absolute value of the analytic signal (line 52). It is further translated to have zero mean by removing the DC offset (line 53). Last, FFT is taken to the envelope signal to retrieve the fault information from the amplitudes at the fault frequencies (lines 54-55).
The Hilbert transform is referred to as a 90° phase shifter and results in the analytic signals, expressed as follows [18]: where indicates the analytic signal, in which the real part (t) is the original AM signal, imaginary part ( ) is its Hilbert transform, ( ) is the instantaneous phase, and is the instantaneous amplitude, i.e., the envelope signal. The resulting envelope signal with the DC offset is shown in Figure 11a. The envelope spectrum after taking FFT is given in Figure 11b, in which the fault is identified by examining the amplitude at the given fault frequencies (lines 56-64). In the figure, the peaks near the fault frequency 10 Hz and its harmonics are apparent, which indicates that the fault has occurred indeed. Compare the Figure 11a,b with the original ones in Figure 3a,d to find out that result of the envelope spectrum is different, which we call smearing. The reason is because the sampling rate is low and the length of residual signal is changed from the original one after the

Envelope Analysis (Lines 50-64)
After two pre-processing techniques-discrete signal separation and band-pass filtering-one obtains the AM signal that includes only components of resonance (carrier) and fault (modulation) frequency. Then, the envelope analysis is used as the last step for the diagnosis of bearing faults, which consists of three sub-steps: First, Hilbert transform is applied to the AM signal to obtain the analytic signal [14,15,17] (line 51). Next, its magnitude is calculated to obtain the envelope signal, which is the absolute value of the analytic signal (line 52). It is further translated to have zero mean by removing the DC offset (line 53). Last, FFT is taken to the envelope signal to retrieve the fault information from the amplitudes at the fault frequencies (lines 54-55).
The Hilbert transform is referred to as a 90 • phase shifter and results in the analytic signals, expressed as follows [18]: where a x indicates the analytic signal, in which the real part x(t) is the original AM signal, imaginary part x(t) is its Hilbert transform, φ x (t) is the instantaneous phase, and A x is the instantaneous amplitude, i.e., the envelope signal. The resulting envelope signal with the DC offset is shown in Figure 11a. The envelope spectrum after taking FFT is given in Figure 11b, in which the fault is identified by examining the amplitude at the given fault frequencies (lines 56-64). In the figure, the peaks near the fault frequency 10 Hz and its harmonics are apparent, which indicates that the fault has occurred indeed. Compare the Figure 11a,b with the original ones in Figure 3a,d to find out that result of the envelope spectrum is different, which we call smearing. The reason is because the sampling rate is low and the length of residual signal is changed from the original one after the AR filtering. Though not addressed here, it is left to the reader to implement and figure out what the result will be when the fault is small or nonexistent by assigning smaller value to A. For comparison, envelop analysis results of healthy bearing are also given in Figure 11c,d, which can be easily made by assigning A = 0 in the MATLAB code (Bearing fault simulation). Comparing Figure 11a,b with the fault, the normal bearing does not show periodic impulse representing the fault in the time domain and the peak at the fault frequency in the frequency domain either. Furthermore, the kurtosis value is about 3 in the normal bearing, which is much smaller than that of fault and the typical of the Gaussian distribution.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 13 of 23 AR filtering. Though not addressed here, it is left to the reader to implement and figure out what the result will be when the fault is small or nonexistent by assigning smaller value to A. For comparison, envelop analysis results of healthy bearing are also given in Figure 11c,d, which can be easily made by assigning A = 0 in the MATLAB code (Bearing fault simulation). Comparing Figure 11a,b with the fault, the normal bearing does not show periodic impulse representing the fault in the time domain and the peak at the fault frequency in the frequency domain either. Furthermore, the kurtosis value is about 3 in the normal bearing, which is much smaller than that of fault and the typical of the Gaussian distribution.

Diagnosis of Korea Aerospace University (KAU) Bearing Faults
In this section, diagnosis is performed using the real vibration data made by the authors' test rig in the Korea Aerospace University (KAU). It is shown in Figure 12 which consists of DC motor, support bearings, and two roller bearings for the test. Vertical load is applied to the support bearing. The test bearing is given by NSK NJ 2306, with its geometry and dimensions in Table 1. Fault is seeded at the inner race with an artificially induced notch (0.2 mm depth, 0.3 mm width). The accelerometer is installed on top of the test bearing's housing using bolt to acquire the vibration signals. The test is performed under four different conditions as listed in Table 2.

Diagnosis of Korea Aerospace University (KAU) Bearing Faults
In this section, diagnosis is performed using the real vibration data made by the authors' test rig in the Korea Aerospace University (KAU). It is shown in Figure 12 which consists of DC motor, support bearings, and two roller bearings for the test. Vertical load is applied to the support bearing. The test bearing is given by NSK NJ 2306, with its geometry and dimensions in Table 1. Fault is seeded at the inner race with an artificially induced notch (0.2 mm depth, 0.3 mm width). The accelerometer is installed on top of the test bearing's housing using bolt to acquire the vibration signals. The test is performed under four different conditions as listed in Table 2.  In order to diagnose the bearing fault, the fault frequencies as stated in Equations (1)-(4) should be available. In practice, however, most of the bearing manufacturers do not provide the values of ball and pitch diameter. To solve this problem, a commercial bearing fault frequency calculator [33] is adopted in this study and the results are listed in Table 3. The bearing data are acquired through a sampling frequency of 51.2 kHz. In order to illustrate the process, the dataset for "bearing 1" is taken, which is operated with a rotating speed of 1200 rpm and a 0 N vertical load (note that the load by the weight of bearing housing still exists). Figure 13 shows the raw signal, of which the kurtosis is 4.9923. The four datasets listed in Table 2 can be downloaded from https://www.kau-sdol.com/ and https://sites.google.com/view/sgkim.  In order to diagnose the bearing fault, the fault frequencies as stated in Equations (1)-(4) should be available. In practice, however, most of the bearing manufacturers do not provide the values of ball and pitch diameter. To solve this problem, a commercial bearing fault frequency calculator [33] is adopted in this study and the results are listed in Table 3. The bearing data are acquired through a sampling frequency of 51.2 kHz. In order to illustrate the process, the dataset for "bearing 1" is taken, which is operated with a rotating speed of 1200 rpm and a 0 N vertical load (note that the load by the weight of bearing housing still exists). Figure 13 shows the raw signal, of which the kurtosis is 4.9923. The four datasets listed in Table 2 can be downloaded from https://www.kau-sdol.com/ and https://sites.google.com/view/sg-kim. The raw data for bearing 1 are loaded by the name bearing1. Referring to Table 2, sampRate and rpm are set at 51.2e3 and 1200, respectively. Bearing fault frequencies are assigned in bearFreq, which is [4.4423 6.5577 0.4038 5.0079] for BPFO, BPFI, FTF, and BSF, respectively, based on Table 3. The maximum order of the AR model maxP is set at 390 based on the ratio of the sampling frequency versus the highest fault frequency (51200/(6.5577*1200/60)) as mentioned before. Vector of the window lengths windLeng for STFT are given by [2^4 2^5 2^6 After going through the discrete signal separation (lines [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], one obtains the optimum order optP being 318 for the AR model by maximizing the kurtosis of the residual signal. Then, the resulting residual signal becomes Figure 14 and its kurtosis is 44.0245, roughly a nine-fold increase. Demodulation band selection (lines 23-49) is then carried out using the four sets of SK made with five window lengths, which are given in Figure 15a. The proper frequency band for higher SK is selected by visual examination of this figure, which is given arbitrarily as [1.8e4 2.3e4] by the authors. The residual signal after applying band-pass filter is then obtained as shown in Figure 15b, of which the kurtosis increases to 208.849, five times greater than the previous one.  The raw data for bearing 1 are loaded by the name bearing1. Referring to Table 2, sampRate and rpm are set at 51.2e3 and 1200, respectively. Bearing fault frequencies are assigned in bearFreq, which is [4.4423 6.5577 0.4038 5.0079] for BPFO, BPFI, FTF, and BSF, respectively, based on Table 3. The maximum order of the AR model maxP is set at 390 based on the ratio of the sampling frequency versus the highest fault frequency (51200/(6.5577*1200/60)) as mentioned before. Vector of the window lengths windLeng for STFT are given by [2ˆ4 2ˆ5 2ˆ6 2ˆ7 2ˆ8] under the constraint of maximum length being smaller than maxP (390). As a result, the commands in the Problem definition (lines 1-7) are modified as follows: After going through the discrete signal separation (lines [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], one obtains the optimum order optP being 318 for the AR model by maximizing the kurtosis of the residual signal. Then, the resulting residual signal becomes Figure 14 and its kurtosis is 44.0245, roughly a nine-fold increase. Demodulation band selection (lines 23-49) is then carried out using the four sets of SK made with five window lengths, which are given in Figure 15a. The proper frequency band for higher SK is selected by visual examination of this figure, which is given arbitrarily as [1.8e4 2.3e4] by the authors. The residual signal after applying band-pass filter is then obtained as shown in Figure 15b, of which the kurtosis increases to 208.849, five times greater than the previous one. The raw data for bearing 1 are loaded by the name bearing1. Referring to Table 2, sampRate and rpm are set at 51.2e3 and 1200, respectively. Bearing fault frequencies are assigned in bearFreq, which is [4.4423 6.5577 0.4038 5.0079] for BPFO, BPFI, FTF, and BSF, respectively, based on Table 3. The maximum order of the AR model maxP is set at 390 based on the ratio of the sampling frequency versus the highest fault frequency (51200/(6.5577*1200/60)) as mentioned before. Vector of the window lengths windLeng for STFT are given by [2^4 2^5 2^6 After going through the discrete signal separation (lines [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], one obtains the optimum order optP being 318 for the AR model by maximizing the kurtosis of the residual signal. Then, the resulting residual signal becomes Figure 14 and its kurtosis is 44.0245, roughly a nine-fold increase. Demodulation band selection (lines 23-49) is then carried out using the four sets of SK made with five window lengths, which are given in Figure 15a. The proper frequency band for higher SK is selected by visual examination of this figure, which is given arbitrarily as [1.8e4 2.3e4] by the authors. The residual signal after applying band-pass filter is then obtained as shown in Figure 15b, of which the kurtosis increases to 208.849, five times greater than the previous one.  After typing in the frequency band, it continues to the envelope analysis (lines 50-64), and the results are shown in Figure 16. In the figure, the four fault frequencies are plotted as vertical lines, at which the peak is examined to identify whether the fault has occurred or not. The result, however, should be interpreted with caution. It shows multitude of peaks at the several frequencies: the first group is at the shaft frequency (20 Hz = 1200 rpm/60 s) and its harmonics (black arrows), which can be caused by unbalance, misalignment, or coupling lock up. Details associated with shaft harmonics can be referred to the literature [34]. Next is at the BPFI (130 Hz) and sidebands with harmonics spaced at the interval of shaft frequency (20 Hz) around it (orange arrows). Once the peaks like these are identified around the BPFI, it indicates the inner race fault occurrence. Note that, in case of inner race fault, sidebands typically appear around the BPFI, whereas it is not so in the other faults [35]. Note also that sometimes the peaks can overlap by chance in the same frequency range, as is found at BPFO, which should not be mistaken as the fault.

Diagnosis of CWRU Bearing Faults
Vibration data provided by the Case Western Reserve University (CWRU) Bearing Data Center are employed [36,37] to demonstrate another application of the diagnosis code. In this case, artificial faults ranging in diameter from 0.007 to 0.028 inches were seeded with the electro-discharge machining (EDM) on drive-end (SKF 6205-2RS JEM) and fan-end (SKF 6203-2RS JEM) bearings. During the tests, data were collected at 12 and 48 kHz sampling rates for several datasets. Test After typing in the frequency band, it continues to the envelope analysis (lines 50-64), and the results are shown in Figure 16. In the figure, the four fault frequencies are plotted as vertical lines, at which the peak is examined to identify whether the fault has occurred or not. The result, however, should be interpreted with caution. It shows multitude of peaks at the several frequencies: the first group is at the shaft frequency (20 Hz = 1200 rpm/60 s) and its harmonics (black arrows), which can be caused by unbalance, misalignment, or coupling lock up. Details associated with shaft harmonics can be referred to the literature [34]. Next is at the BPFI (130 Hz) and sidebands with harmonics spaced at the interval of shaft frequency (20 Hz) around it (orange arrows). Once the peaks like these are identified around the BPFI, it indicates the inner race fault occurrence. Note that, in case of inner race fault, sidebands typically appear around the BPFI, whereas it is not so in the other faults [35]. Note also that sometimes the peaks can overlap by chance in the same frequency range, as is found at BPFO, which should not be mistaken as the fault. After typing in the frequency band, it continues to the envelope analysis (lines 50-64), and the results are shown in Figure 16. In the figure, the four fault frequencies are plotted as vertical lines, at which the peak is examined to identify whether the fault has occurred or not. The result, however, should be interpreted with caution. It shows multitude of peaks at the several frequencies: the first group is at the shaft frequency (20 Hz = 1200 rpm/60 s) and its harmonics (black arrows), which can be caused by unbalance, misalignment, or coupling lock up. Details associated with shaft harmonics can be referred to the literature [34]. Next is at the BPFI (130 Hz) and sidebands with harmonics spaced at the interval of shaft frequency (20 Hz) around it (orange arrows). Once the peaks like these are identified around the BPFI, it indicates the inner race fault occurrence. Note that, in case of inner race fault, sidebands typically appear around the BPFI, whereas it is not so in the other faults [35]. Note also that sometimes the peaks can overlap by chance in the same frequency range, as is found at BPFO, which should not be mistaken as the fault.

Diagnosis of CWRU Bearing Faults
Vibration data provided by the Case Western Reserve University (CWRU) Bearing Data Center are employed [36,37] to demonstrate another application of the diagnosis code. In this case, artificial faults ranging in diameter from 0.007 to 0.028 inches were seeded with the electro-discharge machining (EDM) on drive-end (SKF 6205-2RS JEM) and fan-end (SKF 6203-2RS JEM) bearings. During the tests, data were collected at 12 and 48 kHz sampling rates for several datasets. Test

Diagnosis of CWRU Bearing Faults
Vibration data provided by the Case Western Reserve University (CWRU) Bearing Data Center are employed [36,37] to demonstrate another application of the diagnosis code. In this case, artificial faults ranging in diameter from 0.007 to 0.028 inches were seeded with the electro-discharge machining (EDM) on drive-end (SKF 6205-2RS JEM) and fan-end (SKF 6203-2RS JEM) bearings. During the tests, data were collected at 12 and 48 kHz sampling rates for several datasets. Test bearings were driven with different rotation speeds ranging from 1797 to 1720 rpm. Accelerometers were installed on three different locations: on the drive-end bearing (DE), the fan-end bearing housing (FE), and the motor supporting base plate (BA).
Among others, this paper considers the bearing with inner race fault on the fan end. Geometry information and fault frequencies of the bearings are listed in Table 4 (the number of rolling elements and contact angles are not provided directly, and they are calculated by using the suggested fault frequencies), and the sampling rate and rotation speed are 12 kHz and 1796 rpm, respectively. The data are also available with the name "bearing 270" (the original data was resaved with two variables: vib for vibration data, and t for measuring time in rawData) at https://www.kau-sdol.com/. Further details regarding the data and test rig can be found in references [36,37]. In line 6, since the highest fault frequency is BPFI with 148.0802 Hz (4.947*1796/60), the maximum order of the AR model should be 82 (12000/148.0802), which is the number of data within a fault interval. In line 7, window lengths are set based on the criteria mentioned in Section 3.3. The results are shown in Figure 17. Based on the visual inspection of spectral kurtosis in Figure 17a, [5.3e3 5.9e3] Hz is selected in the command window for the demodulation band selection. After band-pass filtering by this interval, the results of the envelope analysis are shown in Figure 17b, which shows a dominant peak at BPFI (148.08 Hz) and sidebands with harmonics at the interval of shaft frequency (29.93 Hz), convincing that the bearing has inner race fault.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 17 of 23 bearings were driven with different rotation speeds ranging from 1797 to 1720 rpm. Accelerometers were installed on three different locations: on the drive-end bearing (DE), the fan-end bearing housing (FE), and the motor supporting base plate (BA). Among others, this paper considers the bearing with inner race fault on the fan end. Geometry information and fault frequencies of the bearings are listed in Table 4 (the number of rolling elements and contact angles are not provided directly, and they are calculated by using the suggested fault frequencies), and the sampling rate and rotation speed are 12 kHz and 1796 rpm, respectively. The data are also available with the name "bearing 270" (the original data was resaved with two variables: vib for vibration data, and t for measuring time in rawData) at https://www.kau-sdol.com/. Further details regarding the data and test rig can be found in references [36,37]. In line 6, since the highest fault frequency is BPFI with 148.0802 Hz (4.947*1796/60), the maximum order of the AR model should be 82 (12000/148.0802), which is the number of data within a fault interval. In line 7, window lengths are set based on the criteria mentioned in Section 3.3. The results are shown in Figure 17. Based on the visual inspection of spectral kurtosis in Figure 17a, [5.3e3 5.9e3] Hz is selected in the command window for the demodulation band selection. After band-pass filtering by this interval, the results of the envelope analysis are shown in Figure 17b, which shows a dominant peak at BPFI (148.08 Hz) and sidebands with harmonics at the interval of shaft frequency (29.93 Hz), convincing that the bearing has inner race fault.  As a supplementary explanation, the effects of each step in the signal preprocessing are also demonstrated in Figure 18, in which the envelope analysis are carried out to the raw signal (a) and to the residual signal that went through the AR filtering (b), respectively. In Figure 18a, the peak at BPFI in the spectrum is apparent but is smaller than the others, which is doubtful of the fault. Another peak appeared at BSF (59.69 Hz) also does not give us the confidence because it coincides with the harmonic of shaft frequency, which are the multiples of 29.93 Hz. After the AR filtering, the peak at the BPFI is increased greatly along with the sidebands with the intervals of shaft frequency in Figure 18b. This confirms the fault at that frequency. On the other hand, the height at the BSF is not changed relative to the neighbors, which suggests that it may not be the fault but the harmonic of the shaft frequency. In this case, the fault at BPFI is already identified at this step without further preprocessing, i.e., the demodulation band selection, although the result may be better interpreted as shown in Figure 17b. As such, not all bearings have to resort to the full preprocessing steps for the fault identification.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 18 of 23 As a supplementary explanation, the effects of each step in the signal preprocessing are also demonstrated in Figure 18, in which the envelope analysis are carried out to the raw signal (a) and to the residual signal that went through the AR filtering (b), respectively. In Figure 18a, the peak at BPFI in the spectrum is apparent but is smaller than the others, which is doubtful of the fault. Another peak appeared at BSF (59.69 Hz) also does not give us the confidence because it coincides with the harmonic of shaft frequency, which are the multiples of 29.93 Hz. After the AR filtering, the peak at the BPFI is increased greatly along with the sidebands with the intervals of shaft frequency in Figure  18b. This confirms the fault at that frequency. On the other hand, the height at the BSF is not changed relative to the neighbors, which suggests that it may not be the fault but the harmonic of the shaft frequency. In this case, the fault at BPFI is already identified at this step without further preprocessing, i.e., the demodulation band selection, although the result may be better interpreted as shown in Figure 17b. As such, not all bearings have to resort to the full preprocessing steps for the fault identification.
(a) (b) Figure 18. Envelope spectrum reflecting the raw and AR filtered signals: (a) envelope spectrum of the raw signal, and (b) envelope spectrum after AR filter.

Conclusions
Signal processing for feature extraction is an important step in the fault diagnosis and prognosis, but the process in detail are not fully understood for the beginners of PHM to implement by themselves. With this motivation, this paper has presented a tutorial for how to identify the fault using the envelope analysis along with the preprocessing steps such as AR filtering and SK filtering to remove the unnecessary signals and noises, which are widely used in the area of bearing fault diagnostics. MATLAB codes for the algorithms are provided, which consists of 64 lines and can be simply modified for the other applications. Two sets of bearing vibration data (data made by the authors and CWRU bearing data) are used to help the user to implement and understand the algorithms. The code and bearing data are available at https://www.kau-sdol.com/ and https://sites.google.com/view/sg-kim. Apart from the datasets in this tutorial, there are several more datasets open to the public, and some of those are listed in Table 5 to help the user practice diagnosis themselves.  Funding: This research received no external funding.