A Review of Feature Extraction Methods in Vibration-Based Condition Monitoring and Its Application for Degradation Trend Estimation of Low-Speed Slew Bearing

This paper presents an empirical study of feature extraction methods for the application of low-speed slew bearing condition monitoring. The aim of the study is to find the proper features that represent the degradation condition of slew bearing rotating at very low speed (≈1 r/min) with naturally defect. The literature study of existing research, related to feature extraction methods or algorithms in a wide range of applications such as vibration analysis, time series analysis and bio-medical signal processing, is discussed. Some features are applied in vibration slew bearing data acquired from laboratory tests. The selected features such as impulse factor, margin factor, approximate entropy and largest Lyapunov exponent (LLE) show obvious changes in bearing condition from normal condition to final failure.


Introduction
Slew bearings are large roller element bearings that are used in heavy industries such as mining and steel milling. They are particularly used in turntables, cranes, cranes, rotatable trolleys, excavators, reclaimers, stackers, swing shovels and ladle cars [1]. In this paper, the slew bearing being analyzed is the one typically used in BlueScope Steel Australia; specifically for the coal loading into the steel ladle turret, coal stacker and reclaimer. Slew bearing is different to typical rolling element bearing in terms of geometry, load and speed application. Slew bearings are usually operated at very low rotational speed, from 0.5 to 15 rpm in either continuous or reversible rotation and are designed with three axes to support high axial and radial loads. The bearings have large diameter (e.g., approximately from 200 mm to 5000 mm). Due to the high geometric and cost issue, industries purchase these bearings for replacement in a regular maintenance period. However, in some cases, these bearings may fail before the predicted time. To prevent catastrophic failure and to effectively schedule the bearing purchase for minimizing the replacement cost, a condition monitoring method for early failure detection is required.
Analyzing the vibration signatures of slew bearings that operate at a very low speed is more complex than that of typical rolling element bearings. Slew bearing signals have low energy. They are also non-stationary, non-linear, chaotic and practically hard to analyze. Due to their low energy, the signal is masked by a higher background noise level. Therefore, it is difficult to extract the pertinent

Data Acquisition Procedure
The vibration data for continuous rotation was collected from four accelerometers that were installed on the inner diameter surface at 90° to each other with 4880 Hz sampling rate. The accelerometers were IMI 608A11 ICP (IMI Sensors, PCB Piezotronics Division, New York, NY, USA) type sensors. The accelerometers were connected to a high-speed PicoScope DAQ (PS3424) (Pico Technology, St Neots, United Kingdom). The bearing was subjected to an axial load of 15 tons. The bearing began running in 2006 with rotational speed of 1 r/min. However, to provide continuous monitoring and produce run-to-failure bearing data, the bearing data was collected on a daily basis from February to August 2007 with total number of 139 days. To accelerate the bearing service life, coal dust was injected into the bearing in mid-April 2007 (58 days from the beginning). In practice, especially in steel-making companies, the slew bearing is located in open air where the bearing is exposed to a dusty environment and for this reason the coal dust was inserted in mid-April to simulate the real working conditions. The schematic of the vibration data acquisition from slew bearing test-rig is presented in Figure 2.

Data Acquisition Procedure
The vibration data for continuous rotation was collected from four accelerometers that were installed on the inner diameter surface at 90 • to each other with 4880 Hz sampling rate. The accelerometers were IMI 608A11 ICP (IMI Sensors, PCB Piezotronics Division, New York, NY, USA) type sensors. The accelerometers were connected to a high-speed PicoScope DAQ (PS3424) (Pico Technology, St Neots, UK). The bearing was subjected to an axial load of 15 tons. The bearing began running in 2006 with rotational speed of 1 r/min. However, to provide continuous monitoring and produce run-to-failure bearing data, the bearing data was collected on a daily basis from February to August 2007 with total number of 139 days. To accelerate the bearing service life, coal dust was injected into the bearing in mid-April 2007 (58 days from the beginning). In practice, especially in steel-making companies, the slew bearing is located in open air where the bearing is exposed to a dusty environment and for this reason the coal dust was inserted in mid-April to simulate the real working conditions. The schematic of the vibration data acquisition from slew bearing test-rig is presented in Figure 2.

Data Acquisition Procedure
The vibration data for continuous rotation was collected from four accelerometers that were installed on the inner diameter surface at 90° to each other with 4880 Hz sampling rate. The accelerometers were IMI 608A11 ICP (IMI Sensors, PCB Piezotronics Division, New York, NY, USA) type sensors. The accelerometers were connected to a high-speed PicoScope DAQ (PS3424) (Pico Technology, St Neots, United Kingdom). The bearing was subjected to an axial load of 15 tons. The bearing began running in 2006 with rotational speed of 1 r/min. However, to provide continuous monitoring and produce run-to-failure bearing data, the bearing data was collected on a daily basis from February to August 2007 with total number of 139 days. To accelerate the bearing service life, coal dust was injected into the bearing in mid-April 2007 (58 days from the beginning). In practice, especially in steel-making companies, the slew bearing is located in open air where the bearing is exposed to a dusty environment and for this reason the coal dust was inserted in mid-April to simulate the real working conditions. The schematic of the vibration data acquisition from slew bearing test-rig is presented in Figure 2.

Statistical Time-Domain Features
Statistical time-domain features such as mean, root mean square (RMS), standard deviation and variance were usually used in past studies to identify the differences between one vibration signal and another. More advanced statistical-based features such as skewness and kurtosis can be applied to the signal which is not purely stationary. These features examine the probability density function (PDF) of the signal. It is a well-known fact that if the condition of the bearing changes, the PDF also changes, thus the skewness and kurtosis might also be affected. In particular, skewness is used to measure whether the signal is negatively or positively skewed, while kurtosis measures the peak value of the PDF and indicates if the signal is impulse in nature. For a signal with a normal distribution i.e., normal bearing signal has a skewness value of zero.
In addition, kurtosis is obtained from the peak of the PDF of the vibration signal, while skewness is obtained from the mean value of the PDF of the vibration signal. It is well known that the kurtosis value of vibration signal from normal bearing is approximately three [9] and the skewness value of approximately zero. Then, when the PDF of vibration signal changes due to faults, the kurtosis value will increase to greater than three and the skewness value will shift to either negative or positive. Other features which can be calculated from PDF of the vibration signal includes: entropy, which calculates the histogram of the PDF and measures the degree of randomness of the vibration signal; lower bound and upper bound histogram, which measure the lower and upper values of the PDF respectively.
Aside from the statistical time-domain features mentioned above, there are other non-dimensional features such as shape factor (defines as RMS divided by mean) and crest factor (defines as standard deviation divided by RMS). Both features will change as the mean, RMS and standard deviation change. The summary of the statistical time-domain features is presented in Table 1. Table 1. Brief review of statistical time-domain features extraction applied in condition monitoring of typical rolling element bearings [10].

RMS
The RMS value increase gradually as fault developed. However, RMS is unable to provide the information of incipient fault stage while it increases with the fault development [11].
Variance measures the dispersion of a signal around their reference mean value.

Skewness
Skewness quantifies the asymmetry behavior of vibration signal through its probability density function (PDF).
Kurtosis quantifies the peak value of the PDF. The kurtosis value for normal rolling element bearing is well-recognized as 3.
Shape factor Shape factor is a value that is affected by an object's shape but is independent of its dimensions [12].

Crest factor
Crest factor (CF) calculates how much impact occur during the rolling element and raceway contact. CF is appropriate for "spiky signals" [12].

Entropy
Entropy, e(p), is a calculation of the uncertainty and randomness of a sampled vibration data. Given a set of probabilities, (p 1 , p 2 , . . . , p n ), the entropy can be calculated using the formulas as shown in the right column.

Upper and Lower Bound of Histogram
The application of the histograms or a discrete PDF for time-domain feature extraction is presented in [13]. Let d be the number of division that group the ranges, and h i with 0 ≤ i < d be the columns of the histogram for the time The upper bound h U and lower bound h L of histogram are defined as According to the features extraction results presented in Figure 3, some features, such as RMS, kurtosis, histogram upper bound and histogram lower bound, show the fluctuation in the last measurement day, approximately from the 90th day to the 139th day. These indicate that some features are sensitive to the slew bearing condition while others are less. The incipient damage occurred in day 90th that is clearly indicated by RMS, kurtosis, histogram upper bound and histogram lower bound feature, where the fluctuation is associated with roller and outer race defects. However, more appropriate and reliable features are necessary for the condition monitoring of slew bearing to detect an impending damage and to estimate the degradation index.

Upper and Lower Bound of Histogram
The application of the histograms or a discrete PDF for time-domain feature extraction is presented in [13]. Let d be the number of division that group the ranges, and i h with 0 i d ≤ < be the columns of the histogram for the time i The upper bound U h and lower bound L h of histogram are defined as According to the features extraction results presented in Figure 3, some features, such as RMS, kurtosis, histogram upper bound and histogram lower bound, show the fluctuation in the last measurement day, approximately from the 90th day to the 139th day. These indicate that some features are sensitive to the slew bearing condition while others are less. The incipient damage occurred in day 90th that is clearly indicated by RMS, kurtosis, histogram upper bound and histogram lower bound feature, where the fluctuation is associated with roller and outer race defects. However, more appropriate and reliable features are necessary for the condition monitoring of slew bearing to detect an impending damage and to estimate the degradation index.

Autoregressive (AR) Coefficients
In other studies [13], autoregressive (AR) coefficients of the vibration signal are calculated as bearing vibration features. It is known that a faulty vibration signal of typical rolling element bearing produces different autoregressive coefficients compared to that of normal vibration signal. The following is a common univariate time series model to calculate the AR coefficients: where a 1 to a n are the autoregressive coefficients, y t is the digitized vibration signal, n is the order of the AR model (n = 8) and ε is the residual (Gaussian white noise). The RMS, skewness, kurtosis, entropy and AR coefficients have been applied in vibration datasets from induction motor [14] and rolling element bearing [15]. These papers have made great contributions to the research areas on the bearing condition monitoring and fault diagnosis. However, the features were applied for artificially bearing damage. In practice, the damage is typically developed naturally rather than artificially.
The AR coefficients results are presented in Figure 4. It is shown that most of the features are less relevant to represent the slew bearing condition except for AR coefficient 4 which is shown fluctuating in the last measurement.

Autoregressive (AR) Coefficients
In other studies [13], autoregressive (AR) coefficients of the vibration signal are calculated as bearing vibration features. It is known that a faulty vibration signal of typical rolling element bearing produces different autoregressive coefficients compared to that of normal vibration signal. The following is a common univariate time series model to calculate the AR coefficients: where 1 a to n a are the autoregressive coefficients, t y is the digitized vibration signal, n is the order of the AR model ( 8 n= ) and ε is the residual (Gaussian white noise).
The RMS, skewness, kurtosis, entropy and AR coefficients have been applied in vibration datasets from induction motor [14] and rolling element bearing [15]. These papers have made great contributions to the research areas on the bearing condition monitoring and fault diagnosis. However, the features were applied for artificially bearing damage. In practice, the damage is typically developed naturally rather than artificially.
The AR coefficients results are presented in Figure 4. It is shown that most of the features are less relevant to represent the slew bearing condition except for AR coefficient 4 which is shown fluctuating in the last measurement. Other time domain features such as impulse factor (IF) and margin factor (MF) have been recently used [12,16,17]. Similar to CF, IF is used to measure how much impact is generated from the bearing defect. The CF measures the impact by dividing the maximum absolute value of vibration   Other time domain features such as impulse factor (IF) and margin factor (MF) have been recently used [12,16,17]. Similar to CF, IF is used to measure how much impact is generated from the bearing defect. The CF measures the impact by dividing the maximum absolute value of vibration signal and RMS value, while IF divides the maximum absolute value by the mean of absolute value. The formula is presented as follows: Margin factor (MF) measures the level of impact between rolling element and raceway. The MF is calculated by dividing the maximum absolute value of vibration signal to the RMS of absolute value of vibration signal (7): The progression of slew bearing deterioration of IF and MF features are more apparent compared to RMS, kurtosis and histogram lower features as presented in Figure 5. The formula is presented as follows: Margin factor (MF) measures the level of impact between rolling element and raceway. The MF is calculated by dividing the maximum absolute value of vibration signal to the RMS of absolute value of vibration signal (7): The progression of slew bearing deterioration of IF and MF features are more apparent compared to RMS, kurtosis and histogram lower features as presented in Figure 5.

Hjorts' Parameters
In addition to the time-domain features mentioned in previous subsections, Hjorts' parameters also falls into this category [18]. These features are calculated based on the first and the second derivatives of the vibration signal. In the time series context, the numerical values for the derivatives are obtained as the differences between the current value and the prior value. The first Hjorth's parameter is called 'activity'. The activity of vibration signal can be computed by simply calculating the variance of the vibration signal amplitude 2 x σ , where x σ is the standard deviation of the sampled vibration signal

Hjorts' Parameters
In addition to the time-domain features mentioned in previous subsections, Hjorts' parameters also falls into this category [18]. These features are calculated based on the first and the second derivatives of the vibration signal. In the time series context, the numerical values for the derivatives are obtained as the differences between the current value and the prior value. The first Hjorth's parameter is called 'activity'. The activity of vibration signal can be computed by simply calculating the variance of the vibration signal amplitude σ 2 x , where σ x is the standard deviation of the sampled vibration signal x = (x 1 , x 2 , x 3 , . . . , x N ). The 'mobility' parameter of vibration signal is calculated as the square root of the ratio of the activity of the first derivative and the activity of the vibration signal.
where σ x is the standard deviation of the first derivative of the vibration signal. The numerical values for the derivatives can be obtained as the differences between samples as follows.
The last parameter called 'complexity' which is calculated as the ratio of mobility of the first derivative and the mobility of the vibration signal.
These parameters have been used in electroencephalography (EEG) signals to detect the epileptic seizures [19]. They have never been used in vibration bearing signal except for activity feature, which is similar to the variance feature in the statistical time-domain features extraction. It can be seen from Figure 6 that the progression of bearing condition is difficult to track using mobility and complexity feature. In contrast, the activity or variance feature shows one highest peak at approximately the 90th day and continue fluctuation on the last measurement days. where ' x σ is the standard deviation of the first derivative of the vibration signal. The numerical values for the derivatives can be obtained as the differences between samples as follows.
The last parameter called 'complexity' which is calculated as the ratio of mobility of the first derivative and the mobility of the vibration signal.
These parameters have been used in electroencephalography (EEG) signals to detect the epileptic seizures [19]. They have never been used in vibration bearing signal except for activity feature, which is similar to the variance feature in the statistical time-domain features extraction. It can be seen from Figure 6 that the progression of bearing condition is difficult to track using mobility and complexity feature. In contrast, the activity or variance feature shows one highest peak at approximately the 90 th day and continue fluctuation on the last measurement days.

Mathematical Morphology (MM) Operators
In image processing, mathematical morphology (MM) was introduced as a non-linear method to analyze two dimensional (2D) image data including binary images and grey-level images based on set theory [20]. The fundamental principle of the MM method is to modify the shape of the original signal by transforming it through its intersection with another object called the 'structuring element' (SE). Four operators including erosion, dilation, closing and opening are used to achieve the transformation.
After having been applied successfully in 2D image processing, the MM method has been used in biomedical research field to analyze one dimensional (1D) EEG signals [21][22][23] and electrocardiography (ECG) signals [24]. The first known study of MM in 1D vibration signals was

Mathematical Morphology (MM) Operators
In image processing, mathematical morphology (MM) was introduced as a non-linear method to analyze two dimensional (2D) image data including binary images and grey-level images based on set theory [20]. The fundamental principle of the MM method is to modify the shape of the original signal by transforming it through its intersection with another object called the 'structuring element' (SE). Four operators including erosion, dilation, closing and opening are used to achieve the transformation.
After having been applied successfully in 2D image processing, the MM method has been used in biomedical research field to analyze one dimensional (1D) EEG signals [21][22][23] and electrocardiography (ECG) signals [24]. The first known study of MM in 1D vibration signals was presented by Nikolaou and Antoniadis [25]. The authors used the MM method for vibration data with inner race and outer race faults. The effects of four MM operators on the enveloping impulse-type signals were examined. The result shows that the 'closing' operator is more suitable for extracting the impulse signal corresponding to the bearing fault type. The SE value was empirically investigated within the range of 0.2 T to 1.2 T. The optimum SE for a signal with additive noise was selected as 0.6 T (0.6 times the pulse repetition period). In addition, T is the period of each bearing component frequency such as BPFO, BPFI and BSF.
The original discrete one dimensional (1D) signal and structuring element (SE) must be first determined to use the four basic operators in the MM method. Suppose the discrete 1D vibration signal • Erosion: also refer to as min filter.
• Dilation: also refer to as max filter.
• Closing: Dilates 1D signal and then erodes the dilated signal using the similar structuring element for both operations.
• Opening: Erodes 1D signal and then dilates the eroded signal using the similar structuring element for both operations.
where n ∈ (1, 2, . . . , N) and m ∈ (1, 2, . . . , M), the notations Θ, ⊕, •, • denote the erosion operator or Minkowski subtraction, dilation operator or Minkowski addition, closing operator and opening operator in the MM method, respectively. The original simulated 1D signal is shown in the first row of Figure 7. The various effects of the basic MM operators application in the simulated 1D vibration signal are presented in the second to fifth row of Figure 7. The second row of Figure 7 to the fifth row of Figure 7 illustrate the effect of erosion, dilation, closing and opening, respectively. Please note that the solid blue line represents the original signal and the dashed red line is the signal after being processed or transformed by the MM operators. The simulated signal has zero mean value, which is typical to 1D vibration signals. The work principle of the basic MM operators can be summarized as follows: erosion of x(n) by the structuring element A(m) reduces the positive part and enlarges the negative part of x(n), while dilation of x(n) by the structuring element A(m) reduces the negative part and enlarges the positive part of x(n). In addition, closing of x(n) smoothens the signal x(n) by cutting down any impulses in the negative part and opening x(n) smoothens the signal x(n) by cutting down its impulses in the positive part.
In rolling element bearing, the MM method has been studied. The multi-scale morphological analysis was developed by Zhang et al. [26]. The method shows independency of empirical rules in terms of SE selection, but the technique is rather complex. Wang et al. [27] proposed an improved morphological filter as features of periodic impulse signals. An average weighted combination of open-closing and close-opening morphological operators was employed to eliminate the statistical deflection of amplitude and to extract the impulse component from the original signal. To optimize the SE, they used a new criterion called impulse attenuation (IMA). Dong et al. [28] presented the fault diagnosis of rolling element bearings using the modified morphological method. The morphological operator used in their method was the average weight combination of the closing and opening operator. They proposed a new criterion called SNR criterion to optimize the length of SE. Another recently published research on MM method was presented by Raj and Murali [29]. The authors combined the MM operators with fuzzy inference to classify the artificial bearing defects. The vibration data used in [29] were similar to the data used in Zhang et al. [26]. terms of SE selection, but the technique is rather complex. Wang et al. [27] proposed an improved morphological filter as features of periodic impulse signals. An average weighted combination of open-closing and close-opening morphological operators was employed to eliminate the statistical deflection of amplitude and to extract the impulse component from the original signal. To optimize the SE, they used a new criterion called impulse attenuation (IMA). Dong et al. [28] presented the fault diagnosis of rolling element bearings using the modified morphological method. The morphological operator used in their method was the average weight combination of the closing and opening operator. They proposed a new criterion called SNR criterion to optimize the length of SE. Another recently published research on MM method was presented by Raj and Murali [29]. The authors combined the MM operators with fuzzy inference to classify the artificial bearing defects. The vibration data used in [29] were similar to the data used in Zhang et al. [26]. The application of the MM method in detecting the bearing fault frequencies from among impulsive-type signals have been demonstrated [26][27][28][29]. These signals are generated from known bearing conditions such as inner and/or outer race faults. It is acknowledged that the signals from artificial faults are easier to be identified than those from natural bearing faults. Besides, typically, the impulse-type signals are easily identified visually without any additional tool or processing step [28]. The application of MM method to the naturally damaged slew bearing is presented in Figures  8-10.
The MM method has been used effectively for the damage detection of high speed rolling bearing. It is because when the damage occurred, the contact between the damage spot and rolling element generates detectable impulse due to the contact between the defect spot and race way or roller. In the case of naturally degraded low-speed slew bearing, the multiple damage e.g., outer race, inner race and roller are typically occurred at close sequence time and the magnitude of impulse is very low. Thus, it is difficult to identify the origin of impulse from outer race, inner race or roller damage. Therefore, the feature extraction result of MM for the three conditions: BPFO, BPFI and BSF present an identical figure. The application of the MM method in detecting the bearing fault frequencies from among impulsive-type signals have been demonstrated [26][27][28][29]. These signals are generated from known bearing conditions such as inner and/or outer race faults. It is acknowledged that the signals from artificial faults are easier to be identified than those from natural bearing faults. Besides, typically, the impulse-type signals are easily identified visually without any additional tool or processing step [28]. The application of MM method to the naturally damaged slew bearing is presented in Figures 8-10.
The MM method has been used effectively for the damage detection of high speed rolling bearing. It is because when the damage occurred, the contact between the damage spot and rolling element generates detectable impulse due to the contact between the defect spot and race way or roller. In the case of naturally degraded low-speed slew bearing, the multiple damage e.g., outer race, inner race and roller are typically occurred at close sequence time and the magnitude of impulse is very low. Thus, it is difficult to identify the origin of impulse from outer race, inner race or roller damage. Therefore, the feature extraction result of MM for the three conditions: BPFO, BPFI and BSF present an identical figure.

Category 2: Frequency-Domain Features Extraction
To apply frequency-domain features, the time-domain vibration signals must initially be transformed into frequency-domain signals using fast Fourier transform (FFT). FFT is a common method in vibration analysis for bearing fault detection. FFT measures the dominant frequency of the repetitive impulse period of certain faults due to the contact between rolling elements and defective spot. Therefore, the type of bearing defects such as outer race, inner race or ball defect can be detected. FFT performed effectively in stationary periodic signals; however, it is less effective for nonstationary signals that arise from time-dependent events [30].

Statistical Frequency-Domain Features
Other frequency-domain features such as frequency centre (FC), root mean square frequency (RMSF) and root variance frequency (RVF) have been studied in [13]. When fault exists, the frequency element changes, and the values of the FC, RMSF and RVF also change. The FC and RMSF indicate the position changes of main frequencies, while the RVF shows the convergence of the power spectrum. Another feature called median frequency is sometimes used. The median frequency is measures by dividing the area of the amplitude spectrum into two equal part. In addition, the mean frequency (FC) can be defined as the first-order moment of the Fourier spectrum, RVF as the secondorder moment, SS as the third-order moment and SK as the fourth-order moment of the Fourier spectrum [18]. These features can be calculated as follows:

Category 2: Frequency-Domain Features Extraction
To apply frequency-domain features, the time-domain vibration signals must initially be transformed into frequency-domain signals using fast Fourier transform (FFT). FFT is a common method in vibration analysis for bearing fault detection. FFT measures the dominant frequency of the repetitive impulse period of certain faults due to the contact between rolling elements and defective spot. Therefore, the type of bearing defects such as outer race, inner race or ball defect can be detected. FFT performed effectively in stationary periodic signals; however, it is less effective for non-stationary signals that arise from time-dependent events [30].

Statistical Frequency-Domain Features
Other frequency-domain features such as frequency centre (FC), root mean square frequency (RMSF) and root variance frequency (RVF) have been studied in [13]. When fault exists, the frequency element changes, and the values of the FC, RMSF and RVF also change. The FC and RMSF indicate the position changes of main frequencies, while the RVF shows the convergence of the power spectrum. Another feature called median frequency is sometimes used. The median frequency is measures by dividing the area of the amplitude spectrum into two equal part. In addition, the mean frequency (FC) can be defined as the first-order moment of the Fourier spectrum, RVF as the second-order moment, SS as the third-order moment and SK as the fourth-order moment of the Fourier spectrum [18]. These features can be calculated as follows: The result of FC, RMSF and RVF feature extraction is presented in Figure 11.
The result of FC, RMSF and RVF feature extraction is presented in Figure 11.

Spectral Skewness, Spectral Kurtosis, Spectral Entropy and Shannon Entropy Feature
Recently, new methods such as spectral skewness, spectral kurtosis, spectral entropy and Shannon entropy have been developed. Spectral skewness (SS) and spectral kurtosis (SK) are the advanced statistical measures applied to the magnitude spectrum. The SS measures the symmetry of the distribution of the spectral magnitude values around its mean [31]. It is defined by The SK measures the distribution of the spectral magnitude values and compare to a Gaussian distribution. It is defined by Antoni and Randall [32] presents the development of classical kurtosis analysis called spectral kurtosis (SK) to deal with the transient behaviour in a signal. SK can be used to detect incipient faults even in the presence of resilient masking noise. The concept of spectral statistic is adopted in this method to supplement the classical power spectral density (PSD). PSD ideally gives zeros values at those frequencies when the signal has stationary Gaussian noise, and it gives high positive values at those frequencies during the occurrence of the transients. When the noise-to-signal ratio is high, the transient signal will be buried in the background noise, thus the incipient fault cannot be clearly detected. SK can overcome this problem by analyzing the entire frequency band and to select the sensitive frequency band that correspond to the bearing condition.

Category 3: Time-Frequency Representation
Time-frequency domain representation methods such as short-time Fourier transform (STFT), wavelet transform and Wigner-Ville distribution (WVD) are commonly used for non-stationary or

Spectral Skewness, Spectral Kurtosis, Spectral Entropy and Shannon Entropy Feature
Recently, new methods such as spectral skewness, spectral kurtosis, spectral entropy and Shannon entropy have been developed. Spectral skewness (SS) and spectral kurtosis (SK) are the advanced statistical measures applied to the magnitude spectrum. The SS measures the symmetry of the distribution of the spectral magnitude values around its mean [31]. It is defined by The SK measures the distribution of the spectral magnitude values and compare to a Gaussian distribution. It is defined by Antoni and Randall [32] presents the development of classical kurtosis analysis called spectral kurtosis (SK) to deal with the transient behaviour in a signal. SK can be used to detect incipient faults even in the presence of resilient masking noise. The concept of spectral statistic is adopted in this method to supplement the classical power spectral density (PSD). PSD ideally gives zeros values at those frequencies when the signal has stationary Gaussian noise, and it gives high positive values at those frequencies during the occurrence of the transients. When the noise-to-signal ratio is high, the transient signal will be buried in the background noise, thus the incipient fault cannot be clearly detected. SK can overcome this problem by analyzing the entire frequency band and to select the sensitive frequency band that correspond to the bearing condition.

Category 3: Time-Frequency Representation
Time-frequency domain representation methods such as short-time Fourier transform (STFT), wavelet transform and Wigner-Ville distribution (WVD) are commonly used for non-stationary or transient signal. These methods implement a mapping of one-dimensional time-domain signals to a two-dimensional function of time and frequency. The objective is to provide a true time-frequency representation of a signal. A review of time-frequency analysis methods for machinery fault diagnosis has been presented by Feng et al. [33].

Short-Time Fourier Transform (STFT)
Due to its low computational complexity and definite physical meaning, STFT is often used as an initial pre-processing tool to analyze non-stationary vibration signals. The STFT maps the vibration signal into a two-dimensional (2D) function of time and frequency [34]. STFT divides a non-stationary signal into small windows of equal time frame. The Fourier transformation is then applied to the time segment. Unlike FFT where the frequency transformation includes the whole original signal, in STFT original signal is decomposed using FFT at pre-defined time intervals. By doing this, the defect signal frequency can be identified in a particular window of time. The key to this method is the selection of the time interval. The STFT can be expressed as follows [35]: where STFT( f , t) is short-term frequency spectrum, t is the time, f is frequency, w(t − τ) is the shifted window, x(τ) denotes the input vibration signal, exp(j2π f τ) is the complex exponential, T is the window interval and w(τ) is the windowing function which satisfies w(τ) = 0 for |τ| => T/2. By taking the square magnitude of the STFT in (21), an energy density using the spectrogram S(t, f ) method can be obtained S(t, f ) = |STFT(t, f )| 2 (22) Equation (22) represents the energy density of the vibration signal x(τ) windowed by w(τ). The visualization of window length and hop size is presented in Figure 12. The vibration signal with the sample index i is split into overlapping blocks of length B L and block index n. While, the hop size H s is the distance from the start of one particular block and its subsequent block. The STFT result of one day slew bearing signal measurement is shown in Figure 13 for 10 s vibration signal with window length of 2048, hop size of 256 and number of FFT points of 16,384. It can be seen that during 10 s sampled signal, the amplitudes at lower frequencies were higher than the amplitude at high frequencies (approximately at 1300 Hz and 1750 Hz). This indicates that slew bearing frequencies were dominated by low frequency contents.

Short-Time Fourier Transform (STFT)
Due to its low computational complexity and definite physical meaning, STFT is often used as an initial pre-processing tool to analyze non-stationary vibration signals. The STFT maps the vibration signal into a two-dimensional (2D) function of time and frequency [34]. STFT divides a nonstationary signal into small windows of equal time frame. The Fourier transformation is then applied to the time segment. Unlike FFT where the frequency transformation includes the whole original signal, in STFT original signal is decomposed using FFT at pre-defined time intervals. By doing this, the defect signal frequency can be identified in a particular window of time. The key to this method is the selection of the time interval. The STFT can be expressed as follows [35]: where STFT ( , ) f t is short-term frequency spectrum, t is the time, f is frequency, ( ) The visualization of window length and hop size is presented in Figure 12. The vibration signal with the sample index i is split into overlapping blocks of length L Β and block index n . While, the hop size s Η is the distance from the start of one particular block and its subsequent block. The STFT result of one day slew bearing signal measurement is shown in Figure 13 for 10 s vibration signal with window length of 2048, hop size of 256 and number of FFT points of 16,384. It can be seen that during 10 s sampled signal, the amplitudes at lower frequencies were higher than the amplitude at high frequencies (approximately at 1300 Hz and 1750 Hz). This indicates that slew bearing frequencies were dominated by low frequency contents.

Wavelet Transform and Wavelet Decomposition
Wavelet transform is well-known as an appropriate method to analyze non-stationary and transient signal [14,36]. Wavelet transformation employs wavelet function instead of sinusoidal function as the basis function. It enhances a scale variable in addition to the time variable in the inner product transformation [33]. For any limited energy signal a > , t is the time shift and 1/ a is a normalization factor to maintain energy preservation. A review of the wavelet transform applications in machine condition monitoring and fault diagnostics is presented by Peng et al. [37]. As a non-stationary signal analysis method, wavelet transform can be applied in machine fault diagnostics as a feature extraction method [37]. Since the wavelet transformations can provide an excellent energy concentration properties due to the uses of the wavelet basis function; the wavelet transform can represent the signal with a certain number of coefficients. The process is usually called wavelet decomposition.
Wavelet decomposition technique decomposes a non-stationary vibration signal into linear form of time-scale units. It decomposes and organizes original signal into several signal components according to the translation of the wavelet function. This changes the scale and shows the transition of each frequency component [38]. Through this decomposition, a selected coefficient can be used directly as the fault features. In wavelet decomposition, the input signal is basically decomposed into two coefficients. The input signal that is passed through the low-pass filter becomes the 'approximation coefficient'. The input signal which is passed through the high-pass filter becomes the 'detail coefficient.' Since the low-frequency component is the most sensitive part to extract the useful information related to the bearing condition, the 'approximate coefficient' is passed to the next wavelet decomposition loop in multilevel wavelet decomposition process. Figure 14 shows three level wavelet decomposition structures. The structure illustrates the decomposition of slew bearing vibration signal into approximate coefficient (A) and detailed coefficient (D) at each level. For further processing, features such as mean, variance, skewness and kurtosis are calculated from the detailed coefficients of level 3 (D3) [39]. The features extraction results are presented in Figure 15, it can be seen that kurtosis can predict the damage compared to mean, variance and skewness calculated from

Wavelet Transform and Wavelet Decomposition
Wavelet transform is well-known as an appropriate method to analyze non-stationary and transient signal [14,36]. Wavelet transformation employs wavelet function instead of sinusoidal function as the basis function. It enhances a scale variable in addition to the time variable in the inner product transformation [33]. For any limited energy signal x(t) ∈ L 2 (R), the wavelet function is defined as [26] WT where wavelet ψ[(τ − t)/a] is derived by dilating and translating the wavelet basis ψ(t), a is the scale parameter (a > 0), t is the time shift and 1/ √ a is a normalization factor to maintain energy preservation.
A review of the wavelet transform applications in machine condition monitoring and fault diagnostics is presented by Peng et al. [37]. As a non-stationary signal analysis method, wavelet transform can be applied in machine fault diagnostics as a feature extraction method [37]. Since the wavelet transformations can provide an excellent energy concentration properties due to the uses of the wavelet basis function; the wavelet transform can represent the signal with a certain number of coefficients. The process is usually called wavelet decomposition.
Wavelet decomposition technique decomposes a non-stationary vibration signal into linear form of time-scale units. It decomposes and organizes original signal into several signal components according to the translation of the wavelet function. This changes the scale and shows the transition of each frequency component [38]. Through this decomposition, a selected coefficient can be used directly as the fault features. In wavelet decomposition, the input signal is basically decomposed into two coefficients. The input signal that is passed through the low-pass filter becomes the 'approximation coefficient'. The input signal which is passed through the high-pass filter becomes the 'detail coefficient.' Since the low-frequency component is the most sensitive part to extract the useful information related to the bearing condition, the 'approximate coefficient' is passed to the next wavelet decomposition loop in multilevel wavelet decomposition process. Figure 14 shows three level wavelet decomposition structures. The structure illustrates the decomposition of slew bearing vibration signal into approximate coefficient (A) and detailed coefficient (D) at each level. For further processing, features such as mean, variance, skewness and kurtosis are calculated from the detailed coefficients of level 3 (D3) [39]. The features extraction results are presented in Figure 15, it can be seen that kurtosis can predict the damage compared to mean, variance and skewness calculated from signal D3 of the wavelet decomposition due to the significant changes after day 90 of the bearing life time.
The application of wavelet decomposition to the transient stator current signal acquired from induction motor is presented by Niu et al. [40]. Statistical features have also been used to reveal the fault information from the selected coefficients. Another study that uses the wavelet coefficients as the fault features for ball bearing is presented by Liu et al. [41].
Machines 2017, 5, 21 16 of 28 signal D3 of the wavelet decomposition due to the significant changes after day 90 of the bearing life time.
The application of wavelet decomposition to the transient stator current signal acquired from induction motor is presented by Niu et al. [40]. Statistical features have also been used to reveal the fault information from the selected coefficients. Another study that uses the wavelet coefficients as the fault features for ball bearing is presented by Liu et al. [41].

Empirical Mode Decomposition-Based Hilbert Huang Transform
Empirical mode decomposition (EMD) [42] has been used in some applications that involve nonstationary signals (see, e.g., Braun and Feldman [43]). A review of EMD applications in fault diagnosis of rotating machinery is discussed in [44]. The review gives a detailed introduction to EMD application in various areas such as bearings, gears and rotors etc.
The EMD decomposes the vibration signal into several signals from high to low frequencies content, namely intrinsic mode functions (IMFs) based on the enveloping technique, i.e. Hilbert-Huang transform (HHT). From the specific frequencies content information, the bearing fault frequency can be identified. In some preliminary studies on condition monitoring [39,45], the original EMD [46] is used for non-stationary slew bearing data. A bearing fault such as outer race, inner race or rolling element fault has occurred when one of the IMF frequencies is identical to one of the bearing fault frequencies (as shown in Table 2) [45]. The EMD results for slew bearing data are presented in Table 3. A detail EMD result is presented in [39,45]. Table 2. Fault frequencies of slew bearing test-rig (run at 1 r/min).

Empirical Mode Decomposition-Based Hilbert Huang Transform
Empirical mode decomposition (EMD) [42] has been used in some applications that involve non-stationary signals (see, e.g., Braun and Feldman [43]). A review of EMD applications in fault diagnosis of rotating machinery is discussed in [44]. The review gives a detailed introduction to EMD application in various areas such as bearings, gears and rotors etc.
The EMD decomposes the vibration signal into several signals from high to low frequencies content, namely intrinsic mode functions (IMFs) based on the enveloping technique, i.e., Hilbert-Huang transform (HHT). From the specific frequencies content information, the bearing fault frequency can be identified. In some preliminary studies on condition monitoring [39,45], the original EMD [46] is used for non-stationary slew bearing data. A bearing fault such as outer race, inner race or rolling element fault has occurred when one of the IMF frequencies is identical to one of the bearing fault frequencies (as shown in Table 2) [45]. The EMD results for slew bearing data are presented in Table 3. A detail EMD result is presented in [39,45]. Table 2. Fault frequencies of slew bearing test-rig (run at 1 r/min).

Axial Radial
Outer raceway (BPFO) 1.32 0.55 Inner raceway (BPFI) 1.37 0.55 Rolling element (BSF) 0.43 0.54 Table 3. Decomposition result of EMD method for the slew bearing data. According to Table 3, the frequencies that correspond to the bearing fault condition did not appear on 24 February. In contrast, the fault frequency of rolling element (BSF axial) is identified on 3 May at 0.43 Hz. This frequency is associated to the roller defects as shown in Figure 16a. Interesting result was obtained from data on 30 August, where the BSF axial is hidden by BPFI axial of 1.37 Hz. The reason could be that the amplitude of signal 0.43 Hz is too weak and is covered by the background noise. The photo of the outer raceway damage is shown in Figure 16b. The roller and outer race damage photo was taken after final failure during the bearing dismantle on 2 September. The detailed vibration data used from the slew bearing test rig is presented in [39]. Statistical features from the decomposition results of EMD method is shown in Figure 17. The mean, variance, skewness and kurtosis features are calculated from the summation of the low IMF frequencies (from IMF 9 to the lowest IMF) of the EMD results. The selection of these IMFs is based on the characteristic of slew bearing signal being dominated by low frequency components as presented previously in STFT result.

Vibration Data
Machines 2017, 5, 21 18 of 28 According to Table 3, the frequencies that correspond to the bearing fault condition did not appear on 24 February. In contrast, the fault frequency of rolling element (BSF axial) is identified on 3 May at 0.43 Hz. This frequency is associated to the roller defects as shown in Figure 16a. Interesting result was obtained from data on 30 August, where the BSF axial is hidden by BPFI axial of 1.37 Hz. The reason could be that the amplitude of signal 0.43 Hz is too weak and is covered by the background noise. The photo of the outer raceway damage is shown in Figure 16b. The roller and outer race damage photo was taken after final failure during the bearing dismantle on 2 September. The detailed vibration data used from the slew bearing test rig is presented in [39]. Statistical features from the decomposition results of EMD method is shown in Figure 17. The mean, variance, skewness and kurtosis features are calculated from the summation of the low IMF frequencies (from IMF 9 to the lowest IMF) of the EMD results. The selection of these IMFs is based on the characteristic of slew bearing signal being dominated by low frequency components as presented previously in STFT result.  According to Table 3, the frequencies that correspond to the bearing fault condition did not appear on 24 February. In contrast, the fault frequency of rolling element (BSF axial) is identified on 3 May at 0.43 Hz. This frequency is associated to the roller defects as shown in Figure 16a. Interesting result was obtained from data on 30 August, where the BSF axial is hidden by BPFI axial of 1.37 Hz. The reason could be that the amplitude of signal 0.43 Hz is too weak and is covered by the background noise. The photo of the outer raceway damage is shown in Figure 16b. The roller and outer race damage photo was taken after final failure during the bearing dismantle on 2 September. The detailed vibration data used from the slew bearing test rig is presented in [39]. Statistical features from the decomposition results of EMD method is shown in Figure 17. The mean, variance, skewness and kurtosis features are calculated from the summation of the low IMF frequencies (from IMF 9 to the lowest IMF) of the EMD results. The selection of these IMFs is based on the characteristic of slew bearing signal being dominated by low frequency components as presented previously in STFT result.

Wigner-Ville Distribution (WVD)
The Wigner distribution (WD) is derived from the relationship between the power spectrum and the autocorrelation function for time-variant and non-stationary processes [47]. The correlation function is a product of the function at a posterior time with the function at a subsequent time [48]. The original WD for real signal x(t) is defined as [49] The symbol ( * ) in Equation (24) denotes the complex conjugate. When the WD is applied for the analytical signal x A (t), it is called the Wigner-Ville distribution (WVD). The analytical signal is defined as where x h (t) is the Hilbert transform of x(t): The WVD has been used for gear fault detection [47,48] and it is recently used for rolling element bearing to represent the time-frequency features of vibration signals [50].

Category 4: Phase-Space Dissimilarity Measurement
Phase-space dissimilarity measurements, such as fractal dimension, correlation dimension approximate entropy and largest Lyapunov exponent (LLE), are developed based on the chaos theory. These methods have been applied in biomedical engineering studies [19,51] and vibration signals [52,53]. Present studies have conducted investigations into these methods in the slew bearing condition monitoring for features extraction. The motivation is that multiple defects occur more frequently than single defects in slew bearings, therefore the phase-space dissimilarity measurements can be used to analyze the irregular or chaotic vibration signals. The phase-space can be computed as follows: where J is reconstruction delay that can be calculated by time lag/∆t, m is the embedding dimension and M is the number of reconstructed vectors. The relation between N, J, m and M can be defined as: Thus, the dimension of phase-space is a M × m matrix.

Fractal Dimension
Fractal dimension is a method to quantify the signal complexity by characterizing fractal patterns. The complexity can be quantified by examining the dynamical behaviours of phase-space matrix. The frequently used algorithm of fractal dimension was proposed by Higuchi [54]. The field is rapidly growing as estimated fractal dimensions for statistically self-similar phenomena. This method has been applied in many fields including image analysis [55], neuroscience [56,57], medicine [58], physics [59] and acoustics [60]. The application of fractal dimension in vibration signal of rolling element bearing has been presented in [10,61]. The algorithm of fractal dimension is shown in (28) [10]. Once the phase-space matrix, X is obtained (27), the mean absolute length between J th and (J th − 1) element of X is defined as follows: f or a = 1 : where J and m is reconstruction delay and embedding dimension, respectively. The fractal dimension D can be calculated using standard least-square fitting method of the logarithmic natural of (1/m) and L m .

Correlation Dimension
Correlation dimension can be used to quantify the self-similarity of vibration signal. A large value of correlation dimension corresponds to less-similarity of vibration signal. An example application of correlation dimension in vibration signal for bearing fault diagnosis was proposed by Logan and Mathew [52]. The correlation dimension was applied for vibration signals acquired from four different conditions: (1) normal (new bearing); (2) outer race fault; (3) inner race fault; and (4) roller fault. The damages were artificially introduced to the bearing parts. The method is able to identify four different bearing conditions.
The commonly used procedure to calculate the correlation dimension was introduced by Grassberger and Procaccia [62]. Similar to fractal dimension, the phase-space matrix (27) is fed to the correlation dimension algorithm as an input matrix. The formula for correlation dimensionis given by: where X i and X j are the position vectors on attractor, l is the distance under consideration, Θ(x) is the Heaviside step function, Θ(x) = 0 if X ≤ 0, Θ(x) = 1 if X > 0, k is the summation offset, M is the number of reconstructed vectors from the original vibration signal and C(l) is the correlation dimension.

Approximate Entropy
The approximate entropy quantifies the degree of regularity in the vibration signal. The smaller the value of the approximate entropy, the more the regularity of the behavior of the signal is. An application example of approximate entropy in vibration signal was proposed by Yan et al. [63] for high speed rolling element bearing. Moreover, Yan et al. [63] mentioned that the deterioration of bearing condition corresponds to the increase of number of frequency components. In addition, when the bearing starts to deteriorate the regularity will decrease and the corresponding approximate entropy value will increase. Similar to fractal dimension and correlation dimension, approximate entropy also uses phase-space matrix, X in (27) as an input. The initial step analysis of the approximate entropy algorithm is to measure the distance between two vectors X(i) and X(j) which can be defined as the maximum difference in their respective corresponding elements [63]: where i = 1, 2, . . . , N − m + 1, j = 1, 2, . . . , N − m + 1 and N is the number of data points in the vibration signal. A calculation that describes the similarity between the vector X(i) and all other vectors X(j), where j = i can be constructed as the Heaviside step function, Θ(x) is similar to the symbol in correlation dimension where Θ(x) = 0 if X ≤ 0 and Θ(x) = 1 if X > 0. The symbol r in (31) denotes a predetermined tolerance value, defined as where std(Y) refers to the standard deviation of vibration signal, Y and k is a constant (k > 0). By defining the approximate entropy value of time series can be computed as

Largest Lyapunov Exponent
The largest Lyapunov exponent (LLE) algorithm is an established method and has been used in some areas such as biomedical signal processing analysis especially for electroencephalography (EEG) signal [19]. The LLE algorithm calculates the exponential divergence (either positive or negative exponential) of two initial neighboring trajectories in a phase-space matrix. In other words, the LLE algorithm quantifies the degree of chaos of vibration signal in certain time. The degree of chaos corresponds to any local instability vibration signal due to the dynamic contact between rolling elements and defect spots during the data acquisition. The detailed discussion and the application of LLE in slew bearing vibration data is presented in [64].
The features extraction result for correlation dimension, fractal dimension and approximate entropy from slew bearing vibration data is presented in Figure 18. It can be seen that the approximate entropy shows better result in presenting the degradation condition of the slew bearing than the correlation dimension and fractal dimension. A fluctuation in the approximate entropy in the last measurement (after the 90th day) indicates that the slew bearing condition has changed.

Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (KS) test is a nonparametric test that can be used to compare or to measure the similarity of two cumulative distribution functions (CDFs), T( ) x and R( ) x [65]. T( ) x is a target CDF and R( ) x is a reference CDF. Because the signature of bearing condition can be identified by its CDF, therefore the KS test can be applied in bearing condition monitoring to distinguish bearing health signal from faulty signal. In order to distinguish the two CDFs, T( ) x and R( ) x , a statistical distance D is calculated. The distance D can be defined as the maximum absolute distance between T( ) x and R( ) x . This is defined as follows The KS feature of slew bearing vibration data is presented in Figure 19. The '0' represents normal condition and '1' represent abnormal condition. It can be seen that at the beginning of the measurement, the condition of bearing was still normal. Then it changed to abnormal condition when bearing had run for a couple of days. This result needs further investigation as the abnormal detection happened too early.

Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (KS) test is a nonparametric test that can be used to compare or to measure the similarity of two cumulative distribution functions (CDFs), T(x) and R(x) [65]. T(x) is a target CDF and R(x) is a reference CDF. Because the signature of bearing condition can be identified by its CDF, therefore the KS test can be applied in bearing condition monitoring to distinguish bearing health signal from faulty signal. In order to distinguish the two CDFs, T(x) and R(x), a statistical distance D is calculated. The distance D can be defined as the maximum absolute distance between T(x) and R(x). This is defined as follows The KS feature of slew bearing vibration data is presented in Figure 19. The '0' represents normal condition and '1' represent abnormal condition. It can be seen that at the beginning of the measurement, the condition of bearing was still normal. Then it changed to abnormal condition when bearing had run for a couple of days. This result needs further investigation as the abnormal detection happened too early.

Sample Entropy
The sample entropy has been proposed by Richman and Moorman [66] to improve the approximate entropy. Chen et al. [67] presents an application of sample entropy for features extraction of gearbox vibration signals. However, the application of the sample entropy for bearing vibration signal is still limited [68]. The sample entropy feature for slew bearing vibration data is presented in Figure 20. It can be observed from Figure 20 that the sample entropy feature is not effective for slew bearing damage detection.

Sample Entropy
The sample entropy has been proposed by Richman and Moorman [66] to improve the approximate entropy. Chen et al. [67] presents an application of sample entropy for features extraction of gearbox vibration signals. However, the application of the sample entropy for bearing vibration signal is still limited [68]. The sample entropy feature for slew bearing vibration data is presented in Figure 20. It can be observed from Figure 20 that the sample entropy feature is not effective for slew bearing damage detection.

Sample Entropy
The sample entropy has been proposed by Richman and Moorman [66] to improve the approximate entropy. Chen et al. [67] presents an application of sample entropy for features extraction of gearbox vibration signals. However, the application of the sample entropy for bearing vibration signal is still limited [68]. The sample entropy feature for slew bearing vibration data is presented in Figure 20. It can be observed from Figure 20 that the sample entropy feature is not effective for slew bearing damage detection.   The singular value decomposition (SVD) is usually applied to quantify the periodicity of the time series data [69]. A comparison study [70] has been shown that SVD performs the Fourier decomposition in terms of information content and robustness. Formula of SVD can be found in [71].
3.6.2. Piecewise Aggregate Approximation (PAA) and Adaptive Piecewise Constant Approximation (APCA) In online bearing condition monitoring cases, the changes of bearing condition can be effectively identified by measuring the similarity between two vibration signals (i.e., normal or reference and monitored signal) using Euclidean distance. However, the size of the measured data is a problem in Euclidean distance method. The similarity search methods are techniques that perform dimensionality data reduction methods such as piecewise aggregate approximation (PAA) [72,73] and adaptive piecewise constant approximation (APCA) [72]. The APCA [74], is another approach of PAA that allows arbitrary length segments. Either PAA or APCA have particular advantages; for example, PAA has twice as many approximating segments, and APCA is able to separate a single segment in an area of low activity and many segments in areas of high activity [74]. The PAA has been applied in previous study [39] for data pre-processing stage of circular domain features extraction. The detailed discussion and the application of PAA in slew bearing vibration data is presented in [39].

Conclusions
Features extraction methods in wide range application such as mechanical engineering (i.e., for bearing vibration signals), biomedical engineering area (i.e., for EEG and ECG signals) and finance area (i.e., for time series signals) have been presented. According to the certain characteristic of existing feature extraction methods, the methods in this paper were classified into six categories: (1) time-domain features extraction; (2) frequency-domain features extraction; (3) time frequency representation; (4) phase-space dissimilarity measure; (5) complexity measure; and (6) other features. Some of features extraction methods in each category have been empirically tested using actual normal to failure slew bearing vibration signal. It was shown that some features do not show bearing degradation trend clearly and the others are shown to be potential slew bearing condition parameters. The potential features are: impulse factor, margin factor, mathematical morphology operators, wavelet decomposition-kurtosis, approximate entropy and largest Lyapunov exponent.
Since slew bearing operating at very low speed with highly applied load, the vibration signals exhibit non-linear, non-stationary and chaotic behaviour. The methods such as impulse factor and margin factor measured the statistical properties of vibration signal and therefore could revealed the information related to the impulse level. Other potential methods based on the phase-space dissimilarity measurement such as approximate entropy and LLE algorithm can be used to extract pertinent bearing signal corresponding to the bearing damage progression. This is because the methods analyzed the vibration signal in certain time-frame to extract the useful information from the non-linear and non-stationary behaviour.