Evaluation of Time and Frequency Condition Indicators from Vibration Signals for Crack Detection in Railway Axles

Abstract: Railway safety is a matter of importance, as a single failure can involve risks associated with economic and human losses. Early fault detection in railway axles and other railway parts is a broad field of research that is currently under study. In the present work, the problem of early crack detection in railway axles is addressed through condition-based monitoring, by evaluating several condition indicators of vibration signals in the time and frequency domains. To achieve this goal, we applied two different approaches: in the first approach, we evaluate only the vibration signals captured by accelerometers placed along the longitudinal direction and, in the second approach, a data fusion technique at the condition indicator level was applied, evaluating six accelerometers by merging the condition indicators according to the sensor placement. In both cases, a total of 54 condition indicators per vibration signal were calculated, and the best features were selected by applying the Mean Decrease Accuracy method of Random Forest. Finally, we tested the best indicators with a K-Nearest Neighbor classifier. For the data collection, a real bogie test bench was used to simulate crack faults on the railway axles, and vibration signals from both the left and right sides of the axle were measured. The results not only show the performance of condition indicators in different domains, but also show that fused condition indicators work well together to detect a crack fault in railway axles.


Introduction
As railway transportation grows rapidly worldwide, railway safety has become a subject of high interest in the research field. Railway axles are among the most critical elements in railway transportation systems, and failures such as a cracked axle can lead to derailment and, potentially, to human and economic losses. Therefore, the early detection of faults in railway axles is crucial for railway safety [1,2].
The ultrasound technique can be used to perform condition monitoring of railway axles; however, its disadvantage is that it does not provide continuous information between tests, so it cannot detect fast-growing faults [3]. Other techniques that involve variables such as temperature [4], acoustics [5], and acoustic emission [1] have been used for the continuous monitoring of faults in railway axles. However, vibration signal monitoring has become the most common monitoring technique due to its high reliability. Fault diagnosis based on vibration signals enables early fault detection and online condition monitoring and, when combined with different signal processing methods and artificial intelligence, yields better diagnostic results [6-8].
Fault diagnosis based on vibration signals with a data-driven approach is generally accomplished in four phases: (a) acquisition and conditioning of the vibration signal, (b) extraction of features, also called condition indicators, (c) selection of features, and (d) classification. After acquisition and conditioning of the vibration signal, the feature extraction phase studies changes in the signal behavior that can indicate a fault condition. The study of these changes can focus on the time, frequency, or time-frequency domains. From these domains, condition indicators (CIs) or features can be extracted, which allow different faults to be monitored or detected [9]. The evaluation of CIs has been widely studied and has led to good results in the detection of faults in railway axles. In the time-frequency domain, the energy calculated by means of the Wavelet Packet Transform (WPT) has been used, allowing crack detection with excellent results [10,11]. The above-mentioned works only measured vibrations in the railway axle and bearings in isolation, without the bogie.
The work developed by Gómez et al. [7], which addresses the same case study as the present work, used real railway axles installed in a real Y21 bogie, where the vibration signals from six accelerometers were processed by means of the WPT energy. The feature selection stage was carried out by visual analysis: the energy packets whose values increased with the depth of the crack were selected, and the selected packets varied with the change of speed. In the classification stage, a radial-basis-function artificial neural network was used with 32 inputs corresponding to the selected energy packets and the load and speed values; the two possible outputs of the network were the healthy and cracked conditions. That work highlighted that the six accelerometers provide important information for detection and that better results are achieved at certain speeds. A recent work on this same case, presented by Lucero et al. [12], evaluated the signals from the six accelerometers: thirty features were extracted from the time-domain signal, feature selection was then applied, and finally the classification was implemented through a random forest classifier. The best classification accuracy was found with ten features extracted from vibration signals measured with the accelerometers located in the longitudinal direction. The results of that work also show that features such as Willison Amplitude (WAMP), Wave Length (WL), Zero Crossing (ZC), Slope Sign Change (SSC), mean, Energy Operator (EO), and skewness are well suited to fault classification. On the other hand, vibration signals by nature exhibit random behavior in a wide range of applications. To reveal the strengths of different signal domains, the Fast Fourier Transform can be used to switch from the time domain to the frequency domain, and changes have been observed in the vibration signature of railway elements such as axle boxes [13].
Moreover, to understand how the strength of a signal is distributed in the frequency domain, we can use the power spectral density (PSD), which describes the power of the signal per unit frequency as a function of frequency. Therefore, the PSD can be used to infer normal operation or fault conditions of railway vehicles [14,15].
Through an analysis of CIs, it is possible to obtain adequate information for understanding and interpreting the machinery condition, such as limit values of certain indicators that establish whether the machinery is in a normal or abnormal condition. This supports the diagnosis process and maintenance decision-making [16,17]. In addition, the diagnosis obtained from the analysis of CIs can be improved by monitoring the machinery with several sensors, because this allows data fusion to be performed [18,19]. The extraction of indicators in the time and frequency domains requires a lower computational cost than the calculation of indicators in the time-frequency domain [20].
Data fusion refers to the use of techniques that combine data from multiple sensors, of either the same or different types. By using data fusion in data-driven approaches, a more reliable and realistic inference, deduction, or discrimination can be made than with data from isolated sources [21].
The goal of this work is the evaluation of the performance of condition indicators in time and frequency domains for crack detection in railway axles through the use of vibration signals. Two approaches are proposed: (1) the first one evaluates the indicators extracted from vibration signal of two accelerometers, in time domain, frequency domain, and their combination, and (2) the second approach evaluates the data fusion of the indicators of the six accelerometers. The rest of this paper is organized as follows. Section 2 presents the condition indicators used in this work, the selection, and classification methods. Section 3 contains the experimental set-up and data acquisition. In Section 4, the proposed methodology for evaluating the condition indicators performance is detailed. Then, Section 5 shows the results and discussion, and finally in Section 6, the conclusions are addressed.

Condition Indicators
The analysis of signals can be done with several techniques, and the use of signals in different domains can enrich the information obtained from a signal, leading to a better understanding of its nature. To quantify signal properties, we can use mathematical or statistics-based values, or condition indicators, to measure the different signal characteristics; this can help reveal the information hidden inside the signal. These condition indicators (CIs) are often called features. In the present work, the feature extraction phase uses a combination of different features computed from the signal in both the time and frequency domains.
In the time domain, we used 30 statistical indicators resulting from a mixture of common features mainly used in fault diagnosis and features coming from the Electromyography (EMG) field, as described in [9]. Additionally, we used frequency-based CIs because the frequency domain can reveal valuable information about changes in the condition of the monitored system in a different form.
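As an illustration of the EMG-derived time-domain indicators named in the Introduction (ZC, WL, SSC, and WAMP), the following minimal sketch uses definitions that are common in the EMG literature. The exact definitions and thresholds used in [9] are not given in this text, so the function names, the strict inequalities, and the `threshold` parameters are assumptions.

```python
from typing import Sequence


def zero_crossings(x: Sequence[float]) -> int:
    """ZC: number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def waveform_length(x: Sequence[float]) -> float:
    """WL: cumulative length of the waveform, sum of |x[i+1] - x[i]|."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def slope_sign_changes(x: Sequence[float], threshold: float = 0.0) -> int:
    """SSC: number of samples where the slope changes sign.

    A change is counted when (x[i] - x[i-1]) * (x[i] - x[i+1]) exceeds
    `threshold` (assumed parameter; 0 counts every local extremum).
    """
    return sum(
        1
        for prev, cur, nxt in zip(x, x[1:], x[2:])
        if (cur - prev) * (cur - nxt) > threshold
    )


def wamp(x: Sequence[float], threshold: float) -> int:
    """WAMP: number of consecutive-sample differences larger than `threshold`."""
    return sum(1 for a, b in zip(x, x[1:]) if abs(b - a) > threshold)
```

Each function takes one vibration record as a plain sequence of samples and returns a single scalar, so one record yields one row of the indicator matrix.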
The frequency spectrum X(k) of a discrete-time signal x(i) can be computed by using the Fast Fourier Transform (FFT). The spectral analysis shows all the harmonic components of a signal, leading to a better understanding of the behavior of the underlying phenomenon. Another common approach to search for signal characteristics in the frequency domain is the Power Spectral Density (PSD). The power spectrum of a signal, P(k), is given by

$$P(k) = \frac{1}{K}\,|X(k)|^{2},$$

where X(k) is the previously obtained frequency spectrum and K denotes the number of points in the power spectrum. The PSD measures the average power of a signal as a function of frequency, and it also shows periodicities [22,23]. Knowledge of how the power is distributed among the frequency components of a signal is also useful to understand the nature of the signal.
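The relation between the frequency spectrum and the power spectrum can be sketched as follows. This is a minimal illustration, not the implementation used in this work: a naive O(N²) discrete Fourier transform stands in for the FFT (both give the same X(k)), and the periodogram-style normalization |X(k)|²/K is an assumption. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform; an FFT returns the same values faster."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]


def power_spectrum(x: Sequence[float]) -> List[float]:
    """Power spectrum P(k) = |X(k)|^2 / K (assumed normalization)."""
    spectrum = dft(x)
    k_points = len(spectrum)
    return [abs(value) ** 2 / k_points for value in spectrum]
```

With this normalization, the sum of P(k) over all bins equals the sum of the squared samples (Parseval's relation), which is a convenient sanity check.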
In the frequency domain, we have used 24 condition indicators: 15 of them computed over the frequency spectrum and nine over the power spectrum. These 24 condition indicators are presented in Table 1; here, $f_k$ is the frequency value of the spectrum in the corresponding frequency bin $k$, whereas $K$ denotes the total number of samples in the frequency and power spectrum.
Table 1. Condition indicators in the frequency domain (recoverable entries): kurtosis of spectrum; entropy of spectrum, $F_{15} = -\sum_{k=1}^{K-1} P_n(k)\,\log_2[P_n(k)]$, where $P_n(k)$ is the normalized spectral energy obtained from $X(k)$.
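The entropy of the spectrum can be sketched as below. The definition of the normalized energy is truncated in this text, so dividing each magnitude by the total over bins k = 1, ..., K−1 is an assumption, as is the function name.

```python
import math
from typing import Sequence


def spectral_entropy(spectrum: Sequence[complex]) -> float:
    """Entropy of spectrum: -sum_{k=1}^{K-1} P_n(k) * log2(P_n(k)).

    P_n(k) is taken here as |X(k)| divided by the sum of |X(k)| over
    k = 1..K-1 (assumed normalization). Empty bins contribute zero.
    """
    magnitudes = [abs(value) for value in spectrum[1:]]  # skip bin k = 0
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return -sum(
        (m / total) * math.log2(m / total) for m in magnitudes if m > 0
    )
```

A spectrum concentrated in a single bin gives an entropy of zero, whereas a flat spectrum over N bins gives log2(N), so the indicator grows as the spectral content spreads.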

Random Forest for Feature Selection
Random Forest (RF) is a machine learning algorithm designed by Breiman in 2001 [24] for classification and regression. RF uses multiple decision trees to classify a sample; each decision tree is built by using bootstrap sampling with a random feature selection implementation. Their predictions are used in a voting system where, from all trees, a majority class is calculated. The out-of-bag (OOB) error is mostly used to compute the expected model generalization performance [25].
In Random Forest, two methods defined by Breiman can be used for feature selection through a feature ranking (feature importance): Mean Decrease Impurity (MDI) and Mean Decrease Accuracy (MDA). In this work, MDA is used for feature selection: the method permutes the values of a feature over the out-of-bag (OOB) observations, and if the feature does not influence the model, the prediction accuracy should not decrease.

Feature Selection by MDA
Given a data set $D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ of $n$ samples and $p$ independent variables, the MDA of the variable $X^{(j)}$ is calculated by averaging, over all trees, the difference in OOB error before and after permuting the values of $X^{(j)}$ [25]. Denote by $D_{l,n}$ the out-of-bag data set of the $l$-th tree and by $D^{j}_{l,n}$ the same data set where the values of $X^{(j)}$ have been randomly permuted. Keeping in mind that $m_n(\cdot; \Theta_l)$ represents the $l$-th tree estimate, where $\Theta_1, \ldots, \Theta_M$ are independent random variables used to resample the training set before the growth of individual trees, MDA takes the form

$$\mathrm{MDA}\big(X^{(j)}\big) = \frac{1}{M} \sum_{l=1}^{M} \Big[ R_n\big(m_n(\cdot; \Theta_l), D^{j}_{l,n}\big) - R_n\big(m_n(\cdot; \Theta_l), D_{l,n}\big) \Big],$$

where $R_n\big(m_n(\cdot; \Theta_l), D\big)$ denotes the error of the $l$-th tree evaluated on the data set $D$.
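The permutation idea behind MDA can be illustrated with the sketch below. It is a simplification and not the procedure used in this work: instead of averaging per-tree errors over the OOB samples of a Random Forest, it measures the drop in accuracy of any fitted classifier on a held-out set when one feature column is shuffled. The names `predict`, `n_repeats`, and `seed` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable, X: Sequence[Sequence[float]], y: Sequence) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    predict: Callable,
    X: Sequence[Sequence[float]],
    y: Sequence,
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean decrease in accuracy per feature after shuffling that feature.

    A feature the model does not rely on yields a decrease close to zero.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [
                list(row[:j]) + [value] + list(row[j + 1:])
                for row, value in zip(X, column)
            ]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Sorting the features by the returned values in decreasing order gives a ranking analogous to the one used here to keep the ten best CIs.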

k-Nearest Neighbor Classifier
The K-Nearest Neighbor classifier (KNN) is a popular algorithm directly based on the training samples and commonly used in pattern classification [26].
To classify an unknown sample, two steps are followed. First, KNN calculates the distance between the unknown point $q$ and the points $x_i$ in the training data, according to a distance metric $d(q, x_i)$. Generally, $d(\cdot, \cdot)$ can be the Euclidean, Manhattan, Minkowski, Cosine, Chebyshev, Mahalanobis, Standardized Euclidean, Hassanat, or Chi-Square distance [27,28], just to mention a few distance metrics. Second, the k nearest neighbors are used to determine the class of $q$. The rule used in this work to classify a new sample was simple voting: the new observation is assigned to the class of the majority of its k nearest points [29].
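The two steps above can be sketched as follows, with simple majority voting and a pluggable distance function. This is a minimal illustration with hypothetical function names, not the implementation used in this work; ties are resolved in favor of the class met first among the nearest neighbors, and the cosine distance assumes non-zero vectors.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine similarity (both vectors must be non-zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(
    q: Sequence[float],
    X_train: Sequence[Sequence[float]],
    y_train: Sequence,
    k: int = 3,
    distance: Callable = euclidean,
):
    """Step 1: rank training points by distance to q. Step 2: majority vote."""
    nearest = sorted(
        zip(X_train, y_train), key=lambda pair: distance(q, pair[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Passing `distance=cosine_distance` switches the metric without changing the voting rule.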

Bogie Test Bench
The bogie test bench was designed and manufactured by Danobat Railway Systems. It allows different faults to be simulated in the elements of the bogie. Figure 1 shows the main parts of the bogie, which is composed of the fixed wheelset (1) resting on the structure anchored to the floor (3). The fixed wheels are connected by the fixed shaft (2), whereas the rotating wheels (6) are connected by the rotary axle (7), which is where the faults are simulated. The rotating wheels are driven by the rollers (10), and the speed is controlled by the roll driver (4), which is operated manually. The load is simulated by two hydraulic cylinders (9) and transmitted through a chain that pushes a beam (12) against the bogie structure (5).
Three accelerometers are located on the left side (11) and three on the right side (8), to measure acceleration in all three directions on each side.

Signal Acquisition and Experimental Conditions
A pair of bearings is included inside each axle box, supporting the rotating wheels. Three uni-axial accelerometers were placed at each axle box of the wheelset, as indicated in Figure 2a: three on the left side (LS) and three on the right side (RS), oriented in three directions. These are the left vertical (LV), left longitudinal (LL), and left axial (LA) accelerometers, and the right vertical (RV), right longitudinal (RL), and right axial (RA) accelerometers, as presented in Figure 2b. The accelerometer model is CMSS-RAIL-9100, with a sensitivity of 100 mV/g, a frequency range of 0.52 Hz to 8 kHz, and a resonance frequency of 25 kHz. The accelerometers are coupled to the SKF Multilog IMX-R conditioner system, which was connected to a computer running the SKF @ptitude Observer software. This conditioner system is for industrial use and reduces the possibility of the signals having noise or interference. The sampling frequency was 12.8 kHz, and each signal was recorded for 1.2 s. The experimental conditions for load, speed, rotation direction, and crack depth in the railway axle (see Figure 2) are presented in Table 2 for the four experimental conditions; at least 60 samples were acquired for each condition. We named the data obtained from the accelerometers on each side the right-hand accelerometer data set (RD) and the left-hand accelerometer data set (LD). Crack faults were artificially generated by an abrasive grinding process. Further details of the experimental conditions can be found in [7].
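The acquisition settings above fix the length of each record and the frequency axis of its spectrum. The short sketch below works out these quantities from the stated sampling frequency and duration; the function and key names are hypothetical.

```python
def acquisition_summary(fs_hz: float, duration_s: float) -> dict:
    """Derive record length and spectral limits from the sampling settings."""
    return {
        "n_samples": round(fs_hz * duration_s),  # samples per record
        "nyquist_hz": fs_hz / 2,                 # highest resolvable frequency
        "resolution_hz": 1 / duration_s,         # spacing between frequency bins
    }


summary = acquisition_summary(fs_hz=12_800, duration_s=1.2)
```

For 12.8 kHz and 1.2 s, each record holds 15,360 samples, the spectrum extends to 6.4 kHz, and the bin spacing is about 0.83 Hz. The 6.4 kHz limit lies below the 8 kHz upper end of the accelerometer range, so the sampling rate, not the sensor, bounds the analyzed band.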

The Proposed Approaches
This work proposes evaluating CIs for crack detection in railway axles. Two approaches are developed: the first approach evaluates the CIs of two accelerometers in the time domain, in the frequency domain, and in their combination; the second approach evaluates the fusion of the CIs of all six accelerometers.

Proposed Approach 1
This approach evaluates condition indicators extracted from the vibration signal in the time domain, the frequency domain, and a combination of both, called the time + frequency domain, for crack detection in railway axles. In this approach, we analyze the two accelerometers located along the longitudinal direction: one on the right side (RL) and another on the left side (LL). According to the work developed by Lucero et al. [12], these accelerometers and their orientation offer the best information. Figure 3a shows the workflow of the first approach. It has four stages: (1) data acquisition, (2) feature extraction, (3) feature selection, and (4) classification.
3. Feature selection: this process starts by removing correlated features. A correlation coefficient between 0.8 and 1 between two CIs suggests that the two features are highly correlated [30]. In this study, a threshold of 0.95 (95%) was selected: two features with a correlation equal to or greater than this value were considered highly correlated, in all data sets. This threshold was fixed after it was found to give good classification performance. Next, a normalization process was applied by scaling the features between −1 and 1. Normalization can be applied to normally distributed data as well as to data with other distributions. This process helps KNN give equal importance to all CIs. After the preprocessing step, a Random Forest model was built with 40 trees as the main parameter. The MDA metric was used to rank the CIs, and the ten best-ranked features were selected from each data set.
4. Classification: each data set is organized in a matrix with samples in the rows and CIs in the columns. Each pre-selected data set (built from the 10 ranked CIs of stage 3) is classified using KNN with the cosine distance metric; k = 3 was chosen as the main parameter after testing different numbers of neighbors.
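The preprocessing in the feature selection stage (removing highly correlated CIs, then scaling to the range −1 to 1) can be sketched as follows. The text does not state which member of a correlated pair is discarded, nor whether the sign of the correlation is ignored; keeping the first CI encountered and using the absolute value are assumptions, as are the function names.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either column is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    dev_a = math.sqrt(sum((p - mean_a) ** 2 for p in a))
    dev_b = math.sqrt(sum((q - mean_b) ** 2 for q in b))
    return cov / (dev_a * dev_b) if dev_a and dev_b else 0.0


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.95) -> List[str]:
    """Keep a CI only if |r| < threshold against every CI already kept."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def scale_to_unit_range(values: Sequence[float]) -> List[float]:
    """Min-max scale one CI column to the interval [-1, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)  # constant column carries no information
    return [2 * (v - low) / (high - low) - 1 for v in values]
```

Scaling matters for a distance-based classifier such as KNN because an unscaled CI with a large numeric range would otherwise dominate the distance.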
Usually, when using k-fold cross-validation, 5 or 10 folds are chosen due to the good results obtained empirically [31]. In our work, the best results were obtained with 5 folds. Therefore, a five-fold cross-validation strategy was carried out; the average accuracy and standard deviation (std) of the cross-validation process were calculated from the 5 runs.
Finally, several classifications were performed: the first uses only the first-ranked feature, the next uses the first two features, and so on, until the first ten features are used. The purpose is to analyze the contribution of each ranked feature.
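The incremental evaluation described above, in which the classifier is cross-validated on the first feature, then the first two, and so on, can be sketched as follows. This is an illustrative simplification: it uses the Euclidean distance for brevity, whereas this work uses the cosine distance, and it reports the population standard deviation over folds, which is an assumption. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def knn_predict(q, X, y, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(q, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kfold_indices(n, n_folds=5, seed=0):
    """Shuffle the sample indices and deal them into n_folds test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cv_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Mean and std of the accuracy over the cross-validation folds."""
    scores = []
    for test_idx in kfold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        X_train = [X[i] for i in range(len(X)) if i not in held_out]
        y_train = [y[i] for i in range(len(X)) if i not in held_out]
        hits = sum(knn_predict(X[i], X_train, y_train, k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores), pstdev(scores)


def incremental_evaluation(X, y, ranked, **cv_options):
    """Evaluate the first 1, 2, ..., len(ranked) ranked feature columns."""
    results = []
    for m in range(1, len(ranked) + 1):
        columns = ranked[:m]
        X_subset = [[row[j] for j in columns] for row in X]
        results.append((m, *cv_accuracy(X_subset, y, **cv_options)))
    return results
```

Here `ranked` is the list of column indices ordered by importance, and each returned tuple holds the number of features, the mean accuracy, and the std, which is the information plotted as accuracy against the number of CIs.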


Proposed Approach 2
The second approach evaluates the signals from the six accelerometers using data fusion at the indicator level. Figure 3b shows the workflow of this approach. It has five stages, the first of which is data acquisition.

Results of Approach 1
Tables 3-8 show the results of approach 1. The top 10 CIs and their averaged accuracies for each data set are shown in Tables 3, 5 and 7. Tables 4, 6 and 8 show the accuracies per class for CI_TL, CI_FL and CI_TFL, respectively.
In Table 3, the best accuracy, 92.89%, was obtained by using CI_TL with 7 CIs. For CI_TR, a maximum accuracy of 95.17% was obtained with 8 CIs. Per-class accuracy values for CI_TL are presented in Table 4. The best performance with 10 attributes shows that class 1 has the highest classification accuracy, over 98%, followed by class 2 with an accuracy over 93%. Classes 3 and 4 have similar classification accuracies, over 86%. Class 3 does not reach an accuracy over 89% in any case. Class 4 can reach an accuracy over 92%, in contrast to classes 1 and 2. Regarding the top 10 time-domain CIs presented in Table 3, nine of the 10 are common to both data sets: zero crossing, wave length, SSC, WAMP, kurtosis, energy operator, CPT3, shape factor, and skewness.
In Table 5, the best accuracy, 91.29%, was obtained using CI_FL with 7 CIs. For CI_FR, a maximum accuracy of 94.36% was obtained with 9 CIs. Per-class accuracy values for CI_FL are presented in Table 6. The best performance with 10 attributes shows that class 4 has the highest classification accuracy, over 98%, followed by class 2 with an accuracy over 89%. Classes 1 and 3 have similar classification accuracies, over 85%. Classes 1, 2 and 3 do not reach an accuracy over 89% in any case. Regarding the top 10 frequency-domain CIs presented in Table 5, seven of the 10 are common to both data sets: PKF, CP2, FR, skewnessf, centroid of spectrum, VCF, and spectrum spread.
In Table 7, the best accuracy, 96.62% with a low std of 0.59, was obtained using CI_TFL with 10 CIs. For CI_TFR, a maximum accuracy of 97.14% was obtained, also with 10 CIs. Per-class values for CI_TFL are presented in Table 8. The best performance with 10 attributes shows that classes 1 and 4 have similarly high classification accuracies, over 97%. Classes 2 and 3 also have similar classification accuracies, over 95%. From seven attributes onward, all the classes reach a classification accuracy over 90%. Regarding the top 10 time + frequency domain CIs presented in Table 7, six of the 10 are common to both data sets; four belong to the frequency domain (PKF, CP2, FR, and SM4) and two to the time domain (zero crossing and SSC). This result shows that the use of combined features from the time and frequency domains can improve the classification accuracy for each class. To evaluate the CIs in the time, frequency, and time + frequency domains, we compare the accuracy of each ranked feature set. Figure 5 shows the accuracy results for each data set. Here, the X-axis represents the number k of most important CIs selected by RF, and the Y-axis represents the average accuracy obtained with the first k CIs.
To illustrate the importance of feature selection, a scatter plot obtained from the best case, related to the CI_FL set, is presented in Figure 6, where class 1 is shown in red, class 2 in green, class 3 in cyan, and class 4 in violet. This set provides high accuracy and low std using 3 attributes. Small clusters can be observed for each of the classes at different locations in the three-dimensional space, although no perfect boundaries are exhibited. However, these three attributes can provide an accuracy over 80%, according to Figure 5.

Results of Approach 2
The top 10 CIs and their accuracies obtained with the data fusion approach are presented in Tables 9 and 10 for all three data sets: the fused CIs from the left-side accelerometers (LV, LA, and LL), named CI_DFL, and the fused CIs from the right-side accelerometers (RV, RA, and RL), named CI_DFR, are presented in Table 9; the fused CIs from all six accelerometers, named CI_DFRL, are presented in Table 10. Figure 5 shows the accuracy results for each data set. Here, the X-axis represents the number k of most important CIs selected by RF, and the Y-axis represents the average accuracy obtained with the first k CIs. Regarding the top 10 CIs presented in Tables 9 and 10, five of the 10 are common: four belong to the frequency domain (PKF, FSK, VCF, and skewnessf) and one to the time domain (SSC).
The highest accuracy (acc) using 10 condition indicators was 97.56% with a std of 0.88 for CI_DFL, 98.01% with a std of 0.88 for CI_DFR, and 98.37% with a std of 0.76 for CI_DFRL. The per-class accuracy of the latter is presented in Table 11, where the class with the highest accuracy was class 4, which reached 100% accuracy with 9 CIs.
To illustrate the importance of feature selection and data fusion in approach 2, a scatter plot obtained from the best case, related to the CI_DFRL set, is presented in Figure 7. This set provides the highest accuracy (98.37%) and lowest std (0.76) with 10 CIs; the scatter plot uses its first 3 CIs. Better-defined clusters can be observed for each class than in Figure 6. The accuracy using these 3 CIs was 88.24%. These features were MDF-Freq-LV, VCF-Freq-RL, and CP2-Freq-RA.
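Fusion at the indicator level amounts to concatenating the CIs computed for each accelerometer into a single feature vector per sample, with each CI tagged by its sensor, as in the names MDF-Freq-LV or CP2-Freq-RA above. The following minimal sketch shows one way to do this; the function name, the input layout, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def fuse_indicators(per_sensor: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[float]]:
    """Concatenate per-sensor CIs into one named feature vector.

    per_sensor maps a sensor label (e.g. "LV") to its CIs, each keyed by
    "<indicator>-<domain>" (e.g. "MDF-Freq"). The fused name appends the
    sensor label: "MDF-Freq-LV".
    """
    names: List[str] = []
    values: List[float] = []
    for sensor, indicators in per_sensor.items():
        for ci_name, value in indicators.items():
            names.append(f"{ci_name}-{sensor}")
            values.append(value)
    return names, values


# Illustrative values only; they are not measurements from this work.
names, vector = fuse_indicators(
    {"LV": {"MDF-Freq": 1.0, "ZC-Time": 12.0}, "RL": {"VCF-Freq": 0.5}}
)
```

Restricting the input to the three left-side or three right-side sensors gives vectors analogous to CI_DFL and CI_DFR, and passing all six gives a vector analogous to CI_DFRL; feature selection then operates on the fused columns.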

Discussion
The proposed approaches identify the most effective CIs for crack detection. The results in Tables 3, 5, 7, 9 and 10 show that approach 1 achieves an accuracy of more than 91% from the seventh CI onward for all data sets, while approach 2 achieves an accuracy of more than 93% from the fifth CI onward. Furthermore, the std obtained over the folds of the cross-validation is presented in these tables, where it can be seen that adding CIs helps reduce the std of the cross-validation process. This result indicates that the classifier becomes more robust as features are added. Note that the std from the cross-validation process is more informative than the result of a typical single train/test experiment.
In Figure 5, the trend of the curves indicates a fast growth of the accuracy up to seven CIs; beyond that point the accuracy flattens, i.e., adding more CIs does not significantly improve the accuracy of the classifier. It is important to note that all trends show a continuous increase in accuracy as the number of CIs increases, except for CI_FL, for which the fourth CI decreases the performance. Figure 8 shows the maximum classification accuracy of each data set and the number of indicators for which it was obtained. Approach 1, which employs a single accelerometer, achieves its best results by combining the CIs of the time + frequency domain, for both the left-side and right-side accelerometers. Slightly better classification results are achieved with approach 2; however, data fusion using six accelerometers may involve greater challenges when performing the system diagnosis. For the three domains analyzed, better results are achieved with the right-side accelerometers, which may be because the drive is not on this side and the signal information is less noisy.
Regarding the other works on this same case, Lucero et al. [12] reached a maximum accuracy of 96.43%, and Gómez et al. [7] achieved an overall probability of detection of 95% with 32 features (energy computed with the WPT). In this work, the maximum accuracy is 97.14% with approach 1 and 98.37% with approach 2.
The ZC and SSC condition indicators of the time domain, and PKF and CP2 of the frequency domain, are the common CIs among the top 10 CIs in both approach 1 and approach 2.
One of the main benefits of approach 1 is the possibility of classifying cracks in railway axles with a few CIs of the time + frequency domain extracted from the signals of a single accelerometer. In this way, the number of sensors needed to detect a crack can be reduced. Moreover, eliminating non-informative indicators improves the performance of the classification algorithm. In addition, the use of few features reduces the computational cost of data processing.

Conclusions
In this work, it was possible to develop a practical and straightforward methodology to evaluate the performance of condition indicators (CIs) in time and frequency domains. Its application was tested for crack detection in railway axles, by using vibration signals measured with accelerometers located on the right and left side of the railway axles, along the vertical, axial, and longitudinal directions. This test bed is made of real components and has been used in previous works for testing other approaches of vibration analysis.
Two approaches were proposed. Approach 1 analyzes only the CIs extracted from the vibration signals measured with the accelerometers placed along the longitudinal direction, on the right and left sides of the axles, as an extension that allows comparison with previous works analyzing this case. Approach 2 uses a data fusion analysis, combining the CIs extracted from all six accelerometers.
In approach 1, the CIs extracted from the accelerometer on the right side in the different domains show slightly better classification accuracy than the condition indicators from the accelerometer on the left side. The right-side accelerometer time + frequency domain data set (CI_TFR) had the highest performance, with a classification accuracy of 97.14%, which is better than the results obtained in previous works. This may be because the drive motor is placed on the right side, and the driven motor, which provides movement to the rotating wheels, is on the left side.
In approach 2, the best performance was achieved with the CIs in the data set CI_DFRL, which combines all the indicators extracted from the six accelerometers. The classification accuracy was 98.37%, which is better than that obtained with approach 1, showing that data fusion is a good approach to improve accuracy in fault classification.
The best results in terms of classification accuracy were achieved by using a combination of indicators extracted from the time and frequency domains. The CIs named ZC and SSC in the time domain, and PKF, skewnessf, and CP2 in the frequency domain, are the common CIs among the top 10 CIs in both approach 1 and approach 2.
The main advantage of this methodology is that it supports the identification of the relevant CIs for fault detection in different domains of the vibration signal. This allows identifying which CIs should be monitored on-line, mainly the CIs of time domain as they have low computational cost regarding the signal processing.
As future work, the contribution of each single attribute to improving the accuracy values regarding the crack detection in railway axles will be analyzed. Moreover, the analysis of CIs coming from other application domains will be developed. Additionally, it is expected to have a repository of discriminating and independent features to develop effective data-driven classification algorithms.