Rapid Multi-Sensor Feature Fusion Based on Non-Stationary Kernel JADE for the Small-Amplitude Hunting Monitoring of High-Speed Trains

Joint Approximate Diagonalization of Eigen-matrices (JADE) cannot deal with non-stationary data. Therefore, in this paper, a method called Non-stationary Kernel JADE (NKJADE) is proposed, which can extract non-stationary features and fuse multi-sensor features precisely and rapidly. In this method, the non-stationarity of the data is considered and the data from multi-sensor are used to fuse the features efficiently. The method is compared with EEMD-SVD-LTSA and EEMD-JADE using the bearing fault data of CWRU, and the validity of the method is verified. Considering that the vibration signals of high-speed trains are typically non-stationary, it is necessary to utilize a rapid feature fusion method to identify the evolutionary trends of hunting motions quickly before the phenomenon is fully manifested. In this paper, the proposed method is applied to identify the evolutionary trend of hunting motions quickly and accurately. Results verify that the accuracy of this method is much higher than that of the EEMD-JADE and EEMD-SVD-LTSA methods. This method can also be used to fuse multi-sensor features of non-stationary data rapidly.


Introduction
Hunting motion is a self-excited vibration that is a serious obstacle to the safety of high-speed trains [1]. Monitoring systems are designed to detect hunting only after it has developed to a specific degree. Besides, in most cases, the recognition result is only obtained using a single observation. Therefore, the accuracy and real-time performance of monitoring systems need to be further improved [2]. With the increasing performance requirements of high-speed trains, it is important to establish an accurate and rapid feature extraction method for hunting detection through multi-characterizations before the phenomenon has developed to any significant degree.
The structure of high-speed trains is very complicated and their working conditions are very poor, resulting in non-stationary vibration signals [3]. In the existing feature extraction research, a variety of data extraction algorithms are utilized. These algorithms can be roughly divided into two categories: those designed for stationary data and those for non-stationary data. Feature extraction methods for stationary data include Singular Value Decomposition (SVD), Linear Discriminant Analysis (LDA) [4,5], Principal Component Analysis (PCA) [6], Locality Preserving Projection (LPP) [7], and so on. However, it is hard to extract features of non-stationary data using these methods. For non-stationary data, manifold learning is a good feature extraction method [8][9][10][11]; however, it incurs a high computational cost and has extremely long calculation times, which means that the diagnostic information cannot be to identify the evolutionary trend of the hunting motion quickly and accurately. The results verify that the accuracy of this method is much higher than that of the JADE method.
The remainder of this paper is organized as follows: In Section 2, the theoretical backgrounds of Ensemble Empirical Mode Decomposition (EEMD) [28] and a separability metric are introduced. The proposed method of non-stationary Kernel JADE is also described. The framework of the proposed method is introduced in Section 3. In Section 4, data from Case Western Reserve University (CWRU) are used to test the proposed method, and the operational data of multiple states of high-speed trains are used to verify the accuracy and rapidity of the proposed method. Finally, the conclusion is presented in Section 5.

EEMD
The basic idea of the EEMD method is to use the sifting process to decompose the signal into several intrinsic mode functions, when Gaussian white noise is added. For the original signal, the specific steps to use EEMD to decompose it into IMFs are as follows: (1) Obtain the overall signal by adding Gaussian white noise to the original signal: (2) The overall signal is decomposed to obtain the IMF components of each order, where i represents the i-th component, r is the residual term, and n is the number of IMFs: (3) Repeat step (1) and (2), each time adding different white noise sequences of the same amplitude: In Equation (3), c ij is the i-th IMF component of the decomposition to which white noise at for the j-th time, while r j is the residual value of the decomposition. (4) Using the zero-mean principle of the Gaussian white noise frequency, the effect of white noise can be eliminated, and the IMF component corresponding to the original signal can be expressed as: where n represents the number of times the white noise is added and c i represents the i-th IMF component obtained by the EEMD decomposition of the original signal.

IMF Energy Matrix
For the IMF component c i = [c i (1), . . . . . . , c i (M)], the energy feature can be expressed as: All IMF components of a vibration signal form a feature energy vector e = [e 1 , e 2 , . . . . . . e n ], where n is the dimension of vector e and represents the number of IMF components.

Proposed Non-Stationary Kernel JADE
Since the original JADE algorithm is based on stationary signal analysis, considering the non-stationary nature of high-speed train signal, the kernel JADE method [15] is applied to a non-stationary environment.
The main idea of the kernel is to map the input matrix into the nonlinear space Φ Suppose X = {x 1 , x 2 , . . . , x m }; then, the processing can be defined as follows: During implementation, we need to calculate the inner product of two eigenvectors which have been mapped into a nonlinear space using a kernel function, and a kernel matrix will be calculated using Equation (7): where x i , x j are vectors. The commonly used kernel function includes the following [29]: Similar to KPCA [30], the centered kernel matrix K can be calculated through Equation (11): The whole time series is divided into M segment intervals T 1 , . . . . . . , T M , and consequently M covariance matrices S T 1 , . . . . . . , S T M can be generated. These M covariance matrices are jointly diagonalized to find a unitary matrix U which can diagonalize M covariance matrices simultaneously; then, the energy features can be extracted.
Step 1: For M segment intervals T 1 , . . . . . . , T M , the covariance matrices of signal x(t) can be expressed as: where Step 2: The most common method to diagonalize matrices S T 1 , . . . . . . , S T M is to diagonalize the first matrix, and then transform the remaining M−1 matrices into diagonalization. W is the diagonalized matrix of the covariance matrix S T 1 : In Equation (13), V is the eigen-matrix of S T 1 , while Λ is the eigenvector of S T 1 . For the remaining M−1 matrices, the diagonalization matrix can be defined respectively: Step 3: The approximate joint diagonalization problem is equivalent to finding an orthogonal matrix U that minimizes: where off(US * T m U H ) has the same off-diagonal elements as US * T m U H , and the diagonal element is zero, while b and d represent the b-th row and d-th column of the matrix, respectively.
Since the sum of squares remains the same when multiplied by an orthogonal matrix, the problem is equivalent to maximizing the sum of squares of the diagonal elements: where p represents the dimension to which the feature is extracted.
Step 4: Givens rotation is used to transform the set of matrices to a more diagonal form, two rows and two columns at a time. The Givens rotation matrix is given by: where The initial value for the orthogonal matrix U is the identity matrix I. Matrices S T 2 , . . . ,S T m are then updated using: When the values of all non-diagonal elements are less than a given threshold ε, the iteration is completed, and the joint approximation diagonalization is achieved. The unitary matrixÛ is obtained so that multiple matrices are diagonalized jointly.
Step 5: The transform matrix A can be calculated as A=ÛW # , where superscript # denotes the pseudo-inverse. As such, the original signal s(t) can be expressed as: Since the NKJADE method can extract the nonlinear relationships hidden in the high-dimensional feature space, the fusion features can be estimated through the joint feature decomposition using multiple inputs. The fusion features can express the nonlinear and non-stationary relationships hidden in the inputs well, so they can represent the characteristic relationship of different states, and then distinguish different feature states quickly and accurately.

Separability Evaluation
To illustrate the merit of our proposed algorithm, the separability J is utilized to demonstrate the algorithm's ability to form distinct classes. The capability of the feature extraction in pattern classification can be described quantitatively using between-class scatter S b , within-class scatter S w [31], and separability [32]. Assuming that the data have a total of C classifications, the vector of the i-th classification is: where n i is the number of i-th classification. Between-class scatter S b and within-class scatter S w can be calculated as follows: where The between-class scatter S b describes how far different classes are separated, and the within-class scatter S w indicates how compactly each class of samples is distributed. Based on between-class scatter and within-class scatter, separable evaluation J is introduced to describe the clustering ability of different methods. Separable evaluation J could be calculated as follows: where function trace refers to the sum of diagonal elements.

Methodology
For the different characteristics of vibration signals, a multi-sensor data feature extraction framework is provided, in which a rapid feature fusion method using NKJADE is proposed. The framework of the feature extraction method is shown in Figure 1. The method can quickly and accurately extract different features of information contained in non-stationary vibration signals. The main steps of this framework are as follows: (1) The m position sensor data and l classes are selected, and a total of m sample matrices are obtained.
(2) The EEMD algorithm is used to separately decompose the sample matrices of m positions, then n IMF components of each sample signal can be obtained. (3) By extracting the energy features of the n components of IMF obtained using decomposition, an IMF energy matrix is obtained from each position. In this paper, a total of m energy matrices of IMF were obtained. (4) For the obtained energy matrices of IMF of m positions, a rapid feature fusion method using NKJADE is proposed, so the dimensionality is reduced to three for a better-observed performance. (5) The LSSVM is trained on and used to test the fusion features to verify the accuracy of the method.

Experiment Results and Analysis
In order to verify the valid of the method, we applied it to bearing fault identification (Case I) and small amplitude hunting monitoring of high-speed trains (Case II). Some traditional approaches, such as Ensemble Empirical Mode Decomposition Joint Approximate Diagonalization of Eigenmatrices (EEMD-JADE) and Ensemble Empirical Mode Decomposition Singular Valuable Decomposition Learning Technology Systems Architecture (EEMD-SVD-LTSA), were compared with the proposed method.

Data Description
The bearing test data for normal and faulty bearings were from the Case Western Reserve University (CWRU). The test signal contained normal state, ball fault, and inner and outer race faults; for the latter three, the fault diameters were 0.07, 0.14 and 0.21 inches, respectively.
The data used in this paper were collected from the drive end. The sampling frequency was 12KHZ and the speed of the shaft was 1725 r/min, corresponding to 400 points collected per revolution. In order to reduce the influence of equipment fluctuations, each sample contained 800 points.
As shown in Table 1, two datasets were selected. Dataset A had the same fault location (inner race fault), but the fault size was different. Dataset B had different fault locations, but the fault size was the same, and the outer race fault was at the location of 3 o'clock. Each of the datasets was divided into three categories.

Experiment Results and Analysis
In order to verify the valid of the method, we applied it to bearing fault identification (Case I) and small amplitude hunting monitoring of high-speed trains (Case II). Some traditional approaches, such as Ensemble Empirical Mode Decomposition Joint Approximate Diagonalization of Eigen-matrices (EEMD-JADE) and Ensemble Empirical Mode Decomposition Singular Valuable Decomposition Learning Technology Systems Architecture (EEMD-SVD-LTSA), were compared with the proposed method.

Data Description
The bearing test data for normal and faulty bearings were from the Case Western Reserve University (CWRU). The test signal contained normal state, ball fault, and inner and outer race faults; for the latter three, the fault diameters were 0.07, 0.14 and 0.21 inches, respectively.
The data used in this paper were collected from the drive end. The sampling frequency was 12 KHZ and the speed of the shaft was 1725 r/min, corresponding to 400 points collected per revolution. In order to reduce the influence of equipment fluctuations, each sample contained 800 points.
As shown in Table 1, two datasets were selected. Dataset A had the same fault location (inner race fault), but the fault size was different. Dataset B had different fault locations, but the fault size was the same, and the outer race fault was at the location of 3 o'clock. Each of the datasets was divided into three categories.

Signal Processing Results
The scatter plots obtained by applying the EEMD-SVD-LTSA, EEMD-JADE method and EEMD-NKJADE methods on dataset A are shown in Figure 2a-c, while the corresponding scatter plots for dataset B are shown in Figure 2d-f. The results of selecting parameters for kernel are shown in Figure 3. The results of the application of the three algorithms in dataset A are shown in Table 2, while the results for dataset B are shown in Table 3.

Signal Processing Results
The scatter plots obtained by applying the EEMD-SVD-LTSA, EEMD-JADE method and EEMD-NKJADE methods on dataset A are shown in Figure 2a-c, while the corresponding scatter plots for dataset B are shown in Figure 2d-f. The results of selecting parameters for kernel are shown in Figure  3. The results of the application of the three algorithms in dataset A are shown in Table 2, while the results for dataset B are shown in Table 3 (   Figure  3. The results of the application of the three algorithms in dataset A are shown in Table 2, while the results for dataset B are shown in Table 3 (   We found that the generalization ability of the Gaussian kernel is better than that of the other kernels. Therefore, we focused on the parameters of the Gaussian kernel. According to the results of selecting parameters in [15], we tried to set the range of σ to (1, 10). The parameter was incremented step by step (parameter step is 0.5, shown in X-axis), and the separability J was calculated (shown in Y-axis). The larger J is, the better the classification result will be. Therefore, for the parameter selection of EEMD method, the optimal parameter of the Gaussian kernel on dataset A and dataset B is 2.
From Figure 2 and Table 2, we see that all three methods could extract the features with satisfactory results. The accuracy of the three methods was nearly 100% in all cases. The result may be attributed to the CWRU bearing fault data having great differences, which are quite easy to classify.
However, the separability J in Table 2 is different, which is consistent with (a)-(c) in Figure 2. In this respect, the classification ability of EEMD-NKJADE algorithm is obviously better than other algorithms. In addition, the results of the running time in Table 2 show that the time required for EEMD-JADE calculation was relatively short (the PC configuration was as follows: Intel Core i5-4460, 12GB of memory, NVIDIA GeForce GT720).
In order to further verify the robustness of the algorithm, we also tested the results of the algorithm on dataset B.
In Table 3, the accuracy of the three methods in dataset B was almost the same as that in dataset A. However, the separability of dataset B was the highest after being processed using the EEMD-SVD-LTSA algorithm, while in dataset A, the separability achieved using EEMD-SVD-LTSA was the lowest, which indicates that the selection of dataset had a greater impact on EEMD-SVD-LTSA. Compared with EEMD-SVD-LTSA, the results from the EEMD-NKJADE and EEMD-JADE algorithms were less affected by the different dataset, and therefore seem to be more robust. Therefore, the method of EEMD-NKJADE offers superior performance with respect to the classification effect and the robustness, but its calculation time is longer than EEMD-JADE.

Problem Description
The stability of hunting has always been a key problem in the study of vehicle lateral motion stability [33]. Small amplitude hunting is a sign of hunting instability. In China, hunting phenomena are considered to occur when the amplitude of lateral acceleration signals from the bogie frame reaches or exceeds 8-10 m/s 2 more than 5 times continuously after a 10 Hz low-pass filter [34]. In Figure 4, the lateral acceleration of the bogie frame signals will sometimes go through a normal operation/small amplitude hunting/normal operation periodic cycle, which is a gradually convergent process. The hunting amplitude in this situation is small and convergent. Sometimes, the signals will go through a normal operation/small amplitude hunting/critical hunting process, which is a gradually divergent process. The hunting amplitude in this situation starts small and then diverges.
Sensors 2020, 20, x FOR PEER REVIEW 10 of 17 go through a normal operation/small amplitude hunting/critical hunting process, which is a gradually divergent process. The hunting amplitude in this situation starts small and then diverges. Therefore, it is necessary to extract the features of different states rapidly and accurately, especially the small amplitude divergent hunting states, to guarantee the system is alerted in time to ensure the safe operation of the train.

Data Acquisition
The data used in this paper were lateral acceleration signals of the bogie frame and axle box from an online tracking experiment. The CRTS II ballastless track and seamless rail were used on the whole line. The speed of the train was 320-350 km/h. The sampling frequency was 2500 Hz. All the data were acquired in accordance with China's Railway Passenger Traffic Safety Monitoring Standard [35]. Although in China the amplitude of lateral acceleration signals from the bogie frame is used as the testing parameter to monitor hunting motion, research has proven that many other testing parameters are also important for hunting monitoring. As such, in this paper, acceleration signals of the bogie frame and the axle box were used.
The installation locations of the accelerometers are shown in Figure 5, where two accelerometers are installed in the diagonal direction on the H-shaped bogie frame. The lateral accelerometers from the bogie are respectively denoted as S1 and S2. Also, considering that the vibration of the axle box is important for hunting [36], a sensor located on the axle box was used. In Figure 6, the lateral accelerometer on the axle box is denoted as S3. Figure 5 shows a photograph of the installation site. Therefore, it is necessary to extract the features of different states rapidly and accurately, especially the small amplitude divergent hunting states, to guarantee the system is alerted in time to ensure the safe operation of the train.

Data Acquisition
The data used in this paper were lateral acceleration signals of the bogie frame and axle box from an online tracking experiment. The CRTS II ballastless track and seamless rail were used on the whole line. The speed of the train was 320-350 km/h. The sampling frequency was 2500 Hz. All the data were acquired in accordance with China's Railway Passenger Traffic Safety Monitoring Standard [35]. Although in China the amplitude of lateral acceleration signals from the bogie frame is used as the testing parameter to monitor hunting motion, research has proven that many other testing parameters are also important for hunting monitoring. As such, in this paper, acceleration signals of the bogie frame and the axle box were used.
The installation locations of the accelerometers are shown in Figure 5, where two accelerometers are installed in the diagonal direction on the H-shaped bogie frame. The lateral accelerometers from the bogie are respectively denoted as S1 and S2. Also, considering that the vibration of the axle box is important for hunting [36], a sensor located on the axle box was used. In Figure 6, the lateral accelerometer on the axle box is denoted as S3. Figure 5 shows a photograph of the installation site. Sensors 2020, 20, x FOR PEER REVIEW 11 of 17 S1 S2 S3 Figure 5. Installation locations of the accelerometers. S1, S2: Accelerometer on the bogie frame; S3: Accelerometer on the axle box. Considering the applications of other researchers, the sampling frequency of the original signal was set to 2500 Hz. However, for hunting, the characteristic frequency of the lateral acceleration of the frame is only 3-7 Hz. Therefore, in the preprocessing stage, a 250 Hz resampling method was applied [37], and a low-pass filter of 0-10 Hz was applied. Then, a zero-mean smoothing process was used for preprocessing to eliminate trend terms. In accordance with commonly followed practices, parameters such as the amplitude of the lateral acceleration signals from the bogie frame and axel box were used to obtain a synthetic assessment of the lateral stability of the high-speed train tested. In this paper, the filtered lateral acceleration signals were divided into four states: normal, small amplitude convergent hunting, small amplitude divergent hunting, and hunting. Ten groups of sample data were used for each of the four states and each sensor, yielding a total of 120 sample groups. The length of each sample was 500 points, corresponding to a sample time of 2 s.
Nonlinear factors [38,39] have been proven to affect the bifurcation evolution of small amplitude hunting and, according to [23], all the values of the Lyapunov exponent of the lateral acceleration are greater than 1, which means that the lateral acceleration signals from the bogie frame have nonstationary characteristics.

Feature Fusion
First, EEMD was applied on each signal. Then, 7 IMFs and 1 residue were obtained using the EEMD method. Figure 7 shows an EEMD decomposition view of the three signals at different points during the same time period.  Considering the applications of other researchers, the sampling frequency of the original signal was set to 2500 Hz. However, for hunting, the characteristic frequency of the lateral acceleration of the frame is only 3-7 Hz. Therefore, in the preprocessing stage, a 250 Hz resampling method was applied [37], and a low-pass filter of 0-10 Hz was applied. Then, a zero-mean smoothing process was used for preprocessing to eliminate trend terms. In accordance with commonly followed practices, parameters such as the amplitude of the lateral acceleration signals from the bogie frame and axel box were used to obtain a synthetic assessment of the lateral stability of the high-speed train tested. In this paper, the filtered lateral acceleration signals were divided into four states: normal, small amplitude convergent hunting, small amplitude divergent hunting, and hunting. Ten groups of sample data were used for each of the four states and each sensor, yielding a total of 120 sample groups. The length of each sample was 500 points, corresponding to a sample time of 2 s.
Nonlinear factors [38,39] have been proven to affect the bifurcation evolution of small amplitude hunting and, according to [23], all the values of the Lyapunov exponent of the lateral acceleration are greater than 1, which means that the lateral acceleration signals from the bogie frame have nonstationary characteristics.

Feature Fusion
First, EEMD was applied on each signal. Then, 7 IMFs and 1 residue were obtained using the EEMD method. Figure 7 shows an EEMD decomposition view of the three signals at different points during the same time period. Considering the applications of other researchers, the sampling frequency of the original signal was set to 2500 Hz. However, for hunting, the characteristic frequency of the lateral acceleration of the frame is only 3-7 Hz. Therefore, in the preprocessing stage, a 250 Hz resampling method was applied [37], and a low-pass filter of 0-10 Hz was applied. Then, a zero-mean smoothing process was used for preprocessing to eliminate trend terms. In accordance with commonly followed practices, parameters such as the amplitude of the lateral acceleration signals from the bogie frame and axel box were used to obtain a synthetic assessment of the lateral stability of the high-speed train tested. In this paper, the filtered lateral acceleration signals were divided into four states: normal, small amplitude convergent hunting, small amplitude divergent hunting, and hunting. Ten groups of sample data were used for each of the four states and each sensor, yielding a total of 120 sample groups. The length of each sample was 500 points, corresponding to a sample time of 2 s.
Nonlinear factors [38,39] have been proven to affect the bifurcation evolution of small amplitude hunting and, according to [23], all the values of the Lyapunov exponent of the lateral acceleration are greater than 1, which means that the lateral acceleration signals from the bogie frame have non-stationary characteristics.

Feature Fusion
First, EEMD was applied on each signal. Then, 7 IMFs and 1 residue were obtained using the EEMD method. Figure 7 shows an EEMD decomposition view of the three signals at different points during the same time period. The samples of the three sensors were processed, and the three corresponding energy matrices E1, E2, and E3, with a size of 40 × 8 were obtained. The NKJADE method was used to fuse the three high-dimensional energy matrices and the data dimensionality was reduced.

Single Sensor Classification using NKJADE
A scatter plot of the features extracted using a single sensor and NKJADE is shown in Figure 8. The separability J and accuracy of the NKJADE method are shown in Table 4. The samples of the three sensors were processed, and the three corresponding energy matrices E1, E2, and E3, with a size of 40 × 8 were obtained. The NKJADE method was used to fuse the three high-dimensional energy matrices and the data dimensionality was reduced.

Single Sensor Classification Using NKJADE
A scatter plot of the features extracted using a single sensor and NKJADE is shown in Figure 8. The separability J and accuracy of the NKJADE method are shown in Table 4.   Table 4, the accuracy achieved using S1 data only was 97.23%, which was greater than that attained at the other sensor locations. However, when the three sensors were used together , the accuracy rate became 100%, and J reached a much greater value.

Multi-Sensor Fusion using NKJADE
The EEMD-SVD-LTSA and EEMD-JADE methods were used to compare the identification accuracy and calculation speed of the feature fusion method in multi-sensor conditions. The results of the methods are shown in Figure 9. The separability J and accuracy obtained using the different sensor fusion methods with data from all three sensors are shown in Table 5.   From Table 4, the accuracy achieved using S1 data only was 97.23%, which was greater than that attained at the other sensor locations. However, when the three sensors were used together, the accuracy rate became 100%, and J reached a much greater value.

Multi-Sensor Fusion Using NKJADE
The EEMD-SVD-LTSA and EEMD-JADE methods were used to compare the identification accuracy and calculation speed of the feature fusion method in multi-sensor conditions. The results of the methods are shown in Figure 9. The separability J and accuracy obtained using the different sensor fusion methods with data from all three sensors are shown in Table 5.
The JADE and SVD-LTSA methods were compared with the proposed method in the multi-sensor feature fusion conditions. From Table 5, the accuracy rate of the SVD-LTSA method was 93.75%, which used the non-stationary method LTSA. The accuracy rate of the JADE method was only 30.12%, which used the stationary method. The accuracy rate of NKJADE (Gaussian kernel, δ = 0.6) was 100%, while the separability J of NKJADE was greater than those of the other methods. Considering that data from high-speed trains have more typical no-stationary characteristics than the bearing fault data from CWRU, this result shows that the proposed method may more be suitable for non-stationary data. The separability J of the JADE method was only 26.67, which is even lower than that achieved using single sensor S2 or S3 data. The reason for this is probably that in the JADE method, the non-stationary condition is not considered.
Besides, because real-time processing is a significant factor for the small amplitude hunting monitoring of high-speed trains, the calculation time is a very important factor. If the calculation time is too long, the diagnostic information cannot be fed back to the system in time. Compared with the SVD-LTSA method, fusion features can be extracted very quickly using the proposed NKJADE method. The run time of the NKJADE method was nearly the same as that of the JADE method, and the separability of J of NKJADE was greater than JADE. This shows that NKJADE is a rapid multi-sensor feature fusion method based on non-stationary condition, which outperforms the SVD-LTSA and JADE methods. It can be applied to the small amplitude hunting bifurcation evolution monitoring in high-speed trains. The JADE and SVD-LTSA methods were compared with the proposed method in the multisensor feature fusion conditions. From Table 5, the accuracy rate of the SVD-LTSA method was 93.75%, which used the non-stationary method LTSA. The accuracy rate of the JADE method was only 30.12%, which used the stationary method. The accuracy rate of NKJADE (Gaussian kernel, δ = 0.6) was 100%, while the separability J of NKJADE was greater than those of the other methods. Considering that data from high-speed trains have more typical no-stationary characteristics than the bearing fault data from CWRU, this result shows that the proposed method may more be suitable for non-stationary data. The separability J of the JADE method was only 26.67, which is even lower than that achieved using single sensor S2 or S3 data. The reason for this is probably that in the JADE method, the non-stationary condition is not considered.
Besides, because real-time processing is a significant factor for the small amplitude hunting monitoring of high-speed trains, the calculation time is a very important factor. If the calculation time is too long, the diagnostic information cannot be fed back to the system in time. Compared with the SVD-LTSA method, fusion features can be extracted very quickly using the proposed NKJADE method. The run time of the NKJADE method was nearly the same as that of the JADE method, and the separability of J of NKJADE was greater than JADE. This shows that NKJADE is a rapid multisensor feature fusion method based on non-stationary condition, which outperforms the SVD-LTSA and JADE methods. It can be applied to the small amplitude hunting bifurcation evolution monitoring in high-speed trains.
As shown in Figure 10, the parameter is incremented step by step (parameter of step is 0.01, shown in X-axis), and the separability index J is calculated (shown in Y-axis). The larger J is, the better the classification result will be.  As shown in Figure 10, the parameter is incremented step by step (parameter of step is 0.01, shown in X-axis), and the separability index J is calculated (shown in Y-axis). The larger J is, the better the classification result will be. Because the raw data collected are mainly distributed in the range of −1-1 g (gravity). In Equation (8), σ is the width parameter of the kernel function, so we tried to set the range of σ to (0, 1). As shown in Figure 10, with the increase of δ, J increases at first and then decreases. It indicates that the range of parameter selection is reasonable. When δ = 0.6, the classification result is the best. Because the raw data collected are mainly distributed in the range of −1-1 g (gravity). In Equation (8), σ is the width parameter of the kernel function, so we tried to set the range of σ to (0, 1). As shown in Figure 10, with the increase of δ, J increases at first and then decreases. It indicates that the range of parameter selection is reasonable. When δ = 0.6, the classification result is the best.

Conclusions
In this paper, a rapid multi-sensor feature fusion method based on NKJADE is proposed, with which the features of multiple sensors can be extracted quickly and accurately. In order to use the algorithm in a non-stationary environment, the whole time series is divided into M time periods, and the kernel function is introduced. Then, Jacobian rotation is used to obtain the unitary matrix by diagonalizing multiple kernel matrices simultaneously to extract the non-stationary fusion features.
Our main findings are that: 1.
The fusion features can be extracted quickly and efficiently using the proposed method NKJADE compared to the SVD-LTSA and the JADEs methods with bearing fault data from Case Western Reserve University.

2.
The NKJADE method can extract the fusion features from non-stationary data effectively compared to the JADE method. In case I, the accuracy rate of the three methods (SVD-LTSA, JADE, and NKJADE) was nearly the same (100%), but in case II, the accuracy rate of the three methods was very different.

3.
The NKJADE method can extract the multi-sensor fusion features effectively. The data from hunting monitoring of high-speed trains were used to verify the validity of the method in multi-sensor conditions.