Sensors Information Fusion System with Fault Detection Based on Multi-Manifold Regularization Neighborhood Preserving Embedding

Electrical drive systems play an increasingly important role in high-speed trains. The whole system is equipped with sensors that support complicated information fusion, so its performance ought to be monitored, especially during incipient changes. In such situations, it is crucial to distinguish the faulty state from the observed normal state because of the dire consequences that closed-loop faults might bring. In this research, an optimal neighborhood preserving embedding (NPE) method called multi-manifold regularization NPE (MMRNPE) is proposed to detect various faults in an electrical drive sensor information fusion system. By taking locality preserving embedding into account, the proposed methodology jointly applies the Euclidean distances of both designated points and paired points, which gives access to both local and global sensor information. Meanwhile, this structure fuses several manifolds to extract their individual features. In addition, parameters are assigned to the different manifolds to seek an optimal combination of manifolds, and an information entropy term on these parameters is added to avoid overweighting any single manifold. Moreover, an experimental test on the platform was carried out to validate the MMRNPE approach and demonstrate the effectiveness of the fault detection. Results and observations show that the proposed MMRNPE offers better fault detection performance than NPE.


Introduction
As a foundational component of industrial development, sensors of various types are installed in diverse systems [1,2], where they meet the demand for data gathering [3,4] and fault detection [5][6][7]. In addition, the attraction of the information fusion process lies in its ability to eliminate redundancy and contradiction among sensor data sets, as well as its capacity for decision-making under uncertain information. However, if the information gathered by sensors is not dealt with promptly, small faults may grow unchecked and result in a breakdown of the system [7]. Therefore, for the sake of responsiveness, sensor information fusion is an essential technology for improving the security and reliability of the system.
Several issues arise when information fusion is introduced to the system, including information uncertainty and data management. An efficient way to deal with massive data is feature extraction [8][9][10][11]. Various techniques that combine several algorithms have been reported in the literature for fault diagnosis. Jafarian et al. [12] used the fast Fourier transform for feature extraction, after which artificial neural networks, support vector machines and k-nearest neighbor classification algorithms were employed to verify multiple performance metrics and realize signal monitoring. In addition, Saimurugan and Ramprasad [13] fused diverse algorithms for separate purposes to realize fault diagnosis: the wavelet transform and a decision tree were employed for feature extraction, and an artificial neural network was used to classify the faulty situation. More recently, Liu et al. [14] proposed an intelligent multi-sensor data fusion method for gearbox fault detection based on a relevance vector machine combined with an ant colony optimization algorithm.
However, these compound methods are composed of diverse algorithms, which may increase the computational complexity of fault detection. To detect faults consistently and integrally, there should be concise algorithms for data preprocessing, or a single algorithm that realizes the monitoring and detection of the whole system [15][16][17][18]. For example, Yunusa-Kaltungo et al. [19] proposed an improved composite spectrum data fusion technique that retains amplitude and phase information by applying the cross power spectrum density to fault diagnosis in rotating machines. Moreover, Yunusa-Kaltungo et al. [20] used a data combination method as a preprocessing step to obtain composite higher order spectra, after which principal component analysis was employed for fault detection. Jing et al. [21] employed deep convolutional neural networks to address an adaptive multi-sensor data fusion problem, which detected the conditions of a planetary gearbox effectively with the best diagnosis accuracy. Most of these fusion strategies are efficient in small systems, whereas in practical scenarios, where the data generated by sensors may be massive, they might lose their advantages and lead to erroneous results.
In many critical fields, the systems are generally confronted with large scale and complicated logic, contributing to a high dimension of data collected from sensors. Under such circumstances, a manifold learning algorithm aiming at dimensionality reduction shows its advantage in data mining [22,23], such as neighborhood preserving embedding (NPE) [24], locality preserving projection (LPP) [25], Laplacian eigenmap (LE) [26], locally linear embedding (LLE) [27] and more [28][29][30][31]. It has been proved that the discriminative ability will be enhanced tremendously once the intrinsic manifold structure is considered. To be specific, NPE is a linear technology for combining neighbored data points together to seek an optimal local distribution. Nevertheless, NPE merely concerns the designated points, which means a lack of concern regarding the paired data.
At the same time, motivated by manifold learning algorithms, manifold regularized techniques, which also take local geometric structure into account, are proposed to learn a low-rank approximation [32,33]. For example, in [34], a structure cluster ensemble method is proposed to capture the structure information of the original data set with a manifold regularized objective function. In [35], Chuang et al. employ a manifold regularized distribution adaptation algorithm to classify both multi-spectral and hyper-spectral remote sensing data as well.
Inspired by the aforementioned research, this paper proposes a new algorithm named multi-manifold regularization neighborhood preserving embedding (MMRNPE). Unlike previous local manifold methods, which merely minimize the Euclidean distance between a designated point and its neighbors, our framework pays extra attention to paired points in low-dimensional manifolds, along with the proportion between designated points and paired ones, which carries the global information. It is also attractive because the selection of multi-manifold features reduces the disturbance and uncertainty of noise. Furthermore, multiple parameters are included for regularization or optimization purposes, which weigh the relative contribution of local and partially global information. In addition, as an iterative algorithm, MMRNPE allows the number of iterations to be chosen according to the required accuracy while retaining consistent convergence.
The remainder of this paper is organized as follows. In Section 2, a brief review of some preliminaries and related works is given, including our small sensor information fusion system, as well as the NPE and LPP algorithms. In Section 3, the MMRNPE algorithm is proposed to extract both designated point information and paired point information, together with some discussions of the algorithm and its parameters. In Section 4, complete experiments based on the fusion system and MMRNPE are presented and verified. Conclusions are drawn in Section 5.

One Small Sensor Information Fusion System
The small sensor information fusion system is an electrical drive system in high-speed trains. Figure 1 presents the schematic diagram of the components of this platform [36]. It is an experimental platform of a high-speed train from the China Railway Rolling Stock Corporation in Zhuzhou, China. Several sensors are installed in different parts of the traction components. By fusing the sensors and computational modules with the aid of MATLAB R2014a (Mathworks, Natick, MA, USA) and dSPACE models (2014-A, dSPACE, Paderborn, Germany), the features of unexpected faults can be distinguished and the faults can finally be detected. Effective sensor data is crucial to the reliable detection capacity of this system: the more effective and complete the data set compiled from the various sensors, the greater the system's ability to extract features. In our electrical data-driven platform, the three-phase output current signals collected from sensors in the traction motors are denoted $i_a$, $i_b$, $i_c$. The voltage on the line side of the transformer is denoted $u_{net}$, while the two-phase input voltage signals of the inverters are labeled $u_{d1}$ and $u_{d2}$. The rotation signal of the traction motor is s. In addition, sensors in the traction inverters also acquire Boolean values that indicate the switch state.
Moreover, information fusion is a fundamental and essential part of sensor management. Since information from a single sensor may be inaccurate and uncertain, the data collected from multiple sensors ought to be fused in time, and the data acquired in the inference and calculation process also needs preprocessing. The information fusion steps for analysis are based on the new MMRNPE algorithm, which is elaborated in Section 3. With the aid of the inference and calculation process, the control and management of the whole system is realized once the monitoring of fault detection is enabled. The information fusion is, in fact, a data-based method. The structure of the information fusion process is shown in Figure 2. To be specific, the offline data mainly consists of normal data sets, so the performance indicators calculated from it represent the normal state. Once faults are injected into the physical electrical platform, the online data changes rapidly and the performance indicators show a jump. With an accurate control threshold, the accuracy of fault detection can be ensured. This sensor information fusion system is constructed for the purpose of fault detection. It fuses the information from multiple sensors to recognize abnormal signals through the proposed algorithm and make the right decisions accordingly.

Neighborhood Preserving Embedding
Neighborhood preserving embedding (NPE) is a recently proposed feature extraction method. The basic idea of NPE is to seek a lower dimensional projection of the input sensor data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$. To obtain the optimal projection matrix $A \in \mathbb{R}^{D \times d}$, the NPE algorithm first constructs a neighborhood graph and then finds the weight of each edge by minimizing the following reconstruction error [24,27]:

$$\min \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{N} w_{ij} x_j \Big\|^2$$

where $w_{ij}$ is the neighborhood weight from $x_i$ to $x_j$, with constraints $\sum_{j=1}^{N} w_{ij} = 1$ and $0 \le w_{ij} < 1$. After acquiring the weight matrix W, the cost function with regard to the output matrix $Y \in \mathbb{R}^{d \times N}$ ($d < D$) is minimized as follows:

$$\min_{a} \; a^T R_{NPE}\, a \quad \text{s.t.} \quad a^T X X^T a = 1$$

where $R_{NPE} = X (I - W)^T (I - W) X^T$ and a is a column vector of the matrix A. Since $Y = A^T X$ is a linear projection, NPE is a linear approximation method, which makes it fast to compute compared with many manifold learning algorithms. In addition, its focus on the relationship between a designated point and its neighbors has made it widely applied.
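To make the reconstruction step concrete, the following minimal sketch computes the weights for a single point with exactly two neighbors, where the sum-to-one constraint admits a closed form. The function names and the two-neighbor restriction are assumptions for illustration only; the bound $0 \le w_{ij} < 1$ stated in the text is not enforced here, and a general NPE implementation would solve a small linear system per point instead.

```python
def two_neighbor_weights(x, n1, n2):
    """Return (w1, w2) with w1 + w2 = 1 minimizing ||x - w1*n1 - w2*n2||^2.

    Substituting w2 = 1 - w1 gives the residual (x - n2) - w1 * (n1 - n2),
    so w1 is the projection coefficient of (x - n2) onto (n1 - n2).
    """
    diff = [a - b for a, b in zip(n1, n2)]    # n1 - n2
    resid = [a - b for a, b in zip(x, n2)]    # x - n2
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical neighbors: every split reconstructs equally well.
        return 0.5, 0.5
    w1 = sum(r * d for r, d in zip(resid, diff)) / denom
    return w1, 1.0 - w1


def reconstruction_error(x, n1, n2, w1, w2):
    """Squared reconstruction error of x from its two weighted neighbors."""
    return sum((xi - w1 * a - w2 * b) ** 2 for xi, a, b in zip(x, n1, n2))


# Hypothetical example: x lies a quarter of the way from n1 to n2.
w1, w2 = two_neighbor_weights([0.5, 0.0], [0.0, 0.0], [2.0, 0.0])
print(w1, w2, reconstruction_error([0.5, 0.0], [0.0, 0.0], [2.0, 0.0], w1, w2))
```

A point that lies on the segment between its two neighbors is reconstructed exactly, which is the situation NPE tries to preserve after projection.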

Locality Preserving Projection
The process of calculating the optimal weight matrix and reconstruction error of locality preserving projection (LPP) is the same as that of NPE. The only difference between these two algorithms is in their eigenmaps, i.e., the objective function of LPP is totally different from that of NPE:

$$\min \frac{1}{2} \sum_{i,j} (y_i - y_j)^2 W_{ij} = \min_{a} \; a^T X L_{LPP} X^T a \quad \text{s.t.} \quad a^T X D X^T a = 1$$

where D is a diagonal matrix whose entries are the column sums of W, i.e., $D_{ii} = \sum_j W_{ji}$, and $L_{LPP} = D - W$ is a Laplacian matrix. What makes LPP attractive is that it explores another relationship, the one between all paired points, namely the variance of the projected data measured with the Euclidean distance. A detailed explanation is given in Section 3.
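As an illustration of the graph quantities used by LPP, the sketch below builds a symmetric k-nearest-neighbor adjacency matrix, forms the Laplacian D − W, and evaluates the pairwise cost of a one-dimensional embedding. The binary 0/1 edge weights and all function names are assumptions made for brevity; they are not the weighting used in the paper.

```python
def knn_adjacency(points, k):
    """Symmetric 0/1 adjacency matrix of a k-nearest-neighbor graph."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            W[i][j] = 1.0
            W[j][i] = 1.0  # symmetrize the graph
    return W


def graph_laplacian(W):
    """Return L = D - W, where D_ii is the i-th column sum of W."""
    n = len(W)
    degrees = [sum(W[j][i] for j in range(n)) for i in range(n)]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def pairwise_cost(L, y):
    """Evaluate y^T L y, which equals 0.5 * sum_ij W_ij (y_i - y_j)^2."""
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))


# Hypothetical 1-D data: three close points and one distant point.
points = [[0.0], [1.0], [2.0], [10.0]]
L = graph_laplacian(knn_adjacency(points, k=1))
print(pairwise_cost(L, [p[0] for p in points]))
```

Every row of the resulting Laplacian sums to zero, and the cost grows when connected points are mapped far apart, which is exactly what the LPP objective penalizes.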

Multi-Manifold Regularization Neighborhood Preserving Embedding
As mentioned in previous research [24,25], both NPE and LPP are linear manifold learning algorithms that concentrate on extracting neighborhood connections. The NPE work [24] shows that NPE and LPP provide two different ways to linearly approximate the eigenfunctions of the Laplace-Beltrami operator. However, they have quite different concerns: NPE focuses on the variance between a projected designated point and its reconstructed point, while LPP concentrates on the variance between projected paired points. In other words, NPE is more concerned with single data points while LPP cares more about paired data. When applied to whole data sets, NPE is a local algorithm and LPP is to some extent a global one.
Hence, the proposed multi-manifold regularization neighborhood preserving embedding is developed by combining the two to obtain an overall optimal algorithm. The detailed derivation of the MMRNPE algorithm is presented below.
In LPP, $L_{LPP} = D - W$, where W is the weight matrix whose element $w_{ij}$ represents the neighborhood relationship between $x_i$ and $x_j$. In MMRNPE, however, we replace W by H, where H is the matrix packaging the neighborhood information. Once the data set is given, two kinds of graph Laplacian can be established: an unsupervised graph, in which L is constructed from unlabeled data, and a supervised graph, in which L is constructed from labeled data. The unsupervised graph Laplacian is denoted by $L$, and m is the number of manifolds. If label information is available, discriminative information can be obtained to separate samples with different labels. Therefore, a supervised graph Laplacian is defined with two components that represent normal data and faulty data, respectively, and β is a regulating parameter. In this paper, a normalized graph Laplacian $L_m^{(i)}$ is proposed and applied, where $D_m^{(i)}$ is a diagonal matrix similar to that in LPP and i denotes the i-th manifold with a particular setting, i = 1, 2, ..., m.
However, noise corruption exists in every manifold $L_m^{(i)}$, which may cause the intrinsic distribution of the samples to be explored incorrectly and hence make fault detection inaccurate. To avoid the accidental errors caused by a single manifold, a conjoint multi-manifold algorithm is proposed. The core of this method is as follows:

$$L = \sum_{i=1}^{m} \alpha^{(i)} L_m^{(i)} \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha^{(i)} = 1, \; \alpha^{(i)} \ge 0$$

where $L_m^{(i)}$ are the various manifolds that stem from different neighborhood settings and $\alpha^{(i)}$ are the parameters used to match the optimal multi-manifold combination. By expanding the choice of manifolds, the finally selected L lies in $L_{ter} = \mathrm{span}\{L_m^{(1)}, L_m^{(2)}, \ldots, L_m^{(m)}\}$. The idea of multi-manifold is embodied in the introduction of $\alpha^{(i)}$, which is a distinctive feature of our algorithm. This approach is based on the assumption that the intrinsic manifold lies exactly in the convex hull of all the pre-given manifold candidates, and these manifolds are understood in the sense of graph Laplacians. Thus, several manifolds corresponding to diverse neighborhood settings are gathered in the spanning set $L_{ter}$, which ensures that different features are collected and filtered. At the same time, the disturbance of noise and the uncertainty from a single manifold are diminished.
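The convex-hull assumption above can be sketched in a few lines: candidate Laplacians are mixed with non-negative weights that sum to one. The function name `combine_laplacians` and the toy matrices are hypothetical and only illustrate the combination step, not how the weights are chosen.

```python
def combine_laplacians(laplacians, alphas, tol=1e-9):
    """Return sum_i alphas[i] * laplacians[i] for weights on the simplex."""
    if len(laplacians) != len(alphas):
        raise ValueError("one weight is required per candidate Laplacian")
    if any(a < 0.0 for a in alphas) or abs(sum(alphas) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to one")
    n = len(laplacians[0])
    return [[sum(a * L[r][c] for a, L in zip(alphas, laplacians))
             for c in range(n)]
            for r in range(n)]


# Two hypothetical candidate Laplacians of the same two-node graph,
# built with different edge strengths.
L_a = [[1.0, -1.0], [-1.0, 1.0]]
L_b = [[2.0, -2.0], [-2.0, 2.0]]
print(combine_laplacians([L_a, L_b], [0.25, 0.75]))
```

Because every candidate has zero row sums, any convex combination of them keeps that property, so the mixture is still a valid graph Laplacian.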
Taking NPE and the above multi-manifold idea from LPP into consideration, the objective function is changed so that the NPE term is combined with the multi-manifold Laplacian term, where k is a regularization parameter.
The parameter k scales the contributions of NPE and LPP. That is to say, although MMRNPE takes both the local structure from NPE and the variance structure from LPP into consideration, there is no exact method to measure their relative contributions. Within the objective function, k adjusts the proportion of locally neighbored information and partially global variance information, which is critical to the balance between the two.
Using a Lagrange multiplier to solve the minimization problem, the question is transformed into a generalized eigenvalue problem. However, experiments indicate that the above objective function may run into a problem with regard to manifold selection. To be specific, $\alpha = [\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(m)}]$ may end up in the following situation:

$$\alpha^{(k)} = 1, \qquad \alpha^{(i)} = 0 \quad (i \neq k)$$

where k denotes the k-th manifold. This means that only the k-th parameter is effective. In other words, although multiple manifolds are collected, only one particular manifold is chosen. Since the information of only a single manifold is preserved and revealed, $L_{ter}$ is not used well. Hence, the objective function needs a further constraint on $\alpha^{(i)}$ to check and balance the dominance of the k-th manifold. The task is therefore to find a function $f(\alpha^{(i)})$ that is negatively correlated with $a^T X L X^T a$. Entropy of information [37][38][39] is an efficient way to adjust the contribution of each $\alpha^{(i)}$. Thus, another objective function, which includes an information entropy term, is constructed to obtain $\alpha^{(i)}$, where γ is a parameter that adjusts the proportion of the entropy. With the entropy of information about $\alpha^{(i)}$, the situation in which a single manifold is overemphasized is avoided. With the introduction of γ, the cost function of $\alpha^{(i)}$ is modified. The parameter γ together with the entropy of information acts as a penalty term of a generalized regularization, which monitors and regulates the validity of the objective function.
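The effect of the entropy penalty can be illustrated numerically. The sketch below assumes, as a working hypothesis rather than the paper's exact formula, an objective of the form $\sum_i \alpha^{(i)} c_i + \gamma \sum_i \alpha^{(i)} \ln \alpha^{(i)}$ on the simplex, where $c_i$ stands for the cost $a^T X L_m^{(i)} X^T a$ of the i-th manifold. Under that assumption the minimizer is a softmax of $-c_i / \gamma$. The names `entropy_weights` and `regularized_objective` and the cost values are hypothetical.

```python
import math


def entropy_weights(costs, gamma):
    """Softmax of -cost/gamma: minimizer of the assumed entropy-penalized
    objective over weights that are non-negative and sum to one."""
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    base = min(costs)  # shift by the minimum for numerical stability
    exps = [math.exp(-(c - base) / gamma) for c in costs]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_objective(costs, alphas, gamma):
    """Assumed objective: weighted cost plus gamma times negative entropy."""
    linear = sum(a * c for a, c in zip(alphas, costs))
    neg_entropy = sum(a * math.log(a) for a in alphas if a > 0.0)
    return linear + gamma * neg_entropy


costs = [1.0, 1.2, 3.0]  # hypothetical per-manifold costs
for gamma in (0.01, 1.0, 100.0):
    weights = entropy_weights(costs, gamma)
    print(gamma, [round(w, 4) for w in weights])
```

With a very small γ nearly all weight falls on the cheapest manifold, which reproduces the degenerate single-manifold selection described above; with a large γ the weights approach a uniform distribution, so γ controls how strongly the combination is spread over the candidates.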
Therefore, the optimized objective function not only considers the local and partly global information, but also fuses multiple manifolds with sensor information. In addition, the added regularized parameters are also significant for selection of an optimal solution.
To solve the minimization problem in Equation (10), Lagrange multipliers are introduced to construct the Lagrange function Q, where $\lambda_1$ and $\lambda_2$ are the Lagrange multipliers.
By setting the derivatives of Q with respect to $\alpha^{(i)}$, $\lambda_1$ and $\lambda_2$ to zero, a closed-form expression for $\alpha^{(i)}$ is obtained. Since $\alpha^{(i)}$ depends closely on a, the two are updated in turn. It is worth noting that the manifolds constructed with $\alpha_2^{(i)}$ are the expected results, which are effective in suppressing noise disturbance.
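For readers who want to see how such a stationarity condition yields explicit weights, the following worked derivation uses an assumed entropy-penalized objective; it is a sketch, not a reproduction of the paper's Equations (10)-(13). Here $c_i$ is shorthand introduced for $a^T X L_m^{(i)} X^T a$, and a single multiplier λ for the sum-to-one constraint is used, because the resulting weights are automatically positive and no second multiplier is needed in this simplified setting.

```latex
% Assumed objective (illustrative form only):
%   minimize over alpha on the simplex
\min_{\alpha} \; \sum_{i=1}^{m} \alpha^{(i)} c_i
  + \gamma \sum_{i=1}^{m} \alpha^{(i)} \ln \alpha^{(i)}
\quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha^{(i)} = 1

% Lagrange function with one multiplier for the equality constraint:
Q = \sum_{i=1}^{m} \alpha^{(i)} c_i
  + \gamma \sum_{i=1}^{m} \alpha^{(i)} \ln \alpha^{(i)}
  - \lambda \Big( \sum_{i=1}^{m} \alpha^{(i)} - 1 \Big)

% Stationarity with respect to alpha^{(i)}:
\frac{\partial Q}{\partial \alpha^{(i)}}
  = c_i + \gamma \big( \ln \alpha^{(i)} + 1 \big) - \lambda = 0
\;\; \Longrightarrow \;\;
\alpha^{(i)} = \exp\!\Big( \frac{\lambda - \gamma - c_i}{\gamma} \Big)

% Enforcing the sum-to-one constraint eliminates lambda:
\alpha^{(i)} = \frac{\exp(-c_i / \gamma)}{\sum_{j=1}^{m} \exp(-c_j / \gamma)}
```

Under these assumptions each weight decreases with the cost of its manifold, and because $c_i$ depends on a, the weights have to be recomputed whenever a is updated.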
In accordance with the above analysis, the algorithmic procedure of the proposed MMRNPE can be formally summarized as follows:
1. Compute the normalized graph Laplacians $L_m^{(i)}$ of the different manifolds with Equation (5).
2. Compute the initial $L_1$ with the pre-given manifold candidates, where i = 1, 2, ..., m.
3. Solve the generalized eigenvalue problem for the eigenvectors $a_1$, where $M = (I - W)^T (I - W)$.
4. Compute $L_2$ with a series of optimized $\alpha_2^{(i)}$.
5. Solve the generalized eigenvalue problem for the eigenvectors $a_2$.
6. Obtain the embedding with the projection matrix A formed from the resulting eigenvectors.
Furthermore, the calculation process of our framework is iterative, with the verification of regularization consistency shown in Figure 3. The core of this process lies in the update of $\alpha^{(i)}$. More specifically, the iteration continues when $a_2$ in Equation (18) is substituted into Equation (13) to obtain $\alpha_3^{(i)}$. After substituting $\alpha_3^{(i)}$ into Equation (8), the projection matrix A is obtained. This alternating iteration can be repeated, and the convergence of the learning algorithm is guaranteed. Moreover, as the number of iterations increases, the algorithm is refined further, i.e., researchers may choose the number of iterations freely according to the error requirement of the experiment.
The iteration procedure is also an optimization process for the parameter $\alpha^{(i)}$.
In general, MMRNPE proves to be a successful method for exploiting the underlying geometric structure of the selected data sets with the aid of NPE and LPP. By incorporating the idea of LPP, i.e., graph Laplacians, MMRNPE takes advantage of both NPE's local structure and LPP's partially global variance structure. The notable properties of the proposed approach are as follows:

• MMRNPE takes the Euclidean distances of both designated points and paired points into consideration, which balances local and global information from the sensor data.
• Multiple parameters are included in this algorithm, some for regularization and others as constraints. Since different parameter choices lead to different performance, optimization algorithms can be used to select them and improve the fault detection rate.
• Some of the regularization parameters weigh the relative contribution of the elements, i.e., the balance between the local information from the NPE algorithm and the variance information from the LPP algorithm is displayed intuitively, which realizes the sensor information fusion.

Fault Detection Strategy
The fault detection is based on the small sensor information fusion system described in Section 2.1. The detailed parameter information is given in Table 1 [7], which includes parameters in both the physical space and the computational space. Data sets collected from sensors in the physical space are sent promptly to the computational space, after which the calculation and monitoring process begins. This process includes two steps: offline calculation and online monitoring. Since MMRNPE seeks a latent variable space that represents the high-dimensional space, Hotelling's $T^2$ statistic is constructed as a measure of fault detection performance [40,41]. $T^2$ is defined as

$$T^2 = y^T \Lambda^{-1} y$$

where $\Lambda = \mathrm{cov}(Y_{offline})$ is the covariance matrix of the offline data set and y is a sample (a column of Y) after the dimensionality reduction process. For each sample, $T^2$ is the sum of the normalized squared scores, which makes it possible to measure the performance of the chosen projection matrix A [42,43]. The SPE statistic plays this role as well [40,41]. With the performance statistics defined, the offline modeling procedure is as follows:
1. Collect the original data set X and normalize it to zero mean and unit variance.
2. Compute the projection matrix with the proposed MMRNPE algorithm.
3. Calculate the dimension-reduced data set Y with the linear mapping.
4. Compute the performance statistics $T^2$ and SPE of the offline data set.
5. Construct the upper control limits of $T^2$ and SPE as the standard for the online data.
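The offline steps for the $T^2$ statistic can be sketched for the special case of two-dimensional scores, where the covariance matrix can be inverted by hand. The function names are hypothetical, the offline mean is subtracted explicitly (it is zero when the data are normalized as described), and the empirical quantile used as control limit is an assumption, since the text does not state how the limit is computed.

```python
import math


def mean_and_covariance_2d(scores):
    """Sample mean and covariance of a list of 2-D score vectors."""
    n = len(scores)
    m0 = sum(s[0] for s in scores) / n
    m1 = sum(s[1] for s in scores) / n
    c00 = sum((s[0] - m0) ** 2 for s in scores) / (n - 1)
    c11 = sum((s[1] - m1) ** 2 for s in scores) / (n - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in scores) / (n - 1)
    return (m0, m1), ((c00, c01), (c01, c11))


def hotelling_t2(y, mean, cov):
    """T^2 = u^T cov^{-1} u with u = y - mean, using the 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    u0, u1 = y[0] - mean[0], y[1] - mean[1]
    return (d * u0 * u0 - 2.0 * b * u0 * u1 + a * u1 * u1) / det


def empirical_limit(values, level=0.99):
    """Assumed control limit: empirical quantile of the offline statistic."""
    ordered = sorted(values)
    index = max(0, min(len(ordered) - 1, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


# Hypothetical offline scores and one online sample.
offline = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
mean, cov = mean_and_covariance_2d(offline)
limit = empirical_limit([hotelling_t2(y, mean, cov) for y in offline])
print(hotelling_t2((3.0, 0.0), mean, cov) > limit)
```

An online sample is flagged when its statistic exceeds the limit learned from the offline data; the same pattern would apply to SPE with its own limit.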
After the offline modeling process, the upper control limits are obtained, and the online monitoring procedure can then be implemented:
1. Collect the online data set $X_{online}$ and normalize it to zero mean and unit variance.
2. Calculate the dimension-reduced data set Y with the projection matrix obtained in the offline procedure.
3. Compute the performance statistics $T^2$ and SPE of the online data set and compare them with the upper control limits of $T^2$ and SPE from the offline process.
4. Compute the fault alarm ratio (FAR), non-detection ratio (NDR) and total detection rate (TDR) to evaluate the fault detection ability of the MMRNPE algorithm.
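The last online step can be sketched as follows, under the common reading that FAR is the fraction of pre-fault samples above the control limit and NDR is the fraction of post-fault samples at or below it. These definitions, the function name `detection_ratios`, and the sample values are assumptions; TDR is left out because the text does not define it.

```python
def detection_ratios(statistic, limit, fault_start):
    """Return (FAR, NDR) for a monitoring statistic sampled over time.

    Samples before index fault_start are treated as normal operation,
    samples from fault_start onward as faulty operation.
    """
    alarms = [value > limit for value in statistic]
    normal = alarms[:fault_start]
    faulty = alarms[fault_start:]
    # Alarms raised although no fault is present yet.
    far = sum(normal) / len(normal) if normal else 0.0
    # Faulty samples that stayed at or below the control limit.
    ndr = sum(not a for a in faulty) / len(faulty) if faulty else 0.0
    return far, ndr


# Hypothetical T^2 trace: the fault is injected at sample index 4.
trace = [0.5, 1.2, 0.3, 0.4, 2.0, 0.9, 3.0, 4.0]
print(detection_ratios(trace, limit=1.0, fault_start=4))
```

Lower values of both ratios indicate better detection, which is how the comparison between algorithms can be read.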

Experiments Verification with the Proposed MMRNPE
The platform with multiple sensors for information fusion described in Section 2.1 is selected to verify the multi-manifold algorithm.
Several experiments are carried out on the test bench in the normal state and in various bias fault states. Both the normal and faulty expressions are listed in Table 2. Numerous sensors are distributed across all parts and locations of the system. Several typical signals are used to evaluate the performance of the motors, including current signals, voltage signals and speed signals. Hence, we locate the sensor faults in the current path and the voltage path, as well as in the speed sensor itself. As shown in Table 2, three types of faults are manually induced in various sensors under different operating conditions, i.e., current sensor fault, voltage sensor fault and speed sensor fault.
Once sensors in other locations break down and affect the security of system operation, the typical signals mentioned above also change abnormally. To evaluate the severity of the faults, three different levels of signal amplitude are set. The degree of the voltage sensor fault is 0.01% of the running voltage amplitude while that of the current sensor fault is 0.05%. In addition, the speed sensor fault is 0.5% of the normal condition. It should also be noted that the training data of the three different faults share the same normal data sets. The only difference lies in the test data, where the samples collected during the faulty operation periods have the same length. Figures 4-6 give the evolution of the fault injections for the different sensor faults.
As shown in Figures 4-6, there are some fault alarming points, which lie above the control limits before the faults occur, and some non-detection points, which lie below the control limits after the faults occur. Both fault alarming points and non-detection points are marked in the figures with striking colors. At the same time, the fault alarming ratio (FAR), non-detection ratio (NDR) and total detection ratio (TDR) for the performance statistics $T^2$ and SPE are calculated under both the MMRNPE and NPE algorithms, as shown in Table 3. The detection indexes in Table 3 verify the superiority of MMRNPE for the various sensor faults, especially for the $T^2$ statistic: the FAR, NDR and TDR obtained with $T^2$ indicate better fault detection accuracy. Furthermore, careful observation of the magnified figures shows that several misclassifications occur. To be more specific, a non-detection point may be marked with the fault alarming notation after the fault is injected, while a fault alarming point may be marked with the non-detection notation before 240 s. Such mistakes arise from the setting of the sampling time. The sampling time $T_s$ is $4 \times 10^{-4}$ s, which makes it difficult to inject the sensor faults exactly at 240 s, i.e., the injection time of 240 s falls between sampling points m and m + 1.
Several parameters are introduced into the MMRNPE algorithm and each of them plays its own role. Here, we discuss the role of γ in order to select an optimal value. With different values of γ, the monitoring and detection results of $f_1$ with MMRNPE are quite different, as shown in Figure 7.
As shown in Figure 7, the FAR of $T^2$ increases very slightly while the NDR decreases like an inverse sigmoid curve as γ gradually increases.
The x-axis of Figure 7 is the index of γ, where the series of γ values is a geometric sequence with initial value 1 and terminal value $10^4$.
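A geometric grid of this kind can be generated as in the sketch below. The number of grid points is not given in the text, so the value of five used here, like the function name `geometric_sequence`, is purely an assumption for illustration.

```python
def geometric_sequence(start, stop, count):
    """Return count values from start to stop with a constant ratio."""
    if count < 2:
        raise ValueError("at least two points are required")
    if start <= 0.0 or stop <= 0.0:
        raise ValueError("a geometric sequence needs positive endpoints")
    ratio = (stop / start) ** (1.0 / (count - 1))
    return [start * ratio ** i for i in range(count)]


# Candidate gamma values between 1 and 10^4 (the count of 5 is assumed).
gammas = geometric_sequence(1.0, 1e4, 5)
print([round(g, 3) for g in gammas])
```

Each candidate γ would then be used to rerun the monitoring procedure so that FAR and NDR can be plotted against the position of γ in the grid.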

Conclusions
Multiple sensors located at various positions of the electrical drive system provide a large amount of characteristic information. In this research, we discussed the mining of sensor data sets obtained from a sensor information fusion system to detect faults via the MMRNPE algorithm. There are three key components of this improved algorithm. Firstly, as a combination of NPE and LPP, the objective function of MMRNPE, inspired by manifold learning algorithms, considers both the designated points and the paired points to find the intrinsic structure. Secondly, multiple manifolds with various neighborhood settings are merged closely while keeping their distinct characteristics. Thirdly, several parameters are introduced into the methodology, some of which adjust the proportion of locally neighbored information and partially global variance information, while others weight the different manifolds. The experimental results demonstrate that MMRNPE realizes data processing and information fusion successfully, as shown by its fault detection performance. With three different sensor faults injected and detected promptly and efficiently, the approach is verified.
In this study, parameters are selected by enumeration or experiment, so future work will focus on optimization algorithms for our strategy. By taking optimization algorithms as well as iteration into account, the detection efficiency can be improved further.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors have no role in the design of the study; in the collection, analysis or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: