An Enhanced Data Visualization Method for Diesel Engine Malfunction Classification Using Multi-Sensor Signals

The multi-sensor signal features of a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method t-distributed stochastic neighbor embedding (t-SNE) provides an effective way to visualize such data. However, irrelevant features can degrade the visualization and should therefore be eliminated beforehand. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method for high-dimensional data collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion, and the high-dimensional data are then visualized in two-dimensional space. In a test on a UCI dataset, FSS-t-SNE effectively improves the classification accuracy. An experiment was performed on a high-power marine diesel engine to validate the proposed method for diesel engine malfunction classification. Multi-sensor signals were collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine.


Introduction
Condition monitoring on diesel engines is essential for their safety and reliability. In the condition monitoring system of diesel engines, reliable detection and classification of engine malfunctions is very important to schedule maintenance [1]. However, the ambiguity between different malfunctions is still the main challenge in any condition monitoring system. In order to obtain more information from such a complex system, different measurement methods and condition monitoring approaches with multi-sensor systems were proposed [2,3]. Therefore, an efficient approach for classifying the malfunctions of diesel engines from multi-sensor signals is the main task of a condition monitoring system.
In recent years, different measurement methods were used for condition monitoring and diagnosis of diesel engines. These measurement methods, including vibrations [4][5][6][7], instantaneous speed [8][9][10], oil analysis [11], acoustic emission [12,13], cylinder pressure [14][15][16], etc., have shown good performance in condition monitoring of the diesel engine. Of these signals, cylinder vibration signals are easy to obtain, and cylinder pressure signals can reflect the real combustion conditions of diesel engines. Therefore, these two measurement methods have been widely used in diesel engine condition monitoring systems.
However, due to the complexity of the diesel engine, it is difficult to identify the conditions or malfunctions by a single feature from one measurement signal. For example, the peak value of cylinder pressure can reflect the maximum pressure of the combustion. However, the combustion condition cannot be determined by only this parameter due to the fact that the whole combustion condition is also determined by the mean value of pressure, the maximum rate of pressure rise, the amplitude of combustion vibration, etc. To better reflect the engine's health states, multiple features should be extracted from the signals of different sensors. Multi-objective optimization method and data fusion are effective ways to address the multi-sensor problem. Many works focused on multi-objective optimization algorithms and effective solutions for multi-sensor and sensor networks were proposed [17][18][19][20]. In [21], the multi-sensor data fusion method was integrated in the integrated vehicle health maintenance system (IVHMS) for reducing failure risk.
The multiple features extracted from the multi-sensor signals form a high-dimensional dataset, which is accompanied by a large volume of data due to the continuous sampling in an online condition monitoring system. A number of methods have been proposed for handling such big and complex data. Among them, data visualization offers an intuitive interface for humans to rapidly detect the structural elements of the data, such as clusters, homogeneous regions, and outliers [22].
In the domain of data visualization, dimensionality reduction is one of the most common techniques. Dimensionality reduction can be used to reduce the complexity and dimension of the original data while keeping most of the desired intrinsic information [23,24]. Such intrinsic information can be used for distinguishing the different health states of tested system from a visualization perspective.
Various dimensionality reduction methods, such as principal component analysis (PCA) [25] and linear discriminant analysis (LDA) [26,27], have been proposed in the literature. These are linear techniques that focus on keeping the low-dimensional representations of dissimilar data points far apart. Sometimes, however, it is more important to keep data points of the same class close together in the low-dimensional space, especially when they have a heterogeneous distribution in the high-dimensional space. A linear mapping typically cannot achieve this, whereas a non-linear manifold method is better suited to it [28].
A variety of non-linear dimensionality reduction methods, such as isometric mapping (ISOMAP) [23], locally linear embedding (LLE) [24], kernel PCA [29], and t-distributed stochastic neighbor embedding (t-SNE) [28], have also emerged. These methods enable the correct visualization of data that lie on curved manifolds or contain clusters of complex shape. In addition, most non-linear dimensionality reduction methods provide a map of the given data points only, without an explicit mapping function. This gives the methods a high degree of flexibility for preserving the relationships among the different classes in classification problems.
As mentioned above, the high-dimensional data used for data visualization contain many different features extracted from multi-sensor signals. For a classification problem, however, not all the features are useful, and irrelevant features may even degrade the classification result. In our study, different features are extracted from vibration signals and pressure signals in order to reflect the different malfunctions of a diesel engine, but not all of them are sensitive to a specific malfunction. When all of the features are used for dimensionality reduction, the irrelevant or non-sensitive features may degrade the classification performance in the low-dimensional space, and malfunctions that are very similar in the high-dimensional space may not be separated in the low-dimensional space. Furthermore, irrelevant features in the high-dimensional data increase the computational cost. To solve these problems, a feature subset score based t-SNE data visualization method is proposed in this paper for the malfunction classification of diesel engines. The optimal feature subset is obtained by a subset-level score criterion through an iterative algorithm and is then used for data visualization by the t-SNE method.
The contributions of this paper can be summarized as: (1) the data visualization method is extended to data classification using multi-sensor data; (2) according to a feature subset score criterion, an improved t-SNE algorithm, referred to as FSS-t-SNE, is proposed for data visualization. The proposed method selects the optimal features, by which the irrelevant features are eliminated; (3) the proposed method is applied on the malfunction classification of diesel engines using multi-sensor signals. In this study, the datasets from the UCI machine learning repository are used to validate the proposed method. The results show that it can effectively improve classification accuracy. Finally, a malfunction classification experiment on a diesel engine was performed to validate the proposed method. In this experiment, a 16-cylinder marine diesel engine was used. A pressure sensor and a vibration sensor were used to collect the cylinder pressure signals and cylinder vibration signals, respectively. Multiple features of these two signals are used for malfunction classification. The results show that the proposed method has a much better performance on classifying the normal condition and three different malfunctions in comparison with other visualization methods.

t-SNE Dimensionality Reduction Method
Given a high-dimensional space $X \subseteq \mathbb{R}^{D}$ that constitutes a data manifold for which a sample of points is available, the data points $x_i$, $i = 1, \ldots, n$, in $X$ are projected to points $y_i$, $i = 1, \ldots, n$, in the projection space $Y \subseteq \mathbb{R}^{d}$, $d < D$, such that as much structure as possible is preserved. Stochastic Neighbor Embedding (SNE) starts by converting the high-dimensional Euclidean distances between data points into conditional probabilities that represent similarities. The similarity of data point $x_j$ to data point $x_i$ is the conditional probability $p_{j|i}$ that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at $x_i$. For nearby data points, $p_{j|i}$ is relatively high, whereas for widely separated data points, $p_{j|i}$ is almost infinitesimal.
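The conversion from distances to conditional probabilities can be illustrated with a minimal sketch. It assumes a single fixed Gaussian bandwidth `sigma` for all points, which is a simplification: SNE and t-SNE normally tune a per-point bandwidth from a perplexity setting. The function name and the toy data are hypothetical.

```python
import math

def conditional_probabilities(points, sigma=1.0):
    """Return P with P[i][j] = p_{j|i} under a Gaussian centered at points[i].

    Assumption: one shared bandwidth sigma instead of a per-point,
    perplexity-tuned bandwidth.
    """
    n = len(points)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # a point never picks itself as neighbor
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(weights)
        P[i] = [w / total for w in weights]  # normalize over all j != i
    return P

# Made-up data: two nearby points and one distant point.
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
P = conditional_probabilities(X)
```

In this toy case `P[0][1]` is close to 1 and `P[0][2]` is almost zero, matching the qualitative behavior described above.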
In the low-dimensional space, a similar conditional probability, denoted by $q_{j|i}$, can be computed for the projected data points $y_i$ and $y_j$. SNE aims to find a low-dimensional data representation that minimizes the mismatch between $p_{j|i}$ and $q_{j|i}$. This is achieved by minimizing a cost function that is the sum of the Kullback-Leibler (KL) divergences between $p_{j|i}$ and $q_{j|i}$:
$$C = \sum_{i} KL(P_i \| Q_i) = \sum_{i} \sum_{j} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \quad (1)$$
Although SNE constructs a reasonably good visualization, it is hampered by a cost function that is difficult to optimize and by a problem referred to as the "crowding problem". Van der Maaten and Hinton proposed a technique called "t-Distributed Stochastic Neighbor Embedding" or "t-SNE" that aims to alleviate these problems [28]. t-SNE uses a symmetrized version of the SNE cost function with simpler gradients that was introduced by Cook et al. [30].
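Instead of minimizing the sum of the KL divergences between the conditional probabilities $p_{j|i}$ and $q_{j|i}$, it is possible to minimize a single KL divergence between the joint probability distributions $P$ in the high-dimensional space and $Q$ in the low-dimensional space:
$$C = KL(P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \quad (2)$$
In Equation (2), $p_{ij}$ and $q_{ij}$ are the pairwise similarities (joint probabilities) in the high-dimensional and low-dimensional spaces, respectively. In this symmetric SNE, following [28], the pairwise similarities in the high-dimensional space are obtained by symmetrizing the conditional probabilities:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \quad (3)$$
t-SNE uses a Student-t distribution to compute the similarity between two points in the low-dimensional space. This heavy-tailed distribution alleviates both the crowding problem and the optimization problem of SNE.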
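In the low-dimensional space, the pairwise similarities are:
$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}} \quad (4)$$
The cost function in Equation (2) is minimized by a gradient descent method. The gradient is given by Equation (5):
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1} \quad (5)$$
Then $y_i$ is updated by Equation (6). In order to speed up the optimization and to avoid poor local minima, a relatively large momentum term is added to the update equation:
$$Y^{(t)} = Y^{(t-1)} - \eta \frac{\partial C}{\partial Y} + \alpha(t)\left(Y^{(t-1)} - Y^{(t-2)}\right) \quad (6)$$
In Equation (6), $Y^{(t)}$ is the low-dimensional map at iteration $t$, $\alpha(t)$ is the momentum term at iteration $t$, and $\eta$ is the learning rate.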
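The optimization loop described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the joint probabilities `P` are a made-up 4-point example (two tight pairs), the learning rate and momentum are held constant, and refinements from [28] such as early exaggeration and adaptive gains are omitted. All function names are hypothetical.

```python
import math
import random

def low_dim_affinities(Y):
    """Student-t joint probabilities q_ij plus the unnormalized kernel values."""
    n = len(Y)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                kernel[i][j] = 1.0 / (1.0 + d2)
    total = sum(sum(row) for row in kernel)
    Q = [[k / total for k in row] for row in kernel]
    return Q, kernel

def kl_cost(P, Q):
    """KL(P || Q) over all pairs i != j with p_ij > 0."""
    n = len(P)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n)
               if i != j and P[i][j] > 0.0)

def tsne_step(P, Y, Y_prev, eta, alpha):
    """One gradient descent update with a momentum term."""
    Q, kernel = low_dim_affinities(Y)
    n, dim = len(Y), len(Y[0])
    Y_new = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            coeff = 4.0 * (P[i][j] - Q[i][j]) * kernel[i][j]
            for k in range(dim):
                grad[k] += coeff * (Y[i][k] - Y[j][k])
        # Step against the gradient, then add the momentum term.
        Y_new.append([Y[i][k] - eta * grad[k] + alpha * (Y[i][k] - Y_prev[i][k])
                      for k in range(dim)])
    return Y_new

# Made-up symmetric joint probabilities: points (0, 1) and (2, 3) are similar.
P = [[0.0, 0.2, 0.025, 0.025],
     [0.2, 0.0, 0.025, 0.025],
     [0.025, 0.025, 0.0, 0.2],
     [0.025, 0.025, 0.2, 0.0]]

random.seed(0)
Y = [[random.gauss(0.0, 1e-2) for _ in range(2)] for _ in range(4)]  # random init
Y_prev = [row[:] for row in Y]
initial_cost = kl_cost(P, low_dim_affinities(Y)[0])
for _ in range(1000):
    Y, Y_prev = tsne_step(P, Y, Y_prev, eta=0.2, alpha=0.5), Y
final_cost = kl_cost(P, low_dim_affinities(Y)[0])
```

After the loop, the two similar pairs end up close together in the map while the pairs themselves are pushed apart, and the KL cost is lower than at initialization.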

Feature Subset Score Based t-SNE
t-SNE has good data visualization performance. It can keep the structure of different classes without class labels when reducing the dimensionality of data. In the classification problem, multiple features constitute the high-dimensional data. Not all the features are useful for classification. For example, in the malfunction classification of diesel engines, in order to reflect the engine state, many features from different measurement signals are used. Among these features, irrelevant and non-sensitive features may deteriorate the data visualization performance. Consequently, the classification accuracy is reduced. In addition, a large number of features will cause higher computational cost. If an optimal feature subset for classification can be found, the result of data visualization for classification can be effectively improved.
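Suppose the original high-dimensional data space is $X \subseteq \mathbb{R}^{D}$, where $D$ is the number of features (dimensions). The task is to find the optimal projection matrix $W \in \mathbb{R}^{D \times d}$ under an appropriate criterion, and the $D$-feature data $x$ from space $X$ is transformed to the $d$-feature data $y$ by:
$$y = W^{T} x \quad (7)$$
For feature selection, $W$ is a selection matrix whose columns are vectors $w_i \in \mathbb{R}^{D}$, each of which has the form:
$$w_i = [\underbrace{0, \ldots, 0}_{i-1}, 1, \underbrace{0, \ldots, 0}_{D-i}]^{T} \quad (8)$$
Then $W$ can be written as:
$$W_I = [w_{I(1)}, w_{I(2)}, \ldots, w_{I(d)}] \quad (9)$$
where the vector $I$ is a permutation of {1,2,…,D}.
A graph is a natural and effective way to encode the relationships among data. It has been applied in many machine learning tasks, such as clustering [31], subspace learning [32,33], and manifold learning [34]. For the task of feature selection, two weighted matrices are usually constructed, $A_w$ and $A_b$. $A_w$ reflects the within-class or local affinity relationship, and $A_b$ reflects the between-class or global affinity relationship between $x_i$ and $x_j$ [35]. These two weighted matrices are constructed according to the Fisher score method [36].
The optimal features for visualization should make $\sum_{ij} \|y_i - y_j\|^2 (A_w)_{ij}$ as small as possible and $\sum_{ij} \|y_i - y_j\|^2 (A_b)_{ij}$ as large as possible. To achieve these two goals, the optimization problem can be given by:
$$\max_{W} \frac{\sum_{ij} \|y_i - y_j\|^2 (A_b)_{ij}}{\sum_{ij} \|y_i - y_j\|^2 (A_w)_{ij}} \quad (10)$$
The natural solution to this problem is to solve a trace ratio optimization problem, which can be represented by:
$$W^{*} = \arg\max_{W} \frac{\mathrm{tr}(W^{T} X L_b X^{T} W)}{\mathrm{tr}(W^{T} X L_w X^{T} W)} \quad (11)$$
In Equation (11), $X = [x_1, \ldots, x_n]$ is the data matrix, and $L_w$ and $L_b$ are the Laplacian matrices of $A_w$ and $A_b$: $L_w = D_w - A_w$, where $D_w$ is a diagonal matrix with $(D_w)_{ii} = \sum_{j} (A_w)_{ij}$, and $L_b$ is defined analogously. The score of a feature subset is defined as:
$$\mathrm{score}(W_I) = \frac{\mathrm{tr}(W_I^{T} X L_b X^{T} W_I)}{\mathrm{tr}(W_I^{T} X L_w X^{T} W_I)} \quad (12)$$
The task of feature selection is to find the feature subset with the maximum score, which can be represented by Equation (13):
$$I^{*} = \arg\max_{I} \mathrm{score}(W_I) \quad (13)$$
An iterative algorithm that efficiently solves this optimization problem is proposed in [33]. Define
$$\lambda = \frac{\mathrm{tr}(W_I^{T} X L_b X^{T} W_I)}{\mathrm{tr}(W_I^{T} X L_w X^{T} W_I)} \quad (14)$$
For the current $\lambda$, the score of the $i$-th feature can be computed by:
$$f_{score}(i) = w_i^{T} X (L_b - \lambda L_w) X^{T} w_i \quad (15)$$
The $d$ features with the largest scores are selected to construct $W_I$ according to $f_{score}$. Then $\lambda$ is updated by Equation (14). The maximum $\lambda$ can be obtained by this iterative procedure. After the convergence condition is reached, the optimal feature subset can be computed by Equation (7). The algorithm of this feature subset score based t-SNE (FSS-t-SNE) data visualization method is illustrated in Algorithm 1.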
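The iterative subset selection can be sketched as follows. This is a sketch under stated assumptions, not a reproduction of Algorithm 1. It assumes the common Fisher-score graph weights (within-class weight 1/n_c for samples of the same class c, between-class weight 1/n minus the within-class weight). Because the selection matrix only picks coordinates, each trace reduces to a sum of per-feature quadratic forms. The function names, the starting subset, and the toy data are hypothetical.

```python
def fisher_affinities(labels):
    """Assumed Fisher-score weights: A_w (within-class) and A_b (between-class)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    A_w = [[0.0] * n for _ in range(n)]
    A_b = [[1.0 / n] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if labels[j] == labels[k]:
                A_w[j][k] = 1.0 / counts[labels[j]]
                A_b[j][k] = 1.0 / n - 1.0 / counts[labels[j]]
    return A_w, A_b

def laplacian_quadratic(values, A):
    """x^T L x for L = D - A, computed as 0.5 * sum_jk A_jk (x_j - x_k)^2."""
    n = len(values)
    return 0.5 * sum(A[j][k] * (values[j] - values[k]) ** 2
                     for j in range(n) for k in range(n))

def select_features(samples, labels, d, max_iter=100, tol=1e-12):
    """Trace-ratio style selection of d features; returns (indices, lambda)."""
    A_w, A_b = fisher_affinities(labels)
    D = len(samples[0])
    columns = [[s[i] for s in samples] for i in range(D)]
    b = [laplacian_quadratic(col, A_b) for col in columns]  # between-class term
    e = [laplacian_quadratic(col, A_w) for col in columns]  # within-class term
    subset = list(range(d))  # arbitrary starting subset
    lam = sum(b[i] for i in subset) / sum(e[i] for i in subset)
    for _ in range(max_iter):
        scores = [b[i] - lam * e[i] for i in range(D)]  # per-feature score
        subset = sorted(range(D), key=lambda i: scores[i], reverse=True)[:d]
        new_lam = sum(b[i] for i in subset) / sum(e[i] for i in subset)
        converged = abs(new_lam - lam) <= tol
        lam = new_lam
        if converged:
            break
    return sorted(subset), lam

# Made-up data: features 0 and 2 separate the classes, feature 1 is noise.
samples = [[0.0, 1.0, 0.1], [0.2, 3.0, 0.0], [0.1, 2.0, 0.2],
           [1.0, 2.0, 1.1], [1.2, 1.0, 0.9], [1.1, 3.0, 1.0]]
labels = [0, 0, 0, 1, 1, 1]
subset, lam = select_features(samples, labels, d=2)
```

On this toy data the procedure discards the noisy feature 1 and keeps features 0 and 2. The selected columns would then be passed to t-SNE in place of the full feature set.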

UCI Dataset Test for the FSS-t-SNE Method Validation
A dataset from the UCI machine learning repository is used to validate the proposed method [37]. We chose the "Australian Sign Language signs (High Quality)" dataset from this repository. The samples of this dataset were captured from a native signer using high-quality position trackers [38]. This dataset is usually used for classification tasks. In this test, four different signs are selected for data visualization and classification. These four signs are "change mind", "answer", "forget" and "deaf", which are represented by Sign1, Sign2, Sign3 and Sign4, respectively. The dataset has 22 original features. The features consist of pattern data recorded for each hand, such as palm positions, palm degrees, finger bend measures, etc. More information can be obtained from the website of the UCI machine learning repository.
The number of features in optimal feature subset is set to 4. As FSS-t-SNE is a proposed method based on t-SNE, the results of the t-SNE method are used for comparison. The data visualization results are shown in Figure 1. Figure 1a,b show the visualization results of two signs. In Figure 1a, it is hard to find a dividing line between Sign 1 and Sign 2. In Figure 1b, the two signs can be easily separated. Figure 1c,d show the results of three signs. Figure 1e,f show the results of four signs.

Figure 1. (a) t-SNE for two signs; (b) FSS-t-SNE for two signs; (c) t-SNE for three signs; (d) FSS-t-SNE for three signs; (e) t-SNE for four signs; (f) FSS-t-SNE for four signs.
According to the results, FSS-t-SNE can improve the data visualization performance for classification. In order to evaluate the improvement for classification, the k-Nearest Neighbor (KNN) classifier is used to classify the different signs. In the KNN classifier, the number of neighbors is set to 1. Ten samples of each sign are used for training the classifier. The classification accuracy results are shown in Table 1. They show that FSS-t-SNE can effectively improve the classification accuracy.
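The evaluation step, a KNN classifier with one neighbor applied to the two-dimensional map, can be sketched as below. The embedded coordinates and labels are made up for illustration and are not taken from the UCI test; the function names are hypothetical.

```python
import math

def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to query (k = 1)."""
    best = min(range(len(train_points)),
               key=lambda i: math.dist(train_points[i], query))
    return train_labels[best]

def classification_accuracy(train_points, train_labels, test_points, test_labels):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(nearest_neighbor_predict(train_points, train_labels, p) == t
               for p, t in zip(test_points, test_labels))
    return hits / len(test_points)

# Hypothetical 2-D map coordinates for two signs.
train_points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.5, 4.8)]
train_labels = ["Sign1", "Sign1", "Sign2", "Sign2"]
test_points = [(0.2, 0.1), (5.2, 5.1), (2.0, 2.0)]
test_labels = ["Sign1", "Sign2", "Sign2"]

acc = classification_accuracy(train_points, train_labels, test_points, test_labels)
```

The third test point lies between the clusters and is assigned to the wrong sign, which is the kind of error that overlapping clusters in the map produce.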

Feature Set Construction from Multi-Sensor Signals
The combustion process is an important part of the working cycle of a diesel engine, and most malfunctions of diesel engines are reflected in it. For instance, when a fault occurs in the fuel pump of a cylinder, it causes fuel supply problems, and the combustion in the corresponding cylinder is inevitably weaker. As mentioned in Section 1, cylinder vibration and pressure are widely used in the condition monitoring of diesel engines. Therefore, the combustion segment is extracted from these two signals to analyze the different malfunctions of diesel engines. Figure 2 shows the combustion segment from the cylinder vibration signal, and Figure 3 shows the combustion segment from the cylinder pressure signal. The combustion of the diesel engine is a very intricate process, and it is hardly possible to classify the different malfunctions by a single feature. Therefore, more features should be extracted to construct a feature set. According to Figures 2 and 3, the combustion segments from the two measurement methods are both non-stationary signals. In general, when malfunctions occur, they are reflected in the time-domain parameters of the signals. In this paper, 15 time-domain features are considered. They are described in Table 2.

Table 2. Time-domain features (columns: Feature, Equation). The first listed feature is the Root Mean Square.
For each measurement method, the original signal and its envelope signal are used to compute the time-domain features. Then, a feature set with 60 dimensions is constructed (15 features for each of the two signal forms of each of the two sensors). This high-dimensional data is used as the input of the proposed data visualization method. Based on the result of the data visualization, the classification task is performed by a classifier. The procedure of malfunction classification on a diesel engine is shown as a flowchart in Figure 4.
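A few time-domain features of a signal segment can be computed as in the sketch below. Only the root mean square is named in the surviving part of Table 2; the peak value, crest factor, and kurtosis are common time-domain statistics included here as assumptions, and the sample values are made up. Envelope extraction is not shown.

```python
import math

def rms(x):
    """Root mean square of the segment."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def peak(x):
    """Largest absolute amplitude (assumed feature, not confirmed by Table 2)."""
    return max(abs(v) for v in x)

def crest_factor(x):
    """Peak divided by RMS (assumed feature)."""
    return peak(x) / rms(x)

def kurtosis(x):
    """Fourth central moment over squared variance (assumed feature)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    return m4 / (var * var)

# Made-up combustion segment samples.
segment = [0.0, 2.0, 0.0, -2.0]
features = [rms(segment), peak(segment), crest_factor(segment), kurtosis(segment)]

# Size of the full feature vector described in the text:
# 15 features x (original + envelope) x (vibration + pressure).
n_dimensions = 15 * 2 * 2
```

Applying the full list of 15 features to the original and envelope signals of both sensors and concatenating the results would give the 60-dimensional input vector.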

Diesel Engine Malfunction Experiment
In order to validate the proposed method for malfunction classification of a diesel engine, a diesel engine malfunction experiment was performed. In this experiment, a high-power marine diesel engine that was being inspected before shipment was used. A drawing of this diesel engine is shown in Figure 5. The basic technical data of this diesel engine are shown in Table 3. The structural representation of this engine is shown in Figure 6. The engine has 16 cylinders arranged in two banks, named Bank A and Bank B. The malfunctions were created on cylinder A5. Figure 7 shows the setup of cylinder A5. The vibration sensor was mounted on top of the cylinder. The pressure sensor was mounted in the threaded hole on the top of the cylinder.
The specifications of the malfunctions created in this experiment are described in Table 4. Because the combustion process of a diesel engine is affected by many parameters, such as the engine load, engine speed, engine age, and exact fuel injection time, the cylinder pressure signals and cylinder vibration signals may vary when these parameters change. In order to reduce the effect of engine age, a diesel engine that was being inspected before shipment was used to perform the bench experiment. In this bench experiment, parameters such as engine load, engine speed, fuel injection time, and fuel condition were kept the same across conditions. All the malfunctions were created individually. Three different malfunction conditions and the normal working condition were tested in this experiment. To create a lack of fuel supply, the fuel supply controller was adjusted to supply less fuel. To create bad fuel atomization, the fuel atomizer nozzle of the corresponding cylinder was replaced with a faulty one. To create an exhaust valve leakage, a hole was milled in the exhaust valve of the corresponding cylinder. Under each condition, the engine was run at 835 ± 3 rpm. Signal sampling was performed when the engine was running in a stationary state. Cylinder vibration and pressure signals were sampled from cylinder A5. Figure 8a shows a segment of the sampled cylinder vibration signal. There are many vibration sources in the sampled vibration data, including intake valve vibrations, exhaust valve vibrations, fuel pump vibrations, and combustion vibrations. The combustion vibration is used to analyze the different conditions of the diesel engine. The red circles in Figure 8a indicate the combustion vibrations. Figure 8b shows a segment of the sampled cylinder pressure signal.

Results and Discussion
The cylinder vibration and pressure signals were sampled under four different conditions in the malfunction experiment, including normal condition and three malfunction conditions. Different features are extracted from these signals. The high-dimensional data of the four conditions are used by the data visualization method for classification. The FSS-t-SNE method is compared with other methods, including linear methods such as PCA and LDA, and non-linear methods such as ISOMAP and t-SNE. The data visualization results are shown in Figure 9.
The linear methods, PCA and LDA, which are shown in Figure 9b,c, can only show three different clusters. In Figure 9b, the LOFS condition does not cluster well. Compared with the PCA method, the LDA method shown in Figure 9c has better clustering performance. However, in both figures, the NORMAL condition and the LOEV condition overlap completely, so these two conditions cannot be separated by either method. The non-linear methods, ISOMAP and t-SNE, are shown in Figure 9d,e. They also show only three different clusters. Of the two linear methods above, the ISOMAP result more closely resembles that of PCA.
t-SNE has the best performance among these methods. In Figure 9e, the points belonging to the same class are closely clustered, whereas those belonging to different classes are clearly separated, except for the NORMAL and LOEV conditions, which still overlap completely. Figure 9f shows the result of the proposed FSS-t-SNE method. It clearly shows four different clusters. Although the NORMAL and LOEV conditions are close to each other, they do not overlap and can be separated easily. Moreover, the good clustering performance of the t-SNE method is retained. This comparison demonstrates that the proposed method, FSS-t-SNE, outperforms the other methods in data visualization of diesel engine multi-sensor data under different conditions.
In the proposed FSS-t-SNE method, the number of features contained in the optimal feature subset is a variable parameter. According to the principle of feature selection, different numbers of features form different optimal feature subsets, which perform differently in data visualization. Figure 10 shows the data visualization results of optimal feature subsets that contain different numbers of features selected by FSS-t-SNE. When 55 features are selected, the result is nearly the same as the t-SNE result shown in Figure 9e: the NORMAL and LOEV conditions overlap completely. As the number of features decreases, the two overlapping classes gradually separate, and the clustering performance also improves. From Figure 10e, it can be seen that the two classes are separated when 10 features are selected.
In order to analyze how the classification performance changes with the selected features, the KNN classifier is used to compute the classification accuracy for the four conditions. In the KNN classifier, the number of neighbors is set to 1.
Ten samples of each condition of the diesel engine are used for training the KNN classifier. The accuracy curve is shown in Figure 11. According to the accuracy curve, FSS-t-SNE achieves high accuracy, above 85%, when the number of selected features is less than 50. When 20 features are selected, the malfunction classification has the highest accuracy. The accuracy decreases once the number of features drops below 10. As shown in Figure 11, the accuracy curve is not smooth; it fluctuates as the number of features changes. There are two main causes for this result. First, FSS-t-SNE initializes the low-dimensional data with random data points. Selecting different features may lead to different distributions of the data points that belong to the same class, while the samples used for training the KNN classifier remain the same. This may affect the classification accuracy of the KNN classifier. Second, the samples for computing the classification accuracy are limited. Because our study targets a high-power marine diesel engine and we wanted to reduce the effect of engine age, we used a brand-new engine that was being inspected before shipment. The malfunction experiment may damage a new engine. Therefore, the engine could only be run for a short time under each malfunction condition, and only a limited number of samples were collected. Although the accuracy curve fluctuates, its overall trend shows that the classification accuracy is improved.

Conclusions
A diesel engine is a complex dynamic machine. It is difficult to evaluate the state of the engine by a single feature extracted from a single measurement signal. In order to solve this problem, various features extracted from multi-sensor signals are used to identify and classify the different states or malfunctions of the engine. These features constitute a high-dimensional dataset, so the identification and classification problem becomes a pattern recognition problem on high-dimensional data. Data visualization using dimensionality reduction is an effective way to address it. In the malfunction classification of diesel engines, not all the features are sensitive to a specific malfunction condition, and some features may degrade the classification result. Irrelevant and non-sensitive features should therefore be eliminated before data visualization. For this purpose, a feature subset score based t-SNE data visualization method is proposed in this paper. The optimal feature subset is obtained by the feature subset score criterion, and the non-linear dimensionality reduction method t-SNE is then used for data visualization. Malfunction experiments with a high-power diesel engine were performed to validate the proposed method. According to the experimental results, the proposed method has high classification accuracy in malfunction classification of the diesel engine.