A Study on the Anomaly Detection of Engine Clutch Engagement/Disengagement Using Machine Learning for Transmission Mounted Electric Drive Type Hybrid Electric Vehicles

Abstract: Transmission mounted electric drive type hybrid electric vehicles (HEVs) engage/disengage an engine clutch when EV↔HEV mode transitions occur. If this engine clutch is not adequately engaged or disengaged, driving power is not transmitted correctly. Therefore, it is required to verify whether engine clutch engagement/disengagement operates normally in the vehicle development process. This paper studied machine learning-based methods for detecting anomalies in the engine clutch engagement/disengagement process. We trained various models based on multi-layer perceptron (MLP), long short-term memory (LSTM), convolutional neural network (CNN), and one-class support vector machine (one-class SVM) with actual vehicle test data and compared their results. The test results showed that the one-class SVM-based models have the highest anomaly detection performance. Additionally, we found that configuring the training architecture to determine normal/anomaly by data instance and conducting one-class classification is proper for detecting anomalies in the target data.


Introduction
A transmission mounted electric drive (TMED) type hybrid electric vehicle (HEV) is a parallel hybrid electric vehicle with a structure in which an engine clutch is mounted between an engine and a motor that is connected to a transmission input shaft. In this vehicle structure, the engine clutch is released in EV driving mode, which drives the vehicle only with the motor. The engine clutch is coupled in HEV driving mode, which drives the vehicle with the engine and motor together [1]. According to a power distribution strategy, a hybrid control unit (HCU) drives the vehicle in EV mode or HEV mode [2][3][4][5]. Therefore, EV↔HEV mode transitions can occur when the vehicle is driving, and the engine clutch is engaged or disengaged. If the engine clutch is not adequately engaged or disengaged, power is not transmitted correctly. Thus, it is necessary to verify whether engine clutch engagement/disengagement operates normally in the vehicle development process.
Studies on fault or anomaly detection for vehicle powertrains have been carried out with various approaches. They can be classified into rule-based methods [6][7][8][9][10][11][12], mathematical model-based methods, and data-driven methods. This paper detects anomalies in the engine clutch engagement/disengagement process required for EV↔HEV mode transitions, which is a crucial function of TMED type HEVs. We used data-driven methods to make them easy to apply to various vehicle data in the future. Additionally, previous studies' rule-based methods and mathematical model-based methods have limitations when applied to the target control function. The rule-based techniques are difficult to apply to complex control functions because they generally use simple rules. In addition, it is not easy to construct a mathematical model for the engine clutch engagement/disengagement process. As noted earlier, little research has been conducted on detecting anomalies at the system operation level of powertrain control. Therefore, we used the basic and most widely used learning architectures. We used multi-layer perceptron (MLP), long short-term memory (LSTM), convolutional neural network (CNN), and one-class support vector machine (one-class SVM) to train the models for anomaly detection and compared the trained models. MLP is the most basic neural network architecture, and CNN and LSTM are the most widely used learning architectures recently. To investigate the performance of the trained models on actual vehicle data, we used real vehicle test data. As a result of the study, we found that the one-class classification method is the most effective.
This paper is organized as follows. Section 2 introduces the structure of TMED type HEVs and the engine clutch engagement/disengagement process in more detail. Section 3 describes the basic data preprocessing for model training and testing, and Section 4 explains the anomaly detection model training methods. Section 5 shows the test results for the trained models, and the conclusions are described in Section 6.

Target Vehicle
This paper's target vehicle is a TMED type parallel HEV with the following powertrain structure, a structure from which we were able to obtain actual vehicle data. In Figure 1, MG1 represents the BSG (belt-driven starter and generator), MG2 represents the main traction motor, ENG represents the engine, BAT represents the high voltage battery, TM represents the transmission, and FD represents the final drive.

Parallel HEV structures can be classified into P0, P1, P2, P3, and P4 depending on the motor's position, as shown in Figure 2. In the case of P0 and P1 structures, vehicle electrification can be achieved at a low cost, but the fuel efficiency improvement effect is relatively low. In contrast, P2-P4 structures can improve fuel efficiency more, but they are characterized by high system complexity and high construction cost [45]. The target vehicle structure, TMED type HEV, can be seen as a P0 + P2 structure and has the advantages of improving fuel efficiency using the P2 motor and enabling engine start/generation using the P0 motor.


Target Data
The TMED type HEV shown in Figure 1 drives the vehicle using MG2 in EV driving mode and drives the vehicle mainly using the engine and MG2 in HEV driving mode. According to a power distribution strategy, an HCU drives the vehicle in EV mode or HEV mode. Therefore, EV↔HEV mode transitions can occur while the vehicle is driving. For an HEV→EV mode transition, an HCU gives an EV mode transition command, and then the engine clutch is disengaged with a clutch pressure drop. Additionally, an EV→HEV mode transition is accomplished through the following process [46]:
1. Cranking an engine using MG1 or MG2;
2. Speed synchronization of both sides of the engine clutch (engine and traction motor speed synchronization);
3. Engine clutch engagement and transition to HEV mode.
This paper tried to detect anomalies related to this engine clutch engagement/disengagement occurring in EV↔HEV mode transitions. Table 1 shows the cases of representative anomalous behavior for such data. An engine clutch engagement failure is the case in which the speed difference between an engine and a motor is higher than a certain level, although an HCU applies an engine clutch command to be fully engaged. The engine clutch disengagement failure is the case in which the speed difference between an engine and a motor is lower than a certain level because the clutch is not released correctly, although an HCU applies an engine clutch command to be released. The clutch pressure command following failure is the case in which the clutch pressure does not follow a clutch pressure command from the HCU or TCU (transmission control unit). We collected actual vehicle test data, including the following anomalous behavior cases, and trained the models to detect anomalies with these data.

Table 1. Representative anomalous behaviors in engine clutch engagement/disengagement.

Case Description
Engine clutch engagement failure

The speed difference between an engine and a motor exceeds a certain level, although an HCU applies an engine clutch command to be fully engaged. (This case means the engine clutch is not fully engaged as intended. For the target vehicle, as the engine clutch connects the engine and the motor, there should be no speed difference between the engine and the motor when the HCU commands full engagement of the engine clutch in normal conditions.)

Engine clutch disengagement failure
The speed difference between an engine and a motor is less than a certain level over a certain time, although an HCU applies an engine clutch command to be released/open and an engine operating mode command to be off for EV mode. (This case means the engine clutch is not released as intended in EV mode. As a result, the speed difference is small because the engine clutch still connects the engine and the motor. In normal conditions, there should be a speed difference between the engine and the motor because the motor has speed according to the vehicle speed and the engine speed is zero due to the engine off command.)

Clutch pressure command following failure

The difference between a clutch pressure command value and a clutch pressure sensor value exceeds a certain level over a certain time.
(This case means the engine clutch pressure does not follow a command. In normal conditions, the difference between the pressure command and pressure sensor value should be small enough. When the pressure command changes significantly, hydraulic generation delay can slightly increase this difference, but the duration should not be long.)

Appl. Sci. 2021, 11, 10187
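The three anomaly cases in Table 1 can be sketched as simple checks on the recorded signals. This is an illustrative sketch only: the signal names, the thresholds, and the omission of the "over a certain time" duration conditions are all assumptions, not the paper's actual detection logic or calibration values.

```python
import numpy as np

# Hypothetical thresholds for illustration only.
SPEED_DIFF_MAX_RPM = 50.0    # engagement-failure level
SPEED_DIFF_MIN_RPM = 30.0    # disengagement-failure level
PRESSURE_ERR_MAX_BAR = 0.5   # pressure-following level

def engagement_failure(eng_spd, mot_spd, clutch_cmd_full):
    """Speed difference exceeds a level while full engagement is commanded."""
    diff = np.abs(np.asarray(eng_spd) - np.asarray(mot_spd))
    return bool(np.any(diff[np.asarray(clutch_cmd_full, bool)] > SPEED_DIFF_MAX_RPM))

def disengagement_failure(eng_spd, mot_spd, clutch_cmd_open):
    """Speed difference stays below a level while release is commanded."""
    diff = np.abs(np.asarray(eng_spd) - np.asarray(mot_spd))
    return bool(np.all(diff[np.asarray(clutch_cmd_open, bool)] < SPEED_DIFF_MIN_RPM))

def pressure_following_failure(p_cmd, p_sensor):
    """Pressure sensor value deviates from the command beyond a level."""
    err = np.abs(np.asarray(p_cmd) - np.asarray(p_sensor))
    return bool(np.any(err > PRESSURE_ERR_MAX_BAR))
```

Such rule checks were used only to label the collected patterns; the trained models described below learn the anomalies directly from the data.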

Data Interpolation
We used the data acquired through actual vehicle tests to train the models that can detect anomalies. The target data are the signals on the CAN bus related to engine clutch engagement/disengagement, for example, engine speed, clutch status, and the clutch hydraulic pressure command from the HCU or TCU. Because the target signals are transmitted from various controllers, their sampling times and periods differ slightly. To synchronize these sampling times, we defined a time vector with a specific period and then conducted linear interpolation of the target signals to this time vector.
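The synchronization step above can be sketched as follows: each signal, recorded with its own controller-specific sample times, is linearly interpolated onto a common time vector. The 10 ms period is an assumption for illustration, not the paper's actual setting.

```python
import numpy as np

def resample_signals(signals, period=0.01):
    """signals: dict name -> (t, y), each with its own sample times.
    Returns the common time vector and the interpolated signals."""
    # Restrict to the time range covered by every signal.
    t_start = max(t[0] for t, _ in signals.values())
    t_end = min(t[-1] for t, _ in signals.values())
    t_common = np.arange(t_start, t_end + 1e-12, period)
    # Linear interpolation of every signal onto the common time vector.
    return t_common, {name: np.interp(t_common, t, y)
                      for name, (t, y) in signals.items()}
```

Restricting to the overlapping time range avoids extrapolating beyond any signal's recorded samples.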

Target Data Section Extraction (Pattern Extraction)
When driving, an HEV operates in various driving modes according to an HCU's power distribution strategy. Accordingly, there are cases where an engine clutch is not engaged. For example, an engine clutch is disengaged in EV driving mode, and this is not related to the target situation. To deal with only the related data, we extracted the data from when an engine clutch is engaged to when an engine clutch is disengaged based on the engine clutch control state command signal from the HCU. We then defined one extracted data section as a pattern, as shown in Figure 3. If there are any anomalies in a pattern, the pattern is labeled as an anomaly pattern.
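The pattern extraction above can be sketched as edge detection on the clutch command signal. The binary encoding (1 = engagement commanded, 0 = released) is an assumption for illustration.

```python
import numpy as np

def extract_patterns(clutch_state_cmd):
    """Return (start, end) index pairs of engaged sections (end exclusive)."""
    cmd = np.asarray(clutch_state_cmd, dtype=int)
    # Pad with zeros so sections touching either boundary are still closed.
    edges = np.diff(np.concatenate(([0], cmd, [0])))
    starts = np.flatnonzero(edges == 1)    # rising edges: engagement begins
    ends = np.flatnonzero(edges == -1)     # falling edges: disengagement
    return list(zip(starts, ends))
```

Each returned index pair delimits one pattern; the target signals sliced over that range form one training/test instance.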

The number of normal/anomaly patterns extracted through this process is shown in Table 2. We can see that the number of anomaly patterns is much less than the number of normal patterns. This is because developed vehicle control functions are generally first verified through tests such as model-in-the-loop simulation (MILS) and hardware-in-the-loop simulation (HILS) before being applied to actual vehicles. These tests examine unintended behaviors and improve the control functions' quality, resulting in fewer anomalous data for controllers and control logic installed in actual vehicles. This normal/anomaly data imbalance is a common phenomenon that occurs not only in vehicles but also in other manufacturing industries. If there are too few anomalous data, it is challenging to learn the characteristics of anomalous data. Therefore, we composed training and test data by copying the anomaly patterns, as shown in Table 3. Because the acquired anomalous data are representative anomalous data of the target control function, we copied the anomalous data directly. However, it is difficult to obtain as much anomalous data as normal data. To address this, we copied anomalous data so that the ratio of normal to anomalous data was about 3:1. The ratio of 3:1 is an arbitrarily determined value.

Table 2. The number of engine clutch engagement/disengagement patterns for model training and test (before the copy of anomaly patterns).

                 Normal Patterns    Anomaly Patterns
Number of data        1878                 25

Table 3. The number of engine clutch engagement/disengagement patterns for model training and test (after the copy of anomaly patterns).

                 Normal Patterns    Anomaly Patterns
Number of data        1878                625
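The copy-based balancing behind Tables 2 and 3 can be sketched as below: the anomaly patterns are replicated until the normal:anomaly ratio is roughly the chosen 3:1, so the 25 anomaly patterns become the 625 shown in Table 3 (25 copies of each). The function name is an assumption for illustration.

```python
def balance_by_copying(normal, anomaly, ratio=3):
    """Replicate anomaly patterns toward a normal:anomaly ratio of `ratio`:1."""
    copies = max(1, round(len(normal) / (ratio * len(anomaly))))
    return normal, anomaly * copies
```

Duplicating rare classes this way is a simple form of oversampling; the paper copies the patterns directly because the acquired anomalies are representative of the target control function.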

Anomaly Detection Model Training
This section describes the model training methods that can detect engine clutch engagement/disengagement anomalies using the data preprocessed in Section 3. We used MLP, LSTM, CNN, and one-class SVM architectures to train the models. We trained the models with various hyperparameters for each architecture and compared the results.

Multi-Layer Perceptron (MLP)
MLP is a neural network built by sequentially stacking several layers composed of perceptrons. Figure 4 shows the structure of a perceptron [47]. As shown in Figure 4, a single perceptron adds up all the weighted inputs and a bias and then calculates an output h by passing the summed value to an activation function, as shown in Figure 5.
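The single perceptron described above can be sketched in a few lines: the weighted inputs and the bias are summed, and the sum is passed through an activation function (hyperbolic tangent here, as used later for the MLP-based models).

```python
import numpy as np

def perceptron(x, w, b, activation=np.tanh):
    """Output h = activation(w . x + b) of a single perceptron."""
    return activation(np.dot(w, x) + b)
```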

We trained the engine clutch engagement/disengagement anomaly detection model using MLP by configuring the input/output structure as shown in Figure 6. Because MLP can receive one-dimensional input data only, the target signals for each pattern were configured and inputted in one dimension, as shown in Figure 6. Although MLP receives one-dimensional data, we expected MLP to learn data patterns because the units of the hidden layers are all connected to the input nodes. There are also studies in which MLP learned time-series data [48,49]. Before inputting data to MLP, because the length of the target signal in a pattern may differ for each pattern, it is necessary to match the length of the signal and then input it into the network.
Accordingly, the data were constructed in one dimension after filling the insufficient data points with zeroes in line with the pattern that had the longest data length. Equation (1) below is the example of a pattern with a length of 5, filled to a length of 10:

x_Data = [x_1 x_2 x_3 x_4 x_5]^T, x_Data,Input = [x_1 x_2 x_3 x_4 x_5 0 0 0 0 0]^T   (1)

In the equation, x_Data is the preprocessed target signal vector, and x_Data,Input is the target signal vector that is inputted into the network; x_Data,Input configured in this way is reorganized into one dimension in the way shown in Figure 6 and input into the network. We composed the output as normal/anomaly per pattern to determine normal/anomaly considering data trends over time. The two output units shown in Figure 6 are [1 0]^T in the case of normal and [0 1]^T in the case of anomaly. This output unit configuration follows the setting of the Matlab MLP training app we used. This app determines the number of output neurons according to the number of output classes.
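The zero-padding of Equation (1) and the flattening into the one-dimensional MLP input can be sketched as follows; the pattern lengths are illustrative.

```python
import numpy as np

def pad_to_length(x, n):
    """Pad a signal with trailing zeros up to the longest pattern length n."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, np.zeros(n - len(x))])

def to_mlp_input(signals, n):
    """Flatten all padded target signals of one pattern into one 1-D vector."""
    return np.concatenate([pad_to_length(s, n) for s in signals])
```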
The number of hidden layers and hidden units for the MLP-based models were composed as shown in Table 4. We set these values to appropriate values through trial and error. This configuration is for comparing training results according to the number of hidden units per hidden layer and training results according to the number of hidden layers. For a clear comparison, we composed values with large differences. A hyperbolic tangent function was used as the activation function. In the case of the MLP-based models, unlike the other training architectures, the trained models tended to overfit when the data were divided into training and test data only. Therefore, we trained the MLP-based models by dividing the data into training, validation, and test sets. The proportions of the training, validation, and test sets were 70%, 15%, and 15%, respectively, and the data were randomly sampled from the data shown in Table 3.

Table 4. Hidden layer and hidden unit settings for MLP-based anomaly detection models.


Long Short-Term Memory (LSTM)
A recurrent neural network (RNN) is mainly used to learn ordered or time-series data, such as in natural language processing and speech recognition [50][51][52][53][54][55][56][57][58][59]. However, an RNN has the vanishing gradient problem, which significantly reduces the learning ability when the distance between the previous output and the point where its information is used is large [60,61]. LSTM is a neural network architecture proposed to solve this vanishing gradient problem. An LSTM network is composed of multiple connected LSTM cells, as shown in Figure 7 [62].
Equations (2)-(7) are the equations for an LSTM cell unit. In each equation, W_q, U_q, and b_q (q = f, i, o, c) denote the weights and biases, respectively. Here, x_t represents the input vector of the LSTM cell unit, f_t represents the forget gate's activation vector, i_t represents the input/update gate's activation vector, o_t represents the output gate's activation vector, c̃_t represents the cell input activation vector, c_t represents the cell state vector, h_t represents the hidden state vector that is known as the LSTM cell unit's output vector, and ∘ represents the Hadamard product [63].
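The bodies of Equations (2)-(7) were lost in extraction; a standard LSTM cell formulation consistent with the gate definitions above (with σ the logistic sigmoid) is:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right)\\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right)\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right)\\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right)\\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t\\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
```

The forget gate f_t scales the previous cell state, the input gate i_t scales the new cell input activation c̃_t, and the output gate o_t scales the squashed cell state to produce the hidden state h_t.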
We trained the engine clutch engagement/disengagement anomaly detection model using LSTM by configuring the input/output structure as shown in Figure 8. The network was constructed by sequentially connecting the LSTM layer, fully connected layer, softmax layer, and classification layer, as shown in the figure. We inputted target signals per pattern into the LSTM layer and configured normal/anomaly per pattern as the output of the network. We also matched the LSTM's input data length to the length of the longest pattern using Equation (1). The data were normalized so that the average of the input data is zero before inputting the data.
The number of LSTM layers and hidden units for the models based on LSTM were composed as shown in Table 5. We set these values to appropriate values through trial and error. This configuration is for comparing training results according to the number of hidden units per LSTM layer and training results according to the number of LSTM layers. However, as there are many learning parameters in each LSTM layer, the model can overfit if there are many layers. Therefore, to prevent this, we reduced the number of hidden units per layer if there were three LSTM layers. For training, 80% of the data shown in Table 3 were randomly sampled, and the other data were used as test data.

Table 5. LSTM layer and hidden unit settings for the LSTM-based anomaly detection models.


Convolutional Neural Network (CNN)
A CNN is the network architecture that learns directly from data, eliminating the need for manual feature extraction. CNNs are particularly useful for finding patterns in images to recognize objects, faces, and scenes. They can also be quite effective for classifying non-image data such as audio, time series, and signal data [64][65][66][67][68][69][70][71][72][73]. Figure 9 shows an example of image classification using a CNN [65]. As shown in the figure, a convolution operation is repeatedly performed to extract features, and extracted features are entered into the fully connected network to classify the images.
We trained the engine clutch engagement/disengagement anomaly detection model using a CNN by configuring the input/output structure as shown in Figure 10. As shown in the figure, architectures with one and three convolution layers were constructed. In [74], the authors configured a channel for time-series data to enter the data into the convolution layer to classify time-series data. Similarly, we configured a channel for each target signal to enter the data into the first convolution layer. The output of the network is the normal/anomaly per pattern, as in the aforementioned MLP and LSTM. For the input signals of the network, we matched the length of the data per pattern in the same way as for the MLP and LSTM, using Equation (1), and then entered the data into the network. The composition of the layers for each architecture in Figure 10 is as follows. First, Table 6 shows the hyperparameter settings of the convolutional layer for each architecture. In the table, [h, w] of the filter size means [height, width (time axis)] of the filter, and [a, b] of the stride means [vertical step size, horizontal step size]. Because the input data are one-dimensional, the height of the filter size and the vertical step size of the stride are always 1. We made the input size and output size of the layer the same by using zero padding.
In the batch normalization layer, z-score normalization is conducted on the input data for each channel. In the ReLU layer, the activation function shown in Figure 5c is applied to the input data. The max pooling layer performs downsampling by outputting the maximum value for a specific region (pooling region) of the input data.
The sizes of the pooling regions of the two max pooling layers of the 3-convolution-layer architecture were set to (1, 3) and (1, 4), respectively, and the stride values were also set to (1, 3) and (1, 4), respectively. Here, (w, h) of the pooling region size means [height, width (time axis)] of the pooling region. The set values for each layer above were selected according to a general method or through trial and error.
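The convolution and pooling operations described above can be sketched in plain NumPy. This is only an illustration: the signal and filter values below are invented, not taken from the paper's data, and the sketch assumes a single channel, "same" zero padding, and a (1, 3) pooling region on a one-dimensional signal.

```python
import numpy as np

def conv1d_same(x, w, stride=1):
    """1-D convolution (cross-correlation) with zero padding chosen so
    that, for stride 1, the output length equals the input length."""
    k = len(w)
    pad = (k - 1) // 2
    xp = np.pad(x, (pad, k - 1 - pad))  # zero padding on both sides
    return np.array([np.dot(xp[i:i + k], w)
                     for i in range(0, len(xp) - k + 1, stride)])

def max_pool1d(x, region, stride):
    """Downsample by outputting the maximum over each pooling region."""
    return np.array([x[i:i + region].max()
                     for i in range(0, len(x) - region + 1, stride)])

# illustrative one-channel signal and an edge-detecting filter
signal = np.array([0., 1., 2., 3., 2., 1., 0., 1., 2., 3., 2., 1.])
filt = np.array([1., 0., -1.])

features = conv1d_same(signal, filt)                # same length as input
pooled = max_pool1d(features, region=3, stride=3)   # pooling region (1, 3)
```

With zero padding the feature map keeps the input length, and the (1, 3) pooling region with stride (1, 3) reduces the 12 samples to 4.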

The numbers of convolution layers for the CNN-based models were composed as shown in Table 7. This configuration is for comparing training results according to the number of CNN layers. For a clear comparison, we composed values with large differences. For training, 80% of the data shown in Table 3 were randomly sampled, and the remaining data were used as test data.

Table 7. Convolution layer settings for the CNN-based anomaly detection models.

One-Class SVM
Like the target data in this paper, when the number of data per class is imbalanced, a model is sometimes learned using only the class with a large number of data, which is called one-class classification [75]. One-class SVM is a representative method in one-class classification [76]. Because the target data in this study were also disproportionate in the number of normal/anomaly data, as shown in Table 2, we trained the models to detect anomalies in engine clutch engagement/disengagement data using one-class SVM and only normal data. Figure 11 shows the engine clutch engagement/disengagement anomaly detection model training structure using one-class SVM. For effective training, we performed z-score normalization and principal component analysis (PCA). The one-class SVM model was trained with data dimensionally reduced to the principal component space through PCA. We used data projected into the three-dimensional principal component space because the principal component contribution analysis of the target signals showed that the cumulative contribution rate of the three principal components was 71-77%.

The input/output data structure for training the one-class SVM model consisted of two types. The first type consisted of target data per pattern as input and normal/anomaly per pattern as output (Type 1). The second type consisted of normal/anomaly per data instance of the target signals as output, and the input data were the same as Type 1 (Type 2). The model learned with this configuration determines the normal/anomaly state of each data instance. Training examples for each type are shown in Figure 12.

However, for vehicle data, the duration of a particular situation can be a criterion for normal/anomaly status. For example, for hydraulic pressure following errors in engine clutches, errors above a certain level may occur for a short time because it takes a certain amount of time for hydraulic pressure to be generated. This situation should be seen as normal. As configuring normal/anomaly per data instance as the output of the model makes it difficult to determine normal/anomaly for the duration of this situation, we configured the signal duration (DT_k, k = 1, 2, ...) as shown in Figure 13 as additional input data to the model. As shown in the figure, the signal duration was derived by separating the intervals of the signal based on when any target signal changes; for this, continuous signals should be discretized. If the configured signal duration satisfies an anomaly criterion, all data instances in the corresponding signal interval are labeled as anomalies.
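As an illustration of how such a signal duration could be derived, the following sketch segments discretized signals at every sample where any signal changes and assigns each data instance the duration DT_k of the interval it belongs to. The function name, sample values, and sample period are hypothetical, not from the paper.

```python
import numpy as np

def signal_durations(signals, dt):
    """Split the time axis into intervals at every point where any
    (discretized) target signal changes, and return, for each data
    instance, the duration of the interval it belongs to.

    signals: 2-D array, shape (n_signals, n_samples), discretized values
    dt: sample period in seconds (illustrative)
    """
    n = signals.shape[1]
    # an interval boundary occurs wherever any signal differs from the
    # previous sample
    change = np.any(signals[:, 1:] != signals[:, :-1], axis=0)
    boundaries = np.flatnonzero(change) + 1
    durations = np.empty(n)
    start = 0
    for end in list(boundaries) + [n]:
        durations[start:end] = (end - start) * dt
        start = end
    return durations

# two discretized signals sampled every 10 ms (invented values)
sigs = np.array([[0, 0, 0, 1, 1, 1, 1, 2, 2],
                 [5, 5, 5, 5, 5, 6, 6, 6, 6]])
DT = signal_durations(sigs, dt=0.01)
```

Every data instance in an interval then carries that interval's duration as an extra input feature, so an anomaly criterion on duration can label whole intervals at once.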
The models based on one-class SVM were also learned with various hyperparameter configurations, as shown in Table 8. For the discretization method, the method using domain knowledge discretized the target continuous signals densely where dense discretization is required and coarsely where it is not. The areas where dense or coarse discretization is needed were determined by domain knowledge. Dense discretization discretized the target continuous signals to be sufficiently narrow and evenly spaced. The outlier fraction is the predicted ratio of how much anomalous data will be within the training data. A small outlier fraction means predicting that there will be few anomalous data within the training data, and a large outlier fraction means predicting that there will be many anomalous data within the training data. In this work, we performed the one-class SVM learning with only data labeled as normal. Thus, we trained the models with small outlier fraction values. For training, 80% of the data shown in Table 3 were randomly sampled, and the remaining data were used as test data.
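A minimal sketch of this training pipeline, using scikit-learn as an assumed toolchain (the paper does not specify its implementation): z-score normalization, PCA to a three-dimensional principal component space, and a one-class SVM whose `nu` parameter plays the role of the outlier fraction. The synthetic data are purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(500, 8))  # synthetic "normal" instances

# z-score normalization, then reduce to a 3-D principal component space
scaler = StandardScaler().fit(X_normal)
pca = PCA(n_components=3).fit(scaler.transform(X_normal))
Z = pca.transform(scaler.transform(X_normal))

# nu acts as the outlier fraction: the expected share of training data
# allowed to fall outside the learned decision boundary
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(Z)

pred = model.predict(Z)          # +1 = normal, -1 = anomaly
flagged = np.mean(pred == -1)    # roughly bounded by nu on training data
```

Raising `nu` tightens the decision boundary around the bulk of the normal data, which matches the trade-off reported later: higher TPR on anomalies but more normal instances rejected.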

Anomaly Detection Model Test Results
This section describes the test results of the models trained with the architectures constructed in Section 4. We used the data not used for training to test the models and compared the results using the true positive rate (TPR), true negative rate (TNR), and accuracy. The TPR, TNR, and accuracy were calculated using Equations (8) to (10). In the equations, TP means true positive, FN means false negative, TN means true negative, and FP means false positive.
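The three metrics can be computed directly from the confusion counts; the sketch below uses the standard definitions, and the confusion counts shown are invented for illustration, not the paper's results.

```python
def detection_metrics(tp, fn, tn, fp):
    """True positive rate, true negative rate, and accuracy
    from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)                    # true positive rate
    tnr = tn / (tn + fp)                    # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)   # overall accuracy
    return tpr, tnr, acc

# illustrative counts only
tpr, tnr, acc = detection_metrics(tp=90, fn=10, tn=40, fp=60)
```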

Multi-Layer Perceptron (MLP)
Unlike the other architectures, such as LSTM, CNN, and one-class SVM, the MLP-based anomaly detection models showed a rather large performance difference each time they were trained, even with the same hidden layers and hidden units. Therefore, we trained the MLP-based anomaly detection models for each configuration in Table 4 many times and compared the results. Figure 14a below shows the TPR and TNR results, and Figure 14b shows the TNR results according to training iterations for each model. In Figure 14a, we can see that the TPR was mostly high, but the TNR was often low. Additionally, in Figure 14b, we can see that the TNR tended to be low when the number of training iterations was small. That is, models with a low TNR can be viewed as local optima. According to these results, we can conclude that the MLP-based anomaly detection models are capable of adequately distinguishing anomalies from normal data, but locally optimal models with a low TNR can easily be derived. Table 9 shows the average values of the training results for each model. We can see that the TNRs and accuracies are lower than the TPRs because low-TNR cases were often derived, as shown in Figure 14a.
Long Short-Term Memory (LSTM)
Figure 15 shows the accuracy according to the training progress of LSTM-m2, and Table 10 shows the training results of the models per the LSTM layers and hidden units configured as shown in Table 5. The numbers 10-90 shown above the graph's x-axis in the figure mean the number of epochs. In Figure 15, we can see that the accuracy according to the training progress oscillates between about 60% and 90% and does not converge. Training an LSTM is done by dividing the training data into several subsets and finding parameters that minimize the loss function for each training data subset. The accuracy oscillates because the optimized LSTM parameters for each subset do not converge and continue to change. This means that the LSTM cannot properly find the pattern of the target input/output data. Accordingly, we can see that the accuracy of the models in Table 10 is also low. All training results show high TPRs but low TNRs.

Convolutional Neural Network (CNN)
Figure 16 shows the accuracy according to the training progress of CNN-m2, and Table 11 shows the training results of the models per the convolutional layers configured as shown in Table 7. The numbers 10-90 shown above the graph's x-axis in the figure mean the number of epochs. In Figure 16, we can see that the CNN-based models converge, unlike the LSTM-based models. However, Table 11 shows that the training results of the CNN-based models have low TNRs, like the LSTM-based models.

One-Class SVM
Table 12 shows the training results of the models per the hyperparameters configured as shown in Table 8. The results of one-class SVM-m2 through m5 show that both the TPRs and TNRs are higher than those of one-class SVM-m1. Through this, we can show that, for one-class SVM, it is more appropriate to construct normal/anomaly per data instance as the output data. As for the discretization of continuous signals, the comparison between one-class SVM-m2 and one-class SVM-m3 through m5 shows that dense discretization is more advantageous for the TPR. One-class SVM-m3 through m5 are the results of different outlier fractions, and we can see that the larger the outlier fraction, the higher the TPR and the lower the TNR. This is because the larger the outlier fraction, the narrower the decision boundary, as shown in Figure 17. The decision boundary is the criterion by which the model distinguishes anomalies.

The training results of each architecture and model configuration are summarized as follows. First, for MLP, it is possible to derive a model that adequately distinguishes normal from anomalous data, but we observed that locally optimal models with low TNRs are easily derived.
The LSTM and CNN-based models also showed low TNRs. The TNRs of MLP, LSTM, and CNN thus all tended to have low values. This is thought to be because there are fewer anomalous data than normal data. Training is conducted in the direction of increasing the accuracy on the training dataset. Therefore, even if the accuracy of the minority class is low, the overall accuracy increases as long as the accuracy of the majority class is high. Since our training data also contain more normal data than anomalous data, we can infer that the models were trained in this way to increase the accuracy on normal data. This can be confirmed from the high TPR results and low TNR results for each architecture. In the case of one-class SVM, the models that configured normal/anomaly per data instance as output showed high TPRs and TNRs. Through these results, we found that for engine clutch engagement/disengagement data with an imbalance between normal and anomalous instances, constructing the training architecture to determine normal/anomaly by data instance and performing one-class classification are advantageous for anomaly detection. In this work, we used one-class SVM, which is most commonly used for one-class classification, but other one-class classification architectures are also expected to show high anomaly detection performance.
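A small numeric sketch of this imbalance effect, with invented counts and the majority (normal) class taken as positive: a model that labels every instance as normal still reaches high overall accuracy while detecting no anomalies at all.

```python
# invented counts: 950 normal and 50 anomalous test instances,
# with every instance predicted as normal
tp, fn = 950, 0   # all normal instances predicted normal
tn, fp = 0, 50    # all anomalous instances also predicted normal

tpr = tp / (tp + fn)               # perfect on the majority class
tnr = tn / (tn + fp)               # no anomaly is ever detected
acc = (tp + tn) / (tp + fn + tn + fp)  # high despite detecting nothing
```

Accuracy-driven training therefore has little incentive to improve the minority (anomalous) class, which is consistent with the high-TPR, low-TNR results above.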

Discussion
In this paper, we studied methods for detecting anomalies in the engine clutch engagement/disengagement process required for EV↔HEV mode transitions of TMED type HEVs. We used machine learning-based methods such as MLP, LSTM, CNN, and one-class SVM and trained various models according to different parameters such as the number of hidden layers and hidden units, the outlier fraction, etc. For data verification at an actual vehicle level, we used the data acquired through actual vehicle tests for model training and testing. The training results showed that the models based on MLP, LSTM, and CNN have low TNRs, whereas one-class SVM-m3 through m5, the models based on one-class SVM, have high TPRs and TNRs. Through these results, we could obtain the following conclusions.

• For engine clutch engagement/disengagement data, constructing the training architecture to determine normal/anomaly by data instance and performing one-class classification are advantageous for anomaly detection.

• The structure of determining normal/anomaly per pattern cannot properly learn the characteristics of engine clutch engagement/disengagement data.
For the second item, the various durations of a vehicle state, such as the duration of clutch engagement, may be one of the reasons why determining normal/anomaly per pattern is not adequate. We expected the training architectures to learn normal/anomaly per pattern regardless of the duration of the vehicle state by learning the relationship between the data at the previous time and the data at the current time. However, it is presumed that the structure of determining normal/anomaly per pattern cannot learn these characteristics well.
Since most of the data acquired through real vehicle tests will have similar characteristics, one-class classification by data instance is expected to be effective for other vehicle test data as well. Therefore, future work should examine whether one-class classification by data instance is also effective in detecting anomalies in other HEV powertrain control functions.
Finally, we also anticipate that real-time detection is possible because the time to detect anomalies for given data is short once a trained model exists. However, if there is no trained model, it is difficult to perform detection in real time because training takes a very long time.