Prediction Model for Transient NOx Emission of Diesel Engine Based on CNN-LSTM Network

: In order to address the challenge of accurately predicting nitrogen oxide (NOx) emission from diesel engines in transient operation using traditional neural network models


Introduction
The diesel engine has become a preferred choice in the heavy transportation and automotive industries because of its high efficiency and power output. However, the emissions produced by diesel engines during operation contribute to global environmental pollution [1]. In recent years, increasingly stringent emission regulations have posed significant challenges to controlling diesel engine emissions. Relying on in-cylinder purification technology alone is no longer sufficient to meet regulatory requirements, which necessitates aftertreatment equipment such as the diesel oxidation catalyst (DOC), diesel particulate filter (DPF), and selective catalytic reduction (SCR) [2]. An accurate control strategy for injecting the reducing agent in the SCR system, however, relies on precise knowledge of the original (engine-out) NOx emissions of the diesel engine. Currently, because NOx sensors are expensive, the original NOx emission map is obtained primarily through extensive calibration tests, which are time-consuming and costly. Therefore, a more convenient method for predicting NOx emissions in diesel engines is needed [3].
To accurately predict NOx emissions from diesel engines, researchers, both domestically and internationally, have proposed methods based on physical models [4,5], as well as combinations of physical models with map-based approaches [6,7]. While these methods can effectively estimate NOx emissions under steady-state conditions, they struggle to predict transient NOx emissions accurately because engine speed, torque, and fuel injection change rapidly during transient operation. The deterioration of in-cylinder combustion under these conditions affects pollutant emissions, which makes precise prediction with these methods difficult.
In recent years, the use of machine learning techniques to address cutting-edge challenges in various fields has increased remarkably, driven by the wave of interdisciplinary research [8]. Among these techniques, the long short-term memory (LSTM) network stands out for its robust capability to handle both long-term and short-term dependencies. It has demonstrated exceptional performance in predicting nonlinear time series data [9], which is particularly relevant to diesel engine transient emission data because these data are also time series. Consequently, the LSTM network has been widely applied to the prediction of diesel engine transient emissions. For instance, Yang et al. [10] and Dai Jinchi et al. [11] employed the LSTM network to forecast NOx emissions under transient conditions. Seunghyup et al. [12] used a deep neural network model with Bayesian hyperparameter optimization to predict NOx emissions under transient conditions. Yang Rong et al. [13] employed a genetic algorithm to optimize the LSTM network for predicting transient NOx emissions in diesel engines. While all of these models show some ability to predict transient emissions in diesel engines, they fail to fully capture the spatial correlation characteristics among control parameters such as speed, torque, and fuel injection. As a result, their prediction accuracy under transient working conditions is compromised. In light of this limitation, the present study proposes the use of a convolutional neural network (CNN).
The CNN has achieved significant advancements in various domains such as image processing, data processing, air pollutant prediction, and power system load prediction [14][15][16][17]. It has also garnered considerable attention in the prediction of diesel engine emissions [18]. The CNN network structure possesses three key characteristics: local connection, weight sharing, and pooling [19]. These properties grant the network a certain level of invariance to translation, scaling, and rotation, enabling it to capture the spatial characteristics of data [20]. Consequently, when confronted with the spatial correlation among diesel engine control parameters, the CNN can effectively extract relevant feature information. However, relying solely on the spatial characteristics extracted by the CNN is insufficient to address the prediction challenges associated with diesel engine transient emissions, which necessitate the consideration of both temporal and spatial series.
Based on the above problems, to enhance the accuracy of predicting NOx emissions from diesel engines in transient environments, a method that combines CNN with LSTM is proposed. This approach establishes a diesel engine NOx emission prediction model known as CNN-LSTM that is specifically designed for transient working conditions. By harnessing the spatial data extraction capabilities of the CNN, this model generates a plethora of valuable inputs that effectively complement the LSTM network model [21], enabling a comprehensive consideration of transient emission data from diesel engines.

Experimental Equipment
The test was conducted on a supercharged in-line 4-cylinder electronically controlled high-pressure common rail diesel engine, which complies with the national emission standards. Table 1 presents the key technical parameters of the engine. The test employed several essential instruments and equipment, including the AVL PUMA measurement and control system, AVL electric dynamometer, AVL AMA i60 exhaust measurement system, AVL FTIR i60 exhaust measurement system, 553 coolant temperature control system, and 735 fuel consumption meter. The layout and physical configuration of the test bench can be observed in Figures 1 and 2, respectively.

Experimental Scheme
With the promulgation of the National VI emission regulations, it has become imperative to calibrate the tailpipe emissions of the diesel engine over both the cold and hot WHTC so that they comply with the emission limits. To enhance development efficiency, the calibration focuses primarily on the hot WHTC. Therefore, this paper chooses the hot cycle within the WHTC test cycle as the test condition for the proposed test system.
The WHTC is a test cycle proposed by Europe for the Euro VI emission standards. It takes into full consideration the road conditions worldwide and the driving characteristics of different vehicles. It consists of three main components: the cold start emission test, the hot soak, and the hot start emission test. The cold start cycle and the hot start cycle each last 1800 s, and their operating conditions are defined by a set of standard speed and torque percentages that change every second. As Figure 3 illustrates, a hot soak lasting 10 ± 1 min is conducted immediately after the cold start test as the pretreatment for the engine's hot start test. The cold start test contributes 14% of the final emission result, while the hot start test accounts for the remaining 86% [22].
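The weighting of the two cycles amounts to a simple weighted sum. The short Python sketch below only illustrates that arithmetic; the function name and the example emission values are hypothetical and do not come from the test data.

```python
def whtc_weighted_result(cold_start: float, hot_start: float) -> float:
    """Weighted WHTC result: 14 % cold start plus 86 % hot start."""
    return 0.14 * cold_start + 0.86 * hot_start


# Hypothetical brake-specific NOx values in g/kWh, for illustration only.
example = whtc_weighted_result(cold_start=0.60, hot_start=0.30)
print(f"weighted result: {example:.3f} g/kWh")
```

Because the hot start result carries 86% of the weight, it dominates the final figure, which is consistent with the decision to concentrate on the hot cycle.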

According to the WHTC program's cycle condition, which consists of the last 600 s with a tolerance of plus or minus 10 s, the hot cycle condition within this program is selected for testing purposes. Once the hot soak period of the diesel engine is completed, the bench WHTC cycle program is initiated to conduct the official hot cycle test. During this test, 11 relevant parameters are collected and recorded every second. Among these parameters, NOx emission is chosen as the output parameter for the prediction model. The input parameters for research and analysis include speed, torque, fuel pressure, fuel temperature, intake flow, pre-injection timing, pre-injection quantity, total fuel injection quantity, atmospheric temperature, and atmospheric humidity. These parameters serve as the basis for further investigation and analysis. Table 2 displays some of the thermal cycle data obtained from the testing process.

Data Correlation Analysis
Due to the excessive number of total sample input parameters recorded during the initial collection, and the lack of significant correlation between some parameters and the generation of NOx emissions, it becomes necessary to analyze each of the aforementioned parameters individually. By eliminating parameters with low correlation, we can effectively reduce the model's dimensionality and enhance its accuracy.
The Spearman and Pearson correlation coefficients are employed to analyze the pre-selected input parameters separately. The Spearman correlation coefficient evaluates the degree of monotonic, possibly nonlinear, correlation between parameters, while the Pearson correlation coefficient assesses the linear correlation between parameters [13]. The Spearman correlation coefficient is calculated with Equation (1) [23]:

$$\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)} \tag{1}$$

where $n$ is the total sample number, $d_i = x_i - y_i$ is the rank difference of the two variables after sorting, $x_i$ is the rank of the corresponding input parameter, and $y_i$ is the rank of the NOx emission value. The Pearson correlation coefficient is calculated with Equation (2) [24]:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{2}$$

where $x_i$ is the corresponding input parameter, $y_i$ is the emission value of NOx, $\bar{x}$ and $\bar{y}$ are their sample means, $\operatorname{cov}(X, Y)$ is the covariance of the two sets of data $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
Energies 2023, 16, 5347
The results of the correlation coefficient analysis between each pre-selected parameter and NOx emissions are presented in Table 3. A positive correlation coefficient indicates a positive correlation between the two parameters, meaning that they change in the same direction. Conversely, a negative correlation coefficient indicates a negative correlation, meaning that they change in opposite directions. Moreover, the closer the absolute value of the correlation coefficient is to 1, the stronger the correlation and the greater the influence between the parameters; the closer it is to 0, the weaker the correlation and the smaller the influence [25][26][27]. Table 3 shows that the Pearson and Spearman correlation coefficients of fuel temperature, atmospheric temperature, and atmospheric humidity with NOx emissions are small. Hence, the correlation between these three pre-selected parameters and NOx emissions is minimal, and they can be eliminated. Additionally, the table reveals that the Pearson correlation coefficient for the pre-injection quantity is 0.14, whereas its Spearman correlation coefficient is 0.35. This indicates that although the linear correlation between the pre-injection quantity and NOx emissions is relatively small, the degree of nonlinear correlation remains significant. Therefore, the pre-injection quantity is retained as one of the input variables.
To summarize, this study excludes only three variables (fuel temperature, atmospheric temperature, and atmospheric humidity) from the pre-selected input parameters, while retaining the remaining input parameters.
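The screening described above can be illustrated with a minimal NumPy sketch. It computes the Pearson coefficient directly and the Spearman coefficient as the Pearson coefficient of the ranks, which coincides with the rank-difference formula of Equation (1) when there are no tied values. The function names and the synthetic data are assumptions made for illustration; they are not the measured WHTC data.

```python
import numpy as np


def pearson(x, y):
    """Pearson coefficient: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def average_ranks(a):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(a.size, dtype=float)
    ranks[order] = np.arange(1, a.size + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman coefficient computed as the Pearson coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic, hypothetical signal: y grows monotonically but nonlinearly with x.
x = np.linspace(0.0, 3.0, 50)
y = np.exp(x)
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

For a monotonic but nonlinear relationship such as this one, the Spearman coefficient is larger than the Pearson coefficient, which mirrors the argument used above for retaining the pre-injection quantity.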

Data Normalization Processing
Once the input and output parameters of the model have been determined, it is crucial to normalize the data to address potential issues arising from large differences in the orders of magnitude of the input and output parameters. Normalizing the data and mapping the values to the range [0, 1] prevents excessive prediction errors and helps the model converge faster. The normalization is calculated with Equation (3):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$

where $x$ is the value of each input parameter, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that parameter, and $x'$ is the normalized value.
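A minimal sketch of this min-max normalization, applied column by column, is given below. The function name, the guard for constant columns, and the sample values are assumptions introduced for illustration only.

```python
import numpy as np


def min_max_normalize(data):
    """Scale each column of `data` to [0, 1]; return the scaled array and the column extrema."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # A constant column would divide by zero, so its span is replaced by 1.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (data - col_min) / span, col_min, col_max


# Hypothetical samples: the columns stand for speed (r/min) and torque (N·m).
samples = np.array([[800.0, 50.0], [1500.0, 220.0], [2200.0, 400.0]])
scaled, col_min, col_max = min_max_normalize(samples)
print(scaled)
```

Returning the column extrema allows the same scaling to be reapplied to new data and the predicted NOx values to be mapped back to physical units.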

CNN Neural Network
The CNN is a type of deep neural network that incorporates convolutional structures. It is primarily composed of a convolution layer, pooling layer, and fully connected layer [28]. The convolution layer is responsible for feature extraction, followed by the pooling layer, which reduces the parameter dimension and improves the training efficiency by transmitting data information to the next layer in the network. Finally, the results are output through linear transformation in the fully connected layer.
Different convolutional dimensions are used in CNNs for different processing domains. One-dimensional convolutional neural networks (1D-CNN) are employed for processing one-dimensional data such as time series and signals. Two-dimensional convolutional neural networks (2D-CNN) are mainly used for image classification tasks, while three-dimensional convolutional neural networks (3D-CNN) are predominantly applied in video processing and the detection of actions and behaviors of individuals [29]. The structure of a CNN is illustrated in Figure 4.
The 1D-CNN utilizes matrix multiplication to perform convolution calculations on time series data; it maps data variables to a high-dimensional space and extracts local features based on spatial and time series correlations. During data processing, the convolution kernel of the 1D-CNN moves along only one direction of the data, either horizontal or vertical. In the case of time series data, the convolution kernel slides along the time direction, which makes the 1D-CNN particularly suitable for processing time series data recorded by sensors. It is also well suited to analyzing various types of signal data within a fixed length of time.
Since transient NOx emission data from diesel engines are time series recorded by sensors, the CNN can effectively extract characteristics from the emission data and enhance the prediction accuracy of the model. The one-dimensional convolution is calculated with Equation (4) [30]:

$$k_m^n = f\left(\sum_{l=1}^{N} k_l^{n-1} * w_{lm}^n + b_m^n\right) \tag{4}$$

where $k_m^n$ is the $m$th feature map of layer $n$, $f(\cdot)$ is an activation function, $N$ is the input feature size, $*$ is the convolution operation between the $l$th feature map of the former layer [the $(n-1)$th layer] and the convolution kernel $w_{lm}^n$, and $b_m^n$ is the corresponding bias. To enhance the fitting ability and sparsity of the CNN, the ReLU function is chosen as the activation function. Compared with the Sigmoid and tanh functions, the ReLU function effectively addresses the issues of gradient disappearance and slow convergence. The ReLU function is calculated with Equation (5):

$$f(a) = \max(0, a) \tag{5}$$

where $a$ is the value obtained after the convolution operation.
The pooling layer serves the purposes of data and parameter compression, dimensionality reduction, and mitigating overfitting. It performs downsampling operations, which enhance the computation speed and the robustness of the extracted features. Additionally, it diminishes redundant features while preserving the key characteristics of NOx emissions from diesel engines. The pooling operation is of two types: maximum pooling and average pooling. For maximum pooling, the operation is calculated with Equation (6):

$$p(i, j) = \max_{\substack{(i-1)s < u \le is \\ (j-1)s < v \le js}} \alpha(u, v) \tag{6}$$

where $p(i, j)$ is the value in the $i$th row and $j$th column of the pooling layer output matrix, $\alpha(u, v)$ is the value in the $u$th row and $v$th column of the pooling layer input matrix, and $s$ is the boundary size of the region participating in the pooling.
The data, after convolution and pooling, are fed into the fully connected layer. Depending on whether the task is regression or classification, different activation functions are employed to produce the final output. The fully connected layer is calculated with Equation (7):

$$y = W x_f + B \tag{7}$$

where $y$ is the output value of the fully connected layer, $x_f$ is the input value of the fully connected layer, $W$ is the weight matrix, and $B$ is the bias vector.
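As an illustration of Equations (4) to (7), the following NumPy sketch chains a one-dimensional convolution, the ReLU activation, maximum pooling, and a fully connected layer. It is a simplified sketch under stated assumptions rather than the network used in this study: the convolution uses no padding, it is implemented as a sliding dot product without kernel flipping (as is common in deep learning libraries), and the window length of 20 time steps and the random weights are hypothetical.

```python
import numpy as np


def conv1d(x, w, b):
    """Sliding dot product along time, stride 1, no padding.

    x: (C_in, T) input feature maps, w: (C_out, C_in, K) kernels, b: (C_out,) biases.
    Returns an array of shape (C_out, T - K + 1).
    """
    c_out, _, k = w.shape
    t_out = x.shape[1] - k + 1
    out = np.empty((c_out, t_out))
    for m in range(c_out):
        for t in range(t_out):
            # Sum over all input feature maps and kernel taps, then add the bias.
            out[m, t] = np.sum(w[m] * x[:, t:t + k]) + b[m]
    return out


def relu(a):
    """ReLU activation: max(0, a) element by element."""
    return np.maximum(0.0, a)


def max_pool1d(a, size, stride):
    """Maximum over windows of `size` samples taken every `stride` samples."""
    t_out = (a.shape[1] - size) // stride + 1
    windows = [a[:, t * stride:t * stride + size].max(axis=1) for t in range(t_out)]
    return np.stack(windows, axis=1)


def fully_connected(x_f, W, B):
    """Linear map y = W x_f + B."""
    return W @ x_f + B


rng = np.random.default_rng(0)
x = rng.normal(size=(7, 20))          # 7 input parameters, hypothetical 20-step window
w = rng.normal(size=(32, 7, 3))       # 32 kernels of width 3
features = max_pool1d(relu(conv1d(x, w, np.zeros(32))), size=3, stride=1)
x_f = features.reshape(-1)            # flatten before the fully connected layer
y = fully_connected(x_f, rng.normal(size=(1, x_f.size)), np.zeros(1))
print(features.shape, y.shape)
```

With no padding, each width-3 convolution or pooling step at stride 1 shortens the sequence by two samples, which is why the 20-step window ends with 16 positions per feature map in this sketch.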

LSTM Neural Network
The LSTM network is a specialized type of recurrent neural network (RNN) commonly employed to address the issues of gradient vanishing or exploding during prolonged information transmission [31]. Unlike the RNN, the LSTM network incorporates a more intricate neuron structure within the hidden layer. It introduces a cell state to retain long-term information and uses three control mechanisms, the input gate, forget gate, and output gate, to regulate that state. Each LSTM module consists of a memory cell and three control gates, as illustrated in Figure 5, and represents the fundamental building block of the neural network [32,33].
The red dotted box in Figure 5 shows the structure of the forget gate, which determines the portion of the cell state from the previous time step that should be forgotten. The forget gate is calculated with Equation (8):

$$f_t = \delta\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{8}$$

where $f_t$ is the value of the forget gate, $\delta$ is the Sigmoid function, $W_f$ is the weight of the forget gate, $h_{t-1}$ is the hidden state at the $(t-1)$th moment, $x_t$ is the input data at the $t$th moment, and $b_f$ is the bias of the forget gate.
The blue dashed box in the figure depicts the structure of the input gate, which determines the portion of the network input that should be preserved at the current time step. The input gate is calculated with Equations (9) and (10):

$$i_t = \delta\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{9}$$

$$g_t = \tanh\left(W_g \cdot [h_{t-1}, x_t] + b_g\right) \tag{10}$$

where $i_t$ is the value of the input gate, $W_i$ is the weight of the input gate, $b_i$ is the bias of the input gate, $g_t$ is the input node, $W_g$ is the weight of the input node, and $b_g$ is the bias of the input node.
The green dashed box in the figure represents the structure for updating the cell state. It operates under the combined influence of the forget gate and the input gate, which enables it to retain relevant information from the distant past while discarding irrelevant or invalid information that should not be propagated through the network. The cell state is updated with Equation (11):

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \tag{11}$$

where $c_t$ is the cell state at the $t$th moment, $c_{t-1}$ is the cell state at the $(t-1)$th moment, and $\odot$ denotes element-wise multiplication.
The purple dotted box in the figure shows the structure of the output gate, which determines the impact of long-term memory on the current output and updates the hidden state. The output gate is calculated with Equations (12) and (13):

$$o_t = \delta\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{12}$$

$$h_t = o_t \odot \tanh(c_t) \tag{13}$$

where $o_t$ is the value of the output gate, $W_o$ is the weight of the output gate, $b_o$ is the bias of the output gate, $h_t$ is the output value at the $t$th moment, and $\odot$ denotes element-wise multiplication.
The LSTM network excels at preserving the distinctive traits found in long time series data and has the capacity for long-term memory. Through sequence learning and feature training, it helps improve the accuracy of predicting transient NOx emissions in diesel engines.
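The gate calculations of Equations (8) to (13) can be summarized as a single time step, sketched below in NumPy. The parameter names (`W_f`, `W_i`, `W_g`, `W_o` and the matching biases), the concatenation of the previous hidden state with the current input, and the random weights are assumptions made for illustration; the sketch is not the trained model of this study.

```python
import numpy as np


def sigmoid(z):
    """Logistic Sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    g_t = np.tanh(params["W_g"] @ z + params["b_g"])       # input node
    c_t = f_t * c_prev + i_t * g_t                         # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # hidden state (output)
    return h_t, c_t


# Hypothetical sizes: 7 input features and 20 hidden units, with random weights.
n_in, n_hidden = 7, 20
rng = np.random.default_rng(0)
params = {}
for gate in ("f", "i", "g", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):                     # five consecutive time steps
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

The cell state `c` is carried from step to step and is modified only through the forget and input gates, which is the mechanism that lets the network retain information over long sequences.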

CNN-LSTM Neural Network Prediction Model
The LSTM network prediction model is employed to effectively model time series data, incorporating high-dimensional feature information extracted by the CNN. By capturing the temporal patterns within these features, the LSTM network model enables the accurate representation of the nonlinear dynamics associated with transient emission in diesel engines. Consequently, this approach enhances the prediction accuracy of NOx emissions in diesel engine transient environments.
The CNN-LSTM network prediction model is typically divided into two components: the CNN's feature extraction module and the LSTM network's time series prediction module. The first part focuses on extracting spatial feature information from preprocessed data related to diesel engine parameters and emissions. This extracted feature information serves as the input for the LSTM network model. The second part employs the LSTM network for its ability to maintain long-term memory, enabling the accurate extraction of time series characteristics from the data. Consequently, the model can effectively predict transient NOx emissions in diesel engines.

Determination of Structural Parameters of CNN-LSTM Neural Network
In the process of tuning the CNN structure, it was observed that when the number of convolution layers is too small, the model may suffer from underfitting because of insufficient feature extraction capability. Conversely, an excessive number of convolution layers can lead to overfitting. While the pooling layer can mitigate overfitting, employing too many pooling layers reduces the number of feature dimensions fed into the LSTM network. This reduction can adversely affect the extraction of time series features by the LSTM and consequently diminish the fitting performance of the network. After numerous rounds of tuning, a network structure comprising three convolution layers, one pooling layer, and one flatten layer was ultimately selected to predict the transient NOx emission of diesel engines.
The Adam optimizer is used to update the weight matrix and bias of the LSTM network model automatically and to adjust the learning rate adaptively throughout the training process. A grid search is employed to optimize the parameters of the LSTM network prediction, namely the model depth $N_l$, the number of neurons in the hidden layers $N_u$, and the batch size $B_c$. The optimized parameters significantly enhance model performance and improve prediction accuracy.
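For reference, a single Adam update can be sketched as follows. This is the generic form of the algorithm applied to a toy quadratic, not the training code of this study; the decay rates `beta1` and `beta2` and the constant `eps` are the commonly used default values and are assumptions here, as is the toy objective.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad           # momentum: running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter adaptive step
    return theta, m, v


# Toy objective (hypothetical): minimize (theta - 3)^2, whose gradient is 2 (theta - 3).
theta, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)
```

The division by the square root of the running squared gradient is what makes the effective learning rate adaptive for each parameter.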

Optimization of Hyperparameters of Prediction Model by Grid Search Method
The optimization of neural network hyperparameters through the grid search method involves an exhaustive exploration of a subset of the algorithm's hyperparameter space [34]. This approach divides the search range into a grid and systematically examines all intersections within it. By evaluating the results fed back from these intersections, the best combination of hyperparameters can be determined. This process provides relatively optimal modeling parameters for the prediction module in the CNN-LSTM network model. The main steps for optimizing the CNN-LSTM network prediction model using the grid search method are outlined below:
2. Different CNN-LSTM neural network prediction models are constructed based on each parameter combination;
3. The loss function is defined to evaluate the performance of the model parameters, and the mean square error (MSE) is adopted as the loss function. The MSE is calculated with Equation (14):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i^a - y_i^p\right)^2 \tag{14}$$

where $n$ is the total sample number, $y_i^a$ is the true value, and $y_i^p$ is the predicted value;
4. The number of training epochs is set to 50, and the final value of the loss function is obtained for each prediction model once training reaches the maximum number of epochs;
5. The combination with the minimum loss function value is selected as the optimal hyperparameter combination for the CNN-LSTM network prediction model.
By continuously iterating and optimizing the hyperparameter combinations of the prediction model using the grid search method, and performing training under different hyperparameter combinations, the final optimal hyperparameter combination is obtained as follows: $N_l = 2$; $N_u = 20$; $B_c = 60$. The framework of the transient NOx emission prediction model for diesel engines, based on the CNN-LSTM network optimized using the grid search method, is depicted in Figure 6.
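The grid search procedure can be sketched in a few lines of Python. The candidate values in `param_grid` are hypothetical, since only the selected combination is reported here, and `toy_validation_loss` is an artificial stand-in for training a CNN-LSTM model for 50 epochs and returning its final MSE. It is deliberately constructed to have its minimum at the reported combination so that the search logic can be checked; it does not reproduce the training results of this study.

```python
from itertools import product


def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(param_grid, evaluate):
    """Evaluate every combination in the grid and return the one with the lowest loss."""
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = evaluate(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Hypothetical candidate values for depth, hidden neurons, and batch size.
param_grid = {"N_l": [1, 2, 3], "N_u": [10, 20, 30], "B_c": [30, 60, 90]}


def toy_validation_loss(params):
    """Artificial surrogate for 'train for 50 epochs and return the final MSE'."""
    return ((params["N_l"] - 2) ** 2
            + ((params["N_u"] - 20) / 10) ** 2
            + ((params["B_c"] - 60) / 30) ** 2)


best_params, best_loss = grid_search(param_grid, toy_validation_loss)
print(best_params, best_loss)
```

In an actual run, `evaluate` would build and train one model per combination, so the cost of the search grows with the product of the numbers of candidate values.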

Training and Verification of Prediction Model
After data preprocessing, the WHTC thermal cycle dataset consisting of 1800 data points is divided into a training set and a validation set using an 8:2 ratio. The training set includes seven input features and one output label, which are fed into a CNN and convolved three times. Following the convolutional operations, a ReLU activation function is applied to map the features to high-dimensional nonlinear intervals, preventing overfitting. Subsequently, a one-layer maximum pooling layer is used to reduce the output dimension. The number of convolution kernels is set to 32, 64, and 128 sequentially, with convolution and pooling kernel sizes set to 1 × 3 and a stride of 1 for both the convolutional and pooling layers. After three consecutive convolutions and a maximum pooling operation, a feature matrix of size 128 × 16 is obtained. This matrix is then flattened into a one-

Training and Verification of Prediction Model
After data preprocessing, the WHTC thermal cycle dataset consisting of 1800 data points is divided into a training set and a validation set using an 8:2 ratio. The training set includes seven input features and one output label, which are fed into a CNN and convolved three times. Following the convolutional operations, a ReLU activation function is applied to map the features to high-dimensional nonlinear intervals, preventing overfitting. Subsequently, a one-layer maximum pooling layer is used to reduce the output dimension. The number of convolution kernels is set to 32, 64, and 128 sequentially, with convolution and pooling kernel sizes set to 1 × 3 and a stride of 1 for both the convolutional and pooling layers. After three consecutive convolutions and a maximum pooling operation, a feature matrix of size 128 × 16 is obtained. This matrix is then flattened into a one-dimensional vector of length 2048, serving as the global feature extraction for the LSTM network. The feature extraction process of the 1D-CNN is illustrated in Figure 7.  To ensure the accuracy of the model, the optimal hyperparameter combination obtained through the grid search is used as an input for the LSTM network. The final network structure consists of one input layer, two hidden layers (each with 20 neurons), one output layer, and one fully connected layer. The mean square error (MSE) is chosen as the loss function for fitting and predicting the transient NOx emission data of diesel engines. The LSTM network iteratively trains the input gate, forgetting gate, and output gate to adjust their respective parameters. The feature vectors extracted by the CNN are trained, and the weights of the neural network are updated iteratively using the Adam algorithm. The initial learning rate is set to 0.001, and the weights and biases of each neuron are continually updated using the momentum and adaptive learning rate, resulting in an optimized output from the loss function [35].
After 50 iterations of training and validation with the first group of data, the optimal model is obtained. Finally, the prediction dataset is input into the optimal model to predict new transient NOx emission values for the diesel engine. The loss trends of the training and validation sets are depicted in Figure 8. The curves show that the loss values of both sets generally decrease, with oscillations, as the number of iterations increases. The training results indicate that the model converges well and achieves good training performance without overfitting.
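One common way to read loss curves like these is to locate the iteration with the lowest validation loss and to inspect the gap between the final training and validation losses. The sketch below illustrates that reading only; the selection rule, the function name `summarize_training`, and the loss values are hypothetical and are not taken from the experiments reported here.

```python
def summarize_training(train_loss, val_loss):
    """Return the best iteration (1-based), its validation loss, and the final loss gap."""
    if not train_loss or len(train_loss) != len(val_loss):
        raise ValueError("loss histories must be non-empty and of equal length")
    # Index of the smallest validation loss (first occurrence on ties).
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_iteration": best_index + 1,
        "best_val_loss": val_loss[best_index],
        # A large positive gap would hint at overfitting.
        "final_gap": val_loss[-1] - train_loss[-1],
    }


# Hypothetical loss histories that decrease with oscillations.
train_loss = [0.90, 0.52, 0.35, 0.28, 0.24, 0.22]
val_loss = [0.95, 0.60, 0.41, 0.33, 0.34, 0.31]

summary = summarize_training(train_loss, val_loss)
print(summary)
```

A small final gap together with a validation loss that keeps decreasing is consistent with convergence without overfitting.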

Model Prediction Evaluation Index
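To evaluate the performance of the prediction model, four evaluation metrics are used: the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and fitting coefficient (R²). Together, these metrics provide a comprehensive assessment of the model's performance. The calculation formulas for the four metrics are shown in Equations (15)-(18):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i^a - y_i^p\right| \tag{15}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^a - y_i^p\right)^2} \tag{16}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i^a - y_i^p}{y_i^a}\right| \tag{17}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i^a - y_i^p\right)^2}{\sum_{i=1}^{n}\left(y_i^a - y_i^b\right)^2} \tag{18}$$
where $n$ is the total number of samples, $y_i^a$ is the true value, $y_i^p$ is the predicted value, and $y_i^b$ is the average of the actual responses.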
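As an illustration, the four metrics can be computed in a few lines of Python. This is a minimal sketch that assumes the standard definitions of MAE, RMSE, MAPE (expressed in percent), and R²; the function name `regression_metrics` and the sample values are hypothetical and are not measurements from this study.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, MAPE (in percent), and R^2 for paired samples."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the true value, so it is undefined when a true value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical true and predicted NOx values, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because MAPE weights each error by the true value, the same absolute error counts more heavily at low emission levels than at high ones, which is why it is reported alongside MAE and RMSE.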



Comparison of Model Prediction
To assess the advantages of the CNN-LSTM network prediction model in predicting NOx emissions under transient working conditions of a diesel engine, new NOx emission data are collected during the WHTC thermal cycle test as the prediction dataset. The CNN-LSTM predictions on this dataset are then compared with those of the LSTM network prediction model, the CNN prediction model, and the grid search-optimized BP neural network prediction model. The network structures used for this comparison are as follows:
1. The CNN prediction model optimized by grid search consists of three convolutional layers, one max pooling layer, and two fully connected layers. The numbers of convolution kernels are set to 16, 32, and 64, with a convolution and pooling kernel size of 1 × 3. The pooled feature data are then fitted through the fully connected layers. The ReLU activation function is employed to help limit overfitting. This configuration enables the prediction of transient NOx emissions in diesel engines;
2. The LSTM network prediction model optimized by grid search consists of one input layer, two hidden layers (each containing 64 neurons), one output layer, and one fully connected layer. The MSE is used as the loss function to predict the transient NOx emissions of diesel engines;
3. The BP neural network prediction model optimized by grid search consists of one input layer, eleven hidden layers, and one output layer. Each hidden layer comprises 35 neurons. This configuration, which uses the mean square error as the loss function, enables the prediction of transient NOx emissions in diesel engines.
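To give a sense of the size of the BP baseline described in item 3, the number of trainable parameters of a fully connected network can be counted directly from its layer widths. The sketch below assumes that the BP network receives the same seven input features as the CNN-LSTM model, produces a single NOx output, and uses one bias per neuron; these points are assumptions, and the helper name `dense_parameter_count` is illustrative.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network from its layer widths."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# ASSUMPTION: seven input features and one output, as for the CNN-LSTM model.
N_INPUTS = 7
N_OUTPUTS = 1

# One input layer, eleven hidden layers of 35 neurons each, one output layer.
bp_layers = [N_INPUTS] + [35] * 11 + [N_OUTPUTS]
bp_params = dense_parameter_count(bp_layers)

print("BP layer widths:", bp_layers)
print("trainable parameters:", bp_params)
```

Under these assumptions almost all of the parameters sit in the ten 35 × 35 connections between hidden layers.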
The training curves of each neural network prediction model are depicted in Figure 9. The four curves show that the training loss of each prediction model generally decreases, with oscillations, as the number of iterations increases. After approximately 20 iterations, the CNN-LSTM, CNN, and LSTM network prediction models stabilize, while the BP neural network prediction model stabilizes after around 40 iterations. This indicates that all four neural network prediction models converge without overfitting. The CNN-LSTM network model also converges relatively quickly, second only to the CNN model. Once stable, it exhibits relatively low loss values, suggesting that it is more robust than the LSTM and CNN network prediction models.
On the other hand, the fitting effects of the CNN, LSTM network, and BP neural network models are slightly inferior. This can be attributed to the fact that each of these three neural networks is sensitive only to either spatial characteristics or time-series characteristics. Consequently, their degree of fitting for NOx emissions under transient thermal cycles is insufficient. Table 4 and Figure 11 display the prediction errors of the four models. Compared with the CNN, LSTM network, and BP neural network, the CNN-LSTM network exhibits significantly smaller prediction errors and higher accuracy. When comparing the prediction results of the CNN-LSTM network model to the other three neural network prediction models, the MAE, RMSE, and R² values are 23 [...] can effectively explore the relationships among variables in transient NOx emissions of a diesel engine and extract crucial time-series characteristic information from historical NOx emission data, thus exhibiting robust learning capabilities.
Figure 11. Prediction accuracy chart of different models.
Figure 12 illustrates the regression accuracy of each model on the prediction set. For all four models, the deviations of the prediction results are randomly distributed on both sides of the regression line, which is consistent with the random distribution of experimental errors. For the LSTM network prediction model, a few points with significant deviations appear at low NOx emissions, indicating only average prediction accuracy in that range. Both the CNN prediction model and the BP neural network prediction model exhibit large deviation points throughout the entire emission range, with a high degree of dispersion. This suggests that the prediction precision of these models over the whole emission cycle is low. In the CNN-LSTM network prediction model, by contrast, the deviations are distributed relatively uniformly across each segment of the emission range, no points show large deviations, and the degree of dispersion is low. This indicates that the model addresses the poor prediction accuracy that the other three neural network models show at specific points. It also demonstrates that the CNN-LSTM network prediction model achieves higher nonlinear fitting and prediction accuracy than the CNN, LSTM network, and BP neural network prediction models.
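The qualitative reading of a regression plot can be backed by a few summary numbers. The sketch below counts how many predictions fall above and below the ideal line and uses the standard deviation of the deviations as a simple measure of dispersion. Treating the regression line as the ideal line y = x is an assumption, and the function name `deviation_summary` and the data values are hypothetical.

```python
import statistics


def deviation_summary(actual, predicted):
    """Summarize how predictions scatter around the ideal line y = x."""
    deviations = [p - a for a, p in zip(actual, predicted)]
    return {
        "above": sum(d > 0 for d in deviations),      # points over the line
        "below": sum(d < 0 for d in deviations),      # points under the line
        "dispersion": statistics.pstdev(deviations),  # population standard deviation
        "max_abs_deviation": max(abs(d) for d in deviations),
    }


# Hypothetical true and predicted NOx values, for illustration only.
actual = [50.0, 150.0, 300.0, 450.0, 600.0, 800.0]
predicted = [55.0, 145.0, 310.0, 440.0, 605.0, 795.0]

summary = deviation_summary(actual, predicted)
print(summary)
```

Similar counts above and below the line suggest deviations without a systematic bias, while a small dispersion and a small maximum deviation correspond to the tight, uniform scatter described for the CNN-LSTM model.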

Conclusions
To address the low accuracy of current methods in predicting transient NOx emissions of diesel engines, this paper proposes a CNN-LSTM network prediction model that combines a CNN with an LSTM network. Based on validation with experimental data, the following conclusions are drawn:


Data Availability Statement:
The study did not report any data.

Conflicts of Interest:
The authors declare no conflict of interest.