Hybrid LSTM + 1DCNN Approach to Forecasting Torque Internal Combustion Engines

Abstract: Innovative solutions are now being researched to manage the ever-increasing amount of data required to optimize the performance of internal combustion engines. Machine learning approaches have proven to be a valuable tool for signal prediction due to their real-time and cost-effective deployment. Among them, the architecture consisting of long short-term memory (LSTM) and one-dimensional convolutional neural networks (1DCNNs) has emerged as a highly promising and effective option to replace physical sensors. This architecture combines the capacity of LSTM to capture long-term dependencies and temporal patterns with the ability of 1DCNNs to detect patterns within smaller segments of a signal. The purpose of this work is to assess the feasibility of substituting a physical device dedicated to calculating the torque supplied by a spark-ignition engine. The suggested architecture was trained and tested using signals acquired in the field during a test campaign conducted under transient operating conditions. The results reveal that LSTM + 1DCNN is particularly well suited for predicting signals with considerable variability. It consistently outperforms the other architectures used for comparison, with average error percentages of less than 2%, demonstrating the architecture's ability to replace physical sensors.


Introduction
More rigorous regulations regarding pollutant emissions from internal combustion engines (ICEs), along with customer demands for increased performance, have made vehicle control increasingly challenging [1][2][3]. The calibration and run-time operation of engines are conducted in diverse areas, such as the automotive and aerospace industries, and require huge amounts of data [4][5][6]. Significant computing effort is necessary to handle and process the information produced [7]. Many measurements obtained by sensors and monitoring systems during engine calibration are critical for fine tuning and maximizing performance while also assuring efficient and dependable operation [8]. Furthermore, during run-time operation, the engine's real-time outputs are critical for monitoring engine health and detecting possible anomalies [9]. Advanced approaches are being researched to improve engine performance and operating longevity while lowering fuel consumption and pollutant emissions [10][11][12]. In the automotive industry, machine learning (ML) techniques are increasingly being employed to enhance computing performance and minimize costs [13]. Because of their small setup and low-cost hardware implementation, as well as their capacity to forecast operational parameters, they can reduce the number of operating points to be examined, resulting in significant memory and computational speed advantages [14]. Among ML approaches, LSTM + 1DCNN appears to be a promising method for performing signal analyses [15].
The LSTM (long short-term memory) approach is a type of recurrent neural network (RNN) that can reproduce the sequential nature of non-linear observations across time [16,17].

Experimental Setup
Tests were performed on a 1 L, 3-cylinder turbocharged engine with a maximum power of 84 CV at 5250 rpm and a maximum torque of 120 Nm at 3250 rpm. The cylinder bore is 72 mm and the piston stroke is 81.8 mm. The compression ratio is 10:1. The engine operates with port fuel injection (PFI) using European market gasoline (E5, with RON = 95 and MON = 85) injected at 4.2 bar of absolute pressure. A Borghi & Saveri eddy current brake dynamometer of 600 CV controls the engine speed in firing conditions (Figure 1a). A Vascat electric motor of 66.2 kW allows the engine speed to be controlled in both motored and firing conditions. All the engine parameters are controlled using an EFI Technology EURO-4 engine control unit (ECU). The signals coming from the TCK thermocouples and the PTX 1000 pressure sensors are acquired by National Instruments data acquisition systems. The indicated analysis is performed through a Kistler Kibox combustion analysis system (with a maximum temporal resolution of 0.1 CAD), which receives the pressure signals coming from the piezoresistive sensors (Kistler 4624A) placed in the intake and exhaust ports, the in-cylinder pressure from the piezoelectric sensor (Kistler 5018) placed on the side of the combustion chamber beside the flywheel, the ignition signal from the ECU, and the absolute crank angular position measured by an optical encoder (AVL 365C). Due to structural and mechanical constraints, only the combustion chamber adjacent to the flywheel has been equipped with a piezoelectric sensor, which is used to determine the indicated mean effective pressure (IMEP). The torque delivered by the engine is measured using a torquemeter positioned near the engine crankshaft. All of the above quantities are recorded by the AdaMo Hyper software during engine operation, which also allows the engine's speed, torque, and throttle valve position to be managed simultaneously in both firing and motored states. Figure 1b summarizes the experimental layout.
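As a quick consistency check on the figures quoted above, the swept volume implied by the bore, stroke, and cylinder count can be computed, together with the rated power expressed in kW. The sketch below is illustrative only: the function name is hypothetical, and the conversion factor assumes that CV denotes metric horsepower (1 CV ≈ 0.7355 kW).

```python
import math

CV_TO_KW = 0.73549875  # assumption: CV is metric horsepower


def displacement_cm3(bore_mm, stroke_mm, cylinders):
    """Total swept volume in cubic centimetres for a given bore and stroke."""
    bore_cm = bore_mm / 10.0
    stroke_cm = stroke_mm / 10.0
    return math.pi / 4.0 * bore_cm ** 2 * stroke_cm * cylinders


# Values quoted in the text: 72 mm bore, 81.8 mm stroke, 3 cylinders, 84 CV.
engine_cm3 = displacement_cm3(72.0, 81.8, 3)
engine_kw = 84 * CV_TO_KW
print(f"{engine_cm3:.0f} cm3, {engine_kw:.1f} kW")
```

The result is just under 1000 cm³, which agrees with the nominal 1 L displacement.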

Case Study
A transient cycle (Figure 2) was chosen to preliminarily evaluate the performance of the proposed LSTM + 1DCNN algorithm in predicting the torque delivered by the three-cylinder SI engine. A total of 12 variables acquired by AdaMo Hyper were initially selected among the other characteristics as input parameters.

Vehicles 2023, 5, FOR PEER REVIEW 4
• Parameters coming from ECU: activation time of the injector (InjectionTime) and ignition timing of the spark (SparkAdvance) at the first cylinder beside the flywheel.
• Parameters coming from pressure sensors and thermocouples: temperature of the air before the filter (TC_Air_Intake), temperature and pressure of the air at the intake pipe (TC_ETB_OUT and MAP), pressure and temperature of the exhaust gas before (TC_Turbine IN, P_Turbine IN) and after the turbine (TC_Turbine OUT and P_Turbine OUT), and temperature of the engine oil (TC_Engine Oil).
• Parameters related to the AdaMo actuation: throttle valve opening (Throttle Position) and engine speed (Engine speed).
The cycle comprised an input matrix of [12 × 28,800] samples and an output matrix of [1 × 28,800] samples. A total of 80% of the entire dataset was used for the training sessions and the remaining 20% for the test sessions, i.e., torque prediction (Figure 2). Removing parameters with low correlation makes it possible to reduce the dimensions of the model and improve its accuracy. For this reason, a preliminary analysis using the Shapley value [25] was conducted on the complete dataset. SHAP attempts to explain an instance's prediction by assessing the contribution of each attribute to the forecast. The authors quantified the impact of each measured quantity on the objective function using the average absolute Shapley values (ABSVs) [26]. The least influential parameters, i.e., TC_Turbine OUT, SparkAdvance, and Throttle Position (Figure 3), are excluded from the initial input dataset since they present the lowest percentage of impact. In this way, the number of input parameters is reduced from 12 to 9.
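The selection step described above can be sketched as follows: average the absolute Shapley values per feature, express each as a share of the total, and drop the features with the lowest share. This is a minimal illustration, not the authors' code; the feature names and SHAP values are made up, and the helper names are hypothetical. The sample counts for the 80/20 split follow directly from the 28,800 samples quoted in the text.

```python
from statistics import mean


def mean_abs_importance(shap_rows, feature_names):
    """Return {feature: share of the total mean |SHAP|, in percent}."""
    columns = list(zip(*shap_rows))  # one tuple of SHAP values per feature
    absv = [mean(abs(v) for v in col) for col in columns]
    total = sum(absv)
    return {name: 100.0 * a / total for name, a in zip(feature_names, absv)}


def drop_least_influential(importance, n_drop):
    """Keep every feature except the n_drop with the lowest share."""
    ranked = sorted(importance, key=importance.get)  # ascending share
    dropped = set(ranked[:n_drop])
    return [name for name in importance if name not in dropped]


# Hypothetical SHAP values: 3 samples x 4 made-up features.
names = ["feat_a", "feat_b", "feat_c", "feat_d"]
shap_rows = [
    [0.50, -0.02, 0.20, 0.01],
    [-0.40, 0.01, -0.30, -0.03],
    [0.60, -0.03, 0.10, 0.05],
]
importance = mean_abs_importance(shap_rows, names)
kept = drop_least_influential(importance, n_drop=2)
print(kept)

# 80/20 split of the 28,800 samples (integer arithmetic avoids rounding issues).
n_samples = 28_800
n_train = n_samples * 80 // 100
n_test = n_samples - n_train
print(n_train, n_test)
```

In the paper the same idea is applied with 12 candidate inputs and the three least influential ones removed.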
Previous work by the same research group [2,24] has shown that performance improves when the architectures operate without the less relevant parameters. Based on this, the current work illustrates the architecture's prediction performance only with the 9 input variables established above. Once the input parameters have been identified, the data are normalized to reduce excessive prediction errors and to allow the architecture to converge faster. In this context, normalization avoids problems caused by differences in scale between the input and output parameters. The supplied values are mapped to the range [0, 1]. Following the prediction procedure, the predicted data are de-normalized to allow a direct comparison with the experimentally acquired target. Figure 4 describes the entire dataset used in this activity and the division between input and output parameters for each analyzed case.
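The mapping to [0, 1] and its inverse can be illustrated with a simple min-max scaling. The text does not state which normalization formula was used, so min-max scaling is an assumption here, and the torque values and function names are hypothetical.

```python
def fit_min_max(values):
    """Return the (min, max) pair used to scale a signal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant signal cannot be min-max scaled")
    return lo, hi


def normalize(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Invert normalize(), returning values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


torque_nm = [22.0, 25.0, 34.0, 28.0]  # hypothetical torque samples in Nm
lo, hi = fit_min_max(torque_nm)
scaled = normalize(torque_nm, lo, hi)
restored = denormalize(scaled, lo, hi)
print(scaled)
```

In practice the scaling limits would be taken from the training portion only and reused for the test portion, so that the de-normalized predictions remain comparable with the measured torque.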

Structure of the LSTM + 1DCNN Model
The predictive scheme of the LSTM + 1DCNN [27] structure used in this work for the torque prediction is reported in Figure 5a and follows the scheme reported in [24]:
• A SequenceInputLayer is used to pass the dataset to the network. This layer feeds the sequence data into the network by setting the size and building the related structures.
• A one-dimensional CNN layer applies a 1-D convolutional filter to each input frame. To perform convolution operations on time series data, the 1D-CNN employs matrix multiplication. It maps the data variables to a higher-dimensional space and finds local features based on spatial and temporal correlations. The convolution kernel of the 1D-CNN moves horizontally or vertically along the data, depending on the nature of the data. For time series data, the kernel moves along the time axis, making it well suited to examining sensor data over time [23]. This approach is especially beneficial for analyzing signal data over short periods of time. Because torque data consist of time series recorded by sensors, the CNN can efficiently extract characteristics from such variable data and improve the prediction accuracy of the model.
• The ReLU activation function was chosen to improve CNN fitting and sparsity because of its ability to address difficulties such as slow convergence and vanishing gradients [28].
• The AveragePoolingLayer calculates the average value for feature map patches and downsamples the map by taking the mean value over each 2 × 2 cell square. Downsampling improves computation speed and the robustness of the derived characteristics [29].
• LSTM is used to process the feature maps at this point. The internal architecture of the LSTM network is made up of components known as gates (Figure 5b) [24]. The LSTM network is a type of recurrent neural network (RNN) that handles the issue of gradients vanishing or exploding during long-term information propagation. Unlike typical RNNs, the LSTM network has a more sophisticated neuron structure in the hidden layer. It uses three control mechanisms, i.e., the input gate, the forget gate, and the output gate, to retain long-term knowledge [30]. LSTMs provide a distinct additive gradient structure with direct access to forget gate activations, allowing the network to encourage desired behavior from the error gradient by employing frequent gate updates at each stage of the learning process [24].
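For reference, the three gates mentioned above are commonly written as follows. This is the standard textbook formulation of an LSTM cell, given here as an assumption; the source does not state the exact variant used. The symbols (input x_t, hidden state h_t, cell state c_t, weights W and U, biases b, sigmoid σ, element-wise product ⊙) are introduced only for this illustration.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The additive update of c_t is what gives the gradient the direct path through the forget gate activations described in the text.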
Following LSTM, the feature map is distributed in a temporal vectorial sequence by TimeDistributedLayer, and the loss of mean square error for the specified regression issue is computed by RegressionOutputLevel.
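The convolution, activation, and pooling stages that precede the LSTM can be illustrated on a single channel with plain Python. This is a simplified sketch under stated assumptions: one input channel, one hand-picked kernel of size three, no padding, and a one-dimensional average pooling window of two. The signal and kernel values are hypothetical, and a real layer would learn many kernels over all input channels.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel along the time axis (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(signal[t + j] * kernel[j] for j in range(k))
        for t in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Set negative activations to zero."""
    return [max(0.0, x) for x in xs]


def average_pool(xs, pool_size=2, stride=2):
    """Downsample by averaging non-overlapping windows."""
    return [
        sum(xs[t:t + pool_size]) / pool_size
        for t in range(0, len(xs) - pool_size + 1, stride)
    ]


signal = [1.0, 2.0, 4.0, 3.0, 1.0, 0.0]  # hypothetical sensor trace
kernel = [1.0, 0.0, -1.0]                # hypothetical kernel of size 3

features = conv1d_valid(signal, kernel)
activated = relu(features)
pooled = average_pool(activated)
print(features, activated, pooled)
```

Each pooling stage halves the sequence length, so stacking many of them leaves fewer time steps for the LSTM to work with.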

Definition of the Procedures to Determine the Structural Parameters of the Proposed Model
The optimal neural structures are determined through a preliminary analysis based on the performance in the training sessions. To evaluate the training performance obtained with a given set of model parameters, the mean square error (MSE) is chosen as the loss function [31]. The Adam optimizer is used to streamline the updating of the LSTM network model's weight matrix and bias, as well as to adjust the learning rate adaptively during the training process. The performance of the proposed structure is compared with that of two other architectures, i.e., back propagation (BP) [32] and LSTM, which were optimized through an extensive preliminary analysis like the one carried out for the LSTM + 1DCNN architecture. The configurations with the lowest loss function value were chosen to predict the torque signal in the test sessions.
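The loss can be illustrated with the usual definition of the MSE, i.e., the mean of the squared differences between predicted and target values. The sketch below assumes this standard form, since the equation is not reproduced in the text; the function name and sample values are hypothetical.

```python
def mse(predicted, target):
    """Mean of squared differences between two equally long sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical normalized torque values in [0, 1].
loss = mse([0.5, 0.25], [0.5, 0.75])
print(loss)
```

Because the data are normalized before training, the loss values reported during training are dimensionless rather than expressed in Nm.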

Performance on Training
The first comparison between the proposed algorithms is performed via the training_loss function as described in the previous paragraph.
Starting from the LSTM + 1DCNN structure, Figure 6 shows the val_loss and training_loss of the structure that performed best, i.e., Nc = 100, Nh = 150, Bs = 70, and Md = 1. The training results highlight how the model converges without overfitting. When tuning a 1DCNN structure, it is important to note that a limited number of neurons (Nc) can cause underfitting because not enough features are extracted. Conversely, too many Nc can lead to overfitting. Although adding pooling layers can counteract overfitting, an excessive number of them reduces the dimensions of the features passed to the LSTM network. This reduction can harm the LSTM's capability to effectively extract time series features, ultimately reducing the network's ability to fit the data well.
To sum up, the LSTM + 1DCNN structure is composed of a one-dimensional convolutional layer with 100 neurons, a kernel size equal to three, and the ReLU activation function; a max pooling 1D layer which uses a pool size of two and a stride of two; an LSTM layer composed of 150 neurons with a batch size equal to 70 and a model depth of one; and a time distributed layer and a dense layer composed of one unit to perform the regression task.
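To give a sense of the model size implied by these settings, the number of trainable parameters can be estimated with the usual per-layer formulas. This is a rough sketch under assumptions not stated in the source: the convolutional layer is taken to receive the 9 selected input channels, its 100 "neurons" are interpreted as 100 filters, the LSTM is taken to receive those 100 feature maps, and the counting conventions are those commonly used by deep learning frameworks (weights plus one bias per unit, four gate blocks per LSTM).

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights for each filter over all channels, plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights, and biases."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Fully connected weights plus one bias per output unit."""
    return (input_dim + 1) * units


n_conv = conv1d_params(in_channels=9, filters=100, kernel_size=3)
n_lstm = lstm_params(input_dim=100, units=150)
n_dense = dense_params(input_dim=150, units=1)
print(n_conv, n_lstm, n_dense, n_conv + n_lstm + n_dense)
```

Under these assumptions the LSTM layer holds the large majority of the parameters, while the pooling layer adds none.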
Concerning the neural structures used for comparison purposes, the best architectures found are structured as follows:
• The structure of the back propagation (BP) algorithm [32] is composed of one input layer, three hidden layers comprising 55, 180, and 110 neurons, respectively, and one output layer.
• The LSTM network is composed of one input layer, one hidden layer with 150 neurons, one output layer, and one fully connected layer.
Figure 7 compares the training_loss of the compared architectures. All the structures show a decreasing trend as the epochs increase, and they tend to stabilize around 50 epochs, reaching a training_loss value lower than 0.001 around the 100th epoch. This indicates that the models converge without overfitting. In particular, LSTM + 1DCNN shows the fastest convergence, since its training loss is already below 0.005 at about 10 epochs. Moreover, once stabilized, it presents very low oscillations, suggesting that this model could be more robust than the others.

Performance on Test
Figure 8 displays the prediction of the torque traces performed by each tested structure. To make a comparison over the entire predicted range, for each forecast i, the percentage deviation Err_i of the prediction Y_i,predicted from the target Y_i,target throughout the range is computed according to Equation (2),
where N is the number of samples considered for the test case and i is the ith sample. The average percentage error, Err_avg, is also computed to characterize the global prediction quality. For this kind of application, a maximum critical threshold of 10% is established for the abovementioned errors. Moreover, two other evaluation metrics are used to compare the test performance of the architectures, i.e., the RMSE (Equation (3)) and R² (Equation (4)) [24], where Y_avg is the average value of the prediction. All the tested structures (Figure 8) are able to reproduce the trend of the torque over time.
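The error metrics referred to in this section can be sketched in a few lines. The definitions below are the conventional ones and are assumptions, since the equations themselves are not reproduced here: the percentage error is taken relative to the target, the RMSE is the square root of the mean squared deviation, and R² uses the mean of the target values in its denominator. The torque values and function names are hypothetical.

```python
import math


def percentage_errors(predicted, target):
    """Err_i = 100 * |prediction - target| / |target| for each sample."""
    return [100.0 * abs(p - t) / abs(t) for p, t in zip(predicted, target)]


def rmse(predicted, target):
    """Root of the mean squared deviation between prediction and target."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)


def r_squared(predicted, target):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of the target."""
    avg = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for p, t in zip(predicted, target))
    ss_tot = sum((t - avg) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


target = [20.0, 25.0, 30.0, 40.0]     # hypothetical measured torque, Nm
predicted = [21.0, 25.0, 27.0, 40.0]  # hypothetical predicted torque, Nm

errs = percentage_errors(predicted, target)
err_avg = sum(errs) / len(errs)
n_over = sum(1 for e in errs if e > 10.0)  # samples above the 10% threshold
print(errs, err_avg, n_over, rmse(predicted, target), r_squared(predicted, target))
```

Counting the samples above the threshold, as in the last lines, mirrors the way the exceedances are reported for each architecture.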
The BP structure has an average error of about 2.7%, below the critical threshold of 10%. The number of predictions exceeding this threshold is 320 samples, corresponding to about 5.55% of the total predictions. The structure is capable of following the low fluctuations of the target signal, whereas when the torque becomes higher, in the range between 250 and 280 s, the model underestimates the maximum peaks even though the percentage errors stay below the critical threshold. In this zone, the architecture anticipates the target peaks, showing underestimations of about 2 Nm, corresponding to Err values higher than 6%. However, it is worth highlighting the structure's capability to follow the fluctuation of the signals over such a large range, from 22 to 34 Nm.
[Figure panels: Back Propagation, LSTM, LSTM + 1DCNN]
The LSTM model outperforms the BP model, presenting Err_avg = 1.70% with 67 samples exceeding Err = 10%, corresponding to 1.16% of the total predictions. Unlike the BP model, the LSTM model never exceeds an Err of 20%. LSTM is capable of following the low fluctuations of the target and, in the range of highest torque values, i.e., between 250 and 280 s, it can reproduce the peaks well without any advances or delays. At around 265 s (the second highest peak zone), the structure underestimates the target value by about 1.5 Nm.
As for the LSTM + 1DCNN structure, the architecture is capable of reproducing the target trend well with average percentage errors below 1.5%, i.e., Err_avg = 1.19%. The prediction never exceeds an error of 10% throughout the entire torque signal and follows the rapid oscillations of the signals better than the other architectures.
Figure 9 displays the regression accuracy of each model (R², Equation (4)) and shows the corresponding RMSE (Equation (3)). Compared to the other tested models, LSTM + 1DCNN exhibits a higher accuracy and smaller prediction errors, as previously shown. The BP structure presents significant deviations in the lowest and highest ranges tested, i.e., it tends to overestimate predictions when dealing with low values and underestimate them when operating at high torque levels. LSTM improves the BP's R² (R²_BP = 0.908 and R²_LSTM = 0.965) by about 6% and reduces the RMSE (RMSE_BP = 3.24 and RMSE_LSTM = 1.92) by 40%, thereby mitigating the BP's error in the lowest range but still maintaining a lower accuracy in the highest range. The LSTM + 1DCNN model further enhances the performance of the LSTM model by improving its predictions across the entire range. It achieves R²_LSTM+1DCNN = 0.99 and RMSE_LSTM+1DCNN = 1.29, resulting in a 9% increase in the R-squared value compared to that of the BP model and a 2.6% increase compared to the LSTM model. It also reduces the RMSE by 60% compared to that of the BP model and 32% compared to the LSTM model. In other words, the proposed LSTM + 1DCNN model effectively reduces the degree of dispersion by making the deviation distribution of torque predictions more uniform. These results indicate that the architecture is capable of addressing the prediction issues encountered by the other models, highlighting the LSTM + 1DCNN's superior nonlinear fitting compared to the other architectures. Therefore, LSTM + 1DCNN can be subjected to testing and evaluation in diverse dynamic cycles featuring a wide range of input and output engine characteristics to evaluate its potential applicability under various operating conditions.
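The relative improvements quoted in this comparison can be re-derived from the reported R² and RMSE values. The short sketch below uses only the rounded figures given in the text (the helper name is hypothetical), so the results may differ slightly from percentages computed on unrounded values.

```python
def relative_change_percent(old, new):
    """Signed change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Values reported in the text.
r2 = {"BP": 0.908, "LSTM": 0.965, "LSTM+1DCNN": 0.99}
rmse = {"BP": 3.24, "LSTM": 1.92, "LSTM+1DCNN": 1.29}

r2_gain_vs_bp = relative_change_percent(r2["BP"], r2["LSTM+1DCNN"])
r2_gain_vs_lstm = relative_change_percent(r2["LSTM"], r2["LSTM+1DCNN"])
rmse_cut_vs_bp = -relative_change_percent(rmse["BP"], rmse["LSTM+1DCNN"])
rmse_cut_vs_lstm = -relative_change_percent(rmse["LSTM"], rmse["LSTM+1DCNN"])

print(f"R2 gain: {r2_gain_vs_bp:.1f}% vs BP, {r2_gain_vs_lstm:.1f}% vs LSTM")
print(f"RMSE reduction: {rmse_cut_vs_bp:.1f}% vs BP, {rmse_cut_vs_lstm:.1f}% vs LSTM")
```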

Evaluating LSTM + 1DCNN Structure on Different Transient Cycles
After comparing the prediction performance of the three models, we applied the proposed LSTM + 1DCNN structure to two additional transient cycles, each exhibiting different trends from the scenario shown in Figure 2. Despite these differences, both new cases have the same number of samples, as illustrated in Figure 10. As observed, the structure replicates the torque trends in these new scenarios, with average percentage errors below 1.7% and no predictions exceeding an error of 10% in either case. This suggests that, even when operating in different ranges, the model maintains error levels similar to those observed previously, before any on-board implementation. Training the model on a more extensive dataset can therefore be expected to yield similarly low errors in broader operational contexts.
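The two figures of merit used here, the average percentage error and the share of predictions above the 10% threshold, can be sketched as follows. This is a minimal illustration that assumes the percentage error of each sample is the absolute deviation divided by the measured torque; the exact definition used in the study is given by its own equations and may differ. The function names and the sample values in the usage note are hypothetical.

```python
from typing import Sequence


def percentage_errors(measured: Sequence[float], predicted: Sequence[float]) -> list:
    """Per-sample absolute percentage error (assumed definition: |pred - meas| / |meas| * 100)."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured torque must be non-zero for a percentage error")
    return [100.0 * abs(p - m) / abs(m) for m, p in zip(measured, predicted)]


def summarize(
    measured: Sequence[float],
    predicted: Sequence[float],
    threshold_pct: float = 10.0,
) -> dict:
    """Average error, maximum error, and share of samples above the threshold (all in %)."""
    errors = percentage_errors(measured, predicted)
    n = len(errors)
    return {
        "err_avg_pct": sum(errors) / n,
        "err_max_pct": max(errors),
        "share_above_threshold_pct": 100.0 * sum(e > threshold_pct for e in errors) / n,
    }
```

For example, `summarize([100.0, 50.0, 200.0, 80.0], [101.0, 56.0, 198.0, 80.0])` evaluates four made-up torque samples, one of which deviates by more than 10%.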

Conclusions
The present work evaluates the possibility of replacing a physical sensor dedicated to computing the torque delivered by an internal combustion engine under transient conditions with an LSTM + 1DCNN approach. The optimized structure combines the capability of an LSTM to capture long-term dependencies and temporal patterns with the ability of 1DCNNs to detect patterns within smaller signal segments. The performance of the proposed architecture was compared with that of other optimized artificial neural structures, i.e., back propagation (BP) and LSTM. All the structures proved able to reproduce the experimental trend of the engine's delivered torque. Specifically, the BP model achieved an average error of about 3%, with 6% of its predictions exceeding the critical threshold of 10%. It accurately reproduced low fluctuations of the signal but underestimated the maximum torque peaks. The LSTM model outperformed the BP model, showing an average error of 1.7%, with about 1.6% of its predictions exceeding the critical threshold; it also underestimated the local maximum peaks. With an average error below 1.5% (Err avg = 1.19%), the LSTM + 1DCNN structure outperforms the other architectures. It follows the torque trend closely, never exceeding an error of 10%, and captures fast signal oscillations better. Its performance was further confirmed on two additional transient cycles, with average errors below 1.7% and no predictions with an error greater than 10%. Overall, the study indicates that LSTM + 1DCNN can replace physical sensors in torque computation for spark-ignition engines. To summarize:
Summary of Key Findings: The LSTM + 1DCNN architecture converged during training without overfitting, a crucial aspect of model performance. This implies that it can effectively learn from data and make accurate predictions.
Moreover, the comparative analysis with other architectures, including back propagation and LSTM, revealed that the LSTM + 1DCNN model consistently outperformed them in terms of accuracy and robustness. This superior performance positions LSTM + 1DCNN as a promising candidate for torque prediction in spark-ignition engines.
Implications of the Findings: These findings hold significant implications for spark-ignition engine research and artificial intelligence. An architecture capable of accurate torque prediction paves the way for potential cost savings and improved accuracy in engine performance monitoring, which can lead to more efficient and reliable engine management systems.
Additionally, the robustness and accuracy demonstrated by the LSTM + 1DCNN architecture highlight its usability across a wide range of real-world engine operating conditions. This adaptability could prove invaluable for diverse engine applications, further advancing the field.
Future Research and Testing: While this study represents a substantial step forward, future research and testing are needed to validate and extend the applicability of the LSTM + 1DCNN architecture. Additional on-board implementation and testing under various operational contexts are essential to fully explore its potential benefits. This includes testing the architecture on more extensive datasets and diverse dynamic cycles.
In conclusion, the work shows a promising solution for replacing physical torque sensors and contributes to the broader field of artificial intelligence in engine research. The LSTM + 1DCNN architecture's robustness and accuracy make it a compelling choice for enhancing engine performance monitoring, opening doors to further advancements in the field.

Data Availability Statement:
The data presented in this study are available from the corresponding author. The data are not publicly available due to privacy-related choices.