# Performance Analysis of Long Short-Term Memory Predictive Neural Networks on Time Series Data


## Abstract


## 1. Introduction

## 2. Related Work

#### 2.1. Teacher Forcing Background

#### 2.2. Neural Network Benchmarking Papers and Studies

#### 2.3. Neural Network Hyperparameter Optimization

## 3. Proposed Approach

#### 3.1. Long Short-Term Memory Neural Networks

#### 3.2. Long Short-Term Memory Networks with Teacher Forcing

#### 3.3. Neural Network Architectures and Prediction Modes

- **Input Sequence Length (ISL)**: the number of time-steps of the input variables fed to the neural network as a single sequence.
- **Teacher Forcing Lags (time-delays)**: for a given output variable, the number of lags denotes how many previous time-steps of that output variable are fed back as input at the current time-step t. As previously mentioned, the number of lags is denoted as $\tau $, where $\tau \ge 1$.
- **Mini-Batch Size (MBS)**: the size of the subset of the training dataset used to evaluate the gradient of the loss function and update the weights of the model [77].
- **Learning Rate (LR)**: a hyperparameter that controls the magnitude of the weight updates after each optimization step. If the learning rate is too small, the model learns slowly and training takes longer; if it is too high, the model may fail to find the optimal solution or may even diverge.
- **Number of Hidden Units (HU)**: the dimensionality of the LSTM hidden state. The hidden units represent the information retained by the layer from one time-step to the next, referred to as the hidden state.
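The interaction between the input sequence length and the teacher-forcing lags can be made concrete with a small windowing routine. The sketch below is a pure-Python illustration under our own assumptions (the function name `make_windows` and the choice of predicting the output at the last step of each window are ours, not the paper's implementation): each training sample is a sequence of `isl` time-steps, and each time-step concatenates the exogenous inputs with the $\tau$ previous ground-truth output values.

```python
def make_windows(inputs, outputs, isl, lags):
    """Assemble training samples from a multivariate series.

    Each sample is a window of `isl` time-steps; every time-step t in the
    window is the exogenous input vector at t concatenated with the `lags`
    previous ground-truth outputs (the teacher-forced feedback). The target
    is the output value at the last step of the window.
    """
    X, y = [], []
    # start late enough that every step in the window has `lags` past outputs
    for start in range(lags, len(outputs) - isl + 1):
        window = []
        for t in range(start, start + isl):
            lagged = list(outputs[t - lags:t])       # tau previous outputs
            window.append(list(inputs[t]) + lagged)  # exogenous + feedback
        X.append(window)
        y.append(outputs[start + isl - 1])           # target at the last step
    return X, y
```

For a series of 10 observations with `isl=3` and `lags=2`, this yields 6 samples, each with 3 time-steps whose feature dimension is the number of exogenous inputs plus 2.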

#### 3.4. Performance Evaluation Metrics

- **Training Convergence Time (epochs)**: the epoch at which the network’s performance on the training data stops improving. This typically occurs after the network’s weights and biases have been adjusted so that the network’s output is as close to the desired output as feasible for a given input. This value is computed by identifying the epoch after which the loss value does not increase or decrease by more than a given percentage, further denoted as $\epsilon$, for P consecutive epochs.
- **Training Final Loss Value**: the final value of the training loss function after the maximum number of training epochs.
- **Training Loss Mean Value**: the mean value of the training loss function, computed over all training epochs.
- **Testing Mean Absolute Error Value (MAE)**: the mean absolute error computed on the testing set, as shown in Equation (18).
- **Testing Error Standard Deviation Value**: the standard deviation of the absolute error computed on the testing set.
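The convergence criterion and the MAE metric can be sketched in a few lines of Python. This is our reading of the definitions above (relative loss change below a fraction `eps` for `patience` consecutive epochs, 0-based epoch indices), not code from the paper:

```python
def convergence_epoch(losses, eps=0.01, patience=5):
    """Return the 0-based epoch at which the loss stabilizes: the first epoch
    of a run of `patience` consecutive epochs whose relative loss change
    stays below `eps`. Falls back to the last epoch if it never converges."""
    stable = 0
    for e in range(1, len(losses)):
        change = abs(losses[e] - losses[e - 1]) / abs(losses[e - 1])
        stable = stable + 1 if change < eps else 0
        if stable >= patience:
            return e - patience + 1  # epoch where the stable run began
    return len(losses)

def mae(y_true, y_pred):
    """Mean absolute error over the testing set."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

With the paper's setting of $\epsilon$ = 1% and P = 5, this corresponds to `eps=0.01` and `patience=5`.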

#### 3.5. Feature Selection

## 4. Experimental Assessment

#### 4.1. Dataset Description and Data Preprocessing

#### 4.2. Neural Network Architectures

#### 4.3. Experiments

#### 4.3.1. Experiment 1

#### 4.3.2. Experiment 2

#### 4.3.3. Experiment 3

#### 4.3.4. Experiment 4

#### 4.3.5. Experiment 5

## 5. Results

#### 5.1. Multi-Input Single-Output Configuration

#### 5.1.1. Experiment 1

#### 5.1.2. Experiment 2

#### 5.1.3. Experiment 3

#### 5.2. Multi-Input Multi-Output Configuration

#### 5.2.1. Experiment 1

#### 5.2.2. Experiment 2

#### 5.2.3. Experiment 3

#### 5.2.4. Experiment 4

#### 5.2.5. Experiment 5

## 6. Discussion, Conclusions, and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| HU | Number of Hidden Units |
| ISL | Input Sequence Length |
| JNN | Jordan Neural Network |
| LSTM | Long Short-Term Memory Neural Network |
| LSTMTF | Long Short-Term Memory Neural Network with Teacher Forcing as proposed in [40] |
| LSTMTFC | Long Short-Term Memory Neural Network with the original Teacher Forcing algorithm |
| M2O | Many-to-One Prediction Mode |
| M2M | Many-to-Many Prediction Mode |
| MAE | Mean Absolute Error |
| MBS | Mini-Batch Size |
| MIMO | Multi-Input Multi-Output Configuration |
| MISO | Multi-Input Single-Output Configuration |
| MSE | Mean Squared Error |
| NARXNN | Nonlinear Auto-Regressive Neural Network with eXogenous (external) inputs |
| RNN | Recurrent Neural Network |
| SMAPE | Symmetric Mean Absolute Percentage Error |
| SOH | State of Health |
| TEP | Tennessee Eastman Process |
| TF | Teacher Forcing |
| VLSTM | Vanilla Long Short-Term Memory Neural Network |

## References

- Shailaja, K.; Seetharamulu, B.; Jabbar, M. Machine learning in healthcare: A review. In Proceedings of the IEEE 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 29–31 March 2018; pp. 910–914.
- Dixon, M.F.; Halperin, I.; Bilokon, P. Machine Learning in Finance; Springer: Berlin/Heidelberg, Germany, 2020; Volume 1406.
- Rai, R.; Tiwari, M.K.; Ivanov, D.; Dolgui, A. Machine learning in manufacturing and industry 4.0 applications. Int. J. Prod. Res. **2021**, 59, 4773–4778.
- Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine learning in agriculture: A review. Sensors **2018**, 18, 2674.
- Călburean, P.A.; Grebenișan, P.; Nistor, I.A.; Pal, K.; Vacariu, V.; Drincal, R.K.; Țepes, O.; Bârlea, I.; Șuș, I.; Somkereki, C.; et al. Prediction of 3-year all-cause and cardiovascular cause mortality in a prospective percutaneous coronary intervention registry: Machine learning model outperforms conventional clinical risk scores. Atherosclerosis **2022**, 350, 33–40.
- Carvalho, T.P.; Soares, F.A.; Vita, R.; Francisco, R.d.P.; Basto, J.P.; Alcalá, S.G. A systematic literature review of machine learning methods applied to predictive maintenance. Comput. Ind. Eng. **2019**, 137, 106024.
- Avram, S.M.; Oltean, M. A Comparison of Several AI Techniques for Authorship Attribution on Romanian Texts. Mathematics **2022**, 10, 4589.
- Darabant, A.S.; Borza, D.; Danescu, R. Recognizing human races through machine learning—A multi-network, multi-features study. Mathematics **2021**, 9, 195.
- Nassif, A.B.; Talib, M.A.; Nasir, Q.; Dakalbab, F.M. Machine learning for anomaly detection: A systematic review. IEEE Access **2021**, 9, 78658–78700.
- McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. **1943**, 5, 115–133.
- Elmsili, B.; Outtaj, B. Artificial neural networks applications in economics and management research: An exploratory literature review. In Proceedings of the IEEE 2018 4th International Conference on Optimization and Applications (ICOA), Mohammedia, Morocco, 26–27 April 2018; pp. 1–6.
- Haglin, J.M.; Jimenez, G.; Eltorai, A.E. Artificial neural networks in medicine. Health Technol. **2019**, 9, 1–6.
- Ullah, A.; Malik, K.M.; Saudagar, A.K.J.; Khan, M.B.; Hasanat, M.H.A.; AlTameem, A.; AlKhathami, M.; Sajjad, M. COVID-19 Genome Sequence Analysis for New Variant Prediction and Generation. Mathematics **2022**, 10, 4267.
- Abdel-Basset, M.; Hawash, H.; Alnowibet, K.A.; Mohamed, A.W.; Sallam, K.M. Interpretable Deep Learning for Discriminating Pneumonia from Lung Ultrasounds. Mathematics **2022**, 10, 4153.
- Rodrigues, J.A.; Farinha, J.T.; Mendes, M.; Mateus, R.J.; Cardoso, A.J.M. Comparison of Different Features and Neural Networks for Predicting Industrial Paper Press Condition. Energies **2022**, 15, 6308.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
- Gers, F.A.; Eck, D.; Schmidhuber, J. Applying LSTM to time series predictable through time-window approaches. In Neural Nets WIRN Vietri-01; Springer: Berlin/Heidelberg, Germany, 2002; pp. 193–200.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401.
- Karim, F.; Majumdar, S.; Darabi, H.; Chen, S. LSTM fully convolutional networks for time series classification. IEEE Access **2017**, 6, 1662–1669.
- Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing **2019**, 323, 203–213.
- Zhou, C.; Sun, C.; Liu, Z.; Lau, F. A C-LSTM neural network for text classification. arXiv **2015**, arXiv:1511.08630.
- Graves, A.; Fernández, S.; Schmidhuber, J. Bidirectional LSTM networks for improved phoneme classification and recognition. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 2005; pp. 799–804.
- Tan, H.X.; Aung, N.N.; Tian, J.; Chua, M.C.H.; Yang, Y.O. Time series classification using a modified LSTM approach from accelerometer-based data: A comparative study for gait cycle detection. Gait Posture **2019**, 74, 128–134.
- Wang, P.; Jiang, A.; Liu, X.; Shang, J.; Zhang, L. LSTM-based EEG classification in motor imagery tasks. IEEE Trans. Neural Syst. Rehabil. Eng. **2018**, 26, 2086–2095.
- Wang, X.; Huang, T.; Zhu, K.; Zhao, X. LSTM-Based Broad Learning System for Remaining Useful Life Prediction. Mathematics **2022**, 10, 2066.
- Ma, Y.; Peng, H.; Khan, T.; Cambria, E.; Hussain, A. Sentic LSTM: A hybrid network for targeted aspect-based sentiment analysis. Cogn. Comput. **2018**, 10, 639–650.
- Minaee, S.; Azimi, E.; Abdolrashidi, A. Deep-sentiment: Sentiment analysis using ensemble of CNN and Bi-LSTM models. arXiv **2019**, arXiv:1904.04206.
- Bengio, Y. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 437–478.
- Feurer, M.; Hutter, F. Hyperparameter optimization. In Automated Machine Learning; Springer: Cham, Switzerland, 2019; pp. 3–33.
- Breuel, T.M. Benchmarking of LSTM networks. arXiv **2015**, arXiv:1508.02774.
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. **2016**, 28, 2222–2232.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292.
- Farzad, A.; Mashayekhi, H.; Hassanpour, H. A comparative performance analysis of different activation functions in LSTM networks for classification. Neural Comput. Appl. **2019**, 31, 2507–2521.
- Khodabakhsh, A.; Ari, I.; Bakır, M.; Alagoz, S.M. Forecasting multivariate time-series data using LSTM and mini-batches. In Proceedings of the 7th International Conference on Contemporary Issues in Data Science; Springer: Berlin/Heidelberg, Germany, 2020; pp. 121–129.
- Menezes, J.M.P., Jr.; Barreto, G.A. Long-term time series prediction with the NARX network: An empirical evaluation. Neurocomputing **2008**, 71, 3335–3343.
- Principe, J.C.; Euliano, N.R.; Lefebvre, W.C. Neural and Adaptive Systems: Fundamentals through Simulations with CD-ROM; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1999.
- Kumar, D.A.; Murugan, S. Performance analysis of NARX neural network backpropagation algorithm by various training functions for time series data. Int. J. Data Sci. **2018**, 3, 308–325.
- Smith, S.L.; Kindermans, P.J.; Ying, C.; Le, Q.V. Don’t decay the learning rate, increase the batch size. arXiv **2017**, arXiv:1711.00489.
- Morishita, M.; Oda, Y.; Neubig, G.; Yoshino, K.; Sudoh, K.; Nakamura, S. An empirical study of mini-batch creation strategies for neural machine translation. arXiv **2017**, arXiv:1706.05765.
- Bolboacă, R. Adaptive Ensemble Methods for Tampering Detection in Automotive Aftertreatment Systems. IEEE Access **2022**, 10, 105497–105517.
- Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. **1989**, 1, 270–280.
- Downs, J.J.; Vogel, E.F. A plant-wide industrial process control problem. Comput. Chem. Eng. **1993**, 17, 245–255.
- Rieth, C.A.; Amsel, B.D.; Tran, R.; Cook, M.B. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation. Harv. Dataverse **2017**, 1, 2017.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Schmidt, F. Generalization in generation: A closer look at exposure bias. arXiv **2019**, arXiv:1910.00292.
- Jordan, M. Generic constraints on underspecified target trajectories. In International Joint Conference on Neural Networks; IEEE Press: New York, NY, USA, 1989; Volume 1, pp. 217–225.
- Lin, T.; Horne, B.G.; Tino, P.; Giles, C.L. Learning long-term dependencies in NARX recurrent neural networks. IEEE Trans. Neural Netw. **1996**, 7, 1329–1338.
- Medsker, L.; Jain, L.C. Recurrent Neural Networks: Design and Applications; CRC Press: Boca Raton, FL, USA, 1999.
- Yu, Y.; Si, X.; Hu, C.; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. **2019**, 31, 1235–1270.
- Elman, J.L. Finding structure in time. Cogn. Sci. **1990**, 14, 179–211.
- Taigman, Y.; Wolf, L.; Polyak, A.; Nachmani, E. Voiceloop: Voice fitting and synthesis via a phonological loop. arXiv **2017**, arXiv:1707.06588.
- Drossos, K.; Gharib, S.; Magron, P.; Virtanen, T. Language modelling for sound event detection with teacher forcing and scheduled sampling. arXiv **2019**, arXiv:1907.08506.
- Bengio, S.; Vinyals, O.; Jaitly, N.; Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. Adv. Neural Inf. Process. Syst. **2015**, 28.
- Loganathan, G.; Samarabandu, J.; Wang, X. Sequence to sequence pattern learning algorithm for real-time anomaly detection in network traffic. In Proceedings of the 2018 IEEE Canadian Conference on Electrical & Computer Engineering (CCECE), Quebec, QC, Canada, 13–16 May 2018; pp. 1–4.
- Massaoudi, M.; Chihi, I.; Sidhom, L.; Trabelsi, M.; Refaat, S.S.; Abu-Rub, H.; Oueslati, F.S. An effective hybrid NARX-LSTM model for point and interval PV power forecasting. IEEE Access **2021**, 9, 36571–36588.
- Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE **1990**, 78, 1550–1560.
- Staudemeyer, R.C.; Morris, E.R. Understanding LSTM—A tutorial into long short-term memory recurrent neural networks. arXiv **2019**, arXiv:1909.09586.
- Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. **2020**, 404, 132306.
- Toomarian, N.; Bahren, J. Fast Temporal Neural Learning Using Teacher Forcing. U.S. Patent No. 5,428,710, 27 June 1995.
- Schrauwen, B.; Verstraeten, D.; Van Campenhout, J. An overview of reservoir computing: Theory, applications and implementations. In Proceedings of the 15th European Symposium on Artificial Neural Networks, Bruges, Belgium, 25–27 April 2007; pp. 471–482.
- Qi, K.; Gong, Y.; Liu, X.; Liu, X.; Zheng, H.; Wang, S. Multi-task MR Imaging with Iterative Teacher Forcing and Re-weighted Deep Learning. arXiv **2020**, arXiv:2011.13614.
- Goodman, S.; Ding, N.; Soricut, R. Teaforn: Teacher-forcing with n-grams. arXiv **2020**, arXiv:2010.03494.
- Hao, Y.; Liu, Y.; Mou, L. Teacher Forcing Recovers Reward Functions for Text Generation. arXiv **2022**, arXiv:2210.08708.
- Feng, Y.; Gu, S.; Guo, D.; Yang, Z.; Shao, C. Guiding teacher forcing with seer forcing for neural machine translation. arXiv **2021**, arXiv:2106.06751.
- Toomarian, N.B.; Barhen, J. Learning a trajectory using adjoint functions and teacher forcing. Neural Netw. **1992**, 5, 473–484.
- Lamb, A.M.; Alias Parth Goyal, A.G.; Zhang, Y.; Zhang, S.; Courville, A.C.; Bengio, Y. Professor forcing: A new algorithm for training recurrent networks. Adv. Neural Inf. Process. Syst. **2016**, 29.
- Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. **2020**, 36, 1181–1191.
- Delcroix, B.; Ny, J.L.; Bernier, M.; Azam, M.; Qu, B.; Venne, J.S. Autoregressive neural networks with exogenous variables for indoor temperature prediction in buildings. Build. Simul. **2021**, 14, 165–178.
- Ruiz, L.G.B.; Cuéllar, M.P.; Calvo-Flores, M.D.; Jiménez, M.D.C.P. An Application of Non-Linear Autoregressive Neural Networks to Predict Energy Consumption in Public Buildings. Energies **2016**, 9, 684.
- Boussaada, Z.; Curea, O.; Remaci, A.; Camblong, H.; Mrabet Bellaaj, N. A Nonlinear Autoregressive Exogenous (NARX) Neural Network Model for the Prediction of the Daily Direct Solar Radiation. Energies **2018**, 11, 620.
- Bennett, C.; Stewart, R.A.; Lu, J. Autoregressive with Exogenous Variables and Neural Network Short-Term Load Forecast Models for Residential Low Voltage Distribution Networks. Energies **2014**, 7, 2938–2960.
- Alsumaiei, A.A.; Alrashidi, M.S. Hydrometeorological Drought Forecasting in Hyper-Arid Climates Using Nonlinear Autoregressive Neural Networks. Water **2020**, 12, 2611.
- Pereira, F.H.; Bezerra, F.E.; Junior, S.; Santos, J.; Chabu, I.; Souza, G.F.M.d.; Micerino, F.; Nabeta, S.I. Nonlinear Autoregressive Neural Network Models for Prediction of Transformer Oil-Dissolved Gas Concentrations. Energies **2018**, 11, 1691.
- Buitrago, J.; Asfour, S. Short-Term Forecasting of Electric Loads Using Nonlinear Autoregressive Artificial Neural Networks with Exogenous Vector Inputs. Energies **2017**, 10, 40.
- Ren, Z.; Du, C.; Ren, W. State of Health Estimation of Lithium-Ion Batteries Using a Multi-Feature-Extraction Strategy and PSO-NARXNN. Batteries **2023**, 9, 7.
- Prasetyowati, A.; Sudibyo, H.; Sudiana, D. Wind Power Prediction by Using Wavelet Decomposition Mode Based NARX-Neural Network. In Proceedings of the 2017 International Conference on Computer Science and Artificial Intelligence, CSAI 2017, Jakarta, Indonesia, 5–7 December 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 275–278.
- Masters, D.; Luschi, C. Revisiting small batch training for deep neural networks. arXiv **2018**, arXiv:1804.07612.
- Hanin, B. Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients? In Proceedings of the Advances in Neural Information Processing Systems; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2018; Volume 31.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
- Halpern-Wight, N.; Konstantinou, M.; Charalambides, A.G.; Reinders, A. Training and testing of a single-layer LSTM network for near-future solar forecasting. Appl. Sci. **2020**, 10, 5873.

**Figure 1.** LSTM generic unit (**left**) together with an LSTM unit with teacher forcing (**right**). This figure originates from [40].

**Figure 2.** Illustration of the training loss for Experiment 1, LSTMTF (**left**) and VLSTM (**right**), using a many-to-many prediction mode. For each architecture, the lines with the same colors represent the loss for networks trained with the sequence length in the range [1:500]. The highlighted thick lines represent the training loss for a sequence length of 1.

**Figure 3.** Illustration of the training loss for Experiment 1, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode. For each architecture, the lines with the same colors represent the loss for networks trained with the sequence length in the range [1:500]. The highlighted thick lines represent the training loss for a sequence length of 1.

**Figure 4.** Illustration of the testing mean absolute error value for Experiment 1, LSTMTF (**left**), VLSTM (**middle**), and LSTMTFC (**right**), using a many-to-many prediction mode.

**Figure 5.** Illustration of the testing mean absolute error value for Experiment 1, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 6.** Illustration of the training loss for Experiment 2, LSTMTF (**left**) and VLSTM (**right**), using a many-to-many prediction mode. For each architecture, the lines with the same colors represent the loss for networks trained with mini-batch sizes in the [2:128] range. The highlighted thick lines represent the training loss for a mini-batch size of 2.

**Figure 7.** Illustration of the training loss for Experiment 2, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode. For each architecture, the lines with the same colors represent the loss for networks trained with mini-batch sizes in the [2:128] range. The highlighted thick lines represent the training loss for a mini-batch size of 2.

**Figure 8.** Illustration of the testing mean absolute error value for Experiment 2, LSTMTF (**left**), VLSTM (**middle**), and LSTMTFC (**right**), using a many-to-many prediction mode.

**Figure 9.** Illustration of the testing mean absolute error value for Experiment 2, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 10.** Illustration of the mean loss value for Experiment 3, LSTMTF (**left**) and VLSTM (**right**), using a many-to-many prediction mode.

**Figure 11.** Illustration of the mean loss value for Experiment 3, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 12.** Illustration of the testing mean absolute error value for Experiment 3, LSTMTF (**left**), VLSTM (**middle**), and LSTMTFC (**right**), using a many-to-many prediction mode.

**Figure 13.** Illustration of the testing mean absolute error value for Experiment 3, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 14.** Illustration of the training loss for Experiment 1, LSTMTF (**left**) and VLSTM (**right**), using a many-to-many prediction mode. The highlighted thick lines represent the training loss for a mini-batch size of 2.

**Figure 15.** Illustration of the training loss for Experiment 1, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode. The highlighted thick lines represent the training loss for a mini-batch size of 2.

**Figure 16.** Illustration of the testing mean absolute error value for Experiment 1, LSTMTF (**left**), VLSTM (**middle**), and LSTMTFC (**right**), using a many-to-many prediction mode.

**Figure 17.** Illustration of the testing mean absolute error value for Experiment 1, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 18.** Illustration of the training loss for Experiment 2, LSTMTF (**left**) and VLSTM (**right**), using a many-to-many prediction mode.

**Figure 19.** Illustration of the training loss for Experiment 2, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 20.** Illustration of the testing mean absolute error value for Experiment 2, LSTMTF (**left**), VLSTM (**middle**), and LSTMTFC (**right**), using a many-to-many prediction mode.

**Figure 21.** Illustration of the testing mean absolute error value for Experiment 2, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 22.** Illustration of the mean loss value for Experiment 3, LSTMTF (**left**) and VLSTM (**right**), using a many-to-many prediction mode.

**Figure 23.** Illustration of the mean loss value for Experiment 3, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 24.** Illustration of the testing mean absolute error value for Experiment 3, LSTMTF (**left**), VLSTM (**middle**), and LSTMTFC (**right**), using a many-to-many prediction mode.

**Figure 25.** Illustration of the testing mean absolute error value for Experiment 3, LSTMTF (**left**) and VLSTM (**right**), using a many-to-one prediction mode.

**Figure 26.** Illustration of the prediction performance for Experiment 4. Subfigures (**a**,**b**,**d**–**f**) showcase the observed/predicted values on an entire cycle of 500 observations. Subfigure (**c**) showcases the observed/predicted values for LSTMTF using an M2O prediction mode for 5000 observations.

**Table 1.** The output and input variables for the MISO and MIMO configurations.

| Configuration | Output Variables | Input Variables |
|---|---|---|
| MISO | Product Analysis F (Stream 11) | A Feed (Stream 1) |
| | | A and C Feed (Stream 4) |
| | | Product Separator Pressure |
| | | Stripper Pressure |
| | | Stripper Temperature |
| | | Stripper Steam Flow |
| | | Reactor Cooling Water Outlet Temperature |
| | | Reactor Feed Analysis B |
| | | Reactor Feed Analysis E |
| | | A Feed Flow (Stream 1) |
| | | Reactor Cooling Water Flow |
| MIMO | Product Analysis F (Stream 11); Purge Gas Analysis (Stream 9) | All Continuous Process Measurements; All Process Manipulated Variables |

**Table 2.**The complete list of fixed and varying hyperparameter values for Experiments 1–3 for all the tested neural network architectures.

| Experiment | Hidden Layers | Input Sequence Length | Hidden Units | Learning Rate | Mini-Batch Size | Epochs | Lags |
|---|---|---|---|---|---|---|---|
| 1 | 1 | [10, 500] | 16 | 0.01 | 32 | 100 | [1, 50] |
| 2 | 1 | 40 | 16 | 0.01 | [1, 128] | 100 | 1 |
| 3 | 1 | 40 | [1, 128] | [0.00001, 0.1] | 32 | 100 | 1 |

**Table 3.** The training convergence time (epochs) for LSTMTF, computed with respect to the number of lags, for $\epsilon$ = 1% and P = 5.

| Lags | M2M Min | M2M Avg | M2M Max | M2O Min | M2O Avg | M2O Max |
|---|---|---|---|---|---|---|
| 1 | 41 | 86 | 100 | 16 | 65 | 100 |
| 10 | 26 | 91 | 100 | 16 | 72 | 100 |
| 20 | 46 | 95 | 100 | 21 | 73 | 100 |
| 30 | 46 | 95 | 100 | 21 | 73 | 100 |
| 40 | 46 | 96 | 100 | 26 | 71 | 100 |
| 50 | 46 | 97 | 100 | 41 | 87 | 100 |

**Table 4.** The average training convergence time (epochs) for LSTMTF and VLSTM, computed with respect to the input sequence length, for $\epsilon$ = 1%, P = 5, and 1 lag for LSTMTF.

| ISL | LSTMTF (M2M) | LSTMTF (M2O) | VLSTM (M2M) | VLSTM (M2O) |
|---|---|---|---|---|
| [10, 100] | 70 | 83 | 32 | 38 |
| [100, 200] | 73 | 69 | 36 | 51 |
| [200, 300] | 86 | 82 | 44 | 57 |
| [300, 400] | 93 | 82 | 66 | 74 |
| [400, 500] | 90 | 86 | 80 | 79 |

**Table 5.** The training convergence time (epochs) for LSTMTF and VLSTM, computed with respect to the mini-batch size, for $\epsilon$ = 1% and P = 5.

| MBS | LSTMTF (M2M) | LSTMTF (M2O) | VLSTM (M2M) | VLSTM (M2O) |
|---|---|---|---|---|
| 2 | 100 | 46 | 12 | 66 |
| 8 | 51 | 31 | 16 | 41 |
| 16 | 86 | 41 | 41 | 32 |
| 32 | 100 | 51 | 46 | 56 |
| 64 | 86 | 56 | 51 | 56 |
| 128 | 100 | 56 | 51 | 57 |

**Table 6.** The training convergence time (epochs) for LSTMTF and VLSTM, computed with respect to the number of hidden units, for $\epsilon$ = 1% and P = 5.

| HU | LSTMTF (M2M) | LSTMTF (M2O) | VLSTM (M2M) | VLSTM (M2O) |
|---|---|---|---|---|
| 2 | 81 | 38 | 30 | 38 |
| 8 | 73 | 40 | 25 | 44 |
| 16 | 76 | 43 | 24 | 43 |
| 32 | 79 | 47 | 22 | 44 |
| 64 | 77 | 44 | 22 | 42 |
| 128 | 58 | 47 | 23 | 47 |

**Table 7.** The training convergence time (epochs) for LSTMTF and VLSTM, computed with respect to the learning rate, for $\epsilon$ = 1% and P = 5.

| LR | LSTMTF (M2M) | LSTMTF (M2O) | VLSTM (M2M) | VLSTM (M2O) |
|---|---|---|---|---|
| [0.0001, 0.02] | 71 | 46 | 23 | 45 |
| [0.02, 0.04] | 71 | 44 | 23 | 44 |
| [0.04, 0.06] | 71 | 44 | 23 | 44 |
| [0.06, 0.08] | 71 | 45 | 23 | 45 |
| [0.08, 0.1] | 71 | 44 | 23 | 44 |

**Table 8.** The actual training and testing times for 50,000 data points, measured in milliseconds, for LSTMTF and VLSTM.

| Time [ms] | LSTMTF (M2M) | LSTMTF (M2O) | VLSTM (M2M) | VLSTM (M2O) |
|---|---|---|---|---|
| MISO Training | 1820 | 1831 | 1810 | 1870 |
| MISO Testing | 46.55 | 49.70 | 47.13 | 45.72 |
| MIMO Training | 2209 | 2294 | 2208 | 2236 |
| MIMO Testing | 57.27 | 63.08 | 60.74 | 61.71 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bolboacă, R.; Haller, P.
Performance Analysis of Long Short-Term Memory Predictive Neural Networks on Time Series Data. *Mathematics* **2023**, *11*, 1432.
https://doi.org/10.3390/math11061432

**AMA Style**

Bolboacă R, Haller P.
Performance Analysis of Long Short-Term Memory Predictive Neural Networks on Time Series Data. *Mathematics*. 2023; 11(6):1432.
https://doi.org/10.3390/math11061432

**Chicago/Turabian Style**

Bolboacă, Roland, and Piroska Haller.
2023. "Performance Analysis of Long Short-Term Memory Predictive Neural Networks on Time Series Data" *Mathematics* 11, no. 6: 1432.
https://doi.org/10.3390/math11061432