Data-Driven Predictive Modeling of Neuronal Dynamics using Long Short-Term Memory

Modeling brain dynamics to better understand and control complex behaviors underlying various cognitive brain functions has been of interest to engineers, mathematicians, and physicists for the last several decades. Motivated by the need for computationally efficient models of brain dynamics for designing control-theoretic neurostimulation strategies, we have developed a novel data-driven approach, based on a long short-term memory (LSTM) neural network architecture, to predict the temporal dynamics of complex systems over an extended time-horizon in the future. In contrast to recent LSTM-based dynamical modeling approaches that use multi-layer perceptrons or linear combination layers as output layers, our architecture uses a single fully connected output layer and a reversed-order sequence-to-sequence mapping to improve short time-horizon prediction accuracy and to make multi-timestep predictions of dynamical behaviors. We demonstrate the efficacy of our approach in reconstructing the regular spiking to bursting dynamics exhibited by an experimentally-validated 9-dimensional Hodgkin-Huxley model of hippocampal CA1 pyramidal neurons. Through simulations, we show that our LSTM neural network can predict the multi-timescale temporal dynamics underlying various spiking patterns with reasonable accuracy. Moreover, our results show that the predictions improve with increasing predictive time-horizon in the multi-timestep deep LSTM neural network.


Introduction
Our brain generates highly complex nonlinear responses at multiple temporal scales, ranging from a few milliseconds to several days, in response to external stimuli [1,2,3]. A long-standing interest in computational neuroscience is to understand the dynamics underlying various cognitive and non-cognitive brain functions by developing computationally efficient modeling and analysis approaches. In the last four decades or so, several advancements have been made in the direction of dynamical modeling and analysis of brain dynamics [4,5,6]. In the context of modeling the dynamics of single neurons, several modeling approaches, ranging from detailed mechanism-based biophysiological modeling to simplified phenomenological/probabilistic modeling, have been developed to understand the diverse firing patterns (e.g., simple spiking to bursting) observed in electrophysiological experiments [7,8]. These models provide a detailed understanding of the various ionic mechanisms that contribute to generating specific spiking patterns, and they allow large-scale simulations for understanding the dynamics underlying cognitive behaviors. However, most of these models are computationally expensive from the perspective of developing novel real-time neurostimulation strategies for controlling neuronal dynamics at the single-neuron and network levels. In this paper, we investigate purely data-driven long short-term memory (LSTM) based recurrent neural network (RNN) architectures for multi-timestep predictions of a single neuron's dynamics, for use in developing novel neurostimulation strategies in an optimal control framework.

Figure 1: A schematic illustrating the overall data-driven approach developed in this paper for multi-timestep predictions of high-dimensional dynamical systems' behavior over a long time-horizon. An initial sequence of states and inputs is fed to the "Stacked LSTM Network" in reverse order for multi-timestep prediction of the system's states ("reverse-order sequence-to-sequence mapping"). The predicted output from each stacked LSTM network is concatenated with the next sequence of inputs and fed into the next stacked LSTM network in reverse order to increase the predictive horizon. This process is iterated an arbitrary number of times, creating long dynamical predictions.
In contrast to existing approaches in modeling dynamical systems using neural networks, our architecture uses (1) stacked LSTM layers in conjunction with a single densely connected layer to capture temporal dynamic features as well as input/output features, (2) sequence-to-sequence mapping, which enables multi-timestep predictions, and (3) reverse ordered input and measured state trajectories to the network, resulting in highly accurate early predictions and improved performance over long horizons. We show the efficacy of our developed approach in making stable multi-timestep predictions of various firing patterns exhibited by hippocampal CA1 pyramidal neurons, obtained from simulating an experimentally validated highly nonlinear 9-dimensional Hodgkin-Huxley model of CA1 pyramidal cell dynamics, over long time-horizons.
The remainder of the paper is organized as follows. In Section 2, we describe our deep LSTM neural network architecture and methodological approach to data-driven multi-timestep predictions of dynamical systems. We show the efficacy of our approach in making stable multi-timestep predictions of neuronal dynamics over long time-horizons in Section 3, followed by a thorough discussion of the limitations of our approach in Section 4.

Neural Network Architecture, Algorithm and Approach
In Section 2.1, we describe our developed deep LSTM neural network architecture which combines stacked LSTMs with a fully-connected dense output layer. We describe the sequence-to-sequence mapping with reversed order input sequences used in this paper in Section 2.2. In Section 2.3, we provide the details on the synthetic data used to train our networks. Finally, in Section 2.4, we provide the details on the approach used to train the developed neural network architecture.

Deep LSTM Neural Network Architecture
Long short-term memory (LSTM) neural networks [25] are a particular type of recurrent neural network (RNN) which mitigates the vanishing/exploding gradient problem during network training while capturing both long-term and short-term temporal features in sequential time-series data [19]. Specifically, an LSTM uses multiple gating variables that control the flow of information through a hidden cell state and assign temporal importance to the dynamical features present in the time-series data flowing through the cell state. Figure 2 shows a schematic illustrating the internal gating operation in a single LSTM cell.

Figure 2: A schematic illustrating the internal gating operation in a single LSTM cell. The "+" represents an additive operation and the "•" represents a multiplicative operation. σ_g is the sigmoidal activation function and σ_c is the hyperbolic tangent activation function.
A forward pass of information through a single LSTM cell is described by the following cell and gating state equations [25]:

f_t = σ_g(W_f x_t + U_f h_{t−1} + b_f),   (1a)
i_t = σ_g(W_i x_t + U_i h_{t−1} + b_i),   (1b)
o_t = σ_g(W_o x_t + U_o h_{t−1} + b_o),   (1c)
c_t = f_t • c_{t−1} + i_t • σ_c(W_c x_t + U_c h_{t−1} + b_c),   (1d)
h_t = o_t • σ_c(c_t).   (1e)

In equations (1a)-(1e), c_t ∈ IR^h and h_t ∈ IR^h represent the cell state vector and the hidden state vector, respectively, at time t. f_t ∈ IR^h, i_t ∈ IR^h, and o_t ∈ IR^h are the "forget gate", "input gate", and "output gate" activation vectors, respectively, at time t. x_t ∈ IR^d is the input vector to the LSTM unit at time t, and h_{t−1} ∈ IR^h is the previous timestep's hidden state vector passed back into the LSTM unit at time t. The matrices W_f, W_i, and W_o represent the input weights for the "forget gate", "input gate", and "output gate", respectively. The matrices U_f, U_i, and U_o represent the weights of the recurrent connections for the "forget gate", "input gate", and "output gate", respectively. The vectors b_f, b_i, and b_o represent the "forget gate", "input gate", and "output gate" biases, respectively. The matrices W_c and U_c and the vector b_c parameterize the cell candidate analogously. The symbol • represents element-wise multiplication, σ_g represents the sigmoidal activation function, and σ_c is the hyperbolic tangent activation function.
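As an illustration, the forward pass through a single LSTM cell described by equations (1a)-(1e) can be sketched in NumPy. This is a minimal sketch for exposition only; the function name and the dictionary-keyed weight layout are our own conventions, not part of the model description:

```python
import numpy as np

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One forward pass through a single LSTM cell (equations (1a)-(1e)).

    W, U, b are dicts keyed by 'f', 'i', 'o', 'c' holding the input weights,
    recurrent weights, and biases for each gate and the cell candidate.
    sigma_g is the logistic sigmoid; sigma_c is the hyperbolic tangent.
    """
    sigma_g = lambda z: 1.0 / (1.0 + np.exp(-z))
    sigma_c = np.tanh

    f_t = sigma_g(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate (1a)
    i_t = sigma_g(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate (1b)
    o_t = sigma_g(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate (1c)
    # cell state update (1d) and hidden state output (1e)
    c_t = f_t * c_prev + i_t * sigma_c(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    h_t = o_t * sigma_c(c_t)
    return h_t, c_t
```

Because o_t ∈ (0, 1) and |tanh(·)| ≤ 1, the returned hidden state is always bounded in magnitude by 1.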
In this paper, we use a stacked LSTM network integrated with a fully connected feedforward output layer to make multi-timestep state predictions. The use of a single feedforward dense output layer allows the network to effectively learn the static input-output features, while the stacked LSTM network captures the temporal dynamical features. To select the optimal dimensionality of the hidden states in a single hidden layer, we systematically varied the number of hidden states over the sequence {n, n^2, 2n^2, 4n^2, ...}, where n is the dimension of the system's state, and evaluated the training performance for each case. We found that for our application (n = 9), a hidden state dimensionality of 4n^2 = 324 was optimal in learning dynamical behaviors while avoiding overfitting. To select the number of hidden layers, we systematically increased the number of hidden layers of identical hidden state dimensionality (i.e., 324 states) and compared the network performance during training. We found that increasing the number of hidden layers beyond 3 did not improve the network performance on the training and validation datasets. Thus, we fixed the number of hidden layers to 3 in our study. Throughout this paper, we utilized stateless LSTMs, which reset the internal cell and hidden states to zero after processing and performing gradient descent for a given minibatch. We initialized the network weights using the Xavier method [26]. Specifically, the initial weights were drawn from the uniform distribution

W ∼ U[−√6/√(n_j + n_{j+1}), √6/√(n_j + n_{j+1})],   (2)

where n_j is the dimensionality of the input units in the weight tensor, and n_{j+1} is the dimensionality of the output units in the weight tensor.
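The architecture described above can be sketched in Keras (the paper's TensorFlow setting). The function name, the assumed input dimension (9 states plus 1 stimulating current), and the example horizon of 50 timesteps are our own illustrative assumptions; note that `glorot_uniform` is Keras's name for Xavier initialization and is the default for these layers:

```python
import tensorflow as tf

N_STATES = 9                   # n: dimension of the Hodgkin-Huxley state
N_HIDDEN = 4 * N_STATES ** 2   # 4n^2 = 324 hidden units per layer
N_P = 50                       # example predictive horizon (timesteps)

def build_stacked_lstm(n_inputs=N_STATES + 1):
    """Sketch of the stacked-LSTM architecture: three stateless LSTM layers
    of 324 units each, followed by a single fully connected output layer.
    Weights use Xavier (glorot_uniform) initialization, Keras's default."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(N_P, n_inputs)),
        tf.keras.layers.LSTM(N_HIDDEN, return_sequences=True),
        tf.keras.layers.LSTM(N_HIDDEN, return_sequences=True),
        tf.keras.layers.LSTM(N_HIDDEN, return_sequences=True),
        # one dense layer maps each hidden state to the predicted state vector
        tf.keras.layers.Dense(N_STATES),
    ])
```

With `return_sequences=True` on the final LSTM layer, the dense layer is applied at every timestep, so the model maps a length-N_P input sequence to a length-N_P predicted state sequence.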
To generate a long time-horizon dynamical prediction beyond the multi-timestep prediction of a single stacked deep LSTM neural network (shown as "Deep LSTM" in Figure 3), we used an iterative approach. We made copies of the trained stacked LSTM network and connected them in a feedforward sequence. We concatenated the sequence of predicted outputs from the previous stacked LSTM network with an equivalent-length sequence of new inputs to the system and fed them in reverse sequence order to the next stacked LSTM network. Figure 3 illustrates this iterative approach.

Figure 3: Iterative prediction of the system's outputs over a long time-horizon. Each "Deep LSTM" receives the predicted sequence of outputs from the previous "Deep LSTM" and an equivalent length of new system inputs in reverse order, and predicts the next sequence of outputs of the same time duration in the future.

Sequence to Sequence Mapping with Neural Networks
To make multi-timestep predictions of dynamical systems' outputs using the deep LSTM neural network architecture described in the previous section (Section 2.1), we formulate the problem of mapping trajectories of the network inputs to the trajectories of the predicted outputs as a reverse order sequence-to-sequence mapping problem. The central idea of the reverse order sequence-to-sequence mapping approach is to feed the inputs to the network in reverse order such that the network perceives the first input as the last and the last input as the first. Although this approach has been developed and applied in language translation applications [27], it has never been considered in the context of predicting dynamical systems behaviors from time-series data. Figure 4 illustrates the basic idea of the reverse order sequence-to-sequence mapping approach for translating letters (inputs) to their numerical indices (outputs).
As shown in Figure 4, in the forward sequence-to-sequence mapping approach (Figure 4(a)), i.e., A, B, C → 1, 2, 3, the distance between all mappings is the same (i.e., 3 "units"). In the reverse sequence-to-sequence mapping approach (Figure 4(b)), the network receives the input in reverse order to map to the target output sequence, i.e., C, B, A → 1, 2, 3. The average distance between the mappings remains the same for both approaches (i.e., 3 "units"), but the reverse-order approach introduces short- and long-term symmetric temporal dependencies between inputs and outputs, which provide improved predictive performance over long temporal horizons [27].

Figure 4: Forward and reversed sequence-to-sequence mapping approaches for translating letters (inputs) to their numerical indices (outputs) in a recurrent neural network (RNN). (a) shows the forward sequence-to-sequence mapping approach: the input is fed into the network in the same order as the desired output, and the "distance" between all corresponding inputs and outputs is uniform. (b) shows the reversed sequence-to-sequence mapping approach, which introduces a temporal symmetry between input and output sequences while keeping the average "distance" between corresponding inputs and outputs the same as in the forward approach. As shown in (b), A → 1 is the shortest "distance" to map, B → 2 the second, and C → 3 the furthest.
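For dynamical time-series data, the same idea amounts to reversing each input window in time while leaving the target window in forward order. A minimal sketch of building such training pairs, with variable names and array layout that are our own assumptions:

```python
import numpy as np

def make_reversed_pairs(states, inputs, n_p):
    """Build (reversed input sequence, target sequence) training pairs for
    reverse-order sequence-to-sequence mapping: a window of n_p past
    [state, input] vectors, reversed in time, maps to the next n_p states.

    states : (T, n) state trajectory;  inputs : (T, m) input trajectory.
    """
    X, Y = [], []
    for t in range(0, len(states) - 2 * n_p + 1, n_p):
        window = np.concatenate([states[t:t + n_p], inputs[t:t + n_p]], axis=1)
        X.append(window[::-1])                 # reversed-order network input
        Y.append(states[t + n_p:t + 2 * n_p])  # forward-order target
    return np.stack(X), np.stack(Y)
```

The first row of each training input is thus the most recent measurement, giving the symmetric short/long-range dependencies described above.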

Synthetic Data
Hippocampal CA1 pyramidal neurons exhibit various multi-timescale firing patterns (from simple spiking to bursting) and play an essential role in shaping spatial and episodic memory [28]. In the last two decades, several biophysiological models of the CA1 pyramidal (CA1Py) neurons ranging from single compartmental biophysiological and phenomenological models [29,30,31] to detailed morphology-based multi-compartmental models [32,33,34,35,36,37,38] have been developed to understand the contributions of various ion-channels in diverse firing patterns (e.g., simple spiking to bursting) exhibited by the CA1Py neurons.
In this paper, we use an experimentally validated 9-dimensional nonlinear model of a CA1 pyramidal neuron in the Hodgkin-Huxley formalism, given in [29], to generate the synthetic data for network training and validation. The model exhibits several different bifurcations with respect to the external stimulating current and is capable of generating the diverse firing patterns observed in electrophysiological recordings from CA1 pyramidal cells under various stimulating currents. Figure 5 shows three different firing patterns generated from this model in three different regimes of the applied input current. To construct the synthetic training and validation dataset for the deep LSTM neural networks designed in this paper with different predictive horizons, we simulated the Hodgkin-Huxley model of the CA1 pyramidal neuron given in [29] (see Appendix A for the details of the model) for a 1000 ms duration for 2000 constant stimulating currents, sampled uniformly between I = 0.0 nA and I = 3.0 nA. From each of these 2000 examples, we randomly and uniformly drew 50 samples of the desired predictive horizon (i.e., 10^5 input/output sequences in total) for training and validation. As described in Section 2.4, we used 1/32 of these data points for validation, i.e., 96,875 data points for training and 3,125 data points for validation. Since our deep LSTM neural network takes an initial sequence of outputs of the appropriate predictive-horizon length (i.e., N_p = 1, 50, 100, 200) as an input sequence to make the next equivalent-length prediction, we assume that this initial output sequence is available to the deep LSTM neural network throughout our simulations.
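The dataset bookkeeping described above can be sketched as follows; the variable names and random seed are our own, and the 9-dimensional Hodgkin-Huxley simulator itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dataset bookkeeping mirroring the text; the Hodgkin-Huxley simulator
# that would produce a 1000 ms trajectory per current is omitted here.
N_CURRENTS = 2000      # constant stimulating currents in [0, 3] nA
N_WINDOWS = 50         # random windows drawn per simulated trajectory
VAL_FRAC = 1 / 32      # validation fraction

currents = rng.uniform(0.0, 3.0, size=N_CURRENTS)

n_total = N_CURRENTS * N_WINDOWS   # 100,000 input/output sequences
n_val = int(n_total * VAL_FRAC)    # 3,125 validation sequences
n_train = n_total - n_val          # 96,875 training sequences
```

The split reproduces the counts quoted in the text: 100,000 sequences total, of which 3,125 (1/32) are held out for validation and 96,875 are used for training.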

Network Training
We formulated the following optimization problem to train the set of network weights θ:

θ* = arg min_θ L(θ),   (3)

where the loss function L(θ) is given by

L(θ) = (1/N_P) Σ_{k=1}^{N_P} ||x(k) − x̂(k|θ)||_2^2.   (4)

Here, N_P represents the length of the horizon over which the predictions are made, x(k) is the known state vector at time step k, and x̂(k|θ) is the neural network's prediction of the state vector at time k, given θ.
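The equally weighted norm-2 loss above can be written directly; the 1/N_P normalization is our reading of the horizon-averaged loss, and the function name is our own:

```python
import numpy as np

def prediction_loss(x_true, x_pred):
    """Equally weighted norm-2 loss over an N_P-step prediction horizon:
    L = (1/N_P) * sum_k ||x(k) - x_hat(k|theta)||_2^2.

    x_true, x_pred : (N_P, n) arrays of known and predicted state vectors.
    """
    n_p = x_true.shape[0]
    return float(np.sum((x_true - x_pred) ** 2) / n_p)
```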
To solve the optimization problem (3)-(4), we used the standard supervised backpropagation learning algorithm [39], [40], [41] along with the Adaptive Moment Estimation (Adam) method [42]. The Adam method is a first-order gradient-based optimization algorithm and uses lower-order moments of the gradients between layers to optimize a stochastic objective function.
Given the network parameters θ^(i) and the loss function L(θ), where i represents the algorithm's training iteration, the parameter update is given by [42]

m_θ^(i+1) = β_1 m_θ^(i) + (1 − β_1) ∇_θ L(θ^(i)),   (5a)
ν_θ^(i+1) = β_2 ν_θ^(i) + (1 − β_2) (∇_θ L(θ^(i)))^2,   (5b)
m̂_θ = m_θ^(i+1) / (1 − β_1^(i+1)),   ν̂_θ = ν_θ^(i+1) / (1 − β_2^(i+1)),   (5c)
θ^(i+1) = θ^(i) − η m̂_θ / (√ν̂_θ + ε),   (5d)

where m_θ is the first moment of the weights in a layer, ν_θ is the second moment of the weights in a layer, m̂_θ and ν̂_θ are the corresponding bias-corrected estimates, η is the learning rate, β_1 and β_2 are the exponential decay rates for the moment estimates, ∇_θ is the gradient operator, and ε is a small scalar term to aid numerical stability. Throughout this work, we used β_1 = 0.9, β_2 = 0.999, and η = 0.001 [42].
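A single Adam update with bias-corrected moment estimates can be sketched in NumPy as follows, using the hyperparameter values quoted above; the function name and argument layout are our own conventions:

```python
import numpy as np

def adam_step(theta, grad, m, v, i, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update [42]. `i` is the 1-based iteration count."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** i)              # bias corrections
    v_hat = v / (1.0 - beta2 ** i)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the very first iteration the bias corrections make the effective step size approximately η times the sign of the gradient, regardless of the gradient's magnitude.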
It should be noted that there is a tradeoff between the predictive time-horizon of the deep LSTM neural network and the computational cost of training the network over that horizon. As the predictive horizon increases, the computational cost of training over that horizon increases significantly for an equivalent number of examples. To maintain computational tractability in our simulations, all networks with long predictive horizons (i.e., N_P = 50, 100, 200) were trained for 200 epochs, whereas the one-step predictive network was trained for 1,000 epochs.
For all training sets throughout this paper, we used a validation-to-training data ratio of 1/32 and a minibatch size of 32. We performed all training and computation in the TensorFlow computational framework on a server running CentOS 7 with twin Nvidia GTX 1080Ti GPUs, each equipped with 11 GB of VRAM.

Simulation Results
In this section, we present our simulation results on predicting the multi-timescale spiking dynamics exhibited by hippocampal CA1 pyramidal neurons over a long time-horizon using the deep LSTM neural network architecture described in Section 2. We trained 4 LSTM networks for making 1-timestep (equivalently, 0.1 ms), 50-timestep (5 ms), 100-timestep (10 ms), and 200-timestep (20 ms) predictions. Figure 6 shows the training and validation loss for these 4 LSTM networks.

As shown in Figure 7, the iterative predictions of the membrane potential traces by the LSTM networks did not differ significantly over a short time-horizon (up to 200 ms) for N_p = 1, 50, 100, 200, but the predictions improved significantly afterward with the increased predictive horizon of the LSTM network (i.e., from N_p = 1 to N_p = 200). In particular, the LSTM network performance significantly improved in predicting the timing of the occurrence of spikes, but the predicted magnitude of the membrane potential traces during spikes degraded as we increased N_p from 1 to 200. For clarity, we also computed the time-averaged root mean squared error (RMSE) of the membrane potential traces between the Hodgkin-Huxley model and the LSTM network for N_p = 1, 50, 100, 200 over 500 ms of simulation time. Figure 8(a) shows that the time-averaged RMSE decreased consistently with the increased predictive horizon of the LSTM network. Overall, these results show that an LSTM network with a longer predictive horizon captures the temporal correlations more accurately at the expense of amplitude accuracy, while an LSTM network with a shorter predictive horizon captures the amplitude more accurately at the expense of the temporal correlations.
Figure 8: (b) shows the RMSE versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 stimulating currents randomly selected from a uniform distribution U(2.3, 3.0) and 100 random initial conditions for each stimulating current.
To systematically evaluate whether the designed LSTM networks provide reasonable predictions of the membrane potential traces of the regular spiking dynamics across the range of external input currents between 2.3 nA and 3.0 nA, we performed simulations for 50 random stimulating currents drawn from a uniform distribution U(2.3, 3.0). For each stimulating current, we chose 100 initial conditions drawn randomly from between the maximum and minimum values of the Hodgkin-Huxley state variables (note that the network was not trained over this wide range of initial conditions). Figure 8(b) shows the LSTM network performance, represented in terms of the root mean squared error versus time over the 5000 realizations, for N_p = 1, 50, 100, 200. As shown in this figure, the root mean squared error decreased with the increased predictive horizon of the LSTM network for all time, consistent with the result shown in Figure 8(a).
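The pointwise-in-time RMSE across independent realizations used for Figure 8(b) can be computed as follows; the array names and layout are our own convention:

```python
import numpy as np

def rmse_over_time(v_true, v_pred):
    """RMSE between ground-truth and predicted membrane potential traces,
    computed at each timestep across independent realizations.

    v_true, v_pred : (n_realizations, n_timesteps) arrays.
    Returns an (n_timesteps,) array: the RMSE at each simulation time.
    """
    return np.sqrt(np.mean((v_true - v_pred) ** 2, axis=0))
```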
In conclusion, these results suggest that our deep LSTM neural network with a longer predictive horizon can predict the regular (periodic) spiking patterns exhibited by hippocampal CA1 pyramidal neurons with high accuracy over a long time-horizon.

Irregular Bursting
In this section, we demonstrate the efficacy of our trained deep LSTM neural network in predicting the irregular bursting dynamics shown by the biophysiological Hodgkin-Huxley model of the CA1 pyramidal neuron in response to external currents in the range I ∈ [0.79, 2.3) nA. As shown in Figure 9, the LSTM performance significantly improved in predicting the timing of the occurrence of spikes up to 100 ms as the predictive horizon of the LSTM network increased from N_p = 1 to N_p = 200, but the performance degraded in capturing the magnitude of the membrane potentials during spiking with increased N_p. Although the time-averaged root mean squared error of the membrane potential traces between the Hodgkin-Huxley model and the LSTM network for N_p = 1, 50, 100, 200 showed improved performance with increased N_p (see Figure 10(a)), none of the LSTM networks showed a reasonable prediction of the timing of the occurrence of spikes in this regime beyond a 100 ms time-horizon.
To systematically evaluate whether the designed LSTM networks provide reasonable predictions of the membrane potential traces of the irregular bursting dynamics across the range of external input currents between 0.79 nA and 2.3 nA, we performed simulations for 50 random stimulating currents drawn from a uniform distribution U(0.79, 2.3). For each stimulating current, we chose 100 initial conditions drawn randomly from between the maximum and minimum values of the Hodgkin-Huxley state variables (note that the network was not trained over this wide range of initial conditions). Figure 10(b) shows the LSTM network performance, represented in terms of the root mean squared error versus time over the 5000 realizations, for N_p = 1, 50, 100, 200. As shown in this figure, the root mean squared error decreased with the increased predictive horizon of the LSTM network for all time, consistent with the result shown in Figure 10(a).
In conclusion, these results suggest that our deep LSTM neural network with a longer predictive horizon can predict the irregular bursting patterns exhibited by hippocampal CA1 pyramidal neurons with high accuracy over only a short time-horizon.

Regular Bursting
In this section, we demonstrate the efficacy of our trained deep LSTM neural network in predicting the regular bursting dynamics shown by the biophysiological Hodgkin-Huxley model of the CA1 pyramidal neuron in response to external currents in the range I ∈ [0.24, 0.79) nA. For clarity, we show here our results only for the membrane potential traces. We provide the complete set of simulation results on the LSTM network performance in predicting the dynamics of all 9 states of the Hodgkin-Huxley model in Appendix B.3 (see Figures 23, 24, 25, 26, and 27).

Figure 10: (b) shows the RMSE versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 stimulating currents randomly selected from a uniform distribution U(0.79, 2.3) and 100 random initial conditions for each stimulating current.
By analyzing the results shown in Figure 11, we found that the LSTM network performance in predicting the timing of spikes during bursts, as well as in tracking the subthreshold potential, improved significantly from N_p = 1 to N_p = 200, but the performance substantially degraded in capturing the magnitude of the membrane potentials during spiking. Notably, the LSTM network with a 200-timestep prediction horizon (see Figure 11(d)) predicts the temporal dynamics with reasonable accuracy over the first 300 ms of prediction. Figure 12(a) shows the time-averaged root mean squared error of the membrane potential traces between the Hodgkin-Huxley model and the LSTM network for N_p = 1, 50, 100, 200. As noted in this figure, the root mean squared error decreased substantially between the 100-timestep and 200-timestep prediction horizons compared to the regular spiking (Figure 8(a)) and irregular bursting (Figure 10(a)) regimes, which indicates that an LSTM network with a longer predictive horizon is necessary to capture the two different timescales (i.e., short intraburst spiking intervals and long interburst subthreshold intervals) present in these dynamics.

Figure 12: (b) shows the RMSE versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 stimulating currents randomly selected from a uniform distribution U(0.24, 0.79) and 100 random initial conditions for each stimulating current.
Figure 12(b) shows the LSTM network performance, represented in terms of the root mean squared error versus time over 5000 realizations. The root mean squared error decreased with the increased predictive horizon of the LSTM network, consistent with the result shown in Figure 12(a). Note that we have excluded the simulation result for N_p = 50, as our detailed analysis found that the trained LSTM network for N_p = 50 led to instability in predicting spiking responses for some of the initial conditions in this regime; a possible reason is that the network had not seen these initial conditions during training.

Discussion
In this paper, we developed and presented a new data-driven long short-term memory (LSTM) based neural network (NN) architecture to predict the dynamical spiking patterns of single neurons. Compared to other LSTM-based NN architectures for forecasting dynamical system behavior reported in the literature, our architecture incorporates a single dense feedforward output layer with an activation function and a reverse-order sequence-to-sequence mapping approach into a traditional LSTM-based neural network to enable stable multi-timestep predictions of the dynamics over a long time-horizon. We demonstrated the efficacy of our architecture in predicting the multi-timescale dynamics of hippocampal CA1 pyramidal neurons and compared the predictions from our model with ground-truth synthetic data obtained from an experimentally validated biophysiological model of a CA1 pyramidal neuron in the Hodgkin-Huxley formalism. Through simulations, we showed that (1) the presented architecture can learn multi-timescale dynamics, and (2) the predictive accuracy of the network increases with the predictive horizon of the LSTM network.
Our results for the irregular bursting regime showed the limitation of the designed deep LSTM neural network architecture in making accurate predictions of the timing of the occurrence of spikes over a long time-horizon compared to the regular spiking and regular bursting regimes. A possible reason may lie in the architecture itself or in the dataset used to train these networks; this requires further investigation, for instance by training the networks on data generated exclusively from this regime.
In all dynamical regimes, our results showed degraded performance of the deep LSTM neural network in predicting the amplitude of the membrane potentials at the times of spike occurrence as the predictive horizon of the LSTM network increased. This issue may be related to the equally weighted norm-2 loss function used for training the networks. Further investigation with different loss functions, such as norm-1 or weighted norm-2, is required, which we consider as future work.
Although the data-driven approach developed in this paper showed the ability of the designed LSTM-based neural network to learn multi-timescale dynamics, we note that the network struggles to accurately capture the dynamics of state variables whose magnitude is comparable to the numerical precision of our simulations. This can particularly be seen in Figures 14, 15, 16, and 26, where the network is not able to reconstruct the dynamics of the state variable q_sAHP with reasonable accuracy. One possible way to alleviate this issue may be to tighten the numerical error tolerance in the simulations, which may increase the overall computational cost during training.

B Simulation Results on Full State Predictions of Hodgkin-Huxley Model
In Section 3, we showed our simulation results only for the membrane potential traces. Here, we provide the simulation results for all 9 states of the Hodgkin-Huxley model of the CA1 pyramidal neuron (HHCA1Py) predicted by the deep LSTM neural network over a long time-horizon, and we compare these predictions with the simulated dynamics from HHCA1Py.

B.1 Regular Spiking
In this section, we show the simulation results on predicting the dynamics of all 9 states of HHCA1Py over a long time-horizon using the deep LSTM neural network for the regular periodic spiking regime (I ∈ [2.3, 3.0] nA). Figures 13, 14, 15, and 16 show the comparison between the state dynamics simulated using the Hodgkin-Huxley model and the deep LSTM neural network model developed for the 1-timestep, 50-timestep, 100-timestep, and 200-timestep (equivalently, N_p = 1, 50, 100, 200) predictive horizons, respectively. As shown in these figures, the performance of the deep LSTM neural network model in predicting the state dynamics significantly improved with the increased predictive horizon of the LSTM network (i.e., from N_p = 1 to N_p = 200) for all states except q_sAHP, whose magnitude we found to be comparable to the numerical precision of the performed simulations. Figure 17 shows the root mean squared error between the states of HHCA1Py and the deep LSTM neural network model as a function of simulation time over 5000 random realizations, for N_p = 1, 50, 100, 200. These results show that the root mean squared error decreases from N_p = 1 to N_p = 200.

B.2 Irregular Bursting
In this section, we show the simulation results on predicting the dynamics of all 9 states of HHCA1Py over a long time-horizon using the deep LSTM neural network for the irregular bursting regime (I ∈ [0.79, 2.3) nA). Figures 18, 19, 20, and 21 show the comparison between the state dynamics simulated using the Hodgkin-Huxley model and the deep LSTM neural network model developed for the 1-timestep, 50-timestep, 100-timestep, and 200-timestep (equivalently, N_p = 1, 50, 100, 200) predictive horizons, respectively.
As shown in these figures, the deep LSTM neural network model provides a reasonable prediction of the dynamics of all states except q_sAHP over the initial 100 ms of simulation. Moreover, the prediction improved from N_p = 1 to N_p = 200, consistent with the results for the regular spiking regime (see Figures 13, 14, 15, 16, and 17). We found that the magnitude of q_sAHP was comparable to the numerical precision of our simulations, which hindered the capability of the LSTM network in making a reasonable prediction for this state. Figure 22 shows the root mean squared error between the states of HHCA1Py and the deep LSTM neural network model as a function of simulation time over 5000 random realizations, for N_p = 1, 50, 100, 200. As shown here, the root mean squared error decreased with the increased predictive horizon of the LSTM network (i.e., from N_p = 1 to N_p = 200).

B.3 Regular Bursting
In this section, we show the simulation results on predicting the dynamics of all 9 states of HHCA1Py over a long time-horizon using the deep LSTM neural network for the regular bursting regime (I ∈ [0.24, 0.79) nA). Figures 23, 24, 25, and 26 show the comparison between the state dynamics simulated using the Hodgkin-Huxley model and the deep LSTM neural network model developed for the 1-timestep, 50-timestep, 100-timestep, and 200-timestep (equivalently, N_p = 1, 50, 100, 200) predictive horizons, respectively.
As shown in these figures, the performance of the deep LSTM neural network model in predicting the state dynamics significantly improved between the 1-timestep predictive horizon (Figure 23) and the 200-timestep predictive horizon (Figure 26) across all states except q_sAHP, for the same reason given for the regular spiking and irregular bursting regimes. More importantly, the LSTM network predicted the temporal correlations with high accuracy over the time-horizon of 300 ms for N_p = 200. Extrapolating these results suggests that increasing the predictive horizon beyond N_p = 200 could improve the prediction beyond the 300 ms time-horizon.
In Figure 27, we show the root mean squared error between the states of HHCA1Py and the deep LSTM neural network model as a function of simulation time over 5000 random realizations, for N_p = 1, 50, 100, 200. As shown here, the root mean squared error decreased with the increased predictive horizon of the LSTM network (i.e., from N_p = 1 to N_p = 200), which is consistent with the results for the regular spiking and irregular bursting regimes.