*Algorithms* **2019**, *12*(10), 203; https://doi.org/10.3390/a12100203

Article

Data-Driven Predictive Modeling of Neuronal Dynamics Using Long Short-Term Memory

Department of Chemical and Materials Engineering, University of Idaho, Moscow, ID 83844, USA

^{*} Author to whom correspondence should be addressed.

Received: 12 August 2019 / Accepted: 23 September 2019 / Published: 24 September 2019

## Abstract

Modeling brain dynamics to better understand and control complex behaviors underlying various cognitive brain functions has been of interest to engineers, mathematicians and physicists over the last several decades. With the motivation of developing computationally efficient models of brain dynamics for use in designing control-theoretic neurostimulation strategies, we have developed a novel data-driven approach in a long short-term memory (LSTM) neural network architecture to predict the temporal dynamics of complex systems over an extended time horizon. In contrast to recent LSTM-based dynamical modeling approaches that make use of multi-layer perceptrons or linear combination layers as output layers, our architecture uses a single fully connected output layer and reversed-order sequence-to-sequence mapping to improve short time-horizon prediction accuracy and to make multi-timestep predictions of dynamical behaviors. We demonstrate the efficacy of our approach in reconstructing the regular spiking to bursting dynamics exhibited by an experimentally validated 9-dimensional Hodgkin-Huxley model of hippocampal CA1 pyramidal neurons. Through simulations, we show that our LSTM neural network can predict the multi-timescale temporal dynamics underlying various spiking patterns with reasonable accuracy. Moreover, our results show that the predictions improve with increasing predictive time-horizon in the multi-timestep deep LSTM neural network.

Keywords: long short-term memory; brain dynamics; data-driven modeling; complex systems

## 1. Introduction

Our brain generates highly complex nonlinear responses at multiple temporal scales, ranging from a few milliseconds to several days, in response to an external stimulus [1,2,3]. One of the long-standing interests in computational neuroscience is to understand the dynamics underlying various cognitive and non-cognitive brain functions by developing computationally efficient modeling and analysis approaches. In the last four decades or so, several advancements have been made in the dynamical modeling and analysis of brain dynamics [4,5,6]. In the context of modeling the dynamics of single neurons, several modeling approaches, ranging from detailed mechanism-based biophysiological modeling to simplified phenomenological/probabilistic modeling, have been developed to understand the diverse firing patterns (e.g., simple spiking to bursting) observed in electrophysiological experiments [7,8]. These models provide a detailed understanding of the various ionic mechanisms that contribute to generating specific spiking patterns, and they allow large-scale simulations to be performed to understand the dynamics underlying cognitive behaviors. However, most of these models are computationally expensive from the perspective of developing novel real-time neurostimulation strategies for controlling neuronal dynamics at the single-neuron and network levels. In this paper, we investigate purely data-driven long short-term memory (LSTM) based recurrent neural network (RNN) architectures in multi-timestep predictions of a single neuron’s dynamics for use in developing novel neurostimulation strategies in an optimal control framework.

The availability of an abundant amount of data and advances in machine learning have recently revolutionized the field of predictive data-driven dynamical modeling of complex systems using neural networks (NNs) and deep learning approaches. Various nonlinear system identification approaches have been developed to map static input-output relations using multi-layer perceptrons (MLPs) [9,10,11,12] and their variations [13,14]. Reinforcement learning has recently been explored in robotics dynamical modeling in Reference [15]. NN architectures that make use of vanilla recurrent neural network (RNN) elements have also been explored for nonlinear system identification and modeling in References [16,17,18]. However, network architectures that make use of vanilla recurrent layers often suffer from the exploding or vanishing gradient problem when used to model dynamics over long time series horizons [19]. In Reference [18], a highly specialized multi-phase training algorithm was used to ensure that the network did not suffer from this problem. LSTM based approaches to modeling dynamical systems [20,21,22] mitigate the vanishing gradient problem but suffer from poor early trajectory predictive performance when using long predictive horizons [20,23]. LSTMs have been used to model high-dimensional chaotic systems [24], but these studies have been limited to single step prediction applications. Additionally, machine learning techniques have begun to be explored in neuroscientific modeling applications. Multi-layer Spiking Artificial Neural Networks (SANNs) for use in spatiotemporal spike pattern transformations have been developed in References [25,26]. This is achieved by using novel approximations and surrogates of the partial derivatives of the spike train functions with respect to the weights. This partial derivative is typically problematic when used in backpropagation, as it is undefined at spike times for many neuronal models, rendering it incompatible with traditional backpropagation-based approaches. In Reference [27], a novel gated recurrent unit (GRU) based encoder/decoder approach is used to learn and predict neuronal population dynamics and kinematic trajectories from single-trial spike train data.

In this paper, we have developed a novel deep LSTM neural network architecture, which can make multi-timestep predictions in large-scale dynamical systems. In particular, we use a reversed sequence-to-sequence mapping technique, developed for language translation applications of multi-layer LSTM networks in Reference [28], and generalize the application of this technique to dynamical systems time-series forecasting. Figure 1 illustrates our overall approach.

In contrast to existing approaches in modeling dynamical systems using neural networks, our architecture uses (1) stacked LSTM layers in conjunction with a single densely connected layer to capture temporal dynamic features as well as input/output features; (2) sequence-to-sequence mapping, which enables multi-timestep predictions; and (3) reverse ordered input and measured state trajectories to the network, resulting in highly accurate early predictions and improved performance over long horizons. We show the efficacy of our developed approach in making stable multi-timestep predictions of various firing patterns exhibited by hippocampal CA1 pyramidal neurons, obtained from simulating an experimentally validated highly nonlinear 9-dimensional Hodgkin-Huxley model of CA1 pyramidal cell dynamics, over long time-horizons. Our approach is contingent on the network being trained on the entire state vector of the neuronal model.

The remainder of the paper is organized as follows. In Section 2, we describe our developed deep LSTM neural network architecture and methodological approach to data-driven multi-timestep predictions of dynamical systems. In Section 3, we show the efficacy of our approach in making stable multi-timestep predictions of neuronal dynamics over long time-horizons. This is followed in Section 4 by a thorough discussion of the limitations of our approach.

## 2. Neural Network Architecture, Algorithm and Approach

In Section 2.1, we describe our developed deep LSTM neural network architecture which combines stacked LSTMs with a fully-connected dense output layer. We describe the sequence-to-sequence mapping with reversed order input sequences used in this paper in Section 2.2. In Section 2.3, we provide the details on the synthetic data used to train our networks. Finally, in Section 2.4, we provide the details on the approach used to train the developed neural network architecture.

#### 2.1. Deep LSTM Neural Network Architecture

Long short-term memory (LSTM) neural networks [29] are a particular type of recurrent neural networks (RNNs) which mitigate the vanishing or exploding gradient problem during the network training while capturing both the long-term and the short-term temporal features in sequential time-series data processing [19]. Specifically, LSTM uses multiple gating variables that control the flow of information of a hidden cell state and assign temporal importance to the dynamical features that are present in the time series data flowing through the cell state. Figure 2 shows a schematic illustrating the internal gating operation in a single LSTM cell.

A forward pass of information through a single LSTM cell is described by the following cell and gating state equations (reference):

$$f_{t}=\sigma_{g}(W_{f}x_{t}+U_{f}h_{t-1}+b_{f}), \tag{1a}$$

$$i_{t}=\sigma_{g}(W_{i}x_{t}+U_{i}h_{t-1}+b_{i}), \tag{1b}$$

$$o_{t}=\sigma_{g}(W_{o}x_{t}+U_{o}h_{t-1}+b_{o}), \tag{1c}$$

$$c_{t}=f_{t}\circ c_{t-1}+i_{t}\circ \sigma_{c}(W_{c}x_{t}+U_{c}h_{t-1}+b_{c}), \tag{1d}$$

$$h_{t}=o_{t}\circ \sigma_{c}(c_{t}). \tag{1e}$$

In Equations (1a)–(1e), $c_{t}\in \mathbb{R}^{h}$ and $h_{t}\in \mathbb{R}^{h}$ represent the cell state vector and the hidden state vector, respectively, at time $t$. $f_{t}\in \mathbb{R}^{h}$, $i_{t}\in \mathbb{R}^{h}$, and $o_{t}\in \mathbb{R}^{h}$ are the “forget gate”, “input gate”, and “output gate” activation vectors, respectively, at time $t$. $x_{t}\in \mathbb{R}^{d}$ is the input vector to the LSTM unit at time $t$, and $h_{t-1}\in \mathbb{R}^{h}$ is the previous time step hidden state vector passed back into the LSTM unit at time $t$. The matrices $W_{f}$, $W_{i}$, and $W_{o}$ represent the input weights for the “forget gate”, “input gate”, and “output gate”, respectively. The matrices $U_{f}$, $U_{i}$, and $U_{o}$ represent the weights of the recurrent connections for the “forget gate”, “input gate”, and “output gate”, respectively. The vectors $b_{f}$, $b_{i}$, and $b_{o}$ represent the “forget gate”, “input gate”, and “output gate” biases, respectively. $W_{c}$, $U_{c}$, and $b_{c}$ are the corresponding input weights, recurrent weights, and bias of the cell state update in Equation (1d). $\circ$ represents element-wise multiplication. The function $\sigma_{g}$ represents the sigmoidal activation function, and $\sigma_{c}$ is the hyperbolic tangent activation function.
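The gating equations of a single LSTM cell can be made concrete with a short NumPy sketch. This is an illustration only, not the authors' TensorFlow implementation: the helper names (`sigmoid`, `init_params`, `lstm_cell_step`), the random placeholder weights, and the input dimension `d = 10` (nine states plus one stimulating current) are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Sigmoidal gate activation (sigma_g)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, h, rng):
    """Placeholder weights for the f, i, o gates and the cell update c (illustrative values)."""
    params = {}
    for gate in "fioc":
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(h, d))  # input weights
        params[f"U_{gate}"] = rng.normal(scale=0.1, size=(h, h))  # recurrent weights
        params[f"b_{gate}"] = np.zeros(h)                         # biases
    return params

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One forward pass through an LSTM cell: gates first, then cell and hidden state."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])  # input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])  # output gate
    c_t = f_t * c_prev + i_t * np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
d, h = 10, 324            # d is an assumption; h = 324 matches the hidden size used in the paper
params = init_params(d, h, rng)
h_t, c_t = np.zeros(h), np.zeros(h)   # stateless start: zero cell and hidden states
for x_t in rng.normal(size=(5, d)):   # a short dummy input sequence
    h_t, c_t = lstm_cell_step(x_t, h_t, c_t, params)
```

Because the output gate lies in (0, 1) and the hyperbolic tangent in (-1, 1), every component of the hidden state stays bounded in magnitude by 1, regardless of the input scale.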

In this paper, we use a stacked LSTM network integrated with a fully connected feedforward output layer to make multi-timestep state predictions. The use of a single feedforward dense output layer allows the network to effectively learn the static input-output features, while the stacked LSTM network captures the temporal dynamical features. To select the dimensionality of the hidden states in a single hidden layer, we systematically varied the number of hidden states over the sequence {n, ${n}^{2}$, $2{n}^{2}$, $4{n}^{2}$, ⋯}, where n is the dimension of the system’s state, and evaluated the training performance for each case. We found that for our application ($n=9$), a hidden state dimensionality of $4{n}^{2}=324$ was optimal in learning dynamical behaviors while avoiding overfitting. To select the number of hidden layers, we systematically increased the number of hidden layers of identical hidden state dimensionality (i.e., 324 states) and compared the network performance during training. We found that increasing the number of hidden layers beyond 3 did not improve the network performance on the training and validation datasets. Thus, we fixed the number of hidden layers to 3 in our study. Throughout this paper, we utilized stateless LSTMs, which reset the internal cell and hidden states to zero after processing and performing gradient descent for a given minibatch. We initialized the network weights using the Xavier method [30]. Specifically, the initial weights were drawn from a uniform distribution using

$$W_{ij}\sim \mathcal{U}\left(-\frac{\sqrt{6}}{\sqrt{n_{j}+n_{j+1}}},\frac{\sqrt{6}}{\sqrt{n_{j}+n_{j+1}}}\right),$$

where ${n}_{j}$ is the dimensionality of the input units in the weight tensor, and ${n}_{j+1}$ is the dimensionality of the output units in the weight tensor.
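As a rough illustration of the Xavier initialization step, the sketch below draws a weight matrix from the standard Glorot uniform distribution, whose bound is $\sqrt{6/(n_{j}+n_{j+1})}$. The function name `glorot_uniform` and the example layer sizes (324 hidden units feeding 9 outputs) are assumptions chosen for the example, not values taken from the authors' code.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng):
    """Draw an (n_out, n_in) weight matrix from U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

rng = np.random.default_rng(0)
n_in, n_out = 324, 9                      # illustrative sizes: last LSTM layer -> dense output
W = glorot_uniform(n_in, n_out, rng)
limit = np.sqrt(6.0 / (n_in + n_out))     # bound that every entry of W must respect
```

The bound shrinks as the layer widens, which keeps the variance of the activations roughly constant from layer to layer at the start of training.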

To generate a long time-horizon dynamical prediction beyond the multi-timestep prediction of a single stacked deep LSTM neural network (shown as “Deep LSTM” in Figure 3), we used the following iterative approach. We made copies of the trained single stacked LSTM network and connected them in sequence in a feedforward manner. We concatenated the sequence of predicted outputs from the previous stacked LSTM network with an equal-length sequence of new inputs to the system and fed them in reverse sequence order to the next stacked LSTM network. Figure 3 illustrates this iterative approach.
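The iterative scheme can be sketched as a simple rollout loop. The code below is a minimal illustration under stated assumptions: `toy_predictor` is a hypothetical stand-in for a trained network (it merely scales the states by 0.9), and pairing each predicted state block with the stimulating currents of the following window is one reading of "new inputs", not a detail confirmed by the text.

```python
import numpy as np

N_STATES = 9  # state dimension of the Hodgkin-Huxley model used in the paper

def toy_predictor(features_reversed):
    """Hypothetical stand-in for a trained network: (n_p, n + m) reversed features -> (n_p, n) states."""
    states = features_reversed[::-1, :N_STATES]   # undo the time reversal, keep the state columns
    return 0.9 * states                           # placeholder dynamics, not a real model

def rollout(predict_block, x_init, u, n_p):
    """Chain block predictions: each predicted block becomes part of the next block's input."""
    trajectory = [x_init]
    x_block = x_init
    n_blocks = u.shape[0] // n_p
    for b in range(1, n_blocks):
        u_block = u[b * n_p:(b + 1) * n_p]                      # new inputs (assumed: next window)
        features = np.concatenate([x_block, u_block], axis=1)   # shape (n_p, n + m)
        x_block = predict_block(features[::-1])                 # feed in reversed time order
        trajectory.append(x_block)
    return np.concatenate(trajectory, axis=0)

n_p = 50
x_init = np.ones((n_p, N_STATES))    # initial state sequence, assumed available
u = np.full((500, 1), 3.0)           # constant stimulating current, in nA
traj = rollout(toy_predictor, x_init, u, n_p)
```

Because every block after the first is built from earlier predictions, errors can compound along the rollout, which is why stability over long horizons has to be checked empirically.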

#### 2.2. Sequence to Sequence Mapping with Neural Networks

To make multi-timestep predictions of dynamical systems’ outputs using the deep LSTM neural network architecture described in the previous section (Section 2.1), we formulate the problem of mapping trajectories of the network inputs to the trajectories of the predicted outputs as a reverse order sequence-to-sequence mapping problem. The central idea of the reverse order sequence-to-sequence mapping approach is to feed the inputs to the network in reverse order such that the network perceives the first input as the last and the last input as the first. Although this approach has been developed and applied in language translation applications [28], it has never been considered in the context of predicting dynamical systems behaviors from time-series data. Figure 4 illustrates the basic idea of the reverse order sequence-to-sequence mapping approach for translating letters (inputs) to their numerical indices (outputs).

As shown in Figure 4, in the forward sequence-to-sequence mapping approach (Figure 4a), that is, $A,B,C\to 1,2,3$, the distance between each input and its corresponding output is the same (i.e., 3 “units”). In the reverse sequence-to-sequence mapping approach (Figure 4b), the network receives the input in reverse order and maps it to the target output sequence, i.e., $C,B,A\to 1,2,3$. The average distance between the mappings remains the same for both approaches (i.e., 3 “units”), but the reverse order approach introduces short and long-term symmetric temporal dependencies between inputs and outputs. These short and long-term symmetric temporal dependencies provide improved predictive performance over long temporal horizons [28].
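For time-series data, the reverse-order mapping amounts to flipping the input window along the time axis while leaving the target window in forward order. The sketch below shows one way to build such a pair; the helper name `make_reversed_pair` and the toy series are assumptions made for illustration.

```python
import numpy as np

def make_reversed_pair(series, start, n_p):
    """Return (time-reversed input window, forward-ordered target window), each of length n_p."""
    inputs = series[start:start + n_p][::-1]          # reversed along the time axis
    targets = series[start + n_p:start + 2 * n_p]     # the next n_p samples, forward order
    return inputs, targets

series = np.arange(10.0).reshape(-1, 1)   # toy series: 0, 1, ..., 9 with one feature
inputs, targets = make_reversed_pair(series, start=2, n_p=3)
# inputs holds samples 4, 3, 2 and targets holds samples 5, 6, 7
```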

#### 2.3. Synthetic Data

Hippocampal CA1 pyramidal neurons exhibit various multi-timescale firing patterns (from simple spiking to bursting) and play an essential role in shaping spatial and episodic memory [31]. In the last two decades, several biophysiological models of the CA1 pyramidal (CA1Py) neurons ranging from single compartmental biophysiological and phenomenological models [32,33,34] to detailed morphology-based multi-compartmental models [35,36,37,38,39,40,41] have been developed to understand the contributions of various ion-channels in diverse firing patterns (e.g., simple spiking to bursting) exhibited by the CA1Py neurons.

In this paper, we use an experimentally validated 9-dimensional nonlinear model of CA1 pyramidal neuron in the Hodgkin-Huxley formalism given in Reference [32] to generate the synthetic data for the network training and validation. The model exhibits several different bifurcations to the external stimulating current and has shown its capability in generating diverse firing patterns observed in electrophysiological recordings from CA1 pyramidal cells under various stimulating currents. Figure 5 shows three different firing patterns generated from this model based on the three different regimes of the applied input currents.

To construct the synthetic training and validation datasets for the deep LSTM neural networks with different predictive horizons, we simulated the Hodgkin-Huxley model of the CA1 pyramidal neuron given in [32] (see Appendix A for the details of the model) for a duration of 1000 ms for 2000 constant stimulating currents, sampled uniformly between $I=0.0$ nA and $I=3.0$ nA. From each of these 2000 examples, we randomly and uniformly drew 50 samples of the desired predictive horizon (i.e., ${10}^{5}$ data points in total) as the input/output sequence data for training and validation. As described in Section 2.4, we used 1/32 of these data points for validation, that is, 96,875 data points for training and 3125 data points for validation. Since our deep LSTM neural network takes an initial sequence of outputs of appropriate predictive horizon length (i.e., ${N}_{p}=1,50,100,200$) as an input sequence to make the next time-horizon prediction of equal sequence length, we assume that this initial output sequence data is available to the deep LSTM neural network throughout our simulations.
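The dataset bookkeeping can be checked with a few lines of arithmetic. The sketch below reproduces the 96,875 / 3125 split from 2000 currents with 50 windows each, and shows one possible way to draw random window start indices. The 0.1 ms sampling step is taken from the timestep quoted in Section 3; the variable names and the sampling scheme itself are assumptions for illustration.

```python
import numpy as np

n_currents = 2000            # constant stimulating currents
windows_per_current = 50     # windows drawn from each simulated trace
n_total = n_currents * windows_per_current   # total input/output sequence pairs
n_val = n_total // 32                         # 1/32 held out for validation
n_train = n_total - n_val

n_steps = 10_000             # 1000 ms at 0.1 ms per step
n_p = 200                    # predictive horizon (input and target are n_p steps each)

rng = np.random.default_rng(0)
# Each window needs 2 * n_p consecutive samples, so starts must leave room at the end.
starts = rng.integers(0, n_steps - 2 * n_p + 1, size=(n_currents, windows_per_current))
```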

#### 2.4. Network Training

We formulated the following optimization problem to train a set of network weights $\theta $:

$$\theta^{*}=\underset{\theta}{\arg\min}\;\mathcal{L}(\theta), \tag{3}$$

where the loss function $\mathcal{L}\left(\theta \right)$ is given by

$$\mathcal{L}(\theta)=\frac{1}{N_{P}}\sum_{k=0}^{N_{P}}\left(\overrightarrow{x}(k)-\widehat{x}(k\mid\theta)\right)^{T}\left(\overrightarrow{x}(k)-\widehat{x}(k\mid\theta)\right). \tag{4}$$

Here ${N}_{P}$ represents the length of horizon over which the predictions are made, $\overrightarrow{x}\left(k\right)$ is the known state vector at time step k, and $\widehat{x}\left(k\right|\theta )$ is the neural network’s prediction of the state vector at time k, given $\theta $.
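The loss is a sum of squared state errors over the predictive horizon, normalized by the horizon length. A minimal NumPy sketch follows; it assumes the true and predicted trajectories are stored as arrays with one row per timestep in the horizon and normalizes by the number of rows, and the function name `horizon_loss` is chosen for the example.

```python
import numpy as np

def horizon_loss(x_true, x_pred):
    """Sum over timesteps of the squared error norm, divided by the number of timesteps."""
    err = x_true - x_pred                    # shape (n_timesteps, n_states)
    return np.sum(err * err) / x_true.shape[0]

x_true = np.zeros((4, 9))    # toy horizon of 4 timesteps, 9 states
x_pred = np.ones((4, 9))     # every state off by exactly 1
loss = horizon_loss(x_true, x_pred)
```

With each of the 9 states off by 1 at every timestep, the squared error norm per timestep is 9, so the normalized loss is also 9.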

To solve the optimization problem (3) and (4), we used the standard supervised backpropagation learning algorithm [42,43,44] along with the Adaptive Moment Estimation (Adam) method [45]. The Adam method is a first-order gradient-based optimization algorithm and uses lower-order moments of the gradients between layers to optimize a stochastic objective function.

Given the network parameter ${\theta}^{\left(i\right)}$ and the loss function $\mathcal{L}\left(\theta \right)$, where i represents the algorithm’s training iteration, the parameter update is given by Reference [45]

$$m_{\theta}^{(i+1)}\leftarrow \beta_{1}m_{\theta}^{(i)}+(1-\beta_{1})\nabla_{\theta}\mathcal{L}^{(i)},$$

$$\nu_{\theta}^{(i+1)}\leftarrow \beta_{2}\nu_{\theta}^{(i)}+(1-\beta_{2})\left(\nabla_{\theta}\mathcal{L}^{(i)}\right)^{2},$$

$$\widehat{m}_{\theta}=\frac{m_{\theta}^{(i+1)}}{1-(\beta_{1})^{i+1}},$$

$$\widehat{\nu}_{\theta}=\frac{\nu_{\theta}^{(i+1)}}{1-(\beta_{2})^{i+1}},$$

$$\theta^{(i+1)}\leftarrow \theta^{(i)}-\eta \frac{\widehat{m}_{\theta}}{\sqrt{\widehat{\nu}_{\theta}}+\epsilon},$$

where ${m}_{\theta}$ is the first moment estimate of the gradients with respect to the weights in a layer, ${\nu}_{\theta}$ is the corresponding second moment estimate, $\eta $ is the learning rate, ${\beta}_{1}$ and ${\beta}_{2}$ are the exponential decay rates for the moment estimates, ∇ is the gradient operator, and $\epsilon$ is a small scalar term that aids numerical stability. Throughout this work, we used ${\beta}_{1}=0.9$, ${\beta}_{2}=0.999$ and $\eta =0.001$ [45].
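A single Adam update can be written in a few lines of NumPy. The sketch below follows the standard Adam recursion with the hyperparameters quoted in the text; the function name `adam_step`, the value `eps = 1e-8`, and the toy quadratic loss (with a larger learning rate so the demo converges quickly) are assumptions made for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, i, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at iteration i (0-based); returns updated parameters and moments."""
    m = beta1 * m + (1.0 - beta1) * grad            # first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2       # second moment estimate
    m_hat = m / (1.0 - beta1 ** (i + 1))            # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** (i + 1))            # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Single step on L(theta) = ||theta||^2, whose gradient is 2 * theta.
theta0 = np.array([1.0, -2.0])
theta1, m1, v1 = adam_step(theta0, 2.0 * theta0, np.zeros(2), np.zeros(2), i=0)

# Short optimization run on the same toy loss (eta enlarged for the demo only).
theta, m, v = theta0.copy(), np.zeros(2), np.zeros(2)
for i in range(500):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, i, eta=0.05)
final_loss = float(np.sum(theta ** 2))
```

On the very first iteration the bias correction makes the step equal to the learning rate times the sign of the gradient, so each parameter moves by about 0.001 regardless of the gradient magnitude.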

It should be noted that there is a trade-off between the predictive time-horizon of deep LSTM neural network and the computational cost involved in training the network over the predictive horizon. As the predictive horizon increases, the computational cost of training the network over that horizon increases significantly for an equivalent number of examples. To keep the computational tractability in our simulations, all networks with long predictive horizons (i.e., ${N}_{P}=50,100,200$) were trained for 200 epochs except the one-step predictive network, which was trained for 1000 epochs.

For all training sets throughout this paper, we held out 1/32 of the data for validation. We set the minibatch size for training to 32. We performed all the training and computation in the TensorFlow computational framework on a discrete server running CentOS 7 with twin Nvidia GTX 1080Ti GPUs, each equipped with 11 GB of VRAM.

## 3. Simulation Results

In this section, we present our simulation results on predicting the multi-timescale spiking dynamics exhibited by hippocampal CA1 pyramidal neurons over a long time-horizon using our developed deep LSTM neural network architecture described in Section 2. We trained 4 LSTM networks for making one timestep prediction (equivalently, $0.1$ ms), 50 timesteps prediction (equivalently, 5 ms), 100 timesteps prediction (equivalently, 10 ms), and 200 timesteps prediction (equivalently, 20 ms). Figure 6 shows the training and validation loss for these 4 LSTM networks.

Using the iterative approach described in Section 2.1, we simulated the LSTM networks over 500 ms of time duration under different initial conditions and stimulating input currents across three different regimes of dynamical responses (“Regular Spiking” ($I\in [2.3,3.0]$ nA), “Irregular Bursting” ($I\in [0.79,2.3)$ nA), and “Regular Bursting” ($I\in [0.24,0.79)$ nA)) and compared the predicted state trajectories with those of the Hodgkin-Huxley model. We provide a link to download the simulation codes that replicate the results presented in this section in the Supplementary Materials section.

#### 3.1. Regular Spiking

In this section, we demonstrate the efficacy of our trained deep LSTM neural network over the range of external current between $2.3$ nA and 3 nA in predicting the regular spiking dynamics shown by the biophysiological Hodgkin-Huxley model of CA1 pyramidal neuron in response to the external current $I\ge 2.3$ nA. In this range, we observe firing rates between approximately 165 Hz and 187 Hz. For clarity, we here show our results only for the membrane potential traces. We provide the complete set of simulation results on the LSTM network performance in predicting the dynamics of all the 9 states of the Hodgkin-Huxley model in Appendix B.1 (see Figure A1, Figure A2, Figure A3, Figure A4, Figure A5).

Figure 7 shows a comparison of the membrane potential traces simulated using the Hodgkin-Huxley model and the 4 different predictive horizons of the LSTM network (i.e., 1 timestep, 50 timesteps, 100 timesteps and 200 timesteps, which we represent as ${N}_{p}=1,50,100,200$) for the external stimulating current $I=3.0$ nA. Note that all the simulations are performed using the same initial condition as provided in Appendix A. Since our LSTM network uses the initial sequence of outputs of appropriate predictive horizon (i.e., ${N}_{p}=1,50,100,200$) from the Hodgkin-Huxley model to make future time predictions, the LSTM network predictions (shown by dashed red line) start after $0.1$ ms, 5 ms, 10 ms, and 20 ms in Figure 7a–d, respectively.

As shown in Figure 7, the iterative prediction of the membrane potential traces by the LSTM network did not differ significantly over a short time horizon (up to 200 ms) for ${N}_{p}=1,50,100,200$, but it significantly improved afterward with the increased predictive horizon of the LSTM network (i.e., ${N}_{p}=1$ to ${N}_{p}=200$). In particular, the LSTM network performance significantly improved in predicting the timing of the occurrence of spikes, but the magnitude of the membrane potential traces during spikes degraded as we increased ${N}_{p}$ from 1 to 200. For clarity, we also computed the time-averaged root mean squared error (RMSE) of the membrane potential traces between the Hodgkin-Huxley model and the LSTM network for ${N}_{p}=1,50,100,200$ over 500 ms of simulation time. Figure 8a shows that the time-averaged RMSE decreased consistently with the increased predictive horizon of the LSTM network. Overall, these results show that our LSTM network with a longer predictive horizon prefers to capture temporal correlations more accurately over the amplitude while an LSTM network with a shorter predictive horizon prefers to capture the amplitude more accurately over the temporal correlations.

To systematically evaluate whether the designed LSTM networks provide reasonable predictions of the membrane potential traces of the regular spiking dynamics across the range of external input currents between $2.3$ nA and $3.0$ nA, we performed simulations for 50 random stimulating currents drawn from a Uniform distribution $\mathcal{U}(2.3,3.0)$. For each stimulating current, we chose 100 initial conditions drawn randomly from the maximum and minimum range of the Hodgkin-Huxley state variables (Note that the network was not trained over this wide range of initial conditions). Figure 8b shows the LSTM network performance, represented in terms of the root mean squared error vs time over 5000 realizations, for ${N}_{p}=1,50,100,200$. As shown in this figure, the root mean squared error decreased with the increased predictive horizon of the LSTM network for all time, which is consistent with the result shown in Figure 8a.
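The two error measures used in this section can be sketched as follows. The exact definitions used for Figure 8 are not spelled out in the text, so this example assumes the usual ones: the time-averaged RMSE averages the squared membrane potential error over time for one trace, and the RMSE versus time averages the squared error over realizations at each timestep. The function names and toy arrays are illustrative.

```python
import numpy as np

def time_averaged_rmse(v_true, v_pred):
    """RMSE of one membrane potential trace, averaged over all timesteps."""
    return float(np.sqrt(np.mean((v_true - v_pred) ** 2)))

def rmse_vs_time(v_true, v_pred):
    """RMSE at each timestep across realizations; inputs have shape (n_realizations, n_steps)."""
    return np.sqrt(np.mean((v_true - v_pred) ** 2, axis=0))

v_true = np.zeros((4, 3))        # toy data: 4 realizations, 3 timesteps
v_pred = 2.0 * np.ones((4, 3))   # every prediction off by 2 mV
single_trace_rmse = time_averaged_rmse(v_true[0], v_pred[0])
rmse_t = rmse_vs_time(v_true, v_pred)
```

In the setting described above, the first array dimension would hold the 5000 realizations (50 currents times 100 initial conditions) and the second the timesteps of the 500 ms simulation.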

In conclusion, these results suggest that our deep LSTM neural network with a longer predictive horizon feature can predict the regular (periodic) spiking patterns exhibited by hippocampal CA1 pyramidal neurons with high accuracy over a long-time horizon.

#### 3.2. Irregular Bursting

In this section, we demonstrate the efficacy of our trained deep LSTM neural network over the range of external current between $0.79$ nA and $2.3$ nA in predicting the irregular bursting dynamics shown by the biophysiological Hodgkin-Huxley model of CA1 pyramidal neuron in response to the external current $I\in [0.79,2.3)$ nA. In this range, we observe firing rates between approximately 53 Hz and 164 Hz. For clarity, we here show our results only for the membrane potential traces. We provide the complete set of simulation results on the LSTM network performance in predicting the dynamics of all the 9 states of the Hodgkin-Huxley model in Appendix B.2 (see Figure A6, Figure A7, Figure A8, Figure A9, Figure A10).

Figure 9 shows a comparison of the membrane potential traces simulated using the Hodgkin-Huxley model and the 4 different predictive horizons of the LSTM network (i.e., ${N}_{p}=1,50,100,200$) for the external stimulating current $I=1.5$ nA. Note that all the simulations are performed using the initial condition used for $I=3.0$ nA in Figure 7. Since our LSTM network uses the initial sequence of outputs of appropriate prediction horizon (i.e., ${N}_{p}=1,50,100,200$) from the Hodgkin-Huxley model to make future time predictions, the LSTM network predictions (shown by dashed red line) start after $0.1$ ms, 5 ms, 10 ms, and 20 ms in Figure 9a–d, respectively.

As shown in Figure 9, the LSTM performance significantly improved in predicting the timing of the occurrence of spikes up to 100 ms with the increased predictive horizon of the LSTM network from ${N}_{p}=1$ to ${N}_{p}=200$, but the performance degraded in capturing the magnitude of the membrane potentials during spiking with an increased value of ${N}_{p}$. Although the time-averaged root mean squared error of the membrane potential traces between the Hodgkin-Huxley model and the LSTM network for ${N}_{p}=1,50,100,200$ showed an improved performance with the increased value of ${N}_{p}$ (see Figure 10a), none of the LSTM networks showed a reasonable prediction of the timing of the occurrence of spikes in this regime beyond 100 ms of the time-horizon.

To systematically evaluate whether the designed LSTM networks provide reasonable predictions of the membrane potential traces of the irregular bursting dynamics across the range of external input currents between $0.79$ nA and $2.3$ nA, we performed simulations for 50 random stimulating currents drawn from a uniform distribution $\mathcal{U}(0.79,2.3)$. For each stimulating current, we chose 100 initial conditions drawn randomly from the maximum and minimum range of the Hodgkin-Huxley state variables (note that the network was not trained over this wide range of initial conditions). Figure 10b shows the LSTM network performance, represented in terms of the root mean squared error vs time over 5000 realizations, for ${N}_{p}=1,50,100,200$. As shown in this figure, the root mean squared error decreased with the increased predictive horizon of the LSTM network for all time, which is consistent with the result shown in Figure 10a.

In conclusion, these results suggest that our deep LSTM neural network with a longer predictive horizon feature can predict the irregular bursting patterns exhibited by hippocampal CA1 pyramidal neurons with high accuracy over only a short-time horizon.

#### 3.3. Regular Bursting

In this section, we demonstrate the efficacy of our trained deep LSTM neural network over the range of external current between $0.24$ nA and $0.79$ nA in predicting the regular bursting dynamics shown by the biophysiological Hodgkin-Huxley model of CA1 pyramidal neuron in response to the external current $I\in [0.24,0.79)$ nA. In this range, we observe firing rates between approximately 8 Hz and 52 Hz. For clarity, we here show our results only for the membrane potential traces. We provide the complete set of simulation results on the LSTM network performance in predicting the dynamics of all the 9 states of the Hodgkin-Huxley model in Appendix B.3 (see Figure A11, Figure A12, Figure A13, Figure A14, Figure A15).

Figure 11 shows a comparison of the membrane potential traces simulated using the Hodgkin-Huxley model and the 4 different predictive horizons of the LSTM network (i.e., ${N}_{p}=1,50,100,200$) for the external stimulating current $I=0.5$ nA. Note that all the simulations are performed using the initial condition used for $I=3.0$ nA in Figure 7. Since our LSTM network uses the initial sequence of outputs of appropriate prediction horizon (i.e., ${N}_{p}=1,50,100,200$) from the Hodgkin-Huxley model to make future time predictions, the LSTM network predictions (shown by dashed red line) start after $0.1$ ms, 5 ms, 10 ms and 20 ms in Figure 11a–d, respectively.

By analyzing the results shown in Figure 11, we found that the LSTM network performance in predicting the timing of spikes during bursts as well as tracking the subthreshold potential improved significantly from ${N}_{p}=1$ to ${N}_{p}=200$, but the performance substantially degraded in capturing the magnitude of the membrane potentials during spiking. In conclusion, the 200 timesteps prediction horizon based LSTM network (see Figure 11d) predicts the temporal dynamics with reasonable accuracy over the first 300 ms of prediction.

Figure 12a shows the time-averaged root mean squared error of the membrane potential traces between the Hodgkin-Huxley model and the LSTM network for ${N}_{p}=1,50,100,200$. As noted in this figure, the root mean squared error decreased substantially between 100 timesteps and 200 timesteps prediction horizon compared to the regimes of regular spiking (Figure 8a) and irregular bursting (Figure 10a), which indicates that a longer predictive horizon based LSTM network is necessary to capture the two different timescales (i.e., short intraburst spiking intervals and long interburst subthreshold intervals) presented in these dynamics.

Figure 12b shows the performance of the LSTM networks, expressed as the root mean squared error versus time over 5000 realizations, for the ${N}_{p}=$ 1, 100, and 200 timestep prediction horizons. As shown in this figure, the root mean squared error decreased with increasing predictive horizon at all times, which is consistent with the result shown in Figure 12a. Note that we have excluded the simulation result for ${N}_{p}=50$ because our detailed analysis showed that the LSTM network trained for ${N}_{p}=50$ became unstable when predicting spiking responses for some of the initial conditions in this regime. A possible reason is that the network did not see these initial conditions during training.
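The two error measures used above can be sketched in a few lines. This is an illustrative reading, assuming that the root mean squared error versus time is taken across realizations at each time point and that the time-averaged value is its mean over time; the arrays below are synthetic placeholders, not data from the paper.

```python
import numpy as np


def rmse_vs_time(v_ref, v_pred):
    """RMSE at each time point, computed across realizations (axis 0)."""
    err = np.asarray(v_pred, dtype=float) - np.asarray(v_ref, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=0))


def time_averaged_rmse(v_ref, v_pred):
    """Mean of the RMSE-versus-time curve."""
    return float(np.mean(rmse_vs_time(v_ref, v_pred)))


# Synthetic placeholder: 5000 realizations, 300 time points each.
rng = np.random.default_rng(0)
v_ref = rng.normal(-65.0, 10.0, size=(5000, 300))
v_pred = v_ref + 2.0  # constant 2 mV offset, for illustration only

curve = rmse_vs_time(v_ref, v_pred)
print(curve.shape, time_averaged_rmse(v_ref, v_pred))
```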

## 4. Discussion

In this paper, we developed and presented a new data-driven long short-term memory (LSTM) based neural network (NN) architecture to predict the dynamical spiking patterns of single neurons. Compared to other LSTM-based NN architectures for forecasting dynamical systems behavior reported in the literature, our architecture incorporated a single dense feedforward output layer with an activation function and a reverse-order sequence-to-sequence mapping approach into traditional LSTM based neural networks to enable truly multi-timestep stable predictions of the dynamics over a long time-horizon. We demonstrated the efficacy of our architecture in predicting the multi-time scale dynamics of hippocampal CA1 pyramidal neurons and compared the predictions from our model with the ground truth synthetic data obtained from an experimentally validated biophysiological model of CA1 pyramidal neuron in the Hodgkin-Huxley formalism. Through simulations, we showed that (1) the presented architecture can learn multi-timescale dynamics; and (2) the predictive accuracy of the network increases with the increase in the predictive horizon of the LSTM network.

Our results for the irregular bursting regime showed the limitation of the designed deep LSTM neural network architecture in accurately predicting the timing of spikes over a long time-horizon compared to the regular spiking and regular bursting regimes. A possible reason for this may be the architecture itself or the dataset used for training these networks, which requires further investigation by training the networks on a dataset explicitly generated from this regime. In addition, it has been shown that this regime of bursting exhibits chaotic dynamics [46], which may further explain the network’s struggle to accurately predict this bursting behavior, as the system exhibits a high sensitivity to the initial conditions. Hybrid approaches that combine LSTM networks with mean stochastic models (MSM) have been explored for predictively modeling chaotic dynamical systems in Reference [24]. However, this LSTM-MSM approach is limited to iterative single step prediction, and the application of this technique falls beyond the scope of this paper.

Another limitation of our presented approach in modeling neuronal dynamics as currently constructed is the inclusion of the full state vector in both training and predictive evaluation. In experiments, it may be infeasible to measure the entire state vector of the neuron at any given time. This should provide a valuable direction for future research, as partially observed systems or neuronal spike train recordings are much more feasible to measure in vivo or in vitro and merit further consideration in combination with this approach.

In all dynamical regimes, our results showed that the performance of the deep LSTM neural network in predicting the amplitude of the membrane potential during spikes degraded as the predictive horizon of the LSTM network increased. This issue may be related to the equally weighted norm-2 loss function used for training the networks. A further investigation is required by considering different loss functions, such as norm-1 or weighted norm-2, which we consider as our future work.
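The loss functions mentioned above can be compared on a toy example. This is a minimal sketch, not the training code used in the paper; the spike threshold of −20 mV and the weight of 10 are hypothetical choices made only to show how a weighted norm-2 loss emphasizes errors during spikes.

```python
import numpy as np


def l2_loss(y_true, y_pred):
    """Equally weighted norm-2 (mean squared error) loss."""
    return float(np.mean((y_pred - y_true) ** 2))


def l1_loss(y_true, y_pred):
    """Norm-1 (mean absolute error) loss."""
    return float(np.mean(np.abs(y_pred - y_true)))


def weighted_l2_loss(y_true, y_pred, weights):
    """Norm-2 loss with per-sample weights, normalized by the total weight."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * (y_pred - y_true) ** 2) / np.sum(w))


# Toy membrane potential trace (mV): one spike sample among subthreshold samples.
y_true = np.array([-70.0, -70.0, 30.0, -70.0])
y_pred = np.array([-70.0, -70.0, 10.0, -70.0])  # spike amplitude underestimated

# Hypothetical weighting: samples above -20 mV count 10 times more.
weights = np.where(y_true > -20.0, 10.0, 1.0)

print(l2_loss(y_true, y_pred), l1_loss(y_true, y_pred),
      weighted_l2_loss(y_true, y_pred, weights))
```

In this toy case the weighted loss penalizes the missed spike amplitude more strongly than the equally weighted one, which is the effect such a weighting is meant to produce.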

In addition, we make a note on the computational requirement of our presented approach. The computational cost of inference with an artificial neural network can effectively be boiled down to the number of multiplications and additions needed to complete a forward pass of information. The inference complexity for an LSTM is roughly $\mathcal{O}({d}_{i}\cdot d\cdot h+d\cdot {h}^{2})$, as described in Reference [24], where ${d}_{i}$ is the dimensionality of each input, d is the number of inputs, and h is the dimensionality of the hidden states. Using this, we estimated the inference complexity of our LSTM network, represented by $\mathcal{O}\left(I\right)$, for the prediction horizons of 1, 50, 100, and 200. Using a single Nvidia GTX 1080Ti, we have also calculated the average computation time required for each of the predictive horizons used to predict 500 ms of state values from 1000 examples. We report these values in Table 1.
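The complexity estimate above is easy to evaluate numerically. The sketch below is illustrative only: it assumes that the number of inputs d equals the prediction horizon, that each input has 10 components (9 states plus the applied current), and that the hidden state has 64 units. The last two values are hypothetical placeholders and are not taken from the paper.

```python
def lstm_inference_cost(d_i, d, h):
    """Operation-count estimate d_i*d*h + d*h^2 for one LSTM forward pass."""
    return d_i * d * h + d * h ** 2


D_I = 10  # assumed input dimensionality: 9 states + 1 applied current
H = 64    # hypothetical hidden-state size

for n_p in (1, 50, 100, 200):
    print(n_p, lstm_inference_cost(D_I, n_p, H))
```

Under these assumptions the estimate grows linearly with the prediction horizon, since both terms are proportional to d.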

Although the data-driven approach developed in this paper showed the ability of the designed LSTM-based neural network in learning multi-timescale dynamics, we note that the network struggles to accurately capture the dynamics of some state variables where the magnitude of the state variable is comparable to the numerical precision of our simulations. This can particularly be seen in Figure A2, Figure A3, Figure A4 and Figure A14, where the network is not able to reconstruct the dynamics of the state variable ${q}_{sAHP}$ with a reasonable accuracy. One possible way to alleviate this issue may be to increase the tolerance of the numerical errors in simulations, which may increase the overall computational cost during training.

In conclusion, our results showed that a longer predictive horizon-based LSTM network can provide a more accurate prediction of multi-time scale dynamics, but at the expense of extensive offline training cost.

## Supplementary Materials

Codes and supplementary materials can be found at https://webpages.uidaho.edu/gkumar/Research/publications.html

## Author Contributions

Conceptualization, B.P. and G.K.; methodology, B.P. and G.K.; formal analysis, B.P. and G.K.; investigation, B.P.; resources, B.P. and G.K.; writing—original draft preparation, B.P.; writing—review and editing, G.K.; visualization, B.P. and G.K.; supervision, G.K.; project administration, G.K.; funding acquisition, G.K.

## Funding

We gratefully acknowledge the startup funding support from the University of Idaho College of Engineering and Department of Chemical and Materials Engineering.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix A. Hodgkin-Huxley Model of CA1 Pyramidal Neuron Dynamics

We used the following Hodgkin-Huxley model of CA1 pyramidal neuron from [32] to demonstrate the efficacy of our data-driven modeling approach presented in this paper:

$$C\frac{dV}{dt}=-{g}_{L}(V-{V}_{L})-{I}_{Na}-{I}_{NaP}-{I}_{Kdr}-{I}_{A}-{I}_{M}-{I}_{Ca}-{I}_{C}-{I}_{sAHP}+{I}_{app},$$

where the ionic currents ${I}_{Na}$, ${I}_{NaP}$, ${I}_{Kdr}$, ${I}_{A}$, ${I}_{M}$, ${I}_{sAHP}$, ${I}_{C}$, and ${I}_{Ca}$ are given by

$${I}_{Na}={g}_{Na}{m}_{\infty}^{3}\left(V\right){h}_{Na}(V-{V}_{Na}),$$

$${I}_{NaP}={g}_{NaP}{p}_{\infty}\left(V\right)(V-{V}_{Na}),$$

$${I}_{Kdr}={g}_{Kdr}{n}_{Kdr}^{4}(V-{V}_{K}),$$

$${I}_{A}={g}_{A}{a}_{\infty}^{3}\left(V\right){b}_{Kdr}(V-{V}_{K}),$$

$${I}_{M}={g}_{M}{z}_{M}(V-{V}_{K}),$$

$${I}_{Ca}={g}_{Ca}{r}_{Ca}^{2}(V-{V}_{Ca}),$$

$${I}_{C}={g}_{C}{d}_{\infty}\left({\left[C{a}^{2+}\right]}_{i}\right){c}_{C}(V-{V}_{K}),$$

$${I}_{sAHP}={g}_{sAHP}{q}_{sAHP}(V-{V}_{K}).$$

Here, V is the membrane potential in mV, C is the membrane capacitance, ${V}_{L}$ is the reversal potential of the leak current, ${g}_{L}$ is the conductance of the leak current, and ${I}_{app}$ is the externally applied stimulating current. The ionic currents ${I}_{Na}$, ${I}_{NaP}$, ${I}_{Kdr}$, ${I}_{A}$, ${I}_{M}$, ${I}_{sAHP}$, ${I}_{C}$, and ${I}_{Ca}$ represent the transient sodium current, persistent sodium current, delayed rectifier potassium current, A-type potassium current, muscarinic-sensitive potassium current, slow calcium-activated potassium current, fast calcium-activated potassium current, and high threshold calcium current, respectively. ${g}_{i}$, $i\in \{Na,NaP,Kdr,A,M,Ca,C,sAHP\}$ represents the conductance of the ion channel i. ${V}_{i}$, $i\in \{Na,K,Ca\}$ is the reversal potential of the ion channel i.

The dynamics of the transient activation/deactivation variables of the ionic and calcium currents, i.e., ${h}_{Na}$, ${n}_{Kdr}$, ${b}_{Kdr}$, ${z}_{M}$, ${r}_{Ca}$, ${c}_{C}$, ${q}_{sAHP}$, and ${\left[C{a}^{2+}\right]}_{i}$, are given by:

$$\frac{d{h}_{Na}}{dt}=\varphi \frac{{h}_{\infty}\left(V\right)-{h}_{Na}}{{\tau}_{{h}_{Na}}\left(V\right)},$$

$$\frac{d{n}_{Kdr}}{dt}=\varphi \frac{{n}_{\infty}\left(V\right)-{n}_{Kdr}}{{\tau}_{{n}_{Kdr}}\left(V\right)},$$

$$\frac{d{b}_{Kdr}}{dt}=\frac{{b}_{\infty}\left(V\right)-{b}_{Kdr}}{{\tau}_{{b}_{Kdr}}},$$

$$\frac{d{z}_{M}}{dt}=\frac{{z}_{\infty}\left(V\right)-{z}_{M}}{{\tau}_{z}},$$

$$\frac{d{r}_{Ca}}{dt}=\frac{{r}_{\infty}\left(V\right)-{r}_{Ca}}{{\tau}_{{r}_{Ca}}},$$

$$\frac{d{c}_{C}}{dt}=\frac{{c}_{\infty}\left(V\right)-{c}_{C}}{{\tau}_{{c}_{C}}},$$

$$\frac{d{q}_{sAHP}}{dt}=\frac{{q}_{\infty}\left({\left[C{a}^{2+}\right]}_{i}\right)-{q}_{sAHP}}{{\tau}_{{q}_{sAHP}}},$$

$$\frac{d{\left[C{a}^{2+}\right]}_{i}}{dt}=-\nu {I}_{Ca}-\frac{{\left[C{a}^{2+}\right]}_{i}}{{\tau}_{Ca}}.$$

Here, ${m}_{\infty}\left(V\right)$, ${h}_{\infty}\left(V\right)$, ${n}_{\infty}\left(V\right)$, ${p}_{\infty}\left(V\right)$, ${a}_{\infty}\left(V\right)$, ${b}_{\infty}\left(V\right)$, ${z}_{\infty}\left(V\right)$, ${r}_{\infty}\left(V\right)$, ${c}_{\infty}\left(V\right)$, ${q}_{\infty}\left({\left[C{a}^{2+}\right]}_{i}\right)$, and ${d}_{\infty}\left({\left[C{a}^{2+}\right]}_{i}\right)$ are the steady-state activation/deactivation functions. $\varphi $ is a scaling parameter. ${\tau}_{{h}_{Na}}\left(V\right)$, ${\tau}_{{n}_{Kdr}}\left(V\right)$, ${\tau}_{{b}_{Kdr}}$, ${\tau}_{z}$, ${\tau}_{{r}_{Ca}}$, ${\tau}_{{c}_{C}}$, ${\tau}_{{q}_{sAHP}}$, and ${\tau}_{Ca}$ are the time constants. The steady-state activation/deactivation functions are given by:

$${m}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{m})/{\sigma}_{m}}},$$

$${n}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{n})/{\sigma}_{n}}},$$

$${h}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{h})/{\sigma}_{h}}},$$

$${p}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{p})/{\sigma}_{p}}},$$

$${b}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{b})/{\sigma}_{b}}},$$

$${z}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{z})/{\sigma}_{z}}},$$

$${a}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{a})/{\sigma}_{a}}},$$

$${r}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{r})/{\sigma}_{r}}},$$

$${c}_{\infty}\left(V\right)=\frac{1}{1+{e}^{-(V-{\theta}_{c})/{\sigma}_{c}}},$$

$${d}_{\infty}\left({\left[C{a}^{2+}\right]}_{i}\right)=\frac{1}{(1+{a}_{c}/{\left[C{a}^{2+}\right]}_{i})},$$

$${q}_{\infty}\left({\left[C{a}^{2+}\right]}_{i}\right)=\frac{1}{1+({a}_{q}^{4}/{\left[C{a}^{2+}\right]}_{i}^{4})}.$$

Here, ${a}_{c}$, ${a}_{q}$, ${\theta}_{i}$, ${\sigma}_{i}$ for $i\in \{m,n,h,p,b,z,a,r,c\}$ are the model parameters. The voltage dependent time constants ${\tau}_{{h}_{Na}}\left(V\right)$ and ${\tau}_{{n}_{Kdr}}\left(V\right)$ are given by

$${\tau}_{{h}_{Na}}\left(V\right)=1+\frac{7.5}{1+{e}^{-(V-{\theta}_{ht})/{\sigma}_{ht}}},$$

$${\tau}_{{n}_{Kdr}}\left(V\right)=1+\frac{5}{1+{e}^{-(V-{\theta}_{nt})/{\sigma}_{nt}}},$$

where ${\theta}_{ht}$, ${\theta}_{nt}$, ${\sigma}_{ht}$, and ${\sigma}_{nt}$ are model parameters.

Throughout this paper, we used the following numerical values for the model parameters [32]: $C=1$ $\mathsf{\mu}$F/cm${}^{2}$, ${g}_{L}=0.05$ mS/cm${}^{2}$, ${V}_{L}=-70$ mV, $\nu =0.13$ cm${}^{2}$/(ms × $\mathsf{\mu}$A), ${g}_{Na}$ = 35 mS/cm${}^{2}$, ${V}_{Na}$ = 55 mV, ${g}_{NaP}$ = 0.4 mS/cm${}^{2}$, ${g}_{Kdr}$ = 6.0 mS/cm${}^{2}$, ${V}_{K}$ = −90 mV, ${g}_{A}$ = 1.4 mS/cm${}^{2}$, ${g}_{M}$ = 0.5 mS/cm${}^{2}$, ${g}_{Ca}$ = 0.08 mS/cm${}^{2}$, ${g}_{C}$ = 10 mS/cm${}^{2}$, ${V}_{Ca}$ = 120 mV, ${g}_{sAHP}$ = 5 mS/cm${}^{2}$, ${\theta}_{m}$ = −30 mV, ${\sigma}_{m}$ = 9.5 mV, ${\theta}_{h}$ = −45 mV, ${\sigma}_{h}$ = −7 mV, ${\theta}_{ht}$ = −40.5 mV, ${\sigma}_{ht}$ = −6 mV, $\varphi $ = 10, ${\theta}_{p}$ = −47 mV, ${\sigma}_{p}$ = 3 mV, ${\theta}_{n}$ = −35 mV, ${\sigma}_{n}$ = 10 mV, ${\theta}_{nt}$ = −27 mV, ${\sigma}_{nt}$ = −15 mV, ${\theta}_{a}$ = −50 mV, ${\sigma}_{a}$ = 20 mV, ${\theta}_{b}$ = −80 mV, ${\sigma}_{b}$ = −6 mV, ${\theta}_{z}$ = −39 mV, ${\sigma}_{z}$ = 5 mV, ${\theta}_{r}$ = −20 mV, ${\sigma}_{r}$ = 10 mV, ${\tau}_{r}$ = 1 ms, ${\theta}_{c}$ = −30 mV, ${\sigma}_{c}$ = 7 mV, ${\tau}_{c}$ = 2 ms, ${a}_{c}$ = 6, ${\tau}_{q}$ = 450 ms, and ${a}_{q}$ = 2.
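As an illustration of how the steady-state functions and voltage-dependent time constants of this appendix can be evaluated with the parameter values listed above, a minimal NumPy sketch is given below. The function and dictionary names are hypothetical, and the sketch covers only the gating functions, not the full model.

```python
import numpy as np

# (theta, sigma) pairs in mV, copied from the parameter list above.
GATES = {
    "m": (-30.0, 9.5), "h": (-45.0, -7.0), "p": (-47.0, 3.0),
    "n": (-35.0, 10.0), "a": (-50.0, 20.0), "b": (-80.0, -6.0),
    "z": (-39.0, 5.0), "r": (-20.0, 10.0), "c": (-30.0, 7.0),
}


def x_inf(gate, v):
    """Voltage-dependent steady-state function 1 / (1 + exp(-(V - theta) / sigma))."""
    theta, sigma = GATES[gate]
    return 1.0 / (1.0 + np.exp(-(v - theta) / sigma))


def tau_h(v, theta=-40.5, sigma=-6.0):
    """Time constant of h_Na (ms)."""
    return 1.0 + 7.5 / (1.0 + np.exp(-(v - theta) / sigma))


def tau_n(v, theta=-27.0, sigma=-15.0):
    """Time constant of n_Kdr (ms)."""
    return 1.0 + 5.0 / (1.0 + np.exp(-(v - theta) / sigma))


def d_inf(ca, a_c=6.0):
    """Calcium-dependent steady-state function of the fast K(Ca) current."""
    return 1.0 / (1.0 + a_c / ca)


def q_inf(ca, a_q=2.0):
    """Calcium-dependent steady-state function of the slow AHP current."""
    return 1.0 / (1.0 + a_q ** 4 / ca ** 4)


v = np.linspace(-100.0, 50.0, 7)
print(x_inf("m", v))
print(tau_h(v))
```

A negative slope parameter, as for h and b, makes the corresponding function decrease with depolarization, which is the expected behavior of an inactivation variable.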

Unless otherwise stated, we used the following initial conditions to simulate the Hodgkin-Huxley model for generating the synthetic data: ${V}_{0}$ = −71.81327 mV, ${h}_{Na0}=0.98786$, ${n}_{Kdr0}=0.02457$, ${b}_{KA0}=0.203517$, ${u}_{KM0}=0.00141$, ${r}_{Ca0}=0.005507$, ${\left[Ca\right]}_{i0}=0.000787$, ${c}_{C0}=0.002486$, ${q}_{Ca0}=0.0$.
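As a quick consistency check, a few of the ionic currents can be evaluated at these initial conditions. The sketch below is illustrative only: it uses the parameter values of this appendix and only the initial values of V, ${h}_{Na}$, and ${n}_{Kdr}$; the function names are hypothetical, and currents are in $\mathsf{\mu}$A/cm${}^{2}$ under the stated units.

```python
import math


def sigmoid_gate(v, theta, sigma):
    """Steady-state function 1 / (1 + exp(-(V - theta) / sigma))."""
    return 1.0 / (1.0 + math.exp(-(v - theta) / sigma))


# Initial conditions and parameters copied from this appendix.
V0, H_NA0, N_KDR0 = -71.81327, 0.98786, 0.02457
G_L, V_L = 0.05, -70.0
G_NA, V_NA = 35.0, 55.0
G_NAP = 0.4
G_KDR, V_K = 6.0, -90.0


def initial_currents(v=V0, h_na=H_NA0, n_kdr=N_KDR0):
    """Leak, transient Na, persistent Na, and delayed rectifier K currents."""
    m_inf = sigmoid_gate(v, -30.0, 9.5)
    p_inf = sigmoid_gate(v, -47.0, 3.0)
    return {
        "I_L": G_L * (v - V_L),
        "I_Na": G_NA * m_inf ** 3 * h_na * (v - V_NA),
        "I_NaP": G_NAP * p_inf * (v - V_NA),
        "I_Kdr": G_KDR * n_kdr ** 4 * (v - V_K),
    }


currents = initial_currents()
for name, value in currents.items():
    print(name, value)
```

At this hyperpolarized potential the sodium currents are inward (negative) and the delayed rectifier current is outward (positive), and all are small in magnitude.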

## Appendix B. Simulation Results on Full State Predictions of Hodgkin-Huxley Model

In Section 3, we showed our simulation results only for the membrane potential traces. Here, we provide the simulation results for all the 9 states of the Hodgkin-Huxley model of CA1 pyramidal neuron (HHCA1Py) predicted by the deep LSTM neural network over a long-time horizon and show the comparison between these predictions and the simulated dynamics from HHCA1Py.

#### Appendix B.1. Regular Spiking

In this section, we show the simulation results on predicting the dynamics of all the 9 states of HHCA1Py over a long-time horizon using the deep LSTM neural network for the regular periodic spiking regime ($I\in [2.3,3.0]$ nA). Figure A1, Figure A2, Figure A3, and Figure A4 show the comparison between the states’ dynamics simulated using the Hodgkin-Huxley model and the deep LSTM neural network model developed for 1 timestep, 50 timesteps, 100 timesteps, and 200 timesteps (equivalently, ${N}_{p}=1,50,100,200$) predictive horizon, respectively.

As shown in these figures, the performance of the deep LSTM neural network model in predicting state dynamics significantly improved with the increased predictive horizon of the LSTM network (i.e., ${N}_{p}=1$ to ${N}_{p}=200$) for all the states except ${q}_{sAHP}$ for which we found that the magnitude is comparable to the numerical precision of the performed simulations. Figure A5 shows the root mean squared error between the states of HHCA1Py and the deep LSTM neural network model as a function of simulation time over 5000 random realizations, for ${N}_{p}=1,50,100,200$. These results show that the root mean squared error decreases from ${N}_{p}=1$ to ${N}_{p}=200$.

#### Appendix B.2. Irregular Bursting

In this section, we show the simulation results on predicting the dynamics of all the 9 states of HHCA1Py over a long-time horizon using the deep LSTM neural network for the irregular bursting regime ($I\in [0.79,2.3)$ nA). Figure A6, Figure A7, Figure A8, and Figure A9 show the comparison between the states’ dynamics simulated using the Hodgkin-Huxley model and the deep LSTM neural network model developed for 1 timestep, 50 timesteps, 100 timesteps, and 200 timesteps (equivalently, ${N}_{p}=1,50,100,200$) predictive horizon, respectively.

As shown in these figures, the deep LSTM neural network model provides a reasonable prediction of the dynamics of all the states except ${q}_{sAHP}$ over the initial 100 ms of simulations. Moreover, the prediction improved from ${N}_{p}=1$ to ${N}_{p}=200$, which is consistent with the results for the regular spiking regime (see Figure A1, Figure A2, Figure A3, Figure A4, Figure A5). We found that the magnitude of ${q}_{sAHP}$ was comparable to the numerical precision of our simulations, which hindered the capability of the LSTM network in making a reasonable prediction for this state.

Figure A10 shows the root mean squared error between the states of HHCA1Py and the deep LSTM neural network model as a function of simulation time over 5000 random realizations, for ${N}_{p}=1,50,100,200$. As shown here, the root mean squared error decreased with the increased predictive horizon of the LSTM network (i.e., ${N}_{p}=1$ to ${N}_{p}=200$).

**Figure A1.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 1 timestep predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=3.0$ nA.

**Figure A2.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 50 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=3.0$ nA.

**Figure A3.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 100 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=3.0$ nA.

**Figure A4.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 200 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=3.0$ nA.

**Figure A5.** The root mean squared error (RMSE) versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 randomly selected stimulating currents from a Uniform distribution $\mathcal{U}(2.3,3.0)$ and 100 random initial conditions for each stimulating current.

**Figure A6.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 1 timestep predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=1.5$ nA.

**Figure A7.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 50 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=1.5$ nA.

**Figure A8.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 100 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=1.5$ nA.

**Figure A9.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 200 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=1.5$ nA.

**Figure A10.** The root mean squared error (RMSE) versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 randomly selected stimulating currents from a Uniform distribution $\mathcal{U}(0.79,2.3)$ and 100 random initial conditions for each stimulating current.

#### Appendix B.3. Regular Bursting

In this section, we show the simulation results on predicting the dynamics of all the 9 states of HHCA1Py over a long-time horizon using the deep LSTM neural network for the regular bursting regime ($I\in [0.24,0.79)$ nA). Figure A11, Figure A12, Figure A13, and Figure A14 show the comparison between the states’ dynamics simulated using the Hodgkin-Huxley model and the deep LSTM neural network model developed for 1 timestep, 50 timesteps, 100 timesteps, and 200 timesteps (equivalently, ${N}_{p}=1,50,100,200$) predictive horizon, respectively.

As shown in these figures, the performance of the deep LSTM neural network model in predicting state dynamics significantly improved between the 1 timestep predictive horizon (Figure A11) and the 200 timesteps predictive horizon (Figure A14) across all the states except ${q}_{sAHP}$, for the same reason we provided for the regular spiking and irregular bursting regimes. More importantly, the LSTM network predicted the temporal correlations with high accuracy over the time-horizon of 300 ms for ${N}_{p}=200$. Extrapolating these results suggests that increasing the predictive horizon beyond ${N}_{p}=200$ could extend accurate prediction beyond the 300 ms time-horizon.

In Figure A15, we show the root mean squared error between the states of HHCA1Py and the deep LSTM neural network model as a function of simulation time over 5000 random realizations, for ${N}_{p}=1,50,100,200$. As shown here, the root mean squared error decreased with the increased predictive horizon of the LSTM network (i.e., ${N}_{p}=1$ to ${N}_{p}=200$), which is consistent with the results of the regular spiking and irregular bursting regimes.

**Figure A11.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 1 timestep predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=0.5$ nA.

**Figure A12.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 50 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=0.5$ nA.

**Figure A13.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 100 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=0.5$ nA.

**Figure A14.** Comparison between the Hodgkin-Huxley model (“HH Model”) states’ dynamics and the iterative predictions of states’ dynamics using the 200 timesteps predictive horizon-based deep LSTM neural network (“LSTM Network”) in response to $I=0.5$ nA.

**Figure A15.** The root mean squared error (RMSE) versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 randomly selected stimulating currents from a Uniform distribution $\mathcal{U}(0.24,0.79)$ and 100 random initial conditions for each stimulating current.

## References

1. Salmelin, R.; Hari, R.; Lounasmaa, O.; Sams, M. Dynamics of brain activation during picture naming. Nature **1994**, 368, 463.
2. Fox, M.D.; Snyder, A.Z.; Vincent, J.L.; Corbetta, M.; Van Essen, D.C.; Raichle, M.E. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. USA **2005**, 102, 9673–9678.
3. Kiebel, S.J.; Daunizeau, J.; Friston, K.J. A hierarchy of time-scales and the brain. PLoS Comput. Biol. **2008**, 4, e1000209.
4. Gerstner, W.; Kistler, W.M.; Naud, R.; Paninski, L. Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition; Cambridge University Press: Cambridge, UK, 2014.
5. Siettos, C.; Starke, J. Multiscale modeling of brain dynamics: From single neurons and networks to mathematical tools. Wiley Interdiscip. Rev. Syst. Biol. Med. **2016**, 8, 438–458.
6. Breakspear, M. Dynamic models of large-scale brain activity. Nat. Neurosci. **2017**, 20, 340.
7. Herz, A.V.; Gollisch, T.; Machens, C.K.; Jaeger, D. Modeling single-neuron dynamics and computations: A balance of detail and abstraction. Science **2006**, 314, 80–85.
8. Gerstner, W.; Naud, R. How good are neuron models? Science **2009**, 326, 379–380.
9. Chen, S.; Billings, S. Neural networks for nonlinear dynamic system modelling and identification. Int. J. Control **1992**, 56, 319–346.
10. Purwar, S.; Kar, I.; Jha, A. Nonlinear system identification using neural networks. IETE J. Res. **2007**, 53, 35–42.
11. Kuschewski, J.G.; Hui, S.; Zak, S.H. Application of feedforward neural networks to dynamical system identification and control. IEEE Trans. Control Syst. Technol. **1993**, 1, 37–49.
12. Pan, S.; Duraisamy, K. Long-time predictive modeling of nonlinear dynamical systems using neural networks. Complexity **2018**, 2018, 4801012.
13. Gupta, P.; Sinha, N.K. Modeling robot dynamics using dynamic neural networks. IFAC Proc. Vol. **1997**, 30, 755–759.
14. Patra, J.C.; Pal, R.N.; Chatterji, B.; Panda, G. Identification of nonlinear dynamic systems using functional link artificial neural networks. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) **1999**, 29, 254–262.
15. Nagabandi, A.; Kahn, G.; Fearing, R.S.; Levine, S. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7559–7566.
16. Jaeger, H.; Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science **2004**, 304, 78–80.
17. Bailer-Jones, C.A.; MacKay, D.J.; Withers, P.J. A recurrent neural network for modelling dynamical systems. Netw. Comput. Neural Syst. **1998**, 9, 531–547.
18. Lenz, I.; Knepper, R.A.; Saxena, A. DeepMPC: Learning deep latent features for model predictive control. In Proceedings of the Robotics: Science and Systems, Rome, Italy, 13–17 July 2015.
19. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318.
20. Mohajerin, N.; Waslander, S.L. Multistep Prediction of Dynamic Systems With Recurrent Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. **2019**.
21. Lin, L.; Gong, S.; Li, T.; Peeta, S. Deep learning-based human-driven vehicle trajectory prediction and its application for platoon control of connected and autonomous vehicles. In Proceedings of the Autonomous Vehicles Symposium, San Francisco, CA, USA, 9–12 July 2018; Volume 2018.
22. Gonzalez, J.; Yu, W. Non-linear system modeling using LSTM neural networks. IFAC-PapersOnLine **2018**, 51, 485–489.
23. Wang, Y. A new concept using LSTM neural networks for dynamic system identification. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 5324–5329.
24. Vlachas, P.R.; Byeon, W.; Wan, Z.Y.; Sapsis, T.P.; Koumoutsakos, P. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A Math. Phys. Eng. Sci. **2018**, 474, 20170844.
25. Zenke, F.; Ganguli, S. Superspike: Supervised learning in multilayer spiking neural networks. Neural Comput. **2018**, 30, 1514–1541.
26. Huh, D.; Sejnowski, T.J. Gradient descent for spiking neural networks. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 1433–1443.
27. Pandarinath, C.; O’Shea, D.J.; Collins, J.; Jozefowicz, R.; Stavisky, S.D.; Kao, J.C.; Trautmann, E.M.; Kaufman, M.T.; Ryu, S.I.; Hochberg, L.R.; et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat. Methods **2018**, 15, 805–815.
28. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
29. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
30. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
31. McKiernan, E.C.; Marrone, D.F. CA1 pyramidal cells have diverse biophysical properties, affected by development, experience, and aging. PeerJ **2017**, 5, e3836.
32. Golomb, D.; Yue, C.; Yaari, Y. Contribution of persistent Na+ current and M-type K+ current to somatic bursting in CA1 pyramidal cells: Combined experimental and modeling study. J. Neurophysiol. **2006**, 96, 1912–1926.
33. Nowacki, J.; Osinga, H.M.; Brown, J.T.; Randall, A.D.; Tsaneva-Atanasova, K. A unified model of CA1/3 pyramidal cells: An investigation into excitability. Prog. Biophys. Mol. Biol. **2011**, 105, 34–48.
34. Ferguson, K.A.; Huh, C.Y.; Amilhon, B.; Williams, S.; Skinner, F.K. Simple, biologically-constrained CA1 pyramidal cell models using an intact, whole hippocampus context. F1000Research **2014**, 3, 104.
35. Poirazi, P.; Brannon, T.; Mel, B.W. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron **2003**, 37, 977–987.
36. Royeck, M.; Horstmann, M.T.; Remy, S.; Reitze, M.; Yaari, Y.; Beck, H. Role of axonal NaV1.6 sodium channels in action potential initiation of CA1 pyramidal neurons. J. Neurophysiol. **2008**, 100, 2361–2380.
37. Katz, Y.; Menon, V.; Nicholson, D.A.; Geinisman, Y.; Kath, W.L.; Spruston, N. Synapse distribution suggests a two-stage model of dendritic integration in CA1 pyramidal neurons. Neuron **2009**, 63, 171–177.
38. Bianchi, D.; Marasco, A.; Limongiello, A.; Marchetti, C.; Marie, H.; Tirozzi, B.; Migliore, M. On the mechanisms underlying the depolarization block in the spiking dynamics of CA1 pyramidal neurons. J. Comput. Neurosci. **2012**, 33, 207–225.
39. Marasco, A.; Limongiello, A.; Migliore, M. Fast and accurate low-dimensional reduction of biophysically detailed neuron models. Sci. Rep. **2012**, 2, 1–7.
40. Kim, Y.; Hsu, C.L.; Cembrowski, M.S.; Mensh, B.D.; Spruston, N. Dendritic sodium spikes are required for long-term potentiation at distal synapses on hippocampal pyramidal neurons. Elife **2015**, 4, e06414.
41. Bezaire, M.J.; Raikov, I.; Burk, K.; Vyas, D.; Soltesz, I. Interneuronal mechanisms of hippocampal theta oscillations in a full-scale model of the rodent CA1 circuit. Elife **2016**, 5, e18566.
42. Werbos, P.J. Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. **1988**, 1, 339–356.
43. Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE **1990**, 78, 1550–1560.
44. Mozer, M.C. A focused backpropagation algorithm for temporal. In Backpropagation: Theory, Architectures, and Applications; Lawrence Erlbaum Associates: Hillsdale, NJ, USA; Hove, UK, 1995; p. 137.
45. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
46. Naud, R.; Marcille, N.; Clopath, C.; Gerstner, W. Firing patterns in the adaptive exponential integrate-and-fire model. Biol. Cybern. **2008**, 99, 335.

**Figure 1.**A schematic illustrating the overall data-driven approach developed in this paper for multi-timestep predictions of high-dimensional dynamical systems’ behavior over a long time-horizon. An initial sequence of states and inputs are fed to the ”Stacked LSTM Network” in a reverse-order for multi-timestep prediction of the system’s states (”Reverse-order sequence-to-sequence mapping”). The predicted output from each stacked LSTM network is concatenated with the next sequence of inputs and fed into the next stacked LSTM network in a reverse-order to increase the predictive horizon. This process is iterated an arbitrary number of times, creating long dynamical predictions.

**Figure 2.** A schematic illustrating the internal gating operation in a single LSTM cell. The “+” represents an additive operation and the “∘” represents a multiplicative operation. ${\sigma}_{g}$ is the sigmoidal activation function and ${\sigma}_{c}$ is the hyperbolic tangent activation function.
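
The gating in Figure 2 can be sketched in a few lines of NumPy. This is a minimal illustration of one step of a standard LSTM cell, not the authors' implementation: the function name `lstm_step`, the stacked weight layout (`W`, `U`, `b`), the gate ordering, and the toy dimensions are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    # sigma_g in Figure 2
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (assumed layout, for illustration).

    W: (4*n_h, n_x), U: (4*n_h, n_h), b: (4*n_h,)
    Assumed gate order in the stacked arrays: forget, input, output, candidate.
    """
    n_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:n_h])                # forget gate
    i = sigmoid(z[n_h:2 * n_h])          # input gate
    o = sigmoid(z[2 * n_h:3 * n_h])      # output gate
    c_tilde = np.tanh(z[3 * n_h:])       # candidate cell state (sigma_c)
    c = f * c_prev + i * c_tilde         # the multiplicative and additive operations
    h = o * np.tanh(c)                   # hidden state passed to the next step
    return h, c

# Toy dimensions and random weights, chosen only to exercise the function.
rng = np.random.default_rng(0)
n_x, n_h = 3, 4
W = rng.normal(size=(4 * n_h, n_x))
U = rng.normal(size=(4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, U, b)
```

Because the hidden state is a sigmoid-gated hyperbolic tangent, each component of `h` stays within $[-1, 1]$ regardless of the weights.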

**Figure 3.** Iterative prediction of the system’s outputs over a long time-horizon. Each “Deep LSTM” receives, in reverse order, the predicted sequence of outputs from the previous “Deep LSTM” and a sequence of new system inputs of equal length, and predicts the next sequence of outputs of the same duration in the future.
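
The iteration described in Figures 1 and 3 can be sketched as a simple loop. In the sketch below, `predict_block` is a hypothetical stand-in for a trained deep LSTM, and `toy_block` is placeholder dynamics used only to make the example runnable; the array shapes (9 states, 1 input, ${N}_{p}=50$) are assumptions chosen to mirror the setting of the paper, not values taken from the authors' code.

```python
import numpy as np

def rollout(predict_block, init_states, inputs, n_iter):
    """Chain n_iter multi-timestep predictions.

    init_states:   (n_p, n_s) initial state sequence, forward time order.
    inputs:        (n_iter * n_p, n_u) known inputs, forward time order.
    predict_block: hypothetical stand-in for a trained deep LSTM; it maps a
                   reversed (n_p, n_s + n_u) sequence to the next (n_p, n_s)
                   states in forward time order.
    """
    n_p = init_states.shape[0]
    states, predicted = init_states, []
    for k in range(n_iter):
        u = inputs[k * n_p:(k + 1) * n_p]
        # Concatenate states with inputs, then reverse along the time axis.
        block_in = np.concatenate([states, u], axis=1)[::-1]
        states = predict_block(block_in)
        predicted.append(states)
    return np.concatenate(predicted, axis=0)

n_s, n_u, n_p, n_iter = 9, 1, 50, 100

def toy_block(seq):
    # Placeholder dynamics only: undo the reversal and decay the states.
    return 0.9 * seq[::-1, :n_s]

trajectory = rollout(toy_block, np.ones((n_p, n_s)),
                     np.zeros((n_iter * n_p, n_u)), n_iter)
```

With ${N}_{p}=50$ and 100 iterations, the rollout covers 5000 timesteps, which corresponds to 500 ms at a timestep of 0.1 ms.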

**Figure 4.** Forward and reversed sequence-to-sequence mapping approaches for translating letters (inputs) to their numerical indices (outputs) in a recurrent neural network (RNN). (**a**) The forward sequence-to-sequence mapping approach. The input is fed into the network in the same order as the desired output, so the “distance” between each input and its corresponding output is uniform. (**b**) The reversed sequence-to-sequence mapping approach. This approach introduces a temporal symmetry between the input and output sequences while keeping the average “distance” between corresponding inputs and outputs the same as in the forward approach. As shown in (**b**), $A\to 1$ is the shortest “distance” to map, $B\to 2$ the second shortest, and $C\to 3$ the longest.
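
The “distances” in Figure 4 can be checked with a short calculation. The sketch below assumes a simple timing model that is not spelled out in the caption: the network reads all $n$ inputs, one per timestep, and then emits the $n$ outputs, one per timestep. The helper name `io_distances` is hypothetical.

```python
def io_distances(n, reverse):
    """Timesteps between reading input k and emitting output k.

    Assumed timing: n inputs are read at t = 0..n-1, then n outputs are
    emitted at t = n..2n-1 in forward order.
    """
    dists = []
    for k in range(n):
        t_in = (n - 1 - k) if reverse else k   # reversed feed reads input k later
        t_out = n + k
        dists.append(t_out - t_in)
    return dists

forward = io_distances(3, reverse=False)   # A, B, C fed in order
reversed_ = io_distances(3, reverse=True)  # C, B, A fed in order
print(forward, sum(forward) / 3)           # [3, 3, 3] 3.0
print(reversed_, sum(reversed_) / 3)       # [1, 3, 5] 3.0
```

Under this assumption the mean distance is the same in both cases, but the reversed feed places the first input immediately before the first output.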

**Figure 5.** Diversity in the spiking patterns of hippocampal CA1 pyramidal neurons in response to applied currents. (**a**) Regular bursting in response to an external current of 0.23 nA. (**b**) Irregular bursting in response to an external current of 1.0 nA. (**c**) Plateau potentials followed by regular spiking in response to an external current of 3.0 nA.

**Figure 6.** Training and validation loss for the deep long short-term memory (LSTM) neural network with multi-timestep predictive horizons. (**a**) 1 timestep predictive horizon. (**b**) 50 timesteps predictive horizon. (**c**) 100 timesteps predictive horizon. (**d**) 200 timesteps predictive horizon.

**Figure 7.** Comparison of predicted membrane potential traces by the deep LSTM neural network (“LSTM Network”) to the regular spiking pattern exhibited by the Hodgkin-Huxley model (“HH Model”) in response to the external stimulating current $I=3.0$ nA. (**a**) Prediction using 1 timestep predictive LSTM network (${N}_{p}=1$). (**b**) Prediction using 50 timesteps predictive LSTM network (${N}_{p}=50$). (**c**) Prediction using 100 timesteps predictive LSTM network (${N}_{p}=100$). (**d**) Prediction using 200 timesteps predictive LSTM network (${N}_{p}=200$).

**Figure 8.** The effect of the length of the predictive horizon of the deep LSTM neural network on the accuracy of predicting regular spiking patterns. (**a**) Time-averaged root mean squared error (RMSE) versus the predictive horizon of the LSTM network (${N}_{p}=1,50,100,200$) for the simulation results shown in Figure 7. (**b**) RMSE versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 stimulating currents randomly selected from a uniform distribution $\mathcal{U}(2.3,3.0)$ and 100 random initial conditions for each stimulating current.

**Figure 9.** Comparison of predicted membrane potential traces by the deep LSTM neural network (“LSTM Network”) to the irregular bursting patterns exhibited by the Hodgkin-Huxley model (“HH Model”) in response to the external stimulating current $I=1.5$ nA. (**a**) Prediction using 1 timestep predictive LSTM network (${N}_{p}=1$). (**b**) Prediction using 50 timesteps predictive LSTM network (${N}_{p}=50$). (**c**) Prediction using 100 timesteps predictive LSTM network (${N}_{p}=100$). (**d**) Prediction using 200 timesteps predictive LSTM network (${N}_{p}=200$).

**Figure 10.** The effect of the predictive horizon of the deep LSTM neural network on the accuracy of predicting irregular bursting dynamics. (**a**) Time-averaged root mean squared error (RMSE) versus the predictive horizon of the LSTM network (${N}_{p}=1,50,100,200$) for the simulation results shown in Figure 9. (**b**) RMSE versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 stimulating currents randomly selected from a uniform distribution $\mathcal{U}(0.79,2.3)$ and 100 random initial conditions for each stimulating current.

**Figure 11.** Comparison of predicted membrane potential traces by the LSTM network (“NN Prediction”) to the regular bursting patterns exhibited by the Hodgkin-Huxley model (“HH Model”) in response to the external stimulating current $I=0.5$ nA. (**a**) Prediction using 1 timestep predictive LSTM network (${N}_{p}=1$). (**b**) Prediction using 50 timesteps predictive LSTM network (${N}_{p}=50$). (**c**) Prediction using 100 timesteps predictive LSTM network (${N}_{p}=100$). (**d**) Prediction using 200 timesteps predictive LSTM network (${N}_{p}=200$).

**Figure 12.** The effect of the predictive horizon of the multi-timestep LSTM network on the accuracy of predicting regular bursting dynamics. (**a**) Time-averaged root mean squared error (RMSE) versus the predictive horizon of the LSTM network (${N}_{p}=1,50,100,200$) for the simulation results shown in Figure 11. (**b**) RMSE versus simulation time for 5000 independent realizations, drawn from the predicted membrane potential trajectories of 50 stimulating currents randomly selected from a uniform distribution $\mathcal{U}(0.24,0.79)$ and 100 random initial conditions for each stimulating current.

**Table 1.** Comparison of computational requirements for the iterative approach presented in this paper.

${N}_{p}$ ($\Delta t=0.1$ ms) | Prediction Time (ms) | Iterations | On-Line Computation Time (s) | $\mathcal{O}\left(I\right)$ |
---|---|---|---|---|
1 | 500 | 5000 | 8.896 | $5.3\times {10}^{9}$ |
50 | 500 | 100 | 3.778 | $1.6\times {10}^{8}$ |
100 | 500 | 50 | 3.679 | $1.1\times {10}^{8}$ |
200 | 500 | 25 | 3.565 | $8.0\times {10}^{7}$ |
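
The iteration counts in Table 1 follow directly from the prediction time, the timestep, and the predictive horizon. The short sketch below reproduces that arithmetic; the helper name `n_iterations` is hypothetical, and the timings are copied from the table rather than measured.

```python
def n_iterations(prediction_time_ms, n_p, dt_ms=0.1):
    """Number of network evaluations needed to cover the prediction time."""
    total_steps = round(prediction_time_ms / dt_ms)  # 500 ms / 0.1 ms = 5000 steps
    return -(-total_steps // n_p)                    # ceiling division

iterations = {n_p: n_iterations(500, n_p) for n_p in (1, 50, 100, 200)}
print(iterations)  # {1: 5000, 50: 100, 100: 50, 200: 25}

# On-line computation times (s) as reported in Table 1.
reported_time_s = {1: 8.896, 50: 3.778, 100: 3.679, 200: 3.565}
speedup_200_vs_1 = reported_time_s[1] / reported_time_s[200]
```

Going from ${N}_{p}=1$ to ${N}_{p}=200$ cuts the number of iterations by a factor of 200, while the reported on-line computation time falls by a factor of roughly 2.5.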

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).