Quantum Convolutional Long Short-Term Memory Based on Variational Quantum Algorithms in the Era of NISQ

In the era of noisy intermediate-scale quantum (NISQ) computing, the synergistic collaboration between quantum and classical computing models has emerged as a promising approach for tackling complex computational challenges. Long short-term memory (LSTM), a popular network for modeling sequential data, has been widely acknowledged for its effectiveness. However, with the increasing demand for data and spatial feature extraction, the training cost of LSTM grows substantially. In this study, we propose the quantum convolutional long short-term memory (QConvLSTM) model. By integrating classical convolutional LSTM (ConvLSTM) networks with variational quantum algorithms, we leverage the variational properties and the accelerating characteristics of quantum states to optimize the model training process. Experimental validation demonstrates that the proposed QConvLSTM model outperforms various LSTM variants. Additionally, we adopt a hierarchical tree-like circuit design to enhance the model's parallel computing capabilities while reducing the required qubit count and circuit depth. Moreover, the inherent noise resilience of variational quantum algorithms makes the model well suited to spatiotemporal sequence modeling tasks on NISQ devices.


Introduction
Weather forecasting is a complex and crucial task that involves modeling and predicting a large amount of spatiotemporal data. Traditional meteorological forecasting methods typically rely on physical models and statistical approaches; however, these methods have limitations in capturing complex spatiotemporal dynamics and handling nonlinear data. Long short-term memory (LSTM) networks, as a powerful type of recurrent neural network architecture [1], have gained significant attention in the field of weather forecasting.
LSTM networks are renowned for their unique memory cell structure and gating mechanisms, enabling them to effectively capture long-term dependencies in time series data and alleviate the common issue of vanishing gradients during training [2][3][4][5]. Therefore, LSTM has been widely applied in short-term weather forecasting, climate pattern prediction, and extreme weather event alerts [6][7][8][9][10][11][12][13]. However, due to the complex nature of the LSTM network structure, substantial computational resources are required during training, and challenges may arise when dealing with large time spans and deep networks [14][15][16].
Meanwhile, the fusion of quantum computing and machine learning has become a hot research direction [17]. A significant body of previous work indicates that quantum computing holds enormous potential for enhancing the performance of machine learning, surpassing traditional classical computing methods [18,19]. In 2020, Chen et al. first introduced the concept of quantum long short-term memory (QLSTM) [20]. QLSTM successfully leverages the acceleration and entanglement properties of quantum mechanics to address the computational complexity and convergence issues encountered during training. In comparison to classical LSTM, QLSTM exhibits shorter computation times and more stable convergence [21,22].
The current era of quantum computing has entered the NISQ technology phase [23,24], where quantum noise becomes an unavoidable challenge. On practical NISQ devices, unresolved noise interference ultimately leads to deviations between a model's actual results and theoretical values. Huang et al. have introduced various quantum computing techniques, including variational quantum algorithms, error mitigation, quantum circuit compilation, and benchmark protocols [25]. Among them, variational quantum algorithms have proven to possess natural noise resilience, sometimes even benefiting from the presence of noise, and are therefore considered the most promising avenue for realizing quantum advantage in practical applications during the NISQ era. Variational quantum algorithms have demonstrated impressive performance in various domains, such as classification tasks, generative adversarial learning, and deep reinforcement learning [26][27][28][29][30].
Currently, most QLSTM models utilize a quantum fully connected network structure [31][32][33], neglecting the spatial correlations in the data. Additionally, in the context of the NISQ era, evaluating a model's noise resistance is a valuable research endeavor. Therefore, this paper proposes a novel network framework from several perspectives. The contributions of this paper are as follows: To address traditional QLSTM's inability to learn spatial features of the data, we propose the QConvLSTM model based on the quantum convolutional neural network (QCNN) structure. This model introduces QCNN into LSTM for the first time, not only retaining the temporal modeling ability of classical LSTM but also enhancing the extraction of spatial features from data, endowing the model with spatiotemporal characteristics. Experimental results demonstrate that our proposed model outperforms other LSTM variants with an equal number of parameters.
To improve the training efficiency and noise robustness of the model, we design a special variational quantum circuit (VQC) structure. By fully exploiting the parallelism of quantum computation through layered circuit stacking and utilizing a tree-like structure to reduce the required qubit count and circuit depth, we effectively enhance training efficiency. Furthermore, the inherent noise resilience of the variational quantum algorithm greatly enhances the model's own noise resistance.
In contrast to the neglect of noise in other studies, we investigate the noise robustness of the model. By adding noise channels of different interference levels to the VQC, we design noise simulation experiments. The results show that QConvLSTM exhibits strong robustness against various common incoherent noises, demonstrating its potential for stable training on NISQ devices.

Long Short-Term Memory
Hochreiter and Schmidhuber introduced LSTM networks in 1997 [1] to address the vanishing gradient problem encountered by traditional RNNs during training on long sequences. LSTM networks enhance the standard RNN structure with specialized memory units, enabling them to effectively capture long-term dependencies. Each LSTM unit consists of a cell state c_t and a hidden state h_t. At each time step, the LSTM receives the input x_t of the current time step and the hidden state h_{t−1} of the previous time step, and controls the flow of information through various gate mechanisms, including the forget gate f_t, the input gate i_t, and the output gate o_t. Specifically, whenever a new input x_t arrives, if the input gate i_t is activated, its information is accumulated in the cell. Furthermore, if the forget gate f_t is open, the past cell state c_{t−1} is forgotten. Finally, the output gate o_t controls whether the cell state c_t propagates to the final state h_t. The key formulas are shown in (1):

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
c_t = f_t ∘ c_{t−1} + i_t ∘ tanh(W_c · [h_{t−1}, x_t] + b_c)
h_t = o_t ∘ tanh(c_t)        (1)

where σ represents the sigmoid activation function, the W terms are weight matrices (with biases b), and ∘ denotes the Hadamard product.
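The gate mechanism above can be sketched as one LSTM step in NumPy. The weight layout (separate input and recurrent matrices per gate, plus biases) and all variable names are illustrative choices, not the paper's parameterization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step following the gate equations above."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    g_t = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate cell
    c_t = f_t * c_prev + i_t * g_t     # Hadamard products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# toy dimensions: 3-dim input, 2-dim hidden/cell state
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(2, 3)) for k in 'fiog'}
U = {k: rng.normal(size=(2, 2)) for k in 'fiog'}
b = {k: np.zeros(2) for k in 'fiog'}
h_t, c_t = lstm_step(rng.normal(size=3), np.zeros(2), np.zeros(2), W, U, b)
```

Because the output gate is a sigmoid and tanh is bounded, every entry of h_t lies strictly inside (−1, 1).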

Convolutional Long Short-Term Memory
Shi et al. first incorporated convolutional neural networks into classical LSTM in 2015 [34]. ConvLSTM, as an improvement of the classical LSTM model, not only processes time series data like LSTM networks but also extracts spatial local features like CNNs. Compared to LSTM models, ConvLSTM can take images as network input and perform convolutional operations on image sequences to extract image features, and thus performs better at sequence modeling where the temporal data are images. Its innovation lies in integrating the convolutional operations of convolutional neural networks into LSTM units: it applies two-dimensional convolutional operations in the input gate, forget gate, and output gate of the LSTM, capturing features of the input data simultaneously in both the time and space dimensions. ConvLSTM finds wide applications in various fields [35,36]. For instance, it can model dynamic features in video sequences [37]. In natural language processing, ConvLSTM can be applied to tasks such as text classification and sentiment analysis [38,39]. The LSTM formulas with incorporated convolutional operations are shown in (2):

f_t = σ(W_f ∗ [H_{t−1}, X_t] + b_f)
i_t = σ(W_i ∗ [H_{t−1}, X_t] + b_i)
o_t = σ(W_o ∗ [H_{t−1}, X_t] + b_o)
C_t = f_t ∘ C_{t−1} + i_t ∘ tanh(W_c ∗ [H_{t−1}, X_t] + b_c)
H_t = o_t ∘ tanh(C_t)        (2)

where ∗ represents the convolution operator; the inputs and outputs of ConvLSTM networks are three-dimensional tensors, whereas in traditional LSTM models they are two-dimensional.
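A minimal ConvLSTM cell can be sketched in PyTorch (the library the experiments in this paper use). The class name, the trick of computing all four gates with one shared Conv2d over the concatenated [X_t, H_{t−1}] tensor, and the toy shapes are our implementation choices, not the paper's code:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: gate pre-activations come from a single
    2D convolution over the concatenated input and hidden state."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # one conv produces all four gate pre-activations at once
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)    # Hadamard products
        h = o * torch.tanh(c)
        return h, c

cell = ConvLSTMCell(in_ch=1, hid_ch=8)
x = torch.randn(2, 1, 16, 16)              # (batch, channel, H, W) frame
h = c = torch.zeros(2, 8, 16, 16)
h, c = cell(x, (h, c))                     # spatial dims are preserved
```

Because the gates are convolutions rather than matrix products, the hidden and cell states keep the image's spatial layout.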

Amplitude Encoding
LaRose et al. introduced several quantum encoding methods, including angle encoding, dense angle encoding, and amplitude encoding [40]. Angle encoding and dense angle encoding are beneficial for reducing the depth of quantum circuits. However, these two encoding methods require O(N) qubits to encode classical data of dimension N, while most current NISQ devices can only provide a limited number of qubits. In contrast, amplitude encoding, although it may deepen the circuit to some extent, requires only O(log N) qubits to encode classical data of dimension N [41], making it more suitable for data encoding in the current era.
During the amplitude encoding process, the information of classical data is encoded into the amplitudes of the qubits. A normalized classical N-dimensional data point x is encoded into a quantum state |φ_x⟩ requiring n qubits, with its amplitudes given by

|φ_x⟩ = Σ_{i=1}^{N} x_i |i⟩,

where N = 2^n, x_i represents the ith element of the data point x, and |i⟩ is the ith computational basis state.
To achieve amplitude encoding, a series of quantum gate operations is required to control the state of the qubits. Commonly used quantum gate operations include the Hadamard gate and the phase gate. The Hadamard gate can transform a basis state into a uniform superposition state, thereby adjusting the values of the amplitudes. The phase gate can introduce phase differences, further altering the encoding of the amplitudes.
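The O(log N) qubit scaling of amplitude encoding can be sketched numerically. Zero-padding to the next power of two is an illustrative convention, not something the paper specifies:

```python
import numpy as np

def amplitude_encode(x):
    """Normalize a length-N data point into the 2^n amplitudes of an
    n-qubit state (N = 2^n), zero-padding if N is not a power of two."""
    x = np.asarray(x, dtype=float)
    n = int(np.ceil(np.log2(len(x))))
    padded = np.zeros(2 ** n)
    padded[: len(x)] = x
    state = padded / np.linalg.norm(padded)  # amplitudes of |phi_x>
    return state, n

# a 16-dimensional data point needs only log2(16) = 4 qubits
state, n_qubits = amplitude_encode(np.arange(1, 17))
```

The returned vector is unit-norm, as required for valid quantum amplitudes.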

Variational Quantum Circuits
In 2007, Sousa et al. proposed a universal circuit model for implementing quantum variational algorithms [42]. Subsequently, scholars in the field of quantum machine learning began to focus on using variational quantum circuits (VQCs) to enhance the performance of classical networks [43][44][45]. A VQC consists of a series of quantum gate operations, with the circuit parameters adjusted to optimize specific quantum states or quantum operations. As shown in Figure 1, a VQC typically comprises two main parts: the encoding layer U_ε and the variational layer U(θ). The encoding layer encodes classical data into quantum states, while the variational layer introduces adjustable parameters θ and applies a series of parameterized quantum gate operations to the input quantum state. These parameterized quantum gates can be adjusted using classical optimization algorithms to minimize a target function. Finally, the measurement layer M is employed to obtain the final result.
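The encode-rotate-measure-optimize loop can be demonstrated with a single-qubit toy circuit simulated in NumPy. The circuit layout, loss, target, and hyperparameters are illustrative, not the paper's VQC; the parameter-shift rule used here yields exact gradients from two extra circuit evaluations:

```python
import numpy as np

def ry(theta):
    """Matrix of a single-qubit RY rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expval_z(theta, x):
    """Encode x as RY(x)|0>, apply the variational gate RY(theta),
    and return the Pauli-Z expectation value of the output state."""
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])
    return state[0] ** 2 - state[1] ** 2

# classical optimization loop: drive <Z> toward -1 for input x = 0.3
x, theta, lr = 0.3, 0.1, 0.4
for _ in range(200):
    # parameter-shift rule: exact gradient of <Z> w.r.t. theta
    grad = 0.5 * (expval_z(theta + np.pi / 2, x)
                  - expval_z(theta - np.pi / 2, x))
    theta -= lr * 2 * (expval_z(theta, x) + 1) * grad  # d/dtheta (<Z>+1)^2
```

Since RY(θ)RY(x)|0⟩ gives ⟨Z⟩ = cos(θ + x), the loss (⟨Z⟩ + 1)² is minimized when θ + x approaches π, which the loop converges to.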

Incoherent Noise
Ren et al. have explored several types of incoherent noise [46]. Applying an incoherent noise channel converts an input pure quantum state into a mixed state. Specifically, the noise channel randomly rotates the input quantum state in a new direction, producing an interaction between the input state and the environment, so that the output state is a density matrix. This density matrix typically consists of multiple pure states, each corresponding to a possible rotation direction. In a pure state, the quantum state of the system is completely determined, allowing us to predict its behavior precisely. In a mixed state, however, the quantum state of the system is uncertain, and we can only make probabilistic predictions about its behavior, as described by Equation (3):

ρ = Σ_i p_i |φ_i⟩⟨φ_i|,        (3)

where |φ_i⟩ is a pure state within the mixed state ρ and p_i is the probability of being in that state; the probabilities must satisfy normalization, Σ_i p_i = 1.
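The pure-versus-mixed distinction can be checked numerically via the purity Tr(ρ²). The two-state ensemble below is an illustrative example, not from the paper:

```python
import numpy as np

# an illustrative ensemble of two pure states with probabilities p_i
phi_0 = np.array([1.0, 0.0])                  # |0>
phi_1 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # |+>
probs = [0.7, 0.3]                            # normalized: sums to 1

# rho = sum_i p_i |phi_i><phi_i|
rho = sum(p * np.outer(phi, phi.conj())
          for p, phi in zip(probs, [phi_0, phi_1]))

trace = float(np.trace(rho).real)         # 1 for any valid density matrix
purity = float(np.trace(rho @ rho).real)  # 1 only for pure states; < 1 here
```

A purity strictly below 1 is the numerical signature that the channel output is mixed rather than pure.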

Model
We now present our QConvLSTM network. Although the QLSTM model has been proven powerful in handling time-correlated data, it does not exploit spatial structure and therefore carries substantial redundancy when processing spatial data. To address this issue, we propose the QConvLSTM architecture, which includes quantum convolution operations in both the input-to-state and state-to-state transitions. Not only does it fully exploit the entanglement and acceleration properties of quantum mechanics to enhance training efficiency, but it also gains an advantage in spatiotemporal sequence modeling problems through the combination of multiple quantum LSTM layers containing convolution operations.

Quantum Convolutional Long Short-Term Memory
The main drawback of QLSTM in handling spatiotemporal data is that it uses quantum fully connected neural networks in both the input-to-state and state-to-state transitions, without encoding spatial information. Therefore, we replace the quantum fully connected circuit layer with a quantum convolution circuit layer, as illustrated in Figure 2. We treat each time step of the input sequence as an image. When the image sequence at a given time enters the quantum LSTM unit, three types of control gates composed of quantum convolution circuits are applied to the sequence as appropriate. This is done to learn the spatial and temporal information contained in the sequence, thereby modeling the image sequence. σ represents the sigmoid function, while the tanh block denotes the hyperbolic tangent activation function. x_t represents the input at time t, h_t the hidden state, c_t the cell state, and y_t the output. ⊕ and ⊙ denote element-wise addition and multiplication operations, respectively.

Quantum Convolutional Circuit Structure
The approach adopted in this paper utilizes a hierarchical stacking method to design quantum circuits, gradually decreasing the number of qubits layer by layer in a tree-like structure. From a design perspective, the structure of this circuit is similar to that of a convolutional neural network. Through stacking and layer-by-layer qubit reduction operations, we fully exploit the parallelism of quantum computation while reducing the number of qubits and parameters, thus improving training efficiency. Additionally, the hierarchical structure offers flexibility and adjustability, allowing us to design and optimize according to specific problem requirements, adjust the number of layers, and select suitable gate operations and parameterization forms to improve the performance and convergence speed of the quantum algorithm.
The specific operations are illustrated in Figure 3. First, we combine multiple two-qubit VQC modules to initialize the first layer of the circuit. Subsequently, we reduce the number of qubits in the next layer by discarding one qubit from each module's output. In the next layer, we apply the two-qubit VQC module again to the remaining qubits and then discard half of them. This process is repeated until only one qubit remains, and finally, the average expectation value on the remaining qubit is measured. This design effectively avoids problems such as barren plateaus, enhancing the overall trainability and performance of the network. VQC 1 and VQC 2 in the figure represent different circuit design structures. Different structures lead to changes in model performance, so it is necessary to design a circuit structure suitable for the current application scenario based on the actual situation. The details of this design will be discussed in Section 7.
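The layer-by-layer halving can be sketched as a simple register count (assuming a power-of-two register, which the paper does not state explicitly):

```python
def tree_layers(n_qubits):
    """Qubit count per layer of a hierarchical tree-like circuit:
    each layer groups qubits into pairs, applies a two-qubit VQC
    module to each pair, and discards one qubit of every pair,
    halving the register until a single qubit remains."""
    layers = [n_qubits]
    while layers[-1] > 1:
        layers.append(layers[-1] // 2)
    return layers

# an 8-qubit register needs only log2(8) = 3 reduction layers
print(tree_layers(8))   # -> [8, 4, 2, 1]
```

The depth therefore grows only logarithmically in the register width, which is the source of the reduced circuit-depth requirement described above.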

Experimental Setup
The dataset used in this experiment was the Moving-MNIST image dataset, with images at a resolution of 64 × 64 pixels. We selected 500 sequences from the dataset for training and 200 sequences for testing. The learning rate was set to 0.001, and the encoding method was amplitude encoding. The experiments were conducted on a Linux operating system, specifically Ubuntu 18.04.5 LTS, with GPU processing. The experimental code was implemented in Python 3 using the PyTorch and PennyLane libraries.
PyTorch is an open-source machine learning library that offers various ways to construct models and can utilize GPU acceleration, making it suitable for large-scale data and complex models. PennyLane, on the other hand, is an open-source quantum machine learning library specifically designed for gradient-based optimization in quantum computing. It provides interfaces to popular machine learning frameworks such as PyTorch and TensorFlow and integrates code for simulating quantum noise, allowing users to develop and train noisy quantum machine learning models.
Equation (4) gives the Kraus operators for the bit flip channel, where p is the probability of a qubit undergoing a bit flip:

K_1 = √(1 − p) I,  K_2 = √p X.        (4)

The operator K_1 describes the case where no flip of the quantum state occurs, while K_2 describes the case where an X gate is applied to the quantum state with a certain probability, taking the state |0⟩ to |1⟩, or |1⟩ to |0⟩.

Equation (5) gives the Kraus operators for the phase flip channel:

K_1 = √(1 − p) I,  K_2 = √p Z.        (5)

Here K_2 describes the case where a Z gate is applied to the quantum state with a certain probability, leaving the state |0⟩ unchanged and taking |1⟩ to −|1⟩.

Equation (6) gives the Kraus operators for the bit-phase flip channel:

K_1 = √(1 − p) I,  K_2 = √p Y.        (6)

Here K_2 describes the case where a Y gate is applied to the quantum state with a certain probability, taking |0⟩ to i|1⟩ and |1⟩ to −i|0⟩.

Equation (7) gives the Kraus operators for the depolarizing channel, which applies the X, Y, and Z gates with equal probability:

K_1 = √(1 − p) I,  K_2 = √(p/3) X,  K_3 = √(p/3) Y,  K_4 = √(p/3) Z.        (7)
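Applying a Kraus-operator channel to a density matrix can be sketched directly, using the bit flip channel as an example (the helper names are ours; the standard bit-flip Kraus operators are identity with probability 1 − p and Pauli-X with probability p):

```python
import numpy as np

def apply_channel(rho, kraus):
    """Apply a quantum channel: rho -> sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def bit_flip(p):
    """Standard bit flip Kraus operators."""
    I = np.eye(2)
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    return [np.sqrt(1 - p) * I, np.sqrt(p) * X]

p = 0.1
rho_in = np.array([[1.0, 0.0], [0.0, 0.0]])   # pure |0><0|
rho_out = apply_channel(rho_in, bit_flip(p))  # 0.9|0><0| + 0.1|1><1|

# completeness relation sum_k K_k^dagger K_k = I ensures trace preservation
completeness = sum(K.conj().T @ K for K in bit_flip(p))
```

The output is the expected incoherent mixture: a |0⟩ input acquires population p in |1⟩, while the total trace stays 1.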

Noiseless Simulations
In this section, we apply the proposed QConvLSTM framework to model the Moving-MNIST dataset. To compare the differences between classical and quantum learning, all model network hyperparameters were kept consistent. First, based on the model architecture described in Section 4.1, we constructed the QConvLSTM model framework containing quantum convolutional layer operations. Additionally, during the experimentation process, we adjusted and compared the number of circuit layers to optimize the model's performance. Ultimately, we chose the optimal setting of two circuit layers.
In addition to ensuring the final training effectiveness of the model, it is also necessary to consider the available computing resources. Since we used the quantum simulator provided by the PennyLane platform in our experiment, the running speed of a real quantum computer could not be achieved. Therefore, to reduce the time complexity of training, we had to use circuits with as few qubits as possible. In this experiment, we used a two-layer, four-qubit circuit structure. With amplitude encoding, the maximum dimension of the input sequence that this circuit could accept was 16. However, since the Moving-MNIST dataset has a size of 64 × 64 pixels, the sequences required preprocessing to reduce their dimensionality. We first reduced the original sequences to 64 × 16 using a fully connected layer and then split them into 64 batches, which were input into the circuit and encoded into quantum states.
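The preprocessing just described can be sketched in PyTorch; the module and variable names are ours, not the paper's code, and the final normalization stands in for the amplitude-encoding step:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
frame = torch.randn(64, 64)        # one 64x64 Moving-MNIST frame

# fully connected reduction of each 64-dim row: 64x64 -> 64x16
reduce_fc = nn.Linear(64, 16)
reduced = reduce_fc(frame)         # shape (64, 16)

# split into 64 batches of 16-dim vectors; each vector fills the
# 2^4 = 16 amplitudes accepted by the four-qubit circuit
batches = reduced.reshape(64, 16)
states = batches / batches.norm(dim=1, keepdim=True)  # unit norm per vector
```

Each of the 64 unit-norm rows then corresponds to one amplitude-encoded input to the quantum convolutional circuit.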

Noisy Simulations
We simulated noisy scenarios by adding noise channels at each layer of the quantum convolutional network. Initially, based on the aforementioned experiments, we added noise channels at the end of the convolutional circuits in the QConvLSTM model, with the same type of noise channel added for each test. We conducted multiple tests using the same methodology as in the previous experiment. Finally, we evaluated the robustness of our model to various types of noise during training by observing changes in performance metrics. We utilized the four types of noise channels introduced in Section 5.1: bit flip, bit-phase flip, phase flip, and depolarizing noise. These four types of noise capture the impacts of most common noise sources.

Noiseless
Table 1 presents the average results of all comparative models on each frame. We utilized evaluation metrics widely used by previous researchers: mean squared error (MSE) [47], structural similarity index measure (SSIM) [48], and learned perceptual image patch similarity (LPIPS) [49]. The distinctions among these metrics are as follows: the MSE estimates absolute pixel errors; the SSIM measures the similarity of structural information within spatial neighborhoods; and the LPIPS is based on deep features, which better align with human perception. Through these three metrics, we can comprehensively assess the sequence modeling ability of the models: smaller MSE and LPIPS values indicate better performance, while larger SSIM values indicate better performance. Figure 4 provides the corresponding frame-by-frame comparison. By comparing these results, we can intuitively understand the performance differences among the LSTM variants. Regarding the MSE, firstly, by adding convolutional operations to LSTM, the ConvLSTM model reduced the mean MSE from 132.7 to 113.4 compared to the LSTM model. Secondly, by combining quantum algorithms with classical LSTM models, the QLSTM model further reduced the mean MSE to 87.2. Finally, our proposed QConvLSTM model further reduced the mean MSE by 25.7% (from 87.2 to 64.8) by incorporating quantum convolutional networks. Additionally, the SSIM and LPIPS improved by 2.1% and 14.5%, respectively, over QLSTM. Therefore, under the same parameters, the performance of the QConvLSTM model is superior to classical models and to quantum models with quantum fully connected network structures.
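For reference, the pixel-level metrics can be sketched in NumPy. The MSE is standard; the SSIM below is a simplified single-window (global) version of the windowed SSIM used in practice, and the images are random illustrative data:

```python
import numpy as np

def mse(a, b):
    """Mean squared error: average absolute pixel error."""
    return np.mean((a - b) ** 2)

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window (global) SSIM for images scaled to [0, 1].
    Real evaluations average SSIM over local windows; this is a sketch."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
noisy = np.clip(img + 0.05 * rng.normal(size=img.shape), 0.0, 1.0)
# lower MSE / higher SSIM both indicate closer agreement with the target
```

Identical images give MSE 0 and SSIM 1, which matches the "smaller MSE, larger SSIM is better" convention above (LPIPS requires a pretrained network and is omitted here).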

Noisy
Table 2 shows the results of the simulation experiments for the four types of noise. The results indicate that the influence of these four types of noise on our QConvLSTM model is negligible. The robustness of QConvLSTM to noise interference is determined by the special structure of the model. First, the circuit depth used in our experiments is only two layers, and shallower circuits are less prone to noise issues. Second, the LSTM unit with added convolutional operations can better extract features from the input data, enabling the network to adapt to and fit the influence of noise. Lastly, the introduced variational quantum algorithm enhances the network's ability to handle nonlinear problems, helping the network better capture and process nonlinear information in the noise.

Discussion
As mentioned in Section 4.2, our circuit structure design adopts a hierarchical stacking approach. Therefore, we conducted multiple tests on the number of stacked layers in the circuit to find the optimal depth. As shown in Table 3, we tested the performance of the models under three scenarios. The single-layer circuit structure yielded the worst results, while the performance of the three-layer structure was slightly better than that of the two-layer structure. However, as the number of layers increased, both the computational complexity and time complexity grew rapidly, and the increased network depth made the circuit more susceptible to noise. Therefore, we ultimately chose two layers for the experimental circuit.

Typically, the structure of a quantum circuit can significantly impact the performance of the model. In the early stages of experimentation, we designed six types of circuit structures, as shown in Figure 5. We conducted comparative tests on these six circuits to select the one with the best performance as the final circuit for this experiment. According to the results in Table 4, the circuit labeled (c) exhibited the best performance, surpassing the circuits with other structures. Therefore, we chose this structure as the final component of the model in this paper.

Conclusions
This paper introduces a novel QConvLSTM model that combines the advantages of quantum computing and convolutional networks on the basis of classical LSTM networks. This model not only improves training efficiency but also enhances the model's capability to extract spatial features. We first used the proposed model to predict the Moving-MNIST dataset and then evaluated the model's performance based on loss value and accuracy. Experimental results demonstrate that, under the same parameters, the performance of QConvLSTM surpasses that of classical LSTM structures and QLSTM. Specifically, compared to QLSTM, QConvLSTM achieved improvements of 25.7%, 2.1%, and 14.5% in MSE, SSIM, and LPIPS, respectively. Furthermore, due to the hierarchical tree-like structure adopted by the circuit, the design uses fewer qubits, a reduced network depth, and a smaller overall parameter count, improving training efficiency. Finally, we demonstrate the robustness of QConvLSTM to the most common noise sources, which holds considerable practical significance in the era of NISQ computing.
The integration of quantum convolutional networks into classical LSTM provides new insights for future research on quantum long short-term memory networks. The robustness of QConvLSTM to noise enables training on most current NISQ devices. However, due to device limitations, the experiments in this work were confined to low-resolution image data. In future work, we aim to extend our model to high-resolution image tasks and to reconstruct the control gate structure within LSTM units using structurally diverse quantum neural networks. Furthermore, our research on noise robustness remains incomplete. In the future, we will expand our study to include noise factors such as phase damping, amplitude damping, and depolarization damping.

Figure 1. Parameter update process in a VQC. The entire process occurs simultaneously in both quantum and classical environments. Variational operations are performed on a quantum computer, while parameter optimization operations are carried out on a classical computer.

Figure 2. The basic framework of the LSTM unit based on quantum convolutional neural networks.

Figure 3. A quantum convolutional circuit based on a hierarchical tree-like structure. The entire circuit is composed of multiple two-qubit VQC modules concatenated together, with each column representing a layer of the circuit. Each blue square represents a VQC module, which can be flexibly constructed according to specific requirements.

Figure 4. Frame-by-frame results of the Moving-MNIST test set generated by models trained on the training set: (a) MSE; (b) SSIM; (c) LPIPS.

Figure 5. Structural design of VQCs: (a-f) represent the design schemes of a single VQC module in the quantum convolutional circuit based on a hierarchical tree structure.

Table 1. Average performance of different models on 10 prediction time steps.

Table 2. Performance metric comparison of QConvLSTM in different noise environments.

Table 3. Performance comparison of circuit structures with different numbers of layers.

Table 4. Comparison of the impact of different VQC structures on training effectiveness.