Multi-Objective Evolutionary Architecture Search for Parameterized Quantum Circuits

Recent work on hybrid quantum-classical machine learning systems has demonstrated success in utilizing parameterized quantum circuits (PQCs) to solve the challenging reinforcement learning (RL) tasks, with provable learning advantages over classical systems, e.g., deep neural networks. While existing work demonstrates and exploits the strength of PQC-based models, the design choices of PQC architectures and the interactions between different quantum circuits on learning tasks are generally underexplored. In this work, we introduce a Multi-objective Evolutionary Architecture Search framework for parameterized quantum circuits (MEAS-PQC), which uses a multi-objective genetic algorithm with quantum-specific configurations to perform efficient searching of optimal PQC architectures. Experimental results show that our method can find architectures that have superior learning performance on three benchmark RL tasks, and are also optimized for additional objectives including reductions in quantum noise and model size. Further analysis of patterns and probability distributions of quantum operations helps identify performance-critical design choices of hybrid quantum-classical learning systems.


INTRODUCTION
Near-term quantum computing technologies will hopefully allow quantum computing systems to reliably solve tasks that are beyond the capabilities of classical systems [30]. One of the most promising applications of quantum computing is hybrid quantum-classical learning systems, which utilize parameterized quantum operations and classical optimization algorithms to solve learning tasks. Prior works have demonstrated that parametrized quantum circuits (PQCs) [2] are able to handle a variety of supervised and unsupervised tasks such as classification [15,22,31,32] and generative modeling [16,20,23,39], and have provided proofs of their learning advantages [8,17]. Some recent work [17,34] further shows that PQCs can be used to construct quantum policies that solve more complex reinforcement learning problems, with an empirical learning advantage over standard deep neural networks (DNNs).
One of the key aspects behind the success of PQC-based algorithms is the architectural design of hybrid quantum learning frameworks. While prior works [17,29,34] have either identified some essential components or empirically analyzed the influence of different design choices on learning performance, the development of high-performance architectures for hybrid quantum systems still relies on human ingenuity. On the other hand, architecture search methods, which aim to automate the process of discovering and evaluating the architecture of complex systems, have been extensively explored in classical learning systems, e.g., neural architecture search (NAS) [11]. More specifically, recent works [7,25] on combining genetic algorithms with gradient-based optimization have demonstrated superior performance in NAS and, more generally, in optimizing deep neural networks. In the context of quantum computing, common architecture search approaches such as greedy algorithms [13,36], evolutionary algorithms [9,21,24], reinforcement learning [19,27], and gradient-based learning [38] have also been applied to tasks such as quantum control, variational quantum eigensolvers, and quantum error mitigation. However, most of these approaches target the optimization of either specific pure quantum circuits or single-qubit quantum operations, rather than more complex multi-qubit hybrid systems. Overall, automated search and optimization of architectures of hybrid quantum learning systems have not yet been sufficiently explored.
In this work, we explore the use of genetic algorithms to automatically design the architecture of hybrid quantum-classical systems that can solve complex RL problems. We propose EQAS-PQC, an Evolutionary Quantum Architecture Search framework for constructing complicated quantum circuits based on some fundamental PQC circuits. We adopt the ideas of successful genetic-algorithm approaches in NAS, which have more flexible architecture search spaces and require less prior knowledge about architecture design. In our experiments, we consider benchmark RL environments from OpenAI Gym, which have been widely used in RL research. Experimental results show that agents trained using our method significantly outperform those from prior work. We further analyze the top-performing PQC architectures found by our method to identify common patterns that appear during the search process, which may provide insights for the future development of hybrid quantum-classical learning systems.

PRELIMINARIES AND RELATED WORK
In this section, we introduce the basic concepts of quantum computation related to this work, and give a detailed description of parametrized quantum circuits and their applications.

Quantum Computation Basics
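An $n$-qubit quantum system is generally represented by a complex Hilbert space of $2^n$ dimensions. Under the bra-ket notation, the quantum state of the system is denoted as a vector $|\psi\rangle$, which has unit norm $\langle\psi|\psi\rangle = 1$, where $\langle\psi|$ is the conjugate transpose and $\langle\psi|\psi\rangle$ represents the inner product. The computational basis states are represented as the tensor products of single-qubit computational basis states, e.g., the two-qubit state $|01\rangle = |0\rangle \otimes |1\rangle$, where $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Given a rotation angle $\theta$, the matrix representations of the rotation operators $R_x$, $R_y$, $R_z$ are

$$R_x(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \quad R_y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \quad R_z(\theta) = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}.$$

If a quantum state of a composite system cannot be written as a product of the states of its components, we call it an entangled state. Entanglement can be created by applying controlled-Pauli-Z gates to the input qubits.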
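As a small numerical illustration of the single-qubit rotation operators, the following sketch builds the standard rotation matrices in plain Python and applies them to $|0\rangle$. The helper names (`rx`, `ry`, `rz`, `apply`) are chosen here for illustration only and are not taken from the paper's implementation.

```python
import cmath
import math


def rx(theta):
    """Rotation about the X axis by angle theta (2x2 matrix as nested lists)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(theta):
    """Rotation about the Y axis by angle theta."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(theta):
    """Rotation about the Z axis by angle theta."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def apply(gate, state):
    """Multiply a 2x2 gate with a single-qubit state vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]


def norm_sq(state):
    """Squared norm <psi|psi> of a state vector."""
    return sum(abs(a) ** 2 for a in state)


ket0 = [1, 0]
flipped = apply(rx(math.pi), ket0)    # equals -i|1>, i.e. |1> up to a global phase
plus = apply(ry(math.pi / 2), ket0)   # equals (|0> + |1>) / sqrt(2)
phased = apply(rz(math.pi / 2), plus) # only relative phase changes, norm stays 1
```

Each rotation is unitary, so the squared norm of every resulting state remains 1.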
A projective measurement of quantum states is described by an observable, $M$, which is a Hermitian operator on the state space of the quantum system being observed. The observable has a spectral decomposition

$$M = \sum_m m P_m,$$

where $P_m$ is the projector onto the eigenspace of $M$ with eigenvalue $m$. Upon measuring the state $|\psi\rangle$, the probability of getting result $m$ is given by

$$p(m) = \langle\psi| P_m |\psi\rangle,$$

and the expectation value of the measurement is

$$\mathbb{E}(M) = \sum_m m \, p(m) = \langle\psi| M |\psi\rangle.$$

For a more detailed introduction to basic concepts of quantum computation, we refer the readers to Nielsen and Chuang [26].
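To make the measurement rule concrete, the sketch below evaluates a Pauli-Z measurement on a single-qubit state: the projectors are $|0\rangle\langle 0|$ (eigenvalue $+1$) and $|1\rangle\langle 1|$ (eigenvalue $-1$). The function name `measure_z` and the example state are illustrative assumptions, not part of the original work.

```python
import math


def measure_z(state):
    """Outcome probabilities and expectation value of a Pauli-Z measurement.

    `state` is a normalized single-qubit state [a, b] = a|0> + b|1>.
    """
    a, b = state
    p_plus = abs(a) ** 2    # probability of eigenvalue +1 (projector |0><0|)
    p_minus = abs(b) ** 2   # probability of eigenvalue -1 (projector |1><1|)
    expectation = (+1) * p_plus + (-1) * p_minus
    return {+1: p_plus, -1: p_minus}, expectation


theta = math.pi / 3
# State obtained by applying R_y(theta) to |0>.
state = [math.cos(theta / 2), math.sin(theta / 2)]
probs, expval = measure_z(state)
```

For this state the expectation value reduces to $\cos^2(\theta/2) - \sin^2(\theta/2) = \cos\theta$.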

Parametrized Quantum Circuits
Given a fixed $n$-qubit system, a parametrized quantum circuit (PQC) is defined by a unitary operation $U(s, \theta)$ that acts on the current quantum states $s$ considering the trainable parameters $\theta$. In this work, we mainly consider two types of PQCs: variational PQCs (V-PQCs) [2,18] and data-encoding PQCs (D-PQCs) [29,33]. The V-PQCs are composed of single-qubit rotations $R_x$, $R_y$, $R_z$ with the rotation angles as trainable parameters. The D-PQCs have a similar structure with rotations, but the angles are the input data $s$ scaled by a trainable parameter $\lambda$. The structures of both PQCs are depicted in Fig. 1, which we describe in detail later in Sec. 3.1.
A recent work [17] proposes to use an alternating-layered architecture [29,33] to implement parameterized quantum policies for RL, which applies an alternation of V-PQC (followed by an entanglement) and D-PQC until the target depth is reached. While this architecture is simple and effective, this general design can easily be modified, and possibly improved, by changing the placement of its components. In this work, we aim to optimize the design of such PQC-based systems with architecture search methods.

Quantum Architecture Search
Early research [35] has shown the usage of genetic programming to solve specific quantum computing problems from an evolutionary perspective. Prior works [9,19,21,24,27,36,38] have explored the usage of common architecture search approaches in various quantum computing applications such as quantum control, variational quantum eigensolvers, and quantum error mitigation. However, most of these works target specific quantum computing problems and try to optimize the quantum circuits in a hardware-efficient manner.
More recently, a few approaches have been proposed to optimize architectures involving parameterized quantum circuits. Grimsley et al. [13] proposed a method that iteratively adds parameterized gates and re-optimizes the circuit using gradient descent. Ostaszewski et al. [28] proposed an energy-based search method for optimizing both the structure and parameters of single-qubit gates and demonstrated its performance on a variational quantum eigensolver. In this work, we take one step further and propose a more general architecture search framework for hybrid quantum-classical systems with both parameterized and non-parameterized quantum operators, which aims to solve challenging learning problems such as RL.

METHOD
We propose EQAS-PQC, an Evolutionary Quantum Architecture Search framework for constructing quantum learning models based on some fundamental PQC circuits. While the proposed framework can be generally applied to various learning problems, in this work we choose to target challenging RL problems in order to better illustrate the benefit of our method. In this section, we describe the major components of EQAS-PQC, including the encoding scheme and the search process, in detail.

Figure 1: Illustration of a simple 4-qubit PQC architecture in the search space of EQAS-PQC. This architecture, whose genome encoding is 1 − 2 − 3 − 0, is composed of 4 operations: 1) Variational PQC ($x_1$) performs rotations on each qubit according to parameters $\theta$; 2) Data-encoding PQC ($x_2$) performs rotations on each qubit according to the input data $s$ and scaling parameter $\lambda$; 3) Entanglement ($x_3$) performs circular entanglement on all the qubits; 4) Measurement ($x_0$) adds another Variational PQC ($x_1$) and performs measurement to obtain the observable values.

Encoding and Search Space
Biologically inspired methods such as genetic algorithms (GAs) have been successfully used in many search and optimization problems for decades. In most cases, GAs refer to a class of population-based computational paradigms, which simulate the natural evolution process to evolve programs by using genetic operations (e.g., crossover and mutation) to optimize some pre-defined fitness or objective functions. From this perspective, we may view the architectures of quantum circuits as phenotypes. Since the genetic operations usually work with genotypes, which are representations to which the genetic operations can be easily applied, we need to define an encoding scheme as the interface for abstracting the architectures to genomes, where the genes are different quantum operations.
The existing architectures of parameterized quantum policies [17] can be viewed as a composition of functional quantum circuits that specify some computational schemes on a single qubit or multiple qubits. In EQAS-PQC, we define four basic operation encodings $x = \{x_0, x_1, x_2, x_3\}$, and the corresponding genes are represented as integers $\{0, 1, 2, 3\}$. An illustration of a simple PQC architecture in the search space of EQAS-PQC is depicted in Fig. 1. More specifically, given a fixed $n$-qubit state, we define the following operations:
• $x_1$: Variational PQC - A circuit with single-qubit rotations $R_x$, $R_y$, $R_z$ performed on each qubit, with the rotation angles as trainable parameters.
• $x_2$: Data-encoding PQC - A circuit with single-qubit rotations $R_x$ performed on each qubit, where the rotation angles are the input scaled by trainable parameters.
• $x_3$: Entanglement - A circuit that performs circular entanglement on all the qubits by applying one or multiple controlled-Z gates.
• $x_0$: Measurement - A Variational PQC followed by measurement.
The outputs are computed by weighting the observables by another set of trainable parameters for each output, with optional activation functions such as Softmax. The architecture encoding/decoding terminates upon reaching $x_0$.
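The decoding rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the operation labels and the function name `decode` are hypothetical, and the handling of genomes that never contain the gene 0 is left open because the text does not specify it.

```python
# Hypothetical labels for the four encoded operations x_0 .. x_3.
OPERATIONS = {
    0: "measurement",
    1: "variational",
    2: "data_encoding",
    3: "entanglement",
}


def decode(genome):
    """Map an integer genome to a list of operations, stopping at the first 0.

    Genes after the first measurement gene are ignored. A genome without
    any 0 is returned without a terminal measurement (an assumption of
    this sketch).
    """
    architecture = []
    for gene in genome:
        architecture.append(OPERATIONS[gene])
        if gene == 0:
            break
    return architecture


# The genome 1-2-3-0 from Fig. 1, padded with genes that are never decoded.
example = decode([1, 2, 3, 0, 2, 1])
```

Because decoding stops at the first 0, the padded genome above yields the same four-operation architecture as the genome 1 − 2 − 3 − 0.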
It is easy to see that the search space of EQAS-PQC depends on the maximal length of the genomes. Since the decoding terminates upon reaching $x_0$, there will be cases where the same architecture is decoded from different genomes. The size of the search space is therefore the number of sequences of non-terminal operations (all except $x_0$), summed over all possible lengths less than the maximum length. In other words, given a maximum genome length $l$, the size of the search space of EQAS-PQC is

$$\sum_{i=1}^{l-1} 3^{i}.$$

Search Process
Similar to many other genetic algorithms, EQAS-PQC iteratively generates a population of candidates (architectures) through genetic operations on the given parents, and selects parents for the next generation based on fitness evaluation. In this work, we adopt the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) [6] to optimize the search process, with the average collected rewards as the objective. NSGA-II has been successfully employed in various single- and multi-objective optimization problems including NAS [25].
The goal of EQAS-PQC is to discover diverse sequential combinations of quantum operators and optimize the process with respect to the objective. Towards this goal, we elaborate on the following components of the search process:

Crossover.
We use two-point crossover to recombine the parent architectures and generate new offspring. This method randomly chooses two crossover points and swaps the genes between them across the two parents, and has been widely used in search problems such as NAS. The intuition is that sequential architectures for learning models usually require different substructures for the beginning (input), middle (modeling), and ending (output) of the architecture. From this perspective, two-point crossover can hopefully separate the three parts of the model and improve the architecture through recombination.
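A minimal sketch of two-point crossover on integer genomes is given below. It assumes equal-length parents and uses Python's standard `random` module; the function name and the example genomes are illustrative and do not come from the paper's pymoo-based implementation.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points of two equal-length genomes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    n = len(parent_a)
    # Two distinct cut points with i < j, so the swapped segment is non-empty.
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


rng = random.Random(0)
a = [1, 2, 3, 1, 2, 3, 0, 0]
b = [2, 2, 1, 3, 3, 1, 1, 0]
child_a, child_b = two_point_crossover(a, b, rng)
```

Each child keeps the head and tail of one parent and inherits the middle segment of the other, which matches the beginning/middle/ending intuition described above.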

Mutation.
To enhance the diversity of architectures in the population, we also add polynomial mutation, which has been widely used in evolutionary optimization algorithms as a variation operator. Given the specific encoding, the mutation operates in integer space and allows the search to potentially reach any possible genome in the search space.
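The following sketch shows one simplified way to apply polynomial mutation to an integer genome: a perturbation is drawn from the polynomial distribution, scaled by the gene range, and the result is rounded and clipped back to a valid gene. The distribution index `eta`, the per-gene probability `prob`, and the round-and-clip repair are assumptions of this sketch; the paper does not state the settings it uses.

```python
import random


def polynomial_mutation(genome, low, high, eta, prob, rng):
    """Simplified polynomial mutation on integer genes in [low, high]."""
    mutated = []
    for gene in genome:
        if rng.random() < prob:
            u = rng.random()
            # Perturbation delta in [-1, 1), concentrated near 0 for large eta.
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            value = gene + delta * (high - low)
            # Repair: round to the nearest integer and clip to the gene range.
            gene = min(high, max(low, round(value)))
        mutated.append(gene)
    return mutated


rng = random.Random(1)
parent = [1, 2, 3, 0] * 3
child = polynomial_mutation(parent, low=0, high=3, eta=3.0, prob=0.5, rng=rng)
```

Since the perturbation can span the whole gene range, any gene value can in principle be reached from any other, although large jumps become rarer as `eta` grows.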

Duplicate elimination.
It is worth noting that, given the proposed encoding scheme, different genomes may be decoded to the same architecture, e.g., any operations after $x_0$ do not change the architecture. To maintain the diversity of the population, we additionally eliminate duplicate architectures that arise from different genomes.
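Duplicate elimination can be sketched by comparing genomes on their decoded prefix, i.e., the genes up to and including the first 0. The helper names below are hypothetical, and keeping the first occurrence of each architecture is an assumption of this sketch.

```python
def decoded_key(genome):
    """Return the part of the genome that affects the architecture."""
    key = []
    for gene in genome:
        key.append(gene)
        if gene == 0:  # genes after the measurement gene are ignored
            break
    return tuple(key)


def eliminate_duplicates(population):
    """Keep the first genome for every distinct decoded architecture."""
    seen = set()
    unique = []
    for genome in population:
        key = decoded_key(genome)
        if key not in seen:
            seen.add(key)
            unique.append(genome)
    return unique


population = [
    [1, 2, 3, 0, 1, 1],
    [1, 2, 3, 0, 2, 3],  # same architecture as the first genome
    [1, 3, 2, 0, 0, 0],
]
unique = eliminate_duplicates(population)
```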

Fitness evaluation.
For each generation, we decode the population to different architectures, and use the architectures to construct Softmax-PQC [17] policies for the RL agents. While EQAS-PQC can be easily extended to optimize for multiple objectives, in this work we use the learning performance of the RL agents as a single objective for fitness evaluation. The learning performance is computed as the average episode reward, which represents the area under the learning curve.

EXPERIMENTS
In this section, we describe the experimental setup and implementation details of EQAS-PQC on classical benchmark RL environments. We also show the empirical results of our method compared to prior work (Softmax-PQC by Jerbi et al. [17]) to demonstrate the advantage of EQAS-PQC over commonly used alternating-layer PQC architectures.

RL Environments
In this work, we consider two classical RL benchmark environments from the OpenAI Gym [4]: CartPole and MountainCar. Both environments have continuous state spaces and discrete action spaces, and have been widely used in RL research, including prior works on quantum RL [17,34]. The CartPole task is to prevent the pole from falling over by controlling the cart. For MountainCar, the goal is to drive up the mountain by driving back and forth to build up momentum. A more detailed description can be found in Brockman et al. [4]. The specifications are presented in Table 1, where the reward is the step reward and $\gamma$ is the discount factor for future rewards.

Implementation Details
Search algorithm.
For both environments, EQAS-PQC uses a population size of 20 and runs for 20 generations. The maximum length of an architecture is set to 30. We also reduce the total number of episodes to a factor of 0.8 during the search process to improve the efficiency of evolution. The main search framework is implemented using pymoo [3].

RL training during search.
For each generated PQC policy, we train the policy for a single trial and calculate the average episode reward as its learning performance. We set hyperparameters such as learning rates and observables following the general practice in Jerbi et al. [17]; these are also summarized in Table 1. All the agents are trained using REINFORCE [37], which is a basic Monte Carlo policy gradient algorithm. We additionally apply a value-function baseline [12] in MountainCar to stabilize the Monte Carlo process, which is common in recent RL methods [10]. The quantum circuits are implemented using Cirq [14] and the learning process is simulated using TensorFlow [1] and TensorFlow Quantum [5].
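REINFORCE weights each action's log-probability gradient by the discounted return that follows it. The sketch below shows only that return computation, together with the subtraction of a baseline; the function names and the example numbers are illustrative, and the baseline values stand in for the output of a value function that is not modeled here.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, baselines):
    """Subtract a per-step baseline (e.g., value estimates) from the returns."""
    return [g - b for g, b in zip(returns, baselines)]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
adv = advantages(returns, baselines=[1.0, 1.0, 1.0])
```

Subtracting a baseline leaves the expected policy gradient unchanged while reducing its variance, which is why it is used to stabilize training.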

Performance evaluation.
For the final results, we take the best-performing architecture for each environment and evaluate it for 10 trials (500 episodes for CartPole and 1000 episodes for MountainCar). To compare with prior work, we also evaluate the alternating-layer architecture (Softmax-PQC) as used in Jerbi et al. [17], which can be viewed as a special case in the search space of EQAS-PQC.
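To see why the alternating-layer architecture is a special case of the search space, it can be written as a genome under the encoding of EQAS-PQC. The sketch below assumes each layer is ordered as Variational PQC, Entanglement, Data-encoding PQC (genes 1, 3, 2), following the description of the alternating-layered design, and ends with the measurement gene 0; the function name is hypothetical.

```python
def alternating_layer_genome(depth):
    """Genome of an alternating-layer PQC with `depth` repeated layers.

    Each layer is assumed to be: variational (1), entanglement (3),
    data-encoding (2). The genome ends with measurement (0).
    """
    return [1, 3, 2] * depth + [0]


genome = alternating_layer_genome(6)
length = len(genome)
```

With a depth of 6 this gives 3 × 6 + 1 = 19 genes, which agrees with the architecture length of 19 reported for Softmax-PQC in the results.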

Results
We evaluate the general performance of the proposed EQAS-PQC and present the experimental results as follows. There are two goals for our experiments: 1) to show that our method is able to find PQC architectures with better learning performance at a computation cost similar to prior work; 2) to discover the performance-critical design choices of PQCs beyond the commonly used alternating-layer architecture. To illustrate these points, we first apply EQAS-PQC to two classical RL benchmark environments to obtain the best-performing architectures, and then analyze the resulting architectures.

Learning Performance.
We evaluate and visualize the average learning performance over 10 trials of the best-performing architecture found by EQAS-PQC and the one used in Softmax-PQC [17], as shown in Fig. 2. The corresponding genomes of the EQAS-PQC architectures for the two RL environments are: − 0 To ensure a fair comparison, for Softmax-PQC we use a depth of 6, resulting in an architecture of length 19, which is larger than the architectures found by EQAS-PQC for both environments. Thus, we can conclude that our method is able to find PQC architectures that significantly outperform the standard alternating-layer PQC.

Probability Distribution of Quantum Operations.
We also want to understand why the architectures found by EQAS-PQC achieve better performance. To this end, we calculate the probability distribution of all the encoded operations at each position in the architecture, and visualize it in Fig. 3. The probabilities are smoothed by a window of length 5 and the fitted lines are polynomials.
From the plot, we can first see that the Variational PQC has a similar frequency to Entanglement, which aligns with the design of the alternating-layer PQC. However, the frequency of the Data-encoding PQC shows a clear decreasing trend, indicating that it is better to place more Data-encoding PQCs at the beginning of the architecture. This finding is intuitive and consistent with general machine learning modeling practice, where data input usually occurs at the beginning of the model. Finally, the probability of Measurement does not increase until the end of the architecture, meaning that most of the optimal architectures have a similar length of around 20. This also shows an advantage of PQCs: shallow architectures with a small number of qubits are able to handle challenging RL problems, which has also been shown in prior work [17].
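The per-position statistics can be sketched as follows. The sketch makes two assumptions that the text does not spell out: positions beyond the end of a shorter architecture are simply not counted, and the smoothing is a centered moving average whose window shrinks at the edges. Function names and the toy genomes are illustrative.

```python
def position_distribution(genomes, num_ops=4):
    """Relative frequency of each operation at every position.

    `genomes` are decoded architectures given as integer lists; they may
    differ in length. Returns one probability list per position.
    """
    length = max(len(g) for g in genomes)
    distribution = []
    for pos in range(length):
        counts = [0] * num_ops
        for g in genomes:
            if pos < len(g):
                counts[g[pos]] += 1
        total = sum(counts)
        distribution.append([c / total for c in counts])
    return distribution


def moving_average(values, window=5):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


toy = [[1, 2, 3, 0], [1, 3, 2, 0], [2, 1, 3, 0]]
dist = position_distribution(toy)
# Probability of the data-encoding operation (gene 2) along the positions.
data_encoding_curve = moving_average([p[2] for p in dist], window=5)
```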

CONCLUSION AND FUTURE WORK
In this work, we propose EQAS-PQC, an evolutionary quantum architecture search framework for parameterized quantum circuits. EQAS-PQC uses a population-based genetic algorithm to evolve PQC architectures by exploring the search space of quantum operations. Experimental results show that our method can significantly improve the performance of hybrid quantum-classical systems in solving benchmark reinforcement learning problems. In addition, we analyze the probability distributions of quantum operations in top-performing architectures, and identify design choices that are essential to the performance of PQC-based architectures.
One limitation of our work is that the experiments are conducted using a simulation backend for quantum circuits. For future work, we expect to extend our work to real quantum computers and to add more objectives to the evolutionary search, such as quantum noise and hardware efficiency.
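A unitary operator acting on qubits is called a quantum gate. Some common quantum gates are frequently used in this work, namely the single-qubit Pauli gates Pauli-$X$, Pauli-$Y$, Pauli-$Z$ and their associated rotation operators $R_x$, $R_y$, $R_z$. The matrix representations of Pauli gates are

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$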

Figure 2: Learning performance of EQAS-PQC on benchmark RL environments. We plot the average learning curves (smoothed by a temporal window of 10 episodes) over 10 randomly-initialized EQAS-PQC agents and Softmax-PQC agents in two benchmark RL environments (CartPole-v1 and MountainCar-v0) from OpenAI Gym. The shaded areas represent the standard deviation of the average collected reward.

Figure 3: Probability distribution of quantum operations in top-performing PQC architectures. We select 20 top-performing architectures found by EQAS-PQC (10 for each RL environment), and calculate the probability distributions of operations at each position in the architecture.