From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz

The next few years will be exciting as prototype universal quantum processors emerge, enabling implementation of a wider variety of algorithms. Of particular interest are quantum heuristics, which require experimentation on quantum hardware for their evaluation, and which have the potential to significantly expand the breadth of quantum computing applications. A leading candidate is Farhi et al.'s Quantum Approximate Optimization Algorithm, which alternates between applying a cost-function-based Hamiltonian and a mixing Hamiltonian. Here, we extend this framework to allow alternation between more general families of operators. The essence of this extension, the Quantum Alternating Operator Ansatz, is the consideration of general parametrized families of unitaries rather than only those corresponding to the time-evolution under a fixed local Hamiltonian for a time specified by the parameter. This ansatz supports the representation of a larger, and potentially more useful, set of states than the original formulation, with potential long-term impact on a broad array of application areas. For cases that call for mixing only within a desired subspace, refocusing on unitaries rather than Hamiltonians enables more efficiently implementable mixers than was possible in the original framework. Such mixers are particularly useful for optimization problems with hard constraints that must always be satisfied, defining a feasible subspace, and soft constraints whose violation we wish to minimize. More efficient implementation enables earlier experimental exploration of an alternating operator approach to a wide variety of approximate optimization, exact optimization, and sampling problems. Here, we introduce the Quantum Alternating Operator Ansatz, lay out design criteria for mixing operators, detail mappings for eight problems, and provide brief descriptions of mappings for diverse problems.


Introduction
Over the last few decades, researchers have discovered several stunning instances of quantum algorithms that provably outperform the best existing classical algorithms and, in some cases, the best possible classical algorithm [64]. For most problems, however, it is currently unknown whether quantum computing can provide an advantage, and if so, how to design quantum algorithms that realize such advantages. Today, challenging computational problems arising in the practical world are frequently tackled by heuristic algorithms, which by definition have not been analytically proven to be the best approach, or even proven analytically to outperform the best approach of the previous year. Rather, these algorithms are empirically shown to be effective, by running them on characteristic sets of problems. As prototype quantum hardware emerges, we can begin to apply this heuristic approach to quantum heuristic algorithms.
For several years now, special-purpose quantum hardware has been used to explore one quantum heuristic algorithm, quantum annealing. Emerging gate-model processors will enable investigation of a much broader array of quantum heuristics beyond quantum annealing. Within the last year, IBM has made a gate-model chip with five superconducting qubits publicly available through the cloud [40], and recently announced an upgrade to a 16-qubit chip. Likewise, Google [10] and Rigetti Computing [68] anticipate providing processors with 40-100 superconducting qubits within a year or two [55]. Many academic groups, including at TU Delft and at UC Berkeley, have similar efforts. Besides superconducting architectures, ion [17] and neutral-atom-based [66] devices are also reaching the scale at which intermediate-size experiments would be feasible [56]. Gate-model quantum computing expands the empirical evaluation of quantum heuristics to applications beyond optimization of classical functions, as well as enabling a broader array of approaches to optimization [77].
While limited exploration of quantum heuristics beyond quantum annealing has been possible through small-scale classical simulation, the exponential overhead in such simulations has limited their usefulness. The next decade will see a blossoming of quantum heuristics as a broader and more flexible array of quantum computational hardware becomes available. The immediate question is: what experiments should we prioritize that will give us insight into quantum heuristics? One leading candidate is the Quantum Approximate Optimization Algorithm (QAOA), for which a number of tantalizing related results have been obtained [21,23,41,71,73,74,75] since Farhi et al.'s initial paper [22]. In QAOA, a phase-separation operator, usually the problem Hamiltonian that encodes the cost function of the optimization problem, and a mixing Hamiltonian are applied in alternation. The class QAOA$_p$ consists of level-$p$ QAOA circuits, in which there are $p$ iterations of applying a classical Hamiltonian (derived from the cost function) and a mixing Hamiltonian. The $2p$ parameters of the algorithm specify the durations for which each of these two Hamiltonians is applied.
Prior work suggests the power and flexibility of QAOA circuits. Farhi et al. [23] exhibited a QAOA$_1$ algorithm that beat the existing best approximation bound for efficient classical algorithms for the problem E3Lin2, only to inspire a better classical algorithm [5]. Jiang et al. [41] demonstrated that the class of QAOA circuits is powerful enough to obtain the $\Theta(\sqrt{2^n})$ query complexity on Grover's problem, and also provided the first algorithm within the QAOA framework to show a quantum advantage for a finite number of iterations greater than two. Farhi and Harrow [21] proved that, under reasonable complexity assumptions, the output distribution of even QAOA$_1$ circuits cannot be efficiently sampled classically. Yang et al. [75] proved that for evolution under a Hamiltonian that is the weighted sum of Hamiltonian terms, with the weights allowed to vary in time, the optimal control is (essentially always) bang-bang, i.e. of constant magnitude, at either the maximum or the minimum allowed weight, for each of the terms in the Hamiltonian at any given time. Their work implies that QAOA circuits with the right parameters are optimal among Hamiltonians of the form $H(s) = (1 - f(s)) H_B + f(s) H_C$, where $f(s)$ is a real function in the range $[0, 1]$. It remains an open question whether QAOA provides a quantum advantage over classical algorithms for approximate optimization.
QAOA circuits were first proposed by Farhi et al. [22] as the basis for a quantum approximate optimization algorithm (QAOA). Since Farhi et al.'s original work, QAOA circuits have also been applied for both exact optimization [41,74] and sampling [21]. Here, we formally describe a Quantum Alternating Operator Ansatz (QAOA), extending the approach of Farhi et al. [22] to allow alternation between more general families of operators. The essence of this extension is the consideration of general parameterized families of unitaries rather than only those corresponding to the time-evolution of a fixed local Hamiltonian for a time specified by the parameter. This ansatz supports the representation of a larger, and potentially more useful, set of states than the original formulation. For cases that call for mixing only within a desired subspace, refocusing on unitaries rather than Hamiltonians enables more efficiently implementable mixers than was possible in the original framework. Such mixers are particularly useful for optimization problems with hard constraints that must always be satisfied, defining a feasible subspace, and soft constraints whose violation we wish to minimize. More efficient implementation enables earlier experimental exploration of an alternating operator approach, in the spirit of the Quantum Approximate Optimization Algorithm, to a wide variety of approximate optimization, exact optimization, and sampling problems.
We carefully construct a framework for this ansatz, laying out design criteria for families of mixing operators. We then detail QAOA mappings of several optimization problems, and provide a compendium of mappings for a diverse array of problems. These mappings range from the relatively simple, which could be implemented on near-term devices, to complex mappings with significant resource requirements. This paper is meant as a starting point for a research program. Improved mappings and compilations, especially for some of the more complex problems, are a promising area for future work. Architectural codesign could be used to enable experimentation with QAOA approaches to some problems earlier than would be possible otherwise.
We reworked the original acronym so that "QAOA" continues to apply to both prior work and future work to be done in this more general framework. More generally, the reworked acronym refers to a set of states representable in a certain form, and so can be used without confusion in contexts other than approximate optimization, e.g. exact optimization and sampling. (Incidentally, this reworking also removes the redundancy from the now commonly used phrase "QAOA algorithm".) Here, after describing the framework for the ansatz, we map a number of problems, designing phase-separation and mixing operators appropriate for each problem. We comment here on the relation between these mappings and those for quantum annealing (QA). Because current quantum annealers have a fixed driver (the mixing Hamiltonian in the QA setting), all problem dependence must be captured in the cost Hamiltonian on such devices. The general strategy is to incorporate the hard constraints as penalty terms in the cost function, and then convert the cost function to a cost Hamiltonian [9,52,65]. But this approach means that the algorithm must search a much larger space than were the evolution confined to feasible configurations, making the search less efficient than were it possible to constrain the evolution. This issue, and other drawbacks, led Hen & Spedalieri [39] and Hen & Sarandy [38] to suggest a different approach for adiabatic quantum optimization (AQO) in which the standard driver is replaced by an alternative driver that confines the evolution to the feasible subspace. Their approach resembles a restricted class, H-QAOA (defined below), of QAOA algorithms. Our H-QAOA mappings of graph coloring, graph partitioning, and not-all-equal 3SAT are close to those in [38,39]. 
QAOA mappings differ from quantum annealing mappings in that most of the design effort goes into the mixing operator rather than the cost-function-based phase separator. Nevertheless, QAOA algorithms, like QA and AQO but unlike most other quantum algorithms, are relatively easy to design for people familiar with classical computer science but not with quantum computing, as we illustrate in this paper.
In Sec. 2, we carefully construct a framework for this ansatz, laying out design criteria for families of mixing operators. Sec. 3 and Sec. 4 detail QAOA mappings and compilations for several optimization problems, illustrating design techniques and a variety of mixers. Sec. 3 considers four problems in which the configuration space of interest is strings: MaxIndependentSet, Max-k-ColorableSubgraph, Max-k-ColorableInducedSubgraph, and MinGraphColoring. Sec. 4 considers four problems in which the configuration space of interest is orderings: MinTravelingSalesperson, and three versions of single machine scheduling (SMS), also called job sequencing. App. A provides a compendium of mappings and compilations for a diverse array of problems, together with resource estimates for their implementation. Sec. 5 concludes with a discussion of many open questions and directions for future work.

The Quantum Alternating Operator Ansatz (QAOA)
Here, we formally describe the Quantum Alternating Operator Ansatz (QAOA), extending the approach of Farhi et al. [22]. QAOA, in our sense, encompasses a more general class of quantum states that may be algorithmically accessible and useful. We focus here on the application of QAOA to approximate optimization, though it may also be used in exact optimization [41,74] and sampling [21].
An instance of an optimization problem is a pair $(F, f)$, where $F$ is the domain (set of feasible points) and $f : F \to \mathbb{R}$ is the objective function to be optimized (minimized or maximized). Let $\mathcal{F}$ be the Hilbert space of dimension $|F|$, whose standard basis we take to be $\{|x\rangle : x \in F\}$. Generalizing Ref. [22], a QAOA circuit is characterized by two parameterized families of operators on $\mathcal{F}$:
• a family of phase-separation operators $U_P(\gamma)$ that depends on the objective function $f$, and
• a family of mixing operators $U_M(\beta)$ that depends on the domain and its structure,
where $\beta$ and $\gamma$ are real parameters. Specifically, a QAOA$_p$ circuit consists of $p$ alternating applications of operators from these two families:
$$Q_p(\boldsymbol{\beta}, \boldsymbol{\gamma}) = U_M(\beta_p) U_P(\gamma_p) \cdots U_M(\beta_1) U_P(\gamma_1).$$
This Quantum Alternating Operator Ansatz (QAOA) consists of the states representable as the application of such a circuit to a suitably simple initial state $|s\rangle$:
$$|\boldsymbol{\beta}, \boldsymbol{\gamma}\rangle = Q_p(\boldsymbol{\beta}, \boldsymbol{\gamma}) |s\rangle.$$
For a given optimization problem, a QAOA mapping consists of a family of phase-separation operators, a family of mixing operators, and a starting state. The circuits for the quantum approximate optimization algorithm fit within this paradigm, with unitaries of the form $e^{-i\gamma H_P}$ and $e^{-i\beta H_M}$, with parameters $\gamma$ and $\beta$ indicating the time for which a fixed Hamiltonian is applied. The domain usually will be expressed as the feasible subset of a larger configuration space, specified by a set of problem constraints. For implementation on expected near-term quantum hardware, each configuration space will need to be encoded into a subspace of a Hilbert space of a multiqubit system, with the domain corresponding to a feasible subspace of the configuration space. For each domain, there are many possible mixing operators. As we will see, using more general one-parameter families of unitaries enables more efficiently implementable mixers that preserve the feasible subspace.
Given a domain, an encoding of its configuration space, a phase separator, and a mixer, there are a variety of compilations of the phase separator and mixer to circuits that act on qubits.
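As an illustration of the alternating structure just described, the following is a minimal NumPy sketch that applies $U_M(\beta_p) U_P(\gamma_p) \cdots U_M(\beta_1) U_P(\gamma_1)$ to an initial state by brute-force matrix arithmetic. The helper names `evolve` and `qaoa_state`, the toy instance (MaxCut on a single edge), and the parameter values are illustrative assumptions, not taken from the text; the sketch uses the Hamiltonian-based special case with the standard mixer $\sum_j X_j$.

```python
import numpy as np

def evolve(H, t):
    """Return exp(-i t H) for a Hermitian matrix H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def qaoa_state(f_values, H_M, betas, gammas, s):
    """Apply U_M(beta_p) U_P(gamma_p) ... U_M(beta_1) U_P(gamma_1) to |s>."""
    psi = s.astype(complex)
    for beta, gamma in zip(betas, gammas):
        psi = np.exp(-1j * gamma * f_values) * psi  # phase separator (diagonal)
        psi = evolve(H_M, beta) @ psi               # mixer
    return psi

# Toy instance (assumption): MaxCut on one edge, f(x) = 1 if the two bits differ.
f_values = np.array([0.0, 1.0, 1.0, 0.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
H_B = np.kron(X, I2) + np.kron(I2, X)   # standard mixer sum_j X_j
s = np.full(4, 0.5)                     # |+ +>

psi = qaoa_state(f_values, H_B, betas=[np.pi / 8], gammas=[np.pi / 2], s=s)
expected_f = float(np.real(np.vdot(psi, f_values * psi)))
```

For this two-qubit toy instance, the level-1 parameters chosen above drive the expected objective value to its maximum of 1; larger instances would of course require a parameter search.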
For any function $f$, not just an objective function, we define $H_f$ to be the quantum Hamiltonian that acts as $f$ on the basis states:
$$H_f |x\rangle = f(x) |x\rangle.$$
In prior work, the domain $F$ is the set of all $n$-bit strings, $U_P(\gamma) = e^{-i\gamma H_f}$, and $U_M(\beta) = e^{-i\beta H_B}$. Furthermore, with just one exception, the mixing Hamiltonian was $H_B = \sum_{j=1}^{n} X_j$. We use the notation $X_j$, $Y_j$, $Z_j$ to indicate the Pauli matrices $X$, $Y$, $Z$ acting on the $j$th qubit. The corresponding parameterized unitaries are denoted by $X_j(\theta) = e^{-i\theta X_j}$, and similarly for $Y_j$ and $Z_j$. The one exception is Sec. VIII of [21], which discusses a variant for the problem of maximum independent set, in which $F$ is the set of bitstrings corresponding to independent sets, the phase separator depends on the cost function as above, and the mixing operator is $U_M(\beta) = e^{-i\beta H_M}$, where
$$\langle x | H_M | y \rangle = \begin{cases} 1, & x, y \in F \text{ and } \mathrm{Ham}(x, y) = 1, \\ 0, & \text{otherwise}, \end{cases}$$
which connects feasible states with unit Hamming distance Ham. Sec. VIII of [21] does not discuss the implementability of $U_M(\beta)$. We extend and formalize the approach of Sec. VIII of [21] with an eye to implementability, both in the short and long terms. We also build on a theory developed for adiabatic quantum optimization (AQO) by Hen and Spedalieri [39] and Hen and Sarandy [38], though the gate-model setting of QAOA leads to different implementation considerations than those for AQO. For example, Hen et al. identified Hamiltonians of the form
$$H_M = \sum_{j,k} H_{j,k}, \qquad H_{j,k} = X_j X_k + Y_j Y_k,$$
as useful in the AQO setting for a variety of optimization problems with hard and soft constraints; such mixers restrict the mixing to the feasible subspace defined by the hard constraints. Analogously, the unitary $U_M = e^{-i\beta H_M}$ meets our criteria, discussed in Sec. 2.1, for good mixing for a variety of optimization problems including those considered in [38,39]. Since $H_{j,k}$ and $H_{i,l}$ do not commute when $|\{j, k\} \cap \{i, l\}| = 1$, compiling $U_M$ to two-qubit gates is non-trivial.
One could Trotterize, but it may be more efficient and just as effective to use an alternative mixing operator such as $U_M = e^{-i\beta H_{S_r}} \cdots e^{-i\beta H_{S_1}}$, where the pairs of qubits have been partitioned into $r$ subsets $\{S_i\}_i$ containing only disjoint pairs and $H_{S_i}$ collects the terms for the pairs in $S_i$, motivating in part our more general ansatz.
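The commutation structure behind this choice can be checked numerically. The sketch below assumes two-qubit terms of the XY form $X_j X_k + Y_j Y_k$ (the normalization is an assumption) and uses hypothetical helper names `two_site` and `H_xy`; it compares a pair of terms that share one qubit with a pair acting on disjoint qubits.

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
I2 = np.eye(2, dtype=complex)

def two_site(A, B, j, k, n):
    """Tensor product with A on qubit j, B on qubit k, identity elsewhere."""
    ops = [I2] * n
    ops[j], ops[k] = A, B
    return reduce(np.kron, ops)

def H_xy(j, k, n):
    """Assumed XY-type term X_j X_k + Y_j Y_k on n qubits."""
    return two_site(X, X, j, k, n) + two_site(Y, Y, j, k, n)

def commutator_norm(A, B):
    return np.linalg.norm(A @ B - B @ A)

n = 4
overlapping = commutator_norm(H_xy(0, 1, n), H_xy(1, 2, n))  # share qubit 1
disjoint = commutator_norm(H_xy(0, 1, n), H_xy(2, 3, n))     # no shared qubit
```

Terms within one subset $S_i$ therefore commute and can be applied in parallel as two-qubit gates, whereas terms from different subsets generally cannot be merged into a single exponential without error.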
We define as "Hamiltonian-based QAOA" (H-QAOA) the class of QAOA circuits in which both the phase separator family $U_P(\gamma) = e^{-i\gamma H_P}$ and the mixing operator family $U_M(\beta) = e^{-i\beta H_M}$ correspond to time-evolution under some Hamiltonians $H_P$ and $H_M$, respectively. (In this work, we consider only phase separators $U_P(\gamma) = \sum_x e^{-i\gamma g(x)} |x\rangle\langle x|$ that correspond to classical functions and thus also correspond to time-evolution under some (potentially non-local) Hamiltonian, though more general types of phase separators may be considered.) We further define "Local-Hamiltonian-based QAOA" (LH-QAOA) as the subclass of H-QAOA in which the Hamiltonian $H_M$ is a sum of (polynomially many) local terms.
Before discussing design criteria, we briefly mention that there are obvious generalizations in which $U_P$ and $U_M$ are taken from families parameterized by more than a single parameter. For example, in [24], a different parameter for every term in the Hamiltonian is considered. In this paper, we will only consider the case of one-dimensional families, given that it is a sufficiently rich area of study, with the task of finding good parameters $\gamma_1, \ldots, \gamma_p$ and $\beta_1, \ldots, \beta_p$ already challenging enough due to the curse of dimensionality [73]. A larger parameter space may support more effective circuits, but increases the difficulty of finding such circuits by opening up the design space and making the parameter setting more difficult.

Design Criteria
Here, we briefly specify design criteria for the three components of a QAOA mapping of a problem. We expect that as exploration of QAOA proceeds, these design criteria will be strengthened, and will depend on the context in which the ansatz is used. For example, when the aim is a polynomial-time quantum circuit, the components should have more stringent bounds on their complexity; without such bounds, the ansatz is not useful as a model for a strict subset of states producible via polynomially-sized quantum circuits. On the other hand, when the computation is expected to grow exponentially, a simple polynomial bound on the depth of these operators might be reasonable. One example might be exact optimization of the problems considered here; for these problems the worst-case algorithmic complexity is exponential, but it is worth exploring whether QAOA might outperform classical heuristics in expanding the tractable range for some problems.
Initial State. We require that the initial state $|s\rangle$ be trivial to implement, by which we mean that it can be created by a constant-depth (in the size of the problem) quantum circuit from the $|0 \ldots 0\rangle$ state. Here, we often take as our initial state a single feasible solution, usually implementable by a depth-1 circuit consisting of single-qubit bit-flip operations $X$. Because in such a case the initial phase operator only applies a global phase, we may want to consider the algorithm as starting with the application of a single mixing operator $U_M(\beta_0)$ to the initial state as a first step. In the quantum approximate optimization algorithm, the standard starting state $|+ \cdots +\rangle$ is obtained by a depth-1 circuit applying a Hadamard gate $H$ to each of the qubits in the $|0 \ldots 0\rangle$ state. This criterion could be relaxed to logarithmic depth if needed. It should not be relaxed too much: relaxing the criterion to polynomial depth would obviate the usefulness of the ansatz as a model for a strict subset of states producible via polynomially-sized quantum circuits. Algorithms with more complex initial states should be considered hybrid algorithms, with an initialization part and a QAOA part. Such algorithms are of interest in cases when one expects the computation to grow exponentially, as is the case for exact optimization for many of the problems here, but might still outperform classical heuristics in expanding the tractable range.
Phase-separation unitaries. We require the family of phase-separation operators to be diagonal in the computational basis. In almost all cases, we take $U_P(\gamma) = e^{-i\gamma H_f}$.

Mixing unitaries (or "mixers").
We require the family of mixing operators $U_M(\beta)$ to
• preserve the feasible subspace: for all values of the parameter $\beta$ the resulting unitary takes feasible states to feasible states, and
• provide transitions between all pairs of states corresponding to feasible points. More concretely, for any pair of feasible computational-basis states $x, y \in F$, there is some parameter value $\beta^*$ and some positive integer $r$ such that the corresponding mixer connects those two states: $|\langle x | U_M^r(\beta^*) | y \rangle| > 0$.
In some cases we may want to relax some of these criteria. For example, if a QAOA circuit is being used as a subroutine within a hybrid quantum-classical algorithm, or in a broader quantum algorithm, we may use starting states informed by previous runs and thus allow mixing operators that mix less.
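Both mixer criteria can be checked by brute force on small instances. The sketch below is an illustrative assumption throughout: it uses the independent-set mixing Hamiltonian that connects feasible bitstrings at Hamming distance one, a three-vertex path graph, and an arbitrary parameter value $\beta = 0.7$ with $r = 1$. It measures leakage out of the feasible subspace and the smallest transition amplitude between feasible states.

```python
import numpy as np
from itertools import product

def evolve(H, t):
    """Return exp(-i t H) for a Hermitian matrix H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

edges = [(0, 1), (1, 2)]          # path graph on 3 vertices (toy instance)
n = 3
states = list(product([0, 1], repeat=n))
feasible = np.array([all(not (x[u] and x[v]) for u, v in edges) for x in states])

# <x|H_M|y> = 1 if x and y are both independent sets at Hamming distance 1.
dim = len(states)
H_M = np.zeros((dim, dim))
for a, x in enumerate(states):
    for b, y in enumerate(states):
        hamming = sum(xi != yi for xi, yi in zip(x, y))
        if feasible[a] and feasible[b] and hamming == 1:
            H_M[a, b] = 1.0

U = evolve(H_M, 0.7)

# Criterion 1: no amplitude flows from feasible to infeasible states.
leakage = np.abs(U[np.ix_(~feasible, feasible)]).max()
# Criterion 2 (with r = 1 here): every feasible pair is connected.
min_transition = np.abs(U[np.ix_(feasible, feasible)]).min()
```

A check of this kind only certifies a particular instance and parameter value; it is a sanity test for a proposed mixer rather than a proof that the criteria hold in general.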

QAOA Mappings: Strings
This section describes mappings to QAOA for four problems in which the underlying configuration space is strings with letters taken from some alphabet. We introduce some basic families of mixers, and discuss compilations thereof, illustrating their use with MaxColorableSubgraph as an example. We then build on these basic mixers to design families of more complicated mixers, such as controlled versions of these mixers, and illustrate their use in mappings and circuits for the problems MaxIndependentSet, MaxColorableInducedSubgraph, and MinGraphColoring. The mixers we develop in this section, and close variants, are applicable to a wide variety of problems, as we will see in App. A.

Example: Max-κ-ColorableSubgraph
Problem. Given a graph G = (V, E) with n vertices and m edges, and κ colors, maximize the size (number of edges) of a properly vertex-colored subgraph.
The domain $F$ is the set of colorings $x$ of $G$, where a coloring is an assignment of a color to each vertex. (Note that here and throughout, the term "colorings" includes improper colorings.) The domain $F$ can be represented as the set of length-$n$ strings over an alphabet of $\kappa$ characters, $F = [\kappa]^n$. The objective function $f : [\kappa]^n \to \mathbb{N}$ counts the number of properly colored edges in a coloring:
$$f(x) = |\{\{u, v\} \in E : x_u \neq x_v\}|.$$
We now build up machinery to define mixing operators for a QAOA approach to this problem. Since some mixing operators are more naturally expressed in one encoding rather than another, we find it useful to describe different mixing operators in different encodings, though we emphasize that doing so is merely for convenience; all mixing operators are encoding independent, so the descriptions may be translated from one encoding to another. The domain $F$ is naturally expressed as strings of $d$-dits, a $d$-valued generalization of bits.
For the present problem, $d = \kappa$. In addition to discussing colorings as strings of dits, we will use a "one-hot" encoding into $n\kappa$ bits, with $x_{i,c}$ indicating whether or not vertex $i$ is assigned color $c$.
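For concreteness, the objective function and the one-hot encoding described above can be sketched classically as follows. The function names `properly_colored_edges` and `one_hot`, the zero-based color labels, and the triangle graph are illustrative assumptions.

```python
def properly_colored_edges(coloring, edges):
    """Objective f: number of edges whose endpoints receive different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])

def one_hot(coloring, kappa):
    """Encode a coloring as n*kappa bits; bit (i, c) is 1 iff vertex i has color c."""
    bits = []
    for color in coloring:
        bits.extend(1 if color == c else 0 for c in range(kappa))
    return bits

# Toy instance (assumption): a triangle with kappa = 2 colors.
triangle = [(0, 1), (1, 2), (0, 2)]
value = properly_colored_edges([0, 1, 1], triangle)   # edge (1, 2) is improper
encoded = one_hot([0, 2], kappa=3)
```

Note that the one-hot image of a valid coloring always has exactly one set bit per vertex, which is the feasibility condition that the mixers in this section are designed to preserve.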

Single Qudit Mixing Operators
We focus initially on designing partial mixers, component operators that will be used to construct full mixing operators. For this mapping of the MaxColorableSubgraph problem, the partial mixers are operators acting on a single qudit with dimension $d = \kappa$ that mix the colors associated with one vertex $v$. As we will see in subsequent sections, this is a particularly simple case of partial mixers, which are often more complicated multiqubit controlled operators. Once we have defined these single-qudit partial mixing operators, we put them together to create a full mixer for the problem. We begin by considering the following family of single-qudit mixing operators expressed in terms of qudits and qudit operators, and then consider encodings and compilations to qubit-based architectures, which will inspire us to consider other families of single-qudit mixing operators. See App. C.2 and [6,29] for a review of qudit operators, including the generalized Pauli operators $\tilde{X}$ and $\tilde{Z}$.
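$r$-nearby-values single-qudit mixer. Let $H_{r\text{-NV}} = \sum_{i=1}^{r} \left( \tilde{X}^i + (\tilde{X}^\dagger)^i \right)$, which acts on a single qudit, with $\tilde{X} = \sum_{a=1}^{d} |a + 1\rangle\langle a|$ (labels taken modulo $d$). We identify two special cases by name: the "single-qudit ring mixer" for $r = 1$, $H_{\text{ring}} = H_{1\text{-NV}}$; and the "fully-connected" mixer for $r = d - 1$, $H_{\text{FC}} = H_{(d-1)\text{-NV}}$. Whenever we introduce a Hamiltonian, we also implicitly introduce its corresponding family of unitaries, as with $H_{r\text{-NV}}$ and $U_{r\text{-NV}}(\beta)$.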
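The two named special cases can be made concrete with a small matrix construction. The sketch below assumes the form $H_{r\text{-NV}} = \sum_{i=1}^{r} (\tilde{X}^i + (\tilde{X}^\dagger)^i)$ with $\tilde{X}$ the cyclic shift $|a\rangle \mapsto |a+1 \bmod d\rangle$; the function names and zero-based labels are illustrative.

```python
import numpy as np

def shift(d):
    """Generalized Pauli X on a qudit: sum_a |a+1 mod d><a|."""
    return np.roll(np.eye(d), 1, axis=0)

def H_rNV(d, r):
    """Assumed r-nearby-values Hamiltonian: sum_{i=1..r} (X^i + (X^dagger)^i)."""
    Xt = shift(d)
    return sum(np.linalg.matrix_power(Xt, i) + np.linalg.matrix_power(Xt.T, i)
               for i in range(1, r + 1))

d = 5
H_ring = H_rNV(d, 1)        # couples each value only to its two ring neighbours
H_FC = H_rNV(d, d - 1)      # couples every pair of distinct values

# Adjacency matrix of the d-cycle, for comparison with H_ring.
cycle = np.zeros((d, d))
for a in range(d):
    cycle[a, (a + 1) % d] = cycle[(a + 1) % d, a] = 1.0
```

Under this form every off-diagonal entry of the fully-connected mixer equals 2, because each pair of values is reached once by a forward shift and once by a backward shift; the constant can be absorbed into $\beta$.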
The single-qudit ring mixer will be our bread and butter. We concentrate on qubit encodings thereof, given the various projections of hardware with at least 40 qubits that will be available in the next year or two [55,68], though it could alternatively be implemented directly using a qudit-based architecture. We explore two natural encodings of a qudit into qubits: (1) the one-hot encoding using $d$ qubits, in which the qudit state $|a\rangle$ is encoded as $|0\rangle^{\otimes a} \otimes |1\rangle \otimes |0\rangle^{\otimes (d-1-a)}$; and (2) the binary encoding using $\lceil \log_2 d \rceil$ qubits, in which the qudit state $|a\rangle$ is encoded as the qubit basis state corresponding to the binary representation of the integer $a$. The one-hot encoding uses more qubits but supports simpler compilation of the single-qudit ring mixer for general $d$, both in the sense of it being much easier to write down a compilation, and in the sense that it uses many fewer 2-qubit gates; in the one-hot encoding, 2-qubit mixing operators suffice, whereas the binary encoding requires $\lceil \log_2 d \rceil$-local Hamiltonian terms, with the corresponding unitary requiring further compilation to 2-qubit gates.
In the one-hot encoding, the single-qudit ring mixer is encoded as a qubit unitary $U^{(\text{enc})}_{\text{ring}}$ corresponding to the qubit Hamiltonian
$$H^{(\text{enc})}_{\text{ring}} = \sum_{a=1}^{d} \left( X_a X_{a+1} + Y_a Y_{a+1} \right)$$
(qubit indices taken modulo $d$), which acts on the whole Hilbert space in a way that preserves the Hamming weight of computational basis states, and acts as $U_{\text{ring}}$ on the encoded subspace spanned by unit Hamming weight computational basis states.
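The two stated properties of the encoded ring mixer can be verified numerically for small $d$. The sketch below assumes the Hamiltonian $\sum_a (X_a X_{a+1} + Y_a Y_{a+1})$ with indices modulo $d$, zero-based qubit labels, and a hypothetical helper `on_qubits`. Under this normalization the restriction to the one-hot subspace comes out as twice the ring adjacency matrix, a factor that can be absorbed into $\beta$.

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
I2 = np.eye(2, dtype=complex)

def on_qubits(ops_by_site, n):
    """Tensor product placing the given single-qubit operators; identity elsewhere."""
    return reduce(np.kron, [ops_by_site.get(q, I2) for q in range(n)])

d = 4
H_enc = sum(on_qubits({a: X, (a + 1) % d: X}, d) + on_qubits({a: Y, (a + 1) % d: Y}, d)
            for a in range(d))

# Hamming-weight operator: sum_a (1 - Z_a) / 2.
number_op = sum((np.eye(2 ** d) - on_qubits({a: Z}, d)) / 2 for a in range(d))
weight_commutator = np.linalg.norm(H_enc @ number_op - number_op @ H_enc)

# Restrict to the one-hot states; qubit 0 is the most significant bit of the index.
one_hot_index = [2 ** (d - 1 - a) for a in range(d)]
block = H_enc[np.ix_(one_hot_index, one_hot_index)].real
ring = np.roll(np.eye(d), 1, axis=0) + np.roll(np.eye(d), -1, axis=0)
```

Because the Hamiltonian commutes with the Hamming-weight operator, so does every unitary it generates, which is exactly the feasibility-preservation property needed for a single vertex.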
Although $H^{(\text{enc})}_{\text{ring}}$ is 2-local, its terms do not mutually commute. There are several implementation options. First, hardware that natively implements the multiqubit gate $U^{(\text{enc})}_{\text{ring}}$ directly may be plausible (much as quantum annealers already support the simultaneous application of a Hamiltonian to a large set of qubits), but most proposals for universal quantum processors are based on two-qubit gates, so a compilation to such gates is desirable both for applicability to hardware likely to be available in the near term and for error correction and fault tolerance in the longer term. Second, we could use constructions used in quantum many-body physics to compile $U^{(\text{enc})}_{\text{ring}}$ into a circuit of 2-local gates [72]. Third, the multiqubit gate $U^{(\text{enc})}_{\text{ring}}$ could be implemented approximately via Trotterization or other Hamiltonian simulation algorithms. A different approach, and the alternative we explore most extensively here, is to implement a different unitary rather than $U_{r\text{-NV}}$, one that is related to $U_{r\text{-NV}}$ and shares its desirable mixing properties, as encapsulated in the design criteria of Sec. 2.1, but is easier to implement. The form of the circuit obtained by Trotterization is suggestive. We consider sequentially applying unitaries corresponding to sets of terms in the Hamiltonian, each set chosen in such a way that the unitary is readily implementable. This reasoning mirrors the relation of H-QAOA circuits to Trotterized AQO. We give a few examples of mixers obtained in this way.
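Parity single-qudit ring mixer. Still in the one-hot encoding, we partition the $d$ terms $e^{-i\beta(X_a X_{a+1} + Y_a Y_{a+1})}$ by parity. Let
$$U_{\text{parity}}(\beta) = U_{\text{last}}(\beta) U_{\text{even}}(\beta) U_{\text{odd}}(\beta),$$
where
$$U_{\text{odd}}(\beta) = \prod_{a \text{ odd},\, a \neq d} e^{-i\beta(X_a X_{a+1} + Y_a Y_{a+1})}, \qquad U_{\text{even}}(\beta) = \prod_{a \text{ even}} e^{-i\beta(X_a X_{a+1} + Y_a Y_{a+1})},$$
and $U_{\text{last}}(\beta) = e^{-i\beta(X_d X_1 + Y_d Y_1)}$ if $d$ is odd and $U_{\text{last}}(\beta) = I$ otherwise. Such "XY" gates are natively implemented on certain superconducting processors [15]. It is easy to see that the parity single-qudit ring mixer meets the first criterion of keeping the evolution within the feasible subspace of a graph consisting of a single vertex. To see that it also meets the second, providing transitions between all feasible computational basis states, it is useful to consider the quantum swap gate, which behaves in exactly the same way as the XY gate on the subspace spanned by $\{|0, 1\rangle, |1, 0\rangle\}$. The swap gate is both unitary and Hermitian, and thus $e^{i\theta\,\mathrm{SWAP}_{ij}} = \cos(\theta) I + i \sin(\theta)\,\mathrm{SWAP}_{ij}$.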
For $0 < \beta < \pi/2$, each term in the parity single-qudit ring mixer is a superposition of a swap gate and the identity. A single application of $U_{\text{parity}}(\beta)$ will have non-zero transition amplitudes only between pairs of colors with indices no more than two apart. Nevertheless, this mixer meets the second criterion because all possible orderings of d swap gates appear in d 2 repeats of the parity operator for any $0 < \beta < \pi/2$, thus providing non-zero amplitude transitions between all feasible computational basis states for a single-vertex graph.
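The limited reach of a single application, and the effect of repeating the mixer, can be seen directly in the qudit picture. The sketch below is an illustration under stated assumptions: zero-based color labels, even $d$, each pair gate modeled as a rotation $e^{-i\theta(|a\rangle\langle b| + |b\rangle\langle a|)}$ (the relation between $\theta$ and $\beta$ depends on the normalization of the XY term), and the layer order chosen here. The function names are hypothetical.

```python
import numpy as np

def pair_rotation(d, a, b, theta):
    """exp(-i*theta*(|a><b| + |b><a|)) on a d-level system."""
    U = np.eye(d, dtype=complex)
    U[a, a] = U[b, b] = np.cos(theta)
    U[a, b] = U[b, a] = -1j * np.sin(theta)
    return U

def parity_mixer(d, theta):
    """Two layers of disjoint pair rotations on a ring of d colors (d even)."""
    first = np.eye(d, dtype=complex)
    second = np.eye(d, dtype=complex)
    for a in range(0, d, 2):                  # pairs (0,1), (2,3), ...
        first = pair_rotation(d, a, a + 1, theta) @ first
    for a in range(1, d, 2):                  # pairs (1,2), ..., (d-1,0)
        second = pair_rotation(d, a, (a + 1) % d, theta) @ second
    return second @ first

d, theta = 6, 0.4
U1 = parity_mixer(d, theta)
U2 = U1 @ U1

amp_far_once = abs(U1[3, 0])        # colors 0 and 3 are three steps apart
min_amp_twice = np.abs(U2).min()    # smallest transition amplitude after 2 repeats
```

For this choice of $d$ and $\theta$, two repeats already connect every pair of colors. Individual amplitudes can vanish at special angles through interference (for example, one entry of the two-repeat operator vanishes at $\theta = \pi/4$ in this toy case), so a numerical check at the parameter values of interest is worthwhile.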
When $d$ is an integer power of two, there is also a straightforward, though more resource-intensive, compilation of the parity single-qudit ring mixer using the binary encoding. Applying a Pauli $X$ gate to the least-significant qubit acts as $H_{\text{even}}$ on the encoded subspace. Incrementing the register by one, applying the Pauli $X$ gate to the least-significant qubit, and finally decrementing the register by one overall acts as $H_{\text{odd}}$. Therefore we can implement $U_{\text{parity}}$ by incrementing the register by one, applying an $X(\beta) = e^{-i\beta X}$ gate to the least-significant qubit, decrementing the register by one, and then again applying an $X(\beta)$ gate to the least-significant qubit. For an $l$-bit register, each incrementing and decrementing operation can be written as a series of $l$ multiply-controlled $X$ gates, with the number of controls ranging from 0 to $l - 1$.
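The increment, flip, decrement construction can be checked at the level of matrices acting on the integers $0, \ldots, d-1$. The sketch below assumes zero-based values, so that the "even" pairs are $(0,1), (2,3), \ldots$ and the "odd" pairs are $(1,2), \ldots, (d-1,0)$; variable names are illustrative, and the increment is modeled as a permutation matrix rather than compiled to multiply-controlled gates.

```python
import numpy as np

l = 3
d = 2 ** l
X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_lsb = np.kron(np.eye(d // 2), X)      # Pauli X on the least-significant bit
inc = np.roll(np.eye(d), 1, axis=0)     # |a> -> |a+1 mod d>
dec = inc.T                             # |a> -> |a-1 mod d>

H_even = X_lsb                          # couples (0,1), (2,3), ...
H_odd = dec @ X_lsb @ inc               # increment, flip LSB, decrement

# Direct construction of the odd-pair couplings for comparison.
expected_odd = np.zeros((d, d))
for a in range(1, d, 2):
    b = (a + 1) % d
    expected_odd[a, b] = expected_odd[b, a] = 1.0

beta = 0.3
x_rot = np.cos(beta) * np.eye(d) - 1j * np.sin(beta) * X_lsb   # X(beta) on the LSB
U_parity = x_rot @ dec @ x_rot @ inc    # time order: inc, X(beta), dec, X(beta)
U_odd = np.cos(beta) * np.eye(d) - 1j * np.sin(beta) * expected_odd
```

Since both coupling matrices square to the identity, each exponential reduces to $\cos(\beta) I - i \sin(\beta) H$, which is what makes the comparison between `U_parity` and the product of the two layers exact.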
Repeated parity single-qudit ring mixers. As we mention above, a single application of U parity (β) will have non-zero transition amplitudes only between pairs of colors with indices no more than two apart, which suggests that it may be useful to repeat the parity mixer within one mixing step.
Partition single-qudit ring mixers. We now generalize the above construction for the parity single-qudit ring mixer to more general partition mixers. For a given ordered partition $P = (P_1, \ldots, P_p)$ of the terms of $H_{r\text{-NV}}$ such that all pairs of terms within a $P_i$ act on disjoint states of the qudit, let $U_{P\text{-}r\text{-NV}}(\beta) = U_{P_p\text{-XY}}(\beta) \cdots U_{P_1\text{-XY}}(\beta)$, where $U_{P_i\text{-XY}}(\beta)$ is the product of the unitaries $e^{-i\beta H}$ generated by the individual terms H in the part $P_i$. By construction, in the one-hot encoding, the terms of each $U_{P_i\text{-XY}}$ commute (because they act on disjoint pairs of qubits), and so the ordering does not matter; all can be implemented in parallel. We call $U_{P\text{-}r\text{-NV}}$ the "partition P" r-nearby-values single-qudit ring mixer, and $U_{r\text{-NV}}$ the "simultaneous" r-nearby-values single-qudit ring mixer to distinguish the latter from the former. The latter is a member of H-QAOA, while the former is not. Even more generally, for a set of single-qudit ring mixers $\{H_\alpha\}$ indexed by some α and an ordered partition $P = (P_1, \ldots, P_p)$ thereof in which the single-qudit mixers within each part mutually commute, we define a simultaneous version, $e^{-i\beta \sum_\alpha H_\alpha}$, and a P-partitioned version, $\prod_{i=1}^{p} \prod_{\alpha \in P_i} e^{-i\beta H_\alpha}$, where the order of the product over the elements of each part doesn't matter because they commute, and the order of the product over the parts of the partition is given by their ordering within the ordered partition P.
Binary single-qudit mixer for $d = 2^l$. We now return briefly to the binary encoding, and describe a different single-qudit mixer. An alternative to the r-nearby-values single-qudit mixer, which is easily implementable using the binary encoding when $d = 2^l$ is a power of two, is the "simple binary" single-qudit mixer, generated by the Hamiltonian $\sum_{i=1}^{l} X_i$, where $X_i$ acts on the ith qubit in the binary encoding of the qudit. Since the ordering of the colors was arbitrary to begin with, it doesn't much matter whether the Hamiltonian mixes nearby values in the ordering or mixes the colors in a different way, in this case to colors with indices whose binary representations have Hamming distance 1.
When d is not a power of 2, a straightforward generalization of the binary single-qudit mixer has difficulty in meeting the first of the design criteria, since flipping one of the bit values in the binary representation may take the evolution out of the feasible subspace. While requiring d to be a power of 2 restricts its general applicability, the binary single-qudit mixer could be useful in some interesting cases, such as 4-coloring. For 2-coloring (a problem equivalent to MaxCut), the full binary single-qudit mixer is simply the standard mixer X. We will use this encoding in Sec. 4.2 to handle slack variables in a single-machine scheduling problem, a case in which there is flexibility in the upper range of the integer to be encoded, allowing us to round up to the nearest power of two when needed.
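The feasibility issue for the binary mixer can be seen in a short classical sketch. It assumes colors are indexed 0, ..., d − 1 and stored in `num_bits` bits, and that each term of the mixer flips exactly one bit; the helper names are hypothetical.

```python
def binary_mixer_neighbors(color, num_bits):
    """Color indices reached by flipping exactly one bit (one X_i term)."""
    return [color ^ (1 << i) for i in range(num_bits)]


def leaves_feasible_subspace(d, num_bits):
    """True if some single bit flip maps a valid color (< d) to an invalid index (>= d)."""
    return any(neighbor >= d
               for color in range(d)
               for neighbor in binary_mixer_neighbors(color, num_bits))


if __name__ == "__main__":
    print(binary_mixer_neighbors(0b01, 2))   # Hamming-distance-1 neighbors of color 1
    print(leaves_feasible_subspace(4, 2))    # d = 4 is a power of two
    print(leaves_feasible_subspace(3, 2))    # d = 3 is not
```

With d = 4 every bit flip stays among the four colors, whereas with d = 3 flipping the high bit of color 1 produces the unused index 3.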

Full QAOA mapping
Having introduced several partial mixers for single qudits, we now show a complete QAOA circuit for an instance of MaxColorableSubgraph with n vertices, m edges, and κ colors, compiled to 2-local gates on qubits. Using the one-hot encoding, we require nκ qubits.
Mixing operator. We use as the full mixer a parity ring mixer made up of parity single-qudit ring mixers, one for each of the qudits corresponding to each vertex, i.e. the tensor product $\bigotimes_{v=1}^{n} U_{v,\text{parity}}(\beta)$, where $U_{v,\text{parity}}(\beta)$ is the parity single-qudit ring mixer acting on the qudit of vertex v. The single-qudit mixers act on different qubits and can be applied in parallel. The overall parity mixing operator requires a depth-2 or depth-3 circuit (for even and odd κ, respectively) of nκ gates. Other single-qudit mixers we defined above, including r repeats of the parity ring mixer, other partitioned mixers, or the binary mixer, could be used in place of the parity single-qudit mixer in this construction. All of these unitary mixers by construction meet our first criterion for a mixer: keeping the evolution in the feasible subspace. Further, each of these mixers, after at most $\lceil \kappa/2 \rceil$ repeats, provides non-zero amplitude transitions between all colors at a given vertex, with the product providing transitions between any two feasible states.
Phase-separation operator. The objective function can be written in the classical one-hot encoding as $f(x) = m - \sum_{\{u,v\} \in E} \sum_{a=1}^{\kappa} x_{u,a} x_{v,a}$, where $x_{v,a} = 1$ indicates that vertex v has been assigned color a. To obtain a phase-separation Hamiltonian, we substitute $(I - Z)/2$ for each binary variable, to obtain $\frac{4 - \kappa}{4} m\, I + \frac{1}{4} \sum_{\{u,v\} \in E} \sum_{a=1}^{\kappa} (Z_{u,a} + Z_{v,a} - Z_{u,a} Z_{v,a})$. The constant term effects only a physically irrelevant global phase, and since we are only concerned about the feasible subspace, we can disregard each sum $\sum_{a=1}^{\kappa} Z_{u,a}$ of all κ single Z operators corresponding to a single qudit, since they multiply each of the d Hamming weight 1 elements corresponding to the d single-qudit values by the same constant, resulting in a global phase. Removing those terms and rescaling, the phase separator now has the simpler form $H_P^{(\text{enc})} = \sum_{\{u,v\} \in E} \sum_{a=1}^{\kappa} Z_{u,a} Z_{v,a}$, where $Z_{v,a}$ acts on the ath qubit in the one-hot encoding of the vth qudit, corresponding to coloring vertex v with color a. The phase separator requires a circuit containing mκ gates with depth at most $D_G + 1$, where $D_G$ is the maximum degree over all vertices in the instance graph G. Translated back to acting on qudits, Eq. (17) acts as the diagonal Hamiltonian $H_g$ with $g(x) = m\kappa - 4f(x)$. We refer to this function g as the "phase function", which will typically be an affine transformation of the objective function, which corresponds simply to a physically irrelevant global phase and a rescaling of the parameter. Defining $H_P$ using such a phase function allows us to write a simpler encoded version $H_P^{(\text{enc})}$ that corresponds exactly to $H_P$, without qualification, on the encoded subspace.
Initial state. Any encoded coloring can be generated by a depth-1 circuit of no more than n single-qubit X gates. A reasonable initial state is one in which all vertices are assigned the same color. Alternatively, we could start with any other feasible state, or the initial state could be obtained by applying one or more rounds of the mixer to a single feasible state, so that the algorithm begins with a superposition of feasible states.
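The relation between the ZZ-only phase separator and the objective function can be checked by brute force on a small instance. The sketch below assumes that the objective f counts properly colored edges and that Z has eigenvalue −1 on a qubit set to 1; under those assumptions the eigenvalue of $\sum_{\{u,v\} \in E} \sum_a Z_{u,a} Z_{v,a}$ on a one-hot state equals mκ − 4f(x). The function names and the triangle instance are illustrative only.

```python
from itertools import product


def properly_colored_edges(coloring, edges):
    """Objective f: number of edges whose endpoints have different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])


def zz_phase(coloring, edges, kappa):
    """Eigenvalue of sum_{(u,v) in E} sum_a Z_{u,a} Z_{v,a} on a one-hot state."""
    def z(vertex, color):
        # Z eigenvalue: -1 on the single qubit set to 1, +1 on the others.
        return -1 if coloring[vertex] == color else 1
    return sum(z(u, a) * z(v, a) for u, v in edges for a in range(kappa))


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2)]   # a triangle, n = 3, m = 3
    kappa = 3
    for coloring in product(range(kappa), repeat=3):
        f = properly_colored_edges(coloring, edges)
        assert zz_phase(coloring, edges, kappa) == len(edges) * kappa - 4 * f
    print("phase function matches on all", kappa ** 3, "colorings")
```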
The circuit depth and gate count for the full algorithm will increase when compiling to realistic near-term hardware whose nearest-neighbor connectivity limits the pairs of physical qubits to which two-qubit gates can be applied. See [71] for one approach for compiling to realistic hardware with such constraints.
Further investigation is needed to understand which mixers and initial states, for a given resource budget, result in more or less effective algorithms, and whether some have an advantage with respect to finding good parameters $\gamma_1, \ldots, \gamma_p$ and $\beta_1, \ldots, \beta_p$ or being robust to error.

Example: MaxIndependentSet
Problem. Given a graph G = (V, E), with |V| = n and |E| = m, find the largest subset $V' \subset V$ of mutually non-adjacent vertices.
This problem was discussed in Sec. VII of [22] as a "variant" of the quantum approximate optimization algorithm introduced in that paper. To handle this problem, Farhi et al. suggested restricting the evolution to what we are calling the feasible subspace of the overall Hilbert space, the subspace spanned by computational basis elements corresponding to independent sets, through modification of what we are calling the mixing operator. We now make the construction of the H-QAOA mixer that Farhi et al. define more explicit, and introduce partitioned mixers that have implementation advantages over the H-QAOA, or simultaneous, mixer defined in Farhi et al.
The configuration space is the set of n-bit strings, representing subsets $V' \subset V$ of vertices, where $i \in V'$ if and only if $x_i = 1$. The domain F is represented by the subset of all n-bit strings corresponding to independent sets of G. In contrast to the domain for MaxColorableSubgraph, this domain is dependent on the problem instance, not just on the size of the problem. Because the configuration is already bit-based, some aspects of mapping this problem to QAOA are simpler, but the partitioned mixing operators are more complicated in that they require controlled operations.
To support the discussion of controlled operators, we use the notation $\Lambda_y(Q)$ to indicate a unitary target operator Q applied to a set of target qubits controlled by the state y of a set of control qubits: $\Lambda_y(Q) = |y\rangle\langle y| \otimes Q + \sum_{x \neq y} |x\rangle\langle x| \otimes I$. More generally, we use $\Lambda_\chi(Q)$ when the operation is controlled on a predicate χ: $\Lambda_\chi(Q) = \sum_{x : \chi(x) = 1} |x\rangle\langle x| \otimes Q + \sum_{x : \chi(x) = 0} |x\rangle\langle x| \otimes I$. Whether the subscript of Λ is a string or predicate will be clear from context. For a Hamiltonian $H_Q$ such that $Q = e^{-iH_Q}$, we can write the controlled unitary Eq. (19) as $\Lambda_\chi(e^{-iH_Q}) = e^{-i H_\chi \otimes H_Q}$. We refer to $H_\chi \otimes H_Q$ as the controlled Hamiltonian, χ-controlled-$H_Q$. Note that the Hamiltonian $H_\chi$ that acts as the predicate χ on computational basis states, in the sense of Eq. (3), is precisely the projector $H_\chi = \sum_{x : \chi(x) = 1} |x\rangle\langle x|$ that projects onto the subspace on which the predicate is 1. We will use this relation to connect corresponding controlled Hamiltonians and controlled unitaries. In particular, when we want to apply a phase only on the part of the Hilbert space picked out by a predicate χ, we can write $\Lambda_\chi(e^{-i\theta}) = e^{-i\theta H_\chi}$, where we have adapted the control notation of Eq. (20) to mean applying the operator $Q = e^{-i\theta}$ to zero target qubits. We will often compile controlled unitaries (both phase separators and mixers) by using ancilla qubits to intermediate the control, e.g. for a single ancilla qubit (initialized at $|0\rangle$ and returned thereto): $\Lambda_\chi(Q) = \Lambda_\chi(X_{\text{anc}})\, \Lambda_{x_{\text{anc}}}(Q)\, \Lambda_\chi(X_{\text{anc}})$.

Partial mixing operator at each vertex
Given an independent set $V'$, we can add a vertex $w \notin V'$ to $V'$ while maintaining feasibility only if none of its neighboring vertices nbhd(w) are already in $V'$. On the other hand, we can always remove any vertex $w \in V'$ without affecting feasibility. Hence, a bit-flip operation at a vertex, controlled by its neighbors (adjacent vertices), suffices both to remove and add vertices while maintaining the independence property. These classical moves inspire the controlled-bit-flip partial mixing operators.
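In general, for a string y and a set of indices V, let $y_V = (y_i)_{i \in V}$ be the substring of y in lexicographical order of the indices. In particular, let $x_{\text{nbhd}(v)} = (x_w)_{w \in \text{nbhd}(v)}$. (The ordering of the characters within the substring is arbitrary, because we only use this as the argument to predicates that are symmetric under permutation of the arguments.) For each vertex, we define the partial mixer as a multiply-controlled X operator, $H_{\text{CX},v} = H_{\text{NOR}(x_{\text{nbhd}(v)})} \otimes X_v$, with corresponding partial mixing unitary, a multiply-controlled-X(β) single-qubit rotation, $U_{\text{CX},v}(\beta) = \Lambda_{\text{NOR}(x_{\text{nbhd}(v)})}(e^{-i\beta X_v})$, where $H_{\text{NOR}(x_{\text{nbhd}(v)})} = \prod_{w \in \text{nbhd}(v)} (I + Z_w)/2$ projects onto the states in which no neighbor of v is in the set. Since $X_v$ is both Hermitian and unitary, $e^{-i\beta X_v}$ is a combination of the identity and $X_v$ for 0 < β < π/2.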
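The classical move underlying the controlled-bit-flip partial mixer can be sketched as follows. The sketch treats each partial mixer as an optional classical flip of vertex v that is allowed only when no neighbor of v is in the set, and explores which sets are reachable from the empty set; it says nothing about amplitudes. The function names and the path-graph instance are hypothetical.

```python
def controlled_flip(subset, v, adjacency):
    """Flip membership of v only if none of its neighbors is in the set."""
    if adjacency[v] & subset:
        return subset            # control not satisfied: act as the identity
    return subset ^ {v}          # add v if absent, remove it if present


def is_independent(subset, adjacency):
    """True if no two vertices of the subset are adjacent."""
    return all(not (adjacency[v] & subset) for v in subset)


def reachable_sets(adjacency):
    """All sets reachable from the empty set by controlled flips."""
    start = frozenset()
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for v in adjacency:
            nxt = controlled_flip(current, v, adjacency)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # path graph 0-1-2-3
    sets = reachable_sets(path)
    print(len(sets), all(is_independent(s, path) for s in sets))
```

On the four-vertex path graph, the controlled flips reach exactly the eight independent sets and never leave the feasible domain.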

Full QAOA mapping
We define two distinct types of mixers:
• the simultaneous controlled-X mixer, $U_{\text{sim-CX}}(\beta) = e^{-i\beta H_{\text{CX}}}$, where $H_{\text{CX}}$ is the sum over all vertices of the partial mixing Hamiltonians; and
• a class of partitioned controlled-X(β) mixers, each the product over the parts $P_1, \ldots, P_p$, taken in order, of the partial mixing unitaries of the vertices in each part, where $P = (P_1, \ldots, P_p)$ is an ordered partition of the partial mixers in which each part contains mutually commuting partial mixers.
Since the partial mixers often do not commute, different ordered partitions often result in different mixers. By design, the partitioned mixers restrict evolution to the feasible subspace. With respect to the second design criterion, there is non-zero transition amplitude from the $|0\rangle^{\otimes n}$ state corresponding to the empty set to all other independent sets; for 0 < β < π/2, we get terms corresponding to products of the individual controlled-bit-flip operators for all subsets of vertices, including those corresponding to independent sets. (For those subsets S not corresponding to independent sets, the product will result in an independent set $V' \subset S$ that does not include vertices in S which have neighbors whose controlled-bit-flip preceded them in the partition order; thus different ordered partitions affect the amount of non-zero amplitude in the states corresponding to independent sets.) Two applications of any partitioned mixer result in non-zero amplitude between any two feasible states. An interesting question is how different ordered partitions affect the ease with which good parameters can be found and the quality of the solutions obtained.
Partitioned mixers are generally easier to compile than the simultaneous mixer, since the partitioned mixer is a product of multiqubit-controlled-not operators, generalized Toffoli gates on at most $D_G + 1$ qubits. Altogether, this construction uses n partial mixers, which can then be compiled into single- and two-qubit gates. For many graphs, partitions in which each set contains multiple commuting partial mixers exist, reducing the depth.
Phase-separation operator. The objective function is the size of the independent set, or $f(x) = \sum_{i=1}^{n} x_i$, which we could translate into a phase-separating Hamiltonian via substitution of $(I - Z)/2$ for each binary variable. Instead, we use an affine transformation of the objective function, $g(x) = n - 2f(x)$, which when translated yields a phase separation operator of a simpler form: $U_P(\gamma) = \prod_{v=1}^{n} e^{-i\gamma Z_v}$, which is simply a depth-1 circuit of n single-qubit Z-rotations.
Initial state. A reasonable initial state is the trivial state $|s\rangle = |0\rangle^{\otimes n}$ corresponding to the empty set.

Example: MaxColorableInducedSubgraph
Problem. Given κ colors, and a graph G = (V, E) with n vertices and m edges, find the largest induced subgraph that can be properly κ-colored.
The configuration space is the set $[\kappa + 1]^n$ of (κ + 1)-dit strings of length n, corresponding to partial κ-colorings of the graph: $x_v = 0$ indicates that vertex v is uncolored, and $x_v = c > 0$ indicates that the vertex has color c. The induced subgraph is defined by the colored vertices. The domain is the set of proper partial colorings, those in which two colored vertices that are adjacent in G have different colors. The objective function $f : [\kappa + 1]^n \to \mathbb{N}$ is the number of vertices that are colored, $f(x) = |\{v \in V : x_v \neq 0\}|$.

Controlled null swap mixer at a vertex
The controlled-null-swap partial mixer we define has elements of the mixers we have seen for the previous two problems, combining the control by vertex neighbors from MaxIndependentSet and the color swap from MaxColorableSubgraph. Here, however, we make substantial use of the uncolored state, and only consider swapping a color with uncolored status at each vertex. An uncolored vertex can be assigned color c, maintaining feasibility as long as none of its neighbors have been assigned color c, suggesting a swap between a color and uncolored controlled on neighboring vertices; uncoloring a vertex always preserves feasibility. This reasoning in terms of classical moves inspires, for problems containing NEQ constraints, the controlled-null-swap partial mixing Hamiltonian $H_{\text{NS},v,a} = H_{\text{NONE}(x_{\text{nbhd}(v)}, a)} \otimes (|0\rangle\langle a| + |a\rangle\langle 0|)_v$, with corresponding controlled null-swap-rotation mixing unitary $U_{\text{NS},v,a}(\beta) = \Lambda_{\text{NONE}(x_{\text{nbhd}(v)}, a)}\left(e^{-i\beta(|0\rangle\langle a| + |a\rangle\langle 0|)_v}\right)$, where NONE(y, A) is shorthand for none of the variables in y having any of the values in A; when A = {a} is a singleton set, we write simply NONE(y, a) = NONE(y, {a}).

Mixing operators.
Define $H_{\text{NS}}$ as the sum, over all vertices v and all colors a, of the controlled-null-swap partial mixing Hamiltonians. We define two distinct types of mixers:
• the simultaneous controlled null-swap mixer, $U_{\text{sim-NS}}(\beta) = e^{-i\beta H_{\text{NS}}}$; and
• a family of partitioned controlled null-swap mixers, each an ordered product of the partial mixing unitaries in which the colors are applied in separate stages and, within each stage, the vertices are taken part by part.
Again, we have a variety of partitioned mixers, each specified by an ordered partition P of the vertices such that for each color the terms corresponding to the vertices in each part commute. We have segregated the colors into separate stages, but other orderings are possible. We use the one-hot encoding of Sec. 3.1, but with additional variables $x_{v,0}$ for the uncolored states: the binary variables for each vertex v are $x_{v,0}, x_{v,1}, \ldots, x_{v,\kappa}$. This encoding uses n(κ + 1) computational qubits. In this encoding, a single partial mixer has the form $\Lambda_{\text{NOR}(x_{\text{nbhd}(v),a})}\left(e^{-i\beta(X_{v,0} X_{v,a} + Y_{v,0} Y_{v,a})}\right)$, where $x_{\text{nbhd}(v),a} = (x_{w,a})_{w \in \text{nbhd}(v)}$. Reasoning similar to that used for the mixers discussed for the MaxIndependentSet and MaxColorableSubgraph problems shows that this mixer has non-zero transition amplitude between any feasible computational-basis state and the trivial state corresponding to the empty set as the induced subgraph. Two applications of this mixer give non-zero transition amplitudes between any two feasible computational-basis states.
To ease compilation, each partial mixer can be implemented by computing the control predicate into an ancilla qubit, applying the rotation controlled on that ancilla, and then uncomputing, so that the control is intermediated by an ancilla qubit, which is initialized in and returned to the zero state. Altogether, this construction uses κn partial mixers, which can then be compiled into single- and two-qubit gates. For many graphs, partitions in which each set contains multiple commuting partial mixers exist, reducing the depth.
Phase-separation operators. We can translate the objective function to a Hamiltonian as usual, or translate a linear modification of the objective function to obtain a simpler form.
The phase function $g(x) = n - 2f(x)$ corresponds to a simple compiled phase separator $H_P^{(\text{enc})}$, which can be implemented using a depth-1 circuit of n single-qubit Z-rotations.

Example: MinGraphColoring
Problem. Given a graph G = (V, E), find the minimal number of colors $k^*$ required to properly color it.
A graph that can be κ-colored but not (κ − 1)-colored is said to have chromatic number κ. We take as our configuration space the set of κ-dit strings of length n, where $\kappa = D_G + 2$. The domain F is the set of proper colorings, many of which will use fewer than κ colors. With $D_G + 2$ colors, as we explain next, it is possible to get from any proper coloring to any other by local moves while staying in the feasible space, a property we will make use of in designing mixing operators. We comment that it may be advantageous to use a larger number of colors since that may promote mixing, but the tradeoffs there would need to be determined in a future investigation.
It is easy to see that any graph can be colored with $D_G + 1$ colors. To see that $\kappa = D_G + 2$ suffices to get between any two $(D_G + 1)$-colorings, first recognize that given a $(D_G + 2)$-coloring, one can always obtain a $(D_G + 1)$-coloring by simply choosing a color and recoloring each vertex currently colored in that color with one of the other colors, since at least one of those colors will not be used by its neighbors. This move is local, in that it depends only on the neighborhood of the vertex. Now, given two $(D_G + 1)$-colorings C and $C'$, we will iterate through colors c to transform between the two colorings via local moves while staying in the feasible space. Let $S \subset V$ be the set of vertices colored c in $C'$, and let $S' \subset S$ be the set of vertices in S that are not colored c in C. Consider all neighbors of vertices in $S'$. For any neighbor colored c, color it with the unused color. We are now free to color all vertices in $S'$ with color c. Iterating through the κ colors provides a means of getting from one $(D_G + 1)$-coloring to another by local moves that remain in the feasible space.
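The two counting facts used in this argument can be illustrated with a short classical sketch: a greedy pass never needs more than $D_G + 1$ colors, and with a palette of $D_G + 2$ colors any single color can be eliminated by local recoloring. This is only an illustration under the stated assumptions (adjacency lists, colors indexed from 0); the function names are hypothetical.

```python
def is_proper(coloring, adjacency):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[v] != coloring[w] for v in adjacency for w in adjacency[v])


def greedy_coloring(adjacency):
    """Color vertices one at a time with the smallest color unused by neighbors."""
    coloring = {}
    for v in adjacency:
        used = {coloring[w] for w in adjacency[v] if w in coloring}
        # A vertex of degree k sees at most k colors, so one of k + 1 is free.
        coloring[v] = next(c for c in range(len(adjacency[v]) + 1) if c not in used)
    return coloring


def eliminate_color(coloring, adjacency, color, palette):
    """Recolor each vertex of the given color with a palette color unused by its neighbors."""
    new = dict(coloring)
    for v in adjacency:
        if new[v] == color:
            used = {new[w] for w in adjacency[v]}
            new[v] = next(c for c in palette if c != color and c not in used)
    return new


if __name__ == "__main__":
    cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}   # 5-cycle, D_G = 2
    base = greedy_coloring(cycle)
    extra = dict(base)
    extra[0] = 3                      # use the spare (D_G + 2)-th color at vertex 0
    back = eliminate_color(extra, cycle, 3, range(4))
    print(is_proper(base, cycle), is_proper(extra, cycle), is_proper(back, cycle))
```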

Partial mixer at a vertex
We use a controlled version of the mixer in Sec. 3.1 that allows a vertex to change colors only when doing so would not result in an improper coloring; we may swap colors a and b at vertex v only if none of its neighbors are colored a or b. The partial mixer we define has a similar form to the controlled-null-swap partial mixer defined in Eqs. (27) and (28), but supports color changes between any two colors at a vertex, rather than only between a color and uncolored. Define the controlled-swap partial mixing Hamiltonian $H_{\text{CS},v,\{a,b\}} = H_{\text{NONE}(x_{\text{nbhd}(v)}, \{a,b\})} \otimes (|a\rangle\langle b| + |b\rangle\langle a|)_v$, with corresponding controlled-swap-rotation mixing unitary $U_{\text{CS},v,\{a,b\}}(\beta) = \Lambda_{\text{NONE}(x_{\text{nbhd}(v)}, \{a,b\})}\left(e^{-i\beta(|a\rangle\langle b| + |b\rangle\langle a|)_v}\right)$, where NONE(x, A) was defined in Eq. (29). These mixers are controlled versions of the single-qudit fully-connected mixer of Sec. 3.1, rather than the single-qudit ring mixer, which makes sure that every possible state is reachable.

Mixing Operator. Let $H_{\text{CS}}$ be the sum, over all vertices v and all pairs of colors {a, b}, of the controlled-swap partial mixing Hamiltonians. We define two types of mixers:
• the simultaneous controlled-swap mixer, $e^{-i\beta H_{\text{CS}}}$; and
• a family of partitioned controlled-swap mixers, each an ordered product of the controlled-swap-rotation partial mixing unitaries.
As before, each partitioned mixer is specified by an ordered partition P of the vertices such that, for each color, the partial mixers for vertices in one set of the partition all commute with each other. Altogether, this construction uses (κ − 1)κn/2 partial mixers. For many graphs, partitions in which each set contains multiple commuting partial mixers exist, allowing different partial mixers to be carried out in parallel, reducing the depth.
Phase-separation operator. The objective function, $f : [\kappa]^n \to \mathbb{Z}_+$, is $f(x) = \sum_{a=1}^{\kappa} \text{OR}(\text{EQ}(x_1, a), \ldots, \text{EQ}(x_n, a))$, (39) which counts the number of colors used. Let $g(x) = \kappa - f(x)$ be the phase function that counts the number of colors not used. Let $H_{\text{NONE}(x,a)}$ be the projector onto the subspace of H spanned by the states corresponding to strings in $[\kappa]^n$ that do not contain the character a. We have $H_g = \sum_a H_{\text{NONE}(x,a)}$, so $U_P(\gamma) = e^{-i\gamma H_g} = \prod_{a=1}^{\kappa} e^{-i\gamma H_{\text{NONE}(x,a)}}$.
Initial state. For the initial state, we use an easily found $(D_G + 1)$- (or $(D_G + 2)$-) coloring.
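The decomposition of the phase function into one term per color can be checked directly. The sketch below assumes colors are labelled 1, ..., κ and evaluates the objective of Eq. (39) and the predicate NONE(x, a) classically; the helper names `colors_used` and `none_of` are hypothetical.

```python
from itertools import product


def colors_used(x, kappa):
    """Objective f: number of colors a in 1..kappa that appear in the string x."""
    return sum(1 for a in range(1, kappa + 1) if any(xv == a for xv in x))


def none_of(x, a):
    """Predicate NONE(x, a): 1 if no character of x equals a, else 0."""
    return int(all(xv != a for xv in x))


if __name__ == "__main__":
    kappa = 4
    for x in product(range(1, kappa + 1), repeat=3):
        g = kappa - colors_used(x, kappa)
        # g(x) is the number of unused colors, one NONE term per color.
        assert g == sum(none_of(x, a) for a in range(1, kappa + 1))
    print("g(x) = sum_a NONE(x, a) on all strings checked")
```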

Compilation in one-hot encoding
We now give partial compilations of the elements of the mapping to qubits using the one-hot encoding.
Mixing operators. In the one-hot encoding, the controlled-swap mixing Hamiltonian can be written $H_{\text{CS},v,\{a,b\}} = H_{\text{NOR}(x_{\text{nbhd}(v),\{a,b\}})} \otimes (X_{v,a} X_{v,b} + Y_{v,a} Y_{v,b})$, with corresponding unitary written as $U_{\text{CS},v,\{a,b\}}(\beta) = \Lambda_{\text{NOR}(x_{\text{nbhd}(v),\{a,b\}})}\left(e^{-i\beta(X_{v,a} X_{v,b} + Y_{v,a} Y_{v,b})}\right)$, where, for a doubly indexed string $y = (y_{i,j})_{i,j}$, $y_{A,B} = (y_{i,j})_{i \in A,\, j \in B}$ denotes the substring consisting of the characters $y_{i,j}$ for which $i \in A$ and $j \in B$, in lexicographical ordering of the two indices. In particular, $x_{\text{nbhd}(v),\{a,b\}}$ indicates the bits corresponding to coloring the neighbors of v either color a or color b. The control $\Lambda_{\text{NOR}(x_{\text{nbhd}(v),\{a,b\}})}$ dictates that none of the neighbors of v take value a or b for the swap to be performed. Each $U_{\text{CS},v,\{a,b\}}$ is a controlled gate with $2D_v$ control qubits and two target qubits. Altogether, the full mixing Hamiltonian can be implemented using κ(κ − 1)n/2 controlled gates on no more than $2D_G + 2$ qubits.
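Phase separator. Let $U_{P,a}(\gamma) = e^{-i\gamma H_{\text{NONE}(x,a)}}$, so that the phase separator Eq. (40) can be written as $U_P(\gamma) = \prod_{a=1}^{\kappa} U_{P,a}(\gamma)$. Each partial phase separator can alternatively be written as a phase controlled on none of the n qubits $x_{v,a}$ being set, $U_{P,a}(\gamma) = \Lambda_{\text{NOR}(x_{[n],a})}(e^{-i\gamma})$.
Initial state. Any coloring x can be prepared in depth 1 using n single-qubit X gates, one applied to the qubit $(v, x_v)$ of each vertex v.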

QAOA Mappings: Orderings and schedules
Many challenging computational problems have a configuration space that is fundamentally the set of orderings or schedules of some number of items. Here, we introduce the machinery for mapping such problems to QAOA, using the traveling salesperson problem and several single-machine scheduling problems as illustrative examples.

Example: Traveling Salesperson Problem (TSP)
Problem. Given a set of n cities, and distances $d : [n]^2 \to \mathbb{R}_+$, find an ordering of the cities that minimizes the total distance traveled for the corresponding tour. While for expository purposes we call these numbers distances, the mapping works for any cost function on pairs of cities, whether or not it forms a metric; the distances are not required to be symmetric, or to satisfy the triangle inequality.

Mapping
The configuration space here is the set of all orderings of the cities. Labeling the cities by [n], the ordering $\iota = (\iota_1, \iota_2, \ldots, \iota_{n-1}, \iota_n)$ indicates traveling from city $\iota_1$ to city $\iota_2$, then on to city $\iota_3$, and so on until finally returning from city $\iota_n$ back to city $\iota_1$. The configuration space includes some degeneracy in solutions with respect to cyclic permutations; specifically, for any ordering ι, the configuration space includes both $(\iota_1, \iota_2, \ldots, \iota_{n-1}, \iota_n)$ and $(\iota_2, \iota_3, \ldots, \iota_n, \iota_1)$, even though they are essentially the same solution to TSP. We leave in this degeneracy to preserve symmetries which make it easier to construct mixers. As there are no problem constraints, the domain is the same as the configuration space. The objective function is $f(\iota) = \sum_{j=1}^{n} d(\iota_j, \iota_{j+1})$, with $\iota_{n+1} = \iota_1$.
Ordering swap partial mixing Hamiltonians. Our mixers for orderings will be built on the partial mixers we call "value-selective ordering swap mixing Hamiltonians." Consider $\{\iota_i, \iota_j\} = \{u, v\}$, indicating that city u (resp. v) is visited at the ith (resp. jth) stop on the tour. The value-selective ordering swap mixing Hamiltonian $H_{\text{PS},\{i,j\},\{u,v\}}$ swaps the ith and jth elements of the ordering if and only if those elements are the cities u and v. There are $\binom{n}{2}^2$ such Hamiltonians, one for each pair of positions and each pair of cities. We make extensive use of a special case, the adjacent ordering swap mixing Hamiltonians, $H_{\text{PS},i,\{u,v\}} = H_{\text{PS},\{i,i+1\},\{u,v\}}$. To swap the ith and jth elements of the ordering regardless of which cities those are, we use the value-independent ordering swap partial mixing Hamiltonian $H_{\text{PS},\{i,j\}} = \sum_{\{u,v\}} H_{\text{PS},\{i,j\},\{u,v\}}$. Of these $\binom{n}{2}$ partial mixing Hamiltonians, n are adjacent value-independent ordering swap partial mixing Hamiltonians, $H_{\text{PS},i} = H_{\text{PS},\{i,i+1\}}$, which swap the ith element with the subsequent one regardless of which cities those are. These partial mixers can be combined in several ways to form full mixers, of which we explore two types.
Simultaneous ordering swap mixer. Defining $H_{\text{PS}} = \sum_{i=1}^{n} H_{\text{PS},i}$, we have the "simultaneous ordering swap mixer", $e^{-i\beta H_{\text{PS}}}$. For the color-parity partition, $U_{c,\text{odd}}$ denotes the product of the partial mixers in the part $P_{c,\text{odd}}$, where the ordering of the products does not matter because the terms mutually commute. Similarly for $P_{c,\text{even}}$ and $U_{c,\text{even}}$, and $P_{c,\text{last}}$ and $U_{c,\text{last}}$. Thus we have the full color-parity mixer, in which the unitaries $\{U_{c,\pi}\}$ are applied in the order they appear in $P_{\text{CP}}$. The color-parity partition is optimal with respect to the number of parts in the partition (exactly so for even n and up to an additive factor of 2 for odd n). By construction, application of this mixer to any feasible state results in a feasible state, thus satisfying the first design criterion. With regard to the second criterion, while a single application of this mixer will have non-zero transitions only between orderings that swap cities in tour positions no more than two apart, repeating the mixer sufficiently many times results in non-zero transitions between any two states representing orderings. More specifically, since any ordering can be obtained from any other with no more than $n(n-1)/2$ adjacent swaps, alternating between odd and even swaps, $n(n-1)/2$ repeats suffice for any 0 < β < π/2.
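The bound on the number of adjacent swaps can be illustrated classically. The sketch below uses bubble sort, which is an illustrative choice and not part of the mapping, to bring an ordering to the sorted one with adjacent swaps only; transforming between two arbitrary orderings is the same problem after relabeling the cities. The function name is hypothetical.

```python
from itertools import permutations


def adjacent_swaps_to_sort(ordering):
    """Bubble sort the ordering; return the number of adjacent swaps used."""
    seq, swaps = list(ordering), 0
    for end in range(len(seq) - 1, 0, -1):
        for i in range(end):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                swaps += 1
    return swaps


if __name__ == "__main__":
    n = 5
    worst = max(adjacent_swaps_to_sort(p) for p in permutations(range(n)))
    print(worst, n * (n - 1) // 2)   # worst case over all orderings vs. the bound
```

For n = 5 the worst case is the fully reversed ordering, which needs exactly n(n − 1)/2 = 10 adjacent swaps.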

Compilation
Encoding orderings. We encode orderings in two stages: first into strings, then into bits making use of encodings from Sec. 3. Here, we focus on a "direct encoding", as opposed to the "absolute encoding" that will be introduced in Sec. 4.3. Other encodings of orderings are possible, such as the Lehmer code and inversion tables. In the direct encoding, an ordering $\iota = (\iota_1, \ldots, \iota_n)$ is encoded directly as a string in $[n]^n$ of integers. Once in the form of strings, any of the string encodings introduced in Sec. 3 can be applied. We apply the one-hot encoding with $n^2$ binary variables; the binary variable $x_{j,u}$ indicates whether or not $\iota_j = u$ in the ordering, in other words, whether city u is visited at the j-th stop of the tour.
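The direct one-hot encoding can be sketched classically as follows. The sketch assumes 0-indexed stops and cities, takes the objective to be the length of the closed tour, and uses a made-up asymmetric distance matrix; the function names are hypothetical.

```python
def encode_ordering(ordering):
    """x[j][u] = 1 if and only if city u is visited at stop j (both 0-indexed)."""
    n = len(ordering)
    return [[1 if ordering[j] == u else 0 for u in range(n)] for j in range(n)]


def is_valid_encoding(x):
    """Each stop holds exactly one city and each city is visited exactly once."""
    n = len(x)
    return (all(sum(row) == 1 for row in x)
            and all(sum(x[j][u] for j in range(n)) == 1 for u in range(n)))


def tour_length(x, dist):
    """Length of the closed tour, written as a quadratic form in the bits."""
    n = len(x)
    return sum(dist[u][v] * x[j][u] * x[(j + 1) % n][v]
               for j in range(n) for u in range(n) for v in range(n))


if __name__ == "__main__":
    dist = [[0, 1, 2], [3, 0, 4], [5, 6, 0]]   # illustrative, asymmetric
    for ordering in [(0, 1, 2), (1, 2, 0), (0, 2, 1)]:
        x = encode_ordering(ordering)
        print(ordering, is_valid_encoding(x), tour_length(x, dist))
```

The first two orderings are cyclic shifts of one another and give the same tour length, which is the degeneracy the configuration space retains.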
Phase separator. We use the phase function $g(\iota) = 4f(\iota) - (n - 2) \sum_{u=1}^{n} \sum_{v=1}^{n} d(u, v)$, which translates to the encoded phase separator of Eq. (53). The corresponding phase-separating unitary imparts a phase determined by the sum of the distances between successive cities to a state corresponding to a tour. This unitary can be implemented using $n^2(n-1)$ two-qubit gates, which mutually commute. Using the same color-parity partition of the terms as for the color-parity ordering swap mixer, this can be done in depth 2κ ≤ 2n.
Mixer. The individual value-selective ordering swap partial mixer, which swaps cities u, v between tour positions i and j, is expressed in the one-hot encoding in terms of products of four single-qubit operators $S^{\pm}$ acting on the qubits (i, u), (i, v), (j, u), and (j, v). The ith adjacent value-selective swap partial mixer (Eq. (47)) is the special case j = i + 1. Each of the two terms of the form $S^+ S^+ S^- S^-$ in Eq. (58) can be written as a sum of eight terms, each a product of 4 Pauli operators (e.g. XXYY). The color-parity partitioned ordering swap mixer of Eq. (52) can be implemented using $(n-1)\binom{n}{2}$ of these 4-qubit gates, implementable in depth 2κ ≤ 2n in these gates. The 2-qubit gate circuit depth is at most 2κ times the depth of a compilation for such 4-qubit gates.
Initial state. The initial state, an arbitrary ordering, can be prepared from the zero state using at most n single-qubit X gates.

Example: Single Machine Scheduling, Minimizing Total Squared Tardiness
Problem. $(1 | d_j | \sum w_j T_j^2)$ Given a set of n jobs with processing times p, deadlines $d \in \mathbb{Z}_+$, and weights w, find a schedule minimizing the total weighted squared tardiness $\sum_{j=1}^{n} w_j T_j^2$. The tardiness of job j with completion time $C_j$ is defined as $T_j = \max\{0, C_j - d_j\}$. The configuration space and domain are the set of all orderings of the jobs. Given an ordering ι of the jobs in which job i is the $\sigma_i$-th job to start, the corresponding schedule s(ι) is that in which each job starts as soon as the earlier jobs finish: $s_i(\iota) = \sum_{j : \sigma_j < \sigma_i} p_j$. For a job i starting at time $s_i$, consider the expression $(s_i + p_i - d_i + y_i)^2$. When minimized over the "slack" variable $y_i \in [0, d_i - p_i]$, this expression is equal to the square of the tardiness of job i. Therefore, we recast SMS as the minimization of $\sum_{i=1}^{n} w_i (s_i(\iota) + p_i - d_i + y_i)^2$ over the configuration space of orderings ι and slack variables $y_i$. Using the direct one-hot encoding defined in Sec. 4.1.2, in which $x_{j,\alpha}$ indicates that job j is the α-th to start, this is equivalent to the expression in Eq. (61). Note that Eq. (61) may seem to be quartic; however, the encoding constraints $\sum_i x_{i,\alpha} = \sum_\alpha x_{i,\alpha} = 1$ that come with the direct one-hot encoding imply that the quartic terms disappear in the full expansion. The objective function is thus a cubic pseudo-Boolean function, which corresponds to a 3-local diagonal Hamiltonian for the phase separator.
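The role of the slack variable can be checked numerically. The sketch below assumes integer data with $d_i \geq p_i$ and non-negative start times, and takes the slack expression to be $(s_i + p_i - d_i + y_i)^2$; it compares the minimum over integer slack values with the squared tardiness. The function names and the small instance are hypothetical.

```python
def start_times(ordering, p):
    """Each job starts as soon as the earlier jobs in the ordering finish."""
    starts, clock = {}, 0
    for job in ordering:
        starts[job] = clock
        clock += p[job]
    return starts


def tardiness(start, p, d):
    """T = max(0, C - d) with completion time C = start + p."""
    return max(0, start + p - d)


def min_slack_expression(start, p, d):
    """Minimum over integer slack y in [0, d - p] of (start + p - d + y)**2."""
    return min((start + p - d + y) ** 2 for y in range(0, d - p + 1))


if __name__ == "__main__":
    p = {0: 2, 1: 3, 2: 1}
    d = {0: 4, 1: 3, 2: 5}
    starts = start_times((2, 0, 1), p)
    for job in p:
        t = tardiness(starts[job], p[job], d[job])
        assert min_slack_expression(starts[job], p[job], d[job]) == t ** 2
    print(starts)
```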
Mixer and initial state. We use the same initial state preparation, and the same mixer as in TSP for mixing the ordering, in addition to any of the single-qudit mixers from Sec. 3.1.1 for each of the slack variables. Because the ordering and slack mixers act on separate sets of qubits (x and y), they can be implemented in parallel. Note that the only requirement for the upper bound of the range of the slack variable $y_i$ is that it be at least $d_i - p_i + 1$. In particular, it could be $2^{\lceil \log_2(d_i - p_i + 1) \rceil}$, allowing us to use the binary encoding without modification.

SMS, minimizing total tardiness
Problem. $(1 | d_j | \sum w_j T_j)$ Given a set of jobs with processing times p, deadlines $d \in \mathbb{Z}_+$, and weights w, find a schedule minimizing the total weighted tardiness $\sum_{j=1}^{n} w_j T_j$.

Encoding and mixer
The configuration space is the set of orderings of the jobs; the domain is the same.
Absolute and positional encodings. In the "absolute" encoding of the ordering $\iota = (\iota_1, \ldots, \iota_n)$, we assign each item i a value $s_i \in [0, h]$, where the "horizon" h is a parameter of the encodings, such that for all i < j, $s_{\iota_i} < s_{\iota_j}$. In certain cases, there will be an item-specific horizon $h_i$ such that $s_i \in [0, h_i]$. Note that in general the relationship between encoded states and the orderings they encode is not injective, but it will be in the domains to which we apply it. Once the ordering ι is encoded as a string $s(\iota) \in \times_{i=1}^{n} [0, h_i]$, the resulting string can be encoded using any of the string encodings previously introduced. We call the special case of the "absolute" encoding with h = n the "positional" encoding; using the one-hot encoding of the resulting strings, the "direct" and "positional" encodings are the same.
Time-swap mixer. We now introduce a mixer that is specific to the absolute encoding in which there is a single horizon h and each job i has a processing time $p_i$. Let the horizon be $h = \sum_{i=1}^{n} p_i$. Each job can start between time 0 and $h_i = h - p_i$. (Other optimizations may be made on an instance-specific basis, though we neglect those for ease of exposition.) We use a "time swap" partial mixer $H_{\text{TS},t,\{i,j\}}$ that acts on absolutely encoded orderings and swaps jobs i and j when they are scheduled immediately after one another with the earlier one starting at time t. To swap them regardless of the time at which the earlier one starts we use $H_{\text{TS},\{i,j\}} = \sum_{t} H_{\text{TS},t,\{i,j\}}$. The simultaneous time swap mixer is constructed as usual, $e^{-i\beta H_{\text{TS}}}$, where $H_{\text{TS}} = \sum_{\{i,j\} \in \binom{[n]}{2}} H_{\text{TS},\{i,j\}}$. Note that the simultaneous versions of the total time swap and adjacent permutation mixers are exactly the same, whereas the individual partial mixer $H_{\text{TS},t,\{i,j\}}$ has no equivalent that acts on the unencoded ordering, because it acts depending on the total processing times of the preceding jobs rather than their number. Now consider the "time-color" ordered partition $P_{\text{TC}} = [0, h] \times [\kappa]$, where, as in Sec. 4.1.1, we use a particular κ-edge-coloring of the complete graph. (Again, further optimizations may be made on an instance-specific basis.) The part $P_{t,c}$ contains the partial time swap mixers $U_{\text{TS},t,\{i,j\}}$ for which the edge {i, j} is colored c. The full "time-color" mixer is the product, in the order given by $P_{\text{TC}}$, of the unitaries $U_{(t,c)\text{-TS}}$, where $U_{(t,c)\text{-TS}}$ is the product of the (mutually commuting) partial mixers in the part $P_{t,c}$.
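The classical move behind the time swap partial mixer can be sketched as follows. The sketch represents a schedule as a dictionary of start times, swaps jobs i and j only when they run back to back with the earlier one starting at time t, and otherwise leaves the schedule unchanged. The function names and the three-job instance are hypothetical.

```python
def time_swap(starts, p, i, j, t):
    """Swap back-to-back jobs i and j whose earlier job starts at time t."""
    new = dict(starts)
    if starts[i] == t and starts[j] == t + p[i]:      # i runs first, then j
        new[j], new[i] = t, t + p[j]
    elif starts[j] == t and starts[i] == t + p[j]:    # j runs first, then i
        new[i], new[j] = t, t + p[i]
    return new                                        # unchanged if neither case applies


def no_overlap(starts, p):
    """True if no two jobs occupy the machine at the same time."""
    spans = sorted((starts[job], starts[job] + p[job]) for job in starts)
    return all(end <= nxt for (_, end), (nxt, _) in zip(spans, spans[1:]))


if __name__ == "__main__":
    p = {0: 2, 1: 3, 2: 1}
    starts = {0: 0, 1: 2, 2: 5}                 # ordering (0, 1, 2)
    swapped = time_swap(starts, p, 0, 1, 0)     # jobs 0 and 1 are adjacent at t = 0
    print(swapped, no_overlap(swapped, p))
    print(time_swap(starts, p, 0, 2, 0))        # jobs 0 and 2 are not adjacent
```

Because the new start time of the later job depends on the processing time of the other job, the move is defined in terms of times rather than positions in the ordering.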

Mapping and compilation
We use the absolute one-hot encoding, in which the ordering is encoded as a string using the absolute encoding and then the string is further encoded using the one-hot encoding. Specifically, we encode an ordering $\iota$ into $\sum_i (h_i - 1) \le nh$ qubits, where qubit $(i, t)$ indicates if job $i$ starts at time $t$.
Phase separator. The objective function is the weighted total tardiness, Eq. (69), which yields the encoded phase Hamiltonian of Eq. (70).
Mixer. The partial time-swap mixer in the absolute one-hot encoding is equivalent to Using the time-color partitioned time-swap mixer, this corresponds to a circuit of h n
Initial state. We use an arbitrary ordering of the jobs as the initial state.
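As a classical reference point for this mapping, the objective can be evaluated directly from an ordering. The sketch below is illustrative only: it assumes jobs run back to back from time 0 with no idle time, takes the tardiness of job $j$ to be the standard $\max\{0, s_j + p_j - d_j\}$, and uses hypothetical function names.

```python
def start_times(order, p):
    """Absolute encoding of an ordering: each job starts when its predecessor ends."""
    s = [0] * len(p)
    t = 0
    for job in order:
        s[job] = t
        t += p[job]
    return s


def weighted_total_tardiness(order, p, d, w):
    """Sum over jobs of w_j * max(0, completion time of j - deadline of j)."""
    s = start_times(order, p)
    return sum(w[j] * max(0, s[j] + p[j] - d[j]) for j in range(len(p)))


p = [3, 2, 4]          # processing times
d = [4, 2, 6]          # deadlines
w = [1, 2, 3]          # weights
order = [1, 0, 2]      # job 1 first, then job 0, then job 2
s = start_times(order, p)
cost = weighted_total_tardiness(order, p, d, w)
```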

SMS, with release dates
Problem. $(1 \mid d_j, r_j \mid *)$ Given a set of jobs with processing times $p$, deadlines $d$, release times $r \in \mathbb{Z}^+$, and weights $w$, find a schedule that minimizes some function of the tardiness, such that each job starts no earlier than its release time.
We now consider SMS with release dates $\{r_j\}$. We will not specify the objective function here, as any of those used in previous sections are still applicable. Our focus in this section is on introducing a modification of the time swap mixer that preserves feasibility.
Let the horizon $h$ be some upper bound on the maximum completion time. For each job $j$, let $W_j$ be the window of times in which job $j$ can start, and let $b_j$ be its buffer time slot.
Consider the configuration space $\times_{j=1}^{n} [W_j \cup \{b_j\}]$, i.e. schedules $\{s\}$ in which a job is scheduled either between its release time and the horizon or at its buffer time slot. The domain is the subset of the configuration space that satisfies the problem constraint that no two jobs overlap.

Partial Mixer: controlled-null-swap mixer
We now introduce a mixer, specifically a controlled null-swap mixer like the one used for graph coloring in Sec. 3.3, that preserves feasibility and avoids getting "stuck". In its definition, $\mathrm{nbhd}_{i,t}(j)$ is the temporal "neighborhood" of job $i$ at time $t$ with respect to job $j$, i.e. the set of times at which starting job $j$ would conflict with job $i$, and $s_{\mathrm{nbhd}_{i,t}} = \{ s_{j,t'} \}_{t' \in \mathrm{nbhd}_{i,t}(j),\, j \neq i}$.
The role of the buffer site is similar to the "uncolored" option in finding the maximal colorable induced subgraph in Sec. 3.3. Such mixing terms enable jobs to move freely in and out of $[0, T_0]$ without inducing job overlap, and hence enable exploration of the whole feasible subspace. The controlled unitary notation $\Lambda_y(Q)$ is defined in Eq. (18) in Sec. 3.2.

Encoding and compilation
Given the $\sum_{i=1}^{n} |W_i|$ partial mixers, one for each job $i$ and time $t \in W_i$, we can define simultaneous and sequential mixers as above. Using the one-hot encoding, the partial mixer becomes a controlled null swap, which can be further compiled using ancilla qubits, as in Section 3.2. While for graph coloring the cost of compilation could be bounded based on the degree of the graph, here the overlap of the various partial mixers may be more complicated (with respect to partitioning into disjoint sets) and expensive (with respect to number of gates and ancilla qubits), depending on the SMS instance.
Objective function. As an example objective function, we again consider minimizing the weighted total tardiness, Eq. (69). The one-hot encoded phase Hamiltonian takes the form of Eq. (70), with $b_j$ included in the summation range of $t$. It can be implemented with $\sum_j (h_j - d_j + 1)$ single-qubit Z-rotation gates.
Initial state. Any feasible schedule can be used as the initial state. In particular, we use a greedy earliest-release-date schedule. Assume without loss of generality that the jobs are ordered by their release times, i.e. $r_1 \le r_2 \le \cdots \le r_n$. Then set $s_1 = r_1$ and recursively set $s_i = \max\{r_i, s_{i-1} + p_{i-1}\}$, which is feasible if likely suboptimal.
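The greedy recursion above can be sketched in a few lines. This is a minimal illustration under the stated assumptions (integer, nonnegative release times; jobs processed in order of release time); the function name `greedy_erd_schedule` is hypothetical.

```python
def greedy_erd_schedule(r, p):
    """Greedy earliest-release-date schedule.

    r[j] is the release time and p[j] the processing time of job j.
    Returns start times s with s[j] >= r[j] and no two jobs overlapping.
    """
    order = sorted(range(len(r)), key=lambda j: r[j])  # sort jobs by release time
    s = [0] * len(r)
    free_at = 0                                        # time the machine becomes free
    for j in order:
        s[j] = max(r[j], free_at)                      # s_i = max{r_i, s_{i-1} + p_{i-1}}
        free_at = s[j] + p[j]
    return s


r = [2, 0, 3]
p = [2, 3, 1]
s = greedy_erd_schedule(r, p)
```

In this example job 1 is released first and runs during [0, 3), job 0 runs during [3, 5), and job 2 runs during [5, 6).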

Mapping variants
In the construction above, each job $j$ is assigned a "buffer" time $b_j$, and a phase $b_j - d_j$ is applied whenever that job is scheduled at its buffer time. The factor $b_j - d_j$ is arbitrary. Rather than consider schedules in which some jobs are at their buffer time, one could consider "partial schedules", in which a job is either scheduled at a time between 0 and $h$ or is in its buffer. The phase applied when a job is in its buffer need not be $b_j - d_j$ but in fact can be arbitrary, e.g. some common "buffer phase factor" $B$. In this case, we must define a scheme for associating each partial schedule with a canonical complete schedule, e.g. greedily starting the buffered jobs after those that are already scheduled. In this way, the states corresponding to partial schedules can still be considered as part of the domain.

Conclusions
We introduced a Quantum Alternating Operator Ansatz (QAOA), an extension of Farhi et al.'s Quantum Approximate Optimization Algorithm, and showed how to apply the ansatz to a diverse set of optimization problems with hard constraints. The essence of this extension is the consideration of general parameterized families of unitaries rather than only those corresponding to the time evolution of a local Hamiltonian, which allows it to represent a larger, and potentially more useful, set of states than the original formulation. In particular, refocusing on unitaries rather than Hamiltonians in the specification allows for more efficiently implementable mixing operators.
The original algorithm is already a leading candidate quantum heuristic for exploration on near-term hardware; our extension makes early testing on emerging gate-model quantum processors possible for a wider array of problems at an earlier stage. After formally introducing the ansatz, and providing design criteria for its components, we worked through a number of examples in detail, illustrating design techniques and exhibiting a variety of mixing operators. In the appendix, we provide a compendium of QAOA mappings for over 20 problems, organized by type of mixer.
While the approach of designing mixing operators to keep the evolution within the feasible subspace appears quite general, as illustrated by the wide variety of examples we have worked out, it is not universally applicable. Many of the problems in Zuckerman [79] have the form of optimizing a quantity within a feasible subspace consisting of the solutions to an NP-complete problem. Not only is an initial state (corresponding to one or more feasible solutions) hard to find, designing the mixing operator is also problematic. Even given a set of feasible solutions to an NP-complete problem, it is typically computationally difficult to find another [61], making it difficult to design a mixer that fully explores the feasible subspace. The situation here is somewhat subtle: it is easy to show in the case of SAT that finding a second solution when given a first remains NP-complete, but for Hamiltonian cycle on cubic graphs, given a first solution, a second is easy to find (but not a third). See [70,76] for results on the complexity of "another solution problems (ASPs)." This difficulty in mapping Zuckerman's problems to QAOA further illustrates reasons for the difficulty of these approximate optimization problems.
While we have given basic design criteria for initial states, mixing operators, and phase-separation operators, we have barely scratched the surface in terms of which possibilities perform better than others. For most of the example problems, we discussed multiple mixers, coming from different partitions and orderings of partial mixers, different choices related to connectivity, or different numbers of repeats of operators. Analytical, numerical, and ultimately experimental work is required to understand which of these mixers are most effective, and also to determine potential trade-offs with respect to robustness to noise or control error, efficiency of implementation, effectiveness of the mixing, and difficulty in optimizing the parameters. Similar questions arise with respect to choosing an initial state.
Effective parameter setting remains a critical, but mostly open, area of research. While for fixed $p$, brute-force search of the parameter space was proposed [22], it is practical only for small $p$; as $p$ increases, the parameter optimization becomes inefficient due to the curse of dimensionality, with Guerreschi et al. [30] providing a detailed analysis. In certain simple [73] or highly symmetric cases [41], some insight into parameter setting for $p > 1$ has been obtained, but even in the simplest cases, understanding good choices of parameters seems non-trivial [73]. Improved parameter setting protocols may come from adapting techniques from existing control theory and parameter optimization methods, and by using insights gained from classical simulation of quantum circuits and experimentation on quantum hardware as it becomes available. In particular, classical simulation of the quantum circuits can take advantage of the local structure of the objective function (when it is indeed local) and the feasibility of classically simulating the measurement of $\log(n)$ qubits in a QAOA circuit by adapting results for IQP circuits [11].
To run on near-term quantum hardware, further compilation will be required. In many cases, we have left the compilation at the level of multiqubit operators that need to be further compiled to the gate set natively available on the hardware, most likely certain one- and two-qubit gates. Furthermore, near-term hardware will have additional restrictions, including which qubits each gate can be applied to, duration and fidelity of the gates, and cross-talk, among others. This necessitates additional compilation, especially to optimize success probability on pre-fault-tolerance devices. Other architectures, e.g. ones based on higher-dimensional qudits, may prompt other sorts of compilations as well. Recently, approaches for compiling circuits, including QAOA circuits, to realistic hardware have been explored [71], but that research direction remains open to innovation.
The robustness of these circuits to realistic error requires further exploration. In the near term, the question is how robust QAOA circuits are to realistic noise and how best to incorporate resource-efficient techniques to improve the robustness. In the longer term, the question becomes how to best incorporate the full spectrum of error correction techniques into this framework.
We expect some cross-fertilization between research on QAOA and research on AQO and quantum annealing, especially in the $p \gg 0$ regime. A particularly fruitful area of further study would be to build on the results of Yang et al. [75]. They used Pontryagin's minimization principle to show that a bang-bang schedule is (essentially) always optimal, providing support for QAOA, and gate-based approaches generally. But the argument does not provide an efficient means to find such a schedule. Thus, it remains open in the AQO realm how to find effective schedules, and it is currently completely open whether allowing non-bang-bang schedules makes finding good schedule parameters easier. Similarly, exploiting certain structural commonalities with VQE is likely to be fruitful. In the small $p$ regime, we expect further cross-fertilization between QAOA and other models being considered for early quantum supremacy experiments, such as random circuits, boson sampling, and IQP circuits. Results for IQP circuits, whose close structural similarities with QAOA circuits have already provided insights [21], and which are at an advanced stage with respect to error analysis and fault tolerance [12,13], are likely to prove especially useful.
A number of extensions are possible. We briefly mention two classes. The first is hybrid methods. As one example, some other algorithm, either classical or quantum, could be run first in order to provide a good initial state, and QAOA then used to improve upon it; this could be repeated several times with different initial states when the other algorithm is stochastic. Another example of a hybrid method is a parameter-setting protocol in which a classical algorithm uses the results of measurements during and after runs of a QAOA circuit to iteratively improve the parameters. A second class of extensions is versions of QAOA with many parameters. For example, we introduced mixing operators that are repeated applications of some basic mixer; each application could use a different parameter. The same could be done with mixers consisting of many partial mixers. Until we have a better grasp of parameter setting in the single-parameter-per-mixing-operator case, and of the effectiveness of the different mixers, it makes sense to restrict to the simple case. The one exception is to take advantage of the specific gates natively available, and to use essentially a VQE approach, as suggested in [24]. That approach makes excellent use of the capabilities of near-term hardware but may be more limited than single-parameter-per-phase QAOA in what it can tell us about parameter setting and the design of scalable quantum heuristics.
The biggest open question is the effectiveness of this approach as a quantum heuristic, and its potential impact in broadening the array of established applications of quantum computing. While obtaining further analytic results may be possible in some cases, in general we will have to try it out and see. Improved simulation techniques for quantum circuits, including potentially approaches tailored to QAOA circuits, can provide some insight, but the ultimate test will be experimentation on quantum hardware itself.

Acknowledgements
The authors would like to thank the other members of NASA's QuAIL group for useful discussions about QAOA-related topics, particularly Salvatore Mandrà and Zhang Jiang.

A Compendium of mappings and compilations
We summarize QAOA mappings for a variety of problems. All problems considered are in NPO [3,69]. Problem names are compact versions of the names in Ausiello et al. [3] unless a different reference is given. Most of the mappings are new to this paper; for the exceptions, a reference is given, though in all such cases only H-QAOA mappings have been considered previously. Approximability results are taken from Ausiello et al. [3] unless otherwise mentioned.
We specify partial mixing Hamiltonians for each problem, which can then be used to define two types of full mixers: simultaneous mixers in H-QAOA, which correspond to time evolution under the sum of the partial mixing Hamiltonians, and partitioned mixers (which are not in H-QAOA), which correspond to products of unitary operators defined by all the partial mixers in an order specified by an ordered partition, possibly repeated. We will not specify the full mixers, since they are straightforward to derive from the partial mixers. For each problem, we provide a compilation with resource counts for at least one mixer or a class of mixers. In most cases, we mention only one partial mixer and compilation, though as we have seen, there are many possibilities for partial mixers, and for compilations of the operators.
The resource counts given are upper bounds. We have not worked to find optimal compilations. For the simpler problems that have the potential for implementation in the near term, we give exact resource bounds when we have such results, rather than only giving the complexity. We give bounds for the number of computational qubits required, and also for the number of ancilla qubits, when used. We give resource counts for the number of arbitrary two-qubit gates required to implement a given operator, merging single-qubit operations into two-qubit gates before or after when possible. We compute depth for such circuits, which gives a lower bound on the depth on near-term hardware, which will generally be higher due to topological constraints. In many cases, we will not compile all the way down to two-qubit gates, but will instead specify the resource count in terms of multiqubit operators, which would then yield a two-qubit gate count and depth given compilations of those multiqubit mixers. Throughout, the initial state is always obtained by applying a constant-depth circuit to the state $|0\rangle^{\otimes *}$.
An algorithm $A$ achieves an approximation ratio $r \le 1$ if for all instances $x$ of a maximization problem it satisfies $A(x)/\mathrm{OPT}(x) \ge r$, where $\mathrm{OPT}(x)$ is the optimal solution of the given instance. Both cases in which $r$ is constant and in which $r$ is a function of the problem parameters are considered. For minimization problems the approximation ratio is similarly defined with $r \ge 1$ and $A(x)/\mathrm{OPT}(x) \le r$. Unfortunately, there are multiple conventions in the literature, with the approximation ratio given as $1/r$ for minimization, maximization, or both. Fortunately, for a given value of $r$ there is no ambiguity as to which convention is being used. Here, instead of being internally consistent, we will simply take the ratio as stated in the reference cited so as to facilitate easy comparison with the literature. All approximation results below concern efficient classical algorithms.
For all problems on a graph $G = (V, E)$, let $|V| = n$, $|E| = m$, and $D_G$ be the maximum vertex degree.

A.1 Bitflip (X) mixers
In this section we consider problems where all states of the configuration space are feasible, and hence the original QAOA construction may be used. Specifically, for all problems we can use the following: Variables: $n$ binary variables. Initial state: $|0\rangle^{\otimes n}$. Mixer: The standard mixer $U^{\mathrm{H}}_{M}(\beta) = e^{-i\beta B}$, where $B = \sum_{j=1}^{n} X_j$, which can be implemented with depth 1. (Since all terms commute and act on separate qubits, we do not need to consider partitioned variants of the standard mixer.) Phase separator: $U_P(\gamma) = \exp[-i\gamma H_P]$; we will specify $H_P$ for each problem. All problems in this section can be trivially extended to their weighted version by multiplying the terms of the phase Hamiltonian by the corresponding weights.
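The depth-1 claim for the standard mixer rests on the fact that the exponential of a sum of commuting single-qubit terms factorizes into a tensor product of single-qubit rotations. The following numerical check for two qubits is a sketch only; the helper names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def evolve(H, beta):
    """Return exp(-i * beta * H) for a Hermitian matrix H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * beta * w)) @ V.conj().T


def single_qubit_mixer(beta):
    """Closed form exp(-i * beta * X) = cos(beta) I - i sin(beta) X."""
    return np.cos(beta) * I2 - 1j * np.sin(beta) * X


beta = 0.37
B = np.kron(X, I2) + np.kron(I2, X)          # B = X_1 + X_2 on two qubits
full = evolve(B, beta)                       # exp(-i beta B) computed directly
layered = np.kron(single_qubit_mixer(beta),  # one layer of single-qubit rotations
                  single_qubit_mixer(beta))
```

The two matrices agree up to numerical precision, so the mixer is a single layer of one-qubit gates.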

A.1.1 Maximum cut
Problem: Given a graph $G = (V, E)$, find a subset $S \subset V$ such that the number of edges between $S$ and $V \setminus S$ is the largest. Prior QAOA work: [22]. Approximability: APX-complete [45,60]. NP-hard to approximate better than 16/17 [35]. Semidefinite programming achieves 0.8785 [27], which is optimal under the unique games conjecture [46]. On bounded degree graphs with $D_G \ge 3$ can be approximated to within $0.8785 + O(D_G)$ [25], in particular 0.921 for $D_G = 3$, but remains APX-complete [60]. Configuration space, Domain, and Encoding: $\{0, 1\}^n$, indicating whether each vertex is in $S$ or not. Objective: $\max |\{\{u, v\} \in E : u \in S, v \notin S\}|$. Phase separator: Resource count for phase separator: $m$ gates with depth at most $D_G + 1$. Variant: Directed-MaxCut (Max-Directed-Cut), where we seek to maximize the number of the directed edges leaving $S$. The phase separator needs to be replaced by

A.1.5 Set Splitting
Problem: Given a set $S$ and a collection of subsets $\{S_j\}_{j=1}^{m}$, seek a partition $S = S_1 \cup (S \setminus S_1)$ that maximizes the number of split sets, i.e. $S_j$ with elements in both $S_1$ and $S \setminus S_1$. Approximability: APX-complete [63]. Can be approximated to 0.7499 [78]. Remains APX-complete if each $S_j$ is restricted to have at most or exactly $k \ge 2$ elements [51]. For each $S_j$ having exactly $k \ge 4$ elements, unless P=NP there is no efficient classical algorithm that does essentially better than a random partition [31,36]. The generalization MaxHypergraphCut, in which each subset is given a weight and we seek to maximize the total weight of the split sets, can be approximated to 0.7499 [78]. Reduction to Not-All-Equal--SAT: This problem is a special case of Not-All-Equal--SAT where none of the variables are negated.

A.1.6 MaxE3LIN2
Problem: Given a set of $m$ three-variable equations $A = \{A_j\}$, over $n$ binary variables $x \in \{0, 1\}^n$, where each equation $A_j$ is of the form $x_{a_{1,j}} + x_{a_{2,j}} + x_{a_{3,j}} = b_j \bmod 2$ where $a_{1,j}, a_{2,j}, a_{3,j} \in [n]$ and $b_j \in \{0, 1\}$, find an assignment $x \in \{0, 1\}^n$ that maximizes the number of satisfied equations. Prior QAOA work: [23]. Approximability: No efficient $(1/2 + \epsilon)$ classical approximation algorithm unless P=NP [35]. Objective: number of equations satisfied. Phase Separator:

Configuration space: The configuration space of each problem is the set of subsets $V'$ of some set $V$, represented by a bitstring. Constraint graph: An instance of each problem is either specified by a graph or has a natural corresponding constraint graph whose vertices correspond to the variables and with respect to which each variable $v$ has a neighborhood $\mathrm{nbhd}(v)$. Domain: The domain is the subset of the configuration space that satisfies some CNF formula (whose clauses correspond to the edges of the problem or constraint graph, except for Min-SetCover). Objective: The cardinality of the subset. Mixing rule: Swap an element $v$ in or out of $V'$ if some predicate $\chi(x_{\mathrm{nbhd}(v)})$ is satisfied by the partial state of its neighbors. (The predicate $\chi$ will depend on the problem.) Partial mixing Hamiltonian: $H_{\mathrm{CX},v} = X_v H_{P_v}$, which can be used to define both a simultaneous controlled-bit-flip mixer and a class of partitioned controlled-bit-flip mixers. (See Sec. 3.2.)

A.2.1 MaxIndependentSet [See Sec. 3.2]
Problem: Given $G = (V, E)$, maximize the size of a subset $V' \subset V$ of mutually nonadjacent vertices. Prior QAOA work: [22]. Also, see Sec. 3.2 for a detailed discussion of the mapping. Approximability: Poly-APX-complete [7], and has no constant factor approximation unless P = NP. On bounded degree graphs with maximum degree $D_G \ge 3$ can be approximated to $(D_G + 2)/3$ [7], but remains APX-complete [60]. Configuration space: All subsets $V'$ of $V$, represented by elements $x$ of $\{0, 1\}^n$, with $x_v = 1$ if and only if $v \in V'$.

Mixing rule:
Swap $v$ in or out of $V'$ if none of its neighbors are in $V'$. Partial mixing Hamiltonian: $H_{\mathrm{CX},v}$, as discussed in Sec. 3.2. Phase separator: $U_P(\gamma) = \exp(-i\gamma \sum_{u \in V} Z_u)$. Initial state: $|s\rangle = |0\rangle^{\otimes n}$, i.e. $V' = \emptyset$. Resource count:
• Controlled-bit-flip mixers: $n$ multiqubit-controlled-$X(\beta)$ gates, each with at most $D_G$ controls (exactly $D_v$ controls for each vertex $v$). Depth at most $n$, but will be much less for sparsely connected graphs.
• Phase separator: $n$ single-qubit Z-rotations. Depth 1.
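The mixing rule can be illustrated classically on bitstrings: a vertex is flipped in or out of the set only when none of its neighbors is currently in the set, so an independent set stays independent. This is a sketch of the classical move underlying the controlled-bit-flip mixer, not of the quantum gate; the adjacency-dictionary representation and function names are assumptions made for illustration.

```python
def mis_partial_move(adj, x, v):
    """Flip x[v] if and only if no neighbor of v is in the set; otherwise do nothing."""
    if any(x[w] for w in adj[v]):
        return list(x)            # control condition fails: state is unchanged
    y = list(x)
    y[v] ^= 1                     # move v in or out of the set
    return y


def is_independent(adj, x):
    """True if no two adjacent vertices both have x = 1."""
    return not any(x[v] and x[w] for v in adj for w in adj[v])


# Path graph 0 - 1 - 2, starting from the empty set.
adj = {0: [1], 1: [0, 2], 2: [1]}
x0 = [0, 0, 0]
x1 = mis_partial_move(adj, x0, 0)   # vertex 0 enters the set
x2 = mis_partial_move(adj, x1, 1)   # blocked: neighbor 0 is in the set
x3 = mis_partial_move(adj, x2, 2)   # vertex 2 enters the set
x4 = mis_partial_move(adj, x3, 0)   # vertex 0 leaves the set
```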

A.2.2 MaxClique
Problem: Given $G = (V, E)$, maximize the size of a clique (a subset $V' \subset V$ that induces a subgraph in which all pairs of vertices are adjacent). Approximability: Cannot be classically approximated better than $O(n^{1-\epsilon})$ for any $\epsilon > 0$ unless P = NP [80]. Configuration space: All subsets $V'$ of $V$, represented by elements $x$ of $\{0, 1\}^n$, with $x_v = 1$ if and only if $v \in V'$. Reduction to MaxIndependentSet: Every clique on $G = (V, E)$ gives an independent set on the complement graph $\bar{G} = (V, E(\bar{G}))$, where $E(\bar{G}) = \binom{V}{2} \setminus E$. Therefore, a mapping for MaxClique is given by the mapping for MaxIndependentSet applied to the complement graph $\bar{G}$. Resource count: The same resources as for MaxIndependentSet, except that the controlled bit flip mixers are now multiqubit-controlled-X gates with at most $n - D_G - 1$ controls (exactly $n - D_v - 1$ controls for each vertex $v$). Variants: Extends easily to weighted versions of MaxIndependentSet, with objective func-

A.2.3 MinVertexCover
Problem: Given $G = (V, E)$, minimize the size of a subset $V' \subset V$ such that for every $(u, v) \in E$, $u \in V'$ or $v \in V'$. Approximability: APX-complete [60]. Has a $(2 - \Theta(1/\sqrt{\log n}))$-approximation [43], but cannot be approximated better than 1.3606 unless P=NP [18]. Configuration space: All subsets $V'$ of $V$, represented by elements $x$ of $\{0, 1\}^n$, with $x_v = 1$ if and only if $v \in V'$. Objective: $\min |V'|$. Initial state: $|s\rangle = |1\rangle^{\otimes n}$, i.e. $V' = V$. Mixing rule: Swap $v$ in or out of $V'$ if all of the edges incident to $v$ are covered by $V' \cap \mathrm{nbhd}(v)$. Phase Separator: $U_P(\gamma) = \exp(-i\gamma \sum_{u \in V} Z_u)$. Reduction to MaxIndependentSet: A subset $V' \subset V$ is a vertex cover if and only if $V \setminus V'$ is an independent set, so the problem of finding a minimum vertex cover is equivalent to that of finding a maximum independent set. While as approximation problems they are not equivalent [69], we can use the same mapping as for MaxIndependentSet with each $\bar{x}_v$ replaced by $x_v$. The resource counts are the same as for MaxIndependentSet. Reduction to MinSat: Marathe et al. [54] give an approximation-preserving reduction to Min-$D_G$-SAT, enabling us to use the QAOA construction given above for MinSat. The resource counts are the same as for MinSat with $m$ variables and $n$ clauses.
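The two reductions to MaxIndependentSet used in A.2.2 and A.2.3 can be checked by brute force on a small graph. The sketch below is illustrative only: vertices are assumed to be labeled 0 to n-1, and all helper names are hypothetical.

```python
from itertools import combinations


def complement_edges(n, edges):
    """Edges of the complement graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def is_independent_set(edges, subset):
    return not any(u in subset and v in subset for u, v in edges)


def is_clique(edges, subset):
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present
               for pair in combinations(sorted(subset), 2))


def is_vertex_cover(edges, subset):
    return all(u in subset or v in subset for u, v in edges)


n = 4
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]      # a triangle with a pendant vertex
co_edges = complement_edges(n, edges)
vertices = set(range(n))
subsets = [set(c) for k in range(n + 1) for c in combinations(range(n), k)]

# Cliques of G are exactly the independent sets of the complement graph.
clique_reduction_ok = all(
    is_clique(edges, s) == is_independent_set(co_edges, s) for s in subsets)
# Vertex covers of G are exactly the complements of independent sets of G.
cover_reduction_ok = all(
    is_vertex_cover(edges, s) == is_independent_set(edges, vertices - s)
    for s in subsets)
```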

Resource count:
• Controlled-bit-flip mixers: Each partial mixer $H_{\mathrm{CX},j}$ is implemented as a controlled-$R_X$ gate with $D_j$ control qubits. Partial mixer depth at most $m$.
• Phase separator: $m$ single-qubit Z-rotations. Depth 1.

Resource count:
• Controlled-bit-flip mixers: For each $H_{\mathrm{CX},j}$, use $|S_j|$ ancilla qubits. Use each ancilla qubit $i$ to compute $\bigvee_{\ell \in \mathrm{nbhd}(j) : i \in S_\ell} x_\ell$ using a controlled NOT gate with $|\{\ell \in \mathrm{nbhd}(j) : i \in S_\ell\}| \le |\mathrm{nbhd}(j)| = D_j$ control qubits. Then implement $H_{\mathrm{CX},j}$ using a controlled X gate on qubit $j$ with the $|S_j|$ ancilla qubits as the control. Finally, uncompute the ancilla qubits using the same $|S_j|$ controlled NOT gates as in the first step. Depth at most $2D_j + 1$ per partial mixer.
• Phase separator: $m$ single-qubit Z-rotations. Depth 1.
• Initial state: Depth 0. Variants: Equivalent to Minimum Hitting Set [3] and under L-reductions equivalent to Minimum Dominating Set (which is a special case of Minimum Set Cover).

A.3 XY mixers
Problem: Given a graph $G$ and $\kappa$ colors, maximize the size (number of edges) of a properly colored subgraph. Approximability: A random coloring properly colors a fraction $1 - 1/k$ of edges in expectation. Equivalent to MaxCut for $k = 2$. For $k > 2$, semidefinite programming gives a $\left(1 - 1/k + (2 + o(k)) \frac{\ln k}{k^2}\right)$-approximation [26], which is optimal up to the $o(k)$ factor under the unique games conjecture [46]. APX-complete for $k \ge 2$ [60] and no PTAS unless P=NP [63].
where ADD i (z) adds z to the register i encoding an integer in binary.

A.3.2 Graph Partitioning (Minimum Bisection)
Problem: Given a graph $G$ such that $n$ is even, find a subset $V_0 \subset V$ satisfying $|V_0| = n/2$ such that the number of edges between $V_0$ and $V \setminus V_0$ is minimized. Prior AQO work: Studied in the context of AQO for constrained optimization [39]. Approximability: An efficient $O(\log^{1.5} n)$-approximate algorithm is known [48].

Resource count:
• Number of qubits: n.

A.3.3 Maximum Bisection
Problem: Given a graph $G$ such that $n$ is even, and edge weights $w_j$, find a subset $V_0 \subset V$ satisfying $|V_0| = n/2$ such that the total weight of edges crossing between $V_0$ and $V \setminus V_0$ is maximized. Approximability: A random bisection gives a 0.5-approximation in expectation, improved to 0.65 [26]. Mapping: The same as for Graph Partitioning in Sec. A.3.2 with weights included in the phase separator.

A.3.4 Maximum Vertex k-Cover
Problem: Variant of the Vertex Cover optimization problem. Given a graph $G$ and an integer $k \le n$, find a subset $V_0 \subset V$ of size $|V_0| = k$ such that the number of edges covered by $V_0$ is maximized. Approximability: It is NP-hard to decide whether or not a fraction $(1 - \epsilon)$ of the edges can be $k$-covered [63]. Mapping: The same as for Graph Partitioning in Sec. A.3.2 with the Hamming weight $n/2$ replaced by $k$.

A.4 Controlled-XY mixers
Problem: Given a graph $G$ and $\kappa$ colors, maximize the size of a subset of vertices $V' \subset V$ whose induced subgraph is $\kappa$-colorable. Approximability: Equivalent to MaxIndependentSet for $k = 1$. Both as easy and as hard to approximate as MaxIndependentSet for $k \ge 1$ [3,59]. On bounded degree graphs can be approximated to $(D_G/k + 1)/2$ [33], but remains APX-complete. Configuration space: $[\kappa + 1]^n$. (0-th color is "uncolored" and represents v /

Resource count:
• Partitioned controlled null-swap mixers: $n\kappa$ partial mixers, each acting on at most $D_G + 1$ qubits. Depth at most $n\kappa$, but will be much less for sparsely connected graphs.
• Phase separator: $n$ single-qubit Z-rotations. Depth 1.

A.4.2 MinGraphColoring (Sec. 3.4)
Problem: Given a graph $G$, minimize the number of colors required to properly color it. Approximability: The best classical algorithm [32] achieves approximation ratio $O\!\left(n \frac{(\log \log n)^2}{\log^3 n}\right)$, and we cannot do better than $O(n^{1-\epsilon})$ for any $\epsilon > 0$ unless P=NP [80]. For edge-colorings of multigraphs, there is a $(1.1 + 0.8/\kappa^*)$-approximate algorithm [57]. Configuration space: $[\kappa]^n$, $\kappa = D_G + 2$. Domain: Proper $\kappa$-colorings of $G$ (many of which use fewer than $\kappa$ colors), $\{x : \bigwedge_{\{i,j\} \in E} \mathrm{NEQ}(x_i, x_j)\}$. Objective: Minimize number of used colors: $\sum_{a=1}^{\kappa} \mathrm{OR}(\mathrm{EQ}(x_1, a), \ldots, \mathrm{EQ}(x_n, a))$. Mixing rule: The color of vertex $u$ may be swapped between colors $c$ and $c'$ if none of its neighbours are already colored $c$ or $c'$. Partial mixing Hamiltonian: Controlled-swap partial mixing Hamiltonian. Encoding: One-hot. Phase Separator: $\prod_{a=1}^{\kappa} \Lambda_{\mathrm{OR}(x_{[n],a})}(e^{-i\gamma})$. Resource count:
• Partitioned controlled-swap mixers: $\kappa(\kappa - 1)n/2$ controlled gates on no more than $D_G + 2$ qubits.
• Phase separator: $\kappa$ partial phase separators acting on $n + 1$ qubits, one target qubit and $n$ control qubits. Depth 2 in partial phase separators, or depth 1 with the addition of $\kappa$ ancilla qubits.
• Initial state: Any valid $\kappa$-coloring (can be efficiently computed classically). Can be implemented in depth 1 using $n$ single-qubit X gates.
Reduction from MinEdgeColoring: In MinEdgeColoring, the objective is to minimize the number of colors needed to color the edges so that no two adjacent edges have the same color. This is equivalent to MinGraphColoring on the line graph.

A.4.3 MinCliqueCover
Problem: Given a graph $G$, we seek the smallest collection of cliques $S_1, \ldots, S_k \subset V$, such that every vertex belongs to at least one clique. Approximability: If MaxClique is approximable within $f(n)$ for a given instance then MinCliqueCover is approximable to $O(f(n))$ [3]. Not approximable within $n^{\epsilon}$ for any $\epsilon > 0$ [53]. Reduction to MinGraphColoring: A partition of the vertices of $G$ is a $k$-clique cover if and only if it is a proper $k$-coloring of the complement graph $\bar{G} = (V, E^c)$, and moreover the smallest clique cover corresponds to the chromatic number of the complement graph. Thus the previous construction suffices.

Problem: Given a set of $n$ cities and distances $d : [n]^2 \to \mathbb{R}_+$ (with $d(i, i) = 0$), find an ordering of the cities that minimizes the total distance traveled on the corresponding tour. Approximability: NPO-complete [58]. MetricTSP is APX-complete [62] and has a 3/2-approximation [16]. The corresponding MaxTSP problem is approximable within 7/5 for symmetric distances, and 63/38 if asymmetric. Configuration space and domain: Orderings of the cities {i}. Objective: $f(\iota) = \sum_{j=1}^{n} d_{\iota_j, \iota_{j+1}}$. Partial mixer: Partial permutation swap mixer. Encoding: Direct one-hot. Compilation:

.2)
Problem: $(1 \mid d_j \mid \sum w_j T_j^2)$ Given a set of jobs with processing times $p$, deadlines $d \in \mathbb{Z}^+$, and weights $w$, find a schedule that minimizes the total weighted squared tardiness $\sum_{j=1}^{n} w_j T_j^2$. Approximability: Considered in [67]. Configuration space and domain: Orderings of the jobs {i}, and an integer slack variable $y_j \in [0, d_j - p_j]$ for each job. Objective: $f(\iota, y) = \sum_{j=1}^{n} w_j (s_j(\iota) + p_j - d_j + y_j)^2$. Partial mixer: Partial permutation swap mixer for computational qubits; binary mixer for the slack qubits. Encoding: Direct one-hot for the ordering variables; binary for the slack variables. Compilation:
• Phase separator: The encoded phase separator is a 3-local Hamiltonian containing all 1-local terms; all 2-local terms of two computational qubits corresponding to different jobs at different places in the ordering, all 2-local terms of two ancilla qubits corresponding to the same job, and all 2-local terms of one computational qubit and one ancilla qubit except when they correspond to different jobs and the computational qubit corresponds to that job being last in the ordering; all 3-local terms of three computational qubits corresponding to different jobs at different places in the ordering and all 3-local terms containing two computational qubits corresponding to different jobs at different places in the ordering and one ancilla qubit corresponding to the later job.
• Initial state: Arbitrary ordering.
Resource count:
• Color-parity permutation swap mixer (Sec. 4.1): at most $(n - 1)\binom{n}{2}$ 4-qubit partial mixers, in depth at most $2n$. The single-qubit X mixer for the slack binary variables can be done in parallel with the permutation swap mixer.
• Initial state: $n$ single-qubit X gates. Depth 1.

A.5.3 SMS, minimizing total weighted tardiness [See Sec. 4.3]
Problem: $(1 \mid d_j \mid \sum w_j T_j)$ Given a set of n jobs with processing times p, deadlines $d \in \mathbb{Z}_+$, and weights w, find a schedule that minimizes the total weighted tardiness $\sum_{j=1}^{n} w_j T_j$.
Approximability: There is an (n − 1)-approximation algorithm [14]. The decision version is strongly NP-hard [49].
Configuration space and domain: Orderings of the jobs $\{\iota\}$.
Objective: $f(\iota) = \sum_{i=1}^{n} w_i \max\{0, s_i(\iota) + p_i - d_i\}$.
Encoding: Absolute one-hot.
Mixing rule: Swap two jobs only if they are scheduled in consecutive order.
Partial mixer: Partial time-swap mixer (specific to the absolute encoding).
Initial state: Arbitrary ordering.
Compilation:
• Phase separator: H

Problem: $(1 \mid d_j, r_j \mid f)$ Given a set of jobs with processing times p, deadlines d, release times $r \in \mathbb{Z}_+$, and weights w, find a schedule that minimizes some given function f of the schedule, e.g. the weighted total tardiness, such that each job starts no earlier than its release time.
Approximability: When all deadlines are zero, the minimal weighted total tardiness (which equals the weighted total completion time in this special case) is 1.685-approximable [28].
is the window of times in which job j can start and $b_j$ is the job's "buffer" slot.
Domain: Schedules in the configuration space such that no two jobs overlap.
Objective: A given function $f(s)$, e.g. the weighted total tardiness $f(s) = \sum_{j=1}^{n} w_j T_j$, where $T_j = \max\{0, s_j + p_j - d_j\}$ is the tardiness of job j.
Mixing rule: Swap a job between time t ∈ h and its buffer slot if no other job is running at time t.
Partial mixer: Controlled null-swap mixer; see Sec. 4.4.
Initial state: Greedy earliest-release-date schedule.
Encoding: One-hot encoding.
Compilation:
• Partial mixer: H

For strings and subsets thereof, we focus on mixing operators that are composed of independent mixing operators on each of the variables. Specifically, we consider the following single-qudit mixing operators in the absence of constraints:
• r-nearby-values mixer: $H_{r\text{-NV}} = \sum_{a=1}^{r} \left( \breve{X}^a + (\breve{X}^\dagger)^a \right)$, where $\breve{X}$ denotes the generalized Pauli X operator on a qudit (Sec. C.2). The special cases r = 1 and r = d − 1 are called the "ring mixer" and "fully-connected mixer", respectively.
• simple binary mixer: when $d = 2^l$ is a power of two: H
• null swap mixer: for cases when one of the d values corresponds to a "null" value (e.g. black or uncolored in graph coloring), $H_{\text{NS}} = \sum_{a=1}^{d-1} \left( |0\rangle\langle a| + |a\rangle\langle 0| \right)$.
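As a concrete illustration of the single-qudit mixers listed above, the following Python sketch builds the matrices of the r-nearby-values mixer and the null swap mixer in the computational basis of one d-level system. It assumes the standard shift action $|a\rangle \to |a + 1 \bmod d\rangle$ for the generalized X operator; the function names are illustrative.

```python
def shift_power(d, a):
    """Matrix of the a-th power of the qudit shift: |c> -> |c + a mod d>."""
    return [[1 if row == (col + a) % d else 0 for col in range(d)]
            for row in range(d)]

def transpose(m):
    """Transpose; equals the Hermitian conjugate for real matrices."""
    return [list(row) for row in zip(*m)]

def add(m1, m2):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def nearby_values_mixer(d, r):
    """H_{r-NV}: sum over a = 1..r of (shift^a + its Hermitian conjugate)."""
    h = [[0] * d for _ in range(d)]
    for a in range(1, r + 1):
        xa = shift_power(d, a)
        h = add(h, add(xa, transpose(xa)))
    return h

def null_swap_mixer(d):
    """H_NS: couples the null value 0 to every other value a = 1..d-1."""
    h = [[0] * d for _ in range(d)]
    for a in range(1, d):
        h[0][a] = h[a][0] = 1
    return h

ring = nearby_values_mixer(4, 1)    # r = 1: each value couples to its two neighbours
full = nearby_values_mixer(4, 3)    # r = d - 1: every pair of distinct values couples
assert ring == transpose(ring) and full == transpose(full)  # both are Hermitian
```

In the ring case the matrix is the adjacency matrix of a cycle on the d values, which is where the name comes from; in the fully-connected case every off-diagonal entry is nonzero.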
For problems in the ordering/schedule family, we consider swapping-based mixers:
• value-selective permutation swap mixer: swaps the ith and jth elements in the ordering if those elements are u and v; see Eq. (46) in Sec. 4.1.
• value-independent permutation swap mixer: swaps the ith and jth elements of the ordering regardless of which items those are; see Eq. (48) in Sec. 4.1.
When there are constraints, we consider modifications of the above mixers that are controlled on not violating the constraints. A few examples:
• Controlled null swap mixer:

B.1.2 Partitions:
The two elementary partitions we use for generating the family of partitioned mixing unitaries are:
• parity-mixer: For Hamiltonians of the form $\sum_u A_u B_{u+1}$, where $u \in [n]$ and $A_u$ and $B_{u+1}$ are operators acting on qubits u and u + 1, respectively, partition the index set {u} into even and odd subsets. See Sec. 3.1 for details.
• color-mixer: For index pairs $(u, v) \in [n]^2$, let $P_{\text{col}} = (P_1, \ldots, P_\kappa)$ be an ordered partition of the index pairs into κ parts such that each part contains only mutually disjoint pairs of indices from [n]. This is equivalent to considering a κ-edge-coloring of the complete graph $K_n$ and assigning an ordering to the colors, so we call $P_{\text{col}}$ the "color partition". For even n, κ = n − 1 suffices, and for odd n, κ = n. See Sec. 4.1 for its use.
For mixing Hamiltonians of different coupling connectivity, these partitions can be combined and tailored as desired. Examples include the color-parity permutation swap mixer for TSP in Sec. 4.1 and the time-color partition for SMS $(1 \mid d_j \mid \sum w_j T_j)$ in Sec. 4.3.
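One way to obtain a color partition with the stated number of parts is the classical round-robin ("circle") construction for edge-coloring $K_n$. The Python sketch below is an illustration under that assumption, not necessarily the construction used in Sec. 4.1; it uses 0-based indices, and the function name is hypothetical.

```python
from itertools import combinations

def color_partition(n):
    """Ordered partition of all pairs {u, v} of range(n) into parts of
    mutually disjoint pairs, built with the round-robin (circle) method."""
    m = n if n % 2 == 0 else n + 1          # pad odd n with a dummy index m - 1
    parts = []
    for k in range(m - 1):
        pairs = [(k, m - 1)]                # index k is paired with the fixed index
        for i in range(1, m // 2):
            u, v = (k + i) % (m - 1), (k - i) % (m - 1)
            pairs.append((min(u, v), max(u, v)))
        # Drop pairs that involve the dummy index (only present for odd n).
        parts.append([pr for pr in pairs if max(pr) < n])
    return parts

for n in (4, 5, 6, 7):
    parts = color_partition(n)
    # kappa = n - 1 parts for even n, kappa = n parts for odd n.
    assert len(parts) == (n - 1 if n % 2 == 0 else n)
    # Every pair appears exactly once across the parts.
    assert sorted(pr for part in parts for pr in part) == list(combinations(range(n), 2))
    # Pairs within a part share no index, so their partial mixers can act in parallel.
    assert all(len({i for pr in part for i in pr}) == 2 * len(part) for part in parts)
```

Because the pairs within one part are disjoint, the corresponding two-index partial mixers act on disjoint sets of indices and can be applied simultaneously.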

B.2 Encodings
We consider two types of encodings of strings into qubit space. The one-hot encoding enables more concise circuits at the expense of requiring more qubits, whereas the binary encoding makes the opposite trade-off.
A generalization of the binary encoding is the radix encoding, which represents a value a in base r for an arbitrary integer r ≥ 2. While the binary encoding has the advantage of being easier to implement, it is plausible that for some problems a general radix encoding would be a more natural choice.
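The trade-off between these encodings can be made concrete with a small Python sketch that encodes a value $a \in \{0, \ldots, d-1\}$ either as a one-hot bit string of length d or as digits in base r (binary for r = 2). The function names and the least-significant-digit-first convention are assumptions made for illustration.

```python
def one_hot(a, d):
    """One-hot encoding: d bits, with a single 1 at position a."""
    return [1 if k == a else 0 for k in range(d)]

def radix_digits(a, d, r=2):
    """Base-r digits of a (least significant first), using just enough
    digits to represent every value in 0..d-1. r = 2 gives the binary encoding."""
    n_digits = 1
    while r ** n_digits < d:
        n_digits += 1
    digits = []
    for _ in range(n_digits):
        digits.append(a % r)
        a //= r
    return digits

d = 8
assert len(one_hot(5, d)) == 8           # one-hot: d qubits per variable
assert len(radix_digits(5, d)) == 3      # binary: ceil(log2 d) qubits per variable
assert len(radix_digits(5, d, r=3)) == 2 # base 3: two 3-level digits cover 0..7
```

The one-hot string uses more bits, but each value is flagged by a single bit; the binary string is shorter, but identifying a value requires inspecting all of its bits.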
For problems in the ordering/schedule family, the encoding is composed of two steps: first encoding into strings, and then applying the above encodings into qubit space. The following encodings are considered for the first step:
• direct encoding: an ordering $\iota = (\iota_1, \ldots, \iota_n)$ is encoded directly as a string of integers in $[n]^n$. It is demonstrated for TSP and SMS $(1 \mid d_j \mid \sum_j w_j T_j^2)$ in Secs. 4.1 and 4.2, respectively.
• absolute encoding: to encode the ordering $\iota = (\iota_1, \ldots, \iota_n)$, we assign each item i a value $s_i \in [0, h]$, where the "horizon" h is a parameter of the encoding, such that $s_{\iota_i} < s_{\iota_j}$ for all i < j. It is demonstrated for SMS $(1 \mid d_j \mid \sum_j w_j T_j)$ in Sec. 4.3.
The name of the overall encoding reflects both steps, for example:
• direct one-hot encoding; see Sec. 4.1.
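The two string encodings of an ordering can be sketched in Python as follows. The direct one-hot encoding is shown as an n × n bit matrix, and the absolute encoding is illustrated with back-to-back start times as one hypothetical choice of the values $s_i$. Indices are 0-based and all names are illustrative.

```python
def direct_one_hot(order):
    """Direct one-hot encoding: bit [pos][item] is 1 iff item sits at position pos."""
    n = len(order)
    return [[1 if order[pos] == item else 0 for item in range(n)]
            for pos in range(n)]

def absolute_from_ordering(order, p):
    """One possible absolute encoding: s[item] is the start time of the item
    when items of duration p[item] are placed back to back in the given order."""
    s, t = [0] * len(order), 0
    for item in order:
        s[item] = t
        t += p[item]
    return s

def ordering_from_absolute(s):
    """Recover the ordering by sorting the items by their assigned values."""
    return sorted(range(len(s)), key=lambda item: s[item])

order = [2, 0, 1]                       # item 2 first, then item 0, then item 1
s = absolute_from_ordering(order, p=[2, 1, 3])
assert ordering_from_absolute(s) == order
```

In the direct encoding the variables are indexed by position and take items as values, whereas in the absolute encoding they are indexed by item and take values in [0, h].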

C Elementary operators
Here, we elaborate on some of the basic quantum operators used in the text and on their properties. Sec. C.1 explains the relations between the SWAP gate and the XY-model Hamiltonian XX + YY, which are used as building blocks for encoded mixers in many mappings. Sec. C.2 contains a brief review of generalized Pauli operators for qudits of arbitrary dimension.

C.1 SWAP and XY operators
We examine the relations between the quantum SWAP operator $\text{SWAP}_{i,j}$, which swaps the states of qubits i and j, and the XY operator, $X_i X_j + Y_i Y_j$. In many of our mapping constructions, these operators can be used interchangeably. The XY operator can be expressed as
$X_i X_j + Y_i Y_j = 2 \left( |01\rangle\langle 10| + |10\rangle\langle 01| \right)$.
We observe the following connections between the two operators:
• In the subspace spanned by $\{|01\rangle, |10\rangle\}$, $X_i X_j + Y_i Y_j$ and $\text{SWAP}_{i,j}$ behave identically up to a factor of 2.
• In the subspace spanned by $\{|11\rangle, |00\rangle\}$, $X_i X_j + Y_i Y_j$ acts as null, while $\text{SWAP}_{i,j}$ acts as the identity.
• The operators $X_i X_j + Y_i Y_j$ and $\text{SWAP}_{i,j}$ are both Hermitian. $\text{SWAP}_{i,j}$ is also unitary, whereas $X_i X_j + Y_i Y_j$ is not, since it annihilates $|00\rangle$ and $|11\rangle$.
• Applied to a many-qubit system, both $H_{\text{SWAP}} = \sum_{i,j} \text{SWAP}_{i,j}$ and $H_{XY} = \sum_{i,j} (X_i X_j + Y_i Y_j)$ preserve Hamming weight, and hence so do the corresponding unitaries $\exp[-i\beta H_{XY}]$ and $\exp[-i\beta H_{\text{SWAP}}]$. Although the two do not behave identically on the full Hilbert space, they can both serve as mixers in situations in which Hamming weight is the relevant constraint.
• To enforce simultaneous swaps of multiple qubit pairs, as in $\prod_{\{i,j\}} \text{SWAP}_{i,j}$, the $\text{SWAP}_{i,j}$ cannot be directly replaced by $X_i X_j + Y_i Y_j$; see the TSP problem in Sec. 4.1 for an example.
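The two-qubit relations above can be checked numerically with a short Python sketch that builds the 4 × 4 matrices of XX + YY and SWAP in the basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. It uses only plain lists and complex numbers; the helper names are illustrative.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    """Matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
XX, YY = kron(X, X), kron(Y, Y)
XY = [[XX[i][j] + YY[i][j] for j in range(4)] for i in range(4)]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

def preserves_hamming_weight(m):
    """True if m only connects basis states with equal numbers of 1 bits."""
    return all(m[r][c] == 0 or bin(r).count("1") == bin(c).count("1")
               for r in range(len(m)) for c in range(len(m)))

# XX + YY equals 2(|01><10| + |10><01|): twice SWAP on span{|01>, |10>},
# and zero on |00> and |11>.
assert XY == [[0, 0, 0, 0], [0, 0, 2, 0], [0, 2, 0, 0], [0, 0, 0, 0]]
assert matmul(SWAP, SWAP) == I4          # SWAP is unitary (and Hermitian)
assert matmul(XY, XY) != I4              # XX + YY is Hermitian but not unitary
assert preserves_hamming_weight(XY) and preserves_hamming_weight(SWAP)
```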

C.2 Generalized Pauli gates for qudits
We consider d-dimensional qudits. Let $\omega = e^{2\pi i/d}$. We have the following gates:
$\breve{Z} = \sum_{a=0}^{d-1} \omega^a |a\rangle\langle a|$, $\breve{X} = \sum_{a=0}^{d-1} |a + 1\rangle\langle a|$,
where all arithmetic is modulo d. As in the main text, we use the operator notations $\breve{Z}$ and $\breve{X}$ for qudits of arbitrary dimension, and reserve X and Z specifically for the qubit case. For d = 2, they are the same. Note that for d > 2, the generalized Pauli operators, while unitary, are not Hermitian. In many cases, we use the sum of a generalized Pauli operator and its conjugate to generalize its qubit analog, e.g. $\breve{X} + \breve{X}^\dagger$ to generalize X. Because $\breve{X}^r = \sum_{a=0}^{d-1} |a + r\rangle\langle a|$, the Hermitian combination $\breve{X}^r + (\breve{X}^\dagger)^r$ generates transitions between $|a\rangle$ and $|a + r\rangle$ for any a. Sec. 3.1.1 uses this operator in the coloring of a single vertex.
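The properties of the generalized Pauli operators can be checked numerically with the Python sketch below. It assumes the standard clock-and-shift convention, in which the generalized Z multiplies $|a\rangle$ by $\omega^a$ and the generalized X maps $|a\rangle$ to $|a + 1 \bmod d\rangle$; the helper names are illustrative.

```python
import cmath

def generalized_paulis(d):
    """Return (omega, Z, X) for a d-level system in the clock-and-shift convention."""
    w = cmath.exp(2j * cmath.pi / d)
    Z = [[w ** a if a == b else 0 for b in range(d)] for a in range(d)]
    X = [[1 if a == (b + 1) % d else 0 for b in range(d)] for a in range(d)]
    return w, Z, X

def matmul(a, b):
    """Matrix product of nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(m):
    """Hermitian conjugate (conjugate transpose)."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def close(a, b, tol=1e-9):
    """Entrywise comparison up to floating-point tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

d = 3
w, Z, X = generalized_paulis(d)
identity = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
assert close(matmul(X, dagger(X)), identity)        # X is unitary
assert close(matmul(Z, dagger(Z)), identity)        # Z is unitary
assert X != dagger(X)                               # but X is not Hermitian for d > 2
assert close(matmul(Z, X), [[w * x for x in row] for row in matmul(X, Z)])  # ZX = omega XZ
X2 = matmul(X, X)
assert all(X2[(a + 2) % d][a] == 1 for a in range(d))  # X^r maps |a> to |a + r>
```

For d = 2 the same construction returns the ordinary Pauli X and Z, for which X equals its own Hermitian conjugate.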