Constrained mixers for the quantum approximate optimization algorithm

The quantum approximate optimization algorithm/quantum alternating operator ansatz (QAOA) is a heuristic to find approximate solutions of combinatorial optimization problems. Most literature is limited to quadratic problems without constraints. However, many practically relevant optimization problems do have (hard) constraints that need to be fulfilled. In this article, we present a framework for constructing mixing operators that restrict the evolution to a subspace of the full Hilbert space given by these constraints. We generalize the "XY"-mixer, designed to preserve the subspace of "one-hot" states, to the general case of subspaces given by a number of computational basis states. We expose the underlying mathematical structure, which reveals more of how mixers work and how one can minimize their cost in terms of the number of CX gates, particularly when Trotterization is taken into account. Our analysis also leads to valid Trotterizations for the "XY"-mixer with fewer CX gates than is known to date. In view of practical implementations, we also describe algorithms for efficient decomposition into basis gates. Several examples of more general cases are presented and analyzed.


Introduction
The quantum approximate optimization algorithm (QAOA) [6], and its generalization, the quantum alternating operator ansatz (also abbreviated as QAOA) [11], is a meta-heuristic for solving combinatorial optimization problems that can utilize gate-based quantum computers and possibly outperform purely classical heuristic algorithms. Typical examples that can be tackled are quadratic (binary) optimization problems of the form
$$\min_{x \in \{0,1\}^n} f(x) \quad \text{s.t.} \quad g(x) = 0, \qquad f(x) = x^T Q_f x, \quad g(x) = x^T Q_g x, \tag{1}$$
where $Q_f, Q_g \in \mathbb{R}^{n \times n}$ are symmetric $n \times n$ matrices. For binary variables $x_i \in \{0, 1\}$, any linear part can be absorbed into the diagonal of $Q_f$ and $Q_g$. In this article we focus on the case where the constraint is given by a feasible subspace as defined in the following:
Definition 1 (Constraints given by indexed computational basis states). Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ be the Hilbert space for $n$ qubits, which is spanned by all computational basis states $|z_j\rangle$, i.e., $\mathcal{H} = \mathrm{span}\{|z_j\rangle,\ 1 \le j \le 2^n,\ z_j \in \{0,1\}^n\}$. Let $B = \{|z_j\rangle,\ j \in J,\ z_j \in \{0,1\}^n\}$ be the subset of all computational basis states defined by an index set $J$. This corresponds to a constraint $g(x) = 0$ in Equation (1) whose solutions are exactly the bit strings $z_j$, $j \in J$.
If both $U_M$ and $U_P$ correspond to the time evolution under some Hamiltonians $H_M$, $H_P$, i.e., $U_M = e^{-i\beta H_M}$ and $U_P = e^{-i\gamma H_P}$, the approach can be termed "Hamiltonian-based QAOA" (H-QAOA). If the Hamiltonians $H_M$, $H_P$ are sums of (polynomially many) local terms, the approach belongs to a sub-class termed "local Hamiltonian-based QAOA" (LH-QAOA).
In practice, it is not possible to implement $U_M$ or $U_P$ directly. It is necessary to decompose the evolution into smaller pieces, which means that instead of applying $e^{-it(H_1 + H_2)}$ one can only apply $e^{-itH_1}$ and $e^{-itH_2}$. This process is typically referred to as "Trotterization". As an example, the simplest Suzuki-Trotter decomposition, or exponential product formula [13,20], is given by
$$e^{x(H_1 + H_2)} = e^{xH_1} e^{xH_2} + \mathcal{O}(x^2),$$
where $x$ is a parameter and $H_1$, $H_2$ are two operators with some commutation relation $[H_1, H_2] \neq 0$. Higher order formulas can be found for instance in [13]. Practical algorithms need to be defined using a few operators from a universal gate set, e.g., $\{U_3, CX\}$. A good (and simple) indicator for the complexity of a quantum algorithm is given by the number of required CX gates. Overall, the most efficient algorithm is the one that provides the best accuracy in a given time [kronsjo1987algorithms].
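The second-order error of the simplest product formula can be checked numerically on a single qubit. The following is a minimal sketch, assuming the illustrative choice $H_1 = X$, $H_2 = Z$ (not taken from the article) and using the closed form $e^{-it\,n\cdot\sigma} = \cos(|n|t) I - i \sin(|n|t)\, n\cdot\sigma/|n|$; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_pauli(t, nx, nz):
    """Return exp(-i t (nx X + nz Z)) via the closed form for Pauli vectors."""
    norm = math.hypot(nx, nz)
    c, s = math.cos(norm * t), math.sin(norm * t)
    ux, uz = nx / norm, nz / norm
    # nx X + nz Z = [[nz, nx], [nx, -nz]]
    return [[c - 1j * s * uz, -1j * s * ux],
            [-1j * s * ux, c + 1j * s * uz]]

def trotter_error(t):
    """Largest entry of exp(-itX) exp(-itZ) - exp(-it(X+Z)) in absolute value."""
    exact = exp_pauli(t, 1.0, 1.0)
    product = matmul(exp_pauli(t, 1.0, 0.0), exp_pauli(t, 0.0, 1.0))
    return max(abs(exact[i][j] - product[i][j])
               for i in range(2) for j in range(2))

# Halving t should reduce the error by roughly a factor of four.
print(trotter_error(0.01) / trotter_error(0.005))
```

Since $X$ and $Z$ do not commute, the error does not vanish, but it shrinks quadratically with the step size, which is the behaviour the product formula predicts.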

Remark 1 (Repeated mixers).
If UM is the exponential of a Hermitian matrix, the parameter r in Equation (6) does not matter as it can be absorbed as a re-scaling of β. However, if UM is Trotterized this can lead to missing transitions. In this case r > 1 can again provide these transitions. It is therefore suggested in [11] to repeat mixers within one mixing step. For this reason, we will consider the cost of Trotterized mixers including the necessary repetitions to provide transitions for all feasible states.

Related work
The QAOA was introduced in [6], where it was applied to the Max-Cut problem. The authors of [9] compared the QAOA to the classical AKMAXSAT solver, extrapolated from small to large instances, and estimated that a quantum speed-up can be obtained with (several) hundreds of qubits. A general overview of variational quantum algorithms, including challenges and how to overcome them, is provided in [3,18]. A key challenge is that it is in general hard to find good parameters. It has been shown that training is NP-hard in general [1]. Another obstacle is the occurrence of so-called barren plateaus, i.e., regions in the training landscape where the loss function is effectively constant [18]. This phenomenon can be caused by random initializations, noise, and over-expressibility of the ansatz [22,24]. Since its inception, several extensions/variants of the QAOA have been proposed. ADAPT-QAOA [25] is an iterative, problem-tailored version of QAOA that can adapt to specific hardware constraints. A non-local version, referred to as R-QAOA [2], recursively removes variables from the Hamiltonian until the remaining instance is small enough to be solved classically. Numerical evidence shows that this procedure significantly outperforms standard QAOA for frustrated Ising models on random 3-regular graphs for the Max-Cut problem. WS-QAOA [5] uses solutions of classical algorithms to warm-start QAOA. Numerical evidence shows an advantage at low depth, in the form of a systematic increase in the size of the obtained cut for fully connected graphs with random weights.
There are two principal ways to take constraints into account when solving Equation (1) with the QAOA. The standard, simple approach is to penalize unsatisfied constraints in the objective function with the help of a so-called Lagrange multiplier $\lambda$, leading to
$$\min_{x \in \{0,1\}^n} f(x) + \lambda g(x).$$
This approach is popular, since it is straightforward to define a phase separating Hamiltonian for $f(x) + \lambda g(x)$. Some applications include the tail-assignment problem [21], the Max-k-cut problem [7], graph coloring problems, and the traveling salesperson problem [12]. A downside of this approach is that infeasible solutions are also possible outcomes, especially for approximate solvers like QAOA. This also makes the search space much bigger and the entire approach less efficient. In addition, the quality of the results turns out to be very sensitive to the chosen value of the hyperparameter $\lambda$.
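The sensitivity to $\lambda$ can be seen already on a toy instance. The sketch below is purely illustrative: the objective `f` with weights (3, 2, 1) and the one-hot penalty `g` are hypothetical choices, not taken from the article, and the minimizer is found by brute force rather than by QAOA.

```python
from itertools import product

def f(x, weights=(3, 2, 1)):
    """Toy objective (hypothetical): every selected item lowers the cost."""
    return -sum(w * xi for w, xi in zip(weights, x))

def g(x):
    """One-hot constraint as a quadratic penalty, g(x) = (sum(x) - 1)^2."""
    return (sum(x) - 1) ** 2

def penalized_minimizer(lam, n=3):
    """Brute-force minimizer of f(x) + lam * g(x) over all bit strings."""
    return min(product((0, 1), repeat=n), key=lambda x: f(x) + lam * g(x))

for lam in (0.5, 3.0):
    x = penalized_minimizer(lam)
    print(lam, x, "feasible" if g(x) == 0 else "infeasible")
```

With the small multiplier the minimizer selects two items and violates the constraint, while the larger multiplier recovers the feasible optimum.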
On the one hand, $\lambda$ should be chosen large enough such that the lowest eigenstates of $H_P$ correspond to feasible solutions. On the other hand, too large values of $\lambda$ mean that the resulting optimization landscape in $\gamma$ has very high frequencies, which makes the problem hard to solve in practice. In general, it can be very challenging to find the (problem-dependent) value of $\lambda$ that best balances the trade-off between optimality and feasibility in the objective function [23]. For QAOA, a second approach is to define mixers that have zero probability to go from a feasible state to an infeasible one, making the hyperparameter $\lambda$ of the previous approach unnecessary. However, it is generally more challenging to devise mixers that take constraints into account. The most prominent example in the literature is the XY-mixer [12,11,23], which constrains evolution to states with non-zero overlap with "one-hot" states. One-hot states are computational basis states with exactly one entry equal to one. For instance, $|0001\rangle$ and $|010000\rangle$ are one-hot states, while $|00\rangle$ and $|110\rangle$ are not. The name XY-mixer comes from the related XY-Hamiltonian [16]. The mixers derived in the literature follow the intuition of physicists to use "hopping" terms. A performance analysis of the XY-mixer applied to the maximum k-vertex cover problem shows a heavy dependence on the initial states as well as the chosen Trotterization [4].
QAOA can be viewed as a discretized version of quantum annealing. In quantum annealing enforcing constraints via penalty terms is particularly "harmful" since they often require all-to-all connectivity of the qubits [14]. The authors in [15] therefore introduce driver Hamiltonians that commute with the constraints of the problem. This bears similarities with and actually inspired the approaches in [12,11].
The main contributions of this article are:
• A general framework to construct mixers restricted to a set of computational basis states, see Section 3.1.
• An analysis of the underlying mathematical structure, which is largely independent of the actual states, see Section 3.2.
• Efficient algorithms for decomposition into basis gates, see Sections 3.3 and 3.5.
• Valid Trotterizations, which are not completely understood in the literature, see Section 3.5.
• We prove that it is always possible to realize a valid Trotterization, see Theorem 3.
• Improved efficiency of Trotterized mixers for "one-hot" states in Section 5.1.
• Discussion of the general case, exemplified in Section 5.2.
We start by describing the general framework.

Construction of constraint preserving mixers
In the following we will derive a general framework for mixers that are restricted to a subspace, given by certain basis states. For example, one may want to construct a mixer for five qubits that is restricted to the subspace $Sp(|01001\rangle, |11001\rangle, |11110\rangle)$ of $\mathbb{C}^{2^5}$, where $Sp(B)$ denotes the linear span of $B$. In this section we will describe the conditions for a Hamiltonian-based QAOA mixer to preserve the feasible subspace, and for providing transitions between all pairs of feasible states. We also provide efficient algorithms to decompose these mixers into basis gates.

Conditions on the mixer Hamiltonian
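Theorem 1. Let $B$ be a set of feasible computational basis states as in Definition 1 and let $T \in \mathbb{R}^{|J| \times |J|}$ be a real-valued transition matrix. For the mixer $U_M(\beta) = e^{-i\beta H_M}$ with $H_M = \sum_{j,k \in J} (T)_{j,k} |x_j\rangle\langle x_k|$, the following statements hold.
• If $T$ is symmetric, the mixer is well defined and preserves the feasible subspace, i.e. condition (5) is fulfilled.
• If $T$ is symmetric and for all $1 \le j, k \le |J|$ there exists an $r \in \mathbb{N} \cup \{0\}$ (possibly depending on the pair) such that $(T^r)_{j,k} \neq 0$, (11) then $U_M$ provides transitions between all pairs of feasible states, i.e. condition (6) is fulfilled.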
Proof. Almost trivially, $H_M$ is Hermitian if $T$ is symmetric. Since $H_M$ is a Hermitian (and therefore normal) matrix, there exists a diagonal matrix $D$, whose diagonal entries are the (real-valued) eigenvalues of $H_M$, and a matrix $U$, whose columns are the corresponding orthonormal eigenvectors. The mixer is therefore well defined through the convergent series
$$e^{-itH_M} = \sum_{m=0}^{\infty} \frac{(-it)^m}{m!} H_M^m = U e^{-itD} U^\dagger. \tag{13}$$
Reformulations.
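We can rewrite $H_M$ in the following way
$$H_M = \sum_{j,k \in J} (T)_{j,k} |x_j\rangle\langle x_k| = E T E^T, \tag{14}$$
where the columns of the matrix $E \in \mathbb{R}^{2^n \times |J|}$ consist of the feasible computational basis states, i.e., $E = [x_j]_{j \in J}$, see Figure 1 for an illustration. Using that $E^T E = I \in \mathbb{R}^{|J| \times |J|}$ is the identity matrix, we have that
$$H_M^m = E T^m E^T, \quad m \ge 1, \tag{15}$$
and Equation (13) can be written as
$$e^{-itH_M} = I + \sum_{m=1}^{\infty} \frac{(-it)^m}{m!} E T^m E^T. \tag{16}$$
Preservation of the feasible subspace. Let $|v\rangle \in Sp(B)$. Using Equation (15) we know that $H_M^m |v\rangle = \sum_{j \in J} c_j |x_j\rangle$ with coefficients $c_j \in \mathbb{C}$. Therefore, also $e^{-itH_M} |v\rangle \in Sp(B)$, $t \in \mathbb{R}$, since it is a sum of these terms. Transition between all pairs of feasible states. For any pair of feasible computational basis states $|x_{j^*}\rangle, |x_{k^*}\rangle \in B$ we have that
$$f(t) := \langle x_{j^*}| e^{-itH_M} |x_{k^*}\rangle = \sum_{m=0}^{\infty} \frac{(-it)^m}{m!} (T^m)_{j^*,k^*}. \tag{17}$$
It is enough to show that $f(t)$ is not the zero function. Since $f(t) : \mathbb{R} \to \mathbb{C}$ is an analytic function, it has a unique extension to $\mathbb{C}$. Assume that $f$ is indeed the zero function on $\mathbb{R}$; then the extension to $\mathbb{C}$ would also be the zero function and all coefficients of its Taylor series would be zero. However, we assumed the existence of an $r \in \mathbb{N} \cup \{0\}$ such that $|(T^r)_{j^*,k^*}| > 0$, and hence there exists a non-zero coefficient, which is a contradiction to $f$ being the zero function.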
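The construction $H_M = \sum_{j,k} (T)_{j,k} |x_j\rangle\langle x_k|$ and the two properties argued above can be checked on the five-qubit example subspace mentioned earlier. This is a minimal sketch: the nearest-neighbour choice of `T` and the helper names are assumptions made for illustration only.

```python
def mixer_hamiltonian(states, T):
    """Dense H_M = sum_{j,k} T[j][k] |x_j><x_k| for feasible bit strings."""
    dim = 2 ** len(states[0])
    idx = [int(s, 2) for s in states]      # position of |x_j> in the full basis
    H = [[0] * dim for _ in range(dim)]
    for j, row in enumerate(T):
        for k, t in enumerate(row):
            H[idx[j]][idx[k]] += t
    return H, idx

def matmul(A, B):
    """Plain matrix product of two square nested lists."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

states = ["01001", "11001", "11110"]       # example subspace from the text
T = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]                            # symmetric, zero diagonal (assumed)
H, idx = mixer_hamiltonian(states, T)
H2 = matmul(H, H)

# Feasible columns have no support on infeasible rows: Sp(B) is preserved.
leaks = any(H2[i][k] != 0 for i in range(len(H)) if i not in idx for k in idx)
# <x_1| H_M^2 |x_3> equals (T^2)_{1,3}: zero for r = 1, non-zero for r = 2.
print(leaks, H[idx[0]][idx[2]], H2[idx[0]][idx[2]])
```

The first and third feasible states are not connected by $T$ itself, but they are connected by $T^2$, which is the situation covered by the exponent $r$ in Equation (11).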
A natural question is how the statements in Theorem 1 depend on the particular ordering of the elements of B.
Corollary 1 (Independence of the ordering of $B$). Statements in Theorem 1 that hold for a particular ordering of computational basis states for a given $B$ also hold for any permutation $\pi : \{1, \cdots, |J|\} \to \{1, \cdots, |J|\}$, i.e., they are independent of the ordering of elements. For each ordering, the transition matrix $T$ changes according to $T_\pi = P_\pi^T T P_\pi$, where $P_\pi$ is the permutation matrix associated with $\pi$.
Proof. We start by pointing out that the inverse matrix of $P_\pi$ exists and can be written as $P_\pi^{-1} = P_{\pi^{-1}} = P_\pi^T$. The resulting matrix $H_M$ is unchanged. Following the derivation in Equation (14), we have that $H_{M,\pi} = E_\pi T_\pi E_\pi^T$, where the columns of the matrix $E_\pi \in \mathbb{R}^{2^n \times |J|}$ consist of the permuted feasible computational basis states, i.e., $E_\pi = [x_{\pi(j)}]_{j \in J} = E P_\pi$. Inserting $T_\pi = P_\pi^T T P_\pi$ we have indeed
$$H_{M,\pi} = E P_\pi P_\pi^T T P_\pi P_\pi^T E^T = E T E^T = H_M.$$
If the condition in Equation (11) holds for $T$, then it also holds for $T_\pi$. Using $T_\pi^r = P_\pi^T T^r P_\pi$ we can show that Equation (11) holds for the permuted index pair $(\pi(j), \pi(k))$ for $T_\pi$ if it holds for $(j, k)$ for $T$.
In the following, if nothing else is remarked, computational basis states are ordered with respect to increasing integer value, e.g., $|001\rangle, |010\rangle, |111\rangle$.
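Apart from special cases, there is a lot of freedom to choose a transition matrix $T$ that fulfills the conditions of Theorem 1. The entries of $T$ will heavily influence the circuit complexity, which will be investigated in Section 3.3. In addition, we have the following property which adds additional flexibility to develop efficient mixers.
Corollary 2. Let $U_{M,B} = e^{-i\beta H_{M,B}}$ be a mixer for the feasible subspace $Sp(B)$ as in Theorem 1 and let $U_{M,C} = e^{-i\beta H_{M,C}}$ be a mixer for a subspace $Sp(C)$ with $Sp(C) \cap Sp(B) = \{0\}$. Then $U_{M,B} U_{M,C}$ and $U_{M,C} U_{M,B}$, generated by $H_{M,B} + H_{M,C}$, are also valid mixers for $Sp(B)$.
Proof. Any $|v\rangle \in Sp(B)$ is in the null space of $H_{M,C}$, i.e., $H_{M,C} |v\rangle = 0$ and hence $U_{M,C} |v\rangle = |v\rangle$. Therefore, $U_{M,B} U_{M,C} |v\rangle = U_{M,B} |v\rangle \in Sp(B)$, and $U_{M,C} U_{M,B} |v\rangle = U_{M,C} |w\rangle = |w\rangle$ with $|w\rangle \in Sp(B)$, which means the feasible subspace is preserved. Condition (6) follows similarly from the fact that $U_{M,C} |v\rangle = |v\rangle$ for any $|v\rangle \in Sp(B)$.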
Corollary 2 naturally holds as well for any linear combination of mixers, i.e., $H_{M,B} + \sum_i a_i H_{M,C_i}$ is a mixer for the feasible subspace $Sp(B)$ as long as $Sp(C_i) \cap Sp(B) = \{0\}$, $\forall i$. At first, it might sound counterintuitive that adding more terms to the mixer results in a more efficient decomposition into basis gates. However, as we will see in Section 5, it can lead to cancellations due to symmetry considerations.
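Next, we describe the structure of the eigensystem of $U_M$.
Corollary 3. Given the setting in Theorem 1 with a symmetric $T$. If $(\lambda, v)$ is an eigenpair of $T$, then $(\lambda, Ev)$ is an eigenpair of $H_M$ and $(e^{-i\beta\lambda}, Ev)$ is an eigenpair of $U_M(\beta)$.
Proof. Let $(\lambda, v)$ be an eigenpair of $T$. Then, $H_M E v = E T E^T E v = E T v = \lambda E v$, so $(\lambda, Ev)$ is an eigenpair of $H_M$. The connection between $H_M$ and $U_M$ is general knowledge from linear algebra.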
An example illustrating Corollary 3 is provided by the transition matrix $T \in \mathbb{R}^{4 \times 4}$ with zero diagonal and all other entries equal to one, which fulfills Theorem 1. A unit eigenvector of $T$ is $v = \frac{1}{2}(1, 1, 1, 1)^T$. For any $B = \{|z_1\rangle, |z_2\rangle, |z_3\rangle, |z_4\rangle\}$ the uniform superposition of these states is an eigenvector of $H_M$, since $H_M E v = E T v = 3 E v$ with $E v = \frac{1}{2}(|z_1\rangle + |z_2\rangle + |z_3\rangle + |z_4\rangle)$. This result holds irrespective of what the states are and which dimension they have.
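The eigenvector statement can be verified without building the full $2^n \times 2^n$ matrix, by applying $H_M = E T E^T$ factor by factor. The sketch below uses four arbitrarily chosen four-qubit states; the state list and the function name are illustrative assumptions.

```python
def apply_mixer(states, T, vec):
    """Apply H_M = E T E^T to a state stored as {bitstring: amplitude}."""
    m = len(states)
    coeffs = [vec.get(s, 0.0) for s in states]                 # E^T |vec>
    mixed = [sum(T[j][k] * coeffs[k] for k in range(m))
             for j in range(m)]                                # T E^T |vec>
    return {s: a for s, a in zip(states, mixed) if a != 0}     # E T E^T |vec>

states = ["0011", "0110", "1000", "1111"]      # arbitrary feasible states
T_A = [[0 if j == k else 1 for k in range(4)] for j in range(4)]
uniform = {s: 0.5 for s in states}             # E v with v = (1, 1, 1, 1) / 2

image = apply_mixer(states, T_A, uniform)
print(image)                                   # every amplitude is 3 * 0.5
```

Replacing `states` by any other four distinct bit strings of equal length leaves the result unchanged up to relabelling, in line with the remark that the specific states are irrelevant.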
Theorem 2 (Products of mixers for subspaces). Given the same setting as in Theorem 1. For any decomposition of T into a sum of Q symmetric matrices Tq, in the following sense we construct the mixing operator via Proof. Combining Equations (15) and (16) we have Using that T only has positive entries and the condition in Equation (20), the same argument as in Theorem 1 can be used to show that UM (β) is not the zero function and therefore we have transitions between all pairs of feasible states.
As Theorem 1 leaves a lot of freedom for choosing valid transition matrices we will continue by describing important examples for T .

Transition matrices for mixers
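Theorem 1 provides conditions for the construction of mixer Hamiltonians that preserve the feasible subspace and provide transitions between all pairs of feasible computational basis states, namely that $T$ is symmetric and that for all $1 \le j, k \le |J|$ there exists an $r \in \mathbb{N} \cup \{0\}$ such that $(T^r)_{j,k} \neq 0$.
[Figure 4: Structure of $T_{\mathrm{Ham}(d)}$. The black color represents non-vanishing entries equal to one, representing pairs with the specified Hamming distance.]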
Remarkably, these conditions depend only on the dimension of the feasible subspace $|J| = \dim(Sp(B)) = |B|$, and are independent of the specific states that $B$ consists of. In addition, Corollary 1 shows that these conditions are robust with respect to a reordering of rows if the columns are reordered in the same way. Moreover, Equation (17) shows that the overlap between computational basis states $|x_j\rangle, |x_k\rangle \in B$ is independent of the specific states that $B$ consists of and only depends on $T$, since the right hand side of the expression is independent of the elements in $B$. This allows us to describe and analyze valid transition matrices by only knowing the number of feasible states, i.e., $|B|$. What these specific states are is irrelevant, unless one wants to look at what an optimal mixer is, which we will come back to in Section 3.4. Figure 3 provides a comparison of some mixers described in the following with respect to the overlap between different states. In the following, we denote the matrix for pairs of indices whose binary representations have Hamming distance equal to $d$ as
$$(T_{\mathrm{Ham}(d)})_{i,j} = \begin{cases} 1, & \text{if } d_{\mathrm{Hamming}}(\mathrm{bin}(i), \mathrm{bin}(j)) = d, \\ 0, & \text{otherwise.} \end{cases}$$
Examples of the structure of $T_{\mathrm{Ham}(d)}$ can be found in Figure 4. Furthermore, it will be useful to denote the matrix which has two non-zero entries at $(k, l)$ and $(l, k)$ as
$$(T_{k \leftrightarrow l})_{i,j} = \begin{cases} 1, & \text{if } (i,j) \in \{(k,l), (l,k)\}, \\ 0, & \text{otherwise.} \end{cases}$$
Before we start, we point out that the diagonal entries of $T$ can be chosen to be zero, because $|(T^0)_{j,j}| = 1 \neq 0$ for all $j \in J$. Although trivial, we will repeatedly use that $v = \frac{1}{\sqrt{|J|}}(1, 1, \cdots, 1)^T$ is an eigenvector of a matrix $F \in \mathbb{C}^{|J| \times |J|}$ if all rows of $F$ sum to the same value.

Hamming distance one mixer T Ham(1)
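The matrix $T_{\mathrm{Ham}(1)} \in \mathbb{R}^{|J| \times |J|}$ fulfills Theorem 1 when $|J| = 2^n$, $n \in \mathbb{N}$. The symmetry of $T_{\mathrm{Ham}(1)}$ is due to the fact that the Hamming distance is a symmetric function. Using the identity
$$T_{\mathrm{Ham}(1)} T_{\mathrm{Ham}(k)} = (k+1)\, T_{\mathrm{Ham}(k+1)} + (n-k+1)\, T_{\mathrm{Ham}(k-1)}, \quad 1 \le k \le n,$$
with $T_{\mathrm{Ham}(0)} = I$ and $T_{\mathrm{Ham}(n+1)} = 0$, it can be shown that
$$T_{\mathrm{Ham}(1)}^k = \sum_{j=0}^{k} c_j\, T_{\mathrm{Ham}(j)}, \quad c_k > 0,$$
where $c_j$ are real coefficients. Therefore, it is clear that $T_{\mathrm{Ham}(1)}^k$ reaches all states with Hamming distance $k$. Furthermore, $v = \frac{1}{\sqrt{2^n}}(1, 1, \cdots, 1)^T$ is a unit eigenvector of $T_{\mathrm{Ham}(1)}$ since the sum of each row is $n$. This is because there are exactly $n$ other states with Hamming distance one for each bitstring.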

All-to-all mixer T A
We denote the matrix with all but the diagonal entries equal to one as $(T_A)_{i,j} = 1 - \delta_{i,j}$. Trivially, $T_A \in \mathbb{R}^{|J| \times |J|}$ fulfills Theorem 1 and $v = \frac{1}{\sqrt{|J|}}(1, 1, \cdots, 1)^T$ is a unit eigenvector of $T_A$ since the sum of each row is $|J| - 1$.

(Cyclic) Nearest integer mixer T ∆ /T ∆,c
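Inspired by the stencil of finite-difference methods we introduce $T_\Delta, T_{\Delta,c} \in \mathbb{R}^{|J| \times |J|}$ as matrices with off-diagonal entries equal to one,
$$(T_\Delta)_{i,j} = \begin{cases} 1, & \text{if } |i-j| = 1, \\ 0, & \text{otherwise,} \end{cases} \qquad (T_{\Delta,c})_{i,j} = \begin{cases} 1, & \text{if } |i-j| \in \{1, |J|-1\}, \\ 0, & \text{otherwise.} \end{cases} \tag{28}$$
Both matrices fulfill Theorem 1. Symmetry holds by definition and it is easy to see that the $k$-th off-diagonal of $T_\Delta^k$ and $T_{\Delta,c}^k$ is nonzero for $1 \le k \le |J| - 1$. For the nearest integer mixer $T_\Delta$ it is known that $v_k = (\sin(c), \sin(2c), \cdots, \sin(|J|c))$, $c = \frac{k\pi}{|J|+1}$, are eigenvectors for $1 \le k \le |J|$. For the cyclic nearest integer mixer, we have that the sum of each row/column of $T_{\Delta,c}$ is equal to two (except for $|J| = 2$, when it is one). Therefore, $v = \frac{1}{\sqrt{|J|}}(1, 1, \cdots, 1)^T$ is a unit eigenvector.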

Products of mixers and T E , T O
In some cases, it will be necessary to use Theorem 2 to implement mixer unitaries. When splitting transition matrices into odd and even entries the following definition is useful. Denote by $T_{E(d)}$ the matrix with entries in the $d$-th off-diagonal for even rows equal to one, and accordingly by $T_{O(d)}$ the matrix for odd rows. In addition, we will use $T_{O(1),c}$ to be the cyclic version in the same way as in Equation (28).

Random mixer T rand
Finally, the upper triangular entries of the mixer $T_{\mathrm{rand}}$ are drawn from a continuous uniform distribution on the interval $[0, 1]$, and the lower triangular entries are chosen such that $T_{\mathrm{rand}}$ becomes symmetric. Since the probability of getting a zero entry is zero, such a random mixer fulfills Theorem 1 with probability 1.
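The transition matrices of this section and the condition in Equation (11) lend themselves to a quick numerical check. The following sketch builds $T_A$, $T_\Delta$, $T_{\Delta,c}$ and $T_{\mathrm{Ham}(d)}$ with zero-based indices and computes, for every pair, the smallest exponent $r$ with $(T^r)_{j,k} \neq 0$. It assumes non-negative entries, so that no cancellations can occur in $T^r$; all function names are hypothetical.

```python
def t_all_to_all(m):
    return [[int(j != k) for k in range(m)] for j in range(m)]

def t_nearest(m, cyclic=False):
    T = [[int(abs(j - k) == 1) for k in range(m)] for j in range(m)]
    if cyclic and m > 2:
        T[0][m - 1] = T[m - 1][0] = 1
    return T

def t_hamming(m, d):
    return [[int(bin(j ^ k).count("1") == d) for k in range(m)] for j in range(m)]

def min_powers(T):
    """R[j][k] = smallest r >= 0 with (T^r)[j][k] != 0, or None if none exists."""
    m = len(T)
    R = [[0 if j == k else None for k in range(m)] for j in range(m)]
    P = [[int(j == k) for k in range(m)] for j in range(m)]   # pattern of T^0
    for r in range(1, m):
        P = [[int(any(P[j][l] and T[l][k] for l in range(m))) for k in range(m)]
             for j in range(m)]                               # pattern of T^r
        for j in range(m):
            for k in range(m):
                if R[j][k] is None and P[j][k]:
                    R[j][k] = r
    return R

def worst_case(T):
    """Largest required exponent, or None if some pair is never connected."""
    flat = [r for row in min_powers(T) for r in row]
    return None if None in flat else max(flat)

for name, T in [("all-to-all", t_all_to_all(6)), ("nearest", t_nearest(6)),
                ("cyclic", t_nearest(6, cyclic=True)),
                ("Ham(1)", t_hamming(8, 1)), ("Ham(2)", t_hamming(8, 2))]:
    print(name, worst_case(T))
```

For $T_{\mathrm{Ham}(2)}$ the function returns `None`, because flipping two bits at a time never changes the parity of a bit string, so this matrix alone does not satisfy Equation (11).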

Decomposition of (constraint) mixers into basis gates
Given a set of feasible (computational basis) states $B = \{|x_j\rangle,\ j \in J,\ x_j \in \{0, 1\}^n\}$, we can use Theorem 1 to define a suitable mixer Hamiltonian. The next question is how to (efficiently) decompose the resulting mixer into basis gates. In order to do so we first decompose the Hamiltonian $H_M$ into a weighted sum of Pauli-strings. A Pauli-string $P$ is a Hermitian operator of the form $P = P_1 \otimes \cdots \otimes P_n$ where $P_i \in \{I, X, Y, Z\}$. Pauli-strings form a basis of the real vector space of all $n$-qubit Hermitian operators. Therefore, we can write
$$H_M = \sum_{i_1, \cdots, i_n = 1}^{4} c_{i_1, \cdots, i_n}\, \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}, \quad c_{i_1, \cdots, i_n} \in \mathbb{R},$$
with real coefficients $c_{i_1, \cdots, i_n}$, where $\sigma_1 = I$, $\sigma_2 = X$, $\sigma_3 = Y$, $\sigma_4 = Z$. After using a standard Trotterization scheme [13,20] (which is exact for commuting Pauli-strings), it is well-established how to implement each of the terms of the product using basis gates, see Equation (33). We will discuss the effects of Trotterization in more detail in Section 3.5, as there are several important aspects to consider for a valid mixer.
Here, $S$ is the S or phase gate and $H$ is the Hadamard gate. The standard way to compute the coefficients $c_{i_1, \cdots, i_n}$ is given in Algorithm 1. For $n$ qubits this requires computing $4^n$ coefficients, as well as multiplications of $2^n \times 2^n$ matrices. However, most of these terms are expected to vanish. We therefore describe an alternative way to produce this decomposition, using the language of quantum mechanics [19]. In the following we use the ladder operators, i.e., the creation and annihilation operators from the second quantization formulation in quantum chemistry, defined by
$$a = \frac{1}{2}(X + iY), \qquad a^\dagger = \frac{1}{2}(X - iY).$$
Since $a|0\rangle = 0$, $a|1\rangle = |0\rangle$, where $0$ is the zero vector, we have that $|0\rangle\langle 1| = a$. Since $a^\dagger|0\rangle = |1\rangle$, $a^\dagger|1\rangle = 0$, we have that $|1\rangle\langle 0| = a^\dagger$, and finally $a^\dagger a|0\rangle = 0$, $a^\dagger a|1\rangle = |1\rangle$, $a a^\dagger|0\rangle = |0\rangle$, $a a^\dagger|1\rangle = 0$ means that $|0\rangle\langle 0| = a a^\dagger$ and $|1\rangle\langle 1| = a^\dagger a$. Note that
$$a a^\dagger = \frac{1}{2}(I + Z), \qquad a^\dagger a = \frac{1}{2}(I - Z).$$
As an example, consider the matrix $M = |01\rangle\langle 10| = |0\rangle\langle 1| \otimes |1\rangle\langle 0|$, which can be expressed with ladder operators as $M = a_1 a_2^\dagger$. Another example is given by $M = |01\rangle\langle 11| = a_1 a_2^\dagger a_2$. This approach clearly extends to the general case and leads to Algorithm 2.
A comparison of the complexity of the two algorithms is given in Table 1. The naive algorithm needs to perform a matrix-matrix multiplication with matrices of size 2 n × 2 n for each of the 4 n coefficients. This quickly becomes prohibitive for larger n. The algorithm based on ladder operators requires resources that scale with the number of non-zero entries of the transition matrix T , which is much more favourable. In the end a symbolic mathematics library is used to simplify the expressions in order to create the list of non-zero Pauli-strings.
Algorithm 2, final step: simplify $S$ (e.g., using a library for symbolic mathematics); this defines the non-vanishing coefficients $c_{i_1, \cdots, i_n}$.
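The ladder-operator idea can be sketched in a few lines. The code below is not the article's Algorithm 2 (whose listing is not reproduced here) but a minimal illustration under the same identities: each single-qubit factor $|a\rangle\langle b|$ is replaced by its Pauli expansion, the tensor product is expanded, and terms with vanishing coefficients are dropped. A plain dictionary plays the role of the symbolic simplification; all names are hypothetical.

```python
from itertools import product

# Single-qubit operators |a><b| in the Pauli basis, via the ladder operators
# a = |0><1| = (X + iY)/2 and a^dagger = |1><0| = (X - iY)/2.
SINGLE = {
    ("0", "1"): {"X": 0.5, "Y": 0.5j},     # a
    ("1", "0"): {"X": 0.5, "Y": -0.5j},    # a^dagger
    ("0", "0"): {"I": 0.5, "Z": 0.5},      # a a^dagger
    ("1", "1"): {"I": 0.5, "Z": -0.5},     # a^dagger a
}

def outer_to_paulis(x, z):
    """Expand |x><z| for bit strings x, z into {Pauli string: coefficient}."""
    terms = {"": 1.0 + 0j}
    for a, b in zip(x, z):
        terms = {p + q: c * d for p, c in terms.items()
                 for q, d in SINGLE[(a, b)].items()}
    return terms

def mixer_paulis(states, T, tol=1e-12):
    """Pauli decomposition of H_M = sum_{j,k} T[j][k] |x_j><x_k| (T symmetric)."""
    total = {}
    for j, k in product(range(len(states)), repeat=2):
        if T[j][k] != 0:                   # work scales with non-zero entries of T
            for p, c in outer_to_paulis(states[j], states[k]).items():
                total[p] = total.get(p, 0) + T[j][k] * c
    # For symmetric real T the surviving coefficients are real.
    return {p: c.real for p, c in total.items() if abs(c) > tol}

print(mixer_paulis(["01", "10"], [[0, 1], [1, 0]]))
```

For $B = \{|01\rangle, |10\rangle\}$ the mixed terms $XY$ and $YX$ cancel and only $\frac{1}{2}(XX + YY)$ remains.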

Optimality of mixers
On current NISQ devices, the gate times and error rates of two-qubit gates (CX) are about one order of magnitude higher than those of single-qubit gates (U3). In addition, most devices lack all-to-all connectivity. CX gates between qubits that are not directly connected require SWAP operations, which consist of additional CX gates. An optimal mixer will therefore contain as few CX gates as possible. Since Pauli-strings are implemented according to Equation (33), we define the cost of a mixer Hamiltonian as
$$\mathrm{Cost}(H_M) = \sum_{c_{i_1, \cdots, i_n} \neq 0} 2\left(\mathrm{len}(\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}) - 1\right),$$
where $\mathrm{len}(P)$ is the length of a Pauli-string $P$ defined as the number of literals that are not the identity. For instance $P = IXIIY = I_1 X_2 I_3 I_4 Y_5 = X_2 Y_5$ has $\mathrm{len}(P) = 2$. $\mathrm{Cost}(H_M)$ specifies the number of CX gates that are required to implement the mixer. A lower cost means fewer and/or shorter Pauli-strings. There are four interconnected factors that influence the cost to implement the mixer for a given $B$.
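The cost measure is easy to evaluate once the list of non-vanishing Pauli-strings is known. A minimal sketch follows; treating strings of length at most one as free of CX gates is an assumption of this example, and the function names are hypothetical.

```python
def pauli_length(pauli):
    """Number of non-identity literals in a Pauli string such as 'IXIIY'."""
    return sum(op != "I" for op in pauli)

def cx_cost(paulis):
    """CX count: 2 * (len(P) - 1) per string; shorter strings cost 0 (assumption)."""
    return sum(2 * max(pauli_length(p) - 1, 0) for p in paulis)

print(pauli_length("IXIIY"))                   # 2
print(cx_cost(["XXI", "YYI", "IXX", "IYY"]))   # 8
print(cx_cost(["XXXX", "YYYY"]))               # 12
```

The second call shows that four Pauli-strings of length two are cheaper than two strings of length four, which is why both the number and the length of the strings matter.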

Transition matrix T
The larger |B| the more freedom we have in choosing the transition matrix T that fulfills Theorem 1. The combination of T and the specific states of B define the cost of the Hamiltonian. Unless one can find a way to utilize the structure of the states of B to efficiently compute an optimal T , we expect this problem to be NP-hard. In practice, a careful analysis of the specific states of B is required to determine T such that the cost becomes low. We will revisit optimality for both unrestricted and restricted mixers in Sections 4 and 5.

Adding mixers
Corollary 2 allows one to add mixers with a kernel that contains Sp(B). In general, also this is a combinatorial optimization problem which we do not expect to solve exactly with an efficient algorithm. However, we will provide a heuristic that can be used to reduce the cost of mixers in certain cases. We will provide more details in Section 5 where we discuss constrained mixers on some examples in detail.

Non-commuting Pauli-strings
Depending on the mixer, which in turn depends on the transition matrix and on the addition of mixers outside the feasible subspace, one can influence the commutativity pattern of the resulting Pauli-strings. This is an intricate topic, which we discuss next.

Trotterizations
Algorithms 1 and 2 produce a weighted sum of Pauli-strings equal to the mixer Hamiltonian $H_M$ defined in Theorem 1. A further complication arises when the non-vanishing Pauli-strings of the mixer Hamiltonian $H_M$ do not all commute. In that case one cannot realize $U_M$ exactly but has to find a suitable approximation/Trotterization, see Equation (32). Two Pauli-strings commute, i.e., $[P_A, P_B] = P_A P_B - P_B P_A = 0$, if and only if they fail to commute on an even number of indices [8].
An example is given in Figure 6. This problem is similar to a problem for observables; how does one divide the Pauli-strings into groups of commuting families [8,10] to maximize efficiency and increase accuracy? In order to minimize the number of measurements required to estimate a given observable one wants to find a "min-commuting-partition"; given a set of Pauli-strings from a Hamiltonian, one seeks to partition the strings into commuting families such that the total number of partitions is minimized. This problem is NP-hard in general [8]. However, based on Theorem 3 we expect our problem to be much more tractable.
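The commutation rule for Pauli-strings translates directly into code. The sketch below also groups strings greedily into commuting families; this greedy pass is only a simple heuristic chosen for illustration and does not solve the min-commuting-partition problem. The function names are hypothetical.

```python
def commute(p, q):
    """True if Pauli strings p and q commute.

    Two strings fail to commute on an index when both literals are
    non-identity and different; they commute overall iff that happens
    on an even number of indices.
    """
    clashes = sum(a != "I" and b != "I" and a != b for a, b in zip(p, q))
    return clashes % 2 == 0

def greedy_commuting_groups(strings):
    """Put each string into the first family it fully commutes with."""
    groups = []
    for s in strings:
        for group in groups:
            if all(commute(s, t) for t in group):
                group.append(s)
                break
        else:
            groups.append([s])
    return groups

print(commute("XXI", "YYI"), commute("XXI", "IYY"))
print(greedy_commuting_groups(["XXI", "YYI", "IXX", "IYY"]))
```

Note that the outcome of the greedy pass depends on the order of the input, and, as discussed next, not every grouping into commuting families is a valid mixer.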
For our case, it turns out that not all Trotterizations are suitable as mixing operators; they can either fail to preserve the feasible subspace, i.e., Equation (5), or fail to provide transitions between all pairs of feasible states, i.e., Equation (6). An example is given by $B = \{|001\rangle, |010\rangle, |100\rangle\}$ with the mixer $H_M = \frac{1}{2}(IXX + IYY) + \frac{1}{2}(XXI + YYI)$ associated with $T_\Delta = T_{1 \leftrightarrow 2} + T_{2 \leftrightarrow 3}$, see Section 5.1. Looking at Figure 6, these terms can be grouped into commuting families in two ways, which represent two (of many) different ways to realize the mixer unitary with basis gates.

The first possible Trotterization is given by
However, it turns out that $\exists \beta \in \mathbb{R}$ such that $|\langle 111| U_1(\beta) U_2(\beta) |z\rangle| > 0$ for all $|z\rangle \in B$. This means that this Trotterization does not preserve the feasible subspace and does not represent a valid mixer. The underlying reason for this is that the terms $XXI$ and $YYI$ are generated from the entry $T_{1 \leftrightarrow 2}$, but are split in this Trotterization. The same holds true for $IXX$ and $IYY$, which are generated via $T_{2 \leftrightarrow 3}$.
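The effect of splitting terms that belong to the same entry of $T$ can be reproduced with a small state-vector simulation. The sketch below assumes the mixer $\frac{1}{2}(XXI + YYI) + \frac{1}{2}(IXX + IYY)$ and compares two groupings: one that keeps the terms of each entry of $T$ together, and one that groups by Pauli type, $U_1 = e^{-i\frac{\beta}{2}(XXI + IXX)}$, $U_2 = e^{-i\frac{\beta}{2}(YYI + IYY)}$. The explicit form of the second grouping is an assumption of this example. Because the strings inside each group commute and square to the identity, each factor is applied as $e^{-i\theta P} = \cos\theta\, I - i \sin\theta\, P$.

```python
import math

def apply_pauli(pauli, state):
    """Apply a Pauli string to a state stored as {bitstring: amplitude}."""
    out = {}
    for bits, amp in state.items():
        new_bits, phase = [], 1
        for op, b in zip(pauli, bits):
            if op in "XY":
                new_bits.append("1" if b == "0" else "0")
                if op == "Y":                  # Y|0> = i|1>, Y|1> = -i|0>
                    phase *= 1j if b == "0" else -1j
            else:
                new_bits.append(b)
                if op == "Z" and b == "1":
                    phase *= -1
        key = "".join(new_bits)
        out[key] = out.get(key, 0) + phase * amp
    return out

def exp_pauli(pauli, theta, state):
    """Apply exp(-i theta P) = cos(theta) I - i sin(theta) P."""
    flipped = apply_pauli(pauli, state)
    return {k: math.cos(theta) * state.get(k, 0)
               - 1j * math.sin(theta) * flipped.get(k, 0)
            for k in set(state) | set(flipped)}

def leakage(groups, beta, start="010"):
    """Probability outside the one-hot subspace after the Trotterized mixer."""
    state = {start: 1.0 + 0j}
    for group in reversed(groups):             # rightmost factor acts first
        for p in group:                        # strings in a group commute
            state = exp_pauli(p, beta / 2, state)
    return sum(abs(a) ** 2 for bits, a in state.items() if bits.count("1") != 1)

by_entry = [["XXI", "YYI"], ["IXX", "IYY"]]    # terms of each T entry together
by_axis = [["XXI", "IXX"], ["YYI", "IYY"]]     # terms of each T entry split
print(leakage(by_entry, 0.7), leakage(by_axis, 0.7))
```

In this sketch the first grouping leaves no probability outside the one-hot subspace, while the second one populates $|111\rangle$ for generic $\beta$.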
We have just learned that it is a bad idea to split terms that belong to a single non-zero entry of $T$, i.e. to $T_{j \leftrightarrow i}$, across a Trotterization. Therefore, we need to show that all non-vanishing Pauli-strings of $|x_j\rangle\langle x_i| + |x_i\rangle\langle x_j|$ commute; otherwise there might exist subspaces for which we cannot realize the mixer constructed in Theorem 1. Luckily, the following theorem shows that it is always possible to realize a mixer by Trotterizing according to non-zero entries of $T = \sum_{i,j \in J,\, i<j} T_{j \leftrightarrow i}$.
For n = 1 we have the following cases.

(39)
Case $x = y$. The number of indices where two Pauli-strings commute does not change when going from $A_n$ to $A_n \otimes (I \pm Z)$. The same holds for $B_n$ and $B_n \otimes (I \pm Z)$. This means that the assertion is true for $x = y$. Case $x \neq y$. First, we prove that all non-vanishing Pauli-strings of $A_{n+1}$ commute, and the same for $B_{n+1}$. This is easy to see, since non-vanishing Pauli-strings of $A_n \otimes X \pm B_n \otimes Y$ must have an even number of indices where they fail to commute. The same is true for $B_n \otimes X \pm A_n \otimes Y$. Finally, we prove that non-vanishing Pauli-strings of $A_{n+1}$ do not commute with non-vanishing Pauli-strings of $B_{n+1}$. Using our assumptions on the non-vanishing Pauli-strings of $A_n$ and $B_n$, it is easy to show that non-vanishing Pauli-strings of the following pairs fail to commute on an odd number of indices: $(A_n \otimes X, B_n \otimes X)$, $(A_n \otimes X, \pm A_n \otimes Y)$, $(\pm B_n \otimes Y, B_n \otimes X)$, and hence do not commute. This shows that also non-vanishing Pauli-strings of $A_{n+1}$ and $B_{n+1}$ do not commute.
The proof of Theorem 3 inspires the following algorithm to decompose $H_M$ into Pauli-strings. For each item in the list $S$ that the algorithm produces, all Pauli-strings commute. We can illustrate the difference between Algorithms 2 and 3 for $B = \{|01\rangle, |10\rangle\}$ and $T_{1 \leftrightarrow 2}$. With Algorithm 2 we have $P_{1,2} = \frac{1}{4}(X - iY) \otimes (X + iY)$ and $P_{2,1} = \frac{1}{4}(X + iY) \otimes (X - iY)$, which can be simplified to $S = P_{1,2} + P_{2,1} = \frac{1}{2}(XX + YY)$. With Algorithm 3 we have $A_1 = X$, $B_1 = Y$ and $S = A_2 = \frac{1}{2}(A_1 X + B_1 Y) = \frac{1}{2}(XX + YY)$ without the need to simplify the expression. As shown above, Trotterizations can also lead to missing transitions. It is suggested in [11] that it is useful to repeat mixers within one mixing step, which corresponds to $r > 1$ in Equation (6). However, as we see in Figure 5, there can be more efficient ways to get mixers which provide transitions between all pairs of feasible states. One way to do so is to construct an exact Trotterization (restricted to the feasible subspace) as described in [23]. However, the ultimate goal is not to avoid Trotterization errors, but rather to provide transitions between all pairs of feasible states. We will revisit the topic of Trotterizations in Section 5 in more detail for each case and show that there are more efficient ways to do so.
[Table 1: Comparison of the complexity of the two algorithms for $n$ qubits (columns: Algorithm 1, Algorithm 2, Algorithm 3). Here, $\gamma$ is the number of nonzero entries of $T$.]

Full/Unrestricted mixer
We start by applying the proposed algorithm to the case without constraints, i.e., for the case $g = 0$ in Equation (1), in order to check for consistency and new insight. We will see that the presented approach is able to reproduce the "standard" $X$ mixer as one possibility, but provides a more general framework. For this case $B = \{|x_j\rangle,\ j \in J,\ x_j \in \{0, 1\}^n\}$, $J = \{i,\ 1 \le i \le 2^n\}$, which means that $Sp(B) = \mathcal{H}$. Furthermore, using Equation (14) we have that $H_{M,B} = T$, since $E$ is the identity.

T Ham(1) aka "standard" full mixer
The Hamiltonian of the standard full mixer for $n$ qubits can be written as
$$H_M = \sum_{j=1}^{n} X_j = \sum_{j,k \in J} (T_{\mathrm{Ham}(1)})_{j,k} |x_j\rangle\langle x_k|. \tag{40}$$
The last identity in Equation (40) shows that $H_M$ is created by the transition matrix given by $T_{\mathrm{Ham}(1)}$. This assumes that the feasible states in $B$ are ordered from the smallest to the largest integer representation.

All-to-all full mixer
For $|J| = 2^n$ the full mixer $T_A$ can be written as $T_A = \sum_{j=1}^{n} T_{\mathrm{Ham}(j)}$. For the case $T_{\mathrm{Ham}(2)}$ the resulting Hamiltonian $H_M$ does not provide transitions between all pairs of feasible states, but we observe that
$$H_M = \sum_{j,k \in J} (T_{\mathrm{Ham}(2)})_{j,k} |x_j\rangle\langle x_k| = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} X_{j_1} X_{j_2},$$
which consists of all Pauli-strings with exactly two $X$ operators.
[Table 2: Full/unrestricted mixer case for $n$ qubits, i.e., $|B| = 2^n$. Comparison of the total Hamming distance of the transition matrix $T$ as well as resulting requirements for implementations in terms of single- and two-qubit gates for different $T$.]
[Figure 6: In the commutation graph (middle) of the terms of the mixer given in Equation (47) an edge occurs if the terms commute. From this we can group terms into three (nodes connected by green edge) or two (nodes connected by red/blue edges) sets. Only the left/green grouping preserves the feasible subspace.]

Constrained mixers
We start by describing what is known as the "XY"-mixer [7,11,23], before we explore more general cases. Our framework provides additional insights into this case and inspires further improvement of the algorithms above with respect to the optimality of the mixers, described in Section 3.4, by (possibly) reducing the length of Pauli-strings. For this case, we will analyze TA, T∆, and T∆,c only. T Ham(1) only makes sense when n is a power of two, and T rand has in general high cost, see Table 2.

"One-hot" aka "XY"-mixer
We are concerned with the case given by all computational basis states with exactly one "1", i.e., $B = \{|x\rangle,\ x \in \{0, 1\}^n,\ \text{s.t. } \sum_j x_j = 1\}$. These states are sometimes referred to as "one-hot". We have that $n = |B|$ is the number of qubits. After some motivating examples we present the general case for constructing mixers for any $n > 2$.
which has $\mathrm{Cost}(H_M) = 2$. No Trotterization is needed in this case.
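For two qubits the one-hot subspace is spanned by $|01\rangle$ and $|10\rangle$. A standard way to write the transition between these two states in terms of Pauli strings is $\tfrac{1}{2}(XX + YY)$; the sketch below verifies this identity and that the two Pauli strings commute, which is consistent with no Trotterization being needed. Whether this is exactly the form and normalization used for the motivating example is an assumption.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])

# Computational basis states |01> and |10> of two qubits.
ket01 = np.zeros(4, dtype=complex)
ket01[0b01] = 1
ket10 = np.zeros(4, dtype=complex)
ket10[0b10] = 1

# Transition operator |01><10| + |10><01|.
transition = np.outer(ket01, ket10.conj()) + np.outer(ket10, ket01.conj())

XX = np.kron(X, X)
YY = np.kron(Y, Y)
xy_mixer = 0.5 * (XX + YY)

print(np.allclose(transition, xy_mixer))   # same operator
print(np.allclose(XX @ YY, YY @ XX))       # the two Pauli strings commute
```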

The general case n > 2
We start with the observation that for any symmetric $T \in \mathbb{R}^{n \times n}$ with zero diagonal we have The cost for implementing one of the entries, i.e., $e^{-i\beta P_{j,k}}$, is given by the recursive formula where $f^{l}_{n}$ is Pascal's triangle starting with 2 instead of 1. Examples of the resulting costs for different transition matrices can be seen in Table 3.
The cost of the mixers can be considerably reduced by adding mixers generalized from the case $n = 3$. If the entries $(T)_{i \leftrightarrow j}$ of $T$ are non-zero, we can add mixers for each of the $2^{n-2}$ pairs of states $x \in \{0,1\}^n$ that fulfill $(x_i = 0 \wedge x_j = 1) \vee (x_i = 1 \wedge x_j = 0)$. We can enumerate them with $0 \le l \le 2^{n-2} - 1$ by $\tilde{B}^{l}_{i,j} = \{|x\rangle,\ x \in \{0,1\}^n,\ \text{s.t. } x_{-i,-j} = \mathrm{bin}(l)\}$, where $x_{-i,-j}$ removes the indices $i$ and $j$ of $x$. We have that $B \cap \tilde{B}^{l}_{i,j} = \emptyset$. We observe that, for $n \ge 2$ and $|x\rangle, |z\rangle$ with $\mathrm{Ham}(x, z) = 2$, i.e., the strings $x, z$ differ at exactly two positions, we have that Adding these mixers for each nonzero entry $T_{j \leftrightarrow k}$ of $T$ has the effect of summing over all possible combinations of the projectors $\tfrac{1}{2}(I \pm Z)$ on the remaining $n-2$ qubits, which is equal to the identity. Therefore, we get the mixer which reduces the cost of one term to $\mathrm{Cost}(P_{j,k}) = 4$.
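The effect of adding the extra transitions can be checked numerically: summing $|x\rangle\langle z| + |z\rangle\langle x|$ over all settings of the remaining $n-2$ bits yields a two-qubit XY term acting on qubits $i$ and $j$ only. The sketch below assumes the normalization $\tfrac{1}{2}(X_i X_j + Y_i Y_j)$ and uses hypothetical helper names; it is meant as an illustration, not as the construction used in the accompanying code.

```python
import numpy as np
from functools import reduce
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])

def pauli_string(ops, n):
    """ops maps qubit index -> 2x2 matrix; identity on all other qubits."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def transition(x, z):
    """|x><z| + |z><x| for bit lists x, z (qubit 0 is the leftmost bit)."""
    a = int("".join(map(str, x)), 2)
    b = int("".join(map(str, z)), 2)
    op = np.zeros((2 ** len(x), 2 ** len(x)), dtype=complex)
    op[a, b] = op[b, a] = 1
    return op

def summed_transitions(n, i, j):
    """Sum the (i, j) transitions over all values of the other n - 2 bits."""
    others = [q for q in range(n) if q not in (i, j)]
    total = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for rest in product([0, 1], repeat=n - 2):
        x = [0] * n
        for q, bit in zip(others, rest):
            x[q] = bit
        z = x.copy()
        x[i], x[j] = 0, 1
        z[i], z[j] = 1, 0
        total += transition(x, z)
    return total

n, i, j = 4, 0, 2
xy_term = 0.5 * (pauli_string({i: X, j: X}, n) + pauli_string({i: Y, j: Y}, n))
print(np.allclose(summed_transitions(n, i, j), xy_term))
```

The term with all remaining bits set to zero is the transition between two one-hot states; all other terms act outside the one-hot subspace.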

Trotterizations
Not all Pauli-strings of the mixer in Equation (51) commute. This necessitates a suitable and efficient Trotterization. We will use Theorem 2 and Theorem 3 to identify valid Trotterized mixers. As pointed out in [23], when $n$ is a power of two, one can realize a Trotterization which is exact in the feasible subspace $B$. Termed simultaneous complete-graph mixer, this involves all possible pairs $(i, j)$, corresponding to a certain Trotterization of the mixer for $T_A$. We will see that there are more efficient mixers that provide transitions between all pairs of feasible states.
Another possibility is to Trotterize $T_{\Delta,c}$ or $T_\Delta$ according to odd and even entries as described in Section 3.2.4. This is what is termed a parity-partitioned mixer in [23]. However, fewer and fewer feasible states can be reached as $n$ increases, as we have seen in Figure 5. Repeated applications ($r > 0$ in Equation (6)) are necessary and $r$ increases with increasing $n$. Figure 7 shows a comparison of different Trotterizations. As the cost of the mixer is dictated by the number of non-zero entries of the transition matrix, it is more efficient to add mixers for off-diagonals according to $\sum_{i \in I} (T_{O(i)} + T_{E(i)})$ for some suitable index set $I$.
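The loss of reachability under a parity-partitioned Trotterization can be illustrated directly in the $n$-dimensional one-hot subspace. The sketch below rests on several assumptions: the transition matrix is taken to be the open nearest-neighbour chain, each XY term restricted to the one-hot subspace acts as $|a\rangle\langle b| + |b\rangle\langle a|$ (with any scale factor absorbed into $\beta$), and the pairs are split into two layers of mutually disjoint pairs. The function names and the value of $\beta$ are hypothetical.

```python
import numpy as np

def pair_rotation(n, a, b, beta):
    """exp(-i * beta * (|a><b| + |b><a|)) in the n-dimensional one-hot subspace."""
    U = np.eye(n, dtype=complex)
    U[a, a] = U[b, b] = np.cos(beta)
    U[a, b] = U[b, a] = -1j * np.sin(beta)
    return U

def parity_partitioned_step(n, beta):
    """One Trotter step: first the pairs (0,1),(2,3),..., then (1,2),(3,4),..."""
    first_layer = [(a, a + 1) for a in range(0, n - 1, 2)]
    second_layer = [(a, a + 1) for a in range(1, n - 1, 2)]
    U = np.eye(n, dtype=complex)
    for a, b in first_layer + second_layer:
        U = pair_rotation(n, a, b, beta) @ U
    return U

def reach_fraction(n, r, beta=0.3, tol=1e-9):
    """Fraction of (source, target) pairs with non-zero amplitude after r steps."""
    U = np.linalg.matrix_power(parity_partitioned_step(n, beta), r)
    return float(np.mean(np.abs(U) > tol))

for n in (4, 8, 16):
    print(n, [reach_fraction(n, r) for r in (1, 2, 4)])
```

Within one layer the pairs act on disjoint states, so their exponentials commute and the layer is implemented exactly; the Trotter error and the restricted reachability come only from splitting the two layers.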

General cases
In this section we analyze some specific cases that go beyond unrestricted mixers and mixers restricted to one-hot states.

Example 2
Finally, we investigate the case $B = \{|10010\rangle, |01110\rangle, |10011\rangle, |11101\rangle, |00110\rangle, |01010\rangle\}$, which restricts to 6 of the total $2^5 = 32$ computational basis states for 5 qubits. It is not clear a priori if for any (distinct) pair $T_{i_1,j_1}$ and $T_{i_2,j_2}$ all pairs of non-vanishing Pauli strings commute. In order to fulfill Equation (6) for $r = 1$, this means that one needs to Trotterize according to all pairs of $T_A$ as shown in Table 5. The resulting cost for this Trotterized mixer is $\mathrm{Cost}(H_M) = 1360$. Since the complement of $B$ is spanned by $k = 2^n - |B| = 26$ computational basis states, there are $\binom{k}{2} = 325$ different pairs to add to each $T_{i \leftrightarrow j}$. As Table 5 shows, this can reduce the cost of the resulting mixer to $\mathrm{Cost}(H_M) = 568$. Of course, there is the possibility to reduce the cost even further by adding more mixers for states in the kernel of $H_{M,B}$. However, this quickly becomes computationally very demanding when all possibilities are considered in a brute-force fashion.
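The counting in this example can be reproduced with a few lines. The sketch below only recomputes the number of infeasible basis states, the number of pairs among them, and the pairwise Hamming distances of the feasible states; it does not compute the costs reported in Table 5. The variable names are hypothetical.

```python
from itertools import combinations
from math import comb

B = ["10010", "01110", "10011", "11101", "00110", "01010"]
n = len(B[0])

k = 2 ** n - len(B)          # basis states outside the feasible set
kernel_pairs = comb(k, 2)    # pairs of infeasible states that could be added
feasible_pairs = comb(len(B), 2)  # off-diagonal entries of T_A (upper triangle)

def hamming(x, z):
    """Number of positions at which the bit strings x and z differ."""
    return sum(a != b for a, b in zip(x, z))

distances = {(x, z): hamming(x, z) for x, z in combinations(B, 2)}

print(k, kernel_pairs, feasible_pairs)
for (x, z), d in distances.items():
    print(x, z, d)
```

The spread of Hamming distances among the feasible pairs gives a first impression of how long the Pauli strings of the individual transitions are before any additional mixers are added.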

Availability of Data and Code
All data and the Python/Jupyter notebook source code for reproducing the results obtained in this article are available at https://github.com/OpenQuantumComputing.

Conclusion and Outlook
While designing mixers with the presented framework is more or less straightforward, designing efficient mixers turns out to be a difficult task. An additional difficulty arises due to the need for Trotterization. Somewhat counter-intuitively, the more restricted the mixer, i.e., the smaller the subspace, the more design freedom one has to increase efficiency. More structure/symmetry of the restricted subspace seems to allow for a lower cost of the resulting mixer. For the case of "one-hot" states, we provide a deeper understanding of the requirements for Trotterizations. Compared to the state of the art in the literature, this leads to a considerable reduction of the cost of the mixer, as defined in Equation (36). The introduced framework provides a rigorous mathematical analysis of the underlying structure of mixer Hamiltonians and deepens the understanding of them. We believe the framework can serve as the backbone for further development of efficient mixers.
When adding mixers, in general the kernel of $H_{M,B}$ is spanned by $k = 2^n - |B|$ computational basis states. Therefore, one can add $\binom{k}{2}$ different mixers for each non-zero entry $T_{i \leftrightarrow j}$ of $T$. Out of all these, one wants to find the combination leading to the lowest overall cost. Clearly, brute-force optimization is computationally not tractable, even for a moderate number of qubits $n$, when $|B| \ll 2^n$. Further research should aim to carefully analyze the structure of the basis states in $B$ in order to develop efficient (heuristic) algorithms to find low-cost mixers through adding mixers in the kernel of $H_{M,B}$.