Efficient Implementation of Discrete-Time Quantum Walks on Quantum Computers

Quantum walks have proven to be a universal model for quantum computation and to provide speed-up in certain quantum algorithms. The discrete-time quantum walk (DTQW) model, among others, is one of the most suitable candidates for circuit implementation due to its discrete nature. Current implementations, however, are usually characterized by quantum circuits of large size and depth, which leads to a higher computational cost and severely limits the number of time steps that can be reliably implemented on current quantum computers. In this work, we propose an efficient and scalable quantum circuit implementing the DTQW on the 2^n-cycle based on the diagonalization of the conditional shift operator. For t time steps of the DTQW, the proposed circuit requires only O(n^2 + nt) two-qubit gates compared to the O(n^2 t) of the current most efficient implementation based on quantum Fourier transforms. We test the proposed circuit on an IBM quantum device for a Hadamard DTQW on the 4-cycle and 8-cycle characterized by periodic dynamics and by recurrent generation of maximally entangled single-particle states. Experimental results are meaningful well beyond the regime of few time steps, paving the way for reliable implementation and use on quantum computers.

arXiv:2402.01854v2 [quant-ph] 9 Apr 2024

There are two main models of quantum walk: discrete-time quantum walk (DTQW) [1] and continuous-time quantum walk (CTQW) [2]. The CTQW is defined on the position Hilbert space of the quantum walker and the evolution is driven by the Hamiltonian H of the system, U(t) ≡ exp(−itH/ħ). The DTQW is defined on a Hilbert space comprising an additional coin space and the evolution is driven by a position shift operator, S, controlled by a quantum coin operator, C, acting at discrete time steps. The single time-step operator is defined as U = S(C ⊗ I_p), with I_p the identity in position space, from which U(t) ≡ U^t with t ∈ ℕ. As pointed out in [46], implementing the evolution operator of a DTQW is simplified by the fact that (i) the time is discrete, (ii) the evolution is repetitive, U(t) = U^t, and (iii) U acts locally on the coin-vertex states encoding the graph (applying U to the coin-vertex states associated with a given vertex will propagate the corresponding amplitudes only to adjacent vertices).
DTQWs already have efficient physical implementations in platforms that natively support the conditional walk operations, e.g., photonic systems [44,45]. However, devising efficient implementations on digitized quantum computers is desirable and necessary in order to make DTQWs available to develop quantum algorithms for general purpose quantum computers and, in general, quantum protocols to be implemented in circuit models. A first circuit implementation of a DTQW on the cycle was realized on a multiqubit nuclear-magnetic-resonance system [47], and thereafter proposals of efficient implementation on certain graphs [48-51], for position-dependent coin operators [52], and for staggered quantum walks (a coinless discrete-time model of quantum walk) [53] have been devised.
In this work we propose an efficient and scalable quantum circuit implementing the DTQW on the 2^n-cycle, a finite discrete line with 2^n vertices and periodic boundary conditions. Although this is the simplest DTQW one may think of, implementing it on quantum computers already highlights the limitations of current quantum devices [54-57]. In this model, the position state of the quantum walker is encoded in an n-qubit state. To the best of our knowledge, the most efficient state-of-the-art implementation of a DTQW [58] overall requires O(n^2 t) two-qubit gates for t time steps, because it involves one quantum Fourier transform (QFT) and one inverse QFT (IQFT) at each time step. In quantum computers, two-qubit gates are the noisiest and take the longest time to execute, so any efficient quantum circuit should aim at significantly reducing their number. Our quantum circuit accomplishes this task by exploiting the unitarity of the QFT: independently of t, our circuit involves only one QFT (at the beginning) and one IQFT (at the end), and thus it overall requires O(n^2 + nt) two-qubit gates. Accordingly, the advantage grows with the number of time steps, passing, for t ≫ n, from O(n^2 t) in [58] to O(nt) in the present scheme. For illustrative purposes, we implemented the proposed quantum circuit on actual quantum hardware (ibm_cairo, a 27-qubit high-fidelity quantum computer), considering a Hadamard DTQW on the 4- and 8-cycle. Results indicate that our circuit outperforms current efficient circuits also in the regime of few time steps and provide experimental evidence of the recurrent generation of maximally entangled single-particle states in the 4-cycle [59].
The paper is organized as follows. Sec. 2 reviews the DTQW model on the N-cycle. Sec. 3 introduces the efficient quantum circuit we designed for the DTQW on the 2^n-cycle and compares it with other existing schemes. Sec. 4 presents and discusses the results from testing the proposed circuit on quantum hardware. Finally, Sec. 5 is devoted to conclusions and perspectives. Selected technical details are deferred to the appendices.

The model: DTQW on the N-cycle
An N-cycle, or circle, is a 1D lattice having N vertices and periodic boundary conditions. To each vertex, labeled by j = 0, ..., N − 1, we associate a quantum state, |j⟩, which represents the walker localized at that vertex. In a DTQW, the quantum walker has an external degree of freedom, the position, and an internal one, the coin. Associated with each degree of freedom is a Hilbert space: an N-dimensional position Hilbert space H_p^(N) = span({|j_p⟩ : j = 0, ..., N − 1}) and a two-dimensional coin Hilbert space H_c^(2) = span({|s_c⟩ : s = 0, 1}). We use the label "p" to refer to the walker's position degree of freedom and "c" to the coin degree of freedom. Depending on the coin state, the walker can move counterclockwise (s = 0) or clockwise (s = 1) on the cycle (Fig. 1). The full Hilbert space is

$$\mathcal{H} = \mathcal{H}_c^{(2)} \otimes \mathcal{H}_p^{(N)} = \mathrm{span}\left(\{|s_c\rangle|j_p\rangle : s = 0, 1;\ j = 0, \ldots, N-1\}\right). \tag{1}$$

This is the natural basis for a DTQW and in the following we will refer to it as the computational basis. The coin basis states are |0_c⟩ = (1, 0)^⊺ and |1_c⟩ = (0, 1)^⊺, with ⊺ denoting the transpose without complex conjugation; the position basis states are |j_p⟩ = (0, ..., 0, 1, 0, ..., 0)^⊺, with the only nonzero element in position j. Accordingly, a generic coin-position basis state |s_c⟩|j_p⟩ = |s_c⟩ ⊗ |j_p⟩ is represented by the column vector of length 2N, whose first N entries are related to s = 0 and the last N to s = 1. The only nonzero entry is the (Ns + j)-th one. The evolution is ruled by the unitary single time-step operator

$$U = S\,(C \otimes I_p), \tag{2}$$

with I_p the identity in position space and C the coin operator acting on the coin state. Coin and conditional shift operators must be unitary for U to be unitary. The conditional shift operator S acts on the full Hilbert space and makes the walker move according to the coin state: S|s_c⟩|j_p⟩ = |s_c⟩|[(j + 2s − 1) mod N]_p⟩, where operations in position space are performed modulo N. Such an operator can be written as

$$S = |0_c\rangle\langle 0_c| \otimes P_0 + |1_c\rangle\langle 1_c| \otimes P_1, \tag{3}$$

where

$$P_0 = \sum_{j=0}^{N-1} |[(j-1) \bmod N]_p\rangle\langle j_p|, \qquad P_1 = \sum_{j=0}^{N-1} |[(j+1) \bmod N]_p\rangle\langle j_p|.$$

In the computational basis, S has the matrix representation

$$S = \begin{pmatrix} P_0 & 0 \\ 0 & P_1 \end{pmatrix}, \tag{4}$$

where 0 is the N × N null matrix and

$$P_1 = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad P_0 = P_1^{\intercal}. \tag{5}$$

The quantum walker is usually assumed to be initially localized at the vertex |0⟩, while the coin is in a generic superposition state,

$$|\psi(0)\rangle = \left[\cos(\theta/2)\,|0_c\rangle + e^{i\phi}\sin(\theta/2)\,|1_c\rangle\right] \otimes |0_p\rangle, \tag{6}$$

with θ ∈ [0, π] and ϕ ∈ [0, 2π). After t ∈ ℕ time steps the quantum walker will be in the state

$$|\psi(t)\rangle = U^t |\psi(0)\rangle = \sum_{j=0}^{N-1} \sum_{s=0}^{1} \psi_{s,j}(t)\,|s_c\rangle|j_p\rangle, \tag{7}$$

where the amplitudes ψ_{s,j}(t) ∈ ℂ, with s = 0, 1, associated with the states |s_c⟩|j_p⟩ satisfy the normalization condition at any t, ∑_{j=0}^{N−1} ∑_{s=0}^{1} |ψ_{s,j}(t)|^2 = 1. The probability to find the walker at position k at time t, irrespective of the coin state, is

$$P_k(t) = \sum_{s=0}^{1} |\langle s_c|\langle k_p|\psi(t)\rangle|^2 = |\psi_{0,k}(t)|^2 + |\psi_{1,k}(t)|^2. \tag{8}$$
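As a minimal illustration of the model reviewed in this section, the following Python sketch evolves the amplitudes ψ_{s,j}(t) by applying a coin followed by the conditional shift S|s_c⟩|j_p⟩ = |s_c⟩|[(j + 2s − 1) mod N]_p⟩, and then sums over the coin to obtain the position distribution. It is a dense classical simulation, not the quantum circuit of the paper; the function names, the storage layout psi[s][j], the Hadamard coin, and the parametrization of the initial coin state by (θ, ϕ) are illustrative assumptions.

```python
import cmath
import math

def apply_coin(psi, coin):
    """Apply a 2x2 coin matrix to amplitudes stored as psi[s][j]."""
    n_pos = len(psi[0])
    return [[coin[s][0] * psi[0][j] + coin[s][1] * psi[1][j] for j in range(n_pos)]
            for s in range(2)]

def apply_shift(psi):
    """Conditional shift: |s>|j> -> |s>|(j + 2s - 1) mod N>."""
    n_pos = len(psi[0])
    out = [[0j] * n_pos for _ in range(2)]
    for s in range(2):
        for j in range(n_pos):
            out[s][(j + 2 * s - 1) % n_pos] = psi[s][j]
    return out

def evolve(n_pos, coin, theta, phi, steps):
    """Walker localized at vertex 0, coin in cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>;
    the single-step operator (coin, then shift) is applied `steps` times."""
    psi = [[0j] * n_pos for _ in range(2)]
    psi[0][0] = complex(math.cos(theta / 2))
    psi[1][0] = cmath.exp(1j * phi) * math.sin(theta / 2)
    for _ in range(steps):
        psi = apply_shift(apply_coin(psi, coin))
    return psi

def position_probabilities(psi):
    """Probability of each vertex, irrespective of the coin state."""
    return [abs(psi[0][k]) ** 2 + abs(psi[1][k]) ** 2 for k in range(len(psi[0]))]

r = 1 / math.sqrt(2)
HADAMARD = [[r, r], [r, -r]]  # illustrative coin choice (assumption)
probs = position_probabilities(evolve(8, HADAMARD, math.pi / 6, math.pi / 2, steps=5))
print([round(p, 4) for p in probs])
```

Since each step moves the walker by exactly one vertex and N is even, after an odd number of steps only odd-labeled vertices can be occupied, which gives a quick sanity check on the shift.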

Quantum circuit implementing the DTQW on the 2^n-cycle
A quantum circuit implementing the DTQW on the N-cycle with N = 2^n requires n + 1 qubits: n to encode the walker's position state and an additional one to encode the coin state. Both states are encoded in base 2. Denoting by j_2 the n-digit binary representation of the integer j, in the little-endian ordering convention (the most significant bit is placed on the left) we write |j⟩ ≡ |j_2⟩ = |q_{n−1} ... q_0⟩, where q_k ∈ {0, 1} with k = 0, ..., n − 1, such that j = ∑_{k=0}^{n−1} q_k 2^k. Accordingly, we write the quantum state of the quantum walker as the (n + 1)-qubit state

$$|q_n^{c}\rangle\,|q_{n-1}^{p} \ldots q_0^{p}\rangle \equiv |q_n^{c}\, q_{n-1}^{p} \ldots q_0^{p}\rangle, \tag{9}$$

which represents the state where the coin is in the state |q_n^c⟩ and the walker is in the position state |q_{n−1}^p ... q_0^p⟩.
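The encoding can be made concrete with a few lines of Python. This is only a sketch of the bookkeeping: the helper names are hypothetical, and labels are written as strings with the coin bit first and the most significant position bit next, following the ordering described above.

```python
def position_to_bits(j, n):
    """Binary label q_{n-1}...q_0 of vertex j on n qubits (most significant bit on the left)."""
    return format(j, "0{}b".format(n))

def bits_to_position(bits):
    """Inverse map: j = sum_k q_k * 2^k, with q_0 the rightmost character."""
    return sum(int(q) << k for k, q in enumerate(reversed(bits)))

def basis_label(s, j, n):
    """Label of the (n + 1)-qubit basis state: coin bit followed by the position bits."""
    return str(s) + position_to_bits(j, n)

print(basis_label(1, 2, 3))  # coin state 1, vertex 2 of the 8-cycle
```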

Quantum circuit design
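The efficiency of a quantum circuit implementing a DTQW relies on the efficient implementation of the single time-step operator (2), so, ultimately, on that of the conditional shift operator (3). As shown in Eq. (4), the latter involves the circulant matrices P_0 and P_1 introduced in Eq. (5), and circulant matrices are known to be diagonalized by the quantum Fourier transform (QFT) matrix (see Appendix A). The QFT of the computational basis is defined as

$$F|j\rangle = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \omega^{jk}\,|k\rangle, \tag{10}$$

where ω = e^{2πi/N}. The QFT is a unitary transformation, F F† = F† F = I. Accordingly, we can write

$$P_0 = F^\dagger\, \Omega^\dagger F, \qquad P_1 = F^\dagger\, \Omega\, F, \tag{11}$$

where

$$\Omega = \mathrm{diag}(1, \omega, \omega^2, \ldots, \omega^{N-1}) = R_1 \otimes R_2 \otimes \cdots \otimes R_n, \tag{12}$$

with

$$R_k = \begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i/2^k} \end{pmatrix}. \tag{13}$$

We stress that [...] and that we can write the second equality of Eq. (12) because we are assuming N = 2^n. Given Eq. (11), the conditional shift matrix S (4) and its diagonal form Σ are related via

$$S = (I_c \otimes F^\dagger)\, \Sigma\, (I_c \otimes F), \qquad \Sigma = \begin{pmatrix} \Omega^\dagger & 0 \\ 0 & \Omega \end{pmatrix} = |0_c\rangle\langle 0_c| \otimes \Omega^\dagger + |1_c\rangle\langle 1_c| \otimes \Omega, \tag{14}$$

where the identity in coin space I_c is required since the (I)QFT acts on position space only.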
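The diagonalization of the increment operator by the QFT can be checked numerically on small instances. The sketch below assumes the standard conventions F|j⟩ = N^{−1/2} ∑_k ω^{jk}|k⟩ with ω = e^{2πi/N} and R_k = diag(1, e^{2πi/2^k}); under these assumptions F P_1 F† should be the diagonal matrix with entries ω^k, and each entry should factor into single-qubit phases, one per set bit of k. All function names are illustrative.

```python
import cmath
import math

def qft_matrix(n_pos):
    """Dense QFT matrix with entries w^(jk) / sqrt(N), w = exp(2 pi i / N)."""
    w = cmath.exp(2j * math.pi / n_pos)
    return [[w ** (j * k) / math.sqrt(n_pos) for j in range(n_pos)] for k in range(n_pos)]

def increment_matrix(n_pos):
    """P_1 |j> = |(j + 1) mod N>: entry [row][col] is 1 when row == col + 1 (mod N)."""
    return [[1.0 if row == (col + 1) % n_pos else 0.0 for col in range(n_pos)]
            for row in range(n_pos)]

def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def phase_from_rk(k, n):
    """Phase picked up by |k> under one phase gate per qubit:
    the qubit holding bit q_b of k carries R_{n-b} = diag(1, exp(2 pi i / 2^(n-b)))."""
    phase = 1 + 0j
    for bit in range(n):
        if (k >> bit) & 1:
            phase *= cmath.exp(2j * math.pi / 2 ** (n - bit))
    return phase

n = 3
N = 2 ** n
F = qft_matrix(N)
D = matmul(matmul(F, increment_matrix(N)), dagger(F))  # expected to be diagonal
print([round(abs(D[k][k]), 6) for k in range(N)])       # unit-modulus diagonal entries
```

The factorization into single-qubit phases is what restricts the construction to N = 2^n: for other N the diagonal entries ω^k do not split bit by bit in this way.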
The DTQW of t steps is generated by applying the operator U repeatedly, t times, Eq. (7). Recalling that the QFT is unitary and acts only on position space, and given that

$$(I_c \otimes F)\,(C \otimes I_p)\,(I_c \otimes F^\dagger) = C \otimes I_p,$$

we can write

$$U^t = (I_c \otimes F^\dagger)\, \left[\Sigma\,(C \otimes I_p)\right]^t (I_c \otimes F), \tag{15}$$

with Σ in Eq. (14). Equation (15) provides a first sketch of the circuit we are going to implement. First, we perform a QFT on the position register, I_c ⊗ F. Then, we repeat t times the single time-step evolution in the extended Fourier space (extended to include coin space), Σ(C ⊗ I_p). In the end, we perform an IQFT on the position register, I_c ⊗ F†. Even at this stage, an advantage of our scheme is evident: overall, it requires only one QFT and one IQFT, unlike the QFT-scheme in Fig. 3, which requires both transformations at each time step [58].

Figure 3. Quantum circuit implementing one time-step of the DTQW on the 2^n-cycle proposed in [58]. The quantum Fourier transform, F, and its inverse, F†, do not include the SWAP gates. The increment gate is diagonalized by the QFT [see also Fig. 2(d)]. Conditional shift operator as in Fig. 2.

Now, we focus on Σ to further improve the above scheme. The second equality in Eq. (14) clearly shows the action of Σ: if the coin is in the state |0_c⟩ (|1_c⟩), then the operator Ω† (Ω) acts on the position state. It is evident that, in the present form, each step of the DTQW requires 2n controlled-R_k gates. To reduce the number of controlled operations, we point out that the same action is obtained by performing R_k† independently of the coin state, followed by a controlled-R_k^2 if the coin is in |1_c⟩ to compensate the previously assigned phase and get the correct one. Formally, we can rewrite the diagonal conditional shift operator (14) as

$$\Sigma = \left(|0_c\rangle\langle 0_c| \otimes I_p + |1_c\rangle\langle 1_c| \otimes \Omega^2\right)(I_c \otimes \Omega^\dagger), \tag{16}$$

where

$$\Omega^2 = R_0 \otimes R_1 \otimes \cdots \otimes R_{n-1} = I \otimes R_1 \otimes \cdots \otimes R_{n-1}, \tag{17}$$

since R_k^2 = R_{k−1} and R_0 = I (one-qubit identity gate), see Eq. (13). As a final step in optimizing our quantum circuit, we want to make the SWAP operations, usually required in a proper (I)QFT to obtain the correct states, unnecessary. Therefore, following the argument in [58], it is useful to introduce the SWAP operation on the n-qubit register, which we denote by τ. The SWAP takes qubit k to qubit n − 1 − k and vice versa,

$$\tau\,|q_{n-1}\, q_{n-2} \ldots q_1\, q_0\rangle = |q_0\, q_1 \ldots q_{n-2}\, q_{n-1}\rangle. \tag{18}$$

This operation is unitary, τ^{−1} = τ†, and involutory, τ^{−1} = τ. A proper (I)QFT on n qubits requires a SWAP on the n-qubit register at the end (beginning) of the circuit, i.e.,

$$F = \tau \tilde{F}, \qquad F^\dagger = \tilde{F}^\dagger \tau, \tag{19}$$

where F̃^(†) denotes the (I)QFT without the SWAP. Similarly, we introduce

$$\tilde{\Sigma} = (I_c \otimes \tau)\, \Sigma\, (I_c \otimes \tau), \qquad \tilde{\Omega} = \tau\, \Omega\, \tau = R_n \otimes R_{n-1} \otimes \cdots \otimes R_1, \tag{20}$$

see Eq. (12). Using Eqs. (19)-(20) and recalling that τ^{−1} = τ† = τ acts only on the position register (not on the coin), we observe that

$$(I_c \otimes \tau)\, \Sigma\,(C \otimes I_p)\, (I_c \otimes \tau) = \tilde{\Sigma}\,(C \otimes I_p),$$

according to which we can rewrite Eq. (15) as

$$U^t = (I_c \otimes \tilde{F}^\dagger)\, \left[\tilde{\Sigma}\,(C \otimes I_p)\right]^t (I_c \otimes \tilde{F}). \tag{21}$$

[Fig. 4: circuit diagram, with the single time-step block labeled "To be repeated t times".]

In conclusion, the quantum circuit implementing t steps of a DTQW, excluding the initial state preparation, is shown in Fig. 4 and does not need the SWAP in the (I)QFT. Size and depth of a quantum circuit implementing a DTQW can be further reduced by choosing a proper encoding of the position space and designing initial-state dependent circuits [60]. We point out that the design of our quantum circuit is independent of the initial state, but it can be further optimized for an initially localized walker, the usual initial condition, by replacing the initial QFT with a layer of Hadamard gates (Appendix B).
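The structure of the scheme (one QFT, then t repetitions of coin plus diagonal phases, then one IQFT) can be checked against a direct simulation with a short classical sketch. The code below works on dense amplitude lists rather than gates, assumes the standard QFT convention with ω = e^{2πi/N}, and applies the diagonal step as an unconditional phase ω^{−k} followed by an extra phase ω^{2k} on the coin-1 component. Function names, the Hadamard coin, and the initial state are illustrative assumptions; the SWAP-free reordering is not modeled here.

```python
import cmath
import math

def qft(vec, inverse=False):
    """Dense (I)QFT of an amplitude list: out[k] = sum_j w^(jk) vec[j] / sqrt(N)."""
    n_pos = len(vec)
    w = cmath.exp((-2j if inverse else 2j) * math.pi / n_pos)
    return [sum(w ** (j * k) * vec[j] for j in range(n_pos)) / math.sqrt(n_pos)
            for k in range(n_pos)]

def apply_coin(psi, coin):
    n_pos = len(psi[0])
    return [[coin[s][0] * psi[0][j] + coin[s][1] * psi[1][j] for j in range(n_pos)]
            for s in range(2)]

def direct_walk(psi, coin, steps):
    """Reference evolution: coin, then shift j -> (j + 2s - 1) mod N, at every step."""
    n_pos = len(psi[0])
    for _ in range(steps):
        psi = apply_coin(psi, coin)
        shifted = [[0j] * n_pos for _ in range(2)]
        for s in range(2):
            for j in range(n_pos):
                shifted[s][(j + 2 * s - 1) % n_pos] = psi[s][j]
        psi = shifted
    return psi

def fourier_walk(psi, coin, steps):
    """One QFT, `steps` diagonal single-step evolutions, one IQFT."""
    n_pos = len(psi[0])
    w = cmath.exp(2j * math.pi / n_pos)
    psi = [qft(psi[0]), qft(psi[1])]                      # single QFT at the start
    for _ in range(steps):
        psi = apply_coin(psi, coin)
        # unconditional phase w^(-k) on both coin components ...
        psi = [[w ** (-k) * psi[s][k] for k in range(n_pos)] for s in range(2)]
        # ... then the compensating phase w^(2k) only where the coin is 1
        psi[1] = [w ** (2 * k) * psi[1][k] for k in range(n_pos)]
    return [qft(psi[0], inverse=True), qft(psi[1], inverse=True)]  # single IQFT at the end

r = 1 / math.sqrt(2)
HADAMARD = [[r, r], [r, -r]]
psi0 = [[0j] * 8 for _ in range(2)]
psi0[0][0] = complex(math.cos(math.pi / 12))
psi0[1][0] = 1j * math.sin(math.pi / 12)
direct = direct_walk(psi0, HADAMARD, steps=6)
fourier = fourier_walk(psi0, HADAMARD, steps=6)
deviation = max(abs(a - b) for s in range(2) for a, b in zip(direct[s], fourier[s]))
print(deviation)  # expected to be at the level of floating-point round-off
```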

Comparison with other existing schemes
In this section we estimate the size of the proposed quantum circuit (Fig. 4), in terms of depth D and number of one- and two-qubit gates, N^(1) and N^(2) respectively, and compare it with that of other existing schemes, following the preliminary analysis provided in [58]. We will compare our scheme with the following ones. (i) The ID-scheme [48], which is based on the increment and decrement gates (Fig. 2) that require generalized CNOT gates. The latter can be implemented in different ways, so, as an example, we consider their implementation (i.a) via linear-depth quantum circuit [61] or (i.b) via ancilla qubits [62]. In passing, we also mention a possible implementation via rotations [56]. In the following analysis we consider the ID-scheme implemented as in Fig. 2(d), with the increment gate only. (ii) The QFT-scheme [58], which is based on the increment gate diagonalized by the QFT (Fig. 3).
This discussion is not meant to be exhaustive (e.g., there are several ways to implement the generalized CNOT gates in the ID-scheme), nor to provide optimal and universal metrics, as the latter ultimately depend on the quantum device: it suffices to think of transpilation, which rewrites and/or optimizes a given circuit according to the topology of the quantum device considered. Still, our estimate is instructive to assess how our scheme scales better than others when including the number of implemented time-steps in the analysis, in particular if compared to the QFT-scheme (Fig. 3) which is, to the best of our knowledge, the most efficient state-of-the-art implementation of DTQW on 2^n-cycles. Results on the circuit size in the different schemes are summarized in Table 1 and shown in Fig. 5. Details on the computation are deferred to Appendix C.
Considering both the number n of position qubits and the number t of time-steps, the ID-scheme is the most resource-demanding among those examined: (i.a) If generalized CNOT gates are implemented via linear-depth quantum circuits, then the circuit depth increases as D = O(4tn^2) and the results quickly degrade due to the large number of two-qubit gates, N^(2) = O(2tn^3/3); (i.b) If generalized CNOT gates are implemented via ancilla qubits, then the circuit depth is of the same order, D = O(8tn^2), but the number of two-qubit gates is reduced by an order, N^(2) = O(10tn^2), at the cost of requiring extra qubits, N^(a) = O(n). Both ID-approaches require N^(1) = 2t. (ii) A remarkable improvement is obtained by the QFT-scheme, which, with no need of ancilla qubits, has D = O(6tn) and N^(2) = O(tn^2), at the cost of increasing the number of one-qubit gates, N^(1) = O(3tn).
Our scheme refines such metrics by making the cost of the (I)QFT independent of the number of time-steps: in the long-time limit, t ≫ n, we have D = N^(1) = N^(2) = O(tn), to which is added a fixed cost, independent of t but dependent on n, of D = O(4n), N^(2) = O(n^2), and N^(1) = O(2n). To ease the comparison among the schemes, the metrics of Table 1 are shown in Fig. 5, making it clear that our scheme outperforms the others in the number of two-qubit gates, which take the longest time to execute and are the noisiest in quantum computers. Also, we observe that the circuit depth is mainly determined by the number of two-qubit gates.
In conclusion, both the QFT-scheme and ours outperform the ID-scheme at any time.Although these metrics are comparable in the few time-steps regime, our scheme outperforms the QFT-scheme when a large number of time-steps is implemented.
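To get a feeling for the scaling quoted above, the leading-order two-qubit gate counts of the two QFT-based schemes can be tabulated. In the sketch below all prefactors are set to one, which is an assumption made purely for illustration (the actual counts are those of Table 1 and Appendix C), and the function names are hypothetical.

```python
def two_qubit_gates_qft_scheme(n, t):
    """Leading-order count O(n^2 t): one QFT and one IQFT at every time step."""
    return n ** 2 * t

def two_qubit_gates_present_scheme(n, t):
    """Leading-order count O(n^2 + n t): one QFT and one IQFT overall."""
    return n ** 2 + n * t

n = 3  # position qubits (8-cycle)
for t in (1, 10, 100, 1000):
    qft_count = two_qubit_gates_qft_scheme(n, t)
    present_count = two_qubit_gates_present_scheme(n, t)
    print(t, qft_count, present_count, round(qft_count / present_count, 2))
```

With unit prefactors the ratio of the two counts tends to n for t ≫ n, consistent with the passage from O(n^2 t) to O(nt) discussed in the text.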
Table 1. Metrics of the quantum circuit implementing t time-steps of a DTQW on the 2^n-cycle for different schemes: N^(1) and N^(2) denote the number of one- and two-qubit gates, respectively, D the depth of the circuit, and N^(a) the number of ancilla qubits. The number n refers to the number of qubits encoding the walker's position. See also Fig. 5.

Figure 5. Metrics of Table 1 as a function of the number of position qubits, n, and the number of time-steps, t, of a DTQW on the 2^n-cycle. Each column corresponds to a different scheme (Present, QFT, ID (lin.-depth), and ID (ancillae)) and each row to a different metric (number of one-qubit gates N^(1), number of two-qubit gates N^(2), circuit depth D).

Results and Discussion
We test the DTQW circuit introduced in Sec. 3.1 (see Fig. 4) on the IBM quantum computer ibm_cairo v.1.3.5, a 27-qubit Falcon r5.11 processor whose qubit connectivity map is shown in Fig. 6(a). In the following, we introduce the DTQW considered for the test and the quantities of interest, then we point out some solutions to improve the circuit, and finally we present and discuss the results.

Hadamard DTQW
A common choice for the coin operator is the Hadamard coin

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \tag{22}$$

As for the initial state, we assume the walker to be initially localized in |0_p⟩ and the coin to be in a given superposition,

$$|\psi(0)\rangle = \left[\cos(\pi/12)\,|0_c\rangle + i \sin(\pi/12)\,|1_c\rangle\right] \otimes |0_p\rangle. \tag{23}$$

The Hadamard DTQW for this initial state has the following properties: (i) The dynamics is periodic of period T_dyn = 8 (T_dyn = 24) on the 4-cycle (8-cycle) [63]; (ii) Maximally entangled single-particle states (entanglement between position and coin) are generated after one step of the walk and then recurrently with period T_MESPS = 4 (T_MESPS = 12) on the 4-cycle (8-cycle) [59]. These properties are therefore suitable for thoroughly testing the quality of the designed quantum circuit, i.e., to assess to what extent the actual implementation can reproduce these ideal features. We point out that, for a Hadamard DTQW on the 4- and 8-cycle, any initial state (6) with ϕ = π/2 will generate a dynamics with the two above-mentioned features for any value of θ [59]. We arbitrarily set θ = π/6 to initialize the coin in a non-trivial superposition of states, see Eq. (23). For later convenience, we anticipate that the periodic dynamics of the DTQW in the 4- and 8-cycle sustain the periodic occurrence of localized states when the initial state is localized in position space. However, dynamics and occurrence of localized states can have different periods. In the 4-cycle, the initial state localized in |0_p⟩ is perfectly transferred to |2_p⟩ after 4 time steps, hence localized states occur with period T_dyn/2 = 4, in contrast with the period T_dyn = 8 of the dynamics. Instead, in the 8-cycle localized states occur only as a result of the periodic dynamics (T_dyn = 24) [59]. Implementing the cycle with N = 4 and N = 8 vertices requires n = 2 and n = 3 position qubits, respectively.
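The periodicity properties listed above are easy to reproduce with an ideal (noiseless) classical simulation. The sketch below assumes the Hadamard coin, the shift convention j → (j + 2s − 1) mod N, and an initial coin state cos(θ/2)|0_c⟩ + i sin(θ/2)|1_c⟩ with θ = π/6 (i.e., ϕ = π/2); the function names are illustrative.

```python
import math

def hadamard_step(psi):
    """One step of the Hadamard walk on amplitudes psi[s][j]: coin, then conditional shift."""
    n_pos = len(psi[0])
    r = 1 / math.sqrt(2)
    out = [[0j] * n_pos for _ in range(2)]
    for j in range(n_pos):
        out[0][(j - 1) % n_pos] = r * (psi[0][j] + psi[1][j])  # coin 0: counterclockwise
        out[1][(j + 1) % n_pos] = r * (psi[0][j] - psi[1][j])  # coin 1: clockwise
    return out

def evolve(n_pos, steps, theta=math.pi / 6):
    """Walker localized at vertex 0, coin phase phi = pi/2 (assumed initial state)."""
    psi = [[0j] * n_pos for _ in range(2)]
    psi[0][0] = complex(math.cos(theta / 2))
    psi[1][0] = 1j * math.sin(theta / 2)
    for _ in range(steps):
        psi = hadamard_step(psi)
    return psi

def position_probabilities(psi):
    return [abs(psi[0][k]) ** 2 + abs(psi[1][k]) ** 2 for k in range(len(psi[0]))]

def max_difference(a, b):
    """Largest amplitude difference between two states stored as psi[s][j]."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

print(position_probabilities(evolve(4, 4)))          # 4-cycle, t = 4: transfer to vertex 2
print(max_difference(evolve(4, 8), evolve(4, 0)))    # 4-cycle, t = 8: back to the initial state
print(max_difference(evolve(8, 24), evolve(8, 0)))   # 8-cycle, t = 24: back to the initial state
```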

Figures of merit
Probability distribution. In most DTQW problems, we are interested in the probability distribution of the walker's position. Our purpose is to compare the ideal probability distribution with the experimental ones, the latter obtained in a noisy simulation and in an actual implementation on the quantum hardware ibm_cairo. To compare two discrete probability distributions, P = {p_k}_k and Q = {q_k}_k, we adopt the Hellinger fidelity H(P, Q) = [1 − h^2(P, Q)]^2, where the Hellinger distance h(P, Q) [64] between P and Q is defined by

$$h(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_k \left(\sqrt{p_k} - \sqrt{q_k}\right)^2}.$$

The Hellinger distance is symmetric, h(P, Q) = h(Q, P), and bounded, 0 ≤ h(P, Q) ≤ 1, with h = 0 meaning that the two distributions are equal (fidelity H = 1).
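For reference, the two quantities can be computed in a few lines. This sketch assumes the fidelity convention H = (1 − h^2)^2 (the same convention used, e.g., by Qiskit's `hellinger_fidelity`); the example distributions are made up for illustration and are not data from the experiment.

```python
import math

def hellinger_distance(p, q):
    """h(P, Q) = sqrt( sum_k (sqrt(p_k) - sqrt(q_k))^2 / 2 )."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

def hellinger_fidelity(p, q):
    """H(P, Q) = (1 - h^2)^2, which equals (sum_k sqrt(p_k q_k))^2."""
    return (1 - hellinger_distance(p, q) ** 2) ** 2

ideal = [0.0, 0.0, 1.0, 0.0]         # walker ideally localized at vertex 2
measured = [0.05, 0.05, 0.85, 0.05]  # hypothetical noisy outcome
print(hellinger_fidelity(ideal, measured))
```

For a perfectly localized ideal distribution the fidelity reduces to the measured probability of the occupied vertex, which is why sharply peaked distributions are demanding targets for a noisy device.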
Entanglement. Usually, in a DTQW entanglement between walker and coin occurs. This can be understood as hybrid entanglement, also referred to as single-particle entanglement because it is established between different degrees of freedom of the same quantum system, here position and coin, the latter being an internal degree of freedom, e.g., spin [65]. Bipartite entanglement can be probed by means of entanglement entropies. In this case, we probe the second-order Rényi entanglement entropy [66] of the reduced density matrix for the two parts (coin and position) via randomized measurements [67]. Estimating this quantity requires significantly fewer measurements than performing quantum state tomography: for an n-qubit state, O(2^{an}) with a < 2 (the coefficient a depends on the nature of the considered state, see Supplementary Materials for [67]) compared to 2^{2n} − 1 [68]. In this regard, the advantage of this approach becomes remarkable when a large number of qubits is involved, as is expected in future applications, due to the current costliness of tomography. The second-order Rényi entropy for a part A of the total bipartite system described by ρ_AB is defined as

$$S^{(2)}(\rho_A) = -\log_2 \mathrm{Tr}\left(\rho_A^2\right),$$

with ρ_A = Tr_B ρ_AB the reduced density matrix for part A.
If the second-order Rényi entropy of the part is greater than that of the total system, S^(2)(ρ_A) > S^(2)(ρ_AB), then bipartite entanglement exists between the two parts (for separable states S^(2)(ρ_A) ≤ S^(2)(ρ_AB) and S^(2)(ρ_B) ≤ S^(2)(ρ_AB) [66]). If, in addition, the overall state ρ_AB is pure, then the second-order Rényi entropy is directly a measure of bipartite entanglement and S^(2)(ρ_A) = S^(2)(ρ_B) (the reduced density matrices of a pure bipartite state have the same non-zero eigenvalues, from the Schmidt decomposition). The second-order Rényi entropy is maximum for the maximally mixed state, max_{ρ_A} S^(2)(ρ_A) = log_2 d_A, with d_A the dimension of ρ_A. Furthermore, S^(2)(ρ_AB) is indicative of the overall purity of the system, because it is null for pure quantum states.
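For an ideal pure state of the walk, the second-order Rényi entropy of the coin can be computed directly from the amplitudes, which gives a reference value for the experimental estimates. The sketch below assumes the definition S^(2)(ρ) = −log_2 Tr(ρ^2) and amplitudes stored as psi[s][j]; the one-step state is written out by hand for a Hadamard walk on the 4-cycle with initial coin state cos(π/12)|0_c⟩ + i sin(π/12)|1_c⟩, and all names are illustrative.

```python
import math

def coin_reduced_density_matrix(psi):
    """rho_c[s][r] = sum_j psi[s][j] * conj(psi[r][j]) for a pure state psi[s][j]."""
    n_pos = len(psi[0])
    return [[sum(psi[s][j] * psi[r][j].conjugate() for j in range(n_pos))
             for r in range(2)] for s in range(2)]

def renyi2_entropy(rho):
    """S2 = -log2 Tr(rho^2); for Hermitian rho, Tr(rho^2) = sum_ab |rho_ab|^2."""
    dim = len(rho)
    purity = sum(abs(rho[a][b]) ** 2 for a in range(dim) for b in range(dim))
    return -math.log2(purity)

a = complex(math.cos(math.pi / 12))
b = 1j * math.sin(math.pi / 12)
r = 1 / math.sqrt(2)

# t = 0: product state (coin superposition) x (walker at vertex 0)
initial = [[a, 0j, 0j, 0j], [b, 0j, 0j, 0j]]
# t = 1: Hadamard coin, then coin 0 moves to vertex 3 and coin 1 to vertex 1
after_one_step = [[0j, 0j, 0j, r * (a + b)], [0j, r * (a - b), 0j, 0j]]

entropy_initial = renyi2_entropy(coin_reduced_density_matrix(initial))
entropy_one_step = renyi2_entropy(coin_reduced_density_matrix(after_one_step))
print(entropy_initial, entropy_one_step)
```

Because a is real and b is purely imaginary, |a + b| = |a − b| = 1, so the two branches have equal weight and the coin is maximally mixed after one step, in line with the generation of a maximally entangled single-particle state.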

Circuit optimization
Each time step of the DTQW requires the coin qubit to interact with each position qubit (see controlled-R_k gates in Fig. 4). The quantum hardware may have limited connectivity, and whenever two qubits are not physically adjacent, SWAP operations are needed to make them interact. A sound circuit implementation must account for the connectivity of the quantum hardware considered. We can limit the number of SWAPs by making the coin qubit as "shared" as possible, compatibly with the typical sparse connectivity of superconducting quantum computers. The qubit topology of ibm_cairo [Fig. 6(a)] makes it possible to implement the circuit for a DTQW on the 4- and 8-cycle without SWAPs by mapping coin and position qubits as in Fig. 6(b) and (c), respectively. Given the optimal mapping compatible with the given qubit connectivity, we consider the set of qubits having the lowest error rates averaged over different calibrations. In addition, the initial state (23) is localized in position space, therefore we replace the initial QFT with a layer of Hadamard gates (Appendix B).

Analysis of the DTQW on the 4- and 8-cycle
Probability distribution (4-cycle). Performing a noisy simulation of the quantum circuit is the preliminary step before the actual implementation on the quantum hardware. Fig. 7(a) shows the Hellinger fidelities of the walker's position distribution in the 4-cycle. The Hellinger fidelity for the noisy simulation of our circuit is above 80% for all the 19 time steps implemented, while the results for the noisy simulation of the circuit in the QFT-scheme [58] degrade below 80% after a few steps. The previous analysis on the circuit size proved the advantage of our circuit in the long-time limit, but these results suggest that our circuit outperforms the QFT-scheme circuit already in the few time-steps regime. For the actual implementation on ibm_cairo we consider two levels of optimization in the transpilation: (i) optimization_level=1 (default value), which transpiles the circuit into the native gates of the hardware and performs a light optimization (blue dashed line, curve "IBM Cairo"), and (ii) optimization_level=3, which performs the heaviest optimization (red solid line, curve "IBM Cairo Opt.").
The Hellinger fidelity for the default implementation of our circuit (optimization_level=1) shows a moderate discrepancy with respect to the noisy simulation, but closely follows the trend of the latter. The heavily optimized implementation of our circuit (optimization_level=3) provides better results and also partially mitigates the local minima of the noisy simulation. We point out, however, that the circuit transpiled with the highest optimization level turns out to have a depth which is basically independent of the number of time-steps implemented (see Appendix D). This explains the long-lasting optimality of the results, H ≳ 90% up to t = 19 (last time-step implemented). Such an optimal transpiled circuit is obtained only for n = 2 position qubits (4-cycle); for n = 3 (8-cycle) we obtain a circuit whose depth increases with the number of time-steps. The Hellinger fidelity at t = 0 is lower than 1 because we still implement the whole circuit with t = 0, i.e., we do not just implement the initial state. Also, the Hellinger fidelity is characterized by periodic local minima, with period 4, which is half of the period of the dynamics. These minima occur when the walker is ideally localized in position, a probability distribution so peaked (delta) that it can hardly be obtained as the result of an actual, noisy implementation [see Fig. 7(b)]. The frame t = 8 in panel (b) shows that the walker has returned to the initial position, as expected for the periodic dynamics.

Figure 7. Hadamard DTQW on the 4-cycle with the initial state of Eq. (23). (a) Hellinger fidelity between the ideal and the experimental probability distributions of the walker's position as a function of time-step t. The experimental distributions include: noisy simulation and implementations on the actual quantum hardware with optimization_level=1 (IBM Cairo) and optimization_level=3 (IBM Cairo Opt.) in transpilation. Results for the noisy simulation of the DTQW circuit in the QFT-scheme [58] are reported for comparison. (b) Ideal and experimental (IBM Cairo Opt.) probability distributions of the walker's position for time-steps t = 0, ..., 8. Results for both simulations and quantum hardware implementation are obtained for 10^5 shots and by encoding the position state in qubits number 3 and 8 and the coin state in qubit 5 of ibm_cairo, see Fig. 6(a),(b).
Probability distribution (8-cycle). We obtain qualitatively analogous results also for the DTQW on the 8-cycle (Fig. 8). However, as anticipated in the previous paragraph, in this case the optimization in the transpilation is not as effective as for the 4-cycle. The Hellinger fidelities with optimization_level=1 and 3 are thus consistent with each other. Unlike the DTQW on the 4-cycle, given an initial localized state, localized states at later times occur only as a result of the periodic dynamics (T_dyn = 24), hence the minimum of the Hellinger fidelity at t = 24.
To summarize, the actual multiqubit state implemented on the quantum hardware and then processed by the quantum circuit is generally mixed and characterized by residual entanglement, but the local minima and maxima of S^(2)(ρ_c) and S^(2)(ρ_p) are perfectly consistent with the expected periodic separable and maximally entangled single-particle states, respectively, up to t = 9. Results suggest that the observed local maxima will preserve the expected periodicity for further steps t > 9, but eventually S^(2)(ρ_cp) ≥ S^(2)(ρ_α) due to the degradation of the purity of the total state. Entanglement is a distinct quantum signature and, in this example, we have clear evidence of its generation up to t = 9. This does not necessarily imply that quantum features, including entanglement, in the realization of the DTQW are lost thereafter. Indeed, for the remaining time-steps investigated, 9 < t ≤ 14, we still observe S^(2)(ρ_p) ≳ S^(2)(ρ_cp), i.e., presence of entanglement, whenever states with partial or maximum entanglement are expected. Therefore, quantum features are present up to the last time-step considered, t = 14.

Conclusions
We proposed an efficient quantum circuit for the DTQW on the 2^n-cycle. Our scheme, using only one QFT and one IQFT, significantly improves the most efficient state-of-the-art implementation [58] (QFT-scheme in the following), which uses one QFT and one IQFT at each time step. As a result, our circuit requires only O(n^2 + nt) two-qubit gates, compared to the O(n^2 t) of the QFT-scheme. The improvement in this gate count is even more significant at long times, passing, for t ≫ n, from O(n^2 t) in the QFT-scheme to O(nt) in ours. Since two-qubit gates take the longest time to execute and are the noisiest, our quantum circuit is computationally less demanding and also paves the way for reliable use on noisy intermediate-scale quantum devices [69-71].
In this regard, we tested the proposed quantum circuit on actual quantum hardware, ibm_cairo, considering a Hadamard DTQW on the 4- and 8-cycle. Both are characterized by periodic dynamics [63] and recurrent generation of maximally entangled single-particle states [59]. We claim two main results. First, even in the short-time regime, the present quantum circuit outperforms the current state-of-the-art DTQW circuits, whose results are degraded after only a few steps [56,58]. Despite a moderate discrepancy, our implementation on actual quantum hardware provides results that closely follow those from the noisy simulation. In particular, the Hellinger fidelity between the ideal probability distribution of the walker's position and the experimental one is above 90% for all the t = 19 time-steps we implemented in the 4-cycle and above 80% up to t = 13 time-steps in the 8-cycle. Second, for the DTQW on the 4-cycle, we provide experimental evidence of the recurrent generation of nearly maximally entangled single-particle states up to t = 9 time-steps. The expected maximum entanglement is not achieved because the ideally pure state of the bipartite system is actually implemented on the quantum hardware as a mixed multiqubit state and its purity degrades over time.
The implementation of our circuit on actual quantum computers may benefit from the following. The circuit strongly relies on controlled-R_k gates (phase-shift gates), which can be efficiently implemented using a single ancillary qubit [72]. Moreover, as the position space increases, the sparse connectivity of a superconducting quantum computer results in large experimental overheads of SWAP gates, which become unavoidable, e.g., to make the coin qubit interact with each position qubit. In this regard, a virtual two-qubit gate can be employed to suppress errors due to the additional SWAP gates [73]. Alternatively, an implementation on a quantum hardware architecture with full connectivity [74-76] may be more advantageous.
Possible applications of our scheme include the circuital implementation of direct communication protocols [35,36] and quantum key distribution protocols [38] based on DTQW on the cycle.We point out that the proposed circuit does not impose constraints on the coin operator (one-qubit gate), which in principle can be changed at each time step.Parrondo's paradox arises when losing strategies are combined to obtain a winning one and it cuts across various research areas.This counterintuitive phenomenon can be observed in DTQW on the line or cycle when two or more coin operators are applied in a deterministic sequence [77,78].Therefore, we expect that our quantum circuit may also be of interest to quantum game theory [79].
Circuit implementations of DTQW on a cycle of arbitrary N have been addressed in [51,58], while our proposal is limited to DTQW on the N-cycle with N = 2^n. We point out, however, that any circulant matrix is diagonalized by the QFT and this has been already exploited to efficiently implement CTQWs on circulant graphs [80]. Similarly, our implementation for DTQWs can be generalized to circulant graphs (graphs whose adjacency matrix is circulant), which, being d-regular (each vertex has degree d), will require a d-dimensional coin [81,82]. A generalization of our approach to more complex structures is therefore desirable and potentially of larger interest, e.g., for algorithmic purposes.
In Fig. A1 we show the gate count (number of one- and two-qubit gates) and depth of the quantum circuit implementing t time-steps of the DTQW on the 4- and 8-cycle, panels (a) and (b) respectively, for optimization_level=1,3 in transpilation. Consistent with Table 1, the general trend is linear in time for the DTQWs on both the 4- and 8-cycle with light optimization (optimization_level=1), but only for the DTQW on the 8-cycle with the heaviest optimization (optimization_level=3). In the particular case of the 4-cycle, for the latter optimization level the counts are essentially independent of t [Fig. A1(a)].
The transpiled circuit implementing one step of the Hadamard DTQW is shown in Fig. A2 for the 4-cycle with optimization_level=1,3 and in Fig. A3 for the 8-cycle with optimization_level=1. We recall that the initial state is given in Eq. (23), the initial QFT is implemented via Hadamard gates (Appendix B), and the Hadamard gate, which is not a native gate in this quantum processor, is transpiled into H = R_z(π/2) √X R_z(π/2) [see Eq. (22) and Eq. (A23)].
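The decomposition of the Hadamard gate into native gates can be verified with a few lines of arithmetic. The sketch below assumes the common conventions R_z(θ) = diag(e^{−iθ/2}, e^{iθ/2}) and √X = ½[[1+i, 1−i], [1−i, 1+i]]; the paper's Eq. (22) and Eq. (A23) are not reproduced in this section, so these matrices and the helper names are assumptions. Under these conventions the product equals H only up to a global phase, which is physically irrelevant.

```python
import cmath
import math


def rz(theta):
    """Z rotation, assumed convention diag(exp(-i*theta/2), exp(+i*theta/2))."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


# Square root of X (assumed convention).
SQRT_X = [[(1 + 1j) / 2, (1 - 1j) / 2],
          [(1 - 1j) / 2, (1 + 1j) / 2]]

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]


def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def equal_up_to_phase(A, B, tol=1e-12):
    """True if A = exp(i*phi) * B for some real phi."""
    # Take the phase from the first entry of B that is not (numerically) zero.
    for i in range(2):
        for j in range(2):
            if abs(B[i][j]) > tol:
                phase = A[i][j] / B[i][j]
                if abs(abs(phase) - 1) > tol:
                    return False
                return all(abs(A[r][c] - phase * B[r][c]) < tol
                           for r in range(2) for c in range(2))
    return False


product = matmul2(rz(math.pi / 2), matmul2(SQRT_X, rz(math.pi / 2)))
print(equal_up_to_phase(product, HADAMARD))
```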

Figure 1. Schematic representation of a DTQW on the N-cycle. (a) The coin state (internal degree of freedom) is responsible for making the walker move in the cycle clockwise if |1_c⟩ and counterclockwise if |0_c⟩. (b) States and operators of a DTQW. The vertices of the cycle (light violet), i.e., the walker's position states, are labeled by |j_p⟩ with j = 0, 1, ..., N − 1, and each vertex comprises two sub-vertices, i.e., the coin states, labeled by |0_c⟩ (orange) and |1_c⟩ (green). Each step of the walk, Eq. (2), involves the action of a local coin operator C_j responsible for mixing the coin states of each vertex (we assume C_j = C ∀j), followed by the action of the conditional-shift operators |0_c⟩⟨0_c| ⊗ P_0 (decrement) and |1_c⟩⟨1_c| ⊗ P_1 (increment) responsible for shifting the position states, see Eq. (3) [43].
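The step described in this caption (coin operator followed by the conditional shift) can be sketched as a direct state-vector update. The snippet below is an illustration only: it assumes a Hadamard coin, the shift convention stated above (|0_c⟩ decrements, |1_c⟩ increments the position), and a hypothetical initial state |0_c⟩ ⊗ |0_p⟩, which is not necessarily the state of Eq. (23). Function names are made up for this example.

```python
import math


def hadamard_step(psi):
    """One step U = S (C x I) on the N-cycle; psi[c][j] is the amplitude of |c_c>|j_p>."""
    n_pos = len(psi[0])
    s = 1 / math.sqrt(2)
    # Hadamard coin acting on the coin index at every position.
    coin0 = [s * (psi[0][j] + psi[1][j]) for j in range(n_pos)]
    coin1 = [s * (psi[0][j] - psi[1][j]) for j in range(n_pos)]
    # Conditional shift: coin 0 moves j -> j-1, coin 1 moves j -> j+1 (mod N).
    shifted0 = [coin0[(j + 1) % n_pos] for j in range(n_pos)]
    shifted1 = [coin1[(j - 1) % n_pos] for j in range(n_pos)]
    return [shifted0, shifted1]


def evolve(psi, steps):
    """Apply the single-step operator `steps` times."""
    for _ in range(steps):
        psi = hadamard_step(psi)
    return psi


def position_probabilities(psi):
    """Marginal probability of each position, summed over the coin."""
    return [abs(psi[0][j]) ** 2 + abs(psi[1][j]) ** 2 for j in range(len(psi[0]))]


N = 4
psi0 = [[1.0] + [0.0] * (N - 1), [0.0] * N]  # assumed initial state |0_c>|0_p>
for t in range(9):
    print(t, [round(p, 3) for p in position_probabilities(evolve(psi0, t))])
```

On the 4-cycle the Hadamard walk is periodic with period 8, so the last printed distribution coincides with the first.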
The operators P_0 (decrement) and P_1 (increment) are responsible for making the walker move one step counterclockwise and clockwise, respectively [see Fig. 1 and Fig. 2(b)-(c)]. In the computational basis the conditional shift operator (3) has matrix representation

Figure 2. (a) Quantum circuit implementing one time-step of the DTQW on the 2^n-cycle based on controlled-increment (I) and controlled-decrement (D) gates [48]. (b) Increment and (c) decrement gates consist of generalized CNOT gates, with controls being |1⟩ (solid circle) and |0⟩ (empty circle), respectively. These gates act on the walker's position quantum register, conditional on the coin's qubit state [see panel (a)]. (d) The ID-quantum circuit shown in panel (a) can be conveniently redesigned in terms of one increment gate (not controlled by the coin qubit) and CNOT gates only, since Decr. = (⊗_{k=1}^{n} X_k) Incr. (⊗_{k=1}^{n} X_k) [58]. Quantum circuits in panels (a,d) implement the conditional shift operator S = ∑_{j=0}^{N−1} (|0_c⟩⟨0_c| ⊗ |(j + 1 mod N)_p⟩⟨j_p| + |1_c⟩⟨1_c| ⊗ |(j − 1 mod N)_p⟩⟨j_p|), which has the opposite convention to Eq. (3) used in the present work.
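The identity relating the decrement and increment gates in panel (d) can be checked at the level of basis-state labels: a layer of X gates on all n position qubits maps j to its bitwise complement, and the increment gate maps j to j + 1 mod 2^n. The short sketch below (helper names are illustrative) verifies that flipping, incrementing, and flipping again yields a decrement.

```python
def increment(j, n):
    """Action of the increment gate on basis label j of an n-qubit register."""
    return (j + 1) % (1 << n)


def flip_all(j, n):
    """Action of X on every qubit: bitwise complement of j within n bits."""
    return j ^ ((1 << n) - 1)


def decrement_via_increment(j, n):
    """X-layer, then increment, then X-layer."""
    return flip_all(increment(flip_all(j, n), n), n)


n = 3
print([decrement_via_increment(j, n) for j in range(1 << n)])
```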

Figure 4. Quantum circuit implementing one time-step of the DTQW on the 2^n-cycle proposed in the present work. The quantum Fourier transform, F, and its inverse, F†, do not include the SWAP gates. To implement t time steps of the DTQW we have to concatenate the above circuit t times. In doing so, since the QFT is unitary, F†F = I, we are left with only one QFT at the beginning, the central block (shaded) repeated t times, and one IQFT at the end. This simplification cannot occur in the quantum circuit in Fig. 3 due to the CNOT and the coin gates. For an initially localized walker, |ψ_0⟩ = |ϕ_c⟩ ⊗ |0_p⟩, the initial QFT is conveniently replaced by a layer of Hadamard gates (see Appendix B).
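The replacement of the initial QFT by a layer of Hadamard gates for a walker localized at |0_p⟩ can be illustrated by comparing the two transforms column by column. The sketch below assumes the standard QFT amplitudes exp(2πi jk/N)/√N (the sign convention is an assumption and does not affect the k = 0 column); function names are illustrative. Because the resulting state is uniform, omitting the SWAP gates makes no difference for this input.

```python
import cmath
import math


def qft_column(j0, n):
    """QFT applied to the basis state |j0> of n qubits (assumed sign convention)."""
    N = 1 << n
    return [cmath.exp(2j * cmath.pi * j * j0 / N) / math.sqrt(N) for j in range(N)]


def hadamard_layer_column(j0, n):
    """Hadamard on every qubit applied to |j0>: amplitudes (-1)^(j . j0) / sqrt(N)."""
    N = 1 << n
    return [(-1) ** bin(j & j0).count("1") / math.sqrt(N) for j in range(N)]


def same_vector(u, v, tol=1e-12):
    """Element-wise comparison of two amplitude lists."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


n = 3
print(same_vector(qft_column(0, n), hadamard_layer_column(0, n)))  # localized at |0>
print(same_vector(qft_column(1, n), hadamard_layer_column(1, n)))  # generic basis state
```

The two transforms agree on |0_p⟩ but not on a generic basis state, which is why the shortcut applies only to this initial condition.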

Figure 5. Metrics in Table 1 as a function of the number of position qubits, n, and the number of time-steps, t, of a DTQW on the 2^n-cycle. Each column corresponds to a different scheme (Present, QFT, ID (lin.-depth), and ID (ancillae)) and each row to a different metric (number of one-qubit gates N^(1), number of two-qubit gates N^(2), and circuit depth D).

Figure 6. (a) Qubit connectivity map of ibm_cairo. (b,c) Optimal mapping of the coin-position state onto the multiqubit state |q^c_n q^p_{n−1} ... q^p_0⟩ for the cycle with (b) N = 4 and (c) N = 8 vertices (n = 2, 3 position qubits, respectively). No SWAP operations between position and coin qubits are required by the controlled-R_k gates in Fig. 4, the coin qubit (orange) being already adjacent to all position qubits (blue).

Figure 7. Hadamard DTQW on the (N = 4)-cycle with initial state as in Eq. (23). (a) Hellinger fidelity between the ideal and the experimental probability distributions of the walker's position as a function of time-step t. The experimental distributions include: noisy simulation and implementations on the actual quantum hardware with optimization_level=1 (IBM Cairo) and optimization_level=3 (IBM Cairo Opt.) in transpilation. Results for the noisy simulation of the DTQW circuit in the QFT scheme [58] are reported for comparison. (b) Ideal and experimental (IBM Cairo Opt.) probability distributions of the walker's position for time-steps t = 0, ..., 8. Results for both simulations and quantum hardware implementation are obtained for 10^5 shots and by encoding the position state in qubits number 3 and 8 and the coin state in qubit 5 of ibm_cairo, see Fig. 6(a),(b).
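For readers who want to reproduce the figure of merit in panel (a), the following is a minimal sketch of a Hellinger fidelity between two sets of measurement counts. It assumes the definition F = (1 − H²)² with H the Hellinger distance, which equals (∑_i √(p_i q_i))²; the paper's exact definition is not reproduced in this section, and the function name and the example counts are illustrative, not experimental data.

```python
import math


def hellinger_fidelity(counts_p, counts_q):
    """Hellinger fidelity (sum_i sqrt(p_i * q_i))^2 between two count dictionaries.

    Counts are normalized to probabilities; outcomes missing from one
    dictionary are treated as having zero probability there.
    """
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    outcomes = set(counts_p) | set(counts_q)
    bhattacharyya = sum(
        math.sqrt((counts_p.get(k, 0) / total_p) * (counts_q.get(k, 0) / total_q))
        for k in outcomes
    )
    return bhattacharyya ** 2


# Made-up counts over the four positions of a 4-cycle, for illustration only.
ideal = {"00": 500, "10": 500}
measured = {"00": 480, "01": 15, "10": 490, "11": 15}
print(hellinger_fidelity(ideal, measured))
```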

Figure 8. Same as in Fig. 7 but for N = 8. Results for both simulations and quantum hardware implementation are obtained for 10^5 shots and by encoding the position state in qubits number 10, 13, and 15 and the coin state in qubit 12 of ibm_cairo, see Fig. 6(a),(c).

Figure 9. Recurrent generation of maximally entangled single-particle states for a Hadamard DTQW on the (N = 4)-cycle with initial state as in Eq. (23), investigated by means of the second-order Rényi entropy as a function of time-steps t. Bipartite entanglement between the two parts (coin and position degrees of freedom) exists if the second-order Rényi entropy of a part is larger than that of the total system. Results of entropies are obtained via 300 randomized measurements [67] and 10^5 shots for each step of the DTQW, with optimization_level=1 in transpilation. Position state is encoded in qubits number 3 and 8 and the coin state in qubit 5 of ibm_cairo, see Fig. 6(a),(b).
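The entanglement criterion in this caption can be illustrated on an ideal pure state, for which the second-order Rényi entropy of the total system is zero, so any positive entropy of the coin signals coin-position entanglement. The sketch below computes S_2 = −log_2 Tr(ρ_c²) exactly from a state vector; it is not the randomized-measurement protocol of [67] used in the experiment, and the function name and example states are assumptions for illustration.

```python
import math


def renyi2_coin(psi):
    """Second-order Renyi entropy (in bits) of the coin, for a pure state psi[c][j]."""
    n_pos = len(psi[0])
    # Reduced density matrix of the coin: rho[a][b] = sum_j psi[a][j] * conj(psi[b][j]).
    rho = [[sum(psi[a][j] * psi[b][j].conjugate() for j in range(n_pos))
            for b in range(2)] for a in range(2)]
    # For a Hermitian matrix, Tr(rho^2) is the sum of |rho[a][b]|^2.
    purity = sum(abs(rho[a][b]) ** 2 for a in range(2) for b in range(2))
    return -math.log2(purity)


s = 1 / math.sqrt(2)
product_state = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]      # |0_c>|0_p>
entangled_state = [[s, 0.0, 0.0, 0.0], [0.0, s, 0.0, 0.0]]        # (|0_c>|0_p> + |1_c>|1_p>)/sqrt(2)
print(renyi2_coin(product_state), renyi2_coin(entangled_state))
```

Since the coin is a single qubit, its entropy is at most 1 bit, and that maximum corresponds to a maximally entangled single-particle state.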

Figure A2. Quantum circuit proposed in the present work (Fig. 4) for one step of the Hadamard DTQW (Sec. 4.1) on the 4-cycle transpiled in ibm_cairo with (a) optimization_level=1 and (b) optimization_level=3. Each qubit is labelled by the corresponding index in the qubit connectivity map in Fig. 6(a).
These schemes differ in how the generalized CNOT gates in the increment gate are realized [see Fig. *