Coherence and Entanglement Dynamics in Training Variational Quantum Perceptron

In quantum computation, what contributes to the supremacy of quantum computation? One candidate is quantum coherence, because it is a resource used in various quantum algorithms. We show that quantum coherence contributes to the training of the variational quantum perceptron proposed by Y. Du et al., arXiv:1809.06056 (2018). Specifically, we show that in the first part of the training of the variational quantum perceptron, the quantum coherence of the total system is concentrated in the index register, and in the second part, the Grover algorithm consumes the quantum coherence in the index register. This implies that both the distribution and the depletion of quantum coherence are required in the training of the variational quantum perceptron. In addition, we investigate the behavior of entanglement during the training of the variational quantum perceptron. We show that the bipartite concurrence between the feature and index registers decreases because the Grover operation is performed only on the index register. We also show that the concurrence between the two qubits of the index register increases as the variational quantum perceptron is trained.


Introduction
Quantum physics is known to enable protocols and algorithms that outperform their classical counterparts. Well-known examples include quantum superdense coding, quantum teleportation, and quantum key distribution [1]. When quantum physics is applied to computation, quantum algorithms can demonstrate remarkable improvements. In 1994, a factoring algorithm was proposed that can factorize an integer in polynomial time [2]. Furthermore, a quantum search algorithm was provided, which can efficiently find a target in a large database [3][4][5][6][7]. For the implementation of quantum computation, many approaches have been proposed, such as linear optics [8,9], trapped ions [10], quantum dots [11], and superconductors [12]. Recently, artificial intelligence based on quantum algorithms has been studied [13].
In quantum computation, what contributes to the supremacy of quantum computation? One candidate is entanglement [14,15]. D. Gottesman and I. L. Chuang suggested a universal quantum computer using quantum teleportation [16]. Because entanglement is a resource of quantum teleportation, Reference [16] indicates that entanglement can contribute to a certain type of quantum computer.
Our results show how coherence and entanglement may influence the performance of the quantum computer and the quantum perceptron. They indicate that various properties of a quantum system can affect this performance. Further, our work can shed light on understanding quantum supremacy.
The remainder of this paper is organized as follows. In Section 2, we briefly explain the mathematical tools for coherence distribution. In Section 3, we present the behavior of coherence in the training of the variational quantum perceptron (VQP). Specifically, we explain the distribution of coherence in the multi-layer parametric quantum circuit (MPQC), and we investigate how coherence depletion occurs in the Grover algorithm. In Section 4, we discuss the coherence distribution and coherence depletion in two examples (Sections 4.1 and 4.2). We also investigate the behavior of entanglement by evaluating the concurrence in the two examples. In Section 5, we discuss and conclude our results.

Mathematical Tools for Coherence Distribution
In this section, we briefly describe the measures for evaluating coherence. The starting point is that a state without superposition [54] can be expressed as a diagonal operator whose entries form a classical probability distribution [1]. Such a state is called an incoherent state, and we denote the set of incoherent states by $I$. A quantum operation that maps $I$ into $I$ is called an incoherent operation. If the operation also preserves the trace, it is an incoherent completely positive and trace-preserving map (ITPCPM). The conditions that a coherence measure $C(\cdot)$ should satisfy are as follows:
(C1) $C(\rho) \geq 0$, and the necessary and sufficient condition for $C(\rho) = 0$ is $\rho \in I$.
(C2) Coherence does not increase under an ITPCPM $\Phi$, that is, $C(\rho) \geq C(\Phi(\rho))$.
(C3) State mixing does not increase coherence, which implies that the convexity $\sum_i p_i C(\rho_i) \geq C(\sum_i p_i \rho_i)$ holds.
One can see that the $l_1$-norm coherence $C_{l_1}$ and the relative entropy of coherence $C_{rel}$ satisfy the conditions above [33]. They are defined as
$$C_{l_1}(\rho) = \min_{\sigma \in I} \|\rho - \sigma\|_{l_1} = \sum_{i \neq j} |\rho_{ij}|, \qquad C_{rel}(\rho) = \min_{\sigma \in I} S(\rho \| \sigma) = S(\rho_{diag}) - S(\rho).$$
Here, $\|M\|_{l_1} = \sum_{i,j} |M_{ij}|$ is the $l_1$-norm, $S(\cdot \| \cdot)$ is the relative von Neumann entropy, $S(\cdot)$ is the von Neumann entropy, and $\rho_{diag}$ is the diagonal part of $\rho$ in the incoherent basis. Because the $l_1$-norm coherence and the relative entropy of coherence satisfy (C1), the minimum of these coherence measures is zero. Meanwhile, for a $d$-dimensional maximally coherent state $\rho_d$, the $l_1$-norm coherence (relative entropy of coherence) attains its maximum $d - 1$ ($\log_2 d$).
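The two measures can be evaluated numerically. The following is a minimal sketch, assuming NumPy and a density matrix written in the incoherent (computational) basis; the function names are illustrative and do not come from the original work. It checks the stated maxima $d - 1$ and $\log_2 d$ on the maximally coherent state.

```python
import numpy as np

def l1_coherence(rho):
    """l1-norm coherence: sum of the absolute values of the off-diagonal elements."""
    rho = np.asarray(rho, dtype=complex)
    return float(np.sum(np.abs(rho)) - np.sum(np.abs(np.diag(rho))))

def von_neumann_entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    evals = np.linalg.eigvalsh(np.asarray(rho, dtype=complex))
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def relative_entropy_coherence(rho):
    """Relative entropy of coherence: S(rho_diag) - S(rho)."""
    rho = np.asarray(rho, dtype=complex)
    rho_diag = np.diag(np.diag(rho))  # dephased state in the incoherent basis
    return von_neumann_entropy(rho_diag) - von_neumann_entropy(rho)

def maximally_coherent_state(d):
    """Density matrix of the uniform superposition of d basis states."""
    psi = np.ones(d, dtype=complex) / np.sqrt(d)
    return np.outer(psi, psi.conj())

if __name__ == "__main__":
    for d in (2, 4, 8):
        rho = maximally_coherent_state(d)
        print(d, l1_coherence(rho), relative_entropy_coherence(rho))
```

For $d = 2, 4, 8$ the two values should agree with $d - 1$ and $\log_2 d$ up to rounding, while any diagonal density matrix gives zero for both measures.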
Unlike quantum correlations, coherence can be defined for a single system. The coherence of a multipartite system can also be defined. For a bipartite system AB, the coherence $C(\rho_{AB})$ of $\rho_{AB}$ is expressed as [48]
$$C(\rho_{AB}) = C(\rho_A) + C^{acc}_A + C(\rho_B) + C^{acc}_B + C^{r}_{AB}.$$
Here, $\rho_A = \mathrm{Tr}_B \rho_{AB}$ and $\rho_B = \mathrm{Tr}_A \rho_{AB}$. $C^{acc}_A$ ($C^{acc}_B$) is the accessible coherence of system A (system B). $C^{acc}_A$ is defined in terms of the projectors $\Pi_i = |i\rangle\langle i|$ onto an orthonormal basis of the Hilbert space of system B [48], and $C^{acc}_B$ is defined similarly. The local coherence of system A (B) is therefore given as $C(\rho_A) + C^{acc}_A$ ($C(\rho_B) + C^{acc}_B$), and $C^{r}_{AB}$ is the remaining coherence, which is not localized on system A or system B. If the $l_1$-norm coherence or the relative entropy of coherence is used as the coherence measure, then the remaining coherence is non-negative [48].

Coherence Processing in VQP Training
The VQP is a quantum algorithm for a perceptron [55]. A classical perceptron works as follows. Suppose that a training dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ is given. Here, $x_i \in R^M$ is a data vector and $y_i \in \{+1, -1\}$ is the label corresponding to the data vector $x_i$. The purpose of the perceptron is to find the hyperplane $W \in R^M$ that minimizes $-\sum_{i=1}^{N} \mathrm{sign}(y_i W \cdot x_i)$. When the hyperplane $W$ is correctly found, the two half-spaces contain only data vectors with correct labels. If a mislabeled data vector $x_k$ lies in a half-space, we must find $x_k$ among all data vectors in that half-space. When the Grover algorithm is used for this search, the mislabeled data vector can be found with a query complexity of $O(\sqrt{N})$.
The VQP proposed in Reference [47] is shown in Figure 1. It comprises a $\log_2 M$-qubit feature register ($R_F$) and a $\log_2 N$-qubit index register ($R_I$). The initial state of the feature register and the index register is $|0\rangle^{\otimes (\log_2 N + \log_2 M)}$. $U_{data}$ encodes the values of the dataset into the initial state. Suppose that the label of the $k$-th data vector is incorrect and that $U_{data}$ provides the quantum state $|\Phi_k\rangle = U_{data} |0\rangle^{\otimes (\log_2 N + \log_2 M)}$. The quantum computer then applies the multi-layer parametric quantum circuit (MPQC) of Figure 1b $P$ times. In Figure 1b, $U_{L_1}(\theta)$ and $U_{L_2}(\phi)$ are parametrized layers with parameter vectors $\theta$ and $\phi$. In $V_1(\theta, \phi)$, $U_{L_1}(\theta)$ and the multiqubit controlled-Z gate change the phase of the mislabeled data, $U_{L_2}(\phi)$ eliminates the entanglement between the feature register and the index register, and $G_1$ performs the Grover algorithm on the index register.
To perform an identical calculation, entanglement between the feature register and the index register should be created. The quantum circuit $U(\theta_i)$ is shown in Figure 1c. Here, $R_X$, $R_Y$, and $R_Z$ are single-qubit rotation gates about the x, y, and z axes, respectively.
After the circuit has been applied $P$ times, each qubit of the index register is measured using the projective measurement $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}^{\otimes \log_2 N}$. Suppose that the result of the measurement is $q(\theta, \phi) = (q(0; \theta, \phi), \cdots, q(N-1; \theta, \phi))$. Here, $q(i; \theta, \phi)$ is the probability that the result of the index register is $i$. The purpose of the VQP is to train the quantum circuit such that the measurement probability distribution $q$ becomes close to the target probability distribution $p = (p(0), \cdots, p(N-1))$. Here, $p(i) = 0$ when $i \neq k$, and $p(k) = 1$. The training of the quantum circuit is carried out by a classical algorithm. The measurement probability distribution, the target probability distribution, and the (untrained) circuit parameters determine the maximum mean discrepancy (MMD) loss:
$$L(\theta, \phi) = \sum_{x,y} q(x; \theta, \phi) q(y; \theta, \phi) K(x, y) - 2 \sum_{x,y} q(x; \theta, \phi) p(y) K(x, y) + \sum_{x,y} p(x) p(y) K(x, y).$$
Here, $K(x, y) = \exp(-|x - y|^2 / 2\sigma_i)$ is a Gaussian kernel [56], and $\sigma_i$ is the bandwidth. The quantum circuit is trained so that the MMD loss decreases. Assuming that the learning rate is $r$, the parameters are updated as [57]
$$(\theta, \phi) \leftarrow (\theta, \phi) - r \nabla_{\theta, \phi} L(\theta, \phi).$$
Here, $\nabla_{\theta, \phi}$ is the gradient with respect to $\theta$ and $\phi$.
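The classical part of this training loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function `toy_distribution` is a hypothetical softmax stand-in for the circuit output $q(\theta, \phi)$, the gradient is taken by finite differences rather than from the circuit, and the bandwidth and learning rate are arbitrary choices. With a single Gaussian kernel, the MMD loss reduces to the quadratic form $(q - p)^T K (q - p)$.

```python
import numpy as np

def gaussian_kernel_matrix(n, sigma):
    """K[x, y] = exp(-|x - y|^2 / (2 * sigma)) on the outcomes 0..n-1."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma))

def mmd_loss(q, p, sigma=1.0):
    """MMD loss between distributions q and p, written as (q - p)^T K (q - p)."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    d = q - p
    return float(d @ gaussian_kernel_matrix(len(q), sigma) @ d)

def toy_distribution(theta):
    """Hypothetical stand-in for the circuit output q(theta): a softmax."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at theta."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad

p = np.array([0.0, 0.0, 0.0, 1.0])  # target distribution with k = 3
theta = np.zeros(4)                  # assumed initial parameters
r = 0.2                              # assumed learning rate

def loss(t):
    return mmd_loss(toy_distribution(t), p)

losses = [loss(theta)]
for _ in range(300):
    theta = theta - r * numerical_gradient(loss, theta)  # gradient-descent update
    losses.append(loss(theta))

if __name__ == "__main__":
    print(losses[0], losses[-1], toy_distribution(theta))
```

In this toy setting the loss should decrease as the probability of outcome $k$ grows; on a real device the gradient would instead be estimated from circuit evaluations.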
The quantum algorithm comprises two processes. First, the MPQC should concentrate the information encoded in the quantum state into the index register. Second, the Grover algorithm should effectively extract the information concentrated in the index register. One can expect that the former is related to coherence distribution and the latter to coherence depletion.

Coherence Distribution Process
In the VQP, all information of the dataset $D$ is encoded into the feature and index registers. We expect that, for the Grover algorithm to find the mislabeled data, the MPQC should concentrate coherence into the index register. Therefore, we should train the VQP such that, after the MPQC is applied, the coherence distribution behaves as in the case of successful training shown in Figure 2a. One should note that not all coherence of the index register facilitates the Grover algorithm. The index register's local coherence comprises the coherence of the partial state and the accessible coherence, and it appears to be directly related to VQP training. Two findings are obtained from our results. First, when the VQP is correctly trained, the accessible coherence of the index register and of the feature register disappears. Second, the coherence of the index register state increases up to a specific value. Therefore, one can conclude that the accessible coherence hinders VQP training, whereas the coherence of the index register state facilitates it.

Coherence Depletion Process
As explained previously, the MPQC in the VQP strengthens the coherence of the index register state. As shown in Figure 2b, when the VQP is correctly trained, the Grover algorithm consumes the coherence of the index register state. Further, we will demonstrate that even when accessible coherence arises while the Grover algorithm is being performed, the Grover algorithm consumes it. However, as shown in Figure 2b, if the VQP is not correctly trained, the index register's local coherence increases, which degrades the performance of the Grover algorithm. This is because coherence depletion should occur when the Grover algorithm is correctly performed [32].
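Coherence depletion by the Grover algorithm can be illustrated in an idealized setting. The sketch below assumes a pure 4-dimensional index register prepared in the uniform superposition, a textbook phase oracle for the marked index, and NumPy; it is not the trained VQP circuit, and the function names are illustrative. For $N = 4$, one Grover iteration maps the uniform superposition onto the marked basis state, so the $l_1$-norm coherence drops from its maximum 3 to 0.

```python
import numpy as np

def l1_coherence(rho):
    """l1-norm coherence: sum of the absolute values of the off-diagonal elements."""
    rho = np.asarray(rho, dtype=complex)
    return float(np.sum(np.abs(rho)) - np.sum(np.abs(np.diag(rho))))

def grover_operator(n_items, marked):
    """Textbook Grover iteration: phase oracle followed by the diffusion operator."""
    s = np.ones(n_items) / np.sqrt(n_items)             # uniform superposition
    oracle = np.eye(n_items)
    oracle[marked, marked] = -1.0                       # flip the phase of the marked item
    diffusion = 2.0 * np.outer(s, s) - np.eye(n_items)  # inversion about the mean
    return diffusion @ oracle

N, k = 4, 3                          # index register dimension and marked index
psi_before = np.ones(N) / np.sqrt(N)
psi_after = grover_operator(N, k) @ psi_before

coh_before = l1_coherence(np.outer(psi_before, psi_before.conj()))
coh_after = l1_coherence(np.outer(psi_after, psi_after.conj()))
success = float(abs(psi_after[k]) ** 2)

if __name__ == "__main__":
    print(coh_before, coh_after, success)
```

The same functions can be applied to other register sizes, where a single iteration generally leaves some residual coherence.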

Simulation Examples of Training
In this section, we analyze the relationship between VQP training and coherence, by using the examples in Reference [47].

Example 1
We consider an example where the dataset is $D = \{(1, 0, 1), (1, 0, 1), (1, 0, 1), (0, 1, -1)\}$ [47], where each triple lists the two components of a data vector followed by its label. We assume that the mislabeled data is $x_4 = (0, 1)$. The VQP for this problem is described in Figure 3a. In Figure 3a, $U_{data}$ is composed of Hadamard gates and a controlled-controlled-X gate. Because $M = 2$ and $N = 4$ in this example, the MPQC comprises the single-qubit rotation gates $R_X$, $R_Y$, and $R_Z$. The disentanglement gate can be constructed from the controlled-controlled-X gate. If training is performed correctly, the measurement probability $q(\theta)$ converges to the target probability $p = (0, 0, 0, 1)$.
In the process of training the VQP of Figure 3a, the coherence distribution and the coherence depletion are displayed in Figure 3b-f. Figure 3c,e show the $l_1$-norm coherence, and Figure 3d,f show the relative entropy of coherence. Since the index register (feature register) is a 4-dimensional (2-dimensional) quantum system, the maximum of the $l_1$-norm coherence is 3 (1), and the maximum of the relative entropy of coherence is 2 (1). As shown in Figure 3, the local coherence of the index register (solid purple line) is maximal regardless of the iteration. Meanwhile, the accessible coherence of the index register (dashed blue line) converges to zero as the number of iterations increases. In addition, the coherence of the index register state (solid blue line) converges to the local coherence. This implies that the coherence of the index register state contributes to the VQP training. As shown in Figure 3, the coherence of the feature register state (solid black line) converges to zero. Also, the accessible coherence of the feature register (dashed black line) converges to zero. Therefore, the local coherence of the feature register decreases as the number of iterations increases.
[Line description of (c,d)] Solid black (blue) line corresponds to the coherence of the feature (index) register state $C(\rho_F)$ ($C(\rho_I)$). Dashed black (blue) line corresponds to the accessible coherence of the feature (index) register $C^{acc}(\rho_F)$ ($C^{acc}(\rho_I)$). Solid red (purple) line corresponds to the local coherence of the feature (index) register $C(\rho_F) + C^{acc}(\rho_F)$ ($C(\rho_I) + C^{acc}(\rho_I)$).
[Line description of (e,f)] Solid black line corresponds to the coherence of the index register state $C(\rho_I)$. Dashed black line corresponds to the accessible coherence of the index register $C^{acc}(\rho_I)$. Solid red line corresponds to the local coherence of the index register $C(\rho_I) + C^{acc}(\rho_I)$.

Example 2
This example has the dataset $D = \{(x_i, y_i)\}_{i=0}^{7}$, given by the following data vectors [47]: […, 0.6, 0.8]. Here, the correct classification is defined as follows: For a data vector $x_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4})$, when $x_{i1}, x_{i2} > 0$ and $x_{i3}, x_{i4} \leq 0$, the label $y_i = +1$ is assigned, but when $x_{i1}, x_{i2} \leq 0$ and $x_{i3}, x_{i4} > 0$, the label $y_i = -1$ is assigned [47]. In this example, we assume that $y_0 = y_1 = \cdots = y_5 = +1$, $y_6 = -1$, $y_7 = +1$, which means that the mislabeled data is $y_7$. The VQP model for this example is shown in Figure 4. Because the number of mislabeled data is one, the optimal number of iterations is $l = (\pi - 2\theta)/4\theta \approx 1.6734$ [3]. Here, we have $\theta = \arcsin(1/\sqrt{N}) = \arcsin(1/\sqrt{8})$. Therefore, the VQP model contains two Grover operations $G_1$ and $G_2$. After training is completed, the probability of success becomes 80.2%. When the number of iterations is 12, the success probability of the VQP reaches a local minimum. Since the index register (feature register) is an 8-dimensional (4-dimensional) quantum system, the maximum of the $l_1$-norm coherence is 7 (3), and the maximum of the relative entropy of coherence is 3 (2). Figure 4c-f show the $l_1$-norm coherence before (after) performing $G_1$ and $G_2$. The success probability in Figure 4b behaves similarly to the coherence of the index register state shown in Figure 4c,d. When the number of iterations is approximately 12, the coherence of the index register state (solid blue line) reaches a minimum. As the number of iterations increases, the coherence of the index register state converges to a maximum. Meanwhile, the accessible coherence of the index register state (dashed blue line) and of the feature register state (dashed black line) vanishes as the number of iterations increases. Therefore, most of the local coherence in the index register is the coherence of the index register state. Further, the manner in which $G_1$ and $G_2$ affect the local coherence of the index register is noteworthy.
That is, while $G_1$ does not consume the local coherence, the local coherence of the index register diminishes in $G_2$. Therefore, to understand VQP training, it is important to determine where coherence depletion occurs in the Grover operations.
Figure 5 shows the case where VQP training fails. Here, we consider $y_0 = y_1 = \cdots = y_5 = y_6 = +1$, $y_7 = -1$, where the mislabeled data is $y_7$. As shown in Figure 5a, the success probability $q(7)$ decreases from 43.28% to 38.26%. In Figure 5a,b, the accessible coherence of the index register (dashed blue line) and the accessible coherence of the feature register (dashed black line) do not vanish. Further, the coherence of the index register state (solid blue line) diminishes constantly before $G_2$ is performed. Figure 5c,d show that coherence depletion occurs in $G_1$, but the coherence increases in $G_2$. This implies that these Grover operations cannot consume the $l_1$-norm coherence appropriately.
[Line description of (c,d)] Solid black (blue) line corresponds to the coherence of the feature (index) register state $C(\rho_F)$ ($C(\rho_I)$). Dashed black (blue) line corresponds to the accessible coherence of the feature (index) register $C^{acc}(\rho_F)$ ($C^{acc}(\rho_I)$). Solid red (purple) line corresponds to the local coherence of the feature (index) register $C(\rho_F) + C^{acc}(\rho_F)$ ($C(\rho_I) + C^{acc}(\rho_I)$).
[Line description of (e,f)] Solid black line corresponds to the coherence of the index register state $C(\rho_I)$. Dashed black line corresponds to the accessible coherence of the index register $C^{acc}(\rho_I)$. Solid red line corresponds to the local coherence of the index register $C(\rho_I) + C^{acc}(\rho_I)$.
[Line description of (b,c)] Solid black (blue) line corresponds to the coherence of the feature (index) register state $C(\rho_F)$ ($C(\rho_I)$). Dashed black (blue) line corresponds to the accessible coherence of the feature (index) register $C^{acc}(\rho_F)$ ($C^{acc}(\rho_I)$). Solid red (purple) line corresponds to the local coherence of the feature (index) register $C(\rho_F) + C^{acc}(\rho_F)$ ($C(\rho_I) + C^{acc}(\rho_I)$).
[Line description of (d,e)] Solid black line corresponds to the coherence of the index register state $C(\rho_I)$. Dashed black line corresponds to the accessible coherence of the index register $C^{acc}(\rho_I)$. Solid red line corresponds to the local coherence of the index register $C(\rho_I) + C^{acc}(\rho_I)$.
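The optimal iteration count quoted for Example 2 can be checked with a few lines of Python. This sketch assumes the textbook Grover analysis with one marked item among $N$, for which $\sin\theta = 1/\sqrt{N}$; the function names are illustrative. The success probability it returns is that of the ideal Grover algorithm, not of the trained VQP.

```python
import math

def grover_angle(n_items, n_marked=1):
    """Angle theta with sin(theta) = sqrt(n_marked / n_items)."""
    return math.asin(math.sqrt(n_marked / n_items))

def optimal_iterations(n_items, n_marked=1):
    """Real-valued optimal iteration count l = (pi - 2*theta) / (4*theta)."""
    theta = grover_angle(n_items, n_marked)
    return (math.pi - 2.0 * theta) / (4.0 * theta)

def success_probability(n_items, n_iter, n_marked=1):
    """Ideal success probability sin^2((2*n_iter + 1) * theta) after n_iter iterations."""
    theta = grover_angle(n_items, n_marked)
    return math.sin((2 * n_iter + 1) * theta) ** 2

if __name__ == "__main__":
    print(optimal_iterations(8))      # approximately 1.6734 for N = 8
    print(success_probability(8, 1))  # ideal probability with one Grover operation
    print(success_probability(8, 2))  # ideal probability with two Grover operations
```

Rounding $l \approx 1.6734$ to the nearest integer gives two Grover operations, which is consistent with the circuit containing $G_1$ and $G_2$.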

Investigation of Entanglement in Two Examples
One of the known resources of quantum supremacy is entanglement. Therefore, in this study, we examine whether entanglement contributes to the performance of the VQP model. Figure 6 shows the concurrence [51][52][53] for the two examples discussed in the previous section. The concurrence of the two-qubit state $\rho_{I_i I_j}$ of the index-register qubits $I_i$ and $I_j$ is defined as follows [52]:
$$E_c(\rho_{I_i I_j}) = \max\{0, \sqrt{\lambda_1} - \sqrt{\lambda_2} - \sqrt{\lambda_3} - \sqrt{\lambda_4}\}.$$
Here, $\lambda_i$ are the eigenvalues of $\rho_{I_i I_j} (\sigma_y \otimes \sigma_y) \rho^{*}_{I_i I_j} (\sigma_y \otimes \sigma_y)$, ordered such that $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \lambda_4$. Also, the bipartite concurrence between the feature register and the index register is defined as [51]
$$E^{(bip)}_c(|\psi_{FI}\rangle) = \sqrt{2(1 - \mathrm{Tr}\rho_I^2)}.$$
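The two-qubit concurrence defined above can be evaluated numerically. The following is a minimal sketch, assuming NumPy and a 4x4 density matrix in the computational basis; the function name is illustrative. A Bell state should give a concurrence of one and a product state zero, up to numerical precision.

```python
import numpy as np

SIGMA_Y = np.array([[0.0, -1.0j], [1.0j, 0.0]])

def concurrence(rho):
    """Two-qubit concurrence max(0, sqrt(l1) - sqrt(l2) - sqrt(l3) - sqrt(l4))."""
    rho = np.asarray(rho, dtype=complex)
    yy = np.kron(SIGMA_Y, SIGMA_Y)
    r_matrix = rho @ yy @ rho.conj() @ yy
    # The eigenvalues are real and non-negative in exact arithmetic;
    # clip small numerical noise before taking square roots.
    evals = np.clip(np.linalg.eigvals(r_matrix).real, 0.0, None)
    roots = np.sqrt(np.sort(evals)[::-1])  # decreasing order
    return float(max(0.0, roots[0] - roots[1] - roots[2] - roots[3]))

# Bell state (|00> + |11>) / sqrt(2)
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_bell = np.outer(bell, bell.conj())

# Product state |00>
rho_product = np.diag([1.0, 0.0, 0.0, 0.0])

if __name__ == "__main__":
    print(concurrence(rho_bell), concurrence(rho_product))
```

To apply this to the index register, one would first trace out all other qubits to obtain the two-qubit reduced state.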
In Example 1, the multipartite entanglement of the entire system is evaluated using the multipartite concurrence $E^{(mul)}_c$ of Reference [53]. Because the quantum state of the entire system is a pure state, the pure-state definitions of $E^{(bip)}_c$ and $E^{(mul)}_c$ can be used. The maximum of the concurrence of a two-qubit state is one, but the maximum of the bipartite concurrence is $\sqrt{2(d_F - 1)/d_F}$, where $d_F$ is the dimension of the feature register. The proof is as follows. Suppose that the dimension $d_F$ of the feature register is smaller than the dimension $d_I$ of the index register. Then, by the Schmidt decomposition, a pure entangled state of these registers can be written as $|\psi_{FI}\rangle = \sum_{i=1}^{d_F} \lambda_i |\lambda_i\rangle_F \otimes |\lambda_i\rangle_I$. Here, $\{|\lambda_i\rangle_X\}$ is an orthonormal basis of system $X \in \{F, I\}$. In the case of maximal entanglement, we have $\lambda_i = 1/\sqrt{d_F}$ for all $i$. Then, the trace of the square of the partial state of $|\psi\rangle\langle\psi|$ becomes $1/d_F$. Substituting this value, we find that the maximum of the bipartite concurrence is $\sqrt{2(d_F - 1)/d_F}$.
In Example 1, the maximum of the bipartite concurrence is one because $d_F = 2$. However, in Example 2, the maximum of the bipartite concurrence is $\sqrt{3/2} \approx 1.2247$ because $d_F = 4$. For example, in Figure 6d, the bipartite concurrence is 1.088 at iteration 10, which implies that the state of the feature register and the index register is then close to the maximally entangled state. Here, Figure 6a,b show the case of success in Example 1, and Figure 6c,d show the case of success in Example 2. In Figure 6a, the solid blue line (the dashed line) is the multipartite concurrence before (after) the Grover operation is performed [53]. Further, we consider the bipartite concurrence between the feature register and the index register. The solid (dashed) black line is the bipartite concurrence between the feature register and the index register before (after) the Grover operation is performed.
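The stated maxima of the bipartite concurrence can be verified numerically. The sketch below assumes the square-root form $\sqrt{2(1 - \mathrm{Tr}\rho_I^2)}$ of the pure-state concurrence, a state vector ordered as feature register first and index register second, and NumPy; the helper names are illustrative.

```python
import numpy as np

def bipartite_concurrence(psi, d_f, d_i):
    """sqrt(2 * (1 - Tr rho_I^2)) for a pure state psi on feature (d_f) x index (d_i)."""
    m = np.asarray(psi, dtype=complex).reshape(d_f, d_i)
    rho_i = m.T @ m.conj()                       # reduced state of the index register
    purity = float(np.real(np.trace(rho_i @ rho_i)))
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - purity))))

def maximally_entangled_state(d_f, d_i):
    """State with equal Schmidt coefficients 1/sqrt(d_f); requires d_f <= d_i."""
    if d_f > d_i:
        raise ValueError("this sketch assumes d_f <= d_i")
    m = np.zeros((d_f, d_i), dtype=complex)
    for i in range(d_f):
        m[i, i] = 1.0 / np.sqrt(d_f)
    return m.reshape(-1)

if __name__ == "__main__":
    # Example 1: d_F = 2, d_I = 4.  Example 2: d_F = 4, d_I = 8.
    for d_f, d_i in ((2, 4), (4, 8)):
        psi = maximally_entangled_state(d_f, d_i)
        print(d_f, d_i, bipartite_concurrence(psi, d_f, d_i))
```

For the two register sizes used in the examples, the printed values should agree with $\sqrt{2(d_F - 1)/d_F}$, that is, 1 and approximately 1.2247, while a product state gives zero.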
Figure 6a shows that, when the Grover operation is performed, the multipartite concurrence converges to the bipartite concurrence, and as training iterates, the bipartite concurrence disappears. This implies that entanglement should diminish for the successful training of the VQP model. Figure 6b shows the concurrence [52] of the qubits composing the index register. When VQP training is performed correctly, the concurrence before the Grover operation converges to the value corresponding to the maximally entangled state. However, after the Grover operation, the concurrence disappears. This coincides with previous results [32], which implied that entanglement should be removed for the Grover algorithm to be performed correctly. Figure 6c,d show that the bipartite concurrence between the index register and the feature register diminishes. It is noteworthy that, unlike in Figure 6a,b, the bipartite concurrence does not converge to zero. Also, the success probability of the VQP is less than one owing to the non-zero concurrence. Figure 6c shows that $G_2$ increases the concurrence between two qubits of the index register as the VQP is successfully trained. Figure 6d shows that $G_2$ decreases the bipartite concurrence between the feature and the index register. These phenomena can be understood from the fact that the Grover algorithm operates only on the index register.
[Line description of (c)] Solid blue, purple, and red lines correspond to the concurrence $E_c(\rho_{I_1 I_2})$, $E_c(\rho_{I_2 I_3})$, and $E_c(\rho_{I_1 I_3})$ after GA1, respectively. Dashed blue, purple, and red lines correspond to the concurrence $E_c(\rho_{I_1 I_2})$, $E_c(\rho_{I_2 I_3})$, and $E_c(\rho_{I_1 I_3})$ after GA2, respectively.
[Line description of (d)] Solid black line corresponds to the bipartite concurrence between feature and index register E

Conclusions
In this study, we investigated the contribution of coherence to the training of the VQP model. First, we found that the MPQC should concentrate coherence in the index register before the Grover algorithm is performed. Second, we demonstrated that the Grover algorithm consumes the coherence of the index register.
We found that coherence distribution and coherence depletion occur in a correctly trained VQP model. Further, we demonstrated that, to train the VQP model correctly, the local coherence of the index register should not contain accessible coherence [48]. We also investigated whether entanglement affects the training of the VQP model. We found that the VQP model should be trained so as not to produce entanglement. Finally, we demonstrated that entanglement depletion does not occur in VQP training. Our results show how coherence and entanglement may influence the performance of the quantum computer and the quantum perceptron, and our work will help to understand the essence of quantum supremacy.