A Convex Hull-Based Machine Learning Algorithm for Multipartite Entanglement Classification

Abstract: Quantum entanglement becomes more complicated and capricious when more than two parties are involved. There have been methods for classifying some inequivalent classes of multipartite entanglement, such as GHZ states and W states. In this paper, based on the fact that the set of all W states is convex, we approximate the convex hull by some critical points from the inside and propose a method of classification via the tangent hyperplane. To accelerate the calculation, we bring ensemble learning into the algorithm, thus improving the accuracy of the classification.


Introduction
Machine learning was born from pattern recognition; it possesses the ability to make decisions without explicit programming after learning from large amounts of data. More recently, machine learning has been applied to quantum problems. Thus far, a number of promising applications have been proposed, such as quantum metric learning [1], the gate decomposition problem [2], quantum state discrimination [3], quantum discrete feature encoding [4], quantum nodes based on variational unsampling protocols [5] and quantifying steerability [6].
Entanglement was first described by Einstein, Podolsky and Rosen [7]. Later, quantum entanglement became a useful resource, enabling tasks such as quantum cryptography [8], quantum teleportation [9] and driving fields on the spectrum [10]. Many methods have been proposed to distinguish and quantify entanglement, including Tsallis-q entanglement [11], device-independent entanglement witnesses [12] and the geometric measure of entanglement [13].
When it comes to the number of parties involved in entanglement, there are two typical classes: bipartite entanglement and multipartite entanglement. When more than two parties are involved, the situation gets complicated. For example, for three qubits in the Hilbert spaces H_A, H_B and H_C, a state is called fully separable if it can be written as |ψ⟩ = |α⟩_A ⊗ |β⟩_B ⊗ |γ⟩_C, where |α⟩_A ∈ H_A, |β⟩_B ∈ H_B and |γ⟩_C ∈ H_C. Biseparable states can be written as a product state with respect to a bipartite cut: a biseparable state is created when two of the three qubits are grouped together into one party. Since there are three possibilities for grouping two qubits together, there are three classes of biseparable states, namely |ψ⟩ = |α⟩_A ⊗ |δ⟩_BC, |ψ⟩ = |β⟩_B ⊗ |δ⟩_AC and |ψ⟩ = |γ⟩_C ⊗ |δ⟩_AB, where |δ⟩ denotes a two-party state that might be entangled. Finally, a state is called genuinely entangled if it is neither fully separable nor biseparable. There are two main families of genuine multipartite entanglement: one is the GHZ states [14,15], and the other is the so-called W states [16]. A schematic picture of the structure of mixed states for three qubits is shown in Figure 1 [17]. Given two three-qubit states |φ⟩ and |ψ⟩, one can ask whether it is possible to transform a single copy of |φ⟩ into |ψ⟩ with local operations and classical communication (LOCC) without requiring that this be done with certainty. This operation is called stochastic local operations and classical communication (SLOCC). Compared with the well-known LOCC, SLOCC succeeds only with some non-unit probability. For systems of N qudits described by Hilbert spaces of the form H := C^{d_1} ⊗ . . . ⊗ C^{d_N}, SLOCC operations are mathematically described by the group G := SL(d_1, C) × . . . × SL(d_N, C), and the action is given by the tensor product. We call two states equivalent if there is a non-vanishing probability of success when trying to convert one into the other through SLOCC. The distinction between a GHZ state and a W state is that one cannot be converted into the other through SLOCC [18]. In that case, we can establish an equivalence relation: two states |φ⟩ and |ψ⟩ are inequivalent if the parties have zero probability of success when trying to convert |φ⟩ into |ψ⟩ and |ψ⟩ into |φ⟩, which is the case when |φ⟩ is a GHZ state and |ψ⟩ is a W state. However, this conversion can happen when both states are GHZ states or both are W states. This relation has been termed stochastic equivalence. Equivalence under SLOCC indicates that both states are suited to implement the same tasks of quantum information theory, although the probability of a successful performance of the task may differ between |φ⟩ and |ψ⟩. For instance, in the three-qubit case, |ψ⟩ can be locally converted into |φ⟩ if an operator A ⊗ B ⊗ C exists satisfying |φ⟩ = A ⊗ B ⊗ C |ψ⟩, where operator A contains contributions from any round in which party A acts on its subsystem, and likewise for operators B and C.
To make sure the opposite conversion can also happen, each of these operators must be invertible; in particular, |ψ⟩ = A^{-1} ⊗ B^{-1} ⊗ C^{-1} |φ⟩. There has been an abundance of methods for classifying separable and entangled states. For instance, for 2 × 2 and 2 × 3 systems, the PPT criterion is a well-known method [19], the computable cross-norm or realignment (CCNR) criterion is simple and strong [20,21], and the entanglement witness is a necessary and sufficient criterion in terms of directly measurable observables [22]. However, criteria for distinguishing GHZ and W states are relatively few at present. In this work, we employ machine learning techniques to tackle the classification of GHZ and W states by recasting it as a learning task. Namely, we attempt to construct a GHZ-W classifier. Our idea is to give the classifier a large number of sampled trial states with their category labels and then train the classifier to predict the category labels of new states that it has not encountered before.

Convex Hull Approximation
As with the detection of entanglement, a natural question arises: how do we decide which class a given state belongs to? Methods for distinguishing between the GHZ class and the W class are very rare. For the detection of entanglement, the entanglement witness [23] can be used, as the separable states form a convex set. However, it is not clear how one can show that a state is tripartite entangled and belongs to the W class. This cannot be accomplished with witnesses: since they are designed to show that a state lies outside a convex set, they fail to prove that a state is inside a convex set. The traditional entanglement witness is used as a separability-entanglement classifier. An observable W is called an entanglement witness (or witness for short) if Tr(Wρ_s) ≥ 0 holds for all separable states ρ_s and Tr(Wρ_e) < 0 for at least one entangled state ρ_e. Thus, if one measures Tr(Wρ) < 0, then one knows for sure that the state ρ is entangled. From a geometric point of view, both the set of all states and the set of separable states are convex. The witness forms a hyperplane in the space, dividing it into two parts.
However, the boundary of the convex set is so complicated that direct application of supervised learning to the classification will not be satisfying. In addition, due to the lack of prior knowledge for training, a neural network cannot provide acceptable accuracy. In 2018, researchers proposed a method for classifying entangled and separable states by constructing a convex hull approximating the set of separable states [24]. As shown in Figure 1, the W states form a convex set inside the set of GHZ states. Therefore, inspired by [24], we introduce the convex hull approximation [25] here. The construction of a convex hull is one of the most fundamental problems in computational geometry. We approximate the set of W states from the inside, for the W state space is a closed convex set, and its critical points are all pure W states [17]. We define a convex hull as follows:

C := conv{c_1, . . ., c_n} = { ∑_i λ_i c_i | λ_i ≥ 0, ∑_i λ_i = 1 },

where c_1, . . ., c_n ∈ X are feature vectors of randomly sampled pure W states. C is said to be a convex hull approximation (CHA) of the W state space. To find the critical points, we propose an iterative algorithm in Section 3.2.2. As the number of critical points increases, the convex hull approaches the W state space; in other words, C will be a more accurate CHA of the W state space if we construct it with more pure W states. With this, we can approximately tell whether a state ρ is a W state by testing whether its feature vector is in C. This is equivalent to determining whether the feature vector can be written as a convex combination of the c_i by solving the following linear program:

α(C, p) := max α subject to αp = ∑_i λ_i c_i, λ_i ≥ 0, ∑_i λ_i = 1.

Here, α = α(C, p) is a function of C and p, and p is the feature vector of the state ρ to be tested. If α(C, p) ≥ 1, then p is in C, and thus ρ is a W state; otherwise, it is highly possible that ρ is a GHZ state. A schematic picture of the convex hull approximation is shown in Figure 2. We then propose an iterative algorithm for detecting the W states. In the first step, we use a few critical points (i.e., pure W states) to build a CHA C. For a state ρ whose feature vector is p, we find the maximum α by the linear program above such that αp is still in C. If α ≥ 1, then ρ is certainly in the CHA and thus is a W state. Otherwise, αp lies on a hyperplane P such that P ∩ C is part of the boundary of C. We can then enlarge C by sampling pure W states near the known critical points. We repeat the above procedure until α ≥ 1 or α converges.
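The linear program above can be solved with off-the-shelf tools. Below is a minimal sketch (our own, not the paper's implementation) using `scipy.optimize.linprog`; it assumes the critical points c_i are stored as the columns of a matrix `C`, and the helper name `cha_alpha` is ours:

```python
import numpy as np
from scipy.optimize import linprog

def cha_alpha(C, p):
    """Maximum alpha such that alpha*p is a convex combination of the
    columns of C (the feature vectors of the sampled pure W states)."""
    d, n = C.shape
    # decision variables: [lambda_1, ..., lambda_n, alpha]
    cost = np.zeros(n + 1)
    cost[-1] = -1.0                         # linprog minimizes, so minimize -alpha
    A_eq = np.zeros((d + 1, n + 1))
    A_eq[:d, :n] = C                        # sum_i lambda_i c_i ...
    A_eq[:d, -1] = -p                       # ... - alpha * p = 0
    A_eq[d, :n] = 1.0                       # sum_i lambda_i = 1
    b_eq = np.zeros(d + 1)
    b_eq[d] = 1.0
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    return res.x[-1] if res.success else 0.0
```

If `cha_alpha(C, p) >= 1`, the tested feature vector lies inside the current CHA; otherwise the state is flagged as a likely GHZ-class state.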

A Tangent-Based Classifier via CHA
Since the set of W states is convex, there exist many tangent hyperplanes P such that all the W states are on the same side of the space divided by the hyperplane (i.e., ⟨N(P), ρ_W⟩ > 0 is satisfied for all W states ρ_W, where N(P) is the normal vector of the tangent hyperplane P).
A single tangent hyperplane P is not enough to classify the W and GHZ states, but with enough tangent hyperplanes, the error rate can be brought within tolerance. Thus, we aim to generate enough tangent hyperplanes here. Because the boundary of the set of W states is so complicated, it is difficult to find tangent hyperplanes directly. However, with enough critical points on the convex hull, we can approximate a tangent hyperplane by the hyperplane determined by some points on the convex hull that are close enough (i.e., the volume of the hyper-body they form is minimized). For states of dimension n, n critical points are needed here.

Voronoi Diagram
A Voronoi diagram divides the space into regions around a given finite set of points based on the nearest-neighbor principle: the points in each region are closer to the generating point contained in that region than to any other point in the set. Given a set S = {p_1, p_2, . . ., p_n} of n points, the Voronoi region of p_i is Reg(p_i) = {p | d(p, p_i) < d(p, p_j), ∀ j ≠ i}, where d(p_i, p_j) refers to the Minkowski distance between p_i and p_j. The division given by the Reg(p_i) (i = 1, 2, . . ., n) and their boundaries is called the Voronoi diagram generated by S.
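As an illustration (our own sketch, not the paper's code), `scipy.spatial.Voronoi`, which also wraps the Qhull library, computes the vertices and regions of a Voronoi diagram:

```python
import numpy as np
from scipy.spatial import Voronoi

# five generating points: the corners of the unit square plus its center
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
vor = Voronoi(pts)

# vor.vertices        : Voronoi vertices (equidistant from 3+ generators)
# vor.regions         : vertex indices bounding each region (-1 = unbounded)
# vor.point_region[i] : index into vor.regions for generator i
center_region = vor.regions[vor.point_region[4]]  # region of the center point
```

For this configuration, the center point's region is the bounded diamond whose four vertices are the midpoints of the square's edges.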

Minimum Hyper-Body
As was mentioned above, finding the most nearly tangent hyperplanes means finding the minimum hyper-body. Here, we utilize critical points and their neighbors to generate the Voronoi diagram; it was proven in [26] that the optimal time complexity is O(n log n). We implement the divide-and-conquer method. The intersection of the regions in which adjacent target points are located is called a vertex. In a Voronoi diagram, the existence of a vertex implies the existence of a hyper-body. Figure 3 is a schematic of a plane Voronoi diagram. q_1, q_2 and q_3 are three arbitrary points among the given n points. Point O is the barycenter of the triangle constructed by q_1, q_2 and q_3, which means that there are no other target points in the triangle. Therefore, we can approximately consider the hyper-body composed of adjacent target points to be a locally optimal solution. In the plane Voronoi diagram case, we first select segment q_1q_2 as a side of the triangle and then divide the n − 2 points other than q_1, q_2 into two subsets: S_1 = {p | target points in the areas adjacent to the areas where q_1 and q_2 are located} and S_2 = {p | target points in other areas}. We can select q_3 ∈ S_1 and q ∈ S_2 as candidates, obtaining triangles q_1q_2q_3 and q_1q_2q. Therefore, we just have to enumerate the triangles formed around the Voronoi vertices and compare them to find the triangle with the smallest area as the solution.
Figure 3. A plane Voronoi diagram schematic. q_1, q_2 and q_3 are three arbitrary points among the given n points. Point O is the barycenter of triangle q_1q_2q_3. In a Voronoi diagram, a vertex is the barycenter of the triangle formed by its target points, so there are no other target points inside triangle q_1q_2q_3; any other target point q must lie outside the triangle. In this case, q_1q_2q_3 can be considered the minimum triangle.

Combining CHA and Machine Learning
The method above (see Section 2.1) can distinguish GHZ states from W states. However, to increase the accuracy, we have to enlarge the convex hull, so that a large number of candidate points must be tested for membership in the convex hull, leading to a greater time cost. Therefore, we bring in supervised learning to speed up the algorithm.

Data Preparation
For the data preparation, note that for any quantum state, the density operator ρ acting on H = C^{d_1} ⊗ . . . ⊗ C^{d_N} can be represented by a real vector of dimension d² − 1 with d = d_1 · · · d_N, due to the fact that ρ is Hermitian and of trace 1, where d_i is the dimension of H_i. The generalized Gell-Mann matrices [27] are a frequently used linearly independent Hermitian orthogonal basis here.
Let {|1⟩, . . ., |n⟩} be the computational basis of the n-dimensional Hilbert space. Then there are symmetric matrices, presented as follows:

λ_s^{j,k} = |j⟩⟨k| + |k⟩⟨j|, 1 ≤ j < k ≤ n.

There are also antisymmetric matrices, presented as

λ_a^{j,k} = −i(|j⟩⟨k| − |k⟩⟨j|), 1 ≤ j < k ≤ n,

and diagonal matrices, presented as

λ_d^{l} = √(2/(l(l + 1))) (∑_{j=1}^{l} |j⟩⟨j| − l |l + 1⟩⟨l + 1|), 1 ≤ l ≤ n − 1.

Together, these n² − 1 matrices λ_i form the generalized Gell-Mann basis, where tr(λ_i) = 0 and tr(λ_i λ_j) = 2δ_{i,j}. Therefore, every ρ can be expanded into a linear combination as follows:

ρ = I/n + (1/2) ∑_i x_i λ_i, where x_i = tr(ρλ_i).

In supervised learning, the training set should have the following form:

{(x_1, label_1), . . ., (x_m, label_m)},

where m is the size of the training set, x_i is the feature vector of the i-th input state and label_i ∈ {0, 1} is the corresponding tag.
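The basis is straightforward to generate numerically. The following sketch (our own; the helper names `gell_mann` and `feature_vector` are hypothetical) builds the generalized Gell-Mann matrices under the normalization tr(λ_i λ_j) = 2δ_ij and extracts a feature vector from a density matrix:

```python
import numpy as np

def gell_mann(n):
    """Generalized Gell-Mann basis: n^2 - 1 traceless Hermitian matrices
    normalized so that tr(L_i L_j) = 2 * delta_ij."""
    mats = []
    for j in range(n):
        for k in range(j + 1, n):
            s = np.zeros((n, n), complex)      # symmetric: |j><k| + |k><j|
            s[j, k] = s[k, j] = 1.0
            a = np.zeros((n, n), complex)      # antisymmetric: -i(|j><k| - |k><j|)
            a[j, k] = -1j
            a[k, j] = 1j
            mats += [s, a]
    for l in range(1, n):                      # diagonal matrices
        d = np.zeros((n, n), complex)
        for j in range(l):
            d[j, j] = 1.0
        d[l, l] = -l
        mats.append(np.sqrt(2.0 / (l * (l + 1))) * d)
    return mats

def feature_vector(rho):
    """Bloch-type feature vector x_i = tr(rho L_i) of a density matrix."""
    n = rho.shape[0]
    return np.array([np.trace(rho @ L).real for L in gell_mann(n)])
```

For n = 2 this reproduces the Pauli matrices, and the maximally mixed state I/n maps to the zero feature vector, as expected.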

Extended Data Form
The CHA method described above has another obvious drawback from the perspective of the trade-off between accuracy and time consumption. Improving the accuracy means adding critical points to expand the convex hull, which leads to a greater time cost to determine whether a point is within the expanded convex hull. To overcome this problem, we combine CHA with supervised learning, as machine learning has the ability to speed up this computation. To boost the accuracy, we add more information to the training data: the original feature vectors x are extended to (x, α) so that they contain the boundary information. Therefore, the dataset can be rewritten as

{((x_1, α_1), label_1), . . ., ((x_m, α_m), label_m)},

where α_i = α(C, x_i). The classifier h is therefore defined on X × R, and its empirical loss is

L(h) = (1/m) ∑_{i=1}^{m} 1[h(x_i, α_i) ≠ label_i].

Then, we can employ a standard ensemble learning approach to train the classifier.

Ensemble Learning
In supervised learning, the goal is to learn a stable model that performs well in all aspects, but in practice this is often not the case, and sometimes we can only obtain multiple models, each with its own preferences. The underlying idea of ensemble learning is that even if one weak classifier makes a wrong prediction, the other weak classifiers can correct the error [28]. In the bagging method, the bootstrap method is used to obtain N datasets from the overall dataset by sampling with replacement, a model is learned on each dataset, and the final prediction is obtained by combining the outputs of the N models: classification problems use majority voting over the N model predictions, and regression problems use averaging. Boosting is a machine learning approach that can be used to reduce bias in supervised learning. It also learns a series of weak classifiers and combines them into a strong classifier. Each training example is assigned an equal weight at the beginning of training, and the algorithm then trains on the training set for several rounds. After each round, the misclassified training examples are assigned larger weights (i.e., the learning algorithm pays more attention to the wrong samples after each round, resulting in multiple prediction functions). Here, we employ both models to compare their effects.
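Both variants can be realized with standard library implementations. The sketch below is our own (not the paper's code) and uses scikit-learn's `BaggingClassifier` and `AdaBoostClassifier`; the synthetic data merely stands in for the extended feature vectors (x, α), with the label tied to whether α ≥ 1:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# synthetic stand-in for extended feature vectors (x, alpha):
# label 1 ("W state") iff alpha >= 1, i.e., inside the CHA
m = 400
x = rng.normal(size=(m, 5))
alpha = 1.0 + 0.5 * rng.normal(size=m)
labels = (alpha >= 1.0).astype(int)
X_ext = np.column_stack([x, alpha])        # (x, alpha) feature vectors

# bagging: bootstrap-resampled datasets, one weak learner each, majority vote
bag = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                        n_estimators=25, random_state=0)
# boosting: reweight misclassified samples each round, combine weak learners
boost = AdaBoostClassifier(n_estimators=25, random_state=0)

bag.fit(X_ext, labels)
boost.fit(X_ext, labels)
```

Because the decisive boundary information sits in the appended α coordinate, even shallow trees recover the label almost perfectly here, which is exactly the point of extending the feature vectors.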

Training Phase of the Predictors
Here, we generated GHZ and W states directly, calling the functions GHZState.m and WState.m as in [29]. We generated a specific number of random quantum states which were either GHZ states or W states, with different labels. Then, we applied SLOCC operations to transform these states and obtain the training set, as shown in Figure 4.
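This data-generation step can be sketched in a few lines (in Python rather than the MATLAB functions of [29]; all helper names are ours). A random tensor product of invertible single-qubit matrices implements an SLOCC operation, and renormalizing the result gives a new state in the same SLOCC class:

```python
import numpy as np

def ghz(n):
    """|GHZ> = (|0...0> + |1...1>)/sqrt(2) as a 2^n-dimensional vector."""
    v = np.zeros(2 ** n, complex)
    v[0] = v[-1] = 1 / np.sqrt(2)
    return v

def w(n):
    """|W> = (|10...0> + |01...0> + ... + |00...1>)/sqrt(n)."""
    v = np.zeros(2 ** n, complex)
    for k in range(n):
        v[1 << k] = 1 / np.sqrt(n)
    return v

def random_slocc(n, rng):
    """Tensor product A_1 x ... x A_n of random 2x2 matrices
    (Gaussian matrices are invertible with probability one)."""
    full = np.eye(1)
    for _ in range(n):
        M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
        full = np.kron(full, M)
    return full

rng = np.random.default_rng(1)
# one labeled training example per class: apply SLOCC, then renormalize
phi = random_slocc(3, rng) @ ghz(3); phi /= np.linalg.norm(phi)  # label 0 (GHZ)
psi = random_slocc(3, rng) @ w(3);   psi /= np.linalg.norm(psi)  # label 1 (W)
```

Repeating this with fresh random operators yields as many labeled GHZ-class and W-class states as needed for the training set.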

Testing Phase of the Predictors
With the training set, we could construct the convex hull. Here, we choose an iterative algorithm to find more critical points from one known critical point (i.e., to find its neighbors). The algorithms are shown below.

Algorithm for Calculating the α
Let the GHZ states |φ⟩_GHZ ∈ H_A and the W states |ψ⟩_W ∈ H_B. To approximate the set of W states with a convex hull C, we generated a collection of critical points. The process was carried out as follows: (1) Randomly sample a state |φ⟩_GHZ ∈ H_A ≅ C^{d_A} from a uniform distribution according to the Haar measure; (2) Randomly sample a state |ψ⟩_W ∈ H_B ≅ C^{d_B} from a uniform distribution according to the Haar measure; (3) Return |φ⟩_GHZ |ψ⟩_W. Execute the process above N times to obtain n critical points c_1, . . ., c_n. Then, solve the convex optimization problem mentioned above to decide whether a given feature vector is in the generated convex hull.

Algorithm for Finding Critical Points
For the set of W states W, which is closed and convex, for an arbitrary state ρ there exists a unique α_ρ such that α_ρ ρ + (1 − α_ρ) I/(d_A d_B) is on the boundary of W. When α ≤ α_ρ, ρ is a W state, and when α > α_ρ, it is a GHZ state. Here is an iterative algorithm for calculating α_ρ based on the convex hull C: (1) Initialize p as the feature vector of ρ, and set ε = 1, ξ = 0.9; (2) Update α_ρ ← α(C, p); (3) Now α_ρ p = ∑_i λ_i c_i. Pick c_{i_1}, . . ., c_{i_D} to be the critical points satisfying λ_{i_k} > 0, and update C ← conv(c_{i_1}, . . ., c_{i_D}); (4) For each k = 1, . . ., D, suppose c_{i_k} is the feature vector of |a_k⟩|b_k⟩. Sample the neighbors of c_{i_k}: randomly generate two Hermitian operators H_a and H_b, form the normalized perturbed states (I + iεH_a)|a_k⟩ and (I + iεH_b)|b_k⟩, and add to C the feature vector c'_k of the resulting product state |a'_k⟩|b'_k⟩; (5) Update ε ← ξε and go back to step 2.
We repeated step 4 ten times to find enough neighbors. The initial value of ξ can be adjusted: the closer ξ is to 1, the more precise the approximation; conversely, the closer ξ is to 0, the faster the approximation.
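The neighbor-sampling step can be sketched as follows (our own reading of the perturbation: multiply each local factor by I + iεH for a random Hermitian H and renormalize; the helper names are hypothetical):

```python
import numpy as np

def random_hermitian(d, rng):
    """Random Hermitian operator with Gaussian entries."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

def sample_neighbor(a, b, eps, rng):
    """Perturb the product state |a>|b> by (I + i*eps*H) on each factor,
    renormalizing, to obtain a nearby pure product state."""
    Ha = random_hermitian(len(a), rng)
    Hb = random_hermitian(len(b), rng)
    a2 = (np.eye(len(a)) + 1j * eps * Ha) @ a
    b2 = (np.eye(len(b)) + 1j * eps * Hb) @ b
    return a2 / np.linalg.norm(a2), b2 / np.linalg.norm(b2)
```

Shrinking `eps` by the factor ξ on each outer iteration keeps the new critical points ever closer to the boundary reached so far.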

Algorithm for Calculating an Approximate Tangent Hyperplane
For a given set of critical points S = {p_1, p_2, . . ., p_n}, we chose a Voronoi diagram approach to find the approximate tangent hyperplanes: (1) Divide the n critical points into 50 parts, each of which contains m critical points, and generate a Voronoi diagram from each part. Here, we directly call the function voronoin.m of the Qhull toolbox [30] to generate the n-dimensional Voronoi diagram. (2) Find the minimum hyper-body via the adjacent target points, generate the corresponding tangent hyperplane, and decide which states are GHZ states according to the hyperplane. (3) Repeat step 2 until all 50 diagrams have been used.
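Once n nearby critical points are chosen, the hyperplane they determine can be computed from the null space of their difference matrix, and new states classified by the sign of their offset. A sketch under our own conventions (not the paper's MATLAB code):

```python
import numpy as np

def hyperplane(points):
    """Hyperplane through n points in R^n, returned as (normal, offset):
    points x on the plane satisfy normal @ x == offset."""
    P = np.asarray(points, float)
    A = P[1:] - P[0]                 # (n-1) x n matrix of difference vectors
    _, _, Vt = np.linalg.svd(A)
    normal = Vt[-1]                  # null-space direction = plane normal
    return normal, normal @ P[0]

def side(normal, offset, x):
    """Which side of the hyperplane x lies on (+1, -1, or 0 on the plane)."""
    return np.sign(normal @ np.asarray(x, float) - offset)
```

Points with one sign are kept as candidate W states; points on the other side are declared GHZ states, mirroring the tangent-hyperplane test of Section "A Tangent-Based Classifier via CHA".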
Without the implementation of CHA, the results of direct supervised learning are shown in Table 1. Here, we name the CHA combined with a tangent hyperplane TCHA and the CHA combined with ensemble learning ECHA. For lack of boundary knowledge, it was difficult to reduce the error rate. For the three-qubit and four-qubit cases, the error rates for different numbers of critical points are shown in Figures 5 and 6. It can be seen that the error rate decreased as the number of critical points increased. The performance in the three-qubit case was better than in the four-qubit case: the approximation becomes more accurate as the number of critical points increases, while the entanglement becomes more complicated as the number of qubits increases. The TCHA performed better than the CHA due to the introduction of tangent hyperplanes. However, TCHA depends heavily on the number of critical points, leading to a huge amount of computation, so its performance has limits. The ECHA performed better than both TCHA and CHA. The training of the machine learning model is related to the dimension of the feature vectors, which was 65 in the 3-qubit case and increased to 257 in the 4-qubit case. The dimension of the feature vectors predictably grows with the number of qubits, so the accuracy would be reduced.

Conclusions
In this paper, we built a GHZ-W state classifier by an ensemble learning approach. To improve the accuracy, we first implemented a Voronoi diagram to build a tangent hyperplane classifier. Then, we added the boundary information about the convex hull as prior knowledge to the training and testing data to build an ensemble learning classifier. Such classifiers outperformed direct supervised learning in terms of accuracy.
The key to our scheme is the approximation of the convex hull of quantum states, so in theory this method can be applied to other classifications of multipartite entanglement that meet the condition of forming a convex set. For example, the hierarchy of multipartite entangled states among N-party quantum states meets this condition, so genuine multipartite entangled states and k-separable states can be classified via this method. Classifications of W states as well as Dicke states, cluster states and graph states fall into the same case [31]. We hope our scheme can be implemented for other types of entanglement classification. Aside from that, such a classifier can theoretically be extended to higher dimensions. We hope that our classifier will be able to handle more quantum information tasks in the future.

Figure 1 .
Figure 1. Schematic picture of the structure of mixed states for three qubits. The convex set of all fully separable states (fs) is a subset of the set of all biseparable states (bs). The biseparable states are the convex combinations of the biseparable states with respect to the fixed partitions sketched by the three different leaves. Outside are the genuine tripartite entangled states, the W class and the GHZ class. There are many more GHZ states than W states. Reproduced with permission from Otfried Gühne, Géza Tóth, Physics Reports; published by Elsevier, 2009.

Figure 2 .
Figure 2. Convex hull approximation. The more critical points there are, the better the approximation is.

Figure 4. Schematic diagram for generating the training set. We implemented the typical forms of GHZ states |GHZ⟩ = (|0⟩^⊗n + |1⟩^⊗n)/√2 and W states |W⟩ = (1/√n)(|10. . .0⟩ + |01. . .0⟩ + . . . + |00. . .1⟩) as the original states and applied SLOCC operations to transform these states and form more states in the GHZ and W state sets. We adopted the result as the training set.


Figure 5 .
Figure 5. The error rates of different classifiers for the 3-qubit case as N increased. The performances of the two ECHAs were better than that of the CHA, while the two ECHAs performed similarly to each other.

Figure 6 .
Figure 6. The error rates of different classifiers for the 4-qubit case as N increased. They were slightly poorer than those for the 3-qubit case.

Table 1 .
Error rate of the classifier via the direct supervised learning algorithm.