We have introduced the
MiFGD algorithm for the factorized form of the low-rank QST problems. We proved that, under certain assumptions on the problem parameters,
MiFGD converges linearly to a neighborhood of the optimal solution, whose size depends on the momentum parameter
, while using acceleration in a non-convex setting. We demonstrated empirically, using both simulated and real data, that
MiFGD outperforms non-accelerated methods on both the original problem domain and the factorized space, contributing to recent efforts on testing QST algorithms on real quantum data [
22]. These results expand on existing work in the literature illustrating the promise of factorized methods for certain low-rank matrix problems. Finally, we provide a publicly available implementation of our approach, compatible with the open-source software Qiskit [
28], where we further exploit parallel computations in
MiFGD by extending its implementation to enable efficient, parallel execution over shared and distributed memory systems.
Although our theory does not apply to Pauli basis measurements directly (i.e., using randomly selected Pauli bases , does not lead to the -norm RIP), using the data from random Pauli basis measurements directly could provide excellent tomographic reconstruction with MiFGD. Preliminary results suggest that only random Pauli bases should be taken for a reconstruction, with the same level of accuracy as with expectation values of random Pauli matrices. We leave the analysis of our algorithm in this case for future work, along with detailed experiments.
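To make the notion of "expectation values of random Pauli matrices" concrete, the following is a minimal sketch, not the implementation accompanying this work. It assumes noiseless expectation values, a two-qubit Bell state as the unknown state, and the hypothetical helper names `pauli_string` and `pauli_expectations`.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Kronecker product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    op = np.array([[1.0 + 0j]])
    for ch in label:
        op = np.kron(op, PAULIS[ch])
    return op

def pauli_expectations(rho, labels):
    """Noiseless expectation values Tr(P rho), one per Pauli label."""
    return np.array([np.trace(pauli_string(l) @ rho).real for l in labels])

# Illustrative unknown state: the Bell state (|00> + |11>) / sqrt(2).
psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# Draw a random subset of the 16 two-qubit Pauli operators (assumed setup).
rng = np.random.default_rng(0)
all_labels = ["".join(p) for p in itertools.product("IXYZ", repeat=2)]
labels = [str(l) for l in rng.choice(all_labels, size=6, replace=False)]
y = pauli_expectations(rho, labels)
```

The vector `y` plays the role of the measurement data that a compressed sensing QST solver would receive; in an experiment each entry would be estimated from a finite number of shots rather than computed exactly.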
Related Work
Matrix sensing. The problem of low-rank matrix reconstruction from few samples was first studied within the paradigm of convex optimization, using the nuclear norm minimization [
29,
57,
58]. The use of non-convex approaches for low-rank matrix recovery—by imposing rank-constraints—has been proposed in [
59,
60,
61]. In all these works, the convex and non-convex algorithms involve a full, or at least a truncated, singular value decomposition (SVD) per algorithm iteration. Since SVD can be prohibitive, these methods are limited to relatively small system sizes.
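The per-iteration SVD mentioned above is typically used to project an iterate onto the set of rank-r matrices. A minimal sketch of that projection step is given below; the function name `project_rank_r` and the test matrix are illustrative assumptions, and a full SVD is used for simplicity where a truncated solver would normally be preferred.

```python
import numpy as np

def project_rank_r(X, r):
    """Best rank-r approximation of X in Frobenius norm (Eckart-Young)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    # Keep only the r leading singular triplets.
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

# Illustrative data: a rank-2 PSD matrix plus a small dense perturbation.
rng = np.random.default_rng(0)
d, r = 16, 2
A = rng.standard_normal((d, r))
X_low = A @ A.T
noisy = X_low + 1e-3 * rng.standard_normal((d, d))

X_proj = project_rank_r(noisy, r)
```

For a d x d matrix the decomposition costs on the order of d^3 operations, and for an n-qubit density matrix d grows as 2^n, which is why repeating this step in every iteration limits the reachable system size.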
Momentum acceleration methods are used regularly in the convex setting, as well as in practical machine learning scenarios [
62,
63,
64,
65,
66,
67]. While momentum acceleration has previously been studied in non-convex programming setups, these studies mostly involve non-convex constraints with a convex objective function [
47,
61,
68,
69], or generic non-convex settings where the only question considered is whether momentum acceleration leads to fast convergence to a saddle point or to a local minimum, rather than to a global optimum [
45,
70,
71,
72].
The factorized version for semi-definite programming was popularized in [
73]. Effectively, factorizing the set of PSD matrices into a product of rectangular matrices results in a non-convex setting. This approach has been heavily studied recently, due to its computational and space complexity advantages [
25,
26,
30,
31,
32,
33,
34,
36,
37,
38,
41,
74,
75,
76]. None of the works above considers the inclusion and analysis of momentum. Moreover, the
Procrustes Flow approach [
32,
34] uses certain initialization techniques, and thus relies on multiple SVD computations. Our approach, on the other hand, uses a single top-r SVD computation. Comparison results beyond QST are provided in the appendix.
Compressed sensing QST using non-convex optimization. There are only a few works that study non-convex optimization in the context of compressed sensing QST. The authors of [
16] propose a hybrid algorithm that (i) starts with a conjugate-gradient (CG) algorithm in the factored space, in order to get an initial rapid descent, and (ii) switches over to accelerated first-order methods in the original
space, provided one can determine the switch-over point cheaply. Using the multinomial maximum likelihood objective, in the initial CG phase, the Hessian of the objective is computed per iteration (i.e., a
matrix), along with its eigenvalue decomposition. Such an operation is costly, even for moderate values of qubit number
n, and heuristics are proposed for its completion. From a theoretical perspective, [
16] provides no convergence or convergence-rate guarantees.
From a different perspective, [
77] relies on spectrum estimation techniques [
78,
79] and the Empirical Young Diagram algorithm [
80,
81] to prove that
copies suffice to obtain an estimate
that satisfies
; however, to the best of our knowledge, there is no concrete implementation of this technique to compare with respect to scalability.
Ref. [
82] proposes an efficient quantum tomography protocol by determining the permutationally invariant part of the quantum state. The authors determine the minimal number of local measurement settings, which scales quadratically with the number of qubits. The paper determines which quantities have to be measured in order to get the smallest uncertainty possible. See [
83] for a more recent work on permutationally invariant tomography. The method has been tested in a six-qubit experiment in [
84].
Ref. [
22] presented an experimental implementation of compressed sensing QST of a
qubit system, where only 127 Pauli basis measurements are available. To achieve recovery in practice, the authors proposed a computationally efficient estimator, based on a gradient descent method in the factorized space. The authors of [
22] focus on the experimental efficiency of the method, and provide neither specific results on the optimization efficiency nor convergence guarantees for the algorithm. Further, there is no publicly available implementation.
Similar to [
22], Ref. [
26] also proposes a non-convex projected gradient descent algorithm that works on the factorized space in the QST setting. The authors provide a rigorous convergence analysis and show that, under proper initialization and step size, the algorithm is guaranteed to converge to the global minimum of the problem, thus ensuring a provable tomography procedure.
Our results extend these results by including acceleration techniques in the factorized space. The key contribution of our work is proving convergence of the proposed algorithm at a
linear rate to the global minimum of the problem, under common assumptions. Proving our results required developing a whole set of new techniques, which are not based on a mere extension of existing results.
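For intuition on what acceleration in the factorized space looks like, the following is a minimal sketch of a momentum-augmented factored gradient descent on a least-squares matrix sensing objective. It is written in the spirit of the approach discussed here and should not be read as the reference implementation: the real symmetric Gaussian sensing matrices (a stand-in for Pauli measurements), the spectral initialization via a single eigendecomposition, and the values of the step size `eta` and momentum `mu` are all assumptions made for illustration.

```python
import numpy as np

def make_sensing(m, d, rng):
    """m random symmetric sensing matrices, scaled so E||A(X)||^2 = ||X||_F^2."""
    G = rng.standard_normal((m, d, d))
    return (G + G.transpose(0, 2, 1)) / (2 * np.sqrt(m))

def A_op(A, X):
    """Forward map: the vector of inner products <A_k, X>."""
    return np.einsum("kij,ij->k", A, X)

def A_adj(A, v):
    """Adjoint map: sum_k v_k A_k."""
    return np.einsum("k,kij->ij", v, A)

def momentum_factored_gd(A, y, r, eta=0.2, mu=0.25, iters=300):
    """Gradient descent on U (with X = U U^T) plus a momentum extrapolation."""
    # Single top-r eigendecomposition of the back-projection A^*(y) for the
    # starting point (assumed initialization; the matrix is symmetric).
    w, V = np.linalg.eigh(A_adj(A, y))
    top = np.argsort(w)[::-1][:r]
    U = V[:, top] * np.sqrt(np.clip(w[top], 0, None))
    Z = U.copy()
    for _ in range(iters):
        grad = A_adj(A, A_op(A, Z @ Z.T) - y)   # gradient of f at X = Z Z^T
        U_next = Z - eta * grad @ Z             # descent step on the factor
        Z = U_next + mu * (U_next - U)          # momentum extrapolation
        U = U_next
    return U

# Illustrative noiseless instance: rank-1, unit-trace target.
rng = np.random.default_rng(0)
d, r, m = 8, 1, 1000
u = rng.standard_normal((d, 1))
u /= np.linalg.norm(u)
X_star = u @ u.T
A = make_sensing(m, d, rng)
y = A_op(A, X_star)

U_hat = momentum_factored_gd(A, y, r)
X_hat = U_hat @ U_hat.T
rel_err = np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
```

Note that the loop contains no SVD or eigendecomposition and only ever stores d x r factors; setting `mu = 0` recovers plain factored gradient descent, which makes the two variants easy to compare on the same instance.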
Compressed sensing QST using convex optimization. The original formulation of compressed sensing QST [
4] is based on convex optimization methods, solving the trace-norm minimization, to obtain an estimation of the low-rank state. It was later shown [
8] that essentially any convex optimization program can be used to robustly estimate the state. In general, there are two drawbacks to using convex optimization in QST. First, as the dimension of density matrices grows exponentially in the number of qubits, so does the search space in convex optimization. Second, the optimization requires a projection onto the PSD cone at every iteration, which becomes exponentially hard in the number of qubits.
We avoid these two drawbacks by working in the factorized space. Using this factorization results in a search space that is substantially smaller than its convex counterpart and, moreover, in a single use of top-r SVD during the entire execution of the algorithm. Bypassing these drawbacks, together with acceleration, allows us to estimate quantum states of larger qubit systems than state-of-the-art algorithms.
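The per-iteration PSD projection required by convex approaches, discussed above, can be sketched as follows. This is a generic textbook projection (eigendecomposition followed by clipping of negative eigenvalues), given here only to illustrate the cost being avoided; the function name `project_psd` is an illustrative choice.

```python
import numpy as np

def project_psd(H):
    """Frobenius-norm projection of a Hermitian matrix onto the PSD cone."""
    H = (H + H.conj().T) / 2              # symmetrize against round-off
    w, V = np.linalg.eigh(H)              # full eigendecomposition
    return (V * np.clip(w, 0, None)) @ V.conj().T

# Illustrative input: a random 8 x 8 Hermitian matrix (3 qubits).
rng = np.random.default_rng(0)
d = 8
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (M + M.conj().T) / 2
P = project_psd(H)
```

The eigendecomposition acts on the full d x d matrix with d = 2^n for n qubits, so its cost grows exponentially with n and is paid in every iteration, whereas the factorized iterates are PSD by construction.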
Full QST using non-convex optimization. The use of non-convex algorithms in QST was studied in the context of full tomography as well. By “full tomography” we refer to the situation where an informationally complete measurement is performed, so that the input data to the algorithm is of size
. The exponential scaling of the data size restricts the applicability of full tomography to relatively small system sizes. In this setting, non-convex algorithms that work in the factored space were studied [
85,
86,
87,
88,
89]. Except for the work in [
88], we are not aware of theoretical results on the convergence of the proposed algorithm due to the presence of spurious local minima. The authors of [
88] characterize the local vs. the global behavior of the objective function under the factorization
and discuss how existing methods fail due to improper stopping criteria or due to the lack of algorithmic convergence results. Their work highlights the lack of rigorous convergence results for non-convex algorithms used in full quantum state tomography. There is also no publicly available implementation of these methods.
Full QST using convex optimization. Despite the non-scalability of full QST, and the limitations of convex optimization, a lot of research has been devoted to this topic. Here, we mention only a few notable results that extend the applicability of full QST using specific techniques in convex optimization. Ref. [
52] shows that for given measurement schemes the solution for the maximum likelihood is given by a linear inversion scheme, followed by a projection onto the set of density matrices. More recently, the authors of [
18] used a combination of the techniques of [
52] with the sparsity of the Pauli matrices and the use of GPUs to perform a full QST of 14 qubits. While this pushes the limit of full QST using convex optimization, obtaining full tomographic
experimental data for more than a dozen qubits is significantly time-intensive. Moreover, this approach is highly centralized, in contrast to our approach, which can be distributed. Using the sparsity pattern of the Pauli matrices together with GPUs is an excellent candidate approach to further enhance the performance of non-convex compressed sensing QST.
QST using neural networks. Deep neural networks are ubiquitous, with many applications to science and industry. Recently, [
9,
10,
11,
27] show how machine learning and neural networks can be used to perform QST, driven by experimental data. The neural network architecture used is based on restricted Boltzmann machines (RBMs) [
90], which feature a visible and a hidden layer of stochastic binary neurons, fully connected with weighted edges. Test cases considered include reconstruction of the W state, magnetic observables of local Hamiltonians, and the unitary dynamics induced by Hamiltonian evolution. Comparison results are provided in the Main Results section. Alternative approaches include conditional generative adversarial networks (CGANs) [
91,
92]: in this case, two dueling neural networks, a generator and a discriminator, learn to generate and identify multi-modal models from data.
QST for Matrix Product States (MPS). This is the case of highly structured quantum states where the state is well-approximated by an MPS of low bond dimension [
12,
13]. The idea behind this approach is that, in order to overcome exponential bottlenecks in the general QST case, we require highly structured subsets of states, similar to the assumptions made in compressed sensing QST. MPS QST is considered an alternative approach to reduce the computational and storage complexity of QST.
Direct fidelity estimation. Rather than focusing on entrywise estimation of density matrices, the direct fidelity estimation procedure focuses on checking how close the state of the system is to a target state, where closeness is quantified by the fidelity metric. Classic techniques require up to
number of samples, where
denotes the accuracy of the fidelity term, when considering a general quantum state [
93,
94], but can be reduced to almost dimensionality-free
number of samples for specific cases, such as stabilizer states [
95,
96,
97]. Shadow tomography is considered an alternative to, and a generalization of, this technique [
98,
99]; however, as noted in [
94], the procedure in [
98,
99] requires exponentially long quantum circuits that act collectively on all the copies of the unknown state stored in a quantum memory, and thus has not been implemented fully on real quantum machines. A recent neural network-based implementation of such indirect QST learning methods is provided in [
100].
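As a small illustration of why fidelity with a pure target can be accessed through Pauli expectation values, without reconstructing the density matrix, the sketch below checks the identity Tr(rho sigma) = (1/d) sum_P Tr(rho P) Tr(sigma P) on a two-qubit example. It sums exactly over all Pauli operators, so it illustrates the identity only and not the sampling procedures of the cited works; the state, the noise level, and the function names are assumptions.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Kronecker product of single-qubit Paulis, e.g. 'XZ' -> X (x) Z."""
    op = np.array([[1.0 + 0j]])
    for ch in label:
        op = np.kron(op, PAULIS[ch])
    return op

def fidelity_from_paulis(rho, sigma, n_qubits):
    """(1/d) * sum over all Pauli strings of Tr(rho P) * Tr(sigma P)."""
    d = 2 ** n_qubits
    total = 0.0
    for p in itertools.product("IXYZ", repeat=n_qubits):
        P = pauli_string("".join(p))
        total += np.trace(P @ rho).real * np.trace(P @ sigma).real
    return total / d

# Pure target: Bell state. Prepared state: target mixed with white noise.
psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)
sigma = np.outer(psi, psi.conj())
rho = 0.9 * sigma + 0.1 * np.eye(4) / 4

f_pauli = fidelity_from_paulis(rho, sigma, 2)
f_direct = np.vdot(psi, rho @ psi).real   # <psi| rho |psi>
```

Both quantities agree because the Pauli strings form an orthogonal basis of the space of matrices; in a sampling-based scheme only a subset of the terms in the sum would be measured.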
The work in [
93,
94] goes beyond simple fidelity estimation, and utilizes random single-qubit rotations to learn a minimal sketch of the unknown quantum state, from which one can predict arbitrary linear functions of the state. Such methods constitute a favorable alternative to QST, as they do not require a number of samples that scales polynomially with the dimension; however, this in turn implies that these methods cannot, in general, be used to estimate the density matrix itself.