1. Prior Work and Problem Statement
System identification and system inversion are well-known problems, especially for classical systems. These problems are less challenging in the “non-blind”/“supervised” case [1], where the aim is, e.g., to identify the considered system by using the known input and the measured output. In contrast, in the “blind”/“unsupervised” case [2], the input values are unknown and uncontrolled, but some hypotheses are sometimes made on the input signal(s).
For quantum systems, non-blind system identification methods were first introduced in 1997 in [3], which coined the name quantum process tomography (QPT); see also [4]. They use copies of a set of known pure input states that are transformed by the process. Those transformed states are then measured and estimated using quantum state tomography (QST), which aims at estimating a quantum state from measurements. The parameters of the process are then estimated by what is essentially a regression. This method scales poorly when the number of qubits increases, and is only experimentally feasible for one or two qubits. This is to be expected because, in general, a quantum process has $d^4 - d^2$ independent real parameters ([4], p. 391), with $d$ the dimension of the Hilbert space (for an $n_{qb}$-qubit system, $d = 2^{n_{qb}}$). This method would later be called standard QPT (SQPT), in contrast to non-standard QPT, which uses ancilla qubits and weak measurements (see [5] for a survey). Ref. [6] introduces an SQPT approach that scales better with the number of qubits by assuming that the process is sparse. Like Baldwin et al. in most of [7], we choose to restrict ourselves to unitary processes. This class is of particular interest because the evolution of any closed quantum system is described by a unitary transformation. A unitary process has
independent real parameters.
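To make the scaling argument concrete, the following minimal sketch tabulates the number of real parameters as the number of qubits grows. It assumes the count $d^4 - d^2$ for a general process ([4], p. 391) and uses $d^2$, the number of real parameters of a $d \times d$ unitary matrix with the global phase included, for the unitary case; the function name is hypothetical.

```python
def process_parameter_counts(n_qubits):
    """Return (d, general-process count, unitary count) for n_qubits qubits."""
    d = 2 ** n_qubits
    general = d ** 4 - d ** 2  # general quantum process ([4], p. 391)
    unitary = d ** 2           # assumption: d x d unitary, global phase included
    return d, general, unitary


for n_qubits in range(1, 6):
    d, general, unitary = process_parameter_counts(n_qubits)
    print(f"{n_qubits} qubit(s): d = {d}, general = {general}, unitary = {unitary}")
```

Under these assumptions, the gap between the two counts widens quickly with the number of qubits, which is one motivation for restricting the problem to unitary processes.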
A significant problem of SQPT is the need to precisely prepare the copies of the input states: any systematic error on the input states strongly degrades the precision of the estimated process. In 2015, we introduced the blind version of QPT (BQPT) in [8], then detailed it in [9], and more recently in [10]. In those papers, we focused on the tomography of the two-qubit cylindrical-symmetry Heisenberg coupling process. For those algorithms, the operator has to prepare one or several copies of an unknown set of initial states. This requires a preparation procedure to be known and reproducible, so that several copies of each used state may be prepared. This does not violate the no-cloning theorem, which does not apply when we ourselves prepared the state that we want to reproduce. This idea removes the issue of systematic errors (with respect to a desired state) during the preparation. The system is identified by processing output measurements associated with
different unknown input states going through the system. Generally, we need to perform QST or at least to estimate some measurement outcome probabilities for each of the
output states. For the approaches of [
8,
9], this kind of QST requires
copies of each considered output state. Therefore, for each one of the
states the same experiment has to be repeated
times with the same input state value, for
input state preparations in total. The most recent paper [
10] also proposes “single-preparation BQPT methods” (SBQPT), i.e., methods which can operate with only one instance of each considered input state,
.
In [11] (2021), we introduced the setup that will be further developed in the current paper. In Ref. [11], we considered copies of a single 2-qubit state (initially unentangled) being transformed by a unitary process and measured at 5 different time delays (
). In the current paper, we consider a setup closer to standard QPT, where only two times are considered (see Figure 1). The unit-norm d-dimensional vectors
represent the initial quantum pure states. Those initial states are considered unknown. We simply assume that they are pure, unentangled, linearly independent, and that at least one of the states is not orthogonal to all the others (i.e.,
). These are reasonable hypotheses: as long as the qubits are prepared separately, the states are unentangled, and d random states are, with probability 1, linearly independent and mutually non-orthogonal in the
d-dimensional Hilbert space. After waiting
, each input state vector
is multiplied by the unitary
matrix
, thus yielding the output state
.
We assume that enough types of measurements are performed on copies of all
states to achieve QST on each state. The present paper does not focus on the measurements performed or on the QST algorithm. We simply assume that each state is recovered up to a global phase and a low residual error. For the numerical simulations, we will use the first QST algorithm of [12], which is suited to pure states and has the advantage of only requiring unentangled measurements on each qubit. However, the current paper is not bound to [12], and any pure-state QST algorithm [13,14] can be used instead. Because we perform measurements on the input states, our algorithm is not blind; however, since the values of these states are not imposed by the proposed method, we keep the main advantage of the blind approaches (resilience to systematic errors).
Section 2 briefly describes the system states and measurements. Section 3 describes a straightforward method that does not require an initialization and achieves QPT using the estimates of the states. Section 4 describes a method that improves the first estimate by maximizing the likelihood of the measurements. Finally, Section 5 contains some numerical results.
2. States and Measurements
2.1. Considered States
We hereafter consider an -qubit system, typically composed of distinguishable spins 1/2. Any pure state of that system is here expressed in the basis defined as the tensor product of the standard bases associated with each qubit. The components of in that basis can be stored in a d-element vector , with . The components of are complex and the norm of is 1. The global phase of has no physical meaning, so we can assume that the first non-zero component of is a real strictly positive number. In the rest of the paper, we consider the vector instead of the state .
2.2. Considered Types of Measurements
First focusing on a single qubit, we perform measurements based on the three Pauli operators $\sigma_x$, $\sigma_y$, and $\sigma_z$ [4], which are, e.g., related to the spin-1/2 components along the X, Y, and Z axes. For each such direction, we define the eigenvector matrix whose first and second columns are the eigenvectors of the considered Pauli operator associated, respectively, with the eigenvalues $+1$ and $-1$, expressed in the standard basis. These eigenvector matrices may be shown to read:
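With the column ordering stated above (first column for the eigenvalue $+1$, second column for the eigenvalue $-1$) and the usual phase conventions, these matrices can be written as follows. The labels $E_X$, $E_Y$, and $E_Z$ are names assumed here for illustration.

```latex
\begin{equation}
E_X = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
E_Y = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}, \qquad
E_Z =
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\tag{1}
\end{equation}
```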
The probabilities of the outcomes
and
when performing a measurement for state
along
are, respectively, the first and second elements of
where
is the element-wise squared modulus and
is the conjugate transpose.
When considering
qubits, we perform the above-defined measurements in parallel for all qubits. Each such type
of measurements corresponds to a given direction
for the
m-th qubit for each
m in
(
). For each set of eigenvectors
(each
is a column of one of the matrices
of (
1)) and eigenvalue
(
is either
if
is the first column of
and
if it is the second), respectively, associated with each qubit, the probability that a measurement on
yields these eigenvalues reads:
(where ⊗ is the tensor product). Those
d probabilities (from
to
) therefore form the vector
where
is the eigenvector matrix associated with the measurement along the directions
of
. It is expressed as the tensor (i.e., Kronecker) product of one-qubit matrices of (
1)
For example with
qubits, measuring the first one along
and the second one along
(
) yields the following eigenvector matrix
. Those measurements are not multi-qubit Pauli measurements (used in (8.149) in [
4]) because the latter only have 2 outcomes whereas the former have
d outcomes (they are the concatenations of
2-outcome measurements). In the rest of the paper, this type of measurement will be referred to as a string of
, and
Z (in this example,
).
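The construction described in this subsection can be sketched numerically as follows. This is a minimal illustration, assuming the standard eigenvector matrices of the Pauli operators with columns ordered as (+1, -1); the dictionary `E` and the function names are hypothetical, and the outcome ordering is the one induced by the Kronecker product (first qubit most significant).

```python
import numpy as np

# One-qubit eigenvector matrices; columns are the (+1, -1) eigenvectors.
E = {
    "X": np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),
    "Y": np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2),
    "Z": np.eye(2, dtype=complex),
}


def eigenvector_matrix(directions):
    """Kronecker product of the one-qubit matrices for a string such as 'ZX'."""
    matrix = np.eye(1, dtype=complex)
    for direction in directions:
        matrix = np.kron(matrix, E[direction])
    return matrix


def outcome_probabilities(state, directions):
    """Element-wise squared modulus of E^dagger applied to the state vector."""
    amplitudes = eigenvector_matrix(directions).conj().T @ state
    return np.abs(amplitudes) ** 2


# Example: the product state |0> (x) |+>, measured along Z on qubit 1 and X on qubit 2.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
state = np.kron(np.array([1, 0], dtype=complex), plus)
probs_zx = outcome_probabilities(state, "ZX")
probs_zz = outcome_probabilities(state, "ZZ")
```

In this example the string ZX gives a deterministic outcome, because each qubit is in an eigenstate of the operator measured on it, whereas ZZ splits the probability evenly between the two outcomes of the second qubit.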
For
qubits, there are
of those measurements. Since we are dealing with pure states, we can work with only 4 types of measurements:
(
is
X on every odd-numbered qubit and Y on every even-numbered qubit; all the others apply the same measurement type to all qubits). In Ref. [12], we explain how to perform QST with those measurements in Section 3 and Section 5. We will not mention it again in the rest of the paper, but if
we perform 3 types of measurements instead of 4 (along directions
X,
Y, and
Z), as
.
Those measurements are performed on and . In total, states are measured with 4 types of measurements. To estimate probabilities, each measurement is performed a given number of times that we call ; the total number of measurements performed is . For each one of the distinct measurements, the numbers of times each one of the d outcomes was observed are stored in the d-dimensional vector where is the index of the measured state, defines the type of the measurement and is 0 if is measured and 1 if it is . Thus, contains the measurement counts for the state along direction . The expected value of is .
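The counting model described above can be simulated by drawing the outcome counts from a multinomial distribution, whose expected value is the number of repetitions times the outcome probabilities. The sketch below is an illustration under that assumption; the function name, the probability vector, and the number of repetitions are hypothetical.

```python
import numpy as np


def simulate_counts(probabilities, n_repetitions, rng):
    """Draw the number of occurrences of each outcome over n_repetitions runs."""
    probabilities = np.asarray(probabilities, dtype=float)
    probabilities = probabilities / probabilities.sum()  # guard against rounding
    return rng.multinomial(n_repetitions, probabilities)


rng = np.random.default_rng(0)
probs = np.array([0.5, 0.25, 0.125, 0.125])  # hypothetical outcome probabilities
n_rep = 1000                                 # hypothetical number of repetitions
counts = simulate_counts(probs, n_rep, rng)
estimated_probs = counts / n_rep             # empirical frequencies
```

Dividing the counts by the number of repetitions gives the empirical frequencies that serve as probability estimates.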
4. Fine Tuning
4.1. Problem Statement
Section 3 describes a method to achieve QPT using the results of the QST on every state. The current section details a different approach that requires an initial estimate of
(we will use
from (
4)) and finds the unitary matrix
and initial states
that maximize the likelihood of the measurements. Formally:
, where
represents the measurement results and
is the log-likelihood which we maximize in order to maximize the likelihood. The problem is actually simpler if we perform the maximization successively, i.e., find the best
for each
of which we compute the likelihood,
, because optimizing
knowing
(i.e., computing
) can be performed independently on all the
:
, where
and
are the measurements performed on
and
, respectively. This is the case because the
are statistically independent and involve different arguments to be maximized for different
j. Considering this, the problem becomes:
In order to solve (6), we first need to be able to compute the likelihood of the measurements. Since most gradient-based optimization algorithms operate on a vector of real numbers, we also need to find real-valued parametrizations of
and
. Those two points are the focuses of the following two subsections.
4.2. Statistical Model for the Measurements
In [16], the formula for the likelihood of samples from multiple-outcome measurements is given (albeit for a mixed state represented by a density matrix, which we would have to replace by
or
). Once we remove additive constants, the log-likelihood boils down to:
, where
is the theoretical probability of the
m-th outcome, and
is the number of times the
m-th outcome has been measured. If the measurement whose likelihood we want to compute has
as its eigenvector matrix (
) and is performed on
, then
and
(see the definition of
and
in
Section 2.2,
stands for transpose). If, instead of
, we measure
, then
and
. Let us rewrite
using the notation adapted to our measurements:
(
ℓ is either 0 or 1 so
is either
or
). We can replace
in (
6) by its expression (knowing
), this yields:
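The per-measurement log-likelihood of this subsection, i.e., the sum over outcomes of the observed count times the logarithm of the theoretical probability, can be sketched as follows. The function name and the clipping constant `eps` (used only to avoid taking the logarithm of zero) are assumptions of this illustration.

```python
import numpy as np


def log_likelihood(counts, probabilities, eps=1e-12):
    """Sum over outcomes of count * log(probability), additive constants dropped."""
    counts = np.asarray(counts, dtype=float)
    probabilities = np.clip(np.asarray(probabilities, dtype=float), eps, None)
    return float(np.sum(counts * np.log(probabilities)))


# For fixed counts, the empirical frequencies maximize this criterion.
counts = np.array([60, 30, 10])
ll_empirical = log_likelihood(counts, counts / counts.sum())
ll_uniform = log_likelihood(counts, np.full(3, 1 / 3))
```

Because the measurements are statistically independent, the log-likelihood of a full data set would be the sum of such terms over all measured states and measurement types.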
4.3. Parametrization of the Arguments
For a given represents an unentangled state. By definition, it can be decomposed as a tensor product of 1-qubit states: . Each has 2 real parameters, and . Therefore, can be parameterized with real parameters: .
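One possible real parametrization of an unentangled state is sketched below. It assumes that each qubit is described by two Bloch-sphere angles, named `theta` and `phi` here, so that the first component of each one-qubit state is real and non-negative; the angle names and their ordering in the parameter vector are assumptions of this illustration.

```python
import numpy as np


def product_state(params):
    """Map 2 * n real parameters (theta_1, phi_1, ..., theta_n, phi_n) to a product state."""
    angle_pairs = np.asarray(params, dtype=float).reshape(-1, 2)
    state = np.ones(1, dtype=complex)
    for theta, phi in angle_pairs:
        qubit = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
        state = np.kron(state, qubit)  # tensor product with the next qubit
    return state


# Example: |+> (x) |0>, i.e., theta = pi/2 for the first qubit and 0 for the second.
example_state = product_state([np.pi / 2, 0.0, 0.0, 0.0])
```

The resulting vector has unit norm by construction, so no normalization constraint has to be handled by the optimizer.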
is a unitary matrix. Hence, it can be shown that there exists a Hermitian matrix , such that where exp is the matrix exponential. Therefore, can be parameterized with real parameters: , where is the parametrization of starting with the real parts of the components that are on or above the diagonal () where is the element on row and column of ) and ending with the imaginary parts of the components that are strictly above the diagonal (). Accounting for the fact that can only be recovered up to a global phase, we can assume that corresponding to the top left element of is 0 and remove it from the parametrization. Indeed, ( is the identity matrix) has a 0 for its top left element and and only differ by a global phase. Therefore, as far as the optimization algorithm is concerned, has real parameters: .
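The parametrization of the unitary matrix through a Hermitian generator can be sketched as follows. This illustration assumes the convention that the unitary is the matrix exponential of $i$ times the Hermitian matrix, stores the real parts on or above the diagonal first and the imaginary parts strictly above the diagonal last (in row-major order, which is an assumption), and fixes the top-left element of the generator to 0 to remove the global phase. The function names are hypothetical.

```python
import numpy as np


def hermitian_from_params(params, d):
    """Build a d x d Hermitian matrix from d*d - 1 reals, with H[0, 0] fixed to 0."""
    full = np.concatenate(([0.0], np.asarray(params, dtype=float)))
    n_real = d * (d + 1) // 2
    rows, cols = np.triu_indices(d)                   # on or above the diagonal
    rows_strict, cols_strict = np.triu_indices(d, k=1)  # strictly above the diagonal
    upper = np.zeros((d, d), dtype=complex)
    upper[rows, cols] = full[:n_real]
    upper[rows_strict, cols_strict] += 1j * full[n_real:]
    # Mirror the strictly upper part so that the result is Hermitian.
    return upper + np.triu(upper, k=1).conj().T


def unitary_from_params(params, d):
    """Compute exp(i H) through the eigendecomposition of the Hermitian matrix H."""
    hermitian = hermitian_from_params(params, d)
    eigenvalues, eigenvectors = np.linalg.eigh(hermitian)
    return (eigenvectors * np.exp(1j * eigenvalues)) @ eigenvectors.conj().T


rng = np.random.default_rng(0)
d = 4
theta_u = rng.normal(size=d * d - 1)
generator = hermitian_from_params(theta_u, d)
unitary = unitary_from_params(theta_u, d)
```

Any real vector of length $d^2 - 1$ maps to a valid unitary matrix in this way, which lets an unconstrained optimizer be used.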
4.4. Optimization
In order to find the real parameters of
that solve (
7) we use the BFGS quasi-Newton algorithm [
17] initialized at the
parameters that yield
up to a global phase. This algorithm is implemented with the Matlab function fminunc; we provide it with the analytical expressions of the gradients of the criterion in order to make it run faster. At each step of the optimization of
,
d optimizations are performed on
arguments in order to find the
(to solve the max inside the first sum in (
7)). Those optimizations are also performed using the BFGS quasi-Newton algorithm with the analytical gradient provided. The latter algorithm is initialized at the real parameters of the unentangled state that is the closest to
, where
is the inverse of the
at the current state of the optimization (the
whose likelihood we are computing in order to maximize it),
j is the index of the
we are optimizing and
and
are defined in
Section 3.1. The optimization algorithms stop when the norm of the difference between the arguments at two successive iterations is lower than
. Moreover, for the optimization of
it stops after 700 iterations if the previous criterion is not met. For 3 qubits or fewer, the optimization of
always stops before reaching 700 iterations. For 4 and 5 qubits, this is not always the case, but the BFGS algorithm decreases the criterion at every step, so even if the algorithm has not properly converged, the final estimate
is still more likely than all the previous iterates and, in particular, more likely than
.
5. Numerical Results
Our algorithm is tested by simulating a random matrix
which is a random complex matrix (composed of independent realizations of
with
and
independent standard normal variables) to which the Gram–Schmidt process has been applied in order to make it unitary. The states
are generated randomly by applying
(defined in
Section 4.3) to the
random parameters generated uniformly on the intervals on which they are defined.
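The generation of the random unitary matrix can be sketched as follows. This illustration replaces an explicit Gram–Schmidt loop with a QR decomposition, then rescales the columns so that the triangular factor has a real positive diagonal, which is the factorization that Gram–Schmidt orthonormalization of the columns would produce; the function name is hypothetical.

```python
import numpy as np


def random_unitary(d, rng):
    """Orthonormalize the columns of a d x d complex standard Gaussian matrix."""
    gaussian = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(gaussian)
    # Rescale each column by the phase of the matching diagonal entry of R,
    # so that the implied triangular factor has a real positive diagonal.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases


rng = np.random.default_rng(0)
m_true = random_unitary(4, rng)
```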
We then simulate the associated measurements and apply the algorithms of
Section 3 and
Section 4 in order to obtain estimates of
and
. With
, the computation time on one thread of an Intel Xeon Silver 4214 2.4-GHz processor is much shorter for
(around 30 s for 5 qubits and less than 10 s for fewer qubits) than for
(around 7 h for 5 qubits, 15 min for 4 qubits, and less than a minute for fewer qubits).
We choose to perform further tests with 4 qubits. 500 matrices are generated, and the associated and are computed with and for 2 qubits and and for 4 qubits. The associated numbers of copies of states to be measured are times greater, so and for 2 qubits and and for 4 qubits. We also compute with is the result of the likelihood maximization initialized at (only available in simulation) instead of .
The metric we use in order to quantify the proximity between and its estimate (either or ) is where is the angle that maximizes our metric (it accounts for the fact that can only be recovered up to a global phase) and is the Frobenius norm. This metric is between 0 (if and are the same up to a global phase) and 1 (if they are orthogonal with respect to the Hilbert–Schmidt inner product).
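A metric with the properties stated above can be sketched as follows. The exact normalization is an assumption of this illustration: it uses the Frobenius distance divided by $\sqrt{2d}$, with the global phase chosen so as to minimize that distance, which gives 0 for matrices equal up to a global phase and 1 for unitary matrices that are orthogonal with respect to the Hilbert–Schmidt inner product. The function name is hypothetical.

```python
import numpy as np


def phase_invariant_error(m_true, m_est):
    """Normalized Frobenius distance between two unitaries, up to a global phase."""
    d = m_true.shape[0]
    overlap = np.trace(m_true.conj().T @ m_est)  # Hilbert-Schmidt inner product
    phi = -np.angle(overlap)                     # phase that minimizes the distance
    distance = np.linalg.norm(m_true - np.exp(1j * phi) * m_est, ord="fro")
    return distance / np.sqrt(2 * d)


identity = np.eye(2, dtype=complex)
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)

error_same = phase_invariant_error(identity, np.exp(0.7j) * identity)
error_orthogonal = phase_invariant_error(pauli_z, pauli_x)
```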
The cumulative distribution function (CDF) of our metric (called error) is displayed in
Figure 2. We note that:
is very similar to its reference (especially with ). This means that the likelihood algorithm converges towards the global minimum (so 700 iterations is enough and is a good enough initial point).
is worse than . This means that the costly likelihood maximization is not made in vain.
The errors with
are roughly half the errors with
. So we are in the classic regime where the error is inversely proportional to the square root of the number of measurements. Additionally, the same graph for any
could be deduced from
Figure 2.
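The inverse-square-root scaling can be checked on a toy problem. The sketch below does not run the QPT pipeline: it only estimates a hypothetical outcome-probability vector from multinomial counts and compares the root-mean-square error obtained with a given number of measurements to the one obtained with four times as many. All numerical values and names are assumptions of this illustration.

```python
import numpy as np


def rms_frequency_error(probabilities, n_measurements, n_trials, rng):
    """Root-mean-square error of the empirical frequencies over many trials."""
    counts = rng.multinomial(n_measurements, probabilities, size=n_trials)
    errors = counts / n_measurements - probabilities
    return np.sqrt(np.mean(np.sum(errors ** 2, axis=1)))


rng = np.random.default_rng(1)
probs = np.array([0.4, 0.3, 0.2, 0.1])
error_n = rms_frequency_error(probs, 1000, 2000, rng)
error_4n = rms_frequency_error(probs, 4000, 2000, rng)
ratio = error_n / error_4n  # expected to be close to 2
```

With four times as many measurements the error is roughly halved, so under this scaling an error curve obtained for one number of measurements can be rescaled to another.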