Classical Modeling of a Lossy Gaussian Bosonic Sampler

Gaussian boson sampling (GBS) is considered a candidate problem for demonstrating quantum advantage. We propose an algorithm for the approximate classical simulation of a lossy GBS instance. The algorithm relies on a Taylor series expansion, and increasing the number of expansion terms used in the calculation yields greater accuracy. The complexity of the algorithm is polynomial in the number of modes, provided the number of terms is fixed. We describe the conditions on the input-state squeezing parameter and the loss level under which this algorithm is most efficient (by efficient, we mean that the Taylor series converges quickly). These conditions are satisfied in recent experiments that claim to have demonstrated quantum advantage; thus, this algorithm can be used to classically simulate those experiments.


Introduction
Quantum computers are computational devices that operate using phenomena described by quantum mechanics. They can therefore carry out operations that are not available to classical computers. The ability of a quantum computer to solve a specific task faster than any classical computer is usually referred to as quantum advantage. Although quantum algorithms that provide exponential speedup over classical ones are known, they are hard to implement in practice. One example is Shor's algorithm for factoring integers [1], which runs in polynomial time, whereas all known classical algorithms require superpolynomial time. Modern quantum computers are far from experimentally demonstrating quantum advantage on basic problems like integer factorization.
Boson sampling [2] is a problem that was proposed as a good candidate for demonstrating quantum advantage, because sampling from its output distribution is believed to be classically hard. A boson sampler is a linear-optical device consisting of non-classical sources of indistinguishable photons, a multichannel interferometer that mixes photons from the different sources, and photon detectors at the output channels of the interferometer. In the original proposal, the indistinguishable photons were prepared in Fock states. The problem is then to sample the photon statistics after the interferometer, given the input state and the interferometer matrix. The relevant parameters are the number of modes N and the total number of photons M injected into the interferometer. Experimentally, this corresponds to performing multiple measurements of the photon counts at the outputs of such a device [3].
Due to the technological complexity of generating Fock states, several variants of the original boson sampling problem have been proposed. They aim at improving the photon generation efficiency and increasing the scale of implementations. One such example is scattershot boson sampling, which uses many parametric down-conversion (PDC) sources to improve the single-photon generation rate. It has been implemented experimentally using a 13-mode integrated photonic chip and six PDC photon sources [4].
Another variant is Gaussian boson sampling [5,6], in which Gaussian states are injected into the interferometer instead of Fock states. Gaussian input states can be generated using PDC sources, which allows the non-classical input states to be prepared deterministically. In this variant, the relative phases of the input photons can affect the sampling distribution. Experiments were carried out with N = 12 [7], N = 100 [8] and N = 144 [9,10], with up to 255 photons registered in one event. The latter implementations used PPKTP crystals as PDC sources and employed an active phase-locking mechanism to ensure a coherent superposition.
Any experimental setup, of course, differs from the idealized model considered in theoretical analysis. Bosonic samplers suffer from two fundamental types of imperfections. First, the parameters of a real device, such as the reflection coefficients of the beam splitters and the phase rotations, are never known exactly. A small change in the interferometer parameters can affect the sampling statistics drastically, so that modeling an ideal device no longer makes much sense. The second type of imperfection is photon loss. Losses happen because of imperfections in photon preparation, absorption inside the interferometer, and imperfect detectors and coupling.
There are different ways of modeling losses: for example, by introducing extra beam splitters [11] or by replacing the interferometer matrix with a combination of lossless linear-optics transformations and a diagonal matrix containing the transmission coefficients [12]. In the algorithm described in this paper, we assume that losses occur at the inputs of the interferometer, and we describe below the exact way in which we model them.
Imperfections make middle-sized systems, in general, easier to emulate with classical computers [13]. It was shown [14] that as losses in a system increase, the complexity of the task decreases. When the number of photons M′ that arrive at the outputs is less than √M, the boson sampling problem can be solved efficiently on classical computers. On the other hand, if the losses are low, the problem remains hard for classical computers [15].
In this paper, we propose a classical algorithm for calculating the probabilities of output states in a GBS problem. The algorithm uses a Taylor series expansion, and its rate of convergence depends on the parameters of the problem: namely, the amount of loss in the system and the squeezing parameter of the input states. The higher the losses in the system, the fewer orders of the series are needed to approximate the probability of observing a given output state.
The work by Oh et al. [16] used the following approach to simulating GBS: the covariance matrix of the output Gaussian state was decomposed into "quantum" and "classical" parts, the "quantum" part being simulated using matrix product states and the "classical" part by random displacement. Thus, when the photon loss rate is high, the computational complexity of that algorithm is reduced.
The algorithm that we propose in this paper uses similar ideas: namely, the zeroth order of the Taylor series may be considered the "classical" part that is computed quite easily, while the remaining terms form the "quantum" part that is more computationally complex. The contribution of this "quantum" part is smaller when losses in the system are high; thus, our algorithm also has optimal conditions that depend on the magnitude of the losses. We also analyze some recent GBS implementations to compare the conditions in those experiments with the optimal conditions for our algorithm.

Problem Specification
Let us first consider a lossless linear-optics interferometer with a transmission matrix U, where the creation operators acting on the i-th input and output modes are denoted a†_i and d†_i, respectively. Suppose the input modes are injected with single-mode squeezed states, where we omit each state's normalizing constant. The goal is to calculate the probability of detecting n_1 photons in the first output mode, n_2 photons in the second output mode, and so on. This probability is p(⃗n) = Tr(ρ_out |⃗n⟩⟨⃗n|), where ρ_out is the density matrix of the output state.

Modeling Losses
In real-life bosonic samplers, there will always be losses. Here, we model them by the substitution a†_i → c·a†_i + s·b†_i, where b†_i acts on a mode that we cannot observe, and c² + s² = 1, c, s ∈ R. Now, the goal is to compute the same probability Tr(ρ_out |⃗n⟩⟨⃗n|) but taking losses into account. The input state changes accordingly, and we now take the partial trace over all loss modes when calculating the density matrix.

Algorithm Derivation
Let us consider a single mode:

Calculating Partial Trace
We start by applying the Hubbard-Stratonovich transformation [17,18], e^{A²/2} = (1/√(2π)) ∫ dξ e^{−ξ²/2 + ξA}, to both exponents in the density matrix operator. We can then calculate the partial trace over the loss modes, simplify the resulting expression, and rewrite the density matrix in the corresponding integral form.

Switching between Probability Density Functions
We can view this integral as taking an expected value over a two-dimensional normal distribution. The variables ξ and ξ̄ then become normally distributed random variables with a mean vector equal to zero and a covariance matrix Σ. We can then write the integral as an average E_{N(0,Σ)}[·], where E_{N(0,Σ)} denotes averaging over the two-dimensional normal distribution N(0, Σ).
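This change of view can be illustrated with a minimal Monte Carlo sketch (the covariance values and the averaged function below are illustrative placeholders, not the actual Σ or integrand):

```python
import math
import random

def sample_bivariate_normal(cov, rng):
    """Draw one sample from N(0, cov), cov a 2x2 covariance matrix,
    via its Cholesky factorization cov = L L^T."""
    a = math.sqrt(cov[0][0])
    b = cov[0][1] / a
    c = math.sqrt(cov[1][1] - b * b)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return a * z1, b * z1 + c * z2

def expectation(f, cov, n_samples, seed=0):
    """Monte Carlo estimate of E_{N(0,cov)}[f(x, y)]."""
    rng = random.Random(seed)
    return sum(f(*sample_bivariate_normal(cov, rng))
               for _ in range(n_samples)) / n_samples

# Sanity check: E[x*y] should approach the off-diagonal covariance entry.
cov = [[1.0, 0.6], [0.6, 1.0]]
est = expectation(lambda x, y: x * y, cov, 200_000)
```

Any integral of the form ∫ f(ξ, ξ̄) N(0, Σ)(ξ, ξ̄) dξ dξ̄ can be estimated this way, which is exactly how the ξ₀ integral is treated later in the paper.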

Choosing Γ
The idea consists of minimizing the "perturbation parameter" so that each subsequent order of the Taylor series expansion has less impact on the expression. Higher orders of the expansion contain higher powers of c² and higher moments E_{N(0,Λ)}[χⁿ χ̄ᵐ], and these moments can be expressed via the second moments ⟨χ²⟩ = ⟨χ̄²⟩ and ⟨χχ̄⟩ = h; hence, the role of the "perturbation parameter" is played by c²·max(⟨χ²⟩, |h|). Let us consider the conditions that must be satisfied by h. Firstly, h is bounded in terms of 1/α² and s⁴, because ⟨ξ₀²⟩ ≥ 0 and ⟨χ²⟩ ≥ 0. Secondly, since Γ is a covariance matrix, its eigenvalues must be non-negative. The eigenvalues of Γ are ⟨ξ₀²⟩, ⟨χ²⟩ − h and ⟨χ²⟩ + h; thus, h must satisfy |h| ≤ ⟨χ²⟩. The minimum of max(⟨χ²⟩, |h|) is realized when h = −⟨χ²⟩ = −(1/2)·1/(1/α + s²).

Multimode Case
Let us apply the steps described above to the case of N modes. We start with an input state, construct the density matrix, and take the partial trace over the loss modes. We apply the Hubbard-Stratonovich transformation 2N times, resulting in an integral over ∏_{i=1}^{N} dξ_i dξ̄_i, where the integral for each variable ξ_i and ξ̄_i is taken over (−∞, +∞). Again, we redefine the variables and compute the partial trace over the loss modes. The resulting expression can be considered as an expected value over a 2N-dimensional normal distribution in which all variable pairs ξ_i, ξ̄_i are independent. Every variable pair ξ_i, ξ̄_i has covariance matrix Σ, and for each pair we choose ξ_{0i}, χ_i, χ̄_i in the way described above. We then consider the Taylor series expansion (up to the second order) of the expression in the square brackets, which we denote μ. The creation operators â†_i that act on the input modes can be written in terms of the operators d̂†_i that act on the output modes. We expand the brackets in the expression for μ, keeping terms up to the second order. When we take the product of the two resulting expressions, most of the terms have zero expected value because of the properties of the normal distribution. The integrals over χ_i, χ̄_i reduce to specific moments of the distribution, which can be evaluated by Wick's probability theorem, and the integral over ξ_{0i} can be calculated using Monte Carlo methods.

Calculating Traces
In order to calculate Tr(ρ_out |⃗n⟩⟨⃗n|), we need to be able to calculate traces of the form Tr(ν(⃗x) |⃗n⟩⟨⃗n|) and Tr((d̂†₁)^{q₁}...(d̂†_N)^{q_N} ν(⃗x) (d̂₁)^{q₁}...(d̂_N)^{q_N} |⃗n⟩⟨⃗n|). The first one can be calculated fairly easily. For the second, we first note how the lowering operators act on the state |⃗n⟩, where by, e.g., |⃗n − ⃗p⟩ we mean ⊗_i |n_i − p_i⟩.

Algorithm Overview
The goal of the algorithm is to calculate the probability of a state |⃗n⟩, given ⃗n, α, c, s and U. We assume that the Taylor series expansion is calculated up to the desired order before computation starts. The integrals over χ_i and χ̄_i should also be computed in advance (this can be done analytically via Wick's probability theorem).
We start by calculating the two-variable covariance matrix Σ using α and s. We then select Γ in the way specified above, so that it minimizes the series expansion parameter ε. In order to compute the integrals over ξ_{0i}, we sample ξ_{0i} for each i from the normal distribution N(0, ⟨ξ₀²⟩). We then compute Tr(μ |⃗n⟩⟨⃗n|), which by linearity reduces to computing traces of the form described above; for each sample ⃗ξ₀, only a polynomial number of operations is needed. Finally, we take an average over the samples and multiply by the necessary normalization constant.
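The outer sampling loop can be sketched as follows; `trace_mu` is a stand-in for the trace computation described above, and the toy integrand at the bottom is chosen only so that the estimate can be checked against a closed form:

```python
import math
import random

def estimate_probability(trace_mu, var_xi0, n_modes, n_samples, seed=0):
    """Average trace_mu over samples of the vector xi_0, each component
    drawn independently from N(0, var_xi0); the normalization constant
    from the derivation is applied by the caller."""
    rng = random.Random(seed)
    sigma = math.sqrt(var_xi0)
    total = 0.0
    for _ in range(n_samples):
        xi0 = [rng.gauss(0.0, sigma) for _ in range(n_modes)]
        total += trace_mu(xi0)
    return total / n_samples

# Toy stand-in: for f(xi) = exp(-xi^2) with xi ~ N(0, v), the exact
# average is 1 / sqrt(1 + 2 v).
est = estimate_probability(lambda xi0: math.exp(-xi0[0] ** 2),
                           var_xi0=1.0, n_modes=1, n_samples=100_000)
```

The statistical error of such an estimator falls off as one over the square root of the number of samples, which is consistent with the sample counts reported in the Results section.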

Taylor Series Convergence for Actual Experimental Conditions
We have discussed above the fact that the role of the "perturbation parameter" in the series expansion is played by ε = c²·max(⟨χ²⟩, |h|), which with our choice of Γ equals ε = c²/(2(1/α + s²)). This parameter depends on the experimental conditions, i.e., the squeezing parameter of the input state α and the loss level s². The smaller this parameter is, the faster the series converges. Thus, the best conditions for this algorithm are achieved when the loss level s² is high and the squeezing parameter α is low. Let us consider actual experimental implementations of the Gaussian boson sampling problem and estimate how small this parameter is under those conditions.
Let us consider the relation between α and the average number of photons per state, ⟨n⟩. If the squeezing parameter is ζ = re^{iφ}, then α = tanh r, while ⟨n⟩ = sinh² r.
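In code (the function name is ours), this relation reads:

```python
import math

def alpha_from_mean_photons(n_mean):
    """alpha = tanh(r) for a single-mode squeezed vacuum with
    mean photon number <n> = sinh^2(r)."""
    r = math.asinh(math.sqrt(n_mean))
    return math.tanh(r)

# Since tanh(asinh(x)) = x / sqrt(1 + x^2), this is equivalent to the
# closed form alpha = sqrt(<n> / (1 + <n>)).
```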
In a paper by Zhong et al. [8], 25 PPKTP crystals were used to produce 25 two-mode squeezed states, which is equivalent to 50 single-mode squeezed states. The average number of photons registered by the detectors was 43. Thus, the average number of photons per mode ⟨n⟩ is around 43/50; r = arcsinh(√⟨n⟩) ≈ 0.855 and α = tanh r ≈ 0.694. The average collection efficiency is said to be c² = 0.628. Then, ε = c²/(2(1/α + s²)) ≈ 0.18. In another paper by Zhong et al. [9], the average number of photons produced was increased to 70 at maximum pump intensity. This corresponds to α ≈ 0.76. The overall transmission rate in the experiment is said to be 48% and 54% for different settings, so we take s² ≈ 0.5. This yields ε ≈ 0.14.
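The estimate of the perturbation parameter from the reported experimental figures can be reproduced as follows (the helper name is ours; the inputs are α, the loss level s² and the collection efficiency c²):

```python
def perturbation_parameter(alpha, s2, c2):
    """eps = c^2 / (2 * (1/alpha + s^2))."""
    return c2 / (2.0 * (1.0 / alpha + s2))

# Conditions of Zhong et al. [8]: alpha ~ 0.694, c^2 = 0.628, s^2 = 1 - c^2.
eps = perturbation_parameter(0.694, 1.0 - 0.628, 0.628)
```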
In the most recent experiment by Deng et al. [10], the average number of photons was increased even further: states with ≈50, ≈75 and ≈100 photons on average were measured at different pump intensities, while still producing 25 two-mode squeezed states. The efficiency of the setup is said to be 43%, yielding ε ≈ 0.11, ε ≈ 0.12 and ε ≈ 0.12, respectively.
To estimate the expected accuracy of the algorithm, we can assume that the numerical values of each order are approximately equal, i.e., P_0 ≈ P_2 ≈ P_4 ≈ ... ≈ P_k, where P_k denotes the sum of all terms of the k-th order. The expansion then becomes a geometric progression with common ratio ε. On average, the 0-th order contributes to the probability a part equal to 1 − ε, the second order contributes ε(1 − ε), the fourth contributes ε²(1 − ε), etc. Calculating up to the second order then discards a total contribution of ε², which is approximately 0.18² = 3.2%, 0.14² = 1.96%, 0.11² = 1.21% and 0.12² = 1.44% for the conditions analyzed above. When the calculation is performed up to the fourth order, the lost contribution is ε³: approximately 0.18³ ≈ 0.58%, 0.14³ ≈ 0.27%, 0.11³ ≈ 0.13% and 0.12³ ≈ 0.17%.
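Under the geometric-progression assumption, the discarded contribution for a given truncation order is simply a power of ε (the function name is ours):

```python
def discarded_contribution(eps, max_order):
    """Fraction of the probability lost when the series is truncated
    after the term of order max_order (even orders 0, 2, 4, ...),
    assuming P_{2k} forms a geometric progression with ratio eps."""
    k = max_order // 2
    return eps ** (k + 1)

# Truncating after the second order discards eps^2; after the fourth, eps^3.
```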
The conclusion that we draw is that even in large GBS experiments which are said to demonstrate quantum advantage, the conditions are such that ε is fairly small, and calculating up to the fourth order is enough for the lost contribution to be below 1%.

Implementation Details

Contraction Precomputation
Let us consider the second-order term in the expansion. It can be rewritten using the quantity T_jk = ∑_i U_ij U_ik, which is a contraction of U with itself. T depends only on U and can be calculated before sampling ⃗ξ₀, which reduces the number of operations required to calculate each probability sample from a ⃗ξ₀ sample.
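A sketch of this precomputation (written for a real-valued U; for a complex U some factors would be conjugated, as discussed in the precomputation complexity analysis below):

```python
def contract(U):
    """Precompute T[j][k] = sum_i U[i][j] * U[i][k], the contraction of
    the N x N interferometer matrix U with itself. It depends only on U,
    so it is computed once, before any xi_0 samples are drawn."""
    n = len(U)
    return [[sum(U[i][j] * U[i][k] for i in range(n)) for k in range(n)]
            for j in range(n)]
```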

Factorial Fractions Precomputation
In calculating traces of the form described above, we need factorial fractions of the form F_m^p ≡ m!/(m − p)!, where 0 ≤ p ≤ m. Since the target state ⃗n is fixed, m ≤ max(n_i), so all required values can be precomputed.
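These values can be tabulated once per target state, for example:

```python
def factorial_fraction_table(m_max):
    """Precompute F[m][p] = m! / (m - p)! for all 0 <= p <= m <= m_max,
    using the recurrence F[m][p] = F[m][p-1] * (m - p + 1) so that no
    full factorial is ever formed."""
    table = []
    for m in range(m_max + 1):
        row = [1]  # F[m][0] = 1
        for p in range(1, m + 1):
            row.append(row[-1] * (m - p + 1))
        table.append(row)
    return table
```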

Reusing ∑_i x_i U_ij
While calculating each trace, we can compute ∑_i x_i U_ij only once for each ⃗ξ₀ sample and then reuse it, thus using fewer operations per trace.
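In code, the reusable quantity is just a vector computed once per sample (the names are ours):

```python
def precompute_xU(x, U):
    """Given a sample x = xi_0 and the N x N matrix U, compute
    y[j] = sum_i x[i] * U[i][j] once. Every trace evaluated for this
    sample then reuses y instead of redoing the inner sum."""
    n = len(U)
    return [sum(x[i] * U[i][j] for i in range(n)) for j in range(n)]
```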

Precomputation
In this section, we analyze the computational complexity of precomputation. By precomputation we mean the calculations that need to be carried out only once, before ⃗ξ₀ sampling and before calculating probability samples for each ⃗ξ₀. The multiplicative constant can be calculated with O(N) multiplication operations. For each term in the resulting sum, we define its order to be the number of variables χ and χ̄ or, equivalently, the power of the loss parameter c. Each term of order K contains a contraction of factors of the form U_ji, where some of the factors can be conjugated; this leaves at most K + 1 distinct ways to conjugate the factors. Each contraction has K free indices, and calculating the sum requires N^K additions and N^K(K − 1) multiplications. The total number of additions is therefore of order N^{2K}, and the number of multiplications is of order N^{2K}(K − 1), where K is the maximum order we choose to calculate.

Probability Sample Computation
Here, we will analyze the computational complexity of calculating a single probability sample given ⃗ ξ 0 .We will assume that the terms are calculated up to some order K.
Calculating the first trace requires one multiplication of an N × N matrix by an N-dimensional vector, N exponentiation operations and 2N multiplication operations. This calculation needs to be performed only once for each ⃗x. Calculating any other trace of the form described above requires 2N exponentiation operations and 4N multiplication operations (since the factorial fractions are precomputed).
The number of terms of a given order K is N^K times the number of distinct non-zero K-th order moments of χ and χ̄. The exact number is hard to calculate, but the total number of moments (including those that are zero) is (K + 1)N^K. Thus, at most (K + 1)N^{2K} terms need to be computed.
Since the number of operations required to calculate each term is O(N), the total computational complexity of calculating a probability sample for a given ⃗ξ₀ is O(K · N^{2K}).

Results
Below are the results of probability calculations for N = 5 for different output states. The calculated probabilities are compared with exact solutions. The parameters are α = 0.9, c = s = √2/2, and the number of samples is 4096. These results show that for calculating a single output-state probability accurately, the number of samples needs to be on the order of 10⁴.

Next, we use fewer samples per state, but instead of comparing individual probabilities, we look at the cosine similarity between the exact and approximated probability distributions over all two-photon states (Figures 1-3). The graph suggests that the number of samples per state needed to approximate the distribution does not depend much on N. It is computationally hard to check this against the exact solution, but if we assume that the cosine similarity converges to a value close to 1, we can estimate how quickly it converges. To do so, we look at the cosine similarity between a distribution calculated using H samples per state and a distribution calculated with H + ∆H samples per state for different H, where we choose ∆H = 10. This allows us to estimate how much the distribution changes with ∆H new samples: if the cosine similarity is close to 1, then new samples do not alter the distribution significantly. Figure 4 suggests more strongly that the number of samples per state required for accurate approximation is not really influenced by N. This can be explained by the fact that the number of two-photon states increases with N: if the number of samples per state is constant, then the total number of samples increases with N.

Finally, benchmark results show the average precomputation time, which depends only on N, and the time per sample, which depends on N and the number of photons M in the target state (Figures 5 and 6). These results show that even an N = 40 mode GBS instance can be simulated on an average laptop using this algorithm.
A direct comparison of the performance of our method with other published results is problematic, since our algorithm calculates the probabilities of output states rather than directly sampling states from some approximate distribution. The closest algorithm is described in [16]. However, a performance comparison is still problematic because the error bars of the two methods cannot be compared directly. Nevertheless, judging by Figure 9 of [16], our algorithm is more memory-efficient: it uses about 0.1 GB of memory to run under conditions where the transmission rate is 0.5 and the number of modes is 40, whereas the algorithm of [16] uses 1 TB, with a memory requirement that grows exponentially with the number of modes.

Figure 1 .
Figure 1. Probability calculation for 5 modes for different 2-photon output states.

Figure 2 .
Figure 2. Graph of the average probability and the standard deviation calculated up to different orders for different numbers of samples. The state for this graph is a 2-photon state.

Figure 3 .
Figure 3. Convergence of the cosine similarity between the estimated probability distribution over the set of all 2-photon states and the ground truth, for different N.

Figure 4 .
Figure 4. Cosine similarity between the probability distribution over the set of all 2-photon states after H samples and after H + 10 samples, for different N.

Figure 5 .
Figure 5. Precomputation time (ms) on an Intel i5 CPU versus the number of modes.

Figure 6 .
Figure 6. Average time per sample on an Intel i5 CPU versus the number of modes, for states with different photon numbers.