Classical Modelling of a Bosonic Sampler with Photon Collisions

The original formulation of the boson sampling problem assumed that few or no photon collisions occur. However, modern experimental realizations rely on setups where collisions are quite common, i.e., the number of photons M injected into the circuit is close to the number of detectors N. Here we present a classical algorithm that simulates a bosonic sampler: it calculates the probability of a given photon distribution at the interferometer outputs for a given distribution at the inputs. This algorithm is most effective in cases with multiple photon collisions, where it outperforms known algorithms.


Introduction
Quantum computers are computational devices that operate using phenomena described by quantum mechanics and can therefore carry out operations that are not available to classical computers. Practical tasks are known that quantum computers are expected to solve exponentially faster than classical ones. For example, for the problem of integer factorization, which underlies the widely used RSA cryptosystem, no polynomial-time classical algorithm is known, whereas Shor's quantum algorithm [1] solves it in a polynomial number of operations. Due to the technological challenges of manufacturing quantum computers, quantum supremacy (the ability of a quantum computational device to solve problems that are intractable for classical computers) remains an open question in practice.
Boson sampling [2] is a good candidate for demonstrating quantum supremacy. Basically, boson samplers are linear-optical devices containing a number of non-classical sources of indistinguishable photons, a multichannel interferometer mixing the photons of different sources, and photon detectors at the output channels of the interferometer. The more specific set-up addressed in this paper deals with single-photon sources. In this case, exactly M single photons are injected into some or all of the N inputs of the interferometer. By performing multiple measurements of the photon counts at the outputs, one experimentally characterizes the many-body quantum statistics after the interferometer [3], given an input state and the interferometer matrix.
Boson samplers are not universal quantum computers, that is, they cannot perform arbitrary unitary rotations in a high-dimensional Hilbert space of a quantum system. Nevertheless, simulating a boson sampler with a classical computer requires a number of operations exponential in M. It was shown [4] that the classical complexity of boson sampling matches the complexity of computing the permanent of a complex matrix. This means that the boson sampling problem is #P-hard [5], and there are no known classical algorithms that solve it in polynomial time. The best known exact algorithm for computing the permanent of an n × n matrix is the Ryser formula [6], which requires O(n·2^n) operations. The Clifford-Clifford algorithm [7] is known to solve the boson sampling problem in O(M·2^M + N·M^2) operations. This makes large enough boson samplers practically intractable for classical computational devices. Although boson sampling does not allow for arbitrary quantum computations, there are still practical problems that can be solved with it: for example, molecular docking [8], calculating the vibronic spectrum of a molecule [9,10], as well as certain problems in graph theory [11]. Boson sampling is also useful for statistical modeling [12] and machine learning [13,14].
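For reference, the Ryser expansion mentioned above can be sketched in a few lines of Python (an illustrative, unoptimized version; the function name is ours):

```python
import itertools
import numpy as np

def permanent_ryser(A):
    """Permanent of an n x n matrix via the Ryser inclusion-exclusion formula.

    This direct version costs O(n^2 * 2^n); visiting the subsets in Gray-code
    order (not done here) removes one factor of n and gives the O(n * 2^n)
    quoted in the text.
    """
    n = A.shape[0]
    total = 0j
    for subset in itertools.product((0, 1), repeat=n):  # all 2^n column subsets S
        cols = [j for j in range(n) if subset[j]]
        if not cols:
            continue  # empty subset contributes nothing
        # (-1)^(n - |S|) * prod_i sum_{j in S} A[i, j]
        total += (-1) ** (n - len(cols)) * np.prod(A[:, cols].sum(axis=1))
    return total
```

For example, the permanent of [[1, 2], [3, 4]] is 1·4 + 2·3 = 10, and the permanent of the all-ones 3 × 3 matrix is 3! = 6.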
There are several variants of boson sampling that aim at improving the photon generation efficiency and increasing the scale of implementations. For example, scattershot boson sampling uses many parametric down-conversion (PDC) sources to improve the single-photon generation rate. It has been implemented experimentally using a 13-mode integrated photonic chip and six PDC photon sources [15]. Another variant is Gaussian boson sampling [2,16], which uses Gaussian input states instead of single photons. The Gaussian input states are generated with PDC sources, which allows a deterministic preparation of the non-classical input light. In this variant, the relative input photon phases can affect the sampling distribution. Experiments were carried out with N = 12 [17] and N = 100 [18]. The latter implementation uses PPKTP crystals as PDC sources and employs an active phase-locking mechanism to ensure a coherent superposition.
Any experimental set-up, of course, differs from the idealized model considered in theoretical modeling. Bosonic samplers suffer from two fundamental types of imperfections. First, the parameters of a real device, such as the reflection coefficients of the beam splitters and the phase rotations, are never known exactly. Varying the interferometer parameters too much can change the sampling statistics drastically, so that modeling of an ideal device no longer makes much sense. Assume now that we know the parameters of the experimental set-up with great accuracy. Then what makes the device non-ideal is primarily photon losses, that is, not all photons emitted at the inputs are detected in the output channels. These losses happen because of imperfections in photon preparation, absorption inside the interferometer, and imperfect detectors. There are different ways of modeling losses, for example by introducing extra beam splitters [19] or by replacing the interferometer matrix with a combination of lossless linear-optics transformations and a diagonal matrix of transmission coefficients less than one [20].
Imperfections make middle-sized systems, in general, easier to emulate with classical computers [21]. It was shown [22] that the complexity of the task decreases as the losses in a system increase. When the number of photons that arrive at the outputs is less than √M, the boson sampling problem can be solved efficiently on classical computers. On the other hand, if the losses are low, the problem remains hard for classical computers [23].
Photon collisions are a phenomenon present in nearly any experimental realization but disregarded in the proposal by Aaronson and Arkhipov [4]. Originally it was proposed that the number of interferometer channels be roughly the square of the number of photons in the set-up, N ≥ M^2. In this situation, all or most of the photons arrive at separate channels, so that few or no photon collisions occur. In experimental realizations [18], however, N ≈ M.
Generally, a large number of photon collisions makes the system easier to emulate. For example, one can consider the extreme case in which all photons arrive at a single output channel: the probability of such an outcome can be estimated in polynomial time. The effect of photon collisions on the computational complexity of boson sampling has been studied previously [24]. A measure called the Fock state concurrence sum was introduced, and it was shown that the minimal algorithm runtime depends on this measure. There is also an algorithm for Gaussian boson sampling that takes advantage of photon collisions [25].
In this paper, we present an algorithm aimed at simulating bosonic samplers with photon collisions. In the N ≈ M regime, our scheme outperforms the Clifford-Clifford method. For example, consider an output state of a sampler with M = N where one half of the outputs are empty and the other half is populated with 2 photons in each channel. Computing the probability of such an outcome requires O(N^2 · 3^(N/2)) operations. The speedup for states that have more collisions is even greater.

Problem Specification
Consider a linear-optics interferometer with N inputs and N outputs, described by a given unitary N × N matrix U:

a†_i = ∑_{j=1}^{N} U_{ji} b†_j, (1)

where a†_i and b†_i are the creation operators acting on inputs and outputs, respectively. We will denote the input state as

|k⟩ = |k_1, k_2, ..., k_N⟩ = ∏_{i=1}^{N} (a†_i)^{k_i} / √(k_i!) |0⟩, (2)

where k_i is the number of photons in the i-th input. The output state will be denoted as |l⟩ = |l_1, l_2, ..., l_N⟩. It follows from (1) and (2) that a specific input state corresponds to a set of output states that are observed with different probabilities: the input state can be written as

|k⟩ = ∏_{i=1}^{N} ( ∑_{j=1}^{N} U_{ji} b†_j )^{k_i} / √(k_i!) |0⟩. (3)

After expanding, this expression becomes a sum of terms of the form

α(l_1, l_2, ..., l_N) (b†_1)^{l_1} (b†_2)^{l_2} ... (b†_N)^{l_N} |0⟩,

where α(l_1, l_2, ..., l_N) is a complex number composed of the elements of U that correspond to the given output state. Therefore, the probability of observing an output state |l_1, l_2, ..., l_N⟩ will be

P(l_1, l_2, ..., l_N) = |α(l_1, l_2, ..., l_N)|^2 · l_1! l_2! ... l_N!.

The problem is to determine the probabilities of all of the output states. The main difficulty lies in calculating the number α(l_1, l_2, ..., l_N) for the given input and output states. This paper presents an algorithm that solves this problem using the properties of the Fourier transform.
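To make these objects concrete, here is a brute-force sketch that expands the product of linear forms over monomials in the b†_j and collects the coefficients α(l). All names are ours, and we assume the convention a†_i → ∑_j U_{ji} b†_j together with the normalization P(l) = |α(l)|^2 ∏_j l_j!:

```python
import math
import numpy as np

def output_amplitudes(U, k):
    """All coefficients alpha(l) of prod_i (sum_j U[j, i] b_j^dag)^(k_i) / sqrt(k_i!).

    Returns {l: alpha(l)}; the probability of |l> is |alpha(l)|^2 * prod_j l_j!,
    since prod_j (b_j^dag)^(l_j) |0> has norm sqrt(prod_j l_j!).
    """
    N = len(k)
    poly = {(0,) * N: 1.0 + 0j}            # monomial in the b^dag -> coefficient
    for i in range(N):
        for _ in range(k[i]):              # multiply in the i-th linear form k_i times
            new = {}
            for mono, c in poly.items():
                for j in range(N):
                    key = mono[:j] + (mono[j] + 1,) + mono[j + 1:]
                    new[key] = new.get(key, 0) + c * U[j, i]
            poly = new
    norm = math.sqrt(math.prod(math.factorial(x) for x in k))
    return {l: c / norm for l, c in poly.items()}

def output_probability(U, k, l):
    alpha = output_amplitudes(U, k).get(tuple(l), 0)
    return abs(alpha) ** 2 * math.prod(math.factorial(x) for x in l)
```

For a 50:50 beam splitter and the input |1, 1⟩ this reproduces the Hong-Ou-Mandel effect: the probability of |1, 1⟩ vanishes and each bunched state |2, 0⟩, |0, 2⟩ has probability 1/2.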

Algorithm Description
Let us define a function

g(t; {Q_i}) = ∏_{i=1}^{N} ( ∑_{j=1}^{N} U_{ji} e^(i2πQ_j t) )^{k_i} / √(k_i!), (4)

where {Q_i} is some fixed set of N natural numbers. The choice of {Q_i} will be discussed in detail later. This function represents the expression (3), with the creation operators b†_j replaced by exponents e^(i2πQ_j t) that oscillate with frequencies Q_j. After expanding the expression (4), we get

g(t; {Q_i}) = ∑ α(l_1, ..., l_N) e^(i2π(l_1 Q_1 + l_2 Q_2 + ... + l_N Q_N)t),

where the sum is computed over all sets {l_1, ..., l_N} such that ∑_{i=1}^{N} l_i = M. Therefore, for each possible output state |l⟩ = |l_1, l_2, ..., l_N⟩ there is a harmonic in g(t; {Q_i}) with a frequency

f(|l⟩; {Q_i}) = ∑_{i=1}^{N} l_i Q_i

and an amplitude of α(l_1, l_2, ..., l_N). The set of numbers {Q_i} can be chosen in such a way that the harmonics do not overlap, i.e., there are no two output states |l⟩ and |l'⟩ with equal frequencies f(|l⟩; {Q_i}) = f(|l'⟩; {Q_i}). If no harmonics overlap, then any of the numbers α(l_1, ..., l_N) can be found from the Fourier transform of the function g(t; {Q_i}). On the other hand, to calculate the probability of one specific state |l⟩ it is sufficient to choose {Q_i} in such a way that the frequency f(|l⟩; {Q_i}) is unique in the spectrum, i.e., the frequency of any other state |l'⟩ differs from the frequency of the state in consideration: f(|l'⟩; {Q_i}) ≠ f(|l⟩; {Q_i}) for all |l'⟩ ≠ |l⟩. An example of g(t; {Q_i}) with non-overlapping harmonics and its spectrum can be seen in Figure 1 (the choice of {Q_i} used here is described in Section 3.1).

Figure 1. An example of g(t; {Q_i}) (top) and its spectrum, obtained as the squared modulus of the Fourier transform of g(t) (bottom). A system with N = M = 3 is used; the input state is |k⟩ = |1, 1, 1⟩. Each peak in the spectrum corresponds to one of the ten possible output states.

The First Method of Choosing {Q i }
Let us consider methods of choosing {Q_i} that satisfy the necessary conditions on the spectrum. The first one is the following: let M be the total number of photons at the inputs, i.e., M = ∑_{i=1}^{N} k_i for the input state |k_1, k_2, ..., k_N⟩. We choose

Q_i = (M + 1)^(i−1).

Then for any output state |l_1, l_2, ..., l_N⟩ the sum ∑_{i=1}^{N} l_i Q_i will be a number whose representation in a positional numeral system with radix M + 1 is l_N l_{N−1} ... l_1 (since l_i < M + 1 ∀i). From the uniqueness of the representation of numbers in positional numeral systems, it follows that every sum ∑_{i=1}^{N} l_i Q_i (a number in a positional numeral system with radix M + 1) corresponds to exactly one set of numbers l_1, l_2, ..., l_N (its representation in this numeral system, the l_i being its digits).
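The whole pipeline with this choice of Q_i = (M + 1)^(i−1) can be illustrated compactly (our own sketch, not the authors' implementation; we assume the convention a†_i → ∑_j U_{ji} b†_j). Since g contains only non-negative integer frequencies, K = f_max + 1 complex samples already separate all harmonics; the conservative choice f_s = 2 f_max also works:

```python
import itertools
import math
import numpy as np

def all_output_states(N, M):
    """All tuples (l_1..l_N) of non-negative integers with sum M (stars and bars)."""
    for bars in itertools.combinations(range(M + N - 1), N - 1):
        state, prev = [], -1
        for b in bars:
            state.append(b - prev - 1)
            prev = b
        state.append(M + N - 2 - prev)
        yield tuple(state)

def probabilities_first_method(U, k):
    """Probabilities of all output states via one FFT, with Q_i = (M+1)^(i-1)."""
    N, M = len(k), sum(k)
    Q = [(M + 1) ** i for i in range(N)]        # Q_i = (M+1)^(i-1)
    K = M * (M + 1) ** (N - 1) + 1              # one more than the highest frequency
    n = np.arange(K)
    g = np.ones(K, dtype=complex)               # g sampled at t_n = n / K
    for i in range(N):
        if k[i]:
            col = sum(U[j, i] * np.exp(2j * np.pi * Q[j] * n / K) for j in range(N))
            g *= col ** k[i] / math.sqrt(math.factorial(k[i]))
    spec = np.fft.fft(g) / K                    # spec[f(l)] = alpha(l)
    probs = {}
    for l in all_output_states(N, M):
        f = sum(li * qi for li, qi in zip(l, Q))
        probs[l] = abs(spec[f]) ** 2 * math.prod(math.factorial(x) for x in l)
    return probs
```

For a 50:50 beam splitter with input |1, 1⟩ this again yields the Hong-Ou-Mandel statistics, and for any unitary U the probabilities sum to one.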
Using this method of choosing {Q_i} guarantees that the probability of any output state can be calculated from the spectrum of g(t; {Q_i}).

The Second Method of Choosing {Q i }
Another method of choosing {Q_i} is useful when the goal is to compute the probability of one specific output state |l⟩ for a given input state |k⟩. This method does not guarantee that the frequencies will be different for every pair of output states, but it guarantees that the frequency of the state in consideration (the target frequency) f(|l⟩; {Q_i}) will be unique in the spectrum. Note that in this case Q_i = Q_i(|l⟩), i.e., the choice of {Q_i} depends on the output state. This method of choosing {Q_i} can be described in the following way: Q_i = 0 if the i-th output of |l⟩ is empty; otherwise Q_i is the product of (l_j + 1) over all non-empty outputs j < i (so the first non-empty output gets Q = 1). Therefore, if all of the outputs in the state |l⟩ contain photons, then Q_1 = 1, Q_2 = (l_1 + 1), Q_3 = (l_2 + 1)(l_1 + 1), and so on: Q_{i+1} is (l_i + 1) times greater than Q_i.
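A small sketch of this rule (names are ours), together with a brute-force check that the target frequency of an example state is unique among all states with the same photon number:

```python
def q_second_method(l):
    """{Q_i} for the target state l: empty outputs get Q_i = 0; the first
    non-empty output gets 1, and each following non-empty output gets the
    previous Q multiplied by (l_i + 1)."""
    Q, q = [], 1
    for li in l:
        Q.append(q if li > 0 else 0)
        if li > 0:
            q *= li + 1
    return Q

def frequency(l, Q):
    return sum(li * qi for li, qi in zip(l, Q))

def all_output_states(N, M):
    # all states with N outputs and M photons, by recursion on the outputs
    if N == 1:
        yield (M,)
        return
    for first in range(M + 1):
        for rest in all_output_states(N - 1, M - first):
            yield (first,) + rest

# example: N = M = 3, target state |2, 0, 1>
target = (2, 0, 1)
Q = q_second_method(target)          # -> [1, 0, 3]
f_target = frequency(target, Q)      # 2*1 + 0*0 + 1*3 = 5
clashes = [s for s in all_output_states(3, 3)
           if s != target and frequency(s, Q) == f_target]
```

Here `clashes` comes out empty, which is exactly the uniqueness property that Theorem 1 establishes in general.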
Let us show that this method indeed makes the target frequency unique in the spectrum. Let {Q_i} = {Q_i(|l⟩)} be the frequencies calculated using the method described above. We need to prove that for any output state |l'⟩ ≠ |l⟩ it is true that

f(|l'⟩; {Q_i}) ≠ f(|l⟩; {Q_i}).

Firstly, let us suppose that some of the outputs in the state |l_1, l_2, ..., l_N⟩ contain 0 photons. Let h_1, ..., h_K be the indices of the outputs that contain at least one photon: then

∑_{i=1}^{N} l'_i Q_i = ∑_{m=1}^{K} l'_{h_m} Q_{h_m} and ∑_{i=1}^{N} l_i Q_i = ∑_{m=1}^{K} l_{h_m} Q_{h_m},

since all the terms corresponding to empty outputs are zero in both sums (Q_i = 0 if the i-th output contains 0 photons).
Note that we can view this as a "reduced" system with K outputs, in which the output state |l⟩ contains at least one photon in every output. However, this system has one difference. Previously we considered the possible output states to be all the states that satisfy ∑_{i=1}^{N} l'_i = M and l'_i ≥ 0 ∀i. Now, in this "reduced" system we must consider all the output states such that ∑_{i=1}^{K} l'_i ≤ M and l'_i ≥ 0 ∀i. This happens because the output states |l'⟩ can have a non-zero number of photons in the outputs that were empty in |l⟩; such outputs have no effect on the frequency and remain outside the "reduced" system. Therefore, instead of a system where some outputs can be empty and some Q_i can be zero, but ∑_{i=1}^{N} l'_i = ∑_{i=1}^{N} l_i, we can consider a system where l_i > 0 ∀i but ∑_{i=1}^{N} l_i ≥ ∑_{i=1}^{N} l'_i. In this system Q_i = ∏_{j=1}^{i−1} (l_j + 1), and Q_1 = 1. To prove the correctness of the algorithm, we must prove the following statement:

Theorem 1. Let N be some natural number. Let l_1, l_2, ..., l_N, l'_1, l'_2, ..., l'_N be non-negative integers that satisfy the following conditions:
(1) l_i > 0 ∀i;
(2) ∑_{i=1}^{N} l'_i ≤ ∑_{i=1}^{N} l_i;
(3) ∑_{i=1}^{N} l'_i Q_i = ∑_{i=1}^{N} l_i Q_i, where Q_i = ∏_{j=1}^{i−1} (l_j + 1).
Then l'_i = l_i ∀i.

The proof of this statement can be found in Appendix A.

Parameters of the Fourier Transform
To calculate the Fourier transform of the function g(t; {Q_i}) we will use the fast Fourier transform (FFT). Firstly, we define its parameters: the sampling interval ∆t (or the sampling frequency f_s = 1/∆t) and the number of data points K. The function will be calculated at the points n∆t, 1 ≤ n ≤ K. Since all the frequencies in the spectrum of g(t; {Q_i}) are natural numbers, they can be discerned with a frequency resolution of ∆f = 1. The function will therefore be calculated at points within the interval [0; 1], which contains at least one period of each harmonic.
The sampling frequency f_s = 1/∆t is often chosen according to the Nyquist-Shannon theorem: if the Nyquist frequency f_N = f_s/2 = 1/(2∆t) is greater than the highest frequency in the spectrum f_max, then the function can be reconstructed from the spectrum and no aliasing occurs. Therefore, one way of choosing the sampling frequency is f_s = 2 f_max. It can be used with both methods of choosing {Q_i}.
Since the function is calculated at points within the interval [0; 1], the number of data points K is equal to the sampling frequency f_s. Optimizing the algorithm requires lowering the sampling frequency as much as possible.
If the goal is to calculate the probability of one specific state |l⟩ and the second method of choosing {Q_i} is used, then the sampling frequency f_s can be chosen to be lower than 2 f_max. This will lead to aliasing: a peak with frequency f will be aliased by the peaks with frequencies f + k f_s, k ∈ Z. To correctly calculate the probability of the output state from a spectrum computed this way, the spectrum must not contain frequencies f' ≠ f that satisfy f' = f + k f_s, k ∈ Z. Note that it will not be possible to reconstruct the function g(t; {Q_i}) from such a spectrum.
We will show that the sampling frequency f_s for calculating the probability of the output state |l⟩ using the second method of choosing {Q_i} can be chosen greater by one than the target frequency f = f(|l⟩; {Q_i(|l⟩)}):

f_s = f + 1.

To prove this statement, we must show that the spectrum of g(t; {Q_i}) will not contain frequencies f' = f(|l'⟩; {Q_i}), f' ≠ f, that satisfy f' = f + k f_s, k ∈ Z. This is shown by a theorem that is analogous to Theorem 1 but has a weaker condition: the equation in condition (3) is taken modulo f_s.
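A sketch of the single-state computation with the reduced sampling rate f_s = f + 1 (names are ours; the convention a†_i → ∑_j U_{ji} b†_j is the same assumption as before):

```python
import math
import numpy as np

def prob_single_state(U, k, l):
    """Probability of one output state |l> using the second method of
    choosing {Q_i} and an aliased FFT with only K = f + 1 sample points."""
    N = len(k)
    Q, q = [], 1
    for li in l:                       # second method: Q_i = 0 for empty outputs
        Q.append(q if li > 0 else 0)
        if li > 0:
            q *= li + 1
    f = sum(li * qi for li, qi in zip(l, Q))
    K = f + 1                          # sampling frequency f_s = f + 1
    n = np.arange(K)
    g = np.ones(K, dtype=complex)
    for i in range(N):
        if k[i]:
            col = sum(U[j, i] * np.exp(2j * np.pi * Q[j] * n / K) for j in range(N))
            g *= col ** k[i] / math.sqrt(math.factorial(k[i]))
    alpha = np.fft.fft(g)[f] / K       # no other harmonic aliases onto f (Theorem 2)
    return abs(alpha) ** 2 * math.prod(math.factorial(x) for x in l)
```

For the 50:50 beam splitter with input |1, 1⟩, the target |2, 0⟩ needs only K = 3 sample points (versus 7 with the first method), yet still gives the exact probability 1/2.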

Theorem 2. Let N be some natural number. Let l_1, l_2, ..., l_N, l'_1, l'_2, ..., l'_N be non-negative integers that satisfy conditions (1) and (2) of Theorem 1, as well as
(3') ∑_{i=1}^{N} l'_i Q_i ≡ ∑_{i=1}^{N} l_i Q_i (mod f_s), where Q_i = ∏_{j=1}^{i−1} (l_j + 1) and f_s = 1 + ∑_{i=1}^{N} l_i Q_i.
Then l'_i = l_i ∀i.

The proof of this statement can be found in Appendix A.

Complexity of the Algorithm
Let us consider the computational complexity of this algorithm. The complexity of a fast Fourier transform of a data array of K points is O(K log K). The total complexity of the algorithm consists of the complexity of calculating g(t; {Q_i}) at K points and the complexity of the fast Fourier transform.
Computing g(t; {Q_i}) at each point takes O(N^2) operations: the expression for g(t; {Q_i}) consists of at most N factors, each of which can be computed with N additions, N multiplications, and N exponentiations. If some of the inputs are empty, there are fewer factors in the expression, and the resulting complexity is lower.
When the first method of choosing {Q_i} is used, the number of data points K is proportional to the highest frequency in the spectrum of g(t; {Q_i}), since the sampling frequency is chosen using the Nyquist-Shannon theorem. The frequencies corresponding to the output states in this case are equal to ∑_{i=1}^{N} l_i Q_i = ∑_{i=1}^{N} l_i (M+1)^(i−1). The highest frequency then is M(M+1)^(N−1) and corresponds to the state where the last output contains all the photons. The total complexity of calculating all the probabilities will then be

O(N^2 · M(M+1)^(N−1)).

When the second method of choosing {Q_i} is used, the number of data points K depends on the frequency of the output state in consideration. This frequency is highest when the photons are spread evenly over the outputs. For a system with M = mN, m ∈ N, this corresponds to the state where each output contains m photons. In this case, the highest frequency is equal to

∑_{i=1}^{N} m(m+1)^(i−1) = (m+1)^N − 1.

Therefore, the sampling frequency and the required number of data points will be (m+1)^N, and the complexity of the algorithm in the worst case will be

O(N^2 · (m+1)^N).

In most states, however, the photons will not be spread evenly between the outputs, and outputs with a high number of photons will lower the sampling frequency and thus the complexity of calculating the probability of the state. This means that the more photon collisions there are in a state, the better this algorithm performs. Let us consider several specific cases.
1. M = N, and the goal is to compute the probability of the output state that contains 2 photons in one half of the outputs and 0 photons in the other half. The frequency corresponding to such a state will be

∑_{i=1}^{N/2} 2 · 3^(i−1) = 3^(N/2) − 1,

and the complexity of the algorithm will be equal to O(N^2 · 3^(N/2)). For comparison, the complexity of the Clifford-Clifford algorithm (which is O(M·2^M + N·M^2)) in this case will be equal to O(N·2^N).
2. M = N^2, and the goal is to compute the probability of the output state that contains 2N photons in one half of the outputs and 0 photons in the other half. The frequency corresponding to such a state will be

∑_{i=1}^{N/2} 2N(2N+1)^(i−1) = (2N+1)^(N/2) − 1,

and the complexity of the algorithm will be equal to O(N^2 · (2N+1)^(N/2)). Again, the complexity of the Clifford-Clifford algorithm in this case will be equal to O(N^2 · 2^(N^2)).
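The closed-form frequencies behind these estimates are easy to check numerically (a small sketch with our own helper names):

```python
def worst_case_frequency(N, m):
    """M = m*N photons spread evenly, m per output: Q_i = (m+1)^(i-1),
    so f = sum_i m*(m+1)^(i-1) = (m+1)^N - 1."""
    Q = [(m + 1) ** i for i in range(N)]
    return sum(m * q for q in Q)

def half_filled_frequency(N):
    """M = N photons, 2 photons in each of the N/2 non-empty outputs:
    the reduced system has N/2 outputs with Q_i = 3^(i-1), so f = 3^(N/2) - 1."""
    return sum(2 * 3 ** i for i in range(N // 2))
```

For N = 20 this gives a target frequency of 3^10 − 1 = 59048, far below the ~N·2^N ≈ 2·10^7 operation count of the Clifford-Clifford algorithm for the same state.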

Weighted Average Complexity
We can measure the weighted average computational complexity of the algorithm described above by computing ∑_i p_i C_i, where i indexes the measurement outcomes (output states) of the bosonic sampler, p_i is the probability of observing the i-th state, C_i is the complexity of calculating p_i (assuming the second method of choosing {Q_i} is used), and the sum runs over all possible outcomes.
We have computed this weighted average complexity for systems with varying N. We set M = N and |k⟩ = |1, 1, ..., 1⟩ as the input state. The interferometer matrices for those systems were randomly generated unitary matrices. Figure 2 shows that the weighted average complexity of the algorithm is significantly lower than its maximum complexity, and their ratio decreases as N increases.

The Metropolis-Hastings Algorithm
For systems with large N it might be computationally intractable to calculate the exact probability distribution of the output states: the number of possible output states scales with N and M as C(N + M − 1, M) = (N + M − 1)! / (M!(N − 1)!). Sampling from a probability distribution from which direct sampling is difficult can be done using the Metropolis-Hastings algorithm, which uses a Markov process. It allows us to generate a Markov chain in which points appear with frequencies equal to their probabilities. In our case, the points are represented by the output states, i.e., sets of numbers |l_1, l_2, ..., l_N⟩ such that ∑_{i=1}^{N} l_i = M. We require a transition function that generates a candidate state from the last state in the chain. When the points are represented by real numbers, a candidate state can be chosen from a Gaussian distribution centered around the last point. In our case, however, the transition function is more complex.
The transition function (Algorithm 1) must allow the chain to reach each of the possible states. It is convenient to define it in the following way:

Algorithm 1. The transition function for a state |l_1, l_2, ..., l_N⟩.
  h := indices of the non-empty outputs
  K := |h|                           (the number of non-empty outputs)
  r := random({1, ..., K})           (choose a random non-empty output h_r)
  l_{h_r} := l_{h_r} − 1             (decrease the number of photons in the h_r-th output)
  s := random({1, ..., N} \ {h_r})   (choose a random output s ≠ h_r)
  l_s := l_s + 1                     (increase the number of photons in the s-th output)
  return |l_1, l_2, ..., l_N⟩

For the condition of detailed balance to hold, we will require a function p(l'_1, ..., l'_N; l_1, ..., l_N) equal to the probability that the transition function outputs |l'_1, ..., l'_N⟩ when the last state in the chain is |l_1, ..., l_N⟩. It is trivially constructed from the transition function.
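A direct Python transcription of Algorithm 1 (a sketch; the names are ours):

```python
import random

def transition(l, rng=random):
    """One proposal step: move a single photon from a random non-empty
    output to a different, randomly chosen output."""
    l = list(l)
    h = [i for i, x in enumerate(l) if x > 0]              # non-empty outputs
    r = rng.choice(h)                                      # source output h_r
    l[r] -= 1
    s = rng.choice([i for i in range(len(l)) if i != r])   # target s != h_r
    l[s] += 1
    return tuple(l)
```

Each proposal conserves the total photon number and changes exactly two outputs by one photon each, so repeated applications can reach every state with ∑ l_i = M.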
Let u be the ratio of the exact probabilities of the states |l'_1, ..., l'_N⟩ and |l_1, ..., l_N⟩. The condition of detailed balance will hold if the Markov chain goes from the state |l_1, ..., l_N⟩ to the candidate state |l'_1, ..., l'_N⟩ with the probability

α = u · p(l_1, ..., l_N; l'_1, ..., l'_N) / p(l'_1, ..., l'_N; l_1, ..., l_N).
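Putting the proposal and the acceptance rule together gives a minimal Metropolis-Hastings sketch (our code). With Algorithm 1 as the proposal, p(l'; l) = 1 / (K(l)(N − 1)) for every reachable candidate, where K(·) is the number of non-empty outputs, so the ratio in α reduces to K(l)/K(l'); the exact-probability oracle `prob` is passed in as a callable (in practice it would be the FFT-based algorithm):

```python
import random

def metropolis_chain(prob, start, steps, rng):
    """Markov chain over output states whose empirical frequencies converge
    to the distribution defined by `prob` (which may be unnormalized)."""
    def transition(l):
        l = list(l)
        h = [i for i, x in enumerate(l) if x > 0]
        r = rng.choice(h)                                      # source output
        l[r] -= 1
        s = rng.choice([i for i in range(len(l)) if i != r])   # target output
        l[s] += 1
        return tuple(l)

    chain, l = [start], start
    for _ in range(steps):
        cand = transition(l)
        K_l = sum(1 for x in l if x > 0)       # non-empty outputs of l
        K_c = sum(1 for x in cand if x > 0)    # non-empty outputs of the candidate
        u = prob(cand) / prob(l)
        accept = u * K_l / K_c                 # alpha = u * p(l; l') / p(l'; l)
        if rng.random() < accept:
            l = cand
        chain.append(l)
    return chain
```

On a toy two-output system with a hand-picked target distribution, the empirical state frequencies settle near the target probabilities after a few hundred thousand steps.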
Given the Markov chain, we can then calculate the approximate probability of a state by dividing the number of times this state occurs in the chain by the total number of steps of the chain.

Results
To demonstrate that the frequencies with which states appear in the Markov chain converge to the exact probability distribution, we have tested the procedure on a system with N = 10, M = 10, |k⟩ = |1, 1, ..., 1⟩ and a random unitary 10 × 10 matrix as the interferometer matrix. To measure the distance between the exact and the approximate distributions we used the cosine similarity

S_C(p, q) = ∑_i p_i q_i / ( √(∑_i p_i^2) · √(∑_i q_i^2) ),

where p and q are probability distributions. The value of 1 − S_C(p, q) is 0 when p and q are equal. Figure 3 shows that 1 − S_C(p, q) decreases as the Markov chain makes more steps.
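The distance used here is one minus the cosine similarity; a minimal sketch (the function name is ours):

```python
import numpy as np

def cosine_distance(p, q):
    """1 - S_C(p, q): zero when the two distributions coincide."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 1.0 - float(p @ q) / (np.linalg.norm(p) * np.linalg.norm(q))
```

It is zero for identical distributions and reaches its maximum of one for distributions with disjoint support.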

Conclusions
We have presented a new algorithm for calculating the probabilities of the output states in the boson sampling problem. We have shown the correctness of the algorithm and calculated its computational complexity. The algorithm is simple to implement, as it relies heavily on the Fourier transform, which has numerous well-documented implementations.
The performance of this algorithm is better than that of other algorithms in cases where there are many photon collisions. An example we give is an output state where all the photons are spread equally across one half of the outputs, with the other half of the outputs empty. In this case the algorithm requires O(N^2 · 3^(N/2)) operations, while the Clifford-Clifford algorithm requires O(N·2^N) operations.
We have also proposed a method to approximately calculate the probability distribution in the boson sampling problem. It can be used when the system size is too large and calculating the exact probability distribution is intractable. Our results show that this algorithm indeed produces a probability distribution that converges to the exact probability distribution.
We plan to further study the application of the Metropolis-Hastings algorithm to approximating the boson sampling problem. When losses are modeled in the system, the probability distribution of the output states becomes concentrated. For example, when losses are high, the most probable states are those with many lost photons; when losses are low, the probability is concentrated on the states where few or no photons are lost. This property makes the Metropolis-Hastings algorithm especially effective for this problem.
2. Let us prove some general statements that will help us prove the induction step. Assume that the statement of the theorem is true for N − 1. Then the following lemma holds for numbers l_1, ..., l_N, l'_1, ..., l'_N that satisfy the conditions of the theorem for N:

Lemma A1. ∀m ∈ N, 1 ≤ m ≤ N, the following is true: l'_m = l_m − k_{m−1} + k_m(l_m + 1), where k_m and k_{m−1} are non-negative integers and k_0 = k_N = 0.
Proof. We will prove this lemma by induction on m. First we prove the base case m = 1. Let us write down the expression from condition (3) of the theorem modulo (l_1 + 1); all terms that contain (l_1 + 1) turn to zero and only the first term on each side remains:

l'_1 ≡ l_1 (mod (l_1 + 1)). (A1)

Then l'_1 = l_1 + k_1(l_1 + 1), where k_1 is an integer. Since l'_1 ≥ 0, k_1 must be non-negative. Since k_0 = 0, this equation matches the statement of the lemma for m = 1.

Now let us assume that the statement of the lemma holds for all i such that 1 ≤ i ≤ m − 1. Let us write down the expression from condition (3) of the theorem modulo (l_m + 1)(l_{m−1} + 1)···(l_1 + 1); this turns to zero all the terms except for the first m on both sides (note that Q_{m+1} has the form (l_m + 1)(l_{m−1} + 1)···(l_1 + 1), so every Q_i with i > m is divisible by the modulus):

l'_1 Q_1 + l'_2 Q_2 + ... + l'_m Q_m ≡ l_1 Q_1 + l_2 Q_2 + ... + l_m Q_m (mod (l_m + 1)(l_{m−1} + 1)···(l_1 + 1)).
Therefore

l'_1 Q_1 + l'_2 Q_2 + ... + l'_m Q_m = l_1 Q_1 + l_2 Q_2 + ... + l_m Q_m + k_m (l_m + 1)(l_{m−1} + 1)···(l_1 + 1),

where k_m is an integer. Using l'_1 = l_1 + k_1(l_1 + 1) and dividing by (l_1 + 1), then applying sequentially l'_i = l_i − k_{i−1} + k_i(l_i + 1) and dividing by (l_i + 1) for all i from 2 to m − 1, we get

l'_m = l_m − k_{m−1} + k_m(l_m + 1),

where k_{m−1} ≥ 0 by the assumption of the induction step and k_m ≥ 0 since l'_m ≥ 0.
Secondly, let us write down condition (2) of the theorem using the equation l'_i = l_i − k_{i−1} + k_i(l_i + 1) given by Lemma A1:

∑_{i=1}^{N} (l'_i − l_i) = ∑_{i=1}^{N} (−k_{i−1} + k_i(l_i + 1)) ≤ 0.
Since k_0 = k_N = 0, the sum telescopes:

∑_{i=1}^{N} (−k_{i−1} + k_i(l_i + 1)) = ∑_{i=1}^{N} k_i l_i ≤ 0.

However, condition (1) states that l_i > 0 ∀i, Lemma A1 states that k_i ≥ 0 ∀i > 1, and we have shown above that k_1 > 0. Therefore k_i l_i ≥ 0 ∀i and k_1 l_1 > 0, which means that ∑_{i=1}^{N} k_i l_i > 0. We get a contradiction, which proves the theorem.