Estimation with Heisenberg-Scaling Sensitivity of a Single Parameter Distributed in an Arbitrary Linear Optical Network

Quantum sensing and quantum metrology propose schemes for the estimation of physical properties, such as lengths, time intervals, and temperatures, achieving enhanced levels of precision beyond the possibilities of classical strategies. However, such an enhanced sensitivity usually comes at a price: the use of probes in highly fragile states, the need to adaptively optimise the estimation schemes to the value of the unknown property we want to estimate, and the limited working range are some examples of challenges which prevent quantum sensing protocols from being practical for applications. This work reviews two feasible estimation schemes which address these challenges, employing easily realisable resources, i.e., squeezed light, and achieving the desired quantum enhancement of the precision, namely Heisenberg-scaling sensitivity. In more detail, we show how to overcome the need to iteratively optimise the network in the estimation of any parameter that affects, in a distributed manner, multiple components of an arbitrary M-channel linear optical network. In particular, we show that this is possible with a single-step adaptation of the network based only on a prior knowledge of the parameter achievable through a "classical" shot-noise limited estimation strategy. Furthermore, homodyne measurements with only one detector suffice to achieve Heisenberg-limited estimation of the parameter. We further demonstrate that one can avoid the use of any auxiliary network at the price of simultaneously employing multiple detectors.


Introduction
Due to the discrete nature of physical phenomena, the error in the estimation of physical properties, such as lengths, delays, temperatures, or refractive indices, when employing N probes (e.g., photons, electrons) is strongly limited by the so-called shot-noise scaling factor of 1/√N when a classical estimation strategy, i.e., one in which the probe and the measurement employed can be fully described classically, is performed. However, it has been proven that it is possible, by exploiting quantum features such as entanglement and the squeezing of light, to overcome this classical limitation, and to reach an enhanced scaling in the precision of order 1/N, called the Heisenberg limit [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17].
A promising path towards this quantum-enhanced sensitivity is the one enabled by the squeezing of light [18][19][20][21]. Squeezed states are particular states of the electromagnetic field characterised by a field quadrature with smaller fluctuations than the quadratures of the vacuum itself. This useful property, together with the Gaussian nature of these states, which makes them relatively easy to produce, resilient to noise, and mathematically simple to manipulate [20,22], makes them evident candidates for metrological purposes. The working principle of these protocols is rather straightforward, and can be outlined as follows. The probe, a pure squeezed state, undergoes an optical phase delay whose magnitude depends on the unknown parameter we are interested in measuring, and the value of the parameter is then retrieved from homodyne measurements at the output. In fact, in the regime of small phases, the transformation the probe undergoes is small enough to make the rotation of the squeezed quadrature negligible, so that the latter remains practically unchanged. Nevertheless, in some practical scenarios the experimenters have no control over the value of the phase (e.g., the system under investigation is unreachable or cannot be manipulated), and in such cases this approach ceases to be feasible.
We will show here how the challenge of adaptivity, and any restriction in the range of possible values of the parameter to be estimated, can be overcome thanks to some recent results in the field of distributed quantum metrology, for the estimation at Heisenberg-scaling sensitivity of a single parameter distributed throughout an arbitrary, multi-channel passive and linear network (see Figure 1). In particular, the unknown parameter can be thought of as a physical property of an external field, such as the temperature of the environment, or the magnitude of an electromagnetic field, which globally affects some or all of the components of the network. This review is organised in two parts, in which we introduce two distinct estimation schemes, discussing the features that differentiate the two approaches. In Section 2, we will discuss a squeezing-encoding scheme achieving Heisenberg-scaling sensitivity by employing a single squeezed vacuum state and homodyne detection at a single output channel [42]. We will assume that no prior knowledge on the value of the parameter is available, and that the structure of the network, as well as the nature of the parameter it encodes, are completely arbitrary. We will present the conditions that need to be satisfied in order to reach the Heisenberg scaling in such a generic model, and we will show that, in general, only a classical prior estimation of the unknown parameter suffices to prepare the single auxiliary linear network required to optimize the setup. These results show that it is possible to conceive feasible two-step estimation strategies, composed of a first classical estimation of the unknown parameter required to engineer the auxiliary stage, followed by the actual quantum estimation. In Section 3, we describe a different scheme achieving the Heisenberg scaling, which makes use of a single-mode squeezed-coherent state as a probe, and homodyne detection in every output port of the linear network [43].
Once again, the model will assume no prior knowledge on the parameter, nor on the structure of the network. We will see that, in this case, no auxiliary stage is required to achieve the Heisenberg scaling, so that the estimation can be carried out in a single step, at the cost of employing multiple homodyne detectors. Moreover, we will show that, due to the introduction of a non-vanishing displacement in the probe, the overall precision becomes the sum of two contributions, one deriving from the information encoded in the sample mean of the outcomes of the homodyne measurements, the other from the information encoded in the sample covariance matrix.
Figure 1. Scheme of an arbitrary multi-channel linear network encoding a single distributed parameter ϕ. The parameter can be thought of as a physical property of an external agent (e.g., temperature, electromagnetic field) which affects multiple components, possibly of different natures, of the network [42,43]. Reprinted with permission from ref. [42], © 2021 The Author(s).

Quantum Estimation Based on Single-Homodyne Measurements
In this section, we will describe a generic model of an M-channel linear network Û ϕ that allows the Heisenberg scaling to be reached in the estimation of a parameter ϕ distributed arbitrarily in Û ϕ , regardless of the structure of the network. Our model makes use of two auxiliary stages, namely two other linear networks V̂ in and V̂ out , whose purposes are to distribute the probe, a single-mode squeezed vacuum injected in the first channel, into all the optical modes of Û ϕ , and then to refocus it into the only channel observed through homodyne detection. We will show that only one of the two auxiliary networks must be optimized, and that, although the optimal choice of such a network depends on the value of ϕ, the precision required for this optimization can be achieved with a classical estimation strategy. This allows us to conceive two-step estimation protocols, where a prior classical estimation used to engineer the optimal auxiliary stage is followed by the quantum, Heisenberg-scaling-achieving estimation.

Setup
Let us consider an M-channel passive linear network, whose action on the state of the probe is described by the unitary operator Û ϕ , with ϕ a single, generally distributed, unknown parameter we are interested in estimating. Due to its passive and linear nature, this network can be represented by a unitary matrix. Let then U ϕ be the M × M unitary matrix representing the action of Û ϕ on the annihilation operators â i , i = 1, . . . , M, associated with each channel of the network, satisfying the commutation relations [â i , â j ] = 0, [â i , â † j ] = δ ij , where we denote with δ ij the Kronecker delta. The matrix U ϕ is thus defined by the transformation it induces on the annihilation operators (Equation (1)). We will consider a single-mode squeezed state |ψ 0 ⟩ = |r⟩ ≡ Ŝ(r)|vac⟩, with r = (r, 0, . . . , 0) (Equation (2)), as a probe, where Ŝ(r) = exp[r(â †2 1 − â 2 1 )/2] is the squeezing operator and N = sinh² r is the average number of photons, all injected in a single channel of the apparatus, i.e., the first one with the choice of squeezing parameters r in Equation (2). In other words, the state |ψ 0 ⟩ presents a non-vanishing number of photons only in the first mode. As discussed in Section 1, the approach of squeezing-based estimation strategies is to infer the value of ϕ from the transformation of the covariance matrix Γ = diag(e^{2r}, 1, . . . , 1, e^{−2r}, 1, . . . , 1)/2 of the state after the interferometric evolution Û ϕ . To do so, we will consider the model where a single output channel, say the first, is measured through homodyne detection. We will denote with θ the phase of the local oscillator, which coincides with the phase of the measured quadrature x̂ θ . We will assume that r > 0 without loss of generality. We notice that, with this assumption, the squeezed quadrature of the state |ψ 0 ⟩ is p̂ 1 , with Var[p̂ 1 ] = e^{−2r}/2, while the anti-squeezed quadrature is x̂ 1 , with Var[x̂ 1 ] = e^{2r}/2. In terms of the creation and annihilation operators, the measured quadrature field can be expressed as x̂ θ = (e^{−iθ} â 1 + e^{iθ} â † 1 )/√2.
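As a quick numerical sanity check of these conventions (vacuum quadrature variance 1/2, so that Var[x̂ 1 ] = e^{2r}/2 and Var[p̂ 1 ] = e^{−2r}/2 with N = sinh² r), the covariance matrix Γ can be built in a few lines; the sketch below is illustrative and not part of the original derivation.

```python
import numpy as np

r = 1.2                       # squeezing parameter, r > 0 (illustrative value)
N = np.sinh(r) ** 2           # mean photon number of S(r)|vac>

# Quadrature variances with vacuum variance 1/2: for r > 0 the
# x-quadrature of mode 1 is anti-squeezed, the p-quadrature squeezed.
var_x = np.exp(2 * r) / 2
var_p = np.exp(-2 * r) / 2
assert np.isclose(var_x * var_p, 0.25)   # minimum-uncertainty state

# Covariance matrix of the M-mode probe (mode 1 squeezed, rest vacuum),
# in (x_1..x_M, p_1..p_M) ordering.
M = 3
Gamma = np.diag([var_x] + [0.5] * (M - 1) + [var_p] + [0.5] * (M - 1))
print(N, var_x, var_p)
```

Note that the uncertainty product var_x · var_p saturates the Heisenberg bound for any r, which is what makes the squeezed vacuum a pure, minimum-uncertainty probe.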
Since the linear network Û ϕ is arbitrary, the average number of photons that can actually be detected after the interferometric evolution ranges between 0 and N. Naturally, if this number were small, i.e., far from N, we would expect a sub-optimal performance of the estimation scheme, since most of the photons would exit the network Û ϕ from channels that are not observed, and information on ϕ would in this way be lost. Moreover, it may happen that the transformation imposed on the probe in the transition to the first output port is trivial, namely that the element (U ϕ ) 11 of the transition amplitude matrix U ϕ does not depend on ϕ. This occurrence would preclude the probe from acquiring any observable information on the parameter. In order to prevent these conditions from happening, this model includes the presence of two auxiliary linear and passive networks acting on the probe, V̂ in before and V̂ out after the linear network Û ϕ (see Figure 2). The first auxiliary stage V̂ in can be understood as a scattering network, which distributes the probe through multiple input channels of the parameter-dependent network Û ϕ . The purpose of the second stage V̂ out is instead to refocus the probe, after the interaction with Û ϕ , into the only observed channel. The unitary matrix describing the overall network is thus given by the matrix product of the three single matrix representations V out , U ϕ and V in (Equation (3)).
Figure 2. The probe, the single-mode squeezed vacuum state of Equation (2), is injected in the first channel of a network composed of a first auxiliary stage V̂ in , a network Û ϕ which depends on a generally distributed parameter ϕ we want to estimate, and a second auxiliary stage V̂ out , before being detected through homodyne measurements in the first output port. The role of the two auxiliary stages V̂ in and V̂ out is to respectively distribute the photons of the probe through multiple channels, and then to refocus them into the only observed channel.
We will show that only one auxiliary network needs to be optimized to reach the Heisenberg scaling, while, for networks with a large number of channels, the effect of the non-optimized network is typically irrelevant on the overall precision of the estimation [42]. Reprinted with permission from ref. [42], © 2021 The Author(s).
Since in this model all the photons are injected in the first input channel, which is also the only channel observed at the output, the only relevant transition amplitude is the element (u ϕ ) 11 of the overall unitary matrix in Equation (3). We can then rewrite it as (u ϕ ) 11 = √(P ϕ) e^{iγ ϕ} (Equation (4)), where P ϕ = |(V out U ϕ V in ) 11 |² is the probability that a single photon injected in the first port is detected at the first output port of the overall network, and γ ϕ = arg((V out U ϕ V in ) 11 ) is the phase acquired by the probe during the interferometric evolution. The Gaussian nature of the probe and of the homodyne measurements yields a Gaussian probability density function which governs the outcomes of the homodyne detection [18][19][20][21]. The univariate Gaussian probability density function in Equation (5) is centred at zero due to the absence of a displacement in the probe, while its variance is given by (see Appendix A)

σ² ϕ = P ϕ [e^{2r} cos²(γ ϕ − θ) + e^{−2r} sin²(γ ϕ − θ)]/2 + (1 − P ϕ)/2, (6)

where θ is the phase of the local oscillator.
Once the probability density function p ϕ (x) is known, it is possible to evaluate the Fisher information [44,45] of the estimation scheme, which in turn fixes the ultimate precision δϕ min achievable in the estimation of ϕ through ν iterations of the measurement, given by the Cramér-Rao bound [44,45]

δϕ min = 1/√(ν F(ϕ)). (8)

For a Gaussian distribution centred on zero, the Fisher information reads (see Appendix B)

F(ϕ) = (∂ ϕ σ² ϕ)² / (2 σ⁴ ϕ), (9)

where ∂ ϕ := d/dϕ. We notice from Equation (6) that all the information on the parameter ϕ is encoded in the variance σ² ϕ of the measured quadrature x̂ θ through the two quantities P ϕ and γ ϕ . Thus, we can split ∂ ϕ σ² ϕ into two contributions, one containing the derivative of P ϕ , the other the derivative of γ ϕ , namely

∂ ϕ σ² ϕ = (∂ ϕ P ϕ)(∂ P σ² ϕ) + (∂ ϕ γ ϕ)(∂ γ σ² ϕ), (10)

where ∂ P and ∂ γ are derivatives with respect to P ϕ and γ ϕ , respectively, so that (Equation (11)) ∂ γ σ² ϕ = −P ϕ (e^{2r} − e^{−2r}) sin(2(γ ϕ − θ))/2.
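To make the chain from Equation (6) to Equation (9) concrete, the sketch below implements the variance of the measured quadrature and the zero-mean Gaussian Fisher information numerically. The explicit form of σ² ϕ is our reconstruction of Equation (6) (a convex combination of the rotated squeezed-vacuum variance and the vacuum variance), so treat it as an assumption.

```python
import numpy as np

def sigma2(P, gamma, theta, r):
    """Variance of the measured quadrature x_theta, Equation (6) as we
    reconstruct it: weight P on the rotated squeezed-vacuum variance,
    weight 1 - P on the vacuum variance 1/2."""
    d = gamma - theta
    return P * (np.exp(2 * r) * np.cos(d) ** 2
                + np.exp(-2 * r) * np.sin(d) ** 2) / 2 + (1 - P) / 2

def fisher(sigma2_of_phi, phi, h=1e-6):
    """Equation (9): F(phi) = (d sigma^2/d phi)^2 / (2 sigma^4) for a
    zero-mean Gaussian, with the derivative taken numerically."""
    s = sigma2_of_phi(phi)
    ds = (sigma2_of_phi(phi + h) - sigma2_of_phi(phi - h)) / (2 * h)
    return ds ** 2 / (2 * s ** 2)

r = 1.0
# Limiting cases: theta aligned with gamma probes the anti-squeezed
# quadrature, theta = gamma + pi/2 the squeezed one, P = 0 gives vacuum.
assert np.isclose(sigma2(1.0, 0.3, 0.3, r), np.exp(2 * r) / 2)
assert np.isclose(sigma2(1.0, 0.3, 0.3 + np.pi / 2, r), np.exp(-2 * r) / 2)
assert np.isclose(sigma2(0.0, 0.3, 0.0, r), 0.5)
```

The helper `fisher` will be reused conceptually in the following subsections: the Heisenberg scaling emerges entirely from how σ² ϕ and its derivative scale with N.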

Heisenberg Scaling
Generally, without imposing any condition on the setup, this model does not achieve the Heisenberg scaling in the precision of the estimation of ϕ. In fact, we can explicitly rewrite the variance σ² ϕ in Equation (6) in terms of the average number of photons N = sinh² r in the probe (Equation (12)), where O(1) is a term of order equal to or smaller than 1, negligible in the asymptotic regime of large N (we will say that, given two functions f(N) and g(N), f(N) = O(g(N)) when lim N→∞ |f(N)/g(N)| < +∞). We can also rewrite the derivatives ∂ P σ² ϕ and ∂ γ σ² ϕ in terms of N (Equation (13)). Plugging the asymptotics shown in Equations (12) and (13) into the expression of the Fisher information in Equation (9), we notice that the numerator of F(ϕ) can be of the order N² at most. Since the denominator is in general of the order N² as well, the Fisher information has an overall general scaling of O(1), i.e., even lower than the shot-noise limit (the standard quantum limit, SQL).
In order for this setup to reach the Heisenberg scaling, we thus need to impose some constraints which prevent the denominator of the Fisher information, i.e., the variance σ² ϕ , from growing with N. We show in Appendix C that the asymptotic conditions

γ ϕ − θ ∼ π/2 + k/N, (14)

P ϕ ∼ 1 − ℓ/N, (15)

need to be satisfied for large N, with ℓ ≥ 0 and k ≠ 0 arbitrary constants independent of N (given a function f(N) and a finite sum p(N) of powers of N, we will say that f(N) ∼ p(N) when they show the same asymptotic behaviour; in formulas, f(N) ∼ p(N) when f(N) = p(N) + O(N^{s−ε}), ∀ε > 0, with s the exponent of the smallest power of N appearing in the sum p(N)). We will discuss the physical meaning of these conditions in more detail in Section 2.3, where we will see that Equation (14) is a minimum-resolution requirement on the tuning of the local oscillator, while Equation (15) is the condition on the refocusing of the probe. Intuitively, Equations (14) and (15) assure that σ² ϕ at the denominator of F(ϕ) does not grow as fast as ∂ ϕ σ² ϕ at the numerator: indeed, we can see in Appendix C that, when these conditions hold, the variance σ² ϕ becomes of the order 1/N, while its derivative ∂ ϕ σ² ϕ remains constant for large N. In particular, we show in Appendix C that the Fisher information in Equation (9) asymptotically reads

F(ϕ) ∼ 8 N² Λ(k, ℓ) (∂ ϕ γ ϕ)², (16)

proving the achievement of the Heisenberg scaling, with

Λ(k, ℓ) = [8k/(1 + 16k² + 4ℓ)]² (17)

a positive factor reaching its maximum value for k = ±1/4 and ℓ = 0, namely Λ(±1/4, 0) = 1.
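The asymptotic statements in Equations (14)-(17) can be checked numerically. The script below evaluates the exact Fisher information of the Gaussian model for γ ϕ − θ = π/2 + k/N and P ϕ = 1 − ℓ/N (taking ∂ ϕ γ ϕ = 1 and P ϕ independent of ϕ), and compares F(ϕ)/(8N²) with the pre-factor Λ(k, ℓ). Both the variance formula and Λ are our reconstructions of the garbled source, so this is a self-consistency check rather than a reproduction of Appendix C.

```python
import numpy as np

def Lambda(k, l):
    # Equation (17) as we reconstruct it from the source.
    return (8 * k / (1 + 16 * k ** 2 + 4 * l)) ** 2

def fisher_ratio(N, k, l):
    """Exact F(phi)/(8 N^2) of the Gaussian model for
    gamma - theta = pi/2 + k/N and P = 1 - l/N, with d(gamma)/d(phi) = 1
    and P taken phi-independent (only the gamma-contribution survives)."""
    r = np.arcsinh(np.sqrt(N))          # N = sinh(r)^2 photons
    P, d = 1 - l / N, np.pi / 2 + k / N
    s = P * (np.exp(2 * r) * np.cos(d) ** 2
             + np.exp(-2 * r) * np.sin(d) ** 2) / 2 + (1 - P) / 2
    ds = P * (np.exp(-2 * r) - np.exp(2 * r)) * np.sin(2 * d) / 2  # d sigma^2/d gamma
    return ds ** 2 / (2 * s ** 2) / (8 * N ** 2)

for N in (1e4, 1e6):
    # Converges to Lambda(k, l); Lambda(1/4, 0) = 1 saturates the bound.
    print(N, fisher_ratio(N, 0.25, 0.0), fisher_ratio(N, 0.5, 1.0))
```

For k = 1/4, ℓ = 0 the ratio tends to 1, i.e., F(ϕ) → 8N²(∂ ϕ γ ϕ)², consistent with the known quantum Fisher information 8N(N + 1) of a squeezed-vacuum probe.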
Compared with the conditions found in the literature for single-parameter Gaussian estimation schemes based on squeezed-vacuum probes, which, adjusted to the notation employed so far, can be translated into γ ϕ − θ = tan⁻¹(e^{2r}) ∼ π/2 + (4N)⁻¹ and P ϕ = 1 [26,27], we see that Equations (14) and (15) achieve two important further results:

• It is possible to loosen the optimal conditions found in the literature while still reaching the Heisenberg scaling, at the price of a multiplying factor 0 < Λ(k, ℓ) ≤ 1 which does not depend on N and hence does not ruin the scaling of the precision;
• These conditions are explicitly expressed in terms of the average number N of photons in the probe and, therefore, in terms of the precision we want to achieve. In Section 2.3, we will discuss how this allows us to assess the precision needed to engineer suitable auxiliary stages V in and V out to reach the Heisenberg scaling, showing that it is possible to avoid an iterative adaptation of the optical network.
Lastly, we recall that it is always possible to asymptotically saturate the Cramér-Rao bound in Equation (8) in the limit ν → +∞ of samples with a large number ν of observations. In particular, the maximum-likelihood estimator φ̂ MLE is an asymptotically efficient and Gaussian estimator which can be obtained through the maximisation of the likelihood function L(ϕ; x) associated with the set {x i } i=1,...,ν of the ν measurement outcomes of the quadrature field x̂ θ [44,45]. In Appendix D we see that the non-trivial solution which maximises the likelihood function L(ϕ; x) in Equation (18) is simply given by the estimator φ̂ MLE satisfying

σ²(φ̂ MLE) = S(x)², (19)

where σ²(ϕ) is the variance σ² ϕ in Equation (6) seen as a function of ϕ, and S(x)² is the usual sample variance

S(x)² = (1/ν) ∑ᵢ x i². (20)

Generally, Equation (19) cannot be solved analytically, so that numerical methods need to be employed to find non-trivial solutions. Nevertheless, it is possible to find some exceptions, particularly for elementary functional dependencies of P ϕ and γ ϕ on the unknown parameter ϕ. For example, in the case in which P ϕ ≡ P is independent of ϕ, and the functional dependence γ(ϕ) := γ ϕ of the phase acquired by the probe is invertible, the function σ(ϕ)² in Equation (19) can be easily inverted as well, and the maximum-likelihood estimator takes the closed form of Equation (21). We can notice how, due to the presence of the cosine in σ² ϕ in Equation (6), some prior knowledge of the parameter ϕ is required in order to correctly choose the invertibility interval for cos(2γ(ϕ) − 2θ), i.e., to choose the correct value of n ∈ ℕ and the sign of the arccos function in Equation (21). In the next section, we will see that a classical prior knowledge of the parameter ϕ is required anyway to satisfy condition (15), achievable with a prior coarse estimation reaching an uncertainty of the order of 1/√N. Such prior knowledge of the parameter, for a large enough N, can be employed to choose the correct invertibility interval.
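As an illustration of the invertible case, the following sketch simulates ν homodyne outcomes for a hypothetical encoding with P ≡ 1 and γ(ϕ) = ϕ, and recovers the parameter by equating the model variance to the sample variance as in Equation (19); the explicit inversion plays the role of Equation (21) on the branch ϕ ∈ (0, π/2), where the variance is monotonic. All parameter values are illustrative, and the variance formula is our reconstruction of Equation (6).

```python
import numpy as np

r, theta, nu = 1.0, 0.0, 200_000
rng = np.random.default_rng(7)

def sigma2(phi):
    # Model variance with P = 1 and gamma(phi) = phi (hypothetical,
    # invertible encoding), using the reconstructed Equation (6).
    d = phi - theta
    return (np.exp(2 * r) * np.cos(d) ** 2
            + np.exp(-2 * r) * np.sin(d) ** 2) / 2

phi_true = 0.7
x = rng.normal(0.0, np.sqrt(sigma2(phi_true)), size=nu)

# Equation (19): match the model variance to the sample variance
# S^2 = mean(x_i^2), then invert sigma2 analytically on (0, pi/2),
# where it is monotonically decreasing.
S2 = np.mean(x ** 2)
cos2 = (2 * S2 - np.exp(-2 * r)) / (np.exp(2 * r) - np.exp(-2 * r))
phi_mle = np.arccos(np.sqrt(np.clip(cos2, 0.0, 1.0)))

print(phi_mle)   # statistically guaranteed to lie close to phi_true = 0.7
```

With ν = 2·10⁵ samples, the statistical error of the sample variance is at the few-per-mille level, so the estimate lands well within 0.05 of the true value; choosing the wrong inversion branch would instead return a point in a different interval, which is exactly the ambiguity discussed above.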

Conditions for the Heisenberg Scaling
We can see that both conditions in Equations (14) and (15) are ϕ-dependent, suggesting that an adaptive procedure must take place in order to employ the estimation scheme described in Section 2.1, as is customary for ab-initio Gaussian estimation strategies [24,27,[35][36][37]. However, some considerations can be made in this regard.
Condition (14) fixes the phase of the quadrature x̂ θ which needs to be measured. The quantity γ ϕ is in fact the phase acquired by the squeezed vacuum during the interferometric evolution from the first input port to the first output port, and for γ ϕ ≡ ϕ Equation (14) resembles the condition θ = ϕ + tan⁻¹(e^{2r}) found in the literature for single-phase estimation [24]. On the other hand, condition (14) is a looser condition for reaching the Heisenberg scaling, and it relates the precision with which we are able to choose the phase θ of the local oscillator (given by the resolution of the homodyne detection apparatus) to the precision achievable in the estimation of ϕ. In particular, it is evident that the minimum resolution of the homodyne detector required to reach an uncertainty δϕ of order 1/N must be, in turn, of order 1/N. This is in agreement with the common notion in metrology that a sensor cannot detect changes in the measured quantity smaller than its resolution.
Interestingly, we notice from Equation (14) that the constant k cannot be equal to zero. Counterintuitively, the value k = 0 coincides with the choice of measuring the quadrature x̂ γϕ+π/2 , namely the minimum-variance quadrature after the squeezed vacuum undergoes a phase-shift of magnitude γ ϕ , i.e., after the interferometric evolution given by û ϕ in Equation (3). This apparent incongruity can be explained by observing the expression of ∂ ϕ σ² ϕ in Equation (10). Differently from displacement-encoding approaches, in which the information on the parameter is obtained from the transformation of the displacement of the probe, and minimizing the noise of the signal is thus always the optimal choice, here the value of ϕ is encoded in the variance of the quadrature itself. For us to be able to extract information on the parameter, the variance of the signal needs to be sensitive to small variations in ϕ, i.e., the derivative ∂ ϕ σ² ϕ must be non-vanishing. Of the two contributions to ∂ ϕ σ² ϕ in Equation (10), the one originating from the variations in P ϕ is vanishing when condition (15) is satisfied, since ∂ ϕ P ϕ ≈ 0 for P ϕ close to its maximum. The remaining contribution derives from the variations in the overall phase γ ϕ ; however, the variance of the maximally squeezed quadrature x̂ γϕ+π/2 is a stationary point with respect to variations in the phase γ ϕ and is thus insensitive to ϕ, namely ∂ γ σ² ϕ = 0 for γ ϕ − θ = π/2 in Equation (11) (see Figure 3).
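The stationarity argument can be verified directly: with the reconstructed variance of Equation (6) at P ϕ = 1, the Fisher information vanishes when the minimum-variance quadrature is measured (k = 0) and is restored for a slightly detuned local oscillator. A small numerical sketch, under the same reconstructed formulas used above:

```python
import numpy as np

r, theta = 1.0, 0.0

def sigma2(gamma):
    # Reconstructed Equation (6) with P = 1.
    d = gamma - theta
    return (np.exp(2 * r) * np.cos(d) ** 2
            + np.exp(-2 * r) * np.sin(d) ** 2) / 2

def fisher_at(gamma, h=1e-7):
    # Equation (9) with d(gamma)/d(phi) = 1, derivative taken numerically.
    s = sigma2(gamma)
    ds = (sigma2(gamma + h) - sigma2(gamma - h)) / (2 * h)
    return ds ** 2 / (2 * s ** 2)

# Minimum-variance quadrature (k = 0): stationary point, no sensitivity.
print(fisher_at(np.pi / 2))
# Slightly detuned local oscillator (k != 0): sensitivity is restored.
print(fisher_at(np.pi / 2 + 0.05))
```

The first value is numerically zero while the second is of order one, mirroring the polar plot of Figure 3: minimum noise and maximum information are reached at different local oscillator phases.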
Figure 3. Polar plot of the standard deviation σ ϕ (see Equation (6)), in blue, and of the Fisher information F(ϕ) in Equation (16), in orange, as functions of the phase θ of the measured quadrature x̂ θ , for P ϕ = 1. The largest values of F(ϕ) are reached for θ satisfying condition (14). Interestingly, for θ = γ ϕ ± π/2, namely when measuring the quadrature with minimum variance, σ ϕ reaches its minimum, but the Fisher information drops to zero: as a squeezing-encoding estimation scheme, this model relies on the information about ϕ inscribed in the variance of the measured quadrature.
On the other hand, the minimum variance is a stationary point as a function of ϕ, and thus is locally insensitive to the variations of the parameter. Reprinted with permission from ref. [34], © 2021 The Author(s).
Condition (15) is the requirement that most of the photons injected into the network end up in the observed output channel. In fact, this condition can be rewritten in terms of the average number of photons that are not correctly refocused, N(1 − P ϕ ) ∼ ℓ. Thus, condition (15) tells us that the number of photons which are not observed must be a constant ℓ, not growing with N. In other words, this condition assures that most of the information on ϕ encoded in the probe is not lost in channels that are not observed. As a matter of fact, we can see from Equation (6) that the variance of the observed quadrature x̂ θ after the interferometric evolution is the convex combination of the variances of a squeezed vacuum and the pure vacuum, with coefficients P ϕ and 1 − P ϕ , respectively. In order for this variance to be 'squeezed', in the sense that it is of order 1/N, the contribution from the pure vacuum must be of order 1/N, namely 1 − P ϕ = O(1/N). This condition can also be seen as a requirement on the performance of the refocusing stage V̂ out . In fact, in order to satisfy condition (15) for a given choice of V̂ in , the auxiliary stage V̂ out must be chosen so that |(V out U ϕ V in ) 11 |² ∼ 1 − ℓ/N. As discussed earlier, this implies that, in general, the auxiliary stage V̂ out which satisfies this condition depends on the value of the parameter itself, apparently requiring an adaptive approach to find an optimal refocusing stage to reach the Heisenberg scaling. We show now that the information on ϕ required to engineer an adequate refocusing stage can be obtained through a classical estimation strategy, namely one achieving the scaling 1/√N typical of the shot-noise limit. This result is due to the structure of P ϕ , which is essentially a transition probability P ϕ = |v out · v in |² between the unit vectors v in = U ϕ V in e 1 and v out = V † out e 1 , with e 1 = (1, 0, . . . , 0) T .
Hence, a small tilt of order O(1/√N) between the unit vectors v in and v out yields only a quadratic reduction in their transition probability. To prove this, we will call ϕ cl the rough guess of the value of ϕ that is sufficiently precise to engineer a refocusing stage V̂ out which satisfies condition (15), and we will show that the estimation strategy to obtain this rough estimate is classical, namely that the error δϕ = ϕ − ϕ cl associated with the prior rough estimation is allowed to be of order 1/√N. For a given choice of V̂ in and ℓ, we will call V̂ out (ϕ cl ) a solution of Equation (15). The single-photon transition probability P ϕ appearing in this condition can be written as the squared complex modulus of the scalar product of the two M-dimensional complex vectors U ϕ V in e 1 and V out (ϕ cl ) † e 1 , with e 1 = (1, 0, . . . , 0) T . We can then write P ϕ = η(ϕ, ϕ cl ) (Equation (22)), where the transition probability η(ϕ, ϕ cl ) is a smooth function of ϕ and ϕ cl , with a locus of maxima along the condition ϕ cl = ϕ, since, for a perfect knowledge of the parameter, the auxiliary stage V̂ out (ϕ cl = ϕ) would satisfy η(ϕ, ϕ) = 1. If the prior estimate ϕ cl slightly deviates from the real value of the parameter, ϕ = ϕ cl + δϕ, we can write the expansion

η(ϕ cl + δϕ, ϕ cl ) = 1 + O(δϕ²), (23)

since the first derivative of η(ϕ, ϕ cl ) vanishes along the condition ϕ = ϕ cl . We can see, comparing Equations (23) and (15), that an error δϕ of order 1/√N suffices to correctly engineer a refocusing stage V̂ out that allows for the Heisenberg scaling. It is then possible to conceive two-step ab-initio protocols exploiting the model presented in this section: first, a coarse, classical estimation of the parameter ϕ is performed and the rough estimate ϕ cl is obtained, with an error δϕ = ϕ − ϕ cl = O(1/√N) of the same order as the shot-noise limit.
Then, the classical information obtained on ϕ can be employed to engineer the refocusing stage V̂ out , once V̂ in is fixed, so that the overall network satisfies condition (15), allowing us to reach the Heisenberg scaling through the quantum strategy described in Section 2.1.
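The two-step logic can be emulated numerically: take an arbitrary (here random) network as a stand-in for Û ϕ , build the refocusing stage from a slightly wrong guess ϕ cl , and watch the refocusing error 1 − P ϕ vanish quadratically in δϕ = ϕ − ϕ cl . The ϕ-dependence used below (phase shifts between two fixed random layers) and the QR-based completion of V out are illustrative choices, not the construction of ref. [42].

```python
import numpy as np

rng = np.random.default_rng(0)
M = 6

def haar_unitary(m):
    # Haar-random unitary via QR of a complex Ginibre matrix.
    z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

# Stand-in for U_phi: phi-dependent phase shifts between two fixed layers.
A, B = haar_unitary(M), haar_unitary(M)
U = lambda phi: A @ np.diag(np.exp(1j * phi * np.arange(M))) @ B

V_in = haar_unitary(M)                  # arbitrary, non-optimized input stage
e1 = np.zeros(M); e1[0] = 1.0

def V_out(phi_cl):
    # Refocusing stage engineered from the classical guess phi_cl: a
    # unitary whose first row is (up to a phase) the conjugate of the
    # vector U(phi_cl) V_in e1, completed to a unitary via QR.
    v = U(phi_cl) @ V_in @ e1
    q, _ = np.linalg.qr(np.column_stack([v, rng.normal(size=(M, M - 1))]))
    return q.conj().T

def P(phi, phi_cl):
    # Single-photon transition probability through the overall network.
    return np.abs((V_out(phi_cl) @ U(phi) @ V_in @ e1)[0]) ** 2

phi = 1.3
for dphi in (1e-1, 1e-2, 1e-3):
    # (1 - P)/dphi^2 stays roughly constant: the refocusing error is quadratic.
    print(dphi, (1 - P(phi, phi + dphi)) / dphi ** 2)
```

A guess off by δϕ ∼ 1/√N therefore leaves 1 − P ϕ ∼ 1/N, which is precisely what condition (15) asks for.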
Lastly, we notice that, in order to satisfy condition (15), it is also possible to optimize the input auxiliary stage V̂ in while arbitrarily fixing the refocusing stage V̂ out . In such a case, identical considerations can be made regarding the possibility of a two-step protocol, since the optimization of V̂ in still requires only a classical coarse estimate ϕ cl of the parameter. Interestingly, only one of the auxiliary stages needs to be optimized, and thus depends on ϕ cl , whether it is V̂ in or V̂ out . This leaves the choice of the second auxiliary stage completely arbitrary, provided that the pre-factor (∂ ϕ γ ϕ )² appearing in the Fisher information in Equation (16) is non-vanishing. Indeed, the condition (∂ ϕ γ ϕ )² = 0 corresponds to the situation in which the optimized network û ϕ = V̂ out (ϕ cl ) Û ϕ V̂ in acts trivially on the probe, namely without imprinting any information about ϕ. Remarkably, it has been shown that, for a random choice of the non-optimized auxiliary network, sampled uniformly within the set of all possible linear networks, the pre-factor (∂ ϕ γ ϕ )² multiplying the scaling N² in the Fisher information is typically different from zero [34]. In other words, within certain non-restrictive regularity conditions and for linear networks with a large enough number of channels, the value of the pre-factor becomes essentially unaffected by the choice of the non-optimised auxiliary network. This important feature can be exploited for experimental applications, for example, employing the arbitrary non-optimised stage to manipulate the information encoded into the probe regarding the structure of a linear network with multiple unknown parameters, ultimately allowing chosen functions of such parameters to be estimated at Heisenberg-scaling sensitivity [46].

A Two-Channel Network
In this section, we will apply our model for the estimation of distributed parameters to a particular example of a two-channel network, in which the unknown parameter ϕ influences the reflectivity η ϕ of a beam splitter and the magnitudes λ ϕ and λ′ ϕ of two phase shifts (see Figure 4). We can think of the global parameter ϕ as an external physical property, such as the temperature or the magnitude of the electromagnetic field, affecting the components of the network Û ϕ . We will suppose that the functional dependences of the phase shifts λ ϕ , λ′ ϕ and of the reflectivity η ϕ on the true value of the parameter ϕ are known and smooth, whether given by some law of nature or opportunely engineered. With reference to Figure 4, we can write the matrices representing the action of the beam splitter and of the phase shifts as in Equations (24) and (25), respectively, where σ i , i = 1, 2, 3, is the i-th Pauli matrix and I 2 is the 2 × 2 identity matrix, so that the network Û ϕ is represented by the product of these matrices.
Figure 4. Schematic diagram of the two-channel network described in Section 2.4. The linear network Û ϕ is composed of a beam splitter with coefficient η ϕ and two phase shifts of magnitudes λ ϕ and λ′ ϕ . The auxiliary stage V̂ in at the input is ϕ-independent, while the output stage V̂ out is optimized after a classical prior estimation ϕ cl of the parameter. In particular, the quantity α ϕcl = (λ ϕcl − λ′ ϕcl )/2 − π/4 depends on ϕ cl only through the phase shifts λ ϕcl and λ′ ϕcl . Reprinted with permission from ref. [42], © 2021 The Author(s).
We easily notice that |(U ϕ ) 11 |² = cos² η ϕ , which, in general, is different from one and thus does not satisfy condition (15), with the exception of the two values η ϕ = 0, π, which correspond to the absence of mixing between the two modes. As described in the model earlier, we then add two auxiliary stages V in and V out ≡ V out (ϕ cl ), of which only one depends on a prior coarse estimate ϕ cl of ϕ realised with a classical strategy, so that the overall network refocuses the probe. In particular, we choose a ϕ-independent input stage V in and an output stage V out (ϕ cl ) parametrised by the quantity α ϕcl = (λ ϕcl − λ′ ϕcl )/2 − π/4, which can be obtained through a classical prior estimation of ϕ. The resulting transition probability P ϕ in Equation (28) depends on δλ = λ ϕ − λ ϕcl and δλ′ = λ′ ϕ − λ′ ϕcl , the errors in the estimates of λ ϕ and λ′ ϕ due to the imprecision of the classical estimate ϕ cl . We can then easily see that P ϕ in Equation (28) satisfies condition (15), since both errors δλ and δλ′ are of order δϕ = O(1/√N): writing δϕ = h/√N, we obtain from Equation (28) that 1 − P ϕ is of order 1/N. In order to evaluate the Fisher information in Equation (16), we need to calculate both the phase γ ϕ acquired by the probe throughout the whole interferometric evolution and the coefficient ℓ. The phase γ ϕ is easily obtained as the complex phase of (u ϕ ) 11 , given in Equation (30). The transition probability P ϕ can then be written as in Equation (31), so that the factor ℓ = h²(∂ ϕ λ ϕ − ∂ ϕ λ′ ϕ )²/4 appearing in the Fisher information in Equation (16) is easily evaluated by comparing Equations (15) and (31). The Fisher information can then be obtained from Equation (16), with ∂ ϕ γ ϕ given by Equation (30), and Λ(k, ℓ) given by Equation (17), with k fixed by the condition (14) on the local oscillator phase and ℓ = h²(∂ ϕ λ ϕ − ∂ ϕ λ′ ϕ )²/4. We notice from the expression of α ϕcl = (λ ϕcl − λ′ ϕcl )/2 − π/4 that the unknown reflectivity η ϕ of the beam splitter does not influence the refocusing stage V̂ out (ϕ cl ), although it appears in the phase γ ϕ acquired by the probe in Equation (30).
In particular, if the two phases λ ϕ and λ′ ϕ are vanishing, the dependence of α ϕcl , and thus of the refocusing stage V̂ out (ϕ cl ), on the classical estimate ϕ cl of the parameter disappears completely. In other words, for λ ϕ = λ′ ϕ = 0 this network transforms the reflectivity η ϕ of a beam splitter into the magnitude of a phase shift, independently of the value of η ϕ .
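A minimal numerical check of the statement |(U ϕ ) 11 |² = cos² η ϕ follows. The explicit beam-splitter matrix cos η I 2 + i sin η σ 1 and the diagonal phase-shift matrix are our guesses at Equations (24) and (25), and their ordering is an assumption; the key point, however, holds for any diagonal phase shifts, which leave |U 11 | unchanged.

```python
import numpy as np

sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)

def BS(eta):
    # Guessed beam-splitter form, cos(eta) I + i sin(eta) sigma_1,
    # chosen so that |(BS)_11|^2 = cos^2(eta).
    return np.cos(eta) * np.eye(2, dtype=complex) + 1j * np.sin(eta) * sigma1

def PS(lam, lam_p):
    # Two phase shifts, one per arm.
    return np.diag([np.exp(1j * lam), np.exp(1j * lam_p)])

eta, lam, lam_p = 0.4, 0.9, -0.3
U = PS(lam, lam_p) @ BS(eta)     # ordering of the elements is an assumption

# Diagonal phase shifts do not change |U_11|, so the bare network sends a
# photon from port 1 to port 1 with probability cos^2(eta) < 1 in general.
print(abs(U[0, 0]) ** 2, np.cos(eta) ** 2)
```

This is exactly why the bare network violates condition (15) for generic η ϕ , and why the refocusing stage V̂ out (ϕ cl ) is needed.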

Quantum Estimation Based on Multi-Homodyne Measurements
In Section 2 we have presented a scheme for the estimation of a distributed parameter encoded in a multi-channel network, reaching the Heisenberg scaling employing a squeezed vacuum state and homodyne detection performed at a single output channel. In particular, in Section 2.2, we have discussed in depth the conditions in Equations (14) and (15) which need to be satisfied to reach the Heisenberg scaling: Equation (14) imposes a minimum resolution in tuning the local oscillator phase, in order to infer the value of the parameter from the noise of a sufficiently squeezed quadrature. Equation (15) is a requirement on the refocusing of the probe into the only observed channel and, in order to be satisfied, a classical knowledge of the parameter is generally required to engineer the optimal refocusing network. A natural question that arises is whether it is possible to ease these conditions by making some changes to our scheme. In particular, what would happen if homodyne measurements were performed not only at a single channel, but at all the output ports of the network instead? Would condition (15) become looser, allowing us to engineer the optimal auxiliary stages with even less information on the unknown parameter? Moreover, as we have already discussed in Section 1, a non-vanishing displacement in the probe would make it possible to infer the value of the parameter directly from the average of the quadrature with minimum variance, and not from its noise, which may be a more feasible approach in particular experimental scenarios. In this section, we will investigate a model that implements these changes (see Figure 5). We will see that, with these assumptions, not only is there no need to perform a prior estimation of the parameter to optimise the network, but the Heisenberg scaling can be achieved without employing an auxiliary stage in the first place.
Moreover, the presence of displacement in the probe will allow us to perform the estimation by directly measuring the minimum-variance quadrature, relying on the information about the parameter encoded in the average signal of the homodyne, and not in its noise. Figure 5. Scheme of the setup described in Section 3. A squeezed coherent state is injected in the first input port of a network Û_ϕ which depends on a parameter ϕ that is generally distributed among multiple components of the network. Homodyne detection is performed at each of the output ports. Differently from the setup in Figure 2, no auxiliary stage is required to reach the Heisenberg scaling. Reprinted with permission from ref. [43], © 2022 The Author(s).

Setup
We will consider a linear and passive network Û_ϕ, which depends on a single and generally distributed parameter, for example affecting several components of the network, as shown in Figure 1. Once again, Û_ϕ admits a unitary matrix representation U_ϕ obtained through Equation (1). Differently from Section 2, though, we consider as a probe the single-mode squeezed coherent state |ψ_0⟩ = |α, r⟩ ≡ D̂(α)Ŝ(r)|vac⟩, with α = (α, 0, …, 0), r = (r, 0, …, 0) and D̂(α) = e^{α(â_1† − â_1)}, whose mean number of photons is N = N_D + N_S = α² + sinh²r, with α, r > 0; here we can introduce the displacement d, given by α_i = (d_i + i d_{i+M})/√2, i = 1, …, M. We remark that the choice α, r > 0 is a specific (ϕ-independent) condition which is required in displacement-encoding approaches: squeezing and displacement can, in general, have different complex phases, and the condition α, r > 0 assures that the squeezed quadrature (x_{π/2} for r > 0) is orthogonal, i.e., conjugated, to the displaced quadrature (x_0 for α > 0), and hence is the most sensitive to the presence of phases. Moreover, in our model, we will consider homodyne detection in all M output channels of the linear network, so that M quadrature fields x̂_{i,θ_i} are measured, where θ_i is the phase of the i-th local oscillator, i = 1, …, M.
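For concreteness, the probe's Gaussian description (displacement vector and covariance matrix in the xxpp ordering used in Appendix A) can be sketched as follows; the function name and the chosen numbers are ours:

```python
import numpy as np

def input_state(alpha, r, M):
    """Displacement vector and covariance matrix (xxpp ordering) of the probe
    |alpha, r> injected in channel 1 of an M-mode network."""
    d = np.zeros(2 * M)
    d[0] = np.sqrt(2) * alpha             # displacement along x_1
    Gamma = np.eye(2 * M) / 2
    Gamma[0, 0] = np.exp(2 * r) / 2       # anti-squeezed x_1 quadrature
    Gamma[M, M] = np.exp(-2 * r) / 2      # squeezed p_1 quadrature
    return d, Gamma

alpha, r, M = 2.0, 1.0, 3
d, Gamma = input_state(alpha, r, M)
N = alpha**2 + np.sinh(r)**2              # mean photon number N = N_D + N_S
```

All other modes stay in the vacuum, with variance 1/2 on both quadratures.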
Since we are observing all the output ports of the network, but a non-vanishing number of photons is injected only into the first channel, only the first column of the unitary matrix U_ϕ is relevant in our model, consisting of the transition amplitudes of single photons from the first to every channel of the network (see Equation (1)). We can thus employ the parametrisation (U_ϕ)_i1 = √P_i e^{iγ_i}, where P_i is the probability that a single photon is transmitted through the linear network from the first to the i-th channel, and γ_i is the phase that it would acquire (in order to keep the notation lighter, we drop the subscript ϕ in P_i, γ_i, µ and Σ; nonetheless, these quantities, in general, depend on the unknown parameter). Once again, due to the Gaussian nature of the model, the (joint) probability distribution associated with the outcome x of the M homodyne detectors is also Gaussian and reads [18][19][20][21] p_ϕ(x) = exp[−(x − µ)ᵀΣ⁻¹(x − µ)/2]/√((2π)^M det[Σ]) (34). In Equation (34), both the covariance matrix Σ and the mean µ depend on the parameter ϕ. The elements of the covariance matrix Σ are evaluated in Appendix A, and read Σ_ij = ½δ_ij + √(P_i P_j)[cos(γ̃_i − γ̃_j) sinh²r + cos(γ̃_i + γ̃_j) sinh r cosh r] (35), where δ_ij is the Kronecker delta, γ̃_i = γ_i − θ_i is the phase acquired at the output of the i-th channel relative to the correspondent local oscillator, and µ is the mean vector given in Equation (36). The determinant det[Σ] can also be written in compact form (see Appendix A and Equation (37)). For a multivariate Gaussian distribution of the form shown in Equation (34), the Fisher information can be easily evaluated from its definition in Equation (7), employing the expression of the probability distribution p_ϕ(x) in Equation (34); the result, Equation (38), involves the cofactor matrix C = det[Σ]Σ⁻¹ of Σ and the trace Tr[A] = ∑_i A_ii of a matrix A. Compared with the Fisher information shown in Equation (9) for the model described in Section 2, the Fisher information for this setup includes an additional term F_D(ϕ) which depends on the derivative of the average µ with respect to the parameter ϕ.
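A minimal numerical sketch of Equation (35) (the function name and the illustrative values of P_i, γ̃_i and r are ours) builds Σ and checks the low-rank structure exploited in Appendix A, namely that Σ − I_M/2 has rank at most 2:

```python
import numpy as np

def covariance(P, g, r):
    """Covariance matrix of the M homodyne outcomes, Equation (35):
    Sigma_ij = delta_ij/2 + sqrt(P_i P_j) [cos(g_i - g_j) sinh^2 r
                                           + cos(g_i + g_j) sinh r cosh r]."""
    sh, ch = np.sinh(r), np.cosh(r)
    gi, gj = np.meshgrid(g, g, indexing="ij")
    return (0.5 * np.eye(len(P))
            + np.sqrt(np.outer(P, P)) * (np.cos(gi - gj) * sh**2
                                         + np.cos(gi + gj) * sh * ch))

P = np.array([0.5, 0.3, 0.2])      # transition probabilities, summing to one
g = np.array([0.4, -0.1, 1.2])     # relative phases (illustrative values)
Sigma = covariance(P, g, r=1.0)
A = Sigma - 0.5 * np.eye(3)
assert np.linalg.matrix_rank(A) == 2   # rank(Sigma - I/2) <= 2 (Appendix A)
```

The rank bound holds because A is a combination of two outer products built from √P_i cos γ̃_i and √P_i sin γ̃_i.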
Moreover, the contribution F S (ϕ), representing the information on ϕ encoded in the covariance matrix Σ, can be split into two terms, of which the first resembles the Fisher information in Equation (9) once we perform the substitution σ 2 ϕ → det[Σ]: we will show in the following that, in the asymptotic regime of large N S , this is the only contribution of F S (ϕ) which, besides F D (ϕ), reaches the Heisenberg scaling.

Heisenberg Scaling
Similarly to what happens in the setup described in Section 2, the Fisher information in Equation (38) does not generally reach the Heisenberg scaling unless certain conditions are met. To show this, it is necessary to evaluate all the contributions of F(ϕ) in terms of the numbers of photons N_D and N_S in the asymptotic regime. First, it is convenient to express the cofactor matrix C explicitly in terms of the squeezing parameter r. In Appendix B, we see that a closed form for C in such terms exists, Equation (39), in which the auxiliary term S of Equation (40) and the elements Σ_ij shown in Equation (35) appear. We then notice that, in order to analyse the generic asymptotic behaviour of the Fisher information in Equation (38), it suffices to separately examine the asymptotics of the terms µ, Σ_ij, S, det[Σ], and of their derivatives. From Equations (35)-(37), (39) and (40) we see that Σ, S and det[Σ] are all of order N_S in general, while µ is of order √N_D. The same asymptotic behaviours also hold for their respective derivatives, since both ∂_ϕγ̃_i ≡ ∂_ϕγ_i and ∂_ϕP_i are independent of the probe, and thus of N_D and N_S. We can thus see from the expression of F(ϕ) in Equation (38) that its numerators grow at most with N², while the denominator det[Σ] in general grows with N_S. Therefore, in order for F(ϕ) to reach the Heisenberg scaling, namely a scaling of order N², some conditions must be imposed so that the denominators in F_D(ϕ) and F_S(ϕ) do not grow for large N_S. In Appendix C, it is shown that, to achieve the Heisenberg scaling, the determinant det[Σ] must be of order N⁻¹, similarly to what happens to σ²_ϕ in the single-homodyne setup in Section 2.2, and the conditions (41) for this to occur are that each relative phase γ̃_i approaches ±π/2 at a rate of order N_S⁻¹. When these conditions hold, we can introduce the finite (possibly vanishing) quantities k_i = lim_{N_S→∞} N_S(γ̃_i ∓ π/2), so that the determinant det[Σ] reduces to the expression in Equation (42) (see Appendix C), while ∂_ϕ det[Σ], ∂_ϕΣ, ∂_ϕC and C tend to constant values, and ∂_ϕµ scales as √N_D.
We can easily see that this makes only the first two terms of F(ϕ) in Equation (38) dominant for large N_D and N_S. When conditions (41) are met, we can thus neglect the last term in Equation (38), and the Fisher information asymptotically reaches the Heisenberg scaling in N_D N_S and N_S² (see Appendix C and Equation (43)), where we introduced the weighted averages (∂γ)_avg = ∑_i P_i ∂_ϕγ_i and k_avg = ∑_i P_i k_i (44), while ζ(x) = (16x² + 1)⁻¹ and the pre-factor function (8x)²/(16x² + 1)² (45) are positive, even functions of x: ζ reaches its maximum ζ(0) = 1 at x = 0, while the pre-factor function of Equation (45) reaches its maximum value 1 at x = ±1/4.
It is now possible to compare the Fisher information in Equation (43), achieved with the present scheme, with the Fisher information in Equation (16), obtained with the setup for distributed metrology employing a squeezed vacuum state and homodyne detection on a single channel. Since this setup employs a squeezed probe with a non-vanishing displacement, it lends itself to both displacement-based and squeezing-based estimation approaches. This is reflected by the presence of two separate contributions to the Fisher information in Equation (43): the first is given by the variations in the displacement µ, the second by the variations in the determinant det[Σ], with respect to changes in the value of ϕ. Both terms present a pre-factor (∂γ)²_avg, shown in Equation (44), which resembles the pre-factor (∂_ϕγ_ϕ)² in Equation (16) for the single-homodyne counterpart: in particular, (∂γ)_avg is a weighted average of the sensitivities of the complex phases γ_i to changes in the parameter ϕ, each one weighted by the 'fraction' P_i of the probe undergoing the phase shift γ_i. Indeed, we can see that (∂γ)²_avg reduces to the single-homodyne counterpart when the whole probe is refocused into a single channel, say the first, so that P_1 = 1 and P_i = 0 for i = 2, …, M, i.e., (∂γ)²_avg = (∂_ϕγ_1)². In the term of the Fisher information associated with the squeezing encoding in Equation (43), the pre-factor function of Equation (45) evaluated at k_avg coincides in turn with the function of Equation (17) in the ideal case of perfect refocusing of the probe, with all photons being observed, after the substitution k → k_avg, namely the weighted average of the coefficients k_i with the same weights P_i. In other words, this term can be thought of as a generalisation of the Fisher information for a single homodyne due to the presence of multiple observed channels.
Noticeably, this term can vanish only for N_S = 0, in which case the whole expression in Equation (43) vanishes, ruining the Heisenberg scaling, or for k_avg = 0. In fact, we can see from Equation (37) that the condition k_avg = 0, i.e., γ̃_i = ±π/2, namely when the quadratures with minimum variances are measured at each channel, minimises det[Σ], which becomes a stationary point with respect to the variations in ϕ, so that ∂_ϕ det[Σ] in Equation (43) vanishes. On the other hand, the first contribution to Equation (43) is instead a new term, not present in the Fisher information for the single homodyne, deriving from the information on the parameter encoded in the displacement µ. This term achieves the Heisenberg scaling in N_D N_S, in the sense that it reaches the Heisenberg scaling in N = N_D + N_S if both N_D and N_S grow with N, i.e., if N_D = βN and N_S = (1 − β)N with 0 < β < 1 independently of N. The function ζ(k_avg) in Equation (45) does not have roots, hence the first contribution to F vanishes only for N_D = 0, i.e., for a squeezed vacuum as a probe.
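The asymptotic scaling can be probed numerically in the simplest case M = 1 with U_ϕ = e^{iϕ}, a toy model of ours rather than the paper's general network: using the variance of Equation (A16) with P_1 = 1 and the mean √(2N_D) cos γ̃ of the displaced quadrature (consistent with Appendix A), the Gaussian Fisher information F = (∂_ϕµ)²/σ² + (∂_ϕσ²)²/(2σ⁴) grows as N² once the local-oscillator phase is tuned as in Equation (41):

```python
import numpy as np

def fisher(N, k=0.25, beta=0.5):
    """Gaussian Fisher information for the toy case M = 1, U_phi = e^{i phi},
    with N_D = beta*N displacement photons, N_S = (1-beta)*N squeezing
    photons, and the local-oscillator phase tuned as in Equation (41)."""
    N_D, N_S = beta * N, (1 - beta) * N
    r = np.arcsinh(np.sqrt(N_S))              # sinh^2 r = N_S
    g = np.pi / 2 + k / N_S                   # relative phase at the true phi
    mean = lambda g: np.sqrt(2 * N_D) * np.cos(g)
    var = lambda g: 0.5 + np.sinh(r)**2 + np.cos(2 * g) * np.sinh(r) * np.cosh(r)
    h = 1e-7                                  # d(g)/d(phi) = 1 in this toy model
    dmu = (mean(g + h) - mean(g - h)) / (2 * h)
    dvar = (var(g + h) - var(g - h)) / (2 * h)
    return dmu**2 / var(g) + dvar**2 / (2 * var(g)**2)

slope = np.log(fisher(1e6) / fisher(1e4)) / np.log(1e2)
assert abs(slope - 2.0) < 0.1    # F grows as N^2: Heisenberg scaling
```

Both the displacement term and the squeezing term contribute at order N² here, in line with the two contributions to Equation (43).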
Finally, we will now write the Likelihood function L(ϕ; x) for the setup described here, and discuss the maximum-likelihood estimator φ̂_MLE saturating the Cramér-Rao bound shown in Equation (8). After performing ν measurements of the field quadratures x̂_{i,θ_i}, the Likelihood of the outcomes (x_1, …, x_ν) is given by L(ϕ; x_1, …, x_ν) = ∏_{j=1}^ν p_ϕ(x_j) (46), with p_ϕ(x) found in Equation (34). In Appendix D, we show that the maximum-likelihood estimator, a non-trivial solution of the maximisation of the Likelihood function in Equation (46), is implicitly given by the estimator φ̂_MLE which satisfies Equation (47), where Σ is the covariance matrix in Equation (35), and µ the mean vector in Equation (36). This equation cannot, in general, be solved analytically, hence numerical methods are typically needed to find φ̂_MLE. On the other hand, Equation (47) simplifies in certain particular cases. For example, when measuring all the minimum-variance quadratures, so that γ̃_i = ±π/2 in Equation (41), and the probabilities P_i are independent of ϕ, we can see from Equation (35) that ∂_ϕΣ = 0. In this case, the Likelihood Equation becomes Equation (48), whose right-hand side can be seen as a linear combination of the quantities µ − μ̂, where μ̂ = ∑_{j=1}^ν x_j/ν is an estimator of the mean µ. We see here how the displacement of the probe allows us to perform the estimation of ϕ through the sample mean μ̂, i.e., the average of the outcomes of the homodyne measurements. On the other hand, for a squeezed vacuum as a probe, µ = 0, so that Equation (47) becomes Equation (49), in which it is possible to recognise the sample covariance matrix Σ̂ = ∑_{j=1}^ν x_j x_jᵀ/ν, an estimator of the covariance matrix Σ.
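As an illustration of the maximum-likelihood procedure (a toy two-channel Gaussian model of ours, with a ϕ-independent Σ as in the special case leading to Equation (48), not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy model: phi enters only the mean of the Gaussian homodyne record,
# while Sigma is phi-independent.
Sigma = np.array([[0.6, 0.1],
                  [0.1, 0.8]])
mean = lambda phi: np.array([np.cos(phi), np.sin(phi)])

phi_true = 0.9
x = rng.multivariate_normal(mean(phi_true), Sigma, size=2000)

# Maximum-likelihood estimate by a dense grid search over phi.
Sinv = np.linalg.inv(Sigma)
def loglik(phi):
    dev = x - mean(phi)
    return -0.5 * np.einsum("ni,ij,nj->", dev, Sinv, dev)

grid = np.linspace(0.0, np.pi, 2001)
phi_mle = grid[np.argmax([loglik(p) for p in grid])]
assert abs(phi_mle - phi_true) < 0.1   # consistent at this sample size
```

Since ∂_ϕΣ = 0 here, maximising the Likelihood is equivalent to solving the sample-mean condition of Equation (48).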

Conditions for the Heisenberg Scaling
For the model introduced in this section, some considerations regarding the conditions in Equation (41) can also be drawn, especially in light of the features of the single-homodyne, squeezed-vacuum scheme discussed in Section 2.3.
In fact, these conditions do nothing but fix the phases θ_i of the quadratures x̂_{i,θ_i} that need to be measured to achieve the Heisenberg scaling. Regarding the minimum resolution of the homodyne sensors, there is no evident advantage in this setup compared to the single-homodyne scheme, since the resolution required to tune the local oscillators of each channel is still of order 1/N. On the other hand, introducing displacement in the probe allows, as previously discussed, the value of the parameter to be inferred from the information inscribed in the average µ of the measured quadratures. Therefore, it is possible with this scheme to exactly measure the minimum-variance quadratures x̂_{i,γ_i±π/2}, a possibility that was prevented in the previous scheme by the requirement k ≠ 0 in Equation (14). Although the choice k_i = 0 for i = 1, …, M sets to zero the contribution F_S(ϕ) in Equation (43), associated with the information encoded in the covariance matrix Σ, the first contribution F_D(ϕ) of the Fisher information still reaches the Heisenberg scaling.
Another interesting feature of this protocol, which differentiates it from its single-homodyne counterpart, is that it does not require any adaptation of the network: since every output channel is observed, no condition on the transition probabilities P_i is required. Therefore, not only is there no requirement for a ϕ-dependent auxiliary stage, but there is no need for an auxiliary stage to reach the Heisenberg scaling in the first place. However, the precision of the estimation is still affected by the network through the terms k_avg and (∂γ)_avg, which appear in the constant factors multiplying the scaling in the Fisher information. In fact, we can see from Equation (44) that these two quantities depend on the transition probabilities P_i and on the derivatives ∂_ϕγ_i of the complex phases. In particular, F(ϕ) can vanish for exceptionally poorly conceived networks, for which (∂γ)_avg = 0: for example, a network for which γ_i is independent of ϕ for all values of i such that P_i ≠ 0 is associated with a vanishing factor (∂γ)_avg, as we can see from its definition in Equation (44). In this case, adding a ϕ-independent auxiliary network V, either at the input or at the output of U_ϕ, would modify the transition amplitudes of the overall interferometric evolution, and ultimately yield a non-vanishing value of (∂γ)_avg.
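A two-channel toy example of ours makes the last point concrete: a network whose populated path acquires no ϕ-dependent phase has (∂γ)_avg = 0, while a ϕ-independent 50:50 input stage restores the sensitivity:

```python
import numpy as np

phi = 0.8
U = lambda p: np.diag([1, np.exp(1j * p)])    # phase only in channel 2

# Probe injected in channel 1 only: the populated path acquires no
# phi-dependent phase, so (d gamma)_avg vanishes.
col = U(phi)[:, 0]
assert np.allclose(np.abs(col)**2, [1, 0])

# A phi-independent 50:50 stage V at the input spreads the probe over both
# channels, so a populated path now carries the phi-dependent phase.
V = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
col2 = (U(phi) @ V)[:, 0]
P, gamma = np.abs(col2)**2, np.angle(col2)
assert np.allclose(P, [0.5, 0.5]) and np.isclose(gamma[1], phi)
```

After the auxiliary stage, P_2 = 1/2 and γ_2 = ϕ, so the weighted average of the phase sensitivities is no longer zero.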

Conclusions
The recent advances in quantum metrology have made possible the realisation of protocols achieving super-sensitivity in the estimation of optical lengths, time delays and space-time distortions due to gravitational waves, refractive indices of given materials, density and thickness of biological samples down to the nanometre scale, temperatures, polarisations of light, magnitudes of fields and their gradients, and more. However, current quantum sensing technologies still present some limitations, whether caused by the requirement of adaptively optimising the estimation procedure to the unknown value of the parameter that needs to be measured, by the fragility of the metrological resources which yield super-sensitivity, or by the narrow operating range of the protocols. Moreover, most standard approaches to distributed quantum metrology suffer from a lack of universal estimation schemes, namely schemes which can operate independently of the nature and value of the unknown parameter, and of the unitary evolution which encodes the value of the parameter into the probe employed for the estimation.
In this work, we have reviewed in detail two schemes which address these limitations [42,43]. By employing analyses based on the Cramér-Rao bound, i.e., the ultimate precision achievable for a given estimation scheme, and on the Fisher information, we were able to assess the super-sensitivity of various feasible metrological setups, always achievable in the regime of large statistical samples through the maximum-likelihood estimator. We showed that, without making any assumptions on the structure of the multi-channel passive and linear network encoding the unknown parameter, which can as well be distributed among multiple components of the interferometer (such as a temperature affecting the network in a distributed manner), it is possible to relax the adaptivity of the network to a requirement of classical, i.e., not quantum-limited, knowledge of the parameter. This prior knowledge serves to engineer an auxiliary stage needed to perform the super-sensitive estimation with a single squeezed-vacuum state and homodyne detection at a single output port. Moreover, the adaptivity of the network can be completely circumvented when all the output ports of the network are observed with homodyne detectors. We discussed how this allows us to conceive estimation schemes at the Heisenberg limit which can be performed with a single optimisation step, when classical knowledge of the parameter is needed to engineer the auxiliary stage, or without any optimisation at all, when no prior knowledge is required.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Probability Distributions from Homodyne Measurements
In this appendix, we will obtain the probability density functions which govern the outcomes of homodyne detections performed on a single-mode squeezed state |α, r⟩ = D̂(α)Ŝ(r)|vac⟩ injected in the first port of an M-channel passive and linear network Û, with α = (α, 0, …, 0) = (√N_D, 0, …, 0) and r = (r, 0, …, 0), so that N_S = sinh²r, with α, r > 0, and N = N_D + N_S the sum of the average numbers of photons in the displacement and in the squeezing of the state |α, r⟩. We refer to refs. [18][19][20][21] for reviews on Gaussian states, and on how to describe and manipulate them mathematically. The state |α, r⟩ is a Gaussian state which is completely described, through its Wigner distribution, by its vector of displacements d = √2(α, 0, …, 0) and its matrix of covariances Γ = diag(e^{2r}, 1, …, 1, e^{−2r}, 1, …, 1)/2. Moreover, passive and linear networks preserve the Gaussian nature of |α, r⟩. For this reason, we will first obtain in full generality the final displacement d_U and covariance matrix Γ_U of the state Û|α, r⟩. Then, we will marginalise the Wigner distribution, rotated by the symplectic and orthogonal matrix associated with the phases Θ = diag(θ) = diag(θ_1, …, θ_M) of the local oscillators of the homodyne detectors. In doing so, we will be able to obtain the expressions in Equations (6) and (35)-(37). The action of a passive and linear unitary Û on a state is described by a symplectic rotation of the state's Wigner distribution [18][19][20][21].
The Wigner distribution W_U(z) of the state Û|α, r⟩ after the interferometric evolution Û is thus obtained by rotating the initial Wigner distribution W(z) found in Equation (A1) with the orthogonal and symplectic matrix R = ( Re U  −Im U ; Im U  Re U ) (A3), where U is the unitary matrix representing the transition amplitudes, which can be found through Equation (1). Since the probe in our model is Gaussian, the transformation of the state is completely described by the rotations d_U = Rd of the displacement and Γ_U = RΓRᵀ of the covariance matrix of the initial state, with Γ = diag(e^{2r}, 1, …, 1, e^{−2r}, 1, …, 1)/2 and d = √2(α, 0, …, 0). A straightforward calculation yields d_U and Γ_U in Equations (A5) and (A6), in terms of the M × M matrices defined in Equation (A7). We can see that the only elements of U entering Equation (A5) are the ones in the first column U_i1, since these are all the transition amplitudes from the only populated input port (namely the first) to every output port. We can easily see that the same is true for the matrices in Equation (A7). In fact, exploiting the unitarity relations ∑_k U_ik U*_jk = δ_ij, where δ_ij is the Kronecker delta, we can evaluate these matrices explicitly. We can thus parametrise only the first column of U, introducing the quantities P_i = |U_i1|² and γ_i = arg U_i1 for i = 1, …, M, which are respectively the transition probabilities from the first input to the i-th output port, and the complex phases of the relevant transition amplitudes, so that U_i1 = √P_i e^{iγ_i}. We can then rewrite the displacement d_U in Equation (A5) and the covariance matrix Γ_U in Equation (A6) in terms of P_i and γ_i, with, in particular, ΔX²_ij = ½δ_ij + √(P_i P_j)[cos(γ_i − γ_j) sinh²r + cos(γ_i + γ_j) sinh r cosh r] (A11a). Finally, we notice that performing homodyne detection of the quadratures x̂_{i,θ_i}, where θ = (θ_1, …, θ_M) are the phases of the local oscillators at each output channel, corresponds to applying a further unitary rotation U(θ) = diag(e^{−iθ_1}, …, e^{−iθ_M}) to the state Û|α, r⟩.
This corresponds to substituting U → U(θ)U in the expression of R in Equation (A3), whose effect is to add a phase e^{−iθ_i} to every element of the i-th row of U. Thus, if we define γ̃_i = γ_i − θ_i, we can write the displacement d_θ and the covariance matrix Γ_θ of the (accessory) state Û(θ)Û|α, r⟩, given in Equations (A12) and (A13). If we are performing homodyne detection only on a single channel, say the first, as is done, for example, in Section 2, we need to first trace away the M − 1 channels we are not observing. The probability density function p(x) is thus given by the marginal of the Wigner distribution of the state Û(θ)Û|α, r⟩ associated with the first symplectic variable x_1: equivalently, it is a Gaussian distribution with average µ given by the first element of d_θ in Equation (A12), and variance σ² given by the first-row, first-column element of Γ_θ in Equation (A13), σ² = ½ + P_1(sinh²r + cos(2γ̃_1) sinh r cosh r) (A16).
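Equation (A16) can be checked directly; in the sketch below (the function name is ours), perfect refocusing with the local oscillator set at the squeezed quadrature recovers the squeezed-vacuum variance e^{−2r}/2, while P_1 = 0 leaves the vacuum variance 1/2:

```python
import numpy as np

def var_single(P1, g1, r):
    """Single-channel homodyne variance, Equation (A16), with g1 = gamma_1 - theta_1."""
    return 0.5 + P1 * (np.sinh(r)**2 + np.cos(2 * g1) * np.sinh(r) * np.cosh(r))

r = 1.3
# Perfect refocusing (P1 = 1) with the local oscillator at the squeezed
# quadrature (g1 = pi/2) recovers the squeezed-vacuum variance e^{-2r}/2:
assert np.isclose(var_single(1.0, np.pi / 2, r), np.exp(-2 * r) / 2)
# No probe in the observed channel (P1 = 0) leaves the vacuum variance 1/2:
assert np.isclose(var_single(0.0, 0.3, r), 0.5)
```

The first identity follows from ½ + sinh²r − sinh r cosh r = e^{−2r}/2.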
For a squeezed vacuum state, α = 0 and we recognise the expression of the variance in Equation (6) from Equation (A16).
If, instead, we are observing all M output channels of the network Û, the probability density function p(x) will be a Gaussian distribution with average µ given by the first M elements of d_θ, from which we recognise Equation (36), while the covariance matrix Σ is given by the first M rows and columns of Γ_θ, i.e., by the matrix in Equation (A14a), Σ_ij = ½δ_ij + √(P_i P_j)[cos(γ̃_i − γ̃_j) sinh²r + cos(γ̃_i + γ̃_j) sinh r cosh r], from which we obtain the expression in Equation (35).
To evaluate the determinant det[Σ] in Equation (37), we first notice that the covariance matrix Γ of the initial state |α, r⟩ can be written as the sum Γ ≡ I_2M/2 + K_0, where I_2M is the 2M × 2M identity matrix and K_0 = Γ − I_2M/2 is a diagonal matrix of rank 2. Since the rank is invariant under orthogonal rotations, the same holds true for Γ_θ = RΓRᵀ ≡ I_2M/2 + K, with rank(K) = 2. By definition of rank, none of the sub-matrices of K can have rank greater than 2; hence we can write Σ = I_M/2 + A, with rank(A) ≤ 2, since A is a sub-matrix of K: A_ij = √(P_i P_j)[cos(γ̃_i − γ̃_j) sinh²r + cos(γ̃_i + γ̃_j) sinh r cosh r] (A20). We can thus apply the results in Appendix E to Σ and write det[Σ] = |I_M/2 + A| as a sum of determinants of the matrices obtained by replacing any number of columns of I_M/2 with the respective columns of A, |I_M/2 + A| = (1/2)^M + (1/2)^{M−1} ∑_i A_ii + (1/2)^{M−2} ∑_{i<j} (A_ii A_jj − A_ij²) (A21), where the first term is the determinant of I_M/2, the terms in the first summation are determinants of the matrices obtained by substituting the i-th column of I_M/2 with the i-th column of A, and the terms in the last summation arise from substituting the i-th and j-th columns, while also exploiting the symmetry of A. Noticeably, since rank(A) ≤ 2, all the contributions involving the replacement of three or more columns vanish by the definition of the rank of a matrix. Replacing the expression of A found in Equation (A20) into Equation (A21), the expression in Equation (37) is then easily obtained after some elementary trigonometry.
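The truncated column-replacement expansion of |I_M/2 + A| can be verified numerically for a random symmetric rank-2 matrix (an illustrative check of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5
u, v = rng.normal(size=M), rng.normal(size=M)
A = np.outer(u, u) - 0.5 * np.outer(v, v)      # symmetric, rank(A) = 2

# Equation (A21): replacements of at most two columns of I/2 by columns of A.
det_direct = np.linalg.det(0.5 * np.eye(M) + A)
det_expand = 0.5**M + 0.5**(M - 1) * np.trace(A)
for i in range(M):
    for j in range(i + 1, M):
        det_expand += 0.5**(M - 2) * (A[i, i] * A[j, j] - A[i, j]**2)
assert np.isclose(det_direct, det_expand)
```

Every term replacing three or more columns would involve a 3 × 3 minor of A, which vanishes since rank(A) = 2.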
The evaluation of the Fisher information thus reduces to the evaluation of the expectation values of a polynomial in (x − µ), for which we can exploit standard results on Gaussian integrals, from which we see that only even powers of the term (x − µ) give a non-vanishing contribution. By substituting the expressions in Equation (A24) into Equation (A25), and exploiting Jacobi's formula for the derivative of the determinant of a square matrix, ∂_ϕ det[Σ] = det[Σ] Tr[Σ⁻¹ ∂_ϕΣ] (A26), we see that most of the terms in Equation (A25) cancel out, yielding the Fisher information in Equation (A27). We can easily reduce Equation (A27) to the Fisher information in Equation (9) for the single-homodyne case with µ = 0.
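Jacobi's formula in Equation (A26) is easy to confirm by finite differences on a small ϕ-dependent covariance (a toy matrix of ours):

```python
import numpy as np

# Finite-difference check of Jacobi's formula, Equation (A26):
# d/dphi det[Sigma(phi)] = det[Sigma] * Tr[Sigma^{-1} dSigma/dphi].
def S(phi):
    return np.array([[1.0 + phi**2, 0.3 * np.sin(phi)],
                     [0.3 * np.sin(phi), 2.0 - 0.5 * np.cos(phi)]])

phi, h = 0.7, 1e-6
dS = (S(phi + h) - S(phi - h)) / (2 * h)
lhs = (np.linalg.det(S(phi + h)) - np.linalg.det(S(phi - h))) / (2 * h)
rhs = np.linalg.det(S(phi)) * np.trace(np.linalg.inv(S(phi)) @ dS)
assert np.isclose(lhs, rhs, rtol=1e-5)
```

The same identity is what collapses the intermediate Gaussian-integral terms into the compact form of Equation (A27).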
To obtain the Fisher information in Equation (38), we first exploit the relation Σ⁻¹ = C/det[Σ] (A28), expressing the inverse covariance matrix in terms of its cofactor matrix C, where we have exploited the symmetry of Σ so that Cᵀ = C. We can thus rewrite the Fisher information in the single-parameter case as in Equation (A29). We can recognise in the second term of Equation (A29) Jacobi's formula for the derivative of the determinant in Equation (A26), which allows us to obtain the expression for the Fisher information shown in Equation (38). In order to elicit the cofactor matrix C in terms of the elements of Σ, and thus obtain Equation (39), we first need to make some observations. First, the (s, t)-cofactor C_st is defined as the determinant of the (L − 1) × (L − 1) sub-matrix of Σ obtained by deleting its s-th row and t-th column, multiplied by (−1)^{s+t}. This can equivalently be thought of as the determinant of the L × L matrix Σ_[s,t],1, where we denote with X_[s,t],n the matrix obtained from the matrix X by replacing all its elements in the s-th row and in the t-th column with zeros, with the exception of the element (s, t), which is replaced by n. For the particular case of the covariance matrix Σ in Equation (35), as discussed in Appendix A, we can write Σ = I_M/2 + A, with A a symmetric matrix such that rank(A) = ρ ≤ 2, given in Equation (A20). Thus, we can write the (s, t)-cofactor as in Equation (A32). We notice that the matrix (I_M/2)_[s,s],0 is a diagonal matrix with a single zero eigenvalue, i.e., the s-th, and thus each contribution to C_ss is non-vanishing only if the s-th column of (I_M/2)_[s,s],0, a column of only zeros, is replaced. Moreover, the s-th column of A_[s,s],1 is also a column of all zeros, with the exception of the s-th element, which is equal to 1. We thus obtain Equation (A33), which is the sum of the terms obtained by substituting the s-th, the s-th and i-th, and the s-th, i-th and j-th columns, respectively. Noticeably, replacing more than 3 columns yields vanishing contributions, since rank(A_[s,s],1) ≤ 3.
When s ≠ t, (I_M/2)_[s,t],0 is still a diagonal matrix, but with two zeros on the diagonal, more precisely the s-th and the t-th elements. Both the s-th and the t-th columns are thus columns of zeros, and all non-vanishing contributions to C_st must replace these two columns. For example, the only contribution obtained by swapping the s-th and t-th columns is of the type shown in Equation (A34), exploiting at the same time the symmetry of A. The contributions obtained by swapping the s-th, t-th and i-th columns, with i ≠ s, t, are of the type shown in Equation (A35), where once again we exploit the symmetry of A. Once again, substituting more than 3 columns yields no contributions, since rank(A_[s,t],1) ≤ 3. The final expression for C_st, s ≠ t, is given in Equation (A36). Replacing in Equations (A33) and (A36) the definition A = Σ − I_M/2, it is straightforward to obtain Equation (39).

Appendix C. Asymptotic Analyses of Gaussian Metrology
In this appendix, we will perform all the asymptotic analyses for a large number of photons N in the setups presented in Sections 2 and 3.

Appendix C.1. Single Homodyne
Here, we evaluate the asymptotic expressions of the Fisher information in Equation (16), showing that the conditions in Equations (14) and (15) yield the Heisenberg scaling for the estimation of the respective quantities of interest.
As shown in Equations (10) and (11), the dependence of the variance on the parameter ϕ only appears through the transition probability P_ϕ and the acquired complex phase γ_ϕ, as made explicit in Equations (A37) and (A38), where ∂_P and ∂_γ represent differentiation with respect to P_ϕ and γ_ϕ. To achieve the Heisenberg scaling, some conditions must be imposed so that the variance in Equation (A37) does not grow with N. The only option to do so without ruining the sensitivity of the setup is to require that γ_ϕ − θ approaches π/2, as we can see from Equation (A37). In particular, we impose condition (14), and evaluate the variance in Equation (A37) and its gradient in Equation (A38) in the large-N limit. Then, by imposing condition (15) on P_ϕ, we obtain the asymptotic expressions in Equation (A42). By substituting the expressions in Equation (A42) into the Fisher information in Equation (9), we can finally evaluate its asymptotic expression, Equation (A43), with the pre-factor in Equation (A44) a positive, N-independent quantity which reduces to (8k)²/(16k² + 1)² in the ideal case of perfect refocusing.

Appendix C.2. Multiple Homodyne
In this appendix, we will study the asymptotic regime of the Fisher information in Equation (38), for large N = N_S + N_D = sinh²r + α². We will show that conditions (41) are necessary to reach the Heisenberg scaling, and that, when they hold, the asymptotic expression for the Fisher information is the one shown in Equation (43). As discussed in Section 3.2, all terms in the numerators of the Fisher information in Equation (A45) are at most of the order N_S² or N_S N_D. We thus need to study the asymptotic behaviour of the determinant det[Σ] in Equation (37), which appears in the denominators of the Fisher information, and find which conditions prevent det[Σ] from growing with N_S as well, which would ruin the Heisenberg scaling. In particular, in order to reach the Heisenberg scaling, namely a scaling of the order of N² in the Fisher information, the determinant det[Σ] must be at most of order O(1).
Thus, we first focus our attention on det[Σ] in Equation (37). In particular, we will suppose that the asymptotic behaviour of the complex phases γ̃_i for large N_S is given by γ̃_i = γ̃_0i + k_i N_S^{−α}, with k_i ∈ R independent of N_S and α > 0 (notice that this condition is not restrictive, since if γ̃_i were to grow with N_S, i.e., α < 0, it would yield an oscillating asymptotic behaviour of det[Σ] and of every other term appearing in the Fisher information). By expanding in powers of N_S the terms in which the squeezing parameter r appears in Equation (A46), we obtain det[Σ] = D_1 N_S + D_2 + D_3 N_S^{−1} + o(N_S^{−1}) (A47), where the coefficients D_1, D_2 and D_3, which depend on the phases γ̃_i and on the probabilities P_i, are given in Equation (A48). To prevent the scaling of the determinant det[Σ] with N_S, D_1 must be set as equal to, or tending to, zero. After some trigonometry, and exploiting the fact that ∑_i P_i = 1 due to the unitarity of the network, we can rewrite D_1 in a form which tends to zero only if both terms in the parenthesis tend to zero, i.e., when γ̃_0i = π/2 + nπ, with n ∈ Z. To analyse the order for large N_S of the determinant when the condition γ̃_0i = π/2 + nπ holds, and thus γ̃_i = π/2 + nπ + k_i N_S^{−α}, we need to analyse the behaviour of each term in Equation (A48). Since cos(2γ̃_i) ∼ −1 + 2k_i² N_S^{−2α}, both D_1 and D_2 scale with N_S^{−2α}, while D_3 is asymptotically constant. Thus, we find the asymptotic behaviour of det[Σ] through Equation (A47): the term D_2 is always negligible with respect to D_1 N_S, while for α ≤ 1 the term D_1 N_S dominates over D_3/N_S, so that det[Σ] is of order N_S^{1−2α}, and for α > 1 the term D_3/N_S instead dominates D_1 N_S, so that the determinant scales with N_S^{−1}; this behaviour is summarised in Equation (A50). Noticeably, the determinant stops growing with N_S only for α ≥ 1/2. We thus now study the asymptotic behaviour of the numerators appearing in the Fisher information in Equation (A45) when the condition γ̃_i = π/2 + k_i N_S^{−α}, with α ≥ 1/2, holds. We will perform the same analysis done for det[Σ] earlier, considering only the dominating term for every value of α.
We first obtain the derivative of Σ from Equation (35), substituting γ_i = π/2 + k_i N_S^{-α} with α ≥ 1/2, and we notice that each element of ∂_ϕΣ scales at most with N_S^{1−α} for 1/2 ≤ α < 1, while it becomes constant for α ≥ 1. We then analyse the auxiliary term (40), of which we evaluate the derivative, always under the condition γ_i = π/2 + k_i N_S^{-α} with α ≥ 1/2, and find that it scales at most with N_S^{1−α} for large N_S. Moreover, the covariance matrix Σ in (35) asymptotically approaches the expression in Equation (A54), which is in general constant for large N_S. Inserting Equations (A52) and (A54) into the cofactor matrix (39), we obtain the asymptotic form of the FI: the term F_D(ϕ) reaches the Heisenberg scaling for every value of α ≥ 1, which means that the choice γ_i = π/2, corresponding to α = +∞, still yields the Heisenberg scaling. On the other hand, α must be equal to 1 in order for the first term of F_S(ϕ) to achieve the Heisenberg scaling. The consequences and the physical implications of this difference are discussed in depth in Section 3.2. The conditions to reach the Heisenberg scaling can thus be condensed into the requirement on the complex phases γ_i, for i = 1, . . . , M, shown in Equation (41), or equivalently that γ_i ∼ π/2 + k_i/N_S asymptotically, with k_i ∈ R independent of N_S. Moreover, we have seen that the only relevant terms under this condition are the first two in Equation (A45), and that only for α = 1 is the first term of F_S(ϕ) non-vanishing.
The maximum-likelihood estimator is obtained by maximising the likelihood function in Equation (A65), or equivalently its logarithm. Applying the maximisation to the likelihood function in Equation (A65), we obtain the equation 0 = ∂_ϕ ln L(ϕ; x_1, . . . , x_ν), where we exploited Jacobi's formula for the derivative of the determinant of a matrix, shown in Equation (A26), the identity in Equation (A28), and the symmetry of Σ. Equation (A66) is the expression of the likelihood equation (47). This can be easily specialised to the case of a univariate Gaussian distribution, with M = 1 and vanishing mean µ, for which Σ → σ^2 and x_j → x_j, obtaining the expression shown in Equation (19).
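As a sketch of this univariate specialisation, the snippet below maximises the log-likelihood of ν i.i.d. zero-mean Gaussian samples whose variance σ^2(ϕ) depends on the unknown parameter. The dependence σ^2(ϕ) = 1 + ϕ^2 and all numerical values are illustrative assumptions, not the model of the main text; the point is that the maximum satisfies σ^2(ϕ̂) = (1/ν) ∑_j x_j^2, as dictated by the univariate likelihood equation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter dependence of the variance (placeholder choice).
def sigma2(phi):
    return 1.0 + phi ** 2

phi_true = 0.7
nu = 50_000  # number of samples
x = rng.normal(0.0, np.sqrt(sigma2(phi_true)), size=nu)
sum_x2 = np.sum(x ** 2)

# Log-likelihood of nu i.i.d. zero-mean Gaussian samples, up to a constant:
# ln L(phi) = -(nu/2) * ln sigma2(phi) - sum_j x_j**2 / (2 * sigma2(phi))
def log_likelihood(phi):
    s2 = sigma2(phi)
    return -0.5 * nu * np.log(s2) - sum_x2 / (2.0 * s2)

# The likelihood equation d(ln L)/dphi = 0 reduces, wherever sigma2'(phi)
# is non-zero, to sigma2(phi_hat) = (1/nu) * sum_j x_j**2.
# A grid search is enough for this one-dimensional sketch.
grid = np.linspace(0.01, 2.0, 4001)
phi_hat = grid[np.argmax(log_likelihood(grid))]
print(phi_hat)  # close to phi_true = 0.7
```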

Appendix E. Formulas for the Determinant of a Sum of Two Matrices
Let us consider an L × L matrix Z of the form Z = D + W, where D = diag(d_1, . . . , d_L) is a diagonal matrix and rank(W) = ρ ≤ L. In this appendix, we show a convenient method to express the determinant det[Z] in terms of the elements of W, useful for the evaluation of the determinant in Equation (37) and of the cofactor matrix in Equation (39).
We exploit the identity [47]

det[D + W] = ∑_{α=0}^{L} Θ_α(D, W),

where Θ_α(X, Y) is the sum of the determinants of all the matrices obtained by replacing any set of α columns (rows) of X with the α columns (rows) of Y at the same positions. For example, for two 3 × 3 matrices, Θ_1(D, W) is the sum of the three determinants obtained by replacing a single column of D with the corresponding column of W, while Θ_2(D, W) is the sum of the three determinants obtained by replacing two columns of D with the corresponding columns of W. Since rank(W) = ρ, the determinant of any α × α sub-matrix of W is zero if α > ρ, so that the sum can be truncated at α = ρ:

det[D + W] = ∑_{α=0}^{ρ} Θ_α(D, W).

Let us now make explicit the first, easier terms of the summation. In particular, with the goal of applying this expression to the covariance matrix Σ in Equation (35) and to the cofactor matrix C in Equation (39), which can be expressed as the sum of a diagonal matrix and a rank-two and a rank-three matrix, respectively (see Equations (A19), (A20) and (A32)), we will explicitly report the first three terms Θ_0(D, W), Θ_1(D, W) and Θ_2(D, W).
For α = 0, no columns of D are replaced, hence Θ_0(D, W) = det[D] = ∏_k d_k. We also notice that, if at least one of the eigenvalues of D is zero, this term vanishes.
For α = 1, Θ_1(D, W) is the sum of the determinants of the matrices of the form in Equation (A73), obtained by replacing the i-th column of D with the i-th column of W, with i = 1, . . . , L. The determinant of a matrix of the type in Equation (A73) is straightforward, and reduces to W_ii ∏_{k≠i} d_k. Thus, in general, Θ_1(D, W) = ∑_i W_ii ∏_{k≠i} d_k.
A key observation to make, in view of the application to the covariance matrix Σ and to the cofactor matrix C, is that if two or more eigenvalues of D are zero, all these determinants are zero, and thus Θ_1(D, W) = 0; if, instead, a single eigenvalue is zero, say d_j = 0, only one of the determinants in Θ_1(D, W) is non-vanishing, namely the determinant of the matrix obtained by replacing exactly the j-th column of D. In this case, for d_j = 0, we have Θ_1(D, W) = W_jj ∏_{k≠j} d_k.
Let us lastly consider the case α = 2. The matrices contributing to Θ_2(D, W) are of the form in Equation (A74), obtained by replacing the i-th and i′-th columns of D with the corresponding columns of W, with i < i′ = 1, . . . , L. Once again, the determinants of the matrices of the type in Equation (A74) are easy to evaluate, and read det[W^(i,i′)] ∏_{k∉{i,i′}} d_k, where W^(i,i′) is the 2 × 2 sub-matrix of W obtained by selecting the rows and columns with indices i and i′. We notice that det[W^(i,i′)] = det[W^(i′,i)]. Thus, we can rewrite the term Θ_2(D, W) as the sum Θ_2(D, W) = ∑_i ∑_{i′>i} det[W^(i,i′)] ∏_{k∉{i,i′}} d_k. Once again, key observations can be made: if the diagonal matrix D has at least three null eigenvalues, then Θ_2(D, W) vanishes. If only two eigenvalues are zero, e.g., d_j = d_{j′} = 0 with j ≠ j′, then there is only one non-vanishing contribution to Θ_2(D, W), given by the matrix obtained by substituting the j-th and j′-th columns of D, and in this case we have Θ_2(D, W) = det[W^(j,j′)] ∏_{k∉{j,j′}} d_k. If only one eigenvalue is zero, namely d_j = 0, then Θ_2(D, W) is given by the sum of all the determinants of the matrices where the j-th column of D has been replaced, namely Θ_2(D, W) = ∑_{i≠j} det[W^(i,j)] ∏_{k∉{i,j}} d_k.
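The shortcut for two vanishing eigenvalues can be verified numerically. The sketch below builds an illustrative rank-two W from two outer products (so that Θ_α = 0 for α > 2) and a diagonal D with d_j = d_{j′} = 0, and checks that the full determinant det[D + W] reduces to the single surviving term det[W^(j,j′)] ∏_{k∉{j,j′}} d_k; the matrix size and entries are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 5

# Rank-two W as a sum of two outer products.
a, b, c, e = (rng.normal(size=L) for _ in range(4))
W = np.outer(a, b) + np.outer(c, e)

# Diagonal D with exactly two zero eigenvalues (0-based indices j, j').
d = rng.uniform(1.0, 2.0, size=L)
j, jp = 1, 3
d[j] = d[jp] = 0.0
D = np.diag(d)

# Theta_0 = Theta_1 = 0 here (every product of L or L-1 eigenvalues
# contains a zero), and Theta_alpha = 0 for alpha > 2 since rank(W) = 2,
# so only the (j, j') term of Theta_2 survives.
sub = W[np.ix_([j, jp], [j, jp])]           # 2x2 sub-matrix W^(j,j')
others = [k for k in range(L) if k not in (j, jp)]
shortcut = np.linalg.det(sub) * np.prod(d[others])

print(np.isclose(np.linalg.det(D + W), shortcut))  # True
```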
It is then possible to extend the same method, with similar considerations, to every value of α, eventually obtaining the compact expression for the determinant in Equation (A68):

det[D + W] = ∑_{α=0}^{ρ} ∑_{γ∈C_L^α} det[W^(γ)] ∏_{k∉γ} d_k,

where C_L^α is the set of all the combinations without repetition of α columns out of L, and W^(γ) denotes the α × α sub-matrix of W obtained by selecting the rows and columns with indices γ_1, . . . , γ_α.
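The compact expression admits a direct numerical check against a full determinant computation. The following sketch uses an illustrative 6 × 6 diagonal D plus a rank-three W (sizes and random entries are arbitrary assumptions):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
L, rho = 6, 3

# Random diagonal D and random rank-rho W (sum of rho outer products).
d = rng.normal(size=L)
D = np.diag(d)
W = sum(np.outer(rng.normal(size=L), rng.normal(size=L)) for _ in range(rho))

# det[D + W] = sum over alpha = 0..rho and over column subsets gamma of
# size alpha of det[W^(gamma)] * prod_{k not in gamma} d_k.
total = 0.0
for alpha in range(rho + 1):
    for gamma in combinations(range(L), alpha):
        sub = W[np.ix_(gamma, gamma)]       # alpha x alpha sub-matrix W^(gamma)
        rest = [k for k in range(L) if k not in gamma]
        total += np.linalg.det(sub) * np.prod(d[rest])

print(np.isclose(total, np.linalg.det(D + W)))  # True
```

Note that the α = 0 term is handled automatically: `np.linalg.det` of an empty matrix returns 1, reproducing Θ_0(D, W) = ∏_k d_k.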