Frequentist and Bayesian Quantum Phase Estimation

Frequentist and Bayesian phase estimation strategies lead to conceptually different results on the state of knowledge about the true value of an unknown parameter. We compare the two frameworks and their sensitivity bounds for the estimation of an interferometric phase shift limited by quantum noise, considering both a fixed and a fluctuating parameter. We point out that frequentist precision bounds, such as the Cramér–Rao bound, do not apply to Bayesian strategies and vice versa. In particular, we show that the Bayesian variance can overcome the frequentist Cramér–Rao bound, which appears to be a paradoxical result if the conceptual differences between the two approaches are overlooked. Similarly, bounds for fluctuating parameters make no statement about the estimation of a fixed parameter.


Introduction
The estimation of a phase shift using interferometric techniques is at the core of metrology and sensing [1][2][3]. Applications range from the definition of the standard of time [4] to the detection of gravitational waves [5,6]. The general problem can be concisely stated as the search for optimal strategies to minimize the phase estimation uncertainty. The noise that limits the achievable phase sensitivity can have a "classical" or a "quantum" nature. Classical noise originates from the coupling of the interferometer with some external source of disturbance, like seismic vibrations, parasitic magnetic fields or from incoherent interactions within the interferometer. Such noise can, in principle, be arbitrarily reduced, e.g., by shielding the interferometer from external noise or by tuning interaction parameters to ensure a fully coherent time evolution. The second source of uncertainty has an irreducible quantum origin [7,8]. Quantum noise cannot be fully suppressed, even in the idealized case of the creation and manipulation of pure quantum states. Using classically-correlated probe states, it is possible to reach the so-called shot noise or standard quantum limit, which is the limiting factor for the current generation of interferometers and sensors [9][10][11][12]. Strategies involving probe states characterized by squeezed quadratures [13] or entanglement between particles [14][15][16][17][18][19] are able to overcome the shot noise, the ultimate quantum bound being the so-called Heisenberg limit. Quantum noise reduction in phase estimation has been demonstrated in several proof-of-principle experiments with atoms and photons [20,21].
In the limit of a large number of repeated measurements, the sensitivities reached by the frequentist and Bayesian methods generally agree. This fact has very often fostered the belief that the two paradigms can be used interchangeably in phase estimation theory, without acknowledging their irreconcilable nature. Overlooking these differences is not only conceptually inconsistent but can even create paradoxes, such as the existence of ultimate sensitivity bounds proven in one paradigm that can be violated in the other.
In this manuscript, we directly compare the frequentist and the Bayesian parameter estimation theories. We study different sensitivity bounds obtained in the two frameworks and highlight the conceptual differences between them. Besides the asymptotic regime of many repeated measurements, we also study bounds that are relevant for small samples. In particular, we show that the Bayesian variance can overcome the frequentist Cramér-Rao bound. The Cramér-Rao bound is a mathematical theorem that sets the highest possible sensitivity in a phase estimation problem. The fact that the Bayesian sensitivity can be higher than the Cramér-Rao bound therefore appears paradoxical. The paradox is resolved by clarifying the conceptual differences between the frequentist and the Bayesian approaches, which therefore cannot be directly compared. These differences should be considered when discussing theoretical and experimental figures of merit in interferometric phase estimation.
Our results are illustrated with a simple test model [37,38]. We consider $N$ qubits with basis states $|0\rangle$ and $|1\rangle$, initially prepared in a (generalized) GHZ state $|\mathrm{GHZ}\rangle = (|0\rangle^{\otimes N} + |1\rangle^{\otimes N})/\sqrt{2}$, with all particles being either in $|1\rangle$ or in $|0\rangle$. The phase encoding is a rotation of each qubit on the Bloch sphere, $|0\rangle \to e^{-i\theta/2}|0\rangle$ and $|1\rangle \to e^{+i\theta/2}|1\rangle$, which transforms the $|\mathrm{GHZ}\rangle$ state into $|\mathrm{GHZ}(\theta)\rangle = (e^{-iN\theta/2}|0\rangle^{\otimes N} + e^{+iN\theta/2}|1\rangle^{\otimes N})/\sqrt{2}$. The phase is estimated by measuring the parity $(-1)^{N_0}$, where $N_0$ is the number of particles in the state $|0\rangle$ [37,39-41]. The parity measurement has two possible results $\mu = \pm 1$ that are conditioned by the "true value of the phase shift" $\theta_0$ with probability $p(\pm 1|\theta_0) = (1 \pm \cos(N\theta_0))/2$. The probability to observe the sequence of results $\boldsymbol{\mu} = \{\mu_1, \mu_2, \ldots, \mu_m\}$ in $m$ independent repetitions of the experiment (with the same probe state and phase encoding transformation) is
$$p(\boldsymbol{\mu}|\theta_0) = \prod_{i=1}^{m} p(\mu_i|\theta_0) = \left(\frac{1+\cos(N\theta_0)}{2}\right)^{m_+} \left(\frac{1-\cos(N\theta_0)}{2}\right)^{m_-}, \tag{1}$$
where $m_\pm$ is the number of observed results $\pm 1$, respectively. Notice that $p(\boldsymbol{\mu}|\theta_0)$ is the conditional probability for the measurement outcome $\boldsymbol{\mu}$, given that the true value of the phase shift is $\theta_0$ (which we consider to be unknown in the estimation protocol). Equation (1) provides the probability that will be used in the following sections for the case $N = 2$ and $\theta_0 \in [0, \pi/2]$. Sections 2 and 3 deal with the case where $\theta_0$ has a fixed value, and in Section 4 we discuss precision bounds for a fluctuating phase shift.
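The parity likelihood of Equation (1) is simple enough to evaluate directly. The following is a minimal sketch; the function names `parity_probability` and `sequence_likelihood` are illustrative choices, not taken from the original work.

```python
import math

def parity_probability(mu, theta, n_qubits=2):
    """Single-shot probability p(mu|theta) = (1 + mu*cos(N*theta))/2 for mu = +1 or -1."""
    return 0.5 * (1.0 + mu * math.cos(n_qubits * theta))

def sequence_likelihood(m_plus, m_minus, theta, n_qubits=2):
    """Likelihood of a sequence with m_plus results +1 and m_minus results -1 (Equation (1))."""
    p_plus = parity_probability(+1, theta, n_qubits)
    p_minus = parity_probability(-1, theta, n_qubits)
    return p_plus ** m_plus * p_minus ** m_minus

# Example: N = 2, one result of each sign, theta = pi/8
print(sequence_likelihood(1, 1, math.pi / 8))
```

The likelihood depends on the sequence only through the counts $m_+$ and $m_-$, which is why the sketches in later sections can sum over $m_+$ instead of over all $2^m$ sequences.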

Frequentist Approach
In the frequentist paradigm, the phase (assumed to have a fixed but unknown value $\theta_0$) is estimated via an arbitrarily chosen function of the measurement results, $\theta_{\rm est}(\boldsymbol{\mu})$, called the estimator. Typically, $\theta_{\rm est}(\boldsymbol{\mu})$ is chosen by maximizing the likelihood of the observed data (see below). The estimator, being a function of random outcomes, is itself a random variable. It is characterized by a statistical distribution that has an objective, measurable character. The relative frequency with which the event $\theta_{\rm est}$ occurs converges to a probability asymptotically with the number of repeated experimental trials.

Frequentist Risk Functions
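Statistical fluctuations of the data reflect the statistical uncertainty of the estimation. This is quantified by the variance,
$$(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)^2\, p(\boldsymbol{\mu}|\theta_0), \tag{2}$$
around the mean value $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \theta_{\rm est}(\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta_0)$, the sum extending over all possible measurement sequences (for fixed $\theta_0$ and $m$). An important class is that of locally unbiased estimators, namely those satisfying $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \theta_0$ and $\left. d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta}/d\theta \right|_{\theta=\theta_0} = 1$ (see, for instance, [42]). An estimator is unbiased if and only if it is locally unbiased at every $\theta_0$.
The quality of the estimator can also be quantified by the mean square error (MSE) [23],
$$\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \theta_0\big)^2\, p(\boldsymbol{\mu}|\theta_0), \tag{3}$$
giving the deviation of $\theta_{\rm est}$ from the true value of the phase shift $\theta_0$. It is related to Equation (2) by the relation
$$\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = (\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} + \big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_0\big)^2. \tag{4}$$
In the frequentist approach, the variance is often not considered a proper measure of the quality of an estimator. For instance, an estimator that always gives the same value independently of the measurement outcomes is strongly biased: it has zero variance but a large MSE that does not decrease with the number of repeated measurements. Notice that the MSE cannot be accessed from the experimentally available data, since the true value $\theta_0$ is unknown. In this sense, only the fluctuations of $\theta_{\rm est}$ around its mean value, i.e., the variance $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$, have experimental relevance. For unbiased estimators, Equations (2) and (3) coincide. In general, since the bias term in Equation (4) is never negative, $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq (\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$, and any lower bound on $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$ automatically provides a lower bound on $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$, but not vice versa. In the following section, we therefore limit our attention to bounds on $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$. The distinction between the two quantities becomes more important in the case of a fluctuating phase shift $\theta_0$, where the bias can affect the corresponding bounds in different ways. We will see this explicitly in Section 4.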
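The relation between variance, bias and MSE can be checked numerically for the parity model. The sketch below assumes that the estimator depends on the data only through the count $m_+$ (so the sum over sequences reduces to a binomial sum); `risk_functions` and the two example estimators are hypothetical names introduced for illustration.

```python
import math

def risk_functions(estimator, theta0, m, n_qubits=2):
    """Return (mean, variance, mse) of estimator(m_plus, m) at fixed true phase theta0.

    Assumes the estimator depends on the sequence only through m_plus,
    so each value is weighted by the binomial probability of m_plus.
    """
    p_plus = 0.5 * (1.0 + math.cos(n_qubits * theta0))
    weights = [math.comb(m, k) * p_plus ** k * (1.0 - p_plus) ** (m - k)
               for k in range(m + 1)]
    values = [estimator(k, m) for k in range(m + 1)]
    mean = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    mse = sum(w * (v - theta0) ** 2 for w, v in zip(weights, values))
    return mean, variance, mse

# A constant estimator ignores the data: zero variance, but a finite MSE.
constant = lambda m_plus, m: 0.3
# An arbitrary data-dependent (and generally biased) estimator.
linear = lambda m_plus, m: 0.5 * math.pi * (1.0 - m_plus / m)

theta0 = math.pi / 4
print(risk_functions(constant, theta0, m=5))
print(risk_functions(linear, theta0, m=5))
```

For any estimator passed in, the returned MSE equals the variance plus the squared bias, as in Equation (4).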

Barankin Bound
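The Barankin bound (BB) provides the tightest lower bound to the variance (2) [43]. It can be proven to be always (for any $m$) saturable, in principle, by a specific local (i.e., dependent on $\theta_0$) estimator and measurement observable. Of course, since the estimator that saturates the BB depends on the true value of the parameter (which is unknown), the bound is of little use in practice. Nevertheless, the BB plays a central role from the theoretical point of view, as it provides a hierarchy of weaker bounds that can be used in practice with estimators that are asymptotically unbiased. The BB can be written as [44]
$$\Delta^2\theta_{\rm BB} = \sup_{\theta_i, a_i, n} \frac{\Big[\sum_{i=1}^{n} a_i \big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)\Big]^2}{\sum_{\boldsymbol{\mu}} \Big[\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\Big]^2 p(\boldsymbol{\mu}|\theta_0)}, \tag{5}$$
where $L(\boldsymbol{\mu}|\theta_i, \theta_0) = p(\boldsymbol{\mu}|\theta_i)/p(\boldsymbol{\mu}|\theta_0)$ is generally referred to as the likelihood ratio, and the supremum is taken over $n$ parameters $a_i \in \mathbb{R}$, which are arbitrary real numbers, and $\theta_i$, which are arbitrary phase values in the parameter domain. For unbiased estimators, we can replace $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} = \theta_i$ for all $i$, and the BB becomes independent of the estimator:
$$\Delta^2\theta_{\rm BB} = \sup_{\theta_i, a_i, n} \frac{\Big[\sum_{i=1}^{n} a_i (\theta_i - \theta_0)\Big]^2}{\sum_{\boldsymbol{\mu}} \Big[\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\Big]^2 p(\boldsymbol{\mu}|\theta_0)}. \tag{6}$$
A derivation of the BB is presented in Appendix A. The explicit calculation of $\Delta^2\theta_{\rm BB}$ is impractical in most applications due to the number of free variables that must be optimized. However, the BB provides a strict hierarchy of bounds of increasing complexity that can be of great practical importance. Restricting the number of variables in the optimization can provide local lower bounds that are much simpler to determine, at the expense of not being saturable in general, namely, for an arbitrary number of measurements. Below, we demonstrate the following hierarchy of bounds:
$$\Delta^2\theta_{\rm BB} \geq \Delta^2\theta_{\rm EChRB} \geq \Delta^2\theta_{\rm ChRB} \geq \Delta^2\theta_{\rm CRLB}, \tag{7}$$
where $\Delta^2\theta_{\rm CRLB}$ is the Cramér-Rao lower bound (CRLB) [45,46] and $\Delta^2\theta_{\rm ChRB}$ is the Hammersley-Chapman-Robbins bound (ChRB) [47,48]. We will also introduce a novel extended version of the ChRB, indicated as $\Delta^2\theta_{\rm EChRB}$.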

Cramér-Rao Lower Bound and Maximum Likelihood Estimator
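The CRLB is the most common frequentist bound in parameter estimation. It is given by [45,46]:
$$\Delta^2\theta_{\rm CRLB} = \frac{\big(d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0\big)^2}{m F(\theta_0)}. \tag{8}$$
The inequality $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\rm CRLB}$ is obtained by differentiating $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}$ with respect to $\theta_0$ and using a Cauchy-Schwarz inequality:
$$\left(\frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\right)^2 = \left(\sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big) \frac{dp(\boldsymbol{\mu}|\theta_0)}{d\theta_0}\right)^2 \leq m F(\theta_0)\, (\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}, \tag{9}$$
where we have used $\sum_{\boldsymbol{\mu}} dp(\boldsymbol{\mu}|\theta_0)/d\theta_0 = 0$ and $\sum_{\boldsymbol{\mu}} \frac{1}{p(\boldsymbol{\mu}|\theta_0)} \big(dp(\boldsymbol{\mu}|\theta_0)/d\theta_0\big)^2 = m F(\theta_0)$ for $m$ independent measurements, and
$$F(\theta_0) = \sum_{\mu} \frac{1}{p(\mu|\theta_0)} \left(\frac{dp(\mu|\theta_0)}{d\theta_0}\right)^2 \tag{10}$$
is the Fisher information of a single measurement. The equality $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \Delta^2\theta_{\rm CRLB}$ is achieved if and only if
$$\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \lambda_{\theta_0} \frac{d \log p(\boldsymbol{\mu}|\theta_0)}{d\theta_0}, \tag{11}$$
with $\lambda_{\theta_0}$ a parameter independent of $\boldsymbol{\mu}$ (while it may depend on $\theta_0$). Noticing that $\sum_{\boldsymbol{\mu}} f(\theta_0)\, dp(\boldsymbol{\mu}|\theta_0)/d\theta_0 = 0$, the derivation of the CRLB can be straightforwardly generalized by replacing $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}$ in Equation (9) with any function $f(\theta_0)$ independent of $\boldsymbol{\mu}$. In particular, choosing $f(\theta_0) = \theta_0$, we can directly prove that $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\rm CRLB}$, which also depends on the bias. Asymptotically in $m$, the saturation of Equation (8) is obtained for the maximum likelihood estimator (MLE) [22,23,49]. This is the value $\theta_{\rm MLE}(\boldsymbol{\mu})$ that maximizes the likelihood $p(\boldsymbol{\mu}|\theta_0)$ (as a function of the parameter $\theta_0$) for the observed measurement sequence $\boldsymbol{\mu}$,
$$\theta_{\rm MLE}(\boldsymbol{\mu}) = \arg\max_{\theta_0}\, p(\boldsymbol{\mu}|\theta_0). \tag{12}$$
For a sufficiently large sample size $m$ (in the central limit), independently of the probability distribution $p(\boldsymbol{\mu}|\theta_0)$, the MLE becomes normally distributed [18,22,23,49]:
$$p(\theta_{\rm MLE}|\theta_0) = \sqrt{\frac{m F(\theta_0)}{2\pi}}\, \exp\!\left[-\frac{m F(\theta_0)}{2} (\theta_0 - \theta_{\rm MLE})^2\right], \tag{13}$$
with mean given by the true value $\theta_0$ and variance equal to the inverse of the Fisher information, $1/(m F(\theta_0))$.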
The MLE is well defined provided that there is a unique maximum in the considered phase interval.
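For the parity model, the likelihood $p_+^{m_+} p_-^{m_-}$ is maximized when $p_+ = m_+/m$, which gives a closed-form MLE on the interval $[0, \pi/N]$ (equal to $[0, \pi/2]$ for $N = 2$). The sketch below uses this closed form and evaluates the single-shot Fisher information by a finite difference; the function names and the step size `h` are illustrative assumptions.

```python
import math

def mle(m_plus, m_minus, n_qubits=2):
    """Maximum likelihood estimate on [0, pi/N]: cos(N*theta) = (m_plus - m_minus)/m."""
    m = m_plus + m_minus
    return math.acos((m_plus - m_minus) / m) / n_qubits

def fisher_information(theta, n_qubits=2, h=1e-6):
    """Single-shot Fisher information, sum over mu of (dp/dtheta)^2 / p, via central differences."""
    total = 0.0
    for mu in (+1, -1):
        p = lambda t, mu=mu: 0.5 * (1.0 + mu * math.cos(n_qubits * t))
        dp = (p(theta + h) - p(theta - h)) / (2.0 * h)
        total += dp * dp / p(theta)
    return total

def crlb_unbiased(theta, m, n_qubits=2):
    """Cramér-Rao lower bound 1/(m F) for an unbiased estimator."""
    return 1.0 / (m * fisher_information(theta, n_qubits))

print(mle(3, 1))                      # cos(2*theta) = 1/2
print(fisher_information(0.3))        # close to N^2 = 4 for any theta
print(crlb_unbiased(0.3, m=10))
```

The numerical Fisher information is independent of $\theta$ and equal to $N^2$, consistent with the statement made later in the text that the Fisher information of this model does not depend on the phase.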

Hammersley-Chapman-Robbins Bound
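The ChRB is obtained from Equation (5) by taking $n = 2$, $a_1 = 1$, $a_2 = -1$, $\theta_1 = \theta_0 + \lambda$, $\theta_2 = \theta_0$, and can be written as [47,48]
$$\Delta^2\theta_{\rm ChRB} = \sup_{\lambda} \frac{\big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)^2}{\sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}|\theta_0+\lambda)^2 / p(\boldsymbol{\mu}|\theta_0) - 1}. \tag{14}$$
Clearly, restricting the number of parameters in the optimization in Equation (5) leads to a less strict bound. We thus have $\Delta^2\theta_{\rm BB} \geq \Delta^2\theta_{\rm ChRB}$. For unbiased estimators, we obtain
$$\Delta^2\theta_{\rm ChRB} = \sup_{\lambda} \frac{\lambda^2}{\sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}|\theta_0+\lambda)^2 / p(\boldsymbol{\mu}|\theta_0) - 1}. \tag{15}$$
Furthermore, the supremum over $\lambda$ on the right side of Equation (14) is always larger than or equal to its limit $\lambda \to 0$:
$$\Delta^2\theta_{\rm ChRB} \geq \lim_{\lambda \to 0} \frac{\big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)^2}{\sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}|\theta_0+\lambda)^2 / p(\boldsymbol{\mu}|\theta_0) - 1} = \frac{\big(d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0\big)^2}{\sum_{\boldsymbol{\mu}} \frac{1}{p(\boldsymbol{\mu}|\theta_0)} \big(dp(\boldsymbol{\mu}|\theta_0)/d\theta_0\big)^2}, \tag{16}$$
provided that the derivatives on the right-hand side exist. We thus recover the CRLB as a limiting case of the ChRB. The ChRB is always at least as strict as the CRLB, and we obtain the last inequality in the chain (7). Notice that the CRLB requires the probability distribution $p(\boldsymbol{\mu}|\theta_0)$ to be differentiable [24], a condition that can be dropped for the ChRB and the more general BB. Even if the distribution is regular, the above derivation shows that the ChRB, and more generally the BB, provide tighter error bounds than the CRLB. With increasing $n$, the BB becomes tighter and tighter, and the CRLB represents the weakest bound in this hierarchy, which can be observed in Figure 2a. Next, we determine a stricter bound in this hierarchy.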
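For unbiased estimators, the ChRB can be evaluated for the parity model by a direct search over the shift $\lambda$. The sketch below is one possible numerical approach, not the procedure used for Figure 2: it assumes a plain grid search restricted to shifts with $\theta_0 + \lambda \in [0, \pi/N]$, uses the fact that for $m$ independent repetitions the sum over sequences factorizes into the $m$-th power of the single-shot sum, and requires $0 < \theta_0 < \pi/N$. The name `chapman_robbins_unbiased` and the grid size are illustrative.

```python
import math

def chapman_robbins_unbiased(theta0, m, n_qubits=2, steps=2000):
    """Grid-search estimate of sup over lambda of lambda^2 / (sum_mu p(mu|theta0+lambda)^2/p(mu|theta0) - 1)."""
    def p(mu, t):
        return 0.5 * (1.0 + mu * math.cos(n_qubits * t))

    lo, hi = -theta0, math.pi / n_qubits - theta0   # keep theta0 + lambda in the phase domain
    best = 0.0
    for i in range(steps + 1):
        lam = lo + (hi - lo) * i / steps
        if abs(lam) < 1e-6:
            continue                                  # lambda = 0 is excluded (limit gives the CRLB)
        single = sum(p(mu, theta0 + lam) ** 2 / p(mu, theta0) for mu in (+1, -1))
        denom = single ** m - 1.0                     # m independent repetitions
        if denom > 0.0:
            best = max(best, lam * lam / denom)
    return best

theta0, n = math.pi / 4, 2
for m in (1, 5, 20):
    print(m, chapman_robbins_unbiased(theta0, m), 1.0 / (m * n ** 2))  # ChRB vs CRLB
```

For $m = 1$ and $\theta_0 = \pi/4$ the denominator reduces to $\sin^2(2\lambda)$, so the supremum is reached at the edge of the domain and equals $(\pi/4)^2$, well above the CRLB value $1/4$.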

Extended Hammersley-Chapman-Robbins Bound
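We obtain the extended Hammersley-Chapman-Robbins bound (EChRB) as a special case of Equation (5), by taking $n = 3$, $a_1 = 1$, $a_2 = A$, $a_3 = -1$, $\theta_1 = \theta_0 + \lambda_1$, $\theta_2 = \theta_0 + \lambda_2$, and $\theta_3 = \theta_0$, giving
$$\Delta^2\theta_{\rm EChRB} = \sup_{\lambda_1, \lambda_2, A} \frac{\Big[\big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda_1} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big) + A \big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda_2} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)\Big]^2}{\sum_{\boldsymbol{\mu}} \big[p(\boldsymbol{\mu}|\theta_0+\lambda_1) + A\, p(\boldsymbol{\mu}|\theta_0+\lambda_2) - p(\boldsymbol{\mu}|\theta_0)\big]^2 / p(\boldsymbol{\mu}|\theta_0)}, \tag{17}$$
where the supremum is taken over all possible $\lambda_1, \lambda_2 \in \mathbb{R}$ and $A \in \mathbb{R}$. Since the ChRB is obtained from Equation (17) in the specific case $A = 0$, we have that $\Delta^2\theta_{\rm EChRB} \geq \Delta^2\theta_{\rm ChRB}$. For unbiased estimators, we obtain
$$\Delta^2\theta_{\rm EChRB} = \sup_{\lambda_1, \lambda_2, A} \frac{(\lambda_1 + A \lambda_2)^2}{\sum_{\boldsymbol{\mu}} \big[p(\boldsymbol{\mu}|\theta_0+\lambda_1) + A\, p(\boldsymbol{\mu}|\theta_0+\lambda_2) - p(\boldsymbol{\mu}|\theta_0)\big]^2 / p(\boldsymbol{\mu}|\theta_0)}. \tag{18}$$
In Figure 2a, we compare the different bounds for unbiased estimators and for the example considered in the manuscript: the CRLB (black line), the ChRB (filled triangles) and the EChRB (empty triangles), satisfying the chain of inequalities (7). In Figure 2b, we show the values of $\lambda$ in Equation (15) for which the supremum is achieved in our case.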

Bayesian Approach
The Bayesian approach makes use of the Bayes-Laplace theorem, which can be very simply stated and proved. The joint probability of two stochastic variables $\boldsymbol{\mu}$ and $\theta$ is symmetric:
$$p(\boldsymbol{\mu}, \theta) = p(\boldsymbol{\mu}|\theta)\, p(\theta) = p(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}) = p(\theta, \boldsymbol{\mu}),$$
where $p(\theta)$ and $p(\boldsymbol{\mu})$ are the marginal distributions, obtained by integrating the joint probability over one of the two variables, while $p(\boldsymbol{\mu}|\theta)$ and $p(\theta|\boldsymbol{\mu})$ are conditional distributions.
We recall that in a phase inference problem, the set of measurement results $\boldsymbol{\mu}$ is generated by a fixed and unknown value $\theta_0$ according to the likelihood $p(\boldsymbol{\mu}|\theta_0)$. In the Bayesian approach to the estimation of $\theta_0$, one introduces a random variable $\theta$ and uses the Bayes-Laplace theorem to define the conditional probability
$$p_{\rm post}(\theta|\boldsymbol{\mu}) = \frac{p(\boldsymbol{\mu}|\theta)\, p_{\rm pri}(\theta)}{p_{\rm mar}(\boldsymbol{\mu})}. \tag{19}$$
The posterior probability $p_{\rm post}(\theta|\boldsymbol{\mu})$ provides a degree of belief, or plausibility, that $\theta_0 = \theta$ (i.e., that $\theta$ is the true value of the phase), in the light of the measurement data $\boldsymbol{\mu}$ [50]. In Equation (19), the prior distribution $p_{\rm pri}(\theta)$ expresses the a priori state of knowledge on $\theta$, $p(\boldsymbol{\mu}|\theta)$ is the likelihood that is determined by the quantum mechanical measurement postulate, e.g., as in Equation (1), and the marginal probability $p_{\rm mar}(\boldsymbol{\mu}) = \int_a^b d\theta\, p(\theta, \boldsymbol{\mu})$ is obtained through the normalization of the posterior, where $a$ and $b$ are the boundaries of the phase domain. The posterior probability $p_{\rm post}(\theta|\boldsymbol{\mu})$ describes the current knowledge about the random variable $\theta$ based on the available information, i.e., the measurement results $\boldsymbol{\mu}$.
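The Bayes-Laplace update of Equation (19) can be carried out numerically on a grid of phase values. The following is a minimal sketch for the parity model, assuming trapezoidal integration on $[0, \pi/2]$ and, by default, a flat prior; `posterior_on_grid` and its arguments are illustrative names. The prior passed in does not need to be normalized, since the division by the marginal takes care of normalization.

```python
import math

def posterior_on_grid(m_plus, m_minus, prior=lambda t: 1.0,
                      n_qubits=2, a=0.0, b=math.pi / 2, points=2001):
    """Return (thetas, posterior) with the posterior normalized by the trapezoidal rule."""
    step = (b - a) / (points - 1)
    thetas = [a + i * step for i in range(points)]
    unnormalized = []
    for t in thetas:
        c = math.cos(n_qubits * t)
        likelihood = (0.5 * (1.0 + c)) ** m_plus * (0.5 * (1.0 - c)) ** m_minus
        unnormalized.append(likelihood * prior(t))
    # marginal p_mar(mu): integral of likelihood * prior over the phase domain
    marginal = step * (sum(unnormalized) - 0.5 * (unnormalized[0] + unnormalized[-1]))
    return thetas, [u / marginal for u in unnormalized]

thetas, post = posterior_on_grid(m_plus=1, m_minus=1)
print(thetas[1000], post[1000])   # posterior density at theta = pi/4
```

With one result of each sign and a flat prior, the posterior is proportional to $\sin^2(2\theta)$ and peaks at $\theta = \pi/4$.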

Noninformative Prior
In the Bayesian approach, the information on $\theta$ provided by the posterior probability always depends on the prior distribution $p_{\rm pri}(\theta)$. It is possible to account for the available a priori information on $\theta$ by choosing a prior distribution accordingly. However, if no a priori information is available, it is not obvious how to choose a "noninformative" prior [51]. The flat prior $p_{\rm pri}(\theta) = \mathrm{const}$ was first introduced by Laplace to express the absence of information on $\theta$ [51]. However, this prior would not be flat for other functions of $\theta$ and, in the complete absence of a priori information, it seems unreasonable that some information is available for different parametrizations of the problem. To see this, recall that a transformation of variables $\varphi = f(\theta)$ requires that $p_{\rm pri}(\varphi) = p_{\rm pri}(\theta)\, |df^{-1}(\varphi)/d\varphi|$. Jeffreys prior, $p_{\rm pri}(\theta) \propto \sqrt{F(\theta)}$, where $F(\theta)$ is the Fisher information (10), remains invariant under re-parametrization. For arbitrary transformations $\varphi = f(\theta)$, the Fisher information obeys the transformation property $F(\varphi) = F(\theta)\, \big(df^{-1}(\varphi)/d\varphi\big)^2$. Hence, if $p_{\rm pri}(\theta) \propto \sqrt{F(\theta)}$ and we perform the change of variable $\varphi = f(\theta)$, then the transformation property of the Fisher information ensures that $p_{\rm pri}(\varphi) = p_{\rm pri}(\theta)\, |df^{-1}(\varphi)/d\varphi| \propto \sqrt{F(\varphi)}$. Notice that, as in our case, the Fisher information $F(\theta)$ may actually be independent of $\theta$. In this case, the invariance property does not imply that Jeffreys prior is flat for arbitrary re-parametrizations $\varphi = f(\theta)$; instead, $\sqrt{F(\varphi)} \propto |df^{-1}(\varphi)/d\varphi|$.

Posterior Bounds
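From the posterior probability (19), we can provide an estimate $\theta_{\rm BL}(\boldsymbol{\mu})$ of $\theta_0$. This can be the maximum a posteriori, $\theta_{\rm BL}(\boldsymbol{\mu}) = \arg\max_{\theta}\, p_{\rm post}(\theta|\boldsymbol{\mu})$, which coincides with the maximum likelihood estimate of Equation (12) when the prior is flat, $p_{\rm pri}(\theta) = \mathrm{const}$, or the mean of the distribution,
$$\theta_{\rm BL}(\boldsymbol{\mu}) = \int_a^b d\theta\, \theta\, p_{\rm post}(\theta|\boldsymbol{\mu}).$$
With the Bayesian approach, it is possible to provide a confidence interval around the estimator, given an arbitrary measurement sequence $\boldsymbol{\mu}$, even with a single measurement. The variance
$$\big(\Delta^2\theta_{\rm BL}(\boldsymbol{\mu})\big)_{\theta|\boldsymbol{\mu}} = \int_a^b d\theta\, \big(\theta - \theta_{\rm BL}(\boldsymbol{\mu})\big)^2\, p_{\rm post}(\theta|\boldsymbol{\mu}) \tag{20}$$
can be taken as a measure of the fluctuation of our degree of belief around $\theta_{\rm BL}(\boldsymbol{\mu})$. There is no such concept in the frequentist paradigm. The Bayesian posterior variance $\big(\Delta^2\theta_{\rm BL}(\boldsymbol{\mu})\big)_{\theta|\boldsymbol{\mu}}$ and the frequentist variance $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu}|\theta_0}$ have entirely different operational meanings. Equation (20) provides a degree of plausibility that $\theta_{\rm BL}(\boldsymbol{\mu}) = \theta_0$, given the measurement results $\boldsymbol{\mu}$. There is no notion of bias in this case. On the other hand, the quantity $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu}|\theta_0}$ measures the statistical fluctuations of $\theta_{\rm BL}(\boldsymbol{\mu})$ when repeating the sequence of $m$ measurements infinitely many times.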
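The posterior mean and the posterior variance of Equation (20) can be computed for a single observed sequence. The sketch below assumes the parity model with a flat prior on $[0, \pi/N]$ (which is $[0, \pi/2]$ for $N = 2$) and trapezoidal integration; `posterior_mean_and_variance` is a hypothetical helper name.

```python
import math

def posterior_mean_and_variance(m_plus, m_minus, n_qubits=2, points=2001):
    """Posterior mean and variance for the parity model with a flat prior on [0, pi/N]."""
    a, b = 0.0, math.pi / n_qubits
    step = (b - a) / (points - 1)
    thetas = [a + i * step for i in range(points)]
    weights = [step] * points
    weights[0] = weights[-1] = 0.5 * step            # trapezoidal weights
    like = []
    for t in thetas:
        c = math.cos(n_qubits * t)
        like.append((0.5 * (1.0 + c)) ** m_plus * (0.5 * (1.0 - c)) ** m_minus)
    norm = sum(w * l for w, l in zip(weights, like))
    mean = sum(w * l * t for w, l, t in zip(weights, like, thetas)) / norm
    var = sum(w * l * (t - mean) ** 2 for w, l, t in zip(weights, like, thetas)) / norm
    return mean, var

mean, var = posterior_mean_and_variance(m_plus=1, m_minus=1)
print(mean, var)
```

For $m_+ = m_- = 1$ the posterior is $(4/\pi)\sin^2(2\theta)$, so the mean is $\pi/4$ and the variance is $\pi^2/48 - 1/8 \approx 0.081$. This number refers to a single data set and describes the width of the degree of belief, not the spread of an estimator over repeated experiments.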

Ghosh Bound
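In the following, we derive a lower bound to Equation (20) first introduced by Ghosh. Integrating by parts over the phase domain $[a, b]$, we have
$$\int_a^b d\theta\, \big(\theta - \theta_{\rm BL}(\boldsymbol{\mu})\big) \frac{dp_{\rm post}(\theta|\boldsymbol{\mu})}{d\theta} = f(\boldsymbol{\mu}, a, b) - 1,$$
where $f(\boldsymbol{\mu}, a, b) = \big(b - \theta_{\rm BL}(\boldsymbol{\mu})\big)\, p_{\rm post}(b|\boldsymbol{\mu}) - \big(a - \theta_{\rm BL}(\boldsymbol{\mu})\big)\, p_{\rm post}(a|\boldsymbol{\mu})$ collects the boundary terms. A Cauchy-Schwarz inequality then gives
$$\big(\Delta^2\theta_{\rm BL}(\boldsymbol{\mu})\big)_{\theta|\boldsymbol{\mu}} \geq \frac{\big(f(\boldsymbol{\mu}, a, b) - 1\big)^2}{\int_a^b d\theta\, \frac{1}{p_{\rm post}(\theta|\boldsymbol{\mu})} \left(\frac{dp_{\rm post}(\theta|\boldsymbol{\mu})}{d\theta}\right)^2}.$$
The above bound is a function of the specific measurement sequence $\boldsymbol{\mu}$ and depends on $\int_a^b d\theta\, \frac{1}{p_{\rm post}(\theta|\boldsymbol{\mu})} \left(\frac{dp_{\rm post}(\theta|\boldsymbol{\mu})}{d\theta}\right)^2$, which we can identify as a "Fisher information of the posterior distribution".

The Ghosh bound is saturated if and only if
$$\theta - \theta_{\rm BL}(\boldsymbol{\mu}) = \lambda_{\boldsymbol{\mu}} \frac{d \log p_{\rm post}(\theta|\boldsymbol{\mu})}{d\theta},$$
where $\lambda_{\boldsymbol{\mu}}$ does not depend on $\theta$ while it may depend on $\boldsymbol{\mu}$.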

Average Posterior Bounds
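While Equation (20) depends on the specific $\boldsymbol{\mu}$, it is natural to consider its average over all possible measurement sequences at fixed $\theta_0$ and $m$, weighted by the likelihood $p(\boldsymbol{\mu}|\theta_0)$:
$$(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta|\theta_0} = \sum_{\boldsymbol{\mu}} \big(\Delta^2\theta_{\rm BL}(\boldsymbol{\mu})\big)_{\theta|\boldsymbol{\mu}}\, p(\boldsymbol{\mu}|\theta_0) = \sum_{\boldsymbol{\mu}} \int_a^b d\theta\, \big(\theta - \theta_{\rm BL}(\boldsymbol{\mu})\big)^2\, p(\theta, \boldsymbol{\mu}|\theta_0), \tag{24}$$
which we indicate as the average Bayesian posterior variance, where $p(\theta, \boldsymbol{\mu}|\theta_0) = p_{\rm post}(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta_0)$. We would be tempted to compare the average posterior sensitivity $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta|\theta_0}$ to the frequentist Cramér-Rao bound $\Delta^2\theta_{\rm CRLB}$. However, because of the different operational meanings of the frequentist and the Bayesian paradigms, there is no reason for Equation (24) to fulfill the Cramér-Rao bound: indeed, it does not, as we show below.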
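The likelihood-averaged posterior variance of Equation (24) can be evaluated directly for small $m$ and compared with the Cramér-Rao value $1/(mN^2)$ for unbiased estimators. The sketch below assumes a flat prior on $[0, \pi/N]$ and the posterior mean as Bayesian estimator; the function names are illustrative and the numerical settings are not those used for the figures.

```python
import math

def posterior_variance(m_plus, m_minus, n_qubits=2, points=2001):
    """Posterior variance (flat prior on [0, pi/N]) around the posterior mean."""
    a, b = 0.0, math.pi / n_qubits
    step = (b - a) / (points - 1)
    thetas = [a + i * step for i in range(points)]
    weights = [step] * points
    weights[0] = weights[-1] = 0.5 * step
    like = [(0.5 * (1.0 + math.cos(n_qubits * t))) ** m_plus
            * (0.5 * (1.0 - math.cos(n_qubits * t))) ** m_minus for t in thetas]
    norm = sum(w * l for w, l in zip(weights, like))
    mean = sum(w * l * t for w, l, t in zip(weights, like, thetas)) / norm
    return sum(w * l * (t - mean) ** 2 for w, l, t in zip(weights, like, thetas)) / norm

def average_posterior_variance(theta0, m, n_qubits=2):
    """Average of the posterior variance over sequences, weighted by p(mu|theta0)."""
    p_plus = 0.5 * (1.0 + math.cos(n_qubits * theta0))
    return sum(math.comb(m, k) * p_plus ** k * (1.0 - p_plus) ** (m - k)
               * posterior_variance(k, m - k, n_qubits) for k in range(m + 1))

theta0, n = math.pi / 4, 2
for m in (1, 2, 5):
    print(m, average_posterior_variance(theta0, m), 1.0 / (m * n ** 2))
```

For a single measurement ($m = 1$) both outcomes give the same posterior variance, $\pi^2/48 - 1/\pi^2 \approx 0.104$, which lies below the Cramér-Rao value $1/4$. This is not a violation of the theorem, since the two numbers quantify different things.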

Numerical Comparison of Bayesian and Frequentist Phase Estimation
In the numerical calculations shown in Figure 3, we consider a Bayesian estimator given by the posterior mean, $\theta_{\rm BL}(\boldsymbol{\mu}) = \int_0^{\pi/2} d\theta\, \theta\, p_{\rm post}(\theta|\boldsymbol{\mu})$, with the prior distribution
$$p_{\rm pri}(\theta) = \frac{2}{\pi}\, \frac{e^{\alpha \sin^2(2\theta)} - 1}{e^{\alpha/2} I_0(\alpha/2) - 1}, \tag{26}$$
where $I_0(\alpha)$ is the modified Bessel function of the first kind. This choice of prior distribution can continuously turn from a peaked function to a flat one when changing $\alpha$, while being differentiable in the full phase interval. The more negative $\alpha$ is, the more $p_{\rm pri}(\theta)$ broadens in $[0, \pi/2]$. In particular, in the limit $\alpha \to -\infty$, the prior approaches the flat distribution, which in our case coincides with Jeffreys prior since the Fisher information is independent of $\theta$. In the limit $\alpha \to 0$, the prior is given by $\lim_{\alpha \to 0} p_{\rm pri}(\theta) = 4 \sin^2(2\theta)/\pi$. For positive values of $\alpha$, the larger $\alpha$, the more peaked $p_{\rm pri}(\theta)$ is around $\theta_0 = \pi/4$. In particular, $p_{\rm pri}(\theta) \approx e^{-4\alpha(\theta - \pi/4)^2}/\sqrt{\pi/(4\alpha)}$ for $\alpha \gg 1$. Equation (26) is normalized to one for $\theta \in [0, \pi/2]$. In the insets of the different panels of Figure 3, we plot $p_{\rm pri}(\theta)$ for $\alpha = -100$ [panel (a)], $\alpha = -10$ (b), $\alpha = 1$ (c) and $\alpha = 10$ (d).
Asymptotically in the number of measurements $m$, the Ghosh bound as well as its likelihood average converge to the Cramér-Rao bound. Indeed, it is well known that in this limit the posterior probability becomes a Gaussian centered at the true value of the phase shift and with variance given by the inverse of the Fisher information,
$$p_{\rm post}(\theta|\boldsymbol{\mu}) = \sqrt{\frac{m F(\theta_0)}{2\pi}}\, \exp\!\left[-\frac{m F(\theta_0)}{2} (\theta - \theta_0)^2\right], \tag{27}$$
a result known as the Laplace-Bernstein-von Mises theorem [18,23,55]. Inserting Equation (27) into Equation (22), we recover a posterior variance given by $1/(m F(\theta_0))$.
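The limits quoted for the prior can be checked numerically. The sketch below assumes that the prior has the form $p_{\rm pri}(\theta) = \frac{2}{\pi}\,[e^{\alpha\sin^2(2\theta)} - 1]/[e^{\alpha/2} I_0(\alpha/2) - 1]$, which is a reconstruction chosen because it reproduces the normalization on $[0, \pi/2]$, the $\alpha \to 0$ limit $4\sin^2(2\theta)/\pi$ and the Gaussian limit stated in the text. The Bessel function is evaluated with its power series to keep the example free of external libraries; `bessel_i0`, `prior` and `integrate` are illustrative names.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0(x) from its power series sum_k (x/2)^(2k) / (k!)^2."""
    term = total = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= (0.5 * x / k) ** 2
        total += term
    return total

def prior(theta, alpha):
    """Assumed form of the prior (see lead-in); alpha = 0 is handled by its limit."""
    s = math.sin(2.0 * theta) ** 2
    if alpha == 0.0:
        return 4.0 * s / math.pi
    norm = math.exp(0.5 * alpha) * bessel_i0(0.5 * alpha) - 1.0
    return (2.0 / math.pi) * math.expm1(alpha * s) / norm

def integrate(func, a=0.0, b=math.pi / 2, points=2001):
    """Trapezoidal rule on [a, b]."""
    step = (b - a) / (points - 1)
    values = [func(a + i * step) for i in range(points)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

for alpha in (-100.0, -10.0, 1.0, 10.0):
    print(alpha, integrate(lambda t: prior(t, alpha)))   # each close to 1
```

The series for $I_0$ is adequate for the moderate values of $\alpha$ used here; for very large $|\alpha|$ a library routine with exponential scaling would be the safer choice.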

Bounds for Random Parameters
In this section, we derive bounds on the phase sensitivity obtained when $\theta_0$ is a random variable distributed according to $p(\theta_0)$. Operationally, this corresponds to the situation where $\theta_0$ remains fixed (but unknown) while collecting a single sequence of $m$ measurements $\boldsymbol{\mu}$. In between measurement sequences, $\theta_0$ fluctuates according to $p(\theta_0)$.

Van Trees Bound
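It is possible to derive a general lower bound on the mean square error (29) based on the following assumptions:
1. $\partial p(\boldsymbol{\mu}, \theta_0)/\partial\theta_0$ and $\partial^2 p(\boldsymbol{\mu}, \theta_0)/\partial\theta_0^2$ are absolutely integrable with respect to $\boldsymbol{\mu}$ and $\theta_0$;
2. $p(a)\xi(a) - p(b)\xi(b) = 0$, where $\xi(\theta_0) = \sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \theta_0\big)\, p(\boldsymbol{\mu}|\theta_0)$.
Multiplying $\xi(\theta_0)$ by $p(\theta_0)$ and differentiating with respect to $\theta_0$, we have
$$\frac{\partial}{\partial\theta_0} \big[\xi(\theta_0)\, p(\theta_0)\big] = \sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \theta_0\big) \frac{\partial p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0} - p(\theta_0),$$
with $p(\boldsymbol{\mu}, \theta_0) = p(\boldsymbol{\mu}|\theta_0)\, p(\theta_0)$. Integrating over $\theta_0$ in the range $[a, b]$ and considering the above properties, we find
$$\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \big(\theta_{\rm est}(\boldsymbol{\mu}) - \theta_0\big) \frac{\partial p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0} = 1.$$
Finally, using the Cauchy-Schwarz inequality, we arrive at $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\rm VTB}$, where
$$\Delta^2\theta_{\rm VTB} = \frac{1}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu}, \theta_0)} \left(\frac{\partial p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0}\right)^2} \tag{32}$$
is generally referred to as the Van Trees bound [24,56,57]. The equality holds if and only if
$$\theta_{\rm est}(\boldsymbol{\mu}) - \theta_0 = \lambda \frac{\partial \log p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0}, \tag{33}$$
where $\lambda$ does not depend on $\theta_0$ and $\boldsymbol{\mu}$. It is easy to show that
$$\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu}, \theta_0)} \left(\frac{\partial p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0}\right)^2 = m \int_a^b d\theta_0\, p(\theta_0)\, F(\theta_0) + \int_a^b d\theta_0\, \frac{1}{p(\theta_0)} \left(\frac{\partial p(\theta_0)}{\partial\theta_0}\right)^2, \tag{34}$$
where the first term is the Fisher information $F(\theta_0)$, defined by Equation (10), averaged over $p(\theta_0)$, and the second term can be interpreted as a Fisher information of the prior [24]. Asymptotically in the number of measurements $m$ and for regular distributions $p(\theta_0)$, the first term in Equation (34) dominates over the second one.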
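The Van Trees bound can be evaluated from the two contributions just described: the Fisher information averaged over $p(\theta_0)$ (multiplied by $m$) and the Fisher information of the prior. The sketch below is a numerical illustration under stated assumptions: it uses the distribution $p(\theta_0) = 4\sin^2(2\theta_0)/\pi$ on $[0, \pi/2]$ (the $\alpha \to 0$ limit quoted in the text), the constant single-shot Fisher information $F = N^2 = 4$ of the parity model with $N = 2$, a midpoint rule and a finite-difference derivative. The name `van_trees_bound` and its arguments are illustrative.

```python
import math

def van_trees_bound(prior, m, fisher=lambda t: 4.0,
                    a=0.0, b=math.pi / 2, cells=4000, h=1e-6):
    """1 / (m * <F>_prior + Fisher information of the prior), using the midpoint rule."""
    step = (b - a) / cells
    averaged_fisher = 0.0
    prior_information = 0.0
    for i in range(cells):
        t = a + (i + 0.5) * step                      # midpoints avoid the zeros at the borders
        p = prior(t)
        dp = (prior(t + h) - prior(t - h)) / (2.0 * h)
        averaged_fisher += p * fisher(t) * step
        if p > 0.0:
            prior_information += dp * dp / p * step
    return 1.0 / (m * averaged_fisher + prior_information)

sin2_prior = lambda t: 4.0 * math.sin(2.0 * t) ** 2 / math.pi

for m in (1, 10, 100):
    print(m, van_trees_bound(sin2_prior, m), 1.0 / (4.0 * m))   # VTB vs 1/(m F)
```

For this distribution the prior contribution is $\int (p')^2/p\, d\theta_0 = 16$, so the bound equals $1/(4m + 16)$ and approaches $1/(mF)$ only when $m$ is large.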

Ziv-Zakai Bound
A further bound on $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0}$ can be derived by mapping the phase estimation problem to a continuous series of binary hypothesis testing problems. A detailed derivation of the Ziv-Zakai bound [24,58,59] is provided in Appendix B. The final result reads $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\rm ZZB}$, where
$$\Delta^2\theta_{\rm ZZB} = \frac{1}{2} \int_0^{\infty} dh\, h \int d\theta_0\, \big(p(\theta_0) + p(\theta_0 + h)\big)\, P_{\min}(\theta_0, \theta_0 + h) \tag{35}$$
and $P_{\min}(\theta_0, \theta_0 + h)$ is the minimum error probability of the binary hypothesis testing problem. This bound has been adopted for quantum phase estimation in Ref. [26]. To this end, the probability $P_{\min}(\theta_0, \theta_0 + h)$ can be minimized over all possible quantum measurements, which leads to the trace distance [7]. As the optimal measurement may depend on $\theta_0$ and $h$, the bound (35), which involves integration over all values of $\theta_0$ and $h$, is usually not saturable. We remark that the trace distance also defines a saturable frequentist bound for a different risk function than the variance [60].
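The minimum error probability entering the Ziv-Zakai bound can be computed explicitly for the parity measurement. The sketch below assumes the likelihood ratio test described in Appendix B, for which the minimum error is the sum over outcomes of the smaller of the two weighted likelihoods, with hypothesis weights $p(\theta_0)/[p(\theta_0) + p(\theta_0 + h)]$ and $p(\theta_0 + h)/[p(\theta_0) + p(\theta_0 + h)]$. The measurement is kept fixed (no optimization over quantum measurements), and `min_error_probability` is an illustrative name.

```python
import math

def min_error_probability(theta, h, prior, m=1, n_qubits=2):
    """Minimum error probability for deciding between theta and theta + h from m parity results."""
    w1, w2 = prior(theta), prior(theta + h)
    q1, q2 = w1 / (w1 + w2), w2 / (w1 + w2)           # probabilities of the two hypotheses
    p1 = 0.5 * (1.0 + math.cos(n_qubits * theta))
    p2 = 0.5 * (1.0 + math.cos(n_qubits * (theta + h)))
    total = 0.0
    for k in range(m + 1):                            # k = number of +1 results
        like1 = math.comb(m, k) * p1 ** k * (1.0 - p1) ** (m - k)
        like2 = math.comb(m, k) * p2 ** k * (1.0 - p2) ** (m - k)
        total += min(like1 * q1, like2 * q2)          # error of the likelihood ratio test
    return total

flat = lambda t: 1.0
print(min_error_probability(0.0, math.pi / 2, flat))  # perfectly distinguishable phases
print(min_error_probability(0.3, 0.0, flat))          # identical hypotheses: pure guessing
print(min_error_probability(0.3, 0.2, flat, m=10))
```

Identical hypotheses give an error of $1/2$, while the phases $0$ and $\pi/2$ (for $N = 2$) produce opposite parity results with certainty and can be distinguished without error.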

Van Trees Bound for the Average Estimator Variance
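We can derive a general lower bound for the variance (28) by following the derivation of the Van Trees bound, which was discussed in Section 4.2.1. In contrast to the standard Van Trees bound for the mean square error, here the bias enters explicitly. Defining $\xi(\theta_0) = \sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)\, p(\boldsymbol{\mu}|\theta_0)$ and assuming the same requirements as in the derivation of the Van Trees bound for the MSE, we arrive at
$$\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big) \frac{\partial p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0} = \int_a^b d\theta_0\, p(\theta_0) \frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}.$$
Finally, a Cauchy-Schwarz inequality gives $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\rm fVTB}$, where
$$\Delta^2\theta_{\rm fVTB} = \frac{\left(\int_a^b d\theta_0\, p(\theta_0)\, \frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\right)^2}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu}, \theta_0)} \left(\frac{\partial p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0}\right)^2}, \tag{38}$$
with equality if and only if
$$\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \lambda \frac{\partial \log p(\boldsymbol{\mu}, \theta_0)}{\partial\theta_0},$$
where $\lambda$ is independent of $\theta_0$ and $\boldsymbol{\mu}$. We can compare Equation (38) with the average CRLB, Equation (37). We find
$$\int_a^b d\theta_0\, p(\theta_0)\, \frac{\big(d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0\big)^2}{m F(\theta_0)} \geq \frac{\left(\int_a^b d\theta_0\, p(\theta_0)\, \frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\right)^2}{m \int_a^b d\theta_0\, p(\theta_0)\, F(\theta_0)} \geq \Delta^2\theta_{\rm fVTB},$$
where in the first step we use Jensen's inequality, and the second step follows from Equation (34), since the Fisher information of the prior is never negative. We thus arrive at
$$(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu},\theta_0} \geq \int_a^b d\theta_0\, p(\theta_0)\, \Delta^2\theta_{\rm CRLB} \geq \Delta^2\theta_{\rm fVTB}, \tag{40}$$
which is valid for generic estimators.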

Bayesian Bounds
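In Equation (41), the prior used to define the posterior $p_{\rm post}(\theta|\boldsymbol{\mu})$ via the Bayes-Laplace theorem is arbitrary. In general, such a prior $p_{\rm pri}(\theta)$ is different from the statistical distribution of $\theta_0$, which can be unknown. If $p(\theta_0)$ is known, then one can use it as a prior in the Bayesian posterior probability, i.e., $p_{\rm pri}(\theta) = p(\theta_0)$. In this specific case, we have $p_{\rm mar}(\boldsymbol{\mu}) = p(\boldsymbol{\mu})$, and thus $p_{\rm post}(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}) = p_{\rm post}(\theta|\boldsymbol{\mu})\, p_{\rm mar}(\boldsymbol{\mu}) = p(\boldsymbol{\mu}, \theta)$. In other words, for this specific choice of prior, the physical joint probability $p(\boldsymbol{\mu}, \theta_0)$ of the random variables $\theta_0$ and $\boldsymbol{\mu}$ coincides with the Bayesian $p(\boldsymbol{\mu}, \theta)$. Equation (41) thus simplifies to
$$(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta} = \sum_{\boldsymbol{\mu}} \int_a^b d\theta\, \big(\theta - \theta_{\rm BL}(\boldsymbol{\mu})\big)^2\, p(\boldsymbol{\mu}, \theta). \tag{43}$$
Notice that this expression is mathematically equivalent to the frequentist average mean square error (29) if we replace $\theta$ with $\theta_0$ and $\theta_{\rm BL}(\boldsymbol{\mu})$ with $\theta_{\rm est}(\boldsymbol{\mu})$. This means that precision bounds for Equation (29), e.g., the Van Trees and Ziv-Zakai bounds, can also be applied to Equation (43). These bounds are indeed often referred to as "Bayesian bounds" (see Ref. [24]). We emphasize that the average over the marginal distribution $p_{\rm mar}(\boldsymbol{\mu})$, which connects Equations (24) and (43), has operational meaning if we consider that $\theta_0$ is a random variable distributed according to $p(\theta_0)$, and $p(\theta_0)$ is used as prior in the Bayes-Laplace theorem to define a posterior distribution. In this case, and under the condition $f(\boldsymbol{\mu}, a, b) = 0$ (for instance, if the prior distribution vanishes at the borders of the phase domain), using Jensen's inequality, we find
$$\sum_{\boldsymbol{\mu}} \frac{p(\boldsymbol{\mu})}{\int_a^b d\theta\, \frac{1}{p_{\rm post}(\theta|\boldsymbol{\mu})} \left(\frac{\partial p_{\rm post}(\theta|\boldsymbol{\mu})}{\partial\theta}\right)^2} \geq \frac{1}{\sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}) \int_a^b d\theta\, \frac{1}{p_{\rm post}(\theta|\boldsymbol{\mu})} \left(\frac{\partial p_{\rm post}(\theta|\boldsymbol{\mu})}{\partial\theta}\right)^2} = \frac{1}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta\, \frac{1}{p(\boldsymbol{\mu}, \theta)} \left(\frac{\partial p(\boldsymbol{\mu}, \theta)}{\partial\theta}\right)^2},$$
which coincides with the Van Trees bound discussed above. We thus find that the averaged Ghosh bound for random parameters (42) is sharper than the Van Trees bound (32):
$$(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta} \geq \Delta^2\theta_{\rm aGBr} \geq \Delta^2\theta_{\rm VTB}, \tag{45}$$
which is also confirmed by the numerical data shown in Figure 4.
In Figure 4, we compare $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta}$ with the various bounds discussed in this section. As $p(\theta_0)$, we consider the same prior (26) used in Figure 3. We observe that all bounds approach the Van Trees bound with increasing sharpness of the prior distribution. Asymptotically in the number of measurements $m$, all bounds converge to the Cramér-Rao bound.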

Discussion and Conclusions
In this manuscript, we have clarified the differences between frequentist and Bayesian approaches to phase estimation. The two paradigms provide statistical results that have a different conceptual meaning and cannot be compared. We have also reviewed and discussed phase sensitivity bounds in the frequentist and Bayesian frameworks, when the true value of the phase shift θ 0 is fixed or fluctuates. These bounds are summarized in Table 1.
In the frequentist approach, for a fixed θ 0 , the phase sensitivity is determined from the width of the probability distribution of the estimator. The physical content of the distribution is that, when repeating the estimation protocol, the obtained θ est (µ) will fall, with a certain confidence, in an interval around the mean value θ est µ|θ 0 (e.g., 68% of the times within a 2(∆θ est ) µ|θ 0 interval for a Gaussian distribution) that, for unbiased estimators, coincides with the true value of the phase shift.
In the Bayesian case, the posterior p post (θ|µ) provides a degree of plausibility that the phase shift θ equals the interferometer phase θ 0 when the data µ was obtained. This allows the Bayesian approach to provide statistical information for any number of measurements, even a single one. To be sure, this is not a sign of failure or superiority of one approach with respect to the other one, since the two frameworks manipulate conceptually different quantities. The experimentalist can choose to use one or both approaches, keeping in mind the necessity to clearly state the nature of the statistical significance of the reported results.

Table 1. Risk functions and bounds.
Paradigm | Risk Function | Bounds | Remarks
Frequentist | $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$, $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$ | BB Equation (5); EChRB Equation (17); ChRB Equation (14); CRLB Equation (8) | hierarchy of bounds, Equation (7)
Frequentist | $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu},\theta_0}$ | average CRLB Equation (37); fVTB Equation (38) | hierarchy of bounds, Equation (40)
Frequentist | $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0}$ | VTB Equation (32); ZZB Equation (35) | bounds are independent of the bias
Bayesian | $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta,\theta_0}$ | aGBr Equation (42) | prior $p_{\rm pri}(\theta)$ and fluctuations $p(\theta_0)$ arbitrary
Bayesian | $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta}$ | VTB Equation (32); ZZB Equation (35) | prior $p_{\rm pri}(\theta)$ and fluctuations $p(\theta_0)$ coincide; hierarchy of bounds, Equation (45)

The two predictions converge asymptotically in the limit of a large number of measurements. This does not mean that in this limit the significance of the two approaches is interchangeable (it cannot be stated that, in the limit of a large number of repeated measurements, the frequentist and Bayesian approaches provide the same results). In this respect, it is quite instructive to notice that the Bayesian 2σ confidence may be below that of the Cramér-Rao bound, as shown in Figure 3. This, at first sight, seems paradoxical, since the CRLB is a theorem about the minimum error achievable in parameter estimation theory. However, the CRLB is a frequentist bound and, again, the paradox is resolved by taking into account that the frequentist and the Bayesian approaches provide information about different quantities.
Finally, a different class of estimation problems with different precision bounds is encountered if $\theta_0$ is itself a random variable. In this case, the frequentist bounds for the mean square error (Van Trees, Ziv-Zakai) become independent of the bias, while those on the estimator variance are still functions of the bias. The Van Trees and Ziv-Zakai bounds can be applied to the Bayesian paradigm if the average of the posterior variance over the marginal distribution is the relevant risk function. This is only meaningful if the prior $p_{\rm pri}(\theta)$ that enters the Bayes-Laplace theorem coincides with the actual distribution $p(\theta_0)$ of the phase shift $\theta_0$. We conclude with a remark regarding the so-called Heisenberg limit, which is a saturable lower bound on the CRLB over arbitrary quantum states with a fixed number of particles. For instance, for a collection of $N$ two-level systems, the CRLB can be further bounded by $\Delta\theta_{\rm est} \geq 1/\sqrt{m F(\theta_0)} \geq 1/(\sqrt{m}\, N)$ [18,20]. This bound is often called the ultimate precision bound, since no quantum state is able to achieve a better scaling with $N$. From the discussions presented in this article, it becomes apparent that Bayesian approaches (as discussed in Section 3) or precision bounds for random parameters (Section 4) are expected to lead to entirely different types of 'ultimate' lower bounds. Such bounds are interesting within the respective paradigm for which they are derived, but they cannot replace or improve the Heisenberg limit, since they address fundamentally different scenarios that cannot be compared in general.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Derivation of the Barankin Bound
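Let $\theta_{\rm est}$ be an arbitrary estimator for $\theta$. Its mean value
$$\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta} = \sum_{\boldsymbol{\mu}} \theta_{\rm est}(\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta) \tag{A1}$$
coincides with $\theta$ if and only if the estimator is unbiased (for arbitrary values of $\theta$). In the following, we make no assumption about the bias of $\theta_{\rm est}$ and therefore do not replace $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta}$ by $\theta$.
Introducing the likelihood ratio
$$L(\boldsymbol{\mu}|\theta_i, \theta_0) = \frac{p(\boldsymbol{\mu}|\theta_i)}{p(\boldsymbol{\mu}|\theta_0)} \tag{A2}$$
under the condition $p(\boldsymbol{\mu}|\theta_0) > 0$ for all $\boldsymbol{\mu}$, we obtain with Equation (A1) that
$$\sum_{\boldsymbol{\mu}} \theta_{\rm est}(\boldsymbol{\mu})\, L(\boldsymbol{\mu}|\theta_i, \theta_0)\, p(\boldsymbol{\mu}|\theta_0) = \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} \tag{A3}$$
for an arbitrary family of phase values $\theta_1, \ldots, \theta_n$ picked from the parameter domain. Furthermore, we have
$$\sum_{\boldsymbol{\mu}} L(\boldsymbol{\mu}|\theta_i, \theta_0)\, p(\boldsymbol{\mu}|\theta_0) = \sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}|\theta_i) = 1 \tag{A4}$$
for all $\theta_i$. Multiplying both sides of Equation (A4) with $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}$ and subtracting it from (A3) yields
$$\sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)\, L(\boldsymbol{\mu}|\theta_i, \theta_0)\, p(\boldsymbol{\mu}|\theta_0) = \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}. \tag{A5}$$
Let us now pick a family of $n$ finite coefficients $a_1, \ldots, a_n$. From Equation (A5), we obtain
$$\sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big) \left(\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\right) p(\boldsymbol{\mu}|\theta_0) = \sum_{i=1}^{n} a_i \big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big).$$
The Cauchy-Schwarz inequality now yields
$$\left[\sum_{i=1}^{n} a_i \big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)\right]^2 \leq (\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \sum_{\boldsymbol{\mu}} \left(\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\right)^2 p(\boldsymbol{\mu}|\theta_0),$$
where
$$(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \big(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)^2\, p(\boldsymbol{\mu}|\theta_0)$$
is the variance of the estimator $\theta_{\rm est}$. We thus obtain
$$(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq \frac{\Big[\sum_{i=1}^{n} a_i \big(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\big)\Big]^2}{\sum_{\boldsymbol{\mu}} \Big[\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\Big]^2 p(\boldsymbol{\mu}|\theta_0)}$$
for all $n$, $a_i$, and $\theta_i$. The Barankin bound then follows by taking the supremum over these variables.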

Appendix B. Derivation of the Ziv-Zakai Bound
Derivations of the Ziv-Zakai bound can be found in the literature (see, for instance, Refs. [24,58,59]). This Appendix follows these derivations closely and provides additional background, which may be useful for readers less familiar with the field of hypothesis testing.
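Let $X \in [0, a]$ be a random variable with probability density $p(x)$. We can formally write $p(x) = -dP(X \geq x)/dx$, where $P(X \geq x) \equiv \int_x^a p(y)\, dy$ is the probability that $X$ is larger than or equal to $x$. We obtain from integration by parts
$$\langle X^2 \rangle = \int_0^a dx\, x^2\, p(x) = -\Big[x^2 P(X \geq x)\Big]_0^a + 2 \int_0^a dx\, x\, P(X \geq x) = \frac{1}{2} \int_0^{2a} dh\, h\, P\!\left(X \geq \frac{h}{2}\right),$$
where the boundary term vanishes, the last step uses the substitution $x = h/2$, and we assume that $a$ is finite [if $a \to \infty$, the above relation holds when $\lim_{a \to \infty} a^2 P(X \geq a) = 0$]. Finally, we can formally extend the above integral up to $\infty$, since $P(X \geq x) = 0$ for $x > a$:
$$\langle X^2 \rangle = \frac{1}{2} \int_0^{\infty} dh\, h\, P\!\left(X \geq \frac{h}{2}\right).$$
Following Ref. [59], we now take $\epsilon = \theta_{\rm est}(\boldsymbol{\mu}) - \theta_0$ and $X = |\epsilon|$. We thus have
$$\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0} = \langle \epsilon^2 \rangle = \frac{1}{2} \int_0^{\infty} dh\, h\, P\!\left(|\epsilon| \geq \frac{h}{2}\right). \tag{A12}$$
We express the probability as
$$P\!\left(|\epsilon| \geq \frac{h}{2}\right) = \int d\theta_0\, p(\theta_0)\, P\!\left(\theta_{\rm est} - \theta_0 > \frac{h}{2} \,\Big|\, \theta_0\right) + \int d\theta_0\, p(\theta_0)\, P\!\left(\theta_{\rm est} - \theta_0 \leq -\frac{h}{2} \,\Big|\, \theta_0\right).$$
Next, we replace $\theta_0$ with $\theta_0 + h$ in the second integral:
$$P\!\left(|\epsilon| \geq \frac{h}{2}\right) = \int d\theta_0\, \big(p(\theta_0) + p(\theta_0 + h)\big) \left\langle \frac{p(\theta_0)}{p(\theta_0) + p(\theta_0 + h)}\, P\!\left(\theta_{\rm est} - \theta_0 > \frac{h}{2} \,\Big|\, \theta_0\right) + \frac{p(\theta_0 + h)}{p(\theta_0) + p(\theta_0 + h)}\, P\!\left(\theta_{\rm est} - \theta_0 \leq \frac{h}{2} \,\Big|\, \theta_0 + h\right) \right\rangle. \tag{A13}$$
We now take a closer look at the expression within the angular brackets and interpret it in the framework of hypothesis testing. Suppose that we try to discriminate between the two cases $\theta_0 = \varphi$ (hypothesis 1, denoted $H_1$) and $\theta_0 = \varphi + h$ (denoted $H_2$). We decide between the two hypotheses $H_1$ and $H_2$ on the basis of the measurement result $x$ using the estimator $\theta_{\rm est}(x)$. One possible strategy consists in choosing the hypothesis whose value is closest to the obtained estimator. Hence, if $\theta_{\rm est}(x) \leq \varphi + h/2$, we assume $H_1$ to be correct and, otherwise, if $\theta_{\rm est}(x) > \varphi + h/2$, we pick $H_2$.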
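Let us now determine the probability to make an erroneous decision using this strategy. There are two scenarios that will lead to a mistake. First, our strategy fails whenever $\theta_{\rm est}(x) \leq \varphi + h/2$ when $\theta_0 = \varphi + h$. In this case, $H_2$ is true, but our strategy leads us to choose $H_1$. The probability for this to happen, given that $\theta_0 = \varphi + h$, is $P(\theta_{\rm est}(x) - \varphi \leq \frac{h}{2} \,|\, \theta_0 = \varphi + h)$. To obtain the error probability of our strategy, we need to multiply this with the probability with which $\theta_0$ assumes the value $\varphi + h$, which is given by $p(H_2) = \frac{p(\varphi + h)}{p(\varphi) + p(\varphi + h)}$. Second, our strategy also fails if $\theta_{\rm est}(x) > \varphi + h/2$ for $\theta_0 = \varphi$. This occurs with the conditional probability $P(\theta_{\rm est}(x) - \varphi > \frac{h}{2} \,|\, \theta_0 = \varphi)$, and $\theta_0 = \varphi$ with probability $p(H_1) = \frac{p(\varphi)}{p(\varphi) + p(\varphi + h)}$. The total probability to make a mistake is consequently given by
$$P_{\rm err}(\varphi, \varphi + h) = P\!\left(\theta_{\rm est}(x) - \varphi > \frac{h}{2} \,\Big|\, \theta_0 = \varphi\right) p(H_1) + P\!\left(\theta_{\rm est}(x) - \varphi \leq \frac{h}{2} \,\Big|\, \theta_0 = \varphi + h\right) p(H_2),$$
and we can rewrite Equation (A13) as
$$P\!\left(|\epsilon| \geq \frac{h}{2}\right) = \int d\theta_0\, \big(p(\theta_0) + p(\theta_0 + h)\big)\, P_{\rm err}(\theta_0, \theta_0 + h). \tag{A14}$$
The strategy described above depends on the estimator $\theta_{\rm est}$ and may not be optimal. In general, a binary hypothesis testing strategy can be characterized in terms of the separation of the possible values of $x$ into the two disjoint subsets $X_1$ and $X_2$, which are used to choose hypothesis $H_1$ or $H_2$, respectively. That is, if $x \in X_1$ we pick $H_1$ and otherwise $H_2$. Since one of the two hypotheses must be true, we have
$$1 = p(H_1) + p(H_2) = \int_{X_1} dx\, p(x|H_1)\, p(H_1) + \int_{X_2} dx\, p(x|H_1)\, p(H_1) + \int_{X_1} dx\, p(x|H_2)\, p(H_2) + \int_{X_2} dx\, p(x|H_2)\, p(H_2),$$
where the error made by such a strategy is given by
$$P^{X_1}_{\rm err}(H_1, H_2) = P(x \in X_2|H_1)\, p(H_1) + P(x \in X_1|H_2)\, p(H_2) = \int_{X_2} p(x|H_1)\, p(H_1)\, dx + \int_{X_1} p(x|H_2)\, p(H_2)\, dx. \tag{A16}$$
This probability is minimized if $p(x|H_2)\, p(H_2) < p(x|H_1)\, p(H_1)$ for $x \in X_1$ and, consequently, $p(x|H_2)\, p(H_2) \geq p(x|H_1)\, p(H_1)$ for $x \in X_2$. This actually identifies an optimal strategy for hypothesis testing, known as the likelihood ratio test: if the likelihood ratio $p(x|H_1)/p(x|H_2)$ is larger than the threshold value $p(H_2)/p(H_1)$, we pick $H_1$, and otherwise $H_2$. The corresponding minimum error probability is
$$P_{\min}(H_1, H_2) = \int dx\, \min\big\{p(x|H_1)\, p(H_1),\; p(x|H_2)\, p(H_2)\big\}.$$
This result represents a lower bound on $P^{X_1}_{\rm err}(\varphi, \varphi + h)$ for arbitrary choices of $X_1$.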
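This includes the case discussed in Equation (A13). Thus, using
$$P_{\rm err}(\varphi, \varphi + h) \geq P_{\min}(\varphi, \varphi + h) \tag{A19}$$
in Equation (A14) and inserting back into Equation (A12), we finally obtain the Ziv-Zakai bound for the mean square error:
$$\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0} \geq \frac{1}{2} \int_0^{\infty} h\, dh \int d\theta_0\, \big(p(\theta_0) + p(\theta_0 + h)\big)\, P_{\min}(\theta_0, \theta_0 + h).$$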
This bound can be further sharpened by introducing a valley-filling function [61], which is not considered here.