Frequentist and Bayesian Quantum Phase Estimation

Li, Yan; Pezzè, Luca; Gessner, Manuel; Ren, Zhihong; Li, Weidong; Smerzi, Augusto

doi:10.3390/e20090628

Open AccessArticle

Frequentist and Bayesian Quantum Phase Estimation

¹

Institute of Theoretical Physics and Department of Physics, State Key Laboratory of Quantum Optics and Quantum Optics Devices, Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan 030006, China

²

QSTAR, INO-CNR and LENS, Largo Enrico Fermi 2, 50125 Firenze, Italy

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Entropy 2018, 20(9), 628; https://doi.org/10.3390/e20090628

Submission received: 7 July 2018 / Revised: 6 August 2018 / Accepted: 10 August 2018 / Published: 23 August 2018

(This article belongs to the Special Issue Advances in Quantum Metrology)

Download

Browse Figures

Versions Notes

Abstract

:

Frequentist and Bayesian phase estimation strategies lead to conceptually different results on the state of knowledge about the true value of an unknown parameter. We compare the two frameworks and their sensitivity bounds to the estimation of an interferometric phase shift limited by quantum noise, considering both the cases of a fixed and a fluctuating parameter. We point out that frequentist precision bounds, such as the Cramér–Rao bound, for instance, do not apply to Bayesian strategies and vice versa. In particular, we show that the Bayesian variance can overcome the frequentist Cramér–Rao bound, which appears to be a paradoxical result if the conceptual difference between the two approaches are overlooked. Similarly, bounds for fluctuating parameters make no statement about the estimation of a fixed parameter.

Keywords:

quantum metrology; Bayesian estimation; parameter estimation

1. Introduction

The estimation of a phase shift using interferometric techniques is at the core of metrology and sensing [1,2,3]. Applications range from the definition of the standard of time [4] to the detection of gravitational waves [5,6]. The general problem can be concisely stated as the search for optimal strategies to minimize the phase estimation uncertainty. The noise that limits the achievable phase sensitivity can have a “classical” or a “quantum” nature. Classical noise originates from the coupling of the interferometer with some external source of disturbance, like seismic vibrations, parasitic magnetic fields or from incoherent interactions within the interferometer. Such noise can, in principle, be arbitrarily reduced, e.g., by shielding the interferometer from external noise or by tuning interaction parameters to ensure a fully coherent time evolution. The second source of uncertainty has an irreducible quantum origin [7,8]. Quantum noise cannot be fully suppressed, even in the idealized case of the creation and manipulation of pure quantum states. Using classically-correlated probe states, it is possible to reach the so-called shot noise or standard quantum limit, which is the limiting factor for the current generation of interferometers and sensors [9,10,11,12]. Strategies involving probe states characterized by squeezed quadratures [13] or entanglement between particles [14,15,16,17,18,19] are able to overcome the shot noise, the ultimate quantum bound being the so-called Heisenberg limit. Quantum noise reduction in phase estimation has been demonstrated in several proof-of-principle experiments with atoms and photons [20,21].

There is a vast amount of literature dealing with the parameter estimation problem that has been mostly developed following two different approaches [22,23,24]: frequentist and Bayesian. Both approaches have been investigated in the context of quantum phase estimation [18,20,25,26,27,28,29,30,31] and implemented/tested experimentally [32,33,34,35,36]. They build on conceptually different meanings attached to the word “probability” and their respective results provide conceptually different information on the estimated parameters and their uncertainties.

In the limit of a large number of repeated measurements, the sensitivity reached by the frequentist and Bayesian methods generally agree: this fact has very often induced the belief that the two paradigms can be interchangeably used in the phase estimation theory without acknowledging their irreconcilable nature. Overlooking these differences is not only conceptually inconsistent but can even create paradoxes, as, for instance, the existence of ultimate bounds in sensitivity proven in one paradigm that can be violated in the other.

In this manuscript, we directly compare the frequentist and the Bayesian parameter estimation theory. We study different sensitivity bounds obtained in the two frameworks and highlight the conceptual differences between the two. Besides the asymptotic regime of many repeated measurements, we also study bounds that are relevant for small samples. In particular, we show that the Bayesian variance can overcome the frequentist Cramér–Rao bound. The Cramér–Rao bound is a mathematical theorem providing the highest possible sensitivity in a phase estimation problem. The fact that the Bayesian sensitivity can be higher than the Cramér–Rao bound is therefore paradoxical. The paradox is solved by clarifying the conceptual differences between the frequentist and the Bayesian approaches, which therefore cannot be directly compared. Such difference should be considered when discussing theoretical and experimental figures of merit in interferometric phase estimation.

Our results are illustrated with a simple test model [37,38]. We consider N qubits with basis states

| 0 〉

and

| 1 〉

, initially prepared in a (generalized) GHZ state

| GHZ 〉 = {(| 0 〉}^{\otimes N} + {| 1 〉}^{\otimes N}) / \sqrt{2}

, with all particles being either in

| 1 〉

or in

| 0 〉

. The phase-encoding is a rotation of each qubit in the Bloch sphere

| 0 〉 \to e^{- i θ / 2} | 0 〉

and

| 1 〉 \to e^{+ i θ / 2} | 1 〉

, which transforms the

| GHZ 〉

state into

| GHZ (θ) 〉 = {(e^{- i N θ / 2} | 0 〉}^{\otimes N} + e^{+ i N θ / 2} {| 1 〉}^{\otimes N}) / \sqrt{2}

. The phase is estimated by measuring the parity

{(- 1)}^{N_{0}}

, where

N_{0}

is the number of particles in the state

| 0 〉

[37,39,40,41]. The parity measurement has two possible results

μ = \pm 1

that are conditioned by the “true value of the phase shift”

θ_{0}

with probability

p (\pm 1 | θ_{0}) = (1 \pm \cos (N θ_{0})) / 2

. The probability to observe the sequence of results

μ = {μ_{1}, μ_{2}, \dots, μ_{m}}

in m independent repetitions of the experiment (with same probe state and phase encoding transformation) is

p (μ | θ_{0}) = \prod_{i = 1}^{m} p (μ_{i} | θ_{0}) = {(\frac{1 + \cos (N θ_{0})}{2})}^{m_{+}} {(\frac{1 - \cos (N θ_{0})}{2})}^{m_{-}},

(1)

where

m_{\pm}

is the number of the observed results

\pm 1

, respectively. Notice that

p (μ | θ_{0})

is the conditional probability for the measurement outcome

μ

, given that the true value of the phase shift is

θ_{0}

(which we consider to be unknown in the estimation protocol). Equation (1) provides the probability that will be used in the following sections for the case

N = 2

and

θ_{0} \in [0, π / 2]

. Section 2 and Section 3 deal with the case where

θ_{0}

has a fixed value and in Section 4 we discuss precision bounds for a fluctuating phase shift.

2. Frequentist Approach

In the frequentist paradigm, the phase (assumed having a fixed but unknown value

θ_{0}

) is estimated via an arbitrarily chosen function of the measurement results,

θ_{est} (μ)

, called the estimator. Typically,

θ_{est} (μ)

is chosen by maximizing the likelihood of the observed data (see below). The estimator, being a function of random outcomes, is itself a random variable. It is characterized by a statistical distribution that has an objective, measurable character. The relative frequency with which the event

θ_{est}

occurs converges to a probability asymptotically with the number of repeated experimental trials.

2.1. Frequentist Risk Functions

Statistical fluctuations of the data reflect the statistical uncertainty of the estimation. This is quantified by the variance,

{(Δ^{2} θ_{est})}_{μ | θ_{0}} = \sum_{μ} {(θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}})}^{2} p (μ | θ_{0}),

(2)

around the mean value

{〈 θ_{est} 〉}_{μ | θ_{0}} = \sum_{μ} θ_{est} (μ) p (μ | θ_{0})

, the sum extending over all possible measurement sequences (for fixed

θ_{0}

and m). An important class is that of locally unbiased estimators, namely those satisfying

{〈 θ_{est} 〉}_{μ | θ_{0}} = θ_{0}

and

\frac{d {〈 θ_{est} 〉}_{μ | θ}}{d θ} |_{θ = θ_{0}} = 1

(see, for instance, [42]). An estimator is unbiased if and only if it is locally unbiased at every

θ_{0}

.

The quality of the estimator can also be quantified by the mean square error (MSE) [23]

MSE {(θ_{est})}_{μ | θ_{0}} = \sum_{μ} {(θ_{est} (μ) - θ_{0})}^{2} p (μ | θ_{0}),

(3)

giving the deviation of

θ_{est}

from the true value of the phase shift

θ_{0}

. It is related to Equation (2) by the relation

MSE {(θ_{est})}_{μ | θ_{0}} = {(Δ^{2} θ_{est})}_{μ | θ_{0}} + {({〈 θ_{est} 〉}_{μ | θ_{0}} - θ_{0})}^{2} .

(4)

In the frequentist approach, often the variance is not considered as a proper way to quantify the goodness of an estimator. For instance, an estimator that always gives the same value independently of the measurement outcomes is strongly biased: it has zero variance but a large MSE that does not scale with the number of repeated measurements. Notice that the MSE cannot be accessed from the experimentally available data since the true value

θ_{0}

is unknown. In this sense, only the fluctuations of

θ_{est}

around its mean value, i.e., the variance

{(Δ^{2} θ_{est})}_{μ | θ_{0}}

, have experimental relevance. For unbiased estimators, Equations (2) and (4) coincide. In general, since the bias term in Equation (4) is never negative,

MSE {(θ_{est})}_{μ | θ_{0}} \geq {(Δ^{2} θ_{est})}_{μ | θ_{0}}

and any lower bound on

{(Δ^{2} θ_{est})}_{μ | θ_{0}}

automatically provides a lower bound on

MSE {(θ_{est})}_{μ | θ_{0}}

but not vice versa. In the following section, we therefore limit our attention to bounds on

{(Δ^{2} θ_{est})}_{μ | θ_{0}}

. The distinction between the two quantities becomes more important in the case of a fluctuating phase shift

θ_{0}

, where the bias can affect the corresponding bounds in different ways. We will see this explicitly in Section 4.

2.2. Frequentist Bounds on Phase Sensitivity

2.2.1. Barankin Bound

The Barankin bound (BB) provides the tightest lower bound to the variance (2) [43]. It can be proven to be always (for any m) saturable, in principle, by a specific local (i.e., dependent of

θ_{0}

) estimator and measurement observable. Of course, since the estimator that saturates the BB depends on the true value of the parameter (which is unknown), the bound is of not much use in practice. Nevertheless, the BB plays a central role, from the theoretical point of view, as it provides a hierarchy of weaker bounds which can be used in practice with estimators that are asymptotically unbiased. The BB can be written as [44]

{(Δ^{2} θ_{est})}_{μ | θ_{0}} \geq Δ^{2} θ_{BB} \equiv \sup_{θ_{i}, a_{i}, n} \frac{{\{\sum_{i = 1}^{n} a_{i} [{〈 θ_{est} 〉}_{μ | θ_{i}} - {〈 θ_{est} 〉}_{μ | θ_{0}}]\}}^{2}}{\sum_{μ} {[\sum_{i = 1}^{n} a_{i} L (μ | θ_{i}, θ_{0})]}^{2} p (μ | θ_{0})},

(5)

where

L (μ | θ_{i}, θ) = p (μ | θ_{i}) / p (μ | θ)

is generally indicated as likelihood ratio and the supremum is taken over n parameters

a_{i} \in R

, which are arbitrary real numbers, and

θ_{i}

, which are arbitrary phase values in the parameter domain. For unbiased estimators, we can replace

{〈 θ_{est} 〉}_{μ | θ_{i}} = θ_{i}

for all i and the BB becomes independent of the estimator:

{(Δ^{2} θ_{est})}_{μ | θ_{0}} \geq Δ^{2} θ_{BB}^{ub} \equiv \sup_{θ_{i}, a_{i}, n} \frac{{\{\sum_{i = 1}^{n} a_{i} [θ_{i} - θ_{0}]\}}^{2}}{\sum_{μ} {[\sum_{i = 1}^{n} a_{i} L (μ | θ_{i}, θ_{0})]}^{2} p (μ | θ_{0})} .

(6)

A derivation of the BB is presented in Appendix A.

The explicit calculation of

Δ^{2} θ_{BB}

is impractical in most applications due to the number of free variables that must be optimized. However, the BB provides a strict hierarchy of bounds of increasing complexity that can be of great practical importance. Restricting the number of variables in the optimization can provide local lower bounds that are much simpler to determine at the expense of not being saturable in general, namely, for an arbitrary number of measurements. Below, we demonstrate the following hierarchy of bounds:

{(Δ^{2} θ_{est})}_{μ | θ_{0}} \geq Δ^{2} θ_{BB} \geq Δ^{2} θ_{EChRB} \geq Δ^{2} θ_{ChRB} \geq Δ^{2} θ_{CRLB},

(7)

where

Δ^{2} θ_{CRLB}

is the Cramér–Rao lower bound (CRLB) [45,46] and

Δ^{2} θ_{ChRB}

is the Hammersley–Chapman–Robbins bound (ChRB) [47,48]. We will also introduce a novel extended version of the ChRB, indicated as

Δ^{2} θ_{EChRB}

.

2.2.2. Cramér–Rao Lower Bound and Maximum Likelihood Estimator

The CRLB is the most common frequentist bound in parameter estimation. It is given by [45,46]:

Δ^{2} θ_{CRLB} = \frac{{(\frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}})}^{2}}{m F (θ_{0})} .

(8)

The inequality

{(Δ^{2} θ_{est})}_{μ | θ_{0}} \geq Δ^{2} θ_{CRLB}

is obtained by differentiating

{〈 θ_{est} 〉}_{μ | θ_{0}}

with respect to

θ_{0}

and using a Cauchy–Schwarz inequality:

{(\frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}})}^{2} = {(\sum_{μ} (θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}}) \frac{d p (μ | θ_{0})}{d θ_{0}})}^{2} \leq m F (θ_{0}) {(Δ^{2} θ_{est})}_{μ | θ_{0}},

(9)

where we have used

\sum_{μ} \frac{d p (μ | θ_{0})}{d θ_{0}} = 0

and

\sum_{μ} \frac{1}{p (μ | θ_{0})} {(\frac{\partial p (μ | θ)}{\partial θ} |_{θ_{0}})}^{2} = m \sum_{μ} \frac{1}{p (μ | θ_{0})} {(\frac{\partial p (μ | θ)}{\partial θ} |_{θ_{0}})}^{2}

valid for m independent measurements, and

F (θ_{0}) = \sum_{μ} \frac{1}{p (μ | θ_{0})} {(\frac{\partial p (μ | θ)}{\partial θ} |_{θ_{0}})}^{2}

(10)

is the Fisher information. The equality

{(Δ^{2} θ_{est})}_{μ | θ_{0}} = Δ^{2} θ_{CRLB}

is achieved if and only if

θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}} = λ_{θ_{0}} \frac{d \log p (μ | θ_{0})}{d θ_{0}},

(11)

with

λ_{θ_{0}}

a parameter independent of

μ

(while it may depend on

θ_{0}

). Noticing that

\frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}} = \sum_{μ} (θ_{est} (μ) - f (θ_{0})) \frac{d p (μ | θ_{0})}{d θ_{0}}

, the CRLB can be straightforwardly generalized to any function

f (θ_{0})

independent of

μ

. In particular, choosing

f (θ_{0}) = θ_{0}

, we can directly prove that

MSE {(θ_{est})}_{μ | θ_{0}} \geq Δ^{2} θ_{CRLB}

, which also depends on the bias.

Asymptotically in m, the saturation of Equation (8) is obtained for the maximum likelihood estimator (MLE) [22,23,49]. This is the value

θ_{MLE} (μ)

that maximizes the likelihood

p (μ | θ_{0})

(as a function of the parameter

θ_{0}

) for the observed measurement sequence

μ

,

θ_{MLE} (μ) \equiv \arg \max_{θ_{0}} {p (μ | θ_{0})} .

(12)

For a sufficiently large sample size m (in the central limit), independently of the probability distribution

p (μ | θ_{0})

, the MLE becomes normally distributed [18,22,23,49]:

p (θ_{MLE} | θ_{0}) = \sqrt{\frac{m F (θ_{0})}{2 π}} e^{- \frac{m F (θ_{0})}{2} {(θ_{0} - θ_{MLE})}^{2}} (m ≫ 1),

(13)

with mean given by the true value

θ_{0}

and variance equal to the inverse of the Fisher information. The MLE is well defined provided that there is a unique maximum in the considered phase interval. In the case of Equation (1), this condition is fulfilled provided that one restrict the phase domain to

[0, π / (2 N)]

for instance.

In Figure 1, we plot the results of a maximum likelihood analysis for the example considered in this manuscript. In this case, the MLE is readily calculated and given by

θ_{MLE} (μ) = \frac{1}{2} \arccos (\frac{m_{+} - m_{-}}{m_{+} + m_{-}})

, and the Fisher information is

F (θ_{0}) = N^{2}

, independent of

θ_{0}

(we recall that

N = 2

in our example). In Figure 1a we plot the bias

{〈θ_{MLE}〉}_{μ | θ_{0}} - θ_{0}

(dots) as a function of m, for

θ_{0} = π / 4

. Error bars are

\pm Δ θ_{CRLB}

. Notice that

{〈θ_{MLE}〉}_{μ | θ_{0}} = θ_{0}

for every m. This does not mean that the estimator is locally unbiased: indeed, the derivative

d {〈θ_{MLE}〉}_{μ | θ_{0}} / d θ_{0}

[see panel (b)] is different from 1 for every value of m. We have

d {〈θ_{MLE}〉}_{μ | θ_{0}} / d θ_{0} \to 1

asymptotically in m. In Figure 1b, we plot

m F (θ_{0}) {(Δ^{2} θ_{MLE})}_{μ | θ_{0}}

as a function of the number of independent measurements m (red dots). This quantity is compared to

m F (θ_{0}) Δ^{2} θ_{CRLB} = {(d {〈θ_{MLE}〉}_{μ | θ_{0}} / d θ_{0})}^{2}

(red line). With increasing sample size m,

{(Δ^{2} θ_{MLE})}_{μ | θ_{0}} \to 1 / (m F (θ_{0}))

corresponding to the CRLB for unbiased estimators.

2.2.3. Hammersley–Chapman–Robbins Bound

The ChRB is obtained from Equation (5) by taking

n = 2

,

a_{1} = 1, a_{2} = - 1

,

θ_{1} = θ_{0} + λ

,

θ_{2} = θ_{0}

, and can be written as [47,48]

Δ^{2} θ_{ChRB} = \sup_{λ} \frac{{({〈 θ_{est} 〉}_{μ | θ_{0} + λ} - {〈 θ_{est} 〉}_{μ | θ_{0}})}^{2}}{\sum_{μ} \frac{p {(μ | θ_{0} + λ)}^{2}}{p (μ | θ_{0})} - 1} .

(14)

Clearly, restricting the number of parameters in the optimization in Equation (5) leads to a less strict bound. We thus have

Δ^{2} θ_{BB} \geq Δ^{2} θ_{ChRB}

. For unbiased estimators, we obtain

Δ^{2} θ_{ChRB}^{ub} = \sup_{λ} \frac{λ^{2}}{\sum_{μ} \frac{p {(μ | θ_{0} + λ)}^{2}}{p (μ | θ_{0})} - 1} .

(15)

Furthermore, the supremum over

λ

on the right side of Equation (14) is always larger or equal to its limit

λ \to 0

:

\begin{matrix} \sup_{λ} \frac{{({〈 θ_{est} 〉}_{μ | θ_{0} + λ} - {〈 θ_{est} 〉}_{μ | θ_{0}})}^{2}}{\sum_{μ} \frac{p {(μ | θ_{0} + λ)}^{2}}{p (μ | θ_{0})} - 1} & \geq \lim_{λ \to 0} \frac{{({〈 θ_{est} 〉}_{μ | θ_{0} + λ} - {〈 θ_{est} 〉}_{μ | θ_{0}})}^{2}}{\sum_{μ} \frac{p {(μ | θ_{0} + λ)}^{2}}{p (μ | θ_{0})} - 1} = \frac{{(\frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}})}^{2}}{m \sum_{μ} \frac{1}{p (μ | θ_{0})} {(\frac{d p (μ | θ_{0})}{d θ_{0}})}^{2}}, \end{matrix}

(16)

provided that the derivatives on the right-hand side exist. We thus recover the CRLB as a limiting case of the ChRB. The ChRB is always stricter than the CRLB and we obtain the last inequality in the chain (7). Notice that the CRLB requires the probability distribution

p (μ | θ_{0})

to be differentiable [24]—a condition that can be dropped for the ChRB and the more general BB. Even if the distribution is regular, the above derivation shows that the ChRB, and more generally the BB, provide tighter error bounds than the CRLB. With increasing n, the BB becomes tighter and tighter and the CRLB represents the weakest bound in this hierarchy, which can be observed in Figure 2a. Next, we determine a stricter bound in this hierarchy.

2.2.4. Extended Hammersley–Chapman–Robbins Bound

We obtain the extended Hammersley–Chapman–Robbins bound (EChRB) as a special case of Equation (5), by taking

n = 3

,

a_{1} = 1

,

a_{2} = A

,

a_{3} = - 1

,

θ_{1} = θ_{0} + λ_{1}

,

θ_{2} = θ_{0} + λ_{2}

, and

θ_{3} = θ_{0}

, giving

Δ^{2} θ_{EChRB} = \sup_{λ_{1}, λ_{2}, A} \frac{{({〈 θ_{est} 〉}_{μ | θ_{0} + λ_{1}} + A {〈 θ_{est} 〉}_{μ | θ_{0} + λ_{2}} - (1 + A) {〈 θ_{est} 〉}_{μ | θ_{0}})}^{2}}{\sum_{μ} \frac{{[p (μ | θ_{0} + λ_{1}) - p (μ | θ_{0}) + A p (μ | θ_{0} + λ_{2})]}^{2}}{p (μ | θ_{0})}},

(17)

where the supremum is taken over all possible

λ_{1}, λ_{2} \in N

and

A \in R

. Since the ChRB is obtained from Equation (17) in the specific case

A = 0

, we have that

Δ^{2} θ_{EChRB} \geq Δ^{2} θ_{ChRB}

. For unbiased estimators, we obtain

Δ^{2} θ_{EChRB}^{ub} = \sup_{λ_{1}, λ_{2}, A} \frac{{(λ_{1} + A λ_{2})}^{2}}{\sum_{μ} \frac{{[p (μ | θ_{0} + λ_{1}) - p (μ | θ_{0}) + A p (μ | θ_{0} + λ_{2})]}^{2}}{p (μ | θ_{0})}} .

(18)

In Figure 2a, we compare the different bounds for unbiased estimators and for the example considered in the manuscript: the CRLB (black line), the ChRB (filled triangles) and the EChRB (empty triangles), satisfying the chain of inequalities (7). In Figure 2b, we show the values of

λ

in Equation (15) for which the supremum is achieved in our case.

3. Bayesian Approach

The Bayesian approach makes use of the Bayes–Laplace theorem, which can be very simply stated and proved. The joint probability of two stochastic variables

μ

and

θ

is symmetric:

p (μ, θ) = p (μ | θ) p (θ) = p (θ | μ) p (μ) = p (θ, μ)

, where

p (θ)

and

p (μ)

are the marginal distributions, obtained by integrating the joint probability over one of the two variables, while

p (μ | θ)

and

p (θ | μ)

are conditional distributions.

We recall that in a phase inference problem, the set of measurement results

μ

is generated by a fixed and unknown value

θ_{0}

according to the likelihood

p (μ | θ_{0})

. In the Bayesian approach to the estimation of

θ_{0}

, one introduces a random variable

θ

and uses the Bayes–Laplace theorem to define the conditional probability

p_{post} (θ | μ) = \frac{p (μ | θ) p_{pri} (θ)}{p_{mar} (μ)} .

(19)

The posterior probability

p_{post} (θ | μ)

provides a degree of belief, or plausibility, that

θ_{0} = θ

(i.e., that

θ

is the true value of the phase), in the light of the measurement data

μ

[50]. In Equation (19), the prior distribution

p_{pri} (θ)

expresses the a priori state of knowledge on

θ

,

p (μ | θ)

is the likelihood that is determined by the quantum mechanical measurement postulate, e.g., as in Equation (1), and the marginal probability

p_{mar} (μ) = \int_{a}^{b} d θ p (θ, μ)

is obtained through the normalization for the posterior, where a and b are boundaries of the phase domain. The posterior probability

p_{post} (θ | μ)

describes the current knowledge about the random variable

θ

based on the available information, i.e., the measurement results

μ

.

3.1. Noninformative Prior

In the Bayesian approach, the information on

θ

provided by the posterior probability always depends on the prior distribution

p_{pri} (θ)

. It is possible to account for the available a priori information on

θ

by choosing a prior distribution accordingly. However, if no a priori information is available, it is not obvious how to choose a “noninformative” prior [51]. The flat prior

p_{pri} (θ) = const

was first introduced by Laplace to express the absence of information on

θ

[51]. However, this prior would not be flat for other functions of

θ

and, in the complete absence of a priori information, it seems unreasonable that some information is available for different parametrizations of the problem. To see this, recall that a transformation of variables requires that

p_{pri} (φ) = p_{pri} (θ) | d f^{- 1} (φ) / d φ |

for any function

φ = f (θ)

. Hence, if

p_{pri} (θ)

is flat, one obtains that

p_{pri} (φ) = | d f^{- 1} (φ) / d φ |

is, in general, not flat.

Notice that

p_{pri} (θ) \propto \sqrt{F (θ)}

—called Jeffreys prior [52,53]—where

F (θ)

is the Fisher information (10), remains invariant under re-parametrization. For arbitrary transformations

φ = f (θ)

, the Fisher information obeys the transformation property

F (φ) = F (θ) {(d θ / d φ)}^{2} = F (θ) {(d f^{- 1} (φ) / d φ)}^{2}

. Therefore, if

p_{pri} (θ) \propto \sqrt{F (θ)}

and we perform the change of variable

φ = f (θ)

, then the transformation property of the Fisher information ensures that

p_{pri} (φ) = p_{pri} (θ) | d f^{- 1} (φ) / d φ | \propto \sqrt{F (φ)}

. Notice that, as in our case, the Fisher information

F (θ)

may actually be independent of

θ

. In this case, the invariance property does not imply that Jeffreys prior is flat for arbitrary re-parametrizations

φ = f (θ)

, instead,

\sqrt{F (φ)} = | d f^{- 1} (φ) / d φ |

.

3.2. Posterior Bounds

From the posterior probability (19), we can provide an estimate

θ_{BL} (μ)

of

θ_{0}

. This can be the maximum a posteriori,

θ_{BL} (μ) = \arg \max_{θ} p_{post} (θ | μ)

, which coincides with the maximum likelihood Equation (12) when the prior is flat,

p_{pri} (θ) = const

, or the mean of the distribution,

θ_{BL} (μ) = \int_{a}^{b} d θ θ p_{post} (θ | μ)

.

With the Bayesian approach, it is possible to provide a confidence interval around the estimator, given an arbitrary measurement sequence

μ

, even with a single measurement. The variance

{(Δ^{2} θ_{BL} (μ))}_{θ | μ} = \int_{a}^{b} d θ p_{post} (θ | μ) {(θ - θ_{BL} (μ))}^{2}

(20)

can be taken as a measure of fluctuation of our degree of belief around

θ_{BL} (μ)

. There is no such concept in the frequentist paradigm. The Bayesian posterior variance

{(Δ^{2} θ_{BL} (μ))}_{θ | μ}

and the frequentist variance

{(Δ^{2} θ_{BL})}_{μ | θ_{0}}

have entirely different operational meanings. Equation (20) provides a degree of plausibility that

θ_{BL} (μ) = θ_{0}

, given the measurement results

μ

. There is no notion of bias in this case. On the other hand, the quantity

{(Δ^{2} θ_{BL})}_{μ | θ_{0}}

measures the statistical fluctuations of

θ_{BL} (μ)

when repeating the sequence of m measurements infinitely many times.

Ghosh Bound

In the following, we derive a lower bound to Equation (20) first introduced by Ghosh [54]. Using

\int_{a}^{b} d θ p_{post} (θ | μ) = 1,

we have

\begin{matrix} \int_{a}^{b} d θ (θ - θ_{BL} (μ)) \frac{d p_{post} (θ | μ)}{d θ} & = {p_{post} (θ | μ) (θ - θ_{BL} (μ))|}_{a}^{b} - \int_{a}^{b} d θ p_{post} (θ | μ) = f (μ, a, b) - 1, \end{matrix}

(21)

where

f (μ, a, b) = b p_{post} (b | μ) - a p_{post} (a | μ) - θ_{BL} (μ) (p_{post} (b | μ) - p_{post} (a | μ))

depends on the value of the posterior distribution calculated at the boundaries. If

p_{pri} (a) = p_{pri} (b) = 0

, we have

f (μ, a, b) = 0

. Analogously with the derivation of the (frequenstist) CRLB, we exploit the Cauchy–Schwarz inequality,

(\int_{a}^{b} d θ {(\frac{d p_{post} (θ | μ)}{d θ})}^{2} \frac{1}{p_{post} (θ | μ)}) (\int_{a}^{b} d θ p_{post} (θ | μ) {(θ - θ_{BL} (μ))}^{2}) \geq {(f (μ, a, b) - 1)}^{2},

leading to

{(Δ^{2} θ_{BL} (μ))}_{θ | μ} \geq Δ^{2} θ_{GB} (μ)

, where [54]

Δ^{2} θ_{GB} (μ) = \frac{{(f (μ, a, b) - 1)}^{2}}{\int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{d p_{post} (θ | μ)}{d θ})}^{2}} .

(22)

The above bound is a function of the specific measurement sequence

μ

and depends on

\int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{d p_{post} (θ | μ)}{d θ})}^{2}

that we can identify as a “Fisher information of the posterior distribution”. The Ghosh bound is saturated if and only if

θ - θ_{BL} (μ) = λ_{μ} \frac{d \log p (θ | μ)}{d θ},

(23)

where

λ_{μ}

does not depend on

θ

while it may depend on

μ

.

3.3. Average Posterior Bounds

While Equation (20) depends on the specific

μ

, it is natural to consider its average over all possible measurement sequences at fixed

θ_{0}

and m, weighted by the likelihood

p (μ | θ_{0})

:

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}} = \sum_{μ} {(Δ^{2} θ_{BL} (μ))}_{θ | μ} p (μ | θ_{0}) = \sum_{μ} \int_{a}^{b} d θ p (θ, μ | θ_{0}) {(θ - θ_{BL} (μ))}^{2},

(24)

which we indicate as average Bayesian posterior variance, where

p (θ, μ | θ_{0}) = p_{post} (θ | μ) p (μ | θ_{0})

.

We would be tempted to compare the average posterior sensitivity

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

to the frequentist Cramér–Rao bound

Δ^{2} θ_{CRLB}

. However, because of the different operational meanings of the frequentist and the Bayesian paradigms, there is no reason for Equation (24) to fulfill the Cramér–Rao bound: indeed, it does not, as we show below.

Likelihood-Averaged Ghosh Bound

A lower bound to Equation (24) is obtained by averaging the Ghosh bound Equation (22) over the likelihood function. We have

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}} \geq Δ^{2} θ_{aGB}

, where [18]

Δ^{2} θ_{aGB} = \sum_{μ} \frac{{(f (μ, a, b) - 1)}^{2}}{\int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{\partial p_{post} (θ | μ)}{\partial θ})}^{2}} p (μ | θ_{0}) .

(25)

This likelihood-averaged Ghosh bound is independent of

μ

because of the statistical average.

3.4. Numerical Comparison of Bayesian and Frequentist Phase Estimation

In the numerical calculations shown in Figure 3, we consider a Bayesian estimator given by

θ_{BL} (μ) = \int_{a}^{b} d θ θ p_{post} (θ | μ)

with prior distributions

p_{pri} (θ) = \frac{2}{π} \frac{e^{α \sin {(2 θ)}^{2}} - 1}{e^{α / 2} I_{0} (α / 2) - 1},

(26)

where

I_{0} (α)

is the modified Bessel function of the first kind. This choice of prior distribution can continuously turn from a peaked function to a flat one when changing

α

, while being differentiable in the full phase interval. The more negative is

α

, the more

p_{pri} (θ)

broadens in

[0, π / 2]

. In particular, in the limit

α \to - \infty

, the prior approaches the flat distribution, which in our case coincides with Jeffreys prior since the Fisher information is independent of

θ

. In the limit

α = 0

, the prior is given by

\lim_{α \to 0} p_{pri} (θ) = 4 \sin {(2 θ)}^{2} / π

. For positive values of

α

, the larger

α

, the more peaked is

p_{pri} (θ)

around

θ_{0} = π / 4

. In particular

p_{pri} (θ) \approx e^{- 4 α {(θ - π / 4)}^{2}} / \sqrt{π / 4 α}

for

α ≫ 1

. Equation (26) is normalized to one for

θ \in [0, \frac{π}{2}]

. In the inset of the different panels of Figure 3, we plot

p_{pri} (θ)

for

α = - 100

[panel (a)],

α = - 10

(b),

α = 1

(c) and

α = 10

(d).

In Figure 3, we plot, as a function of m, the posterior variance

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

(blue circles) that, as expected, is always larger than the likelihood-averaged Ghosh bound Equation (25) (solid blue lines). For comparison, we also plot the frequentist variance

{(Δ^{2} θ_{BL})}_{μ | θ_{0}} = \sum_{μ} {(θ_{BL} (μ) - {〈 θ_{BL} 〉}_{μ | θ_{0}})}^{2} p (μ | θ_{0})

(red dots) around the mean value

{〈 θ_{BL} 〉}_{μ | θ_{0}} = \sum_{μ} θ_{BL} (μ) p (μ | θ_{0})

of the estimator. This quantity obeys the Cramér–Rao theorem

{(Δ^{2} θ_{BL})}_{μ | θ_{0}} \geq Δ^{2} θ_{CRLB}

and the more general chain of inequalities (7). This is confirmed in the figure where we show

Δ^{2} θ_{CRLB} = {| d {〈 θ_{BL} 〉}_{μ | θ_{0}} / d θ_{0} |}^{2} / (m F (θ_{0}))

(red line). Notice that, when the prior narrows around

θ_{0}

, the variance

{(Δ^{2} θ_{BL})}_{μ | θ_{0}}

decreases, but, at the same time, the estimator becomes more and more biased, i.e.,

| d {〈 θ_{BL} 〉}_{μ | θ_{0}} / d θ_{0} |

decreases as well (note indeed that the red dashed line is proportional to

| d {〈 θ_{BL} 〉}_{μ | θ_{0}} / d θ_{0} |^{2}

).

Interestingly, in Figure 3, we clearly see that the Bayesian posterior variance

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

and the likelihood-averaged Ghosh bound may stay in some cases below the (frequentist)

Δ^{2} θ_{CRLB}

[see panels (a) and (b)], even if the prior is almost flat. The discrepancy with the CRLB is remarkable and can be quite large for small values of m. Still, there is no contradiction since

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

and

{(Δ^{2} θ_{BL})}_{μ | θ_{0}}

have different operational meanings and interpretations. They both respect their corresponding sensitivity bounds.

Asymptotically in the number of measurements m, the Ghosh bound as well as its likelihood average converge to the Cramér–Rao bound. Indeed, it is well known that in this limit the posterior probability becomes a Gaussian centered at the true value of the phase shift and with variance given by the inverse of the Fisher information,

p_{post} (θ | μ) = \sqrt{\frac{m F (θ_{0})}{2 π}} e^{- \frac{m F (θ_{0})}{2} {(θ - θ_{0})}^{2}}, (m ≫ 1),

(27)

a result known as Laplace–Bernstein–von Mises theorem [18,23,55]. By replacing Equation (27) into Equation (22), we recover a posterior variance given by

1 / (m F (θ_{0}))

.

4. Bounds for Random Parameters

In this section, we derive bounds of phase sensitivity obtained when

θ_{0}

is a random variable distributed according to

p (θ_{0})

. Operationally, this corresponds to the situation where

θ_{0}

remains fixed (but unknown) when collecting a single sequence of m measurements

μ

. In between measurement sequences,

θ_{0}

fluctuates according to

p (θ_{0})

.

4.1. Frequentist Risk Functions for Random Parameters

Let us first consider the frequentist estimation of a fluctuating parameter

θ_{0}

with the estimator

θ_{est}

. The mean sensitivity obtained by averaging

{(Δ^{2} θ_{est})}_{μ | θ_{0}}

, Equation (3), over

p (θ_{0})

is

\begin{matrix} {(Δ^{2} θ_{est})}_{μ, θ_{0}} & = & \int_{a}^{b} d θ_{0} {(Δ^{2} θ_{est})}_{μ | θ_{0}} p (θ_{0}) \\ = & \sum_{μ} \int_{a}^{b} d θ_{0} p (μ | θ_{0}) p (θ_{0}) {({〈 θ_{est} 〉}_{μ | θ_{0}} - θ_{est} (μ))}^{2} \\ = & \sum_{μ} \int_{a}^{b} d θ_{0} p (μ, θ_{0}) {({〈 θ_{est} 〉}_{μ | θ_{0}} - θ_{est} (μ))}^{2}, \end{matrix}

(28)

where

μ

and

θ_{0}

are both random variables and we have used

p (μ | θ_{0}) p (θ_{0}) = p (μ, θ_{0})

.

An averaged risk function for the efficiency of the estimator is given by averaging the mean square error (3) over

p (θ_{0})

, leading to

MSE {(θ_{est})}_{μ, θ_{0}} = \int d θ_{0} MSE {(θ_{est})}_{μ | θ_{0}} p (θ_{0}) = \int d θ_{0} \sum_{μ} {(θ_{est} (μ) - θ_{0})}^{2} p (μ, θ_{0}) .

(29)

Analogously to Equation (4), we can write

MSE {(θ_{est})}_{μ, θ_{0}} = {(Δ^{2} θ_{est})}_{μ, θ_{0}} + \int d θ_{0} {({〈 θ_{est} 〉}_{μ | θ_{0}} - θ_{0})}^{2} p (θ_{0}) .

(30)

In the following, we derive lower bounds for both

{(Δ^{2} θ_{est})}_{μ, θ_{0}}

and

MSE {(θ_{est})}_{μ, θ_{0}}

. Notice that bounds on

{(Δ^{2} θ_{est})}_{μ, θ_{0}}

hold also for

MSE {(θ_{est})}_{μ, θ_{0}}

due to

MSE {(θ_{est})}_{μ, θ_{0}} \geq {(Δ^{2} θ_{est})}_{μ, θ_{0}}

. Nevertheless, bounds on the average the mean square error are widely used (and are often called Bayesian bounds [56]) since they can be expressed independently of the bias.

4.2. Bounds on the Mean Square Error

We first consider bounds on

MSE {(θ_{est})}_{μ, θ_{0}}

, Equation (29), for arbitrary estimators.

4.2.1. Van Trees Bound

It is possible to derive a general lower bound on the mean square error (29) based on the following assumptions:

$\frac{\partial p (μ, θ_{0})}{\partial θ_{0}}$ and $\frac{\partial^{2} p (μ, θ_{0})}{\partial θ_{0}^{2}}$ are absolutely integrable with respect to $μ$ and $θ_{0}$ ;
$p (a) ξ (a) - p (b) ξ (b) = 0$ , where $ξ (θ_{0}) = \sum_{μ} (θ_{est} (μ) - θ_{0}) p (μ | θ_{0})$ .

Multiplying

ξ (θ_{0})

by

p (θ_{0})

and differentiating with respect to

θ_{0}

, we have

\frac{\partial p (θ_{0}) ξ (θ_{0})}{\partial θ_{0}} = \sum_{μ} (θ_{est} (μ) - θ_{0}) \frac{\partial p (μ, θ_{0})}{\partial θ_{0}} - p (θ_{0}) .

Integrating over

θ_{0}

in the range of

[a, b]

and considering the above properties, we find

\sum_{μ} \int_{a}^{b} d θ_{0} (θ_{BL} (μ) - θ_{0}) \frac{\partial p (μ, θ_{0})}{\partial θ_{0}} = 1 .

(31)

Finally, using the Cauchy–Schwarz inequality, we arrive at

MSE {(θ_{est})}_{μ, θ_{0}} \geq Δ^{2} θ_{VTB}

, where

Δ^{2} θ_{VTB} = \frac{1}{\sum_{μ} \int_{a}^{b} d θ_{0} \frac{1}{p (μ, θ_{0})} {(\frac{\partial p (μ, θ_{0})}{\partial θ_{0}})}^{2}}

(32)

is generally indicated as Van Trees bound [24,56,57]. The equality holds if and only if

θ_{est} (μ) - θ_{0} = λ \frac{d \log p (μ, θ_{0})}{d θ_{0}},

(33)

where

λ

does not depend on

θ_{0}

and

μ

. It is easy to show that

\sum_{μ} \int_{a}^{b} d θ_{0} \frac{1}{p (μ, θ_{0})} {(\frac{\partial p (μ, θ_{0})}{\partial θ_{0}})}^{2} = m \int_{a}^{b} d θ_{0} p (θ_{0}) F (θ_{0}) + \int_{a}^{b} d θ_{0} \frac{1}{p (θ_{0})} {(\frac{\partial p (θ_{0})}{\partial θ_{0}})}^{2},

(34)

where the first term is the Fisher information

F (θ_{0})

, defined by Equation (10), averaged over

p (θ_{0})

, and the second term can be interpreted as a Fisher information of the prior [24]. Asymptotically in the number of measurements m and for regular distributions

p (θ_{0})

, the first term in Equation (34) dominates over the second one.

4.2.2. Ziv–Zakai Bound

A further bound on

MSE {(θ_{est})}_{μ, θ_{0}}

can be derived by mapping the phase estimation problem to a continuous series of binary hypothesis testing problems. A detailed derivation of the Ziv–Zakai bound [24,58,59] is provided in Appendix B. The final result reads

MSE {(θ_{est})}_{μ, θ_{0}} \geq Δ^{2} θ_{ZZB}

, where

Δ^{2} θ_{ZZB} = \frac{1}{2} \int d h h \int d θ_{0} (p (θ_{0}) + p (θ_{0} + h)) P_{\min} (θ_{0}, θ_{0} + h),

(35)

and

P_{\min} (θ_{0}, θ_{0} + h) = \frac{1}{2} (1 - \sum_{μ} |\frac{p (θ_{0}) p (μ | θ_{0})}{p (θ_{0}) + p (θ_{0} + h)} - \frac{p (θ_{0} + h) p (μ | θ_{0} + h)}{p (θ_{0}) + p (θ_{0} + h)}|)

(36)

is the minimum error probability of the binary hypothesis testing problem. This bound has been adopted for quantum phase estimation in Ref. [26]. To this end, the probability

P_{\min} (θ_{0}, θ_{0} + h)

can be maximized over all possible quantum measurements, which leads to the trace distance [7]. As the optimal measurement may depend on

θ_{0}

and h, the bound (35), which involves integration over all values of

θ_{0}

and h, is usually not saturable. We remark that the trace distance also defines a saturable frequentist bound for a different risk function than the variance [60].

4.3. Bounds on the Average Estimator Variance

We now consider bounds on

{(Δ^{2} θ_{est})}_{μ, θ_{0}}

, Equation (28), for arbitrary estimators.

4.3.1. Average CRLB

Taking the average over

p (θ_{0})

of Equation (7), we obtain a chain of bounds for

{(Δ^{2} θ_{est})}_{μ, θ_{0}}

. In particular, in its simplest form, we have

{(Δ^{2} θ_{est})}_{μ, θ_{0}} \geq Δ^{2} θ_{aCRLB}

, where

Δ^{2} θ_{aCRLB} = \int_{a}^{b} d θ_{0} \frac{{(\frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}})}^{2}}{m F (θ_{0})} p (θ_{0})

(37)

is the average CRLB.

4.3.2. Van Trees Bound for the Average Estimator Variance

We can derive a general lower bound for the variance (28) by following the derivation of the Van Trees bound, which was discussed in Section 4.2.1. In contrast to the standard Van Trees bound for the mean square error, here the bias enters explicitly. Defining

ξ (θ_{0}) = \sum_{μ} (θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}}) p (μ | θ_{0})

and assuming the same requirements as in the derivation of the Van Trees bound for the MSE, we arrive at

\sum_{μ} \int_{a}^{b} d θ_{0} (θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}}) \frac{\partial p (μ, θ_{0})}{\partial θ_{0}} = \int_{a}^{b} d θ_{0} \frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}} p (θ_{0}) .

Finally, a Cauchy–Schwarz inequality gives

{(Δ^{2} θ_{est})}_{μ, θ_{0}} \geq Δ^{2} θ_{fVTB}

, where

Δ^{2} θ_{fVTB} = \frac{{(\int_{a}^{b} d θ_{0} \frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}} p (θ_{0}))}^{2}}{\sum_{μ} \int_{a}^{b} d θ_{0} \frac{1}{p (μ, θ_{0})} {(\frac{\partial p (μ, θ_{0})}{\partial θ_{0}})}^{2}},

(38)

with equality if and only if

θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}} = λ \frac{d \log p (μ, θ_{0})}{d θ_{0}},

(39)

where

λ

is independent of

θ_{0}

and

μ

.

We can compare Equation (38) with the average CRLB Equation (37). We find

\int_{a}^{b} d θ_{0} \frac{{(\frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}})}^{2}}{m F (θ_{0})} p (θ_{0}) \geq \frac{{(\int_{a}^{b} d θ_{0} \frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}} p (θ_{0}))}^{2}}{m \int_{a}^{b} d θ_{0} p (θ_{0}) F (θ_{0})} \geq \frac{{(\int_{a}^{b} d θ_{0} | \frac{d {〈 θ_{est} 〉}_{μ | θ_{0}}}{d θ_{0}} | p (θ_{0}))}^{2}}{\sum_{μ} \int_{a}^{b} d θ_{0} \frac{1}{p (μ, θ_{0})} {(\frac{\partial p (μ, θ_{0})}{\partial θ_{0}})}^{2}},

where in the first step we use Jensen’s inequality, and the second step follows from Equation (34) which implies

m \int_{a}^{b} d θ_{0} p (θ_{0}) F (θ_{0}) \leq \sum_{μ} \int_{a}^{b} d θ_{0} \frac{1}{p (μ, θ_{0})} {(\frac{\partial p (μ, θ_{0})}{\partial θ_{0}})}^{2}

since

\int_{a}^{b} d θ_{0} \frac{1}{p (θ_{0})} {(\frac{d p (θ_{0})}{d θ_{0}})}^{2} \geq 0

.

We thus arrive at

{(Δ^{2} θ_{est})}_{μ, θ_{0}} \geq Δ^{2} θ_{aCRLB} \geq Δ^{2} θ_{fVTB},

(40)

which is valid for generic estimators.

4.4. Bayesian Framework for Random Parameters

The Bayesian posterior variance,

{(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

, Equation (24), averaged over

p (θ_{0})

is

\begin{matrix} {(Δ^{2} θ_{BL})}_{μ, θ, θ_{0}} & = & \int_{a}^{b} d θ_{0} {(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}} p (θ_{0}) \\ = & \sum_{μ} \int_{a}^{b} d θ \int_{a}^{b} d θ_{0} p_{post} (θ | μ) p (μ | θ_{0}) p (θ_{0}) {(θ - θ_{BL} (μ))}^{2} \\ = & \sum_{μ} \int_{a}^{b} d θ p_{post} (θ | μ) p (μ) {(θ - θ_{BL} (μ))}^{2}, \end{matrix}

(41)

where

p (μ) = \int_{a}^{b} d θ_{0} p (μ | θ_{0}) p (θ_{0})

is the average probability to observe

μ

taking into account fluctuations of

θ_{0}

.

A bound on Equation (41) can be obtained by averaging Equation (25) over

p (θ_{0})

, or, equivalently, averaging the Ghosh bound, Equation (22), over

p (μ)

. We obtain the average Ghosh bound for random parameters

θ_{0}

,

{(Δ^{2} θ_{BL})}_{μ, θ, θ_{0}} \geq Δ^{2} θ_{aGBr}

, where

\begin{matrix} Δ^{2} θ_{aGBr} & = & \int_{a}^{b} d θ_{0} \sum_{μ} \frac{{(f (μ, a, b) - 1)}^{2}}{\int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{d p_{post} (θ | μ)}{d θ})}^{2}} p (μ | θ_{0}) p (θ_{0}) \\ = & \sum_{μ} \frac{{(f (μ, a, b) - 1)}^{2}}{\int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{d p_{post} (θ | μ)}{d θ})}^{2}} p (μ) . \end{matrix}

(42)

The bound holds for any prior

p_{pri} (θ)

and is saturated if and only if, for every value of

μ

, there exists a

λ_{μ}

such that Equation (23) holds.

Bayesian Bounds

In Equation (41), the prior used to define the posterior

p_{post} (θ | μ)

via the Bayes–Laplace theorem is arbitrary. In general, such a prior

p_{pri} (θ)

is different from the statistical distribution of

θ_{0}

, which can be unknown. If

p (θ_{0})

is known, then one can use it as a prior in the Bayesian posterior probability, i.e.,

p_{pri} (θ) = p (θ_{0})

. In this specific case, we have

p_{mar} (μ) = p (μ)

, and thus

p_{post} (θ | μ) p (μ) = p_{post} (θ | μ) p_{mar} (μ) = p (μ, θ)

. In other words, for this specific choice of prior, the physical joint probability

p (μ, θ_{0})

of random variables

θ_{0}

and

μ

coincides with the Bayesian

p (μ, θ)

. Equation (41) thus simplifies to

{(Δ^{2} θ_{BL})}_{μ, θ} = \sum_{μ} \int_{a}^{b} d θ p (μ, θ) {(θ - θ_{BL} (μ))}^{2} .

(43)

Notice that this expression is mathematically equivalent to the frequentist average mean square error (29) if we replace

θ

with

θ_{0}

and

θ_{BL} (μ)

with

θ_{est} (μ)

. This means that precision bounds for Equation (29), e.g., the Van Trees and Ziv–Zakai bounds can also be applied to Equation (43). These bounds are indeed often referred to as “Bayesian bounds” (see Ref. [24]).

We emphasize that the average over the marginal distribution

p_{mar} (μ)

, which connects Equations (24) and (43), has operational meaning if we consider that

θ_{0}

is a random variable distributed according to

p (θ_{0})

, and

p (θ)

is used as prior in the Bayes–Laplace theorem to define a posterior distribution. In this case, and under the condition

f (μ, a, b) = 0

(for instance if the prior distribution vanishes at the borders of the phase domain), using Jensen’s inequality, we find

\begin{matrix} Δ^{2} θ_{aGBr} & = & \sum_{μ} \frac{p (μ)}{\int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{d p_{post} (θ | μ)}{d θ})}^{2}} \\ \geq & \frac{1}{\sum_{μ} p (μ) \int_{a}^{b} d θ \frac{1}{p_{post} (θ | μ)} {(\frac{d p_{post} (θ | μ)}{d θ})}^{2}} \\ = & \frac{1}{\sum_{μ} \int_{a}^{b} d θ \frac{1}{p (θ, μ)} {(\frac{\partial p (θ, μ)}{\partial θ})}^{2}}, \end{matrix}

(44)

which coincides with the Van Trees bound discussed above. We thus find that the averaged Ghosh bound for random parameters (42) is sharper than the Van Trees bound (38):

{(Δ^{2} θ_{BL})}_{μ, θ} \geq Δ^{2} θ_{aGBr} \geq Δ^{2} θ_{VTB},

(45)

which is also confirmed by the numerical data shown in Figure 4.

In Figure 4, we compare

{(Δ^{2} θ_{BL})}_{μ, θ}

with the various bounds discussed in this section. As

p (θ_{0}),

we consider the same prior (26) used in Figure 3. We observe that all bounds approach the Van Trees bound with increasing sharpness of the prior distribution. Asymptotically in the number of measurements m, all bounds converge to the Cramér–Rao bound.

5. Discussion and Conclusions

In this manuscript, we have clarified the differences between frequentist and Bayesian approaches to phase estimation. The two paradigms provide statistical results that have a different conceptual meaning and cannot be compared. We have also reviewed and discussed phase sensitivity bounds in the frequentist and Bayesian frameworks, when the true value of the phase shift

θ_{0}

is fixed or fluctuates. These bounds are summarized in Table 1.

In the frequentist approach, for a fixed

θ_{0}

, the phase sensitivity is determined from the width of the probability distribution of the estimator. The physical content of the distribution is that, when repeating the estimation protocol, the obtained

θ_{est} (μ)

will fall, with a certain confidence, in an interval around the mean value

{〈 θ_{est} 〉}_{μ | θ_{0}}

(e.g.,

68 %

of the times within a

2 {(Δ θ_{est})}_{μ | θ_{0}}

interval for a Gaussian distribution) that, for unbiased estimators, coincides with the true value of the phase shift.

In the Bayesian case, the posterior

p_{post} (θ | μ)

provides a degree of plausibility that the phase shift

θ

equals the interferometer phase

θ_{0}

when the data

μ

was obtained. This allows the Bayesian approach to provide statistical information for any number of measurements, even a single one. To be sure, this is not a sign of failure or superiority of one approach with respect to the other one, since the two frameworks manipulate conceptually different quantities. The experimentalist can choose to use one or both approaches, keeping in mind the necessity to clearly state the nature of the statistical significance of the reported results.

The two predictions converge asymptotically in the limit of a large number of measurements. This does not mean that in this limit the significance of the two approaches is interchangeable (it cannot be stated that in the limit of large repetition of the measurements, frequentist ad Bayesian provide the same results). In this respect, it is quite instructive to notice that the Bayesian

2 σ

confidence may be below that of the Cramér–Rao bound, as shown in Figure 3. This, at first sight, seems paradoxical, since the CRLB is a theorem about the minimum error achievable in parameter estimation theory. However, the CRLB is a frequentist bound and, again, the paradox is solved taking it into account that the frequentist and the Bayesian approaches provide information about different quantities.

Finally, a different class of estimation problems with different precision bounds is encountered if

θ_{0}

is itself a random variable. In this case, the frequentist bounds for the mean-square error (Van Trees, Ziv–Zakai) become independent of the bias, while those on the estimator variance are still functions of the bias. The Van Trees and Ziv–Zakai bounds can be applied to the Bayesian paradigm if the average of the posterior variance over the marginal distribution is the relevant risk function. This is only meaningful if the prior

p_{pri} (θ)

that enters the Bayes–Laplace theorem coincides with the actual distribution

p (θ_{0})

of the phase shift

θ_{0}

.

We conclude with a remark regarding the so-called Heisenberg limit, which is a saturable lower bound on the CRLB over arbitrary quantum states with a fixed number of particles. For instance, for a collection of N two-level systems, the CRLB can be further bounded by

Δ θ_{est} \geq 1 / \sqrt{m F (θ_{0})} \geq 1 / (\sqrt{m} N)

[18,20]. This bound is often called the ultimate precision bound since no quantum state is able to achieve a tighter scaling than N. From the discussions presented in this article, it becomes apparent that Bayesian approaches (as discussed in Section 3) or precision bounds for random parameters (Section 4) are expected to lead to entirely different types of ‘ultimate’ lower bounds. Such bounds are interesting within the respective paradigm for which they are derived, but they cannot replace or improve the Heisenberg limit since they address fundamentally different scenarios that cannot be compared in general.

Author Contributions

Y.L., L.P., M.G., W.L. and A.S. conceived the study, performed theoretical calculations and drafted the article. All authors have read and approved the final manuscript.

Acknowledgments

This work was supported by the National Key R & D Program of China (No. 2017YFA0304500 and No. 2017YFA0304203), the National Natural Science Foundation of China (Grant No. 11874247), the 111 plan of China (No. D18001), the Hundred Talent Program of the Shanxi Province (2018), the Program of State Key Laboratory of Quantum Optics and Quantum Optics Devices (No. KF201703), and the QuantEra project Q-Clocks. M.G. acknowledges support by the Alexander von Humboldt Foundation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of the Barankin Bound

Let

θ_{est}

be an arbitrary estimator for

θ

. Its mean value

{〈 θ_{est} 〉}_{μ | θ} = \sum_{μ} θ_{est} (μ) p (μ | θ)

(A1)

coincides with

θ

if and only if the estimator is unbiased (for arbitrary values of

θ

). In the following, we make no assumption about the bias of

θ_{est}

and therefore do not replace

{〈 θ_{est} 〉}_{μ | θ}

by

θ

.

Introducing the likelihood ratio

L (μ | θ_{i}, θ_{0}) = \frac{p (μ | θ_{i})}{p (μ | θ_{0})}

(A2)

under the condition

p (μ | θ_{0}) > 0

for all

μ

, we obtain with Equation (A1) that

\sum_{μ} θ_{est} (μ) L (μ | θ_{i}, θ_{0}) p (μ | θ_{0}) = {〈 θ_{est} 〉}_{μ | θ_{i}},

(A3)

for an arbitrary family of phase values

θ_{1}, \dots, θ_{n}

picked from the parameter domain. Furthermore, we have

\sum_{μ} L (μ | θ_{i}, θ_{0}) p (μ | θ_{0}) = \sum_{μ} p (μ | θ_{i}) = 1

(A4)

for all

θ_{i}

. Multiplying both sides of Equation (A4) with

{〈 θ_{est} 〉}_{μ | θ_{0}}

and subtracting it from (A3) yields

\sum_{μ} (θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}}) L (μ | θ_{i}, θ_{0}) p (μ | θ_{0}) = {〈 θ_{est} 〉}_{μ | θ_{i}} - {〈 θ_{est} 〉}_{μ | θ_{0}} .

(A5)

Let us now pick a family of n finite coefficients

a_{1}, \dots, a_{n}

. From Equation (A5), we obtain

\sum_{μ} (θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}}) (\sum_{i = 1}^{n} a_{i} L (μ | θ_{i}, θ_{0})) p (μ | θ_{0}) = \sum_{i = 1}^{n} a_{i} ({〈 θ_{est} 〉}_{μ | θ_{i}} - {〈 θ_{est} 〉}_{μ | θ_{0}}) .

(A6)

The Cauchy–Schwarz inequality now yields

{(\sum_{i = 1}^{n} a_{i} ({〈 θ_{est} 〉}_{μ | θ_{i}} - {〈 θ_{est} 〉}_{μ | θ_{0}}))}^{2} \leq {(Δ^{2} θ_{est})}_{μ | θ_{0}} (\sum_{μ} {(\sum_{i = 1}^{n} a_{i} L (μ | θ_{i}, θ_{0}))}^{2} p (μ | θ_{0})),

(A7)

where

{(Δ^{2} θ_{est})}_{μ | θ_{0}} = \sum_{μ} {(θ_{est} (μ) - {〈 θ_{est} 〉}_{μ | θ_{0}})}^{2} p (μ | θ_{0})

(A8)

is the variance of the estimator

θ_{est}

. We thus obtain

{(Δ^{2} θ_{est})}_{μ | θ_{0}} \geq \frac{{(\sum_{i = 1}^{n} a_{i} ({〈 θ_{est} 〉}_{μ | θ_{i}} - {〈 θ_{est} 〉}_{μ | θ_{0}}))}^{2}}{\sum_{μ} {(\sum_{i = 1}^{n} a_{i} L (μ | θ_{i}, θ_{0}))}^{2} p (μ | θ_{0})},

(A9)

for all n,

a_{i}

, and

θ_{i}

. The Barankin bound then follows by taking the supremum over these variables.

Appendix B. Derivation of the Ziv–Zakai Bound

Derivations of the Ziv–Zakai bound can be found in the literature (see, for instance, Refs. [24,58,59]). This Appendix follows these derivations closely and provides additional background, which may be useful for readers less familiar with the field of hypothesis testing.

Let

X \in [0, a]

be a random variable with probability density

p (x)

. We can formally write

p (x) = - d P (X \geq x) / d x

, where

P (X \geq x) \equiv \int_{x}^{a} p (y) d y

is the probability that X is larger or equal than x. We obtain from integration by parts

\begin{matrix} 〈 X^{2} 〉 = \int_{0}^{a} x^{2} p (x) d x & = & - {[x^{2} P (X \geq x)]}_{0}^{a} + 2 \int_{0}^{a} P (X \geq x) x d x \\ = & 2 \int_{0}^{a} P (X \geq x) x d x \\ = & \frac{1}{2} \int_{0}^{2 a} P (X \geq \frac{h}{2}) h d h, \end{matrix}

(A10)

where we assume that a is finite [if

a \to \infty

the above relation holds when

\lim_{a \to \infty} a^{2} P (X \geq a) = 0

]. Finally, we can formally extend the above integral up to ∞ since

P (X \geq a) = 0

:

〈 X^{2} 〉 = \frac{1}{2} \int_{0}^{\infty} P (X \geq \frac{h}{2}) h d h .

(A11)

Following Ref. [59], we now take

ϵ = θ_{est} (μ) - θ_{0}

and

X = | ϵ |

. We thus have

MSE {(θ_{est})}_{μ, θ_{0}} = {〈 | ϵ |}^{2} 〉 = \frac{1}{2} \int_{0}^{\infty} P (| ϵ | \geq \frac{h}{2}) h d h .

(A12)

We express the probability as

\begin{matrix} P (| ϵ | \geq \frac{h}{2}) & = & P (ϵ > \frac{h}{2}) + P (ϵ \leq - \frac{h}{2}) \\ = & P (θ_{est} (μ) - θ_{0} > \frac{h}{2}) + P (θ_{est} (μ) - θ_{0} \leq - \frac{h}{2}) \\ = & \int P (θ_{est} (μ) - θ_{0} > \frac{h}{2} | θ_{0}) p (θ_{0}) d θ_{0} + \int P (θ_{est} (μ) - θ_{0} \leq - \frac{h}{2} | θ_{0}) p (θ_{0}) d θ_{0} . \end{matrix}

Next, we replace

θ_{0}

with

θ_{0} + h

in the second integral:

\begin{matrix} P (| ϵ | \geq \frac{h}{2}) & = & \int P (θ_{est} (x) - θ_{0} > \frac{h}{2} | θ_{0}) p (θ_{0}) d θ_{0} + \int P (θ_{est} (x) - θ_{0} \leq \frac{h}{2} | θ_{0} + h) p (θ_{0} + h) d θ_{0} \\ = & \int (p (φ) + p (φ + h)) [\frac{p (φ)}{p (φ) + p (φ + h)} P (θ_{est} (x) - φ > \frac{h}{2} | θ_{0} = φ) + \\ + \frac{p (φ + h)}{p (φ) + p (φ + h)} P (θ_{est} (x) - φ \leq \frac{h}{2} | θ_{0} = φ + h)] d φ . \end{matrix}

We now take a closer look at the expression within the angular brackets and interpret it in the framework of hypothesis testing. Suppose that we try to discriminate between the two cases

θ_{0} = φ

(hypothesis 1, denoted

H_{1}

) and

θ_{0} = φ + h

(denoted

H_{2}

). We decide between the two hypothesis

H_{1}

and

H_{2}

on the basis of the measurement result x using the estimator

θ_{est} (x)

. One possible strategy consists in choosing the hypothesis whose value is closest to the obtained estimator. Hence, if

θ_{est} (x) \leq φ + h / 2

, we assume

H_{1}

to be correct and, otherwise, if

θ_{est} (x) > φ + h / 2

, we pick

H_{2}

.

Let us now determine the probability to make an erroneous decision using this strategy. There are two scenarios that will lead to a mistake. First, our strategy fails whenever

θ_{est} (x) \leq φ + h / 2

when

θ_{0} = φ + h

. In this case,

H_{2}

is true, but our strategy leads us to choose

H_{1}

. The probability for this to happen, given that

θ_{0} = φ + h

, is

P (θ_{est} (x) - φ \leq \frac{h}{2} | θ_{0} = φ + h)

. To obtain the probability error of our strategy, we need to multiply this with the probability with which

θ_{0}

assumes the value

φ + h

, which is given by

p (H_{2}) = \frac{p (φ + h)}{p (φ) + p (φ + h)}

. Second, our strategy also fails if

θ_{est} (x) > φ + h / 2

for

θ_{0} = φ

. This occurs with the conditional probability

P (θ_{est} (x) - φ > \frac{h}{2} | θ_{0} = φ)

, and

θ_{0} = φ

with probability

p (H_{1}) = \frac{p (φ)}{p (φ) + p (φ + h)}

. The total probability to make a mistake is consequently given by

\begin{matrix} P_{err} (φ, φ + h) & = & P (θ_{est} (x) - φ > \frac{h}{2} | H_{1}) p (H_{1}) + P (θ_{est} (x) - φ \leq \frac{h}{2} | H_{2}) p (H_{2}) \\ = & \frac{p (φ)}{p (φ) + p (φ + h)} P (θ_{est} (x) - φ > \frac{h}{2} | θ_{0} = φ) + \\ + \frac{p (φ + h)}{p (φ) + p (φ + h)} P (θ_{est} (x) - φ \leq \frac{h}{2} | θ_{0} = φ + h), \end{matrix}

(A13)

and we can rewrite Equation (A13) as

P (| ϵ | \geq \frac{h}{2}) = \int_{- \infty}^{\infty} (p (φ) + p (φ + h)) P_{err} (φ, φ + h) d φ .

(A14)

The strategy described above depends on the estimator

θ_{est}

and may not be optimal. In general, a binary hypothesis testing strategy can be characterized in terms of the separation of the possible values of x into the two disjoint subsets

X_{1}

and

X_{2}

which are used to choose hypothesis

H_{1}

or

H_{2}

, respectively. That is, if

x \in X_{1}

we pick

H_{1}

and otherwise

H_{2}

. Since one of the two hypotheses must be true, we have

\begin{matrix} 1 & = & p (H_{1}) + p (H_{2}) \\ = & \int_{X_{1}} d x p (x | H_{1}) p (H_{1}) + \int_{X_{2}} d x p (x | H_{1}) p (H_{1}) + \int_{X_{1}} d x p (x | H_{2}) p (H_{2}) + \int_{X_{2}} d x p (x | H_{2}) p (H_{2}) \\ = & \int_{X_{1}} d x p (x | H_{1}) p (H_{1}) + \int_{X_{2}} d x p (x | H_{2}) p (H_{2}) + P_{err}^{X_{1}} (H_{1}, H_{2}), \end{matrix}

(A15)

where the error made by such a strategy is given by

\begin{matrix} P_{err}^{X_{1}} (H_{1}, H_{2}) & = & P (x \in X_{2} | H_{1}) p (H_{1}) + P (x \in X_{1} | H_{2}) p (H_{2}) \\ = & \int_{X_{2}} p (x | H_{1}) p (H_{1}) d x + \int_{X_{1}} p (x | H_{2}) p (H_{2}) d x \\ = & p (H_{1}) + \int_{X_{1}} [p (x | H_{2}) p (H_{2}) - p (x | H_{1}) p (H_{1})] d x . \end{matrix}

(A16)

This probability is minimized if

p (x | H_{2}) p (H_{2}) < p (x | H_{1}) p (H_{1})

for

x \in X_{1}

and, consequently,

p (x | H_{2}) p (H_{2}) \geq p (x | H_{1}) p (H_{1})

for

x \in X_{2}

. This actually identifies an optimal strategy for hypothesis testing, known as the likelihood ratio test: if the likelihood ratio

p (x | H_{1}) / p (x | H_{2})

is larger than the threshold value

p (H_{2}) / p (H_{1}),

we pick

H_{1}

, whereas, if it is smaller, we pick

H_{2}

. With this choice, the error probability is minimal and reads

\begin{matrix} P_{\min} (H_{1}, H_{2}) & = & \int_{X_{2}} [p (x | H_{1}) p (H_{1}) - p (x | H_{2}) p (H_{2})] d x + \int_{X_{1}} [p (x | H_{2}) p (H_{2}) - p (x | H_{1}) p (H_{1})] d x + \\ + \int_{X_{1}} p (x | H_{1}) p (H_{1}) d x + \int_{X_{2}} p (x | H_{2}) p (H_{2}) d x \\ = & \frac{1}{2} - \frac{1}{2} \int |p (x | H_{1}) p (H_{1}) - p (x | H_{2}) p (H_{2})| d x, \end{matrix}

(A17)

where we used Equation (A15).

Applied to our case, we obtain

P_{\min} (φ, φ + h) = \frac{1}{2} (1 - \sum_{μ} |\frac{p (μ | θ_{0} = φ) p (φ)}{p (φ) + p (φ + h)} - \frac{p (μ | θ_{0} = φ + h) p (φ + h)}{p (φ) + p (φ + h)}|) .

(A18)

This result represents a lower bound on

P_{err}^{X_{1}} (φ, φ + h)

for arbitrary choices of

X_{1}

. This includes the case discussed in Equation (A13). Thus, using

P_{err} (φ, φ + h) \geq P_{\min} (φ, φ + h)

(A19)

in Equation (A14) and inserting back into Equation (A12), we finally obtain the Ziv–Zakai bound for the mean square error:

MSE {(θ_{est})}_{μ, θ_{0}} \geq \frac{1}{2} \int_{0}^{\infty} h d h \int d θ_{0} (p (θ_{0}) + p (θ_{0} + h)) P_{\min} (θ_{0}, θ_{0} + h) .

(A20)

This bound can be further sharpened by introducing a valley-filling function [61], which is not considered here.

References

Zehnder, L. Ein neuer Interferenzrefraktor. Zeitschrift für Instrumentenkunde 1891, 11, 275. (In German) [Google Scholar]
Mach, L. Ueber einen Interferenzrefraktor. Zeitschrift für Instrumentenkunde 1892, 12, 89. (In German) [Google Scholar]
Ramsey, N.F. Molecular Beams; Oxford University Press: London, UK, 1963. [Google Scholar]
Wynands, R. Atomic Clocks. In Lecture Notes in Physics; Muga, G., Ruschhaupt, A., Campo, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 789. [Google Scholar]
Barish, B.C.; Weiss, R. LIGO and the Detection of Gravitational Waves. Phys. Today 1999, 52, 44–50. [Google Scholar] [CrossRef]
Pitkin, M.; Reid, S.; Rowan, S.; Hough, J. Gravitational Wave Detection by Interferometry (Ground and Space). Living Rev. Relativ. 2011, 14, 5. [Google Scholar] [CrossRef] [PubMed]
Helstrom, C.W. Quantum detection and estimation theory. J. Stat. Phys. 1969, 1, 231. [Google Scholar] [CrossRef]
Holevo, A.S. Probabilistic and Statistical Aspects of Quantum Theory; North-Holland Publishing Company: Amsterdam, The Netherlands, 1982. [Google Scholar]
Ludlow, A.D.; Boyd, M.M.; Ye, J.; Peik, E.; Schmidt, P.O. Optical atomic clocks. Rev. Mod. Phys. 2015, 87, 637–701. [Google Scholar] [CrossRef] [Green Version]
Schnabel, R.; Mavalvala, N.; McClelland, D.E.; Lam, P.K. Quantum metrology for gravitational wave astronomy. Nat. Commun. 2010, 1, 121. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Aasi, J.; Abadie, J.; Abbott, B.P.; Abbott, R.; Abbott, T.D.; Abernathy, M.R.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R.X.; et al. Enhanced sensitivity of the LIGO gravitational wave detector by using squeezed states of light. Nat. Photon. 2010, 7, 613–619. [Google Scholar] [CrossRef]
Cronin, A.D.; Schmiedmayer, J.; Pritchard, D.E. Optics and interferometry with atoms and molecules. Rev. Mod. Phys. 2009, 81, 1051–1129. [Google Scholar] [CrossRef] [Green Version]
Caves, C.M. Quantum-mechanical noise in an interferometer. Phys. Rev. D 1981, 23, 1693–1708. [Google Scholar] [CrossRef]
Giovannetti, V.; Lloyd, S.; Maccone, L. Quantum metrology. Phys. Rev. Lett. 2006, 96, 010401. [Google Scholar] [CrossRef] [PubMed]
Pezzè, L.; Smerzi, A. Entanglement, nonlinear dynamics, and the Heisenberg limit. Phys. Rev. Lett. 2009, 102, 100401. [Google Scholar] [CrossRef] [PubMed]
Hyllus, P.; Laskowski, W.; Krischek, R.; Schwemmer, C.; Wieczorek, W.; Weinfurter, H.; Pezzè, L.; Smerzi, A. Fisher information and multiparticle entanglement. Phys. Rev. A 2012, 85, 022321. [Google Scholar] [CrossRef]
Tóth, G. Multipartite entanglement and high-precision metrology. Phys. Rev. A 2012, 85, 022322. [Google Scholar] [CrossRef]
Pezzè, L.; Smerzi, A. Quantum theory of phase estimation. In Atom Interferometry, Proceedings of the International School of Physics "Enrico Fermi", Italy, 15–20 July 2013; Tino, G.M., Kasevich, M.A., Eds.; IOS Press: Sesto Fiorentino, Italy, 2014; Course 188, 691. [Google Scholar]
Tóth, G.; Apellaniz, I. Quantum metrology from a quantum information science perspective. J. Phys. A Math. Theor. 2014, 47, 424006. [Google Scholar] [CrossRef] [Green Version]
Giovannetti, V.; Lloyd, S.; Maccone, L. Advances in quantum metrology. Nat. Photon. 2011, 5, 222–229. [Google Scholar] [CrossRef] [Green Version]
Pezzè, L.; Smerzi, A.; Oberthaler, M.K.; Schimed, R.; Treutlein, P. Quantum metrology with nonclassical states of atomic ensembles. Rev. Mod. Phys. 2018, in press. [Google Scholar]
Kay, S.M. Fundamentals of Statistical Signal Processing: Estimation Theory, Volume I; Prentice Hall: Upper Saddle River, NJ, USA, 1993. [Google Scholar]
Lehmann, E.L.; Casella, G. Theory of Point Estimation; Springer: Berlin, Germany, 1998. [Google Scholar]
Van Trees, H.L.; Bell, K.L. Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking; Wiley: New York, NY, USA, 2007. [Google Scholar]
Lane, A.S.; Braunstein, S.L.; Caves, C.M. Maximum-likelihood statistics of multiple quantum phase measurements. Phys. Rev. A 1993, 47, 1667. [Google Scholar] [CrossRef] [PubMed]
Tsang, M. Ziv–Zakai error bounds for quantum parameter estimation. Phys. Rev. Lett. 2012, 108, 230401. [Google Scholar] [CrossRef] [PubMed]
Lu, X.M.; Tsang, M. Quantum Weiss-Weinstein bounds for quantum metrology. Quantum Sci. Technol. 2016, 1, 015002. [Google Scholar] [CrossRef] [Green Version]
Hall, M.J.; Wiseman, H.M. Heisenberg-style bounds for arbitrary estimates of shift parameters including prior information. New J. Phys. 2012, 14, 033040. [Google Scholar] [CrossRef]
Giovannetti, V.; Maccone, L. Sub-Heisenberg estimation strategies are ineffective. Phys. Rev. Lett. 2012, 108, 210404. [Google Scholar] [CrossRef] [PubMed]
Pezzè, L. Sub-Heisenberg phase uncertainties. Phys. Rev. A 2013, 88, 060101(R). [Google Scholar] [CrossRef]
Pezzè, L.; Hyllus, P.; Smerzi, A. Phase-sensitivity bounds for two-mode interferometers. Phys. Rev. A 2015, 91, 032103. [Google Scholar] [CrossRef]
Hradil, Z.; Myška, R.; Peřina, J.; Zawisky, M.; Hasegawa, Y.; Rauch, H. Quantum phase in interferometry. Phys. Rev. Lett. 1996, 76, 4295. [Google Scholar] [CrossRef] [PubMed]
Pezzè, L.; Smerzi, A.; Khoury, G.; Hodelin, J.F.; Bouwmeester, D. Phase detection at the quantum limit with multiphoton mach-zehnder interferometry. Phys. Rev. Lett. 2007, 99, 223602. [Google Scholar] [CrossRef] [PubMed]
Kacprowicz, M.; Demkowicz-Dobrzanski, R.; Wasilewski, W.; Banaszek, K.; Walmsley, I.A. Experimental quantum-enhanced estimation of a lossy phase shift. Nat. Photon. 2010, 4, 357. [Google Scholar] [CrossRef]
Krischek, R.; Schwemmer, C.; Wieczorek, W.; Weinfurter, H.; Hyllus, P.; Pezzè, L.; Smerzi, A. Useful multiparticle entanglement and sub-shot-noise sensitivity in experimental phase estimation. Phys. Rev. Lett. 2011, 107, 080504. [Google Scholar] [CrossRef] [PubMed]
Xiang, G.Y.; Higgins, B.L.; Berry, D.W.; Wiseman, H.M.; Pryde, G.J. Entanglement-enhanced measurement of a completely unknown optical phase. Nat. Photon. 2011, 5, 43–47. [Google Scholar] [CrossRef]
Bollinger, J.J.; Itano, W.M.; Wineland, D.J.; Heinzen, D.J. Optimal frequency measurements with maximally correlated states. Phys. Rev. A 1996, 54, R4649–R4652. [Google Scholar] [CrossRef] [PubMed]
Pezzè, L.; Smerzi, A. Sub shot-noise interferometric phase sensitivity with beryllium ions Schrödinger cat states. Europhys. Lett. 2007, 78, 30004. [Google Scholar] [CrossRef] [Green Version]
Gerry, C.C.; Mimih, J. The parity operator in quantum optical metrology. Contemp. Phys. 2010, 51, 497. [Google Scholar] [CrossRef]
Sackett, C.A.; Kielpinski, D.; King, B.E.; Langer, C.; Meyer, V.; Myatt, C.J.; Rowe, M.; Turchette, Q.A.; Itano, W.M.; et al. Experimental entanglement of four particles. Nature 2000, 404, 256–259. [Google Scholar] [CrossRef] [PubMed]
Monz, T.; Schindler, P.; Barreiro, J.T.; Chwalla, M.; Nigg, D.; Coish, W.; Harlander, M.; Hänsel, W.; Hennrich, M.; Blatt, R. 14-Qubit Entanglement: Creation and Coherence. Phys. Rev. Lett. 2011, 106, 130506. [Google Scholar] [CrossRef] [PubMed]
Hayashi, M. Asymptotic Theory of Quantum Statistical Inference, Selected Papers; World Scientific Publishing: Singapore, 2005. [Google Scholar]
Barankin, E.W. Locally best unbiased estimates. Ann. Math. Stat. 1949, 20, 477. [Google Scholar] [CrossRef]
Mcaulay, R.J.; Hofstetter, E.M. Barankin bounds on parameter estimation. IEEE Trans. Inf. Theory 1971, 17, 669–676. [Google Scholar] [CrossRef]
Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946. [Google Scholar]
Rao, C.R. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. Bull. Calcutta Math. Soc. 1971, 37, 81–91. [Google Scholar]
Hammersley, J.M. On estimating restricted parameters. J. R. Stat. Soc. Ser. B 1950, 12, 192. [Google Scholar]
Chapman, D.G.; Robbins, H. Minimum variance estimation without regularity assumptions. Ann. Math. Stat. 1951, 22, 581. [Google Scholar] [CrossRef]
Pflanzagl, J.; Hamböker, R. Parametric Statistical Theory; De Gruyter: Berlin, Germany, 1994. [Google Scholar]
Sivia, D.S.; Skilling, J. Data Analysis: A Bayesian Tutorial; Oxford University Press: London, UK, 2006. [Google Scholar]
Robert, C.P. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation; Springer: New York, NY, USA, 2007. [Google Scholar]
Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A 1946, 186, 453. [Google Scholar] [CrossRef]
Jeffreys, H. Theory of Probability; Oxford University Press: London, UK, 1961. [Google Scholar]
Ghosh, M. Cramér–Rao bounds for posterior variances. Stat. Probabil. Lett. 1993, 17, 173. [Google Scholar] [CrossRef]
Cam, L.L. Asymptotic Methods in Statistical Decision Theory; Springer: New York, NY, USA, 1986. [Google Scholar]
Van Trees, H.L. Detection, Estimation, and Modulation Theory, Part I; Wiley: New York, NY, USA, 1968. [Google Scholar]
Schutzenberger, M.P. A generalization of the Fréchet-Cramér inequality to the case of Bayes estimation. Bull. Am. Math. Soc. 1957, 63, 142. [Google Scholar]
Ziv, J.; Zakai, M. Some lower bounds on signal parameter estimation. IEEE Trans. Inform. Theor. 1969, 15, 386–391. [Google Scholar] [CrossRef]
Bell, K.L.; Steinberg, Y.; Ephraim, Y.; Van Trees, H.L. Extended Ziv–Zakai lower bound for vector parameter estimation. IEEE Trans. Inf. Theor. 1997, 43, 624–637. [Google Scholar] [CrossRef]
Gessner, M.; Smerzi, A. Statistical speed of quantum states: Generalized quantum Fisher information and Schatten speed. Phys. Rev. A 2018, 97, 022109. [Google Scholar] [CrossRef] [Green Version]
Bellini, S.; Tartara, G. Bounds on error in signal parameter estimation. IEEE Trans. Commun. 1974, 22, 340–342. [Google Scholar] [CrossRef]

Figure 1. (a) Bias

{〈 θ_{MLE} 〉}_{μ | θ_{0}} - θ_{0}

(green dots) as function of m with error bars

{(Δ θ_{MLE})}_{μ | θ_{0}}

. The red lines are

\pm Δ θ_{CRLB} = \pm | d {〈 θ_{MLE} 〉}_{μ | θ_{0}} / d θ_{0} | / \sqrt{m F (θ_{0})}

; (b) variance of the maximum likelihood estimator multiplied by the Fisher information,

m F (θ_{0}) {(Δ^{2} θ_{MLE})}_{μ | θ_{0}}

(red circles), as a function of the sample size m. It is compared to the bias

{(d {〈 θ_{MLE} 〉}_{μ | θ_{0}} / d θ_{0})}^{2}

(red dashed line). We recall that

θ_{0} = π / 4

and

F (θ_{0}) = 4

here.

Figure 1. (a) Bias

{〈 θ_{MLE} 〉}_{μ | θ_{0}} - θ_{0}

(green dots) as function of m with error bars

{(Δ θ_{MLE})}_{μ | θ_{0}}

. The red lines are

\pm Δ θ_{CRLB} = \pm | d {〈 θ_{MLE} 〉}_{μ | θ_{0}} / d θ_{0} | / \sqrt{m F (θ_{0})}

; (b) variance of the maximum likelihood estimator multiplied by the Fisher information,

m F (θ_{0}) {(Δ^{2} θ_{MLE})}_{μ | θ_{0}}

(red circles), as a function of the sample size m. It is compared to the bias

{(d {〈 θ_{MLE} 〉}_{μ | θ_{0}} / d θ_{0})}^{2}

(red dashed line). We recall that

θ_{0} = π / 4

and

F (θ_{0}) = 4

here.

Figure 2. (a) comparison between unbiased frequentist bounds for the example considered in this manuscript, Equation (1): the CRLB

m Δ^{2} θ_{CRLB}^{ub} = 1 / F (θ_{0})

(black line), the Hammersley–Chapman–Robbins bound

m Δ^{2} θ_{ChRB}^{ub}

(Equation (15), filled triangles) and the extended Hammersley–Chapman–Robbins bound

m Δ^{2} θ_{EChRB}^{ub}

(Equation (18), empty triangles); (b) values of

λ

achieving the supremum in Equation (15), as a function of m.

Figure 2. (a) comparison between unbiased frequentist bounds for the example considered in this manuscript, Equation (1): the CRLB

m Δ^{2} θ_{CRLB}^{ub} = 1 / F (θ_{0})

(black line), the Hammersley–Chapman–Robbins bound

m Δ^{2} θ_{ChRB}^{ub}

(Equation (15), filled triangles) and the extended Hammersley–Chapman–Robbins bound

m Δ^{2} θ_{EChRB}^{ub}

(Equation (18), empty triangles); (b) values of

λ

achieving the supremum in Equation (15), as a function of m.

Figure 3. Comparisons of phase estimation variance as a function of the sample size for Bayesian and frequentist data analysis under different prior distributions, (a)

α = - 100

, (b)

α = - 10

, (c)

α = 1

, (d)

α = 10

. In all figures, Red circles (frequentist) are

m {(Δ^{2} θ_{BL})}_{μ | θ_{0}}

, the red dashed line is the Cramér-Rao lower bound

m Δ^{2} θ_{CRLB}

, Equation (8). Blue circles (Bayesian) are

m {(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

, the blue solid line is the likelihood-averaged Ghosh bound

m Δ^{2} θ_{aGB}

, Equation (25). The inset in each panel is

p_{pri} (θ)

, Equation (26), for the corresponding values of

α

.

Figure 3. Comparisons of phase estimation variance as a function of the sample size for Bayesian and frequentist data analysis under different prior distributions, (a)

α = - 100

, (b)

α = - 10

, (c)

α = 1

, (d)

α = 10

. In all figures, Red circles (frequentist) are

m {(Δ^{2} θ_{BL})}_{μ | θ_{0}}

, the red dashed line is the Cramér-Rao lower bound

m Δ^{2} θ_{CRLB}

, Equation (8). Blue circles (Bayesian) are

m {(Δ^{2} θ_{BL})}_{μ, θ | θ_{0}}

, the blue solid line is the likelihood-averaged Ghosh bound

m Δ^{2} θ_{aGB}

, Equation (25). The inset in each panel is

p_{pri} (θ)

, Equation (26), for the corresponding values of

α

.

Figure 4. Comparisons of average posterior Bayesian variance,

m {(Δ^{2} θ_{BL})}_{μ, θ}

(dots), as a function of the sample size m under different prior distributions, (a)

α = - 100

, (b)

α = - 10

, (c)

α = 1

, (d)

α = 10

. This variance is compared to to the average Ghosh bound for random parameters

m (Δ^{2} θ_{aGBr})

(grey line), the Van Trees bound

m (Δ^{2} θ_{VTB})

(green line), the Ziv–Zakai bound

m (Δ^{2} θ_{ZZB})

(red line) and

1 / F (θ_{0})

(black horizontal line). The inset in each panel is the prior

p_{pri} (θ)

, Equation (26), for the corresponding values of

α

.

Figure 4. Comparisons of average posterior Bayesian variance,

m {(Δ^{2} θ_{BL})}_{μ, θ}

(dots), as a function of the sample size m under different prior distributions, (a)

α = - 100

, (b)

α = - 10

, (c)

α = 1

, (d)

α = 10

. This variance is compared to to the average Ghosh bound for random parameters

m (Δ^{2} θ_{aGBr})

(grey line), the Van Trees bound

m (Δ^{2} θ_{VTB})

(green line), the Ziv–Zakai bound

m (Δ^{2} θ_{ZZB})

(red line) and

1 / F (θ_{0})

(black horizontal line). The inset in each panel is the prior

p_{pri} (θ)

, Equation (26), for the corresponding values of

α

.

Table 1. Frequentist vs Bayesian bounds for fixed and random parameters.

	Paradigm	Risk Function	Bounds		Remarks
$θ_{0}$ fixed	Frequentist	${(Δ^{2} θ_{est})}_{μ \| θ_{0}}$	BB	Equation (5)	hierarchy of bounds, Equation (7)
		${(Δ^{2} θ_{est})}_{μ \| θ_{0}}$	EChRB	Equation (17)
		$MSE {(θ_{est})}_{μ \| θ_{0}}$	ChRB	Equation (14)
		$MSE {(θ_{est})}_{μ \| θ_{0}}$	CRLB	Equation (8)
	Bayesian	${(Δ^{2} θ_{BL})}_{μ \| θ_{0}}$	GB	Equation (22)	function of $μ$
	Bayesian	${(Δ^{2} θ_{BL})}_{μ, θ \| θ_{0}}$	aGB	Equation (25)	average over likelihood $p (μ \| θ_{0})$
$θ_{0}$ random	Frequentist	${(Δ^{2} θ_{est})}_{μ, θ_{0}}$	aCRLB	Equation (37)	hierarchy of bounds, Equation (40)
		${(Δ^{2} θ_{est})}_{μ, θ_{0}}$	fVTB	Equation (38)	hierarchy of bounds, Equation (40)
		$MSE {(θ_{est})}_{μ, θ_{0}}$	VTB	Equation (32)	bounds are independent of the bias
		$MSE {(θ_{est})}_{μ, θ_{0}}$	ZZB	Equation (35)	bounds are independent of the bias
	Bayesian	${(Δ^{2} θ_{BL})}_{μ, θ, θ_{0}}$	aGBr	Equation (42)	prior $p_{pri} (θ)$ and fluctuations $p (θ_{0})$ arbitrary
		${(Δ^{2} θ_{BL})}_{μ, θ}$	VTB	Equation (32)	prior $p_{pri} (θ)$ and fluctuations $p (θ_{0})$ coincide
		${(Δ^{2} θ_{BL})}_{μ, θ}$	ZZB	Equation (35)	hierarchy of bounds, Equation (45)

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, Y.; Pezzè, L.; Gessner, M.; Ren, Z.; Li, W.; Smerzi, A. Frequentist and Bayesian Quantum Phase Estimation. Entropy 2018, 20, 628. https://doi.org/10.3390/e20090628

AMA Style

Li Y, Pezzè L, Gessner M, Ren Z, Li W, Smerzi A. Frequentist and Bayesian Quantum Phase Estimation. Entropy. 2018; 20(9):628. https://doi.org/10.3390/e20090628

Chicago/Turabian Style

Li, Yan, Luca Pezzè, Manuel Gessner, Zhihong Ren, Weidong Li, and Augusto Smerzi. 2018. "Frequentist and Bayesian Quantum Phase Estimation" Entropy 20, no. 9: 628. https://doi.org/10.3390/e20090628

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Frequentist and Bayesian Quantum Phase Estimation

Abstract

1. Introduction

2. Frequentist Approach

2.1. Frequentist Risk Functions

2.2. Frequentist Bounds on Phase Sensitivity

2.2.1. Barankin Bound

2.2.2. Cramér–Rao Lower Bound and Maximum Likelihood Estimator

2.2.3. Hammersley–Chapman–Robbins Bound

2.2.4. Extended Hammersley–Chapman–Robbins Bound

3. Bayesian Approach

3.1. Noninformative Prior

3.2. Posterior Bounds

Ghosh Bound

3.3. Average Posterior Bounds

Likelihood-Averaged Ghosh Bound

3.4. Numerical Comparison of Bayesian and Frequentist Phase Estimation

4. Bounds for Random Parameters

4.1. Frequentist Risk Functions for Random Parameters

4.2. Bounds on the Mean Square Error

4.2.1. Van Trees Bound

4.2.2. Ziv–Zakai Bound

4.3. Bounds on the Average Estimator Variance

4.3.1. Average CRLB

4.3.2. Van Trees Bound for the Average Estimator Variance

4.4. Bayesian Framework for Random Parameters

Bayesian Bounds

5. Discussion and Conclusions

Author Contributions

Acknowledgments

Conflicts of Interest

Appendix A. Derivation of the Barankin Bound

Appendix B. Derivation of the Ziv–Zakai Bound

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI