Robust phase estimation of Gaussian states in the presence of outlier quantum states

In this paper, we investigate the problem of estimating the phase of a coherent state in the presence of unavoidable noisy quantum states. These unwarranted quantum states are represented by outlier quantum states in this study. We first present a statistical framework of robust statistics in a quantum system to handle outlier quantum states. We then apply the method of M-estimators to suppress untrusted measurement outcomes due to outlier quantum states. Our proposal has the advantage over the classical methods in being systematic, easy to implement, and robust against occurrence of noisy states.


Introduction
One of the challenges in developing quantum information technologies is to suppress uncontrollable elements in both classical and quantum devices. This has been an active research subject called state preparation and measurement (SPAM) error [1][2][3]. For example, imperfection at the quantum state preparation stage generates unwarranted outlier quantum states at random. Under this circumstance, it is impossible to predict precisely when these outlier quantum states are generated. The resulting quantum state is represented by a convex mixture of the desired quantum state and the outlier quantum states. Thus, the actual model is contaminated by outlier quantum states. No matter how small the occurrence of these outlier quantum states, they affect statistics of measurement outcomes. In this paper, we develop a statistical framework to handle certain types of SPAM errors by applying robust statistics [4][5][6][7][8].
Data due to undesired quantum states are called outliers in statistics, and they typically lie outside the range of trusted data. The traditional working rule to remove outliers in practice is the so-called 3-σ rule. In short, all data that are 3-σ away from the sample mean are discarded. This is intimately related to the tradition of 3-σ confidence in statistics. However, there is no justification for such heuristic and subjective data processing from the statistical point of view. The current status in the community is in fact that we should not rely on the 3-σ confidence [9][10][11]. Robust statistics is a branch of statistics and has been one of the proper tools to remove outliers systematically [4][5][6][7][8]. Recent success of robust statistics in machine learning is another justification to use it rather than the 3-σ rule.
The problem of handling outlier quantum states is a practical and important issue in any quantum communication protocols. Yet, it seems that this problem has not been addressed properly in the framework of robust statistics to the best of our knowledge. The main contribution of this paper is first to present a statistical framework to handle the problem of state estimation in the presence of unknown outlier quantum states. We then demonstrate the usefulness and effectiveness of robust statistics in the quantum case. To this end, we consider a specific problem-phase estimation of coherent states in the presence of outlier quantum states. This problem has many practical applications in quantum communication protocols using continuous variables [12][13][14]. Measurement data drawn according to this contaminated state contain outliers. We apply two specific M-estimators to make robust estimates from data. We find that a recently proposed robust estimator in [15] based on the generalized divergence performs well. We compare its performance to other methods and evaluate its robustness based on our proposed figure of merit called a ε-curve.
The system of noisy quantum gaussian states has a long history [16][17][18][19][20][21]. Further developments in applications to quantum information processing protocols were studied over the last two decades [12][13][14][20][21][22][23][24][25]. We note that our model is described by a different class of noise models studied in the literature. The main difference from the previous studies is that our noise model is not a typical completely positive and trace preserving (CP-TP) map. In particular, the model studied in this paper is not described by a unital CP-TP map, but a mixture of different quantum gaussian states. Another distinction is that we do not need to assume a specific form for outlier quantum states when applying our method to real data. Note that we could apply the conventional parameter estimation method, if we know specific forms of noisy quantum states. For example, we can treat noisy states as nuisance [26,27]. By contrast, the proposed method of M-estimators can be applied to the occurrence of unknown noisy quantum states. This is one of the practical advantages of the theory of robust statistics. In this sense, we do not need to know any physical mechanism to create these outlier quantum states in our setting.
In our study, the question of the ultimate precision limit, which is an active subject of quantum metrology in quantum gaussian systems [28][29][30][31][32][33][34][35], was not a primary objective. This is because the ultimate limit typically depends on the nature of the noise model. Furthermore, we cannot derive such an ultimate precision limit without having knowledge of outlier quantum states. Instead, we aimed at finding a practical and robust estimation strategy, and this is the basic philosophy of robust statistics.
The outline of the paper is as follows. In Section 2, a short summary of robust statistics is given for the paper to begin self-contained. In Section 3, we discuss robustness of estimators and propose a new measure of robustness used in this paper. In Section 4, we develop the concept of a quantum statistical model in the presence of outlier quantum states. In Section 5, we apply our formalism to the problem of phase estimation of coherent states. The last Section 6 presents a summary of the paper.

Preliminaries
In this section, we present a short summary on robust statistics based on the theory of M-estimators. To provide the basic idea, we focus on estimating a single parameter case. Its extension to multiple parameters can be done similarly. The purpose of this section is to give a simple recipe to apply M-estimators. Readers who are only interested in applying M-estimators can skip most of this section. We provide a summary of how to apply the theory of M-estimators at the end of this section. See Refs. [4][5][6][7][8] for more detailed discussions.

M-estimator
An M-estimator is a generalization of the maximum likelihood estimator (MLE) and is defined as follows. Consider a datum x = {x 1 , . . . , x n } of the sample size n, which is identically and independently distributed (i.i.d.) according to one-parameter family of probability density functions f (x|θ). The MLE to estimate θ is defined by:θ The MLE needs to be a stationary point of the logarithmic likelihood equation: The basic philosophy behind the M-estimator is to generalize this equation by: where ψ(x|θ) can be an arbitrary function as long as it satisfies certain conditions [8]. An estimator, which is defined as the above generalized stationary condition (3), is called an M-estimator. Equation (3) is called an M-equation in robust statistics. Clearly, the M-estimator depends on a choice of ψ function. The choice ψ(x|θ) = d dθ log f (x|θ) corresponds to the familiar MLE. In the following discussion, we consider estimation of the expectation value µ of the model for simplicity. We assume that the true probability density function is a function of f (x − µ) and is symmetric at the origin, i.e., f (−x) = f (x). A typical example of this kind is the normal distribution. Note that it is easy to generalize our setting to an arbitrary location model [8]. Under this assumption, an M-equation to infer the parameter µ is a function of x − µ, and hence we have: The zoology of M-estimators has been studied in robust statistics, see for example studies [5,7,8]. For our purpose, we consider two specific M-estimators; bisquare and gamma M-estimators. The ψ function for the bisquare M-estimator (also known as Tukey's estimator) is given by: where c is a tuning parameter. The basic property of a bisquare M-estimator is to suppress contribution from data that are far away from the true parameter µ. Clearly, the ψ bi function vanishes at |x − µ| = c smoothly. In [15], an M-estimator was proposed based on the gamma divergence, which is a generalization of the Kullback-Leibler divergence (also called the relative entropy). Its performance was demonstrated to be more robust than the traditional M-estimators. When the true probability density function obeys the normal distribution, the ψ function is defined by: where γ > 0 is a tuning parameter appearing in the power of the normal distribution N (µ, σ) (µ is the expectation value, and σ is the standard deviation). Another tuning parameter, the standard deviation σ, needs to be properly chosen as well. This will be discussed later. ψ gam approaches 0 as |x| → ∞, and its convergence is exponential.

Tuning Parameter
We now discuss the issue of the tuning parameter. Generally speaking, it is favorable to have less freedom for tuning parameters. The performance of an M-estimator should also be independent of the choice of tuning parameters. However, there is no universally accepted methodology to set these tuning parameters up. One of the standards is based on asymptotic relative efficiency as follows. This guarantees the M-estimator would perform well in a large sample regime. Let us assume that the true model is the normal distribution, and consider an M-estimator. Asymptotic relative efficiency η is defined by the ratio: where σ 2 MLE and σ 2 asymptotic denote asymptotic variances of the MLE and the M-estimator, respectively. Following the tradition, tuning parameters are determined by the condition η = 0.95 [8]. For bisquare and gamma M-estimators, explicit values for the tuning parameters are known as: whereσ is an estimated standard deviation and is often set to the normalized version of the median absolute deviation, called the MADN (The MADN is defined by MAD/0.675 with MAD being the median absolute deviation.) Another tuning parameter of the gamma M-estimator is set to be the estimate of the standard deviation. A remark concerning gamma M-estimator is important. When outliers are not far away from the neighborhood of true data, the choice γ = 0.5 is observed to be best from many examples known in the literature. We also analyzed this peculiar trick for several gaussian models and reached the same conclusion. Therefore, we adopt the choice γ = 0.5 in the remaining part of the paper.

Iterative Algorithm
The M-equation (3) is a non-linear function in general, and there is no efficient way to find the root of the equation. This point will be more problematic when estimating multiple parameters, since one has to solve coupled multivariate equations. A simple iterative method is usually used in robust statistics to find an approximated solution [8]. Let us rewrite Equation (3) as: x . This can be put into the form: We start with an initial choice for µ (0) and then iterate it according to: After several iteration steps (a = 1, 2, . . . , a fin ), we stop the algorithm. Alternatively, any stopping rule can be adopted. A common choice for the initial value is the median when estimating the expectation value. As explicitly demonstrated in Section 5.3, this iteration algorithm is efficient.
We will now summarize the method of M-estimators in practice. First, we choose an appropriate ψ function to apply. Second, we set the M-equation up by plugging an observed datum. Third, we solve this M-equation for the parameters of interest. One method is to use the above simple iterative algorithm, but other methods can be applied as well. The obtained value is the estimate based on this M-estimator. As a last remark, it is better to apply several M-estimators and compare them. Based on the comparison, we adjust the tuning parameters of the chosen ψ function. Repeating this procedure, we can obtain a more reliable estimate.

Robustness of M-estimator
In robust statistics, one of the main objectives is to construct an estimator, which is not affected by outliers. This then leads the study of robustness of M-estimators. There are several known measures for evaluating robustness quantitatively, such as influence curves, gross error sensitivity, local shift sensitivity, and breakdown point [8]. In the following, we first explain the most common figure of merit for robustness, a breakdown point, and then we introduce a new quantity ε-curve for our purpose. Readers who are not interested in detail can skip Sections 3.2 and 3.3.

Classical Contaminated Model
We will now describe a classical statistical model in the presence of outliers, which is known as the contaminated model in robust statistics. Suppose we are interested in estimating a d-parameter family of probability distributions f θ : where Θ ⊂ R d is an open subset. We denote its cumulative distribution function (CDF) by F θ . The simplest situation is when possible outliers are generated by a single probability distribution g whose CDF is G. A probabilistic mixture of two probability distributions f θ and g gives the contaminated model: where ε ∈ [0, 1) denotes the strength of occurrence of outliers. The strength of the noise ε is usually referred to as the contamination parameter. A familiar and classical example of this kind is to consider the normal distribution f θ = f N µ,σ with the expectation value µ and the standard deviation σ as the ideal probability density function. Outliers are generated by another normal distribution whose expectation value is orders of magnitude different from µ. This is equivalent to a statistical model of the form: Note that the contamination parameter as well as parameters characterized in g are to be regarded as nuisance parameters of the model M Contaminated .
The more general case of modeling outliers in the ideal case (9) is introduced as follows. Let G outlier be a set of all possible CDFs for generating outliers. In it, each element G ∈ G outlier corresponds to a different CDF generating different types of outliers. The general contaminated model with outliers is described by the CDF: where ε is the contamination parameter. In this general case, we do not need to assume specific forms of G in order to apply the method of M-estimators. This is in contrast to the setting of parametric models.

Asymptotic Breakdown Point
Intuitively, the value of the breakdown point represents the maximum ratio of outliers in a datum, which can be suppressed by an M-estimator. Beyond this breakdown point, the M-estimator will no longer return a trusted value, which is typically far away from the parameter set Θ. Consider a one-parameter model M = { f θ |θ ∈ Θ} for the sake of simplicity and denote ∂Θ by the boundary of the open set Θ. Fix the M-estimatorθ under consideration. A contaminated model with outliers is given by (12).
The asymptotic breakdown point of the estimatorθ for the model (12) is denoted by ε * θ , F θ . This is defined by the maximum value of ε ∈ [0, 1), in whichθ on the boundary remains finite for arbitrary outlier distributions. Mathematically, this is expressed as: where K ⊂ Θ is a closed bounded subset satisfying K ∩ ∂Θ = ∅ andθ ∞ (F) denotes asymptotic behavior ofθ for the distribution F. For example, the asymptotic breakdown point of the median is 0.5, since it gives values outside of the parameter set Θ when half of the sample size is made up of outliers.

Finite Breakdown Point
The above asymptotic breakdown point is defined by the asymptotic behavior of the M-estimator. To evaluate this quantity for finite sample size data, there are several variants known in the literature. For our study, we focused on the finite breakdown point (FBP) based on replacement of the true data [36,37].
Letθ n be an M-estimator and consider a datum x = {x 1 , x 2 . . . , x n } of the sample size n. Denote by x; y m one of possible data obtained by replacing n − m elements of x by y m = {y 1 , y 2 , . . . , y m }. Intuitively, the FBP of the estimatorθ n for x is defined by the maximum ratio m n , in whichθ n (x; y m ) behaves normally. This FBP is denoted by ε * n θ n , x . Typically, ε * n (θ n , x) is independent of the datum x. It can be proven that the FBP converges to the asymptotic BP in the limit n → ∞ [8].
The formal definition of the FBP is as follows. Let X m x be a set of all possible data whose intersection with x is n − m, i.e., The FBP for x is given by: In our simulation for the FBP, we randomly generated outliers obeying the normal distribution whose expectation value has a different order from that of the true distribution. See Section 5.3 for details. By definition, the concept of the FBP relies on replacement of the actual data by artificially created outliers.
Before closing this section, we will briefly discuss the evaluation of robustness. The FBP seems to be the most common choice of robustness in the literature. In statistics, one compares FBPs to show their robustness when one proposes a new M-estimator. However, there were severe critiques to relying on it as a measure of robustness [4]. One of the major objections is that this quantity does not concern the actual estimate at all. In other words, a robust estimator in the sense of high FBP could be a very poor estimator. To see this point, we will evaluate robustness based on the FBP together with a newly proposed figure of merit, called an ε-curve. This is defied by the behavior of M-estimators as a function of the contamination parameter ε. This ε-curve is conceptually simple, and it concerns the actual estimate. In this paper, we find it more natural to measure robustness based on the ε-curve. See Section 5.3 for a detailed comparison.

Quantum Statistical Model with Outliers
In this section, we develop the concept of a quantum gaussian model with unavoidable outlier states. First, we consider the general quantum statistical model in the presence of outlier quantum states.
The ideal quantum statistical model is given by the family of states: Let G outlier be a set of possible outlier quantum states. For example, the set G outlier consists of L elements as: The case of a continuous set of outlier quantum states can be defined similarly.
Following the same philosophy as the classical contaminated model (12), we define a quantum contaminated model by: Here, the contamination parameter ε ∈ [0, 1) represents the strength of contamination. Importantly, we do not have precise knowledge on the values ε or σ. This is in contrast to the conventional noise model in the quantum estimation theory, where we assume a parametric family of noise models. Without knowing a specific form of noisy quantum states, we cannot apply the standard methodology of parametric inference to infer the parameters of interest. The simplest case is when there is one particular outlier quantum state σ, which may or may not be known to us. In this situation, we have ρ ε θ = (1 − ε)ρ θ + εσ. When the outlier quantum states exhibit a probabilistic structure, the second term in Equation (17) is expressed as a convex mixture of outlier states. Hence, we can express the above model (17) as: where p(s) is a probability density function for the occurrence of the parametric family of outlier states {σ(s)}, and s is in general a vector-valued quantity.
A few remarks about our setting are as follows. First, the question of the optimal estimator. It is in general a difficult task to derive an optimal estimatorθ for our model (17). This is because the MLE, which is asymptotically optimal, is no longer optimal in the presence of unknown outlier quantum states.
Second, an additional complication comes in the quantum case due to the measurement degree of freedom. In the quantum estimation theory, we can derive an optimal measurement strategy to extract the maximum information about the parameter θ from the ideal model M. However, this optimal measurement is no longer optimal for the quantum contaminated model (17). In fact, it is almost impossible to identify the optimal measurement, when the set G outlier is not fixed but has uncertain elements. The purpose of robust statistics is to estimate the parameter of interest θ in the presence of nuisance parameter ε and unknown outlier quantum states.
With this in mind, we shall not explore an optimal estimation strategy in this paper, but we fix a good measurement for the ideal state. We then apply the method of the M-estimator to make robust estimates on θ.
Third, the model (18) can be understood as a quantum channel (a CP-TP map). This quantum noise model acts on the ideal state ρ θ . The simplest instance of a single outlier state is expressed as Note that this quantum channel is not unital in general. Although this class of non-unital maps is well studied, the more general forms (17) or (18) are not explored in view of robust statistics to the best of our knowledge.
Last, measurement outcomes. A measurement is described by a positive operator-valued measure (POVM). Let Π x ≥ 0 (x ∈ X ) be elements of the POVM such that Π x dx = I (the identity operator). When we perform this POVM on the state (18), the resulting measurement outcomes obey the probability density function: This is a probability mixture of the ideal measurement outcome tr (ρ θ Π x ) and noisy outcomes tr (σ(s)Π x ). This relation establishes a connection to the classical contaminated model (12).

Quantum Gaussian State with Outliers
We will now consider a concrete example for a quantum contaminated model for a coherent state. In our model, the ideal state is an unknown coherent state, and outlier quantum states are given by the thermal gaussian states. A motivation for considering this model is that there occur noisy thermal states with some frequency at the state preparation stage. This could be treated as the quantum contaminated model of the form (17).
The standard definition of the coherent state, which is characterized by a complex number α ∈ C, is: where |j is the Fock state of j photons. We can also define the coherent state by a unitary shift as follows. Letâ † andâ be the creation and annihilation operators, respectively, and define the shift operator by: The coherent state (20) is then expressed as a shifted state by α from the vacuum state |0 : Next, we consider thermal states as outlier quantum states. The thermal state at the inverse temperature β = 1 k B T (k B : the Boltzmann constant) is defined by: where C β = 1 − e −β and we set the unit energy for the single-mode field without loss of generality, i.e.,hω = 1. Note that the zero temperature limit, β → ∞, converges to the vacuum state: A quantum gaussian shift state at the inverse temperature β is defined by: which is also expressed in the integral form as: with 2κ 2 = e β − 1 −1 . It is straightforward to see that the zero temperature limit of the quantum gaussian shift state is a coherent state, i.e., lim β→∞ ρ GS β,α = |α α|.
The parameter κ (κ ≥ 0) represents a dispersion of thermal spreads of coherent states as seen from Equation (26). In the following we mainly use κ and denote the quantum gaussian state state simply by ρ α,κ , since there is a one-to-one correspondence between β and κ.
Let α ∈ C be the complex parameter of interest, and consider quantum outlier states described by the quantum gaussian shift state ρ z,κ . With these definitions, our quantum contaminated model is defined by: In this expression, p(z, κ) describes the probability density function of the outlier quantum state ρ z,κ . From this expression, we see that the quantum contaminated model is a mixed quantum gaussian models in our setting. In practice, we take the independent density function as p(z, κ) = p 1 (z R )p 2 (z I )p 3 (κ), where z R = Re z and z I = Im z denote the real and imaginary parts of z, respectively. To implement our numerical simulation of the above model, we consider two different settings as follows.
Single outlier quantum state: When there is one particular outlier quantum state, the model (28) is expressed as: This corresponds to denotes the Dirac delta function. In this model, our interest is two real parameters, θ 1 = Re α and θ 2 = Im α. All other parameters, ε, z 0 , and κ 0 , are the nuisance parameters.
Distributed outlier quantum states: Consider the case when a center of possible outlier quantum states α are generated by the normal distributions on the phase space with a given dispersion κ 0 . The quantum contaminated model is expressed as: This corresponds to set a probability distribution for outlier quantum states as (28). To remind ourselves, f N µ,σ (x) is a probability density function of the normal distribution with the expectation value µ and the standard deviation σ. Thus, taking a limit σ 1 , σ 2 → 0 reduces to the case of a single outlier quantum state. The above distributed outlier quantum states can be easily generalized to the case of multiple centers by adding more terms to Equation (30).

Homodyne Measurement on the Noisy Quantum Gaussian States
Denote the standard quadrature operators by: The homodyne measurement at φ is defined by a projection measurement of an observable: As is well known in quantum optics, a homodyne measurement at φ on the coherent state |α gives statistics of the normal distribution whose probability density function is: Therefore, denoting θ 1 = Re α and θ 2 = Im α, respectively, it is equivalent to the normal distribution: where N (µ, σ) denotes the normal distribution with the expectation value µ and the standard deviation σ. More generally, statistics of measurement outcomes of homodyne measurement on the quantum gaussian state ρ α,κ (25) are: This identifies statistics of the normal distribution: Combining our quantum contaminated model (28) and measurement statistics (33), we obtain statistics for homodyne measurement at φ on the noisy quantum gaussian states as: This is a convex mixture of normal distributions. For the case of a single outlier quantum state (29), we have: When we consider homodyne measurement on a more general quantum contaminated model (30), measurement statistics are given by: We note that the second term can be further simplified by the use of a convolution formula of the normal distribution. For our purpose, the above expression suffices to implement our numerical simulation, which will be discussed in the next section.

Phase Estimation of Noisy Coherent State
To illustrate advantages of the M-estimator in noisy quantum gaussian systems, we consider a phase estimation problem, which has many important applications for quantum information processing protocols [28][29][30][31][32][33][34][35]. Suppose an unknown coherent state |α is given. Importantly, the ideal state is parametrized by two real parameters (α R , α I ) ∈ R 2 (α R = Re α and α I = Im α). Our primary interest is to estimate the phase θ = arctan α I α R of an unknown coherent state |α . In our setting, the phase θ is the parameter of interest and the other parameter, the amplitude of the state, r = α 2 R + α 2 I is the nuisance parameter of the model [26,27]. The optimal estimation strategy for this problem is known. However, this optimal measurement depends on the unknown phase φ and cannot be implemented. In the following, we consider a random mixture of two homodyne measurements at φ = 0, π/2. As an estimator, we first apply M-estimators (α I ,α R ) to infer the value α. We then convert it to phase by:θ A schematic diagram of our setting is given in Figure 1.

Numerical Simulation
We will now describe procedures of our numerical simulation. We set the true coherent state as α = 10 + 4i. The true phase value is θ = arctan(0.4) 0.3805. We randomly generate n quantum gaussian states, which is a convex mixture of the ideal coherent state and outlier quantum states (either a single outlier quantum state case (29) or a distributed outlier quantum state case (30)). We perform a random homodyne measurements at φ = 0, π/2 with equal probability. From measurement outcomes x at φ = 0, we apply an M-estimator to estimateα R (x). The imaginary part will be estimated by homodyne measurement at φ = π/2. We repeat the iterative algorithm of Section 2.3 to obtain estimates. We stop iteration when the difference between two successive estimates is below a predetermined threshold value. We then apply the formula (37) to obtain an estimateθ n (x). In our simulation, we change the sample size n for a given contamination parameter ε. We compare two types of M-estimators (bisquare and gamma) together with the sample mean and the median. (The sample mean for the ideal coherent state corresponds to the MLE in our setting).

Single Outlier Quantum State
An outlier quantum state is set as z 0 = 15 + 15i, κ 0 = 0.1 in Equation (29). We first change the sample size n = 1000, 2000, . . . , 5000 to analyze performances of M-estimators for the contamination parameter ε = 0.01. Comparisons of M-estimatorsα R andα I vs. the sample size n are plotted in Figure 2a,b, respectively. In Figure 2c,d, we plot the mean square errors (MSEs) of these estimators vs. the sample size n for the contamination parameter ε = 0.01. In these figures, estimates from the sample mean (yellow-green), the median (purple), bisquare M-estimator (blue), and gamma M-estimator (red) are plotted, averaged over 500 runs of each setting. The true values are plotted by a dashed-dotted line (orange). From Figure 2a,b, we can see that the bisqure and gamma M-estimators both performed well when compared to the sample mean and the median. The observed differences between these two figures come from the fact that α R is harder to estimate than α I . This is because measurement outcomes at φ = 0 are more sensitive to outlier quantum states, as the two gaussian distributions are close to each other. Good performances in the MSEs were also observed for the bisqure and gamma M-estimators, as shown in Figure 2c,d. We should stress that the mean and the median are not consistent estimators for our setting.   Next, we plot the result for phase estimation in Figure 3. The same convention as in Figure 2 is used. From Figure 3a,b, we draw the same conclusions for the bisqure and gamma M-estimators as in Figure 2. Namely, they are robust and consistent estimators. This is well understood from the fact that it is sufficient to have good estimators for α R and α I to estimate phase accurately.

Robustness of M-estimators
We will now analyze the robustness of the two M-estimators, the sample mean, and the median. First, we analyzed the FBP (14) for phase estimation, as shown in Figure 4. To produce this figure, we randomly replaced data with artificial data, which were generated by the normal distribution N (1000, 0.1). The step size for increasing the number of replacing outliers (outlier counts) is 250 in Figure 4. In this setting, an estimator is regarded as being outside the parameter space when it returns the value arctan(1) 0.785. This corresponds toα R andα I being close to 1000.
Based on this figure, FBPs for four estimators were roughly estimated as follows: mean = 1500/5000 = 0.3, bisquare = 1750/5000 = 0.35, gamma and median = 2750/5000 = 0.55. From a first glance at this figure, we might conclude that the median and gamma M-estimators are robust estimators. However, we already know from Figure 3 that the median does not show consistency as an estimator. This discrepancy is easily resolved by realizing that the definition of the FBP is not related to the actual estimate at all. From this simple example, we conclude that the FBP should not be used as a measure of robustness a priori. Next, we plotted the ε-curve proposed at the end of Section 3. This is to plot behaviors of an M-estimator as a function of the contamination parameter ε for a fixed sample size. In Figure 5, we show the ε-curve for phase estimation for the sample size n = 5000. From this figure, we find that gamma M-estimators were the most robust estimators for small ε. By contrast, the sample mean and the median smoothly deviated from the true value. However, it is noted that the bisquare and gamma M-estimators behaved in uncontrollable manners after some threshold: ε 0.25 for bisquare and ε 0.35 for gamma. Before we move to the next subsection, we report the average number of iterations of the M-estimator algorithm in Tables 1 and 2. From the two tables, we see that the number of iterations needed to get a robust estimate was relatively small. This shows that the M-estimator can be implemented efficiently.

Distributed Outlier Quantum States
We will now consider a more general setting where outlier quantum states are distributed around the vacuum state of the form Equation (30). Outlier quantum states are generated according to the normal distribution as α R ∼ N(0.1, 0.1), α I ∼ N(0.1, 0.1) with a fixed value of dispersion κ 0 = 0.1. We set the true coherent state as α = 10 − 4i in this example. The true phase value is θ = arctan(−0.4) −0.3805. In Figure 6, we show the performances of the M-estimators together with the sample mean and the median for the contamination parameter ε = 0.01. The same convention as in Figure 2 is used. From Figure 6, we observe that the bisquare and gamma M-estimators performed well compared to other estimators. In this class of outlier quantum states, the median seems to be a bad choice to use. In fact, its performance was worse than that of the sample mean.  Finally, we analyzed the robustness of M-estimators by the ε-curve. In Figure 7, we plot the ε-curve for phase estimation of the sample size n = 5000. From this result, we found that the gamma M-estimator was most robust and the bisquare M-estimator was second. The sample mean was also found to be relatively robust with small fluctuation. However, this behavior in fact depends on the nature of outlier quantum states as discussed in the next subsection. Lastly, the result of the median showed that it is not trustable for the distributed outlier quantum states of this kind.

Discussion
Following the two reasons described below, we conclude that the gamma M-estimator is the most robust estimator in comparison to the mean, median, and the well-known robust estimator bisquare, upon estimating the phase θ of the coherent state. First, from Figures 3a and 6a, gamma was better than the other estimators in terms of accuracy in estimating the phase θ. Second, ε-curves ( Figures 5  and 7) suggest that the gamma M-estimator was the best in terms of robustness as well.
From the MSE plots (Figures 3b and 6b), it was confirmed that the gamma estimator and the bisquare estimator had similar convergence speeds. Furthermore, from Tables 1 and 2, it was found that the iteration of the robust estimation algorithm was suppressed to 10 or less for both gamma and bisquare before breakdown. In Figure 6a, relatively good accuracy of the mean estimation was observed. This is because of the effect of the nuisance parameter in our model. In fact, the sample means for α R and α I behaved very poorly with large biases. However, the errors of each estimation of α R and α I were canceled out. This then yielded an improvement in estimating the phase θ. We remark that this kind of an accidental improvement can only happen in a special case. When we do not have prior knowledge on possible outlier quantum states, M-estimators give more robust and reliable estimates.

Conclusions and Outlook
In this paper, we studied the problem of phase estimation for coherent states in the presence of unavoidable outlier quantum states. Measurement outcomes were then distributed according to a convex mixture of the ideal distribution and noisy distributions due to outliers. To overcome this problem, the methodology of M-estimators was introduced based on the well-established theory of robust statistics. Two specific M-estimators, bisquare and gamma, were studied to illustrate the advantage over conventional estimators. We found that the gamma M-estimator is most accurate as well as robust against the effect of outlier quantum states.
Before closing our paper, we would like to make a few remarks regarding future works. Our formalism applies to more general estimation settings with a quantum gaussian state, such as squeezed states and entangled states. This is done by replacing the ideal state, a coherent state, by other states of interest in Equation (28). More general quantum contaminated models can be treated without any modification. Preliminary studies showed that M-estimators work efficiently in this case as well. Our quantum statistical model in the presence of outlier quantum states can also be used to model imperfections in other elements of quantum information processing, such as gate operations and measurement processes. In this sense, the proposed method of M-estimators has wider applications to overcome certain types of SPAM errors in practice. Last, we mainly considered a quantum communication scenario in this paper. Utilization of robust statistics should be relevant to handle the problem of outlier quantum states when implementing other high precision measurement schemes based on quantum resources, such as quantum metrology, quantum imaging, and quantum sensing.

Conflicts of Interest:
The authors declare no conflict of interest.