Non-asymptotic confidence sets for circular means

The mean of data on the unit circle is defined as the minimizer of the average squared Euclidean distance to the data. Based on Hoeffding's mass concentration inequalities, non-asymptotic confidence sets for circular means are constructed which are universal in the sense that they require no distributional assumptions. These are then compared with asymptotic confidence sets in simulations and for a real data set.


Introduction
In applications, data taking values on the circle, i.e., circular data, arise frequently; examples are measurements of wind directions or the times of day at which patients are admitted to a hospital unit. We refer to the literature, e.g., [1–5], for an overview of statistical methods for circular data, in particular the ones described in this section.
Here, we will concern ourselves with the arguably simplest statistic, the mean. However, given that a circle does not carry a vector space structure, i.e., there is neither a natural addition of points on the circle nor can one divide them by a natural number, what should the meaning of "mean" be?
In order to simplify the exposition, we specifically consider the unit circle in the complex plane, $S^1 = \{z \in \mathbb{C} : |z| = 1\}$, and we assume the data can be modelled as independent random variables $Z_1, \dots, Z_n$ which are identically distributed as the random variable $Z$ taking values in $S^1$. In the literature, however, the circle is often taken to lie in the real plane $\mathbb{R}^2$, i.e., while we denote the point on the circle corresponding to an angle $\theta \in (-\pi, \pi]$ by $\exp(i\theta) = \cos\theta + i\sin\theta \in \mathbb{C}$, one may take it to be $(\cos\theta, \sin\theta) \in \mathbb{R}^2$.
Of course, $\mathbb{C}$ is a real vector space, so the Euclidean sample mean $\bar Z_n = \frac{1}{n}\sum_{k=1}^n Z_k \in \mathbb{C}$ is well-defined. However, unless all $Z_k$ take identical values, it will (by the strict convexity of the closed unit disc) lie inside the circle, i.e., its modulus $|\bar Z_n|$ will be less than 1. Though $\bar Z_n$ cannot be taken as a mean on the circle, if $\bar Z_n \neq 0$, one might say that it specifies a direction; this leads to the idea of calling $\bar Z_n/|\bar Z_n|$ the circular sample mean of the data.
Observing that the Euclidean sample mean is the minimiser of the sum of squared distances, this can be put in the more general framework of Fréchet means [6]: define the set of circular sample means to be
$$\hat\mu_n = \operatorname*{argmin}_{\zeta \in S^1} \sum_{k=1}^n |Z_k - \zeta|^2, \tag{1}$$
and analogously define the set of circular population means of the random variable $Z$ to be
$$\mu = \operatorname*{argmin}_{\zeta \in S^1} \mathbb{E}|Z - \zeta|^2. \tag{2}$$
Then, as usual, the circular sample means are the circular population means with respect to the empirical distribution of $Z_1, \dots, Z_n$. The circular population mean can be related to the Euclidean population mean $\mathbb{E}Z$ by noting that
$$\mathbb{E}|Z - \zeta|^2 = \mathbb{E}|Z - \mathbb{E}Z|^2 + |\mathbb{E}Z - \zeta|^2 \tag{3}$$
(in statistics, this is called the bias-variance decomposition), so that
$$\mu = \operatorname*{argmin}_{\zeta \in S^1} |\mathbb{E}Z - \zeta|$$
is the set of points on the circle closest to $\mathbb{E}Z$. It follows that $\mu$ is unique if and only if $\mathbb{E}Z \neq 0$, in which case it is given by $\mu = \mathbb{E}Z/|\mathbb{E}Z|$, the orthogonal projection of $\mathbb{E}Z$ onto the circle; otherwise, i.e., if $\mathbb{E}Z = 0$, the set of circular population means is all of $S^1$. We consider the information whether the circular population mean is unique to be relevant (non-uniqueness occurs, e.g., but not exclusively, if $Z$ is uniformly distributed over the circle); it should thus be inferred from the data as well. Analogously, $\hat\mu_n$ is either all of $S^1$ or uniquely given by $\bar Z_n/|\bar Z_n|$, according to whether $\bar Z_n$ is 0 or not. Note that $\bar Z_n \neq 0$ a.s. if $Z$ is continuously distributed on the circle, even if $\mathbb{E}Z = 0$. $\bar Z_n$ is what is known as the vector resultant, while $\bar Z_n/|\bar Z_n|$ is sometimes referred to as the mean direction.
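The following minimal sketch (Python, standard library only; the function names `circular_sample_mean` and `frechet_objective` are our own, hypothetical helpers) computes $\bar Z_n/|\bar Z_n|$ for a few arbitrary illustrative angles and checks numerically that it minimises the objective in (1) over a fine grid of candidate points on the circle.

```python
import cmath
import math

def circular_sample_mean(zs):
    """Return zbar/|zbar| for points zs on the unit circle, or None if zbar == 0.

    None encodes the degenerate case in which every point of S^1 is a sample mean.
    """
    zbar = sum(zs) / len(zs)          # Euclidean sample mean in C
    if zbar == 0:
        return None
    return zbar / abs(zbar)           # orthogonal projection onto the circle

def frechet_objective(zs, zeta):
    """Sum of squared chordal distances |Z_k - zeta|^2, the objective in (1)."""
    return sum(abs(z - zeta) ** 2 for z in zs)

angles_deg = [10, 25, -5, 40, 170]    # arbitrary illustrative data
zs = [cmath.exp(1j * math.radians(a)) for a in angles_deg]
mu_hat = circular_sample_mean(zs)

# Brute-force comparison with 3600 equally spaced candidate points.
grid = [cmath.exp(2j * math.pi * k / 3600) for k in range(3600)]
best_on_grid = min(frechet_objective(zs, g) for g in grid)
print(math.degrees(cmath.phase(mu_hat)), frechet_objective(zs, mu_hat), best_on_grid)
```

Returning `None` is just one way of encoding the degenerate case $\hat\mu_n = S^1$.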
The expected squared distances minimised in Equation (2) are measured with the metric inherited from the ambient space $\mathbb{C}$; therefore, $\mu$ is also called the set of extrinsic population means. If we measured distances intrinsically along the circle, i.e., using arc length instead of chordal distance, we would obtain what is called the set of intrinsic population means. We will not consider the latter in the following; see, e.g., [7] for a comparison and [8,9] for generalizations of these concepts.
Our aim is to construct confidence sets for the circular population mean $\mu$ that form a superset of $\mu$ with a certain (so-called) coverage probability that is required to be not less than some pre-specified confidence level $1-\alpha$ for $\alpha \in (0,1)$. The classical approach is to construct an asymptotic confidence interval whose coverage probability converges to $1-\alpha$ as $n$ tends to infinity. This can be done as follows: since $Z$ is a bounded random variable, $\sqrt{n}(\bar Z_n - \mathbb{E}Z)$ converges to a bivariate normal distribution when identifying $\mathbb{C}$ with $\mathbb{R}^2$. Now, assume $\mathbb{E}Z \neq 0$, so that $\mu$ is unique. Then, the orthogonal projection onto the circle is differentiable in a neighbourhood of $\mathbb{E}Z$, so the δ-method (see e.g., [1] (p. 111) or [4] (Lemma 3.1)) can be applied and one easily obtains
$$\sqrt{n}\,\operatorname{Arg}(\mu^{-1}\hat\mu_n) \xrightarrow{\mathcal{D}} N\!\left(0, \frac{\mathbb{E}(\operatorname{Im}(\mu^{-1}Z))^2}{|\mathbb{E}Z|^2}\right), \tag{4}$$
where $\operatorname{Arg}: \mathbb{C}\setminus\{0\} \to (-\pi, \pi] \subset \mathbb{R}$ denotes the argument of a complex number (it is defined arbitrarily at $0 \in \mathbb{C}$), while multiplying with $\mu^{-1}$ rotates the circle such that $\mu$ is mapped to $1$, i.e., to the angle $0 \in (-\pi, \pi]$; see e.g., [4] (Proposition 3.1) or [7] (Theorem 5). Estimating the asymptotic variance and applying Slutsky's lemma, one arrives at the asymptotic confidence set
$$C_A = \left\{\zeta \in S^1 : |\operatorname{Arg}(\hat\mu_n^{-1}\zeta)| \le \delta_A\right\},$$
provided $\hat\mu_n$ is unique, where the angle determining the interval is given by
$$\delta_A = \frac{q_{1-\alpha/2}}{\sqrt{n}\,|\bar Z_n|}\sqrt{\frac{1}{n}\sum_{k=1}^n \left(\operatorname{Im}(\hat\mu_n^{-1}Z_k)\right)^2},$$
with $q_{1-\alpha/2}$ denoting the $(1-\alpha/2)$-quantile of the standard normal distribution $N(0,1)$. There are two major drawbacks to the use of asymptotic confidence intervals: firstly, by definition, they do not guarantee a coverage probability of at least $1-\alpha$ for finite $n$, so the coverage probability for a fixed distribution and sample size may be much smaller. Indeed, Simulation 2 in Section 4 demonstrates that, even for $n = 100$, the coverage probability may be as low as 64% when constructing the asymptotic confidence set for $1-\alpha = 90\%$. Secondly, they assume that $\mathbb{E}Z \neq 0$, so they are not applicable to all distributions on the circle. Since in practice it is unknown whether this assumption holds, one would have to test the hypothesis $\mathbb{E}Z = 0$, possibly again by an asymptotic test, and construct the confidence set conditioned on this hypothesis having been rejected, setting $C_A = S^1$ otherwise. However, this sequential procedure would require some adaptation taking the pre-test into account (cf. e.g., [10]), a point we come back to in Section 5, and it is not commonly implemented in practice.
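As a sketch of the plug-in rule for $\delta_A$ (Python; `asymptotic_angle` is a hypothetical name, and the normal quantile comes from the standard library's `statistics.NormalDist`), assuming $\bar Z_n \neq 0$ so that $\hat\mu_n$ is unique; the example uses the two-point configuration of Simulation 1 with exactly 200 observations at each point:

```python
import cmath
import math
from statistics import NormalDist

def asymptotic_angle(zs, alpha):
    """Half-width delta_A (radians) of the asymptotic arc C_A around mu_hat.

    Plug-in rule: q_{1-alpha/2} * sqrt(mean of Im(mu_hat^{-1} Z_k)^2) / (sqrt(n) |zbar|).
    """
    n = len(zs)
    zbar = sum(zs) / n
    if zbar == 0:
        raise ValueError("sample mean not unique; C_A is undefined")
    mu_hat = zbar / abs(zbar)
    # empirical second moment perpendicular to mu_hat
    var_perp = sum((z / mu_hat).imag ** 2 for z in zs) / n
    q = NormalDist().inv_cdf(1 - alpha / 2)      # standard normal quantile
    return q * math.sqrt(var_perp) / (math.sqrt(n) * abs(zbar))

# Two points of equal mass at +-10 degrees, 200 copies each (n = 400).
plus = cmath.exp(1j * math.radians(10))
minus = cmath.exp(-1j * math.radians(10))
zs = [plus, minus] * 200
delta_A = asymptotic_angle(zs, alpha=0.05)
print(math.degrees(delta_A))
```

For this balanced sample, the half-width is about one degree, in line with $q_{0.975}\tan(10°)/20$ in radians.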
We therefore aim to construct non-asymptotic confidence sets for $\mu$, guaranteeing coverage with at least the desired probability for any sample size $n$, which in addition are universal in the sense that they do not make any distributional assumptions about the circular data besides them being independent and identically distributed. It has been shown in [7] that this is possible; however, the confidence sets constructed there were far too large to be of use in practice. Nonetheless, we start in Section 2 by varying that construction, using Hoeffding's inequality instead of Chebyshev's inequality as in [7]. Considerable improvements are possible if one takes the variance $\mathbb{E}(\operatorname{Im}(\mu^{-1}Z))^2$ "perpendicular to $\mathbb{E}Z$" into account; this is achieved by a second construction in Section 3. Of course, the latter confidence sets will still be conservative, but Proposition 2(iv) shows that they are (for $1-\alpha = 95\%$) only about a factor 3/2 longer than the asymptotic ones when the sample size $n$ is large. We further illustrate and compare these confidence sets in simulations and in an application to real data in Section 4, and discuss the results in Section 5.

Construction Using Hoeffding's Inequality
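We will construct a confidence set as the acceptance region of a series of tests. This idea has been used before for the construction of confidence sets for the circular population mean [7] (Section 6); however, we will modify that construction by replacing Chebyshev's inequality, which is too conservative here, by three applications of Hoeffding's inequality [11] (Theorem 1): if $U_1, \dots, U_n$ are independent random variables taking values in the bounded interval $[a, b]$ such that their mean $\bar U_n = \frac{1}{n}\sum_{k=1}^n U_k$ has expectation $\nu = \mathbb{E}\bar U_n \in (a, b)$, then
$$P(\bar U_n - \nu \ge t) \le \left(\left(\frac{\nu-a}{\nu-a+t}\right)^{\frac{\nu-a+t}{b-a}}\left(\frac{b-\nu}{b-\nu-t}\right)^{\frac{b-\nu-t}{b-a}}\right)^n \tag{6}$$
for any $t \in (0, b-\nu)$. The bound on the right-hand side, denoted $\beta(t)$, is continuous and strictly decreasing in $t$ (as expected; see Appendix A) with $\beta(0) = 1$ and $\beta(t) \to \left(\frac{\nu-a}{b-a}\right)^n$ as $t \to b-\nu$. Hence, for any $\gamma \in \left(\left(\frac{\nu-a}{b-a}\right)^n, 1\right)$, there is a unique $t = t(\gamma, \nu, a, b) \in (0, b-\nu)$ with $\beta(t) = \gamma$. Moreover, $\nu + t(\gamma, \nu, a, b)$ is strictly increasing in $\nu$ (see Appendix A again), which is also to be expected. While there is no closed-form expression for $t(\gamma, \nu, a, b)$, it can be determined numerically without difficulty.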
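Since $t(\gamma, \nu, a, b)$ has no closed form, a bisection on $\log\beta$ is one straightforward way to compute it. The sketch below (Python; the names `log_beta` and `t_crit` are hypothetical) works on the logarithm of the bound in Equation (6) to avoid underflow for large $n$, and relies only on $\beta$ being continuous and strictly decreasing on $(0, b-\nu)$:

```python
import math

def log_beta(t, nu, a, b, n):
    """Logarithm of the Hoeffding bound beta(t) in Equation (6), for 0 < t < b - nu."""
    p, q, width = nu - a, b - nu, b - a
    return n * ((p + t) / width * math.log(p / (p + t))
                + (q - t) / width * math.log(q / (q - t)))

def t_crit(gamma, nu, a, b, n, iters=200):
    """Solve beta(t) = gamma for t in (0, b - nu) by bisection (beta is decreasing)."""
    log_lower = n * math.log((nu - a) / (b - a))   # log of the limit of beta as t -> b - nu
    if not log_lower < math.log(gamma) < 0:
        raise ValueError("gamma must lie in (((nu - a)/(b - a))^n, 1)")
    lo, hi = 0.0, b - nu
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_beta(mid, nu, a, b, n) > math.log(gamma):
            lo = mid          # bound still above gamma: the solution lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, alpha = 100, 0.05
s0 = t_crit(alpha / 4, 0.0, -1.0, 1.0, n)
print(s0)
```

For instance, `t_crit(alpha / 4, 0.0, -1.0, 1.0, n)` is the critical value $s_0 = t(\frac{\alpha}{4}, 0, -1, 1)$ of the test for $\mu = S^1$.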
Note that the estimate $\beta(t) \le \exp\left(-\frac{2nt^2}{(b-a)^2}\right)$ is often used and called Hoeffding's inequality [11]. While this would allow one to solve explicitly for $t$, we prefer to work with $\beta$ as it is sharper, especially for $\nu$ close to $b$ as well as for large $t$. Nonetheless, the estimate shows that the tail bound $\beta(t)$ tends to zero as fast as one would expect from the central limit theorem, which is why it is widely applied for bounded variables; see, e.g., [12].
Now, for any $\zeta \in S^1$, we will test the hypothesis that $\zeta$ is a circular population mean. This hypothesis is equivalent to saying that there is some $\lambda \in [0, 1]$ such that $\mathbb{E}Z = \lambda\zeta$. Multiplication by $\zeta^{-1}$ then rotates $\mathbb{E}Z$ onto the non-negative real axis: $\mathbb{E}(\zeta^{-1}Z) = \lambda$. Let $X_k = \operatorname{Re}(\zeta^{-1}Z_k)$ and $Y_k = \operatorname{Im}(\zeta^{-1}Z_k)$ for $k = 1, \dots, n$, which may be viewed as the projections of $Z_1, \dots, Z_n$ onto the line in the direction of $\zeta$ and onto the line perpendicular to it. Both are sequences of independent random variables taking values in $[-1, 1]$ with $\mathbb{E}X_k = \lambda$ and $\mathbb{E}Y_k = 0$ under the hypothesis. They thus fulfil the conditions for Hoeffding's inequality with $a = -1$, $b = 1$, and $\nu = \lambda$ or $\nu = 0$, respectively.
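To illustrate why the sharper $\beta$ is preferred, the following sketch (Python; helper names hypothetical, the values $n = 100$ and $\gamma = 0.0125$ chosen only for illustration) compares $\beta(t)$ with the simplified estimate $\exp(-2nt^2/(b-a)^2)$ and evaluates $\beta$ at the closed-form critical value obtained from the simplified estimate:

```python
import math

def log_beta(t, nu, a, b, n):
    """Logarithm of the sharp bound beta(t) from Equation (6), for 0 < t < b - nu."""
    p, q, width = nu - a, b - nu, b - a
    return n * ((p + t) / width * math.log(p / (p + t))
                + (q - t) / width * math.log(q / (q - t)))

def log_exp_bound(t, a, b, n):
    """Logarithm of the simplified estimate exp(-2 n t^2 / (b - a)^2)."""
    return -2 * n * t ** 2 / (b - a) ** 2

def t_explicit(gamma, a, b, n):
    """Closed-form critical value obtained from the simplified estimate."""
    return (b - a) * math.sqrt(math.log(1 / gamma) / (2 * n))

n, gamma = 100, 0.0125          # e.g. gamma = alpha/4 with alpha = 5%
a, b, nu = -1.0, 1.0, 0.0
t_simple = t_explicit(gamma, a, b, n)
for t in (0.05, 0.1, 0.2, t_simple):
    # sharp bound versus simplified estimate at the same t
    print(t, math.exp(log_beta(t, nu, a, b, n)), math.exp(log_exp_bound(t, a, b, n)))
```

Because $\beta$ already lies below $\gamma$ at the closed-form value, the critical value $t(\gamma, \nu, a, b)$ obtained from $\beta$ itself is smaller, i.e., less conservative.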
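We will first consider the case of non-uniqueness of the circular mean, i.e., $\mu = S^1$, or equivalently $\lambda = 0$. Then, the critical value $s_0 = t(\frac{\alpha}{4}, 0, -1, 1)$ is well-defined for any $\alpha$ with $\frac{\alpha}{4} > 2^{-n}$, and we get $P(\bar X_n \ge s_0) \le \frac{\alpha}{4}$ and also, by considering $-X_1, \dots, -X_n$, that $P(-\bar X_n \ge s_0) \le \frac{\alpha}{4}$. Analogously, $P(\bar Y_n \ge s_0) \le \frac{\alpha}{4}$ and $P(-\bar Y_n \ge s_0) \le \frac{\alpha}{4}$. Since $|\bar Z_n|^2 = \bar X_n^2 + \bar Y_n^2$, we conclude that
$$P\left(|\bar Z_n| \ge \sqrt{2}\,s_0\right) \le P(|\bar X_n| \ge s_0) + P(|\bar Y_n| \ge s_0) \le \alpha.$$
Rejecting the hypothesis $\mu = S^1$ whenever $|\bar Z_n| \ge \sqrt{2}\,s_0$ thus yields a test at level $\alpha$ (see Figure 1).
In the case of uniqueness of the circular mean, i.e., for the hypothesis $\lambda > 0$, we use the monotonicity of $\nu + t(\gamma, \nu, a, b)$ in $\nu$: applied to $-X_1, \dots, -X_n$, which have expectation $-\lambda < 0$, it gives $-\lambda + t(\frac{\alpha}{4}, -\lambda, -1, 1) < t(\frac{\alpha}{4}, 0, -1, 1) = s_0$, and we obtain
$$P(\bar X_n \le -s_0) \le \frac{\alpha}{4}$$
as well. For the direction perpendicular to the direction of $\zeta$ (see Figure 2), however, we may now work with $\frac{3}{8}\alpha$: for $s_p = t(\frac{3}{8}\alpha, 0, -1, 1)$, which is well-defined whenever $s_0$ is since $\frac{3}{8}\alpha > \frac{\alpha}{4}$, we get $P(\bar Y_n \ge s_p) \le \frac{3}{8}\alpha$ and $P(-\bar Y_n \ge s_p) \le \frac{3}{8}\alpha$. Rejecting $\zeta$ whenever $\bar X_n \le -s_0$ or $|\bar Y_n| \ge s_p$ therefore yields a test at level $\frac{\alpha}{4} + \frac{3}{4}\alpha = \alpha$ (see Figure 3). Define $C_H$ as all $\zeta$ which we could not reject, i.e., $C_H = S^1$ if $|\bar Z_n| < \sqrt{2}\,s_0$, and otherwise
$$C_H = \left\{\zeta \in S^1 : \operatorname{Re}(\zeta^{-1}\bar Z_n) > -s_0,\ |\operatorname{Im}(\zeta^{-1}\bar Z_n)| < s_p\right\} = \left\{\zeta \in S^1 : |\operatorname{Arg}(\hat\mu_n^{-1}\zeta)| < \delta_H\right\}, \qquad \delta_H = \arcsin\frac{s_p}{|\bar Z_n|};$$
the second representation holds because $s_p < s_0$ and $|\bar Z_n| \ge \sqrt{2}\,s_0$, so the first condition excludes the arc opposite to $\hat\mu_n$ on which $|\operatorname{Im}(\zeta^{-1}\bar Z_n)| < s_p$ also holds.
Then, we obtain the following result:
Proof. (i) holds by construction.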
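(ii) From the estimate $\beta(t) \le \exp(-nt^2/2)$ for $a = -1$, $b = 1$, we get $\frac{\alpha}{4} \le \exp(-ns_0^2/2)$ and $\frac{3}{8}\alpha \le \exp(-ns_p^2/2)$, implying that $s_0$ and $s_p$ are of order $n^{-1/2}$; the same holds stochastically for $\delta_H$ since $\bar Z_n \to \mathbb{E}Z$ a.s. Regarding the second statement of (iii), if $\mu$ is unique, consider $\zeta = -\mu$; then, $\tau = \mathbb{E}\bar X_n = -|\mathbb{E}Z| < 0$, and $\sqrt{2}\,s_0$ is eventually less than $-\tau/2$, while also $\alpha > 2^{-n+2}$ eventually. Since $C_H = S^1$ implies $\bar X_n \ge -|\bar Z_n| > -\sqrt{2}\,s_0$, the probability of obtaining the trivial confidence set $C_H = S^1$ is eventually bounded by $P(\bar X_n - \tau \ge -\tau/2) \le \exp(-n\tau^2/8)$, and thus goes to zero exponentially fast as $n$ tends to infinity.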
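Putting the pieces of this section together, a sketch of $C_H$ might look as follows (Python; `t0` and `hoeffding_set` are hypothetical names, and `None` encodes the trivial set $S^1$). It returns the centre $\hat\mu_n$ and the half-width $\delta_H = \arcsin(s_p/|\bar Z_n|)$ of the arc; the example again uses the balanced two-point sample of Simulation 1.

```python
import cmath
import math

def t0(gamma, n, iters=200):
    """t(gamma, 0, -1, 1): solve beta(t) = gamma (Equation (6) with nu = 0, [a, b] = [-1, 1])."""
    if not 2.0 ** (-n) < gamma < 1:
        raise ValueError("need 2^-n < gamma < 1")
    target = math.log(gamma)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        log_beta = -n * ((1 + mid) * math.log(1 + mid) + (1 - mid) * math.log(1 - mid)) / 2
        if log_beta > target:
            lo = mid      # bound still above gamma: critical value lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hoeffding_set(zs, alpha):
    """Return (mu_hat, delta_H) describing the arc C_H, or None when C_H = S^1."""
    n = len(zs)
    if alpha / 4 <= 2.0 ** (-n):
        return None                         # s_0 undefined: only the trivial set
    s0 = t0(alpha / 4, n)                   # critical value for testing mu = S^1
    sp = t0(3 * alpha / 8, n)               # critical value perpendicular to zeta
    zbar = sum(zs) / n
    if abs(zbar) < math.sqrt(2) * s0:
        return None                         # mu = S^1 not rejected
    return zbar / abs(zbar), math.asin(sp / abs(zbar))

plus = cmath.exp(1j * math.radians(10))
minus = cmath.exp(-1j * math.radians(10))
mu_hat, delta_H = hoeffding_set([plus, minus] * 200, alpha=0.05)
print(math.degrees(delta_H))
```

Consistent with (ii), the exponential estimate bounds $s_0$ by $\sqrt{2\ln(4/\alpha)/n}$, i.e., by about 0.296 for $\alpha = 5\%$ and $n = 100$.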

Estimating the Variance
From the central limit theorem for $\hat\mu_n$ in case of unique $\mu$, cf. Equation (4), we see that the asymptotic variance of $\hat\mu_n$ gets small if $|\mathbb{E}Z|$ is close to 1 (then $\mathbb{E}Z$ is close to the boundary $S^1$ of the unit disc, which is possible only if the distribution is very concentrated) or if the variance $\mathbb{E}(\operatorname{Im}(\mu^{-1}Z))^2$ in the direction perpendicular to $\mu$ is small (if the distribution were concentrated on $\pm\mu$, this variance would be zero and $\hat\mu_n$ would equal $\mu$ with large probability). While $\delta_H$ (with $|\bar Z_n|$ in the denominator of its sine) takes the former into account, the latter has not been exploited yet. To do so, we need to estimate $\mathbb{E}(\operatorname{Im}(\mu^{-1}Z))^2$.
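Consider $V_n = \frac{1}{n}\sum_{k=1}^n Y_k^2$, which, under the hypothesis that the corresponding $\zeta$ is the unique circular population mean, has expectation $\sigma^2 = \mathbb{E}(\operatorname{Im}(\zeta^{-1}Z))^2$. Then $1 - V_n = \frac{1}{n}\sum_{k=1}^n (1 - Y_k^2)$ is the mean of $n$ independent random variables taking values in $[0, 1]$ and having expectation $1 - \sigma^2$. By another application of Equation (6), we obtain $P\left(\sigma^2 \ge V_n + t(\frac{\alpha}{4}, 1-\sigma^2, 0, 1)\right) \le \frac{\alpha}{4}$. The largest $\sigma^2$ for which $\sigma^2 \le V_n + t(\frac{\alpha}{4}, 1-\sigma^2, 0, 1)$ holds makes this inequality an equality; we denote it by $\hat\sigma^2$, i.e., $\hat\sigma^2 = V_n + t(\frac{\alpha}{4}, 1-\hat\sigma^2, 0, 1)$. Inserting into Equation (6), it by construction fulfils
$$\left(\left(\frac{1-\hat\sigma^2}{1-V_n}\right)^{1-V_n}\left(\frac{\hat\sigma^2}{V_n}\right)^{V_n}\right)^n = \frac{\alpha}{4}. \tag{9}$$
It is easy to see that the left-hand side depends continuously on $\hat\sigma^2 \in [V_n, 1]$ and is strictly decreasing in it, thereby traversing the interval $[0, 1]$, so that one can again solve the equation numerically. We then may, with an error probability of at most $\frac{\alpha}{4}$, use $\hat\sigma^2$ as an upper bound for $\sigma^2$. Note that
The latter is fulfilled for any $V_n < 1$ since Equation (9) is equivalent to
With such an upper bound on its variance, we can now get a better estimate for $P(\bar Y_n > t)$. Indeed, one may use another inequality by Hoeffding [11] (Theorem 3): the mean $\bar W_n = \frac{1}{n}\sum_{k=1}^n W_k$ of a sequence $W_1, \dots, W_n$ of independent random variables taking values in $(-\infty, 1]$, each having zero expectation as well as variance $\rho^2$, fulfils
$$P(\bar W_n \ge w) \le \left(\left(1 + \frac{w}{\rho^2}\right)^{-\frac{\rho^2+w}{1+\rho^2}}(1-w)^{-\frac{1-w}{1+\rho^2}}\right)^n \tag{10}$$
for any $w \in (0, 1)$. Again, an elementary calculation (analogous to Lemma A1) shows that the right-hand side of Equation (10) is strictly decreasing in $w$, continuously ranging between 1 and $\left(\frac{\rho^2}{1+\rho^2}\right)^n$ as $w$ varies in $(0, 1)$, so that there exists a unique $w = w(\gamma, \rho^2)$ for which the right-hand side equals $\gamma$, provided $\gamma \in \left(\left(\frac{\rho^2}{1+\rho^2}\right)^n, 1\right)$. Moreover, the right-hand side increases with $\rho^2$ (as expected), so that $w(\gamma, \rho^2)$ is increasing in $\rho^2$, too (cf. Appendix A).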
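Therefore, under the hypothesis that the corresponding $\zeta$ is the unique circular population mean, setting $s_V = w(\frac{\alpha}{4}, \hat\sigma^2)$ gives $P(\bar Y_n \ge s_V,\ \sigma^2 \le \hat\sigma^2) \le P\left(\bar Y_n \ge w(\frac{\alpha}{4}, \sigma^2)\right) \le \frac{\alpha}{4}$, and analogously for $-\bar Y_n$. Since $\frac{\rho^2}{1+\rho^2}$ increases with $\rho^2$, in case $s_0$ exists we have $\left(\frac{\hat\sigma^2}{1+\hat\sigma^2}\right)^n \le 2^{-n} < \frac{\alpha}{4}$, which guarantees the existence of $s_V$.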
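Both equations can again be solved by bisection. The sketch below (Python; `sigma2_hat` and `w_crit` are hypothetical names; the values $n = 400$, $V_n = 0.03$ and $\gamma = 0.0125$ are arbitrary) computes $\hat\sigma^2$ from Equation (9) and then $w(\gamma, \hat\sigma^2)$ from Equation (10):

```python
import math

def log_eq9(s2, v, n):
    """Logarithm of the left-hand side of Equation (9), as a function of s2 in [v, 1)."""
    val = (1 - v) * math.log((1 - s2) / (1 - v))
    if v > 0:
        val += v * math.log(s2 / v)           # term is absent (0^0 = 1) when v = 0
    return n * val

def sigma2_hat(v, gamma, n, iters=200):
    """Upper variance bound: solve Equation (9) = gamma for s2 in (v, 1); needs 0 <= v < 1."""
    lo, hi = v, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_eq9(mid, v, n) > math.log(gamma):
            lo = mid                           # expression decreases in s2: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_eq10(w, rho2, n):
    """Logarithm of the right-hand side of Equation (10), for 0 < w < 1."""
    return -n * ((rho2 + w) * math.log(1 + w / rho2)
                 + (1 - w) * math.log(1 - w)) / (1 + rho2)

def w_crit(gamma, rho2, n, iters=200):
    """w(gamma, rho^2): the unique w in (0, 1) at which bound (10) equals gamma."""
    if not n * math.log(rho2 / (1 + rho2)) < math.log(gamma) < 0:
        raise ValueError("gamma must lie in ((rho2/(1+rho2))^n, 1)")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_eq10(mid, rho2, n) > math.log(gamma):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, gamma, v = 400, 0.0125, 0.03               # illustrative values only
s2 = sigma2_hat(v, gamma, n)
print(s2, w_crit(gamma, s2, n))
```

Setting $\rho^2 = 1$ in `w_crit` reproduces $t(\gamma, 0, -1, 1)$, since the bounds in Equations (6) and (10) coincide in that case.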
Following the construction for $C_H$ from Section 2, we can again obtain a confidence set for $\mu$ with coverage probability at least $1-\alpha$, as shown in our previous article [13]. In practice, however, this confidence set is hard to calculate since $\hat\sigma^2 = \hat\sigma^2(\zeta)$ has to be calculated for every $\zeta \in S^1$. Though these confidence sets can be approximated by using a grid as in [13], we suggest using a simultaneous upper bound for the variance of $\operatorname{Im}(\zeta^{-1}Z_k)$.
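We obtain a (conservative) connected, symmetric confidence set $C_V \subseteq C_H$ by testing $\zeta \in C_H$ with $\hat\sigma^2_{\max} = \sup_{\zeta \in C_H}\hat\sigma^2(\zeta)$ as a common upper bound for the variance perpendicular to any $\zeta \in C_H$. Note that $\hat\sigma^2_{\max}$ can be obtained as the solution of Equation (9) with $V_n$ replaced by $\tilde V_n = \sup_{\zeta \in C_H}\frac{1}{n}\sum_{k=1}^n \left(\operatorname{Im}(\zeta^{-1}Z_k)\right)^2$. Furthermore, we can shorten $C_V$ by iteratively redefining $\tilde V_n$ as the corresponding supremum over the current $C_V$ and recalculating $C_V$ (see Algorithm 1). The resulting opening angle will be denoted by $\delta_V = \arcsin\frac{s_V}{|\bar Z_n|}$.
Algorithm 1: Computation of $C_V$.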
Data: observations $Z_1, \dots, Z_n \in S^1$; significance level $\alpha$; stop criterion $\varepsilon$.
Result: a non-asymptotic confidence set $C_V$ for the circular population mean.
Proposition 2.
(i) The set $C_V$ resulting from Algorithm 1 is a $(1-\alpha)$-confidence set for the circular population mean set. In particular, if $\mathbb{E}Z = 0$, i.e., the circular population mean set equals $S^1$, then $|\bar Z_n| > \sqrt{2}\,s_0$ with probability at most $\alpha$, so indeed $C_V = S^1$ with probability at least $1-\alpha$.
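(ii) $s_V$ is of order $n^{-1/2}$.
(iii) If $\mathbb{E}Z \neq 0$, i.e., if the circular population mean is unique, then $\delta_V$ is of order $n^{-1/2}$ in probability, and the probability of obtaining a trivial confidence set, i.e., $P(C_V = S^1)$, goes to zero exponentially fast as $n$ tends to infinity.
(iv) If $\mathbb{E}Z \neq 0$, then $\delta_V/\delta_A \to \sqrt{2\ln(4/\alpha)}/q_{1-\alpha/2}$ in probability, where $q_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard normal distribution $N(0, 1)$.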
Proof. Again, (i) follows by construction, while (iii) is shown as in Proposition 1.
For (ii), note that $s_V \le s_0$ since the bound in Equation (10) for $\rho^2 = 1$ agrees with the bound in Equation (6) for $a = -1$, $b = 1$ and $\nu = 0$, while $w(\gamma, \rho^2)$ is increasing in $\rho^2$ and $\hat\sigma^2_{\max} \le 1$; thus $s_V$ and $\delta_V$ are at most of the order $n^{-1/2}$. For (iv), we will use the estimate in Equation (11). Recall that $\ln(1+x) = x - \frac{x^2}{2} + o(x^2)$; therefore, for large $n$ and hence small $s_V$ a.s.
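The individual steps of Algorithm 1 are not spelled out above. The following Python sketch is one possible reading of the description in this section, not the authors' R implementation: starting from $C_H$, it repeatedly computes $\tilde V_n$ over the current arc, $\hat\sigma^2_{\max}$ from Equation (9), $s_V$ from Equation (10) and $\delta_V = \arcsin(s_V/|\bar Z_n|)$ until the angle changes by less than $\varepsilon$. The levels $\alpha/4$ used for Equations (9) and (10), and the closed-form supremum of $V_n$ over an arc based on $\operatorname{Im}(e^{-i\psi}Z_k)^2 = \frac{1}{2} - \frac{1}{2}\operatorname{Re}(e^{-2i\psi}Z_k^2)$, are assumptions of this sketch; all function names are hypothetical.

```python
import cmath
import math

def bisect_decreasing(f, lo, hi, target, iters=200):
    """Solve f(x) = target on (lo, hi) for a continuous, strictly decreasing f."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t0(gamma, n):
    """t(gamma, 0, -1, 1) from Equation (6)."""
    f = lambda t: -n * ((1 + t) * math.log(1 + t) + (1 - t) * math.log(1 - t)) / 2
    return bisect_decreasing(f, 0.0, 1.0, math.log(gamma))

def sigma2_hat(v, gamma, n):
    """Solution of Equation (9) in (v, 1)."""
    def f(s2):
        val = (1 - v) * math.log((1 - s2) / (1 - v))
        return n * (val + (v * math.log(s2 / v) if v > 0 else 0.0))
    return bisect_decreasing(f, v, 1.0, math.log(gamma))

def w_crit(gamma, rho2, n):
    """w(gamma, rho^2) from Equation (10)."""
    f = lambda w: -n * ((rho2 + w) * math.log(1 + w / rho2)
                        + (1 - w) * math.log(1 - w)) / (1 + rho2)
    return bisect_decreasing(f, 0.0, 1.0, math.log(gamma))

def sup_v(zs, center, half_width):
    """Exact sup over |psi - center| <= half_width of mean Im(e^{-i psi} Z_k)^2."""
    m2 = sum(z * z for z in zs) / len(zs)            # second trigonometric moment
    r2, phase2 = abs(m2), cmath.phase(m2)
    v_at = lambda psi: 0.5 - 0.5 * r2 * math.cos(2 * psi - phase2)
    best = max(v_at(center - half_width), v_at(center + half_width))
    for k in range(-3, 4):                           # interior maxima where cos(...) = -1
        psi = (phase2 + math.pi) / 2 + k * math.pi
        if abs(psi - center) <= half_width:
            best = 0.5 + 0.5 * r2
    return best

def c_v_sketch(zs, alpha, eps=1e-10, max_iter=100):
    """Hypothetical reading of Algorithm 1: (mu_hat, delta_V), or None for C_V = S^1."""
    n = len(zs)
    if alpha / 4 <= 2.0 ** (-n):
        return None                                  # s_0 undefined
    s0, sp = t0(alpha / 4, n), t0(3 * alpha / 8, n)
    zbar = sum(zs) / n
    if abs(zbar) < math.sqrt(2) * s0:
        return None                                  # mu = S^1 not rejected
    mu_hat = zbar / abs(zbar)
    center = cmath.phase(mu_hat)
    delta = math.asin(sp / abs(zbar))                # start from C_H
    for _ in range(max_iter):
        s2_max = sigma2_hat(sup_v(zs, center, delta), alpha / 4, n)
        s_v = w_crit(alpha / 4, s2_max, n)           # assumed level alpha/4
        new_delta = min(delta, math.asin(s_v / abs(zbar)))   # keep C_V inside C_H
        if delta - new_delta < eps:
            return mu_hat, new_delta
        delta = new_delta
    return mu_hat, delta

plus = cmath.exp(1j * math.radians(10))
minus = cmath.exp(-1j * math.radians(10))
mu_hat, delta_V = c_v_sketch([plus, minus] * 200, alpha=0.05)
print(math.degrees(delta_V))
```

For the balanced two-point sample of Simulation 1 with $n = 400$, this sketch yields a $\delta_V$ of a few degrees, considerably smaller than $\delta_H$.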

Simulation and Application to Real Data
We will compare the asymptotic confidence set $C_A$, the confidence set $C_H$ constructed directly using Hoeffding's inequality in Section 2, and the confidence set $C_V$ resulting from Algorithm 1 by reporting their corresponding opening angles $\delta_A$, $\delta_H$, and $\delta_V$ in degrees (°) as well as their coverage frequencies in simulations.
All computations have been performed using our own code based on the software package R (version 2.15.3) [14].

Simulation 1: Two Points of Equal Mass at ±10°
First, we consider a rather favourable situation: $n = 400$ independent draws from the distribution with $P(Z = \exp(10\pi i/180)) = P(Z = \exp(-10\pi i/180)) = \frac{1}{2}$. Then, we have $|\mathbb{E}Z| = \mathbb{E}Z = \cos(10\pi/180) \approx 0.985$, implying that the data are highly concentrated, $\mu = 1$ is unique, and the variance of $Z$ in the direction of $\mu$ is 0; there is only variation perpendicular to $\mu$, i.e., in the direction of the imaginary axis (see Figure 4).
Table 1. Results for simulation 1 (two points of equal mass at ±10°) based on 10,000 repetitions with $n = 400$ observations each: average observed $\delta_H$, $\delta_V$, and $\delta_A$ (with corresponding standard deviation), as well as frequency (with corresponding standard error) with which $\mu = 1$ was covered by $C_H$, $C_V$, and $C_A$, respectively; the nominal coverage probability was $1-\alpha = 95\%$.
Table 1 shows the results based on 10,000 repetitions for a nominal coverage probability of $1-\alpha = 95\%$: the average $\delta_H$ is about 3.5 times larger than $\delta_V$, which is about twice as large as $\delta_A$. As expected, the asymptotics are rather precise in this situation: $C_A$ covered the true mean in about 95% of the cases, which implies that the other confidence sets are quite conservative; indeed, $C_H$ and $C_V$ covered the true mean in all repetitions. One may also note that the angles varied only a little between repetitions.
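A reduced version of this experiment for $C_A$ alone can be sketched as follows (Python; 1,000 instead of 10,000 repetitions, an arbitrary seed, and a hypothetical function name); it draws $n = 400$ points from the two-point distribution and records how often $\mu = 1$ lies in $C_A$:

```python
import cmath
import math
import random
from statistics import NormalDist

def asymptotic_covers(zs, mu, alpha):
    """True if mu lies in C_A = {zeta : |Arg(mu_hat^{-1} zeta)| <= delta_A}; assumes zbar != 0."""
    n = len(zs)
    zbar = sum(zs) / n
    mu_hat = zbar / abs(zbar)
    var_perp = sum((z / mu_hat).imag ** 2 for z in zs) / n
    q = NormalDist().inv_cdf(1 - alpha / 2)
    delta_A = q * math.sqrt(var_perp) / (math.sqrt(n) * abs(zbar))
    return abs(cmath.phase(mu / mu_hat)) <= delta_A

rng = random.Random(2016)                    # fixed seed for reproducibility
points = [cmath.exp(1j * math.radians(10)), cmath.exp(-1j * math.radians(10))]
n, reps, alpha = 400, 1000, 0.05
hits = sum(asymptotic_covers([rng.choice(points) for _ in range(n)], 1 + 0j, alpha)
           for _ in range(reps))
coverage = hits / reps
print(coverage)
```

With this number of repetitions, the Monte Carlo standard error of the estimated coverage is roughly 0.7 percentage points.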

Simulation 2: Three Points Placed Asymmetrically
Secondly, we consider a situation which has been designed to show that even a considerably large sample size ($n = 100$) does not guarantee approximate coverage for the asymptotic confidence set $C_A$: the distribution of $Z$ is concentrated on three points $\xi_j = \exp(\theta_j\pi i/180)$, $j = 1, 2, 3$, with weights $\omega_j = P(Z = \xi_j)$ chosen such that $\mathbb{E}Z = |\mathbb{E}Z| = 0.9$ (implying a small variance and $\mu = 1$), $\omega_1 = 1\%$ and $\operatorname{Arg}\xi_1 > 0$, while $\operatorname{Arg}\xi_2, \operatorname{Arg}\xi_3 < 0$. In numbers, $\theta_1 \approx 25.8$, $\theta_2 \approx -0.3$, and $\theta_3 \approx -179.7$ (in degrees), while $\omega_2 \approx 94\%$ and $\omega_3 \approx 5\%$ (see Figure 5). The results based on 10,000 repetitions are shown in Table 2, where a nominal coverage probability of $1-\alpha = 90\%$ was prescribed. Clearly, $C_A$, with its coverage probability of less than 64%, performs quite poorly, while the others are conservative; $\delta_V \approx 5°$ still appears small enough to be useful in practice, though.
Fisher [3] (Example 4.4) describes a data set of the directions 100 ants took in response to an illuminated target placed at 180°, for which it may be of interest to know whether the ants indeed (on average) move towards that target (see [15] for the original publication). The data set is available as Ants_radians within the R package CircNNTSR [16].
The circular sample mean for this data set is about −176.9°; for a nominal coverage probability of $1-\alpha = 95\%$, one gets $\delta_H \approx 27.3°$, $\delta_V \approx 20.5°$, and $\delta_A \approx 9.6°$, so that all confidence sets contain ±180° (see Figure 6). The concentration of the data set is not very high, however, so the circular population mean could, according to $C_V$, also be −156.4° or 162.6°.
Figure 6. Ant data placed at increasing radii to visually resolve ties; in addition, the circular mean direction as well as the confidence sets $C_H$, $C_V$, and $C_A$ are shown.
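The arc end points quoted above follow from simple angle arithmetic modulo 360°. A small sketch (Python; the helper names are hypothetical) using the reported values:

```python
def wrap_degrees(angle):
    """Map an angle in degrees to the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped

def in_arc(center, half_width, angle):
    """True if `angle` lies within `half_width` degrees of `center` on the circle."""
    return abs(wrap_degrees(angle - center)) <= half_width

mean_direction = -176.9                      # reported circular sample mean (degrees)
half_widths = {"C_H": 27.3, "C_V": 20.5, "C_A": 9.6}
for name, delta in half_widths.items():
    ends = (wrap_degrees(mean_direction - delta), wrap_degrees(mean_direction + delta))
    print(name, [round(e, 1) for e in ends], in_arc(mean_direction, delta, 180.0))
```

Wrapping to $(-180°, 180°]$ is what turns $-176.9° - 20.5° = -197.4°$ into $162.6°$.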

Discussion
We have derived two confidence sets, $C_H$ and $C_V$, for the set of circular population means. Both guarantee coverage for any finite sample size without making any assumptions on the distribution of the data (besides that they are independent and identically distributed) at the cost of potentially being quite conservative; they are non-asymptotic and universal in this sense. Judging from the simulations and the real data set, $C_V$, which estimates the variance perpendicular to the mean direction, appears to be preferable over $C_H$ (as expected) and small enough to be useful in practice.
While the asymptotic confidence set's opening angle is less than half (asymptotically about 2/3 for $\alpha = 5\%$) of the one for $C_V$ in our simulations and application, it has the drawback that even for a sample size of $n = 100$, it may fail to give a coverage probability close to the nominal one; in addition, one has to assume that the circular population mean is unique. Of course, one could also devise an asymptotically justified test for the latter, but this would entail a correction for multiple testing (for example, working with $\alpha/2$ each time), which would also render the asymptotic confidence set conservative.
Further improvements would require sharper "universal" mass concentration inequalities taking the first or the first two moments into account; however, this is beyond the scope of this article.
Lemma A4. Let $w = w(\gamma, \rho^2)$ be the solution of the equation
$$\left(\left(1 + \frac{w}{\rho^2}\right)^{-\frac{\rho^2+w}{1+\rho^2}}(1-w)^{-\frac{1-w}{1+\rho^2}}\right)^n = \gamma.$$
Then, $w$ is increasing in $\rho^2$.
Proof.w is the solution of the equation The derivatives of the left-hand side of Equation (A2) w.r.t.ρ 2 and w exist and are continuous.Furthermore, the derivative w.r.t.w does not vanish for any w ∈ (0, 1): this derivative is

Figure 1. The construction for the test of the hypothesis $\mu = S^1$, or equivalently $\mathbb{E}Z = 0$.

Figure 2. The construction for the test of the hypothesis $\mathbb{E}Z = \lambda\zeta$ with $\lambda > 0$.

Figure 3. The critical $\bar Z_n$ regarding the rejection of $\zeta$; $\delta_H$ bounds the angle between $\hat\mu_n$ and any accepted $\zeta$.

Figure 4. Two points of equal mass at ±10° and their Euclidean mean.

Figure 5. Three points placed asymmetrically with different masses and their Euclidean mean.

Table 2. Results for simulation 2 (three points placed asymmetrically) based on 10,000 repetitions with $n = 100$ observations each: average observed $\delta_H$, $\delta_V$, and $\delta_A$ (with corresponding standard deviation), as well as frequency (with corresponding standard error) with which $\mu = 1$ was covered by $C_H$, $C_V$, and $C_A$, respectively; the nominal coverage probability was $1-\alpha = 90\%$.