1. Introduction
Directional data refers to multivariate data with a unit norm, whose sample space can be expressed as $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$, where $\|\cdot\|$ denotes the Euclidean norm. Circular data, when $d = 2$, lie on a circle. Circular data are encountered in various disciplines, such as political sciences [1], criminology [2], biology [3], ecology [4], and astronomy [5], among others.
A large class of distributions has been proposed; see [6] for a short list. A classic distribution is the wrapped Cauchy (WC) distribution [7], which [6] showed to be a special case of the generalized circular projected Cauchy (GCPC) distribution. The GCPC generalizes the WC by introducing an extra parameter that allows for anisotropy. The benefit of the GCPC distribution is that it provides a better fit to asymmetric and bimodal data.
In this paper we focus on the GCPC distribution. Specifically, we derive its relationship with the WC distribution. We examine the conditions for unimodality and then provide an alternative formula for the cumulative probability function. We derive non-closed-form expressions for its mean resultant length and its Kullback–Leibler divergence (KLD) from the WC distribution, and an analytical formula for its entropy. We further propose two log-likelihood ratio tests, one for the location parameter of a single sample and one for the equality of two location parameters, without assuming equal concentration parameters. We revisit maximum likelihood estimation (MLE) and regression modeling. For the MLE we identify a problem that may trap the log-likelihood in a local maximum and show how to escape it and reach the global maximum. We further correct a mistake in [6] and empirically examine the convergence rate of the regression coefficients. A real data example, with and without predictor variables, illustrates the superior performance of the GCPC distribution compared to the WC distribution.
The next section briefly presents and examines the GCPC distribution. Simulation studies and the real data example follow, with conclusions closing the paper.
2. The GCPC Distribution
Suppose a $d$-dimensional random variable $X$ follows some multivariate distribution defined over $\mathbb{R}^d$ and we project it onto the circle/sphere/hyper-sphere, $Y = X/r$, where $r = \|X\|$. The marginal distribution of $Y$, which is of interest, is obtained by integrating out $r$ over the positive line
$$f(y) = \int_0^\infty r^{d-1} f_X(r y)\, dr. \tag{1}$$
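The probability density function of the bivariate Cauchy distribution, with some location vector $\mu$ and scatter matrix $\Sigma$, is given by
$$f(x) = \frac{1}{2\pi |\Sigma|^{1/2}} \left[1 + (x-\mu)^\top \Sigma^{-1} (x-\mu)\right]^{-3/2}. \tag{2}$$
By substituting (2) into (1) and evaluating the integral, ref. [6] derived the circular projected Cauchy (CPC) distribution
$$f(y) = \frac{1}{2\pi |\Sigma|^{1/2} \left(A\sqrt{\Gamma^2+1} - B\sqrt{A}\right)}, \tag{3}$$
where $A = y^\top \Sigma^{-1} y$, $B = y^\top \Sigma^{-1} \mu$ and $\Gamma^2 = \mu^\top \Sigma^{-1} \mu$.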
It is important to note that
, while
.
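Ref. [6] employed one of the conditions imposed in [8], that is, $\Sigma\mu = \mu$, but not $|\Sigma| = 1$. This condition implies that one eigenvector, $\xi_1$, of $\Sigma$ is the normalized location vector $\mu/\gamma$, where $\gamma = \|\mu\|$, while the other eigenvector can be defined up to sign as $\xi_2 = (-\mu_2, \mu_1)^\top/\gamma$ or $\xi_2 = (\mu_2, -\mu_1)^\top/\gamma$. The eigenvalue corresponding to the location vector is equal to 1, while the other eigenvalue is equal to $\lambda$; hence, $|\Sigma| = \lambda$, and the inverse of the scatter matrix is given by
$$\Sigma^{-1} = \xi_1 \xi_1^\top + \frac{1}{\lambda}\, \xi_2 \xi_2^\top. \tag{4}$$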
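Thus, (3) becomes
$$f(y) = \frac{1}{2\pi\sqrt{\lambda}\left(y^\top \Sigma^{-1} y\, \sqrt{\gamma^2+1} - y^\top \mu\, \sqrt{y^\top \Sigma^{-1} y}\right)}. \tag{5}$$
Utilizing (4) and after some calculations, with $y = (\cos\theta, \sin\theta)^\top$ and $\mu = \gamma(\cos\omega, \sin\omega)^\top$, the density in (5) may also be expressed in polar coordinates by
$$f(\theta) = \frac{1}{2\pi\sqrt{\lambda}\left[a(\theta)\sqrt{\gamma^2+1} - \gamma\cos(\theta-\omega)\sqrt{a(\theta)}\right]}, \tag{6}$$
where $a(\theta) = \cos^2(\theta-\omega) + \sin^2(\theta-\omega)/\lambda$, with $\omega \in [0, 2\pi)$, $\gamma \geq 0$ and $\lambda > 0$.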
We refer to the eigenvalue $\lambda$ of the scatter matrix $\Sigma$ as the anisotropy parameter; the reason is explained in the next subsection. The GCPC distribution exhibits reflective symmetry with respect to
only if
, but its density function is even since
. The maximum value of the density occurs when
and its value is
. The density of the GCPC may also be written as
where
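It is important to note that, if the anisotropy parameter $\lambda = 1$, the GCPC distribution reduces to the circular independent projected Cauchy (CIPC) distribution [6]
$$f(\theta) = \frac{1}{2\pi\left(\sqrt{\gamma^2+1} - \gamma\cos(\theta-\omega)\right)}, \tag{8}$$
which is the WC distribution ([9], p. 51) with a different parameterization
$$f(\theta) = \frac{1-\rho^2}{2\pi\left(1+\rho^2-2\rho\cos(\theta-\omega)\right)}, \tag{9}$$
where $\gamma = 2\rho/(1-\rho^2)$ or, conversely, $\rho = (\sqrt{\gamma^2+1}-1)/\gamma$ [6].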
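The reparameterization between the CIPC and WC forms can be checked numerically. The sketch below assumes the CIPC density $1/\{2\pi(\sqrt{\gamma^2+1} - \gamma\cos(\theta-\omega))\}$ and the usual WC density with concentration $\rho$; the function names are illustrative and not taken from any package.

```python
import math

def rho_from_gamma(gamma):
    # rho = (sqrt(gamma^2 + 1) - 1) / gamma, with rho = 0 in the limit gamma = 0
    if gamma == 0.0:
        return 0.0
    return (math.sqrt(gamma * gamma + 1.0) - 1.0) / gamma

def gamma_from_rho(rho):
    # Inverse map: gamma = 2 rho / (1 - rho^2), valid for 0 <= rho < 1
    return 2.0 * rho / (1.0 - rho * rho)

def cipc_pdf(theta, omega, gamma):
    # Assumed CIPC density in the (omega, gamma) parameterization
    c = math.sqrt(gamma * gamma + 1.0)
    return 1.0 / (2.0 * math.pi * (c - gamma * math.cos(theta - omega)))

def wc_pdf(theta, omega, rho):
    # Wrapped Cauchy density in the (omega, rho) parameterization
    num = 1.0 - rho * rho
    den = 2.0 * math.pi * (1.0 + rho * rho - 2.0 * rho * math.cos(theta - omega))
    return num / den

gamma = 1.5
rho = rho_from_gamma(gamma)
grid = [2.0 * math.pi * k / 360 for k in range(360)]
# Largest pointwise difference between the two densities on a 1-degree grid
max_gap = max(abs(cipc_pdf(t, 0.7, gamma) - wc_pdf(t, 0.7, rho)) for t in grid)
```

If the two parameterizations agree, `max_gap` is at the level of floating-point rounding and `gamma_from_rho(rho)` returns the original `gamma`.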
As the name suggests, $\lambda$ controls the anisotropy of the GCPC distribution: if $\lambda = 1$, we end up with an isotropic scatter matrix, $\Sigma = I_2$.
Figure 1 presents the density plots of GCPC
with
,
and
. For small values of $\lambda$ the distribution is bimodal, and this is more prevalent with small values of $\gamma$. As the $\lambda$ parameter increases, the bimodality vanishes. The unimodality conditions are discussed in Section 2.2.
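The behaviour described above can be explored with a few lines of code. This is a minimal sketch that assumes the polar form of the GCPC density is $f(\theta) = [2\pi\sqrt{\lambda}\{a\sqrt{\gamma^2+1} - \gamma\cos(\theta-\omega)\sqrt{a}\}]^{-1}$ with $a = \cos^2(\theta-\omega) + \sin^2(\theta-\omega)/\lambda$; the helper names `gcpc_pdf` and `integrate_circle` are hypothetical.

```python
import math

def gcpc_pdf(theta, omega, gamma, lam):
    """Assumed polar-form GCPC density (omega: location, gamma >= 0, lam > 0)."""
    d = theta - omega
    a = math.cos(d) ** 2 + math.sin(d) ** 2 / lam  # quadratic form y' Sigma^{-1} y
    c = math.sqrt(gamma ** 2 + 1.0)
    return 1.0 / (2.0 * math.pi * math.sqrt(lam)
                  * (a * c - gamma * math.cos(d) * math.sqrt(a)))

def integrate_circle(f, n=20000):
    """Midpoint rule over [0, 2*pi); well suited to smooth periodic integrands."""
    h = 2.0 * math.pi / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

omega, gamma = 1.0, 1.0
# The density should integrate to one for any admissible parameter values.
total_small = integrate_circle(lambda t: gcpc_pdf(t, omega, gamma, 0.1))
total_iso = integrate_circle(lambda t: gcpc_pdf(t, omega, gamma, 1.0))
# With lam = 0.1 the antipode omega + pi is higher than a nearby point,
# which is the signature of a second mode.
antipode = gcpc_pdf(omega + math.pi, omega, gamma, 0.1)
neighbour = gcpc_pdf(omega + math.pi + 0.1, omega, gamma, 0.1)
```

Plotting `gcpc_pdf` over a grid for several values of `lam` and `gamma` gives a quick visual check of when the second mode appears.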
2.1. Relationship with the CIPC Distribution
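Theorem 1. If $\phi$ follows the GCPC distribution, $\phi \sim \mathrm{GCPC}(\omega, \gamma, \lambda)$, then $\psi = \omega + \mathrm{atan2}\{\sin(\phi-\omega)/\sqrt{\lambda}, \cos(\phi-\omega)\}$ (taken modulo $2\pi$) follows the CIPC distribution, $\psi \sim \mathrm{CIPC}(\omega, \gamma)$, where $\mathrm{atan2}(y, x)$ is the two-argument arc-tangent, that is, the angle of the point $(x, y)$.
Proof. The proof is straightforward by application of the change-of-variables formula. Without loss of generality, set $\omega = 0$. Under the transformation $\psi = \mathrm{atan2}(\sin\phi/\sqrt{\lambda}, \cos\phi)$ we can see that $\cos\psi = \cos\phi/\sqrt{a(\phi)}$, where $a(\phi) = \cos^2\phi + \sin^2\phi/\lambda$. Differentiating $\tan\psi = \tan\phi/\sqrt{\lambda}$ gives $d\psi/d\phi = 1/\{\sqrt{\lambda}\, a(\phi)\}$. Applying the change-of-variables formula to $f_\psi(\psi) = f_\phi(\phi)\, |d\phi/d\psi|$, with $f_\phi(\phi) = [2\pi\sqrt{\lambda}\{a(\phi)\sqrt{\gamma^2+1} - \gamma\cos\phi\sqrt{a(\phi)}\}]^{-1}$ and $|d\phi/d\psi| = \sqrt{\lambda}\, a(\phi)$, we obtain
$$f_\psi(\psi) = \frac{a(\phi)}{2\pi\left\{a(\phi)\sqrt{\gamma^2+1} - \gamma\cos\phi\sqrt{a(\phi)}\right\}} = \frac{1}{2\pi\left(\sqrt{\gamma^2+1} - \gamma\cos\psi\right)},$$
where in the last step we used $\cos\phi/\sqrt{a(\phi)} = \cos\psi$. This is the $\mathrm{CIPC}(0, \gamma)$ density. □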
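Based on this we can define the opposite transformation as follows:
Lemma 1. If $\psi$ follows the CIPC distribution, $\psi \sim \mathrm{CIPC}(\omega, \gamma)$, then $\phi = \omega + \mathrm{atan2}\{\sqrt{\lambda}\sin(\psi-\omega), \cos(\psi-\omega)\}$ (taken modulo $2\pi$) follows the GCPC distribution, $\phi \sim \mathrm{GCPC}(\omega, \gamma, \lambda)$.
Proof. The proof is again straightforward and hence omitted. □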
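The two transformations also suggest a simple way to simulate GCPC angles. The sketch below assumes the maps $\psi = \omega + \mathrm{atan2}\{\sin(\phi-\omega)/\sqrt{\lambda}, \cos(\phi-\omega)\}$ and $\phi = \omega + \mathrm{atan2}\{\sqrt{\lambda}\sin(\psi-\omega), \cos(\psi-\omega)\}$, and draws CIPC angles by projecting a bivariate Cauchy vector with identity scatter onto the circle. All function names are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def to_cipc(phi, omega, lam):
    """Map a GCPC angle to a CIPC angle (assumed form of the forward map)."""
    d = phi - omega
    return (omega + math.atan2(math.sin(d) / math.sqrt(lam), math.cos(d))) % TWO_PI

def to_gcpc(psi, omega, lam):
    """Map a CIPC angle to a GCPC angle (assumed form of the inverse map)."""
    d = psi - omega
    return (omega + math.atan2(math.sqrt(lam) * math.sin(d), math.cos(d))) % TWO_PI

def sample_gcpc(n, omega, gamma, lam, seed=1):
    """Draw n GCPC angles: project a bivariate Cauchy, then apply to_gcpc."""
    rng = random.Random(seed)
    mu1, mu2 = gamma * math.cos(omega), gamma * math.sin(omega)
    out = []
    for _ in range(n):
        # Bivariate Cauchy = location + standard normal vector / |standard normal|
        u = abs(rng.gauss(0.0, 1.0)) or 1e-300  # guard against an exact zero
        x1 = mu1 + rng.gauss(0.0, 1.0) / u
        x2 = mu2 + rng.gauss(0.0, 1.0) / u
        psi = math.atan2(x2, x1)                 # direction follows CIPC(omega, gamma)
        out.append(to_gcpc(psi, omega, lam))
    return out

phi = 2.3
round_trip = to_gcpc(to_cipc(phi, 0.7, 0.4), 0.7, 0.4)
draws = sample_gcpc(1000, 0.7, 1.0, 0.4)
```

A histogram of `draws` can be compared against the density to check the sampler visually; `round_trip` should reproduce `phi` up to rounding.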
2.2. Unimodality Conditions for the GCPC Distribution
Ref. [6] observed that the density function of GCPC (6) may be bimodal. They equated the derivative of the log-density to zero and obtained:
This yields two cases: either $\sin(\theta-\omega) = 0$ or the expression in brackets equals zero.
Case 1:
.
Case 2: The term inside the brackets equals zero. Let
:
Letting
and multiplying both sides with
yields
where
. Hence,
Finally,
Notes.
If , the quadratic degenerates and requires .
The square root term in expands as . To ensure that this is non-negative two conditions apply:
- –
If , then .
- –
If , then .
If
, the term inside the brackets in (
11) vanishes and we end up with
Case 1 (unimodality of the distribution).
If , then the following four cases apply:
- –
Case A: . The distribution is always unimodal since gives complex roots, with no further conditions needed.
- –
Case B: . The distribution is unimodal when
, i.e., when:
- –
Case C: . The distribution is unimodal when
, i.e., when:
- –
Case D: . Here always, so the distribution is never unimodal for regardless of ; bimodal critical (stationary) points always exist.
The conditions stated in Cases A and D are straightforward for the bimodality. Cases B and C state that, if falls outside the admissible region, the distribution is unimodal, where and . In those two cases, examination of bimodality requires some extra computations.
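When the analytical conditions are inconvenient to evaluate, the number of modes can be checked numerically. The following sketch counts strict local maxima of the density on a fine grid; it assumes the polar GCPC density $[2\pi\sqrt{\lambda}\{a\sqrt{\gamma^2+1} - \gamma\cos(\theta-\omega)\sqrt{a}\}]^{-1}$ with $a = \cos^2(\theta-\omega) + \sin^2(\theta-\omega)/\lambda$, and the function names are illustrative. A grid search is a heuristic check, not a substitute for the conditions above.

```python
import math

def gcpc_pdf(theta, omega, gamma, lam):
    """Assumed polar-form GCPC density."""
    d = theta - omega
    a = math.cos(d) ** 2 + math.sin(d) ** 2 / lam
    c = math.sqrt(gamma ** 2 + 1.0)
    return 1.0 / (2.0 * math.pi * math.sqrt(lam)
                  * (a * c - gamma * math.cos(d) * math.sqrt(a)))

def count_modes(omega, gamma, lam, n=3600):
    """Count grid points that are strictly higher than both circular neighbours."""
    vals = [gcpc_pdf(2.0 * math.pi * k / n, omega, gamma, lam) for k in range(n)]
    return sum(
        1
        for k in range(n)
        if vals[k] > vals[k - 1] and vals[k] > vals[(k + 1) % n]
    )

modes_isotropic = count_modes(0.0, 1.0, 1.0)  # lam = 1 is the CIPC (WC) case
modes_small_lam = count_modes(0.0, 1.0, 0.1)  # strong anisotropy
```

Under these assumptions the isotropic case has a single mode, while `lam = 0.1` with `gamma = 1` has a second mode at the antipode of `omega`.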
2.3. Probabilities
Following [
6], instead of the cumulative probability function, we derive the probability included within a given interval
and provide an alternative formula.
Applying Theorem 1 the formula above becomes
where
.
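As an illustration of how Theorem 1 turns an interval probability into a closed-form expression, the sketch below maps the interval endpoints to the CIPC scale and uses the antiderivative $(1/\pi)\arctan\{(\sqrt{\gamma^2+1}+\gamma)\tan(\psi/2)\}$ of the centred CIPC density. This particular form is an assumption derived here, not quoted from the source, and it requires both endpoints to lie within $(\omega-\pi, \omega+\pi)$. The result is compared with brute-force numerical integration.

```python
import math

def gcpc_pdf(theta, omega, gamma, lam):
    """Assumed polar-form GCPC density."""
    d = theta - omega
    a = math.cos(d) ** 2 + math.sin(d) ** 2 / lam
    c = math.sqrt(gamma ** 2 + 1.0)
    return 1.0 / (2.0 * math.pi * math.sqrt(lam)
                  * (a * c - gamma * math.cos(d) * math.sqrt(a)))

def gcpc_interval_prob(t1, t2, omega, gamma, lam):
    """P(t1 <= theta <= t2), assuming omega - pi < t1 <= t2 < omega + pi."""
    c = math.sqrt(gamma ** 2 + 1.0)

    def centred_cdf(d):
        # Map the centred GCPC angle to the CIPC scale, then use the
        # closed-form antiderivative of the centred CIPC density.
        psi = math.atan2(math.sin(d) / math.sqrt(lam), math.cos(d))
        return math.atan((c + gamma) * math.tan(psi / 2.0)) / math.pi

    return centred_cdf(t2 - omega) - centred_cdf(t1 - omega)

def midpoint_integral(f, lo, hi, n=20000):
    """Plain midpoint rule, used only as an independent numerical check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

omega, gamma, lam = 1.0, 1.0, 0.5
closed = gcpc_interval_prob(0.2, 2.5, omega, gamma, lam)
numeric = midpoint_integral(lambda t: gcpc_pdf(t, omega, gamma, lam), 0.2, 2.5)
```

Intervals that cross the antipode $\omega + \pi$ can be handled by splitting them at that point and adding the two pieces.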
2.4. Mean Resultant Length
The mean resultant length is defined as $R = \sqrt{\{E(\cos\theta)\}^2 + \{E(\sin\theta)\}^2}$. Based on this, one may compute the circular variance as $1 - R$ and the circular standard deviation as $\sqrt{-2\log R}$.
Theorem 2. The mean resultant length of the GCPC distribution is given by
where is the complete elliptic integral of the third kind [10] (the integral can be computed numerically in R via the built-in command integrate() or via the package gsl [11]).
Proof. Applying the change-of-variables formula
from Theorem 1 so that
, we express
in terms of
:
Substituting and using
we obtain:
Since the integrand is even in
, doubling the integral over
and reducing via the substitution
yields
confirming the non-closed-form nature of
for
. □
There is no closed-form expression for
unless
, and this applies to higher trigonometric moments.
Figure 2a displays the values of $R$ for a grid of values of $\gamma$ and $\lambda$. We observe that $R$ increases with increasing $\gamma$ and decreases with increasing $\lambda$ values.
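Because the mean resultant length has no closed form in general, it is convenient to evaluate it by numerical quadrature. The sketch below assumes the polar GCPC density $[2\pi\sqrt{\lambda}\{a\sqrt{\gamma^2+1} - \gamma\cos\theta\sqrt{a}\}]^{-1}$ with $\omega = 0$ and $a = \cos^2\theta + \sin^2\theta/\lambda$, and uses the midpoint rule in place of R's integrate(); the function names are hypothetical.

```python
import math

def gcpc_pdf(theta, omega, gamma, lam):
    """Assumed polar-form GCPC density."""
    d = theta - omega
    a = math.cos(d) ** 2 + math.sin(d) ** 2 / lam
    c = math.sqrt(gamma ** 2 + 1.0)
    return 1.0 / (2.0 * math.pi * math.sqrt(lam)
                  * (a * c - gamma * math.cos(d) * math.sqrt(a)))

def mean_resultant_length(gamma, lam, n=20000):
    """R = sqrt(E[cos]^2 + E[sin]^2), by the midpoint rule with omega = 0."""
    h = 2.0 * math.pi / n
    c_sum = s_sum = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        w = gcpc_pdf(t, 0.0, gamma, lam) * h
        c_sum += math.cos(t) * w
        s_sum += math.sin(t) * w
    return math.hypot(c_sum, s_sum)

r_iso = mean_resultant_length(1.0, 1.0)         # should equal the WC value rho
rho = (math.sqrt(2.0) - 1.0) / 1.0              # rho for gamma = 1
r_more_gamma = mean_resultant_length(2.0, 1.0)
r_small_lam = mean_resultant_length(1.0, 0.5)
r_large_lam = mean_resultant_length(1.0, 2.0)
```

Evaluating the function over a grid of `gamma` and `lam` values gives the kind of surface summarized in the text.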
2.5. Entropy
Theorem 3. The entropy of the GCPC distribution is given by Proof. The entropy is defined as
. Taking the logarithm of (
7), we have:
. Using the change-of-variables formula from Theorem 1, we have
and
where
.
By changing variables and substituting the result into the entropy we obtain:
The last integral equals
, so
We know that the entropy of the WC distribution is $\log\{2\pi(1-\rho^2)\}$. Let us now write
with
. Then,
, where
Using the Fourier identity
we get
. Applying the same Fourier identity at frequency
, with
,
Using
and
we obtain
Substituting the above result into (
12) we obtain
□
Note that, if
,
, and, if
,
.
Figure 2b displays the values of $H$ for a grid of values of $\gamma$ and $\lambda$. We observe that, in contrast to $R$, $H$ increases with increasing $\lambda$ and decreases as $\gamma$ increases.
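The entropy can be cross-checked numerically. The sketch below compares $-\int f\log f$, computed by the midpoint rule, with a closed form obtained here by combining the change of variables of Theorem 1, the WC entropy $\log\{2\pi(1-\rho^2)\}$ and the WC moments $E\{\cos(2k\psi)\} = \rho^{2k}$: $H = \log\{2\pi(1-\rho^2)\} + \tfrac{1}{2}\log\lambda - 2\log[\{(1+\sqrt{\lambda}) + (1-\sqrt{\lambda})\rho^2\}/2]$. This expression is an assumption of the sketch; it is not copied from the source and may be written differently there.

```python
import math

def gcpc_pdf(theta, omega, gamma, lam):
    """Assumed polar-form GCPC density."""
    d = theta - omega
    a = math.cos(d) ** 2 + math.sin(d) ** 2 / lam
    c = math.sqrt(gamma ** 2 + 1.0)
    return 1.0 / (2.0 * math.pi * math.sqrt(lam)
                  * (a * c - gamma * math.cos(d) * math.sqrt(a)))

def entropy_numeric(gamma, lam, n=20000):
    """-integral of f log f over the circle, midpoint rule with omega = 0."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        f = gcpc_pdf((k + 0.5) * h, 0.0, gamma, lam)
        total -= f * math.log(f) * h
    return total

def entropy_closed(gamma, lam):
    """Closed form derived under the assumptions stated in the lead-in."""
    rho = 0.0 if gamma == 0.0 else (math.sqrt(gamma ** 2 + 1.0) - 1.0) / gamma
    s = math.sqrt(lam)
    return (math.log(2.0 * math.pi * (1.0 - rho ** 2))
            + 0.5 * math.log(lam)
            - 2.0 * math.log(((1.0 + s) + (1.0 - s) * rho ** 2) / 2.0))

h_num = entropy_numeric(1.0, 0.5)
h_closed = entropy_closed(1.0, 0.5)
h_wc = entropy_closed(1.0, 1.0)  # lam = 1 should give the WC entropy
```

Setting `lam = 1` recovers the WC entropy, which is a useful sanity check on any implementation.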
2.6. KLD Between GCPC and CIPC
Using Theorem 1 we derived a formula for the KLD between GCPC
and CIPC
, which admits no closed-form expression
where the expectation is taken with respect to
.
2.7. Maximum Likelihood Estimation
Ref. [6] performed maximum likelihood estimation (MLE) using the Euclidean representation of the GCPC density (5). We observed that this estimation approach is suboptimal and can yield a local instead of the global maximum when the distribution is bimodal; if the distribution is unimodal, their approach is valid. Despite their efforts to ensure the maximum via a clever implementation, we will show in the real data analysis that maximizing the log-likelihood parameterized using (5) does not lead to the global maximum, since the log-likelihood can have four local maxima when the distribution is bimodal (ref. [6] mentioned that the solution to Equation (11) has four roots). The safest option is to always employ the log-likelihood of the density in (6), i.e., the density expressed in its circular (univariate) form rather than its Euclidean representation. In our implementation, the initial values are obtained by maximizing the CIPC log-likelihood via the Newton–Raphson algorithm (alternatively, one could use the algorithm of [7], which is faster).
2.8. Hypothesis Testing and Confidence Intervals
2.8.1. Hypothesis Test for the Location Parameter of One Sample
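In order to test whether the location parameter equals some pre-specified value $\omega_0$ we employ the log-likelihood ratio test. Under $H_0$, $\omega = \omega_0$, so we need to maximize the restricted log-likelihood, $\ell_0$, with respect to the other two parameters, $\gamma$ and $\lambda$, and estimate their values, $\tilde{\gamma}$ and $\tilde{\lambda}$. Under $H_1$, $\omega \neq \omega_0$, and we maximize the unrestricted log-likelihood, $\ell_1$, to obtain $\hat{\omega}$, $\hat{\gamma}$ and $\hat{\lambda}$. Under regularity conditions, $\Lambda = 2(\ell_1 - \ell_0) \sim \chi^2_1$ asymptotically.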
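Once the two maximized log-likelihoods are available, the test reduces to a one-line computation. The sketch below assumes a chi-square reference distribution with one degree of freedom and uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, so no statistical library is needed; the function name and the numbers in the usage lines are made up for illustration.

```python
import math

def lrt_pvalue_1df(loglik_unrestricted, loglik_restricted):
    """Return (statistic, p-value) for a likelihood ratio test with 1 df."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimisation error
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Illustrative (made-up) log-likelihood values placing the statistic
# exactly at the 5% critical value 3.841458820694124.
stat, p_value = lrt_pvalue_1df(-100.0, -100.0 - 3.841458820694124 / 2.0)
```

The same helper applies to any nested comparison that restricts a single parameter.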
2.8.2. Confidence Interval for the True Location Parameter
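Using the log-likelihood ratio test, we can construct asymptotic confidence intervals for the true location parameter $\omega$ of a sample by searching for the pair of values of $\omega$ for which it holds that [12]: $2\{\ell(\hat{\omega}, \hat{\gamma}, \hat{\lambda}) - \ell(\omega, \tilde{\gamma}_\omega, \tilde{\lambda}_\omega)\} = \chi^2_{1, 1-\alpha}$, where $\tilde{\gamma}_\omega$ and $\tilde{\lambda}_\omega$ maximize the log-likelihood for the given $\omega$.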
2.8.3. Equality of Two Location Parameters
To test the equality of two location parameters, without assuming equality of the parameters $\gamma$ and $\lambda$, we perform a log-likelihood ratio test (all the relevant functions are available in the R package Directional [13]) following [6]. Assume we have circular observations from two independent samples, $\theta_{11}, \ldots, \theta_{1n_1}$ and $\theta_{21}, \ldots, \theta_{2n_2}$, where $n_1$ and $n_2$ denote the sample sizes of the two samples.
Under
,
, and, by using Equation (
6), the log-likelihood is written as
where
and
for
.
Under
,
, and the log-likelihood is written as
where
and
for
.
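Under regularity conditions, $\Lambda = 2(\ell_1 - \ell_0) \sim \chi^2_1$ asymptotically, where $\ell_0$ and $\ell_1$ denote the maximized log-likelihoods under $H_0$ and $H_1$, respectively.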
2.9. Regression Modeling Revisited
Ref. [6] defined the GCPC regression by considering the bivariate representation (Euclidean coordinates) of the circular data as opposed to their univariate nature [14]. This is akin to the approach employed in the spherically projected multivariate linear (SPML) model, as detailed by [15], and allows for a varying concentration parameter ($\gamma$). The log-likelihood of the GCPC regression model is written as
where
and
with
denoting the
i-th observation of the covariate vector.
To maximize the log-likelihood of the GCPC regression model (13), Ref. [6] suggested the use of multiple starting values; however, we propose a computationally more efficient approach. We begin with initial regression coefficients produced by the SPML regression model of [15] and then estimate the $\lambda$ parameter using Brent's algorithm [16] (available in R via the built-in optimize() function). These estimates are subsequently used as starting values in R's built-in optim() function. Our experiments have shown that this process is faster and results in a more stable optimization. Ref. [6] did not consider the case of circular–circular regression, where a covariate $X$ is circular. However, this case is easily accommodated in the above scenario by transforming the circular covariate to Euclidean coordinates using $(\cos X, \sin X)$. In the case of multiple circular covariates, all of them are transformed into their Euclidean coordinates and the same link function (14) is used. Simplicial predictors (compositional data) can be added in one of two ways. The first is to apply the additive log-ratio transformation [17] so that the regression coefficients sum to zero and have a meaningful (derivative-based) interpretation. When zero values are present the logarithmic transformation breaks down, so the alternative is to transform them via the $\alpha$-transformation.
A further extension consists of linking the $\lambda$ parameter to the covariates, but that would increase the computational cost. A complementary approach to introduce non-linearity consists of incorporating splines for the covariate effects.