Generalized Cardioid Distributions for Circular Data Analysis

The Cardioid (C) distribution is one of the most important models for circular data. Although some of its structural properties have been derived, this distribution is not appropriate for asymmetric and multimodal phenomena on the circle, so extensions are required. There are various general methods that can be used to produce circular distributions. This paper proposes four extensions of the C distribution based on the beta, Kumaraswamy, gamma, and Marshall–Olkin generators. We obtain a unique linear representation of their densities and some mathematical properties. Inference procedures for the parameters are also investigated. We perform two applications to real data, where the new models are compared to the C distribution and one of its extensions.


Introduction
Fitting densities to data has a long history. Statistical distributions are very useful for describing and predicting real-world phenomena. Over the past decades, hundreds of extended distributions have been developed by introducing one or more parameters to a baseline distribution for modeling data in several disciplines, in particular reliability engineering [1], survival analysis [2], demography [3], and actuarial science [4].
Adding parameters to a well-established distribution is a time-honored device for obtaining more flexible new families of distributions. In fact, several classes of distributions have been introduced in the statistical literature by adding one or more parameters to generate new distributions. Recent developments address definitions of new families that extend well-known distributions and, at the same time, provide great flexibility in modeling real data. The well-known generators are the Marshall-Olkin-G [5], beta-G [6], gamma-G [7], Kumaraswamy-G (Kw-G) [8], exponentiated generalized (EG) [9], type I half-logistic-G [10], Burr X-G [11], and exponentiated Weibull-H [12], among others. These generators have been applied in the context of linear data, i.e., data supported on a subset of R.
Several phenomena in practice produce angles (expressed in degrees or radians) as outputs, known as circular data, such as in the analysis of phase features obtained from radar imagery [13] and the time series analysis of wind speeds and directions [14]. As one of the most used circular distributions, the two-parameter Cardioid (C) law was pioneered by Jeffreys [15] for describing directional spectra of ocean waves. This model has a cumulative distribution function (cdf), G(x) = G(x; µ, ρ), and probability density function (pdf), g(x) = g(x; µ, ρ), given by (for 0 < x ≤ 2π) $G(x) = \frac{x}{2\pi} + \frac{\rho}{\pi}\,[\sin(x - \mu) + \sin(\mu)]$ (1) and $g(x) = \frac{1}{2\pi}\,[1 + 2\rho\cos(x - \mu)]$ (2), respectively, where 0 < µ ≤ 2π is a location parameter, and |ρ| ≤ 0.5 represents a concentration index. Some known competing distributions to the C distribution are the wrapped normal, wrapped Cauchy, wrapped Lévy, and wrapped Lindley. A novel circular distribution introduced by Wang and Shimizu [16] applied the Möbius transformation to the C model. The Papakonstantinou family studied by Abe et al. [17] also extended (1). However, these extensions have analytically cumbersome formulas for their densities. Recently, Paula et al. [18] introduced a simple extended C distribution, called the exponentiated Cardioid (EC), derived from the exponentiated G (exp-G) generator (after adapting the mapping from linear to circular support), which can describe asymmetric and some bimodal cases beyond those of the C model. The models mentioned and those that will be presented in this work are also classified as trigonometric distributions. In recent years, many trigonometric models have been proposed, such as the transformed Sin-G family [19] and Cos-G class [20], thus highlighting their importance.
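As a minimal sketch of the baseline model, the following Python snippet evaluates the standard closed forms of the Cardioid pdf, g(x) = [1 + 2ρ cos(x − µ)]/(2π), and of its cdf obtained by integrating g from 0 to x. The function names, the midpoint-rule helper, and the values (µ, ρ) = (2, 0.2) are illustrative assumptions, not part of the original analysis.

```python
import math

def cardioid_pdf(x, mu, rho):
    """Cardioid density on (0, 2*pi], assuming |rho| <= 0.5."""
    return (1.0 + 2.0 * rho * math.cos(x - mu)) / (2.0 * math.pi)

def cardioid_cdf(x, mu, rho):
    """Cardioid cdf, i.e., the integral of the density from 0 to x."""
    return x / (2.0 * math.pi) + (rho / math.pi) * (math.sin(x - mu) + math.sin(mu))

def midpoint_integral(f, a, b, n=20000):
    """Plain midpoint rule, used only as a numerical sanity check."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

mu, rho = 2.0, 0.2  # illustrative parameter values (assumption)
total = midpoint_integral(lambda t: cardioid_pdf(t, mu, rho), 0.0, 2.0 * math.pi)
partial = midpoint_integral(lambda t: cardioid_pdf(t, mu, rho), 0.0, 1.5)
print(total, partial, cardioid_cdf(1.5, mu, rho))
```

The density should integrate to one over a full period, and the numerical integral up to any x should agree with the closed-form cdf.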
In this work, we derive four extensions of the C model through the adapted β-G, Kw-G, Γ-G, and MO-G generators, which extend the exp-G family. We propose four new circular distributions called the beta Cardioid (βC), Kumaraswamy Cardioid (KwC), gamma Cardioid (ΓC), and Marshall-Olkin Cardioid (MOC). Their densities are expressed in a unique linear representation, which is the result of weighting the term [1 + 2ρ cos(x − µ)] in Equation (2). Circular data phenomena often demand tailored clustering structures. Abraham et al. [21] presented a discussion on an unsupervised clustering algorithm for circular data obtained from X-ray beam projectors. Based on mixtures of one-dimensional Langevin distributions, Qiu and Wu [22] derived a new information criterion to cluster circular data. We understand that these works motivate our proposals as potential inputs for future clustering structures. Furthermore, some mathematical properties of the new models are derived, such as expansions and trigonometric moments [23]. A brief discussion of likelihood-based estimation procedures is provided. Finally, two applications to real data are performed to illustrate the flexibility of our proposals.
The remainder of this paper is organized as follows. New circular distributions are defined in Section 2. Section 3 provides some of their properties, and an estimation procedure is addressed in Section 4. Subsequently, two applications to real data are performed in Section 5, and some conclusions are offered in Section 6.

Generalized Cardioid Models
We provide some three- and four-parameter distributions by transforming the C distribution according to four well-known generators.
For the first two generators, given a p-parameter baseline cdf as input, one obtains new (p + 2)-parameter models, whereas the remaining generators furnish (p + 1)-parameter distributions. Let where mod(x, y) is the remainder after x is divided by y. In what follows, we adapt the generators (3)-(6) in order to propose generalized Cardioid models with cdf F(·) and pdf f(·) that satisfy the conditions 1. f The conditions are required for circular data studies (see Mardia and Sutton [24]). The new models are discontinuous at the points {2kπ : k ∈ Z}. This pattern also holds for other circular models in the literature, such as the wrapped exponential [25].
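To illustrate how a generator acts on the Cardioid baseline, the sketch below plugs the C cdf into the textbook Kw-G density, θφ g G^(θ−1) (1 − G^θ)^(φ−1), and the textbook Marshall-Olkin density, θ g / [1 − (1 − θ)(1 − G)]². These are the usual linear-data forms restricted to (0, 2π]; the adapted versions used for the models in this section may differ, and all function names and parameter values are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def c_pdf(x, mu, rho):
    return (1.0 + 2.0 * rho * math.cos(x - mu)) / TWO_PI

def c_cdf(x, mu, rho):
    return x / TWO_PI + (rho / math.pi) * (math.sin(x - mu) + math.sin(mu))

def kw_c_pdf(x, theta, phi, mu, rho):
    """Textbook Kw-G density with a Cardioid baseline (assumed form)."""
    G = c_cdf(x, mu, rho)
    return theta * phi * c_pdf(x, mu, rho) * G ** (theta - 1.0) * (1.0 - G ** theta) ** (phi - 1.0)

def mo_c_pdf(x, theta, mu, rho):
    """Textbook Marshall-Olkin density with a Cardioid baseline (assumed form)."""
    G = c_cdf(x, mu, rho)
    return theta * c_pdf(x, mu, rho) / (1.0 - (1.0 - theta) * (1.0 - G)) ** 2

def midpoint_integral(f, a, b, n=20000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

mu, rho = 2.0, 0.2  # illustrative values (assumption)
kw_mass = midpoint_integral(lambda t: kw_c_pdf(t, 2.0, 3.0, mu, rho), 0.0, TWO_PI)
mo_mass = midpoint_integral(lambda t: mo_c_pdf(t, 0.5, mu, rho), 0.0, TWO_PI)

# Ratio of the density just above 0 to the density at 2*pi for the MO form.
mo_jump_ratio = mo_c_pdf(1e-9, 0.5, mu, rho) / mo_c_pdf(TWO_PI, 0.5, mu, rho)
print(kw_mass, mo_mass, mo_jump_ratio)
```

Both densities integrate to one over (0, 2π]. For the Marshall-Olkin form, the value just above 0 and the value at 2π differ by a factor of 1/θ², which is one way to see why models built this way can be discontinuous at multiples of 2π.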

A General Formula
All four extensions have the same support, and their densities can be expressed as $f_i(x) = h_i(x)\,[1 + 2\rho\cos(x - \mu)]$ (11), where $h_i(x)$ (for i = 1, . . . , 4) is defined in Table 1.

Table 1. The weighted multipliers for the proposed models.
The new densities can be interpreted as weighted multipliers for the baseline pdf kernel [1 + 2ρ cos(x − µ)]. Thus, the behavior of $h_i(x)$ in (11) plays an important role in studying the flexibility of the new models. Figure 5 displays the weight functions $h_i(x)$. For these plots, we set (µ, ρ) = (2, 0.2) and consider θ = φ ∈ (0, 100) and x ∈ (0, 2π). Note that although $h_3(x)$ and $h_4(x)$ attain the highest values, $h_1(x)$ and $h_2(x)$ cover larger domain regions, which lead to more flexible scenarios. Thus, we conclude that the βC and KwC models are likely to be the most flexible among these models.

Mathematical Properties
In this section, we obtain the trigonometric moments for the new models. First, we recall some concepts in the area of circular distributions. We follow the notation of Pewsey et al. [26].
As on the real line, a circular distribution can also be described by its characteristic function (cf). However, since the random variables X considered in this paper are periodic, we can write $\varphi_X(t) = E(e^{itX}) = E(e^{it(X + 2\pi)}) = e^{it2\pi}\,\varphi_X(t)$, where $i = \sqrt{-1}$, which implies $\varphi_X(t) = 0$ or $e^{it2\pi} = 1$; i.e., the cf should be defined only at integer values.
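The cf evaluated at an integer p is called the pth trigonometric moment of X, defined by $\tau_{p,0} = \varphi_X(p) = E(e^{ipX}) = \alpha_p + i\beta_p$, where $\alpha_p = E[\cos(pX)]$ and $\beta_p = E[\sin(pX)]$. The quantity $\tau_{p,0}$ is the mean resultant vector in the complex plane of length $\rho_p = |\tau_{p,0}| = \sqrt{\alpha_p^2 + \beta_p^2}$, where | · | is the norm of a complex argument. The quantities $\rho_1$ and $\mu_1$ are fundamental measures of concentration and location, respectively. The polar representation of $\tau_{p,0}$ is $\tau_{p,0} = \rho_p e^{i\mu_p}$. Furthermore, the pth central trigonometric moment of a circular distribution is $\tau_{p,\mu_1} = E(e^{ip(X - \mu_1)}) = \bar{\alpha}_p + i\bar{\beta}_p$, where $\bar{\alpha}_p$ and $\bar{\beta}_p$ are its real and imaginary parts. The polar representation of $\tau_{p,\mu_1}$ is given by $\tau_{p,\mu_1} = \rho_p e^{i(\mu_p - p\mu_1)}$. Here, we are interested in finding expressions for $\tau_{p,\mu_1}$.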
In what follows, µ refers to the parameter discussed previously in the models, while µ 1 is the mean direction.
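The definitions above can be checked numerically. The sketch below approximates the pth trigonometric moment E[exp(ip(X − c))] by a midpoint rule for the baseline C density, for which direct integration gives a first moment of ρ e^{iµ} (when ρ > 0) and vanishing moments for p ≥ 2. The helper `trig_moment` and the parameter values are hypothetical and serve only as an illustration.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi

def cardioid_pdf(x, mu, rho):
    return (1.0 + 2.0 * rho * math.cos(x - mu)) / TWO_PI

def trig_moment(pdf, p, center=0.0, n=20000):
    """Midpoint-rule approximation of E[exp(i*p*(X - center))] on (0, 2*pi]."""
    h = TWO_PI / n
    return h * sum(
        cmath.exp(1j * p * ((k + 0.5) * h - center)) * pdf((k + 0.5) * h)
        for k in range(n)
    )

mu, rho = 2.0, 0.2  # illustrative values (assumption)
pdf = lambda x: cardioid_pdf(x, mu, rho)

tau_1 = trig_moment(pdf, 1)               # first trigonometric moment
rho_1 = abs(tau_1)                        # mean resultant length
mu_1 = cmath.phase(tau_1) % TWO_PI        # mean direction in [0, 2*pi)
tau_1_central = trig_moment(pdf, 1, center=mu_1)  # first central moment
tau_2 = trig_moment(pdf, 2)
print(rho_1, mu_1, tau_1_central, abs(tau_2))
```

For the C model, the mean direction coincides with µ and the mean resultant length with ρ, while centering at the mean direction makes the first central moment real. The same routine can be applied to any density on the circle, including the extended models.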
Furthermore, we derive expansions for $f_i(x)$ by means of the following results. First, consider a baseline distribution having cdf G(x) and pdf g(x). The exp-G family with power parameter θ > 0 has cdf and pdf given by $\Pi_\theta(x) = G(x)^{\theta}$ and $\pi_\theta(x) = \theta\, g(x)\, G(x)^{\theta - 1}$, respectively. Expansions for densities obtained from Equations (3)-(6) have often been given in terms of the last two functions: From Nadarajah et al. [27]: From Cordeiro and de Castro [8]: From Castellares and Lemonte [28]: where and $\psi_{i-1}(\cdot)$ are the Stirling polynomials given in Castellares and Lemonte [28].
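As an illustration of how such expansions work, the sketch below verifies numerically that the textbook Kw-G density is a linear combination of exp-G densities. Expanding (1 − G^θ)^(φ−1) by the binomial theorem gives weights w_k = (−1)^k C(φ, k + 1) on the exp-G densities with powers θ(k + 1); the sum is finite when φ is a positive integer, which is the only case coded here. This is a standard identity used for illustration and is not claimed to be the exact expansion intended in this section; all names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def c_pdf(x, mu, rho):
    return (1.0 + 2.0 * rho * math.cos(x - mu)) / TWO_PI

def c_cdf(x, mu, rho):
    return x / TWO_PI + (rho / math.pi) * (math.sin(x - mu) + math.sin(mu))

def exp_g_pdf(x, power, mu, rho):
    """exp-G density: power * g(x) * G(x)^(power - 1), Cardioid baseline."""
    return power * c_pdf(x, mu, rho) * c_cdf(x, mu, rho) ** (power - 1.0)

def kw_pdf(x, theta, phi, mu, rho):
    """Textbook Kw-G density with a Cardioid baseline (assumed form)."""
    G = c_cdf(x, mu, rho)
    return theta * phi * c_pdf(x, mu, rho) * G ** (theta - 1.0) * (1.0 - G ** theta) ** (phi - 1.0)

def kw_weights(phi):
    """Binomial weights w_k = (-1)^k * C(phi, k + 1), for positive integer phi."""
    return [(-1) ** k * math.comb(phi, k + 1) for k in range(phi)]

def kw_pdf_expanded(x, theta, phi, mu, rho):
    """Same density written as a finite linear combination of exp-G densities."""
    return sum(
        w * exp_g_pdf(x, theta * (k + 1), mu, rho)
        for k, w in enumerate(kw_weights(phi))
    )

theta, phi, mu, rho = 1.5, 4, 2.0, 0.2  # illustrative values (assumption)
points = [0.5, 2.0, 4.0, 6.0]
max_gap = max(abs(kw_pdf(x, theta, phi, mu, rho) - kw_pdf_expanded(x, theta, phi, mu, rho)) for x in points)
print(sum(kw_weights(phi)), max_gap)
```

The weights sum to one, so any property of the exp-G densities that is linear (moments, trigonometric moments) carries over term by term.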
By applying (17) to Equations (13)-(16), we obtain linear representations for (11), which hold for the four generalized C distributions.

Theorem 1. The pdf (11) can be expanded as where b Equation (19) can be used to derive some mathematical properties (having intractable analytical forms) of $f_i(x)$ (for i = 1, . . . , 4). Furthermore, as a consequence, we have expansions for the weights $h_i(x)$ (which have complex forms) as linear combinations of $x^{l-v}\sin^{h}(x - \mu)$. Proposing criteria for choosing the best $f_i(x)$ based on these expansions may be a promising line of research. In particular, we obtain expressions for the central trigonometric moments of distributions with pdf (11).

Corollary 1. Let $\tau^{(j)}_{p,\mu_1}$ be the pth central trigonometric moment of the model $F_j$. We obtain and ind j (t) is given in Theorem 1.

Estimation
This section briefly discusses maximum likelihood estimation of the parameters of the pdf family (11). Several approaches for estimating the parameters have been proposed in the literature, but the maximum likelihood method is the most commonly employed. The maximum likelihood estimates (MLEs) have desirable properties for constructing confidence intervals for the parameters. They are easily computed using well-known platforms such as R (optim function), SAS (PROC NLMIXED), and Ox (MaxBFGS subroutine). Let $x_1, \ldots, x_n$ be an observed sample from a random variable having pdf (11). Thus, the associated log-likelihood function for δ = (θ, φ, µ, ρ) can be expressed as (for i = 1, . . . , 4) $\ell_i(\delta) = \sum_{k=1}^{n} \log f_i(x_k)$. The score vector follows from $\ell_i(\delta)$ as $U_i(\delta) = (U_{\theta,i}, U_{\phi,i}, U_{\mu,i}, U_{\rho,i})^{\top}$, whose components are $U_{a,i} = \partial \ell_i(\delta)/\partial a$ for a = θ, φ, µ, ρ. Thus, the MLE of δ is $\hat{\delta} = \arg\max_{\delta \in \Delta} \{\ell_i(\delta)\}$, where ∆ is the parameter space or, equivalently, the solution of the system of nonlinear equations $U_{\theta,i} = U_{\phi,i} = U_{\mu,i} = U_{\rho,i} = 0$. The compactness of the parameter space ∆ and the continuity of the log-likelihood function on ∆ are sufficient for the existence of the MLE.
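The maximization step can be sketched as follows for the two-parameter baseline C model. A coarse grid search stands in for the numerical optimizers mentioned above (it is a crude substitute, not how optim or maxLik work), and the sample is a synthetic set of Cardioid quantiles generated with (µ, ρ) = (2, 0.3), not one of the datasets analyzed in this paper. All function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def c_cdf(x, mu, rho):
    return x / TWO_PI + (rho / math.pi) * (math.sin(x - mu) + math.sin(mu))

def c_quantile(u, mu, rho, tol=1e-12):
    """Invert the Cardioid cdf by bisection on (0, 2*pi)."""
    lo, hi = 0.0, TWO_PI
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c_cdf(mid, mu, rho) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_loglik(sample, mu, rho):
    """Log-likelihood of the baseline Cardioid model."""
    return sum(math.log((1.0 + 2.0 * rho * math.cos(x - mu)) / TWO_PI) for x in sample)

# Synthetic, deterministic sample: Cardioid quantiles at evenly spaced levels.
n = 200
sample = [c_quantile((k + 0.5) / n, 2.0, 0.3) for k in range(n)]

# Coarse grid: mu in [0, 6.2] by 0.1 and rho in [0, 0.48] by 0.02.
loglik_hat, mu_hat, rho_hat = max(
    (c_loglik(sample, i * 0.1, j * 0.02), i * 0.1, j * 0.02)
    for i in range(63)
    for j in range(25)
)
print(mu_hat, rho_hat, loglik_hat)
```

Restricting ρ to nonnegative values while letting µ range over the whole circle loses nothing, because (µ, ρ) and (µ + π, −ρ) give the same density. For the four-parameter extensions, a gradient-based optimizer started from such a grid estimate would be the more practical choice.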
The partitioned observed information matrix for the model $F_i(x)$ takes the form (for i = 1, . . . , 4) $J_i(\delta) = -\left[\partial^2 \ell_i(\delta)/\partial a\, \partial b\right]$ for a, b = θ, φ, µ, ρ. For interval estimation of the parameters in $F_i(x)$, we obtain the Fisher information matrix (FIM) $K_i(\delta) = E(J_i(\delta))$ under standard regularity conditions. For n sufficiently large, $\sqrt{n}\,(\hat{\delta} - \delta) \xrightarrow{D} N_4(0, \bar{K}_i(\delta)^{-1})$ from a result in Casella and Berger [30], where $\bar{K}_i = K_i/n$ is the unit FIM, "$N_k(\mu, \Sigma)$" denotes the k-dimensional multivariate normal distribution with parameters µ and Σ, and "$\xrightarrow{D}$" means convergence in distribution.
However, the FIM is seldom tractable. As a solution, we can adopt J i instead of K i . This last strategy will be used in the numerical results. In the next section, the last asymptotic result will be used to determine the standard errors associated with MLEs.
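A minimal sketch of this strategy for the two-parameter baseline C model: the observed information is approximated by a finite-difference negative Hessian of the log-likelihood, and standard errors are the square roots of the diagonal of its inverse. The synthetic quantile sample, the choice to evaluate at (µ, ρ) = (2, 0.3) as if it were the MLE, and all function names are assumptions made for illustration.

```python
import math

TWO_PI = 2.0 * math.pi

def c_cdf(x, mu, rho):
    return x / TWO_PI + (rho / math.pi) * (math.sin(x - mu) + math.sin(mu))

def c_quantile(u, mu, rho, tol=1e-12):
    lo, hi = 0.0, TWO_PI
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c_cdf(mid, mu, rho) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def loglik(params, sample):
    mu, rho = params
    return sum(math.log((1.0 + 2.0 * rho * math.cos(x - mu)) / TWO_PI) for x in sample)

def observed_information(params, sample, step=1e-4):
    """Negative Hessian of the log-likelihood by central finite differences."""
    k = len(params)
    J = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            def shifted(sa, sb):
                p = list(params)
                p[a] += sa * step
                p[b] += sb * step
                return loglik(p, sample)
            second = (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * step ** 2)
            J[a][b] = -second
    return J

n = 200
sample = [c_quantile((k + 0.5) / n, 2.0, 0.3) for k in range(n)]  # synthetic sample
params_hat = (2.0, 0.3)  # treated as the MLE for this synthetic sample (assumption)

J = observed_information(params_hat, sample)
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
se_mu = math.sqrt(J[1][1] / det)   # sqrt of the (mu, mu) entry of the inverse
se_rho = math.sqrt(J[0][0] / det)  # sqrt of the (rho, rho) entry of the inverse
print(se_mu, se_rho)
```

For the four-parameter models the same finite-difference routine applies with a 4 × 4 matrix, which would then need a general matrix inverse instead of the closed-form 2 × 2 one used here.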

Applications
In this section, we provide two applications to illustrate the potential of the proposed models. The first dataset consists of 21 wind directions obtained by a Milwaukee weather station at 6:00 a.m. on consecutive days (see [31]). The second one corresponds to the directions taken by 76 turtles after treatment, as discussed by Stephens [32].
The Cartesian histograms of the first and second datasets in Figures 6a and 7a indicate positive (0.4313) and negative (−0.0816) skewness, respectively. Furthermore, these datasets have bimodal shapes. First, the MLEs and their standard errors (SEs, given in parentheses) are evaluated and, subsequently, the values of the Kuiper (K), Watson (W), Akaike information criterion (AIC), and Bayesian information criterion (BIC) statistics are computed. The first two goodness-of-fit measures are used in the context of circular statistics and can be found in Jammalamadaka and Sengupta [23]. All computations were performed using the maxLik function of the R statistical software (see [33]).
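For reference, the four criteria can be sketched as below, using the basic textbook definitions: Kuiper's V = D⁺ + D⁻ and Watson's U², both computed from the fitted cdf values u_i = F(x_i), together with AIC = 2k − 2ℓ and BIC = k log n − 2ℓ. The reported K and W values may be based on scaled or modified versions of these statistics, so this is only an assumption about their general form; the function names are hypothetical.

```python
import math

def kuiper_statistic(u):
    """Kuiper's V = D+ + D- from fitted cdf values u_i = F(x_i)."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    return d_plus + d_minus

def watson_statistic(u):
    """Watson's U^2: Cramer-von Mises W^2 corrected by n * (mean(u) - 1/2)^2."""
    u = sorted(u)
    n = len(u)
    u_bar = sum(u) / n
    w2 = sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u)) + 1.0 / (12 * n)
    return w2 - n * (u_bar - 0.5) ** 2

def aic(loglik, k):
    """Akaike information criterion for k parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

# Sanity check on perfectly spaced cdf values (not real data).
n = 10
u_even = [(2 * i + 1) / (2 * n) for i in range(n)]
print(kuiper_statistic(u_even), watson_statistic(u_even), aic(-10.0, 2), bic(-10.0, 2, n))
```

Both V and U² do not depend on the choice of origin on the circle, which is why they are preferred to the Kolmogorov-Smirnov and Cramér-von Mises statistics for circular data. Smaller values of all four criteria indicate a better fit.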
The results for the first and second datasets are reported in Tables 3 and 4, respectively. We note that all generalized models fit both datasets better than the Cardioid model according to these statistics. For the first dataset, the EC distribution stands out according to the K, AIC, and BIC measures, while the βC model yields the best fit according to the W statistic. The βC model outperforms the other models for the second dataset. Figures 6 and 7 display plots of the empirical and fitted densities for these data. The plots support the indications from these tables.

Conclusions
We propose four new distributions with supports on the circle. These extensions of the Cardioid (C) distribution follow by inserting this distribution in the beta-G, gamma-G, Kumaraswamy-G (Kw-G), and Marshall-Olkin-G generators, considering a specific adaptation. We derive expansions for the densities and trigonometric moments of the new models. We also discuss the maximum likelihood estimation for their parameters. Two applications illustrate the flexibility of the proposed models to fit real data.

Data Availability Statement: [31,32].

Conflicts of Interest:
The authors declare no conflict of interest.