1. Introduction
Fitting densities to data has a long history. Statistical distributions are very useful in describing and predicting real-world phenomena. Over the past decades, hundreds of extended distributions have been developed by adding one or more parameters to a baseline distribution, with applications in several disciplines, in particular reliability engineering [1], survival analysis [2], demography [3], and actuarial science [4].
Adding parameters to a well-established distribution is a time-honored device for obtaining more flexible families of distributions, and several such classes have been introduced in the statistical literature. Recent developments define new families that extend well-known distributions and, at the same time, provide great flexibility in modeling real data. Well-known generators include the Marshall–Olkin-G (MO-G) [5], beta-G [6], gamma-G [7], Kumaraswamy-G (Kw-G) [8], exponentiated generalized (EG) [9], type I half-logistic-G [10], Burr X-G [11], and exponentiated Weibull-H [12], among others. These generators have been applied in the context of linear data, i.e., data supported on a subset of the real line $\mathbb{R}$.
Several phenomena in practice yield angles (expressed in degrees or radians) as outputs, called circular data, such as in the analysis of phase features obtained from radar imagery [13] and the time series analysis of wind speeds and directions [14]. As one of the most used circular distributions, the two-parameter Cardioid (C) law was pioneered by Jeffreys [15] for describing directional spectra of ocean waves. This model has a cumulative distribution function (cdf), $F_C(\theta)$, and probability density function (pdf), $f_C(\theta)$, given by (for $0 < \theta \le 2\pi$)
$$F_C(\theta) = \frac{\theta}{2\pi} + \frac{\rho}{\pi}\left[\sin(\theta - \mu) + \sin(\mu)\right] \qquad (1)$$
and
$$f_C(\theta) = \frac{1}{2\pi}\left[1 + 2\rho\cos(\theta - \mu)\right], \qquad (2)$$
respectively, where $0 < \mu \le 2\pi$ is a location parameter, and $0 \le \rho \le 0.5$ represents a concentration index. Some known competing distributions to the C distribution are the wrapped normal, wrapped Cauchy, wrapped Lévy, and wrapped Lindley. Wang and Shimizu [16] introduced a novel circular distribution by applying the Möbius transformation to the C model. The Papakonstantinou family studied by Abe et al. [17] also extends (1). However, these extensions have analytically cumbersome density formulas. Recently, Paula et al. [18] introduced a simple extended C distribution, called the exponentiated Cardioid (EC), derived from the exponentiated G (exp-G) generator after adapting the mapping from linear to circular support; it can describe asymmetric and some bimodal cases beyond those of the C model. The models mentioned above and those presented in this work are also classified as trigonometric distributions. In recent years, many trigonometric models have been proposed, such as the transformed Sin-G family [19] and the Cos-G class [20], which highlights their importance.
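As a minimal numerical sketch (not part of the original derivation), the standard Cardioid cdf $F_C(\theta) = \theta/(2\pi) + (\rho/\pi)[\sin(\theta-\mu) + \sin\mu]$ and pdf $f_C(\theta) = [1 + 2\rho\cos(\theta-\mu)]/(2\pi)$ on $(0, 2\pi]$ can be coded and cross-checked as follows. The function names and the finite-difference check are illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi


def cardioid_pdf(theta, mu, rho):
    """Cardioid density on (0, 2*pi]; assumes 0 <= rho <= 0.5."""
    return (1.0 + 2.0 * rho * math.cos(theta - mu)) / TWO_PI


def cardioid_cdf(theta, mu, rho):
    """Cardioid cdf, i.e. the integral of the density from 0 to theta."""
    return theta / TWO_PI + (rho / math.pi) * (math.sin(theta - mu) + math.sin(mu))


def derivative_gap(theta, mu, rho, h=1e-6):
    """Central-difference derivative of the cdf minus the pdf (should be ~0)."""
    slope = (cardioid_cdf(theta + h, mu, rho) - cardioid_cdf(theta - h, mu, rho)) / (2.0 * h)
    return slope - cardioid_pdf(theta, mu, rho)
```

The check confirms that the cdf runs from 0 at $\theta = 0$ to 1 at $\theta = 2\pi$ and that its derivative matches the density.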
In this work, we derive four extensions of the C model through the adapted $\beta$-G, Kw-G, $\Gamma$-G, and MO-G generators, which extend the exp-G family. We propose four new circular distributions called the beta Cardioid ($\beta$C), Kumaraswamy Cardioid (KwC), gamma Cardioid ($\Gamma$C), and Marshall–Olkin Cardioid (MOC). Their densities share a single linear representation, which results from weighting the baseline pdf kernel in Equation (2). Circular data phenomena often demand tailored clustering structures. Abraham et al. [21] discussed an unsupervised clustering algorithm for circular data obtained from X-ray beam projectors. Based on mixtures of one-dimensional Langevin distributions, Qiu and Wu [22] derived a new information criterion to cluster circular data. These works motivate our proposals as potential inputs for future clustering structures. Furthermore, some mathematical properties of the new models are derived, such as expansions and trigonometric moments [23]. A brief discussion about likelihood-based estimation procedures is provided. Finally, two applications to real data illustrate the flexibility of our proposals.
The remainder of this paper is organized as follows. New circular distributions are defined in Section 2. Section 3 provides some of their properties, and an estimation procedure is addressed in Section 4. Subsequently, two applications to real data are performed in Section 5, and some conclusions are offered in Section 6.
2. Generalized Cardioid Models
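We provide some three- and four-parameter distributions by transforming the C distribution according to four well-known generators.
Let $G(x)$ be the cdf of a baseline distribution with $p$ parameters:
- (a) The $\beta$-G cdf defined by Eugene et al. [6] is
$$F(x) = I_{G(x)}(a, b) = \frac{1}{B(a, b)} \int_0^{G(x)} w^{a-1} (1 - w)^{b-1} \, dw, \qquad (3)$$
where $a > 0$ and $b > 0$ are two additional parameters, $I_{G(x)}(a, b)$ is the incomplete beta function ratio evaluated at $G(x)$, and $B(a, b)$ is the complete beta function;
- (b) The Kw-G cdf pioneered by Cordeiro and de Castro [8] is
$$F(x) = 1 - \left[1 - G(x)^{a}\right]^{b}, \qquad (4)$$
where $a > 0$ and $b > 0$ are two additional parameters;
- (c) The $\Gamma$-G cdf reported by Zografos and Balakrishnan [7] is
$$F(x) = \frac{\gamma\left(a, -\log[1 - G(x)]\right)}{\Gamma(a)}, \qquad (5)$$
where $a > 0$, $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t} \, dt$ is the gamma function, and $\gamma(a, z) = \int_0^{z} t^{a-1} e^{-t} \, dt$ is the incomplete gamma function;
- (d) The MO-G cdf defined by Marshall and Olkin [5] is
$$F(x) = \frac{G(x)}{G(x) + \alpha\,[1 - G(x)]}, \qquad (6)$$
where $\alpha > 0$ is a shape parameter.
For the first two generators, given a $p$-parameter baseline cdf as input, one has new $(p+2)$-parameter models, whereas for the remaining generators, $(p+1)$-parameter distributions are furnished.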
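The four generators can be sketched numerically as transformations of a baseline cdf value $G \in [0, 1]$. The snippet below uses the standard linear-data forms of the beta-G, Kw-G, gamma-G (Zografos–Balakrishnan), and Marshall–Olkin-G families, before any circular adaptation. The parameter names `a`, `b`, and `alpha`, and the crude midpoint quadrature, are illustrative assumptions.

```python
import math


def _integrate(func, lo, hi, n=2000):
    """Composite midpoint rule; adequate for a sketch, not for production use."""
    if hi <= lo:
        return 0.0
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))


def beta_g(G, a, b):
    """Beta-G cdf: incomplete beta function ratio at G (assumes a, b >= 1 here)."""
    kernel = lambda w: w ** (a - 1.0) * (1.0 - w) ** (b - 1.0)
    return _integrate(kernel, 0.0, G) / _integrate(kernel, 0.0, 1.0)


def kw_g(G, a, b):
    """Kumaraswamy-G cdf: 1 - (1 - G^a)^b."""
    return 1.0 - (1.0 - G ** a) ** b


def gamma_g(G, a):
    """Gamma-G cdf: lower incomplete gamma at -log(1 - G), divided by Gamma(a)."""
    if G >= 1.0:
        return 1.0
    upper = -math.log1p(-G)
    kernel = lambda t: t ** (a - 1.0) * math.exp(-t)
    return _integrate(kernel, 0.0, upper) / math.gamma(a)


def mo_g(G, alpha):
    """Marshall-Olkin-G cdf: G / (G + alpha * (1 - G)), alpha > 0."""
    return G / (G + alpha * (1.0 - G))
```

Each generator returns the baseline value unchanged at its neutral parameter setting (for instance `kw_g(G, 1, 1)` and `mo_g(G, 1)`), and `beta_g(G, a, 1)` equals `G ** a`, the exp-G family.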
Let
, where
is the remainder after
x is divided by
y. In what follows, we will do an adaptation to the generators (
3)–(
6) in order to propose generalized Cardioid models with cdf
and pdf
that satisfy the conditions
The conditions are required for circular data studies (see Mardia and Sutton [
24]). The new models present discontinuity in
. This pattern also holds for other circular models in the literature, such as the wrapped exponential [25].
2.1. Beta Cardioid
By applying (1) to Equation (3), the cdf of the $\beta$C distribution is
for
. This case is denoted by
. By differentiating the last equation, the
C pdf, say
, has the form
where
and
For $b = 1$ (the second shape parameter of the beta generator), the $\beta$C model reduces to the EC distribution discussed by Paula et al. [18].
Figure 1a–d display $\beta$C densities for some parametric points.
2.2. Kumaraswamy Cardioid
By inserting (1) in Equation (4), the KwC cdf, say
, can be expressed as
for
. This case is denoted by
. The KwC pdf,
, can be reduced to
where
and
Figure 2a–d display KwC densities for some parametric points.
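To illustrate how a generator reshapes the Cardioid density, the sketch below evaluates the plain plug-in Kw-G density $a\,b\,g(\theta)\,G(\theta)^{a-1}[1 - G(\theta)^{a}]^{b-1}$ with the Cardioid cdf and pdf as $G$ and $g$. This is an assumption made for illustration: it is the unadapted textbook form on $(0, 2\pi]$ and need not coincide term by term with the adapted KwC expression of this section. All function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def cardioid_pdf(theta, mu, rho):
    return (1.0 + 2.0 * rho * math.cos(theta - mu)) / TWO_PI


def cardioid_cdf(theta, mu, rho):
    return theta / TWO_PI + (rho / math.pi) * (math.sin(theta - mu) + math.sin(mu))


def kwc_pdf(theta, a, b, mu, rho):
    """Plug-in Kw-G density with Cardioid baseline (illustrative, assumes a, b >= 1)."""
    # Clamp to [0, 1] to guard against rounding just outside the unit interval.
    G = min(1.0, max(0.0, cardioid_cdf(theta, mu, rho)))
    g = cardioid_pdf(theta, mu, rho)
    return a * b * g * G ** (a - 1.0) * (1.0 - G ** a) ** (b - 1.0)


def total_mass(a, b, mu, rho, n=20000):
    """Midpoint-rule integral of the density over (0, 2*pi]; should be close to 1."""
    step = TWO_PI / n
    return step * sum(kwc_pdf((k + 0.5) * step, a, b, mu, rho) for k in range(n))
```

With `a = b = 1` the function returns the Cardioid density itself, and for other values the numerical integral over one full turn stays at one, which is a quick sanity check on any candidate circular density.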
2.3. Gamma Cardioid
By applying (1) in Equation (5), the $\Gamma$C cdf,
, has the form
for
. This case is denoted by
. By differentiating the last equation, the
C pdf,
, reduces to
where
and
Figure 3a–d display $\Gamma$C densities for some parametric points.
2.4. Marshall–Olkin Cardioid
By inserting (1) in Equation (6), the MOC cdf,
, is given by
for
. This case is denoted by
. Thus, the MOC pdf,
, becomes
where
and
Figure 4a–d display MOC densities for some parametric points.
2.5. A General Formula
All four extensions have the same support, and their densities can be expressed as
where
is defined in
Table 1.
The new densities can be interpreted as weighted versions of the baseline pdf kernel. Thus, the behavior of the weight function in (11) plays an important role in studying the flexibility of the new models.
Figure 5 displays the weighted functions
. For these plots, we set
and consider
and
. Note that although
and
have the highest values,
and
present larger domain regions, which lead to more flexible scenarios. Thus, we conclude that the
C and KwC can be more flexible among these models.
3. Mathematical Properties
In this section, we obtain the trigonometric moments for the new models. First, we recall some concepts in the area of circular distributions. We follow the notation of Pewsey et al. [26].
As on the real line, a circular distribution can also be described by its characteristic function (cf). However, since the random variables $X$ considered in this paper are periodic, we can write
where
, which implies
or
; i.e., the cf should be defined only at integer values.
The cf evaluated at an integer $p$ is called the $p$th trigonometric moment of $X$, defined by
The quantity
is the mean resultant vector in the complex plane of length
and direction
where
is the norm of a complex argument. The quantities
and
are fundamental measures of concentration and location, respectively. The polar representation of
is
Furthermore, the $p$th central trigonometric moment of a circular distribution is
where
and
are its real and imaginary parts. The polar representation of
is given by
Here, we are interested in finding expressions for .
In what follows, refers to the parameter discussed previously in the models, while is the mean direction.
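The definitions above can be checked numerically. The sketch below approximates the $p$th trigonometric moment $E[e^{ipX}]$ of a density on $(0, 2\pi]$ by the midpoint rule and extracts the mean resultant length and mean direction from the first moment. For the baseline Cardioid density, the first moment equals $\rho e^{i\mu}$ and all higher-order moments vanish, which gives a simple test case. The function names are illustrative assumptions.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def cardioid_pdf(theta, mu, rho):
    return (1.0 + 2.0 * rho * math.cos(theta - mu)) / TWO_PI


def trig_moment(pdf, p, n=20000):
    """Approximate E[exp(i * p * X)] over (0, 2*pi] with the midpoint rule."""
    step = TWO_PI / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * step
        total += cmath.exp(1j * p * theta) * pdf(theta)
    return total * step


def resultant_length_and_direction(pdf):
    """Modulus and argument (in [0, 2*pi)) of the first trigonometric moment."""
    first = trig_moment(pdf, 1)
    return abs(first), cmath.phase(first) % TWO_PI
```

For example, calling `resultant_length_and_direction(lambda t: cardioid_pdf(t, 1.0, 0.3))` recovers a length close to 0.3 and a direction close to 1.0 radian.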
Furthermore, we derive expansions for
by means of the following results. First, consider a baseline distribution having cdf
and pdf
. The exp-G family with power parameter
has cdf and pdf given by
respectively. Expansions for densities obtained from Equations (
3)–(
6) have often been given in terms of the last two functions:
From Nadarajah et al. [27]:
From Cordeiro and de Castro [8]:
From Castellares and Lemonte [28]:
where
and
are the Stirling polynomials given in Castellares and Lemonte [28].
From Cordeiro et al. [29]:
where the coefficients
are given by (
)
and
.
Let
. The cdf of
is (Paula et al., 2020)
By simple differentiation, we can write
where
,
After some algebraic manipulations, the
pth central circular trigonometric moment of
, say
, with mean direction
, follows as
where
and
. The functions
and
are easily handled both numerically and analytically.
For example, Table 2 displays some special quantities obtained with the symbolic computation software wxMaxima.
By applying (17) to Equations (13)–(16), we obtain linear representations for (11), which hold for the four generalized C distributions.
Theorem 1. The pdf (11) can be expanded as
where for , , , and .
Equation (19) can be used to derive some mathematical properties (having intractable analytical forms) of
(for
). Furthermore, as a consequence, we have expansions for the weights
(which have complex forms) as linear combinations of
. Proposing criteria for choosing the best
based on these expansions may be a promising research branch. In particular, we obtain expressions for the central trigonometric moments of distributions with pdf (
11).
Corollary 1. Let be the pth central trigonometric moment of the model . We obtain
where
and is given in Theorem 1.
Proof of Corollary 1. Let be the pth circular trigonometric moment of .
Furthermore, let
,
,
, and
. Thus, assuming
and
, similar to real and imaginary parts of
in (
18), it follows from Equations (
13)–(
16):
where
□
4. Estimation
This section briefly discusses maximum likelihood estimation of the parameters of the pdf family (11). Several approaches for estimating the parameters have been proposed in the literature, but the maximum likelihood method is the most commonly employed. The maximum likelihood estimates (MLEs) have desirable properties for constructing confidence intervals for the parameters. They are easily computed using well-known platforms such as R (optim function), SAS (PROC NLMIXED), and Ox (MaxBFGS subroutine).
Let
be an observed sample from a random variable having pdf (
11). Thus, the associated log-likelihood function for
can be expressed as (for
)
The score vector follows from
as
whose components are
and
Thus, the MLE of the parameter vector is the maximizer of the log-likelihood function over the parametric space or, equivalently, the solution of the system of nonlinear equations obtained by setting the score vector to zero. The compactness of the parameter space and the continuity of the log-likelihood function on it are sufficient for the existence of the MLE.
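The maximization step can be sketched for the two-parameter baseline Cardioid model, whose log-likelihood is $\sum_i \log\{[1 + 2\rho\cos(\theta_i - \mu)]/(2\pi)\}$. The snippet below is only an illustration under stated assumptions: it replaces the quasi-Newton routines mentioned above with a coarse grid search, and it uses a deterministic pseudo-sample built by inverting the cdf at equally spaced probabilities instead of real data. All function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def cardioid_cdf(theta, mu, rho):
    return theta / TWO_PI + (rho / math.pi) * (math.sin(theta - mu) + math.sin(mu))


def cardioid_loglik(sample, mu, rho):
    """Cardioid log-likelihood; requires rho < 0.5 so every term is positive."""
    return sum(math.log((1.0 + 2.0 * rho * math.cos(t - mu)) / TWO_PI) for t in sample)


def quantile_sample(n, mu, rho):
    """Deterministic pseudo-sample: cdf inverted by bisection at (i - 0.5) / n."""
    out = []
    for i in range(1, n + 1):
        target = (i - 0.5) / n
        lo, hi = 0.0, TWO_PI
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if cardioid_cdf(mid, mu, rho) < target:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out


def grid_mle(sample, n_mu=90, n_rho=50):
    """Return (loglik, mu_hat, rho_hat) maximizing the log-likelihood on a grid."""
    best = (-math.inf, None, None)
    for i in range(1, n_mu + 1):
        mu = TWO_PI * i / n_mu
        for j in range(n_rho):
            rho = 0.49 * j / (n_rho - 1)  # grid over [0, 0.49]
            ll = cardioid_loglik(sample, mu, rho)
            if ll > best[0]:
                best = (ll, mu, rho)
    return best
```

In practice, the grid maximizer would serve only as a starting value for a gradient-based optimizer applied to the full generalized model.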
The partitioned observed information matrix for the model
takes the form (for
)
whose elements are
for
, where
and
For interval estimation of the model parameters, we use the Fisher information matrix (FIM) under standard regularity conditions.
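For $n$ sufficiently large,
$$\sqrt{n}\,(\hat{\boldsymbol{\delta}} - \boldsymbol{\delta}) \xrightarrow{D} N_k\left(\mathbf{0}, K(\boldsymbol{\delta})^{-1}\right),$$
from a result in Casella and Berger [30], where $\hat{\boldsymbol{\delta}}$ is the MLE of the $k$-dimensional parameter vector $\boldsymbol{\delta}$, $K(\boldsymbol{\delta})$ is the unit FIM, "$N_k(\mathbf{m}, \boldsymbol{\Sigma})$" denotes the $k$-dimensional multivariate normal distribution with parameters $\mathbf{m}$ and $\boldsymbol{\Sigma}$, and "$\xrightarrow{D}$" means convergence in distribution.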
However, the FIM is seldom tractable. As a solution, we can adopt the observed information matrix evaluated at the MLE instead of the FIM. This strategy is used in the numerical results. In the next section, the asymptotic result above is used to determine the standard errors associated with the MLEs.
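The observed-information strategy can be sketched for the two-parameter baseline Cardioid model: approximate the Hessian of the negative log-likelihood by finite differences, invert the resulting $2 \times 2$ matrix, and take square roots of its diagonal. This is a hedged illustration only. The finite-difference step, the function names, and the deterministic pseudo-sample (cdf inverted at equally spaced probabilities) are assumptions, and the evaluation point is taken as given instead of being a fitted MLE.

```python
import math

TWO_PI = 2.0 * math.pi


def cardioid_cdf(theta, mu, rho):
    return theta / TWO_PI + (rho / math.pi) * (math.sin(theta - mu) + math.sin(mu))


def neg_loglik(sample, mu, rho):
    return -sum(math.log((1.0 + 2.0 * rho * math.cos(t - mu)) / TWO_PI) for t in sample)


def quantile_sample(n, mu, rho):
    """Deterministic pseudo-sample: cdf inverted by bisection at (i - 0.5) / n."""
    out = []
    for i in range(1, n + 1):
        target, lo, hi = (i - 0.5) / n, 0.0, TWO_PI
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if cardioid_cdf(mid, mu, rho) < target:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out


def observed_information(sample, mu, rho, h=1e-4):
    """Finite-difference Hessian of the negative log-likelihood: (J_mm, J_mr, J_rr)."""
    f = lambda m, r: neg_loglik(sample, m, r)
    j_mm = (f(mu + h, rho) - 2.0 * f(mu, rho) + f(mu - h, rho)) / h ** 2
    j_rr = (f(mu, rho + h) - 2.0 * f(mu, rho) + f(mu, rho - h)) / h ** 2
    j_mr = (f(mu + h, rho + h) - f(mu + h, rho - h)
            - f(mu - h, rho + h) + f(mu - h, rho - h)) / (4.0 * h ** 2)
    return j_mm, j_mr, j_rr


def standard_errors(sample, mu_hat, rho_hat):
    """Square roots of the diagonal of the inverse observed information matrix."""
    j_mm, j_mr, j_rr = observed_information(sample, mu_hat, rho_hat)
    det = j_mm * j_rr - j_mr ** 2
    return math.sqrt(j_rr / det), math.sqrt(j_mm / det)
```

The two returned values approximate the standard errors of the location and concentration estimates, from which Wald-type confidence intervals follow directly.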
5. Applications
In this section, we provide two applications to illustrate the potential of the proposed models. The first dataset consists of 21 wind directions recorded by a Milwaukee weather station at 6:00 a.m. on consecutive days (see [31]). The second one corresponds to the directions taken by 76 turtles after treatment, addressed by Stephens [32].
The Cartesian histograms of the first and second datasets in Figure 6a and Figure 7a indicate positive (
) and negative (
) skewness, respectively. Furthermore, these datasets have bimodal shapes.
First, we compute the MLEs and their standard errors (SEs, given in parentheses) and, subsequently, the values of the Kuiper (K), Watson (W), Akaike information criterion (AIC), and Bayesian information criterion (BIC) statistics. The first two goodness-of-fit measures are used in the context of circular statistics and can be found in Jammalamadaka and Sengupta [23]. All computations were performed using the maxLik function of the R statistical software (see [33]).
The results for the first and second datasets are reported in Table 3 and Table 4, respectively. We note that all generalized models fit both datasets better than the Cardioid model according to these statistics. For the first dataset, the EC distribution stands out according to the K, AIC, and BIC measures, while the
C model yields the best fit to the dataset according to the W statistic. The
C model outperforms the other models for the second dataset.
Figure 6 and Figure 7 display plots of the empirical and fitted densities for these data. The plots support the indications from these tables.