1. Introduction
It is well known that the likelihood function is one of the most important tools in classical inference; the resulting estimator, the maximum likelihood estimator (MLE), has excellent efficiency properties, although its robustness properties are rather poor.
Tests based on the MLE (the likelihood ratio test, the Wald test, Rao's score test, etc.) usually have good efficiency properties, but they behave poorly in the presence of outliers. To remedy this, many robust estimators have been introduced in the statistical literature, some of them based on distance or divergence measures. In particular, the density power divergence measures introduced in [1] have yielded good robust estimators, the minimum density power divergence estimators (MDPDE), and, based on them, some robust test statistics have been considered for testing simple and composite null hypotheses. Some of these tests are based on divergence measures (see [2,3]), and some others extend the classical Wald test; see [4,5,6] and the references therein.
The classical likelihood function requires the exact specification of the probability density function, but in most applications, the true distribution is unknown. In other cases, the data distribution is available in an analytic form, but the likelihood function is still mathematically intractable due to the complexity of the probability density function. There are many alternatives to the classical likelihood function; in this paper, we focus on the composite likelihood. A composite likelihood is an inference function derived by multiplying a collection of component likelihoods, the particular collection used being determined by the context. The composite likelihood therefore reduces the computational complexity, so that it is possible to deal with large datasets and very complex models even when the use of standard likelihood methods is not feasible. Asymptotic normality of the composite maximum likelihood estimator (CMLE) still holds, with the Godambe information matrix replacing the expected information in the expression of the asymptotic variance-covariance matrix. This allows the construction of composite likelihood ratio test statistics, Wald-type test statistics, as well as score-type statistics. A review of composite likelihood methods is given in [7]. We have to mention at this point that the CMLE, as well as the respective test statistics, are seriously affected by the presence of outliers in the set of available data.
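To fix ideas, here is a minimal sketch (in Python; the paper itself contains no code) of a pairwise composite log-likelihood for an $m$-variate normal with unit variances, a common mean and a common pairwise correlation. The function names and the equal-weight default are our own choices, not the paper's notation.

```python
import math
from itertools import combinations

def bvn_logpdf(y1, y2, mu, rho):
    """Log-density of a bivariate normal with common mean mu, unit variances
    and correlation rho."""
    z1, z2 = y1 - mu, y2 - mu
    return (-math.log(2 * math.pi) - 0.5 * math.log(1 - rho ** 2)
            - (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (2 * (1 - rho ** 2)))

def pairwise_cloglik(y, mu, rho, weights=None):
    """Composite log-likelihood: weighted sum of the log-likelihoods of all
    bivariate (pairwise) margins of the observation vector y."""
    pairs = list(combinations(range(len(y)), 2))
    if weights is None:
        weights = [1.0] * len(pairs)  # equal weights can simply be ignored
    return sum(w * bvn_logpdf(y[j], y[k], mu, rho)
               for w, (j, k) in zip(weights, pairs))
```

A quick sanity check: with `rho = 0` each pairwise density factorizes, so the composite log-likelihood equals $(m-1)$ times the sum of the univariate log-densities, since each coordinate appears in $m-1$ pairs.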
The main purpose of the paper is to introduce a new robust family of estimators, namely, composite minimum density power divergence estimators (CMDPDE), as well as a new family of Wald-type test statistics based on the CMDPDE in order to get broad classes of robust estimators and test statistics.
In Section 2, we introduce the CMDPDE, and we provide the associated system of estimating equations. The asymptotic distribution of the CMDPDE is obtained in Section 2.1. Section 2.2 is devoted to the definition of a family of Wald-type test statistics, based on the CMDPDE, for testing simple and composite null hypotheses. The asymptotic distribution of these Wald-type test statistics is obtained, as well as some asymptotic approximations to the power function. A numerical example, presented previously in [8], is studied in Section 3; a simulation study based on this example is also presented there, in order to study the robustness of the CMDPDE, as well as the performance of the Wald-type test statistics based on the CMDPDE. Proofs of the results are presented in Appendix A.
2. Composite Minimum Density Power Divergence Estimator
We adopt here the notation of [9] regarding the composite likelihood function and the respective CMLE. In this regard, let $\{f(\cdot;\theta);\,\theta\in\Theta\subseteq\mathbb{R}^{p},\,p\geq 1\}$ be a parametric identifiable family of distributions for an observation $y$, a realization of a random $m$-vector $Y$. In this setting, the composite density based on $K$ different marginal or conditional distributions has the form:
$$\mathcal{CL}(\theta,y)=\prod_{k=1}^{K}f_{A_{k}}(y;\theta)^{w_{k}},$$
and the corresponding composite log-density has the form:
$$c\ell(\theta,y)=\sum_{k=1}^{K}w_{k}\,\ell_{A_{k}}(\theta,y),$$
with:
$$\ell_{A_{k}}(\theta,y)=\log f_{A_{k}}(y;\theta),$$
where $\{A_{k}\}_{k=1}^{K}$ is a family of random variables associated either with marginal or conditional distributions involving some components of $y$, and $w_{k}$, $k=1,\dots,K$, are non-negative and known weights. If the weights are all equal, then they can be ignored; in this case, all the statistical procedures produce equivalent results.
Let $y_{1},\dots,y_{n}$ also be independent and identically distributed replications of $Y$. We denote by:
$$c\ell(\theta,y_{1},\dots,y_{n})=\sum_{i=1}^{n}c\ell(\theta,y_{i})$$
the composite log-likelihood function for the whole sample. In complete accordance with the classical MLE, the CMLE, $\widehat{\theta}_{c}$, is defined by:
$$\widehat{\theta}_{c}=\arg\max_{\theta\in\Theta}\sum_{i=1}^{n}c\ell(\theta,y_{i}).\tag{1}$$
It can also be obtained by solving the system of equations:
$$\sum_{i=1}^{n}u(\theta,y_{i})=0_{p},\tag{2}$$
where:
$$u(\theta,y)=\frac{\partial c\ell(\theta,y)}{\partial\theta}$$
denotes the composite score function.
We are going to see how it is possible to obtain the CMLE, $\widehat{\theta}_{c}$, on the basis of the Kullback–Leibler divergence measure. We shall denote by $g$ the density generating the data, with the respective distribution function denoted by $G$. The Kullback–Leibler divergence between the density function $g$ and the composite density function $\mathcal{CL}(\theta,\cdot)$ is given by:
$$d_{KL}\left(g,\mathcal{CL}(\theta,\cdot)\right)=\int g(y)\log\frac{g(y)}{\mathcal{CL}(\theta,y)}\,dy.$$
The term:
$$\int g(y)\log g(y)\,dy$$
can be removed because it does not depend on $\theta$; hence, we can define the following estimator of $\theta$ based on the Kullback–Leibler divergence:
$$\widetilde{\theta}=\arg\min_{\theta\in\Theta}\left(-\int\log\mathcal{CL}(\theta,y)\,dG(y)\right),$$
or equivalently:
$$\widetilde{\theta}=\arg\max_{\theta\in\Theta}\int\log\mathcal{CL}(\theta,y)\,dG(y).\tag{3}$$
If we replace in (3) the distribution function $G$ by the empirical distribution function $G_{n}$, we have:
$$\widetilde{\theta}=\arg\max_{\theta\in\Theta}\frac{1}{n}\sum_{i=1}^{n}\log\mathcal{CL}(\theta,y_{i}),$$
and this expression is equivalent to Expression (1). Therefore, the estimator $\widetilde{\theta}$ coincides with the CMLE. Based on the previous idea, we are going to introduce, in a natural way, the composite minimum density power divergence estimator (CMDPDE).
The CMLE, $\widehat{\theta}_{c}$, obeys asymptotic normality (see [9]), and in particular:
$$\sqrt{n}\,(\widehat{\theta}_{c}-\theta)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0_{p},G^{*}(\theta)^{-1}\right),$$
where $G^{*}(\theta)$ denotes the Godambe information matrix, defined by:
$$G^{*}(\theta)=H(\theta)J(\theta)^{-1}H(\theta),$$
with $H(\theta)$ being the sensitivity or Hessian matrix and $J(\theta)$ being the variability matrix, defined, respectively, by:
$$H(\theta)=E_{\theta}\left[-\frac{\partial u(\theta,Y)}{\partial\theta^{T}}\right]\quad\text{and}\quad J(\theta)=\operatorname{Var}_{\theta}\left[u(\theta,Y)\right]=E_{\theta}\left[u(\theta,Y)u(\theta,Y)^{T}\right],$$
where the superscript $T$ denotes the transpose of a vector or a matrix.
The matrix $J(\theta)$ is non-negative definite by definition. In the following, we shall assume that the matrix $H(\theta)$ is of full rank. Since the component score functions can be correlated, we have $H(\theta)\neq J(\theta)$ in general. If $c\ell(\theta,y)$ is a true log-likelihood function, then $H(\theta)=J(\theta)=I_{F}(\theta)$, $I_{F}(\theta)$ being the Fisher information matrix of the model. Using a multivariate version of the Cauchy–Schwarz inequality, we have that the matrix $I_{F}(\theta)-G^{*}(\theta)$ is non-negative definite, i.e., the full likelihood function is more efficient than any other composite likelihood function (cf. [10], Lemma 4A).
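The fact that $H(\theta)\neq J(\theta)$ for a genuine composite likelihood can be checked numerically. The sketch below (our own toy construction, not from the paper) uses the independence composite likelihood of an equicorrelated bivariate normal with common mean $\theta$ and unit variances: the composite score is $u(\theta,y)=(y_{1}-\theta)+(y_{2}-\theta)$, so $H(\theta)=2$ exactly, while $J(\theta)=\operatorname{Var}(Y_{1}+Y_{2})=2(1+\rho)$.

```python
import random

def simulate_H_and_J(theta=0.0, rho=0.5, n=20000, seed=1):
    """Sensitivity H (exact for this model) and a Monte Carlo estimate of the
    variability J, for the independence composite likelihood of an
    equicorrelated bivariate normal with common mean theta."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # draw (y1, y2) with correlation rho via a Cholesky-style construction
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1 = theta + z1
        y2 = theta + rho * z1 + (1 - rho ** 2) ** 0.5 * z2
        scores.append((y1 - theta) + (y2 - theta))  # composite score u(theta, y)
    H = 2.0                             # -d u / d theta is constant here
    J = sum(s * s for s in scores) / n  # variance of the score (its mean is 0)
    return H, J
```

With `rho = 0.5` the returned values should be close to $H=2$ and $J=3$, so $H\neq J$ and the Godambe information $HJ^{-1}H=4/3$ differs from the naive value $H=2$.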
We are now going to proceed to the definition of the CMDPDE, which is based on the density power divergence measure, defined as follows. For two densities $p$ and $q$ associated with two $m$-dimensional random variables, respectively, the density power divergence (DPD) between $p$ and $q$ was defined in [1] by:
$$d_{\alpha}(p,q)=\int_{\mathbb{R}^{m}}\left\{q^{1+\alpha}(y)-\left(1+\frac{1}{\alpha}\right)q^{\alpha}(y)\,p(y)+\frac{1}{\alpha}p^{1+\alpha}(y)\right\}dy\tag{4}$$
for $\alpha>0$, while for $\alpha=0$, it is defined by:
$$d_{0}(p,q)=\lim_{\alpha\rightarrow 0^{+}}d_{\alpha}(p,q)=d_{KL}(p,q).$$
For $\alpha=1$, Expression (4) reduces to the squared $L_{2}$ distance:
$$d_{1}(p,q)=\int_{\mathbb{R}^{m}}\left(p(y)-q(y)\right)^{2}dy.$$
It is also interesting to note that (4) is a special case of the so-called Bregman divergence. If we consider the convex function $B(x)=x^{1+\alpha}$, we get $\alpha$ times $d_{\alpha}(p,q)$. The parameter $\alpha$ controls the trade-off between robustness and asymptotic efficiency of the parameter estimates (see the Simulation Section), which are the minimizers of this family of divergences. For more details about this family of divergence measures, we refer to [11].
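For intuition, the divergence in (4) can be evaluated numerically for two given densities. A small sketch (ours; the integration bounds and grid size are arbitrary choices, adequate for light-tailed densities):

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def dpd(p, q, alpha, lo=-10.0, hi=10.0, n=4000):
    """Density power divergence d_alpha(p, q), alpha > 0, computed by the
    trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        px, qx = p(x), q(x)
        val = (qx ** (1 + alpha)
               - (1 + 1 / alpha) * qx ** alpha * px
               + (1 / alpha) * px ** (1 + alpha))
        total += val * (0.5 if i in (0, n) else 1.0)
    return total * h
```

A quick check: $d_{\alpha}(p,p)=0$ for any $\alpha>0$ (the integrand vanishes pointwise), and for $\alpha=1$ the value agrees with the squared $L_{2}$ distance $\int(p-q)^{2}$, as noted above.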
In this paper, we are going to consider DPD measures between the density function $g$ and the composite density function $\mathcal{CL}(\theta,\cdot)$, i.e.,
$$d_{\alpha}\left(g,\mathcal{CL}(\theta,\cdot)\right)=\int_{\mathbb{R}^{m}}\left\{\mathcal{CL}(\theta,y)^{1+\alpha}-\left(1+\frac{1}{\alpha}\right)\mathcal{CL}(\theta,y)^{\alpha}g(y)+\frac{1}{\alpha}g^{1+\alpha}(y)\right\}dy\tag{5}$$
for $\alpha>0$, while for $\alpha=0$, we have:
$$d_{0}\left(g,\mathcal{CL}(\theta,\cdot)\right)=d_{KL}\left(g,\mathcal{CL}(\theta,\cdot)\right).$$
The CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, is defined by:
$$\widehat{\theta}^{\alpha}_{c}=\arg\min_{\theta\in\Theta}d_{\alpha}\left(g,\mathcal{CL}(\theta,\cdot)\right).$$
The term:
$$\frac{1}{\alpha}\int_{\mathbb{R}^{m}}g^{1+\alpha}(y)\,dy$$
does not depend on $\theta$, and consequently, the minimization of (5) with respect to $\theta$ is equivalent to minimizing:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy-\left(1+\frac{1}{\alpha}\right)\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{\alpha}g(y)\,dy,$$
or:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy-\left(1+\frac{1}{\alpha}\right)\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{\alpha}\,dG(y).$$
Now, we replace the distribution function $G$ by the empirical distribution function $G_{n}$, and we get:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy-\left(1+\frac{1}{\alpha}\right)\frac{1}{n}\sum_{i=1}^{n}\mathcal{CL}(\theta,y_{i})^{\alpha}.\tag{6}$$
As a consequence, for a fixed value of $\alpha$, the CMDPDE of $\theta$ can be obtained by minimizing the expression given in (6), or equivalently, by maximizing the expression:
$$\left(1+\frac{1}{\alpha}\right)\frac{1}{n}\sum_{i=1}^{n}\mathcal{CL}(\theta,y_{i})^{\alpha}-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy.\tag{7}$$
Under the differentiability of the model, the maximization of the function in Equation (7) leads to an estimating system of equations of the form:
$$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\mathcal{CL}(\theta,y_{i})^{\alpha}-\frac{\alpha}{1+\alpha}\,\frac{\partial}{\partial\theta}\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy=0_{p}.\tag{8}$$
Taking into account that $\frac{\partial}{\partial\theta}\mathcal{CL}(\theta,y)=\mathcal{CL}(\theta,y)\,u(\theta,y)$, the system of Equations (8) can be written as:
$$\frac{1}{n}\sum_{i=1}^{n}\mathcal{CL}(\theta,y_{i})^{\alpha}u(\theta,y_{i})-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}u(\theta,y)\,dy=0_{p},\tag{9}$$
and the CMDPDE $\widehat{\theta}^{\alpha}_{c}$ of $\theta$ is obtained as the solution of (9). For $\alpha=0$ in (9), we have:
$$\frac{1}{n}\sum_{i=1}^{n}u(\theta,y_{i})-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)\,u(\theta,y)\,dy=0_{p},$$
but:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)\,u(\theta,y)\,dy=\int_{\mathbb{R}^{m}}\frac{\partial\mathcal{CL}(\theta,y)}{\partial\theta}\,dy=\frac{\partial}{\partial\theta}\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)\,dy=0_{p},$$
and we recover the estimating equation for the CMLE, $\widehat{\theta}_{c}$, presented in (2).
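To make the passage from (7) to an estimator concrete, consider a toy location model (our construction, not the example of Section 3): two independent unit-variance normal components with common mean $\theta$, so $\mathcal{CL}(\theta,y)=\phi(y_{1}-\theta)\,\phi(y_{2}-\theta)$ and $\int\mathcal{CL}(\theta,y)^{1+\alpha}dy$ does not depend on $\theta$. Maximizing (7) then reduces to maximizing $\sum_{i}\mathcal{CL}(\theta,y_{i})^{\alpha}$, which a simple grid search handles:

```python
import math

def cl_alpha_objective(theta, data, alpha):
    """Data-dependent part of (7) for CL(theta,y) = phi(y1-theta)*phi(y2-theta);
    the integral term is constant in theta for this location model."""
    total = 0.0
    for y1, y2 in data:
        log_cl = -math.log(2 * math.pi) - 0.5 * ((y1 - theta) ** 2 + (y2 - theta) ** 2)
        total += math.exp(alpha * log_cl)
    return total

def cmdpde_location(data, alpha, grid=None):
    """Grid-search CMDPDE of the common mean theta."""
    if grid is None:
        grid = [i / 1000 for i in range(-2000, 4001)]  # theta in [-2, 4]
    return max(grid, key=lambda t: cl_alpha_objective(t, data, alpha))

# three clean pairs near theta = 0 plus one gross outlier
data = [(-0.1, 0.2), (0.1, -0.2), (0.0, 0.1), (10.0, 10.0)]
cmle = sum(y for pair in data for y in pair) / (2 * len(data))  # alpha -> 0 limit
robust = cmdpde_location(data, alpha=0.5)
```

With the gross outlier included, the CMLE (the overall sample mean, about 2.51 here) is dragged far from the true value 0, while the $\alpha=0.5$ CMDPDE stays close to 0 — the robustness trade-off governed by $\alpha$.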
2.1. Asymptotic Distribution of the Composite Minimum Density Power Divergence Estimator
Equation (9) can be written as follows:
$$\frac{1}{n}\sum_{i=1}^{n}\Psi(y_{i},\theta)=0_{p},$$
with:
$$\Psi(y,\theta)=\mathcal{CL}(\theta,y)^{\alpha}u(\theta,y)-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,t)^{1+\alpha}u(\theta,t)\,dt.$$
Therefore, the CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, is an M-estimator. In this case, it is well known (cf. [12]) that the asymptotic distribution of $\widehat{\theta}^{\alpha}_{c}$ is given by:
$$\sqrt{n}\,(\widehat{\theta}^{\alpha}_{c}-\theta)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0_{p},H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}\right),$$
being:
$$H_{\alpha}(\theta)=E\left[-\frac{\partial\Psi(Y,\theta)}{\partial\theta^{T}}\right]\tag{10}$$
and:
$$J_{\alpha}(\theta)=E\left[\Psi(Y,\theta)\Psi(Y,\theta)^{T}\right],\tag{11}$$
the expectations being taken with respect to the density $g$ generating the data. Explicit expressions for $H_{\alpha}(\theta)$ and $J_{\alpha}(\theta)$ follow by direct computation of these expectations. Based on the previous results, we have the following theorem.
Theorem 1. Under suitable regularity conditions, we have:
$$\sqrt{n}\,(\widehat{\theta}^{\alpha}_{c}-\theta)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0_{p},H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}\right),$$
where the matrices $H_{\alpha}(\theta)$ and $J_{\alpha}(\theta)$ were defined in (10) and (11), respectively.

Remark 1. If we apply the previous theorem for $\alpha=0$, then we get the CMLE, and the asymptotic variance-covariance matrix coincides with the inverse of the Godambe information matrix, because $H_{0}(\theta)=H(\theta)$ and $J_{0}(\theta)=J(\theta)$ for $\alpha=0$.

2.2. Wald-Type Test Statistics Based on the Composite Minimum Density Power Divergence Estimator
Wald-type test statistics based on the MDPDE have been considered, with excellent results in relation to robustness, in different statistical problems; see for instance [4,5,6].
Motivated by those works, we focus in this section on the definition and the study of Wald-type test statistics defined by means of the CMDPDE instead of the MDPDE. In this context, if we are interested in testing:
$$H_{0}:\ \theta=\theta_{0}\quad\text{versus}\quad H_{1}:\ \theta\neq\theta_{0},$$
we can consider the family of Wald-type test statistics:
$$W_{n}=n\,(\widehat{\theta}^{\alpha}_{c}-\theta_{0})^{T}H_{\alpha}(\widehat{\theta}^{\alpha}_{c})J_{\alpha}(\widehat{\theta}^{\alpha}_{c})^{-1}H_{\alpha}(\widehat{\theta}^{\alpha}_{c})\,(\widehat{\theta}^{\alpha}_{c}-\theta_{0}).\tag{14}$$
For $\alpha=0$, we get the classical Wald-type test statistic considered in the composite likelihood methods (see for instance [7]).
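For $p=1$, the statistic above reduces to a scalar quadratic form. A sketch (our notation; `H` and `J` stand for consistent estimates of the sensitivity and variability) that also computes the asymptotic p-value, using the identity $P(\chi^{2}_{1}>w)=\operatorname{erfc}(\sqrt{w/2})$:

```python
import math

def wald_statistic(theta_hat, theta0, n, H, J):
    """Scalar Wald-type statistic n (theta_hat - theta0)^2 H J^{-1} H."""
    return n * (theta_hat - theta0) ** 2 * H * H / J

def chi2_1_sf(w):
    """Survival function of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(w / 2.0))

w = wald_statistic(theta_hat=0.30, theta0=0.0, n=100, H=2.0, J=3.0)
p_value = chi2_1_sf(w)
```

Here `w` equals 12, well beyond the 5% critical value 3.84, so the null hypothesis would be rejected at the usual levels.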
In the following theorem, we present the asymptotic null distribution of the family of Wald-type test statistics $W_{n}$.
Theorem 2. The asymptotic distribution of the Wald-type test statistics given in (14) is a chi-square distribution with $p$ degrees of freedom.

Theorem 3. Let $\theta^{*}$ be the true value of the parameter $\theta$, with $\theta^{*}\neq\theta_{0}$. Then, it holds:
$$\sqrt{n}\left(\frac{W_{n}}{n}-\ell_{\alpha}(\theta^{*})\right)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0,\sigma^{2}_{\alpha}(\theta^{*})\right),$$
being:
$$\ell_{\alpha}(\theta^{*})=(\theta^{*}-\theta_{0})^{T}H_{\alpha}(\theta^{*})J_{\alpha}(\theta^{*})^{-1}H_{\alpha}(\theta^{*})\,(\theta^{*}-\theta_{0})$$
and:
$$\sigma^{2}_{\alpha}(\theta^{*})=4\,(\theta^{*}-\theta_{0})^{T}H_{\alpha}(\theta^{*})J_{\alpha}(\theta^{*})^{-1}H_{\alpha}(\theta^{*})\,\Sigma_{\alpha}(\theta^{*})\,H_{\alpha}(\theta^{*})J_{\alpha}(\theta^{*})^{-1}H_{\alpha}(\theta^{*})\,(\theta^{*}-\theta_{0}),$$
with $\Sigma_{\alpha}(\theta)=H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}$.

Remark 2. Based on the previous result, we can approximate the power of the Wald-type test statistics at $\theta^{*}$ by:
$$\pi_{n,\alpha}(\theta^{*})\approx 1-\Phi_{n}\left(\frac{\sqrt{n}}{\sigma_{\alpha}(\theta^{*})}\left(\frac{\chi^{2}_{p,\tau}}{n}-\ell_{\alpha}(\theta^{*})\right)\right),$$
where $\chi^{2}_{p,\tau}$ denotes the upper $\tau$-quantile of the chi-square distribution with $p$ degrees of freedom and $\Phi_{n}$ is a sequence of distribution functions tending uniformly to the standard normal distribution function $\Phi$. It is clear that:
$$\lim_{n\rightarrow\infty}\pi_{n,\alpha}(\theta^{*})=1$$
for all $\theta^{*}\neq\theta_{0}$. Therefore, the Wald-type test statistics are consistent in the sense of Fraser.

In many practical hypothesis testing problems, the restricted parameter space $\Theta_{0}\subseteq\Theta$ is defined by a set of $r$ restrictions of the form:
$$m(\theta)=0_{r}\tag{16}$$
on $\Theta$, where $m:\Theta\rightarrow\mathbb{R}^{r}$ is a vector-valued function such that the $p\times r$ matrix:
$$M(\theta)=\frac{\partial m(\theta)^{T}}{\partial\theta}\tag{17}$$
exists and is continuous in $\theta$, with $\operatorname{rank}\left(M(\theta)\right)=r$; here, $0_{r}$ denotes the null vector of dimension $r$.
Now, we are going to consider composite null hypotheses, $H_{0}$, of the form considered in (16), and our interest is in testing:
$$H_{0}:\ m(\theta)=0_{r}\quad\text{versus}\quad H_{1}:\ m(\theta)\neq 0_{r}\tag{18}$$
on the basis of a random sample $y_{1},\dots,y_{n}$ of size $n$.
Definition 1. The family of Wald-type test statistics for testing (18) is given by:
$$W_{n}=n\,m(\widehat{\theta}^{\alpha}_{c})^{T}\left(M(\widehat{\theta}^{\alpha}_{c})^{T}\Sigma_{\alpha}(\widehat{\theta}^{\alpha}_{c})M(\widehat{\theta}^{\alpha}_{c})\right)^{-1}m(\widehat{\theta}^{\alpha}_{c}),\tag{19}$$
where $\Sigma_{\alpha}(\theta)=H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}$, the matrices $M(\theta)$, $H_{\alpha}(\theta)$ and $J_{\alpha}(\theta)$ were defined in (17), (10) and (11), respectively, and the function $m(\theta)$ in (16). If we consider $\alpha=0$, then $\widehat{\theta}^{0}_{c}$ coincides with the CMLE, $\widehat{\theta}_{c}$, of $\theta$, $\Sigma_{0}(\theta)$ coincides with the inverse of the Godambe information matrix, and then, we get the classical Wald test statistic considered in the composite likelihood methods.
In the next theorem, we present the asymptotic null distribution of $W_{n}$.

Theorem 4. The asymptotic distribution of the Wald-type test statistics, given in (19), is a chi-square distribution with $r$ degrees of freedom.

Consider the null hypothesis $H_{0}:m(\theta)=0_{r}$. By Theorem 4, the null hypothesis should be rejected if $W_{n}>\chi^{2}_{r,\tau}$, the upper $\tau$-quantile of the chi-square distribution with $r$ degrees of freedom. The following theorem can be used to approximate the power function. Assume that $\theta^{*}$ is the true value of the parameter, so that $m(\theta^{*})\neq 0_{r}$.
Theorem 5. Let $\theta^{*}$ be the true value of the parameter, with $m(\theta^{*})\neq 0_{r}$. Then, it holds:
$$\sqrt{n}\left(\frac{W_{n}}{n}-\ell^{*}_{\alpha}(\theta^{*})\right)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0,\sigma^{*2}_{\alpha}(\theta^{*})\right),$$
being:
$$\ell^{*}_{\alpha}(\theta^{*})=m(\theta^{*})^{T}\left(M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\right)^{-1}m(\theta^{*})$$
and:
$$\sigma^{*2}_{\alpha}(\theta^{*})=4\,m(\theta^{*})^{T}\left(M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\right)^{-1}M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\left(M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\right)^{-1}m(\theta^{*}),$$
with $\Sigma_{\alpha}(\theta)=H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}$.

3. Numerical Example
In this section, we consider an example, studied previously in [8], in order to examine the robustness of the CMLE. The aim of this section is to illustrate the different issues discussed in the previous sections.
Consider the random vector $Y=(Y_{1},Y_{2},Y_{3},Y_{4})^{T}$, which follows a four-dimensional normal distribution with mean vector $\mu=(\mu_{1},\mu_{2},\mu_{3},\mu_{4})^{T}$ and variance-covariance matrix $\Sigma$; i.e., we suppose that the correlation between $Y_{1}$ and $Y_{2}$ is the same as the correlation between $Y_{3}$ and $Y_{4}$, both equal to $\rho$. Taking into account that $\Sigma$ should be positive semi-definite, a corresponding condition on $\rho$ is imposed. In order to avoid several problems regarding the consistency of the CMLE of the parameter $\rho$ (cf. [8]), we shall consider the composite likelihood function:
$$\mathcal{CL}(\theta,y)=f_{A_{1}}(y_{1},y_{2};\theta)\,f_{A_{2}}(y_{3},y_{4};\theta),$$
where $f_{A_{1}}$ and $f_{A_{2}}$ are the densities of the corresponding bivariate marginals of $Y$, i.e., bivariate normal distributions with mean vectors $(\mu_{1},\mu_{2})^{T}$ and $(\mu_{3},\mu_{4})^{T}$, respectively, and common variance-covariance matrix:
$$\Sigma^{*}=\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix},$$
with densities given by:
$$f_{A_{j}}(y_{2j-1},y_{2j};\theta)=\frac{1}{2\pi\sqrt{1-\rho^{2}}}\exp\left\{-\frac{q_{j}}{2(1-\rho^{2})}\right\},\quad j=1,2,$$
being:
$$q_{j}=(y_{2j-1}-\mu_{2j-1})^{2}-2\rho\,(y_{2j-1}-\mu_{2j-1})(y_{2j}-\mu_{2j})+(y_{2j}-\mu_{2j})^{2}.$$
By $\theta$, we denote the parameter vector of our model, i.e., $\theta=(\mu_{1},\mu_{2},\mu_{3},\mu_{4},\rho)^{T}$. The system of equations that must be solved in order to obtain the CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, is derived in Appendix A.4. After some heavy algebraic manipulations, specified in Appendix A.5, explicit expressions for the sensitivity and variability matrices are also obtained.
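As a simplified illustration of this composite likelihood in practice (our own sketch: the means are taken known and equal to zero, unit variances are assumed, and $\alpha=0$, so this computes the CMLE of $\rho$ rather than the full five-parameter CMDPDE), $\rho$ can be recovered by maximizing $\log f_{A_{1}}+\log f_{A_{2}}$ over a grid:

```python
import math, random

def block_cloglik(sample, rho):
    """Composite log-likelihood CL = f(y1, y2) * f(y3, y4), with standard
    bivariate normal blocks sharing the correlation rho (zero means)."""
    c = -math.log(2 * math.pi) - 0.5 * math.log(1 - rho ** 2)
    total = 0.0
    for y in sample:
        for a, b in ((y[0], y[1]), (y[2], y[3])):
            total += c - (a * a - 2 * rho * a * b + b * b) / (2 * (1 - rho ** 2))
    return total

def draw_sample(rho, n, seed=7):
    """Draw n four-vectors: two independent blocks, each with correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = []
        for _ in range(2):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            y += [z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2]
        out.append(tuple(y))
    return out

sample = draw_sample(rho=0.5, n=2000)
grid = [i / 100 for i in range(-95, 96)]
rho_hat = max(grid, key=lambda r: block_cloglik(sample, r))
```

With 2000 simulated four-vectors, `rho_hat` should land very close to the true value 0.5.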
Simulation Study
A simulation study, carried out in the R statistical programming environment, is presented in order to study the behavior of the CMDPDE, as well as the behavior of the Wald-type test statistics based on them. The theoretical model studied in the previous example is considered, with parameter vector $\theta=(\mu_{1},\mu_{2},\mu_{3},\mu_{4},\rho)^{T}$, and we are interested in studying the behavior of the CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, as well as the behavior of the Wald-type test statistics for testing the null hypothesis stated in (29).
Through R = 10,000 replications of the simulation experiment, we compare, for different values of $\alpha$, the corresponding CMDPDEs through the root mean square error (RMSE), for fixed true values of the parameters. We pay special attention to the problem of the existence of outliers in the sample, generating a fraction of each sample from a contaminated version of the model in which the mean vector and the correlation coefficient, respectively, are modified. Notice that, although the case $\rho=0$ has been considered, this case is less important taking into account the theoretical model under consideration: for independent observations, the composite likelihood theory is not needed. Results are presented in Table 1 and Table 2. Two points deserve our attention. The first one is that, as expected, RMSEs for contaminated data are always greater than RMSEs for pure data, and the RMSEs decrease when the sample size $n$ increases. The second is that, while for pure data RMSEs are greater for big values of $\alpha$, when working with contaminated data, the CMDPDEs with medium-low values of $\alpha$ present the best behavior in terms of efficiency. These statements are also true for larger levels of contamination, noting that, when larger contamination percentages are considered, larger values of $\alpha$ also become advisable in terms of efficiency (see Table 3, Table 4 and Table 5, corresponding to increasing levels of contamination). Considering the mean absolute error (MAE) for the evaluation of the accuracy, we obtain similar results (Table 6).
For a fixed nominal size, with the model under the null hypothesis given in (29), the estimated significance levels for the different Wald-type test statistics are given by:
$$\widehat{\alpha}_{n}=\frac{1}{R}\sum_{j=1}^{R}I\left(W_{n}^{(j)}>c\right),$$
with $I(S)$ being the indicator function (with a value of one if $S$ is true and zero otherwise), $W_{n}^{(j)}$ the Wald-type test statistic computed in the $j$-th replication and $c$ the corresponding chi-square critical value. Empirical levels with the same previous parameter values are presented in Table 7 (pure data) and Table 8 (data with outliers). While medium-high values of $\alpha$ are not recommended at all, the CMLE is generally the best choice when working with pure data. However, the lack of robustness of the CMLE-based test is striking, as can be seen in Table 8. The effect of contamination for medium-low values of $\alpha$ is much lighter, while for medium-high values of $\alpha$, it can even turn out to be deceptively beneficial.
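The estimated level above is plain Monte Carlo and can be reproduced for any test. A self-contained sketch (ours), with a simple one-sample Wald test on a normal mean standing in for the composite Wald-type statistics:

```python
import random

def empirical_level(R=2000, n=100, mu0=0.0, seed=11):
    """Fraction of replications in which the Wald statistic exceeds the
    chi-square(1) critical value at nominal size 0.05."""
    crit = 3.841458820694124  # upper 0.05 quantile of chi-square with 1 df
    rng = random.Random(seed)
    rejections = 0
    for _ in range(R):
        xs = [rng.gauss(mu0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        w = n * (xbar - mu0) ** 2 / s2  # Wald statistic for H0: mu = mu0
        rejections += w > crit
    return rejections / R
```

With pure data, the returned fraction should be close to the nominal 0.05; repeating the experiment with contaminated draws is the analogue of Table 8.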
For finite sample sizes and a fixed nominal size, the simulated powers are obtained under alternative hypotheses to (29) (Table 9 and Table 10). The (simulated) power for the different composite Wald-type test statistics is obtained by:
$$\widehat{\pi}_{n}=\frac{1}{R}\sum_{j=1}^{R}I\left(W_{n}^{(j)}>c\right),$$
with the replications now generated under the alternative. As expected, the power decreases when we get closer to the null hypothesis and when the sample size decreases. With pure data, the best behavior is obtained for low values of $\alpha$, while for contaminated data, the best results are obtained for medium values of $\alpha$.