Robust Inference after Random Projections via Hellinger Distance for Location-Scale Family

Li, Lei; Vidyashankar, Anand N.; Diao, Guoqing; Ahmed, Ejaz

doi:10.3390/e21040348


Lei Li ¹, Anand N. Vidyashankar ¹,*, Guoqing Diao ¹ and Ejaz Ahmed ²

¹ Department of Statistics, George Mason University, Fairfax, VA 22030, USA

² Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada

* Author to whom correspondence should be addressed.

Entropy 2019, 21(4), 348; https://doi.org/10.3390/e21040348

Submission received: 6 February 2019 / Revised: 23 March 2019 / Accepted: 24 March 2019 / Published: 29 March 2019

(This article belongs to the Special Issue New Developments in Statistical Information Theory Based on Entropy and Divergence Measures)


Abstract:

Big data and streaming data are encountered in a variety of contemporary applications in business and industry. In such cases, it is common to use random projections to reduce the dimension of the data, yielding compressed data. These data, however, possess various anomalies such as heterogeneity, outliers, and round-off errors, which are hard to detect due to volume and processing challenges. This paper describes a new robust and efficient methodology, using Hellinger distance, to analyze the compressed data. Using large sample methods and numerical experiments, it is demonstrated that routine use of a robust estimation procedure is feasible. The role of double limits in understanding the efficiency and robustness is brought out, which is of independent interest.

Keywords:

compressed data; Hellinger distance; representation formula; iterated limits; influence function; consistency; asymptotic normality; location-scale family

1. Introduction

Streaming data are commonly encountered in several business and industrial applications, leading to so-called Big Data. These are commonly characterized using four V’s: velocity, volume, variety, and veracity. Velocity refers to the speed of data processing, while volume refers to the amount of data. Variety refers to the various types of data, while veracity refers to uncertainty and imprecision in the data. It is believed that veracity is due to data inconsistencies, incompleteness, and approximations. Whatever the real cause, it is hard to identify and pre-process data for veracity in a big data setting. The issues are even more complicated when the data are streaming.

A consequence of data veracity is that the statistical assumptions used for analytics tend to be inaccurate. Specifically, considerations such as model misspecification, statistical efficiency, robustness, and uncertainty assessment, which are a standard part of the statistical toolkit, cannot be routinely carried out due to storage limitations. Statistical methods that simultaneously address the twin problems of volume and veracity would enhance the value of big data. While the health care and financial industries would be the prime beneficiaries of this technology, the methods can be routinely applied in a variety of problems that use big data for decision making.

We consider a collection of $n$ observations ($n$ is of the order of at least $10^{6}$), assumed to be independent and identically distributed (i.i.d.), from a probability distribution $f(\cdot)$ belonging to a location-scale family; that is,

$$f(x; \mu, \sigma) = \frac{1}{\sigma} f\left(\frac{x - \mu}{\sigma}\right), \quad \mu \in \mathbb{R},\ \sigma > 0.$$

We denote by $\Theta$ the parameter space and, without loss of generality, take it to be compact, since otherwise it can be re-parametrized in such a way that the resulting parameter space is compact (see [1]).
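As a small numerical illustration of the location-scale construction (not part of the original analysis), the Python sketch below builds $f(x; \mu, \sigma)$ from a standard density and checks with a Riemann sum on a fine grid that it integrates to one, with mean $\mu$ and variance $\sigma^{2}$. Taking the standard normal as $f$, and the helper names `standard_normal` and `location_scale`, are assumptions made for this example; the mean and variance checks rely on $f$ having mean 0 and variance 1.

```python
import numpy as np

def standard_normal(z):
    """Standard normal density, used here as the base function f (illustrative choice)."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def location_scale(f, mu, sigma):
    """Return the density x -> (1 / sigma) * f((x - mu) / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: f((x - mu) / sigma) / sigma

# Uniform grid wide enough that the mass outside it is negligible.
x = np.linspace(-30.0, 30.0, 60001)
dx = x[1] - x[0]
dens = location_scale(standard_normal, mu=2.0, sigma=3.0)(x)

total_mass = np.sum(dens) * dx                 # close to 1
mean = np.sum(x * dens) * dx                   # close to mu = 2
var = np.sum((x - mean) ** 2 * dens) * dx      # close to sigma^2 = 9
print(total_mass, mean, var)
```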

The purpose of this paper is to describe a methodology for joint robust and efficient estimation of $\mu$ and $\sigma^{2}$ that takes into account (i) storage issues, (ii) potential model misspecifications, and (iii) the presence of aberrant outliers. These issues, which are more likely to occur when dealing with massive amounts of data, can lead to inaccurate inference and misleading conclusions if they are not appropriately accounted for in the methodological development. On the other hand, incorporating them into the existing methodology may not be feasible due to the computational burden.

Hellinger distance-based methods have long been used to handle the dual issue of robustness and statistical efficiency. Since the work of [1,2], statistical methods that invoke alternative objective functions, which converge to the objective function under the posited model, have been developed, and these methods have been shown to possess efficiency and robustness. However, their routine use in big data problems is not feasible due to computational complexity and other statistical challenges. Recently, a class of algorithms, referred to as Divide and Conquer, has been developed to address some of these issues in the context of likelihood. These algorithms distribute the data across multiple processors and, for the problem under consideration, estimate the parameters from each processor separately and then combine them to obtain an overall estimate. The algorithm assumes the availability of several processors with substantial processing power to solve the complex problem at hand. Since robust procedures involve complex iterative computations, which increase the demand for high-speed processors and enhanced memory, routine use of available analytical methods in a big data setting is challenging. Maximum likelihood estimation in the context of location-scale families of distributions has received much attention in the literature ([3,4,5,6,7]). It is well known that the maximum likelihood estimators (MLE) of location-scale families may not exist unless the defining function $f(\cdot)$ satisfies certain regularity conditions. Hence, it is natural to ask whether other methods of estimation, such as the minimum Hellinger distance estimator (MHDE), can be used under weaker regularity conditions. This manuscript provides a first step towards addressing this question.

Random projections and sparse random projections are increasingly being used to “compress data”, and the resulting compressed data are then used for inference. The methodology, primarily developed by computer scientists, is gaining attention in the statistical community and has been investigated in a variety of recent work ([8,9,10,11,12]). In this manuscript, we describe a Hellinger distance-based methodology for robust and efficient estimation after the use of random projections to compress i.i.d. data belonging to a location-scale family. The proposed method reduces the dimension of the data to ease computation and simultaneously maintains robustness and efficiency when the posited model is correct. While primarily developed to handle big and streaming data, the approach can also be used to handle privacy issues in a variety of applications [13].

The rest of the paper is organized as follows: Section 2 provides background on minimum Hellinger distance estimation; Section 3 is concerned with the development of Hellinger distance-based methods for compressed data obtained after using random projections; additionally, it contains the main results and their proofs. Section 4 contains results of the numerical experiments and also describes an algorithm for implementation of the proposed methods. Section 5 contains a real data example from financial analytics. Section 6 is concerned with discussions and extensions. Section 7 contains some concluding remarks.

2. Background on Minimum Hellinger Distance Estimation

Ref. [1] proposed minimum Hellinger distance (MHD) estimation for i.i.d. observations and established that MHD estimators (MHDE) are simultaneously robust and first-order efficient under the true model. Other researchers have investigated related estimators, for example, [14,15,16,17,18,19,20]. These authors establish that when the model is correct, the MHDE is asymptotically equivalent to the maximum likelihood estimator (MLE) in a variety of independent and dependent data settings. For a comprehensive discussion of minimum divergence theory see [21].

We begin by recalling that the Hellinger distance between two probability densities is the $L^{2}$ distance between the square roots of the densities. Specifically, for $p \geq 1$, let $\|\cdot\|_{p}$ denote the $L^{p}$ norm defined by

$$\|h\|_{p} = \left\{\int |h|^{p}\right\}^{1/p}.$$

The Hellinger distance between the densities $f(\cdot)$ and $g(\cdot)$ is given by

$$H^{2}(f(\cdot), g(\cdot)) = \left\| f^{1/2}(\cdot) - g^{1/2}(\cdot) \right\|_{2}^{2}.$$
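With this definition (no factor 1/2 in front of the integral), $H^{2}$ takes values in $[0, 2]$. The sketch below, using two normal densities with a common variance as an assumed test case, approximates $H^{2}$ by a Riemann sum and compares it with the closed form $2\{1 - \exp(-(\mu_{1} - \mu_{2})^{2}/(8\sigma^{2}))\}$; the helper names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def hellinger_sq(f_vals, g_vals, dx):
    """H^2(f, g) = ||f^{1/2} - g^{1/2}||_2^2, approximated on a uniform grid."""
    return np.sum((np.sqrt(f_vals) - np.sqrt(g_vals)) ** 2) * dx

x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
h2 = hellinger_sq(normal_pdf(x, 0.0, 1.0), normal_pdf(x, 1.0, 1.0), dx)

# Closed form for N(mu1, s^2) versus N(mu2, s^2): 2 * (1 - exp(-(mu1 - mu2)^2 / (8 s^2)))
h2_exact = 2.0 * (1.0 - np.exp(-1.0 / 8.0))
print(h2, h2_exact)
```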

Let $f(\cdot | \theta)$ denote the density of $\mathbb{R}^{d}$-valued independent and identically distributed random variables $X_{1}, \dots, X_{n}$, where $\theta \in \Theta \subset \mathbb{R}^{p}$, and let $g_{n}(\cdot)$ be a nonparametric density estimate (typically a kernel density estimator). The Hellinger distance between $f(\cdot | \theta)$ and $g_{n}(\cdot)$ is then

$$H^{2}(f(\cdot | \theta), g_{n}(\cdot)) = \left\| f^{1/2}(\cdot | \theta) - g_{n}^{1/2}(\cdot) \right\|_{2}^{2}.$$

The MHDE is a mapping $T(\cdot)$ from the set of all densities to $\mathbb{R}^{p}$ defined as follows:

$$\theta_{g} = T(g) = \underset{\theta \in \Theta}{\arg\min}\; H^{2}(f(\cdot | \theta), g(\cdot)). \tag{1}$$

Please note that, since $H^{2}(f(\cdot | \theta), g(\cdot)) = 2 - 2A(f(\cdot | \theta), g(\cdot))$ for any two densities, the above minimization problem is equivalent to maximizing the affinity $A(f(\cdot | \theta), g(\cdot)) = \int f^{1/2}(x | \theta)\, g^{1/2}(x)\, dx$. Hence the MHDE can alternatively be defined as

$$\theta_{g} = \underset{\theta \in \Theta}{\arg\max}\; A(f(\cdot | \theta), g(\cdot)).$$
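To make the definition concrete, the following sketch computes an MHDE for a normal location-scale model by maximizing the affinity over a grid of $(\mu, \sigma)$ values, with $g_{n}$ a Gaussian-kernel density estimate. The grid search, the fixed bandwidth of 0.3, and the synthetic data (950 draws from $N(0, 1)$ plus 50 gross errors placed at 10) are illustrative assumptions, not the procedure of [1]. Under these assumptions the MHDE should stay near $(0, 1)$, with the scale slightly inflated to roughly $\sqrt{1 + 0.3^{2}}$ by the kernel smoothing, whereas the sample mean and standard deviation (the normal MLE) are pulled towards the outliers.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_kde(grid, data, bandwidth):
    """Kernel density estimate g_n on `grid` with a standard normal kernel."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    return normal_pdf(z, 0.0, 1.0).mean(axis=1) / bandwidth

def mhde_normal(data, mu_grid, sigma_grid, bandwidth):
    """Grid-search MHDE: maximize A(f(.|mu, sigma), g_n) = int f^{1/2} g_n^{1/2}."""
    y = np.linspace(data.min() - 10.0, data.max() + 10.0, 4001)
    dy = y[1] - y[0]
    sqrt_g = np.sqrt(gaussian_kde(y, data, bandwidth))
    best_affinity, best_theta = -np.inf, None
    for mu in mu_grid:
        # Affinities for all sigma values at once, shape (len(sigma_grid),)
        f = normal_pdf(y[None, :], mu, sigma_grid[:, None])
        affinity = np.sum(np.sqrt(f) * sqrt_g, axis=1) * dy
        k = int(np.argmax(affinity))
        if affinity[k] > best_affinity:
            best_affinity, best_theta = affinity[k], (mu, sigma_grid[k])
    return best_theta

# 95% N(0, 1) data plus 5% gross errors at 10
data = np.concatenate([rng.normal(0.0, 1.0, 950), np.full(50, 10.0)])
mu_hat, sigma_hat = mhde_normal(data, np.linspace(-2.0, 2.0, 81),
                                np.linspace(0.5, 3.0, 51), bandwidth=0.3)
mle_mean, mle_sd = data.mean(), data.std()
print("MHDE:", mu_hat, sigma_hat, " MLE:", mle_mean, mle_sd)
```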

To study the robustness of the MHDE, Ref. [1] showed that, to assess the robustness of a functional with respect to the gross-error model, it is necessary to examine the $\alpha$-influence curve rather than the influence curve, except when the influence curve provides a uniform approximation to the $\alpha$-influence curve. Specifically, the $\alpha$-influence function $IF_{\alpha}(\theta, z)$ is defined as follows: for $\theta \in \Theta$, let $f_{\alpha,\theta,z} = (1 - \alpha) f(\cdot | \theta) + \alpha \eta_{z}$, where $\eta_{z}$ denotes the uniform density on the interval $(z - \epsilon, z + \epsilon)$ with $\epsilon > 0$ small, $\alpha \in (0, 1)$, and $z \in \mathbb{R}$; the $\alpha$-influence function is then defined to be

$$IF_{\alpha}(\theta, z) = \frac{T(f_{\alpha,\theta,z}) - \theta}{\alpha}, \tag{2}$$

where $T(f_{\alpha,\theta,z})$ is the functional evaluated at the model with density $f_{\alpha,\theta,z}(\cdot)$. Equation (2) gives a complete description of the behavior of the estimator in the presence of contamination, up to the shape of the contaminating density. If $IF_{\alpha}(\theta, z)$ is a bounded function of $z$ such that $\lim_{z \to \infty} IF_{\alpha}(\theta, z) = 0$ for every $\theta \in \Theta$, then the functional $T$ is robust at $f(\cdot | \theta)$ against $100\alpha\%$ contamination by gross errors at arbitrarily large values $z$. The influence function is obtained by letting $\alpha \to 0$. Under standard regularity conditions, minimum divergence estimators (MDE) are first-order efficient and have the same influence function as the MLE under the model, which is often unbounded. Hence the robustness of these estimators cannot be explained through their influence functions. In contrast, the $\alpha$-influence functions of these estimators are often bounded, continuous functions of the contaminating point. Finally, this approach often leads to high breakdown points in parametric estimation. Other explanations can also be found in [22,23].
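The $\alpha$-influence function in (2) can be evaluated numerically. The sketch below does so for the MHD location functional of a $N(\theta, 1)$ model with the scale treated as known (a simplification made only for this example), $\alpha = 0.1$ and $\epsilon = 0.1$; the functional is computed by maximizing the affinity over a fine grid of locations. Under these assumptions the computed values should be positive and of order one for contamination points $z$ near the bulk of the model, and essentially zero for $z = 10$, in line with the bounded, vanishing behavior described above. All function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

y = np.linspace(-15.0, 25.0, 8001)        # integration grid, dy = 0.005
dy = y[1] - y[0]
mu_grid = np.linspace(-1.0, 2.0, 3001)    # candidate values of the location functional

def mhd_location(mix_vals):
    """T(g): maximize int f^{1/2}(y | mu, 1) g^{1/2}(y) dy over mu_grid (sigma = 1 known)."""
    sqrt_g = np.sqrt(mix_vals)
    affinity = [np.sum(np.sqrt(normal_pdf(y, mu, 1.0)) * sqrt_g) * dy for mu in mu_grid]
    return mu_grid[int(np.argmax(affinity))]

def alpha_influence(z, alpha=0.1, eps=0.1, theta=0.0):
    """IF_alpha(theta, z) = (T(f_{alpha, theta, z}) - theta) / alpha, as in (2)."""
    eta_z = np.where(np.abs(y - z) < eps, 1.0 / (2.0 * eps), 0.0)  # uniform on (z - eps, z + eps)
    mix = (1.0 - alpha) * normal_pdf(y, theta, 1.0) + alpha * eta_z
    return (mhd_location(mix) - theta) / alpha

if_alpha = {z: alpha_influence(z) for z in [0.0, 1.0, 2.0, 4.0, 10.0]}
print(if_alpha)
```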

Ref. [1] showed that the MHDE of location has a breakdown point equal to 50%. Roughly speaking, the breakdown point is the smallest fraction of data that, when strategically placed, can cause an estimator to take arbitrary values. Ref. [24] obtained breakdown results for the MHDE of multivariate location and covariance. They showed that the affine-invariant MHDE for multivariate location and covariance has a breakdown point of at least 25%. Ref. [18] showed that the MHDE has 50% breakdown in some discrete models.

3. Hellinger Distance Methodology for Compressed Data

In this section, we describe the Hellinger distance-based methodology as applied to the compressed data. Since we are seeking to model streaming independent and identically distributed data, we denote by $J$ the number of observations in a fixed time interval (for instance, every ten minutes, every half-hour, or every three hours). Let $B$ denote the total number of time intervals. Alternatively, $B$ could also represent the number of sources from which the data are collected. Then, the incoming data can be expressed as $\{X_{jl},\ 1 \leq j \leq J;\ 1 \leq l \leq B\}$. Throughout this paper, we assume that the density of $X_{jl}$ belongs to a location-scale family and is given by $f(x; \theta^{*}) = \frac{1}{\sigma^{*}} f\left(\frac{x - \mu^{*}}{\sigma^{*}}\right)$, where $\theta^{*} = (\mu^{*}, \sigma^{*})$. A typical example is a data store receiving data from multiple sources, for instance financial or healthcare organizations, where information from multiple sources across several hours is used to monitor events of interest such as cumulative usage of certain financial instruments or drugs.

3.1. Random Projections

Let $R_{l} = (r_{ijl})$ be an $S \times J$ matrix, where $S$ is the number of compressed observations in each time interval, $S \ll J$, and the $r_{ijl}$’s are independent and identically distributed random variables, assumed to be independent of $\{X_{jl},\ j = 1, 2, \dots, J;\ 1 \leq l \leq B\}$. Let

$$\tilde{Y}_{il} = \sum_{j=1}^{J} r_{ijl} X_{jl}$$

and set $\tilde{Y}_{l} = (\tilde{Y}_{1l}, \dots, \tilde{Y}_{Sl})'$; in matrix form, with $X_{l} = (X_{1l}, \dots, X_{Jl})'$, this can be expressed as $\tilde{Y}_{l} = R_{l} X_{l}$. The matrix $R_{l}$ is referred to as the sensing matrix, and $\{\tilde{Y}_{il},\ i = 1, 2, \dots, S;\ l = 1, 2, \dots, B\}$ is referred to as the compressed data. The total number of compressed observations $m = SB$ is much smaller than the number of original observations $n = JB$. We notice here that the $R_{l}$’s are independent and identically distributed random matrices of order $S \times J$. Referring to each time interval or source as a group, Table 1 gives a tabular representation of the compressed data.
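The compression step can be sketched in a few lines of Python with NumPy. Here the observations of group $l$ form column $l$ of a $J \times B$ array, and the sensing entries are drawn from $N(1, \gamma_{0}^{2})$; this Gaussian choice, the array layout, and the function name `compress` are assumptions for illustration, since the text only requires $E[r_{ijl}] = 1$ and $\mathrm{Var}[r_{ijl}] = \gamma_{0}^{2}$.

```python
import numpy as np

def compress(X, S, gamma0, rng):
    """Compress each group with its own sensing matrix: Ytilde_l = R_l X_l.

    X has shape (J, B); column l holds the J observations X_{1l}, ..., X_{Jl}.
    Entries r_{ijl} are drawn i.i.d. N(1, gamma0^2), one choice with
    E[r] = 1 and Var[r] = gamma0^2 (illustrative only).
    Returns Ytilde with shape (S, B) and R with shape (B, S, J).
    """
    J, B = X.shape
    R = rng.normal(loc=1.0, scale=gamma0, size=(B, S, J))
    Ytilde = np.einsum("lsj,jl->sl", R, X)   # Ytilde[i, l] = sum_j R[l, i, j] * X[j, l]
    return Ytilde, R

rng = np.random.default_rng(1)
J, B, S = 1000, 200, 2
X = rng.normal(loc=0.5, scale=2.0, size=(J, B))
Ytilde, R = compress(X, S, gamma0=0.1, rng=rng)
print(Ytilde.shape, f"store {S * B} of {J * B} values (m/n = {S / J})")
```

With $J = 1000$ and $S = 2$, only $2B$ projected values (plus the sensing information) are retained instead of $1000B$ raw observations.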

In the random projections literature, the distribution of $r_{ijl}$ is typically taken to be Gaussian, but other distributions such as the Rademacher, exponential, and extreme value distributions are also used (see, for instance, [25]). In this paper, we do not make any strong distributional assumptions on $r_{ijl}$. We only assume that $E[r_{ijl}] = 1$ and $\mathrm{Var}[r_{ijl}] = \gamma_{0}^{2}$, where $E[\cdot]$ and $\mathrm{Var}[\cdot]$ denote the expectation and the variance of a random variable. Additionally, we denote the density of $r_{ijl}$ by $q(\cdot)$.

We next return to the storage issue. When $S = 1$ and $r_{ijl} = 1$, $\tilde{Y}_{il}$ is a sum of $J$ random variables. In this case, one retains (stores) only the sum of $J$ observations, and robust estimates of $\theta^{*}$ are sought using the sum of observations. In other situations, that is, when the $r_{ijl}$ are not degenerate at 1, the distribution of $\tilde{Y}_{il}$ is complicated. Indeed, even if the $r_{ijl}$ are assumed to be normally distributed, the marginal distribution of $\tilde{Y}_{il}$ is complicated. The conditional distribution of $\tilde{Y}_{il}$ given the $r_{ijl}$ is that of a weighted sum of location-scale random variables and does not, in general, have a useful closed-form expression. Hence, in general, the MLE method is not feasible for these problems. We denote

$$\omega_{il}^{2} = \sum_{j=1}^{J} r_{ijl}^{2}$$

and work with the random variables $Y_{il} \equiv \omega_{il}^{-1} \tilde{Y}_{il}$. We denote the true density of $Y_{il}$ by $h_{J}(\cdot | \theta^{*}, \gamma_{0})$. Also, when $\gamma_{0} = 0$ (which implies $r_{ijl} \equiv 1$), we denote the true density of $Y_{il}$ by $h^{*J}(\cdot | \theta^{*})$ to emphasize that the true density is, up to the scaling by $\omega_{il} = J^{1/2}$, a convolution of $J$ independent and identically distributed random variables.
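For Gaussian $X_{jl}$, an assumption made only for this illustration, the normalization is easy to check: conditional on the sensing variables, $Y_{il}$ is normal with mean $\mu \sum_{j} r_{ijl}/\omega_{il}$ and variance $\sigma^{2}$, and when $\gamma_{0} = 0$ this reduces to $N(J^{1/2}\mu, \sigma^{2})$, which is $h^{*J}(\cdot | \theta)$ in this case. The simulation below (with $S = 1$ and hypothetical parameter values) compares $\gamma_{0} = 0$ with $\gamma_{0} = 0.5$; in the latter case the mean is expected to shrink by a factor of roughly $(1 + \gamma_{0}^{2})^{-1/2}$, since $\sum_{j} r_{ijl} \approx J$ and $\omega_{il}^{2} \approx J(1 + \gamma_{0}^{2})$ for large $J$.

```python
import numpy as np

rng = np.random.default_rng(2)
J, B, mu, sigma = 50, 20000, 0.5, 2.0

def normalized_projection(gamma0):
    """Return Y_{1l} = Ytilde_{1l} / omega_{1l}, l = 1..B, for S = 1 and Gaussian X (illustrative)."""
    X = rng.normal(mu, sigma, size=(B, J))      # row l holds the J observations of group l
    r = rng.normal(1.0, gamma0, size=(B, J))    # sensing variables r_{1jl}
    Ytilde = np.sum(r * X, axis=1)              # Ytilde_{1l} = sum_j r_{1jl} X_{jl}
    omega = np.sqrt(np.sum(r ** 2, axis=1))     # omega_{1l}^2 = sum_j r_{1jl}^2
    return Ytilde / omega

Y_0 = normalized_projection(0.0)     # gamma0 = 0: Y ~ N(sqrt(J) mu, sigma^2) exactly
Y_half = normalized_projection(0.5)  # gamma0 = 0.5: mean shrinks by about 1 / sqrt(1 + gamma0^2)
print(Y_0.mean(), np.sqrt(J) * mu, Y_0.std(), sigma)
print(Y_half.mean(), np.sqrt(J) * mu / np.sqrt(1.0 + 0.5 ** 2))
```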

3.2. Hellinger Distance Method for Compressed Data

In this section, we describe the Hellinger distance-based method for estimating the parameters of the location-scale family using the compressed data. As described in the last section, let $\{X_{jl},\ j = 1, 2, \dots, J;\ l = 1, 2, \dots, B\}$ be a doubly indexed collection of independent and identically distributed random variables with true density $\frac{1}{\sigma^{*}} f\left(\frac{\cdot - \mu^{*}}{\sigma^{*}}\right)$. Our goal is to estimate $\theta^{*} = (\mu^{*}, \sigma^{2*})$ using the compressed data $\{Y_{il},\ i = 1, 2, \dots, S;\ l = 1, 2, \dots, B\}$. We re-emphasize that the density of $Y_{il}$ depends additionally on $\gamma_{0}$, where $\gamma_{0}^{2}$ is the variance of the sensing random variables $r_{ijl}$.

To formulate the Hellinger distance estimation method, let $\mathcal{G}$ be a class of densities metrized by the $L_{1}$ distance, and let $\{h_{J}(\cdot | \theta, \gamma_{0});\ \theta \in \Theta\}$ be a parametric family of densities. The Hellinger distance functional $T$ is a measurable mapping from $\mathcal{G}$ to $\Theta$, defined as follows:

$$\begin{aligned} T(g) &\equiv \underset{\theta}{\arg\min} \int_{\mathbb{R}} \left(g^{1/2}(y) - h_{J}^{1/2}(y | \theta, \gamma_{0})\right)^{2} dy \\ &= \underset{\theta}{\arg\min}\; HD^{2}(g, h_{J}(\cdot | \theta, \gamma_{0})) = \theta_{g}^{*}(\gamma_{0}). \end{aligned}$$

When $g(\cdot) = h_{J}(\cdot | \theta^{*}, \gamma_{0})$, then, under additional assumptions, $\theta_{g}^{*}(\gamma_{0}) = \theta^{*}(\gamma_{0})$. Since minimizing the Hellinger distance is equivalent to maximizing the affinity, it follows that

$$T(g) = \underset{\theta}{\arg\max}\; A(g, h_{J}(\cdot | \theta, \gamma_{0})), \quad \text{where} \quad A(g, h_{J}(\cdot | \theta, \gamma_{0})) \equiv \int_{\mathbb{R}} g^{1/2}(y)\, h_{J}^{1/2}(y | \theta, \gamma_{0})\, dy.$$

It is worth noticing here that

$$A(g, h_{J}(\cdot | \theta, \gamma_{0})) = 1 - \frac{1}{2} HD^{2}(g, h_{J}(\cdot | \theta, \gamma_{0})). \tag{3}$$
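Identity (3) holds for any pair of densities, since $HD^{2}(g, h) = \int g + \int h - 2A(g, h) = 2 - 2A(g, h)$. The sketch below checks it numerically, together with the bound $HD^{2} \leq \int |g - h|$ that is used in the proof of Proposition 2, for a Laplace density standing in for $g$ and a normal density standing in for $h_{J}(\cdot | \theta, \gamma_{0})$; both choices are arbitrary and only serve as a test case.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def laplace_pdf(x, loc, scale):
    return np.exp(-np.abs(x - loc) / scale) / (2.0 * scale)

y = np.linspace(-40.0, 40.0, 160001)
dy = y[1] - y[0]
g = laplace_pdf(y, 0.3, 1.2)     # stands in for g
h = normal_pdf(y, 0.0, 1.0)      # stands in for h_J(. | theta, gamma0)

A = np.sum(np.sqrt(g * h)) * dy                    # affinity
HD2 = np.sum((np.sqrt(g) - np.sqrt(h)) ** 2) * dy  # squared Hellinger distance
L1 = np.sum(np.abs(g - h)) * dy                    # L1 distance
print(A, 1.0 - 0.5 * HD2, HD2, L1)
```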

To obtain the Hellinger distance estimator of the true unknown parameters $\theta^{*}$, we choose, as expected, the parametric family $h_{J}(\cdot | \theta, \gamma_{0})$ to be the density of $Y_{il}$ and $g(\cdot)$ to be a nonparametric $L_{1}$-consistent estimator $g_{B}(\cdot)$ of $h_{J}(\cdot | \theta^{*}, \gamma_{0})$. Thus, the MHDE of $\theta_{B}^{*}$ is given by

$$\hat{\theta}_{B}(\gamma_{0}) = \underset{\theta}{\arg\max}\; A(g_{B}, h_{J}(\cdot | \theta, \gamma_{0})) = T(g_{B}).$$

In the notation above, we emphasize the dependence of the estimator on the variance of the projecting random variables. We notice here that the solution to (1) may not be unique; in such cases, we choose one of the solutions in a measurable manner.

The density estimate typically employed in the literature is the kernel density estimate. However, in the setting of the compressed data investigated here, there are $S$ observations per group. Conditioned on the $r_{ijl}$, these $S$ observations are independent; however, they are marginally dependent (if $S > 1$). For the case $S > 1$, we propose the following construction of $g_{B}(\cdot)$. First, we consider the estimator

$$g_{B}^{(i)}(y) = \frac{1}{B c_{B}} \sum_{l=1}^{B} K\left(\frac{y - Y_{il}}{c_{B}}\right), \quad i = 1, 2, \dots, S,$$

where $K(\cdot)$ is a kernel and $c_{B}$ is the bandwidth. With this choice, the MHDE of $\theta_{B}^{*}$ is given by, for $1 \leq i \leq S$,

$$\hat{\theta}_{i,B}(\gamma_{0}) = \underset{\theta}{\arg\max}\; A(g_{B}^{(i)}, h_{J}(\cdot | \theta, \gamma_{0})). \tag{4}$$

The above density estimate chooses the $i$th observation from each group and computes the kernel density estimator from the $B$ independent and identically distributed compressed observations. This is one choice of estimator; alternatively, one could obtain $S^{B}$ different estimators by choosing different combinations of observations from each group.
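A minimal sketch of the estimators $g_{B}^{(i)}$, assuming the normalized compressed data are held in an $S \times B$ array whose row $i$ collects $Y_{i1}, \dots, Y_{iB}$, and assuming a standard normal kernel $K$; the placeholder data, the bandwidth $c_{B} = B^{-1/5}$, and the name `kde_per_row` are hypothetical choices for illustration.

```python
import numpy as np

def kde_per_row(Y, grid, c_B):
    """Row i of the result is g_B^{(i)}(y) = (1 / (B c_B)) * sum_l K((y - Y[i, l]) / c_B).

    Y has shape (S, B): row i collects the i-th compressed observation of every group.
    K is the standard normal kernel (an illustrative choice).
    """
    S, B = Y.shape
    z = (grid[None, :, None] - Y[:, None, :]) / c_B      # shape (S, len(grid), B)
    K = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=2) / (B * c_B)

rng = np.random.default_rng(4)
S, B = 3, 500
Y = rng.normal(0.0, 1.0, size=(S, B))                   # placeholder compressed data
grid = np.linspace(-8.0, 8.0, 1601)
dens = kde_per_row(Y, grid, c_B=B ** (-1 / 5))
print(dens.shape, np.sum(dens, axis=1) * (grid[1] - grid[0]))  # each row integrates to about 1
```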

It is well known that this estimator is almost surely $L_{1}$-consistent for $h_{J}(\cdot | \theta^{*}, \gamma_{0})$ as long as $c_{B} \to 0$ and $B c_{B} \to \infty$ as $B \to \infty$. Hence, under additional regularity and identifiability conditions and further conditions on the bandwidth $c_{B}$, the existence, uniqueness, consistency, and asymptotic normality of $\hat{\theta}_{i,B}(\gamma_{0})$, for fixed $\gamma_{0}$, follow from existing results in the literature.
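The $L_{1}$ consistency can be observed in a small simulation. The sketch below, assuming standard normal data, a Gaussian kernel, and the bandwidth $c_{B} = B^{-1/5}$ (which satisfies $c_{B} \to 0$ and $B c_{B} = B^{4/5} \to \infty$), approximates $\int |g_{B} - h|$ on a grid for increasing $B$; the errors should decrease with $B$, although the exact values depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(5)
grid = np.linspace(-6.0, 6.0, 601)
dy = grid[1] - grid[0]
true_density = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)

def kde(sample, c_B):
    """Gaussian-kernel density estimate evaluated on `grid`."""
    z = (grid[:, None] - sample[None, :]) / c_B
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * c_B * np.sqrt(2.0 * np.pi))

l1_errors = []
for B in [100, 1000, 5000]:
    c_B = B ** (-1 / 5)      # c_B -> 0 and B * c_B -> infinity as B grows
    g_B = kde(rng.normal(size=B), c_B)
    l1_errors.append(np.sum(np.abs(g_B - true_density)) * dy)
print(l1_errors)
```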

When $\gamma_{0} = 0$ and $r_{ijl} \equiv 1$, as explained previously, the true density is a $J$-fold convolution of $f(\cdot | \theta^{*})$. It is then natural to ask the following question: if one lets $\gamma_{0} \to 0$, will the asymptotic results converge to what one would obtain by taking $\gamma_{0} = 0$? We refer to this property as the continuity in $\gamma_{0}$ of the procedure. Furthermore, it is natural to ask whether these asymptotic properties can be established uniformly in $\gamma_{0}$. If so, then one can also allow $\gamma_{0}$ to depend on $B$. This idea has intuitive appeal, since one can then choose the parameters of the sensing random variables to achieve an optimal inferential scheme. We address some of these issues in the next subsection.

Finally, we emphasize that, while we do not require $S > 1$, in applications involving streaming data and privacy problems $S$ tends to be greater than one. In problems where the variance of the sensing variables is large, one can obtain an overall estimator by averaging $\hat{\theta}_{i,B}(\gamma_{0})$ over $1 \leq i \leq S$; that is,

$$\hat{\theta}_{B}(\gamma_{0}) = \frac{1}{S} \sum_{i=1}^{S} \hat{\theta}_{i,B}(\gamma_{0}). \tag{5}$$

The averaging improves the accuracy of the estimator in small compressed samples (data not presented). For this reason, we provide results for this general case, even though our simulation and theoretical results demonstrate that, for some problems considered in this paper, $S$ can be taken to be one. We now turn to our main results, which are presented in the next subsection. Figure 1 provides an overview of our work.
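The sketch below puts the pieces together: it compresses synthetic Gaussian data, computes $\hat{\theta}_{i,B}(\gamma_{0})$ for each $i$ by grid-search maximization of the affinity, and averages the results as in (5). Because $h_{J}(\cdot | \theta, \gamma_{0})$ has no convenient closed form for $\gamma_{0} > 0$, the working model is $h^{*J}(\cdot | \theta) = N(J^{1/2}\mu, \sigma^{2})$, the exact density when $\gamma_{0} = 0$ and the $X_{jl}$ are Gaussian; using it for the small value $\gamma_{0} = 0.01$ is a heuristic approximation motivated by the continuity in $\gamma_{0}$ discussed above, not the authors' implementation. The Gaussian kernel, the bandwidth $B^{-1/5}$, the parameter grids, and all names are assumptions. With $\gamma_{0}$ this small the $S$ rows are nearly identical, so averaging changes little here, and the estimate of $\sigma$ is slightly inflated by the kernel smoothing.

```python
import numpy as np

rng = np.random.default_rng(3)
J, B, S = 25, 2000, 3
mu_true, sigma_true, gamma0 = 0.4, 1.5, 0.01

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Compressed and normalized data Y with shape (S, B); Gaussian X for illustration
X = rng.normal(mu_true, sigma_true, size=(J, B))
R = rng.normal(1.0, gamma0, size=(B, S, J))
Y = np.einsum("lsj,jl->sl", R, X) / np.sqrt(np.einsum("lsj,lsj->sl", R, R))

grid = np.linspace(Y.min() - 5.0, Y.max() + 5.0, 1501)
dy = grid[1] - grid[0]
c_B = B ** (-1 / 5)
mu_grid = np.linspace(0.0, 1.0, 201)
sigma_grid = np.linspace(1.0, 2.0, 101)

def mhde_row(y_row):
    """theta_hat_{i,B}: grid-search maximizer of A(g_B^{(i)}, working model)."""
    z = (grid[:, None] - y_row[None, :]) / c_B
    sqrt_g = np.sqrt(normal_pdf(z, 0.0, 1.0).mean(axis=1) / c_B)
    best, best_theta = -np.inf, None
    for m in mu_grid:
        # Working model N(sqrt(J) m, s^2): exact for gamma0 = 0, approximate here
        h = normal_pdf(grid[None, :], np.sqrt(J) * m, sigma_grid[:, None])
        affinity = np.sum(np.sqrt(h) * sqrt_g, axis=1) * dy
        k = int(np.argmax(affinity))
        if affinity[k] > best:
            best, best_theta = affinity[k], (m, sigma_grid[k])
    return np.array(best_theta)

theta_i = np.array([mhde_row(Y[i]) for i in range(S)])   # one estimate per i
theta_bar = theta_i.mean(axis=0)                          # averaged estimator (5)
print(theta_i, theta_bar)
```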

3.3. Main Results

In this section, we state our main results concerning the asymptotic properties of the MHDE based on the compressed data $Y_{il}$. We emphasize that we only store $\{(\tilde{Y}_{il}, r_{i\cdot l}, \omega_{il}^{2}) : i = 1, 2, \dots, S;\ l = 1, 2, \dots, B\}$. Specifically, we establish the continuity property in $\gamma_{0}$ of the proposed methods by establishing the existence of the iterated limits. This provides a first step towards establishing the double limit. The first proposition is well known and is concerned with the existence and uniqueness of the MHDE for the location-scale family defined in () using compressed data.

Proposition 1.

Assume that $h_{J}(\cdot | \theta, \gamma_{0})$ is a continuous density function. Assume further that, for every $\gamma_{0} \geq 0$, $\theta_{1} \neq \theta_{2}$ implies $h_{J}(y | \theta_{1}, \gamma_{0}) \neq h_{J}(y | \theta_{2}, \gamma_{0})$ on a set of positive Lebesgue measure. Then the MHDE in (4) exists and is unique.

Proof.

The proof follows from Theorem 2.2 of [20] since, without loss of generality, $\Theta$ is taken to be compact and the density function $h_{J}(\cdot | \theta, \gamma_{0})$ is continuous in $\theta$. □

Consistency: We next turn our attention to consistency. As explained previously, under regularity conditions, for each fixed $\gamma_{0}$ the MHDE $\hat{\theta}_{i,B}(\gamma_{0})$ is consistent for $\theta^{*}(\gamma_{0})$. The next result says that, under additional conditions, the consistency property of the MHDE is continuous in $\gamma_{0}$.

Proposition 2.

Let $h_{J}(\cdot | \theta, \gamma_{0})$ be a continuous probability density function satisfying the conditions of Proposition 1. Assume that

$$\lim_{\gamma_{0} \to 0}\, \sup_{\theta \in \Theta} \int_{\mathbb{R}} \left| h_{J}(y | \theta, \gamma_{0}) - h^{*J}(y | \theta) \right| dy = 0. \tag{6}$$

Then, with probability one (wp1), the iterated limits exist and equal $\theta^{*}$; that is, for $1 \leq i \leq S$,

$$\lim_{B \to \infty} \lim_{\gamma_{0} \to 0} \hat{\theta}_{i,B}(\gamma_{0}) = \lim_{\gamma_{0} \to 0} \lim_{B \to \infty} \hat{\theta}_{i,B}(\gamma_{0}) = \theta^{*}.$$

Proof.

Without loss of generality, let $\Theta$ be compact, since otherwise it can be embedded into a compact set as described in [1]. Since $f(\cdot)$ is continuous in $\theta$ and $g(\cdot)$ is continuous in $\gamma_{0}$, it follows that $h_{J}(\cdot | \theta, \gamma_{0})$ is continuous in $\theta$ and $\gamma_{0}$. Hence, by Theorem 1 of [1], for every fixed $\gamma_{0} \geq 0$ and $1 \leq i \leq S$,

$$\lim_{B \to \infty} \hat{\theta}_{i,B}(\gamma_{0}) = \theta^{*}(\gamma_{0}).$$

Thus, to verify the convergence of $\theta^{*}(\gamma_{0})$ to $\theta^{*}$ as $\gamma_{0} \to 0$, we first establish, using (6), that

$$\lim_{\gamma_{0} \to 0}\, \sup_{\theta \in \Theta} \left| A(h_{J}(\cdot | \theta, \gamma_{0}), h^{*J}(\cdot | \theta)) - 1 \right| = 0.$$

To this end, we first notice that, since $(a^{1/2} - b^{1/2})^{2} \leq |a^{1/2} - b^{1/2}|\,(a^{1/2} + b^{1/2}) = |a - b|$ for $a, b \geq 0$,

$$\sup_{\theta \in \Theta} HD^{2}(h_{J}(\cdot | \theta, \gamma_{0}), h^{*J}(\cdot | \theta)) \leq \sup_{\theta \in \Theta} \int_{\mathbb{R}} \left| h_{J}(y | \theta, \gamma_{0}) - h^{*J}(y | \theta) \right| dy.$$

Hence, using (3),

$$\sup_{\theta \in \Theta} \left| A(h_{J}(\cdot | \theta, \gamma_{0}), h^{*J}(\cdot | \theta)) - 1 \right| = \frac{1}{2} \sup_{\theta \in \Theta} HD^{2}(h_{J}(\cdot | \theta, \gamma_{0}), h^{*J}(\cdot | \theta)) \to 0 \quad \text{as } \gamma_{0} \to 0.$$

Hence,

$$\lim_{\gamma_{0} \to 0} A(h_{J}(\cdot | \theta^{*}(\gamma_{0}), \gamma_{0}), h^{*J}(\cdot | \theta^{*}(\gamma_{0}))) = 1.$$

Also, by continuity,

$$\lim_{\gamma_{0} \to 0} A(h^{*J}(\cdot | \theta^{*}(\gamma_{0})), h^{*J}(\cdot | \theta^{*})) = 1,$$

which, in turn, implies that

$$\lim_{\gamma_{0} \to 0} A(h_{J}(\cdot | \theta^{*}(\gamma_{0}), \gamma_{0}), h^{*J}(\cdot | \theta^{*})) = 1.$$

Thus, existence of the iterated limit, first as $B \to \infty$ and then as $\gamma_{0} \to 0$, follows using the compactness of $\Theta$ and the identifiability of the model. As for the other iterated limit, again notice that, for each $1 \leq i \leq S$, $A(g_{B}^{(i)}, h_{J}(\cdot | \theta, \gamma_{0}))$ converges to $A(g_{B}^{(i)}, h^{*J}(\cdot | \theta))$ with probability one as $\gamma_{0}$ converges to 0. The result then follows again by an application of Theorem 1 of [20]. □

Remark 1.

Verification of condition (6) seems to be involved even in the case of standard Gaussian random variables and standard Gaussian sensing random variables. Indeed, in this case, the density $h_{J}(\cdot | \theta, \gamma_{0})$ is a $J$-fold convolution of a Bessel function of the second kind. It may be possible to verify condition (6) using the properties of these functions and the compactness of the parameter space $\Theta$. However, if one is focused only on weak consistency, it is an immediate consequence of Theorems 1 and 2 below, and condition (6) is not required. Finally, it is worth mentioning that the convergence in (6) without uniformity over $\Theta$ is a consequence of the convergence in probability of $r_{ijl}$ to 1 and Glick’s theorem.

Asymptotic limit distribution: We now proceed to investigate the limit distribution of $\theta_{B}^{*}(\gamma_{0})$ as $B \to \infty$ and $\gamma_{0} \to 0$. It is well known that, for fixed $\gamma_{0} \geq 0$, after centering and scaling, $\theta_{B}^{*}(\gamma_{0})$ has a limiting Gaussian distribution under appropriate regularity conditions (see, for instance, [20]). However, to evaluate the iterated limits as $\gamma_{0} \to 0$ and $B \to \infty$, additional refinements of the techniques in [20] are required. To this end, we introduce additional notation. Let $s_{J}(\cdot | \theta, \gamma_{0}) = h_{J}^{1/2}(\cdot | \theta, \gamma_{0})$ and let the score function be denoted by

$$u_{J}(\cdot | \theta, \gamma_{0}) \equiv \nabla \log h_{J}(\cdot | \theta, \gamma_{0}) = \left(\frac{\partial \log h_{J}(\cdot | \theta, \gamma_{0})}{\partial \mu}, \frac{\partial \log h_{J}(\cdot | \theta, \gamma_{0})}{\partial \sigma}\right)'.$$

Also, the Fisher information $I(\theta(\gamma_{0}))$ is given by

$$I(\theta(\gamma_{0})) = \int_{\mathbb{R}} u_{J}(y | \theta, \gamma_{0})\, u_{J}'(y | \theta, \gamma_{0})\, h_{J}(y | \theta, \gamma_{0})\, dy.$$
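The Fisher information can be approximated numerically once $\log h_{J}$ can be evaluated. The sketch below, assuming Gaussian $X_{jl}$ and $\gamma_{0} = 0$ so that the density of $Y_{il}$ is $N(J^{1/2}\mu, \sigma^{2})$, obtains the score by central differences and integrates $u u' h$ on a grid; in this case the exact answer for $\theta = (\mu, \sigma)$ is $\mathrm{diag}(J/\sigma^{2}, 2/\sigma^{2})$, which the numerical result should reproduce. The function names and parameter values are hypothetical.

```python
import numpy as np

J = 25

def log_h_star(y, mu, sigma):
    """log of N(sqrt(J) mu, sigma^2), i.e. log h^{*J}(y | mu, sigma) for Gaussian X."""
    m = np.sqrt(J) * mu
    return -0.5 * ((y - m) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

def fisher_information(log_density, theta, y, eps=1e-5):
    """I(theta) = int u u' h dy, with the score u = grad log h from central differences."""
    theta = np.asarray(theta, dtype=float)
    dy = y[1] - y[0]
    h = np.exp(log_density(y, *theta))
    scores = []
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        scores.append((log_density(y, *(theta + step)) - log_density(y, *(theta - step))) / (2.0 * eps))
    U = np.stack(scores)              # shape (p, len(y)): row k is d log h / d theta_k
    return (U * h) @ U.T * dy

y = np.linspace(-20.0, 24.0, 44001)
info = fisher_information(log_h_star, (0.4, 1.5), y)
expected = np.diag([J / 1.5 ** 2, 2.0 / 1.5 ** 2])   # diag(J / sigma^2, 2 / sigma^2)
print(info)
```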

In addition, let $\dot{s}_{J}(\cdot | \theta, \gamma_{0})$ be the gradient of $s_{J}(\cdot | \theta, \gamma_{0})$ with respect to $\theta$, and let $\ddot{s}_{J}(\cdot | \theta, \gamma_{0})$ be the matrix of second derivatives of $s_{J}(\cdot | \theta, \gamma_{0})$ with respect to $\theta$. Let $t_{J}(\cdot | \theta) = (h^{*J})^{1/2}(\cdot | \theta)$ and $v_{J}(\cdot | \theta) = \nabla \log h^{*J}(\cdot | \theta)$. Furthermore, let $Y_{il}^{*}$ denote $Y_{il}$ when $\gamma_{0} \equiv 0$. Note that in this case $Y_{il}^{*} = Y_{1l}^{*}$ for all $i = 1, 2, \dots, S$.

The corresponding kernel density estimate of $Y_{il}^{*}$ is given by

$$g_{B}^{*}(y) = \frac{1}{B c_{B}} \sum_{l=1}^{B} K\left(\frac{y - Y_{il}^{*}}{c_{B}}\right). \tag{7}$$

We emphasize that we suppress $i$ on the left-hand side of the above equation since the $g_{B}^{(i)*}(\cdot)$ are equal for all $1 \leq i \leq S$.

The iterated limit distribution involves additional regularity conditions, which are stated in the Appendix. The first step towards this aim is a representation formula that expresses the quantity of interest, viz., $\sqrt{B}\,(\hat{\theta}_{i,B}(\gamma_{0}) - \theta^{*}(\gamma_{0}))$, as a sum of two terms, one involving sums of compressed i.i.d. random variables and the other involving remainder terms that converge to 0 at a specific rate. This expression will appear in different guises in the rest of the manuscript and will play a critical role in the proofs.

3.4. Representation Formula

Before we state the lemma, we first provide two crucial assumptions that allow differentiating the objective function and interchanging differentiation and integration.

Model assumptions on $h_{J}(\cdot | \theta, \gamma_{0})$:

(D1) $h_{J}(\cdot | \theta, \gamma_{0})$ is twice continuously differentiable in $\theta$.

(D2) $\|\nabla s_{J}(\cdot | \theta, \gamma_{0})\|_{2}$ is continuous and bounded.

Lemma 1.

Assume that conditions (D1) and (D2) hold. Then, for every $1 \leq i \leq S$ and $\gamma_{0} \geq 0$, the following holds:

$$B^{1/2}\left(\hat{\theta}_{i,B}(\gamma_{0}) - \theta^{*}(\gamma_{0})\right)' = A_{1B}(\gamma_{0}) + A_{2B}(\gamma_{0}), \tag{8}$$

where

$$A_{1B}(\gamma_{0}) = B^{1/2} D_{B}^{-1}(\tilde{\theta}_{i,B}(\gamma_{0}))\, T_{B}(\gamma_{0}), \qquad A_{2B}(\gamma_{0}) = B^{1/2} D_{B}^{-1}(\tilde{\theta}_{i,B}(\gamma_{0}))\, R_{B}(\gamma_{0}), \tag{9}$$

$$\tilde{\theta}_{i,B}(\gamma_{0}) \in U_{B}(\theta'(\gamma_{0})), \quad U_{B}(\theta'(\gamma_{0})) = \left\{\theta' : \theta'(\gamma_{0}) = t\,\theta^{*}(\gamma_{0}) + (1 - t)\,\hat{\theta}_{i,B}(\gamma_{0}),\ t \in [0, 1]\right\}, \tag{10}$$

$$\begin{aligned} D_{B}(\theta(\gamma_{0})) &= -\frac{1}{2} \int_{\mathbb{R}} \dot{u}_{J}(y | \theta, \gamma_{0})\, s_{J}(y | \theta, \gamma_{0})\, g_{B}^{(i)1/2}(y)\, dy - \frac{1}{4} \int_{\mathbb{R}} u_{J}(y | \theta, \gamma_{0})\, u_{J}'(y | \theta, \gamma_{0})\, s_{J}(y | \theta, \gamma_{0})\, g_{B}^{(i)1/2}(y)\, dy \\ &\equiv D_{1B}(\theta(\gamma_{0})) + D_{2B}(\theta(\gamma_{0})), \end{aligned} \tag{11}$$

$$T_{B}(\gamma_{0}) \equiv \frac{1}{4} \int_{\mathbb{R}} u_{J}(y | \theta^{*}, \gamma_{0}) \left(h_{J}(y | \theta^{*}, \gamma_{0}) - g_{B}^{(i)}(y)\right) dy, \tag{12}$$

and

$$R_{B}(\gamma_{0}) \equiv \frac{1}{4} \int_{\mathbb{R}} u_{J}(y | \theta^{*}, \gamma_{0}) \left(h_{J}^{1/2}(y | \theta^{*}, \gamma_{0}) - g_{B}^{(i)1/2}(y)\right)^{2} dy. \tag{13}$$

Proof.

By algebra, note that $\dot{s}_{J}(y | \theta, \gamma_{0}) = \frac{1}{2} u_{J}(y | \theta, \gamma_{0})\, s_{J}(y | \theta, \gamma_{0})$. Furthermore, the second partial derivative of $s_{J}(\cdot | \theta, \gamma_{0})$ is given by

$$\ddot{s}_{J}(y | \theta, \gamma_{0}) = \frac{1}{2} \dot{u}_{J}(y | \theta, \gamma_{0})\, s_{J}(y | \theta, \gamma_{0}) + \frac{1}{4} u_{J}(y | \theta, \gamma_{0})\, u_{J}'(y | \theta, \gamma_{0})\, s_{J}(y | \theta, \gamma_{0}).$$
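Both identities follow from the chain rule applied to $s_{J} = h_{J}^{1/2}$; with arguments suppressed and all derivatives taken with respect to $\theta$, the steps are:

```latex
\begin{aligned}
\dot{s}_J &= \nabla h_J^{1/2}
           = \tfrac{1}{2} h_J^{-1/2} \nabla h_J
           = \tfrac{1}{2} \frac{\nabla h_J}{h_J}\, h_J^{1/2}
           = \tfrac{1}{2} u_J s_J, \\
\ddot{s}_J &= \tfrac{1}{2} \dot{u}_J s_J + \tfrac{1}{2} u_J \dot{s}_J'
            = \tfrac{1}{2} \dot{u}_J s_J + \tfrac{1}{4} u_J u_J' s_J .
\end{aligned}
```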

Now, using (D1) and (D2), partially differentiating $HD_{B}^{2}(\theta(\gamma_{0})) \equiv HD^{2}(g_{B}^{(i)}(\cdot), h_{J}(\cdot | \theta, \gamma_{0}))$ with respect to $\theta$ and setting the result equal to 0, the estimating equation for $\theta^{*}(\gamma_{0})$ is

$$\nabla HD_{B}^{2}(\theta^{*}(\gamma_{0})) = 0. \tag{14}$$

Let

{\hat{θ}}_{i, B} (γ_{0})

be the solution to (14). Now applying first order Taylor expansion of (14) we get

\begin{matrix} \nabla H D_{B}^{2} (θ^{*} (γ_{0})) = \nabla H D_{B}^{2} ({\hat{θ}}_{i, B} (γ_{0})) + D_{B} ({\tilde{θ}}_{i, B} (γ_{0})) ({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0})), \end{matrix}

where

{\tilde{θ}}_{i, B} (γ_{0})

is defined in (10), and

D_{B} (\cdot)

is given in (11),

and

\nabla H D_{B}^{2} (\cdot)

is given by

\begin{matrix} \nabla H D_{B}^{2} (θ (γ_{0})) = - \frac{1}{2} \int_{\mathbb{R}} u_{J} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) (h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) - g_{B}^{(i) \frac{1}{2}} (y)) d y . \end{matrix}

Thus,

\begin{matrix} {({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0}))}^{'} = D_{B}^{- 1} ({\tilde{θ}}_{i, B} (γ_{0})) \nabla H D_{B}^{2} (θ^{*} (γ_{0})) . \end{matrix}

By using the identity

b^{\frac{1}{2}} - a^{\frac{1}{2}} = {(2 a^{\frac{1}{2}})}^{- 1} ((b - a) - {(b^{\frac{1}{2}} - a^{\frac{1}{2}})}^{2}), \quad a > 0,

with

a = h_{J} (y | θ^{*}, γ_{0})

and

b = g_{B}^{(i)} (y)

,

\nabla H D_{B}^{2} (θ^{*} (γ_{0}))

can be expressed in terms of

T_{B} (γ_{0})

and

R_{B} (γ_{0})

, defined in (12) and (13), respectively.
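With a = h_J(y | θ*, γ0) and b = g_B^{(i)}(y), the first term of the identity is linear in g_B^{(i)} - h_J (this is where T_B comes from) and the second is quadratic (this is where R_B comes from). A quick numerical check of the identity itself on randomly drawn pairs with a > 0:

```python
import math
import random

def sqrt_gap(a, b):
    """Left-hand side: b^{1/2} - a^{1/2}."""
    return math.sqrt(b) - math.sqrt(a)

def identity_rhs(a, b):
    """Right-hand side: ((b - a) - (b^{1/2} - a^{1/2})^2) / (2 a^{1/2}), valid for a > 0."""
    linear = (b - a) / (2.0 * math.sqrt(a))                                 # linear part (T_B-type)
    quadratic = (math.sqrt(b) - math.sqrt(a)) ** 2 / (2.0 * math.sqrt(a))  # quadratic part (R_B-type)
    return linear - quadratic

rng = random.Random(0)
pairs = [(rng.uniform(1e-3, 5.0), rng.uniform(0.0, 5.0)) for _ in range(10_000)]
worst = max(abs(sqrt_gap(a, b) - identity_rhs(a, b)) for a, b in pairs)
print(f"largest discrepancy: {worst:.2e}")
```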

Hence,

\begin{matrix} B^{\frac{1}{2}} {({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0}))}^{'} = A_{1 B} (γ_{0}) + A_{2 B} (γ_{0}), \end{matrix}

where

A_{1 B} (γ_{0})

and

A_{2 B} (γ_{0})

are given in (9). □

Remark 2.

In the rest of the manuscript, we will refer to

A_{2 B} (γ_{0})

as the remainder term in the representation formula.
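To make the estimator concrete, the following is a minimal sketch of θ̂ computed as the minimizer of the squared Hellinger distance between a kernel density estimate and a model density. Everything specific here is an assumption for illustration, not the paper's setup: a normal location-scale model N(μ, σ²) stands in for h_J(· | θ, γ0), the kernel is Epanechnikov with bandwidth c_B = B^{-1/3}, integrals use the trapezoid rule on a grid, and the minimization is a simple pattern search on (μ, log σ) started from the median and MAD.

```python
import math
import random

def epanechnikov(t):
    """Kernel K: symmetric about 0 with compact support [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def kde_on_grid(data, lo, dx, n, c):
    """Kernel density estimate g_B(y) = (1/(B c)) sum_l K((y - Y_l)/c) at y = lo + k*dx."""
    dens = [0.0] * n
    scale = 1.0 / (len(data) * c)
    for y_l in data:
        # Only grid points within one bandwidth of y_l can receive mass.
        first = max(0, math.ceil((y_l - c - lo) / dx))
        last = min(n - 1, math.floor((y_l + c - lo) / dx))
        for k in range(first, last + 1):
            dens[k] += scale * epanechnikov((lo + k * dx - y_l) / c)
    return dens

def hellinger_sq(root_g, lo, dx, mu, sigma):
    """Trapezoid approximation of int (g^{1/2} - h^{1/2})^2 dy with h = N(mu, sigma^2)."""
    root_norm = (sigma * math.sqrt(2.0 * math.pi)) ** -0.5
    vals = []
    for k, rg in enumerate(root_g):
        z = (lo + k * dx - mu) / sigma
        vals.append((rg - root_norm * math.exp(-0.25 * z * z)) ** 2)  # h^{1/2} = norm^{1/2} e^{-z^2/4}
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def mhd_estimate(data, c, dx=0.02):
    """Minimum Hellinger distance estimate of (mu, sigma) via pattern search on (mu, log sigma)."""
    lo, hi = min(data) - 3.0, max(data) + 3.0
    n = int((hi - lo) / dx) + 1
    root_g = [math.sqrt(v) for v in kde_on_grid(data, lo, dx, n, c)]
    srt = sorted(data)
    med = srt[len(srt) // 2]
    mad = sorted(abs(x - med) for x in data)[len(data) // 2]
    params = [med, math.log(1.4826 * mad)]  # robust starting values
    best = hellinger_sq(root_g, lo, dx, params[0], math.exp(params[1]))
    step = 0.5
    while step > 1e-4:
        improved = False
        for i in range(2):
            for sign in (1.0, -1.0):
                trial = list(params)
                trial[i] += sign * step
                val = hellinger_sq(root_g, lo, dx, trial[0], math.exp(trial[1]))
                if val < best:
                    params, best, improved = trial, val, True
        if not improved:
            step /= 2.0
    return params[0], math.exp(params[1])

rng = random.Random(1)
clean = [rng.gauss(2.0, 1.5) for _ in range(1000)]
dirty = clean[:950] + [50.0 + rng.gauss(0.0, 0.1) for _ in range(50)]  # 5% gross outliers
for sample in (clean, dirty):
    print(mhd_estimate(sample, len(sample) ** (-1.0 / 3.0)))
```

Because the outliers sit where the fitted normal has essentially no mass, they contribute almost nothing to the Hellinger affinity, so the two fits are close while the sample mean of the contaminated data is pulled far from 2.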

We now turn to the first main result of the manuscript, namely a central limit theorem for

{\hat{θ}}_{i, B} (γ_{0})

as first

B \to \infty

and then

γ_{0} \to 0

. As a first step, we note that the Fisher information of the density

h^{* J} (\cdot | θ)

is given by

\begin{matrix} I (θ) = \int_{\mathbb{R}} v_{J} (y | θ) v_{J}^{'} (y | θ) h^{* J} (y | θ) d y . \end{matrix}

(15)
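For a concrete instance of (15), suppose (an assumption made only for illustration) that h^{*J}(· | θ) is a normal density with θ = (μ, σ). The score is then v = ((y - μ)/σ², ((y - μ)² - σ²)/σ³) and I(θ) = diag(1/σ², 2/σ²). A trapezoid-rule evaluation of (15) reproduces this:

```python
import math

def fisher_information_normal(mu, sigma, n=4001, width=12.0):
    """Trapezoid approximation of I(theta) = int v v' h dy for h = N(mu, sigma^2), theta = (mu, sigma)."""
    lo = mu - width * sigma
    dy = 2.0 * width * sigma / (n - 1)
    info = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        y = lo + k * dy
        z = (y - mu) / sigma
        dens = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        score = (z / sigma, (z * z - 1.0) / sigma)  # (d/dmu, d/dsigma) of log h
        weight = 0.5 if k in (0, n - 1) else 1.0     # trapezoid end-point weights
        for i in range(2):
            for j in range(2):
                info[i][j] += weight * dy * score[i] * score[j] * dens
    return info

info = fisher_information_normal(mu=1.0, sigma=2.0)
print(info)  # close to [[1/sigma^2, 0], [0, 2/sigma^2]] = [[0.25, 0], [0, 0.5]]
```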

Next we state the assumptions needed in the proof of Theorem 1. We separate these conditions into (i) model assumptions, (ii) kernel assumptions, (iii) regularity conditions, and (iv) conditions that allow comparison of the original data and the compressed data.

Model assumptions on

h^{* J} (\cdot | θ)

(D1’)

h^{* J} (\cdot | θ)

is twice continuously differentiable in

θ

.

(D2’) Assume further that

\| \nabla t_{J} (\cdot | θ) \|_{2}

is continuous and bounded.

Kernel assumptions

(B1)

K (\cdot)

is symmetric about 0 on a compact support and bounded in

L_{2} .

We denote the support of

K (\cdot)

by

\mathrm{Supp}(K)

.

(B2) The bandwidth

c_{B}

satisfies

c_{B} \to 0

,

B^{\frac{1}{2}} c_{B} \to \infty, B^{\frac{1}{2}} c_{B}^{2} \to 0

.

Regularity conditions

(M1) The function

u_{J} (\cdot | θ, γ_{0}) s_{J} (\cdot | θ, γ_{0})

is continuously differentiable and bounded in

L_{2}

at

θ^{*}

.

(M2) The function

{\dot{u}}_{J} (\cdot | θ, γ_{0}) s_{J} (\cdot | θ, γ_{0})

is continuous and bounded in

L_{2}

at

θ^{*}

. In addition, assume that

\begin{matrix} \lim_{B \to \infty} \int_{\mathbb{R}} {({\dot{u}}_{J} (y | θ_{i, B}, γ_{0}) s_{J} (y | θ_{i, B}, γ_{0}) - {\dot{u}}_{J} (y | θ^{*}, γ_{0}) s_{J} (y | θ^{*}, γ_{0}))}^{2} d y = 0 . \end{matrix}

(M3) The function

u_{J} (\cdot | θ, γ_{0}) u_{J}^{'} (\cdot | θ, γ_{0}) s_{J} (\cdot | θ, γ_{0})

is continuous and bounded in

L_{2}

at

θ^{*}

; also,

\begin{matrix} \lim_{B \to \infty} \int_{\mathbb{R}} {(u_{J} (y | {\hat{θ}}_{i, B}, γ_{0}) u_{J}^{'} (y | {\hat{θ}}_{i, B}, γ_{0}) s_{J} (y | {\hat{θ}}_{i, B}, γ_{0}) - u_{J} (y | θ^{*}, γ_{0}) u_{J}^{'} (y | θ^{*}, γ_{0}) s_{J} (y | θ^{*}, γ_{0}))}^{2} d y = 0 . \end{matrix}

(M4) Let

{α_{B} : B \geq 1}

be a sequence diverging to infinity. Assume that

\begin{matrix} \lim_{B \to \infty} B \sup_{t \in \mathrm{Supp}(K)} P_{θ^{*} (γ_{0})} (| Δ - c_{B} t | > α_{B}) = 0, \end{matrix}

where

\mathrm{Supp}(K)

is the support of the kernel density

K (\cdot)

and

Δ

is a generic random variable with density

h_{J} (\cdot | θ^{*}, γ_{0})

.

(M5) Let

\begin{matrix} M_{B} = \sup_{| y | \leq α_{B}} \sup_{t \in \mathrm{Supp}(K)} \left|\frac{h_{J} (y - t c_{B} | θ^{*}, γ_{0})}{h_{J} (y | θ^{*}, γ_{0})}\right| . \end{matrix}

Assume

\sup_{B \geq 1} M_{B} < \infty

.

(M6) The score function has a regular central behavior relative to the smoothing constants, i.e.,

\begin{matrix} \lim_{B \to \infty} {(B^{\frac{1}{2}} c_{B})}^{- 1} \int_{- α_{B}}^{α_{B}} u_{J} (y | θ^{*}, γ_{0}) d y = 0 . \end{matrix}

Furthermore,

\begin{matrix} \lim_{B \to \infty} (B^{\frac{1}{2}} c_{B}^{4}) \int_{- α_{B}}^{α_{B}} u_{J} (y | θ^{*}, γ_{0}) d y = 0 . \end{matrix}

(M7) The density functions are smooth in an

L_{2}

sense; i.e.,

\begin{matrix} \lim_{B \to \infty} \sup_{t \in \mathrm{Supp}(K)} \int_{\mathbb{R}} {(u_{J} (y + c_{B} t | θ^{*}, γ_{0}) - u_{J} (y | θ^{*}, γ_{0}))}^{2} h_{J} (y | θ^{*}, γ_{0}) d y = 0 . \end{matrix}

(M1’) The function

v_{J} (\cdot | θ) t_{J} (\cdot | θ)

is continuously differentiable and bounded in

L_{2}

at

θ^{*}

.

(M2’) The function

{\dot{v}}_{J} (\cdot | θ) t_{J} (\cdot | θ)

is continuous and bounded in

L_{2}

at

θ^{*}

. In addition, assume that

\begin{matrix} \lim_{B \to \infty} \int_{\mathbb{R}} {({\dot{v}}_{J} (y | θ_{B}) t_{J} (y | θ_{B}) - {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}))}^{2} d y = 0 . \end{matrix}

(M3’) The function

v_{J} (\cdot | θ) v_{J}^{'} (\cdot | θ) t_{J} (\cdot | θ)

is continuous and bounded in

L_{2}

at

θ^{*}

; also,

\begin{matrix} \lim_{B \to \infty} \int_{\mathbb{R}} {(v_{J} (y | {\hat{θ}}_{i, B}) v_{J}^{'} (y | {\hat{θ}}_{i, B}) t_{J} (y | {\hat{θ}}_{i, B}) - v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*}))}^{2} d y = 0 . \end{matrix}

Assumptions comparing models for original and compressed data

(O1) For all

θ \in Θ

,

\begin{matrix} \lim_{γ_{0} \to 0} \int_{\mathbb{R}} {(u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) - v_{J} (y | θ) v_{J}^{'} (y | θ) t_{J} (y | θ))}^{2} d y = 0 . \end{matrix}

(O2) For all

θ \in Θ

,

\begin{matrix} \lim_{γ_{0} \to 0} \int_{\mathbb{R}} {({\dot{u}}_{J} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) - {\dot{v}}_{J} (y | θ) t_{J} (y | θ))}^{2} d y = 0 . \end{matrix}
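As an example of a kernel and bandwidth satisfying (B1)–(B2), chosen here purely for illustration, take the Epanechnikov kernel K(t) = (3/4)(1 - t²) on [-1, 1] and c_B = B^{-1/3}. Then B^{1/2}c_B = B^{1/6} → ∞ and B^{1/2}c_B² = B^{-1/6} → 0. The snippet checks the kernel's mass, symmetry (zero first moment) and finite L_2 norm numerically, and tabulates the two rates.

```python
import math

def epanechnikov(t):
    """Symmetric kernel with compact support [-1, 1]."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def trapezoid(f, lo, hi, n=20001):
    """Composite trapezoid rule on n equally spaced points."""
    dx = (hi - lo) / (n - 1)
    return dx * (sum(f(lo + k * dx) for k in range(n)) - 0.5 * (f(lo) + f(hi)))

def bandwidth_rates(B):
    """Return (c_B, B^{1/2} c_B, B^{1/2} c_B^2) for the illustrative choice c_B = B^{-1/3}."""
    c_B = B ** (-1.0 / 3.0)
    return c_B, math.sqrt(B) * c_B, math.sqrt(B) * c_B ** 2

mass = trapezoid(epanechnikov, -1.0, 1.0)                           # integrates to 1
first_moment = trapezoid(lambda t: t * epanechnikov(t), -1.0, 1.0)  # 0 by symmetry
l2_sq = trapezoid(lambda t: epanechnikov(t) ** 2, -1.0, 1.0)        # 3/5, finite
print(mass, first_moment, l2_sq)
for B in (10**2, 10**4, 10**6, 10**8):
    print(B, bandwidth_rates(B))
```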

Theorem 1.

Assume that the conditions (B1)–(B2), (D1)–(D2), (D1’)–(D2’), (M1)–(M7), (M1’)–(M3’), and (O1)–(O2) hold. Then, for every

1 \leq i \leq S,

the following holds:

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{B \to \infty} P (\sqrt{B} ({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0})) \leq x) = P (G \leq x), \end{matrix}

where G is a bivariate Gaussian random variable with mean 0 and covariance matrix

I^{- 1} (θ^{*})

, where

I (θ)

is defined in (15).

Before we embark on the proof of Theorem 1, we first discuss the assumptions. Assumptions (B1) and (B2) are standard assumptions on the kernel and the bandwidth and are typically employed when investigating the asymptotic behavior of divergence-based estimators (see for instance [1]). Assumptions (M1)–(M7) and (M1’)–(M3’) are regularity conditions which are concerned essentially with

L_{2}

continuity and boundedness of the scores and their derivatives. Assumptions (O1)–(O2) allow for comparison of

u_{J} (\cdot | θ, γ_{0})

and

v_{J} (\cdot | θ)

. Returning to the proof of Theorem 1, we use the representation formula (8) and first show that

\lim_{γ_{0} \to 0} \lim_{B \to \infty} P (A_{1 B} (γ_{0}) \leq x) = P (G \leq x)

, and then prove that

\lim_{γ_{0} \to 0} \lim_{B \to \infty} A_{2 B} (γ_{0}) = 0

in probability. We start with the following proposition.

Proposition 3.

Assume that the conditions (B1), (D1)–(D2), (M1)–(M3), (M1’)–(M3’), (M7) and (O1)–(O2) hold. Then,

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{B \to \infty} P (A_{1 B} (γ_{0}) \leq x) = P (G \leq x), \end{matrix}

where G is given in Theorem 1.

We divide the proof of Proposition 3 into two lemmas. In the first lemma we will show that

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{B \to \infty} D_{B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{4} I (θ^{*}) . \end{matrix}

In the second lemma, we will show that, first letting

B \to \infty

and then allowing

γ_{0} \to 0,

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) \overset{d}{\to} N (0, I (θ^{*})) . \end{matrix}

We start with the first part.

Lemma 2.

Assume that the conditions (D1)–(D2), (D1’)–(D2’), (M1)–(M3), (M1’)–(M3’), and (O1)–(O2) hold. Then, with probability one, the following holds:

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{B \to \infty} D_{B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{4} I (θ^{*}) . \end{matrix}

Proof.

We use the representation formula in Lemma 1. First fix

γ_{0} > 0

. It suffices to show

\begin{matrix} \lim_{B \to \infty} D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{2} I (θ^{*} (γ_{0})) \quad \text{and} \quad \lim_{B \to \infty} D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0})) = - \frac{1}{4} I (θ^{*} (γ_{0})) . \end{matrix}

We begin with

D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0}))

. By algebra,

D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0}))

can be expressed as

\begin{matrix} D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0})) = D_{1 B}^{(1)} ({\tilde{θ}}_{i, B} (γ_{0})) + D_{1 B}^{(2)} ({\tilde{θ}}_{i, B} (γ_{0})) + D_{1 B}^{(3)} (θ^{*} (γ_{0})), \end{matrix}

where

\begin{matrix} D_{1 B}^{(1)} ({\tilde{θ}}_{i, B} (γ_{0})) = - \frac{1}{2} \int_{\mathbb{R}} {\dot{u}}_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) s_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) (g_{B}^{(i) \frac{1}{2}} (y) - s_{J} (y | θ^{*}, γ_{0})) d y, \end{matrix}

\begin{matrix} D_{1 B}^{(2)} ({\tilde{θ}}_{i, B} (γ_{0})) = - \frac{1}{2} \int_{\mathbb{R}} ({\dot{u}}_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) s_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) - {\dot{u}}_{J} (y | θ^{*}, γ_{0}) s_{J} (y | θ^{*}, γ_{0})) h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) d y, \end{matrix}

and

\begin{matrix} D_{1 B}^{(3)} (θ^{*} (γ_{0})) = - \frac{1}{2} \int_{\mathbb{R}} {\dot{u}}_{J} (y | θ^{*}, γ_{0}) h_{J} (y | θ^{*}, γ_{0}) d y = \frac{1}{2} I (θ^{*} (γ_{0})) . \end{matrix}

It suffices to show that as

B \to \infty

,

D_{1 B}^{(1)} ({\tilde{θ}}_{i, B} (γ_{0})) \to 0

, and

D_{1 B}^{(2)} ({\tilde{θ}}_{i, B} (γ_{0})) \to 0

. We first consider

D_{1 B}^{(1)} ({\tilde{θ}}_{i, B} (γ_{0}))

. By the Cauchy-Schwarz inequality and assumption (M2), there exists a constant

0 < C_{1} < \infty

such that

\begin{aligned} |D_{1 B}^{(1)} ({\tilde{θ}}_{i, B} (γ_{0}))| &\leq \frac{1}{2} {\left\{\int_{\mathbb{R}} {({\dot{u}}_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) s_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}))}^{2} d y\right\}}^{\frac{1}{2}} {\left\{\int_{\mathbb{R}} {(g_{B}^{(i) \frac{1}{2}} (y) - s_{J} (y | θ^{*}, γ_{0}))}^{2} d y\right\}}^{\frac{1}{2}} \\ &\leq C_{1} {\left\{\int_{\mathbb{R}} {(g_{B}^{(i) \frac{1}{2}} (y) - s_{J} (y | θ^{*}, γ_{0}))}^{2} d y\right\}}^{\frac{1}{2}} \to 0, \end{aligned}

where the last convergence follows from the

L_{1}

convergence of

g_{B}^{(i)} (\cdot)

and

h_{J} (\cdot | θ^{*}, γ_{0})

. Hence, as

B \to \infty

,

D_{1 B}^{(1)} ({\tilde{θ}}_{i, B} (γ_{0})) \to 0

. Next we consider

D_{1 B}^{(2)} ({\tilde{θ}}_{i, B} (γ_{0}))

. Again, by Cauchy-Schwarz inequality and assumption (M2), it follows that

D_{1 B}^{(2)} ({\tilde{θ}}_{i, B} (γ_{0})) \to 0

. Hence

D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0})) \to \frac{1}{2} I (θ^{*} (γ_{0}))

. Turning to

D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0}))

, by a similar argument using the Cauchy-Schwarz inequality and assumption (M3), it follows that

D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0})) \to - \frac{1}{4} I (θ^{*} (γ_{0}))

. Thus, to complete the proof, it is enough to show that

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{B \to \infty} D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{2} I (θ^{*}) \quad \text{and} \quad \lim_{γ_{0} \to 0} \lim_{B \to \infty} D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0})) = - \frac{1}{4} I (θ^{*}) . \end{matrix}

(16)

We start with the first term of (16). Let

\begin{matrix} J_{1} (γ_{0}) = \int_{\mathbb{R}} {\dot{u}}_{J} (y | θ^{*}, γ_{0}) h_{J} (y | θ^{*}, γ_{0}) d y - \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) h^{* J} (y | θ^{*}) d y . \end{matrix}

We will show that

\lim_{γ_{0} \to 0} J_{1} (γ_{0}) = 0

. By algebra, the difference of the above two terms can be expressed as the sum of

J_{11} (γ_{0})

and

J_{12} (γ_{0})

, where

\begin{matrix} J_{11} (γ_{0}) = \int_{\mathbb{R}} ({\dot{u}}_{J} (y | θ^{*}, γ_{0}) s_{J} (y | θ^{*}, γ_{0}) - {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*})) s_{J} (y | θ^{*}, γ_{0}) d y, \end{matrix}

and

\begin{matrix} J_{12} (γ_{0}) = \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) (s_{J} (y | θ^{*}, γ_{0}) - t_{J} (y | θ^{*})) d y . \end{matrix}

J_{11} (γ_{0})

converges to zero by Cauchy-Schwarz inequality and assumption (O2), and

J_{12} (γ_{0})

converges to zero by the Cauchy-Schwarz inequality, assumption (M2’), and Scheffé’s theorem. Next we consider the second term of (16). Let

\begin{matrix} J_{2} (γ_{0}) = \int_{\mathbb{R}} u_{J} (y | θ^{*}, γ_{0}) u_{J}^{'} (y | θ^{*}, γ_{0}) h_{J} (y | θ^{*}, γ_{0}) d y - \int_{\mathbb{R}} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) h^{* J} (y | θ^{*}) d y . \end{matrix}

We will show that

\lim_{γ_{0} \to 0} J_{2} (γ_{0}) = 0

. By algebra, the difference of the above two terms can be expressed as the sum of

J_{21} (γ_{0})

and

J_{22} (γ_{0})

, where

\begin{matrix} J_{21} (γ_{0}) = \int_{\mathbb{R}} (u_{J} (y | θ^{*}, γ_{0}) u_{J}^{'} (y | θ^{*}, γ_{0}) s_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*})) s_{J} (y | θ^{*}, γ_{0}) d y, \end{matrix}

and

\begin{matrix} J_{22} (γ_{0}) = \int_{\mathbb{R}} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*}) (s_{J} (y | θ^{*}, γ_{0}) - t_{J} (y | θ^{*})) d y . \end{matrix}

J_{21} (γ_{0})

converges to zero by the Cauchy-Schwarz inequality and assumption (O1), and

J_{22} (γ_{0})

converges to zero by the Cauchy-Schwarz inequality, assumption (M3’), and Scheffé’s theorem. Therefore the lemma holds. □

Lemma 3.

Assume that the conditions (B1), (D1)–(D2), (D1’)–(D2’), (M1)–(M3), (M3’), (M7) and (O1)–(O2) hold. Then, first letting

B \to \infty

, and then

γ_{0} \to 0,

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) \overset{d}{\to} N (0, I (θ^{*})) . \end{matrix}

Proof.

First fix

γ_{0} > 0

. Using

\int_{\mathbb{R}} u_{J} (y | θ^{*}, γ_{0}) h_{J} (y | θ^{*}, γ_{0}) d y = 0

, we have that

\begin{aligned} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) &= B^{\frac{1}{2}} \int_{\mathbb{R}} u_{J} (y | θ^{*}, γ_{0}) g_{B}^{(i)} (y) d y \\ &= B^{\frac{1}{2}} \int_{\mathbb{R}} u_{J} (y | θ^{*}, γ_{0}) \frac{1}{B} \sum_{l = 1}^{B} \frac{1}{c_{B}} K \left(\frac{y - Y_{i l}}{c_{B}}\right) d y \\ &= B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} \int_{\mathbb{R}} u_{J} (Y_{i l} + c_{B} t | θ^{*}, γ_{0}) K (t) d t . \end{aligned}
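The substitution y = Y_il + c_B t behind the last equality can be checked numerically. The sketch below assumes, for illustration only, that u is the scale score of a standard normal model, u(y) = y² - 1, together with the Epanechnikov kernel. Because this u is quadratic, both sides should equal the raw score average plus c_B²∫t²K(t)dt = c_B²/5, a concrete instance of the smoothing discrepancy that is controlled next through (M7).

```python
import math
import random

def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def trapezoid(f, lo, hi, n):
    dx = (hi - lo) / (n - 1)
    return dx * (sum(f(lo + k * dx) for k in range(n)) - 0.5 * (f(lo) + f(hi)))

def score(y):
    """Illustrative score: derivative in sigma of log N(y; 0, sigma^2) at sigma = 1."""
    return y * y - 1.0

def kde_side(data, c, n=20001):
    """int u(y) g_B(y) dy with g_B the kernel density estimate (before substitution)."""
    def integrand(y):
        g = sum(epanechnikov((y - yl) / c) for yl in data) / (len(data) * c)
        return score(y) * g
    return trapezoid(integrand, min(data) - c, max(data) + c, n)

def substituted_side(data, c, n=2001):
    """(1/B) sum_l int u(Y_l + c t) K(t) dt (after substituting y = Y_l + c t)."""
    total = 0.0
    for yl in data:
        total += trapezoid(lambda t, yl=yl: score(yl + c * t) * epanechnikov(t), -1.0, 1.0, n)
    return total / len(data)

rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]
c = 0.3
raw = sum(score(y) for y in data) / len(data)
print(kde_side(data, c), substituted_side(data, c), raw + c * c / 5.0)
```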

Therefore,

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) - B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} u_{J} (Y_{i l} | θ^{*}, γ_{0}) = B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} \int_{\mathbb{R}} (u_{J} (Y_{i l} + c_{B} t | θ^{*}, γ_{0}) - u_{J} (Y_{i l} | θ^{*}, γ_{0})) K (t) d t . \end{matrix}

Since

Y_{i l}

are i.i.d. across l, using the Cauchy-Schwarz inequality and assumption (B1), we can show that there exists a constant

0 < C < \infty

such that

\begin{aligned} E {\left[4 B^{\frac{1}{2}} T_{B} (γ_{0}) - B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} u_{J} (Y_{i l} | θ^{*}, γ_{0})\right]}^{2} &= E {\left[\int_{\mathrm{Supp}(K)} (u_{J} (Y_{i 1} + c_{B} t | θ^{*}, γ_{0}) - u_{J} (Y_{i 1} | θ^{*}, γ_{0})) K (t) d t\right]}^{2} \\ &\leq C E \left[\int_{\mathrm{Supp}(K)} {(u_{J} (Y_{i 1} + c_{B} t | θ^{*}, γ_{0}) - u_{J} (Y_{i 1} | θ^{*}, γ_{0}))}^{2} d t\right] \\ &= C \int_{\mathrm{Supp}(K)} \int_{\mathbb{R}} {(u_{J} (y + c_{B} t | θ^{*}, γ_{0}) - u_{J} (y | θ^{*}, γ_{0}))}^{2} h_{J} (y | θ^{*}, γ_{0}) d y d t, \end{aligned}

converging to zero as

B \to \infty

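by assumption (M7), since Supp(K) is compact. Moreover, the scores

u_{J} (Y_{i l} | θ^{*}, γ_{0}), \; l = 1, \ldots, B,

are i.i.d. with mean zero and covariance matrix

I (θ^{*} (γ_{0}))

, so by the central limit theorem and Slutsky’s theorem the limiting distribution of

4 B^{\frac{1}{2}} T_{B} (γ_{0})

is

N (0, I (θ^{*} (γ_{0})))

as

B \to \infty

. Now let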

γ_{0} \to 0 .

It is enough to show that as

γ_{0} \to 0

the density of

N (0, I (θ^{*} (γ_{0})))

converges to the density of

N (0, I (θ^{*}))

. To this end, it suffices to show that

\lim_{γ_{0} \to 0} I (θ^{*} (γ_{0})) = I (θ^{*})

. This was established in the proof of Lemma 2, where it was shown that

\lim_{γ_{0} \to 0} J_{2} (γ_{0}) = 0

. Combining the results, the lemma follows. □

Proof of Proposition 3.

The proof of Proposition 3 follows immediately by combining Lemmas 2 and 3. □

We now turn to establishing that the remainder term in the representation formula converges to zero.

Lemma 4.

Assume that conditions (B1)–(B2) and (M1)–(M6) hold. Then

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{B \to \infty} A_{2 B} (γ_{0}) = 0 \ \text{in probability} . \end{matrix}

Proof.

Using Lemma 2, it is sufficient to show that

B^{\frac{1}{2}} R_{B} (γ_{0})

converges to 0 in probability as

B \to \infty

. Let

\begin{matrix} d_{J} (y | θ^{*} (γ_{0})) = g_{B}^{(i) \frac{1}{2}} (y) - s_{J} (y | θ^{*}, γ_{0}) . \end{matrix}

Please note that

\begin{matrix} d_{J}^{2} (y | θ^{*} (γ_{0})) \leq 2 \{{(h_{J} (y | θ^{*}, γ_{0}) - E [g_{B}^{(i)} (y)])}^{2} + {(E [g_{B}^{(i)} (y)] - g_{B}^{(i)} (y))}^{2}\} h_{J}^{- 1} (y | θ^{*}, γ_{0}) . \end{matrix}

Then

\begin{aligned} | R_{B} (γ_{0}) | &\leq \frac{1}{2} \int_{\mathbb{R}} | u_{J} (y | θ^{*}, γ_{0}) | d_{J}^{2} (y | θ^{*} (γ_{0})) d y \\ &\leq \frac{1}{2} \int_{- α_{B}}^{α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | d_{J}^{2} (y | θ^{*} (γ_{0})) d y + \frac{1}{2} \int_{| y | \geq α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | d_{J}^{2} (y | θ^{*} (γ_{0})) d y \\ &\equiv R_{1 B} (γ_{0}) + R_{2 B} (γ_{0}) . \end{aligned}

We first deal with

R_{1 B} (γ_{0})

, which is bounded above by the sum of

R_{1 B}^{(1)} (γ_{0})

and

R_{1 B}^{(2)} (γ_{0})

, where

\begin{matrix} R_{1 B}^{(1)} (γ_{0}) = \int_{- α_{B}}^{α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | {(h_{J} (y | θ^{*}, γ_{0}) - E [g_{B}^{(i)} (y)])}^{2} h_{J}^{- 1} (y | θ^{*}, γ_{0}) d y, \end{matrix}

(17)

and

\begin{matrix} R_{1 B}^{(2)} (γ_{0}) = \int_{- α_{B}}^{α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | {(E [g_{B}^{(i)} (y)] - g_{B}^{(i)} (y))}^{2} h_{J}^{- 1} (y | θ^{*}, γ_{0}) d y . \end{matrix}

Now consider

R_{1 B}^{(2)}

. Let

ϵ > 0

be arbitrary but fixed. Then, by Markov’s inequality,

\begin{aligned} P (B^{\frac{1}{2}} R_{1 B}^{(2)} > ϵ) &\leq ϵ^{- 1} B^{\frac{1}{2}} E [R_{1 B}^{(2)}] \\ &\leq ϵ^{- 1} B^{\frac{1}{2}} \int_{- α_{B}}^{α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | \operatorname{Var} [g_{B}^{(i)} (y)] h_{J}^{- 1} (y | θ^{*}, γ_{0}) d y . \end{aligned}

(18)

Now since

Y_{i l}

are independent and identically distributed across l, it follows that

\begin{matrix} \operatorname{Var} [g_{B}^{(i)} (y)] \leq \frac{1}{B c_{B}} \int_{\mathbb{R}} K^{2} (t) h_{J} (y - t c_{B} | θ^{*}, γ_{0}) d t . \end{matrix}

(19)
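The variance bound (19) can be checked by simulation. Assuming, for illustration only, i.i.d. standard normal data, the Epanechnikov kernel, B = 200, and c_B = 0.3, the exact variance of g_B(y) equals the right-hand side of (19) minus (E g_B(y))²/B, so the bound is slightly conservative, and a Monte Carlo estimate should sit near the exact value:

```python
import math
import random

def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def trapezoid(f, lo, hi, n=2001):
    dx = (hi - lo) / (n - 1)
    return dx * (sum(f(lo + k * dx) for k in range(n)) - 0.5 * (f(lo) + f(hi)))

def variance_bound(y, B, c):
    """Right-hand side of (19): (1/(B c)) int K(t)^2 h(y - t c) dt, with h = N(0, 1)."""
    return trapezoid(lambda t: epanechnikov(t) ** 2 * normal_pdf(y - t * c), -1.0, 1.0) / (B * c)

def exact_variance(y, B, c):
    """Exact Var[g_B(y)] for i.i.d. N(0, 1) data: the bound minus (E g_B(y))^2 / B."""
    mean = trapezoid(lambda t: epanechnikov(t) * normal_pdf(y - t * c), -1.0, 1.0)
    return variance_bound(y, B, c) - mean ** 2 / B

def monte_carlo_variance(y, B, c, reps, seed=0):
    """Sample variance of g_B(y) over independently simulated data sets."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        sample = (rng.gauss(0.0, 1.0) for _ in range(B))
        vals.append(sum(epanechnikov((y - yl) / c) for yl in sample) / (B * c))
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / (reps - 1)

y0, B, c = 0.5, 200, 0.3
bound, exact = variance_bound(y0, B, c), exact_variance(y0, B, c)
mc = monte_carlo_variance(y0, B, c, reps=3000)
print(f"bound={bound:.3e}  exact={exact:.3e}  monte carlo={mc:.3e}")
```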

Now plugging (19) into (18) and interchanging the order of integration (using Tonelli’s theorem), we get

\begin{matrix} P (B^{\frac{1}{2}} R_{1 B}^{(2)} > ϵ) \leq C {(B^{\frac{1}{2}} c_{B})}^{- 1} \int_{- α_{B}}^{α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | d y \to 0, \end{matrix}

where C is a finite constant that does not depend on B, and the last convergence follows from conditions (M5)–(M6). We now deal with

R_{1 B}^{(1)}

. To this end, we need to calculate

{(E [g_{B}^{(i)} (y)] - h_{J} (y | θ^{*}, γ_{0}))}^{2}

. Using a change of variables, a second-order Taylor expansion, and assumption (B1) (the first-order term vanishes because K is symmetric about 0), we get

\begin{aligned} E [g_{B}^{(i)} (y)] - h_{J} (y | θ^{*}, γ_{0}) &= \int_{\mathbb{R}} K (t) (h_{J} (y - t c_{B} | θ^{*}, γ_{0}) - h_{J} (y | θ^{*}, γ_{0})) d t \\ &= \int_{\mathbb{R}} K (t) \frac{{(t c_{B})}^{2}}{2} h_{J}^{''} (y_{B}^{*} (t) | θ^{*}, γ_{0}) d t . \end{aligned}

(20)
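The expansion in (20) can be illustrated numerically. Assuming, for illustration only, a standard normal density in place of h_J and the Epanechnikov kernel (for which ∫t²K(t)dt = 1/5), the exact bias ∫K(t)(h(y - tc_B) - h(y))dt should approach (c_B²/2)h''(y)/5 as c_B → 0, with relative error shrinking roughly like c_B²:

```python
import math

def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def normal_pdf_dd(y):
    """Second derivative of the standard normal density: (y^2 - 1) phi(y)."""
    return (y * y - 1.0) * normal_pdf(y)

def trapezoid(f, lo, hi, n=4001):
    dx = (hi - lo) / (n - 1)
    return dx * (sum(f(lo + k * dx) for k in range(n)) - 0.5 * (f(lo) + f(hi)))

def kde_bias(y, c):
    """E[g_B(y)] - h(y) = int K(t) (h(y - t c) - h(y)) dt, the first line of (20)."""
    return trapezoid(lambda t: epanechnikov(t) * (normal_pdf(y - t * c) - normal_pdf(y)), -1.0, 1.0)

def leading_term(y, c):
    """(c^2 / 2) h''(y) int t^2 K(t) dt, using int t^2 K = 1/5 for the Epanechnikov kernel."""
    return 0.5 * c * c * normal_pdf_dd(y) / 5.0

y = 0.0
rel_errors = []
for c in (0.4, 0.2, 0.1, 0.05):
    exact, approx = kde_bias(y, c), leading_term(y, c)
    rel_errors.append(abs(exact - approx) / abs(approx))
    print(f"c={c:<5} bias={exact:.3e} leading term={approx:.3e}")
```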

Here

y_{B}^{*} (t)

lies between

y - t c_{B}

and y. Now plugging (20) into (17) and using conditions (M3) and (M6), we get

\begin{matrix} B^{\frac{1}{2}} R_{1 B}^{(1)} (γ_{0}) \leq C B^{\frac{1}{2}} c_{B}^{4} \int_{- α_{B}}^{α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | d y . \end{matrix}

(21)

Convergence of (21) to 0 now follows from condition (M6). We next deal with

R_{2 B} (γ_{0})

. To this end, by expanding the square in

d_{J} (\cdot | θ^{*} (γ_{0}))

, we have

\begin{matrix} B^{\frac{1}{2}} R_{2 B} (γ_{0}) = \frac{B^{\frac{1}{2}}}{2} \int_{| y | \geq α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | (h_{J} (y | θ^{*}, γ_{0}) + g_{B}^{(i)} (y) - 2 s_{J} (y | θ^{*}, γ_{0}) g_{B}^{(i) \frac{1}{2}} (y)) d y . \end{matrix}

(22)

We will show that the RHS of (22) converges to 0 as

B \to \infty .

We begin with the first term. By the Cauchy-Schwarz inequality,

\begin{aligned} B {\left(\int_{| y | \geq α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | h_{J} (y | θ^{*}, γ_{0}) d y\right)}^{2} &\leq \left\{\int_{\mathbb{R}} u_{J} (y | θ^{*}, γ_{0}) u_{J}^{'} (y | θ^{*}, γ_{0}) h_{J} (y | θ^{*}, γ_{0}) d y\right\} \\ &\quad \times B P_{θ^{*} (γ_{0})} (| Δ | \geq α_{B}), \end{aligned}

and the last factor converges to 0 by (M4). As for the second term, note that, almost surely, by the Cauchy-Schwarz inequality,

\begin{matrix} {(\int_{| y | \geq α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | g_{B}^{(i)} (y) d y)}^{2} \leq \int_{| y | \geq α_{B}} u_{J} (y | θ^{*}, γ_{0}) u_{J}^{'} (y | θ^{*}, γ_{0}) g_{B}^{(i)} (y) d y . \end{matrix}

Now taking the expectation and using Cauchy-Schwarz inequality, one can show that

\begin{matrix} B E {[\int_{| y | \geq α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | g_{B}^{(i)} (y) d y]}^{2} \leq a_{B} \int_{\mathbb{R}} K (t) \int_{\mathbb{R}} u_{J} (y | θ^{*}, γ_{0}) u_{J}^{'} (y | θ^{*}, γ_{0}) h_{J} (y - c_{B} t | θ^{*}, γ_{0}) d y d t, \end{matrix}

where

a_{B} = B \sup_{z \in \mathrm{Supp}(K)} P_{θ^{*}} (| Δ - c_{B} z | > α_{B})

. The convergence to 0 of the right-hand side of the above inequality now follows from condition (M4). Finally, by another application of the Cauchy-Schwarz inequality,

\begin{matrix} B E [\int_{| y | \geq α_{B}} | u_{J} (y | θ^{*}, γ_{0}) | g_{B}^{(i) \frac{1}{2}} (y) s_{J} (y | θ^{*}, γ_{0}) d y] \leq a_{B} \int_{\mathbb{R}} u_{J} (y - c_{B} t | θ^{*}, γ_{0}) u_{J}^{'} (y - c_{B} t | θ^{*}, γ_{0}) h_{J} (y | θ^{*}, γ_{0}) d y . \end{matrix}

The right-hand side of the above inequality converges to zero by (M4). Now the lemma follows. □

Proof of Theorem 1.

Recall that

\begin{matrix} B^{\frac{1}{2}} {({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0}))}^{'} = A_{1 B} (γ_{0}) + A_{2 B} (γ_{0}), \end{matrix}

where

A_{1 B} (γ_{0})

and

A_{2 B} (γ_{0})

are given in (9). Proposition 3 shows that, as first

B \to \infty

and then

γ_{0} \to 0

,

A_{1 B} (γ_{0})

converges in distribution to

N (0, I^{- 1} (θ^{*}))

, while Lemma 4 shows that

\lim_{γ_{0} \to 0} \lim_{B \to \infty} A_{2 B} (γ_{0}) = 0

in probability. The result follows from Slutsky’s theorem. □

We next show that by interchanging the limits, namely first allowing

γ_{0}

to converge to 0 and then letting

B \to \infty

the limit distribution of

{\hat{θ}}_{i, B} (γ_{0})

is Gaussian with the same covariance matrix as in Theorem 1. We begin with the additional assumptions required in the proof of Theorem 2.

Regularity conditions

(M4’) Let

{α_{B} : B \geq 1}

be a sequence diverging to infinity. Assume that

\begin{matrix} \lim_{B \to \infty} B \sup_{t \in \mathrm{Supp}(K)} P_{θ^{*}} (| Δ - c_{B} t | > α_{B}) = 0, \end{matrix}

where

\mathrm{Supp}(K)

is the support of the kernel density

K (\cdot)

and

Δ

is a generic random variable with density

h^{* J} (\cdot | θ^{*})

.

(M5’) Let

\begin{matrix} M_{B} = \sup_{| y | \leq α_{B}} \sup_{t \in \mathrm{Supp}(K)} \left|\frac{h^{* J} (y - t c_{B} | θ^{*})}{h^{* J} (y | θ^{*})}\right| . \end{matrix}

Assume that

\sup_{B \geq 1} M_{B} < \infty

.

(M6’) The score function has a regular central behavior relative to the smoothing constants, i.e.,

\begin{matrix} \lim_{B \to \infty} {(B^{\frac{1}{2}} c_{B})}^{- 1} \int_{- α_{B}}^{α_{B}} v_{J} (y | θ^{*}) d y = 0 . \end{matrix}

Furthermore,

\begin{matrix} \lim_{B \to \infty} (B^{\frac{1}{2}} c_{B}^{4}) \int_{- α_{B}}^{α_{B}} v_{J} (y | θ^{*}) d y = 0 . \end{matrix}

(M7’) The density functions are smooth in an

L_{2}

sense; i.e.,

\begin{matrix} \lim_{B \to \infty} \sup_{t \in \mathrm{Supp}(K)} \int_{\mathbb{R}} {(v_{J} (y + c_{B} t | θ^{*}) - v_{J} (y | θ^{*}))}^{2} h^{* J} (y | θ^{*}) d y = 0 . \end{matrix}

Assumptions comparing models for original and compressed data

(V1) Assume that

\lim_{γ_{0} \to 0} \sup_{y} | u_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*}) | = 0 .

(V2)

v_{J} (\cdot | θ)

is

L_{1}

continuous in the sense that

X_{n} \overset{p}{\to} X

implies that

\lim_{n \to \infty} E | v_{J} (X_{n} | θ) - v_{J} (X | θ) | = 0

, where the expectation is taken with respect to the distribution

K (\cdot)

.

(V3) Assume that for all

θ \in Θ

,

\int_{\mathbb{R}} \nabla h^{* J} (y | θ) d y < \infty

.

(V4) Assume that for all

θ \in Θ

,

\lim_{γ_{0} \to 0} \sup_{y} \left|\frac{s_{J} (y | θ, γ_{0})}{t_{J} (y | θ)} - 1\right| = 0

.

Theorem 2.

Assume that the conditions (B1)–(B2), (D1’)–(D2’), (M1’)–(M7’), (O1)–(O2) and (V1)–(V4) hold. Then,

\begin{matrix} \lim_{B \to \infty} \lim_{γ_{0} \to 0} P (\sqrt{B} ({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0})) \leq x) = P (G \leq x), \end{matrix}

where G is a bivariate Gaussian random variable with mean 0 and covariance matrix

I^{- 1} (θ^{*})

.

Note that Theorem 2 uses conditions (V2)–(V4), which are regularity conditions on the scores of the J-fold convolution of

f (\cdot)

, while (V1) facilitates comparison of the scores of the densities of the compressed data with those of the J-fold convolution. As before, we will first establish (a):

\begin{matrix} \lim_{B \to \infty} \lim_{γ_{0} \to 0} P (A_{1 B} (γ_{0}) \leq x) = P (G \leq x), \end{matrix}

and then (b):

\lim_{B \to \infty} \lim_{γ_{0} \to 0} A_{2 B} (γ_{0}) = 0

in probability. We start with the proof of (a).

Proposition 4.

Assume that the conditions (B1)–(B2), (D1’)–(D2’), (M1’)–(M3’), (M7’), (O1)–(O2), and (V1)–(V2) hold. Then,

\begin{matrix} \lim_{B \to \infty} \lim_{γ_{0} \to 0} P (A_{1 B} (γ_{0}) \leq x) = P (G \leq x) . \end{matrix}

We divide the proof of Proposition 4 into two lemmas. In the first lemma, we will show that

\begin{matrix} \lim_{B \to \infty} \lim_{γ_{0} \to 0} D_{B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{4} I (θ^{*}) . \end{matrix}

In the second lemma, we will show that, first letting

γ_{0} \to 0

and then letting

B \to \infty

,

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) \overset{d}{\to} N (0, I (θ^{*})) . \end{matrix}

Lemma 5.

Assume that the conditions (B1)–(B2), (D1’)–(D2’), (M1’)–(M3’), (O1)–(O2), and (V1)–(V2) hold. Then,

\begin{matrix} \lim_{B \to \infty} \lim_{γ_{0} \to 0} D_{B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{4} I (θ^{*}) . \end{matrix}

(23)

Proof.

First fix B. Recall that

\begin{aligned} D_{B} (θ (γ_{0})) &= - \frac{1}{2} \int_{\mathbb{R}} {\dot{u}}_{J} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) {(g_{B}^{(i)} (y))}^{\frac{1}{2}} d y \\ &\quad - \frac{1}{4} \int_{\mathbb{R}} u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) {(g_{B}^{(i)} (y))}^{\frac{1}{2}} d y \\ &\equiv D_{1 B} (θ (γ_{0})) + D_{2 B} (θ (γ_{0})) . \end{aligned}

By algebra,

D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0}))

can be expressed as the sum of

H_{1 B}^{(1)}

,

H_{1 B}^{(2)}

,

H_{1 B}^{(3)}

,

H_{1 B}^{(4)}

and

H_{1 B}^{(5)}

, where

\begin{matrix} H_{1 B}^{(1)} = - \frac{1}{2} \int_{\mathbb{R}} [{\dot{u}}_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) s_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) - {\dot{v}}_{J} (y | {\tilde{θ}}_{i, B}) t_{J} (y | {\tilde{θ}}_{i, B})] g_{B}^{(i) \frac{1}{2}} (y) d y, \end{matrix}

\begin{matrix} H_{1 B}^{(2)} = - \frac{1}{2} \int_{\mathbb{R}} [{\dot{v}}_{J} (y | {\tilde{θ}}_{i, B}) t_{J} (y | {\tilde{θ}}_{i, B}) - {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*})] g_{B}^{(i) \frac{1}{2}} (y) d y, \end{matrix}

\begin{matrix} H_{1 B}^{(3)} = - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) [g_{B}^{(i) \frac{1}{2}} (y) - h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0})] d y, \end{matrix}

\begin{matrix} H_{1 B}^{(4)} = - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) [s_{J} (y | θ^{*}, γ_{0}) - t_{J} (y | θ^{*})] d y, \end{matrix}

and

\begin{matrix} H_{1 B}^{(5)} = \frac{1}{2} I (θ^{*}) . \end{matrix}

We will show that

\begin{matrix} \lim_{γ_{0} \to 0} D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0})) = H_{1 B}^{(2)} + \lim_{γ_{0} \to 0} H_{1 B}^{(3)} + H_{1 B}^{(5)}, \end{matrix}

(24)

where

\lim_{γ_{0} \to 0} H_{1 B}^{(3)} = - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) [g_{B}^{* \frac{1}{2}} (y) - t_{J} (y | θ^{*})] d y

(25)

and

g_{B}^{*} (\cdot)

is given in (7). First consider

H_{1 B}^{(1)}

. It converges to zero as

γ_{0} \to 0

by Cauchy-Schwarz inequality and assumption (O2). Next we consider

H_{1 B}^{(3)}

. We will first show that

\begin{matrix} \lim_{γ_{0} \to 0} - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) g_{B}^{(i) \frac{1}{2}} (y) d y = - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) g_{B}^{* \frac{1}{2}} (y) d y . \end{matrix}

To this end, notice that by Cauchy-Schwarz inequality and boundedness of

{\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*})

in

L_{2}

, it follows that there exists a constant C such that

\begin{aligned} \left|\int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) [g_{B}^{(i) \frac{1}{2}} (y) - g_{B}^{* \frac{1}{2}} (y)] d y\right| &\leq C {\left\{\int_{\mathbb{R}} {(g_{B}^{(i) \frac{1}{2}} (y) - g_{B}^{* \frac{1}{2}} (y))}^{2} d y\right\}}^{\frac{1}{2}} \\ &\leq C {\left\{\int_{\mathbb{R}} |g_{B}^{(i)} (y) - g_{B}^{*} (y)| d y\right\}}^{\frac{1}{2}} . \end{aligned}

It suffices to show that

g_{B}^{(i)} (\cdot)

converges to

g_{B}^{*} (\cdot)

in

L_{1}

. Since

\begin{matrix} \int_{\mathbb{R}} | g_{B}^{(i)} (y) - g_{B}^{*} (y) | d y = 2 - 2 \int_{\mathbb{R}} \min \{g_{B}^{(i)} (y), g_{B}^{*} (y)\} d y, \end{matrix}

and

\min \{g_{B}^{(i)} (y), g_{B}^{*} (y)\} \leq g_{B}^{*} (y)

, by dominated convergence theorem,

g_{B}^{(i)} (\cdot) \overset{L_{1}}{\to} g_{B}^{*} (\cdot)

. Next we will show that

\begin{matrix} \lim_{γ_{0} \to 0} - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) s_{J} (y | θ^{*}, γ_{0}) d y = - \frac{1}{2} \int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) t_{J} (y | θ^{*}) d y . \end{matrix}

Indeed, by the Cauchy-Schwarz inequality, the boundedness of

{\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*})

in

L_{2}

, and Scheffé’s theorem,

\int_{\mathbb{R}} {\dot{v}}_{J} (y | θ^{*}) t_{J} (y | θ^{*}) (s_{J} (y | θ^{*}, γ_{0}) - t_{J} (y | θ^{*})) d y

converges to zero as

γ_{0} \to 0 .
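Two elementary facts about probability densities drive the L_1 arguments above: ∫|g - g*| = 2 - 2∫min{g, g*}, and ∫(g^{1/2} - g*^{1/2})² ≤ ∫|g - g*| (pointwise, (√a - √b)² ≤ |√a - √b|(√a + √b) = |a - b|). A numerical illustration, with two normal densities standing in for g_B^{(i)} and g_B^* (an assumption; any pair of densities would do):

```python
import math

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid_values(vals, dx):
    """Trapezoid rule applied to function values on an equispaced grid."""
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

lo, hi, n = -15.0, 15.0, 30001
dx = (hi - lo) / (n - 1)
grid = [lo + k * dx for k in range(n)]
g = [normal_pdf(y, 0.0, 1.0) for y in grid]       # plays the role of g_B^{(i)}
g_star = [normal_pdf(y, 0.5, 1.3) for y in grid]  # plays the role of g_B^*

l1 = trapezoid_values([abs(a - b) for a, b in zip(g, g_star)], dx)
via_min = 2.0 - 2.0 * trapezoid_values([min(a, b) for a, b in zip(g, g_star)], dx)
hellinger_sq = trapezoid_values([(math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(g, g_star)], dx)
print(l1, via_min, hellinger_sq)
```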

Next we consider

H_{1 B}^{(4)}

. It converges to zero by the Cauchy-Schwarz inequality and assumption (M2’). Thus (24) holds. Now, letting

B \to \infty,

we show that

\lim_{B \to \infty} H_{1 B}^{(2)} = 0

and

\lim_{B \to \infty} \lim_{γ_{0} \to 0} H_{1 B}^{(3)} = 0 .

First consider

\lim_{B \to \infty} H_{1 B}^{(2)}

. It converges to zero by Cauchy-Schwarz inequality and assumption (M2’). Next we consider

\lim_{B \to \infty} \lim_{γ_{0} \to 0} H_{1 B}^{(3)}

. It converges to zero by Cauchy-Schwarz inequality and

L_{1}

convergence of

g_{B}^{*} (\cdot)

and

h^{* J} (\cdot | θ^{*})

. Therefore

\lim_{B \to \infty} \lim_{γ_{0} \to 0} D_{1 B} ({\tilde{θ}}_{i, B} (γ_{0})) = \frac{1}{2} I (θ^{*})

.

We now show that

\lim_{B \to \infty} \lim_{γ_{0} \to 0} D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0})) = - \frac{1}{4} I (θ^{*})

. First fix B and express

D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0}))

as the sum of

H_{2 B}^{(1)}

,

H_{2 B}^{(2)}

,

H_{2 B}^{(3)}

,

H_{2 B}^{(4)}

, and

H_{2 B}^{(5)}

, where

\begin{matrix} H_{2 B}^{(1)} = - \frac{1}{4} \int_{\mathbb{R}} [u_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) u_{J}^{'} (y | {\tilde{θ}}_{i, B}, γ_{0}) s_{J} (y | {\tilde{θ}}_{i, B}, γ_{0}) - v_{J} (y | {\tilde{θ}}_{i, B}) v_{J}^{'} (y | {\tilde{θ}}_{i, B}) t_{J} (y | {\tilde{θ}}_{i, B})] g_{B}^{(i) \frac{1}{2}} (y) d y, \end{matrix}

\begin{matrix} H_{2 B}^{(2)} = - \frac{1}{4} \int_{\mathbb{R}} [v_{J} (y | {\tilde{θ}}_{i, B}) v_{J}^{'} (y | {\tilde{θ}}_{i, B}) t_{J} (y | {\tilde{θ}}_{i, B}) - v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*})] g_{B}^{(i) \frac{1}{2}} (y) d y, \end{matrix}

\begin{matrix} H_{2 B}^{(3)} = - \frac{1}{4} \int_{\mathbb{R}} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*}) [g_{B}^{(i) \frac{1}{2}} (y) - h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0})] d y, \end{matrix}

\begin{matrix} H_{2 B}^{(4)} = - \frac{1}{4} \int_{\mathbb{R}} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*}) [s_{J} (y | θ^{*}, γ_{0}) - t_{J} (y | θ^{*})] d y, \end{matrix}

and

\begin{matrix} H_{2 B}^{(5)} = - \frac{1}{4} I (θ^{*}) . \end{matrix}

We will show that

\begin{matrix} lim_{γ_{0} \to 0} D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0})) = H_{2 B}^{(2)} + lim_{γ_{0} \to 0} H_{2 B}^{(3)} + H_{2 B}^{(5)}, where \end{matrix}

(26)

\lim_{γ_{0} \to 0} H_{2 B}^{(3)} = - \frac{1}{4} \int_{\mathbb{R}} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) t_{J} (y | θ^{*}) [g_{B}^{* \frac{1}{2}} (y) - t_{J} (y | θ^{*})] d y .

(27)

First consider $H_{2B}^{(1)}$: it converges to zero as $γ_{0} \to 0$ by the Cauchy-Schwarz inequality and assumption (O1). Next consider $H_{2B}^{(3)}$: by an argument similar to the one above and the boundedness of $v_{J}^{2}(y | θ^{*}) t_{J}(y | θ^{*})$, (27) holds. Next consider $H_{2B}^{(4)}$: it converges to zero as $γ_{0} \to 0$ by the Cauchy-Schwarz inequality and assumption (M3’). Now let $B \to \infty$; we will show that $\lim_{B \to \infty} H_{2B}^{(2)} = 0$ and $\lim_{B \to \infty} \lim_{γ_{0} \to 0} H_{2B}^{(3)} = 0$. First, $H_{2B}^{(2)}$ converges to zero as $B \to \infty$ by the Cauchy-Schwarz inequality and assumption (M3’). Finally, $\lim_{B \to \infty} \lim_{γ_{0} \to 0} H_{2B}^{(3)} = 0$ by the Cauchy-Schwarz inequality and the $L_{1}$ convergence of $g_{B}^{*}(\cdot)$ to $h^{*J}(\cdot | θ^{*})$. Thus

\lim_{B \to \infty} \lim_{γ_{0} \to 0} D_{2 B} ({\tilde{θ}}_{i, B} (γ_{0})) = - \frac{1}{4} I (θ^{*}) .

Now letting $B \to \infty$, the proof of (23) follows by arguments similar to those in Lemma 2. □
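The five-term decomposition of $D_{2B}$ used above is an exact telescoping identity. A short check, assuming (as the notation suggests) that $s_J = h_J^{1/2}$, $t_J = (h^{*J})^{1/2}$ and $I(θ^{*}) = \int_{\mathbb{R}} v_J v_J^{'} h^{*J}\,dy$, and abbreviating $g^{1/2} = g_B^{(i)\frac{1}{2}}$ and $\tilde θ = \tilde θ_{i,B}(γ_0)$ (with $u_J$, $u_J^{'}$, $s_J$ evaluated at $(y \mid \tilde θ, γ_0)$ and $v_J$, $v_J^{'}$, $t_J$ at $(y \mid θ^{*})$ after the cancellations):

```latex
\begin{aligned}
H_{2B}^{(1)} + H_{2B}^{(2)} &= -\tfrac{1}{4}\int_{\mathbb{R}} \big[u_J u_J^{'} s_J(y \mid \tilde\theta, \gamma_0) - v_J v_J^{'} t_J(y \mid \theta^*)\big]\, g^{1/2}(y)\, dy, \\
H_{2B}^{(3)} + H_{2B}^{(4)} &= -\tfrac{1}{4}\int_{\mathbb{R}} v_J v_J^{'} t_J(y \mid \theta^*)\,\big[g^{1/2}(y) - t_J(y \mid \theta^*)\big]\, dy, \\
\sum_{k=1}^{5} H_{2B}^{(k)} &= -\tfrac{1}{4}\int_{\mathbb{R}} u_J u_J^{'} s_J(y \mid \tilde\theta, \gamma_0)\, g^{1/2}(y)\, dy
  + \tfrac{1}{4}\int_{\mathbb{R}} v_J v_J^{'} h^{*J}(y \mid \theta^*)\, dy - \tfrac{1}{4} I(\theta^*) \\
&= -\tfrac{1}{4}\int_{\mathbb{R}} u_J u_J^{'} s_J(y \mid \tilde\theta, \gamma_0)\, g^{1/2}(y)\, dy .
\end{aligned}
```

So the decomposition reproduces $D_{2B}(\tilde θ)$ provided the latter is the last integral above, as the structure of the decomposition indicates.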

Lemma 6.

Assume that the conditions (B1)–(B2), (D1’)–(D2’), (M1’)–(M3’), (M7’), (O1)–(O2), and (V1)–(V2) hold. Then, first letting $γ_{0} \to 0$ and then letting $B \to \infty$,

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) \overset{d}{\to} N (0, I (θ^{*})) . \end{matrix}

(28)

Proof.

First fix B. We will show that as

γ_{0} \to 0,

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) \overset{d}{\to} \int_{I R} v_{J} (y | θ^{*}) g_{B}^{*} (y) d y . \end{matrix}

First observe that

\begin{matrix} 4 B^{\frac{1}{2}} T_{B} (γ_{0}) - \int_{I R} v_{J} (y | θ^{*}) g_{B}^{*} (y) d y & = & \int_{I R} [u_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*})] g_{B}^{(i)} (y) d y \end{matrix}

(29)

\begin{matrix} + \int_{I R} v_{J} (y | θ^{*}) [g_{B}^{(i)} (y) - g_{B}^{*} (y)] d y . \end{matrix}

(30)

We will show that the RHS of (29) converges to zero as $γ_{0} \to 0$ and that the RHS of (30) converges to zero in probability as $γ_{0} \to 0$. First consider the RHS of (29). Since $g_{B}^{(i)}$ is a probability density,

\begin{matrix} \left| \int_{\mathbb{R}} [u_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*})] g_{B}^{(i)} (y) d y \right| \leq \sup_{y} | u_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*}) |, \end{matrix}

which converges to zero as $γ_{0} \to 0$ by assumption (V1). Next consider the RHS of (30). The changes of variables $y = Y_{il} + u c_{B}$ and $y = Y_{il}^{*} + u c_{B}$ give

\begin{matrix} \int_{\mathbb{R}} v_{J} (y | θ^{*}) [g_{B}^{(i)} (y) - g_{B}^{*} (y)] d y = \frac{1}{B} \sum_{l = 1}^{B} \int_{\mathbb{R}} [v_{J} (Y_{i l} + u c_{B} | θ^{*}) - v_{J} (Y_{i l}^{*} + u c_{B} | θ^{*})] K (u) d u . \end{matrix}

By assumption (V2), it follows that (30) converges to zero in probability as $γ_{0} \to 0$. Now letting $B \to \infty$, we have

\begin{matrix} B^{\frac{1}{2}} \int_{I R} v_{J} (y | θ^{*}) g_{B}^{*} (y) d y - B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} v_{J} (Y_{i l}^{*} | θ^{*}) = B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} \int_{I R} (v_{J} (Y_{i l}^{*} + c_{B} t | θ^{*}) - v_{J} (Y_{i l}^{*} | θ^{*})) K (t) d t, \end{matrix}

and

\begin{matrix} E {[B^{\frac{1}{2}} \int_{\mathbb{R}} v_{J} (y | θ^{*}) g_{B}^{*} (y) d y - B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} v_{J} (Y_{i l}^{*} | θ^{*})]}^{2} & = & E {[B^{\frac{1}{2}} \frac{1}{B} \sum_{l = 1}^{B} \int_{\mathbb{R}} (v_{J} (Y_{i l}^{*} + c_{B} t | θ^{*}) - v_{J} (Y_{i l}^{*} | θ^{*})) K (t) d t]}^{2} \\ & \leq & C E [\int_{\mathbb{R}} {(v_{J} (Y_{i 1}^{*} + c_{B} t | θ^{*}) - v_{J} (Y_{i 1}^{*} | θ^{*}))}^{2} d t] \\ & = & C \int_{\mathbb{R}} \int_{\mathbb{R}} {(v_{J} (y + c_{B} t | θ^{*}) - v_{J} (y | θ^{*}))}^{2} h^{*J} (y | θ^{*}) d y d t \\ & \to & 0 \quad \text{as } B \to \infty, \end{matrix}

where the last convergence follows by assumption (M7’). Hence, by the central limit theorem for independent and identically distributed random variables, the limiting distribution of

B^{\frac{1}{2}} \int_{I R} v_{J} (y | θ^{*}) g_{B}^{*} (y) d y

is

N (0, I (θ^{*}))

, proving the lemma. □
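As a minimal numerical sketch of the final step (an illustration, not part of the original argument), assume the limiting model $h^{*J}(\cdot \mid θ^{*})$ is $N(μ, σ^{2})$ with $θ = (μ, σ)$, so that $v_J$ is the usual normal score and $I(θ) = \mathrm{diag}(1/σ^{2}, 2/σ^{2})$. The empirical covariance of $B^{1/2} B^{-1} \sum_{l} v_J(Y_{il}^{*} \mid θ^{*})$ across replications should then be close to $I(θ^{*})$. The function name and parameter values are illustrative choices.

```python
import numpy as np

def scaled_score_means(mu, sigma, B, n_rep, seed=0):
    """Draw n_rep realisations of B^{1/2} * (1/B) * sum_l v(Y_l | theta)
    for the N(mu, sigma^2) model with theta = (mu, sigma)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=(n_rep, B))
    score_mu = (y - mu) / sigma**2                        # d/dmu log density
    score_sigma = ((y - mu) ** 2 - sigma**2) / sigma**3   # d/dsigma log density
    scores = np.stack([score_mu, score_sigma], axis=-1)   # shape (n_rep, B, 2)
    return np.sqrt(B) * scores.mean(axis=1)               # shape (n_rep, 2)

mu, sigma = 1.0, 2.0
draws = scaled_score_means(mu, sigma, B=100, n_rep=20_000)
empirical_cov = np.cov(draws, rowvar=False)
fisher_info = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
print(np.round(empirical_cov, 3))
print(fisher_info)
```

With $B = 100$ the empirical covariance should agree with $I(θ^{*})$ up to Monte Carlo error, which is what the central limit step asserts in the limit $B \to \infty$.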

Proof of Proposition 4.

The proof of Proposition 4 follows by combining Lemmas 5 and 6. □

Lemma 7.

Assume that the conditions (M1’)–(M6’) and (V1)–(V4) hold. Then,

\begin{matrix} \lim_{B \to \infty} \lim_{γ_{0} \to 0} A_{2 B} (γ_{0}) = 0 \quad \text{in probability} . \end{matrix}

Proof.

First fix B. Let

\begin{matrix} H_{B} (γ_{0}) = \int_{I R} u_{J} (y | θ^{*}, γ_{0}) {[h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) - g_{B}^{(i)} (y)]}^{2} d y - \int_{I R} v_{J} (y | θ^{*}) {[t_{J} (y | θ^{*}) - g_{B}^{*} (y)]}^{2} d y . \end{matrix}

We will show that $H_{B}(γ_{0}) \to 0$ as $γ_{0} \to 0$. By elementary algebra, $H_{B}(γ_{0})$ can be written as the sum of $H_{1B}(γ_{0})$ and $H_{2B}(γ_{0})$, where

\begin{matrix} H_{1 B} (γ_{0}) = \int_{I R} (u_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*})) {[h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) - g_{B}^{(i)} (y)]}^{2} d y, and \end{matrix}

\begin{matrix} H_{2 B} (γ_{0}) = \int_{I R} v_{J} (y | θ^{*}) {[h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) - g_{B}^{(i)} (y)]}^{2} d y . \end{matrix}

First consider

H_{1 B} (γ_{0})

. It is bounded above by

C {sup}_{y} | u_{J} (y | θ^{*}, γ_{0}) - v_{J} (y | θ^{*}) |

, which converges to zero as

γ_{0} \to 0

by assumption (V1), where C is a constant. Next consider

H_{2 B} (γ_{0})

. We will show that

H_{2 B} (γ_{0})

converges to

\begin{matrix} \int_{I R} v_{J} (y | θ^{*}) {[t_{J} (y | θ^{*}, γ_{0}) - g_{B}^{*} (y)]}^{2} d y . \end{matrix}

In fact, the difference between $H_{2B}(γ_{0})$ and the above expression can be written as the sum of $H_{2B}^{(1)}(γ_{0})$, $H_{2B}^{(2)}(γ_{0})$, and $H_{2B}^{(3)}(γ_{0})$, where

\begin{matrix} H_{2 B}^{(1)} (γ_{0}) = \int_{I R} v_{J} (y | θ^{*}) (h_{J} (y | θ^{*}, γ_{0}) - h^{* J} (y | θ^{*})) d y, \end{matrix}

\begin{matrix} H_{2 B}^{(2)} (γ_{0}) = \int_{I R} v_{J} (y | θ^{*}) (g_{B}^{(i)} (y) - g_{B}^{*} (y)) d y, and \end{matrix}

\begin{matrix} H_{2 B}^{(3)} (γ_{0}) = \int_{I R} v_{J} (y | θ^{*}) (h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) g_{B}^{(i)} (y) - t_{J} (y | θ^{*}, γ_{0}) g_{B}^{*} (y)) d y . \end{matrix}

First consider $H_{2B}^{(1)}(γ_{0})$. Note that

\begin{matrix} |H_{2 B}^{(1)} (γ_{0})| & \leq & \int_{\mathbb{R}} | \nabla h^{*J} (y | θ^{*}) | \left|\frac{h_{J} (y | θ^{*}, γ_{0})}{h^{*J} (y | θ^{*})} - 1\right| d y \\ & \leq & \left\{ {\left(\sup_{y} \left|\frac{s_{J} (y | θ^{*}, γ_{0})}{t_{J} (y | θ^{*})} - 1\right|\right)}^{2} + 2 \sup_{y} \left|\frac{s_{J} (y | θ^{*}, γ_{0})}{t_{J} (y | θ^{*})} - 1\right| \right\} \int_{\mathbb{R}} | \nabla h^{*J} (y | θ^{*}) | d y, \end{matrix}

which converges to zero as $γ_{0} \to 0$ by assumptions (V3) and (V4). Next consider $H_{2B}^{(2)}(γ_{0})$. By the same change of variables as in the proof of Lemma 6,

\begin{matrix} H_{2 B}^{(2)} (γ_{0}) = \frac{1}{B} \sum_{l = 1}^{B} \int_{\mathbb{R}} (v_{J} (Y_{i l} + u c_{B} | θ^{*}) - v_{J} (Y_{i l}^{*} + u c_{B} | θ^{*})) K (u) d u, \end{matrix}

which converges to zero as $γ_{0} \to 0$ by assumption (V2). Finally consider

H_{2 B}^{(3)} (γ_{0})

, which can be expressed as the sum of

L_{1 B} (γ_{0})

and

L_{2 B}

, where

\begin{matrix} L_{1 B} (γ_{0}) = \int_{I R} v_{J} (y | θ^{*}) (h_{J}^{\frac{1}{2}} (y | θ^{*}, γ_{0}) - t_{J} (y | θ^{*})) g_{B}^{(i) \frac{1}{2}} (y) d y, and \end{matrix}

\begin{matrix} L_{2 B} = \int_{I R} v_{J} (y | θ^{*}) t_{J} (y | θ^{*}) (g_{B}^{(i) \frac{1}{2}} (y) - g_{B}^{* \frac{1}{2}} (y)) d y . \end{matrix}

First consider

L_{1 B} (γ_{0})

. Notice that

\begin{matrix} |L_{1 B} (γ_{0})| \leq sup_{y} |\frac{s_{J} (y | θ, γ_{0})}{t_{J} (y | θ)} - 1| \int_{I R} v_{J} (y | θ^{*}) t_{J} (y | θ^{*}) g_{B}^{(i) \frac{1}{2}} (y) d y \to 0, \end{matrix}

where the last convergence follows by Cauchy-Schwarz inequality and assumption (V4). Next we consider

L_{2 B}

. By Cauchy-Schwarz inequality, it is bounded above by

\begin{matrix} {\{\int_{I R} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) h^{* J} (y | θ^{*}) d y\}}^{\frac{1}{2}} {\{\int_{I R} {(g_{B}^{(i) \frac{1}{2}} (y) - g_{B}^{* \frac{1}{2}} (y))}^{2} d y\}}^{\frac{1}{2}} . \end{matrix}

(31)

The bound (31) converges to zero as $γ_{0} \to 0$ by the boundedness of $\int_{\mathbb{R}} v_{J} (y | θ^{*}) v_{J}^{'} (y | θ^{*}) h^{*J} (y | θ^{*}) d y$ and the $L_{1}$ convergence of $g_{B}^{(i)}(\cdot)$ to $g_{B}^{*}(\cdot)$, which was established in Lemma 5. Now letting $B \to \infty$, the lemma follows by an argument similar to that of Lemma 4 together with assumptions (M1’)–(M6’). □
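The bound on $|H_{2B}^{(1)}(γ_0)|$ in the proof above rests on two elementary facts, stated here assuming $v_J = \nabla \log h^{*J}$ (so that $v_J h^{*J} = \nabla h^{*J}$), $t_J = (h^{*J})^{1/2}$ and $s_J = h_J^{1/2}$:

```latex
v_J(y \mid \theta^*)\big(h_J(y \mid \theta^*, \gamma_0) - h^{*J}(y \mid \theta^*)\big)
  = \nabla h^{*J}(y \mid \theta^*)\left(\frac{h_J(y \mid \theta^*, \gamma_0)}{h^{*J}(y \mid \theta^*)} - 1\right),
\qquad
\frac{h_J}{h^{*J}} - 1 = \left(\frac{s_J}{t_J}\right)^{2} - 1
  = \left(\frac{s_J}{t_J} - 1\right)^{2} + 2\left(\frac{s_J}{t_J} - 1\right).
```

Taking absolute values and bounding $|s_J/t_J - 1|$ by its supremum over $y$ gives the displayed bound.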

Proof of Theorem 2.

Recall that

\begin{matrix} B^{\frac{1}{2}} {({\hat{θ}}_{i, B} (γ_{0}) - θ^{*} (γ_{0}))}^{'} = A_{1 B} (γ_{0}) + A_{2 B} (γ_{0}) . \end{matrix}

Proposition 4 shows that, first letting $γ_{0} \to 0$ and then $B \to \infty$, $A_{1B}(γ_{0}) \overset{d}{\to} N(0, I^{-1}(θ^{*}))$, while Lemma 7 shows that $\lim_{B \to \infty} \lim_{γ_{0} \to 0} A_{2B}(γ_{0}) = 0$ in probability. The theorem follows from Slutsky’s theorem. □

Remark 3.

Theorems 1 and 2 do not by themselves imply that the double limit exists. Establishing its existence requires stronger conditions and more delicate calculations, and will be considered elsewhere.
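A standard example (not specific to this paper) shows why the order of limits matters: for $f(x, y) = x^{2}/(x^{2} + y^{2})$ on $(x, y) \neq (0, 0)$,

```latex
\lim_{x \to 0} \lim_{y \to 0} \frac{x^{2}}{x^{2} + y^{2}} = \lim_{x \to 0} 1 = 1,
\qquad
\lim_{y \to 0} \lim_{x \to 0} \frac{x^{2}}{x^{2} + y^{2}} = \lim_{y \to 0} 0 = 0,
```

so the two iterated limits differ and the double limit $\lim_{(x, y) \to (0, 0)} f(x, y)$ does not exist. A result proved for one order of limits therefore says nothing, by itself, about the other order or about the joint limit in $(B, γ_0)$.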

3.5. Robustness of MHDE

In this section, we describe the robustness properties of the MHDE for compressed data. Accordingly, let $h_{J, α, z}(\cdot | θ, γ_{0}) \equiv (1 - α) h_{J}(\cdot | θ, γ_{0}) + α η_{z}$, where $η_{z}$ denotes the uniform density on the interval $(z - ϵ, z + ϵ)$ for some small $ϵ > 0$, $θ \in Θ$, $α \in (0, 1)$, and $z \in \mathbb{R}$. Also, let $s_{J, α, z}(y | θ, γ_{0}) = h_{J, α, z}^{1/2}(y | θ, γ_{0})$, $u_{J, α, z}(y | θ, γ_{0}) = \nabla \log h_{J, α, z}(y | θ, γ_{0})$, $h_{α, z}^{*J}(\cdot | θ) \equiv (1 - α) h^{*J}(\cdot | θ) + α η_{z}$, $s_{α, z}^{*J}(\cdot | θ) = (h_{α, z}^{*J}(\cdot | θ))^{1/2}$, and $u_{α, z}^{*J} = \nabla \log h_{α, z}^{*J}(\cdot | θ)$. Before stating the theorem, we describe certain additional assumptions, essentially $L_{2}$-continuity conditions, that are needed in the proof.
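The contamination model above can be sketched as follows, assuming for illustration that $h_J(\cdot \mid θ, γ_0)$ is replaced by a normal density $N(μ, σ^2)$ (the compressed-data density itself generally has no closed form). The helper names are hypothetical.

```python
import numpy as np

def normal_pdf(y, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def eta(y, z, eps):
    """Uniform density on the interval (z - eps, z + eps)."""
    return np.where(np.abs(y - z) < eps, 1.0 / (2.0 * eps), 0.0)

def contaminated_pdf(y, alpha, z, eps=0.1, mu=0.0, sigma=1.0):
    """(1 - alpha) * model density + alpha * eta_z."""
    return (1.0 - alpha) * normal_pdf(y, mu, sigma) + alpha * eta(y, z, eps)

def trapezoid(f, x):
    """Trapezoidal rule on a grid (avoids version-specific NumPy helpers)."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

y = np.linspace(-10.0, 30.0, 400_001)
h = contaminated_pdf(y, alpha=0.1, z=20.0)
total_mass = trapezoid(h, y)   # close to 1: the mixture is a density
mean = trapezoid(y * h, y)     # close to 0.9 * 0 + 0.1 * 20 = 2
print(total_mass, mean)
```

Moving $z$ changes only where the extra mass $α$ sits; the theorems below describe how the MHDE responds as $|z| \to \infty$ or $α \to 0$.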

Model assumptions for robustness analysis

(O3) For

α \in [0, 1]

and all

θ \in Θ

,

\begin{matrix} lim_{γ_{0} \to 0} \int_{I R} {({\dot{u}}_{J, α, z} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) - {\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ))}^{2} d y = 0 . \end{matrix}

(O4) For

α \in [0, 1]

and all

θ \in Θ

,

\begin{matrix} lim_{γ_{0} \to 0} \int_{I R} {(u_{J, α, z} (y | θ, γ_{0}) u_{J, α, z}^{'} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) - u_{α, z}^{* J} (y | θ) u_{α, z}^{* J^{'}} (y | θ) s_{α, z}^{* J} (y | θ))}^{2} d y = 0 . \end{matrix}

Theorem 3.

(i) Let $α \in (0, 1)$ and assume that, for all $θ \in Θ$, the assumptions of Proposition 1 hold and $T(h_{J, α, z}(\cdot | θ, γ_{0}))$ is unique for all $z$. Then $T(h_{J, α, z}(\cdot | θ, γ_{0}))$ is a bounded, continuous function of $z$, and

\begin{matrix} lim_{γ_{0} \to 0} lim_{| z | \to \infty} T (h_{J, α, z} (\cdot | θ, γ_{0})) = θ; \end{matrix}

(32)

(ii) Assume further that the conditions (V1), (M2)–(M3), and (O3)–(O4) hold. Then,

\begin{matrix} \lim_{γ_{0} \to 0} \lim_{α \to 0} α^{- 1} [T (h_{J, α, z} (\cdot | θ, γ_{0})) - θ] = {[I (θ)]}^{- 1} \int_{\mathbb{R}} [η_{z} (y) v_{J} (y | θ)] d y . \end{matrix}

Proof.

Let $θ_{z}(γ_{0})$ denote $T(h_{J, α, z}(\cdot | θ, γ_{0}))$ and let $θ_{z}$ denote $T(h_{α, z}^{*J}(\cdot | θ))$. We first show that (32) holds. Let $γ_{0} \geq 0$ be fixed. Then, by the triangle inequality,

\begin{matrix} \lim_{| z | \to \infty} | θ_{z} (γ_{0}) - θ | \leq \lim_{| z | \to \infty} | θ_{z} (γ_{0}) - θ (γ_{0}) | + \lim_{| z | \to \infty} | θ (γ_{0}) - θ | . \end{matrix}

(33)

We will show that the first term on the RHS of (33) is equal to zero. Suppose that it is not. Then, by passing to a subsequence if necessary, we may assume without loss of generality that $θ_{z} \to θ_{1} \neq θ$ as $|z| \to \infty$. Since $θ_{z}(γ_{0})$ minimizes $HD^{2}(h_{J, α, z}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ^{'}, γ_{0}))$ over $θ^{'} \in Θ$, it follows that

\begin{matrix} H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}, γ_{0})) \leq H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0})) \end{matrix}

(34)

for every

θ^{'} \in Θ

. We now show that as

| z | \to \infty

,

\begin{matrix} H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}, γ_{0})) \to H D^{2} ((1 - α) h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{1}, γ_{0})) . \end{matrix}

(35)

To this end, note that, as $|z| \to \infty$, for every $y$,

\begin{matrix} h_{J, α, z} (y | θ, γ_{0}) \to (1 - α) h_{J} (y | θ, γ_{0}) \quad \text{and} \quad h_{J} (y | θ_{z}, γ_{0}) \to h_{J} (y | θ_{1}, γ_{0}) . \end{matrix}

Therefore, as

| z | \to \infty,

\begin{matrix} |H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}, γ_{0})) - H D^{2} ((1 - α) h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{1}, γ_{0}))| & \leq & 2 (Q_{1} + Q_{2}), \end{matrix}

where

\begin{matrix} Q_{1} = \int_{\mathbb{R}} |h_{J, α, z}^{\frac{1}{2}} (y | θ, γ_{0}) - {((1 - α) h_{J} (y | θ, γ_{0}))}^{\frac{1}{2}}| {(h_{J} (y | θ_{z}, γ_{0}))}^{\frac{1}{2}} d y, \end{matrix}

\begin{matrix} Q_{2} = \int_{I R} |h_{J}^{\frac{1}{2}} (y | θ_{z}, γ_{0}) - {(h_{J} (y | θ_{1}, γ_{0}))}^{\frac{1}{2}}| {((1 - α) h_{J} (y | θ, γ_{0}))}^{\frac{1}{2}} d y . \end{matrix}

Now, by the Cauchy-Schwarz inequality and Scheffé’s theorem, it follows that as

| z | \to \infty

,

Q_{1} \to 0

and

Q_{2} \to 0

. Therefore, (35) holds. By Equations (34) and (35), we have

\begin{matrix} H D^{2} ((1 - α) h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{1}, γ_{0})) \leq H D^{2} ((1 - α) h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0})) \end{matrix}

(36)

for every

θ^{'} \in Θ

. Now consider

\begin{matrix} H I F (α, h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0})) \equiv \int_{I R} {({[(1 - α) δ (h_{J} (\cdot | θ, γ_{0}), h_{J} (y | θ^{'}, γ_{0})) + 1]}^{\frac{1}{2}} - 1)}^{2} h_{J} (y | θ^{'}, γ_{0}) d y, \end{matrix}

where

δ (h_{J} (\cdot | θ, γ_{0}), h_{J} (y | θ^{'}, γ_{0})) = \frac{h_{J} (y | θ, γ_{0})}{h_{J} (y | θ^{'}, γ_{0})} - 1

. Since $G^{*}(δ) = {[{((1 - α) δ + 1)}^{1/2} - 1]}^{2}$ is a non-negative, strictly convex function with $δ = 0$ as its unique point of minimum, $HIF(α, h_{J}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ^{'}, γ_{0})) > 0$ unless $δ(h_{J}(\cdot | θ, γ_{0}), h_{J}(y | θ^{'}, γ_{0})) = 0$ except on a set of Lebesgue measure zero, which, by the model identifiability assumption, is true if and only if $θ^{'} = θ$

. Since

θ_{1} \neq θ

, it follows that

\begin{matrix} H I F (α, h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{1}, γ_{0})) > H I F (α, h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) . \end{matrix}
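The convexity claim for $G^{*}$ used above can be checked directly. Writing $c = 1 - α \in (0, 1)$ and noting that $1 + cδ = (1 - α) h_J(y \mid θ, γ_0)/h_J(y \mid θ^{'}, γ_0) + α > 0$,

```latex
G^{*}(\delta) = \big(\sqrt{1 + c\delta} - 1\big)^{2}, \qquad
G^{*\prime}(\delta) = c\left(1 - (1 + c\delta)^{-1/2}\right), \qquad
G^{*\prime\prime}(\delta) = \frac{c^{2}}{2}\,(1 + c\delta)^{-3/2} > 0,
```

so $G^{*}$ is strictly convex with $G^{*}(0) = G^{*\prime}(0) = 0$, and $δ = 0$ is its unique minimiser.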

Since $HIF(α, h_{J}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ^{'}, γ_{0})) = HD^{2}((1 - α) h_{J}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ^{'}, γ_{0})) - α$, this implies that

\begin{matrix} H D^{2} ((1 - α) h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{1}, γ_{0})) > H D^{2} ((1 - α) h_{J} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})), \end{matrix}

which contradicts (36) with $θ^{'} = θ$. The continuity of $θ_{z}$ follows from Proposition 2, and the boundedness follows from the compactness of $Θ$. Finally, letting $γ_{0} \to 0$, the second term on the RHS of (33) converges to zero by Proposition 2.

We now turn to part (ii) of the theorem. First fix $γ_{0} \geq 0$. Since $θ_{z}$ minimizes $HD^{2}(h_{J, α, z}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ^{'}, γ_{0}))$ over $θ^{'} \in Θ$, the gradient of this objective vanishes at $θ^{'} = θ_{z}$. A Taylor expansion of this gradient around $θ$ gives

\begin{matrix} 0 = \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}, γ_{0})) & = & \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) \\ & & + (θ_{z} - θ) D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}^{*}, γ_{0})), \end{matrix}

where

θ_{z}^{*} (γ_{0})

is a point between

θ

and

θ_{z}

,

\begin{matrix} \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{2} \int_{I R} u_{J, α, z} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) (s_{J, α, z} (y | θ, γ_{0}) - s_{J} (y | θ, γ_{0})) d y, \end{matrix}

(37)

and

D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0}))

can be expressed as the sum of

D_{1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0}))

and

D_{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0}))

, where

\begin{matrix} D_{1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0})) = \frac{1}{2} \int_{I R} {\dot{u}}_{J, α, z} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ^{'}, γ_{0}) d y and \end{matrix}

(38)

\begin{matrix} D_{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ^{'}, γ_{0})) = \frac{1}{4} \int_{I R} u_{J, α, z} (y | θ, γ_{0}) u_{J, α, z}^{'} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ^{'}, γ_{0}) d y . \end{matrix}

(39)

Therefore,

\begin{matrix} α^{- 1} (θ_{z} - θ) = - α^{- 1} D^{- 1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}^{*}, γ_{0})) \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) . \end{matrix}

We will show that

\begin{matrix} lim_{α \to 0} D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}^{*}, γ_{0})) = - \frac{1}{4} I (θ (γ_{0})), and \end{matrix}

(40)

\begin{matrix} lim_{α \to 0} α^{- 1} \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{4} \int_{I R} [η_{z} (y) u_{J} (y | θ, γ_{0})] d y . \end{matrix}

(41)

We first establish (40). Note that, by definition, $θ_{z}(α) \to θ$ as $α \to 0$. Since $θ_{z}^{*}(α)$ lies between $θ$ and $θ_{z}(α)$, it follows that $\lim_{α \to 0} θ_{z}^{*}(α) = θ$.

In addition, by assumptions (O3) and (O4),

D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}, γ_{0}))

is continuous in

θ_{z}

. Therefore, to prove (40), it suffices to show that

\begin{matrix} \lim_{α \to 0} D_{1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{2} \int_{\mathbb{R}} {\dot{u}}_{J} (y | θ, γ_{0}) h_{J} (y | θ, γ_{0}) d y = - \frac{1}{2} I (θ (γ_{0})), \quad \text{and} \end{matrix}

(42)

\begin{matrix} \lim_{α \to 0} D_{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{4} \int_{\mathbb{R}} u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0}) h_{J} (y | θ, γ_{0}) d y = \frac{1}{4} I (θ (γ_{0})) . \end{matrix}

(43)

We begin with

D_{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0}))

. Notice that

\begin{matrix} lim_{α \to 0} s_{J, α, z} (y | θ, γ_{0}) = s_{J} (y | θ, γ_{0}), lim_{α \to 0} u_{J, α, z} (y | θ, γ_{0}) = u_{J} (y | θ, γ_{0}), and \end{matrix}

\begin{matrix} lim_{α \to 0} {\dot{u}}_{J, α, z} (y | θ, γ_{0}) = {\dot{u}}_{J} (y | θ, γ_{0}) . \end{matrix}

Thus,

\begin{matrix} lim_{α \to 0} {\dot{u}}_{J, α, z} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) = {\dot{u}}_{J} (y | θ, γ_{0}) h_{J} (y | θ, γ_{0}) . \end{matrix}

In addition, to pass the limit inside the integral, note that for every component of the matrix $u_{J, α, z}(\cdot | θ, γ_{0}) u_{J, α, z}^{'}(\cdot | θ, γ_{0})$ we have

\begin{matrix} |u_{J, α, z} (y | θ, γ_{0}) u_{J, α, z}^{'} (y | θ, γ_{0})| & = & \left|\left(\frac{(1 - α) \nabla h_{J} (y | θ, γ_{0})}{(1 - α) h_{J} (y | θ, γ_{0}) + α η_{z} (y)}\right) {\left(\frac{(1 - α) \nabla h_{J} (y | θ, γ_{0})}{(1 - α) h_{J} (y | θ, γ_{0}) + α η_{z} (y)}\right)}^{'}\right| \\ & = & \left|\left(\frac{\nabla h_{J} (y | θ, γ_{0})}{h_{J} (y | θ, γ_{0}) + \frac{α}{1 - α} η_{z} (y)}\right) {\left(\frac{\nabla h_{J} (y | θ, γ_{0})}{h_{J} (y | θ, γ_{0}) + \frac{α}{1 - α} η_{z} (y)}\right)}^{'}\right| \\ & \leq & \left|\left(\frac{\nabla h_{J} (y | θ, γ_{0})}{h_{J} (y | θ, γ_{0})}\right) {\left(\frac{\nabla h_{J} (y | θ, γ_{0})}{h_{J} (y | θ, γ_{0})}\right)}^{'}\right| = |u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0})|, \end{matrix}

where $| \cdot |$ denotes the componentwise absolute value of a matrix, and

\begin{matrix} |s_{J, α, z} (y | θ, γ_{0})| \leq {[h_{J} (y | θ, γ_{0}) + η_{z} (y)]}^{\frac{1}{2}} . \end{matrix}

Now choosing the dominating function

\begin{matrix} m_{J}^{(1)} (y | θ, γ_{0}) = |u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0})| {[h_{J} (y | θ, γ_{0}) + η_{z} (y)]}^{\frac{1}{2}} s_{J} (y | θ, γ_{0}) \end{matrix}

and applying Cauchy-Schwarz inequality, we obtain that there exists a constant C such that

\begin{matrix} \int_{I R} |m_{J}^{(1)} (y | θ, γ_{0})| d y & \leq & C {\{\int_{I R} {(u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}))}^{2} d y\}}^{\frac{1}{2}}, \end{matrix}

which is finite by assumption (M2). Hence, by the dominated convergence theorem, (43) holds. Turning to (42), notice that for each component of the matrix

{\dot{u}}_{J, α, z} (y | θ, γ_{0})

,

\begin{matrix} |{\dot{u}}_{J, α, z} (y | θ, γ_{0})| & = & \left|\frac{{\ddot{h}}_{J} (y | θ, γ_{0}) [h_{J} (y | θ, γ_{0}) + \frac{α}{1 - α} η_{z} (y)] - (\nabla h_{J} (y | θ, γ_{0})) {(\nabla h_{J} (y | θ, γ_{0}))}^{'}}{{(h_{J} (y | θ, γ_{0}) + \frac{α}{1 - α} η_{z} (y))}^{2}}\right| \\ & \leq & \left|\frac{{\ddot{h}}_{J} (y | θ, γ_{0})}{h_{J} (y | θ, γ_{0})}\right| + \left|\frac{(\nabla h_{J} (y | θ, γ_{0})) {(\nabla h_{J} (y | θ, γ_{0}))}^{'}}{h_{J}^{2} (y | θ, γ_{0})}\right|, \end{matrix}

where $| \cdot |$ again denotes the componentwise absolute value. Now choosing the dominating function

\begin{matrix} m_{J}^{(2)} (y | θ, γ_{0}) = (|\frac{{\ddot{h}}_{J} (y | θ, γ_{0})}{h_{J} (y | θ, γ_{0})}| + |\frac{(\nabla h_{J} (y | θ, γ_{0})) {(\nabla h_{J} (y | θ, γ_{0}))}^{'}}{h_{J}^{2} (y | θ, γ_{0})}|) {[h_{J} (y | θ, γ_{0}) + η_{z} (y)]}^{\frac{1}{2}} s_{J} (y | θ, γ_{0}), \end{matrix}

and applying the Cauchy-Schwarz inequality it follows, using (M3), that

\begin{matrix} \int_{I R} |m_{J}^{(2)} (y | θ, γ_{0})| d y < \infty . \end{matrix}

Finally, by the dominated convergence theorem, it follows that

\begin{matrix} lim_{α \to 0} D_{1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ_{z}^{*}, γ_{0})) = - \frac{1}{2} I (θ (γ_{0})) . \end{matrix}

Therefore (40) follows. It remains to show that (41) holds. To this end, note that

\begin{matrix} \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) & = & - \frac{1}{2} \int_{I R} s_{J, α, z} (y | θ, γ_{0}) u_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) d y . \end{matrix}

Now take the partial derivative of $\nabla HD^{2}(h_{J, α, z}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ, γ_{0}))$ with respect to $α$; it can be expressed as the sum of

U_{1}

,

U_{2}

and

U_{3}

, where

\begin{matrix} U_{1} = - \frac{1}{4} \int_{I R} \frac{- h_{J} (y | θ, γ_{0}) + η_{z} (y)}{s_{J, α, z} (y | θ, γ_{0})} u_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) d y, \end{matrix}

\begin{matrix} U_{2} = - \frac{1}{2} \int_{I R} s_{J, α, z} (y | θ, γ_{0}) \frac{- \nabla h_{J} (y | θ, γ_{0}) h_{J, α, z} (y | θ, γ_{0})}{h_{J, α, z}^{2} (y | θ, γ_{0})} s_{J} (y | θ, γ_{0}) d y, and \end{matrix}

\begin{matrix} U_{3} = - \frac{1}{2} \int_{I R} s_{J, α, z} (y | θ, γ_{0}) \frac{- (1 - α) \nabla h_{J} (y | θ, γ_{0}) (- h_{J} (y | θ, γ_{0}) + η_{z} (y))}{h_{J, α, z}^{2} (y | θ, γ_{0})} s_{J} (y | θ, γ_{0}) d y . \end{matrix}

By dominated convergence theorem (using similar idea as above to find dominating functions), we have

\begin{matrix} lim_{α \to 0} \frac{\partial \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0}))}{\partial α} = \frac{1}{4} \int_{I R} u_{J} (y | θ, γ_{0}) η_{z} (y) d y . \end{matrix}

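Since $\nabla HD^{2}(h_{J, α, z}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ, γ_{0}))$ vanishes at $α = 0$ (by the expression above it then equals $-\frac{1}{2} \int_{\mathbb{R}} \nabla h_{J}(y | θ, γ_{0}) d y = 0$), L’Hôpital’s rule yields (41). It remains to show that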

\begin{matrix} lim_{γ_{0} \to 0} lim_{α \to 0} D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = - \frac{1}{4} I (θ), and \end{matrix}

(44)

\begin{matrix} lim_{γ_{0} \to 0} lim_{α \to 0} α^{- 1} \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{4} \int_{I R} [η_{z} (y) v_{J} (y | θ)] d y . \end{matrix}

(45)

We start with (44). For fixed $γ_{0} \geq 0$, the argument above shows that

\begin{matrix} \lim_{α \to 0} D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = - \frac{1}{4} I (θ (γ_{0})) = - \frac{1}{4} \int_{\mathbb{R}} u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0}) h_{J} (y | θ, γ_{0}) d y, \end{matrix}

so it is enough to show that

\begin{matrix} \lim_{γ_{0} \to 0} \int_{\mathbb{R}} u_{J} (y | θ, γ_{0}) u_{J}^{'} (y | θ, γ_{0}) h_{J} (y | θ, γ_{0}) d y = \int_{\mathbb{R}} v_{J} (y | θ) v_{J}^{'} (y | θ) h^{*J} (y | θ) d y, \end{matrix}

which is proved in Lemma 2. Hence (44) holds. Next we prove (45). By the argument used to establish (41), it is enough to show that

\begin{matrix} lim_{γ_{0} \to 0} \int_{I R} [η_{z} (y) u_{J} (y | θ, γ_{0})] d y = \int_{I R} [η_{z} (y) v_{J} (y | θ)] d y . \end{matrix}

(46)

However,

\begin{matrix} \int_{I R} η_{z} (y) [u_{J} (y | θ, γ_{0}) - v_{J} (y | θ)] d y \leq sup_{y} |u_{J} (y | θ, γ_{0}) - v_{J} (y | θ)|, \end{matrix}

and the RHS of the above inequality converges to zero as

γ_{0} \to 0

from assumption (V1). Hence (46) holds. This completes the proof. □
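For completeness, here is the evaluation at $α = 0$ behind the limit of $\partial \nabla HD^{2}/\partial α$ used in the proof above. At $α = 0$ we have $s_{J,α,z} = s_J$, $u_{J,α,z} = u_J$ and $h_{J,α,z} = h_J$; assuming, as usual, that $\int_{\mathbb{R}} \nabla h_J(y \mid θ, γ_0)\,dy = \nabla \int_{\mathbb{R}} h_J(y \mid θ, γ_0)\,dy = 0$, the three terms become

```latex
\begin{aligned}
U_1\big|_{\alpha=0} &= -\tfrac{1}{4}\int_{\mathbb{R}} \big(\eta_z(y) - h_J(y \mid \theta, \gamma_0)\big)\, u_J(y \mid \theta, \gamma_0)\, dy
  = -\tfrac{1}{4}\int_{\mathbb{R}} \eta_z(y)\, u_J(y \mid \theta, \gamma_0)\, dy, \\
U_2\big|_{\alpha=0} &= \tfrac{1}{2}\int_{\mathbb{R}} \nabla h_J(y \mid \theta, \gamma_0)\, dy = 0, \\
U_3\big|_{\alpha=0} &= \tfrac{1}{2}\int_{\mathbb{R}} u_J(y \mid \theta, \gamma_0)\big(\eta_z(y) - h_J(y \mid \theta, \gamma_0)\big)\, dy
  = \tfrac{1}{2}\int_{\mathbb{R}} \eta_z(y)\, u_J(y \mid \theta, \gamma_0)\, dy,
\end{aligned}
```

and their sum is $\frac{1}{4}\int_{\mathbb{R}} η_z(y)\, u_J(y \mid θ, γ_0)\,dy$, in agreement with the limit of $\partial \nabla HD^{2}/\partial α$ stated in the proof.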

Our next result concerns the behavior of the $α$-influence function when $γ_{0} \to 0$ first and then $|z| \to \infty$ or $α \to 0$. The following three additional assumptions will be used in the proof of part (ii) of Theorem 4.

Model assumptions for robustness analysis

(O5) For

α \in [0, 1]

and all

θ \in Θ

,

{\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ)

is bounded in

L_{2}

.

(O6) For

α \in [0, 1]

and all

θ \in Θ

,

u_{α, z}^{* J} (y | θ) u_{α, z}^{* J^{'}} (y | θ) s_{α, z}^{* J} (y | θ)

is bounded in

L_{2}

.

(O7) For

α \in [0, 1]

and all

θ \in Θ

,

\begin{matrix} lim_{γ_{0} \to 0} \int_{I R} {(s_{J, α, z} (y | θ, γ_{0}) u_{J, α, z} (y | θ, γ_{0}) - s_{α, z}^{* J} (y | θ) u_{α, z}^{* J} (y | θ))}^{2} d y = 0 . \end{matrix}

Theorem 4.

(i) Let $α \in (0, 1)$ and assume that, for all $θ \in Θ$, the assumptions of Proposition 1 hold and $T(h_{J, α, z}(\cdot | θ, γ_{0}))$ is unique for all $z$. Then $T(h_{J, α, z}(\cdot | θ, γ_{0}))$ is a bounded, continuous function of $z$ such that

\begin{matrix} lim_{| z | \to \infty} lim_{γ_{0} \to 0} T (h_{J, α, z} (\cdot | θ, γ_{0})) = θ; \end{matrix}

(ii) Assume further that the conditions (O3)–(O7) hold. Then,

\begin{matrix} lim_{α \to 0} lim_{γ_{0} \to 0} α^{- 1} [T (h_{J, α, z} (\cdot | θ, γ_{0})) - θ] = {[I (θ)]}^{- 1} \int_{I R} [η_{z} (y) v_{J} (y | θ)] d y . \end{matrix}

Proof.

Let

θ_{z} (γ_{0})

denote

T (h_{J, α, z} (\cdot | θ, γ_{0}))

and let

θ_{z}

denote

T (h_{α, z}^{* J} (\cdot | θ))

. First fix $z \in \mathbb{R}$; then, by the triangle inequality,

\begin{matrix} \lim_{γ_{0} \to 0} | θ_{z} (γ_{0}) - θ | \leq \lim_{γ_{0} \to 0} | θ_{z} (γ_{0}) - θ_{z} | + \lim_{γ_{0} \to 0} | θ_{z} - θ | . \end{matrix}

(47)

By Proposition 2, the first term on the RHS of (47) is equal to zero. Now letting $|z| \to \infty$, the second term on the RHS of (47) converges to zero by an argument similar to that of Theorem 3, applied to the limiting density $h_{α, z}^{*J}(\cdot | θ)$. This completes the proof of part (i). Turning to part (ii), we will prove that

\begin{matrix} lim_{α \to 0} lim_{γ_{0} \to 0} D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = - \frac{1}{4} I (θ), \end{matrix}

(48)

\begin{matrix} lim_{α \to 0} lim_{γ_{0} \to 0} α^{- 1} \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{4} \int_{I R} [η_{z} (y) v_{J} (y | θ)] d y . \end{matrix}

(49)

Recall from the proof of part (ii) of Theorem 3 that

\begin{matrix} D (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) & = & \frac{1}{2} \int_{\mathbb{R}} {\dot{u}}_{J, α, z} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) d y \\ & & + \frac{1}{4} \int_{\mathbb{R}} u_{J, α, z} (y | θ, γ_{0}) u_{J, α, z}^{'} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) s_{J} (y | θ, γ_{0}) d y \\ & \equiv & D_{1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) + D_{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) . \end{matrix}

We will now show that for fixed

α \in (0, 1)

\begin{matrix} \lim_{γ_{0} \to 0} D_{1} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{2} \int_{\mathbb{R}} {\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y, \quad \text{and} \end{matrix}

(50)

\begin{matrix} \lim_{γ_{0} \to 0} D_{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = \frac{1}{4} \int_{\mathbb{R}} u_{α, z}^{* J} (y | θ) u_{α, z}^{* J^{'}} (y | θ) s_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y . \end{matrix}

(51)

We begin with (50). A standard calculation shows that $D_{1}(h_{J, α, z}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ, γ_{0}))$

can be expressed as the sum of

D_{11}

,

D_{12}

and

D_{13}

, where

\begin{matrix} D_{11} = \frac{1}{2} \int_{I R} ({\dot{u}}_{J, α, z} (y | θ, γ_{0}) s_{J, α, z} (y | θ, γ_{0}) - {\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ)) s_{J} (y | θ, γ_{0}) d y, \end{matrix}

\begin{matrix} D_{12} = \frac{1}{2} \int_{I R} {\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ) (s_{J} (y | θ, γ_{0}) - t_{J} (y | θ)) d y, and \end{matrix}

\begin{matrix} D_{13} = \frac{1}{2} \int_{I R} {\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y . \end{matrix}

It can be seen that

D_{11}

converges to zero as

γ_{0} \to 0

by Cauchy-Schwarz inequality and assumption (O3); also,

D_{12}

converges to zero as

γ_{0} \to 0

by the Cauchy-Schwarz inequality, assumption (O5), and Scheffé’s theorem. Hence (50) follows. Similarly, (51) follows as $γ_{0} \to 0$ by the Cauchy-Schwarz inequality, assumptions (O4) and (O6), and Scheffé’s theorem.

Now let

α \to 0

. Using the same idea as in Theorem 3 to find dominating functions, one can apply the dominated convergence theorem to establish that

\begin{matrix} lim_{α \to 0} \frac{1}{2} \int_{I R} {\dot{u}}_{α, z}^{* J} (y | θ) s_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y = - \frac{1}{2} I (θ), and \end{matrix}

\begin{matrix} lim_{α \to 0} \frac{1}{4} \int_{I R} u_{α, z}^{* J} (y | θ) u_{α, z}^{* J^{'}} (y | θ) s_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y = \frac{1}{4} I (θ) . \end{matrix}

Hence (48) follows. Finally, it remains to establish (49). First fix

α \in (0, 1)

; we will show that

\begin{matrix} lim_{γ_{0} \to 0} \nabla H D^{2} (h_{J, α, z} (\cdot | θ, γ_{0}), h_{J} (\cdot | θ, γ_{0})) = - \frac{1}{2} \int_{I R} s_{α, z}^{* J} (y | θ) u_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y . \end{matrix}

(52)

Note that $\nabla HD^{2}(h_{J, α, z}(\cdot | θ, γ_{0}), h_{J}(\cdot | θ, γ_{0}))$ can be expressed as the sum of

T_{1}

,

T_{2}

and

T_{3}

, where

\begin{matrix} T_{1} = - \frac{1}{2} \int_{I R} (s_{J, α, z} (y | θ, γ_{0}) u_{J, α, z} (y | θ, γ_{0}) - s_{α, z}^{* J} (y | θ) u_{α, z}^{* J} (y | θ)) s_{J} (y | θ, γ_{0}) d y, \end{matrix}

\begin{matrix} T_{2} = - \frac{1}{2} \int_{I R} s_{α, z}^{* J} (y | θ) u_{α, z}^{* J} (y | θ) (s_{J} (y | θ, γ_{0}) - t_{J} (y | θ)) d y, and \end{matrix}

\begin{matrix} T_{3} = - \frac{1}{2} \int_{I R} s_{α, z}^{* J} (y | θ) u_{α, z}^{* J} (y | θ) t_{J} (y | θ) d y . \end{matrix}

It can be seen that

T_{1}

converges to zero as

γ_{0} \to 0

by Cauchy-Schwarz inequality and assumption (O7);

T_{2}

converges to zero as

γ_{0} \to 0

by Cauchy-Schwarz inequality, boundedness of

u_{α, z}^{* J} (\cdot) s_{α, z}^{* J} (\cdot)

in

L_{2}

, and Scheffé’s theorem. Therefore, (52) holds. Finally, letting

α \to 0

and using the same idea as in Theorem 3 to find the dominating function, it follows by the dominated convergence theorem and L’Hôpital’s rule that (49) holds. This completes the proof of the theorem. □

Remark 4.

Theorems 3 and 4 do not imply that the double limit exists; establishing its existence is beyond the scope of this paper.
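The following is a minimal numerical sketch of the conclusions of Theorems 3 and 4 in the simplest setting, not the compressed-data model itself: we assume the limiting model is $N(0, σ^2)$ with only the scale $σ$ unknown, contaminate it with $η_z$, and minimise the Hellinger distance by maximising the affinity $\int (g\,ϕ_σ)^{1/2}\,dy$ on a grid. Part (i) is illustrated by an outlier at $z = 20$, where the MHDE stays near $σ_0$ while the maximum likelihood functional does not; part (ii) is illustrated by a finite-difference $α$-influence at $z = 2$, compared with $[I(σ)]^{-1} \int η_z v\,dy$ for $v(y \mid σ) = (y^2 - σ^2)/σ^3$ and $I(σ) = 2/σ^2$. The grid, search bounds and helper names are illustrative choices.

```python
import numpy as np

Y = np.linspace(-40.0, 40.0, 160_001)   # integration grid (step 5e-4)

def trapezoid(f, x):
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def normal_pdf(y, sigma):
    return np.exp(-0.5 * (y / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def contaminated_pdf(y, sigma0, alpha, z, eps=0.1):
    """(1 - alpha) * N(0, sigma0^2) + alpha * Uniform(z - eps, z + eps)."""
    eta = np.where(np.abs(y - z) < eps, 1.0 / (2.0 * eps), 0.0)
    return (1.0 - alpha) * normal_pdf(y, sigma0) + alpha * eta

def mhde_sigma(g, lo=0.2, hi=10.0, tol=1e-10):
    """Minimise HD^2(g, phi_sigma), i.e. maximise the affinity
    int sqrt(g * phi_sigma) dy, over sigma by golden-section search."""
    sqrt_g = np.sqrt(g)
    def neg_affinity(s):
        return -trapezoid(sqrt_g * np.sqrt(normal_pdf(Y, s)), Y)
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = neg_affinity(c), neg_affinity(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = neg_affinity(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = neg_affinity(d)
    return 0.5 * (a + b)

def mle_sigma(g):
    """Maximum likelihood functional for N(0, sigma^2): sqrt of the second moment."""
    return np.sqrt(trapezoid(Y**2 * g, Y))

sigma0 = 1.0
# Part (i): a gross outlier at z = 20 with 10% contamination.
g_far = contaminated_pdf(Y, sigma0, alpha=0.1, z=20.0)
sigma_mhde_far = mhde_sigma(g_far)   # stays close to sigma0
sigma_mle_far = mle_sigma(g_far)     # about sqrt(0.9 + 0.1 * 400.003), roughly 6.4

# Part (ii): finite-difference alpha-influence at z = 2 versus I^{-1} int eta_z v.
alpha, z, eps = 1e-4, 2.0, 0.1
base = mhde_sigma(normal_pdf(Y, sigma0))
if_numeric = (mhde_sigma(contaminated_pdf(Y, sigma0, alpha, z, eps)) - base) / alpha
mean_y2 = z**2 + eps**2 / 3.0        # E[Y^2] under Uniform(z - eps, z + eps)
if_theory = (sigma0**2 / 2.0) * (mean_y2 - sigma0**2) / sigma0**3
print(sigma_mhde_far, sigma_mle_far, if_numeric, if_theory)
```

Subtracting the uncontaminated estimate `base` cancels most of the discretisation error, so the finite difference isolates the effect of the contamination; a small $α$ keeps the second-order terms negligible.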

In the next section, we describe the implementation details and provide several simulation results in support of our methodology.

4. Implementation and Numerical Results

In this section, we apply the proposed MHD-based methods to estimate the unknown parameters $θ = (μ, σ^{2})$ using the compressed data. We set $J = 10{,}000$ and $B = 100$, and all simulations are based on 5000 replications. We consider the Gaussian kernel and the Epanechnikov kernel for the nonparametric density estimation. The Gaussian kernel is given by

\begin{matrix} K (x) = \frac{1}{\sqrt{2 π}} \exp (- \frac{x^{2}}{2}), \end{matrix}

and the Epanechnikov kernel is given by

\begin{matrix} K (x) = \frac{3}{4} (1 - x^{2}) 1_{(| x | \leq 1)} . \end{matrix}
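Both kernels can be coded directly from the two displays. The following is a minimal Python sketch using only the standard library; the helper `integrate` is a hypothetical trapezoidal-rule utility included only to check that each kernel integrates to one.

```python
import math


def gaussian_kernel(x: float) -> float:
    """Gaussian kernel K(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(x: float) -> float:
    """Epanechnikov kernel K(x) = (3/4)(1 - x^2) on |x| <= 1, and 0 elsewhere."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def integrate(f, lo: float, hi: float, n: int = 20_000) -> float:
    """Composite trapezoidal rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    interior = sum(f(lo + k * h) for k in range(1, n))
    return h * (0.5 * f(lo) + interior + 0.5 * f(hi))
```

For example, `integrate(gaussian_kernel, -10.0, 10.0)` and `integrate(epanechnikov_kernel, -1.0, 1.0)` are both numerically equal to one. The Epanechnikov kernel has compact support, so each evaluation of a density estimate built from it involves only the points within one bandwidth of the evaluation point.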

We generate $X$ and the uncontaminated compressed data $\tilde{Y}$ as follows:

Step 1. Generate $X_{l}$, where $X_{j l} \overset{i . i . d .}{\sim} N (μ, σ^{2})$.
Step 2. Generate $R_{l}$, where $r_{i j l} \overset{i . i . d .}{\sim} N (1, γ_{0}^{2})$.
Step 3. Generate the uncontaminated ${\tilde{Y}}_{l}$ by calculating ${\tilde{Y}}_{l} = R_{l} X_{l}$.
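Steps 1 to 3 can be simulated as in the sketch below (Python, standard library only). Two details are assumptions made for illustration: the stored normalizer is taken to be $ω_{il} = (\sum_{j} r_{ijl}^{2})^{1/2}$, which is the choice that makes $\operatorname{Var}[U_{il} | R_{l}] = σ^{2}$ for the transformation $U_{il}$ used in Section 4.1, and the function names are hypothetical. The default $J$ is kept small so that the sketch runs quickly; the simulations in this section use J = 10,000.

```python
import math
import random


def compress_group(J, S, mu, sigma, gamma0, rng):
    """Steps 1-3 for one group l.

    X_l has J i.i.d. N(mu, sigma^2) entries and R_l is an S x J matrix with
    i.i.d. N(1, gamma0^2) entries. Returns the S compressed values
    Y~_{il} = (R_l X_l)_i, the row sums r_{i.l} = sum_j r_{ijl}, and the
    assumed normalizers omega_{il} = sqrt(sum_j r_{ijl}^2).
    """
    x = [rng.gauss(mu, sigma) for _ in range(J)]              # Step 1
    y_tilde, r_dot, omega = [], [], []
    for _ in range(S):
        r = [rng.gauss(1.0, gamma0) for _ in range(J)]         # Step 2: row i of R_l
        y_tilde.append(sum(ri * xi for ri, xi in zip(r, x)))  # Step 3: (R_l X_l)_i
        r_dot.append(sum(r))
        omega.append(math.sqrt(sum(ri * ri for ri in r)))
    return y_tilde, r_dot, omega


def compress_all(J=1_000, S=1, B=100, mu=2.0, sigma=1.0, gamma0=0.1, seed=0):
    """Compressed data (Y~_l, r_.l, omega_l) for the groups l = 1, ..., B."""
    rng = random.Random(seed)
    return [compress_group(J, S, mu, sigma, gamma0, rng) for _ in range(B)]
```

With `gamma0=0.0` every $r_{ijl}$ equals one, so each stored row sum equals $J$ and each normalizer equals $\sqrt{J}$, which is the $γ_{0} \equiv 0$ case reported in the tables.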

4.1. Objective Function

In practice, we store the compressed data $({\tilde{Y}}_{l}, r_{\cdot l}, ω_{l})$ for all $1 \leq l \leq B$. If $X_{j l}$ follows a normal distribution with mean $μ$ and variance $σ^{2}$, the marginal density of the compressed data, viz., $Y_{i l}$, is complicated and does not have a closed-form expression. However, for large $J$, the local limit theorem implies that its density can be approximated by a Gaussian density with mean $\sqrt{J} μ$ and variance $σ^{2} + γ_{0}^{2} (μ^{2} + σ^{2})$. Hence, we work with $U_{i l}$, where $U_{i l} = \frac{{\tilde{Y}}_{i l} - μ r_{i \cdot l}}{ω_{i l}}$. Please note that with this transformation, $E [U_{i l}] = 0$ and $\operatorname{Var} [U_{i l}] = σ^{2}$. Hence, the kernel density estimate of the unknown true density is given by

\begin{matrix} g_{B}^{(i)} (y | μ) = \frac{1}{B c_{B}} \sum_{l = 1}^{B} K (\frac{y - U_{i l}}{c_{B}}) . \end{matrix}
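A minimal sketch of the transformation to $U_{il}$ and of the kernel density estimate $g_{B}^{(i)}(y | μ)$ for one fixed projection index $i$ (Python, standard library only). The function names are hypothetical, and the Gaussian kernel is used by default.

```python
import math


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def transform(y_tilde, r_dot, omega, mu):
    """U_{il} = (Y~_{il} - mu * r_{i.l}) / omega_{il} for l = 1, ..., B."""
    return [(y - mu * r) / w for y, r, w in zip(y_tilde, r_dot, omega)]


def kde(y, u, c_B, kernel=gaussian_kernel):
    """g_B^{(i)}(y | mu) = (1 / (B c_B)) * sum_l K((y - U_{il}) / c_B)."""
    return sum(kernel((y - ul) / c_B) for ul in u) / (len(u) * c_B)
```

Because $μ$ enters through $U_{il}$, the density estimate has to be recomputed every time $μ$ changes during the optimization, which is what distinguishes it from a parameter-free kernel density estimate.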

The difference between a standard kernel density estimate and the one proposed here is that the unknown parameter $μ$ enters the kernel. Additionally, this allows one to incorporate $(r_{\cdot l}, ω_{l})$ into the kernel. Consequently, only the scale parameter $σ$ is part of the parametric model. Using the local limit theorem, we approximate the true parametric model by $ϕ (\cdot | σ)$, the density of $N (0, σ^{2})$. Hence, the objective function is

\begin{matrix} Ψ (i, θ) \equiv A (g_{B}^{(i)} (\cdot | μ), ϕ (\cdot | σ)) = \int_{\mathbb{R}} {g_{B}^{(i)}}^{\frac{1}{2}} (y | μ) ϕ^{\frac{1}{2}} (y | σ) d y; \end{matrix}

and the estimator is given by

\begin{matrix} {\hat{θ}}_{B} (γ_{0}) = \frac{1}{S} \sum_{i = 1}^{S} {\hat{θ}}_{i, B} (γ_{0}), \quad \text{where} \quad {\hat{θ}}_{i, B} (γ_{0}) = \underset{θ \in Θ}{\arg\max}\, Ψ (i, θ) . \end{matrix}

It is clear that ${\hat{θ}}_{B} (γ_{0})$ is a consistent estimator of $θ^{*}$.
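The objective $Ψ(i, θ)$ is a one-dimensional integral and can be approximated on a grid. The Python sketch below uses the trapezoidal rule on a fixed interval $[lo, hi]$; the interval, the grid size, and the function names are assumptions for illustration, not choices made in the paper. It also shows the averaging over the $S$ projections that defines ${\hat{θ}}_{B}(γ_{0})$.

```python
import math


def normal_pdf(y, sigma):
    """phi(y | sigma), the N(0, sigma^2) density; sigma is assumed positive."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def psi(theta, y_tilde, r_dot, omega, c_B, lo=-20.0, hi=20.0, n=4000):
    """Hellinger affinity: integral of sqrt(g_B(y | mu) * phi(y | sigma)) dy."""
    mu, sigma = theta
    u = [(y - mu * r) / w for y, r, w in zip(y_tilde, r_dot, omega)]
    B = len(u)

    def integrand(y):
        g = sum(gaussian_kernel((y - ul) / c_B) for ul in u) / (B * c_B)
        return math.sqrt(g * normal_pdf(y, sigma))

    h = (hi - lo) / n
    interior = sum(integrand(lo + k * h) for k in range(1, n))
    return h * (0.5 * integrand(lo) + interior + 0.5 * integrand(hi))


def average_estimates(estimates):
    """theta_hat_B(gamma0): componentwise mean of the S per-projection estimates."""
    S = len(estimates)
    return tuple(sum(e[k] for e in estimates) / S for k in range(len(estimates[0])))
```

By the Cauchy-Schwarz inequality the affinity never exceeds one, with equality only when the two densities coincide; maximizing $Ψ$ over $θ$ with any numerical optimizer therefore minimizes the Hellinger distance.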

In the next subsection, we use the quasi-Newton method with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update to estimate $θ$. The quasi-Newton method is appealing since (i) it replaces the Hessian matrix in the Newton direction $Δ_{k} (θ)$ (given in the next subsection) with an approximation that is easier to compute, and (ii) it allows a more flexible step size $t$ than the Newton-Raphson method, so that the iterates do not “jump” too far at any step, which helps ensure convergence of the iterations. The BFGS update ($H_{k}$) is a popular method for approximating the Hessian matrix using only gradient evaluations. The step size $t$ is determined using the backtracking line search described in Algorithm 2; both algorithms are given in detail in the next subsection. Our analysis also includes the case where $S \equiv 1$ and $r_{i j l} \equiv 1$. In this case, as explained previously, one obtains a significant reduction in storage and computational complexity. Finally, we emphasize that the density estimate contains $μ$ and, unlike the density estimates typically used in classical MHDE analysis, is not parameter free.

4.2. Algorithm

As explained previously, we use the quasi-Newton algorithm with the BFGS update to obtain ${\hat{θ}}_{MHDE}$. To describe this method, consider the objective function $Ψ (θ)$ (suppressing $i$), which is twice continuously differentiable. Since ${\hat{θ}}_{MHDE}$ maximizes $Ψ$, Algorithms 1 and 2 below, which are stated in the usual minimization form, are applied to $-Ψ (θ)$. Let the initial value of $θ$ be $θ^{(0)} = (μ^{(0)}, σ^{(0)})$ and $H_{0} = I$, where $I$ is the identity matrix.

Algorithm 1: The Quasi-Newton Algorithm.

Set k = 1.

repeat
- Calculate $Δ_{k} (θ) = - H_{k - 1}^{- 1} \nabla Ψ (θ^{(k - 1)})$, where $\nabla Ψ (θ^{(k - 1)})$ is the gradient of $Ψ (θ)$ with respect to $θ$ at the $(k - 1)$th step.
- Determine the step length parameter t via backtracking line search.
- Compute $θ^{(k)} = θ^{(k - 1)} + t Δ_{k} (θ)$ .
- Compute $H_{k}$ , where the BFGS update is
  
  $\begin{matrix} H_{k} = H_{k - 1} + \frac{q_{k - 1} q_{k - 1}^{T}}{q_{k - 1}^{T} d_{k - 1}} - \frac{H_{k - 1} d_{k - 1} d_{k - 1}^{T} H_{k - 1}^{T}}{d_{k - 1}^{T} H_{k - 1} d_{k - 1}}, \end{matrix}$
  
  where
  
  $\begin{matrix} d_{k - 1} = θ^{(k)} - θ^{(k - 1)}, \\ q_{k - 1} = \nabla Ψ (θ^{(k)}) - \nabla Ψ (θ^{(k - 1)}) . \end{matrix}$
- Compute $e_{k} = | Ψ (θ^{(k)}) - Ψ (θ^{(k - 1)}) |$ .
- Set $k = k + 1$ .
until $e_{k - 1} < \text{threshold}$.

Remark 5.

In step 1, one can directly use the inverse update for $H_{k}^{- 1}$ as follows:

\begin{matrix} H_{k}^{- 1} = (I - \frac{d_{k - 1} q_{k - 1}^{T}}{q_{k - 1}^{T} d_{k - 1}}) H_{k - 1}^{- 1} (I - \frac{q_{k - 1} d_{k - 1}^{T}}{q_{k - 1}^{T} d_{k - 1}}) + \frac{d_{k - 1} d_{k - 1}^{T}}{q_{k - 1}^{T} d_{k - 1}} . \end{matrix}
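The direct update in Algorithm 1 and the inverse update in Remark 5 can be checked against each other numerically. If $H_{k-1}^{-1}$ is the inverse of $H_{k-1}$, the two formulas should produce matrices whose product is the identity, and the updated $H_{k}$ should satisfy the secant condition $H_{k} d_{k-1} = q_{k-1}$. The Python sketch below uses plain nested lists; the helper names and the numbers in the usage lines are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(A, v):
    return [dot(row, v) for row in A]


def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]


def outer(a, b):
    return [[x * y for y in b] for x in a]


def axpy(A, B, s):
    """A + s * B for matrices stored as nested lists."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def bfgs_update(H, d, q):
    """Direct update: H + q q^T / (q^T d) - (H d)(H d)^T / (d^T H d)."""
    Hd = matvec(H, d)
    return axpy(axpy(H, outer(q, q), 1.0 / dot(q, d)), outer(Hd, Hd), -1.0 / dot(d, Hd))


def bfgs_inverse_update(Hinv, d, q):
    """Remark 5: (I - rho d q^T) Hinv (I - rho q d^T) + rho d d^T, rho = 1 / (q^T d)."""
    rho = 1.0 / dot(q, d)
    n = len(d)
    left = axpy(identity(n), outer(d, q), -rho)
    right = axpy(identity(n), outer(q, d), -rho)
    return axpy(matmul(matmul(left, Hinv), right), outer(d, d), rho)


H0 = [[2.0, 0.5], [0.5, 1.0]]
H0_inv = [[1.0 / 1.75, -0.5 / 1.75], [-0.5 / 1.75, 2.0 / 1.75]]
d, q = [1.0, -0.5], [0.7, 0.2]          # q^T d = 0.6 > 0
H1 = bfgs_update(H0, d, q)
H1_inv = bfgs_inverse_update(H0_inv, d, q)
print(matmul(H1, H1_inv))               # numerically the 2 x 2 identity
```

Using the inverse form avoids solving a linear system at every iteration, at the cost of a few extra matrix products.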

Remark 6.

In step 2, the step size t should satisfy the Wolfe conditions:

\begin{matrix} \begin{matrix} Ψ (θ^{(k)} + t Δ_{k}) \leq Ψ (θ^{(k)}) + u_{1} t \nabla Ψ^{T} (θ^{(k)}) Δ_{k}, \\ \nabla Ψ^{T} (θ^{(k)} + t Δ_{k}) Δ_{k} \geq u_{2} \nabla Ψ^{T} (θ^{(k)}) Δ_{k}, \end{matrix} \end{matrix}

where $u_{1}$ and $u_{2}$ are constants with $0 < u_{1} < u_{2} < 1$.

The first condition requires that $t$ sufficiently decrease the objective function. The second condition ensures that the step size is not too small. The backtracking line search, which enforces the first (sufficient decrease) condition, proceeds as follows (see [26]):

Algorithm 2: The Backtracking Line Search Algorithm.

Given a descent direction $Δ (θ)$ for $Ψ$ at $θ$, $ζ \in (0, 0.5)$, and $κ \in (0, 1)$.

$t := 1$.

while $Ψ (θ + t Δ (θ)) > Ψ (θ) + ζ t \nabla Ψ {(θ)}^{T} Δ (θ)$, do

$t := κ t$.

end while
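Algorithms 1 and 2 can be combined into a short routine. The Python sketch below minimizes a generic function `f`, so the MHDE would be obtained by passing `f(θ) = -Ψ(θ)`. Several details are assumptions rather than specifications from the paper: it carries the inverse approximation $H_{k}^{-1}$ through the update of Remark 5 instead of solving a linear system, it approximates gradients by central differences, it skips the update when $q_{k-1}^{T} d_{k-1} \leq 0$, and it uses $ζ = 0.25$ and $κ = 0.5$, which are illustrative values within the stated ranges.

```python
def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


def backtracking(f, x, fx, g, direction, zeta=0.25, kappa=0.5, max_steps=60):
    """Algorithm 2: shrink t until f(x + t d) <= f(x) + zeta * t * g^T d."""
    slope = sum(gi * di for gi, di in zip(g, direction))
    t = 1.0
    for _ in range(max_steps):
        trial = [xi + t * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + zeta * t * slope:
            break
        t *= kappa
    return t


def quasi_newton(f, x0, threshold=1e-12, max_iter=500):
    """Algorithm 1 with H_0 = I and the inverse BFGS update of Remark 5."""
    n = len(x0)
    x = [float(v) for v in x0]
    Hinv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    fx, g = f(x), num_grad(f, x)
    for _ in range(max_iter):
        direction = [-sum(Hinv[i][j] * g[j] for j in range(n)) for i in range(n)]
        t = backtracking(f, x, fx, g, direction)
        x_new = [xi + t * di for xi, di in zip(x, direction)]
        f_new, g_new = f(x_new), num_grad(f, x_new)
        d = [a - b for a, b in zip(x_new, x)]
        q = [a - b for a, b in zip(g_new, g)]
        qd = sum(a * b for a, b in zip(q, d))
        if qd > 1e-16:  # curvature condition; otherwise keep the previous H^{-1}
            rho = 1.0 / qd
            # M = I - rho d q^T, and the new H^{-1} = M H^{-1} M^T + rho d d^T
            M = [[(1.0 if i == j else 0.0) - rho * d[i] * q[j] for j in range(n)]
                 for i in range(n)]
            MH = [[sum(M[i][a] * Hinv[a][j] for a in range(n)) for j in range(n)]
                  for i in range(n)]
            Hinv = [[sum(MH[i][a] * M[j][a] for a in range(n)) + rho * d[i] * d[j]
                     for j in range(n)] for i in range(n)]
        e = abs(f_new - fx)  # e_k = |f(theta^(k)) - f(theta^(k-1))|
        x, fx, g = x_new, f_new, g_new
        if e < threshold:
            break
    return x
```

For instance, `quasi_newton(lambda x: (x[0] - 2.0) ** 2 + 10.0 * (x[1] - 1.0) ** 2, [0.0, 0.0])` returns a point very close to (2, 1), and maximizing an affinity is done by minimizing its negative.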

4.3. Initial Values

The initial values for $θ$ are taken to be

\begin{matrix} \begin{matrix} μ^{(0)} = \operatorname{median} ({\tilde{Y}}_{i l}) / J, \\ σ^{(0)} = 1.48 \times \operatorname{median} (| {\tilde{Y}}_{i l} - \operatorname{median} ({\tilde{Y}}_{i l}) |) / B . \end{matrix} \end{matrix}

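Another choice of the initial value for $σ$, obtained by solving $\operatorname{Var} [{\tilde{Y}}_{i l}] = J (σ^{2} + γ_{0}^{2} (μ^{2} + σ^{2}))$ for $σ$ with $μ$ replaced by $μ^{(0)}$, is

\begin{matrix} {\hat{σ}}^{(0)} = \sqrt{\frac{\widehat{\operatorname{Var}} [{\tilde{Y}}_{i l}] / J - γ_{0}^{2} (μ^{(0)})^{2}}{1 + γ_{0}^{2}}}, \end{matrix}

(53)

where $\widehat{\operatorname{Var}} [{\tilde{Y}}_{i l}]$ is an empirical estimate of the variance of ${\tilde{Y}}_{i l}$.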
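The starting values are simple functions of the stored ${\tilde{Y}}_{il}$. The Python sketch below computes the median/MAD-based values exactly as displayed (in the simulations J = 10,000 = $B^{2}$, so dividing by $B$ coincides with dividing by $\sqrt{J}$), together with a moment-based $σ^{(0)}$ obtained by solving $\operatorname{Var}[{\tilde{Y}}_{il}] = J(σ^{2} + γ_{0}^{2}(μ^{2} + σ^{2}))$, which holds exactly for i.i.d. entries with $E[r_{ijl}] = 1$ and $\operatorname{Var}[r_{ijl}] = γ_{0}^{2}$. The function names and the truncation at zero are assumptions.

```python
import math
import statistics


def mad_scale(values):
    """1.48 * median(|v - median(v)|), a robust scale estimate."""
    m = statistics.median(values)
    return 1.48 * statistics.median([abs(v - m) for v in values])


def initial_values(y_tilde, J, B):
    """mu^(0) = median(Y~) / J and sigma^(0) = 1.48 * MAD(Y~) / B."""
    return statistics.median(y_tilde) / J, mad_scale(y_tilde) / B


def moment_sigma0(y_tilde, J, gamma0, mu0):
    """Solve Var[Y~] / J = sigma^2 + gamma0^2 (mu0^2 + sigma^2) for sigma."""
    v = statistics.variance(y_tilde) / J
    s2 = (v - gamma0 ** 2 * mu0 ** 2) / (1.0 + gamma0 ** 2)
    return math.sqrt(max(s2, 0.0))  # truncate at zero if the sample variance is small
```

The median-based values are insensitive to a minority of contaminated ${\tilde{Y}}_{il}$, whereas the moment-based value relies on the sample variance and is not.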

Bandwidth Selection: A key issue in implementing the above method of estimation is the choice of the bandwidth. We express the bandwidth in the form $h_{B} = c_{B} s_{B}$, where $c_{B} \in \{0.3, 0.4, 0.5, 0.7, 0.9\}$ and $s_{B}$ is set equal to $1.48 \times \operatorname{median} (| {\tilde{Y}}_{i l} - \operatorname{median} ({\tilde{Y}}_{i l}) |) / B$.

In all the tables below, we report the average (Ave), standard deviation (StD) and mean square error (MSE) to assess the performance of the proposed methods.

4.4. Analyses Without Contamination

In Tables 2 to 5, the true parameter values are $μ = 2$ and $σ = 1$, and the kernel is the Gaussian kernel. In Table 2, we compare the estimates of the parameters as the dimension $S$ of the compressed data increases, with $S \in \{1, 2, 5, 10\}$. Also, the number of groups is $B = 100$, the bandwidth is $c_{B} = 0.3$, and $γ_{0} = 0.1$. In addition, in Table 2, $S^{*} = 1$ means that $S = 1$ with $γ_{0} \equiv 0$.

From Table 2 we observe that as $S$ increases, the estimates of $μ$ and $σ$ remain stable. The case $S^{*} = 1$ is interesting, since even by storing only the sum we are able to obtain point estimates that are close to the true values. In Table 3, we choose $S = 1$, $B = 100$, and $c_{B} = 0.3$ and compare the estimates as $γ_{0}$ changes from 0.01 to 1.00. We can see that as $γ_{0}$ increases, the estimate of $μ$ remains stable, whereas the bias, standard deviation, and MSE of $\hat{σ}$ generally increase.

In Table 4, we fix $S = 1$, $B = 100$, and $γ_{0} = 0.1$ and allow the bandwidth $c_{B}$ to increase. Also, $c_{B}^{*} = 0.30$ means that the bandwidth is chosen as 0.30 with $γ_{0} \equiv 0$. Notice that when $c_{B} = 0.9$, $B^{\frac{1}{2}} c_{B} = 9$ while $B^{\frac{1}{2}} c_{B}^{2} = 8.1$, which is not small as required in assumption (B2). We notice again that as $c_{B}$ decreases, the estimates of $μ$ and $σ$ are closer to the true values, with smaller MSE and StD.
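The quantities quoted above are simple arithmetic. The short Python sketch below tabulates $B^{1/2} c_{B}$ and $B^{1/2} c_{B}^{2}$ for $B = 100$ and each bandwidth constant used in Table 4; the function name is hypothetical.

```python
import math


def b2_quantities(B, c_B):
    """Return (sqrt(B) * c_B, sqrt(B) * c_B ** 2)."""
    root_B = math.sqrt(B)
    return root_B * c_B, root_B * c_B ** 2


for c_B in (0.3, 0.4, 0.5, 0.7, 0.9):
    a, b = b2_quantities(100, c_B)
    print(f"c_B = {c_B:.1f}: sqrt(B)*c_B = {a:.2f}, sqrt(B)*c_B^2 = {b:.2f}")
```

Both quantities grow with $c_{B}$, which is consistent with the deterioration of $\hat{σ}$ in the lower rows of Table 4.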

In Table 5, we let $S = 1$, $c_{B} = 0.3$, and $γ_{0} = 0.1$ and let the number of groups $B$ increase. This table shows that as $B$ increases, the estimator performs better in terms of bias, standard deviation, and MSE.

In Table 6, we set $γ_{0} \equiv 0$ and keep the other settings the same as in Table 5. This table shows that as $B$ increases, the estimator performs better in terms of bias, standard deviation, and MSE. Furthermore, the standard deviation and MSE are slightly smaller than the results in Table 5.

We next investigate the effect of other sensing variables. In Table 7, we use a Gamma model to generate the sensing matrix $R_{l}$. Specifically, the mean of the Gamma random variable is set to $α_{0} β_{0} = 1$, and the variance $\mathrm{var} \equiv α_{0} β_{0}^{2}$ is chosen from the set $\{0, {0.01}^{2}, 0.01, 0.25, 1.00\}$, which are also the variances $γ_{0}^{2}$ used in Table 3.
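Matching the mean $α_{0} β_{0} = 1$ and the variance $α_{0} β_{0}^{2} = \mathrm{var}$ gives $β_{0} = \mathrm{var}$ and $α_{0} = 1/\mathrm{var}$, with $\mathrm{var} = 0$ corresponding to $r_{ijl} \equiv 1$. A minimal Python sketch, assuming $α_{0}$ is the shape and $β_{0}$ the scale parameter; it uses the standard library's `random.gammavariate(alpha, beta)`, whose mean is alpha·beta and variance alpha·beta². The function name is hypothetical.

```python
import random


def gamma_sensing_row(J, var, rng):
    """J i.i.d. Gamma(alpha0, beta0) sensing entries with mean 1 and variance var."""
    if var == 0:
        return [1.0] * J                 # degenerate case r_ijl == 1
    alpha0, beta0 = 1.0 / var, var       # shape and scale, so alpha0 * beta0 = 1
    return [rng.gammavariate(alpha0, beta0) for _ in range(J)]
```

Because the first two moments match those of the $N(1, γ_{0}^{2})$ sensing variables in Table 3, the compressed data have the same mean and variance under both designs, which is consistent with the similarity of Tables 3 and 7.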

From Table 7, notice that the Gamma sensing variable yields results similar to those of the Gaussian sensing variable. Our next example considers the case in which the mean of the sensing variable is not equal to one and the sensing variable has a discrete distribution. Specifically, we use Bernoulli sensing variables with parameter $p$. Moreover, we fix $S = 1$ and let $p J = S$, so that $p = 1 / J$. Hence, as $J$ increases, the variance $p (1 - p)$ of the sensing variable decreases. Notice that in this case the mean of the sensing variable is $p$ instead of 1. In addition, $E [{\tilde{Y}}_{i l}] = μ$ and $\operatorname{Var} [{\tilde{Y}}_{i l}] = σ^{2} + μ^{2} (1 - \frac{1}{J})$. Hence, we set the initial values as

\begin{matrix} \begin{matrix} μ^{(0)} = \operatorname{median} ({\tilde{Y}}_{i l}), \\ σ^{(0)} = 1.48 \times \operatorname{median} (| {\tilde{Y}}_{i l} - \operatorname{median} ({\tilde{Y}}_{i l}) |) . \end{matrix} \end{matrix}

Additionally, we take $B = 100$, $c_{B} = 0.30$, and $s_{B}$ to be $1.48 \times \operatorname{median} (| {\tilde{Y}}_{i l} - \operatorname{median} ({\tilde{Y}}_{i l}) |)$.
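The moment formulas for the Bernoulli case can be checked by simulation. The Python sketch below draws ${\tilde{Y}}_{il} = \sum_{j} r_{ijl} X_{jl}$ with $r_{ijl} \sim \text{Bernoulli}(1/J)$ and $X_{jl} \sim N(μ, σ^{2})$, then compares the sample mean and variance with $μ$ and $σ^{2} + μ^{2}(1 - 1/J)$. The function names, the seed, and the number of draws are illustrative choices.

```python
import random


def bernoulli_compressed(J, mu, sigma, rng):
    """One Y~ = sum_j r_j X_j with r_j ~ Bernoulli(1/J) and X_j ~ N(mu, sigma^2)."""
    p = 1.0 / J
    # only the X_j whose sensing variable equals 1 contribute to the sum
    return sum(rng.gauss(mu, sigma) for _ in range(J) if rng.random() < p)


def simulate_moments(J, mu, sigma, n=50_000, seed=0):
    """Sample mean and (unbiased) sample variance of n independent draws of Y~."""
    rng = random.Random(seed)
    ys = [bernoulli_compressed(J, mu, sigma, rng) for _ in range(n)]
    m = sum(ys) / n
    v = sum((y - m) ** 2 for y in ys) / (n - 1)
    return m, v
```

With $J = 10$, $μ = 2$, and $σ = 1$, the theoretical values are 2 and $1 + 4(1 - 0.1) = 4.6$; the variance exceeds $σ^{2}$ by almost $μ^{2}$, which is why the starting value for $σ$ in this case is not divided by any scale factor.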

Table 8 shows that the MHD method also performs well with the Bernoulli sensing variable, although the bias of $\hat{σ}$ and the standard deviation and mean square error of both estimates are larger than those obtained with the Gaussian and Gamma sensing variables.

4.5. Robustness and Model Misspecification

In this section, we provide a numerical assessment of the robustness of the proposed methodology. To this end, let

\begin{matrix} f_{α, η} (x | θ) = (1 - α) f (x | θ) + α η (x), \end{matrix}

where $η (x)$ is a contaminating component and $α \in [0, 1)$.

We generate the contaminated reduced data $Y$ as follows:

Step 1. Generate $X_{l}$, where $X_{j l} \overset{i . i . d .}{\sim} N (2, 1)$.
Step 2. Generate $R_{l}$, where $r_{i j l} \overset{i . i . d .}{\sim} N (1, γ_{0}^{2})$.
Step 3. Generate the uncontaminated ${\tilde{Y}}_{l}$ by calculating ${\tilde{Y}}_{l} = R_{l} X_{l}$.
Step 4. Generate the contaminated ${\tilde{Y}}_{i l}^{c}$, where ${\tilde{Y}}_{i l}^{c} = {\tilde{Y}}_{i l} + η (x)$ with probability $α$, and ${\tilde{Y}}_{i l}^{c} = {\tilde{Y}}_{i l}$ with probability $1 - α$.
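Step 4 replaces each compressed value, independently and with probability $α$, by a shifted value. A minimal Python sketch, assuming (as in Tables 9 and 10, where η = 1000) that the contamination is a constant shift $η$; the function name is hypothetical.

```python
import random


def contaminate(y_tilde, alpha, eta, rng):
    """Step 4: Y~c = Y~ + eta with probability alpha, else Y~ (independently)."""
    return [y + eta if rng.random() < alpha else y for y in y_tilde]
```

Since the shift is applied after compression, it models outliers in the reduced data directly rather than outliers in the original $X_{jl}$.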

In the above description, the contamination with outliers is within blocks. A conceptual issue that one encounters is the meaning of outliers in this setting. Specifically, a data point that is an outlier in the original data set may not remain an outlier in the reduced data, and vice versa. Hence, concepts such as the breakdown point and the influence function need to be studied carefully. Tables 9 and 10 present one version of the robustness exhibited by the proposed method. In these tables, we set $J = 10^{4}$, $B = 100$, $S = 1$, $γ_{0} = 0.1$, $c_{B} = 0.3$, and $η = 1000$. In addition, $α^{*} = 0$ means that $α = 0$ with $γ_{0} \equiv 0$.

From Tables 9 and 10, we observe that even under 50% contamination the estimate of the mean remains stable; however, the estimate of the variance is affected at high levels of contamination (30% and above). An interesting and important issue is to investigate the role of $γ_{0}$ in the breakdown point of the estimator.

Finally, we investigate the bias of the MHDE as a function of the outlier value. Figure 2 shows how the MHDE changes as the outlier value $η$ increases. Here we set $S = 1$, $B = 100$, and $γ_{0} = 0.1$. In addition, we let $α = 0.2$ and let $η$ take values in $\{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000\}$. We can see that as $η$ increases, both $\hat{μ}$ and $\hat{σ}$ increase up to $η = 500$ and then decrease, although $\hat{μ}$ does not change much. This is because when the outlier value is small (that is, close to the bulk of the observations), it may not be treated as an “outlier” by the MHD method. However, once the outlier values are “far enough” from the other values, the estimates of $μ$ and $σ$ remain stable.

5. Example

In this section, we describe an analysis of data from financial analytics using the proposed methods. The data are from a bank (a cash and credit card issuer) in Taiwan, and the targets of the analyses were credit card holders of the bank. The research focused on customers' default payments. The data set (see [27] for details) contains 180,000 observations and includes information on twenty-five variables such as default payments, demographic factors, credit data, history of payment, and billing statements of credit card clients from April 2005 to September 2005. Ref. [28] studies machine learning methods for evaluating the probability of default. Here, we work with the first three months of data, containing 90,000 observations concerning bill payments. For our analyses, we remove zero and negative payments from the data set and apply a logarithmic transformation to the bill payments. Since the log-transformed data were multimodal and exhibited features of a mixture of normal distributions, we work with the log-transformed data with values in the range (6.1, 13). Next, we apply the Box-Cox transformation to the log-transformed data; this transformation is chosen to make the data approximately normally distributed (the normal distribution belongs to the location-scale family). Specifically, let $L$ denote the log-transformed data in the range (6.1, 13); then the data after the Box-Cox transformation are given by $X = (L^{2} - 1) / 19.9091$. The histogram of $X$ is given in Figure 3. The number of observations at the end of data processing was 70,000.

Our goal is to estimate the average bill payment during the first three months. For this, we apply the proposed method. In this analysis, we assume that the target model for $X$ is Gaussian and split the data randomly into $B = 100$ blocks, yielding $J = 700$ observations per block.
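The preprocessing and blocking steps described above can be sketched as follows in Python. The sketch assumes natural logarithms, an open range (6.1, 13), and that leftover observations are dropped when the sample size is not a multiple of $B$; these details, the seed, and the function name are assumptions rather than specifications from the paper.

```python
import math
import random


def preprocess_and_block(payments, B=100, lo=6.1, hi=13.0, denom=19.9091, seed=0):
    """Drop non-positive payments, take logs, keep lo < L < hi, apply
    X = (L^2 - 1) / denom, and split X at random into B blocks of equal size."""
    logs = [math.log(p) for p in payments if p > 0]
    X = [(L * L - 1.0) / denom for L in logs if lo < L < hi]
    rng = random.Random(seed)
    rng.shuffle(X)
    J = len(X) // B                       # e.g. 70,000 // 100 = 700
    return [X[l * J:(l + 1) * J] for l in range(B)]
```

Each returned block then plays the role of one group $X_{l}$ to be compressed.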

In Table 11, “est” denotes the estimate and “95% CI” denotes the 95% confidence interval. When analyzing the whole data set with bandwidth $c_{n} = 0.30$, we obtain the MHDE of $μ$ as $\hat{μ} = 5.183$ with 95% confidence interval (5.171, 5.194), and the MHDE of $σ$ as $\hat{σ} = 1.425$ with confidence interval (1.418, 1.433).

In Table 11, we choose the bandwidth as $c_{B} = 0.30$. Also, $S^{*} = 1$ represents the case where $S = 1$ and $γ_{0} \equiv 0$. In all other settings, we keep $γ_{0} = 0.1$. We observe that all estimates are similar as $S$ changes.
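As a consistency check on Table 11, the $μ$ intervals there agree, to within one unit in the last reported digit, with the normal-approximation interval $\hat{μ} \pm 1.96\, \hat{σ} / \sqrt{B}$ computed from the tabulated estimates with $B = 100$. The Python sketch below reproduces two rows exactly; that the intervals were formed this way is an inference from the numbers, not a statement from the paper, and the function name is hypothetical.

```python
import math


def wald_interval(mu_hat, sigma_hat, B, z=1.96):
    """mu_hat +/- z * sigma_hat / sqrt(B)."""
    half = z * sigma_hat / math.sqrt(B)
    return mu_hat - half, mu_hat + half


for label, mu_hat, sigma_hat in (("S* = 1", 5.171, 1.362), ("S = 20", 5.171, 1.388)):
    lo, hi = wald_interval(mu_hat, sigma_hat, B=100)
    print(f"{label}: ({lo:.3f}, {hi:.3f})")
```

Under this reading, the block means rather than the individual observations drive the interval width, which explains why the compressed-data intervals are much wider than the whole-data interval.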

Next, we study the robustness of the MHDE for these data by investigating the relative bias and the influence function. Specifically, we first reduce the dimension from $J = 700$ to $S = 1$ for each of the $B = 100$ blocks and obtain the compressed data $\tilde{Y}$; next, we generate the contaminated reduced data ${\tilde{Y}}_{i l}^{c}$ as in Step 4 of Section 4.5. We set $α = 0.20$ and $γ_{0} = 0.20$, and the kernel is taken to be the Epanechnikov density with bandwidth $c_{B} = 0.30$. The contamination $η (x)$ takes values in $\{50, 100, 200, 300, 500, 800, 1000\}$ (note that the approximate mean of $\tilde{Y}$ is around 3600). Let $T_{MHD}$ be the Hellinger distance functional. We assess robustness using the influence function

\begin{matrix} \begin{matrix} IF (α; T, \tilde{Y}) = \frac{T_{MHD} ({\tilde{Y}}^{c}) - T_{MHD} (\tilde{Y})}{α}. \end{matrix} \end{matrix}

Figure 4 illustrates how the influence function changes as the outlier value increases. We observe that for both estimates ($\hat{μ}$ and $\hat{σ}$), the influence function first increases and then decreases rapidly. From $η (x) = 300$ onward, the influence functions remain stable and close to zero, which indicates that the MHDE is stable.
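The empirical influence function compares a functional on contaminated and uncontaminated data. The Python sketch below implements that ratio for any estimator `T`. Since the MHD functional is not reproduced here, the sample mean and the sample median serve only as stand-ins, to contrast an influence that grows without bound in $η$ with one that stays bounded; the data and function names are illustrative.

```python
import statistics


def empirical_if(T, clean, contaminated, alpha):
    """IF(alpha; T, Y) = (T(Y^c) - T(Y)) / alpha."""
    return (T(contaminated) - T(clean)) / alpha


clean = [float(k) for k in range(1, 11)]   # 1, 2, ..., 10
alpha = 0.2                                # two of the ten values are shifted
for eta in (50.0, 300.0, 1000.0):
    contaminated = [v + eta for v in clean[:2]] + clean[2:]
    print(eta,
          empirical_if(statistics.mean, clean, contaminated, alpha),    # equals eta up to rounding
          empirical_if(statistics.median, clean, contaminated, alpha))  # stays at 10 up to rounding
```

A robust functional behaves like the median column: once the outliers are far from the bulk of the data, moving them further has no additional effect.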

Additional Analyses: The histogram in Figure 3 suggests that a mixture of normal distributions may fit the log- and Box-Cox-transformed data better than a single normal distribution. For this reason, we calculated the Hellinger distance between a four-component mixture (chosen using the BIC criterion) and the normal distribution; this distance is approximately 0.0237. Thus, the normal distribution (which belongs to the location-scale family) can be viewed as a misspecified target distribution; admittedly, one loses information about the components of the mixture due to model misspecification. However, since our goal is to estimate the overall mean and variance, the proposed estimator appears to retain the properties described in the manuscript.
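A check of this kind can be sketched as follows: the Python code below computes, by the trapezoidal rule, the Hellinger distance between a normal mixture and the normal distribution with the same overall mean and variance. The mixture parameters in the usage line are hypothetical (the fitted four-component mixture is not reported). The sketch uses the convention $HD^{2} = \int (\sqrt{f} - \sqrt{g})^{2} = 2(1 - \int \sqrt{fg})$, which may differ by a constant factor from the convention behind the value 0.0237.

```python
import math


def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, m, s):
    return sum(wk * normal_pdf(x, mk, sk) for wk, mk, sk in zip(w, m, s))


def mixture_moments(w, m, s):
    """Overall mean and variance of a normal mixture."""
    mean = sum(wk * mk for wk, mk in zip(w, m))
    second = sum(wk * (sk ** 2 + mk ** 2) for wk, mk, sk in zip(w, m, s))
    return mean, second - mean ** 2


def hellinger_to_matched_normal(w, m, s, n=20_000):
    """HD between the mixture and N(mean, var); assumes all components lie well inside mean +/- 12 sd."""
    mean, var = mixture_moments(w, m, s)
    sd = math.sqrt(var)
    lo, hi = mean - 12.0 * sd, mean + 12.0 * sd
    h = (hi - lo) / n

    def root_fg(x):
        return math.sqrt(mixture_pdf(x, w, m, s) * normal_pdf(x, mean, sd))

    affinity = h * (0.5 * root_fg(lo) + sum(root_fg(lo + k * h) for k in range(1, n))
                    + 0.5 * root_fg(hi))
    return math.sqrt(max(2.0 * (1.0 - affinity), 0.0))


print(hellinger_to_matched_normal([0.3, 0.7], [0.0, 2.0], [1.0, 0.5]))
```

Matching the first two moments mirrors the argument in the text: the normal target is misspecified in shape but shares the overall mean and variance that the estimator targets.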

6. Discussion and Extensions

The results in the manuscript focus on the iterated limit theory for the MHDE of compressed data obtained from a location-scale family. Two pertinent questions arise: (i) is it easy to extend this theory to the MHDE of compressed data arising from non-location-scale families of distributions? and (ii) is it possible to extend the theory from iterated limits to a double limit? Turning to (i), we note that the heuristic for considering the location-scale family comes from the fact that the first and second moments are consistently estimable for partially observed random walks (see [29,30]). This is related to the size of $J$, which can be of exponential order. For such large $J$, other moments may not be consistently estimable. Hence, the entire theory goes through as long as one considers parametric models $f (\cdot | θ)$, where $θ = W (μ, σ^{2})$ for a known function $W (\cdot, \cdot)$. A case in point is the Gamma distribution, which can be re-parametrized in terms of its first two moments.
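For instance, with shape $a$ and scale $b$ (a parametrization assumed here for illustration), the moment map and its inverse are:

```latex
X \sim \mathrm{Gamma}(a, b):\quad μ = ab,\quad σ^{2} = ab^{2}
\quad\Longrightarrow\quad
θ = (a, b) = W(μ, σ^{2}) = \left(\frac{μ^{2}}{σ^{2}},\ \frac{σ^{2}}{μ}\right).
```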

As for (ii), it is well known that the existence and equality of iterated limits for real sequences do not imply the existence of the double limit unless additional uniformity of convergence holds (see [31], for instance). Extending this notion to distributional convergence requires additional assumptions, which are investigated in a different manuscript wherein more general divergences are also considered.
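A standard example of this phenomenon (not taken from the paper) is the array $a_{m,n} = mn/(m^{2} + n^{2})$: both iterated limits exist and equal zero, but the double limit does not exist because the array is constant along the diagonal.

```latex
\lim_{m\to\infty}\lim_{n\to\infty}\frac{mn}{m^{2}+n^{2}}
= \lim_{m\to\infty} 0 = 0
= \lim_{n\to\infty}\lim_{m\to\infty}\frac{mn}{m^{2}+n^{2}},
\qquad\text{whereas}\qquad
a_{n,n} = \frac{n^{2}}{2n^{2}} = \frac{1}{2}\ \text{for all } n .
```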

7. Concluding Remarks

In this paper, we proposed a Hellinger distance-based method to obtain robust estimates of the mean and variance in a location-scale model using compressed data. Our extensive theoretical investigations and simulations show the usefulness of the methodology, which can therefore be applied in a variety of scientific settings. Several theoretical and practical questions concerning robustness in a big data setting arise. For instance, the effect of the variability in the $R$ matrix, and its effect on outliers, are important issues that need further investigation. Furthermore, establishing statistical properties such as uniform consistency and uniform asymptotic normality under different choices for the distribution of $R$ would be useful. These are under investigation by the authors.

Author Contributions

The problem was conceived by E.A., A.N.V. and G.D. L.L. is a student of A.N.V., and worked on theoretical and simulation details with inputs from all members at different stages.

Funding

The authors thank George Mason University Libraries for support with the article processing fees; Ahmed’s research is supported by a grant from NSERC.

Acknowledgments

The authors thank the anonymous reviewers for a careful reading of the manuscript and several useful suggestions that improved the readability of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MHDE	Minimum Hellinger Distance Estimator
MHD	Minimum Hellinger Distance
i.i.d.	independent and identically distributed
MLE	Maximum Likelihood Estimator
CI	Confidence Interval
IF	Influence Function
RHS	Right Hand Side
LHS	Left Hand Side
BFGS	Broyden-Fletcher-Goldfarb-Shanno
var	Variance
StD	Standard Deviation
MSE	Mean Square Error

References

[1] Beran, R. Minimum Hellinger distance estimates for parametric models. Ann. Stat. 1977, 5, 445–463.
[2] Lindsay, B.G. Efficiency versus robustness: The case for minimum Hellinger distance and related methods. Ann. Stat. 1994, 22, 1081–1114.
[3] Fisher, R.A. Two new properties of mathematical likelihood. Proc. R. Soc. Lond. Ser. A 1934, 144, 285–307.
[4] Pitman, E.J.G. The estimation of the location and scale parameters of a continuous population of any given form. Biometrika 1939, 30, 391–421.
[5] Gupta, A.; Székely, G. On location and scale maximum likelihood estimators. Proc. Am. Math. Soc. 1994, 120, 585–589.
[6] Duerinckx, M.; Ley, C.; Swan, Y. Maximum likelihood characterization of distributions. Bernoulli 2014, 20, 775–802.
[7] Teicher, H. Maximum likelihood characterization of distributions. Ann. Math. Stat. 1961, 32, 1214–1222.
[8] Thanei, G.A.; Heinze, C.; Meinshausen, N. Random projections for large-scale regression. In Big and Complex Data Analysis; Springer: Berlin, Germany, 2017; pp. 51–68.
[9] Slawski, M. Compressed least squares regression revisited. In Artificial Intelligence and Statistics; Addison-Wesley: Boston, MA, USA, 2017; pp. 1207–1215.
[10] Slawski, M. On principal components regression, random projections, and column subsampling. Electron. J. Stat. 2018, 12, 3673–3712.
[11] Raskutti, G.; Mahoney, M.W. A statistical perspective on randomized sketching for ordinary least-squares. J. Mach. Learn. Res. 2016, 17, 7508–7538.
[12] Ahfock, D.; Astle, W.J.; Richardson, S. Statistical properties of sketching algorithms. arXiv, 2017; arXiv:1706.03665.
[13] Vidyashankar, A.; Hanlon, B.; Lei, L.; Doyle, L. Anonymized Data: Trade off between Efficiency and Privacy. 2018; preprint.
[14] Woodward, W.A.; Whitney, P.; Eslinger, P.W. Minimum Hellinger distance estimation of mixture proportions. J. Stat. Plan. Inference 1995, 48, 303–319.
[15] Basu, A.; Harris, I.R.; Basu, S. Minimum distance estimation: The approach using density-based distances. In Robust Inference, Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 1997; Volume 15, pp. 21–48.
[16] Hooker, G.; Vidyashankar, A.N. Bayesian model robustness via disparities. Test 2014, 23, 556–584.
[17] Sriram, T.; Vidyashankar, A. Minimum Hellinger distance estimation for supercritical Galton–Watson processes. Stat. Probab. Lett. 2000, 50, 331–342.
[18] Simpson, D.G. Minimum Hellinger distance estimation for the analysis of count data. J. Am. Stat. Assoc. 1987, 82, 802–807.
[19] Simpson, D.G. Hellinger deviance tests: Efficiency, breakdown points, and examples. J. Am. Stat. Assoc. 1989, 84, 107–113.
[20] Cheng, A.; Vidyashankar, A.N. Minimum Hellinger distance estimation for randomized play the winner design. J. Stat. Plan. Inference 2006, 136, 1875–1910.
[21] Basu, A.; Shioya, H.; Park, C. Statistical Inference: The Minimum Distance Approach; Chapman and Hall/CRC: London, UK, 2011.
[22] Bhandari, S.K.; Basu, A.; Sarkar, S. Robust inference in parametric models using the family of generalized negative exponential disparities. Aust. N. Z. J. Stat. 2006, 48, 95–114.
[23] Ghosh, A.; Harris, I.R.; Maji, A.; Basu, A.; Pardo, L. A generalized divergence for statistical inference. Bernoulli 2017, 23, 2746–2783.
[24] Tamura, R.N.; Boos, D.D. Minimum Hellinger distance estimation for multivariate location and covariance. J. Am. Stat. Assoc. 1986, 81, 223–229.
[25] Li, P. Estimators and tail bounds for dimension reduction in l_α (0 < α ≤ 2) using stable random projections. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA, 20–22 January 2008; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2008; pp. 10–19.
[26] Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
[27] Lichman, M. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 29 March 2019).
[28] Yeh, I.C.; Lien, C.H. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst. Appl. 2009, 36, 2473–2480.
[29] Guttorp, P.; Lockhart, R.A. Estimation in sparsely sampled random walks. Stoch. Process. Appl. 1989, 31, 315–320.
[30] Guttorp, P.; Siegel, A.F. Consistent estimation in partially observed random walks. Ann. Stat. 1985, 13, 958–969.
[31] Apostol, T.M. Mathematical Analysis; Addison Wesley Publishing Company: Boston, MA, USA, 1974.

Figure 1. MLE vs. MHDE after Data Compression.

Figure 2. Comparison of estimates of $μ$ (a) and $σ$ (b) as the outlier value changes.


Figure 3. The histogram of credit payment data after Box-Cox transformation to Normality.

Figure 4. Influence function of $\hat{μ}$ (a) and $\hat{σ}$ (b) for the MHDE.


Table 1. Illustration of the data reduction mechanism. Here $r_{i l}^{*} = (r_{i \cdot l}, ω_{i l})$.


	Grp 1	Grp 2	⋯	Grp B		Grp 1	Grp 2	⋯	Grp B
Original	$X_{11}$	$X_{12}$	⋯	$X_{1 B}$	Compressed	$({\tilde{Y}}_{11}, r_{11}^{*})$	$({\tilde{Y}}_{12}, r_{12}^{*})$	⋯	$({\tilde{Y}}_{1 B}, r_{1 B}^{*})$
Data	$X_{21}$	$X_{22}$	⋯	$X_{2 B}$	Data	$({\tilde{Y}}_{21}, r_{21}^{*})$	$({\tilde{Y}}_{22}, r_{22}^{*})$	⋯	$({\tilde{Y}}_{2 B}, r_{2 B}^{*})$
	⋮	⋮	⋮	⋮	$\overset{S ≪ J}{⟹}$	⋮	⋮	⋮	⋮
	$X_{J 1}$	$X_{J 2}$	⋯	$X_{J B}$	$\overset{S ≪ J}{⟹}$	$({\tilde{Y}}_{S 1}, r_{S 1}^{*})$	$({\tilde{Y}}_{S 2}, r_{S 2}^{*})$	⋯	$({\tilde{Y}}_{S B}, r_{S B}^{*})$

Table 2. MHDE as the dimension $S$ of the compressed data $\tilde{Y}$ changes, using the Gaussian kernel.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$S^{*} = 1$	2.000	1.010	0.001	1.016	74.03	5.722
$S = 1$	2.000	1.014	0.001	1.018	74.22	5.844
$S = 2$	2.000	1.005	0.001	1.019	73.81	5.832
$S = 5$	2.000	0.987	0.001	1.017	74.16	5.798
$S = 10$	2.000	0.995	0.001	1.019	71.87	5.525

Table 3. MHDE as $γ_{0}$ changes for the compressed data $\tilde{Y}$, using the Gaussian kernel.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$γ_{0} = 0.00$	2.000	1.010	0.001	1.016	74.03	5.722
$γ_{0} = 0.01$	2.000	1.017	0.001	1.015	74.83	5.814
$γ_{0} = 0.10$	2.000	1.023	0.001	1.021	72.80	5.717
$γ_{0} = 0.50$	2.000	1.119	0.001	1.076	72.59	11.08
$γ_{0} = 1.00$	2.000	1.399	0.002	1.226	82.21	57.75

Table 4. MHDE as the bandwidth $c_{B}$ changes for the compressed data $\tilde{Y}$, using the Gaussian kernel.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$c_{B}^{*} = 0.30$	2.000	1.010	0.001	1.016	74.03	5.722
$c_{B} = 0.30$	2.000	1.014	0.001	1.018	74.22	5.844
$c_{B} = 0.40$	2.000	1.015	0.001	1.063	79.68	10.26
$c_{B} = 0.50$	2.000	1.014	0.001	1.108	82.33	18.33
$c_{B} = 0.70$	2.000	1.004	0.001	1.212	93.96	53.64
$c_{B} = 0.90$	2.000	1.009	0.001	1.346	110.5	132.2

Table 5. MHDE as $B$ changes for the compressed data $\tilde{Y}$, using the Gaussian kernel with $γ_{0} = 0.1$.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$B = 20$	2.000	2.205	0.005	1.739	378.5	688.6
$B = 50$	2.000	1.409	0.002	1.136	125.2	34.17
$B = 100$	2.000	1.010	0.001	1.016	74.03	5.722
$B = 500$	2.000	0.455	0.000	0.972	32.63	1.873

Table 6. MHDE as $B$ changes for the compressed data $\tilde{Y}$, using the Gaussian kernel with $γ_{0} = 0$.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$B = 20$	2.000	2.282	0.005	1.749	381.4	706.0
$B = 50$	2.000	1.440	0.002	1.148	125.2	37.42
$B = 100$	2.000	1.014	0.001	1.018	74.22	5.844
$B = 500$	2.000	0.465	0.000	0.973	31.33	1.692

Table 7. MHDE as the variance of the sensing variable changes for the compressed data $\tilde{Y}$, using the Gaussian kernel under the Gamma sensing variable.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$v a r = 0.00$	2.000	1.010	0.001	1.016	74.03	5.722
$v a r = {0.01}^{2}$	2.000	1.005	0.001	1.016	74.56	5.806
$v a r = 0.01$	2.000	1.006	0.001	1.018	73.70	5.762
$v a r = 0.25$	2.000	1.120	0.001	1.078	73.70	11.56
$v a r = 1.00$	2.000	1.438	0.001	1.228	81.94	58.48

Table 8. MHDE as $J$ changes for the compressed data $\tilde{Y}$, using the Gaussian kernel under the Bernoulli sensing variable.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$J = 10$	2.000	104.9	11.01	1.215	97.78	55.79
$J = 100$	1.998	104.5	10.93	1.201	104.5	51.26
$J = 1000$	1.998	104.7	10.96	1.195	106.6	49.36
$J = 5000$	2.001	103.9	10.80	1.200	105.7	51.20
$J = 10000$	1.996	105.1	11.07	1.196	104.4	49.16

Table 9. MHDE as $α$ changes for the contaminated data $\tilde{Y}$, using the Gaussian kernel.


	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$α^{*} = 0.00$	2.000	1.010	0.001	1.016	74.03	5.722
$α = 0.00$	2.000	1.014	0.001	1.018	74.22	5.844
$α = 0.01$	2.000	1.002	0.001	1.022	74.89	6.079
$α = 0.05$	2.000	1.053	0.001	1.023	77.86	6.599
$α = 0.10$	2.000	1.086	0.001	1.034	79.30	7.350
$α = 0.20$	2.000	1.146	0.001	1.073	93.45	14.06
$α = 0.30$	2.001	7.205	0.054	1.264	688.2	542.5
$α = 0.40$	2.026	21.60	1.100	3.454	1861	9480
$α = 0.50$	2.051	14.00	2.600	4.809	1005	15513

Table 10. MHDE as $α$ changes for the contaminated data $\tilde{Y}$, using the Epanechnikov kernel.

	$\hat{μ}$			$\hat{σ}$
	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$	Ave	StD $\times 10^{3}$	MSE $\times 10^{3}$
$α^{*} = 0.00$	2.000	0.972	0.001	1.008	73.22	5.425
$α = 0.00$	2.000	1.014	0.001	1.018	74.22	5.844
$α = 0.01$	2.000	0.978	0.001	1.028	107.4	12.19
$α = 0.05$	2.000	1.264	0.002	1.025	108.7	12.35
$α = 0.10$	2.000	1.202	0.001	1.008	114.7	13.09
$α = 0.20$	2.000	1.263	0.002	1.046	129.8	18.76
$α = 0.30$	2.001	5.098	0.026	1.104	557.8	318.9
$α = 0.40$	2.021	21.80	0.900	3.004	1973	7870
$α = 0.50$	2.051	10.21	3.000	4.893	720.4	15669
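Tables 9 and 10 compare the Gaussian and Epanechnikov kernels as the contamination proportion α grows. The following is a minimal sketch of the underlying idea on ordinary, uncompressed i.i.d. data: fit a normal location-scale model by minimizing the Hellinger distance to a kernel density estimate. It is not the authors' compressed-data procedure. The contamination model (a fraction α drawn from N(10, 1)), the rule-of-thumb bandwidth, the brute-force grid search, and the helper names `kde`, `normal_pdf` and `mhde_normal` are all assumptions made for illustration.

```python
import numpy as np


def kde(x_grid, data, h, kernel="gaussian"):
    """Kernel density estimate of `data` evaluated on `x_grid` with bandwidth h."""
    u = (x_grid[:, None] - data[None, :]) / h  # shape (len(x_grid), len(data))
    if kernel == "gaussian":
        k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    elif kernel == "epanechnikov":
        k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return k.mean(axis=1) / h


def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) density; broadcasts over x and mu."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def mhde_normal(data, kernel="gaussian", n_grid=1000, n_mu=201, n_sigma=150):
    """Grid-search minimum Hellinger distance estimate of (mu, sigma) for a
    normal model fitted to a kernel density estimate of the data."""
    data = np.asarray(data, dtype=float)
    n = data.size
    q75, q25 = np.percentile(data, [75, 25])
    scale = (q75 - q25) / 1.349  # robust scale: IQR divided by the IQR of N(0, 1)
    # Silverman-type rule-of-thumb bandwidth (an assumption, not the paper's choice)
    h = 0.9 * min(np.std(data, ddof=1), scale) * n ** (-0.2)
    x = np.linspace(data.min() - 5 * h, data.max() + 5 * h, n_grid)
    dx = x[1] - x[0]
    sqrt_g = np.sqrt(kde(x, data, h, kernel))
    med = np.median(data)
    mus = np.linspace(med - 3 * scale, med + 3 * scale, n_mu)
    sigmas = np.linspace(0.1 * scale, 3 * scale, n_sigma)
    best_hd2, best_mu, best_sigma = np.inf, None, None
    for s in sigmas:
        sqrt_f = np.sqrt(normal_pdf(x[None, :], mus[:, None], s))  # (n_mu, n_grid)
        # HD^2 = 2 - 2 * integral of sqrt(g * f), integral taken as a Riemann sum;
        # this form only needs the grid to cover the support of g.
        hd2 = 2.0 - 2.0 * (sqrt_g[None, :] * sqrt_f).sum(axis=1) * dx
        i = int(np.argmin(hd2))
        if hd2[i] < best_hd2:
            best_hd2, best_mu, best_sigma = hd2[i], mus[i], s
    return best_mu, best_sigma


# Usage: 10% of the sample replaced by outliers from N(10, 1) (assumed contamination model)
rng = np.random.default_rng(0)
n, alpha = 1000, 0.10
n_out = int(round(alpha * n))
y = np.concatenate([rng.normal(2.0, 1.0, n - n_out), rng.normal(10.0, 1.0, n_out)])
mu_g, sigma_g = mhde_normal(y, kernel="gaussian")
mu_e, sigma_e = mhde_normal(y, kernel="epanechnikov")
print(f"sample mean {y.mean():.3f}, sample sd {y.std(ddof=1):.3f}")
print(f"MHDE (Gaussian kernel):     mu {mu_g:.3f}, sigma {sigma_g:.3f}")
print(f"MHDE (Epanechnikov kernel): mu {mu_e:.3f}, sigma {sigma_e:.3f}")
```

In this setting the sample mean is pulled toward the outliers, while both kernel-based fits stay near μ = 2 and σ = 1; the scale estimate tends to sit slightly above 1 because the kernel estimate adds bandwidth-dependent spread. A brute-force grid keeps the sketch dependent on NumPy alone; a practical implementation would use a proper numerical optimizer.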

Table 11. MHDE from the real data analysis.

		$\hat{μ}$	$\hat{σ}$
$S^{*} = 1$	Estimate	5.171	1.362
$S^{*} = 1$	95% CI	(4.904, 5.438)	(1.158, 1.540)
$S = 1$	Estimate	5.171	1.391
$S = 1$	95% CI	(4.898, 5.443)	(1.183, 1.572)
$S = 5$	Estimate	5.172	1.359
$S = 5$	95% CI	(4.905, 5.438)	(1.155, 1.535)
$S = 10$	Estimate	5.171	1.372
$S = 10$	95% CI	(4.902, 5.440)	(1.167, 1.551)
$S = 20$	Estimate	5.171	1.388
$S = 20$	95% CI	(4.899, 5.443)	(1.180, 1.569)

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
