Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model

Helali, Salima; Masmoudi, Afif; Slaoui, Yousri

doi:10.3390/e24030315

Open AccessArticle

Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model

by

Salima Helali

¹,

Afif Masmoudi

² and

Yousri Slaoui

^3,*

¹

Mathematics Laboratory, Angers University, 49100 Angers, France

²

Probability and Statistics Laboratory, Sfax University, Sfax 3029, Tunisia

³

Mathematics and Applications Laboratory, Poitiers University, 86073 Poitiers, France

^*

Author to whom correspondence should be addressed.

Entropy 2022, 24(3), 315; https://doi.org/10.3390/e24030315

Submission received: 17 January 2022 / Revised: 15 February 2022 / Accepted: 17 February 2022 / Published: 23 February 2022

(This article belongs to the Special Issue Statistical Learning of Networks and Functional Data)

Download

Browse Figures

Versions Notes

Abstract

:

The central focus of this paper is upon the alleviation of the boundary problem when the probability density function has a bounded support. Mixtures of beta densities have led to different methods of density estimation for data assumed to have compact support. Among these methods, we mention Bernstein polynomials which leads to an improvement of edge properties for the density function estimator. In this paper, we set forward a shrinkage method using the Bernstein polynomial and a finite Gaussian mixture model to construct a semi-parametric density estimator, which improves the approximation at the edges. Some asymptotic properties of the proposed approach are investigated, such as its probability convergence and its asymptotic normality. In order to evaluate the performance of the proposed estimator, a simulation study and some real data sets were carried out.

Keywords:

asymptotic properties; Bernstein polynomial; EM algorithm; Gaussian mixture model; kernel estimator; shrinkage estimator

1. Introduction

Density estimation is a widely adopted tool for multiple tasks in statistical inference, machine learning, visualization and exploratory data analysis. Existing density estimation algorithms can be categorized into either parametric, semi-parametric, or non-parametric approaches. In the non-parametric framework, several methods have been set forward for the smooth estimation of density and distribution functions. The most popular one, called kernel method, was introduced by [1]. The advances were carried out by [2] to estimate a density function. The reader is recommended to consult the paper [3] for an introduction to several kernel smoothing techniques. However, kernel methods display estimation problems at the edges, when we have a random variable X with density function f supported on a compact interval. Moreover, if

X_{1}, \dots, X_{n}

is a sample with the same density f, it is well known, in non-parametric kernel density estimation, that the bias of the standard kernel density estimator

\begin{matrix} {\hat{f}}_{n} (x) & = & \frac{1}{n h_{n}} \sum_{i = 1}^{n} K (\frac{x - X_{i}}{h_{n}}), \end{matrix}

(1)

is of a larger order near the boundary than that in the interior, where K is a kernel (that is, a positive function satisfying

\int K (x) d x = 1

) and (

h_{n}

) is a bandwidth (that is, a sequence of positive real numbers that goes to zero). Let us now suppose that f has two continuous derivatives everywhere and that, as

n \to \infty

,

h = h_{n} \to 0

and

n h \to 0

. Let

x = p h

for

p > 0

. Near the boundary, the expression of the mean and the variance are indicated as

E [{\hat{f}}_{n} (x)] ≃ f (x) \int_{- \infty}^{p} K (x) d x - f^{'} (x) \int_{- \infty}^{p} x K (x) d x + \frac{1}{2} h^{2} f^{″} (x) \int_{- \infty}^{p} x^{2} K (x) d x,

and

V a r [{\hat{f}}_{n} (x)] ≃ {(n h)}^{- 1} f (x) \int_{- \infty}^{p} K^{2} (x) d x .

These bias phenomena are called boundary bias. Numerous authors have elaborated methods for reducing these phenomena, such as data reflection [4], boundary kernels [5,6,7], local linear estimator [8,9], use of beta and gamma kernels [10,11] and bias reduction [12,13]. For a smooth estimator of a density function f with finite known support, there have been several methods, such as Vitale’s method [14], which is based on Bernstein polynomials and expressed as

\begin{matrix} {\tilde{f}}_{1, n, m} (x) & = & m \sum_{k = 0}^{m - 1} [F_{n} (\frac{k + 1}{m}) - F_{n} (\frac{k}{m})] b_{k} (m - 1, x), \end{matrix}

(2)

where

F_{n}

is the empirical distribution function and

b_{k} (m, x) = C_{m}^{k} x^{k} {(1 - x)}^{m - k}

is the Bernstein polynomial. This estimator was investigated in the literatures [15,16,17,18] and, more recently, by [12,19,20].

Within the parametric framework, it is noteworthy that the Gaussian mixture model can be used to estimate any density function, without any problem of estimation on the edge. This refers to the fact that the set of all normal mixture densities is dense in the set of all density functions under the

L^{1}

metric [21]. The investigation of mixture models stands for a full field in modern statistics. It is a probabilistic model introduced by [22] to illustrate the presence of subpopulations within an overall population. It has been developed so far by various authors, such as [23]. It is used for data classification and it provides efficient approaches of model-based clustering. The authors of [24] demonstrated that, when a Gaussian mixture model is used to estimate a density non-parametrically, the density estimator that uses the Bayesian information criterion (BIC) of [25] to select the number of components in the mixture is consistent [26].

However, we obtain the non-parametric kernel estimate of a density if we fit a mixture of n components in equal proportions

1 / n

, where n is the size of the observed sample. As a matter of fact, it can be inferred that mixture models occupy an interesting niche between parametric and non-parametric approaches to statistical estimation.

More recently, in the parametric context, [27] proposed a parametric model using Bernstein polynomials with positive coefficients to estimate the unknown density function f; this estimator is defined as follows:

\begin{matrix} f_{B} (x, p_{m}) & = & \sum_{i = 1}^{m} {\hat{p}}_{m i} B_{m i} (x), \end{matrix}

(3)

where

B_{m i} (x) = (m + 1) b_{i} (m, x)

, for

i = 0, \dots, m

,

p_{m} = {(p_{m 1}, \dots, p_{m m})}^{T}

(

p_{m i} \geq 0

,

i = 1, \dots, m

,

\sum_{i = 1}^{m} p_{m i} \leq 1

) and

{\hat{p}}_{m i}

are the estimators of the parameters

p_{m i}

, obtained by the Expectation Maximization (EM) algorithm as follows:

p_{m i}^{(s + 1)} = \frac{1}{n} \sum_{j = 1}^{n} \frac{p_{m i}^{(s)} B_{m i} (x_{j})}{\sum_{k = 0}^{m} p_{m k}^{(s)} B_{m k} (x_{j})}, i = 0, \dots, m; s = 0, 1, \dots

with

{\hat{p}}_{m i} = lim_{s \to \infty} p_{m i}^{(s)},

for

i = 1, \dots, m .

The proposed method gives a consistent estimator in

L^{2}

distance under some conditions.

The problem at the edge does not arise for the parametric model. For this reason, the basic idea of this work is to consider a shrinkage method using Bernstein (Vitale’s estimator) and Gaussian mixture estimators, to construct a shrinkage density estimator, in order to improve the approximation at the edge. A shrinkage estimator is a convex combination between estimators [28]. Basically, this implies that a naive or raw estimate is improved by combining it with other information.

The remainder of this paper is organized as follows: In the next section, we recall some intrinsic properties of the classical EM algorithm in the context of the Gaussian mixture parameter estimation. In Section 3, we introduce a new semi-parametric estimation approach based on the shrinkage method using Bernstein polynomials and Gaussian mixture densities. In Section 4, the consistency of the proposed estimator is exhibited, as well as its asymptotic normality.

Section 5 highlights a simulation study that compares the performance of the proposed approach with the Bernstein estimator, the standard Gaussian kernel estimator and Guan’s estimator. The closing Section 6 crowns the whole work, wraps the conclusion and provides new perspectives for future work.

2. Background

The Gaussian Mixture Model and Em Algorithm

Let us consider

X = (X_{1}, \dots, X_{n})

, a sequence of independent and identically distributed (i.i.d.) with common Gaussian mixture density defined by

\begin{matrix} g (x | θ) & = & \sum_{k = 1}^{K} π_{k} N (μ_{k}, σ_{k}) (x), \end{matrix}

(4)

where

θ = (π, μ, σ) = (π_{1} \dots, π_{K}, μ_{1} \dots, μ_{K}, σ_{1}, \dots, σ_{K}),

satisfies

0 \leq π_{k} \leq 1, \sum_{k = 1}^{K} π_{k} = 1, μ_{k} \in R, σ_{k} > 0, for k = 1, \dots, K where K > 0,

and

N (μ, σ) (x) = \frac{1}{σ \sqrt{2 π}} exp (- \frac{{(x - μ)}^{2}}{2 σ^{2}}) .

Finally, for each observed data point

X_{i}

, we associate a component label vector

Z_{i}

in order to manage the data clustering. This random vector

Z_{i} = {(Z_{i k})}_{1 \leq k \leq K}

is defined such that

Z_{i k} = 1

if the considered observation

X_{i}

is drawn form the

k^{k t}

component of the mixture and

Z_{i k} = 0

otherwise. Consequently,

Z_{i}

is distributed as a multivariate Bernoulli distribution with vector parameters

(π_{1}, \dots, π_{K})

as follows:

P (Z_{i} = z_{i}) = \prod_{k = 1}^{K} π_{k}^{z_{i k}} .

The EM algorithm is a popular tool in statistical estimation problems involving incomplete data or problems which can be posed in a similar form, such as the mixture parameters estimation [23,29]. In the EM framework,

(X_{1}, \dots, X_{n}, Z_{1}, \dots, Z_{n})

corresponds to the complete data and

(Z_{1}, \dots, Z_{n})

stand for the hidden data. Hence, the complete-data log-likelyhood is expressed by

\begin{matrix} L (X_{1}, \dots, X_{n}, Z_{1}, \dots, Z_{n}, θ) & = & \sum_{i = 1}^{n} \sum_{j = 1}^{K} Z_{i j} [log (π_{j}) + log (N (μ_{j}, σ_{j}) (X_{i}))] . \end{matrix}

(5)

The two steps of the EM algorithm, after l iterations, are the following:

(i): E-step: The conditional expectation of the complete-data log-likelyhood given the observed data, using the current fit $θ^{(l)}$ , is defined by

$\begin{matrix} φ (θ | θ^{(l)}) & = & E_{θ^{(l)}} (L (X_{1}, \dots, X_{n}, Z_{1}, \dots, Z_{n}, θ) | X_{1}, \dots, X_{n}) . \end{matrix}$

(6)

The posterior probability that $X_{i}$ belongs to the $j th$ component of the mixture at the $l th$ iteration, is expressed as

$\begin{matrix} τ_{i j}^{(l)} = E_{θ^{(l)}} (Z_{i j} | X_{1}, \dots, X_{n}) = \frac{π_{j}^{(l)} N (μ_{j}^{(l)}, {(σ^{2})}_{j}^{(l)}) (X_{i})}{\sum_{h = 1}^{K} π_{h}^{(l)} N {(μ_{h}^{(l)}, (σ_{h}^{2})}^{(l)} (X_{i})} . \end{matrix}$

(7)

Finally, we obtain

$\begin{matrix} φ (θ | θ^{(l)}) & = & \sum_{i = 1}^{n} \sum_{j = 1}^{K} τ_{i j}^{(l)} [log (π_{j}) + log (N (μ_{j}, σ_{j}) (X_{i}))] . \end{matrix}$

(8)
(ii): M-step: It consists of a global maximization of $φ (θ | θ^{(l)})$ with respect to $θ$ .

$\begin{matrix} θ^{(l + 1)} & = & arg max_{θ} φ (θ | θ^{(l)}) . \end{matrix}$

(9)

The updated estimates are stated by

$\begin{matrix} π_{j}^{(l + 1)} = \frac{1}{n} \sum_{i = 1}^{n} τ_{i j}^{(l)}, \end{matrix}$

(10)

$\begin{matrix} μ_{j}^{(l + 1)} & = & \frac{\sum_{i = 1}^{n} τ_{i j}^{(l)} X_{i}}{\sum_{i = 1}^{n} τ_{i j}^{(l)}}, \end{matrix}$

(11)

$\begin{matrix} {(σ_{j}^{2})}^{(l + 1)} & = & \frac{\sum_{i = 1}^{n} τ_{i j}^{(l)} {(X_{i} - μ_{j}^{(l + 1)})}^{2}}{\sum_{i = 1}^{n} τ_{i j}^{(l)}} . \end{matrix}$

(12)

We repeat these two steps until

||θ^{(l + 1)} - θ^{(l)}|| < ϵ,

where

ϵ

is a fixed threshold of convergence. The convergence properties of the EM algorithm have been investigated by [29] and by [30]. Relying upon Jensen’s inequality, it can be noticed that, as

φ (θ | θ^{(l)})

increases, the log-likelihood function also increases [29]. Consequently, the EM algorithm converges within a finite iteration number and gives the parameters’ maximum likelihood estimates. Therefore, under some conditions and according to [29], we have

\begin{matrix} lim_{l \to \infty} π_{j}^{(l)} = {\hat{π}}_{j}, lim_{l \to \infty} μ_{j}^{(l)} = {\hat{μ}}_{j} and lim_{l \to \infty} {(σ_{j}^{2})}^{(l)} = {\hat{σ^{2}}}_{j} almost surely (a - s) . \end{matrix}

(13)

In what follows,

\hat{θ} = ({\hat{π}}_{1}, \dots, {\hat{π}}_{K}, {\hat{μ}}_{1}, \dots, {\hat{μ}}_{K}, {\hat{σ}}_{1}, \dots, {\hat{σ}}_{K})

.

3. Proposed Approach

The proposed semi-parametric approach rests upon the shrinkage combination between the Gaussian mixture model and the Bernstein density estimators using the EM algorithm for the parameter estimations. The literature on shrinkage estimation is enormous. From this perspective, it is noteworthy to mention the most relevant contributions. The authors of [28] were the first to introduce the classic shrinkage estimator. The authors of [31] provided theory for the analysis of risk. Oman [32,33] developed estimators which shrink Gaussian density estimators towards linear subspaces. An in-depth investigation of shrinkage theory is displayed in Chapter 5 of [34].

The proposed semi-parametric approach based upon estimating the density function f relies on the same principle of Stein’s works and there are two aspects along this line. The first setting is non-parametric in the sense that we do not assume any parametric form of the density. The non-parametric setting is very important as it allows us to perform statistical inference without making any assumption on the parametric form of the true density f. The second setting is to consider the Gaussian mixture model as a parametric estimator of the unknown density f.

In what follows, we consider

X_{1}, \dots, X_{n}

a sequence of i.i.d. random variables having a common unknown density function f supported on

[0, 1]

. We here develop a shrinkage method to estimate the density function, which is divided into the following three steps:

Step 1: We consider the Bernstein estimator of the density function f, which is defined as

$\begin{matrix} {\tilde{f}}_{1, n, m} (x) & = & m \sum_{i = 0}^{m - 1} [F_{n} (\frac{i + 1}{m}) - F_{n} (\frac{i}{m})] b_{i} (m - 1, x) \end{matrix}$

(14)
Step 2: In view of (13), we consider the Gaussian mixture density as an estimator of the density function f, given by

$\begin{matrix} {\tilde{f}}_{2, n} (x) = \sum_{k = 1}^{K} {\hat{π}}_{k} N ({\hat{μ}}_{k}, {\hat{σ}}_{k}) (x), \end{matrix}$

(15)

where ${\hat{μ}}_{k}$ , ${\hat{σ}}_{k}$ and ${\hat{π}}_{k}$ are estimated by the EM algorithm defined in (13).
Step 3: We consider the shrinkage density estimator ${\hat{f}}_{n, m}$ form defined by

${\hat{f}}_{n, m} (x) = λ {\tilde{f}}_{1, n, m} (x) + (1 - λ) {\tilde{f}}_{2, n} (x),$

and we use the EM algorithm to estimate the parameter $λ \in [0, 1]$ of the proposed model.

By the same way as considered in Section 2, the two steps of the EM algorithm, after t iterations, are denoted in terms of the following:

1.: E-step: The conditional expectation of the complete-data log-likelihood given the observed data, using the current $λ^{(t)}$ , is provided by

$\begin{matrix} Q (λ | λ^{(t)}) & = & \sum_{i = 1}^{n} E_{λ^{(t)}} (W_{i 1} ∣ X_{i}) log {\tilde{f}}_{1, n, m} (X_{i}) + E_{λ^{(t)}} (W_{i 2} ∣ X_{i}) log {\tilde{f}}_{2, n} (X_{i}), \end{matrix}$

where $W_{i} = (W_{i 1}, W_{i 2})$ is a discrete random vector, following a multivariate Bernoulli distribution with vector parameters $(λ, 1 - λ)$ . Using Bayes’s formula, we obtain the posterior probability in the $t th$ iteration denoted by

$\begin{matrix} {\bar{τ}}_{i 1}^{(t)} & = & \frac{{\tilde{f}}_{1, n, m} (X_{i}) λ^{(t)}}{λ^{(t)} {\tilde{f}}_{1, n, m} (X_{i}) + (1 - λ^{(t)}) {\tilde{f}}_{2, n} (X_{i})}, \end{matrix}$

and

$\begin{matrix} {\bar{τ}}_{i 2}^{(t)} & = & \frac{{\tilde{f}}_{2, n} (X_{i}) λ^{(t)}}{λ^{(t)} {\tilde{f}}_{1, n, m} (X_{i}) + (1 - λ^{(t)}) {\tilde{f}}_{2, n} (X_{i})} = 1 - {\bar{τ}}_{i 1}^{(t)} . \end{matrix}$
2.: M-step: It consists of a global maximization of $Q (λ | λ^{(t)})$ with respect to $λ$ .

$λ^{(t + 1)} = arg max_{λ} Q (λ ∣ λ^{(t)}) .$

The updated estimate of $λ$ is indicated by

$λ^{(t + 1)} = \frac{1}{n} \sum_{i = 1}^{n} {\bar{τ}}_{i 1}^{(t)} .$

The estimation of

λ

is obtained from by iterating the EM algorithm until convergence.

\begin{matrix} lim_{t \to \infty} λ^{(t)} = \hat{λ} . \end{matrix}

(16)

Therefore, the proposed estimator of the density function f is defined by

\begin{matrix} {\hat{f}}_{n, m} (x) & = & \hat{λ} {\tilde{f}}_{1, n, m} (x) + (1 - \hat{λ}) {\tilde{f}}_{2, n} (x) . \end{matrix}

(17)

Basically, it is a shrinkage estimator that shrinks the Bernstein estimator towards the Gaussian mixture density by a specified amount of

λ

. If

λ = 1

, the estimator

{\hat{f}}_{n, m}

reduces to the Bernstein estimator

{\tilde{f}}_{1, n, n}

.

4. Convergence

In this section, we derive some asymptotic properties of the proposed estimator

{\hat{f}}_{n, m}

when the sample size tends to infinity. First, we assume that

λ

and K are fixed. The following proposition gives the probability convergence of the proposed estimator

{\hat{f}}_{n, m}

.

Proposition 1

(Probability convergence). If

m = o (n / log (n))

, then, for

x \in [0, 1]

, we have

{\hat{f}}_{n, m} (x) \underset{n, m \to + \infty}{\overset{P}{⟶}} λ f (x) + (1 - λ) f_{2} (x),

where

f_{2} (x) = \sum_{j = 1}^{K} π_{j} N (μ_{j}, σ_{j}^{2}) (x)

,

π_{j} = E (Z_{1 j}),

μ_{j} = E (X_{1} ∣ Z_{1 j} = 1),

σ_{j}^{2} = V a r (X_{1} ∣ Z_{1 j} = 1)

for

j = 1, \dots, K

and

\overset{P}{⟶}

denotes the convergence in probability.

The proof of Proposition 1 necessitates the following technical Lemma.

Lemma 1.

Let

{(S_{n})}_{n \geq 1}

be a sequence of i.i.d. random variables in the space of square integral functions

L^{2}

with a common mean μ and let

{(T_{n})}_{n \geq 1}

be a sequence of random variables. Hence,

E ({\bar{S}}_{n} | T_{n}) \underset{n \to + \infty}{\overset{L^{2}}{⟶}} μ, w h e r e {\bar{S}}_{n} = \frac{1}{n} \sum_{i = 1}^{n} S_{i},

where

L^{2}

denotes the mean quadratic convergence

L^{2}

.

The proof of this lemma is reported in [35].

Proof of Proposition 1.

First, using Lemma 1 and following the same steps as the proof of Theorem 4.4 in [35], we prove that

{\hat{π}}_{j} \underset{n \to + \infty}{\overset{P}{⟶}} π_{j},

lim_{n \to \infty} {\hat{μ}}_{j} \underset{n \to + \infty}{\overset{P}{⟶}} μ_{j}

and

{\hat{σ^{2}}}_{j} \underset{n \to + \infty}{\overset{P}{⟶}} σ_{j}^{2} .

Then, according to Slutsky’s Theorem, we obtain

\begin{matrix} \sum_{k = 1}^{K} {\hat{π}}_{k} N ({\hat{μ}}_{k}, {\hat{σ}}_{k}) (x) \underset{n \to + \infty}{\overset{P}{⟶}} \sum_{j = 1}^{K} π_{j} N (μ_{j}, σ_{j}^{2}) (x) . \end{matrix}

(18)

Second, based on Theorem 3.1 in [16], we obtain

\begin{matrix} {\tilde{f}}_{1, n, m} (x) \underset{n \to + \infty}{\overset{P}{⟶}} f (x) for x \in [0, 1] . \end{matrix}

(19)

In addition, referring to (18) and (19) and grounded on the application of Slutsky’s Theorem, we conclude the proof. □

According to [21], the density

f (x)

is a close approximation to the mixture density

f_{2} (x)

. Thus, the estimator

{\hat{f}}_{n, m} (x)

provides an approximation to the true density

f (x)

.

To study the asymptotic normality of the estimator

{\hat{f}}_{n, m}

given by (17), we set forward the following assumptions in [36].

(A1): For almost $x \in [0, 1]$ and for all $i, j, h = 1 \dots, ν$ , the partial derivatives $\partial g / \partial ξ_{i}$ , $\partial^{2} g / \partial ξ_{i} \partial ξ_{j}$ and $\partial^{3} g / \partial ξ_{i} \partial ξ_{j} \partial ξ_{h}$ of the density g exist and satisfy that $|\frac{\partial g (x | θ)}{\partial ξ_{i}}|,$ $|\frac{\partial^{2} g (x | θ)}{\partial ξ_{i} ξ_{j}}|$ and $|\frac{\partial^{3} g (x | θ)}{\partial ξ_{i} ξ_{j} ξ_{h}}|$ are bounded, respectively, by $J_{i}$ , $J_{i j}$ and $J_{i j h}$ , where $J_{i}$ and $J_{i j}$ are integrable and $J_{i j h}$ , satisfies

$\int_{0}^{1} J_{i j h} (x) g (x | \hat{θ}) d x < \infty .$
(A2): The Fisher information matrix $I (θ)$ is positively defined at $\hat{θ}$ .

Proposition 2

(Normality asymptotic). Under the regularity conditions(A1)–(A2), if

f (x) > 0

for all

x \in [0, 1]

,

2 \leq m \leq (n / log n)

and

lim_{n, m \to \infty} n^{2 / 3} / m = 0

, then, we obtain

n^{1 / 2} m^{- 1 / 4} [{\hat{f}}_{n, m} (x) - λ f (x) - (1 - λ) f_{2} (x)] \underset{n, m \to + \infty}{\overset{D}{⟶}} N (0, λ^{2} γ (x)),

where

γ (x) = f (x) {(4 π x (1 - x))}^{- 1 / 2},

for

x \in] 0, 1 [

, and

\overset{D}{⟶}

denotes the convergence in distribution.

Proof of Proposition 2.

Using Theorem 3.2 in [16], we obtain

n^{1 / 2} m^{- 1 / 4} ({\tilde{f}}_{1, n, m} (x) - f (x)) \underset{n, m \to + \infty}{\overset{D}{⟶}} N (0, γ (x)) .

Thus,

n^{1 / 2} m^{- 1 / 4} (λ {\tilde{f}}_{1, n, m} (x) - λ f (x)) \underset{n, m \to + \infty}{\overset{D}{⟶}} N (0, λ^{2} γ (x)) .

According to Theorem 3.1 in [36], we obtain

\sqrt{n} (\hat{θ} - θ) \underset{n \to + \infty}{\overset{D}{⟶}} N (0, I {(θ)}^{- 1}) .

Using the delta method, we obtain

\sqrt{n} ({\tilde{f}}_{2, n} (x | \hat{θ}) - f_{2} (x | θ)) \underset{n \to + \infty}{\overset{D}{⟶}} N (0, D f_{2} (x | θ) I {(θ)}^{- 1} D f_{2} {(x | θ)}^{T}),

where

D f_{2} (x | θ)

is the Jacobian matrix of

f_{2} (x | θ) = f_{2} (x)

and

{\tilde{f}}_{2, n} (x | \hat{θ}) = {\tilde{f}}_{2, n} (x)

. Since

m^{- 1 / 4} \to 0

if

m \to \infty

, then, using Slutsky’s Theorem, we conclude the proof. □

The following corollary is a consequence of the previous proposition which gives an asymptotic confidence interval of the density f, for a risk

α \in] 0, 1 [

.

Corollary 1.

The

100 (1 - α) %

asymptotic confidence interval of

f (x)

is given by

({\hat{f}}_{n, m} (x) \pm \frac{z_{1 - \frac{α}{2}} λ \sqrt{γ (x)}}{\sqrt{n} m^{- 1 / 4}}),

where

z_{1 - \frac{α}{2}}

is the normal

(1 - \frac{α}{2})

quantile.

In the next section, we study the performance of the proposed estimator in estimating different distributions by comparing it to the performances of the Bernstein estimator and of the Gaussian kernel estimator.

5. Numerical Studies

5.1. Comparison Study

In this section, we investigate the performance of the proposed estimator given in (17), through estimating different densities by comparing it to the performance of the Bernstein estimator defined in (2), the standard Gaussian kernel estimator defined in (1) and the Guan’s estimator defined in (3). We apply the Bernstein estimator when the sample is concentrated on the interval

[0, 1]

. For this purpose, we need to make some suitable transformations in the different cases that are listed as follows:

1.: Let us suppose that X is concentrated on a finite support $[a, b]$ ; then, we work with the sample values $Y_{1}, \dots, Y_{n}$ , where $Y_{i} = (X_{i} - a) / (b - a)$ .
2.: For the density functions concentrated on $R$ , we can use the transformed sample $Y_{i} = 1 / 2 + π^{- 1} arctan (X_{i})$ , which transforms the range to the interval $(0, 1)$ .
3.: For the support $R_{+}$ , we can use the transformed sample $Y_{i} = X_{i} / (1 + X_{i})$ , which transforms the range to the interval $(0, 1)$ .

If the support is infinite, say,

R = (- \infty, \infty)

, we can consider

[x_{1}, x_{t}] \subset [a, b]

as the finite support of f, where

x_{1}

and

x_{t}

are the minimum and the maximum order, respectively. We choose a and b such that

F (a)

and

1 - F (b)

are of

O (n^{- 1})

,

a < x_{1}

, and

b > x_{t}

, where F is the distribution function [27]. Then, we can use the transformed sample, which transforms

[x_{1}, x_{t}]

to the interval

[0, 1]

mentioned in the case 1.

In the simulation study, three sample sizes were considered,

n = 50

,

n = 100

, and

n = 200

, as well as the following density functions:

(a): The beta mixture density $0.5 B (3, 9) + 0.5 B (9, 3)$ ;
(b): The beta mixture density $0.5 B (3, 1) + 0.5 B (10, 10)$ ;
(c): The normal mixture density $1 / 4 N (2, 1) + 3 / 4 N (- 3, 1)$ ;
(d): The chi-squared $χ_{n} (2)$ density.
(e): The gamma mixture density $0.5 G (1, 6) + 0.5 G (6, 1)$ ;
(f): The gamma mixture density $0.5 G (1, 2) + 0.5 G (4, 2)$ .

Our sample was decomposed into a learning sample of a size of 2/3 of the considered sample, on which the various statistical methods were constructed, and a second sample of a size of 1/3 of the considered sample, on which the predictive performance of the three methods were tested. For each density function f and sample size n, we computed the integrated squared error (

I S E

), the integrated absolute error (

I A E

) and the Kullback–Leibler divergence (

K L

) of the estimator

{\hat{f}}_{n, m}

over

N = 500

trials.

\hat{I S E} = \frac{1}{N} \sum_{k = 1}^{N} I S E ({\hat{f}}_{k}), \hat{I A E} = \frac{1}{N} \sum_{k = 1}^{N} I A E ({\hat{f}}_{k}) and \hat{K L} = \frac{1}{N} \sum_{k = 1}^{N} K L ({\hat{f}}_{k}),

where

{\hat{f}}_{k}

is the estimator computed from the

k th

sample and

\begin{matrix} I S E [{\hat{f}}_{k}] & = & \int_{0}^{1} {({\hat{f}}_{k} (x) - f (x))}^{2} d x, I A E ({\hat{f}}_{k}) = \int_{0}^{1} | {\hat{f}}_{k} (x) - f (x) | d x, \\ K L ({\hat{f}}_{k} | f) & = & \int_{0}^{1} {\hat{f}}_{k} (x) log \frac{{\hat{f}}_{k} (x)}{f (x)} d x . \end{matrix}

Indeed, it is advised to consider a learning sample bigger than a testing sample. In this work, our sample was decomposed into a learning sample of a size of 2/3 of the considered sample, on which the various statistical methods were constructed, and a second sample of a size of 1/3 of the considered sample, on which the predictive performances of the three methods were tested. Each run of the proposed estimator performed the following steps:

-: We first generated a random sample ${(X_{i})}_{1 \leq i \leq n}$ of size n from the models’ density $(a) - (f)$ .
-: We then split the generated data into a training set of a size of $2 / 3$ of the considered sample and a test set of a size of $1 / 3$ of the considered sample.
-: We applied the proposed estimator, using the observed data $X_{i}$ only from the training set, in order to estimate the density function.
-: The test set was then used to compute the estimation errors $\hat{I S E}$ , $\hat{I A E}$ and $\hat{K L}$ .

To select the optimal parameter K, we used the Gap Statistics algorithm [37]. We considered a Monte Carlo experiment to select the optimal choice of the degree m of the Bernstein polynomial and the bandwidth h of the kernel estimator, for each point

x \in [0, 1]

. We determined the parameters m (for

1 \leq m \leq 300

) and h (for

h = i / 1000

with

1 \leq i \leq 300

), which minimized the

I S E

, which was approximated by the

\hat{I S E}

.

We considered

N = 500

random samples of sizes

n = 50

,

n = 100

and

n = 200

.

Departing from Table 1, Table 2 and Table 3 and Figure 1, we deduce the following:

-: The results displayed in Table 1, Table 2 and Table 3 show that the $\hat{I S E}$ , $\hat{I A E}$ and $\hat{K L}$ decreased as the sample size increased.
-: Using the proposed estimator, we obtained better results than those given by the other estimators in a large part of the cases.
-: The Figure 2 and 3 give a better sense of where the error is located.
-: For the case (e) of the gamma mixture, the average $\hat{I S E}$ and $\hat{I A E}$ of Guan’s estimator (1.3) were smaller than those obtained by the proposed density estimator (3.4) and the Bernstein estimator (1.2). However, in all the other cases, using an appropriate choice of the degree m, the average $\hat{I S E}$ and $\hat{I A E}$ of the proposed density estimator (3.4) were smaller than what achieved by the kernel estimator (1.1), the Bernstein estimator (1.2) and Guan’s estimator (1.3), even when the sample size was large for same cases.
-: When we changed the parameters of the gamma mixture density in the sense that we had a smaller bias, our estimator was more competitive than the other approaches and we obtained better results.
-: Almost in all considered cases, the average $\hat{K L}$ of the density estimator (17) was smaller than that obtained by the Bernstein estimator defined in (2), that of the kernel estimator defined in (1) and that of Guan’s estimator.
-: In the considered distribution $0.5 B (3, 9) + 0.5 B (9, 3)$ , by choosing the appropriate m, the curve of the proposed distribution estimator (3.4) was closer to the true distribution than that of Guan’s estimator (1.3), even when the sample size was very large.

Referring to Figure 2 and Figure 3, we infer the following:

-: None of the estimators for the gamma mixture density $0.5 G (6, 1) + 0.5 G (1, 6)$ had good approximations near $x = 0$ . However, the $\hat{I S E}$ of the proposed estimator was closer to zero than that of the Bernstein estimator and the kernel estimator, especially near the edge $x = 1$ .
-: Guan’s estimator and the kernel estimator for the normal mixture density $0.25 N (2, 1) + 0.75 N (- 3, 1)$ had good approximations near $x = 0$ . However, the $\hat{I S E}$ of the proposed estimator was closer to zero than that of the other estimators, especially near the two edges.

Therefore, we note that, for difficult distributions that diverge at the boundaries, the proposed method would fail, but not as badly as the standard methods without shrinkage. In addition, the performed simulations revealed that, on average, the proposed approach could lead to satisfactory estimates near the boundaries, better than the classical Bernstein estimator.

5.2. Real Dataset

5.2.1. COVID-19 Data

In this subsection, we consider the COVID-19 data displayed in the INED website https://dc-covid.site.ined.fr/fr/donnees/france/ (accessed on 16 February 2022). These data concern the numbers of deaths due to COVID-19 in France (daily) from 21 March 2021, for 454 days. These data are such that

{min}_{i} (x_{i}) = 605

and

{max}_{i} (x_{i}) = 0

. Then, it is convenient to assume that the density of the numbers of deaths is defined on the interval

[0, 605]

and transform the data into the interval unit. The Monte Carlo procedure was performed and resulted in

h = 0.07659612

for the standard kernel estimator defined in (1.1),

m_{1} = 20

for the Bernstein estimator defined in (1.2), the proposed estimator, and

m_{2} = 12

for Guan’s estimator. These estimators are exhibited in Figure 1 (right panel) along with a histogram of the data. All the estimators are smooth and seem to capture the pattern highlighted by the histogram. We record that the proposed estimator outperformed the other estimators near the boundaries.

5.2.2. Tuna Data

The last example concerns the tuna data reported in [38]. The data are derived from an aerial line transect survey of Southern Bluefin Tuna in the Great Australian Bight. An aircraft with two spotters on board flew randomly over allocated line transects. These data correspond to the perpendicular sighting distances (in miles) of 64 detected tuna schools to the transect lines. The survey was conducted in summer when tuna data tend to stay on the surface. The data are such that

{min}_{i} (x_{i}) = 0.19

and

{max}_{i} (x_{i}) = 16.26

. The Monte Carlo procedure was performed and resulted in

h = 0.1079

for the standard kernel estimator defined in (1),

m_{1} = 13

for the Bernstein estimator defined in (2) and the proposed estimator, and

m_{2} = 6

for Guan’s estimator. These estimators are illustrated in Figure 4 (left panel) along with a histogram of the data. All the estimators are smooth and seem to capture the pattern highlighted by the histogram. We assert that the proposed estimator outperformed the other estimator, especially near the boundaries.

6. Conclusions

In this paper, we propose a shrinkage estimator of a density function based on the Bernstein density estimator and using a finite Gaussian mixture density. This method rests on three steps. The first step consists of considering the Bernstein estimator

{\tilde{f}}_{1, n, m}

. The second relies upon the Gaussian Mixture density

{\tilde{f}}_{2, n}

as an estimator of the unknown density f. The last step consists of considering the shrinkage form

λ {\tilde{f}}_{1, n, m} + (1 - λ) {\tilde{f}}_{2, n}

and EM algorithm in order to estimate the parameter

λ

. The asymptotic properties of this estimator were established. Afterwards, we demonstrate the effectiveness of the proposed method using some simulated and real data. We clarify how it can lead to very satisfactory estimates near the boundaries and in terms of

I S E

,

I A E

and

K L

. Eventually, we would simply assert that our research work is a step that may be taken further, extended and built upon as it lays the ground and paves the way for future works to elaborate a semi-parametric regression estimator using the shrinkage method. We also plan to work on the case where

λ

is a random variable. Another future research direction would be to extend our findings to the setting of serially dependent observations.

Author Contributions

Conceptualization, A.M. and Y.S.; Data curation, S.H.; Investigation, S.H. and Y.S.; Methodology, S.H., A.M. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research study received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Rosenblatt, M. Remarks on Some Nonparametric Estimates of a Density Function. Ann. Math. Stat. 1956, 27, 832–837. [Google Scholar] [CrossRef]
Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. 1962, 33, 1065–1076. [Google Scholar] [CrossRef]
Härdle, W. Smoothing Techniques with Implementation in S; S. Springer Science and Business Media: Berlin, Germany, 1991. [Google Scholar]
Schuster, E.F. Incorporating support constraints into nonparametric estimators of densities. Comm. Stat. Theory Methods 1985, 14, 1123–1136. [Google Scholar] [CrossRef]
Müller, H.-G. Smooth optimum kernel estimators near endpoints. Biometrika 1991, 78, 521–530. [Google Scholar] [CrossRef]
Müller, H.-G. On the boundary kernel method for nonparametric curve estimation near endpoints. Scand. J. Statist. 1993, 20, 313–328. [Google Scholar]
Müller, H.-G.; Wang, J.-L. Hazard rate estimation under random censoring with varying kernels and bandwidths. Biometrika 1994, 50, 61–76. [Google Scholar] [CrossRef]
Lejeune, M.; Sarda, P. Smooth estimators of distribution and density functions. Comput. Stat. Data Anal. 1992, 14, 457–471. [Google Scholar] [CrossRef]
Jones, M.C. Simple boundary correction for density estimation kernel. Stat. Comput. 1993, 13, 135–146. [Google Scholar] [CrossRef]
Chen, S.X. Beta kernel estimators for density functions. Comput. Stat. Data Anal. 1999, 31, 131–145. [Google Scholar] [CrossRef]
Chen, S.X. Probability density function estimation using gamma kernels. Ann. Inst. Stat. Math. 2000, 52, 471–480. [Google Scholar] [CrossRef]
Leblanc, A. A bias-reduced approach to density estimation using Bernstein polynomials. J. Nonparametr. Stat. 2010, 22, 459–475. [Google Scholar] [CrossRef]
Slaoui, Y. Bias reduction in kernel density estimation. J. Nonparametr. Stat. 2018, 30, 505–522. [Google Scholar] [CrossRef]
Vitale, R.A. A bernstein polynomial approach to density function estimation. Stat. Inference Relat. Topics 1975, 2, 87–99. [Google Scholar]
Ghosal, S. Convergence rates for density estimation with Bernstein polynomials. Ann. Stat. 2000, 29, 1264–1280. [Google Scholar] [CrossRef]
Babu, G.J.; Canty, A.J.; Chaubey, Y.P. Application of Bernstein polynomials for smooth estimation of a distribution and density function. J. Stat. Plan. Inference 2002, 105, 377–392. [Google Scholar] [CrossRef]
Kakizawa, Y. Bernstein polynomial probability density estimation. J. Nonparametr. Stat. 2004, 16, 709–729. [Google Scholar] [CrossRef]
Rao, B.L.S.P. Estimation of distribution and density functions by generalized Bernstein polynomials. Indian J. Pure Appl. Math. 2005, 36, 63–88. [Google Scholar]
Igarashi, G.; Kakizawa, Y. On improving convergence rate of Bernstein polynomial density estimator. J. Nonparametr. Stat. 2014, 26, 61–84. [Google Scholar] [CrossRef]
Slaoui, Y.; Jmaei, A. Recursive density estimators based on Robbins-Monro’s scheme and using Bernstein polynomials. Stat. Interface 2019, 12, 439–455. [Google Scholar] [CrossRef] [Green Version]
Li, J.Q.; Barron, A.R. Mixture density estimation. Adv. Neural Inf. Process. Syst. 2000, 12, 279–285. [Google Scholar]
Pearson, K. Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1984, 185, 71–110. [Google Scholar]
McLachlan, G.; Peel, D. Finite Mixture Models; John Wiley and Sons: New York, NY, USA, 2004. [Google Scholar]
Roeder, K.; Wasserman, L. Practical Bayesian density estimation using mixtures of normals. J. Am. Stat. Assoc. 1997, 92, 894–902. [Google Scholar] [CrossRef]
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
Leroux, B. Consistent estimation of a mixing distribution. Ann. Stat. 1992, 20, 1350–1360. [Google Scholar] [CrossRef]
Guan, Z. Efficient and robust density estimation using Bernstein type polynomials. J. Nonparametr. Stat. 2016, 28, 250–271. [Google Scholar] [CrossRef] [Green Version]
James, W.; Stein, C. Estimation with quadratic loss. In Breakthroughs in Statistics; Springer: New York, NY, USA, 1992; pp. 443–460. [Google Scholar]
Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 1997, 39, 1–22. [Google Scholar]
Wu, C.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103. [Google Scholar] [CrossRef]
Stein, C. Estimation of the mean of a multivariate normal distribution. Ann. Stat. 1981, 9, 1135–1151. [Google Scholar] [CrossRef]
Oman, S.D. Contracting towards subspaces when estimating the mean of a multivariate normal distribution. J. Multivar. Anal. 1982, 12, 270–290. [Google Scholar] [CrossRef] [Green Version]
Oman, S.D. Shrinking towards subspaces in multiple linear regression. Technometrics 1982, 24, 307–311. [Google Scholar] [CrossRef]
Lehmann, E.L.; Casella, G. Theory of Point Estimation; Springer Science and Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Zitouni, M.; Zribi, M.; Masmoudi, A. Asymptotic properties of the estimator for a finite mixture of exponential dispersion models. Filomat 2018, 32, 6575–6598. [Google Scholar] [CrossRef] [Green Version]
Redner, R.A.; Walker, H.F. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 1984, 26, 195–239. [Google Scholar] [CrossRef]
Tibshirani, R.; Walther, G.; Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. 2001, 63, 411–423. [Google Scholar] [CrossRef]
Chen, S.X. Empirical likelihood confidence intervals for nonparametric density estimation. Biometrika 1996, 83, 329–341. [Google Scholar] [CrossRef]

Figure 1. Quantitative comparison between the proposed estimator and Guan’s estimator of

0.5 B (3, 9) + 0.5 B (9, 3)

for

n = 50

(left) and

n = 100

(right).

Figure 1. Quantitative comparison between the proposed estimator and Guan’s estimator of

0.5 B (3, 9) + 0.5 B (9, 3)

for

n = 50

(left) and

n = 100

(right).

Figure 2. Quantitative comparison among the mean squared error of the kernel estimator, the Bernstein estimator, the Guan’s estimator and the proposed estimator of

0.5 G (1, 6) + 0.5 G (6, 1)

for

n = 200

.

Figure 2. Quantitative comparison among the mean squared error of the kernel estimator, the Bernstein estimator, the Guan’s estimator and the proposed estimator of

0.5 G (1, 6) + 0.5 G (6, 1)

for

n = 200

.

Figure 3. Quantitative comparison among the mean squared error of the kernel estimator, the Bernstein estimator, the Guan’s estimator and the proposed estimator of

0.25 N (2, 1) + 0.75 N (- 3, 1)

for

n = 200

.

Figure 3. Quantitative comparison among the mean squared error of the kernel estimator, the Bernstein estimator, the Guan’s estimator and the proposed estimator of

0.25 N (2, 1) + 0.75 N (- 3, 1)

for

n = 200

.

Figure 4. Qualitative comparison among the kernel estimator defined in (1), the Bernstein estimator defined in (2), Guan’s estimator (3) and the proposed density estimator (17) of Tuna data (left) and of COVID-19 data (right).

Table 1. Average

\hat{I S E}

for

N = 500

trials of Bernstein estimator, standard Gaussian kernel estimator and the proposed estimator

{\hat{f}}_{n, m}

, for

n = 50

,

n = 100

and

n = 200

. The bold values indicate the smallest values of

I S E

.

Table 1. Average

\hat{I S E}

for

N = 500

trials of Bernstein estimator, standard Gaussian kernel estimator and the proposed estimator

{\hat{f}}_{n, m}

, for

n = 50

,

n = 100

and

n = 200

. The bold values indicate the smallest values of

I S E

.

Density	n	Proposed	Bernstein	Kernel	Guan’s
Density	n	Estimator	Estimator	Estimator	Estimator
	50	$0.092242$	$0.096684$	$0.197497$	$0.140323$
$(a)$	100	$0.091364$	$0.092242$	$0.174251$	$0.089129$
	200	$0.075532$	$0.079299$	$0.148143$	$0.086305$
	50	$1.157827$	$1.215347$	$0.530446$	$0.816906$
$(b)$	100	$0.235402$	$0.306704$	$0.482152$	$0.276573$
	200	$0.199786$	$0.289870$	$0.474805$	$0.255716$
	50	$0.001423$	$1.808252$	$2.222369$	$1.035522$
$(c)$	100	$0.000384$	$1.410606$	$1.641689$	$1.014602$
	200	$0.000227$	$1.348292$	$1.077352$	$0.994346$
	50	$0.525192$	$2.812448$	$4.936701$	$0.589465$
$(d)$	100	$0.492752$	$2.483141$	$2.331765$	$0.579595$
	200	$0.162917$	$0.898103$	$1.154646$	$0.507362$
	50	$2.180849$	$2.231986$	$2.424656$	$1.084340$
$(e)$	100	$2.050098$	$2.133496$	$2.295932$	$0.835670$
	200	$2.042379$	$2.086204$	$2.053453$	$0.717715$
	50	$0.313388$	$0.896995$	$1.397111$	$0.663889$
$(f)$	100	$0.253988$	$0.656400$	$0.762742$	$0.516530$
	200	$0.186290$	$0.577408$	$0.417980$	$0.472094$

Table 2. Average

\hat{I A E}

for

N = 500

trials of Bernstein estimator, standard Gaussian kernel estimator and the proposed estimator

{\hat{f}}_{n, m}

, for

n = 50

,

n = 100

and

n = 200

. The bold values indicate the smallest values of

I A E

.

Table 2. Average

\hat{I A E}

for

N = 500

trials of Bernstein estimator, standard Gaussian kernel estimator and the proposed estimator

{\hat{f}}_{n, m}

, for

n = 50

,

n = 100

and

n = 200

. The bold values indicate the smallest values of

I A E

.

Density	n	Proposed	Bernstein	Kernel	Guan’s
Density	n	Estimator	Estimator	Estimator	Estimator
	50	$0.250241$	$0.250241$	$0.391072$	$0.251641$
$(a)$	100	$0.196423$	$0.207109$	$0.367361$	$0.232399$
	200	$0.180536$	$0.191673$	$0.348499$	$0.214117$
	50	$0.855008$	$0.823416$	$0.621137$	$0.747562$
$(b)$	100	$0.417735$	$0.457722$	$0.669027$	$0.438088$
	200	$0.386280$	$0.455983$	$0.583057$	$0.423595$
	50	$0.035161$	$0.971720$	$1.238839$	$0.948451$
$(c)$	100	$0.019331$	$0.960044$	$1.157838$	$0.944000$
	200	$0.013233$	$0.923259$	$0.953675$	$0.931293$
	50	$0.669269$	$0.807667$	$1.942776$	$0.648617$
$(d)$	100	$0.657815$	$1.486974$	$1.415036$	$0.642986$
	200	$0.351205$	$1.573886$	$0.881866$	$0.637942$
	50	$0.830317$	$0.834203$	$1.499828$	$0.678362$
$(e)$	100	$0.706770$	$0.713225$	$1.374609$	$0.645112$
	200	$0.681767$	$0.692357$	$1.225502$	$0.620315$
	50	$0.453948$	$0.640596$	$1.068483$	$0.608421$
$(f)$	100	$0.414676$	$0.714844$	$0.782939$	$0.584572$
	200	$0.383248$	$0.883320$	$0.516527$	$0.584320$

Table 3. Average

\hat{K L}

for

N = 500

trials of Bernstein estimator, standard Gaussian kernel estimator and the proposed estimator

{\hat{f}}_{n, m}

, for

n = 50

,

n = 100

and

n = 200

. The bold values indicate the smallest values of

K L

.

Table 3. Average

\hat{K L}

for

N = 500

trials of Bernstein estimator, standard Gaussian kernel estimator and the proposed estimator

{\hat{f}}_{n, m}

, for

n = 50

,

n = 100

and

n = 200

. The bold values indicate the smallest values of

K L

.

Density	n	Proposed	Bernstein	Kernel	Guan’s
Density	n	Estimator	Estimator	Estimator	Estimator
	50	$0.025048$	$0.025048$	$0.289818$	$0.066817$
$(a)$	100	$0.012086$	$0.015023$	$0.081830$	$0.058541$
	200	$0.003256$	$0.001788$	$0.060468$	$0.029419$
	50	$1.088053$	$1.120246$	$0.575298$	$0.667079$
$(b)$	100	$0.284795$	$0.325933$	$0.381406$	$0.659105$
	200	$0.255697$	$0.310759$	$0.150096$	$0.654338$
	50	$2.689781$	$3.871172$	$4.732324$	$3.360316$
$(c)$	100	$0.011844$	$3.591156$	$4.196295$	$3.356093$
	200	$0.009426$	$3.505050$	$3.169852$	$3.318666$
	50	$0.976450$	$0.870222$	$4.702359$	$0.878995$
$(d)$	100	$0.960633$	$1.572251$	$3.031783$	$0.763044$
	200	$0.281862$	$1.584022$	$1.537355$	$0.742696$
	50	$1.549035$	$1.560172$	$5.538142$	$1.799133$
$(e)$	100	$1.207084$	$1.217043$	$3.498273$	$1.557894$
	200	$1.153420$	$1.169645$	$1.322299$	$1.387843$
	50	$0.528337$	$1.052789$	$1.952294$	$0.651422$
$(f)$	100	$0.292017$	$0.805962$	$1.191070$	$0.537625$
	200	$0.062893$	$0.589790$	$0.679154$	$0.339442$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Helali, S.; Masmoudi, A.; Slaoui, Y. Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model. Entropy 2022, 24, 315. https://doi.org/10.3390/e24030315

AMA Style

Helali S, Masmoudi A, Slaoui Y. Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model. Entropy. 2022; 24(3):315. https://doi.org/10.3390/e24030315

Chicago/Turabian Style

Helali, Salima, Afif Masmoudi, and Yousri Slaoui. 2022. "Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model" Entropy 24, no. 3: 315. https://doi.org/10.3390/e24030315

APA Style

Helali, S., Masmoudi, A., & Slaoui, Y. (2022). Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model. Entropy, 24(3), 315. https://doi.org/10.3390/e24030315

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Semi-Parametric Estimation Using Bernstein Polynomial and a Finite Gaussian Mixture Model

Abstract

1. Introduction

2. Background

The Gaussian Mixture Model and Em Algorithm

3. Proposed Approach

4. Convergence

5. Numerical Studies

5.1. Comparison Study

5.2. Real Dataset

5.2.1. COVID-19 Data

5.2.2. Tuna Data

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI