A Proximal Point Algorithm for Minimum Divergence Estimators with Application to Mixture Models

## Abstract

## 1. Introduction

## 2. A Description of the Algorithm

#### 2.1. General Context and Notations

#### 2.2. EM Algorithm and Tseng’s Generalization

#### 2.3. Generalization of Tseng’s Algorithm

## 3. Some Convergence Properties of ${\varphi}^{k}$

**Definition 1.**

**Remark 1.**

- A0. Functions $\varphi \mapsto {\widehat{D}}_{\phi}\left({p}_{\varphi}\right|{p}_{{\varphi}_{T}}),{D}_{\psi}$ are lower semicontinuous;
- A1. Functions $\varphi \mapsto {\widehat{D}}_{\phi}\left({p}_{\varphi}\right|{p}_{{\varphi}_{T}}),{D}_{\psi}$ and ${\nabla}_{1}{D}_{\psi}$ are defined and continuous on, respectively, $\Phi ,\Phi \times \Phi $ and $\Phi \times \Phi $;
- AC. Function $\varphi \mapsto \nabla {\widehat{D}}_{\phi}\left({p}_{\varphi}\right|{p}_{{\varphi}_{T}})$ is defined and continuous on $\Phi$;
- A2. ${\Phi}^{0}$ is a compact subset of int$(\Phi )$;
- A3. ${D}_{\psi}(\varphi ,\overline{\varphi})>0$ for all $\overline{\varphi}\ne \varphi \in \Phi $.

**Proposition 1.**

**Proof.**

**Proposition 2.**

- (a) If AC holds, then any limit point of ${\left({\varphi}^{k}\right)}_{k}$ is a stationary point of $\varphi \mapsto {\widehat{D}}_{\phi}\left({p}_{\varphi}\right|{p}_{{\varphi}^{T}})$;
- (b) If AC is dropped, then any limit point of ${\left({\varphi}^{k}\right)}_{k}$ is a “generalized” stationary point of $\varphi \mapsto {\widehat{D}}_{\phi}\left({p}_{\varphi}\right|{p}_{{\varphi}^{T}})$, i.e., zero belongs to the subgradient of $\varphi \mapsto {\widehat{D}}_{\phi}\left({p}_{\varphi}\right|{p}_{{\varphi}^{T}})$ calculated at the limit point.
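
The stationarity claim in Proposition 2 can be illustrated on a toy problem. The following is a minimal Python sketch of a generic proximal-point iteration, not the algorithm of this paper: the objective `double_well` is a hypothetical stand-in for the estimated divergence, the squared Euclidean distance stands in for $D_{\psi}$, and the inner minimization uses a golden-section search, which assumes the penalized objective is unimodal on the search interval (true for this toy function). All function names are illustrative.

```python
import math

def golden_section_min(g, lo, hi, tol=1e-10):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) < g(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2

def double_well(x):
    """Toy objective with stationary points at -1, 0 and 1 (assumption, not the paper's divergence)."""
    return x ** 4 / 4 - x ** 2 / 2

def proximal_point(f, x0, n_iter=30, lo=-3.0, hi=3.0):
    """Iterate x_{k+1} = argmin_x f(x) + 0.5 * (x - x_k)^2 and return all iterates."""
    xs = [x0]
    for _ in range(n_iter):
        xk = xs[-1]
        # Proximal term: squared distance to the current iterate (stand-in for D_psi).
        penalized = lambda x, xk=xk: f(x) + 0.5 * (x - xk) ** 2
        xs.append(golden_section_min(penalized, lo, hi))
    return xs

iterates = proximal_point(double_well, x0=2.0)
values = [double_well(x) for x in iterates]
print(iterates[-1], values[-1])
```

In this sketch the objective values never increase along the iterates, and the last iterate sits at a point where the derivative $x^{3}-x$ of the toy objective vanishes, which is the behavior item (a) describes for a differentiable objective.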

**Proof.**

**Proposition 3.**

**Proof.**

**Corollary 1.**

**Proof.**

**Proposition 4.**

**Proof.**

## 4. Case Studies

#### 4.1. An Algorithm With Theoretically Global Infimum Attainment

#### 4.2. The Two-Component Gaussian Mixture

**Conclusion 1.**

**Conclusion 2.**

**Conclusion 3.**

**Remark 2.**

## 5. Simulation Study

**Remark 3.** Numerical integration was carried out with the `distrExIntegrate` function of package `distrEx`. It is a slight modification of the standard function `integrate`: it performs a Gauss–Legendre quadrature when `integrate` returns an error. In the Weibull mixture, we used the `integral` function from package `pracma`. Function `integral` includes a variety of numerical integration methods, most of them adaptive: Kronrod–Gauss quadrature, Romberg’s method, Gauss–Richardson quadrature, Clenshaw–Curtis (not adaptive) and (adaptive) Simpson’s method. Although `integral` is slow, it performs better than the other functions, even when the integrand is relatively badly behaved.
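
To make the idea of adaptive quadrature concrete, here is a minimal Python sketch of adaptive Simpson integration, one of the method families named in this remark. It is written in Python for illustration only and is independent of the R functions `integrate`, `distrExIntegrate` and `integral`; it does not reproduce their internals. The unit component variances of the Gaussian mixture are an assumption, and the helper names are hypothetical.

```python
import math

def _simpson(f, a, fa, b, fb):
    """Simpson estimate on [a, b]; also returns the midpoint and its function value."""
    m = (a + b) / 2
    fm = f(m)
    return m, fm, (b - a) / 6 * (fa + 4 * fm + fb)

def _refine(f, a, fa, b, fb, whole, m, fm, tol, depth):
    """Split [a, b] at m and recurse until the two-half estimate agrees with the whole."""
    lm, flm, left = _simpson(f, a, fa, m, fm)
    rm, frm, right = _simpson(f, m, fm, b, fb)
    delta = left + right - whole
    if depth <= 0 or abs(delta) <= 15 * tol:
        return left + right + delta / 15  # Richardson correction
    return (_refine(f, a, fa, m, fm, left, lm, flm, tol / 2, depth - 1)
            + _refine(f, m, fm, b, fb, right, rm, frm, tol / 2, depth - 1))

def adaptive_simpson(f, a, b, tol=1e-9, max_depth=40):
    """Integrate f over [a, b], refining only where the local error estimate is large."""
    fa, fb = f(a), f(b)
    m, fm, whole = _simpson(f, a, fa, b, fb)
    return _refine(f, a, fa, b, fb, whole, m, fm, tol, max_depth)

def gaussian_mixture_pdf(x, lam=0.35, mu1=-2.0, mu2=1.5):
    """Two-component Gaussian mixture; unit variances are an assumption."""
    norm = math.sqrt(2 * math.pi)
    return (lam * math.exp(-0.5 * (x - mu1) ** 2)
            + (1 - lam) * math.exp(-0.5 * (x - mu2) ** 2)) / norm

total_mass = adaptive_simpson(gaussian_mixture_pdf, -12.0, 12.0)
print(total_mass)
```

The recursion spends most of its function evaluations near the two modes and very few in the flat tails, which is why adaptive rules tend to cope better with poorly behaved integrands than fixed-grid rules, at the cost of more bookkeeping.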

#### 5.1. The Two-Component Gaussian Mixture Revisited

#### 5.2. The Two-Component Weibull Mixture Model

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

**Figure 1.** Decrease of the (estimated) Hellinger divergence between the true density and the estimated model at each iteration in the Gaussian mixture. The left panel shows the values of the kernel-based dual Formula (3). The right panel shows the values of the classical dual Formula (2). Values are plotted on the logarithmic scale $\log(1+x)$.

**Table 1.** The mean and the standard deviation of the estimates and the errors committed in a 100-run experiment of a two-component Gaussian mixture. The true set of parameters is $\lambda =0.35$, ${\mu}_{1}=-2$, ${\mu}_{2}=1.5$.

Estimation Method | λ | sd (λ) | ${\mu}_{1}$ | sd (${\mu}_{1}$) | ${\mu}_{2}$ | sd (${\mu}_{2}$) | TVD | sd (TVD) |
---|---|---|---|---|---|---|---|---|
**Without Outliers** | | | | | | | | |
Classical MD$\phi $DE | 0.349 | 0.049 | -1.989 | 0.207 | 1.511 | 0.151 | 0.061 | 0.029 |
New MD$\phi $DE–Silverman | 0.349 | 0.049 | -1.987 | 0.208 | 1.520 | 0.155 | 0.062 | 0.029 |
MDPD $a=0.5$ | 0.360 | 0.053 | -1.997 | 0.226 | 1.489 | 0.135 | 0.065 | 0.025 |
EM (MLE) | 0.360 | 0.054 | -1.989 | 0.204 | 1.493 | 0.136 | 0.064 | 0.025 |
**With $10\%$ Outliers** | | | | | | | | |
Classical MD$\phi $DE | 0.357 | 0.022 | -2.629 | 0.094 | 1.734 | 0.111 | 0.146 | 0.034 |
New MD$\phi $DE–Silverman | 0.352 | 0.057 | -1.756 | 0.224 | 1.358 | 0.132 | 0.087 | 0.033 |
MDPD $a=0.5$ | 0.364 | 0.056 | -1.819 | 0.218 | 1.404 | 0.132 | 0.078 | 0.030 |
EM (MLE) | 0.342 | 0.064 | -2.617 | 0.288 | 1.713 | 0.172 | 0.150 | 0.034 |
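
The TVD columns report an error between the true and the estimated densities. As a rough illustration, the Python sketch below computes a total variation distance between two Gaussian mixtures, under several assumptions that are not stated in this section: TVD is taken as half the $L^{1}$ distance between densities, both components have unit variance, λ is the weight of the first component, and the "estimated" parameters are hypothetical values chosen for the example rather than entries of the table. The integral is approximated by a composite Simpson rule on a fixed grid.

```python
import math

def normal_pdf(x, mean):
    """Density of a normal law with the given mean and unit variance (assumption)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def mixture_pdf(x, lam, mu1, mu2):
    """Two-component mixture; lam is taken as the weight of the first component."""
    return lam * normal_pdf(x, mu1) + (1 - lam) * normal_pdf(x, mu2)

def total_variation(p, q, lo=-15.0, hi=15.0, n=6000):
    """Approximate 0.5 * integral of |p - q| on [lo, hi] by composite Simpson (n even)."""
    h = (hi - lo) / n
    total = abs(p(lo) - q(lo)) + abs(p(hi) - q(hi))
    for i in range(1, n):
        x = lo + i * h
        weight = 4 if i % 2 else 2  # odd interior nodes get 4, even ones get 2
        total += weight * abs(p(x) - q(x))
    return 0.5 * total * h / 3

# True parameters from the caption of Table 1.
true_density = lambda x: mixture_pdf(x, 0.35, -2.0, 1.5)
# Hypothetical estimate, chosen only to illustrate the computation.
shifted_density = lambda x: mixture_pdf(x, 0.35, -2.6, 1.7)

tvd = total_variation(true_density, shifted_density)
print(tvd)
```

Note that the TVD evaluated at averaged parameter estimates is generally not the average TVD over runs, so a value computed this way should not be compared directly with the table entries.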

**Table 2.** The mean and the standard deviation of the estimates and the errors committed in a 100-run experiment of a two-component Weibull mixture. The true set of parameters is $\lambda =0.35$, ${\nu}_{1}=1.2$, ${\nu}_{2}=2$.

Estimation Method | λ | sd (λ) | ${\nu}_{1}$ | sd (${\nu}_{1}$) | ${\nu}_{2}$ | sd (${\nu}_{2}$) | TVD | sd (TVD) |
---|---|---|---|---|---|---|---|---|
**Without Outliers** | | | | | | | | |
Classical MD$\phi $DE | 0.356 | 0.066 | 1.245 | 0.228 | 2.055 | 0.237 | 0.052 | 0.025 |
New MD$\phi $DE–Silverman | 0.387 | 0.067 | 1.229 | 0.241 | 2.145 | 0.289 | 0.058 | 0.029 |
MDPD $a=0.5$ | 0.354 | 0.068 | 1.238 | 0.230 | 2.071 | 0.345 | 0.056 | 0.029 |
EM (MLE) | 0.355 | 0.066 | 1.245 | 0.228 | 2.054 | 0.237 | 0.052 | 0.025 |
**With $10\%$ Outliers** | | | | | | | | |
Classical MD$\phi $DE | 0.250 | 0.085 | 1.089 | 0.300 | 1.470 | 0.335 | 0.092 | 0.037 |
New MD$\phi $DE–Silverman | 0.349 | 0.076 | 1.122 | 0.252 | 1.824 | 0.324 | 0.067 | 0.034 |
MDPD $a=0.5$ | 0.322 | 0.077 | 1.158 | 0.236 | 1.858 | 0.344 | 0.060 | 0.029 |
EM (MLE) | 0.259 | 0.095 | 0.941 | 0.368 | 1.565 | 0.325 | 0.095 | 0.035 |
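
For readers who want to reproduce the flavor of this experiment, the Python sketch below draws a sample from a two-component Weibull mixture by inverting the Weibull distribution function. Only λ and the two shape parameters come from the caption of Table 2; the unit scale parameters, the sample size, the seed, the convention that λ weights the first component, and the function names are all assumptions made for the example. Outlier contamination is not simulated, since its mechanism is not described in this section.

```python
import math
import random

def weibull_inverse_cdf(u, shape, scale=1.0):
    """Weibull quantile function: scale * (-log(1 - u)) ** (1 / shape), for u in [0, 1)."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

def sample_weibull_mixture(n, lam, shape1, shape2, rng, scale1=1.0, scale2=1.0):
    """Draw n points: component 1 with probability lam, component 2 otherwise."""
    sample = []
    for _ in range(n):
        if rng.random() < lam:
            sample.append(weibull_inverse_cdf(rng.random(), shape1, scale1))
        else:
            sample.append(weibull_inverse_cdf(rng.random(), shape2, scale2))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = sample_weibull_mixture(20000, lam=0.35, shape1=1.2, shape2=2.0, rng=rng)

sample_mean = sum(data) / len(data)
# Mean of a unit-scale Weibull law with shape k is Gamma(1 + 1/k).
theoretical_mean = 0.35 * math.gamma(1 + 1 / 1.2) + 0.65 * math.gamma(1 + 1 / 2.0)
print(sample_mean, theoretical_mean)
```

Comparing the sample mean with the closed-form mixture mean is a cheap sanity check on the sampler before any estimator is run on the simulated data.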

