# Improved Initialization of the EM Algorithm for Mixture Model Parameter Estimation


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Estimation of the Gaussian Mixture Model Parameters

#### 3.1. Expectation–Maximization Algorithm

#### 3.1.1. Model Selection with EM Algorithm

#### 3.2. REBMIX Algorithm

#### 3.3. Combined REBMIX and EM Algorithm

The first, Exhaustive REBMIX&EM strategy proceeds as follows:

- Select multiple numbers of bins ${v}_{1},{v}_{2},\cdots $ for the histogram preprocessing;
- For each v from ${v}_{1},{v}_{2},\cdots $, obtain the corresponding set of REBMIX estimates ${R}_{{v}_{1}},{R}_{{v}_{2}},\cdots $;
- Merge all solutions from the sets ${R}_{{v}_{1}},{R}_{{v}_{2}},\cdots $ into one set R;
- For each solution $r\in R$, run the EM algorithm to obtain the set S of EM-improved estimates of the GMM parameters;
- Run the model selection procedure on the set S.
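The steps above can be sketched compactly. This is a minimal illustration, not the rebmix implementation: `rebmix_estimate`, `em_refine`, and `bic` are hypothetical stand-in callables for the REBMIX run, the EM refinement, and the model selection criterion.

```python
def exhaustive_rebmix_em(data, bins_grid, rebmix_estimate, em_refine, bic):
    # Steps 1-3: run REBMIX for every number of bins v and merge all
    # candidate solutions into one set R.
    R = []
    for v in bins_grid:
        R.extend(rebmix_estimate(data, v))
    # Step 4: improve every candidate with the EM algorithm.
    S = [em_refine(data, r) for r in R]
    # Step 5: model selection, e.g., keep the candidate with the lowest BIC.
    return min(S, key=lambda model: bic(data, model))
```

Because EM runs on every merged candidate, this strategy is the most thorough and also the most expensive of the three.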

The second, Best REBMIX&EM strategy proceeds as follows:

- Select multiple numbers of bins ${v}_{1},{v}_{2},\cdots $ for the histogram preprocessing;
- For each v from ${v}_{1},{v}_{2},\cdots $, obtain the corresponding set of REBMIX estimates ${R}_{{v}_{1}},{R}_{{v}_{2}},\cdots $;
- Group all solutions from the sets ${R}_{{v}_{1}},{R}_{{v}_{2}},\cdots $ by the number of components c, i.e., form the sets ${R}_{c={c}_{\mathrm{min}}},{R}_{c={c}_{\mathrm{min}}+1},\cdots ,{R}_{c={c}_{\mathrm{max}}}$, where each set holds the estimated GMM parameters with the same number of components c;
- From each set ${R}_{c=j}$, choose the solution ${r}_{c=j,\mathrm{best}}$ with the largest likelihood value and add it to a new set ${R}_{\mathrm{best}}=\{{r}_{c={c}_{\mathrm{min}},\mathrm{best}},{r}_{c={c}_{\mathrm{min}}+1,\mathrm{best}},\cdots ,{r}_{c={c}_{\mathrm{max}},\mathrm{best}}\}$;
- For each solution ${r}_{c=j,\mathrm{best}}\in {R}_{\mathrm{best}}$, run the EM algorithm to obtain the set S of EM-improved estimates of the GMM parameters;
- Run the model selection procedure on the set S.
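The grouping and per-group selection can be sketched as follows. Again this is an illustration only: `n_components`, `log_lik`, `em_refine`, and `bic` are hypothetical stand-ins, not the rebmix API.

```python
from collections import defaultdict

def best_rebmix_em(data, all_solutions, n_components, log_lik, em_refine, bic):
    # Group the REBMIX solutions by their number of components c.
    groups = defaultdict(list)
    for r in all_solutions:
        groups[n_components(r)].append(r)
    # From each group R_{c=j}, keep only the highest-likelihood solution.
    R_best = [max(group, key=lambda r: log_lik(data, r))
              for group in groups.values()]
    # Run EM only on these representatives, then select the final model.
    S = [em_refine(data, r) for r in R_best]
    return min(S, key=lambda m: bic(data, m))
```

Compared with the Exhaustive strategy, EM is run at most once per candidate component count, which bounds the number of EM runs by ${c}_{\mathrm{max}}-{c}_{\mathrm{min}}+1$.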

The third, Single REBMIX&EM strategy proceeds as follows:

- Estimate or intuitively select the number of bins v;
- Obtain the corresponding set R of REBMIX estimates for the known v;
- For each solution $r\in R$, run the EM algorithm to obtain the set S of EM-improved estimates of the GMM parameters;
- Run the model selection procedure on the set S.
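For the first step, v can be estimated with a histogram binning rule such as the Knuth rule used in the experiments. As a simpler illustration, Sturges' rule (listed in the references) gives one bin count directly from the sample size; this helper is illustrative and not part of the rebmix package:

```python
import math

def sturges_bins(n):
    # Sturges' rule: v = ceil(log2(n)) + 1 for n observations.
    return math.ceil(math.log2(n)) + 1
```

For example, `sturges_bins(1000)` suggests 11 bins. Such a single choice of v makes this strategy the cheapest of the three, at the cost of depending heavily on the quality of the bin estimate.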

## 4. Experiments

#### 4.1. Artificial Datasets

#### 4.2. Density Estimation

#### 4.3. Image Segmentation

## 5. Results and Discussion

#### 5.1. Results on Artificially Generated Datasets

#### 5.1.1. Application to Clustering

#### 5.1.2. Application to the Density Estimation

#### 5.2. Density-Estimation Tasks

#### 5.3. Image-Segmentation Tasks

#### 5.4. Comparison of the Time Complexity of REBMIX and k-Means Algorithm

**rebmix** R package. For the k-means algorithm, we used the default R kmeans implementation provided in the **base** R package. Although the k-means algorithm needs a different number of iterations from run to run, we fixed it at 100 iterations here; this seems reasonable because repeated runs of k-means are often preferable [31]. In addition, because the k-means algorithm needs to be repeated for the GMM initialization, we took this into account by running it once for each $c\in \{{c}_{\mathrm{min}},{c}_{\mathrm{min}}+1,\cdots ,{c}_{\mathrm{max}}\}$. For the REBMIX algorithm, we used the setting in which the number of bins v is not known, with the range 3–100. When the time complexity with respect to one parameter was evaluated, the other two were kept constant: for the comparison with respect to the number of dimensions d, we fixed the number of observations $n=1000$ and the maximum number of components ${c}_{\mathrm{max}}=10$; for the comparison with respect to the number of observations n, we fixed $d=5$ and ${c}_{\mathrm{max}}=10$; and for the comparison with respect to the maximum number of components ${c}_{\mathrm{max}}$, we fixed $d=5$ and $n=1000$.
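The timing protocol described above can be sketched as a small harness. This is a schematic only, assuming a hypothetical `kmeans` callable rather than the R implementation used in the paper:

```python
import time

def time_kmeans_initialization(data, kmeans, c_min=1, c_max=10, iters=100):
    # Mirror the protocol: one k-means run per candidate component count c,
    # each run capped at a fixed number of iterations.
    start = time.perf_counter()
    for c in range(c_min, c_max + 1):
        kmeans(data, n_clusters=c, max_iter=iters)
    return time.perf_counter() - start
```

The same wall-clock measurement, with the other two parameters held constant, yields the d, n, and ${c}_{\mathrm{max}}$ comparisons reported in this section.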

#### 5.5. Note on Selection of Hyperparameters for the REBMIX&EM Strategies and Their Impact on Performance

## 6. Conclusions

**rebmix** R package (https://cran.r-project.org/web/packages/rebmix/index.html), and they are ready to use. In addition, let us explain how our work can be extended. First, Bayesian regularization for the EM algorithm could be used to additionally avoid degeneracies and constrain the likelihood [14]. Second, the use of the GMM for certain image-segmentation and density-estimation tasks was questionable, such as for the flower image or the Bend distribution. Although we used the GMM as a proof of concept, our work can be extended to other types of MMs. For an extension to another parametric family of MM, the equations for the REBMIX and EM algorithms need to be derived. In [17], it was shown how the REBMIX algorithm can be extended to other types of MM, in particular to the von Mises MM. For the EM algorithm, the M-step equations need to be derived; for example, in [18], the derivation of the Weibull-Normal MM parameters is given. In addition, there is a plethora of literature on how the EM algorithm can be extended to other parametric families; see [9,10] to name just a few. We will strive to provide more theoretical insights into the REBMIX algorithm. Specifically, the empirical equation for decreasing the parameter ${D}_{\mathrm{min}}$ should be improved to avoid nonlinear time complexity in terms of the parameter ${c}_{\mathrm{max}}$. The hyperparameter v, the number of bins for the histogram preprocessing, needs special care. For the histogram preprocessing, we used the same number of bins in each dimension, although the number of bins can differ across dimensions; alternatively, an adaptive bandwidth for the histogram preprocessing can be applied, like the ones used for kernel density estimation in [11]. Other REBMIX preprocessing capabilities, such as kernel density or k-nearest neighbor, could also be used. Finally, the application of the MM to image segmentation should be researched more thoroughly.
Throughout the testing of the proposals for the image-segmentation tasks, we encountered some interesting facts outlined in Section 5.3. As this was not the primary problem of this article, we left this topic open to be revisited.

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. **1977**, 39, 1–38.
2. Yu, J.; Chaomurilige, C.; Yang, M.S. On convergence and parameter selection of the EM and DA-EM algorithms for Gaussian mixtures. Pattern Recognit. **2018**, 77, 188–203.
3. Ma, J.; Jiang, X.; Jiang, J.; Gao, Y. Feature-guided Gaussian mixture model for image matching. Pattern Recognit. **2019**, 92, 231–245.
4. Liu, C.; Li, H.C.; Fu, K.; Zhang, F.; Datcu, M.; Emery, W.J. Bayesian estimation of generalized Gamma mixture model based on variational EM algorithm. Pattern Recognit. **2019**, 87, 269–284.
5. Du, Y.; Gui, W. Goodness of Fit Tests for the Log-Logistic Distribution Based on Cumulative Entropy under Progressive Type II Censoring. Mathematics **2019**, 7, 361.
6. Pagès-Zamora, A.; Cabrera-Bean, M.; Díaz-Vilor, C. Unsupervised online clustering and detection algorithms using crowdsourced data for malaria diagnosis. Pattern Recognit. **2019**, 86, 209–223.
7. Yu, L.; Yang, T.; Chan, A.B. Density-Preserving Hierarchical EM Algorithm: Simplifying Gaussian Mixture Models for Approximate Inference. IEEE Trans. Pattern Anal. Mach. Intell. **2019**, 41, 1323–1337.
8. Gebru, I.D.; Alameda-Pineda, X.; Forbes, F.; Horaud, R. EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis. IEEE Trans. Pattern Anal. Mach. Intell. **2016**, 38, 2402–2415.
9. McLachlan, G.; Peel, D. Finite Mixture Models, 1st ed.; John Wiley & Sons: Hoboken, NJ, USA, 2000.
10. McLachlan, G.; Krishnan, T. The EM Algorithm and Extensions, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2007.
11. Bäcklin, C.L.; Andersson, C.; Gustafsson, M.G. Self-tuning density estimation based on Bayesian averaging of adaptive kernel density estimations yields state-of-the-art performance. Pattern Recognit. **2018**, 78, 133–143.
12. Yang, M.S.; Lai, C.Y.; Lin, C.Y. A robust EM clustering algorithm for Gaussian mixture models. Pattern Recognit. **2012**, 45, 3950–3961.
13. Celeux, G.; Govaert, G. Gaussian parsimonious clustering models. Pattern Recognit. **1995**, 28, 781–793.
14. Baudry, J.P.; Celeux, G. EM for mixtures. Stat. Comput. **2015**, 25, 713–726.
15. Ng, S.K.; McLachlan, G.J. Speeding up the EM algorithm for mixture model-based segmentation of magnetic resonance images. Pattern Recognit. **2004**, 37, 1573–1589.
16. Nagode, M. Finite Mixture Modeling via REBMIX. J. Algorithms Optim. **2015**, 3, 14–28.
17. Ye, X.; Xi, P.; Nagode, M. Extension of REBMIX algorithm to von Mises parametric family for modeling joint distribution of wind speed and direction. Eng. Struct. **2019**, 183, 1134–1145.
18. Franko, M.; Nagode, M. Probability density function of the equivalent stress amplitude using statistical transformation. Reliab. Eng. Syst. Saf. **2015**, 134, 118–125.
19. Gallaugher, M.P.; McNicholas, P.D. Finite mixtures of skewed matrix variate distributions. Pattern Recognit. **2018**, 80, 83–93.
20. Franczak, B.C.; Browne, R.P.; McNicholas, P.D. Mixtures of Shifted Asymmetric Laplace Distributions. IEEE Trans. Pattern Anal. Mach. Intell. **2014**, 36, 1149–1157.
21. Wang, H.; Luo, B.; Zhang, Q.; Wei, S. Estimation for the number of components in a mixture model using stepwise split-and-merge EM algorithm. Pattern Recognit. Lett. **2004**, 25, 1799–1809.
22. Zhang, B.; Zhang, C.; Yi, X. Competitive EM algorithm for finite mixture models. Pattern Recognit. **2004**, 37, 131–144.
23. Figueiredo, M.A.T.; Jain, A.K. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. **2002**, 24, 381–396.
24. Ari, Ç.; Aksoy, S.; Arıkan, O. Maximum likelihood estimation of Gaussian mixture models using stochastic search. Pattern Recognit. **2012**, 45, 2804–2816.
25. Biernacki, C.; Celeux, G.; Govaert, G. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. **2003**, 41, 561–575.
26. Melnykov, V.; Melnykov, I. Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput. Stat. Data Anal. **2012**, 56, 1381–1395.
27. Michael, S.; Melnykov, V. An effective strategy for initializing the EM algorithm in finite mixture models. Adv. Data Anal. Classif. **2016**, 10, 563–583.
28. Kwedlo, W. A new random approach for initialization of the multiple restart EM algorithm for Gaussian model-based clustering. Pattern Anal. Appl. **2015**, 18, 757–770.
29. Maitra, R. Initializing Partition-Optimization Algorithms. IEEE/ACM Trans. Comput. Biol. Bioinform. **2009**, 6, 144–157.
30. Zhao, Q.; Hautamäki, V.; Kärkkäinen, I.; Fränti, P. Random swap EM algorithm for Gaussian mixture models. Pattern Recognit. Lett. **2012**, 33, 2120–2126.
31. Fränti, P.; Sieranoja, S. How much can k-means be improved by using better initialization and repeats? Pattern Recognit. **2019**, 93, 95–112.
32. Scrucca, L.; Raftery, A.E. Improved initialisation of model-based clustering using Gaussian hierarchical partitions. Adv. Data Anal. Classif. **2015**, 9, 447–460.
33. Bishop, C. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
34. Nagode, M.; Fajdiga, M. The REBMIX Algorithm for the Univariate Finite Mixture Estimation. Commun. Stat. Theory Methods **2011**, 40, 876–892.
35. Nagode, M.; Fajdiga, M. The REBMIX Algorithm for the Multivariate Finite Mixture Estimation. Commun. Stat. Theory Methods **2011**, 40, 2022–2034.
36. Nagode, M. Multivariate normal mixture modeling, clustering and classification with the rebmix package. arXiv **2018**, arXiv:stat.ML/1801.08788.
37. Melnykov, V.; Chen, W.C.; Maitra, R. MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms. J. Stat. Softw. **2012**, 51, 1–25.
38. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. **1985**, 2, 193–218.
39. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423.
40. Knuth, K.H. Optimal Data-Based Binning for Histograms. arXiv **2006**, arXiv:physics/0605197.
41. Aksac, A.; Özyer, T.; Alhajj, R. CutESC: Cutting edge spatial clustering technique based on proximity graphs. Pattern Recognit. **2019**, 96, 106948.
42. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
43. Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. scikit-image: Image processing in Python. PeerJ **2014**, 2, e453.
44. Scott, D.W.; Sain, S.R. 9—Multidimensional Density Estimation. In Data Mining and Data Visualization; Rao, C., Wegman, E., Solka, J., Eds.; Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 2005; Volume 24, pp. 229–261.
45. Sturges, H.A. The Choice of a Class Interval. J. Am. Stat. Assoc. **1926**, 21, 65–66.
46. Velleman, P.F. Interactive Computing for Exploratory Data Analysis I: Display Algorithms. In Proceedings of the Statistical Computing Section; American Statistical Association: Washington, DC, USA, 1976.

**Figure 2.** Examples of artificial datasets with two dimensions and five components, with a variable number of observations and overlap level. The upper three graphs had $a=0.001$ and the lower three had $a=0.125$. From left to right, the number of observations was $n=100$, $n=1000$, and $n=10{,}000$, respectively.

**Figure 3.** Showcase of selected images from [39]. First row: flower image (label 353013); herd image (label 38092); woman image (label 216053). Second row: fisherman image (label 48055); bridge image (label 22093).

**Figure 5.**Clustering performance of the Exhaustive REBMIX&EM strategy, Best REBMIX&EM strategy, Single REBMIX&EM strategy, Kmeans&EM, Random&EM, and REBMIX on artificial datasets.

**Figure 6.** Density-estimation performance evaluation of the Exhaustive REBMIX&EM strategy, Best REBMIX&EM strategy, Single REBMIX&EM strategy, Kmeans&EM, Random&EM, and REBMIX on the artificial datasets. First row: plots grouped by the number of observations n. Second row: plots grouped by the dimension d of the dataset for $n=100$ observations.

**Figure 7.**Density-estimation performance evaluation of Exhaustive REBMIX&EM strategy, Best REBMIX&EM strategy, Single REBMIX&EM strategy, Kmeans&EM, Random&EM, and ADEBA on datasets sampled from multivariate distributions in [11].

**Figure 8.**Mean computation times of Exhaustive REBMIX&EM strategy, Best REBMIX&EM strategy, Single REBMIX&EM strategy, Kmeans&EM, Random&EM, and ADEBA for each dataset size n.

**Figure 9.** Showcase segmentation of the flower image (label 353013) and the herd image (label 38092). First row: ground-truth segmentations; second row: segmentations using random colors for each component in the estimated GMM; third row: segmentations using the mean values of the components in the estimated GMM.

**Figure 10.**Different types of noise added to the herd image (label 38092). First row: Original image, salt&pepper noise added, salt noise added, pepper noise added; Second row: Gaussian noise added, Poisson noise added, speckle noise added, localvar noise added.

**Figure 14.**Optimal number of bins v selected with the different strategies and Knuth rule on the artificial datasets from Section 4.1.

Parameter | Values |
---|---|
Dimension (d) | 3, 5, 10 |
Number of components (c) | 5, 10, 15 |
Overlap level (a) | 0.001, 0.125 |

M/N | ${\mathit{N}}_{1}\phantom{\rule{2.0pt}{0ex}}{\mathit{N}}_{2}\phantom{\rule{2.0pt}{0ex}}\cdots \phantom{\rule{2.0pt}{0ex}}{\mathit{N}}_{\mathit{s}}$ | Sums |
---|---|---|
${M}_{1}$ | ${n}_{11}\phantom{\rule{2.0pt}{0ex}}{n}_{12}\phantom{\rule{2.0pt}{0ex}}\cdots \phantom{\rule{2.0pt}{0ex}}{n}_{1s}$ | ${a}_{1}$ |
${M}_{2}$ | ${n}_{21}\phantom{\rule{2.0pt}{0ex}}{n}_{22}\phantom{\rule{2.0pt}{0ex}}\cdots \phantom{\rule{2.0pt}{0ex}}{n}_{2s}$ | ${a}_{2}$ |
⋮ | $\vdots \phantom{\rule{10.0pt}{0ex}}\vdots \phantom{\rule{10.0pt}{0ex}}\ddots \phantom{\rule{10.0pt}{0ex}}\vdots $ | ⋮ |
${M}_{r}$ | ${n}_{r1}\phantom{\rule{2.0pt}{0ex}}{n}_{r2}\phantom{\rule{2.0pt}{0ex}}\cdots \phantom{\rule{2.0pt}{0ex}}{n}_{rs}$ | ${a}_{r}$ |
Sums | ${b}_{1}\phantom{\rule{4.0pt}{0ex}}{b}_{2}\phantom{\rule{4.0pt}{0ex}}\cdots \phantom{\rule{4.0pt}{0ex}}{b}_{s}$ | |

Image Label/Strategy | | 353013 | 38092 | 216053 | 48055 | 22093 |
---|---|---|---|---|---|---|
Random&EM | c | 20 | 20 | 20 | 20 | 20 |
 | ARI | 0.29 | 0.28 | 0.47 | 0.37 | 0.34 |
Kmeans&EM | c | 20 | 20 | 20 | 20 | 20 |
 | ARI | 0.24 | 0.28 | 0.24 | 0.36 | 0.35 |
Single REBMIX&EM with Knuth rule | c | 20 | 19 | 19 | 19 | 20 |
 | ARI | 0.38 | 0.42 | 0.35 | 0.41 | 0.41 |
Single REBMIX&EM with $v=255$ | c | 18 | 20 | 20 | 20 | 20 |
 | ARI | 0.67 | 0.67 | 0.63 | 0.45 | 0.34 |
Best REBMIX&EM | c | 19 | 16 | 16 | 16 | 18 |
 | ARI | 0.44 | 0.54 | 0.25 | 0.47 | 0.38 |

Image/Algorithm | 353013 | 38092 | 216053 | 48055 | 22093 |
---|---|---|---|---|---|
MS | 0.46 | 0.54 | 0.69 | 0.1 | 0.43 |
CutESC | 0.86 | 0.71 | 0.92 | 0.79 | 0.76 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Panić, B.; Klemenc, J.; Nagode, M. Improved Initialization of the EM Algorithm for Mixture Model Parameter Estimation. *Mathematics* **2020**, *8*, 373.
https://doi.org/10.3390/math8030373
