# Fitting Non-Parametric Mixture of Regressions: Introducing an EM-Type Algorithm to Address the Label-Switching Problem


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Model Definition

#### 2.2. Local-Likelihood Estimation and the Label-Switching Problem

#### 2.2.1. Local-Likelihood Estimation

Algorithm 1 The EM algorithm for fitting non-parametric mixtures of regressions.

**Step 1 (Initialization):** Provide initial values ${\pi}_{k}^{(0)}(u)$, ${m}_{k}^{(0)}(u)$ and ${\sigma}_{k}^{2(0)}(u)$ for all $u\in \mathcal{U}$ and $k=1,2,\dots ,K$.

**Step 2 (E-Step):** At the $t$-th iteration, use expression (6) to compute the local responsibilities at each grid point $u\in \mathcal{U}$.

**Step 3 (M-Step):** Let ${\gamma}_{k}^{(t)}(u)=({\gamma}_{1k}^{(t)}(u),{\gamma}_{2k}^{(t)}(u),\dots ,{\gamma}_{nk}^{(t)}(u))$ be the vector of local responsibilities at grid point $u\in \mathcal{U}$ associated with the $k$-th component. Compute ${\widehat{\pi}}_{k}(u)$, ${\widehat{m}}_{k}(u)$ and ${\widehat{\sigma}}_{k}^{2}(u)$, for each $u\in \mathcal{U}$ and $k=1,2,\dots ,K$, using expressions (8)–(10).

**Step 4:** Alternate between the E- and M-steps until convergence.
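As a concrete illustration of Algorithm 1, here is a minimal sketch in Python (not the authors' implementation): a Gaussian kernel supplies the local weights, and the E- and M-steps are iterated independently at each grid point. The function names, the quantile-based initialization, and the convergence check on the means are our own choices.

```python
import numpy as np

def gauss_kernel(t, h):
    # Gaussian kernel weight K_h(t)
    return np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2 * np.pi))

def local_em(x, y, grid, K=2, h=0.1, n_iter=200, tol=1e-8):
    """Sketch of Algorithm 1: run a K-component Gaussian-mixture EM
    independently at each grid point u, with kernel weights K_h(x_i - u)
    entering the M-step. Shapes: pi/m/s2 are (len(grid), K), gamma is
    (len(grid), n, K)."""
    n, G = len(x), len(grid)
    pi = np.full((G, K), 1.0 / K)
    # deterministic initialization: spread component means over y-quantiles
    m = np.tile(np.quantile(y, (np.arange(K) + 1.0) / (K + 1)), (G, 1))
    s2 = np.full((G, K), y.var())
    gamma = np.zeros((G, n, K))
    for g, u in enumerate(grid):
        w = gauss_kernel(x - u, h)                 # local weights K_h(x_i - u)
        for _ in range(n_iter):
            # E-step: local responsibilities, cf. expression (6)
            dens = pi[g] * np.exp(-0.5 * (y[:, None] - m[g]) ** 2 / s2[g]) \
                   / np.sqrt(2 * np.pi * s2[g])
            gam = dens / dens.sum(axis=1, keepdims=True)
            # M-step: kernel-weighted updates, cf. expressions (8)-(10)
            wk = w[:, None] * gam
            pi_g = wk.sum(axis=0) / w.sum()
            m_g = (wk * y[:, None]).sum(axis=0) / wk.sum(axis=0)
            s2_g = (wk * (y[:, None] - m_g) ** 2).sum(axis=0) / wk.sum(axis=0)
            done = np.max(np.abs(m_g - m[g])) < tol
            pi[g], m[g], s2[g] = pi_g, m_g, s2_g
            if done:
                break
        gamma[g] = gam
    return pi, m, s2, gamma
```

Because the EM runs are independent across grid points, nothing in this sketch prevents label switching between neighbouring points; that is exactly the problem Section 2.2.2 and Algorithm 2 address.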

#### 2.2.2. Label-Switching Problem

#### 2.3. Modified Estimation Procedure

#### 2.3.1. Regularity Assumptions

**Assumption 1.**

**Assumption 2.**

#### 2.3.2. The Proposed Algorithm

Algorithm 2 Modified EM algorithm for fitting the NPGMRs model.

**Step 1:** Perform local-likelihood estimation using Algorithm 1. For each grid point $u\in \mathcal{U}$, retain the local responsibilities ${\gamma}_{k}(u)$, $k=1,2,\dots ,K$, obtained at convergence.

**Step 2:** For each grid point $u\in \mathcal{U}$, use the local responsibilities ${\gamma}_{k}(u)$, $k=1,2,\dots ,K$, to calculate the non-parametric mixture regression functions via (8)–(10) for all $v\in \mathcal{U}$, thus obtaining the following set of non-parametric mixture-of-regression functions:
$$\mathcal{M}(u)=\{({\widehat{\pi}}_{k}(v),{\widehat{m}}_{k}(v),{\widehat{\sigma}}_{k}^{2}(v),{\gamma}_{k}(u)):v\in \mathcal{U};\; k=1,2,\dots ,K\}$$

**Step 3:** Let $\mathcal{M}=\{\mathcal{M}(u):u\in \mathcal{U}\}$ and choose, as the final estimated non-parametric mixture of regression functions, the subset of functions $\mathfrak{M}(\mathcal{U})\in \mathcal{M}$, where the argument $\mathcal{U}$ indicates that the functions in $\mathfrak{M}(\cdot)$ are defined over the grid $\mathcal{U}$, such that
$$\kappa =\underset{k}{\max}{\int}_{\mathcal{U}}{\{{\widehat{m}}_{k}^{(2)}(v)\}}^{2}\,dv$$
Let $D=\{({x}_{i},{y}_{i}):i=1,2,\dots ,n\}$ be the observed random sample. To obtain the set $\mathfrak{M}(D)$, we interpolate the function values in the set $\mathfrak{M}(\mathcal{U})$.
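Step 3 ranks candidate curve sets by the roughness functional $\kappa$ above. The sketch below computes $\kappa$ numerically (finite-difference second derivatives and trapezoidal integration) and, on the reading that the smoothest set of curves is retained, selects the candidate with the smallest $\kappa$. All function names are illustrative, and the selection-by-minimum rule is our interpretation of the smoothness rationale.

```python
import numpy as np

def _trapz(f, x):
    # trapezoidal rule (avoids NumPy-version-dependent np.trapz/np.trapezoid)
    return float(np.sum((f[:-1] + f[1:]) * 0.5 * np.diff(x)))

def roughness_kappa(m_curves, grid):
    """kappa = max_k of the integral of {m_k''(v)}^2 over the grid, with the
    second derivative taken by finite differences."""
    vals = []
    for mk in m_curves:                # mk: values of one mean curve on grid
        d2 = np.gradient(np.gradient(mk, grid), grid)
        vals.append(_trapz(d2 ** 2, grid))
    return max(vals)

def pick_smoothest(candidate_sets, grid):
    # our reading of Step 3: keep the candidate set with the smallest kappa
    kappas = [roughness_kappa(ms, grid) for ms in candidate_sets]
    return int(np.argmin(kappas))
```

A label switch splices two different smooth curves together, producing a derivative kink at the switch point; the squared second derivative blows up there, so the switched set receives a much larger $\kappa$ than the correctly labelled one.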

## 3. Simulation Study

#### 3.1. Choosing the Bandwidth and Number of Components

1. For each $k=1,2,\dots ,K_{max}$, find the best bandwidth using the cross-validation approach, where $K_{max}$ is the largest number of components to consider.
2. For each of the models in (1), based on the best bandwidth, choose as the final model the one that minimizes the BIC.
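The bandwidth search in step (1) can be illustrated with a simple leave-one-out cross-validation loop. The sketch below uses a Nadaraya-Watson smoother as a stand-in for the mixture fit, so it shows the selection mechanics only, not the paper's actual estimator; all names are our own.

```python
import numpy as np

def nw_smooth(x, y, x0, h):
    # Nadaraya-Watson estimate at points x0 with a Gaussian kernel
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def loo_cv_score(x, y, h):
    # leave-one-out mean squared prediction error for bandwidth h
    n = len(x)
    err = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        yhat = nw_smooth(x[mask], y[mask], x[i:i + 1], h)[0]
        err += (y[i] - yhat) ** 2
    return err / n

def best_bandwidth(x, y, hs):
    # grid search: pick the bandwidth minimizing the LOO-CV score
    return hs[int(np.argmin([loo_cv_score(x, y, h) for h in hs]))]
```

In the full procedure this inner search would be repeated for each candidate $k$, and the BIC comparison in step (2) then selects among the resulting best-bandwidth fits.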

#### 3.2. Initializing the Fitting Algorithm

1. For each $p=2,3,\dots ,5$, we estimate 20 $p$-th-degree polynomial GMLRs models.
2. Choose the model that minimizes the BIC in (1) to initialize the model.

#### 3.3. Performance Measures

- **(a) Root of the Average Squared Errors (RASE):**
$$\mathrm{RASE}_{f}^{2}=\frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n}\gamma_{ik}\{f_{k}(x_{i})-\widehat{f}_{k}(x_{i})\}^{2}$$
- **(b) Maximum Absolute Error (MAE):**
$$\mathrm{MAE}_{f}=\underset{k}{\max}\,\underset{i}{\max}\,\gamma_{ik}|f_{k}(x_{i})-\widehat{f}_{k}(x_{i})|$$
- **(c) Model Classification Strength:** Let $D$ be the set of observed data and $\mathbf{z}$ the corresponding component indicator variable. Define $M[D,\mathbf{z}]$ as an $n\times n$ matrix whose $ii'$ element is $M[D,\mathbf{z}]_{ii'}=1$ if $z_{ik}=1$ and $z_{i'k}=1$, and zero otherwise; that is, observations $i$ and $i'$ are co-members of the same component. Define
$$cs=\frac{1}{n(n-1)}\sum_{i\ne i'=1}^{n}\mathbf{1}\left[M[D,\mathbf{z}]_{ii'}=M[D,\widehat{\mathbf{z}}]_{ii'}=1\right]$$
where
$$\widehat{z}_{ij}=\begin{cases}1 & \text{if}\ \gamma_{ij}=\max_{k}\gamma_{ik}\\ 0 & \text{otherwise}\end{cases}$$
- **(d) Coefficient of Determination ($R^{2}$):** We use the following to calculate the proportion of variation in the response explained by the fitted NPGMRs model:
$$R^{2}=\frac{BSS+EWSS}{TSS}=1-\frac{RWSS}{TSS}$$
- **(e) Standard Errors and Confidence Intervals:** We use the bootstrap to approximate the point-wise standard errors of the estimates, as well as confidence intervals for the model parameter functions. For a given $x_{0}$, we use the estimated model to generate the corresponding $y^{*}\sim \sum_{k=1}^{K}\widehat{\pi}_{k}(x_{0})\,\mathcal{N}\{\widehat{m}_{k}(x_{0}),\widehat{\sigma}_{k}^{2}(x_{0})\}$; in this way we generate a bootstrap sample $\{(x_{i},y_{i}^{*}):i=1,2,\dots ,n\}$. We generate $B=1000$ such samples and fit the model to each to approximate the point-wise standard errors and confidence intervals.
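The first two measures are direct to compute once the responsibilities and the true and fitted component functions are tabulated at the $x_i$. A minimal sketch (helper names are our own):

```python
import numpy as np

def rase(gamma, f_true, f_hat):
    """Root of the Average Squared Errors.
    gamma: (n, K) responsibilities; f_true, f_hat: (n, K) component
    function values evaluated at the observed x_i."""
    n = gamma.shape[0]
    return np.sqrt(np.sum(gamma * (f_true - f_hat) ** 2) / n)

def mae(gamma, f_true, f_hat):
    # maximum responsibility-weighted absolute error over components and points
    return np.max(gamma * np.abs(f_true - f_hat))
```

The weighting by $\gamma_{ik}$ means each observation's error is attributed mostly to the component it most plausibly belongs to, so a large error on a point with negligible responsibility for that component is not penalized.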

#### 3.4. Simulation Studies

## 4. Application

#### 4.1. Problem and Data Description

#### 4.2. Modelling and Results

Normality of the fitted components was assessed using the `ks.test` function from the stats package of the R programming language [21]. The normality of each of the two components fitted by the proposed algorithm cannot be rejected. For the effective algorithm, normality of the first fitted component is rejected, while that of the second cannot be rejected, at a 5% significance level.
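A dependency-free analogue of this check is sketched below: it computes the one-sample Kolmogorov-Smirnov statistic of standardized residuals against the standard normal, mirroring what R's `ks.test` with `pnorm` does (statistic only, no p-value; the function name is ours).

```python
import numpy as np
from math import erf, sqrt

def ks_stat_normal(resid):
    """One-sample KS statistic of standardized residuals vs N(0, 1).
    Note: the mean and variance are estimated from the same data, so
    nominal KS p-values would only be approximate (Lilliefors-type issue)."""
    z = np.sort((resid - resid.mean()) / resid.std(ddof=1))
    n = len(z)
    # standard normal CDF at each sorted point
    cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(n) / n)
    return max(d_plus, d_minus)
```

A small statistic (relative to roughly $1.36/\sqrt{n}$ at the 5% level) is consistent with the Gaussian component assumption; a clearly larger value, as seen for the first component of the effective algorithm, indicates a poor fit.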

## 5. Discussion

The proposed algorithm was applied to data on CO${}_{2}$ emissions (as the response) and national income (as a covariate) for a group of 145 countries. The effectiveness of the proposed algorithm was demonstrated by its ability to identify two latent components wholly independent of the initial conditions. Using a goodness-of-fit test, we showed that the Gaussian assumption on the distributions of the two fitted components, based on the proposed algorithm, is appropriate.

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| BIC | Bayesian Information Criterion |
| EM | Expectation-Maximization |
| GMLRs | Gaussian Mixture of Linear Regressions |
| LLFs | Local-Likelihood Functions |
| NPGMRs | Non-parametric Gaussian Mixture of Regressions |

## References

1. Titterington, D.M.; Smith, A.F.M.; Makov, U.E. Statistical Analysis of Finite Mixture Distributions; John Wiley and Sons: Hoboken, NJ, USA, 1985.
2. Frühwirth-Schnatter, S.; Celeux, G.; Robert, C.P. Handbook of Mixture Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
3. Quandt, R.E. A New Approach to Estimating Switching Regressions. J. Am. Stat. Assoc. 1972, 67, 306–310.
4. Goldfeld, S.M.; Quandt, R.E. A Markov model for switching regressions. J. Econom. 1973, 1, 3–15.
5. Quandt, R.E.; Ramsey, J.B. Estimating Mixtures of Normal Distributions and Switching Regressions. J. Am. Stat. Assoc. 1978, 73, 730–738.
6. Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models; Springer Science & Business Media: New York, NY, USA, 2006.
7. Hurn, M.; Justel, A.; Robert, C.P. Estimating mixtures of regressions. J. Comput. Graph. Stat. 2003, 12, 55–79.
8. Huang, M.; Li, R.; Wang, S. Nonparametric mixture of regression models. J. Am. Stat. Assoc. 2013, 108, 929–941.
9. Xiang, S.; Yao, W. Semi-parametric mixtures of non-parametric regressions. Ann. Inst. Stat. Math. 2018, 70, 131–154.
10. Wu, X.; Liu, T. Estimation and testing for semiparametric mixtures of partially linear models. Commun. Stat.-Theory Methods 2017, 46, 8690–8705.
11. Zhang, Y.; Zheng, Q. Semiparametric mixture of additive regression models. Commun. Stat.-Theory Methods 2018, 47, 681–697.
12. Zhang, Y.; Pan, W. Estimation and inference for mixture of partially linear additive models. Commun. Stat.-Theory Methods 2020, 51, 2519–2533.
13. Xiang, S.; Yao, W. Semi-parametric mixtures of regressions with single-index for model based clustering. Adv. Data Anal. Classif. 2020, 14, 261–292.
14. Xiang, S.; Yao, W.; Yang, G. An Overview of Semi-parametric Extensions of Finite Mixture Models. Stat. Sci. 2019, 34, 391–404.
15. Tibshirani, R.; Hastie, T. Local likelihood estimation. J. Am. Stat. Assoc. 1987, 82, 559–567.
16. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol. 1977, 39, 1–38.
17. Stephens, M. Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2000, 62, 795–809.
18. Tibshirani, R.; Walther, G. Cluster validation by prediction strength. J. Comput. Graph. Stat. 2005, 14, 511–528.
19. Ingrassia, S.; Punzo, A. Cluster validation for mixtures of regressions via the total sum of squares decomposition. J. Classif. 2020, 37, 526–547.
20. Dinda, S. Environmental Kuznets curve hypothesis: A survey. Ecol. Econ. 2004, 49, 431–455.
21. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022.

**Figure 1.** The local-likelihood estimation procedure for fitting a two-component NPGMRs model: (**a**) the true mixture of regressions used to generate the data (solid black curves are the component regression curves); (**b**) the local-likelihood estimation procedure using four grid points: the crosses represent the component means obtained by fitting a two-component mixture of Gaussians at each grid point; (**c**) the label-switching problem: at grid point $u=0.4$, the estimated component means of the two components have switched labels.

**Figure 2.**Plots of the component regression functions for the three scenarios of the two-component NPGMRs model.

**Figure 4.**Bootstrap standard errors: plots of the estimated point-wise bootstrap standard errors at the grid points (shown by the bullet) for the estimated mean function of component 1 (left panel) and component 2 (right panel) for sample sizes $n=200$ (top panel), $n=400$ (middle panel) and $n=800$ (bottom panel). The error bars represent the approximate 95% point-wise bootstrap confidence intervals at the grid points. We also plot the point-wise standard errors (shown by the cross) obtained as the standard deviation of 500 estimates at the grid points.

**Figure 5.** Application data and fitted NPGMRs model: (**a**) scatter plot of the data; (**b**) initial component regression functions; (**c**) fitted $K=2$ component NPGMRs model using the proposed algorithm; and (**d**) using the algorithm in Huang et al. The dotted curves give the point-wise 95% bootstrap confidence intervals obtained using 5000 bootstrap samples.

| Functions | Component $k=1$ | Component $k=2$ |
|---|---|---|
| ${\pi}_{k}(x)$ | $\exp(0.5x)/\{1+\exp(0.5x)\}$ | $1-{\pi}_{1}(x)$ |
| ${m}_{k}(x)$ | $a-\sin(2\pi x)$ | $\cos(3\pi x)$ |
| ${\sigma}_{k}(x)$ | $0.6\exp(0.5x)$ | $0.5\exp(-0.2x)$ |

| Algorithm | RASE${}_{m}$ ($a=1$) | $R^{2}$ ($a=1$) | RASE${}_{m}$ ($a=2$) | $R^{2}$ ($a=2$) | RASE${}_{m}$ ($a=3$) | $R^{2}$ ($a=3$) |
|---|---|---|---|---|---|---|
| **$n=200$** | | | | | | |
| Proposed Algorithm | 0.3604 (0.0849) | 69.7196 (5.0879) | 0.2546 (0.0683) | 77.7391 (3.6825) | 0.2026 (0.043) | 86.2399 (2.0393) |
| Effective Algorithm | 0.4339 (0.1374) | 68.7144 (5.4797) | 0.299 (0.1723) | 77.3533 (4.0877) | 0.2122 (0.0743) | 86.2229 (2.1173) |
| **$n=400$** | | | | | | |
| Proposed Algorithm | 0.3018 (0.0678) | 69.1957 (3.7468) | 0.1929 (0.0427) | 77.7333 (2.6144) | 0.1545 (0.0296) | 86.3577 (1.2879) |
| Effective Algorithm | 0.3987 (0.1359) | 67.7125 (4.0693) | 0.2132 (0.0696) | 77.3875 (2.7089) | 0.157 (0.0396) | 86.3374 (1.3074) |
| **$n=800$** | | | | | | |
| Proposed Algorithm | 0.2533 (0.0494) | 68.9866 (2.7873) | 0.1485 (0.0278) | 77.8396 (1.9831) | 0.059 (0.0119) | 86.3538 (0.9668) |
| Effective Algorithm | 0.3905 (0.1502) | 67.3622 (3.3621) | 0.1671 (0.0439) | 77.5533 (2.0859) | 0.1197 (0.0305) | 86.3476 (0.9778) |

| Algorithm | RASE${}_{m}$ | $n_{trap}$ |
|---|---|---|
| Proposed Algorithm | 0.1944 (0.0445) | 1 |
| Effective Algorithm | 0.2539 (0.0715) | 325 |

| Model | K | h | BIC |
|---|---|---|---|
| NPGMRs | 1 | 0.95 | 770.7226 |
| | **2** | **0.945** | **706.9895** |
| | 3 | 0.945 | 766.3612 |
| | 4 | 0.95 | 819.9092 |
| | 5 | 0.9 | 916.5939 |
| GMLRs | 1 | | 810.1633 |
| | 1 | | 811.2591 |
| | 2 | | 760.8051 |
| | 3 | | 754.7527 |
| | 4 | | 788.4389 |

| Algorithm | BIC | $R^{2}$ (Estimated) | $R^{2}$ Bootstrap Mean (Std) | 95% Bootstrap (Lower) | 95% Bootstrap (Upper) |
|---|---|---|---|---|---|
| Proposed Algorithm | **706.9895** | 83.3556 | 80.7076 (6.1293) | 63.8578 | 89.1703 |
| Effective Algorithm | 718.4571 | 73.8182 | 70.0673 (5.8218) | 57.9395 | 80.5303 |

| Algorithm | Component 1 Test Statistic | Component 1 p-Value | Component 2 Test Statistic | Component 2 p-Value |
|---|---|---|---|---|
| Proposed Algorithm | 0.1622 | 0.3690 | 0.1069 | 0.1441 |
| Effective Algorithm | 0.2589 | <0.0001 | 0.1733 | 0.0690 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Skhosana, S.B.; Kanfer, F.H.J.; Millard, S.M.
Fitting Non-Parametric Mixture of Regressions: Introducing an EM-Type Algorithm to Address the Label-Switching Problem. *Symmetry* **2022**, *14*, 1058.
https://doi.org/10.3390/sym14051058
