# On Two Mixture-Based Clustering Approaches Used in Modeling an Insurance Portfolio


## Abstract


## 1. Introduction

## 2. Methodology

#### 2.1. Mixture-Based Clustering for the Ordered Stereotype Model

**Likelihood functions:** The (incomplete) likelihood of the data is
$$L(\Omega ,\{\tau_g\}\mid \{y_{ij}\})=\prod_{i=1}^{n}\left[\sum_{g=1}^{G}\tau_g\prod_{j=1}^{m}\prod_{k=1}^{q}\left(\theta_{gjk}\right)^{I(y_{ij}=k)}\right].$$
We define the unknown row group memberships through the following indicator latent variables:
$$Z_{ig}=I(i\in g)=\begin{cases}1 & \text{if } i\in g\\ 0 & \text{if } i\notin g\end{cases}\qquad i=1,\dots ,n,\quad g=1,\dots ,G,$$
$$(Z_{i1},\dots ,Z_{iG})\sim \mathrm{Multinomial}(1;\tau_1,\dots ,\tau_G),\qquad i=1,\dots ,n.$$
The complete data log-likelihood is then
$$\ell_c(\Omega ,\{\tau_g\}\mid \{y_{ij}\},\{z_{ig}\})=\sum_{i=1}^{n}\sum_{g=1}^{G}z_{ig}\log(\tau_g)+\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{q}\sum_{g=1}^{G}z_{ig}\,I(y_{ij}=k)\log(\theta_{gjk}).$$
**Parameter estimation:** The parameters for a fixed number of components G are estimated by maximum likelihood via the expectation-maximization (EM) algorithm proposed by Dempster et al. (1977), which is used in most finite mixture problems discussed by McLachlan and Peel (2004).

The EM algorithm alternates two steps: expectation (E-step) and maximization (M-step). In the E-step, the conditional expectation of the complete data log-likelihood is computed given the observed data and the current parameter estimates. In a finite mixture model, the latent data are the component identifiers, so this expectation, taken with respect to the conditional posterior distribution of the latent data, yields the posterior probability that response ${y}_{ij}$ comes from the gth mixture component; it is recomputed at each iteration of the EM algorithm. In the M-step, the component-specific parameter estimates $\Omega$ are found by numerically solving the maximum likelihood problem for each component distribution.

The E-step and M-step alternate until the relative increase in the log-likelihood is no larger than a small pre-specified tolerance, at which point the EM algorithm has converged. To find the optimal number of components, maximum likelihood estimation is carried out for each candidate number of groups G, and the model is selected with a chosen model selection criterion. In this model, the EM algorithm performs a fuzzy (soft) assignment of rows to clusters based on the posterior probabilities.
The EM algorithm is initialized with an estimate $\{{\widehat{\Omega}}^{\left(0\right)},\{{\widehat{\tau}}_{g}^{\left(0\right)}\}\}$ of the parameters and proceeds by alternation of the E-step and M-step to estimate the missing data $\{{\widehat{Z}}_{ig}\}$ and to update the parameter estimates. In this section, we develop the E-step and M-step for row clustering. This development follows closely Fernández et al. (2016) (Section 3).
**E-step:** In the tth iteration of the EM algorithm, the E-step evaluates the expected values $\widehat{Z}_{ig}$ of the unknown classifications $Z_{ig}$, conditional on the data $\{y_{ij}\}$ and the previous parameter estimates $\{\widehat{\Omega}^{(t-1)},\{\widehat{\tau}_g^{(t-1)}\}\}$. The conditional expectation of the complete data log-likelihood at iteration t is
$$\begin{aligned}Q(\Omega ,\{\tau_g\}\mid \widehat{\Omega}^{(t-1)},\{\widehat{\tau}_g^{(t-1)}\})&=E_{\{Z_{ig}\}\mid \{y_{ij}\},\widehat{\Omega}^{(t-1)}}\left[\ell_c(\Omega ,\{\tau_g\}\mid \{y_{ij}\},\{Z_{ig}\})\right]\\ &=\sum_{i=1}^{n}\sum_{g=1}^{G}\log(\widehat{\tau}_g^{(t-1)})\,E\left[z_{ig}\mid \{y_{ij}\},\widehat{\Omega}^{(t-1)}\right]\\ &\quad +\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{q}\sum_{g=1}^{G}I(y_{ij}=k)\log\left(\theta_{gjk}^{(t-1)}\right)E\left[z_{ig}\mid \{y_{ij}\},\widehat{\Omega}^{(t-1)}\right].\end{aligned}$$
The required expectations are the posterior probabilities of cluster membership, obtained via Bayes' theorem:
$$\begin{aligned}\widehat{Z}_{ig}^{(t)}=\Pr\left[Z_{ig}=1\mid \{y_{ij}\},\widehat{\Omega}^{(t-1)}\right]&=\frac{\Pr\left(\{y_{ij}\}\mid z_{ig}=1,\widehat{\Omega}^{(t-1)}\right)\Pr\left(z_{ig}=1\right)}{\sum_{\ell =1}^{G}\Pr\left(\{y_{ij}\}\mid z_{i\ell}=1,\widehat{\Omega}^{(t-1)}\right)\Pr\left(z_{i\ell}=1\right)}\\ &=\frac{\widehat{\tau}_g^{(t-1)}\prod_{j=1}^{m}\prod_{k=1}^{q}\left(\widehat{\theta}_{gjk}^{(t-1)}\right)^{I(y_{ij}=k)}}{\sum_{\ell =1}^{G}\widehat{\tau}_{\ell}^{(t-1)}\prod_{j=1}^{m}\prod_{k=1}^{q}\left(\widehat{\theta}_{\ell jk}^{(t-1)}\right)^{I(y_{ij}=k)}}.\end{aligned}$$
Substituting these into the Q-function gives
$$\widehat{Q}(\Omega ,\{\tau_g\}\mid \widehat{\Omega}^{(t-1)},\{\widehat{\tau}_g^{(t-1)}\})=\sum_{i=1}^{n}\sum_{g=1}^{G}\widehat{Z}_{ig}^{(t)}\log(\widehat{\tau}_g^{(t-1)})+\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{q}\sum_{g=1}^{G}\widehat{Z}_{ig}^{(t)}I(y_{ij}=k)\log\left(\widehat{\theta}_{gjk}^{(t-1)}\right).$$
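The posterior probabilities $\widehat{Z}_{ig}^{(t)}$ above are best computed on the log scale to avoid underflow in the product over items and categories. A minimal numpy sketch (our own illustration, not the authors' implementation; `theta` holds the full array of category probabilities $\theta_{gjk}$):

```python
import numpy as np

def e_step(y, tau, theta):
    """E-step: posterior probabilities Z[i, g] that row i belongs to row
    cluster g, given current estimates tau (G,) and theta (G, m, q).
    y is an (n, m) array of ordinal responses coded 0..q-1."""
    n, m = y.shape
    G = tau.shape[0]
    # log numerator: log tau_g + sum_j log theta_{g, j, y_ij}
    log_num = np.log(tau)[None, :] + np.stack(
        [np.log(theta[g, np.arange(m), y]).sum(axis=1) for g in range(G)],
        axis=1,
    )
    # normalize each row with the log-sum-exp trick for stability
    log_num -= log_num.max(axis=1, keepdims=True)
    Z = np.exp(log_num)
    return Z / Z.sum(axis=1, keepdims=True)
```

Each row of the returned matrix sums to one, giving the fuzzy cluster assignment of that observation.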
**M-step:** The M-step of the EM algorithm is the global maximization of the expected log-likelihood (4) obtained in the E-step, now conditional on the complete data $\{\{y_{ij}\},\{\widehat{Z}_{ig}\}\}$. For finite mixture models, the term containing the row-cluster proportions $\{\tau_1,\dots ,\tau_G\}$ and the term containing the remaining parameters $\Omega$ are maximized independently, so the M-step has two separate parts. The maximum likelihood estimator of $\tau_g$ when the data $Z_{ig}$ are unobserved is
$$\widehat{\tau}_g^{(t)}=\frac{1}{n}\sum_{i=1}^{n}E\left[Z_{ig}\mid \{y_{ij}\},\widehat{\Omega}^{(t-1)}\right]=\frac{1}{n}\sum_{i=1}^{n}\widehat{Z}_{ig}^{(t)},\qquad g=1,\dots ,G,$$
and the remaining parameters are updated by numerical maximization:
$$\widehat{\Omega}^{(t)}=\underset{\Omega}{\mathrm{argmax}}\left[\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{q}\sum_{g=1}^{G}\widehat{Z}_{ig}\,I(y_{ij}=k)\log\left(\theta_{gjk}(\Omega)\right)\right].$$
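For intuition, the two parts of the M-step can be sketched as follows. Note that this illustration updates an *unconstrained* $\theta_{gjk}$ in closed form; in the actual OSM, $\theta_{gjk}$ is a parametric function of $\Omega$ and must be updated by numerical maximization instead.

```python
import numpy as np

def m_step(y, Z, q):
    """M-step for a mixture of independent multinomials: closed-form
    updates for the mixing proportions tau_g and, in this unconstrained
    sketch, the category probabilities theta_{gjk}.
    y: (n, m) ordinal responses coded 0..q-1; Z: (n, G) responsibilities."""
    n, m = y.shape
    G = Z.shape[1]
    tau = Z.mean(axis=0)  # tau_g = (1/n) sum_i Z_ig
    theta = np.empty((G, m, q))
    for g in range(G):
        for j in range(m):
            # responsibility-weighted category frequencies, normalized over k
            counts = np.bincount(y[:, j], weights=Z[:, g], minlength=q)
            theta[g, j] = counts / counts.sum()
    return tau, theta
```

Alternating this with the E-step until the relative log-likelihood gain falls below the tolerance reproduces the EM loop described above.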

#### 2.2. The General Linear Cluster-Weighted Model

**Modeling for** $f(y\mid \mathbf{x},\boldsymbol{\vartheta}_g)$ **and** $f(\mathbf{x};\boldsymbol{\theta}_g)$: The CWM is based on the assumption that $f(y\mid \mathbf{x},\boldsymbol{\vartheta}_g)$ belongs to the exponential family of distributions, which is closely related to GLMs. The link function in Equation (5) relates the expected value $\mu_g$ to the linear predictor: $g(\mu_g)=\beta_{0g}+\beta_{1g}x_1+\dots +\beta_{pg}x_p$. We are interested in estimating the vector $\boldsymbol{\beta}_g$, so the distribution of $y\mid \mathbf{x}$ in component g is denoted by $f(y\mid \mathbf{x},\boldsymbol{\beta}_g,\lambda_g)$, where $\lambda_g$ is an additional parameter associated with a two-parameter exponential family.

The marginal distribution $f(\mathbf{x};\boldsymbol{\theta}_g)$ factors into two components: $f(\mathbf{v};\boldsymbol{\theta}_g')$ for the continuous covariates and $f(\mathbf{w};\boldsymbol{\theta}_g'')$ for the finite discrete covariates. The first component is modeled as a p-variate Gaussian density with mean $\boldsymbol{\mu}_g$ and covariance matrix $\Sigma_g$, written $\varphi(\mathbf{v};\boldsymbol{\mu}_g,\Sigma_g)$. The marginal density $f(\mathbf{w};\boldsymbol{\theta}_g'')$ assumes that each finite discrete covariate $W_r$ is represented as a vector $\mathbf{w}^r=(w^{r1},\dots ,w^{rc_r})'$, where $w^{rs}=1$ if $w_r$ takes the value s, $s\in \{1,\dots ,c_r\}$, and $w^{rs}=0$ otherwise. Then
$$f(\mathbf{w};\boldsymbol{\gamma}_g)=\prod_{r=1}^{q}\prod_{s=1}^{c_r}\left(\gamma_{grs}\right)^{w^{rs}},\qquad g=1,\dots ,G,$$
and the joint density of the mixture is
$$f(\mathbf{x},y;\Phi)=\sum_{g=1}^{G}\tau_g\,f(y\mid \mathbf{x};\boldsymbol{\beta}_g,\lambda_g)\,\varphi(\mathbf{v};\boldsymbol{\mu}_g,\Sigma_g)\,f(\mathbf{w};\boldsymbol{\gamma}_g).$$
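To make the factorized density concrete, here is a small numpy sketch of the joint mixture density in Equation (9) for a Gaussian response component (all function and argument names are our own; the paper's log-normal case corresponds to passing log-losses as `y`):

```python
import numpy as np

def _norm_pdf(x, mean, sd):
    # univariate Gaussian density f(y | x) for one component
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def _mvn_pdf(v, mean, cov):
    # p-variate Gaussian density phi(v; mu_g, Sigma_g)
    p = v.shape[0]
    diff = v - mean
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / np.sqrt(
        (2 * np.pi) ** p * np.linalg.det(cov)
    )

def cwm_density(y, v, w, tau, beta, sd_y, mu, Sigma, gamma):
    """Mixture density f(x, y; Phi): sum over components of
    tau_g * f(y|x) * phi(v) * f(w).  v: (p,) continuous covariates;
    w: list of one-hot vectors, one per discrete covariate;
    beta[g]: (p+1,) regression coefficients (intercept first)."""
    total = 0.0
    for g in range(len(tau)):
        mu_y = beta[g][0] + beta[g][1:] @ v          # linear predictor
        f_y = _norm_pdf(y, mu_y, sd_y[g])
        f_v = _mvn_pdf(v, mu[g], Sigma[g])
        # product over discrete covariates of gamma_{grs} at observed s
        f_w = np.prod([gamma[g][r] @ w[r] for r in range(len(w))])
        total += tau[g] * f_y * f_v * f_w
    return total
```

This is only a density evaluator; the parameters would come from the EM estimation described next.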
**Parameter estimation:** The EM algorithm discussed in the previous section is used to estimate the parameters of this model. Let $(\mathbf{x}_1',y_1)',\dots ,(\mathbf{x}_n',y_n)'$ be a sample of n independent pairs of observations drawn from the model in Equation (9). For this sample, the complete data likelihood function, $L_c(\Phi)$, is given by
$$L_c(\Phi)=\prod_{i=1}^{n}\prod_{g=1}^{G}\left[\tau_g\,f(y_i\mid \mathbf{x}_i,\boldsymbol{\beta}_g,\lambda_g)\,\varphi(\mathbf{v}_i;\boldsymbol{\mu}_g,\Sigma_g)\,f(\mathbf{w}_i;\boldsymbol{\gamma}_g)\right]^{z_{ig}}.$$
Taking the logarithm of Equation (10) gives the complete data log-likelihood,
$$\ell_c(\Phi)=\sum_{i=1}^{n}\sum_{g=1}^{G}z_{ig}\left[\log(\tau_g)+\log f(y_i\mid \mathbf{x}_i,\boldsymbol{\beta}_g,\lambda_g)+\log \varphi(\mathbf{v}_i;\boldsymbol{\mu}_g,\Sigma_g)+\log f(\mathbf{w}_i;\boldsymbol{\gamma}_g)\right],$$
and its conditional expectation at iteration t is
$$Q(\Phi;\Phi^{(t-1)})=\sum_{i=1}^{n}\sum_{g=1}^{G}\tau_{ig}^{(t-1)}\left[\log(\tau_g)+\log f(y_i\mid \mathbf{x}_i,\boldsymbol{\beta}_g,\lambda_g)+\log \varphi(\mathbf{v}_i;\boldsymbol{\mu}_g,\Sigma_g)+\log f(\mathbf{w}_i;\boldsymbol{\gamma}_g)\right].$$
**E-step:** The posterior probability that $(\mathbf{x}_i',y_i)'$ comes from the gth mixture component is calculated at the tth iteration of the EM algorithm as
$$\tau_{ig}^{(t)}=E\left[z_{ig}\mid (\mathbf{x}_i',y_i)',\Phi^{(t)}\right]=\frac{\tau_g^{(t)}f(y_i\mid \mathbf{x}_i,\boldsymbol{\beta}_g^{(t)},\lambda_g^{(t)})\,\varphi(\mathbf{v}_i;\boldsymbol{\mu}_g^{(t)},\Sigma_g^{(t)})\,f(\mathbf{w}_i;\boldsymbol{\gamma}_g^{(t)})}{\sum_{g'=1}^{G}\tau_{g'}^{(t)}f(y_i\mid \mathbf{x}_i,\boldsymbol{\beta}_{g'}^{(t)},\lambda_{g'}^{(t)})\,\varphi(\mathbf{v}_i;\boldsymbol{\mu}_{g'}^{(t)},\Sigma_{g'}^{(t)})\,f(\mathbf{w}_i;\boldsymbol{\gamma}_{g'}^{(t)})}.$$
**M-step:** The Q-function is maximized with respect to $\Phi$, which can be done separately for each term on its right-hand side. As a result, the parameter estimates $\widehat{\tau}_g$, $\widehat{\boldsymbol{\mu}}_g$, $\widehat{\Sigma}_g$, and $\widehat{\boldsymbol{\gamma}}_g$ are obtained in closed form on the $(t+1)$th iteration of the EM algorithm:
$$\begin{aligned}\widehat{\tau}_g^{(t+1)}&=\frac{1}{n}\sum_{i=1}^{n}\tau_{ig}^{(t)},\\ \widehat{\boldsymbol{\mu}}_g^{(t+1)}&=\frac{1}{\sum_{i=1}^{n}\tau_{ig}^{(t)}}\sum_{i=1}^{n}\tau_{ig}^{(t)}\mathbf{v}_i,\\ \widehat{\Sigma}_g^{(t+1)}&=\frac{1}{\sum_{i=1}^{n}\tau_{ig}^{(t)}}\sum_{i=1}^{n}\tau_{ig}^{(t)}(\mathbf{v}_i-\widehat{\boldsymbol{\mu}}_g^{(t+1)})(\mathbf{v}_i-\widehat{\boldsymbol{\mu}}_g^{(t+1)})',\\ \widehat{\gamma}_{grs}^{(t+1)}&=\frac{\sum_{i=1}^{n}\tau_{ig}^{(t)}w_i^{rs}}{\sum_{i=1}^{n}\tau_{ig}^{(t)}}.\end{aligned}$$
The estimates $\widehat{\boldsymbol{\beta}}_g^{(t+1)}$ and $\widehat{\lambda}_g^{(t+1)}$ are obtained by numerically maximizing the weighted component log-likelihood
$$\sum_{i=1}^{n}\tau_{ig}^{(t)}\log f(y_i\mid \mathbf{x}_i,\boldsymbol{\beta}_g,\lambda_g).$$
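The closed-form updates for the covariate part follow directly from these formulas. A numpy sketch (our own illustration; the weighted GLM fit for $\boldsymbol{\beta}_g$ and $\lambda_g$ is omitted):

```python
import numpy as np

def cwm_m_step(v, w_onehot, resp):
    """Closed-form M-step updates for the covariate part of the CWM:
    responsibility-weighted means mu_g, covariances Sigma_g, and
    discrete-covariate probabilities gamma_grs.
    v: (n, p) continuous covariates; w_onehot: (n, c) one-hot matrix for
    a single discrete covariate; resp: (n, G) responsibilities."""
    n, G = resp.shape
    tau = resp.mean(axis=0)
    mu, Sigma, gamma = [], [], []
    for g in range(G):
        r = resp[:, g]
        mu_g = (r[:, None] * v).sum(axis=0) / r.sum()
        d = v - mu_g
        # weighted sum of outer products (v_i - mu_g)(v_i - mu_g)'
        Sigma_g = (r[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0) / r.sum()
        gamma_g = (r[:, None] * w_onehot).sum(axis=0) / r.sum()
        mu.append(mu_g); Sigma.append(Sigma_g); gamma.append(gamma_g)
    return tau, mu, Sigma, gamma
```

With hard (0/1) responsibilities, these reduce to ordinary per-cluster sample means, covariances, and category proportions.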
The estimation is implemented in the **R** language (R Core Team 2016), in a framework similar to that used for mixtures of generalized linear models. For additional details about this implementation, the reader is referred to Wedel and DeSarbo (1995) and Wedel (2002).

#### 2.3. Model Selection Criterion

## 3. Application

#### 3.1. Data

#### 3.2. OSM Results

#### 3.3. CWM Results

The CWM analysis was carried out using the R package **flexCWM**, developed by Mazza et al. (2017). The log-normal CWM was fitted to the following covariates: driver age, car age, density, and exposure. The model selection procedure based on the AIC and the BIC found three mixture components, with mixing probabilities of $0.52$, $0.43$, and $0.05$. Table 4 shows the summary results for log-likelihood, AIC, and BIC. The fitting function selects the best model as the one with the minimum BIC; in our analysis, this is the model with $G=3$, and these results are shown in bold in Table 4. The number of selected components is consistent with the OSM approach.
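The minimum-BIC rule applied here is simple to state in code (a generic sketch with hypothetical inputs, not the flexCWM interface; `n_params` is the number of free parameters of each fitted model):

```python
import numpy as np

def select_G(loglik, n_params, n):
    """Choose the number of components G by minimum BIC.
    loglik and n_params are lists indexed by G = 1, 2, ...;
    n is the sample size.  Returns the selected G."""
    bic = [-2.0 * ll + k * np.log(n) for ll, k in zip(loglik, n_params)]
    return int(np.argmin(bic)) + 1
```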

## 4. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## Appendix A. Model Fitting

**Table A1.** Results of fitting the OSM (1) for the French motor claims by policy data set.

Coefficient | Estimation | S.E. | 95% C.I. |
---|---|---|---|
$\widehat{\mu}_2$ | 0.551 | 0.148 | (0.261, 0.841) |
$\widehat{\mu}_3$ | −0.219 | 0.171 | (−0.554, 0.116) |
$\widehat{\mu}_4$ | 2.533 | 0.224 | (2.094, 2.972) |
$\widehat{\mu}_5$ | −1.702 | 0.160 | (−2.016, −1.388) |
$\widehat{\alpha}_1$ | 1.096 | 0.210 | (0.684, 1.508) |
$\widehat{\alpha}_2$ | 0.044 | 0.125 | (−0.201, 0.289) |
$\widehat{\beta}_1$ | −2.188 | 0.143 | (−2.468, −1.908) |
$\widehat{\beta}_2$ | −2.631 | 0.199 | (−3.021, −2.241) |
$\widehat{\beta}_3$ | −0.002 | 0.190 | (−0.374, 0.370) |
$\widehat{\beta}_4$ | 1.673 | 0.172 | (1.336, 2.010) |
$\widehat{\varphi}_2$ | 3.636 | 0.209 | (3.226, 4.046) |
$\widehat{\varphi}_3$ | 4.855 | 0.193 | (4.477, 5.233) |
$\widehat{\varphi}_4$ | 4.990 | 0.154 | (4.688, 5.292) |

**Table A2.** Results for the CWM model. Significance codes for the p-value of each estimated coefficient: $\approx 0$ (***), 0.001 (**), 0.01 (*), and 0.05 (.).

**Cluster 1**

Coefficient | Estimation | S.E. | p-Value | |
---|---|---|---|---|
Intercept | 7.3496 | 0.2765 | $<2.2\times 10^{-16}$ | *** |
DriverAge2 | −0.3275 | 0.2137 | 0.1634 | |
DriverAge3 | −0.2088 | 0.1963 | 0.2877 | |
DriverAge4 | −0.0451 | 0.1925 | 0.8146 | |
DriverAge5 | 0.4828 | 0.2812 | 0.0863 | . |
CarAge2 | −0.1575 | 0.2119 | 0.4574 | |
CarAge3 | 0.0086 | 0.2001 | 0.9653 | |
CarAge4 | −0.1845 | 0.2088 | 0.3770 | |
CarAge5 | −0.4929 | 0.2939 | 0.0938 | . |
Density | 0.0004 | 0.0001 | $5.008\times 10^{-5}$ | *** |
Exposure | −0.8287 | 0.1401 | $4.332\times 10^{-9}$ | *** |

**Cluster 2**

Coefficient | Estimation | S.E. | p-Value | |
---|---|---|---|---|
Intercept | 7.0694 | 0.0088 | $<2.2\times 10^{-16}$ | *** |
DriverAge2 | −0.0244 | 0.0084 | 0.0381 | ** |
DriverAge3 | −0.0157 | 0.0066 | 0.0177 | * |
DriverAge4 | −0.0095 | 0.0074 | 0.1412 | |
DriverAge5 | 0.0008 | 0.0078 | 0.9186 | |
CarAge2 | 0.0051 | 0.7986 | 0.4246 | |
CarAge3 | 0.0118 | 2.0162 | 0.0439 | * |
CarAge4 | 0.0101 | 0.0060 | 0.0970 | . |
CarAge5 | 0.0113 | 0.0077 | 0.1440 | |
Density | $3.2818\times 10^{-6}$ | $3.0435\times 10^{-6}$ | 0.2811 | |
Exposure | −0.0051 | 0.0047 | 0.2815 | |

**Cluster 3**

Coefficient | Estimation | S.E. | p-Value | |
---|---|---|---|---|
Intercept | 3.3979 | 0.2561 | $<2.2\times 10^{-16}$ | *** |
DriverAge2 | 1.2945 | 0.1588 | $8.84\times 10^{-16}$ | *** |
DriverAge3 | 1.2333 | 0.1382 | $<2.2\times 10^{-16}$ | *** |
DriverAge4 | 1.1096 | 0.1295 | $<2.2\times 10^{-16}$ | *** |
DriverAge5 | 2.6965 | 0.2028 | $<2.2\times 10^{-16}$ | *** |
CarAge2 | 0.6748 | 0.2105 | 0.0013 | ** |
CarAge3 | 1.9939 | 0.18853 | $<2.2\times 10^{-16}$ | *** |
CarAge4 | 1.8501 | 0.19410 | $<2.2\times 10^{-16}$ | *** |
CarAge5 | 2.7567 | 0.26130 | $<2.2\times 10^{-16}$ | *** |
Density | $1.7878\times 10^{-4}$ | $4.3673\times 10^{-5}$ | $4.520\times 10^{-5}$ | *** |
Exposure | $6.2711\times 10^{-2}$ | 0.5266 | 0.5985 | |

## Appendix B. Average Scores for Scatter Plots

## References

- Agresti, Alan. 2010. Analysis of Ordinal Categorical Data, 2nd ed. Wiley Series in Probability and Statistics. New York: Wiley. [Google Scholar]
- Akaike, Hirotugu. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19: 716–23. [Google Scholar] [CrossRef]
- Anderson, John A. 1984. Regression and ordered categorical variables. Journal of the Royal Statistical Society Series B (Methodological) 46: 1–30. [Google Scholar]
- Baribeau, Annmarie Geddes. 2016. Predictive modeling: The quest for data gold. Actuarial Review. Available online: https://www.casact.org/pubs/New-AR/AR_Nov-Dec_2016.pdf (accessed on 7 March 2018).
- Bermúdez, Lluís, and Dimitris Karlis. 2012. A finite mixture of bivariate Poisson regression models with an application to insurance ratemaking. Computational Statistics & Data Analysis 56: 3988–99. [Google Scholar]
- Böhning, Dankmar, Wilfried Seidel, Marco Alfò, Bernard Garel, Valentin Patilea, and Günther Walther. 2007. Advances in mixture models. Computational Statistics & Data Analysis 51: 5205–10. [Google Scholar]
- Brown, Garfield O., and Winston S. Buckley. 2015. Experience rating with Poisson mixtures. Annals of Actuarial Science 9: 304–21. [Google Scholar] [CrossRef]
- Charpentier, Arthur. 2014. Computational Actuarial Science with R. Boca Raton: CRC Press. [Google Scholar]
- Chen, Lien-Chin, Philip S. Yu, and Vincent S. Tseng. 2011. A weighted fuzzy-based biclustering method for gene expression data. International Journal of Data Mining and Bioinformatics 5: 89–109. [Google Scholar] [CrossRef] [PubMed]
- Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM-algorithm. Journal of the Royal Statistical Society B 39: 1–38. [Google Scholar]
- Dutang, Christophe, and Arthur Charpentier. 2016. CASdatasets, R package version 1.0-6; Available online: http://cas.uqam.ca/ (accessed on 6 March 2018).
- Everitt, Brian S., Morven Leese, and Sabine Landau. 2001. Cluster Analysis, 4th ed. London: Hodder Arnold Publication. [Google Scholar]
- Fernández, Daniel, Richard Arnold, and Shirley Pledger. 2016. Mixture-based clustering for the ordered stereotype model. Computational Statistics & Data Analysis 93: 46–75. [Google Scholar]
- Fernández, Daniel, Shirley Pledger, and Richard Arnold. 2014. Introducing Spaced Mosaic Plots. Research Report Series 14-3; Wellington: School of Mathematics, Statistics and Operations Research, VUW, ISSN 1174-2011. [Google Scholar]
- Fraley, Chris, and Adrian E. Raftery. 2002. Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97: 611–31. [Google Scholar] [CrossRef]
- Garrido, José, Christian Genest, and Juliana Schulz. 2016. Generalized linear models for dependent frequency and severity of insurance claims. Insurance: Mathematics and Economics 70: 205–15. [Google Scholar] [CrossRef]
- Gershenfeld, Neil. 1997. Nonlinear inference and cluster-weighted modeling. Annals of the New York Academy of Sciences 808: 18–24. [Google Scholar] [CrossRef]
- Green, Peter J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–32. [Google Scholar] [CrossRef]
- Grün, Bettina, and Friedrich Leisch. 2008. Finite Mixtures of Generalized Linear Regression Models. Berlin: Springer. [Google Scholar]
- Hubert, Lawrence, and Phipps Arabie. 1985. Comparing partitions. Journal of Classification 2: 193–218. [Google Scholar] [CrossRef]
- Ingrassia, Salvatore, Antonio Punzo, Giorgio Vittadini, and Simona C. Minotti. 2015. The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification 32: 85–113. [Google Scholar] [CrossRef]
- Jobson, John D. 1992. Applied Multivariate Data Analysis: Categorical and Multivariate Methods. Springer Texts in Statistics. Berlin: Springer. [Google Scholar]
- Johnson, Stephen C. 1967. Hierarchical clustering schemes. Psychometrika 2: 241–54. [Google Scholar] [CrossRef]
- Kaufman, Leonard, and Peter J. Rousseeuw. 1990. Finding Groups in Data an Introduction to Cluster Analysis. New York: Wiley. [Google Scholar]
- Klugman, Stuart, and Jacques Rioux. 2006. Toward a unified approach to fitting loss models. North American Actuarial Journal 10: 63–83. [Google Scholar] [CrossRef]
- Kraskov, Alexander, Harald Stögbauer, Ralph Gregor Andrzejak, and Peter Grassberger. 2005. Hierarchical clustering using mutual information. EPL (Europhysics Letters) 70: 278–84. [Google Scholar] [CrossRef]
- Lee, Simon C. K., and X. Sheldon Lin. 2010. Modeling and evaluating insurance losses via mixtures of Erlang distributions. North American Actuarial Journal 14: 107–30. [Google Scholar] [CrossRef]
- Lewis, S. J. G., Thomas Foltynie, Andrew D. Blackwell, Trevor W. Robbins, Adrian M. Owen, and Roger A. Barker. 2003. Heterogeneity of Parkinson's disease in the early clinical stages using a data driven approach. Journal of Neurology, Neurosurgery and Psychiatry 76: 343–48. [Google Scholar] [CrossRef] [PubMed]
- Liu, Ivy, and Alan Agresti. 2005. The analysis of ordered categorical data: An overview and a survey of recent developments. TEST: An Official Journal of the Spanish Society of Statistics and Operations Research 14: 1–73. [Google Scholar] [CrossRef]
- Manly, Bryan F.J. 2005. Multivariate Statistical Methods: A Primer. Boca Raton: Chapman & Hall/CRC Press. [Google Scholar]
- Mazza, Angelo, Antonio Punzo, and Salvatore Ingrassia. 2017. flexCWM, R package version 1.7; Available online: https://cran.r-project.org/web/packages/flexCWM/index.html (accessed on 6 March 2018).
- McCullagh, Peter. 1980. Regression models for ordinal data. Journal of the Royal Statistical Society 42: 109–42. [Google Scholar]
- McCullagh, Peter, and John A Nelder. 1989. Generalized Linear Models, 2nd ed. London: Chapman & Hall. [Google Scholar]
- McCune, Bruce, and James B. Grace. 2002. Analysis of Ecological Communities. Gleneden Beach: MjM Software Design, vol. 28. [Google Scholar]
- McLachlan, Geoffrey, and David Peel. 2004. Finite Mixture Models. Hobuken: John Wiley & Sons. [Google Scholar]
- McLachlan, Geoffrey J., and Kaye E. Basford. 1988. Mixture Models: Inference and Applications to Clustering. Statistics, Textbooks and Monographs. New York: M. Dekker. [Google Scholar]
- Meila, Marina. 2005. Comparing clusterings: An axiomatic view. Paper presented at the 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, August 7–11; pp. 577–84. [Google Scholar]
- Melnykov, Volodymyr, and Ranjan Maitra. 2010. Finite mixture models and model-based clustering. Statistics Surveys 4: 80–116. [Google Scholar] [CrossRef]
- Miljkovic, Tatjana, and Bettina Grün. 2016. Modeling loss data using mixtures of distributions. Insurance: Mathematics and Economics 70: 387–96. [Google Scholar] [CrossRef]
- Miljkovic, T. 2017. Computational Actuarial Science With R. Journal of Risk and Insurance 84: 267. [Google Scholar]
- Nelder, John Ashworth, and Robert W. M. Wedderburn. 1972. Generalized linear models. Journal of the Royal Statistical Society Series A (General) 135: 370–84. [Google Scholar] [CrossRef]
- Pledger, Shirley, and Richard Arnold. 2014. Multivariate methods using mixtures: Correspondence analysis, scaling and pattern-detection. Computational Statistics and Data Analysis 71: 241–61. [Google Scholar] [CrossRef]
- Quinn, Gerry P., and Michael J. Keough. 2002. Experimental Design and Data Analysis for Biologists. Cambridge: Cambridge University Press. [Google Scholar]
- R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. [Google Scholar]
- Schwarz, Gideon. 1978. Estimating the dimension of a model. The Annals of Statistics 6: 461–64. [Google Scholar] [CrossRef]
- Shi, Peng, Xiaoping Feng, and Anastasia Ivantsova. 2015. Dependent frequency severity modeling of insurance claims. Insurance: Mathematics and Economics 64: 417–28. [Google Scholar] [CrossRef]
- Verbelen, Roel, Lan Gong, Katrien Antonio, Andrei Badescu, and Sheldon Lin. 2014. Fitting mixtures of Erlangs to censored and truncated data using the EM algorithm. ASTIN Bulletin 45: 729–58. [Google Scholar] [CrossRef] [Green Version]
- Wedel, Michel. 2002. Concomitant variables in finite mixture modeling. Statistica Neerlandica 56: 362–75. [Google Scholar] [CrossRef]
- Wedel, Michel, and Wayne S. DeSarbo. 1995. A mixture likelihood approach for generalized linear models. Journal of Classification 12: 21–55. [Google Scholar] [CrossRef]
- Werner, Geoff, and Claudine Modlin. 2016. Basic Ratemaking. Arlington: Casualty Actuarial Society. [Google Scholar]
- Wu, Han-Ming, ShengLi Tzeng, and Chun-houh Chen. 2007. Matrix visualization. In Handbook of Data Visualization. Berlin: Springer, pp. 681–708. [Google Scholar]
- Wu, Xindong, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, and et al. 2008. Top 10 algorithms in data mining. Knowledge and Information Systems 14: 1–37. [Google Scholar] [CrossRef]

**Figure 1.** Scatter plot (**left**) depicting the clustering composition for $R=3$ claim clusters. Different color and shape symbols represent the clusters: Cluster 1 (square), Cluster 2 (circle), and Cluster 3 (triangle). The bar plot (**right**) displays the profile of the claims in each cluster. The percentage represents the probability $\theta_{gjk}$ in each category (Equation (2)).

**Figure 2.** Spaced mosaic plot for the row clustering model $G=3$. The height of each block is proportional to the number of claims in each claim cluster; the width is proportional to the numbers of each ordinal response within each cluster. The area represents the frequency of each combination, also shown numerically in each block. The relative spacing between ordinal categories (e.g., 2.636 between 0 and 1, shown by the yellow, red, and green bars) has been determined by the data.

**Figure 3.** Bar plots displaying the profile of the losses in each cluster G = 1:3, by driver age (**left**) and by car age (**right**).

**Table 1.** Summary of the variables used in the cluster-weighted model (CWM) and the ordered stereotype model (OSM).

**CWM**

Variable Name | Description with Categorical Levels in Parenthesis |
---|---|
Driver Age | <23 (1), [23, 27) (2), [27, 43) (3), [43, 75) (4), and 75+ (5) |
Car Age | <1 (1), [1, 5) (2), [5, 10) (3), [10, 15) (4), and 15+ (5) |
Density | continuous |
Exposure | continuous |
Losses | continuous |

**OSM**

Variable Name | Description with Ordinal Levels in Parenthesis |
---|---|
Driver Age | <23 (5), [23, 27) (4), [27, 43) (3), [43, 75) (2), and 75+ (1) |
Car Age | <1 (1), [1, 5) (2), [5, 10) (3), [10, 15) (4), and 15+ (5) |
Exposure | <0.25 (1), [0.25, 0.50) (2), [0.50, 0.75) (3), [0.75, 1.00) (4), and 1.00+ (5) |
Density | <40 (1), [40, 200) (2), [200, 500) (3), [500, 4500) (4), and 4500+ (5) |
Losses | <1000 (1), [1000, 2000) (2), [2000, 50,000) (3), [50,000, 100,000) (4), and 100,000+ (5) |

G | Loglik | AIC | BIC |
---|---|---|---|
1 | −12,155 | 24,453 | 24,599 |
2 | −12,081 | 24,188 | 24,276 |
3 | −11,777 | 23,584 | 23,685 |
4 | −12,773 | 25,580 | 25,695 |
5 | −12,851 | 25,641 | 25,769 |

G | Loss | Driver Age | Exposure | Car Age | Density |
---|---|---|---|---|---|
1 | 3.22 | 4.63 | 4.52 | 4.57 | 3.57 |
2 | 1.95 | 4.81 | 3.96 | 4.38 | 1.88 |
3 | 1.78 | 4.90 | 3.20 | 3.23 | 1.94 |

**Table 4.** Summary of the log-likelihood, AIC, and BIC for the CWM; the selected model (minimum BIC, $G=3$) is shown in bold.

G | Loglik | AIC | BIC |
---|---|---|---|
1 | −12,495 | 25,025 | 25,112 |
2 | −11,956 | 23,229 | 23,394 |
**3** | **−11,064** | **22,222** | **22,464** |
4 | −10,801 | 22,200 | 22,519 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Miljkovic, T.; Fernández, D.
On Two Mixture-Based Clustering Approaches Used in Modeling an Insurance Portfolio. *Risks* **2018**, *6*, 57.
https://doi.org/10.3390/risks6020057
