Mixed Poisson Regression Models with Varying Dispersion Arising from Non-Conjugate Mixing Distributions

Tzougas, George; Hong, Natalia; Ho, Ryan

doi:10.3390/a15010016


by George Tzougas ^1,2,*, Natalia Hong ^1,† and Ryan Ho ^1,†

¹ Department of Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh TD1 3HE, UK

² Department of Statistics, London School of Economics and Political Science, London TD1 3HE, UK

^* Author to whom correspondence should be addressed.

^† Work undertaken whilst at the London School of Economics.

Algorithms 2022, 15(1), 16; https://doi.org/10.3390/a15010016

Submission received: 1 November 2021 / Revised: 16 December 2021 / Accepted: 28 December 2021 / Published: 30 December 2021

(This article belongs to the Special Issue Stochastic Algorithms and Their Applications)


Abstract:

In this article, we present a class of mixed Poisson regression models with varying dispersion, arising from mixing distributions that are not conjugate to the Poisson, for modelling overdispersed claim counts in non-life insurance. The proposed family of models, combined with the adopted modelling framework, can provide sufficient flexibility for dealing with different levels of overdispersion. For illustrative purposes, the Poisson-lognormal regression model with regression structures on both its mean and dispersion parameters is employed for modelling claim count data from a motor insurance portfolio. Maximum likelihood estimation is carried out via an expectation-maximization type algorithm, which is developed for the proposed family of models and is demonstrated to perform satisfactorily.

Keywords:

EM algorithm; regression structures on the mean and dispersion parameters; non-life insurance; claim frequency

1. Introduction

During the last three decades, mixed Poisson regression models have been applied in various fields of study, including non-life insurance, for modelling overdispersed claim count data. The members of this family that are constructed from a mixing distribution which is conjugate to the Poisson distribution, such as the negative binomial and the Poisson–inverse-Gaussian, have been the most popular choices due to the simplicity of their log-likelihood functions, which can be easily maximized using the standard maximum likelihood (ML) estimation approach. See, for example, Refs. [1,2,3] for the former and [4,5,6] for the latter, among many others. However, it should be noted that the assumption of conjugacy can be very restrictive for constructing a mixed Poisson model that will be able to efficiently capture different levels of overdispersion in real claim count data sets. In particular, as is well known, overdispersion is a direct consequence of unobserved heterogeneity due to systematic effects in the data. For example, in motor insurance, differences in the driving skills, preferences, habits, and driving experience of policyholders may lead to extra variation in the claim count data, the degree of which is controlled by the value of the dispersion parameter of the mixed Poisson model. Furthermore, overdispersion can be caused either by a large presence of zeros or by a heavy tail in the data. Regarding the latter case, as is well known, the tails of mixed Poisson distributions in the case of continuous mixing distributions are similar to the tails of their mixing distributions (see, for instance, Ref. [7]). Therefore, restricting attention only to mixed Poisson models derived from mixing densities that are conjugate to the Poisson may result in biased parameter estimates because of their inability to always efficiently model the tail of the claim count distribution.
This, in turn, may have a profound impact on many tasks which are carried out by actuaries, such as risk management and pricing of (re-)insurance contracts. Thus, it becomes clear that an important task of actuaries is to be able to design more representative probabilistic models for the number of claims with good prediction accuracy. This procedure depends on the reliability of the statistical method which will be used to construct them.

The aim of this article is to present a general class of mixed Poisson regression models with varying dispersion stemming from non-conjugate $C^{2}$ mixing densities, that is, densities with a continuous first derivative and a continuous second derivative. The class of mixed Poisson models we consider is very wide, and the flexibility it provides in (i) the distributional choice for the mixing density and (ii) modelling jointly the mean and dispersion parameters as parametric functions of risk factors allows us to add the required amount of weight to the right tail area of the claim count distribution for accommodating different levels of overdispersion, thus resulting in an improved risk evaluation. At this point, it should be noted that, with the exception of very few articles, such as those by [8,9,10], modelling jointly all the parameters of mixed Poisson models in terms of explanatory variables has not been explored in depth. Nevertheless, mean regression models often cannot adequately account for the heteroscedasticity of the claim count distribution or its possible dependence on risk factors. In addition, note that in [9], the exponential family distribution assumption for the univariate response variable is relaxed and replaced by a general distribution family, including distributions based on Box–Cox transformations (such as the Box–Cox t-distribution or the Box–Cox power exponential distribution) and zero-adjusted distributions. However, to the best of our knowledge, this is the first study to consider regression structures on the mean and dispersion parameters of univariate mixed Poisson models that have a probability mass function (pmf) which cannot be written in closed form. For demonstration purposes, the Poisson-lognormal (PLN) regression model with varying dispersion is fitted on a motor third-party liability (MTPL) insurance claim count data set using an expectation-maximization (EM) type algorithm. The algorithm exploits the stochastic mixture representation of the proposed family of models to handle, in an easy and efficient manner, a density that cannot be written in closed form.
Moreover, it is worth noting that the development of stochastic algorithms, such as the EM and stochastic gradient descent algorithms, is of particular importance in machine learning and artificial intelligence applications, as they can be employed for efficiently calibrating various statistical models and deep networks. Two very interesting recent articles are those by [11,12].

The rest of this article is structured as follows. In Section 2, we present the derivation of the proposed mixed Poisson regression model with varying dispersion for claim counts. Section 3 deals with the ML estimation procedure for the PLN regression model with varying dispersion based on the proposed EM-type algorithm. Section 4 contains an application to the MTPL claim count data, in which we fit the PLN claim count regression model with varying dispersion. In addition, the negative binomial regression model with varying dispersion and the zero-inflated Poisson (ZIP) regression model are used as benchmarks for comparison. Finally, concluding remarks are provided in Section 5.

2. Mixed Poisson Regression Model with Varying Dispersion

2.1. Modelling Framework

Consider a non-life insurance portfolio with n policyholder contracts each involving a particular claim type, and assume that the individual claim frequencies, $k_{i}$, arising from each insured i, for $i = 1, \dots, n$, are independent. In addition, suppose that given a continuous random variable, $z_{i} > 0$, $k_{i} | z_{i}$ is distributed according to a Poisson distribution with probability mass function (pmf) given by

$P (k_{i} | z_{i}) = \frac{exp (- μ_{i} z_{i}) {(μ_{i} z_{i})}^{k_{i}}}{k_{i}!},$

(1)

for $k_{i} = 0, 1, 2, \dots$, and where $μ_{i} > 0$. The mean and variance of $k_{i} | z_{i}$ are $E (k_{i} | z_{i}) = Var (k_{i} | z_{i}) = μ_{i} z_{i}$.

Furthermore, consider that $z_{i}$ is distributed according to a $C^{2}$ mixing distribution, with probability density function (pdf) $g (z_{i}; ϕ_{i})$, with $ϕ_{i} > 0$, which is not conjugate to the Poisson distribution given by Equation (1). Additionally, we assume that $z_{i}$ has a unit mean, that is, $E (z_{i}) = 1$, to ensure that the model is identifiable.

Considering the previous assumptions, we see that the resulting distribution of $k_{i}$ is a mixed Poisson distribution with pmf

$P (k_{i}) = \int_{0}^{\infty} P (k_{i} | z_{i}) g (z_{i}; ϕ_{i}) d z_{i},$

(2)

where $g (z_{i}; ϕ_{i})$ is the pdf of $z_{i}$. In addition, the mean and the variance of $k_{i}$ are given by

$E (k_{i}) = E_{z_{i}} [E (k_{i} | z_{i})] = μ_{i} E_{z_{i}} [z_{i}] = μ_{i}$

(3)

and

$Var (k_{i}) = E_{z_{i}} [Var (k_{i} | z_{i})] + {Var}_{z_{i}} [E (k_{i} | z_{i})] .$

(4)

Finally, under the proposed modelling framework, the mean and dispersion parameters, $μ_{i}$ and $ϕ_{i}$, are modelled as functions of risk factors

$μ_{i} = exp (x_{1, i}^{⊤} β_{1})$

(5)

and

$ϕ_{i} = exp (x_{2, i}^{⊤} β_{2}),$

(6)

where $x_{1, i}$ and $x_{2, i}$ are the, potentially different, vectors of explanatory variables with dimensions $p_{1} \times 1$ and $p_{2} \times 1$, respectively, and where $β_{1} = {(β_{1, 1}, \dots, β_{1, p_{1}})}^{⊤}$ and $β_{2} = {(β_{2, 1}, \dots, β_{2, p_{2}})}^{⊤}$ are vectors of regression coefficients. The matrices $X_{1}$ and $X_{2}$, whose rows are given by $x_{1, i}^{⊤}$ and $x_{2, i}^{⊤}$, respectively, are assumed to be of full rank.
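As a minimal sketch of the log links in Equations (5) and (6), the Python snippet below evaluates $μ_{i}$ and $ϕ_{i}$ for a single policyholder. The function names and all coefficient values are hypothetical and chosen only for illustration; they are not estimates from this article.

```python
import math


def linear_predictor(x, beta):
    """Inner product x^T beta for one observation."""
    if len(x) != len(beta):
        raise ValueError("covariate and coefficient vectors differ in length")
    return sum(x_j * b_j for x_j, b_j in zip(x, beta))


def mean_and_dispersion(x1, x2, beta1, beta2):
    """Return (mu_i, phi_i) under the log links of Equations (5) and (6)."""
    mu = math.exp(linear_predictor(x1, beta1))
    phi = math.exp(linear_predictor(x2, beta2))
    return mu, phi


# Hypothetical example: an intercept plus one dummy variable in each predictor.
beta1 = [-1.5, 0.3]  # illustrative values only
beta2 = [0.2, -0.4]  # illustrative values only
mu_i, phi_i = mean_and_dispersion([1.0, 1.0], [1.0, 0.0], beta1, beta2)
```

Because both links are exponential, any real-valued coefficient vector yields strictly positive $μ_{i}$ and $ϕ_{i}$, so no constraints are needed during optimization.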

2.2. Model Specification: The Poisson-Lognormal Regression Model with Varying Dispersion

For expository purposes, we specify the lognormal distribution as the mixing distribution of $z_{i}$ with the following pdf

$g (z_{i}; ϕ_{i}) = \frac{1}{\sqrt{2 π} ϕ_{i} z_{i}} exp [- \frac{1}{2 ϕ_{i}^{2}} {(log (z_{i}) + \frac{ϕ_{i}^{2}}{2})}^{2}],$

(7)

where $z_{i} > 0$ and $ϕ_{i} > 0$, with mean $E (z_{i}) = 1$ and variance $Var (z_{i}) = exp (ϕ_{i}^{2}) - 1$.

Then, based on Equations (1) and (7), it is easy to see that the resulting distribution of $k_{i}$ is the Poisson-lognormal (PLN) distribution with pmf

$P (k_{i}) = \int_{0}^{\infty} \frac{exp (- μ_{i} z_{i}) {(μ_{i} z_{i})}^{k_{i}}}{k_{i}!} \frac{exp [- \frac{1}{2 ϕ_{i}^{2}} {(log (z_{i}) + \frac{ϕ_{i}^{2}}{2})}^{2}]}{\sqrt{2 π} ϕ_{i} z_{i}} d z_{i},$

(8)

where $μ_{i} > 0$ and $ϕ_{i} > 0$ are given by Equations (5) and (6), respectively. Unfortunately, since $g (z_{i}; ϕ_{i})$ is not conjugate to the Poisson, the integral in Equation (8) is mathematically intractable, but it can be easily calculated using numerical integration.
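One way to carry out the numerical integration in Equation (8) is sketched below in Python. It is an illustration under stated assumptions, not the implementation used in this article: the substitution $t = log (z_{i})$ turns the lognormal density into a normal density with mean $- ϕ_{i}^{2} / 2$ and standard deviation $ϕ_{i}$, and the integral is then approximated by the trapezoidal rule on a grid truncated at a hypothetical `width` of 10 standard deviations.

```python
import math


def pln_pmf(k, mu, phi, n_grid=2001, width=10.0):
    """Approximate the PLN pmf of Equation (8) with the trapezoidal rule.

    The integration variable is t = log(z), so the mixing density becomes
    a normal density with mean -phi^2 / 2 and standard deviation phi.
    """
    centre = -0.5 * phi ** 2
    lo = centre - width * phi
    step = 2.0 * width * phi / (n_grid - 1)
    log_norm_const = -math.log(phi * math.sqrt(2.0 * math.pi))
    total = 0.0
    for j in range(n_grid):
        t = lo + j * step
        # log of the Poisson pmf with mean mu * exp(t)
        log_pois = -mu * math.exp(t) + k * (math.log(mu) + t) - math.lgamma(k + 1)
        # log of the normal density of t
        log_norm = log_norm_const - 0.5 * ((t - centre) / phi) ** 2
        weight = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * math.exp(log_pois + log_norm)
    return total * step
```

Working on the log scale inside the loop keeps the factorial and the power $z_{i}^{k_{i}}$ from overflowing for larger counts. Gauss–Hermite quadrature would be a natural alternative to the fixed grid.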

Using the results in Equations (3) and (4), we calculate the mean and the variance of the PLN regression model with varying dispersion

$E (k_{i}) = E_{z_{i}} [E (k_{i} | z_{i})] = μ_{i} E_{z_{i}} [z_{i}] = μ_{i}$

(9)

and

$Var (k_{i}) = E_{z_{i}} [Var (k_{i} | z_{i})] + {Var}_{z_{i}} [E (k_{i} | z_{i})] = μ_{i} + μ_{i}^{2} [exp (ϕ_{i}^{2}) - 1] .$

(10)
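Equations (9) and (10) can be checked empirically by simulating from the mixture representation: draw $z_{i}$ from the unit-mean lognormal and then $k_{i}$ from a Poisson distribution with mean $μ_{i} z_{i}$. The Python sketch below does this with the standard library only; the function names, the seed, and the use of Knuth's multiplication method for the Poisson draw are illustrative choices rather than anything prescribed in this article.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Intended for small to moderate lam, since exp(-lam) underflows for very
    large values.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_pln(n, mu, phi, seed=0):
    """Simulate n PLN counts through the Poisson-lognormal mixture."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # log(z) ~ N(-phi^2 / 2, phi^2), so that E(z) = 1 as in Equation (7)
        z = rng.lognormvariate(-0.5 * phi ** 2, phi)
        counts.append(sample_poisson(mu * z, rng))
    return counts


def sample_mean_var(counts):
    """Sample mean and unbiased sample variance of a list of counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var
```

For a large simulated sample, the two values returned by `sample_mean_var` should lie close to $μ_{i}$ and $μ_{i} + μ_{i}^{2} [exp (ϕ_{i}^{2}) - 1]$, respectively, up to Monte Carlo error.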

3. Statistical Inference: The EM-Type Algorithm

Let $(k_{i}, x_{1, i}, x_{2, i})$, $i = 1, \dots, n$, be a sample of independent observations, where $k_{i}$ is the response variable, and $x_{1, i}$ and $x_{2, i}$ are the vectors of explanatory variables with dimensions $p_{1} \times 1$ and $p_{2} \times 1$, respectively. In addition, suppose that the data are produced according to the mixed Poisson model with varying dispersion. Then, the log-likelihood of the model can be written as

$l (θ) = \sum_{i = 1}^{n} log (P (k_{i})),$

(11)

where $θ = {(β_{1}^{⊤}, β_{2}^{⊤})}^{⊤}$ is the vector of the parameters, and where $P (k_{i})$ is the pmf of the model which is given by Equation (2). It should be noted that the likelihood given by Equation (11) is cumbersome to maximize since it is not usually tractable. Moreover, when the mean and dispersion parameters are modelled in terms of risk factors, additional computational challenges can be encountered.

However, ML estimation can be accomplished relatively easily via an EM-type algorithm. In particular, if the unobserved data $z_{i}$ are augmented to the observed data $(k_{i}, x_{1, i}, x_{2, i})$, for $i = 1, \dots, n$, then the complete data log-likelihood factorizes into two parts

$l_{c} (θ) = \sum_{i = 1}^{n} [- μ_{i} z_{i} + k_{i} log (μ_{i}) + k_{i} log (z_{i}) - log (k_{i}!)] + \sum_{i = 1}^{n} log (g (z_{i}; ϕ_{i})),$

(12)

where $g (z_{i}; ϕ_{i})$ is the pdf of the mixing distribution which is not conjugate to the Poisson, and where $μ_{i}$ and $ϕ_{i}$ are given by Equations (5) and (6), respectively.

The EM-type algorithm for the mixed Poisson regression model with varying dispersion can be described as follows

E-Step: The Q-function, which is the conditional expectation of the complete data log-likelihood, is given by

$\begin{matrix} Q (θ; θ^{(r)}) & = E_{z_{i}} (l_{c} (θ) | K, θ^{(r)}) \propto \end{matrix}$

(13)

$\begin{matrix} \sum_{i = 1}^{n} [- μ_{i}^{(r)} E_{z_{i}} (z_{i} | k_{i}; θ^{(r)}) + k_{i} log (μ_{i}^{(r)})] \end{matrix}$

(14)

$\begin{matrix} + \sum_{i = 1}^{n} E_{z_{i}} [log (g (z_{i}; ϕ_{i}^{(r)}))] \end{matrix}$

(15)

where $θ^{(r)}$ is the estimate of $θ$ at the rth iteration in the E-step of our EM algorithm. Then, using the estimates $θ^{(r)}$ , calculate the pseudo-values $w_{1_{i}} = E_{z_{i}} (z_{i} | k_{i}; θ^{(r)})$ and $w_{k_{i}} = E_{z_{i}} (s_{k} (Z_{i}) | k_{i}; θ^{(r)})$ for $i = 1, \dots, n$ and $k = 1, \dots, ν$ , where $s_{k} (.)$ are certain functions which are involved in the terms needed for maximizing the part of the Q-function which corresponds to the conditional expectation of the log-likelihood of the mixing distribution $g (z_{i}; ϕ_{i})$ .
M-Step: Using the pseudo-values $w_{1_{i}}$ and $w_{k_{i}}$ from the E-Step and the Newton–Raphson algorithm twice, find the global maximum point $θ^{(r + 1)}$ of the Q-function, that is, obtain the updated estimates $β_{1}^{(r + 1)}$ and $β_{2}^{(r + 1)}$ .
- Firstly, taking the necessary derivatives of the Q-function with respect to $β_{1}$ , we obtain the following results

$h_{1} (β_{1}) = \frac{\partial Q (θ; θ^{(r)})}{\partial β_{1, j}} = \sum_{i = 1}^{n} (- μ_{i}^{(r)} w_{1_{i}} + k_{i}) x_{1, i j},$

(16)

and

$H_{1} (β_{1}) = \frac{\partial^{2} Q (θ; θ^{(r)})}{\partial β_{1, j} \partial β_{1, j}^{⊤}} = \sum_{i = 1}^{n} (- μ_{i}^{(r)} w_{1_{i}}) x_{1, i j} x_{1, i j}^{⊤} = X_{1}^{⊤} W_{1} X_{1},$

(17)

for $i = 1, \dots, n$ and $j = 1, \dots, p_{1}$ , and where $W_{1} = diag \{- μ_{i}^{(r)} w_{1_{i}}\} .$
Then, the iterative procedure for the Newton–Raphson algorithm for $β_{1}$ goes as follows

$β_{1}^{(r + 1)} \equiv β_{1}^{(r)} - {[H_{1} (β_{1}^{(r)})]}^{- 1} h_{1} (β_{1}^{(r)}) .$

(18)

- Secondly, differentiating the Q-function with respect to $β_{2}$ gives

$h_{2} (β_{2}) = \frac{\partial Q (θ; θ^{(r)})}{\partial β_{2, l}} = \frac{\partial E_{z_{i}} [log g (z_{i}; ϕ_{i}^{(r)})]}{\partial β_{2, l}}$

(19)

and

$H_{2} (β_{2}) = \frac{\partial^{2} Q (θ; θ^{(r)})}{\partial β_{2, l} \partial β_{2, l}^{⊤}} = \frac{\partial^{2} E_{z_{i}} [log g (z_{i}; ϕ_{i}^{(r)})]}{\partial β_{2, l} \partial β_{2, l}^{⊤}},$

(20)

where for computing $h_{2} (β_{2})$ and $H_{2} (β_{2})$ , we need to use the pseudo-values $w_{k_{i}}$ for $i = 1, \dots, n$ and $k = 1, \dots, ν$ , because in this case, the maximization of the Q-function reduces to the maximization of the conditional expectation of the log-likelihood of the mixing distribution $g (z_{i}; ϕ_{i})$ .
Then, the Newton–Raphson iterative algorithm for $β_{2}$ is as follows

$β_{2}^{(r + 1)} = β_{2}^{(r)} - {[H_{2} (β_{2}^{(r)})]}^{- 1} h_{2} (β_{2}^{(r)}),$

(21)

for $i = 1, \dots, n$ and $l = 1, \dots, p_{2}$ .
Finally, iterate between the E- and the M-Steps until some convergence criterion is satisfied, for instance

$|\frac{l^{(r + 1)} - l^{(r)}}{l^{(r)}}| < t o l,$

(22)

where $l^{(r)}$ is the value of the log-likelihood after the r-th iteration, and where $t o l$ is a small number usually of the form $10^{- m}$ , where $m \in$ $Z^{+} .$ The stopping criterion refers to the progress of the likelihood function (i.e., its convergence). If the stopping criterion is satisfied, the EM algorithm stops iterating, and the estimate of $θ$ is $θ^{(r + 1)}$ . Otherwise, $θ$ is updated by $θ^{(r + 1)}$ , and the algorithm returns to the E-step.

EM Estimation for the PLN Regression Model with Varying Dispersion

In this section, we implement the EM algorithm for finding the ML estimates of the parameters of the PLN regression model with varying dispersion (Algorithm 1). The complete data log-likelihood of the model is given by

$\begin{matrix} l_{c} (θ) = & \sum_{i = 1}^{n} [- μ_{i} z_{i} + k_{i} log (μ_{i}) + k_{i} log (z_{i}) - log (k_{i}!)] \\ & + \sum_{i = 1}^{n} [- \frac{1}{2} log (2 π) - log (ϕ_{i}) - log (z_{i}) - \frac{1}{2 ϕ_{i}^{2}} {(log (z_{i}) + \frac{ϕ_{i}^{2}}{2})}^{2}], \end{matrix}$

(23)

for $i = 1, \dots, n$. Thus, the posterior expectations needed for the E-step are $E_{z_{i}} [z_{i} | k_{i}; θ^{(r)}]$ and $E_{z_{i}} [{(log (z_{i}))}^{2} | k_{i}; θ^{(r)}]$, while at the M-step one needs to maximize the expected value of $l_{c} (θ)$ with respect to $θ$. In particular, more formally, the EM-type algorithm can be written as follows.

Algorithm 1 EM Algorithm for the PLN Regression Model with Varying Dispersion

1. Provide initial values $θ^{(0)} = (β_{1}^{(0)}, β_{2}^{(0)})$ .
2. (E-step) Using $θ^{(r)}$ from the rth iteration, update the conditional expectations $w_{1_{i}} = E [z_{i} | k_{i}; θ^{(r)}]$ and $w_{2_{i}} = E [{(log (z_{i}))}^{2} | k_{i}; θ^{(r)}]$ , for $i = 1, \dots, n$ .
3. (M-step) Find the global maximum point, $θ^{(r + 1)}$ , of the Q-function $Q (θ; θ^{(r)})$ .
4. If the criterion $|\frac{l^{(r + 1)} - l^{(r)}}{l^{(r)}}| < t o l$ is satisfied, the estimate of $θ$ is $θ^{(r + 1)}$ . Otherwise, update $θ^{(r)}$ by $θ^{(r + 1)}$ and return to step 2.
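The control flow of Algorithm 1, including the relative log-likelihood stopping rule in Equation (22), can be sketched as a small generic driver in Python. To keep the example short and self-contained, the driver is exercised on a toy stand-in rather than on the PLN model: a two-component Poisson mixture with known component means and an unknown weight, for which both the E- and M-steps are available in closed form. All names, data, and parameter values below are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson pmf evaluated on the log scale for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def run_em(theta0, em_update, log_likelihood, tol=1e-12, max_iter=10000):
    """Iterate theta -> em_update(theta) until the criterion in (22) holds.

    Assumes the log-likelihood is never exactly zero, since (22) divides by it.
    """
    theta = theta0
    l_old = log_likelihood(theta)
    for iteration in range(1, max_iter + 1):
        theta = em_update(theta)  # one E-step followed by one M-step
        l_new = log_likelihood(theta)
        if abs((l_new - l_old) / l_old) < tol:
            return theta, l_new, iteration
        l_old = l_new
    raise RuntimeError("EM did not converge within max_iter iterations")


# Toy stand-in (not the PLN model): known component means, unknown weight.
LAM_LOW, LAM_HIGH = 0.1, 2.0
COUNTS = [0] * 70 + [1] * 15 + [2] * 8 + [3] * 5 + [4] * 2


def toy_log_likelihood(weight):
    return sum(
        math.log((1.0 - weight) * poisson_pmf(k, LAM_LOW)
                 + weight * poisson_pmf(k, LAM_HIGH))
        for k in COUNTS
    )


def toy_em_update(weight):
    # E-step: posterior probability that each count comes from the
    # high-mean component, given the current weight.
    resp = []
    for k in COUNTS:
        high = weight * poisson_pmf(k, LAM_HIGH)
        low = (1.0 - weight) * poisson_pmf(k, LAM_LOW)
        resp.append(high / (high + low))
    # M-step: the Q-function is maximized by the average responsibility.
    return sum(resp) / len(resp)


weight_hat, loglik_hat, n_iter = run_em(0.5, toy_em_update, toy_log_likelihood)
```

For the PLN model, `em_update` would instead combine the E-step expectations and the Newton–Raphson updates described in this section, and `log_likelihood` would evaluate Equation (11) numerically.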

E-Step:
Calculate, for all $i = 1, \dots, n$ ,

$w_{1_{i}} = E [z_{i} | k_{i}; θ^{(r)}] = \frac{\int_{0}^{\infty} z_{i}^{k_{i}} exp [- \frac{1}{2 {(ϕ_{i}^{2})}^{(r)}} {(log (z_{i}) + \frac{{(ϕ_{i}^{2})}^{(r)}}{2})}^{2} - μ_{i}^{(r)} z_{i}] d z_{i}}{\int_{0}^{\infty} z_{i}^{k_{i} - 1} exp [- \frac{1}{2 {(ϕ_{i}^{2})}^{(r)}} {(log (z_{i}) + \frac{{(ϕ_{i}^{2})}^{(r)}}{2})}^{2} - μ_{i}^{(r)} z_{i}] d z_{i}}$

(24)

and

$w_{2_{i}} = E [{(log (z_{i}))}^{2} | k_{i}; θ^{(r)}] = \frac{\int_{0}^{\infty} {(log (z_{i}))}^{2} z_{i}^{k_{i} - 1} exp [- \frac{1}{2 {(ϕ_{i}^{2})}^{(r)}} {(log (z_{i}) + \frac{{(ϕ_{i}^{2})}^{(r)}}{2})}^{2} - μ_{i}^{(r)} z_{i}] d z_{i}}{\int_{0}^{\infty} z_{i}^{k_{i} - 1} exp [- \frac{1}{2 {(ϕ_{i}^{2})}^{(r)}} {(log (z_{i}) + \frac{{(ϕ_{i}^{2})}^{(r)}}{2})}^{2} - μ_{i}^{(r)} z_{i}] d z_{i}}$

(25)

where $μ_{i}^{(r)} = exp (x_{1, i}^{⊤} β_{1}^{(r)})$ and $ϕ_{i}^{(r)} = exp (x_{2, i}^{⊤} β_{2}^{(r)})$ .
Note that the expectations in Equations (24) and (25) can be evaluated numerically. Alternatively, a Monte Carlo approach based on a rejection algorithm can be used, leading to variants of the EM algorithm such as the Monte Carlo EM (MCEM) algorithm. Such variants do not require the pmf $P (k_{i})$ , which cannot be written in closed form; it is sufficient to simulate from the posterior distribution $g (z_{i} | k_{i}, x_{1, i}, x_{2, i}) .$
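A possible numerical evaluation of Equations (24) and (25) is sketched below in Python. With the substitution $t = log (z_{i})$, the numerators and the common denominator share the unnormalized log-density $k_{i} t - μ_{i} e^{t} - {(t + ϕ_{i}^{2} / 2)}^{2} / (2 ϕ_{i}^{2})$, so both ratios can be computed from a single grid. The trapezoidal rule, the grid size, and the truncation at `width` standard deviations of the lognormal component are assumptions of this sketch and may need adjusting for extreme counts.

```python
import math


def e_step_weights(k, mu, phi, n_grid=2001, width=10.0):
    """Approximate w1 = E[z | k] and w2 = E[(log z)^2 | k], Eqs. (24)-(25)."""
    centre = -0.5 * phi ** 2
    lo = centre - width * phi
    step = 2.0 * width * phi / (n_grid - 1)
    grid = [lo + j * step for j in range(n_grid)]
    # Unnormalized log posterior density of t = log(z) given k
    log_f = [
        k * t - mu * math.exp(t) - (t - centre) ** 2 / (2.0 * phi ** 2)
        for t in grid
    ]
    shift = max(log_f)  # subtract the maximum to avoid underflow
    denom = num_w1 = num_w2 = 0.0
    for j, t in enumerate(grid):
        weight = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoid end points
        f = weight * math.exp(log_f[j] - shift)
        denom += f
        num_w1 += math.exp(t) * f  # extra factor z = exp(t)
        num_w2 += t * t * f        # extra factor (log z)^2 = t^2
    # The grid spacing cancels in both ratios.
    return num_w1 / denom, num_w2 / denom
```

Since the pseudo-values depend on the data only through $(k_{i}, μ_{i}^{(r)}, ϕ_{i}^{(r)})$, policyholders that share the same count and covariate pattern can reuse one evaluation.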
M-Step:
- Firstly, the regression parameters $β_{1}$ are updated using the pseudo-values $w_{1_{i}}$ , which are given by Equation (24), and the Newton–Raphson algorithm, which is given in Equations (16)–(18).
- Secondly, the regression parameters $β_{2}$ are updated using the pseudo-values $w_{1_{i}}$ and $w_{2_{i}}$ , which are given by Equations (24) and (25), respectively, and the Newton–Raphson algorithm, which, in the case of the lognormal mixing distribution, is as follows

$h_{2} (β_{2}) = \sum_{i = 1}^{n} [\frac{w_{2_{i}}}{{(ϕ_{i}^{2})}^{(r)}} - \frac{{(ϕ_{i}^{2})}^{(r)}}{4} - 1] x_{2, i l},$

(26)

and

$H_{2} (β_{2}) = \sum_{i = 1}^{n} [\frac{- 2 w_{2_{i}}}{{(ϕ_{i}^{2})}^{(r)}} - \frac{{(ϕ_{i}^{2})}^{(r)}}{2}] x_{2, i l} x_{2, i l}^{⊤} = X_{2}^{⊤} W_{2} X_{2},$

(27)

for $i = 1, \dots, n$ and $l = 1, \dots, p_{2}$ , and where $W_{2} = diag \{\frac{- 2 w_{2_{i}}}{{(ϕ_{i}^{2})}^{(r)}} - \frac{{(ϕ_{i}^{2})}^{(r)}}{2}\} .$
Then, we can obtain the updated estimate $β_{2}^{(r + 1)}$ as follows

$β_{2}^{(r + 1)} \equiv β_{2}^{(r)} - {[H_{2} (β_{2}^{(r)})]}^{- 1} h_{2} (β_{2}^{(r)}) .$

(28)
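The Newton–Raphson update for $β_{2}$ in Equations (26)–(28) can be sketched in Python as follows. The code accumulates the gradient and Hessian over observations and solves the resulting linear system with a small hand-written Gaussian elimination so that only the standard library is needed; the helper names are hypothetical, and a linear algebra library would normally be used instead.

```python
import math


def solve_linear(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def newton_step_beta2(beta2, x2_rows, w2):
    """One update beta2 - H2^{-1} h2 using Equations (26)-(28)."""
    p = len(beta2)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for x, w in zip(x2_rows, w2):
        eta = sum(x_j * b_j for x_j, b_j in zip(x, beta2))
        phi_sq = math.exp(2.0 * eta)  # phi_i^2 under the log link (6)
        g = w / phi_sq - phi_sq / 4.0 - 1.0   # bracket in Equation (26)
        h = -2.0 * w / phi_sq - phi_sq / 2.0  # diagonal entry of W2 in (27)
        for a in range(p):
            grad[a] += g * x[a]
            for b in range(p):
                hess[a][b] += h * x[a] * x[b]
    delta = solve_linear(hess, grad)
    return [b_j - d_j for b_j, d_j in zip(beta2, delta)]
```

Every diagonal entry of $W_{2}$ is strictly negative whenever $w_{2_{i}} > 0$, so with a full-rank $X_{2}$ the Hessian is negative definite and the linear system has a unique solution.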

4. Numerical Illustration

This study was based on a subset of claim frequency data from a pool of MTPL insurance policies observed for 3.5 years from a major Greek insurance company. A total of 14,143 observations with complete records (i.e., with availability of all the explanatory variables) were taken for our analysis. The response variable is the number of claims at fault registered for each insured vehicle. In addition, a subset of explanatory variables with the highest predictive power for the response variable was chosen based on exploratory analysis. In particular, we considered the following covariates: the age of the driver (AD), the horsepower (HP) of their car, and the age of their car (AC). Additionally, we grouped the levels of each a priori rating variable with respect to risk profiles with similar claim frequency in order to balance homogeneity as well as sufficiency of the volume of data in each cell.

The summary of the explanatory variables and their corresponding groupings with the number of observations in each category along with the descriptive statistics for claim counts are shown in Table 1.

In the following subsection, we fit the Poisson-lognormal (PLN) regression model on the number of claims. Moreover, we will compare its fit with those of the classic negative binomial type I (NBI) distribution, which has been used in an abundance of actuarial settings for approximating claim counts, for the case when regression components are introduced on its mean and dispersion parameters. Finally, the high presence of zeros in the MTPL data set motivates the use of zero-inflated models, which can provide a parsimonious yet powerful way to handle data sets that contain a large number of zeros. In this study, the zero-inflated Poisson (ZIP) regression model will be used as a benchmark for comparison.

The NBI regression model with varying dispersion is derived as follows. Consider policyholder i, $i = 1, \dots, n$ , whose number of claims, denoted as $k_{i}$ , with $k_{i} = 0, 1, 2, 3, \dots$ , are independent. In addition, assume that $k_{i} | z_{i}$ follows a Poisson distribution with pmf given by Equation (1), and $z_{i}$ follows a Gamma distribution with pdf given by

$g (z_{i}; ϕ_{i}) = \frac{z_{i}^{\frac{1}{ϕ_{i}} - 1} {\frac{1}{ϕ_{i}}}^{\frac{1}{ϕ_{i}}} exp (- \frac{z_{i}}{ϕ_{i}})}{Γ (\frac{1}{ϕ_{i}})},$

(29)

where $ϕ_{i} > 0$ . Parameterization (29) ensures that $E (z_{i}) = 1$ , and hence the model is identifiable.
Then, the unconditional distribution of $k_{i}$ becomes an NBI distribution, with pmf given by

$P (k_{i}) = \frac{Γ (k_{i} + \frac{1}{ϕ_{i}})}{k_{i}! Γ (\frac{1}{ϕ_{i}})} {(\frac{ϕ_{i} μ_{i}}{1 + ϕ_{i} μ_{i}})}^{k_{i}} {(\frac{1}{1 + ϕ_{i} μ_{i}})}^{\frac{1}{ϕ_{i}}} .$

(30)

The mean and the variance of the NBI distribution are given by

$E (k_{i}) = μ_{i}$

(31)

and

$Var (k_{i}) = μ_{i} + μ_{i}^{2} ϕ_{i} .$

(32)
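As a quick numerical check of Equations (30)–(32), the Python sketch below evaluates the NBI pmf on the log scale and allows its first two moments to be compared with the stated formulas. The function name and the parameter values used for checking are illustrative only.

```python
import math


def nbi_pmf(k, mu, phi):
    """NBI pmf of Equation (30), evaluated on the log scale."""
    r = 1.0 / phi
    log_p = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + k * math.log(phi * mu / (1.0 + phi * mu))
        - r * math.log(1.0 + phi * mu)
    )
    return math.exp(log_p)


def pmf_moments(probs):
    """Mean and variance implied by probabilities on 0, 1, 2, ..."""
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean ** 2
```

Summing `nbi_pmf` over a sufficiently long range of counts and passing the result to `pmf_moments` should reproduce $μ_{i}$ and $μ_{i} + μ_{i}^{2} ϕ_{i}$ up to truncation error.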
The mean and dispersion parameters of the NBI distribution are modelled in terms of covariates

$μ_{i} = exp (x_{1, i}^{⊤} β_{1})$

(33)

and

$ϕ_{i} = exp (x_{2, i}^{⊤} β_{2}),$

(34)

where $x_{1, i}$ and $x_{2, i}$ are covariate vectors with dimensions $p_{1} \times 1$ and $p_{2} \times 1$ , respectively, with $β_{1} = {(β_{1, 1}, \dots, β_{1, p_{1}})}^{⊤}$ and $β_{2} = {(β_{2, 1}, \dots, β_{2, p_{2}})}^{⊤}$ the corresponding parameter vectors, and where it is assumed that the matrices $X_{1}$ and $X_{2},$ with rows given by $x_{1, i}$ and $x_{2, i}$ , respectively, are of full rank.
The pmf of the ZIP regression model is given by

$P (k_{i}) = \begin{cases} π + (1 - π) exp (- μ_{i}), & \text{if } k_{i} = 0 \\ (1 - π) \frac{exp (- μ_{i}) μ_{i}^{k_{i}}}{k_{i}!}, & \text{if } k_{i} = 1, 2, 3, \dots \end{cases}$

(35)

The mean and the variance of the ZIP distribution are given by

$E (k_{i}) = μ_{i} (1 - π)$

(36)

and

$Var (k_{i}) = μ_{i} (1 - π) [1 + μ_{i} π],$

(37)

where $μ_{i} = exp (x_{1, i}^{⊤} β_{1})$ , $x_{1, i}$ is a covariate vector with dimension $p_{1} \times 1$ , $β_{1} = {(β_{1, 1}, \dots, β_{1, p_{1}})}^{⊤}$ is the corresponding parameter vector, and the matrix $X_{1}$ , with rows given by $x_{1, i}$ , is assumed to be of full rank. Note that $π$ can also be modelled in terms of covariates using the logit link function. However, we refrain from doing this in this paper since this approach did not lead to a better fit of the ZIP model to the MTPL data.
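The ZIP pmf and the moments in Equations (36) and (37) can be checked numerically with the short Python sketch below. The function names and the parameter values used for checking are illustrative, and `pi_zero` stands for the zero-inflation probability $π$.

```python
import math


def zip_pmf(k, mu, pi_zero):
    """Zero-inflated Poisson pmf with Poisson mean mu and inflation pi_zero."""
    poisson = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    inflation = pi_zero if k == 0 else 0.0  # extra mass at zero only
    return inflation + (1.0 - pi_zero) * poisson


def zip_moments(mu, pi_zero):
    """Mean and variance from Equations (36) and (37)."""
    mean = mu * (1.0 - pi_zero)
    var = mu * (1.0 - pi_zero) * (1.0 + mu * pi_zero)
    return mean, var
```

The variance exceeds the mean by the factor $1 + μ_{i} π$, which is how zero inflation alone generates overdispersion in this benchmark.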

4.1. Modelling Results

The ML estimates of the parameters (all the parameters were statistically significant at a 5% threshold) for the NBI and PLN regression models with varying dispersion and the ZIP regression model are presented in Table 2. Note that variable selection can be performed for all the models by selecting the best predictor for parameter $μ_{i}$ using backward elimination. This can be done by including all available explanatory variables present in the data set and testing whether the exclusion of each variable will result in lower global deviance (DEV), Akaike information criterion (AIC), and Schwarz Bayesian criterion (SBC) values. Subsequently, in the case of the NBI and PLN models, we can take all the variables selected for the parameter $μ_{i}$ and continue variable selection for the parameter $ϕ_{i}$ by performing forward selection, where we can test which explanatory variable would lead to a further decrease of the DEV, AIC, and SBC values when added to parameter $ϕ_{i}$. Additionally, if different subsets of explanatory variables result in very similar values of DEV, AIC, and SBC, we should choose the simpler model with fewer predictors to avoid overfitting. Regarding our data, as we can see from Table 2, the explanatory variables AD, HP, and AC were chosen for $μ_{i}$, and only the variable AD was chosen for $ϕ_{i}$.

From the results in Table 2, we observe that the values of the estimated regression coefficients of the variables AD, HP, and AC have a similar effect (positive and/or negative) on parameter $μ_{i}$ in the case of all the models, and the same observation can be made for parameter $ϕ_{i}$ in the case of the NBI and PLN models.

Finally, we rely on normalized quantile residuals [13] as an exploratory graphical tool to help us evaluate the adequacy of the fit of the NBI, ZIP, and PLN models. For these discrete response distributions, the normalized (randomized) quantile residuals are defined as

${\hat{r}}_{i} = Φ^{- 1} (u_{i}),$

where $Φ^{- 1}$ is the inverse cumulative distribution function of a standard normal distribution and where $u_{i}$ is defined as a random value from the uniform distribution on the interval $[F_{i} (k_{i} - 1 | θ^{(r + 1)}), F_{i} (k_{i} | θ^{(r + 1)})]$, where $F_{i}$ is the cumulative distribution function estimated for the ith policyholder, $θ^{(r + 1)}$ contains the estimated model parameters after the EM algorithm has reached the global maximum, and $k_{i}$ is the corresponding observation. The fit of the claim count model can be evaluated by means of the usual quantile-quantile plots. Specifically, if the data indeed follow the assumed distribution, then the residuals on the quantile-quantile plot will fall approximately on a straight line. Figure 1 shows the normalized (randomized) quantile residuals for the ZIP regression model and the NBI and PLN claim frequency regression models with varying dispersion.
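The construction of the randomized quantile residuals can be sketched in Python as follows. For brevity, a plain Poisson cdf is used as a stand-in for the fitted $F_{i}$; with the NBI, ZIP, or PLN models, the corresponding cdf would be plugged in through the `cdf` argument. The function names, the clipping constant `eps`, and the simulated data are assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(K <= k) for a Poisson(mu) count; returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = total = math.exp(-mu)
    for j in range(1, k + 1):
        term *= mu / j
        total += term
    return min(total, 1.0)


def quantile_residual(k, cdf, rng, eps=1e-12):
    """Randomized quantile residual for one observed count k."""
    lower, upper = cdf(k - 1), cdf(k)
    u = rng.uniform(lower, upper)
    u = min(max(u, eps), 1.0 - eps)  # inv_cdf requires 0 < u < 1
    return NormalDist().inv_cdf(u)


def demo_residuals(n, mu, seed=0):
    """Simulate Poisson(mu) counts by inversion and return their residuals."""
    rng = random.Random(seed)
    cdf = lambda k: poisson_cdf(k, mu)
    residuals = []
    for _ in range(n):
        target, k = rng.random(), 0
        while cdf(k) < target and k < 1000:  # cap guards against rounding
            k += 1
        residuals.append(quantile_residual(k, cdf, rng))
    return residuals
```

When the counts really are generated from the model supplied through `cdf`, the residuals behave like a standard normal sample, which is what the quantile-quantile plots assess.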

From Figure 1, we observe that the residuals indicate that the NBI and PLN are better assumptions than the ZIP model since the residuals of the former two are close to the right tail of the claim frequency distribution. Furthermore, the PLN model seems to fit the claim count data slightly better than the NBI model, since, as was previously mentioned, the tail of mixed Poisson models is equivalent to the tail of their mixing distributions [7], and in this case the lognormal mixing density has a thicker right tail than the Gamma mixing density. Therefore, overall it is reasonable to suggest the employment of the PLN model for modelling claim counts in our data set. As we are going to observe in what follows, the PLN model also provides better fitting performances than the NBI and ZIP models in terms of the DEV, AIC, and SBC values.

4.2. Models Comparison

In this subsection, we compare the fit of the ZIP regression model and the NBI and PLN regression models with varying dispersion based on DEV, AIC, and SBC, which are classic hypothesis/specification criteria.

The DEV is defined as

$DEV = - 2 \hat{l} (\hat{θ}),$

(38)

where $\hat{l}$ is the maximum of the log-likelihood, and $\hat{θ}$ is the estimated parameter vector of the model. Furthermore, the AIC and the SBC are given by

$AIC = DEV + 2 \times df$

(39)

and

$SBC = DEV + log (n) \times df,$

(40)

where $df$ are the degrees of freedom, and n is the number of observations in the sample.
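Equations (38)–(40) translate directly into a small helper, sketched here in Python. The function name is hypothetical, and the maximized log-likelihood and degrees of freedom passed to it in any example are placeholders rather than values from Table 3.

```python
import math


def information_criteria(loglik, df, n):
    """Return DEV, AIC, and SBC as defined in Equations (38)-(40)."""
    dev = -2.0 * loglik
    return {
        "DEV": dev,
        "AIC": dev + 2.0 * df,
        "SBC": dev + math.log(n) * df,
    }
```

Since $log (n) > 2$ for any sample with more than seven observations, the SBC penalizes each additional parameter more heavily than the AIC for a data set of the size considered here.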

The resulting DEV, AIC, and SBC values for the competing models are presented in Table 3. We observe that the PLN regression model provides the best fit with respect to all three criteria.

4.3. Computational Aspects

All computations were carried out in the programming language R. The PLN regression model with varying dispersion was estimated using the EM algorithm, which was presented in Section 3. In addition, the ML estimates of the parameters of the NBI regression model with varying dispersion and the ZIP regression model were obtained using the generalized additive models for the location, scale, and shape (GAMLSS) package in R [14].

Note that a rather strict criterion was used, and it took the algorithm quite a large number of iterations to converge. In particular, the stopping criterion was set as $tol = 10^{- 12}$. Note also that the M-step involves two Newton–Raphson iterations, and hence it is important to choose meaningful initial values for the vectors $β_{1}$ and $β_{2}$, since a poor choice can increase the computational time required by the EM algorithm and make it more difficult to locate the global maximum. We obtained good initial values for $β_{1}$ by fitting the simple Poisson regression. Additionally, we obtained good initial values for $β_{2}$ by (i) calculating $Var (k_{i})$ for the eight different risk classes which can be formed using all available risk factors and the observations $i = 1, \dots, n$ and (ii) calculating $E (k_{i})$ for the eight different risk classes and using the log-link function for $ϕ_{i}$ (see Equation (6)), so we solve Equation (4) with respect to $β_{2}$. However, we also checked with many other starting values for $β_{2}$ in order to ensure that the global maximum had been obtained. For all cases, the EM algorithm converged to a similar solution. The standard errors were computed by using the standard approach of [15].
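One possible reading of the moment-based starting values for the dispersion parameter is sketched below in Python. It assumes the PLN variance in Equation (10), so that within a risk class $ϕ^{2} = log (1 + (Var (k) - E (k)) / {E (k)}^{2})$, which requires the class to be overdispersed. The function name is hypothetical, and how the class-level values are then mapped to $β_{2}$ through the log link is not shown.

```python
import math


def moment_phi(counts):
    """Method-of-moments value of phi for one risk class under Equation (10).

    Returns None when the class is not overdispersed (variance <= mean),
    in which case Equation (10) has no positive solution for phi.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if mean <= 0.0 or var <= mean:
        return None
    excess = (var - mean) / mean ** 2
    return math.sqrt(math.log1p(excess))
```

The logarithm of the returned value is on the scale of the linear predictor in Equation (6), so class-level values of $log (ϕ)$ can serve as targets when choosing a starting vector for $β_{2}$.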

Finally, as was anticipated, in terms of CPU time, it took the NBI regression model with varying dispersion and the ZIP regression model less than one minute, and they both compared significantly more favorably to the PLN regression model with varying dispersion, which exceeded 30 min of CPU time. However, it should be taken into account that the PLN model has a density which does not exist in closed form, and that there were 14,143 observations in the sample of MTPL data that was examined in this article. For larger data sets with more features, the computing effort can be reduced if the E- and M-steps are executed in parallel across multiple threads to exploit the processing power of modern-day multicore machines.

5. Conclusions, Limitations, and Future Research

In this article, we considered a family of mixed Poisson claim count regression models with varying dispersion arising from non-conjugate mixing distributions for approximating overdispersed claim frequencies in non-life insurance. The flexibility in the choice of the mixing distribution, combined with the proposed approach, which allows both the mean and dispersion parameters of the model to be modelled in terms of risk factors, can provide an advantage relative to the majority of previous approaches in the literature, which have concentrated on mixing densities conjugate to the Poisson distribution and assumed that only the mean parameter can vary through covariates. From a practical business standpoint, the proposed modelling framework is beneficial for the insurance company, as it can improve the risk evaluation of policyholders who are more likely to have accidents: the tail behaviour of mixed Poisson models is similar to that of the mixing density, and the majority of heavy-tailed mixing distributions are not conjugate to the Poisson.

The PLN regression model with regression specifications on its mean and dispersion parameters was considered for expository purposes. Furthermore, we developed an efficient EM algorithm for maximum likelihood estimation of the parameters of the model. The implementation of the algorithm was illustrated by fitting the model to a real MTPL insurance data set. An interesting line for further research would be to extend the model to the multivariate case to permit inferences about the dependence structure between different types of overdispersed claim counts from the same and/or different types of coverage. However, it should be noted that the PLN model becomes more complicated in the two-dimensional setting due to algebraic intractability, which is a problem that is inherited from the univariate case. Moreover, modelling all the parameters of the PLN model in terms of covariates can further increase the computational burden in the high-dimensional setting. Finally, another fruitful future research direction is to include time series components to take into account both cross-dependence between different types of claims and time dependence.

Author Contributions

Conceptualization, G.T.; methodology, G.T.; software, G.T.; formal analysis, G.T., N.H. and R.H.; investigation, G.T., N.H. and R.H.; data curation, G.T., N.H. and R.H.; writing—original draft preparation, G.T., N.H. and R.H.; writing—review and editing, G.T., N.H. and R.H.; supervision, G.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Please note that we cannot make the data supporting reported results publicly available due to data use agreement with the company which provided the data set. However, a toy data set along with the code for estimating the parameters of the PLN regression model with varying dispersion can be made available upon request.

Acknowledgments

We would like to thank the handling editor and the two anonymous referees for their very helpful comments and suggestions that have significantly improved this article. Furthermore, we would like to thank the participants of the 13th International Conference on Computational and Methodological Statistics. Finally, we would like to thank the undergraduate student Shahzeb Khan for his interest in the research that was undertaken in this article and for his help in citing previous works.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

EM	Expectation-maximization
NBI	Negative binomial type I
MCEM	Monte Carlo expectation-maximization
ML	Maximum likelihood
MTPL	Motor third-party liability
pdf	Probability density function
PLN	Poisson log-normal
pmf	Probability mass function
ZIP	Zero-inflated Poisson

References

1. Lawless, J.F. Negative binomial and mixed Poisson regression. Can. J. Stat. 1987, 15, 209–225.
2. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data, 1st ed.; Cambridge University Press: Cambridge, UK, 1998.
3. Hilbe, J.M. Negative Binomial Regression, 1st ed.; Cambridge University Press: Cambridge, UK, 2008.
4. Ord, J.K.; Whitmore, G.A. The Poisson-inverse Gaussian distribution as a model for species abundance. Commun. Stat. Theory Methods 1986, 15, 853–871.
5. Willmot, G.E. The Poisson-Inverse Gaussian distribution as an alternative to the negative binomial. Scand. Actuar. J. 1987, 3–4, 113–127.
6. Dean, C.; Lawless, J.F.; Willmot, G.E. A mixed Poisson–inverse-Gaussian regression model. Can. J. Stat. 1989, 17, 171–181.
7. Perline, R. Mixed Poisson distributions tail equivalent to their mixing distributions. Stat. Comput. 1998, 38, 229–233.
8. Rigby, R.A.; Stasinopoulos, D.M.; Akantziliotou, C. A framework for modelling overdispersed count data, including the Poisson-shifted generalized inverse Gaussian distribution. Comput. Stat. Data Anal. 2008, 53, 381–393.
9. Barreto-Souza, W.; Simas, A.B. General mixed Poisson regression models with varying dispersion. Stat. Comput. 2016, 26, 1263–1280.
10. Tzougas, G. EM estimation for the Poisson–inverse Gamma regression model with varying dispersion: An application to insurance ratemaking. Risks 2020, 8, 97.
11. Blueschke, D.; Blueschke-Nikolaeva, V.; Neck, R. Approximately Optimal Control of Nonlinear Dynamic Stochastic Problems with Learning: The OPTCON Algorithm. Algorithms 2021, 14, 181.
12. Amirghasemi, M. An Effective Decomposition-Based Stochastic Algorithm for Solving the Permutation Flow-Shop Scheduling Problem. Algorithms 2021, 14, 112.
13. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. Comput. Graph. Stat. 1996, 5, 236–245.
14. Stasinopoulos, D.M.; Rigby, B.; Akantziliotou, C. Instructions on How to Use the Gamlss Package in R, 2nd ed. 2008. Available online: http://www.gamlss.org (accessed on 30 December 2021).
15. Louis, T.A. Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. 1982, 44, 226–233.

Figure 1. Normalized quantiles for the ZIP regression model and the NBI and PLN regression models with varying dispersion.

Table 1. Descriptive statistics of claim counts and the size of the different categories of the explanatory variables.

Statistic	Value	Age of the Driver (AD)		Horsepower of the Car (HP)		Age of the Car (AC)
# Observations	14,143	C1:	3238	C1:	5042	C1:	4318
Minimum	0	C2:	10,905	C2:	9101	C2:	9825
Median	0		-		-		-
Mean	0.4827		-		-		-
Variance	0.6988		-		-		-
Maximum	12		-		-		-

Table 2. Parameter estimates of the ZIP regression model and the NBI and PLN regression models with varying dispersion.

NBI		ZIP		PLN
Coeff. $β_{1}$		Coeff. $β_{1}$		Coeff. $β_{1}$
Intercept	$-0.4729$	Intercept	$-0.1277$	Intercept	$-0.4709$
AD		AD		AD
C2	$-1.2390$	C2	$-1.2454$	C2	$-1.2360$
HP		HP		HP
C2	$1.0378$	C2	$0.9892$	C2	$1.0469$
AC		AC		AC
C2	$-0.6481$	C2	$-0.6398$	C2	$-0.6586$
Coeff. $β_{2}$				Coeff. $β_{2}$
Intercept	$-2.4935$	Prob. $π$	$0.3032$	Intercept	$-1.0969$
AD				AD
C2	$0.8878$			C2	$0.3481$

Table 3. ZIP regression model and NBI and PLN regression models comparison.

Specification	DEV	AIC	SBC
NBI	15,885.1	15,897.1	15,940.1
ZIP	16,052.2	16,062.2	16,098
PLN	15,859.4	15,871.4	15,914.4
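As a quick consistency check on Table 3, and assuming the usual definition AIC = DEV + 2p, where p is the number of estimated parameters, the AIC column can be reproduced from the deviances. The parameter counts below are read off Table 2 (four mean coefficients plus two dispersion coefficients for NBI and PLN, four mean coefficients plus $π$ for ZIP); the variable names are illustrative only.

```python
# Global deviances from Table 3
deviance = {"NBI": 15885.1, "ZIP": 16052.2, "PLN": 15859.4}

# Number of estimated parameters, counted from Table 2
n_params = {"NBI": 6, "ZIP": 5, "PLN": 6}

# ASSUMPTION: AIC = DEV + 2 * p
aic = {m: round(deviance[m] + 2 * n_params[m], 1) for m in deviance}

# Model with the smallest AIC
best = min(aic, key=aic.get)

for model, value in aic.items():
    print(f"{model}: AIC = {value:.1f}")
print("Lowest AIC:", best)
```

Under this definition the computed values agree with the AIC column of Table 3, and the PLN model with varying dispersion has the lowest AIC of the three.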

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

