Credibility Distribution Estimation with Weighted or Grouped Observations

Abstract: In non-life insurance practice, actuaries are often faced with the challenge of predicting the number of claims and claim amounts to be incurred at any given time, which serve to implement fair pricing and reserves given the nature of the risk. This paper extends Jewell's credible distribution in terms of forecasting the distribution of individual risk in cases where the observations are weighted or are grouped in intervals. More specifically, we show how empirical distribution functions can be embedded within Bühlmann's and Straub's credibility model. The optimal projection theorem is applied for credibility estimation and more insight into the derivation of the credibility distribution estimators is also provided. In addition, distribution credibility estimators are established and numerical illustrations are presented herein. Two examples of distribution credibility estimation are given, one with insurance loss data and the other with industry financial data.


Introduction
In actuarial science, one of the fundamental problems is that of predicting the future claims of an individual risk given past experience of a collective of heterogeneous risks. Credibility is a ratemaking technique that serves to forecast future premiums for a group of insurance contracts for which we have some experience, whilst much more experience is available for a collection of contracts that are similar but not exactly the same.
In the insurance industry, some legislated rules indicate that changes over time have occurred across the claim distribution. Therefore, it is essential to examine these changes at different points of the distribution. An empirical distribution function provides a way to model and sample cumulative probabilities for a data sample that does not fit a standard probability distribution. Its value at a given point is equal to the proportion of observations from the sample that are less than or equal to that point.
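The definition of the empirical distribution function given above can be illustrated with a minimal sketch. The function name `ecdf` and the sample `losses` are hypothetical and are used only for illustration; they do not come from the datasets analysed in this paper.

```python
import numpy as np

def ecdf(sample, x):
    """Proportion of observations in `sample` that are less than or equal to x."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(sample <= x))

# Hypothetical loss amounts, for illustration only.
losses = [120.0, 340.0, 340.0, 510.0, 975.0]

# Three of the five observations are <= 340.
print(ecdf(losses, 340.0))
```

Evaluating `ecdf` on a grid of points yields the step function that the credibility estimators in the rest of the paper operate on.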
In non-life insurance practice, actuaries are often faced with the challenge of predicting the number of claims and the claim amounts to be incurred at any given time, which serve to implement fair pricing and reserves given the nature of the risk. Actuaries usually deal with events that are uncertain and their economic consequences. The aim of this paper is to carry out the credibility estimation of empirical distribution functions in measuring and managing these uncertainties.
In the first part of this paper, we extend the work of Jewell (1974b) in terms of forecasting the distribution of individual risk in cases where the observations are weighted, for the non-homogeneous and homogeneous models. Here, the weights (sizes) $w_{ij}$, $i = 1, \ldots, n_j$, $j = 1, \ldots, K$, are now changing in time. The contract $j$ might result from a grouping and averaging of $w_{ij}$ observations in a contract with several independent and identically distributed observations $S_{lij}$, $l = 1, \ldots, w_{ij}$, during the year $i$, i.e., $X_{ij} = \frac{1}{w_{ij}} \sum_{l=1}^{w_{ij}} S_{lij}$, and then taking the conditional mean of the indicator function, $E[I(X_{ij} \le x)|\Theta_j]$. Alternatively, in the case of raw data, the contract $j$ might result from the grouping and averaging of indicator functions within the year $i$, $\bar{I}(S_{ij} \le x) = \frac{1}{w_{ij}} \sum_{l=1}^{w_{ij}} I(S_{lij} \le x)$, and then taking $E[\bar{I}(S_{ij} \le x)|\Theta_j]$. Here, we proceed with the former, considering the credibility distribution estimation as a point estimate approach of $F_{X_{ij}|\Theta_j}(x|\Theta_j) = E[I(X_{ij} \le x)|\Theta_j]$. Optimal linearized estimators of $F_{X_{ij}|\Theta_j}(x|\Theta_j)$ are obtained by the classical least squares approach as well as by the optimal projection theorem of random variables on planes as presented by De Vylder (1976, 1996).
Risks 2024, 12, 10. https://doi.org/10.3390/risks12010010 https://www.mdpi.com/journal/risks
In the second part of this paper, we consider credibility distribution estimation based on grouped data formed by aggregating the individual observations of a variable into groups. The construction of the empirical distribution based on grouped data can be performed by obtaining the point values of the empirical distribution function whenever possible. Then, we approximate the distribution functions by connecting those points with straight lines and applying premium estimation in a credibility framework. An alternative model of credibility estimation is also obtained, similarly to the Bühlmann and Straub (1970) model.

Related Works
Bühlmann (1967) and Bühlmann and Straub (1970) established the theoretical foundation of modern credibility theory, presented as a distribution-free credibility estimation. The method was extended to the regression model by Hachemeister (1975), where the credibility premium depends linearly on a number of risk characteristics. Jewell (1974a) has shown that credibility is exactly Bayesian for a certain exponential family of distributions with natural conjugate priors. Furthermore, Landsman and Makov (1998, 1999) extended the results on the exponential family to the exponential dispersion family. The following key references are related to new developments in credibility estimation: Makov et al. (1996), Christiansen and Schinzinger (2016), Tsai and Lin (2017), Gong et al. (2018), Xacur and Garrido (2018), Tsai and Wu (2020), Tsai and Zhang (2019), Bozikas and Pitselis (2020, 2021), Youn et al. (2021), Wang et al. (2021), Yan and Song (2022), and Kim et al. (2022).
Credibility distribution estimation is closely connected to the area of quantile credibility estimation. The quantile function is the inverse of the distribution function. It specifies the value of the random variable such that the probability of the variable being less than or equal to that value is equal to the given probability. Kim and Jeon (2013) proposed a credibility theory by truncating the loss data based on quantiles. Some other references related to quantile estimation are: Pitt (2006), Pitselis (2009, 2013, 2017), Kudryavtsev (2009), Gebizlioglu and Yagci (2008), Denuit (2008) and Landsman (1996). Jewell (1974b) extended the classical Bühlmann (1967) model to the problem of forecasting the distribution of individual risk based upon collective statistics and individual experience data and solved the problem by finding a Bayesian conditional distribution. Jewell (1974b) also obtained additional insight into the nature of credibility estimation by assuming that the true value of $\Theta_j$ is known, and obtained credible distributions and credible densities by carrying out simulations for some conjugate prior families of distributions (e.g., Poisson-gamma, etc.). He also considered the problem of finding a credibility approximation to the true distribution of the next observation. Korwar and Hollander (1973) defined a sequence of empirical Bayes estimators for estimating a distribution function. Zehnwirth (1981) established the asymptotic optimality of the empirical Bayes distribution function created from the Bayes rule relative to the Dirichlet process prior with unknown parameter. Cai et al. (2015) combined Bühlmann's credibility theory and Ferguson's (1973) nonparametric Bayes analysis to develop a completely nonparametric estimation for loss distributions and established a unified distribution-free approach to experience rating for arbitrary premium principles.
This paper is organized as follows. In Section 2, both the linearized non-homogeneous and homogeneous estimators in the weighted credibility distribution model are obtained, and the credibility parameters are estimated. Optimal credibility distribution estimators are also obtained using the optimal projection theorem. In Section 3, the credibility distribution estimation for grouped data is presented. In Section 4, an alternative model of credibility distribution estimation is obtained when the observations are grouped in intervals. Applications to real data are presented in Section 5, one with insurance loss data and the other with industry financial data. Some concluding remarks are presented in Section 6.

Weighted Credibility Distribution Estimation
In the following, we consider the credibility model with several contracts and weighted observations. For an insurance portfolio, $X_{ij}$ are the average losses of $w_{ij}$ observations for contract $j = 1, \ldots, K$ and period $i = 1, \ldots, n_j$. For industry portfolios, $X_{ij}$ denotes the average returns (losses/gains) of

Assumptions
We have the following assumptions: (i) The contracts are independent and the variables , where $\delta_{ir} = 1$ if $i = r$ and 0 otherwise.

Structural Parameters
The structural parameters $F_{X_{ij}}(x)$, $s^2_{F_x}$, and $a_{F_x}$ are as follows:

Notation
Here, we present the weighted empirical distribution function as well as some notations that are useful for the derivation of the credibility distribution estimation.
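As a reading aid, a weighted empirical distribution function for a single contract can be sketched as follows. The sketch assumes the usual weight-proportional form, $\sum_i (w_{ij}/w_{\cdot j}) I(X_{ij} \le x)$; this form, the function name `weighted_ecdf`, and the numbers are assumptions made for illustration and are not taken from the paper's formula (1).

```python
import numpy as np

def weighted_ecdf(x_obs, weights, x):
    """Weighted share of observations <= x for one contract.

    Assumed form: sum_i (w_i / sum(w)) * I(x_i <= x).
    """
    x_obs = np.asarray(x_obs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (x_obs <= x)) / np.sum(weights))

# Hypothetical yearly averages X_ij and claim counts w_ij for one contract.
x_obs = [1.0, 2.0, 3.0]
weights = [1.0, 1.0, 2.0]
print(weighted_ecdf(x_obs, weights, 2.0))
```

With equal weights the function reduces to the ordinary empirical distribution function.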
Lemma 1 (Expectation and Covariance Relations). Based on the above assumptions, we can obtain expressions for the conditional expectations and covariances as follows,
The first part of (4) results from
Similarly, we can prove the second and third parts of (4). For the proof of the first part of relation (5), we have
In the same way, we can prove the second and third parts of (5). Finally, (6) can be proved as
Similarly, as in Bühlmann and Straub (1970), by the following theorems, we will provide the optimal linearized non-homogeneous (as well as the homogeneous) credibility estimators and provide some useful estimators for the structure parameters.
Theorem 1 (Linearized non-homogeneous credibility distribution estimator). Under the assumptions (i)-(iii), the optimal linearized non-homogeneous estimator of $F_{X_{ij}}(x|\Theta_j)$ is obtained by
with $F_{n_{wj}}(x)$ and $Z^{F_x}_j$ as in (1).
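A minimal numerical sketch of a credibility-weighted distribution value may help fix ideas. It assumes the classical Bühlmann and Straub (1970) form, in which the credibility factor is $a\,w_{\cdot j}/(s^2 + a\,w_{\cdot j})$ and the estimate is a convex combination of the individual and the collective empirical distribution values; the exact expressions of the theorem are not reproduced here, so this form, the function names, and the numbers are assumptions for illustration only.

```python
def credibility_factor(a, s2, w_total):
    """Assumed Buhlmann-Straub form: Z = a * w / (s2 + a * w).

    A non-positive `a` (no detectable heterogeneity) is treated as Z = 0.
    """
    if a <= 0:
        return 0.0
    return a * w_total / (s2 + a * w_total)

def credibility_cdf(f_individual, f_collective, z):
    """Convex combination of the individual and collective values at a point x."""
    return z * f_individual + (1.0 - z) * f_collective

# Hypothetical inputs at one evaluation point x.
z = credibility_factor(a=0.02, s2=0.1, w_total=20.0)
estimate = credibility_cdf(f_individual=0.5, f_collective=0.3, z=z)
print(z, estimate)
```

Because the estimate is a convex combination of two values in [0, 1], it always stays in [0, 1] at each point, although monotonicity in x is not guaranteed when the factor varies with x.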
Proof. We have to find
such that
is minimum. Differentiating (9) with respect to $c^j_0$, we have
Substituting the value of $c^j_0$ in (9) and differentiating with respect to $c_{rl'}$, we obtain
The right-hand side of (11) becomes
Then, (11) implies that
Multiplying (12) by $w_{ij}$ and summing with respect to $i = 1, \ldots, n_j$, (i.e., $\sum$
Since the probability distribution of
which provides (7).
Theorem 2 (Linearized homogeneous credibility distribution estimator). Under the assumptions (i)-(iii), the optimal linearized homogeneous estimator of $F_{X_{ij}}(x|\Theta_j)$ is obtained by
with $F_{n_{wj}}(x)$, $F_{n_{wz}}(x)$ and $Z^{F_x}_j$ as defined in (1).

Proof. Letting
we have to minimize
such that
holds under the restrictions $\sum_{l=1}^{K} \sum_{i=1}^{n_l} c^j_{il} = 1$, with the Lagrange multiplier $2\lambda$. The following quantity
leads to
From (16), we obtain
Differentiating (17) with respect to $c_{i'l'}$, we obtain
Multiplying both sides by $w_{i'l'}$ and taking the sum over $i'$, we obtain for each $l'$:
Substituting (20) into (19), we obtain
Then, the optimal linearized homogeneous estimator of
resulting in (13).
The following theorem will prove that $F_{n_{wz}}(x)$ has a smaller variance than $F_{n_{wj}}(x)$, i.e., based on the heterogeneity and the fluctuation of the risk, $F_{n_{wz}}(x)$ has a minimal mean square error.

Proof. We have to minimize the following quantity
Taking the derivative of (23) with respect
This is the same as
This gives
We know that
and from (26), we have

Theorem 4. Under assumptions (i)-(iii), the quadratic loss for the credibility distribution estimator is given by

Proof. We have that
provides (29).

Optimal Projection Theorem
In the following, De Vylder's (1976, 1996) optimal projection theorem of random variables in the plane is applied in order to derive the optimal estimator of $F_{X_{ij}}(x)$ and

Proof. Directly from (2) and (5).

Theorem 6. The optimal credibility estimator of

Proof. In order to prove (30), it is sufficient to prove the unbiasedness and covariance conditions of the optimal projection theorem of random variables on planes not through the origin (see De Vylder (1976, 1996)), that is
and
The unbiasedness condition results from (2) and
The covariance condition results from the independence of the contracts and the covariance relations of Lemma 1, which gives

Unbiased Estimators
Below, we provide unbiased estimators analogous to the Bühlmann and Straub (1970) model.
Lemma 2. The following estimators of the structural parameters $F_{X_{ij}}(x)$, $s^2_{F_x}$ and $a_{F_x}$, presented in Section 2.2, are unbiased.
Based on De Vylder (1978), an unbiased estimator of $a_{F_x}$ can take the form

Proof. The unbiasedness of $F_{X_{ij}}(x)$ is straightforward and is omitted. The unbiasedness of $s^2_F$ follows from
resulting in (33). For the proof of the unbiasedness of (34), we refer to Bühlmann and Straub (1970). Finally, the unbiasedness of $a_F$ in (35) results from
which implies (35).
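To make the estimation step concrete, the sketch below applies the classical Bühlmann and Straub (1970) moment estimators to the indicators $I(X_{ij} \le x)$ at a fixed point x. It is an assumption that these textbook forms coincide with the estimators of Lemma 2, whose formulas are not reproduced here; the De Vylder (1978) variant is not implemented. The function name `structural_estimates` and the toy data are hypothetical, and each contract is assumed to have at least two periods.

```python
import numpy as np

def structural_estimates(x_obs, weights, x):
    """Classical Buhlmann-Straub moment estimators applied to I(X_ij <= x).

    x_obs, weights: one sequence per contract (contracts may differ in length,
    but each needs at least two periods).
    Returns (collective value, s2 estimate, a estimate truncated at zero).
    """
    K = len(x_obs)
    ind = [(np.asarray(xj, dtype=float) <= x).astype(float) for xj in x_obs]
    w = [np.asarray(wj, dtype=float) for wj in weights]
    w_j = np.array([wj.sum() for wj in w])        # total weight per contract
    w_tot = w_j.sum()
    f_j = np.array([(wj * ij).sum() / wj.sum() for wj, ij in zip(w, ind)])
    f_coll = (w_j * f_j).sum() / w_tot            # weight-averaged collective value
    # Average within-contract weighted variance.
    s2 = np.mean([(wj * (ij - fj) ** 2).sum() / (len(ij) - 1)
                  for wj, ij, fj in zip(w, ind, f_j)])
    # Between-contract variance, corrected for within-contract noise.
    c = w_tot - (w_j ** 2).sum() / w_tot
    a = ((w_j * (f_j - f_coll) ** 2).sum() - (K - 1) * s2) / c
    return float(f_coll), float(s2), float(max(a, 0.0))

# Hypothetical data: two contracts, two periods each, unit weights.
print(structural_estimates([[5.0, 7.0], [1.0, 2.0]], [[1, 1], [1, 1]], x=3.0))
```

The truncation `max(a, 0.0)` reflects the common convention of setting a negative between-contract variance estimate to zero.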

Credible Distribution for Grouped Data
Grouped data are formed by aggregating the individual observations of a variable into groups.For example, a histogram is a density approximation for grouped data.The construction of the empirical distribution based on grouped data can be achieved by obtaining the point values of the empirical distribution function whenever possible.Then, we can approximate the distribution function by connecting those point values with straight lines.
The empirical distribution for grouped data is evaluated at a point $x$. We consider the case where $x$ is at a boundary and the case where $x$ lies between the boundaries.

Empirical Distribution for Grouped Data at Boundary
For contract $j$, let the group boundaries be $c_{0j} < c_{1j} < \cdots < c_{nj}$, where $c_{0j} = 0$ and $c_{n+1,j} = \infty$. Let $m_{ij}$ be the number of observations in the interval $(c_{i-1,j}, c_{ij})$, $i = 1, 2, \ldots, n_j$, $j = 1, 2, \ldots, K$, and $m_{\cdot j} = \sum_{i=1}^{n_j} m_{ij}$ be the total number of observations for contract $j$. For grouped data, the empirical distribution function at each group boundary $c_{ij}$ is defined as
$$F_{m_j}(c_{ij}) = \frac{1}{m_{\cdot j}} \sum_{l=1}^{i} m_{lj}.$$
For grouped data, there is no problem if the distribution function has to be estimated at a boundary. When all of the information is available, working with the empirical estimate of the distribution function is straightforward (see Klugman et al. (2012)).

Assumptions
We have the following assumptions: (i*) The contracts are independent and the variables
The observations $X_{ij}$ have finite variance, x (Θ j ).

Notation
Here, we adopt the following notation:
Based on the above assumptions, a credibility distribution estimator for $F_{X_{ij}}(x|\Theta_j)$ is obtained as
With the following theorem, we can obtain the credibility distribution estimator of $F_{X_{ij}}(x|\Theta_j)$.
Theorem 7. Under the assumptions (i*)-(iii*), the credibility factor in (40) is given by
with $a_x$ as in (38) and $m_{\cdot j}$ as in (39).
Proof. The proof of the theorem can be obtained by minimizing the expression
with respect to $Z^x_j$.

Credibility Estimators
Lemma 3. The credibility point estimators of $F_{X_{ij}}(x)$, $s^2_x$ and $a_x$ are given as follows:

Proof. Similarly to the proof of Lemma 2.

Empirical Distribution for Grouped Data at Value x between Boundaries
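Now, suppose that the value of $x$ is between the boundaries $c_{i-1,j}$ and $c_{ij}$. Then, for contract $j$, the empirical distribution function is given by the linear interpolation
$$F_{m_j}(x) = \frac{c_{ij} - x}{c_{ij} - c_{i-1,j}} F_{m_j}(c_{i-1,j}) + \frac{x - c_{i-1,j}}{c_{ij} - c_{i-1,j}} F_{m_j}(c_{ij}), \quad c_{i-1,j} \le x \le c_{ij}.$$
This function is differentiable at all values except for the group boundaries. Based on (41), we can obtain the following
Note that the above estimator is biased although it is an unbiased estimator of the true interpolated value (see Klugman et al. (2012)).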
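The straight-line construction between group boundaries can be sketched numerically. The example follows the standard grouped-data construction described in Klugman et al. (2012): cumulative proportions at the boundaries, joined by linear interpolation. The function name `grouped_cdf` and the boundaries and counts are hypothetical, and all boundaries are assumed finite.

```python
import numpy as np

def grouped_cdf(boundaries, counts, x):
    """Empirical distribution for grouped data, linearly interpolated.

    boundaries: c_0 < c_1 < ... < c_n (finite), counts: m_1, ..., m_n.
    Values of x outside [c_0, c_n] are clamped to 0 or 1 by np.interp.
    """
    boundaries = np.asarray(boundaries, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Cumulative proportions at the boundaries, starting from F(c_0) = 0.
    cdf_at_boundaries = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    return float(np.interp(x, boundaries, cdf_at_boundaries))

# Hypothetical grouping: 2, 3 and 5 observations in (0,10], (10,20], (20,50].
print(grouped_cdf([0, 10, 20, 50], [2, 3, 5], 15.0))
```

At a boundary the function returns the cumulative proportion itself, so the boundary case and the between-boundaries case are handled by the same call.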
The conditional variance of the empirical distribution is
where
and
Then, we can proceed as in Section 3.1 for obtaining the credibility distribution estimator of $F_{X_{ij}}(x|\Theta_j)$, when the value of $x$ is between boundaries.

Alternative Credibility Distribution Approach for Grouped Data
For grouped data, the previous approaches yield credibility point estimates. If we want to find the credibility estimation in the framework of Bühlmann and Straub (1970), we may apply the concept of uniform distribution within each interval $(c_{i-1,j}, c_{ij})$, and the first two moments can be estimated from
for $k = 1, 2$. Thus, for contract $j$, the empirical estimate of the mean ($k = 1$) is the weighted average of the interval midpoints, where the weight $m_{ij}$ for an interval is the proportion of the observations that are in the interval (histogram), i.e.,
$\sigma^2(\Theta_j)$, the credibility estimation based on grouped data can be obtained similarly to the Bühlmann and Straub (1970) model with parameters

Theorem 8. The following are unbiased estimators for $\mu$, $s^2$, and $a$:
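The uniform-within-interval moment estimates mentioned above can be illustrated with a short sketch. It assumes the standard grouped-data formula from Klugman et al. (2012), in which the k-th moment is the proportion-weighted sum of $(c_i^{k+1} - c_{i-1}^{k+1}) / ((k+1)(c_i - c_{i-1}))$ over the intervals; whether this matches the paper's own expression term by term is an assumption. The function name `grouped_moment` and the data are hypothetical.

```python
import numpy as np

def grouped_moment(boundaries, counts, k):
    """k-th raw moment for grouped data, assuming a uniform law in each interval."""
    c = np.asarray(boundaries, dtype=float)
    m = np.asarray(counts, dtype=float)
    lo, hi = c[:-1], c[1:]
    # k-th moment of a uniform distribution on (lo, hi).
    interval_moment = (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))
    return float(np.sum(m / m.sum() * interval_moment))

# Hypothetical grouping: 1 observation in (0,10], 3 observations in (10,20].
mean = grouped_moment([0, 10, 20], [1, 3], k=1)           # midpoint-weighted mean
second = grouped_moment([0, 10, 20], [1, 3], k=2)
variance = second - mean ** 2
print(mean, variance)
```

For k = 1 the interval moment is the midpoint, so the mean reduces to the weighted average of interval midpoints described in the text.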

Numerical Illustrations
In this section, we use two datasets, one with insurance motor claims data and a second with monthly returns financial data.

Numerical Example with Insurance Data
The dataset is provided by Insurance Europe (2022) and includes a database with figures on the European insurance industry during the period 2004-2020 for 32 European countries. Our numerical illustration is based on a complete dataset of 10 selected countries for the years 2004-2018. Our dataset also contains the motor claims paid and the number of motor claims for each country and each year. The selected countries are the following: Austria (AT); Germany (DE); Finland (FI); Greece (GR); Hungary (HR); Italy (IT); Norway (NO); Poland (PL); Portugal (PT); and Sweden (SE). Table 1 shows the summary statistics of the motor claim amounts and the claim numbers for countries $j = 1, \ldots, 10$ and years $i = 1, \ldots, 15$.
Note: $X_{ij}$ are the average claims per year and $w_{ij}$ represents the number of motor claims that correspond to each $X_{ij}$.
Table 2 illustrates the results of a credibility distribution function for motor claims amount data during the years 2004-2018. More analytically, the upper part of the table shows the individual empirical distribution (x = 320, 800, 1000, 2000, 3000, 23,800, 23,896, 23,897), and the corresponding credibility distribution estimators $F^{Cred}_{X_{ij}}(x|\Theta_j)$ are shown in the middle part of the table. The estimated credibility factors $Z^{F_x}_j$, as well as the estimated parameters $F_{n_{ww}}(x)$, $s^2_F$, $a_F$, are presented in the lower part of Table 2. Note that $F_{n_{wj}}(x) = 0$ means that all claims satisfy $X_{ij} > x$, and $F_{n_{wj}}(x) = 1$ means that all claims satisfy $X_{ij} \le x$.
In Table 2, we observe a lack of monotonicity of the estimated credibility distributions for all contracts. In order to obtain monotonicity, we proceed similarly to Cai et al. (2015) by restricting the credibility factor $Z^F_j$ to be a constant free of $x$. The results are shown in Table 3. Although monotonicity has been restored, from a risk management perspective (which serves fair pricing and reserving given the nature of the risk), more investigation is required, especially at the points where monotonicity breaks down.
Remark 1. Another way of obtaining monotonicity of the estimated credibility distribution functions is by sorting the resulting credibility estimates. In the relevant literature, there are methods for extracting a monotone function from non-monotonic data. One such method is monotonic regression, which achieves monotonicity and smoothness of the regression by introducing a regularization term and solving an optimization problem with constraints. Some key references are: Friedman and Tibshirani (1984), Mukerjee (1988), Shively et al. (2009) and Zhang (2004). Similarly, the above approaches could be applied to our model.
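The two routes to monotonicity mentioned in Remark 1 can be sketched as follows. The first function simply sorts the estimates over an increasing grid of x values. The second is a plain isotonic regression by pooling adjacent violators, used here as a simpler stand-in for the regularized monotone regression methods cited above; it does not include a smoothness term. The function names and the input values are hypothetical.

```python
import numpy as np

def monotone_by_sorting(values):
    """Rearrange estimates (ordered by increasing x) into non-decreasing order."""
    return np.sort(np.asarray(values, dtype=float)).tolist()

def monotone_by_pava(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators (equal weights)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

# Hypothetical non-monotone credibility estimates on an increasing grid of x.
raw = [0.10, 0.40, 0.35, 0.90]
print(monotone_by_sorting(raw))
print(monotone_by_pava(raw))
```

Sorting keeps the original set of values but reassigns them to grid points, whereas pooling replaces each violating stretch by its average; which behaviour is preferable for pricing purposes is a modelling choice.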
By letting the values of motor claims be larger than x = 23,800 and less than or equal to x = 23,897 (x = 23,897 is the maximum threshold of contract DE, which is the contract with the largest values of motor claims, as shown in Table 1), the values of the estimated credibility distribution $F^{Cred}_{X_{ij}}(x|\Theta_j)$ remain the same up to the fifth decimal place. By letting x > 23,897, the estimated credibility distribution goes to 1 (see Table 2).
Remark 2. As in the Bühlmann and Straub (1970) model, the estimate of $a_{F_x}$ can possibly be negative. This means that there is no detectable difference between the risks. In this case we put $a_{F_x} = 0$, as in our cases for x = 23,800, 23,896, 23,897.
Figure 1 displays the individual empirical distribution in each contract. Note that the red bullets indicate the corresponding credibility estimate at specific points presented in Table 2.

Example of Credibility Distribution Estimation with Financial Data
The dataset was created (see Fama and French (2022)) as follows: each NYSE, AMEX, and NASDAQ stock was assigned to an industry portfolio at the end of June of year t based on its four-digit SIC code at that time.Compustat SIC codes have been used for the fiscal year ending in the calendar year t − 1. Whenever Compustat SIC codes are not available, CRSP SIC codes for June of year t were used.Then, returns from July of year t to June of year t + 1 are computed.The weights are the number of firms in portfolios.
Table 6 illustrates the results of the credibility distribution function for monthly returns for 10 industry portfolios from July 1926 to July 2022. More analytically, the upper part of the table shows the individual empirical distribution $F_{n_{wj}}(x)$ of the returns $X_{ij} \le x$ (x = −15, −10, −5, 0, 10, 15, 34.17, 59, 60, 79.79), and the corresponding credibility distribution estimators $F^{Cred}_{X_{ij}|\Theta_j}(x|\Theta_j)$ are shown in the middle part of the table. The estimated credibility factors $Z^{F_x}_j$, as well as the estimated parameters $F_{n_{ww}}(x)$, $s^2_{F_x}$, $a_{F_x}$, are presented in the lower part of Table 6. The monotonicity of the estimated distribution function is shown in Table 6. By letting the values of returns be larger than x = 59 and less than or equal to x = 79.79 (x = 79.79 is the maximum threshold of portfolio Durbl, which is the portfolio with the largest return values, as shown in Table 5), the values of the estimated credibility distribution $F^{Cred}_{X_{ij}}(x|\Theta_j)$ remain the same up to the fifth decimal place. By letting x > 79.79, $F^{Cred}_{X_{ij}}(x|\Theta_j)$ goes to 1 (see Table 6). Figure 2 displays the individual empirical distribution in each contract. Again, note that the red bullets indicate the corresponding credibility estimate at specific points presented in Table 6.

Credibility Coefficients for Industry Portfolios Data
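Here, we provide an intuitive interpretation for the form of the credibility distribution estimator for the monthly returns for the 10 industry portfolios, by presenting the following credibility coefficients. Table 7 illustrates the coefficient of variation BRV, the average within-risk coefficient of variation WRV, and the credibility coefficient CC for the industry portfolios. From Table 6 and Remark 2, for x = 50, $a_{F_x} = 0$ and $s^2_{F_x} = 0.0469$ imply that BRV = 0 and CC = ∞.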

Example of Credibility Distribution Estimation with Financial Grouped Data
The empirical distribution function for the grouped data was depicted by the step function of the Fama and French (2022) data. The grouping (see Table 8) is a subjective element in this fit, and other analysts might choose different groupings. The total number of observations in each portfolio is the same ($m_{\cdot j} = 1155$).
Table 9 illustrates the results of the credibility distribution function for monthly returns for 10 industry portfolios from July 1926 to July 2022. Analytically, the upper part of the table shows the individual empirical distribution $F_{m_j}(x)$ of returns $X_{ij} \le x$ (x = −15, −10, −5, 0, 10, 15), and the corresponding credibility distribution estimators $F^{Cred}_{X_{ij}|\Theta_j}(x|\Theta_j)$ are shown in the middle part of the table. The estimated credibility factors $Z^x_j$, as well as the estimated parameters $F_{mm}(x)$, $s^2_x$, $a_x$, are presented in the lower part of Table 9. The monotonicity of the estimated distribution function is shown in Table 9, but the convergence to one of the estimated credibility distribution for grouped data should be further investigated.

Parameter estimation, $X_{ij} \le x$ (x = −15, −10, −5, 0, 10, 15):
x = −15: $F_{mm}(x)$ = 0.013617, $a_x$ = 3.8391 × 10^−5, $s^2_x$ = 7.9089 × 10^−7, $Z^x_j$ = 0.997944
x = −10: $F_{mm}(x)$ = 0.047149, $a_x$ = 0.000196743, $s^2_x$ = 1.7862 × 10^−5, $Z^x_j$ = 0.991002
x = −5: $F_{mm}(x)$ = 0.011843, $a_x$ = 0.0006369784, $s^2_x$ = 7.6405 × 10^−5, $Z^x_j$ = 0.950530
x = 0: $F_{mm}(x)$ = 0.407970, $a_x$ = 0.000118016, $s^2_x$ = 0.1327 × 10^−3, $Z^x_j$ = 0.9881473
x = 10: $F_{mm}(x)$ = 0.956798, $a_x$ = 0.000362038, $s^2_x$ = 3.5504 × 10^−5, $Z^x_j$ = 0.990288
x = 15: $F_{mm}(x)$ = 0.972741, $a_x$ = 0.000146344, $s^2_x$ = 1.2874 × 10^−5, $Z^x_j$ = 0.9912793

Credibility Coefficients for Financial Grouped Data
Table 10 illustrates the coefficient of variation BRV, the average within-risk coefficient of variation WRV, and the credibility coefficient CC for the industry portfolios of grouped data.

For grouped data, the previous approach gives a credibility point estimate. If we want to derive the classical credibility estimation, we can apply the concept of uniform distribution within each interval of returns and take the interval midpoints as the value of return. The weights are the number of observations in each interval. Table 11 shows the individual average return for the 10 industry portfolios $\mu_j$, the credibility estimation of returns for these portfolios $\mu(\Theta_j)^{Cred}$, along with the credibility factor $Z_j$ and the estimated parameters $\mu$, $a$ and $s^2$.

Concluding Remarks
The objective of this paper was to present the appropriate credibility distribution model that adequately describes the insurance losses, a model that can be used for risk management purposes.
The main contribution of the paper is that it embeds the empirical distribution into credibility modeling in the form of the Bühlmann and Straub (1970) model. In the first part of the paper, we present the model of the weighted credibility distribution, and in the second part, a model that applies to data grouped in intervals.
With our models, we examine two datasets, one with motor claim amounts and the number of motor claims from 10 selected European countries during the period 2004-2018, and a second with monthly returns from July 1926 to July 2022 for 10 industry portfolios. For applying our credibility distribution model with grouped data, we grouped the second dataset (Fama/French financial data) into intervals of returns. Under this setting, the grouping is subjective, the weights are the number of points within each interval, and the total weight in each portfolio is the same.
The monotonicity (or non-monotonicity) and the convergence to one of the estimated distribution functions are shown numerically in Tables 2, 3, 6 and 9. From a theoretical point of view, the monotonicity, as well as the convergence of the estimated distribution functions, needs further investigation. Furthermore, the sufficient conditions for the asymptotic optimality of the empirical credibility distribution estimators can also be investigated in future work.
Funding: This research received no external funding.

Figure 1. Individual empirical distribution and credibility distribution point estimates (in red) for motor claims per contract.

Figure 2. Individual empirical distribution and credibility distribution point estimates (in red) for industry portfolios per contract.

Figure 3. Individual empirical distribution and credibility distribution point estimates (in red) for grouped data per contract.

Table 1. Summary statistics for 10 selected European countries.

Claims and Number of Motor Claims for the Years 2004-2018
$X_{ij}$: Motor claims amount in millions for the years $i = 1, \ldots, 15$ and countries $j = 1, \ldots, 10$

Table 2. Credibility distribution estimation for motor claims.

Claim Amounts and Number of Motor Claims from 10 Selected European Countries during the Period 2004-2018
Individual empirical distribution with claim amount X ij ≤ x,

Table 3. Credibility distribution estimation for motor claims.

Claim Amounts and Number of Motor Claims from 10 Selected European Countries during the Period 2004-2018, $Z^F$ Free of $x$
Individual empirical distribution with claim amount $X_{ij} \le x$ (x = 320, 800, 1000, 2000, 3000, 23,800, 23,896, 23,897)

Table 6. Credibility distribution estimation for industry portfolios.

Table 8. Credibility distribution estimation for grouped data.

Credibility Coefficients for Industry Portfolios for Grouped Data
Example of the Classical Credibility Estimation with Financial Grouped Data

Table 11. Classical credibility model for grouped data.