Credibility Methods for Individual Life Insurance

Credibility theory is used widely in group health and casualty insurance. However, it is generally not used in individual life and annuity business. With the introduction of principle-based reserving (PBR), which relies more heavily on company-specific experience, credibility theory is becoming increasingly important for life actuaries. In this paper, we review the two most commonly used credibility methods: limited fluctuation and greatest accuracy (Bühlmann) credibility. We apply the limited fluctuation method to M Financial Group's experience data and describe some general qualitative observations. In addition, we use simulation to generate a universe of data and compute limited fluctuation and greatest accuracy credibility factors for actual-to-expected (A/E) mortality ratios. We also compare the two credibility factors to an intuitive benchmark credibility measure. We see that for our simulated data set, the limited fluctuation factors are significantly lower than the greatest accuracy factors, particularly for low numbers of claims. Thus, the limited fluctuation method may understate the credibility for companies with favorable mortality experience. The greatest accuracy method has a stronger mathematical foundation, but it generally cannot be applied in practice because of data constraints. The National Association of Insurance Commissioners (NAIC) recognizes and is addressing the need for life insurance experience data in support of PBR; this is an area of current work.


Background
Insurance is priced based on assumptions regarding the insured population. For example, in life and health insurance, actuaries use assumptions about a group's mortality or morbidity, respectively. In auto insurance, actuaries make assumptions about a group of drivers' propensity toward accidents, damage, theft, etc.
The credibility ratemaking problem is the following: suppose that an individual risk has better (or worse) experience than the other members of the risk class. Note that the individual risk might actually be a group, for example, a group of auto insurance policyholders or an employer with group health coverage for its employees.
To what extent is the experience credible? How much of the experience difference can be attributed to random variation and how much is due to the fact that the individual is actually a better or worse risk than the rest of the population? To what extent should that experience be used in setting future premiums?
We can formulate the problem as follows.

• Denote losses by $X_j$ and assume that we have observed independent losses $\mathbf{X} = (X_1, X_2, \ldots, X_n)$. Note that $X_j$ might be the annual loss amount from policyholder $j$, or the loss in the $j$th period, depending on the context.
• Let $\xi = E[X_j]$ and $\sigma^2 = \mathrm{Var}(X_j)$.
• Let $S = \sum_{j=1}^{n} X_j$ and let $\bar{X} = S/n$ be the sample mean.
• Let $M$ be some other estimate of the mean for this group. $M$ might be based on industry data or on large experience studies of groups similar to the risk class in question.

Credibility theory provides actuaries with a method for combining $\bar{X}$ and $M$ for pricing. The resulting credibility estimate is

$$P = Z\bar{X} + (1 - Z)M, \tag{1}$$

and $Z$ is called the credibility factor.
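As a minimal illustration, Equation (1) is simply a weighted average of the group's own experience and the outside estimate. The Python sketch below assumes that $Z$ has already been determined by one of the methods discussed later; the function name `credibility_estimate` and the example numbers are hypothetical.

```python
def credibility_estimate(x_bar: float, m: float, z: float) -> float:
    """Return the credibility estimate P = Z * x_bar + (1 - Z) * M (Equation (1))."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("the credibility factor Z must lie in [0, 1]")
    return z * x_bar + (1.0 - z) * m


# Hypothetical inputs: observed A/E ratio 0.80, industry A/E ratio 1.00, Z = 0.4.
p = credibility_estimate(x_bar=0.80, m=1.00, z=0.4)
print(round(p, 4))  # 0.92
```

With $Z = 1$ the estimate is the group's own experience; with $Z = 0$ it is the outside estimate $M$.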

Significance
Credibility theory is important for actuaries because it provides a means for using company- or group-specific experience in pricing and risk assessment. Norberg (1989) addressed the application of credibility theory to group life insurance. In the United States, however, while credibility theory is used widely in health and casualty insurance, it is generally not used in life and annuity business. The 2008 Practice Note of the American Academy of Actuaries' (AAA's) Life Valuation Subcommittee observes, "For some time, credibility theory has been applied within the property and casualty industry in order to solve business problems. This has not been the case within the life, annuity and health industries. Therefore, examples of the use of credibility theory and related practices are somewhat difficult to find and somewhat simplistic in their content" (AAA 2008). Similarly, the 2009 report (Klugman et al. 2009) notes, "The major conclusion from this survey of 190 US insurers is that credibility theory is not widely adopted among surveyed actuaries at United States life and annuity carriers to date in managing mortality-, lapse- and expense-related risks." Actuarial Standard of Practice 25 (ASOP 25) recommends that credibility theory be used, and provides guidance on credibility procedures for health, casualty, and other coverages. In 2013, the Actuarial Standards Board revised ASOP 25 to include the individual life practice area. Thus, it will be important for life actuaries to begin using credibility methodology (ASB 2013).
Moreover, credibility theory is increasingly important for life actuaries, as the Standard Valuation Law (SVL) is changing to require that principle-based reserving (PBR) be used in conjunction with the traditional formulaic approaches prescribed by state insurance regulations. PBR relies more heavily on company-specific experience. Thus, it will be important for actuaries to have sound credibility methodology (NAIC and CIPR 2013). There is a proposed ASOP for PBR that places significant emphasis on credibility procedures (ASB 2014).

Overview of Paper
Our paper is structured as follows. In Section 2, we provide a brief overview of the two most common credibility methods: limited fluctuation (LF) and greatest accuracy (GA) or Bühlmann credibility. We will see that the LF method is easy to apply, but has several significant shortcomings. On the other hand, GA has a stronger mathematical foundation, but it generally cannot be applied in practice because of data constraints. In Section 3, we summarize some of the results of (Klugman et al. 2009), in which the authors illustrate an application of both the LF and GA methods to mortality experience from the Society of Actuaries (SOA) 2004-2005 Experience Study. In Section 4, we apply the LF method to M Financial's experience data and share some qualitative observations about our results. In Section 5, we use simulation to generate a "universe" of data. We apply the LF and GA credibility methods to the data and compare the results to an intuitive (though not mathematically grounded) benchmark "credibility factor". Based on the results of the qualitative comparison of the methods in Section 5, we document our conclusions and recommendations in Section 6.

Brief Overview of Credibility Methods
The two most common methods for computing the credibility factor Z are limited fluctuation (LF) credibility and greatest accuracy (GA) or Bühlmann credibility.

Full Credibility
The limited fluctuation method is most commonly used in practice. To apply LF, one computes the minimum sample size $n$ so that $\bar{X}$ will be within a relative distance $r$ of the true mean $\xi$ (that is, within $r\xi$ of $\xi$) with probability at least $p$. In other words, we seek $n$ such that

$$\Pr\left(|\bar{X} - \xi| \le r\xi\right) \ge p. \tag{2}$$

Here, the sample size can be expressed in terms of number of claims, number of exposures (e.g., person-years), or aggregate claims. If the sample size meets or exceeds the minimum, then full credibility ($Z = 1$) can be assigned to the experience data. Otherwise, partial credibility may be assigned based on the ratio of the actual sample size to the size required for full credibility.
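Observe that Equation (2) yields the following equivalent conditions:

$$\Pr\left(|\bar{X} - \xi| \le r\xi\right) = \Pr\left(\left|\frac{\bar{X} - \xi}{\sigma/\sqrt{n}}\right| \le \frac{r\xi\sqrt{n}}{\sigma}\right) \ge p.$$

Denote the random variable $(\bar{X} - \xi)/(\sigma/\sqrt{n})$ by $Y$. Then we seek $n$ so that

$$\Pr\left(|Y| \le \frac{r\xi\sqrt{n}}{\sigma}\right) \ge p.$$

This condition holds if and only if

$$\Phi\left(\frac{r\xi\sqrt{n}}{\sigma}\right) - \Phi\left(-\frac{r\xi\sqrt{n}}{\sigma}\right) = 2\Phi\left(\frac{r\xi\sqrt{n}}{\sigma}\right) - 1 \ge p,$$

where $\Phi(u)$ is the standard normal cumulative distribution function (CDF) evaluated at $u$, assuming that $Y$ has (approximately) a standard normal distribution. Finally, we see that condition Equation (2) holds if and only if

$$\frac{r\xi\sqrt{n}}{\sigma} \ge y_p, \quad\text{that is,}\quad n \ge \left(\frac{y_p}{r}\right)^2\left(\frac{\sigma}{\xi}\right)^2, \tag{3}$$

where $y_p$ is the $(1+p)/2$ quantile of the standard normal distribution. We denote this critical value of $n$ for which full credibility is awarded by $n_f = (y_p/r)^2(\sigma/\xi)^2$. From Equation (3), we have the following:

$$n \ge n_f \iff \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \le \frac{\xi^2}{(y_p/r)^2}. \tag{4}$$

This condition has intuitive appeal: full credibility is awarded if the observations are not too variable. For example, suppose $r = 0.05$ and $p = 0.90$. Then, we seek $n$ so that the probability is at least 90% that the relative error in $\bar{X}$ is smaller than 5%. Here $y_p \approx 1.645$ (the 95th percentile of the standard normal distribution), so

$$\left(\frac{y_p}{r}\right)^2 = \left(\frac{1.645}{0.05}\right)^2 \approx 1082,$$

and for full credibility, we require that

$$n \ge 1082\left(\frac{\sigma}{\xi}\right)^2.$$

Similarly, if we choose $r = 0.03$ and $p = 0.90$, then $(y_p/r)^2 \approx 3007$, and the standard for full credibility is

$$n \ge 3007\left(\frac{\sigma}{\xi}\right)^2.$$

We computed the standard for full credibility to control the deviation of $\bar{X}$ from its mean $\xi$. We remark that, alternatively, we could compute $n$ to control the error in $S$ relative to its mean $n\xi$, that is, require $\Pr(|S - n\xi| \le rn\xi) \ge p$. Following the derivation above, the same criterion as in Equation (3) results. This is not surprising, as $S$ is a scalar multiple of $\bar{X}$.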
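The full-credibility constant $(y_p/r)^2$ and the threshold $n_f$ of Equation (3) can be computed directly. The sketch below uses the Python standard library's `statistics.NormalDist` for the normal quantile; the helper names are our own. With the unrounded quantile $y_p \approx 1.6449$ the constants are about 1082.2 and 3006.2; the familiar 1082 and 3007 arise when $y_p$ is rounded to 1.645, which the optional `round_yp` flag reproduces.

```python
from statistics import NormalDist


def full_credibility_constant(r: float, p: float, round_yp: bool = False) -> float:
    """Return (y_p / r)**2, where y_p is the (1 + p)/2 quantile of N(0, 1)."""
    y_p = NormalDist().inv_cdf((1.0 + p) / 2.0)
    if round_yp:
        # The published standards 1082 and 3007 use y_p rounded to 1.645.
        y_p = round(y_p, 3)
    return (y_p / r) ** 2


def full_credibility_n(r: float, p: float, sigma: float, xi: float) -> float:
    """Return n_f = (y_p / r)**2 * (sigma / xi)**2 from Equation (3)."""
    return full_credibility_constant(r, p) * (sigma / xi) ** 2


print(round(full_credibility_constant(0.05, 0.90, round_yp=True)))  # 1082
print(round(full_credibility_constant(0.03, 0.90, round_yp=True)))  # 3007
```

Doubling the coefficient of variation $\sigma/\xi$ quadruples $n_f$, which is the "not too variable" condition of Equation (4) in numerical form.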

Application to Life Insurance
Suppose now that the $X_j$ are Bernoulli random variables that assume the values 1 or 0 with probabilities $q$ and $1 - q$, respectively. In the life insurance context, the random variable $S$ counts the number of deaths. $S = \sum_{j=1}^{n} X_j$ has a binomial distribution, and under appropriate conditions (e.g., $q$ small), we can approximate the distribution of $S$ with the Poisson distribution with mean and variance $\lambda = nq$. Note that $\lambda$ is the expected number of deaths in a group of $n$ lives. Under this approximation, $\xi = q$ and $\sigma^2 \approx q$, so $(\sigma/\xi)^2 \approx 1/q$. Applying Equation (3) with $\lambda = n\xi = nq$, the standard for full credibility is

$$\lambda = nq \ge \left(\frac{y_p}{r}\right)^2. \tag{5}$$

In practice, we replace the expected number of claims by the observed number of claims, applying full credibility if at least 1082 (or 3007) claims, for example, are observed.

Remark 1. In some derivations, $\lambda$ is used to represent the expected number of claims per policy; thus, the credibility standard is written as $n\lambda \ge (y_p/r)^2$ (e.g., see (Klugman et al. 2012)). In our derivation above, we used $\lambda$ to represent the expected number of claims for the group of $n$ lives. Thus, the standard is expressed as in Equation (5).
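The sketch below (hypothetical helper names) contrasts the Poisson-based standard of Equation (5) with the standard obtained by keeping the exact Bernoulli variance $q(1-q)$ in Equation (3), and applies the practical rule of comparing the observed claim count with $(y_p/r)^2$. For small $q$ the two standards are nearly identical, which is why the Poisson form is used.

```python
from statistics import NormalDist


def lambda_0(r: float, p: float) -> float:
    """Full-credibility standard (y_p / r)**2, in expected claims."""
    y_p = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return (y_p / r) ** 2


def required_expected_claims(q: float, r: float, p: float, poisson: bool = True) -> float:
    """Expected deaths n_f * q needed for full credibility.

    Equation (3) with xi = q gives n_f * q = lambda_0 * sigma**2 / q.
    Poisson approximation (sigma**2 = q): n_f * q = lambda_0, i.e. Equation (5).
    Exact Bernoulli (sigma**2 = q * (1 - q)): n_f * q = lambda_0 * (1 - q).
    """
    lam0 = lambda_0(r, p)
    return lam0 if poisson else lam0 * (1.0 - q)


def fully_credible(observed_claims: int, r: float = 0.05, p: float = 0.90) -> bool:
    """Practical rule: compare the observed claim count with lambda_0."""
    return observed_claims >= lambda_0(r, p)


# Hypothetical mortality rate q = 0.002: the two standards differ by only 0.2%.
approx = required_expected_claims(0.002, 0.05, 0.90)
exact = required_expected_claims(0.002, 0.05, 0.90, poisson=False)
```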

Partial Credibility
Suppose now that full credibility is not justified (i.e., that n < n f ). What value of Z < 1 should we assign in computing the credibility estimate Equation (1)? In (Klugman et al. 2012), the authors note, "A variety of arguments have been used for developing the value of Z, many of which lead to the same answer. All of them are flawed in one way or another." They present the following derivation.
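We choose the value of $Z$ in order to control the variance of the credibility estimate $P$ in Equation (1). Observe first that, because $M$ is treated as fixed,

$$\mathrm{Var}(P) = Z^2\,\mathrm{Var}(\bar{X}) = Z^2\frac{\sigma^2}{n}.$$

From Equation (4), we see that when $n < n_f$, we cannot ensure that $\mathrm{Var}(\bar{X})$ is small. Thus, we choose the value of $Z < 1$ so that $\mathrm{Var}(P)$ is fixed at the upper bound in Equation (4), the bound that holds when full credibility ($Z = 1$) is justified. In other words, we choose $Z$ so that

$$Z^2\frac{\sigma^2}{n} = \frac{\xi^2}{(y_p/r)^2};$$

thus, we set

$$Z = \frac{r}{y_p}\cdot\frac{\xi}{\sigma/\sqrt{n}} = \sqrt{\frac{n}{n_f}}. \tag{6}$$

Remark 2.
1. If full credibility is not justified (i.e., if $n < n_f$), the partial credibility factor $Z$ is the square root of the ratio of the number of observations $n$ to the number of observations $n_f$ required for full credibility.
2. Observe that as $\sigma$ increases, $Z$ decreases. Thus, lower credibility is awarded when the observations are more variable. Again, this is consistent with our intuition.
3. In Equation (6), the term $\xi/\sqrt{\mathrm{Var}(\bar{X})} = \xi\sqrt{n}/\sigma$ is the mean of the estimator $\bar{X}$ of the unknown mean $\xi$ divided by the standard deviation of that estimator.
4. We can write Equation (6) succinctly to include both the full and partial credibility cases by writing

$$Z = \min\left\{1, \frac{r}{y_p}\cdot\frac{\xi}{\sqrt{\mathrm{Var}(\bar{X})}}\right\} = \min\left\{1, \sqrt{\frac{n}{n_f}}\right\}. \tag{7}$$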
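A short sketch of Equation (7), written in the two equivalent ways above: in terms of $n/n_f$, and in terms of the mean and standard deviation of an estimator (the form used later for A/E ratios). Function names and example inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist


def lf_credibility(n: float, n_f: float) -> float:
    """Limited fluctuation factor Z = min(1, sqrt(n / n_f)), as in Equation (7)."""
    if n < 0 or n_f <= 0:
        raise ValueError("need n >= 0 and n_f > 0")
    return min(1.0, math.sqrt(n / n_f))


def lf_credibility_from_moments(mean: float, sd: float, r: float = 0.05, p: float = 0.90) -> float:
    """Equivalent form Z = min(1, (r / y_p) * mean / sd) for an estimator's mean and sd."""
    y_p = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return min(1.0, (r / y_p) * mean / sd)


# Hypothetical: a quarter of the 1082 claims needed for full credibility gives Z = 0.5.
print(lf_credibility(270.5, 1082))  # 0.5
```

Because $Z$ grows like a square root, observing a quarter of the full-credibility sample earns half credibility.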

Strengths and Weaknesses of the Limited Fluctuation Approach
The LF method is simple to apply and, unlike GA credibility, it relies only on company-specific data. Thus, LF is used widely in practice. However, it has numerous shortcomings. For example:

• There is no justification for choosing an estimate of the form in Equation (1).
• There is no guarantee of the reliability of the estimate $M$, and the method does not account for the relative soundness of $M$ versus $\bar{X}$.
• The choices of $p$ and $r$ are completely arbitrary. Note that as $r \to 0$ or $p \to 1$, $n_f \to \infty$. Thus, given any credibility standard $n_f$, one can select values of $r$ and $p$ to justify it!
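To make the last point concrete, the sketch below (a hypothetical helper) solves $(y_p/r)^2 = n_f$ for $r$: for any chosen standard $n_f$ and any probability $p$, there is a range $r$ that produces it. For $n_f = 1082$ and $p = 0.90$ it recovers $r \approx 0.05$.

```python
import math
from statistics import NormalDist


def implied_range(n_f: float, p: float) -> float:
    """Return the r for which (y_p / r)**2 equals the chosen standard n_f."""
    y_p = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return y_p / math.sqrt(n_f)


# The same 1082-claim standard is "justified" by many (r, p) pairs.
for p in (0.80, 0.90, 0.95, 0.99):
    print(f"p = {p:.2f}  ->  r = {implied_range(1082, p):.4f}")
```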

Greatest Accuracy (Bühlmann) Credibility
Another common approach is greatest accuracy (GA) or Bühlmann credibility. In this approach, we assume that the risk level of each member of the risk class is described by a parameter $\Theta$, which varies by policyholder. Note that we assume here that the risk class has been determined by the usual underwriting process, that is, that all of the "standard" underwriting criteria (e.g., smoker status, health history, driving record, etc.) have already been considered. Thus, $\Theta$ represents the residual risk heterogeneity within the risk class. We assume that $\Theta$ and its distribution are unobservable. An obvious choice for the premium would be the hypothetical mean

$$\mu_{n+1}(\theta) = E[X_{n+1} \mid \Theta = \theta],$$

but it cannot be computed because $\Theta$ is unobservable. Suppose we restrict ourselves to estimators that are linear combinations of the past observations, that is, estimators of the form

$$\alpha_0 + \sum_{j=1}^{n} \alpha_j X_j.$$

Define the expected value of the hypothetical means $\mu$ by

$$\mu = E[\mu_{n+1}(\Theta)].$$

One can show that, under certain conditions (for example, when the $X_j$ are conditionally independent and identically distributed given $\Theta$), the credibility premium $P = Z\bar{X} + (1 - Z)\mu$ minimizes the squared error loss

$$E\left[\left(\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n} \alpha_j X_j\right)^2\right],$$

where the credibility factor is

$$Z = \frac{n}{n + \nu/a}.$$

Here $n$ is the sample size, and

$$\nu = E[\mathrm{Var}(X_j \mid \Theta)], \qquad a = \mathrm{Var}(\mu_{n+1}(\Theta))$$

are the expected process variance and the variance of the hypothetical means, respectively. It turns out that $P$ is also the best linear approximation to the Bayesian premium $E[X_{n+1} \mid \mathbf{X}]$.

We observe that under the GA method, the following intuitive results hold. For more homogeneous risk classes (i.e., those whose value of $a$ is small relative to $\nu$), $Z$ will be closer to 0. In other words, $\mu$ is a more valuable predictor for a more homogeneous population. However, for a more heterogeneous group (i.e., those whose value of $a$ is large relative to $\nu$), $Z$ will be closer to 1. This result is appealing. If risk classes are very similar to each other ($a$ is small relative to $\nu$), the population mean $\mu$ should be weighted more heavily. If the risk classes are very different from each other ($a$ is large relative to $\nu$), the experience data should get more weight.
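In practice $\mu$, $\nu$, and $a$ must be estimated from data on several risk classes. The sketch below uses the standard textbook nonparametric (empirical Bühlmann) estimators for a balanced panel in which every class is observed for the same number of periods $n$; this estimator is our illustrative choice and is not the A/E-specific method of (Klugman et al. 2009) used later in this paper. It also shows why GA needs data beyond a single company: $\hat{a}$ is computed from the spread of the class means. Setting $Z = 0$ when $\hat{a} \le 0$ is a common convention, assumed here.

```python
from statistics import mean, variance


def empirical_buhlmann(data):
    """Nonparametric Bühlmann estimates for a balanced panel.

    data: list of risk classes, each a list of n observations (same n for all).
    Returns (mu_hat, nu_hat, a_hat, z, premiums).
    """
    n = len(data[0]) if data else 0
    if len(data) < 2 or n < 2 or any(len(g) != n for g in data):
        raise ValueError("need >= 2 classes, each with the same n >= 2 observations")
    class_means = [float(mean(g)) for g in data]
    mu_hat = mean(class_means)                          # expected value of hypothetical means
    nu_hat = float(mean(variance(g) for g in data))     # expected process variance
    a_hat = variance(class_means) - nu_hat / n          # variance of hypothetical means
    z = n / (n + nu_hat / a_hat) if a_hat > 0 else 0.0  # convention: Z = 0 if a_hat <= 0
    premiums = [z * x_bar + (1.0 - z) * mu_hat for x_bar in class_means]
    return mu_hat, nu_hat, a_hat, z, premiums


# Hypothetical panel: three classes observed for three periods each.
mu_hat, nu_hat, a_hat, z, premiums = empirical_buhlmann([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In this toy panel the class means are far apart relative to the within-class spread, so $a$ is large relative to $\nu$ and $Z = 26/27$ is close to 1, as the discussion above predicts.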

Strengths and Weaknesses of the Greatest Accuracy Method
The GA method has a sounder mathematical foundation in the sense that we do not arbitrarily choose an estimator of the form in Equation (1). Rather, we choose the best estimator (in the sense of mean-squared error) among the class of estimators that are linear in the past observations. Moreover, unlike LF, GA credibility takes into account how distinctive a group or risk class is from the rest of the population. However, in practice, companies do not have access to the data required to compute the expected process variance $\nu$ or the variance of the hypothetical means $a$. These quantities rely on (proprietary) data from other companies. As a result, GA credibility is rarely used in practice.

Other Credibility Methods
The credibility ratemaking problem can be framed in the context of generalized linear mixed models (GLMMs). We refer the reader to (Frees et al. 1999), (Nelder and Verrall 1997), and (Christiansen and Schinzinger 2016) for more information. In fact, the GA method presented in Section 2.2 is a special case of the GLMM; see (Frees et al. 1999) and (Klinker 2011) for details. Expressing credibility models in the framework of GLMMs is advantageous, as GLMMs allow for more generality and flexibility. Moreover, one can use standard statistical software packages for data analysis. However, for our purposes, and for our simulated data set, the additional generality of GLMMs was not required. Other methods include mixed effects models, hierarchical models, and evolutionary models. We refer the reader to (Bühlmann and Gisler 2005), (Dannenburg et al. 1996), and (Goovaerts and Hoogstad 1987).

Previous Literature: Application of Credibility to Company Mortality Experience Data
In (Klugman et al. 2009), the authors apply both the LF and GA methods to determine credibility factors for companies' actual-to-expected (A/E) mortality ratio in terms of claim counts and amounts paid. Expected mortality is based on the 2001 VBT and actual mortality is from 10 companies that participated in the Society of Actuaries (SOA) 2004-2005 Experience Study. The authors develop the formulae for the A/E ratios and the credibility factors and include an Excel spreadsheet for concreteness.
We apply the methods and formulae of (Klugman et al. 2009) in the work that follows in Sections 4 and 5. For completeness and readability, we briefly summarize the notation and formulae.

Notation
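Assume that there are $n$ lives. For life $i$, $i = 1, \ldots, n$:
• $f_i$ is the fraction of the year for which the $i$th life was observed.
• $q_i^s$ is the standard (table) mortality rate and $q_i$ is the actual mortality rate.
• $d_i = 1$ if the $i$th life died while under observation, and $d_i = 0$ otherwise.
• $b_i$ is the benefit amount for policy $i$.

We assume that $q_i = m_c q_i^s$ (i.e., that the actual mortality is a constant multiple of the table). Define

$$A_c = \sum_{i=1}^{n} d_i, \qquad E_c = \sum_{i=1}^{n} f_i q_i^s.$$

Observe that $A_c$ and $E_c$ give the actual and expected number of deaths, and $\hat{m}_c = A_c/E_c$ is the estimated actual-to-expected (A/E) mortality ratio based on claim counts. We define similar quantities for the actual and expected dollar amounts paid as follows:

$$A_d = \sum_{i=1}^{n} b_i d_i, \qquad E_d = \sum_{i=1}^{n} b_i f_i q_i^s.$$

Then we define the estimated A/E mortality ratio based on claim amounts by $\hat{m}_d = A_d/E_d$.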
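The A/E ratios defined above can be computed as in the following sketch. The `Life` record and the toy data are hypothetical; they only illustrate the definitions of $A_c$, $E_c$, $A_d$, and $E_d$.

```python
from dataclasses import dataclass


@dataclass
class Life:
    f: float               # fraction of the year observed, f_i
    q_std: float           # standard-table mortality rate, q_i^s
    died: bool             # death indicator, d_i
    benefit: float = 1.0   # benefit amount, b_i


def ae_ratios(lives):
    """Return (m_hat_c, m_hat_d): A/E ratios by claim count and by amount."""
    a_c = sum(1.0 for life in lives if life.died)
    e_c = sum(life.f * life.q_std for life in lives)
    a_d = sum(life.benefit for life in lives if life.died)
    e_d = sum(life.benefit * life.f * life.q_std for life in lives)
    return a_c / e_c, a_d / e_d


# Toy data (hypothetical): one death among three lives.
lives = [
    Life(f=1.0, q_std=0.01, died=True, benefit=100.0),
    Life(f=1.0, q_std=0.01, died=False, benefit=300.0),
    Life(f=0.5, q_std=0.02, died=False, benefit=200.0),
]
m_c, m_d = ae_ratios(lives)
```

Here the single death falls on a low-benefit policy, so the amount-based ratio is about half the count-based one; this is the extra source of randomness that amounts introduce.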

Limited Fluctuation Formulae
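In order to compute the credibility factor, we need the mean and variance of the estimators $\hat{m}_c$ and $\hat{m}_d$. We present the results for $\hat{m}_c$; the results for $\hat{m}_d$ are similar. Treating each $d_i$ as a Bernoulli random variable with success probability $f_i m_c q_i^s$, one can show that

$$E[\hat{m}_c] = m_c, \qquad \mathrm{Var}(\hat{m}_c) = \frac{\sum_{i=1}^{n} f_i m_c q_i^s\,(1 - f_i m_c q_i^s)}{E_c^2}. \tag{8}$$

If $q_i^s$ is sufficiently small, we can assume that $(1 - f_i m_c q_i^s)$ is approximately 1, and the expression above for the variance simplifies:

$$\mathrm{Var}(\hat{m}_c) \approx \frac{m_c \sum_{i=1}^{n} f_i q_i^s}{E_c^2} = \frac{m_c}{E_c}. \tag{9}$$

Now, combining the expressions for the mean and variance of the estimator $\hat{m}_c$ with the expression for the credibility factor given in Equation (7), we have that

$$Z = \min\left\{1, \frac{r}{y_p}\cdot\frac{m_c E_c}{\sqrt{\sum_{i=1}^{n} f_i m_c q_i^s\,(1 - f_i m_c q_i^s)}}\right\}.$$

If we use the approximation for the variance given in Equation (9), we have

$$Z \approx \min\left\{1, \frac{r}{y_p}\sqrt{m_c E_c}\right\}.$$

Finally, replacing the unknown quantity $m_c$ with its estimate from the observed data $\hat{m}_c = A_c/E_c$, the expressions simplify as

$$Z = \min\left\{1, \frac{r}{y_p}\cdot\frac{A_c}{\sqrt{\sum_{i=1}^{n} f_i \hat{m}_c q_i^s\,(1 - f_i \hat{m}_c q_i^s)}}\right\} \approx \min\left\{1, \frac{r}{y_p}\sqrt{A_c}\right\} = \min\left\{1, \sqrt{\frac{A_c}{(y_p/r)^2}}\right\}. \tag{10}$$

Observe that the final expression in Equation (10) is equivalent to the expression in Equation (6), expressed in terms of claims: the observed number of claims $A_c$ plays the role of $n$, and the claim standard $(y_p/r)^2$ plays the role of $n_f$. Recall that if $r = 0.05$ and $p = 0.90$, the approximate expression in Equation (10) becomes

$$Z \approx \min\left\{1, \sqrt{\frac{A_c}{1082}}\right\}.$$

Similarly, if $r = 0.03$ and $p = 0.90$, the approximate expression in Equation (10) becomes

$$Z \approx \min\left\{1, \sqrt{\frac{A_c}{3007}}\right\}.$$

We remark that these parameters and the resulting requirement of 3007 claims for full credibility are prescribed by the Canadian Committee on Life Insurance Financial Reporting (Canadian Institute of Actuaries 2002).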
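The sketch below evaluates the exact and approximate credibility factors of Equation (10), with $m_c$ replaced by $\hat{m}_c$. Passing benefit amounts $b_i$ gives an analogous amount-based factor, assuming the variance numerator is $\sum_i b_i^2 f_i \hat{m} q_i^s(1 - f_i \hat{m} q_i^s)$ (our reading of "the results for $\hat{m}_d$ are similar"); with all $b_i = 1$ it reduces to the count-based formulas. The function name, the defaults $r = 0.05$ and $p = 0.90$, and returning $Z = 0$ when no claims are observed are our assumptions.

```python
import math
from statistics import NormalDist


def lf_ae_credibility(f, q_std, died, benefit=None, r=0.05, p=0.90):
    """Return (Z_exact, Z_approx) for an A/E ratio, following Equation (10)."""
    if benefit is None:
        benefit = [1.0] * len(f)  # all b_i = 1 gives the claim-count version
    y_p = NormalDist().inv_cdf((1.0 + p) / 2.0)
    actual = sum(b for b, d in zip(benefit, died) if d)                  # A
    if actual == 0:
        return 0.0, 0.0  # no claims observed: assign no credibility
    expected = sum(b * fi * qi for b, fi, qi in zip(benefit, f, q_std))  # E
    m_hat = actual / expected                                            # estimate of m
    rows = list(zip(benefit, f, q_std))
    # E**2 * Var(m_hat), with the unknown m replaced by m_hat.
    var_exact = sum(b * b * fi * m_hat * qi * (1.0 - fi * m_hat * qi) for b, fi, qi in rows)
    var_approx = sum(b * b * fi * m_hat * qi for b, fi, qi in rows)  # drops (1 - f m q)
    # Z = (r / y_p) * m_hat / sd(m_hat) = (r / y_p) * A / sqrt(E**2 * Var(m_hat))
    z_exact = min(1.0, (r / y_p) * actual / math.sqrt(var_exact))
    z_approx = min(1.0, (r / y_p) * actual / math.sqrt(var_approx))
    return z_exact, z_approx


# Hypothetical block: 10,000 lives, full-year exposure, q^s = 0.01, 100 deaths.
n = 10_000
z_exact, z_approx = lf_ae_credibility([1.0] * n, [0.01] * n, [True] * 100 + [False] * (n - 100))
```

For claim counts the approximate factor equals $\sqrt{A_c/(y_p/r)^2}$; the exact factor is slightly larger because dropping $(1 - f_i \hat{m} q_i^s)$ overstates the variance.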

Greatest Accuracy Formulae
We define the notation as in Section 3.1. Following (Klugman et al. 2009), we suppress the subscripts $c$ and $d$ for count and dollar, respectively. We add the subscript $h$ to emphasize that we are computing the credibility factor for company $h$, $h = 1, \ldots, r$, where here $r$ denotes the number of companies. Thus, for example, $A_h$ is the actual dollar amount (or claim count) for company $h$, $E_h$ is the expected dollar amount (or claim count), and $f_{hi}$ is the fraction of the year for which the $i$th life from company $h$ was observed. We denote the number of lives in company $h$ by $n_h$.
We let $m_h$ denote the true mortality ratio for company $h$, and we assume that the mean and variance of $m_h$ are given by $\mu$ and $\sigma^2$, respectively.
We present the formulas for the credibility factors based on dollar amounts and remark that one can compute the credibility factors based on claim counts by setting the benefit amounts b hi equal to 1.
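In (Klugman et al. 2009), the authors posit an estimator of the form $Z\hat{m}_h + W$ for $m_h$ and use calculus to find the values of $Z$ and $W$ that minimize the expected squared error $E[(Z\hat{m}_h + W - m_h)^2]$. Under the assumptions above (life $i$ of company $h$ dies with probability $f_{hi} m_h q_{hi}^s$), the minimizing values are $W = (1 - Z_h)\mu$ and

$$Z_h = \frac{\sigma^2}{\sigma^2 + \dfrac{1}{E_h^2}\displaystyle\sum_{i=1}^{n_h} b_{hi}^2 f_{hi} q_{hi}^s\left[\mu - f_{hi} q_{hi}^s(\mu^2 + \sigma^2)\right]}.$$

Note that, through $\mu$ and $\sigma^2$, the credibility factor for company $h$ depends on the experience data of all companies; we therefore need estimates of $\mu$ and $\sigma^2$. In Klugman et al. (2009), the authors present an intuitive and unbiased estimator for the mean mortality ratio $\mu$:

$$\hat{\mu} = \frac{\text{total actual deaths over all companies}}{\text{total expected deaths over all companies}} = \frac{\sum_{h=1}^{r} A_h}{\sum_{h=1}^{r} E_h}.$$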
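A sketch of the GA factor $Z_h$ and the pooled estimator $\hat{\mu}$ displayed above. It assumes $\mu$ and $\sigma^2$ are supplied (for example, $\hat{\mu}$ from the pooled ratio and $\hat{\sigma}^2$ from the estimator in (Klugman et al. 2009), which we do not reproduce here); the function names and test values are hypothetical.

```python
def ga_credibility(benefit, f, q_std, mu, sigma2):
    """GA factor Z_h = sigma2 / (sigma2 + E[Var(m_hat_h | m_h)]) for one company.

    Each life i is assumed to die with probability f_i * m_h * q_i^s, where the
    company's true ratio m_h has mean mu and variance sigma2.
    """
    expected = sum(b * fi * qi for b, fi, qi in zip(benefit, f, q_std))  # E_h
    # Conditional variance of m_hat_h = A_h / E_h, averaged over the distribution of m_h.
    avg_cond_var = sum(
        b * b * fi * qi * (mu - fi * qi * (mu * mu + sigma2))
        for b, fi, qi in zip(benefit, f, q_std)
    ) / expected ** 2
    return sigma2 / (sigma2 + avg_cond_var)


def estimate_mu(actual_by_company, expected_by_company):
    """Pooled estimator: total actual over all companies / total expected."""
    return sum(actual_by_company) / sum(expected_by_company)


# Hypothetical company: 10,000 lives, full-year exposure, q^s = 0.01, counts (b = 1).
n = 10_000
z_h = ga_credibility([1.0] * n, [1.0] * n, [0.01] * n, mu=1.0, sigma2=0.01)
```

A larger $\sigma^2$ (more spread in true ratios across companies) raises $Z_h$, mirroring the role of $a$ in the Bühlmann formula.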
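To derive an estimator for the variance $\sigma^2$, the authors derive formulas for the expected weighted squared error in terms of $\mu$ and $\sigma^2$. This results in an estimator $\hat{\sigma}^2$ of $\sigma^2$; we refer the reader to (Klugman et al. 2009) for its explicit form.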

LF Analysis Applied to M Financial's Data: Qualitative Results
We applied the LF method to M Financial's data to compute credibility factors for the A/E ratios based on M's experience data. In this section, we describe our calculations and share some qualitative observations about our results.
We computed the credibility factors using four different methods. For each method, we computed an aggregate credibility factor as well as specific factors for sex and smoker status. Thus, for each of the four methods, we computed five credibility factors: aggregate, male nonsmoker, female nonsmoker, male smoker, and female smoker. In each case, the actual mortality is based on M Financial's 2012 Preliminary Experience Study and the expected mortality is based on the 2008 VBT.
First, we used the methods described in (Klugman et al. 2009) and Section 3.2 to compute the LF credibility factors for M Financial's observed A/E ratios based on claim counts. We computed the credibility factors using both the "exact" and approximate expressions given in Equation (10). We denoted the resulting credibility factors by $Z_c^e$ and $Z_c^a$, respectively. We also computed the credibility factors for the overall mortality rate $\hat{q} = (\text{number of claims})/(\text{total exposures})$.
We denoted the resulting credibility factor by $Z_q$. Finally, we used the methods described in Sections 3.1 and 3.2 above and Section 2a of (Klugman et al. 2009) to compute the LF credibility factors for M Financial's observed A/E ratios based on amounts (retained net amount at risk, NAR) instead of claim counts. We denoted the resulting credibility factor by $Z_{NAR}$.
The values of $Z_c^a$, $Z_c^e$, and $Z_q$ were remarkably close for the aggregate credibility factor and for each sex/smoker status combination. More specifically, the maximum relative difference among the factors was 3%. Thus, while computing a credibility factor for the overall mortality rate $\hat{q}$ is too simplistic, in this case the resulting credibility factors were remarkably close to the credibility factors based on claim counts.
The credibility factors $Z_{NAR}$ for the A/E ratio based on retained NAR were significantly lower than the credibility factors based on claim counts. The relative difference ranged from 47% to 64%, depending on the sex/smoker status. This is not surprising. As we observed in Remark 2 of Section 2.1.3, the credibility factor should decrease as the variance increases. When we compute the A/E ratio for NAR, there is an additional source of randomness, namely, whether claims occurred for high-value or low-value policies.
This raises the question of whether to use claim counts or NAR as the basis for the credibility factors. According to (Klugman 2011), "If there is no difference in mortality by amount, there really is no good statistical reason to use amounts. They add noise, not accuracy. If there is a difference, then the mix of sales has an impact. As long as the future mix will be similar, this will provide a more accurate estimate of future mortality."

Qualitative Comparison of Credibility Methods Using a Simulated Data Set
As we described in Section 2, the LF method is easy to apply, as it relies only on company-specific data. However, it has several significant shortcomings. The GA method addresses these shortcomings, but requires data from other companies. Because experience data is proprietary, GA is rarely used in practice.
In this section, we examine the performance of the LF and GA credibility methods on a simulated data set, and we compare the resulting credibility factors with an intuitive, though not mathematically grounded, "credibility factor."

Overview
In the simulation, we created a dataset consisting of 1 million individuals. More specifically, we created 20 risk classes or populations of 50,000 individuals each and computed the A/E ratio for each of the risk classes. Expected mortality was based on the 2008 VBT, and actual mortality was based on simulation from the table, or a multiple of it. Thus, the hypothetical means (the A/E ratios of the risk classes) were known. To generate the experience data, we sampled from the risk classes and computed the observed A/E ratios.
Then, given the observed A/E ratios, the "industry average" (expected value of the hypothetical means or overall A/E ratio for the universe), and the known hypothetical means, we computed credibility factors three different ways: GA, LF, and an intuitively pleasing (though not mathematically grounded) benchmark credibility factor. We applied the GA and LF methods as in (Klugman et al. 2009).
We contrast the credibility results in a series of figures. The most notable result is that LF with the "standard" range and probability parameters yielded a significantly lower credibility factor than the other methods when the number of claims was small. Thus, the LF method might understate the credibility for companies with good mortality experience.

Generating the "Universe"
We generated 20 risk classes, or populations, of 50,000 people each. Thus, the universe consisted of 1 million individuals. Each of the 20 populations had a different age distribution and had an A/E ratio prescribed a priori. More specifically, the A/E ratio for each risk class was prescribed by scaling the 1-year 2008 VBT probabilities $q_x$ by a multiplier $\alpha_h \in \{0.73, 0.76, \ldots, 1.30\}$, $h = 1, \ldots, 20$.
Using the scaled 2008 VBT table, we computed ${}_tq_x$ for $t \le 20$ and for various values of $x$. These values gave the cumulative distribution function (CDF) of the random variable $T(x)$, the remaining future lifetime of $(x)$. Then, we generated the outcomes of $T(x)$ in the usual way. Namely, we generated outcomes from the uniform distribution on $(0, 1)$ and inverted the CDF to determine the outcome of the random variable $T(x)$. If $T(x) < 20$, then a claim occurred during the 20-year period; otherwise, no claim occurred.
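The inverse-CDF step can be sketched as follows. Because we do not reproduce the 2008 VBT here, the annual rates below are hypothetical placeholders; with an annual table, inversion identifies the policy year of death, which is all that is needed to decide whether a claim occurs within 20 years. Capping scaled rates at 1 is our own safeguard, and the function names are illustrative.

```python
import bisect
import random


def cumulative_death_probs(q_annual, alpha, horizon=20):
    """Return [1q_x, 2q_x, ..., horizon q_x] for annual rates scaled by alpha."""
    survival = 1.0
    cdf = []
    for k in range(horizon):
        q = min(1.0, alpha * q_annual[k])  # scaled rate, capped at 1
        survival *= 1.0 - q
        cdf.append(1.0 - survival)  # t q_x = 1 - t p_x
    return cdf


def simulate_death_year(cdf, rng):
    """Inverse-CDF draw: smallest t with U <= t q_x, or None if (x) survives the horizon."""
    u = rng.random()
    t = bisect.bisect_left(cdf, u)
    return t + 1 if t < len(cdf) else None


# The 20 multipliers alpha_h = 0.73, 0.76, ..., 1.30.
alphas = [round(0.73 + 0.03 * h, 2) for h in range(20)]

# Hypothetical annual rates for ages x, x+1, ..., x+19 (NOT the 2008 VBT).
q_annual = [0.001 * 1.08 ** k for k in range(20)]

rng = random.Random(2016)
cdf = cumulative_death_probs(q_annual, alphas[0])
claims = sum(simulate_death_year(cdf, rng) is not None for _ in range(50_000))
```

A claim occurs exactly when $U \le {}_{20}q_x$, so the simulated claim rate should be close to the last entry of `cdf`.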
We then calculated the following:
• The ratio $m_h$ of actual deaths to expected deaths over the 20-year time period, $h = 1, \ldots, 20$. In other words, we computed the hypothetical mean for each of the 20 risk classes. These values ranged from 0.71 to 1.28.
• The overall A/E ratio $\mu = 0.95$ for the universe of 1 million individuals.

Generating the Experience Data and Computing the LF and GA Credibility Factors
We then generated experience data for each of the 20 risk classes. We viewed the experience data as the experience of a particular company, as in (Klugman et al. 2009). We fixed the number $n$ of policyholders (e.g., $n = 500$, $n = 1500$, ...) and randomly selected $n$ "lives" from each of the risk classes. We computed the observed A/E ratio $\hat{m}_h$ and the credibility factors $Z_h^{LF}$ and $Z_h^{GA}$ using the LF and GA methods, respectively, as described in (Klugman et al. 2009) and in Section 3.
In Figure 1, we show the LF and GA credibility factors for three "companies" (risk classes) from our simulated data set and contrast our results with the results from (Klugman et al. 2009), which are based on real data. The results are remarkably consistent. We observe that, without exception, the LF factors were considerably lower than the GA factors for small numbers of claims. We further observed that, as the number of claims approached the LF threshold for full credibility, the LF factors exceeded the GA factors. It is not surprising that the factors differed significantly or that the curves would cross, as the underlying methods are so different. A similar phenomenon was observed in (Klugman et al. 2009).

An Intuitively Appealing Benchmark "Credibility Factor"
We wanted to compare the LF and GA credibility factors with an intuitively appealing benchmark. To achieve this, we posed the question, "In repeated draws of experience data from the risk classes, how frequently is the observed A/E ratio $\hat{m}_h$ closer than the 'industry average' $\mu$ to the true value (hypothetical mean) $m_h$?" Thus, we generated 2000 trials of experience data for $n$ policyholders from each risk class. We introduce the following notation.
• Let $\mu$ = the A/E ratio for the universe of 1 million individuals. In the simulation, we had $\mu \approx 0.95$.
• Let $\hat{m}_{hni}$ = the observed A/E ratio for company $h$, trial $i$, when there are $n$ policyholders in the group.
• Let $m_h$ be the true A/E ratio for population $h$. Recall that this was prescribed a priori when we generated the 20 populations.
• Let $Z^B_{hn}$ be the benchmark credibility factor for company $h$ when there are $n$ policyholders in the group.
We computed an intuitively appealing benchmark credibility factor $Z^B_{hn}$ as follows. For each of the 2000 trials, define

$$I_{hni} = \begin{cases} 1, & \text{if } |\hat{m}_{hni} - m_h| < |\mu - m_h|, \\ 0, & \text{otherwise.} \end{cases}$$

Thus, $I_{hni}$ indicates whether the observed A/E ratio $\hat{m}_{hni}$ or the "industry average" $\mu = 0.95$ is more representative of the true mean $m_h$.

Then, we define

$$Z^B_{hn} = \frac{1}{2000}\sum_{i=1}^{2000} I_{hni}.$$

Thus, $Z^B_{hn}$ computed this way is the proportion of the 2000 trials for which the experience data was more representative of the risk class's A/E ratio than the universe's A/E ratio $\mu$.
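Given the simulated A/E ratios for one company and sample size, the benchmark factor is a simple proportion. A minimal sketch with hypothetical inputs; the strict inequality means ties count as failures, as in the definition of $I_{hni}$, and the function name is our own.

```python
def benchmark_factor(observed_ratios, true_ratio, industry_ratio):
    """Proportion of trials with |m_hat - m_h| < |mu - m_h|, i.e. the mean of I_hni."""
    if not observed_ratios:
        raise ValueError("need at least one simulated trial")
    target = abs(industry_ratio - true_ratio)
    hits = sum(abs(m_hat - true_ratio) < target for m_hat in observed_ratios)
    return hits / len(observed_ratios)


# Hypothetical trials for a class with m_h = 1.278 and industry average mu = 0.95.
z_far = benchmark_factor([1.25, 1.00, 1.30, 0.90], true_ratio=1.278, industry_ratio=0.95)
# A class whose true ratio sits next to mu rarely "beats" the industry average.
z_near = benchmark_factor([0.96, 0.94, 0.9535], true_ratio=0.953, industry_ratio=0.95)
```

The second call previews the effect discussed below: when $m_h$ is close to $\mu$, the target distance $|\mu - m_h|$ is tiny, so few trials succeed.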
This definition of $Z$ has intuitive appeal. The weight given to $\hat{m}_h$ should be the proportion of the time that it is a better representation of the true mean $m_h$ than $\mu$ is. However, we emphasize that there is no mathematical underpinning for this choice. It seems that we could just as easily use the square root of the proportion, or the proportion squared, for example. Moreover, in practice, one could not compute the benchmark factor. The benchmark factor simply allows us to compare, in our simulated universe, the LF and GA results with an intuitively appealing measure.
We wish to compare the benchmark factor to the LF and GA factors computed in Section 5.3. To make the comparison meaningful, we must express $Z^B_{hn}$ as a function of the number of claims. Thus, we introduce the following notation: let $Z^{LF}_{hd}$ and $Z^{GA}_{hd}$ be the credibility factors for company $h$ when there are $d$ claims, computed via the LF and GA methods, respectively. We computed these factors in Section 5.3.

Qualitative Comparison of the Credibility Methods
We computed the benchmark factor in Section 5.4. We also computed credibility factors for the simulated data set using the GA and LF methods as in (Klugman et al. 2009). For LF, we used the common parameter choices r = 0.05 and p = 0.9 and the Canadian standard r = 0.03 and p = 0.9 (Canadian Institute of Actuaries 2002). In other words, under the LF method, we assigned full credibility if the probability that the relative error in the A/E ratio is less than 0.05 (or 0.03) was at least 90%. Recall that from the approximate expression in Equation (10), these parameter choices yielded 1082 and 3007 claims, respectively, as the standards for full credibility.
We summarize our observations for the 20 companies (risk classes) below, and show the credibility factors for 3 of the 20.
1. Generally speaking, the benchmark and GA factors were similar and considerably higher than the LF factors when the number of claims was small; see Figures 2 and 3.
2. The exception to Observation 1 occurred when the hypothetical mean $m_h$ was close to the overall population mean $\mu \approx 0.95$; see Figure 4. This is not surprising, as the benchmark factor is the relative frequency in 2000 trials that the observed A/E ratio $\hat{m}_h$ is closer to the true hypothetical mean $m_h$ than the overall population mean $\mu$ is. When $m_h$ is very close to $\mu$, it is unlikely that $\hat{m}_h$ will land closer; that is, the event $|\hat{m}_h - m_h| < |\mu - m_h|$ is unlikely, resulting in a smaller benchmark $Z$.
3. For our simulated data set, and for the real data in (Klugman et al. 2009), the GA method produced significantly higher credibility factors at low numbers of claims than the LF method with $r = 0.05$. Of course, the difference was even more pronounced when we chose the Canadian standard $r = 0.03$.

[Figure: Benchmark, GA, and LF credibility factors for "company" (risk class) 20, whose hypothetical mean is $m_{20} \approx 1.278$.]
[Figure: Benchmark, GA, and LF credibility factors for "company" (risk class) 9, whose hypothetical mean is $m_9 \approx 0.953$.]

Conclusions and Recommendations
Without exception, the LF factors based on the full credibility requirement of 1082 claims were significantly lower than the GA factors when the number of claims was small. Of course, the disparity was greater when we used 3007 as the standard for full credibility.
Our analysis suggests that the LF method may significantly understate the appropriate credibility factor for populations with exceptionally good mortality experience, such as M Financial's clientele. Thus, our analysis provides support for awarding higher credibility factors than the LF method yields when the number of claims is low.
The NAIC recognizes that there is a need for life insurance experience data in support of PBR. The PBR Implementation Task Force is developing an Experience Reporting Framework for collecting, warehousing, and analyzing experience data. This is ongoing work; see (AAA 2016) and (NAIC 2016), for example.