1. Introduction
In auto insurance rate making, to avoid the potential effect of adverse selection in the insurance market (
Liu et al. 2022), the pricing of insurance contracts is based on the sophisticated evaluation of risk factors. Here, risk factors discriminate the risk level associated with different drivers and drive the underlying insurance costs. The statistical and actuarial soundness in estimating the relativities of risk factors through data modeling enhances the actuarial fairness of insurance price (
Meyers and Van Hoyweghen 2018), which is one of the critical components of auto insurance rate regulation. A risk relativity measures the risk of a given level of a risk factor relative to a selected base level. When calculated from loss costs, relativities are referred to as empirical risk relativities, and they are often derived by statistical modeling of loss costs. However, the risk relativities that insurance companies use to determine the insurance price may be adjusted as a result of the economic optimization process within those companies (
Arora and Arora 2014). This optimization process, where the minimization of bias between pure premium and average losses is targeted, is often driven by a bias–variance tradeoff that aims at smoothing the parameter estimates (
Dugas et al. 2003). The flexibility of designing the optimization process according to the desired objective function may cause higher bias for some rating groups, particularly those with smaller risk exposures. Given that pricing is based on the predictive modeling of loss severity and claim frequency, Canadian insurance companies must balance satisfying the regulatory rules against optimally adjusting the insurance rates when a desired property of the risk relativities is imposed. For instance, the relativities of driving records are expected to decrease as the number of accident-free years increases, but the empirical patterns for some other risk classifications may not follow the expected direction and may even be reversed. Therefore, from a regulatory perspective, insurance companies must explain in detail the techniques used and the rationale for making the adjustments. In addition, when adjusting the relativities of risk factors, the plus–minus 10% rule must be satisfied; credibility-weighted relativities can be obtained by using information from either major competitors or the benchmark values from regulators.
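The adjustment described above can be illustrated with a minimal Python sketch. It blends an empirical relativity with a benchmark value using a credibility weight and then keeps the result within a plus–minus 10% band. The function name, the weight z, and the reading of the 10% rule as a band around the empirical relativity are assumptions made for illustration; they are not the regulator's formula.

```python
def credibility_adjusted_relativity(empirical, benchmark, z, cap=0.10):
    """Blend an empirical relativity with a benchmark and cap the change.

    empirical : relativity estimated from the company's own loss costs
    benchmark : relativity taken from the regulator or major competitors
    z         : credibility weight in [0, 1] given to the empirical value
    cap       : largest relative change allowed (assumed to be a 10% band
                around the empirical relativity)
    """
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility weight z must lie in [0, 1]")
    # Classical credibility blend of own experience and external benchmark
    blended = z * empirical + (1.0 - z) * benchmark
    # Keep the adjusted value inside the allowed band
    lower = empirical * (1.0 - cap)
    upper = empirical * (1.0 + cap)
    return min(max(blended, lower), upper)


# Hypothetical inputs: the blend of 1.00 and 1.50 would be 1.25,
# but the assumed band limits the adjusted relativity to 1.10.
adjusted = credibility_adjusted_relativity(1.00, 1.50, z=0.5)
```

With full credibility (z = 1) the function returns the empirical relativity unchanged, so the cap only matters when the benchmark pulls the estimate far from the company's own experience.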
Although the basic principle of insurance is that the cost of risk transfer must be equal to the price of the insurance policy, adjustment to the premium charged is possible, especially in a competitive insurance market, because of the asymmetric information between insurance companies and policyholders (
Chiappori et al. 2006;
Cohen 2005;
Saito 2006). Insurance companies tend to have more information than policyholders about their own financial situation, including the return on equity, insurance losses by line of business, and expenses. In addition, insurance companies may prefer to write policies in one area and avoid writing them in another if they believe doing so is profitable. To achieve this, they may lower the premium for their favored groups while setting prices for the unfavored groups higher than they should be (
Regan et al. 2008). This is the so-called insurance discrimination, which often occurs in insurance markets that are unregulated or lightly regulated (
Avraham 2017). A recent study in
Berry-Stolzlei and Born (
2012) shows that de-regulation has a significant effect on insurance pricing and, in turn, on fairness. Insurance discrimination is possible even in a country where auto insurance is heavily regulated. For instance, under auto insurance regulation in Canada, regulators allow the relativities of risk factors applied in the pricing algorithm to be adjusted within a specific interval. This adjustment is often conducted by credibility-weighting procedures (
Ronka-Chmielowiec and Poprawska 2005;
Zahi 2021). Up to
of estimated risk relativity may be allowed for each factor as long as the adjustment is justifiable. Because many factors are used in insurance pricing, premium levels may differ significantly from loss cost levels if most factors are adjusted in the same direction. Taken together, these points make it arguable whether the insurance premium charged by insurance companies represents a “fair price” for an insured (
Frees and Huang 2021;
Hanafy and Ming 2021).
In auto insurance, premiums are calculated by applying a set of risk relativities to a base rate, which is then multiplied by a multiplier that reflects an assumed loading factor. The loading is an additional cost built into the insurance policy to cover the expenses of settling insurance claims and the costs of business operation. From the actuarial perspective, the risk relativities used in the premium charges must align with those obtained from statistical models (
Frezal and Barry 2020;
Landes 2015). However, as mentioned above, a minor adjustment may be made to each risk factor level. This may cause a significant difference in relativities between the premiums and loss costs at the insured level after accounting for the insurance loading factor. To estimate this loading factor, one may determine the loss ratio based on the premium and loss data. However, regulators must ensure that the premiums charged are not excessive when it comes to the loading factor (
Cummins and Weiss 1992;
Grabowski et al. 1989). This depends on the regulatory rule on what loss ratio level is considered justifiable, fair, and exact. New regulation rules may be applied when excessive premiums are charged. That is to say, fairness at the industry level may be addressed by controlling the profit level and business expenses that insurance companies may claim. In addition, because the loading factor is flexible, political parties often use it in election campaigns to promise the public a reduction in auto insurance premiums, which often happens in Canada. Therefore, a study of fairness at the industry level may be used to assess whether such a premium reduction is possible and, if so, by how much premiums can be reduced when the government implements a new regulation rule for auto insurance. This assumes that lowering the benefit coverages is not permitted, so the loss cost level is considered not reducible. To further investigate fairness and whether the premium level is excessive at the industry level, one can compare the fixed effect estimated in this work, which captures the disparity between premiums and loss costs, with the loss ratio benchmark of around 65% to see whether they coincide.
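The premium build-up described in this paragraph can be sketched in a few lines of Python. The example multiplies a base rate by a set of relativities and a loading multiplier, and then shows how small adjustments made in the same direction compound across factors. All numbers, factor choices and function names are hypothetical and serve only to illustrate the arithmetic.

```python
from math import prod


def premium(base_rate, relativities, loading_multiplier):
    """Base rate times the product of relativities times the loading multiplier."""
    return base_rate * prod(relativities) * loading_multiplier


def compounded_shift(adjustments):
    """Overall premium change when each factor's relativity is scaled by (1 + a)."""
    return prod(1.0 + a for a in adjustments) - 1.0


# Hypothetical policy: base rate of 500 with relativities for class,
# driving record and territory, and an assumed loading multiplier of 1.5.
example_premium = premium(500.0, [1.20, 0.85, 1.10], loading_multiplier=1.5)

# Four factors each adjusted upward by 5% compound to a shift of about 21.6%.
shift = compounded_shift([0.05, 0.05, 0.05, 0.05])
```

The second function illustrates why individually minor adjustments matter: because the rating structure is multiplicative, adjustments in the same direction accumulate geometrically rather than additively.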
In insurance, a risk factor is classified into different levels so that the premium rates can be computed based on the levels of given risk factors (
Abraham 1985). However, the notion of fairness itself is not clear, as a formal definition associated with the problem is lacking. In addition, the insured and the insurer view fairness differently, mainly because of the technologies used in insurance pricing, including financial and insurance technologies empowered by data science and analytical tools (
Cao et al. 2021). The perspectives of insureds and insurance companies can be very different, so the interpretation of fairness can go completely the opposite way. Often, the insured aims for individual fairness, while the insurance pricing is completed mainly on group fairness and then adjusted by individual risk level. In
Frezal and Barry (
2020), some limitations and misinterpretations of actuarial fairness were discussed and studied. That work re-centered the debate around insurance fairness by going beyond the actuarial consideration. In
Xin and Huang (
2022), insurance discrimination was discussed and reviewed from law and regulation perspectives to address the fairness criteria. Various anti-discrimination pricing methods, such as Generalized Linear Models (GLMs), were proposed to eliminate discrimination based on fairness criteria for insurance pricing. It was revealed that to avoid insurance discrimination, there is a need to avoid adverse selection, which has been part of the results of the application of machine learning and artificial intelligence that build predictive models based on the loss data at the individual level, unlike the case in rate regulation, which uses the group-based modeling. To further classify the insurance risk, Usage-Based Insurance (UBI) has been used to address the concern of the amount of driving in pricing, which provides another dimension of risk classification. However, the study in
Ferreira and Minikel (
2012) shows the importance of measuring auto insurance risk using major risk factors such as class and territory. They further found that the major risk factors cannot be replaced by the annual mileage driven.
The auto insurance premium rates are not always aligned with the major risk factor relativities calculated from the industry-level insurance loss data. This is because the regulators and the insurance companies have different definitions of fairness (
Thiery and Van Schoubroeck 2006). The regulators focus more on the overall reasonableness at the industry level and whether the premium rates the insurance companies charge are justifiable based on their designed risk groups. On the other hand, insurance companies focus more on fairness within their own companies. To minimize bias, they normally compute the average pure premium based on their company-level data and average claim amounts. To evaluate fairness, regulators use the major risk factors to derive the risk relativities using industry-level data. Insurance companies use these major risk factor relativities as benchmarks for the credibility-weighting purpose. Insurance companies aim to ensure actuarial fairness among the subpopulation of their customers, while regulators aim to ensure that individual companies are not charging excessive premiums. In addition, insurance companies’ major risk factor relativities must comply with the regulatory rules and benchmarks, which is consistent with the traditional actuarial fairness notion that similar risks should be charged similarly. For instance, in Canada, when proposing rate changes in risk factor levels, an insurance company must submit an application for rate changes to the regulator for approval, along with justification from actuarial and statistical perspectives to support the appropriateness and reasonableness of the rate changes.
Recently, the focus on fairness has shifted from collective risk-based to individual risk-based. The pricing uses predictive modeling to determine the individual loss cost rather than average loss costs estimated on the aggregate loss data by different risk levels (
Barry 2020). These changes may lead to a further adjustment of insurance premiums from one group to the other. Therefore, the high-risk group may not be charged as high as the one determined by the relativities of risk factors. Furthermore, regulation efforts have been made to improve fairness by restricting the use of some risk factors, such as gender, eliminating gender discrimination (
Ryan 1986;
Schmeiser et al. 2014). However, since the fairness of the classification mechanism is part of the standard regulatory goals, charging an excessive premium for high-risk drivers may conflict with the fairness objective (
Isotupa et al. 2019). In
Isotupa et al. (
2019), two experience-rating mechanisms were used to rate a group of drivers, where findings conclude that high driving record (DR) classes cause the highest risk drivers to pay unsustainable excessive premiums. The driving record in auto insurance refers to the total number of years of no accidents, and it is often capped at a maximum level, for example, six years.
Traditional actuarial fairness is based on company-level data to ensure the insurance rate charges are fair. The fairness criterion is often based on minimizing the mean squared error between the average pure premium and the average incurred loss amount within each subpopulation, across all subpopulations; see
Dugas et al. (
2003). This depends on the definition of subpopulation and the risk factors used to determine the pure premium. In this paper, actuarial fairness is built on industry-level data that aim to address fairness from the regulation perspective. Because of this, the risk factors used to address fairness are only the major risk factors, while the pricing completed by insurance companies uses many other risk factors to further classify the insurance. From a regulatory perspective, the concept of actuarial fairness at the industry level is relatively new; not much research has focused on this. A similar study was completed in
Charpentier et al. (
2022) to examine models for flood losses and the disparity in premiums they entail. Although offering wide coverage for moderate premiums to all may not be fair to low-risk insureds, social solidarity will be significantly improved. This is similar to the purpose of auto insurance regulation, which aims at collective actuarial fairness rather than individual fairness. However, a methodology for investigating actuarial fairness at an industry level is currently lacking.
This work applies statistical models to rate and classification data from the automobile statistical plan to investigate the disparities between insurance premiums and loss costs using major risk factors used in rate regulation as predictors. The automobile statistical plan provides aggregate loss, risk exposures, and premiums at the industry level. This information allows us to examine the relationship between the loss costs and premiums, potentially indicating overall fairness when risk classification is considered. We focus on information summarized by types of use (i.e., CLASS), driving records, regions, and major insurance coverage. We aim to demonstrate the existence and degree of auto insurance fairness by examining actuarial rates and actual insurance premiums among different levels of major risk factors. We investigate fairness by measuring the discrepancy of relativities between premiums and loss costs. We first estimate the risk relativities of the major insurance factors using Generalized Linear Models (GLM) through a combination of loss costs and premiums information. This approach differs from the traditional method, where relativities are measured based on loss costs only. Within this risk relativity estimate, we consider the major risk factors as explanatory variables: coverages, regions, types of use, and driving records. We capture the model fixed effect caused by adding premium information to the response variable in the model. We also investigate the effect on the fairness results from using different statistical models, suggesting a careful use of a statistical model in addressing fairness under different modeling objectives. The novelty of this work is the modeling techniques used to simultaneously estimate the relativities of major risk factors and the difference caused by loss costs and premiums information, which can be used to derive the intrinsic industry-level loss ratio.
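To make the idea of a stacked response concrete, the following Python sketch fits a one-way model in which each level of a risk factor contributes one loss-cost row and one premium row, and an indicator for premium rows carries the fixed effect. It is a simplified stand-in, not the GLM specification used in this paper: it assumes a log-linear model fitted by exposure-weighted least squares, for which the solution has a closed form. The function name and the data are hypothetical.

```python
from math import exp, log


def stacked_one_way_fit(levels, loss_costs, premiums, exposures, base):
    """Closed-form weighted least-squares fit of

        log(y) = level effect + delta * [y is a premium]

    on data that stack one loss-cost row and one premium row per level,
    both weighted by the level's exposure.
    Returns (relativities relative to `base`, fixed effect exp(delta)).
    """
    total = sum(exposures)
    # delta is the exposure-weighted mean of log(premium / loss cost)
    delta = sum(
        w * (log(p) - log(l))
        for w, l, p in zip(exposures, loss_costs, premiums)
    ) / total
    # Each level effect averages the two log responses after removing delta
    effect = {
        lev: (log(l) + log(p) - delta) / 2.0
        for lev, l, p in zip(levels, loss_costs, premiums)
    }
    relativities = {lev: exp(a - effect[base]) for lev, a in effect.items()}
    return relativities, exp(delta)


# Hypothetical driving-record data (values invented for illustration)
rel, fixed_effect = stacked_one_way_fit(
    levels=["DR0", "DR6"],
    loss_costs=[200.0, 100.0],
    premiums=[300.0, 150.0],
    exposures=[10.0, 30.0],
    base="DR6",
)
```

Under these assumptions each estimated relativity is the geometric mean of the loss-cost relativity and the premium relativity, which shows in miniature how both sources of information enter a single estimate while the indicator absorbs the overall premium-to-loss-cost ratio.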
The rest of this paper is organized as follows. In
Section 2, the data and its basic processing are briefly introduced. The proposed methods using Generalized Linear Models to estimate the risk relativities and the fixed effect are discussed. In
Section 3, the summary of the results is presented. Finally, we conclude our findings and provide further remarks in
Section 4.
3. Results
We first address the limitation of using the distance-based method for comparing the relativities obtained from premiums and loss costs separately, using each accident year data. The results are presented in
Table 2. Overall, the relativity difference between premiums and loss costs tends to be high for Rural for both CLASS and DR factors. This may suggest that the disparity between loss costs and premiums is high for Rural areas, which may be due to the adjustment of relativity on loss costs. Because of this significant difference in relativity, fairness may be of greater concern for Rural drivers than Urban drivers. In addition, the results in
Table 2 indicate that the adjustment is linked to the accident year: the adjustment appears higher in accident year 2009 for the AB and TPL coverages, which are two major coverages. However, this distance-based approach cannot reveal the direction in which relativities are adjusted. We further illustrate this disparity between premium and loss cost relativities, separated by Urban and Rural, for the CLASS factor with the TPL and COL coverages. From the results presented in
Figure 2 and
Figure 3, we see that the CLASS levels with a small number of exposures are the most affected, and their relativities are adjusted much more than those of CLASS levels with a high number of exposures. We can also see that, for the groups with fewer exposures, relativities are adjusted much more heavily for Rural than for Urban areas. This may suggest that the fairness concern applies to the minority rather than the majority groups and is more likely to arise for Rural drivers. Note that these results come from a one-way analysis, implying separate estimates for loss costs and premiums. In this case, the estimate of relativities for either loss costs or premiums is subject to a potential double penalty, which is the primary concern for a one-way analysis.
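The distance-based comparison discussed above can be sketched as follows. The example computes one-way relativities separately for loss costs and premiums and summarizes their gap with a Euclidean distance. The choice of Euclidean distance, the function names and the data are assumptions for illustration; the distance measure actually used for Table 2 is defined in the methods section and may differ.

```python
from math import sqrt


def one_way_relativities(values, base):
    """Relativity of each level: its average value divided by the base level's."""
    return {level: v / values[base] for level, v in values.items()}


def relativity_distance(loss_costs, premiums, base):
    """Euclidean distance between premium and loss-cost relativities."""
    rel_loss = one_way_relativities(loss_costs, base)
    rel_prem = one_way_relativities(premiums, base)
    return sqrt(sum((rel_prem[k] - rel_loss[k]) ** 2 for k in rel_loss))


def signed_differences(loss_costs, premiums, base):
    """Premium relativity minus loss-cost relativity, per level."""
    rel_loss = one_way_relativities(loss_costs, base)
    rel_prem = one_way_relativities(premiums, base)
    return {k: rel_prem[k] - rel_loss[k] for k in rel_loss}


# Hypothetical average loss costs and premiums for three CLASS levels
loss_costs = {"A": 100.0, "B": 150.0, "C": 80.0}
premiums = {"A": 200.0, "B": 260.0, "C": 180.0}

distance = relativity_distance(loss_costs, premiums, base="A")
differences = signed_differences(loss_costs, premiums, base="A")
```

In this toy example level B is adjusted downward and level C upward, yet the distance collapses both into one non-negative number. This is the limitation noted in the text: the distance shows the size of the disparity but not its direction.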
We next analyze and compare the pattern of DR relativities obtained from various proposed models: one-way, two-way and three-way GLMs. Note that all three models take loss costs and premiums together as the model response, which implies the one-way model differs from the one-way analysis conducted for distance-based measures mentioned above. We find that the one-way relativity estimates for DR obtained using premium and loss cost information are much higher than those obtained from the two-way models. This may result from de-coupling the interaction between DR and CLASS, leading to less impact on premiums caused by the double risk penalty. The results, separated by Urban and Rural and by accident year, are reported in
Figure 4 and
Figure 5. We also observe that the obtained results from the three-way models are similar to those from the two-way and one-way models for the case when the average loss data are used. This implies that multiple-way analysis is more appropriate for estimating the relativity of major risk factors such as DR and CLASS; therefore, it is a more suitable method for analyzing loss data to address the fairness issue. The detailed comparison among different coverages for the relativities estimates for DR, CLASS and fixed effect is reported in
Table 3,
Table 4 and
Table 5. The fixed effect reflects the disparity between premiums and loss costs and can therefore be used to measure the fairness of premiums. The fixed effects estimated for the one-way, two-way and three-way models are summarized, respectively, in
Table 6,
Table 7 and
Table 8. We observe that the choice of error function does not cause a significant difference in the estimates, suggesting that the GLM estimate of such fixed effects is robust. In all methods, collision coverage tends to have a much higher fixed effect in all accident years. This may imply that more adjustment has been made to collision coverage, which makes sense, as this coverage is directly linked to the driver’s accident history and indirectly reflects the driving habits of an insured. In addition, the fixed effects estimated from the one-way analysis behave similarly for CLASS and DR, implying that a combined fixed effect through a multiple-way analysis is appropriate.
In the two-way analysis, the GLM simultaneously estimates the relativities associated with DR and CLASS and the fixed effect caused by the disparity between loss costs and premiums. For this within-territory analysis, the input data are separated by Territory (Rural or Urban) and by the three accident years 2009, 2010, and 2011. As a specific example, we examine the results obtained for Accident Benefit (AB) coverage in the Urban Territory. Loss relativities are calculated taking
and
as the bases. A fixed effect greater than one would imply that the overall premiums are higher than the loss cost, but this is what we expect, since a loading factor is applied to the loss cost to calculate the final premiums by insurance companies. Suppose the value of the fixed effect is less than one, which is the opposite of the normal condition. In that case, this may imply that the premiums charged have been significantly adjusted among coverages, accident years, territories, DR or CLASS or their combinations. In
Table 7, when examining the coverage by separate accident years, we see that in 2009 and 2010, the fixed effects for AB Urban are significantly less than one. Premium rates then recovered in 2011, leading to a significant adjustment of premium rates over the three years. We also observe that the fixed effects for Rural are significantly higher than those for Urban for all coverages and accident years that we consider. This points to a fairness issue for Rural drivers: when loss costs and premiums are analyzed separately for Urban and Rural, Rural drivers appear to be overcharged. However, this two-way analysis does not consider the effect of Territory, which may significantly impact the results. The within-territory analysis is useful when the objective is to examine the territory-dependent fairness level based on DR and CLASS. To analyze the degree of disparity between premiums and loss costs, we take a benchmark loss ratio of 70% (a loss ratio between 65% and 70% is typical for auto insurance), which corresponds to a fixed effect of 1/0.70 ≈ 1.43. This means that we should allow a premium loading factor of 43% at an industry level and that any fixed effect smaller than 1.43 implies a loss ratio greater than 70%. For the two-way analysis of AB Urban, the annual average fixed effects all fall below this benchmark value. This may suggest that the overall average fixed effect obtained from a two-way model underestimates the industry-level fixed effect, so the two-way estimates are considered inappropriate for this purpose. Taken at face value, a fixed effect lower than 1.43 in the two-way analysis would mean that auto insurance companies have an insufficient profit margin.
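The relationship between the benchmark loss ratio and the fixed effect used above is a simple reciprocal, which the following Python sketch makes explicit. The function names are illustrative; the 70% benchmark and the example fixed effects of 1.23 and 1.8 are values quoted in this section.

```python
def fixed_effect_from_loss_ratio(loss_ratio):
    """Premium-to-loss-cost multiplier implied by a loss ratio."""
    if loss_ratio <= 0.0:
        raise ValueError("loss ratio must be positive")
    return 1.0 / loss_ratio


def implied_loss_ratio(fixed_effect):
    """Loss ratio implied by a fixed effect (premium / loss cost)."""
    if fixed_effect <= 0.0:
        raise ValueError("fixed effect must be positive")
    return 1.0 / fixed_effect


def exceeds_benchmark_loss_ratio(fixed_effect, benchmark_loss_ratio=0.70):
    """True when the implied loss ratio is above the benchmark."""
    return implied_loss_ratio(fixed_effect) > benchmark_loss_ratio


benchmark_fixed_effect = fixed_effect_from_loss_ratio(0.70)  # about 1.43

# A fixed effect of 1.23 implies a loss ratio of about 81%, above the benchmark;
# a fixed effect of 1.8 implies a loss ratio of about 56%, below it.
low_loading = exceeds_benchmark_loss_ratio(1.23)
high_loading = exceeds_benchmark_loss_ratio(1.8)
```

Reading a fixed effect as an implied loss ratio makes the comparison with regulatory benchmarks direct: values below the benchmark fixed effect signal thin loading, and values above it signal loading beyond the benchmark.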
In the three-way analysis, the GLM estimates the relativities simultaneously, taking the additional explanatory variable of Territory along with DR and CLASS. The input data take Rural as the base. The data are still separated by the accident years 2009, 2010 and 2011. Loss relativities are calculated taking
and
as the base. The results are shown in
Table 8. When examining fixed effects for the three-way model, we see the opposite effect to that of the two-way analysis. Overall, the Urban Territory has a higher fixed effect than the Rural Territory. The exposure-weighted average fixed effect is higher than 1.43, the value that corresponds to the assumed benchmark loss ratio of 70%. This result obtained from the three-way model is more reasonable than those from either the one-way or two-way models. The main reason is that Territory is a major determinant of auto insurance risk and a crucial rating variable. We can see that Territory is a statistically significant variable in each coverage, with a
p-value of almost zero. Omitting this critical variable from the statistical model can therefore lead to significant bias in the fixed effect estimate and hence to an incorrect conclusion.
In
Table 9, we further examine the fixed effects using the three-year average loss data, which provide a more accurate estimate of the overall fixed effects. Because the data are averaged over the three-year period, the estimated average fixed effect is relatively stable across the error distributions and varies only by a small amount, which leads to a more reliable estimate of fixed effects. We calculate the fixed effect’s weighted average for all three combined years using exposures as weight values. In
Table 7, for the years 2009 and 2010, we see that the fixed effect is 1.23, meaning that the premium charged is 23% higher than the loss cost. This jumps significantly to 40% in 2011 (fixed effect of 1.4). Further examination of the other coverages indicates that the highest fixed effect corresponds to the COL coverage: for the Rural Territory, it is slightly over 1.8 for 2009 and 2010 and 1.66 for 2011. However, when the 3-year average fixed effect is considered across all three coverages to see the overall difference between premium and loss cost, the other two coverages mitigate the high fixed effect from Collision. Taking the weighted average with exposure as the weight gives a more meaningful fixed effect, as it would be misleading to compare fixed effects between coverages individually given their varying exposure amounts.
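The exposure-weighted average mentioned above can be written as a short Python sketch. The coverage labels follow the paper, but the fixed effects and exposure counts below are invented for illustration and are not taken from the tables.

```python
def weighted_average_fixed_effect(fixed_effects, exposures):
    """Exposure-weighted average of fixed effects keyed by coverage."""
    total_exposure = sum(exposures[c] for c in fixed_effects)
    if total_exposure <= 0.0:
        raise ValueError("total exposure must be positive")
    weighted_sum = sum(fixed_effects[c] * exposures[c] for c in fixed_effects)
    return weighted_sum / total_exposure


# Hypothetical fixed effects and exposures by coverage
fixed_effects = {"AB": 1.2, "TPL": 1.4, "COL": 1.8}
exposures = {"AB": 100.0, "TPL": 100.0, "COL": 50.0}

overall = weighted_average_fixed_effect(fixed_effects, exposures)
```

In this toy case the simple average of the three fixed effects would be about 1.47, while the weighted average is 1.4, because the coverage with the highest fixed effect carries the least exposure.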
4. Conclusions
The study of actuarial fairness in auto insurance is an important issue in the decision making of rate regulation. However, given that insurance companies have some flexibility to adjust the risk relativities in order to better control their operations and management, the relativities of risk factors may not coincide with those from the regulator. The rate-making methodologies of regulators and insurance companies may differ because their rate-making goals tend to differ. This suggests that analysis and modeling results from companies may be justifiable and statistically sound even though their estimates differ significantly. In addition, determining the relationship between the premium charged and the loss cost, and under what conditions the overall premium charged can be considered fair and exact, is a complex problem. This work has focused on measuring the disparity between loss costs and premiums using various statistical models, including one-way, two-way and three-way GLMs. The shortcoming of the one-way analysis is that it assumes the variables we analyze are independent of each other, so modeling a single variable does not account for the effect of the other variables. Although the two-way analysis of DR and CLASS may achieve a de-coupling of the potential interaction, the missing Territory variable completely changed the direction of the fixed effect. In the two-way approach, the fixed effect for Rural areas is higher than that for Urban areas, but the opposite result was obtained in the three-way approach. This may be due to the disproportionate exposure amounts, as Urban has much higher exposure. However, a multiple-way analysis has its own limitation: it is not useful if there are not enough risk exposures for the classification of risk.
For example, adding more rating variables reduces the number of exposures in each risk cell, so the number of risks may be too low to estimate the costs accurately, which leads to a lack of credibility. Balancing this loss of credibility against the bias caused by omitting Territory, we conclude that the three-way analysis is the most appropriate model.
The results suggest a significant disparity between loss costs and premiums charged when focusing on particular groups of drivers. AB and TPL coverages for Rural drivers seem to have a small fixed effect, which implies that the loading applied to them is much smaller than other groups. However, the loading applied to Urban drivers in these two coverages seems excessive. For COL coverage, both Rural and Urban drivers have a high fixed effect, which means the loading is high, particularly for Urban drivers. From this study, we observe that Urban drivers tend to have higher loading than Rural drivers, but overall, the study using this Canadian data set does not reveal a significant excessive loading. The innovation of this work is the modeling techniques used to simultaneously estimate the relativities of major risk factors and the difference caused by loss cost and premium information. The impact of this work is to provide a statistical framework that insurance regulators can use to measure the fairness of insurance premiums across different groups of insureds. Although beyond the scope of this report, the estimated fixed effect can be useful for policymakers to decide how much to lower auto insurance premiums by identifying how much room there is between the premiums and loss costs.