# Analyzing Large Workers’ Compensation Claims Using Generalized Linear Models and Monte Carlo Simulation

## Abstract

R^{2} = 0.79). Injury characteristics and worker’s occupation were predictive of large claims’ occurrence and costs. The conclusions of this study are useful in modifying and estimating insurance pricing within high-risk agribusiness industries. The approach of this study can be used as a framework to forecast workers’ compensation claims amounts with rare, high-cost events in other industries. This work is useful for insurance practitioners concerned with statistical and predictive modeling in financial risk analysis.

## 1. Introduction

#### 1.1. Data

#### 1.2. Methods

#### 1.2.1. Generalized Linear Regression Modeling

#### 1.2.2. Penalization Methods and Variable Selection

#### 1.2.3. Quantitative Measure of Performance for Model Selection

The model selection criteria include R^{2} and the root mean square error (RMSE). Values of R^{2} range from 0 to 1, where 1 is a perfect fit and 0 means there is no gain from using the model over using fixed background response rates. R^{2} estimates the proportion of the variation in the response around the mean that can be attributed to terms in the model rather than to random error. The RMSE is the square root of the mean squared difference between the observed and predicted responses, that is, the standard deviation of the prediction errors.

The model with the highest R^{2} and the lowest RMSE is preferred. The statistical details of all the model selection criteria are shown in Table 4 (where k is the number of estimated parameters in the model and n is the number of observations in the data set). The model comparison criteria in this study are adopted from [28], and the analyses were done using JMP Pro statistical software (JMP®, Version 13.2, SAS Institute Inc., Cary, NC, USA, 1989–2007).

#### 1.2.4. Stochastic Monte Carlo Modeling for Severity Simulation and Risk Analysis

#### 1.2.5. Development of the MC Simulation Model

## 2. Results

#### 2.1. Summary of Predictive Modeling Analysis

Based on the model comparison criteria (R^{2}, RMSE, BIC, and AIC), both the gamma and lognormal regression models show a good fit to the data set. The gamma regression model does a better job of explaining the variability in the data, with a higher R^{2}. The lognormal model shows lower values for RMSE, BIC, and AIC.
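The comparison above can be reproduced mechanically. The following minimal sketch copies the R^{2}, RMSE, BIC, and AIC values from the model comparison table in this article and picks the preferred model under each criterion. The dictionary layout and the names `criteria`, `higher_is_better`, and `best` are illustrative assumptions, not part of the original analysis.

```python
# Criterion values copied from the model comparison table in this article.
criteria = {
    "R2":   {"Gamma": 0.79,    "Weibull": 0.46,    "Lognormal": 0.53},
    "RMSE": {"Gamma": 163_002, "Weibull": 245_624, "Lognormal": 145_974},
    "BIC":  {"Gamma": 27_386,  "Weibull": 27_410,  "Lognormal": 26_809},
    "AIC":  {"Gamma": 27_145,  "Weibull": 27_145,  "Lognormal": 27_079},
}

# R2 is better when larger; RMSE, BIC, and AIC are better when smaller.
higher_is_better = {"R2"}

best = {}
for name, values in criteria.items():
    pick = max if name in higher_is_better else min
    best[name] = pick(values, key=values.get)

for name, model in best.items():
    print(f"{name}: preferred model is {model}")
```

No single distribution wins on every criterion, which is why the text discusses the gamma and lognormal models side by side.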

#### 2.2. Summary of the Developed MC Model Analysis

## 3. Discussion

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Baldwin, M.L.; McLaren, C.F. Workers’ Compensation: Benefits, Coverage, and Costs (2014 Data); National Academy of Social Insurance: Washington, DC, USA, 2016.
2. Achieng, O.M. Actuarial modeling for insurance claim severity in motor comprehensive policy using industrial statistical distributions. In Proceedings of the 2010 International Congress of Actuaries, Cape Town, South Africa, 7–12 March 2010.
3. Shi, P.; Frees, E.W. Long-tail longitudinal modeling of insurance company expenses. Insur. Math. Econ. **2010**, 47, 303–314.
4. Szymendera, S.D. Workers’ Compensation: Overview and Issues; (CRS Report R44580); Congressional Research Service: Washington, DC, USA, 2016.
5. Guelman, L. Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Syst. Appl. **2012**, 39, 3659–3667.
6. Engsner, H.; Lindholm, M.; Lindskog, F. Insurance valuation: A computable multi-period cost-of-capital approach. Insur. Math. Econ. **2017**, 72, 250–264.
7. Schwatka, N.V.; Atherly, A.; Dally, M.J.; Fang, H.; Brockbank, C.V.; Tenney, L.; Newman, L.S. Health risk factors as predictors of workers’ compensation claim occurrence and cost. Occup. Environ. Med. **2017**, 74, 14–23.
8. McCullagh, P.; Nelder, J. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
9. Anderson, D.; Feldblum, S.; Modlin, C.; Schirmacher, D.; Schirmacher, E.; Thandi, N. A Practitioner’s Guide to Generalized Linear Models; Syllabus Year; Casualty Actuarial Society (CAS): Arlington County, VA, USA, 2010.
10. Haberman, S.; Renshaw, A. Generalized linear models and actuarial science. J. R. Stat. Soc. **1996**, 45, 407–436.
11. Xia, M. Bayesian Adjustment for Insurance Misrepresentation in Heavy-Tailed Loss Regression. Risks **2018**, 6, 83.
12. Boland, P.J. Statistical Methods in General Insurance. 2006. Available online: https://iase-web.org/documents/papers/icots7/5G1_BOLA.pdf (accessed on 25 June 2018).
13. Packová, V. Loss Distributions in Insurance Risk Management. In Recent Advances on Economics and Business Administration, Proceedings of the International Conference on Economics and Business Administration (EBA 2015), Barcelona, Spain, 7–9 April 2015; INASE: Barcelona, Spain, 2015; pp. 17–22.
14. Frees, E.W. Predictive Modeling Applications in Actuarial Science; Cambridge University Press: Cambridge, UK, 2014; Volume 1.
15. Nath, D.C.; Das, J. Modeling of Insurance Data through Two Heavy Tailed Distributions: Computation of Some of Their Actuarial Quantities through Simulation from Their Equilibrium Distributions and the Use of Their Convolutions. J. Math. Finance **2016**, 6, 378–400.
16. Keatinge, C.L. Modeling Losses with the Mixed Exponential Distribution. Proc. Casualty Actuar. Soc. **1999**, LXXXVI, 654–698.
17. Ravi, A.; Butar, F.B. An insight into heavy-tailed distribution. J. Math. Sci. Math. Educ. **2010**, 5, 15.
18. Tang, Q. Heavy Tails of Discounted Aggregate Claims in the Continuous-Time Renewal Model. J. Appl. Probab. **2007**, 44, 285–294.
19. Frees, E.W.; Shi, P.; Valdez, E.A. Actuarial applications of a hierarchical insurance claims model. ASTIN Bull. J. IAA **2009**, 39, 165–197.
20. Meyers, G. On Predictive Modeling for Claim Severity; Casualty Actuarial Society (CAS): Arlington County, VA, USA, 2017.
21. Crotty, M.; Barker, C. Penalizing Your Models: An Overview of the Generalized Regression Platform; SAS Institute: Cary, NC, USA, 2014.
22. Cerchiara, R.R.; Edwards, M.; Gambini, A. Generalized Linear Models in Life Insurance: Decrements and Risk Factor Analysis Under Solvency II. In Proceedings of the 18th International AFIR Colloquium, Rome, Italy, 1–3 October 2008; Available online: http://www.actuaries.org/AFIR/Colloquia/Rome2/Cerchiara_Edwards_Gambini.pdf (accessed on 20 May 2018).
23. James, G.W. Linear Model Selection and Regularization. In An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013; pp. 203–264.
24. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B **1996**, 58, 267–288.
25. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. **2006**, 101, 1418–1429.
26. Burnham, K.P.; Anderson, D. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach; Springer: Berlin, Germany, 2003; p. 1229.
27. Burnham, K.P.; Anderson, D. Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Methods Res. **2004**, 33, 261–304.
28. JMP® 11 Fitting Linear Models; SAS Institute: Cary, NC, USA, 2013.
29. Fish, L.J.; Halcoussis, D.; Phillips, G.M. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation. Am. J. Bus. Educ. **2017**, 10, 81–96.
30. Armaghani, D.J.; Mahdiyar, A.; Hasanipanah, M.; Faradonbeh, R.S.; Khandelwal, M.; Amnieh, H.B. Risk Assessment and Prediction of Flyrock Distance by Combined Multiple Regression Analysis and Monte Carlo Simulation of Quarry Blasting. Rock Mech. Rock Eng. **2016**, 49, 3631–3641.
31. Panel, U.E.T. Guiding Principles for Monte Carlo Analysis; US EPA: Washington, DC, USA, 1997.
32. Mooney, C.Z. Monte Carlo Simulation; Sage Publications: New York, NY, USA, 1997.
33. Dunn, W.L.; Shultis, J.K. Monte Carlo Methods for Design and Analysis of Radiation Detectors. Radiat. Phys. Chem. **2009**, 78, 852–858.
34. Koehler, E.; Brown, E.; Haneuse, S.J.P.A. On the Assessment of Monte Carlo Error in Simulation-Based Statistical Analyses. Am. Stat. **2009**, 63, 155–162.
35. Mingoti, S.A.; Matos, R.A. Clustering Algorithms for Categorical Data: A Monte Carlo Study. Int. J. Stat. Appl. **2012**, 2, 24–32.
36. Mucha, V.; Pales, M.; Sakalova, K. Calculation of the Capital Requirement Using the Monte Carlo Simulation for Non-life Insurance. Èkon. Cas. **2016**, 64, 878–893.
37. Asmussen, S. Conditional Monte Carlo for Sums, with Applications to Insurance and Finance; Thiele Research Reports; Department of Mathematics, Aarhus University: Aarhus, Denmark, 2017.
38. Peters, G.W.; Targino, R.S.; Wuthrich, M.V. Bayesian Modelling, Monte Carlo Sampling and Capital Allocation of Insurance Risks. Risks **2017**, 5, 53.
39. Hahn, L. Multi-year non-life insurance risk of dependent lines of business in the multivariate additive loss reserving model. Insur. Math. Econ. **2017**, 75, 71–81.
40. Das, K.P.; Halder, S.C. Understanding extreme stock trading volume by generalized Pareto distribution. N. C. J. Math. Stat. **2016**, 2, 45–60.
41. Kaassis, B.; Badri, A. Development of a Preliminary Model for Evaluating Occupational Health and Safety Risk Management Maturity in Small and Medium-Sized Enterprises. Safety **2018**, 4, 5.
42. Comberti, L.; Demichela, M.; Baldissone, G.; Fois, G.; Luzzi, R. Large Occupational Accidents Data Analysis with a Coupled Unsupervised Algorithm: The S.O.M. K-Means Method an Application to the Wood Industry. Safety **2018**, 4, 51.

Year | Mean | Std Dev | Min | Max | Median | Sample Size | Skewness |
---|---|---|---|---|---|---|---|
2008 | $273,965 | $215,299 | $102,673 | $1,105,357 | $171,901 | 80 | 1.83 |
2009 | $342,128 | $940,824 | $103,273 | $8,151,576 | $174,868 | 74 | 8.07 |
2010 | $279,556 | $319,357 | $100,714 | $2,615,677 | $187,036 | 90 | 4.96 |
2011 | $255,055 | $180,380 | $100,354 | $831,617 | $191,890 | 76 | 1.79 |
2012 | $278,590 | $352,159 | $100,542 | $3,206,900 | $209,496 | 95 | 6.51 |
2013 | $304,881 | $694,877 | $100,243 | $7,591,850 | $170,690 | 155 | 8.84 |
2014 | $267,087 | $390,138 | $100,961 | $3,748,887 | $173,204 | 187 | 6.27 |
2015 | $222,002 | $226,601 | $100,162 | $2,145,148 | $152,556 | 223 | 5.19 |
2016 | $235,226 | $265,838 | $101,317 | $1,452,000 | $146,391 | 51 | 3.33 |
All | $268,622 | $451,790 | $100,162 | $8,151,576 | $168,988 | 1031 | 11.36 |

Variable | Description |
---|---|
Agricultural-related Industry | 16 levels; grain, agronomy, refined fuel, feed milling, etc. |
Gender | Male, female, unidentified |
Occupation | 104 levels; grain elevator operators, poultry producers, etc. |
Injury | 7 levels; death, permanent disability, medical only, etc. |
Body group | 6 levels; lower extremities, trunk, upper extremities, etc. |
Cause group | 9 levels; burn or heat-scald, etc. |
Nature group | 3 levels; multiple injuries, occupational diseases, etc. |
Body part | 49 levels; abdomen, ankle, hip, eye(s), internal organs, etc. |
Cause | 59 levels; chemicals, dust, lifting, machinery, pushing, etc. |
Nature | 29 levels; dislocation, amputation, laceration, etc. |
Age | min: 17.8 years old; max: 81.7 years old |
Tenure | min: 0 years; max: 48 years |

Method | Selection | Shrinkage |
---|---|---|
Maximum Likelihood | no | no |
Ridge | no | yes |
Forward Selection | yes | no |
Lasso | yes | yes |
Elastic Net | yes | yes |
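The selection and shrinkage columns above can be illustrated in the textbook special case of an orthonormal design, where each penalized coefficient has a closed form in terms of the unpenalized estimate. The sketch below is only an illustration under that assumption. The penalty scaling (squared-error loss with factor 1/2, tuning parameter `lam`, mixing weight `alpha`), the function names, and the coefficient values are all hypothetical and are not taken from the study, which fitted penalized GLMs in JMP Pro.

```python
def soft_threshold(b, lam):
    """Lasso estimate for one coefficient under an orthonormal design.

    Coefficients with |b| <= lam are set exactly to zero (selection);
    the others are pulled toward zero by lam (shrinkage).
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Ridge estimate: proportional shrinkage, never exactly zero for b != 0."""
    return b / (1.0 + lam)


def elastic_net(b, lam, alpha):
    """Naive elastic net: alpha = 1 gives the lasso, alpha = 0 gives ridge."""
    return soft_threshold(b, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Hypothetical unpenalized (maximum likelihood) coefficients.
ols = [3.0, -0.4, 0.9, 0.0]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
enet = [elastic_net(b, lam, alpha=0.5) for b in ols]

print("lasso:      ", lasso)
print("ridge:      ", ridge)
print("elastic net:", enet)
```

In this toy case the lasso removes the small coefficient −0.4 entirely, ridge keeps every nonzero coefficient but scales it down, and the elastic net does a mixture of both.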

Criterion | Formula |
---|---|
AIC * | −2 log likelihood + 2k |
BIC * | −2 log likelihood + k ln(n) |
RMSE * | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}}$ |
R^{2} | $1-\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}$ |
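The four formulas above translate directly into code. The following is a minimal sketch, assuming the fitted log-likelihood, the parameter count `k`, the sample size `n`, and the observed and predicted responses are already available. The function names and the small example data are illustrative assumptions and do not come from the study.

```python
import math


def aic(log_likelihood, k):
    """AIC = -2 log likelihood + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """BIC = -2 log likelihood + k ln(n)."""
    return -2.0 * log_likelihood + k * math.log(n)


def rmse(y, y_hat):
    """Square root of the mean squared prediction error."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n)


def r_squared(y, y_hat):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted responses, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print("RMSE:", rmse(y_obs, y_pred))
print("R2:  ", r_squared(y_obs, y_pred))
print("AIC: ", aic(log_likelihood=-10.0, k=3))
print("BIC: ", bic(log_likelihood=-10.0, k=3, n=100))
```

Because BIC multiplies `k` by ln(n) rather than 2, it penalizes extra parameters more heavily than AIC whenever n exceeds about 7 observations.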

**Table 5.** Effect test results for the generalized linear model (GLM) with gamma distribution using the lasso penalization method.

Predictor | DF | Wald χ^{2} | Prob > χ^{2} * |
---|---|---|---|
Injury | 6 | 1315.03 | <0.0001 |
Cause | 50 | 629.23 | <0.0001 |
Occupation | 102 | 383.51 | <0.0001 |
Body Part | 42 | 165.15 | <0.0001 |
Nature | 22 | 18.92 | 0.0003 |
Cause Group | - | - | - |
Agricultural-related Industry | - | - | - |

χ^{2}: chi-square value. DF: degrees of freedom for each variable.

**Table 6.** Effect test results for the GLM with Weibull distribution using the lasso penalization method.

Predictor | DF | Wald χ^{2} | Prob > χ^{2} * |
---|---|---|---|
Injury | 6 | 121.12 | <0.0001 |
Cause | 50 | 60.55 | <0.0001 |
Occupation | 100 | 71.51 | <0.0001 |
Body Part | 42 | 61.72 | <0.0001 |
Nature | 22 | 16.51 | 0.0009 |
Cause Group | 2 | 13.97 | 0.0029 |
Agricultural-related Industry | 17 | 7.12 | 0.0284 |

χ^{2}: chi-square value.

**Table 7.** Effect test results for the GLM with lognormal distribution using the lasso penalization method.

Predictor | DF | Wald χ^{2} | Prob > χ^{2} * |
---|---|---|---|
Injury | 6 | 55.61 | <0.0001 |
Cause | 50 | 174.28 | <0.0001 |
Occupation | 100 | 67.66 | <0.0001 |
Body Part | 42 | 61.21 | <0.0001 |
Nature | 22 | 11.17 | 0.0108 |
Cause Group | - | - | - |
Agricultural-related Industry | - | - | - |

χ^{2}: chi-square value.

Criteria | Gamma | Weibull | Lognormal |
---|---|---|---|
R^{2} | 0.79 | 0.46 | 0.53 |
RMSE | 163,002 | 245,624 | 145,974 |
BIC | 27,386 | 27,410 | 26,809 |
AIC | 27,145 | 27,145 | 27,079 |
−LL | 13,519 | 13,514 | 13,345 |

Descriptive Statistics | Empirical Data | Gamma | Weibull | Lognormal |
---|---|---|---|---|
Mean | 268,622 | 257,505 | 257,947 | 249,064 |
Standard Deviation | 451,790 | 364,631 | 264,264 | 256,901 |
Standard Error Mean | 14,070 | 5157 | 3737 | 3633 |
Upper 95% Mean | 296,232 | 267,615 | 265,273 | 256,187 |
Lower 95% Mean | 241,012 | 247,396 | 250,620 | 241,942 |
N (Sample Size) | 1031 | 5000 | 5000 | 5000 |
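The summary statistics in the table above (mean, standard deviation, standard error of the mean, and a 95% interval for the mean from 5000 simulated values) can be produced by a simple Monte Carlo loop. The sketch below is not the authors' simulation model. It assumes, purely for illustration, a gamma severity distribution whose shape and scale are matched by the method of moments to the mean and standard deviation reported for the gamma column, and it uses a normal-approximation interval (mean ± 1.96 standard errors). All function names and the seed are hypothetical.

```python
import math
import random
import statistics


def gamma_params_from_moments(mean, sd):
    """Method-of-moments gamma parameters: mean = shape * scale, var = shape * scale**2."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def simulate_severity(mean, sd, n, seed):
    """Draw n claim amounts from the moment-matched gamma distribution."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    shape, scale = gamma_params_from_moments(mean, sd)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def summarize(sample, z=1.96):
    """Mean, sample SD, standard error, and normal-approximation 95% interval."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    se = sd / math.sqrt(n)
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "lower95": mean - z * se,
        "upper95": mean + z * se,
        "n": n,
    }


# Target moments taken from the gamma column of the table above.
draws = simulate_severity(mean=257_505, sd=364_631, n=5000, seed=42)
summary = summarize(draws)
for key, value in summary.items():
    print(f"{key}: {value:,.0f}")
```

Increasing `n` narrows the interval around the simulated mean, since the standard error falls with the square root of the number of draws. This sketch ignores the fact that the empirical claims all exceed $100,000, so it should not be read as a replacement for a fitted severity model.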

Descriptive Statistics | Gamma | Weibull | Lognormal |
---|---|---|---|
Mean | −4.14% | −3.97% | −7.28% |
Standard Deviation | −19.29% | −41.51% | −43.14% |
Standard Error Mean | −63.35% | −73.44% | −74.18% |
Upper 95% Mean | −9.66% | −10.45% | −13.52% |
Lower 95% Mean | 2.65% | 3.99% | 0.39% |
N (Sample Size) | 5000 | 5000 | 5000 |
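The percentages in the table above are consistent with a relative difference of each simulated statistic from its empirical counterpart, (simulated − empirical) / empirical. The short sketch below recomputes the mean and standard deviation rows from the descriptive statistics reported in this article. The helper name `pct_diff` and the dictionary layout are illustrative assumptions.

```python
# Empirical and simulated statistics copied from the descriptive statistics table.
empirical = {"mean": 268_622, "sd": 451_790}
simulated = {
    "Gamma":     {"mean": 257_505, "sd": 364_631},
    "Weibull":   {"mean": 257_947, "sd": 264_264},
    "Lognormal": {"mean": 249_064, "sd": 256_901},
}


def pct_diff(sim, emp):
    """Relative difference of a simulated statistic from the empirical one, in percent."""
    return 100.0 * (sim - emp) / emp


table = {
    model: {stat: round(pct_diff(value, empirical[stat]), 2) for stat, value in stats.items()}
    for model, stats in simulated.items()
}

for model, row in table.items():
    print(f"{model}: mean {row['mean']}%, sd {row['sd']}%")
```

A negative value means the simulated statistic is below the empirical one, so all three distributions understate both the mean and, more strongly, the spread of the observed large claims.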

**Table 11.** Comparison of root mean square error (RMSE) between empirical data GLMs and simulation data GLMs.

Descriptive Statistics | Gamma | Weibull | Lognormal |
---|---|---|---|
Mean | 11,117 | 10,676 | 19,558 |
Standard Deviation | 87,159 | 187,526 | 194,889 |
Standard Error Mean | 8914 | 10,333 | 10,437 |
Upper 95% Mean | 28,618 | 30,959 | 40,045 |
Lower 95% Mean | 6384 | 9608 | 929 |
N (Sample Size) | 5000 | 5000 | 5000 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Davoudi Kakhki, F.; Freeman, S.A.; Mosher, G.A. Analyzing Large Workers’ Compensation Claims Using Generalized Linear Models and Monte Carlo Simulation. *Safety* **2018**, *4*, 57.
https://doi.org/10.3390/safety4040057
