# Predictive Modeling for Occupational Safety Outcomes and Days Away from Work Analysis in Mining Operations


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data

#### 2.2. Logistic Regression

#### 2.3. Decision Tree

#### 2.4. Random Forest

#### 2.5. Artificial Neural Network

#### 2.6. Measures

#### 2.6.1. Explanatory Variables

#### 2.6.2. Outcome Variables

#### 2.7. Data Pre-Processing

#### 2.7.1. Categorical Encoding Using Target Statistics

#### 2.7.2. Word Embedding

#### 2.8. Representation of Narratives

#### 2.9. Data Augmentation

#### 2.10. Performance Metrics

#### 2.10.1. Accuracy

#### 2.10.2. F1 Score

#### 2.11. Mean Squared Error (MSE)

#### 2.12. Root Mean Square Error (RMSE)

#### 2.13. Predicting Outcome of the Injury

#### 2.13.1. Fixed Field Entries

#### 2.13.2. Narratives

#### 2.14. Predicting Days Away from Work

#### 2.14.1. Fixed Field Entries

#### 2.14.2. Narratives

## 3. Results

#### 3.1. Injury Outcome

#### 3.2. Days Away from Work

#### 3.3. Feature Importance

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
DAFW | Days away from work |
ANN | Artificial neural network |
MSHA | Mine Safety and Health Administration |
GAN | Generative adversarial network |
NLP | Natural language processing |
TF-IDF | Term frequency and inverse document frequency |
MSE | Mean squared error |
RMSE | Root mean squared error |
CBOW | Continuous bag of words |


Target Class | Count before Augmentation | Count after Augmentation |
---|---|---|
Class1: All Other Cases (Including 1st Aid) | 676 | 7564 |
Class2: Days Away From Work Only | 31,653 | 31,653 |
Class3: Days Restricted Activity Only | 16,633 | 16,633 |
Class4: Days Away From Work & Restricted Activity | 10,025 | 10,025 |
Class5: Fatality | 336 | 3842 |
Class6: Injuries due to Natural Causes | 444 | 2785 |
Class7: No Days Away From Work, No Restricted Activity | 27,627 | 27,627 |
Class8: Occupational Illness not DEG 1–6 | 1346 | 9676 |
Class9: Permanent Total or Permanent Partial Disability | 895 | 12,796 |
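
The effect of the augmentation on class balance can be summarized with a simple imbalance ratio (largest class size divided by smallest class size). The sketch below uses only the counts from the table above; the `imbalance_ratio` helper and the short class keys are illustrative names, not taken from the source, and the ratio is only one of several possible imbalance measures.

```python
# Class counts copied from the table above: (before, after) augmentation.
counts = {
    "Class1": (676, 7564),
    "Class2": (31653, 31653),
    "Class3": (16633, 16633),
    "Class4": (10025, 10025),
    "Class5": (336, 3842),
    "Class6": (444, 2785),
    "Class7": (27627, 27627),
    "Class8": (1346, 9676),
    "Class9": (895, 12796),
}


def imbalance_ratio(sizes):
    """Largest class size divided by smallest class size (hypothetical helper)."""
    return max(sizes) / min(sizes)


before = [b for b, _ in counts.values()]
after = [a for _, a in counts.values()]

ratio_before = imbalance_ratio(before)
ratio_after = imbalance_ratio(after)

# Classes whose counts changed, i.e. the ones that received synthetic samples.
augmented = [name for name, (b, a) in counts.items() if a > b]

print(f"total records: {sum(before)} -> {sum(after)}")
print(f"imbalance ratio: {ratio_before:.1f} -> {ratio_after:.1f}")
print("augmented classes:", ", ".join(augmented))
```

By this measure, only the five smallest classes were augmented, and the ratio between the largest and smallest class drops from roughly 94:1 to roughly 11:1.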

Model | F1 Score | Accuracy |
---|---|---|
Logistic regression | 0.64 | 67% |
Decision Tree | 0.58 | 58% |
Random Forest | 0.66 | 66% |
Artificial Neural Network | 0.67 | 78% |

Model | F1 Score | Accuracy |
---|---|---|
Random Forest | 0.93 | 93% |
Artificial Neural Network | 0.60 | 92% |
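
A large gap between accuracy and F1 score, as in the artificial neural network row above, is a pattern that can arise when a model does well on frequent classes and poorly on rare ones. The sketch below illustrates this with toy labels. It assumes a macro-averaged F1 score, which the source does not state, and the labels are hypothetical, not the study's data, so it shows one possible mechanism rather than an explanation of the reported numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced data: 90 samples of class A, 5 of B, 5 of C.
y_true = ["A"] * 90 + ["B"] * 5 + ["C"] * 5
# A classifier that almost always predicts the majority class.
y_pred = ["A"] * 90 + ["A"] * 4 + ["B"] * 1 + ["A"] * 5

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {f1:.2f}")
```

In this toy case accuracy is 0.91 while macro F1 is about 0.43, because the two minority classes contribute low or zero per-class F1 scores.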

Model | Input | MSE | RMSE |
---|---|---|---|
Random forest | Fixed Field Entries | 14.65 | 3.82 |
Random forest | Injury Narratives | 1502.61 | 38.76 |
Artificial neural network | Fixed Field Entries | 0.38 | 0.62 |
Artificial neural network | Injury Narratives | 5944.74 | 77.10 |
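
Because RMSE is the square root of MSE, the two columns above can be checked against each other. The sketch below does that with the reported values, reading the rows in order so that each model has a fixed-field row followed by a narrative row. The `rmse_from_mse` helper is an illustrative name, and small differences are expected because the reported MSE values are rounded.

```python
import math

# (model, input) -> (MSE, RMSE) as reported in the table above.
reported = {
    ("Random forest", "Fixed Field Entries"): (14.65, 3.82),
    ("Random forest", "Injury Narratives"): (1502.61, 38.76),
    ("Artificial neural network", "Fixed Field Entries"): (0.38, 0.62),
    ("Artificial neural network", "Injury Narratives"): (5944.74, 77.10),
}


def rmse_from_mse(mse):
    """RMSE is defined as the square root of MSE."""
    return math.sqrt(mse)


# Largest absolute difference between recomputed and reported RMSE.
max_gap = max(abs(rmse_from_mse(mse) - rmse) for mse, rmse in reported.values())

for (model, source), (mse, rmse) in reported.items():
    print(f"{model:26s} {source:20s} sqrt(MSE) = {rmse_from_mse(mse):7.3f}  reported = {rmse}")
```

All four recomputed values agree with the reported RMSE to within 0.01. Since RMSE is in the same unit as the target, the narrative-based models are off by tens of days on average, whereas the fixed-field models are off by a few days or less.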

Feature | Description |
---|---|
Nature of Injury | Identifies the injury in terms of its principal physical characteristics. |
Injured body part | Identifies the body part affected by the injury. |
Occupation | Regular job title of the accident victim. |
Coal or Metal | Identifies whether the accident occurred at a Coal or a Metal/Non-Metal mine. |
Job Experience | Experience of the affected person in the current job title, in decimal years. |
Hours | Time elapsed between the start of the shift and the accident, in hours. |
Injury Source | Identifies the object, substance, exposure, or bodily motion that directly produced or inflicted the injury. |
Classification | Identifies the circumstances that contributed most directly to the resulting accident. |
Activity | Specific activity the accident victim was performing at the time of the incident. |
Accident type | Identifies the event that directly resulted in the injury/accident. |
Sub-unit | The sub-unit of the mining site where the accident occurred. |
Mine experience | Experience of the affected person at the specific mine, in decimal years. |
Total experience | Total mining experience of the affected person, in decimal years. |
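
The Hours feature in the table above is a derived quantity: the difference between the accident time and the shift begin time. A minimal sketch of such a derivation follows. The `hours_into_shift` function name, the `HH:MM` input format, and the handling of shifts that cross midnight are all assumptions made for illustration; the source does not describe how the raw times are stored or how overnight shifts were treated.

```python
from datetime import datetime, timedelta


def hours_into_shift(shift_begin, accident_time):
    """Hours elapsed between shift start and accident, given 'HH:MM' strings.

    Assumption: an accident time earlier than the shift start means the
    shift crossed midnight, so one day is added to the accident time.
    """
    fmt = "%H:%M"
    start = datetime.strptime(shift_begin, fmt)
    accident = datetime.strptime(accident_time, fmt)
    if accident < start:
        accident += timedelta(days=1)
    return (accident - start).total_seconds() / 3600.0


# Day shift: accident 3.5 hours after a 07:00 start.
day_shift = hours_into_shift("07:00", "10:30")
# Night shift: 22:00 start, accident at 01:15 the next day.
night_shift = hours_into_shift("22:00", "01:15")
print(day_shift, night_shift)
```

Expressing the result as a decimal number of hours matches the way the experience features are described (decimal years), which keeps all time-based inputs numeric.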

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yedla, A.; Kakhki, F.D.; Jannesari, A. Predictive Modeling for Occupational Safety Outcomes and Days Away from Work Analysis in Mining Operations. *Int. J. Environ. Res. Public Health* **2020**, *17*, 7054.
https://doi.org/10.3390/ijerph17197054
