An Objective-Based Entropy Approach for Interpretable Decision Tree Models in Support of Human Resource Management: The Case of Absenteeism at Work
Abstract
1. Introduction
2. Materials and Methods
2.1. The Dataset and Data Preparation
2.2. Objective-Based Entropy
2.3. Objective-Based Information Gain (OBIG) for Selecting the Features with the Greatest Explanatory Value in a Decision Tree Model
2.4. Interpretable Classification Models in the Context of Absenteeism
3. Results
3.1. A Comparison Between Interpretable Ordinal and Non-Ordinal Classifiers
3.2. The Practical Value of the Interpretable Ordinal CART—Examples of Identified Patterns
4. Conclusions and Discussion
- (1) Methodology. We introduce a new information measure, the objective-based entropy, which extends the weighted entropy proposed by Singer et al. [16] and accounts for the ordinal nature of the target (in this case, absenteeism). In contrast to standard entropy measures, the objective-based entropy can differentiate between two situations in which the set of absenteeism classes (“non-absent”, “hours”, “days”, “weeks”) has respective probability distributions of (0.6, 0.3, 0, 0.1) and (0.6, 0.1, 0, 0.3), for example. We demonstrate the use of the new measure and, in particular, highlight its suitability when the objective is to identify a specific class level (in the present case, the employees who may be particularly susceptible to absenteeism). Thus, the objective-based entropy measure makes it possible to focus on a specific class, unlike previous approaches that tend to focus on model-level indices (e.g., accuracy).
- (2) Modeling. This research highlights the value of interpretable models as decision support tools in applications such as human resource management. Human users (in our case, human resource managers) prefer interpretable models that support their reasoning [17,18]. In the current study, understanding the logic of the models may enable human resource managers to take action and devise data-driven policies for decreasing and preventing absenteeism. We provide numerical examples to demonstrate the ability of interpretable models to uncover subgroups of individuals with common characteristics who fall into the same class of the target variable. This approach produces insights that conventional methods, such as hypothesis testing and regression models, do not uncover, because those methods typically focus on high-level correlations between individual features and the target variable (e.g., “absenteeism increases with workload”). On this basis, we contend that interpretable models may be superior to their non-interpretable counterparts in terms of organizational benefit, even if their performance is slightly lower. In this research, our interpretable models also achieve higher performance than their non-interpretable counterparts.
- (3) Practice. Last, the current study contributes to research on absenteeism by departing from previous research in which the “reason for absence” was used as an explanatory feature. In practice, the reason for absence is not known ahead of the absenteeism event and, moreover, most organizations do not record the specific medical situations of their employees in their information systems. Because our models exclude this feature and are interpretable, which enables human resource managers to decide on actionable policies, we argue that they have greater practical value for analyzing and predicting absenteeism patterns than previous models that included “reason for absence” as a feature and were based on non-interpretable models.
Author Contributions
Funding
Conflicts of Interest
References
- Porter, L.W.; Steers, R.M. Organizational, work, and personal factors in employee turnover and absenteeism. Psychol. Bull. 1973, 80, 151.
- Soriano, A.; Kozusznik, M.W.; Peiró, J.M.; Mateo, C. Mediating role of job satisfaction, affective well-being, and health in the relationship between indoor environment and absenteeism: Work patterns matter! Work 2018, 61, 313–325.
- Hansen, C.D. Objectively measured work load, health status and sickness absence among Danish ambulance personnel. A longitudinal study. Eur. J. Public Health 2013, 23.
- Chadwick-Jones, J.K.; Nicholson, N.; Brown, C. Social Psychology of Absence; Praeger: New York, NY, USA, 1982.
- Rhodes, S. Age-related differences in work attitudes and behavior: A review and conceptual analysis. Psychol. Bull. 1983, 93, 328–367.
- Rhodes, S.R.; Steers, R.M. Managing Employee Absenteeism; Addison-Wesley: Reading, MA, USA, 1990.
- Thomson, L.; Griffiths, A.; Davison, S. Employee absence, age and tenure: A study of nonlinear effects and trivariate models. Work Stress 2000, 14, 16–34.
- Kyröläinen, H.; Häkkinen, K.; Kautiainen, H.; Santtila, M.; Pihlainen, K.; Häkkinen, A. Physical fitness, BMI and sickness absence in male military personnel. Occup. Med. 2008, 58, 251–256.
- Bramming, M.; Jørgensen, M.B.; Christensen, A.I.; Lau, C.J.; Egan, K.K.; Tolstrup, J.S. BMI and labor market participation: A cohort study of transitions between work, unemployment, and sickness absence. Obesity 2019, 27, 1703–1710.
- Tewari, K.; Vandita, S.; Jain, S. Predictive Analysis of Absenteeism in MNCS Using Machine Learning Algorithm. In Proceedings of ICRIC 2019: Recent Innovations in Computing; Springer Nature: Berlin, Germany, 2020.
- Martiniano, A.; Ferreira, R.; Sassi, R.; Affonso, C. Application of a neuro fuzzy network in prediction of absenteeism at work. In Proceedings of the 7th Iberian Conference on Information Systems and Technologies (CISTI 2012), Madrid, Spain, 20–23 2012; IEEE: Piscataway, NJ, USA; pp. 1–4.
- Wahid, Z.; Satter, Z.; Al-Imran, A.; Bhuiyan, T. Predicting absenteeism at work using tree-based learners. In Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, Da Lat, Viet Nam, 25–28 January 2019; pp. 7–11.
- Ali Shah, S.A.; Uddin, I.; Aziz, F.; Ahmad, S.; Al-Khasawneh, M.A.; Sharaf, M. An enhanced deep neural network for predicting workplace absenteeism. Complexity 2020.
- Dogruyol, K.; Sekeroglu, B. Absenteeism Prediction: A Comparative Study Using Machine Learning Models. In International Conference on Theory and Application of Soft Computing, Computing with Words and Perceptions; Springer: Berlin/Heidelberg, Germany, 2019; pp. 728–734.
- Araujo, V.S.; Rezende, T.S.; Guimarães, A.J.; Araujo, V.J.S.; de Campos Souza, P.V. A hybrid approach of intelligent systems to help predict absenteeism at work in companies. SN Appl. Sci. 2019, 1, 536.
- Japkowicz, N. Assessment metrics for imbalanced learning. In Imbalanced Learning; John Wiley & Sons: Chichester, UK, 2013; pp. 187–206.
- Owen, A.B. Infinitely imbalanced logistic regression. J. Mach. Learn. Res. 2007, 8, 761–773.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Kerdprasop, N.; Kerdprasop, K. Predicting rare classes of primary tumors with over-sampling techniques. In Database Theory and Application; Bio-Science and Bio-Technology; Springer: Berlin/Heidelberg, Germany, 2011; pp. 151–160.
- Shannon, C.E. A mathematical theory of communication. Bell Labs Tech. J. 1948, 27, 379–423.
- Singer, G.; Anuar, R.; Ben-Gal, I. A weighted information-gain measure for ordinal classification trees. Expert Syst. Appl. 2020, 152, 113375.
- Doshi-Velez, F.; Kim, B. Considerations for evaluation and generalization in interpretable machine learning. In Explainable and Interpretable Models in Computer Vision and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–17.
- Pessach, D.; Singer, G.; Avrahami, D.; Ben-Gal, I.; Ben-Gal, H.C.; Shmueli, E. Employees recruitment: A prescriptive analytics approach via machine learning and mathematical programming. Decis. Support Syst. 2020, 134, 113290.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Model-agnostic interpretability of machine learning. arXiv 2016, arXiv:1606.05386. Available online: https://arxiv.org/abs/1606.05386 (accessed on 20 July 2020).
- Singer, G.; Golan, M.; Rabin, N.; Kleper, D. Evaluation of the effect of learning disabilities and accommodations on the prediction of the stability of academic behaviour of undergraduate engineering students using decision trees. Eur. J. Eng. Educ. 2020, 45, 614–630.
- Singer, G.; Golan, M. Identification of subgroups of terror attacks with shared characteristics for the purpose of preventing mass-casualty attacks: A data-mining approach. Crime Sci. 2019, 8, 14.
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
- Cardoso, J.S.; Costa, J.F. Learning to classify ordinal data: The data replication method. J. Mach. Learn. Res. 2007, 8, 1393–1429.
- Clegg, C.W. Psychology of employee lateness, absence, turnover: A methodological critique and an empirical study. J. Appl. Psychol. 1983, 68, 88–101.
- Nicholson, N. Industrial Absence as An Indicant of Employee Motivation and Job Satisfaction. Ph.D. Thesis, University of Wales, Cardiff, UK, 1975.
- Vincenti, M.A. Physical status: The use of and interpretation of anthropometry. J. Acad. Nutr. Diet. 1996, 96, 1104.
Feature Name | Feature Type | Possible Values (for Nominal Variables) |
---|---|---|
ID | Numerical | |
Reason for absence | Categorical | 21 categories according to the International Code of Diseases (ICD) |
Month of absence | Categorical | 1-January 2-February 3-March 4-April 5-May 6-June 7-July 8-August 9-September 10-October 11-November 12-December |
Day of the week | Categorical | 2-Monday 3-Tuesday 4-Wednesday 5-Thursday 6-Friday |
Season | Categorical | 1-summer 2-autumn 3-winter 4-spring |
Transportation expense | Numerical | |
Distance from residence to work (km) | Numerical | |
Service time | Numerical | |
Age | Numerical | |
Workload (average daily) | Numerical | |
Hit target | Numerical | |
Disciplinary failure | Categorical | 1-yes 2-no |
Education | Categorical | 1-high school 2-graduate 3-postgraduate 4-master/doctor |
# of children | Numerical | |
Social drinker | Categorical | 1-yes 2-no |
Social smoker | Categorical | 1-yes 2-no |
# of pets | Numerical | |
Weight | Numerical | |
Height | Numerical | |
Body mass index | Numerical | |
Absenteeism (hours) | Numerical | |
Absenteeism Hours (y) | Absenteeism Class | Class Code | Share of Instances |
---|---|---|---|
y = 0 | Not absent | 1 | 6% |
0 < y < 8 | Hours | 2 | 57% |
8 ≤ y < 40 | Days | 3 | 34% |
y ≥ 40 | Weeks | 4 | 3% |
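
The thresholds in the table above can be applied with a small helper. This is a minimal sketch rather than the authors' preprocessing code: the function name `absenteeism_class` is hypothetical, and the integer codes 1 to 4 are taken from the values listed next to each class in the table.

```python
from typing import Tuple


def absenteeism_class(hours: float) -> Tuple[int, str]:
    """Map absenteeism hours y to (class code, class name) using the table's thresholds."""
    if hours < 0:
        raise ValueError("absenteeism hours cannot be negative")
    if hours == 0:
        return 1, "Not absent"
    if hours < 8:        # 0 < y < 8
        return 2, "Hours"
    if hours < 40:       # 8 <= y < 40
        return 3, "Days"
    return 4, "Weeks"    # y >= 40


# Boundary values are included to show which side of each threshold they fall on.
examples = [0, 3, 8, 39, 40, 120]
labels = [absenteeism_class(h) for h in examples]
for h, (code, name) in zip(examples, labels):
    print(f"{h:>3} h -> class {code} ({name})")
```

Note that 8 hours already counts as "Days" and 40 hours as "Weeks", since both lower bounds are inclusive.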
Dataset | Not Absent | Hours | Days | Weeks | Total Instances |
---|---|---|---|---|---|
Training before SMOTE | 6% | 57% | 34% | 3% | 592 |
Training after SMOTE | 25% | 25% | 25% | 25% | 1360 |
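
The balancing step summarized above can be illustrated with a simplified, pure-Python sketch of the SMOTE idea: each synthetic instance is an interpolation between a minority-class sample and one of its nearest same-class neighbors. The function `smote_oversample`, the toy data, and the choice of `k=5` are assumptions made for illustration; the source does not state which implementation or parameters were used.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Oversample every class up to the majority-class size by interpolating
    between a sample and one of its k nearest same-class neighbors."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # a single point has no neighbor to interpolate with
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest same-class neighbors of the chosen sample (itself excluded)
            neighbors = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbors)
            gap = rng.random()  # position along the segment base -> other
            X_out.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
            y_out.append(label)
    return X_out, y_out


# Toy, imbalanced two-feature data with four classes (made-up values).
data_rng = random.Random(42)
sizes = {1: 3, 2: 10, 3: 6, 4: 2}
X, y = [], []
for label, n in sizes.items():
    for _ in range(n):
        X.append((data_rng.gauss(label, 0.3), data_rng.gauss(-label, 0.3)))
        y.append(label)

X_bal, y_bal = smote_oversample(X, y)
print(Counter(y), "->", Counter(y_bal))
```

The figures in the table are consistent with oversampling every class to the majority-class size: 1360 instances at 25% each gives 340 per class, which is about 57% of the original 592 training instances, the share of the "Hours" class.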
Class Distribution (not absent, hours, days, weeks) | Entropy | | |
---|---|---|---|
(0.6, 0.3, 0, 0.1) | 1.30 | 0.43 | 0.25 |
(0.6, 0.1, 0, 0.3) | 1.30 | 0.38 | 0.36 |
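
The value 1.30 reported for both distributions can be reproduced with the standard Shannon entropy in bits. The sketch below covers only that standard measure, to show why it cannot separate the two cases; it does not implement the weighted or objective-based entropy, whose definitions are not reproduced in this section.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)


# Class order: not absent, hours, days, weeks.
p1 = (0.6, 0.3, 0.0, 0.1)
p2 = (0.6, 0.1, 0.0, 0.3)

h1, h2 = shannon_entropy(p1), shannon_entropy(p2)
print(f"H(p1) = {h1:.2f}, H(p2) = {h2:.2f}")
```

Shannon entropy depends only on the multiset of probabilities, not on which class holds which probability. Moving mass from "hours" to "weeks" therefore leaves it unchanged, even though the second distribution places three times as much probability on the most severe class.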
Classifier | F-score | Precision | Recall | Accuracy | AUC | MSE | |
---|---|---|---|---|---|---|---|
Non-ordinal classifiers | |||||||
Extreme Gradient Boosting (XGBoost) | 0.69 | 0.72 | 0.68 | 0.68 | 0.73 | 0.32 | 0.52 |
Multi-Layer Perceptron (MLP) | 0.42 | 0.33 | 0.57 | 0.57 | 0.50 | 0.49 | 0.40 |
K-Nearest Neighbor | 0.56 | 0.56 | 0.56 | 0.56 | 0.60 | 0.58 | 0.35 |
Naïve Bayes | 0.41 | 0.54 | 0.34 | 0.34 | 0.56 | 1.46 | 0.02 |
Random Forest (RF) | 0.67 | 0.68 | 0.67 | 0.67 | 0.70 | 0.35 | 0.51 |
CART | 0.66 | 0.66 | 0.66 | 0.66 | 0.69 | 0.36 | 0.41 |
Ordinal classifiers | |||||||
Ordinal CART | 0.69 | 0.70 | 0.69 | 0.69 | 0.72 | 0.31 | 0.53 |
Ordinal CART | 0.73 | 0.74 | 0.72 | 0.72 | 0.76 | 0.34 | 0.58 |
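
The table reports both accuracy and MSE. One way to see why the two can disagree for an ordinal target is the sketch below, which assumes the MSE is computed on integer class codes 1 to 4 (not absent, hours, days, weeks); the table does not state this, and the labels and predictions here are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of exactly correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ordinal_mse(y_true, y_pred):
    """Mean squared difference between true and predicted class codes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 2, 2, 3, 4, 2, 3, 2]
near_miss = [2, 2, 2, 3, 3, 2, 3, 2]  # two errors, each one class away
far_miss = [4, 2, 2, 3, 1, 2, 3, 2]   # two errors, each three classes away

for name, y_pred in [("near miss", near_miss), ("far miss", far_miss)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"MSE={ordinal_mse(y_true, y_pred):.2f}")
```

Both prediction sets have the same accuracy (0.75), but the MSE rises from 0.25 to 2.25 when the errors land far from the true class. Under this assumption, a lower MSE indicates that a classifier's mistakes stay closer to the correct absenteeism level.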
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Singer, G.; Cohen, I. An Objective-Based Entropy Approach for Interpretable Decision Tree Models in Support of Human Resource Management: The Case of Absenteeism at Work. Entropy 2020, 22, 821. https://doi.org/10.3390/e22080821