# An Objective-Based Entropy Approach for Interpretable Decision Tree Models in Support of Human Resource Management: The Case of Absenteeism at Work


## Abstract


## 1. Introduction

…${}^{2}$ values of 0.90 and 0.99, respectively. Note, however, that these analyses used “reason for absence” as a feature. Although this feature is highly correlated with absence (i.e., every instance with an empty “reason for absence” field has a value of 0 for the feature “absenteeism time in hours”), it is not known before the absenteeism event. The main contributions of this research include the introduction of a new information measure, called objective-based entropy, which takes into account the ordinal nature of the target (in this case, absenteeism). In addition, we highlight the value of interpretable models as decision support tools for human resource management. The combination of interpretable modeling and a measure that accounts for ordinal data makes our model valuable for analyzing and predicting absenteeism patterns.

## 2. Materials and Methods

#### 2.1. The Dataset and Data Preparation

#### 2.2. Objective-Based Entropy

#### 2.3. Objective-Based Information Gain (OBIG) for Selecting the Features with the Greatest Explanatory Value in a Decision Tree Model

#### 2.4. Interpretable Classification Models in the Context of Absenteeism

## 3. Results

#### 3.1. A Comparison Between Interpretable Ordinal and Non-Ordinal Classifiers

#### 3.2. The Practical Value of the Interpretable Ordinal CART—Examples of Identified Patterns

**Example 1.** The relationship between age and the level of absenteeism.

**Example 2.** The relationship between body characteristics and the level of absenteeism.

**Example 3.** The relationship between workload and the level of absenteeism.

## 4. Conclusions and Discussion

- (1) Methodology. We introduce a new information measure, called objective-based entropy (OBE), which extends the weighted entropy proposed in Singer et al. [16] and takes into account the ordinal nature of the target (in this case, absenteeism). In contrast to standard entropy measures, the objective-based entropy can differentiate between two situations in which the absenteeism classes (“not absent”, “hours”, “days”, “weeks”) have the probability distributions (0.6, 0.3, 0, 0.1) and (0.6, 0.1, 0, 0.3), respectively: both have the same Shannon entropy (1.30), but they yield different OBE values (Table 4). We demonstrate the use of the new measure and, in particular, highlight its suitability when the objective is to identify a specific class level (in the present case, employees who may be particularly susceptible to absenteeism). Thus, the objective-based entropy measure makes it possible to focus on a specific class, unlike previous approaches that tend to focus on model-level indices (e.g., accuracy).
- (2) Modeling. This research highlights the value of interpretable models as decision support tools in applications such as human resource management. Indeed, human users (in our case, human resource managers) prefer interpretable models whose reasoning they can follow [17,18]. In the current study, understanding the logic of the models may enable human resource managers to take action and devise data-driven policies for decreasing and preventing absenteeism. We provide numerical examples to demonstrate the ability of interpretable models to uncover subgroups of individuals with common characteristics who fall into the same class of the target variable. This approach produces insights that conventional methods, such as hypothesis testing and regression models, do not reveal, because the latter typically focus on high-level correlations between individual features and the target variable (e.g., “absenteeism increases with workload”). Based on this argument, we contend that interpretable models may be superior to their non-interpretable counterparts in terms of organizational benefit, even if their performance is slightly lower. Fortunately, in this research, our interpretable models also perform at least as well as their non-interpretable counterparts: on every measure in Table 5, one of the two ordinal CART variants achieves the best value (e.g., an F-score of 0.73 for $OBE(c^{\max})$ versus 0.69 for XGBoost).
- (3) Practice. Last, the current study contributes to research on absenteeism by departing from previous research in which “reason for absence” was used as an explanatory feature. In practice, the reason for absence is not known ahead of the absenteeism event; moreover, most organizations do not record their employees’ specific medical conditions in their information systems. Because our model avoids this feature and relies on interpretable models that enable human resource managers to decide on actionable policies, we argue that it has greater practical value for analyzing and predicting absenteeism patterns than previous models that included “reason for absence” as a feature and were based on non-interpretable models.
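
Section 2.3 names an objective-based information gain (OBIG) for selecting split features, but its formula does not survive in this section. The Python sketch below is therefore an assumption: it substitutes an OBE measure into the usual information-gain form, OBIG = OBE(parent) − Σ_k (n_k/n)·OBE(child_k), with OBE taken as a Shannon entropy whose per-class terms are weighted by the squared ordinal distance to a reference class and normalized to sum to one. This weighting is assumed here only because it reproduces the OBE values of Table 4; it is not stated by the authors. The reference class is fixed at $c^{\max}$ (value 4), and the helper names `obe` and `obig` and the toy labels are hypothetical.

```python
import math
from collections import Counter

# Ordinal class values V(c) from Table 2: not absent=1, hours=2, days=3, weeks=4.
CLASS_VALUES = (1, 2, 3, 4)


def obe(labels, ref_value):
    """Assumed objective-based entropy of a node: Shannon terms weighted by the
    squared ordinal distance to ref_value, with weights normalized to sum to one."""
    n = len(labels)
    counts = Counter(labels)
    dist2 = [(v - ref_value) ** 2 for v in CLASS_VALUES]
    total = sum(dist2)
    h = 0.0
    for v, d in zip(CLASS_VALUES, dist2):
        p = counts[v] / n
        if p > 0:
            h -= (d / total) * p * math.log2(p)
    return h


def obig(parent, children, ref_value=max(CLASS_VALUES)):
    """Assumed OBIG: information gain with OBE in place of Shannon entropy."""
    n = len(parent)
    remainder = sum(len(ch) / n * obe(ch, ref_value) for ch in children if ch)
    return obe(parent, ref_value) - remainder


# Hypothetical node with eight employees and two candidate binary splits of it.
parent = [1, 2, 2, 2, 3, 3, 4, 4]
splits = {
    "A": ([4, 4], [1, 2, 2, 2, 3, 3]),
    "B": ([1, 2, 2], [2, 3, 3, 4, 4]),
}
gains = {name: obig(parent, children) for name, children in splits.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

A CART-style learner would compute such a gain for every candidate feature and threshold and keep the largest. For the $c^{\mathrm{mode}}$ variant, the reference class would presumably be recomputed at each node, which this sketch does not attempt.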

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Porter, L.W.; Steers, R.M. Organizational, work, and personal factors in employee turnover and absenteeism. Psychol. Bull. **1973**, 80, 151.
- Soriano, A.; Kozusznik, M.W.; Peiró, J.M.; Mateo, C. Mediating role of job satisfaction, affective well-being, and health in the relationship between indoor environment and absenteeism: Work patterns matter! Work **2018**, 61, 313–325.
- Hansen, C.D. Objectively measured work load, health status and sickness absence among Danish ambulance personnel. A longitudinal study. Eur. J. Public Health **2013**, 23.
- Chadwick-Jones, J.K.; Nicholson, N.; Brown, C. Social Psychology of Absence; Praeger: New York, NY, USA, 1982.
- Rhodes, S. Age-related differences in work attitudes and behavior: A review and conceptual analysis. Psychol. Bull. **1983**, 93, 328–367.
- Rhodes, S.R.; Steers, R.M. Managing Employee Absenteeism; Addison-Wesley: Reading, MA, USA, 1990.
- Thomson, L.; Griffiths, A.; Davison, S. Employee absence, age and tenure: A study of nonlinear effects and trivariate models. Work Stress **2000**, 14, 16–34.
- Kyröläinen, H.; Häkkinen, K.; Kautiainen, H.; Santtila, M.; Pihlainen, K.; Häkkinen, A. Physical fitness, BMI and sickness absence in male military personnel. Occup. Med. **2008**, 58, 251–256.
- Bramming, M.; Jørgensen, M.B.; Christensen, A.I.; Lau, C.J.; Egan, K.K.; Tolstrup, J.S. BMI and labor market participation: A cohort study of transitions between work, unemployment, and sickness absence. Obesity **2019**, 27, 1703–1710.
- Tewari, K.; Vandita, S.; Jain, S. Predictive Analysis of Absenteeism in MNCS Using Machine Learning Algorithm. In Proceedings of ICRIC 2019: Recent Innovations in Computing; Springer Nature: Berlin, Germany, 2020.
- Martiniano, A.; Ferreira, R.; Sassi, R.; Affonso, C. Application of a neuro fuzzy network in prediction of absenteeism at work. In Proceedings of the 7th Iberian Conference on Information Systems and Technologies (CISTI 2012), Madrid, Spain, 20–23 2012; IEEE: Piscataway, NJ, USA; pp. 1–4.
- Wahid, Z.; Satter, Z.; Al-Imran, A.; Bhuiyan, T. Predicting absenteeism at work using tree-based learners. In Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, Da Lat, Viet Nam, 25–28 January 2019; pp. 7–11.
- Ali Shah, S.A.; Uddin, I.; Aziz, F.; Ahmad, S.; Al-Khasawneh, M.A.; Sharaf, M. An enhanced deep neural network for predicting workplace absenteeism. Complexity **2020**.
- Dogruyol, K.; Sekeroglu, B. Absenteeism Prediction: A Comparative Study Using Machine Learning Models. In International Conference on Theory and Application of Soft Computing, Computing with Words and Perceptions; Springer: Berlin/Heidelberg, Germany, 2019; pp. 728–734.
- Araujo, V.S.; Rezende, T.S.; Guimarães, A.J.; Araujo, V.J.S.; de Campos Souza, P.V. A hybrid approach of intelligent systems to help predict absenteeism at work in companies. SN Appl. Sci. **2019**, 1, 536.
- Japkowicz, N. Assessment metrics for imbalanced learning. In Imbalanced Learning; John Wiley & Sons: Chichester, UK, 2013; pp. 187–206.
- Owen, A.B. Infinitely imbalanced logistic regression. J. Mach. Learn. Res. **2007**, 8, 761–773.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. **2002**, 16, 321–357.
- Kerdprasop, N.; Kerdprasop, K. Predicting rare classes of primary tumors with over-sampling techniques. In Database Theory and Application; Bio-Science and Bio-Technology; Springer: Berlin/Heidelberg, Germany, 2011; pp. 151–160.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Singer, G.; Anuar, R.; Ben-Gal, I. A weighted information-gain measure for ordinal classification trees. Expert Syst. Appl. **2020**, 152, 113375.
- Doshi-Velez, F.; Kim, B. Considerations for evaluation and generalization in interpretable machine learning. In Explainable and Interpretable Models in Computer Vision and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–17.
- Pessach, D.; Singer, G.; Avrahami, D.; Ben-Gal, I.; Ben-Gal, H.C.; Shmueli, E. Employees recruitment: A prescriptive analytics approach via machine learning and mathematical programming. Decis. Support Syst. **2020**, 134, 113290.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Model-agnostic interpretability of machine learning. arXiv **2016**, arXiv:1606.05386. Available online: https://arxiv.org/abs/1606.05386 (accessed on 20 July 2020).
- Singer, G.; Golan, M.; Rabin, N.; Kleper, D. Evaluation of the effect of learning disabilities and accommodations on the prediction of the stability of academic behaviour of undergraduate engineering students using decision trees. Eur. J. Eng. Educ. **2020**, 45, 614–630.
- Singer, G.; Golan, M. Identification of subgroups of terror attacks with shared characteristics for the purpose of preventing mass-casualty attacks: A data-mining approach. Crime Sci. **2019**, 8, 14.
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. **2009**, 45, 427–437.
- Cardoso, J.S.; Costa, J.F. Learning to classify ordinal data: The data replication method. J. Mach. Learn. Res. **2007**, 8, 1393–1429.
- Clegg, C.W. Psychology of employee lateness, absence, turnover: A methodological critique and an empirical study. J. Appl. Psychol. **1983**, 68, 88–101.
- Nicholson, N. Industrial Absence as An Indicant of Employee Motivation and Job Satisfaction. Ph.D. Thesis, University of Wales, Cardiff, UK, 1975.
- Vincenti, M.A. Physical status: The use of and interpretation of anthropometry. J. Acad. Nutr. Diet. **1996**, 96, 1104.

**Figure 1.** A comparative graph of Area Under the Curve (AUC) values (y-axis) for different learning models as a function of the absenteeism class (x-axis).
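
Figure 1 reports one AUC value per absenteeism class. The section does not state how these per-class values were obtained; one common approach, assumed in the sketch below, is one-vs-rest: treat class $c$ as positive, score each instance by the model's predicted probability of $c$, and compute the AUC as the Mann-Whitney probability that a randomly chosen positive instance outranks a randomly chosen negative one. The function names `one_vs_rest_auc` and `per_class_auc` are hypothetical.

```python
def one_vs_rest_auc(y_true, scores, positive_class):
    """One-vs-rest AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive instance gets a higher score than a randomly chosen
    negative one, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == positive_class]
    neg = [s for y, s in zip(y_true, scores) if y != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_class_auc(y_true, proba, classes=(1, 2, 3, 4)):
    """proba[i][k] is the predicted probability that instance i belongs to classes[k]."""
    return {c: one_vs_rest_auc(y_true, [row[k] for row in proba], c)
            for k, c in enumerate(classes)}
```

The pairwise loop is quadratic in the number of instances, which is adequate for a test set of a few hundred employees; a rank-based formula gives the same value in $O(n \log n)$.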

**Figure 2.** Relationship between age and absenteeism at work for different subgroups of employees. The LHS and RHS, respectively, show (i) a “simple” partition by age and (ii) a series of patterns revealed by our ordinal CART model.

**Figure 3.** Relationship between body characteristics and absenteeism at work for different subgroups of employees. The LHS and RHS, respectively, show (i) simple partitions by height and BMI and (ii) a more refined series of patterns revealed by the ordinal CART model.

**Figure 4.** Relationship between workload and absenteeism at work for different subgroups of employees. The LHS and RHS, respectively, show (i) a simple partition by workload and (ii) a more refined series of patterns revealed by the ordinal CART model.

**Figure 5.** A mechanism for guiding the selection and development of intervention programs for employee subgroups.

| Feature Name | Feature Type | Possible Values (for Categorical Features) |
|---|---|---|
| ID | Numerical | |
| Reason for absence | Categorical | 21 categories according to the International Classification of Diseases (ICD) |
| Month of absence | Categorical | 1-January 2-February 3-March 4-April 5-May 6-June 7-July 8-August 9-September 10-October 11-November 12-December |
| Day of the week | Categorical | 2-Monday 3-Tuesday 4-Wednesday 5-Thursday 6-Friday |
| Season | Categorical | 1-summer 2-autumn 3-winter 4-spring |
| Transportation expense | Numerical | |
| Distance from residence to work (km) | Numerical | |
| Service time | Numerical | |
| Age | Numerical | |
| Workload (average daily) | Numerical | |
| Hit target | Numerical | |
| Disciplinary failure | Categorical | 1-yes 2-no |
| Education | Categorical | 1-high school 2-graduate 3-postgraduate 4-master/doctor |
| Number of children | Numerical | |
| Social drinker | Categorical | 1-yes 2-no |
| Social smoker | Categorical | 1-yes 2-no |
| Number of pets | Numerical | |
| Weight | Numerical | |
| Height | Numerical | |
| Body mass index | Numerical | |
| Absenteeism (hours) | Numerical | |
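
Because “reason for absence” is unknown before the event and effectively reveals the target (an empty reason coincides with 0 hours of absence), a minimal data-preparation sketch drops it from the feature set and checks for that leakage pattern. Dropping `ID` as well is an assumption made here, since it identifies an employee rather than describing one. The column names mirror Table 1; the helper names and the encoding of an “empty” reason (`None` or an empty string) are hypothetical.

```python
# Column names mirror Table 1; the target is the absenteeism time in hours.
TARGET = "Absenteeism (hours)"
# "Reason for absence" is excluded as a leaking feature;
# excluding "ID" is an assumption of this sketch.
EXCLUDED = {"ID", "Reason for absence"}
EMPTY_REASON = (None, "")  # hypothetical encoding of an empty field


def split_features_target(rows):
    """Split a list of row dicts into feature dicts and target values."""
    features = [
        {name: value for name, value in row.items() if name not in EXCLUDED and name != TARGET}
        for row in rows
    ]
    target = [row[TARGET] for row in rows]
    return features, target


def reason_leaks_target(rows):
    """True if every row with an empty reason has zero absenteeism hours,
    i.e., the pattern that makes the feature a proxy for the target."""
    return all(row[TARGET] == 0 for row in rows
               if row.get("Reason for absence") in EMPTY_REASON)
```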

| Absenteeism Hours ($y$) | Absenteeism Class | $c$ | $V(c)$ | $P(c)$ |
|---|---|---|---|---|
| $y = 0$ | Not absent | $c_{1}$ | 1 | 6% |
| $0 < y < 8$ | Hours | $c_{2}$ | 2 | 57% |
| $8 \le y < 40$ | Days | $c_{3}$ | 3 | 34% |
| $y \ge 40$ | Weeks | $c_{4}$ | 4 | 3% |
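
The discretization in the table above can be sketched as a small mapping from absenteeism hours $y$ to the ordinal class value $V(c)$. The boundaries are taken from the table and presumably correspond to one 8-hour working day and one 40-hour working week; the function and dictionary names are hypothetical.

```python
def absenteeism_class(hours):
    """Map absenteeism hours y to the ordinal class value V(c)."""
    if hours < 0:
        raise ValueError("absenteeism hours cannot be negative")
    if hours == 0:
        return 1  # c1: not absent
    if hours < 8:
        return 2  # c2: hours
    if hours < 40:
        return 3  # c3: days
    return 4      # c4: weeks


CLASS_NAMES = {1: "not absent", 2: "hours", 3: "days", 4: "weeks"}

print([absenteeism_class(h) for h in (0, 2, 8, 39, 40, 120)])  # [1, 2, 3, 3, 4, 4]
```

Encoding the classes as integers 1 to 4, rather than as unordered labels, is what lets the order-aware measures (OBE, MSE, $\tau_b$) use distances between classes.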

**Table 3.** Distribution of training dataset classes before and after Synthetic Minority Oversampling Technique (SMOTE) implementation.

| | Not Absent | Hours | Days | Weeks | Total Instances |
|---|---|---|---|---|---|
| Training before SMOTE | 6% | 57% | 34% | 3% | 592 |
| Training after SMOTE | 25% | 25% | 25% | 25% | 1360 |
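
Table 3 implies 1360 / 4 = 340 training instances per class after oversampling. Since 57% of 592 is about 337, this is consistent with raising every minority class to roughly the size of the majority “hours” class, although the exact target count is an inference rather than something stated here. The core SMOTE step (Chawla et al.) places a synthetic instance at a random point on the segment between a minority instance and one of its $k$ nearest minority-class neighbors. The pure-Python sketch below is a simplified variant that draws base instances at random (the original algorithm cycles through every minority instance); in practice, a maintained implementation such as `SMOTE` in imbalanced-learn's `imblearn.over_sampling` module would normally be used, applied to the training split only, as in Table 3. The helper names `smote` and `balance` are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: create n_synthetic points, each on the segment between a
    randomly drawn minority instance and one of its k nearest minority neighbors."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbors of the base instance among the other minority instances
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbor)])
    return synthetic


def balance(rows_by_class, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    target = max(len(rows) for rows in rows_by_class.values())
    return {
        label: list(rows) + smote(rows, target - len(rows), k, seed)
        for label, rows in rows_by_class.items()
    }
```

Because synthetic points are convex combinations of two real minority instances, they never leave the region spanned by the minority class; this is also why SMOTE is applied after the train/test split, so that no synthetic point is derived from a test instance.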

**Table 4.** Entropy and objective-based entropy (OBE) measures with selected statistics $c^{\max}$ and $c^{\mathrm{mode}}$ for two different probability distributions of the absenteeism classes (“not absent”, “hours”, “days”, and “weeks”).

| $\left(P(c_{1}), P(c_{2}), P(c_{3}), P(c_{4})\right)$ | $H(c)$ | $OBE(c^{\max})$ | $OBE(c^{\mathrm{mode}})$ |
|---|---|---|---|
| (0.6, 0.3, 0, 0.1) | 1.30 | 0.43 | 0.25 |
| (0.6, 0.1, 0, 0.3) | 1.30 | 0.38 | 0.36 |
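
The definition of OBE does not appear in this section. As a hedged reconstruction, assume that OBE weights each class's Shannon term $-P(c_i)\log_2 P(c_i)$ by the squared ordinal distance between $V(c_i)$ and the value of the reference statistic $c^{*}$ ($c^{\max}$ or $c^{\mathrm{mode}}$), normalized so that the weights sum to one:

$$OBE(c^{*}) = -\sum_{i=1}^{4} \frac{\left(V(c_i) - V(c^{*})\right)^{2}}{\sum_{j=1}^{4}\left(V(c_j) - V(c^{*})\right)^{2}}\, P(c_i)\log_2 P(c_i).$$

This assumed form reproduces all four OBE entries of Table 4 to two decimals, but it is a reconstruction consistent with the table, not the authors' stated definition. The sketch also shows why OBE separates the two distributions while $H(c)$ does not: Shannon entropy is unchanged when probabilities are permuted across classes, whereas the weights depend on where the probability mass sits on the ordinal scale.

```python
import math

V = (1, 2, 3, 4)  # ordinal values V(c) of "not absent", "hours", "days", "weeks"


def shannon_entropy(probs):
    """H(c) in bits; terms with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def obe(probs, ref_value):
    """Assumed OBE: Shannon terms weighted by the squared ordinal distance
    (V(c_i) - ref_value)^2, with the weights normalized to sum to one."""
    dist2 = [(v - ref_value) ** 2 for v in V]
    total = sum(dist2)
    return -sum(d / total * p * math.log2(p) for d, p in zip(dist2, probs) if p > 0)


def mode_value(probs):
    """V(c^mode): the ordinal value of the most probable class."""
    return V[max(range(len(probs)), key=lambda i: probs[i])]


for probs in [(0.6, 0.3, 0.0, 0.1), (0.6, 0.1, 0.0, 0.3)]:
    print(probs,
          round(shannon_entropy(probs), 2),
          round(obe(probs, max(V)), 2),             # OBE(c^max): reference is "weeks"
          round(obe(probs, mode_value(probs)), 2))  # OBE(c^mode)
# (0.6, 0.3, 0.0, 0.1) 1.3 0.43 0.25
# (0.6, 0.1, 0.0, 0.3) 1.3 0.38 0.36
```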

**Table 5.** Average performance measures of different learning models for the absenteeism at work dataset.

| Model | F-score | Precision | Recall | Accuracy | AUC | MSE | $\tau_{b}$ |
|---|---|---|---|---|---|---|---|
| *Non-ordinal classifiers* | | | | | | | |
| Extreme Gradient Boosting (XGBoost) | 0.69 | 0.72 | 0.68 | 0.68 | 0.73 | 0.32 | 0.52 |
| Multi-Layer Perceptron (MLP) | 0.42 | 0.33 | 0.57 | 0.57 | 0.50 | 0.49 | 0.40 |
| K-Nearest Neighbor | 0.56 | 0.56 | 0.56 | 0.56 | 0.60 | 0.58 | 0.35 |
| Naïve Bayes | 0.41 | 0.54 | 0.34 | 0.34 | 0.56 | 1.46 | 0.02 |
| Random Forest (RF) | 0.67 | 0.68 | 0.67 | 0.67 | 0.70 | 0.35 | 0.51 |
| CART | 0.66 | 0.66 | 0.66 | 0.66 | 0.69 | 0.36 | 0.41 |
| *Ordinal classifiers* | | | | | | | |
| Ordinal CART $OBE(c^{\mathrm{mode}})$ | 0.69 | 0.70 | 0.69 | 0.69 | 0.72 | 0.31 | 0.53 |
| Ordinal CART $OBE(c^{\max})$ | 0.73 | 0.74 | 0.72 | 0.72 | 0.76 | 0.34 | 0.58 |
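
Table 5 combines standard classification measures with two that respect the class order, MSE and Kendall's $\tau_b$. The section does not define how these were computed; the sketch below assumes MSE is taken over the ordinal values $V(c)$ of Table 2 and $\tau_b$ between true and predicted values. It also checks one detail visible in the table: Recall equals Accuracy in every row, which is exactly what support-weighted averaging of per-class recall yields, because $\sum_c (n_c/n)(TP_c/n_c) = \sum_c TP_c/n$. That the authors used this weighting is an inference. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mse(y_true, y_pred):
    """Mean squared error between ordinal class values V(c)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def kendall_tau_b(x, y):
    """Kendall's tau-b by pair counting (O(n^2)), with the tie correction."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both: counts toward neither factor of the denominator
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    not_tied_x = concordant + discordant + ties_y_only  # pairs with x1 != x2
    not_tied_y = concordant + discordant + ties_x_only  # pairs with y1 != y2
    return (concordant - discordant) / math.sqrt(not_tied_x * not_tied_y)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    n = len(y_true)
    total = 0.0
    for c, support in Counter(y_true).items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total += (support / n) * (hits / support)
    return total


# Hypothetical true and predicted class values V(c) for eight employees.
y_true = [1, 2, 2, 3, 3, 4, 2, 3]
y_pred = [1, 2, 3, 3, 2, 4, 2, 3]
print(mse(y_true, y_pred), accuracy(y_true, y_pred), kendall_tau_b(y_true, y_pred))
assert math.isclose(weighted_recall(y_true, y_pred), accuracy(y_true, y_pred))
```

Unlike accuracy, MSE penalizes predicting “weeks” for a “not absent” employee more than predicting “hours”, which is why it and $\tau_b$ are the natural yardsticks for the ordinal classifiers.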

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Singer, G.; Cohen, I.
An Objective-Based Entropy Approach for Interpretable Decision Tree Models in Support of Human Resource Management: The Case of Absenteeism at Work. *Entropy* **2020**, *22*, 821.
https://doi.org/10.3390/e22080821

**Chicago/Turabian Style**

Singer, Gonen, and Izack Cohen.
2020. "An Objective-Based Entropy Approach for Interpretable Decision Tree Models in Support of Human Resource Management: The Case of Absenteeism at Work" *Entropy* 22, no. 8: 821.
https://doi.org/10.3390/e22080821