Incorporating a Machine Learning Model into a Web-Based Administrative Decision Support Tool for Predicting Workplace Absenteeism

Abstract: Productivity losses caused by absenteeism at work cost U.S. employers billions of dollars each year. In addition, employers typically spend a considerable amount of time managing employees who perform poorly. By using predictive analytics and machine learning algorithms, organizations can make better decisions, thereby increasing organizational productivity, reducing costs, and improving efficiency. Thus, in this paper we propose hybrid optimization methods in order to find the most parsimonious model for absenteeism classification. We utilized data from a Brazilian courier company. In order to categorize absenteeism classes, we preprocessed the data, selected the attributes via multiple methods, balanced the dataset using the synthetic minority over-sampling method, and then employed four methods of machine learning classification: Support Vector Machine (SVM), Multinomial Logistic Regression (MLR), Artificial Neural Network (ANN), and Random Forest (RF). We selected the best model based on several validation scores, and compared its performance against the existing model. Furthermore, project managers may lack experience in machine learning, or may not have the time to spend developing machine learning algorithms. Thus, we propose a web-based interactive tool supported by cognitive analytics management (CAM) theory. The web-based decision tool enables managers to make more informed decisions, and can be used without any prior knowledge of machine learning. Understanding absenteeism patterns can assist managers in revising policies or creating new arrangements to reduce absences in the workplace, financial losses, and the probability of economic insolvency.


Introduction
Absenteeism has been identified as an important factor in company performance and productivity losses. A study conducted by the Bureau of Labor Statistics suggests that nearly 2.8 million workdays are lost each year due to absenteeism at work [1]. Despite employers' expectations, excessive absences may reduce productivity and negatively impact the company's finances and other aspects [2]. The Gallup Wellbeing Index surveyed over 94,000 individuals from 14 major occupations and determined that absenteeism among particular U.S. workers costs USD 153 billion annually [3]. It is alarming to learn that many employers in the United States are unaware of the extent of absenteeism in the workplace. Less than one half of all companies have a system for tracking absenteeism, and only 16 percent have measures to reduce absenteeism [1]. Consequently, absenteeism has a significant impact on a company's financial performance, as it has a direct and powerful effect on the organizational structure [1,3,4]. In order to effectively handle employee absenteeism, an organization first needs to understand the causes and patterns of absence from multiple perspectives in order to properly classify the reasons for employee absence. Absenteeism has generally been considered a significant challenge in human resource management in various industries and organizations [5]. Artificial intelligence and predictive analytics are perceived as key drivers in improving organizational performance and productivity [6]. In this study, we explore those factors that are closely related to absenteeism at work with a high degree of accuracy. Furthermore, we propose a decision support tool for human resource managers that provides a classification of a particular employee guided by the CAM theory. 
With this tool, managers without expertise in the field of machine learning can obtain better information with which to revise policies or develop new plans to reduce absences from work, thus reducing the adverse effects of workplace absences on the productivity and the performance of the company. The paper is organized as follows. The proposed interactive web-based tool is described in Section 2, along with a brief review of the literature related to predicting work-related absences. A detailed examination of the dataset used in this study is presented in Section 2.1. Different multiclass classification models are compared in order to determine the most effective model for predicting workplace absenteeism of potential candidates, and are discussed in Section 3, along with the performance metrics used to select the most effective model for subsequent integration with the web-based interactive tool. In Section 4, we describe the results obtained by the multiclass classification models and provide recommendations for selecting the best model based on performance metrics. Incorporating the best model as found in Section 4, Section 5 describes the developed interactive absenteeism management tool. Moreover, we present a demonstration of how to use our proposed tool to identify the absenteeism class of a potential candidate. Finally, concluding remarks are provided in Section 6.

Literature Review and Proposed Method
There are many types of employee leaves of absence, including short-term and long-term disability, workers' compensation, family and medical leave, and military leave. Additionally, there is evidence that companies with low morale suffer from higher rates and costs of unscheduled absences. Based on the 2005 CCH Survey, only 35% of unscheduled absences are attributed to personal illness, while 65% are attributed to other reasons, including family concerns (21%), personal needs (18%), entitlement attitude (14%), and stress (12%) [7]. Tunceli et al. [8] found that male employees with diabetes have a 7.1% lower likelihood of working and female employees with diabetes have a 4.4% lower likelihood of working than those without diabetes. Furthermore, the study conducted by Halpern et al. showed that smoking policies in the workplace affect absenteeism and productivity, concluding that smokers are more likely to be absent from work than former smokers and nonsmokers [9]. Researchers have increasingly used artificial intelligence (AI)-based methods in recent years to model absenteeism problems that have a detrimental impact on the company's infrastructure. Using Naive Bayes, Decision Trees, and Multilayer Perceptrons, Gayathri predicted absenteeism at the workplace and recommended multilayer perceptrons on the basis of validation score [10]. Using a multilayer perceptron with the error back-propagation algorithm, Martiniano et al. proposed a neuro-fuzzy network to predict workplace absenteeism [11].
In a recent study, Skorikov et al. [12] compared six different data mining techniques to predict multiclass absenteeism at work, concluding that KNN classifiers with the Chebyshev distance metric performed the best. Although the authors divided absenteeism into different degrees, all models showed significantly lower prediction power. Furthermore, the authors did not follow the recommended performance metric guideline [13] when comparing methods for imbalanced data.
Thus, we have utilized most of the current absenteeism research and followed the methodology of Cognitive Analytics Management (CAM) theory in order to examine absenteeism issues in the workplace. CAM theory, which was proposed by Osman et al., consists of the following three steps [14]: (a) Cognitive process: acquire a thorough understanding of a problem, identifying the attributes and relationships that align with the desired goals and objectives determined in coordination with executive authorities. (b) Analytical process: determine the most appropriate analytical models and methods for addressing the identified challenge and achieving the desired objectives, then validate the outcomes and convey the findings to executive authorities. (c) Management process: a critical component of the CAM theory that is essential for the successful commencement and completion of any analytical project.
As part of the cognitive process, we preprocessed the data and explored the factors related to absenteeism through the use of ANOVA F-values and Random Forest feature selection. In the analytical process, we trained different machine learning classification models based on the attributes selected during the cognitive process, then determined the most significant model by comparing their performance metrics.
Furthermore, human resource managers may not have sufficient knowledge of the data mining techniques that are used in existing models. In order to resolve this issue, we propose an automated decision support tool guided by the final stage (management process) of the CAM theory. Comparable tools exist in other domains. For example, Delen, Sharda, & Kumar [15] developed an automated web-based tool integrating prediction models to provide Hollywood producers with a way of classifying a movie into one of nine success categories, ranging from flops to blockbusters.
Simsek et al. [16] have developed an automated tool that uses artificial neural networks (ANN) to identify point velocity profiles on rivers with an accuracy level of 0.46. Figure 1 provides an overview of how the web-based interactive tool proposed here was developed and how it can be utilized. This tool ensures that the end-user does not need to know anything about machine learning in order to make predictions about absenteeism; they must simply input data into the tool and click "predict".
The methodology in this paper is organized as follows. To begin, we preprocess the data by performing feature selection and scaling, one-hot encoding, and classification of absenteeism hours. We use the Synthetic Minority Over-Sampling Technique (SMOTE) to improve the performance of the classification models. In the next step, we split the data into training and testing sets, train four models (MLR, SVM, ANN, and RF) using the training data, and predict absenteeism classes using the testing data. Utilizing performance metrics, we compare and choose the most suitable model, then integrate the selected model into a proposed web-based interactive tool. Last, we provide a brief description of how the user can access the interactive tool.

Preprocessing and Cross-Validation
The dataset for this study was obtained from the UCI Machine Learning Repository and provided by Martiniano, Ferreira, Sassi & Affonso [11]. This database was created using absenteeism records from a Brazilian courier company from July 2007 to July 2010. The dataset consists of 740 instances and 21 attributes. For more detailed information regarding the dataset, please see Skorikov et al. [12]. In accordance with information provided by the data source (UCI Machine Learning Repository) and authors [11], the dataset permits several kinds of attributes to be combined and excluded, and permits modification of the type of attributes (categorical, integer, or real) depending on the purpose of research.
There are several categorical attributes in the Absenteeism dataset (month of absence, day of the week, season, and more). In machine learning models, it is assumed that two numbers that are very close will appear to be more similar than two numbers that are further apart. However, this is not always the case for categorical attributes [17]. With One-Hot-Encoding, the machine learning algorithm does not assume that larger numbers are more significant. Thus, we applied One-Hot-Encoding (by creating a binary column) to the categorical attributes [18]. An important step in the development of a machine learning model is to determine the importance of features. A number of features are redundant or do not contain useful information. Features can be selected appropriately based on their importance. We utilized ANOVA-F-value attribute selection (ANOVA-FS) and Random Forest attribute importance (RFFI) to determine the feature importance. In ANOVA, each attribute is ranked by calculating the ratio of variances between and within groups [19]. Based on this ratio, it can be determined how strongly the i-th attribute is related to the group attribute [20].  During a Random Forest training process, the significance of each attribute is quantitatively measured by the Gini index (error rate). In this way, the importance of each attribute can be ranked [21]. Further information about random forest attributes can be found in Sarelaa et al. [22].
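The two ideas in this paragraph, binary indicator columns for categorical attributes and ranking an attribute by its between-group to within-group variance ratio, can be illustrated with a minimal standard-library sketch. The function names `one_hot` and `anova_f` and the toy values are assumptions made for illustration only; they are not the code or the data used in the study.

```python
from collections import defaultdict


def one_hot(values):
    """Map each categorical value to a binary indicator vector."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def anova_f(feature, labels):
    """One-way ANOVA F-value: between-group variance over within-group variance."""
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[y].append(x)
    n, k = len(feature), len(groups)
    grand_mean = sum(feature) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data, not taken from the absenteeism dataset.
seasons = ["summer", "winter", "summer", "spring"]
encoded, categories = one_hot(seasons)

workload = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
classes = ["A+", "A+", "A+", "C+", "C+", "C+"]
f_value = anova_f(workload, classes)
```

A larger F-value indicates that the attribute separates the classes more strongly relative to its spread within each class, which is the basis on which attributes are ranked.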
We have listed the significant attributes for each of the methods in Table 1 with a checkmark. For the RFFI method, we set an importance threshold of 0.025 for all attributes; thus, any attribute with an importance level less than 0.025 was considered to have insufficient predictive power. ANOVA-FS and RFFI each select ten attributes, although the attributes selected by the two methods are not identical. We selected the attributes from the union set of ANOVA-FS and RFFI.
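As a small illustration of the selection rule, the sketch below applies the 0.025 importance cut-off and then takes the union of the two attribute sets. The attribute names and importance scores are hypothetical placeholders and are not the values reported in Table 1.

```python
# Hypothetical importance scores; names and numbers are illustrative only.
rf_importance = {
    "reason_for_absence": 0.30,
    "workload_avg_per_day": 0.12,
    "age": 0.05,
    "pet": 0.01,
}
# Hypothetical set of attributes retained by the ANOVA F-value ranking.
anova_selected = {"reason_for_absence", "age", "son"}

THRESHOLD = 0.025  # cut-off stated in the text

# Keep attributes whose importance is not below the threshold.
rffi_selected = {name for name, score in rf_importance.items() if score >= THRESHOLD}

# Final attribute set: union of both selection methods.
final_attributes = sorted(anova_selected | rffi_selected)
```

Taking the union keeps any attribute that either method considers informative, which is why the final set can be larger than either individual selection.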
As outlined in our proposed study, absenteeism in hours is the variable of interest. Absenteeism is divided into three categories: none means that an employee is never absent; moderate covers employees who are absent for 1 to 15 h per month; and excessive refers to employees who are absent for 16 to 120 h per month. For simplicity, we refer to these three groups as A+, B+, and C+, respectively. Skorikov et al. [12] introduced the concept of the classification of absenteeism data, which is helpful when comparing groups within an organization. Furthermore, these predefined classes provide an opportunity to compare our findings with existing results. We trained multiple classification models using selected attributes from each method and the dependent attribute absenteeism in hours. Moreover, we applied five-fold stratified cross-validation with the absenteeism dataset in order to evaluate model performance. Five-fold stratified cross-validation involves randomly dividing the dataset into five folds of equal size, each of which preserves approximately the same class proportions as the full dataset. In each iteration, one fold serves as the test set and the remaining four folds serve as the training set, producing a total of five models. We considered the average of the five performance metrics across the five models [23,24]. Furthermore, due to the categorization of absenteeism in hours, three imbalanced classes were formed (A+ = 6%, B+ = 85%, and C+ = 9%). Therefore, in order to increase the performance of the models, we applied SMOTE to the training dataset only, leaving the testing dataset unchanged. A major objective of SMOTE is to create data for the minority class to support the balance between classes. This process generates new data points using the k-nearest neighbors algorithm [25].
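The following standard-library sketch illustrates the two procedures described above: stratified fold assignment and SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. It is a simplified illustration under stated assumptions (toy labels, two-dimensional toy points, hypothetical function names) and not the implementation used in the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal each class out evenly
    return folds


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points on segments joining a minority point
    to one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy labels (hypothetical proportions chosen so the folds divide evenly).
labels = ["A+"] * 10 + ["B+"] * 80 + ["C+"] * 10
folds = stratified_folds(labels)

# In each cross-validation iteration, oversampling would be applied to the
# training folds only; the held-out fold keeps its original distribution.
minority_points = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]]
synthetic = smote(minority_points, n_new=5)
```

Restricting the oversampling to the training folds keeps synthetic points out of the evaluation data, so the reported metrics reflect the original class imbalance.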
Having applied all of the above techniques to the dataset, we trained multiple models; for brevity, we included only the four best-performing models (SVM, MLR, ANN, and RF) in this study. The following section provides a brief description of each of the classification models.

Prediction Models
For our study, we analyzed four commonly used models to classify the absenteeism category of potential employees. The following subsections outline the method and corresponding optimal hyperparameters calculated by the grid search algorithm.

Support Vector Machine
The support vector machine (SVM) has become very popular because it offers significant accuracy with minimal computational power. Support vector machines perform reasonably well with linear dependencies, have reasonable performance with sparse data sets, and can be used for a wide variety of data types [6]. The purpose of the support vector machine algorithm is to produce a hyperplane capable of differentiating between two different classes of data. Figure 2 shows a separating hyperplane with the largest margin, $d = 2/\lVert a \rVert$, that is, the maximum distance between the data points of the two classes, where the vector $a$ is perpendicular to the separating hyperplane [26]. A separating hyperplane may not be readily available in certain cases due to the greater dimensions of the problem, in which case a kernel function helps in the smooth computation of the problem [27]. In order to apply SVM to multi-class classification problems, it is common to divide the problem into multiple binary classification subsets and then apply a standard SVM to each subset, which is called the one-versus-rest technique. In order to achieve optimal accuracy, we tested several different parameters, and found that the radial basis kernel function provides the highest level of accuracy.
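To make the margin and the kernel concrete, the sketch below computes the margin width for a given normal vector and evaluates a radial basis function kernel. It assumes the usual canonical scaling in which the margin boundaries satisfy a·x + b = ±1; the value of `gamma` and the example vectors are illustrative assumptions, not the tuned hyperparameters of the study.

```python
import math


def margin_width(a):
    """Margin width d = 2 / ||a|| for the hyperplane a.x + b = 0
    (canonical scaling: margin boundaries at a.x + b = +1 and -1)."""
    return 2.0 / math.sqrt(sum(w * w for w in a))


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


d = margin_width([3.0, 4.0])                  # ||a|| = 5
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0])    # distant points
```

The kernel value decays toward zero as two points move apart, which is what allows a linear separator in the induced feature space to produce a non-linear boundary in the original attributes.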

Multinomial Logistic Regression
Multinomial logistic regression is an extension of binary logistic regression in which the outcome variable can be categorized into more than two categories. In order to extend logistic regression to multiclass classification problems, one commonly used approach is to divide the multiclass classification problem into a series of binary classification sets and fit a standard logistic regression model for each subset. Consider $h_{mn}$ as the success ($h_{mn} = 1$) or failure ($h_{mn} = 0$) of multinomial outcome $n$, $n = 1, \ldots, N$, for observation $m$, $m = 1, \ldots, M$. Let $x_m$ denote the $K$-dimensional vector of predictor variables for observation $m$, with components indexed by $k = 1, \ldots, K$. Based on reference outcome $N$, the multinomial logistic regression (MLR) can be defined to predict probabilities $\pi_{mn}(x_m)$ for outcomes $n = 1, \ldots, N - 1$ as follows [28]:

$$\pi_{mn}(x_m) = \frac{\exp\left(\lambda_n + \theta_n^{\top} x_m\right)}{1 + \sum_{j=1}^{N-1} \exp\left(\lambda_j + \theta_j^{\top} x_m\right)},$$

where $\theta_n = (\theta_{n1}, \ldots, \theta_{nK})$ refers to the coefficients for the $n$th linear predictor, excluding its intercept, $\lambda_n$. The log-likelihood method is used to estimate $\lambda$ and $\theta$, providing asymptotically normal and consistent estimates. Here, the Newton method was used to optimize the problem for the most accurate prediction.
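A minimal sketch of how class probabilities follow from the intercepts and coefficients is given below. It assumes the standard reference-category parameterization, in which the linear predictor of outcome N is fixed at zero; the function name and the coefficient values are hypothetical.

```python
import math


def mlr_probabilities(x, lambdas, thetas):
    """Return [pi_1, ..., pi_N] with outcome N as the reference category.

    lambdas and thetas hold the N - 1 intercepts and coefficient vectors
    of the non-reference outcomes.
    """
    scores = [
        lam + sum(t * xi for t, xi in zip(theta, x))
        for lam, theta in zip(lambdas, thetas)
    ]
    denominator = 1.0 + sum(math.exp(s) for s in scores)
    probabilities = [math.exp(s) / denominator for s in scores]
    probabilities.append(1.0 / denominator)  # reference outcome N
    return probabilities


# Hypothetical three-class example with two predictors.
uniform = mlr_probabilities([0.0, 0.0], [0.0, 0.0], [[1.0, -1.0], [0.5, 2.0]])
skewed = mlr_probabilities([1.0, 0.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 0.0]])
```

When every linear predictor equals zero, the three outcomes are equally likely; increasing one predictor shifts probability mass toward the corresponding outcome while the probabilities continue to sum to one.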

Artificial Neural Networks
In recent years, artificial neural networks (ANN) have been applied to various fields thanks to their ability to model highly challenging problems. ANNs are new and useful models when applied to problem solving and machine learning. An ANN is a model of information processing that is comparable to the function of the human nervous system. A key feature of the human brain is its ability to process information in a unique manner. Many interconnected neurons serve as components of the system, and work in concert to solve specific problems on a daily basis [29]. An ANN consists of nodes, representing neurons, and connections between nodes, representing axons and dendrites carrying information. There is a value or weight attached to every connection between two nodes for the purpose of assessing the strength of the signal [30]. The neurons are arranged in layers, with an input layer representing one type of input data, an output layer representing the result of the classification, and one or more hidden layers. One of the most common and widely used forms of ANN is the perceptron, which is a fully connected feed-forward network [31]. The linear combination of weights and input values is passed through a non-linear function, known as an activation function [32]. Neural activation functions approximate the complex physical processes of neurons, which modulate their output in a non-linear way. The architecture of an artificial neural network is shown in Figure 3 [33]. We built a six-layer fully connected ANN, in which each neuron in one layer is connected to every neuron in the following layer. After one-hot encoding and standardization, the input layer consists of 42 input neurons. Our results indicated that the highest degree of accuracy was achieved using a network with four hidden layers, consisting of 400, 100, 50, and 20 neurons (nodes), respectively, and with the output layer consisting of three neurons using the ReLU activation function.
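The layer sizes described above can be illustrated with a small forward-pass sketch in plain Python. The weights are random placeholders rather than trained values, and the sketch assumes ReLU in the hidden layers and a softmax on the three output neurons so that the outputs can be read as class probabilities; that output normalization is an assumption made for illustration.

```python
import math
import random

# Input layer, four hidden layers, output layer (sizes taken from the text).
LAYER_SIZES = [42, 400, 100, 50, 20, 3]


def init_network(sizes, seed=0):
    """Create (weights, biases) per layer with small random placeholder weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one input vector through the fully connected network."""
    for depth, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if depth < len(layers) - 1:
            x = [max(0.0, v) for v in z]  # ReLU in hidden layers
        else:
            shift = max(z)  # subtract the maximum for numerical stability
            exps = [math.exp(v - shift) for v in z]
            total = sum(exps)
            x = [e / total for e in exps]  # softmax (assumed) on the output
    return x


network = init_network(LAYER_SIZES)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in network)
class_probs = forward(network, [0.1] * 42)
```

Counting weights and biases layer by layer gives the number of trainable parameters implied by this architecture, which is a useful check on model size relative to a dataset of 740 instances.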

Random Forest
Decision trees are the core component of random forest classifiers. Using the features of a data set, a decision tree is built into a hierarchical structure. In the decision tree, each node represents a measure associated with a subset of features [34]. A random forest is composed of many trees, each of which produces a class prediction, and the class that receives the most votes becomes the model prediction [35]. In this study, Gini impurity-supported criteria were used to determine the most accurate predictions.
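The two mechanisms mentioned here, the Gini impurity used to evaluate splits and the majority vote across trees, can be sketched as follows. The labels and votes are toy values chosen for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the class proportions p_c at a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def forest_predict(tree_votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_votes).most_common(1)[0][0]


pure_node = gini_impurity(["B+", "B+", "B+", "B+"])
mixed_node = gini_impurity(["A+", "B+", "B+", "C+"])
winner = forest_predict(["B+", "C+", "B+", "A+", "B+"])
```

A node containing a single class has zero impurity, and a split is preferred when it lowers the weighted impurity of the resulting child nodes.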

Performance Metrics
The following section demonstrates several of the statistical assessment metrics that we used to validate our model's performance.
Accuracy: An important metric for evaluating classification models is accuracy. Accuracy can be determined based on binary classification, as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP stands for true positives, TN for true negatives, FP for false positives, and FN for false negatives [36]. With imbalanced datasets, accuracy can be misleading; therefore, there are additional metrics found in a confusion matrix that can be utilized to evaluate performance.
Precision: Precision is a popular metric in classification systems. It measures the proportion of instances predicted as positive that are actually positive [36]:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: Recall measures the proportion of actual positive instances that the model identifies, which makes it informative when a dataset is imbalanced. It is defined as follows [37]:

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1-score: In terms of precision and recall, the F-measure is commonly defined as follows [38]:

$$F_{\alpha} = \frac{(1 + \alpha^{2})\, P R}{\alpha^{2} P + R},$$

where P and R are the precision and recall, respectively, and α ≥ 0 represents the balance between P and R. This is commonly referred to as the F1 score when α = 1 [39].

ROC AUC score: Over the past few decades, receiver operating characteristic (ROC) curves have become popular, and have been widely used as a tool to evaluate the discrimination ability of various machine learning methods for predictive purposes [40]. The curves of better models pass closer to the upper left corner and correspond to greater overall testing accuracy. One of the most widely used metrics for assessing the performance of models is the Area Under Curve (AUC), which measures the ability of the classifier to distinguish between classes and is used as a summary of the ROC curve. AUC values are generally between 0.5 and 1.0, and a larger AUC indicates better performance. In the case of a perfect model, the AUC would be 1, indicating that all positive examples are ranked ahead of all negative examples [39]. There are two popular methods for evaluating multi-class classification problems. The one-versus-one algorithm computes the average of the pairwise ROC AUC scores, while the one-versus-rest algorithm computes the average of the ROC AUC scores for each class compared to all other classes [41].
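The metrics in this section can be computed directly from predictions, as in the minimal sketch below. It treats one class at a time as the positive class, and it averages pairwise AUC values in the one-versus-one manner, using the rank interpretation of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative). The function names, labels, and probabilities are illustrative assumptions, not the study's evaluation code.

```python
from itertools import combinations


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def binary_auc(pos_scores, neg_scores):
    """Probability that a positive outranks a negative (ties count one half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def ovo_macro_auc(y_true, proba, classes):
    """Average of pairwise AUCs; proba[i][c] is the score of class c for sample i."""
    pair_scores = []
    for a, b in combinations(classes, 2):
        idx_a = [i for i, t in enumerate(y_true) if t == a]
        idx_b = [i for i, t in enumerate(y_true) if t == b]
        auc_a = binary_auc([proba[i][a] for i in idx_a], [proba[i][a] for i in idx_b])
        auc_b = binary_auc([proba[i][b] for i in idx_b], [proba[i][b] for i in idx_a])
        pair_scores.append((auc_a + auc_b) / 2)
    return sum(pair_scores) / len(pair_scores)


# Toy labels and predictions (hypothetical).
classes = ["A+", "B+", "C+"]
y_true = ["A+", "B+", "B+", "C+", "B+", "C+"]
y_pred = ["A+", "B+", "C+", "C+", "B+", "B+"]
acc = accuracy(y_true, y_pred)
p_b, r_b, f1_b = precision_recall_f1(y_true, y_pred, positive="B+")

# Perfectly separating scores: the true class always receives 0.8.
proba = [{c: (0.8 if c == t else 0.1) for c in classes} for t in y_true]
auc = ovo_macro_auc(y_true, proba, classes)
```

Because the one-versus-one average weights every class pair equally, it is less dominated by the majority class than plain accuracy, which is relevant for the imbalanced classes considered here.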

Results and Model Comparison
In this article, we explored four popular machine learning algorithms for predicting employee absenteeism. We used different methods of selecting the most significant set of attributes, as discussed in Section 2.1. Our primary objective was to determine the most appropriate model using performance metrics and develop a decision support tool for human resource managers. We employed four different classification models across three different sets of attributes. In comparing the results shown in Table 2, we found that the AUC score for the attribute set selected by ANOVA-FS ∪ RFFI was the highest of all models. Therefore, SVM and MLR had the highest AUC scores overall compared to other models. It should be noted that as concerns predictions, there is no universally accepted best method that is applicable in all problems. Trial and error and experimentation are required in order to find the best model for each scenario [42]. Furthermore, Delen et al. recommend incorporating the knowledge of multiple experts in the development and training of a model [15]. To develop an appropriate model, companies should consult with experts in the relevant field and choose the appropriate attributes based on their experience and geographic location. Saidene et al., for instance, investigated the factors leading to work absenteeism in Tunisia and concluded that excessive work hours, poor posture, workplace stress, and insufficient rest time are significant attributes leading to absenteeism [43].
As the ANOVA-FS ∪ RFFI set provided the overall best performance, we selected these thirteen attributes for the final model. The relative importance of the attribute set is presented in Figure 4. It can be seen that reason for absence has the highest importance compared to others. Almost all levels of reason for absence are caused by health-related absences. The establishment of an exercise center at the workplace may be one solution to reducing absenteeism. In an analysis of 517 employees selected randomly, Baun et al. explored the differences between exercisers and non-exercisers in terms of health care costs and absenteeism [44]. They found that exercise reduced illness absence among exercisers and increased illness absence among nonexercisers. The findings of our study are therefore consistent with those of their experiments, demonstrating a significant association between illness and absenteeism at work. In addition, Dula et al. conducted an experimental study of absenteeism among medical staff at Arba Minch General Hospital and concluded that workload has a positive relationship with absenteeism [45]. From Figure 4, it can be seen that workload is an important factor. In this case, a practical solution would be to decrease workload by hiring more employees. Moreover, as identifying the maximum workload for an employee is a critical decision, our proposed web-based tool can help managers in finding possible maximum workloads; this issue is discussed in detail in the next section.
Categorizing employees' absenteeism in hours generates imbalanced classes. Numerous suggestions have been made as to how to overcome the negative effects of such an imbalance on performance metrics (for further details, see Luque, Carrasco, Martin & Heras [46]). On the basis of our experimental results, and as suggested by Johnson, Halbesleben, Marilyn & Khoshgoftaar [47] and Simsek et al. [6], we determined the best model based on the AUC score and confusion matrix. In Table 2, we report the accuracy, weighted F1-score, weighted precision, and the one-versus-one macro ROC AUC scores. In a previous approach, Skorikov et al. [12] proposed a KNN classifier with a Chebyshev distance metric that achieved an AUC score of only 0.69. Table 2 demonstrates that, under all of the scenarios assessed, our models performed better than their proposed model. This may be due to the fact that we applied One-Hot-Encoding to the categorical attributes and standardized the continuous attributes before using them to train the models.
We selected SVM for the web-based supporting tool because of its accuracy, which was 100% for class A+, 85% for class B+, and 77% for class C+, the best overall performance in comparison to the other models. Skorikov et al. proposed KNN classifiers with the Chebyshev distance metric, obtaining an accuracy of 67% for class A+, 92% for class B+, and 8.3% for class C+ [12]. Thus, our proposed model for absenteeism data shows a substantial improvement in terms of both AUC score and confusion matrix.

Absenteeism Interactive Tool
As discussed earlier, this study followed the CAM theory and current research on absenteeism to identify the factors that are significantly associated with absenteeism at work. In this study, we predicted absenteeism classes using these factors in order to explore possible ways for organizations to reduce absenteeism. Here, we propose an interactive web application for absenteeism classification guided by the final step of CAM theory, namely, the management process.
It can be difficult for a human resource manager or project manager to predict employee absenteeism classes using a machine learning algorithm, especially if he or she lacks programming expertise. Our proposed interactive web-based tool enables human resource or project managers to predict absenteeism by analyzing all current employees and estimating how many will be absent, allowing them to make appropriate adjustments and preparations ahead of time. In addition, managers will be able to save significant amounts of time by eliminating the need to train a machine learning algorithm. We present a prototype of the proposed tool based on a publicly available open-access dataset collected by Martiniano, Ferreira, Sassi & Affonso [11] from a Brazilian courier company in 2012. This dataset is widely used, and there has been a significant amount of research published on absenteeism using this dataset. A prototype of our proposed web-based tool is accessible at the following link: https://share.streamlit.io/gopalnath1926/app_absenteeism_new/main/app_absenteeism.py (accessed on 24 June 2022).
It is important to note that this dataset represents specific human behaviors from the area in which it was collected. Related behaviors may be different in other areas of the world [48] or in different companies. Therefore, companies should consult with experts and select appropriate attributes (possibly based on internal data from their own company) for training the model and developing an effective version of the proposed web-based tool for human resources and project managers in order to improve workplace absenteeism problems.
It should be noted that this prototype is presented only for the benefit of the reader as a demonstration of the proposed tool. One of the referees remarked that this prototype tool, having been trained on the Brazilian courier dataset, may not be applicable or useful to other organizations. Perhaps a fully implemented version of the tool could streamline this process by allowing a company to upload its absentee data, automatically develop a model based on the process proposed in this study, and quickly create a unique version of the tool for their own human resource or project managers.
Furthermore, based on the performance analysis discussed in the previous section, SVM was chosen for the prototype. We developed the prototype using Streamlit, an open-source Python framework for developing web applications. After collecting an employee's information, the end-user can input all of the information and, by clicking on the predict button, predict the class of the employee. The end-user does not need to know anything about machine learning to make predictions; they simply input data into the tool and click predict. As shown in Figure 5, the output (predicted) class for the desired candidate is A+ on a scale of A+, B+, and C+, indicating that this particular employee has a high degree of sincerity in the workplace and is less likely to miss work. Furthermore, as a follow-up to the problem we mentioned above, Dula et al. concluded that workload was a significant factor in predicting absenteeism in an experimental study on medical staff at Arba Minch General Hospital [45]. Our proposed interactive tool can help significantly in adjusting the workload of an employee. According to Figure 5, if we change the workload average/day from 70 to 73, the prediction class changes from A+ to C+, assuming that all other attributes remain constant. Therefore, human resources or project managers can determine the maximum workload an employee can handle based on all of the input information, thereby avoiding absenteeism. Moreover, we conclude that there is a positive association between workload and absenteeism, which is consistent with the experimental results of Dula et al.
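The what-if use of the tool described above can be sketched as a simple sweep over workload values, together with a minimal form layout. Everything in this sketch is an illustrative assumption: `toy_predict` is a hypothetical stand-in rule and not the trained SVM, the field names are placeholders, and `build_app` receives the Streamlit module as an argument so that the logic can be exercised without a running web server.

```python
def toy_predict(features):
    """Hypothetical stand-in for the trained classifier (NOT the paper's SVM)."""
    workload = features["workload_avg_per_day"]
    if workload < 72:
        return "A+"
    return "C+" if workload >= 73 else "B+"


def max_safe_workload(predict, features, low, high, target="A+"):
    """Scan integer workloads upward from low and return the largest value
    still predicted as target, holding all other attributes constant.
    Returns None if the prediction at low is already a different class."""
    best = None
    for workload in range(low, high + 1):
        trial = dict(features, workload_avg_per_day=workload)
        if predict(trial) != target:
            break
        best = workload
    return best


def build_app(st, predict):
    """Render a minimal input form; st is expected to be the streamlit module."""
    st.title("Absenteeism class prediction")
    workload = st.number_input("Workload average/day", min_value=0, max_value=500, value=70)
    age = st.number_input("Age", min_value=18, max_value=70, value=35)
    if st.button("Predict"):
        st.write(predict({"workload_avg_per_day": workload, "age": age}))


employee = {"workload_avg_per_day": 70, "age": 35}
threshold = max_safe_workload(toy_predict, employee, low=60, high=80)
```

In a real deployment, a script would `import streamlit as st`, load the fitted model, and call `build_app(st, model_predict)`; the sweep function could then be exposed in the same form to report the highest workload that keeps a given employee in the A+ class.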

Conclusions
The cost of absenteeism is in the billions of dollars, and this does not include the time spent managing employees who are not performing well. Thus, absenteeism has a highly negative effect on a company's organizational structure. Companies differ depending on their working environment, type of work, geographical location, etc. Therefore, in order to effectively handle employee absenteeism, which adversely affects a company's financial stability, the company needs to identify the cause of the problem. Consequently, companies spend large amounts of money hiring additional management staff to identify causes and find suitable solutions.
To avoid negative impacts on companies related to employee absenteeism, machine learning algorithms can determine the potential class of employees in terms of absenteeism. We have utilized current studies on absenteeism and followed the guidelines of CAM theory in order to address the issue of absenteeism. Identifying absenteeism classes with a greater degree of accuracy is essential. We applied a hybrid optimization approach to identify a set of attributes that are significantly associated with absenteeism. Subsequently, a model was developed to predict absenteeism classes with a greater degree of accuracy. Furthermore, using machine learning algorithms can be quite challenging, as they often require a high level of knowledge, a significant amount of time, and strong programming abilities. Human resource managers may not have enough experience with machine learning algorithms, or may not have the time to develop such an algorithm. To address these issues, we have proposed a web-based interactive tool. As a proof-of-concept of the proposed tool, we developed a prototype using the Python Streamlit framework with an integrated SVM model (which was our best-performing model overall, based on experimental study). This proposed web-based tool serves as a link between machine learning algorithms and human resource managers. The proposed web-based interactive tool is very useful for human resource or project managers, allowing them to predict employee absences, determine how many will be absent, and adjust plans in advance. Additionally, managers can identify a possible threshold value of workload or hit a target workload for particular employees in order to prevent potential absenteeism. End users do not need to have any previous knowledge about how machine learning algorithms work in order to use our proposed tool. They can determine the absenteeism class by inputting the relevant information into the tool and clicking the predict button. 
Thus, human resource managers can utilize this simple tool to save time, decrease workloads, and make better decisions, preventing companies from experiencing adverse financial consequences.