A Hybrid Predictive Model for Employee Turnover: Integrating Ensemble Learning and Feature-Driven Insights from IBM HR Analytics

Alyousef, Muna I.; Khan, Hamza Wazir; Sattar, Mian Usman

doi:10.3390/info17020208


by Muna I. Alyousef ¹, Hamza Wazir Khan ²,* and Mian Usman Sattar ³

¹ Department of Management Information System, College of Business Administration, University of Hail, Ha’il 81451, Saudi Arabia
² Department of Business Studies, Namal University, Mianwali 42250, Punjab, Pakistan
³ Department of Computing, College of Science and Engineering, University of Derby, Kedleston Road, Derby DE22 1GB, UK
* Author to whom correspondence should be addressed.

Information 2026, 17(2), 208; https://doi.org/10.3390/info17020208

Submission received: 28 December 2025 / Revised: 13 February 2026 / Accepted: 14 February 2026 / Published: 17 February 2026

(This article belongs to the Special Issue Machine Learning Approaches for Prediction and Decision Making)


Abstract

Employee turnover presents a significant challenge to modern organizations, often resulting in operational disruptions, substantial hiring costs, and a loss of institutional knowledge. While traditional human resource practices have historically been reactive, the emergence of machine learning has introduced a proactive capability to anticipate and mitigate attrition before it occurs. This research utilizes the IBM HR Analytics dataset, which contains 1470 employee records and 35 distinct features, to develop a hybrid machine learning model designed to enhance the accuracy of turnover predictions. To ensure the model’s effectiveness, the researchers employed a comprehensive preprocessing phase that included eliminating non-informative features, applying label encoding to categorical data, and using StandardScaler to normalize quantitative values. A critical component of the study addressed the common issue of class imbalance within HR data. To resolve this, a hybrid sampling strategy was implemented, combining Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) to create a more balanced learning environment for the algorithms. The core of the predictive engine is a soft voting ensemble that integrates three powerful algorithms: Random Forest, XGBoost, and logistic regression. Evaluated on an 80/20 train–test split, the tuned XGBoost model achieved an impressive 84% accuracy and an Area Under the Curve (AUC) of 0.80. Meanwhile, the logistic regression component contributed the highest F1-score, reinforcing the overall strength and balance of the ensemble approach. These metrics confirm that the hybrid model is both robust and reliable for identifying at-risk employees. Beyond simple prediction, the study prioritized interpretability by using SHapley Additive exPlanations (SHAP) to identify the primary drivers of attrition. 
The analysis revealed that the most significant variables influencing an employee’s decision to leave include the interaction between job level and experience, frequent overtime, monthly income, current job level, and total years spent at the company. By providing these data-driven insights, the model empowers HR teams to transition from reactive troubleshooting to proactive retention planning, ultimately securing the organization’s talent and stability.

Keywords:

Random Forest; XGBoost; predicting employee attrition; explainable AI (XAI); SMOTE; ADASYN; SHAP; hybrid model; Feature-Driven Insights

1. Introduction

In today's hyper-competitive business environment, human capital is not merely a resource but a strategic asset. Nonetheless, employee attrition remains a persistent problem for organizations across sectors: it disrupts operational continuity, drives up recruitment costs, and erodes institutional memory [1]. According to the Society for Human Resource Management, the average cost-per-hire exceeds $4000, which makes turnover very costly for an organization [2]. Whether voluntary or involuntary, attrition often signals deeper organizational problems, such as workplace dissatisfaction and poor alignment. Conventional HR measures tend to be reactive, i.e., they respond only after attrition has occurred. The advent of predictive analytics and machine learning (ML), however, marks a paradigm shift: organizations can now anticipate attrition risks and act on them in advance [3].

Over the past few years, machine learning has become a game changer in human resource management (HRM). Embedded in HR processes such as recruitment, performance appraisal, and retention, it has reshaped how organizations understand and use workforce information to make strategic business decisions [4]. ML algorithms can reveal hidden patterns in employee behaviour, predict attrition risk accurately, and identify high-risk individuals with high precision. Gradient Boosting and Random Forest models have both been reported to achieve predictive accuracy above 90% on structured HR datasets [5]. Further, Explainable AI (XAI) methods such as SHAP and LIME have mitigated a key transparency issue: HR professionals can see not only what a model predicts but also how and why it arrives at that prediction [6].

Explainable Artificial Intelligence (XAI) has gained significant attention as it addresses one of the key limitations of modern machine learning models: their lack of interpretability. Techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) provide a systematic way to understand how models arrive at specific predictions [7]. SHAP leverages principles from cooperative game theory to assign feature importance values that are both consistent and additive, offering global and local interpretability. LIME, on the other hand, approximates complex models with locally interpretable linear models, thereby allowing researchers and practitioners to observe how small perturbations in input variables affect outcomes [8]. By enhancing transparency, these methods not only improve trust and accountability in AI-driven decision-making but also support compliance with ethical and regulatory requirements. Their application is particularly valuable in domains such as healthcare, finance, and human resource management, where explainability is critical for justifying automated recommendations [9,10].

Machine learning has not only made HR operations more efficient but has also added a predictive element to workforce management. ML models can use historical and real-time data to predict employee turnover, identify engagement gaps, and support strategic retention where needed [11]. This enables HR professionals to be proactive rather than reactive in their decisions. In addition, combining several algorithms, such as logistic regression, Random Forest, and XGBoost, can make predictions more robust. Combined with feature-based analysis and explainable AI techniques, such models offer not only high accuracy but also interpretability, which is an invaluable addition to modern HR analytics [12].

Neural networks (NNs) are an important class of machine learning models. Their ability to learn nonlinear relationships and adapt to high-dimensional data makes them especially useful for modelling the complex patterns found in employee behaviour data. Empirical evidence indicates that optimized neural architectures can outperform traditional models in predicting turnover, especially when combined with data augmentation and clustering approaches [13].

Moreover, the adoption of machine learning in human resources has triggered a shift from descriptive analytics to predictive and prescriptive decision-making. By combining ensemble approaches with explainable frameworks, organizations can not only predict employee attrition more accurately but also explain the forces behind it [14]. This two-fold ability, prediction and interpretation, enables HR professionals to implement specific interventions, e.g., optimizing workload distribution or career advancement trajectories. Machine learning is therefore more than a technical tool; it is a strategic resource that fosters workforce stability and organizational resilience [15].

Although the literature on ML applications in HR has grown, few studies combine a hybrid modelling approach with feature-based interpretability. Existing work either concentrates on single-model designs or does not examine the interactions between the attributes that drive attrition [16]. This paper addresses that gap using the IBM HR Analytics dataset, which comprises 1470 employee records and 35 features, to build and compare hybrid ML models for attrition prediction. The study aims not only to improve predictive performance but also to provide interpretable insights into the factors that actually drive employee turnover. The methodology balances technical rigour with practical applicability, giving HR practitioners a strong foundation for data-driven talent management.

The central aim of this study is to develop and evaluate an interpretable hybrid machine learning framework for employee attrition prediction—one that intentionally balances predictive performance, transparency, and managerial relevance. Rather than pursuing algorithmic novelty, the contribution lies in integrating ensemble learning, class imbalance treatment, interaction-oriented feature engineering, and explainable AI into a coherent analytical pipeline specifically tailored to human resource decision-making contexts.

This study is guided by the following research questions:

Can a hybrid ensemble framework combining linear, tree-based, and boosting algorithms enhance attrition prediction performance under severe class imbalance?
Do engineered interaction features—specifically those capturing hierarchical–experiential alignment—improve both predictive accuracy and interpretability?
Can explainable AI methods such as SHAP meaningfully translate machine learning outputs into actionable insights for HR practitioners?
Do simpler and more interpretable models, such as logistic regression, perform competitively with more complex models in structured HR datasets?

The paper is organized into seven sections. The Introduction outlines the importance of the employee attrition problem and defines the role of machine learning in human resource analytics. The Literature Review summarizes previous studies on predictive modelling and hybrid methodologies. The Methodology describes the data preprocessing, model framework, and evaluation measures. The Results and Analysis section presents the empirical performance of the models and the interpretability findings. The Discussion places the findings in the context of broader HR strategy. The Conclusion summarizes the main contributions, and Future Work outlines directions for ethical and technical advancement in AI-enhanced HR practices.

2. Literature Review

2.1. Evolution of Machine Learning in Human Resource Analytics

Incorporating machine learning (ML) into human resource management (HRM) has transformed traditional practice into data-driven decision-making. Originally, HR analytics relied on statistical review and manual evaluation, which were often limited by bias and poor scalability [17]. The emergence of ML has allowed organizations to automate more complex tasks, such as the screening of job summaries, performance forecasting, and attrition prediction, more accurately and faster [18]. Machine learning models such as Random Forest, Support Vector Machines, and Gradient Boosting have proved effective in HR tasks such as detecting high-risk personnel and identifying the best retention policies [19,20].

2.2. Predictive Modelling for Employee Attrition

Employee turnover presents a significant challenge for organizations, impacting productivity, employee morale, and financial performance. Recent studies have demonstrated that machine learning (ML)-based predictive systems can forecast attrition with remarkable accuracy, thereby enabling human resource departments to take proactive measures to mitigate it [21]. For instance, Chung et al. (2023) developed a predictive algorithm utilizing six ML models and achieved an accuracy rate of 98% using gradient boosting on the IBM HR Analytics dataset [22]. Their research highlights the importance of evaluating model robustness through a combination of performance metrics, including precision, recall, and F1-score. Moreover, the integration of ensemble techniques and hyperparameter tuning has further enhanced predictive outcomes, reinforcing the effectiveness of ML approaches in managing employee attrition [23].

“Predicting employee attrition and explaining its determinants”: This study highlights SHAP’s sensitivity to background data selection in explaining feature contributions and notes the importance of domain knowledge in making explanations actionable for HR professionals [24]. A related study compares SHAP and LIME in employment contexts, confirming SHAP’s global consistency and sensitivity to background data, while LIME provides more volatile, localized interpretations [25]. It emphasizes the need for technically sound, HR-aligned explanations to ensure meaningful transparency in decision support systems [26].

“Machine Learning Model for Human Resource Placement in Organizations”: This research employs both SMOTE and ADASYN for managing class imbalance and evaluates model performance using advanced metrics and interpretable machine learning [27].

Similarly, Raza et al. (2022) developed an optimized Extra Trees Classifier (ETC) that reached 93% accuracy in predicting employee attrition. They stressed the importance of exploratory data analysis, which allowed them to identify the major drivers of attrition, including monthly income, job level, and age [2]. Their results support the idea that predictive modelling must rest not only on the algorithm but also on contextual awareness of organizational dynamics. Together, these studies indicate that ML can not only flag attrition risks but also inform strategic HR interventions that are both timely and focused [21].

2.3. Hybrid Models and Feature-Driven Insights

While single machine learning models have demonstrated success, hybrid ML approaches, such as stacking and blending, which integrate multiple algorithms, have shown superior performance in terms of predictive accuracy and generalizability [28]. These models capitalize on the strengths of individual classifiers to reduce overfitting and enhance accuracy across diverse datasets. In a comparative study focused on predicting employee performance and attrition, hybrid models combining Support Vector Machines (SVMs) and XGBoost outperformed traditional classifiers [29]. Additionally, the incorporation of feature-based interpretability techniques, such as SHAP and LIME, has enabled HR professionals to gain meaningful insights into model decisions, uncovering patterns related to factors like overtime, job level, and total years of service [30]. This interpretability bridges the gap between technical model outputs and actionable HR strategies, fostering more informed and strategic decision-making [31].

“Developing a hybrid machine learning model for employee turnover prediction integrating Genetic Algorithms and LightGBM”: This study demonstrates the advantage of hybrid models, improving predictive performance with ensemble strategies and feature selection [32].

The emergence of Explainable AI (XAI) has further elevated the relevance of hybrid models within HR analytics. Diaz et al. (2023) demonstrated that XAI not only enhances predictive capabilities but also provides interpretability by revealing the rationale behind each prediction, thereby enabling organizations to anticipate future attrition and understand the underlying causes [33]. Their findings identified key factors such as work–life balance, compensation expectations, and perceived career advancement as significant contributors to attrition decisions. By leveraging hybrid modelling in conjunction with XAI, HR departments can move beyond opaque, black-box predictions and develop nuanced, employee-centric retention strategies that align with organizational goals and ethical standards [34]. “Featuring Machine Learning Models to Evaluate Employee Attrition”: This publication outlines the necessity of F1-score, AUC, and ROC metrics in imbalanced datasets and discusses how ensemble models outperform single classifiers in accuracy and robustness [35].

Recent studies emphasize that leadership competencies are essential to the successful deployment of AI technologies in organizations. A scoping review identifies 15 fundamental skills that leaders need in order to implement AI successfully and engage staff [36]. These results underline the importance of matching technological innovation with human-centric approaches to leadership, which complement our hybrid modelling approach by addressing the readiness and behavioural aspects of AI adoption in an organization [37].

2.4. Ethical Considerations and Future Directions

Despite these technical advances, the use of ML in HRM is not free of difficulties. Privacy concerns, bias, and the lack of transparency associated with algorithmic technology highlight the need for ethical AI tools. Cavescu and Popescu (2025) draw attention to the importance of both explainability and fairness in AI-driven HR systems and encourage multi-disciplinary methodologies that integrate organizational psychology with predictive analytics [16].

3. Methodology

3.1. Data Preprocessing and Feature Engineering

The IBM HR Analytics dataset contains 1470 employee records and 35 attributes covering demographics, job role, satisfaction levels, and performance indicators. During preprocessing, four non-informative columns were removed: EmployeeCount, EmployeeNumber, Over18, and StandardHours. These columns either hold constant values or act as identifiers and therefore contribute nothing to predictive modelling.

Categorical features, such as Gender, Department, and JobRole, were transformed using label encoding, which maps each text label to an integer so that the algorithms can process it. To enhance the predictive power of the models, three domain-specific features were engineered.

YearsAtCompany_Ratio: The ratio of YearsAtCompany to TotalWorkingYears, indicating the proportion of an employee’s total career spent with the current company.
Income_Per_Year: Calculated as MonthlyIncome divided by TotalWorkingYears, representing the monthly income earned per year of overall experience.
JobLevel_Experience_Interaction: An interaction term created by multiplying JobLevel and TotalWorkingYears to capture the combined effect of seniority and total experience.
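The three engineered features can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the function name and the sample record are hypothetical, and the guard for employees with zero total working years is an assumption, since the text does not say how that case was handled.

```python
def add_engineered_features(record):
    """Return a copy of one employee record with the three engineered features."""
    out = dict(record)
    total_years = record["TotalWorkingYears"]
    # Assumption: avoid division by zero for employees with no prior experience.
    denom = total_years if total_years > 0 else 1
    out["YearsAtCompany_Ratio"] = record["YearsAtCompany"] / denom
    out["Income_Per_Year"] = record["MonthlyIncome"] / denom
    out["JobLevel_Experience_Interaction"] = record["JobLevel"] * total_years
    return out


# Hypothetical employee record, for illustration only.
employee = {
    "YearsAtCompany": 5,
    "TotalWorkingYears": 10,
    "MonthlyIncome": 6000,
    "JobLevel": 2,
}
features = add_engineered_features(employee)
```

For this record, half of the career was spent at the current company, the income per year of experience is 600, and the interaction term is 20.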

The choice of these specific interaction and ratio features was prioritized over standard transformations such as binning or logarithmic scaling. While binning can simplify data, it often results in a loss of granular information regarding tenure and seniority. Conversely, the JobLevel_Experience_Interaction term was specifically selected to capture the non-linear synergy between authority and tenure, which is theoretically more aligned with organizational behaviour models than simple monotonic transformations.

The data preprocessing and feature engineering steps are depicted in Figure 1. All numerical variables were standardized using StandardScaler, which transforms each feature to have a mean of zero and a standard deviation of one. Scaling ensures that a feature with a larger range does not dominate the learning process, and it improves model convergence and performance.
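Label encoding and standardization can be illustrated with the standard library alone. The sketch below mirrors what scikit-learn's LabelEncoder (classes in sorted order) and StandardScaler (population standard deviation) compute, but the helper names and sample values are hypothetical, and mapping a constant column to zeros is an assumption.

```python
from statistics import fmean, pstdev


def label_encode(values):
    """Map each distinct label to an integer, with labels taken in sorted order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standardize(values):
    """Z-score scaling using the population standard deviation (ddof = 0)."""
    mean = fmean(values)
    std = pstdev(values)
    # Assumption: a constant column is mapped to zeros instead of dividing by zero.
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical sample values, for illustration only.
departments = ["Sales", "Research & Development", "Sales", "Human Resources"]
codes, mapping = label_encode(departments)

incomes = [2000.0, 4000.0, 6000.0, 8000.0]
scaled = standardize(incomes)
```

Label encoding imposes an arbitrary order on the categories, which tree-based models tolerate well but which a linear model such as logistic regression reads as an ordinal scale.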

In addition, missing values were addressed through a systematic imputation strategy to maintain data integrity. Numerical features with missing entries were imputed using the median, which is robust to outliers, while categorical features were imputed using the mode. This ensured that the sample size of 1470 records was preserved without introducing significant bias into the predictive models. Feature selection procedures were then applied to retain only the most salient predictors for attrition modelling. These steps not only reduced dimensionality but also improved computational performance. The final filtered data provided a strong basis for training high-performing machine learning models.

To mitigate the severe class imbalance observed in the dataset, a hybrid sampling approach was implemented combining minority class oversampling with majority class undersampling. This phase critically evaluated two distinct oversampling techniques: SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling).

SMOTE generates synthetic samples by interpolating between minority class instances, which helps in creating a more generalized decision boundary. ADASYN, on the other hand, focuses on generating synthetic samples for minority instances that are harder to classify, particularly those near the decision boundary, thereby improving model sensitivity to difficult cases.
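The interpolation step at the heart of SMOTE can be sketched as follows. This is a simplified illustration, not the imbalanced-learn implementation: the function name, the toy minority points, the value of k, and the random seed are all assumptions. ADASYN uses the same interpolation but decides how many synthetic points to generate per minority instance from the share of majority-class neighbours around it, which this sketch does not implement.

```python
import math
import random


def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its neighbours."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # The k nearest minority-class neighbours of the chosen sample (excluding itself).
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    # x_new = x_base + lam * (x_neighbour - x_base)
    return [b + lam * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical two-feature minority-class points, for illustration only.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
synthetic = smote_like_sample(minority, k=2, rng=random.Random(42))
```

Because the new point lies on the segment between two existing minority samples, it always stays inside the region already occupied by the minority class.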

While these resampling techniques (SMOTE and ADASYN) establish a balanced data environment, the predictive performance remains dependent on the choice of algorithms. Therefore, a hybrid ensemble was designed to leverage the distinct advantages of linear, tree-based, and boosting methods.

3.2. Model Selection and Hybrid Ensemble Design

To build a robust turnover prediction model, a hybrid approach was chosen that combines three independent classifiers: Random Forest, XGBoost, and logistic regression (LR). Each model contributes its own strengths to the ensemble. Random Forest handles complex feature interactions and noisy data well. XGBoost, an efficient gradient boosting algorithm, performs strongly because it combines sequential learning with regularization. The models were combined through a soft voting scheme, in which the predicted probabilities of the three classifiers are averaged to produce the final prediction. Details are depicted in Figure 2.

Averaging these probabilistic outputs yields a more nuanced and calibrated final prediction. Because each model contributes its predicted probability rather than a hard label, a confident model influences the average more than an uncertain one. The result is a consensus-based prediction that generalizes better, dampens the biases of individual models, and reduces the risk of overfitting.
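A minimal sketch of soft voting, contrasted with hard (majority) voting, is given below. The three probabilities are hypothetical values standing in for the outputs of Random Forest, XGBoost, and logistic regression, and the 0.5 decision threshold is an assumption.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the predicted attrition probabilities and threshold the mean."""
    avg = sum(probabilities) / len(probabilities)
    return avg, int(avg >= threshold)


def hard_vote(probabilities, threshold=0.5):
    """Majority vote over the individual hard labels, shown for comparison."""
    votes = sum(p >= threshold for p in probabilities)
    return int(votes > len(probabilities) / 2)


# Hypothetical predicted probabilities of attrition from three classifiers.
model_probs = [0.40, 0.45, 0.80]
avg_prob, soft_label = soft_vote(model_probs)
hard_label = hard_vote(model_probs)
```

Here the averaged probability is 0.55, so soft voting predicts attrition, while hard voting predicts no attrition because two of the three models fall below 0.5. The single confident model tips the soft vote, which is the confidence weighting described above.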

Random Forest, XGBoost, and logistic regression were selected because of their complementary strengths in classification problems. Random Forest captures feature interactions and tolerates noisy data. XGBoost achieves high accuracy through boosting and regularization.

Although logistic regression (LR) demonstrated slightly lower performance metrics compared to XGBoost and Random Forest (RF), its inclusion in the ensemble was deliberate. LR, being a linear and interpretable model, contributes robustness and generalizability to the ensemble. It is less susceptible to overfitting and can capture linear relationships that may be overlooked by more complex models. The diversity in model types, linear (LR), tree-based (Random Forest), and boosting (XGBoost), enhances the ensemble’s ability to generalize across different data patterns and reduces the risk of bias from any single model. The LR and RF models were trained on the SMOTE Hybrid resampled data. The XGBoost model was trained and separately tuned using both the SMOTE Hybrid and ADASYN Hybrid data. Resampling techniques (SMOTE and ADASYN) were applied strictly to the training data to avoid data leakage and ensure valid generalization. This allocation reflects differing algorithmic behaviours with respect to synthetic minority generation.

3.3. Mathematical Models

The mathematical models used in the ensemble are given below.

H_{B}(x) = \frac{1}{B} \sum_{b=1}^{B} h(x; \theta_{b})

(1)

Random Forest predicts by averaging (for regression) or majority voting (for classification) over $B$ decision trees, where each tree $h(x; \theta_{b})$ is trained on a random bootstrap sample and a random subset of features, as shown in Equation (1) [38]. This ensemble reduces variance and improves generalization compared to a single decision tree.

\hat{y}_{i} = \sum_{k=1}^{K} f_{k}(x_{i}), \quad f_{k} \in \mathcal{F}

(2)

XGBoost predicts by adding the outputs of $K$ regression trees, where each tree $f_{k}$ belongs to the function space $\mathcal{F}$ of decision trees, as shown in Equation (2) [39]. Trees are added sequentially, and each new tree is trained to minimize a regularized objective that combines training loss (e.g., squared error or logistic loss) with model complexity to enhance accuracy and prevent overfitting.
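For reference, the regularized objective mentioned above can be written out in its standard form from the XGBoost literature. The symbols $l$, $\Omega$, $T$, $w$, $\gamma$, and $\lambda$ do not appear elsewhere in this paper and are introduced here only for illustration: $l$ is the training loss, $T$ is the number of leaves in a tree, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization parameters.

```latex
\mathcal{L} = \sum_{i} l(y_{i}, \hat{y}_{i}) + \sum_{k=1}^{K} \Omega(f_{k}),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}
```

The first term rewards fitting the training data, while the second penalizes trees with many leaves or large leaf weights.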

P(y = 1 \mid x) = \sigma(\beta_{0} + \beta^{T} x) = \frac{1}{1 + e^{-(\beta_{0} + \beta^{T} x)}}

(3)

Logistic regression models the probability of a binary outcome using the sigmoid function applied to a linear combination of input features. The coefficients $\beta$ are estimated to maximize the likelihood, making the model interpretable and effective for classification tasks, as shown in Equation (3) [40].
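Equation (3) translates directly into code. The coefficients and feature values below are hypothetical and were chosen so that the linear term is zero, which places the employee exactly on the decision boundary.

```python
import math


def predict_proba(x, beta0, beta):
    """P(y = 1 | x) from Equation (3): sigmoid of the linear combination."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept, coefficients, and standardized feature values.
beta0 = -1.0
beta = [0.8, -0.5]
x = [1.5, 0.4]
p_attrition = predict_proba(x, beta0, beta)  # linear term is about 0, so p is about 0.5
```

A positive coefficient raises the predicted probability of attrition as its feature increases, and a negative one lowers it, which is what makes the model straightforward to interpret.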

3.4. Hyperparameter Model

In this study, logistic regression and Random Forest were treated as baseline models to establish a performance floor. The logistic regression model was initialized with the widely used ‘liblinear’ solver, which is efficient for binary classification and small datasets. The Random Forest model was configured with n_estimators = 100, a standard default that balances performance and computational cost. These configurations align with established practices in the HR analytics literature [2].

In contrast, the XGBoost classifier was systematically optimized using Grid Search Cross-Validation (GridSearchCV), because XGBoost is sensitive to its hyperparameters and was expected to be the top-performing model. The optimization objective was to maximize the F1-score, which balances precision against the capture of true positives and is therefore better suited than accuracy to the class-imbalanced turnover data. The GridSearchCV evaluated a total of 20 distinct parameter combinations across the following grid: n_estimators at [100, 200], max_depth at [3, 5] to control tree complexity, and learning_rate at [0.05, 0.1] to manage the shrinkage applied to each boosting step. The best combination from this grid was selected as the final, tuned XGBoost model.
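The candidate configurations that an exhaustive grid search visits can be enumerated with the standard library. The sketch below uses only the three parameters and values listed above; the variable names are hypothetical. Enumerated exactly as listed, the grid yields 2 × 2 × 2 = 8 combinations, so the reported total of 20 may reflect values or parameters not listed here, which this sketch does not attempt to reconstruct.

```python
from itertools import product

# Parameter grid exactly as listed in the text.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}

# Every combination an exhaustive grid search would evaluate.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_candidates = len(candidates)
```

In a cross-validated search, each candidate is fitted once per fold, and the candidate with the best mean F1-score across folds is retained.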

3.5. Evaluation Metrics

To measure the performance of the hybrid ensemble, the dataset was divided into training and testing sets using a stratified 80/20 split, which preserves the class distribution of attrition. Accuracy, precision, recall, and F1-score were used to evaluate the model, with particular attention to class imbalance. Details of the proposed methodology are depicted in Figure 3. Together, these metrics give a balanced picture of overall accuracy and of the model's sensitivity in identifying attrition cases. The results and their interpretation are described in detail in the next section.
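A split that preserves the class distribution can be sketched by sampling the test fraction separately within each class. The function name, the toy label vector, and the seed are assumptions; in practice a library routine such as scikit-learn's train_test_split with its stratify argument serves the same purpose.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that each class keeps the same share in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 20 leavers (1) and 80 stayers (0).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
```

With these toy labels, the test set holds 20 records, 4 of them leavers, so the 20% attrition rate is the same in both partitions.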

3.6. Model Interpretability

To support transparency and the ethical use of AI, model interpretability was addressed using SHAP (SHapley Additive exPlanations), a widely used approach for explaining machine learning predictions. Although SHAP could not be applied directly to the hybrid ensemble because of its non-callable structure, it can be applied to the individual models, such as XGBoost and Random Forest, to shed light on feature importance. These analyses indicate which variables have the greatest impact on the predicted likelihood of attrition, including overtime status, job level, and years at the company. With knowledge of these drivers, HR professionals can make better-informed decisions and design retention measures accordingly.

Although SHAP offers theoretically grounded explanations, it also presents limitations such as computational overhead and interpretability challenges for non-technical users. To address computational complexity, we rely on the optimized Tree SHAP implementation for tree-based models and use sampling-based approximations to reduce runtime without compromising fidelity. For interpretability, SHAP outputs are complemented with simplified textual summaries, annotated dashboards, and global-to-local explanation workflows to make insights accessible to HR practitioners. These strategies ensure that SHAP remains both computationally feasible and managerially meaningful in real-world HR analytics environments.
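The game-theoretic idea behind SHAP can be illustrated by computing exact Shapley values for a tiny model. This sketch is not the Tree SHAP algorithm used in the study: it enumerates every feature subset, which is feasible only for a handful of features, and it replaces absent features with a fixed baseline value, which is a simplifying assumption. The toy scoring function and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values, with absent features replaced by baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight of a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical risk score: an overtime effect plus a level-by-years interaction."""
    overtime, job_level, years = z
    return 2.0 * overtime + 0.5 * job_level * years


x = [1, 2, 3]         # hypothetical employee: overtime = 1, job level 2, 3 years
baseline = [0, 0, 0]  # hypothetical reference point
phi = shapley_values(toy_score, x, baseline)
```

The contributions sum to the difference between the score at x and the score at the baseline, which is the additivity property noted earlier. In this toy case, the interaction effect is split equally between job level and years.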

4. Analysis and Results

This section presents the modelling results, identifies central trends in employee attrition, and evaluates the proposed predictive framework. A set of visualizations, covering the attrition distribution, feature correlations, and the distributions of income and job position, yields insights that inform both data preprocessing and model tuning. The performance of the trained hybrid ensemble is measured with standard classification metrics, and its interpretability is discussed through SHAP analysis.

4.1. Model Performance Comparison

The overall performance of the models was evaluated using common classification metrics: accuracy, precision, recall, and F1-score. Table 1 and Figure 4 indicate that the XGBoost Hybrid models show the highest overall accuracy, while the logistic regression model demonstrates a stronger F1-score and the highest recall, suggesting a better balance for identifying true positive attrition cases.

Table 1 compares the performance of logistic regression, Random Forest, and Tuned XGBoost models using accuracy, precision, recall, and F1-score, which is also shown in Figure 4. Logistic regression achieved the highest F1-score (0.5042), indicating a better balance between precision and recall for predicting attrition. Random Forest and XGBoost showed competitive accuracy, but their lower recall and F1-scores highlight the hybrid model’s advantage in handling class imbalance.

4.1.1. Accuracy

All models achieved high accuracy, with Random Forest (0.8367) and XGBoost with the SMOTE hybrid (0.8435) both at approximately 84 percent. However, because the attrition dataset is class-imbalanced (employees who stay far outnumber those who leave), accuracy cannot be interpreted on its own and must be complemented by other metrics.
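To make the caveat about accuracy concrete, the short sketch below computes the accuracy of a trivial rule that predicts "stays" for every employee. It uses the test-set counts reported with the confusion matrix in Section 4.2 and assumes that all models in Table 1 were scored on that same test split; the variable names are illustrative only.

```python
# Test-set counts reported for the logistic regression confusion matrix (Section 4.2).
tn, fp, fn, tp = 205, 42, 17, 30

stayed = tn + fp   # employees who did not leave
left = fn + tp     # employees who left
total = stayed + left

# Accuracy of a trivial rule that predicts "stays" for everyone.
baseline_accuracy = stayed / total

print(f"test set size: {total}")
print(f"attrition share: {left / total:.4f}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Under that assumption, the baseline is about 0.84, which is in the same range as the accuracies in Table 1 even though such a rule identifies no leavers at all. This is why recall and F1-score carry more information here.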

4.1.2. Precision and Recall

The F1-score comparison in Figure 5 shows that the logistic regression model performs best, followed by the Tuned XGBoost model. The logistic regression model achieved the highest F1-score, which implies that it offers a more even trade-off between precision and recall on this dataset.

4.2. XGBoost Model Specifics

To provide a more granular view of the best-performing model (based on a balance of metrics), the logistic regression model’s performance was further analysed through a confusion matrix and an ROC curve.

Confusion Matrix

The confusion matrix in Figure 6 provides a detailed breakdown of the logistic regression model’s predictive performance on the test set.

True Negatives (205): The model correctly predicted that 205 employees would not leave.
True Positives (30): The model correctly identified 30 employees who did quit.
False Positives (42): The model incorrectly predicted that 42 employees would quit when they did not.
False Negatives (17): The model failed to identify 17 employees who actually left the company.

As the matrix shows, the model is strong at classifying non-attrition cases (high True Negatives) but leaves room for improvement on the positive attrition cases: reducing the 17 False Negatives would raise recall, and reducing the 42 False Positives would raise precision.
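The four counts above determine the metrics reported for logistic regression in Table 1. The sketch below recomputes them from the confusion matrix; the function name `classification_metrics` is introduced here for illustration and is not part of the study's code.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the logistic regression confusion matrix (Figure 6).
metrics = classification_metrics(tn=205, fp=42, fn=17, tp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, these values agree with the logistic regression row of Table 1 (0.7993, 0.4167, 0.6383, 0.5042).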

4.3. Receiver Operating Characteristic (ROC) Curve

Figure 7 shows the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds. The Area Under the Curve (AUC) measures the model's ability to distinguish between the two classes.

AUC Score: The Tuned XGBoost model achieved an AUC of 0.80, where AUC = 0.5 corresponds to chance level and AUC = 1.0 to an ideal classifier. An AUC of 0.80 means that, given a randomly chosen employee who quit and a randomly chosen employee who stayed, the model assigns the higher attrition score to the one who quit about 80% of the time. This indicates good, though not perfect, discriminative ability. While this is a solid result, future improvements could aim for higher sensitivity in minority class detection.
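The AUC can be read as a ranking probability: the share of (leaver, stayer) pairs in which the leaver receives the higher score. The sketch below computes it that way on a handful of hypothetical scores, which are toy values chosen for illustration and not outputs of the study's models.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted attrition scores (illustrative only).
leavers = [0.9, 0.7, 0.4]
stayers = [0.8, 0.5, 0.2, 0.1, 0.05]

auc = pairwise_auc(leavers, stayers)
print(f"AUC = {auc:.2f}")
```

In this toy case 12 of the 15 pairs are ordered correctly, giving an AUC of 0.80. The definition does not depend on any single classification threshold, which is why AUC complements threshold-dependent metrics such as recall.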

4.4. Feature Importance Analysis

To understand which factors most significantly influence the model’s predictions, a SHAP (SHapley Additive exPlanations) summary plot was generated. Figure 8 visualizes the impact of each feature on the model’s output.

The plot reveals several key insights:

JobLevel Experience Interaction

This was the most influential feature. Higher values (red) reduce attrition risk, while lower values (blue) increase it. This indicates that the combined effect of job level and experience significantly shapes employee retention.

Total Working Years

Employees with more total experience (red dots on the left) are less likely to leave, showing negative SHAP values.

Over Time

High overtime (red dots) strongly increases attrition likelihood with positive SHAP values.

Monthly Income

Higher income (red) tends to reduce attrition, as seen from red dots concentrated on the left.

This analysis gives a clear picture of why employees leave, which is of considerable value in designing targeted HR retention measures. The JobLevel Experience Interaction feature, developed specifically for this study, captures the balance between an employee’s job level and experience to reflect hierarchical–experiential alignment. The SHAP analysis shows that higher interaction values, where experience matches job level, reduce attrition risk, while mismatches increase it. This interaction offers a deeper and more precise understanding of employee retention patterns than either variable alone, marking a distinct contribution to workforce analytics.
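The sign convention used above (negative SHAP values lower predicted attrition risk, positive values raise it) is easiest to see for a linear model. Assuming independent features, the SHAP value of feature j for a linear model on the log-odds scale is w_j (x_j − E[x_j]). The sketch below applies that formula with hypothetical coefficients and a hypothetical employee; the numbers are placeholders and not the fitted values from this study, and the study does not state which SHAP explainer was applied to the logistic regression model.

```python
def linear_shap(weights, x, background_mean):
    """Per-feature contributions w_j * (x_j - mean_j) on the log-odds scale."""
    return {name: w * (x[name] - background_mean[name]) for name, w in weights.items()}


def log_odds(weights, intercept, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(w * x[name] for name, w in weights.items())


# Hypothetical coefficients for standardized features (not the paper's fitted values).
weights = {"OverTime": 0.9, "MonthlyIncome": -0.6, "TotalWorkingYears": -0.4}
intercept = -1.5
# Assumption: features were standardized, so the background mean is 0 for each.
background_mean = {name: 0.0 for name in weights}

# Hypothetical employee, expressed in standardized units.
employee = {"OverTime": 1.5, "MonthlyIncome": 1.0, "TotalWorkingYears": 2.0}
phi = linear_shap(weights, employee, background_mean)

for name, value in phi.items():
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: SHAP = {value:+.2f} ({direction} predicted attrition risk)")

# Additivity: contributions sum to the gap between this prediction and the average one.
gap = log_odds(weights, intercept, employee) - log_odds(weights, intercept, background_mean)
print(f"sum of SHAP values = {sum(phi.values()):+.2f}, prediction gap = {gap:+.2f}")
```

The additivity check at the end is the property that lets a summary plot be read feature by feature: each dot is one employee's contribution from one feature, and the contributions add up to that employee's deviation from the average prediction.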

5. Discussion

The findings of this study demonstrate the potential of hybrid machine learning for addressing employee attrition, a problem that is both challenging and costly for organizations worldwide. The framework combines three predictive models (Random Forest, XGBoost, and logistic regression) in a soft voting ensemble so that their relative strengths (handling of feature interactions, high-performance gradient boosting, and robustness in high-dimensional spaces) are used together, supported by feature engineering, strategic hybrid resampling, hyperparameter tuning, and a comprehensive model comparison. Among the individual models, logistic regression achieved the highest recall and F1-score. This outcome challenges the expectation that complex ensemble methods, like XGBoost or Random Forest, are universally superior. Instead, the LR model’s success suggests that the data, post-transformation, achieved a high degree of linear separability. The ability of a simple, stable, and computationally efficient linear model to outperform more complex non-linear models on recall and F1-score on this imbalanced dataset (where voluntary turnover cases are a small minority) is a key methodological finding.

In addition to predictive ability, the interpretability of the study, enabled by the SHAP analysis on the selected logistic regression model, solves an essential problem in AI application to human resource management: explainability and confidence. The model must not only identify who is likely to quit; organizations also need to know the reasons for such predictions to facilitate effective, specific interventions.

The SHAP analysis identified JobLevel_Experience_Interaction and YearsAtCompany_Ratio as robust retention levers, OverTime as the most influential factor increasing attrition risk, and MonthlyIncome as the strongest factor decreasing attrition risk.

Our analysis demonstrates that attrition risk is fundamentally a matter of hierarchical–experiential alignment; when an employee’s experience and “authority” (job level) fall out of sync, the likelihood of job-hopping spikes. While a balanced match between the two serves as a powerful retention anchor, misalignments like high experience in a low-level role or low experience in a high-level role create specific friction points that drive exits. This insight shifts HR strategy from broad-brush retention to precision management, allowing for surgical interventions such as targeted promotions for the stagnated or enhanced mentorship for the over-extended and enabling a more proactive, predictive audit of workforce stability.

The proposed hybrid model is computationally feasible for deployment in real-world HR systems. Each component—logistic regression, Random Forest, and XGBoost—is supported by efficient, production-ready libraries and offers fast inference once trained. The soft voting mechanism used for ensemble prediction involves simple averaging of probabilities, which adds minimal overhead. The entire pipeline, including preprocessing and feature engineering, can be integrated into standard HR analytics platforms using modest computational resources, making the approach both scalable and practical for organizational use.
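The soft voting step described above amounts to averaging the predicted attrition probabilities of the component models and thresholding the result. The following is a minimal sketch of that step, assuming equal weights and a 0.5 decision threshold (neither is confirmed here as the study's exact configuration); the probabilities are hypothetical.

```python
def soft_vote(model_probs, threshold=0.5):
    """Average per-model attrition probabilities and threshold the mean.

    model_probs: one list of probabilities per model, aligned by employee.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    averaged = [sum(p[i] for p in model_probs) / n_models for i in range(n_samples)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical attrition probabilities for three employees (illustrative only).
lr_probs = [0.62, 0.20, 0.55]
rf_probs = [0.40, 0.10, 0.35]
xgb_probs = [0.57, 0.15, 0.30]

averaged, labels = soft_vote([lr_probs, rf_probs, xgb_probs])
for avg, label in zip(averaged, labels):
    print(f"mean probability = {avg:.2f} -> predicted attrition = {label}")
```

For the third employee, logistic regression alone would cross the threshold, but the averaged probability does not. This shows how soft voting can temper a single model's prediction at the cost of only one mean per employee.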

These findings corroborate existing theories of organizational behaviour and the results of empirical investigations. Such insights enable HR practitioners to create data-driven policies that directly target the causes of turnover, such as closer monitoring of overtime or investment in career development programs.

Research on employee turnover has consistently demonstrated that misalignment between experience, job role, workload, and rewards is a central predictor of attrition. Early turnover models [41] emphasized perceived job skill mismatch and job satisfaction as precursors to exit decisions. Subsequent work in the 1990s and 2000s highlighted the importance of structural alignment, particularly job level, tenure expectations, and perceived growth opportunities, in shaping turnover intention. More recent empirical studies continue this progression. Ref. [42] found job level, income, and age to be major determinants of turnover. Ref. [43] reinforced the role of experience–job level balance and overtime demands. Ref. [44] further showed that SHAP-derived insights consistently rank job seniority, experience, and work pressure among the strongest explanatory variables. The hierarchical–experiential alignment revealed by our SHAP analysis therefore aligns with and extends this continuous line of research, offering a more fine-grained representation of how experience–authority imbalances (e.g., stagnation or over-extension) evolve into attrition risks. By situating our findings within this chronological body of work, our results reinforce that attrition is not driven by isolated factors but by cumulative, interactional misalignments within the employee’s role, workload, and career trajectory.

This consideration of accuracy and explainability serves as a remedy to a major issue witnessed in HR analytics. Most predictive models often behave like black boxes, which constrains their use in real-life scenarios because they do not provide valuable information on what can be done next. More complex is not always better. This research disrupts the ‘complexity bias’ in machine learning by showing that a simpler logistic regression model actually achieves a stronger balance of precision and recall. The integration of a simple, high-performing model and explainable AI methods in this study provides a pragmatic and ethical approach to decision-making that instils more confidence in HR leaders. Further, it embraces the ambiguities of human behaviour in the organization, since attrition is determined by a combination of interdependent factors rather than individual variables. Crucially, the study goes deeper than just listing important features; it uncovers how factors like job level and experience interact, offering a more sophisticated look at the data than previous research.

For this hybrid analytics approach to be successfully replicated, organizations must ensure high data granularity within their HRIS systems and foster interdisciplinary synergy between data scientists and HR practitioners to maintain feature relevance. While the model is computationally efficient, its implementation may face barriers such as the “small data” problem, where low turnover volume in smaller firms limits training effectiveness, as well as ethical concerns regarding employee surveillance and data privacy. Overcoming these hurdles requires a “Human-in-the-loop” framework that leverages SHAP interpretability to bridge the gap between algorithmic output and executive decision-making, ensuring that the transition from reactive to predictive HR is both transparent and ethically grounded.

6. Conclusions

To sum up, this research shows that hybrid machine learning strategies combined with explainable AI can provide an effective and efficient way to assess the risk of employee attrition with both accuracy and explainability. By using Random Forest, XGBoost, and LR as complementary instruments and by highlighting key drivers of turnover, including JobLevel_Experience_Interaction, income, and job level, the proposed approach allows organizations to move from reactive responses to data-driven, proactive retention initiatives. The combination of predictive performance and interpretability is more than a technical advance in HR analytics: it also addresses perhaps the most important problem of complex algorithms, their opacity. HR leaders must be able to make informed, ethical, and effective decisions that protect what is most valuable to the company: people.

7. Future Work and Limitations

Future studies should aim to improve these hybrid models with more varied and dynamic data, such as employee engagement surveys, social network analysis, and real-time behavioural metrics. Moreover, the ethical implications of AI-supported attrition prediction will need to be addressed, taking into consideration privacy, fairness, bias, and trustworthiness. Continued integration of domain-specific theories and ongoing feedback loops with HR practitioner communities can further sharpen the relevance of the models, leading to more personalized and effective approaches to talent management in a changing workplace environment.

It is recommended to use k-fold cross-validation or multiple randomized splits in future work to assess the robustness and generalizability of the hybrid ensemble. Future work should also consider applying systematic hyperparameter optimization techniques such as randomized search or Bayesian optimization to all models for a more balanced comparison.
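Because attrition cases are a small minority, a k-fold scheme for this task would normally be stratified so that every fold keeps roughly the same class ratio. The sketch below is a dependency-free illustration of that idea; the function `stratified_k_fold` and the toy label vector (sized like the reported test set) are assumptions made for the example, and in practice an established implementation such as scikit-learn's `StratifiedKFold` would be the natural choice.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class ratio in every fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Deal each class's shuffled indices round-robin across the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Toy labels with the same class counts as the reported test set (47 leavers, 247 stayers).
labels = [1] * 47 + [0] * 247

for fold, (train_idx, test_idx) in enumerate(stratified_k_fold(labels, k=5)):
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold}: test size = {len(test_idx)}, attrition cases = {positives}")
```

Any resampling step such as SMOTE or ADASYN would need to be fitted inside each training fold only, so that synthetic minority samples never leak into the held-out fold.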

While this study introduced three engineered features—YearsAtCompany_Ratio, Income_Per_Year, and JobLevel_Experience_Interaction—which were designed to enhance the model’s ability to capture domain-specific patterns, we acknowledge that their individual contributions were not formally quantified through ablation studies. Future research should incorporate systematic ablation experiments by removing each engineered feature independently and observing the impact on model performance metrics such as F1-score and AUC. This would provide empirical evidence of the value added by each feature and strengthen the interpretability and robustness of the feature engineering process.
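The ablation procedure suggested above can be organised as a simple loop: score the full feature set, then re-score with each engineered feature removed and record the drop. The sketch below shows only that control flow. The `evaluate` callback stands in for a full train-and-score routine, and `toy_evaluate` with its `PLACEHOLDER_GAIN` values is a made-up stub for demonstration; it does not represent measured results.

```python
def ablation_report(all_features, engineered, evaluate):
    """Score the full feature set, then re-score with each engineered feature removed."""
    baseline = evaluate(all_features)
    report = {}
    for feature in engineered:
        reduced = [f for f in all_features if f != feature]
        # Positive value: the metric dropped when this feature was removed.
        report[feature] = baseline - evaluate(reduced)
    return baseline, report


# Placeholder scores for demonstration only; NOT measured results from the study.
PLACEHOLDER_GAIN = {
    "YearsAtCompany_Ratio": 0.01,
    "Income_Per_Year": 0.02,
    "JobLevel_Experience_Interaction": 0.03,
}


def toy_evaluate(features):
    """Hypothetical stand-in for 'train the model on these features and return F1'."""
    return 0.40 + sum(PLACEHOLDER_GAIN.get(f, 0.0) for f in features)


engineered = list(PLACEHOLDER_GAIN)
all_features = ["OverTime", "MonthlyIncome"] + engineered

baseline, report = ablation_report(all_features, engineered, toy_evaluate)
print(f"baseline score: {baseline:.2f}")
for feature, drop in report.items():
    print(f"without {feature}: score drops by {drop:.2f}")
```

In a real experiment, `evaluate` would retrain each model on the reduced feature set, ideally under cross-validation, and the same loop could be repeated for AUC.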

The feature engineering applied is based on simple transformations, and the ensemble approach relies on a basic soft-voting method. The interpretability analysis confirms factors already known in HR analytics. In future work, we plan to explore more diverse datasets, advanced feature engineering techniques, and novel ensemble strategies to enhance methodological depth.

Additionally, researchers can examine how data perturbations, such as missing data, noise injection, and adversarial inputs, affect model performance and stability. Evaluating the hybrid ensemble under such adverse conditions would give a better understanding of its robustness and practical feasibility in dynamic human resource settings.

Author Contributions

Conceptualization, M.U.S.; methodology, M.U.S. and M.I.A.; software, H.W.K. and M.I.A.; validation, H.W.K. and M.U.S.; formal analysis, M.U.S.; investigation, H.W.K. and M.I.A.; resources, M.I.A.; data curation, M.U.S. and H.W.K.; writing—original draft preparation, M.U.S.; writing—review and editing, H.W.K. and M.I.A.; visualization, M.I.A. and H.W.K.; supervision, M.U.S.; project administration, M.U.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available in a publicly accessible repository. The original data presented in the study are openly available at this link https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset?resource=download (accessed on 24 June 2025).

Acknowledgments

The authors would like to acknowledge the use of ChatGPT-4 specifically to assist in some content rewriting for improved clarity and effectiveness (24 May 2023 version, OpenAI, San Francisco, CA, USA).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dash, S.; Mishra, S.; Tripathy, S.K. Contextualization of Employee Recruitment and Retention in Technology Start-Ups. In Palgrave Studies in Democracy, Innovation and Entrepreneurship for Growth; Springer: Berlin/Heidelberg, Germany, 2022; pp. 187–213.
2. Raza, A.; Munir, K.; Almutairi, M.; Younas, F.; Fareed, M.M.S. Predicting Employee Attrition Using Machine Learning Approaches. Appl. Sci. 2022, 12, 6424.
3. Hamja, A.; Hasan, M.; Hassan, Z.; Uddin, M.P.; Siddikee, M.J.A. An Explainable Machine Learning-Based Employee Attrition Predictive System. Ann. Data Sci. 2025.
4. Rajagopal, N.K.; Anand, M.; Mohanty, S. Exploring Machine Learning Applications in Human Resources Management: A Comprehensive Review. Stud. Syst. Decis. Control 2024, 569, 303–313.
5. Basnet, S. Artificial Intelligence and Machine Learning in Human Resource Management: Prospect and Future Trends. Int. J. Res. Publ. Rev. 2024, 5, 281–287.
6. Pandey, S.; Pandey, S. Leveraging Artificial Intelligence and Machine Learning for Workforce Optimization: A Holistic Approach to HR Transformation. J. Artif. Intell. Mach. Learn. Data Sci. 2025, 3, 2680–2683.
7. Salih, A.M.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv. Intell. Syst. 2025, 7, 2400304.
8. Aldughayfiq, B.; Ashfaq, F.; Jhanjhi, N.Z.; Humayun, M. Explainable AI for Retinoblastoma Diagnosis: Interpreting Deep Learning Models with LIME and SHAP. Diagnostics 2023, 13, 1932.
9. Arunika, M.; Saranya, S.; Charulekha, S.; Kabilarajan, S.; Kesavan, G. A Survey on Explainable AI Using Machine Learning Algorithms Shap and Lime. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024, Kamand, India, 18–22 June 2024.
10. Moscato, V.; Khan, H.W.; Sattar, M.U.; Noor, S.; Alyousef, M.I. A Personality-Informed Candidate Recommendation Framework for Recruitment Using MBTI Typology. Information 2025, 16, 863.
11. Wandhe, P. The Transformative Role of Artificial Intelligence in HR: Revolutionizing the Future of HR. SSRN Electron. J. 2023.
12. Gagandeep; Verma, J.; Gupta, D. Investigating the Transformative Effects of AI, Machine Learning, and Robotics on Human Capital Analytics—An Empirical Study. Hum. Cap. Anal. 2025, 111–141.
13. López, O.A.M.; López, A.M.; Crossa, J. Fundamentals of Artificial Neural Networks and Deep Learning. In Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer: Berlin/Heidelberg, Germany, 2022; pp. 379–425.
14. Subramanian, Y.R.; R, R. The Transformative Role of Artificial Intelligence in Human Resource. Int. J. Recent Trends Bus. Tour. 2024, 8, 14–25.
15. Arora, M.; Prakash, A.; Mittal, A.; Singh, S. HR Analytics and Artificial Intelligence-Transforming Human Resource Management. In Proceedings of the 2021 International Conference on Decision Aid Sciences and Application, DASA 2021, Virtual, 7–8 December 2021; pp. 288–293.
16. Căvescu, A.M.; Popescu, N. Predictive Analytics in Human Resources Management: Evaluating AIHR’s Role in Talent Retention. AppliedMath 2025, 5, 99.
17. Shafie, M.R.; Khosravi, H.; Farhadpour, S.; Das, S.; Ahmed, I. A cluster-based human resources analytics for predicting employee turnover using optimized Artificial Neural Networks and data augmentation. Decis. Anal. J. 2024, 11, 100461.
18. Bonilla-Chaves, E.F.; Palos-Sánchez, P.R. Exploring the Evolution of Human Resource Analytics: A Bibliometric Study. Behav. Sci. 2023, 13, 244.
19. Adabala, S.K. AI in HR Evolution: Harnessing Machine Learning for Modern Solutions. J. Artif. Intell. Mach. Learn. Data Sci. 2023, 2023, 1702–1707.
20. Ahmad, B.; Sattar, M.U.; Khan, H.W.; Qureshi, Z.; Hasan, R.; Ahmad, A.; Azad, M. Intelligent Digital Twin to make Robot Learn the Assembly process through Deep Learning. Lahore Garrison Univ. Res. J. Comput. Sci. Inf. Technol. 2021, 5, 65–72.
21. Kambhampati, V.K.; Rao, K.B.V.B. Advancing Employee Attrition Models: A Systematic Review of Machine Learning Techniques and Emerging Research Opportunities. In Proceedings of the 8th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), I-SMAC 2024, Kirtipur, Nepal, 3–5 October 2024; pp. 1100–1107.
22. Chung, D.; Yun, J.; Lee, J.; Jeon, Y. Predictive model of employee attrition based on stacking ensemble learning. Expert Syst. Appl. 2023, 215, 119364.
23. Quinteros, D.M. Predictive Modelling of Employee Attrition Using Deep Learning. Acadlore Trans. AI Mach. Learn. 2023, 2, 212–225.
24. Varkiani, S.M.; Pattarin, F.; Fabbri, T.; Fantoni, G. Predicting employee attrition and explaining its determinants. Expert Syst. Appl. 2025, 272, 126575.
25. Shafeeq, S.; Ali, M.; Azam, M.; Hashmi, M.U.; Ullah, M.A.; Ittfaq, A. Hybrid machine learning framework for predicting employee attrition. Cent. Manag. Sci. Res. 2025, 3, 578–589.
26. Chaudhary, M.; Gaur, L.; Chakrabarti, A.; Singh, G.; Jones, P.; Kraus, S. An integrated model to evaluate the transparency in predicting employee churn using explainable artificial intelligence. J. Innov. Knowl. 2025, 10, 100700.
27. Al-Shammari, M.; Ghanem, Y.A. A Systematic Literature Review of Quantitative Models for Predicting Employee Attrition. In Proceedings of the 2024 International Conference on Decision Aid Sciences and Applications (DASA), Manama, Bahrain, 11–12 December 2024.
28. Lee, K.; Lee, J.; Park, E. TFBlender: A hybrid time series attention model with data-driven macroeconomic perspectives for ELS Knock-In prediction. J. Big Data 2025, 12, 173.
29. Oyeniran, M.; Adekunle, J.; Sule, H.; Folorunso, O.; Alagbe, S.; Anifowoshe, T.; Robbert, C.; Ebonyem, B.; Ideh, G.; Oyelakin, S.; et al. Personalized Energy Optimization in Smart Homes Using Adaptive Machine Learning Models: A Feature-Driven Approach. Int. J. Artif. Intell. Sci. 2025, 2, 83–104.
30. Yao, X.; Xu, Z.; Ren, T.; Zeng, X.J. Feature-driven hybrid attention learning for accurate water quality prediction. Expert Syst. Appl. 2025, 276, 127160.
31. Talebi, H.; Bardsiri, A.K.; Bardsiri, V.K. Machine Learning Approaches for Predicting Employee Turnover: A Systematic Review. Eng. Rep. 2025, 7, e70298.
32. Talebi, H.; Bardsiri, A.K.; Bardsiri, V.K. Developing a hybrid machine learning model for employee turnover prediction: Integrating LightGBM and genetic algorithms. J. Open Innov. Technol. Mark. Complex. 2025, 11, 100557.
33. Behera, S.K.; Dash, R. A Novel Framework for Mental Illness Detection Leveraging TOPSIS-ModCHI-Based Feature-Driven Randomized Neural Networks. Math. Comput. Appl. 2025, 30, 67.
34. Ufeli, C.P.; Sattar, M.U.; Hasan, R.; Mahmood, S. Enhancing Customer Segmentation Through Factor Analysis of Mixed Data (FAMD)-Based Approach Using K-Means and Hierarchical Clustering Algorithms. Information 2025, 16, 441.
35. Haque, M.; Paralkar, T.A.; Rajguru, S.; Goyal, A.A.; Patil, T.; Upreti, K. Featuring Machine Learning Models to Evaluate Employee Attrition: A Comparative Analysis of Workforce Stability-Relating Factors. Int. Res. J. Multidiscip. Scope 2025, 6, 862–873.
36. Babar, A.R.; Samad, A. AI Adoption Imperatives for Employees Engagement: A Leadership Approach; IGI Global Scientific Publishing: Hershey, PA, USA, 2025; pp. 1–26.
37. Myszak, J.M.; Filina-Dawidowicz, L. Leaders’ Competencies and Skills in the Era of Artificial Intelligence: A Scoping Review. Appl. Sci. 2025, 15, 10271.
38. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
39. Huang, C.; Zhu, X.; Lu, M.; Zhang, Y.; Yang, S. XGBoost algorithm optimized by simulated annealing genetic algorithm for permeability prediction modeling of carbonate reservoirs. Sci. Rep. 2025, 15, 14882.
40. Bewick, V.; Cheek, L.; Ball, J. Statistics review 14: Logistic regression. Crit. Care 2005, 9, 112.
41. Mobley, W.H. Intermediate linkages in the relationship between job satisfaction and employee turnover. J. Appl. Psychol. 1977, 62, 237–240.
42. Tura, A.S. Determinants of Employee’s Turnover: A Case Study at Madda Walabu University. Adv. Manag. Appl. Econ. 2020, 10, 1792–7552.
43. Lazăr, F.; Rentea, G.C.; Mihai, A.; Niță, D.; Munch, S. Retention and turnover in social work practice: What role do trusting colleagues, overtime, and workload play in job satisfaction? J. Soc. Work Pract. 2025, 39, 21–36.
44. Saketh, S.; Saicharan, V.; Rohith, V.; Rao, D. Leveraging Knowledge Graphs and Explainable AI to Improve Employee Turnover Predictions. Int. J. Adv. Res. Educ. Technol. 2025, 12, 1213–1220.

Figure 1. Data preprocessing and feature engineering.

Figure 2. Model selection and hybrid ensemble design.

Figure 3. Proposed methodology diagram.

Figure 4. Comparison of all evaluation metrics by model.

Figure 5. F1-Score comparison of models.

Figure 6. Confusion matrix for logistic regression model.

Figure 7. Receiver operating characteristic (ROC) curve.

Figure 8. SHAP summary plot for feature importance.

Table 1. Model comparison.

Model	Accuracy	Precision	Recall	F1-Score
Logistic Regression	0.7993	0.4167	0.6383	0.5042
Random Forest	0.8367	0.4865	0.3830	0.4286
XGBoost (SMOTE Hybrid)	0.8435	0.5152	0.3617	0.4250
XGBoost (ADASYN Hybrid)	0.8231	0.4419	0.4043	0.4222

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alyousef, M.I.; Khan, H.W.; Sattar, M.U. A Hybrid Predictive Model for Employee Turnover: Integrating Ensemble Learning and Feature-Driven Insights from IBM HR Analytics. Information 2026, 17, 208. https://doi.org/10.3390/info17020208

