Article

The Counterfactual–Dialectical Optimization Framework: A Prescriptive Approach to Employee Attrition Management with Empirical Validation

by Muna I. Alyousef 1, Mian Usman Sattar 2,*, Raza Hasan 3,*, Snober Usman 4 and Atif Hassan 5

1 Department of Management Information System, College of Business Administration, University of Hail, Hail 81451, Saudi Arabia
2 Data Science Research Centre, University of Derby, Kedleston Road, Derby DE22 1GB, UK
3 Department of Science and Engineering, Southampton Solent University, Southampton SO14 0YN, UK
4 Human Resource Department, Ranas Accountancy, Henleaze House Business Centre, 13 Harbury Road, Bristol BS9 4PN, UK
5 School of Professional Advancement, University of Management and Technology, Main Block, Ground Floor, Main Building South, Block C2, Phase 1, Johar Town, Lahore 54700, Pakistan
* Authors to whom correspondence should be addressed.
Information 2025, 16(12), 1053; https://doi.org/10.3390/info16121053
Submission received: 6 November 2025 / Revised: 25 November 2025 / Accepted: 28 November 2025 / Published: 2 December 2025
(This article belongs to the Special Issue AI Tools for Business and Economics)

Abstract

Employee attrition represents a significant burden, yet predictive models often fail to provide actionable retention strategies, creating a critical prediction–prescription gap. This paper introduces the Counterfactual–Dialectical Optimization (CDO) framework, a novel methodology that bridges this gap by integrating predictive modeling, robust causal inference, and budget-constrained optimization. We first illustrate the framework’s mechanics on the synthetic Human Resources (HR) dataset, then conduct a rigorous proof-of-concept on the empirical ‘Saudi Employee Attrition Dataset’ to test its real-world applicability. In our empirical validation, we employ Propensity Score Matching (PSM) to estimate the causal effects of interventions while controlling for confounding variables. The results on the real-world dataset show that while predicting attrition is challenging (Area Under the Curve, AUC ≈ 0.60), the framework successfully identified a deserved promotion as a powerful retention lever, causally reducing attrition probability by an estimated 23.9 percentage points. Acting on this insight, the optimization layer efficiently allocated the entire budget to this single, high-impact strategy for high-priority employees. This work provides a validated blueprint for shifting from passive prediction to active, Return on Investment (ROI)-driven prescription in strategic workforce management, demonstrating how to derive clear, actionable guidance even from complex, real-world data.

Graphical Abstract

1. Introduction

The escalating competition for talent has rendered employee retention a cornerstone of modern corporate strategy [1]. High rates of employee attrition not only incur substantial direct costs related to recruitment and training but also lead to indirect losses in organizational knowledge, productivity, and morale [2]. The scale of this financial burden can be substantial. To contextualize this within the setting for our empirical validation, recent industry analyses, such as the 2023 Gulf Talent Index from Oxford Economics, estimate the direct replacement cost per professional leaver in the Saudi knowledge economy at 60,000 SAR. Based on this estimate, the 515 leavers in our sample would represent an approximate 31 million SAR annual loss—a figure that underscores the urgency of the question we address: how should a finite retention budget be allocated to maximize the expected reduction in attrition? In response, organizations have increasingly turned to data-driven approaches, leveraging HR analytics to gain a deeper understanding of the factors contributing to employee turnover [3,4].
To date, the predominant paradigm in this domain has been predictive. Utilizing machine learning models, organizations can now identify, with considerable accuracy, which employees possess the highest propensity to leave [5]. These attrition risk models, while valuable for flagging potential issues, are inherently diagnostic rather than prescriptive [6]. They answer the question of ‘who’ is likely to attrit but remain silent on the more critical strategic question: ‘what’ should be done about it? Interventions based purely on risk scores are often suboptimal, as they fail to account for the heterogeneous ways in which individuals respond to different retention strategies such as a financial bonus, additional training, or a promotion [7]. An intervention may be highly effective for one employee but ineffective, or even counterproductive, for another.
Furthermore, retention strategies are invariably subject to real-world constraints, most notably limited budgets [8]. The challenge for HR leadership is therefore not simply to intervene, but to do so in a manner that maximizes the return on investment by allocating the right resources to the right individuals [9,10,11]. This transforms the problem from one of prediction to one of constrained causal optimization.
This paper addresses this challenge by proposing the CDO framework. Our work makes the following contributions:
  • We formalize a multi-layered methodology that synergistically combines predictive risk modeling, robust causal inference, and budget-constrained optimization to generate actionable, ROI-driven retention plans.
  • To directly address the limitations of purely synthetic analyses, we first illustrate the framework’s mechanics on a synthetic dataset, then conduct a full proof-of-concept on the empirical ‘Saudi Employee Attrition Dataset,’ grounding our validation in a real-world context.
  • Our framework moves beyond simple risk scores by explicitly incorporating real-world constraints, including limited budgets and a novel Employee Importance Score, ensuring resources are strategically allocated to critical, at-risk employees.
The remainder of this paper is structured to develop this contribution methodically. Section 2 begins by situating our work within the relevant literature. In Section 3, we introduce the conceptual architecture and logic of the proposed CDO framework. Section 4 provides the detailed mathematical formalisms for each layer, including our use of Propensity Score Matching for robust causal estimation, and outlines our two-stage experimental setup. To validate the framework, our findings are presented in two distinct parts. Section 5 provides a methodological illustration on a synthetic dataset to clearly demonstrate the framework’s mechanics. Section 6 then presents the main empirical proof-of-concept, applying the full framework to the real-world Saudi workforce dataset. Finally, Section 7 offers a comprehensive Discussion and Conclusion, where we interpret the empirical findings, discuss their implications and limitations, and summarize the paper’s overall contribution.

2. Background and Literature Review

The challenge of employee attrition has been approached from multiple analytical perspectives. This section reviews three key streams of literature that form the foundation for our proposed CDO framework: (1) predictive modelling for employee turnover, (2) the application of causal inference and uplift modelling in business contexts, and (3) resource allocation and optimization in HR management. We argue that while each stream offers valuable insights, their true potential is realized only through their integration.

2.1. The Evolving Landscape of AI in HR and the Challenge of Fairness

AI in HR analytics presents transformative potential alongside significant ethical challenges, requiring sophisticated bias mitigation strategies to ensure fair and responsible implementation. The evidence reveals complex dynamics in AI-driven HR systems. Reference [12] demonstrates that while AI can enhance decision-making, it simultaneously introduces substantial bias risks that could exclude qualified candidates and damage organizational reputation. Reference [13] further emphasizes that existing AI tools may inadvertently perpetuate hidden discriminatory patterns. Critically, researchers unanimously highlight that no single mitigation approach completely resolves fairness concerns. Reference [14] found organizations use only a limited set of proposed mitigation strategies, indicating an urgent need for more comprehensive ethical frameworks. The consensus suggests ongoing interdisciplinary research, transparent algorithmic design, and proactive governance are essential for responsible AI implementation in human resource management.

2.2. Predictive Modelling of Employee Attrition

The dominant data-driven approach to attrition management has been the development of predictive models. While early studies often relied on traditional statistical models like logistic regression, the field has embraced more sophisticated, non-linear machine learning models [15]. Recent literature demonstrates the high efficacy of these approaches, with reported predictive accuracies ranging from 87% to as high as 97.5% using methods such as Feedforward Neural Networks [16,17].
This body of work has also led to a consensus around a core set of highly predictive factors. Across multiple studies, variables such as environmental and job satisfaction, working overtime, monthly income, relationship satisfaction, and the distance from home to work are consistently identified as key drivers of an employee’s decision to leave [18]. The research clearly demonstrates that machine learning can provide organizations with robust, strategic insights for identifying at-risk individuals. The use of a RandomForest classifier in the predictive layer of our CDO framework aligns with this established best practice for risk stratification.
However, a critical limitation of this purely predictive paradigm is its inability to guide intervention strategy. As noted by [19], prediction models answer “what if” questions based on passive observation, not “what if we do” questions that involve an active intervention. A high-risk score indicates a problem but offers no insight into the optimal solution, creating the “prediction–prescription gap” that our framework is expressly designed to close.

2.3. Causal Inference and Uplift Modelling in Business

To bridge the prediction–prescription gap, the field has increasingly turned towards causal inference. The goal is not merely to predict an outcome but to estimate the causal effect of a specific action. In business contexts, this is known as uplift modelling, a technique for estimating causal effects at individual or subgroup levels to optimize personalized interventions [20]. These models aim to identify which customers or employees will be most positively influenced by a specific action, enabling targeted and efficient resource allocation.
The utility of this approach is well-documented across various domains. It is instrumental in e-commerce for targeting promotional campaigns, in digital advertising for moving beyond mere correlation to understand cause-and-effect relationships, and in optimizing user engagement on digital platforms [21,22]. While promising, the application of uplift modelling requires sophisticated statistical techniques to account for confounding variables and potential biases, particularly when using observational, non-experimental data. The gold standard for causal estimation, the randomized controlled trial (or A/B test), is often costly, slow, and ethically challenging to implement at scale in an HR context. Consequently, methods for estimating causal effects from observational data have gained prominence. A key challenge in this area is selection bias, where the factors that influence whether an individual receives treatment are also correlated with the outcome.
To address this, a range of quasi-experimental methods has been developed, from traditional approaches like Propensity Score Matching (PSM) and Difference-in-Differences (DiD) to more recent machine learning techniques such as Causal Forests [23] and Meta-Learners [24]. These methods leverage the power of flexible machine learning models to estimate heterogeneous treatment effects while controlling confounders [25].

2.4. Optimization and Resource Allocation in HR

The final component of a prescriptive framework is optimization. Given a set of effective interventions and a high-risk population, the problem becomes one of allocating a limited budget to maximize the desired outcome. This is a classic resource allocation problem, widely studied in operations research and computer science. In the context of HR, the challenge lies in effectively matching employee needs and organizational goals under specific constraints, a problem that is known to be computationally complex and often NP-Hard [26].
The literature demonstrates a variety of sophisticated mathematical and computational approaches to this challenge. Optimization models have been developed for simultaneously selecting profitable projects and allocating the requisite HR [27], as well as for dynamic team allocation that accounts for skill levels and learning processes [28]. These strategies typically aim to maximize resource utilization, minimize project duration and cost, or align employee skills with organizational goals, often using computational methods like linear programming or probabilistic models. However, the integration of individual-level causal effect estimates into these optimization problems is a relatively nascent field. Much of the existing work in HR optimization focuses on optimizing based on predicted outcomes or pre-defined business rules rather than on counterfactual uplift estimates. This is the specific gap our framework seeks to fill.
The approach taken in our CDO framework is a greedy heuristic that prioritizes allocation based on a combination of risk and ROI, a pragmatic and computationally efficient solution to this complex combinatorial optimization problem. While more formal methods like integer programming could provide a globally optimal solution, heuristic approaches are often preferred in practice for their speed and ease of implementation, particularly when dealing with large-scale allocation decisions.

2.5. The Present Contribution: A Synthesis

While the preceding sections review three distinct fields, the primary contribution of this paper is not an incremental advance within any single one. Rather, our contribution lies in their novel synthesis to propose a new, end-to-end paradigm for prescriptive workforce analytics. By integrating a predictive layer (‘who is at risk’), a robust causal layer (‘what will work’), and a constrained optimization layer (‘how should we invest’), the CDO framework moves beyond the limitations of standalone models. It transforms the problem from a series of disconnected analytical tasks into a single, cohesive strategic process. To our knowledge, this synthesis represents a new and comprehensive approach to solving the prediction–prescription gap in employee attrition management.

3. The Proposed Framework

This section introduces the CDO framework, a multi-layered methodology designed to move beyond predictive attrition modelling towards a prescriptive, resource-constrained retention strategy. The framework is designed to answer three critical business questions in sequence: (1) Who is most likely to attrit? (2) What is the most effective intervention for each individual? (3) Why are these decisions being made?

3.1. Overall Framework Architecture

As depicted in Figure 1, the CDO architecture comprises three core analytical layers operating on historical employee data: a Predictive Layer, a Causal Layer, and an Optimization Layer. The outputs of these layers are not only an optimized, actionable retention plan but also a set of strategic insights derived from a consistent analysis of feature contributions.
The workflow is sequential:
  • Input Data: The framework ingests historical employee data, including demographic, role-based, and behavioral features, alongside the historical attrition outcome.
  • Predictive & Causal Layers: This data is processed in parallel by the Predictive Layer, which estimates attrition risk, and the Causal Layer, which estimates the effect of potential interventions.
  • Optimization Layer: The outputs from the first two layers, (1) risk scores and (2) uplift scores (ATEs), are fed into the Optimization Layer. This layer applies a novel Counterfactual–Dialectical optimization process to generate the final retention plan under a budget constraint [29].
  • Outputs: The final outputs are a tactical retention plan, specifying which employee should receive which intervention, and strategic insights into the consistent drivers of both risk and intervention effectiveness.

3.2. Algorithmic Summary

A high-level summary of the framework’s algorithmic process is presented in Table 1. This table outlines the primary function and purpose of each phase, providing a roadmap for the detailed methodological descriptions that follow.

3.3. Detailed Layer Descriptions

The first layer addresses the ‘who’ question by identifying employees with the highest propensity to attrit. This is a standard supervised learning task where a predictive model $M_{\text{risk}}$ is trained on the feature set X to predict the binary outcome $y_e$ (attrition). In this study, a RandomForest classifier is employed for its robustness and high performance [30]. The model’s output is an Attrition Risk Score, $\hat{r}_e = P(y_e = 1 \mid x_e)$, for each employee e. This score is used to prioritize employees in the optimization phase.
The second layer moves from prediction to prescription by addressing the ‘what’ question: which intervention is most effective? This layer uses causal inference, specifically PSM, to calculate the ATE for each intervention. The ATEk represents the estimated average change in attrition probability for the employee population if they were to receive intervention k. PSM is chosen for its ability to provide robust effect estimates by controlling for confounding variables, a common characteristic of observational business data [31].
The core of the framework is the Optimization Layer, which synthesizes the risk scores, employee importance, and the ATEs to generate a cost-effective retention plan. This process leverages the ATE scores, which are inherently counterfactual as they estimate the difference in potential outcomes.
The allocation process is guided by a heuristic we term a dialectical loop, illustrated in Figure 2. This is not a formal dialectic, but a conceptual search algorithm inspired by its structure, designed to operate under budget constraints. For each high-priority employee (prioritized by their combined risk and importance score), the algorithm proceeds as follows:
  • Thesis: Identify the intervention with the maximum raw attrition reduction (most negative ATE).
  • Constraint Check: Determine if this thesis intervention is affordable within the remaining budget.
  • Antithesis & Synthesis: If the thesis is unaffordable, an antithesis is generated by identifying all other affordable interventions. A synthesis is then formed by selecting the affordable intervention that offers the highest ROI, calculated as:
    $$\frac{|\hat{\tau}_k(x_e)|}{\mathrm{cost}_k}$$
  • Allocation: The synthesized (or original thesis) plan is assigned, and the budget is updated.
This iterative, ROI-driven approach ensures that the budget is allocated in the most efficient manner, maximizing the total expected reduction in attrition.
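The thesis–antithesis–synthesis loop above can be sketched for a single employee; the intervention names, costs, and ATE values below are hypothetical placeholders, not figures from the study.

```python
# Illustrative sketch of the dialectical selection step for one employee.
# All intervention names, ATEs, and costs are hypothetical.

def select_intervention(ates, costs, remaining_budget):
    """Return the chosen intervention key, or None if nothing is affordable.

    ates:  dict mapping intervention -> estimated ATE (negative = reduces attrition)
    costs: dict mapping intervention -> per-employee cost
    """
    # Thesis: the intervention with the largest raw attrition reduction.
    beneficial = {k: v for k, v in ates.items() if v < 0}
    if not beneficial:
        return None
    thesis = min(beneficial, key=beneficial.get)  # most negative ATE

    # Constraint check: is the thesis affordable?
    if costs[thesis] <= remaining_budget:
        return thesis

    # Antithesis: all other affordable beneficial interventions.
    affordable = {k: v for k, v in beneficial.items() if costs[k] <= remaining_budget}
    if not affordable:
        return None

    # Synthesis: the affordable intervention with the highest ROI (|ATE| / cost).
    return max(affordable, key=lambda k: abs(ates[k]) / costs[k])

plan = select_intervention(
    ates={"promotion": -0.239, "bonus": -0.05},
    costs={"promotion": 5000, "bonus": 1000},
    remaining_budget=2000,
)
# With only 2000 left, the promotion (thesis) is unaffordable,
# so the synthesis step falls back to the bonus.
```

Note that the synthesis step only activates when the thesis fails the budget check; with a sufficiently large remaining budget, the raw-effect-maximizing intervention is chosen directly.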
A final, crucial layer addresses the ‘why’ question, ensuring transparency and generating strategic insights. This is achieved by applying the SHAP algorithm to key models within the framework. Specifically, this yields two distinct sets of feature importances:
  • One explaining attrition risk (from the predictive Random Forest model).
  • Another explaining the likelihood of receiving a treatment (from the L1-regularized propensity score model).
We introduce a novel metric, the Consistent Feature Contribution Analysis (CFCA) Score, calculated using a robust, rank-based aggregation of the SHAP-based importances from both models. This score serves to identify features that are consistently influential across both the predictive task (‘who is at risk’) and the prescriptive context (‘what drives an intervention’). Features with high CFCA scores represent the most strategically valuable levers within the organization for managing attrition, as they are fundamental to both understanding risk and the intervention process itself.

4. Methodology and Experimental Setup

This section details both the formal methodology of the CDO framework and the specific experimental design used for its validation. We first present the mathematical and algorithmic foundations of each layer in the framework. Subsequently, we describe the dataset, pre-processing pipeline, and simulated intervention strategy employed in this proof-of-concept study.

4.1. The CDO Framework: Formal Definitions

The CDO framework is a multi-layered process designed to generate optimized, explainable retention strategies. Each layer is defined by a distinct mathematical objective and algorithmic procedure.

4.1.1. Predictive Layer: Attrition Risk Stratification

The first analytical layer of the framework is designed to stratify employees by their propensity to leave. The objective is to train a predictive model, $M_{\text{risk}}$, that estimates the conditional probability of attrition for each employee, known as the Attrition Risk Score ($\hat{r}_e$). This score, along with the Employee Importance Score, serves as a primary input for the optimization layer’s prioritization process. Algorithm 1 shows the procedure used to generate these scores.
Let $E = \{e_1, \ldots, e_N\}$ be the set of employees, each described by a feature vector $x_e$. Let $y_e \in \{0, 1\}$ be the binary attrition outcome. The objective is to train a predictive model, $M_{\text{risk}}$, that estimates the conditional probability of attrition, or the Attrition Risk Score $\hat{r}_e$:
$$\hat{r}_e = P(y_e = 1 \mid X = x_e)$$
Algorithm 1: Generating a Key Prioritization Input: Attrition Risk Score
Input: Dataset D = {(x_e, y_e)} for e = 1 to N
Output: Vector of risk scores R = {r̂_e} for e ∈ E
1: procedure GenerateRiskScores(D)
2:      M_risk ← TrainClassifier(D) // such as RandomForest
3:      for each employee e in D do
4:           r̂_e ← M_risk.predict_proba(x_e)
5:      end for
6:      return R
7: end procedure
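A minimal sketch of Algorithm 1 using scikit-learn; the feature matrix and outcomes here are synthetic stand-ins for a real HR feature set, and the dimensions are arbitrary.

```python
# Sketch of Algorithm 1: train a RandomForest and extract r̂_e = P(y_e = 1 | x_e).
# The data below is randomly generated for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # x_e: employee feature vectors
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # y_e: attrition outcome

m_risk = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
risk_scores = m_risk.predict_proba(X)[:, 1]            # column 1 = P(y_e = 1 | x_e)
```

In practice the scores would be generated on held-out data rather than the training set, and the vector `risk_scores` is what feeds the optimization layer's prioritization.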

4.1.2. Causal Layer: Intervention Effect Estimation

To estimate the ATE, or the causal impact of each intervention on the probability of attrition, the framework’s second layer employs causal inference. To control confounding variables inherent in observational data, where employees who receive an intervention (e.g., a promotion) may be systematically different from those who do not, we utilize PSM, a widely used quasi-experimental method.
To ensure the stability of the propensity score model, which is critical for a valid causal estimate, we first conducted multicollinearity diagnostics using the Variance Inflation Factor (VIF). As high multicollinearity was detected between some predictors (VIF > 10 for Age and Employee Importance), we employed an L1-regularized logistic regression model (Lasso) to build the propensity score model. This is a technique that is robust to collinear features and helps prevent model overfitting.
The PSM process estimates the ATE by creating a synthetic control group that is statistically similar to the treated group across a range of observed covariates X. This is achieved in two primary stages as detailed in Algorithm 2. First, the regularized predictive model is trained to estimate the propensity score e(X) for each employee, which is their predicted probability of receiving the intervention given their characteristics. Second, each employee who received the intervention is matched to one or more employees who did not, but had a very similar propensity score. The ATE is then calculated as the simple difference in the mean attrition outcome between these two well-matched groups.
Algorithm 2: ATE Estimation with PSM
Input: Dataset D = {(X_e, y_e, T_e,k)} for e = 1 to n
Output: ATE_k estimate for intervention k
1: procedure GenerateATEviaPSM(D, k)
2:   // Stage 1: Estimate propensity scores using a regularized model
3:   M_propensity ← TrainClassifier(X, T_k)   // e.g., L1-Regularized Logistic Regression
4:   for each employee e in D do
5:      p̂_e,k ← M_propensity.predict_proba(X_e)   // Calculate P(T_k = 1 | X_e)
6:   end for
7:   // Stage 2: Matching and effect estimation
8:   MatchedPairs ← FindNearestNeighborMatches(p̂_e,k for T_e,k = 1, p̂_e,k for T_e,k = 0)
9:   TreatedOutcomes ← Average(y_e for treated employees in MatchedPairs)
10:  ControlOutcomes ← Average(y_e for control employees in MatchedPairs)
11:  ATE_k ← TreatedOutcomes − ControlOutcomes
12:  return ATE_k
13: end procedure
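The two stages of Algorithm 2 can be sketched as follows; the data is synthetic, and the 1-nearest-neighbour matching with replacement is a simplification of a full PSM implementation (no caliper, no balance diagnostics).

```python
# Simplified sketch of Algorithm 2: L1-regularized propensity model,
# nearest-neighbour matching on the score, ATE as the matched mean difference.
# All data here is simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                          # observed confounders
T = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # treatment assignment (confounded)
y = (0.5 * X[:, 0] - 0.4 * T + rng.normal(size=500) > 0).astype(int)  # outcome

# Stage 1: propensity scores e(X) = P(T = 1 | X) via Lasso logistic regression
prop_model = LogisticRegression(penalty="l1", solver="liblinear").fit(X, T)
p = prop_model.predict_proba(X)[:, 1]

# Stage 2: match each treated unit to the control with the closest score
treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(p[control].reshape(-1, 1))
_, idx = nn.kneighbors(p[treated].reshape(-1, 1))
matched_controls = control[idx.ravel()]

# ATE estimate: difference in mean outcomes between matched groups
ate = y[treated].mean() - y[matched_controls].mean()   # negative => reduces attrition
```

Because matching balances the confounder distribution between groups, the mean difference approximates the causal effect rather than the raw (biased) association.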

4.1.3. Optimization Layer: Dialectical Resource Allocation

The final layer of the framework synthesizes the outputs from the predictive and causal layers to generate a budget-constrained, ROI-driven retention plan. The objective is to maximize the total strategic value of the interventions, where strategic value is defined as the expected reduction in attrition weighted by the importance of the employee.
Let $a_{e,k} \in \{0, 1\}$ be the decision variable for assigning intervention k to employee e. The problem is to
Maximize
$$\sum_{e \in E_{\text{priority}}} \sum_{k \in \mathcal{T}} a_{e,k} \, (-ATE_k) \, w_e$$
Subject to:
Budget Constraint:
$$\sum_{e \in E_{\text{priority}}} \sum_{k \in \mathcal{T}} a_{e,k} \, c_k \le B_{\text{total}}$$
Assignment Constraint:
$$\sum_{k \in \mathcal{T} \cup \{0\}} a_{e,k} = 1 \quad \forall e \in E_{\text{priority}}$$
Algorithm 3 provides the allocation procedure.
Algorithm 3: Intervention Allocation Heuristic
Input: Risk scores R, Employee Importance Scores W, ATEs, Costs C, Budget Btotal
Output: Allocation Plan Π = {(e, k)}
1: procedure AllocateInterventions(R, W, ATEs, C, B_total)
2:   E_priority ← IdentifyHighPriorityEmployees(R)
3:   E_sorted ← Sort E_priority by priority_score_e in descending order
4:   B_used ← 0, Π ← empty map
5:   for each employee e ∈ E_sorted do
6:     if B_used ≥ B_total then break end if
7:     k_best ← argmax over k ∈ 𝒯 of {|ATE_k| / c_k} such that ATE_k < 0
8:     if k_best exists and (B_used + c_k_best) ≤ B_total then
9:       Π[e] ← k_best
10:      B_used ← B_used + c_k_best
11:    end if
12:   end for
13:   return Π
14: end procedure
Although the allocation rule is greedy, it is grounded in a micro-economic model of turnover cost. Let the organizational utility of assigning intervention k to employee e be $U_e(k) = -ATE_k \cdot w_e$, where $w_e$ is the replacement-cost proxy (employee-importance score). Maximizing $\sum U_e(k)$ under budget B is a 0–1 knapsack problem. The ratio $(|ATE_k| \cdot w_e)/c_k$ is the strategic-profit-to-cost heuristic that approximates the optimal solution. The algorithm therefore embeds ROI-maximizing economic rationality rather than an ad hoc rule.
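The greedy knapsack-style loop can be sketched as follows; the employee IDs, priority scores, ATEs, and costs are made-up illustrations, not values from the empirical study.

```python
# Sketch of the allocation heuristic (Algorithm 3) over hypothetical employees.

def allocate(employees, ates, costs, budget):
    """employees: list of (id, priority_score); returns {id: intervention}."""
    plan, used = {}, 0.0
    # Process employees in descending priority order (risk x importance).
    for emp_id, _ in sorted(employees, key=lambda e: e[1], reverse=True):
        if used >= budget:
            break
        # Affordable interventions that reduce attrition (ATE < 0),
        # ranked by ROI = |ATE| / cost.
        options = [k for k in ates if ates[k] < 0 and used + costs[k] <= budget]
        if not options:
            continue
        best = max(options, key=lambda k: abs(ates[k]) / costs[k])
        plan[emp_id] = best
        used += costs[best]
    return plan

plan = allocate(
    employees=[("e1", 0.9), ("e2", 0.7), ("e3", 0.4)],
    ates={"promotion": -0.239, "bonus": -0.02},
    costs={"promotion": 5000.0, "bonus": 1000.0},
    budget=7000.0,
)
# e1 gets the promotion (highest ROI and affordable); once the remaining budget
# can no longer cover another promotion, e2 and e3 fall back to the bonus.
```

This mirrors the knapsack approximation described above: each step spends budget on the highest profit-to-cost option still affordable, rather than searching for the global optimum.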

4.1.4. Explainability Layer: CFCA

The final layer addresses the ‘why’ question by identifying features that are consistently important across the framework’s key models. We use SHAP, which represents a model’s prediction M(xe) as a sum of feature attributions, or SHAP values φj:
$$M(x_e) = \phi_0 + \sum_{j=1}^{J} \phi_j(x_e)$$
Global feature importance I(j) is the mean absolute SHAP value across all samples. This is calculated for two key models in our framework:
  • The risk model (Irisk), to explain the drivers of attrition risk.
  • The propensity score model (Ipropensity), to explain the drivers of receiving an intervention.
$$I(j) = \frac{1}{N} \sum_{e=1}^{N} \left| \phi_j(x_e) \right|$$
To ensure a robust and stable comparison, we introduce the CFCA Score. Instead of using potentially unstable scaled importance values, we use a rank-based aggregation. For each model, all features are ranked from most important (rank 1) to least important based on their global feature importance I(j). The CFCA Score is then calculated as the average of the feature’s ranks across both models:
$$S_{\text{CFCA}}(j) = 0.5 \cdot \mathrm{Rank}_{\text{risk}}(j) + 0.5 \cdot \mathrm{Rank}_{\text{propensity}}(j)$$
A lower CFCA score indicates that a feature is consistently highly ranked (i.e., important) across both the predictive and prescriptive contexts of the framework. These features represent the most strategically valuable levers within the organization for managing attrition.
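The rank-based aggregation can be sketched in a few lines; the feature names and importance values below are illustrative placeholders, not the study's actual SHAP importances.

```python
# Sketch of the CFCA score: rank features by importance within each model,
# then average the ranks. A lower score means consistently high importance.

def cfca_scores(imp_risk, imp_prop):
    """imp_*: dict feature -> global importance (mean |SHAP|)."""
    def ranks(imp):
        ordered = sorted(imp, key=imp.get, reverse=True)
        return {f: r + 1 for r, f in enumerate(ordered)}  # rank 1 = most important
    r_risk, r_prop = ranks(imp_risk), ranks(imp_prop)
    return {f: 0.5 * r_risk[f] + 0.5 * r_prop[f] for f in imp_risk}

scores = cfca_scores(
    imp_risk={"overtime": 0.30, "income": 0.20, "age": 0.10},
    imp_prop={"overtime": 0.25, "income": 0.05, "age": 0.15},
)
# "overtime" is ranked first in both models, so it gets the best (lowest) score.
```

Rank aggregation makes the score invariant to the differing scales of the two models' raw importance values, which is the stability argument made above.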

4.2. Experimental Design and Data

Our experimental design follows a two-stage approach to first illustrate the framework’s mechanics in a controlled environment and then rigorously validate its performance on real-world data. A consistent data pre-processing pipeline was applied to both datasets to ensure methodological consistency.

4.2.1. A Two-Stage Validation Approach

  • Stage 1 (Methodological Illustration): To provide a clear, step-by-step demonstration of the CDO framework’s components, we first utilize the well-known, publicly available HR Analytics dataset. As a standard synthetic benchmark for attrition modeling, it provides a controlled environment to explain the functionality of the predictive, causal, and optimization layers. The full results of this illustrative analysis are presented in Section 5.
  • Stage 2 (Empirical Validation): To test the framework’s utility in a more realistic and challenging context, we then conduct our main proof-of-concept on the Saudi Employee Attrition Dataset. This dataset, based on an empirical employee survey, provides the foundation for our main results, which are presented in Section 6.

4.2.2. Data Pre-Processing Pipeline

To prepare the raw data from both datasets for the subsequent analytical layers, the following multi-step pre-processing pipeline was meticulously executed:
  • Data Cleaning and Feature Removal: Unique identifier columns (e.g., EmployeeNumber) and columns with zero variance (e.g., EmployeeCount, Over18) that provide no predictive information were removed to create a clean and relevant feature set.
  • Target Variable Encoding: The categorical Attrition column (with values ‘Yes’ and ‘No’) was converted into a binary integer format, where 1 represents attrition and 0 represents retention. This column serves as the outcome variable in all subsequent mathematical formulations.
  • Missing Value Imputation: To preserve statistical power, missing values in numerical columns were imputed using the median, which is robust to outliers. Missing values in categorical columns were imputed using the mode, which is the most frequently occurring value.
  • Feature Engineering and Encoding: For the empirical validation stage, an Employee Importance Score was engineered from salary and job title data to serve as a proxy for an employee’s strategic value. All remaining non-numerical features (e.g., Gender, MaritalStatus) were converted into a numerical format suitable for model training using appropriate mapping or encoding techniques.
  • Feature Scaling: For distance-based algorithms like k-Nearest Neighbors (k-NN), which are sensitive to the scale of input data, the final numerical feature set was scaled using a StandardScaler. This technique transforms each feature to have a mean of 0 and a standard deviation of 1.
This pre-processing pipeline results in a clean, fully numerical, and scaled dataset, providing a robust foundation for the training of both the predictive and causal models.
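The pipeline steps above can be sketched on a toy frame; the column names mirror the paper's examples, but the values and the specific encoding choices are illustrative, not the exact ones used in the study.

```python
# Toy sketch of the five pre-processing steps using pandas and scikit-learn.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "EmployeeNumber": [1, 2, 3, 4],          # unique identifier
    "EmployeeCount": [1, 1, 1, 1],           # zero-variance column
    "Attrition": ["Yes", "No", "No", "Yes"],
    "MonthlyIncome": [4000.0, None, 6000.0, 5000.0],
    "Gender": ["M", "F", None, "F"],
})

# 1. Drop identifiers and zero-variance columns
df = df.drop(columns=["EmployeeNumber", "EmployeeCount"])
# 2. Encode the target as binary (1 = attrition, 0 = retention)
df["Attrition"] = (df["Attrition"] == "Yes").astype(int)
# 3. Impute: median for numeric, mode for categorical
df["MonthlyIncome"] = df["MonthlyIncome"].fillna(df["MonthlyIncome"].median())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
# 4. Encode remaining categoricals numerically
df["Gender"] = df["Gender"].map({"M": 0, "F": 1})
# 5. Scale numeric features to mean 0, standard deviation 1
df[["MonthlyIncome"]] = StandardScaler().fit_transform(df[["MonthlyIncome"]])
```

The resulting frame is fully numerical, complete, and scaled, matching the state the text describes as the input to the predictive and causal layers.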

4.2.3. Intervention Set Definition and Optimization Parameters

The definition of interventions was adapted for each validation stage to suit the nature of the data:
  • For the illustrative stage (HR dataset), interventions for ‘Bonus’, ‘Promotion’, and ‘Training’ were synthetically generated via random assignment.
  • For the empirical validation stage (Saudi dataset), interventions were defined using real survey data as proxies: a Promotion Intervention (from the Get_Deserved_Promotion column) and a Compensation Intervention (from the Bonus column). The Training intervention was excluded from this stage due to insufficient data for a robust causal analysis.
For the empirical validation, the total hypothetical budget was set at $100,000. These parameters were used to generate the main results of this paper. This experimental setup, combining real-world employee data with a simulated policy environment, provides the foundation needed to evaluate the performance and potential utility of the CDO framework; the results are presented in the following section.

5. Methodological Illustration on a Synthetic Dataset

To clearly illustrate the mechanics of the CDO framework, we first apply it to the widely used synthetic HR Analytics dataset. The goal of this exercise is not to draw substantive conclusions, but to demonstrate the interaction between the predictive, causal, and optimization layers in a controlled environment. The key outcomes from each layer in this illustrative simulation are presented below.

5.1. Predictive Analysis: Attrition Risk Stratification

The initial layer of the framework involved training a Random Forest classifier to predict the probability of attrition for each employee, generating an individualized Attrition Risk Score. The distribution of these scores, segmented by the employees’ actual attrition status, is presented in Figure 3.
As illustrated, the predictive model discriminates clearly between the two groups. Scores for employees who ultimately left (shown in salmon) are skewed markedly to the right, indicating that the model assigns higher risk to this population, while scores for employees who stayed (sky blue) are concentrated at the lower end of the spectrum. This provides a sound empirical basis for the subsequent optimization, which focuses resources on the high-risk segment.
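The risk-stratification step can be sketched in a few lines. The snippet below is an illustrative stand-in for this layer on simulated features, not the paper's actual model configuration: a Random Forest is fitted and each employee's predicted attrition probability is read out as the Attrition Risk Score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Simulated stand-in features and attrition labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Fit the risk model and extract per-employee attrition probabilities.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
risk_scores = model.predict_proba(X)[:, 1]  # Attrition Risk Score per employee

# Leavers should receive higher scores on average than stayers,
# mirroring the rightward skew described for Figure 3.
print(risk_scores[y == 1].mean(), risk_scores[y == 0].mean())
```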

5.2. Illustrative Causal Effect Estimation

The second layer employed Double Machine Learning (DML) to estimate the causal impact of three simulated interventions; Table 2 summarizes the Average Treatment Effects (ATEs). A pivotal finding is the heterogeneity in both the direction and statistical significance of the effects. Most notably, 'Promotion' was estimated to have a statistically significant positive effect on attrition, increasing the probability of an employee leaving by approximately 10.8 percentage points on average. In a real-world scenario, this counter-intuitive result could suggest that promotions without adequate support or compensation increase flight risk, and it serves as a powerful example of why causal validation is necessary.
Conversely, Bonus and Training show negative point estimates, suggesting they reduce attrition probability as intended. However, these effects are not statistically significant at the 95% confidence level, as their confidence intervals contain zero; based on the available data, their impact cannot be statistically distinguished from no impact at all. It is also important to note that the model execution raised a UserWarning indicating that the covariance matrix was underdetermined, which renders the calculated confidence intervals invalid for formal statistical inference. This is a critical limitation that would require further investigation in a non-simulated study.
Figure 4 provides a visual representation of these ATEs and their associated 95% confidence intervals.
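The residual-on-residual logic at the core of DML can be hand-rolled in a few lines. The sketch below is an illustration of the estimator on simulated data with a known effect of −0.2, under the assumption of a partially linear model; the paper's analysis uses a full DML implementation, and the data-generating process here is invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 4))
# Treatment assignment depends on the covariates (confounding) ...
T = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)
# ... and the simulated true treatment effect on the outcome is -0.2.
Y = 0.3 * X[:, 0] - 0.2 * T + rng.normal(scale=0.1, size=n)

# Cross-fitted nuisance predictions (out-of-fold, to avoid overfitting bias).
y_hat = cross_val_predict(RandomForestRegressor(random_state=0), X, Y, cv=5)
t_hat = cross_val_predict(RandomForestClassifier(random_state=0), X, T,
                          cv=5, method="predict_proba")[:, 1]

# Final stage: residual-on-residual regression yields the ATE estimate.
ry, rt = Y - y_hat, T - t_hat
ate = (rt @ ry) / (rt @ rt)
print(ate)  # an estimate of the simulated effect (-0.2)
```

Because the final stage uses residuals from cross-fitted nuisance models, the estimate is first-order insensitive to errors in those models, which is the property that motivates DML.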

5.3. Illustrating the Optimization Layer: Allocation Results

Using the scores from the previous layers, the optimization layer allocated a hypothetical budget of $250,000, of which the algorithm efficiently utilized $248,000 to target 50 high-risk employees. The strategic logic of the allocation is shown in the heatmap in Figure 5: 'Bonus' and 'Training' constitute the vast majority of interventions, concentrated primarily among employees with risk scores between approximately 0.4 and 0.8. The 'Promotion' intervention was not recommended for any employee, a direct and logical consequence of the causal analysis in Table 2, which identified its effect as increasing, rather than decreasing, attrition risk. This demonstrates the layer's ability to translate causal insights into a cost-effective and logically sound action plan.
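The allocation logic can be illustrated with a simple greedy sketch: rank (employee, intervention) pairs by risk-weighted attrition reduction per dollar and fund them best-first until the budget runs out. The paper's own layer uses a dialectical search heuristic; the risk scores, ATEs, and costs below are invented for the example, and a harmful 'Promotion' is excluded exactly as the causal layer would dictate.

```python
# Illustrative risk scores from the predictive layer.
risk = {"e1": 0.82, "e2": 0.65, "e3": 0.48, "e4": 0.30}

# ATE < 0 means the intervention reduces attrition probability;
# 'Promotion' is omitted because its estimated effect was harmful.
interventions = {"Bonus":    {"ate": -0.05, "cost": 3000},
                 "Training": {"ate": -0.04, "cost": 1500}}

budget = 6000.0
plan, spent = {}, 0.0

# Score = risk-weighted attrition reduction per dollar spent.
candidates = sorted(
    ((r * -v["ate"] / v["cost"], e, k, v["cost"])
     for e, r in risk.items() for k, v in interventions.items()),
    reverse=True)

# Fund best-first, at most one intervention per employee.
for score, e, k, cost in candidates:
    if e not in plan and spent + cost <= budget:
        plan[e] = k
        spent += cost
print(plan, spent)
```

With these illustrative numbers, Training dominates per dollar and the greedy pass funds it for every employee, exhausting the budget exactly; a dialectical search would additionally revisit and contest such assignments.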

5.4. Illustrating the Explainability Layer: CFCA Results

Finally, to demonstrate the framework’s ability to generate deeper strategic insights, a CFCA was performed using a robust, rank-based aggregation of SHAP values. Figure 6 presents a comparative bar plot of the normalized feature importances for both predicting attrition risk (from the Random Forest model) and predicting the likelihood of receiving a treatment (from the L1-regularized propensity score model).
A clear divergence is apparent, confirming the core thesis of our framework: the features most important for predicting attrition risk differ from those most important for predicting who receives an intervention. For example, Years_of_service_numeric is the strongest predictor of risk, while Job_Satisfaction_numeric is the primary driver of who received a promotion. The factors predicting who is at risk are thus fundamentally different from the factors explaining why an intervention is assigned.
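The CFCA comparison can be sketched as follows. For brevity this sketch substitutes model-native importances (Random Forest impurity importances and absolute L1-logistic coefficients) for the SHAP values used in the paper; the rank-based aggregation step is the same idea. The data-generating process is invented so that risk is driven by one feature and treatment by another.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 3))  # e.g. tenure, satisfaction, salary (illustrative)
y_risk  = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)  # attrition driven by feature 0
treated = (X[:, 1] + 0.1 * rng.normal(size=n) > 0).astype(int)  # promotion driven by feature 1

# Importances from the risk model and the L1 propensity model.
risk_imp = RandomForestClassifier(random_state=0).fit(X, y_risk).feature_importances_
prop_imp = np.abs(LogisticRegression(penalty="l1", solver="liblinear")
                  .fit(X, treated).coef_[0])

def rank_normalise(imp):
    # Rank transform: smallest rank = least important, 1.0 = most important.
    return (imp.argsort().argsort() + 1) / len(imp)

risk_rank, prop_rank = rank_normalise(risk_imp), rank_normalise(prop_imp)
print(risk_rank, prop_rank)  # divergent top drivers across the two models
```

Rank normalisation is what makes the two importance scales comparable, since impurity importances and coefficient magnitudes live on different scales.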

6. Empirical Proof-of-Concept on the Saudi Workforce Dataset

Having illustrated the framework’s mechanics, we now conduct a rigorous proof-of-concept on the empirical Saudi Employee Attrition Dataset to validate its performance and utility in a real-world context.

6.1. Predictive Analysis and Risk Stratification

The framework’s initial layer aimed to generate an individualized Attrition Risk Score for each employee. We compared a Random Forest classifier against a k-NN model. The results are summarized in Table 3.
The modest discriminative performance (AUC ≈ 0.57–0.60) reflects the difficulty of modeling complex human behavior with real-world survey data. We proceeded using the risk scores from the Random Forest model for the subsequent layers.

6.2. Empirical Causal Effect Estimation

The second layer employed PSM to provide a robust estimate of the causal impact of the two viable interventions. The analysis yielded the ATEs shown in Table 4, which represent the average change in attrition probability caused by each intervention.
The results identify Promotion as a powerful retention lever for this workforce, causally reducing the probability of attrition by an estimated 23.9 percentage points. The Compensation (Bonus) intervention was also beneficial, but its effect was more modest, reducing attrition probability by 5.6 percentage points.
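The PSM estimate can be sketched with a logistic propensity model and 1-nearest-neighbor matching on the propensity score. This is an illustrative implementation on simulated data with a known effect of −0.2, not the paper's exact matching specification; strictly, matching treated employees to controls as done here estimates the effect on the treated (ATT), with the ATE also requiring the reverse match.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 3000
X = rng.normal(size=(n, 3))                      # confounders
p_true = 1 / (1 + np.exp(-X[:, 0]))              # assignment depends on X
T = rng.binomial(1, p_true)
# Binary attrition outcome; the simulated treatment effect is -0.2.
Y = (rng.random(n) < 0.5 + 0.2 * (X[:, 0] > 0) - 0.2 * T).astype(int)

# 1. Propensity model: P(T=1 | X).
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# 2. Match each treated unit to the control with the closest propensity.
treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched = control[idx[:, 0]]

# 3. Average outcome difference across matched pairs.
att = (Y[treated] - Y[matched]).mean()
print(att)  # close to the simulated effect of -0.2
```

Matching on the propensity score balances the confounder distribution between the treated and matched-control groups, which is what licenses the causal reading of the mean difference.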

6.3. Optimization and Allocation Results

Using the individualized risk scores, Employee Importance scores, and the causal ATEs, the optimization layer allocated the $100,000 budget to generate a final, prescriptive retention plan. The aggregate outcomes of this allocation process are summarized in Table 5.
The framework allocated the entire budget to fund 10 promotion interventions for the 10 highest-priority employees.

6.4. Integrated Analysis Dashboard

The integrated results of the framework’s application on the Saudi dataset are summarized in the six-panel dashboard in Figure 7.
Figure 7 presents six outputs from the analysis. Panel 1 shows the distribution of predicted attrition risk scores, revealing how risk is spread across the workforce and identifying groups with a higher likelihood of leaving. Panel 2 shows the distribution of Employee Importance scores, highlighting how strategic value varies across employees. Panel 3 plots importance against risk, making it easy to spot employees who are both high-risk and high-importance. Panel 4 reports the estimated Average Treatment Effects, indicating how much each intervention reduces attrition for those who receive it. Panel 5 shows the distribution of allocated interventions, confirming which intervention type the optimizer selected under the budget constraint. Panel 6 shows the final budget utilization, verifying that the optimization exhausted the full budget with no remaining funds.

6.5. Framework Validation Analyses

To rigorously test the design, fairness, and robustness of the CDO framework, we conducted two additional validation analyses on our empirical results.

6.5.1. Fairness Evaluation

A critical component of the framework is its ability to diagnose potential biases in allocation. We evaluated the fairness of the optimized intervention plan by comparing the gender distribution of the total employee population against the distribution of the allocated budget. The results are presented in Table 6.
The fairness evaluation revealed a substantial demographic imbalance in the outcome. Although females constitute the majority of the workforce (56.93%), they received only 20% of the allocated retention budget. This finding does not represent a failure of the framework; on the contrary, it demonstrates its essential function as a diagnostic tool for governance. For responsible deployment, HR leaders must use these insights either to investigate the root cause of the imbalance or to implement explicit fairness constraints in the optimization layer.
To demonstrate extensibility, we re-ran the optimization under a post hoc fairness constraint that keeps the female budget share within ±5 percentage points of the population share (with a minimum of 20%). The resulting plan treats 10 high-risk employees while allocating 40% of the budget to women (population share 56.93%), illustrating how equity targets can be embedded without sacrificing strategic efficiency.
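One simple way to embed such a constraint is a two-phase greedy pass: first reserve a minimum budget share for the protected group, funding its highest-risk members, then fill the remainder by risk alone. The sketch below illustrates this idea with invented employees, a uniform intervention cost, and a 40% minimum female share; it is not the paper's actual constrained optimizer.

```python
# Illustrative (id, gender, attrition risk) tuples.
employees = [
    ("e1", "F", 0.9), ("e2", "M", 0.8), ("e3", "M", 0.7),
    ("e4", "F", 0.6), ("e5", "M", 0.5), ("e6", "F", 0.4),
]
cost, budget, min_female_share = 10_000, 30_000, 0.4

# Phase 1: reserve the female quota, funding the highest-risk women first.
female_quota = min_female_share * budget
plan, spent, female_spent = [], 0, 0
for eid, g, r in sorted(employees, key=lambda e: -e[2]):
    if g == "F" and female_spent < female_quota and spent + cost <= budget:
        plan.append(eid); spent += cost; female_spent += cost

# Phase 2: fill the remaining budget by risk, regardless of gender.
for eid, g, r in sorted(employees, key=lambda e: -e[2]):
    if eid not in plan and spent + cost <= budget:
        plan.append(eid); spent += cost

print(plan, female_spent / spent)
```

The quota acts as a floor rather than a target, so the final female share can exceed the minimum when high-risk women would have been funded anyway.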

6.5.2. Ablation Study: Validating the Framework Components

We conducted an ablation study comparing the Full CDO framework’s performance against three simplified strategies, using Risk-Weighted Attrition Reduction as our primary metric for strategic efficiency. The results are presented in Table 7.
The ablation study confirms the value of the CDO's integrated design. It achieved a Risk-Weighted Reduction of 2.34, substantially outperforming the "Cheapest-First" strategy (0.71) and the "ROI-Only" strategy (1.73). This demonstrates that synthesizing risk prioritization with causal ROI logic is more strategically efficient than either approach alone.

7. Discussion and Conclusions

This paper introduced and validated the Counterfactual–Dialectical Optimization (CDO) framework, a novel methodology for advancing HR analytics from a predictive to a prescriptive paradigm. Our empirical proof-of-concept on the Saudi Employee Attrition Dataset provides a compelling validation of the framework's utility in a real-world setting. This concluding section interprets the principal findings, discusses their managerial implications, addresses the study's limitations, and summarizes the paper's overall contribution.

7.1. Interpretation of Principal Findings

The successful execution of the CDO framework on empirical data yielded several noteworthy findings. First, the predictive layer highlighted the reality of workforce analytics: perfect prediction of human behavior is unattainable. The modest AUC scores (≈0.60) reflect a realistic level of uncertainty and underscore the inadequacy of relying on predictive models alone.
Despite this predictive uncertainty, the causal layer identified powerful and clear signals. In stark contrast to the illustrative simulation on synthetic data (Section 5), the empirical analysis revealed that a deserved promotion was the single most powerful lever for retention, reducing attrition probability by a substantial 23.9 percentage points. This reversal of findings underscores the necessity of validating retention strategies on real, context-specific data.
Finally, the optimization and validation layers confirmed the framework’s strategic logic. The optimization algorithm correctly allocated the entire budget to the high-impact promotion strategy, a decision empirically justified by the Ablation Study (Table 7), which demonstrated the superiority of our integrated approach over simpler, ablated alternatives.

7.2. Managerial and Policy Implications

The findings presented hold several important implications for HR management and strategic workforce planning:
  • Shift to Personalized, Causal-Driven Interventions: The framework demonstrates the feasibility of moving from broad, one-size-fits-all retention policies to a targeted strategy where interventions are personalized based on robust causal evidence and individual employee value.
  • The Imperative of Causal Validation: The conflicting results for the ‘Promotion’ intervention between our illustrative and empirical analyses serve as a crucial cautionary tale. It highlights the danger of implementing policies based on intuition or generic benchmark studies and underscores the need for rigorous causal validation using an organization’s own data.
  • Proactive Fairness and Algorithmic Governance: As established in the literature, there is an urgent need for more comprehensive ethical frameworks to govern AI in HR [14]. Our empirical analysis speaks directly to this challenge. The significant gender bias revealed by our Fairness Evaluation (Table 6) is not a failure of the framework, but rather a demonstration of its essential function as the type of diagnostic and governance tool called for by current research. It proves that a purely ROI-driven optimization can perpetuate biases, confirming the risks highlighted by [13]. This finding underscores that proactive fairness monitoring, as enabled by the CDO framework, is critical for the responsible deployment of AI.

7.3. Limitations and Avenues for Future Research

It is important to acknowledge the limitations of this proof-of-concept study, which delineate a clear path for future research.
The primary limitation is that while the employee data is real, the ‘interventions’ are proxies derived from survey responses rather than administrative records of a formal program. The most critical next step is to apply this methodology to a real, observational dataset from an organization containing historical records of non-randomly assigned interventions.
Secondly, while the Ablation Study confirmed our framework’s strategic efficiency, the optimization layer employs a greedy heuristic. Future work could explore more sophisticated optimization techniques, such as integer programming, to find a globally optimal portfolio of interventions.
Finally, the fairness evaluation highlighted the need for more advanced, fairness-aware algorithms. Future research should focus on integrating equity constraints directly into the optimization process, allowing HR leaders to balance the competing objectives of maximizing ROI and ensuring equitable resource distribution.

7.4. Conclusions

The primary contribution of this work is the development and empirical validation of an integrated conceptual framework. While this paper does not propose a new standalone statistical theory, its novelty lies in the synergistic synthesis of predictive modeling, causal inference, and constrained optimization to solve a critical, real-world business problem. This work bridges the gap between academic theory and practical application, providing a concrete methodology for organizations to move from simply understanding attrition to actively and efficiently managing it.
In an era where human capital is a primary driver of competitive advantage, traditional analytical approaches that merely predict attrition are no longer sufficient. This paper has addressed this gap by proposing and validating the CDO framework, a novel, multi-layered methodology designed for prescriptive retention analytics.
Our application of the framework on a real-world workforce dataset successfully demonstrated its end-to-end functionality. It generated a tactical retention plan that was cost-efficient and logically sound by identifying and investing in the highest-impact intervention. Furthermore, its integrated validation layers provided crucial insights into both its strategic efficiency, confirmed by the Ablation Study, and its potential for producing biased outcomes, as revealed by the Fairness Evaluation.
While we acknowledge the limitations of this study, the CDO framework provides a robust and comprehensive blueprint for the future of HR analytics. By shifting the focus from passive prediction to active, optimized, and explainable prescription, this work offers a clear path for organizations to more effectively and equitably invest in their most valuable asset: their people.

Author Contributions

Conceptualization, M.U.S., R.H., S.U. and M.I.A.; methodology, M.U.S. and M.I.A.; software, M.U.S.; validation, M.U.S., M.I.A. and S.U.; formal analysis, M.U.S. and M.I.A.; investigation, M.U.S.; resources, A.H.; data curation, M.U.S.; writing—original draft preparation, M.U.S. and M.I.A.; writing—review and editing, R.H., S.U., A.H. and M.I.A.; visualization, M.U.S.; supervision, R.H. and A.H.; project administration, R.H.; funding acquisition, A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethics approval is not mandated by UK ICO guidance for research that does not process personal data. The guidance clarifies that effectively anonymized information is not personal data under the UK GDPR.

Informed Consent Statement

Informed consent for participation is not required as per local legislation: UK ICO guidance on anonymization, which states that anonymous information is not personal data under the UK GDPR.

Data Availability Statement

Publicly available datasets were analyzed in this study. The empirical data for the main proof-of-concept, the ‘Saudi Employee Attrition Dataset’, can be found on Mendeley Data at https://data.mendeley.com/datasets/6z2hty8php/1 (accessed on 20 November 2025). The data used for the methodological illustration, the ‘HR Analytics: Job Change of Data Scientists’ dataset, is available on Kaggle at https://www.kaggle.com/datasets/uniabhi/hr-analytics-job-change-of-data-scientists (accessed on 8 October 2025).

Acknowledgments

The authors would like to acknowledge the use of ChatGPT-4 24 May 2023 version (OpenAI, San Francisco, CA, USA), specifically to assist in some content rewriting for improved clarity and effectiveness.

Conflicts of Interest

Author Snober Usman was employed by Ranas Accountancy, UK. The remaining authors declare no commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ATE: Average Treatment Effect
AUC: Area Under the Curve
CATE: Conditional Average Treatment Effect
CDO: Counterfactual–Dialectical Optimization
CFCA: Consistent Feature Contribution Analysis
DML: Double Machine Learning
GBM: Gradient Boosting Machines
HR: Human Resources
k-NN: k-Nearest Neighbors
MCKP: Multiple-Choice Knapsack Problem
PSM: Propensity Score Matching
ROI: Return on Investment
SHAP: SHapley Additive exPlanations
VIF: Variance Inflation Factor

References

  1. Urme, U.N. The Impact of Talent Management Strategies on Employee Retention. Int. J. Sci. Bus. 2023, 28, 127–146.
  2. Al-Suraihi, W.A.; Samikon, S.A.; Al-Suraihi, A.A.; Ibrahim, I. Employee Turnover: Causes, Importance and Retention Strategies. Eur. J. Bus. Manag. Res. 2021, 6, 10.
  3. Yashu; Sharma, R.; Jain, A.; Manwal, M. Enhancing Human Resource Management through Deep Learning: A Predictive Analytics Approach to Employee Retention Success. In Proceedings of the 2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS), Bangalore, India, 28–29 June 2024; pp. 1–4.
  4. Di Prima, C.; Cepel, M.; Kotaskova, A.; Ferraris, A. Help me help you: How HR analytics forecasts foster organizational creativity. Technol. Forecast. Soc. Change 2024, 206, 123540.
  5. Jain, P.K.; Jain, M.; Pamula, R. Explaining and predicting employees’ attrition: A machine learning approach. SN Appl. Sci. 2020, 2, 757.
  6. Pan, Y.; Zhan, P. The Impact of Sample Attrition on Longitudinal Learning Diagnosis: A Prolog. Front. Psychol. 2020, 11, 1051.
  7. Weiss, M.; Zacher, H. Still Waters Run Deep: How Employee Silence Affects Instigated Workplace Incivility over Time. J. Bus. Ethics 2025, 20, 587–604.
  8. Veloso, E.F.R.; Da Silva, R.C.; Dutra, J.S.; Fischer, A.L.; Trevisan, L.N. Talent Retention Strategies in Different Organizational Contexts and Intention of Talents to Remain in the Company. RISUS—Rev. Inovação Sustentabilidade 2014, 5, 49.
  9. Salas-Vallina, A.; Alegre, J.; López-Cabrales, Á. The challenge of increasing employees’ well-being and performance: How human resource management practices and engaging leadership work together toward reaching this goal. Hum. Resour. Manag. 2021, 60, 333–347.
  10. Geerts, J.M. Maximizing the Impact and ROI of Leadership Development: A Theory- and Evidence-Informed Framework. Behav. Sci. 2024, 14, 955.
  11. Hubbart, J.A. Organizational change: The challenge of change aversion. Adm. Sci. 2023, 13, 162.
  12. D’amicantonio, S.; Kulangara, M.K.; Darshan Mehta, H.; Pal, S.; Levantesi, M.; Polignano, M.; Purificato, E.; De Luca, E.W. A Comprehensive Strategy to Bias and Mitigation in Human Resource Decision Systems. In Proceedings of the 5th Italian Workshop on Explainable Artificial Intelligence, Bolzano, Italy, 26–27 November 2024; pp. 11–27.
  13. Naoum, R. A Framework for Integrating AI-Powered Systems to Mitigate Bias Risk in HRM Functions. Mark. Menedzsment 2025, 59, 52–61.
  14. Bar-Gil, O.; Ron, T.; Czerniak, O. AI for the people? Embedding AI ethics in HR and people analytics projects. Technol. Soc. 2024, 77, 102527.
  15. Ali, A.; Jayaraman, R.; Azar, E.; Maalouf, M. A comparative analysis of machine learning and statistical methods for evaluating building performance: A systematic review and future benchmarking framework. Build. Environ. 2024, 252, 111268.
  16. Quinteros, D.M. Predictive Modelling of Employee Attrition Using Deep Learning. Acadlore Trans. AI Mach. Learn. 2023, 2, 212–225.
  17. Nandal, M.; Grover, V.; Sahu, D.; Dogra, M. Employee Attrition: Analysis of Data Driven Models. EAI Endorsed Trans. Internet Things 2024, 10, 1–10.
  18. Chung, D.; Yun, J.; Lee, J.; Jeon, Y. Predictive Model of Employee Attrition Based on Stacking Ensemble Learning. SSRN Electron. J. 2022.
  19. Athey, S.; Imbens, G.W. The State of Applied Econometrics: Causality and Policy Evaluation. J. Econ. Perspect. 2017, 31, 3–32.
  20. Moraes, F.; Manuel Proença, H.; Kornilova, A.; Albert, J.; Goldenberg, D. Uplift Modeling: From Causal Inference to Personalization. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 5212–5215.
  21. De Caigny, A.; Coussement, K.; Verbeke, W.; Idbenjra, K.; Phan, M. Uplift modeling and its implications for B2B customer churn prediction: A segmentation-based modeling approach. Ind. Mark. Manag. 2021, 99, 28–39.
  22. Singh, S.S.K.; Kumar Sinha, A.; Pandey, T.N.; Acharya, B.M. A Machine Learning Approach to Compare Causal Inference Modelling Strategies in the Digital Advertising Industry. In Proceedings of the 2023 2nd International Conference on Ambient Intelligence in Health Care (ICAIHC), Bhubaneswar, India, 17–18 November 2023; pp. 1–7.
  23. Wager, S.; Athey, S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. J. Am. Stat. Assoc. 2018, 113, 1228–1242.
  24. Künzel, S.R.; Sekhon, J.S.; Bickel, P.J.; Yu, B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proc. Natl. Acad. Sci. USA 2019, 116, 4156–4165.
  25. Chernozhukov, V.; Chetverikov, D.; Demirer, M.; Duflo, E.; Hansen, C.; Newey, W.; Robins, J. Double/debiased machine learning for treatment and structural parameters. Econom. J. 2018, 21, C1–C68.
  26. Bibi, N.; Ahsan, A.; Anwar, Z. Project resource allocation optimization using search based software engineering—A framework. In Proceedings of the Ninth International Conference on Digital Information Management (ICDIM 2014), Phitsanulok, Thailand, 29 September–1 October 2014; pp. 226–229.
  27. Yoshimura, M.; Fujimi, Y.; Izui, K.; Nishiwaki, S. Decision-making support system for human resource allocation in product development projects. Int. J. Prod. Res. 2006, 44, 831–848.
  28. Certa, A.; Enea, M.; Galante, G.; Manuela La Fata, C. Multi-objective human resources allocation in R&D projects planning. Int. J. Prod. Res. 2009, 47, 3503–3523.
  29. Hasan, R.; Dattana, V.; Mahmood, S. Dialectical search: A cognitively inspired framework for balancing solution quality and computational cost in global optimization. J. Umm Al-Qura Univ. Eng. Archit. 2025, 1–15.
  30. Alsheref, F.K.; Fattoh, I.E.; Ead, W.M. Automated Prediction of Employee Attrition Using Ensemble Model Based on Machine Learning Algorithms. Comput. Intell. Neurosci. 2022, 2022, 7728668.
  31. Jung, Y.; Tian, J.; Bareinboim, E. Estimating Identifiable Causal Effects through Double Machine Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021; Volume 35, pp. 12113–12122.
Figure 1. Conceptual Architecture of the CDO Framework.
Figure 2. Dialectical–Counterfactual Loop Diagram.
Figure 3. Attrition Risk Distribution by Actual Attrition Status.
Figure 4. Estimated Average Causal Effects per Intervention (Illustrative).
Figure 5. Intervention Allocation Map by Attrition Risk Segment (Illustrative).
Figure 6. Global Feature Contribution Comparison: Risk vs. Treatment Drivers (Illustrative).
Figure 7. Integrated Analysis Dashboard for the Saudi Employee Attrition Dataset.
Table 1. Algorithmic Summary of the CDO Framework.
Step | Phase | Function | Explanation
1 | Attrition Risk Stratification | M_risk(x_e) → r_e | Train a predictive model such as Random Forest to predict the probability of attrition (r_e) for each employee (e).
2 | Causal Effect Estimation | PSM(Y, T, X) → ATE_k | Use PSM to estimate the ATE for each potential intervention (k).
3 | Dialectical Optimization | Optimize(r_e, ATE_k, w_e, B) → π(e) | Apply a dialectical search heuristic to assign the best affordable interventions (π) to high-priority employees under a budget (B), weighting by the Employee Importance Score (w_e).
4 | Consistent Feature Analysis | CFCA(φ_risk, φ_causal) → Φ | Reconcile SHapley Additive exPlanations (SHAP) values (φ) from the risk model and the propensity score model to identify a set of consistently important features (Φ).
Table 2. Summary of ATEs from the Illustrative Simulation.
Intervention | Mean Effect (ATE) | Lower CI | Upper CI | Significance
Bonus | −0.012666 | −0.053916 | 0.028585 | Not Significant
Promotion | 0.108416 | 0.048126 | 0.168705 | Significant
Training | −0.040247 | −0.082938 | 0.002443 | Not Significant
Table 3. Predictive Model Performance Comparison.
Model | AUC Score
k-NN | 0.6008
Random Forest (Primary) | 0.5653
Table 4. Causal Effect Estimates (ATE) via Propensity Score Matching.
Intervention | ATE
Promotion | −0.2393
Compensation (Bonus) | −0.0555
Table 5. Optimization Summary.
Metric | Value
Total Budget Allocated | $100,000.00
Total Employees Targeted | 10
Primary Intervention Assigned | Promotion
Table 6. Fairness Metrics (Gender) for the Empirical Allocation Plan.
Gender | Population Share (%) | Budget Share (%)
Female | 56.93 | 20.00
Male | 43.07 | 80.00
Table 7. Ablation Study Results with Risk-Weighted Metric (Empirical).
Framework Version | Allocation Strategy | Budget Used | Employees Treated | Total Attrition Reduction | Risk-Weighted Attrition Reduction
Scenario A | Risk-Prioritised + Cheapest | $97,500 | 13 | 0.7211 | 0.7108
Full CDO Framework | Risk-Prioritised + Causal ROI | $100,000 | 10 | 2.3929 | 2.3402
Scenario B | ROI-Only (No Risk Priority) | $100,000 | 10 | 2.3929 | 1.7270
Scenario C | Risk-Prioritised + Raw Uplift | $100,000 | 10 | 2.3929 | 2.3402
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
