1. Introduction
Customer retention has become a critical challenge for businesses across various industries, including telecommunications, retail, banking, insurance, healthcare, education, and subscription-based services. Customer churn—customers discontinuing their relationship with a company—can significantly impact revenues, with annual churn rates ranging from 20% to 40% in some sectors [1]. Research indicates that acquiring a new customer is five to twenty-five times more expensive than retaining an existing one, making churn prevention a strategic priority for companies [2].
Machine Learning (ML) and Deep Learning (DL) have emerged as powerful tools for churn prediction due to their ability to analyze large, high-dimensional, and dynamic customer datasets effectively. Traditional churn prediction methods, such as rule-based systems and statistical modeling, often fail to adequately capture the complexities of customer behaviour. Conversely, ML approaches such as Decision Trees (DTs), Random Forests (RFs), Support Vector Machines (SVMs), and boosting algorithms (e.g., XGBoost, LightGBM, CatBoost) have demonstrated strong predictive capabilities on structured datasets [3,4,5]. Furthermore, advanced DL architectures—including Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Transformer-based models—provide significant advantages for modeling sequential and unstructured data, such as customer interaction histories and textual feedback.
Despite these technological advancements, several critical challenges remain in churn prediction. Model interpretability is a significant concern, especially with complex DL-based approaches, which often function as “black-box” models [6]. Data imbalance is another prevalent issue, as churn datasets typically contain significantly fewer churners than non-churners, potentially biasing model predictions [5]. Additionally, concept drift—the evolving nature of customer behaviour over time—complicates the sustained accuracy of predictive models.
This literature review systematically explores advancements in customer churn prediction by analyzing peer-reviewed research published between 2020 and 2024 across diverse domains such as telecommunications, retail, banking, healthcare, education, and insurance. It aims to map the current landscape of ML and DL approaches, evaluating their strengths, limitations, and applicability to real-world scenarios. Given the broad adoption of predictive analytics across industries, this review seeks to clarify the evolution of these methodologies, the specific challenges they address, and the gaps that require further research.
A key objective of this study is to identify and categorize the most frequently employed ML and DL techniques used in churn prediction. Understanding the evolution of these methods over recent years provides insights into how businesses and researchers have refined approaches to enhance accuracy and adaptability. Additionally, this review evaluates the performance and interpretability of various predictive models, focusing specifically on their capacity to manage imbalanced datasets, dynamic customer behaviours, and practical deployment constraints. Considering that customer churn results from multiple factors—such as transaction histories, engagement patterns, and external market conditions—it is crucial to assess the effectiveness of models in capturing these complexities.
Another central goal is highlighting persistent challenges and limitations within churn prediction research. Despite substantial progress, issues such as the black-box nature of DL models, class imbalance, and difficulty adapting models to evolving customer behaviours impede real-world implementations. This review emphasizes these research gaps and suggests potential areas for future investigation, including improving model transparency, advancing feature engineering techniques, and developing adaptive learning methods to address shifting customer preferences.
While this review synthesizes a broad body of recent literature on customer churn prediction, we intentionally refrain from presenting a direct comparison of the studies’ reported performance metrics (e.g., accuracy, F1-score, AUC). This decision reflects the substantial heterogeneity observed across studies in dataset characteristics, imbalance ratios, feature sets, modeling objectives, and evaluation protocols.
Specifically, models were trained and validated on various public and proprietary datasets drawn from diverse industries (e.g., telecommunications, banking, e-commerce), often with distinct definitions of churn, time windows, and input modalities. Evaluation metrics also varied widely, with some studies prioritising business-oriented outcomes and others focusing on statistical measures. As such, any attempt to aggregate or compare these results directly would risk introducing misleading interpretations and overgeneralizations.
Instead, this review focuses on identifying methodological trends, the taxonomy of modeling strategies, and common challenges and innovations. Where appropriate, we highlight representative studies that exemplify key methodological advances without asserting quantitative superiority. We encourage future benchmark studies using standardized datasets and experimental protocols to conduct rigorous performance comparisons, ideally incorporating statistical significance testing under controlled conditions.
To address these objectives, this study is guided by three fundamental research questions:
RQ1: What are the predominant ML and DL approaches used in customer churn prediction, and how have these methodologies evolved over time?
RQ2: How do different predictive models compare in terms of accuracy, adaptability, and interpretability when applied to churn prediction across various industries?
RQ3: What are the significant challenges and limitations in existing churn prediction research, and what future directions can be explored to enhance the effectiveness of predictive models?
This review synthesizes current research to inform both academic and industry practices. This work’s specific contributions and novel aspects are outlined in the following subsection.
Contributions and Novelty
This study offers several distinct contributions that differentiate it from prior reviews on customer churn prediction:
Most Recent and Comprehensive Scope: We systematically review peer-reviewed research published between January 2020 and December 2024, encompassing recent advances such as CNN-based architectures, hybrid deep learning frameworks, and profit-driven modelling approaches. Earlier reviews predominantly focus on pre-2020 literature and therefore do not capture these emerging trends.
PRISMA-Guided and Reproducible Methodology: Our search and selection strategy adheres to the PRISMA 2020 guidelines, ensuring methodological transparency and reproducibility. We employ a two-phase review process: an initial bibliometric analysis of 240 studies followed by an in-depth synthesis of 61 key papers, whereas existing reviews often lack such a structured and replicable approach.
Novel Hierarchical Taxonomy: We introduce a new hierarchical taxonomy that categorizes ML and DL approaches into fine-grained subgroups (e.g., profit-centric models, optimization/metaheuristics, adaptive learning, explainable AI). This taxonomy provides a systematic framework for mapping the methodological landscape, a feature absent in earlier works.
Integration of Bibliometric and Methodological Insights: In addition to methodological synthesis, we conduct a comprehensive bibliometric analysis, including publisher trends, citation dynamics, and open-access effects, to contextualize the research landscape. Previous reviews focus exclusively on models and do not incorporate dissemination-oriented analyses.
Identification of Emerging Challenges Supported by Evidence-Based Trends: We identify challenges such as class imbalance, concept drift, and the limited adoption of business-oriented evaluation metrics, linking them to representative studies published between 2020 and 2024. This evidence-driven mapping of trends provides a more precise and up-to-date perspective than the generic limitations discussed in earlier surveys.
By clearly delineating these contributions, this review makes its novelty and value explicit, offering actionable insights for academic researchers and industry practitioners engaged in customer retention analytics.
2. Purpose of the Study
Customer churn prediction is vital in modern Customer Relationship Management (CRM), helping businesses proactively retain at-risk customers and maximize customer lifetime value. With high churn rates leading to substantial revenue losses, businesses in subscription-based services, telecommunications [1,7], retail [8], banking [9], education [10], healthcare [11], insurance [12], and other sectors increasingly rely on data-driven approaches to enhance customer retention strategies.
While businesses collect vast amounts of customer data, extracting actionable insights from these datasets is challenging. Data mining, a key discipline in ML and artificial intelligence, enables organizations to uncover hidden patterns and trends in churn behaviours. However, the effectiveness of churn prediction models varies significantly based on the choice of methodology, dataset characteristics, and industry-specific factors.
This study systematically reviews 240 research articles published between 2020 and 2024, focusing on churn prediction using ML and DL methodologies across various sectors. The review:
Examines different churn prediction approaches across multiple industries.
Assesses the comparative performance of ML and DL techniques in churn prediction.
Investigates common challenges, such as data imbalance, feature selection, interpretability, and concept drift.
Highlights emerging trends in churn prediction, including profit-driven modeling, explainable AI (XAI), and adaptive learning approaches.
Churn prediction research is crucial for developing effective retention strategies, allowing businesses to anticipate customer attrition, personalize marketing efforts, and allocate retention budgets more efficiently. Studies suggest that businesses implementing advanced churn prediction techniques can improve retention rates by 5–10%, leading to profit increases of 25–95% [13].
By synthesizing insights from recent research, this paper serves as a valuable resource for researchers, data scientists, and industry practitioners, helping them understand best practices, methodological advancements, and future directions in churn prediction.
For more information, readers can refer to several comprehensive review papers that explore various aspects of customer churn prediction. Imani and Arabnia [3] provide a comparative analysis of hyperparameter optimization techniques and data sampling strategies in ML models for churn prediction, highlighting their impact on predictive performance. The authors in [5] extend this analysis by evaluating the effectiveness of SMOTE, ADASYN, and GNUS upsampling techniques in conjunction with RF and XGBoost under different class imbalance levels. Geiler et al. [14] offer a broad survey of ML approaches for churn prediction, discussing their strengths, limitations, and practical applications. Domingos et al. [15] focus on hyperparameter tuning for DL-based churn prediction models, particularly within the banking sector, providing insights into optimizing deep neural networks for improved accuracy. Together, these studies offer valuable perspectives on the methodological advancements and challenges of churn prediction research.
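The upsampling techniques compared in these reviews (e.g., SMOTE and ADASYN) generate synthetic minority samples by interpolating between neighbours; a minimal pure-Python sketch of the simpler random-oversampling baseline conveys the class-balancing idea they share. The function name and toy dataset are illustrative only, not drawn from any of the cited studies.

```python
import random

def random_oversample(X, y, minority_label=1, seed=42):
    """Duplicate minority-class rows until the two classes are balanced.

    SMOTE and ADASYN go further, interpolating *new* synthetic points
    between minority neighbours; plain duplication shows the core idea.
    Assumes binary labels where the majority class is labeled 0.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [x for x, label in zip(X, y) if label != minority_label]
    # Draw random duplicates of minority rows until counts match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    X_bal = majority + minority + extra
    y_bal = [0] * len(majority) + [minority_label] * (len(minority) + len(extra))
    return X_bal, y_bal

# Toy 9:1 imbalanced set: 9 non-churners (label 0), 1 churner (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 9 + [1]
X_bal, y_bal = random_oversample(X, y)
print(sum(y_bal), len(y_bal))  # 9 18 -> classes are now balanced
```

Duplication leaves the minority distribution unchanged, which is why interpolation-based methods such as SMOTE often generalize better on continuous features.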
3. Search Strategies
A systematic literature search was conducted across six major academic publishers (Springer, IEEE, Elsevier, MDPI, ACM, and Wiley), ensuring comprehensive coverage of recent advancements in customer churn prediction using ML and DL techniques. The search was executed via Lens.org, a scholarly research platform offering advanced filtering and indexing capabilities superior to generic search engines such as Google Scholar.
To refine the search, the query “(churn prediction AND machine learning) OR (churn prediction AND deep learning) NOT (“survey” OR “review”)” was applied, focusing on original research contributions rather than survey or review articles. Additionally, results were restricted to journal and conference proceedings articles published between 2020 and 2024, ensuring relevance to recent developments. The KStem-based stemming approach was utilized to normalize variations of the term “churn,” such as “churned” and “churning,” to capture a broader range of relevant studies. The final search was conducted on 15 January 2025. Visualizations and plots were produced using Python 3.13, employing the matplotlib and seaborn libraries to ensure clarity and reproducibility of graphical results.
As illustrated in Figure 1, the initial search retrieved 837 articles. To ensure relevance and quality, a series of refinement steps was applied. First, filtering by document type to include only journal and conference articles while excluding pre-prints, technical reports, and other non-peer-reviewed documents reduced the count to 679 articles. Next, restricting the selection to high-quality publishers—as previously outlined—further refined the dataset to 368 articles. Finally, a domain-specific review was conducted to eliminate papers unrelated to customer churn prediction or those not utilizing ML and DL techniques. This resulted in a final selection of 240 articles for the first phase (shallow review phase). This exploratory phase analyzed broad research trends, methodological patterns, and key developments in customer churn prediction using ML and DL approaches, focusing on high-level bibliometric analysis: publication trends across research domains, the distribution of ML and DL techniques, average citation trends by publisher (Crossref citations), citation patterns, and the share of publications among publishers over the past five years (2020–2024). By analyzing these broader trends, this phase provided a foundation for identifying the most influential studies, emerging research directions, and methodological advancements.
A second phase (deep review phase) was conducted to ensure a more focused and rigorous examination, in which 61 papers were selected based on relevance, citation impact, methodological novelty, and contribution to the field. This phase delved into the technical depth of the selected studies, focusing on critical aspects such as dataset characteristics, applied ML and DL techniques, evaluation metrics, and the key outcomes reported. This two-phase strategy captured broad research trends while providing a granular understanding of methodological advancements, dataset challenges, and performance benchmarks, ensuring both breadth and depth in assessing the state of the art in customer churn prediction research.
The inclusion criteria are outlined below:
Articles must focus on churn prediction using ML or DL techniques.
Articles published between 2020 and 2024 in peer-reviewed, high-quality journals.
Articles must be original research papers.
Articles published in English.
The exclusion criteria are outlined below:
Articles unrelated to churn prediction.
Articles unrelated to ML or DL.
Non-peer-reviewed works (e.g., lecture notes, newsletters, dissertations).
Low-quality publishers.
Review papers, preprints, books, etc.
Non-English publications.
This systematic approach, grounded in a well-documented filtering process and adherence to PRISMA guidelines, ensures the reproducibility of this literature review. All inclusion criteria, search strings, and filtering steps have been explicitly outlined to facilitate replication by future researchers.
Two reviewers (MI and MJ) collaboratively screened titles and abstracts for relevance, resolving disagreements through discussion. One reviewer (MI) extracted study characteristics and methodological details for data collection, while the second reviewer (MJ) cross-checked for accuracy. No automation tools or contact with study authors were used during these processes.
For each included study, data were extracted on the primary outcomes of interest: ML/DL techniques employed, evaluation metrics (e.g., accuracy, F1-score, ROC-AUC, PR-AUC), and key findings related to methodological challenges such as class imbalance, concept drift, and model interpretability. Additional variables collected included publication year, application domain (e.g., telecommunications, banking, healthcare), dataset characteristics (public, private, or synthetic), and study citation metrics. All data were extracted as reported in the original publications; no imputation or conversions were applied.
Studies were grouped for synthesis using a two-phase approach: a shallow review phase (240 studies) to identify broad methodological trends and a deep review phase (61 studies) for detailed analysis. Results were tabulated and visually displayed using summary tables and figures to illustrate trends in ML/DL techniques, performance metrics, and application domains. Narrative synthesis was performed to summarize methodological patterns and challenges, as a meta-analysis was not feasible due to heterogeneity in study designs, datasets, and evaluation metrics. No subgroup analyses or sensitivity analyses were conducted, given the qualitative focus of this review.
We did not perform a formal risk of bias assessment or reporting bias assessment, as the review aimed to synthesize methodological trends rather than evaluate the quality of individual studies. Similarly, a formal certainty assessment (e.g., using GRADE) was not applied. Future systematic reviews conducting quantitative synthesis or meta-analyses should consider incorporating these assessments using standardized tools such as ROBIS, AMSTAR 2, or GRADE. This systematic review was retrospectively registered in the Open Science Framework (OSF) under DOI: https://doi.org/10.17605/OSF.IO/PZ2H7.
4. Trends in Churn Prediction Research
To comprehensively investigate the state of churn prediction research, we systematically reviewed 240 publications spanning the years 2020 to 2024. This five-year window was chosen to capture current trends and reflect the rapid advancements in ML and DL applications. The broad scope of this initial pool enabled us to analyze significant trends in publisher distribution, citation dynamics, average citation variations, research domain focus, and the adoption of various ML and DL techniques. All studies excluded during the screening process failed to meet the predefined inclusion criteria (e.g., they did not employ ML/DL techniques, did not address churn prediction, or were non-peer-reviewed). No studies that initially appeared to meet inclusion criteria were excluded during full-text review.
From this more extensive set, we selected 61 studies for deeper qualitative examination. This subset was identified based on multiple criteria, including methodological rigor, novelty of approach, domain diversity, and overall contribution to the field. By combining a wide-ranging quantitative overview with a focused, in-depth analysis of key studies, our methodology ensures an expansive mapping of churn prediction research and a thorough investigation of the most influential and innovative work. This dual-level strategy thus provides readers with a robust understanding of current practices, emerging challenges, and future directions in churn prediction using ML and DL techniques.
Figure 2 presents the overall distribution of publications by publisher. The pie chart illustrates that IEEE accounts for the largest share, with 60.4% of the total publications. Springer and Elsevier follow, at 12.9% and 11.2%, respectively, while MDPI contributes 7.1% of the dataset. ACM and Wiley account for the remaining 5.8% and 2.5%, respectively. These percentages highlight the dominant position of IEEE among the publishers represented in this study.
Figure 3 further explores the temporal dimension of these publications from 2020 through 2024. IEEE exhibits a marked increase in published papers, peaking in 2023. In contrast, the other publishers remain relatively steady, though minor fluctuations can be observed from year to year. Notably, the apparent decline in publications for 2024 is likely attributable to incomplete indexing during data extraction (January 2025). Given that not all 2024 publications may have been processed and included in our study by that point, the downward trend for 2024 should be interpreted with caution. These figures suggest that IEEE consistently leads in publication output, while other publishers maintain comparatively smaller yet stable shares over the examined period.
Figure 4 and Figure 5 illustrate the number of citations and normalized impact factor trends for the selected publishers (Elsevier, IEEE, MDPI, Springer, Wiley, and ACM) from 2020 to 2024.
Figure 4 shows that Elsevier exhibited the highest total citations in 2020, followed by a noticeable decline in subsequent years. Other publishers, including IEEE and MDPI, display smaller but still discernible peaks in earlier years, with a tendency toward reduced citation counts in 2023 and 2024. These observations align with the typical pattern in bibliometric analyses, whereby earlier publications have a longer window to accumulate citations.
Figure 5 illustrates the normalized impact factor trends of the publishers from 2020 to 2024. To ensure a fair comparison of citation performance across publication years, we computed a normalized impact factor (IF) by dividing the total number of citations received by the number of published papers and the number of years since publication. This approach accounts for the varying time windows available for papers to accumulate citations, thus mitigating the bias that favors earlier publications. The formula used is as follows:

Normalized IF = Total Citations / (Number of Papers × Years Since Publication)
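The normalization described above can be sketched in a few lines of Python (the language the study reports using for its analyses). The function name and the example figures below are illustrative, not values taken from this review.

```python
def normalized_impact_factor(total_citations, num_papers, pub_year, extraction_year=2025):
    """Average citations per paper per year: total citations divided by
    (number of papers x years since publication), as defined in the text."""
    # Clamp to at least one year so papers from the extraction year
    # do not cause a division by zero.
    years_since_publication = max(extraction_year - pub_year, 1)
    return total_citations / (num_papers * years_since_publication)

# Hypothetical example: 500 citations across 25 papers published in 2021,
# measured at data extraction time (January 2025).
print(normalized_impact_factor(500, 25, 2021))  # 500 / (25 * 4) = 5.0
```

Dividing by publication age is what lets a well-cited 2023 cohort compare fairly against a 2020 cohort that has had four years to accumulate citations.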
As shown in Figure 5, Elsevier and MDPI consistently outperform other publishers in terms of normalized impact across most years. Elsevier exhibits strong performance in 2020 (above 10 citations per paper per year), dips in 2022, and then peaks again in 2023, suggesting a combination of high-impact publications and efficient visibility. MDPI demonstrates a steep rise in 2021—reaching nearly 10 citations per paper per year—followed by a gradual decline in subsequent years while maintaining relatively strong citation performance through 2023. Springer shows a downward trend from 2020 to 2022 but stabilizes at around three citations per paper per year by 2023. Wiley, like MDPI, peaks in 2021, followed by a moderate but steady decline. IEEE and ACM display lower and more stable citation patterns across the years, with values remaining mostly below 2, indicating consistent but modest average citation rates.
While the normalized impact factor accounts for the time since publication, a general decline is still observed in 2024 across most publishers. This may reflect several factors, including recent shifts in publication strategies, article topics, quality changes, or early-stage visibility. Moreover, papers published in 2024 may not yet be fully indexed or cited at the time of data extraction (January 2025), especially for journals with delayed indexing pipelines. As such, citation-based metrics from the most recent year should be interpreted with caution, as they may underestimate the eventual long-term impact of these publications.
Overall, the trends reveal significant year-to-year variation in normalized citation performance among publishers, underscoring the roles of editorial policy, topical focus, and dissemination strategies. By adjusting for publication age, the normalized impact factor offers a fairer and more time-independent comparison, particularly when analyzing performance across both recent and earlier publication years.
Figure 6 illustrates the overall distribution of citation counts for the collected publications, revealing a highly skewed pattern. Most papers receive only a few citations (fewer than five), while a relatively small number of publications accumulate notably higher citation counts. This right-skewed distribution is typical in bibliometric analyses, wherein most publications garner modest attention, whereas a limited subset gains substantial visibility and, consequently, higher citation impact.
Figure 7 presents the normalized impact factor trends—the average number of citations per paper per year—for Open Access (OA) and Non-Open Access (non-OA) publications from 2020 to 2024. Across all years, OA papers consistently outperform non-OA articles in terms of citation impact, with robust performance in 2020 and 2021. This trend supports the notion that OA publishing may enhance the visibility and discoverability of research, thereby increasing its citation potential. Although the normalized metric accounts for the time since publication, a noticeable decline is observed for both OA and non-OA papers in 2024. This may reflect limited early-stage visibility, indexing delays, or publication lags that hinder citation accumulation; articles published close to the data extraction date (January 2025) may not yet be fully indexed or cited, especially in journals with slower indexing pipelines. The lower values for the most recent year should therefore be interpreted cautiously, as they may not reflect the long-term influence of those publications.
Figure 8 presents the annual distribution of publications across six research domains—Telecom, Retail, Banking, Education, Healthcare, and Insurance—from 2020 to 2024. Across most domains, the overall trend is gradual growth from 2020 through 2023, followed by a slight decline in 2024. Telecom shows a pronounced increase in publications up to 2023, indicating a sustained research focus on churn prediction within that sector. Healthcare and Education also exhibit steady upward trajectories, reflecting broader interest in applying churn-related methodologies to patient retention and student engagement. Retail and Banking maintain moderate but consistent growth, while Insurance remains comparatively lower throughout the observed period. The apparent drop in 2024 publications for all domains is likely influenced by the shorter window for indexing at the time of data extraction (January 2025), and it does not necessarily indicate a waning research interest.
Figure 9 presents the time series trends of ML and DL techniques in churn prediction from 2020 to 2024. ML methods exhibit a steady upward trend, indicating their widespread adoption. In contrast, DL publications remain relatively low but show gradual growth. The apparent decline in 2024 should be interpreted cautiously, as many papers from this year may not yet be fully indexed or have had sufficient time to gain citations and visibility.
Figure 10 depicts the annual usage of seven ML algorithms—Boosting Techniques (including XGBoost, LightGBM, and CatBoost), K-Nearest Neighbors, RF, DT, SVM, Naïve Bayes, and Logistic Regression—between 2020 and 2024. Boosting Techniques, RF, and Logistic Regression show notable growth through 2022–2023, suggesting increased research interest in ensemble-based methods and widely used baseline models. While most techniques experienced a slight dip in 2024, this is likely due to incomplete indexing and the relatively short time since publication at the time of data extraction (January 2025).
Figure 11 focuses on DL approaches—ANNs, LSTMs, CNNs, Recurrent Neural Networks (RNNs), Transformers, and Reinforcement Learning—over the same period. ANNs exhibit a pronounced surge in 2022, reflecting their broad applicability in diverse domains. LSTMs and CNNs also show moderate yet consistent usage, while Transformers and Reinforcement Learning remain less frequent but appear to have gained modest traction in recent years. Like the ML trends, the lower counts for 2024 likely do not capture the full extent of ongoing research activity, underscoring the need to interpret these recent-year values cautiously. Overall, the data reveal a continued shift toward advanced ML and DL techniques, albeit tempered by the time-dependent nature of publication and indexing cycles.
While the primary focus of this review is on methodological advancements in churn prediction, analyzing where and how research is published offers complementary insights into the dissemination and visibility of the field. The distribution of publications across major academic publishers and the temporal trends in citation activity help illustrate the growing attention to churn prediction across domains such as telecommunications, banking, and healthcare. For example, the predominance of IEEE publications may reflect historical engagement with machine learning applications in telecommunications and a concentration of conference-style contributions. While citation trends at the publisher level cannot be directly linked to specific methods or studies, they may suggest broader patterns in research visibility, accessibility (e.g., open access availability), and perceived relevance. As such, these bibliometric observations contextualize, not evaluate, the methodological developments reviewed in this study.
5. Categorization of Papers
In our review, we propose a comprehensive taxonomy that systematically organizes the literature on churn prediction into two primary methodological categories: Machine Learning Approaches and Deep Learning Approaches. Each category is further subdivided into specific subcategories, as illustrated in Figure 12.
The ML Approaches encompass a range of techniques, including profit-centric models, which optimize retention strategies based on business impact, and ensemble and hybrid approaches, which combine multiple classifiers to improve predictive performance. Optimization and metaheuristic methods also focus on refining feature selection and hyperparameter tuning, while adaptive and resampling techniques address data imbalance and concept drift. The review also covers explainable and interpretable models, which enhance transparency in churn prediction, data-centric and augmentation strategies that leverage novel data sources and synthetic data generation, and traditional ML techniques, which continue to play a foundational role in churn modeling.
On the other hand, DL approaches leverage advanced architectures to capture complex patterns in customer behaviour. These include deep reinforcement learning, which enables adaptive decision-making, and temporal and sequential models, such as LSTMs, which capture evolving churn patterns over time. The taxonomy also highlights hybrid and ensemble DL approaches, which integrate multiple DL frameworks for improved generalization, and CNN-based models, which excel in feature extraction. It further covers feedforward deep neural networks, NLP-based models for text-based churn analysis, and representation and feature-interaction techniques that enhance predictive performance by capturing high-order dependencies.
As noted in the Introduction, direct comparison of reported performance metrics was avoided due to substantial heterogeneity in datasets, evaluation protocols, and modeling objectives across studies. Instead, a descriptive synthesis of individual study results is presented.
By structuring the existing research into this hierarchical framework, our taxonomy provides a clear perspective on the evolution of churn prediction methodologies. It underscores how different approaches have been tailored to address the multifaceted challenges of churn modeling, from enhancing predictive accuracy and scalability to improving interpretability and data efficiency.
6. Machine Learning Approaches
Machine learning methodologies have significantly enhanced churn prediction through diverse approaches to address complex customer retention challenges across various sectors. Recent research encompasses profit-driven models, ensemble learning techniques, optimization-based methods, adaptive resampling strategies, explainable artificial intelligence (XAI), and traditional algorithms. Each methodology contributes distinct advantages such as improved predictive accuracy, enhanced interpretability, computational efficiency, and alignment with business objectives. This section reviews these innovative approaches, outlining their methodologies, data characteristics, and performance evaluations, thereby providing valuable guidance for selecting suitable ML techniques for specific churn prediction applications.
Table 1 briefly summarizes each study by indicating the dataset types used (public, private, or synthetic), ML techniques employed, and performance metrics evaluated.
6.1. Profit-Centric Approaches
Recent developments in churn prediction research reflect a growing emphasis on aligning predictive models with business objectives, particularly profitability. Traditionally, churn models have been optimized for accuracy-based metrics such as AUC, but a shift toward integrating financial considerations directly into model training has emerged as critical for more impactful customer retention strategies.
Höppner et al. [16] exemplify this shift by introducing ProfTree, a profit-driven DT tailored explicitly for churn prediction. Rather than solely optimizing classification accuracy, ProfTree employs the Expected Maximum Profit for Customer Churn (EMPC) metric to construct DTs prioritising profitability. The model systematically accounts for misclassification costs and customer-specific economic value through an evolutionary algorithm. Experiments on telecommunication datasets demonstrate that ProfTree significantly enhances profit compared to conventional accuracy-centric approaches, underscoring the importance of profit-centric predictive analytics.
Building on similar principles, Maldonado et al. [17] propose a profit-oriented churn prediction model utilizing Minimax Probability Machines (MPM). Unlike traditional methods that often use profitability metrics only during post-model selection or threshold adjustments, this approach directly integrates profit maximization into the classifier’s training objective. Their framework includes a baseline model and two regularized variants incorporating LASSO and Tikhonov regularization to ensure robust generalization. Benchmark evaluations confirm that these profit-driven MPM extensions yield superior profitability outcomes relative to standard binary classifiers, emphasizing the necessity of embedding business objectives directly into predictive modeling.
Extending this perspective into the business-to-business (B2B) domain, Janssens et al. [18] introduce B2Boost, an instance-dependent gradient boosting model explicitly designed for B2B churn scenarios. Recognizing customer heterogeneity in profitability, they propose the Expected Maximum Profit for B2B churn (EMPB) metric to guide model training. B2Boost directly optimizes customer-specific profit rather than traditional classification accuracy, yielding notable profit improvements over standard approaches. The successful application in B2B contexts highlights the broader potential of profit-centric methodologies beyond consumer markets.
These studies underscore the necessity of shifting predictive modeling practices toward profit-centric frameworks. By directly incorporating financial objectives, churn prediction models become more aligned with strategic business goals, facilitating more effective and economically beneficial customer retention efforts.
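The profit-centric idea can be illustrated with a deliberately simplified sketch: instead of optimizing accuracy, choose the decision threshold that maximizes expected campaign profit given each customer's lifetime value. This is not the EMPC or EMPB metric from the studies above; the `contact_cost` and `success_rate` parameters are hypothetical assumptions.

```python
import numpy as np

def best_profit_threshold(scores, churned, clv, contact_cost=10.0, success_rate=0.3):
    """Pick the score threshold that maximizes expected campaign profit.

    Targeting a customer costs `contact_cost`; if the customer would have
    churned, the campaign retains them with probability `success_rate`,
    recovering their lifetime value `clv`. (Illustrative assumptions only.)
    """
    best_t, best_profit = 0.5, -np.inf
    for t in np.unique(scores):
        targeted = scores >= t
        # Expected profit: recovered CLV from true churners minus contact costs.
        profit = success_rate * np.sum(clv[targeted & churned]) - contact_cost * targeted.sum()
        if profit > best_profit:
            best_t, best_profit = t, profit
    return best_t, best_profit

# Toy example: three likely churners with high CLV, two loyal customers.
scores  = np.array([0.9, 0.8, 0.7, 0.2, 0.1])
churned = np.array([True, True, True, False, False])
clv     = np.array([500., 400., 300., 200., 100.])
t, profit = best_profit_threshold(scores, churned, clv)
```

Note how the accuracy-optimal and profit-optimal thresholds can differ: lowering the threshold catches more churners but pays more contact costs, so the optimum depends on the economics, not the error rate.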
6.2. Ensemble and Hybrid ML Approaches
Ensemble and hybrid approaches have emerged as robust methodologies for enhancing customer churn prediction across various industries. By integrating multiple classifiers, clustering techniques, and advanced feature engineering methods, these approaches harness the strengths of individual models to mitigate the limitations of single-algorithm solutions. This section provides a comprehensive review of key studies that have demonstrated the effectiveness of ensemble and hybrid learning in churn prediction, highlighting their contributions to predictive accuracy, model robustness, and real-world applicability.
While both hybrid and ensemble approaches combine multiple models, their integration strategies differ. Ensemble methods, such as bagging, boosting, and stacking, aim to improve generalization by aggregating the predictions of several base learners, typically of the same or different types, without altering the original algorithms. In contrast, hybrid methods integrate distinct algorithms sequentially or in parallel, where one model’s output or feature transformation becomes the input for another. For example, a hybrid model might use clustering for customer segmentation, followed by classification within each segment, or combine feature engineering via CNNs with temporal modeling via LSTMs. Hybrid systems are generally more customized and often domain-specific, whereas ensemble methods follow standardized combining rules like majority voting or weighted averaging.
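The distinction can be sketched with scikit-learn on synthetic data: a stacking ensemble aggregates base learners through a meta-learner, while a simple hybrid pipeline clusters customers first and fits a separate classifier per segment. All model choices here are illustrative, not drawn from any specific study above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Ensemble: heterogeneous base learners combined by a stacked meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression())
stack_acc = stack.fit(X_tr, y_tr).score(X_te, y_te)

# Hybrid: segment customers first, then fit a separate classifier per segment.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)
per_segment = {c: DecisionTreeClassifier(random_state=0)
                    .fit(X_tr[km.labels_ == c], y_tr[km.labels_ == c])
               for c in np.unique(km.labels_)}
seg_te = km.predict(X_te)
hybrid_preds = np.array([per_segment[c].predict(x[None, :])[0]
                         for c, x in zip(seg_te, X_te)])
hybrid_acc = float(np.mean(hybrid_preds == y_te))
```

The ensemble follows a standardized combining rule (stacking), whereas the hybrid wires one algorithm's output (cluster assignments) into the training of another, exactly the structural difference described above.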
One notable study by Liu et al. [28] introduces a hybrid approach that integrates clustering and classification algorithms to improve predictive accuracy in the telecom sector. Their model employs k-means, k-medoids, and random clustering techniques alongside classifiers such as Gradient Boosting Trees (GBT), DTs, RFs, DL, and Naïve Bayes (NB). The study reports significant performance improvements by leveraging stacking-based hybridization, with accuracies of 96% and 93.6% on the Orange and Cell2Cell datasets, respectively. These results emphasize the benefits of ensemble learning and clustering-based feature enhancement in churn prediction. Similarly, Ramesh et al. [24] propose a hybrid model combining ANNs and RFs to enhance churn prediction in telecommunications. Their ANN architecture, consisting of four hidden layers, achieved 90.34% accuracy, outperforming standalone RF and simpler ANN models. Integrating ANN’s predictive power with RF’s robustness effectively identifies churn factors, aiding telecom companies in proactive customer retention strategies.
Also employing hybrid approaches, Usman-Hamza et al. [25] introduce Intelligent Decision Forest (DF) models to address scalability issues and class imbalance in telecom churn prediction. Their approach significantly enhances classification accuracy by incorporating Logistic Model Tree (LMT), RF, and Functional Trees (FT) within a weighted soft voting and stacking framework. The study underscores the potential of decision forest-based models in handling imbalanced datasets and improving churn detection across telecommunications.
Saias et al. [26] focus on churn prediction within cloud service providers, emphasizing the importance of early detection in mitigating customer loss and optimizing resource allocation. Their ML framework evaluates multilayer neural networks, AdaBoost, and RF models, with RF emerging as the most effective, achieving an accuracy of 98.8% and an AUC score of 0.997. These findings reinforce the relevance of ensemble learning in dynamic service industries.
In the context of the webcasting industry, Fu et al. [30] employ an ensemble learning-based churn prediction model optimized by the Nelder-Mead algorithm. Their approach extracts high-dimensional behavioural features from time-series data, introducing a novel churn indicator to enhance label accuracy. The study demonstrates superior operational efficiency and outperforms traditional ensemble models, offering actionable insights for customer retention strategies.
Optimization techniques have also been explored to refine ensemble methods. Khoh et al. [32] introduce an optimized weighted ensemble model tailored for the telecommunications industry, integrating Powell’s optimization algorithm to assign differential weights to base learners based on their predictive strength. This model achieves an accuracy of 84% and an F1-score of 83.42%, surpassing conventional ML approaches. Yogesh et al. [29] further contribute to this domain by proposing a two-layer flexible voting ensemble, demonstrating the impact of data balancing on improving classification performance.
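A minimal sketch of weight optimization for an ensemble, in the spirit of the Powell-based weighting described above (though not the actual published model): SciPy's Powell method searches for blending weights that minimize validation error over the base learners' predicted probabilities.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

models = [LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
          RandomForestClassifier(random_state=1).fit(X_tr, y_tr)]
probs = np.stack([m.predict_proba(X_val)[:, 1] for m in models])  # shape (2, n_val)

def val_error(w):
    w = np.abs(w) / (np.abs(w).sum() + 1e-12)   # normalize to a convex combination
    blended = w @ probs                          # weighted average of probabilities
    return np.mean((blended >= 0.5) != y_val)

# Powell's derivative-free method: suitable because error rate is not smooth.
res = minimize(val_error, x0=np.array([0.5, 0.5]), method="Powell")
weights = np.abs(res.x) / np.abs(res.x).sum()
```

Stronger base learners end up with larger weights; a derivative-free optimizer is the natural choice here because the validation error is a step function of the weights.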
Boosted tree models have gained traction in various industries for their efficiency in churn prediction. Maretta et al. [21] explore the use of XGBoost, LightGBM, and CatBoost in banking churn prediction, finding LightGBM to be the most effective with 91.4% accuracy, 94.8% AUC, and 87.7% recall. Similarly, Tianpei et al. [22] implement a stacking-based ensemble framework combining XGBoost, Logistic Regression, DTs, and Naïve Bayes, achieving 98.09% accuracy by incorporating feature grouping techniques.
A novel direction in ensemble learning is explored by Arshad et al. [33], who introduce Q-Ensemble Learning, a quantum-enhanced ensemble approach incorporating Quantum Support Vector Machine (Q-SVM), Quantum k-Nearest Neighbors (Q-kNN), and Quantum Decision Tree (QDT). By integrating blockchain technology for data security and transparency, their model outperforms classical ensemble models, achieving 15% higher accuracy and 12% higher precision, demonstrating the transformative potential of quantum computing in churn prediction.
Ensemble methods have also been applied to e-commerce churn prediction. Ishrat et al. [27] present an AI-driven framework that combines model tuning, feature selection, and comparative analysis, achieving 100% accuracy and F1-score using CatBoost. Manohar et al. [23] investigate a collective data mining approach integrating SVMs, Bayesian Classifiers, and RF, highlighting the benefits of combining multiple classifiers for improved accuracy and recall.
Other studies have focused on refining traditional ensemble techniques. Mahayasa et al. [31] propose a weighted average ensemble combining XGBoost and RF, demonstrating superior predictive performance in the telecom and insurance sectors, with F1-scores of 0.850 and 0.947, respectively. Hemlata et al. [20] explore Logistic Regression and Logit Boost for telecom churn prediction, confirming the efficacy of boosting techniques in outperforming conventional regression models.
Finally, Wang et al. [19] provide a comparative analysis of widely used classification algorithms for churn prediction, reinforcing the importance of ensemble learning in enhancing model performance. Their benchmarking study offers valuable guidance for businesses seeking data-driven retention strategies.
These studies illustrate the diverse and practical applications of ensemble and hybrid approaches in customer churn prediction. By integrating multiple ML models and leveraging sophisticated feature engineering techniques, these methodologies provide robust, scalable, and high-performing solutions to the complex challenge of customer retention across various industries.
6.3. Optimization and Metaheuristic Approaches
Optimization and metaheuristic approaches have gained prominence in churn prediction research as effective strategies for enhancing model performance and reducing computational complexity. These studies offer robust frameworks that improve predictive accuracy and provide greater interpretability and actionable insights by integrating advanced feature selection techniques, hyperparameter tuning, and metaheuristic algorithms. This section reviews key contributions that employ these techniques to optimize churn prediction models across various domains.
Feature selection plays a critical role in improving model efficiency and accuracy. Saheed et al. [35] introduce an ML-based churn prediction framework for the telecommunications sector, leveraging Information Gain and Ranker-based feature selection to enhance model interpretability. Their approach, which incorporates SVM, Multi-Layer Perceptron (MLP), RF, and Naïve Bayes, achieves a 95.02% accuracy rate, surpassing the 92.92% obtained without feature selection. These results highlight the importance of selecting relevant churn-related attributes for improved classification performance.
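Information-gain-style feature ranking can be approximated with scikit-learn's mutual information scorer; this sketch is a generic illustration of filter-based selection, not a reproduction of the pipeline described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 20 features, only 5 informative: rank by mutual information
# (the entropy-based quantity behind "information gain") and keep the top 8.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
pipe = make_pipeline(SelectKBest(mutual_info_classif, k=8),
                     RandomForestClassifier(random_state=0))
acc = cross_val_score(pipe, X, y, cv=5).mean()
```

Placing the selector inside the pipeline keeps the ranking inside each cross-validation fold, avoiding the leakage that occurs when features are selected on the full dataset first.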
Building on feature selection techniques, Al-Shourbaji et al. [38] propose a novel hybrid method, ACO-RSA, which integrates Ant Colony Optimization (ACO) with the Reptile Search Algorithm (RSA) to enhance predictive performance. Evaluated across multiple open-source churn datasets, ACO-RSA outperforms Particle Swarm Optimization (PSO), Multi-Verse Optimizer (MVO), and Grey Wolf Optimizer (GWO), demonstrating its effectiveness in handling high-dimensional telecom data. This study underscores the potential of metaheuristic approaches in refining feature selection for improved churn detection.
Pustokhina et al. [36] introduce the ISMOTE-OWELM model, which integrates Improved SMOTE (ISMOTE) for data balancing with an Optimal Weighted Extreme Learning Machine (OWELM) for classification. A Multi-objective Rain Optimization Algorithm (MOROA) optimizes sampling rates and model parameters, yielding 94%, 92%, and 90.9% accuracy across three telecom datasets, significantly surpassing traditional approaches. The study emphasizes the effectiveness of ISMOTE-OWELM in improving churn detection while maintaining computational efficiency, making it a valuable tool for telecom providers aiming to enhance customer retention efforts.
Incorporating hyperparameter tuning into feature selection, Mirabdolbaghi et al. [37] present a comprehensive model optimization framework integrating Principal Component Analysis (PCA), Autoencoders, Linear Discriminant Analysis (LDA), t-SNE, and XGBoost for feature reduction. Their approach employs Bayesian and genetic optimization to fine-tune LightGBM models, significantly outperforming AdaBoost, SVM, and DT classifiers. The study also utilizes SHAP for feature importance interpretation and introduces a Customer Lifetime Value (CLV) ranking system, offering actionable insights for prioritising high-value customers at risk of churn.
Koçoğlu et al. [42] present an Extreme Learning Machine (ELM) approach for customer churn prediction, optimized using grid search for hyperparameter tuning. Using a churn dataset from the UCI Machine Learning Repository, the study compares ELM’s performance against Naïve Bayes, k-Nearest Neighbor, and SVM models. ELM achieves the highest accuracy at 93.1%, highlighting its efficiency due to minimal parameter tuning requirements, and underscoring its potential as a robust and effective technique for churn analysis.
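The grid search tuning pattern itself is generic; since ELM implementations are not part of mainstream libraries, the sketch below applies the same pattern to an SVM stand-in, with a hypothetical parameter grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Exhaustive search over a small hyperparameter grid, scored by 3-fold CV.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}, cv=3)
grid.fit(X_tr, y_tr)
test_acc = grid.score(X_te, y_te)   # held-out accuracy of the best configuration
```

Grid search is tractable precisely when, as with ELM, the model has few hyperparameters; the grid's size grows multiplicatively with each added parameter.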
Metaheuristic optimization has also been explored to enhance gradient boosting techniques. AlShourbaji et al. [39] propose the Enhanced Gradient Boosting Model (EGBM), which integrates an SVM RBF base learner with PSO and Artificial Ecosystem Optimization (AEO) for hyperparameter tuning. Evaluated on seven telecom datasets, EGBM demonstrates superior predictive capabilities compared to traditional GBM and SVM models, effectively addressing premature convergence and enhancing customer retention strategies.
Hybrid optimization approaches further improve churn prediction efficiency. Kurtcan et al. [40] introduce PCA-GWO-SVM, a model combining Principal Component Analysis (PCA) for feature selection, Grey Wolf Optimization for hyperparameter tuning, and SVM for classification. Compared to logistic regression, k-nearest neighbors, naïve Bayes, and DTs, PCA-GWO-SVM achieves higher accuracy, recall, and F1-score, reinforcing the value of combining optimization techniques with classification frameworks.
Ponnusamy et al. [41] employ a PSO-SVM-based algorithm to enhance churn prediction performance in the banking sector. By optimizing hyperparameters using Particle Swarm Optimization, their approach significantly outperforms traditional SVM models, demonstrating the effectiveness of hybrid optimization strategies for financial institutions seeking to minimize customer attrition. Similarly, Venkatesh et al. [34] propose an Optimal Genetic Algorithm (OGA) with SVM for cloud-based churn prediction. Their approach utilizes a double-chain quantum genetic algorithm to fine-tune SVM hyperparameters, achieving high sensitivity (94.50), accuracy (90.27), and an F-score of 94.30. These findings underscore the effectiveness of genetic optimization in enhancing predictive performance, making it a promising technique for large-scale cloud-based analytics.
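The common pattern in these studies, a population-based metaheuristic searching SVM hyperparameters against a cross-validation objective, can be sketched with SciPy's differential evolution standing in for PSO, GWO, or genetic algorithms (none of which ship with SciPy):

```python
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=3)

def neg_cv_accuracy(params):
    # Search in log-space so C and gamma span several orders of magnitude.
    C, gamma = 10.0 ** params[0], 10.0 ** params[1]
    return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

# Population-based, derivative-free search over bounded log-hyperparameters.
result = differential_evolution(neg_cv_accuracy, bounds=[(-2, 2), (-3, 0)],
                                maxiter=5, popsize=6, seed=3, tol=1e-3)
best_C, best_gamma = 10.0 ** result.x[0], 10.0 ** result.x[1]
best_acc = -result.fun
```

Any of the metaheuristics above slots into the same structure: only the update rule for the candidate population changes, while the cross-validated objective stays the same.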
These studies illustrate how optimization and metaheuristic approaches significantly improve churn prediction models’ accuracy, efficiency, and interpretability. By integrating advanced feature selection, hyperparameter tuning, and metaheuristic optimization, these methodologies provide scalable and high-performing solutions for industries grappling with complex customer data, ultimately enhancing retention strategies and business decision-making.
6.4. Adaptive and Resampling Approaches
In dynamic environments where customer behaviour and data distributions continuously evolve, addressing class imbalance and adapting to concept drift are critical challenges in churn prediction. Researchers have increasingly turned to resampling and adaptive learning strategies to enhance model performance in real-time applications. This section reviews key studies that employ these techniques to mitigate imbalances and adapt predictive models to changing data patterns, ensuring more accurate and reliable churn detection.
Ahmad et al. [43] introduce the Optimized Two-Sided Cumulative Sum Churn Detector (OTCCD), a novel adaptive churn prediction framework for telecom data streams. By integrating the Synthetic Minority Over-sampling Technique (SMOTE) for data balancing and a cumulative sum control chart for drift detection, OTCCD efficiently identifies shifts in customer behaviour within a sliding window framework. Experimental evaluations on real-world telecom datasets, such as Call Detail Records, demonstrate that OTCCD outperforms traditional methods by providing higher accuracy and faster drift detection. This study highlights the importance of real-time adaptability in churn prediction models, offering telecom companies a robust tool for proactive customer retention strategies.
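The control-chart component can be illustrated with a textbook two-sided CUSUM detector on a simulated churn-rate stream; the allowance `k` and threshold `h` are illustrative values, and this is far simpler than the full OTCCD framework.

```python
import numpy as np

def two_sided_cusum(stream, target, k=0.05, h=0.5):
    """Return the first index where a two-sided CUSUM chart signals drift.

    `target` is the expected in-control mean, `k` the allowance (slack),
    and `h` the decision threshold: a simplified textbook CUSUM, not OTCCD.
    """
    g_pos = g_neg = 0.0
    for i, x in enumerate(stream):
        g_pos = max(0.0, g_pos + (x - target - k))   # upward-shift statistic
        g_neg = max(0.0, g_neg + (target - x - k))   # downward-shift statistic
        if g_pos > h or g_neg > h:
            return i
    return None  # no drift detected

rng = np.random.default_rng(0)
stable  = rng.normal(0.10, 0.02, 100)   # in-control: churn rate around 10%
drifted = rng.normal(0.25, 0.02, 50)    # behaviour shift: churn rate jumps
alarm = two_sided_cusum(np.concatenate([stable, drifted]), target=0.10)
```

Because CUSUM accumulates small deviations over time, it flags the sustained shift a few observations after it begins while staying silent through the noisy in-control period.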
Adnan et al. [44] propose an adaptive learning approach that integrates evolutionary computation with a Naïve Bayes classifier to address class imbalance in telecommunications churn prediction. By dynamically adjusting model parameters based on incoming data patterns, the hybrid method significantly improves precision, recall, and F1 scores compared to traditional approaches. Evaluations on real-world telecom datasets confirm the model’s effectiveness in proactively identifying at-risk customers, underscoring the potential of adaptive learning in minimizing revenue loss due to customer churn.
Complementing adaptive methodologies, Shimaa et al. [46] develop a hybrid churn prediction framework that combines XGBoost with SMOTE-ENN resampling to balance datasets and improve classification accuracy. This integration enhances precision, recall, and F1 scores, outperforming conventional ML techniques across three telecom datasets. By effectively addressing class imbalance and leveraging ensemble learning, the model facilitates proactive retention strategies, reinforcing the role of resampling techniques in churn prediction.
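The core SMOTE idea, generating synthetic minority samples by interpolating between minority-class neighbours, can be sketched in a few lines; note this omits the ENN cleaning step, and scikit-learn's GradientBoostingClassifier stands in for XGBoost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create synthetic minority samples on line segments between each
    sample and a random one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                      # column 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]   # skip self at column 0
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# Imbalanced churn-like dataset: roughly 10% positives (churners).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

X_min = X_tr[y_tr == 1]
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())     # fill up to a 1:1 ratio
X_bal = np.vstack([X_tr, smote_oversample(X_min, n_new)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

clf = GradientBoostingClassifier(random_state=4).fit(X_bal, y_bal)
churn_recall = recall_score(y_te, clf.predict(X_te))
```

Only the training split is resampled; the test split keeps its natural imbalance so that the reported recall reflects deployment conditions.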
Incorporating a more customer-centric approach, Lee et al. [45] propose a hybrid churn prediction framework that dynamically models churn probability based on customer lifetime value rather than fixed periods. By segmenting customers into groups such as new, short-term, high-value, and churn-prone users, their methodology applies tailored ML models to enhance predictive accuracy. Evaluations on datasets from a U.K. gift seller and Pakistan’s largest e-commerce platform show recall scores ranging from 0.56 to 0.72 in one case and 0.91 to 0.95 in the other. The study highlights the advantages of integrating statistical modeling with ML techniques to refine customer retention strategies while reducing data requirements.
These studies illustrate how adaptive and resampling approaches effectively address class imbalance and concept drift, enabling more scalable and robust churn prediction solutions. By integrating real-time learning, resampling techniques, and evolutionary optimization, these methodologies provide powerful tools for businesses seeking to enhance customer retention strategies in evolving market conditions.
6.5. Explainable and Interpretable Approaches
Understanding the underlying decision processes in complex predictive tasks such as churn prediction is crucial for gaining stakeholder trust and facilitating actionable insights. Recent research has increasingly focused on integrating interpretability and explainable AI techniques into churn prediction models. This section reviews key contributions that enhance model transparency through rule-based formulations, SHAP analyses, and other XAI methodologies.
De Bock et al. [47] introduce Spline-Rule Ensemble classifiers with Structured Sparsity Regularization (SRE-SGL) as an interpretable approach to customer churn prediction. While traditional ML models often prioritise predictive accuracy, this study emphasizes the need for explainable models that provide actionable insights into customer behaviour. The proposed spline-rule ensembles integrate tree-based ensemble methods with regression analysis, balancing model flexibility and simplicity. However, conventional rule-based ensembles can become excessively complex due to conflicting components. To address this, the authors incorporate Sparse Group Lasso regularization, which enhances interpretability by enforcing structured sparsity. Evaluations across fourteen real-world datasets demonstrate that SRE-SGL outperforms standard rule ensembles in AUC and top decile lift while maintaining competitive predictive performance. A case study in the telecommunications sector further illustrates the model’s interpretability, reinforcing the value of structured regularization in making churn prediction both effective and explainable.
Extending interpretability techniques to workforce analytics, Mitravinda et al. [48] investigate employee attrition prediction using ML models and XAI methodologies. Their study applies SHAP to identify key factors driving attrition and visualize their impact. Additionally, the research introduces a recommendation system leveraging user-based collaborative filtering to propose personalized retention strategies. By combining predictive modeling with actionable insights, this study demonstrates how XAI techniques can inform more effective employee retention policies.
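The ranking workflow these studies rely on (fit a model, then attribute predictive influence to features) can be sketched without extra dependencies using permutation importance as a stand-in for SHAP values; the attribution method differs, but the question answered is analogous.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy stand-in for an attrition dataset: with shuffle=False, the 3
# informative features occupy columns 0-2, so we can check the ranking.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

model = RandomForestClassifier(random_state=5).fit(X_tr, y_tr)
# Shuffle each feature on held-out data and measure the accuracy drop.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=5)
ranking = result.importances_mean.argsort()[::-1]   # most influential first
```

SHAP additionally gives signed, per-instance attributions, which is what enables the per-employee explanations described above; permutation importance only yields a global ranking.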
In digital entertainment, Wang et al. [49] address the challenge of player churn prediction in online video games, where understanding social interaction dynamics is critical. While ML models are widely used for player behaviour analysis, their black-box nature limits adoption by product managers and game designers. To bridge this gap, the study restructures model inputs into explicit and implicit features, enhancing expert interpretability. Furthermore, the research highlights the necessity of XAI techniques that explain feature contributions and provide actionable recommendations for reducing churn. The proposed approach is validated through two case studies involving expert feedback and a within-subject user study, demonstrating its effectiveness in improving decision-making for player retention strategies.
Together, these studies illustrate the crucial role of interpretability in churn prediction models. By integrating advanced XAI techniques, researchers bridge the gap between high predictive performance and the need for transparent, actionable insights. This integration supports more informed and effective retention strategies across diverse industries, reinforcing the value of explainable AI in real-world predictive analytics.
6.6. Data-Centric and Augmentation Approaches
Beyond refining predictive models, recent research in churn prediction has increasingly emphasized enhancing the quality and diversity of training data. Data-centric and augmentation approaches seek to enrich traditional datasets by incorporating novel data sources, generating synthetic data, and leveraging advanced feature engineering techniques. These strategies are crucial for improving model robustness, addressing data imbalances, and achieving higher predictive accuracy. This section reviews key contributions that exemplify these efforts.
Vo et al. [50] explore a novel churn prediction approach that integrates unstructured call log data with traditional structured data. While existing ML models primarily rely on demographic and account history data, this study highlights the untapped potential of analyzing spoken content from customer interactions. Using natural language processing techniques, the authors process a large-scale call center dataset containing two million calls from over 200,000 customers. Their findings demonstrate that incorporating unstructured call data significantly enhances prediction accuracy while providing deeper insights into customer behaviour. Additionally, interpretable ML techniques extract personality traits and customer segmentation patterns, facilitating personalized retention strategies. This study underscores the importance of combining structured and unstructured data sources to develop more comprehensive churn prediction frameworks in the financial services industry.
Soumi et al. [51] address the challenge of optimizing training data quality through a representation-based query strategy for churn prediction. Given the high cost and inefficiency of manual data annotation, the authors propose Entropy-based Min-Max Similarity (E-MMSIM), an active learning algorithm inspired by protein sequencing techniques. This method selects the most informative and representative data points for annotation, reducing redundancy and improving model efficiency. The approach enhances topic classification accuracy in customer service messages, yielding significant improvements in F1-score, AUC, and overall model performance. Moreover, when these qualitative features are integrated with structured customer data, churn prediction models achieve a 5% performance gain. The study highlights the critical role of data selection strategies in optimizing ML workflows for customer retention management.
In the realm of synthetic data generation, Wang et al. [52] explore the impact of data-centric AI on churn prediction. Unlike traditional model-centric AI, which focuses on hyperparameter tuning and algorithm modifications, data-centric AI enhances predictive performance by improving training data quality and distribution. This research evaluates various data synthesis algorithms, examining their effects on data balancing, augmentation, and substitution. The findings underscore the potential of resampling methods in mitigating class imbalance and improving model robustness, providing valuable insights for AI-driven churn prediction frameworks across industries.
Babak et al. [53] introduce a social network-based churn prediction model, recognizing that social interactions and peer behaviour often influence customer churn. The study develops a feature engineering approach incorporating influence and conformity indices derived from call network data. By integrating social connectivity metrics, the model significantly enhances the predictive power of standard ML classifiers, particularly gradient boosting models. This research demonstrates that churn is not solely an individual decision but is shaped by broader social dynamics, a perspective that extends beyond telecommunications to various industries where peer influence affects customer behaviour.
Collectively, these studies illustrate the transformative impact of data augmentation and quality improvement in churn prediction. Researchers are developing more comprehensive and robust predictive frameworks by incorporating novel data sources, employing active learning for data selection, generating synthetic data, and leveraging social network information. These advancements enhance model accuracy and provide deeper insights into customer behaviour, enabling more effective and proactive retention strategies.
6.7. Traditional ML Approaches
Traditional machine learning approaches significantly influence churn prediction by leveraging established statistical and algorithmic techniques. These methods rely on classical models and feature engineering to derive actionable insights and achieve high predictive accuracy. This section highlights key studies that exemplify the application of conventional ML methodologies across diverse domains.
Tianyuan et al. [55] present a data-driven approach to customer churn prediction in telecommunications, incorporating customer segmentation to enhance predictive accuracy. Using Fisher discriminant analysis and logistic regression, their model achieves a 93.94% accuracy rate on telecom datasets, effectively identifying potential churners. Tailoring predictions to specific customer groups enhances the precision of retention campaigns, providing telecom operators with a powerful tool to proactively reduce churn and improve profitability. The study underscores the significance of segmentation in refining churn prediction models.
Expanding on customer relationship management (CRM) applications, Šimović et al. [56] explore churn prediction using big data analytics to analyze heterogeneous customer behaviours, such as self-care service usage, service duration, and responsiveness to marketing efforts. Their study introduces an enhanced logistic regression model with a mixed penalty term to mitigate overfitting and balance feature selection. Empirical evaluation on a large CRM dataset demonstrates high classification performance across standard metrics, reinforcing the potential of penalized logistic regression as a scalable and computationally efficient approach to churn modeling in big data environments.
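A "mixed penalty" of this kind is plausibly akin to elastic net regularization, which blends L1 (sparsity, implicit feature selection) and L2 (shrinkage against overfitting) terms; the sketch below uses scikit-learn's elastic-net logistic regression on synthetic data and is an assumption about, not a reproduction of, the paper's formulation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=6)
# l1_ratio mixes the two penalties: 0 is pure L2 (ridge), 1 is pure L1 (lasso).
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=0.5, max_iter=5000))
acc = cross_val_score(model, X, y, cv=5).mean()
```

The L1 component drives uninformative coefficients toward zero, which is what makes a penalized logistic regression both scalable and partially self-selecting on high-dimensional CRM features.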
Jakob et al. [58] extend traditional ML techniques to the digital health sector, investigating early user churn in a weight loss app. By analyzing engagement data from 1283 users and 310,845 event logs, the study employs an RF model to predict user dropout based on daily login counts. Achieving an F1 score of 0.87 on day 7 and identifying 93% of churned users, the study highlights how churn prediction can enable personalized retention strategies in digital health interventions, ultimately improving long-term user engagement and health outcomes.
Returning to the telecommunications industry, Sikri et al. [59] develop an ML-based approach for improving customer retention. By analyzing customer demographics, usage patterns, and service details, the study applies DTs and SVM to identify customers at risk of churning. The results demonstrate high predictive accuracy, empowering telecom companies to implement targeted retention strategies effectively. This study reaffirms the value of conventional ML techniques in customer retention efforts.
Expanding on real-time prediction applications, Nyashadzashe et al. [54] develop a churn prediction model tailored for the telecommunications industry, specifically focusing on prepaid customers who frequently switch providers. Using Watson Studio, their study employs big data analytics within the CRISP-DM framework and evaluates three ML algorithms—Logistic Regression, RF, and DT. While Logistic Regression exhibited the lowest misclassification rate (2.2%), RF and DT achieved relatively high accuracy rates (78.3% and 79.2%, respectively) but suffered from misclassification rates above 20%. This research underscores the limitations of relying solely on accuracy metrics and advocates for more comprehensive evaluation techniques to enhance real-time churn prediction performance.
Beyond customer churn, AbdElminaam et al. [57] introduce EmpTurnoverML, an AI-driven approach for predicting employee turnover and customer churn using ML algorithms. The study evaluates various classification techniques, including K-Nearest Neighbors, DTs, Logistic Regression, RF, SVM, AdaBoost, Naïve Bayes, and Gradient Boosted Machines (GBM), using an 80-20 train-test split. By identifying key patterns associated with employee departures, the study highlights how AI-powered prediction models can help organizations implement proactive retention strategies, reducing hiring and training costs while enhancing workforce stability. The findings demonstrate the broader applicability of churn prediction methodologies in workforce analytics and business efficiency.
These studies illustrate the continued relevance of conventional ML approaches in churn prediction. Through rigorous model development and strategic feature engineering, these methodologies provide potent tools for organizations seeking to mitigate churn, improve customer and employee retention, and drive sustainable business growth. Overall, traditional ML methods such as decision trees, logistic regression, and support vector machines remain valued for their interpretability, computational efficiency, and ease of deployment. However, they may struggle with high-dimensional or sequential data, and their performance is often limited compared to more advanced ensemble approaches.
7. Deep Learning Approaches
Deep learning techniques have significantly advanced churn prediction by offering diverse methodologies that address complex user behaviour patterns and industry retention challenges. Recent advancements include deep reinforcement learning, sequential modeling with architectures like LSTMs, hybrid and ensemble methods integrating multiple DL paradigms, CNNs tailored for structured data, efficient feedforward neural networks, and innovative representation learning and feature interaction models. Each category provides unique strengths, such as improved accuracy, enhanced interpretability, or computational efficiency, collectively supporting proactive and effective churn management strategies. This section explores these distinct approaches, highlighting their applications, advantages, and contributions to predictive analytics.
Table 2 highlights the datasets used (public, private, simulation-based), DL techniques implemented, and performance metrics evaluated.
7.1. Deep Reinforcement Learning Approaches
Deep reinforcement learning approaches represent an emerging paradigm in churn prediction, particularly within dynamic environments such as digital entertainment. These methods go beyond traditional supervised learning by leveraging simulation-based techniques to model complex user behaviours and engagement dynamics. This section highlights a pioneering study that exemplifies the potential of deep reinforcement learning in addressing churn challenges in mobile gaming.
Roohi et al. [60] introduce a novel simulation-based model for predicting churn in mobile gaming. Unlike traditional supervised ML models that rely on historical player data, this work integrates Deep Reinforcement Learning to simulate AI-driven gameplay behaviour, capturing in-game difficulty and player skill evolution. A key strength of this approach is its ability to model player persistence and engagement dynamics without requiring extensive real-world behavioural data. The study demonstrates that incorporating a population-level simulation of player heterogeneity improves churn prediction accuracy, thereby reducing the dependency on expensive retraining of DRL agents. This framework offers a promising direction for churn analysis in digital entertainment, where player retention strategies are critical for revenue sustainability.
7.2. Temporal and Sequential DL Approaches
Temporal and sequential DL approaches have emerged as essential tools for capturing the dynamic nature of customer behaviour in churn prediction. By leveraging temporal dependencies inherent in user engagement data, these models enable a more nuanced understanding of churn patterns, ultimately leading to more effective retention strategies. This section reviews recent studies that utilize deep sequential architectures, such as LSTM networks, to enhance churn prediction performance.
Joy et al. [63] present a hybrid DL approach that integrates sequential modeling with explainable AI to improve churn prediction in streaming services. The proposed framework combines LSTM and Gated Recurrent Unit (GRU) networks to capture temporal trends in user engagement, complemented by LightGBM to refine predictive performance. A key contribution of this study is its emphasis on interpretability, employing Shapley Additive Explanations (SHAP) and Explainable Boosting Machines (EBM) to provide transparency in feature importance rankings. By ensuring that decision-makers understand the reasoning behind churn predictions, the model enhances actionable insights for business applications. The study reports state-of-the-art performance, achieving a 95.60% AUC and a 90.09% F1 score, reinforcing the effectiveness of hybrid architectures in churn analysis.
Expanding on sequential DL techniques, Zhu et al. [61] introduce a trajectory-based LSTM framework (TR-LSTM) for churn prediction, which extracts three trajectory-based features from customer movement data. The model significantly outperforms traditional methods, demonstrating the utility of spatiotemporal behaviour analysis in predicting churn. Similarly, Alboukaey et al. [62] emphasize the importance of daily behavioural patterns by developing an LSTM-based dynamic churn prediction model for mobile telecom customers. Unlike conventional monthly-based models, this approach captures short-term fluctuations in customer activity, enhancing prediction accuracy and allowing for more timely interventions. These findings underscore the superiority of LSTM-based architectures in modeling evolving user engagement patterns, particularly in dynamic service industries.
Further validating the effectiveness of LSTMs, Beltozar-Clemente et al. [64] demonstrate that deep sequential networks can overcome vanishing gradient issues and effectively model long-term dependencies in customer behaviour sequences. Their study achieves 95% performance across multiple evaluation metrics, highlighting the potential of LSTM-based models to refine churn prediction by capturing complex behavioural trends.
Collectively, these studies establish sequential and temporal DL approaches as robust tools for churn prediction. By leveraging LSTM-based architectures, these models offer enhanced predictive accuracy, more profound insights into user behaviour, and timely interventions, making them invaluable for developing proactive retention strategies across various industries.
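None of the cited architectures is reproduced here, but all build on the same LSTM cell, whose standard gating equations can be written out directly. A minimal numpy forward pass over a toy sequence of daily-activity vectors (feature and hidden sizes are arbitrary assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step: four gates computed from input x and previous hidden state."""
    z = W @ x + U @ h_prev + b          # stacked pre-activations, shape (4*H,)
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])                  # input gate
    f = sigmoid(z[H:2*H])                # forget gate
    o = sigmoid(z[2*H:3*H])              # output gate
    g = np.tanh(z[3*H:4*H])              # candidate cell state
    c = f * c_prev + i * g               # cell state carries long-term memory
    h = o * np.tanh(c)                   # hidden state exposed to the next layer
    return h, c

# Toy run: 5 time steps of 3 behavioural features, hidden size 4.
rng = np.random.default_rng(1)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_cell(x, h, c, W, U, b)
```

The multiplicative forget gate `f` is what lets the cell state propagate information across long sequences without the vanishing gradients noted above.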
7.3. Ensemble and Hybrid DL Approaches
Ensemble and hybrid DL approaches have gained significant traction in churn prediction due to their ability to combine the strengths of multiple models and overcome the limitations of individual architectures. These approaches achieve enhanced predictive accuracy and improved generalization across diverse application domains by integrating DL techniques, such as RNNs, CNNs, and attention mechanisms, with ensemble methods and optimization algorithms. This section highlights key studies that exemplify the effectiveness of hybrid and ensemble strategies in churn prediction.
Jajam et al. [66] introduce an ensemble model that integrates Stacked Bidirectional LSTMs (SBLSTM) and RNNs with an arithmetic optimization algorithm (AOA). The framework is fine-tuned using an improved Gravitational Search Optimization Algorithm (IGSA), achieving a state-of-the-art accuracy of 97.89% in the insurance domain. These results highlight the potential of ensemble architectures to effectively merge multiple DL techniques, improving generalization and performance in churn prediction tasks.
Similarly, Liu et al. [65] present a fused attentional DL model (AttnBLSTM-CNN) that integrates Bidirectional LSTMs (BiLSTM) and CNNs to address the limitations of standalone RNNs and CNNs. By incorporating an attention mechanism, the model enhances prediction accuracy by prioritising critical customer behaviour patterns. The study demonstrates that integrating attention layers into DL pipelines improves churn detection accuracy and enhances interpretability, providing valuable insights for financial institutions.
Expanding on hybrid architectures in the financial sector, Van-Hieu et al. [68] propose a DL ensemble model for customer churn prediction in banking. The approach employs a stacked DL architecture where Level 0 integrates three distinct deep neural networks, and Level 1 utilizes a logistic regression model for final prediction. Tested on the Bank Customer Churn Prediction dataset, the framework achieves 96.60% accuracy, 90.26% precision, 91.91% recall, and an F1-score of 91.07%. These results highlight the robustness of combining DL models with logistic regression to improve churn prediction accuracy, reinforcing the value of ensemble methodologies in financial customer retention strategies.
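The two-level stacking pattern described above can be sketched generically. The level-0 "models" below are trivial stand-ins for the study's deep networks, and the data is synthetic; only the structure (base-model predictions feeding a level-1 logistic combiner) mirrors the cited design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(Z, y, lr=0.5, steps=300):
    """Level-1 meta-learner: plain logistic regression on base-model outputs."""
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        w -= lr * Z.T @ (sigmoid(Z @ w) - y) / len(y)
    return w

# Level 0: stand-in scorers (the real study uses three deep neural networks).
base_models = [
    lambda X: sigmoid(X[:, 0]),          # scores from feature 0
    lambda X: sigmoid(X[:, 1]),          # scores from feature 1
    lambda X: sigmoid(X.mean(axis=1)),   # scores from an average
]

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)

Z = np.column_stack([m(X) for m in base_models])  # level-0 prediction matrix
w = fit_logistic(Z, y)                            # level-1 combiner
stacked = sigmoid(Z @ w)
acc = np.mean((stacked > 0.5) == y)
```

In production stacking, the level-0 predictions fed to the meta-learner are usually generated out-of-fold to avoid leaking training labels into Level 1.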
Zhao et al. [67] further enhance churn prediction by integrating unsupervised and supervised learning techniques. Their hybrid model incorporates K-means clustering, entropy-based methods, and customer portrait analysis for segmenting telecom customers. A multi-head self-attention-based nested LSTM classifier is then applied to evaluate customer behaviour. Tested on China’s telecom market data, the model outperforms traditional classification methods by improving the accuracy of customer behaviour recognition. Additionally, it effectively differentiates between medium-value and high-value customers, providing critical insights for precision marketing strategies and enabling telecom companies to tailor service offerings more effectively.
Collectively, these studies illustrate that hybrid and ensemble DL approaches enhance predictive accuracy and improve model interpretability and generalization across sectors. Their innovative integration of diverse methodologies offers promising avenues for developing robust, scalable churn prediction systems that effectively support targeted retention strategies.
7.4. CNN–Based Approaches
Convolutional Neural Networks have emerged as a powerful tool in churn prediction, particularly for tasks requiring complex feature extraction and hierarchical data representation. While traditionally applied to image and text processing, CNN-based approaches have proven effective in structured data scenarios, offering improved predictive accuracy and addressing challenges such as class imbalance and information loss. This section reviews key studies that leverage CNNs—often in combination with other techniques—to enhance churn prediction models.
Muhammad et al. [69] compare DL architectures on benchmark datasets such as Cell2Cell and KDD Cup for churn prediction. Their findings identify CNNs as the most effective model based on multiple evaluation criteria, outperforming traditional ML algorithms and DL models. These results underscore the ability of convolutional architectures to capture hierarchical relationships within customer data, particularly in scenarios where feature extraction poses significant challenges.
Extending CNN applications to workforce analytics, Ebru et al. [70] introduce a hybrid model (ECDT-GRID) for employee churn prediction. This approach integrates Extended Convolutional Decision Trees (ECDT) with grid search optimization to enhance classification accuracy. Unlike conventional CNN applications in image and text processing, this study adapts CNNs for structured numerical data, addressing information loss through DT-based learning. The ECDT-GRID model outperforms CNN, ECDT, and traditional ML models, demonstrating the importance of hyperparameter tuning in improving predictive performance. The study highlights the potential of DL in workforce analytics, particularly in retail, where employee churn impacts operational stability. By combining CNNs with DT structures, this approach provides a robust predictive framework, showcasing the role of DL in optimizing employee retention strategies.
Saha et al. [71] introduce ChurnNet, a novel DL-based churn prediction model tailored for the telecommunications industry (TCI). Recognizing the importance of customer retention in a competitive market, the study aims to enhance predictive accuracy beyond existing methods. ChurnNet integrates a 1D convolutional layer with residual blocks, squeeze-and-excitation blocks, and a spatial attention module, allowing the model to capture complex feature dependencies while mitigating the vanishing gradient problem. The model is evaluated using three public datasets, each exhibiting significant class imbalance, which is addressed through SMOTE, SMOTEEN, and SMOTETomek resampling techniques. Rigorous experimentation, including 10-fold cross-validation, demonstrates that ChurnNet outperforms state-of-the-art models, achieving accuracy scores of 95.59%, 96.94%, and 97.52% across the three datasets. These findings emphasize the potential of DL architectures with attention mechanisms in advancing churn prediction models, making them more effective and interpretable for telecom service providers.
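SMOTE, the core of the resampling family used to balance ChurnNet's training data, synthesizes minority-class samples by interpolating between nearest minority neighbours. A minimal numpy sketch (the toy cluster positions and neighbour count are illustrative assumptions):

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: create n_new synthetic minority samples, each placed on
    the line segment between a random minority point and one of its k nearest
    minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]        # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                          # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(3)
X_major = rng.normal(0.0, 1.0, size=(90, 2))   # non-churners (majority)
X_minor = rng.normal(3.0, 0.5, size=(10, 2))   # churners (minority)
X_new = smote(X_minor, n_new=80, rng=rng)       # balances the two classes
```

Because each synthetic point is a convex combination of two minority samples, the new points stay inside the minority cluster rather than duplicating existing rows, which is what distinguishes SMOTE from naive oversampling.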
These studies highlight the versatility and strength of CNN-based approaches in churn prediction. By addressing challenges such as feature extraction, information loss, and class imbalance, CNNs and their hybrid variants provide robust frameworks that can be adapted to various applications—from customer retention in telecom to employee churn in retail—underscoring their critical role in modern predictive analytics.
7.5. Feedforward Deep Neural Network Approaches
Feedforward deep neural network approaches remain widely used in churn prediction because they can learn complex nonlinear relationships directly from data while maintaining relatively straightforward architectures. These methods, including Extreme Learning Machines, Multi-Layer Perceptrons, and Deep Neural Networks, balance predictive performance and computational efficiency. This section reviews key studies that have leveraged these architectures to achieve robust churn prediction outcomes.
Małgorzata et al. [73] evaluate Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks for churn prediction in mobile telecommunications. Their findings suggest that MLPs achieve near-perfect accuracy (0.999), significantly outperforming traditional fuzzy rule-based and rough-set systems. However, the study also acknowledges the black-box nature of neural networks, emphasizing the need for explainability in DL models to support real-world adoption. These insights highlight the trade-off between model performance and interpretability, an ongoing challenge in deploying DL solutions for churn prediction.
Setyo [72] investigates churn prediction in the telecommunications sector using Deep Neural Networks, comparing their performance against RF and XGBoost. Recognizing the critical impact of customer attrition on business retention, the study incorporates feature selection techniques and evaluates model efficiency using Google Colaboratory with a TensorFlow backend. The results indicate that DNN achieves 80.62% accuracy in just 68 s, outperforming XGBoost (76.45% accuracy, 175 s) and RF (77.87% accuracy, 529 s). These findings highlight DNN’s ability to balance accuracy and computational efficiency, making it a promising alternative for real-time churn prediction in telecommunications.
These studies underscore the potential of feedforward and standard deep neural network approaches to provide robust and efficient churn prediction solutions. At the same time, they highlight the ongoing need to improve model interpretability to enhance adoption and usability in practical business applications.
7.6. NLP–Based DL Approaches
NLP-based deep learning approaches represent an innovative frontier in churn prediction by leveraging unstructured textual data to complement traditional numerical inputs. These methods harness advanced language models and RNNs to extract meaningful insights from customer communications, enriching predictive analytics and enhancing retention strategies. This section highlights a key study that exemplifies the potential of NLP-driven churn prediction.
Ozan [74] offers a unique perspective by applying NLP techniques to CRM data for churn prediction. Utilizing word embeddings alongside RNNs, the study demonstrates that text data—such as customer feedback and service interactions—can be effectively harnessed to predict churn. This approach complements traditional structured data methods and provides deeper insights into customer sentiment and behaviour. The findings suggest that NLP-driven churn prediction models could be particularly beneficial in industries where customer communication is critical in shaping retention strategies.
7.7. Representation and Feature Interaction Approaches
Representation and feature interaction approaches have emerged as promising strategies to enhance churn prediction by capturing complex relationships within customer data. These methods address limitations in traditional deep neural networks, particularly in handling high-order feature interactions and categorical variables. This section reviews key studies that leverage advanced embedding techniques to improve predictive accuracy and interpretability in churn modeling.
Tang et al. [75] introduce a Feature Interaction Network (FIN) designed to overcome challenges standard deep neural network-based churn models face. Traditional models often struggle to capture high-order feature interactions and effectively handle one-hot encoded categorical features. FIN integrates two key components to address this: an entity embedding network to capture meaningful feature representations and a factorization machine network with sliding windows to enhance feature interactions. Experimental evaluations on four public datasets demonstrate that FIN outperforms state-of-the-art models by effectively capturing complex dependencies in customer data. This study underscores the importance of feature interaction modeling in churn prediction, offering a robust framework for leveraging structured customer data in predictive analytics.
In a complementary approach, Cenggoro et al. [76] develop a DL-based vector embedding model tailored for churn prediction in the telecommunications industry, emphasizing interpretability alongside predictive accuracy. By leveraging vector embeddings to represent customer behaviour in a discriminative feature space, the model enables precise differentiation between loyal and churn-prone customers. Experimental results indicate that the model achieves an F1 score of 81.16%, demonstrating strong predictive performance. Additionally, cluster similarity analysis and t-SNE visualizations confirm that the learned representations are highly separable, reinforcing the model’s effectiveness. This study highlights the potential of vector embeddings as a powerful tool for churn modeling, equipping telecom providers with actionable insights for customer re-engagement and retention.
These studies illustrate how embedding and feature interaction techniques can significantly improve churn prediction by capturing nuanced relationships within customer data. By enhancing both predictive performance and interpretability, these approaches offer valuable tools for developing proactive and targeted retention strategies in competitive industries. Deep learning architectures such as CNNs, RNNs, and attention-based models excel at capturing temporal dynamics and complex feature interactions, often achieving superior predictive accuracy. Their main drawbacks are higher computational cost, reliance on large datasets, and reduced interpretability, which can limit adoption in business contexts requiring transparency.
In summary, machine learning and deep learning offer complementary strengths for churn prediction. ML techniques are generally easier to interpret, faster to train, and less resource-intensive, making them suitable for business settings where transparency and efficiency are critical. In contrast, DL models are well-suited to high-dimensional, sequential, and unstructured data, where their ability to learn complex patterns can lead to superior predictive accuracy. Therefore, the choice between ML and DL depends not only on data characteristics but also on practical requirements such as interpretability, scalability, and computational resources.
The included studies (n = 61) were synthesized narratively to highlight methodological trends, dataset usage, and reported performance metrics (see Table 1 and Table 2). No formal risk of bias assessment, reporting bias assessment, or certainty of evidence assessment (e.g., using GRADE) was conducted, as the review focused on methodological analysis rather than quantitative synthesis. Due to substantial heterogeneity in study designs, datasets, and evaluation protocols, meta-analysis was not feasible. Consequently, no investigations of heterogeneity, subgroup analyses, sensitivity analyses, or certainty assessments were performed, and no results were presented for these items.
8. Discussion
8.1. Linking Findings to Research Questions
To provide a direct response to the research questions outlined in the Introduction, we summarise our findings below in relation to each question:
RQ1: What are the predominant ML and DL approaches used in customer churn prediction, and how have these methodologies evolved over time? Our synthesis (Section 6 and Section 7, Table 1 and Table 2) shows that ensemble-based ML techniques—particularly boosting methods such as XGBoost, LightGBM, and CatBoost—remain the most widely adopted across industries, with decision trees and random forests also frequently used as interpretable baselines. LSTMs, CNNs, and attention-based architectures have been widely adopted in the DL domain, particularly for sequential and unstructured datasets. While hybrid approaches exist, most combine algorithms within the same paradigm (ML–ML or DL–DL) rather than integrating ML with DL. From 2020 to 2024, there has been an apparent increase in the adoption of explainable AI techniques, adaptive learning strategies, and profit-driven evaluation metrics, reflecting a gradual shift toward models that balance predictive performance with interpretability and business relevance.
RQ2: How do different predictive models compare in terms of accuracy, adaptability, and interpretability across industries? Due to the heterogeneity of datasets, churn definitions, feature sets, and evaluation protocols, direct cross-study performance ranking is not feasible. Nonetheless, specific trends are evident. Boosting-based ML models consistently achieve strong predictive performance on structured datasets but may be less effective at modelling temporal dependencies than sequential DL architectures. LSTMs and CNNs excel at capturing behavioural and temporal patterns but often require greater computational resources and exhibit reduced interpretability. Efforts to improve adaptability include applying online learning, reinforcement learning, and transfer learning, although these remain limited in real-world deployments. Regarding interpretability, traditional ML methods offer inherent transparency, while DL methods benefit from post-hoc explainability tools such as SHAP, LIME, and attention mechanisms.
RQ3: What are the significant challenges and limitations in existing churn prediction research, and what future directions could address them? Our review identifies key challenges, including class imbalance, reliance on static datasets, limited interpretability in complex models, underutilisation of profit-oriented metrics, and a lack of cross-domain generalisability. These challenges are compounded by deployment barriers such as scalability and integration with existing CRM systems. As discussed in Section 8.4, potential solutions include advanced resampling and cost-sensitive learning to mitigate imbalance, hybrid models that combine accuracy with transparency, adaptive drift-aware learning methods, and embedding business-centric evaluation metrics directly into optimisation processes. Future research should focus on developing scalable, adaptive, and interpretable churn prediction frameworks validated on standardised benchmark datasets to ensure both scientific rigour and real-world impact.
8.2. Challenges and Limitations
Despite significant advancements in ML and DL for churn prediction, several challenges hinder real-world implementation. One of the most persistent issues is class imbalance, where the number of churners in datasets is significantly smaller than that of non-churners. This imbalance often biases models toward the majority class, reducing their effectiveness in identifying at-risk customers. While resampling techniques and cost-sensitive learning have been proposed as solutions, they can lead to overfitting or increased computational costs.
Another major challenge lies in feature engineering and data representation. Many models rely on structured transactional data, yet customer interactions involve diverse data sources such as call logs, social media activity, and customer support interactions. Integrating and extracting meaningful features from such heterogeneous data remains a complex task. DL models can automate feature extraction, but they often require extensive data preprocessing and significant computational resources.
Model interpretability is another critical concern, especially with DL models. While traditional ML techniques such as DTs and logistic regression provide human-readable decision rules, neural networks and ensemble models function as black boxes, making it difficult for businesses to trust their predictions. Explainable AI techniques, such as SHAP and attention mechanisms, have been introduced to address this issue, but they are not yet widely adopted in real-world churn prediction systems.
Furthermore, customer behaviour is dynamic, and many churn prediction models struggle to adapt to evolving patterns over time. Concept drift—where customer preferences, engagement levels, and churn risks change—challenges models trained on historical data. Adaptive learning techniques, such as online learning and reinforcement learning, offer potential solutions but require continuous retraining, making them resource intensive.
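A minimal illustration of why online updating helps under concept drift: a streaming logistic model keeps training one customer at a time while the churn driver switches from one feature to another mid-stream, and the per-sample SGD updates let it re-adapt. The data, drift scenario, and learning rate are invented for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

rng = np.random.default_rng(4)
w = np.zeros(2)
lr = 0.5

# Phase 1: churn driven by feature 0. Phase 2: the driver flips to feature 1,
# emulating concept drift; the online model keeps updating and re-adapts.
for phase, driver in [(1, 0), (2, 1)]:
    X = rng.normal(size=(500, 2))
    y = (X[:, driver] > 0).astype(float)
    for x_t, y_t in zip(X, y):                   # one SGD step per arriving customer
        w -= lr * (sigmoid(x_t @ w) - y_t) * x_t
    print(f"phase {phase}: accuracy {accuracy(w, X, y):.2f}")
```

A model frozen after phase 1 would score near chance on the phase-2 stream; the continuously updated one recovers, at the cost of the retraining overhead noted above.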
Finally, there is a disconnect between academic evaluation metrics and business impact. Many studies assess model performance using accuracy, F1-score, and AUC-ROC, but these do not necessarily translate to actionable business decisions. Profit-driven evaluation metrics, which factor in the cost of retention efforts versus lost revenue from churners, are still underexplored in research. Bridging this gap is essential for developing models that provide tangible business value.
Addressing these challenges will require further advancements in adaptive modeling, explainability techniques, and profit-aware churn prediction. As businesses continue to invest in data-driven retention strategies, future research should focus on developing scalable, interpretable, and business-aligned solutions to improve churn prediction outcomes.
Beyond the methodological challenges discussed above, the body of evidence synthesized in this review is subject to several additional limitations. First, the included studies exhibited substantial heterogeneity in datasets, modeling objectives, and evaluation metrics, complicating direct comparisons across studies. Second, many studies relied on proprietary datasets with limited transparency, potentially restricting the generalizability of their findings. Third, publication and reporting biases may be present, as studies with positive results are more likely to be published in peer-reviewed outlets. Finally, the lack of standardized evaluation protocols across studies hinders the establishment of consistent benchmarks for churn prediction performance.
Moreover, this review also has inherent limitations in its processes. The search strategy was limited to English-language peer-reviewed studies, which may have excluded relevant research published in other languages or grey literature. Although the review adhered to PRISMA guidelines and involved two reviewers collaboratively screening and extracting data, no formal risk of bias or certainty assessments (e.g., ROBIS, GRADE) were performed, as the primary focus was on methodological trends rather than quantitative effect estimates. Additionally, using a narrative synthesis, while appropriate given the heterogeneity of studies, may be less robust than meta-analytic approaches for aggregating evidence.
8.3. Identified Gaps in Reviewed Research
Despite the extensive advancements in ML and DL for customer churn prediction, several gaps persist in the reviewed research, highlighting areas that require further exploration. One of the most notable gaps is the limited emphasis on real-world deployment challenges. While many studies focus on improving model accuracy and robustness, fewer address the practical aspects of implementing these models in business environments. Issues such as scalability, computational efficiency, and integration with existing CRM systems remain underexplored. Research into lightweight, efficient, and real-time deployable solutions is essential since many organizations lack the computational infrastructure to support complex DL models.
Another significant gap is the lack of focus on model interpretability and explainability. While DL approaches, particularly RNNs, CNNs, and transformers, have shown improved predictive performance, their black-box nature limits their adoption in business settings where transparency is crucial. Although techniques like SHAP and Local Interpretable Model-Agnostic Explanations (LIME) have been introduced, they are not widely integrated into churn prediction models. Future research should prioritise the development of inherently interpretable models or hybrid approaches that balance accuracy with transparency to facilitate better decision-making in customer retention strategies.
Additionally, most existing studies rely on static datasets, which fail to account for the dynamic nature of customer behaviour. Concept drift—where customer engagement patterns and churn drivers change over time—poses a significant challenge for model generalization. While some studies explore adaptive, reinforcement, or online learning techniques, their practical adoption remains limited. Future research should focus on developing adaptive and self-learning models that continuously update based on evolving customer data, ensuring sustained predictive performance over time.
Another gap is the lack of cross-domain generalization in churn prediction models. Many studies develop models tailored to specific industries, such as telecommunications or banking, but do not test their applicability across different sectors. Given that customer behaviour varies significantly across domains, future research should explore domain adaptation techniques and transfer learning to improve model generalizability. This would enable businesses in different sectors to leverage churn prediction methodologies without extensive retraining.
A further gap in the reviewed literature concerns fairness, ethics, and bias mitigation, which remain largely absent from churn prediction research. Although fairness-aware algorithms, bias auditing, and responsible AI frameworks are increasingly discussed in the broader machine learning field, very few studies apply these considerations to customer churn. This omission is significant because biased models may unintentionally disadvantage certain customer groups, leading to unequal treatment in retention strategies and exposing businesses to reputational or regulatory risks. Future research should therefore emphasize fairness-aware model design, transparent reporting of potential biases, and the integration of bias mitigation strategies. Addressing these issues would ensure that churn prediction models are accurate, profitable, equitable, trustworthy, and aligned with emerging standards for responsible AI.
Finally, profit-driven evaluation metrics remain underutilized in the reviewed literature. While traditional metrics such as accuracy, F1-score, and AUC-ROC are widely reported, they do not fully capture the business implications of churn prediction. Few studies incorporate profit-based metrics such as the Expected Maximum Profit for Customer Churn (EMPC), which consider the financial impact of retention strategies. Further research is needed to develop models that align more closely with business goals, optimizing for predictive performance, cost-effectiveness, and revenue maximization.
Addressing these gaps will require a multi-faceted research approach, integrating interpretability, adaptive learning, cross-domain validation, and business-centric evaluation into future churn prediction models. By bridging these gaps, the field can advance toward more practical, transparent, and financially viable solutions for churn management in real-world applications.
8.4. Trend Directions
Analyzing publication trends in churn prediction research over 2020–2024 reveals a clear shift toward more advanced ML and DL techniques. IEEE has consistently led in publication volume, indicating a strong research focus within engineering and computational disciplines. While traditional ML techniques such as DTs and logistic regression remain widely used, boosting methods and ensemble learning have steadily grown, reflecting an industry preference for robust and interpretable models.
In recent years, DL approaches, particularly RNNs, CNNs, and transformers, have gained traction, especially in domains dealing with complex sequential and unstructured data, such as telecommunications and banking. The adoption of hybrid ML-DL models also points to increasing interest in combining the strengths of multiple paradigms to improve predictive accuracy.
Another notable trend is the growing importance of explainability and business-aligned evaluation metrics. While early studies prioritised accuracy-based benchmarks, more recent research integrates profit-driven evaluation methods, addressing the gap between academic performance metrics and real-world applicability.
The field will likely see further advancements in adaptive learning techniques, reinforcement learning for churn management, and integration of multi-modal data sources. The continued evolution of ML and DL for churn prediction indicates a shift toward models that are more accurate, transparent, cost-effective, and dynamically adaptable to changing consumer behaviours.
8.5. Potential Solution to the Current Challenges
Our review identifies several persistent challenges in customer churn prediction, each of which has been addressed in the literature through various technical approaches. One of the most prevalent is class imbalance, where the proportion of churners is far smaller than that of non-churners. Beyond conventional oversampling and undersampling techniques, more advanced strategies such as the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN) and Adaptive Synthetic Sampling (ADASYN) have demonstrated improved representation of the minority class. Some studies have combined these resampling methods with ensemble learning, while others have adopted cost-sensitive learning frameworks that incorporate misclassification costs directly into the model’s optimisation process. These cost-sensitive approaches ensure that model training reflects the real financial implications of prediction errors, which is particularly important in retention-focused applications.
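To make the resampling idea concrete, the following minimal sketch implements the core SMOTE interpolation step in plain NumPy. It is a didactic simplification rather than the full SMOTE-ENN pipeline (which would additionally remove ambiguous samples via Edited Nearest Neighbors); in practice, established library implementations would be used.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, seed=None):
    """Synthesize n_new minority-class samples by interpolating between
    randomly chosen minority samples and one of their k nearest
    minority-class neighbours (the core SMOTE step)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)        # random base samples
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factors in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Cost-sensitive learning, by contrast, leaves the data untouched and instead reweights the training loss, for example with per-class weights inversely proportional to class frequency, so that misclassifying a churner is penalised more heavily than misclassifying a non-churner.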
Model interpretability is another major challenge, especially as deep learning architectures become increasingly complex. Several studies have applied post hoc explainability techniques such as Shapley Additive Explanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and counterfactual explanation methods to provide a clearer understanding of model behaviour. Others have explored inherently interpretable alternatives, including sparse linear models and rule-based ensemble methods, which may better suit domains where transparency is essential for regulatory compliance or building stakeholder trust. A recurring trade-off in churn prediction research is the choice between interpretable ML models and more complex DL architectures. Interpretable methods such as decision trees, logistic regression, and rule-based ensembles remain highly suitable in business contexts where transparency, regulatory compliance, and ease of communication with non-technical stakeholders are critical. These models allow decision-makers to trace predictions back to customer attributes and design targeted retention strategies. By contrast, DL models—including LSTMs, CNNs, and Transformer-based architectures—are more effective for high-dimensional, unstructured, or sequential data, where predictive accuracy and capturing complex behavioural patterns outweigh the need for interpretability. Guidance for practitioners therefore depends on context: interpretable ML is preferable when accountability and actionable insights are paramount, whereas DL approaches are more appropriate when the richness and complexity of the data demand advanced representation learning and predictive accuracy.
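The attribution idea behind SHAP can be illustrated with an exact, brute-force Shapley value computation over feature coalitions. This is a didactic sketch that is exponential in the number of features; SHAP's practical contribution is precisely to approximate these values efficiently for real models.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley attribution by enumerating all feature coalitions.
    `value(S)` returns the model payoff when only the feature subset S
    (a frozenset of indices) is 'present'."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # coalition weight: |S|! (n - |S| - 1)! / n!
                w = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                     / factorial(n_features))
                # marginal contribution of feature i to coalition S
                phi[i] += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
    return phi
```

For an additive model, each feature's Shapley value recovers exactly its own contribution, which is the consistency property that makes such attributions attractive for explaining churn drivers to stakeholders.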
The problem of concept drift, where customer behaviours and market conditions evolve over time, has also received growing attention. The Optimised Two-Sided Cumulative Sum Churn Detector (OTCCD) integrates drift detection with adaptive learning to update models as data distributions change. Transfer learning and domain adaptation techniques have likewise been proposed to enable models to reuse knowledge from earlier data while adapting to new patterns with minimal retraining. These strategies are particularly relevant in industries where churn determinants shift rapidly due to technological or competitive changes.
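As an illustration of the detection principle underlying such methods, the following is a generic two-sided CUSUM monitor over a streaming statistic such as the model's rolling error rate. This is a simplified sketch of the general technique, not the OTCCD itself, and the slack and threshold parameters are illustrative assumptions that would be tuned per application.

```python
class TwoSidedCUSUM:
    """Generic two-sided CUSUM change detector over a streaming statistic
    (e.g. a model's error rate). A simplified sketch of the drift-detection
    principle, not the OTCCD method discussed in the text."""

    def __init__(self, target, slack=0.05, threshold=1.0):
        self.target = target        # expected statistic value under no drift
        self.slack = slack          # deviation tolerated before evidence accrues
        self.threshold = threshold  # decision threshold for signalling drift
        self.pos = self.neg = 0.0   # cumulative upward / downward evidence

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        dev = x - self.target
        self.pos = max(0.0, self.pos + dev - self.slack)
        self.neg = max(0.0, self.neg - dev - self.slack)
        if self.pos > self.threshold or self.neg > self.threshold:
            self.pos = self.neg = 0.0   # reset after signalling
            return True
        return False
```

On a signal, an adaptive pipeline would typically retrain or incrementally update the churn model on recent data, which is the coupling of detection and adaptation that the adaptive-learning approaches above formalise.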
Finally, the limited adoption of profit-oriented evaluation metrics remains a missed opportunity for aligning model performance with business objectives. Metrics such as the Expected Maximum Profit for Customer Churn (EMPC) and other cost–benefit frameworks allow for a direct assessment of the economic impact of retention strategies. Several studies have shown that embedding these metrics into the optimisation process can produce predictive and financially effective models rather than using them solely for post hoc evaluation.
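A simplified version of such a profit-oriented metric can be sketched as follows: the expected profit of offering a retention incentive to the top-ranked fraction of customers. This is an illustration of the cost-benefit idea rather than the full EMPC formulation (which integrates over uncertain acceptance rates), and all parameter values are illustrative assumptions.

```python
import numpy as np

def expected_campaign_profit(y_true, scores, frac, clv=200.0,
                             incentive=10.0, contact=1.0, accept=0.3):
    """Profit of offering a retention incentive to the top `frac` of
    customers ranked by churn score. Simplifying assumptions: a contacted
    churner accepts with probability `accept` and is then retained (worth
    `clv`, minus the `incentive`); every contact costs `contact`; contacted
    non-churners take the incentive with no offsetting benefit."""
    y = np.asarray(y_true)
    order = np.argsort(np.asarray(scores))[::-1]   # highest churn risk first
    n = max(1, int(round(frac * len(y))))
    hit = y[order[:n]]                             # 1 = true churner among targeted
    tp = int(hit.sum())
    fp = n - tp
    gain = accept * tp * (clv - incentive)         # value of churners saved
    cost = n * contact + fp * incentive            # contacting + incentives wasted
    return gain - cost
```

Unlike accuracy or AUC-ROC, a metric of this form makes false positives and false negatives asymmetrically costly, so optimising it (or sweeping `frac` to find the most profitable targeting depth) aligns model selection directly with the economics of the retention campaign.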
These solutions show that the challenges in churn prediction are not insurmountable. Many methodological tools exist to address imbalance, improve interpretability, adapt to shifting data distributions, and incorporate business value into evaluation. By drawing attention to these approaches, our review aims to encourage future work that advances the technical state of the art and ensures that churn prediction models deliver actionable and economically meaningful outcomes.
9. Conclusions and Future Research Directions
Customer churn prediction has undergone rapid methodological evolution in recent years, with machine learning and deep learning techniques now central to identifying at-risk customers and guiding retention strategies. In this systematic review, we examined 240 peer-reviewed studies published between January 2020 and December 2024, applying a PRISMA-guided, two-phase methodology. The first phase provided a bibliometric mapping of the field, while the second delivered a detailed synthesis of 61 studies meeting strict novelty and contribution criteria. This dual approach enabled us to capture both the breadth and depth of recent advances in churn prediction research.
Our findings reveal a strong preference for ensemble learning and advanced ML techniques such as gradient boosting (XGBoost, LightGBM, CatBoost), decision trees, and random forests, alongside a growing adoption of DL architectures, particularly LSTMs, CNNs, and attention-based models. These methods are increasingly applied to capture the temporal dynamics and behavioural patterns in customer data. Hybrid modelling approaches are also explored, though most combine different algorithms within the same paradigm (ML–ML or DL–DL) rather than integrating ML with DL. While DL models often achieve superior predictive power, this comes at the expense of higher computational demands and reduced interpretability; conversely, traditional ML models tend to be more interpretable and computationally efficient but may underperform with high-dimensional or complex datasets. Efforts to bridge this gap through explainable AI tools such as SHAP, LIME, and attention mechanisms are promising but remain underrepresented in operational deployments.
Several persistent challenges emerged from our analysis. Class imbalance continues to bias model performance toward majority classes, and many models are trained on static datasets that do not reflect evolving customer behaviours, making them susceptible to concept drift. Adaptive learning strategies and real-time model updating are still rare in practice. Moreover, accuracy-oriented metrics dominate evaluation, with relatively few studies integrating profit-driven metrics such as the EMPC, despite their closer alignment with business objectives. In addition, fairness, ethics, and bias mitigation represent important but underexplored priorities in churn prediction research. Incorporating fairness-aware modelling and transparent reporting practices will be essential to ensure that future solutions are not only technically robust and business-aligned but also socially responsible.
Addressing these gaps presents clear directions for future research. There is a need for adaptive churn prediction frameworks that can dynamically update to account for behavioural and market changes, ideally incorporating automated drift detection and incremental learning. Integrating inherently interpretable models and robust post hoc explainability techniques should be prioritised to improve transparency and user trust, especially in regulated industries. Researchers should also explore multi-modal approaches that combine structured, unstructured, and network-based data to capture richer representations of customer behaviour. Finally, adopting standardised benchmark datasets and incorporating business-aligned performance metrics during training and evaluation would enable fairer comparisons across studies and ensure that predictive models deliver tangible value in real-world retention strategies.
By combining bibliometric insights with a structured methodological synthesis, this review provides a comprehensive, up-to-date map of churn prediction research. It offers concrete guidance for developing the next generation of adaptive, interpretable, and business-aligned models that can be deployed effectively in real-world contexts.