1. Introduction
The video game industry has seen exponential growth in recent years, with the global market valued at nearly USD 300 billion in 2024 [1]. In such a competitive landscape, user retention is critical to the success of game companies. A key challenge is user churn, which is commonly defined as a player ceasing to engage with the game for an extended period (e.g., four consecutive weeks), although this threshold may vary by game type or platform. Even a slight increase in churn rate can result in significant revenue loss, making churn prediction a high priority for developers and marketers.
Churn prediction involves identifying which players are likely to churn and, in some cases, when this churn may occur, often by analyzing behavioral patterns such as play frequency, in-game actions, purchases, and social interactions. Machine learning (ML) models, especially supervised classifiers, are widely used for this task. However, their performance depends heavily on the quality of input features and how well the data is preprocessed.
A crucial yet underexplored aspect of data preprocessing is how non-login periods are handled. Most existing studies either ignore these periods or treat all inactivity as equivalent to disengagement. This oversimplifies player behavior and may lead to inaccurate predictions, since inactivity can result from various factors. To address this gap, we propose a novel approach that distinguishes between different types of inactivity based on observable behavioral features and the timestamp at which a user first becomes active. We differentiate between the following:
- (1)
Inactivity before the first recorded login, which may indicate new users, dormant returnees, or temporary inaccessibility due to external factors;
- (2)
Inactivity after a user has already played, which may reflect a genuine loss of interest or temporary inaccessibility due to external factors.
This classification is rule-based, not based on ML models or statistical inference. It enables us to apply more appropriate imputation strategies depending on the likely cause of inactivity. To explore the impact of our proposed approach, we apply a range of imputation techniques, including minimum value substitution, mean, mode, linear interpolation, and multiple imputation by chained equations (MICE). These methods vary in complexity and underlying assumptions; simple methods offer speed and interpretability, while MICE better preserves inter-feature relationships by iteratively modeling variable dependencies.
Our approach is evaluated using gameplay data from the MMORPG Blade & Soul, provided by NCSoft [2]. The dataset is high-dimensional and includes a wide range of behavioral features, such as activity logs, payment transactions, and social interactions. To address this dimensionality, we apply principal component analysis (PCA) to reduce the feature space, preserve structural patterns, and minimize the risk of overfitting. The imputed data are then fed to a random forest (RF) classifier for churn prediction, chosen for its interpretability and robustness to missing or noisy data. In addition, classifier chains are employed to capture label dependencies and further enhance performance. The main contributions of this study are as follows:
- (1)
Proposing a rule-based method to classify non-login periods and apply appropriate imputation techniques.
- (2)
Evaluating the impact of various imputation methods on churn prediction performance.
- (3)
Validating our approach using real-world MMORPG data and comparing it with baseline methods that treat inactivity uniformly.
Experimental results demonstrate that our approach improves prediction accuracy and enables a more nuanced interpretation of player inactivity.
The remainder of this paper is structured as follows:
Section 2 reviews related literature.
Section 3 describes our dataset and churn labeling procedure.
Section 4 outlines the proposed framework.
Section 5 presents the results and analysis.
Section 6 concludes with key findings and future research directions.
2. Background and Related Work
2.1. User Churn Prediction
Churn prediction continues to be a major concern across sectors such as telecommunications, retail, and finance, where identifying users likely to disengage is essential for sustaining long-term customer relationships. Recent studies have employed advanced approaches that draw on behavioral patterns, usage histories, and time-based activity trends to enhance churn detection and guide retention strategies [3,4,5,6].
In the gaming industry, churn prediction has emerged as a critical research area, driven by the industry's rapid growth and the substantial revenue impact of churn [7]. Online games generate rich behavioral data, such as session frequency, in-game spending, and achievement progression, which can be used to detect early signs of disengagement. For instance, Kim et al. [8] demonstrated that a small set of log-derived features could match the performance of more complex models, highlighting the importance of feature selection in churn prediction. Similarly, Lee et al. [9] showed that combining loyalty metrics, gameplay variety, and social indicators could improve prediction accuracy.
Beyond gameplay metrics, several studies have explored the role of social and psychological factors in user retention. Park et al. [10] and Kawale et al. [11] emphasized the role of social engagement and achievement-based features in long-term retention, particularly in MMORPGs. Borbora et al. [12] and Lee [13] further identified usability, intrinsic motivation, and community belonging as key determinants of player longevity, showing that churn is influenced by a combination of social dynamics and psychological factors. These psychosocial elements often interact with behavioral signals and evolve over time, reinforcing the importance of temporal modeling in churn analysis.
Efforts to model churn over time have led to the use of temporal analytics. For example, Hadiji et al. [14] quantified the dynamics of engagement over time in free-to-play games. Drachen et al. [15] and Runge et al. [16] developed frameworks to analyze longitudinal user behavior across game lifecycles. Tamassia et al. [17] used hidden Markov models to detect player state transitions from session sequences, and Milošević et al. [18] examined how churn risk can be inferred from longitudinal activity data. Beyond understanding whether a user will churn, recent studies have focused on when churn is likely to happen. Periáñez et al. [19] proposed the conditional inference survival ensemble (CISE) to estimate churn timing across user segments, especially high-value players. Bertens et al. [20] built on this with a scalable survival ensemble model that provides precise churn timing predictions, enabling targeted retention interventions before churn occurs. However, while these models handle sequential data effectively, they often assume that all inactivity reflects disengagement, which may not always be the case.
Recent work across diverse online and mobile games has consistently demonstrated the effectiveness of churn prediction models. Performance is typically measured using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, and F1 score, particularly for handling class imbalance and assessing overall prediction accuracy. Perisic et al. [21] investigated churn in a free-to-play casual game using cluster analysis and conditional RF, reporting an AUC of 0.71 and accuracy of 72%. Hossain et al. [22] studied World of Warcraft and identified RF as the best-performing model (AUC of 0.98, F1 score of 0.95, and accuracy of 97%). Hoang et al. [23] introduced a feature tokenizer transformer and an imputation strategy using gradient-boosted regression trees to predict churn in freemium mobile games, reaching an AUC of 0.95 and accuracy of 86.8%. Dontireddy et al. [24] analyzed over 40 million MMORPG character creation logs and found that XGBoost performed best, with an AUC of 0.92 and accuracy of 90%. Mulla et al. [25] compared logistic regression and RF for churn prediction in Candy Crush, reporting RF as the stronger model with 97% accuracy.
Several studies have focused on churn prediction using the Blade & Soul MMORPG datasets from 2017 and 2018. Guitart et al. [26], the winning team of the 2017 Game Data Mining competition, applied tree-based ensemble models and long short-term memory (LSTM) networks for binary churn classification, achieving an overall F1 score of 0.62. For the 2018 dataset, Sin and Paik [27] employed binary logistic regression and a neural network, reporting accuracies of 83.3% and 86.7%, respectively. More recently, Jin et al. [28] applied a graph neural network (GNN) that integrates player behavior and social relationships via a churn graph structure, achieving an F1 score of 0.93 and accuracy of 89.6%. While these studies demonstrate strong predictive performance, they primarily frame churn prediction as a binary classification task that determines only whether a user will churn. In contrast, our study formulates churn prediction as a multi-class classification problem that simultaneously estimates whether and when a user will churn, using discrete time-based labels. Moreover, we address a key limitation overlooked in prior work, which treated non-login periods as indicators of churn and did not apply imputation methods to reconstruct missing behavioral data.
2.2. Handling User Inactivity as Missing Data
In churn prediction, periods of user inactivity, often defined as the absence of login activity, are commonly treated as signs of disengagement. Most prior studies rely heavily on login frequency to construct predictive features, assuming that a prolonged absence implies declining interest. For instance, Runge et al. [16] defined churn based on a fixed inactivity window, while Castro et al. [29], Tamassia et al. [17], and Borbora et al. [12] derived features from raw login data without differentiating the causes of inactivity or addressing unobserved periods. This uniform treatment may result in misclassification when user absence is due to temporary external factors (e.g., work, travel, or technical issues) rather than a loss of interest.
In contrast, our study treats certain types of inactivity as missing data rather than definitive signs of churn. By framing these periods as a data quality issue, we apply structured imputation to reconstruct user timelines and preserve behavioral patterns. Prior research has shown that restoring continuity in behavioral data through imputation can enhance model accuracy [30].
We evaluate several imputation techniques to fill in the missing periods. Traditional methods, such as mean and mode imputation, are simple but often fail to preserve behavioral continuity [31]. Linear interpolation provides temporal smoothness but assumes consistent trends that may not be present in player activity data. Multiple imputation by chained equations (MICE) offers a more robust alternative by iteratively modeling the relationships among variables, thereby preserving the multivariate structure and temporal dependencies [32]. MICE has demonstrated strong performance in restoring data integrity and improving prediction accuracy across various domains, including streamflow analysis [33] and elastic well logs [34]. Although rarely applied to churn prediction, its ability to capture inter-variable relationships and temporal patterns makes it a promising method for imputing missing behavioral data in gaming contexts.
Our approach applies these imputation strategies based on inferred causes of inactivity. By aligning imputation methods with behavioral context, we aim to improve the completeness of input features and, in turn, the accuracy of churn prediction. This structured link between problem framing, preprocessing, and predictive modeling represents a key methodological contribution of our work.
3. Characterizing User Churn
Churn in online games differs from other domains, where membership cancellation is often used to define churn. In gaming, users rarely cancel their accounts even after prolonged disengagement. For example, Lee et al. [35] found that fewer than 1% of inactive users formally canceled their membership, making inactivity-based definitions more appropriate. In this study, we adopt the churn criterion defined in the Blade & Soul dataset from NCSoft: a user is considered churned if they remain inactive for four consecutive weeks during a 12-week churn determination period. This threshold is standard in MMORPGs, where players tend to have longer but less frequent sessions, unlike mobile games, where churn is often defined as 7–14 days of inactivity [8].
The dataset contains weekly activity logs for 100,000 users over an 8-week observation period, followed by a 12-week window for determining churn according to the NCSoft competition definition above. The churn occurrence point is the last active week observed before the qualifying four-week inactivity begins, as shown in Figure 1. Based on when this point falls relative to the end of the observation window, users are assigned to one of four churn classes, as formally defined in Table 1.
The distribution of users across the four labels is uniform, with each label accounting for 25% of users. Note that this reflects the dataset provided by NCSoft and was not artificially sampled by the authors. The following examples illustrate the labeling process.
User A: Active in weeks 2–5 → churn occurrence point is week 5 (inactive in weeks 6–12) → labeled “2 Months.”
User D: Active in weeks 2, 8, and 12 → the first 4-week inactivity starts after week 2 (inactive in weeks 3–7) → labeled “Month.”
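To make the labeling rule concrete, the following minimal Python sketch locates the churn occurrence point for a user given their set of active weeks. It is an illustrative reimplementation of the rule described above, not NCSoft's reference code; the 12-week horizon and 4-week gap follow the competition definition, while the week indexing inside the determination window is a simplifying assumption.

```python
def churn_occurrence_point(active_weeks, horizon=12, gap=4):
    """Return the last active week before the first `gap`-week run of
    inactivity within the churn determination window, or None if no
    such run occurs (i.e., the user is not labeled as churned)."""
    active = set(active_weeks)
    last_active, run = 0, 0
    for week in range(1, horizon + 1):
        if week in active:
            last_active, run = week, 0
        else:
            run += 1
            if run == gap:
                return last_active
    return None

# Examples from the text:
print(churn_occurrence_point([2, 3, 4, 5]))  # User A -> 5 (inactive from week 6)
print(churn_occurrence_point([2, 8, 12]))    # User D -> 2 (inactive weeks 3-6)
```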
4. Methodology
This section outlines our methodology. The overall workflow consists of the following key steps:
- (1)
Dataset preprocessing;
- (2)
Dataset structure;
- (3)
Inactivity treatment;
- (4)
Data structure transformation;
- (5)
Imputation methods.
4.1. Dataset Preprocessing
The Blade & Soul dataset consists of four files, as summarized in
Table 2. All features are numerical variables and cover user activity (e.g., playtime, login frequency), social interaction (e.g., party participation), and payment behavior. All 100,000 users were linked across files using unique hashed IDs, resulting in a merged dataset that included eight weeks of behavioral activity, followed by the assigned churn label. All numerical features were standardized using z-score normalization.
The activity dataset includes 36 numerical features capturing gameplay behavior, such as combat, progression, and communication.
Table 3 summarizes some key variables used for churn modeling.
Beyond gameplay activity, we also include payment behavior and party-based social interaction, both of which have been shown to correlate with churn [36,37]. Because the party dataset includes nearly 7 million records, we aggregated it into weekly summary features. Table 4 lists the derived party- and payment-related variables.
The final dataset includes 44 features and 99,485 users after excluding inconsistent records (e.g., mismatched party logs). The churn label distribution remained balanced across the four classes: 24,942, 24,693, 24,863, and 24,987 users.
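The sketch below illustrates the preprocessing flow described in this subsection under stated assumptions: the file names and column names (acc_id, week, party_id, duration) are hypothetical placeholders, since the actual schemas follow Table 2 and the NCSoft release; only the overall pattern of weekly aggregation of party logs, ID-based merging, and z-score normalization mirrors the steps above.

```python
import pandas as pd

# Hypothetical file and column names; the real schemas follow Table 2.
activity = pd.read_csv("activity.csv")   # weekly gameplay features per user
payment  = pd.read_csv("payment.csv")    # weekly payment amounts per user
party    = pd.read_csv("party.csv")      # one row per party participation event

# Aggregate the ~7M party records into weekly summary features per user.
party_weekly = (party.groupby(["acc_id", "week"])
                     .agg(party_count=("party_id", "count"),
                          party_time=("duration", "sum"))
                     .reset_index())

# Merge all sources on the hashed user ID and week.
df = (activity.merge(payment, on=["acc_id", "week"], how="left")
              .merge(party_weekly, on=["acc_id", "week"], how="left"))

# z-score normalization of the numerical features.
feature_cols = [c for c in df.columns if c not in ("acc_id", "week")]
df[feature_cols] = (df[feature_cols] - df[feature_cols].mean()) / df[feature_cols].std()
```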
4.2. Dataset Structure
The dataset used in this study is structured as panel data, recording the weekly average activity for each unique user ID. Each user has one row per week of data, with user IDs linking observations across time, as shown in Figure 2, where ID represents the unique user identifier, f denotes the features, and d the recorded data values. Due to varied engagement patterns, users have different numbers of active weeks. To standardize the input for machine learning, each user’s timeline is padded to 8 weeks using NaN (not a number) values for inactive weeks, as shown in Figure 3.
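A minimal sketch of the padding step is shown below; it assumes the merged panel frame from Section 4.1 with acc_id and week columns, and reindexes each user onto a full 8-week grid so that inactive weeks appear as rows of NaN, as in Figure 3.

```python
import pandas as pd

def pad_user_timelines(df, n_weeks=8):
    """Reindex every user onto a full 1..n_weeks grid; weeks without any
    recorded activity become rows of NaN (the padding shown in Figure 3)."""
    full_index = pd.MultiIndex.from_product(
        [df["acc_id"].unique(), range(1, n_weeks + 1)],
        names=["acc_id", "week"])
    return (df.set_index(["acc_id", "week"])
              .reindex(full_index)
              .reset_index())
```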
4.3. Inactivity Treatment
In this study, we analyze periods of user inactivity (NaN values), their timing, and their potential causes to improve the accuracy of churn prediction. To do so, we define First_Active_Week (FAW) as the first week in which a user shows activity. NaN values occurring before FAW may indicate a new or dormant user (N/D) or temporary absence, while NaNs after FAW may reflect either decreased interest (DI) or external factors (EFs), such as travel or technical issues.
Figure 4 illustrates an example timeline, and Table 5 shows the distribution of NaNs across users by FAW. The table reveals that a substantial share of data points are NaN, indicating that many users experience intermittent periods of inactivity.
To reflect behavioral intent, we classify NaNs based on FAW:
- (1)
NaN data before FAW: May indicate a new/dormant user (N/D) or temporary inactivity due to external factors (EFs).
- (2)
NaN data after FAW: Typically indicates temporary inactivity, which may be from decreased interest (DI) or external factors (EFs).
We distinguish between non-existent data (from DI or N/D) and missing at random (MAR) data (from EF). The former is imputed using minimum values; the latter is estimated using imputation methods.
Table 6 outlines the three assumptions we adopt when imputing NaN values, based on whether they occur before or after the user’s FAW.
Method 1 assumes that all inactivity reflects a lack of engagement: NaNs before FAW are treated as non-existent (either new or dormant users), and NaNs after FAW are attributed to decreased interest. Method 2 distinguishes between the two, treating pre-FAW NaNs as non-existent and post-FAW NaNs as MAR. Method 3 assumes that all inactivity is temporary and externally caused, treating all NaNs as MAR due to EF. This behavior-based categorization supports more accurate imputation and improves churn prediction.
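A minimal sketch of this rule-based treatment is given below. It assumes the padded panel frame from Section 4.2 (columns acc_id, week, and the behavioral features): non-existent weeks are filled with the per-feature minimum, while MAR weeks are left as NaN for the imputation step in Section 4.5.

```python
import pandas as pd

def tag_and_fill(padded, method, feature_cols):
    """Rule-based treatment of NaN weeks relative to each user's FAW.

    method 1: all NaN weeks are treated as non-existent (disengagement)
    method 2: pre-FAW NaNs are non-existent; post-FAW NaNs are MAR
    method 3: every NaN week is MAR and is left for the imputation step
    Non-existent weeks are filled with each feature's minimum value."""
    df = padded.copy()
    has_activity = df[feature_cols].notna().any(axis=1)
    # First_Active_Week (FAW): earliest week with any observed value per user.
    df["faw"] = (df["week"].where(has_activity)
                   .groupby(df["acc_id"]).transform("min"))

    pre_faw = ~has_activity & (df["week"] < df["faw"])
    post_faw = ~has_activity & (df["week"] >= df["faw"])

    mins = df[feature_cols].min()
    for col in feature_cols:
        if method == 1:
            df.loc[pre_faw | post_faw, col] = mins[col]
        elif method == 2:
            df.loc[pre_faw, col] = mins[col]
        # method 3: leave every NaN in place for the imputer
    return df.drop(columns="faw")
```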
4.4. Data Structure Transformation
After padding each user’s timeline with NaN values, the dataset is structured as a three-dimensional array with dimensions N (number of users), T (number of weeks), and D (number of features). For imputation, we reshape the data into either a wide or a long format, as shown in Figure 5. The choice of structure can influence imputation results, as each format emphasizes different aspects of the data, such as temporal patterns or inter-feature relationships, which may affect how missing values are estimated.
- (1)
Wide format (N, T × D): All weekly activity features for each user are concatenated into a single row, offering a holistic user-level view.
- (2)
Long format (N × D, T): The dataset organizes weekly activity information by feature, with time as the primary axis.
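Under these dimensional conventions, the two reshapes can be expressed as simple NumPy operations; the sketch below assumes the padded data are already stored as an (N, T, D) array.

```python
import numpy as np

# X has shape (N, T, D): users x weeks x features, with NaN for inactive weeks.
def to_wide(X):
    """Wide format (N, T*D): one row per user, weekly features concatenated."""
    n, t, d = X.shape
    return X.reshape(n, t * d)

def to_long(X):
    """Long format (N*D, T): one row per user-feature pair, columns are weeks."""
    n, t, d = X.shape
    return X.transpose(0, 2, 1).reshape(n * d, t)
```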
4.5. Imputation Methods
We apply four imputation methods to handle missing values in user activity logs. Other imputation approaches, such as KNN and MissForest, were also explored during preliminary experiments. However, due to memory limitations and excessive runtime on the full dataset, their results are not included in the final analysis.
- (1)
Mean imputation: Replaces missing values with the mean of each variable.
- (2)
Mode imputation: Fills missing entries using the most frequent value.
- (3)
Linear interpolation: Estimates missing values by assuming a linear trend between observed values.
- (4)
MICE: A more advanced method that models each variable with missing data as a function of the others, preserving multivariate relationships. The MICE process follows five steps:
- (a)
Perform simple imputation (e.g., mean imputation) for all missing values in the dataset. These imputed values serve as placeholders.
- (b)
Set placeholders for one variable back to missing values.
- (c)
Treat the variable with NaN values as the dependent variable and predict these missing values using regression models based on the remaining variables.
- (d)
Repeat (b)–(c) for all variables with missing data. One complete iteration, known as a cycle, ensures all missing values are replaced based on inter-variable relationships.
- (e)
Conduct imputations for multiple cycles until the estimates stabilize, ensuring consistent relationships among variables.
In this study, we use a single imputed dataset from MICE rather than full multiple imputation with pooled estimates: instead of sampling from the posterior distribution, each missing value is imputed with the posterior mean produced by Bayesian ridge regression. Full multiple imputation would also require training and evaluating separate models on each independently imputed dataset, and because our focus is on predictive performance rather than parameter inference, a single deterministic imputation is sufficient and more consistent for comparing imputation strategies.
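A minimal sketch of this single-dataset MICE variant, using scikit-learn's IterativeImputer with Bayesian ridge regression, is shown below. The max_iter and initial_strategy values here are placeholders for illustration; the settings actually used are those listed in Table 8, and X_wide stands for an imputation-ready feature matrix from Section 4.4.

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

imputer = IterativeImputer(
    estimator=BayesianRidge(),   # posterior mean used as the imputed value
    max_iter=10,                 # number of chained-equation cycles (placeholder)
    initial_strategy="mean",     # step (a): simple placeholder imputation
    random_state=42,
)
X_wide_imputed = imputer.fit_transform(X_wide)  # X_wide: (N, T*D) with NaNs
```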
Imputation methods are applied to both the wide and long dataset formats, following the NaN treatment strategies defined in Section 4.3; the results therefore also depend on how NaN values are interpreted relative to the user’s FAW. We apply each method to the datasets generated under the three NaN treatment strategies (see Figure 6), resulting in several versions of imputed datasets, as listed in Table 7.
To evaluate the impact of different imputation strategies, each imputed dataset is used to train churn prediction models. Model performance is evaluated using standard classification metrics, including AUC-ROC and F1 score (see
Section 5.3). This enables a direct comparison of how different imputation methods and assumptions for handling NaNs affect predictive accuracy. The analysis reveals the extent to which imputation quality influences the reliability of churn classification.
5. Experiments and Analysis
We evaluate the impact of different imputation strategies on churn prediction performance. Following the preprocessing steps described earlier, experiments were conducted in the Google Colab Python 3 environment, using a fixed random seed of 42 for reproducibility.
5.1. Data Imputation and Evaluation
To handle missing values arising from user inactivity, we applied several imputation methods as detailed in Section 4.5. Among these, MICE was implemented using the IterativeImputer from Python’s scikit-learn library [38], with Bayesian ridge regression [39]. This setup reduces overfitting and captures parameter uncertainty, contributing to more stable predictions. Table 8 lists the MICE parameters used.
To verify that imputation preserved essential dataset characteristics, we evaluated structural and temporal consistency:
- (1)
Structural Consistency: We computed feature correlation matrices before and after imputation and quantified their changes using the mean correlation difference (MCD); a sketch of this computation is given after this list. If the relationships among features were significantly altered by imputation, the validity of subsequent analyses could be compromised. As shown in Table 9, all datasets exhibited MCD values below 0.1, indicating minimal structural changes. According to Cohen’s criteria [40], these values correspond to only a small difference, supporting the reliability of the imputation methods used.
- (2)
Temporal Consistency: We also examined whether imputation preserved weekly activity patterns by comparing the original dataset against three imputed datasets: linear_nd_t, mode_nd_t, and mice_all_n_td.
Figure 7 shows average play_time and payment_amount trends by churn label across 8 weeks. For example, Label 1’s spike in week 7 and Label 3’s consistently low values are preserved across all imputed versions. These results confirm that key temporal dynamics remained intact after imputation, further supporting the validity of the datasets for modeling.
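The MCD check referenced in item (1) can be computed as in the sketch below. The exact definition used here (mean absolute difference over off-diagonal correlation entries) is an illustrative assumption and may differ in minor details from the computation behind Table 9.

```python
import numpy as np
import pandas as pd

def mean_correlation_difference(original, imputed):
    """Mean absolute difference between the feature correlation matrices of
    the original data (pairwise-complete) and the imputed data. Values close
    to zero indicate that imputation left feature relationships intact."""
    c0 = pd.DataFrame(original).corr().to_numpy()
    c1 = pd.DataFrame(imputed).corr().to_numpy()
    mask = ~np.eye(c0.shape[0], dtype=bool)   # ignore the diagonal
    return np.nanmean(np.abs(c0[mask] - c1[mask]))
```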
5.2. Churn Prediction
With the structural and temporal integrity of the imputed datasets confirmed, we proceed to evaluate their impact on churn prediction. Churn prediction is formulated as a four-class classification task, where each user is categorized according to the churn labels defined in
Table 1. RF was selected for its strong performance in handling noisy or incomplete data. Its interpretability and low risk of overfitting make it a practical and robust baseline model for this study, especially given the imputed, multi-class dataset. The RF configuration is listed in
Table 10. For comparison, we also employed PCA and classifier chains:
PCA reduces feature dimensionality while retaining at least 95% of the total variance, for which 24 components were used (see Section 5.4).
Classifier chains model label dependencies by predicting the four churn labels in sequence: [0, 1, 3, 2].
All models were trained and evaluated using an 80:20 train–test split of the dataset.
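The sketch below outlines this modeling setup with scikit-learn. The RF hyperparameters shown are library defaults rather than the Table 10 configuration, and X_imputed and y are placeholder names for the imputed feature matrix and the four-class churn labels.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# 80:20 train-test split of the imputed features and churn labels.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_imputed, y, test_size=0.2, random_state=42, stratify=y)

# PCA retaining at least 95% of the variance (24 components in our setting).
pca = PCA(n_components=24).fit(X_tr)
X_tr_p, X_te_p = pca.transform(X_tr), pca.transform(X_te)

# Plain multi-class RF baseline.
rf = RandomForestClassifier(random_state=42).fit(X_tr_p, y_tr)

# Classifier chain over one-hot churn labels, predicted in order [0, 1, 3, 2].
Y_tr = label_binarize(y_tr, classes=[0, 1, 2, 3])
chain = ClassifierChain(RandomForestClassifier(random_state=42),
                        order=[0, 1, 3, 2], random_state=42)
chain.fit(X_tr_p, Y_tr)
Y_pred = chain.predict(X_te_p)  # (n_samples, 4) binary label matrix
```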
5.3. Performance Evaluation Metrics
Model performance was assessed using a weighted F1 score and a micro-averaged AUC-ROC, computed on the 20% test set. These metrics were selected based on their effectiveness in handling imbalanced multi-class classification tasks, as discussed in
Section 2.1.
AUC-ROC measures the model’s ability to distinguish between positive and negative labels by measuring the area under the ROC curve. In this multi-class setting, we use micro-averaged AUC-ROC, which aggregates performance across all classes by pooling the individual decisions into a single binary classification task [38,41]. It is computed as follows:
$$\mathrm{AUC}_{\text{micro}} = \frac{\sum_{c=1}^{C} \sum_{i:\, y_i = c} \;\sum_{j:\, y_j \neq c} \mathbb{1}\!\left[\hat{p}_{i,c} > \hat{p}_{j,c}\right]}{\sum_{c=1}^{C} N_c \,(N - N_c)}$$
where $C$ is the number of labels, $N$ is the total number of instances, $\hat{p}_{i,c}$ is the predicted probability of instance $i$ belonging to label $c$, $\mathbb{1}[\cdot]$ is the indicator function that returns 1 if the condition is true and 0 otherwise, and $N_c$ is the number of instances in label $c$.
The weighted F1 score averages the harmonic mean of precision and recall across labels, weighting each label by its support. This metric accounts for label imbalance and is well suited to multi-class evaluation. It is defined as follows:
$$F1_{\text{weighted}} = \sum_{c=1}^{C} w_c \cdot \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}$$
where $w_c$ is the proportion of samples in label $c$, and $C$ is the number of labels.
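Both metrics map directly onto scikit-learn utilities, as in the sketch below, which continues the placeholder variables from the previous sketch; it mirrors the definitions above rather than reproducing the exact evaluation script.

```python
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.preprocessing import label_binarize

# Micro-averaged AUC-ROC: pool the one-vs-rest decisions of the four labels
# into a single binary task before computing the area under the curve.
Y_te = label_binarize(y_te, classes=[0, 1, 2, 3])
proba = rf.predict_proba(X_te_p)                      # shape (n_samples, 4)
auc_micro = roc_auc_score(Y_te, proba, average="micro")

# Weighted F1: per-label F1 scores weighted by each label's support w_c.
f1_weighted = f1_score(y_te, rf.predict(X_te_p), average="weighted")
```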
5.4. Results and Discussions
We evaluate the impact of different imputation methods on churn prediction using the RF model as a baseline.
Table 11 presents micro-averaged AUC scores across the four churn labels, and
Figure 8 shows the corresponding ROC curves. Among all methods, the MICE-imputed dataset (mice_all_n_td) consistently outperforms others, achieving AUCs of 0.9423 and 0.9734 for labels 0 and 1, and 0.8549 and 0.8773 for labels 2 and 3, respectively.
To further improve classification, we applied classifier chains [42], which sequentially predict labels while incorporating earlier predictions as input features. A chain order of [0, 1, 3, 2] was used, corresponding to the four churn categories. Additionally, PCA was used to reduce feature dimensionality while preserving variance. Although 23 components capture 95% of the variance, 24 components were empirically found to yield the best balance between dimensionality and predictive performance.
Table 12 presents weighted F1 scores across datasets and model configurations (RF, RF + PCA, classifier chains, and classifier chains + PCA), using 5-fold cross-validation. Results are reported as means and standard deviations.
Method 1, represented by ori_data, serves as the baseline. Method 2, which imputes only NaNs occurring after FAW, yields limited improvements, likely because relatively few missing values occur after FAW (see Table 5). In contrast, Method 3 assumes that all NaNs, both before and after FAW, are MAR and applies full imputation, consistently yielding the best performance. The MICE-imputed datasets (mice_all_n_td and mice_all_nd_t) achieved the highest weighted F1 scores under the RF model. Classifier chains further improve performance, especially when combined with PCA. The best result was observed for mice_all_n_td using classifier chains + PCA (F1 = 0.7065 ± 0.0057), followed by mice_all_nd_t using classifier chains alone. Across all configurations, low standard deviations (generally below 0.01) indicate consistent performance across data partitions.
To confirm the statistical significance of these results, we conducted Friedman tests on both the per-label AUC scores (Table 11) and the weighted F1 scores (Table 12). All tests yielded χ² statistics greater than 30, indicating that the differences in model performance across imputation strategies are statistically significant. These findings confirm that the choice of imputation method has a substantial impact on both class-wise discriminative ability and overall predictive performance.
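The Friedman test itself is a single SciPy call, as sketched below; scores_by_dataset is a hypothetical list holding one per-fold score sequence (e.g., weighted F1) for each imputed dataset, with folds aligned across datasets.

```python
from scipy.stats import friedmanchisquare

# scores_by_dataset: e.g., [[f1_fold1, ..., f1_fold5], ...] with one sequence
# per imputation strategy and the same fold order in every sequence.
stat, p_value = friedmanchisquare(*scores_by_dataset)
print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.4g}")
```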
Finally, we analyzed feature importance to identify behavioral attributes contributing most to churn prediction. Across all datasets, playtime consistently ranked highest, followed by login frequency, combat and reward metrics, and social interaction features. Regarding computational efficiency, most imputation methods completed within minutes. However, MICE, when applied under Method 3—where all non-login periods are treated as MAR—required approximately three hours due to the high dimensionality of the dataset. Still, given that churn prediction is typically conducted on a weekly or monthly basis, this runtime remains feasible for practical applications.
6. Conclusions and Future Work
User churn in online games can arise from various forms of inactivity, including new or dormant users, loss of interest, or external factors limiting game access. Existing models often treat all non-login periods uniformly or disregard them entirely, resulting in inaccurate behavioral interpretations and compromised predictive performance.
To address this limitation, we proposed a novel approach that differentiates between types of inactivity and handles them as either non-existent data or missing values. We applied multiple imputation strategies: minimum value substitution, mean, mode, linear interpolation, and MICE. Their impact on churn prediction was evaluated using RF, with PCA and classifier chains incorporated to enhance performance.
Our experimental results demonstrated substantial and statistically significant improvements across the different dataset treatments. The Method 3 datasets (mice_all_n_td and mice_all_nd_t), which treat all non-login periods as missing at random, achieved up to a 3% improvement in F1 score and approximately 1% in AUC compared to the baseline. These gains were validated by Friedman tests and supported by low standard deviations (generally below 0.01), confirming performance stability across cross-validation folds. The best result (F1 = 0.7065 ± 0.0057) was achieved using classifier chains with PCA on mice_all_n_td, demonstrating the value of combining robust imputation with label dependency modeling and dimensionality reduction.
These findings highlight the effectiveness of MICE in preserving data integrity and enhancing prediction, while reinforcing the importance of accounting for external factors when modeling churn. Ignoring such behavioral nuances may lead to suboptimal retention strategies and missed intervention opportunities.
6.1. Limitations and Future Research Directions
While this study makes meaningful contributions to churn prediction in online games, several limitations remain, offering areas for future investigation.
First, user inactivity was categorized into the following three types: new or dormant (N/D) users, decreased interest (DI), and external factors (EFs), based primarily on behavioral patterns and the timing relative to the user’s first active week (FAW). In reality, inactivity may result from a broader range of causes, such as the release of competing games or the use of multiple accounts. Expanding this classification to reflect richer behavioral signals could enhance both imputation quality and churn prediction performance, better capturing the complexity of user behavior.
Second, while MICE provided reliable imputation results, its computational demands increased substantially with dataset size and dimensionality. This limited its feasibility for real-time analysis or large-scale game environments. Future work could explore more scalable imputation methods, such as matrix factorization or neural-based approaches, to enable faster processing without sacrificing accuracy.
Third, our experiments were conducted using a random forest model in combination with PCA and classifier chains. While the results demonstrate improvements in prediction, it remains unclear whether these improvements would generalize to other models, such as gradient boosting or deep learning approaches. Future work should investigate the robustness of the imputation strategies across a broader range of prediction models.
Lastly, this study relied on data collected in 2018. Given the rapid evolution of the gaming industry, validating the findings on more recent and diverse datasets is essential. This would help assess whether the proposed methods remain effective under modern player dynamics, engagement systems, and churn behaviors. In addition to dataset generalization, future work could also compare the effectiveness of alternative classifiers and evaluate whether the performance of imputed data remains consistent across different modeling approaches.
6.2. Practical Implications
This study enhances churn prediction accuracy by distinguishing temporary inactivity from actual churn risk, enabling game developers to implement targeted retention strategies, such as personalized rewards, re-engagement campaigns, and timely notifications, to reduce permanent churn. A deeper understanding of inactivity patterns also supports better design decisions, including difficulty balancing, social feature enhancements, and content scheduling, all of which help sustain long-term player engagement and revenue.
Beyond gaming, the proposed methodology applies to industries where user engagement and retention are critical, such as streaming services, e-commerce, and subscription platforms. By interpreting inactivity as a form of missing data and applying appropriate imputation techniques, organizations can improve predictive accuracy and tailor retention efforts more effectively. While domain-specific adjustments may be needed, the overall approach offers broad applicability for enhancing customer lifecycle management.