Customers are the cornerstone of business success across industries. Companies invest significant resources in acquiring new customers and, more importantly, retaining existing ones. However, customer churn remains a major challenge, leading to substantial financial losses. Addressing this issue requires a deep understanding of customers’ cognitive status and behaviours, as well as early signs of churn. Predictive and Machine Learning (ML)-based analysis, when trained with appropriate features indicative of customer behaviour and cognitive status, can be highly effective in mitigating churn. A robust ML-driven churn analysis depends on a well-developed feature engineering process. Traditional churn analysis studies have primarily relied on demographic, product usage, and revenue-based features, overlooking the valuable insights embedded in customer–company interactions. Recognizing the importance of domain knowledge and human expertise in feature engineering and building on our previous work, we propose the Customer Churn-related Knowledge Base (ChurnKB) to enhance feature engineering for churn prediction. ChurnKB utilizes textual data mining techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), cosine similarity, regular expressions, word tokenization, and stemming to identify churn-related features within customer-generated content, including emails. To further enrich the structure of ChurnKB, we integrate Generative AI, specifically large language models, which offer flexibility in handling unstructured text and uncovering latent features, to identify and refine features related to customer cognitive status, emotions, and behaviours. 
Additionally, feedback loops are incorporated to validate and enhance the effectiveness of ChurnKB. Integrating knowledge-based features into machine learning models (e.g., Random Forest, Logistic Regression, Multilayer Perceptron, and XGBoost) improves their predictive performance over the baseline, with XGBoost’s F1 score increasing from 0.5752 to 0.7891. Beyond churn prediction, this approach could also support applications such as personalized marketing, cyberbullying detection, hate speech identification, and mental health monitoring, demonstrating its broader impact on business intelligence and online safety.
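To make the text-mining step concrete, the following is a minimal sketch of how tokenization, stemming, TF-IDF, and cosine similarity could be combined to score an email against churn-related phrases. It is not the authors' implementation: the phrase list `CHURN_PHRASES`, the crude `stem` function, and the helper `churn_similarity` are hypothetical names introduced here for illustration, and the abstract does not specify the actual ChurnKB entries or stemmer.

```python
import math
import re
from collections import Counter

# Hypothetical churn-related phrases; the real ChurnKB entries are not given in the abstract.
CHURN_PHRASES = [
    "cancel my subscription",
    "switch to another provider",
    "very disappointed with the service",
]


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens with a regular expression."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Crude suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, then stem each token."""
    return [stem(t) for t in tokenize(text)]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency, so no term gets zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in term_freq.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def churn_similarity(email, phrases=CHURN_PHRASES):
    """Highest cosine similarity between the email and any churn-related phrase."""
    docs = [preprocess(p) for p in phrases] + [preprocess(email)]
    vectors = tfidf_vectors(docs)
    email_vec = vectors[-1]
    return max(cosine(email_vec, phrase_vec) for phrase_vec in vectors[:-1])


if __name__ == "__main__":
    for email in ("I want to cancel my subscription today",
                  "Thanks for the quick delivery"):
        print(f"{churn_similarity(email):.3f}  {email}")
```

A score like this could serve as one knowledge-based feature alongside demographic and usage features when training a classifier; how ChurnKB actually weights or combines such signals is not stated in the abstract.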