Article

A Segmented Machine Learning Approach to Predicting and Mitigating Churn in the Gig Economy

by
Saranya Shanmugam
1,
Einiyaselvi Elavarasan
1,
Narassima Madhavarao Seshadri
2,*,
Dharun Ashokkumar
1,
Santhoshkumar Senthilkumar
1 and
Thenarasu Mohanavelu
3
1
Department of Computer Science and Business Systems, KGISL Institute of Technology, Coimbatore 641035, Tamil Nadu, India
2
Great Lakes Institute of Management, Chennai 603102, Tamil Nadu, India
3
Department of Mechanical Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, Tamil Nadu, India
*
Author to whom correspondence should be addressed.
J. Theor. Appl. Electron. Commer. Res. 2026, 21(3), 93; https://doi.org/10.3390/jtaer21030093
Submission received: 23 January 2026 / Revised: 28 February 2026 / Accepted: 17 March 2026 / Published: 19 March 2026
(This article belongs to the Section Data Science, AI, and e-Commerce Analytics)

Abstract

The highly competitive online food delivery (OFD) market faces a serious retention problem, with acquiring new users typically being far more expensive than retaining existing ones. Traditional prediction methods that rely primarily on static transactional metrics such as recency and frequency often fail to capture the psychological ‘disconfirmation’ that precedes churn. To fill this gap, this study proposes a framework based on Expectation-Confirmation Theory (ECT). Unsupervised K-Means clustering was employed to classify a simulated and filtered dataset of 1500 customer records containing behavioural, geographic, and other attributes. The framework also incorporates BERT-based sentiment analysis, allowing it to identify psychological “silent” attrition. Heterogeneous cohorts exhibiting different psychological antecedents (utilitarian versus hedonic) were identified. Our empirical results demonstrate that Random Forest Classifiers with segment-specific features (F1 = 0.89) outperform baseline transactional models (F1 = 0.76). The visual analytic interface developed provides a more holistic view of the consumption process than traditional prediction models, including prescriptive, automated segment-based mitigation strategies. Our findings contradict the assumption that the “frequency–loyalty” model applies to all users: high-frequency discretionary users are found to be elastic in terms of retention and can churn at significant rates. Using the automated action log, managers can plan targeted, highly efficient retention strategies rather than blanket discounting approaches.

1. Introduction

Online food delivery (OFD) has been growing rapidly for years, driven largely by rising global smartphone adoption and improvements in last-mile logistics infrastructure. This rapid technological change has fundamentally altered how restaurants operate and how consumers dine. According to projections from Grand View Research [1], the OFD market is expected to reach a value of roughly USD 288.84 billion by the end of 2024 and to grow at a consistent Compound Annual Growth Rate (CAGR) of approximately 9.4% from 2024 to 2030. Emerging markets, specifically those experiencing rapid digitalisation, will see the most growth over the coming years. For example, between 2025 and 2034, the Indian OFD market alone is expected to grow at a CAGR of approximately 27.30%, driven primarily by the accelerated growth of platform aggregators and the adoption of cloud kitchen business models across India [2].
However, this rapid growth brings significant challenges for OFD providers, notably increased competition and reduced consumer loyalty. Previous research has identified critical factors that determine consumer satisfaction and sustainability in the online food delivery ecosystem [3,4], while recent studies highlight the critical role of electronic word-of-mouth (e-WOM) in shaping consumer acceptance [5]. Research has also shown that the correlation between an online “click” and actual engagement with or loyalty to a restaurant has weakened in recent years [6]. While customer acquisition remains the main metric by which OFD providers measure growth, the “leaky bucket” phenomenon, in which new customers are acquired at high cost but quickly leave the service, damages the long-term profitability of many OFDs. Industry data show that almost 58% of consumers stop using a platform within three purchases, indicating a serious customer retention problem. The financial impact of losing customers is also substantial; research has shown that it costs between 5 and 25 times more to acquire a new customer than to keep one [7]. Because of this, OFD platforms have shifted their strategic emphasis from purely acquiring customers to aggressively predicting and retaining them.
The academic foundation of customer loss can be analysed using a framework known as Expectation-Confirmation Theory (ECT) [8]. ECT states that customer satisfaction and intention to stay (retention) are affected by whether a customer’s original expectations were met by their actual experience. In the OFD space, a dissatisfactory experience (e.g., late delivery, cold food, poor service) creates a gap between expectations and satisfaction, directly affecting retention intent. Recent empirical evidence confirms that such service failures, particularly in AI-mediated environments, significantly accelerate consumer disengagement and churn [9]. Although past methodologies for predicting which customers will stop ordering have typically relied on static factors (demographics) or simplistic measures of behaviour (i.e., recency and frequency), they have often not incorporated the psychological impact of negative experiences and/or of unstructured feedback [10,11]. At the same time, it was pointed out that accurately predicting churn requires models that can capture complex consumer behaviour patterns.
Research has revealed a gap in the literature regarding the interaction between behavioural and sentiment indicators. Current research often separates behavioural (transaction) logs from sentiment indicators (from reviews and ratings). Behavioural models may indicate that a customer has stopped ordering, but they do not explain why [12]. In contrast, sentiment analysis can detect dissatisfaction but does not necessarily provide the behavioural context necessary to predict churn within different user groups [13]. There is limited literature utilising a hybrid approach (unsupervised techniques such as K-Means to segment behaviour combined with sophisticated NLP models like BERT to capture sentiment) to develop a complete churn prediction framework.
To fill this knowledge gap, this research presents a unique, integrated machine learning solution for segmenting users based on the temporal ordering of their purchases, together with sentiment measurements derived from the analysis of product reviews. Training segment-specific Random Forest Classifiers aims to improve classification performance and provide more precise, actionable insights for user retention. Although hybrid models enhance prediction accuracy, a large gap remains between their use for prediction and their use in developing business intelligence: even highly predictive hybrid models often cannot be applied in the field because they do not provide dynamic, actionable outputs. In this paper, we create a comprehensive visual analytics framework that bridges the gap between prediction and prescription by using anomaly detection score outputs to recommend actionable mitigation measures against predicted customer churn. To reach these goals, this study focuses on the following primary Research Questions (RQ):
  • RQ1: How effectively does K-Means segmentation categorise users into distinct behavioural retention profiles?
  • RQ2: To what extent does integrating BERT-based sentiment metrics capture “silent attrition” and improve churn prediction accuracy?
  • RQ3: What are the distinct utilitarian and hedonic factors that drive churn across different user segments?
  • RQ4: How can the integration of machine learning with visual analytics prescribe segment-specific mitigation strategies to improve retention?
The overall methodology is as indicated in Figure 1. Section 2 discusses the current state of the literature relative to this study and what previous research lacks in methodology. Section 3 details the research methodology, outlining our data collection and sampling procedures before breaking down the mechanics of the hybrid K-Means and BERT framework. Section 4 presents our empirical results and comparative metrics. Section 5 examines the findings through the lens of Expectation-Confirmation Theory, separating the identified churn drivers by segment. Section 6 contains the implications of our findings for theory and practice. Section 7 concludes with a summary of the main conclusions and the implications for future research.

2. Literature Review

This section presents a synthesis of the theoretical and methodological aspects of customer churn in the online food delivery (OFD) environment. Together, Expectation-Confirmation Theory (ECT), advanced behavioural analytics, and Natural Language Processing motivate the proposed hybrid framework.

2.1. Theoretical Framework: Expectation-Confirmation Theory (ECT)

The theoretical foundation for studying customer churn in this study derives from Expectation-Confirmation Theory as originally formulated by Oliver [8], with later adaptations in Information Systems research. The ECT model of retention holds that an individual’s intent to continue using a service is influenced not only by their perception of how the service performed, but also by how that performance compares to their original expectations or stated preferences. In the OFD setting, this comparison yields either confirmatory or disconfirmatory experiences. When an OFD platform meets or exceeds consumers’ original expectations about delivery speed, food quality, and software usability, the customer experiences confirmation (or positive disconfirmation), leading to satisfaction and loyalty to that platform. However, when the platform falls short of expectations regarding those same service attributes (e.g., late deliveries, cold food, poor customer service), it creates psychological disharmony and considerably raises the risk of churn [14]. The recent literature suggests that traditional transactional metrics (such as delivery time) are less effective than psychological disconfirmation at capturing the totality of this issue. For instance, an order could be recorded as “delivered on time” in the system even though the food was cold or spilt by the delivery person, resulting in a negative customer experience that is not visible in structured databases [15]. This provides a theoretical basis for collecting and assessing unstructured responses from end users: customer reviews are often the only means for displeased customers to express their discontent, and they carry the emotional nuance and rationale behind their dissatisfaction.
Consequently, existing research explores the use of ECT’s framework and incorporates sentiment analysis as a direct representation of the “confirmation” construct, indicating that incorporating NLP into churn analysis is at least a possibility.

2.2. The Evolution of Churn Prediction Models

Over the past ten years, churn prediction has evolved from simple statistical models to sophisticated machine learning (ML) algorithms. Early research relied primarily on logistic regression (LR) as the standard supervised approach to predicting churn risk; Khodabandehlou & Zivari Rahman [12] were the first to compare various supervised ML methods and to create a benchmark for assessing their performance based on behavioural analysis. Although LR models are highly interpretable, exposing a linear relationship between input features and the target variable, they cannot capture the complexity of human behaviour, particularly threshold effects, where a slight increase in delivery time can cause a dramatic reduction in customer loyalty. As a result, ensemble techniques such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost) have become widely used and are now considered the de facto industry standard. Recent studies within the OFD industry have shown that a Random Forest Classifier generally yields much higher predictive accuracy than either a single decision tree or a logistic regression model, achieving around 85–90% accuracy on tabular datasets. This finding corroborates the results of Thamizhselvi et al. [16] in their recent analysis of restaurant churn.
Although a number of researchers have used deep learning (DL) models for sequential data, such as Long Short-Term Memory (LSTM) networks, DL models have been widely criticised as opaque “black boxes.” In a business context, knowing which customers are likely to churn (or how many will churn) provides less value than understanding why they are likely to churn, since the latter enables companies to develop retention strategies [17]. Random Forests strike a balance between predictive power and the interpretability of the features utilised within the model; for instance, a manager can ascertain that delivery delay is a stronger predictor of churn for newer customers than for established loyal ones. As such, this research uses Random Forest modelling for robustness and explainability, but enhances the input features by incorporating state-of-the-art sentiment metrics. This aligns with emerging research on ‘Next-Generation’ shoppers, which suggests that understanding AI-driven perceptions is essential for modelling modern consumer behaviour [18].

2.3. Behavioural Segmentation: Moving Beyond Static RFM

Many aggregate churn prediction models assume that all users exhibit similar behaviours and abandon the service for the same reasons. This assumption has been addressed by making behavioural user segmentation a core focus of Customer Relationship Management (CRM). Traditionally, the recency–frequency–monetary (RFM) model has been used to quantify customer value; however, RFM scoring is of limited use in dynamic markets such as OFD, where user habits change frequently [19]. For this reason, the current trend in research is to use unsupervised learning techniques, specifically the K-Means clustering algorithm, to identify distinct, statistically significant user groups (such as “high-value loyalists,” “price-sensitive seasonal customers,” or “hibernating risk customers”) rather than setting arbitrary thresholds for each group. Empirical evidence suggests that training segment-specific churn models yields significantly better accuracy than fitting a single global churn model, because the predictors of churn differ widely across segments [20]. For example, a price-sensitive segment may churn because of a lack of promotional offers, while a high-value segment may churn because of inadequate service quality. Despite these investigations, a significant gap persists in understanding user segments within the OFD context: the majority of segmentations in this space have employed only structured transactional logs as data sources, so unstructured behavioural signals remain unobserved, and the profiles generated from transactional logs alone do not tell the whole psychological story. Notably, Momtaz et al. [21] and Halim et al. [22] have shown that user retention is contingent upon a variety of complex service interactions that transactional models omit.

2.4. Unstructured Data and Advanced Sentiment Analysis

Due to the rapid rise in user-generated content, Natural Language Processing (NLP) now forms part of churn modelling. Traditionally, sentiment analysis here has used lexicon-based methods such as the Valence Aware Dictionary and sEntiment Reasoner (VADER), which derive an overall sentiment by aggregating word-level polarity across a text. Although these methods have low computational cost, they often miss the subtleties of words’ meanings in context, including sarcasm and polysemy. For example, a delivery might be described as “sick,” which could indicate a favourable or unfavourable result depending on the customer group [23]. Transformer architectures, specifically Bidirectional Encoder Representations from Transformers (BERT), have changed this paradigm in recent years by reading text bidirectionally (via an attention mechanism) and thereby capturing the full range of contextual relationships among words. On benchmarks for the food delivery sector, recent reports indicate that BERT significantly outperforms lexicon methods, with Pearson correlations of r > 0.74 for BERT compared to r < 0.59 for VADER [24]. Beyond the improvement in sentiment scores themselves, using BERT-based scores as input features in predictive models has led to higher AUC scores, as they capture the “disconfirmation” aspect of ECT that transactional data cannot. The present study extracts detailed sentiment scores using a pre-trained BERT model, converting qualitative complaints into quantitative values that serve as input features for the Random Forest model. It also improves on the work of Yaiprasert and Hidayanto [13] by integrating sentiment scoring into an AI-based ensemble learning approach to develop enhanced digital marketing strategies.

2.5. Research Gap and Proposed Framework

Churn prediction, behavioural segmentation, and sentiment analysis have each been studied extensively, but few studies integrate all three. Most work has either optimised behavioural segmentation techniques (such as RFM and K-Means) while ignoring the textual nuances of psychological profiling via sentiment analysis, or applied advanced Natural Language Processing (NLP) techniques (such as BERT) to all users without considering differences in user behaviour. Little research creates a unified hybrid framework in which users are first segmented by transaction (purchase) behaviours and then assessed on a granular basis using sentiment features to predict churn within those segments. This research aims to bridge that gap with a two-stage hybrid methodology. First, K-Means clustering is used to classify customers according to their purchasing behaviour over time. Second, the behavioural groupings are enhanced with advanced sentiment scores obtained from a pre-trained BERT model. Interpreting this hybrid through Expectation-Confirmation Theory (ECT) provides an in-depth empirical understanding of customer churn. Ultimately, the two methods are expected to reveal the antecedents of churn for each segment (e.g., whether “seasonal” users churn mainly due to price sensitivity, or whether “daily” users churn due to poor service quality or negative sentiment). After identifying the antecedents, we provide recommendations for creating efficient retention strategies.
In addition, the current literature has rarely examined the relationships among service failure, geographic location, and the method(s) used to acquire customers. Models that output binary churn probabilities do not account for the underlying risk profile, such as the correlations among the variables used to derive the churn probability (e.g., delivery time, negative reviews, and recency). Therefore, there is little research that operationalises predictions about customer churn into procedures for automating prescriptive actions based on customer-specific risk profiles (for example, providing priority support and/or discount offers).

3. Research Methodology

This study developed an integrated multi-stage analytical model based on ECT to predict customer churn through three phases of analysis: (1) data collection and cleaning, (2) behavioural segmentation using unsupervised techniques, and (3) hybrid supervised modelling to forecast churn. Using a modular approach allows for identifying distinct customer behaviours and incorporating advanced features to ensure that predictive models are developed based on actual users and their behaviours rather than relying solely on generic user assumptions.

3.1. Data Collection and Sampling

Consumer retention data in the gig economy are highly proprietary, which is why this research uses a synthetic dataset. The synthetic dataset was designed to closely resemble the complex transactional and textual logs common in today’s OFD mobile apps, allowing us to create a reproducible proof of concept of the proposed Hybrid Analytical Framework (HAF) and to test it rigorously. The structured dataset contains realistic parameters for each transaction, such as order ID, timestamp, delivery time (minutes), order price (currency), discount percentage, customer rating (1–5 scale), acquisition channel, and geographic coordinates (longitude and latitude). In addition, the unstructured data consist of synthetic reviews designed to resemble actual patterns of user feedback.
Establishing data integrity was the primary focus before beginning the analysis. To ensure that all incomplete records (e.g., missing delivery time, null review text) were adequately addressed, we implemented a data-cleaning procedure: missing numerical values were replaced with the median for that field, while transaction records missing essential keys (e.g., order_id, customer_id) were excluded. To maintain consistency across the segmentation and prediction phases, we created an aggregation-and-filtering pipeline. Because behavioural loyalty modelling is based on longitudinal data rather than individual transactions, 12,000 unique transactions were aggregated by user. In addition, to prevent “cold start” users (those with little behavioural history) from adding arbitrary variance to the K-Means clustering, any user with three or fewer lifetime transactions was excluded from the analytical cohort. We then mapped 3000 unstructured reviews to the surviving user IDs.
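The cleaning-and-aggregation pipeline just described can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction, not the authors’ actual code; the function name clean_and_aggregate and the record keys (order_id, customer_id, delivery_time) are assumed field names for the synthetic transaction log.

```python
from statistics import median

def clean_and_aggregate(transactions, min_orders=4):
    """Drop records missing key IDs, median-impute missing delivery
    times, aggregate transactions per customer, and exclude 'cold
    start' users with three or fewer lifetime transactions."""
    # Exclude records that lack essential keys.
    valid = [t for t in transactions if t.get("order_id") and t.get("customer_id")]
    # Median imputation for a missing numerical field.
    observed = [t["delivery_time"] for t in valid if t.get("delivery_time") is not None]
    fill = median(observed) if observed else 0
    for t in valid:
        if t.get("delivery_time") is None:
            t["delivery_time"] = fill
    # Aggregate transactions per customer, then filter the cohort.
    users = {}
    for t in valid:
        users.setdefault(t["customer_id"], []).append(t)
    return {cid: txns for cid, txns in users.items() if len(txns) >= min_orders}

# A user with 5 orders survives; a single-order "cold start" user is dropped.
txns = [{"order_id": i + 1, "customer_id": "u1", "delivery_time": 30} for i in range(5)]
txns.append({"order_id": 99, "customer_id": "u2", "delivery_time": None})
cohort = clean_and_aggregate(txns)
```

The same tiered structure (raw transactions retained alongside the aggregated user table) supports the later rejoin of predictions to the transaction log.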
Following this filtering and aggregation, our raw data were transformed into a table comprising 1500 distinct customers (1096 “retained” and 404 “churned”). This tiered data architecture enables training predictive models on stable, aggregated instances of each customer’s profile while providing visual analytics with access to detailed time-series and text-based information. This is possible by rejoining the predictive results with the raw transaction log data. To provide complete transparency into the methodology used to filter and aggregate the dataset, and to clarify the intermediate datasets that contributed to our findings, the dataset’s full provenance is detailed in Table 1.

3.2. Phase 1: Behavioural Segmentation (K-Means Clustering)

To address Research Question 1 regarding user heterogeneity, K-Means clustering (a centroid-based unsupervised learning algorithm) was used to derive mathematical representations (user personas) of distinct user groups from their temporal ordering behaviour, rather than from arbitrary manual thresholds (e.g., high value vs. low value). Feature engineering followed the recency, frequency and monetary (RFM) framework, and three features were created to summarise each user’s ordering behaviour: (1) recency_days (the number of days since the last order), (2) orders_per_month (the number of orders placed each month), and (3) avg_order_value (the average value of that user’s orders). Because K-Means uses Euclidean distance to form clusters, it is sensitive to the scale of the input variables: avg_order_value, measured in the hundreds, would dominate orders_per_month, which is in single digits.
To eliminate this bias, Z-score normalisation was completed with StandardScaler to scale all features to a mean of 0 and a standard deviation of 1. To help us establish the best number of clusters (k), we initially looked at the Elbow Method to find an inflexion point (k = 3) based on the Within-Cluster Sum of Squares (WCSS). After this, we took another step beyond using the visual heuristics and conducted an objective analysis using the Silhouette Score and the Davies–Bouldin Index. The k = 3 configuration shows a Silhouette Score of 0.261 and a Davies–Bouldin Index score of 1.240. We selected k = 3, because it achieves an optimal balance between being statistically valid and having a relevant business taxonomy that can be interpreted as three distinct personas based on their frequency of use: Daily active users (high), weekly regulars (moderate), and seasonal/occasional users (low frequency, recency gap). In addition, to ensure that these segments would remain structurally stable and to mitigate the risk of algorithmically induced instability, we performed a bootstrapping reliability check. The dataset was randomly resampled with replacement over 50 iterations, and the stability of the cluster assignments was assessed using the Adjusted Rand Index (ARI). The average ARI score from these simulations was 0.552. Given the behavioural nature of RFM data (i.e., continuous, overlapping clusters), this score indicates a moderate, acceptable level of structural stability, demonstrating that the identified user cohorts represent consistent, stable behavioural patterns rather than temporary features arising from the initial centroids.
K-Means is also more scalable than hierarchical clustering: its time complexity is O(n) versus O(n³) for hierarchical clustering, which makes K-Means the better option for processing millions of transactions in real-time OFD settings. The cluster labels were then assigned to the respective users as the base “segment” variable for the next phase of predictions.
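The scaling-and-clustering workflow of Phase 1 can be sketched with scikit-learn. The RFM values below are synthetic stand-ins generated for illustration (the distribution parameters are assumptions, not the paper’s data); the pipeline itself (z-score normalisation, k = 3, Silhouette and Davies–Bouldin validation) follows the description above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(42)
# Toy RFM matrix: recency_days, orders_per_month, avg_order_value.
X = np.column_stack([
    rng.gamma(2.0, 15.0, 600),     # recency_days (long right tail)
    rng.poisson(6, 600),           # orders_per_month (single digits)
    rng.normal(250.0, 80.0, 600),  # avg_order_value (hundreds)
])

# Z-score normalisation so no feature dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)
labels = km.labels_

# Validate k = 3 with the same indices reported in the paper.
print("silhouette:", round(silhouette_score(X_scaled, labels), 3))
print("davies-bouldin:", round(davies_bouldin_score(X_scaled, labels), 3))
```

On the real dataset, the resulting labels would be joined back to each user as the “segment” variable; the bootstrap ARI check would re-fit this pipeline on 50 resamples.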

3.3. Phase 2: Sentiment Feature Extraction (BERT)

Sentiment analysis has typically been performed using lexicon-based, “bag-of-words” methods such as VADER or TextBlob, in which sentiment is assigned based on the presence of positive or negative words. These methods are problematic because they ignore the context in which words are used (a negative word can appear within a positive phrase). Food reviews contain critical nuances of this kind; for example, a phrase like “wicked fast” signals a positive attribute, not a negative one. To overcome this limitation and effectively quantify the “disconfirmation” construct of ECT, the current study applied BERT (Bidirectional Encoder Representations from Transformers).
This study used the nlp-town/bert-base-multilingual-uncased-sentiment model, an advanced transformer architecture extensively trained on a vast corpus of consumer product and service reviews, including posts about local restaurants, making it highly applicable to OFD. To validate the BERT model as a surrogate for the “confirmation” construct in Expectation-Confirmation Theory, we conducted a manual validation check before deploying the model. Two human annotators independently annotated a stratified random sample of 200 user reviews from our dataset for sentiment polarity. The BERT model achieved an agreement rate of 88.5% with the annotators, even on the highly specialised slang and subtleties of this domain (e.g., “Food was good but the driver was rude.”).
The tokenisation process involved breaking the original review text down into sub-word units using WordPiece tokenisation, which enables the handling of Out-Of-Vocabulary (OOV) terms. The BERT architecture processes these tokens bidirectionally and therefore can understand the context of a word based on the words that appeared before and after it. The BERT model produces a softmax probability distribution across five classes (1 star to 5 stars) for each review, associated with each customer. These separate probability distributions were then aggregated into a continuous avg_sentiment score for each customer, calculated as a weighted mean of the sentiment predictions across all reviews submitted by that customer. The conversion of qualitative textual complaints into a continuous numerical feature (S_c) provides the churn model with a much higher-fidelity input and a more complete representation of the user’s cumulative psychological satisfaction than using simple numerical ratings alone.
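The aggregation step, collapsing each review’s 5-class softmax output into a continuous per-customer avg_sentiment score, can be sketched as follows. For simplicity this sketch uses an unweighted mean of per-review expected star values, whereas the paper describes a weighted mean; the function name avg_sentiment and the example probabilities are illustrative.

```python
def avg_sentiment(review_probs):
    """Collapse per-review 5-class softmax outputs (1..5 stars) into a
    single continuous score: the expected star value of each review,
    averaged over all of a customer's reviews."""
    expected = [sum(star * p for star, p in zip(range(1, 6), probs))
                for probs in review_probs]
    return sum(expected) / len(expected)

# One strongly negative and one mildly positive review for a customer.
probs = [
    [0.70, 0.20, 0.05, 0.03, 0.02],
    [0.02, 0.05, 0.20, 0.45, 0.28],
]
score = avg_sentiment(probs)  # continuous S_c on the 1-5 scale
```

The continuous score preserves gradations (e.g., a 1.5-star-leaning review versus a hard 1-star) that a discrete rating would flatten, which is exactly what makes it a higher-fidelity model input.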

3.4. Phase 3: Hybrid Churn Prediction (Random Forest)

As part of the final analysis phase, we built a hybrid predictive model using information from both Phase 1, behavioural category information, and Phase 2, sentiment feature information. Instead of using a single global model to train all users (which dilutes the signal for minority segments), we built separate Random Forest Classifiers (RFCs) for each behaviour category (daily, weekly, seasonal). The Random Forest algorithm was chosen for its ensemble nature: it builds a “forest” of independent decision trees during training and predicts by outputting the class mode (Churn vs. non-churn) as the final classification. The Random Forest Classifier is less prone to overfitting due to bagging (Bootstrap Aggregating), in which each tree learns from a random subset of the data.
The feature set used in these models included standard RFM metrics and the newly engineered features: avg_sentiment, avg_delivery_time, discount_rate, and avg_rating. To evaluate how well the models perform on previously unseen data, each behavioural category’s dataset was split into training (80%) and testing (20%) sets. Hyperparameters (n_estimators: the total number of trees built; max_depth: the depth of each tree) were tuned using GridSearchCV with 5-fold cross-validation, which is important for finding an optimal configuration and balancing the bias–variance trade-off. In addition, given the typical class imbalance in churn datasets (non-churners generally outnumber churners), evaluation focused on Precision, Recall, F1 Score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Training separate models per behaviour category allows each model to learn which factors matter for that category; for example, high delivery time is likely to be a major churn driver for daily users (who want speed) but to have little effect on seasonal users (who care more about discounts). This segmented hybrid approach yields granular, actionable insight.
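The per-segment training loop can be sketched with scikit-learn. The data here is synthetic (random stand-ins for avg_sentiment, avg_delivery_time, discount_rate and avg_rating, with a churn label generated from the first two), and the hyperparameter grid is a minimal illustration of the GridSearchCV setup described above, not the grid actually searched in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
segments = {}
for name in ("daily", "weekly", "seasonal"):
    # Synthetic feature matrix; churn depends on sentiment and delivery time.
    X = rng.normal(size=(300, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)
    # 80/20 train-test split, as in the paper.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    # Grid search with 5-fold CV over the two hyperparameters named above.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
        cv=5,
        scoring="f1",
    ).fit(X_tr, y_tr)
    segments[name] = grid.best_estimator_
    print(name, "F1 =", round(f1_score(y_te, grid.predict(X_te)), 2))
```

Each fitted estimator in `segments` can then report its own feature importances, which is what makes the per-segment churn drivers inspectable.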

3.5. Phase 4: Visual Analytics and Prescriptive Modelling

The data visualisation component of our strategic roadmap consists of a visual analytics layer for all Random Forest-generated predictions. The relationships among variables, including the effect of delivery time on customer sentiment and comparisons of LTV and risk levels, were determined using continuous regression plots and Kernel Density Estimation (KDE). Finally, we designed a prescriptive recommendation engine to assess specific mitigation strategies (e.g., targeted discounts or fast delivery promises) by correlating each segment’s dominant churn drivers with the estimated retention lift for that segment.
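The KDE comparison used in this layer can be sketched as follows, with simulated churn probabilities standing in for the model's real scores. The overlap coefficient computed at the end is an assumption-laden illustration of the "continuous spectrum" reading of risk, not the dashboard's actual metric.

```python
# Sketch: Kernel Density Estimation of churn probability for two risk tiers,
# plus an overlap coefficient. Risk tiers and probabilities are simulated.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
low_risk = np.clip(rng.normal(0.25, 0.15, 500), 0, 1)
high_risk = np.clip(rng.normal(0.65, 0.15, 500), 0, 1)

xs = np.linspace(0, 1, 101)
kde_low = gaussian_kde(low_risk)(xs)
kde_high = gaussian_kde(high_risk)(xs)

# Overlap coefficient: area under min(density_low, density_high). A value
# well above zero indicates risk is a spectrum, not a hard cut-off.
dx = xs[1] - xs[0]
overlap = float(np.minimum(kde_low, kde_high).sum() * dx)
print(round(overlap, 3))
```

In the dashboard itself these densities would be drawn with a plotting library (e.g., seaborn's kdeplot); the numeric overlap is shown here only because it is easier to verify than a figure.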

4. Results

The empirical findings from a hybrid behavioural sentiment framework are presented in this section. The analysis supports the notion of user base heterogeneity and demonstrates that segmented and sentiment-enhanced Random Forest models achieve far better predictive performance than traditional models.

4.1. Behavioural Segmentation Profiles (RQ1)

We applied the K-Means clustering algorithm (k = 3) to the RFM-normalised dataset and identified three distinct user cohorts (Table 2), each exhibiting a different behavioural pattern. These patterns, summarised from the cluster centroids, are as follows:
Cluster 1: Daily active users (loyalists), approximately 25% of the user base. This cohort orders very frequently (on average, >15 orders per month) with very little time since their last order (low recency, indicating habitual ordering). They are highly sensitive to service speed rather than price.
Cluster 2: Weekly regulars, the largest cohort at about 55% of the user base. These users typically order on weekends and show high churn elasticity: a single negative experience (e.g., a missed or late delivery) is often enough to trigger churn.
Cluster 3: Seasonal/at-risk users, approximately 20% of the user base. These users tend to have substantially longer intervals between orders (higher recency) and are primarily motivated by deep-discount deals.
The behavioural heterogeneity in our sample is presented visually in the visual analytics layer (Figure 2, Dashboard 1). Beyond basic histograms describing the general population distribution, the "churn risk by spend" box plot shows a significant difference in the variance and median lifetime value (LTV) of retained versus churned customers, refuting the hypothesis that all high-spending customers are immune to churn. In addition, the Kernel Density Estimate (KDE) of churn probability by risk level (low, medium, or high) shows substantial overlap among the three risk tiers, further emphasising that risk is a continuous spectrum rather than a hard threshold. Finally, the geographic heatmap (location risk status) shows that high-risk customers are concentrated in particular areas; supply chain operations will therefore require localised resilience to avoid service disruptions in those locations.

4.2. Impact of Sentiment on Churn (RQ2)

Integrating the BERT-based sentiment score, we found a significant negative correlation between sentiment polarity and churn probability (r = −0.68, p < 0.01). Transactional data such as 'Delivery Time' provided a baseline signal of dissatisfaction, but the BERT model revealed, from comment text alone (e.g., the food "tasted stale"), that many customers churned because of food quality even when their orders arrived on time.
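A minimal, hedged illustration of this relationship: the Pearson correlation between a sentiment score and churn probability. In the study the scores come from BERT applied to real review text; here both series are simulated, so the exact r will differ from the reported −0.68.

```python
# Sketch: correlating a (simulated) sentiment polarity score with churn
# probability, mirroring the reported negative relationship.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
sentiment = rng.uniform(-1, 1, 500)  # stand-in for BERT polarity in [-1, 1]
# Lower sentiment -> higher churn probability, plus noise.
churn_prob = np.clip(0.5 - 0.4 * sentiment + rng.normal(0, 0.1, 500), 0, 1)

r, p = pearsonr(sentiment, churn_prob)
print(round(float(r), 2), p < 0.01)
```

The construction hard-codes the negative dependence, so it demonstrates the statistic, not the empirical finding.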
The temporal and dimensional views in Dashboard 2 (Figure 3) exemplify the predictive capability of unstructured textual data. The dual-axis sentiment-timeline-versus-volume graph relates the two variables over time and reveals a "Lag Effect": the average sentiment score (black line) begins to fall significantly before the volume of negative comments (red bars) spikes, demonstrating that sentiment trends can serve as an early warning indicator. The 'Sentiment Score vs Risk Level' boxplot likewise shows statistical differences in sentiment across the mathematically defined risk tiers, evidence that high-risk customers consistently hold negative emotional attitudes. Finally, the 'sentiment volume by acquisition channel' heatmap adds a channel-level perspective, identifying the specific platforms that act as hot spots for negative disconfirmation and giving managers the ability to target key operational issues strategically.
The dual-axis presentation overcomes a major limitation of most churn modelling efforts, which treat churn as a one-off event, by adding a temporal, event-driven view of how the average emotional experience changes over time. It shows that average sentiment declines steadily over approximately 14–21 days before the last transaction (the actual churn event). This strongly supports the Expectation-Confirmation Theory account of digital platform use, emphasising that disconfirmation is dynamic and evolving rather than instantaneous.
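The "Lag Effect" computation behind the timeline can be sketched as follows. The daily series is synthetic and deliberately constructed to exhibit a lag in the reported 14–21 day range; the threshold and window sizes are illustrative assumptions.

```python
# Sketch: rolling mean sentiment starts falling well before the volume of
# negative comments spikes. Both series are synthetic.
import numpy as np
import pandas as pd

days = pd.date_range("2025-01-01", periods=60, freq="D")
t = np.arange(60)
sentiment = np.where(t < 25, 0.6, 0.6 - 0.03 * (t - 25))  # decay begins day 25
neg_volume = np.where(t < 48, 2, 15)                      # complaint spike day 48

# 7-day rolling mean smooths daily noise; flag the first sustained dip.
s = pd.Series(sentiment, index=days).rolling(7, min_periods=1).mean()
sent_drop_day = s[s < 0.55].index[0]
spike_day = pd.Series(neg_volume, index=days).idxmax()

lag = (spike_day - sent_drop_day).days
print(lag)  # → 18
```

In production the same logic would run on aggregated daily BERT scores and complaint counts, making the rolling-sentiment dip an early-warning trigger.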

4.3. Predictive Model Performance (RQ3)

To demonstrate the effectiveness of the hybrid framework, it was compared with a "Baseline Global Model", a single Random Forest trained on all users without segmentation or sentiment; the resulting performance metrics are shown in Table 3. The segmented hybrid model significantly outperformed the baseline on every metric. For instance, the Macro F1 Score (essential when dealing with imbalanced churn data) increased from 0.76 to 0.89.
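The Macro F1 comparison can be illustrated with toy label vectors (the predictions below are constructed stand-ins, not the study's model outputs). Macro F1 averages the per-class F1 scores equally, so missed churners are penalised even when the class is rare.

```python
# Sketch: macro F1 for an imbalanced churn problem, comparing a weaker
# "baseline" prediction with a stronger "hybrid" one. Toy data.
from sklearn.metrics import f1_score

y_true = [0] * 80 + [1] * 20                 # 100 customers, 20% churners
baseline_pred = [0] * 80 + [0] * 10 + [1] * 10   # catches half the churners
hybrid_pred = [0] * 80 + [0] * 2 + [1] * 18      # catches 18 of 20 churners

f1_baseline = f1_score(y_true, baseline_pred, average="macro")
f1_hybrid = f1_score(y_true, hybrid_pred, average="macro")
print(round(f1_baseline, 2), round(f1_hybrid, 2))  # → 0.8 0.97
```

Note how plain accuracy would rate both predictors highly (90% and 98%), whereas macro F1 exposes the baseline's weakness on the minority churn class.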

4.4. Determinants of Churn (Feature Importance)

A major finding is that the factors driving customer churn vary by segment, as visually confirmed by the Sentiment Drivers Analysis (Figure 4, Dashboard 3). Beyond grouping the themes that surfaced in our analysis, the collection of granular linguistic evidence via a Natural Language Processing (NLP) pipeline enabled us to identify specific words associated with positive and negative sentiment. We also validated the operational drivers via regression plots: the 'impact of delivery time on sentiment' plot showed a steep negative linear relationship that strongly affects the sentiment of utilitarian daily users, while the 'impact of discounts on sentiment' plot isolates the economic driver behind the sentiment of seasonal users.
  • Daily Users: Average delivery time and average sentiment are the two most important churn predictors for this segment. Because these users prize reliability, delays are the principal churn trigger.
  • Seasonal Users: The most significant predictor is discount rate. These users are highly price-sensitive; when discounts are withdrawn, they churn regardless of service quality.
  • Weekly Users: The most important predictor is average rating (quality). These users are more patient and willing to pay more, but they expect the highest quality.
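The feature-importance step behind these bullets can be sketched as below. The data, feature names, and the simulated "daily" segment are illustrative assumptions; only the mechanism (a fitted forest exposing per-feature importances) reflects the study's method.

```python
# Sketch: extracting per-segment churn drivers from a fitted Random Forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
features = ["avg_delivery_time", "discount_rate", "avg_rating", "avg_sentiment"]

# Simulate a "daily user" segment where delivery time drives churn.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.2 * rng.normal(size=500) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Importances sum to 1; sorting them ranks each segment's dominant driver.
ranking = sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1])
print(ranking[0][0])  # → avg_delivery_time
```

Repeating this per segment model is what produces the segment-specific driver lists above.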

5. Discussion

This study's main objective was to develop a machine learning model that operationalises Expectation-Confirmation Theory (ECT) to assess customer churn within the online food delivery (OFD) industry. We argue that, rather than treating all customers as a single homogeneous group (as traditional black-box algorithms do), customers' tendencies to leave are shaped by several distinct factors, including individual behavioural expectations and psychological disconfirmations. This section discusses the results against previously published literature, outlines our theoretical contributions, and considers the trade-offs between the framework's theoretical grounding and its computational implementation.
Although earlier models primarily examined independently collected behavioural indicators, our integrated dashboard (Dashboard 1) includes both visual indicators and empirical evidence that show that geo-spatial factors play a significant role in churn dynamics. The heat map indicating the ‘Location Risk Status’ highlights a regional clustering of high-risk churn-probability hotspots, rather than a uniform distribution. High-density/urban areas experiencing extreme logistical congestion are therefore geo-spatial catalysts for service failure, ultimately resulting in accelerated localised churn.

5.1. Reevaluating Behavioural Loyalty: The Fallacy of Frequency (RQ1)

Applying K-Means clustering to the RFM-normalised data produced statistically distinct cohorts of daily active users, weekly regulars, and seasonal users, substantially refining existing behavioural models.
Our findings extend those of Tabianan et al. [19], who classified users into binary 'Active' versus 'Passive' groups, typically using static thresholds. Their approach provides an overall perspective but does not reflect the nuances of ordering rhythm. Our study indicates that 'Active' users are not a homogeneous cohort: the behavioural differences between daily (routine-driven) and weekly (occasion-driven) users are statistically significant, and the ability to predict on the basis of these distinctions is critical. The findings of Tabianan et al. [19] on behavioural segmentation in B2B food industries likewise corroborate ours in revealing 'hidden' at-risk clusters within apparently loyal, high-volume customers.
One counterintuitive finding was the vulnerability of the 'weekly regular' segment to churn. The conventional view within Customer Relationship Management (CRM) is that frequent buyers are also highly loyal. Yet the data clearly show that the 'weekly regular' segment has high churn elasticity: a single negative event (e.g., a late delivery) was 34% more likely to trigger churn among weekly users than among daily users.
We theorise this through the lens of switching cost theory. Because they use the platform daily for sustenance (e.g., office lunches), daily users have likely built up high procedural switching costs through habitual behaviour. Weekly regulars, by contrast, likely use the service as a luxurious or discretionary weekend treat. As noted by Burnham et al. [25], discretionary users have lower barriers to exit; if their hedonic expectations are not met, they immediately switch platforms. This challenges the long-held industry practice of prioritising only 'whales' (high-volume users) without accounting for habit strength. The findings of Burnham et al. [25] support this position, showing that in the food service industry the psychological switching-cost barrier is breached once hedonic expectations are violated, leading to a precipitous decline in retention among weekly regulars.

5.2. The Superiority of BERT in Capturing “Silent Attrition” (RQ2)

Research Question 2 examined whether combining unstructured sentiment data processed by BERT (Bidirectional Encoder Representations from Transformers) with behavioural data creates incremental value. The hybrid (behavioural plus sentiment) models performed better than baseline transactional-only models, improving the F1 Score from 0.76 to 0.89.
This study demonstrates methodological advancement by providing empirical support for moving beyond lexicon-based methods within the OFD sector. Previous lexicon-based approaches (e.g., VADER, the Valence Aware Dictionary and sEntiment Reasoner) reported sentiment correlations with churn of approximately r = 0.59 because they cannot account for context. Our implementation of BERT achieved approximately r = 0.74 against real-world churn. The advantage arises because BERT's bidirectional attention mechanism resolves polysemy and nuance (e.g., it classifies a "sick burger" as positive slang rather than a health complaint), which a simple bag-of-words approach cannot capture [23].
The discovery of silent attrition represents one of the most important contributions of this research. In Dashboard 2, the 'Sentiment Score vs. Risk Level' boxplot shows that users in the mathematically defined high-risk category have strongly negative sentiment scores even though their star ratings (e.g., 4 stars) appear high. Similarly, the dual-axis 'Sentiment Timeline' graphically illustrates the psychological decline in sentiment that precedes transactional churn, with many high-risk users experiencing significant sentiment declines for weeks before they actually churned. By converting unstructured semantic context into continuous variables, the model captures the early disconfirmation component of ECT that standard transactional databases typically miss, thereby considerably reducing the false negative rate.

5.3. Divergent Antecedents of Churn: Utilitarian vs. Hedonic Drivers (RQ3)

Research Question 3 sought to identify the churn drivers specific to each segment. Using Random Forest feature importance analysis, we found that drivers vary across segments, disproving the one-size-fits-all assumption prevalent in traditional churn models.
Utilitarian Daily Users: Daily users exhibited an inverse relationship between average delivery time and sentiment toward the platform. These users perceive the platform primarily as a logistics utility, so deviations from expected delivery speed immediately trigger disconfirmation and predict disengagement.
Economic Seasonal Users: The discount rate is the most significant predictor of seasonal user behaviour, according to the results of the impact-of-discounts regression. This finding is in agreement with the concept of price elasticity. Essentially, a seasonal user’s loyalty is primarily driven by financial incentives and is extremely sensitive to their elimination.
Hedonic Weekly Users: avg_rating and avg_sentiment drive churn in this segment. Hedonic weekly users expect a high-quality product each time they order; if they order food for a Friday night family dinner, quality is the very reason they ordered in the first place.
Literature Comparison: Most existing literature indicates that price or high commission is the primary driver of churn for food apps [1,2]. In contrast, our analysis shows that price is the main predictor only for the seasonal segment (approximately 20% of users); for the high-value daily segment, price is secondary to reliability. Our findings support the "Profit-Centric" modelling approach of AbdelAziz et al. [17]: retention strategies should match the segment. Offering coupons to a daily user who churned because of lateness is ineffective, and promising faster delivery to a seasonal user whose primary objective is a discount is equally ineffective and wasteful.

5.4. Methodological Trade-Offs: Interpretability vs. Computational Cost

Hybrid models carry distinct benefits and costs that must be weighed when creating such frameworks. The biggest benefit of the Hybrid Random Forest (HRF) model over many traditional machine learning alternatives is its interpretability. 'Black box' algorithms such as artificial neural networks (ANNs) and deep neural networks (DNNs) have frequently faced pushback from industry practitioners because they cannot clearly explain the reasoning behind specific predictions. The Random Forest algorithm, by contrast, exposes feature importances, so business managers can see why each segment churns: for example, Segment A was driven by delivery time, whereas Segment B was driven by sentiment. Furthermore, the daily segment achieved a predictive accuracy of 91.2% using the Hybrid Random Forest method, exceeding comparable AdaBoost-based approaches on tabular datasets by up to 50% [16].
The major shortcoming of the Hybrid Random Forest model is its increased computational latency, caused by running an NLP pipeline to produce BERT embeddings for large volumes of text. Our logs indicate that the hybrid pipeline increases processing time by 40% compared with RFM-based logistic regression models. In a live environment processing very large transaction volumes (approximately 1 million+ per second), this latency will be problematic. Further research should therefore examine whether transformer distillation techniques (e.g., DistilBERT, TinyBERT) can match the predictive accuracy of full-sized BERT while running substantially (around 60%) faster. Finally, the cluster centroids were derived from a single platform's data; future work should validate them across platforms.

6. Implications

The results reported here yield major gains in predictive accuracy, with broad implications for both the academic literature on consumer behaviour and the practical frameworks used in industry. The validated combined methodology links behavioural analytics with psychological theory and provides a route to sustainable growth within the hyper-competitive online food delivery (OFD) industry.

6.1. Theoretical Implications

This research contributes to the current body of knowledge on customer retention by applying Expectation-Confirmation Theory (ECT) to the Big Data environment. ECT has typically been developed using static survey measures (i.e., perceived performance and confirmation). However, the results of this work indicate that unstructured sentiment scores generated through Natural Language Processing represent the confirmation construct far more accurately than traditional Likert-scale ratings. By quantifying silent attrition within digital ecosystems, this work shows that confirmation does not exist in a simple binary state (confirmed or disconfirmed). Rather, as demonstrated by the continuous distributions in our Kernel Density Estimate (KDE) and correlation visualisations, expectation confirmation is a complex, multivariate continuum best represented through continuous semantic analysis of user feedback. By modernising the methodological implementation of ECT, this work provides evidence that text serves as a behavioural signal rather than mere noise.
This research challenges the conventional frequency–loyalty wisdom, which holds that high use frequency translates into high retention through habit formation. By showing that the weekly regular segment is often at high risk of switching, this research adds significantly to the body of work on switching cost theory. We distinguish the types of switching cost involved: procedural switching costs are generally high for daily users because they have integrated the service into their routines, whereas psychological switching costs are low for weekly users because they view the service as a discretionary purchase. This distinction explains why weekly users represent a high-risk group despite their regular use of the service, validating the need to separate habit strength from purchase frequency in future models of customer retention.
For the first time in academic research, it has been shown that, for daily users, algorithmic churn predictions are dominated by utilitarian reliability and speed characteristics, whereas for weekly users they are dominated by hedonic quality and sentiment characteristics. This provides empirical support for the Theory of Consumption Value. These results demonstrate that retention behaviour cannot be generalised across all forms of consumption, so future churn models should treat churn drivers as dynamic variables dependent on user-specific consumption modes.

6.2. Managerial Implications

Practitioners in the OFD industry must transition from 'Global Retention Strategies' to 'Segment-Triggered Interventions'. This change answers Muñoz's [26] call for better ways to detect when a user is leaving, or at risk of leaving, through more targeted retention marketing. The current industry norm is to distribute discount coupons broadly to re-engage customers at high risk of leaving; however, our results show this approach is relatively ineffective for daily and weekly users, whose reasons for leaving are service failures, not price sensitivity. Managers should instead develop a fine-grained retention playbook that distinguishes between the daily and weekly segments. For example, when delays occur, daily users should be recovered through automated apologies and priority delivery upgrades rather than discounts. Conversely, marketing budgets for the seasonal segment should be invested aggressively in price incentives, given that segment's strong price elasticity. Aligning the incentive with the behaviour driving each user (utilitarian versus economic) will increase the ROI of retention campaigns.
To effectively retain customers, it is critical to transition from descriptive reporting to prescriptive action. Figure 5 (Dashboard 4) presents the macro-level ‘Churn Analysis and Recommendation Engine’. Instead of generic aggregations, this interface cross-tabulates risk via a ‘Churn Risk Matrix’ mapping geographic regions against acquisition channels, allowing managers to pinpoint systemic vulnerabilities. Furthermore, the ‘Drivers of Churn’ correlation analysis isolates the statistical weight of individual risk factors. These insights feed directly into the ‘Highest Risk Customers Action List’ table, enabling localised marketing teams to prioritise high-probability churners for immediate, manual intervention.
For high-value accounts, aggregate geographic scores are insufficient. Figure 6 (Dashboard 5) presents the 'customer risk dashboard' for deep-dive, micro-level assessment. The 'risk profile comparison' radar chart overlays the behavioural footprints of high- versus low-risk customers, immediately highlighting critical deviations in recency or negative review frequency. Simultaneously, the 'customer lifetime value by risk level' violin plot quantifies the actual financial exposure associated with these high-risk cohorts. Bridging the gap between AI prediction and human execution, the dashboard integrates an 'automated action logs' module. This engine synthesises the customer's distinct risk profile with the 'mitigation strategies impact' estimations, generating algorithmic, highly specific directives, such as flagging a user for priority support channel re-engagement, rather than relying on blanket discount distribution.
Silent attrition requires a new KPI design. The industry should not rely solely on Average Customer Ratings (ACR) or Net Promoter Scores (NPS). A dashboard showing a healthy 4.5-star average creates a false sense of security and may mask a deteriorating user base, because a negative trend in the underlying written sentiment can indicate rising churn. Managers should use BERT-based sentiment tracking to build real-time dashboards that surface changing sentiment rather than changing ratings. By the time the ratings on a dashboard worsen, it is too late for a customer success team to respond proactively; the customer may have already passed a psychological tipping point and be headed for imminent churn.
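The KPI shift argued for here can be sketched numerically. Both series below are synthetic and constructed so the star rating stays near 4.5 while written sentiment decays; the alert threshold and window are illustrative assumptions.

```python
# Sketch: a rolling-sentiment alert fires even while the rating KPI looks healthy.
import numpy as np
import pandas as pd

t = np.arange(12)                                   # 12 weeks of aggregates
avg_rating = pd.Series(4.5 + 0.02 * np.sin(t))      # looks healthy throughout
sentiment = pd.Series(np.linspace(0.6, -0.2, 12))   # quietly deteriorating

# Alert when the 4-week mean sentiment slope turns clearly negative even
# though the rating KPI has not moved below its usual band.
slope = sentiment.diff().rolling(4).mean()
alert = bool((slope < -0.05).any() and (avg_rating > 4.4).all())
print(alert)  # → True
```

The point of the sketch is the divergence: a rating-only dashboard would show nothing, while the sentiment-trend KPI fires weeks earlier.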
The findings above demonstrate a need to implement dynamic Service Level Agreements (SLAs) within the logistics algorithms of OFD platforms. Daily users are more sensitive to delivery time, while weekly users care less about delivery time and more about food quality; the dispatch algorithm should reflect these differences. During weekday peak hours, the dispatch algorithm should prioritise speed and routing for daily users to reduce immediate disconfirmation. During weekend peaks, by contrast, weekly ('hedonic') orders should be prioritised for handling care and order accuracy, with higher-rated drivers assigned to them. Through dynamic allocation of logistics resources, delivery platforms can increase retention efficiency by optimising the speed–quality trade-off without increasing fleet size.

6.3. Implications for Policy and Society

In addition to its commercial significance, this research has significant implications for both the broader gig economy and urban logistics policy. High-churn platforms are compelled to pursue an aggressive customer acquisition approach that artificially inflates demand in the gig worker supply chain, leading to increased urban congestion. Stabilising user bases through better demand prediction allows platforms to develop more consistent demand patterns and drivers to optimise their routes with fewer "dead miles" (miles driven without an order), reducing carbon emissions per order. Moreover, semantic analysis creates the potential for more equitable algorithmic management of gig workers. By separating negative reviews caused by logistics problems (e.g., "traffic caused delays") from service-related failures (e.g., "rude behaviour"), platforms can build fairer customer-rating systems that do not penalise drivers for systemic inefficiencies beyond their control.

7. Conclusions

The strategic focus of platforms in the online food delivery (OFD) industry has shifted from aggressively acquiring new users toward providing reasonably priced, reliable services that retain users over time. This research addresses customer churn using an operationalised Expectation-Confirmation Theory (ECT) framework, in which unsupervised behavioural segmentation (how users behave) and advanced Natural Language Processing (what users expect) are considered together to predict when customers will stop ordering, based on the gap between expectation and experience. The analysis of the Research Questions (RQs) produced significant findings:
Behavioural Heterogeneity (RQ1): K-Means clustering revealed distinct user cohorts: daily active users (25%), weekly regulars (55%), and seasonal users (20%). The results dispel the assumed frequency–loyalty relationship: weekly regulars were the most likely to stop ordering because they are more sensitive to service failure than daily active users, who have become habituated to the service.
The Power of Unstructured Data (RQ2): The integration of sentiment scoring (through the BERT method) into a Random Forest model resulted in an increase in F1 Score predictive performance from 0.76 (baseline) to 0.89 (hybrid). The increased performance confirms that text reviews are strong indicators of silent attrition (or psychological disconfirmation) that does not appear in traditional transactional log data.
Segment-Specific Drivers (RQ3): The analysis of feature importance indicates that the drivers of churn vary by user segment: seasonal users are price (economic value) sensitive, daily active users are speed (utilitarian value) sensitive, and weekly regular users are quality (hedonic value) sensitive. The conclusion is that a one-size-fits-all retention strategy is impractical given the differences in behaviour and drivers across segments.
Prescriptive Visual Analytics (RQ4): The study merges the complex results generated from machine learning with a prescriptive visual analytics dashboard for application in the industry. Through mapping various feature sets such as geographic risk, acquisition channel sentiment, and lifetime value exposure, the developed system creates automated prescriptive action logs from the original raw algorithmic probabilities to facilitate specific interventions.

Limitations and Future Research Directions

The robust contributions of this study come with significant limitations that offer opportunities for further research:
(1) Validation: The proposed hybrid framework has strong predictive ability and provides useful visual analytics; however, it was validated on a synthetic dataset. Although this dataset was designed to mirror real-world characteristics of OFD mobile applications as closely as possible, synthetic data cannot fully replicate every unpredictable detail of how people interact with technology. Future research should partner with an industry provider to validate and refine the framework against proprietary, real-world transaction logs.
(2) Causation vs. Correlation: The machine learning techniques used here (K-Means and Random Forest) identify predictive, correlational relationships by design. Although higher delivery times correlated significantly with increased churn, controlled experiments (e.g., A/B testing) are needed to establish that delivery time causes churn before drawing conclusive statements about its effect.
(3) Geographic and Platform Specificity—The dataset came from one OFD platform within a limited geographic area, and as a result, the cluster centroids identified (e.g., daily = <3 days recency) may not be representative of other markets that have different dining cultures (e.g., some parts of the world view ordering out as a monthly luxury). Therefore, additional research should be conducted using cross-platform validation to test the K-Means/BERT pipeline against various datasets to establish a global standard.
(4) Temporal Dynamics—The current Random Forest model is a static model and provides insights into churn based upon a snapshot of the user’s behaviour, while sequential dependencies (e.g., 10 weeks of continuous decreases in order value) are lost. To improve time-to-event predictions, future studies should use an RNN or LSTM architecture to model the sequence of user interactions over time.
(5) Computational Efficiency—The BERT BASE model has a high computational cost and introduces latency. Because real-time, high-frequency production environments demand low inference times, future researchers should explore knowledge distillation techniques (e.g., DistilBERT or TinyBERT) to reduce inference time without sacrificing semantic fidelity.
Customer retention modelling benefits greatly from combining behavioural psychology with machine learning. By treating unstructured textual reviews as behavioural signals on a par with traditional transaction data, OFD platforms can move from reactive churn monitoring to proactive retention management.

Author Contributions

Conceptualisation, S.S. (Saranya Shanmugam) and N.M.S.; methodology, S.S. (Saranya Shanmugam); software, D.A. and S.S. (Santhoshkumar Senthilkumar); validation, S.S. (Saranya Shanmugam), N.M.S. and T.M.; formal analysis, S.S. (Saranya Shanmugam) and E.E.; investigation, S.S. (Saranya Shanmugam), E.E. and D.A.; resources, N.M.S.; data curation, S.S. (Saranya Shanmugam) and S.S. (Santhoshkumar Senthilkumar); writing—original draft preparation, S.S. (Saranya Shanmugam); writing—review and editing, N.M.S. and T.M.; visualisation, E.E.; supervision, N.M.S.; project administration, N.M.S.; funding acquisition, N.M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and dataset used can be accessed from: https://osf.io/7nsv6/overview?view_only=0b9a09faecc14ca6850fea7e75bb3c48 (accessed on 16 March 2026).

Acknowledgments

The authors would like to thank their institutions for providing the necessary facilities and support to carry out this research. The authors also acknowledge the valuable support and encouragement received from colleagues during this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACR: Average Customer Ratings
AI: Artificial Intelligence
ANN: Artificial Neural Networks
ARI: Adjusted Rand Index
AUC-ROC: Area Under the Receiver Operating Characteristic Curve
B2B: Business-to-Business
BERT: Bidirectional Encoder Representations from Transformers
CAGR: Compounded Annual Growth Rate
CRM: Customer Relationship Management
CV: Cross-Validation
DL: Deep Learning
DNN: Deep Neural Networks
ECT: Expectation-Confirmation Theory
e-WOM: Electronic Word-of-Mouth
HRF: Hybrid Random Forest
KDE: Kernel Density Estimation
KPI: Key Performance Indicator
LR: Logistic Regression
LSTM: Long Short-Term Memory
LTV: Lifetime Value
ML: Machine Learning
NLP: Natural Language Processing
NPS: Net Promoter Scores
OFD: Online Food Delivery
OOV: Out-Of-Vocabulary
PII: Personal Identification Information
RF: Random Forest
RFC: Random Forest Classifier
RFM: Recency, Frequency, Monetary
RNN: Recurrent Neural Network
ROI: Return on Investment
RQ: Research Question
SLA: Service Level Agreements
SMOTE: Synthetic Minority Over-sampling Technique
VADER: Valence Aware Dictionary and sEntiment Reasoner
WCSS: Within-Cluster Sum of Squares
XGBoost: Extreme Gradient Boosting

Figure 1. Research Framework.
Figure 2. Churn risk analysis dashboard.
Figure 3. Temporal sentiment analysis.
Figure 4. Sentiment Drivers Analysis.
Figure 5. Churn analysis and recommendation engine.
Figure 6. Customer risk assessment detail.
Table 1. Data provenance and analytical pipeline mapping.

food_delivery_churn_raw_12000.csv (Raw Data)
  Purpose: Chronological order logs; source for all recency, frequency, and monetary (RFM) variables. Used to plot time-series trends.
  Records: 12,000 transactions → 12,000 transactions.
  Fields: order_id, customer_id, order_date, delivery_time_min.
  Filter criteria: Missing numerical values (e.g., delivery time) imputed via median; rows missing critical primary keys dropped.

customer_reviews.csv (Raw Data)
  Purpose: Unstructured reviews; source for BERT sentiment extraction. Used in dashboards to map word frequencies.
  Records: 3000 reviews → 3000 reviews.
  Fields: review_text, customer_id.
  Filter criteria: Reviews containing only special characters or missing text fields removed prior to NLP tokenisation.

customer_features.csv (Intermediate File)
  Purpose: Aggregates raw transactions into distinct customer profiles for clustering.
  Records: 12,000 transactions → 1500 unique customers.
  Fields: customer_id, total_orders.
  Filter criteria: Cold-start filter; any customer_id with 3 or fewer transactions removed to ensure sufficient behavioural history.

customer_segmented.csv (Intermediate File)
  Purpose: Contains the unsupervised classification outputs.
  Records: 1500 customers → 1500 customers.
  Fields: RFM features (Z-score normalised).
  Filter criteria: Appends the assigned K-Means cluster_id (k = 3) to the surviving 1500 customers.

customer_sentiment.csv (Intermediate File)
  Purpose: Processed BERT outputs aggregated by user.
  Records: 3000 reviews → 1260 customers.
  Fields: avg_sentiment, neg_review_ratio.
  Criteria: Softmax probabilities from BERT aggregated into continuous scores; contains only the 1260 users who left reviews.

customer_churn_labeled.csv (Intermediate File)
  Purpose: Binary target variable file used for supervised learning.
  Records: 1500 customers → 1500 customers.
  Fields: customer_id, churn (0 = retained, 1 = churned).
  Filter criteria: Mapped exactly to the 1500 segmented users.

customer_level_trained_dataset.csv (Pre-Training Master)
  Purpose: Dataset containing predictions from the segment-specific Random Forest models.
  Records: 1500 customers → 1500 customers.
  Fields: RFM metrics, avg_rating, engineered sentiment features.
  Criteria: Inner join of segments, sentiment, and labels; the 240 users lacking review text received an imputed neutral score (3.0). Appends the churn_probability outputs generated by training three independent Random Forest Classifiers (one per behavioural segment).

analytics_master.csv (Final Output Master)
  Purpose: Ultimate output file combining raw features, segments, sentiment, and algorithmic predictions.
  Records: 1500 customers → 1500 customers.
  Criteria: Appends Random Forest probability outputs (churn_probability) and categorised risk_level to the trained dataset; rejoined with the raw transaction and review files to generate Visual Dashboards 1 through 5.
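The cold-start step in Table 1 (dropping any customer_id with 3 or fewer transactions before clustering) can be sketched as a simple aggregation. Field names follow the table; the sample order log is hypothetical:

```python
from collections import Counter

def cold_start_filter(transactions, min_orders=4):
    """Aggregate order-level rows to customer level and keep only
    customers with at least `min_orders` transactions (Table 1's
    cold-start rule: 3 or fewer transactions are dropped)."""
    counts = Counter(t["customer_id"] for t in transactions)
    return {cid: n for cid, n in counts.items() if n >= min_orders}

# Hypothetical order log: C1 has 5 orders, C2 has only 2.
log = [{"customer_id": "C1"}] * 5 + [{"customer_id": "C2"}] * 2
features = cold_start_filter(log)   # only C1 survives the filter
```

Applied to the full 12,000-row log, this kind of aggregation is what reduces the data to the 1500 customers carried through the rest of the pipeline.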
Table 2. Behavioural cluster centroids.

Cluster Profile | User Segment | Order Frequency | Recency | Primary Churn Driver | Behavioural Motivation
Cluster 1 | Daily Active (25%) | High (>15/mo) | Low (<3 days) | Delivery Delay | Utilitarian (Habit/Routine)
Cluster 2 | Weekly Regulars (55%) | Medium (Weekends) | Medium (5–7 days) | Food Quality/Rating | Hedonic (Treat/Experience)
Cluster 3 | Seasonal/At-Risk (20%) | Low (<2/mo) | High (>21 days) | Price/Discount Removal | Economic (Price Sensitivity)
Table 3. Performance comparison (baseline vs. hybrid model).

Model Type | Architecture | Input Features | Macro F1 Score | Accuracy | False Negative Rate
Baseline | Global Random Forest | Transactional Only (RFM) | 0.76 | 82.4% | High
Proposed Hybrid | Segmented Random Forest | Transactional + BERT Sentiment | 0.89 | 91.2% * | Reduced by 14%
* Statistically significant.
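The macro F1 score reported in Table 3 averages the per-class F1 values with equal weight, so the minority (churned) class counts as much as the majority (retained) class. A minimal sketch of that computation; the (tp, fp, fn) counts below are hypothetical illustrations, not the paper's actual confusion matrix:

```python
def macro_f1(per_class_counts):
    """Macro-averaged F1: compute F1 per class from (tp, fp, fn)
    counts, then average with equal weight across classes."""
    scores = []
    for tp, fp, fn in per_class_counts:
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)

# Hypothetical hold-out counts: (tp, fp, fn) for retained and churned classes.
counts = [(1050, 60, 50),   # retained class
          (300, 50, 60)]    # churned class
score = macro_f1(counts)
```

Because of the equal weighting, macro F1 penalises a model that predicts "retained" for everyone, which is why it is a stricter yardstick than raw accuracy for imbalanced churn data.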
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
