1. Introduction
Customer segmentation plays a significant role in enabling personalized marketing strategies, boosting customer satisfaction, and enhancing overall business outcomes. Recency, Frequency, and Monetary (RFM) analysis has proven to be a powerful technique for understanding customer value and behavior in marketing analytics [1]. It evaluates customer engagement across three behavioral metrics: Recency (how recently a customer made a purchase), Frequency (how often the customer makes purchases), and Monetary value (how much the customer spends). These metrics enable organizations to group customers with similar purchase behaviors.
Despite its widespread use, traditional RFM-based segmentation relies primarily on rule-based logic, which may limit its ability to capture nonlinear patterns in real-world datasets. In the contemporary business environment, customer behavior can be multifaceted, complex, and influenced by a variety of dynamic factors. Traditional approaches, such as statistical analysis and clustering, may not fully capture the intricate relationships within such data, which can lead to oversimplified categorizations and incomplete interpretations. They also tend to produce static customer groupings that lack the adaptability required in dynamic markets. These shortcomings motivate the exploration of more robust and adaptable segmentation techniques based on deep learning.
In this study, we introduce RFM-Net, a customer segmentation approach that fuses the effectiveness of RFM analysis with the learning capability of deep learning. By training a Convolutional Neural Network (CNN) on RFM features, RFM-Net categorizes customers into seven actionable groups: Champions, Loyal Customers, Potential Loyalists, Need Attention, At Risk, About to Sleep, and Hibernating. The approach leverages both domain knowledge and data-driven learning, which makes it adaptable to diverse business contexts.
RFM-Net addresses these challenges by combining domain-driven features (RFM) with a data-driven modeling approach (deep learning). This combination allows the model to preserve the interpretability of traditional RFM analysis while enhancing it through automated learning and feature extraction, enabling deeper insight into customer dynamics. The result is a segmentation solution that is intuitive for decision-makers and technically sound for data scientists and analysts. A primary aim of this research is to help businesses gain a deeper, data-driven understanding of their customers, thereby facilitating customer-centric marketing strategies that are targeted, efficient, and scalable.
Unlike conventional CNN architectures such as AlexNet, GoogLeNet, VGG, and ResNet, RFM-Net is relatively shallow, consisting of only a few layers and a small number of parameters. Its lightweight structure is computationally efficient and easier to interpret, which supports real-time applications and simplifies deployment. Unlike deeper models, RFM-Net is tailored to low-dimensional, tabular RFM data, which helps it generalize with a reduced risk of overfitting. In this way, it adapts the representational power of CNNs to structured customer behavior metrics. Moreover, RFM-Net eliminates the need for extensive feature engineering or manual clustering.
The contributions of this paper are fourfold:
RFM-Net: We generate labels through rule-based logic derived from RFM scores and use them as supervised ground truth, so that the model learns an expert-defined mapping for customer segmentation rather than discovering a new segmentation structure from the data.
Validation and Comparison: We validated RFM-Net on a real-world dataset, on which it achieved an accuracy of 94.33%, a relative improvement of 13.17% in classification accuracy over previously reported results.
Strategic Insights: We provide businesses with a reliable solution for customer segmentation in marketing, together with segment-specific strategy suggestions.
Benefits: Compared with conventional CNN architectures, RFM-Net offers a compact, lightweight framework with fewer layers and parameters, improved interpretability, and a design optimized for customer segmentation on structured RFM data. This reduces the risk of overfitting and enables end-to-end learning without a separate clustering algorithm.
The rest of the paper is structured as follows:
Section 2 provides an overview of existing customer segmentation methods.
Section 3 details the research methodology and introduces the proposed RFM-Net model.
Section 4 reports the experimental results and comparative analysis, discussing the findings and their practical implications.
Finally, Section 5 concludes the study and highlights avenues for future work.
2. Related Work
This section provides a systematic review of current studies in customer segmentation.
Table 1 provides a summary of previous studies [2,3,4,5,6,7,8,9,10], including the methods employed, their application purposes, the types of tasks addressed (classification or clustering), the performance metrics used, and the regions in which the studies were conducted.
Some studies in the literature treat customer segmentation solely as a clustering task [2,3,4,7,8], whereas others also address it within a classification framework [5,6,9,10]. In [2], researchers clustered customers into segments such as high spenders and seasonal shoppers. In [3], customers were divided into four groups using clustering, and the importance of this grouping for determining marketing strategies was emphasized. In [4], a three-dimensional segmentation model was applied using a system-based clustering approach; the study was implemented under real-world conditions with customers of a company in the postal market. In [5], where B2C e-commerce customers were segmented based on their shopping behavior, the problem was treated as both a clustering and a classification task. In [6], machine learning (ML) methods were used to examine the preferences and behaviors of e-commerce customers. The researchers divided customers into three main groups (young, unemployed, and female e-customers; retirees and the elderly; and employed, highly educated, and middle-aged men) and then performed classification using the resulting labeled data. Other studies use ML methods to separate customers into categories based on similar behaviors, with the goal of recommending the right product to the right customer for long-term profit [7,8].
Some studies have utilized traditional ML methods, such as Support Vector Machine (SVM) [5,6], Naive Bayes (NB) [6], Decision Tree (DT) [6,10], K-Nearest Neighbors (KNN) [6], and Artificial Neural Networks (ANN) [10]. In a churn prediction model for B2C e-commerce customers, SVM achieved higher accuracy than the other models [5]. SVM was also among the classifiers applied to identify e-customer profiles (clusters) and was one of the algorithms with the highest overall classification performance [6]. The Gaussian Naive Bayes (GNB) algorithm was also used to identify the same e-customer profiles, but it showed lower multi-class accuracy and Area Under the Curve (AUC) than ensemble methods such as Random Forest [6]. The Classification and Regression Tree (CART) technique was applied to determine the importance of 17 key factors affecting customer satisfaction after segmenting IoT customers [10]. KNN was also tested for e-customer profile classification but performed worse, owing to the variability in the categorical dataset [6]. Finally, the Self-Organizing Map (SOM), an ANN-based model, was applied to group IoT customers according to their device usage patterns [10].
Clustering approaches used in customer segmentation studies vary. Most studies used K-means [2,3,5,7,8], while others employed hierarchical clustering algorithms [2,6,8] or DBSCAN [3]. K-means-based approaches generally aim to segment customer behavior data [5,7] or retail customer value [8]. For example, in [5], K-means was shown to significantly improve the performance of a churn prediction model by categorizing B2C e-commerce customers into three groups according to their shopping behavior. Ref. [7] presented an improved K-means approach, SAPK + K-means, to analyze Malaysian e-commerce customer purchasing behavior data and identify the most profitable customer segments. In [3], K-means was employed in the preprocessing stage of the hybrid KM-DBSCAN algorithm, which segments bank customers into four distinct groups and can handle noisy data. In [2], customer segments were identified using K-means after dimensionality reduction with Factor Analysis of Mixed Data (FAMD) on datasets containing both numerical and categorical variables. In [8], improved algorithms were proposed to address shortcomings of K-means, namely the choice of the initial value of k and the tendency to converge to a local optimum, for retail customer classification and the quantification of customer value systems. In [6], the Hierarchical Clustering on Principal Components (HCPC) method was used to model and label e-customer profiles based on demographic factors. In [2], Agglomerative Clustering was applied alongside K-means to validate segmentation results on mixed datasets. Similarly, a hierarchical clustering algorithm was used in [8] to achieve higher efficiency.
Studies in the field of customer segmentation and analysis have pursued various objectives that respond to different industry needs and data types. Some address customer churn prediction [5,9,11] and customer lifetime value [4,7,8], while others address customer behavior [2,6,10]. In [5,9], the researchers focused on predicting customer churn in e-commerce retail. In [11], the aim was to predict customer retention behavior and loyalty. In [9], the critical impact of behavioral factors such as shipping costs, product categories, and the customer's initial purchase value on churn was investigated. The study [8] segmented retail customers into value-based categories: customer value was quantified, and a structure supporting CRM decision-making was then created by applying improved K-means variants. Studies adopting a sustainable development perspective have focused on identifying strategic segments with high potential, low cost, and high relationship value [4,7]. Studies aimed at understanding customer behavior have addressed demographic influences, online preferences, and interactions within the IoT ecosystem [2,6,10]. A study of Serbian e-commerce users modeled the relationships between demographic variables and online behavior, showing that the resulting three basic profiles could later serve as labels for classification models [6]. Studies of the behavioral patterns of IoT users have analyzed the importance of factors affecting customer satisfaction [10]. Other studies perform segmentation on mixed datasets containing both categorical and numerical variables; in this context, distinct groups such as seasonal buyers and high-spending customers have been identified [2].
Performance evaluation metrics differ depending on whether the task is classification or clustering. Studies that address classification primarily use accuracy [3,5,6,11,12,13], precision [3,5,6,11,12,13], and recall [3,5,6,11,12,13], which are generally calculated from a confusion matrix [5,12,13]. The F1-score [3,6,11,12,13], the harmonic mean of precision and recall, and the area under the ROC curve (AUROC) [5,6,9,12,13], which summarizes performance across all classification thresholds, have also been widely applied. In clustering problems [2,3], metrics such as the Silhouette coefficient, Calinski–Harabasz index (CHI), and Davies–Bouldin index (DBI) were used for quality assessment.
Customer segmentation research across different geographies has yielded important findings about how these models interact with regional market structures, cultural consumption habits, and industry conditions. Studies in the European context [3,4,6] have focused primarily on regulated service sectors and e-commerce: in Slovakia, system-based segmentation models have been applied to corporate customers [4]; in Serbia, the impact of demographic variables on online behavior has been examined [6]; and in Portugal, banking customers have been classified using advanced clustering techniques [3]. Studies in Asia [7,8,12] have focused on topics such as customer value management and purchasing behavior: in China, retail customer value has been segmented using analytical methods [8], and in Malaysia, “smart segments” have been created by modeling e-commerce behavior with K-means [7]. In South America, a comprehensive segmentation and churn forecasting study of the Brazilian e-commerce market stands out for combining transaction records with socio-demographic indicators [9]. Additionally, retail datasets from the United Kingdom [12] and the United States [2] were used as international references when regional data were limited. This diversity suggests that customer segmentation models are sensitive to regional context rather than universal, and it underscores the need to carefully evaluate how well the methods transfer to different markets [2,4,6].
Taken together, customer segmentation has been studied in regions including China [8], Brazil [9], Malaysia [7], Serbia [6], Iran [10], Slovakia [4], Portugal [3], the United States [2], and the United Kingdom [12]. In Europe, segmentation models were developed for postal and banking services [3,4]; in Asia, retail customer value and e-commerce behavior were emphasized [7,8]; in South America, churn prediction was addressed using multi-source data [9]; and datasets from the US and UK served as common reference points for method comparisons [2,12]. These studies indicate that customer behavior varies across geographic and cultural contexts and that models need to be adapted to different market conditions [4].
Data used in customer segmentation studies generally fall into categories such as sales transactions, behavioral logs, demographic attributes, and geo-socio-demographic indicators. Many studies have modeled customer patterns using transactional and behavioral data such as purchase history, product views, and basket and favorite interactions [2,5,7,12]. Demographic data such as age, gender, education, income, and household characteristics have also been widely used in both e-commerce user profiling [6] and IoT customer behavior analysis [8,10]. Geo-socio-demographic data, including indicators such as regional population structure, income level, and urban/rural location, have played a significant role, especially in churn forecasting studies [9]. In some sectors, structured data such as bank customer records [3] or ESG-based sustainability criteria [4] have been integrated into the model to create objective-focused segmentation.
In addition to the studies discussed above, several studies combine RFM analysis with machine learning techniques for customer segmentation [14,15,16,17,18]. In [16], researchers explored hybrid analytical processes that integrate RFM features with supervised or quasi-supervised learning components. Studies [14,17] incorporated RFM variables into clustering-based frameworks to support managerial decision-making. Similarly, in [15,18], RFM-focused segmentation strategies were used primarily within center-based clustering paradigms to divide customers into predefined behavioral tiers (e.g., high-value, medium-value, and low-value segments). These methodologies emphasize segment-based optimization and statistical grouping and generally rely on distance measures and an intuitive choice of the number of clusters. In contrast to these predominantly clustering-based approaches, we adopt a deep-learning-based solution.
In light of the comparison above, conventional statistical and clustering approaches may offer a reasonable solution for managing customer segments. However, when customer behavior exhibits nonlinear or intricate patterns, deep learning models might offer additional advantages. To explore this possibility, this study proposes RFM-Net, an approach that combines the predictive capabilities of deep learning with the strategic insights of RFM analysis to augment existing methodologies.
3. Materials and Methods
3.1. Proposed Method (RFM-Net)
This study introduces RFM-Net, a customer segment classification approach that fuses the valuable insights of RFM analysis with the powerful learning abilities of deep learning. RFM-Net incorporates a specialized convolutional neural network architecture that processes the historical purchase data of the customers, transforming raw RFM inputs into meaningful customer segments. This model identifies intricate behavioral patterns, allowing businesses to categorize customers into strategically relevant groups such as Loyal Customers, About to Sleep, Need Attention, and At Risk.
Figure 1 presents the general framework of the proposed approach. The methodology follows a structured pipeline that begins with raw data acquisition and proceeds through preprocessing and feature engineering to prepare the dataset for analysis. After the labeled data are organized, the model is systematically trained and evaluated. Finally, the trained model generates predictions, which are transformed into actionable decision-making support.
Algorithm 1 outlines the RFM-Net methodology in a formal, step-by-step manner.
Step 1—Data Acquisition: Historical raw customer data is acquired, focusing on marketing transactions such as purchases and returns. This data can be stored on a cloud-based platform to ensure scalability, availability, and efficient storage. Formally, let $D = \{t_1, t_2, \ldots, t_N\}$ denote the dataset consisting of $N$ transactions, where each transaction $t_i$ contains fields such as customer ID, invoice number, transaction date, quantity, and unit price. For each customer $c$, the algorithm gathers all of their transactions, as given in Equation (1):
$$T_c = \{\, t_i \in D \mid \text{CustomerID}(t_i) = c \,\} \tag{1}$$
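As a minimal sketch of Equation (1), assuming the transactions have already been parsed into records, the grouping step can be written as follows. The `Transaction` field names and the `group_by_customer` helper are hypothetical illustrations, not the original implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """One row of the raw transaction log (hypothetical field names)."""
    customer_id: str
    invoice_no: str
    invoice_date: date
    quantity: int
    unit_price: float


def group_by_customer(transactions):
    """Return {c: T_c}, the transactions belonging to each customer c (Equation (1))."""
    groups = defaultdict(list)
    for t in transactions:
        groups[t.customer_id].append(t)
    return dict(groups)


log = [
    Transaction("C1", "INV1", date(2011, 12, 1), 2, 3.5),
    Transaction("C2", "INV2", date(2011, 12, 3), 1, 10.0),
    Transaction("C1", "INV3", date(2011, 12, 5), 4, 1.25),
]
print({c: len(ts) for c, ts in group_by_customer(log).items()})  # {'C1': 2, 'C2': 1}
```

A dictionary keyed by customer ID keeps the lookup of $T_c$ constant-time, which matters when the same per-customer sets are reused for all three RFM metrics.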
Step 2—Data Preprocessing: The raw data undergoes several preprocessing steps to ensure its quality, integrity, and usability.
- Feature Selection: In this step, only the fields relevant to RFM analysis and customer behavior modeling are retained, such as transaction date and amount. Several non-essential columns, such as country, product number, and name, are excluded from the dataset to reduce dimensionality and computational complexity.
- Data Cleaning: Several preprocessing operations are applied to ensure the integrity and consistency of the dataset, including the removal of return transactions, the handling of missing values, and the exclusion of irrelevant entries.
- Data Transformation: The raw transactional records are aggregated for each unique customer to calculate Recency, Frequency, and Monetary (RFM) metrics, thereby quantifying their purchasing behavior.
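The feature selection and cleaning rules above can be sketched as follows. The column names (`CustomerID`, `InvoiceNo`, and so on), the convention that a return is a cancelled invoice prefixed with "C" or a non-positive quantity, and the treatment of zero-price rows as irrelevant entries are all assumptions modeled on common online-retail transaction logs; the exact rules used for RFM-Net may differ.

```python
# Hypothetical column names; only the fields needed for RFM analysis are kept.
KEEP = ("CustomerID", "InvoiceNo", "InvoiceDate", "Quantity", "UnitPrice")


def clean_rows(rows):
    """Apply feature selection and cleaning to raw rows given as dicts."""
    cleaned = []
    for row in rows:
        # Feature selection: drop columns such as Country or product codes.
        record = {key: row.get(key) for key in KEEP}
        # Missing values: drop rows with an empty required field.
        if any(value in (None, "") for value in record.values()):
            continue
        # Returns: assumed to be cancelled invoices ("C" prefix) or non-positive quantities.
        if str(record["InvoiceNo"]).startswith("C") or record["Quantity"] <= 0:
            continue
        # Irrelevant entries: assumed here to be rows without a positive price.
        if record["UnitPrice"] <= 0:
            continue
        cleaned.append(record)
    return cleaned


raw = [
    {"CustomerID": "C1", "InvoiceNo": "10001", "InvoiceDate": "2010-12-01",
     "Quantity": 6, "UnitPrice": 2.55, "Country": "United Kingdom"},
    {"CustomerID": "C1", "InvoiceNo": "C10002", "InvoiceDate": "2010-12-01",
     "Quantity": -1, "UnitPrice": 27.5, "Country": "United Kingdom"},
    {"CustomerID": None, "InvoiceNo": "10003", "InvoiceDate": "2010-12-01",
     "Quantity": 56, "UnitPrice": 0.0, "Country": "United Kingdom"},
]
print(len(clean_rows(raw)))  # 1
```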
Recency ($r_c$) is calculated as the number of days between a customer’s most recent purchase date ($d_c$) and the latest date in the dataset ($d_{\text{ref}}$), as given in Equation (2):
$$r_c = d_{\text{ref}} - d_c \quad \text{(in days)} \tag{2}$$
This metric helps distinguish between active and dormant customers.
Frequency ($f_c$) reflects how often a customer has transacted. It is calculated as the total number of distinct purchase events (invoices) in $T_c$, the set of transactions associated with customer $c$, as given in Equation (3):
$$f_c = \left|\{\, \text{InvoiceNo}(t) \mid t \in T_c \,\}\right| \tag{3}$$
Higher frequency typically indicates stronger engagement with the marketing platform.
Monetary ($m_c$) represents the total monetary value of all purchases made by customer $c$ over a particular time period. It is derived by summing the product of the quantity and unit price of each transaction associated with the customer, as given in Equation (4):
$$m_c = \sum_{t \in T_c} \text{Quantity}(t) \times \text{UnitPrice}(t) \tag{4}$$
Monetary value quantifies the cumulative financial contribution of a customer to the business. It is particularly useful for identifying high-value customers who generate significant revenue and for differentiating between low-spending and high-spending customers, thereby supporting targeted marketing strategies, resource allocation, and personalized service offerings.
At the end of this step, each customer $c$ is represented by a three-dimensional feature vector $\mathbf{x}_c = (r_c, f_c, m_c)$ of recency, frequency, and monetary values. These three features form the core input representation for training the deep learning model.
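Equations (2) to (4) can be computed in a single pass over the cleaned transactions. The sketch below assumes each transaction is a `(customer_id, invoice_no, invoice_date, quantity, unit_price)` tuple, a hypothetical layout chosen for illustration; recency is measured in whole days relative to the latest date in the data.

```python
from collections import defaultdict
from datetime import date


def rfm_metrics(transactions):
    """Compute (recency, frequency, monetary) per customer.

    `transactions` holds (customer_id, invoice_no, invoice_date, quantity,
    unit_price) tuples; this layout is an illustrative assumption.
    """
    transactions = list(transactions)
    d_ref = max(t[2] for t in transactions)  # latest date in the dataset, Eq. (2)
    last_date, invoices, spend = {}, defaultdict(set), defaultdict(float)
    for cust, inv, day, qty, price in transactions:
        last_date[cust] = max(day, last_date.get(cust, day))
        invoices[cust].add(inv)            # distinct invoices, Eq. (3)
        spend[cust] += qty * price         # quantity x unit price, Eq. (4)
    return {
        c: ((d_ref - last_date[c]).days, len(invoices[c]), spend[c])
        for c in last_date
    }


log = [
    ("C1", "INV1", date(2011, 12, 1), 2, 3.5),
    ("C1", "INV3", date(2011, 12, 5), 4, 1.25),
    ("C2", "INV2", date(2011, 12, 9), 1, 10.0),
]
print(rfm_metrics(log))  # {'C1': (4, 2, 12.0), 'C2': (0, 1, 10.0)}
```

Counting a set of invoice numbers, rather than rows, keeps a multi-line invoice from inflating frequency, which is what Equation (3) requires.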
Step 3—RFM Feature Engineering: Each customer is assigned three numerical scores (R, F, and M), which collectively capture their engagement level and transactional behavior. The continuous RFM metrics are discretized into scores ranging from 1 to 5 according to user-defined threshold values, which places the three metrics on a common scale and helps the model capture customer behavior patterns. This process sorts the values, determines thresholds, and divides each metric into five segments. Let $\tau^R_1 < \tau^R_2 < \tau^R_3 < \tau^R_4$ be the thresholds for recency, $\tau^F_1 < \dots < \tau^F_4$ those for frequency, and $\tau^M_1 < \dots < \tau^M_4$ those for monetary value; each customer’s RFM values are then mapped to discrete scores between 1 and 5 using these thresholds. For example, a customer who spends a large amount overall would receive a top monetary score (e.g., 5), reflecting strong financial value, whereas one with minimal total spending would be assigned a lower score (e.g., 1), indicating limited contribution. The recency score is assigned inversely, meaning that a lower recency value (i.e., a more recent purchase) results in a higher score, while the frequency and monetary scores are assigned directly, with higher values yielding higher scores, as given in Equation (5):
$$R_c = 5 - \sum_{k=1}^{4} \mathbb{1}[\, r_c > \tau^R_k \,], \qquad F_c = 1 + \sum_{k=1}^{4} \mathbb{1}[\, f_c > \tau^F_k \,], \qquad M_c = 1 + \sum_{k=1}^{4} \mathbb{1}[\, m_c > \tau^M_k \,] \tag{5}$$
where $\mathbb{1}[\cdot]$ equals 1 if its condition holds and 0 otherwise.
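One possible way to obtain the thresholds is to use the quintile cut points of each metric, which matches the "sort, determine thresholds, divide into five segments" description; the text also allows user-defined thresholds, so quintiles are an assumption here. The sketch uses Python's `statistics.quantiles` (default "exclusive" method) and a hypothetical `rfm_score` helper implementing Equation (5).

```python
from statistics import quantiles


def quintile_thresholds(values):
    """Four cut points splitting `values` into five equally populated groups.

    Quintiles are only one possible choice; the text allows user-defined thresholds.
    """
    return quantiles(values, n=5)


def rfm_score(value, thresholds, inverse=False):
    """Map a raw metric to a 1-5 score as in Equation (5).

    `thresholds` must hold four increasing cut points. Recency uses
    inverse=True, so smaller values (more recent purchases) score higher.
    """
    exceeded = sum(value > tau for tau in thresholds)  # number of cut points passed
    return 5 - exceeded if inverse else 1 + exceeded


recency_days = [3, 40, 120, 7, 300]
tau_r = quintile_thresholds(recency_days)
print([rfm_score(r, tau_r, inverse=True) for r in recency_days])  # [5, 3, 2, 4, 1]
```

Frequency and monetary values are scored with the same function and `inverse=False`; only the direction of the mapping differs between the three metrics.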
At the end of this step, each customer’s RFM score triplet (R, F, M) is saved for assigning segment labels. These discrete scores normalize customer activity, making it easier to compare and group customers with similar behavioral patterns.
Step 4—Data Annotation: Each customer’s RFM score triplet (R, F, M) is subsequently mapped to a predefined customer segment using rule-based logic. Segment definitions follow an established marketing taxonomy, including labels such as “Champions”, “Loyal Customers”, “Potential Loyalists”, “At Risk”, and others, depending on combinations of high or low RFM scores. For example, if the Recency (R), Frequency (F), and Monetary (M) scores are all greater than 4, the customer is classified into the “Champions” segment, which comprises the most active and profitable customers. Similarly, customers with R < 2, F < 2, and M < 2 are classified into the “Hibernating” segment, indicating low-value customers who are likely to be lost.
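The labeling step can be sketched as a function of the three scores. Only the two rules stated in the text (Champions and Hibernating) are encoded; the criteria for the other five segments are defined in Table 2 and are deliberately left as a placeholder rather than guessed. The helper names are hypothetical, and skipping unlabeled customers is a simplification of this sketch only.

```python
def label_segment(r_score, f_score, m_score):
    """Rule-based segment label from integer RFM scores in 1..5.

    Only the two rules stated in the text are encoded; the criteria for the
    other five segments are given in Table 2 and are left as a placeholder.
    """
    if r_score > 4 and f_score > 4 and m_score > 4:
        return "Champions"
    if r_score < 2 and f_score < 2 and m_score < 2:
        return "Hibernating"
    return None  # placeholder: apply the Table 2 rules for the remaining segments


def build_supervised_dataset(raw_metrics, scores):
    """Pair raw (r, f, m) features with rule-based labels, skipping unlabeled customers."""
    dataset = []
    for customer, features in raw_metrics.items():
        label = label_segment(*scores[customer])
        if label is not None:
            dataset.append((features, label))
    return dataset


raw_metrics = {"C1": (2, 25, 5400.0), "C2": (310, 1, 12.5), "C3": (60, 4, 300.0)}
scores = {"C1": (5, 5, 5), "C2": (1, 1, 1), "C3": (3, 3, 3)}
print(build_supervised_dataset(raw_metrics, scores))
# [((2, 25, 5400.0), 'Champions'), ((310, 1, 12.5), 'Hibernating')]
```

Note that the features paired with each label are the continuous RFM values, while the label itself comes from the discretized scores, which is how the supervised dataset described next is formed.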
Table 2 presents the RFM segment criteria, characteristics, and corresponding strategy suggestions. Following a scheme similar to that of [19], we identified seven distinct customer groups based on their transaction history, frequency, and spending habits. This rule-based labeling offers an interpretable and actionable approach to categorizing customers based on their behavior. It creates a supervised dataset in which continuous RFM values serve as features and customer segments serve as class labels.
As given in Table 2, differentiated marketing strategies can be implemented for each customer segment identified by RFM-Net. For “Champions”, the highest-value customers, strategies such as VIP programs, personalized loyalty rewards, and early access to new products can reinforce their satisfaction. “Loyal Customers” represent a stable base and can be motivated with membership programs and periodic appreciation messages to maintain their engagement. “Potential Loyalists” (recent but not yet frequent buyers) could be cultivated with welcome campaigns, product education content, customized communications, and behavior-based product recommendations. For customers in the “Need Attention” category, surveys can be used to understand their needs. “About to Sleep” customers might be reactivated with re-engagement emails, tailored discounts, or product bundles based on past behavior. The “At Risk” group requires stronger interventions, such as targeted win-back campaigns, deeper discounts, or urgent limited-time offers. Lastly, “Hibernating” customers, with the lowest engagement and value, may be addressed through lower-cost marketing channels, generic bulk offers, or reminders to prevent churn. Designing communication and promotions around the purchasing behavior of each group supports a more efficient allocation of marketing resources, strengthens customer relationship management, and helps maximize customer lifetime value.
Figure 2 illustrates the customer segmentation grid derived from RFM analysis, where the x-axis represents Recency scores (how recently a customer made a purchase) and the y-axis combines Frequency and Monetary scores (how often and how much a customer spends). Each dimension is scored on a scale from 1 to 5, with 5 representing the most desirable behavior (most recent, most frequent, or highest spending) and 1 the least desirable. Based on their scores, all customers are organized into seven predefined groups on the grid. Customers positioned in the upper-right region, such as Champions and Loyal Customers, are the most valuable (high recency scores and high frequency/monetary scores). Customer engagement decreases moving leftward and downward in the grid. Segments in the lower-left region (e.g., Hibernating) represent customers with minimal interaction and low spending, which may indicate lost customers. The grid provides a quick strategic overview, enabling companies to interpret customer value and engagement levels at a glance.
Step 5—Data Splitting: The annotated dataset is partitioned into three disjoint parts: a training set used to fit the CNN model, a validation set used to monitor training (e.g., for early stopping to prevent overfitting), and a test set for evaluating the generalization performance of the model on unseen data.
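Because segment sizes in rule-labeled RFM data are typically unbalanced, a stratified split keeps each segment's proportion similar across the three parts. The sketch below is a standard-library version; the 70/15/15 ratios, the seed, and the `stratified_split` name are illustrative assumptions, since the split ratios used for RFM-Net are not stated in this section. scikit-learn's `train_test_split` with its `stratify` argument is a common alternative.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Split into train/validation/test while preserving class proportions.

    The 70/15/15 ratios and the seed are illustrative assumptions.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append((x, y))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


samples = list(range(100))
labels = ["Champions"] * 60 + ["Hibernating"] * 40
train, val, test = stratified_split(samples, labels)
print(len(train), len(val), len(test))
```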
Step 6—Model Training: A CNN is then trained to learn the complex relationships between RFM patterns and customer segments. The architecture includes convolutional layers to capture local patterns, pooling layers to reduce dimensionality, and dense layers to carry out the classification. The input to the model consists of the numerical recency, frequency, and monetary values of each customer, and the output is a probability distribution over the customer segments, from which the most likely class for each new customer is predicted. Training can be repeated with different hyperparameter configurations to improve performance and generalization.
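The validation set is typically used through a patience-based early-stopping rule: training halts once the validation loss has not improved for a fixed number of epochs, and the best epoch's weights are kept. The sketch below shows that rule in isolation; the `patience` and `min_delta` values are assumptions, and in a Keras training loop the same behavior is provided by the `EarlyStopping` callback (`monitor`, `patience`, `restore_best_weights`).

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Patience-based early stopping over per-epoch validation losses.

    Returns (best_epoch, stop_epoch): the epoch whose weights would be kept
    and the epoch at which training halts.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset patience
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1


print(early_stopping([1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.69]))  # (2, 5)
```

In the example, training stops at epoch 5 and never sees the later improvement at epoch 6, which is the usual trade-off when choosing the patience value during hyperparameter tuning.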
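Step 7—Model Evaluation: Once trained, the CNN is evaluated on the test set by predicting segment labels for previously unseen customers and comparing them with the ground truth using standard performance indicators: accuracy, precision, recall, F1-score, and the confusion matrix. These indicators assess the ability of the model to distinguish between customer segments. The algorithm outputs $\hat{Y}$, the predicted customer segment labels for the test instances based on their RFM values, as given in Equation (6):
$$\hat{Y} = \{\, \hat{y}_c \mid c \in \text{TestSet} \,\}, \qquad \hat{y}_c = \arg\max_{k} \; P(k \mid \mathbf{x}_c) \tag{6}$$
where $\mathbf{x}_c$ is the vector of recency, frequency, and monetary values of customer $c$ and $P(k \mid \mathbf{x}_c)$ is the softmax probability the CNN assigns to segment $k$.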
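The indicators listed above can all be derived from the confusion matrix. The following standard-library sketch computes accuracy, the confusion matrix, and per-class precision, recall, and F1-score; the `evaluate` name and its return layout are illustrative assumptions rather than the evaluation code used for RFM-Net.

```python
from collections import Counter


def evaluate(y_true, y_pred, classes):
    """Accuracy, confusion matrix (rows = true, cols = predicted), per-class P/R/F1."""
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    accuracy = sum(confusion[(c, c)] for c in classes) / len(y_true)
    report = {}
    for c in classes:
        tp = confusion[(c, c)]
        predicted = sum(confusion[(t, c)] for t in classes)  # column sum
        actual = sum(confusion[(c, p)] for p in classes)     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    matrix = [[confusion[(t, p)] for p in classes] for t in classes]
    return accuracy, matrix, report


y_true = ["Champions", "Champions", "At Risk", "At Risk"]
y_pred = ["Champions", "At Risk", "At Risk", "At Risk"]
acc, cm, rep = evaluate(y_true, y_pred, ["Champions", "At Risk"])
print(acc)  # 0.75
print(cm)   # [[1, 1], [0, 2]]
```

Per-class values matter here because a high overall accuracy can hide poor recall on small segments such as Champions.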
Step 8—Prediction: The trained CNN model is used to classify unseen customers into one of the seven strategic segments, such as Champions, At Risk, or Hibernating. This predictive capability enables businesses to gain actionable insights for targeted marketing strategies.
Step 9—Decision-Making Support: The final predictions are presented to business decision-makers through an interpretable dashboard or reporting system. Segment-based visualizations and analytics enable marketing teams to make informed decisions regarding campaign design, customer retention, and resource allocation.
Overall, the proposed methodology provides a hybrid solution. By combining the interpretability of RFM analysis with the predictive power of deep learning, RFM-Net offers a scalable, data-driven approach to customer segmentation that is well-suited for real-world applications in marketing, customer relationship management, and personalized recommendation systems. It enables organizations to understand customer behavior in a structured way, while also leveraging machine learning to automate and scale the segmentation process for real-time applications.
Algorithm 1: RFM-Net: Recency-Frequency-Monetary-based Neural Network
Inputs: τ^R, τ^F, τ^M: threshold values for recency, frequency, and monetary, respectively
Outputs: Ŷ: predicted customer segment labels for the test samples
Begin:
  D ← {t_1, t_2, …, t_N}  // Step 1: Data acquisition
  Apply feature selection, data cleaning, and data transformation to D  // Step 2: Data preprocessing
  d_ref ← most recent date in D  // Reference date
  C ← set of unique customer numbers in D
  foreach c ∈ C do  // Calculate recency, frequency, monetary values
    T_c ← all transactions belonging to customer c
    r_c ← d_ref − date of the last purchase in T_c  // Recency: days since the customer’s last purchase
    f_c ← number of unique invoices in T_c  // Frequency
    m_c ← Σ_{t ∈ T_c} quantity(t) × unit price(t)  // Monetary: total spending
  end foreach
  foreach c ∈ C do  // Step 3: RFM feature engineering
    Assign scores R_c, F_c, M_c ∈ {1, …, 5} from r_c, f_c, m_c using τ^R, τ^F, τ^M (Equation (5))
  end foreach
  foreach c ∈ C do  // Step 4: Data annotation (rule-based labeling)
    if R_c > 4 and F_c > 4 and M_c > 4 then y_c ← “Champions”
    else if (R_c, F_c, M_c) meets the Loyal Customers criteria (Table 2) then y_c ← “Loyal Customers”
    else if (R_c, F_c, M_c) meets the Potential Loyalists criteria (Table 2) then y_c ← “Potential Loyalists”
    else if (R_c, F_c, M_c) meets the Need Attention criteria (Table 2) then y_c ← “Need Attention”
    else if (R_c, F_c, M_c) meets the About to Sleep criteria (Table 2) then y_c ← “About to Sleep”
    else if (R_c, F_c, M_c) meets the At Risk criteria (Table 2) then y_c ← “At Risk”
    else if R_c < 2 and F_c < 2 and M_c < 2 then y_c ← “Hibernating”
  end foreach
  TrainSet, ValidationSet, TestSet ← split(Data)  // Step 5: Data splitting
  Perform hyperparameter tuning and train the CNN (inputs: r, f, m values; output: segment label)  // Step 6: Train the CNN model
  foreach sample c in TestSet do  // Step 7: Test model
    ŷ_c ← arg max_k P(k | r_c, f_c, m_c)
  end foreach
End
3.2. The Proposed CNN Architecture
In this study, we propose a deep learning architecture called RFM-Net, designed to enhance customer segmentation by integrating the classical Recency, Frequency, and Monetary framework with a custom-built Convolutional Neural Network. The aim of RFM-Net is to help businesses understand customer behavior patterns effectively, thereby enabling them to develop precise, customer-centric marketing strategies. The architecture of RFM-Net is composed of several key layers (input layer, convolutional layer, max-pooling layer, flatten layer, dense layer, and output layer), each of which serves a distinct function to support data-driven customer classification. Each component of the RFM-Net model is described below, highlighting its specific contribution to the customer segmentation process.
Input Layer: This layer receives structured customer data, typically composed of RFM features. The data is reshaped to a format compatible with convolutional operations. Each input sample represents a single customer behavioral profile, forming the foundation for deeper pattern extraction.
Convolutional Layers: These layers apply multiple filters to the input features to detect local patterns. They enable the model to learn how certain RFM feature combinations (e.g., high frequency but low monetary value) correlate with specific customer segments. The rectified linear unit (ReLU) activation function in these layers introduces non-linearity, which helps in modeling complex relationships. This is particularly useful for differentiating subtle variations in customer behavior, such as distinguishing “Potential Loyalists” from “Need Attention” customers. A narrow kernel is used so that each filter extracts local features from neighboring inputs with few parameters, which limits the risk of overfitting.
Max-Pooling Layer: This layer reduces the spatial size of the feature maps while retaining the strongest activations, which improves generalization and reduces noise. It also minimizes the impact of small fluctuations in customer data.
Flatten Layer: The multi-dimensional outputs from the convolutional layers are converted into a one-dimensional vector that is suitable for classification. This transformation bridges the convolutional layers and the dense classifier while preserving the learned representations of customer behavior.
Dense Layer (Fully Connected): This layer processes the flattened vector to learn higher-level relationships between RFM patterns. It enables the network to form comprehensive views of customer profiles, such as whether customers belong to the “Champion” group, characterized by consistent spending, or the “Hibernating” group, characterized by minimal activity.
Output Layer: This layer maps the learned features to meaningful customer classes, such as Champions, Loyal Customers, Need Attention, and Hibernating. It uses a softmax activation to assign probabilities across the predefined customer segments. The segment with the top probability value is chosen as the final classification. This output empowers organizations to design their marketing strategies for each customer group separately with greater precision and personalization.
Through this end-to-end learning process, RFM-Net can accurately categorize each customer into a relevant segment, enabling businesses to develop targeted, customer-centric marketing strategies.
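The output-layer decision rule (softmax probabilities, then the most probable segment) can be illustrated with a minimal plain-Python sketch. The logits are hypothetical, and the order of the SEGMENTS list is an assumption, since the text does not state how output indices map to segment labels.

```python
import math

# Assumed label order for illustration; the index-to-segment mapping of the
# output layer is not specified in the text.
SEGMENTS = [
    "Champions", "Loyal Customers", "Potential Loyalists", "Need Attention",
    "About to Sleep", "At Risk", "Hibernating",
]


def softmax(logits):
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_segment(logits):
    """Return the segment with the highest probability, and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SEGMENTS[best], probs[best]


# Hypothetical output-layer logits for one customer.
label, p = predict_segment([0.2, 1.1, 3.0, -0.5, 0.0, -1.2, 0.4])
print(label, round(p, 3))
```

Because softmax is monotonic, the most probable segment is also the one with the largest raw logit; the probabilities remain useful when a confidence value is wanted for each assignment.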
Table 3 presents the architecture of the RFM-Net model, including the types of layers used, their respective output shapes, and the number of trainable parameters at each stage of the network. The architecture begins with an input layer, followed by convolutional layers designed to extract low-level feature patterns from RFM metrics. A pooling layer is then applied to reduce spatial complexity. The process continues with a flattening operation, followed by a dense layer that interprets the extracted features and an output layer that maps them to customer segment predictions.
In the proposed CNN model, each customer is represented by three RFM features (Recency, Frequency, Monetary), which are reshaped into a single-channel 2D tensor of shape (3, 1, 1), i.e., height 3, width 1, and one channel, to comply with the Conv2D input format. The first convolutional layer applies 32 filters with a kernel size of (2, 1) and ReLU activation, followed by MaxPooling2D with pool size (2, 1). Additional convolutional layers use 64 or more filters with kernel size (1, 1), enabling nonlinear feature transformation across channels. After the convolution and pooling operations, the feature maps are flattened and passed to a dense layer with 64 neurons and a final softmax output layer with one unit per customer segment.
Despite its relatively shallow structure, RFM-Net is efficient and effective owing to its task-specific design. The total number of trainable parameters remains small, ensuring computational efficiency while preserving sufficient model expressiveness. This lightweight structure makes RFM-Net well suited to real-world applications where computational resources may be constrained.
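Under a few assumptions, the layer shapes and parameter counts implied by this description can be worked out by hand. The sketch below assumes 'valid' padding and stride 1 for the convolutions and pooling strides equal to the pool size (the defaults of Keras Conv2D and MaxPooling2D), exactly one additional (1, 1) convolution with 64 filters, and seven output segments. The resulting counts are illustrative and are not claimed to match Table 3.

```python
# Shape and parameter arithmetic for the layer stack described above.
# Assumptions: 'valid' padding, stride 1 in convolutions, pooling stride equal
# to the pool size, one extra (1, 1) convolution with 64 filters, 7 classes.


def conv2d(shape, filters, kernel):
    """Return (output_shape, trainable_params) for a 'valid' Conv2D layer."""
    h, w, c = shape
    kh, kw = kernel
    out = (h - kh + 1, w - kw + 1, filters)
    return out, kh * kw * c * filters + filters  # weights + one bias per filter


def maxpool2d(shape, pool):
    """Return (output_shape, 0) for 'valid' max pooling with stride = pool size."""
    h, w, c = shape
    ph, pw = pool
    return ((h - ph) // ph + 1, (w - pw) // pw + 1, c), 0


def dense(units_in, units_out):
    """Return (units_out, trainable_params) for a fully connected layer."""
    return units_out, units_in * units_out + units_out


def summarize(num_classes=7):
    rows = []
    shape = (3, 1, 1)  # (height, width, channels): R, F, M stacked vertically
    shape, p = conv2d(shape, 32, (2, 1))
    rows.append(("Conv2D (32, 2x1)", shape, p))
    shape, p = maxpool2d(shape, (2, 1))
    rows.append(("MaxPooling2D (2x1)", shape, p))
    shape, p = conv2d(shape, 64, (1, 1))
    rows.append(("Conv2D (64, 1x1)", shape, p))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", (flat,), 0))
    units, p = dense(flat, 64)
    rows.append(("Dense (64)", (units,), p))
    units, p = dense(units, num_classes)
    rows.append(("Output (softmax)", (units,), p))
    return rows


rows = summarize()
for name, shape, params in rows:
    print(f"{name:<20}{str(shape):<14}{params}")
print("Total trainable parameters:", sum(r[2] for r in rows))
```

Under these assumptions, the first convolution maps (3, 1, 1) to (2, 1, 32), pooling reduces this to (1, 1, 32), and the network has 6,823 trainable parameters, most of them in the 64-unit dense layer.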
3.3. Comparative Analysis of RFM-Net with Existing CNN Architectures
Although conventional CNN architectures, such as ResNet, AlexNet, VGG, and GoogleNet, have demonstrated superior performance in computer vision tasks, they are not typically designed and optimized for low-dimensional tabular data such as RFM inputs. These models are often overparameterized for tasks like customer segment classification, leading to overfitting, longer training times, and the need for extensive computational resources. The key advantages of RFM-Net over these models can be summarized as follows:
Lightweight and Fast: The architecture of RFM-Net is relatively shallow, consisting of only a few layers and a minimal number of parameters. This lightweight design ensures high computational efficiency, enabling the model to be particularly well-suited for real-time inference and deployment in resource-constrained environments.
Tailored for Tabular Data: Unlike image-centric CNNs, RFM-Net is specifically designed to work with structured data, preserving the semantic relationships between RFM features.
Overfitting Prevention: Deeper models, such as VGG-16 or ResNet-50, may be prone to overfitting when applied to low-dimensional data. RFM-Net mitigates this risk through its architectural simplicity and pooling mechanism.
Interpretability: The compact architecture of RFM-Net is easier to interpret than deeper, black-box models. This simplifies interpretation and debugging, which is crucial in business applications of customer behavior analysis, where understanding model decisions matters.
End-to-End Learning: By combining feature extraction and classification within a unified framework, RFM-Net eliminates the need for separate clustering algorithms. This end-to-end framework simplifies the segmentation pipeline and improves scalability.
4. Experimental Studies
4.1. Dataset Description
In this study, we utilized the publicly available “Online Retail” dataset [20], which was obtained from the UCI Machine Learning Repository. It is a rich multivariate time-series dataset comprising 541,909 records and 8 variables, collected from a UK-based non-store online retail company. The company primarily sells unique all-occasion gifts, including items such as ceramic homeware, scented candles, novelty mugs, children’s crafts, seasonal decorations, and stationery. These products are typically low-priced, decorative and often purchased in bulk by wholesalers for resale in gift shops or boutique stores. The transactions in the dataset span approximately one year, from 1 December 2010 to 9 December 2011. The dataset captures customer purchase behavior across 37 different countries, including the United Kingdom, Japan, the United States, Australia, the Netherlands, France, Italy, Spain, Germany, Canada, and several others. Due to its sequential nature and temporal granularity, the dataset is well-suited for customer segmentation, market basket analysis, anomaly detection, demand forecasting, and customer lifetime value estimation.
Table 4 presents a structural overview of the dataset, including variable names, data types, brief descriptions, and indications of missing values. Each record corresponds to a line item in an invoice, meaning that a single invoice may have multiple rows corresponding to multiple products purchased in that transaction. Notably, the variables Description, Country, and CustomerID contain missing data, which must be addressed during preprocessing for any segmentation and modeling tasks.
A sample of the dataset is shown in Table 5. Each row in the table represents an individual product item within a customer invoice. As seen in the table, customers often purchase multiple items in a single order. For instance, invoice 536608 includes three different items purchased by customer 12855 on 2 December 2010. Repeated purchases of the same product (e.g., stock code = 16014) across different invoices and dates indicate recurring demand for specific products in varying quantities. The sample also reflects temporal diversity in transactions, offering insight into customer activity at various points throughout the year. Overall, the inclusion of both customer- and product-level details enables a wide range of analyses on customer behavior, sales trends, and product performance.
Prior to analysis, a series of data preprocessing steps was performed to ensure analytical robustness and relevance. First, only the variables essential for RFM analysis were retained. Supplementary fields such as product codes, product descriptions, and country were omitted, as they are not directly relevant to the RFM framework and would add unnecessary complexity. Next, records lacking a customer identifier were excluded, as they could not be linked to any customer. Another critical step involved removing return transactions that lacked a corresponding sales record; that is, transactions whose invoice numbers were prefixed with the letter ‘C’ were eliminated if they did not correspond to an original purchase. Additionally, contextually irrelevant entries such as bank charges, postage, and gift cards were excluded to more accurately reflect actual customer purchase behavior.
Following data cleaning, customer-level metrics were computed: recency was calculated as the number of days between the customer’s most recent purchase and the reference date (the most recent date in the dataset); frequency was the number of distinct invoices; and monetary value was the total spending across all of the customer’s invoices. Each customer was then assigned discrete R, F, and M scores (1 to 5) based on user-defined thresholds, with more recent customers receiving higher recency scores. Specifically, recency was segmented using the cut points [10, 30, 50, 150], while monetary values were divided using the cut points [500, 1500, 2500, 5000]. Given the high density of one-time buyers, frequency values were scored using a specialized binning strategy with cut points [1, 2, 4, 6]. Under this scheme, customers with a frequency value of 1 were assigned a score of 1, those with a value of 2 a score of 2, those with values of 3 or 4 a score of 3, those with values of 5 or 6 a score of 4, and those with a value higher than 6 a score of 5. These thresholds were determined through exploratory analysis and iterative empirical experimentation, considering the data distributions, domain knowledge, and the resulting classification accuracy across multiple trials. Finally, a rule-based classification scheme was applied to the resulting RFM scores to annotate each customer with a corresponding behavioral segment label.
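The filtering steps described above can be sketched in plain Python over transaction records keyed by the dataset’s column names (InvoiceNo, StockCode, Quantity, CustomerID). The set of non-product stock codes, the ‘gift_’ prefix used to recognize gift cards, and the rule for matching a cancellation to an original purchase are assumptions made for illustration, not the authors’ exact criteria.

```python
from collections import defaultdict

# Assumed non-purchase stock codes (postage, bank charges); illustrative only.
NON_PRODUCT_CODES = {"POST", "DOT", "BANK CHARGES"}


def is_non_product(stock_code):
    """True for postage, bank-charge, and (assumed 'gift_'-prefixed) gift-card codes."""
    code = str(stock_code)
    return code in NON_PRODUCT_CODES or code.lower().startswith("gift_")


def is_cancellation(record):
    """Cancelled (returned) lines carry an invoice number starting with 'C'."""
    return str(record["InvoiceNo"]).startswith("C")


def clean(records):
    """Apply the preprocessing filters to a list of transaction dicts."""
    # 1) Drop rows that cannot be linked to a customer.
    rows = [r for r in records if r.get("CustomerID") is not None]
    # 2) Drop non-purchase entries such as postage and bank charges.
    rows = [r for r in rows if not is_non_product(r["StockCode"])]
    # 3) Total purchased quantity per (customer, item), used to match returns.
    purchased = defaultdict(int)
    for r in rows:
        if not is_cancellation(r):
            purchased[(r["CustomerID"], r["StockCode"])] += r["Quantity"]
    # Keep a cancellation only if the same customer bought at least that many
    # units of the same item (hypothetical matching rule).
    return [
        r for r in rows
        if not is_cancellation(r)
        or purchased[(r["CustomerID"], r["StockCode"])] >= abs(r["Quantity"])
    ]


# Toy records with hypothetical values.
sample = [
    {"InvoiceNo": "1001", "StockCode": "85123A", "Quantity": 6, "CustomerID": 1},
    {"InvoiceNo": "C1002", "StockCode": "85123A", "Quantity": -2, "CustomerID": 1},
    {"InvoiceNo": "C1003", "StockCode": "22633", "Quantity": -1, "CustomerID": 1},
    {"InvoiceNo": "1004", "StockCode": "POST", "Quantity": 1, "CustomerID": 2},
    {"InvoiceNo": "1005", "StockCode": "22632", "Quantity": 6, "CustomerID": None},
]
print([r["InvoiceNo"] for r in clean(sample)])
```

In the toy sample, the unmatched return, the postage line, and the record without a customer are removed, while the matched return is kept so it can offset the original purchase.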
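The RFM computation and threshold-based scoring can be sketched as follows. The cut points are those given in the text; treating each bin as right-inclusive (e.g., a monetary value of exactly 500 falls in the lowest bin) and taking the reference date as the last date in the data are assumptions. With these choices, the sketch reproduces the worked example for customer 13848 discussed in this section (recency 92, frequency 3, and monetary £1255 yield scores 2, 3, and 2).

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date

# Cut points from the text; bins are assumed right-inclusive.
RECENCY_BINS = [10, 30, 50, 150]
FREQUENCY_BINS = [1, 2, 4, 6]
MONETARY_BINS = [500, 1500, 2500, 5000]


def compute_rfm(transactions, reference_date):
    """Aggregate (customer, invoice, invoice_date, amount) rows into raw R, F, M."""
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in transactions:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices[customer].add(invoice)
        spend[customer] += amount
    return {
        c: (
            (reference_date - last_purchase[c]).days,  # recency in days
            len(invoices[c]),                          # distinct invoices
            round(spend[c], 2),                        # total spending
        )
        for c in last_purchase
    }


def rfm_scores(recency, frequency, monetary):
    """Map raw values to 1-5 scores; more recent customers get higher R scores."""
    r = 5 - bisect_left(RECENCY_BINS, recency)
    f = 1 + bisect_left(FREQUENCY_BINS, frequency)
    m = 1 + bisect_left(MONETARY_BINS, monetary)
    return r, f, m


# Hypothetical customer "X"; each row is one invoice line (amount = qty * price).
tx = [
    ("X", "1001", date(2011, 9, 1), 400.0),
    ("X", "1002", date(2011, 9, 8), 300.0),
    ("X", "1002", date(2011, 9, 8), 55.0),
]
rfm = compute_rfm(tx, reference_date=date(2011, 12, 9))
print(rfm["X"], rfm_scores(*rfm["X"]))  # (92, 2, 755.0) (2, 2, 2)
print(rfm_scores(92, 3, 1255))          # (2, 3, 2), i.e. RFM score 232
```

Using bisect_left on sorted cut points returns the number of cut points strictly below the value, which is exactly the zero-based index of a right-inclusive bin; recency is then reversed so that shorter gaps score higher.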
Table 6 illustrates a labeled dataset resulting from an RFM analysis, where each customer has been annotated with a corresponding segment category (e.g., ‘Champions’, ‘Hibernating’) based on their purchasing behavior. Since multiple rows may correspond to a single invoice and each customer may have multiple invoices over time, the raw data are aggregated for customer-level and invoice-level analysis. For instance, as shown in Table 5, customer 13848 made three separate purchases at different times. These transactions included varying quantities and types of products, with a total monetary value of £1255, a frequency of 3 purchases, and a recency of 92 days between the most recent purchase and the reference date. Based on this aggregated data, RFM scores are assigned according to the user-defined thresholds. For CustomerID 13848, the resulting RFM score is 232, indicating a low recency score (a relatively long time since the last purchase), moderate frequency, and low monetary value. According to the predefined segmentation rules, this customer falls into the “Need Attention” segment. In short, the RFM analysis transforms raw sales records into interpretable customer segments, which serve as labels for the classification task.
Figure 3 illustrates the distribution of customer segments, showing the proportion of each group within the overall customer base. The largest portion of the customers (23.70%) falls under the “Potential Loyalists” category, indicating a significant number of recent buyers who have the potential to become long-term loyal users. This is followed by “Loyal Customers” at 18.28%, representing a key group of repeat purchasers. Meanwhile, “Hibernating” customers account for 17.39%, reflecting a substantial portion of inactive and low-engagement users. Champions, the most valuable and engaged customers, account for 14.46%, while “About to Sleep” and “Need Attention” make up 11.30% and 8.51%, respectively. Lastly, the “At Risk” group represents 6.36% of the customer base, highlighting a smaller but still important group with declining activity. This distribution offers critical insights into customer behavior, enabling more effective targeted efforts, from supporting high-potential segments to re-engaging those at risk of churn.
Figure 4 illustrates the outcome of the permutation feature importance analysis, a widely used technique in machine learning for assessing the relative contribution of each input feature to the predictive accuracy of a model. The analysis breaks the relationship between the target variable and a given feature by randomly permuting that feature’s values and then measuring how much the model’s performance deteriorates; a larger drop in performance indicates a more important feature. According to the results, Recency was the most influential variable, with an importance score of 0.5358, indicating that the time since a customer’s last purchase plays a central role in the model’s decisions. This high value suggests that the predictions are highly sensitive to this temporal signal, which serves as the primary cue for distinguishing active customers from inactive ones. Frequency ranked second, with a score of 0.4217, suggesting that the number of purchases is also a strong predictor. This indicates that the model relies on repeated purchase behavior to identify patterns of loyalty, implying that ‘how often’ a customer returns is nearly as important as ‘when’ they last purchased. Monetary was the least important of the three variables, with a score of 0.2642, indicating that spending levels are relevant but play a comparatively smaller role in the prediction process. The resulting ordering of the RFM variables (R > F > M) is consistent with findings in churn prediction (CP) in the customer analytics domain, where the recency of an action is often the strongest indicator of future engagement.
Furthermore, this ordering aligns with the Customer Lifetime Value (CLV) framework, where recency and frequency are typically stronger predictors of future cash flows than monetary value alone, consistent with established domain knowledge.
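Permutation importance can be sketched in plain Python for any model exposed as a function from a feature vector to a label. Using accuracy as the score and averaging the drop over several shuffles are assumptions, since the text does not state the scoring metric or the number of repeats; scikit-learn offers a library implementation as sklearn.inspection.permutation_importance.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where model(x) equals the true label."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy check: the model uses only feature 0, so feature 1 should score zero.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]


def model(x):
    return int(x[0] > 0.5)


y = [model(x) for x in X]
imp = permutation_importance(model, X, y)
print([round(v, 3) for v in imp])
```

In the toy check, shuffling the unused feature leaves every prediction unchanged, so its importance is exactly zero, while shuffling the feature the model depends on causes a large accuracy drop.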
4.2. Experimental Setup
The proposed method was implemented in Python (version 3.12) using TensorFlow (version 2.20.0), NumPy (version 2.0), Pandas (version 2.0), Scikit-Learn (version 1.8.0), Seaborn (version 0.13.0), and Matplotlib (version 3.9.0). We employed a 10-fold cross-validation procedure to assess the robustness and generalization capability of the proposed model while avoiding data leakage. In this process, the entire dataset was randomly divided into 10 approximately equal-sized, non-overlapping subsets (folds). In each round, one fold (10% of the data) was held out as an independent test set, while the remaining nine folds constituted the development set. The development set was further split into training and validation subsets at an 8:1 ratio, corresponding to 80% training and 10% validation data with respect to the full dataset. The training subset was used for model fitting, while the validation subset was used to monitor training (i.e., for early stopping). The test fold was used solely for performance evaluation and was never involved in training or validation, ensuring that no information leaked from it. This procedure was repeated 10 times, with each fold serving as the test set exactly once. The final performance was calculated by averaging the results across all folds, providing a comprehensive and reliable evaluation of the model.
Hyperparameter analysis was conducted as a separate set of experiments. For each hyperparameter configuration (e.g., number of convolutional layers, learning rate, number of filters, and number of folds), the entire cross-validation procedure described above was executed independently. Similarly, each candidate RFM threshold configuration was evaluated by running the complete evaluation process. Preprocessing steps were fitted on the training data and then applied to the corresponding validation and test sets, so that preprocessing and thresholding decisions depended only on the training data, preventing distribution leakage. Performance metrics were averaged across folds for each configuration, and configurations were compared based on these averaged results.
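The fold construction described above can be sketched with plain index lists. The shuffling seed and the round-robin assignment of shuffled indices to folds are illustrative choices; scikit-learn’s KFold (with shuffle=True) is a library alternative for the outer split.

```python
import random


def cv_splits(n_samples, n_folds=10, seed=42):
    """Yield (train, val, test) index lists for each round of k-fold CV.

    One fold is the test set; the other folds form the development set,
    which is split 8:1 into training and validation subsets.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]  # disjoint, near-equal
    for k in range(n_folds):
        test = folds[k]
        development = [i for f, fold in enumerate(folds) if f != k for i in fold]
        rng.shuffle(development)
        n_val = len(development) // 9  # 1 of 9 parts (8:1 split of development)
        yield development[n_val:], development[:n_val], test


for train, val, test in cv_splits(1000):
    print(len(train), len(val), len(test))  # 800 100 100 in every round
```

Each index lands in exactly one test fold across the ten rounds, and within a round the three subsets never overlap, which is the property that keeps the test fold unseen during training and early stopping.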
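The configuration comparison can be sketched as a loop that runs the full k-fold procedure for each candidate and ranks candidates by their mean score. Here evaluate_fold is a hypothetical callable standing in for one training-and-testing round, with any preprocessing fitted inside it on that round’s training subset. The sketch enumerates a full grid; a one-parameter-at-a-time sweep corresponds to passing a single value for every parameter except the one being varied.

```python
from itertools import product
from statistics import mean


def select_configuration(grid, evaluate_fold, n_folds=10):
    """Run the full k-fold procedure per configuration and rank by mean score.

    `evaluate_fold(config, k)` is a hypothetical callable: it should train on
    round k's training subset (fitting any preprocessing there) and return the
    score on round k's test fold.
    """
    keys = list(grid)
    results = {}
    for values in product(*(grid[key] for key in keys)):
        config = dict(zip(keys, values))
        results[values] = mean(evaluate_fold(config, k) for k in range(n_folds))
    best = max(results, key=results.get)
    return dict(zip(keys, best)), results[best], results


# Stand-in evaluator for illustration only (no model is trained here).
def fake_evaluate(config, k):
    return 0.9 - abs(config["learning_rate"] - 0.01) - 0.01 * (config["conv_layers"] - 2)


best, score, results = select_configuration(
    {"conv_layers": [2, 3], "learning_rate": [0.01, 0.02]}, fake_evaluate
)
print(best, round(score, 4))
```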
Various assessment metrics were used to evaluate how well the model classifies customers into the correct segments. These metrics include accuracy (Equation (7)), which measures the proportion of correct predictions, as well as precision, recall, and the F-measure (Equations (8)–(10)), which provide more nuanced insights into how well the model performs across different segment classes. The metrics were computed for the multi-class classification task using weighted averaging. Specifically, class-wise accuracy, precision ($P_i$), recall ($R_i$), and F-measure values were first calculated, and the final reported metrics were obtained as the weighted average across classes, where the weights correspond to the number of samples in each class. This approach accounts for possible class imbalance in segment counts.
$$\text{Accuracy} = \sum_{i=1}^{K} \frac{n_i}{N} \cdot \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \tag{7}$$
$$\text{Precision} = \sum_{i=1}^{K} \frac{n_i}{N} \cdot P_i, \qquad P_i = \frac{TP_i}{TP_i + FP_i} \tag{8}$$
$$\text{Recall} = \sum_{i=1}^{K} \frac{n_i}{N} \cdot R_i, \qquad R_i = \frac{TP_i}{TP_i + FN_i} \tag{9}$$
$$\text{F-measure} = \sum_{i=1}^{K} \frac{n_i}{N} \cdot \frac{2 P_i R_i}{P_i + R_i} \tag{10}$$
Here, $K$ is the number of class labels, $n_i$ is the number of instances in class $i$, and $N = \sum_{i=1}^{K} n_i$ is the total number of instances across all classes. In these formulations, true positives ($TP_i$) and true negatives ($TN_i$) are correct predictions of positive and negative cases for class $i$, respectively, while false positives ($FP_i$) are negative cases incorrectly predicted as positive and false negatives ($FN_i$) are positive cases incorrectly predicted as negative. A confusion matrix was also generated to visualize which segments are most often confused with one another, indicating where the model could be improved.
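The weighted metrics can be computed directly from a confusion matrix. The sketch below uses the class-wise definitions of precision, recall, and F-measure with weights n_i/N and reports plain accuracy as the trace of the matrix divided by N. Weighted recall always equals this plain accuracy, because the sum over classes of (n_i/N)(TP_i/n_i) is the sum of TP_i divided by N. The confusion matrix in the usage lines is made up; scikit-learn’s precision_recall_fscore_support with average="weighted" computes the same weighted scores.

```python
def weighted_metrics(cm):
    """Accuracy and weighted precision/recall/F-measure from a K x K confusion
    matrix (rows = true class, columns = predicted class)."""
    k = len(cm)
    n = [sum(row) for row in cm]  # n_i: instances of class i
    total = sum(n)                # N
    precision = recall = f_measure = 0.0
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[j][i] for j in range(k)) - tp  # predicted i, truly another class
        fn = n[i] - tp                             # truly i, predicted another class
        p_i = tp / (tp + fp) if tp + fp else 0.0
        r_i = tp / (tp + fn) if tp + fn else 0.0
        f_i = 2 * p_i * r_i / (p_i + r_i) if p_i + r_i else 0.0
        w = n[i] / total          # class weight n_i / N
        precision += w * p_i
        recall += w * r_i
        f_measure += w * f_i
    accuracy = sum(cm[i][i] for i in range(k)) / total
    return accuracy, precision, recall, f_measure


# Made-up 3-class confusion matrix for illustration.
cm = [[50, 2, 3], [4, 40, 6], [1, 5, 39]]
print([round(v, 4) for v in weighted_metrics(cm)])
```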
4.3. Results
Table 7 presents the performance of the classification model obtained from 10-fold cross-validation using four key evaluation metrics: Accuracy, Recall, Precision, and F-Measure. The model delivered consistent results across folds, with accuracy ranging from 90.78% to 97.23% and averaging 94.33%. Precision and recall closely follow this accuracy trend, reflecting the strong generalizability of the model. These outcomes indicate that the classifier is not only accurate but also maintains a good balance between precision and recall.
Figure 5 presents the confusion matrix, which shows the performance of the classification model across different classes. The model achieved high per-class accuracy across customer segments. For instance, the “Champions” group was correctly classified at a rate of 94.3%, with only 5.7% of instances misclassified. Similarly, the “Need Attention” category showed strong performance with 90.5% accuracy. The model performed particularly well in identifying “Potential Loyalists” (96.9%) and “About to Sleep” customers (93.7%), with minor misclassifications distributed across adjacent segments. Overall, the matrix indicates that the classification model is effective in distinguishing between customer segments.
Figure 6 illustrates the training and validation loss over 20 epochs, showing a consistent improvement in model performance. Both losses decrease substantially, with the training loss dropping from 0.6024 to 0.1549 and the validation loss from 0.3792 to 0.1502. This trend reflects effective learning and generalization. Notably, from epoch 15 onward, the two losses become closely aligned, suggesting that the model has reached a stable learning phase. The narrowing gap between training and validation loss in the final epochs further supports the model’s robustness and its ability to generalize to unseen data.
4.4. Sensitivity Analysis
Table 8 presents the results of the sensitivity analysis conducted to evaluate the impact of key hyperparameters on the performance of the proposed model. This analysis involved systematically testing multiple values for each parameter. For the number of convolutional layers, values ranging from 2 to 7 were explored. The highest performance was observed with two layers, yielding an accuracy of 94.33%. As the number of layers increased, performance consistently declined, likely due to overfitting or redundant feature extraction. The learning rate was tested at 0.04, 0.03, 0.02, and 0.01; among these, a learning rate of 0.01 achieved peak performance on all metrics. Higher learning rates degraded model performance, possibly due to overshooting during optimization. For k-fold cross-validation, 10 folds produced better generalization than 5 folds, indicating the benefit of a more thorough validation scheme. Furthermore, 32 filters were found to be optimal, yielding higher accuracy than 16 filters. This shows that the model is sensitive to the number of filters and suggests that, within the tested range, more filters can enhance feature extraction without overfitting. Finally, an analysis was conducted to examine the impact of different user-defined RFM threshold values on model performance. Several threshold configurations, ranging from tighter to broader intervals, were tested to determine the optimal discretization strategy. The progressive adjustment of the R, F, and M ranges enabled a detailed examination of how threshold granularity influences classification stability. As shown in Table 8, broader threshold intervals resulted in higher accuracy and a more balanced class distribution, better reflecting underlying differences in customer purchasing behavior. Overall, these results guided the selection of hyperparameters for the final model configuration: two convolutional layers, a learning rate of 0.01, 10-fold cross-validation, and 32 filters.
4.5. Discussion
In this section, the performance of the proposed RFM-Net method is compared with the results reported in prior studies [16,21,22,23,24,25,26,27,28,29,30,31,32,33,34] on the same dataset. As presented in Table 9, the comparison was made using common performance metrics, including accuracy, recall, precision, and F-measure. According to the results, RFM-Net provided a consistent and substantial performance advantage over all compared methods on every evaluation criterion. On average, RFM-Net demonstrated an accuracy improvement of approximately 13.17% over the results reported in state-of-the-art studies. For instance, the proposed method yielded a higher accuracy (94.33%) than advanced models such as PARM (90.00%) [22] and Ret-DNN (90.00%) [25]. RFM-Net also surpassed ensemble learning methods such as Random Forest (87.60%) [16], Gradient Boosting (85.00%) [28], and AdaBoost (73.30%) [21]. Compared with these studies, RFM-Net achieved the highest results not only in accuracy but in all performance metrics, with a precision of 0.9466, a recall of 0.9433, and an F-measure of 0.9429. These results indicate RFM-Net’s advantage in processing online retail data. As shown in Table 9, standard classification algorithms such as KNN, DT, and SVM are more limited in capturing complex, nonlinear relationships in transactional data than our deep learning-based approach. In summary, RFM-Net’s strong performance relative to other state-of-the-art studies supports the model’s ability to distinguish critical classes.
Table 10 presents the performance comparison between the proposed RFM-Net and several baseline models, including logistic regression (LR), naive Bayes, multi-layer perceptron (MLP), k-nearest neighbors, AdaBoost, decision tree (DT), and a tree-based ensemble method (Bagging (DT)). All models were evaluated under the same experimental protocol (identical preprocessing, RFM thresholds, and segment definitions) to ensure a fair comparison. The results demonstrated that RFM-Net outperformed all baseline models across all evaluated metrics. For instance, while the MLP obtained an accuracy of 85.50%, RFM-Net reached a substantially higher accuracy of 94.33%. Similarly, LR yielded an accuracy of 90.32%, which remains lower than the accuracy obtained by RFM-Net. The tree-based ensemble approach, Bagging (DT), delivered an accuracy of 90.68%, confirming the strength of ensemble-based modeling, yet remaining below the results achieved by RFM-Net. Specifically, RFM-Net improved accuracy by 6.31 percentage points compared to the average baseline performance (88.02%). These results demonstrate the effectiveness of the proposed CNN-based architecture in modeling the RFM feature interactions and improving classification performance.
Although the input consists of only three structured features (R, F, and M), the convolutional architecture can offer advantages in learning localized interaction patterns among them. Rather than treating R, F, and M as fully independent variables, the convolutional filters act as feature-interaction extractors, capturing local dependency structures and nonlinear combinations of adjacent features, whereas the first layer of a standard MLP learns a separate global weight for every input-unit pair. Furthermore, the pooling mechanism may enhance robustness by emphasizing dominant interaction patterns while reducing sensitivity to noise. Compared with traditional models such as logistic regression, which assumes a linear decision function in its feature space, and tree-based methods, which rely on hierarchical splits, the shared-filter mechanism of the CNN may act as an implicit regularizer, which can improve generalization performance.
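The feature-interaction argument can be made concrete with a small sketch of what a single (2, 1) filter computes on one customer’s [R, F, M] column. The kernel weights, bias, and scaled input values are hypothetical; as in deep learning libraries, the operation is a cross-correlation followed by ReLU.

```python
def conv_column(features, kernel, bias=0.0):
    """'Valid' sliding-window filter (cross-correlation) over a feature column,
    followed by ReLU, mirroring one Conv2D filter with a (h, 1) kernel."""
    h = len(kernel)
    out = []
    for start in range(len(features) - h + 1):
        window = features[start:start + h]
        z = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(max(0.0, z))  # ReLU
    return out


# One hypothetical filter applied to hypothetical, already-scaled [R, F, M] values.
# The first output combines (R, F); the second combines (F, M).
print(conv_column([0.2, 0.9, 0.4], kernel=[-1.0, 1.0]))
```

With valid padding, a height-2 kernel sees only the adjacent pairs (R, F) and (F, M); a direct R–M interaction has to be formed by later layers. The order in which R, F, and M are stacked is therefore itself a design choice.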
To further assess the validity and generalizability of the proposed RFM-Net approach, an additional dataset was incorporated into the experimental evaluation: the publicly available Online Retail II dataset [35], which contains 1,067,371 real-world transactional records from a UK-based non-store online retailer spanning two years (2009–2011). The same preprocessing methodology and evaluation metrics were employed to ensure consistency with the primary dataset. In the sensitivity analysis, the same hyperparameter configurations were systematically examined, except for the threshold values, which were doubled because this dataset is approximately twice the size of the primary dataset. The results are presented in Table 11. Consistent with the findings from the primary dataset, the best performance was achieved using a shallow convolutional architecture (2 layers) and a lower learning rate (0.01), whereas deeper configurations resulted in performance degradation. Similarly, a larger number of filters (32) and broader RFM threshold intervals led to improved classification, achieving an accuracy of 95.41%. These findings indicate that the proposed RFM-Net model maintains strong performance on large-scale, real-world transactional data, and the consistency of results across datasets further supports the robustness and generalizability of the proposed segmentation approach.
One limitation of this study is that customer segment labels are generated from predefined RFM rules and subsequently used as ground truth for training the CNN, which introduces a degree of circularity. However, although the segment labels are derived from discrete RFM scores (on a 1–5 scale), the CNN model is trained on the raw (unbinned) RFM values rather than on these scores, preserving the original behavioral measures. The primary objective of employing a deep learning architecture in this context is to create a scalable framework capable of segmenting customers accurately: segmentation is conceptually grounded in rule-based logic on RFM scores, while model training uses the actual RFM values.