Customer Clustering and Marketing Optimization in Hospitality: A Hybrid Data Mining and Decision-Making Approach from an Emerging Economy

Deldadehasl, Maryam; Karahroodi, Houra Hajian; Haddadian Nekah, Pouya

doi:10.3390/tourhosp6020080



by Maryam Deldadehasl ¹,*, Houra Hajian Karahroodi ², and Pouya Haddadian Nekah ³

¹ School of Electrical, Computer, and Biomedical Engineering, Southern Illinois University, Carbondale, IL 62901, USA

² School of Management and Marketing, Southern Illinois University, Carbondale, IL 62901, USA

³ Barney Barnett School of Business and Free Enterprise, Florida Southern College, Lakeland, FL 33801, USA

* Author to whom correspondence should be addressed.

Tour. Hosp. 2025, 6(2), 80; https://doi.org/10.3390/tourhosp6020080

Submission received: 31 March 2025 / Revised: 25 April 2025 / Accepted: 30 April 2025 / Published: 9 May 2025


Abstract

This study introduces a novel Recency, Monetary, and Duration (RMD) model for customer classification in the hospitality industry. Using a hybrid approach that integrates data mining with multi-criteria decision-making techniques, this study aims to identify valuable customer segments and optimize marketing strategies. This research applies the K-means clustering algorithm to classify customers from a hotel in Iran based on RMD attributes. Cluster validation is performed using three internal indices, and hidden patterns are extracted through association rule mining. Customer segments are prioritized using the TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) method and Customer Lifetime Value (CLV) analysis. The outcomes revealed six distinct customer clusters, identified as new customers, loyal customers, collective buying customers, potential customers, business customers, and lost customers. This study helps hotels recognize different types of customers with particular spending patterns, enabling them to tailor services and improve customer retention. It also provides managers with appropriate tools to allocate resources efficiently. This study extends the traditional Recency, Frequency, and Monetary (RFM) model by incorporating duration, an overlooked dimension of customer engagement. It is the first attempt to integrate data mining and multi-criteria decision-making for customer segmentation in Iran's hospitality industry.

Keywords:

hospitality marketing; customer retention; RMD; TOPSIS; association rules; K-means; CLV

1. Introduction

The hospitality industry has become an increasingly competitive sector in the global economy. Fluctuating global hotel occupancy rates, driven by economic shifts and evolving consumer preferences, underscore the strategic importance of customer retention. Hotels generate vast amounts of customer data, yet many struggle to leverage this information for targeted marketing and service personalization. The demand for enhanced customer experience has led to a surge in the use of Customer Relationship Management (CRM) systems that store vast amounts of customer data (Dursun & Caber, 2016). However, simply collecting data is insufficient; the ability to extract actionable insights from these datasets is crucial. Data mining techniques have gained traction, as they allow businesses to uncover hidden patterns in large datasets, enabling managers to make more informed strategic decisions (Ristoski & Paulheim, 2016).

Customer segmentation is an essential aspect of marketing planning, allowing businesses to target specific customer groups with tailored strategies. Traditional segmentation methods, such as Recency, Frequency, and Monetary (RFM) analysis, have been widely applied in various industries, including retail, banking, and hospitality (McCarty & Hastak, 2007). However, these models have limitations, particularly in their ability to capture a customer’s overall engagement beyond purchase behavior (Hu & Yeh, 2014). While prior research has applied RFM in hospitality, existing studies often overlook engagement duration (D), an essential factor in customer value assessment. In the hospitality industry, a guest’s length of stay is a crucial determinant of their overall contribution to revenue and customer lifetime value. Unlike industries where transaction frequency is the primary measure of engagement (e.g., retail and banking), the duration of a guest’s visit influences pricing strategies, occupancy management, and service personalization (Dursun & Caber, 2016; Kumar et al., 2015). Consequently, integrating duration into customer segmentation provides a more holistic view of customer value.

Few studies integrate clustering with multi-criteria decision-making to optimize segmentation strategies. This study addresses these gaps by introducing an RMD-based customer segmentation framework that extends the conventional RFM model to better capture customer behaviors in the hospitality industry and to enhance marketing decision-making. The duration attribute is a significant factor that reflects a customer's engagement with a service provider, yet it is often overlooked in RFM-based analyses (Dursun & Caber, 2016).

To address these limitations, this study employs a hybrid clustering approach that integrates data mining and decision-making techniques. While clustering methods such as K-means are commonly used in customer segmentation, they are often limited to descriptive categorization without decision-support mechanisms for prioritization (Jain, 2010). This study complements clustering with Multi-Criteria Decision Making (MCDM) techniques, enabling a systematic approach to evaluating and ranking customer segments. Using the RMD framework, hotel customers are segmented based on recency, monetary value, and duration of stay. These clusters are then ranked using TOPSIS and CLV analysis. Further details on the methodology are provided in Sections 3 and 4.

This study integrates clustering methods, association rule mining, and decision-making tools to propose a comprehensive framework for customer segmentation. Specifically, the research employs the K-means clustering algorithm, validated by three internal cluster indices—Silhouette, Calinski–Harabasz, and Davies–Bouldin—to ensure robust segmentation (Calinski & Harabasz, 1974; Davies & Bouldin, 1979; Rousseeuw, 1987). After clustering, association rule mining is used to extract hidden patterns, while TOPSIS and CLV (Customer Lifetime Value) analysis prioritize the most valuable customer segments.

While data mining has been extensively used in various sectors, few studies have combined clustering, association rules, and MCDM for customer segmentation in the hospitality industry (Dursun & Caber, 2016; Srihadi et al., 2016). Prior research in hospitality has applied clustering methods independently or in combination with RFM analysis (Ansari & Riasi, 2016; Mosavi & Afsar, 2018). However, the integration of unsupervised machine learning (e.g., K-means), rule-based analysis (e.g., Apriori association rules), and multi-criteria decision-making remains underexplored. Likewise, although previous research has explored clustering techniques in hotel management, the integration of Shannon entropy, TOPSIS, and the Best Worst Method (BWM) for segmentation validation remains underdeveloped (Çalık et al., 2020; Tu et al., 2020). This study bridges these gaps by presenting a novel, unified hybrid framework that leverages both unsupervised machine learning (clustering) and multi-criteria decision-making to optimize customer classification.

Furthermore, advancements in big data analytics and artificial intelligence have revolutionized customer segmentation strategies. The ability to analyze large datasets efficiently has led to improvements in segmentation accuracy and decision-making (George et al., 2014). Machine learning algorithms, such as K-means, enhance traditional statistical methods by allowing for dynamic pattern recognition in real time (Jain, 2010). This paper highlights how integrating MCDM techniques with data mining can lead to more precise, actionable, and strategic customer segmentation models. This study seeks to answer the following research questions: 1. How does the RMD model improve customer segmentation compared to traditional RFM? 2. What are the key characteristics of customer segments identified through K-means clustering? 3. How can decision-making techniques like TOPSIS enhance customer prioritization for marketing strategies?

The remainder of this paper is structured as follows: Section 2 reviews relevant literature on customer segmentation and data mining. Section 3 introduces the basic concepts behind the clustering, validation, association rule, and decision-making techniques, and Section 4 outlines the research methodology. Section 5 presents the case study and empirical results, followed by a discussion of implications and a conclusion covering limitations and future research directions.

2. Literature Review

2.1. Data Mining in the Hospitality Industry

With the rapid advancement of Information and Communication Technology (ICT), businesses have gained the ability to generate and analyze large datasets. In the hospitality industry, data mining has become a crucial tool for extracting customer insights, optimizing marketing strategies, and improving customer relationship management (CRM) (Cheng & Chen, 2009). Data mining, which involves discovering patterns in large datasets through computational methods, is widely used in industries such as banking, healthcare, and marketing (Erevelles et al., 2016; Sharifi, 2025). Within the hospitality sector, data mining techniques help analyze customer behavior to improve marketing strategies, loyalty programs, and revenue management (Dursun & Caber, 2016; Samarasinghe & Samarasinghe, 2013). CRM-based data mining can significantly enhance customer retention by identifying high-value customers and predicting future purchasing behaviors. The application of data mining in hospitality extends beyond marketing to include dynamic pricing strategies, operational efficiency improvements, and predictive analytics for demand forecasting (Chen et al., 2012; Samarasinghe & Samarasinghe, 2013). By leveraging structured and unstructured customer data, hotels can develop targeted promotional campaigns, optimize pricing strategies, and enhance service customization, ultimately improving customer retention. The tourism industry comprises several sectors, such as hotels, travel agencies, and transportation (Ostovare & Shahraki, 2019). Hotels are thus an important part of this package, and hotel managers must offer well-developed services tailored to their customers to compete in this market. Moreover, the advancement of the Internet has considerably reshaped the tourism industry (Ho & Lee, 2007). The Internet is a prominent tool for sharing information for both consumers and suppliers (Law et al., 2010).
Consequently, people can widely access information about hotels' facilities and other customers' experiences. This has led to a wide range of transactions being documented as big data, which helps hotels tailor their services to customers' tastes. Identifying customers is crucial for all businesses seeking to make profitable decisions in such a competitive era, and clustering serves as an effective method for doing so. As Table 1 shows, only a few papers have been published in the hotel industry on identifying customers and improving services. Therefore, this study integrates several methods to increase efficiency. To apply the K-means algorithm, determining the number of clusters (K) is critical. Hence, three internal indices are applied to evaluate K. Their weights are then calculated based on Shannon entropy, and the best number of clusters is selected with the TOPSIS algorithm. Apriori and RMD are combined to improve the accuracy of the results and to extract patterns. Finally, the clusters are analyzed using TOPSIS and BWM.

2.2. Customer Segmentation Techniques

Customer segmentation is a fundamental technique in marketing analytics, enabling businesses to classify customers based on their behavior, demographics, and purchasing patterns (McCarty & Hastak, 2007). While traditional segmentation methods such as RFM have been widely adopted, recent advancements in machine learning and data analytics have led to the development of more sophisticated models, including behavioral and psychographic clustering (Peker et al., 2017). These models incorporate real-time transaction data, customer engagement metrics, and external factors to refine segmentation strategies. However, traditional models such as RFM often fail to account for customer engagement depth and behavioral dynamics (Hu & Yeh, 2014). The limitation of RFM stems from its exclusive focus on transactional activity, neglecting customer interaction longevity, seasonal patterns, and service usage intensity (McCarty & Hastak, 2007). This gap is particularly evident in the hospitality industry, where the frequency of visits may not necessarily correlate with long-term customer value. For instance, a customer who stays in a hotel for ten nights annually might be more valuable than a frequent traveler staying for only one night per visit. The RMD framework addresses this gap by introducing duration as a critical segmentation attribute. Predictive analytics, powered by machine learning algorithms, has further enhanced customer segmentation by allowing businesses to forecast customer lifetime value and retention likelihood (Erevelles et al., 2016). Data mining offers a variety of tools for different applications; Figure 1 shows some of these techniques.

Clustering and association rules are introduced below because they are applied in this research. Among the various clustering methods, K-means remains one of the most widely used due to its efficiency and adaptability. K-means partitions customers into distinct groups based on similarity, enabling businesses to identify and target different customer segments effectively. However, a critical limitation of K-means is its reliance on a predefined number of clusters, which may lead to suboptimal segmentation if the value of K is not determined effectively (Syakur et al., 2018). To mitigate this issue, this study employs internal cluster validation indices, namely the silhouette, Davies–Bouldin, and Calinski–Harabasz scores, to ensure robust segmentation (Davies & Bouldin, 1979; Rousseeuw, 1987). K-means clustering was presented in an anthropology paper in 1954 (Loh & Shih, 1997), and it has since been used in various fields such as machine learning and statistics. Items within a cluster are as similar as possible to each other and as different as possible from items in other clusters (Jain, 2010; Laursen, 2011). A key advantage of K-means is its sensitivity to changes in data properties, making it a dynamic and adaptable clustering method.

2.3. RFM vs. RMD: The Need for an Enhanced Segmentation Model

The RFM model segments customers based on three attributes: recency, frequency, and monetary value (McCarty & Hastak, 2007). It is widely used due to its effectiveness in identifying high-value customers (Hu & Yeh, 2014). This segmentation enables businesses to develop targeted marketing strategies that enhance customer engagement and retention (Wei et al., 2010). RFM has several advantages and disadvantages. First, RFM is inexpensive and lets company owners understand customers' behaviors readily (Kahan, 1998; Miglautsch, 2000). Second, it helps firms anticipate and enhance their profits (Baecke & Poel, 2011). Third, it effectively models customers' purchasing behaviors with a small number of variables (Wei et al., 2010). However, RFM focuses only on profitable customers, which means that it uses limited variables to understand customers and offers no prospects for new customers (McCarty & Hastak, 2007). RFM is mainly combined with K-means and Self-Organizing Maps (SOM) (Hanafizadeh & Mirzazadeh, 2011). To address some of its weaknesses, researchers have added other variables to RFM, yielding variants such as TRFM (timely), RFD (duration), and FRAT (amount and type of products). Relative to the mean of each RFM attribute, a value above the average is indicated by ↑ and a value below the average by ↓. In one of the few studies applying the RFM method to the hotel industry, loyal customers were symbolized with R↑F↑M↑, lost customers with R↓F↓M↓, new customers with R↑F↓M↓, potential customers with R↑F↑M↓, loyal summer season customers with R↓F↑M, collective buying customers with R↑F↑M, winter season customers with R↑F↑M↓, and high potential customers with RF↓M↑ (Dursun & Caber, 2016). Their model can identify valuable customers to improve service quality. In other industries, RFM and K-means have also been used to identify valuable customers (Matz & Hermawan, 2020).
RMD (Recency, Monetary, Duration) is presented in this study to cluster customers. RMD is defined as follows.

R (Recency): The number of days between a customer's latest presence at the hotel and the date of analysis; it indicates how recently the customer visited. The customer data used for the RMD analysis cover the period from 18 August 2017 to 18 August 2018, and each date in this window is numbered sequentially, from 1 for 18 August 2017 up to 366 for 18 August 2018.
M (Monetary): The amount of money each customer spends.
D (Duration): The number of days each individual stays at the hotel.
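As a minimal sketch of how the RMD attributes above could be derived from raw hotel records, the Python code below aggregates a small, hypothetical list of stays (customer identifier, date of the stay, nights, amount paid). It assumes that R is the number of days from a customer's latest stay to the analysis date of 18 August 2018, that M is the total spend, and that D is the total number of nights across all stays. `rmd_table`, `day_index`, and the record layout are illustrative names, not the authors' implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stay records: (customer_id, visit_date, nights, amount_paid).
# Values are illustrative only; they are not taken from the hotel's data.
STAYS = [
    ("c1", date(2017, 9, 2), 3, 450.0),
    ("c1", date(2018, 7, 30), 2, 300.0),
    ("c2", date(2018, 1, 15), 7, 980.0),
    ("c3", date(2018, 8, 10), 1, 120.0),
]

ANALYSIS_DATE = date(2018, 8, 18)  # end of the observation window


def rmd_table(stays, analysis_date):
    """Return {customer_id: (recency_days, monetary, duration_nights)}."""
    last_visit = {}
    monetary = defaultdict(float)
    duration = defaultdict(int)
    for cid, visit_date, nights, amount in stays:
        # Keep the most recent visit date seen for each customer.
        if cid not in last_visit or visit_date > last_visit[cid]:
            last_visit[cid] = visit_date
        monetary[cid] += amount  # M: total spend (assumed)
        duration[cid] += nights  # D: total nights across stays (assumed)
    return {
        cid: ((analysis_date - last_visit[cid]).days, monetary[cid], duration[cid])
        for cid in last_visit
    }


def day_index(d, start=date(2017, 8, 18)):
    """Sequential day number: 1 for 18 Aug 2017, 366 for 18 Aug 2018."""
    return (d - start).days + 1


for cid, (r, m, dur) in sorted(rmd_table(STAYS, ANALYSIS_DATE).items()):
    print(cid, r, m, dur)
```

`day_index` reproduces the sequential date numbering described for R, in case recency is expressed as the index of the latest visit rather than as elapsed days.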

3. Basic Concepts

3.1. K-Means

Although K-means is a simple, fast, and efficient method, it is best suited for generating spherical clusters. One of the critical aspects of clustering is measuring the distance between data points, which affects how clusters are formed. Various methods exist to calculate distance, with Euclidean distance being one of the most widely used techniques, as calculated using Equation (1):

Distance (O_{i}, O_{j}) = \sqrt{\sum_{k = 1}^{n} {(X_{i k} - X_{j k})}^{2}}

(1)
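The section relies on K-means without spelling out its iterations. The sketch below shows the standard Lloyd procedure in Python with NumPy: assign each point to its nearest centroid using the distance of Equation (1), then move each centroid to the mean of its points, and repeat until the centroids stop moving. The random initialization, iteration cap, and convergence test are assumptions made for illustration, not settings reported by the authors.

```python
import numpy as np


def euclidean(a, b):
    """Equation (1): Euclidean distance between two feature vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def _assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means on an (N, n) feature matrix; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return _assign(X, centroids), centroids
```

Because M is measured in currency while R and D are measured in days, rescaling the columns (for example, min-max or z-score normalization) before clustering keeps one attribute from dominating the Euclidean distance; the text does not state which scaling, if any, was applied.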

The accuracy of the K-means algorithm depends heavily on selecting the optimal number of clusters (K) (Mesforoush & Tarokh, 2013). In this research, the optimal K value is determined using Multi-Criteria Decision-Making (MCDM) tools. To ensure the reliability of clustering results, various validation indices are used; these measures evaluate the quality of clusters by assessing intra-cluster cohesion and inter-cluster separation. Three widely adopted internal validation indices, each of which provides insights into the optimal number of clusters, are the silhouette score, the Calinski–Harabasz index, and the Davies–Bouldin index. They are discussed below:

Silhouette analysis: The silhouette method is a widely used technique for evaluating clustering performance (Rousseeuw, 1987). A higher silhouette score indicates better clustering quality. To compute the silhouette score, two key concepts are introduced:

Mean distance of points to other points within the same cluster: Suppose x_i belongs to the C_j cluster. The mean intra-cluster distance (how close x_i is to other points in its own cluster) is computed using Equation (2).

a (i) = \frac{1}{n_{i} - 1} \sum_{x_{l} \in C_{j}, \, l \neq i} d (x_{i}, x_{l})

(2)

Notably, n_i is the size of cluster C_j, so the sum runs over the other n_i - 1 members of the cluster. Moreover, a(i) represents how well x_i belongs to its cluster: the lower the value, the better the fit. This distance can be determined using various methods, such as the Manhattan and Euclidean distances.

Minimum mean distance of a point to other clusters: Suppose that x_i is a point belonging to cluster C_j. For every other cluster C_l (l ≠ j), the mean distance from x_i to the points of C_l is computed, and the smallest of these means is taken, as in Equation (3).

b (i) = \min_{l \neq j} \frac{1}{n_{l}} \sum_{y_{m} \in C_{l}} d (x_{i}, y_{m})

(3)

where y_m are the points belonging to C_l and n_l is the number of points in C_l. The cluster with the lowest mean distance to x_i is referred to as the neighboring cluster of this point. Thus, the value of the silhouette criterion for point x_i is calculated using Equation (4), and the silhouette score of a clustering is the mean of s(i) over all points.

s (i) = \frac{b (i) - a (i)}{max (b (i), a (i))}

(4)
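Equations (2)-(4) can be checked numerically with a short NumPy sketch that computes a(i), b(i), and s(i) for every point and averages them into the overall silhouette score. It assumes Euclidean distance and at least two clusters; points in singleton clusters receive s(i) = 0, following Rousseeuw's convention. The function name is illustrative.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette s(i) over all points, per Equations (2)-(4)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_i = same.sum()
        if n_i == 1:
            continue  # singleton cluster: s(i) = 0
        # Equation (2): mean distance to the other members (self-distance is 0).
        a = D[i, same].sum() / (n_i - 1)
        # Equation (3): smallest mean distance to any other cluster.
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0  # Equation (4)
    return float(scores.mean())
```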

Calinski–Harabasz index: The Calinski–Harabasz index is measured using Equation (5).

V R C_{K} = \frac{S S_{B}}{S S_{W}} \times \frac{(N - K)}{(K - 1)}

(5)

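Notably, SS_B is the between-cluster sum of squares (the dispersion of the cluster centroids around the overall mean, weighted by cluster size) and SS_W is the within-cluster sum of squares (the dispersion of points around their own cluster centroid). N and K are the number of observations and the number of clusters, respectively, and higher values of the index indicate better-separated clusters (Calinski & Harabasz, 1974).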
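A minimal NumPy sketch of Equation (5) follows, assuming SS_B and SS_W are computed as sums of squared Euclidean deviations (cluster centroids from the grand mean, weighted by cluster size, and points from their own centroid). This is the same formulation as scikit-learn's `calinski_harabasz_score`, but the function below is an illustrative sketch rather than the authors' code.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Equation (5): (SS_B / SS_W) * (N - K) / (K - 1)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N, grand_mean = len(X), X.mean(axis=0)
    clusters = np.unique(labels)
    K = len(clusters)
    ss_b = ss_w = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        # Between-cluster term: size-weighted squared distance of centroid to grand mean.
        ss_b += len(members) * np.sum((centroid - grand_mean) ** 2)
        # Within-cluster term: squared distances of members to their centroid.
        ss_w += np.sum((members - centroid) ** 2)
    return float((ss_b / ss_w) * (N - K) / (K - 1))
```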

Davies–Bouldin index: To calculate this index, first, the following criteria must be introduced (Davies & Bouldin, 1979):

Measure of scatter within a cluster: Suppose that S_i is the measure of scatter of cluster C_i and d is a distance function. The scatter of this cluster is obtained using Equation (6).

S_{i} = {[\frac{1}{| C_{i} |} \sum_{x \in C_{i}} d^{r} (x, v_{i})]}^{\frac{1}{r}}, r > 0

(6)

Notice that v_i is the centroid of cluster C_i.

Cluster separation: The separation between two clusters is measured by the distance between their centroids. The distance between clusters i and j is denoted D_ij and is computed using Equation (7).

D_{i j} = {[\sum_{k = 1}^{n} | v_{i k} - v_{j k} |^{t}]}^{\frac{1}{t}}

(7)

Note that v_i and v_j are the centroids of clusters i and j, and v_ik is the kth component of v_i. Considering S_i for tightness and D_ij for separation, R_ij can be calculated using Equation (8).

R_{i j} = \frac{S_{i} + S_{j}}{D_{i j}}

(8)

To obtain the Davies–Bouldin index for a clustering, the largest ratio R_ij of each cluster relative to the other clusters (its worst-case similarity) is computed using Equation (9).

R_{i} = max_{\begin{matrix} j \neq i \end{matrix}} R_{i j}

(9)

The Davies–Bouldin index is the mean of these maxima over all clusters, as computed using Equation (10).

V_{D B} = \frac{1}{k} \sum_{i = 1}^{k} R_{i}

(10)

Here, k is the number of clusters. The lower the DB index, the better the clustering.
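The following NumPy sketch implements Equations (6)-(10) with the exponents r and t exposed as parameters. Taking d as the Euclidean distance and r = t = 2 is an assumption for illustration; for reference, scikit-learn's `davies_bouldin_score` corresponds to r = 1 (mean Euclidean distance to the centroid) with Euclidean centroid separation, so the two can differ slightly on real data.

```python
import numpy as np


def davies_bouldin(X, labels, r=2.0, t=2.0):
    """Equations (6)-(10) with Minkowski exponents r (scatter) and t (separation)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Equation (6): scatter S_i of each cluster around its centroid v_i (d = Euclidean).
    S = np.array([
        np.mean(np.linalg.norm(X[labels == c] - v, axis=1) ** r) ** (1.0 / r)
        for c, v in zip(clusters, centroids)
    ])
    k = len(clusters)
    R = np.zeros(k)
    for i in range(k):
        ratios = []
        for j in range(k):
            if j != i:
                # Equation (7): Minkowski distance of order t between centroids.
                D_ij = np.sum(np.abs(centroids[i] - centroids[j]) ** t) ** (1.0 / t)
                ratios.append((S[i] + S[j]) / D_ij)  # Equation (8)
        R[i] = max(ratios)  # Equation (9)
    return float(R.mean())  # Equation (10)
```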

3.2. Association Rules and Customer Behavior Analysis

Association rule mining identifies relationships and recurring patterns within a database. It helps managers develop tailored marketing strategies and improve service offerings, ultimately enhancing customer satisfaction and competitiveness in the market. In a dataset, a rule X → Y means that transactions containing the items in X also tend to contain the items in Y. Two measures are used to evaluate a rule: support and confidence. The support of X → Y is the percentage of all transactions that contain both X and Y, and the confidence of X → Y is the probability that a transaction contains Y given that it contains X (Zhang et al., 2004). Association rules help managers find valuable customers (Ganjali & Teimourpour, 2016). Association rules have also been used to diagnose possible diseases based on hospital information and patient keywords (Ramasamy & Nirmala, 2020), and Ganjali employed association rules to find insurers' behavior (Ganjali & Teimourpour, 2016). In this study, after the clustering process, the association rule algorithm is applied to find rules within each cluster.
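To illustrate support, confidence, and the Apriori search used in Phase II, the sketch below mines frequent itemsets and rules from hypothetical per-customer "transactions" whose items are discretized attribute labels (for example, `R_high` or `D_short`). The labels, thresholds, and function names are assumptions for illustration; the authors' actual items and parameter values are not given in this section.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {frozenset([i]) for t in transactions for i in t}
    level = {s: support(s) for s in items}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets, then prune candidates that have an
        # infrequent (k-1)-subset (the Apriori property).
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Yield (X, Y, support, confidence) for every rule X -> Y above the threshold."""
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                conf = sup / frequent[lhs]  # support(X and Y) / support(X)
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, sup, conf
```

Every subset of a frequent itemset is itself frequent, so `frequent[lhs]` is always available when computing confidence.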

3.3. Multi-Criteria Decision-Making (MCDM) Approaches for Customer Prioritization

3.3.1. Shannon Entropy

Shannon entropy, an objective weighting method, is applied to determine the weights of the validation indices. The entropy of each criterion j is calculated from the normalized decision matrix (entries r_ij) using Equation (12), where k is a constant defined by Equation (11) and m is the number of alternatives.

k = \frac{1}{ln m}

(11)

E_{j} = - k \sum_{i = 1}^{m} r_{i j} \ln r_{i j}, \quad j = 1, 2, \dots, n

(12)

The degree of diversification of each criterion is calculated from its entropy value using Equation (13).

d_{j} = 1 - E_{j}

(13)

Finally, the weights are computed using Equation (14):

W_{j} = \frac{d_{j}}{\sum_{j = 1}^{n} d_{j}}

(14)
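The entropy weighting of Equations (11)-(14) can be sketched in NumPy as below. The code assumes a decision matrix with alternatives (here, candidate values of K) in rows and criteria (validation indices) in columns, all strictly positive, and it assumes the common sum-normalization r_ij = x_ij / Σ_i x_ij, because the section does not specify how the matrix is normalized. The function name is illustrative.

```python
import numpy as np


def entropy_weights(X):
    """Equations (11)-(14): objective criterion weights from an m x n matrix."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    r = X / X.sum(axis=0)  # assumed sum-normalisation: each column sums to 1
    k = 1.0 / np.log(m)  # Equation (11)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(r > 0, r * np.log(r), 0.0)  # convention: 0 * ln 0 = 0
    E = -k * plogp.sum(axis=0)  # Equation (12): entropy per criterion
    d = 1.0 - E  # Equation (13): degree of diversification
    return d / d.sum()  # Equation (14): normalised weights
```

A criterion whose values are identical across all alternatives has maximal entropy (E_j = 1) and therefore receives zero weight, since it cannot discriminate between alternatives.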

3.3.2. TOPSIS

TOPSIS was first developed by Hwang and Yoon in 1981 to rank alternatives based on decision criteria (Yoon & Hwang, 1995). The principle of this method is to select the alternative with the shortest distance from the ideal solution and the longest distance from the anti-ideal solution. The TOPSIS method is defined below.

Step 1: The decision-making matrix is formed. This matrix evaluates m alternatives (A_1, A_2, …, A_m) against n criteria (C_1, C_2, …, C_n), as shown in Equation (15).

D = [\begin{matrix} X_{11} & \dots & X_{1 n} \\ ⋮ & ⋱ & ⋮ \\ X_{m 1} & \dots & X_{m n} \end{matrix}]

(15)

where X_ij is the value of the ith alternative on the jth criterion.

Step 2: The decision-making matrix is normalized using Equation (16):

r_{i j} = \frac{x_{i j}}{\sqrt{\sum_{i = 1}^{m} x_{i j}^{2}}}, \forall i, j

(16)

Step 3: The weights of the criteria are determined. In this study, Shannon entropy is applied to weight the validation indices in Phase I, and BWM is applied to weight the RMD criteria in Phase III.

Step 4: The weighted normalized decision matrix V = [v_ij] is formed by multiplying each column of the normalized matrix R by the weight w_j of the corresponding criterion, using Equation (17).

v_{i j} = w_{j} \, r_{i j}, \forall i, j

(17)

Step 5: The ideal and anti-ideal solutions are determined using Equations (18) and (19), where C_b denotes the set of benefit criteria (higher is better) and C_c the set of cost criteria (lower is better). In this study, TOPSIS is used to prioritize the candidate numbers of clusters (Phase I) and the clusters themselves (Phase III).

A^{*} = \{v_{1}^{*}, \dots, v_{n}^{*}\} = \{(\max_{i} v_{i j} ∣ j \in C_{b}), (\min_{i} v_{i j} ∣ j \in C_{c})\}

(18)

A^{-} = \{v_{1}^{-}, \dots, v_{n}^{-}\} = \{(\min_{i} v_{i j} ∣ j \in C_{b}), (\max_{i} v_{i j} ∣ j \in C_{c})\}

(19)

Next, Equation (20) is used to measure the Euclidean distance of each alternative from the ideal (A^{*}) and anti-ideal (A^{-}) solutions, denoted s_{i}^{*} and s_{i}^{-}, respectively.

s_{i}^{*} = \sqrt{\sum_{j = 1}^{n} {(v_{i j} - v_{j}^{*})}^{2}}, \forall i

s_{i}^{-} = \sqrt{\sum_{j = 1}^{n} {(v_{i j} - v_{j}^{-})}^{2}}, \forall i

(20)

Consequently, the closeness ratio is calculated using Equation (21).

R C_{i}^{*} = \frac{s_{i}^{-}}{s_{i}^{-} + s_{i}^{*}}

(21)

The most appropriate alternative has the highest value of RC_{i}^{*}. The TOPSIS method has been used in many studies. As an illustration, TOPSIS and Shannon entropy were employed to find the optimal technology among the waste-to-energy technological choices using the waste stream of Lagos, Nigeria (Alao et al., 2020). Moreover, Wang applied TOPSIS to evaluate 22 symbiotic technologies in the iron and steel industrial network (Wang et al., 2020).

3.3.3. BWM

BWM is an efficient technique introduced by Rezaei (2015) to solve MCDM problems. This technique determines the weights of criteria (Rezaei, 2015). This algorithm is defined in the following steps.

1.: A set of decision criteria is chosen.
2.: Focus groups or experts determine the best (most important) and the worst (least important) criteria. No pairwise comparison is made at this step.
3.: Focus groups or experts rate their preference for the best criterion over each of the other criteria on a scale from 1 to 9, giving the Best-to-Others vector ( $A_{B} = (a_{B1}, a_{B2}, \dots, a_{Bn})$ ).
4.: Focus groups or experts rate their preference for each criterion over the worst criterion on a scale from 1 to 9, giving the Others-to-Worst vector ( $A_{W} = (a_{1W}, a_{2W}, \dots, a_{nW})$ ).
5.: The optimal weights are found by solving the nonlinear programming (NLP) model in Formula (22):

$\begin{matrix} \min \ ϵ \\ \text{subject to:} \\ |\frac{w_{B}}{w_{j}} - a_{B j}| \leq ϵ, \forall j, \\ |\frac{w_{j}}{w_{W}} - a_{j W}| \leq ϵ, \forall j, \\ \sum_{j} w_{j} = 1, w_{j} \geq 0, \forall j \end{matrix}$

(22)
6.: The consistency ratio (CR) of the comparisons is computed using Equation (23). In this paper, CR values below 0.2 are considered acceptable.

$C R = \frac{ϵ^{*}}{C I}$

(23)

Here, CI is the consistency index, which depends on a_BW, the preference of the best criterion over the worst criterion (Guo & Zhao, 2017). BWM has recently been applied in many studies. For example, BWM was applied to extract the weights of water resource security indicators (Tu et al., 2020). Çalık used BWM to determine the weights of social media platforms used by travel agencies to reach customers (Çalık et al., 2020), and Khalilzadeh employed BWM in the banking industry to identify the weights of risks that affect losses in banking projects (Khalilzadeh et al., 2020).
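Formula (22) is a nonlinear program, normally solved with an optimization package. To keep the example dependent only on NumPy, the sketch below approximates it for the three RMD criteria by brute-force search over a grid of weight vectors on the simplex, returning the vector that minimizes the largest deviation ε. The grid search, step size, and function name are assumptions for illustration, not the solver used by the authors; a finer step trades run time for accuracy.

```python
import numpy as np


def bwm_grid(a_best, a_worst, best, worst, step=0.002):
    """Approximate the nonlinear BWM model (22) for three criteria by grid search.

    a_best[j]  = preference of the best criterion over criterion j (A_B).
    a_worst[j] = preference of criterion j over the worst criterion (A_W).
    Returns (weights, epsilon) for the grid point with the smallest epsilon.
    """
    g = np.arange(step, 1.0, step)
    w1, w2 = np.meshgrid(g, g, indexing="ij")
    w3 = 1.0 - w1 - w2
    mask = w3 > 1e-9  # keep interior points of the simplex (all weights > 0)
    W = np.stack([w1[mask], w2[mask], w3[mask]], axis=1)  # candidate weight vectors
    eps = np.zeros(len(W))
    for j in range(3):
        # Largest violation of |w_B / w_j - a_Bj| and |w_j / w_W - a_jW| in (22).
        eps = np.maximum(eps, np.abs(W[:, best] / W[:, j] - a_best[j]))
        eps = np.maximum(eps, np.abs(W[:, j] / W[:, worst] - a_worst[j]))
    i = int(eps.argmin())
    return W[i], float(eps[i])
```

With criteria ordered (R, M, D), M as best and R as worst, a fully consistent set of comparisons (M rated 4 over R and 2 over D, and D rated 2 over R) yields ε close to zero and weights close to 1/7, 4/7, and 2/7.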

3.3.4. Customer Lifetime Value

Knowing customers' characteristics and their differences is a crucial issue in an organization (Khajvand & Tarokh, 2011). Customer Lifetime Value (CLV), a concept introduced by Kotler, represents the total expected revenue a business can generate from a customer over the duration of their relationship (Kotler, 1973). CLV is defined as the revenue that companies can achieve from a single customer or group of customers over time (Kasprova, 2020). CLV evaluates the value of specific segments of a market's customers; therefore, it is a primary tool for demonstrating the merit of a segmentation strategy (Kahreh et al., 2017). By computing the CLV of each customer, managers can improve customer segmentation and the allocation of marketing resources (Kumar et al., 2015). Customers with long relationships are a profitable segment for companies. As a tool for managing customers as company assets, CLV is an efficient method for assessing the relationships between customers and firms. Retaining customers increases the profit they generate over time (Reichheld & Sasser, 1996). CLV has three beneficial consequences. First, it identifies loyal and potential customers. Second, it helps companies identify different customers with hidden patterns. Third, it helps managers propose suitable strategies for each segment. One of the most important outcomes of CLV is that companies can predict the future of their valuable customers. Based on this information, they can both make proper decisions and present appropriate strategies (Gurau & Ranchhod, 2002). CLV has various models, e.g., the scoring model, probability model, and econometric model (Chang et al., 2009). In this study, the scoring model, which is based on customers' purchase attributes (RMD), is applied. CLV is calculated using Equation (23):

C L V = N R_{C_{i}} \times W_{R} + N D_{C_{i}} \times W_{D} + N M_{C_{i}} \times W_{M}

where NR_{C_i}, ND_{C_i}, and NM_{C_i} are the normalized recency, duration, and monetary values of cluster C_i, and W_R, W_D, and W_M are the corresponding weights (W_M being the monetary weight).

CLV has been used in the food industry to evaluate customer loyalty (Matz & Hermawan, 2020). Furthermore, CLV has been applied to rank bank customers and determine the value of each cluster (Khajvand & Tarokh, 2011).
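The CLV scoring equation can be sketched as follows, assuming min-max normalization of the cluster means to [0, 1] (the section does not state the normalization) and assuming the weights come from BWM. Because recency defined as days since the last stay is better when smaller, the hypothetical `cost_columns` argument lets such a column be inverted before weighting; whether this is needed depends on how R is coded. Columns are ordered R, M, D here, so the weights must follow the same order.

```python
import numpy as np


def clv_scores(cluster_means, weights, cost_columns=()):
    """Weighted sum of normalised R, M, D cluster means (scoring-model CLV).

    cluster_means: one row per cluster, columns ordered R, M, D.
    weights: (W_R, W_M, W_D) in the same column order, e.g. from BWM.
    cost_columns: indices of columns where a smaller raw value is better.
    """
    M = np.asarray(cluster_means, dtype=float)
    lo, hi = M.min(axis=0), M.max(axis=0)
    N = (M - lo) / (hi - lo)  # assumed min-max normalisation; columns must not be constant
    for j in cost_columns:
        N[:, j] = 1.0 - N[:, j]  # e.g. days since last stay: fewer days scores higher
    return N @ np.asarray(weights, dtype=float)
```

Sorting clusters by the returned scores gives a CLV-based ranking that can be compared with the TOPSIS ranking of the same clusters.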

4. Research Methodology

This research attempts to segment 1171 customers of a hotel. The three phases of this study are:

1.: K determination, after data preparation.
2.: Customer classification and rule extraction.
3.: Cluster evaluation based on CLV and decision-making methods (TOPSIS and BWM).

These processes are depicted in Figure 2.

Phase I

In phase I, the optimal number of clusters (K) is determined based on the below steps.

1.: The clusters are evaluated based on cluster quality indices, including silhouette analysis (Equations (2)–(4)), Calinski–Harabasz (Equation (5)), and Davies–Bouldin (Equations (6)–(10)).
2.: These indices are considered as decision criteria, and their weights are extracted based on Shannon entropy (Equations (11)–(14)).
3.: Different values of K are considered as decision alternatives, which are prioritized by TOPSIS (Equations (15)–(21)).

Phase II

1.: Customers are clustered with the K-means algorithm on their RMD attributes. The optimal K obtained in Phase I is used as the number of clusters, and Euclidean distance (Equation (1)) is used as the distance measure.
2.: The Apriori association rule algorithm is applied to each cluster to extract rules.
3.: Tailored strategies are developed for each cluster based on its characteristics.
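The Phase II clustering step, together with the index sweep of Phase I, might look as follows with scikit-learn. `KMeans` assigns points by squared Euclidean distance, which matches the distance choice above. Standardizing R, M, and D before clustering is an assumption, since the section does not say whether the attributes were scaled. Because the hotel data are not public, the example runs on synthetic RMD values drawn within the ranges of Table 4.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler


def sweep_k(rmd: np.ndarray, k_values=range(2, 11), seed: int = 0) -> dict:
    """Run K-means for each K; return (silhouette, Davies-Bouldin, Calinski-Harabasz)."""
    x = StandardScaler().fit_transform(rmd)   # assumption: z-score the R, M, D columns
    results = {}
    for k in k_values:
        # KMeans minimizes within-cluster squared Euclidean distance.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(x)
        results[k] = (
            silhouette_score(x, labels),
            davies_bouldin_score(x, labels),
            calinski_harabasz_score(x, labels),
        )
    return results


# Synthetic stand-in data (hypothetical): columns R, M, D within Table 4's ranges.
rng = np.random.default_rng(42)
fake_rmd = np.column_stack([
    rng.integers(10, 366, size=300),
    rng.normal(1250, 500, size=300).clip(400, 4800),
    rng.integers(1, 13, size=300),
])
table = sweep_k(fake_rmd)
for k, (sil, db, ch) in table.items():
    print(f"K={k}: silhouette={sil:.2f}, DB={db:.2f}, CH={ch:.1f}")
```

The resulting rows have the same shape as Table 3 and could feed the entropy and TOPSIS steps of Phase I.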

Phase III

1.: The RMD attributes serve as criteria for assessing the clusters, and their weights are determined with BWM (Equations (21) and (22)).
2.: Clusters are prioritized based on TOPSIS (Equations (15)–(21)).
3.: CLV is computed for each cluster (Equation (23)).
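BWM weights can be obtained by solving a small optimization problem. The sketch below uses the linear formulation of BWM, a later simplification of the original nonlinear model, and solves it with SciPy's `linprog`. The comparison vectors are hypothetical, because the pairwise judgments behind the reported weights are not given in this section. Index 0 is R, index 1 is M, and index 2 is D. The hypothetical judgments take M as the best criterion and D as the worst.

```python
import numpy as np
from scipy.optimize import linprog


def bwm_weights(best_to_others, others_to_worst, best: int, worst: int):
    """Linear BWM: minimize xi subject to |w_B - a_Bj*w_j| <= xi,
    |w_j - a_jW*w_W| <= xi, sum(w) = 1, w >= 0. Returns (weights, xi)."""
    n = len(best_to_others)
    a_b = np.asarray(best_to_others, dtype=float)
    a_w = np.asarray(others_to_worst, dtype=float)
    rows = []
    for j in range(n):
        for sign in (1.0, -1.0):          # each |expr| <= xi becomes two inequalities
            r = np.zeros(n + 1)           # variables: w_1..w_n, xi
            r[best] += sign
            r[j] -= sign * a_b[j]
            r[n] = -1.0
            rows.append(r)
            r = np.zeros(n + 1)
            r[j] += sign
            r[worst] -= sign * a_w[j]
            r[n] = -1.0
            rows.append(r)
    c = np.zeros(n + 1)
    c[n] = 1.0                            # objective: minimize xi
    a_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=a_eq, b_eq=[1.0], bounds=[(0, None)] * (n + 1),
                  method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n]


# Hypothetical judgments for (R, M, D): M is best, D is worst.
weights, xi = bwm_weights(best_to_others=[3, 1, 9], others_to_worst=[4, 9, 1],
                          best=1, worst=2)
print(weights.round(3), round(xi, 3))
```

These hypothetical judgments yield weights of about 0.243 (R), 0.686 (M), and 0.071 (D), with ξ ≈ 0.043. ξ measures how inconsistent the judgments are; values close to zero indicate near-consistent comparisons.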

5. Case Study and Results

Tehran, the capital of Iran, has a population of approximately 8.737 million (city population statistics, 2016) and 101 hotels (www.hoteldari.com accessed on 10 May 2016). As a political center, Tehran attracts many tourists who travel for business and medical treatment. This study evaluates the customer profiles of the 3-star Hally Hotel, located in downtown Tehran. The research population comprised 1121 customers who stayed at the hotel at least once between 18 August 2017 and 18 August 2018; after data preparation, 1107 customers remained. Of these, 34.9% were female (f = 386) and 65.0% were male (f = 720). The largest age groups were 31–40 (26.7%, f = 296) and 41–60 (19.4%, f = 215). The most common nationalities were Iraqi (22.89%) and Chinese (7.33%). In total, 61.12% (f = 676) traveled alone and 31.55% (f = 349) traveled with others. Table 2 summarizes the demographic profiles of the hotel customers.

5.1. Clustering Model

After data preparation, three validity indices (Calinski–Harabasz, Davies–Bouldin, and silhouette) were computed for K = 2 to 10, and their weights were determined using Shannon entropy (Table 3). Davies–Bouldin received the highest weight (0.36). The customers were then clustered on the RMD indicators defined in the previous sections; the descriptive statistics of the RMD attributes are presented in Table 4. Finally, the mean of each RMD attribute was computed for each cluster and compared with the overall mean; the resulting scores are presented in Table 5.

5.2. Clustering Analysis

The overall averages of the RMD attributes were R = 126.854, D = 3.46, and M = 1252.34. The first cluster scored R↑M↓D↓ and was therefore labeled ‘new customers’. Customers in this cluster stayed at the hotel for one night and traveled alone. They were mostly men (68%). The most common nationality was Iraqi (24%), and the most common age group was 31–40 (26%). Most were freelancers traveling for tourism. The hotel should encourage these customers to stay longer and spend more. The second cluster, loyal customers, scored R↑M↑D↑. The most common nationality in this cluster was Iraqi (45%), and the most common age group was 41–50 (41%). Loyal customers most often traveled in pairs, and the cluster was evenly split between men and women. Their most common length of stay was seven days, and they traveled for tourism. The third cluster, collective buying customers, scored R↑M↓D↑. These customers most often traveled in pairs and were mostly men (69%), with 31–40 (28%) as the most common age group. Chinese guests formed the largest nationality group (17.3%), and most customers in this cluster were freelancers traveling for tourism. Collective buying customers spent three days at the hotel. The fourth cluster scored R↓M↑D↓ and was labeled ‘potential customers’. Their most common age group was 31–40 (38%). Chinese and Iraqi guests were equally represented (15.3% each), and they typically stayed one night. They were mainly employees traveling for work, with an equal gender split. These customers mostly traveled alone and spent large amounts of money, so the hotel should offer special services to encourage longer stays. The fifth cluster had R = 114.01, M = 806.17, and D = 3.57, giving a score of R↓M↓D↑; it was labeled ‘business customers’. The most common nationality in this cluster was Iraqi (23%), the most common age group was 31–40 (28%), and most were men (65%). They were mainly freelancers who traveled alone for office work and typically stayed at the hotel for two nights. The sixth cluster, lost customers, scored R↓M↓D↓. The most common nationality was Iraqi (18.5%), the most common age group was 21–30 (27%), and they typically stayed one night. Low-price offers should be used to attract this group. The cluster profiles are summarized in Table 6.
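The labeling rule used above is simple enough to check mechanically: an attribute gets an up arrow when the cluster mean exceeds the overall mean, and a down arrow otherwise. This sketch uses the cluster means from Table 5 and the overall averages quoted at the start of this subsection. Treating a tie as ↓ is an assumption.

```python
# Cluster means (R, M, D) from Table 5 and the overall averages quoted in Section 5.2.
CLUSTERS = {
    1: (211.60, 414.10, 1.00),
    2: (202.58, 3221.50, 8.45),
    3: (185.40, 1214.61, 3.66),
    4: (14.00, 1314.88, 2.76),
    5: (114.01, 806.17, 3.57),
    6: (33.75, 542.76, 1.32),
}
OVERALL = (126.854, 1252.34, 3.46)


def rmd_pattern(means, overall) -> str:
    """Arrow up if the cluster mean exceeds the overall mean, else arrow down (ties -> down)."""
    arrows = ["↑" if m > o else "↓" for m, o in zip(means, overall)]
    return "".join(f"{name}{a}" for name, a in zip("RMD", arrows))


patterns = {c: rmd_pattern(v, OVERALL) for c, v in CLUSTERS.items()}
print(patterns)
```

The generated patterns match the RMD scores listed in Table 5 for all six clusters.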

5.3. Association Rule Results

Association rule mining was applied to each cluster to find relationships among demographic variables. For instance, in the first cluster (new customers), male visitors were Iraqi freelancers with 94.5% confidence. Another rule for the same cluster links a tourism travel intention to a tourist role (93.5% confidence). The rules for each cluster are shown in Table 7.
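Rules such as [male → Iraqi, freelance] come from Apriori. The algorithm first finds frequent itemsets and then keeps the rules whose confidence exceeds a threshold. The sketch below is a minimal pure-Python Apriori run on a hypothetical four-guest toy dataset. The thresholds are illustrative; the study's actual minimum support and confidence settings are not stated in this section.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, grown level by level."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s, transactions) for s in level})
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for confident rules."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, sup, conf


# Hypothetical toy data: each set describes one guest.
guests = [frozenset(g) for g in (
    {"male", "Iraqi", "freelance"},
    {"male", "Iraqi", "freelance"},
    {"male", "Chinese", "employee"},
    {"female", "Iraqi", "freelance"},
)]
freq = apriori(guests, min_support=0.5)
found = list(rules(freq, min_confidence=0.6))
for lhs, rhs, sup, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

On this toy data, [male → Iraqi, freelance] has 50% support and about 67% confidence. Support is the share of guests matching the whole rule, and confidence is that share divided by the share matching the antecedent.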

5.4. Comparison and Evaluation of Clusters

Evaluating customer segments is essential for identifying profitable groups. This study ranks the clusters in two ways: with a multi-criteria decision-making method (TOPSIS) and by CLV. The RMD weights derived with BWM were R = 0.25, M = 0.68, and D = 0.06, so monetary value carries the most weight. Under TOPSIS, loyal customers had the highest score and were the most valuable group (Table 8). The CLV evaluation likewise identified the second cluster (loyal customers, 24 customers or 2.16% of the total) as the most valuable group and the sixth cluster (lost customers) as the least valuable. The ranking of the remaining clusters is given in Table 8.
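The TOPSIS ranking of the clusters can be sketched with NumPy. This is an illustration, not the authors' code. It applies the BWM weights above to the cluster means of Table 5 and uses vector normalization. It treats all three attributes as benefit criteria, with a higher R counting as better, which matches the R↑ labels of the most valuable clusters. The exact decision matrix the authors used is not shown here, so the closeness values need not match Table 8.

```python
import numpy as np

NAMES = ["new", "loyal", "collective buying", "potential", "business", "lost"]
# Table 5 cluster means, columns in the order R, M, D; all treated as benefit criteria.
X = np.array([
    [211.60, 414.10, 1.00],
    [202.58, 3221.50, 8.45],
    [185.40, 1214.61, 3.66],
    [14.00, 1314.88, 2.76],
    [114.01, 806.17, 3.57],
    [33.75, 542.76, 1.32],
])
W = np.array([0.25, 0.68, 0.06])  # BWM weights for R, M, D (Section 5.4)

v = W * X / np.linalg.norm(X, axis=0)        # weighted, vector-normalized matrix
ideal, anti = v.max(axis=0), v.min(axis=0)   # benefit criteria only
d_pos = np.linalg.norm(v - ideal, axis=1)    # distance to the ideal solution
d_neg = np.linalg.norm(v - anti, axis=1)     # distance to the anti-ideal solution
closeness = d_neg / (d_pos + d_neg)
order = [NAMES[i] for i in np.argsort(-closeness)]
print(order, closeness.round(3))
```

Under these assumptions, the loyal cluster ranks first and the lost cluster ranks last, which agrees with the ordering described above.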

6. Discussion and Implications

This study showcases a hybrid data mining and multi-criteria decision-making (MCDM) approach to enhance customer relationship management (CRM) in hospitality, using Hally Hotel in Tehran, Iran, as a case study. By extending the RFM model to RMD (Recency, Monetary, Duration) and integrating K-means clustering with TOPSIS and CLV analysis, we identified six distinct customer segments—new, loyal, collective buying, potential, business, and lost—offering actionable insights for an emerging economy facing unique market dynamics.

6.1. Theoretical Contributions

Theoretically, this study refines customer segmentation by incorporating duration into the RMD framework, addressing a limitation of RFM’s transaction-centric focus (McCarty & Hastak, 2007). In hospitality, length of stay drives revenue more than visit frequency does, unlike in retail or banking. RMD therefore provides a better lens for valuing guests such as loyal customers (R↑M↑D↑). This builds on prior models such as LRFMP (Peker et al., 2017) but emphasizes the hospitality-specific relevance of duration. Moreover, combining K-means with association rules and MCDM tools (TOPSIS, BWM) is a methodological advance. Dursun and Caber (2016) used RFM with clustering. By contrast, our use of three validation indices (silhouette, Davies–Bouldin, Calinski–Harabasz) together with decision-making prioritization is novel. It bridges unsupervised learning with strategic ranking, an integration that is rare in hospitality research.

6.2. Practical Implications

The practical implications of this study are highly relevant for hotel service managers, marketing teams, and CRM professionals. The segmentation framework identifies actionable customer profiles. Loyal customers merit investment in retention programs (e.g., personalized rewards, extended-stay perks). Potential customers make short visits but spend heavily, which suggests a need for conversion strategies (e.g., stay extension offers, room upgrades). Marketing managers can also use association rule insights (e.g., demographic and behavioral patterns) to tailor promotions to specific customer archetypes. For instance, they could target Chinese collective buyers with package deals or engage freelance Iraqi males with flexible pricing and tourism bundles. By ranking segments with CLV and TOPSIS, managers can direct resources toward the most profitable customer types. This implementation-ready framework aligns with previous calls for integrating customer analytics into CRM strategy (Kumar et al., 2015; Mosavi & Afsar, 2018). It goes further by operationalizing these insights through a quantitative decision-making lens. Moreover, the approach suits not only large hotel chains but also small and mid-sized hotels in emerging economies, where marketing budgets are constrained and targeting must be precise. Future applications could package this model as dashboard tools or CRM plug-ins, making it usable by non-technical managers.

6.3. Contextual Insights

Iran’s geopolitical context, marked by U.S. sanctions that limit Western tourism, shapes the dominance of Iraqi (22.89%) and Chinese (7.33%) guests. Iraqi business customers (cluster 5) reflect commerce-driven travel. Chinese collective buying customers (cluster 3) highlight group tourism, which is tied to Iran’s trade ties with China (e.g., oil and mineral exports). This contrasts with tourism-heavy markets such as Turkey and underscores the need for adaptive CRM in sanctioned economies. As Dursun and Caber (2016) argued, robust CRM amplifies data-driven insights, a critical advantage when guest pools are niche yet diverse. More broadly, this RMD framework offers a scalable model for hospitality firms to navigate competition. It merges analytics with decision-making to strengthen customer-centric strategies in resource-constrained settings.

7. Conclusions

This study introduces a novel Recency, Monetary, and Duration (RMD) model to segment hotel customers, validated through a hybrid approach combining K-means clustering, association rule mining, and multi-criteria decision-making (MCDM) techniques (TOPSIS, BWM). Applied to 1107 guests of Hally Hotel in Tehran, Iran, between August 2017 and August 2018, the framework identified six distinct clusters—new, loyal, collective buying, potential, business, and lost—prioritized by Customer Lifetime Value (CLV) and TOPSIS rankings. Loyal customers emerged as the most valuable segment (CLV = 0.59), while lost customers ranked lowest (CLV = 0.01), offering clear guidance for targeted marketing strategies. By extending the RFM model with duration and integrating advanced analytics, this study provides a robust tool for hospitality firms to enhance CRM, optimize resource allocation, and improve retention in competitive markets. Limitations include the absence of frequency data due to privacy constraints, restricting comparisons with traditional RFM models. The one-year dataset precluded seasonal analysis, and the focus on foreign guests excluded domestic customer insights. Future research could address these gaps by incorporating longitudinal data to capture seasonal trends (Ahmadi et al., 2025; Irani & Metsis, 2024), adding variables like booking channels or travel modes for richer segmentation, and extending the model to other hospitality contexts or emerging economies. Additionally, future studies could explore the use of more adaptive and dynamic clustering models to better handle temporal customer behavior and high-dimensional datasets (Deldadehasl et al., 2025). Despite these constraints, the RMD framework stands as a scalable, data-driven solution, bridging analytics and decision-making to elevate customer-centric strategies in the hospitality industry.

Author Contributions

Conceptualization, M.D.; methodology, M.D.; software, M.D. and H.H.K.; formal analysis, M.D., P.H.N. and H.H.K.; data curation, H.H.K. and P.H.N.; writing—original draft preparation, M.D. and P.H.N.; writing—review and editing, H.H.K. and P.H.N.; visualization, H.H.K. and P.H.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Department of Industrial Management, University of Tehran, Tehran, Iran.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to restrictions set by the private entity that provided the data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Abirami, M., & Pattabiraman, V. (2016). Data mining approach for intelligent customer behavior analysis for a retail store. In The 3rd international symposium on big data and cloud computing challenges (pp. 283–291). Springer.
Ahmadi, H., Emdadi Mahdimahalleh, S., Farahat, A., & Saffari, B. (2025). Unsupervised time-series signal analysis with autoencoders and vision transformers: A review of architectures and applications. Engineering Applications of Artificial Intelligence, 127, 107629.
Akter, J., Roy, A., Rahman, S., Mohona, S., & Ara, J. (2025). Artificial Intelligence-Driven Customer Lifetime Value (CLV) forecasting: Integrating RFM analysis with machine learning for strategic customer retention. Journal of Computer Science and Technology Studies, 7(1), 249–257.
Alao, M. A., Ayodele, T. R., Ogunjuyigbe, A. S. O., & Popoola, O. M. (2020). Multi-criteria decision-based waste to energy technology selection using entropy-weighted TOPSIS technique: The case study of Lagos, Nigeria. Energy, 201, 117675.
Ansari, A., & Riasi, A. (2016). Customer clustering using a combination of fuzzy c-means and genetic algorithms. International Journal of Business and Management, 11(7), 59.
Baecke, P., & Poel, D. (2011). Data augmentation by predicting spending pleasure using commercially available external data. Journal of Intelligent Information Systems, 36(3), 367–383.
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics—Theory and Methods, 3(1), 1–27.
Chang, H. H., Wang, Y. H., & Yang, W. Y. (2009). The impact of e-service quality, customer satisfaction and loyalty on e-marketing: Moderating effect of perceived value. Total Quality Management & Business Excellence, 20(4), 423.
Chen, D., Sain, S. L., & Guo, K. (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing & Customer Strategy Management, 19(3), 197–208.
Cheng, C. H., & Chen, Y. S. (2009). Classifying the segmentation of customer value by the RFM model and RS theory. Expert Systems with Applications, 36(3), 4176–4184.
Çalık, A., Sain, S. L., & Guo, K. (2020). Evaluation of social media platforms using Best Worst Method and Fuzzy VIKOR Methods: A case study of travel agency. Iranian Journal of Management Studies, 19(3), 197–208.
Davies, D., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224–227.
Deldadehasl, M., Jafari, M., & Sayeh, M. R. (2025). Dynamic classification using the adaptive competitive algorithm for breast cancer detection. Journal of Data Analysis and Information Processing, 13(2), 101–115.
Dimitrovski, D., & Todorovic, A. (2015). Clustering wellness tourists in spa environment. Tourism Management Perspectives, 16, 259–265. Available online: https://www.sciencedirect.com/science/article/pii/S2211973615300040 (accessed on 2 March 2025).
Doğan, O., Ayçin, E., & Bulut, Z. A. (2018). Customer segmentation by using RFM model and clustering methods: A case study in retail industry. International Journal of Contemporary Economics and Administrative Sciences, 8(1), 1–19.
Dursun, A., & Caber, M. (2016). Using data mining techniques for profiling profitable hotel customers: An application of RFM analysis. Tourism Management Perspectives, 18, 153–160.
Erevelles, S., Fukawa, N., & Swayne, L. (2016). Big data consumer analytics and the transformation of marketing. Journal of Business Research, 69(2), 897–904.
Ganjali, M., & Teimourpour, B. (2016). Identify valuable customers of Taavon Insurance in field of life insurance with data mining approach. UCT Journal of Research in Science, Engineering and Technology, 4(1), 1–10.
George, G., Haas, M. R., & Pentland, A. (2014). Big data and management. Academy of Management Journal, 57(2), 321–326.
Guo, S., & Zhao, H. (2017). Fuzzy best-worst multi-criteria decision-making method and its applications. Knowledge-Based Systems, 121, 23–31.
Gurau, C., & Ranchhod, A. (2002). How to calculate the value of a customer—Measuring customer satisfaction: A platform for calculating, predicting and increasing customer profitability. Journal of Targeting, Measurement & Analysis for Marketing, 10(3), 203.
Hanafizadeh, P., & Mirzazadeh, M. (2011). Visualizing market segmentation using self-organizing maps and the Fuzzy Delphi method–ADSL market of a telecommunication company. Expert Systems with Applications, 38(1), 198–205.
Ho, C.-I., & Lee, Y.-L. (2007). The development of an e-travel service quality scale. Tourism Management, 28(6), 1434–1449.
Hosseini, S. M., Maleki, A., & Gholamian, M. R. (2010). Cluster analysis using a data mining approach to develop CRM methodology to assess customer loyalty. Expert Systems with Applications, 37, 5259–5264.
Hu, Y. H., & Yeh, T. W. (2014). Discovering valuable frequent patterns based on RFM analysis without customer identification information. Knowledge-Based Systems, 61, 76–88.
Irani, H., & Metsis, V. (2024). Enhancing time-series prediction with temporal context modeling: A Bayesian and deep learning synergy. The International FLAIRS Conference Proceedings, 37(1).
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666. Available online: https://www.sciencedirect.com/science/article/pii/S0167865509002323 (accessed on 24 February 2025).
Kahan, R. (1998). Using database marketing techniques to enhance your one-to-one marketing initiatives. Journal of Consumer Marketing, 15(5), 491–493.
Kahreh, Z., Shirmohammadi, A., & Kahreh, M. (2017). Explanatory study towards analysis of the relationship between total quality management and knowledge management. Procedia—Social and Behavioral Sciences, 109, 600–604.
Kasprova, A. (2020). Customer lifetime value for retail based on transactional and loyalty card data. Ukrainian Catholic Institution.
Khajvand, M., & Tarokh, M. J. (2011). Estimating customer future value of different customer segments based on adapted RFM model in retail banking context. Procedia Computer Science, 3, 1327–1332.
Khalilzadeh, M., Katoueizadeh, L., & Zavadskas, E. K. (2020). Risk identification and prioritization in banking projects of payment service provider companies: An empirical study. Frontiers of Business Research in China, 14(1), 1–27.
Kotler, P. (1973). Atmospherics as a marketing tool. Journal of Retailing, 49(4), 48–64.
Kumar, V., Bhagwat, Y., & Zhang, X. (2015). Regaining ‘lost’ customers: The predictive power of first-lifetime behavior, the reason for defection, and the nature of the win-back offer. Journal of Marketing, 79(4), 34–55.
Laursen, G. H. (2011). Business analytics for sales and marketing managers: How to compete in the information age. John Wiley & Sons.
Law, R., Qi, S., & Buhalis, D. (2010). Progress in tourism management: A review of website evaluation in tourism research. Tourism Management, 31(3), 297–313.
Liao, S. H., Chen, Y. J., & Deng, M. Y. (2010). Mining customer knowledge for tourism new product development and customer relationship management. Expert Systems with Applications, 37(6), 4212–4223.
Loh, W. Y., & Shih, Y. S. (1997). Split selection methods for classification trees. Statistica Sinica, 7(4), 815–840.
Mahdiraji, H. A., Zavadskas, E. K., Kazeminia, A., & Kamardi, A. A. (2019). Marketing strategies evaluation based on big data analysis: A CLUSTERING-MCDM approach. Economic Research-Ekonomska Istraživanja, 32(1), 2882–2898.
Matz, A., & Hermawan, A. T. (2020). Customer loyalty clustering model using K-Means algorithm with LRIFMQ parameters. Inform, 5(2), 54–61.
McCarty, J. A., & Hastak, M. (2007). Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression. Journal of Business Research, 60(6), 656–662.
Mesforoush, A., & Tarokh, M. J. (2013). Customer profitability segmentation for SMEs case study: Network equipment company. International Journal of Research in Industrial Engineering, 2(1), 30–44.
Miglautsch, J. (2000). Thoughts on RFM scoring. Journal of Database Marketing & Customer Strategy Management, 8(1), 67–72.
Mohammadian, M., & Makhani, I. (2016). RFM-Based customer segmentation as an elaborative analytical tool for enriching the creation of sales and trade marketing strategies. International Academic Journal of Accounting and Financial Management, 3(6), 21–35.
Mohammadrezapour, O., Kisi, O., & Pourahmad, F. (2020). Fuzzy c-means and K-means clustering with genetic algorithm for identification of homogeneous regions of groundwater quality. Neural Computing and Applications, 32(8), 3763–3775.
Mosavi, A. B., & Afsar, A. (2018). Customer value analysis in banks using data mining and fuzzy analytic hierarchy processes. International Journal of Information Technology & Decision Making, 17(3), 819–840.
Ostovare, M., & Shahraki, M. R. (2019). Evaluation of hotel websites using the multicriteria analysis of PROMETHEE and GAIA: Evidence from the five-star hotels of Mashhad. Tourism Management Perspectives, 30, 107–116.
Peker, S., Kocyigit, A., & Eren, P. E. (2017). LRFMP model for customer segmentation in the grocery retail industry: A case study. Marketing Intelligence & Planning, 35(4), 544–559.
Rajput, L., & Singh, S. N. (2023, January 19–20). Customer segmentation of e-commerce data using K-Means clustering algorithm. 2023 13th international conference on cloud computing, data science & engineering (confluence) (pp. 659–664), Noida, India.
Ramasamy, S., & Nirmala, K. (2020). Disease prediction in data mining using association rule mining and keyword-based clustering algorithms. International Journal of Computers and Applications, 42(1), 1–8.
Reichheld, F. F., & Sasser, J. (1996). Zero defections: Quality comes to services. Harvard Business Review, 68(5), 105–111.
Rezaei, J. (2015). Best-worst multi-criteria decision-making method. Omega (Westport), 53, 49–57.
Ristoski, P., & Paulheim, H. (2016). Semantic Web in data mining and knowledge discovery: A comprehensive survey. Journal of Web Semantics, 36, 1–22. Available online: https://www.sciencedirect.com/science/article/pii/S1570826816000020 (accessed on 27 February 2025).
Rousseeuw, P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. Available online: https://www.sciencedirect.com/science/article/pii/0377042787901257 (accessed on 27 February 2025).
Samarasinghe, G. D., & Samarasinghe, D. S. R. (2013). Green decisions: Consumers’ environmental beliefs and green purchasing behaviour in Sri Lankan context. International Journal of Innovation and Sustainable Development, 7(2), 172–184.
Sarvari, P. A., Ustundag, A., & Takci, H. (2016). Performance evaluation of different customer segmentation approaches based on RFM and demographics analysis. Kybernetes, 45(7), 1129–1157.
Sharifi, S. (2025). Enhancing kidney transplantation through multi-agent kidney exchange programs: A comprehensive review and optimization models. arXiv, arXiv:2502.07819.
Srihadi, T. F., Sukandar, D., & Soehadi, A. W. (2016). Segmentation of the tourism market for Jakarta: Classification of foreign visitors’ lifestyle typologies. Tourism Management Perspectives, 19, 32–39.
Syakur, M. A., Khotimah, B. K., Rochman, E. M. S., & Satoto, B. D. (2018). Integration K-means clustering method and elbow method for identification of the best customer profile cluster. IOP Conference Series: Materials Science and Engineering, 336(1), 012017.
Tabianan, K., Velu, S., & Ravi, V. (2022). K-Means clustering approach for intelligent customer segmentation using customer purchase behavior data. Sustainability, 14(12), 7243.
Tu, Y., Chen, K., Wang, H., & Li, Z. (2020). Regional water resources security evaluation based on a hybrid fuzzy BWM-TOPSIS method. International Journal of Environmental Research and Public Health, 17(14), 4987.
Wang, S., Sun, L., & Yu, Y. (2024). A dynamic customer segmentation approach by combining LRFMS and multivariate time series clustering. Scientific Reports, 14, 17491.
Wang, Y., Wen, Z., & Li, H. (2020). Symbiotic technology assessment in iron and steel industry based on entropy TOPSIS method. Journal of Cleaner Production, 260, 120900.
Wei, J. T., Lee, M. C., Chen, H. K., & Wu, H. H. (2013). Customer relationship management in the hairdressing industry: An application of data mining techniques. Expert Systems with Applications, 40(18), 7513–7518.
Wei, J. T., Lin, S. Y., & Wu, H. H. (2010). A review of the application of the RFM model. African Journal of Business Management, 4(19), 4199.
Yoon, P., & Hwang, C.-L. (1995). Multiple attributes decision making: An introduction. Sage Publications.
You, Z., Si, Y.-W., Zhang, D., Zeng, X., Leung, S., & Li, T. (2015). A decision-making framework for precision marketing. Expert Systems with Applications, 42(7), 3357–3367.
Zhang, X., Gong, W., & Kawamura, Y. (2004). Customer behavior pattern discovering with web mining. In Asia-Pacific web conference (pp. 844–853). Springer.
Zhao, H. H., Luo, X. C., Ma, R., & Lu, X. (2021). An extended regularized K-means clustering approach for high-dimensional customer segmentation with correlated variables. IEEE Access, 9, 48405–48412.

Figure 1. Data mining techniques (source: Mahdiraji et al., 2019).

Figure 2. Research process to evaluate hotel customers.

Table 1. Previous related research.

Researcher(s)/Year	Target(s)	Tool(s)	Result(s)
(Akter et al., 2025)	CLV forecasting and segmentation for retention strategy	RFM, ML (regression, decision trees, neural nets), K-means++	Improved CLV prediction accuracy; enabled personalized retention with AI-driven clustering
(Wang et al., 2024)	Dynamic customer segmentation in auto parts industry	LRFMS model, DTW-D, SBD, CID, AP, SP, k-medoids	Outperformed traditional RFM; identified meaningful segments using multivariate time series clustering
(Rajput & Singh, 2023)	Customer segmentation to guide platform focus (website vs. app)	K-means, elbow	Identified key clusters; recommended boosting app use and memberships
(Tabianan et al., 2022)	Customer segmentation in e-commerce	K-means	Behavior-based clustering for profitable customer segmentation
(Zhao et al., 2021)	Combining K-means and L1-norm	K-means, L-1 norm, RFM	Outperforms K-means with fewer errors
(Mohammadrezapour et al., 2020)	Comparing two clustering methods	K-means, C-means	C-means yielded higher accuracy than K-means
(Matz & Hermawan, 2020)	Proposing a model for a cluster of a loyal customer	LRIFMQ, CLV, AHP, K-means	Customers were grouped into six clusters
(Mahdiraji et al., 2019)	Clustering and ranking bank customers using RFM	RFM modeling, BWM, COPRAS	Classified customers into six clusters and selected two groups as influential ones
(Syakur et al., 2018)	Determining the best number of clusters	K-means, elbow method	Defining an appropriate number of clusters using the elbow method
(Doğan et al., 2018)	Clustering retail customers	RFM modeling, K-means, two-step	Comparing two types of clustering results
(Mosavi & Afsar, 2018)	Analyzing bank customers’ value	FAHP, K-means, random forest classification	Presenting the model according to the applied attributes
(Peker et al., 2017)	Developing services and increasing profits	LRFMP, K-means, Calinski–Harabasz, Davies–Bouldin, silhouette	Clustering customers into five groups
(Dursun & Caber, 2016)	Clustering hotel customers	RFM modeling, K-means	Offering proper strategies to each group
(Ansari & Riasi, 2016)	Combining data mining methods to cluster steel industries’ customers	LRFM modeling, two-step, genetic algorithm, C-means	Classifying customers into two groups, rendering tailored strategies
(Ganjali & Teimourpour, 2016)	Clustering insurance customers	K-means, CLV, association rule, decision tree, Davies–Bouldin	Classifying customers into five clusters
(Sarvari et al., 2016)	Clustering fast-food customers	Association rules, RFML modeling, K-means	Having proper groups is critical to forming strong associations
(Abirami & Pattabiraman, 2016)	Clustering customers	RFM modeling, K-means, association rules	Predicting customers’ behavior, improving customer satisfaction
(Srihadi et al., 2016)	Clustering foreign customers	K-means	Identifying groups, proposing proper strategies
(Chang et al., 2009)	Finding important variables that influence customer loyalty	Decision tree analysis	Exploring customer behavior
(Mohammadian & Makhani, 2016)	Analyzing data to identify customer intentions	RFM modeling, CLV	Grouping customers into eight clusters to understand customers
(You et al., 2015)	Clustering customers	RFM modeling, K-means, CHAID decision trees, Pareto values	Offering precision marketing strategies
(Dimitrovski & Todorovic, 2015)	Understanding customer behavior	K-means, chi-square test, hierarchical method	Understanding visitor intentions, presenting appropriate promotions
(Wei et al., 2013)	Clustering hairdressing industry customers	K-means, RFM modeling	Identifying customers, offering proper strategies
(Chen et al., 2012)	Understanding retail customers	K-means, RFM modeling, decision tree	Classifying customers into five clusters
(Liao et al., 2010)	Finding hidden patterns in data	K-means, a priori algorithm	Exploring group-buying customer behavior
(Hosseini et al., 2010)	Clustering SAPCO customers	K-means, WRFM, CLV	Assessing customers, proposing an effective model for understanding customers

Table 2. Demographic profiles of hotel customers (N = 1107).

Category	Group	Percentage (%)
Gender	Female	34.9
	Male	65.0
Age group	31–40	26.7
	41–60	19.4
Nationality	Iraqi	22.89
	Chinese	7.33
Travel type	Alone	61.12
	With others	31.55

Table 3. Validity index values.

Number of clusters (K)	Silhouette	Davies–Bouldin	Calinski–Harabasz
Weight of validity indices	0.34	0.36	0.28
2	0.74	0.63	831.2
3	0.82	0.62	1096.65
4	0.78	0.66	1287.77
5	0.70	0.63	1347.64
6	0.73	0.61	1602.60
7	0.70	0.67	1484.12
8	0.74	0.70	1654.36
9	0.73	0.68	1613.19
10	0.73	0.71	1652.56

Table 4. Descriptive statistics of the RMD attributes.

RMD Indices	Minimum	Maximum	$\bar{x}$	St. dev.
R (Recency)	10	365	126.8	97.0
M (Monetary)	667	4724	1252.3	501.6
D (Duration)	1	12	3.4	1.6
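Table 4 summarizes one R, M, and D value per customer. This excerpt does not show how those values are derived from raw hotel records. The sketch below therefore makes explicit, hypothetical assumptions. Each booking is a tuple (customer_id, checkout_date, amount, nights). R is taken as the number of days from the customer's latest check-out to a reference date. M is the total amount spent, and D is the total nights stayed. Averaging nights per stay would be an equally plausible reading of "duration". The function names and records are illustrative only.

```python
from datetime import date
from statistics import mean, stdev


def rmd_table(bookings, reference_date):
    """Aggregate hypothetical booking records into per-customer (R, M, D).

    Assumed definitions: R = days since the latest check-out,
    M = total amount spent, D = total nights stayed.
    """
    table = {}
    for cid, checkout, amount, nights in bookings:
        last, spent, stayed = table.get(cid, (None, 0.0, 0))
        last = checkout if last is None or checkout > last else last
        table[cid] = (last, spent + amount, stayed + nights)
    return {
        cid: ((reference_date - last).days, spent, stayed)
        for cid, (last, spent, stayed) in table.items()
    }


def describe(values):
    """Minimum, maximum, mean, and sample standard deviation (as in Table 4)."""
    return min(values), max(values), mean(values), stdev(values)


bookings = [  # hypothetical records, not data from the study
    ("c1", date(2024, 3, 1), 900.0, 2),
    ("c1", date(2024, 9, 15), 1400.0, 4),
    ("c2", date(2024, 6, 10), 700.0, 1),
    ("c3", date(2024, 12, 1), 2500.0, 7),
]
rmd = rmd_table(bookings, reference_date=date(2024, 12, 31))
for name, column in zip("RMD", zip(*rmd.values())):
    print(name, describe(column))
```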

Table 5. RMD scores.

Clusters	N	$\bar{D}$	$\bar{M}$	$\bar{R}$	RMD Value
1	579	1.00	414.10	211.60	R↑M↓D↓
2	24	8.45	3221.50	202.58	R↑M↑D↑
3	81	3.66	1214.61	185.40	R↑M↓D↑
4	26	2.76	1314.88	14.00	R↓M↑D↓
5	315	3.57	806.17	114.01	R↓M↓D↑
6	81	1.32	542.76	33.75	R↓M↓D↓
Total	1107	3.46	1252.34	126.84
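The arrows in the RMD Value column of Table 5 are consistent with a simple rule. Each cluster mean is compared with the corresponding value in the Total row: ↑ marks a mean above that value, and ↓ marks a mean at or below it. This excerpt does not state the rule explicitly, so the sketch below is a reconstruction that reproduces the six patterns in the table, not the authors' code. The dictionary names are illustrative.

```python
# Cluster means and the Total row, copied from Table 5
TOTAL = {"R": 126.84, "M": 1252.34, "D": 3.46}
CLUSTERS = {
    1: {"D": 1.00, "M": 414.10, "R": 211.60},
    2: {"D": 8.45, "M": 3221.50, "R": 202.58},
    3: {"D": 3.66, "M": 1214.61, "R": 185.40},
    4: {"D": 2.76, "M": 1314.88, "R": 14.00},
    5: {"D": 3.57, "M": 806.17, "R": 114.01},
    6: {"D": 1.32, "M": 542.76, "R": 33.75},
}


def rmd_pattern(means, reference):
    """Mark each attribute ↑ if the cluster mean exceeds the reference, else ↓."""
    return "".join(
        f"{attr}{'↑' if means[attr] > reference[attr] else '↓'}" for attr in "RMD"
    )


for cid, means in CLUSTERS.items():
    print(cid, rmd_pattern(means, TOTAL))
```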

Table 6. Attributes of the clusters.

Attributes	NC	LC	CBC	PC	BC	LoC
RMD scores	R↑M↓D↓	R↑M↑D↑	R↑M↓D↑	R↓M↑D↓	R↓M↓D↑	R↓M↓D↓
N	579 (52.3%)	24 (2.16%)	81 (7.31%)	26 (2.34%)	315 (28.45%)	81 (7.31%)
Gender	Male (68%)	Male & Female (50–50%)	Male (69%)	Male & Female (50–50%)	Male (65%)	Male (66%)
Age group	21–30 (26%)	41–50 (41%)	31–40 (28%)	31–40 (38%)	31–40 (28%)	21–30 (27%)
Nationality	Iraqi (24%)	Iraqi (45%)	Chinese (17.3%)	Iraqi & Chinese (15.3% each)	Iraqi (23%)	Iraqi (18.5%)
Travel companion	Alone (68.22%)	Two people (20%)	Two people (38.2%)	Alone (38%)	Alone (65.7%)	1 (51.1%)
Job	Freelance (64.7%)	Freelance (41.6%)	Freelance (39.5%)	Employee (38%)	Freelance (62.2%)	Tourist (72.8%)
Travel intentions	Tourism (58.5%)	Tourism (43%)	Tourism (49.38%)	Office work (34.6%)	Office work (34.9%)	Tourism (50.6%)
Duration (days)	1 (100%)	7 (33.33%)	4 (44%)	1 (50%)	2 (74.3%)	1 (76.5%)

Note: NC: New Customers, LC: Loyal Customers, CBC: Collective Buying Customers, PC: Potential Customers, BC: Business Customers, LoC: Lost Customers.
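Each cell in Table 6 reports the most frequent value of a profile attribute within a cluster, together with that value's share of the cluster. The sketch below computes this with `collections.Counter` on a few hypothetical guest records. The function name and records are illustrative assumptions, not the study's data or code.

```python
from collections import Counter


def dominant_profile(records, attribute):
    """Return the most frequent value of `attribute` and its share in percent."""
    counts = Counter(r[attribute] for r in records)
    value, n = counts.most_common(1)[0]
    return value, round(100 * n / len(records), 1)


cluster = [  # hypothetical guests from one cluster
    {"gender": "Male", "intention": "Tourism"},
    {"gender": "Male", "intention": "Office work"},
    {"gender": "Female", "intention": "Tourism"},
    {"gender": "Male", "intention": "Tourism"},
]
print(dominant_profile(cluster, "gender"))     # ('Male', 75.0)
print(dominant_profile(cluster, "intention"))  # ('Tourism', 75.0)
```

When two values tie, this sketch returns only the one encountered first. Table 6 instead lists both values in such cases (for example, "Male & Female (50–50%)"), so a faithful reproduction would need to report every value sharing the top count.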

Table 7. Association rule results.

Clusters	Rule	Confidence	Support
New customers	[male → Iraqi, freelance]	94.5%	16.5%
	[tourism → tourist]	93.5%	11.3%
Loyal customers	[freelance → Iraqi, male]	100%	12.5%
	[tourism → tourist]	100%	12.5%
Collective buying customers	[male → freelance, 41–50]	100%	11.11%
	[tourism → Chinese, 31–40]	100%	11.11%
Potential customers	[male → Chinese]	100%	15.3%
	[employee → female, 31–40, office work]	83.87%	23.4%
Business customers	[male → Iraqi, freelance]	94.11%	10.7%
	[male → office work, freelance]	100%	12.5%
Lost customers	[tourism → 61–90, tourist]	88%	12.3%
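The two measures in Table 7 follow the standard definitions used in association rule mining. The support of a rule X → Y is the share of transactions containing every item in X ∪ Y. Its confidence is support(X ∪ Y)/support(X), that is, the share of transactions containing X that also contain Y. Here a "transaction" is taken to be one guest's set of profile attributes within a cluster. This is an assumption about how the rules were mined, and the records below are hypothetical.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X ∪ Y) / support(X); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


guests = [  # hypothetical attribute sets for guests in one cluster
    {"male", "Iraqi", "freelance", "tourism"},
    {"male", "Iraqi", "freelance", "office work"},
    {"male", "Chinese", "employee", "tourism"},
    {"female", "Iraqi", "freelance", "tourism"},
]
antecedent, consequent = {"male"}, {"Iraqi", "freelance"}
print(support(guests, antecedent | consequent))                   # 0.5
print(round(confidence(guests, antecedent, consequent), 3))       # 0.667
```

An algorithm such as Apriori enumerates candidate itemsets whose support clears a minimum threshold and then keeps the rules whose confidence clears a second threshold. The two functions above only score a given rule.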

Table 8. CLV Ranking.

Clusters	TOPSIS score	N (%)	D	M	R	CLV	CLV Ranking
C1	0	52.3	0.009	0.012	0.33	0.09	CLV4
C2	0.86	2.16	0.66	0.7	0.3	0.59	CLV1
C3	0.13	7.3	0.12	0.1	0.25	0.13	CLV2
C4	0.21	2.34	0.07	0.11	0.001	0.07	CLV3
C5	0.12	28.45	0.11	0.04	0.097	0.05	CLV5
C6	0.14	7.31	0.01	0.02	0.008	0.01	CLV6
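Table 8 scores clusters with TOPSIS. The sketch below implements the textbook procedure to show how a closeness coefficient between 0 and 1 arises: vector-normalize each criterion column, apply the weights, and locate the ideal (best) and anti-ideal (worst) points. Each alternative is then scored by its Euclidean distances d+ to the ideal and d− to the anti-ideal, as C = d−/(d+ + d−). This section does not give the criterion weights or input values the study used, so the matrix, the equal weights, and the treatment of every criterion as a benefit criterion are hypothetical. The sketch does not reproduce the TOPSIS column above.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness coefficient of each alternative (row) to the ideal solution.

    matrix: rows = alternatives (e.g. clusters), columns = criteria.
    weights: one weight per criterion; benefit[j] is True if larger is better.
    """
    n_crit = len(weights)
    # 1. Vector-normalize each column (guarding all-zero columns), then weight
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal and anti-ideal value for each criterion
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Euclidean distances and relative closeness C = d- / (d+ + d-)
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, best), math.dist(row, worst)
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.0)
    return scores


matrix = [  # hypothetical (D, M, R) values for three clusters
    [8.0, 3000.0, 200.0],  # best on every criterion
    [3.0, 1200.0, 100.0],
    [1.0, 400.0, 30.0],    # worst on every criterion
]
weights = [1 / 3, 1 / 3, 1 / 3]
benefit = [True, True, True]
print([round(s, 3) for s in topsis(matrix, weights, benefit)])
```

An alternative that is best on every criterion coincides with the ideal point and scores exactly 1. One that is worst on every criterion scores exactly 0. Min–max normalization is a common alternative to the vector normalization used here, and it changes the intermediate scores.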

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Deldadehasl, M.; Karahroodi, H.H.; Haddadian Nekah, P. Customer Clustering and Marketing Optimization in Hospitality: A Hybrid Data Mining and Decision-Making Approach from an Emerging Economy. Tour. Hosp. 2025, 6, 80. https://doi.org/10.3390/tourhosp6020080
