Applied Big Data Analysis to Build Customer Product Recommendation Model

With the development of the Internet, future trends in the retail industry cannot be separated from community, data, and experience. Consumers’ lifestyles and purchasing behaviors are constantly changing, and retailers must adopt policies to understand consumers. This research analyzes supermarkets, the retail format that consumers encounter most often in daily life. It seeks the hidden information behind customer transaction data to help supermarkets learn customers’ habits, formulate marketing strategies, improve profitability, and maintain long-term relationships with customers. To this end, the RFM model is used to convert customer transaction data into R, F, and M values, and customers are then clustered using Ward’s method combined with K-means, fuzzy C-means, and self-organizing maps. Discriminant analysis is used to find the grouping method with the highest accuracy rate, which is then used to calculate the customer lifetime value score. In terms of product recommendation, customers can be recommended products in the top five categories, or recommendations can be made using rules found through association rule analysis. In terms of customers, long-term relationships are maintained by recommending related products, offering bundled products, giving gifts or discount coupons, and regularly organizing promotional activities.


Introduction
Scoring customers based on their total spend in the current year has formed the basis for businesses to select their target customers for the following year. This method primarily evaluates customers based on their total spend on purchases. Businesses often believe that 20% of customers contribute 80% of total revenues and the relationship between resource input and customer contribution is not equivalent. Italian economist Pareto [1] stated in 1897 that a nonequivalent relationship exists between causes and results as well as between efforts and effects.
With the recent growth of the network environment and the advent of the information explosion, people have become inextricably linked to big data. The Global Powers of Retailing 2017 report released by Deloitte & Touche indicates that future retailing trends can be encapsulated in three major paradigms: social networks, data, and experiences. Social networks: with the advent of the social networking age, the starting point for consumer management should be social networks, and chatbots are expected to become highly popular in e-commerce. Data: big data will become the key decision-making point for retailers; cyber-physical systems (CPS) will become mainstream; brick-and-mortar stores face major changes; and data play an integral role in making narrative facts specific, clear, and more convincing. Experiences: emotion-oriented consumer experiences, interactive experiential consumption, and games are expected to win over the hearts and minds of consumers. The first step for a quality company to develop over the long term is digitization, while maintaining good customer relations is the primary focus. Safeway, the third largest supermarket chain in the United Kingdom, records the types of goods purchased by customers using store credit with every purchase. Sales can be increased by identifying the correlations between different product purchases from customers' consumption data and placing two related products in close proximity. This research holds that the hidden information behind customer transaction data is highly important for understanding customers' buying habits and for increasing sales volume by placing two highly correlated products together.

Statements of the Problem
The prevalence of the Internet makes product costs visible to the world. In an era of meagre profits, businesses operate in an increasingly competitive environment, and after a period of operation, consistent decision-making becomes a must. Successful businesses need to decide which customers to select as primary marketing targets; how to distribute marketing resources among the various types of customers (proportions and methods are not necessarily the same every year and have to be adjusted to the economic climate and the political-economic atmosphere of the year); and how to screen and foster customers to enhance the company's future profitability.
Many businesses have been using the customer spending score (CSS) to evaluate their customers. CSS is defined as the anticipated revenue from customers' consumption, with which potential target customers for the next year are evaluated and differentiated based on one or two deciles ranging from the top to the bottom. However, CSS relies mainly on customers' consumption to differentiate customers, without taking into consideration other influential factors of customer service. Therefore, many businesses hope for a better alternative to replace the oversimplified CSS evaluation. The same holds for the retail business. When customers shop in a store, they are encouraged to apply for membership with all kinds of rewards, such as a one-hundred-dollar coupon for future consumption upon completion of the membership application. For this purpose, the researchers of the current study designed a membership application form to collect customers' data, which serve as a basis for marketing differentiation. Customers are divided into three tiers (i.e., A, B, and C) according to their annual expenditure and are further provided with tier-specific marketing activities based on their preferences. A problem, however, is that some loyal, frequent buyers with little annual expenditure might be ignored.
The Pareto principle (also known as the 80-20 rule) is widely quoted in many businesses to account for the unbalanced relation between resource investment and customer contribution. Discovered by Italian economist Vilfredo Pareto in 1897, the 80-20 rule suggests that there exists an unbalanced relation between causes and effects, or efforts and gains. Typical examples would be "20% of the gains come from 80% of the efforts, and 80% of the effects are made by 20% of the causes". In other words, 80% of a person's efforts only result in 20% of the gains. It is a rule of thumb in business that 80% of a company's profits come from 20% of its valuable clients.
The 80-20 rule has been observed worldwide. For example, ARC Management Consulting Ltd. (London, UK) analyzed the customer market with Activity Based Costing (ABC) and found that 80% of overall profits came from the top 20% of 400 companies (2001). Ref. [1] further extended the rule into the 80/20/30 rule, in which the number "30" represents that most of the profit is wasted supporting the 30% of customers who are not generating any revenue for the business. Therefore, the top 20% should get significant focus and receive most of the resources, as they bring in the bulk of the revenue (80%). Knowing which 80% of efforts are out of proportion to their gains enables businesses either to adapt their methods to specific situations or simply to eliminate those efforts, which in turn reduces cost. This implies that giving up customers who contribute little enhances a company's profits.

Scope and Review
Both the 80/20 rule and the 80/20/30 rule imply the centralization and key management of resources. In other words, 80% of a company's resources should be distributed to the most valuable 20% of customers, and the remaining 20% to less valuable customers. With this strategy, customers who contribute much to profits become loyal customers because of preferential treatment or attentive services. Customers who contribute little, on the other hand, may increase their consumption in the hope of receiving better treatment or services and thereby become high contributors. Alternatively, they may gradually move to competitors because they cannot receive better treatment or services from the original company. This kind of transfer benefits a business because it not only lowers marketing expenditure on less valuable customers but also increases competitors' costs. To achieve this kind of customer differentiation and management, a successful database analysis is a must. First, administrators should develop a business culture of continuous creativity and R&D and commit the required resources to the company-customer interactive network. Second, customers should be differentiated into tiers through database analysis. The last part concerns the most important asset of long-term business operation, that is, keeping customer satisfaction high and maintaining customer lifetime value. Specialized membership databases and multi-functional marketing activities that knit consumption and after-sales services together would boost lasting and stable profits. For example, Capital One used databases and analytical models to locate consumers who spend, borrow, and repay money. This type of consumer receives more marketing activities and resources. For other consumers, by contrast, loan interest rates were raised, and they were forced to move to other credit card companies. This strategy made Capital One a professional credit card company with the highest growth rate. Another example is an index study of customer lifetime value (CLV), customer profitability, and the distribution of marketing resources conducted by International Business Machines Corporation (IBM) in 2008. The study tested one simple management principle: could low-value clients create high-value profits if interactions with them were made more accurate? The aforementioned examples have been proven and have helped businesses save resources.

Objective
In light of the previous statements, pinpointing the key customers who contribute to a company's profits is vital to business management. It helps businesses target valuable customers and establish long-term positive relationships with them. In addition, it enables businesses to distribute limited marketing resources effectively.

Organization of the Paper
In such a competitive management environment, important issues in business management have been centering on how to examine the cost, enhance profits, and identify customer values to boost profits, effectively distribute business resources, and improve performances. Sleeping customers were deleted to ease the heavy load of data analysis. Then the integration of RFM model and customer lifetime value model allowed for a thorough examination of customer lifetime value. In addition, the use of clustering technique to differentiate markets aimed to pinpoint high-value customers. Given the effect of customer lifetime value on customers' potential value, the current study proposed the use of membership databases of retailers to predict customer lifetime value, in the hope of receiving more accurate customer value.

Differentiating Active and Sleeping Customers
Membership databases of retailers are very difficult to analyze, partly because of the large number of members and the massive amount of basic information. Membership databases include the consumption track of every member. In addition to the card number, relevant information includes purchased item numbers, unit prices, discounts on each item, and overall discounts. Of course, many members do not purchase products after applying for membership; they may also change their minds and purchase in competitors' stores. Therefore, the current study also examined whether targeting active customers and deleting sleeping customers would save resources. Because no differentiation standard exists, sleeping customers in the current study are defined as those who did not make a purchase for six consecutive months. Deleting them from the data analysis also excludes them from marketing activities. However, this method has been questioned, since an identification method that lacks theoretical support might exclude many valuable customers.
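The six-month rule described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the member IDs and dates are made up, the window is measured back from a chosen analysis date, and "six months" is approximated as 183 days. None of these details is specified in the study.

```python
from datetime import date, timedelta

# Hypothetical sample: member card number -> date of most recent purchase.
last_purchase = {
    "M001": date(2020, 12, 20),
    "M002": date(2020, 3, 15),
    "M003": date(2020, 8, 1),
}

def split_active_sleeping(last_purchase, analysis_date, window_days=183):
    """Return (active, sleeping) lists of member IDs.

    A member is treated as sleeping when the most recent purchase is older
    than `window_days` (assumed here to approximate six months).
    """
    cutoff = analysis_date - timedelta(days=window_days)
    active = sorted(m for m, d in last_purchase.items() if d >= cutoff)
    sleeping = sorted(m for m, d in last_purchase.items() if d < cutoff)
    return active, sleeping

active, sleeping = split_active_sleeping(last_purchase, date(2020, 12, 31))
print("active:", active)
print("sleeping:", sleeping)
```

Keeping the window as a parameter makes it easy to test how sensitive the remaining customer base is to the cutoff, which is the concern raised about the rule's lack of theoretical support.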

Predict Customer Value with Enhanced Precision
In the analysis and investigation of customer purchasing behavior, the most important part is customer-based analysis, in which a customer's purchasing history is used to predict his/her potential purchasing behavior in the future. The predicted customer lifetime value is then employed to differentiate customers in the database, thus facilitating the choice of target customers [2,3]. A more precise and realistic customer value means a more accurate potential customer value, with which businesses can accurately identify customers with higher potential and effectively develop relevant marketing strategies.

Research Objective
This study takes supermarket members as the research object to uncover hidden information behind customer transaction data and help supermarkets understand customers' habits, develop marketing strategies, increase profits, and maintain long-term customer relationships. In this study, we first employed the recency, frequency, and monetary (RFM) model to encode customer data and converted customer transaction data into R, F, and M values. Next, we used Ward's method along with K-means, fuzzy C-means (FCM), and other clustering methods to cluster existing customers. Following this, we calculated the customer lifetime value score (CLVS) of each group to obtain the customer lifetime value for each customer. Finally, we employed association rules to determine the correlations between customers buying one product and those buying another. According to each process step, this study is expected to achieve the following results:

1. Converting customer transaction data into a calculable value for measurement and comparison;
2. Grouping customers with similar RFMs to help supermarkets understand customer habits and develop marketing strategies for each customer group;
3. Calculating the customer lifetime value of each customer group to arrive at the value brought by each group of customers to the supermarket;
4. Determining the correlation between the products purchased by customers to juxtapose the products and thereby increase the profitability of supermarkets.

Supermarkets
Supermarkets are large retail stores, usually chain stores, selling not only daily necessities but also many different types of food, including fresh food, snacks, cooked food, and so on. They are based on self-service, have low gross margins and selling prices, and seek to sell more at lower margins.
American scholar [4] pointed out in his book The Super Market that, "a supermarket is a highly departmentalized retail establishment dealing in foods and other merchandise, either wholly owned or concession operated, with adequate parking space, and doing a minimum of $250,000 annually".
Supermarkets are defined as markets that retail daily household necessities and food products with fresh and prepared food, including those that operate in the form of employees' consumer cooperatives or consortia and any business that engages in vending or catering in the premises of supermarkets [5].
The Ministry of Economic Affairs of the Republic of China [6] defines supermarkets as follows: "Those engaged in the provision of sectoral retail of daily household necessities and food products with mainly fresh and prepared food, including small-scale catering services provided on site".
Ref. [7] defines a supermarket in the following manner: "Supermarkets are departmentalized stores that sell daily foods for people's livelihood. They sell a variety of fresh foods (meat, fish, poultry, fruits, vegetables, and processed products) and daily necessities, most of which are sorted, neatly packaged, and labeled with weight, price, and date of manufacture. They are retail stores with self-service purchases, unified checkout, cash registers, and refrigeration equipment".

RFM Model
The three indicators in the RFM model, i.e., R, F, and M, stand for recency, frequency, and monetary, respectively. Recency (R) refers to the time of a customer's last purchase and can be regarded as the customer's degree of activity toward the firm. Frequency (F) refers to how often a product is purchased during a period of time and can be regarded as the customer's loyalty to the firm. Monetary (M) refers to the amount of goods purchased by customers over a time span and can be regarded as the extent of the customer's contribution to the firm and customer value. Ref. [8] pointed out that RFM is a widely used analysis technique, and the RFM model can be used to run simple and quick customer analysis. With the three measurement indicators in the RFM model, customer consumption behavior can be outlined simply and clearly to provide firms with consumer behavior analysis and to develop customer relationship management strategies [9].

Recency
Recency (R) means how long it has been since the last time a customer visited a store for the purpose of consumption. When the time of a customer's last visit to a store is closer to the present, the recency value will be smaller; conversely, when a customer's last visit to a store is farther from the present, the recency value will be larger. Assuming that a customer's recency value is small, the firm will be able to discern that there is a greater chance of a repeat visit; in other words, the customer's activity level is high.

Frequency
Frequency (F) means the number of times a customer purchases a product within a certain time span. When the customer purchases a greater number of times, the frequency value will be higher; conversely, when the customer purchases a fewer number of times, the frequency value will be lower. Given a high frequency of purchase, firms can discern that the customer's repeat purchase rate is high, i.e., the customer loyalty is high.

Monetary
Monetary (M) means the amount of money a customer spends on a product in a certain time span. When a customer spends more money on a product, the monetary value will be higher; conversely, when a customer spends less money on a product, the monetary value will be lower. Given a high monetary amount, firms can discern that the customer value is high, i.e., the customer contribution is high.
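The three definitions above can be turned into a short calculation. The following is a minimal sketch that assumes a flat list of transactions and measures recency in days before a chosen analysis date; the member IDs, dates, and amounts are hypothetical and are not taken from the study's data.

```python
from datetime import date

# Hypothetical transactions: (member_id, purchase_date, amount).
transactions = [
    ("M001", date(2020, 10, 5), 320.0),
    ("M001", date(2020, 12, 20), 180.0),
    ("M002", date(2020, 6, 15), 950.0),
    ("M003", date(2020, 11, 1), 60.0),
    ("M003", date(2020, 11, 18), 75.0),
    ("M003", date(2020, 12, 28), 40.0),
]

def rfm_values(transactions, analysis_date):
    """Return {member_id: (recency_days, frequency, monetary)}."""
    last, freq, money = {}, {}, {}
    for member, day, amount in transactions:
        last[member] = max(last.get(member, day), day)   # latest purchase date
        freq[member] = freq.get(member, 0) + 1           # number of purchases
        money[member] = money.get(member, 0.0) + amount  # total amount spent
    return {m: ((analysis_date - last[m]).days, freq[m], money[m]) for m in last}

rfm = rfm_values(transactions, date(2020, 12, 31))
for member, (r, f, m) in sorted(rfm.items()):
    print(member, "R =", r, "days, F =", f, ", M =", m)
```

A smaller R indicates a more recently active customer, while larger F and M indicate higher loyalty and contribution, matching the interpretation given in the text.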

RFM Score
The RFM scores are divided into two parts: Hughes' RFM score and Stone's RFM score.

•
Hughes' RFM Indicator
Ref. [10] sorted the recency, frequency, and monetary values for all customers and divided them into five parts: 0-20%, 21-40%, 41-60%, 61-80%, and 81-100%. Customers with percentages between 0-20% were assigned a score of "5", customers between 21-40% a score of "4", customers between 41-60% a score of "3", customers between 61-80% a score of "2", and customers between 81-100% a score of "1". The Hughes RFM code is shown in Table 2.
•
Stone's RFM Indicator
Ref. [11] divided the recency values into five parts. When the customer's last visit to the store is within three months of the current time, the R value is given a "24" point ranking. When the time between the customer's last purchase and the current time is three to six months, the R value is given a "12" point ranking. When it is six to nine months, the R value is given a "6" point ranking. When it is 9-12 months, the R value is given a "3" point ranking; if it exceeds 12 months, the R value is given a "0" point ranking. The number of purchases (F) code is the number of times a customer purchases a product in a period of time multiplied by "4" points. When the number of purchases is 3, the F value will be 3 × 4 = 12 and scored as a "12". The monetary (M) code is the amount of product purchased by the customer over a period of time multiplied by 1/10; however, the highest M value is limited to 9 points so that a large monetary amount combined with few purchases does not dominate the score. When the monetary amount is NT$100, the M value will be 100 × 1/10 = 10. However, because the highest M score is 9 points, the M value code is scored "9".
The Stone RFM codes are shown in Table 3.

Table 3. Stone's RFM codes.
Recency (Medium): Most recent 3 months = 24 points; most recent 3-6 months = 12 points; most recent 6-9 months = 6 points; most recent 9-12 months = 3 points; most recent 12 months or more = 0 points
Frequency (High): Number of purchases × 4 points
Monetary (Low): Monetary amount × 10 percent (the maximum point limit is 9)
Data Source: Stone (1995). Organized by author.
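Both scoring schemes can be written as small functions. The sketch below is illustrative only and rests on a few assumptions the source does not settle: for Hughes' scheme the "best" 20% receive a score of 5 (most recent for R, largest for F and M) and ties are broken arbitrarily by sort order; for Stone's scheme the boundary months (exactly 3, 6, 9, or 12) fall into the more recent band. All function names are hypothetical.

```python
def hughes_scores(values, higher_is_better=True):
    """Assign quintile scores 5..1 by rank; the best 20% receive 5."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 5 - (rank * 5) // n
    return scores

def stone_recency_points(months_since_last):
    """Stone's recency bands; boundary handling is an assumption."""
    if months_since_last <= 3:
        return 24
    if months_since_last <= 6:
        return 12
    if months_since_last <= 9:
        return 6
    if months_since_last <= 12:
        return 3
    return 0

def stone_frequency_points(purchases):
    """Number of purchases multiplied by 4 points."""
    return purchases * 4

def stone_monetary_points(amount):
    """One tenth of the amount, capped at 9 points."""
    return min(amount / 10, 9)

monetary = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(hughes_scores(monetary))                      # M: larger is better
print(hughes_scores([3, 40, 200], higher_is_better=False))  # R: smaller is better
print(stone_recency_points(2), stone_frequency_points(3), stone_monetary_points(100))
```

For recency, `higher_is_better=False` is passed because a smaller recency value means a more recent visit.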

Clustering Method Application
This section reviews and discusses the methods used in this research: Ward's method, K-means, fuzzy C-means (FCM), self-organizing maps (SOM), and discriminant analysis (DA).

Ward's Method
Ref. [12] proposed Ward's method, also known as the minimum variance method, which initially treats each individual as its own cluster and then merges clusters step by step. At each step, the pair of clusters whose merger yields the smallest increase in within-cluster variance is merged first. The earlier clusters are merged, the higher their similarity.
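To make the merging rule concrete, the following is a minimal pure-Python sketch of agglomerative clustering with Ward's criterion. It uses the standard formulation in which the cost of merging clusters A and B is the increase in the within-cluster sum of squares, |A||B|/(|A|+|B|) times the squared distance between their centroids. The sample points and the function name are hypothetical, and the brute-force search is meant for illustration rather than for large data sets.

```python
def ward_clusters(points, k):
    """Merge clusters with Ward's criterion until k remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > k:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)           # cheapest merge first
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical two-dimensional sample with three visible groups.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0), (9.2, 1.1)]
result = ward_clusters(points, 3)
print(result)
```

Recording the merge cost at each step would give the heights of a dendrogram, which is the usual basis for choosing the number of clusters.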
Ref. [13] applied Ward's Method and K-Means to cluster non-regular employees. Results show that they are divided into four groups: "Work is Work", "Work is Money", "Newborn Tigers", and "Youth is Capital". This classification is based on their characteristics and respective group segments as well as describing them for management reference and suggestions.
Ref. [14] targeted the surrounding large-scale hypermarkets in Tainan and Kaohsiung as the main subjects of a survey. The research data were clustered through Ward's Method to ascertain the competition among hypermarkets, thereby arriving at the following strategies: (1) business strategies wherein stores in the same business district tend to emphasize "future-oriented attributes", and the operating performance deriving from aggressive strategies is superior to risk aversion and defensive strategies and (2) differing regions with significant differences in the time of establishment and degree of importance attached to business strategy attributes by position within the hierarchy.
To ascertain consumer behavior in wealth management between industries, Ref. [15] provided his successful strategic plan, utilizing Ward's Method, K-Means, and variance analysis to explore consumer behavior in wealth management. The results of the study found that cross-sectoral wealth management consumers prefer to invest in stocks, funds, insurance plans, and bonds. While items such as currency exchange, trusts, futures, structured notes, and derivatives are less popular with consumers, it can be used as a frame of reference for financial professionals to promote cross-sectoral wealth management products.
The subject of a study by [16] was primarily the consumption behavior of tourists utilizing hotel services. The foundation of the study was tourist motivation treated as a demographic, which was then divided into three consumer groups using Ward's Method and K-Means. Said groups were labeled "emotional exchange group", the "experience leisure group", and the "spiritual precipitation group". The tourist motivation and service quality of tourist hotels could then be analyzed through factor analysis, thereby yielding five factors that were analyzed through variance analysis to identify tourism motivation clusters. For the differences in the five tourism agencies, we use Scheffe's test to discover the clusters that had significant differences. This ultimately provided suggestions for positioning, marketing, and pricing strategies for pertinent consumer groups, which can serve as a reference for future business directions and strategies in the industry.

K-Means
K-means is one of the simplest unsupervised learning algorithms for solving common clustering problems. The procedure first specifies the number of clusters, K, and then groups the given data points in a simple and streamlined manner [17].
Ref. [18] proposed a facial recognition system based on ant colony optimization algorithms and K-means theory. Ant colony optimization was used to remedy K-means' tendency to get trapped in local minima, thereby improving K-means clustering and the correctness of facial recognition. The experimental results show that this improved the accuracy of facial recognition. There are three research contributions: (1) versatility, using the Adaboost system to detect faces in complex environments; (2) an improvement over the traditional K-means algorithm's shortcoming of getting trapped in local minima, which raises the accuracy of facial recognition; and (3) scalability to real-time facial recognition applications such as anti-theft systems.
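As an illustration of the procedure, the sketch below implements the common Lloyd-style K-means iteration in pure Python: assign each point to its nearest centroid, recompute each centroid as the mean of its members, and stop when assignments no longer change. The initial centroids are supplied by the caller, and the sample data and function name are assumptions made for this example.

```python
def kmeans(points, centroids, max_iter=100):
    """Run K-means from the given initial centroids.

    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8)]
labels, centers = kmeans(points, [(0.0, 0.0), (6.0, 6.0)])
print(labels, centers)
```

Because the result depends on the starting centroids, different initial values can lead to different local optima.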
Ref. [19] proposed a mathematical algorithm based on K-means clustering algorithm and Bayesian belief networks that collects and extracts the characteristics of the packet information sent by the user during gaming and constructs a game add-on identification mechanism. It employs K-means clustering to reduce feature data dimensionality and assign attributes, establishing a Bayesian network architecture through those attributes to calculate the conditional probability to determine whether the user is an add-on. It was implemented in the game Ragnarok Online, and the experimental results show that the false positive and false negative rates can be lower than 5% at the same time, effectively resolving the issues caused by the add-on.
Ref. [20] applied K-means clustering analysis to the grouping of 98 sixth-grade students in a Changhua county elementary school. The students were divided into two groups: active learner and poor. Students' ability to concentrate and attitude toward learning mathematics are important indicators of grouping. Furthermore, intelligence and attitude toward learning mathematics can effectively predict and explain student learning achievement in decimal division. This is anticipated to provide teachers and researchers reference for teaching and grouping.
Ref. [21] applied RFM value analysis and K-Means grouping to segment customers into VIP and non-VIP groups and formulated different marketing plans for different groups such that companies could concentrate resources on serving important customers and reduce the important customer attrition rate. In addition, it is imperative to understand customer needs and behavior to analyze customer buying habits and discern the items usually purchased by VIP customers. If a customer purchases a key item sequence, there is a high chance of the customer being a VIP customer.
Ref. [22] used the customer transaction records provided by a food processing plant to perform RFM and associative analysis of customers' patterns combined with individual RFM scores and associative patterns and divided customers into groups through K-Means, thus interpreting the purchase behavior characteristics for each cluster and examining the RFM weightings using decision tree analysis. The research results show that customers' buying behavior and monetary amount (M) are more important variables from which companies can formulate marketing strategies.
Ref. [23] examined the customers of multi-level marketing companies using the RFM analysis method as an input value and K-Means to segment customers, proposing customer relationship management strategy recommendations based on the results of these groupings, finally ascertaining the characteristics of the customer groups through C5.0 decision trees and providing references for subsequent customer value prediction, thereby increasing revenue.

Ward's Method Plus K-Means
Ref. [24] pointed out that the number of clusters searched by the hierarchical Ward's Method should be used as the initial value, followed by searches performed by the nonhierarchical K-means, such that a satisfactory number of clusters can be found.
For more accurate clustering results, hierarchical and non-hierarchical clustering methods can be used. In the first stage of hierarchical clustering, the number of clusters can be determined in advance, and in the second stage, the non-hierarchical grouping method is used for clustering [25].
Ref. [26] holds that Ward's method plus K-means corrects a shortcoming of hierarchical clustering: once an observation has been assigned to a cluster, it cannot be regrouped even if the assignment is inappropriate. At the same time, it overcomes a shortcoming of nonhierarchical clustering, namely that the number of clusters and the cluster centers need to be determined ahead of time.
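The two-stage idea can be sketched as follows: Ward's method supplies the initial cluster centers, and K-means then reassigns observations starting from those centers. This is a minimal pure-Python illustration and not the study's implementation. The number of clusters k is passed in directly (in practice it would be read from the Ward merge history), and the six points stand for hypothetical R, F, M scores.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def ward_centroids(points, k):
    """Stage 1: merge with Ward's criterion down to k clusters; return centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(mean(a), mean(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]

def kmeans_refine(points, centroids, max_iter=100):
    """Stage 2: K-means started from the Ward centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
               for p in points]
        if new == labels:
            break
        labels = new
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical (R, F, M) scores on a 1-5 scale for six customers.
customers = [(5, 5, 5), (5, 4, 5), (1, 1, 1), (1, 2, 1), (3, 3, 3), (3, 3, 2)]
start = ward_centroids(customers, 3)
labels, centers = kmeans_refine(customers, start)
print(labels)
print(centers)
```

Starting K-means from the Ward centers removes the need to guess initial centers, while the K-means stage allows observations to change cluster, which pure hierarchical merging does not.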

Fuzzy C-Means
Ref. [27] proposed the fuzzy C-means (FCM) approach, while [28] extended it. FCM works primarily on the concept of fuzzy logic. A data point does not belong exclusively to any one cluster; instead, a value between 0 and 1 indicates its degree of membership in each cluster, further enhancing the effectiveness of the clustering.
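The degree-of-membership idea can be illustrated with the standard FCM update equations: the membership of point i in cluster j is 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), and each center is the mean of the points weighted by membership raised to the power m. The sketch below is a minimal pure-Python version; the fuzzifier m = 2, the sample points, the starting centers, and the fixed number of iterations are all assumptions made for this example.

```python
def fcm_memberships(points, centers, m=2.0):
    """Membership u[i][j] of point i in cluster j; each row sums to 1."""
    u = []
    for p in points:
        d = [sum((x - y) ** 2 for x, y in zip(p, c)) ** 0.5 for c in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: share full membership there.
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            total = sum(row)
            row = [r / total for r in row]
        else:
            power = 2.0 / (m - 1.0)
            row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
        u.append(row)
    return u

def fcm_centers(points, u, m=2.0):
    """Centers as means of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                        for d in range(dim)])
    return centers

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (5.0, 5.0)]
centers = [(1.0, 1.5), (8.5, 9.0)]
for _ in range(20):  # fixed iteration count for simplicity
    u = fcm_memberships(points, centers)
    centers = fcm_centers(points, u)

for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
```

Points near a center receive a membership close to 1 for that cluster, whereas the point midway between the two groups receives intermediate memberships in both, which hard clustering cannot express.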
Ref. [29] pointed out that fuzzy C-means is a highly efficient tool in clustering analysis, with applications in medical diagnoses, pattern recognition, image processing, and geological model analysis.
Ref. [30] pointed out that FCM has wide-ranging applications in agricultural engineering, astronomy, chemistry, geology, image analysis, medical diagnosis, shape analysis, and target discrimination.
Ref. [31] utilized Fuzzy C-Means to execute clustering and calculated the central value vector of each cluster, using two cases, "identifying arrhythmias in heart patients" and "identifying quality of motor skill classifications", to verify the FCM algorithm's efficiency. The example results show that the average correct recognition rate for identifying arrhythmias of heart patients is 92.63%, and the average correct recognition rate for identifying motor skill categories is 96.0%, indicating that Fuzzy C-Means is a highly efficient, speedy, and practical clustering method.
Ref. [32] employed the Fuzzy C-Means (FCM) to process the data fed back by probe vehicles and recommended corresponding clustering and calculation methods under different road conditions. The results indicate that, when the average travel speed on a certain section of road is greater than 30 km/h (or the service level is at level B or higher), a clustering method that involves "dividing the high-speed group into two clusters" or "dividing a high-speed group into three clusters" can be utilized. Furthermore, when the average travel speed of a road section is less than 30 km/h (the road grade is below C level), it is recommended to "divide into three clusters to arrive at a medium speed group" or to take the "straight average".
Ref. [33] obtained the survey data of the various evaluation factors of a guesthouse after being a consumer at a bed and breakfast. Through Fuzzy C-Means, the weighting concept was added to develop a set of "guesthouse evaluations" and a "ranking model" to provide future reference for domestic guesthouses and reduce the gap between tourist need recognition and consumer expectations to increase willingness to stay and willingness to recommend.
Ref. [34] prepared a case study on auto parts services provided by automobile general agencies in Taiwan and introduced the RFM variable into Fuzzy C-Means, clustering the data into three groups referencing the basic statistical variables of the three clusters to establish customer patterns, serving as a reference basis for companies to quickly classify new customers, and provide management advice based on the transaction characteristics of each cluster.

Self-Organizing Maps
Self-organizing maps (SOM) were proposed by [35] and are commonly known as Kohonen networks. The self-organizing map network is a data exploration tool that maps high-dimensional input data to a low-dimensional output topology, giving users a visual clustering of the data. When new data arrive, researchers can update the link weights between the input and output layers through the steps of the learning process [36].
Ref. [37] employed the clustering characteristics of the self-organizing map to explore the relationship between rainfall and river flow in the Choshui River Basin and groundwater changes in Choshui River fanhead, mountain area, and river mouth area, discussing SOM correlations between temporal-spatial factors in the model to groundwater changes.
Ref. [38] used RFM to analyze the consumer behavior in hotel duty-free shops and employed SOM for clustering, ultimately conducting decision tree and association analyses for accurate marketing.
Ref. [39] utilized a self-organizing map to pinpoint the sample types of customer groups, analyzed the clustering status to find out the types of customers, and finally developed marketing methods based on customer types.
Ref. [40] uses the basic information and historical transaction records of members in the database of outdoor activity supplies stores to conduct RFM customer value analysis, utilizing the Two-Step, K-Means, and SOM methods to maximize customers, ensure appropriate clustering, identify important and potential customers, and understand their purchasing behavior characteristics for outdoor products.

Discriminant Analysis
Discriminant analysis (DA) is mainly used to distinguish differences between groups in data. The earliest concept was put forth by the eminent British statistician Ronald Fisher [41]. Discriminant analysis starts from two or more known groups and the characteristic data of the predictive variables. The goal is to find a linear combination of the predictive variables and establish a set of discriminant models such that this linear discriminant analysis model most effectively discriminates between the groups [42].
Ref. [43] employed discriminant analysis as the primary tool to determine attributes, combined with a newly designed Kano questionnaire and the Delphi method as auxiliary tools when selecting elements and establishing the discriminant functions. The empirical results show that the 28 store quality factors include 4 glamour qualities, 5 one-dimensional qualities, 7 essential qualities, and 12 undifferentiated qualities. Finally, the indicators of increased probability of satisfaction and decreased probability of dissatisfaction were redefined to suggest an order of improvement to the convenience store chains.
Ref. [44] used ECMWF Interim Reanalysis data and SSMI microwave satellite inverted thermal synthesis data, applying discriminant analysis to 18 hurricanes with secondary eyewalls and 18 strong hurricanes without secondary eyewalls in the Pacific Northwest from 2000 to 2011, and verified the degree of contribution of various environmental parameters using the hurricanes from 2012 to 2015.
Ref. [45] used the financial data of 27 public companies in Taiwan's plastic industry from 2004 to 2009 and TCRI credit score reports, combined with the decision tree CART algorithm and stepwise discriminant analysis, to conduct corporate credit score research, examine the correct prediction rate, and construct predictive models. The study found that the correct prediction rate using stepwise discriminant analysis was 72%, and the important variables were "operating profit ratio", "accounts receivable days", and "total assets", indicating that they play an important role in predicting a company's credit score.
Ref. [7] uses a quadratic discriminant function to classify consumers in Hsinchu City into "SOGO Department Stores", "Sunrise Department Stores", "Shin Kong Mitsukoshi Malls", and "Far East Department Stores" groups, based on eight distinguishing variables. According to their respective customer loyalty groups, research results show that the probability of misclassification of male data is 10.29% and that of female data is 5%.
Ref. [46] used a case bank as a case study to explore how credit risk characteristics affect individual consumer credit loan applications. Employing logistic regression, discriminant analysis, and linear programming models, the research results show that the factors affecting the credit risk of personal consumer credit loan applications include education, position, average annual income, other bank loans, credit inquiries in the most recent three months, and number of houses.

Customer Lifetime Value Score
Ref. [47] pointed out that the RFM method evaluates customer lifetime value (CLV). Ref. [48] pointed out that RFM information can be extracted from the general customer transaction data of firms. Therefore, the RFM model can be considered one of the most commonly used customer value analysis methods in the corporate world. Ref. [49] defines customer lifetime value as the total income that each customer brings to the firm. Ref. [50] pointed out that customer lifetime value has broad applications in performance measurements [51], target demographics [52], market allocation of resources [53], product offerings [54,55], price [56], and customer segmentation [57,58].
Ref. [59] added the card-carrying benefits and the two indicators of the average customer order based on the member information of a hypermarket case, calculated the customer's value score, and then performed clustering, association analysis, and decision tree analysis to make predictions using a high value customer attrition model pyramid.
Ref. [60] considered individual, qualitative, and group information simultaneously and calculated customer lifetime value on this basis. This can identify the most valuable customers and potential customers in order to formulate corporate strategies. In addition, understanding customer consumption patterns helps the issuing bank design promotional activities around its customers.
Ref. [61] introduced customer lifetime value in the art industry, compared the gap between the current theory and art sales practices, calculated the initial retention probability of each customer by the drivers, and imported the Markov chain for the probability of future customer retention introduced in the modified customer lifetime value model to derive the customer equity model. Finally, the company's marketing investment return rate was calculated as 16%.
Ref. [62] used the RFM-based PLANET Framework Model (PFM), case-based reasoning, Pearson correlation analysis, and association rules to establish a customer lifetime value management system to assist companies in developing good customer relation management strategies to maintain and improve customer value and improve corporate profitability.
Ref. [63] analyzed the case data of a domestic financial holding bank employing the RFM model to calculate customer lifetime value. The two-stage method was used to group customers to understand their characteristics and ultimately analyze the association rules to explore the differences between customers' portfolios under the varying clusters, which are used as a basic reference for cross-selling of banking financial products.

Association Rules
Ref. [64] proposed the concept of association rules, which is a technique used to discover the relationships between large numbers of variables within a data set. It is widely used in business decision-making processes. A classic example is the market basket analysis [65].
Ref. [66] captured head rotation angle and facial feature points from 6785 selfies to design 45 proportional features and employed the Apriori algorithm to find high-frequency item sets and their association rules. He also proposed five recommended strategies to make horizontal and vertical directional adjustments and applied the Kappa value to evaluate the performance of the recommendations.
Ref. [67] used association rules to analyze school entrance scores and grade point averages over three school years. The results showed that the English entrance scores were good, and the GPAs for all subjects were good. Past entrance score records were good, while GPAs for all subjects were weak. Past entrance scores records were weak, while GPAs for all subjects were good. Therefore, the schools can utilize this method to find the admission standards of each school department to achieve its characteristics and goals.
Ref. [68] mainly explored the online shopping behavior of customers, used association rule analysis technology to ascertain customers' online shopping characteristics, and employed clustering technology to group customers with the same characteristics for verification. Research results show that the largest online shopping demographic was 21-30-year-old students. Customers who often go online and are willing to recommend shopping platforms and products repurchase products more frequently; thus, this demographic can be categorized as loyal customers.
Ref. [69] used customer deposits of a private bank in Taiwan as data and employed association rules to estimate customer deposit behavior. The study discovered a correlation between customer attributes and product attributes, showing that customer deposit patterns can assist banks in introducing related product mixes and act as a reference for marketing strategy.
Ref. [70] utilized association rules to explore the patterns of students' product purchases with the factors of "term" and "temperature". Results of the study show that they buy milk in the morning and tea at night, and when the temperature is low, they buy hot Ovaltine drinks. Through analysis, decision makers can understand the correlation between different products and match promotions to increase store profitability.
Ref. [71] treated the e-commerce industry, which places great importance on customer lists, as an example case. Customers were segmented with RFM, customers with similar consumption patterns were discovered through association rules, and different products were then recommended to different customer groups.

Summary
According to the review of the supermarket industry, the RFM model, application of the clustering method, customer lifetime value, and association rules, literature on the application of clustering methods indicates that the application of clustering methods in other industries is far greater than in the retailing industry. This study applied the customer groupings of supermarket retailers. In terms of clustering methods in literature, it was found that Ward's Method plus K-means is more common. Fuzzy C-Means is an efficient, high-speed, and practical grouping method, and self-organizing maps is a kind of neural network-like method; therefore, this study determines which of the three clustering methods is the most effective. The basic concepts of monetary value calculations present in literature on customer lifetime value were applied to score customer lifetime value.

Research Method
This study constructs a customer lifetime value model to measure customer value. Armed with clustering models, this study helps companies find meaningful target demographics by discovering the correlations between customer groups' purchases of various products. This section is divided into six parts, including the research framework, the customer lifetime value score, and association rules.

Research Framework
This research first utilized the RFM model to convert customer transaction data into R-values, F-values, and M-values, which were divided into two groups, namely, customer RFM and product RFM. Next, Ward's Method plus K-means, Fuzzy C-Means, and self-organizing maps were used for grouping, and the method with the highest classification accuracy rate of the three was ascertained via discriminant analysis. That clustering method was then used in the subsequent steps, wherein the customer lifetime value score of each group was calculated and the clusters were ranked from high to low. Finally, association rules were utilized to ascertain the correlations among the products that customers buy, in order to group the highly correlated products and increase supermarket profits. The research framework is shown in Figure 1.

Data Description
A retailing mart database of over one million members was analyzed in the present study. The primary data were remarkably large and included not only each member's card number but also the purchase history of each member for each item, such as the quantity of items purchased each time, unit price, item discount, and discount information after subtotaling. The present study focused on identifying regular patterns from these data, which exceeded 1.5 PB (1 PB = 1024 TB). When customers made purchases, various reward schemes were used to attract them to sign up as members. Customers received an NT$100 discount coupon for future redemption after filling in their member demographic data. Member information was tabulated: for every purchase in which a membership card number was entered, the purchase details were recorded accordingly. The information thus collected served as the primary data for marketing differentiation. The customers were then classified according to their annual purchase amount, which facilitated the targeting of marketing activities at various levels based on customers' preferences.

RFM Model
This study has two parts. Part I is the customer's RFM score, and Part II is the product's RFM score. When the quintile method of [10] was applied, the data were found to be divided unequally. Since the box plot is not affected by outliers, it can describe the dispersion of the data in a relatively stable way. Therefore, this study uses the box plot method to separate the data into five parts.

Box Plot
In descriptive statistics, the box plot is a method for graphically depicting groups of numerical data through their quartiles. The box plot is a standardized way of displaying the dataset based on a five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles (Figure 2).

The steps are as follows:
Step 1, Quartile. In statistics, a quartile is a type of quantile that divides the data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three main quartiles are as follows:
1. The first quartile ($Q_1$) is defined as the middle number between the smallest number (minimum) and the median of the dataset.
2. The second quartile ($Q_2$) is the median of the dataset.
3. The third quartile ($Q_3$) is the middle value between the median and the highest value (maximum) of the dataset.
4. Calculate the quartile as Formula (1). If the quartile is an integer, calculate with Formula (2).
Step 2, Interquartile range (IQR). The interquartile range is given by Formula (3), where $Q_3$ is the third quartile and $Q_1$ is the first quartile:
$$IQR = Q_3 - Q_1 \tag{3}$$
Step 3, Inner fence. To find the inner fences, we start with the IQR and multiply it by 1.5. We then subtract this number from the first quartile and add it to the third quartile. These two numbers form the inner fences (Formulas (4) and (5)):
$$\text{lower inner fence} = Q_1 - 1.5 \times IQR \tag{4}$$
$$\text{upper inner fence} = Q_3 + 1.5 \times IQR \tag{5}$$
Data points that fall outside the inner fences are outliers and are removed.
Step 4, Maximum and minimum. Once the outliers are removed, we can find the maximum and minimum of the remaining data.
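The four box plot steps above can be sketched in Python as follows. This is a minimal illustration rather than the study's implementation: Formulas (1) and (2) are not reproduced in the text, so the quartile convention used here (`statistics.quantiles` with `method="inclusive"`) is an assumption, and the function name `box_plot_summary` is hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, inner fences, and min/max after outlier removal."""
    data = sorted(values)  # quartiles are order statistics
    # Assumption: the "inclusive" convention; other conventions differ slightly.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # Step 2, Formula (3)
    lower_fence = q1 - 1.5 * iqr       # Step 3, Formula (4)
    upper_fence = q3 + 1.5 * iqr       # Step 3, Formula (5)
    kept = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "minimum": kept[0], "maximum": kept[-1],  # Step 4
        "kept": kept,
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["minimum"], summary["maximum"], summary["iqr"])
```

In this toy input the value 100 lies above the upper inner fence, so it is dropped before the maximum is taken.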

Customer's RFM Score
According to the box plot and the weight scores of [11]:
Step 1, Check the purchase data (e.g., purchase date, average point per transaction, and average cost) according to the box plot, which displays the dataset based on a five-number summary of R, F, and M.
Step 3, Follow the weight scores of Stone [11].

Product's RFM Score
Many studies only explore the customer's last purchase date, purchase frequency, and purchase amount. However, this study holds that the products' RFM scores are also very important. Therefore, this research adds the RFM scores of the products.
Step 2, Divide all products into five categories by the customer.
Step 3, Check the transaction data of customers. Sort each product's last purchase date, number of purchases, and purchase amount.
Step 5, Refer to the weight values of [11].
Step 6, Set the weight values, where F is 5, R is 3, and M is 1.
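As a hedged illustration of how the weights in Step 6 might be applied, the sketch below combines R, F, and M scores into one weighted score. It assumes that each component has already been scored from 1 to 5 (for example, from the five box plot segments) and that the components are combined as a weighted sum; neither detail is stated explicitly in the text, and the function name `weighted_rfm_score` is hypothetical.

```python
# Weights taken from Step 6 of the text: F is 5, R is 3, and M is 1.
WEIGHTS = {"R": 3, "F": 5, "M": 1}


def weighted_rfm_score(r_score, f_score, m_score, weights=WEIGHTS):
    """Weighted sum of R, F, M scores (assumed to lie in 1..5)."""
    for score in (r_score, f_score, m_score):
        if not 1 <= score <= 5:
            raise ValueError("each RFM score is assumed to be between 1 and 5")
    return (weights["R"] * r_score
            + weights["F"] * f_score
            + weights["M"] * m_score)


# A product bought recently (R=4), very often (F=5), for small amounts (M=2).
print(weighted_rfm_score(4, 5, 2))
```

Because F carries the largest weight, two products with the same R and M scores are ranked mainly by how often they are purchased.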

Clustering
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning (Figure 3).

Euclidean Distance
The Euclidean distance between two points in either the plane or 3-dimensional space measures the length of a segment connecting the two points. This is shown in Figure 4. It is the most obvious way of representing the distance between two points $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ (Formula (12)):
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \tag{12}$$
Example: A(0, 0), B(3, 3), Euclidean distance: $d(A, B) = \sqrt{(3 - 0)^2 + (3 - 0)^2} = \sqrt{18} \approx 4.24$.

Ward's Method
In statistics, Ward's method is a criterion applied in hierarchical cluster analysis. Ward's minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward. Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. Many of the standard clustering procedures are contained in this very general class. To illustrate the procedure, Ward used the example where the objective function is the error sum of squares (Sum of Squares Error, SSE), and this example is known as Ward's method or more precisely Ward's minimum variance method (Formula (13)).
$$SSE = \sum_{i} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2 \tag{13}$$
Here i indexes the groups and j indexes the samples within a group; $x_{ij}$ is the j-th value in group i, $\bar{x}_i$ is the average of group i, $n_i$ is the number of samples in group i, and $S_i^2$ is the variance of group i.
Step 1, Treat each customer as a separate group; the error sum of squares is 0.
Step 3, Combine the two groups whose merger gives the least increase in the sum of squared errors.
Step 4, Repeat Step 3, combining groups until all are merged into one group.
Step 5, Find the merger with the smallest sum of squared errors. This gives the best grouping.
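The merging rule in the steps above can be sketched as below. This is a small pure-Python illustration under stated assumptions, not the study's code: it stops when a requested number of clusters (`n_clusters`, a hypothetical parameter) remains instead of merging down to a single group, and it recomputes the error sum of squares directly rather than using a faster update formula.

```python
def sse(cluster):
    """Error sum of squares of one cluster around its centroid."""
    n, dims = len(cluster), len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / n for d in range(dims)]
    return sum((p[d] - centroid[d]) ** 2 for p in cluster for d in range(dims))


def ward_clustering(points, n_clusters):
    """Agglomerative clustering with Ward's minimum variance criterion."""
    # Step 1: every customer starts as its own group (SSE = 0).
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair whose merger increases the total SSE the least.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                increase = (sse(clusters[a] + clusters[b])
                            - sse(clusters[a]) - sse(clusters[b]))
                if best is None or increase < best[0]:
                    best = (increase, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters


groups = ward_clustering([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
print(groups)
```

In the paper's pipeline the Ward step is combined with K-means; a common arrangement is to use the hierarchical result to choose the number of clusters or the initial centers, although the text does not spell out which is used.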

Customer Segmentation Using K-Means Clustering
K-means clustering reduces the data by categorizing or grouping similar data items together. Such grouping is common to how humans process information. Furthermore, clustering algorithms provide automated tools to help construct categories or taxonomies. These methods may also be used to reduce the effects of human factors in the process.
K-means clustering is a common partitioning method and is closely related to the SOM algorithm. In K-means clustering, the criterion function is the average squared distance of the data items $X_k$ from their nearest cluster centroids.
Step 1, Segment the customers into k groups and randomly designate the centers of the k groups (Figure 5).
Step 2, Calculate every customer's Euclidean distance to the centers (Figure 6).
Step 3, Perform the initial clustering (Figure 7). Find the new clustering center of the customers (Figure 8). Reallocate customers to the nearest cluster (Figure 9).
Step 4, Find the new clustering score of customers.
Step 5, Repeat Step 2 and Step 3, and reallocate customers to the nearest cluster.
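The K-means steps above can be sketched as follows. This is a minimal pure-Python version for illustration only; the function name `k_means`, the fixed random seed, and the stopping rule (stop when no customer changes cluster) are assumptions that the text does not specify.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 1: pick k customers at random as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign each customer to the nearest center
        # (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assumed stopping rule: no reassignment
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(customers, k=2)
print(labels, centers)
```

Each tuple could hold, for example, a customer's R, F, and M scores; the two-dimensional points here are only toy data.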

Fuzzy C-Means
Fuzzy c-means clustering was developed by [27] and improved by [28]. Fuzzy clustering (also referred to as soft clustering or soft k-means) is a form of clustering in which each data point can belong to more than one cluster.
Clustering or cluster analysis involves assigning data points to clusters such that items in the same cluster are as similar as possible, while items belonging to different clusters are as dissimilar as possible. Clusters are identified via similarity measures. These similarity measures include distance, connectivity, and intensity. Different similarity measures may be chosen based on the data or the application. Fuzzy C-Means builds on the concept of fuzzy logic, which is designed to solve problems by considering all available information and making the best possible decision given the input. Fuzzy C-Means can therefore further improve the effect of clustering (Formulas (14) and (15)).

Here m is the degree of fuzziness of the membership, with $1 \le m < \infty$; $u_{ij}$ is the membership function of observation i in cluster j; and $v_i$ is the center point of the customers in group i.
We seek to minimize the objective function (Formula (16)).
The problem is optimized by means of the Lagrange function (Formula (17)).
The best parameter $v_i$ is obtained by partial differentiation (Formula (18)).
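The alternating updates that result from this optimization can be sketched as below. The sketch uses the standard Fuzzy C-Means update rules (membership-weighted centers and distance-ratio memberships); the paper's Formulas (16), (18), and (21) are not reproduced in the text, so treating them as the standard forms is an assumption. The function name, the random initialization, and the stopping tolerance are also illustrative choices.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-9, max_iter=300, seed=0):
    """Standard FCM iteration; returns (memberships, centers)."""
    if m <= 1.0:
        raise ValueError("these update rules need a fuzzifier m > 1")
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])
    previous, centers = float("inf"), []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dims)))
        dist = [[max(math.dist(p, v), 1e-12) for v in centers] for p in points]
        # Membership update from ratios of distances to the centers.
        power = 2.0 / (m - 1.0)
        u = [[1.0 / sum((dist[i][j] / dist[i][k]) ** power for k in range(c))
              for j in range(c)] for i in range(n)]
        objective = sum(u[i][j] ** m * dist[i][j] ** 2
                        for i in range(n) for j in range(c))
        if abs(previous - objective) < tol:  # stop when the objective settles
            break
        previous = objective
    return u, centers


customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
memberships, centers = fuzzy_c_means(customers, c=2)
print([[round(x, 3) for x in row] for row in memberships])
```

Unlike K-means, each customer receives a membership degree for every cluster, and each row of memberships sums to 1; a hard assignment can be read off by taking the largest membership in each row.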
The Fuzzy C-Means (FCM) clustering algorithm partitions a numeric dataset [28]. The steps are as follows:
Step 1, Set the number of clusters.
Step 2, Define the initial values, as in Formulas (19) and (20).
Step 3, Set the objective function to zero.
Step 4, Calculate the new $v_i$ as in Formula (18).
Step 5, Recalculate the value of $u_{ij}$ (Formula (21)).
Step 6, Calculate the objective function as in Formula (16).
Step 7, Go back to Step 4 until the change in the value of the objective function is less than the threshold value.

Self-Organizing Maps
The self-organizing map (SOM) technique was developed by [35]. Self-organizing maps are neural networks that employ unsupervised learning and adapt their weights to conform to the given input data, with the goal of representing multidimensional data in a form that is easier for the human eye to understand.
In topology and related areas of mathematics, a neighborhood function is one of the basic concepts in a topological space. A neighborhood function can reduce the dimensionality of the dataset. Visualization can be used to present high-dimensional data structures with low-dimensional graphics. This is shown in Figure 10.
How to use the SOM to cluster the customers:
Step 1, Input the dataset of customers. The customer datasets are data vectors; V represents an input vector of n dimensions that is fed into the SOM system (Formula (22)).
Step 2, Set up the learning parameters: the initial weights $W_j$, the neighborhood function $K_{qj}$, and the learning speed $\eta$.
Step 3, Calculate the Euclidean distance between the data vector V and each weight vector $W_j$ to choose the winner, which is the unit $j^*$ with the shortest distance (Formula (23)):
$$j^* = \arg\min_{\forall j} \left\| V - W_j \right\| \tag{23}$$
Step 4, Use the neighborhood function to find the customers who are close to the winner (Formula (24)). $K_{qj}$ is the neighborhood value of the jth customer with respect to the winning customer q, $r_j$ is the topological coordinate of the jth customer, $r_q$ is the topological coordinate of the winner, and R is the neighborhood radius (Figure 11).
Figure 11. Parameter of neighborhood.
• Neighborhood center: Focusing on the winning customer, revise all customers in the neighboring area.
• Neighborhood radius: Take a larger radius value first; as the number of learning iterations or the time increases, the radius can be gradually reduced (Formula (25)):
$$R_n = \lambda \times R_{n-1}, \quad \lambda < 1 \tag{25}$$
• $\lambda$ is the reduction factor of the neighborhood radius, and n is the number of learning iterations. Each time the network learns, the neighborhood radius decreases once (Formula (26)).
• Neighborhood area: The area enclosed by taking the neighborhood center as the center point and the neighborhood radius as the radius (Formula (27)), where $r_j = (x_j, y_j)$ are the topological coordinates of customer j and $r_q = (x_q, y_q)$ are the topological coordinates of the winning customer.
Step 5, Update the weight vector (Formula (28)):
$$\Delta W_j = \eta \times (V - W_j) \times K_{qj} \tag{28}$$
$\Delta W_j$ is the weight correction for the jth customer, $\eta$ is the learning speed, and $K_{qj}$ is the neighborhood function of the winning customer q and the jth customer. The greater the distance between a customer and the winning customer q, the smaller the neighborhood function and the smaller the weight correction.
Step 6, Repeat from Step 3 until the self-organizing map is formed.
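The six SOM steps can be sketched as follows. This is a minimal pure-Python illustration under several assumptions: the neighborhood function is taken to be Gaussian, exp(-(d/R)^2), because Formula (24) is not reproduced in the text; the radius shrinks by the factor `lam` once per pass over the data; and the function names, grid layout, and default parameters are hypothetical.

```python
import math
import random


def best_matching_unit(v, weights):
    """Step 3: index of the unit whose weight vector is closest to v."""
    return min(range(len(weights)), key=lambda j: math.dist(v, weights[j]))


def train_som(data, rows, cols, epochs=50, eta=0.5, lam=0.9, seed=0):
    """Train a rows x cols map; returns (coords, weights)."""
    rng = random.Random(seed)
    dims = len(data[0])
    # Step 2: topological coordinates, random initial weights, and radius.
    coords = [(x, y) for x in range(rows) for y in range(cols)]
    weights = [[rng.random() for _ in range(dims)] for _ in coords]
    radius = max(rows, cols) / 2.0
    for _ in range(epochs):
        for v in data:                                   # Step 1: input V
            q = best_matching_unit(v, weights)           # Step 3: winner
            for j, w in enumerate(weights):
                # Step 4: assumed Gaussian neighborhood value K_qj.
                k_qj = math.exp(-(math.dist(coords[j], coords[q]) / radius) ** 2)
                # Step 5: delta W_j = eta * (V - W_j) * K_qj.
                for d in range(dims):
                    w[d] += eta * (v[d] - w[d]) * k_qj
        radius *= lam                                    # shrink the radius
    return coords, weights


customers = [(0.1, 0.1), (0.15, 0.1), (0.1, 0.2),
             (0.9, 0.9), (0.85, 0.9), (0.9, 0.8)]
coords, weights = train_som(customers, rows=1, cols=2)
print([best_matching_unit(v, weights) for v in customers])
```

After training, customers that map to the same winning unit are treated as one cluster. Inputs are assumed to be scaled to a comparable range (here 0 to 1) before training.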

Linear Discriminant Analysis
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification. Figures 12 and 13 show two groups. In Figure 13, when $C_1$ and $C_2$ are projected onto the $x_1$ axis and the $x_2$ axis, the two types of data overlap, so the discrimination is poor.
When $C_1$ and $C_2$ are projected onto the Fisher linear discriminant (FLD) direction, the discrimination is best. The linear discriminant function is given by Formula (29):
$$y_i = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \tag{29}$$
where $y_i$ is the discriminant function, $b_0$ is the constant term, $b_i$ is the discriminant coefficient, and k is the number of independent variables x.
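To make the linear discriminant function concrete, the sketch below fits Fisher's discriminant for two classes with two variables and returns the coefficients $b_0$, $b_1$, $b_2$. It is an illustration only: the two-class, two-variable restriction, the threshold placed at the midpoint of the class means (which assumes equal priors), and the function names are all assumptions, and the study itself applies discriminant analysis to compare clustering results.

```python
def fisher_lda_2d(class_a, class_b):
    """Fit y = b0 + b1*x1 + b2*x2; y > 0 means class_a, y < 0 class_b."""
    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(2)]

    mean_a, mean_b = mean(class_a), mean(class_b)
    # Pooled within-class scatter matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, mu in ((class_a, mean_a), (class_b, mean_b)):
        for p in pts:
            dev = (p[0] - mu[0], p[1] - mu[1])
            for r in range(2):
                for c in range(2):
                    s[r][c] += dev[r] * dev[c]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Fisher direction: inverse scatter times the difference of the means.
    diff = (mean_a[0] - mean_b[0], mean_a[1] - mean_b[1])
    b1 = inv[0][0] * diff[0] + inv[0][1] * diff[1]
    b2 = inv[1][0] * diff[0] + inv[1][1] * diff[1]
    # Assumed threshold: the midpoint between the two class means.
    mid = ((mean_a[0] + mean_b[0]) / 2, (mean_a[1] + mean_b[1]) / 2)
    b0 = -(b1 * mid[0] + b2 * mid[1])
    return b0, b1, b2


def discriminant(point, coeffs):
    """Evaluate the linear discriminant function at one point."""
    b0, b1, b2 = coeffs
    return b0 + b1 * point[0] + b2 * point[1]


group_1 = [(1, 2), (2, 3), (3, 3), (2, 1)]
group_2 = [(6, 5), (7, 8), (8, 6), (7, 7)]
coeffs = fisher_lda_2d(group_1, group_2)
print([discriminant(p, coeffs) > 0 for p in group_1 + group_2])
```

The share of customers whose predicted group matches their cluster label gives a classification accuracy rate, which is the kind of figure used to compare clustering methods.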

Customer Lifetime Value
The definition of customer value differs between the perspective of the enterprise and that of the customer. For the enterprise, it refers to how much the customer can contribute to the enterprise. For customers, it refers to the products and services that companies can provide.
Customer lifetime value can also be defined as the monetary value of a customer relationship, based on the present value of the projected future cash flows from the customer relationship. This study aims to find the company's target customers.
The customer lifetime value score is calculated after the customers are grouped. The steps are as follows:
Step 1: Transform the purchase data into RFM values.
Step 2: Set the weights to (R, F, M) = (Medium, High, Low).
Step 4: Calculate the lifetime value of each customer as the sum of the weighted values (Formula (30)): $CLV = CR^{*} + CF^{*} + CM^{*}$, where CR*, CF*, and CM* are the weighted values of the customer's R, F, and M.

Apriori Algorithm
Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński, and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets [65]. Market basket analysis is a modelling technique based on the theory that customers who buy a certain group of items are more (or less) likely to buy another group of items. For example, a customer who buys a pint of beer and does not buy a bar meal is more likely to buy crisps at the same time than somebody who did not buy beer.

Association Rules
The Apriori algorithm is an algorithm for frequent item set mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules that highlight general trends in the database; this has applications in domains such as market basket analysis (Figure 14).

The steps of the Apriori algorithm are as follows:
Step 1: Scan the database to obtain the candidate single-item sets C 1 . Candidates are extended one item at a time, and candidates that contain an infrequent sub-pattern are pruned. The result is the frequent item sets L 1 .
Step 2: Combine the frequent item sets into candidate item sets C 2 , each containing two items, and repeat the scan and pruning. The result is the frequent item sets L 2 .
Step 3: Combine the frequent item sets into candidate item sets C 3 .
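The candidate generation and pruning steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the five transactions, the shortened category labels, and the minimum support of 0.4 are hypothetical and are not taken from the supermarket database.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent item set to its support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of the item set.
        return sum(itemset <= b for b in baskets) / n

    # C1: every single item that occurs in the database.
    items = sorted({i for b in baskets for i in b})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # L_k: candidates that reach the minimum support.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join: build C_k from unions of frequent (k-1)-item sets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-item subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# Hypothetical baskets described by product category.
transactions = [
    {"raw", "instant", "livelihood"},
    {"raw", "livelihood"},
    {"raw", "instant"},
    {"instant", "livelihood"},
    {"raw", "dry"},
]
frequent = apriori(transactions, min_support=0.4)
```

With these toy baskets, three single categories and three category pairs are frequent, while the three-item set falls below the threshold and is discarded.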

Data Description
This study analyzed, filtered, and compared member data retrieved from a retail industry database. It focused on the purchase behavior of 300 members from 1 July 2013 to 31 December 2013, a total of 117,003 transaction records in six months. The primary database contains a large amount of information on the purchase history of each member, including the member's card number, the quantity of items purchased each time, unit price, item discount, and discount information after subtotaling (Table 4). Product category information includes category number and category name. Customer transaction data contain a large amount of information on the purchase history of each member (Tables 5 and 6).

When the quintile method of [10] is applied, the data tend to be divided unequally. Because the box plot is not affected by outliers, it can describe the dispersion of the data in a relatively stable way. Therefore, this study uses the box plot method to separate the data into five parts.

Score of Customers' RFM
In the box plot of the last purchase date, the distances between the last purchase date and 31 December 2013 of 42, 34, 31, 29, 27, 20, 15, 13, 12, 10, 9, and 8 days are outliers. The maximum value is 7 days and the minimum value is 0 days. The first quartile is 0 days, the median is 1 day, and the third quartile is 3 days.
This study divides the last purchase date into five parts, taking integer values: from the minimum value (0 days) to the first quartile (0 days), from the first quartile to the median (1 day), from the median to the third quartile (3 days), from the third quartile to the maximum value (7 days), and beyond the upper/lower fence values (7 days/0 days). The resulting intervals are R = 0 days, 0 < R ≤ 1 day, 1 < R ≤ 3 days, 3 < R ≤ 7 days, R > 7 days, and R < 0 days. When R = 0 days, the score of the customer's last purchase date is 5 points; when 0 < R ≤ 1 day, 4 points; when 1 < R ≤ 3 days, 3 points; when 3 < R ≤ 7 days, 2 points; and when R > 7 days, 1 point. Records with R < 0 days are deleted (Figure 15).
This study also divides the purchase frequency into five parts, taking integer values: from the third quartile (77 times) to the maximum value (87 times), from the median (74 times) to the third quartile, from the first quartile (70 times) to the median, from the minimum value (60 times) to the first quartile, and beyond the upper/lower fence values (87 times/59 times). The resulting intervals are 77 < F ≤ 87 times, 74 < F ≤ 77 times, 70 < F ≤ 74 times, 60 < F ≤ 70 times, and F > 87 times or F < 59 times.
When 77 < F ≤ 87 times, the score of the customer's purchase frequency is 5 points; when 74 < F ≤ 77 times, 4 points; when 70 < F ≤ 74 times, 3 points; when 60 < F ≤ 70 times, 2 points; and when F > 87 times or F < 59 times, 1 point (Figure 16).
When 1796 < M ≤ 3022 dollars, the score of the customer's purchase amount is 5 points; when 1276 < M ≤ 1796 dollars, 4 points; when 979 ≤ M ≤ 1276 dollars, 3 points; when 319 ≤ M ≤ 979 dollars, 2 points; and when M > 3022 dollars, 1 point (Figure 17).
The RFM scores of the customers compiled in this research are shown in Table 7. For example, the last purchase date of customer number 20000154800 is 31 December 2013, so the distance is zero days and the R score is 5; the purchase frequency is 69 times, so the F score is 2; the purchase amount is 4746 dollars, so the M score is 1. The last purchase date of customer number 20000186600 is also 31 December 2013, so the R score is 5; the purchase frequency is 92 times, so the F score is 1; the purchase amount is 1094 dollars, so the M score is 3. After the customers are coded, the scores are multiplied by the weights of R, F, and M to obtain the weighted R, F, and M values of the customers (Table 8).
For products, the products purchased by customers are first divided into five categories: "Raw food", "Instant Food", "Dry Food", "People's Livelihood Products", and "Others". Then the last purchase date, purchase frequency, and purchase amount of each category are found. The product scores are shown in Table 9. For example, the last purchase date of all five categories is 31 December 2013, so the R scores are all 5. The purchase frequencies of the five categories are 55,740, 16,775, 11,254, 20,120, and 13,114, so the F scores are 5, 3, 1, 4, and 2. The average sales prices of the five categories are 69, 87, 80, 92, and 111, so the M scores are 1, 3, 2, 4, and 5. The RFM score of Raw food is 5-5-1, of Instant Food is 5-3-3, of Dry Food is 5-1-2, of People's Livelihood Products is 5-4-4, and of Others is 5-2-5. After the products are coded, the scores are multiplied by the weights of R, F, and M to obtain the weighted R, F, and M values of the products [11] (Table 9). The characteristics of product sales are explored further and organized in Table 10.
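The box-plot scoring rules for customers can be written as three small functions. This is a sketch of the intervals as printed in the text, with two labeled assumptions: a frequency of exactly 60 times (the stated minimum) is placed in the 2-point interval, and purchase amounts below 319 dollars, which the text does not cover, receive 1 point. The function names are illustrative.

```python
def score_recency(days):
    """R score from the days between the last purchase and the reference date."""
    if days < 0:
        return None  # records with R < 0 are deleted
    if days == 0:
        return 5
    if days <= 1:
        return 4
    if days <= 3:
        return 3
    if days <= 7:
        return 2
    return 1  # beyond the upper fence of 7 days

def score_frequency(times):
    """F score from the number of purchases in the period."""
    if 77 < times <= 87:
        return 5
    if 74 < times <= 77:
        return 4
    if 70 < times <= 74:
        return 3
    if 60 <= times <= 70:  # assumption: the minimum value 60 is included
        return 2
    return 1  # outside the fences

def score_monetary(amount):
    """M score from the purchase amount in dollars."""
    if 1796 < amount <= 3022:
        return 5
    if 1276 < amount <= 1796:
        return 4
    if 979 <= amount <= 1276:
        return 3
    if 319 <= amount < 979:
        return 2
    return 1  # above 3022; amounts below 319 are an assumption

def rfm_score(days, times, amount):
    return (score_recency(days), score_frequency(times), score_monetary(amount))

# The two customers used as examples in the text.
customer_a = rfm_score(0, 69, 4746)   # customer number 20000154800
customer_b = rfm_score(0, 92, 1094)   # customer number 20000186600
```

Both calls reproduce the scores reported for the two example customers.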
Observe the number and percentage of purchases by customers in each cluster based on product sales frequency. Purchase frequency of customers buying raw food: Golden customer > Iron customer > Copper customer > Diamond customer > Silver customer. Purchase frequency of customers buying Instant Food: Iron customer > Golden customer > Silver customer > Iron customer > Diamond customer. Purchase frequency of customers buying Dry Food: Golden customer > Copper customer > Iron customer > Diamond customer > Silver customer. Purchase frequency of customers buying People's Livelihood Products: Golden customer > Copper customer > Iron customer > Diamond customer > Silver customer. Purchase frequency in which customers buy other

The Result of Cluster
This study clusters the customers with Ward's Method, K-Means, Fuzzy C-Means, and Self-Organizing Maps. Discriminant analysis is then used to check the results and verify which method works best. The results are used to calculate customer lifetime value.

Ward's Method and K-Means
The study used SPSS 12.0 statistical software for the analysis. First, the best number of clusters was found with Ward's Method. The coefficient increment rises to 101.277 from step 295 to step 296. With 300 customers, five clusters remain after step 295, so the large jump at step 296 indicates that the best number of groups is five (Table 12). K-Means then designates the centers of the K groups at random as the initial cluster centers. The initial center of group 1 is 1-2-1, of group 2 is 5-1-2, of group 3 is 1-5-5, of group 4 is 5-5-4, and of group 5 is 2-1-5 (Table 13). After five iterations, the final grouping is obtained: group 1 has 42 observations, group 2 has 78, group 3 has 45, group 4 has 94, and group 5 has 41 (Tables 14 and 15).
This study used MATLAB R2018a (MathWorks, Natick, MA, USA) for the analysis. First, the number of groups is set so that the effects of the three grouping methods can be compared; five groups are used. For Fuzzy C-Means, the membership matrix U has size c × n. Its values satisfy two conditions: each value is normalized to lie between 0 and 1, and for each point j (j = 1, ..., 300) the values sum to 1 (Table 17). After 93 iterations, the objective function is reduced from 333.26600 to 224.47860 (Table 19).
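The two conditions on the membership matrix U, and the decrease of the objective function over the iterations, can be illustrated with a short NumPy sketch of the standard fuzzy C-means updates. This is not the MATLAB routine used in the study: the fuzzifier m = 2, the random seed, and the synthetic 300 × 3 matrix of RFM scores are assumptions made for the example, so the objective values differ from those in Table 19.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-means; returns U (c x n), the centers, and the objective history."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random initial memberships, normalized so each column sums to 1.
    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)
    history = []
    for _ in range(n_iter):
        um = u ** m
        # Cluster centers: membership-weighted means of the points.
        centers = (um @ x) / um.sum(axis=1, keepdims=True)
        # Distances between every center and every point, shape (c, n).
        d = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        history.append(float((um * d ** 2).sum()))
        # Membership update, renormalized so each column sums to 1.
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return u, centers, history

# Synthetic RFM scores (1 to 5) for 300 hypothetical customers.
scores = np.random.default_rng(1).integers(1, 6, size=(300, 3)).astype(float)
U, centers, history = fuzzy_c_means(scores, c=5)
hard_labels = U.argmax(axis=0)  # assign each customer to its strongest cluster
```

Each column of `U` holds one customer's memberships in the five clusters, and `history` records the objective value at every iteration.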

Comparison of Accuracy by the Discriminant Analysis
The study used SPSS 12.0 statistical software (SPSS Inc., Chicago, IL, USA) for the analysis. Discriminant analysis was applied to the grouping results of the three methods. The correct classification rates are 96.7% for Ward's Method with K-Means, 98.3% for Fuzzy C-Means, and 90.0% for Self-Organizing Maps. Thus, the study chooses Fuzzy C-Means (Table 26).

Customer Lifetime Value
Based on the grouping results of Fuzzy C-Means, the customer lifetime value is calculated and sorted for cluster 1 to cluster 5. The maximum is 45 points and the minimum is 9 points. For example, customer number 20000356000 belongs to group 1 and has an RFM value of 3-5-3. This value is multiplied by the weights of F (high), R (medium), and M (low); the study sets the weight of F to 5, of R to 3, and of M to 1 [11]. The customer's weighted RFM value is therefore 9-25-3. Finally, the weighted RFM values are summed, giving a customer lifetime value score of 37 (Table 27). The customer lifetime value scores are then sorted, and the groups are named in descending order as Diamond, Golden, Silver, Copper, and Iron (Table 28). The characteristics of the customers in each cluster are explored further in Table 29.
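The weighted-sum calculation can be reproduced with a few lines of Python. The weights (R = 3, F = 5, M = 1) and the example customer come from the text; the function and variable names are illustrative only.

```python
# Weights stated in the text: F is high (5), R is medium (3), M is low (1).
WEIGHTS = {"R": 3, "F": 5, "M": 1}

def weighted_rfm(r, f, m, weights=WEIGHTS):
    """Multiply each RFM score by its weight."""
    return (r * weights["R"], f * weights["F"], m * weights["M"])

def lifetime_value_score(r, f, m, weights=WEIGHTS):
    """Customer lifetime value score: the sum of the weighted RFM values."""
    return sum(weighted_rfm(r, f, m, weights))

# Customer number 20000356000 with RFM value 3-5-3.
example_weighted = weighted_rfm(3, 5, 3)
example_score = lifetime_value_score(3, 5, 3)

# Bounds of the score when every RFM score is 5 or 1.
max_score = lifetime_value_score(5, 5, 5)
min_score = lifetime_value_score(1, 1, 1)
```

The example customer obtains the weighted value 9-25-3 and a score of 37, and the bounds match the stated maximum of 45 and minimum of 9.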

Discussion
First, the study transforms each customer's last purchase date, purchase frequency, and purchase amount into R, F, and M values using the box plot. Second, Ward's Method shows that the best number of groups is five. Third, the customers are clustered with K-Means, Fuzzy C-Means, and Self-Organizing Maps. Finally, the results are validated by discriminant analysis.

Research Results
Three research methodologies were applied in this study to classify customers. Discriminant analysis was then employed to determine which clustering method was most effective and to select the results from which to calculate customer lifetime value.

Ward's Method with K-Means
Three discriminant functions were extracted with this method, all of which are shown in Table 30. The first discriminant function explains 58.1% of the information, the second explains 34.8%, and the third explains 7.1%. The greater the eigenvalue, the stronger the discriminative power. The eigenvalue of the first discriminant function is 3.544, so its discriminative power is the strongest of the three. In the significance test of the discriminant functions shown in Table 31, the Wilks' Lambda value is the ratio of the Within-Groups Sum of Squares, also known as the Residual Sum of Squares (RSS), to the Total Sum of Squares (TSS). The smaller the Wilks' Lambda value, the better the discrimination; the Wilks' Lambda value of the first discriminant function is 0.049, so its discriminant power is the greatest of the three. As shown in Table 31, the significance of all three discriminant functions is 0.000, indicating a relatively high significance for all of them. Fisher's linear discriminant functions are shown in Table 32. The customers were separated into five clusters in this research, each of which corresponds to one discriminant function; the five discriminant functions are Formulas (31)-(35).
The discriminant analysis in Table 33 shows that there are 42 customers in Cluster 1, of which 41 were classified correctly and 1 was misclassified, for an accuracy of 97.6%. There are 78 customers in Cluster 2, all of which were classified correctly, for an accuracy of 100%. There are 45 customers in Cluster 3, all of which were classified correctly, for an accuracy of 100%. There are 94 customers in Cluster 4, all of which were classified correctly, for an accuracy of 100%. There are 41 customers in Cluster 5, of which 32 were classified correctly and 9 were misclassified, for an accuracy of 78%. Of the 300 customers in total, 290 were classified correctly, for an accuracy of 96.7%.
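The per-cluster and overall accuracies follow directly from the counts reported above. The short sketch below recomputes them; the counts are the ones stated for Ward's Method with K-Means, and the variable names are illustrative.

```python
# Cluster sizes and correctly classified customers, as reported in the text.
cluster_sizes = [42, 78, 45, 94, 41]
correct_counts = [41, 78, 45, 94, 32]

def accuracy_percent(correct, total):
    """Share of correctly classified customers, in percent with one decimal."""
    return round(100.0 * correct / total, 1)

per_cluster = [
    accuracy_percent(c, n) for c, n in zip(correct_counts, cluster_sizes)
]
total_customers = sum(cluster_sizes)
total_correct = sum(correct_counts)
overall = accuracy_percent(total_correct, total_customers)
```

The same calculation applies to the other two clustering methods once their classification counts are substituted.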

Fuzzy C-Means
Three discriminant functions could be extracted with this method; their eigenvalues are shown in Table 34. The first discriminant function explains 49.6% of the information, the second explains 34.9%, and the third explains 15.6%. The larger the eigenvalue, the stronger the discriminant power. The eigenvalue of the first discriminant function is 2.929, so it has the strongest discriminant power among the three. The significance test of the discriminant functions is shown in Table 35. The Wilks' Lambda value is the ratio of the Within-Groups Sum of Squares, also known as the Residual Sum of Squares (RSS), to the Total Sum of Squares. The smaller the Wilks' Lambda value, the greater the discriminant power of the discriminant function. The Wilks' Lambda value of the first discriminant function is 0.043, so it has the strongest discriminant power among the three. Table 35 shows that the significance of all three discriminant functions is 0.000, indicating that they all have relatively high significance. Fisher's linear discriminant functions are shown in Table 36. The customers were separated into five clusters in this study, each of which corresponds to one discriminant function; the five discriminant functions are Formulas (36)-(40).
The discriminant analysis in Table 37 shows that there are 50 customers in Cluster 1, all of which were classified correctly, for an accuracy of 100%. There are 59 customers in Cluster 2, of which 55 were classified correctly and 4 were misclassified, for an accuracy of 93.2%. There are 73 customers in Cluster 3, all of which were classified correctly, for an accuracy of 100%. There are 66 customers in Cluster 4, of which 65 were classified correctly and 1 was misclassified, for an accuracy of 98.5%. There are 52 customers in Cluster 5, all of which were classified correctly, for an accuracy of 100%. Of the 300 customers in total, 295 were classified correctly, for an accuracy of 98.3%.

Self-Organizing Map
Three discriminant functions could be extracted with this method; their eigenvalues are shown in Table 38. The first discriminant function explains 56% of the information, the second explains 43.5%, and the third explains 0.5%. The greater the eigenvalue, the stronger the discriminant power. The eigenvalue of the first discriminant function is 3.929, so it has the strongest discriminant power among the three. The significance test of the discriminant functions is shown in Table 39. The Wilks' Lambda value is the ratio of the Within-Groups Sum of Squares, also known as the Residual Sum of Squares (RSS), to the Total Sum of Squares. The smaller the Wilks' Lambda value, the better the discriminant function. The Wilks' Lambda value of the first discriminant function is 0.048, so its discriminant power is the strongest among the three. Table 39 shows that the significance of two discriminant functions is 0.000, indicating that these two discriminant functions are more significant than the third. Fisher's linear discriminant functions are shown in Table 40. The customers were separated into five clusters in this research, each of which corresponds to one discriminant function; the five discriminant functions are Formulas (41)-(45).
The discriminant analysis in Table 41 shows that there are 55 customers in Cluster 1, of which 53 were classified correctly and 2 were misclassified, for an accuracy of 96.4%. There are 55 customers in Cluster 2, of which 49 were classified correctly and 6 were misclassified, for an accuracy of 89.1%. There are 69 customers in Cluster 3, of which 61 were classified correctly and 8 were misclassified, for an accuracy of 88.4%. There are 47 customers in Cluster 4, of which 43 were classified correctly and 4 were misclassified, for an accuracy of 91.5%. There are 74 customers in Cluster 5, of which 64 were classified correctly and 10 were misclassified, for an accuracy of 86.5%. Of the 300 customers in total, 270 were classified correctly, for an accuracy of 90%.
According to the discriminant analysis of the grouping results of the three methods, the correct rates are 96.7%, 98.3%, and 90.0%. Thus, the study chose Fuzzy C-Means to calculate the customer lifetime value score. This research divides the customers into five types and names them Diamond, Golden, Silver, Copper, and Iron. For customers, we provide the strategies in Table 42.
Table 42. Strategies of this study.

… Corporation, Mitsubishi, Nestlé, Tata Motors, and Tesco use customer value strategies to build rational and emotional bonds with their target markets. The author of this paper works in the third largest supermarket in Taiwan, is responsible for marketing analysis, and is in charge of a database with more than one million members. The author has always been concerned about how to allocate the supermarket's marketing resources and therefore entered the academic field to look for a solution. After a substantive survey and literature review, the author found that most of the academic community uses a single method, such as the RFM method, to analyze large databases; in practice, however, this approach cannot fully identify the correct customers. Therefore, the author sorted out the methods presented in previous studies one by one, conducted data analysis, and completed this paper, hoping to share it with other scholars and inspire further research.
This study analyzes the member database of Taiwan's third largest supermarket. The data are easy to acquire and involve no copyright issue, and the key aim is to share the technique. In the future, combining other databases of a different nature in the research could further prove its feasibility.
The recent food security scandals in Taiwan have caused substantial management difficulties for the retail industry. Subsequently, the optimal response to these challenges largely focused on the affluent top-level customers. Therefore, customer value analysis has become increasingly crucial. Valuable customer clusters were identified on the basis of the detailed value characteristics of all clusters. Comparative analysis and verification were conducted with regard to the life value of all customer clusters.

1. Diamond-level customers: Diamond-level customers account for 17.3% of the total members. This customer group is characterized by a high degree of activeness, high customer loyalty, and low customer contribution, and it has the highest lifetime value. To increase the purchase amount of this customer group, the supermarket can place discounted products behind the cashier to attract additional purchases; observe the products selected by customers and recommend additional products to them; and hold promotional events, such as a mark-down for buying red-label and green-label products together.
2. Gold-level customers: Gold-level customers account for 24.3% of the total members. This customer group is characterized by a high degree of activeness, moderate customer loyalty, and high customer contribution, and it has the second highest lifetime value. To increase the purchase frequency of this customer group, the supermarket can convert the amount spent by customers directly into bonus points that can be used for a mark-down at the next visit, and give gifts or coupons for a single purchase above a certain amount.
3. Silver-level customers: Silver-level customers account for 16.7% of the total members. This customer group is characterized by a low degree of activeness, high customer loyalty, and low customer contribution, and it has an average lifetime value. To improve the last purchase date and purchase amount of this customer group, the supermarket can hold regular events such as special holiday promotions, buy-one-get-one-free offers, and weekday benefits.
4. Copper-level customers: Copper-level customers account for 22.0% of the total members. This customer group is characterized by a high degree of activeness, low customer loyalty, and low customer contribution, and it has a below-average lifetime value. To increase the purchase frequency and purchase amount of this customer group, the supermarket can regularly send product catalogs, coupons, and event newsletters to keep customers updated on the latest news of the supermarket.
5. Iron-level customers: Iron-level customers account for 19.7% of the total members. This customer group is characterized by a low degree of activeness, low customer loyalty, and high customer contribution, and it has the lowest lifetime value. To improve the latest purchase date and purchase frequency of this customer group, the supermarket can send catalogs featuring the Top Ten Popular Products Recommended by Netizens to attract customers to revisit, and send product sample kits (for example, shampoo or masks) to rekindle customers' interest in supermarket products.

In Terms of Products, Provide the Following Strategies
According to the association rules, customers habitually buy Raw food together with People's Livelihood Products, Raw food together with Instant Food, and Instant Food together with People's Livelihood Products. Therefore, we place Raw food and People's Livelihood Products together, Raw food and Instant Food together, and Instant Food and People's Livelihood Products together.
In terms of products, product bundling, gifts, or promotions can be carried out. For example, if the customer buys Raw food, a People's Livelihood Product is given for free, and so on. This can increase supermarket profits.

Research Limitations
Due to the limitations of this study, information on certain aspects cannot be obtained, as described below:
1. The database is based on one supermarket in Taiwan, so it is uncertain whether other supermarkets in Taiwan also conform to the results of this study.
2. The database consists of members of the supermarket, so it cannot reveal the transaction status of nonmembers.
3. The database is not a questionnaire, so it cannot reveal customer satisfaction with products, services, environment, etc.
4. The amount recorded in the database does not specify whether it is a promotional price, so it cannot be determined whether the amount is discounted.
5. Because many members share the same membership card for shopping, the analysis is distorted; solutions need to be found in the future.
6. This study only analyzes and discusses the data of one supermarket. In the future, it is recommended to combine the customer transaction data of other supermarkets into a large database for study; it is also suggested to compare the results with those of foreign supermarkets and to develop and examine hypotheses, so as to conduct a variance analysis to discover whether the effect is significant.
7. In terms of subject matter, scholars can study other retail entities, such as department stores and convenience stores, or other industries, such as finance and insurance or leisure and entertainment.
8. In terms of customer segmentation, other clustering methods can be used, such as the Expectation Maximization method (EM), the Density-based Spatial Clustering of Applications with Noise algorithm (DBSCAN), and Mean Shift (MS); their differences can then be analyzed to find the clustering method with the highest accuracy.