A Hybrid Two-Phase Recommendation for Group-Buying E-commerce Applications

Featured Application: A hybrid two-phase recommendation method for group-buying-based e-commerce applications.


Introduction
In recent years, the rapid development of the Mobile Internet [1] and Web 2.0 [2,3] has not only transformed people's way of life but has also created new business models and economic behaviors, enabling a culture of sharing economy. In particular, the last two decades have witnessed explosive growth of e-commerce applications [4,5], and a great number of users now shop online and share data. According to the data analytics firm ComScore (http://www.comscore.com/), sales on Cyber Monday, the biggest online shopping day in the US, reached $1.35 bn. In China, Alibaba broke its own sales record on China's Singles' Day in 2017, with 168.2 bn yuan (about $26.4 bn). An enormous amount of data has been produced and continues to grow exponentially in e-commerce applications, and this large-scale data poses a Big Data analytics problem for practitioners [6]. Moreover, the data of group-buying websites, a popular transaction model for online shopping [6,7], are even more complex and dynamic due to the uncertainty, diversity and randomness of user behaviors. Group-buying, also known as collective buying, offers products and services at significantly reduced prices on the condition that a minimum number of buyers make the purchase. Group-buying can take place anytime and anywhere: discounted prices are obtained from retailers when a certain group of people are willing to buy the same item. As a result, the data of group-buying are sparser than those of other e-commerce models. Efficiently and effectively recommending services/items to users in such an application scenario has become a significant challenge [7].
In typical online recommendation systems, the results of non-personalized recommendation lack interpretability and are not well accepted by users because users' personalized requirements are not considered [8]. As a result, developing effective personalized recommendation approaches is important for high-quality service recommendation. Neighborhood-based collaborative filtering (CF) approaches [9][10][11][12][13], which include user-based and item-based variants, have become the most popular technique for personalized recommendation. The user-based CF approach finds a set of users who have favor patterns similar to a given user (i.e., the "neighbors" of the user) and recommends to the user the items that the users in this set like. The item-based CF approach provides a user with a recommendation for an item based on other items with high correlations (i.e., the "neighbors" of the item). In all collaborative filtering methods, a significant step is to find the neighbors of users (or items), that is, a set of similar users (or items). However, current CF methods suffer from problems such as data sparsity and recommendation inaccuracy.
With the rapid development of Big Data techniques and platforms, effective implementation of recommendation approaches in distributed computing platforms such as MapReduce [7] has been explored, which gains good scalability and efficiency in Big Data environments. On the other hand, many studies have proven that online recommendation systems using hybrid recommendation approaches can achieve enhanced effects. Hybrid approaches mainly include collaborative filtering, demographic filtering [14], location information [15,16], clustering [17,18], content filtering [19] and Bayesian networks [20].
In this paper, a hybrid two-phase recommendation (TPR) method is proposed, combining collaborative filtering with clustering techniques. The main contributions of this work are summarized as follows: - To alleviate the data sparsity problem, item features and user behaviors were fully investigated; feature description approaches were designed to construct the feature matrix. -Based on item clustering, the user-item category tendency was defined to integrate users' preferences with users' concern degrees for item category. In addition, a concept of integrating similarity between users is proposed by considering user behaviors and frequencies of these behaviors. -A parallelized strategy of execution is proposed to improve the capability of dealing with massive data with the recommendation process. To be specific, after item clustering, the rating for new items (not rated) was predicted and taken as supplementation of the feature matrix. Meanwhile, considering item clusters, user clustering was performed on the basis of the integrating similarity and the user-item tendency matrix. -An improved K nearest neighbors (KNN) method was designed to generate a personalized recommendation list. The key idea was to determine the nearest neighbors by measuring the similarity between the target user and related clusters, which were selected by comparing cluster centers with a predefined similarity threshold. - The rest of this paper is organized as follows: Section 2 introduces recommendation techniques and the clustering-based two-phase recommendation. Section 3 analyzes the research problem. Section 4 illustrates the proposed HTPR method in detail. Section 5 shows the experimental results, and Section 6 concludes the paper and discusses future work.

Recommendation Techniques
Recommendation systems, first proposed by Resnick [21], have become popular solutions that provide users with predictions and appropriate recommendations by utilizing diverse sources of information. A recommendation system was originally defined as a system that generates personalized recommendations as output to guide the user in an individual way to interesting or useful services in a large space of potential options [21][22][23][24][25][26]. In general, a recommendation system has: (1) a feature input, i.e., the information that describes the user's preferences and item features in a specific data structure; (2) a recommendation engine that combines the analysis of user characteristics with the building of recommendation models to generate suggestions; (3) an output, i.e., information presented in various forms, including rating predictions and Top-N recommendations.
In basic CF methods, the target user receives recommendations of items preferred by similar users, or of items that are similar to the items the user prefers. The former is called user-based CF (UCF), while the latter is called item-based CF (ICF); both are memory-based CF. Memory-based CF [27] is a common approach extensively applied in e-commerce platforms such as Amazon and Taobao. Memory-based approaches identify the similarity between two users by comparing their ratings on a set of items [28], and are simple and intuitive at the conceptual level. Nevertheless, they suffer from two drawbacks: sparsity and scalability. The data sparsity problem [29] in the user-item matrix has greatly limited the applicability of this kind of CF, since it is difficult to make rating predictions from sparse training data. Some approaches have been proposed to address this problem, such as default voting methods [30], imputation-boosted techniques [31], and clustering-based approaches [32]. In general, unknown ratings are filled with constant values without considering actual variance. Moreover, the computational complexity of these methods grows linearly with the number of users and items.
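As a concrete illustration of the memory-based idea, the following sketch computes user-user similarities from a toy rating matrix and predicts an unknown rating from the most similar raters. The cosine similarity, matrix values and neighborhood size k are illustrative assumptions, not the configuration of any cited system:

```python
import numpy as np

def user_similarity(R):
    """Cosine similarity between the rating rows of all users (0 = unrated)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # avoid division by zero
    unit = R / norms
    return unit @ unit.T

def predict(R, u, j, k=2):
    """Predict user u's rating for item j from the k most similar raters."""
    sim = user_similarity(R)[u]
    raters = [v for v in np.argsort(-sim) if v != u and R[v, j] > 0][:k]
    if not raters:
        return 0.0
    w = sim[raters]
    return float(w @ R[raters, j] / np.abs(w).sum())

# Toy 4-user x 3-item rating matrix; rows are users, 0 means "not rated".
R = np.array([[5, 3, 0],
              [4, 3, 1],
              [1, 1, 5],
              [5, 4, 0]], dtype=float)
print(round(predict(R, 0, 2), 3))   # rating of item 2 predicted for user 0
```

The prediction is a similarity-weighted average of the neighbors' ratings, which is the basic memory-based scheme the paragraph describes.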
Model-based CF finds the relations between various items by analyzing the user-item matrix and then obtains the list of recommendations based on these relations. In light of the scalability issues, a prediction model is produced in model-based CF [33], mostly by machine learning techniques, to predict the unknown data [34]. Commonly used model-building methods include neural networks [35], matrix factorization [36], and latent semantic models [37]. User-to-user correlations are represented by the relations of their ratings, without regard to the specific features and attributes of items; analyzing users' interests from such insufficient information is not reliable. Besides, these models are to some extent time-consuming to build and update [38]. Rafailidis et al. [39] proposed a new measure of user-preference dynamics and a model of user-item interactions over time by applying a tensor that takes time as a dimension.
The mainstream recommendation techniques are listed and compared in Table 1. It can be seen from the table that these methods perform differently in terms of cold-start [40], data sparsity, scalability, accuracy and interpretability.
The basic idea of recommendation techniques is to mine the similarity of data. Existing research efforts are mainly based on the following considerations: (1) content filtering: recommending services similar to those the user liked in the past; (2) social filtering: recommending services liked by the user's friends or by other users who had similar preferences in the past; (3) collaborative filtering: finding the similarity of items, or the similarity of users, by mining their historical behaviors, and generating recommendations accordingly.

Clustering Based Two-Phase Recommendation
Existing literature has addressed the improvement of the CF algorithm through clustering analysis. For example, in Reference [41], users are grouped into clusters to find a match between the target user and each user group. Figure 1 illustrates a two-phase strategy for a recommendation based on clustering techniques, which consists of two phases, i.e., offline clustering and online recommendation. Before making rating predictions according to neighbors, users or items are aggregated first to form clusters, from which the nearest neighbors are determined through similarity calculation. Once clusters of the target user are determined, rating predictions can be made in view of the ratings of nearest neighbors of the small-scale clusters in the phase of online recommendation. In this way, the computational complexity of the CF algorithm is optimized.

[Figure 1. The two-phase recommendation strategy based on clustering: an offline clustering phase (feature input / dataset → clustering → clusters) and an online recommendation phase (similarity computation → K nearest neighbors → rating prediction → recommendation).]

However, most of these methods focus on clustering analysis of items only or of users only. Furthermore, due to the sparsity of the user-item rating matrix, the deviation of the similarity calculation is quite large, making it difficult to guarantee the quality of recommendation services. This paper proposes a hybrid recommendation method considering both item clustering and user clustering.


Problem Statement
Information in a recommendation system consists of data of users and items, including user features, item attributes, and the relations between users and items. A recommendation system involves users, items, and their relations. Here, relations refer to user behaviors on items, such as searches, purchases and ratings. User features refer to the basic user information submitted during the user registration process. Item attributes include all the information about an item. As indicated in Figure 2, there are three kinds of relations in the recommendation system: the user-user feature (U2F) relation, the user-item (U-I) relation, and the item-item attribute (I2A) relation. U-I relations are the most important, as they are used by a collaborative filtering algorithm to generate recommendations. Traditional CF methods rely on the user-item rating matrix, and recommendations are generated through the similarities of users or items. However, the accuracy of their recommendations is not always satisfactory due to the sparsity of the matrix. A user often rates only the items that he/she is interested in; this low rating probability leads to a serious lack of rating data and further degrades the quality of recommendation based on CF methods.
Based on the analysis above, we make full use of the objects and information in recommendation systems, such as item metadata and user behavior information. The extracted data are used to supplement the feature matrix.

Feature Description
Generally, a recommendation system involves data such as item features and user behaviors. Apart from rating information, browse information, add-to-favorite records, purchases and other user behaviors are integrated to complete the description of U-I relations. On the other hand, item feature information is particularly important for the I2A relation, and the features of an item can be extracted and regarded as a complement to the U-I relation.

Feature Description of Item Attribute
An item generally contains a variety of attributes, which are inherent in the item and vary from item to item. The feature description of items based on attributes is a true reflection of the I2A relation.
Assuming that an item includes multiple attributes, each item is represented by a p-dimensional vector (each item has at most p attributes). An item/attribute matrix with m items is denoted as A = [a_{j,φ}]_{m×p}; item j is represented by (a_{j,1}, a_{j,2}, ..., a_{j,φ}, ..., a_{j,p}), where a_{j,φ} ∈ {0, 1} is the φ-th attribute of item j: a_{j,φ} = 1 if item j has the φ-th attribute, and a_{j,φ} = 0 otherwise.
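As a minimal sketch, the binary item/attribute matrix can be built from item metadata as follows; the attribute vocabulary and the items are invented purely for illustration:

```python
import numpy as np

# Hypothetical attribute vocabulary (p = 4) and item metadata (m = 3).
attributes = ["food", "beverage", "discount", "delivery"]
items = {
    "pizza_deal":  ["food", "discount", "delivery"],
    "coffee_deal": ["beverage", "discount"],
    "spa_deal":    ["discount"],
}

# A[j][phi] = 1 if item j has the phi-th attribute, else 0.
A = np.array([[1 if a in attrs else 0 for a in attributes]
              for attrs in items.values()])
print(A)
```

Each row is the p-dimensional 0/1 attribute vector of one item, matching the definition above.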

Feature Description of User Behavior
User behaviors can be categorized as explicit and implicit behaviors. Table 2 lists common user behaviors. The feature description of user behaviors consists of the user-item browse matrix, the user-item wish list matrix, the user-item purchase matrix, the user-item rating matrix, and the item review tag matrix.

Table 2. Common user behaviors.
Explicit behavior: rating, vote, share/forward, add-to-favorite, purchase, review.
Implicit behavior: click/browse, time on page.

User-item browse matrix: count the number of times a user browses an item over a period of time and store the counts in an n × m matrix. User-item wish list matrix: record the items added to the favorites by a user over a period of time and store them in W = [w_{i,j}]_{n×m}, where w_{i,j} ∈ {0, 1}; if the user i added the item j to the favorites, then w_{i,j} = 1. User-item purchase matrix: count the number of purchases by a user over a period of time and store the counts in an n × m matrix. User-item rating matrix: record the rating of a user on an item and store it in R = [r_{i,j}]_{n×m}, where r_{i,j} represents the rating of the user i on the item j on a scale of zero to four, i ∈ [1, n], j ∈ [1, m].
Item review tag matrix: extract feature tags that represent community opinions on an item from the review content by using textual analysis, and store them in an m × q matrix.
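The behavior matrices above can be assembled by aggregating a per-period event log; the event-tuple format and variable names below are illustrative assumptions:

```python
import numpy as np

n_users, n_items = 3, 4
# Hypothetical event log: (user, item, behavior) tuples for one period.
events = [(0, 1, "browse"), (0, 1, "browse"), (0, 1, "favorite"),
          (1, 2, "browse"), (1, 2, "purchase"), (2, 0, "browse")]

browse   = np.zeros((n_users, n_items), dtype=int)   # counts of browses
wishlist = np.zeros((n_users, n_items), dtype=int)   # 0/1 add-to-favorite flags
purchase = np.zeros((n_users, n_items), dtype=int)   # counts of purchases

for u, i, b in events:
    if b == "browse":
        browse[u, i] += 1
    elif b == "favorite":
        wishlist[u, i] = 1
    elif b == "purchase":
        purchase[u, i] += 1

print(browse[0, 1], wishlist[0, 1], purchase[1, 2])   # 2 1 1
```

Browse and purchase matrices hold frequencies, while the wish list matrix is a 0/1 indicator, as in the definitions above.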

Overview of the Proposed Solution
The proposed HTPR method consists of two phases: offline preparation and online recommendation, as shown in Figure 3. Firstly, in order to achieve a high-quality recommendation effect, we consider both item features and user behaviors in the process of feature input and description. Secondly, to explore effective recommendation techniques capable of dealing with massive data, a parallelized execution strategy was designed to optimize the recommendation process. After item clustering, feature supplementation and user clustering were carried out in a parallelized way. A brief introduction of the key components and concepts of the proposed HTPR method is given in the following:
- Feature input and description. Feature information was input to the recommendation system after feature description. The input features involved the content of items, i.e., the item/attribute matrix, and user behaviors including browse, add-to-favorite, purchase, review and rating (described as the user-item browse matrix, user-item wish list matrix, user-item purchase matrix, item review tag matrix, and user-item rating matrix, respectively).
- Item clustering. After the feature combination of item review tags and item attributes was obtained, item clusters were generated by the K-Means algorithm.
- Feature supplementation. For each item without a rating in the user-item rating matrix, the most similar neighbors from its cluster were picked out, and its rating was then predicted based on the ratings of its nearest neighbors. In this way, the user-item rating matrix was supplemented.
- Integrating similarity. The preference of a user for an item category was defined based on the results of item clustering, and the concern degree of the user for the item category was defined based on the user's historical behaviors. By integrating preference and concern degree, the tendency of the user for the item category could be defined, from which the integrating similarity could be calculated.
- User clustering. Users were clustered according to the user-item category tendency matrix, yielding user clusters and their cluster centers.
- Online recommendation. Some clusters were selected from all user clusters by comparing cluster centers with a predefined similarity threshold, and the set of nearest neighbors was then obtained by measuring the similarity between the target user and the users of the selected clusters. Next, the ratings of all candidate items were predicted for the target user based on the ratings of the nearest neighbors. Finally, a personalized service recommendation list (Top-N list) was generated and the most appropriate items were recommended to the target user.

Feature Supplementation Based on Item Clustering
By using item clustering, similar neighbor items were selected for items without ratings in order to predict those ratings. As a result, the user-item rating matrix could be supplemented.
The feature input of a recommendation system can be transformed into optimization problems of feature extraction and feature combination. The function of feature combination is to combine the basic features extracted from different data sources, which contributes to the completeness of the feature input. In the process of item clustering, the feature input is constituted by the item/attribute matrix and the item review tag matrix. Item attributes contain the metadata of items, while item review tags are extracted from the content of reviews and represent the community opinions of users on an item.

Definition 1.
For an item set with m items and p kinds of attributes, with q tags extracted from the review content of users, the item/attribute matrix and the item review tag matrix can be represented as (1) and (2), respectively:

A = [a_{j,φ}]_{m×p},  (1)
T = [t_{j,ψ}]_{m×q}.  (2)

As shown in Table 3, the feature combination can be represented as an m × (p + q) matrix, denoted as:

S = [s_{j,k}]_{m×l},  l = p + q.  (3)

For two arbitrary items u = (s_{μ,1}, s_{μ,2}, ..., s_{μ,k}, ..., s_{μ,l}) and v = (s_{ν,1}, s_{ν,2}, ..., s_{ν,k}, ..., s_{ν,l}), the similarity of the items is measured by their distance:

sim(u, v) = Σ_{k=1}^{l} λ_k / (1 + d(s_{μ,k}, s_{ν,k})),  (4)

where λ_k (0 ≤ λ_k ≤ 1) is the weight of the k-th feature; d(s_{μ,k}, s_{ν,k}) denotes the absolute value of the distance between the items u and v for the k-th feature; the similarity of the items u and v for the k-th feature is thus 1/(1 + d(s_{μ,k}, s_{ν,k})). In this paper, the commonly used K-Means clustering algorithm is applied for item clustering (Algorithm 1).

Algorithm 1: Item-Clustering-K-Means
Input: the item set E = {e_1, e_2, ..., e_m}; the item feature matrix S_{m×l}; the number of clusters k_1
Output: the item clusters E_1, E_2, ..., E_{k_1}; the cluster centers c_1, c_2, ..., c_{k_1}
1: select k_1 items from the item set E at random, i.e., e_1, e_2, ..., e_{k_1}, as the initial cluster centers
2: for each item e ∈ E
3: calculate its similarity with each cluster center, in turn, according to (4), and assign it to the most similar cluster
4: end for
5: for each cluster
6: adjust the cluster center according to the average value of all items in the cluster, i.e., c_i = (1/n_i) Σ_{e∈E_i} e, i = 1, 2, ..., k_1, where n_i is the number of items in the i-th cluster
7: end for
8: return to step 2 until the square error of the clustering criterion function converges

Therefore, with the use of Algorithm 1, the item set E is divided into k_1 categories (namely clusters), i.e., E_1, E_2, ..., E_{k_1}. According to the analysis above, two item similarity calculation methods are introduced in the following.
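A compact sketch of the item-clustering step in NumPy is shown below. For brevity it uses plain Euclidean distance on the combined feature matrix rather than the weighted similarity of Equation (4), and the toy feature rows are invented:

```python
import numpy as np

def kmeans(S, k1, iters=20, seed=0):
    """Cluster the m x l item feature matrix S into k1 clusters."""
    rng = np.random.default_rng(seed)
    centers = S[rng.choice(len(S), size=k1, replace=False)]
    for _ in range(iters):
        # Assign each item to the nearest cluster center.
        dists = np.linalg.norm(S[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned items.
        for i in range(k1):
            if np.any(labels == i):
                centers[i] = S[labels == i].mean(axis=0)
    return labels, centers

# Toy combined item features (attributes + review tags), two obvious groups.
S = np.array([[1, 1, 0, 0], [1, 0.9, 0, 0],
              [0, 0, 1, 1], [0, 0, 0.9, 1]], dtype=float)
labels, _ = kmeans(S, k1=2)
print(labels)   # items 0 and 1 share one cluster; items 2 and 3 the other
```

Swapping the distance for the weighted similarity of Equation (4) only changes the assignment step; the assign/update loop is the same.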
Given that the number of users is n, for the user u and the user v, their union set of rated items is denoted as E_uv = E_u ∪ E_v, and the set of items without a rating from the user u is denoted as N_u = E_uv − E_u. For an item e (e ∈ N_u) that has not been rated by the user u, if x is an arbitrary item of the cluster that e belongs to, then e, x ∈ E_i, i ∈ [1, k_1].
If e is not a new item (i.e., it has been rated by other users), then the similarity of the item e and the item x can be calculated by the Pearson correlation:

sim(e, x) = Σ_{u∈U_ex} (r_{u,e} − r̄_e)(r_{u,x} − r̄_x) / ( √(Σ_{u∈U_ex} (r_{u,e} − r̄_e)²) · √(Σ_{u∈U_ex} (r_{u,x} − r̄_x)²) ),  (5)

where U_ex is the set of all users that have provided ratings for both the item e and the item x; r̄_e and r̄_x are respectively the average ratings of all users for the item e and the item x; r_{u,e} and r_{u,x} respectively denote the ratings of the user u for the item e and the item x.
If e is a new item (not rated by any user), then the similarity of the item e and the item x can be calculated based on the item feature matrix, according to Equation (4):

sim(e, x) = Σ_{k=1}^{l} λ_k / (1 + d(s_{e,k}, s_{x,k})).  (6)

According to Equations (5) and (6), the set of nearest neighbors K1NN(e) was obtained, comprised of the k_1 items most similar to the item e (not rated) in the same cluster. These nearest-neighbor items were used to predict the rating:

P_{u,e} = Σ_{x∈K1NN(e)} sim(e, x) · r_{u,x} / Σ_{x∈K1NN(e)} |sim(e, x)|.

A null value in the user-item rating matrix means that the user has not rated the item; such null values are replaced by the predicted ratings above. For the user-item rating matrix after feature supplementation, i.e., FR = [R_{i,j}]_{n×m}, the rating of the user u for the item e can be represented as follows:

R_{u,e} = r_{u,e} if the user u had rated the item e, and R_{u,e} = P_{u,e} otherwise.
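The feature supplementation step can be sketched as follows, assuming zero denotes "not rated", Pearson similarity over co-raters, and a weighted average over the most similar same-cluster items (a plausible reading of the step, not the paper's exact formulas):

```python
import numpy as np

def item_pearson(R, e, x):
    """Pearson correlation of items e and x over users who rated both (0 = unrated)."""
    both = (R[:, e] > 0) & (R[:, x] > 0)
    if both.sum() < 2:
        return 0.0
    a, b = R[both, e], R[both, x]
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float(da @ db / denom) if denom > 0 else 0.0

def supplement(R, labels, k=2):
    """Fill each unrated cell from the k most similar same-cluster items."""
    FR = R.astype(float).copy()
    m = R.shape[1]
    for u in range(R.shape[0]):
        for e in range(m):
            if R[u, e] > 0:
                continue
            cand = [x for x in range(m)
                    if x != e and labels[x] == labels[e] and R[u, x] > 0]
            sims = sorted(((item_pearson(R, e, x), x) for x in cand),
                          reverse=True)[:k]
            denom = sum(abs(s) for s, _ in sims)
            if denom > 0:
                FR[u, e] = sum(s * R[u, x] for s, x in sims) / denom
    return FR

R = np.array([[5, 4, 0], [4, 3, 2], [2, 1, 1]], dtype=float)
labels = np.array([0, 0, 0])      # toy case: all items in one cluster
FR = supplement(R, labels)
print(FR[0, 2])                   # 4.5: the empty cell is now filled
```

Cells that already hold a rating are left untouched, so FR matches the piecewise definition of R_{u,e} above.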

Integrating Similarity Calculation and User Clustering
In this paper, we present an integrating similarity calculation method that considers user behaviors. The item category, obtained from the result of item clustering, is taken as a supplement to the feature input.
User behaviors can be categorized as implicit and explicit behaviors. Implicit behaviors show a user's concern for an item, measured by the frequency of behaviors including browse, add-to-favorite and purchase, while explicit behaviors indicate a user's preference for an item exactly; the rating directly represents how much the user likes the item. As a matter of fact, the data size of browse behaviors is far larger than that of add-to-favorite and purchase behaviors. Moreover, ratings are the most valuable data with the smallest size. In order to guarantee the efficiency and quality of recommendation, data in recommendation systems are usually updated offline; for example, the short-period data (browse information) are fully refreshed once every day, the long-period data (add-to-favorite and purchase information) are incrementally updated once every day, and the full-period data (rating information) are updated once every few days.
According to the analysis above, preference and concern degree of a user for an item category can be defined on the basis of user behaviors and frequency of these behaviors.

Definition 2.
User-item category preference is defined as the ratio of the sum of the ratings of a user for an item category to the sum of the ratings of the user for all items. For example, the category preference of the user u for the item category E_i, i ∈ [1, k_1], is represented as:

pref(u, E_i) = Σ_{e∈E_i} R_{u,e} / Σ_{e∈E} R_{u,e}.

User behaviors on items are endowed with weights, e.g., the ratings of browse, add-to-favorite and ordering are valued at 1, 3 and 5, respectively. Statistics over all items in each item category are then used to count behavior frequencies and the sum of behavior ratings.
A user may show a preference for a certain item category to some degree even when he/she knows little about its details. The ratio of the behavior frequency of a user for an item category to that of the user for all categories is defined as the RCF (relative category frequency), and the ratio of the sum of behavior ratings of a user for an item category to that of the user for all categories is defined as the RCR (relative category rating):

RCF(u, E_i) = bhvFrq_{u,i} / sumbhvFrq_u,  RCR(u, E_i) = bhvR_{u,i} / sumbhvR_u,

where bhvFrq_{u,i} denotes the behavior frequency of the user u for the i-th category, and bhvR_{u,i} denotes the sum of behavior ratings of the user u for the i-th category; the behavior frequency and the sum of behavior ratings of the user u over all categories are written as sumbhvFrq_u and sumbhvR_u, respectively. The concern degree conc(u, E_i) ∈ [0, 1] of the user u for the item category E_i is then defined by combining RCF and RCR.

Definition 3.
User-item category tendency is an integration of the preference and the concern degree of a user for an item category. The tendency of the user u for the item category E_i is defined as follows:

tnd(u, E_i) = ρ · pref(u, E_i) + (1 − ρ) · conc(u, E_i),

where ρ (0 < ρ < 1) is a weight, called the regulating parameter.
Once the feature vectors of user-item category tendency were obtained, a matrix could be constructed from these vectors. If the feature vector of the user u for all categories (the number is k_1) is denoted as tnd_u = (tnd_{u,1}, tnd_{u,2}, ..., tnd_{u,k_1}), then the tendency matrix is tnd[n][k_1], where n is the number of users.
Furthermore, based on the definition of user-item category tendency, the integrating similarity between users is calculated as:

sim_int(u, v) = (tnd_u · tnd_v) / (||tnd_u|| · ||tnd_v||),  (14)

where sim_int(u, v) denotes the integrating similarity between the user u and the user v, and tnd_u and tnd_v are respectively the feature vectors of category tendency for the user u and the user v. Through integrating similarity calculation considering item clustering and user behaviors, users with the highest similarity are assigned to one cluster. Similar to item clustering, user clustering also applies the K-Means clustering algorithm. Algorithm 2 illustrates the user clustering method for n users.
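Assuming the tendency is a linear blend of preference and concern degree and the integrating similarity is the cosine between tendency vectors (plausible readings of the definitions above, with invented toy vectors), the computation can be sketched as:

```python
import numpy as np

def tendency(pref, conc, rho=0.5):
    """tnd = rho * pref + (1 - rho) * conc; rows = users, cols = item categories."""
    return rho * pref + (1 - rho) * conc

def integrating_similarity(tnd_u, tnd_v):
    """Cosine similarity between two users' category-tendency vectors."""
    denom = np.linalg.norm(tnd_u) * np.linalg.norm(tnd_v)
    return float(tnd_u @ tnd_v / denom) if denom > 0 else 0.0

# Toy per-user preference and concern-degree vectors over k1 = 3 categories.
pref = np.array([[0.6, 0.3, 0.1],
                 [0.5, 0.4, 0.1],
                 [0.1, 0.2, 0.7]])
conc = np.array([[0.5, 0.4, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.2, 0.1, 0.7]])
tnd = tendency(pref, conc, rho=0.5)
print(round(integrating_similarity(tnd[0], tnd[1]), 3))   # high: similar users
print(round(integrating_similarity(tnd[0], tnd[2]), 3))   # lower: dissimilar users
```

The resulting tnd array plays the role of the tendency matrix tnd[n][k_1] that feeds user clustering.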

Algorithm 2: User-Clustering-K-Means
Input: the user set U[n] = {u_1, u_2, ..., u_n}; the user-item category tendency matrix tnd[n][k_1]; the number of user clusters k_2
Output: the user clusters U_1, U_2, ..., U_{k_2}
1: select k_2 users from the user set U at random, i.e., u_1, u_2, ..., u_{k_2}, as the initial cluster centers
2: for each user u ∈ U
3: calculate its similarity with each cluster center, in turn, according to (14), and assign it to the nearest cluster
4: end for
5: for each cluster
6: adjust the cluster center according to the average item category tendency of all users in the cluster, i.e., c_i = (1/n_i) Σ_{u∈U_i} tnd_u, i = 1, 2, ..., k_2, where n_i is the number of users in the i-th cluster
7: end for
8: return to step 2 until the square error of the clustering criterion function converges

At this point, users with similar user-item category tendency are assigned to the same cluster, and thus the search scope for neighbor users is narrowed through the offline preparation process. In this way, the computational complexity of the online recommendation is reduced.

Online Recommendation
In this section, the process of searching for the set of nearest neighbor users and making rating predictions is introduced, through which an online recommendation for the target user was created.
The search scope would be too small if nearest neighbors were selected only from the cluster with the highest similarity to the target user, without regard to other clusters; the outcome of clustering cannot be assumed to be optimal. For this reason, a proxy user is defined as the cluster center of each cluster. Firstly, we calculated the similarity between each proxy user and the target user; then we defined a similarity threshold and selected the eligible clusters; finally, the set of nearest neighbors was obtained from these clusters.
Through user clustering, n users were divided into k_2 clusters, i.e., U_1, U_2, . . . , U_{k_2}, where U_1 ∪ U_2 ∪ . . . ∪ U_{k_2} = U and U_i ∩ U_j = ∅ (i ≠ j; i, j ∈ [1, k_2]). Assuming that the user u was the target user, we calculated the similarity between the user u and each candidate neighbor user. To obtain the neighbor user set, the Pearson correlation was used to measure the similarity, where v is a candidate neighbor user, E_uv denotes the set of items rated by the user u and the user v, and R̄_u and R̄_v are respectively the average ratings given by the user u and the user v. Algorithm 3 illustrates the search for the neighbor user set. A weighted average approach is then used to predict the personalized rating of an item for the target user.
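The neighbor search just described can be sketched as follows, with ratings stored as per-user dictionaries. As an assumption, the correlation here sums over the items rated by both users, and each user's mean is taken over all of that user's ratings; the identifiers are illustrative.

```python
# Sketch of Pearson-correlation similarity and K-nearest-neighbor search.
# ratings_u, ratings_v: {item_id: rating} dictionaries for two users.

def pearson_sim(ratings_u, ratings_v):
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    du = sum((ratings_u[i] - mean_u) ** 2 for i in common) ** 0.5
    dv = sum((ratings_v[i] - mean_v) ** 2 for i in common) ** 0.5
    return num / (du * dv) if du and dv else 0.0

def k_nearest_neighbors(target, candidates, k):
    """candidates: {user_id: ratings dict}; returns [(user_id, sim)] by similarity."""
    sims = [(v, pearson_sim(target, rv)) for v, rv in candidates.items()]
    sims.sort(key=lambda t: t[1], reverse=True)
    return sims[:k]

# Example: v rates proportionally to u, w rates inversely.
u = {1: 1, 2: 2, 3: 3}
v = {1: 2, 2: 4, 3: 6}
w = {1: 3, 2: 2, 3: 1}
neighbors = k_nearest_neighbors(u, {'v': v, 'w': w}, k=1)
```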
The predicted rating is P_{u,j} = R̄_u + (Σ_{v∈KNN(u)} sim(u, v) · (R_{v,j} − R̄_v)) / (Σ_{v∈KNN(u)} |sim(u, v)|), where sim(u, v) is the similarity of the target user u and the neighbor user v in the neighbor user set KNN(u); R̄_u and R̄_v are respectively the average ratings given by the user u and the user v in the feature-supplemented user-item rating matrix FR = [R_{i,j}]_{n×m}; and R_{v,j} denotes the rating of the user v for the item j.
Once the set of nearest neighbor users was obtained, the ratings of all candidate items for the target user were predicted based on the historical ratings of the K neighbor users, and the candidate items were sorted according to the predicted ratings. Finally, a personalized service recommendation list (Top-N list) was generated and the most appropriate items were recommended to the target user.
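The weighted-average prediction and Top-N list generation can be sketched as follows; the mean-centered form shown is the common convention for Pearson-based CF, and the identifiers are illustrative rather than the paper's.

```python
# Sketch of mean-centered weighted-average rating prediction and Top-N.
# neighbors: list of (ratings_dict, mean_rating, similarity) triples.

def predict_rating(mean_u, neighbors, item):
    """Add similarity-weighted, mean-centered neighbor ratings to the target's mean."""
    num = den = 0.0
    for ratings_v, mean_v, sim in neighbors:
        if item in ratings_v:
            num += sim * (ratings_v[item] - mean_v)
            den += abs(sim)
    return mean_u + num / den if den else mean_u

def top_n(mean_u, neighbors, candidate_items, n):
    """Score every candidate item and keep the n highest-predicted ones."""
    scored = [(j, predict_rating(mean_u, neighbors, j)) for j in candidate_items]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [j for j, _ in scored[:n]]

# One neighbor with mean 2.5 who rated item 'a' high and 'b' low.
neighbors = [({'a': 4, 'b': 1}, 2.5, 1.0)]
```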
Note that most of the operations of HTPR are processed offline. During the online recommendation phase, we only need to accomplish nearest neighbor searching, rating prediction and recommendation, which supports a high quality of recommendation service. In the offline phase, on the other hand, the feature information and the results of item clustering and user clustering are updated periodically; specifically, newly produced item and user data are also written into the database for feature input and clustering model updating.

Experimental Results
In this section, experiments were conducted to evaluate the effectiveness and accuracy of the proposed HTPR. Experiments on item clustering and user clustering were conducted to evaluate the effectiveness of HTPR and to determine the values of the relevant parameters used in the clustering process. In addition, to evaluate accuracy, HTPR was compared with three other recommendation methods: basic CF, user clustering based CF (UCCF) and item clustering based CF (ICCF). Four metrics were used to evaluate accuracy: mean absolute error (MAE), precision, recall, and F1 score [42,43]. All the experiments were carried out with the Spark framework and run on a workstation equipped with an Intel 12-core 3.5 GHz CPU, a GTX 1080 Ti GPU, and 16 GB memory.

Experiment Dataset and Metrics
The dataset adopted in our experiments comes from an offline dataset used by the operation team of a group-buying website in China. Over seven years of operation, the website has accumulated a large user base in China and a comprehensive data system. The dataset used in this paper is taken from half a year (June to December 2014) of its historical data, including the ratings of 3043 users on 1628 items. The ratings are recorded on a five-point numeric scale (0, 1, 2, 3, 4). Table 4 shows the structure of the normalized dataset.
Table 4. Structure of the dataset.

Data Table            Format
Item attribute        <item id> <attribute id> <attribute weight>
Item review tag       <item id> <tag id> <tag relevant value>
User/item browse      <user id> <item id> <browse tag> <browse time>
User/item wish list   <user id> <item id> <add-to-favorite tag> <add-to-favorite time>
User/item purchase    <user id> <item id> <purchase tag> <purchase time>
User/item rating      <user id> <item id> <rating value>
Note: the data table of item attributes was obtained from keywords extracted from the description of items via word segmentation, and every word was assigned a weight; the feature tags of the item review tag table were extracted from the review content using textual analysis, and the relevant values of the tags lie in the range (0, 1).
To make a comparison, the dataset was divided into a training set and a test set. As shown in Figure 4, the test set was taken from the data of 21-30 December 2014, from which the data of one particular day were selected as the validation object. On the other hand, the training set consisted of browse data from 7 days, add-to-favorite and purchase data from 150 days, and all rating data before the selected day. Based on the training set, each user was recommended several items, which were compared with the data in the test set. There are several types of metrics to evaluate the quality of a recommendation system, such as accuracy, coverage, diversity and novelty. In this paper, MAE, precision, recall and F1 score were chosen as the evaluation metrics. MAE represents the deviation of recommendations from the actual user-specified values; the lower the MAE, the more accurate the rating prediction. Precision is the percentage of selected items that are relevant, and recall is the percentage of relevant items that are selected.
Overall, the greater the values of precision and recall, the more effective the prediction. However, the two metrics emphasize different aspects: precision aims to return mostly useful items, while recall is designed to avoid missing useful items. The F-measure is a comprehensive metric; the greater the F1 value, the more effective the recommendation service.
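The four metrics can be sketched as follows; `relevant` denotes the test-set items a user actually consumed and `recommended` the Top-N list, both illustrative names.

```python
# Sketch of the four evaluation metrics described above.

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for a Top-N recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 2 of 4 recommended items appear among 3 relevant items.
p, r, f1 = precision_recall_f1(['a', 'b', 'c', 'd'], {'a', 'b', 'e'})
```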

Effectiveness Evaluation
The first group of experiments verifies the effectiveness of HTPR in terms of item clustering and user clustering; the numbers of nearest neighbors and of clusters are determined as well.
(1) Validation test on item clustering. In the first experiment, the effectiveness of item clustering was examined by measuring the search efficiency of nearest neighbor items. Specifically, the 1628 items in the dataset were clustered with 20 nearest neighbors, and we analyzed the impact of different numbers of clusters (20, 30, 40) on the search efficiency. The results are shown in Figure 5, where the Y-coordinate denotes the ratio of searched nearest neighbor items and the X-coordinate denotes the percentage of items that have been searched.
It can be found in Figure 5 that more nearest neighbor items can be found with fewer searched items through item clustering. Besides, when the number of clusters was far less than the number of items, the time to compute the similarity between the target item and a cluster center was very short compared to the time used in searching its nearest neighbors. In this case, the greater the number of clusters, the higher the efficiency of neighbor searching. However, accuracy may suffer if the number of clusters is too large.
To further verify the results, rating predictions of the items were made based on the ratings of neighbor items and then compared with the data in the test set. The number of clusters was varied from 10 to 100 with a step of 10 to compute the average MAE under different circumstances, i.e., with the number of nearest neighbors set to 10, 20, 30, 40 and 50 respectively. As shown in Figure 6, the MAE curves as a whole go down first and then up; they decreased noticeably as the number of clusters increased from 10 to 40 or 50, and afterwards went up slowly and tended to stabilize. Moreover, it was noticed that the curve with 30 nearest neighbors was lower than the other curves, and MAE achieved its lowest value when the number of clusters was 40. Therefore, for item clustering, the optimal numbers of clusters and nearest neighbors were 40 and 30 respectively.
(2) Validation test on user clustering. This experiment was designed to verify the effectiveness of user clustering. The first part analyzed the influence of different numbers of user clusters on the search efficiency of nearest neighbor users. Following the results of item clustering, the numbers of item clusters and nearest neighbor items were 40 and 30 respectively; for user clustering, the number of nearest neighbors was 30, the regulating parameter ρ was 0.5, and the similarity threshold θ was 0.7.
Figure 7 shows the search efficiency with the number of clusters varied from 30 to 50 with a step of 10. The Y-coordinate denotes the ratio of searched nearest neighbor users, and the X-coordinate denotes the percentage of users that have been searched. The results are similar to those of item clustering: as the percentage of users being searched increased, the ratio of searched nearest neighbors increased. Through user clustering, more nearest neighbor users can be found within a smaller search space, and the greater the number of clusters, the higher the search efficiency.
Likewise, in the second part, to investigate the relation between the relevant parameters of user clustering and the accuracy of rating prediction, we varied the number of clusters from 30 to 100 with a step of 10 to compute the average MAE under different circumstances, i.e., with the number of nearest neighbors set to 10, 20, 30, 40 and 50 respectively. The values of the regulating parameter ρ and the similarity threshold θ remained the same as above.
Overall, as shown in Figure 8, the MAE values decreased first and then rose as the number of clusters increased. The lowest MAE, 0.75, appeared when the number of clusters was 50 or 60; the MAE values were relatively low, around 0.85, when the number of nearest neighbors was 30 or 40; and all the curves dropped to their lowest points when the number of clusters was 50. Thus, in the following experiments, the number of clusters for user clustering was chosen as 50, while the number of nearest neighbors was left indeterminate, either 30 or 40.
Figure 9 shows the effect of the regulating parameter on rating prediction. In view of the above analyses, the numbers of clusters and nearest neighbors were 40 and 30 respectively for item clustering; for user clustering, they were 50 and 30 respectively, and the similarity threshold θ was 0.7. We varied the regulating parameter ρ from 0 to 1 with a step of 0.1. It can be found in Figure 10 that MAE achieved its lowest value when ρ = 0.4. Accordingly, the regulating parameter was set to 0.4 in the following experiments. Note that the same conclusion was obtained when the number of nearest neighbors for user clustering was changed to 40.


Accuracy Evaluation
HTPR was compared with basic CF, UCCF and ICCF on prediction quality (MAE) and on recommendation quality (precision, recall and F1) to evaluate accuracy.
(1) Comparison of HTPR with CF, UCCF and ICCF on MAE. In this experiment, the numbers of clusters and nearest neighbors for item clustering were 40 and 30 respectively; for user clustering, the number of clusters was 50; the similarity threshold θ was 0.7; and the regulating parameter ρ was 0.4. We varied the number of nearest neighbors from 10 to 100 with a step of 10. Figure 10 shows the MAE values of CF, ICCF, UCCF and HTPR. It can be found that the MAE values of ICCF, UCCF and HTPR were much lower than that of CF. In addition, the MAE value of HTPR was the lowest; its curve goes down first and then up, and MAE achieved its lowest value, 0.732, when the number of nearest neighbors was 40. Thus the proposed HTPR method was shown to provide more accurate predictions than CF, ICCF and UCCF.
(2) Comparison of HTPR with CF, ICCF and UCCF on precision, recall and F1. In most recommendation applications, users are given a personalized recommendation list, i.e., a Top-N list, in which the services/items in higher positions should be more appropriate than those in lower positions. Precision, recall and F1 were used as evaluation metrics to evaluate the quality of the Top-N recommendation list. The experiment setup was the same as for the previous comparison, except that the number of nearest neighbors for user clustering was set to 40 according to the above experiments. Figures 11-13 show the precision, recall and F1 values of the Top-N (N ranged from 5 to 50 with a step of 5) recommendation lists of CF, ICCF, UCCF and HTPR. The Y-coordinate respectively denotes the precision, recall and F1 values, and the X-coordinate denotes the number of items in the recommendation list, i.e., N. It can be found that the precision values decreased as N increased, while the recall values increased; the F1 values went up first and then down. Besides, the precision, recall and F1 values of HTPR were comparatively higher than those of CF, ICCF and UCCF. It can also be noticed that when N = 25, the proposed HTPR method achieved the most accurate quality of recommendation.

Conclusions and Future Work
In this paper, a hybrid two-phase recommendation method for group-buying e-commerce applications was proposed, integrating collaborative filtering and clustering techniques to address the problems of scalability and data sparsity. In this method, a feature matrix was developed to alleviate the sparsity problem using a feature description and combination approach; the user-item category tendency was defined to integrate users' preferences with their concern degrees for item categories; the integrating similarity between users was proposed by considering user behaviors and their frequencies; and a parallelized strategy to optimize the recommendation process was also studied. Experiments on a real-world dataset indicated that the proposed HTPR method is effective.
Our future work will focus on applying the HTPR method in real-time group-buying e-commerce applications and analyzing its strengths and weaknesses there.