Recommendation Algorithm Using Clustering-Based UPCSim (CB-UPCSim)

One of the well-known recommendation systems is memory-based collaborative filtering, which utilizes similarity metrics. Recently, similarity metrics have taken into account both the user rating score and the user behavior score. The user behavior score indicates the user's preference for each product type (genre). Adding the user behavior score to the similarity metric makes the computation more complex. To reduce this computational cost, we combined a clustering method with user behavior score-based similarity. The clustering method applies k-means clustering, with the number of clusters determined using the Silhouette Coefficient, whereas the user behavior score-based similarity utilizes User Profile Correlation-based Similarity (UPCSim). Experimental results on the MovieLens 100k dataset showed that the computation time decreased by 4.16 s, to an average running time of 0.91 s (5.5 times faster than without clustering), because each cluster contains less data than the full dataset. In addition, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values decreased by 1.88% and 1.46%, respectively, compared to the baseline algorithm.


Introduction
The increasing volume and complexity of online information make it difficult for users to obtain appropriate information. Recommendation systems are a widely adopted solution to this information explosion [1,2]. They are valuable information filtering tools that help users find a product or service among the many available options.
Recommendation systems have developed rapidly and are used in various domains, such as movies, music, news, books, restaurants, and other media. Researchers have built them using various approaches, including demographic filtering, content-based filtering, collaborative filtering, and hybrid filtering [3,4].
One of the most prevalent approaches to recommendation systems is collaborative filtering [5][6][7][8]. This approach is capable of generating recommendations based on the ratings provided by the users for several items. Collaborative filtering consists of two methods: model-based and memory-based. The first method uses a model built from the ratings to generate recommendations, while the second method utilizes similarity metrics to get the distance between two users/items [6,9].
In recent years, several researchers have proposed collaborative filtering using the similarity metrics approach to increase the accuracy of recommendations. Some of the proposed similarity metrics are Proximity-Significance-Singularity (PSS) [10], Bhattacharyya [11], multi-level collaborative filtering [12], item frequency-based similarity [13], Triangle Multiplying Jaccard (TMJ) [14], and three impact factors-based similarity [3]. However, these similarity metrics only consider the user rating score to calculate similarities between users. The user rating score is the value given directly by the user in assessing the selected or purchased product. This score ranges from 1 to 5, with a score of 1 indicating that the user

Related Work
The memory-based method employs all rating data to generate a list of recommended products [11]. This method relies on similarity metrics to compute the similarity between products or users, based on the distance between two users or products. The Pearson Correlation Coefficient (PCC) and Cosine Similarity (COS) are traditional similarity metrics that are frequently used in recommendation systems [20].
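As a reference point for the two traditional metrics, the sketch below computes PCC and COS between two users stored as `{item_id: rating}` dictionaries (a hypothetical data layout). It assumes one common variant of each: PCC is summed over co-rated items with each user's mean taken over all of that user's ratings, and COS treats unrated items as 0. Published variants differ in these choices.

```python
import math

def pearson(ru: dict, rv: dict) -> float:
    """PCC over co-rated items; user means are taken over all of each user's ratings."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    mu = sum(ru.values()) / len(ru)
    mv = sum(rv.values()) / len(rv)
    num = sum((ru[p] - mu) * (rv[p] - mv) for p in common)
    den = math.sqrt(sum((ru[p] - mu) ** 2 for p in common)) * math.sqrt(
        sum((rv[p] - mv) ** 2 for p in common))
    return num / den if den else 0.0

def cosine(ru: dict, rv: dict) -> float:
    """Cosine of the two rating vectors; unrated items count as 0, so only co-rated items add to the dot product."""
    num = sum(ru[p] * rv[p] for p in ru.keys() & rv.keys())
    den = math.sqrt(sum(r * r for r in ru.values())) * math.sqrt(sum(r * r for r in rv.values()))
    return num / den if den else 0.0

u = {1: 5, 2: 3, 3: 1}
v = {1: 1, 2: 3, 3: 5}
print(pearson(u, v), cosine(u, v))
```

For these two users with opposite tastes, PCC is close to -1 while COS is about 0.54, because COS does not subtract each user's mean rating.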
Several studies have improved traditional similarity metrics to increase the performance of recommendation systems. Patra et al. [11] presented the Bhattacharyya coefficient in collaborative filtering (BCF) similarity, which uses all rating scores given by a pair of users, combined with Jaccard similarity. Furthermore, Polatidis et al. [12] improved PCC by adjusting it according to the number of co-rated products at several levels; this similarity metric is called multi-level collaborative filtering. Sun et al. [14] offered a new similarity metric called TMJ (Triangle Multiplying Jaccard) similarity, in which Triangle similarity considers the co-rated data, while Jaccard similarity accounts for the ratings that are not given in common. Finally, Feng et al. [3] developed a new similarity metric that considers three impact factors ($S_a$, $S_b$, and $S_c$): $S_a$ describes the similarity between users, $S_b$ states the tendency of users to give ratings, and $S_c$ expresses the rating weight of each user. In general, these previous studies use only the user rating score to compute the similarity metrics.
Furthermore, Wu et al. [15] proposed a novel similarity metric that incorporates rating-based and behavior-based similarity scores to improve recommendation accuracy. The user behavior score is the accumulated score obtained by assessing or viewing a product type (genre). In a movie recommender system, this score indicates how much a user likes a movie genre, based on the movie titles the user watches. For example, User A watches the movie "Aladdin", whose genres are animation, children, and comedy, so each of these genres gets a behavior score of 1. After that, User A watches the movie "Get Shorty", whose genres are action, comedy, and drama, and each of these genres also gets a behavior score of 1. Consequently, User A implicitly gives the user behavior scores animation = 1, children = 1, comedy = 2, action = 1, and drama = 1. The genre "comedy" gets a score of 2 because both "Aladdin" and "Get Shorty" belong to it. Therefore, the user behavior score ranges from 1 to N: N becomes large if the user frequently accesses a product type and stays small if the user rarely does. Wu et al. applied the probability of the user behavior score to calculate the similarity metric, which they called User score Probability Collaborative Filtering (UPCF). Their results reduced the MAE and RMSE values by 1.51% and 0.94% compared to a traditional similarity metric, Cosine Similarity.
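The behavior-score accumulation in the Aladdin / Get Shorty example can be sketched as a simple genre counter. The `MOVIE_GENRES` mapping is hypothetical and lists only the genres named in the example; a real pipeline would read genres from the dataset.

```python
from collections import Counter

# Genres as listed in the example above (hypothetical lookup table).
MOVIE_GENRES = {
    "Aladdin": ["animation", "children", "comedy"],
    "Get Shorty": ["action", "comedy", "drama"],
}

def behavior_scores(watched, movie_genres):
    """For each genre, count how many of the user's watched titles belong to it."""
    scores = Counter()
    for title in watched:
        scores.update(movie_genres[title])
    return dict(scores)

user_a = behavior_scores(["Aladdin", "Get Shorty"], MOVIE_GENRES)
print(user_a)  # {'animation': 1, 'children': 1, 'comedy': 2, 'action': 1, 'drama': 1}
```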
Recently, the research conducted by [16] presented a novel similarity metric known as User Profile Correlation-based Similarity (UPCSim). UPCSim improves UPCF by replacing the similarity weight in UPCF (a threshold value) with the correlation coefficients between user profile factors and the user rating/behavior scores. The results showed that the MAE and RMSE values decreased by 1.64% and 1.4%.
Although UPCF and UPCSim improve recommendation performance, these similarity metrics still have shortcomings. Combining the two similarities makes the computation more complex, so producing recommendations becomes increasingly time-consuming as the data grow.
Several researchers have applied clustering methods to reduce the amount of data their recommendation systems must process. For example, Vellaichamy et al. [19] used clustering to overcome the scalability problem and improve recommendation quality. Their clustering algorithm applied Fuzzy C-Means (FCM) with Bat optimization and set the number of clusters to 16. The Bat algorithm serves to obtain the initial cluster positions, and PCC similarity then calculates the similarity between users. The experimental results showed a reduced MAE value and increased precision and recall values.
Meanwhile, Lestari et al. [17] applied k-means clustering to reduce the dataset, setting the number of clusters to 7. The cluster division used one of the user profile factors, namely age. Each cluster then performed the recommendation process using a ranking-oriented collaborative filtering approach known as WP-Rank. The results showed that the Normalized Discounted Cumulative Gain (NDCG) increased by 0.022, at the cost of a longer running time of 0.026 s.
In the meantime, Tran et al. [18] implemented collaborative filtering based on clustering with an incentive/penalized user (IPU) model to handle large volumes of data and improve the performance of the recommendation system. Their research combines the spectral clustering and FCM algorithms and divides the rating data into 10 clusters. Each product then receives an incentive or penalty based on the users' tendency in the cluster. The experimental results showed a significant increase in F1 score, precision, and recall.
The three previous studies [17][18][19] use partition-based clustering methods, i.e., FCM and k-means. FCM has $O(nk^2t)$ time complexity, whereas k-means has $O(nkt)$, where n, k, and t represent the number of data points, the number of clusters, and the number of iterations, respectively [21]. These studies determined the number of clusters without measuring the clustering quality to obtain the optimal number of clusters. In addition, the study conducted by [17] used only the age factor to group users, without considering other factors that affect user preferences. Therefore, our study uses the intrinsic Silhouette Coefficient method to determine the optimal number of clusters and clusters the user and rating data based on all user profile factors (i.e., age, gender, job, and location). We use k-means clustering because it requires less computation time than FCM.

Research Method
This research proposes a recommendation algorithm that combines the clustering method and similarity based on user behavior scores (i.e., UPCSim), known as CB-UPCSim. The study consists of five stages: data collection, data preparation, clustering process, memory-based process, and evaluation, as illustrated in Figure 1. The following subsection presents the details of each stage.

Data Collection
We utilized the MovieLens 100k dataset in this study. The dataset includes 100,000 ratings from 943 users on 1682 movies, collected by the GroupLens Research Group of the University of Minnesota [22]. The rating scale is between 1 and 5: a score of 1 indicates that the user does not like the movie, and a score of 5 indicates that the user really likes it. The sparsity and density of this rating data are 93.7% and 6.3%, respectively. There are 19 genres in the dataset, and each movie can belong to several genres. Each user rated at least 20 movies and has user profile information (i.e., gender, age, job, and location).
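The sparsity and density figures follow from the matrix dimensions: density is the share of filled cells in the 943 × 1682 user-item rating matrix. A minimal arithmetic check, with a helper name of our own choosing:

```python
def density_and_sparsity(n_ratings: int, n_users: int, n_items: int) -> tuple[float, float]:
    """Percentage of filled cells in the user-item rating matrix, and its complement."""
    density = 100.0 * n_ratings / (n_users * n_items)
    return density, 100.0 - density

density, sparsity = density_and_sparsity(100_000, 943, 1682)
print(f"density = {density:.1f}%, sparsity = {sparsity:.1f}%")  # density = 6.3%, sparsity = 93.7%
```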

Data Preparation
In this study, the data preparation stage is data pre-processing, which prepares the raw data for the subsequent clustering and memory-based processes so that they work with clean data. One pre-processing step is removing irrelevant attributes; in the MovieLens dataset, for example, the timestamps, movie titles, release dates, video release dates, and IMDb URLs are irrelevant attributes.
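The attribute reduction can be sketched on single records. This assumes the standard MovieLens 100K layouts: `u.item` is pipe-separated (movie id, title, release date, video release date, IMDb URL, then 19 genre flags) and `u.data` is tab-separated (user id, item id, rating, timestamp). The function names and the sample record are illustrative only.

```python
def reduce_item_record(line: str) -> tuple[int, list[int]]:
    """Keep the movie id and the 19 genre flags of a u.item line; drop
    the title, release date, video release date, and IMDb URL (fields 1-4)."""
    fields = line.rstrip("\n").split("|")
    return int(fields[0]), [int(flag) for flag in fields[5:]]

def reduce_rating_record(line: str) -> tuple[int, int, int]:
    """Keep user id, item id, and rating of a u.data line; drop the timestamp."""
    user_id, item_id, rating, _timestamp = line.split("\t")
    return int(user_id), int(item_id), int(rating)

# Illustrative record in the u.item layout (genre flags for animation, children, comedy set).
item_line = ("1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)"
             "|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0")
movie_id, genres = reduce_item_record(item_line)
```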

Clustering Process
This stage consists of two steps. The first step determines the optimal number of clusters using the Silhouette Coefficient. The second step groups the user data using the k-means algorithm, where k is the number of clusters with the maximum Silhouette Coefficient value. The details of each method are presented as follows.

Determination of the Number of Clusters
This process aims to measure the clustering quality to obtain the optimal number of clusters (k). The clustering quality measurement consists of two methods: extrinsic and intrinsic [21]. The extrinsic method compares a clustering result with the ideal clustering made by experts. If there is no ideal clustering from experts, we can use the intrinsic method, which evaluates the clustering quality by testing how far apart the clusters are and how dense the clusters are.
One of the metrics used in the intrinsic method is the Silhouette Coefficient. This method measures an object's similarity to its own cluster (cohesion) compared to other clusters (separation). The following steps describe how to compute the Silhouette Coefficient value [21]:

1. Calculate the average distance from object i to the other objects in its cluster A using Equation (1), where j is another object in cluster A and d(i, j) is the distance between object i and object j.
$$a(i) = \frac{1}{|A| - 1} \sum_{j \in A,\, j \neq i} d(i, j) \qquad (1)$$
2. Calculate the average distance from object i to all objects in each other cluster C using Equation (2). Then, find the minimum of these average distances using Equation (3).
$$d(i, C) = \frac{1}{|C|} \sum_{j \in C} d(i, j) \qquad (2)$$
$$b(i) = \min_{C \neq A} d(i, C) \qquad (3)$$
3. Calculate the Silhouette Coefficient value using Equation (4).
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (4)$$
The value of a(i) represents the density of the cluster containing object i: the smaller a(i), the denser the cluster. Meanwhile, b(i) indicates how far object i is from other clusters: the greater b(i), the further object i is from them. If a(i) is small and b(i) is large, the Silhouette Coefficient of object i is close to 1, meaning that the cluster containing object i is dense and object i is far from other clusters. Conversely, if a(i) is large and b(i) is small, the Silhouette Coefficient of object i is close to −1, meaning that the cluster containing object i is sparse and object i is very close to other clusters.
Clustering results are considered good if the Silhouette Coefficient value is positive. The maximum value of 1 indicates ideal clustering, with dense and well-separated clusters. This study uses the intrinsic Silhouette Coefficient method to obtain the optimal number of clusters (k).
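The per-object definitions a(i), b(i), and s(i) above can be sketched directly, averaging s(i) over all objects. The `dist` argument is a placeholder for whatever distance the application uses; objects in singleton clusters are given s(i) = 0, a common convention that the text does not specify.

```python
def silhouette(points, labels, dist):
    """Mean Silhouette Coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all objects."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette Coefficient needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(dist(points[i], points[j]) for j in own) / len(own)        # cohesion, a(i)
        b = min(                                                           # separation, b(i)
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

score = silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1], lambda p, q: abs(p - q))  # about 0.90
```

Swapping the labels to `[0, 1, 0, 1]` puts each point next to a far-away partner, and the score drops below zero, matching the interpretation of negative values above.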
Determining the optimal number of clusters matters because it affects whether users with similar preferences fall into the same cluster; previous studies [17][18][19] set the number of clusters directly, without measuring the clustering quality.

Data Clustering
Clustering is a data mining technique that groups similar objects into the same cluster and dissimilar objects into different clusters [23][24][25][26]. Clustering is known as unsupervised learning because there are no class labels. Clustering methods fall into four categories: hierarchical-based, grid-based, density-based, and partition-based.
The partition-based method divides the data into several non-overlapping groups, so that each data point belongs to exactly one cluster [21]. This method is also known as the center-based or representative-based method [27]. Partition-based algorithms include k-means, k-medoids, k-modes, and fuzzy c-means, and k-means is one of the most prominent clustering algorithms in recommender systems [28].
The k-means algorithm groups data by maximizing the similarity within each cluster and minimizing the similarity between clusters. Similarity is measured with a distance function: k-means uses the Euclidean distance between a data point p in cluster $C_i$ and its centroid $c_i$, and the shortest distance between a data point and a centroid indicates the maximum similarity. The output of k-means is highly dependent on the initial centroids, which are determined randomly [26,29,30,31].
This study applies a partition-based method, k-means clustering, because it is simple, easy to implement, and fast. It is therefore suitable for reducing the computational cost of the similarity metrics in recommendation systems. The clustering process groups the user and rating data based on all user profile factors.
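A minimal sketch of Lloyd's k-means as described above (Euclidean distance, random initial centroids taken from the data), applied to hypothetical numerically encoded user profiles of the form (age, gender, occupation, location). It illustrates the technique only; it is not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: Euclidean distance, initial centroids drawn at random from the data."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids

# Hypothetical encoded profiles: (age, gender, occupation, location digit).
profiles = [[24, 1, 20, 8], [25, 1, 19, 8], [53, 2, 14, 9], [55, 2, 13, 9]]
labels, centroids = kmeans(profiles, k=2)
```

With raw encodings like these, age dominates the Euclidean distance. Whether to standardize the features first is a design choice the text does not specify.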

Memory-Based Process
The memory-based method consists of two processes: similarity calculation and rating prediction. In this study, the similarity calculation applies UPCSim [16], which refers to Equation (5).
$S(x, y)$ represents the final similarity between users x and y. $S_r(x, y)$ denotes the user rating score-based similarity between users x and y, given by Equation (6). $S_b(x, y)$ denotes the user behavior score-based similarity between users x and y, given by Equation (7). Finally, α and β are correlation coefficients between user profile attributes and user rating/behavior scores, calculated using multiple linear regression [16].
In Equation (6), $P_x$ and $P_y$ denote the sets of products rated by user x and user y, respectively; $r_{xp}$ and $r_{yp}$ are the ratings of product p by user x and user y; $\bar{r}_x$ and $\bar{r}_y$ are the average ratings of users x and y; and p is a product co-rated by users x and y.
In Equation (7), $G_x$ and $G_y$ denote the sets of product types assessed by user x and user y, respectively; $P_{xg}$ and $P_{yg}$ are the probabilities of product type g for users x and y; $\bar{P}_x$ and $\bar{P}_y$ are the average product-type probabilities of users x and y; and g is a product type co-rated by users x and y.
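The sketch below shows how the two similarities could be combined. It assumes that Equations (6) and (7) are Pearson-style correlations over co-rated products and co-rated product types (consistent with the symbols defined above) and that Equation (5) is the weighted sum α·S_r + β·S_b. Both are assumptions; the exact formulas are given in [16]. The function `combined_similarity` and the toy inputs are hypothetical.

```python
import math

def pcc(xs: dict, ys: dict) -> float:
    """Pearson-style correlation over shared keys; means over each user's full dictionary (assumption)."""
    common = xs.keys() & ys.keys()
    if not common:
        return 0.0
    mx = sum(xs.values()) / len(xs)
    my = sum(ys.values()) / len(ys)
    num = sum((xs[k] - mx) * (ys[k] - my) for k in common)
    den = math.sqrt(sum((xs[k] - mx) ** 2 for k in common)) * math.sqrt(
        sum((ys[k] - my) ** 2 for k in common))
    return num / den if den else 0.0

def combined_similarity(ratings_x, ratings_y, probs_x, probs_y, alpha, beta):
    s_r = pcc(ratings_x, ratings_y)  # rating-based similarity, in the spirit of Equation (6)
    s_b = pcc(probs_x, probs_y)      # behavior-based similarity, in the spirit of Equation (7)
    return alpha * s_r + beta * s_b  # ASSUMED form of Equation (5): weighted sum

rx = {1: 5, 2: 3, 3: 1}                           # product -> rating
px = {"comedy": 0.5, "drama": 1.0, "action": 0.25}  # product type -> behavior probability
s = combined_similarity(rx, rx, px, px, alpha=0.6, beta=0.4)
```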
The similarity calculation between users can be illustrated as follows. The initial step is generating a rating matrix. For example, matrix R shows the user rating scores given by five users on seven products. The blank values of matrix R indicate the sparseness of the matrix.
After generating the rating matrix, the next step calculates the similarity based on the user rating score ($S_r$) using Equation (6); matrix $S_r$ shows the results. After calculating $S_r$, the next step calculates the user behavior score-based similarity ($S_b$), which Equation (7) defines using the probability matrix of user behavior scores. The user behavior score is the total score a user accumulates by accessing a product type. Table 1 shows the product types of the seven products rated by the five users. Each product can belong to several product types; for example, product $p_1$ (the movie "Aladdin") belongs to the animation, children, and comedy types. Based on matrix R and Table 1, the user behavior scores are obtained as follows. If user 1 accesses product $p_1$ (the movie "Aladdin"), user 1 also accesses its product types (animation, children, and comedy), each of which receives a user behavior score of 1. User 1 also accesses products $p_2$, $p_4$, and $p_5$. As a result, user 1 has the user behavior scores action = 1, animation = 1, children = 1, comedy = 2, crime = 1, drama = 2, and thriller = 1. In the same way, we calculate the user behavior scores for users 2, 3, 4, and 5. Matrix B shows the final user behavior scores of the five users on eight product types.
The next step calculates the probability score of user behavior by dividing the user behavior score by the number of users who access the product type. For example, based on matrix B, user 1 and user 3 accessed product type $G_1$, so the probability of the user behavior score for user 1 and user 3 on product type $G_1$ is 0.5. Matrix P shows the probability scores of user behavior for the five users on eight product types.
After the similarity calculation, the next process is rating prediction, which predicts the rating scores of products not yet rated by the active user. Before making a rating prediction, the number of nearest neighbors (k) must be determined. In this study, k ranges from 10 to 100 in increments of 10 [3,12,14,15,16].
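The probability rule for user behavior scores described in this subsection can be sketched as follows, implementing the rule exactly as worded: each score is divided by the number of users who accessed that product type. The matrix B below uses toy values, not the paper's matrix B, and the function name is our own.

```python
def behavior_probabilities(B):
    """Divide each user behavior score by the number of users who accessed that product type."""
    users_per_type = {}
    for scores in B.values():
        for g, score in scores.items():
            if score > 0:
                users_per_type[g] = users_per_type.get(g, 0) + 1
    return {u: {g: score / users_per_type[g] for g, score in scores.items() if score > 0}
            for u, scores in B.items()}

# Toy behavior scores: users 1 and 3 both accessed G1, so each gets 1 / 2 = 0.5 there.
B = {"user1": {"G1": 1, "G2": 2}, "user2": {"G2": 1}, "user3": {"G1": 1}}
P = behavior_probabilities(B)
```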
The predicted rating for an unrated product is calculated using Equation (8) [3,11]:
$$\hat{r}_{xp} = \bar{r}_x + \frac{\sum_{y \in NN_x} S(x, y)\,(r_{yp} - \bar{r}_y)}{\sum_{y \in NN_x} |S(x, y)|} \qquad (8)$$
where $\hat{r}_{xp}$ is the predicted rating of user x for product p, $NN_x$ is the set of users with the nearest similarity to user x, S(x, y) is the final similarity between users x and y, $\bar{r}_x$ and $\bar{r}_y$ are the average ratings of users x and y, respectively, and $r_{yp}$ is the rating given by user y to product p.
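Equation (8) can be sketched as a small function. It assumes that $NN_x$ consists of the k users most similar to x among those who have rated product p, a common practical reading that the text does not spell out. The data layout, the toy ratings, and the fixed similarity table `S` are hypothetical.

```python
def predict_rating(x, p, ratings, sim, k):
    """Equation (8): mean-centred, similarity-weighted prediction of user x's rating for product p."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    # NN_x (assumption): the k users most similar to x among those who rated p.
    candidates = [y for y in ratings if y != x and p in ratings[y]]
    neighbors = sorted(candidates, key=lambda y: sim(x, y), reverse=True)[:k]
    num = sum(sim(x, y) * (ratings[y][p] - means[y]) for y in neighbors)
    den = sum(abs(sim(x, y)) for y in neighbors)
    return means[x] + (num / den if den else 0.0)

# Toy data: user -> {product: rating}, and fixed final similarities S(x, y).
ratings = {"x": {1: 4, 2: 2}, "a": {1: 5, 2: 3, 3: 5}, "b": {1: 2, 2: 4, 3: 1}}
S = {("x", "a"): 0.9, ("x", "b"): -0.5}
pred = predict_rating("x", 3, ratings, lambda u, v: S[(u, v)], k=2)
```

The negatively similar user b rated product 3 below their own average, which pushes the prediction for x upward; this is why Equation (8) uses |S(x, y)| in the denominator.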

Evaluation
The final stage of this study is evaluation, which assesses the performance of the recommendation system that combines k-means clustering with user behavior-based similarity. Performance was measured using predictive metrics and running time; the predictive metrics are the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) [32,33].
MAE calculates the average absolute deviation between the actual and predicted rating scores, while RMSE is the square root of the average squared deviation between them and therefore penalizes large errors more heavily. Lower MAE and RMSE values indicate better recommendation quality [19,34].
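MAE and RMSE are given by Equations (9) and (10):
$$\mathrm{MAE} = \frac{1}{N} \sum_{(x, p)} \left| r_{xp} - \hat{r}_{xp} \right| \qquad (9)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(x, p)} \left( r_{xp} - \hat{r}_{xp} \right)^2} \qquad (10)$$
where N is the total number of predicted ratings, and $r_{xp}$ and $\hat{r}_{xp}$ denote the actual and predicted ratings of user x for product p, respectively.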
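Both metrics can be sketched in a few lines over parallel lists of actual and predicted ratings. The sample values are illustrative.

```python
import math

def mae(actual, predicted):
    """Equation (9): mean absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equally long, non-empty rating lists")
    return sum(abs(r - r_hat) for r, r_hat in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Equation (10): square root of the mean squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equally long, non-empty rating lists")
    return math.sqrt(sum((r - r_hat) ** 2 for r, r_hat in zip(actual, predicted)) / len(actual))

actual, predicted = [4, 3, 5], [3.5, 3.0, 4.0]
print(mae(actual, predicted), rmse(actual, predicted))  # 0.5 and about 0.645
```

The single error of 1.0 contributes more to RMSE than to MAE, which is why RMSE is the stricter of the two metrics.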

Experiment Result and Discussion
This section provides the experiment results, starting from determination of the number of clusters, the data distribution after clustering, comparison of MAE and RMSE, and comparison of the running time. Finally, the discussion explains the findings of this study.
Experiments on the MovieLens 100k dataset were run on a computer with an Intel® Core™ i7-4510U CPU @ 2.00 GHz (4 CPUs, ~2.6 GHz) and 16 GB of RAM. The UPCSim and CB-UPCSim algorithms were implemented in Python and run under Microsoft Windows 7.

Result of Silhouette Coefficient
The experiment for determining the optimal number of clusters begins by selecting the range of cluster numbers to evaluate, from 2 to 19. The minimum is 2, the smallest possible number of clusters. The maximum is 19: the range extends past the largest number of clusters used in previous studies [17][18][19] until the Silhouette Coefficient declines. At k = 18, the Silhouette Coefficient yields its second-highest value, and at k = 19, it decreases again. We do not continue to larger k because a higher k requires more computation time. Figure 2 shows the results of the clustering evaluation using the Silhouette Coefficient; each value is the average of 5 experiments. Based on Figure 2, k = 3 yields the maximum Silhouette Coefficient, indicating the optimal number of clusters. Therefore, k = 3 is used as the basis for the clustering process.
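The selection procedure (evaluate each k in a range, average the Silhouette Coefficient over 5 runs, keep the best k) can be sketched with NumPy on toy data. Assumptions: Euclidean distance, k-means restarted 10 times per run keeping the lowest-inertia result (a common safeguard against poor random initializations that the text does not mention), and three synthetic 2-D blobs instead of the user profile data.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_init=10, n_iter=100):
    """Lloyd's k-means restarted n_init times from random data points; keeps the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        C = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
            new_C = np.array([X[labels == c].mean(0) if (labels == c).any() else C[c]
                              for c in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        d2 = ((X[:, None] - C[None]) ** 2).sum(-1)
        labels, inertia = d2.argmin(1), d2.min(1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def mean_silhouette(X, labels):
    """Average Silhouette Coefficient with Euclidean distance; singleton clusters score 0."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # degenerate clustering: treated as the worst score so it is never selected
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():
            continue
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())

def select_k(X, k_values=range(2, 20), runs=5):
    """Average the Silhouette Coefficient over `runs` seeded k-means runs per k; return the best k."""
    X = np.asarray(X, dtype=float)
    scores = {
        k: float(np.mean([mean_silhouette(X, kmeans_labels(X, k, np.random.default_rng(seed)))
                          for seed in range(runs)]))
        for k in k_values
    }
    return max(scores, key=scores.get), scores

# Toy data: three well-separated 2-D blobs of 10 points each.
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([c + 0.3 * demo_rng.standard_normal((10, 2)) for c in ([0, 0], [10, 0], [0, 10])])
best_k, scores = select_k(X_demo, k_values=range(2, 6))
```

On the real data, `k_values=range(2, 20)` mirrors the 2 to 19 range above. The O(n²) distance matrix is fine for 943 users but would need a sampled or library implementation for much larger datasets.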

Result of k-Means Clustering
The k-means clustering process with a value of k = 3 works to group 943 users and rating data on the MovieLens 100k dataset into 3 clusters based on the similarity of user profiles. Figure 3 shows the clusters formed, i.e., cluster 0, cluster 1, and cluster 2. Based on Figure 3, the distribution of user data after clustering is in cluster 0 with 319 users (33.83%), cluster 1 with 336 users (35.63%), and cluster 2 with 288 users (30.54%). Meanwhile, the distribution of rating data is in cluster 0 with 32,906 ratings (32.91%), cluster 1 with 34,370 ratings (34.37%), and cluster 2 with 32,724 ratings (32.72%). The sparsity and density of each cluster are 93.87% and 6.13% (in cluster 0), 93.92% and 6.08% (in cluster 1), and 93.24% and 6.76% (in cluster 2). Table 2 shows the details of the statistical data before and after the clustering process.
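The per-cluster percentages can be checked arithmetically. This assumes that each cluster's rating matrix keeps all 1682 movies as columns; under that assumption the user shares, rating shares, densities, and sparsities reported above are reproduced. The helper name is our own.

```python
def cluster_stats(users, ratings, total_users=943, total_ratings=100_000, n_items=1682):
    """(% of users, % of ratings, density %, sparsity %) for one cluster, rounded to 2 decimals."""
    density = 100 * ratings / (users * n_items)
    return (round(100 * users / total_users, 2), round(100 * ratings / total_ratings, 2),
            round(density, 2), round(100 - density, 2))

for cluster, (users, ratings) in {0: (319, 32_906), 1: (336, 34_370), 2: (288, 32_724)}.items():
    print(cluster, cluster_stats(users, ratings))
```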

Result of MAE and RMSE
This subsection compares the performance of the recommendation system that combines memory-based and clustering methods with that of the memory-based method without clustering from the previous study. Performance is measured by the MAE and RMSE values.
The experiment divided the dataset of each cluster into several parts using k-fold cross-validation. In machine learning, k is generally set to 5 or 10, since both values have been empirically shown to yield test error estimates that suffer neither from excessively high bias nor from very high variance [35]. We selected k = 5 because it consumes less time than k = 10. In addition, 5-fold cross-validation matches the 80%:20% training/testing split used by many previous recommender system studies [1,3,9,10,12,16].
With k = 5, there are five training sets (train_1, train_2, train_3, train_4, and train_5) and five testing sets (test_1, test_2, test_3, test_4, and test_5), so five iterations are performed in each cluster: the first iteration uses train_1 and test_1, the second uses train_2 and test_2, and so on until the fifth iteration. The number of nearest neighbors ranges from 10 to 100. Table 3 compares the average MAE values before and after the clustering process on the MovieLens 100k dataset. Based on Table 3, the average MAE decreases as the number of nearest neighbors (N) grows, and it is lower after the clustering process than before. The average MAE decreases by 1.89% in cluster 0, 1.74% in cluster 1, and 2.01% in cluster 2, so prediction accuracy after the clustering process improves by 1.88% on average. Figure 4 shows the average MAE values before the clustering process (using the UPCSim algorithm) and after the clustering process (using the CB-UPCSim algorithm in each cluster).
Figure 4 illustrates that the average MAE of the UPCSim algorithm and of the combination of UPCSim and clustering (clusters 0, 1, and 2) decreases very sharply at the beginning of the curve and tends to stabilize at the end. For the same number of nearest neighbors, the average MAE in each cluster is lower than that of the UPCSim algorithm, indicating that the clustering process improves the recommendation algorithm's performance, especially in rating prediction. Table 4 compares the average RMSE values before and after the clustering process on the MovieLens 100k dataset.
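The 5-fold split can be sketched over the rating indices of one cluster: shuffle once, cut into five folds, and use each fold once as the test set (about 20% of the ratings) with the remaining four as the training set (about 80%). The shuffling seed and strided fold assignment are assumptions; the text does not say how the folds were drawn.

```python
import random

def kfold_splits(n_ratings, k=5, seed=0):
    """Shuffle rating indices once, cut them into k folds, and yield (train, test) per fold."""
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# Iteration 1 uses (train_1, test_1), iteration 2 uses (train_2, test_2), and so on.
splits = list(kfold_splits(100))
```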
The average RMSE in each cluster is always lower than the RMSE obtained before the clustering process. The average RMSE decreases by 1.45% in cluster 0, 1.27% in cluster 1, and 1.65% in cluster 2, an average reduction of 1.46%. Figure 5 shows the average RMSE values before the clustering process (using the UPCSim algorithm) and after the clustering process (in each cluster).
As Figure 5 shows, the number of nearest neighbors affects the resulting RMSE values: in all four scenarios, RMSE declines initially and then stabilizes as the number of nearest neighbors grows. The RMSE in each cluster is lower than the RMSE before the clustering process, which shows that clustering affects the resulting recommendation performance.

Result of Running Time
In addition to the MAE and RMSE values, this experiment also measured the execution time to assess the effect of the clustering process on the running time of the algorithms. Table 5 presents the average running time before and after the clustering process; the running time reported here combines training and testing times. The running time in each cluster is shorter than before the clustering process: 4.14 s faster in cluster 0, 4.12 s faster in cluster 1, and 4.22 s faster in cluster 2. Overall, the average running time is 0.91 s, a decrease of 4.16 s, or 5.5 times faster than before the clustering process. Clustering speeds up recommendation generation because each cluster contains less data than the full dataset processed before clustering.
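The averaged figures in this section follow from the per-cluster values. The sketch below recomputes them. The pre-clustering running time is not stated directly in the text; here it is inferred as 0.91 s + 4.16 s = 5.07 s, which gives a speed-up of about 5.57, reported above as 5.5 times.

```python
from statistics import mean

mae_drop = {0: 1.89, 1: 1.74, 2: 2.01}    # % MAE reduction per cluster
rmse_drop = {0: 1.45, 1: 1.27, 2: 1.65}   # % RMSE reduction per cluster
time_saved = {0: 4.14, 1: 4.12, 2: 4.22}  # seconds saved per cluster
avg_time_after = 0.91                     # s, average running time after clustering

avg_saved = mean(time_saved.values())
speedup = (avg_time_after + avg_saved) / avg_time_after  # inferred before-time / after-time
print(f"MAE -{mean(mae_drop.values()):.2f}%, RMSE -{mean(rmse_drop.values()):.2f}%, "
      f"saved {avg_saved:.2f} s, speed-up {speedup:.2f}x")
```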
We also measured the computation time for determining the optimal number of clusters using the Silhouette Coefficient and for the k-means clustering itself: 1.03 s and 30.32 milliseconds, respectively.

Discussion
In this study, we propose a recommendation algorithm that combines memory-based and clustering methods. The memory-based method considers user rating scores and user behavior scores to accommodate user preferences. Meanwhile, the clustering method is k-means, with the number of clusters determined by the Silhouette Coefficient to obtain the optimal number of clusters.
Choosing the number of clusters is an important consideration for researchers. Two types of methods are available: extrinsic and intrinsic. Extrinsic methods require expert judgment, while intrinsic methods use algorithms to find the best number of clusters. We chose an intrinsic method, the Silhouette Coefficient, because no expert was involved in our work. In addition, we evaluated the results with another well-known intrinsic method, the Davies-Bouldin Index; both intrinsic methods yield the same optimal number of clusters (k = 3).
The results showed that combining memory-based and clustering methods improves prediction performance, reducing MAE by 1.88% and RMSE by 1.46% compared to the baseline method (UPCSim). In addition, recommendation processing after clustering was 5.5 times faster than before clustering. This occurs because users with similar preferences are in the same cluster, and the similarity calculation only considers data in that cluster without processing data in other clusters.
Furthermore, we also evaluated the performance of UPCSim and CB-UPCSim on another dataset, MovieLens 1M. On MovieLens 1M, the average MAE and RMSE values are 0.6993 and 0.8921 for UPCSim and 0.6857 and 0.8784 for CB-UPCSim. CB-UPCSim thus also outperforms UPCSim on the larger dataset, reducing MAE and RMSE by 1.94% and 1.53%, respectively. In addition, the larger dataset (MovieLens 1M) yields lower MAE and RMSE than MovieLens 100k, which suggests that CB-UPCSim maintains a low prediction error on larger datasets, an advantage of our approach.
Recently, some studies [36][37][38] have also proposed sophisticated collaborative filtering methods to improve recommendation performance. The study conducted by [36] proposed collaborative filtering using cognitive similarity. Compared with their reported results, CB-UPCSim outperforms cognitive similarity for smaller numbers of nearest neighbors (k = 10 and k = 20), an advantage of our method because fewer nearest neighbors require less computation time. However, for larger k (k = 30 and k = 50), cognitive similarity outperforms our method. Further study is needed to compare both algorithms for higher k (k = 60 up to k = 100) because [36] did not measure these values. In addition, Nguyen et al. [37] used word embeddings to improve their proposed collaborative filtering, so incorporating word embeddings into CB-UPCSim is another option for obtaining better performance. Furthermore, Logesh et al. [38] presented user-based collaborative filtering using a new bio-inspired clustering ensemble (BICE), evaluated on large-scale datasets (Yelp and TripAdvisor). The clustering process in BICE is another candidate for obtaining optimal clusters in CB-UPCSim.
Although the proposed system improves recommendation performance (both rating prediction and processing time), it still has a drawback: the similarity calculation must run serially, first computing the user rating score-based similarity, then the user behavior score-based similarity, and finally the weighted combination of both. The computation time of our system consists of the overhead computation time and the prediction computation time.
The overhead computation time in this research is the data pre-processing time, which includes dataset reading (130.11 milliseconds), attribute reduction (20.05 milliseconds), and attribute conversion (16.35 milliseconds). This overhead is usually excluded from consideration because only the prediction computation time matters in an operational system. Meanwhile, the prediction computation time is 0.91 s (the average running time from Table 5).
Besides the serial similarity calculation, our proposed method has three other limitations. First, we evaluated it only on MovieLens 100k and 1M; further research should investigate MovieLens 10M, 20M, and 25M. Second, in the pre-processing stage, we convert three user profile attributes (gender, occupation, and location) into numeric values: gender (M, F) becomes (1, 2), occupation becomes 1 to 21, and location is reduced to its first digit. Other conversion techniques may yield better performance; in particular, we expect that keeping the first two digits of the location would improve the prediction results. Finally, in post-processing, we measure only the rating prediction error, without measuring the quality of top-N recommendations.
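The profile conversion described above can be sketched as follows. Assumptions: the occupation codes follow the alphabetical order of the 21 MovieLens 100K occupation labels (the text gives only the range 1 to 21), and because some MovieLens 100K ZIP codes are not numeric, a hypothetical fallback code 0 is used for those. The `location_digits` parameter shows how the two-digit variant proposed above could be tried.

```python
OCCUPATIONS = [  # the 21 MovieLens 100K occupation labels, alphabetical
    "administrator", "artist", "doctor", "educator", "engineer", "entertainment",
    "executive", "healthcare", "homemaker", "lawyer", "librarian", "marketing",
    "none", "other", "programmer", "retired", "salesman", "scientist",
    "student", "technician", "writer",
]
GENDER_CODE = {"M": 1, "F": 2}
OCCUPATION_CODE = {name: i + 1 for i, name in enumerate(OCCUPATIONS)}  # 1..21 (ordering assumed)

def encode_profile(age, gender, occupation, zip_code, location_digits=1):
    """Numeric profile vector (age, gender, occupation, location prefix)."""
    prefix = zip_code[:location_digits]
    location = int(prefix) if prefix.isdigit() else 0  # hypothetical fallback for non-numeric codes
    return [age, GENDER_CODE[gender], OCCUPATION_CODE[occupation], location]

print(encode_profile(24, "M", "technician", "85711"))                     # [24, 1, 20, 8]
print(encode_profile(24, "M", "technician", "85711", location_digits=2))  # [24, 1, 20, 85]
```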

Conclusions
This paper focuses on improving the recommendation performance of a previous similarity algorithm that involves user behavior scores. We propose a combined clustering and memory-based method using k-means clustering and UPCSim. Clustering based on user profile similarity reduces the processing time of the recommendation system by 4.16 s. In addition, the method improves prediction accuracy, reducing MAE and RMSE by 1.88% and 1.46% on the MovieLens 100k dataset. On a larger dataset (MovieLens 1M), our method also yields better prediction performance.
For further research, the system development can consider parallel processing to calculate the similarity between users and explore other clustering methods to improve recommendation performance. Moreover, the pre-processing stage can be extended by considering two digits of location, and the post-processing can involve measurement of the top-N recommendation.