Cognitive Similarity-Based Collaborative Filtering Recommendation System



Introduction
Information overload on the Internet is a widespread problem today, and recommendation systems are a powerful way to handle it. Recommendation systems cover a wide range of targets such as travel, movies, restaurants, fashion, and news [1,2]. One of the most effective technologies applied in recommendation systems is Collaborative Filtering (CF) [3][4][5][6]. A CF system operates as follows. First, it collects user feedback, typically by letting users rate items within a certain domain. Second, it exploits the similarities between the rating behaviors of users. Finally, it uses these similarities to decide which items to recommend. In other words, CF accumulates user-item ratings, identifies users who rated items in common, and makes recommendations based on these inter-user comparisons, so recommendations for a specific user depend on the behavior and evaluations of other users. The motivation for CF comes from the idea that people often get the best recommendations from others (i.e., neighbors) who have similar preferences. The main problem of collaborative filtering is therefore how to combine and weigh the preferences of neighbors.
The purpose of collaborative filtering algorithms is to suggest new items, or to predict the utility of a given item for a particular user, based on the feedback of that user and of other users who have rated the item. Assume there is a list of n users $U = \{U_i \mid i \in [1, \dots, n]\}$ and a list of m items $I = \{I_j \mid j \in [1, \dots, m]\}$. Each user $u_i$ has a list of items $I_{u_i}$ on which the user has given feedback. The feedback can be given explicitly as a rating score, usually on a certain numerical scale, or derived implicitly from historical records, for example by analyzing timing logs or mining web hyperlinks [6,7]. Note that $I_{u_i} \subseteq I$ and that $I_{u_i}$ may be the empty set. There is a distinguished user, called the active user, for whom the collaborative filtering algorithm estimates item utility in one of two forms: a prediction or a recommendation. Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire user-item data as an n × m matrix, in which entry (i, j) is the preference score (rating) of user $u_i$ on item $I_j$. Each rating is a value on a numerical scale, and an entry is 0 when the user has not rated the item yet.
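The following minimal Python sketch builds the n × m user-item matrix described above from (user, item, rating) triples, storing 0 for items a user has not rated. The function name and the example users and movie ids are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, float]  # (user, item, rating)


def build_rating_matrix(ratings: List[Triple]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (users, items, matrix), where matrix[u][i] is the rating or 0 if unrated."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    user_index: Dict[str, int] = {u: k for k, u in enumerate(users)}
    item_index: Dict[str, int] = {i: k for k, i in enumerate(items)}
    # Every entry starts at 0, i.e. "not rated yet".
    matrix = [[0.0] * len(items) for _ in users]
    for user, item, rating in ratings:
        matrix[user_index[user]][item_index[item]] = float(rating)
    return users, items, matrix


if __name__ == "__main__":
    data = [("Jason", "m1", 5), ("Jason", "m2", 3), ("Kyle", "m1", 4)]
    users, items, matrix = build_rating_matrix(data)
    print(users, items, matrix)
```

Using 0 for "unrated" follows the text; it implicitly assumes that 0 is not itself a valid rating on the chosen scale.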

[Figure 1: Input (ratings table) → CF algorithm → Output interface]
Generally, collaborative filtering algorithms can be divided into two main categories: memory-based and model-based algorithms [8]. Memory-based CF algorithms use all or a sample of the user-item database to make predictions. Each user is part of a group with similar preferences, and predictions of a new (or active) user's preferences for items are made by identifying that user's neighbors. Model-based CF algorithms, on the other hand, first learn a model from training data so that complex patterns can be recognized; based on the learned model, the system then makes predictions for CF tasks on test or real-world data.
Despite the advantages mentioned above and its many successes, the widespread use of user-based collaborative filtering has revealed several challenges:
• Scalability: In large systems such as Netflix (https://www.netflix.com) and Amazon (https://www.amazon.com/), the numbers of users and items grow every day. Traditional CF algorithms then face serious scalability issues, with computational requirements exceeding practical or acceptable levels; with millions of users and millions of distinct items, the complexity of a CF algorithm is already too large. In addition, the system must respond immediately to online requests and make recommendations to all users regardless of their purchase and rating history, which requires high scalability.
• Sparsity: Many commercial recommendation systems are used to evaluate very large sets of products, so the user-item matrix used for collaborative filtering is very sparse, which degrades the performance and predictions of CF systems. In particular, the cold-start problem occurs when a new user or item enters the system: similar users or items may not be found because there is not enough information (this is also called the new user problem or new item problem [9,10]). Neighbor transitivity is another problem with sparse databases: users with similar preferences may not be identified as such if they have not rated any of the same items, which reduces the effectiveness of a recommendation system that compares users in pairs to make predictions.
This paper makes three primary research contributions: (i) we propose a cognitive similarity approach and collect real cognitive similarity data with a crowdsourcing system (called OurMovieSimilarity) [11]; (ii) we formulate a pre-computed model (the three-layered architecture) that extracts cognitive similarity from users; and (iii) we propose a cognitive similarity-based collaborative filtering recommendation system. In particular, we build a crowdsourcing system [11][12][13][14] to collect cognitive data from users, and we then propose the three-layered architecture [15] to extract cognitive information from them. The architecture is a bottom-up structure made of three superposed, strongly linked networks:
• User network: relates users on the basis of explicit relations derived from the cognition network.
• Cognition network: relates the cognitive similarity between users based on their selections of similar items.
• Item network: relates items by comparing features extracted from them.
The remainder of this paper is organized as follows. In the next section, we present research related to user-based collaborative filtering. In Section 3, we define cognitive similarity and propose the three-layered architecture to extract it. We present the recommendation system based on cognitive similarity in Section 4. The details of our experiment, data set, evaluation, and results are provided in Section 5. In the final section, we provide concluding remarks and directions for future work.

Related Work
GroupLens [3,6] implemented MovieLens (https://movielens.org) [16], one of the large systems that allow new users to sign up and rate their favorite movies. GroupLens researchers have also released a data set, collected over the years, with more than 25 million movie ratings. Based on this data set, they provide a pseudonymous collaborative filtering solution for movies in order to address the disadvantages of collaborative filtering, especially to improve user-based collaborative filtering in recommender systems. Other techniques, such as Bayesian networks and clustering, have also been applied. A Bayesian Network (BN) [17][18][19] is a compact representation of a multivariate statistical distribution function. A BN encodes the probability density function governing a set of random variables $\{X_i \mid i \in [1, \dots, n]\}$ by specifying a set of conditional independence statements together with a set of conditional probability functions. In particular, a BN consists of a qualitative part, a directed acyclic graph whose nodes mirror the random variables $X_i$, and a quantitative part, the set of conditional probability functions. For CF, a BN model is built from a training set, with a decision tree at each node and edges representing user information. The model can be built off-line over a matter of hours or days, and the resulting model is very small, fast, and essentially as accurate as the nearest neighbor method.
Recently, many improvements to user-based CF have been proposed to mitigate the effects of data sparsity [20,21]. For example, singular value decomposition was used to condense the original user-item matrix for dimensionality reduction [22], and latent semantic models [23] were used to cluster users and items. However, these approaches have the disadvantage that the decomposition must be recomputed every time a new user or rating is added to the matrix. Another, more recent contribution analyzes prediction errors to improve the accuracy of user-based CF; its limitation is that computing the errors [24] of all ratings during training is quite expensive. Alternative approaches using recursive prediction strategies exploit not only the neighbors but also the neighbors of the neighbors [25]. Because similarities must be computed for the neighbors of all neighbors, such strategies incur high computational costs that grow exponentially with the depth of the recursion. Moreover, these strategies must enrich the information in the user-item matrix to improve the performance of user-based CF [8,26].
In addition, in [27,28], two item-based similarity measures were designed to overcome the cold-start problem by incorporating the genre data of items, with experiments on popular datasets such as MovieLens and MovieTweets. In their approach, an item is considered similar to other items with which it shares more than one genre. By considering the association of common genres, one of these similarity measures determines the degree of direct asymmetric correlation between items.
The method proposed in this paper was inspired by [29], which proposes the Rated-Item Pools (RIP-based) approach to improve user-based CF. This approach aims to eliminate extra calculations that increase computational complexity, and thereby avoids the need to add external knowledge resources that would incur additional cost. To formulate the approach, the author used a related method [3] that applies Equation (1) to predict the rating value $R_{u,i}$ of an active user u for an item i, where $N_{u,i}$ is the subset of the neighbors v of u who explicitly rated item i. In Equation (1), the user similarity sim(u, v) is normalized by the sum of all similarities computed between the active user and the neighbors in $N_{u,i}$. The classic user-based CF approach calculates sim(u, v) as a global similarity, generally using either the cosine similarity metric (also referred to as vector space similarity) or, more frequently, the Pearson correlation coefficient, which is defined as follows:

$$sim(u, v) = \frac{\sum_{i \in C_{u,v}} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in C_{u,v}} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{i \in C_{u,v}} (R_{v,i} - \bar{R}_v)^2}} \qquad (2)$$

where $R_{u,i}$ is the rating of user u on item i, $C_{u,v}$ is the set of items co-rated by users u and v, and $\bar{R}_u$ is the average rating of user u over all the co-rated items in $C_{u,v}$.
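A minimal Python sketch of the Pearson correlation in Equation (2), computed over the co-rated items $C_{u,v}$, is given below. Because Equation (1) is only described in words above, `predict` assumes the common mean-centred, similarity-weighted average used in GroupLens-style CF; that form, the helper names, and the handling of degenerate cases (fewer than two co-rated items, constant ratings) are assumptions of this sketch.

```python
from math import sqrt
from typing import Dict, Iterable

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(ratings: Ratings, u: str, v: str) -> float:
    """Pearson correlation (Equation (2)) over the co-rated items C_{u,v}."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # constant ratings: correlation undefined
    return num / (den_u * den_v)


def mean_rating(ratings: Ratings, user: str) -> float:
    """Average of all ratings given by a user."""
    return sum(ratings[user].values()) / len(ratings[user])


def predict(ratings: Ratings, u: str, item: str, neighbors: Iterable[str]) -> float:
    """ASSUMED form of Equation (1): mean-centred, similarity-weighted average."""
    rated = [v for v in neighbors if item in ratings[v]]  # N_{u,i}
    sims = {v: pearson(ratings, u, v) for v in rated}
    norm = sum(abs(s) for s in sims.values())  # sum of |sim(u, v)| over N_{u,i}
    base = mean_rating(ratings, u)
    if norm == 0:
        return base  # no usable neighbor: fall back to the user's mean
    offset = sum(s * (ratings[v][item] - mean_rating(ratings, v)) for v, s in sims.items())
    return base + offset / norm
```

With this form, a neighbor with negative correlation pushes the prediction in the opposite direction of its own deviation from its mean, which is why the denominator uses absolute similarities.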
The similarity of Equation (2) is used not only as the weight sim(u, v) in Equation (1) but also to find the neighbors $N_{u,i}$ of the active user u. There are two common ways to select the nearest neighbors. The more popular one is to choose the K users most similar to the active user [3]. The alternative is to estimate a similarity threshold and select all users whose distance to the active user does not exceed that threshold [4]. We refer to the classical method described above as User-based Pearson Correlation Similarity (UBPS) and adopt it as the baseline for the comparative analysis presented in this paper.
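The two neighbor-selection strategies can be sketched as follows. This Python sketch expresses the threshold rule in terms of similarity (similarity at or above the threshold), which matches "distance not exceeding the threshold" if distance is taken as, for example, 1 minus similarity; that conversion and the function names are assumptions.

```python
from typing import Dict, List


def top_k_neighbors(similarities: Dict[str, float], k: int) -> List[str]:
    """Strategy of [3]: the K users most similar to the active user."""
    return sorted(similarities, key=lambda v: similarities[v], reverse=True)[:k]


def threshold_neighbors(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Strategy of [4]: every user whose similarity reaches the threshold."""
    return [v for v, s in similarities.items() if s >= threshold]
```

Top-K always returns K neighbors (if available) regardless of how weak they are, whereas the threshold rule may return very few neighbors for users with unusual tastes.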

Cognitive Similarity
Our work explores the cognitive similarity between users in order to determine the most similar users for the active user. For example, consider the relations among the users Kyle, Jason, and Paul. A CF system typically first detects Jason's preferences from the items he has rated. Second, it compares Jason's ratings with those of Kyle and Paul to find the most "similar" tastes. Finally, it recommends items that similar users have rated highly but Jason has not yet rated. However, how should the preferences of the neighbors be combined and weighted to define the top-N recommendations for Jason? We recognize that the behavior of users when using a service is crucial for making accurate predictions [30], so our work aims to understand the cognitive similarity of users. With our approach, we can determine that Jason's most similar user is Kyle, so suitable recommendations for Jason depend mostly on Kyle and only a little on Paul. As shown in Table 1, with the traditional CF method Jason has the same relation to Kyle and to Paul, while our proposed method shows that Jason and Kyle have a stronger relation than Jason and Paul. In the remainder of this section, we describe how our approach extracts the cognitive similarity between users. Suppose n is the number of items and k is the number of features extracted from each item, so that each item is represented as a vector $I_n = \{F_i \mid i \in [1, \dots, k]\}$. When a user u selects a pair of similar items, the cognitive similarity of user u is represented as a vector of the per-feature similarities $Sim_{F_i}$, $i \in [1, \dots, k]$, where $Sim_{F_i}$ is the cosine similarity between the features $F_i$ of the two items. The cognitive similarity between users is enriched by each of their selections. In general, we define cognitive similarity as follows:
Definition 1.
The cognitive similarity (CS) between users u and v is determined by their priorities over the F features extracted from each item when selecting pairs of similar items, and is formulated in Equation (3), where $Sim_{F_i,F_j}$ is the similarity of the features F extracted from the pair of items i and j selected by users u and v, and $p_{i,j}$ represents the similarity of the priorities of users u and v when selecting pairs of similar items.
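The per-selection representation described before Definition 1, in which one selected pair of similar items yields a vector of per-feature cosine similarities, can be sketched in Python as below. The sketch assumes each item's features are already numeric vectors (for example, one-hot genres), and `user_profile` averages a user's selection vectors as a hypothetical way of "enriching" the cognitive similarity with each selection; it is not Equation (3).

```python
from math import sqrt
from typing import Dict, List, Sequence

# k = 5 features per movie, in a fixed (assumed) order.
FEATURES = ["title", "genre", "director", "actor", "plot"]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def selection_vector(item_i: Dict[str, Sequence[float]], item_j: Dict[str, Sequence[float]]) -> List[float]:
    """(Sim_F1, ..., Sim_Fk) for one pair <item_i, item_j> that a user selected as similar."""
    return [cosine(item_i[f], item_j[f]) for f in FEATURES]


def user_profile(selections: List[List[float]]) -> List[float]:
    """HYPOTHETICAL aggregation: element-wise mean over all of a user's selection vectors."""
    if not selections:
        raise ValueError("user has no selections yet")
    n = len(selections)
    return [sum(vec[k] for vec in selections) / n for k in range(len(FEATURES))]
```

A profile with a high title component and a low plot component would, under this sketch, describe a user who tends to pick pairs with similar titles rather than similar plots.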

Measuring Cognitive Similarity
The most important step in memory-based collaborative filtering algorithms is calculating the similarity between items or users. The basic idea for calculating the similarity between two items, item i and item j, is to first isolate the users who evaluated both items and then apply a similarity technique to determine $Sim(item_i, item_j)$. In this study, we use cosine similarity as the similarity metric. The cosine similarity of two non-zero vectors in an inner product space equals the cosine of the angle between them. Given vectors I and J, the cosine similarity is:

$$Sim(I, J) = \cos(I, J) = \frac{\sum_{h} I_h J_h}{\sqrt{\sum_{h} I_h^2}\,\sqrt{\sum_{h} J_h^2}} \qquad (4)$$

where $I_h$ and $J_h$ are the components of vectors I and J, respectively. For example, given a movie $m_i$ that a user has seen and all remaining movies $m_j$ in the database, we measure $Sim(m_i, m_j)$ using vector cosine-based similarity. A movie m is represented as a vector $\langle T, G, D, A, P \rangle$, in which T, G, D, A, and P are the title, genre, director, actors, and plot features. The score $Sim(m_i, m_j)$ is computed by Equation (5) from $T_{ij}$, $G_{ij}$, $D_{ij}$, $A_{ij}$, and $P_{ij}$, the similarities between movies $m_i$ and $m_j$ in terms of title, genre, director, actors, and plot. For the title feature, for instance, applying Equation (4) gives:

$$T_{ij} = \frac{\sum_{h} T_{i,h} T_{j,h}}{\sqrt{\sum_{h} T_{i,h}^2}\,\sqrt{\sum_{h} T_{j,h}^2}}$$

where $T_{i,h}$ and $T_{j,h}$ are the components of the title vectors of $m_i$ and $m_j$. The similarities of the remaining features, genre ($G_{ij}$), director ($D_{ij}$), actors ($A_{ij}$), and plot ($P_{ij}$), are measured in the same way. Finally, we repeat this calculation for all remaining movies in the system and obtain the set $\{Sim(m_i, m_{j+h}) \mid h \in [1, \dots, n]\}$. In addition, we add an element representing the priority of users when they select a similar movie. Hence, Equation (5) is rewritten to weight each feature similarity by this priority.
In the rewritten equation, $\omega_k$ denotes the priority of the user when selecting a pair of similar movies; k indexes the features extracted from movie m, namely title, genre, director, actor, and plot; and $Sim^k(m_i, m_j)$ is the similarity between movies $m_i$ and $m_j$ on feature k, as described for Equation (5). The OMS system dynamically recalculates and updates each user's priorities whenever the user performs a new activity (selects a new pair of similar movies). Based on each user's history of activities, we collect all pairs of similar movies the user has identified, so that the user can be represented by the features extracted from all of those pairs. Using Equation (3), we then measure the cognitive similarity between user Kyle and users Paul or Jason.
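The per-feature cosine of Equation (4) and the priority-weighted movie similarity can be sketched together in Python. This sketch assumes each feature (title, genre, director, actor, plot) is a text string turned into a bag-of-words count vector, and assumes the weighted form is a sum of $\omega_k \cdot Sim^k(m_i, m_j)$ with the weights normalised to sum to 1; the tokenisation, the normalisation, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def text_cosine(a: str, b: str) -> float:
    """Equation (4) applied to bag-of-words count vectors of two feature strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def movie_similarity(m_i: Dict[str, str], m_j: Dict[str, str], weights: Dict[str, float]) -> float:
    """ASSUMED weighted form: sum_k (w_k / sum w) * Sim^k(m_i, m_j) over the weighted features."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one feature weight must be positive")
    return sum(w / total * text_cosine(m_i[k], m_j[k]) for k, w in weights.items())
```

With equal weights this reduces to the plain average of the five feature similarities; raising, say, the genre weight for a user who tends to pair movies of the same genre makes genre matches dominate that user's scores.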

Three-Layered Architecture for Cognitive Similarity
Our purpose is to extract the similarity between users from their cognitive similarity in finding pairs of similar items, and then to use it to find the k-nearest neighbors of active users. Hence, we introduce a three-layered architecture, shown in Figure 3, comprising (i) an item network S, (ii) a cognition network C, and (iii) a user network U. Each network involves different relations between individuals and is characterized by a set of objects (nodes) and a set of relations. The characteristics of each layer and the relationships between layers are described below.

Item Layer
In the item network S, nodes represent items, and relations (i.e., edges) represent the similarity between items. The item network S is a directed graph $\langle N_S, E^{similarity}_S \rangle$, where $N_S$ is the set of items and $E^{similarity}_S \subseteq N_S \times N_S$ is the set of relations between these items. In the item network in Figure 3, the nodes represent items (movies) and the dotted edges represent the relationships between them. In this study, the relation between items is measured using the cosine vector similarity described above.
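The item layer's directed graph $\langle N_S, E^{similarity}_S \rangle$ can be held as an edge list. This Python sketch assumes each item is already a numeric feature vector and keeps an edge only when the cosine similarity reaches a threshold; the threshold and the function names are hypothetical, since the text does not say which item pairs become edges.

```python
from itertools import permutations
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (source item, target item, similarity)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def item_network(items: Dict[str, Sequence[float]], threshold: float) -> List[Edge]:
    """Directed edges between items whose cosine similarity is at least `threshold`."""
    edges: List[Edge] = []
    for i, j in permutations(sorted(items), 2):  # ordered pairs: (i, j) and (j, i)
        s = cosine(items[i], items[j])
        if s >= threshold:
            edges.append((i, j, s))
    return edges
```

Because cosine similarity is symmetric, every edge appears in both directions here; a directed representation only becomes informative if asymmetric weights (such as user priorities) are later attached to the edges.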

Cognition Layer
The cognition network C is a network $\langle N_C, E^{i}_C \rangle$, in which $N_C$ is a set of cognitive similarities of groups of users and $E^{i}_C \subseteq N_C \times N_C$ is the set of relationships between these groups. The relationship from S to C is established through the selection of pairs of similar items by users, which can be expressed as a relation $Selections \subseteq N_S \times N_C$. Hubs can easily be interpreted as user groups that combine a large number of other users with similar cognition. These would be a natural starting point for any new user willing to annotate a set of objects similar to those of his friends. For example, in the cognition network in Figure 3, the new user Bill shares cognitive similarity within the group {John, Karla, Bill}, so Bill becomes related to the group {John, Paul, Kyle, Karla}. These relations are enriched as Bill's cognition develops and change dynamically in the network. The relations among groups are represented by the overlap of the feature sets collected implicitly from the users' activities. In particular, the feature set extracted from movies is {t, g, d, a, p}, where t denotes title, g genre, d director, a actor, and p plot. Likewise, the cognitive similarity between users is extended and refined based on the history of their activities (i.e., the pairs of similar items they select). There is, however, a clear difference between cognition networks and item networks: item networks connect items, and cognition is extracted from these connections, whereas the connections in the cognition network relate the cognitive similarity of users and of user groups. Thus, it is useful to recognize hubs that connect users in the same groups, as these are likely to express alignment between two groups.

User Layer
In the user layer U, nodes are users, and relations are the various kinds of relationships that can be found in cognitive similarity. The user network U is a network $\langle N_U, E^{i}_U \rangle$, in which $N_U$ is a set of user entities and $E^{i}_U \subseteq N_U \times N_U$ is the set of relationships between these entities. These relations are derived from C to U by extracting the relations among user groups in the cognition network, and can be expressed as a relation over $N_C \times N_U$.
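A minimal Python sketch of deriving user-layer relations is shown below. It assumes each user is summarised by an aggregated cognitive-similarity vector and links two users when the cosine of their vectors reaches a threshold, in the spirit of the threshold neighborhood used in the Kyle, Jason, and Paul example. This cosine comparison, the threshold, and the function names are hypothetical stand-ins, not the paper's Equation (3).

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_network(profiles: Dict[str, Sequence[float]], threshold: float) -> Dict[str, Set[str]]:
    """Undirected user graph: link users whose profiles have cosine >= threshold."""
    graph: Dict[str, Set[str]] = {u: set() for u in profiles}
    for u, v in combinations(sorted(profiles), 2):
        if cosine(profiles[u], profiles[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph
```

The neighbors of a user in this graph are then candidates for the k-nearest neighbors used by the recommendation step.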

Recommendation System Based on Cognitive Similarity
CF-based recommendation systems usually use a similarity method to find the k-nearest neighbor users of a target user. The system then uses the past ratings of these neighbors to predict or recommend new content that the active user is likely to like. Recommendations can also be made with different methods based on the similarity of information from users' past behavior (ratings, purchases, browsing, and so on). In this paper, we use the cognitive similarity among users to find the k-nearest neighbor users. Rating scores identify user preferences, but a key problem is how to combine and weight the preferences of neighbors. We therefore take another perspective: finding the cognitive similarity between users and combining it with their preferences. Notably, users' cognitive similarity must be constructed from their cognition about the items instead of rating scores. All user activities are collected and saved in the database. The features extracted from the items a user considers similar form the initial cognitive similarity of that user; these features are collected implicitly through the user's selections of similar movies. The system then analyzes and updates each user's cognitive similarity individually based on the collected features, and continues to recommend pairs of similar movies from the k-nearest neighbors to collect feedback from the active user. Finally, user feedback on the recommendations is used to adjust their cognitive similarity.
To develop cognitive similarity, item similarity must first be computed in preprocessing. The cognitive similarity between users is then derived from the similar items they previously browsed and selected. The recommendation process can be divided into three steps:
• Representation of user information: each user's cognitive similarity is analyzed and modeled.
• Generation of neighbor users: user similarity is extracted from the three-layered architecture according to the collected data and the collaborative filtering algorithm presented in Section 3.
• Generation of recommendations: the top-N items are recommended to users according to the cognitive similarity of their neighbors.
Following these steps, each user activity in the database is used to compute the user's list of neighbors, which is recorded in the corresponding entry of the user database. When users log into the system, recommendations are presented based on the cognitive similarity of their neighbors, and each subsequent activity is used to enrich their cognitive similarity, which is stored in the database. The recommendation process is shown in Figure 4. Most recommendation systems rely on user feedback to provide high-quality recommendations. Although explicit feedback is sometimes considered more reliable, implicit feedback requires less intervention from users, captures short-term interests, and continuously updates user preferences [21]. Modern approaches make the quality of recommendations based on implicit feedback comparable to those based on explicit feedback. For this reason, we dynamically update cognitive similarity based on implicit feedback from users. By allowing users to update their selections or to suggest new pairs of similar items, we can measure cognitive similarity more efficiently. Recommendations are computed from the cognitive similarity of neighbors: because the extracted cognitive similarity gives the neighbors of each user, we can list all the pairs of similar items those neighbors selected and summarize the most popular ones. For example, in the three-layered architecture in Figure 3, consider the users Paul, Kyle, and Jason: the neighbors of Kyle within the neighborhood threshold are Jason and Paul. Hence, the pairs of similar items $\langle A, B \rangle$ and $\langle C, D \rangle$ identified by Jason and Paul should be presented to Kyle. When Kyle makes a selection (i.e., gives feedback), his cognitive similarity is incrementally and dynamically recalculated and updated in the database.
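The recommendation step, listing the pairs of similar items chosen by a user's neighbors and presenting the most popular ones, can be sketched in Python as below. The sketch assumes pairs are unordered (so $\langle A, B \rangle$ and $\langle B, A \rangle$ count as one), ranks pairs by how many neighbor selections contain them, and skips pairs the active user already selected; these choices and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def normalise(pair: Pair) -> Pair:
    """Treat <A, B> and <B, A> as the same pair of similar items."""
    a, b = sorted(pair)
    return (a, b)


def recommend_pairs(selections: Dict[str, List[Pair]], neighbors: List[str], active: str, n: int) -> List[Pair]:
    """Top-n pairs selected by the neighbors, most popular first, excluding the active user's own pairs."""
    seen = {normalise(p) for p in selections.get(active, [])}
    counts = Counter(normalise(p) for v in neighbors for p in selections.get(v, []))
    # most_common() orders by count; equal counts keep first-encountered order.
    ranked = [pair for pair, _ in counts.most_common() if pair not in seen]
    return ranked[:n]
```

For the Kyle example, with Jason and Paul as neighbors, this would present $\langle A, B \rangle$ and $\langle C, D \rangle$ to Kyle, and any pair Kyle then selects would feed back into his cognitive similarity.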

Overview of the OMS System
We propose OurMovieSimilarity (OMS) (http://recsys.cau.ac.kr:8084/ourmoviesimilarity), a crowdsourcing system that collects the cognitive similarity of users. The system is built with Java and a MySQL database [31]. Because it contains both web services and background services, security and the handling of concurrent access are among the most important concerns. We therefore designed the system following the Model-View-Controller (MVC) pattern [32]. We use Apache Tomcat for the web services, and MySQL was chosen for its reliability, scalability, and security.
Because OMS is a web-based crowdsourcing platform, low latency must be considered carefully. Another main challenge in a web-based system is providing enough guidance for users throughout the entire system. To address this, we apply the concept of progressive disclosure, which is to "show users what they need when they need, and where they want", across all functions of OMS.
In addition, to improve the user experience, we focused on simple interactions and fast responses when designing the system. In particular, we use a single template throughout the system to maintain consistency, so users can more easily recognize interface elements (e.g., buttons and functions) when interacting with OMS. All of the features mentioned above are based on the three golden rules of user interface design [33].
Since interaction with users is central to our purpose, we made the process of selecting a similar movie as simple as possible: the user first selects a movie they have seen and then chooses, from the movies suggested by OMS, the one they consider similar. If the user does not find a suitable movie among the suggestions, they can search for movies on IMDB and add them to the OMS system.

Data Set
When designing OMS, we needed an initial movie database to start collecting the cognitive similarity of users. We therefore implemented a movie crawling function in OMS that automatically collects movie information from online sources. We chose IMDB (https://www.imdb.com/) as an extensive, highly scalable movie database, and implemented the crawler using the open API provided by OMDb (http://www.omdbapi.com/). So far, we have collected over 14,000 popular movies released from 1990 to 2019, covering nine genres, 3439 directors, and 8057 actors.
OMS continues to collect data online. Currently, it has about 150 active users and more than five thousand user activities. Each record collected from users has the format $(U_i, m_j, m_k, CS^{U_i}_{m_j,m_k}, \gamma_i)$, where $U_i$ is the user id; $m_j$ and $m_k$ are a pair of similar movies; $CS^{U_i}_{m_j,m_k}$ is a vector representing the cognitive similarity of user $U_i$; and $\gamma_i$ is the number of times the user changed the suggested movies while selecting the pair of similar movies.
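One OMS activity record $(U_i, m_j, m_k, CS^{U_i}_{m_j,m_k}, \gamma_i)$ can be modelled as an immutable Python dataclass. The field names and the semicolon-separated text row parsed by `from_row` are hypothetical illustrations; the text does not specify how OMS actually stores these records.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Activity:
    """One OMS record (U_i, m_j, m_k, CS, gamma_i); field names are hypothetical."""

    user_id: str                             # U_i
    movie_j: str                             # m_j
    movie_k: str                             # m_k, selected as similar to m_j
    cognitive_similarity: Tuple[float, ...]  # CS vector, one value per feature
    changes: int                             # gamma_i: times the suggestions were changed

    @classmethod
    def from_row(cls, row: str) -> "Activity":
        """Parse a hypothetical 'user;m_j;m_k;cs1,cs2,...;gamma' text row."""
        user, m_j, m_k, cs, gamma = row.strip().split(";")
        vector = tuple(float(x) for x in cs.split(","))
        return cls(user, m_j, m_k, vector, int(gamma))
```

Keeping $\gamma_i$ alongside the vector allows, for example, selections made after many suggestion changes to be treated differently from immediate ones, though the text does not say whether OMS does this.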

Evaluation
To evaluate the recommender system, the pairs of similar items of each user (called items in this evaluation) were first divided randomly into two sets: the training-set and the Test-set. The proposed algorithm was first applied to the training-set to select N items to recommend to the active user, called the top-N. The items in the top-N were then compared with the items in the Test-set, and the items common to both were called the Hit-set. After obtaining the Test-set, training-set, and Hit-set, we can calculate the accuracy of the algorithm using two evaluation criteria: Precision and Recall.
Precision is the proportion of relevant recommendations among all recommendations (denoted N), where relevant recommendations are those with ratings equal to or greater than a threshold. Recall is the proportion of relevant recommendations among all relevant items (i.e., all items selected by the user). Note that whereas N is a constant, the number of relevant items is not. Hence, Recall is a "relative" measure, because retrieving relevant recommendations from a few relevant items is more difficult than from a large number of relevant items. To combine both criteria into a single measure, we use the $F_1$ score:

$$Precision = \frac{S_{Hit\text{-}set}}{S_{Top\text{-}N\text{-}set}}, \qquad Recall = \frac{S_{Hit\text{-}set}}{S_{Test\text{-}set}}, \qquad F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where $S_{Hit\text{-}set}$ is the size of the Hit-set, $S_{Test\text{-}set}$ is the size of the Test-set, and $S_{Top\text{-}N\text{-}set}$ is the size of the top-N set. $F_1$ was computed for each user, and the average $F_1$ over all users was used as the criterion for the algorithm's accuracy. To compare the proposed method with previous methods, we compared it with a recommendation system designed based on association rules. The following diagram shows the results of these algorithms, with top-N values ranging from 10 to 250. The experimental results show that the accuracy of collaborative filtering based on cognitive similarity (CF-Cognitive Similarity) is higher than that of collaborative filtering based on Pearson correlation similarity (CF-Pearson Correlation Similarity). In the best case, the proposed method improves on the baseline by 11.1%. Table 2 compares the proposed method with the baseline for top-N values in {10, 50, 100, 150, 200, 250}.
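The Hit-set based criteria can be computed per user and then averaged, as described above. This Python sketch treats the top-N list and the Test-set as sets of item ids and returns 0 where a denominator would be zero; those edge-case conventions and the function names are assumptions.

```python
from typing import Dict, Hashable, Set, Tuple


def precision_recall_f1(top_n: Set[Hashable], test_set: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, Recall, and F1 from the Hit-set = top-N ∩ Test-set."""
    hit_set = top_n & test_set
    precision = len(hit_set) / len(top_n) if top_n else 0.0
    recall = len(hit_set) / len(test_set) if test_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_f1(per_user: Dict[str, Tuple[Set[Hashable], Set[Hashable]]]) -> float:
    """Average F1 over users; per_user maps user -> (top-N set, Test-set)."""
    scores = [precision_recall_f1(top, test)[2] for top, test in per_user.values()]
    return sum(scores) / len(scores)
```

For example, a top-4 list {a, b, c, d} against a Test-set {a, b, e} gives a Hit-set of size 2, so Precision = 2/4, Recall = 2/3, and $F_1$ = 4/7.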
To extend the evaluation, we also performed a comparative analysis using MAE and RMSE as evaluation metrics, defined as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - y^p_i \right|, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - y^p_i \right)^2}$$

where n is the number of cognitive similarity values in the Test-set, $y_i$ is the actual cognitive similarity value, and $y^p_i$ is the predicted cognitive similarity value. In general, MAE ranges from 0 to infinity; in practice, its maximum is determined by the scale on which the cognitive similarity values are measured. The main reason for this approach is that predicted cognitive similarity values can be used to order items, so predictive accuracy measures the ability of a recommendation system to rank items according to user cognition [34].
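MAE and RMSE over the Test-set follow directly from their definitions. This Python sketch takes the actual values $y_i$ and predicted values $y^p_i$ as two equal-length sequences; rejecting empty or mismatched inputs is a choice of this sketch.

```python
from math import sqrt
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between y_i and y^p_i."""
    _check(actual, predicted)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between y_i and y^p_i."""
    _check(actual, predicted)
    return sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))
```

Because RMSE squares each error before averaging, it penalises a few large mistakes more heavily than MAE does, which is why the two metrics are reported together.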
To create the Test-set, we split the items of each user into k folds (k-fold cross-validation), each of which is used as the Test-set once. Following [35], we set k = 5, because this yields test error rate estimates that suffer neither from excessively high bias nor from very high variance. Specifically, the data set was split into five folds. In the first iteration, the first fold was used as the Test-set and the remainder as the training-set; in the next iteration, the second fold was the Test-set and the remainder the training-set. This process was repeated until each of the five folds had been used as the Test-set. As mentioned above, we used Pearson Correlation Similarity, with the recommendation model of Equation (1) and the similarity model of Equation (2), as the baseline. Because the baseline is reported to perform best with around 50 neighbors of the active user, we used neighborhood sizes of 5, 10, 20, 30, and 50 in our experiments. The comparison between our proposed method and the baseline is shown in Tables 3 and 4.
Table 3. Comparison between CF-Pearson Correlation Similarity and the CF-Cognitive Similarity with varying neighborhood sizes (MAE metric).
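The 5-fold procedure above, applied to one user's items, can be sketched in Python as follows. The shuffling with a fixed seed and the round-robin assignment of items to folds are assumptions of this sketch; the text only states that the data were split into five folds, each used once as the Test-set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 5, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Return k (training-set, Test-set) pairs; each fold is the Test-set exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]  # round-robin: fold sizes differ by at most 1
    splits: List[Tuple[List[T], List[T]]] = []
    for f in range(k):
        train = [x for g, fold in enumerate(folds) if g != f for x in fold]
        splits.append((train, folds[f]))
    return splits
```

Each metric (for example MAE) would then be computed on every split and averaged over the five iterations.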

Conclusions
In this paper, we proposed a three-layered architecture that extracts cognitive similarity in order to identify the k-nearest neighbors more accurately. To apply the architecture, we created a web-based crowdsourcing platform (called OurMovieSimilarity) that collects cognitive feedback from users. The crowdsourcing system is deployed online and continues to collect user feedback; our data set includes over 150 users and more than 5000 feedback records stored in our database. To evaluate how accurately the proposed method works in a recommendation system, we used collaborative filtering with Pearson correlation similarity as the baseline. The results demonstrate that cognitive similarity-based collaborative filtering is more accurate than the baseline, achieving an improvement of 11.1% over the Pearson correlation baseline in the best case. Our method also achieved a consistent improvement of 1.8% to 3.2% in MAE and 2.0% to 4.1% in RMSE across various neighborhood sizes.