Personalized Standard Deviations Improve the Baseline Estimation of Collaborative Filtering Recommendation

Featured Application: A novel baseline estimation model for collaborative filtering recommendation is proposed. Personalized standard deviations, computed from both global and local rating distributions, can improve predictive accuracy.

Abstract: Baseline estimation is a critical component of latent factor-based collaborative filtering (CF) recommendation; it obtains baseline predictions by evaluating global deviations of both users and items from personalized ratings. Classical baseline estimation presupposes that a user's factual rating range is the same as the system's given rating range. However, from observations on real datasets of movie recommender systems, we found that different users have different actual rating ranges, and users can be classified into four kinds according to their personalized rating criteria: normal, strict, lenient, and middle. We analyzed the rating distributions and found that the proportion of a user's local standard deviation to the system's global standard deviation is equal to that of the user's actual rating range to the system's rating range. We propose an improved and unified baseline estimation model based on this standard deviation proportion to alleviate the limitation of classical baseline estimation. We also apply the proposed baseline estimation model to existing latent factor-based CF recommendations and propose two instances. We performed experiments with cross evaluations on the full ratings of the datasets Flixster, Movielens (10 M), Movielens (Latest Small), FilmTrust, and MiniFilm. The results prove that the proposed baseline estimation model has better predictive accuracy than the classical model and is efficient in improving the prediction performance of existing latent factor-based CF recommendations.


Introduction
Recommender technologies have been applied to a variety of Internet-based systems, such as online multimedia services, online shopping, and online social networking. Collaborative filtering (CF) recommendation is one of the most important techniques for recommender systems [1,2]. The main techniques of CF recommendation include the neighborhood method and the latent factor model [3]. The neighborhood method aims at improving predictive accuracy through similarities of relations, such as [4][5][6][7]. The latent factor model uses historical rating data to discover latent features in users' behaviors and explain user ratings in conjunction with explicit evidence, such as [8][9][10][11][12]; it combines a baseline estimation model and a latent factor model, as in Equation (1):

$$\hat{r}_{ui} = \mu + b_u + b_i + f(p_u, q_i) \qquad (1)$$

The formalized function f(p_u, q_i) decomposes the user-item rating matrix into a users' latent factor matrix and an items' latent factor matrix, where p_u reflects the weight of user u's preference on each latent factor, and q_i reflects the weight of item i's characteristics. Parameters b_u and b_i are the related deviation factors of user u and item i. Most researchers have focused on algorithms for f(p_u, q_i) to obtain better predictive accuracy through personalization methods, such as [5,8] and [9][10][11][12][13][14].
In this paper, we focus on the first part of Equation (1), called baseline estimation [5,8,14], where b_u and b_i denote the observed global deviations of user u and item i, respectively. The related equation is:

$$\hat{b}_{ui} = \mu + b_u + b_i \qquad (2)$$

where b̂_ui denotes the baseline prediction of user u for an unknown item i, and μ denotes the global average rating of the system. Baseline estimation evaluates the global deviations of users' personalized ratings on items against global average ratings and the deviations of items' received ratings against the overall average [5]. Usually, b_u and b_i can be observed from the user-item rating matrix by calculating the related row average rating and column average rating, denoted by r̄_u and r̄_i, respectively, with b_u = (r̄_u − μ) and b_i = (r̄_i − μ).
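As a concrete sketch, the classical baseline estimate of Equation (2) can be computed directly from a user-item matrix; the toy matrix below is illustrative, not taken from the paper's datasets:

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); the values are illustrative.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
mask = R > 0

mu = R[mask].mean()  # global average rating of the system

def baseline(u, i):
    """Classical baseline estimate of Equation (2): mu + b_u + b_i."""
    r_bar_u = R[u][mask[u]].mean()        # row (user) average rating
    r_bar_i = R[:, i][mask[:, i]].mean()  # column (item) average rating
    b_u = r_bar_u - mu                    # user's global deviation
    b_i = r_bar_i - mu                    # item's global deviation
    return mu + b_u + b_i

print(round(baseline(0, 1), 3))  # 1.848
```

Note that the estimate depends only on row and column averages, so it can be computed in one pass over the observed ratings.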
Our main motivation comes from a limitation of baseline estimation, which was first mentioned by Koren in [5,8]. The main problem of the baseline estimation in (2) is that the system cannot differentiate users' personalized rating ranges from the system's rating range. In fact, different users usually have different rating criteria for the same items: some give relatively higher ratings than others, while some apply very strict criteria to the same item. As a result, users' actual rating ranges may differ from the system's rating range, and the baseline estimation may yield an irrational predictive value. We believe that the main limitation of classical baseline estimation is the lack of local personalization. From observations on real recommendation datasets, we found that users in all datasets can be classified into four kinds according to their real rating ranges: normal, strict, lenient, and middle. Normal users gave ratings across the system's whole range; strict users' rating ranges had a supremum less than that of the system's range; lenient users' rating ranges had an infimum larger than that of the system's; and middle users' rating scales covered neither the supremum nor the infimum of the system's rating range. The observation shows that users' real rating ranges are influenced by their personalized rating criteria, and baseline estimation should consider not only the global deviation from the system's rating range but also the local deviation from users' personalized rating ranges. We discuss this problem in detail in Section 3.
Our main purpose was to propose an improved and unified baseline estimation model combining local deviations and global deviations. We investigated local deviations from different kinds of users' rating distributions and found a standard deviation proportion (SDP) pattern for baseline estimation. To our knowledge, this study is the first to improve baseline estimation from the perspective of the normal distribution. This work makes the following main contributions: (1) we observed that four kinds of personalized users exist in recommender systems with personalized rating criteria; (2) we propose an improved and unified baseline estimation model based on users' personalized rating distributions and the system ratings' global distribution; and (3) we propose application instances of SDP for existing latent factor-based CF recommendations with markedly improved predictive accuracy.
Since the standard deviations can be calculated from historical data, the proposed SDP has no additional cost compared with classical baseline estimation in real systems. Experiments on real datasets with full ratings and cross validation prove that the proposed SDP is more efficient than classical baseline estimation and can effectively improve the predictive accuracies of existing latent factor-based CF recommendations. The rest of this article is organized as follows. We introduce the related work in Section 2 and the problem statement in Section 3. We describe SDP in Sections 4 and 5, and the experiments and evaluation in Section 6, followed by conclusions in Section 7.

Related Work
Recommendation techniques mainly include content-based recommendations, collaborative filtering (CF), and hybrid recommendations. Content-based recommendation focuses on user profiles and preferences [15][16][17], while CF recommendation focuses on user ratings related to the user-item matrix to find a set of like-minded users or similar items [18,19], and hybrid recommendation combines two or more recommendation techniques [20].
Collaborative filtering (CF) techniques play a significant role in recommender systems and can be classified into neighborhood-based CF and latent factor model-based CF. Neighborhood-based CF focuses on user ratings related to the user-item matrix to find a set of like-minded users or similar items and is mainly classified into user-based CF and item-based CF [21]. User-based CF discovers users with similar interests or preferences on items to a given user based on users' similarities, while item-based CF usually recommends similar items to a user based on items' correlations [22][23][24]. Components of reliability [6,7], users' relationships and interests [22], personalized behaviors [23], and similarities [2,24] are often used to improve the predictive accuracy of neighborhood-based CF recommendations.
Personalization is the core topic of recommendation [25]. Latent factor-based CF uses historical rating data to discover latent features in users' behaviors. Because latent factor-based CF can train and learn users' personalization offline, it achieves higher predictive accuracy and has become one of the most popular topics in recommender systems since the Netflix Prize competition. The latent factor model is derived from singular value decomposition (SVD) [26]. Koren investigated baseline estimates, neighborhood models, and latent factor models of CF recommendations and proposed SVD++ by extending SVD to make full use of implicit feedback information [5]. Kumar et al. [9] incorporated social popularity factors into the SVD++ matrix factorization model to improve the practical accuracy of recommendations. Bao et al. [10] adopted topic modeling to model latent topics in review content and used biased matrix factorization (MF) for prediction in recommender systems, simultaneously correlating with user and item factors. Guo et al. [11,12] investigated the effectiveness of fusing trust networks into rating prediction, where not only the explicit but also the implicit influence of trust is integrated into the SVD model. Pan et al. [13] proposed a self-transfer learning algorithm to iteratively identify and integrate likely positive unlabeled feedback into the learning task of labeled feedback. In the past five years, deep learning has also been imported into recommendation to discover more useful latent factors based on users' preferences through big data analysis. Wu et al. [27] proposed a deep latent factor model based on a high-dimensional and sparse matrix, constructing a deep-structured model by sequentially connecting multiple latent factor models. Li et al. [28] proposed a combined deep CF framework based on probabilistic matrix factorization with marginalized denoising stacked autoencoders, deep-learning effective latent representations and achieving good performance. Zhang et al. [29] summarized related research efforts on deep learning-based recommender systems with expanded trend analysis and noted that recommendations need better, more unified, and harder evaluation. As argued in [1], the assessment criterion should be considered when choosing the appropriate recommendation. The baseline estimation model is a critical component of latent factor model-based CF recommendations, such as in [3,5,8,9] and [11][12][13][14]. Baseline estimation evaluates the global deviations of users' personalized ratings on items against global average ratings and the deviations of items' received ratings against the overall average [5]. The important elements in baseline estimation are μ, b_u, and b_i. In contrast to most researchers, who focused on modeling the latent factors, fewer researchers found that the baseline estimation could also consider more factors to enhance the global deviation predictions. In [8], Koren observed that users' preferences for items drift over time, and items' popularity also constantly changes. He modeled these dynamics over time and proposed a time factor-based CF recommendation (timeSVD++), where the baseline estimation was modeled as b_ui(t) = μ + b_u(t) + b_i(t). Tan et al. [14] proposed a proportion-based baseline estimation model for CF recommendation by considering personalized rating segments, where the deviation b is weighted with the factor (μ + b − r_min)/(μ − r_min) when b is negative and with (r_max − μ − b)/(r_max − μ) when b is positive, where [r_min, r_max] is the rating segment.
The Gaussian process is often applied to improve the accuracy of recommendations. Zhang et al. [16] proposed a modified EM model using Bayesian hierarchical linear models, assuming user models are sampled randomly from a Gaussian distribution. Sofiane and Bermak [30] proposed a Bayesian procedure based on the Gaussian process using a nonstationary covariance function for time series prediction. Ko et al. [31] integrated Gaussian processes into different forms of Bayes filters; both works considered the global mean of previous states and the related uncertainty.
Inspired by the above, our motivation was to find an improved and unified baseline estimation model for latent factor-based CF recommendation, with consideration of both global deviations and local deviations from users' personalized normal distributions. A preliminary version of this report appeared in the Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME) [32].

Problem Statement
In the baseline estimation model in (2), the related deviations b_u and b_i are both based on the system's global rating range, and the model has inherent limitations in keeping the predictive value from overflowing the system's rating range. This global deviation may produce irrational predictive results, and the prediction may even overflow the rating range. For example, assume that the entire recommender system's average rating is μ = 4.0 and the system's rating range is from 0 to 5; a lenient user with a large positive b_u rating a well-liked item with a large positive b_i can then receive a baseline prediction above 5. More examples were demonstrated in [14] and [32].
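The overflow can be checked with a line of arithmetic; the deviation values below are hypothetical, chosen only to illustrate the effect:

```python
mu = 4.0      # system average rating, as in the example above
b_u = 0.7     # hypothetical deviation of a lenient user
b_i = 0.5     # hypothetical deviation of a well-liked item
r_max = 5.0   # assumed upper bound of the system's rating scale

pred = mu + b_u + b_i   # classical baseline estimate, Equation (2)
print(pred)             # 5.2, which overflows the rating range
```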
In real systems, users always give ratings according to their personalized rating criterion; some are strict, and some are lenient. Ideally, the baseline estimation will be accurate if we fully consider each user's personalization. However, it is time costly and unrealistic in real systems. Therefore, we focused on how to discover a unified personalized factor with commonality to improve the baseline estimation predictive accuracy and attempted to apply it to improve the performance of related CF recommendation algorithms, considering not only global deviation from system ratings but also local deviation from users' personalized ratings.
We observed that users in many recommender systems can be classified into four kinds by their personalized rating ranges.
(1) Normal user. The actual rating range of this kind of user covers the recommender system's rating range. For example, if the system's rating range is [1,5], a normal user's rating range may also be [1,5].
(2) Strict user. These users have relatively strict criteria and give relatively lower ratings than others. They rate items strictly, and their ratings rarely reach the highest part of the rating range.
(3) Lenient user. These users habitually give relatively higher ratings than others, and their actual rating range usually does not cover the lowest range.
(4) Middle user. Users of this kind neither give higher ratings nor give lower ratings, and their actual rating range is the middle, e.g., [2,4] while the system's range is [1,5].
We performed rating statistics over five real datasets, including Flixster, MiniFilm, FilmTrust, Movielens (10 M), and Movielens (Latest Small), to verify the existence of these four kinds of personalized rating behaviors. Figure 1 shows the rating statistics. During the observation, we defined that normal users' ratings cover the system's whole rating scale, strict users' ratings do not reach the highest boundary, lenient users' ratings do not reach the lowest boundary, and middle users' ratings reach neither the highest nor the lowest boundary. The statistics show that all four situations exist in every dataset: most users were lenient or normal, some were middle, and a few were strict. This proves that the real rating range of users may be smaller than the system's rating range; therefore, it is necessary to consider local deviation during baseline estimation.
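The four categories above can be sketched as a simple classifier over a user's observed ratings; the scale bounds [1, 5] are assumed, as in the examples:

```python
SYS_MIN, SYS_MAX = 1.0, 5.0   # assumed system rating scale

def classify_user(ratings):
    """Classify a user by whether their observed ratings reach the scale's bounds."""
    lo, hi = min(ratings), max(ratings)
    covers_min = lo <= SYS_MIN
    covers_max = hi >= SYS_MAX
    if covers_min and covers_max:
        return "normal"
    if covers_min:                # never reaches the top of the scale
        return "strict"
    if covers_max:                # never reaches the bottom of the scale
        return "lenient"
    return "middle"

print(classify_user([1, 3, 5]))   # normal
print(classify_user([1, 2, 4]))   # strict
print(classify_user([2, 4, 5]))   # lenient
print(classify_user([2, 3, 4]))   # middle
```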

Proposed Model
The main purpose of this paper was to discover a more rational baseline estimation model from both local deviations and global deviations to improve predictive accuracy. This section will introduce the processes of the proposed improved baseline estimation model.

Symbols Definition
Definition 1: Related symbols refer to Table 1.

Symbol: Description
i: item ID
μ: overall average rating
r̄_u: average rating given by user u
r̄_i: average rating of item i
r_ui: observed rating of (u, i)
r̂_ui: predicted rating of (u, i)
b_u: bias of u
b_i: bias of i
σ_u: local standard deviation of u
σ: global standard deviation
n: the number of all ratings in the system
n_u: the number of ratings given by user u
Q: factor-item matrix
p_u: latent feature vector of user u
q_i: latent feature vector of item i

Observation from Rating Distribution
(1) Global observation

As argued in many related works [16,31,32], the overall ratings in a recommender system follow a normal distribution when there are enough rating behaviors. Theoretically, the rating densities follow f(r) = N(μ, σ²), with the probability density:

$$f(r) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(r-\mu)^2}{2\sigma^2}\right)$$

where r ∈ [μ − L/2, μ + L/2], μ is the mean value of all ratings in the system, σ² is the global variance of all ratings, and L is the length of the system's rating scale. Additionally,

$$\sigma = \sqrt{\frac{1}{n}\sum_{(u,i)}\left(r_{ui}-\mu\right)^2}$$

represents the global standard deviation of ratings, where n is the number of all observed ratings. We selected five real datasets for the global observation, including the large-scale datasets Flixster and Movielens-10 M, the middle-scale dataset FilmTrust, the small-scale dataset ml-latest-small, and the mini-scale dataset MiniFilm. The related properties of these datasets are described in Section 6. Figure 2 shows the distribution of rating densities over different rating segments. We observed that these ratings follow the general normal distribution trend, especially when there are more ratings. This observation shows that the normal distribution can model the global distribution of overall ratings in recommender systems.

(2) Local observation

To investigate the personalized rating features, we further observed the local distribution of ratings given by different kinds of personalized users. We selected Flixster as the demonstration example for convenience; other datasets with enough ratings follow a similar regularity. As described in Section 3, four types of personalized users exist in recommender systems: normal, strict, middle, and lenient users. We calculated statistics on all 8,196,077 ratings of Flixster according to different users' rating segments. Table 2 shows the statistical results.
The rating range of normal users is from 0 to 5, that of strict users is from 0 to 4, that of middle users is from 1 to 4, and that of lenient users is from 1 to 5.
We also investigated the rating densities of the related segments and calculated their mean values μ_u and standard deviations σ_u. Meanwhile, we calculated the fitted mean value μ̂_u and fitted standard deviation σ̂_u for each kind of user's ratings according to a normal distribution. We demonstrate the related rating densities in stem style in Figure 3 to observe the local distribution of ratings given by different kinds of users. We observed that the rating densities of all kinds of users generally follow normal distributions with different deviations, as shown by the stems in Figure 3. Intuitively, we fitted these densities into normal distributions N(μ̂_u, σ̂_u), shown as the dashed curves in Figure 3. Additionally, for each kind of user, we fitted the complete rating densities into a normal distribution N(μ_u, σ_u) based on the related mean, shown as the blue curves in Figure 3. We found that the statistical distribution N(μ_u, σ_u) is close to the fitted N(μ̂_u, σ̂_u); the main differences between them on the same ratings lie in the related standard deviations and mean values. From this observation, we found that the rating distribution of each kind of user also follows a normal distribution. We call this the local distribution in this paper and define that the ratings given by a personalized user u follow the normal distribution f_u(r) = N(μ_u, σ_u²), with the probability density:

$$f_u(r) = \frac{1}{\sigma_u\sqrt{2\pi}} \exp\left(-\frac{(r-\mu_u)^2}{2\sigma_u^2}\right)$$

where r ∈ [μ_u − L_u/2, μ_u + L_u/2], μ_u is the local mean value of user u's ratings, σ_u is the related local standard deviation, and L_u is the length of user u's actual rating range.
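The two quantities involved, the local standard deviation σ_u and the global standard deviation σ, can be estimated directly from rating data. The sketch below uses synthetic ratings drawn from two normal distributions; the means, scales, and sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: system-wide ratings and one lenient user's ratings (illustrative).
system_ratings = rng.normal(loc=3.0, scale=1.0, size=100_000).clip(1, 5)
user_ratings = rng.normal(loc=3.8, scale=0.6, size=500).clip(2, 5)

sigma = system_ratings.std()   # global standard deviation over all n ratings
sigma_u = user_ratings.std()   # local standard deviation over the user's n_u ratings

# The proportion sigma_u / sigma that the SDP model builds on:
print(round(sigma_u / sigma, 2))
```

Here the lenient user's narrower actual rating range yields σ_u < σ, matching the pattern observed in Table 2.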

(3) Guess and Proposition
The main purpose of this paper was to find a more rational estimation model based on the global distribution f(r) = N(μ, σ²) and the local distribution f_u(r) = N(μ_u, σ_u²). The observations show that different personalized rating ranges result in different standard deviations but still follow local normal distributions. In our opinion, as ratings accumulate, the cumulative distributions of different kinds of users' ratings become closer and eventually coincide when there are enough rating behaviors. To obtain a relatively generic result, we assumed that the cumulative distribution of a specific user's normal distribution is equal to that of the system's normal distribution, and we made the following hypothesis according to the different rating ranges and distributions.

Hypothesis 1 (H1).
In a recommender system, the cumulative distributions of f(r) and f_u(r), constrained to their related rating ranges, remain the same, meaning that the coverage area of the local normal distribution f_u(r) over [μ_u − L_u/2, μ_u + L_u/2] equals the coverage area of the global normal distribution f(r) over [μ − L/2, μ + L/2]; that is:

$$\int_{\mu_u - L_u/2}^{\mu_u + L_u/2} f_u(r)\,dr \;=\; \int_{\mu - L/2}^{\mu + L/2} f(r)\,dr \;=\; K$$

where K is a constant less than 1.0. Based on this hypothesis, we focused on the standard deviations and personalized rating ranges and propose the following.

Proposition 1. The proportion of L_u to L is equal to that of σ_u to σ; that is, L_u/L = σ_u/σ.

Proof. The cumulative distribution of f(r) over [μ − L/2, μ + L/2] is:

$$\int_{\mu - L/2}^{\mu + L/2} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(r-\mu)^2}{2\sigma^2}\right) dr$$

Let t = (r − μ)/σ; then:

$$\int_{-L/(2\sigma)}^{L/(2\sigma)} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = 2\Phi\!\left(\frac{L}{2\sigma}\right) - 1$$

The cumulative distribution of f_u(r) over [μ_u − L_u/2, μ_u + L_u/2] is:

$$\int_{\mu_u - L_u/2}^{\mu_u + L_u/2} \frac{1}{\sigma_u\sqrt{2\pi}} \exp\left(-\frac{(r-\mu_u)^2}{2\sigma_u^2}\right) dr$$

Let t = (r − μ_u)/σ_u; then it equals 2Φ(L_u/(2σ_u)) − 1. According to Hypothesis 1, both cumulative distributions equal K, so Φ(L/(2σ)) = Φ(L_u/(2σ_u)). Since Φ is strictly increasing, L/(2σ) = L_u/(2σ_u). Thus, the proportion of L_u to L is equal to the proportion of σ_u to σ; that is, L_u/L = σ_u/σ. End proof.
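Proposition 1 can be checked numerically with the standard normal CDF Φ: by symmetry, the coverage of N(μ, σ²) over [μ − L/2, μ + L/2] is 2Φ(L/(2σ)) − 1, so equal coverage K forces equal ratios L/σ. The figures below (L = 4, σ = 1 versus L_u = 2, σ_u = 0.5) are illustrative:

```python
from math import erf, sqrt

def coverage(L, sigma):
    """Area under N(mu, sigma^2) over [mu - L/2, mu + L/2]; by symmetry this
    equals 2 * Phi(L / (2 * sigma)) - 1 and is independent of the mean mu."""
    phi = 0.5 * (1 + erf(L / (2 * sigma) / sqrt(2)))  # standard normal CDF
    return 2 * phi - 1

K_global = coverage(4.0, 1.0)   # system: scale length L = 4, sigma = 1
K_local = coverage(2.0, 0.5)    # user: L_u = 2, sigma_u = 0.5, so L_u/L = sigma_u/sigma
print(abs(K_global - K_local))  # 0.0 -- the two coverage areas coincide
```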

Proposed Improved Baseline Estimation Model
Based on the above, we conclude the improved baseline estimation model from the proportion of the local standard deviation to the global standard deviation. According to L_u/L = σ_u/σ, an item deviation b_i observed on the system's full rating scale should be rescaled into user u's actual rating range by the factor L_u/L = σ_u/σ, and we can obtain:

$$\hat{b}_{ui} = \mu + b_u + \frac{\sigma_u}{\sigma}\, b_i$$

Since b_u and b_i are the observed deviations of user u and item i, respectively, with b_u = r̄_u − μ and b_i = r̄_i − μ, we have:

$$\hat{b}_{ui} = \mu + (\bar{r}_u - \mu) + \frac{\sigma_u}{\sigma}\,(\bar{r}_i - \mu)$$

This is the improved baseline estimation with the proportion of the local standard deviation to the global standard deviation, and we call this novel model SDP (standard deviation proportion baseline estimate). For the parameters b_u and b_i, we can use the following loss function to obtain the optimized b_u* and b_i*:

$$\min_{b_*} \sum_{(u,i)} \left(r_{ui} - \mu - b_u - \frac{\sigma_u}{\sigma}\, b_i\right)^2 + \lambda\left(\sum_u b_u^2 + \sum_i b_i^2\right) \qquad (10)$$

Since the local standard deviation σ_u is usually less than the global standard deviation σ, the above SDP can satisfy most situations. However, a sparsity problem appears when the number of a user's ratings is small (fewer than 20), where the observed standard deviation does not reliably follow a normal distribution. Thus, we adjusted the SDP into the following:

$$\hat{b}_{ui} = \begin{cases} \mu + b_u + \dfrac{\sigma_u}{\sigma}\, b_i, & \sigma_u \le \sigma \\[4pt] \mu + b_u + b_i, & \sigma_u > \sigma \end{cases} \qquad (11)$$

This means that the SDP falls back to the original baseline estimation, with no normal-distribution features, to limit the influence of the sparsity problem when σ_u > σ. In the next section, we introduce how to apply SDP to existing latent factor-based recommendation algorithms.
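A minimal sketch of the adjusted SDP baseline, including the fallback to the classical estimate; the sparsity threshold of 20 ratings follows the text, while the helper name and arguments are ours:

```python
SPARSITY_THRESHOLD = 20   # below this many ratings, sigma_u is unreliable

def sdp_baseline(mu, b_u, b_i, sigma_u, sigma, n_u):
    """Adjusted SDP baseline estimate: scale the item deviation b_i by
    sigma_u / sigma, falling back to the classical baseline for sparse
    users or when sigma_u exceeds sigma."""
    if n_u < SPARSITY_THRESHOLD or sigma_u > sigma:
        return mu + b_u + b_i                   # classical baseline, Equation (2)
    return mu + b_u + (sigma_u / sigma) * b_i   # SDP baseline

# A strict user (sigma_u < sigma) gets the item deviation damped:
print(sdp_baseline(mu=3.0, b_u=-0.5, b_i=1.0, sigma_u=0.5, sigma=1.0, n_u=100))  # 3.0
```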

Application Instances of Proposed SDP
In this section, we provide two instances to illustrate how to apply the proposed SDP baseline estimation model to existing latent factor-based recommendation algorithms that use classical baseline estimation. The main method adds the factor σ_u/σ to b_i and makes corresponding adjustments during model learning. We selected two well-known and efficient latent factor-based CF recommendations as improved instances. The first instance improves SVD++ [5] by SDP and is named SDPSVD++; the second improves TrustSVD [11,12] by SDP and is named SDPTrustSVD.

Application Instance-1: SDPSVD++
(1) Improved by SDP

SVD++ extends SVD using implicit feedback from user ratings. The detailed improvement is that a free factor-user vector y_j is complemented by |I_u|^{−1/2} ∑_{j∈I_u} y_j, and a user is modeled as p_u + |I_u|^{−1/2} ∑_{j∈I_u} y_j. The equation of SVD++ is:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top}\Big(p_u + |I_u|^{-1/2}\sum_{j\in I_u} y_j\Big) \qquad (12)$$

where I_u represents the set of items rated by user u, and y_j represents the implicit influence of items rated by user u in the past on the ratings of unknown items in the future.
Based on the SDP model, we improved the baseline estimation of SVD++ using the proportion σ_u/σ, leading to the following model:

$$\hat{r}_{ui} = \mu + b_u + \frac{\sigma_u}{\sigma}\, b_i + q_i^{\top}\Big(p_u + |I_u|^{-1/2}\sum_{j\in I_u} y_j\Big) \qquad (13)$$

All of these parameters have the same meanings as in SVD++. We named the new model in (13) SDPSVD++.
(2) Model learning

The involved parameters are learned by minimizing the regularized squared error function:

$$\mathcal{L} = \frac{1}{2}\sum_{(u,i)} \left(\hat{r}_{ui} - r_{ui}\right)^2 + \frac{\lambda}{2}\sum_{u} |I_u|^{-1/2}\, b_u^2 + \frac{\lambda}{2}\sum_{i} |U_i|^{-1/2}\, b_i^2 + \frac{\lambda}{2}\sum_{u} |I_u|^{-1/2}\, \|p_u\|^2 + \frac{\lambda}{2}\sum_{i} |U_i|^{-1/2}\, \|q_i\|^2 + \frac{\lambda}{2}\sum_{j} |U_j|^{-1/2}\, \|y_j\|^2$$

where U_i and U_j are the sets of users who rate items i and j, respectively; ‖·‖ denotes the Frobenius norm; and λ alleviates operational complexity and avoids overfitting. To obtain a local minimization of the objective function ℒ, we performed gradient descents on b_u, b_i, p_u, q_i, and y_j for all the users and items in each given training dataset. During the gradient descent, e_ui = r̂_ui − r_ui. The updates for b_u, p_u, q_i, and y_j are the same as those in SVD++, while the update for b_i gains the factor σ_u/σ, which enters the gradient according to the loss in (10). Finally, the latent factor matrices for users and items are output when the loss function ℒ reaches a local minimization.
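One stochastic-gradient step of this learning loop might look as follows; this is a sketch with names of our choosing and plain (unweighted) λ-regularization for brevity. The only change relative to plain SVD++ is that σ_u/σ multiplies b_i in the prediction and, by the chain rule, in its gradient:

```python
import numpy as np

def sgd_step(u, i, r_ui, mu, b_u, b_i, p, q, y, I_u, sigma_u, sigma,
             gamma=0.01, lam=0.02):
    """One SGD update for SDPSVD++ on a single observed rating r_ui."""
    s = sigma_u / sigma
    fb = p[u] + y[I_u].sum(axis=0) / np.sqrt(len(I_u))  # implicit-feedback term
    e = (mu + b_u[u] + s * b_i[i] + q[i] @ fb) - r_ui   # prediction error e_ui

    b_u[u] -= gamma * (e + lam * b_u[u])
    b_i[i] -= gamma * (e * s + lam * b_i[i])            # gradient carries the factor s
    q_old = q[i].copy()
    q[i] -= gamma * (e * fb + lam * q[i])
    p[u] -= gamma * (e * q_old + lam * p[u])
    y[I_u] -= gamma * (e * q_old / np.sqrt(len(I_u)) + lam * y[I_u])
    return e
```

Iterating this step over all observed ratings until the error stabilizes yields the local minimum described in the text.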
Algorithm 1 shows how to obtain the prediction r̂_ui by learning ℒ. Given a dataset, we input the user-item matrix. The initial learning rate for the bias updates is set separately from the rate for the latent factor updates (p_u, q_i, and y_j). A very small positive value ε (line 1, line 6) is used to control the learning convergence of ℒ.
According to the SDP model in Equations (11) and (13), we calculated the prediction of SDPSVD++ while σ_u ≤ σ, as described in lines 4 to 14. If σ_u > σ, the prediction follows the original learning algorithm of SVD++ according to Equations (11) and (12). The time complexity of learning ℒ is O(d · |ℝ|), where d represents the latent factor dimension and |ℝ| represents the number of observed ratings.

Application Instance-2: SDPTrustSVD
(1) Improved by SDP

The second instance improves TrustSVD [11,12] based on SDP. TrustSVD incorporates users' explicit trust information into SVD++ [5] and merges the trust factor to obtain better accuracy. The equation is as follows:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top}\Big(p_u + |I_u|^{-1/2}\sum_{j\in I_u} y_j + |T_u|^{-1/2}\sum_{v\in T_u} w_v\Big)$$

where w_v denotes the latent factor vector of a user (trustee) v trusted by user u, and T_u denotes the set of users trusted by u. To apply the proposed SDP, we add the proportion σ_u/σ to b_i:

$$\hat{r}_{ui} = \mu + b_u + \frac{\sigma_u}{\sigma}\, b_i + q_i^{\top}\Big(p_u + |I_u|^{-1/2}\sum_{j\in I_u} y_j + |T_u|^{-1/2}\sum_{v\in T_u} w_v\Big)$$

All other parameters in the improved equation have the same meanings as in the original TrustSVD, and we named this improved TrustSVD SDPTrustSVD.
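The SDPTrustSVD prediction rule can be sketched as follows; the function and argument names are illustrative, with T_u the set of users trusted by u and w holding the trustee vectors:

```python
import numpy as np

def predict_sdptrustsvd(mu, b_u, b_i, q_i, p_u, y, w, I_u, T_u, sigma_u, sigma):
    """SDPTrustSVD rating prediction: TrustSVD's rule with the item
    deviation b_i scaled by the SDP proportion sigma_u / sigma."""
    feedback = y[I_u].sum(axis=0) / np.sqrt(len(I_u))  # implicit item feedback
    trust = w[T_u].sum(axis=0) / np.sqrt(len(T_u))     # explicit trust influence
    return mu + b_u + (sigma_u / sigma) * b_i + q_i @ (p_u + feedback + trust)
```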
(2) Model learning

The involved parameters in SDPTrustSVD are learned by minimizing the associated regularized squared error function ℒ, which augments the SDPSVD++ loss with a trust-prediction error term λ_t/2 ∑_{(u,v)} (t̂_uv − t_uv)² and its related regularization, where t̂_uv denotes the predictive trust value of user v from user u, t_uv is the observed trust value, w_v denotes a user-specific latent feature vector of users (trustees) trusted by user u, T_u denotes the set of users trusted by user u, T_u⁺ denotes the set of users who trust user u, and λ_t is a parameter to control the degree of trust regularization. To obtain a local minimization of the objective function ℒ, we performed gradient descents on b_u, b_i, p_u, q_i, y_j, and w_v for all the users and items in a given training dataset. During this process, e_ui is again equal to r̂_ui − r_ui; the updates for b_u, p_u, q_i, y_j, and w_v are the same as the related processes in TrustSVD, while the update for b_i is renewed by the factor σ_u/σ according to the loss in (10). As a result, with the minimization convergence of the error function ℒ, the latent factor matrices for users and items are output. Algorithm 2 shows the learning processes for predicting r̂_ui in pseudocode.

Experiments and Analysis
In this section, we performed a series of experiments to evaluate the efficiencies of the proposed SDP and its application instances (SDPSVD++ and SDPTrustSVD) on five real datasets, including Flixster, FilmTrust, MiniFilm, Movielens-10 M, and Movielens-latest-small (100K).
The evaluation performance of predictive accuracies was measured by the metrics mean absolute error (MAE), root mean square error (RMSE), and relative error (RE), where a lower value indicates better predictive accuracy. Let T = {(u, i) | ∃ r_ui ((u, i, r_ui) ∈ ℝ)} denote the set of observed test ratings and N = |T| their number. The first two evaluation metrics are:

$$\mathrm{MAE} = \frac{1}{N}\sum_{(u,i)\in T} \left|r_{ui} - \hat{r}_{ui}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{(u,i)\in T} \left(r_{ui} - \hat{r}_{ui}\right)^2}$$

and RE is the corresponding relative error. We designed three experiments to evaluate the performance.
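For reference, the MAE and RMSE metrics can be sketched as follows; the rating values are illustrative:

```python
import numpy as np

def mae(r, r_hat):
    """Mean absolute error over the N observed ratings (lower is better)."""
    return np.abs(np.asarray(r) - np.asarray(r_hat)).mean()

def rmse(r, r_hat):
    """Root mean square error over the N observed ratings (lower is better)."""
    return np.sqrt(((np.asarray(r) - np.asarray(r_hat)) ** 2).mean())

observed = [4.0, 3.0, 5.0]
predicted = [3.5, 3.0, 4.0]
print(mae(observed, predicted))             # 0.5
print(round(rmse(observed, predicted), 4))  # 0.6455
```

Note that RMSE penalizes large individual errors more heavily than MAE, which is why both are reported in the tables below.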
• Experiment (1): The first experiment evaluates the baseline estimation accuracy of the proposed SDP on the five datasets, compared with classical baseline estimation in (2) and the PBEModel [14]. Each dataset in the experiments was separated into five parts. To perform a full evaluation, we used a cross-validation method during experiments in which four parts of a given dataset were used for training while the remaining part was used for prediction. As a result, each part was used for both training and prediction by cross validation. The final performance on a given dataset was expressed as the average of the cross-prediction results.
As mentioned above, our experiments were performed on five real datasets, and Table 3 shows their statistics. These datasets all concern movies. Flixster, Movielens (10 M), and Movielens (Latest Small) are relatively large datasets, while FilmTrust and MiniFilm are relatively small. All of them can be found on the related websites or at www.librec.net. Flixster has 147 K users and 48 K movies with 8196 K ratings and more than 11 M trust ratings with friendship links. Movielens (10 M) has more than 70 K users and 10 K movies with 10 M ratings. Movielens (Latest Small) is the latest small dataset of Movielens and has 700 users and 10 K movies with 100 K ratings. FilmTrust is relatively small and has 1508 users and 2071 items with 35,497 ratings and 1853 trust ratings. The smallest dataset is MiniFilm, which has only 55 users and 334 items with 1 K ratings. Flixster and FilmTrust can be used to evaluate the performance of TrustSVD and SDPTrustSVD, which require latent factors from trust ratings.

Performance of Proposed SDP
The purpose of this experiment was to evaluate the baseline estimation accuracy of SDP on all users with full ratings using cross validation. We selected the classical baseline estimation (BE) described in (2) and the PBEModel [14] for comparison. We performed the experiment on the full ratings of each dataset. The evaluation results for MAE, RMSE, and RE are shown in Table 4, where PBE is short for PBEModel, and the visualized performance comparisons are shown in Figure 4. The proposed SDP exhibits the best performance on Flixster, MiniFilm, Movielens (10 M), and Movielens (Latest Small), and the second best on FilmTrust. Compared with classical BE, the baseline predictive accuracies of the proposed SDP are superior: the MAE improvements of SDP over classical BE are 5.26%, 1.35%, 2.11%, 0.81%, and 2.15% on the datasets Flixster, FilmTrust, MiniFilm, Movielens (10 M), and Movielens (Latest Small), respectively. On the large-scale dataset Flixster, the RMSE and RE are also improved by 2.96% and 2.95%, respectively, by SDP compared with BE. On another large-scale dataset, MovieLens (10 M), the improvements in RMSE, RE, and MAE are 0.84%, 1.19%, and 0.81%, respectively.
Compared with PBE, the predictive accuracies of SDP are also superior on Flixster, MiniFilm, Movielens (10 M), and Movielens (Latest Small), except on the small dataset FilmTrust. The results prove that the proposed SDP can effectively improve baseline estimation accuracy, with all MAE, RMSE, and RE values superior to those of BE.

Performance of Proposed SDPSVD++
The purpose of this experiment was to evaluate the predictive accuracy of the proposed SDPSVD++ on all users with full ratings using cross validation. SDPSVD++ is an improved recommendation algorithm based on SVD++ [5] with the proposed baseline estimation model SDP. PBESVD++ [14] is likewise an improved SVD++ based on the baseline estimation model PBE. Therefore, we selected SVD++ and PBESVD++ for comparison. We performed the experiment on the full ratings of four datasets by cross validation, including Flixster, FilmTrust, MiniFilm, and Movielens (Latest Small), with 5 and 10 dimensions, respectively. The evaluation results for MAE, RMSE, and RE are shown in Table 5, and the visualized performance comparisons for each dataset are shown in Figure 5. The results show that the predictive accuracies (RMSE, RE, and MAE) of the proposed SDPSVD++ are superior to those of SVD++ and PBESVD++.
Compared with the original SVD++, using MAE as an example, the MAE improvements of SDPSVD++ over SVD++ on each dataset with d = 5 are 7.69%, 2.84%, 1.26%, and 0.67%. With d = 10, the improvements of SDPSVD++ are also distinct, with MAE improvements of 3.47%, 2.81%, 1.76%, and 0.65% over the original SVD++. The greatest improvements appear on the large-scale dataset Flixster with d = 5, where the RMSE, RE, and MAE of SDPSVD++ improve on SVD++ by more than 6.8%.
Compared with PBESVD++, the proposed SDPSVD++ also obtains higher predictive accuracy. Continuing with MAE as described in Table 5, on each dataset with d = 5, the improvements of SDPSVD++ over PBESVD++ are 1.73%, 0.94%, 1.49%, and 0.65%. This proves that the proposed SDP improves the recommendation accuracy of the existing SVD++ more effectively than PBE. Additionally, we observed that the predictive accuracies on the full ratings of the small-scale datasets FilmTrust and MiniFilm are also improved considerably by SDPSVD++. On FilmTrust, with either d = 5 or d = 10, the RMSE, RE, and MAE are always improved by more than 2.1% over SVD++. On MiniFilm with d = 10, the RMSE, RE, and MAE are improved by 1.66%, 1.67%, and 1.76%, respectively, over the original SVD++, proving that SDP can be effectively combined with existing CF-based recommendation algorithms in different situations to obtain higher predictive accuracy.
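The way SDP plugs into SVD++ can be sketched as follows, assuming the standard SVD++ prediction rule (global mean plus baseline deviations plus a latent interaction term with implicit feedback) and assuming SDP simply replaces the classical baseline term; all parameter values below are illustrative, as in practice they are learned by stochastic gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                       # latent dimensionality, as in the d = 5 runs
n_users, n_items = 3, 4

# Learned parameters would come from training; random toy values here.
p = rng.normal(0, 0.1, (n_users, d))   # user latent factors
q = rng.normal(0, 0.1, (n_items, d))   # item latent factors
y = rng.normal(0, 0.1, (n_items, d))   # implicit-feedback item factors
b_u = np.array([0.8, -1.2, 0.1])
b_i = np.array([0.3, 0.0, -0.4, 0.2])
mu, sigma_global = 3.2, 1.1
sigma_u = np.array([0.5, 0.6, 1.0])    # users' local standard deviations
N = [[0, 1, 3], [0, 2, 3], [0, 1, 2]]  # items rated by each user

def sdp_svdpp_predict(u, i):
    # SDP baseline replaces the classical mu + b_u + b_i term.
    baseline = mu + (sigma_u[u] / sigma_global) * b_u[u] + b_i[i]
    # Standard SVD++ interaction term with normalized implicit feedback.
    implicit = y[N[u]].sum(axis=0) / np.sqrt(len(N[u]))
    return baseline + q[i] @ (p[u] + implicit)

print(sdp_svdpp_predict(0, 0))
```

During training, the gradient updates for b_u would correspondingly carry the sigma_u / sigma_global factor, while the latent-factor updates stay as in ordinary SVD++.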

Performance of Proposed SDPTrustSVD
The purpose of this experiment was to evaluate the predictive accuracy of the proposed SDPTrustSVD on all users with full ratings using cross validation. We selected TrustSVD [11] for comparison. Because both algorithms require trust information, we performed the experiment by cross validation on the full ratings of Flixster and FilmTrust, with d = 5 and d = 10. The evaluation results for MAE, RMSE, and RE are shown in Table 6, and the visualized performance comparisons on each dataset are shown in Figure 6.
The results show that the predictive accuracy of the proposed SDPTrustSVD is superior to that of the original TrustSVD. On the full ratings of Flixster with d = 5, the improvements in RMSE, RE, and MAE by SDPTrustSVD over the original TrustSVD are 1.91%, 1.79%, and 3.20%, respectively. On FilmTrust with d = 5, the proposed SDP also improves TrustSVD effectively, with RMSE, RE, and MAE improved by 2.39%, 2.38%, and 3.85% over TrustSVD. On both datasets with d = 10, the improvements are also distinct. These results prove that the proposed SDP can efficiently improve the existing TrustSVD.
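The improvement percentages reported throughout these experiments can be read as relative reductions of each error metric. A small sketch, assuming MAE and RMSE in their standard forms (the paper's RE metric is not reproduced here) and hypothetical prediction vectors:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error over the test ratings.
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    # Root mean squared error over the test ratings.
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def improvement_pct(baseline_err, proposed_err):
    # An "X% higher" accuracy corresponds to an X% relative
    # reduction of the error metric versus the baseline algorithm.
    return 100.0 * (baseline_err - proposed_err) / baseline_err

# Hypothetical true ratings and predictions from two algorithms.
y_true   = [4.0, 3.0, 5.0, 2.0]
pred_be  = [3.5, 3.4, 4.2, 2.6]
pred_sdp = [3.8, 3.2, 4.6, 2.3]
print(improvement_pct(mae(y_true, pred_be), mae(y_true, pred_sdp)))
```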

Conclusions
In this paper, we investigated how to improve baseline estimation accuracy. From observations on real datasets, we found that different users usually have different rating criteria and that a user's actual rating range is usually a subset of the system's rating range. Further observations of the ratings' global and local distributions showed that ratings generally follow a normal distribution and that different personalized rating ranges result in different standard deviations. Moreover, by analyzing the ratings' distributions, we found that the proportion of a user's personalized local standard deviation to the system's global standard deviation can improve baseline estimation accuracy, and we proposed an improved novel model named SDP, which applies this proportion to the global deviations and can be conveniently applied to all existing latent factor-based CF recommendations that utilize classical baseline estimation. Two application instances of SDP, SDPSVD++ and SDPTrustSVD, were proposed in this paper to illustrate how to apply SDP and how to improve the learning algorithms. The experiments showed that the proposed SDP is superior to classical baseline estimation on all datasets under full-rating evaluation by cross validation. The experiments also showed that the proposed instances SDPSVD++ and SDPTrustSVD achieve substantial improvements in predictive accuracy compared to the original algorithms. The results prove that SDP not only improves baseline estimation accuracy effectively but also efficiently improves existing latent factor-based CF recommendations.