A Biased Proportional-Integral-Derivative-Incorporated Latent Factor Analysis Model

Abstract: Nowadays, as the number of items keeps increasing while the number of items each user has access to is limited, user-item preference matrices in recommendation systems are always sparse, which leads to the data sparsity problem. The latent factor analysis (LFA) model has been proposed as a solution to this problem. As the basis of the LFA model, the singular value decomposition (SVD) model, especially the biased SVD model, achieves good recommendation results on high-dimensional sparse (HiDS) matrices. However, it has the disadvantage of requiring many iterations before convergence. The PID-incorporated SGD-based LFA (PSL) model introduces the principle of the discrete PID controller into stochastic gradient descent (SGD), the learning algorithm of the SVD model. It solves the problem of slow convergence, but its recommendation accuracy needs to be improved. To obtain a better solution, this paper fuses the PSL model with the biased SVD model, aiming to combine their advantages and reconcile their disadvantages. Experiments show that this biased PSL model performs better than traditional matrix factorization algorithms on datasets of different sizes.


Introduction
With the development of the Internet and the arrival of the information age, people are exposed to more and more information. However, the number of items each user is actually interested in has not grown accordingly; it accounts for only a small proportion of the total. It has become difficult to satisfy most users by letting them filter items by themselves or by filtering only with item labels. In this case, it becomes essential to design a recommendation system that helps users find the things they might be enthusiastic about. A recommendation system extracts a user's preference information and recommends items that fit each user's interests. Introducing a recommendation system can significantly enhance the user's experience with the software, as users can effortlessly meet their own needs. Moreover, it has great commercial value in the fields of advertising promotion and commodity sales.
The main function of the recommendation system [1,2] is to predict users' ratings through a series of calculations based on the existing scores, and then fill in the user-item matrix with the predicted scores. Most entries in the matrix are vacant because the amount of rating data is far smaller than the number of users multiplied by the number of items. The recommendation system needs to fill the matrix to acquire a complete user-item matrix, i.e., to obtain all users' scores on all items.
Because the amount of information keeps increasing while the number of items each user has access to is limited, the user-item preference matrix is typically a high-dimensional sparse (HiDS) matrix. For example, the Movielens-10M dataset contains a total of 72,000 users and 10,000 movies, but each user watched and rated only about 140 movies on average, giving a density of only 1.31%. Compared with the total number of movies, the proportion of movies with user ratings is very low. In this situation, cold start [3], reduced coverage, neighbor transitivity and a series of other problems need to be solved. Data sparsity [4] brings many difficulties to recommendation: the user provides little information for the recommendation system to reference, yet the number of items whose scores need to be predicted is quite large. In previous studies, scholars have proposed many solutions to these problems, such as singular value decomposition (SVD) [5], principal component analysis (PCA) [6], the content-boosted CF algorithm (CBCF) [7], and tree-augmented naïve Bayes optimized by extended logistic regression (TAN-ELR) [8]. All of them have their pros and cons.
Aiming to devise a better solution to the data sparsity problem, this paper describes a recommendation model based on the latent factor analysis (LFA) model, a model-based recommendation technique of the collaborative filtering (CF) recommendation system [9]. It uses the idea of the biased SVD model to improve the PID-incorporated SGD-based LFA (PSL) model, combining the advantages of both. The model fuses the prediction method of the biased SVD model with the instantaneous error correction method of the PSL model, which addresses the slow convergence of the biased SVD model and the low recommendation accuracy of the PSL model. Experiments show that, compared with other recommendation models, it has high computational efficiency as well as high prediction accuracy on HiDS matrices. The contributions of this paper are as follows:
1. It introduces a biased PSL model combining the biased SVD model and the PSL model;
2. Experiments on three large datasets demonstrate that the biased PSL model can achieve highly competitive prediction accuracy for the missing data of an HiDS matrix compared to the other models.
The rest of this paper is organized as follows: Section 2 reviews related work; Section 3 gives the preliminary knowledge needed for a detailed introduction to the basis of the algorithm; Section 4 describes the specific implementation of the algorithm; Section 5 reports the recommendation results of the algorithm and compares it with previous recommendation methods; Section 6 evaluates the performance of each method; finally, Section 7 summarizes the paper and puts forward directions for future research.

Related Work
Recommendation systems are generally divided into three main categories: (1) content-based recommendation systems [10,11], which recommend items similar to a user's previous favorites based on item content and item labels; (2) CF recommendation systems [12][13][14], which give priority to items favored by people with similar preferences, basing recommendations on the relevance of users and items; (3) hybrid recommendation systems [15,16], which combine multiple techniques to make the final decision.
CF algorithms can in turn be divided into three techniques: (1) memory-based CF techniques [17], in which every user is part of a group of people with similar interests, and a new user's preference can be obtained by identifying his so-called neighbors; (2) model-based CF techniques [18], which apply models such as machine learning and data mining algorithms to resolve the limitations of memory-based CF algorithms; (3) hybrid CF techniques [19,20], which combine multiple recommendation techniques to make the final recommendation, since each individual technique has its own limitations.
As one of the model-based CF recommendation techniques and the basis of the model introduced in this paper, the LFA model [21][22][23][24][25] decomposes the user-item matrix via SVD into two matrices P (of size u × k) and Q (of size i × k), where u and i are the numbers of users and items, respectively, and k is the self-set number of latent factors. It then obtains the final prediction scoring matrix by multiplying the two matrices. The item matrix can be understood as the degree to which each item possesses a series of attributes, while the user matrix can be understood as the degree to which each user likes those attributes. However, saying that "each latent factor represents an attribute" is only for convenience of understanding. In fact, each latent factor has no definite meaning, which makes the latent factor model more flexible and not limited to the label attributes of items.
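To make the decomposition concrete, the following toy sketch (sizes and values are made up for illustration; this is not the paper's code) shows how the completed rating matrix arises as the product of the two latent factor matrices:

```python
import numpy as np

# Toy illustration of the LFA idea: a sparse user-item matrix is
# approximated by two low-rank latent factor matrices. All sizes and
# values here are made up for demonstration.
rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2   # k = number of latent factors

P = rng.random((n_users, k))    # user latent factor matrix (u x k)
Q = rng.random((n_items, k))    # item latent factor matrix (i x k)

# The completed rating matrix is the product P Q^T; the predicted score
# of user u for item i is the dot product of their factor vectors.
R_hat = P @ Q.T

def predict(u, i):
    return float(P[u] @ Q[i])
```

In a real system, P and Q would be learned from the observed ratings rather than drawn at random; the point here is only the low-rank product structure.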
According to previous studies [26,27], the SVD model, especially the biased SVD model, was proposed to address data sparsity because it performs well on HiDS matrices. The biased SVD model takes users' rating habits and the quality of items into account to give a more targeted prediction score in accordance with the characteristics of users and items. In the Netflix Prize competition, Yehuda Koren reduced the rating error by 32% by adding the bias portion alone, and by 42% when the personalization portion was added on top; in other words, personalization contributed only a further 10%. This illustrates the importance of the biases, which have the greater effect on accuracy. However, this model has a serious drawback: it requires many iterations to converge on large datasets.
In addition to the biased SVD model, according to [28,29], an LFA model incorporating the principle of the discrete PID controller, named PSL, was also proposed to address data sparsity because of its rapid convergence. The traditional SVD model can be regarded as a discrete PID controller with K_I = K_D = 0; the PSL method therefore adds these two terms to improve the performance of the recommendation model. However, since it assumes that everyone has the same rating standard and all items have the same quality, its recommendation accuracy leaves room for improvement.
Since both models have their advantages and disadvantages, this paper introduces the principle of the biased SVD model into the PSL model to improve the recommendation effect.

Preliminaries
The biased PSL model proposed in this paper incorporates the benefits of the biased SVD and PSL models while also addressing their drawbacks. The following sections introduce the SVD model and the PSL model in detail.

Conventional Matrix Decomposition SVD Model
The SVD used in recommendation systems differs slightly from the mathematical singular value decomposition [30]. In linear algebra, SVD decomposes a matrix into three matrices: as shown in (1), W is the original matrix, U is the left singular matrix, V is the right singular matrix, and Σ is the diagonal matrix of singular values.
In the recommendation algorithm, however, SVD decomposes a matrix into two matrices P (of size u × k) and Q (of size i × k), the user latent factor matrix and the item latent factor matrix, respectively. It then obtains a fully filled matrix of predicted user ratings for items through the product of the two matrices, as in (2) and (3). This SVD method is the basis of the LFA model.
The recommendation system aims to minimize the error between the real score and the predicted score, so the objective function of the traditional SVD model is (4). To avoid overfitting of P and Q during gradient descent, a regularization term is added, where λ is the regularization parameter.
The iterative formulas (5) for P and Q are obtained by taking the derivative of the objective function.
Then the stochastic gradient descent (SGD) method (6) is used to acquire the final solution where α represents the learning rate [31].
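Since the equations themselves are not reproduced here, the objective function and SGD updates referenced as (4)-(6) presumably take the standard regularized matrix factorization form, reconstructed below in the paper's notation:

```latex
% Regularized objective over the observed ratings R (cf. (4))
\min_{P,Q} \sum_{(u,i)\in R} \left( r_{u,i} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right)^{2}
  + \lambda \left( \lVert p_u \rVert^{2} + \lVert q_i \rVert^{2} \right)

% SGD updates with learning rate \alpha (cf. (5)-(6)),
% where e_{u,i} = r_{u,i} - \hat{r}_{u,i} is the instantaneous error
p_{u,k} \leftarrow p_{u,k} + \alpha \left( e_{u,i}\, q_{i,k} - \lambda\, p_{u,k} \right)
q_{i,k} \leftarrow q_{i,k} + \alpha \left( e_{u,i}\, p_{u,k} - \lambda\, q_{i,k} \right)
```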

Biased SVD Model
Obviously, there are many subjective factors in the scoring process. Different people have different rating habits: some tend to give high scores, while others always give low scores. In addition, item quality varies, and high-quality items naturally receive higher scores than low-quality ones. Under these circumstances, it is necessary to introduce biases into the SVD model. The biased SVD model accounts for the characteristics of users and items by adding an item bias and a user bias. Compared with the traditional SVD algorithm, the biased SVD algorithm performs better because it takes individuation into account. The prediction formula of the biased SVD model is shown in (7), where b_u and b_i indicate the user bias and item bias, representing the characteristic values of users and items, respectively, and µ is the average score over all users.
After obtaining the predicted score, the objective function can be calculated as in (8). Since b_i and b_u are also variables and need to be updated during iteration, their regularization terms must be added as well.
In the biased SVD model, since b_i and b_u have no product relationship with P and Q, the iterative formulas for P and Q are the same as in the traditional SVD method. Taking the derivative of the objective function yields the iterative formulas for b_i and b_u in (9), where α indicates the learning rate.
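The prediction formula and bias updates referenced as (7) and (9) presumably take the standard biased SVD form, sketched here in the paper's notation:

```latex
% Prediction of the biased SVD model (cf. (7))
\hat{r}_{u,i} = \mu + b_u + b_i + \sum_{k=1}^{K} p_{u,k}\, q_{i,k}

% Bias updates with learning rate \alpha (cf. (9)),
% where e_{u,i} = r_{u,i} - \hat{r}_{u,i}
b_u \leftarrow b_u + \alpha \left( e_{u,i} - \lambda\, b_u \right)
b_i \leftarrow b_i + \alpha \left( e_{u,i} - \lambda\, b_i \right)
```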

PSL Model
The PSL model implements the principle of the PID controller in the LFA model, which improves the convergence speed. Before describing the PSL model, the principle of the PID controller is therefore introduced first.

PID Controller
The principle of PID control [32,33] is to compute the instantaneous error, i.e., the difference between the true value and the predicted value, and then correct this error according to its proportional (P), integral (I), and derivative (D) terms. Since the update points in the HiDS matrix are discrete, it is compatible with the discrete PID controller. The schematic diagram of the discrete PID controller is shown in Figure 1, where TV and PV represent the true and predicted values, respectively. As shown in Figure 1, the controller first calculates the error between TV and PV, i.e., the instantaneous error, and then inputs it into the three modules P, I, and D to carry out the calculation in (10). This formula yields the adjusted error, which is then used to update the predicted value instead of the instantaneous error.
The control coefficients of the proportional, integral, and derivative terms are K_P, K_I, and K_D, respectively, and E_t is the instantaneous error at the t-th update point. The PV is reconstructed and returned to the controller based on the adjusted error. This mechanism continues until the termination condition, i.e., convergence of the LFA model, is met.
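The adjusted-error formula referenced as (10) is presumably the standard discrete PID law, written here in the notation just defined:

```latex
% Discrete PID error reconstruction (cf. (10)): the adjusted error
% \tilde{E}_t combines the proportional, integral, and derivative terms
\tilde{E}_t = K_P\, E_t + K_I \sum_{n=0}^{t} E_n + K_D \left( E_t - E_{t-1} \right)
```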

PSL Model
To converge, the SGD-based LFA model requires many iterations, and the overall time cost can be considerable; the convergence process therefore needs to be accelerated. At each update point of the SGD-based LFA model, the instantaneous error between the actual value r_{u,i} and the predicted value r̂_{u,i} is measured and fed back to the algorithm. This procedure can thus be regarded as a generalized discrete PID controller with K_I = K_D = 0, i.e., one that omits the integral and derivative terms. From this standpoint, restoring these two terms should help the model perform better and accelerate convergence. Based on these observations, the PSL model was introduced. Its main idea is to reconstruct the instantaneous error according to the PID principle, and then feed the reconstructed error into the SGD algorithm to accelerate the convergence of the LFA model. In the PSL model, following the concept of the PID controller, the integral and derivative terms extend the calculation of the error, as shown in (11) and (12).
where:
• ∑_{n=0}^{t} τ^n_{u,i} is the sum of the historical errors τ_{u,i};
• τ^t_{u,i} − τ^{t−1}_{u,i} is the discrepancy between the current error and the previous error.
The objective function (13) of the PSL model is deduced from the above formulas.
Taking the derivative of the objective function yields the iterative formulas (14) for P and Q.
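The error reconstruction of (11)-(12) can be sketched as a small helper (a hedged illustration, not the paper's code; `adjust_error` and its argument names are hypothetical, and the coefficient values follow the experimental settings reported in Section 5):

```python
# Sketch of the PID error reconstruction used by the PSL model
# (cf. Formulas (11)-(12)). KP, KI, KD follow the paper's experiments.
KP, KI, KD = 1.44, 0.002, 0.001

def adjust_error(err, acc, prev):
    """Return (adjusted error, updated integral term).

    err  -- instantaneous error at this update point
    acc  -- running sum of past errors (integral term)
    prev -- instantaneous error at the previous update point
    """
    acc += err
    adjusted = KP * err + KI * acc + KD * (err - prev)
    return adjusted, acc

adj, acc = adjust_error(0.5, 0.0, 0.0)
# adj = 1.44*0.5 + 0.002*0.5 + 0.001*(0.5 - 0.0) = 0.7215
```

The adjusted error, rather than the raw instantaneous error, is what drives the SGD updates in the PSL model.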

Materials and Methods
This paper proposes a biased PSL model that combines the biased SVD model with the PSL model. It addresses the problems that the PSL model does not consider individuation and that the biased SVD model requires many iterations to converge. The inputs of the model are the user-item matrix, the coefficients K_P, K_I, and K_D of the discrete PID controller, the average score µ of all users, and the numbers of users U and items I. The outputs are the user latent factor matrix P and the item latent factor matrix Q; the final prediction scoring matrix is obtained by multiplying these two matrices.

Algorithm Description
According to [26][27][28][29], both the PSL model and the biased SVD model help improve the effectiveness of the recommendation system. Therefore, this paper presents a new model combining the two to test whether it can provide better recommendation results. In this method, the instantaneous error, i.e., the difference between the true value and the predicted value, is derived from the biased SVD formula, and the error is then fed into the PSL model for an iterative solution.
The calculation formula of the predicted value is the same as that of the biased SVD model (7). The instantaneous error is then fed into the PID controller to obtain the adjusted error, which drives the iteration toward the final predicted value. In this way, the rapid convergence and fewer iterations of the PSL model are combined with the high accuracy of the biased SVD model. Substituting the above prediction formula into the PSL model yields Formula (15).
Then τ̃_{u,i} is calculated by Formula (12) and substituted into the objective function (16). The regularization term also changes because of the addition of b_u and b_i.
The formulas in (17) are the iterative formulas for b_u and b_i obtained from the objective function; the iterative formulas for P and Q are the same as those of the PSL model.
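Putting the pieces together, the biased PSL error and bias updates referenced as (15)-(17) presumably read as follows (reconstructed in the paper's notation):

```latex
% Instantaneous error from the biased SVD prediction (cf. (15))
\tau^{t}_{u,i} = r_{u,i} - \left( \mu + b_u + b_i + \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right)

% PID-adjusted error fed into the updates (cf. (12))
\tilde{\tau}_{u,i} = K_P\, \tau^{t}_{u,i} + K_I \sum_{n=0}^{t} \tau^{n}_{u,i}
  + K_D \left( \tau^{t}_{u,i} - \tau^{t-1}_{u,i} \right)

% Bias updates with learning rate \alpha (cf. (17))
b_u \leftarrow b_u + \alpha \left( \tilde{\tau}_{u,i} - \lambda\, b_u \right)
b_i \leftarrow b_i + \alpha \left( \tilde{\tau}_{u,i} - \lambda\, b_i \right)
```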
The pseudocode of the algorithm is shown in Algorithm 1.

Algorithm 1 Bias-PSLSVD.
Require: U, I, R, K, λ, α, K_P, K_I, K_D, S, µ
Ensure: P, Q
init P_{|U|×K}, Q_{|I|×K} with random numbers in [−0.01, 0.01]
init b_u, b_i with 0
init Ψ, Υ with size |R|
for each r_{u,i} in R do
    init Ψ_{u,i} = 0, Υ_{u,i} = 0
end for
while not converged and n ≤ S do
    for each r_{u,i} in R do
        fetch Ψ_{u,i} from Ψ and Υ_{u,i} from Υ
        r̂_{u,i} = Σ_{k=1}^{K} p_{u,k} q_{i,k} + µ + b_u + b_i
        τ_{u,i} = r_{u,i} − r̂_{u,i}; Ψ_{u,i} = Ψ_{u,i} + τ_{u,i}
        τ̃_{u,i} = K_P τ_{u,i} + K_I Ψ_{u,i} + K_D (τ_{u,i} − Υ_{u,i}); Υ_{u,i} = τ_{u,i}
        b_u = b_u + α(τ̃_{u,i} − λ b_u); b_i = b_i + α(τ̃_{u,i} − λ b_i)
        for k = 1 to K do
            p_{u,k} = p_{u,k} + α(τ̃_{u,i} q_{i,k} − λ p_{u,k})
            q_{i,k} = q_{i,k} + α(τ̃_{u,i} p_{u,k} − λ q_{i,k})
        end for
    end for
    n = n + 1
end while
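The training loop of Algorithm 1 can be sketched in Python as follows (a minimal, hedged re-implementation for illustration only: `train_bias_psl` and the tiny rating list are hypothetical, not the paper's code; the hyperparameter values follow the settings reported in Section 5):

```python
import numpy as np

def train_bias_psl(ratings, n_users, n_items, k=2, lr=0.01, lam=0.05,
                   kp=1.44, ki=0.002, kd=0.001, epochs=50, seed=0):
    """Train a biased PSL model on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    # Latent factors initialized with random numbers in [-0.01, 0.01]
    P = rng.uniform(-0.01, 0.01, (n_users, k))
    Q = rng.uniform(-0.01, 0.01, (n_items, k))
    bu = np.zeros(n_users)              # user biases
    bi = np.zeros(n_items)              # item biases
    mu = float(np.mean([r for _, _, r in ratings]))  # global average score
    acc = {(u, i): 0.0 for u, i, _ in ratings}   # integral of past errors (Psi)
    prev = {(u, i): 0.0 for u, i, _ in ratings}  # previous error (Upsilon)
    for _ in range(epochs):
        for u, i, r in ratings:
            # Instantaneous error from the biased SVD prediction
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            acc[(u, i)] += err
            # PID-adjusted error: proportional + integral + derivative terms
            adj = kp * err + ki * acc[(u, i)] + kd * (err - prev[(u, i)])
            prev[(u, i)] = err
            # Regularized SGD updates driven by the adjusted error
            bu[u] += lr * (adj - lam * bu[u])
            bi[i] += lr * (adj - lam * bi[i])
            P[u], Q[i] = (P[u] + lr * (adj * Q[i] - lam * P[u]),
                          Q[i] + lr * (adj * P[u] - lam * Q[i]))
    return P, Q, bu, bi, mu

# Made-up toy ratings: (user, item, score)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q, bu, bi, mu = train_bias_psl(ratings, n_users=2, n_items=3)
rmse_after = (sum((r - (mu + bu[u] + bi[i] + P[u] @ Q[i])) ** 2
                  for u, i, r in ratings) / len(ratings)) ** 0.5
```

On this toy data the training error drops well below that of the constant prediction µ, illustrating that the PID-adjusted updates do converge.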

Algorithm Analysis
Since this algorithm is mainly based on the framework of the PSL model, its computational cost is quite similar to that of the PSL algorithm. In terms of storage, it requires two more sequences (b_u and b_i) than the PSL algorithm; apart from these, there is no other change. This storage scheme is suitable for practical development, as it keeps the algorithm concise and comprehensible. The storage complexity is given as S_{Bias-PSL} in (18), where |U| and |I| denote the numbers of users and items, respectively, K is the number of latent factors, and |R| is the amount of rating data.
S_{Bias-PSL} = (|U| + |I|)(K + 1) + 2|R|        (18)

The computational cost corresponding to the storage complexity is given as T_{Bias-PSL} in (19), where step is the number of iterations of the recommendation algorithm.

T_{Bias-PSL} = step × |R| × K        (19)

It can be seen that the time complexity of the algorithm is mainly determined by the number of iterations, the amount of rating data, and the number of latent factors.

General Settings
The main function of the LFA model is to decompose the scoring matrix with missing data into two latent factor matrices P and Q, the user latent factor matrix and the item latent factor matrix, and then compute their product PQ^T to obtain a complete scoring matrix. After obtaining the complete scoring matrix, we remove the items each user has already rated and rank the remaining items by their predicted scores; naturally, items with high scores are recommended first. Accurate rating prediction is therefore the key to the recommendation system. Since the ultimate purpose of the LFA model is to recover the missing data in the user-item matrix, the main way to judge a model's recommendation effect is to evaluate the difference between its predicted values and the real values. The commonly used metrics are the root mean square error (RMSE) [34] and the mean absolute error (MAE) [35]. In Formula (20), r_{u,i} and r̂_{u,i} denote the real score and the predicted score, respectively, and |R| denotes the amount of data.
According to the formulas of the two metrics, the smaller the RMSE and MAE values of a model, the better its recommendation effect.
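The two metrics can be sketched directly from their standard definitions (a hedged illustration; the (real, predicted) score pairs below are made up):

```python
import math

# RMSE and MAE as in Formula (20): averages of squared and absolute
# differences between real scores r and predicted scores p over |R| pairs.
def rmse(pairs):
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def mae(pairs):
    return sum(abs(r - p) for r, p in pairs) / len(pairs)

pairs = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0)]
# errors: 0.5, -0.5, 1.0 -> RMSE = sqrt(0.5), MAE = 2/3
```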

Dataset
The experiments use datasets from the Movielens [36] series, collected from the Movielens system maintained by the GroupLens research team, with scores on a scale of 1 to 5 (1, 2, 3, 4, 5). To explore the recommendation effect of the biased PSL model on datasets of different sizes and densities, the experiments select three subsets of different sizes; their details are provided in Table 1.

Model Comparison Test
In this paper, the biased PSL model is compared with the traditional SVD model, the biased SVD model and the PSL model.
The learning rate of all algorithms is unified as 0.01, the regularization parameter is 0.05, K_P is 1.44, K_I is 0.002, and K_D is 0.001. The emphasis is on comparing the recommendation effects of the different algorithms on datasets of different sizes under different numbers of latent factors (50 and 100).
All models are implemented with Surprise [37], a Python scikit for building and analyzing recommender systems that deal with explicit rating data. The models involved in the experiments are as follows:
• SVD model (the traditional SVD model): it decomposes the user-item matrix directly;
• Biased SVD model: it introduces the user and item biases into the SVD model, taking individuation into account;
• PSL model: it introduces the principle of the discrete PID controller into the SGD learning algorithm of the traditional SVD model;
• Biased PSL model: it integrates the biased SVD model and the PSL model.

Results
The experiments adopt an 80-20% train-test split and apply five-fold cross-validation to obtain objective results. The average training processes of the compared models are shown in Figures 2-7. The performance of the four models on the datasets is shown in Tables 2-4.

Discussion
According to Figures 2-7, the RMSE and MAE initially decrease as the number of iterations increases. However, once over-fitting occurs, they rise again with further iterations. In this experiment, the lowest RMSE and MAE values before over-fitting are chosen for comparison. From these results, we have the following findings:

1. By comparing the four models in Figures 2-7, it can be seen that the biased PSL model requires fewer iterations to achieve its best recommendation result;
2. By comparing the four models in Figures 2-7, it can be seen that the final RMSE and MAE of the biased PSL model are smaller, which means it has a better recommendation effect;
3. According to Tables 2-4, the biased PSL model outperforms the SVD model, the biased SVD model, and the PSL model in each dataset with different numbers of latent factors.
Among the three datasets, the smallest one (100 K) best reflects the advantages of the biased PSL model. As shown in Table 2, the RMSE of the biased PSL model decreased by 0.0227, about 1%, compared with the traditional SVD model, the model with the worst recommendation effect. Even where the difference is not especially large, its result is still superior to those of the biased SVD model and the PSL model. The data in Tables 2-4 demonstrate that combining the biased SVD model and the PSL model results in improved performance.

Conclusions
This paper combines the prediction formula of the biased SVD model with the PSL model, designing a biased PSL model that achieves significantly higher computational efficiency as well as highly competitive prediction accuracy. The method fuses the prediction method of the biased SVD model with the instantaneous error correction method of the PSL model. The biased SVD formula takes into account the rating habits of different people and the differing quality of items, i.e., individuation, to give the most appropriate score for each user-item pair. The PSL model integrates the principle of the discrete PID controller into the SGD-based LFA model to correct the error between the real and predicted scores according to its proportional (P), integral (I), and derivative (D) terms. The biased PSL model blends the advantages of the two models to obtain a better recommendation effect. In the experiments, we use three datasets of different sizes and densities to test the recommendation performance of the biased PSL model against other recommendation models (the SVD model, the biased SVD model, and the PSL model). The prediction quality is evaluated in detail with RMSE and MAE. The experimental results show that the biased PSL model performs better than the SVD, biased SVD, and PSL models on the three HiDS matrices.
Regarding future work, we hope to determine the best parameters for a specific dataset. This model is mainly based on the PSL model, whose principle is the discrete PID controller, so its prediction results are greatly affected by the PID parameters. Owing to the wide ranges of K_P, K_I, and K_D, it is difficult to ensure that the chosen values are the most suitable for a given dataset. If a system could find the best parameters for a specific dataset, the best recommendation effect could be obtained. Furthermore, despite needing fewer iterations, the PSL algorithm takes longer to complete each iteration than the SVD algorithm due to the additional vector computations per iteration. Its capacity to run on huge datasets is therefore restricted, as a single training session takes longer. Since new users and items are introduced into a recommendation system regularly, a lengthy training period is not conducive to deploying the model; it is difficult to call a recommendation system outstanding if it cannot easily retrain and make initial suggestions as new users and items arrive. Besides, this paper only attempted to combine the PSL model with the biased SVD model in the hope of a better recommendation effect; no attempt has yet been made to combine the PSL model with other recommendation models. In the future, we can try integrating the PSL model with other recommendation models and test their recommendation performance.

Acknowledgments: We thank the National Natural Science Foundation of China for funding our work, grant number 61971268.

Conflicts of Interest:
The authors declare no conflicts of interest.
