An Extended-Tag-Induced Matrix Factorization Technique for Recommender Systems

Social tag information has been used by recommender systems to handle the problem of data sparsity. Recently, the relationships between users/items and tags are considered by most tag-induced recommendation methods. However, sparse tag information is challenging to most existing methods. In this paper, we propose an Extended-Tag-Induced Matrix Factorization technique for recommender systems, which exploits correlations among tags derived by co-occurrence of tags to improve the performance of recommender systems, even in the case of sparse tag information. The proposed method integrates coupled similarity between tags, which is calculated by the co-occurrences of tags in the same items, to extend each item’s tags. Finally, item similarity based on extended tags is utilized as an item relationship regularization term to constrain the process of matrix factorization. MovieLens dataset and Book-Crossing dataset are adopted to evaluate the performance of the proposed algorithm. The results of experiments show that the proposed method can alleviate the impact of tag sparsity and improve the performance of recommender systems.


Introduction
Collaborative filtering is one of the most popular recommendation techniques. It is divided into two categories: memory-based collaborative filtering and model-based collaborative filtering. Memory-based collaborative filtering [1][2][3][4], also known as neighbor-based filtering, uses similarity measures to discover the neighbors of active users and of target items. Once the neighbors are found, their ratings are used to predict the rating of the target item. Model-based filtering first uses the user-item rating matrix to learn a predictive model by statistical and machine learning methods; the model then predicts the rating of the target item. Recently, matrix factorization, which discovers latent preferences of users and latent characteristics of items and handles very large user-item rating matrices effectively, has become a very popular recommendation algorithm.
In recent years, recommendation systems have used different sources of information, including social information, social behaviors of users and information about items, to recommend items to users [5]. For example, contextual descriptions were combined with usage patterns to predict the behaviors of users and provide effective recommendation services [6]. The detailed categories of extra information integrated by hybrid recommendation methods are listed in Table 1.
Table 1. The categories of extra information.

Categories | Detailed Description
Social information | The "credibility" of users [7]; social relationships of users discovered by social networks [8]
Social behaviors of users | Users' browsing behaviors [9]; users' points of interest [10]
Opinions of users | Comments given by users [11,12]
Information of items | Items' reputations, semantic contents [6] and items' attributes [5,13]
Tag information | Tags annotated by users and tags provided by systems [14]

Besides the basic descriptions of users and items, tag information is a useful kind of semantic information for recommendation systems. It has been incorporated into hybrid CBF/CF algorithms, where it is used to calculate user-based and item-based similarity measures [14].
Nowadays, collaborative tagging, which improves the interaction between users and systems, has been applied in several recommender systems. In a collaborative tagging system, users are allowed to annotate resources with specific tags according to their own understanding. To some extent, tags reflect the characteristics of resources, because users utilize tags to describe the features and categories of resources. Therefore, more accurate classification of resources can be achieved by analyzing tag information. Meanwhile, the personalized preferences of users can be explored by analyzing tag information, because the diversity among annotated tags reflects the differences among users. Hence, tag information, as a vital kind of data, brings new challenges and opportunities to recommender systems.
At present, many studies [15] focus on utilizing tags to improve recommendation algorithms and provide more personalized recommendations. To relieve the influence of rating data sparsity, tag information has been introduced to reinforce the relationship between users and items. Peng et al. [16] proposed a probabilistic model that considers each tag as a specific topic and measures the probability that a user uses a tag to annotate an item. Blaze et al. [17] proposed a tensor factorization method that exploits the content of items and users' tag assignments through a relevance feedback mechanism for identifying the optimal number of conceptually similar items. The tag-aware method [18] regarded tagging information as a data source for extending user-item rating vectors. TagiCoFi, proposed by Yi Zhen et al. [19], exploits tagging information to regularize the matrix factorization procedure of probabilistic matrix factorization. Huang et al. [20] proposed a content-based collaborative filtering method that uses tagging information to alleviate the cold start and sparsity problems in collaborative filtering recommendation algorithms. To provide enhanced recommendation quality derived from user-created tags, Kim et al. [21] proposed a collaborative filtering method that uses collaborative tags to grasp and filter users' preferences for items. Nguyen et al. [22] studied different content-boosted matrix factorization techniques that integrate content information into matrix factorization collaborative filtering methods. They found that these approaches not only improve recommendation accuracy but also provide useful insights about the contents and make the recommendations more easily interpretable. Huang et al. [23] constructed personalized user interests by incorporating frequency, recency and tag-based information, and performed collaborative recommendation using users' social networks in social resource sharing websites. Rawashdeh et al. [24] presented a novel personalized search algorithm that builds two models: a user-tag relation model, which reflects how a certain user assigns tags that are similar to a given tag, and a tag-item relation model, which captures how a certain tag is annotated to items that are similar to a given item. Kim et al. [25] proposed a graph-based recommender system that provides recommendations to a group of users instead of a single user and considers both positive and negative feedback from users. A three-factor factorization model is used to learn user preference vectors based on the tags annotated by users and item feature vectors based on the keywords corresponding to items [26]. The co-occurrences between tags annotated by users and keywords corresponding to items are utilized to create the relationships between tags and keywords, which requires that all users have tags and all items have keywords.
Most existing methods compute similarities of users and similarities of items based on matching tags, while few methods consider the tag sparsity problem. Tag sparsity is one of the difficult problems in tag-based collaborative recommendation algorithms. There are two causes of tag sparsity: first, users rarely annotate items with tags or annotate them with very few tags; second, some tags differ in form but are semantically similar, because users understand items differently. Because tags are of significant value to recommendation techniques, resolving the tag sparsity problem is crucial to improving them. As shown in Figure 1, User1 gives a rating to Item1 and annotates Item1 with "psychology" and "clever"; User2 gives a rating to Item2 and annotates Item2 with "clever" and "genius"; Item3 is merely annotated with "genius" by User1 and has no rating. In this situation, it is difficult to obtain the similarities of these three items based on ratings and matching tags.
Fang et al. [27] considered correlations among tags to construct a tag co-occurrence matrix transfer model that regularizes the procedure of learning user latent factors and item latent factors. However, the similarity of items based on similar tags is not considered, so the item latent factors are not captured accurately. Therefore, to alleviate the deviation of item similarities based on sparse tags, an Extended-Tag-Induced Matrix Factorization (ETIMF) method, which extends the tags of items and exploits the extended tags to generate similarities of items, is proposed in this paper. Assuming that the latent feature vectors of two items are similar when their corresponding extended tags are similar, the proposed method uses extended tags to constrain the process of matrix factorization via an item relationship regularization term. The recommender system using the proposed method contains two processing modules, namely a tag processing module and a recommendation module. The tag processing module takes the input tags and extends them. Once the recommendation module receives the extended tags, it uses them to control the process of rating matrix factorization. After that, it predicts and ranks the ratings of the target user. Finally, the top N items are recommended to the target user. The overview of the recommender system is shown in Figure 2. The results of experiments show that the proposed method can alleviate the impact of tag sparsity and improve the performance of recommender systems.

Primary Definition
In a typical scenario, a recommender system contains a set of $m$ users and a set of $n$ items. The user preference of items is usually represented by a user-item rating matrix $R \in \mathbb{R}^{m \times n}$. Each entry $R_{u,v}$ denotes the rating given by user $u$ to item $v$. In general, $R_{u,v}$ is an integer in [1, 5], and $R_{u,v} = 0$ indicates that user $u$ has not yet rated item $v$. A higher value of $R_{u,v}$ means that user $u$ is more satisfied with item $v$.
The tags of an item are denoted by a set $T_I = \{t_1, t_2, \ldots, t_k\}$, and the sizes of the tag sets of different items can differ. Table 2 shows the tags of some items; one example tag set is {robots, sci-fi, quirky, genius}.

Matrix Factorization
Matrix factorization is widely used because it handles large-scale user-item rating matrices efficiently. Based on the assumption that latent preferences of users and latent characteristics of items can be represented by a small number of factors, the matrix factorization algorithm decomposes the user-item rating matrix into two low-rank latent feature matrices, $P \in \mathbb{R}^{K \times m}$ and $Q \in \mathbb{R}^{K \times n}$ with $K \ll \min(m, n)$, and then uses $P$ and $Q$ to rebuild a predictive rating matrix $R^{*} \in \mathbb{R}^{m \times n}$. As a result,

$$R \approx R^{*} = P^{T} Q \tag{1}$$

where the column vectors $p_u$ and $q_v$ represent the $K$-dimensional user latent feature vector of user $u$ and the $K$-dimensional item latent feature vector of item $v$, respectively. Once the low-rank vectors are obtained, the inner product of $p_u$ and $q_v$ estimates the rating given by user $u$ to item $v$, that is, $R^{*}_{u,v} = p_u^{T} q_v$. The user latent feature matrix and the item latent feature matrix are learned by minimizing the following loss function:

$$\mathcal{L} = \frac{1}{2} \sum_{(u,v) \in \Omega} \left( R_{u,v} - p_u^{T} q_v \right)^2 + \frac{\lambda_1}{2} \|P\|_F^2 + \frac{\lambda_2}{2} \|Q\|_F^2 \tag{2}$$

where $\lambda_1$ and $\lambda_2$ denote regularization parameters [28], which are used to prevent overfitting, $\|\cdot\|_F^2$ is the squared Frobenius norm [29], and $\Omega$ indicates the set of $(u, v)$ tuples for known ratings.
Generally, the stochastic gradient descent (SGD) algorithm [30] is applied to seek a local minimum of the loss function given by Equation (2). Matrix factorization is a widely applied collaborative filtering method, so it is regarded as the baseline approach. However, data sparsity and the cold start problem are the main challenges for matrix factorization recommendation algorithms.
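The baseline described above can be illustrated with a minimal pure-Python sketch of SGD for matrix factorization. The toy ratings, the function names `train_mf` and `training_mae`, the initialization range and the per-sample application of the regularization terms are all assumptions made for illustration, not the authors' implementation.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_mf(ratings, m, n, k=2, lam1=0.1, lam2=0.1, eta=0.02, epochs=500, seed=0):
    """ratings: list of (u, v, r) triples, i.e. the known entries in Omega."""
    rng = random.Random(seed)
    # p[u] and q[v] play the role of the columns p_u and q_v of P and Q.
    p = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, v, r in order:
            err = r - dot(p[u], q[v])
            for f in range(k):
                pu, qv = p[u][f], q[v][f]
                # Assumption: the regularization is applied once per sampled rating.
                p[u][f] += eta * (err * qv - lam1 * pu)
                q[v][f] += eta * (err * pu - lam2 * qv)
    return p, q

def training_mae(ratings, p, q):
    return sum(abs(r - dot(p[u], q[v])) for u, v, r in ratings) / len(ratings)

# Hypothetical toy data: 3 users, 3 items, 6 known ratings.
toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
p0, q0 = train_mf(toy, m=3, n=3, epochs=0)   # untrained factors
p1, q1 = train_mf(toy, m=3, n=3)             # trained factors
print(training_mae(toy, p0, q0), training_mae(toy, p1, q1))
```

The training error of the learned factors should be well below that of the random initialization; on real data, the error would be measured on a held-out test set instead.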

Extended-Tag-Induced Matrix Factorization
The Extended-Tag-Induced Matrix Factorization (ETIMF) method for recommending items has three stages. First, a tag-tag co-occurrence matrix is created by analyzing the co-occurrences of tags in the same items. Second, the extended tag vectors of each pair of items are rebuilt according to the tag-tag co-occurrence matrix, and item similarity is estimated from the extended tag vectors. Finally, the item similarity obtained in the second stage is used as a regularization term to constrain the process of matrix factorization.

Tag-Tag Co-Occurrence Matrix
Tags assigned to an item by different users might differ because users understand the item differently. Thus, similarities of users and of items measured only by exact matches of tags are inaccurate. The frequency with which two tags co-occur in items can be used to evaluate the similarity of the two tags. Therefore, a tag-tag co-occurrence matrix is created to extend tags. Cosine similarity is used to evaluate the co-occurrence distribution of each pair of tags as follows:

$$p(t, z) = \frac{\sum_{i \in N(t) \cap N(z)} n_{t,i} \, n_{z,i}}{\sqrt{\sum_{i \in N(t)} n_{t,i}^2} \, \sqrt{\sum_{i \in N(z)} n_{z,i}^2}} \tag{3}$$

where $n_{t,i}$ indicates the number of times tag $t$ is annotated to item $i$ and $n_{z,i}$ indicates the number of times tag $z$ is annotated to item $i$. $N(t)$ is the set of items annotated with tag $t$, $N(z)$ is the set of items annotated with tag $z$, and $N(t) \cap N(z)$ is the set of items annotated with both tag $t$ and tag $z$. $p(t, z)$ falls into [0, 1], and a value closer to 1 means that tag $t$ and tag $z$ are more similar. The co-occurrence distribution of the tag "sci-fi" with other tags is evaluated by Equation (3), and the 10 tags with the highest co-occurrence probability with "sci-fi" are listed in Table 3. These 10 tags are semantically relevant to "sci-fi".

Definition 1: Tag-tag co-occurrence matrix $T \in \mathbb{R}^{k \times k}$:

$$T = \begin{bmatrix} p(t_1, t_1) & \cdots & p(t_1, t_k) \\ \vdots & \ddots & \vdots \\ p(t_k, t_1) & \cdots & p(t_k, t_k) \end{bmatrix} \tag{4}$$

where each entry $T_{ij} = p(t_i, t_j)$ represents the similarity of tag $t_i$ and tag $t_j$. From the tag-tag co-occurrence matrix, the relationships among all tags can be obtained.

Before evaluating item similarities based on tags, the tags of each pair of items are mapped to the shared tag space of these two items. We define the vector $\mathbf{n}_{i,j} = [n_{i,j}^{(1)}, n_{i,j}^{(2)}, \ldots]$ as the tag vector of item $i$ corresponding to item $j$ and the vector $\mathbf{n}_{j,i} = [n_{j,i}^{(1)}, n_{j,i}^{(2)}, \ldots]$ as the tag vector of item $j$ corresponding to item $i$, with one entry per tag in the shared tag space. Here, $N(T_i)$ and $N(T_j)$ are the number of tags of item $i$ and the number of tags of item $j$ respectively, and $N(T_{i \cap j})$ is the number of total tags annotated to item $i$ and item $j$.

The entry $n_{i,j}^{(k)}$ is the frequency of tag $k$ annotated to item $i$, and $n_{i,j}^{(k)} = 0$ if tag $k$ has been annotated to item $j$ but not to item $i$. Similarly, the entry $n_{j,i}^{(k)}$ is the frequency of tag $k$ annotated to item $j$, and $n_{j,i}^{(k)} = 0$ if tag $k$ has been annotated to item $i$ but not to item $j$.
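The cosine co-occurrence measure between tags can be sketched in a few lines of pure Python. The function name `tag_cooccurrence` and the input layout (a dictionary from item to tag counts) are assumptions for illustration; the toy data reuses the three items of Figure 1, with every tag count assumed to be 1.

```python
from math import sqrt

def tag_cooccurrence(item_tags):
    """item_tags: dict item -> dict tag -> count (the n_{t,i} values, all > 0).

    Returns a dict mapping (t, z) to the cosine similarity of the two tags'
    count vectors over items.
    """
    # counts[t][i] = n_{t,i}; the keys of counts[t] form the set N(t).
    counts = {}
    for item, tags in item_tags.items():
        for tag, n in tags.items():
            counts.setdefault(tag, {})[item] = n
    norm = {t: sqrt(sum(n * n for n in by_item.values()))
            for t, by_item in counts.items()}
    p = {}
    for t in counts:
        for z in counts:
            shared = counts[t].keys() & counts[z].keys()   # N(t) ∩ N(z)
            num = sum(counts[t][i] * counts[z][i] for i in shared)
            p[(t, z)] = num / (norm[t] * norm[z])
    return p

# Toy data following Figure 1 (counts of 1 are an assumption).
items = {
    "Item1": {"psychology": 1, "clever": 1},
    "Item2": {"clever": 1, "genius": 1},
    "Item3": {"genius": 1},
}
p = tag_cooccurrence(items)
print(p[("clever", "genius")], p[("psychology", "genius")])
```

In this toy example "clever" and "genius" co-occur in Item2 and therefore receive a positive similarity, while "psychology" and "genius" never co-occur and receive 0.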

Extended Tag Vectors of Item to Item
Because of the diversity of tags, item similarities based on the above tag vectors of item to item are inaccurate. The tag-tag co-occurrence matrix is therefore used to extend tags and reconstruct the extended tag vectors of item to item. For a tag $z$ that has been annotated to item $j$ but not to item $i$, we estimate the possible frequency of tag $z$ annotated to item $i$ according to the co-occurrence distributions between tag $z$ and all tags annotated to item $i$. We estimate the possible frequency as follows:

$$n_{i,j}^{(z)} = \frac{\sum_{t \in T_i} n_{t,i} \, p(t, z)}{N_{t \in T_i}} \tag{5}$$

where $T_i$ denotes the set of tags annotated to item $i$ and $n_{t,i}$ denotes the frequency of tag $t$ annotated to item $i$. $p(t, z)$ is the co-occurrence probability between tag $t$ and tag $z$, namely the entry $(t, z)$ of the tag-tag co-occurrence matrix. $N_{t \in T_i}$ indicates the number of total tags annotated to item $i$.
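The extension step can be sketched as follows. This is a minimal illustration under stated assumptions: the estimate is taken to be the frequency-weighted average of $p(t, z)$ over the tags of item $i$, the normalizer is assumed to be the sum of tag frequencies of item $i$, and the names `extended_vector`, `SIM` and `p` as well as the tag similarities are hypothetical.

```python
# Hypothetical tag similarities for the tags of Figure 1 (symmetric lookup).
SIM = {
    ("clever", "genius"): 0.5,
    ("clever", "psychology"): 0.5 ** 0.5,
    ("genius", "psychology"): 0.0,
}

def p(t, z):
    """Co-occurrence similarity of two tags; a tag is fully similar to itself."""
    if t == z:
        return 1.0
    return SIM[tuple(sorted((t, z)))]

def extended_vector(tags_i, tags_j, p):
    """Extended tag vector of item i with respect to item j.

    tags_i, tags_j: dict tag -> frequency. Returns (shared tag space, vector).
    """
    space = sorted(set(tags_i) | set(tags_j))
    # ASSUMPTION: the normalizer is the total number of tag annotations of item i.
    total = sum(tags_i.values())
    vec = []
    for z in space:
        if z in tags_i:
            vec.append(float(tags_i[z]))      # observed frequency is kept as-is
        elif total == 0:
            vec.append(0.0)                   # an untagged item cannot be extended
        else:
            vec.append(sum(n * p(t, z) for t, n in tags_i.items()) / total)
    return space, vec

item1 = {"psychology": 1, "clever": 1}
item2 = {"clever": 1, "genius": 1}
space, ext_1 = extended_vector(item1, item2, p)
_, ext_2 = extended_vector(item2, item1, p)
print(space, ext_1, ext_2)
```

Here Item1 was never annotated with "genius", but it receives a non-zero estimated frequency because "genius" co-occurs with "clever", one of its own tags.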

Similarities of Items Based on Extended Tags
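After recreating the extended tag vectors of item to item, cosine similarity is used to evaluate the similarity of each pair of items as follows:

$$sim(i, j) = \frac{\sum_{t \in T^{*}_{i,j}} n_{i,j}^{(t)} \, n_{j,i}^{(t)}}{\sqrt{\sum_{t \in T^{*}_{i}} \left( n_{i,j}^{(t)} \right)^2} \, \sqrt{\sum_{t \in T^{*}_{j}} \left( n_{j,i}^{(t)} \right)^2}} \tag{6}$$

where $n_{i,j}^{(t)}$ and $n_{j,i}^{(t)}$ are the entries of the extended tag vectors for tag $t$, and $T^{*}_{i,j}$ indicates the set of tags shared by item $i$ and item $j$ after extending tags. $T^{*}_{i}$ and $T^{*}_{j}$ indicate the set of tags annotated to item $i$ and the set of tags annotated to item $j$ after extension, respectively. $sim(i, j)$ belongs to [0, 1], and item $i$ is more similar to item $j$ if the value of $sim(i, j)$ is closer to 1.

Definition 2: Item-item similarity matrix based on tags $S \in \mathbb{R}^{n \times n}$:

$$S = \begin{bmatrix} sim(v_1, v_1) & \cdots & sim(v_1, v_n) \\ \vdots & \ddots & \vdots \\ sim(v_n, v_1) & \cdots & sim(v_n, v_n) \end{bmatrix} \tag{7}$$

where the entry $S_{ij} = sim(v_i, v_j)$ indicates the similarity of item $i$ and item $j$.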
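The effect of tag extension on item similarity can be illustrated with a short sketch. The vectors below are hypothetical: they correspond to Item1 and Item2 of Figure 1 over the shared tag space (clever, genius, psychology), and the extended entries assume a frequency-weighted extension with tag similarities of 0.5 and about 0.707; none of these numbers come from the paper's data.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length tag vectors; 0 if either is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return num / (na * nb)

# Raw tag vectors: only "clever" is shared by the two items.
raw_1, raw_2 = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
# Hypothetical extended vectors: missing tags get estimated frequencies.
ext_1, ext_2 = [1.0, 0.25, 1.0], [1.0, 1.0, 0.5 ** 0.5 / 2]

sim_raw = cosine(raw_1, raw_2)
sim_ext = cosine(ext_1, ext_2)
print(sim_raw, sim_ext)
```

With the raw vectors the similarity only reflects the single matching tag, whereas the extended vectors also credit the semantically related tags, so the similarity increases.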

The Process of Matrix Factorization
The main idea of the proposed algorithm is to utilize similarities of items based on extended tags to regularize the process of matrix factorization, and consequently to deal with the tag sparsity problem and the cold start item problem. The similarities of items based on extended tags are converted to an item relationship term. We suppose that two item latent feature vectors $q_i$ and $q_j$ are similar if the two items have similar tags according to the extended tags.
To make two item latent vectors $q_i$ and $q_j$ as similar as possible when the items are annotated with similar tags according to the extended tags, an item relationship regularization term based on extended tags is added to the baseline matrix factorization (MF).
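Definition 3: The item relationship regularization term:

$$\frac{\beta}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} S_{i,j} \, \|q_i - q_j\|_F^2 \tag{8}$$

where $\beta$ indicates a regularization parameter that controls the influence of the similarities of items based on extended tags, and $S_{i,j}$ indicates the similarity between item $i$ and item $j$ based on extended tags. A higher value of $S_{i,j}$ forces the distance between the two item latent feature vectors to be shorter, and vice versa. Therefore, the item regularization term makes two latent feature vectors closer if the two items have more similar tags. Then, we convert the item relationship regularization term as follows:

$$\frac{\beta}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} S_{i,j} \, \|q_i - q_j\|_F^2 = \frac{\beta}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} S_{i,j} \sum_{k=1}^{K} (q_{ki} - q_{kj})^2 = \beta \, \mathrm{tr}(Q L Q^{T}) \tag{9}$$

where $Q = [q_1, q_2, \ldots, q_n]$ is the item latent feature matrix, which consists of $n$ item latent feature vectors. Each item latent feature vector has $K$ dimensions, so $q_{ki}$ is the value of the latent feature vector of item $i$ in the $k$th dimension. Similarly, $q_{kj}$ is the value of the latent feature vector of item $j$ in the $k$th dimension. $L = D - S$ represents the Laplacian matrix [30], and $D$ is a diagonal matrix with diagonal elements $D_{ii} = \sum_j S_{ij}$. $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. After adding the item relationship regularization term to the loss (Equation (2)), the proposed extended-tag-induced matrix factorization method can be formulated as follows:

$$\mathcal{L} = \frac{1}{2} \sum_{(u,v) \in \Omega} \left( R_{u,v} - p_u^{T} q_v \right)^2 + \frac{\lambda_1}{2} \|P\|_F^2 + \frac{\lambda_2}{2} \|Q\|_F^2 + \frac{\beta}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} S_{i,j} \, \|q_i - q_j\|_F^2 \tag{10}$$

where $\lambda_1$ and $\lambda_2$ are the regularization parameters that control complexity and prevent overfitting [18]. Combining Equations (9) and (10), the loss function is converted to

$$\mathcal{L} = \frac{1}{2} \left\| W \circ (R - P^{T} Q) \right\|_F^2 + \frac{\lambda_1}{2} \|P\|_F^2 + \frac{\lambda_2}{2} \|Q\|_F^2 + \beta \, \mathrm{tr}(Q L Q^{T}) \tag{11}$$

where $\circ$ is the Hadamard product, $W$ is the indicator matrix, and $W_{u,i} = 1$ indicates that user $u$ has given a rating to item $i$. Through the gradient descent algorithm [31], a local minimum of the loss function can be obtained. In each iteration, the gradients with respect to $P$ and $Q$ are given by Equations (12) and (13), respectively:

$$\frac{\partial \mathcal{L}}{\partial P} = -Q \left( W \circ (R - P^{T} Q) \right)^{T} + \lambda_1 P \tag{12}$$

$$\frac{\partial \mathcal{L}}{\partial Q} = -P \left( W \circ (R - P^{T} Q) \right) + \lambda_2 Q + 2 \beta \, Q L \tag{13}$$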
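The conversion of the pairwise penalty into a trace with the Laplacian $L = D - S$ can be checked numerically. The sketch below assumes a symmetric similarity matrix and verifies that $\sum_i \sum_j S_{ij} \|q_i - q_j\|^2 = 2\,\mathrm{tr}(Q L Q^{T})$; how the constant factor is split between $\beta$ and the sum is a convention that should be taken as an assumption here. The function names and toy values are hypothetical.

```python
def pairwise_penalty(q_cols, sim):
    """sum_i sum_j S_ij * ||q_i - q_j||^2, with q_cols[i] the latent vector of item i."""
    n = len(q_cols)
    return sum(
        sim[i][j] * sum((a - b) ** 2 for a, b in zip(q_cols[i], q_cols[j]))
        for i in range(n) for j in range(n)
    )

def laplacian(sim):
    """L = D - S with D_ii = sum_j S_ij."""
    n = len(sim)
    return [[(sum(sim[i]) if i == j else 0.0) - sim[i][j] for j in range(n)]
            for i in range(n)]

def trace_form(q_cols, sim):
    """tr(Q L Q^T) = sum_i sum_j L_ij * <q_i, q_j>."""
    lap = laplacian(sim)
    n = len(q_cols)
    return sum(
        lap[i][j] * sum(a * b for a, b in zip(q_cols[i], q_cols[j]))
        for i in range(n) for j in range(n)
    )

# Hypothetical symmetric item similarities and 2-dimensional item vectors.
S = [[0.0, 0.5, 0.2],
     [0.5, 0.0, 0.8],
     [0.2, 0.8, 0.0]]
Q_cols = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]

lhs = pairwise_penalty(Q_cols, S)
rhs = 2 * trace_form(Q_cols, S)
print(lhs, rhs)
```

The trace form is convenient because its gradient with respect to $Q$ is simply $2 Q L$ for symmetric $L$, which avoids looping over all item pairs when matrix operations are available.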
To summarize, the proposed extended-tag-induced matrix factorization algorithm is described in Algorithm 1.

Algorithm 1. The Framework of Proposed Recommendation Algorithm ETIMF
Input: User-item rating matrix $R \in \mathbb{R}^{m \times n}$, the set of tags $T_I = \{t_1, t_2, \ldots, t_k\}$, the dimension of latent feature vectors $K$, the number of iterations $W$, the step size of gradient descent $\eta$, parameters $\lambda_1$, $\lambda_2$ and $\beta$.
Output: The user latent feature matrix $P$ and the item latent feature matrix $Q$.
1: Compute the tag-tag similarity matrix using Equation (3)
2: Create tag vectors of item to item
3: Recreate extended tag vectors of item to item
4: Compute similarities of items from the extended tag vectors of item to item using Equation (6)
5: Initialize $P \in \mathbb{R}^{K \times m}$ and $Q \in \mathbb{R}^{K \times n}$ randomly
6: for w = 1 to W do
7:   for k = 1 to K do
8:     Update the kth row of P by a gradient step of size η using Equation (12)
9:     Update the kth row of Q by a gradient step of size η using Equation (13)
10:   end for
11: end for
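The training loop of Algorithm 1 can be sketched in pure Python as full-batch gradient descent. This is an illustrative sketch, not the authors' code: it assumes a loss with a factor 1/2 on the squared-error and Frobenius terms and a graph term whose gradient with respect to $q_i$ is $2\beta \sum_j S_{ij}(q_i - q_j)$ for symmetric $S$; the names `train_etimf` and `item_distance` and the toy data are hypothetical.

```python
import random

def train_etimf(ratings, sim, m, n, k=2, lam1=0.1, lam2=0.1, beta=1.0,
                eta=0.02, iters=300, seed=0):
    """ratings: list of (u, v, r); sim: symmetric n x n item similarity matrix."""
    rng = random.Random(seed)
    p = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(m)]
    q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        # Gradients start with the Frobenius regularization terms.
        gp = [[lam1 * x for x in row] for row in p]
        gq = [[lam2 * x for x in row] for row in q]
        # Squared-error term over the known ratings.
        for u, v, r in ratings:
            err = r - sum(a * b for a, b in zip(p[u], q[v]))
            for f in range(k):
                gp[u][f] -= err * q[v][f]
                gq[v][f] -= err * p[u][f]
        # Item relationship term: pulls q_i toward the vectors of similar items.
        for i in range(n):
            for j in range(n):
                if i != j and sim[i][j] > 0:
                    for f in range(k):
                        gq[i][f] += 2 * beta * sim[i][j] * (q[i][f] - q[j][f])
        for u in range(m):
            for f in range(k):
                p[u][f] -= eta * gp[u][f]
        for v in range(n):
            for f in range(k):
                q[v][f] -= eta * gq[v][f]
    return p, q

def item_distance(q, i, j):
    return sum((a - b) ** 2 for a, b in zip(q[i], q[j])) ** 0.5

# Hypothetical toy data: item 2 has no ratings but is similar to item 1 by tags.
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 3), (1, 1, 5)]
S = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.8],
     [0.0, 0.8, 0.0]]
_, q_tag = train_etimf(toy, S, m=2, n=3, beta=1.0)
_, q_mf = train_etimf(toy, S, m=2, n=3, beta=0.0)
print(item_distance(q_tag, 1, 2), item_distance(q_mf, 1, 2))
```

With $\beta = 0$ the unrated item only shrinks toward zero under regularization, while with $\beta > 0$ its latent vector is drawn toward that of the similar rated item, which is the mechanism intended to help cold start items.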

Results
The MovieLens and Book-Crossing datasets were used to perform several groups of experiments. The first group consisted of comparison experiments for evaluating the performance of the proposed method. The second group analyzed the influence of the regularization parameter β and the dimension K of the latent feature vectors. The third group compared the efficiency of the different methods. The fourth group evaluated the proposed method in the case of sparse tags. The last group was designed to verify whether the proposed method can alleviate the cold start item problem.

Dataset
Several datasets have been adopted to evaluate the performance of recommendation algorithms. The MovieLens 20M dataset [32] and the Book-Crossing dataset [33] were used to perform the experiments. The MovieLens 20M dataset, which was provided by GroupLens in 2015, contains 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. The Book-Crossing dataset contains 278,858 users providing 1,149,780 ratings about 271,379 books.
First, we filtered the original datasets to obtain appropriate experimental datasets. For the MovieLens 20M dataset, the movies rated by at least 20 users were extracted as usual movies. Tags annotated by at least five users and assigned to at least five usual movies were kept as distinct tags. For users, we only kept those who had annotated at least three distinct tags in their tagging history as distinct users. For movies, we only kept those that were annotated by distinct users and annotated with distinct tags as final distinct movies. For the Book-Crossing dataset, the books rated by at least 20 users were extracted from the original dataset as usual books. For tag information, authors who had written at least three usual books and presses that had published at least 15 usual books were kept as distinct tags. We only considered items with tags in the experiments, but the ETIMF method can also be used when items have no tags. For items without any tags, the similarities between them and other items are 0, which means the item regularization term (Equation (9)) has no effect on those items. Consequently, for such items the ETIMF method reduces to basic MF. To evaluate the effectiveness of the proposed method, we only kept those items from the original datasets that had both tags and ratings.
After filtering the datasets, for the MovieLens 20M dataset, 2161 distinct movies were kept, of which 999 were selected as the final distinct movies; for the Book-Crossing dataset, 122,433 distinct books were kept, of which 10,000 were selected as the final distinct books. Each movie is rated on a discrete scale from 1 to 5 in the MovieLens dataset, and each book is rated on a discrete scale from 1 to 10 in the Book-Crossing dataset. General statistics about the final datasets are summarized in Table 4.
In the experiments, the above datasets had to be divided into a training set for learning the parameters of the models and a test set for evaluating the models. Thus, the rating records were randomly split into two parts, each of which contained 50% of the known ratings. One part was used as the test set, which was kept the same in all experiments. The other part was used as a data pool to generate different training sets. For example, a training set size of 20% means that 20% of the records were selected from the data pool as a training set. For each training set size, 10 different training sets were selected from the data pool to perform 10 experiments, and the average result was taken as the final result.
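The splitting protocol described above can be sketched as follows. The function name `split_protocol`, the fixed seed and the synthetic records are assumptions for illustration; the sketch only mirrors the stated procedure of a fixed 50% test set and repeated sampling of training sets from the remaining data pool.

```python
import random

def split_protocol(records, train_fraction, n_repeats=10, seed=0):
    """Return a fixed test set and n_repeats training sets sampled from the pool.

    records: list of rating records, e.g. (user, item, rating) tuples.
    train_fraction: fraction of the data pool used for each training set.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    test, pool = shuffled[:half], shuffled[half:]    # 50% test, 50% data pool
    size = int(round(train_fraction * len(pool)))
    # Each training set is drawn without replacement from the same pool.
    train_sets = [rng.sample(pool, size) for _ in range(n_repeats)]
    return test, train_sets

# Synthetic records: 10 users x 10 items, every rating set to 3.
records = [(u, v, 3) for u in range(10) for v in range(10)]
test_set, train_sets = split_protocol(records, train_fraction=0.2)
print(len(test_set), len(train_sets), len(train_sets[0]))
```

A model would then be trained once per training set and evaluated on the same test set, with the reported value being the average over the 10 runs.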

Performance Evaluation
Mean Absolute Error (MAE), which gives the average absolute deviation between the real ratings and the predictive ratings, was used to measure the recommendation quality of the proposed method in comparison with other recommendation algorithms. Formally,

$$MAE = \frac{1}{|\Omega_{test}|} \sum_{(i,j) \in \Omega_{test}} \left| R_{ij} - R^{*}_{ij} \right| \tag{14}$$

where $R_{ij}$ and $R^{*}_{ij}$ represent real ratings and predictive ratings respectively, and $\Omega_{test}$ denotes the set of $(i, j)$ tuples in the test set. In general, a lower MAE means a lower deviation between real values and predictive values, namely a higher quality of the recommendation algorithm.
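As a small worked example of the metric, the sketch below computes MAE over a handful of hypothetical test ratings; the function name and the numbers are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """actual, predicted: dicts keyed by (user, item) over the test ratings."""
    return sum(abs(actual[key] - predicted[key]) for key in actual) / len(actual)

# Hypothetical test ratings and model predictions.
actual = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 1.0}
predicted = {(0, 0): 4.5, (0, 1): 3.5, (1, 0): 2.0}

mae = mean_absolute_error(actual, predicted)
print(mae)
```

The absolute errors are 0.5, 0.5 and 1.0, so the MAE is 2/3.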

Recommendation Quality Comparisons
To evaluate the performance of the proposed method, the following state-of-the-art approaches were chosen for comparison.
1. MF: Proposed by Koren et al. [34], MF learns the user latent feature matrix and the item latent feature matrix by minimizing the sum of squared errors between real ratings and predictive ratings. The dimension K of the latent feature vectors is 10.
2. TagiCoFi: Proposed by Yi Zhen et al. [19], TagiCoFi exploits tagging information to regularize the MF procedure of PMF. More specifically, it seeks to make two user latent feature vectors as similar as possible if the two users have similar tagging histories. Here, we modify this algorithm to constrain item latent feature vectors by the tagging information of items. The dimension K of the latent feature vectors is 10.
3. Tag Matrix Transfer (TMT) model: Proposed by Fang et al. [27], the TMT model uses a tag co-occurrence matrix, which is generated by a Bayesian method, as the third factor in matrix factorization to improve the quality of the recommender system. The dimension K of the latent feature vectors is the number of tags.
To make a fair comparison, we set the parameters of each method according to the respective references or based on the best performance in our experiments. For the MovieLens 20M dataset, we set λ1 = λ2 = 0.1 and the learning rate η to 0.05, and the control parameter β in TagiCoFi and ETIMF was set to 0.1 and 1.3, respectively. For the Book-Crossing dataset, we set λ1 = λ2 = 0.1 and the learning rate η to 0.01, and the control parameter β in TagiCoFi and ETIMF was set to 0.1 and 4.5, respectively. In the comparisons, we used training set sizes of 20%, 50% and 80% and report the average results on the test sets.
The results of the comparisons are shown in Tables 5 and 6. We can observe that, on both datasets, ETIMF achieves the best performance among all compared methods. Compared with MF, the main difference of TagiCoFi, TMT and ETIMF lies in the extra tagging information. The results demonstrate that recommendation quality can be improved by exploiting tagging information. Furthermore, ETIMF extends the tags of each item to supplement the tagging information. The observations on the two datasets demonstrate that ETIMF, which exploits extended tagging information, can further improve recommendation quality. Therefore, extended tagging information helps to generate better recommendations.

The Influence of Tagging Information
The value of the parameter β in ETIMF controls the influence of the extended tags of items in learning each item latent feature vector. A higher value of β means that a larger weight is put on the extended tags of items. The extended tags of items are added to the learning of item latent feature vectors and make two item latent feature vectors as similar as possible if the two items have similar tags and extended tags. When β = 0, ETIMF reduces to basic matrix factorization. Hence, it is necessary to analyze the influence of β. In this section, we only use the training set of size 20% and fix the dimension K of the latent feature vectors to 10. For the MovieLens 20M dataset, we perform a group of experiments by changing β from 0.1 to 2.9. For the Book-Crossing dataset, we perform a group of experiments by changing β from 3 to 6.
The results of changing β are shown in Figure 3. MAE decreases as β increases, reaches its minimum when β is 1.3 (MovieLens 20M) or 4.5 (Book-Crossing), and increases as β grows beyond that value. This illustrates that neither using only the user-item rating matrix while abandoning tagging information nor relying excessively on tagging information achieves reliable recommendation.

The Influence of Dimension of Latent Feature K
The dimension K of the latent feature vectors is another parameter. In this section, based on the training set of size 20%, we perform experiments with TagiCoFi and ETIMF with K from 10 to 50. For the MovieLens 20M dataset, the β of TagiCoFi is set to 0.1 while the β of ETIMF is set to 1.3. For the Book-Crossing dataset, the β of TagiCoFi is set to 0.1 while the β of ETIMF is set to 4.5. The results of TagiCoFi and ETIMF with different K are shown in Figure 4. As K increases, the MAE decreases. As is known, the larger the value of K, the more preferences can be represented by the latent features. However, Figure 4 shows that the improvement gets smaller as K continues to increase. This illustrates that the existing latent features already represent the useful information once K reaches a certain threshold, and increasing K beyond the threshold may introduce noise into the loss function. From the experimental results, we can observe that ETIMF achieves good performance over a large range of values of K.

Efficiency Comparisons
The MAE of ETIMF becomes lower than that of TMT once the dimension K of the latent feature vectors in ETIMF reaches 50. Therefore, based on the training set of size 20%, K is set to 50 in MF, TagiCoFi and ETIMF to compare the efficiency of MF, TagiCoFi, TMT and ETIMF. In this group of experiments, for the MovieLens 20M dataset, the values of β in TagiCoFi and ETIMF are set to 0.1 and 1.3, respectively. For the Book-Crossing dataset, the values of β in TagiCoFi and ETIMF are set to 0.9 and 4.5, respectively.
The running times (in seconds) of the recommendation methods are shown in Table 7. For training sets of the same size, on both the MovieLens 20M dataset and the Book-Crossing dataset, the running time of ETIMF is larger than that of MF and TagiCoFi but much smaller than that of TMT, which shows that ETIMF is more efficient than TMT. Although ETIMF is less efficient than MF and TagiCoFi, its MAE is the lowest among all compared methods.

Performance of Sparse Tagging Information
Tag sparsity usually takes two forms: users annotate only a few items with tags, or only a small number of items are annotated with tags. In the case of tag sparsity, the tags annotated to items cannot represent the characteristics of the items clearly. Hence, item similarities measured only on the existing tags of items are inaccurate. In this section, we run TagiCoFi and ETIMF using a part of the tags (10%, 30%, 50% and 80%) randomly selected from the existing tags. In this group of experiments, the value of K is uniformly set to 50 in TagiCoFi and ETIMF. For the MovieLens 20M dataset, the values of β in TagiCoFi and ETIMF are set to 0.1 and 1.3, respectively. For the Book-Crossing dataset, the values of β in TagiCoFi and ETIMF are set to 0.9 and 4.5, respectively.
Tables 8 and 9 show the impact of the amount of tagging information for the 20% MovieLens 20M training set and the 20% Book-Crossing training set, respectively. From these results, we can see that the performance of ETIMF is clearly better than that of TagiCoFi and TMT. Furthermore, the MAE of ETIMF shows no obvious degradation as the tags become sparser. The experimental results demonstrate that ETIMF performs better in the case of tag sparsity.
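The tag subsampling step of this experiment can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes that tagging information is stored as (user, item, tag) triples and that sampling is done at the level of individual tag assignments, and the function name `subsample_tags` and the `seed` parameter are hypothetical.

```python
import random

def subsample_tags(tag_assignments, fraction, seed=0):
    """Return a random subset containing `fraction` of the tag assignments.

    tag_assignments: list of (user, item, tag) triples (assumed layout)
    fraction:        share of assignments to keep, e.g. 0.1, 0.3, 0.5 or 0.8
    seed:            fixes the random choice so that a run can be repeated
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    keep = round(len(tag_assignments) * fraction)
    # Sampling without replacement, so no assignment is kept twice.
    return rng.sample(tag_assignments, keep)
```

Calling the function once for each of the four fractions gives the four tag sets on which TagiCoFi and ETIMF would then be trained and compared.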

Performance of Item Cold Start Problem
The cold start problem is one of the most difficult challenges for recommendation algorithms. Recommender systems are required to recommend items to users who have not rated any items, and to recommend items that have not been rated by any user. Most collaborative filtering recommendation algorithms, such as matrix factorization algorithms, perform poorly under cold start because of the lack of preference information.
In this section, based on the training set of size 20%, we randomly removed the rating records of 50 and 100 items from the training set. The removed items are regarded as cold start items, namely new items in the recommender system. In this group of experiments, the value of K is uniformly set to 50 in TagiCoFi and ETIMF. For the MovieLens 20M dataset, the values of β in TagiCoFi and ETIMF are set to 0.1 and 1.3, respectively. For the Book-Crossing dataset, the values of β in TagiCoFi and ETIMF are set to 0.9 and 4.5, respectively.
The performance of TagiCoFi, TMT and ETIMF in the cold-start setting on the two datasets is shown in Tables 10 and 11. On both datasets, the performance of ETIMF is distinctly better than that of TagiCoFi and TMT. This demonstrates that ETIMF can integrate extended tags well to recommend new items to users.
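The construction of the cold-start setting can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: ratings are assumed to be stored as (user, item, rating) triples, and the function name `hold_out_cold_items` and the `seed` parameter are hypothetical.

```python
import random

def hold_out_cold_items(train_ratings, n_items, seed=0):
    """Remove every rating of `n_items` randomly chosen items.

    train_ratings: list of (user, item, rating) triples (assumed layout)
    n_items:       number of items to turn into cold start items (e.g. 50 or 100)
    Returns (kept, removed, cold_items): the reduced training set, the removed
    rating records, and the set of items that now have no ratings.
    """
    rng = random.Random(seed)
    # Sort the item ids so the sample depends only on the seed.
    items = sorted({item for _, item, _ in train_ratings})
    cold_items = set(rng.sample(items, n_items))
    kept = [r for r in train_ratings if r[1] not in cold_items]
    removed = [r for r in train_ratings if r[1] in cold_items]
    return kept, removed, cold_items
```

After this split, the cold items have no rating records left in the training set, so a model can only place them in the latent space through their tags.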

Discussion and Conclusions
An extended-tag-induced matrix factorization method for recommender systems is proposed in this paper. The proposed method makes the latent feature vectors of a pair of items as similar as possible if, according to their extended tags, the items are annotated with similar tags. An item relationship regularization term based on extended tags is added to matrix factorization to constrain the baseline model. Experimental results on real datasets demonstrate that the proposed algorithm can outperform state-of-the-art collaborative filtering algorithms, including some recommendation methods that use tagging information. Furthermore, it can both cope with the cold start item problem and alleviate tag sparsity.
Tagging information contains not only the description of items but also the sentiment of users. To advance recommendation techniques, our future work is to integrate tagging information further into recommendation algorithms. One of our future research directions is to fill in the missing ratings using tagging information before learning latent features. Moreover, we plan to analyze the sentiment of tags in order to discard noisy tags and classify the remaining tags precisely. Additional information, such as the social behaviors of users and other social information, will be combined with tag information to provide more precise user and item profiles to the recommender system.

Figure 1. The example of the recommendation with the association of tags.

Figure 2. The overview of the recommender system with the association of tags.

Figure 3. The impact of tagging information: (a) the impact of β in MovieLens 20M; and (b) the impact of β in Book-Crossing.

Figure 4. The impact of latent factor number K: (a) the impact of K in MovieLens 20M; and (b) the impact of K in Book-Crossing.

Table 2. The tags of some items.

Table 5. The comparison of different recommendation methods in MovieLens 20M.

Table 6. The comparison of different recommendation methods in Book-Crossing.

Table 7. The running time on the two datasets.

Table 8. The impact of tag information size in MovieLens 20M.

Table 9. The impact of tag information size in Book-Crossing.

Table 11. MAE comparison in Book-Crossing cold-start setting.