A Social Recommendation Based on Metric Learning and Users’ Co-Occurrence Pattern

For personalized recommender systems, matrix factorization and its variants have become mainstream in collaborative filtering. However, the dot product in matrix factorization does not satisfy the triangle inequality and therefore fails to capture fine-grained information. Metric learning-based models have been shown to be better at capturing fine-grained information than matrix factorization. Nevertheless, most of these models only focus on rating data and social information, which are not sufficient for dealing with the challenges of data sparsity. In this paper, we propose a metric learning-based social recommendation model called SRMC. SRMC exploits users’ co-occurrence patterns to discover their potentially similar or dissimilar users with symmetric relationships and change their relative positions to achieve better recommendations. Experiments on three public datasets show that our model is more effective than the compared models.


Introduction
With the rapid development of the internet, information overload [1] has become a common problem. To help users find truly valuable information better and faster, recommender systems have been widely applied in recent decades. Traditional recommender systems are mainly divided into two categories: content-based recommendation and collaborative filtering recommendation [2]. In collaborative filtering-based recommendation models, matrix factorization (MF) plays an important role due to its efficiency and scalability. In MF, each user or item is represented by a latent vector, and the dot product between them is used to fit known ratings and predict unknown ones. Since the dot product does not satisfy the triangle inequality, the MF model cannot reliably capture item-item or user-user similarity, nor the fine-grained preferences present in user feedback, as Hsieh has shown [3]. Metric learning-based models produce distance functions that both satisfy the triangle inequality and capture important relationships among data, and they have been widely used in classification and clustering tasks [4]. Accordingly, some works [5,6,7] utilize metric learning to overcome the disadvantages of MF models. These works project users and items into a low-dimensional metric space, where user preference is measured by the distance between the user and the item. Specifically, Hsieh's CML [3] minimizes the distance between users and their positively rated items, pulling users closer to their preferred items. Tay's LRML [6] incorporates a memory network to further learn relations between users and items in the metric space. Yu's SocialFD [7] changes users' spatial locations, bringing users closer to both their positively rated items and their trusted friends.
Existing metric learning-based models have achieved satisfactory results, but they still face the challenge of data sparsity. To alleviate this problem, some of them introduce social information and achieve a certain degree of success [8]. However, all of these models overlook an important issue: social information is often as sparse as the rating data itself. An exploratory work among matrix factorization models is Liang's Cofactor [9], which jointly decomposes the user-item rating matrix and the item-item co-occurrence matrix with shared item latent factors. Cofactor assumes that items users often consume in tandem share some similarity, which effectively enhances the recommendation model. Tran's RME [9] further extends Cofactor with the co-occurrence patterns of users and successfully demonstrates the effectiveness of co-occurrence patterns for matrix factorization models.
Although the co-occurrence patterns of users or items effectively improve matrix factorization models, how to utilize this co-occurrence information in a metric learning-based model remains an open problem. In our work, we propose a metric learning-based social recommendation model, called SRMC. SRMC exploits users' co-occurrence patterns to discover users with symmetric relationships, whose consumption behaviors are extremely similar or dissimilar, and changes their relative positions in the metric space to achieve better recommendations. Our main contributions are as follows:
1. We propose a metric learning-based social recommendation model (SRMC), which provides a new idea of how to exploit co-occurrence pattern information in a metric learning-based model.
2. We provide an approach for exploiting users' co-occurrence pattern information to discover their potentially similar or dissimilar users with symmetric relationships.
3. We conduct extensive experiments on three datasets to demonstrate the superiority of SRMC over comparative algorithms on rating prediction tasks.

Social Recommender Systems
Traditional recommender systems have long faced the problem of data sparsity. With the development of social network platforms, social recommender systems have emerged and effectively alleviated this problem. Social recommender systems assume that users are influenced by the users they have social relationships with, resulting in some similarity in their preferences [10]. Specifically, if a user interacts with only a few items, we can infer his preferences based on his friends' interactions and then generate better recommendations. Early explorations of this idea focused on matrix factorization (MF) and achieved satisfactory results. Ma's SoRec [11] utilized users' social networks to alleviate the data sparsity problem and improve recommendation quality. Mohsen [12] introduced the trust propagation principle into the matrix factorization model. Guo exploited implicit information and proposed TrustSVD [13]. Zhao introduced social information into the Bayesian personalized ranking algorithm and proposed SBPR [14].

Metric Learning In Recommender System
The goal of metric learning is to learn a suitable distance metric under a given set of constraints, ensuring that similar samples are distributed more compactly and dissimilar samples more spread out [15]. Many distance functions can be utilized, such as the Euclidean distance and the Mahalanobis distance. The Mahalanobis distance between any two points x_i and x_j can be expressed as:

d_M(x_i, x_j) = sqrt( (x_i − x_j)^T M (x_i − x_j) )    (1)

In (1), M has to be a positive semidefinite matrix to keep the distance nonnegative and symmetric. The global optimization problem with constraints can be stated as:

min_M Σ_{(x_i, x_j) ∈ S} d_M^2(x_i, x_j)    s.t.    Σ_{(x_i, x_j) ∈ D} d_M(x_i, x_j) ≥ ξ,    M ⪰ 0

where S denotes the set of equivalent constraints in which x_i and x_j belong to the same class, and D denotes the set of inequivalent constraints in which x_i and x_j belong to different classes. ξ is the minimum distance between two different classes of data points.
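As an illustration of the distance above, the following sketch (our own code, not from the paper) computes the squared Mahalanobis distance, parameterizing M as H^T H so that positive semidefiniteness holds by construction:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)

# Parameterizing M = H^T H keeps M positive semidefinite, so the
# induced distance is nonnegative and symmetric.
H = np.array([[1.0, 0.5],
              [0.0, 1.0]])
M = H.T @ H

d_sq = mahalanobis_sq([1.0, 2.0], [3.0, 1.0], M)
# With M = I the same function reduces to the squared Euclidean distance.
d_euclid_sq = mahalanobis_sq([1.0, 2.0], [3.0, 1.0], np.eye(2))
```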
Compared to the dot product in matrix factorization, a metric learning-based model reflects user preferences more accurately. For example, let user u be located at (1,1) in a matrix factorization model, and let user u' be located at (1,1) in a metric learning-based model. Next, we want to recommend the most suitable item to each of them. For user u', we only need to recommend the item closest in the metric space, i.e., the item whose point is nearest to (1,1). For user u, however, the dot product gives no such guarantee: an item at (2,2) scores higher than an item at (1,1), even though the latter coincides exactly with the user's own vector.
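The contrast can be made concrete with a small sketch (hypothetical vectors, our own code): ranking two items for a user at (1,1) by dot product versus by Euclidean distance gives opposite orders:

```python
import numpy as np

u = np.array([1.0, 1.0])                 # user vector
items = {"a": np.array([1.0, 1.0]),      # coincides exactly with the user
         "b": np.array([2.0, 2.0])}

# Matrix factorization ranks by dot product: item b wins (score 4 vs. 2),
# even though item a sits exactly on the user's own vector.
mf_scores = {k: float(u @ v) for k, v in items.items()}

# A metric model ranks by distance: item a wins (distance 0 vs. sqrt(2)).
distances = {k: float(np.linalg.norm(u - v)) for k, v in items.items()}
```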

Co-occurrence Pattern In Matrix Factorization
It is well known that Word2vec [16] has achieved substantial success in NLP tasks. Essentially, Word2vec uses word vectors to represent semantic information in a massive corpus, making similar words closer in the word vector space. Word2vec has two main models: Skip-Gram, which predicts context words from an input word, and CBOW, which predicts an input word from given context words. In the Skip-Gram model, pointwise mutual information (PMI) measures the association between a word w and its context word c by taking the log of the ratio between their joint probability P(w, c) and the product of their marginal probabilities P(w)P(c). The formula for PMI is:

PMI(w, c) = log( P(w, c) / (P(w) P(c)) )

It can also be written as:

PMI(w, c) = log( #(w, c) · |D| / (#(w) · #(c)) )

Here #(w, c) is the number of times word w appears in the context of word c, #(w) = Σ_c #(w, c), #(c) = Σ_w #(w, c), and |D| is the total number of word-context pairs. Liang's Cofactor suggested that items frequently consumed by users in tandem also share some similarity. It utilized the PMI formula to measure the similarity of two items i and j, where P(i) and P(j) denote the probabilities of the two items being purchased separately, and P(i, j) denotes the probability of the two items being purchased together. After this calculation, Cofactor created a matrix for all item-item pairs, where the value in the i-th row and j-th column is the PMI value of items i and j. Liang then jointly decomposed the rating matrix and the item-item co-occurrence matrix, making them share the same item latent factors, and achieved satisfactory results.
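The counting form of PMI can be sketched as follows (toy data and all names are our own, not Cofactor's implementation); marginals are taken as sums over the pair counts, matching the definitions above:

```python
import math
from collections import Counter
from itertools import combinations

# Toy consumption histories: the set of items each user rated.
histories = [["i1", "i2"], ["i1", "i2"], ["i1", "i3"], ["i2", "i3"]]

pair_counts = Counter()
for h in histories:
    for a, b in combinations(sorted(h), 2):
        pair_counts[(a, b)] += 1

D = sum(pair_counts.values())            # |D|: total number of pairs
marginal = Counter()                     # #(i) = sum_j #(i, j)
for (a, b), n in pair_counts.items():
    marginal[a] += n
    marginal[b] += n

def pmi(a, b):
    """PMI(a, b) = log( #(a, b) * |D| / (#(a) * #(b)) )."""
    a, b = min(a, b), max(a, b)
    if pair_counts[(a, b)] == 0:
        return float("-inf")             # never co-consumed
    return math.log(pair_counts[(a, b)] * D / (marginal[a] * marginal[b]))
```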

Proposed Methodology
In this section, we propose a social recommendation model based on metric learning and the users' co-occurrence pattern (SRMC). The inspiration for SRMC is that Cofactor can effectively enhance matrix factorization models. Enlightened by the success of distance metric learning on classification tasks, we utilize users' co-occurrence patterns to distinguish sets of users with extremely similar or dissimilar consumption habits, and combine social information to change their relative positions in the metric space. In the end, each user moves closer to trusted friends, preferred items, and potentially similar users, and away from disliked items and potentially dissimilar users. SRMC then generates more reasonable recommendation results based on the changed distances.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 September 2021 doi:10.20944/preprints202109.0489.v1

Model Definition
In SRMC, we construct the two constraint sets in metric learning as follows: given a user u with several potentially similar users and several potentially dissimilar users, and an item i, if user u gives a positive rating to item i, then we add the pair of user u and item i, together with the pairs of user u and his potentially similar users, to the set of equivalent constraints; otherwise, we add the pair of user u and item i, together with the pairs of user u and his potentially dissimilar users, to the set of inequivalent constraints. The framework of the SRMC model is shown in Figure 1. The predicted rating of user u for item i is determined by the distance between them and can be defined as:

r̂_ui = μ + b_u + b_i − ||p_u − q_i||²_M

where μ is the global mean, b_u is the user bias, b_i is the item bias, and p_u and q_i are the point vectors of user u and item i in the metric space. ||p_u − q_i||²_M = (p_u − q_i)^T M (p_u − q_i) is the squared Mahalanobis distance between user u and item i. We use squared Mahalanobis distances because they are cheaper to compute than Mahalanobis distances, with minimal impact on accuracy.
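A minimal sketch of this prediction rule (our own code; the bias and vector values are made up) subtracts the squared distance from the bias terms, so closer items receive higher predicted ratings:

```python
import numpy as np

def predict_rating(mu, b_u, b_i, p_u, q_i, M):
    """r_hat = mu + b_u + b_i - (p_u - q_i)^T M (p_u - q_i)."""
    d = p_u - q_i
    return mu + b_u + b_i - float(d @ M @ d)

M = np.eye(2)  # identity metric: reduces to the squared Euclidean distance
r_close = predict_rating(3.5, 0.2, -0.1, np.array([1.0, 1.0]), np.array([1.0, 1.0]), M)
r_far = predict_rating(3.5, 0.2, -0.1, np.array([1.0, 1.0]), np.array([3.0, 1.0]), M)
# The nearby item keeps the full bias-based rating; the distant one is penalized.
```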
M is a positive semidefinite matrix that can be calculated by M = H^T H. The loss function of SRMC is shown below:

L = Σ_{u,i} c_ui (r_ui − r̂_ui)²
  + α Σ_u ( Σ_{i ∈ P_u} ||p_u − q_i||²_M + Σ_{j ∈ N_u} [m − ||p_u − q_j||²_M]_+ )
  + β Σ_u ( Σ_{v ∈ S_u ∪ T_u} ||p_u − p_v||²_M + Σ_{w ∈ D_u} [m − ||p_u − p_w||²_M]_+ )
  + λ (Σ_u b_u² + Σ_i b_i²)

where S_u and T_u denote the potentially similar users and trusted users of user u, and D_u denotes the potentially dissimilar users of user u. P_u and N_u are the sets of positively and negatively rated items of user u, respectively. [x]_+ = max(x, 0) is the standard hinge loss, and m is the margin separating dissimilar points. λ controls the magnitudes of the biases, while α and β control the magnitudes of the two constraints. With these two constraints, users are guaranteed to be closer to their trusted users, positively rated items, and potentially similar users, but further away from their negatively rated items and potentially dissimilar users.
c_ui indicates the confidence level; for extremely high or extremely low ratings, we assign greater weights:

c_ui = 1 + g · |r_ui − r_max/2|

where g controls the size of the confidence level and r_max/2 is the median rating of the current dataset.
SRMC optimizes the loss function using stochastic gradient descent and updates b_u, b_i, H, p_u, and q_i at each step, where e_ui = r_ui − r̂_ui is the difference between the true rating and the predicted rating.
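As a simplified illustration of such an update (our own code, not the paper's exact rule: Euclidean special case M = I, only the rating-error term, gradient constants folded into the learning rate), one SGD step on the confidence-weighted squared error looks like:

```python
import numpy as np

def sgd_step(r, mu, b_u, b_i, p_u, q_i, c=1.0, lr=0.01, lam=0.01):
    """One SGD step on c * (r - r_hat)^2 with M = I; constant gradient
    factors are folded into the learning rate for readability."""
    d = p_u - q_i
    r_hat = mu + b_u + b_i - float(d @ d)
    e = r - r_hat                       # e_ui: true minus predicted rating
    b_u += lr * (c * e - lam * b_u)     # biases move to shrink the error
    b_i += lr * (c * e - lam * b_i)
    p_u = p_u - lr * 2.0 * c * e * d    # e > 0 pulls the user toward the item
    q_i = q_i + lr * 2.0 * c * e * d    # and the item toward the user
    return b_u, b_i, p_u, q_i, e
```

Repeated steps shrink |e_ui|, moving the user point and biases so the predicted rating approaches the observed one.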

Discover Potentially Similar Or Dissimilar Users
Considering that our aim is to find potentially similar or dissimilar users, we first filter the ratings in the rating matrix by a threshold; any rating below the threshold is removed. The threshold differs across datasets. The first purpose of filtering is to ensure that both user u and user v have positively rated the items they have jointly rated, rather than merely rated them. The second purpose is to ensure that the discovered users' consumption behaviors are extremely similar or extremely dissimilar. For instance, suppose that for item i, user u gives a positive rating and user v gives a negative rating. Obviously, user u and user v do not share the same preference for item i. However, if the PMI value between user u and user v were calculated directly without filtering the rating matrix, they might wrongly be considered potentially similar users.
After filtering the rating matrix, the PMI values between users are calculated as follows:

PMI(u, v) = log( #(u, v) · |D| / (#(u) · #(v)) )

where |D| indicates the number of all user-user pairs with ratings in the filtered rating matrix, #(u) and #(v) represent the numbers of interactions of user u and user v, respectively, and #(u, v) represents the number of items that are positively rated by both u and v. For instance, if u and v both have positive ratings for [i_1, i_2, i_3] and have no other commonly rated items in the whole item set, then #(u, v) = 3.
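The filtering-then-counting procedure can be sketched as follows (toy ratings and threshold are our own assumptions, not the paper's implementation). Only ratings at or above the threshold survive, and #(u, v) counts items liked by both users:

```python
import math
from itertools import combinations

# Toy rating matrix: user -> {item: rating}.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 2},
    "u2": {"i1": 4, "i2": 5},
    "u3": {"i3": 5, "i4": 4},
}
THRESHOLD = 3                      # dataset-dependent; assumed here

# Step 1: filter -- keep only positively rated items per user.
liked = {u: {i for i, r in its.items() if r >= THRESHOLD}
         for u, its in ratings.items()}

# Step 2: count jointly liked items for every user pair.
pair = {tuple(sorted((a, b))): len(liked[a] & liked[b])
        for a, b in combinations(liked, 2)}

D = sum(pair.values())             # total joint count over user-user pairs
marg = {u: sum(n for p, n in pair.items() if u in p) for u in liked}

def user_pmi(a, b):
    """PMI(u, v) = log( #(u, v) * |D| / (#(u) * #(v)) )."""
    key = tuple(sorted((a, b)))
    if pair[key] == 0:
        return float("-inf")       # no jointly liked items
    return math.log(pair[key] * D / (marg[a] * marg[b]))
```

Note how u1's rating of 2 for i3 is removed by the filter, so u1 and u3 end up with no jointly liked items at all.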
The higher PMI(u, v) is, the more items user u and user v have both rated positively, and the more similar their consumption behaviors are. Conversely, the lower PMI(u, v) is, the less relevant and less similar user u and user v are.
Another issue worth considering is what happens when the PMI value is negative. When calculating PMI values in the Skip-Gram model, there are cases where PMI values are negative or even negative infinity, which means the (w, c) word-context pair rarely or never appears within the sliding window. In SRMC, the analogous case is that the number of items positively rated by both user u and user v is very small or even zero; in other words, the consumption behaviors of user u and user v are extremely dissimilar. Previous studies tend to ignore negative PMI values and filter them by:

SPPMI(u, v) = max(PMI(u, v) − log k, 0)

where k is the parameter that controls the size of the SPPMI matrix. Matrix factorization-based models then jointly decompose the rating matrix and the SPPMI matrix. When PMI(u, v) is negative, SRMC instead considers user u and user v to be extremely dissimilar and increases their distance in the metric space. Therefore, we filter the PMI values with the following equations:

Positive PMI(u, v) = max(PMI(u, v) − log k₁, 0)
Negative PMI(u, v) = max(PMI(u, v) + log k₂, σ)

When the PMI value between user u and user v is positive, SRMC uses the first equation to determine whether user v is a potentially similar user of user u. Similarly, when the PMI value is negative, SRMC uses the second equation to determine whether user v is a potentially dissimilar user of user u. σ is a small negative number whose value varies across datasets. k₁ and k₂ are two constants that also change with the datasets, ensuring that the selected users have extremely similar or extremely dissimilar consumption behaviors.
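A sketch of the two filters (the constants k₁, k₂, and σ below are assumed placeholder values; the real ones are dataset-dependent), applied exactly as stated above:

```python
import math

K1, K2 = 2.0, 2.0   # shift constants; assumed values, tuned per dataset
SIGMA = -0.5        # small negative number; assumed value, tuned per dataset

def positive_pmi(pmi_value):
    """Shifted positive filter: keep only strongly positive associations."""
    return max(pmi_value - math.log(K1), 0.0)

def negative_pmi(pmi_value):
    """Negative filter with floor sigma, as in the formulas above."""
    return max(pmi_value + math.log(K2), SIGMA)

# A weakly positive PMI is zeroed out; a strongly negative one is floored.
weak = positive_pmi(0.3)      # 0.3 - log 2 < 0   ->  0.0
strong = negative_pmi(-3.0)   # -3.0 + log 2 < sigma  ->  sigma
```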
After calculating the PMI values among all users in the filtered rating matrix, we obtain the set of potentially similar or dissimilar users for each user. The process of discovering potentially similar or dissimilar users is shown in Figure 2.

Experiments
In this section, we present the experimental results of SRMC on three public datasets and conduct two main experiments: (1) the first compares the recommendation quality of SRMC with that of other algorithms, and (2) the second examines the effect of parameters on SRMC.

Datasets and Evaluation Metrics
We used three public datasets that provide user-item ratings and trust relationships to validate SRMC's recommendation effect. The FilmTrust [17] data were obtained from the FilmTrust website and contain 1,508 users and 2,071 movies, with a sparsity of 98.86%. FilmTrust's rating range is [0.5, 4] with a step size of 0.5. The Douban [18] dataset has a sparsity of 99.21% and contains 2,848 users, 39,586 items, and 894,887 ratings; its rating range is [1, 5] with a step size of 1. The Epinions [19] dataset has a sparsity of 99.98% and contains 49,286 users, 139,738 items, and 664,824 ratings; its rating range is [1, 5] with a step size of 1. We use root mean square error (RMSE) and mean absolute error (MAE), the two most commonly used evaluation metrics in recommender systems, as our evaluation criteria.
RMSE and MAE are defined as:

RMSE = sqrt( (1/N) Σ_{(u,i)} (r_ui − r̂_ui)² )
MAE = (1/N) Σ_{(u,i)} |r_ui − r̂_ui|

where N indicates the number of ratings in the test set, r_ui is the true rating, and r̂_ui is the predicted rating. Lower RMSE and MAE values indicate that missing ratings are predicted more precisely.
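The two metrics can be computed directly; a small sketch with made-up ratings (our own code):

```python
import math

def rmse(true, pred):
    """Root mean square error over paired true/predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

def mae(true, pred):
    """Mean absolute error over paired true/predicted ratings."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

true_ratings = [4.0, 3.0, 5.0]
pred_ratings = [3.5, 3.0, 4.0]
```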

Compared Algorithms
To demonstrate the performance improvement of SRMC, we conducted a series of experiments against several representative and relatively recent models that incorporate social relationships. These models are listed below:
SoRec: This model factorizes the user-item rating matrix and the user-user social matrix with a shared user latent space, based on probabilistic matrix factorization (PMF).
SocialMF: This model introduces a trust propagation mechanism into matrix factorization to alleviate the cold start problem.
SoReg: This model uses social regularization to express social constraints, making two potentially similar users more similar in terms of their latent feature vectors.
UE-SVD++:This method is a matrix factorization-based model that jointly decomposes the rating matrix and the user-user co-occurrence matrix.
SocialFD: In the metric space, SocialFD brings users closer to their preferred items, pushes them farther away from their disliked items, and brings them closer to their friends in space.

Model Parameter Selection
The selection of model parameters has a substantial impact on the recommendation performance of SRMC. In this section, we investigate the impact of several important parameters (α, β, λ, and g) on SRMC. In our experiments, we use 5-fold cross validation: the whole dataset is randomly divided into five parts, and in each fold, 80% of the data serve as the training set and the remaining 20% as the test set.
We utilize the control variables method to determine the most suitable parameters for SRMC. First, we determine the range of values for each parameter based on previous work, fix the other three parameters, and continuously adjust the current parameter, starting with α, until its optimal value is found. We then proceed in a similar manner to find the optimal values of the other parameters of SRMC.
From Figures 3, 4, and 5, we can see that the variation of rating prediction accuracy is mainly affected by α, β, and λ, because these three parameters control the distribution of data points in the space. When α is too small, SRMC cannot effectively reduce each user's spatial distance to preferred items or increase the distance to disliked items. Conversely, when α is too large, SRMC shrinks distances that should not be shrunk and enlarges distances that should not be enlarged, which harms the recommendation effect. Similarly, when β is too small, SRMC cannot effectively reduce each user's distance to trusted users and potentially similar users, or increase the distance to potentially dissimilar users, and vice versa. λ ensures that the biases and the distances between points in the space stay within a suitable range; a value either too large or too small will degrade the recommendation effect.
It is also worth mentioning how SRMC defines preferred and disliked items, and how it defines potentially similar and potentially dissimilar users. For these two questions, we conducted many experiments: whether a user likes an item is judged from the ratings the user gives, and whether two users are potentially similar is judged from the PMI values between them.
In the FilmTrust dataset, if a user gives a rating of 3.0, 3.5, or 4.0, we assume that the user likes the item; if a user gives a rating of 0.5, 1.0, or 1.5, we assume that the user does not like the item. Next, we consider how to distinguish potentially similar users from dissimilar users. If the PMI value between two users is higher than 4.5, we consider them potentially similar users. If the PMI value between two users is less than -0.6, we consider them potentially dissimilar users.
In the Douban dataset, if a user gives a rating of 4 or 5, we assume that the user prefers the item, and if the user gives a rating of 1 or 2, we assume that the user does not prefer the item. In the following, we consider how to distinguish potentially similar users from dissimilar users. If the PMI value between users is higher than 2.5, we consider these two users to be potentially similar users. If the PMI value between users is less than -1.7, we consider these two users to be potentially dissimilar users.
In the Epinions dataset, if a user gives a rating of 4 or 5, we assume that the user likes the item, and if the user gives a rating of 1 or 2, we assume that the user does not like the item. If the PMI value between two users is higher than 6, we consider them potentially similar users; if it is less than -0.7, we consider them potentially dissimilar users. Compared with the baseline algorithms, SRMC achieves the best recommendation results on all of the datasets. Compared with the suboptimal algorithm SocialFD, SRMC improves RMSE by 1.34% and MAE by 3.35% on the FilmTrust dataset, and improves RMSE by 2.15% and MAE by 4.72% on Epinions. It is worth noting that, like SRMC, SocialFD maps all users and items into the metric space, brings users closer to trusted users and preferred items, and pushes users further away from unpreferred items. Nevertheless, SRMC is more effective than SocialFD. In addition, we control the relative distances between data points in the regularization term, which makes the data distribution in the metric space more compact and achieves a better recommendation effect. The comparison between SRMC and the other algorithms is shown in Figure 6.

Conclusion and Future Work
This paper is inspired by Cofactor's effective enhancement of matrix factorization models. Considering the limitations of the matrix factorization model, we propose a new recommendation model, called SRMC, which is based on metric learning and users' co-occurrence patterns. SRMC utilizes users' co-occurrence patterns to distinguish users with extremely similar or dissimilar consumption behaviors and changes the relative positions of users and items in the metric space. The resulting distances are used to generate understandable and reliable recommendations. In future work, we will try to extend SRMC with other auxiliary information (e.g., item content) and hope to validate SRMC on larger datasets.