Social Recommendation Based on Multi-Auxiliary Information Contrastive Learning

Abstract: Social recommendation can effectively alleviate the data sparsity and cold-start problems of recommendation systems and has attracted widespread attention from researchers and industry. Current social recommendation models use social relations to alleviate data sparsity and improve recommendation performance. Although self-supervised learning based on user–item interaction can enhance the performance of such models, multi-auxiliary information is neglected in the learning process. Therefore, we propose a self-supervised model that uses multi-auxiliary information, such as user social relationships and item association relationships, to make recommendations. Specifically, the user social relationship and item association relationship are combined to form a multi-auxiliary information graph. The user–item interaction relationship is also integrated into the same heterogeneous graph so that multiple pieces of information can be propagated in the same graph. In addition, we utilize graph convolution to learn user and item embeddings, whereby the user embeddings reflect both user–item interaction and user social relationships, and the item embeddings reflect both user–item interaction and item association relationships. We also design multi-view self-supervised auxiliary tasks based on the constructed multi-auxiliary views. Signals generated by the self-supervised auxiliary tasks can alleviate data sparsity, further improving user/item embedding quality and recommendation performance. Extensive experiments on two public datasets verify the superiority of the proposed model.


Introduction
The rapid development of the Internet has made life more convenient and produced a large amount of information, causing the problem of information overload. It is difficult for users to select a target product that matches their preferences among hundreds of millions of products. Recommendation systems significantly alleviate information overload and improve user experience. However, they still suffer from the data sparsity and cold-start problems [1][2][3][4]. In recommendation scenarios, user preferences are influenced by the preferences of friends [5,6]. Based on this hypothesis, researchers have integrated users' social information into the recommendation system as auxiliary information to alleviate the data sparsity and cold-start problems, thus forming social recommendation. Some researchers have attempted to make social recommendations based on graph embedding learning of heterogeneous networks [7][8][9]. Implicit friends with similar preferences do not have explicit links, but they can be indirectly linked through the items they have interacted with. Implicit friends can be used to mine more reliable information from sparse data, specifically by learning node embeddings that accurately express user preferences. Mainstream social recommendation models use heterogeneous networks to describe user social relations and user-item interaction relations and then use graph embedding learning to obtain user/item node representations. Node embeddings can express node attributes or relationships between nodes (such as user preferences). For example, IF-BPR [8] designed "user-item-user", "user-user-user", and other meta-paths based on domain knowledge to guide random walks in a heterogeneous network, thus learning high-quality node embeddings.
MoHINRec [9] used meta-paths built on a variety of motif structures (triangular structures that reflect strong connections between nodes) to guide the random walk and obtain more accurate node embeddings. In recent years, graph neural networks (GNNs) have achieved considerable success in node classification and link prediction. Owing to their powerful ability to model graph relations, GNNs are also applied in the field of recommendation systems. However, there are three challenges in social recommendation based on GNNs: (1) information is not fully mined from existing data as auxiliary information; (2) users' social relationships have limited ability to alleviate data sparsity; and (3) information is propagated independently in the user-item interaction graph and the user social network graph, so node embeddings are formed independently and user-item interactions and user social relationships do not affect user preference simultaneously. To tackle these challenges, we design a social recommendation model based on self-supervision and multi-auxiliary information.
The main contributions of this paper are as follows:

• We mine item association relationships, user social relationships, and user-item interaction relationships as auxiliary information to alleviate the problem of data sparsity. Unlike existing GNN-based social recommendation models, which propagate interaction information and social information independently, we design a propagation mode in which multiple kinds of auxiliary information affect the formation of user/item embeddings simultaneously.

• We design self-supervised auxiliary tasks for the social recommendation scenario to improve the node embedding quality and alleviate data sparsity, construct several views according to different combinations of auxiliary information, and maximize the mutual information of node embeddings under different views based on contrastive self-supervised learning.

• We conduct extensive experiments on two public datasets to demonstrate the effectiveness of the proposed model and analyze the benefits of auxiliary information and self-supervised tasks.

Related Work
In this section, we introduce graph neural networks and contrastive self-supervised learning.

Graph Neural Networks (GNNs)
In recent years, GNNs have attracted increasing attention owing to their excellent performance on various tasks. Inspired by their success in other tasks, such as node classification and link prediction, the applicability of GNNs to recommendation tasks was investigated. In particular, GCN [10] has driven a large number of graph-based neural network recommendation models, such as GCMC [11], NGCF [12], and LightGCN [13]. The basic idea of these GCN-based models is to improve the representation of the target node by aggregating the representations of its neighbors [14] to obtain higher-order neighbor information in the user-item interaction graph. In addition to these generic models, GNNs support other recommendation methods for specific graphs, such as session and social graphs.
GNNs are often used to model information propagation in social networks because information spreads in social networks in the same way as in GNNs. Researchers therefore naturally transplanted GNNs into social recommendation. GraphRec is the first model to introduce GNNs into social recommendation [15]. It learns target node embeddings by aggregating first-order neighbor information in the user-item interaction graph and the social network graph. The user embedding of DiffNet [10] comes from the social network graph and the user-item interaction graph. DiffNet uses a multi-layer GNN structure to realize deeper, dynamic propagation of social influence in the social network, whereas its node embedding in the user-item interaction graph comes only from first-order neighbors. Wu et al. [16] proposed a dual-graph attention network for collaborative learning of node embeddings influenced by two kinds of social effects. DGRec uses two recurrent neural networks to dynamically simulate user behavior and social influence [17]. Yu et al. [18] enhanced social recommendation with adversarial graph convolutional networks to process complicated high-order interactions among users. Later, they [19] improved social recommendation with a multi-channel hypergraph convolutional network to leverage high-order user relations. Huang et al. [20] proposed a knowledge-aware coupled GNN that injects knowledge across items and users into the recommendation. Yang et al. [21] proposed the ConsisRec model, which calculates the consistency score between neighbors as the probability of sampling neighbors and further handles the problem of relationship inconsistency through an attention mechanism.

Contrastive Self-Supervised Learning
Self-supervised learning was first proposed in the field of robotics, whereby training data are automatically generated from the data of two or more sensors. The principle of self-supervised learning can be explained as building a description of the complete data from observations of different aspects or different parts of the data. Self-supervised learning augments data by deforming, cropping, and perturbing the original data and uses the generated data as pseudo-labels of the original data to make up for the data deficiency. Self-supervised learning can be divided into two types: generative self-supervised learning and contrastive self-supervised learning. In this paper, the contrastive self-supervised learning method is used to provide more auxiliary information to alleviate the problem of data sparsity and realize a recommendation algorithm.
Contrastive learning is a discriminative method that aims to make the embeddings of similar samples (positive samples) closer to each other in the representation space and the embeddings of different samples (negative samples) farther apart. This method uses a similarity measure (commonly cosine similarity) to quantify the distance between two embeddings. Studies based on contrastive self-supervision have made significant progress, such as SwAV [22], MoCo [23], and SimCLR [24], and their extensions, with performances comparable to those of related models based on supervised learning.
Pseudo-label construction is an essential strategy for embedding learning based on contrastive self-supervised learning. Positive samples and negative samples are essentially pseudo-labels that expand the training data, and the original samples are trained against these positive/negative samples in a supervised manner. The goal of self-supervised learning is to reduce the cost of manual labeling, so the generation of pseudo-labels (the selection of positive/negative samples) is an automatic process.
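As a minimal illustration of the similarity measure described above, the following sketch computes cosine similarity between an anchor embedding and a positive and a negative sample. The vectors are made-up toy values, not data from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings.
anchor = [1.0, 2.0, 0.5]
positive = [0.9, 2.1, 0.4]    # e.g., an augmented view of the same sample
negative = [-1.0, 0.2, 1.5]   # e.g., the embedding of a different sample

sim_pos = cosine_similarity(anchor, positive)
sim_neg = cosine_similarity(anchor, negative)
```

A contrastive objective would push `sim_pos` up and `sim_neg` down during training.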
Based on contrastive self-supervised learning, the model needs to design auxiliary tasks to complete contrastive learning and assist with the main tasks of specific scenes to train the model. The training process is as follows:

•
The auxiliary task performs data enhancement based on the original sample. The original and positive/negative samples generate corresponding low-dimensional embedding through the encoder and construct the loss function through contrast learning.

•
The main task generates the corresponding low-dimensional embedding of the original sample through the encoder and constructs the main loss function through the main task. Finally, by combining the contrastive loss function with the main loss function, higher-quality embeddings can be obtained.
Contrastive self-supervised learning is also applied flexibly in graph embedding learning (graph contrast learning for short). This work mainly constructs self-supervised signals from graph structures of different perspectives to explore higher-quality graph structure embedding [25,26]. Generally, a new perspective can be obtained through random data enhancement of the same graph. Common data enhancement methods include but are not limited to the random deletion of nodes, random deletion of edges, the random transformation of features or attributes, random walk-based pattern graphs, etc. Inspired by such work, some researchers began to apply graph contrast learning to recommendation tasks [27][28][29][30]. Zhou et al. [30] designed self-supervised auxiliary tasks, specifically adding random embedding masks for item embedding and randomly skipping given items and sub-sequences to carry out pre-training for sequence recommendation. Yao et al. [29] proposed a two-tower network structure based on DNNs (deep neural networks), on which random feature mask and random feature discarding operations were carried out for self-supervised item recommendations. Ma et al. [28] reconstructed short-term future sequences by observing long-term sequences, which essentially mined more self-supervised signals using feature masks. Wu et al. [27] summarized the above random data enhancement operations (random node/edge deletion and random walk) and integrated them into a recommendation framework based on self-supervised graph learning. Long et al. [31] proposed a heterogeneous graph neural network based on meta-relation and used self-supervised learning to guide the interaction between users and items under different views, incorporating the knowledge information of the item and the high-level semantic relationship between users and items into the user representation. Liu et al. [32] designed two new information augmentation methods. 
Furthermore, they proposed a contrastive self-supervised learning framework, CoSeRec, for sequence recommendation, which alleviates the problems of data sparsity and noisy interactions. Wu et al. [33] proposed a social recommendation model that disentangles the collaborative domain and social domain to learn user representations separately and uses cross-domain contrastive learning to further improve the recommendation performance.
Unlike these models, we use item-association-aware contrastive learning in self-supervised learning. We maximize the mutual information of node embeddings in two views; view 1 is constructed from the user-item interaction relationship and the item association relationship, and view 2 is formed from the user-item interaction relationship and the user social relationship. Although the model proposed in [33] also uses cross-domain contrastive learning, its domains are the user-item interaction domain and the social relationship domain, and it does not take item association into consideration. Our model is designed to utilize more auxiliary information to improve recommendation performance, and self-supervised learning can fine-tune the embeddings derived from different kinds of auxiliary information. There are two main differences between our model and SEPT [34]. First, we use item-association-aware contrastive learning in self-supervised learning, whereas SEPT does not take item association into consideration. Second, SEPT focuses more on finding positive samples for contrastive learning and uses tri-training to determine which samples are positive; in contrast, our model does not attempt to identify additional positive samples and treats the same node in different views as a positive sample.

Problem Analysis
Accurately capturing user preferences is the key to improving the recommendation quality of recommendation systems. Recommendation algorithms therefore aim to learn more accurate user embeddings to improve recommendation performance. Current social recommendation algorithms based on graph neural networks assume that user preference is jointly determined by the items a user has interacted with historically, as well as by the influence of social friends. The process of learning user/item embeddings in this kind of algorithm is summarized as follows: In the first step, user-item interaction influence and social influence are by default spread in independent scopes, that is, information is propagated and embedded separately in the user-item interaction graph and the user social network graph. In the second step, each user node has two different embeddings, one from each graph. The two embeddings, representing user-item interaction information and social influence, are aggregated to form the final embedding.
Social recommendations based on graph neural networks are subject to the following problems:

• They propagate user-item interaction information and social information in two separate graphs, whereas in reality a user is affected by both simultaneously.

• As auxiliary information, social relationships alleviate the problem of data sparsity in the recommendation system, but their role is limited, and more auxiliary information is needed.

Methodology
To alleviate the above problems, we propose a social recommendation model based on self-supervision and multi-auxiliary information (SlightGCN). We combine user social relationships, item association relationships, and self-supervised auxiliary signals as auxiliary information to carry out social recommendations. First, SlightGCN mines item association relationships based on user-item interaction records and item attributes. Then, a heterogeneous network is constructed, including user social relationships, item association relationships, and user-item interaction relationships. In the heterogeneous network graph containing multi-auxiliary information, information is propagated through a GCN, which outputs the node embeddings. The resulting user/item embeddings, which contain rich information, are used for the main recommendation task. In addition, two heterogeneous networks with different views are constructed according to different combinations of existing auxiliary information, and self-supervised auxiliary tasks are constructed to maximize the mutual information of nodes under different views to obtain higher-quality node embeddings. Finally, the model is trained by combining the loss of the main task (recommendation) with the loss of the auxiliary task (contrastive self-supervised learning).
SlightGCN can be generally divided into three parts: the heterogeneous network construction, the main recommendation task, and the self-supervised auxiliary task (see Figure 1).

Heterogeneous Network Construction
First, the association relationship is mined from the user-item interaction relationship and item attribute through the meta-path rule to alleviate the data sparsity problem. Secondly, three types of information (user social relationship, user-item interaction relationship, and item association relationship) are integrated into the same heterogeneous network in the form of edges. The construction process of a heterogeneous network is shown in Figure 2. In the process of information dissemination of the recommendation scenario, the user node is influenced by friends and items that have been interacted with, and the item node is influenced by items that are closely connected and users that have been interacted with to learn user/item embedding of higher quality. To realize this information transmission mode, user-item interaction relationship, user social relationship, and item association relationship need to be integrated into the same heterogeneous network in the form of an edge. The user node receives information from friends and historical interaction records through user-item interaction relationships and user social relationships, and the item node receives information from closely related items and users who have interacted with each other through user-item interaction relationships and item association relationships.
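The construction described above can be sketched in a few lines. This is an illustrative sketch only: the rule used here (two items are associated when they share an attribute value, an item-attribute-item meta-path) is an assumption, since the exact meta-path rule is not given in this section, and the function names and toy data are hypothetical.

```python
from itertools import combinations

def mine_item_associations(item_attrs):
    """Link two items when they share at least one attribute value
    (an assumed item-attribute-item meta-path rule)."""
    edges = set()
    for a, b in combinations(sorted(item_attrs), 2):
        if item_attrs[a] & item_attrs[b]:
            edges.add((a, b))
    return edges

def build_hetero_edges(interactions, social, associations):
    """Merge the three relations into one typed, undirected edge list."""
    edges = []
    edges += [(("user", u), ("item", i), "interact") for u, i in interactions]
    edges += [(("user", u), ("user", v), "social") for u, v in social]
    edges += [(("item", i), ("item", j), "associate") for i, j in associations]
    return edges

# Hypothetical toy data.
item_attrs = {0: {"director:A"}, 1: {"director:A", "genre:X"}, 2: {"genre:Y"}}
interactions = [(0, 0), (1, 1), (1, 2)]
social = [(0, 1)]

associations = mine_item_associations(item_attrs)
graph = build_hetero_edges(interactions, social, associations)
```

Keeping all three edge types in one graph is what lets a user node receive messages from friends and items in the same propagation step.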

Main Tasks
A recommendation algorithm optimizes its model parameters through a supervised main task. The prediction is built on the user embedding ($e_u$) and the item embedding ($e_i$). Generally, the inner product of the two embeddings is used to predict the preference degree ($\hat{y}_{ui}$) of user $u$ for item $i$, as in (1):

$$\hat{y}_{ui} = e_u^{\top} e_i \quad (1)$$

Specifically, the model treats the observed interactions as supervision signals and makes the predicted preference ($\hat{y}_{ui}$) as close as possible to the real preference ($y_{ui}$). It takes the unobserved data as negative samples. We take the commonly used Bayesian personalized ranking (BPR) [35] as the loss function for the ranking recommendation task. The core idea is that the target user ($u$) prefers an interacted item (observed data) to a non-interacted item (unobserved data). The specific loss function is shown in (2):

$$\mathcal{L}_{main} = \sum_{(u,i,j) \in O} -\ln \sigma\left(\hat{y}_{ui} - \hat{y}_{uj}\right) \quad (2)$$

where $\sigma$ is the sigmoid function, $O = \{(u,i,j) \mid (u,i) \in O^{+}, (u,j) \in O^{-}\}$ is all training data, $O^{+}$ is the set of observable interactions, and $O^{-}$ is the set of unobserved interactions.
The basis of the recommendation task is to learn user/item embeddings. A heterogeneous network can output user/item embeddings through a graph encoder. We follow the graph convolutional recommendation model LightGCN [13], which showed that feature transformation, nonlinear transformation, and self-connection operations are redundant for collaborative filtering. We integrate user social relations and item associations so that user nodes are affected by social friends and interacted items simultaneously, and item nodes are affected by interacting users and closely connected items simultaneously.
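To make the main task concrete, the sketch below scores user-item pairs by inner product and evaluates a BPR-style loss on (user, observed item, unobserved item) triples. The embeddings and triples are hypothetical toy values, and the function names are illustrative, not taken from the paper.

```python
import math

def predict(e_u, e_i):
    """Predicted preference: inner product of user and item embeddings."""
    return sum(a * b for a, b in zip(e_u, e_i))

def bpr_loss(user_emb, item_emb, triples):
    """Sum of -ln(sigmoid(y_ui - y_uj)) over (user, pos item, neg item) triples."""
    loss = 0.0
    for u, i, j in triples:
        diff = predict(user_emb[u], item_emb[i]) - predict(user_emb[u], item_emb[j])
        loss += -math.log(1.0 / (1.0 + math.exp(-diff)))
    return loss

# Hypothetical toy embeddings: user 0 is aligned with item 0, opposed to item 1.
user_emb = {0: [1.0, 0.0]}
item_emb = {0: [2.0, 0.0], 1: [-2.0, 0.0]}

good = bpr_loss(user_emb, item_emb, [(0, 0, 1)])  # observed item scored higher
bad = bpr_loss(user_emb, item_emb, [(0, 1, 0)])   # observed item scored lower
```

The loss is small when the observed item outranks the unobserved one and large otherwise, which is the ranking behavior the BPR objective rewards.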
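In the graph convolution, only simple weighted aggregation is adopted, without feature transformation. The specific convolution operation is shown in (3):

$$e_u^{(k+1)} = \sum_{j \in N_u} \frac{1}{\sqrt{|N_u|}\sqrt{|N_j|}}\, e_j^{(k)}, \qquad e_i^{(k+1)} = \sum_{j \in N_i} \frac{1}{\sqrt{|N_i|}\sqrt{|N_j|}}\, e_j^{(k)} \quad (3)$$

where $e_u^{(k+1)}$ and $e_i^{(k+1)}$ represent the embeddings of user $u$ and item $i$ at layer $k+1$, respectively, and the layer-$(k+1)$ embedding is aggregated from the layer-$k$ embeddings. Taking the formation of $e_u^{(k+1)}$ as an example, the neighbor set of user $u$, $N_u$, contains both the user's user neighbors and item neighbors, and $1/(\sqrt{|N_u|}\sqrt{|N_j|})$ is the normalization coefficient. After the multi-layer convolution, the final node embedding is obtained by averaging the embeddings of the node over all layers, as shown in (4):

$$e_u = \frac{1}{K+1} \sum_{k=0}^{K} e_u^{(k)}, \qquad e_i = \frac{1}{K+1} \sum_{k=0}^{K} e_i^{(k)} \quad (4)$$

In the matrix representation of the convolution process, with $m$ users and $n$ items, the heterogeneous network adjacency matrix ($A \in \mathbb{R}^{(m+n) \times (m+n)}$) is composed of the user-item interaction matrix ($R$) and its transpose ($R^{\top}$), the user social matrix ($S$), and the item association matrix ($T$). The specific expression of the adjacency matrix is shown in (5):

$$A = \begin{pmatrix} S & R \\ R^{\top} & T \end{pmatrix} \quad (5)$$

User embeddings and item embeddings constitute the layer-0 embedding matrix, $E^{(0)} \in \mathbb{R}^{(m+n) \times d}$, where $d$ is the embedding dimension, and $E^{(0)}$ is randomly initialized. The matrix form of the convolution operation is shown in (6):

$$E^{(k+1)} = D^{-1/2} A D^{-1/2} E^{(k)} \quad (6)$$

where $D \in \mathbb{R}^{(m+n) \times (m+n)}$ is a diagonal matrix, and the value of $D_{ii}$ is the number of nonzero elements in the $i$th row of matrix $A$.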
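The propagation and layer-averaging steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: it uses the symmetric degree normalization familiar from LightGCN, a dense nested-list adjacency matrix assumed to be symmetric, and a hypothetical four-node toy graph (two users followed by two items).

```python
import math

def normalize(adj):
    """Symmetric normalization; deg[i] counts the nonzero entries in row i."""
    deg = [sum(1 for x in row if x != 0) for row in adj]
    n = len(adj)
    return [[adj[r][c] / math.sqrt(deg[r] * deg[c]) if adj[r][c] else 0.0
             for c in range(n)] for r in range(n)]

def propagate(adj, emb0, num_layers):
    """Apply num_layers rounds of weighted neighbor aggregation, then
    average the embeddings of all layers (layer 0 included)."""
    norm = normalize(adj)
    layers = [emb0]
    for _ in range(num_layers):
        prev = layers[-1]
        nxt = [[sum(norm[r][c] * prev[c][d] for c in range(len(prev)))
                for d in range(len(prev[0]))] for r in range(len(prev))]
        layers.append(nxt)
    return [[sum(layer[r][d] for layer in layers) / len(layers)
             for d in range(len(emb0[0]))] for r in range(len(emb0))]

# Toy graph: nodes 0-1 are users, nodes 2-3 are items (hypothetical).
# Edges: u0-u1 (social), u0-i0 and u1-i1 (interaction), i0-i1 (association).
adj = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]]
emb0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
final = propagate(adj, emb0, 3)
```

Because social, interaction, and association edges share one adjacency matrix, each user row mixes friend and item embeddings in the same aggregation step.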

Self-Supervised Auxiliary Tasks
The main view, composed of the three types of relationships introduced in the previous section (user-item interaction relationship, user social relationship, and item association relationship), helps the main recommendation task learn node embeddings. When a designer models a character, it is necessary to observe the character from different aspects and combine the information from multiple perspectives to build a more accurate character model. Similarly, in the recommendation scenario, the recommendation algorithm needs to model the user and the item before making recommendations, building a more accurate portrait of the user/item (the user/item embedding). The graph neural network plays the role of the modeler and can generate more accurate node embeddings by combining node information from different views.
Based on the main view, we remove the user social relationship to generate auxiliary view 1 and remove the item association relationship to generate auxiliary view 2. In the process of information propagation, the two auxiliary views correspond to the adjacency matrices A1 and A2, respectively, as shown in (8). Convolution over the two auxiliary views generates two different groups of node embeddings (E1 and E2), as shown in (9).
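One way to picture the two auxiliary views is as the main adjacency matrix with one diagonal block zeroed out. The sketch below assembles the block matrix from nested lists; the matrix names `R`, `S`, and `T` (interaction, social, item association) and the toy values are assumptions for illustration.

```python
def block_adjacency(R, S, T):
    """Assemble [[S, R], [R^T, T]] from the interaction matrix R (users x items),
    the social matrix S (users x users), and the association matrix T (items x items)."""
    m, n = len(R), len(R[0])
    top = [S[u] + R[u] for u in range(m)]
    bottom = [[R[u][i] for u in range(m)] + T[i] for i in range(n)]
    return top + bottom

def zeros(rows, cols):
    """All-zero matrix used to drop one relation from a view."""
    return [[0] * cols for _ in range(rows)]

# Hypothetical toy relations for 2 users and 2 items.
R = [[1, 0], [0, 1]]
S = [[0, 1], [1, 0]]
T = [[0, 1], [1, 0]]

A = block_adjacency(R, S, T)              # main view: all three relations
A1 = block_adjacency(R, zeros(2, 2), T)   # auxiliary view 1: social edges removed
A2 = block_adjacency(R, S, zeros(2, 2))   # auxiliary view 2: association edges removed
```

Running the same graph encoder on `A1` and `A2` would yield the two embedding groups that the contrastive task compares.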
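Contrastive self-supervised learning is used to maximize the mutual information of node embeddings in different views to learn node embeddings of higher quality. Let $e_{u,1}$ and $e_{u,2}$ denote the embeddings of node $u$ in auxiliary views 1 and 2, respectively. The embeddings of the same node in different views form a positive sample pair (i.e., $\{(e_{u,1}, e_{u,2}) \mid u \in U\}$), and the embeddings of different nodes in different views form a negative sample pair (i.e., $\{(e_{u,1}, e_{v,2}) \mid u, v \in U, u \neq v\}$). The auxiliary task makes the embeddings of positive pairs as similar as possible, whereas the embeddings of negative pairs are pushed as far apart as possible. The self-supervised loss function is formed according to the contrastive loss, InfoNCE, as shown in (10):

$$\mathcal{L}_{ssl}^{user} = \sum_{u \in U} -\log \frac{\exp\left(s(e_{u,1}, e_{u,2}) / \tau\right)}{\sum_{v \in U} \exp\left(s(e_{u,1}, e_{v,2}) / \tau\right)} \quad (10)$$

where the function $s(\cdot)$ is cosine similarity, which is used to measure the distance between two embeddings, and the hyperparameter $\tau$ is the temperature, which can reduce or amplify the effect of the distance between nodes. The item loss $\mathcal{L}_{ssl}^{item}$ is defined analogously over the item nodes. Finally, as shown in (11), the loss function ($\mathcal{L}_{ssl}$) of the self-supervised auxiliary task is the sum of the loss function ($\mathcal{L}_{ssl}^{user}$) of the user nodes and the loss function ($\mathcal{L}_{ssl}^{item}$) of the item nodes:

$$\mathcal{L}_{ssl} = \mathcal{L}_{ssl}^{user} + \mathcal{L}_{ssl}^{item} \quad (11)$$

To improve the recommendation performance, the main recommendation task and the self-supervised auxiliary task are combined for training. The loss function of the combined training is shown in (12):

$$\mathcal{L} = \mathcal{L}_{main} + \lambda_1 \mathcal{L}_{ssl} + \lambda_2 \|\Theta\|_2^2 \quad (12)$$

where $\mathcal{L}_{main}$ is the main recommendation loss in (2), $\lambda_1$ and $\lambda_2$ are the hyperparameters, and $\Theta$ is the model parameter.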
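The contrastive objective and the joint loss can be sketched as below. This is an illustrative sketch, not the authors' implementation: it assumes an InfoNCE form in which every node of the second view (including the positive) appears in the denominator, and the temperature and weight values are arbitrary toy choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(view1, view2, tau):
    """InfoNCE over one node type: the same node in both views is the
    positive pair; the other nodes of view 2 act as negatives."""
    loss = 0.0
    for u, e_u in enumerate(view1):
        scores = [math.exp(cosine(e_u, e_v) / tau) for e_v in view2]
        loss += -math.log(scores[u] / sum(scores))
    return loss

def joint_loss(main_loss, ssl_user, ssl_item, params, lam1, lam2):
    """Main loss + lam1 * (user + item SSL loss) + lam2 * squared L2 norm."""
    l2 = sum(p * p for p in params)
    return main_loss + lam1 * (ssl_user + ssl_item) + lam2 * l2

# Hypothetical toy embeddings of two nodes in two views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b_aligned = [[1.0, 0.0], [0.0, 1.0]]
view_b_swapped = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce(view_a, view_b_aligned, tau=0.2)
swapped_loss = info_nce(view_a, view_b_swapped, tau=0.2)
total = joint_loss(1.0, 0.5, 0.5, [1.0, 2.0], lam1=0.1, lam2=0.01)
```

The loss is low when each node's two views agree and high when they are mismatched, which is the signal the auxiliary task feeds into joint training.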

Experiments
In this study, extensive experiments were carried out on two public datasets to verify the following points: (1) the advantages of SlightGCN in terms of recommendation performance, (2) that SlightGCN effectively alleviates the data sparsity and cold-start problems, (3) that multi-auxiliary information (users' social relationship, item association relationship, and self-supervised learning signal) plays an essential role in improving the recommendation performance, and (4) the influence of hyperparameters on SlightGCN.

The Datasets
We conducted extensive experiments on two public datasets: DoubanMovie and DoubanBook. Detailed information on the datasets is shown in Table 1, including the number of users, the number of items, the number of user-item interactions, the number of user social relationships, and the density of user-item interactions. DoubanMovie is a movie dataset of the Douban platform, which contains 1,067,278 viewing behaviors of 13,367 users on 12,677 movies and 4085 social friend relationships between users. We select two crucial attributes in DoubanMovie, film type and director, to mine the association relationships between items. DoubanBook is a book dataset of the Douban platform, which contains 792,062 interactions between 13,024 users and 22,347 books and 169,150 social friend relationships between users. Two important attributes of DoubanBook (publisher and author) are selected to mine the item association relationships. In this study, each dataset is divided into a training set, a validation set, and a test set (70%, 10%, and 20%, respectively). The model was cross-validated ten times, and the average value was taken as the result.
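The interaction density and the 70/10/20 split can be illustrated with a short sketch. The density is computed here as interactions divided by users times items, using the counts quoted above; the shuffling procedure and seed are assumptions, since the splitting details are not specified.

```python
import random

def density(num_interactions, num_users, num_items):
    """Fraction of the user-item matrix that is observed."""
    return num_interactions / (num_users * num_items)

def split_interactions(interactions, seed=0):
    """Shuffle and split into 70% train, 10% validation, and 20% test."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_valid = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Counts quoted in the dataset description.
movie_density = density(1_067_278, 13_367, 12_677)
book_density = density(792_062, 13_024, 22_347)

# Hypothetical toy interaction list of 100 (user, item) pairs.
toy = [(u, i) for u in range(10) for i in range(10)]
train, valid, test = split_interactions(toy)
```

With these counts, the interaction density is roughly 0.63% for DoubanMovie and 0.27% for DoubanBook, so DoubanBook is the sparser of the two.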

Baselines and Metrics
To verify the recommendation performance of SlightGCN, we select six baseline algorithms, as shown below:
• BPR: Bayesian personalized ranking [35], a classical ranking recommendation algorithm. Based on user-item interaction information, it assumes that the target user prefers interacted items to non-interacted items.
• SBPR: A classic social recommendation algorithm [36] that integrates social relations to optimize the item preference priority of target users based on BPR.
• DiffNet: A social recommendation algorithm based on a graph neural network [37] that simulates the dynamic propagation of social influence in user social networks.
• LightGCN: A recommendation algorithm based on a graph convolutional neural network [13] that learns node embeddings in a simple convolution mode suitable for collaborative filtering.
• SGL: A recommendation algorithm based on self-supervised learning and a graph convolutional neural network [27] that creates multiple views by randomly changing the graph structure to improve the embedding quality.
• SEPT: A graph convolutional neural network recommendation algorithm based on self-supervised learning and collaborative training [34] that finds more suitable positive/negative samples for self-supervised learning through collaborative training to learn more accurate embeddings.
In the evaluation metrics, u is a user; U is the set of all users; R(u) denotes the recommended items for user u; T(u) is the set of items that user u actually likes; rel_i represents the relevance score of item i, which can be predefined; and iDCG is the DCG of the ideal order for the recommendation set.
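The symbols defined above correspond to standard top-k ranking metrics. As a hedged sketch, the code below computes precision, recall, and NDCG for one user using common textbook definitions; the exact metric formulas used in the paper are not recoverable from this text, so the DCG discount (log base 2 of rank plus one) and the function names are assumptions.

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k, hits / len(relevant)

def ndcg_at_k(recommended, relevance, k):
    """NDCG@k; `relevance` maps item -> relevance score (missing items score 0)."""
    dcg = sum(relevance.get(item, 0) / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy example: 4 recommended items, 3 truly liked items.
recommended = ["a", "b", "c", "d"]
relevance = {"a": 1, "c": 1, "e": 1}

precision, recall = precision_recall_at_k(recommended, relevance.keys(), 4)
ndcg = ndcg_at_k(recommended, relevance, 4)
```

Dataset-level scores would then be obtained by averaging these per-user values over all users in U.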
In the experiments, the depth of LightGCN, SGL, and SlightGCN for information propagation is three, and that of SEPT is two because SEPT can achieve the best performance in that case [34].

Overall Comparison
We compare the proposed model, SlightGCN, with six baselines on DoubanMovie and DoubanBook. The advantages and disadvantages of each model are shown by comparing the four metrics in top-10 and top-20 recommendations. Based on the experimental results in Tables 2 and 3 (the bold numbers mean the best performance), the following conclusions can be drawn:
1. SlightGCN's recommendation performance is better than that of the six baseline models on the two datasets. Specifically, the evaluation metrics of SlightGCN on DoubanMovie and DoubanBook improved by 1.84–4.26% and 2.08–3.30%, respectively, compared with the second-best model. In addition, the interaction density of the DoubanMovie dataset is higher than that of DoubanBook. Thus, the recommendation performance of all models on the DoubanMovie dataset is significantly better than that on DoubanBook.
2. The results show that SBPR is superior to BPR in some metrics, indicating that SBPR can alleviate data sparsity to a certain extent by integrating users' social relationships into BPR. However, SBPR's simple approach of integrating users' social relationships by directly optimizing the order of items does not accurately simulate the impact of social relationships on users' preferences. DiffNet enables the influence of friends' preferences on users to propagate dynamically in social networks through multi-layer graph neural network simulation. The performance of DiffNet is better than that of SBPR owing to the ability of the graph neural network to capture graph relations and the dynamic propagation of social influence. LightGCN's simple convolution mode makes it more suitable for collaborative filtering, and its recommendation performance is significantly higher than that of the preceding models. SEPT uses LightGCN as its graph encoder and applies collaborative training to find more suitable positive and negative samples for self-supervised learning, improving on LightGCN. SGL also uses LightGCN as the graph encoder and designs its self-supervised auxiliary task around random perturbation of the graph structure, which considerably improves the recommendation performance.
3. SlightGCN performs best on both datasets. SlightGCN has the following advantages over other models. First, it not only uses user-item interaction relationships and user social relationships but also mines item association relationships from existing information to form multi-auxiliary information. Second, a more appropriate convolution mode is designed so that users are affected by both user-item interaction relationships and user social relationships, and items are affected by both user-item interaction relationships and item association relationships. Third, multi-view self-supervised auxiliary tasks are designed to learn more accurate user/item embeddings.

Cold-Start User Experiment
Cold-start users have few or no interactions, and extremely sparse data make it difficult to generate recommendations for such users. In this section, users with fewer than 10 interactions are selected as cold-start users; the recommendation results for these users are shown in Figures 3 and 4. The recommendation performance of SBPR is better than that of BPR, indicating that users' social relationships alleviate the problem of data sparsity to some extent. For the five recommendation models based on GNNs, the recommendation performance increases in the order of DiffNet, LightGCN, SEPT, SGL, and SlightGCN. LightGCN, which only uses user-item interaction relationships, achieves a better recommendation performance than DiffNet, which uses the social relationship as auxiliary information, because its simple and efficient convolution structure is more suitable for collaborative filtering. SEPT uses collaborative training to find positive and negative samples for self-supervised learning. SGL augments data by randomly perturbing the graph structure, and the generated self-supervised signal alleviates data sparsity and improves the embedding quality. The proposed SlightGCN model incorporates multiple pieces of auxiliary information, such as the user social relationship and the item association relationship, and constructs self-supervised auxiliary tasks from multiple views, all of which alleviate the problem of data sparsity to a certain extent. In addition, user/item embeddings are influenced by multiple relationships simultaneously, improving the capture of user preferences.
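Selecting the cold-start group by the stated threshold can be sketched as follows. The interaction list is hypothetical toy data, and counting interactions over the training records is an assumption; users with no recorded interaction at all would need to be added from the full user list.

```python
from collections import Counter

def cold_start_users(interactions, threshold=10):
    """Users that appear in fewer than `threshold` interactions."""
    counts = Counter(u for u, _ in interactions)
    return {u for u, c in counts.items() if c < threshold}

# Hypothetical toy data: user 0 has 12 interactions, user 1 has 3.
toy_interactions = [(0, i) for i in range(12)] + [(1, i) for i in range(3)]
cold = cold_start_users(toy_interactions)
```

Metrics would then be computed only over the users in `cold` to reproduce a cold-start comparison.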

The Benefit of Self-Supervised Auxiliary Tasks
To verify that user social relationship, item association relationship, and self-supervised auxiliary tasks can help improve the recommendation performance of SlightGCN, we designed an ablation experiment as follows.
To verify the effectiveness of the self-supervised auxiliary tasks, they were removed to form the UI variant. In addition, to verify the benefits of the user social relationship and item association relationship, only the user-item interaction relationship and user social relationship are retained to form the U variant, and only the user-item interaction relationship and item association relationship are retained to form the I variant. The experimental results of SlightGCN and its three variants on the two datasets are shown in Tables 4 and 5 (the bold numbers mean the best performance). Compared with LightGCN, the three variants (UI, U, and I) all improved to a certain extent, indicating that the user social relationship and item association relationship can improve recommendation performance by alleviating data sparsity. However, there is no significant difference in recommendation performance between variant UI, which contains two auxiliary relationships, and variants U and I, which contain only one, indicating that the user social relationship and item association relationship are not well integrated in these variants. SlightGCN removes the user social relationship and the item association relationship to construct different auxiliary views. Contrastive self-supervised learning maximizes the mutual information of the two views and promotes the integration of user social relationships and item association relationships. Accordingly, SlightGCN's recommendation performance is superior to that of variant UI, which does not include the self-supervised auxiliary tasks.
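The ablation settings can be summarized as a choice of which edge types enter the graph and whether the self-supervised tasks are active. The sketch below is illustrative only: the function `build_variant_edges`, the dictionary keys, and the toy edges are hypothetical, and it assumes that variants U and I, like UI, are trained without the self-supervised tasks, which the text does not state explicitly.

```python
def build_variant_edges(ui_edges, social_edges, item_edges, variant):
    """Pick the relation types kept by an ablation variant.

    Returns (edges_by_type, uses_ssl). Assumption: only the full model
    ("SlightGCN") uses the self-supervised auxiliary tasks.
    """
    if variant not in {"U", "I", "UI", "SlightGCN"}:
        raise ValueError(f"unknown variant: {variant}")
    edges = {"interaction": list(ui_edges)}  # always kept
    if variant in {"U", "UI", "SlightGCN"}:
        edges["social"] = list(social_edges)  # user-user edges
    if variant in {"I", "UI", "SlightGCN"}:
        edges["association"] = list(item_edges)  # item-item edges
    return edges, variant == "SlightGCN"

# Toy graph (made up): two users, two items.
ui = [("u1", "i1"), ("u2", "i2")]
social = [("u1", "u2")]
assoc = [("i1", "i2")]
u_edges, u_ssl = build_variant_edges(ui, social, assoc, "U")
full_edges, full_ssl = build_variant_edges(ui, social, assoc, "SlightGCN")
```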

Parameter Sensitivity Analysis
During training, the model is constrained by both the main recommendation task and the self-supervised auxiliary task. The self-supervised task assists the recommendation task in guiding the update direction of the model parameters. In the joint training loss function, the hyperparameter λ1 is the weight coefficient of the self-supervised auxiliary task loss and controls the influence of that task on the overall recommendation performance of the model. A larger λ1 makes the training process more dependent on the information from the self-supervised auxiliary task and less dependent on the information from the main recommendation task. The experiments on hyperparameter λ1 on the DoubanMovie and DoubanBook datasets are shown in Figures 5 and 6, respectively. In DoubanMovie, the overall trend of the values of the four metrics is consistent, increasing first and then decreasing with increasing λ1, with the recommendation performance reaching its peak at λ1 = 0.009. In DoubanBook, the overall trend of the four metrics is consistent, increasing with increasing λ1, with the recommendation performance reaching its peak at λ1 = 0.05. Two conclusions can be drawn from these observations:

• The optimal λ1 = 0.009 on DoubanMovie is far less than the optimal λ1 = 0.05 on DoubanBook, indicating that the model relies much more on self-supervised auxiliary tasks on DoubanBook than on DoubanMovie. The reason is that the user-item interaction density on DoubanBook is lower than that on DoubanMovie. To learn higher-quality node embeddings and improve recommendation performance, the model on DoubanBook needs to obtain more information from self-supervised auxiliary tasks.

• The improvement of the four metrics on DoubanBook, where the data are sparser, is much more significant than that on DoubanMovie, indicating the effectiveness of self-supervised auxiliary tasks in alleviating the data sparsity problem.
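The role of λ1 in the joint objective can be illustrated with a short sketch. It assumes the joint loss is a weighted sum of the main recommendation loss and the self-supervised loss; any regularization term is omitted, the function names are hypothetical, and the loss values are made up for illustration rather than taken from the experiments.

```python
def joint_loss(rec_loss, ssl_loss, lambda1):
    """Main recommendation loss plus the lambda1-weighted auxiliary loss."""
    return rec_loss + lambda1 * ssl_loss

def ssl_share(rec_loss, ssl_loss, lambda1):
    """Fraction of the joint loss contributed by the auxiliary task."""
    return lambda1 * ssl_loss / joint_loss(rec_loss, ssl_loss, lambda1)

# Illustrative loss values only (not measured results).
rec_loss, ssl_loss = 0.5, 2.0
share_movie = ssl_share(rec_loss, ssl_loss, 0.009)  # optimum reported on DoubanMovie
share_book = ssl_share(rec_loss, ssl_loss, 0.05)    # optimum reported on DoubanBook
```

For the same loss values, the larger λ1 gives the auxiliary task a larger share of the objective, which matches the reading that the model leans more on the self-supervised signal on the sparser dataset.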

Conclusions
In this paper, we propose a social recommendation model based on self-supervision and multi-auxiliary information (SlightGCN). The social relationships between users and the association relationships between items are mined from user-item interaction data and item attribute information. These user and item relationships serve as multiple sources of auxiliary information to alleviate data sparsity. The user-item interaction relationship, user social relationship, and item association relationship are integrated into the same heterogeneous network as edges. The heterogeneous network is used as the input of a graph convolutional neural network to carry out the main recommendation task, and an appropriate convolution mode is designed: user embeddings are simultaneously affected by the user social relationship and user-item interaction relationship, and item embeddings are simultaneously affected by the item association relationship and user-item interaction relationship. Auxiliary views are constructed according to different combinations of information, and multi-view self-supervised auxiliary tasks are designed to improve node embedding quality and recommendation performance. Extensive experiments were conducted on two public datasets to verify the superiority of SlightGCN in terms of recommendation performance and the effectiveness of multi-auxiliary information and self-supervised auxiliary tasks.
Many data augmentation methods for contrastive self-supervised learning are available for GNN-based methods, such as randomly removing edges or nodes, randomly adding edges or nodes, removing/adding edges or nodes on purpose, and even rewiring the graph. Further research is required to analyze in depth the effectiveness of these data augmentation methods within the framework of the proposed SlightGCN model.
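As an illustration of the simplest of these options, random edge removal can be sketched as below. This is a generic edge-dropout routine, not the view construction used by SlightGCN; the function name `drop_edges`, the drop rate, and the toy edge list are hypothetical.

```python
import random

def drop_edges(edges, drop_rate, seed=None):
    """Return a copy of `edges` in which each edge is independently
    removed with probability `drop_rate`."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so drop_rate=0 keeps everything
    # and drop_rate=1 removes everything.
    return [edge for edge in edges if rng.random() >= drop_rate]

# Toy user-item edges (made up).
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")]

# Two independently perturbed views of the same graph, as would be
# fed to a contrastive objective.
view_a = drop_edges(edges, drop_rate=0.25, seed=0)
view_b = drop_edges(edges, drop_rate=0.25, seed=1)
```

Node dropout or edge addition would follow the same pattern, with the sampling applied to nodes or to candidate non-edges instead.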