Popularity-Debiased Graph Self-Supervised for Recommendation

Abstract: The rise of graph neural networks has greatly contributed to the development of recommendation systems, and self-supervised learning has emerged as one of the most important approaches for addressing sparse interaction data. However, existing methods mostly focus on recommendation accuracy while neglecting the role of recommended-item diversity in enhancing user interest and merchant benefits. This is mainly due to the bias toward popular items, which causes long-tail items (which account for a large proportion of all items) to be neglected. How to mitigate the bias caused by item popularity has become a hot topic in current research. To address these problems, we propose Popularity-Debiased Graph Self-Supervised for Recommendation (PDGS). Specifically, we apply a penalty constraint on item popularity during data augmentation on the user-item interaction graph to eliminate the inherent popularity bias. We generate item similarity graphs with the popularity bias removed to construct a self-supervised learning task under multiple views, and we design model optimization strategies from the perspectives of popular items and long-tail items to generate recommendation lists. We conduct extensive comparison and ablation experiments on three public datasets to verify the effectiveness and superiority of the model in balancing recommendation accuracy and diversity.


Introduction
In recent years, recommendation systems (RS) have emerged to mitigate the effect of information overload. Owing to their advantages in improving platform effectiveness and user satisfaction, recommendation systems are widely applied in various industries. Recommendation systems aim to mine user preferences from the observed interaction data and provide personalized services or items to users.
With the rise of deep learning, researchers have attempted to model the representations of users and items in recommendation data with graph neural networks. However, these approaches rely heavily on sufficient interaction data [1,2], making them inadequate for addressing data sparsity, noise, and related issues in recommendation. Self-supervised learning techniques have been successfully applied in the field of recommendation systems and have proved effective in alleviating data sparsity. Although these methods have shown significant improvements in recommendation accuracy, they have not fully addressed the inherent popularity bias in data augmentation.
In addition, most models focus on recommendation accuracy while neglecting the novelty and diversity of recommended items. At the same time, the unbalanced nature of the observed data makes recommendation systems vulnerable to popularity bias [3]. Specifically, in the observed user-item interactions, users tend to choose items with high popularity, so the absence of click data does not necessarily imply negative feedback from users [4]. As a result, models tend to recommend popular items to achieve higher recommendation accuracy, which results in a loss of recommendation diversity. This affects not only users' personalized experience but also the potential revenue of item providers, and thus fails to meet the requirements of personalized recommendation. These problems motivate us to balance overall recommendation accuracy against debiased recommendation while addressing the problems caused by data sparsity and interaction imbalance.
There is a significant body of literature on popularity bias, which can be categorized into three main types [5]: (1) Data-level methods, such as inverse propensity scoring, adjust the data distribution by reducing the weights of popular items during training [6]. (2) Loss-function-level methods balance popular and long-tail items in the recommendation results by augmenting the loss function [7]. (3) Model-level methods, such as causal inference approaches, utilize counterfactual reasoning to predict user interactions [8]. Although these methods are effective, we argue that existing popularity-debiasing methods mostly ignore the high sparsity of user behavioral data, which hinders the representation capability of the encoder.
To address these challenges, we propose PDGS, a popularity-debiased graph self-supervised recommendation algorithm. Specifically, we first define penalty weights for popular and long-tail items based on their popularity, calculate the similarity between items after removing popularity bias, and then construct a popularity-bias-free item similarity graph. We then utilize this graph as an augmented view of the user-item collaboration graph for contrastive learning, thereby alleviating the scarcity of labeled data. Finally, we construct optimization functions from the perspectives of popular and long-tail items to increase the exposure of long-tail items. In summary, our work makes the following contributions:
(1) We propose a popularity-debiased graph self-supervised recommendation model (PDGS). We design penalty constraints for items based on their popularity and use them to construct a popularity-debiased item similarity graph. This graph serves as an augmented view that participates in contrastive learning with the collaborative graph, which compensates for long-tail items being rarely or never recommended due to limited exposure.
(2) We improve the supervised recommendation task by considering both popular items and long-tail items, and we optimize the self-supervised learning task and the recommendation task through multitask joint training. This enables end-to-end training of the model, alleviating data sparsity while reducing the impact of popularity bias on model learning, thereby improving recommendation diversity and enhancing user experience.
(3) We validate the effectiveness of our model through comparative and ablation experiments on three real-world datasets.

Related Work
In this section, we review the work relevant to our paper, focusing on two aspects: popularity bias in recommendation systems and self-supervised learning.

Popularity Bias for Recommendation
Due to the greater attention given to popular items, recommendation systems tend to assign higher rankings to them, leading to the problem of popularity bias. On the one hand, recommendation systems that ignore popularity and focus solely on data fitting hinder the accuracy, diversity, and novelty of results. This reduces users' chances of discovering niche products, diminishes user experience, and can result in decreased benefits for service providers [9]. On the other hand, popularity bias leads to the Matthew effect, in which many low-popularity items are left unattended while the market is occupied by a few high-popularity items [10][11][12], resulting in the homogenization of different user groups [13]. Compared with recommending popular items, however, it is more meaningful to recommend more diverse long-tail items to users. Therefore, research on mitigating popularity bias is necessary. Currently, there are several approaches to alleviating popularity bias in recommendation systems:
(1) Ranking adjustment. This approach aims to improve the recommendation scores of unpopular items to achieve more balanced recommendations. For example, IPL [14] introduced a regularization-based debiasing model that uses the proportional relationship between the interactions of popular and unpopular items and the number of users who like them to obtain unbiased recommendations.
(2) Causal inference. This approach analyzes the variables affected by popularity bias during the recommendation process by means of causal graphs or probabilistic derivation so as to perform debiasing operations. For instance, DCCL [15] disentangled the cause of clicks into interest and conformity. It directly learned decoupled causal embeddings of users and items from the historical click data, resulting in final recommendations that take into account both user interest and conformity.
(3) Popularity penalty. This approach penalizes popularity when measuring the similarity between different items for recommendation. For example, Zhang et al. [16] proposed a debiasing method based on popularity and dynamic interest changes. It defined a popularity penalty function based on the difference in popularity between items to alleviate the high similarity issue of popular items. Additionally, a time-decay function was defined based on users' behavior characteristics at different times to eliminate popularity bias in historical data.

Self-Supervised Learning for Recommendation
Self-supervised learning [17] is an emerging paradigm in machine learning. With self-supervised learning, models make full use of relevant information in the data to assist their main task. One branch of self-supervised learning is mutual information maximization [18][19][20], which has achieved significant progress in computer vision [21], audio processing [22,23], natural language understanding [24], and other areas. Several existing works integrate self-supervised learning with recommendation systems. For example, Zhou et al. [25] leveraged the relevance of contextual information as self-supervision signals in sequential recommendation to maximize the mutual information between attribute, item, and sequence views. Ma et al. [26] maximized the mutual information among items in different temporal sequences. Zou et al. [27] comprehensively considered the semantic and structural relationships between nodes to generate multiple views and proposed a multilevel cross-view contrastive learning mechanism, which achieved local-level contrastive learning between collaborative views and semantic views, as well as global-level contrastive learning between global views and local views.

Preliminaries
We first introduce the concepts used in this paper and provide corresponding explanations.
User-Item Graph: Given the sets of $M$ users and $N$ items, defined as $U = \{u_1, \dots, u_M\}$ and $I = \{i_1, \dots, i_N\}$, respectively, the interactions between users and items are denoted as $Y \in \mathbb{R}^{M \times N}$. Here, $y_{ui} = 1$ indicates that there is an interaction (e.g., click, purchase, favorite) between user $u$ and item $i$, and $y_{ui} = 0$ otherwise. The user-item interaction graph is therefore defined as $G_r = \{(u, y_{ui}, i) \mid u \in U, i \in I, y_{ui} \in Y\}$.
Popularity-Debiased Item Similarity Graph: It is represented as $G_c = \{(i, S(i, j), j) \mid i, j \in I\}$, where $i$ and $j$ are item nodes belonging to $G_c$, and the set of item nodes is denoted as $N_c$. If there is a strong correlation between item $i$ and item $j$, we set $S(i, j) = 1$, indicating that there is an edge between item $i$ and item $j$ in graph $G_c$.
Item Popularity: This reflects the overall popularity of items [28], which is often defined as a specific value in research, such as the number of clicks, comments, or other user-related data regarding items.

The Proposed Methodology
In this section, we introduce the detailed technical design of our proposed PDGS, as illustrated in Figure 1. First, we construct the inputs for the model: the user-item interaction graph (collaborative graph) $G_r$, built from user-item interaction data, and the item similarity graph $G_c$, obtained by removing the items' popularity bias. We employ the classical graph neural network LightGCN [29] to learn the node representations, yielding collaborative item embeddings $e_i^r$ and popularity-debiased item embeddings $e_i^c$. Then, we leverage the self-supervision signals provided by the interactions between multiple views to construct contrastive learning tasks between the views. Finally, we jointly optimize the popularity-debiased self-supervised learning task and the recommendation task, creating an end-to-end optimization strategy.

Popularity-Debiased Item Similarity Graph
User-item interaction behaviors are often influenced by popular items. Over time, users lose their sense of novelty toward uninteracted items, resulting in user churn and incalculable losses. To extract real user preferences and mitigate the adverse effects of data imbalance when modeling the user-item interaction graph $G_r$, in this section, the item popularity and the popularity difference between items are used to penalize the popularity of the items that users have interacted with. Additionally, a sampling strategy is designed on the basis of the penalized similarity to generate the popularity-debiased item similarity graph $G_c$. The remainder of this section describes the construction of graph $G_c$ in detail.
Based on the definition of item popularity, its formal expression is denoted as $Pop_i = |numU_i|$. For computational convenience, the item popularity is normalized so that $f(Pop_i) \in (0, 1)$. The normalization is calculated as follows:

$$nor\_Pop_i = \frac{Pop_i - min\_Pop}{max\_Pop - min\_Pop} \quad (1)$$

where $nor\_Pop_i$ represents the normalized popularity of item $i$, and $min\_Pop$ and $max\_Pop$ denote the minimum and the maximum value of item popularity, respectively. Then, the popularity difference $Pop\_Bias_{i,j}$ between item $i$ and item $j$ is calculated by

$$Pop\_Bias_{i,j} = |nor\_Pop_i - nor\_Pop_j| \quad (2)$$

The smaller the popularity difference $Pop\_Bias_{i,j}$, the more similar the popularity of item $i$ and item $j$, indicating a higher probability of co-occurrence in the candidate recommendation list; conversely, the larger the difference, the smaller the co-occurrence probability. Therefore, under the influence of item popularity, the recommendation results are likely not to match the real interests of users. Based on this, in order to generate effective data augmentation views for the user-item interaction graph $G_r$ and provide self-supervision signals for learning users' more genuine preferences, the penalty weights for item similarity should be set with regard to both item popularity and the popularity differences between items.
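The popularity statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that item popularity is the number of distinct users who interacted with an item, that the normalization is plain min-max scaling, and that the popularity difference is the absolute difference of normalized values. The function names and the toy interaction list are hypothetical.

```python
def item_popularity(interactions):
    """Count distinct users per item from (user, item) pairs (assumed definition of Pop_i)."""
    users_per_item = {}
    for user, item in interactions:
        users_per_item.setdefault(item, set()).add(user)
    return {item: len(users) for item, users in users_per_item.items()}


def normalize_popularity(pop):
    """Min-max normalize popularity values (assumed form of the normalization)."""
    lo, hi = min(pop.values()), max(pop.values())
    span = hi - lo
    if span == 0:
        # All items are equally popular; avoid division by zero.
        return {item: 0.0 for item in pop}
    return {item: (p - lo) / span for item, p in pop.items()}


def popularity_difference(nor_pop, i, j):
    """Absolute difference of normalized popularity (assumed form of Pop_Bias)."""
    return abs(nor_pop[i] - nor_pop[j])


# Toy data: item "a" is popular, "c" is a long-tail item.
interactions = [(0, "a"), (1, "a"), (2, "a"), (3, "a"), (0, "b"), (1, "b"), (0, "c")]
pop = item_popularity(interactions)
nor_pop = normalize_popularity(pop)
bias_ac = popularity_difference(nor_pop, "a", "c")
```

Note that min-max scaling maps the least and most popular items to exactly 0 and 1; if an open interval is required, a small smoothing constant would be one way to obtain it.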
Statistical analyses of item popularity in recommendation data show an "80-20 rule" between item popularity and the number of interactions. That is, in the interaction data of recommendation systems, a few popular items (approximately 20%) typically account for a large portion of sales (approximately 80%), while the majority of long-tail items (approximately 80%) tend to remain unknown due to low exposure, resulting in lower sales. This concept originated in the field of economics. In this setting, items with high popularity contribute less to similarity. Therefore, a certain penalty is applied when calculating the similarity of popular items. Additionally, because items with smaller popularity differences have a higher probability of co-occurrence, a corresponding penalty is applied when calculating their similarity. On the basis of this analysis, penalty weights $w_i$ and $w_j$ are defined for item $i$ and item $j$, respectively, and applied when calculating the similarity between the two items. Here, $\alpha$ is the popularity threshold set based on the 80-20 rule; specifically, the minimum item popularity among the top 20% most popular items is set as the popularity threshold. The penalty weights are introduced into the calculation of item similarity based on the Pearson correlation coefficient to compute the similarity score between two items, where $N_i$ and $N_j$ represent the sets of users who interacted with item $i$ and item $j$, respectively, $N_i \cap N_j$ represents the users who have interacted with both item $i$ and item $j$, and $\bar{y}_i$ and $\bar{y}_j$ indicate the mean ratings of item $i$ and item $j$, respectively. Finally, the $k_c$ items with the highest relevance are kept for each target item node to construct the popularity-debiased item similarity graph $G_c = \{(i, S(i, j), j) \mid i, j \in I\}$. In this graph, an edge $(i, j)$ denotes a high similarity between item $i$ and item $j$, even after applying penalties to their popularity.
The purpose of this item node sampling strategy is to ensure that the selected nodes have a strong correlation with the target nodes and that this correlation is independent of popularity. Additionally, the strategy avoids the adverse effects of randomly selecting long-tail items. The sampling strategy takes user-item interaction data as input and generates a popularity-debiased item similarity graph as output. To better describe the debiased sampling strategy, its pseudocode is presented as Algorithm 1.
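The sampling strategy above can be illustrated with a small sketch. Several parts are assumptions made only for illustration: the function `penalty_weight` is a hypothetical stand-in for the paper's penalty weights (it simply down-weights items whose normalized popularity reaches the threshold `alpha`), the Pearson correlation is computed over full binary interaction vectors rather than over co-rating users, and only positively correlated neighbours are kept.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def penalty_weight(nor_pop_i, alpha):
    """Hypothetical penalty: items at or above the threshold alpha are down-weighted."""
    return 1.0 / (1.0 + nor_pop_i) if nor_pop_i >= alpha else 1.0


def top_k_similarity_graph(columns, nor_pop, alpha, k):
    """For each item, keep the k most similar items after the popularity penalty."""
    items = list(columns)
    graph = {}
    for i in items:
        scored = []
        for j in items:
            if j == i:
                continue
            w = penalty_weight(nor_pop[i], alpha) * penalty_weight(nor_pop[j], alpha)
            scored.append((w * pearson(columns[i], columns[j]), j))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep only positively correlated neighbours among the top k.
        graph[i] = [j for score, j in scored[:k] if score > 0]
    return graph


# Toy data: each column is an item's binary interaction vector over four users.
columns = {"a": [1, 1, 1, 0], "b": [1, 1, 0, 0], "c": [1, 1, 0, 0], "d": [0, 0, 1, 1]}
nor_pop = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
graph = top_k_similarity_graph(columns, nor_pop, alpha=0.5, k=1)
```

In this toy example, item "b" is linked to its identical long-tail neighbour "c" rather than to the popular item "a", because the similarity to "a" is halved by the penalty.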

Feature Extraction of Items from Multiple Views
To explore and extract more comprehensive features from item nodes, graph encoders are built for the collaborative graph $G_r$ and the popularity-debiased item similarity graph $G_c$, respectively. The general form of the graph encoder is defined as follows:

$$E^{(l)} = H(E^{(l-1)}, G) \quad (6)$$

where $H$ denotes the graph encoder for information aggregation, and $G \in \{G_r, G_c\}$. We define $H_r$ and $H_c$ as the graph encoders for graph $G_r$ and graph $G_c$, respectively. $E^{(l)}$ and $E^{(l-1)}$ are the node embeddings in the $l$-th and $(l-1)$-th layers, respectively. When $l = 0$, $E^{(0)}$ represents the initial node embeddings.
Taking $H_c$ as an example, the embedding of a specific node in the $l$-th layer of $H_c$ is denoted as $e_i^{c(l)}$, which is obtained by aggregating the $(l-1)$-th layer embeddings of its neighboring nodes:

$$e_i^{c(l)} = \sum_{j \in N_i^c} \frac{1}{\sqrt{|N_i^c|}\sqrt{|N_j^c|}} e_j^{c(l-1)} \quad (7)$$

where $N_i^c$ and $N_j^c$ denote the sets of neighbors of item $i$ and item $j$ in the popularity-debiased item similarity graph $G_c$, respectively. Then, by stacking information aggregation layers, the representations of each layer, denoted as $e_i^{c(1)}, \dots, e_i^{c(L)}$, are obtained by iteratively applying the aggregation Equation (7) to the initial embedding $e_i^{c(0)}$. Finally, the embedding $e_i^c$ for item $i$ is obtained by weighted summation:

$$e_i^c = \sum_{l=0}^{L} \alpha_l e_i^{c(l)} \quad (8)$$

where $L$ denotes the total number of information aggregation layers in the graph neural network, and $\alpha_l$ represents the weight of the embedding in the $l$-th layer. In our experiments, we set $\alpha_l = 1/(L + 1)$.
Similarly, the collaborative embedding of item $i$, denoted as $e_i^r$, is obtained by $H_r$ from the collaborative graph $G_r$. In addition, $H_r$ generates the user embedding $e_u^r$ required for the recommendation task, where $e_u^r, e_i^r, e_i^c \in \mathbb{R}^d$.
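The layer-wise aggregation and the final weighted summation can be sketched as follows. This is a minimal pure-Python illustration of LightGCN-style propagation with symmetric degree normalization and uniform layer weights 1/(L + 1); the dictionary-based graph representation, the function name, and the toy embeddings are assumptions made for readability, not the authors' code.

```python
import math


def lightgcn_propagate(neighbors, init_emb, num_layers):
    """Propagate embeddings over an undirected graph and average all layers.

    neighbors: dict mapping each node to the list of its neighbour nodes.
    init_emb:  dict mapping each node to its layer-0 embedding (list of floats).
    """
    layers = [init_emb]
    for _ in range(num_layers):
        prev = layers[-1]
        current = {}
        for i, nbrs in neighbors.items():
            dim = len(prev[i])
            agg = [0.0] * dim
            for j in nbrs:
                # Symmetric normalization 1 / (sqrt(|N_i|) * sqrt(|N_j|)).
                norm = 1.0 / math.sqrt(len(neighbors[i]) * len(neighbors[j]))
                for d in range(dim):
                    agg[d] += norm * prev[j][d]
            current[i] = agg
        layers.append(current)
    # Uniform layer weights alpha_l = 1 / (L + 1).
    weight = 1.0 / (num_layers + 1)
    return {
        i: [weight * sum(layer[i][d] for layer in layers) for d in range(len(init_emb[i]))]
        for i in init_emb
    }


# Toy item similarity graph: a - b - c.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
init_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
final_emb = lightgcn_propagate(neighbors, init_emb, num_layers=1)
```

The same routine applies to the collaborative graph if user and item nodes are placed in one neighbour dictionary.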

Constructing Self-Supervised Learning Tasks Based on Multiple Views
Embeddings from different views capture information about different aspects of an item. Accordingly, each view can learn self-supervised signals from the other view to complement its own supervision information. Based on this idea, a self-supervised learning task between multiple views is constructed.
To generate extra self-supervised signals, we first create data augmentation views by applying edge dropout, a widely used graph data augmentation technique, to both graph $G_r$ and graph $G_c$. Specifically, during each iteration of the aggregation process in LightGCN, we randomly drop edges from graphs $G_r$ and $G_c$ with a certain probability $\rho$, thereby constructing data augmentation views and building an unlabeled sample set $\tilde{E}$.
$\rho$ is a tunable hyperparameter. Edge dropout makes it easier for the model to identify influential nodes in the augmented view and reduces the sensitivity of the node representations to structural changes. We use $H_c$ to learn the embedding $e_i^c$ of item $i$ from the popularity-debiased item similarity graph $G_c$, which contains information about items that are similar to it after removing popularity bias. Consequently, $e_i^c$ can provide supplementary information for the embedding of item $i$ in the collaborative graph $G_r$. We can use it to predict the self-supervised signal for item $i$ in the bias-removed item similarity graph $G_c$, aiming to reduce the influence of item popularity in user-item interactions. The probability $y^r_{i+}$ of the self-supervised signal in the popularity-debiased item similarity graph is calculated from the node embeddings in the unlabeled sample set $\tilde{E}$, where $\langle \cdot \rangle$ indicates the inner product, and $\sigma$ is the Softmax function. To generate self-supervised signals that better align with the user's true interests, we select the Top-K items with the highest scores from the unlabeled sample set as the self-supervised signal set $P^r_{i+}$ for the collaborative graph $G_r$. Similarly, the self-supervised signal set $P^c_{i+}$ of the popularity-debiased item similarity graph $G_c$ is obtained. Finally, the model maximizes the similarity between the embeddings of items in different views and their corresponding self-supervised signals, while minimizing the similarity between item embeddings and the remaining items in the unlabeled sample set. The unlabeled sample set is generated from the data augmentation views created from the views themselves. This is achieved by maximizing the mutual information between item node embeddings and self-supervised signals. Based on this, the self-supervised task loss function is constructed with the following similarity term:

$$\psi(e_i^v, \tilde{e}_p) = \exp(\cos(e_i^v, \tilde{e}_p)/\tau) \quad (16)$$

where $v \in \{r, c\}$ denotes the view set, $r$ indicates the user-item interaction graph $G_r$, and $c$ indicates the popularity-debiased item similarity graph $G_c$. $e_i^v$ represents the item embedding of view $v$, $P^v_{i+}$ denotes the self-supervised signal set generated by the prediction of view $v$, $\tilde{e}_p$ indicates the item embedding from the self-supervised signal set, and $j \in I \setminus P^v_{i+}$ denotes the other item embeddings in the unlabeled sample set.
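The pieces of the self-supervised task described above (edge dropout, Top-K signal selection, and the similarity term ψ) can be combined in a short sketch. The loss below is an InfoNCE-style objective with multiple positives, chosen here as an assumption because it matches the verbal description; it is not necessarily the exact loss used by PDGS. All function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math
import random


def edge_dropout(edges, rho, rng):
    """Keep each edge independently with probability 1 - rho."""
    return [edge for edge in edges if rng.random() >= rho]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def psi(a, b, tau):
    """Similarity term exp(cos(a, b) / tau)."""
    return math.exp(cosine(a, b) / tau)


def top_k_signals(anchor, candidates, k):
    """Softmax over inner products with the anchor, then keep the k most probable items."""
    scores = {j: sum(x * y for x, y in zip(anchor, e)) for j, e in candidates.items()}
    shift = max(scores.values())  # subtract the maximum for numerical stability
    total = sum(math.exp(s - shift) for s in scores.values())
    probs = {j: math.exp(s - shift) / total for j, s in scores.items()}
    return sorted(probs, key=probs.get, reverse=True)[:k]


def contrastive_loss(anchor, candidates, positives, tau):
    """Assumed InfoNCE-style loss: positives against all unlabeled samples."""
    pos = sum(psi(anchor, candidates[p], tau) for p in positives)
    denom = sum(psi(anchor, e, tau) for e in candidates.values())
    return -math.log(pos / denom)


rng = random.Random(0)
kept_edges = edge_dropout([("a", "b"), ("b", "c"), ("c", "d")], rho=0.5, rng=rng)

# The anchor comes from one view; the candidates play the role of the unlabeled sample set.
anchor = [1.0, 0.0]
candidates = {"x": [1.0, 0.1], "y": [0.9, 0.0], "z": [-1.0, 0.0], "w": [0.0, 1.0]}
signals = top_k_signals(anchor, candidates, k=2)
loss = contrastive_loss(anchor, candidates, signals, tau=0.2)
```

In the cross-view setting, the signals predicted from one view would supervise the item embedding of the other view; the sketch uses a single anchor only to keep the example small.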

Popularity-Aware Multitask Learning Strategy
The objective of PDGS is to predict the preference score of each user for candidate items. We employ the inner product between the user embedding and the candidate item embedding as the prediction function. During model training, the widely used BPR method is influenced by item popularity when conducting negative sampling. Specifically, long-tail items have a higher probability of being included as negative samples during training, so the model is more likely to learn user preferences toward popular items. This phenomenon makes popular items more and more popular, resulting in serious issues such as the Matthew effect [30,31], echo chambers [32], and filter bubbles [33,34]. Therefore, PDGS divides items into popular and long-tail sets using the popularity threshold $\alpha$. Then, we construct popularity-aware BPR optimization objectives from the perspectives of both popular and long-tail items. By integrating these two objectives, the final optimization function $L_r$ for the recommendation task is obtained, where $\sigma$ is the Sigmoid function, $O_p = \{(u, i, j) \mid (u, i) \in O_p^+, (u, j) \in O_p^-\}$ indicates that both the positive sample $i$ and the negative sample $j$ are from the popular item set, while $O_{up} = \{(u, i, j) \mid (u, i) \in O_{up}^+, (u, j) \in O_{up}^-\}$ indicates that both the positive sample $i$ and the negative sample $j$ are from the long-tail item set.
Finally, to leverage self-supervised learning to improve recommendation diversity, PDGS adopts a joint strategy that simultaneously trains the recommendation task and the self-supervised learning task. The model's loss function is computed as follows:

$$L = L_r + \beta L_{ssl} + \lambda \|\Theta\|_2^2$$

where $L_r$ and $L_{ssl}$ are the loss functions of the recommendation task and the self-supervised task, respectively. $\beta$ and $\lambda$ are hyperparameters that control the scale of self-supervised learning and the strength of regularization, respectively. $\Theta = \{e_u^r, e_i^r, e_i^c\}$ represents the parameters to be learned by the model.
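A minimal sketch of the popularity-aware BPR objective follows. It assumes that the two group-wise objectives are simply summed with equal weight and that each pairwise term is the standard BPR term -ln σ(ŷ_ui - ŷ_uj); both choices are assumptions, and the function names and toy embeddings are hypothetical.

```python
import math


def score(user_vec, item_vec):
    """Inner-product preference score."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def bpr_term(pos_score, neg_score):
    """-ln(sigmoid(pos - neg)), written as log(1 + exp(-(pos - neg)))."""
    return math.log1p(math.exp(-(pos_score - neg_score)))


def popularity_aware_bpr(triples, user_emb, item_emb, popular_items):
    """Sum BPR terms over triples whose positive and negative items share a popularity group."""
    loss_popular, loss_tail = 0.0, 0.0
    for u, i, j in triples:
        term = bpr_term(score(user_emb[u], item_emb[i]), score(user_emb[u], item_emb[j]))
        if i in popular_items and j in popular_items:
            loss_popular += term
        elif i not in popular_items and j not in popular_items:
            loss_tail += term
        # Mixed triples are skipped: negatives are meant to be sampled within the same group.
    return loss_popular + loss_tail


user_emb = {"u": [1.0, 0.0]}
item_emb = {"p1": [1.0, 0.0], "p2": [0.0, 0.0], "t1": [0.5, 0.0], "t2": [0.0, 0.0]}
popular_items = {"p1", "p2"}
triples = [("u", "p1", "p2"), ("u", "t1", "t2"), ("u", "p1", "t2")]
rec_loss = popularity_aware_bpr(triples, user_emb, item_emb, popular_items)
```

Because a long-tail positive is only ever compared with a long-tail negative, a popular item can no longer win a pairwise comparison against a long-tail item purely by virtue of its popularity.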

Complexity of PDGS
The time complexity of PDGS mainly comes from the graph encoder, self-supervised signal prediction, and multiview self-supervised learning. First, the time complexity of a graph encoder can be represented as $O(|G|d)$, where $|G|$ represents the scale of the graph structure learned by the graph encoder, and $d$ represents the embedding dimension of entities in the model. Since PDGS consists of two graph encoders used for learning the entity embeddings of the user-item interaction graph $G_r$ and the popularity-debiased item similarity graph $G_c$, the time complexity of the graph encoders is $O((|G_r| + |G_c|)d)$. Second, the time complexity of self-supervised signal prediction is $O(x \log(K))$, where $x$ denotes the number of randomly selected unlabeled samples in each training batch, and $K$ represents the number of self-supervised signals to be predicted from the unlabeled sample set. Finally, as the model uses a shared graph encoder for the joint optimization of self-supervised learning and the recommendation task, the time complexity of the multiview self-supervised learning task mainly comes from the contrastive learning between the self-supervised signals of the views and the item entities. This part of the time complexity can be represented as $O(xd)$.

Experiment
In this section, we conduct extensive experiments on three real datasets to evaluate the performance of our proposed model. Our experiments aim to answer the following research questions:
RQ1: Does the model outperform existing baseline methods?
RQ2: How do the different components in our framework improve performance?
RQ3: How do different hyperparameter settings affect recommendation performance?

Dataset Description
To validate the effectiveness of the model, its performance is evaluated on three datasets from diverse domains with varying scales and sparsity. MovieLens-1M is a movie recommendation dataset that includes user ratings for movies on a scale of 1-5. Book-Crossing is a book recommendation dataset that includes user ratings for books on a scale of 1-10. Although both of these datasets contain explicit feedback, we intentionally chose them to study the performance of learning from implicit feedback. To achieve this, we transform them into implicit data, where each user-item pair is labeled as 1 or 0, indicating whether the user has rated the item or not. Last-FM is a music recommendation dataset that contains users' one-year listening history on the Last.fm website. Table 1 summarizes the statistical information of these datasets.
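The conversion from explicit ratings to implicit feedback can be sketched as below. The record layout (user, item, rating) and the function names are assumptions for illustration; the only behaviour taken from the text is that a pair is labeled 1 if the user rated the item at all and 0 otherwise.

```python
def to_implicit(ratings):
    """Turn (user, item, rating) records into the set of observed (user, item) pairs."""
    return {(user, item) for user, item, _rating in ratings}


def implicit_label(observed, user, item):
    """Return 1 if the user rated the item (regardless of the rating value), else 0."""
    return 1 if (user, item) in observed else 0


ratings = [("u1", "m1", 5), ("u1", "m2", 1), ("u2", "m1", 3)]
observed = to_implicit(ratings)
labels = [implicit_label(observed, "u1", "m2"), implicit_label(observed, "u2", "m2")]
```

Note that a low rating such as 1 is still mapped to the label 1 under this scheme, since only the presence of a rating is used.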

Evaluation Protocol
To evaluate the accuracy of the model, common metrics such as Recall and Normalized Discounted Cumulative Gain (NDCG) are used.Additionally, to verify the impact of reducing the popularity bias on the model, evaluation metrics that measure the diversity of the model's recommendation results are used: Coverage and Novelty.
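For reference, the two accuracy metrics can be computed per user as in the following sketch, which uses the common binary-relevance definitions of Recall@K and NDCG@K. The paper does not spell out its exact variants, so these definitions and the function names are assumptions.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of a user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # recommendation list for one user
relevant = {"a", "c", "x"}      # held-out items of that user
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Dataset-level scores are then typically obtained by averaging the per-user values.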
Coverage (Cov@K) [35] is used to measure the proportion of the item space covered by the recommendation results. Its expression is given by

$$Cov@K = \frac{\left|\bigcup_{u \in U} R_u@K\right|}{|I|}$$

where $U$ is the user set, $I$ is the item set, and $R_u@K$ denotes the Top-K recommendation list of user $u$. Novelty (Tail@K) [36] is used to measure the percentage of long-tail items in the Top-K recommendation lists of all users, and its expression is as follows:

$$Tail@K = \frac{1}{|U|} \sum_{u \in U} \frac{|R_u@K \cap I_{up}|}{K}$$

where $U$ is the user set, $I_{up}$ is the long-tail item set, and $R_u@K$ denotes the Top-K recommendation list of user $u$.
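The two diversity metrics can be sketched directly from their verbal definitions: coverage as the share of the item space that appears in at least one Top-K list, and novelty as the share of long-tail items across all Top-K lists. The exact normalization is an assumption read from those definitions, and the function names and toy lists are hypothetical.

```python
def coverage_at_k(rec_lists, all_items, k):
    """Share of the item space appearing in at least one user's top-k list."""
    covered = set()
    for items in rec_lists.values():
        covered.update(items[:k])
    return len(covered) / len(all_items)


def tail_at_k(rec_lists, tail_items, k):
    """Share of long-tail items among all top-k recommendations."""
    tail_hits = sum(
        1 for items in rec_lists.values() for item in items[:k] if item in tail_items
    )
    return tail_hits / (len(rec_lists) * k)


rec_lists = {"u1": ["a", "b"], "u2": ["a", "c"]}
all_items = {"a", "b", "c", "d"}
tail_items = {"c", "d"}
cov = coverage_at_k(rec_lists, all_items, k=2)
tail = tail_at_k(rec_lists, tail_items, k=2)
```

Here three of the four items are recommended to someone (coverage 0.75), and one of the four recommended slots holds a long-tail item (novelty 0.25).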

Baselines
To validate the effectiveness of the proposed model PDGS, we compare PDGS with the following baselines in our experiments:
- NeuMF [37]: It combines Generalized Matrix Factorization and a Multi-Layer Perceptron to extract low-dimensional and high-dimensional features simultaneously.
- NGCF [38]: It utilizes graph neural networks to model high-order connectivity information and capture collaborative information between nodes.
- LightGCN [29]: It designs a lightweight graph convolution operation that greatly simplifies model design while retaining the most important components of GCN for recommendation.
- SGL [39]: It designs multiple data augmentation methods to construct a contrastive learning task that learns node representations based on the idea of mutual information maximization.
- MCCLK [27]: It generates global-, local-, and semantic-level contrastive views, constructs contrastive learning tasks, and explores comprehensive graph features and structural information in a self-supervised manner.

Performance Comparison with Baselines
For the proposed model PDGS, we use the Adam optimizer to train the model. For each dataset, 80% is randomly selected as the training set, and the remaining 20% is divided equally into the validation set and the test set. We use 5-fold cross-validation with Recall@20 as the validation metric. The detailed hyperparameter settings of the models can be found in Table 2. Figure 2 illustrates the recommendation performance of the compared models on the Top-10 and Top-20 recommendation tasks in terms of the four evaluation metrics. Overall, the proposed PDGS model outperforms all baselines in terms of the evaluation metrics across all three datasets. The detailed improvements in model performance are shown in Table 3.
(2) In terms of recommendation accuracy, PDGS also compares favorably with all baselines, indicating that the PDGS algorithm does not sacrifice accuracy for the diversity of recommended items. From a practical perspective, solely improving the diversity of item recommendations without considering recommendation accuracy defeats the purpose of personalized recommendation. Our proposed PDGS model effectively balances recommendation accuracy and diversity, fully explores the uninteracted items related to users' interests, and improves the performance of the recommendation model as a whole. (3) Compared with the self-supervised recommendation models SGL and MCCLK, our proposed PDGS achieves the best performance on the recommendation diversity metrics. This demonstrates the value of augmenting the user-item interaction graph from the perspective of popularity-debiased item similarity, which takes user preferences into account while eliminating the influence of popularity bias. It makes the generated self-supervised signals more consistent with the real situation and allows more long-tail items to be covered in the recommendation lists, thus effectively reducing the popularity bias that exists in the original user-item historical interaction data.

Ablation Study of PDGS
To further validate the effectiveness of the model components in improving recommendation performance, we designed two variants of PDGS. The first, denoted as PDGS-NC, replaces the popularity-debiased item graph with a plain item similarity graph that participates in contrastive learning with the collaborative graph. The second, denoted as PDGS-BPR, replaces the loss function designed for popular items and long-tail items with the traditional BPR loss function. The hyperparameter settings of the variant models remain consistent with PDGS. The results of the ablation experiments on the three datasets are shown in Table 4, with the best performance indicated in bold.
From the experimental results in Table 4, it can be observed that PDGS-NC exhibits a decrease in both the Cov@10 and Tail@10 metrics compared with PDGS. This indicates that learning from data-augmented views without the popularity restriction in the self-supervised learning task leads the model to pay less attention to long-tail items. Consequently, the representation learning of long-tail items and the generation of self-supervised signals that are closer to real samples are restricted, resulting in poorer performance in terms of coverage (Cov@10) and novelty (Tail@10).

Table 4. Ablation results on the three datasets (columns: Dataset, Metric, PDGS-NC, PDGS-BPR, PDGS).
Furthermore, comparing PDGS with PDGS-BPR shows that when the recommendation task loss is replaced with the traditional BPR loss function, the accuracy metrics (Recall@10 and NDCG@10) improve somewhat, but the diversity metrics (Cov@10 and Tail@10) decrease significantly. This is because the BPR loss function tends to focus on popular items that have been interacted with more frequently in the user's history. The model learns more about the items that the user has previously engaged with, which improves recommendation accuracy. However, the interference of popularity bias results in lower coverage (Cov@10) and novelty (Tail@10). The comparative results show that directly using the BPR loss function exacerbates the influence of popularity bias. In contrast, the recommendation task loss function that considers both popular and long-tail items alleviates this issue by balancing the impact of popularity and users' true interests in item selection.

To investigate the effect of β on model performance, we fixed the other two parameters: L = 2 on all three datasets, and the number of self-supervised signals K = 15, 30, and 40 on the three datasets, respectively. Figure 3 shows that as β increases, the accuracy metrics, Recall and NDCG, initially fluctuate only slightly. However, when β > 0.01, the model's performance exhibits a decreasing trend on both the MovieLens-1M and Last-FM datasets. On the Book-Crossing dataset, the decreasing trend begins at β > 0.005. Meanwhile, as β gradually increases, the diversity metrics, Cov and Tail, initially increase and then decrease across all three datasets. Overall, when β is small, the self-supervised task, serving as an auxiliary task, improves recommendation diversity without significantly affecting recommendation accuracy. However, as β continues to increase, the model's performance is affected to varying degrees in terms of both accuracy and diversity. Taking all factors into consideration, we set the self-supervised learning weight coefficient β to 0.01, 0.002, and 0.005 for the MovieLens-1M, Last-FM, and Book-Crossing datasets, respectively.
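To make the role of β concrete, the following minimal sketch shows one common way an auxiliary self-supervised loss is weighted against a recommendation loss. It is an illustration under stated assumptions rather than the PDGS objective: the additive form `rec_loss + beta * ssl_loss` is assumed, the function names `bpr_loss` and `joint_loss` are hypothetical, and the standard BPR loss is used only as a stand-in for the recommendation loss (the loss PDGS designs for popular and long-tail items is not reproduced here).

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean BPR loss over (positive, negative) score pairs.

    Each pair contributes -log(sigmoid(pos - neg)), which equals
    softplus(neg - pos). A numerically stable softplus is used.
    """
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        x = neg - pos
        # softplus(x) = max(x, 0) + log(1 + exp(-|x|))
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(pos_scores)


def joint_loss(rec_loss, ssl_loss, beta):
    """Assumed multitask objective: the self-supervised loss is an
    auxiliary term scaled by the weight coefficient beta."""
    return rec_loss + beta * ssl_loss


# Toy scores: the positive item is ranked above the negative in both pairs.
rec = bpr_loss([2.0, 1.5], [0.5, 1.0])
# ssl_value is a placeholder number standing in for a self-supervised loss.
ssl_value = 3.0
small_beta = joint_loss(rec, ssl_value, beta=0.01)
large_beta = joint_loss(rec, ssl_value, beta=0.5)
```

With a small β the auxiliary term barely changes the objective, whereas a large β lets it dominate the recommendation loss, which is consistent with the accuracy drop reported for larger β values.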

Impact of the Number of Hyperparameters
To explore the effect of the number of self-supervised signals K on model performance, we set L = 2 for all three datasets, with β set to 0.01, 0.002, and 0.005, respectively. Figure 4 shows that as K increases, the accuracy metrics, Recall and NDCG, first increase and then decrease on the MovieLens-1M and Last-FM datasets. However, the change is minimal on the Book-Crossing dataset, indicating that the variation in K has little impact on this dataset. Additionally, the diversity metrics, Cov and Tail, improve to varying degrees across all three datasets, suggesting that higher values of K encourage diverse recommendations. A reasonable value of K can effectively alleviate data sparsity and help the model learn better user and item embeddings. However, self-supervised signals are predicted based on confidence, so as K increases, more noise is introduced, which decreases the model's recommendation accuracy. Taking all of these factors into consideration, we set the number of self-supervised signals K to 100 for all three datasets in PDGS.

To investigate the effect of the parameter L on the model, we fixed β and K on the three datasets as β = 0.01, 0.002, 0.005 and K = 15, 30, 40. Figure 5 shows that as L increases from 1 to 2, the accuracy metrics, Recall and NDCG, improve significantly on the MovieLens-1M and Last-FM datasets, while they remain relatively unchanged on the Book-Crossing dataset. However, as L continues to increase from 2 to 5, Recall and NDCG decline to different degrees on all datasets. This indicates that a large number of network layers increases the complexity of the model and causes the learned node representations to become more homogeneous, which decreases model performance. Similarly, when L increases from 1 to 2, the diversity metrics, Cov and Tail, improve noticeably across all three datasets. However, when L > 2, the growth in the diversity metrics becomes less significant. Balancing recommendation accuracy and diversity is crucial in recommendation systems. Taking all factors into consideration, we set the number of layers in the graph encoder to L = 2 in the PDGS model to encourage the discovery of more long-tail items among the unobserved items for users.
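The homogenization effect of stacking many layers can be illustrated with a toy example. The sketch below is not the PDGS graph encoder: it assumes a simple mean-aggregation propagation with self-loops on a four-node path graph with one-dimensional embeddings, and the names `propagate` and `spread` are hypothetical. It only shows that repeated neighborhood averaging pulls node representations toward each other.

```python
def propagate(adjacency, values, num_layers):
    """Apply num_layers rounds of mean aggregation.

    In each round every node replaces its value with the average of
    its own value and the values of its neighbors (self-loop included).
    """
    for _ in range(num_layers):
        values = [
            (values[node] + sum(values[n] for n in neighbors)) / (1 + len(neighbors))
            for node, neighbors in enumerate(adjacency)
        ]
    return values


def spread(values):
    """Gap between the largest and smallest node value."""
    return max(values) - min(values)


# Path graph 0 - 1 - 2 - 3, given as neighbor lists.
path_graph = [[1], [0, 2], [1, 3], [2]]
initial = [0.0, 1.0, 2.0, 3.0]

# Spread of the node values after 1, 2, ..., 5 propagation layers.
spreads = [spread(propagate(path_graph, initial, layers)) for layers in range(1, 6)]
for layers, value in enumerate(spreads, start=1):
    print(f"L={layers}: spread={value:.4f}")
```

In this toy setting the spread shrinks with every additional layer, so deeper propagation leaves less information for distinguishing nodes, which matches the intuition given above for the decline when L > 2.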

Conclusions
Because data sparsity and long-tail features in recommendation systems lead to insufficient diversity of recommended items, which negatively impacts users' satisfaction and merchants' revenue, we proposed a popularity-debiased graph self-supervised recommendation model, PDGS. For popular items and long-tail items, we designed corresponding penalty functions, constructed an item similarity graph that removes the popularity bias, and conducted contrastive learning with the collaboration graph to alleviate the sparsity of the data. In addition, we designed optimization functions for popular items and long-tail items, respectively, and built a multitask learning strategy to generate an end-to-end training model. We verified the superiority of the model through extensive comparative experiments and ablation experiments on different datasets. In real life, popular items may also be favored by most people because of their good quality. Therefore, considering the contribution of popular items to recommendation performance in different scenarios is a worthwhile direction for future work.

Algorithm 1: The Sampling Strategy of Items for Popularity Debiasing.
Input: User-Item Interaction Matrix A; Item Set I; Popularity Threshold α
Output: Popularity-Debiased Item Similarity Graph G_c
1 for item i ∈ I do
2     Using the user-item interaction matrix Y, the popularity of item i is normalized to obtain nor_Pop_i;
3     for item j ∈ I do
4         Compute the item popularity difference Pop_Bias_{i,j} between item i and item j
5         Compute the penalty weights w_i and w_j for item i and item j
6         Compute the similarity sim(i, j) between item i and item j
7     end
8     Select the k_c most similar items to be connected to a specific item i based on the similarity ordering;
9 end
10 Construct the popularity-debiased item similarity graph G_c
return G_c

N_r, N_c and E_r, E_c are the node sets and edge sets of G_r and G_c, respectively. M_r ∈ {0, 1}^|E_r| and M_c ∈ {0, 1}^|E_c| are two masks used to randomly drop out the edges of G_r and G_c. The augmentation views are obtained by performing the edge dropout operation on G_r and G_c, respectively. Additionally, during the training process, additional graph encoders are utilized to learn the embeddings of item nodes in the augmentation views. The learned item node embeddings are used as the unlabeled sample set E for the initial graphs G_r and G_c, reducing the sensitivity of node embedding learning to changes in graph structure.
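The structure of Algorithm 1 and the edge dropout operation can be sketched as follows. This is a rough illustration and not the authors' implementation, because the exact formulas are not given in this section. Several choices are assumptions made only for the example: popularity is normalized by the maximum item popularity, Pop_Bias is taken as the absolute difference of normalized popularities, the penalty `1 / (1 + nor_Pop)` is applied only when Pop_Bias exceeds α, and the base similarity is the cosine similarity of the items' user sets. The names `build_debiased_item_graph` and `edge_dropout` are hypothetical.

```python
import math
import random


def build_debiased_item_graph(interactions, num_items, k_c, alpha):
    """Return {item: [k_c most similar items]} from (user, item) pairs."""
    users_of = [set() for _ in range(num_items)]
    for user, item in interactions:
        users_of[item].add(user)

    # Step 2 (assumed form): normalize popularity by the maximum popularity.
    pop = [len(users) for users in users_of]
    max_pop = max(pop) or 1
    nor_pop = [p / max_pop for p in pop]

    graph = {}
    for i in range(num_items):
        scored = []
        for j in range(num_items):
            if i == j:
                continue
            # Step 4 (assumed form): popularity difference of the pair.
            pop_bias = abs(nor_pop[i] - nor_pop[j])
            # Step 5 (hypothetical penalty): down-weight popular items
            # only when the popularity gap exceeds the threshold alpha.
            if pop_bias > alpha:
                w_i = 1.0 / (1.0 + nor_pop[i])
                w_j = 1.0 / (1.0 + nor_pop[j])
            else:
                w_i = w_j = 1.0
            # Step 6 (assumed form): penalized cosine similarity of user sets.
            denom = math.sqrt(len(users_of[i]) * len(users_of[j]))
            cosine = len(users_of[i] & users_of[j]) / denom if denom else 0.0
            scored.append((-(w_i * w_j * cosine), j))
        # Step 8: keep the k_c most similar items (ties broken by item id).
        scored.sort()
        graph[i] = [j for _, j in scored[:k_c]]
    return graph


def edge_dropout(edges, drop_rate, rng):
    """Sample a mask M in {0, 1}^|E| and keep the edges where M is 1."""
    mask = [0 if rng.random() < drop_rate else 1 for _ in edges]
    kept = [edge for edge, keep in zip(edges, mask) if keep == 1]
    return kept, mask


# Toy data: item 0 is popular, item 3 is a long-tail item.
toy = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 2), (3, 2), (0, 3)]
g_c = build_debiased_item_graph(toy, num_items=4, k_c=2, alpha=0.3)
edges_c = [(i, j) for i, neighbors in g_c.items() for j in neighbors]
view_edges, mask_c = edge_dropout(edges_c, drop_rate=0.2, rng=random.Random(0))
```

In the toy data, item 3 shares a user with both item 0 and item 1, and the assumed penalty lowers its similarity to the popular item 0, so item 1 is ranked first among its neighbors.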

Figure 2 illustrates the recommendation performance of the comparative models on Top-10 and Top-20 recommendation tasks in terms of four evaluation protocols. Overall, our proposed PDGS model outperforms all baselines in terms of the evaluation metrics across all three datasets. The detailed improvements in model performance are shown in Table 3.
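For readers unfamiliar with the four evaluation protocols, the sketch below gives commonly used definitions of Recall@K, NDCG@K, coverage (Cov@K), and long-tail ratio (Tail@K). The exact definitions used in the experiments are not stated in this section, so these formulas are assumptions: NDCG uses binary relevance, Cov@K is the share of the catalog that appears in at least one top-K list, and Tail@K is the share of recommended slots filled by long-tail items. All function names are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of a user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def coverage_at_k(all_ranked, num_items, k):
    """Share of the catalog recommended to at least one user."""
    recommended = {item for ranked in all_ranked for item in ranked[:k]}
    return len(recommended) / num_items


def tail_at_k(all_ranked, tail_items, k):
    """Share of all recommended slots occupied by long-tail items."""
    total = sum(len(ranked[:k]) for ranked in all_ranked)
    tail_hits = sum(
        1 for ranked in all_ranked for item in ranked[:k] if item in tail_items
    )
    return tail_hits / total if total else 0.0


# Toy example: two users, a catalog of 10 items, item 3 is long-tail.
lists = [[1, 2, 5], [2, 3, 7]]
cov = coverage_at_k(lists, num_items=10, k=2)
tail = tail_at_k(lists, tail_items={3}, k=2)
```

Recall and NDCG are computed per user and then averaged, whereas coverage and the long-tail ratio are computed over the recommendation lists of all users, which is why a model can gain on one group of metrics while losing on the other.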

Figure 2. Comparison of the performance of different models: (a) ML represents MovieLens-1M; (b) LF represents Last-FM; (c) BC represents Book-Crossing.

Figure 3. Performance of PDGS with respect to the value of β.

Figure 4. Performance of PDGS with respect to the number of self-supervised signals K.

Figure 5. Performance of PDGS with respect to the number of layers L.

Author Contributions: Study design and writing, S.L.; literature search, X.H. and Y.J.; figures, B.L. and M.Q.; supervision, J.G. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the S&T Program of Hebei under Grants 226Z0102G and 21310101D, the National Natural Science Foundation of China under Grant 42306218, the National Cultural and Tourism Science and Technology Innovation Project (2020), and the Hebei Natural Science Foundation under Grant F2023407003.

Data Availability Statement: All data generated or analyzed during this study are included in this published article.

Table 1. The statistics of the datasets.

Table 3. PDGS performance improvement percentage.

It is clear from Figure 2 that our proposed model PDGS outperforms the comparison models in terms of both accuracy and novelty. Table 3 quantifies the performance improvement of PDGS on all evaluation metrics. It shows that PDGS can effectively alleviate the insufficient diversity of recommended items in existing models and increase the recommendation ratio of long-tail items. It can fully explore the value of long-tail items, enhance user engagement, and generate more revenue for businesses. (2) In both Top-10 and Top-20 recommendations, the PDGS model achieves optimal

Table 4. Performance compared with model variants of PDGS.