Preference-Guided Debiasing and Denoising Social Recommendation

Li, Jun; Li, Shenghan; Zeng, Huachang; Zhuo, Shengda

doi:10.3390/info17050473


¹ School of Computer and Information Technology, Anhui University of Applied Technology, Hefei 230011, China

² College of Automobile and Rail, Anhui Technical College of Mechanical and Electrical Engineering, Wuhu 241002, China

³ School of Internet and Communication, Anhui Technical College of Mechanical and Electrical Engineering, Wuhu 241002, China

⁴ College of Cyber Security, Jinan University, Guangzhou 511443, China

* Authors to whom correspondence should be addressed.

Information 2026, 17(5), 473; https://doi.org/10.3390/info17050473

Submission received: 7 March 2026 / Revised: 27 April 2026 / Accepted: 10 May 2026 / Published: 12 May 2026

(This article belongs to the Topic Graph Neural Networks and Learning Systems)


Abstract

User behaviors and social interactions on online platforms are intricately intertwined, naturally forming complex graph structures. Leveraging this structure, Graph Neural Networks (GNNs) efficiently aggregate neighborhood information and have become a prevailing paradigm for social recommendation. However, existing methods often overemphasize social modeling while overlooking the joint effects of preference-guided relation filtering and user/item biases, rendering them vulnerable to noise from redundant ties. To address these limitations, we propose PDDSR, a Preference-Guided Debiasing and Denoising Social Recommendation framework. Specifically, for debiasing, PDDSR explicitly models user rating bias and item popularity bias as learnable vectors, integrating them into embedding learning to mitigate bias drift at the embedding level. Simultaneously, for denoising, the model employs a social relation confidence mechanism guided by user preferences and adopts an adaptive graph denoising strategy to retain highly informative connections, effectively capturing social influence while filtering out noise. Extensive experiments on the Ciao and Epinions datasets demonstrate that PDDSR consistently outperforms state-of-the-art methods, and notably on the Ciao dataset, the MAE and RMSE are improved by 1.90% and 1.87%, respectively. These results validate the effectiveness and robustness of the joint debiasing and denoising mechanism in complex social recommendation scenarios.

Keywords: recommender systems; social links; Graph Neural Networks; user bias

1. Introduction

In the era of rapid digital evolution, the exponential growth of online data has made “information overload” a severe challenge for users. As a pivotal solution to this issue, recommender systems have emerged as an indispensable infrastructure across various online platforms, ranging from e-commerce and social networking to content streaming services [1,2]. By meticulously modeling user historical interactions and latent preferences, these systems efficiently retrieve highly relevant items from vast repositories. This personalized information filtering mechanism not only significantly reduces decision-making costs for users, thereby enhancing their satisfaction, but also optimizes platform content distribution, ultimately driving user retention and sustainable business growth [3,4].

Early recommendation methods, such as PMF [5] and TrustMF [6], primarily rely on user–item interactions. These methods learn latent user and item embeddings via Matrix Factorization (MF) or similarity metrics. However, they struggle with severe data sparsity in real-world scenarios, where new users or long-tail items lack sufficient interactions, limiting preference modeling accuracy [7,8]. With the rise of social media, social connections serve as valuable auxiliary signals for inferring user interests [9]. Leveraging homophily (i.e., similar users share preferences), social recommendation methods like TrustSVD [10] and DREAM [11] mitigate sparsity by incorporating social ties. However, they typically model social relations at a coarse-grained level, failing to capture complex influence propagation.

Since user–user and user–item interactions naturally form graphs, GNNs have become the dominant paradigm for social recommendation, leveraging their multi-hop neighborhood aggregation and structure-aware learning. GraphRec [12] pioneered GNN-based social recommendation by jointly modeling social and interaction graphs, using attention mechanisms to differentiate individual preferences from social influence. Successors like DiffNet++ [13] and GAIPSRec [14] leverage multi-layer propagation to capture high-order social influence, significantly outperforming traditional baselines. However, real-world social networks are inherently noisy, which severely degrades model performance. To mitigate this, recent studies explore various denoising strategies, such as consistency-based suppression (ConsisRec [15]), structure reconstruction (RS-GNN [16]), training optimization (ADT [17]), and diffusion modeling (RecDiff [18]). However, these methods face concrete technical limitations: (i) Architecturally, models like GraphRec [12] and DiffNet++ [13] rely on indiscriminate neighbor aggregation or static attention mechanisms. They inherently assume all observed social ties reflect shared preferences, lacking a dynamic filtering mechanism to prune structural noise. (ii) Statistically, denoising methods like ConsisRec [15] and RecDiff [18] often treat raw observational ratings as ground truth. By failing to explicitly decouple subjective user rating habits and item popularity biases from the embedding learning process, these models are highly susceptible to bias drift.

Consequently, existing GNN-based social recommendation paradigms overlook the joint effects of preference-guided relation filtering and explicit bias calibration. Addressing noise without calibrating biases leads models to denoise based on skewed preference signals, while debiasing without topological denoising leaves the graph vulnerable to redundant information propagation. To illustrate this critical technical gap, Figure 1 presents a dual-view schematic of the process, highlighting two key observations.

(1): Social view: The black dashed lines represent social connections with co-interactions (i.e., users interacting with the same items), whereas gray dashed lines indicate connections lacking such shared behaviors. Real-world social ties are complex and do not necessarily imply shared preferences. Consequently, social connections without co-interactions often fail to provide effective signals, introducing noise that hinders learning, which makes pruning noisy edges to retain highly informative social ties critical for improving recommendation performance.
(2): Interaction view: Orange, green, and black nodes represent strict, lenient, and unbiased raters, respectively. User ratings are highly subjective; identical scores may carry distinct semantic meanings across users. For instance, strict raters may assign low scores even to items they favor. Furthermore, item popularity skews rating interpretation. For example, if a widely acclaimed blockbuster (which typically receives five stars) is given a mathematically ‘moderate’ three-star rating, it actually reflects relatively negative feedback compared to the crowd’s consensus. Consequently, failing to rectify these user and item biases prevents the accurate capture of preferences, severely compromising recommendation accuracy.

To extract high-quality social signals while mitigating biases and social noise, we propose PDDSR, a Preference-Guided Debiasing and Denoising Social Recommendation framework. This framework explicitly addresses biases and suppresses noise to learn robust user and item representations. The main contributions of this work are summarized as follows:

Overall Framework Extension: We propose PDDSR, which extends the prevailing GNN-based social recommendation paradigm by systematically integrating explicit bias calibration and structural denoising into a unified joint-learning framework.
Explicit Debiasing Architecture: Unlike existing methods that rely on implicit regularization, we introduce an explicit bias mitigation architecture. This module isolates user-specific rating habits and item popularity as learnable vector offsets, preventing bias drift at the foundational embedding level.
Adaptive Denoising Mechanism: Moving beyond standard static graph pruning, we design a preference-guided adaptive graph denoising mechanism. It calculates social relation confidence based on actual interaction consistency and adaptively scales the denoising ratio according to individual social network sizes, effectively filtering structural noise without exacerbating data sparsity.
Empirical Validation: Extensive experiments on two benchmark datasets demonstrate PDDSR’s superiority over state-of-the-art methods, firmly validating the effectiveness of our dual-mechanism design in complex real-world scenarios.

The rest of the paper is organized as follows: We first survey related work in Section 2, followed by the problem formulation in Section 3. Section 4 introduces the technical details of our framework. We then demonstrate the experimental results in Section 5 and summarize our conclusions in Section 6.

2. Related Work

2.1. Social Recommendation

Early social recommendation research predominantly relies on MF methods [5,6,19]. Mnih et al. [5] introduce Probabilistic Matrix Factorization (PMF), which reconstructs rating matrices by learning latent representations of users and items. FunkSVD [19] extends traditional singular value decomposition by incorporating implicit feedback into latent factor learning, improving prediction accuracy. However, neither explicitly models social relations. To address this, Yang et al. [6] propose TrustMF, an MF-based model that regularizes latent representations using user trust relationships, encouraging socially connected users to exhibit consistent embeddings in the latent space. Nevertheless, such methods are inherently limited by linear assumptions, struggling to capture complex nonlinear interaction patterns.

To overcome these limitations, Deep Learning (DL) is increasingly adopted. He et al. [20] propose NeuMF, which integrates neural architectures into MF by replacing inner-product operations with Multi-Layer Perceptrons (MLPs) for flexible interaction modeling. Similarly, Fan et al. [21] introduce DeepSoR, incorporating Deep Neural Networks (DNNs) to capture nonlinear social patterns. Despite this progress, existing DL-based models primarily focus on node-level feature interactions or local associations, often treating users and items as isolated instances. Consequently, they fail to fully exploit the rich structural information embedded in social graphs. Specifically, these methods overlook the propagation of high-order neighborhood information over the global topology, hindering the capture of deep, transitive social influences and thereby limiting further performance gains.

2.2. GNN-Based Social Recommendation

Recently, GNNs have been widely adopted in social recommendation for modeling complex topologies and high-order dependencies [12,13,22,23,24,25,26,27,28]. Fan et al. [12] propose GraphRec, which jointly models social and interaction graphs using attention to distinguish individual preferences from social influence. However, this method is limited to first-order aggregation. To address this, Wu et al. [22] introduce DiffNet to model high-order influence via recursive propagation, while its successor, DiffNet++ [13], further models interest diffusion between users and items. Subsequent studies extend these methods by refining social influence and structural modeling. For instance, EIISRS [23] models multi-dimensional social influence by considering the effect of diversity and implicit influence discovery. LightGCN [24] simplifies GNNs by retaining only neighborhood aggregation for linear propagation. Other works explore complex fusion strategies for social and collaborative signals. Fu et al. [25] propose DICER with bidirectional context-aware modulation, while Liao et al. [26] propose SocialLGN to propagate representations across graphs via a fusion mechanism. Similarly, Xia et al. [27] introduce DGNN, utilizing latent memory units for heterogeneous relationships, and Zhang et al. [28] propose MNGCN, which fuses multi-type neighbor information via multi-level attention.

Despite these advancements, existing GNN-based methods face specific limitations that our PDDSR framework directly addresses. First, models like GraphRec [12] and DiffNet++ [13] primarily focus on sophisticated message passing but rely on indiscriminate neighbor aggregation. They process all social ties as equally valid pathways for influence, lacking a mechanism to actively filter out redundant connections. Second, while methods like DICER [25] attempt context-aware modulation, they still largely overlook the inherent statistical biases prevalent in observational data (e.g., user rating habits and item popularity). Consequently, their representation learning is easily skewed. PDDSR uniquely addresses this by introducing explicit learnable vectors to decouple these statistical biases from genuine preferences before any social aggregation occurs.

2.3. Graph Denoising for Social Recommendation

Recommender models are vulnerable to redundant structures and noisy data [15,16,17,18,29,30,31], which positions graph denoising as a critical research focus. Yang et al. [15] identify the misalignment between social ties and rating objectives, proposing ConsisRec to select consistent neighbors at both relational and contextual levels. Furthermore, GBSR [29] employs a graph bottleneck to learn minimal sufficient structures, effectively pruning redundancy. In the broader field of graph learning, Dai et al. [16] address noisy edges that hinder message passing by learning denoised structures with label smoothness. Addressing noisy implicit feedback, Wang et al. [17] propose ADT to enhance robustness by dynamically suppressing high-loss samples. Subsequently, Wang et al. [30] introduce DeCA, which leverages cross-model agreement on clean samples to denoise interactions. To mitigate noise diffusion and over-smoothing, Wang et al. [31] propose SI-GAN, enhancing discrimination via adversarial learning. Similarly, targeting irrelevant ties, Li et al. [18] introduce RecDiff, employing multi-step latent diffusion for denoising.

While existing methods achieve partial success in mitigating structural noise, they present a critical gap in systematically distinguishing between statistical bias and topological noise. For instance, structure-centric denoising methods like ConsisRec [15] and SI-GAN [31] focus purely on pruning or reweighting edges, yet they treat the underlying biased user ratings as ground truth. Conversely, our PDDSR framework treats bias and noise as distinct but intertwined challenges: it performs bias calibration at the embedding level to ensure clean preference signals, which then guide the topological denoising process. Furthermore, regarding social tie strength, prior works rarely establish a dynamic confidence metric directly derived from user-item interaction consistency. Instead of relying on static thresholds or computationally heavy GANs (e.g., SI-GAN), PDDSR introduces a preference-guided adaptive denoising ratio inspired by Dunbar’s number, enabling personalized granularity.

To clearly articulate our specific contributions, Table 1 summarizes the methodological differences between PDDSR and representative existing baselines.

3. Problem Formulation

Let $U = \{u_{1}, u_{2}, \dots, u_{M}\}$ and $V = \{v_{1}, v_{2}, \dots, v_{N}\}$ denote the sets of users and items, respectively. Since users rate only a subset of items, the resulting rating matrix $R \in \mathbb{R}^{M \times N}$ is inherently sparse. $R(u_{i})$ denotes the set of items interacted with by user $u_{i}$. Our goal is to predict missing entries in $R$ by leveraging historical interactions and social connections to generate recommendations. We model two layers, interaction and social, to characterize user preferences and social influence. In the interaction layer, the edge weight $r_{ij}$ represents user $u_{i}$’s rating of item $v_{j}$. The social layer encodes explicit social ties, serving as pathways for the propagation of latent interests. Here, we introduce a behavioral consistency metric to quantify tie strength. Specifically, the relationship coefficient between users $u_{i}$ and $u_{j}$ is defined as

$$T_{ij} = 1 + \sum_{v_{k} \in R(u_{i}) \cap R(u_{j})} \mathbb{I}\left(|r_{ik} - r_{jk}| \leq \delta\right), \tag{1}$$

where $\mathbb{I}(\cdot)$ is the indicator function, returning 1 if the rating discrepancy falls within the threshold $\delta$ and 0 otherwise. This coefficient weights social ties by behavioral consistency rather than binary connections, thereby capturing preference similarity among neighbors. Leveraging these joint representations, the model infers latent preferences for unobserved items constrained by social influence.
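As a minimal sketch of Equation (1), the coefficient can be computed directly from two users' rating dictionaries. The function name, the dictionary layout, and the example users below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relationship_coefficient(ratings_i, ratings_j, delta=1.0):
    """Sketch of Eq. (1): 1 + number of co-rated items with |r_ik - r_jk| <= delta.

    ratings_i, ratings_j: dicts mapping item id -> rating (assumed layout).
    """
    shared_items = ratings_i.keys() & ratings_j.keys()
    agreements = sum(
        1 for item in shared_items
        if abs(ratings_i[item] - ratings_j[item]) <= delta
    )
    return 1 + agreements


# Two hypothetical users who co-rated items v1 and v2.
alice = {"v1": 5, "v2": 3, "v3": 1}
bob = {"v1": 4, "v2": 1, "v4": 5}
print(relationship_coefficient(alice, bob, delta=1.0))
```

In this example only v1 counts as an agreement, since the two ratings of v2 differ by more than the threshold. A pair of users with no co-rated items receives the baseline value of 1, so an observed tie never has zero weight.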

4. Framework

This section details the architecture of PDDSR, as illustrated in Figure 2. The framework comprises five key modules: (1) User Modeling encodes inherent rating biases as vectors, integrating them into latent factor learning to capture user-specific tendencies. (2) Item Modeling fuses item features with explicit bias vectors to learn robust item embeddings. (3) Social Modeling utilizes attention mechanisms to quantify social influence strength. Additionally, a preference-aware denoising mechanism estimates relationship confidence, pruning noisy edges to retain highly informative connections, thereby adapting to heterogeneous tie strengths. (4) Rating Prediction integrates the learned biased latent factors to model interactions and estimate user preferences. (5) Model Training jointly optimizes social link prediction and recommendation objectives, effectively exploiting social signals to boost performance. The following subsections detail each module.

4.1. User Modeling

Existing methods learn user latent factors leveraging user-item interaction histories and raw ratings [28,32,33]. However, inherent rating biases across users and items often receive insufficient attention, potentially obscuring genuine preference patterns and compromising modeling accuracy. To explicitly capture user-side rating biases, we introduce the unbiased rating difference as a core signal for user modeling, aiming to accurately reflect true preferences during interactions. Specifically, given the average ratings $A(u_{i})$ and $A(v_{j})$ computed from raw data, the unbiased rating difference $\bar{r}_{ij}$ for user $u_{i}$ and item $v_{j}$ is defined as

$$\bar{r}_{ij} = \left\lceil |r_{ij} - A(v_{j})| \right\rceil, \tag{2}$$

where $\bar{r}_{ij}$ is mapped to a vector representation $s_{\bar{r}_{ij}}$ via an embedding lookup table, encoding user-perspective rating biases. It is worth noting that our formulation deliberately subtracts the item’s mean rating $A(v_{j})$ from the actual rating $r_{ij}$. Unlike traditional mean-centering, which merely normalizes ratings, this cross-subtraction is designed to capture the user’s deviation from the “crowd’s consensus” (i.e., how strict or lenient the user is compared to the general public), thereby extracting a personalized bias offset. While the ceiling operator $\lceil \cdot \rceil$ is mathematically non-differentiable, it does not impede backpropagation. The resulting discrete value $\bar{r}_{ij}$ strictly serves as a categorical index for the embedding lookup table. During training, gradients directly update the parameters of this bias embedding matrix based on the fetched index, bypassing the need to differentiate through the ceiling function itself.
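The index-then-lookup step around Equation (2) can be illustrated with a short sketch. It assumes a 1–5 rating scale, so the index takes values 0 to 4 and the table needs five rows. The table here is a plain random NumPy array standing in for a learnable embedding matrix, and the names `bias_index`, `bias_table`, and `EMBED_DIM` are hypothetical.

```python
import math

import numpy as np


def bias_index(rating, reference_mean):
    """Sketch of Eq. (2): ceiling of the absolute deviation from a mean rating."""
    return math.ceil(abs(rating - reference_mean))


NUM_BUCKETS = 5   # indices 0..4 on a 1-5 scale (assumption about the data)
EMBED_DIM = 8     # illustrative embedding size, not from the paper

rng = np.random.default_rng(0)
# Stand-in for the learnable bias embedding matrix.
bias_table = rng.normal(size=(NUM_BUCKETS, EMBED_DIM))

# A 3-star rating on an item whose mean rating is 4.6 falls in bucket 2.
idx = bias_index(rating=3, reference_mean=4.6)
s_vec = bias_table[idx]  # the row that gradients would update in training
print(idx, s_vec.shape)
```

Because the index is only used to select a row, no gradient flows through the ceiling. Note that, as written, the absolute value in Equation (2) keeps the magnitude of the deviation but not its sign.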

Next, the item embedding is concatenated with the bias vector and fed into an MLP to generate the bias-aware interaction representation $x_{il}$:

$$x_{il} = \mathrm{MLP}\left([e_{v_{l}} \oplus s_{\bar{r}_{il}}]\right), \tag{3}$$

where $\oplus$ denotes concatenation. Since interactions contribute differently to user preferences, we employ an attention mechanism to assign adaptive importance weights. Interaction weights are computed via a two-layer neural network and normalized over the user’s history:

$$\begin{aligned} \eta_{il}^{*} &= W_{2} \cdot \mathrm{ReLU}\left(W_{1} \cdot [x_{il} \oplus e_{u_{i}}] + b_{1}\right) + b_{2}, \\ \eta_{il} &= \frac{\exp(\eta_{il}^{*})}{\sum_{v_{l} \in R(u_{i})} \exp(\eta_{il}^{*})}, \end{aligned} \tag{4}$$

where $R(u_{i})$ denotes user $u_{i}$’s interaction set, and $W_{*}$ and $b_{*}$ are weight matrices and bias terms, respectively.

Finally, aggregating these weighted representations yields the user latent factor offset:

$$h_{u_{i}}^{R} = \mathrm{Tanh}\left(W \cdot \left\{\sum_{v_{l} \in R(u_{i})} \eta_{il} x_{il}\right\} + b\right). \tag{5}$$
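The attention-weighted aggregation of Equations (4) and (5) can be sketched in NumPy as follows. The shapes, the random parameters, and the function names are illustrative assumptions; in the actual model the weights would be learned and the inputs would come from the MLP of Equation (3).

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - scores.max()
    weights = np.exp(shifted)
    return weights / weights.sum()


def aggregate_user_offset(x, e_u, W1, b1, W2, b2, W, b):
    """Sketch of Eqs. (4)-(5).

    x:   (n_items, d) bias-aware interaction representations x_il
    e_u: (d,) embedding of the target user
    """
    n = x.shape[0]
    # Pair every interaction representation with the user embedding.
    concat = np.concatenate([x, np.tile(e_u, (n, 1))], axis=1)  # (n, 2d)
    hidden = np.maximum(concat @ W1.T + b1, 0.0)                # ReLU layer
    raw_scores = (hidden @ W2.T + b2).ravel()                   # eta*_il
    eta = softmax(raw_scores)                                   # eta_il
    pooled = eta @ x                                            # weighted sum
    return np.tanh(W @ pooled + b), eta


rng = np.random.default_rng(0)
d, hidden_dim, n_items = 4, 8, 3  # illustrative sizes
x = rng.normal(size=(n_items, d))
e_u = rng.normal(size=d)
W1, b1 = rng.normal(size=(hidden_dim, 2 * d)), np.zeros(hidden_dim)
W2, b2 = rng.normal(size=(1, hidden_dim)), np.zeros(1)
W, b = rng.normal(size=(d, d)), np.zeros(d)

h_R, eta = aggregate_user_offset(x, e_u, W1, b1, W2, b2, W, b)
print(eta.sum(), h_R.shape)
```

The attention weights sum to one over the user's history, and the Tanh keeps every component of the resulting offset within [-1, 1].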

4.2. Item Modeling

Item latent factors depend not only on intrinsic attributes but also on the evaluation behaviors of diverse users. Due to varying rating scales and subjective preferences, raw ratings often contain systematic user biases, potentially distorting the item’s true characteristics. To mitigate this, we adopt an item-centric strategy to learn latent factor offsets from users interacting with item $v_{j}$. This process aggregates data from all historical interactions with $v_{j}$, incorporating their specific ratings. To minimize subjectivity biases, analogous to the user modeling in Section 4.1, we define the bias-corrected rating difference $\tilde{r}_{ij}$ between user $u_{i}$ and item $v_{j}$ as

$$\tilde{r}_{ij} = \left\lceil |r_{ij} - A(u_{i})| \right\rceil. \tag{6}$$

Analogous to the user modeling logic, subtracting the user’s mean rating $A(u_{i})$ from $r_{ij}$ in Equation (6) aims to capture the item’s deviation from the user’s personal rating habit. This formulation filters out the user’s subjective rating scale to reflect the true intrinsic quality and unique appeal of the item. Using $\tilde{r}_{ij}$, we construct the perceived bias representation $y_{jn}$ for the interaction between user $u_{n}$ and item $v_{j}$:

$$y_{jn} = \mathrm{MLP}\left([e_{u_{n}} \oplus s_{\tilde{r}_{nj}}]\right), \tag{7}$$

where $s_{\tilde{r}_{nj}}$ is generated analogously to the user modeling stage. Since users contribute unequally to item latent factors, we employ a two-layer neural attention mechanism. Specifically, we compute attention scores using the interaction-aware bias $y_{jn}$ and item embedding $e_{v_{j}}$, normalizing them to obtain weights $\xi_{jn}$:

$$\begin{aligned} \xi_{jn}^{*} &= W_{2} \cdot \mathrm{ReLU}\left(W_{1} \cdot [y_{jn} \oplus e_{v_{j}}] + b_{1}\right) + b_{2}, \\ \xi_{jn} &= \frac{\exp(\xi_{jn}^{*})}{\sum_{u_{n} \in R(v_{j})} \exp(\xi_{jn}^{*})}. \end{aligned} \tag{8}$$

Finally, aggregating these weighted representations yields the latent factor offset for item $v_{j}$:

$$h_{v_{j}} = \mathrm{Tanh}\left(W \cdot \left\{\sum_{u_{n} \in R(v_{j})} \xi_{jn} y_{jn}\right\} + b\right), \tag{9}$$

where $R(v_{j})$ denotes the user interaction set for item $v_{j}$.

4.3. Social Modeling

While social relations mitigate data sparsity, real-world networks often contain redundant or noisy ties. Indiscriminately using all ties hinders influence propagation and distorts user preference modeling. Thus, we aim to leverage highly informative social relations for robust social influence modeling.

To characterize socially induced preference shifts, we employ an attention mechanism to quantify neighbor influence. Analogous to user interaction modeling, this module aggregates neighbor preferences to adjust target user representations. Specifically, the neighbor’s interaction offset $h_{u_{f}}^{R}$ and the target user embedding $e_{u_{i}}$ are concatenated and fed into a two-layer neural network to compute attention scores:

$$\alpha_{if}^{*} = W_{2} \cdot \mathrm{ReLU}\left(W_{1} \cdot [h_{u_{f}}^{R} \oplus e_{u_{i}}] + b_{1}\right) + b_{2}, \tag{10}$$

where ReLU denotes the rectified linear unit. Next, we apply Softmax normalization over the neighborhood set $N_{S}(i)$ to obtain attention weights $\alpha_{if}$:

$$\alpha_{if} = \frac{\exp(\alpha_{if}^{*})}{\sum_{u_{f} \in N_{S}(i)} \exp(\alpha_{if}^{*})}. \tag{11}$$

Aggregating these weighted offsets yields the user’s social latent factor offset:

$$h_{u_{i}}^{S} = \mathrm{Tanh}\left(W \cdot \left\{\sum_{u_{f} \in N_{S}(i)} \alpha_{if} h_{u_{f}}^{R}\right\} + b\right). \tag{12}$$

Finally, we fuse interaction and social offsets to construct a unified user representation:

$$h_{u_{i}} = \sigma\left(W \cdot (h_{u_{i}}^{R} \oplus h_{u_{i}}^{S}) + b\right). \tag{13}$$

To reduce social noise, we propose a preference-aware social graph denoising strategy. Since the social graph $G_{S}$ typically contains numerous weak ties, we formulate social denoising as a link prediction task [34]. While the behavioral consistency metric defined in Equation (1) conceptually captures tie strength, computing it directly from exact co-interactions suffers from severe data sparsity. To overcome this and provide a differentiable, high-order approximation of Equation (1), we quantify relationship similarity using historical interaction behaviors. By feeding the interaction sequences of the user $u_{i}$ and their neighbor $u_{f}$ into a Transformer [35], we compute the relationship confidence:

$$\hat{r}_{if} = T\left(S_{L}\left(\{e_{v_{l}} \mid \forall v_{l} \in R(u_{i})\}\right) \oplus S_{L}\left(\{e_{v_{m}} \mid \forall v_{m} \in R(u_{f})\}\right)\right). \tag{14}$$

This captures high-order dependencies in interaction sequences, distinguishing true friends from noisy connections. Inspired by Dunbar’s number [36], we perform adaptive denoising based on social network size. This allows users with sparse ties to retain connections while applying stricter filtering to those with dense ties, offering superior adaptability compared to uniform denoising. For user $u_{i}$ with social relation count $|N_{S}(i)|$, the denoising ratio is defined as

$$\eta_{u_{i}} = \begin{cases} 0, & \text{if } |N_{S}(i)| < \epsilon, \\ \left[\!\left[\log_{10}(|N_{S}(i)|)\right]\!\right]^{\gamma} \cdot H, & \text{otherwise}, \end{cases} \tag{15}$$

where $\epsilon$, $\gamma$, and $H$ are hyperparameters. The functional form of Equation (15) is specifically designed to accommodate the power-law degree distribution universally observed in social networks. Applying the base-10 logarithm ($\log_{10}$) acts as a smoothing penalty, preventing the excessive pruning of highly connected hub users. Intuitively, $\gamma$ serves as a non-linear intensity controller that adjusts how aggressively the denoising ratio scales with the user’s network size, while $H$ represents the base scaling factor determining the minimum unit of pruning. To stabilize training against noisy labels, we introduce a prediction smoothing mechanism [37] to update relationship confidence after the $kD$-th epoch:

$$\hat{r}_{if}(t = kD) = \beta \cdot \hat{r}_{if}(t = (k - 1)D) + (1 - \beta) \cdot \hat{r}_{if}(t = kD), \tag{16}$$

where $\beta$ controls the smoothing degree. The hyperparameter $D$ determines the update frequency of the topological structure and is empirically set to 5 in our implementation. Updating the graph every $D$ epochs, rather than at every single epoch, prevents training oscillation and allows node representations to converge sufficiently before the topology is modified again. This decoupled strategy ensures $D$ operates independently without interfering with continuous optimizations like learning rate decay. By integrating preference modeling with graph denoising, PDDSR suppresses noise while preserving critical social signals. The resulting denoised graph $G_{S}^{d}$ provides reliable signals, enhancing robustness and accuracy in complex environments.
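The adaptive ratio of Equation (15) and the smoothing update of Equation (16) can be sketched as below. Two points are assumptions rather than statements from the paper: the double brackets in Equation (15) are read here as rounding down to an integer (passed in as the `bracket` argument so another rounding rule can be substituted), and the ratio is applied by dropping that fraction of a user's lowest-confidence ties. All function names and hyperparameter values are hypothetical.

```python
import math


def denoising_ratio(num_ties, epsilon, gamma, H, bracket=math.floor):
    """Sketch of Eq. (15). `bracket` is an assumed reading of the double brackets."""
    if num_ties < max(epsilon, 1):  # also guards log10(0)
        return 0.0
    return float(bracket(math.log10(num_ties)) ** gamma * H)


def prune_ties(confidence, ratio):
    """Keep the highest-confidence neighbors (assumed use of the ratio).

    confidence: dict mapping neighbor id -> relationship confidence.
    """
    clipped = min(max(ratio, 0.0), 1.0)
    num_drop = int(len(confidence) * clipped)
    ranked = sorted(confidence, key=confidence.get, reverse=True)
    return ranked[:len(ranked) - num_drop]


def smooth_confidence(previous, current, beta):
    """Sketch of Eq. (16): blend the previous and newly predicted confidence."""
    return beta * previous + (1.0 - beta) * current


# Illustrative hyperparameters, not the paper's settings.
for ties in (5, 50, 500):
    print(ties, denoising_ratio(ties, epsilon=10, gamma=2, H=0.1))

scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
print(prune_ties(scores, ratio=0.5))
```

Under this reading, users below the threshold keep all of their ties, and the ratio grows in steps as the neighborhood size crosses powers of ten, which matches the description of H as the minimum unit of pruning.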

4.4. Rating Prediction

This module integrates latent factors from user, item, and social modeling to predict user preferences. Specifically, we concatenate $h_{u_{i}}$ and $h_{v_{j}}$ and feed them into a three-layer feedforward neural network to estimate the preference score $r_{ij}^{P}$:

$$r_{ij}^{P} = W \cdot \sigma\left(W_{2} \cdot \sigma\left(W_{1} \cdot (h_{u_{i}} \oplus h_{v_{j}}) + b_{1}\right) + b_{2}\right). \tag{17}$$

To mitigate rating scale discrepancies, the final rating prediction incorporates the historical average ratings $A(u_{i})$ and $A(v_{j})$ of the target user and candidate item:

$$\hat{r}_{ij} = \frac{c}{2}\left[A(u_{i}) + A(v_{j})\right] + r_{ij}^{P}, \tag{18}$$

where $c$ is a hyperparameter controlling the contribution of the average rating term. During inference, average ratings remain fixed as computed during training.
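A small worked instance of Equation (18) may help fix the role of c. The function name and the numbers are hypothetical.

```python
def final_rating(pref_score, user_mean, item_mean, c):
    """Sketch of Eq. (18): scaled mean of the two averages plus the learned score."""
    return 0.5 * c * (user_mean + item_mean) + pref_score


# A user averaging 4.0 stars, an item averaging 3.0 stars, and a learned
# preference score of 0.3 give a baseline of 3.5 when c = 1.
print(final_rating(pref_score=0.3, user_mean=4.0, item_mean=3.0, c=1.0))
```

With c = 1 the network only has to learn a residual around the mean of the two historical averages, and with c = 0 it has to produce the full rating on its own.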

4.5. Model Training

Since social relations and user-item interactions are intrinsically linked, optimizing recommendation alone ignores structural signals, while isolating social modeling neglects true preferences. Thus, we jointly optimize link prediction and recommendation to achieve mutual reinforcement.

For social link prediction, we optimize the confidence score $\hat{r}_{if}$ using Binary Cross-Entropy (BCE) loss:

$$L^{\mathrm{BCE}} = - \sum_{(u_{i}, u_{f}) \in N_{S}} \log\left(\sigma(\hat{r}_{if})\right) - \sum_{(u_{i}, u_{w}) \notin N_{S}} \log\left(1 - \sigma(\hat{r}_{iw})\right), \tag{19}$$

where $(u_{i}, u_{f})$ denotes an observed social link, and $(u_{i}, u_{w})$ denotes an unobserved pair.

For recommendation, we employ Bayesian Personalized Ranking (BPR) loss [38] on the predicted scores:

$$L^{\mathrm{BPR}} = - \sum_{(u_{i}, v_{j}) \in R \,\wedge\, (u_{i}, v_{x}) \notin R} \ln \sigma\left(\hat{r}_{ij} - \hat{r}_{ix}\right). \tag{20}$$

The final joint objective function is:

$$L = \alpha L^{\mathrm{BCE}} + (1 - \alpha) L^{\mathrm{BPR}}, \tag{21}$$

where $\alpha \in [0, 1]$ balances the two tasks. The training process is summarized in Algorithm 1.
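The joint objective of Equations (19) to (21) can be sketched in NumPy as follows. The sketch assumes that the unobserved pairs and the negative items have already been sampled into arrays, which is an assumption about the training pipeline, and it uses `np.logaddexp` to evaluate the log-sigmoid stably. All function names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    """log(sigma(x)) computed stably as -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -np.asarray(x, dtype=float))


def bce_link_loss(observed_scores, unobserved_scores):
    """Sketch of Eq. (19); uses log(1 - sigma(x)) = log(sigma(-x))."""
    negated = -np.asarray(unobserved_scores, dtype=float)
    return -(log_sigmoid(observed_scores).sum() + log_sigmoid(negated).sum())


def bpr_loss(pos_scores, neg_scores):
    """Sketch of Eq. (20) over paired (observed, unobserved) item scores."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    return -log_sigmoid(diff).sum()


def joint_loss(l_bce, l_bpr, alpha):
    """Sketch of Eq. (21): convex combination of the two task losses."""
    return alpha * l_bce + (1.0 - alpha) * l_bpr


# Illustrative scores only.
l_bce = bce_link_loss(observed_scores=[2.0, 0.5], unobserved_scores=[-1.0])
l_bpr = bpr_loss(pos_scores=[4.2, 3.9], neg_scores=[2.5, 4.1])
print(joint_loss(l_bce, l_bpr, alpha=0.5))
```

Setting alpha to 0 recovers a pure recommendation objective and setting it to 1 trains only the link predictor, so intermediate values trade off the two signals.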

Algorithm 1: PDDSR

5. Experiments

To comprehensively validate the effectiveness of PDDSR, our experiments are designed to answer the following key research questions:

RQ1: Does PDDSR demonstrate superior performance compared to current state-of-the-art baselines in both rating prediction and item ranking tasks?

RQ2: How do the individual components within PDDSR contribute to the final recommendation results?

RQ3: Is PDDSR efficient enough for practical application in terms of runtime and space complexity?

RQ4: How sensitive is PDDSR to key hyperparameters (e.g., joint optimization weight $\alpha$, denoising rate $\gamma$, and average-rating weight $c$)?

5.1. Experiment Settings

5.1.1. Dataset

To validate the effectiveness of PDDSR in real-world social recommendation scenarios, we select two widely used benchmark datasets: Ciao and Epinions. Both originate from consumer-oriented product review platforms where users rate and comment on products using a 1–5 rating scale. Table 2 summarizes their statistical characteristics.

Furthermore, to empirically motivate the necessity of our debiasing module, we analyzed the underlying rating distributions of both datasets. Both Ciao and Epinions exhibit severe rating skewness, with the majority of interactions heavily concentrated on 4-star and 5-star ratings (accounting for over 60% of the total data). This long-tailed, imbalanced distribution confirms the pervasive presence of systematic user rating biases and item popularity skew. Without explicit calibration, models are prone to treating these heavily skewed subjective ratings as objective ground truth, leading to suboptimal preference learning.

5.1.2. Evaluation Metrics

To evaluate recommendation performance, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Recall, Normalized Discounted Cumulative Gain (NDCG), Precision, and Mean Average Precision (MAP) are adopted as evaluation metrics.

MAE measures the average magnitude of the errors in a set of predictions, while RMSE penalizes larger errors more heavily due to the squaring process, making it more sensitive to outliers [39]. The calculation formulas are defined as follows:

MAE = \frac{\sum_{(u_{i}, v_{j})} | r_{i j} - {\hat{r}}_{i j} |}{| N |},

(22)

RMSE = \sqrt{\frac{\sum_{(u_{i}, v_{j})} {(r_{i j} - {\hat{r}}_{i j})}^{2}}{| N |}},

(23)

where N denotes the set of ratings in the test set, $r_{ij}$ represents the actual rating given by user $u_{i}$ to item $v_{j}$, and $\hat{r}_{ij}$ is the rating predicted by the model. Although rating prediction metrics are secondary to top-N ranking performance in modern recommender systems, they still provide valuable insight into model accuracy. Because ratings in both the Ciao and Epinions datasets lie on a scale of 1 to 5, the maximum possible prediction error is 4. In this context, an MAE of approximately 0.66 corresponds to an average deviation of less than one star, or roughly one sixth of the maximum possible error, indicating that the model’s predictions remain close to users’ true ratings.
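As a minimal sketch of Equations (22) and (23), the two error metrics can be computed directly from (actual, predicted) pairs. The function names and the four toy ratings below are illustrative assumptions, not values from the experiments.

```python
import math

def mae(pairs):
    """Mean absolute error over (actual, predicted) rating pairs, following Eq. (22)."""
    return sum(abs(r - r_hat) for r, r_hat in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error over the same pairs, following Eq. (23)."""
    return math.sqrt(sum((r - r_hat) ** 2 for r, r_hat in pairs) / len(pairs))

# Hypothetical test ratings on the 1-5 scale: (actual, predicted).
test_pairs = [(5, 4.5), (3, 3.5), (4, 4.0), (1, 3.0)]

mae_value = mae(test_pairs)    # average absolute deviation in stars
rmse_value = rmse(test_pairs)  # the single 2-star miss is weighted more heavily here
print(mae_value, rmse_value)
```

In this toy case the one large error pulls RMSE above MAE, which is the outlier sensitivity described above.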

Recall measures the proportion of relevant items correctly recommended in the test set [40], defined as follows:

Recall = \frac{\sum_{u} | R (u) \cap T (u) |}{\sum_{u} | T (u) |},

(24)

where $R(u)$ denotes the set of items recommended to user u, and $T(u)$ denotes the set of test-set items that user u is interested in.
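Equation (24) pools hits and relevant items over all users before dividing. A small sketch of that computation follows; the dictionary layout and the toy users and items are assumptions made for illustration.

```python
def recall(recommended, relevant):
    """Recall as in Eq. (24): total hits over all users divided by total relevant items.

    recommended: dict mapping user -> list of recommended item ids, R(u)
    relevant:    dict mapping user -> set of held-out test items, T(u)
    """
    hits = sum(len(set(recommended.get(u, [])) & t_u) for u, t_u in relevant.items())
    total = sum(len(t_u) for t_u in relevant.values())
    return hits / total if total else 0.0

# Hypothetical recommendations and test items for two users.
recommended = {"u1": ["a", "b", "c"], "u2": ["d", "e"]}
relevant = {"u1": {"a", "x"}, "u2": {"e"}}

recall_value = recall(recommended, relevant)  # 2 hits out of 3 relevant items
print(recall_value)
```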

NDCG evaluates ranking quality by considering the positions of recommended items in the ranked list [41]. A higher NDCG value indicates that the model not only recommends relevant items but also ranks them at higher positions. This metric is calculated as the ratio of Discounted Cumulative Gain (DCG) to Ideal Discounted Cumulative Gain (IDCG):

\begin{aligned} DCG@K &= \sum_{j=1}^{K} \frac{2^{rel_{j}} - 1}{\log_{2}(j + 1)}, \\ IDCG@K &= \sum_{j=1}^{K} \frac{1}{\log_{2}(j + 1)}, \\ NDCG@K &= \frac{DCG@K}{IDCG@K}. \end{aligned}

(25)
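The three quantities in Equation (25) can be sketched as follows. The sketch assumes binary relevance labels (1 for a relevant item, 0 otherwise) and uses the IDCG exactly as written in Equation (25), i.e., an ideal list with a relevant item at every one of the K positions; the function names and example list are illustrative.

```python
import math

def dcg_at_k(rels, k):
    """DCG@K for a ranked list of relevance labels (position j is 1-based)."""
    return sum((2 ** rel - 1) / math.log2(j + 1) for j, rel in enumerate(rels[:k], start=1))

def idcg_at_k(k):
    """IDCG@K as written in Eq. (25): every one of the K positions holds a relevant item."""
    return sum(1.0 / math.log2(j + 1) for j in range(1, k + 1))

def ndcg_at_k(rels, k):
    """NDCG@K = DCG@K / IDCG@K."""
    return dcg_at_k(rels, k) / idcg_at_k(k)

# Hypothetical ranked list: relevant items at positions 1 and 3.
ranked_rels = [1, 0, 1]
ndcg_value = ndcg_at_k(ranked_rels, k=3)
print(ndcg_value)
```

Moving the second relevant item from position 3 to position 2 would raise the score, which is the position sensitivity the metric is meant to capture.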

Precision measures the exactness of the recommendations by calculating the proportion of genuinely relevant items within the top-K results. Furthermore, MAP evaluates the overall ranking quality by averaging the Average Precision (AP) across all users. Similar to NDCG, MAP is highly sensitive to item position; a higher score signifies that the model successfully prioritizes relevant items at the very top of the ranked list [42]. Formally, they are defined as follows:

Precision @ K = \frac{1}{| U |} \sum_{u = 1}^{| U |} \frac{|l_{rec}^{u} \cap l_{tes}^{u}|}{K},

(26)

MAP@K = \frac{1}{|U|} \sum_{u=1}^{|U|} \frac{\sum_{a=1}^{K} P_{u}(a) \times \delta_{u}(a)}{\min\{K, |l_{tes}^{u}|\}},

(27)

where $l_{rec}^{u}$ is the recommended list for user u, and $l_{tes}^{u}$ is the set of ground-truth test items for user u. $P_{u}(a)$ denotes the precision at cut-off a for user u, and $\delta_{u}(a)$ indicates whether the a-th item in the top-K list is a ground-truth item (1 if so, 0 otherwise).
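A per-user sketch of Equations (26) and (27) is given below; averaging the two functions over all users yields Precision@K and MAP@K. It assumes the recommended list contains no duplicates and that the ground truth is a set of item ids. All names and the toy data are hypothetical.

```python
def precision_at_k(rec_list, test_items, k):
    """Per-user term of Eq. (26): share of the top-K items that are ground-truth items."""
    return len(set(rec_list[:k]) & test_items) / k

def average_precision_at_k(rec_list, test_items, k):
    """Per-user term of Eq. (27): sum of P_u(a) * delta_u(a), normalized by min(K, |test|)."""
    hits, score = 0, 0.0
    for a, item in enumerate(rec_list[:k], start=1):
        if item in test_items:   # delta_u(a) = 1
            hits += 1
            score += hits / a    # P_u(a): precision at cut-off a
    denom = min(k, len(test_items))
    return score / denom if denom else 0.0

def mean_over_users(metric, rec_lists, test_sets, k):
    """Average a per-user metric over all users that have test items."""
    users = list(test_sets)
    return sum(metric(rec_lists[u], test_sets[u], k) for u in users) / len(users)

# Hypothetical single-user example: relevant items at positions 1 and 3.
rec_lists = {"u1": ["a", "b", "c", "d"]}
test_sets = {"u1": {"a", "c"}}

precision_value = mean_over_users(precision_at_k, rec_lists, test_sets, k=4)
map_value = mean_over_users(average_precision_at_k, rec_lists, test_sets, k=4)
print(precision_value, map_value)
```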

To evaluate the ranking performance, we follow the established sampled evaluation protocol commonly used in previous recommendation literature [43]. Specifically, instead of ranking all unobserved items for each user (the allItems protocol), we adopt the testItems strategy. For each ground-truth positive item in the test set, we pair it with 99 randomly sampled unobserved negative items that the user has not interacted with. The evaluation metrics—including Recall@K, NDCG@K, Precision@K, and MAP@K—are then computed based on the position of the true positive item within this ranked list of 100 candidates. This protocol allows for efficient evaluation while providing a reliable relative comparison among different models.
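The sampled protocol described above can be sketched as follows. The item universe, the interaction set, the seed, and the stand-in scoring function are all assumptions chosen for illustration; in practice the scorer would be the trained model.

```python
import random

def build_candidates(positive, interacted, all_items, n_neg=99, rng=random):
    """Return the held-out positive plus n_neg items the user has never interacted with."""
    pool = [i for i in all_items if i not in interacted and i != positive]
    return [positive] + rng.sample(pool, n_neg)

def rank_of_positive(positive, candidates, score):
    """1-based position of the positive after sorting candidates by descending score."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked.index(positive) + 1

# Toy setup: 1000 hypothetical item ids, a user with three training interactions,
# and a stand-in scorer that happens to prefer the held-out positive.
rng = random.Random(0)
all_items = range(1000)
interacted = {1, 2, 3}
positive = 42
candidates = build_candidates(positive, interacted, all_items, rng=rng)
rank = rank_of_positive(positive, candidates, score=lambda i: -abs(i - positive))

k = 20
hit = 1 if rank <= k else 0
precision_at_k = hit / k  # one positive per list, so this can never exceed 1 / k
print(len(candidates), rank, precision_at_k)
```

Because each candidate list holds exactly one positive, Precision@20 is bounded by 1/20 = 0.05 under this protocol, regardless of the model.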

5.1.3. Baselines

To ensure a fair and rigorous evaluation, we compare PDDSR against representative baselines across diverse paradigms, ranging from traditional and social recommendation methods to DNNs, GNNs, and graph denoising models.

Traditional Recommendation Methods

PMF [5] is a classical probabilistic MF model that relies solely on the user-item rating matrix, modeling latent factors via Gaussian distributions.

FunkSVD [19] extends traditional Singular Value Decomposition (SVD) by incorporating implicit feedback into the latent factor learning process.

Traditional Social Recommendation Method

TrustMF [10] employs MF to decompose trust networks according to directionality, mapping users into distinct truster and trustee spaces.

DNN-Based Recommendation Methods

NeuMF [20] utilizes MLPs to replace the inner product in MF, thereby learning non-linear matching functions between users and items.

DeepSoR [21] uses DNNs to extract user representations from social relations, integrating them into PMF for rating prediction.

GNN-Based Recommendation Methods

LightGCN [24] simplifies the GCN framework by discarding nonlinear activations and feature transformation matrices during message passing, and forms the final user and item representations as a weighted sum of the embeddings from all propagation layers.

GraphRec [12] jointly models user-item and user-user interactions, leveraging GNNs to aggregate neighbor embeddings and employing fully connected layers for prediction.

DiffNet++ [13] constructs social recommendation as a heterogeneous graph, recursively learning user embeddings from convolutions of social and interest neighbors. This injects higher-order structures from both domains into user modeling.

Graph Denoising-Based Recommendation Methods

SI-GAN [31] models user-item interactions and refines embeddings through diffusion-based influence propagation. It employs adversarial learning to enhance social embeddings and achieve effective feature fusion.

RecDiff [18] identifies and eliminates noise from encoded user representations via multi-step diffusion, performing denoising across varying noise levels.

5.1.4. Parameter Settings

Each dataset was randomly partitioned into training (80%), validation (10%), and testing (10%) sets. The validation set was used for hyperparameter tuning, and the test set for final performance evaluation. Experiments were conducted on an Ubuntu server equipped with an Intel Xeon Gold 6230 CPU (Intel Corporation, Santa Clara, CA, USA), 256 GB RAM, and a Tesla V100 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The model was implemented in Python 3.7 using PyTorch v1.13.1. The batch size was set to 1024, and the embedding dimension was set to 8. While this dimension might appear relatively low for general recommendation tasks, it was deliberately selected based on a preliminary grid search. Given the extreme sparsity of both the Ciao and Epinions datasets, we observed that larger embedding dimensions (e.g., 64 or 128) led to severe overfitting across all baseline models, rapidly degrading their test set performance. Setting d = 8 provided the most equitable baseline constraint, forcing all models to capture essential latent features without fitting to noise. For social graph denoising, we set the social network size threshold ϵ = 5 and parameter H = 0.02. Key hyperparameters were tuned via grid search: denoising coefficient γ ∈ {0.2, 0.5, 2.0, 3.0}, joint optimization weight α ∈ {0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0}, and average rating weight c ∈ {0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6}. Model parameters were initialized using PyTorch’s default uniform initialization.
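The split and search space above can be sketched in a few lines. The seed, the function name, and the choice to enumerate the full Cartesian product of the three grids are assumptions; the text does not state whether the hyperparameters were searched jointly or one at a time.

```python
import itertools
import random

def split_ratings(ratings, seed=0):
    """Randomly partition rating records into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Search space as listed in the text.
GRID = {
    "gamma": [0.2, 0.5, 2.0, 3.0],
    "alpha": [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0],
    "c": [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6],
}

# Assumed exhaustive grid: one configuration per combination of values.
configs = [dict(zip(GRID, values)) for values in itertools.product(*GRID.values())]

ratings = list(range(1000))  # stand-in for (user, item, rating) records
train, val, test = split_ratings(ratings)
print(len(train), len(val), len(test), len(configs))
```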

5.2. Performance Comparison (RQ1)

Table 3, Table 4, Table 5 and Table 6 report the comparative performance of different models on the Ciao and Epinions datasets, with the best and second-best results marked in bold and underlined, respectively. We summarize the key observations as follows:

PDDSR consistently outperformed all baseline models on both datasets in terms of MAE, RMSE, Recall, NDCG, Precision, and MAP. Furthermore, paired t-tests confirm that the improvements of PDDSR over the strongest baseline (e.g., SI-GAN or RecDiff) are statistically significant at the p < 0.05 level across all evaluated metrics. Specifically, PDDSR improves MAE and RMSE by 1.90% and 1.87% on the Ciao dataset, and by 2.05% and 1.65% on the Epinions dataset. We attribute this gain to PDDSR’s ability to disentangle genuine user intent from noisy observational data: by leveraging a preference-guided mechanism to filter out redundant social ties, it enhances ranking quality while maintaining high recall. Notably, as shown in Table 6, PDDSR closely approaches the theoretical upper bound of 0.05 for Precision@20 under the 100-item sampled evaluation (each candidate list contains exactly one positive item, so at most one of the top 20 items can be relevant), while its MAP@20 scores further confirm that it consistently ranks true positive items near the top of the lists.

DNN-based models generally surpass traditional MF. NeuMF, leveraging neural architectures, significantly outperforms PMF, highlighting the importance of nonlinear feature interaction modeling in capturing complex user behaviors. However, these methods are mostly confined to feature combinations within Euclidean space and fail to fully exploit the high-order topological information embedded in social networks. LightGCN captures high-order dependencies via graph propagation but is constrained by the lack of explicit social signals. In contrast, while GraphRec benefits from social relations, PDDSR further excels by addressing social noise and preference biases.

GNN-based social recommendation models, such as GraphRec and DiffNet++, exhibit strong competitiveness. By recursively aggregating neighbor information, they successfully model the propagation of social influence. However, experimental results indicate that their performance is often limited by their “indiscriminate neighbor aggregation” strategy. Without distinguishing the quality of social ties, noise and irrelevant connections are amplified during multi-layer graph convolution, leading to sub-optimal representation learning. In contrast, PDDSR employs preference-based confidence modeling to achieve a “soft selection” of social relations, effectively blocking noise propagation.

PDDSR outperforms state-of-the-art graph denoising methods (SI-GAN, RecDiff), underscoring the critical role of “debiasing”. Although SI-GAN and RecDiff effectively prune structural noise via adversarial learning or diffusion models, they often overlook the statistical biases prevalent in the data, such as user rating habits and item popularity. PDDSR distinguishes itself by not treating all observed ratings as ground truth. Instead, it adopts a “calibrate-then-denoise” approach: explicitly calibrating user ratings via learnable bias vectors to provide cleaner supervisory signals for the subsequent denoising module. This dual optimization at both the embedding level (debiasing) and the topological level (denoising) is key to PDDSR’s ability to overcome the performance bottlenecks of existing denoising methods.
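The paired t-test mentioned in the first observation can be sketched as below. The text does not specify what is paired, so this sketch assumes per-run MAE values from matched runs of two models; the numbers are made-up toy values, not results from the tables, and the critical value 2.776 is the standard two-sided 5% threshold for 4 degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Toy per-run MAE values for a baseline and a proposed model (illustrative only).
baseline_mae = [0.813, 0.812, 0.814, 0.814, 0.812]
proposed_mae = [0.800, 0.800, 0.800, 0.800, 0.800]

t_stat, dof = paired_t_statistic(baseline_mae, proposed_mae)
T_CRIT_DF4 = 2.776  # two-sided critical value at p = 0.05 for 4 degrees of freedom
significant = abs(t_stat) > T_CRIT_DF4
print(t_stat, dof, significant)
```

A statistic above the critical value indicates that the mean per-run difference is unlikely to be explained by run-to-run variance alone.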

5.3. Ablation Study (RQ2)

To evaluate the contribution of each key component in PDDSR, we conduct ablation studies using three model variants, as illustrated in Figure 3: (1) w/o DB, which removes the debiasing component; (2) w/o DN, which excludes the social graph denoising strategy; and (3) w/o BCE, which omits the social link prediction task. The results lead to the following observations.

(1): The w/o DB variant utilizes raw ratings to learn latent factor offsets instead of using bias-corrected score differences. As shown in the results, it consistently yields inferior Recall@10 and NDCG@10 scores across both datasets compared to the full PDDSR model. This indicates that relying directly on raw ratings leaves the model susceptible to inherent user and item biases. In contrast, the debiasing mechanism enables the model to learn more discriminative and robust latent representations, thereby enhancing recommendation performance. Furthermore, we acknowledge the pipeline interdependence in this variant: removing the debiasing module inevitably distorts the user-item preference signals used downstream for social denoising. This actually reinforces our core design philosophy—explicit debiasing is a crucial prerequisite. Without clean, bias-free representations, even an advanced denoising mechanism would operate on flawed confidence scores, leading to inaccurate graph pruning.
(2): Removing the social graph denoising process (w/o DN) leads to significant performance degradation on both datasets. This drop occurs because real-world social networks often contain numerous redundant or noisy relationships, rendering models vulnerable to irrelevant social connections. It is important to clarify the exact mechanics of this variant: while the link prediction loss ($L^{BCE}$) is retained as an auxiliary regularization task to train confidence scores, these scores are not actively used to prune the graph. The significant performance drop observed here confirms that merely learning tie confidence as a representation regularizer is insufficient; to prevent the indiscriminate aggregation of irrelevant ties, actively executing the graph pruning strategy based on these learned scores is strictly essential.
(3): The w/o BCE variant retains only the recommendation task, eliminating the social link prediction objective from the joint optimization framework. Its performance lags significantly behind PDDSR, demonstrating that relying solely on user–item preference modeling is insufficient. The social link prediction task provides essential supervision for both the denoising process and social relation modeling, enabling the model to utilize social information more effectively.

5.4. Runtime and Storage Analysis of Models (RQ3)

To evaluate the computational efficiency and memory overhead of PDDSR, we compare it against representative baselines (e.g., DiffNet++, SI-GAN, and RecDiff). We record the training time and peak GPU memory consumption under each model’s individually optimized settings, summarizing the results in Figure 4. To rigorously rule out normal training variance, all reported runtime metrics represent the average of five independent runs with different random seeds. Furthermore, rather than forcing identical training schedules, each model was trained until convergence, dictated by a strict early stopping criterion (i.e., training halts if the validation NDCG does not improve for 10 consecutive epochs). Importantly, the reported peak memory for PDDSR fully accounts for the additional computational overhead incurred during the periodic structural denoising updates.
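The early stopping criterion described above (halt when validation NDCG has not improved for 10 consecutive epochs) can be sketched as follows. The function name and the toy validation curve are assumptions; in a real run the NDCG values would come from evaluating the model after each epoch.

```python
def early_stopping_epoch(val_ndcg_per_epoch, patience=10):
    """Return (best_epoch, best_ndcg, last_epoch_run) for a sequence of validation NDCG values."""
    best_ndcg, best_epoch, stale, last_epoch = float("-inf"), -1, 0, -1
    for epoch, ndcg in enumerate(val_ndcg_per_epoch):
        last_epoch = epoch
        if ndcg > best_ndcg:
            # Strict improvement resets the patience counter.
            best_ndcg, best_epoch, stale = ndcg, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_ndcg, last_epoch

# Hypothetical curve: improves for three epochs, then plateaus slightly below the best.
curve = [0.10, 0.20, 0.30] + [0.29] * 15
best_epoch, best_ndcg, last_epoch = early_stopping_epoch(curve)
print(best_epoch, best_ndcg, last_epoch)
```

Under this rule, models that converge at different speeds run for different numbers of epochs, which is why the reported training times reflect time to convergence rather than a fixed schedule.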

Consistently across these rigorous evaluation conditions, PDDSR achieves the fastest average training speeds on both the Ciao and Epinions datasets, reducing training time by approximately 2.1–10.2% compared to competitors. This stems from its structure-aware parameterized diffusion design, which circumvents expensive multi-layer iterative aggregation and redundant attention operations. Regarding memory overhead, PDDSR exhibits consistently lower GPU consumption, showing notable reductions against DiffNet++ and RecDiff. This indicates that PDDSR maximizes parameter efficiency via its streamlined architecture, effectively bypassing storage burdens associated with high-order neighbor diffusion and multi-view modeling.

Addressing the critical trade-off between model complexity and performance gain, PDDSR presents a favorable profile. While the absolute metric improvements over state-of-the-art baselines (e.g., SI-GAN) might seem modest, paired t-tests confirm they are statistically significant (p < 0.05). More importantly, PDDSR achieves these gains without relying on heavier architectures. By employing the adaptive denoising mechanism to proactively prune redundant edges, the framework accelerates downstream graph convolutions rather than adding computational overhead. Consequently, this balance between recommendation effectiveness and reduced model complexity underscores the practicality of PDDSR for large-scale social recommender systems.

5.5. Parameter Sensitivity (RQ4)

5.5.1. Effect of Different Social Network Size Thresholds and Denoising Ratios

To validate the effectiveness of PDDSR’s adaptive denoising mechanism, we analyzed the joint impact of the social network size threshold ϵ and the denoising ratio coefficient γ on model performance, as illustrated in Figure 5. Horizontally, as γ increases from 0.2 to 3.0, performance initially rises and then declines, peaking at γ = 0.5. This confirms the necessity of moderate denoising: a low γ fails to effectively filter noise, while an excessive γ leads to over-smoothing, causing the loss of valuable social signals. Vertically, ϵ functions as a sparsity protection threshold, yielding peak results around ϵ = 5. This suggests that the mechanism successfully preserves connections for users with sparse ties while applying targeted filtering to dense nodes. Overall, the combination of ϵ = 5 and γ = 0.5 achieves the optimal balance between noise removal and information preservation. However, it is important to acknowledge that these optimal values are empirically derived from Ciao and Epinions, which share similar product review domains and comparable average social degrees. For datasets exhibiting significantly denser or sparser social topologies, these specific values may not transfer directly. In practice, ϵ and γ should be treated as adaptable hyperparameters that require recalibration based on the target network’s average social degree to maintain an appropriate denoising granularity.

5.5.2. Effect of Co-Optimization Under Different Weights

To examine the impact of joint optimization, we vary the weighting coefficient α within the range {0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0} to balance recommendation and link prediction losses. Figure 6 depicts the corresponding performance trends on both Ciao and Epinions. Performance initially improves as α increases from 0, peaking at α = 0.5 on both datasets before subsequently declining. This indicates that joint optimization facilitates the learning of more informative representations, thereby enhancing model performance. In contrast, setting α = 0 yields negligible improvement, suggesting that relying solely on user preference modeling is insufficient to effectively guide social denoising. Consequently, the link prediction task plays a pivotal role in enabling effective social denoising. Furthermore, we note that the observed optimal peak at α = 0.5 is inherently tied to our current experimental configuration, as the BCE and BPR loss functions are not explicitly normalized relative to each other. Because the raw magnitudes of these losses can be heavily influenced by training dynamics (such as the selected batch size and the ratio of positive to negative samples), the effective balance point will shift across different environments. Therefore, the optimal α reported here is a dataset-dependent finding, emphasizing its role as a flexible tuning parameter rather than a universally fixed constant.

5.5.3. Effect of Different Average Rating Weights

To analyze the effect of the average rating weight c on prediction performance, c was varied in {0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6}, as shown in Figure 7. Decreasing c from 1 to 0 causes a noticeable decline in Recall@10 and NDCG@10, indicating that negligible rating weights distort the predicted rating distribution. Conversely, increasing c beyond 1 (up to 1.6) also degrades performance, suggesting that an overemphasis on global rating statistics dilutes personalized preference modeling. Therefore, a balanced integration of global statistics and personalized preferences is essential for achieving optimal performance.

6. Conclusions

In this paper, we propose PDDSR, a debiased and denoised social recommendation model designed to address rating bias drift and interference from irrelevant social relations. PDDSR explicitly models rating bias as a learnable vector integrated into representation learning, effectively mitigating bias drift at the embedding level. Furthermore, it employs a preference-oriented social graph denoising strategy to filter noise while adopting adaptive denoising ratios to accommodate varying social strengths. Extensive experiments on the Ciao and Epinions datasets demonstrate that PDDSR not only achieves superior recommendation accuracy but also exhibits high computational efficiency, reducing training time and memory overhead compared to complex GNN baselines.

Despite these promising results, PDDSR currently focuses on static social structures, neglecting the temporal evolution of user interests and social ties. In future work, we plan to incorporate dynamic graph learning to capture these temporal shifts; specifically, we will investigate how temporal dynamics might affect and continuously adjust our adaptive denoising ratio over time. Additionally, we aim to integrate Large Language Models (LLMs) to enhance the semantic understanding of user reviews. By extracting rich semantic signals from textual feedback, we can provide deeper contextual supervision to further refine our explicit debiasing mechanism, moving beyond reliance on purely numerical rating statistics.

Author Contributions

Conceptualization, J.L.; methodology, J.L. and S.L.; software, J.L. and S.L.; validation, J.L. and S.L.; formal analysis, H.Z.; investigation, J.L. and S.L.; resources, S.Z.; data curation, J.L. and S.L.; writing—original draft preparation, J.L.; writing—review and editing, H.Z.; supervision, S.Z.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the Key Research Project in Natural Sciences of Anhui Provincial Universities (2022AH052062), the Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (Climbing Program Special Funds) under Grant pdjh2025ak028, the Cybersecurity College Student Innovation Funding Program (Topsec Technologies Group Inc), and the Outstanding Innovative Talents Cultivation Funded Programs for Graduate Students of Jinan University under Grants 2025CXY336, 2025CXY339, and 2025CXY402.

Institutional Review Board Statement

Ethical review and approval were waived for this study, as it solely utilizes publicly available, fully anonymized datasets (Ciao and Epinions), in accordance with the exemption criteria outlined in the U.S. OHRP Common Rule (45 CFR 46.104(d)(4)).

Informed Consent Statement

Informed consent was not required, as this study is algorithmic research utilizing publicly available, anonymized datasets.

Data Availability Statement

The Ciao and Epinions datasets utilized in this study can be accessed at https://www.cse.msu.edu/~tangjili/trust.html (accessed on 9 May 2026).

Conflicts of Interest

The authors have received research grants from Topsec Technologies Group Inc. The funder had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

References

Raza, S.; Rahman, M.; Kamawal, S.; Toroghi, A.; Raval, A.; Navah, F.; Kazemeini, A. A comprehensive review of recommender systems: Transitioning from theory to practice. Comput. Sci. Rev. 2026, 59, 100849.
Yoo, H.; Qiu, R.; Xu, C.; Wang, F.; Tong, H. Generalizable recommender system during temporal popularity distribution shifts. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, ON, Canada, 3–7 August 2025; Association for Computing Machinery: New York, NY, USA, 2025.
Deldjoo, Y.; He, Z.; McAuley, J.; Korikov, A.; Sanner, S.; Ramisa, A.; Vidal, R.; Sathiamoorthy, M.; Kasirzadeh, A.; Milano, S. A review of modern recommender systems using generative models (gen-recsys). In Proceedings of the 30th ACM SIGKDD conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 6448–6458.
Roy, D.; Dutta, M. A systematic review and research perspective on recommender systems. J. Big Data 2022, 9, 59.
Mnih, A.; Salakhutdinov, R.R. Probabilistic matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems 20 (NIPS 2007), Vancouver, BC, Canada, 3–9 December 2007.
Yang, B.; Lei, Y.; Liu, J.; Li, W. Social collaborative filtering by trust. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1633–1647.
Rajput, S.; Mehta, N.; Singh, A.; Hulikal Keshavan, R.; Vu, T.; Heldt, L.; Hong, L.; Tay, Y.; Tran, V.; Samost, J.; et al. Recommender systems with generative retrieval. Adv. Neural Inf. Process. Syst. 2023, 36, 10299–10315.
Zhu, J.; Dai, Q.; Su, L.; Ma, R.; Liu, J.; Cai, G.; Xiao, X.; Zhang, R. Bars: Towards open benchmarking for recommender systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 2912–2923.
Yu, J.; Yin, H.; Gao, M.; Xia, X.; Zhang, X.; Viet Hung, N.Q. Socially-aware self-supervised tri-training for recommendation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual, 14–18 August 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2084–2092.
Guo, G.; Zhang, J.; Yorke-Smith, N. Trustsvd: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; AAAI Press: Washington, DC, USA, 2015; Volume 29.
Song, L.; Bi, Y.; Yao, M.; Wu, Z.; Wang, J.; Xiao, J. Dream: A dynamic relation-aware model for social recommendation. In Proceedings of the 29th ACM international Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 2225–2228.
Fan, W.; Ma, Y.; Li, Q.; He, Y.; Zhao, E.; Tang, J.; Yin, D. Graph neural networks for social recommendation. In Proceedings of The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 417–426.
Wu, L.; Li, J.; Sun, P.; Hong, R.; Ge, Y.; Wang, M. Diffnet++: A neural influence and interest diffusion network for social recommendation. IEEE Trans. Knowl. Data Eng. 2020, 34, 4753–4766.
Xiong, F.; Sun, H.; Luo, G.; Pan, S.; Qiu, M.; Wang, L. Graph attention network with high-order neighbor information propagation for social recommendation. In Proceedings of the IJCAI-24: Thirty-Third International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence, Jeju, Republic of Korea, 3–9 August 2024.
Yang, L.; Liu, Z.; Dou, Y.; Ma, J.; Yu, P.S. Consisrec: Enhancing gnn for social recommendation via consistent neighbor aggregation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2141–2145.
Dai, E.; Jin, W.; Liu, H.; Wang, S. Towards robust graph neural networks for noisy graphs with sparse labels. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual, 21–25 February 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 181–191.
Wang, W.; Feng, F.; He, X.; Nie, L.; Chua, T.S. Denoising implicit feedback for recommendation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 373–381.
Li, Z.; Xia, L.; Huang, C. Recdiff: Diffusion model for social recommendation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1346–1355.
Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; Association for Computing Machinery: New York, NY, USA, 2008; pp. 426–434.
He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 173–182.
Fan, W.; Li, Q.; Cheng, M. Deep modeling of social relations for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018; Volume 32.
Wu, L.; Sun, P.; Fu, Y.; Hong, R.; Wang, X.; Wang, M. A neural influence diffusion model for social recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 235–244.
Chen, X.; Lei, P.I.; Sheng, Y.; Liu, Y.; Gong, Z. Social influence learning for recommendation systems. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 312–322.
He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 639–648.
Fu, B.; Zhang, W.; Hu, G.; Dai, X.; Huang, S.; Chen, J. Dual side deep context-aware modulation for social recommendation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2524–2534.
Liao, J.; Zhou, W.; Luo, F.; Wen, J.; Gao, M.; Li, X.; Zeng, J. SocialLGN: Light graph convolution network for social recommendation. Inf. Sci. 2022, 589, 595–607.
Xia, L.; Shao, Y.; Huang, C.; Xu, Y.; Xu, H.; Pei, J. Disentangled graph social recommendation. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2332–2344.
Zhang, M.; Liao, X.; Wang, X.; Wang, X.; Jin, L. Multi-neighbor social recommendation with attentional graph convolutional network. Data Min. Knowl. Discov. 2025, 39, 21.
Yang, Y.; Wu, L.; Wang, Z.; He, Z.; Hong, R.; Wang, M. Graph bottlenecked social recommendation. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 3853–3862.
Wang, Y.; Xin, X.; Meng, Z.; Jose, J.M.; Feng, F.; He, X. Learning robust recommenders through cross-model agreement. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 2015–2025.
Wang, J.; Li, H.; Mo, T.; Li, W. Adversarial Learning Enhanced Social Interest Diffusion Model for Recommendation. In Proceedings of the International Conference on Database Systems for Advanced Applications, Tianjin, China, 17–20 April 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 357–372.
Li, J.; Wang, H. Graph diffusive self-supervised learning for social recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 2442–2446.
Khan, B.; Wu, J.; Yang, J.; Ma, X. Heterogeneous hypergraph neural network for social recommendation using attention network. ACM Trans. Recomm. Syst. 2025, 3, 30.
Wang, W.; Zhang, W.; Liu, S.; Liu, Q.; Zhang, B.; Lin, L.; Zha, H. Incorporating link prediction into multi-relational item graph modeling for session-based recommendation. IEEE Trans. Knowl. Data Eng. 2021, 35, 2683–2696.
Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919.
Lindenfors, P.; Wartel, A.; Lind, J. Dunbar’s number deconstructed. Biol. Lett. 2021, 17, 20210158.
Chen, P.; Ye, J.; Chen, G.; Zhao, J.; Heng, P.A. Beyond class-conditional assumption: A primary attempt to combat instance-dependent label noise. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; AAAI Press: Washington, DC, USA, 2021; Volume 35, pp. 11442–11450.
Hu, Y.; Xiong, F.; Pan, S.; Xiong, X.; Wang, L.; Chen, H. Bayesian personalized ranking based on multiple-layer neighborhoods. Inf. Sci. 2021, 542, 156–176.
Ga, S.; Cho, P.H.; Moon, G.E.; Jung, S. Efficient GNN-based social recommender systems through social graph refinement. J. Supercomput. 2025, 81, 215.
Yang, L.; Wang, S.; Tao, Y.; Sun, J.; Liu, X.; Yu, P.S.; Wang, T. Dgrec: Graph neural network for recommendation with diversified embedding generation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 661–669.
Yang, Y.; Wu, L.; Zhang, K.; Hong, R.; Zhou, H.; Zhang, Z.; Zhou, J.; Wang, M. Hyperbolic graph learning for social recommendation. IEEE Trans. Knowl. Data Eng. 2023, 36, 8488–8501.
Zhang, B.; Tian, Y.; Li, C.; Liang, J.; Ye, Y. EGCL: An Effective and Efficient Graph Contrastive Learning Framework for Social Recommendation. ACM Trans. Inf. Syst. 2026, 44, 58. [Google Scholar] [CrossRef]
Bellogin, A.; Castells, P.; Cantador, I. Precision-oriented evaluation of recommender systems: An algorithmic comparison. In Proceedings of the Fifth ACM Conference on Recommender Systems, Chicago, IL, USA, 23–27 October 2011; Association for Computing Machinery: New York, NY, USA, 2011; pp. 333–336. [Google Scholar]

Figure 1. Schematic illustration of noisy social ties and rating biases in social recommendation.

Figure 2. The overall architecture of the PDDSR framework.

Figure 3. Ablation study results of different variants.

Figure 4. Runtime and storage comparison of several models in achieving optimal performance.

Figure 5. Impact of different social network size thresholds ϵ and denoising ratio coefficients γ.

Figure 6. Impact of different co-optimization weights α.

Figure 7. Impact of different average rating weights c.

Table 1. Comparison of PDDSR with representative baseline methods.

Method	Paradigm	Explicit Bias Modeling	Structural Denoising	Preference-Guided Confidence
GraphRec [12]	GNN	×	×	×
DiffNet++ [13]	GNN	×	×	×
ConsisRec [15]	Denoising	×	✓	Partial
SI-GAN [31]	Denoising	×	✓	×
RecDiff [18]	Diffusion	×	✓	×
PDDSR	GNN + Denoising	✓	✓	✓

Note: ✓ indicates that the method includes the corresponding feature; × indicates that the feature is not included.

Table 2. Statistics for the two datasets.

Dataset	Ciao	Epinions
Users	7317	18,088
Items	104,975	261,649
Ratings	283,319	764,352
Social Relations	111,781	355,813
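
The sparsity implied by Table 2 can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the authors' pipeline: the names `datasets` and `summarize` are hypothetical, and rating density is assumed to mean ratings divided by the number of possible user-item pairs.

```python
# Dataset statistics copied from Table 2.
datasets = {
    "Ciao": {"users": 7317, "items": 104_975,
             "ratings": 283_319, "social": 111_781},
    "Epinions": {"users": 18_088, "items": 261_649,
                 "ratings": 764_352, "social": 355_813},
}


def summarize(stats):
    """Derive simple sparsity figures from raw counts (illustrative helper)."""
    return {
        # Assumed definition: observed ratings / all possible user-item pairs.
        "rating_density": stats["ratings"] / (stats["users"] * stats["items"]),
        # Average number of ratings contributed by one user.
        "ratings_per_user": stats["ratings"] / stats["users"],
        # Average number of social relations per user (direction not considered).
        "social_per_user": stats["social"] / stats["users"],
    }


summary = {name: summarize(stats) for name, stats in datasets.items()}

for name, s in summary.items():
    print(f"{name}: density={s['rating_density']:.4%}, "
          f"ratings/user={s['ratings_per_user']:.1f}, "
          f"relations/user={s['social_per_user']:.1f}")
```

Under this definition, both rating matrices are extremely sparse: about 0.037% of possible user-item pairs are rated in Ciao and about 0.016% in Epinions.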

Table 3. Performance comparison under MAE and RMSE.

Model	Ciao MAE	Ciao RMSE	Epinions MAE	Epinions RMSE
PMF	0.9520	1.1967	1.0211	1.2739
FunkSVD	0.8462	1.0513	0.9036	1.1431
TrustMF	0.7681	1.0543	0.8550	1.1505
NeuMF	0.8251	1.0824	0.9097	1.1645
DeepSoR	0.7739	1.0316	0.8383	1.0972
LightGCN	0.7715	1.0203	0.8717	1.1103
GraphRec	0.7540	1.0093	0.8441	1.0878
DiffNet++	0.7459	0.9987	0.8435	1.0795
SI-GAN	0.6810	0.9507	0.7709	0.9827
RecDiff	0.6921	0.9498	0.7782	0.9806
PDDSR	0.6683	0.9324	0.7554	0.9647
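
The abstract reports Ciao improvements of 1.90% (MAE) and 1.87% (RMSE). The sketch below recomputes relative improvement from the Ciao columns of Table 3. It is an illustration only: the helper names are hypothetical, and because the normalizer is not stated in this part of the text, both common choices (the best baseline's error and PDDSR's own error) are computed.

```python
# Ciao results copied from Table 3 (lower is better for both metrics).
ciao = {
    "PMF": {"MAE": 0.9520, "RMSE": 1.1967},
    "FunkSVD": {"MAE": 0.8462, "RMSE": 1.0513},
    "TrustMF": {"MAE": 0.7681, "RMSE": 1.0543},
    "NeuMF": {"MAE": 0.8251, "RMSE": 1.0824},
    "DeepSoR": {"MAE": 0.7739, "RMSE": 1.0316},
    "LightGCN": {"MAE": 0.7715, "RMSE": 1.0203},
    "GraphRec": {"MAE": 0.7540, "RMSE": 1.0093},
    "DiffNet++": {"MAE": 0.7459, "RMSE": 0.9987},
    "SI-GAN": {"MAE": 0.6810, "RMSE": 0.9507},
    "RecDiff": {"MAE": 0.6921, "RMSE": 0.9498},
    "PDDSR": {"MAE": 0.6683, "RMSE": 0.9324},
}


def best_baseline(results, metric, ours="PDDSR"):
    """Lowest error for `metric` among all models except `ours`."""
    return min(v[metric] for name, v in results.items() if name != ours)


def relative_gain(baseline, ours, denominator):
    """Error reduction in percent; the normalizer is an assumption."""
    return 100.0 * (baseline - ours) / denominator


gains = {}
for metric in ("MAE", "RMSE"):
    base = best_baseline(ciao, metric)
    ours = ciao["PDDSR"][metric]
    gains[metric] = {
        "vs_baseline": relative_gain(base, ours, base),  # normalize by baseline
        "vs_ours": relative_gain(base, ours, ours),      # normalize by PDDSR
    }
    print(f"{metric}: best baseline={base:.4f}, PDDSR={ours:.4f}, "
          f"gain/baseline={gains[metric]['vs_baseline']:.2f}%, "
          f"gain/PDDSR={gains[metric]['vs_ours']:.2f}%")
```

With these table values, normalizing by PDDSR's own error gives roughly 1.90% for MAE and 1.87% for RMSE, which matches the figures in the abstract. Normalizing by the best baseline gives slightly smaller values, roughly 1.86% and 1.83%.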

Table 4. Performance comparison under Recall@K (K = 10 or 20).

Model	Ciao Recall@10	Ciao Recall@20	Epinions Recall@10	Epinions Recall@20
PMF	0.9685	0.9693	0.9645	0.9648
FunkSVD	0.9663	0.9674	0.9651	0.9659
TrustMF	0.9672	0.9678	0.9676	0.9682
NeuMF	0.9687	0.9695	0.9672	0.9677
DeepSoR	0.9676	0.9679	0.9669	0.9686
LightGCN	0.9674	0.9682	0.9677	0.9682
GraphRec	0.9718	0.9736	0.9682	0.9689
DiffNet++	0.9736	0.9752	0.9687	0.9694
SI-GAN	0.9732	0.9745	0.9695	0.9712
RecDiff	0.9741	0.9756	0.9692	0.9723
PDDSR	0.9785	0.9823	0.9713	0.9772

Table 5. Performance comparison under NDCG@K (K = 10 or 20).

Model	Ciao NDCG@10	Ciao NDCG@20	Epinions NDCG@10	Epinions NDCG@20
PMF	0.8745	0.8762	0.8273	0.8296
FunkSVD	0.8782	0.8794	0.8367	0.8381
TrustMF	0.8706	0.8742	0.8472	0.8489
NeuMF	0.8781	0.8792	0.8301	0.8342
DeepSoR	0.8732	0.8756	0.8496	0.8523
LightGCN	0.9094	0.9128	0.8754	0.8782
GraphRec	0.9146	0.9182	0.8814	0.8847
DiffNet++	0.9149	0.9185	0.8851	0.8872
SI-GAN	0.9167	0.9196	0.8869	0.8892
RecDiff	0.9158	0.9202	0.8867	0.8906
PDDSR	0.9184	0.9274	0.8883	0.8965

Table 6. Performance comparison under Precision@20 and MAP@20.

Model	Ciao Precision@20	Ciao MAP@20	Epinions Precision@20	Epinions MAP@20
PMF	0.0485	0.8251	0.0482	0.7764
FunkSVD	0.0483	0.8282	0.0483	0.7855
TrustMF	0.0484	0.8236	0.0484	0.7981
NeuMF	0.0485	0.8286	0.0483	0.7821
DeepSoR	0.0484	0.8248	0.0484	0.8037
LightGCN	0.0486	0.8654	0.0485	0.8295
GraphRec	0.0487	0.8715	0.0484	0.8369
DiffNet++	0.0486	0.8722	0.0485	0.8398
SI-GAN	0.0487	0.8735	0.0486	0.8421
RecDiff	0.0488	0.8743	0.0485	0.8436
PDDSR	0.0491	0.8831	0.0489	0.8512
