A GAN-Based Approach Incorporating Dempster–Shafer Theory to Mitigate Rating Noise in Collaborative Filtering

Belgacem, Ouahiba; Boudaa, Boudjemaa; Kouadria, Abderrahmane; Abouaissa, Abdelhafid

doi:10.3390/digital5040057


¹ Laboratoire de Génie Energétique et Génie Informatique (L2GEGI), University of Tiaret, Tiaret 14000, Algeria
² Department of Computer Science, University of Tiaret, Tiaret 14000, Algeria
³ Laboratoire de Recherche en Informatique de Sidi Bel Abbes (LabRI-SBA), Ecole Supérieure en Informatique, Sidi Bel Abbes 22000, Algeria
⁴ Institut de Recherche en Informatique, Mathématiques, Automatique et Signal (IRIMAS), University of Haute-Alsace, 68000 Mulhouse, France
* Author to whom correspondence should be addressed.

Digital 2025, 5(4), 57; https://doi.org/10.3390/digital5040057

Submission received: 19 August 2025 / Revised: 11 October 2025 / Accepted: 15 October 2025 / Published: 20 October 2025


Abstract

Collaborative filtering (CF) continues to be a fundamental approach in recommendation systems for providing users with personalized suggestions. However, such systems are prone to performance degradation when faced with noisy, inconsistent, or deliberately manipulated user ratings. Although Generative Adversarial Networks (GANs) offer promising solutions for capturing complex user–item interactions in CF, many existing GAN-based methods assume uniform reliability across all ratings, which reduces their effectiveness under uncertain conditions. To overcome this challenge, this paper presents DST-AttentiveGAN, a confidence-aware adversarial framework specifically designed to denoise inconsistent ratings in collaborative filtering. The proposed approach employs Dempster–Shafer Theory (DST) to compute confidence scores by aggregating diverse behavioral indicators, such as item popularity, user activity, and rating variance. These scores guide both components of the GAN architecture: the generator incorporates a cross-attention mechanism to highlight trustworthy features, while the discriminator uses DST-based confidence to evaluate the credibility of input ratings. Training is carried out using a stabilized Wasserstein GAN objective that promotes both robustness and convergence efficiency. Experimental results on three benchmark datasets show that DST-AttentiveGAN consistently surpasses conventional GAN-based models, delivering more accurate and reliable recommendations under conditions of uncertainty.

Keywords:

recommender system; collaborative filtering; noisy rating; Generative Adversarial Networks; Dempster-Shafer theory

1. Introduction

Recommender systems (RS) have become essential components of modern digital platforms, helping users navigate vast information spaces by providing personalized content [1]. Among various recommendation approaches, Collaborative Filtering (CF) [2] remains one of the most effective techniques, relying on historical user-item interactions to infer preferences. CF has been successfully deployed in diverse domains such as e-commerce, streaming services, and social media [3,4]. However, CF models are highly sensitive to data quality. The presence of noisy, inconsistent, or malicious ratings can distort user similarity patterns and severely degrade recommendation performance [5,6].

To mitigate these challenges and enhance CF performance under adverse conditions, recent works have turned to Generative Adversarial Networks (GANs) as a means of modeling complex user-item interactions. GANs consist of a generator, which synthesizes data, and a discriminator, which distinguishes real from generated samples [7]. Their adversarial training setup enables the generation of realistic and diverse rating profiles, making them suitable for tasks such as simulating user behavior, improving latent representations, and augmenting sparse data [8,9]. For example, GANRS [10] introduced synthetic profile generation but struggled with mode collapse, a challenge later addressed in WGANRS [11] through a Wasserstein loss. CFGAN [12] modeled user-item interactions in a latent space, while GANMF [13] combined matrix factorization with an autoencoder-based discriminator to enhance personalization. AACF [14] introduced attention and virtual items to improve training efficiency in sparse conditions. Despite these advancements, existing GAN-based models generally treat all ratings as equally reliable, without considering the varying degrees of uncertainty or credibility in user feedback. This assumption limits their ability to filter or correct misleading inputs, particularly in real-world CF systems where rating reliability can vary significantly.

In parallel, Dempster–Shafer Theory (DST) has established itself as a powerful mathematical framework for uncertain reasoning and evidence fusion [15,16]. It has been successfully applied in diverse fields such as spatial object matching [17], hyperspectral image classification [18], groundwater potential mapping [19], and urban resilience assessment [20], where data imperfections, ambiguity, and conflicts are prevalent. These applications highlight DST’s versatility in representing uncertainty, aggregating heterogeneous evidence, and supporting robust decision-making in complex environments.

Within the realm of recommender systems, DST has been effectively applied to capture user preferences under incomplete or ambiguous conditions [21]. Prior works have used DST for trust estimation, multi-criteria recommendation [22], and multimodal fusion [23]. A recent approach [24] further enhances Evidential Collaborative Filtering by employing Deng entropy and the Best Worst Method to optimize the assessment of user reliability. By assigning belief masses instead of deterministic labels, DST provides a flexible mechanism for expressing varying levels of confidence, making it particularly well suited for environments with noisy or unreliable data.

Although both DST and GANs have shown individual success in the recommendation domain, their integration remains underexplored. In this paper, we present DST-AttentiveGAN, a novel confidence-aware architecture that embeds DST-derived trust signals into both the generator and the discriminator of a GAN framework for rating denoising. This unified approach allows the model to distinguish between reliable and unreliable user feedback more effectively during training and inference. The main contributions of this work are as follows:

We propose a novel DST-based framework for quantifying the reliability of ratings using multiple sources of evidence (item popularity, variance, and user activity).
We design a generator that employs a cross-attention mechanism to selectively focus on reliable rating components, guided by the DST-derived confidence scores.
We develop a discriminator conditioned on both rating values and DST-based trust levels, trained using a Wasserstein GAN loss with gradient penalty to ensure training stability.
We perform extensive evaluations on benchmark datasets under various noise settings, showing that DST-AttentiveGAN consistently outperforms existing GAN-based recommendation models in terms of robustness and accuracy.

The remainder of this paper is organized as follows. Section 2 details the proposed DST-AttentiveGAN architecture, including the construction of the confidence matrix, the cross-attentive generator, and the evidentially conditioned discriminator. Section 3 describes the experimental settings, including datasets, evaluation metrics, and implementation details, and presents a detailed performance comparison against state-of-the-art baselines. Finally, Section 4 concludes the paper and outlines promising future research directions.

2. Method

Collaborative filtering recommender systems are often challenged by noisy, biased, or adversarial ratings that compromise prediction reliability. To address these issues, we propose DST-AttentiveGAN, a DST-guided Generative Adversarial Network (GAN) architecture that integrates evidence-based confidence modeling and cross-attention mechanisms. The model is designed to reconstruct denoised rating profiles while being guided by the estimated trustworthiness of each user-item interaction. Figure 1 illustrates the overall architecture of our proposed DST-AttentiveGAN.

2.1. Frame of Discernment

To systematically characterize the reliability of user–item interactions, we define the frame of discernment as the hypothesis set given in Equation (1):

$$\Theta = \{\text{Reliable}, \text{Noisy}\} \qquad (1)$$

where Reliable corresponds to the assumption that a rating $r_{ui}$ is trustworthy and informative, while Noisy corresponds to the assumption that the rating is affected by bias, inconsistency, or adversarial manipulation.

2.2. Evidence Construction

Let $R \in \mathbb{R}^{N \times M}$ be the user–item rating matrix, where N is the number of users and M the number of items. To assess the reliability of each individual rating $r_{ui}$, we construct a confidence matrix $C = [c_{ui}]$ based on three forms of evidence: item popularity, item rating variance, and user activity.

First, the popularity of an item i is defined as the proportion of users who have interacted with it, as shown in Equation (2), where N denotes the total number of users and $\mathbb{I}(r_{ui} > 0)$ is an indicator function that equals 1 if user u rated item i, and 0 otherwise.

$$p_i = \frac{1}{N} \sum_{u=1}^{N} \mathbb{I}(r_{ui} > 0) \qquad (2)$$

Next, the variance of the ratings received by item i captures the level of disagreement among users, as defined in Equation (3), where $U_i$ denotes the set of users who rated item i:

$$v_i = \frac{1}{|U_i|} \sum_{u \in U_i} \Big( r_{ui} - \frac{1}{|U_i|} \sum_{u' \in U_i} r_{u'i} \Big)^2 \qquad (3)$$

Since this value is not confined to [0, 1] and can vary significantly across items, it is normalized using min–max scaling over all items:

$$v_i^{norm} = \frac{v_i - \min(v)}{\max(v) - \min(v)} \qquad (4)$$

To reflect user involvement, we define the activity of user u as the fraction of items they have rated, as shown in Equation (5), where M is the total number of items and $\mathbb{I}(\cdot)$ is the same indicator function as in Equation (2):

$$a_u = \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}(r_{ui} > 0) \qquad (5)$$

Although $a_u$ is also bounded within [0, 1], we apply min–max normalization for consistency across all evidence sources:

$$a_u^{norm} = \frac{a_u - \min(a)}{\max(a) - \min(a)} \qquad (6)$$
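To make Equations (2)–(6) concrete, the following is a minimal, dependency-free Python sketch of the evidence construction. It assumes that R is a dense N × M list of lists in which 0 marks a missing rating, that an item with no ratings gets variance 0, and that a constant vector is min–max scaled to all zeros; these conventions are not specified in the text, and the function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def min_max(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; a constant vector maps to all zeros (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def evidence_signals(R: Matrix) -> Tuple[List[float], List[float], List[float]]:
    """Return (p, v_norm, a_norm) following Equations (2)-(6).

    R is an N x M list of lists where 0 marks a missing rating (assumption).
    """
    n_users, n_items = len(R), len(R[0])
    # Equation (2): fraction of users who rated each item.
    p = [sum(1 for u in range(n_users) if R[u][i] > 0) / n_users for i in range(n_items)]
    # Equation (3): population variance over the observed ratings U_i of item i.
    v = []
    for i in range(n_items):
        observed = [R[u][i] for u in range(n_users) if R[u][i] > 0]
        if not observed:
            v.append(0.0)  # item without ratings: variance set to 0 (assumption)
            continue
        mean = sum(observed) / len(observed)
        v.append(sum((r - mean) ** 2 for r in observed) / len(observed))
    # Equation (5): fraction of items each user rated.
    a = [sum(1 for r in row if r > 0) / n_items for row in R]
    # Equations (4) and (6): min-max normalization across items and users.
    return p, min_max(v), min_max(a)
```

For example, `evidence_signals([[5, 3, 0], [4, 0, 0], [1, 3, 2]])` gives item popularities 1, 2/3, and 1/3, normalized variances 1, 0, and 0 (only the first item receives disagreeing ratings), and normalized activities of approximately 0.5, 0, and 1.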

The three evidential signals are first expanded into full N × M matrices P, V, and A, with $P_{ui} = p_i$, $V_{ui} = v_i$, and $A_{ui} = a_u$, ensuring dimensional compatibility with the original rating matrix R. Together, they form the basis for a confidence matrix that captures the trust level of each user–item interaction. Since these signals provide complementary and sometimes conflicting perspectives on rating reliability, we adopt a fusion strategy guided by Dempster–Shafer Theory (DST).

DST is particularly well-suited here because it: (i) allows explicit modeling of uncertainty when evidence is incomplete, (ii) manages conflict between heterogeneous signals without forcing premature decisions, and (iii) is more flexible than classical probability theory, which generally assumes precise prior probabilities, while DST can handle situations where such priors are unavailable or only partially defined. This makes DST especially appropriate for recommender systems, where reliability signals are often noisy, incomplete, or contradictory.
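For readers less familiar with DST, the following sketch shows the classical Dempster rule of combination on the frame Θ = {Reliable, Noisy}. It is not the paper's fusion rule (Equation (7)); it only illustrates how DST represents ignorance (mass assigned to Θ itself) and measures the conflict K between two sources before renormalizing by 1 - K. The mass values in the usage example are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Focal = FrozenSet[str]
REL = frozenset({"Reliable"})
NOISY = frozenset({"Noisy"})
THETA = REL | NOISY  # the whole frame: mass here expresses ignorance


def dempster_combine(m1: Dict[Focal, float], m2: Dict[Focal, float]) -> Dict[Focal, float]:
    """Dempster's rule of combination for two mass functions on the same frame."""
    combined: Dict[Focal, float] = {}
    conflict = 0.0  # K: total mass on pairs of focal sets with empty intersection
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize the non-conflicting mass by 1 - K.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}
```

With hypothetical masses {Reliable: 0.6, Noisy: 0.1, Θ: 0.3} and {Reliable: 0.5, Noisy: 0.2, Θ: 0.3}, the conflict is K = 0.17, and the combined masses are 0.63/0.83 ≈ 0.759 for Reliable, 0.11/0.83 ≈ 0.133 for Noisy, and 0.09/0.83 ≈ 0.108 for Θ.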

In our framework, fusion is operationalized through a nonlinear aggregation that emphasizes agreements while penalizing conflicts, as shown in Equation (7):

$$c_{ui} = p_i v_i + v_i a_u + p_i a_u - 2\, p_i v_i a_u \qquad (7)$$

This formulation serves as a practical instantiation of DST's principles: the pairwise terms strengthen consistent signals, while the triple interaction term reduces the influence of contradictory evidence. Provided that $p_i$, $v_i$, and $a_u$ all lie in [0, 1] (i.e., variance and activity are taken in their normalized forms from Equations (4) and (6)), the resulting matrix $C = [c_{ui}]$ satisfies $c_{ui} \in [0, 1]$ and encodes a trust-aware confidence score for each rating. This fused representation is subsequently provided as auxiliary guidance to both the generator and discriminator in the proposed DST-AttentiveGAN.
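One way to read Equation (7), offered as our own observation rather than a claim made by the authors: for p, v, a ∈ [0, 1], the expression pv + va + pa - 2pva is exactly the probability that at least two of three independent binary sources vote Reliable, with success probabilities p, v, and a. This explains why $c_{ui}$ stays in [0, 1] and why agreement between any two signals raises confidence. The sketch below computes C by broadcasting the item-level and user-level signals into an N × M matrix (the P, V, A expansion) and checks the majority-vote identity by enumeration; the function names are illustrative.

```python
from itertools import product
from typing import List


def fuse(p: float, v: float, a: float) -> float:
    """Equation (7): c = pv + va + pa - 2pva, intended for p, v, a in [0, 1]."""
    return p * v + v * a + p * a - 2.0 * p * v * a


def majority_of_three(p: float, v: float, a: float) -> float:
    """Probability that at least two of three independent Bernoulli(p), (v), (a) events occur."""
    total = 0.0
    for bits in product((0, 1), repeat=3):
        if sum(bits) >= 2:
            prob = 1.0
            for bit, q in zip(bits, (p, v, a)):
                prob *= q if bit else 1.0 - q
            total += prob
    return total


def confidence_matrix(p: List[float], v_norm: List[float], a_norm: List[float]) -> List[List[float]]:
    """Broadcast item signals (p, v_norm) and user signals (a_norm) into the N x M matrix C."""
    return [[fuse(p[i], v_norm[i], a_norm[u]) for i in range(len(p))]
            for u in range(len(a_norm))]
```

For instance, an item with full popularity and maximal normalized variance rated by a user of median activity (p = 1, v = 1, a = 0.5) receives c = 1, since two of the three sources already agree with certainty.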

2.3. Generator Architecture with DST-Guided Cross-Attention

The generator G aims to reconstruct a denoised rating vector $\hat{x}_u = G(x_u, c_u)$ for each user u, where $x_u \in \mathbb{R}^M$ denotes the user's normalized (and potentially noisy) rating vector, and $c_u \in \mathbb{R}^M$ represents a vector of DST-based confidence scores that quantify the reliability of each rating entry.

To effectively leverage this evidential guidance, we employ a cross-attention mechanism that enables the generator to selectively focus on trustworthy components of $x_u$. As shown in Equation (8), the input vectors are projected into a shared latent space using learnable linear transformations:

$$Q = W_Q c_u^{\top}, \quad K = W_K x_u^{\top}, \quad V = W_V x_u^{\top} \qquad (8)$$

where $W_Q, W_K, W_V \in \mathbb{R}^{d \times M}$ are trainable weight matrices and d is the dimensionality of the attention space. Here, $c_u$ denotes the confidence vector derived from Dempster–Shafer Theory (DST), which is used as the query, while $x_u$ serves as the key and value for attention computation.

The core of the attention mechanism is the scaled dot-product operation (Equation (9)), which computes an attention-weighted combination of the rating-derived values based on the confidence signals:

$$z_u = \mathrm{softmax}\Big(\frac{Q K^{\top}}{\sqrt{d}}\Big) V \qquad (9)$$

Because the queries are derived from the confidence vector while the keys and values are derived from the rating vector, this operation lets the confidence signals determine how strongly each latent rating component contributes to $z_u$, thereby prioritizing trustworthy entries while down-weighting uncertain ones.
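The following is a minimal sketch of Equations (8) and (9) in plain Python. Because $W_Q, W_K, W_V \in \mathbb{R}^{d \times M}$ map M-dimensional vectors to d-dimensional ones, we assume Q, K, and V are d × 1 column vectors, so $QK^{\top}$ is a d × d matrix whose rows are softmax-normalized and $z_u \in \mathbb{R}^d$. The text does not spell out these shapes, and the random weights, d = 3, and M = 4 used in the check are purely illustrative.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(row: Vector) -> Vector:
    m = max(row)  # subtract the row maximum for numerical stability
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(x_u: Vector, c_u: Vector, W_Q: Matrix, W_K: Matrix,
                    W_V: Matrix) -> Tuple[Vector, Matrix]:
    """Equations (8)-(9): confidences supply the query, ratings supply key and value."""
    Q = matvec(W_Q, c_u)  # (d,)
    K = matvec(W_K, x_u)  # (d,)
    V = matvec(W_V, x_u)  # (d,)
    d = len(Q)
    # With Q and K as d x 1 columns, Q K^T is the d x d outer product.
    scores = [[Q[j] * K[k] / math.sqrt(d) for k in range(d)] for j in range(d)]
    weights = [softmax(row) for row in scores]  # row-wise softmax
    z_u = [sum(w * v for w, v in zip(row, V)) for row in weights]  # softmax(.) V
    return z_u, weights


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical Gaussian initialization for the projection matrices."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
```

Each row of the returned weight matrix sums to 1, so every entry of $z_u$ is a convex combination of the rating-derived values V. Under this literal reading, attention operates over the d latent dimensions rather than over the M items; a per-item formulation would need different projections.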

The attention output $z_u$ is then concatenated with the original rating vector $x_u$, and the resulting representation is passed through a feedforward neural network to obtain the reconstructed rating vector, as defined in Equation (10):

$$\hat{x}_u = \sigma\big(\mathrm{Dense}_2(\mathrm{ReLU}(\mathrm{Dense}_1([z_u; x_u])))\big) \qquad (10)$$

The sigmoid activation $\sigma(\cdot)$ ensures that the final output lies within the normalized rating range [0, 1]. This cross-attention structure allows the generator to incorporate evidential trust into its latent representation, guiding the denoising process in a personalized and reliability-aware manner.
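Equation (10) can be sketched as a two-layer perceptron applied to the concatenation $[z_u; x_u]$. The hidden width, Gaussian initialization, and zero biases below are hypothetical choices, not values reported in the paper; the sigmoid is written in a numerically stable form so that outputs stay within the normalized rating range.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, xi) for xi in x]


def sigmoid(x: Vector) -> Vector:
    """Numerically stable logistic function applied element-wise."""
    out = []
    for xi in x:
        if xi >= 0:
            out.append(1.0 / (1.0 + math.exp(-xi)))
        else:
            e = math.exp(xi)
            out.append(e / (1.0 + e))
    return out


def generator_head(z_u: Vector, x_u: Vector, params) -> Vector:
    """Equation (10): x_hat = sigmoid(Dense2(ReLU(Dense1([z_u; x_u]))))."""
    (W1, b1), (W2, b2) = params
    h = relu(dense(W1, b1, z_u + x_u))  # list concatenation implements [z_u; x_u]
    return sigmoid(dense(W2, b2, h))


def init_params(d: int, m: int, hidden: int, rng: random.Random):
    """Hypothetical initialization: Dense1 maps d + M -> hidden, Dense2 maps hidden -> M."""
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(d + m)] for _ in range(hidden)]
    W2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(m)]
    return (W1, [0.0] * hidden), (W2, [0.0] * m)
```

The output has one entry per item (M), so it can be compared entry by entry with $x_u$ in the reconstruction loss.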

Generator Loss Function. The generator is trained using a composite objective that combines adversarial and reconstruction losses. As shown in Equation (11), the total loss $L_G$ includes two components:

$$L_G = -\mathbb{E}_{x_u, c_u}\big[D(G(x_u, c_u), c_u)\big] + \lambda_r \cdot L_{rec}(\hat{x}_u, x_u) \qquad (11)$$

The first term is an adversarial loss derived from the Wasserstein GAN framework, which encourages the generator to produce denoised vectors that the discriminator cannot distinguish from real ratings. The second term, weighted by the hyperparameter $\lambda_r$, enforces closeness to the observed ratings $x_u$ through a squared-error reconstruction penalty (Equation (12)), which is proportional to the mean squared error over the M entries:

$$L_{rec}(\hat{x}_u, x_u) = \lVert \hat{x}_u - x_u \rVert_2^2 \qquad (12)$$

The hybrid objective in Equation (11) ensures that the generator produces outputs that are both adversarially plausible and numerically accurate, while aligning with the DST-derived evidence during the attention and reconstruction processes.
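As a minimal sketch of Equations (11) and (12) over a minibatch, the expectation is replaced by a batch mean, and the critic scores are assumed to be precomputed discriminator outputs on the generated vectors. The value of λ_r used in the check is illustrative only.

```python
from typing import Sequence


def reconstruction_loss(x_hat: Sequence[float], x: Sequence[float]) -> float:
    """Equation (12): squared L2 norm of the residual (M times the per-entry MSE)."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))


def generator_loss(critic_scores: Sequence[float], x_hats, xs, lambda_r: float) -> float:
    """Equation (11) on a minibatch: -mean D(G(x, c), c) + lambda_r * mean L_rec.

    critic_scores[k] is the discriminator output for the k-th generated vector.
    """
    n = len(critic_scores)
    adversarial = -sum(critic_scores) / n
    rec = sum(reconstruction_loss(xh, x) for xh, x in zip(x_hats, xs)) / n
    return adversarial + lambda_r * rec
```

For instance, with critic scores [2.0, 0.0], one generated vector that is off by 0.5 in a single entry, and λ_r = 10, the loss is -1.0 + 10 × 0.125 = 0.25.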

2.4. Discriminator Architecture with Evidential Conditioning

The discriminator D is designed to assess the authenticity of user rating vectors by leveraging evidential trust cues derived from Dempster–Shafer Theory (DST). Specifically, it receives as input a pair $(x_u, c_u) \in \mathbb{R}^M \times \mathbb{R}^M$, where $x_u$ denotes either a real or generated (denoised) normalized rating vector for user u, and $c_u$ is the associated DST-based confidence vector.

To jointly process both rating values and their corresponding confidence scores, the two vectors are concatenated into a single input, as shown in Equation (13):

$$h_0 = [x_u; c_u] \in \mathbb{R}^{2M} \qquad (13)$$

This representation is passed through a sequence of fully connected layers with LeakyReLU activations, enabling the network to model non-linear dependencies between ratings and their associated trust signals. The final output is an unbounded scalar score (Equation (14)):

$$D(x_u, c_u) \in \mathbb{R} \qquad (14)$$

which reflects how strongly the discriminator judges the input pair to be an authentic (real) rating profile; as is standard for a WGAN critic, this score is not a probability. By integrating DST-based evidence into the decision process, the discriminator not only evaluates statistical realism but also learns to recognize evidential consistency.
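The discriminator of Equations (13) and (14) can be sketched as an MLP over $h_0 = [x_u; c_u]$ that ends in a single linear unit with no output activation, as is usual for a WGAN critic. The two-layer depth, hidden width, and LeakyReLU slope of 0.2 are assumptions; the paper only states that fully connected layers with LeakyReLU activations are used.

```python
import random
from typing import List

Vector = List[float]


def leaky_relu(x: Vector, slope: float = 0.2) -> Vector:
    return [xi if xi > 0 else slope * xi for xi in x]


def dense(W, b, x: Vector) -> Vector:
    """Fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def critic(x_u: Vector, c_u: Vector, layers) -> float:
    """Equations (13)-(14): score the pair (x_u, c_u) with an MLP."""
    h = x_u + c_u  # h_0 = [x_u; c_u], length 2M
    for W, b in layers[:-1]:
        h = leaky_relu(dense(W, b, h))
    W_out, b_out = layers[-1]
    return dense(W_out, b_out, h)[0]  # unbounded real-valued score, no sigmoid


def init_critic(m: int, hidden: int, rng: random.Random):
    """Hypothetical two-layer critic: 2M -> hidden -> 1."""
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(2 * m)] for _ in range(hidden)]
    W2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)]]
    return [(W1, [0.0] * hidden), (W2, [0.0])]
```

Because the confidence vector is part of the input, the same rating vector can receive different scores under different confidence profiles, which is what allows the critic to account for evidential consistency.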

The discriminator is trained using the Wasserstein GAN with Gradient Penalty (WGAN-GP) framework, which stabilizes adversarial learning and promotes meaningful gradients. The loss function is formulated as shown in Equation (15):

$$L_D = \mathbb{E}_{\hat{x}_u \sim P_G}\big[D(\hat{x}_u, c_u)\big] - \mathbb{E}_{x_u \sim P_r}\big[D(x_u, c_u)\big] + \lambda_{GP} \cdot L_{GP} \qquad (15)$$

Here, $P_r$ and $P_G$ denote the distributions of real and generated rating vectors, respectively, and $\lambda_{GP}$ is a regularization coefficient. To softly enforce the 1-Lipschitz constraint required by WGAN, the gradient penalty term $L_{GP}$ is computed using interpolated samples $\tilde{x}_u$, as defined in Equation (16):

$$\tilde{x}_u = \alpha x_u + (1 - \alpha)\hat{x}_u, \quad \alpha \sim U(0, 1) \qquad (16)$$

The penalty term itself is given by Equation (17):

$$L_{GP} = \mathbb{E}_{\tilde{x}_u}\Big[\big(\lVert \nabla_{\tilde{x}_u} D(\tilde{x}_u, c_u) \rVert_2 - 1\big)^2\Big] \qquad (17)$$

Equation (15) ensures that the discriminator learns to distinguish real from generated profiles while incorporating trust-aware signals. The gradient regularization term in Equation (17) stabilizes training, and the interpolation defined in Equation (16) enables robust gradient estimation. Together, these formulations allow the discriminator to guide the generator toward producing outputs that are both statistically plausible and evidentially coherent.
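The critic objective in Equations (15)–(17) can be sketched as follows. Practical implementations obtain $\nabla_{\tilde{x}_u} D$ by automatic differentiation; to keep this sketch dependency-free, we substitute a central finite-difference estimate, which is only suitable for small illustrative inputs. The critic D is any callable taking (x, c), and λ_GP = 10 in the check is an illustrative value.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Critic = Callable[[Vector, Vector], float]


def numerical_grad(D: Critic, x: Vector, c: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference estimate of dD/dx (a stand-in for autodiff)."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((D(xp, c) - D(xm, c)) / (2.0 * eps))
    return grad


def critic_loss(D: Critic, reals: Sequence[Vector], fakes: Sequence[Vector],
                confs: Sequence[Vector], lambda_gp: float, rng: random.Random) -> float:
    """Equations (15)-(17) over a minibatch of (real, generated, confidence) triples."""
    n = len(reals)
    fake_term = sum(D(xh, c) for xh, c in zip(fakes, confs)) / n
    real_term = sum(D(x, c) for x, c in zip(reals, confs)) / n
    penalty = 0.0
    for x, xh, c in zip(reals, fakes, confs):
        alpha = rng.random()  # alpha ~ U(0, 1), Equation (16)
        x_tilde = [alpha * a + (1.0 - alpha) * b for a, b in zip(x, xh)]
        g = numerical_grad(D, x_tilde, c)
        norm = sum(gi * gi for gi in g) ** 0.5
        penalty += (norm - 1.0) ** 2  # Equation (17)
    return fake_term - real_term + lambda_gp * penalty / n
```

With the linear critic D(x, c) = 0.6·x[0] + 0.8·x[1] + sum(c), whose gradient with respect to x has unit norm, the penalty vanishes and the loss reduces to the Wasserstein term; a critic whose gradient norm is 2 instead adds λ_GP × (2 - 1)² = λ_GP per sample.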

Algorithm 1 summarizes the evidentially guided GAN framework for denoising user–item ratings.

Algorithm 1 Evidentially-Conditioned GAN for Rating Denoising

Require: Raw rating matrix $R \in \mathbb{R}^{N \times M}$
Ensure: Denoised rating matrix $\hat{R} \in \mathbb{R}^{N \times M}$
1: // Step 1: Evidence Construction
2: for each item i do
3:   Compute item popularity: $p_i = \frac{1}{N} \sum_{u=1}^{N} \mathbb{I}(r_{ui} > 0)$ ▹ Equation (2)
4:   Compute rating variance: $v_i = \frac{1}{|U_i|} \sum_{u \in U_i} \big(r_{ui} - \frac{1}{|U_i|} \sum_{u' \in U_i} r_{u'i}\big)^2$ ▹ Equation (3)
5: end for
6: for each user u do
7:   Compute user activity: $a_u = \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}(r_{ui} > 0)$ ▹ Equation (5)
8: end for
9: Compute confidence matrix: $c_{ui} = p_i v_i + v_i a_u + p_i a_u - 2\, p_i v_i a_u$ ▹ Equation (7)
10: // Step 2: Generator Forward Pass
11: for each user u do
12:   Input: rating vector $x_u$, confidence vector $c_u$
13:   Project to latent space: $Q = W_Q c_u^{\top}, K = W_K x_u^{\top}, V = W_V x_u^{\top}$ ▹ Equation (8)
14:   Compute attention: $z_u = \mathrm{softmax}(Q K^{\top} / \sqrt{d})\, V$ ▹ Equation (9)
15:   Concatenate and generate: $\hat{x}_u = \sigma(\mathrm{Dense}_2(\mathrm{ReLU}(\mathrm{Dense}_1([z_u; x_u]))))$ ▹ Equation (10)
16: end for
17: // Step 3: Discriminator Forward Pass
18: for each user u do
19:   Compute $D(x_u, c_u)$ and $D(\hat{x}_u, c_u)$ using an MLP ▹ Equation (14)
20: end for
21: // Step 4: Adversarial Training Loop
22: for each epoch do
23:   Sample a minibatch of users
24:   Discriminator update: minimize $L_D = \mathbb{E}[D(\hat{x}, c)] - \mathbb{E}[D(x, c)] + \lambda_{GP} \cdot L_{GP}$ ▹ Equation (15)
25:   Generator update: minimize $L_G = -\mathbb{E}_{x_u, c_u}[D(G(x_u, c_u), c_u)] + \lambda_r \cdot \lVert \hat{x}_u - x_u \rVert_2^2$ ▹ Equation (11)
26: end for
27: return Denoised rating matrix $\hat{R}$

3. Results and Discussion

This section presents the experimental results of DST-AttentiveGAN and provides a comprehensive discussion of its performance. We start by detailing the experimental setup. Then, we report and compare the model’s performance against several baseline methods. Finally, we analyze the impact of DST-AttentiveGAN on rating distributions to highlight its ability to denoise and structure user-item interactions more effectively.

3.1. Experimental Setup

The experimental setup includes the datasets used, evaluation metrics, and implementation details. These components provide the basis for assessing the performance and robustness of DST-AttentiveGAN in various recommendation scenarios.

3.1.1. Datasets

The experiments are conducted on three real-world datasets to validate the robustness of our DST-AttentiveGAN. Table 1 provides a detailed summary of their statistical properties.

MovieLens 1M (https://grouplens.org/datasets/movielens/1m/) (accessed on 16 June 2025): This dataset contains 1,000,209 ratings from 943 users on 1682 movies. Ratings range from 1 to 5. It is well-structured and widely adopted in collaborative filtering research.
Netflix Subset (https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data) (accessed on 16 June 2025): We use a curated subset of the Netflix Prize dataset with 3000 users and 4000 movies, totaling 982,142 ratings on a 1–5 scale. It reflects a sparser and more realistic recommendation setting.
Amazon Electronics (https://nijianmo.github.io/amazon/index.html) (accessed on 16 June 2025): This dataset includes 874,319 ratings from 5000 users on 4000 electronic products. Ratings range from 1 to 5. It presents challenges such as high sparsity and implicit noise, making it suitable for evaluating robustness.

3.1.2. Evaluation Metrics

To evaluate recommendation quality, we adopt three widely used metrics: HR@K, Precision@K, and NDCG@K, each computed at $K \in \{5, 10, 20\}$.

Precision@K: Precision@K evaluates the proportion of recommended items in the top-K list that are relevant [25]. It focuses on the accuracy of top-ranked results and is defined in Equation (18):

$$\mathrm{Precision@}K = \frac{1}{|U|} \sum_{u \in U} \frac{|\mathrm{Rel}_u \cap \mathrm{Top\text{-}}K(u)|}{K} \qquad (18)$$

where $\mathrm{Rel}_u$ is the set of relevant items for user u, and $\mathrm{Top\text{-}}K(u)$ denotes the top-K items recommended to user u.

Hit Ratio (HR@K): HR@K measures whether the ground-truth item appears within the top-K predicted list [26]. It is defined in Equation (19):

$$\mathrm{HR@}K = \frac{1}{|U|} \sum_{u \in U} \mathbb{I}\big(r_u \in \mathrm{Top\text{-}}K(u)\big) \qquad (19)$$

where $r_u$ is the relevant item for user u, and $\mathbb{I}(\cdot)$ is the indicator function.

Normalized Discounted Cumulative Gain (NDCG@K): NDCG@K rewards correct recommendations at higher ranks [27]. It is calculated as shown in Equation (20):

$$\mathrm{NDCG@}K = \frac{1}{|U|} \sum_{u \in U} \frac{1}{\mathrm{IDCG}_u} \sum_{i=1}^{K} \frac{2^{\mathbb{I}(r_u = \mathrm{Top\text{-}}K(u)_i)} - 1}{\log_2(i + 1)} \qquad (20)$$

where $\mathrm{Top\text{-}}K(u)_i$ is the item at rank i in the list recommended to user u, and $\mathrm{IDCG}_u$ denotes the ideal DCG for user u.

As shown in Equations (19) and (20), these metrics allow us to assess not only whether the target item is retrieved, but also how well it is ranked in the recommendation list.
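The following per-user sketch computes Equations (18)–(20) under binary relevance, where $2^{rel} - 1$ reduces to the relevance indicator itself; with a single relevant item, $\mathrm{IDCG}_u = 1$ and NDCG@K equals $1/\log_2(\text{rank} + 1)$. Integer item identifiers are an assumption, and dataset-level scores are obtained by averaging over users.

```python
import math
from typing import Iterable, List, Set


def precision_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Equation (18), one user: fraction of the top-k items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def hit_ratio_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Equation (19), one user: 1 if a relevant item appears in the top-k list."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Equation (20), one user, binary relevance: DCG / IDCG with log2(rank + 1) discounts."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_users(scores: Iterable[float]) -> float:
    """The 1/|U| average applied in Equations (18)-(20)."""
    values = list(scores)
    return sum(values) / len(values)
```

For a ranked list [7, 3, 9, 1, 4] and relevant item 3, Precision@5 = 0.2, HR@5 = 1, HR@1 = 0, and NDCG@5 = 1/log₂3 ≈ 0.631.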

3.1.3. Hyperparameter Settings

All models were implemented using Python 3.10. For the proposed DST-AttentiveGAN, we adopted the Adam optimizer, tuning the learning rate in $\{0.0005, 0.001, 0.005\}$ and the batch size in $\{128, 256, 512\}$ to select the best configuration. Early stopping was employed with a patience of 10 epochs to prevent overfitting.
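The tuning protocol described above can be sketched as a grid over the stated learning rates and batch sizes combined with patience-based early stopping. The selection criterion (lowest validation loss) and the `train_fn` callback, which yields one validation loss per epoch, are hypothetical; because `stopping_epoch` consumes losses lazily, passing a generator stops training as soon as the patience is exhausted.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

LEARNING_RATES = (0.0005, 0.001, 0.005)  # grid stated in the text
BATCH_SIZES = (128, 256, 512)            # grid stated in the text
PATIENCE = 10                            # early-stopping patience in epochs


def stopping_epoch(val_losses: Iterable[float], patience: int = PATIENCE) -> Tuple[int, float]:
    """Consume per-epoch validation losses until `patience` epochs pass without improvement.

    Returns (index of the best epoch, best loss).
    """
    best_epoch, best = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best_epoch, best = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best


def grid_search(train_fn: Callable[[float, int], Iterable[float]]) -> Dict[str, float]:
    """Evaluate every (learning rate, batch size) pair and keep the lowest validation loss.

    `train_fn` is a hypothetical callback yielding one validation loss per epoch.
    """
    results: List[Tuple[float, float, int]] = []
    for lr, bs in product(LEARNING_RATES, BATCH_SIZES):
        _, best = stopping_epoch(train_fn(lr, bs))
        results.append((best, lr, bs))
    best, lr, bs = min(results)
    return {"val_loss": best, "learning_rate": lr, "batch_size": bs}
```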

The top-K values used for evaluation were set to $K \in \{5, 10, 20\}$, consistent with standard practice in the literature. All baseline models were trained using their available implementations and the hyperparameter settings reported in their respective original papers to ensure a fair comparison.

3.2. Empirical Results

To validate the effectiveness of our proposed DST-AttentiveGAN, we compare it against several strong baseline models. Each baseline is implemented with its best-reported hyperparameters and retrained on the same datasets for a fair comparison. The results are presented in Table 2, Table 3, and Table 4, which report the performance on the MovieLens, Amazon Electronics, and Netflix datasets, respectively.

WGANRS [11]: An extension of GANRS that incorporates a Wasserstein loss with gradient penalty to improve stability and reduce mode collapse. It better captures the rating distribution over large synthetic datasets.
CFGAN [12]: A collaborative filtering GAN that models user-item interactions as latent representations and learns to generate plausible user vectors for top-K recommendations.
GANMF [13]: A GAN model that integrates matrix factorization into the generator and autoencoder-based discriminator to improve rating predictions, targeting both sparsity and cold-start problems.
AACF [14]: An attentive adversarial collaborative filtering model that introduces an attention mechanism and virtual items to improve GAN training on discrete recommendation data. It is optimized for scalability and convergence.
Self-AttentiveGAN: A variant of DST-AttentiveGAN where the DST-guided attention mechanism is removed. The model uses only plain self-attention in both the generator and discriminator, allowing us to evaluate the contribution of DST signals to rating prediction and top-K recommendation performance.

3.3. Evaluation of Prediction Accuracy via MAE and MSE

While HR@K, NDCG@K, and Precision@K are standard metrics for evaluating recommendation performance, they primarily capture ranking quality and do not directly reflect the model’s ability to predict rating values accurately. To complement these measures, we analyzed Mean Absolute Error (MAE) and Mean Squared Error (MSE) over training epochs across the MovieLens, Netflix, and Amazon datasets.
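A minimal sketch of the MAE and MSE computation, assuming errors are measured only on observed entries (ratings greater than 0) and that predictions and ground truth are on the same scale; the text does not state which entries or which scale are used.

```python
from typing import List, Tuple


def mae_mse(pred: List[List[float]], true: List[List[float]]) -> Tuple[float, float]:
    """Mean absolute and mean squared error over observed entries (true > 0 assumed observed)."""
    errors = [p - t for prow, trow in zip(pred, true) for p, t in zip(prow, trow) if t > 0]
    n = len(errors)
    return sum(abs(e) for e in errors) / n, sum(e * e for e in errors) / n
```

For example, errors of -0.5 and 1.0 on two observed entries give MAE = 0.75 and MSE = 0.625; because MSE squares each error, it penalizes the larger deviation more heavily than MAE does.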

As shown in Figure 2, both MAE and MSE decrease consistently during training, indicating that DST-AttentiveGAN effectively reconstructs user ratings while denoising noisy inputs. MovieLens exhibits the lowest errors, reflecting its higher density, whereas Amazon presents higher and more variable errors due to its greater sparsity. Netflix falls in between, showing moderate error reduction.

These results highlight that, beyond ranking performance, the model achieves accurate rating prediction, with MAE and MSE providing complementary insights into the reliability and stability of the learned representations.

3.4. Discussion and Analysis

In this subsection, we analyze the results obtained from our experiments and discuss the effectiveness of DST-AttentiveGAN across different datasets and conditions. We explore its impact on the structure of user-item rating distributions. These insights provide a deeper understanding of the model’s robustness and adaptability in real-world recommendation scenarios.

3.4.1. Performance Comparison

The experimental findings demonstrate that DST-AttentiveGAN consistently outperforms strong GAN-based baselines across all evaluation metrics (HR@K, NDCG@K, Precision@K). These improvements are both substantial and consistent across datasets of varying sparsity and noise characteristics, which confirms the model’s robustness and adaptability.

On the MovieLens dataset, which is relatively dense and less noisy, our model still achieves over 7% improvement in Precision@K, demonstrating its ability to leverage confidence signals even in stable rating environments. For the highly sparse and biased Amazon Electronics dataset, the model yields up to +9.59% in NDCG@10 and +8.01% in Precision@5, showcasing strong resilience to noisy feedback. In the Netflix dataset, where inconsistency in user ratings is common, DST-AttentiveGAN records consistent performance boosts across all ranks and metrics, indicating its generalization capability under varying rating behaviors.

These quantitative gains are attributed to the integration of Dempster–Shafer Theory and the cross-attention mechanism. The former provides a principled way to model belief and uncertainty in ratings, while the latter dynamically weighs rating components according to their DST-derived confidence. This synergy enables the generator to correct unreliable feedback and preserve high-confidence signals, resulting in more accurate and relevant recommendations.

3.4.2. Impact of DST-AttentiveGAN on Rating Distribution Refinement

The effectiveness of DST-AttentiveGAN is highlighted by its ability to transform noisy ratings into a more structured and reliable distribution. Figure 3 shows how the original ratings across the datasets, particularly in Amazon and Netflix, are dominated by noise. In Amazon, this is characterized by inflated 5-star ratings, leading to a skewed distribution, while Netflix exhibits ambiguity in midrange ratings, which reflects conflicting user sentiment or indecision. Such noisy distributions arise from various sources, including biased reviews, adversarial feedback, and platform-induced inflation, all of which obscure true user preferences and affect the reliability of recommendation models.

To address this, the model introduces confidence scores, as shown in Figure 4, which quantify the model's trust in each rating. These scores are critical for distinguishing reliable feedback from uncertain or biased ratings. High-confidence ratings indicate strong alignment with the user's true preferences and are thus given more weight in the denoising process. In contrast, low-confidence ratings suggest ambiguity or inconsistency and are adjusted or filtered out. By deriving confidence scores from item popularity, rating variance, and user activity, the model is able to systematically identify which ratings should be preserved and which should be down-weighted.

Each box in Figure 4 illustrates the distribution of confidence values generated by our model for a given rating level. The horizontal line inside the box represents the median confidence, while the box boundaries indicate the interquartile range (IQR). The whiskers extend to values within 1.5 × IQR, and the points above or below the whiskers denote outlier confidence values that are significantly higher or lower than the main confidence distribution.

Following this process, Figure 5 illustrates the impact of the denoising step, where the distribution of ratings is refined. The high-confidence ratings are reinforced, resulting in sharper, more prominent peaks, particularly for 4 and 5 stars. The 3-star ratings, which often represent conflicting or mixed sentiments, are smoothed or adjusted, reducing their impact on the final output. This transformation significantly improves the rating distribution, making it more interpretable and aligned with true user preferences.

The denoised distributions, especially in Amazon and Netflix, become clearer and less skewed, with more reliable feedback that is better suited for downstream recommendation. In cleaner datasets such as MovieLens, the denoising process is less aggressive, as the confidence scores naturally highlight the reliability of existing ratings without requiring extensive adjustments. As a result, the model adapts to dataset-specific characteristics, making only the necessary adjustments.

By integrating confidence scores, DST-AttentiveGAN ensures that the denoising process produces a structured, reliable distribution. This yields less biased feedback, which is critical for building accurate user profiles in recommendation models and ultimately leads to better personalized recommendations.

3.4.3. Potential Extensions of DST Integration

To examine the effect of DST, we compare DST-AttentiveGAN with Self-AttentiveGAN, its version without DST signals. The results show consistent gains across datasets. On MovieLens, DST improves HR and NDCG by around +10%, confirming benefits even in relatively dense settings. Amazon, which is both sparse and noisy, shows larger improvements, with +16.7% on HR@10 and over +10% on NDCG, highlighting DST’s ability to handle uncertain feedback. Netflix achieves the most pronounced gains, with HR@5 and NDCG metrics improving by up to +18%. Overall, these results validate that DST-derived confidence strengthens robustness by reducing noise and emphasizing reliable user–item interactions.

These gains suggest that the DST-based fusion strategy need not be limited to DST-AttentiveGAN, and they open new directions for research in recommender systems. For other GAN variants such as GANMF [13], CFGAN [12], WGANRS [11], or AACF [14], DST could provide trust-aware guidance to both the generator and the discriminator, enhancing robustness to sparse or noisy ratings, while its additional computational cost remains manageable and could be further optimized in future work.
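One hypothetical way to give a discriminator trust-aware guidance is to weight the real-data term of a Wasserstein critic loss by each rating's DST confidence. This sketch is not taken from GANMF, CFGAN, WGANRS, or AACF. The function names are illustrative, critic scores are plain floats standing in for network outputs, and the gradient-penalty term is omitted for brevity.

```python
"""Hypothetical confidence-weighted WGAN critic loss (illustrative assumption)."""


def weighted_mean(values, weights):
    """Mean of `values` weighted by non-negative `weights`."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / total


def critic_loss(d_real, d_fake, confidence):
    """Critic minimizes E[D(fake)] - E_c[D(real)].

    The real-data expectation is weighted by per-rating confidence, so
    low-confidence (possibly noisy) ratings contribute less to what the
    critic learns to treat as real.
    """
    return sum(d_fake) / len(d_fake) - weighted_mean(d_real, confidence)


def generator_loss(d_fake):
    """Generator minimizes -E[D(fake)], as in the standard WGAN objective."""
    return -sum(d_fake) / len(d_fake)


if __name__ == "__main__":
    scores_real = [1.0, 1.0, -3.0]  # third score comes from a doubtful rating
    print(critic_loss(scores_real, [0.0], [1.0, 1.0, 1.0]))  # uniform weights
    print(critic_loss(scores_real, [0.0], [1.0, 1.0, 0.0]))  # doubtful rating ignored
```

With uniform confidences, the loss reduces to the ordinary WGAN critic loss; setting a confidence to zero removes that rating from the real-data expectation entirely.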

More broadly, DST can benefit a wide range of recommendation algorithms: in Matrix Factorization, it can weight interactions according to their reliability; in Neural Collaborative Filtering, it can serve as an auxiliary signal guiding network training; in Graph Neural Networks, it can modulate message propagation based on confidence; and in sequential models, it can adjust the influence of past interactions according to their certainty. By explicitly modeling uncertainty and managing conflicts among heterogeneous signals, DST provides a versatile mechanism for improving robustness and prediction quality across diverse recommender system architectures.
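To make the Matrix Factorization case concrete, here is a minimal sketch of confidence-weighted MF trained with stochastic gradient descent. It assumes that each rating carries a DST confidence c in [0, 1] that scales its squared error. The function names and hyperparameters are hypothetical, and the factor 2 from the squared-error gradient is absorbed into the learning rate.

```python
import random


def predict(P, Q, u, i):
    """Dot product of user and item latent factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def train_weighted_mf(triples, n_users, n_items, k=4, lr=0.02, reg=0.02, epochs=200, seed=0):
    """SGD on the sum of c * (r - p_u . q_i)^2 plus L2 regularization.

    `triples` holds (user, item, rating, confidence) tuples; a confidence of 0
    removes the rating's error term, a confidence of 1 keeps it at full weight.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r, c in triples:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Both factors are updated from their pre-step values.
                P[u][f] += lr * (c * err * qi - reg * pu)
                Q[i][f] += lr * (c * err * pu - reg * qi)
    return P, Q


if __name__ == "__main__":
    # Same user-item pair rated twice: a trusted 5 and a doubtful 1.
    weighted = train_weighted_mf([(0, 0, 5.0, 0.9), (0, 0, 1.0, 0.1)], 1, 1)
    uniform = train_weighted_mf([(0, 0, 5.0, 1.0), (0, 0, 1.0, 1.0)], 1, 1)
    print(predict(*weighted, 0, 0), predict(*uniform, 0, 0))
```

With two conflicting ratings for the same pair, weighting pulls the prediction toward the trusted value, since the weighted optimum is 0.9 · 5 + 0.1 · 1 = 4.6. Without weighting, the prediction settles near the unweighted average of 3.0.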

4. Conclusions

In conclusion, this work proposed DST-AttentiveGAN, an evidential adversarial framework for denoising inconsistent user ratings in collaborative filtering. By incorporating Dempster–Shafer theory into both the generator and the discriminator, and by integrating a cross-attention mechanism, the model captures confidence signals and focuses on reliable user feedback while mitigating the impact of noise. Experimental results on three benchmark datasets (MovieLens 1M, Amazon Electronics, and a Netflix subset) show that DST-AttentiveGAN outperforms existing GAN-based methods, particularly under high-noise scenarios. The Wasserstein loss with gradient penalty contributes to more stable training, and the model consistently achieves the best Hit Ratio, NDCG, and Precision among the compared models, including at the smallest cutoff (K = 5), where accurate ranking is most critical. These findings highlight the importance of explicitly modeling uncertainty in user ratings and incorporating it directly into the learning process.
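For reference, the Wasserstein loss with gradient penalty (WGAN-GP) has the generic form below. Here P_r is the distribution of real rating vectors, P_g is the generator's distribution, λ is the penalty coefficient, and x̂ is a random interpolation between a real and a generated sample. This is the standard objective. Any confidence-dependent terms specific to DST-AttentiveGAN are defined in the Method section and are not reproduced here.

```latex
\mathcal{L}_D = \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big]
              - \mathbb{E}_{x\sim\mathbb{P}_r}\big[D(x)\big]
              + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon\sim\mathcal{U}[0,1],
\qquad \mathcal{L}_G = -\,\mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big]
```

The penalty keeps the critic's gradient norm close to 1 along interpolated samples, a soft version of the 1-Lipschitz constraint. This constraint is what makes the critic's output difference approximate a Wasserstein distance and is the usual source of the training stability noted above.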

In future work, we plan to incorporate multi-modal evidence, including textual reviews, temporal patterns, and item features, to improve uncertainty modeling in sparse or cold-start scenarios. We also intend to evaluate DST-AttentiveGAN under shilling and adversarial attack settings and on datasets from other domains, such as social media and news articles, to better assess its robustness and generalizability across diverse recommendation environments. Furthermore, we aim to integrate DST with other GAN variants and recommendation algorithms to evaluate its broader applicability. Finally, extending the model to sequential and interactive recommendation tasks will enable dynamic, context-aware personalization in which uncertainty is continuously refined through user interactions.

Author Contributions

Conceptualization, O.B. and B.B.; Methodology, O.B., B.B. and A.K.; Software, O.B. and A.K.; Validation, O.B., B.B. and A.A.; Formal analysis, O.B. and B.B.; Investigation, O.B.; Resources, O.B.; Data curation, O.B. and A.K.; Writing—original draft, O.B.; Writing—review & editing, O.B., B.B., A.K. and A.A.; Visualization, O.B. and A.K.; Supervision, B.B. and A.A.; Project administration, B.B. and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in this study are publicly available, as referenced in the Results and Discussion section. No new data were created in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ibrahim, O.A.S.; Younis, E.M.G.; Mohamed, E.A.; Ismail, W.N. Revisiting recommender systems: An investigative survey. Neural Comput. Appl. 2025, 37, 2145–2173.
2. Aljunid, M.F.; Manjaiah, D.H.; Hooshmand, M.K.; Ali, W.A.; Shetty, A.M.; Alzoubah, S.Q. A collaborative filtering recommender systems: Survey. Neurocomputing 2025, 617, 128718.
3. Huang, L.; Guan, C.R.; Huang, Z.-W.; Gao, Y.; Wang, C.-D.; Chen, C.L.P. Broad recommender system: An efficient nonlinear collaborative filtering approach. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2843–2857.
4. Ibrahim, N.M.; Abiduzzaman, S.M.; Raziff, A.R.A.; Shah, A. A Collaborative Filtering Approach Using Machine Learning and Business Intelligence: A Critical Review. Int. J. Perceptive Cogn. Comput. 2025, 11, 41–49.
5. Jain, K.; Jindal, R. Sampling and noise filtering methods for recommender systems: A literature review. Eng. Appl. Artif. Intell. 2023, 122, 106129.
6. Wang, S.; Zhang, X.; Wang, Y.; Ricci, F. Trustworthy recommender systems. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–20.
7. Ayemowa, M.; Ibrahim, R.; Khan, M.M. Analysis of Recommender System Using Generative Artificial Intelligence: A Systematic Literature Review. 2024. Available online: https://ssrn.com/abstract=4922584 (accessed on 17 July 2025).
8. Gao, M.; Zhang, J.; Yu, J.; Li, J.; Wen, J.; Xiong, Q. Recommender systems based on generative adversarial networks: A problem-driven perspective. Inf. Sci. 2021, 546, 1166–1185.
9. Deldjoo, Y.; He, Z.; McAuley, J.; Korikov, A.; Sanner, S.; Ramisa, A.; Vidal, R.; Sathiamoorthy, M.; Kasrizadeh, A.; Milano, S.; et al. Recommendation with generative models. arXiv 2024, arXiv:2409.15173.
10. Bobadilla, J.; Gutiérrez, A.; Yera, R.; Martínez, L. Creating synthetic datasets for collaborative filtering recommender systems using generative adversarial networks. Knowl.-Based Syst. 2023, 280, 111016.
11. Bobadilla, J.; Gutiérrez, A. Wasserstein GAN-based architecture to generate collaborative filtering synthetic datasets. Appl. Intell. 2024, 54, 2472–2490.
12. Chae, D.K.; Kang, J.S.; Kim, S.W.; Lee, J.T. CFGAN: A generic collaborative filtering framework based on generative adversarial networks. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018.
13. Dervishaj, E.; Cremonesi, P. GAN-based matrix factorization for recommender systems. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, Virtual, 25–29 April 2022.
14. Sun, Z.; Wu, B.; Hu, S.; Zhang, M.; Ye, Y. Attentive adversarial collaborative filtering. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 4064–4076.
15. Zhao, K.; Li, L.; Chen, Z.; Sun, R.; Yuan, G.; Li, J. A survey: Optimization and applications of evidence fusion algorithm based on Dempster–Shafer theory. Appl. Soft Comput. 2022, 124, 109075.
16. Tang, Y.; Wu, K.; Li, R.; Guan, H.; Zhou, D.; Huang, Y. Probabilistic transformation of basic probability assignment based on weighted visibility graph networks. Appl. Soft Comput. 2025, 184, 113821.
17. Belghaddar, Y.; Begdouri, A.; Chahinian, N.; Seriai, A.; Et-targuy, O.; Delenne, C. Dempster-Shafer theory for object matching under data imperfection constraints: Application to wastewater networks’ line matching. Inf. Sci. 2025, 717, 122304.
18. Prokopenko, I.; Alpert, S.; Petrova, Y. Applications of Dempster–Shafer evidence theory to data processing in remote sensing. In Proceedings of the ADP 24: International Workshop on Algorithms of Data Processing, Kyiv, Ukraine, 5 November 2024; Volume 3895.
19. Azizi, A.; Pahlavani, P.; Nakhaei, M. Integrating Dempster–Shafer theory and clustering algorithms for enhanced groundwater potential assessment. In Stochastic Environmental Research and Risk Assessment; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–15.
20. Fei, L.; Li, T.; Liu, X.; Ding, W. A novel multi-source information fusion method for emergency spatial resilience assessment based on Dempster-Shafer theory. Inf. Sci. 2025, 686, 121373.
21. Belmessous, K.; Sebbak, F.; Mataoui, M.; Senouci, M.R.; Cherifi, W. Dempster-Shafer Theory in Recommender Systems: A Survey. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2024, 32, 747–780.
22. Le, Q.H.; Mau, T.N.; Tansuchat, R.; Huynh, V.N. A multi-criteria collaborative filtering approach using deep learning and dempster-shafer theory for hotel recommendations. IEEE Access 2022, 10, 37281–37293.
23. Wang, X.; Qin, J. Multimodal recommendation algorithm based on Dempster-Shafer evidence theory. Multimed. Tools Appl. 2024, 83, 28689–28704.
24. Belmessous, K.; Sebbak, F. A novel evidential collaborative filtering framework based on discounting conflicting preferences. Jordanian J. Comput. Inf. Technol. 2025, 11, 16.
25. Jadon, A.; Patil, A. A comprehensive survey of evaluation techniques for recommendation systems. In Proceedings of the International Conference on Computation of Artificial Intelligence & Machine Learning, Jaipur, India, 18–19 January 2024; Springer Nature: Cham, Switzerland, 2024.
26. Deldjoo, Y.; Noia, T.D.; Sciascio, E.D.; Merra, F.A. How dataset characteristics affect the robustness of collaborative recommendation models. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020.
27. Zehlike, M.; Yang, K.; Stoyanovich, J. Fairness in ranking, part ii: Learning-to-rank and recommender systems. ACM Comput. Surv. 2022, 55, 1–41.

Figure 1. Overview of the DST-AttentiveGAN architecture.

Figure 2. MAE and MSE Across Training Epochs for DST-AttentiveGAN.

Figure 3. Original Rating Distribution Across Datasets.

Figure 4. Confidence Score Distributions for Ratings Across Datasets.

Figure 5. Denoised Ratings Distribution Across Datasets.

Table 1. Statistics of the Datasets Used.

Dataset	#Users	#Items	#Ratings	Density
MovieLens 1M	6040	3900	1,000,209	4.25%
Netflix Subset	3000	4000	982,142	8.18%
Amazon Electronics	5000	4000	874,319	4.37%

Table 2. Performance on MovieLens Dataset.

Model	HR@5	HR@10	HR@20	NDCG@5	NDCG@10	NDCG@20	Precision@5	Precision@10	Precision@20
WGANRS	0.735	0.801	0.869	0.521	0.553	0.572	0.426	0.392	0.361
CFGAN	0.742	0.814	0.878	0.528	0.559	0.577	0.432	0.396	0.366
GANMF	0.748	0.819	0.884	0.531	0.562	0.581	0.437	0.398	0.369
AACF	0.752	0.823	0.888	0.534	0.565	0.584	0.440	0.402	0.373
Self-AttentiveGAN	0.712	0.793	0.857	0.515	0.548	0.591	0.418	0.425	0.381
DST-AttentiveGAN	0.790	0.876	0.930	0.557	0.600	0.625	0.489	0.447	0.401
Improvement over best baseline (%)	+5.11%	+6.43%	+4.80%	+4.31%	+6.19%	+5.75%	+11.14%	+5.17%	+5.24%

Table 3. Performance on Amazon Electronics Dataset.

Model	HR@5	HR@10	HR@20	NDCG@5	NDCG@10	NDCG@20	Precision@5	Precision@10	Precision@20
WGANRS	0.576	0.661	0.768	0.403	0.413	0.421	0.396	0.363	0.322
CFGAN	0.583	0.657	0.781	0.411	0.419	0.428	0.401	0.367	0.327
GANMF	0.595	0.674	0.794	0.417	0.425	0.438	0.409	0.371	0.332
AACF	0.591	0.681	0.798	0.418	0.427	0.442	0.412	0.374	0.336
Self-AttentiveGAN	0.587	0.636	0.775	0.407	0.425	0.436	0.403	0.357	0.319
DST-AttentiveGAN	0.629	0.742	0.854	0.449	0.468	0.481	0.445	0.401	0.362
Improvement over best baseline (%)	+5.71%	+8.96%	+7.01%	+7.42%	+9.59%	+8.83%	+8.01%	+7.21%	+7.74%

Table 4. Performance on Netflix Dataset.

Model	HR@5	HR@10	HR@20	NDCG@5	NDCG@10	NDCG@20	Precision@5	Precision@10	Precision@20
WGANRS	0.601	0.679	0.743	0.423	0.434	0.485	0.387	0.359	0.318
CFGAN	0.626	0.691	0.801	0.435	0.443	0.479	0.398	0.366	0.324
GANMF	0.638	0.713	0.846	0.448	0.451	0.496	0.409	0.371	0.331
AACF	0.639	0.701	0.849	0.442	0.456	0.501	0.412	0.374	0.335
Self-AttentiveGAN	0.591	0.667	0.813	0.407	0.425	0.436	0.403	0.357	0.322
DST-AttentiveGAN	0.682	0.766	0.906	0.478	0.503	0.548	0.415	0.386	0.345
Improvement over best baseline (%)	+6.72%	+7.43%	+6.71%	+6.70%	+10.30%	+9.21%	+0.73%	+3.20%	+2.99%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Belgacem, O.; Boudaa, B.; Kouadria, A.; Abouaissa, A. A GAN-Based Approach Incorporating Dempster–Shafer Theory to Mitigate Rating Noise in Collaborative Filtering. Digital 2025, 5, 57. https://doi.org/10.3390/digital5040057

AMA Style

Belgacem O, Boudaa B, Kouadria A, Abouaissa A. A GAN-Based Approach Incorporating Dempster–Shafer Theory to Mitigate Rating Noise in Collaborative Filtering. Digital. 2025; 5(4):57. https://doi.org/10.3390/digital5040057

Chicago/Turabian Style

Belgacem, Ouahiba, Boudjemaa Boudaa, Abderrahmane Kouadria, and Abdelhafid Abouaissa. 2025. "A GAN-Based Approach Incorporating Dempster–Shafer Theory to Mitigate Rating Noise in Collaborative Filtering" Digital 5, no. 4: 57. https://doi.org/10.3390/digital5040057

APA Style

Belgacem, O., Boudaa, B., Kouadria, A., & Abouaissa, A. (2025). A GAN-Based Approach Incorporating Dempster–Shafer Theory to Mitigate Rating Noise in Collaborative Filtering. Digital, 5(4), 57. https://doi.org/10.3390/digital5040057
