MDM-GANSA: A Multi-Distribution Generative Shilling Attack for Recommender Systems

Quanqiang Zhou; Xiaoyue Zhang; Xi Zhao

doi:10.3390/info17010077


School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China

* Author to whom correspondence should be addressed.

Information 2026, 17(1), 77; https://doi.org/10.3390/info17010077


Abstract

Shilling attacks pose a significant threat to collaborative filtering recommender systems. However, fake user profiles generated by mainstream attack models often lack diversity and realism. Furthermore, the static noise strategies and statistical dependency modeling used in advanced frameworks like the Multi-Distribution Mixture Generative Adversarial Network (MDM-GAN) are ill-suited for high-dimensional, sparse attack scenarios. To address these challenges, we propose MDM-GANSA, a specialized attack model tailored for shilling attacks. First, it replaces the static mixture with a dynamic adaptive noise strategy by incorporating a weight predictor network. This network dynamically adjusts the weights of multiple noise sources based on the current training state, generating more diverse user latent representations. Second, it employs an autoencoder for data-driven dependency modeling, replacing the traditional statistical method. This allows the model to learn and generate profiles with inherent logical dependencies directly from genuine data. Consequently, it enhances the realism of the generated fake user profiles in terms of both statistical properties and internal logic. Additionally, the model utilizes an optimized two-stage generative architecture and fine-grained loss constraints to ensure training stability and high-quality outputs. Experimental results on two public datasets demonstrate that MDM-GANSA significantly outperforms various baseline models in both attack effectiveness and stealthiness. This study provides a concrete implementation for building a shilling-attack generation model targeting collaborative filtering recommender systems, and it also offers a feasible pathway for adapting general-purpose deep generative models to specialized security-oriented scenarios.

Keywords:

collaborative filtering recommender system; shilling attack; generative adversarial networks; multi-distribution mixture noise; autoencoder

1. Introduction

With the rapid growth of online information and content, recommender systems have become essential for mitigating information overload and improving user experience. Among the various recommendation paradigms, Collaborative Filtering (CF) [1] is widely adopted in core applications like e-commerce and streaming services. This technique models user similarities and preference patterns by analyzing vast amounts of historical interaction data, such as ratings and clicks, to provide highly personalized services.

However, CF’s reliance on user-generated content also makes it highly vulnerable to malicious manipulation. Shilling attacks [2] are among the most representative and destructive threats. Attackers inject a number of fake user profiles with fake ratings into the system to promote (push attacks) or demote (nuke attacks) specific items. A successful attack has severe consequences. It can distort fair market competition, mislead public opinion, and fundamentally erode user trust in the platform. Ultimately, this can lead to a significant decline in both the platform’s core value and user retention.

Shilling attack techniques have evolved from simple heuristics to more sophisticated strategies. Early approaches mainly relied on hand-crafted rules—such as random attacks, average attacks, and bandwagon attacks [3,4]—to construct fake user profiles by assigning extreme ratings to target items and rule-based ratings to filler items. While these methods are easy to implement, the resulting rating patterns are often rigid and exhibit distinguishable statistical characteristics, leading to limited attack effectiveness and poor stealthiness. In recent years, researchers have increasingly introduced deep generative models, particularly generative adversarial networks (GANs) [5,6,7,8,9], to learn real users’ rating distributions and generate fake profiles that more closely resemble genuine users in both statistical properties and behavioral patterns, thereby substantially improving the realism and concealment of such attacks.

Despite the great potential of GAN-based attack models, existing research still faces several bottlenecks in practice: (1) Lack of diversity: Relying on a single prior noise distribution makes it difficult to capture the heterogeneous preferences of genuine user groups. (2) Lack of realism: A simple mapping function struggles to generate rating patterns with inherent correlations between items. (3) Training instability: Inherent GAN issues, such as mode collapse [10], severely limit the scale and effectiveness of the attack.

As an advanced generative framework, the Multi-Distribution Mixture GAN (MDM-GAN) [11] partially alleviates the problem of insufficient diversity. However, when directly transferred to recommender-system scenarios, it also exhibits notable limitations. First, its static noise mixture strategy requires weights to be manually set as hyperparameters, which lacks the flexibility to adapt to complex attack demands. Second, its reliance on traditional statistical models like Copula theory is ill-suited for the high-dimensional, sparse data characteristic of recommender systems. This approach not only fails to ensure the fidelity of the generated data but also incurs significant computational overhead. These limitations suggest that a simple transfer may not be sufficient, and that task-specific customization and optimization are needed.

Therefore, this study adheres to the “offense-for-defense” research paradigm. We aim to develop a generative attack model that is better aligned with recommender-system attack scenarios. The key significance lies in providing a more realistic adversarial benchmark to facilitate subsequent research on defense strategies. To achieve this goal, we propose MDM-GANSA, a specialized attack model for shilling-attack scenarios built upon the MDM-GAN framework. This model performs two adaptive enhancements to the MDM-GAN framework. First, we replace the static scheme with a dynamic adaptive noise mixture strategy, which injects more intelligent diversity at the source. Second, we use a data-driven autoencoder (AE) [12] to replace the reliance on statistical theory, efficiently generating fake user profiles with more coherent internal logic. Furthermore, the model retains and optimizes the two-stage generative architecture and fine-grained loss constraints, yielding relatively robust performance in terms of training stability and generation quality. The main contributions of this paper can be summarized as follows:

We propose a dynamic adaptive multi-distribution noise mixture strategy. By introducing a weight predictor, the model can adaptively adjust the mixing proportions of multi-source noise according to the training dynamics, thereby balancing exploration and representation capacity across different training stages.
We achieve a paradigm shift in dependency modeling from “statistical theory” to a “data-driven” approach. We replace the original model’s reliance on Copula statistical theory with an AE to model complex dependencies between items. This design enables the model to directly learn non-linear associations from high-dimensional sparse data, generating fake user profiles that are more logically consistent and realistic.
We design a two-stage GAN architecture combined with fine-grained loss constraints. By introducing reconstruction consistency and feature matching losses at different stages, this design provides the generator with more stable and direct gradient guidance. To some extent, it mitigates the instability of GAN training and improves both the quality of the generated profiles and the attack performance.
We conduct comprehensive experiments on two widely used public datasets to evaluate the proposed model in terms of both attack effectiveness and stealthiness, and to compare it against multiple representative baseline methods.

The remainder of this paper is organized as follows: Section 2 reviews related work and introduces background knowledge. Section 3 provides a detailed description of our proposed MDM-GANSA model. Section 4 presents the experimental setup, results, and analysis. Section 5 concludes the paper and discusses future work.

2. Related Work and Background

In this section, we first review the evolution of shilling attack techniques and their current challenges. We then introduce the core concepts that form the basis of our model: AE, MDM-GAN, curriculum learning, and dynamic weighting.

2.1. Shilling Attack

The research history of shilling attacks can be broadly divided into two main phases based on the core technical paradigm. The first phase is characterized by models based on heuristic rules, which rely on manually defined statistical patterns to construct fake user profiles. The second phase marks a shift toward a generative paradigm based on deep learning. Researchers began using models like GANs to automatically learn the underlying distribution of genuine data, generating more realistic and stealthy attacks. Next, we will review and analyze representative works from each of these developmental stages.

Early research focused on designing heuristic rules based on simple statistical properties to construct fake user profiles. These methods are simple to implement and serve as a baseline for subsequent research, but their patterns are simplistic and easy to detect. The most fundamental model is the random attack [3]. It assigns the highest possible rating to the target item and then randomly selects several non-target items as fillers. These filler items are assigned random ratings based on the system’s global rating distribution, such as a normal distribution centered around the global average rating. However, these filler ratings are largely unrelated to genuine user preferences, resulting in noisy overall rating patterns that are prone to being exposed by statistical characteristics. To improve stealthiness, the average attack [3] also assigns the highest rating to the target item but gives each filler item its historical average rating. By mimicking mainstream rating patterns, this strategy makes the fake ratings more consistent with the general consensus, allowing them to influence collaborative filtering systems more effectively. However, its behavior still closely adheres to global statistical characteristics while overlooking user-level preference heterogeneity and higher-order item correlations, which makes it vulnerable to more fine-grained distributional detection or behavior-aware modeling. Subsequently, a more strategic and efficient model, the bandwagon attack [4], was proposed. This model associates the target item with highly popular items in the system. In addition to assigning the highest rating to the target item, it also selects a small number of the most popular items and gives them high ratings, thereby rapidly gaining influence. Although bandwagon attacks are more effective at increasing the exposure of target items, their strong coupling with popular items introduces pronounced co-occurrence patterns and popularity bias. As a result, fake users become overly concentrated in the “popular-item subspace,” making them easier for detectors that leverage item popularity and co-occurrence signals to capture.
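The three heuristic attacks described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the toy rating matrix, the 1 to 5 rating scale, the use of 0 for "not rated", the helper names, and the choice to give bandwagon filler items random-attack-style ratings are all hypothetical and are not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix (assumption): rows are users, columns are items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1, 0, 3],
    [4, 0, 0, 2, 5, 3],
    [5, 5, 1, 0, 4, 0],
    [0, 4, 2, 1, 5, 2],
], dtype=float)
R_MIN, R_MAX = 1.0, 5.0
n_items = ratings.shape[1]
rated = ratings > 0

global_mean = ratings[rated].mean()
global_std = ratings[rated].std()
# Per-item mean over observed ratings only.
item_mean = ratings.sum(axis=0) / np.maximum(rated.sum(axis=0), 1)
item_popularity = rated.sum(axis=0)


def pick_fillers(target, n_filler, exclude=()):
    """Randomly choose filler items, never the target or an excluded item."""
    banned = {int(target), *(int(i) for i in exclude)}
    candidates = np.array([i for i in range(n_items) if i not in banned])
    return rng.choice(candidates, size=n_filler, replace=False)


def random_fill(n_filler):
    """Filler ratings drawn around the global mean, rounded onto the scale."""
    noise = rng.normal(global_mean, global_std, size=n_filler)
    return np.clip(np.rint(noise), R_MIN, R_MAX)


def random_attack(target, n_filler):
    profile = np.zeros(n_items)
    profile[pick_fillers(target, n_filler)] = random_fill(n_filler)
    profile[target] = R_MAX  # push attack: maximum rating on the target
    return profile


def average_attack(target, n_filler):
    profile = np.zeros(n_items)
    fillers = pick_fillers(target, n_filler)
    profile[fillers] = np.clip(np.rint(item_mean[fillers]), R_MIN, R_MAX)
    profile[target] = R_MAX
    return profile


def bandwagon_attack(target, n_popular, n_filler):
    profile = np.zeros(n_items)
    order = np.argsort(-item_popularity, kind="stable")
    popular = [int(i) for i in order if i != target][:n_popular]
    # Assumption: remaining fillers are rated as in the random attack.
    profile[pick_fillers(target, n_filler, exclude=popular)] = random_fill(n_filler)
    profile[popular] = R_MAX  # high ratings on the most popular items
    profile[target] = R_MAX
    return profile
```

Because every profile is built from global or per-item statistics alone, profiles produced by the same rule share the same rating pattern, which is the rigidity that makes these attacks comparatively easy to detect.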

To overcome the limitations of traditional methods, researchers began to utilize deep generative models, especially GANs, to learn the intrinsic distribution of genuine user data and generate statistically indistinguishable fake user profiles.

Lin et al. proposed AUSH [5], a landmark in this field that represents a paradigm shift. Instead of creating fake user profiles from scratch, it “augments” a “template” user sampled from genuine users to generate fake profiles. AUSH achieves this through a customized GAN framework. Its core is an innovative multi-objective loss function that simultaneously optimizes three goals: realism, achieved through an adversarial loss to deceive the discriminator; plausibility, ensured by a reconstruction loss to preserve the template user’s original preferences; and effectiveness, guaranteed by an injection loss to achieve the attack’s objective on the target item. The success of AUSH demonstrated the immense potential of generative models in constructing high-quality attacks and laid the foundation for subsequent research. However, it is largely constrained by the quality and quantity of the template users, and makes only limited use of noise-source diversity.

Following AUSH, later research has advanced from different perspectives. The core innovation of the GOAT model [6] is its use of graph convolutional networks to focus on and model the relationships between items. It first constructs an “item-item” co-occurrence graph. When generating ratings, an item’s rating is influenced by its “co-occurrence neighbors”, creating fake user profiles with more internally consistent and smoother rating logic. However, its reliance on graph structure incurs substantial modeling and inference overhead, and the graph itself may also degrade in extremely sparse settings.

GSA-GANs [7] extend the attack scenario from an ideal white-box setting to a more realistic gray-box setting, where the attacker does not know the specific structure of the backend recommender model. Its key contribution is an innovative dual-GAN architecture: one GAN generates highly realistic fake user profiles for stealth, while another GAN acts as a recommender simulator, providing an optimizable proxy target for the attack. This allows for effective attacks in information-limited scenarios. Subsequent work has pushed the challenge to the more severe black-box scenario, where the attacker has no knowledge of the target system. The Leg-UP model [8] addresses this problem by training a surrogate model. Its basic assumption is that an attack capable of deceiving a powerful surrogate model is likely to transfer to an unknown target, thus achieving attack transferability. Technically, Leg-UP also introduces a learnable discretization layer, significantly improving the realism and precision of the generated ratings. However, GSA-GANs and Leg-UP place greater emphasis on the uncertainty induced by an “unobservable target system,” while providing comparatively weaker modeling of diversity in the input noise and the latent representations themselves.

The latest PAGUP model [9] aims to achieve a finer balance between attack effectiveness and stealthiness. Its core innovations are twofold. First, it intelligently selects “high-impact users” as templates by analyzing user behavior sequences. Second, it employs a unique variant of the GAN architecture (RDGAN) that designs two generators with distinct roles. The high-impact generator focuses on maximizing attack effectiveness, while the low-impact generator concentrates on mimicking the genuine distribution to ensure stealth. This ultimately achieves a decoupling and precise generation for both effectiveness and stealthiness. However, it pays relatively limited attention to the dynamic scheduling of input noise and the explicit modeling of rating dependencies.

Reviewing the existing work, we find that while generative models have great potential, they still have limitations in simulating user diversity and the logical realism of rating patterns. Therefore, this paper proposes a new attack model named MDM-GANSA: On the one hand, we introduce dynamic multi-source noise to enhance fake-user diversity at the input stage, without relying on template users or additional surrogate models. On the other hand, we adopt an AE-based latent dependency modeling strategy to more flexibly capture nonlinear item–item correlations in high-dimensional sparse rating data. By combining these two components, MDM-GANSA can further strengthen attacks against collaborative filtering recommender systems while maintaining the stealthiness of the generated profiles.

2.2. Autoencoder

An AE [12] is an unsupervised learning model widely used for feature learning, data dimensionality reduction, and reconstruction. Its core structure consists of two symmetrical parts: an encoder and a decoder. The encoder is responsible for compressing high-dimensional original input data into a low-dimensional, information-dense latent representation. The decoder then attempts to reconstruct the original data from this latent representation.

In its workflow, the encoder transforms the input data $x$ into a latent space representation $h$ through a deterministic mapping. Specifically, this process is realized through a non-linear transformation, given by [13]:

$h = f (W_{1} x + b_{1}),$

(1)

where $W_{1}$ is the weight matrix from the input layer to the hidden layer (the latent space), $b_{1}$ is the bias term, and $f (\cdot)$ is an activation function (e.g., ReLU) used to introduce non-linearity, allowing the model to learn more complex patterns.

Next, the decoder receives this latent representation $h$ and performs a reverse mapping process, attempting to reconstruct an output $\hat{x}$ that is as similar as possible to the original input. The decoding process is formulated as [13]:

$\hat{x} = g (W_{2} h + b_{2}),$

(2)

where $W_{2}$ is the weight matrix from the hidden layer to the output layer, $b_{2}$ is the bias term, and $g (\cdot)$ is the activation function used in the decoder. The training objective of an AE is to minimize the difference (i.e., the reconstruction error) between the original data $x$ and the reconstructed data $\hat{x}$. This forces the model to learn the most essential features of the data in the low-dimensional latent representation $h$.
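As a minimal sketch of Equations (1) and (2), the forward pass of a single-hidden-layer AE can be written as follows. The layer sizes, the random initialization, the sigmoid used for the decoder activation, and the mean squared error used as reconstruction error are assumptions made for illustration; the text above only fixes the general form of the two mappings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, latent_dim = 8, 3  # assumed sizes for illustration

# Randomly initialized (untrained) parameters of Equations (1) and (2).
W1 = rng.normal(scale=0.1, size=(latent_dim, n_items))
b1 = np.zeros(latent_dim)
W2 = rng.normal(scale=0.1, size=(n_items, latent_dim))
b2 = np.zeros(n_items)


def relu(a):
    return np.maximum(a, 0.0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def encode(x):
    """Equation (1): h = f(W1 x + b1) with f = ReLU."""
    return relu(W1 @ x + b1)


def decode(h):
    """Equation (2): x_hat = g(W2 h + b2), with g = sigmoid as an assumption."""
    return sigmoid(W2 @ h + b2)


def reconstruction_error(x):
    """Mean squared error between x and its reconstruction (assumed loss)."""
    return float(np.mean((x - decode(encode(x))) ** 2))


# A toy rating vector rescaled to [0, 1]; 0 stands for "not rated".
x = rng.integers(0, 6, size=n_items) / 5.0
h = encode(x)
x_hat = decode(h)
```

Training would adjust `W1`, `b1`, `W2`, and `b2` by gradient descent on `reconstruction_error`; that step is omitted here.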

Although advanced variations like variational autoencoders (VAEs) [14] and denoising autoencoders (DAEs) [15] have been developed in the deep learning field, the traditional AE remains a widely used tool for data reconstruction and dependency modeling tasks due to its simple structure, efficient training, and powerful non-linear feature extraction capabilities. This makes it particularly suitable for extracting valuable latent dependency relationships from the high-dimensional, sparse rating data of collaborative filtering recommender systems for our model.

2.3. Multi-Distribution Mixture Generative Adversarial Network

A GAN [16,17] is a powerful deep learning framework that learns the distribution of genuine data through an adversarial game between a generator and a discriminator. Its core objective function is as follows [16]:

$\min_{G} \max_{D} V (D, G) = \mathbb{E}_{x \sim p_{X} (x)} [\log D (x)] + \mathbb{E}_{z \sim p_{Z} (z)} [\log (1 - D (G (z)))].$

(3)

Here, the generator $G$ attempts to produce data $G (z)$ from noise $z$ (sampled from a prior distribution $p_{Z} (z)$) that is statistically indistinguishable from genuine data. The discriminator $D$, in turn, strives to differentiate between the genuine data $x$ and the generated data.
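To make the objective in Equation (3) concrete, the sketch below estimates $V (D, G)$ from a batch of discriminator outputs. The function name `gan_value`, the clipping constant, and the example scores are hypothetical; only the formula itself comes from the text.

```python
import numpy as np


def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) in Equation (3).

    d_real: discriminator outputs D(x) on genuine samples, in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1).
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


# A discriminator that separates real from fake well yields a value near 0.
confident = gan_value([0.90, 0.95], [0.05, 0.10])
# A discriminator that outputs 0.5 everywhere yields -2 log 2.
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator pushes this value up, while the generator pushes it down by making $D (G (z))$ large.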

However, traditional GANs often face challenges when fitting complex, multimodal datasets. A major bottleneck is that the generator typically relies on a single, structurally simple prior noise distribution, such as the standard normal distribution. This makes it difficult to capture the diverse modes and complex structures inherent in genuine data, often leading to mode collapse [10] and a lack of sample diversity.

To address this challenge, Yang et al. proposed MDM-GAN [11]. This framework introduces two key designs that improve the model’s generative capability and diversity to a considerable extent.

(1) Multi-distribution mixture noise strategy

MDM-GAN does not use a single noise source. Instead, it samples from a set of probability distributions with different statistical properties and inputs them into the generator after a weighted mixture. This strategy injects rich diversity into the generation process from the very beginning. The distributions considered in the original paper include the following:

Normal distribution: Simulates the central tendency of the data. Its n-dimensional probability density function is [11]:

$f_{X} (x) = \frac{1}{\sqrt{{(2 \pi)}^{n} | \Sigma |}} e^{- \frac{1}{2} {(x - \mu)}^{T} \Sigma^{- 1} (x - \mu)},$

(4)

where $\mu$ and $\Sigma$ are the mean vector and the covariance matrix between dimensions, respectively.
Uniform distribution: Promotes exploration in the latent space. Its n-dimensional probability density function is [11]:

$g_{X} (x) = \frac{1}{V},$

(5)

where $x \in DM$, $DM$ is the domain, and $V$ is the volume of $DM$.
Laplace distribution: Its sharp peak and heavy tails help generate data with more distinct features. Its n-dimensional probability density function is [11]:

$h_{X} (x) = \frac{2}{{(2 \pi)}^{\frac{n}{2}} {| \Sigma |}^{\frac{1}{2}}} {\left( \frac{x^{T} \Sigma^{- 1} x}{2} \right)}^{\frac{\rho}{2}} K_{\rho} \left( \sqrt{2 x^{T} \Sigma^{- 1} x} \right),$

(6)

where $\rho = \frac{2 - n}{2}$, and $K_{\rho}$ is the modified Bessel function of the second kind.
t-distribution: Its heavy tails allow it to better model data containing outliers, improving model robustness. Its shape is controlled by the degrees of freedom parameter $\nu$. The smaller $\nu$ is, the heavier the tails. Its n-dimensional probability density function is [11]:

$z_{X} (x) = \frac{\Gamma (\frac{\nu + n}{2})}{\Gamma (\frac{\nu}{2}) \nu^{\frac{n}{2}} \pi^{\frac{n}{2}} {| \Sigma |}^{\frac{1}{2}}} {\left( 1 + \frac{1}{\nu} {(x - \mu)}^{T} \Sigma^{- 1} (x - \mu) \right)}^{- \frac{\nu + n}{2}},$

(7)

where $\Gamma$ is the gamma function.

By creating a weighted combination of vectors $x_{1}, \dots, x_{k}$ sampled from these different distributions $D_{1}, \dots, D_{k}$, MDM-GAN constructs the final input noise $z$ [11]:

$z = w_{1} x_{1} \oplus w_{2} x_{2} \oplus \dots \oplus w_{k} x_{k},$

(8)

where $w_{i}$ are adjustable mixture weights satisfying $\sum_{i = 1}^{k} w_{i} = 1$. This mixed noise strategy allows the model to better fit complex data distributions.
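The static mixture in Equation (8) can be sketched as below. Several choices here are assumptions rather than details from [11]: the $\oplus$ operator is read as an element-wise weighted sum, the covariance matrix and the weights are arbitrary, the uniform domain is the cube $[-1, 1]^{n}$, and the Laplace and t samples are drawn with their standard Gaussian scale-mixture constructions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                             # noise dimension (assumed)
mu = np.zeros(n)
Sigma = 0.5 * np.eye(n) + 0.5 * np.ones((n, n))   # unit variances, correlation 0.5
nu = 5                                            # degrees of freedom (assumed)


def sample_normal(m):
    return rng.multivariate_normal(mu, Sigma, size=m)


def sample_uniform(m):
    return rng.uniform(-1.0, 1.0, size=(m, n))


def sample_laplace(m):
    # Symmetric multivariate Laplace: sqrt(W) * N(0, Sigma) with W ~ Exp(1).
    w = rng.exponential(1.0, size=(m, 1))
    return np.sqrt(w) * rng.multivariate_normal(np.zeros(n), Sigma, size=m)


def sample_t(m):
    # Multivariate t: mu + N(0, Sigma) / sqrt(G / nu) with G ~ chi-square(nu).
    g = rng.chisquare(nu, size=(m, 1))
    return mu + rng.multivariate_normal(np.zeros(n), Sigma, size=m) / np.sqrt(g / nu)


def mixed_noise(m, weights):
    """Equation (8) with fixed weights; the weighted sum is an assumed reading."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    parts = [sample_normal(m), sample_uniform(m), sample_laplace(m), sample_t(m)]
    return sum(w_i * x_i for w_i, x_i in zip(weights, parts))


weights = np.array([0.4, 0.2, 0.2, 0.2])  # hand-set hyperparameters
z = mixed_noise(16, weights)
```

In this static scheme `weights` stays fixed for the whole training run.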

(2) Two-stage generative architecture

To stabilize training and gain finer control over the generation process, MDM-GAN introduces a two-stage generative architecture, breaking down the task into two independent sub-tasks:

Stage 1: Generate latent dependency vector. The core of this stage is to learn and generate the dependency relationships between different dimensions of the data, rather than the specific numerical values. It utilizes copula theory, whose essence lies in Sklar’s theorem [18]:

$F_{X} (x_{1}, \dots, x_{n}) = C (F_{X_{1}} (x_{1}), \dots, F_{X_{n}} (x_{n})).$

(9)

This theory allows for the transformation of original data $x$ into a uniformly distributed variable $u$ that contains only dependency information. This lets the generator $G_{u}$ focus on learning this latent dependency vector.

Stage 2: Generate samples. This stage takes the latent dependency vector $u^{'}$ generated in the first stage as input and maps it back to the original data space, generating the final, complete data sample $x^{'}$.
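The transformation from $x$ to $u$ implied by Equation (9) can be illustrated with empirical marginal distributions. The sketch below is an assumption-laden illustration, not the procedure of [11]: it uses rank-based pseudo-observations, $u_{ij} = \mathrm{rank} (x_{ij}) / (m + 1)$, on synthetic continuous data, and the function name is hypothetical.

```python
import numpy as np


def pseudo_observations(x):
    """Map each column to (0, 1) through its empirical CDF (rank / (m + 1))."""
    x = np.asarray(x, dtype=float)
    m = x.shape[0]
    # Double argsort turns values into 0-based ranks within each column.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (m + 1)


rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 1))
# Two correlated continuous columns.
x = np.hstack([
    shared + 0.3 * rng.normal(size=(500, 1)),
    shared + 0.3 * rng.normal(size=(500, 1)),
])
# Change the marginals with strictly increasing maps; the dependence is untouched.
x_warped = np.column_stack([np.exp(x[:, 0]), x[:, 1] ** 3])

u = pseudo_observations(x)
u_warped = pseudo_observations(x_warped)
```

Here `u` and `u_warped` coincide, which shows that $u$ carries the dependency structure and not the marginal scales. With heavily tied values, as in discrete ratings, the ranks are no longer unambiguous, since the copula in Sklar’s theorem is unique only for continuous marginals.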

This decoupling strategy—separating dependency-structure generation from numerical value generation—helps stabilize GAN training and improves the realism and internal coherence of the generated samples. In summary, MDM-GAN, with its mixed noise strategy and two-stage architecture, provides a feasible foundation for generating high-dimensional data with complex internal correlations, such as user ratings. It is an important starting point for building a high-quality attack model in our research. However, despite MDM-GAN’s excellent performance in general data generation, its static noise mixture method and reliance on statistical theory for dependency modeling still present limitations when directly applied to the high-dimensional, sparse scenario of shilling attacks in collaborative filtering recommender systems. This is also the starting point for further optimization in our research.

2.4. Curriculum Learning and Dynamic Weighting

Curriculum learning is a training strategy inspired by the human learning process. Its core idea is to present samples to the model in a structured order, from easy to difficult, rather than all at once in a random order [19]. This method has been proven to accelerate model convergence and improve final generalization performance.

Traditional curriculum learning often relies on a pre-defined, fixed “curriculum”, which lacks feedback on the model’s actual learning state. To overcome this limitation, researchers have proposed a data-driven, dynamic curriculum learning paradigm. A representative work, MentorNet [20], introduced an auxiliary network, the weight predictor, to dynamically assign weights to training samples. This predictor observes the main network’s performance during training (e.g., each sample’s loss, training epoch) and uses this information to determine which samples are “easy” or “credible”, assigning them higher weights. In this way, the curriculum is no longer static but is dynamically adjusted based on the model’s real-time feedback, leading to more efficient and robust training.

This idea of “using an auxiliary network to dynamically adjust the training strategy” offers a new approach for solving complex generation tasks. Although MentorNet was used to weight training samples to handle label noise, its core concept of dynamically adjusting inputs or strategies based on training state is highly versatile. In our research, we borrow and extend this idea by applying it to the noise input of the GAN. We design a weight predictor that no longer focuses on individual samples but instead dynamically weights different types of noise sources based on the overall training dynamics of the GAN. This establishes an adaptive noise-scheduling mechanism for the generation process. This strategy aims to enhance the generator’s diversity and its ability to fit complex data distributions from the source.

3. Proposed Shilling Attack Model

This section provides a detailed introduction to our proposed shilling attack model, MDM-GANSA. We will successively describe its overall architecture, the multi-distribution mixture noise generation strategy, the AE-based latent dependency vector extraction mechanism, the two-stage GAN, and the model’s training algorithm.

3.1. Model Architecture

The overall architecture of the proposed MDM-GANSA model is shown in Figure 1. The model adopts a two-stage generation process that decomposes fake-profile generation into two steps. First, it generates a fake latent dependency vector ($u^{'}$) that represents the user’s core preferences. Subsequently, in the second stage, it uses this latent vector to generate a complete fake rating vector ($x^{'}$). This two-stage design aims to ensure the diversity and realism of the final generated samples, as well as the stability of the training process.

Figure 1. The overall architecture of the MDM-GANSA model.

The model’s workflow begins with the first stage, which aims to learn and generate a low-dimensional, dense fake latent dependency vector $u^{'}$. This latent representation encapsulates the core logic and correlations of a user’s rating behavior and is fundamental to generating a high-quality fake rating vector $x^{'}$. In this stage, we introduce two key modules: a weight predictor (detailed in Section 3.2) that dynamically mixes multiple noise sources to form a rich mixed noise vector $z$, and an AE (detailed in Section 3.3) that extracts the corresponding “ground truth” genuine latent dependency vector $u$ from the genuine rating vector $x$. Then, the generator $G_{u}$ takes $z$ as input to generate $u^{'}$, while the discriminator $D_{u}$ learns to distinguish between the $u^{'}$ generated by $G_{u}$ and the $u$ extracted by the AE. Through this adversarial training, $G_{u}$ is forced to generate latent representations that are statistically indistinguishable from the internal preferences of genuine users.

After $G_{u}$ has learned to generate high-quality $u^{'}$, the model proceeds to the second stage. The task of this stage is to transform the $u^{'}$ generated in the first stage into a complete, high-dimensional fake rating vector $x^{'}$ that can be directly used for an attack. This stage consists of another dedicated pair of generator and discriminator networks. The generator $G_{x}$ takes $u^{'}$ as input and expands it into a complete fake rating vector $x^{'}$ with the same dimensionality as the original user data. Meanwhile, the discriminator $D_{x}$ operates in the original data space, its goal being to distinguish between the $x^{'}$ generated by $G_{x}$ and the genuine rating vector $x$. Through this final round of adversarial competition at the output level, $G_{x}$ is able to generate $x^{'}$ that is indistinguishable from $x$ in both content and structure, thus successfully simulating realistic fake user profiles.
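The data flow of the two stages can be sketched with untrained fully connected networks, purely to show how the shapes connect. All layer widths, activation functions, and variable names below are assumptions for illustration; the sketch performs no training and does not reproduce the architecture in Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_dim, latent_dim, n_items = 16, 8, 50  # assumed sizes


def init_mlp(sizes):
    """Random (untrained) weights and zero biases for a fully connected net."""
    return [(rng.normal(scale=0.1, size=(i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def mlp(params, a, out_act):
    """ReLU hidden layers followed by a caller-chosen output activation."""
    for W, b in params[:-1]:
        a = np.maximum(a @ W + b, 0.0)
    W, b = params[-1]
    return out_act(a @ W + b)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Stage 1: G_u maps mixed noise z to a fake latent vector u'; D_u scores it.
G_u = init_mlp([noise_dim, 32, latent_dim])
D_u = init_mlp([latent_dim, 32, 1])
# Stage 2: G_x expands u' into a full rating vector x'; D_x scores it.
G_x = init_mlp([latent_dim, 64, n_items])
D_x = init_mlp([n_items, 64, 1])

z = rng.normal(size=(4, noise_dim))   # stand-in for the mixed noise vector
u_fake = mlp(G_u, z, np.tanh)         # shape (4, latent_dim)
score_u = mlp(D_u, u_fake, sigmoid)   # D_u's probability that u' is genuine
x_fake = mlp(G_x, u_fake, sigmoid)    # shape (4, n_items), ratings scaled to (0, 1)
score_x = mlp(D_x, x_fake, sigmoid)   # D_x's probability that x' is genuine
```

In the actual model, `D_u` would also receive the genuine latent vectors $u$ produced by the AE encoder, and `D_x` the genuine rating vectors $x$.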

By separating “learning the dependency structure” from “generating specific samples”, this two-stage design allows MDM-GANSA to control the generation process more stably and precisely. This systematically improves the quality and stealthiness of the fake user profiles, achieving a reasonable balance between attack effectiveness and stealthiness.

3.2. Multi-Distribution Mixture Noise Generation

Traditional GANs typically rely on a single prior probability distribution, such as a standard normal or uniform distribution, to generate the initial noise. However, the preferences and interaction patterns of users in the real world are extremely complex and diverse, exhibiting typical multimodal characteristics. Such a structurally simple single noise source is insufficient to capture the complexity of the data, often leading to generated fake user profiles that are simplistic in pattern and lack realism, making them easy to detect by advanced attack detection mechanisms.

To overcome this limitation and generate more diverse and realistic fake user profiles, the MDM-GANSA model borrows and extends the core idea of MDM-GAN and introduces a dynamic, adaptive multi-distribution mixture noise generation strategy.

3.2.1. Strategy Adaptation and Optimization

The original MDM-GAN framework samples from multiple high-dimensional multivariate probability distributions to capture and utilize the complex internal dependency relationships between data dimensions (e.g., defined by a covariance matrix), thereby enhancing the diversity and realism of the generated data. However, in the specific context of shilling attacks on collaborative filtering recommender systems, we found that this strategy poses adaptation challenges. Particularly when the latent space dimension is high, directly sampling from and maintaining the inter-dimensional dependencies of complex high-dimensional multivariate distributions would lead to huge computational and resource overhead, limiting the model’s practicality and efficiency.

Based on these considerations, MDM-GANSA simplifies and adapts the noise generation strategy. We shift the core idea from directly modeling high-dimensional dependencies to performing independent sampling for each dimension of the latent noise vector and achieve diversity by mixing multiple one-dimensional probability distributions with different statistical properties. This dimensionally decoupled design preserves the core advantages of multi-source noise while drastically lowering the computational complexity and implementation burden associated with high-dimensional joint sampling. This improves the model’s practicality and efficiency in sparse recommender systems. Meanwhile, since the latent noise dimensions are mutually independent, different dimensions can more evenly explore distinct base distribution patterns. In our framework, “diversity” is achieved primarily through multi-distribution mixing with dynamic weight scheduling, whereas “structured dependency” is learned jointly by the first-stage generator and the autoencoder during the subsequent mapping process, capturing the nonlinear coupling between them.

The specific process is as follows: The model first independently samples from four parameterized one-dimensional probability distributions to generate four base noise vectors, each with dimension d. Then, a specially designed weight predictor network dynamically assigns appropriate mixture weights to these four base noise vectors based on the training progress. Finally, these dynamically weighted base noise vectors are linearly combined to form the final input to the generator, a rich and diverse mixed noise vector. It is worth emphasizing that independent sampling is merely a computationally pragmatic simplification of the noise source. The structured dependencies in real rating vectors are instead modeled and reconstructed by the subsequent fully connected MLP generator together with the pretrained autoencoder. Therefore, the dimension-independence assumption does not fundamentally weaken the model’s ability to capture complex dependency structures; rather, it shifts “dependency modeling” from the noise-prior level to learnable deep networks.

3.2.2. Selection of Base Probability Distributions

To enable dimension-wise independent sampling, MDM-GANSA independently draws samples for each latent dimension from four parameterized one-dimensional probability distributions. Their probability density functions are given in Equations (10)–(13), corresponding to the normal distribution $N(μ, σ^{2})$, the uniform distribution $U(a, b)$, the Laplace distribution $Lap(μ, b)$, and the t-distribution $t_{ν}$. The parameters of these distributions are estimated from the statistics of the rating data in the training set: $μ$ and $σ$ are set to the global mean and standard deviation, the uniform bounds a and b are set to the minimum and maximum rating values, and the degrees of freedom $ν$ are determined according to the data scale. In addition, a location–scale transformation based on the mean and variance is applied to ensure that the sampled noise lies within a reasonable range close to the rating scale.

f (x ∣ μ, σ^{2}) = \frac{1}{\sqrt{2 π σ^{2}}} e^{- \frac{{(x - μ)}^{2}}{2 σ^{2}}} .

(10)

f (x ∣ a, b) = \frac{1}{b - a}, a \leq x \leq b .

(11)

f (x ∣ μ, b) = \frac{1}{2 b} e^{- \frac{| x - μ |}{b}} .

(12)

f (t ∣ ν) = \frac{Γ (\frac{ν + 1}{2})}{\sqrt{ν π} Γ (\frac{ν}{2})} {(1 + \frac{t^{2}}{ν})}^{- \frac{ν + 1}{2}} .

(13)

From a functional perspective, the four distributions play complementary roles in the latent space. Normal and Laplace noise primarily capture mainstream preferences concentrated around the global mean; in particular, the sharper peak and heavier tails of the Laplace distribution are beneficial for characterizing users with more “decisive” preferences. The uniform distribution provides an approximately unbiased exploration capability for each dimension, helping the generator escape local modes and mitigating mode collapse. The t-distribution, in contrast, focuses on modeling niche or extreme-preference users, which is especially important for enhancing the overall multimodality and stealthiness of the generated profiles. By mixing these four types of noise within a single latent vector, the model attains a richer prior diversity while keeping the implementation simple and the computational overhead manageable, thereby laying a solid foundation for generating high-quality fake user profiles.
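The sampling step described above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors’ implementation: the function names, the default `nu=5.0`, the choice of Laplace scale (`sigma / sqrt(2)`, so that its variance matches σ²), and the plain `mu + sigma * t` location–scale transform for the t-distribution are all assumptions, since the text does not specify them.

```python
import math
import random
from statistics import fmean, pstdev


def sample_student_t(rng, nu):
    """Draw one standard Student-t sample as Z / sqrt(V / nu), with V ~ chi-square(nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)  # Gamma(shape=nu/2, scale=2) is chi-square(nu)
    return z / math.sqrt(v / nu)


def sample_base_noise(observed_ratings, d, nu=5.0, seed=0):
    """Return four d-dimensional base noise vectors with independent coordinates."""
    rng = random.Random(seed)
    mu = fmean(observed_ratings)             # global mean rating
    sigma = pstdev(observed_ratings)         # global standard deviation
    a, b = min(observed_ratings), max(observed_ratings)
    lap_scale = sigma / math.sqrt(2.0)       # assumption: Laplace variance equals sigma^2
    return {
        "normal": [rng.gauss(mu, sigma) for _ in range(d)],
        "uniform": [rng.uniform(a, b) for _ in range(d)],
        # The difference of two Exp(1) variables follows a standard Laplace distribution.
        "laplace": [mu + lap_scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
                    for _ in range(d)],
        # Assumption: a simple location-scale shift of the standard t sample.
        "t": [mu + sigma * sample_student_t(rng, nu) for _ in range(d)],
    }


noise = sample_base_noise([1, 2, 3, 4, 5, 5, 4, 3], d=128)
```

Each coordinate is drawn independently, which is exactly the dimension-wise decoupling the section relies on; any dependency structure is left to the downstream networks.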

3.2.3. Dynamic Adaptive Mixture Mechanism

MDM-GANSA does not use the four base probability distributions in isolation or statically. Instead, it employs a dynamic fusion mechanism to generate a more adaptive mixed noise. The core of this mechanism is that the model can dynamically adjust the weight of each noise component based on the real-time status of the training process.

To this end, we have specifically designed a weight predictor, $W_{pred}$. It is a lightweight multi-layer perceptron (MLP), as illustrated in Figure 2. The predictor adaptively assigns appropriate weights to the four base noise vectors based on the current training dynamics.

Figure 2. Network architecture of the dynamic weight predictor ($W_{pred}$).

The input to $W_{pred}$ is a feature vector f that encapsulates key indicators of the training state:

f = (pre\_L_{G_{u}}, pre\_L_{D_{u}}, Δ L_{G_{u}}, Δ L_{D_{u}}, epochfrac),

(14)

where $pre\_L_{G_{u}}$ and $pre\_L_{D_{u}}$ represent the loss values of the first-stage generator $G_{u}$ and discriminator $D_{u}$ from the previous training epoch, reflecting the past performance of the networks. $Δ L_{G_{u}}$ and $Δ L_{D_{u}}$ are the changes in these two losses from the previous epoch, indicating the trend of the training (e.g., whether the loss is decreasing rapidly, stabilizing, or oscillating). $epochfrac$ represents the proportion of completed training epochs to the total, indicating the training progress. Consistent with the “curriculum learning” intuition in Section 2.4, we treat f as a compact representation of the training state. Specifically, the loss magnitude reflects the current difficulty of fitting the data, the first-order loss difference captures the trend of difficulty change, and $epochfrac$ indicates the relative temporal position from “early-stage coarse exploration” to “late-stage fine convergence.” In this way, the weight predictor does not require a manually designed curriculum schedule; instead, it learns from these training-state indicators when to emphasize exploratory noise and when to shift toward more refined noise.
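As a concrete reading of Equation (14), the training-state vector can be assembled from per-epoch loss histories. The sketch below is illustrative only: the function name, the use of zeros when fewer than two epochs have been recorded, and the definition of the change as the difference between the last two recorded losses are assumptions not stated in the text.

```python
def training_state_features(g_losses, d_losses, epoch, total_epochs):
    """Build f = (pre_L_Gu, pre_L_Du, delta_L_Gu, delta_L_Du, epochfrac).

    g_losses / d_losses hold one loss value per completed epoch for G_u and D_u.
    """
    def prev_and_delta(history):
        if not history:
            return 0.0, 0.0                      # assumption: no history yet
        prev = history[-1]
        delta = prev - history[-2] if len(history) >= 2 else 0.0
        return prev, delta

    pre_g, delta_g = prev_and_delta(g_losses)
    pre_d, delta_d = prev_and_delta(d_losses)
    return [pre_g, pre_d, delta_g, delta_d, epoch / total_epochs]


f = training_state_features([1.0, 0.8], [0.7, 0.75], epoch=2, total_epochs=10)
```

In this example the generator loss is falling while the discriminator loss is rising slightly, two epochs into a ten-epoch run.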

It should be noted that we deliberately use these particular loss histories and their first-order differences, rather than discriminator gradients, generator weight variations, or discriminator confidence margins, primarily for the following reasons. First, in GAN training, gradients and parameter updates are often highly sensitive to mini-batch noise, and their scales are difficult to normalize across different architectures. In contrast, losses and their changes tend to be more comparable and stable across models, making them better suited as architecture-agnostic curriculum signals. Second, directly monitoring parameter changes or intermediate-layer confidence typically requires additional access to implementation-specific details, which increases overhead and reduces portability. By comparison, losses and epoch progress are “free” global signals available in most training frameworks. Overall, using $(pre\_L_{G_{u}}, pre\_L_{D_{u}}, Δ L_{G_{u}}, Δ L_{D_{u}}, epochfrac)$ preserves conceptual consistency with the curriculum-learning paradigm while balancing stability and computational cost in practice, providing a feasible and broadly applicable trade-off under complex GAN training dynamics.

The rationale behind this mechanism is that the demand for noise characteristics changes across the stages of GAN training. For example, in the early stages, a larger share of uniform noise is needed to encourage exploration and avoid mode collapse. In the later stages, the focus shifts to normal and Laplace noise to finely depict mainstream user characteristics, or to t-distributed noise to capture “outlier” users and improve stealth. From the curriculum-learning perspective described above, the weight predictor $W_{pred}$ learns a mapping from the training-state vector f to the optimal noise mixing proportions, thereby automatically implementing a dynamic scheduling strategy that evolves progressively across training stages.

As shown in Figure 2, $W_{pred}$ learns this complex mapping from “training state” to “optimal noise ratio” through its multi-layer non-linear structure. Its specific network architecture is as follows:

Input layer: Receives the 5-dimensional training state feature vector f.
Hidden layer 1 (feature extraction and expansion): A fully connected layer expands the input to a 256-dimensional vector, followed by a LeakyReLU activation function to extract richer non-linear features.
Hidden layer 2 (feature refinement and stabilization): A fully connected layer reduces the feature dimension to 128, followed sequentially by layer normalization and a LeakyReLU activation function to stabilize training and refine key information.
Output layer (weight scoring): The final fully connected layer maps the 128-dimensional features to a 4-dimensional raw “score” vector, whose elements are the raw contribution scores of the normal, uniform, Laplace, and t-distributions, respectively, to the current mixed noise.

This raw 4-dimensional “score” vector is then passed through a softmax function for normalization, producing the final weight vector w, ensuring that all weights are non-negative and sum to 1:

w = (w_{n}, w_{u}, w_{l}, w_{t}) = softmax (W_{pred} (f)),

(15)

where $w_{n} + w_{u} + w_{l} + w_{t} = 1$.

Finally, the mixed noise vector z, which is input to the first-stage generator $G_{u}$, is obtained as a weighted sum of the four independently sampled base noise vectors:

z = w_{n} \cdot z_{normal} + w_{u} \cdot z_{uniform} + w_{l} \cdot z_{Laplace} + w_{t} \cdot z_{t},

(16)

where $z_{normal}$, $z_{uniform}$, $z_{Laplace}$, and $z_{t}$ represent the base noise vectors sampled from the normal, uniform, Laplace, and t-distributions, respectively.
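The forward pass of the predictor and the mixing in Equations (15) and (16) can be sketched in plain Python. The layer sizes (5, 256, 128, 4), the LeakyReLU and layer-normalization placement, and the softmax follow the description above; everything else is an assumption made for illustration: the helper names, the LeakyReLU slope of 0.2, layer normalization without learnable scale and shift, and the random, untrained weights. How the predictor’s parameters are optimized is not covered by this sketch.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def leaky_relu(x, slope=0.2):          # slope is an assumed value
    return [v if v > 0 else slope * v for v in x]


def layer_norm(x, eps=1e-5):           # no learnable affine parameters in this sketch
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores):
    m = max(scores)                    # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_weights(f, params):
    """Map the 5-dim state f to (w_n, w_u, w_l, w_t), as in Equation (15)."""
    h = leaky_relu(linear(f, *params[0]))               # 5 -> 256
    h = leaky_relu(layer_norm(linear(h, *params[1])))   # 256 -> 128
    return softmax(linear(h, *params[2]))               # 128 -> 4


def mix_noise(weights, base_noise):
    """Equation (16): weighted sum of base vectors ordered normal, uniform, Laplace, t."""
    d = len(base_noise[0])
    return [sum(w * z[k] for w, z in zip(weights, base_noise)) for k in range(d)]


rng = random.Random(0)
params = [init_layer(rng, 5, 256), init_layer(rng, 256, 128), init_layer(rng, 128, 4)]
w = predict_weights([0.8, 0.75, -0.2, 0.05, 0.2], params)
z = mix_noise(w, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
```

Because the weights come out of a softmax, each coordinate of `z` is a convex combination of the four base samples for that coordinate.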

This dynamic weighted mixture strategy endows the MDM-GANSA model with a degree of adaptability. It can flexibly adjust the composition of the input noise based on real-time training feedback, laying a solid foundation for the subsequent generation of high-quality, highly stealthy fake user profiles.

3.3. Latent Dependency Vector Extraction Based on Autoencoder

In the original MDM-GAN, the first stage relies on explicit statistical models (e.g., copulas) to characterize inter-variable dependencies. However, in high-dimensional and highly sparse collaborative filtering settings, such approaches often suffer from the curse of dimensionality and mismatched modeling assumptions. To address this issue, MDM-GANSA adopts an autoencoder (AE) as a data-driven dependency extraction module, learning low-dimensional latent representations directly from the rating matrix.

Specifically, we employ a symmetric MLP-based autoencoder (see Figure 3). The encoder compresses a real rating vector x into a 128-dimensional latent dependency vector $u = E(x)$, and the decoder reconstructs $\hat{x} = D(u)$ from u. The AE is pretrained by minimizing the reconstruction error, using mean squared error (MSE) as the sole objective:

L_{AE} = \frac{1}{m} \sum_{i = 1}^{m} {∥x_{i} - D (E (x_{i}))∥}_{2}^{2},

(17)

where m is the batch size, $x_{i}$ is the i-th real user’s rating vector, $E(x_{i})$ is its representation obtained through the encoder, and $D(E(x_{i}))$ is the rating vector reconstructed by the decoder. By minimizing this loss $L_{AE}$, the AE is driven to effectively capture and preserve the intrinsic structure and core information within the user rating data.
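To make Equation (17) concrete, the batch loss can be written as a small function that accepts any encoder and decoder. The sketch is illustrative: `ae_reconstruction_loss` is a hypothetical name, and the toy `encode`/`decode` pair (keep the first two entries, pad with zeros) merely stands in for the MLP encoder and decoder, which are not reproduced here.

```python
def ae_reconstruction_loss(batch, encode, decode):
    """Equation (17): mean over the batch of ||x_i - D(E(x_i))||_2^2."""
    total = 0.0
    for x in batch:
        x_hat = decode(encode(x))
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return total / len(batch)


# Toy undercomplete "autoencoder": a 2-dim bottleneck for 3-dim inputs.
def encode(x):
    return x[:2]


def decode(u):
    return list(u) + [0.0]


loss = ae_reconstruction_loss([[1.0, 2.0, 3.0], [0.0, 0.0, 2.0]], encode, decode)
```

The toy bottleneck discards the third entry, so the loss is the mean of the squared discarded values, (9 + 4) / 2 = 6.5. If the loss is restricted to observed ratings, as Section 4.1.1 states for training and evaluation, the inner sum would run over observed positions only.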

Figure 3. Network architecture of the autoencoder for dependency extraction.

It should be noted that the AE in this work primarily suppresses trivial identity mapping through the combination of an undercomplete bottleneck, sparse inputs, and an MSE reconstruction loss. On one hand, the latent dimension (128) is far smaller than that of the real rating vector, so the network cannot simply perform element-wise copying by design; instead, it must select the most informative features for reconstruction within a compressed space. On the other hand, given the extreme sparsity of collaborative filtering data, a pseudo-identity strategy—i.e., “output the observed ratings as-is and set unobserved entries to zero”—would incur large reconstruction errors at the numerous missing positions and thus be heavily penalized by the MSE loss in Equation (17). Therefore, even without explicitly adding additional sparsity regularization or denoising terms, the architecture itself encourages the network to uncover non-trivial dependencies among user ratings rather than collapsing into a simple copying task.

Through this process, we not only overcome the limitations of traditional statistical methods but also provide high-quality “genuine samples” u for the subsequent two-stage GAN training. These genuine latent dependency vectors extracted by the AE play a crucial role: they serve as the “real” input for the first-stage discriminator $D_{u}$ and provide a clear, realistic learning target for the generator $G_{u}$. This lays a solid foundation for generating high-quality fake user profiles in the subsequent stages of the entire model.

In principle, end-to-end joint training or alternating optimization of the AE and the generator could further improve latent-space consistency. However, in our preliminary experiments, such joint optimization substantially amplified the instability of GAN training, often leading to degraded reconstruction accuracy and even mode collapse. For the sake of training stability and engineering controllability, we adopt a stage-wise pretraining strategy for the AE: we first train the AE on real rating data until convergence by minimizing only the reconstruction loss in Equation (17); then, during adversarial training, we freeze the AE parameters and treat it as a stable dependency extractor, providing discriminator $D_{u}$ with supervision in the form of real latent dependency vectors.

3.4. Two-Stage Generative Adversarial Network

To effectively tackle the complex task of generating fake user profiles and further enhance the stability of the training process and the realism of the final generated samples, the MDM-GANSA model adopts a two-stage GAN architecture. This architecture inherits and optimizes the core design philosophy of MDM-GAN. Its essence lies in systematically decoupling the complex generation process into two logically sequential and progressively difficult sub-tasks through more refined task decomposition and loss constraints. As shown in Figure 1, each stage includes a dedicated pair of generator and discriminator, focusing on learning and generating data at different levels of representation.

The core of the first stage is to learn and generate a latent vector representing dependency relationships. The goal of this stage is not to directly generate rating vectors but to learn and simulate the intrinsic, abstract logical structure within a genuine user’s preferences. The task of the generator $G_{u}$ is to take the mixed noise vector z as input and transform it into a fake latent dependency vector $u^{'} = G_{u}(z)$. This $u^{'}$, in terms of dimension and structure, is designed to mimic the genuine latent dependency vector u extracted from genuine data by the AE (Section 3.3).

The generator $G_{u}$ is implemented as an MLP, with its detailed structure shown in Figure 4. Its layer widths go from low to high and then to the target dimension, which helps the model learn richer feature representations.

Figure 4. Network architecture of the generator ($G_{u}$).

In contrast, the discriminator $D_{u}$ is responsible for distinguishing between the genuine latent dependency vector u (extracted by the AE) and the fake latent dependency vector $u^{'}$ generated by $G_{u}$. The network structure of discriminator $D_{u}$ is shown in Figure 5.

Figure 5. Network architecture of the discriminator ($D_{u}$).

We use the standard adversarial loss [16,17] to train the discriminator $D_{u}$. Its adversarial loss function $L_{D_{u}}$ is defined as

L_{D_{u}} = - E_{u \sim p_{U} (u)} [\log D_{u} (u)] - E_{z \sim p_{Z} (z)} [\log (1 - D_{u} (G_{u} (z)))],

(18)

where u represents the real dependency data, $p_{U}(u)$ is its distribution, and $p_{Z}(z)$ is the distribution of the multi-distribution mixture noise.

Correspondingly, to guide the generator $G_{u}$ more effectively, we have refined its loss function. The standard adversarial loss relies entirely on feedback from the discriminator $D_{u}$. However, in the early stages of training, the discriminator itself has not yet learned well, and the gradient signals it provides are often noisy and lack clear direction. This can slow down the generator’s convergence and potentially lead to training instability. To mitigate this early-stage training bottleneck, we introduce an additional reconstruction consistency loss, $L_{rec}$. This loss term bypasses the immature discriminator and provides the generator with a stable and direct optimization target by directly calculating the squared L2 distance between the generator’s output $G_{u}(z)$ and the genuine latent representation u. The total loss function for $G_{u}$, $L_{G_{u}}$, is expressed as:

L_{G_{u}} = - E_{z \sim p_{Z} (z)} [\log D_{u} (G_{u} (z))] + λ_{rec} \cdot L_{rec},

(19)

where $L_{rec}$ is defined as:

L_{rec} = E_{z \sim p_{Z} (z), u \sim p_{U} (u)} [{∥G_{u} (z) - u∥}_{2}^{2}] .

(20)

Here, $λ_{rec}$ is a weight coefficient that controls the influence of this loss term.
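The first-stage objectives in Equations (18)–(20) reduce to a few lines once the discriminator outputs are available as probabilities. The following sketch estimates each expectation by a batch mean; the function names, the small `eps` added inside the logarithms for numerical safety, and the within-batch pairing of each generated vector with one real latent vector are assumptions of this illustration.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Equation (18): d_real = D_u(u_i), d_fake = D_u(G_u(z_i)), each in (0, 1)."""
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def reconstruction_loss(fake_u, real_u):
    """Equation (20): batch mean of ||G_u(z_i) - u_i||_2^2."""
    per_pair = [sum((f - r) ** 2 for f, r in zip(fu, ru))
                for fu, ru in zip(fake_u, real_u)]
    return sum(per_pair) / len(per_pair)


def generator_loss(d_fake, fake_u, real_u, lambda_rec, eps=1e-12):
    """Equation (19): non-saturating adversarial term plus weighted L_rec."""
    adversarial = -sum(math.log(p + eps) for p in d_fake) / len(d_fake)
    return adversarial + lambda_rec * reconstruction_loss(fake_u, real_u)


# An undecided discriminator (all outputs 0.5) gives L_Du = 2 * ln 2.
l_d = discriminator_loss([0.5], [0.5])
l_g = generator_loss([0.5], [[1.0, 1.0]], [[0.0, 1.0]], lambda_rec=0.1)
```

Even when the adversarial term is uninformative, as in the 0.5 case above, the reconstruction term still yields a nonzero signal that pulls $G_{u}(z)$ toward u.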

After successfully learning to generate fake latent dependency vectors ($u^{'}$) in the first stage, the model enters the second stage. Its core task is to use these vectors to construct a complete, high-dimensional fake rating vector.

The generator $G_{x}$ takes $u^{'}$ as input and “translates” and expands it into a complete, high-dimensional fake rating vector $x^{'} = G_{x}(u^{'})$. The network structure of generator $G_{x}$ is shown in Figure 6.

Figure 6. Network architecture of the generator ($G_{x}$).

The discriminator $D_{x}$, operating in the original data space, performs the final authenticity judgment: it distinguishes between the real rating vector x and the fake rating vector $x^{'}$. The network structure of discriminator $D_{x}$ is shown in Figure 7.

Figure 7. Network architecture of the discriminator ($D_{x}$).

The loss function of $D_{x}$, $L_{D_{x}}$, is formally similar to $L_{D_{u}}$, also using the standard adversarial loss [16,17]:

L_{D_{x}} = - E_{x \sim p_{X} (x)} [\log D_{x} (x)] - E_{u^{'} \sim p_{U^{'}} (u^{'})} [\log (1 - D_{x} (G_{x} (u^{'})))],

(21)

where $p_{X}(x)$ is the distribution of real rating data, and $p_{U^{'}}(u^{'})$ is the distribution of fake latent dependency vectors generated by the first-stage generator $G_{u}$.

To further improve the quality of the generated samples and stabilize training, we introduce a feature matching loss, $L_{feat}$, for the optimization of $G_{x}$. This loss no longer solely relies on the final judgment result of $D_{x}$ but encourages the fake rating vector $x^{'}$ generated by $G_{x}$ to have intermediate layer feature representations in $D_{x}$ that are consistent with the feature representations of the real rating vector x. This method provides richer and more stable gradient signals, helping the generator learn more detailed statistical properties and structural details. The total loss function for $G_{x}$, $L_{G_{x}}$, is defined as

L_{G_{x}} = - E_{u^{'} \sim p_{U^{'}} (u^{'})} [\log D_{x} (G_{x} (u^{'}))] + λ_{feat} \cdot L_{feat},

(22)

where the feature matching loss $L_{feat}$ is defined as follows [21]:

L_{feat} = E_{x \sim p_{X} (x), u^{'} \sim p_{U^{'}} (u^{'})} [{∥f_{D_{x}} (x) - f_{D_{x}} (G_{x} (u^{'}))∥}_{2}^{2}] .

(23)

Here, $f_{D_{x}}(x)$ represents the intermediate layer features extracted by the feature extractor of $D_{x}$, and $λ_{feat}$ is the weight coefficient for this loss. Through two-stage generation and loss constraint design, MDM-GANSA provides a feasible solution for generating high-quality fake user profiles.
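Equations (22) and (23) can be illustrated in the same style, taking the intermediate features of $D_{x}$ as given lists of numbers. This is a sketch under assumptions: the function names are hypothetical, which layer of $D_{x}$ supplies the features is not specified here, and the expectation in Equation (23) is estimated by pairing the i-th real sample with the i-th generated sample in a batch.

```python
import math


def feature_matching_loss(real_feats, fake_feats):
    """Equation (23): batch mean of ||f_Dx(x_i) - f_Dx(G_x(u'_i))||_2^2."""
    per_pair = [sum((r - f) ** 2 for r, f in zip(fr, ff))
                for fr, ff in zip(real_feats, fake_feats)]
    return sum(per_pair) / len(per_pair)


def gx_loss(d_fake, real_feats, fake_feats, lambda_feat, eps=1e-12):
    """Equation (22): adversarial term on D_x(G_x(u')) plus weighted L_feat."""
    adversarial = -sum(math.log(p + eps) for p in d_fake) / len(d_fake)
    return adversarial + lambda_feat * feature_matching_loss(real_feats, fake_feats)


real_f = [[1.0, 0.0], [0.0, 2.0]]   # features of two real rating vectors
fake_f = [[0.0, 0.0], [0.0, 0.0]]   # features of two generated rating vectors
l_feat = feature_matching_loss(real_f, fake_f)       # (1 + 4) / 2
l_gx = gx_loss([0.5, 0.5], real_f, fake_f, lambda_feat=2.0)
```

Note that the per-pair form above follows Equation (23) as written; a common alternative in the literature compares batch-mean features instead, which is a different estimator.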

Under this two-stage framework, the latent representation learned in the first stage must retain sufficient information for rating reconstruction in the second stage while also satisfying attack-oriented objectives. This dual requirement may, in theory, introduce a risk of representational bias: if improperly optimized, the generator may prioritize features that are “reconstruction-friendly” rather than those that truly benefit stealthiness and attack accuracy. MDM-GANSA mitigates this risk in two ways. First, the real latent vector u is extracted from real rating data by a pretrained AE whose objective is to capture the principal structure of user preferences. Consequently, constraining $G_{u}(z)$ to approximate u via the discriminator $D_{u}$ and the reconstruction consistency loss $L_{rec}$ effectively encourages the latent representation to align with the distribution of genuine user preferences, helping preserve both realism and stealthiness. Second, the second stage retains the full adversarial loss: the discriminator $D_{x}$ distinguishes between real rating vectors x and generated vectors $x^{'}$ in rating space, continuously feeding back gradients related to attack effectiveness to both $G_{x}$ and $G_{u}$. This prevents the training dynamics from being dominated solely by reconstruction-oriented features.

Overall, $L_{rec}$ and $L_{feat}$ jointly form an auxiliary supervision scheme. Across different stages of adversarial training, they provide the generators with stable, data-driven gradients that do not depend solely on the discriminators’ final judgments, helping mitigate mode collapse, accelerate convergence, and improve the final generation quality.

3.5. Model Training Algorithm

Let X be the genuine user-item rating matrix of the training set. Let m be the batch size during the training process. Let M be the final number of fake user profiles to be generated. Let $Set_{hyp} = \{K, k_{1}, k_{2}, p\}$ be the hyperparameter configuration, where K is the total number of training epochs, $k_{1}$ is the number of inner training loops for the discriminators $D_{u}$ and $D_{x}$, $k_{2}$ is the number of inner training loops for the generators $G_{u}$ and $G_{x}$, and p is the number of AE training epochs. Let S be the hyperparameter search space. The detailed training steps of the MDM-GANSA model are shown in Algorithm 1.

Algorithm 1 MDM-GANSA model training algorithm

Input: $X, m, M, Set_{hyp}, S$.
Output: A set of fake rating vectors $\{x_{i}^{'}\}_{i = 1}^{M}$.
I. Model training

1: Initialize parameters $θ_{AE}, θ_{G_{u}}, θ_{G_{x}}, θ_{D_{u}}, θ_{D_{x}}$ for the AE, generators $G_{u}$, $G_{x}$, and discriminators $D_{u}$, $D_{x}$.
2: for q from 1 to p do /* Pre-training */
3: Randomly sample m genuine rating vectors $\{x_{i}\}_{i = 1}^{m}$ from $X_{input}$, where $X_{input}$ is the current genuine user-item rating matrix for training.
4: Process $\{x_{i}\}_{i = 1}^{m}$ with the AE to output reconstructed rating vectors $\{\hat{x}_{i}\}_{i = 1}^{m}$.
5: Calculate the reconstruction loss $L_{AE}$ between $\{x_{i}\}_{i = 1}^{m}$ and $\{\hat{x}_{i}\}_{i = 1}^{m}$ using Equation (17) and update $θ_{AE}$.
6: end for
7: for T from 1 to K do /* Adversarial training */
8: Obtain weights $w_{n}$, $w_{u}$, $w_{l}$, $w_{t}$ for the normal, uniform, Laplace, and t-distributions using the weight predictor $W_{pred}$ according to Equations (14) and (15).
9: for $t_{1}$ from 1 to $k_{1}$ do /* Update discriminators */
10: Randomly sample m genuine rating vectors $\{x_{i}\}_{i = 1}^{m}$ from $X_{input}$.
11: Use the AE encoder on $\{x_{i}\}_{i = 1}^{m}$ to extract genuine latent dependency vectors $\{u_{i} = E(x_{i})\}_{i = 1}^{m}$.
12: Calculate the mixed noise vectors $\{z_{i}\}_{i = 1}^{m}$ using the distribution weights $w_{n}$, $w_{u}$, $w_{l}$, $w_{t}$ according to Equation (16).
13: Use generator $G_{u}$ with $\{z_{i}\}_{i = 1}^{m}$ as input to generate fake latent dependency vectors $\{u_{i}^{'} = G_{u}(z_{i})\}_{i = 1}^{m}$.
14: Use generator $G_{x}$ with $\{u_{i}^{'}\}_{i = 1}^{m}$ as input to generate fake rating vectors $\{x_{i}^{'} = G_{x}(u_{i}^{'})\}_{i = 1}^{m}$.
15: Using $\{u_{i}\}_{i = 1}^{m}$ and $\{u_{i}^{'}\}_{i = 1}^{m}$, calculate the discriminator $D_{u}$ loss $L_{D_{u}}$ with Equation (18) and update $θ_{D_{u}}$.
16: Using $\{x_{i}\}_{i = 1}^{m}$ and $\{x_{i}^{'}\}_{i = 1}^{m}$, calculate the discriminator $D_{x}$ loss $L_{D_{x}}$ with Equation (21) and update $θ_{D_{x}}$.
17: end for
18: for $t_{2}$ from 1 to $k_{2}$ do /* Update generators */
19: Randomly sample m genuine rating vectors $\{x_{i}\}_{i = 1}^{m}$ from $X_{input}$.
20: Use the AE encoder on $\{x_{i}\}_{i = 1}^{m}$ to extract genuine latent dependency vectors $\{u_{i} = E(x_{i})\}_{i = 1}^{m}$.
21: Calculate the mixed noise vectors $\{z_{i}\}_{i = 1}^{m}$ using the distribution weights $w_{n}$, $w_{u}$, $w_{l}$, $w_{t}$ according to Equation (16).
22: Use generator $G_{u}$ with $\{z_{i}\}_{i = 1}^{m}$ as input to generate fake latent dependency vectors $\{u_{i}^{'} = G_{u}(z_{i})\}_{i = 1}^{m}$.
23: Use generator $G_{x}$ with $\{u_{i}^{'}\}_{i = 1}^{m}$ as input to generate fake rating vectors $\{x_{i}^{'} = G_{x}(u_{i}^{'})\}_{i = 1}^{m}$.
24: Using $\{u_{i}\}_{i = 1}^{m}$ and $\{u_{i}^{'}\}_{i = 1}^{m}$, calculate the generator $G_{u}$ loss $L_{G_{u}}$ with Equation (19) and update $θ_{G_{u}}$.
25: Using $\{x_{i}\}_{i = 1}^{m}$ and $\{x_{i}^{'}\}_{i = 1}^{m}$, calculate the generator $G_{x}$ loss $L_{G_{x}}$ with Equation (22) and update $θ_{G_{x}}$.
26: end for
27: end for

II. Hyperparameter optimization and fake user profiles generation

28: for each hyperparameter configuration $Set_{hyp} = \{K, k_{1}, k_{2}, p\}$ from S do /* Phase I: Hyperparameter optimization */
29: for each cross-validation fold $j = 1 \dots 10$ do
30: Partition X into a training subset $X_{train}^{(j)}$ and a validation subset $X_{val}^{(j)}$.
31: Let $X_{input} = X_{train}^{(j)}$ and perform model training (lines 1–27) using the parameters from the current $Set_{hyp}$, then calculate HR@10 on $X_{val}^{(j)}$.
32: end for
33: Calculate the average HR@10 of the current $Set_{hyp}$ over the 10 cross-validation folds.
34: end for
35: Select the $Set_{hyp}^{*}$ with the highest average HR@10 as the best hyperparameter configuration.
36: Let $X_{input} = X$ and perform model training (lines 1–27) using the best hyperparameter configuration $Set_{hyp}^{*}$. /* Phase II: Final model training */
37: Obtain mixed noise vectors $\{z_{i}\}_{i = 1}^{M}$ using Equation (16). /* Phase III: Generate final fake user profiles */
38: Use generator $G_{u}$ with $\{z_{i}\}_{i = 1}^{M}$ as input to generate fake latent dependency vectors $\{u_{i}^{'} = G_{u}(z_{i})\}_{i = 1}^{M}$.
39: Use generator $G_{x}$ with $\{u_{i}^{'}\}_{i = 1}^{M}$ as input to generate fake rating vectors $\{x_{i}^{'} = G_{x}(u_{i}^{'})\}_{i = 1}^{M}$.
40: for i from 1 to M do
41: Set the rating of the target item in $x_{i}^{'}$ to the maximum value.
42: end for
43: return the set of fake rating vectors $\{x_{i}^{'}\}_{i = 1}^{M}$.
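The selection logic of Part II (Phase I) can be sketched independently of the model itself. In the illustration below, `train_and_score` is a hypothetical callback that would run the model training of Part I on the training users and return HR@10 on the validation users; the interleaved fold assignment is likewise an assumption, since the algorithm does not state how X is partitioned.

```python
from statistics import fmean


def select_hyperparameters(search_space, n_users, train_and_score, n_folds=10):
    """Return the configuration with the highest mean validation HR@10."""
    # Assumption: users are assigned to folds by interleaving their indices.
    folds = [list(range(j, n_users, n_folds)) for j in range(n_folds)]
    best_cfg, best_hr = None, float("-inf")
    for cfg in search_space:
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n_users) if i not in held_out]
            scores.append(train_and_score(cfg, train_idx, val_idx))
        avg_hr = fmean(scores)
        if avg_hr > best_hr:
            best_cfg, best_hr = cfg, avg_hr
    return best_cfg, best_hr


# Stub scorer for demonstration only: pretends HR@10 grows with K.
def fake_scorer(cfg, train_idx, val_idx):
    return cfg["K"] / 100


best, hr = select_hyperparameters([{"K": 50}, {"K": 80}], n_users=20,
                                  train_and_score=fake_scorer)
```

After selection, the best configuration is used once more to train on the full matrix X before the final profiles are generated.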

4. Experiments and Analysis

To effectively evaluate our proposed MDM-GANSA model, this section will first describe the relevant experimental setup, including the datasets used, evaluation metrics, baseline methods, and experimental procedures. Subsequently, we will answer the following key research questions through a series of experiments:

RQ1: How should the hyperparameters of MDM-GANSA be optimized, and how stable is the resulting model?
RQ2: How well does MDM-GANSA perform in terms of attack effectiveness?
RQ3: How well does MDM-GANSA perform in terms of attack stealthiness?
RQ4: How does the performance of MDM-GANSA compare to the original MDM-GAN?
RQ5: To what extent do the key components in MDM-GANSA contribute to attack effectiveness?

4.1. Experimental Setup

4.1.1. Dataset

We selected two widely used public datasets from the recommendation domain: MovieLens-1M (ML-1M) [22] and Douban [23]. Table 1 lists the key statistics of these two datasets.

Table 1. Statistical information of the datasets.

ML-1M is a classic dataset released by GroupLens Research, containing over one million ratings from 6040 users for 3706 movies. Due to its moderate scale, extensive user coverage, and relatively dense data, ML-1M has become a standard benchmark dataset for evaluating recommendation algorithms and related attack/defense strategies.

The Douban dataset originates from the well-known Chinese social and review website “Douban Movie”, collecting user ratings for films. The version we used covers nearly 900,000 ratings from 2848 users for 39,586 movies. Compared to ML-1M, the Douban dataset features a smaller user base but a much larger number of items, resulting in an extremely high sparsity in its user-item interaction matrix. This characteristic provides an important complementary perspective for testing the model’s performance in highly sparse scenarios.

These two datasets have significant differences in terms of user and item scale, as well as data density. Using them for evaluation helps provide a more comprehensive assessment of our proposed method’s performance under different data characteristics. In both datasets, the vast majority of blank entries in the user–item matrices represent unobserved interactions, rather than zero ratings. For implementation purposes, we use zero as a placeholder for tensorization, but during training and evaluation, losses and metrics are calculated only at the positions of truly observed ratings. This approach prevents the misinterpretation of missing values as negative feedback, thus ensuring that the results, particularly in highly sparse scenarios like the Douban dataset, maintain interpretability.

4.1.2. Evaluation Metric

We measure the performance of the proposed MDM-GANSA model from two dimensions: attack effectiveness and attack stealthiness.

(1) Attack effectiveness

Attack effectiveness aims to quantify the extent to which an attack model can successfully manipulate the output of a target recommender system. We primarily evaluate this by measuring the degree of ranking improvement for the target item in the recommendation list.

Hit Ratio (HR@N) [24] is our core metric for evaluating effectiveness. HR@N calculates the frequency with which the target item appears in the Top-N recommendation list generated for users after the attack is injected. Its formula is as follows [24]:

HR@N = \frac{1}{|T| \times |A|} \sum_{i \in T} \sum_{a \in A} δ(i, a),

(24)

where N is the length of the recommendation list, T is the set of target items, and A is the set of users for whom the recommendation results are evaluated. $δ(i, a)$ is an indicator function: if the target item i appears in user a’s Top-N recommendation list, then $δ(i, a) = 1$; otherwise, $δ(i, a) = 0$. In this paper, we fix the recommendation list length N to 10 and use HR@10 as the core evaluation metric. A higher HR@10 value indicates that the attack model is more effective at promoting the target item to users, meaning the attack is more successful.
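Equation (24) translates directly into code once each evaluated user’s Top-N list is available. The function name and the dictionary layout of `top_n_lists` below are assumptions for illustration; producing the lists requires the target recommender, which is outside this sketch.

```python
def hit_ratio_at_n(top_n_lists, target_items):
    """Equation (24): fraction of (target item, user) pairs with the item in the Top-N list.

    top_n_lists maps each evaluated user to that user's Top-N recommended item ids.
    """
    hits = sum(1 for item in target_items
               for recs in top_n_lists.values() if item in recs)
    return hits / (len(target_items) * len(top_n_lists))


top3 = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [1, 9, 8], "d": [7, 2, 5]}
hr = hit_ratio_at_n(top3, target_items=[1])   # item 1 is recommended to users a and c
```

In this toy case the target item reaches two of four users, so the hit ratio is 0.5.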

(2) Attack stealthiness

Attack stealthiness is used to evaluate the extent to which the generated fake user profiles can evade detection, i.e., how similar their rating data is to that of genuine users. We frame this evaluation as a binary classification problem, where a standard attack detector is used to distinguish between genuine and fake users. To comprehensively reflect detection performance, we adopt Accuracy, Precision, Recall, F-measure, and the area under the ROC curve (AUC-ROC) [25,26,27]. The definitions of these metrics are as follows:

Precision = \frac{TP}{TP + FP},

(25)

Recall = \frac{TP}{TP + FN},

(26)

F\text{-}measure = 2 \times \frac{Precision \times Recall}{Precision + Recall},

(27)

Accuracy = \frac{TP + TN}{TP + TN + FP + FN},

(28)

where $TP$ is the number of fake users correctly identified, $FP$ is the number of genuine users incorrectly identified as fake, $FN$ is the number of fake users incorrectly identified as genuine, and $TN$ is the number of genuine users correctly identified. AUC-ROC measures a detector’s overall ranking ability to distinguish real users from fake ones across different decision thresholds: the closer the AUC is to 1, the more easily the detector can separate the two classes at a global level.
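The detection metrics in Equations (25)–(28) and a rank-based AUC-ROC can be computed as follows. The function names, the convention of returning 0.0 when a denominator is zero, and the assumption that a higher detector score means “more likely fake” are choices made for this sketch; the pairwise AUC shown is the standard probabilistic interpretation, not necessarily how the experiments compute it.

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, Recall, F-measure, and Accuracy with fake users as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


def auc_roc(fake_scores, genuine_scores):
    """Probability that a random fake user scores above a random genuine user (ties count 0.5)."""
    wins = 0.0
    for f in fake_scores:
        for g in genuine_scores:
            wins += 1.0 if f > g else 0.5 if f == g else 0.0
    return wins / (len(fake_scores) * len(genuine_scores))


metrics = detection_metrics(tp=8, fp=2, fn=2, tn=88)
auc = auc_roc([0.9, 0.8], [0.1, 0.85])   # 3 of 4 pairs are ranked correctly
```

For a stealthy attack one would expect these values to fall, with the AUC moving toward 0.5, which corresponds to a detector that ranks fake and genuine users no better than chance.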

With the detector and the target recommender fixed, if an attack model leads to an overall decrease in Precision, Recall, F-measure, Accuracy, and AUC-ROC, this indicates that the generated fake users are harder to distinguish from real users, i.e., the attack achieves higher stealthiness.
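The metrics in Eqs. (25)–(28) and AUC-ROC can be sketched as follows. This is a minimal, dependency-free illustration: the function names, the zero-division convention (returning 0.0), the pairwise AUC computation, and the toy labels and scores are all assumptions of the example, with label 1 denoting a fake user.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Eqs. (25)-(28); label 1 = fake user (positive class), 0 = genuine user."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Assumed convention: undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a fake user outscores a genuine one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy detector output (hypothetical): suspicion scores thresholded at 0.5.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
preds = [1 if s >= 0.5 else 0 for s in scores]
metrics = detection_metrics(labels, preds)
auc = auc_roc(labels, scores)
print(metrics, auc)
```

For this toy input, TP = 2, FP = 1, FN = 1, and TN = 4, giving Precision = Recall = F-measure = 2/3, Accuracy = 0.75, and AUC-ROC = 0.8. From the attacker's perspective, lower values of all five quantities correspond to higher stealthiness.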

4.1.3. Comparison Method

Our comparative experiments cover: comparison with baseline shilling attack models, evaluation of attack effectiveness on different target recommendation algorithms, and evaluation of stealthiness against various detectors.

In selecting baseline shilling attack models, we aim to cover the typical technological spectrum from traditional to cutting-edge. We therefore include two representative categories of attack models:

(1) Classical heuristic-based attacks.

Random attack [3]: As a lower-bound baseline for attack effectiveness, it assigns the maximum rating to the target item and randomly fills ratings for other items, characterizing attack capability under minimal prior knowledge.
Average attack [3]: It uses each item’s historical mean rating for filler items, representing an early attempt to better match rating distributions and improve stealthiness.
Bandwagon attack [4]: By coupling the target item with highly popular items in the system, it is widely regarded as a strong traditional baseline that balances efficiency and effectiveness.
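The three heuristic attacks above can be sketched as simple profile generators. This is a simplified illustration rather than the exact procedures of [3,4]: the uniform random filler ratings, the rounding of item means, the number of popular items (`n_popular`), and all function names and toy ratings are assumptions of the example.

```python
import random
from statistics import mean
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user id -> {item id: rating}


def item_stats(ratings: Ratings):
    """Return per-item mean rating and per-item rating count."""
    per_item: Dict[str, List[float]] = {}
    for profile in ratings.values():
        for item, rating in profile.items():
            per_item.setdefault(item, []).append(rating)
    means = {item: mean(vals) for item, vals in per_item.items()}
    counts = {item: len(vals) for item, vals in per_item.items()}
    return means, counts


def make_profile(kind: str, ratings: Ratings, target: str, n_filler: int,
                 rng: random.Random, r_min: int = 1, r_max: int = 5,
                 n_popular: int = 2) -> Dict[str, float]:
    """Build one fake push-attack profile of the given kind."""
    if kind not in {"random", "average", "bandwagon"}:
        raise ValueError(f"unknown attack kind: {kind}")
    means, counts = item_stats(ratings)
    candidates = sorted(item for item in means if item != target)
    profile = {target: float(r_max)}  # the target item gets the maximum rating
    if kind == "bandwagon":
        # Couple the target with the most frequently rated items.
        popular = sorted(candidates, key=lambda i: (-counts[i], i))[:n_popular]
        for item in popular:
            profile[item] = float(r_max)
        candidates = [i for i in candidates if i not in profile]
    for item in rng.sample(candidates, min(n_filler, len(candidates))):
        if kind == "average":
            # Filler rating = item's historical mean, rounded and clamped.
            rating = min(max(round(means[item]), r_min), r_max)
        else:
            # Assumed simplification: uniform random filler ratings.
            rating = rng.randint(r_min, r_max)
        profile[item] = float(rating)
    return profile


# Toy rating data (hypothetical); "t" is the target item.
toy = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2},
    "u3": {"a": 5, "d": 1},
    "u4": {"b": 3, "t": 2},
}
for kind in ("random", "average", "bandwagon"):
    print(kind, make_profile(kind, toy, "t", n_filler=2, rng=random.Random(0)))
```

Because such profiles follow a fixed template, they differ from one another only in the sampled filler items, which is the lack of diversity that generative attacks aim to overcome.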

(2) GAN-based generative attacks.

AUSH [5]: One of the earliest works to formally introduce GANs into shilling attacks, it “augments” template users and marks a shift from rule-based generation to data-driven synthesis.
GSA-GANs [7]: Targeting the gray-box setting, it improves attack transferability and practicality by introducing a recommender simulator when the attacker cannot fully access the target recommender.
Leg-UP [8]: It extends the setting to a more stringent black-box scenario and enables cross-system transfer attacks via a surrogate model, representing an important step toward real-world deployment.

In addition, we also include the original MDM-GAN [11] model as a direct control group, which facilitates analyzing the impact of our attack-oriented modifications in the experimental results.

To evaluate attack effectiveness, we deliberately select five widely used collaborative filtering recommenders with markedly different architectures as attack targets, covering paradigms from classical matrix factorization to deep models and graph neural networks:

(1) Classical matrix factorization, represented by NMF [28] and SVD [29], explicitly models user–item latent factors via low-rank decomposition;

(2) Deep interaction modeling, represented by NeuMF [30], combines GMF and an MLP to capture more complex preference patterns through nonlinear networks;

(3) Graph-based high-order collaborative filtering, represented by NGCF [31] and LightGCN [32], models user–item interactions as a graph and characterizes high-order neighbor relations via multi-layer aggregation.

This diversified set of target systems—from traditional MF to deep neural recommenders and graph-based methods—enables us to systematically examine the attack penetrability and cross-model generalization capability of MDM-GANSA under different architectural settings.

In evaluating attack stealthiness, we employ different types of detection methods: an unsupervised method (PCA [33]), which identifies attacks by discovering statistical anomalies in the data, and two supervised methods (CoDetector [34] and Pop-SAD [35]), which are trained on known attack features. The results on these detectors are used to compare MDM-GANSA with the baseline models in terms of their ability to generate “indistinguishable” fake users.

4.1.4. Experimental Procedure

Our experiments follow a unified simulation procedure.

First, we randomly split the ML-1M and Douban datasets into training and testing sets with a 9:1 ratio. This split ratio is widely adopted in recommender-system and security studies [5,8], as it allocates sufficient samples for testing while leaving as much data as possible for training complex generative models. In particular, for the highly sparse datasets considered in this work (e.g., Douban with sparsity exceeding 99%), a larger training portion helps expose the autoencoder and the two-stage GAN modules to more diverse user–item interaction patterns. This mitigates underfitting and training instability caused by insufficient data, enabling more reliable learning of the latent preference distribution and dependency structure. The first step of the attack simulation is to train MDM-GANSA and all baseline attack models on the original training set.

Next, in each experiment, we randomly select 5 different items as attack targets. To systematically evaluate the performance of each attack model under different attack intensities, we set a series of increasing attack sizes, where the number of injected fake users accounts for 1%, 2%, 3%, 4%, and 5% of the total number of genuine users. We then train the target recommendation algorithms based on these “polluted” training sets.

Finally, we evaluate the performance of the recommendation algorithms on the clean, original test set. By comparing this with the baseline performance of training on the original data, we quantify the attack’s effect. Although the test set accounts for only 10%, its absolute size is still substantial for datasets such as ML-1M and Douban, containing a large number of user–item interactions and covering rich long-tail users and items. Therefore, it is sufficient to support statistically meaningful performance comparisons and robustness analyses.

To ensure the stability and reliability of the results and to reduce the impact of random factors, all reported experimental data are the average of 5 independent runs.
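The data handling in this procedure can be sketched as follows. The sketch covers only the 9:1 split, the conversion of attack sizes into fake-user counts, and the injection step; the rounding rule, the `fake_<k>` user identifiers, and the toy data are assumptions of the example, and model training is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Interaction = Tuple[str, str, float]  # (user id, item id, rating)


def split_interactions(interactions: Sequence[Interaction],
                       test_ratio: float = 0.1, seed: int = 0):
    """Randomly split interactions into (train, test) with a 9:1 ratio by default."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def fake_user_counts(n_genuine: int,
                     percents: Sequence[int] = (1, 2, 3, 4, 5)) -> Dict[int, int]:
    """Number of fake users per attack size (assumed rounding: nearest, at least 1)."""
    return {pct: max(1, round(n_genuine * pct / 100)) for pct in percents}


def inject(train: Sequence[Interaction],
           fake_profiles: Sequence[Dict[str, float]]) -> List[Interaction]:
    """Return a polluted copy of the training set; the original is left untouched."""
    polluted = list(train)
    for idx, profile in enumerate(fake_profiles):
        for item, rating in profile.items():
            polluted.append((f"fake_{idx}", item, rating))
    return polluted


# Toy data (hypothetical): 100 interactions from 20 users.
data = [(f"u{k % 20}", f"i{k}", float(1 + k % 5)) for k in range(100)]
train, test = split_interactions(data)
counts = fake_user_counts(n_genuine=1000)
polluted = inject(train, [{"i_target": 5.0, "i3": 4.0}] * counts[1])
print(len(train), len(test), counts, len(polluted))
```

Only the training set is polluted; the test set stays clean, so that the measured change in HR@10 can be attributed to the injected profiles.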

4.2. Hyperparameter Optimization and Stability Analysis (RQ1)

In all experiments, for the method proposed in this paper, we uniformly use the Adam optimizer with a fixed learning rate of 0.001 and a batch size of 128. At the same time, to balance the different optimization objectives in the model, the weights for the reconstruction consistency loss, λ_{rec}, and the feature matching loss, λ_{feat}, are both set to 1.0.

To configure the optimal hyperparameters for the model, we employed a random search strategy combined with 10-fold cross-validation [36] on the original training set. This process primarily targeted four key hyperparameters: the total number of training epochs K, with a search space of {5, 10, 15, 20, 25, 30, 35, 40, 45, 50}; and the internal training loops for the discriminator, generator, and AE, k_1, k_2, and p, each with a search space of {10, 15, 20, 25, 30}. For each dataset, we perform 15 rounds of random sampling followed by training and evaluation to efficiently approximate the optimal hyperparameter combination within the defined search space.
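The sampling loop of this random search can be sketched as below. The search space is the one stated above; everything else is an assumption of the example. In particular, `toy_score` is a hypothetical stand-in for the cross-validated HR@10 of a trained model and does not reflect our measured results.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]

SEARCH_SPACE: Dict[str, List[int]] = {
    "K": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],  # total training epochs
    "k1": [10, 15, 20, 25, 30],                    # discriminator inner loops
    "k2": [10, 15, 20, 25, 30],                    # generator inner loops
    "p": [10, 15, 20, 25, 30],                     # AE inner loops
}


def random_search(score_fn: Callable[[Config], float], n_trials: int = 15,
                  seed: int = 0) -> Tuple[Config, float, List[Tuple[Config, float]]]:
    """Sample n_trials configurations independently and keep the best-scoring one."""
    rng = random.Random(seed)
    history: List[Tuple[Config, float]] = []
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        history.append((config, score_fn(config)))
    best_config, best_score = max(history, key=lambda pair: pair[1])
    return best_config, best_score, history


def toy_score(config: Config) -> float:
    """Hypothetical placeholder objective; replace with cross-validated HR@10."""
    arbitrary_peak = {"K": 25, "k1": 20, "k2": 20, "p": 20}
    return -float(sum((config[k] - arbitrary_peak[k]) ** 2 for k in config))


best_config, best_score, history = random_search(toy_score, n_trials=15, seed=1)
print(best_config, best_score)
```

The full grid contains 10 × 5 × 5 × 5 = 1250 combinations, so 15 trials cover only about 1.2% of it; random search is therefore an approximation of the optimum rather than an exhaustive guarantee.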

Figure 8 and Figure 9 respectively show the hyperparameter optimization process with HR@10 as the target on the two datasets with different characteristics, ML-1M and Douban.

Figure 8. Hyperparameter search process on the ML-1M dataset with HR@10 as the target.

Figure 9. Hyperparameter search process on the Douban dataset with HR@10 as the target.

As can be observed from Figure 8 and Figure 9, the optimization process on both datasets exhibits efficient convergence. Although each trial of the random search is independent, causing natural fluctuations in the curve, the model reaches its performance peak within a limited number of trials. On the ML-1M dataset, the optimal configuration \{K, k_1, k_2, p\} = \{20, 15, 15, 20\} was found on the 7th trial. Similarly, on the sparser Douban dataset, the optimal configuration \{K, k_1, k_2, p\} = \{30, 20, 20, 20\} was found on the 10th trial. Under our experimental setting, this indicates that the high-performance region of the hyperparameter space is not overly narrow or hard to reach, and that a near-optimal configuration can be found relatively quickly within the specified search space without exhaustive enumeration.

Building on the optimal configuration above, we further investigate the effect of the autoencoder’s latent dimensionality on attack effectiveness. Figure 10 reports the HR@10 trends on ML-1M and Douban when the latent dimension varies over {32, 64, 128, 256, 512, 1024}. The results show that HR@10 increases markedly as the dimension rises from 32 to 128, while the performance largely saturates when the dimension is further increased to 256 and above. This suggests that a 128-dimensional latent space is sufficient to capture user–item dependencies while avoiding the redundancy and training overhead introduced by excessively high dimensions. Therefore, we select 128 as a trade-off between performance and compression ratio.

Figure 10. HR@10 of MDM-GANSA on the ML-1M and Douban datasets under different latent dimensions.

To quantify the impact of the two-stage auxiliary losses on the final performance and to justify the choices of λ_{rec} and λ_{feat}, we conduct a univariate sensitivity analysis on these two weights. Figure 11 shows the changes in HR@10 on ML-1M and Douban when one weight is fixed at 1.0 while the other is varied. As can be seen, for both the reconstruction-consistency loss and the feature-matching loss, increasing the weight from 0.1 to 1.0 leads to a pronounced improvement in HR@10. When the weight is further increased to 2.0 and even 5.0, the performance begins to decline gradually, indicating that overly large auxiliary losses can weaken the adversarial objective and constrain attack effectiveness through “over-reconstruction.” Overall, the model exhibits moderate sensitivity to these weights without being overly fragile; therefore, we adopt a unified setting of λ_{rec} = λ_{feat} = 1.0.

Figure 11. HR@10 of MDM-GANSA on the ML-1M and Douban datasets under different loss weights.

Under the above optimal hyperparameter setting, we further record the training dynamics of MDM-GANSA on the ML-1M and Douban datasets, i.e., how each loss term evolves across epochs. Specifically, we track the discriminator and generator losses in the first stage (L_{D_u}, L_{G_u}), the discriminator and generator losses in the second stage (L_{D_x}, L_{G_x}), as well as the autoencoder reconstruction loss L_{AE}, as shown in Figure 12. Despite involving five sub-networks and employing stage-wise adversarial training, all loss curves enter a relatively stable regime after a number of iterations. This suggests that the proposed loss design and training strategy can effectively mitigate the instability commonly observed in two-stage GAN training, enabling coordinated convergence of all sub-modules within a unified framework.

Figure 12. Changes in losses in MDM-GANSA.

All experiments are conducted on a server equipped with an NVIDIA Tesla P100 GPU. Under the optimal configuration, the complete training procedure of MDM-GANSA on ML-1M—including 15 random-search trials with 10-fold cross-validation—takes approximately 1 h, while it takes about 1.5 h on the larger and sparser Douban dataset. Notably, the weight predictor and the autoencoder are relatively lightweight; compared with the training of the two-stage GAN backbone, their additional computational overhead is negligible.

Regarding RQ1, the above results demonstrate the efficiency of hyperparameter search and the training stability of MDM-GANSA. The model reaches a high-performance region within a limited number of search trials and is only moderately sensitive to perturbations of the loss weights. Together with the coordinated convergence of module-wise losses and the controllable computational cost, these findings indicate that MDM-GANSA has stable training behavior under our experimental setting, providing a practical configuration basis for subsequent experiments.

4.3. Attack Effectiveness Analysis (RQ2)

To comprehensively evaluate the attack effectiveness of MDM-GANSA, we systematically tested its performance against five mainstream recommendation algorithms under different attack scales on both the relatively dense ML-1M dataset and the highly sparse Douban dataset. The core evaluation metric is the Hit Ratio (HR@10), with the results detailed in Table 2 and Table 3.

Table 2. Attack effectiveness (HR@10) of various attack models against five different recommendation algorithms on the ML-1M dataset.

Table 3. Attack effectiveness (HR@10) of various attack models against five different recommendation algorithms on the Douban dataset.

On the ML-1M dataset, MDM-GANSA demonstrates strong and stable attack performance. As shown in Table 2, MDM-GANSA achieves higher HR@10 than the baseline methods in most attack settings. As the attack scale increases from 1% to 5%, the attack effectiveness of all models shows an upward trend; MDM-GANSA retains its lead in most settings and also exhibits a relatively more pronounced performance gain with increasing attack size, indicating a certain degree of scalability under our experimental setting.

Notably, MDM-GANSA achieves fairly consistent attack effectiveness across different types of recommender models. Whether targeting traditional matrix factorization methods such as NMF and SVD, or deep learning–based recommenders such as NeuMF, NGCF, and LightGCN, it consistently yields noticeable improvements in HR@10. This suggests that the fake user profiles generated by MDM-GANSA go beyond imitating shallow rating patterns; moreover, its AE-based dependency modeling may help capture collaborative signals that remain influential across a range of recommender architectures.

To further test the model’s robustness, we conducted evaluations on the more challenging, highly sparse Douban dataset. The experimental results in Table 3 first reveal a universal challenge posed by the sparse environment: due to a significant reduction in available collaborative information, the absolute HR@10 values for all attack models are lower than their performance on ML-1M. However, even in this harsh environment, the core advantages of MDM-GANSA remain prominent.

Analysis of Table 3 shows that MDM-GANSA still outperforms all baseline methods overall under sparse settings. By combining a dynamic multi-distribution noise strategy with AE-based dependency extraction, MDM-GANSA generates fake user profiles that are closer to real user behavior in both diversity and rating logic, rather than relying on simple statistical filling. Consequently, these injected profiles are more likely to be learned by recommender models in environments where collaborative signals are weak, thereby influencing the recommendation results. Notably, even when attacking harder-to-manipulate models such as LightGCN, MDM-GANSA still maintains a certain advantage, suggesting that the induced collaborative noise can also exert a disruptive effect on graph-convolution–based recommenders.

Across both datasets, MDM-GANSA consistently outperforms the selected baselines in terms of attack effectiveness. This finding aligns well with our two design principles: the dynamic, adaptive noise strategy provides a foundation for generating diverse and realistic users, while the AE-based, data-driven dependency modeling ensures the internal logical coherence of rating behaviors. Together, these components enable MDM-GANSA to produce fake user profiles with strong attack impact under our experimental setting, thereby answering RQ2 from an empirical perspective.

4.4. Attack Stealthiness Analysis (RQ3)

A successful attack model must not only effectively manipulate recommendation results but also closely mimic genuine users to evade detection. This section evaluates the stealthiness of the fake users generated by MDM-GANSA. We employed three types of detectors for testing: PCA, which discovers statistical anomalies; CoDetector, which detects collaborative behavior patterns; and Pop-SAD, which focuses on item popularity deviation. To comprehensively characterize stealthiness from the attacker’s perspective, we consider not only the classification metrics Precision, Recall, and F-measure, but also the overall discriminative metrics Accuracy and AUC-ROC. With the detector and evaluation split fixed, lower values of these metrics indicate that the detector finds it harder to distinguish fake users from real ones, implying a more stealthy attack.

Experimental results on the ML-1M dataset show that, across multiple detectors, MDM-GANSA yields overall lower metric values than the baseline methods, indicating an advantage in stealthiness. As shown in Figure 13, the values of all detection metrics increase as the attack scale grows from 1% to 5%, which is expected, as a larger injection of fake users makes the detection task easier. The key finding, however, is that the Precision, Recall, and F-measure curves for MDM-GANSA are lower than those of all baseline models in almost all scenarios, and its curves typically exhibit the smallest slope. This indicates that the fake user profiles generated by MDM-GANSA are not only difficult to identify in small-scale attacks, but also become detectable at a slower rate than those of other models as the attack scale expands. Figure 14 further characterizes this phenomenon from the perspectives of Accuracy and AUC-ROC. Across the three detectors, MDM-GANSA typically achieves the lowest or near-lowest Accuracy and AUC-ROC, indicating that the detectors are least able to distinguish the fake users generated by MDM-GANSA. This suggests that its stealthiness advantage remains evident under these overall discriminative metrics as well.

Figure 13. Comparison of stealthiness metrics (Precision/Recall/F-measure) of different attack models on the ML-1M dataset.

Figure 14. Accuracy and AUC-ROC of different attack models under different detectors on the ML-1M dataset.

On the sparser Douban dataset, the stealth advantage of MDM-GANSA is even more pronounced in terms of the measured values. As shown in Figure 15, the highly sparse environment makes any unnatural rating pattern more conspicuous, leading to an overall increase in detectability for all attack models compared to ML-1M. However, it is worth noting that the gap in stealthiness between MDM-GANSA and other models has further widened. In sparse data, a “good” fake user must exhibit a reasonable preference logic within very few rating interactions. The dense or patterned ratings generated by heuristic attacks and simple GAN models appear out of place in this context and are extremely easy to expose. In contrast, the fake user profiles generated by MDM-GANSA are not only logically consistent, but also better match the sparsity patterns of real-world behavior, making them less likely to exhibit clearly separable anomalous signatures in the overall data distribution. The Accuracy and AUC-ROC results in Figure 16 corroborate this observation: MDM-GANSA generally achieves lower Accuracy and AUC-ROC than most baseline methods, indicating that its fake users are harder for the detectors to distinguish reliably.

Figure 15. Comparison of stealthiness metrics (Precision/Recall/F-measure) of different attack models on the Douban dataset.

Figure 16. Accuracy and AUC-ROC of different attack models under different detectors on the Douban dataset.

Overall, MDM-GANSA’s stealthiness advantage stems from coordinated realism at both the macro and micro levels. On the one hand, the dynamic noise strategy breaks away from fixed templates at the population level, allowing fake users to blend into the real-user group in terms of statistical distributions. On the other hand, the AE-driven dependency modeling learns and reproduces realistic rating logic at the individual level, making each fake user’s behavioral pattern appear natural and credible. Considering multiple metrics—Precision/Recall/F-measure, Accuracy, and AUC-ROC—MDM-GANSA outperforms the compared baselines across different detectors and datasets, demonstrating strong stealthiness under our experimental conditions and providing empirical evidence in response to RQ3.

4.5. Comparison with the Original MDM-GAN (RQ4)

This section aims to quantify the performance improvements of our proposed MDM-GANSA over its foundational framework, MDM-GAN, through direct comparison. The core purpose of this ablation study is to verify whether the two key improvements we introduced—the dynamic adaptive noise strategy and the AE-based data-driven dependency modeling—have brought substantial enhancements in both attack effectiveness and stealthiness.

4.5.1. Effectiveness Comparison

To evaluate the improvement in attack effectiveness, we compared the HR@10 performance of the two models across different datasets and recommender systems.

On the ML-1M dataset, MDM-GANSA outperforms the original MDM-GAN across all five recommender models and at all attack sizes, as shown in Figure 17. Notably, as the attack scale increases from 1% to 5%, the performance gap between the two generally persists or even widens. This phenomenon is particularly evident on models such as NeuMF and NGCF, suggesting that our proposed improvements provide certain advantages in modeling high-order collaborative information. A plausible explanation is that the AE employed by MDM-GANSA can directly learn and reproduce complex non-linear preference logic from high-dimensional sparse data, a task for which the Copula-based statistical modeling of the original MDM-GAN is ill-suited; as a result, the fake user profiles generated by MDM-GAN have weaker attack power.

Figure 17. HR@10 performance of MDM-GANSA versus the original MDM-GAN on the ML-1M dataset.

On the sparser Douban dataset, the advantage of MDM-GANSA is even more pronounced in quantitative terms. As shown in Figure 18, the high-sparsity environment places greater demands on the quality of the attack model. In particular, when the attack scale exceeds 3%, the performance curve of MDM-GANSA diverges markedly from that of MDM-GAN. This result indicates that, even in environments with weak collaborative signals, the fake user profiles generated by MDM-GANSA can still exert a substantial influence on the recommender models. Overall, our architectural improvements consistently translate into stronger attack effectiveness across different data distributions and model paradigms.

Figure 18. HR@10 performance of MDM-GANSA versus the original MDM-GAN on the Douban dataset.

4.5.2. Stealthiness Comparison

Besides attack effectiveness, we also conducted an in-depth comparison of the two models’ abilities to evade detection, i.e., their attack stealthiness.

On the ML-1M dataset, the fake user profiles generated by MDM-GANSA exhibit stronger evasion capabilities. As shown in Figure 19, under the scrutiny of the PCA, CoDetector, and Pop-SAD detectors, the detection metric curves for MDM-GANSA are consistently lower than those for MDM-GAN. This means that at the same attack scale, MDM-GANSA is more difficult to detect. Particularly with the CoDetector and Pop-SAD detectors, the detection metrics for MDM-GAN rise sharply as the attack scale increases, while MDM-GANSA’s curves maintain a much flatter growth. In the scatter plot in Figure 20, the arrows from the hollow circles (MDM-GAN) to the solid stars (MDM-GANSA) exhibit an overall down-left shift. This indicates that, under the same detector and attack intensity, the detector’s discriminative capability is systematically weakened, making the fake users generated by MDM-GANSA harder to identify. This observation is consistent with our dynamic noise strategy and AE-based dependency modeling. The former ensures the diversity of the generated profiles, avoiding the collective exposure risk brought by simplistic patterns, while the latter ensures the intrinsic realism of the rating behavior, allowing it to better blend in with the genuine user population.

Figure 19. Comparison of detection metrics (Precision/Recall/F-measure) between MDM-GANSA and the original MDM-GAN on the ML-1M dataset.

Figure 20. Accuracy and AUC-ROC of MDM-GANSA versus the original MDM-GAN on the ML-1M dataset.

On the highly sparse Douban dataset, the gap in stealthiness metrics between MDM-GANSA and the original MDM-GAN becomes even larger. As shown in Figure 21 and Figure 22, the difference between the two models is more pronounced than that on ML-1M, with all three detectors exhibiting larger drops in Accuracy and AUC-ROC. In sparse data, any pattern that does not conform to the natural characteristics of the data is extremely easy to identify. The profiles generated by the original MDM-GAN, which relies on statistical theory, follow a pattern that differs significantly from the sparse behavior of genuine users, so their detectability rises sharply. In contrast, the AE in MDM-GANSA directly learns the behavioral logic from this sparse environment. The generated profiles are therefore highly similar to genuine users in both their statistical and behavioral characteristics, and exhibit strong stealthiness.

Figure 21. Comparison of detection metrics (Precision/Recall/F-measure) between MDM-GANSA and the original MDM-GAN on the Douban dataset.

Figure 22. Accuracy and AUC-ROC of MDM-GANSA versus the original MDM-GAN on the Douban dataset.

Overall, the comparative analysis in this section shows that, under the considered datasets and experimental settings, our modifications to the MDM-GAN framework lead to measurable performance improvements. Specifically, the dynamic noise strategy and the AE-based dependency modeling consistently enhance attack effectiveness and improve the ability to evade detection, thereby providing clear empirical evidence in response to RQ4.

4.6. Ablation Study on Key Components (RQ5)

To answer RQ5, we conduct a series of component-level ablation experiments on MDM-GANSA under the same experimental settings, using HR@10 as the metric for attack effectiveness. We denote the full model as MDM-GANSA. The variant with static uniform noise is MDM-GANSA_static, the annealing-based noise scheduling variant is MDM-GANSA_anneal, the variant without AE-based dependency modeling is MDM-GANSA_AE, the variant without the stage-1 reconstruction-consistency loss is MDM-GANSA_rec, and the variant without the stage-2 feature-matching loss is MDM-GANSA_feat. Table 4 summarizes the HR@10 results of all variants under different attack sizes.

Table 4. HR@10 of MDM-GANSA and its ablated variants under different attack sizes.

From the perspective of noise mixing strategies, the full MDM-GANSA achieves the highest HR@10 across all attack intensities. In contrast, MDM-GANSA_static consistently yields the weakest attack performance, indicating that a fixed uniform weighting scheme cannot fully exploit the complementarity of multi-source noise. The predefined annealing schedule in MDM-GANSA_anneal performs between these two: it delivers stable gains over the static scheme, yet remains systematically inferior to the full model with adaptive mixing. This trend suggests that a heuristic “curriculum-style” annealing strategy based solely on training progress can partially alleviate the limitations of static mixing, but cannot match the weight predictor in MDM-GANSA, which adjusts the weights of base distributions according to the real-time discriminator–generator training state. Overall, the results demonstrate that adaptive multi-distribution noise brings consistent improvements in attack effectiveness over static or heuristic alternatives in our experiments.
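The difference between the three weighting strategies can be illustrated with a minimal sketch. It is not our implementation: the linear annealing schedule, the softmax normalization of predictor outputs, the weighted-sum form of mixing, and all function names are assumptions of the example, and the weight predictor network itself is replaced by a given list of logits.

```python
import math
from typing import List, Sequence


def static_weights(n_sources: int) -> List[float]:
    """Fixed uniform weights, as in the static variant."""
    return [1.0 / n_sources] * n_sources


def annealed_weights(n_sources: int, epoch: int, total_epochs: int) -> List[float]:
    """Hypothetical schedule: start on source 0, move linearly to uniform weights.

    The weights depend only on training progress, not on the training state.
    """
    if total_epochs <= 1:
        progress = 0.0
    else:
        progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    uniform = 1.0 / n_sources
    start = [1.0] + [0.0] * (n_sources - 1)
    return [(1.0 - progress) * s + progress * uniform for s in start]


def adaptive_weights(logits: Sequence[float]) -> List[float]:
    """Softmax over predictor outputs (the logits stand in for a predictor network)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_noise(weights: Sequence[float],
              samples: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of one sample per noise source (assumed form of mixing)."""
    dim = len(samples[0])
    return [sum(w * s[d] for w, s in zip(weights, samples)) for d in range(dim)]


# Toy usage (hypothetical): three noise sources, two-dimensional samples.
samples = [[1.0, 3.0], [3.0, 5.0], [-2.0, 0.0]]
print(mix_noise(static_weights(3), samples))
print(mix_noise(annealed_weights(3, epoch=0, total_epochs=20), samples))
print(mix_noise(adaptive_weights([0.2, 1.5, -0.7]), samples))
```

In the static and annealed cases, the weights are known before training starts; in the adaptive case, they change whenever the logits change, which is what allows the mixture to respond to the current state of adversarial training.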

Regarding dependency modeling, MDM-GANSA_AE exhibits lower HR@10 than the full model at all attack sizes, with a larger gap at medium-to-high attack intensities. This indicates that simple random linear compression cannot replace AE-based, data-driven dependency modeling; the latent representations extracted by the AE are beneficial to the subsequent two-stage generation under the tested settings.

Finally, from the perspective of the auxiliary losses in the two-stage design, removing the reconstruction-consistency loss or the feature-matching loss leads to noticeable performance drops for MDM-GANSA_rec and MDM-GANSA_feat. This suggests that the auxiliary losses not only provide more stable optimization signals in early training—helping mitigate the instability of purely adversarial learning—but also impose complementary constraints in the latent space and rating space, respectively. Together, these losses contribute to MDM-GANSA’s advantage in attack effectiveness in our empirical results.

5. Conclusions and Future Work

To address the challenges of limited attack pattern diversity and poor stealthiness in shilling attacks against collaborative filtering recommender systems, this paper proposes MDM-GANSA, an enhanced attack model built upon the MDM-GAN framework. MDM-GANSA introduces two key innovations. First, it incorporates a weight predictor to realize a dynamic, adaptive noise strategy, improving the diversity of generated fake user profiles. Second, it replaces traditional statistical dependency modeling with an AE-based, data-driven paradigm, enhancing the internal logical consistency of the generated profiles. In addition, the refined stage-wise architecture contributes to improved training stability. Comprehensive experiments on two public datasets, ML-1M and Douban, demonstrate that MDM-GANSA consistently outperforms multiple mainstream baselines in terms of both attack effectiveness and attack stealthiness against detection. Importantly, this work is conducted strictly on public datasets and follows the principle of “attack for defense,” aiming to support security evaluation and the design of defense mechanisms for recommender systems, rather than encouraging misuse in real-world systems.

Despite these promising results, MDM-GANSA still has several limitations. For example, the current study mainly focuses on collaborative filtering recommenders based on rating matrices and evaluates performance under a limited range of attack sizes, detector types, and dataset scales. Its effectiveness and stealthiness under more complex interaction data, stricter online defense mechanisms, and unknown detectors remain to be further validated. Looking ahead, we plan to integrate the proposed generative model with decision-making mechanisms such as reinforcement learning to build an adaptive agent that can dynamically adjust attack strategies based on environmental feedback. Building on this direction, we will further explore how generative attack modeling can be leveraged to inform robust training and defense design, advancing the field toward a more intelligent and controllable attacker–defender game.

Author Contributions

Conceptualization, Q.Z.; methodology, Q.Z.; software, Q.Z. and X.Z. (Xiaoyue Zhang); validation, X.Z. (Xiaoyue Zhang); formal analysis, Q.Z.; investigation, X.Z. (Xi Zhao); resources, Q.Z.; data curation, X.Z. (Xi Zhao); writing—original draft preparation, X.Z. (Xiaoyue Zhang); writing—review and editing, Q.Z.; visualization, Q.Z.; supervision, Q.Z.; project administration, Q.Z.; funding acquisition, Q.Z. All authors have read and agreed to the published version of this manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province (Grant No. ZR2025MS1021).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study analyzed two publicly available datasets, the MovieLens-1M dataset and the Douban dataset, which are readily available for download from their respective sources online. The simulated attack data generated and analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The overall architecture of the MDM-GANSA model.

Figure 2. Network architecture of the dynamic weight predictor ($W_{pred}$).

Figure 3. Network architecture of the autoencoder for dependency extraction.

Figure 4. Network architecture of the generator ($G_{u}$).

Figure 5. Network architecture of the discriminator ($D_{u}$).

Figure 6. Network architecture of the generator ($G_{x}$).

Figure 7. Network architecture of the discriminator ($D_{x}$).

Figure 8. Hyperparameter search process on the ML-1M dataset with HR@10 as the target.

Figure 9. Hyperparameter search process on the Douban dataset with HR@10 as the target.

Figure 10. HR@10 of MDM-GANSA on the ML-1M and Douban datasets under different latent dimensions.

Figure 11. HR@10 of MDM-GANSA on the ML-1M and Douban datasets under different loss weights.

Figure 12. Changes in losses in MDM-GANSA.

Figure 13. Comparison of stealthiness metrics (Precision/Recall/F-measure) of different attack models on the ML-1M dataset.

Figure 14. Accuracy and AUC-ROC of different attack models under different detectors on the ML-1M dataset.

Figure 15. Comparison of stealthiness metrics (Precision/Recall/F-measure) of different attack models on the Douban dataset.

Figure 16. Accuracy and AUC-ROC of different attack models under different detectors on the Douban dataset.

Figure 17. HR@10 performance of MDM-GANSA versus the original MDM-GAN on the ML-1M dataset.

Figure 18. HR@10 performance of MDM-GANSA versus the original MDM-GAN on the Douban dataset.

Figure 19. Comparison of detection metrics (Precision/Recall/F-measure) between MDM-GANSA and the original MDM-GAN on the ML-1M dataset.

Figure 20. Accuracy and AUC-ROC of MDM-GANSA versus the original MDM-GAN on the ML-1M dataset.

Figure 21. Comparison of detection metrics (Precision/Recall/F-measure) between MDM-GANSA and the original MDM-GAN on the Douban dataset.

Figure 22. Accuracy and AUC-ROC of MDM-GANSA versus the original MDM-GAN on the Douban dataset.

Table 1. Statistical information of the datasets.

Dataset	#User	#Item	#Rating	Rating Scale	Sparsity
ML-1M	6040	3706	1,000,209	[1, 5]	95.53%
Douban	2848	39,586	894,887	[1, 5]	99.21%

Note: The symbol # stands for “Number of”.
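
The Sparsity column can be checked against the other columns. The sketch below assumes the usual definition of sparsity as the share of empty cells in the user-item rating matrix, 1 − #Rating / (#User × #Item); the function name `sparsity` is illustrative and does not come from the paper.

```python
def sparsity(n_users: int, n_items: int, n_ratings: int) -> float:
    """Fraction of empty cells in an n_users x n_items rating matrix."""
    return 1.0 - n_ratings / (n_users * n_items)


# Counts taken from Table 1: (#User, #Item, #Rating).
datasets = {
    "ML-1M": (6040, 3706, 1_000_209),
    "Douban": (2848, 39_586, 894_887),
}

for name, (users, items, ratings) in datasets.items():
    print(f"{name}: {sparsity(users, items, ratings):.2%}")
```

Under this definition the computed values agree with the 95.53% and 99.21% reported in the table.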

Table 2. Attack effectiveness (HR@10) of various attack models against five different recommendation algorithms on the ML-1M dataset.

Recommendation Algorithm	Attack Size	Random	Average	Bandwagon	AUSH	Leg-UP	GSA-GANs	MDM-GANSA
NMF	1%	4.27%	4.36%	4.34%	4.73%	4.75%	4.14%	4.79%
	2%	5.46%	5.52%	5.47%	5.46%	5.78%	5.32%	5.63%
	3%	6.34%	6.39%	6.31%	6.25%	6.27%	6.12%	6.52%
	4%	7.04%	7.15%	7.07%	7.42%	7.07%	7.59%	7.68%
	5%	8.00%	8.07%	7.98%	8.42%	8.60%	8.36%	8.86%
SVD	1%	5.37%	5.26%	5.44%	5.43%	5.31%	4.81%	5.76%
	2%	6.33%	6.40%	6.39%	6.46%	6.28%	6.62%	6.76%
	3%	7.25%	7.29%	7.28%	7.36%	7.15%	7.56%	7.60%
	4%	8.09%	8.13%	8.17%	8.23%	8.07%	8.51%	8.53%
	5%	8.95%	9.04%	9.03%	9.07%	9.30%	9.28%	9.33%
NeuMF	1%	3.67%	3.68%	4.99%	5.04%	4.82%	5.05%	5.42%
	2%	5.84%	5.92%	5.95%	7.47%	7.92%	7.72%	8.82%
	3%	13.28%	7.79%	7.54%	11.50%	10.53%	10.92%	15.49%
	4%	17.65%	11.49%	11.08%	16.38%	11.14%	11.65%	20.14%
	5%	16.67%	15.35%	16.01%	18.87%	13.75%	13.99%	23.90%
NGCF	1%	4.60%	6.23%	7.17%	9.05%	5.29%	5.05%	5.32%
	2%	5.12%	6.37%	7.94%	8.96%	6.14%	7.12%	7.71%
	3%	4.56%	6.61%	7.08%	8.99%	7.92%	7.85%	8.77%
	4%	4.95%	7.46%	7.90%	10.50%	10.74%	9.98%	10.77%
	5%	5.32%	7.71%	8.77%	10.77%	12.17%	11.41%	12.36%
LightGCN	1%	3.75%	3.77%	3.52%	3.87%	4.47%	3.86%	4.40%
	2%	4.14%	3.99%	4.25%	4.93%	5.04%	4.77%	5.66%
	3%	5.02%	4.62%	5.42%	5.92%	5.57%	5.75%	6.17%
	4%	6.40%	6.14%	6.03%	6.87%	6.74%	6.13%	6.91%
	5%	6.98%	6.98%	6.85%	7.13%	8.66%	6.57%	7.78%

The best results are shown in bold and underlined.
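
For readers unfamiliar with the metric in Tables 2 and 3, the following minimal sketch shows one common way to compute a hit ratio at k. It assumes the definition often used in the shilling-attack literature (the fraction of evaluated users whose top-k recommendation list contains the target item); the paper's exact evaluation protocol may differ, and the function name and toy data are hypothetical.

```python
from typing import Dict, List


def hit_ratio_at_k(ranked: Dict[str, List[str]], target: str, k: int = 10) -> float:
    """Share of users whose top-k ranked list contains the target item."""
    if not ranked:
        return 0.0
    hits = sum(1 for items in ranked.values() if target in items[:k])
    return hits / len(ranked)


# Toy ranked lists (best item first); user and item IDs are made up.
toy = {
    "u1": ["i7", "i3", "i9"],
    "u2": ["i2", "i7", "i1"],
    "u3": ["i4", "i5", "i6"],
    "u4": ["i1", "i2", "i7"],
}

# With k=2, the target "i7" appears in the lists of u1 and u2 only.
print(hit_ratio_at_k(toy, target="i7", k=2))  # 0.5
```

A successful push attack raises this value for the target item, which is why larger HR@10 entries in the tables indicate a more effective attack.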

Table 3. Attack effectiveness (HR@10) of various attack models against five different recommendation algorithms on the Douban dataset.

Recommendation Algorithm	Attack Size	Random	Average	Bandwagon	AUSH	Leg-UP	GSA-GANs	MDM-GANSA
NMF	1%	0.47%	0.50%	0.62%	0.69%	0.84%	0.28%	1.03%
	2%	1.32%	1.24%	1.44%	1.56%	1.70%	0.52%	2.00%
	3%	2.35%	2.38%	2.49%	2.51%	2.72%	2.93%	2.95%
	4%	3.32%	3.24%	3.35%	3.46%	3.56%	3.86%	3.94%
	5%	4.22%	4.19%	4.14%	4.49%	4.76%	4.86%	4.86%
SVD	1%	1.03%	1.04%	1.04%	1.09%	1.03%	0.82%	1.00%
	2%	2.00%	2.01%	2.01%	2.06%	1.99%	2.10%	2.15%
	3%	3.00%	2.95%	2.96%	3.05%	2.96%	2.84%	3.11%
	4%	3.93%	3.94%	3.89%	3.99%	3.89%	4.06%	4.11%
	5%	4.84%	4.82%	4.82%	4.91%	4.81%	4.94%	5.02%
NeuMF	1%	0.38%	0.42%	0.49%	0.43%	0.31%	0.39%	0.45%
	2%	1.12%	1.11%	1.01%	1.19%	1.23%	1.30%	1.67%
	3%	2.28%	2.50%	2.32%	2.62%	1.79%	2.24%	2.99%
	4%	3.03%	2.99%	3.09%	3.45%	3.67%	3.49%	4.16%
	5%	3.28%	3.08%	3.86%	4.04%	4.49%	4.65%	4.95%
NGCF	1%	0.40%	0.48%	0.32%	0.62%	0.16%	0.63%	0.76%
	2%	1.26%	1.06%	1.28%	1.43%	0.48%	1.71%	1.95%
	3%	1.58%	1.66%	1.56%	1.88%	1.22%	2.71%	2.43%
	4%	2.52%	2.27%	2.25%	3.88%	2.46%	3.56%	4.37%
	5%	2.71%	2.45%	2.63%	4.71%	4.76%	4.63%	5.46%
LightGCN	1%	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%	0.01%
	2%	0.00%	0.00%	0.00%	0.00%	0.01%	0.00%	0.02%
	3%	0.00%	0.00%	0.00%	0.04%	0.31%	0.01%	0.03%
	4%	0.03%	0.04%	0.01%	0.00%	0.74%	0.06%	0.06%
	5%	0.07%	0.08%	0.06%	0.11%	1.71%	0.40%	0.70%

The best results are shown in bold and underlined.

Table 4. HR@10 of MDM-GANSA and its ablated variants under different attack sizes.

Attack Model	1%	2%	3%	4%	5%
MDM-GANSA	5.42%	8.82%	15.49%	20.14%	23.90%
MDM-GANSA_static	4.58%	7.09%	13.68%	17.06%	19.59%
MDM-GANSA_anneal	4.88%	8.04%	14.38%	19.01%	21.26%
MDM-GANSA_AE	4.28%	8.75%	14.09%	17.48%	19.66%
MDM-GANSA_rec	5.07%	8.39%	13.17%	18.53%	19.88%
MDM-GANSA_feat	4.92%	7.88%	13.28%	18.80%	20.54%
