Article

Mitigating Membership Inference Attacks via Generative Denoising Mechanisms

College of Mechatronic Engineering, North University of China, No. 3 Xueyuan Road, Taiyuan 030051, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3070; https://doi.org/10.3390/math13193070
Submission received: 9 August 2025 / Revised: 23 August 2025 / Accepted: 3 September 2025 / Published: 24 September 2025

Abstract

Membership Inference Attacks (MIAs) pose a significant threat to privacy in modern machine learning systems, enabling adversaries to determine whether a specific data record was used during model training. Existing defense techniques often degrade model utility or rely on heuristic noise injection, which fails to provide a robust, mathematically grounded defense. In this paper, we propose Diffusion-Driven Data Preprocessing (D3P), a novel privacy-preserving framework leveraging generative diffusion models to transform sensitive training data before learning, thereby reducing the susceptibility of trained models to MIAs. Our method integrates a mathematically rigorous denoising process into a privacy-oriented diffusion pipeline, which ensures that the reconstructed data maintains essential semantic features for model utility while obfuscating fine-grained patterns that MIAs exploit. We further introduce a privacy–utility optimization strategy grounded in formal probabilistic analysis, enabling adaptive control of the diffusion noise schedule to balance attack resilience and predictive performance. Experimental evaluations across multiple datasets and architectures demonstrate that D3P significantly reduces MIA success rates by up to 42.3 % compared to state-of-the-art defenses, with a less than 2.5 % loss in accuracy. This work provides a theoretically principled and empirically validated pathway for integrating diffusion-based generative mechanisms into privacy-preserving AI pipelines, which is particularly suitable for deployment in cloud-based and blockchain-enabled machine learning environments.

1. Introduction

The rapid adoption of machine learning (ML) technologies in diverse domains such as healthcare diagnostics, financial fraud detection, personalized recommendation systems, and autonomous systems has led to unprecedented improvements in decision-making efficiency and accuracy. In many of these domains, the training process relies heavily on large-scale datasets containing sensitive personal or organizational information. However, the integration of such data into model training pipelines inevitably raises critical concerns regarding privacy leakage. One of the most notable and practically realizable threats in this context is the Membership Inference Attack (MIA), wherein an adversary aims to infer whether a given data point was included in the training dataset of a target model. Successful MIAs can lead to severe consequences, including the exposure of medical conditions, revealing an individual’s presence in a social group, or disclosing involvement in confidential transactions.
MIAs are particularly challenging because they exploit subtle statistical discrepancies between a model’s behavior on training data versus unseen data. These discrepancies often arise from overfitting, even when regularization techniques are applied. For example, confidence scores, loss distributions, or gradient norms may unintentionally encode membership information that can be extracted by a well-crafted adversarial query. Over the last few years, researchers have proposed a variety of countermeasures. Differential privacy [1] offers a formal and mathematically rigorous defense, guaranteeing bounded influence of any single training record on the model’s output distribution. However, the noise injection required to achieve strong privacy guarantees can significantly degrade model accuracy, particularly in high-dimensional tasks. Similarly, adversarial regularization techniques [2,3,4] introduce additional loss terms to discourage overfitting but often require computationally expensive retraining, making them impractical for resource-constrained settings or pre-trained models.
Another promising defense paradigm involves modifying the training data itself before model learning. By altering the fine-grained statistical characteristics of the input data, one can reduce the exploitable membership signals without necessarily compromising the global patterns that underpin predictive performance. Generative modeling has emerged as a candidate for such preprocessing, as it can create privacy-preserving synthetic variants of the original data. However, existing approaches such as GAN-based synthesis often suffer from mode collapse, poor semantic fidelity, and limited formal guarantees regarding privacy leakage.
To address these shortcomings, we propose Diffusion-Driven Data Preprocessing (D3P), a generative defense framework that leverages the expressive and stable synthesis capabilities of diffusion models to mitigate MIAs. Diffusion models, originally designed for high-quality image generation, operate by iteratively denoising samples from a noisy prior, reconstructing semantically meaningful outputs that resemble the original data distribution. In D3P, this denoising process is adapted to partially recover the original data from an intermediate noisy state, introducing controlled perturbations that obfuscate membership-specific patterns while preserving task-relevant features. The degree of perturbation is governed by a formally derived privacy–utility trade-off function, enabling practitioners to systematically balance attack resilience and predictive accuracy.
From a theoretical standpoint, D3P is grounded in a probabilistic framework that characterizes the adversary’s inference advantage as a function of the diffusion noise schedule. This allows us to formalize conditions under which the model’s susceptibility to MIAs is minimized, while ensuring that utility degradation remains bounded. From a practical perspective, the method is highly adaptable: it operates entirely as a data preprocessing step, without requiring modifications to the learning algorithm, loss function, or network architecture. As a result, D3P can be integrated into cloud-based ML workflows, blockchain-mediated federated learning systems, and other distributed AI deployments where data cannot be directly shared or modified post-collection.
The core contributions of this paper are as follows:
  • We introduce a novel diffusion-based preprocessing pipeline for privacy-preserving ML, specifically targeting MIA mitigation without requiring downstream retraining adjustments.
  • We provide a formal mathematical characterization of the relationship between diffusion noise schedules and membership inference risk, yielding a tunable privacy–utility optimization criterion.
  • We conduct extensive experimental evaluations on benchmark datasets and multiple MIA attack scenarios, demonstrating that D3P reduces MIA success rates by up to 42.3 % compared to state-of-the-art baselines, with a less than 2.5 % loss in accuracy.
  • We validate the applicability of our approach in cloud-hosted and blockchain-supported learning environments, showing that it can provide scalable and auditable privacy guarantees.
By bridging the gap between formal privacy analysis and practical generative modeling, D3P offers a principled and effective defense mechanism for safeguarding sensitive training data in modern AI systems. We anticipate that this work will encourage further exploration into integrating diffusion-based methods with privacy-preserving machine learning frameworks, thereby contributing to the broader goal of secure and trustworthy AI deployment.

2. Related Work

Privacy preservation in machine learning has emerged as a critical research area due to the increasing deployment of AI models in domains involving highly sensitive data, such as healthcare diagnostics, financial transactions, biometric authentication, and personalized recommendation systems. The challenge lies in enabling powerful predictive capabilities while ensuring that individual-level information cannot be exploited by adversaries. In this section, we provide a comprehensive review of the most relevant strands of literature that inform our work: membership inference attacks and defenses, data perturbation and synthetic data generation, and diffusion-based generative models. We conclude by positioning our contribution within this broader context.

2.1. Membership Inference Attacks and Defenses

Membership Inference Attacks (MIAs) were formally introduced by Shokri et al. [2], who demonstrated the shadow model paradigm, wherein an attacker trains multiple shadow models to mimic the decision boundary of the target model and subsequently trains an attack classifier to infer whether a given sample was part of the training dataset. Subsequent work expanded the scope of MIAs to black-box [5,6,7] and white-box [8] settings, revealing that even when access is limited to output probabilities, membership leakage can be substantial. These studies have also demonstrated that overfitting, overconfident predictions, and poor calibration are key factors that exacerbate vulnerability.
Defenses against MIAs can be broadly categorized into model-centric and data-centric approaches. Model-centric defenses include standard regularization techniques [9,10], confidence masking [2], and adversarial regularization [3], which add explicit penalty terms to discourage membership-correlated behavior. Differential privacy (DP) [1] represents the most well-known formal defense, with DP-SGD [11] applying gradient clipping and Gaussian noise to bound the influence of any single data point. However, empirical evaluations [12] show that achieving meaningful privacy budgets ($\varepsilon \le 8$) often entails significant utility loss, particularly in high-dimensional tasks. Other defenses [13] perturb output labels or post-process confidence scores, but these may distort model calibration and limit applicability to multi-class problems.
A recent survey [14] highlights that while model-centric defenses can be effective, they frequently require architectural changes, retraining from scratch, or continuous adversary-aware training, which limits their scalability in cloud-based and federated learning environments. This motivates the exploration of preprocessing-based solutions that operate independently of the model training pipeline.

2.2. Data Perturbation and Synthetic Data Generation

Data-centric defenses aim to reduce membership signals by altering the training data prior to model learning. Basic approaches include feature perturbation [15], dimensionality reduction via PCA [16], or selective feature masking [17]. While computationally inexpensive, these methods often lack adaptability to different privacy–utility trade-offs and can degrade task-relevant signal disproportionately.
Generative models have emerged as a more sophisticated alternative, enabling the creation of synthetic datasets that approximate the statistical distribution of real data while obscuring individual records. Generative Adversarial Networks (GANs) [18] and Variational Autoencoders (VAEs) [19] have been widely adopted for this purpose [20,21,22]. However, GAN-based methods suffer from training instability, mode collapse, and difficulty in controlling the level of fidelity versus privacy. Hybrid approaches that combine generative models with differential privacy [23,24] mitigate some risks but introduce new challenges, such as managing the compounded privacy loss and balancing noise injection against sample realism.
Our work is aligned with the generative modeling perspective but departs from traditional GAN/VAE approaches by exploiting the controllable nature of diffusion models. Instead of generating entirely synthetic data from noise, we leverage partial reverse diffusion to reconstruct semantically faithful but privacy-preserving variants of the original inputs. This ensures that the coarse, task-relevant structures remain intact while membership-specific high-frequency details are obfuscated.

2.3. Diffusion-Based Generative Models

Diffusion probabilistic models [25,26] and score-based generative models [27] have recently emerged as a leading class of generative models, producing state-of-the-art results across diverse modalities such as images [28], audio [29], and text [30]. The key idea is to define a forward Markov chain that gradually adds Gaussian noise to the data until it becomes nearly pure noise, and then learn a reverse process to denoise step-by-step back to the original data distribution. This process is parameterized by a neural network trained using denoising score matching.
Improvements to diffusion models have included variance schedule design [31], hybrid deterministic-stochastic samplers [32], and multi-resolution conditioning [33]. Diffusion models have also been used for tasks requiring fine control over the output, such as inpainting [34] and super-resolution [35]. Recently, their stability and controllability have led to interest in privacy-related applications, e.g., face anonymization [36] and sensitive attribute removal [37]. However, these prior works focus on specific application domains and lack a formal optimization framework for balancing privacy and utility.

3. Preliminaries

In this section, we outline the core concepts, definitions, and mathematical formulations necessary for understanding our proposed Diffusion-Driven Data Preprocessing (D3P) framework. We begin with a formal definition of Membership Inference Attacks (MIAs), followed by an overview of diffusion models, and finally present the notation conventions that will be used throughout the remainder of the paper.

3.1. Membership Inference Attacks (MIAs)

Let $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ denote the training dataset, where $x_i \in \mathbb{R}^d$ represents the input features and $y_i \in \mathcal{Y}$ denotes the label from the label space $\mathcal{Y}$. A target model $M_\theta$ is trained on $\mathcal{D}$, yielding parameters $\theta$. An adversary $\mathcal{A}$ with black-box query access to $M_\theta$ aims to determine, for a given sample $x^*$, whether $x^* \in \mathcal{D}$.
Formally, the adversary's goal is to construct a binary classifier $f_{\mathrm{adv}}: \mathbb{R}^d \to \{0, 1\}$, where
$$f_{\mathrm{adv}}(x^*) = \begin{cases} 1, & \text{if } x^* \in \mathcal{D}, \\ 0, & \text{otherwise}. \end{cases}$$
The success of the attack is commonly measured by the membership advantage:
$$\mathrm{Adv}(\mathcal{A}) = \Pr\big[f_{\mathrm{adv}}(x) = 1 \mid x \in \mathcal{D}\big] - \Pr\big[f_{\mathrm{adv}}(x) = 1 \mid x \notin \mathcal{D}\big].$$
High advantage values indicate strong membership leakage. MIAs typically exploit overfitting-related artifacts, confidence score distributions, or gradient statistics.
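To make the advantage metric concrete, the following minimal Python sketch estimates $\mathrm{Adv}(\mathcal{A})$ from the binary decisions of an attack classifier evaluated on known members and non-members; the function name and the synthetic inputs are illustrative assumptions rather than part of the paper's implementation.

```python
import torch

def membership_advantage(attack_preds_members: torch.Tensor,
                         attack_preds_nonmembers: torch.Tensor) -> float:
    """Estimate Adv(A) = Pr[f_adv(x)=1 | member] - Pr[f_adv(x)=1 | non-member].

    Both inputs are 0/1 tensors of attack decisions on known members and
    known non-members (e.g., from a held-out evaluation split).
    """
    tpr = attack_preds_members.float().mean().item()      # Pr[f_adv = 1 | member]
    fpr = attack_preds_nonmembers.float().mean().item()   # Pr[f_adv = 1 | non-member]
    return tpr - fpr

# Example: a completely uninformative attacker has advantage close to 0.
members = torch.randint(0, 2, (1000,))
nonmembers = torch.randint(0, 2, (1000,))
print(f"Adv(A) ~ {membership_advantage(members, nonmembers):.3f}")
```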

3.2. Diffusion Models

Diffusion models [26,27] are a class of generative models that learn to reverse a gradual noising process applied to data. Consider a data sample $x_0 \sim q(x_0)$ from the true data distribution. A forward diffusion process progressively perturbs $x_0$ through $T$ steps:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right), \quad t = 1, \ldots, T,$$
where $\{\beta_t\}_{t=1}^{T}$ is a predefined noise schedule. As $t \to T$, $x_T$ approaches an isotropic Gaussian distribution. The generative model learns a reverse process:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right),$$
which is parameterized by a neural network $\epsilon_\theta$ predicting the noise component.
In the standard setting, the model is trained to minimize the denoising score-matching objective:
$$\mathcal{L}_{\mathrm{diff}}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\right) \right\|_2^2\right],$$
where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.
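The closed-form corruption $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ implied by the forward process, together with the training objective above, can be sketched in a few lines of PyTorch. The linear $\beta$ schedule endpoints and the `eps_model(x_t, t)` interface are assumptions made for illustration, not the authors' released code.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)            # assumed linear schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)        # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward_sample(x0: torch.Tensor, t: int):
    """Draw x_t ~ q(x_t | x_0) in closed form and return the noise used."""
    eps = torch.randn_like(x0)
    ab = alpha_bars[t - 1]                        # timesteps are 1-indexed
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps, eps

def diffusion_loss(eps_model, x0: torch.Tensor) -> torch.Tensor:
    """One-sample estimate of the denoising score-matching objective L_diff."""
    t = int(torch.randint(1, T + 1, (1,)))
    x_t, eps = forward_sample(x0, t)
    eps_hat = eps_model(x_t, torch.tensor([t]))   # network predicts the added noise
    return ((eps - eps_hat) ** 2).mean()
```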

3.3. Diffusion for Privacy Preservation

Our proposed D3P framework exploits the controllable denoising property of diffusion models to generate perturbed data $\tilde{x}$ that retain essential semantics for downstream learning but mask the subtle patterns MIAs depend on. Unlike traditional data augmentation, D3P partially halts the reverse diffusion process at an intermediate timestep $t^* < T$, thereby outputting a partially denoised sample:
$$\tilde{x} = \mathrm{ReverseDiffusion}(x_T, t^*),$$
where $t^*$ is selected based on a privacy–utility optimization criterion.

4. Methodology

In this section, we present the proposed Diffusion-Driven Data Preprocessing (D3P) framework in detail. We begin by outlining the design philosophy behind using diffusion models for privacy preservation, followed by a formal description of the algorithm, and finally, we present our privacy–utility optimization formulation. The central idea is to exploit the generative denoising process to produce preprocessed samples that mask membership-specific patterns while retaining sufficient semantic fidelity for effective model training.

4.1. Design

Membership Inference Attacks (MIAs) exploit the phenomenon where a model exhibits different behavior for samples it has seen during training versus unseen samples. This discrepancy often arises due to overfitting, distributional memorization of training points, or overly confident predictions for known samples. An attacker, by observing model outputs such as confidence scores or logits, can detect such differences and infer the presence of specific data points in the training set. In sensitive domains such as healthcare or criminal justice, this can directly expose private information—for example, revealing that an individual’s medical record was used to train a diagnostic model could implicitly disclose their medical condition.
Existing defense strategies largely fall into three categories: (i) model-centric approaches, such as adversarial regularization or confidence score smoothing, which aim to reduce the discrepancy in model responses between training and non-training data; (ii) noise-based approaches, such as differential privacy (DP), which inject carefully calibrated randomness into the training process to bound the influence of individual samples; (iii) data-centric approaches, which modify the training data itself before learning to weaken the statistical link between individual records and the final trained model. While model-centric methods may require significant architectural changes or retraining from scratch, and DP methods often degrade accuracy significantly in high-dimensional settings, data-centric strategies hold the promise of being a lightweight preprocessing stage that can be applied universally without disrupting downstream workflows.
However, naive data perturbations—such as adding Gaussian noise directly to the raw inputs or applying random masking—are often insufficient. Excessive noise destroys task-relevant semantic content, leading to substantial utility loss, while insufficient noise fails to prevent MIAs. More sophisticated approaches, such as synthetic data generation via GANs or VAEs, have shown promise but suffer from instability in training (e.g., mode collapse) and lack fine-grained control over the degree of obfuscation. Crucially, these methods generally do not offer a mathematically principled way to trade off privacy protection against predictive performance.
The core design insight behind our proposed Diffusion-Driven Data Preprocessing (D3P) is that diffusion models naturally provide a continuous control mechanism over the corruption and reconstruction of data. A diffusion model gradually injects Gaussian noise into the data in a forward process, then learns a reverse process that denoises step-by-step. By halting the reverse process before complete reconstruction, we can generate partially denoised data that retain essential macro-level structures (e.g., class-relevant patterns) but are stripped of fine-grained, high-frequency features that could act as membership identifiers. The stopping point in this reverse process effectively becomes a privacy dial: earlier stops yield higher privacy but lower fidelity, while later stops preserve fidelity but may leave more membership signals intact.
This design has several desirable properties:
  • Modularity: Since D3P operates as a preprocessing step, it can be combined with any learning algorithm or architecture without modification.
  • Tunability: The degree of privacy protection can be finely adjusted by selecting the optimal reverse diffusion stopping point $t^*$, determined through a formal privacy–utility optimization.
  • Stability: Unlike GANs, diffusion models are known for stable training and high fidelity, allowing for consistent and reproducible preprocessing.
  • Theoretical grounding: The noise injection process is governed by well-defined probabilistic transitions, enabling us to analyze and bound the adversary’s advantage.
By leveraging these properties, D3P aims to bridge the gap between practical deployability and formal privacy guarantees, producing data that are resilient to MIAs while ensuring that the resulting models maintain high predictive performance. In the following subsections, we formalize the forward and reverse processes in our framework, then detail the optimization procedure that selects the privacy-fidelity trade-off. Algorithm 1 gives a general pipeline of our proposal.

4.2. Forward and Reverse Processes in D3P

The foundation of the Diffusion-Driven Data Preprocessing (D3P) framework lies in the controlled application of the forward and reverse processes of a diffusion model. Diffusion models [26,27] are generative frameworks that learn to reverse a Markovian noising process, enabling them to produce high-fidelity synthetic data from random noise. In our case, the generative capacity is not used for complete reconstruction, but rather for generating partially denoised samples with privacy-preserving properties.
Forward Process: Let $x_0 \in \mathbb{R}^d$ represent the original input data sample. In the forward process, Gaussian noise is progressively added to $x_0$ over $T$ discrete timesteps. The process is defined as follows:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right), \quad t = 1, \ldots, T,$$
where $\{\beta_t\}_{t=1}^{T}$ is the variance schedule controlling the rate at which noise is injected. A common choice is to increase $\beta_t$ gradually, ensuring that early steps introduce only mild perturbations while later steps produce heavily corrupted samples. This process admits the closed-form sampling:
$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right),$$
with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. As $t \to T$, the sample $x_T$ approaches an isotropic Gaussian distribution, losing all original semantic content.
Reverse Process: A diffusion model learns the reverse transitions:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right),$$
where $\mu_\theta$ and $\Sigma_\theta$ are functions parameterized by a neural network $\epsilon_\theta$, trained to predict the noise component $\epsilon$ added in the forward process. In a standard generative setting, the reverse process starts from $x_T \sim \mathcal{N}(0, I)$ and iteratively denoises until $t = 0$, producing a fully reconstructed sample $\hat{x}_0$. In D3P, the reverse process is deliberately truncated at a carefully chosen intermediate step $t^* \in [1, T]$:
$$\tilde{x} = x_{t^*}.$$
Instead of reconstructing all the way to $x_0$, we halt early to produce a sample that is only partially denoised. This intermediate sample retains coarse semantic structures (e.g., shapes, textures, class-related patterns) but omits high-frequency details and precise feature correlations that could encode membership information exploitable by MIAs.
Controlled Obfuscation: The privacy effect of this truncation can be interpreted through an information-theoretic lens: earlier stopping points $t^*$ result in lower mutual information $I(\tilde{x}; x_0)$ between the preprocessed sample $\tilde{x}$ and the original data $x_0$. While excessive truncation (small $t^*$) yields high privacy by severely reducing this mutual information, it also risks eroding task-relevant features, leading to degraded model performance. Conversely, larger $t^*$ values preserve more information but risk higher vulnerability to MIAs. The choice of $t^*$ is therefore critical and is determined by the privacy–utility optimization described in the next subsection.
By embedding this tunable forward–reverse mechanism within a privacy-aware optimization framework, D3P gains the ability to systematically control the degree of obfuscation and thus the strength of MIA defenses, while minimizing utility degradation.
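A minimal sketch of the truncated reverse pass is given below, using the standard DDPM ancestral-sampling step with variance $\beta_t$; the `eps_model` interface, the $\sigma_t^2 = \beta_t$ choice, and the decision to zero the noise at the final retained step are illustrative assumptions rather than a verbatim reproduction of the authors' implementation.

```python
import torch

@torch.no_grad()
def truncated_reverse(eps_model, x_T: torch.Tensor, t_star: int,
                      betas: torch.Tensor) -> torch.Tensor:
    """Run the learned reverse chain from x_T down to x_{t*} and return the
    partially denoised sample (DDPM-style ancestral sampling, halted early)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    T = betas.shape[0]
    x_t = x_T
    for t in range(T, t_star, -1):                      # each iteration produces x_{t-1}
        a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
        eps_hat = eps_model(x_t, torch.tensor([t]))     # predicted forward-process noise
        mean = (x_t - (1.0 - a_t) / (1.0 - ab_t).sqrt() * eps_hat) / a_t.sqrt()
        # Zero the noise on the final retained step (an illustrative choice).
        noise = torch.randn_like(x_t) if t > t_star + 1 else torch.zeros_like(x_t)
        x_t = mean + betas[t - 1].sqrt() * noise        # sample x_{t-1} ~ p_theta(.|x_t)
    return x_t                                          # this is x_{t*}
```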

4.3. Privacy–Utility Optimization

The central challenge in designing D3P is determining the optimal truncation point t * in the reverse diffusion process that simultaneously maximizes privacy protection against Membership Inference Attacks (MIAs) and preserves high predictive performance for downstream tasks. This is inherently a bi-objective optimization problem in which privacy and utility are competing objectives. Our approach introduces a mathematically principled method to quantify both and select t * adaptively.
Privacy Metric: We define privacy in terms of the adversary's membership advantage $\mathrm{Adv}(\mathcal{A})$, which is the difference in the probability of correctly predicting membership for training versus non-training samples:
$$\mathrm{Adv}(\mathcal{A}) = \Pr\big[f_{\mathrm{adv}}(x) = 1 \mid x \in \mathcal{D}\big] - \Pr\big[f_{\mathrm{adv}}(x) = 1 \mid x \notin \mathcal{D}\big].$$
In practice, $\mathrm{Adv}(\mathcal{A})$ is estimated using a surrogate attacker trained on an auxiliary dataset following the same distribution. A lower $\mathrm{Adv}(\mathcal{A})$ implies stronger privacy. To express privacy as a maximization objective, we define the privacy score:
$$P(t^*) = 1 - \mathrm{Adv}\big(\mathcal{A}_{\mathrm{surrogate}}(t^*)\big),$$
where $\mathcal{A}_{\mathrm{surrogate}}(t^*)$ is the attack model evaluated on data preprocessed with truncation at $t^*$.
Utility Metric: We measure utility by the predictive accuracy of the downstream model $M_\theta$ trained on the preprocessed dataset $\tilde{\mathcal{D}}_{t^*}$:
$$U(t^*) = \frac{1}{|\mathcal{D}_{\mathrm{test}}|} \sum_{(x, y) \in \mathcal{D}_{\mathrm{test}}} \mathbb{1}\big[ M_\theta(\tilde{x}) = y \big],$$
where $\tilde{x}$ is the preprocessed version of $x$ obtained by halting the reverse diffusion process at $t^*$. This definition is general and can accommodate alternative utility metrics such as F1-score or AUC, depending on the task.
Trade-off Objective: We combine the two metrics into a single scalar objective:
$$J(t^*) = \lambda \cdot P(t^*) + (1 - \lambda) \cdot U(t^*),$$
where $\lambda \in [0, 1]$ controls the relative importance of privacy and utility. For privacy-critical applications (e.g., medical data), $\lambda$ can be set closer to 1, while for performance-sensitive applications with moderate privacy concerns, smaller values of $\lambda$ may be preferable.
Optimization Procedure: To identify the optimal truncation step $t^*_{\mathrm{opt}}$, we evaluate $J(t^*)$ for a discrete set of candidate values $t^* \in \{t_{\min}, \ldots, t_{\max}\}$:
$$t^*_{\mathrm{opt}} = \arg\max_{t^*} J(t^*).$$
The search space is generally small because $t^*$ is bounded between 1 and $T$, and in practice we often subsample candidate points (e.g., every $k$ steps) to reduce computational overhead. The evaluation of $P(t^*)$ requires running a surrogate MIA, while $U(t^*)$ is obtained via a quick training–validation cycle on the preprocessed data.
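The grid search over candidate truncations can be written compactly; `evaluate_privacy` and `evaluate_utility` below are hypothetical callbacks standing in for the surrogate MIA and the quick training–validation cycle, respectively.

```python
def select_t_opt(candidates, evaluate_privacy, evaluate_utility, lam=0.5):
    """Return t*_opt = argmax_t J(t), with J(t) = lam * P(t) + (1 - lam) * U(t)."""
    scores = {}
    for t in candidates:
        P_t = evaluate_privacy(t)    # privacy score 1 - Adv(A_surrogate(t)), in [0, 1]
        U_t = evaluate_utility(t)    # validation accuracy on preprocessed data, in [0, 1]
        scores[t] = lam * P_t + (1.0 - lam) * U_t
    t_opt = max(scores, key=scores.get)
    return t_opt, scores

# Example call with the candidate set used later in the experiments:
# t_opt, scores = select_t_opt([50, 100, 150, 200, 300, 400], eval_P, eval_U, lam=0.5)
```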
Information-Theoretic Perspective: From an information-theoretic standpoint, the privacy–utility trade-off in D3P can be viewed as a balance between minimizing the mutual information $I(\tilde{x}; x_0)$ to protect privacy and maximizing $I(\tilde{x}; y)$ to preserve utility. Smaller $t^*$ values tend to reduce both $I(\tilde{x}; x_0)$ and $I(\tilde{x}; y)$, whereas larger values increase them. The optimal $t^*$ corresponds to the point where the decrease in $I(\tilde{x}; y)$ is tolerable given the required drop in $I(\tilde{x}; x_0)$ for privacy protection.
This formalized optimization ensures that D3P is not a heuristic defense but rather a tunable, quantitatively guided preprocessing strategy. In the following subsection, we describe the complete algorithmic workflow that implements this optimization in practice.

4.4. Algorithm

We now formalize the complete workflow of the Diffusion-Driven Data Preprocessing (D3P) framework, integrating the forward–reverse diffusion mechanism and the privacy–utility optimization process described earlier. The algorithm operates in two primary phases: (1) model preparation, in which a diffusion model is trained to learn the reverse process; (2) privacy-aware preprocessing, in which the optimal truncation point $t^*_{\mathrm{opt}}$ is determined and used to generate privacy-preserving data.
Phase 1: Model Preparation. In the first phase, we train a noise-predicting neural network $\epsilon_\theta$ using the standard denoising score-matching objective:
$$\mathcal{L}_{\mathrm{diff}}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\right) \right\|_2^2\right],$$
where $\epsilon \sim \mathcal{N}(0, I)$ is sampled noise, $\alpha_t = 1 - \beta_t$ is the noise retention coefficient at step $t$, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. The trained $\epsilon_\theta$ enables efficient sampling from any intermediate step $t$ in the reverse process.
Phase 2: Privacy-Aware Preprocessing. Once the diffusion model is trained, we evaluate the privacy–utility trade-off across a set of candidate truncation points $\{t_1, t_2, \ldots, t_K\}$. For each candidate $t$, we perform the following:
  • Forward corruption: Apply the forward diffusion process to $x_0$ until $x_T$ is reached.
  • Partial reconstruction: Apply the learned reverse process from $x_T$ down to $x_t$ to obtain $\tilde{x}_t$.
  • Privacy evaluation: Use a surrogate membership inference attacker to estimate $P(t)$.
  • Utility evaluation: Train a downstream model $M_\theta$ on $\{\tilde{x}_t\}$ and compute $U(t)$ on the validation set.
  • Objective computation: Evaluate $J(t) = \lambda P(t) + (1 - \lambda) U(t)$.
After evaluating all candidates, we select the following:
$$t^*_{\mathrm{opt}} = \arg\max_{t} J(t).$$
Final Preprocessing: With $t^*_{\mathrm{opt}}$ determined, we preprocess the entire training dataset $\mathcal{D}$ by applying the forward process to obtain $x_T$ and the reverse process down to $t^*_{\mathrm{opt}}$, yielding $\tilde{\mathcal{D}}$. This preprocessed dataset is then used to train the final downstream model, which is expected to exhibit significantly reduced vulnerability to MIAs.
Algorithm 1: Diffusion-Driven Data Preprocessing (D3P)
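Since the published pseudocode of Algorithm 1 is rendered as a figure, the Python sketch below reconstructs the workflow described in this section under stated assumptions; `train_diffusion`, `forward_corrupt`, `truncated_reverse`, `run_surrogate_mia`, and `train_and_validate` are hypothetical helper names corresponding to the steps above, not functions from the authors' code.

```python
def d3p_pipeline(train_data, eval_subset, candidates, lam, betas):
    """Illustrative end-to-end D3P workflow (Phase 1, Phase 2, final preprocessing).

    train_data / eval_subset: lists of feature tensors; labels are handled
    inside the hypothetical helpers for brevity.
    """
    # Phase 1: fit the noise-prediction network eps_theta on the raw data.
    eps_model = train_diffusion(train_data, betas)

    # Phase 2: evaluate J(t) = lam * P(t) + (1 - lam) * U(t) on a small subset.
    best_t, best_J = None, float("-inf")
    for t in candidates:
        noisy = [forward_corrupt(x, betas) for x in eval_subset]            # x -> x_T
        tilde = [truncated_reverse(eps_model, x_T, t, betas) for x_T in noisy]
        P_t = 1.0 - run_surrogate_mia(tilde, eval_subset)                   # privacy score
        U_t = train_and_validate(tilde)                                     # utility score
        J_t = lam * P_t + (1.0 - lam) * U_t
        if J_t > best_J:
            best_t, best_J = t, J_t

    # Final preprocessing: transform the full dataset with the selected t*_opt.
    d3p_data = [truncated_reverse(eps_model, forward_corrupt(x, betas), best_t, betas)
                for x in train_data]
    return d3p_data, best_t
```

The downstream classifier is then trained on the preprocessed dataset exactly as it would be on the raw data, which is what keeps the defense model-agnostic.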

4.5. Complexity Analysis

The computational complexity of the proposed Diffusion-Driven Data Preprocessing (D3P) framework arises from three primary components: (1) training the diffusion model; (2) evaluating candidate truncation steps for privacy–utility optimization; (3) applying the preprocessing to the entire dataset using the selected optimal truncation $t^*_{\mathrm{opt}}$.

4.5.1. Diffusion Model Training

Training the noise-predicting network $\epsilon_\theta$ involves iterating over the dataset $\mathcal{D}$ for multiple epochs, sampling random diffusion steps $t$ and noise vectors $\epsilon$ for each data point. Let $E_{\mathrm{train}}$ be the number of training epochs and $C_{\mathrm{train}}$ denote the cost of one forward–backward pass through $\epsilon_\theta$. The training complexity is as follows:
$$\mathcal{O}\big(E_{\mathrm{train}} \cdot |\mathcal{D}| \cdot C_{\mathrm{train}}\big).$$
This phase is a one-time cost and can be amortized across multiple preprocessing runs, which is beneficial in scenarios where the same trained diffusion model is reused with different privacy–utility trade-offs.

4.5.2. Candidate Evaluation

During the optimization phase, for each candidate truncation step $t$ in the set $\{t_1, \ldots, t_K\}$, we must perform:
  • A forward diffusion pass from $x_0$ to $x_T$.
  • A reverse diffusion pass from $x_T$ down to $x_t$.
  • Privacy evaluation using a surrogate MIA.
  • Utility evaluation using a downstream model.
The diffusion passes dominate the cost, as both forward and reverse processes require $t$ steps, each involving neural network inference. Let $C_{\mathrm{diff}}$ be the cost per reverse diffusion step, which typically scales linearly with the dimensionality of $x_t$ and the size of $\epsilon_\theta$. The complexity for evaluating one candidate $t$ is thus
$$\mathcal{O}\big(|\mathcal{D}| \cdot t \cdot C_{\mathrm{diff}} + C_{\mathrm{MIA}} + C_{\mathrm{util}}\big),$$
where $C_{\mathrm{MIA}}$ is the cost of running the surrogate attack, and $C_{\mathrm{util}}$ is the cost of training and evaluating the downstream model for utility measurement. In practice, $C_{\mathrm{MIA}}$ and $C_{\mathrm{util}}$ can be reduced via sampling smaller subsets of $\mathcal{D}$ for candidate evaluation. Once $t^*_{\mathrm{opt}}$ is chosen, preprocessing the entire dataset requires a single forward diffusion pass to $x_T$ and a reverse pass down to $x_{t^*_{\mathrm{opt}}}$. This yields the complexity
$$\mathcal{O}\big(|\mathcal{D}| \cdot t^*_{\mathrm{opt}} \cdot C_{\mathrm{diff}}\big).$$
Since $t^*_{\mathrm{opt}} \ll T$ in practice (often by a factor of 4–10), this step is considerably more efficient than fully generating samples from Gaussian noise. This efficiency makes D3P suitable for large-scale datasets, including high-resolution images and high-dimensional tabular data.

4.5.3. Parallelization and Deployment Considerations

Both candidate evaluation and final preprocessing can be parallelized across multiple GPUs or compute nodes, as each data point's forward and reverse passes are independent. Moreover, since the diffusion model is fixed after training, preprocessing can be integrated into distributed data pipelines (e.g., in cloud or federated settings) without requiring synchronization beyond parameter sharing of $\epsilon_\theta$.
In summary, the total runtime complexity of D3P can be expressed as follows:
$$\mathcal{O}\big(E_{\mathrm{train}} \cdot |\mathcal{D}| \cdot C_{\mathrm{train}}\big) + \mathcal{O}\big(K \cdot |\mathcal{D}_{\mathrm{sub}}| \cdot t_{\mathrm{avg}} \cdot C_{\mathrm{diff}}\big) + \mathcal{O}\big(|\mathcal{D}| \cdot t^*_{\mathrm{opt}} \cdot C_{\mathrm{diff}}\big),$$
where $|\mathcal{D}_{\mathrm{sub}}| \ll |\mathcal{D}|$ is the subset size used for candidate evaluation and $t_{\mathrm{avg}}$ is the mean truncation length among candidates. The dominating term during deployment is the last one, which is efficiently handled when $t^*_{\mathrm{opt}}$ is relatively small compared to $T$.
This detailed complexity characterization confirms that D3P is both computationally feasible and scalable, while retaining the flexibility to adjust privacy and utility according to application requirements. In the next section, we empirically evaluate the performance of D3P across diverse datasets and attack settings.

4.6. Comparison Between D3P and DP-SGD

Both Differentially Private Stochastic Gradient Descent (DP-SGD) and the proposed D3P framework aim to mitigate membership inference risk, but they are fundamentally different in their theoretical underpinnings.
Formally, a randomized mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-differential privacy if, for any two neighboring datasets $D$ and $D'$ differing by at most one element, and for any measurable subset $S$ of outputs, the following holds:
$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S] + \delta.$$
In DP-SGD, the mechanism $\mathcal{M}$ is the training algorithm, where per-sample gradients $g_i$ are clipped to a norm bound $C$ and perturbed by Gaussian noise $\mathcal{N}(0, \sigma^2 I)$. This yields privacy guarantees through the composition of noisy gradient updates across training iterations. However, the cumulative noise scales with the dimensionality of the parameter space, often degrading the model's predictive performance in practice.
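For concreteness, the following conceptual PyTorch sketch shows one DP-SGD update of the kind described above, with per-sample clipping to norm $C$ and Gaussian noise of scale $\sigma C$; the explicit per-sample loop and the parameter names are simplifications for illustration (a vetted DP library would be used in practice), and the snippet omits privacy accounting.

```python
import torch

def dpsgd_step(model, loss_fn, batch, lr=0.1, clip_norm=1.0, noise_mult=1.2):
    """One conceptual DP-SGD update: clip each per-sample gradient, sum, add noise."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in batch:                                     # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))
        for s, g in zip(summed, grads):
            s.add_(g * scale)                              # clipped contribution
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            noise = torch.randn_like(s) * noise_mult * clip_norm
            p.add_(-(lr / len(batch)) * (s + noise))       # noisy averaged update
    return model
```

By contrast, D3P leaves this training loop untouched: its only intervention is the one-time data transformation introduced next.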
By contrast, D3P enforces privacy at the data level by transforming samples $x \in \mathcal{X}$ into $\tilde{x} \in \tilde{\mathcal{X}}$ via a truncated reverse diffusion process:
$$\tilde{x} = \mathcal{D}_\theta(x, t), \quad t \in [t_{\min}, t_{\max}],$$
where $\mathcal{D}_\theta$ denotes the learned denoising operator parameterized by the diffusion model and $t$ represents the diffusion step. Intuitively, larger $t$ values induce stronger perturbations, suppressing membership-specific noise while retaining semantic features.
Rather than bounding sensitivity through $(\varepsilon, \delta)$ guarantees, D3P reduces the mutual information between the original and preprocessed data:
$$I(X; \tilde{X}) = \mathbb{E}_{p(x, \tilde{x})}\left[\log \frac{p(x, \tilde{x})}{p(x)\, p(\tilde{x})}\right],$$
where $I(X; \tilde{X})$ quantifies how much information about membership in $X$ is retained in $\tilde{X}$. By design, the diffusion process is calibrated to minimize this mutual information while preserving task-relevant predictive features.
Formally, let $\mathcal{A}$ denote an optimal membership inference adversary. If the preprocessing mechanism satisfies $I(X; \tilde{X}) \le \eta$, then the advantage of $\mathcal{A}$ in distinguishing members from non-members is bounded as follows:
$$\mathrm{Adv}_{\mathcal{A}} \le \sqrt{2\eta}.$$
This implies that reducing mutual information through D3P directly lowers the upper bound on the success probability of any membership inference attack.
Thus, the essential difference lies in the locus of privacy enforcement: DP-SGD perturbs the learning dynamics at each optimization step to achieve $(\varepsilon, \delta)$-DP, whereas D3P reshapes the data distribution itself to reduce leakage channels. Importantly, this decoupling allows D3P to remain model-agnostic and compatible with arbitrary downstream training procedures without incurring gradient-level noise.

4.7. Why Diffusion Removes Membership Signals

Consider a data distribution $X \sim P_X$ where each sample $x$ can be decomposed into a semantic component $s \in \mathcal{S}$ and a residual component $\epsilon \in \mathcal{E}$, i.e., $x = s + \epsilon$. Here $s$ corresponds to low-frequency, task-relevant features that generalize across samples, while $\epsilon$ encodes high-frequency, sample-specific fluctuations that are disproportionately correlated with membership. Membership inference attacks rely on the distinguishability of $\epsilon$ across training and non-training distributions.
The forward diffusion process is defined by the Markov chain
$$x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, z, \quad z \sim \mathcal{N}(0, I),$$
where $\{\bar{\alpha}_t\}_{t=1}^{T}$ is a monotonically decreasing schedule. Writing $x = s + \epsilon$, one obtains
$$x_t = \sqrt{\bar{\alpha}_t}\, s + \sqrt{\bar{\alpha}_t}\, \epsilon + \sqrt{1 - \bar{\alpha}_t}\, z.$$
As $t \to T$, the stochastic term $\sqrt{1 - \bar{\alpha}_t}\, z$ dominates, thereby asymptotically annihilating the information content of $\epsilon$. Formally, if $I(A; B)$ denotes mutual information, then one may show
$$I(\epsilon; x_t) \le \frac{\bar{\alpha}_t}{2\sigma^2}\, \mathbb{E}\|\epsilon\|^2,$$
where $\sigma^2$ is the variance of the Gaussian corruption. Thus $I(\epsilon; x_t) \to 0$ geometrically as $\bar{\alpha}_t \to 0$, eliminating the statistical signal exploitable by an adversary.
From a frequency-domain perspective, let $\mathcal{F}$ denote the Fourier transform. The spectral density of the residual satisfies $P_\epsilon(\omega) = \mathbb{E}\big[|\mathcal{F}(\epsilon)(\omega)|^2\big]$. Under the forward diffusion operator $D_t$, the residual evolves as
$$\mathcal{F}(\epsilon_t)(\omega) = \sqrt{\bar{\alpha}_t}\, \mathcal{F}(\epsilon)(\omega) + \sqrt{1 - \bar{\alpha}_t}\, \mathcal{F}(z)(\omega).$$
Consequently, the expected high-frequency energy decays exponentially:
$$\mathbb{E}\,\big\| \mathcal{F}(\epsilon_t) \big\|_{H}^{2} \le \bar{\alpha}_t\, \big\| \mathcal{F}(\epsilon) \big\|_{H}^{2} + (1 - \bar{\alpha}_t)\, \sigma^2,$$
where $\|\cdot\|_{H}^{2}$ restricts the norm to a high-frequency subspace $H \subset \mathbb{R}^d$. Since membership-specific features are concentrated in $H$, their effective signal-to-noise ratio vanishes with $t$.
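The decay of high-frequency residual energy can also be checked numerically with a small experiment; the split of a signal into a low-pass semantic part and a high-pass residual, the band cutoff, and the schedule endpoints below are all illustrative choices rather than quantities fixed by the paper.

```python
import torch

torch.manual_seed(0)
d, T = 256, 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

x = torch.randn(d)
n_freq = torch.fft.rfft(x).shape[0]
high_mask = torch.zeros(n_freq)
high_mask[n_freq // 4:] = 1.0                       # illustrative high-frequency band H

def hf_energy(v: torch.Tensor) -> float:
    """Energy of v restricted to the high-frequency band H."""
    return (torch.fft.rfft(v) * high_mask).abs().pow(2).sum().item()

# High-pass residual component epsilon of the signal.
eps_resid = torch.fft.irfft(torch.fft.rfft(x) * high_mask, n=d)

for t in (1, 100, 400, 800):
    ab = alpha_bars[t - 1]
    eps_t = ab.sqrt() * eps_resid + (1.0 - ab).sqrt() * torch.randn(d)
    share = ab.item() * hf_energy(eps_resid) / hf_energy(eps_t)
    print(f"t={t:4d}  residual share of high-frequency energy ~ {share:.3f}")
```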
The reverse process $R_\theta$ approximates the posterior $p(x_{t-1} \mid x_t)$ and reconstructs $\tilde{x} = R_\theta(x_t)$. Because the learned score network is trained to minimize a denoising score-matching objective, it preferentially reconstructs semantic modes $s \in \mathcal{S}$ while attenuating idiosyncratic, non-manifold components $\epsilon$. The resulting mapping can be expressed as a projection
$$\tilde{x} = \Pi_{\mathcal{M}}(s + \epsilon) + \xi, \quad \xi \sim \mathcal{N}(0, \sigma^2 I),$$
where $\Pi_{\mathcal{M}}$ projects onto the data manifold $\mathcal{M} \subset \mathcal{X}$ learned by the diffusion model. Since $\epsilon \notin \mathcal{M}$, it is effectively discarded in expectation.
Thus, diffusion enforces an implicit low-pass filtering operator both in the frequency domain and in the information-theoretic sense: it annihilates mutual information between residuals and noisy states while retaining manifold-consistent semantics. This dual mechanism explains mathematically why D3P suppresses membership-specific leakage.

5. Experiments

We evaluate the proposed Diffusion-Driven Data Preprocessing (D3P) framework under a general-purpose experimental setup designed to assess both its privacy-preserving capability against Membership Inference Attacks (MIAs) and its utility retention across standard machine learning tasks. This section details the datasets, models, attack configurations, baselines, and the key results. All experiments were conducted on a distributed GPU cluster with NVIDIA A100 nodes, using PyTorch 2.6.0 for both diffusion model training and downstream task learning.

5.1. Settings

5.1.1. Datasets and Preprocessing

We consider three benchmark datasets representing different data modalities:
  • CIFAR-10 (image data): 60,000 color images of size 32 × 32 in 10 classes, split into 50,000 training and 10,000 test samples. The dataset is publicly available at https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 2 August 2025).
  • Purchase-100 (tabular data): 197,324 binary feature vectors with 600 dimensions representing purchase histories, categorized into 100 classes. Following prior work on membership inference [2], the Purchase-100 dataset is available at https://github.com/privacytrustlab/datasets (accessed on 2 August 2025).
  • Texas-100 (tabular healthcare data): 67,330 patient records with 6170 binary features, representing hospital discharge data labeled by procedure codes. The original Texas hospital discharge dataset is maintained by the Texas Department of State Health Services at https://www.dshs.texas.gov/ (accessed on 2 August 2025). A commonly used pre-processed version for machine learning research is also hosted at https://github.com/privacytrustlab/datasets (accessed on 2 August 2025).
We selected CIFAR-10, Purchase-100, and Texas-100 as benchmark datasets to demonstrate both the effectiveness and the generality of our proposed defense. CIFAR-10 is one of the most widely used image classification datasets and serves as a standard benchmark in privacy-preserving machine learning research, enabling comparability with prior work. Purchase-100 contains customer purchase records and is commonly employed in membership inference studies because of its high dimensionality and strong privacy sensitivity, which make it a representative case of commercial data. Texas-100 consists of hospital discharge records and is frequently used as a benchmark for healthcare-related privacy evaluation, where data confidentiality is critical. Together, these datasets span multiple modalities (image and tabular), cover both synthetic benchmarks and real-world sensitive domains, and provide a balanced and representative evaluation setting for assessing the robustness of D3P against membership inference attacks. For all datasets, we normalize the features to $[0, 1]$ and, for tabular datasets, apply principal component analysis (PCA) to retain 95% of the variance before training the diffusion model. This reduces computational load while preserving key data characteristics.

5.1.2. Experimental Setup

The D3P framework is instantiated with a $T = 1000$ step cosine noise schedule for the forward diffusion process. The reverse process is parameterized by a U-Net architecture for images and a transformer-based denoiser for tabular data. The trade-off parameter $\lambda$ in the privacy–utility optimization is set to 0.5 unless otherwise specified. Candidate truncations $t^*$ are selected from the set $\{50, 100, 150, 200, 300, 400\}$.
The downstream task models are as follows:
  • Image: WideResNet-28-10 trained with SGD, batch size 128, initial learning rate 0.1 , cosine annealing schedule.
  • Tabular: Two-layer ReLU MLP (512-256 hidden units) with Adam optimizer, learning rate 0.001 , batch size 256.
The MIA setup follows the shadow model paradigm [2], where the attacker trains shadow models to mimic the target model’s behavior, then learns a binary classifier to distinguish between members and non-members based on softmax confidence vectors.

5.1.3. Baselines

We compare D3P against the following defense strategies:
  • No Defense: Standard training without any privacy mechanism.
  • Gaussian Noise (GN): Adding i.i.d. Gaussian noise with variance $\sigma^2 = 0.05$ to the training data.
  • Differential Privacy-SGD (DP-SGD) [11]: Clipping gradients to norm 1.0 and adding Gaussian noise to achieve $(\varepsilon = 8, \delta = 10^{-5})$-DP.
  • Adversarial Regularization (AR) [3]: Augmenting the loss function with an adversarial term penalizing membership leakage.
For the diffusion-based preprocessing, we use a forward diffusion step count of $T = 1000$ with a linear noise schedule ranging from $\beta_1 = 1 \times 10^{-4}$ to $\beta_T = 2 \times 10^{-2}$. The denoising model is a U-Net backbone with channel sizes $\{64, 128, 256, 512\}$, group normalization layers, and a dropout rate of 0.1. Training employs the Adam optimizer with learning rate $1 \times 10^{-4}$, weight decay $1 \times 10^{-5}$, batch size 128, and gradient clipping at norm 1.0. For reconstruction, we adopt an $\ell_2$ loss with coefficient $\lambda = 0.5$ (the default setting unless otherwise specified) and an early-stopping patience of 20 epochs. In the DP-SGD baseline, we set clipping norm 1.0, noise multiplier 1.2, and privacy budget $(\varepsilon = 8, \delta = 10^{-5})$ following prior standards. For adversarial regularization (AR), the adversarial strength is tuned from $\{0.05, 0.1, 0.2\}$ with learning rate $5 \times 10^{-4}$. Across all baselines, we perform a grid search over learning rates $\{10^{-3}, 10^{-4}, 5 \times 10^{-5}\}$, batch sizes $\{64, 128, 256\}$, and dropout rates $\{0.0, 0.1, 0.2\}$ to ensure fairness. Model training is executed on NVIDIA A100 GPUs with mixed-precision acceleration enabled. These hyperparameters are reported in detail in the appendix to ensure that our experiments can be reliably reproduced.
In our experiments, all baseline defenses (GN, AR, and DP-SGD) were tuned using grid search over key hyperparameters to ensure that each defense was evaluated under comparable utility conditions. Specifically, learning rates were searched over $\{10^{-3}, 10^{-4}, 5 \times 10^{-5}\}$, batch sizes over $\{64, 128, 256\}$, and dropout rates over $\{0.0, 0.1, 0.2\}$. For DP-SGD, we additionally tuned clipping norms $\{0.5, 1.0, 2.0\}$ and noise multipliers $\{0.8, 1.0, 1.2\}$, while for AR we explored adversarial regularization strengths $\{0.05, 0.1, 0.2\}$. Gaussian Noise (GN) baselines were tested with noise standard deviations $\{0.01, 0.05, 0.1\}$. The chosen configurations correspond to those achieving the best trade-off between predictive performance and privacy protection for each method. Importantly, this protocol ensured that the baseline models achieved comparable utility levels before the membership inference evaluation, thereby avoiding bias in favor of our proposed method.

5.2. Results

This subsection provides consolidated quantitative evidence supporting the claims of strong privacy gains and minimal utility loss achieved by D3P across image (CIFAR-10) and tabular (Purchase-100, Texas-100) domains. We summarize accuracy and MIA success rates in two cross-dataset tables, followed by two figure panels that visualize (i) accuracy drops and (ii) MIA success rates for all defenses. Consistent with earlier sections, results are reported for No Defense, Gaussian Noise (GN), DP-SGD, Adversarial Regularization (AR), and D3P (ours). Figure 1 presents a visual comparison of sample images before and after D3P preprocessing.
The cross-dataset accuracy summary in Table 1 and the accuracy drop visualization in Figure 2 reveal a clear trend: D3P consistently incurs the smallest utility degradation among all privacy defenses tested, while still providing strong membership inference resistance. On CIFAR-10, the accuracy drop relative to the No Defense baseline is only 1.1 percentage points, outperforming Gaussian Noise (GN) and Differential Privacy-SGD (DP-SGD) by margins of 2.5 and 4.2 points, respectively. The pattern is consistent in the tabular datasets: for Purchase-100, D3P shows a 1.9-point drop versus 5.8 for GN and 9.3 for DP-SGD; for Texas-100, the drop is 1.5 points versus 6.2 and 8.9. This supports the claim that the partial denoising strategy in D3P effectively preserves class-discriminative features, avoiding the excessive corruption common in purely noise-based or gradient-based DP mechanisms. Table 2 and Figure 3 demonstrate that D3P achieves the largest absolute reductions in MIA success rates across all datasets, indicating strong privacy preservation. On CIFAR-10, D3P reduces the MIA success rate from 68.4% to 26.1% (a 42.3-point drop), compared to 19.2 for GN, 29.7 for DP-SGD, and 23.6 for Adversarial Regularization (AR). The gains are equally notable on Purchase-100 and Texas-100, where the reductions are 31.3 and 39.6 points, respectively. The cross-modality consistency suggests that the mechanism underlying D3P, truncating the reverse diffusion process at an optimized step $t^*_{\mathrm{opt}}$, generalizes well beyond the image domain, even to sparse, high-dimensional tabular datasets.
An important insight from these results is the dataset-dependent nature of the optimal truncation step $t^*_{\mathrm{opt}}$. For example, experiments reveal $t^*_{\mathrm{opt}} \approx 200$ for CIFAR-10, while for Purchase-100 it is closer to 300. This difference aligns with the intrinsic redundancy of the data: image datasets with spatial correlations can afford earlier truncation without substantial accuracy loss, whereas tabular datasets require more reconstruction steps to maintain feature integrity. The privacy–utility trade-off optimization formalized in Section 4 naturally captures these dynamics, allowing $t^*_{\mathrm{opt}}$ to adapt based on empirical evaluations rather than heuristic selection.
Compared to Gaussian Noise, which applies unstructured perturbations uniformly across all samples, D3P’s structured perturbations are derived from the learned reverse diffusion process, making them both semantically coherent and targeted at obfuscating membership-specific features. Against DP-SGD, D3P avoids the heavy utility cost associated with gradient clipping and noise injection at every training step, instead introducing a one-time preprocessing phase. Relative to AR, D3P’s decoupling of the defense mechanism from the target model training process simplifies integration into diverse deployment scenarios, particularly in federated and cloud-based ML workflows where architectural constraints may prevent adversarial co-training.
Visual inspection of CIFAR-10 samples preprocessed by D3P shows that early reverse diffusion truncation produces images that are visually almost identical to the originals but exhibit slight smoothing and the removal of fine-grained textures. These high-frequency components are precisely the kind of statistical signals that MIAs tend to exploit, meaning their removal directly contributes to the observed reduction in attack success. Crucially, the preserved coarse structures ensure that the downstream classifiers still perform well, aligning with the minimal accuracy drops reported in Table 1.

5.3. Effect of Noise Schedule

To further evaluate the versatility and robustness of the D3P framework, we conduct additional experiments under varying noise schedules, truncation search granularities, and trade-off parameter settings λ . These experiments aim to answer three questions: (1) How sensitive is D3P to the choice of diffusion noise schedule? (2) What is the effect of finer or coarser candidate truncation search spaces? (3) How does the privacy–utility balance shift with different λ values in the optimization?
We compare three standard diffusion variance schedules: Linear, Cosine, and Quadratic. The linear schedule increases $\beta_t$ linearly from $\beta_{\min} = 10^{-4}$ to $\beta_{\max} = 0.02$, the cosine schedule follows the $\cos^2$ formulation from [31], and the quadratic schedule increases noise quadratically with $t$. Results in Table 3 show that while all schedules benefit from D3P's optimization, the cosine schedule offers the best overall privacy–utility balance.
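The three variance schedules can be compared directly through the cumulative products $\bar{\alpha}_t$ they induce; the endpoints below follow the linear settings reported earlier, while the quadratic parameterization (linear in $\sqrt{\beta_t}$) and the cosine offset $s = 0.008$ are common choices assumed here for illustration.

```python
import torch

T = 1000

def linear_betas(T, beta_min=1e-4, beta_max=2e-2):
    return torch.linspace(beta_min, beta_max, T)

def quadratic_betas(T, beta_min=1e-4, beta_max=2e-2):
    return torch.linspace(beta_min ** 0.5, beta_max ** 0.5, T) ** 2

def cosine_alpha_bars(T, s=0.008):
    """cos^2 schedule of [31]; returns alpha_bar_t for t = 1..T directly."""
    steps = torch.arange(T + 1) / T
    f = torch.cos((steps + s) / (1 + s) * torch.pi / 2) ** 2
    return (f[1:] / f[0]).clamp(min=1e-5)

alpha_bars = {
    "linear": torch.cumprod(1 - linear_betas(T), dim=0),
    "quadratic": torch.cumprod(1 - quadratic_betas(T), dim=0),
    "cosine": cosine_alpha_bars(T),
}
for t in (100, 200, 400, 800):
    print(t, {name: round(v[t - 1].item(), 4) for name, v in alpha_bars.items()})
```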
  • Effect of Candidate Search Granularity
We vary the candidate truncation set $\mathcal{T}_{\mathrm{cand}}$ from coarse $\{100, 200, 300, 400\}$ to fine $\{50, 100, 150, \ldots, 400\}$ and observe the impact on optimal trade-off selection. Table 4 shows that a finer search leads to slightly improved privacy without hurting utility, indicating that $t^*_{\mathrm{opt}}$ benefits from more precise localization.
  • Effect of the Trade-off Parameter $\lambda$
We evaluate $\lambda \in \{0.25, 0.5, 0.75\}$ to assess the shift in privacy and utility emphasis. As expected, higher $\lambda$ values favor privacy at the expense of some accuracy, while lower $\lambda$ values prioritize accuracy over privacy. Table 5 quantifies this trade-off.
Notably, in our framework, the parameter λ controls the trade-off between privacy preservation and utility retention in the optimization objective. We set λ = 0.5 as the default for our experiments, as this value provides a neutral baseline that assigns equal weight to both goals, thereby avoiding bias toward either extreme. Importantly, λ is not fixed by the method: larger values emphasize stronger privacy by enforcing more aggressive obfuscation of membership-related signals, while smaller values prioritize predictive accuracy by retaining more task-relevant features. This flexibility allows practitioners to tune λ according to domain-specific requirements, such as prioritizing privacy in healthcare data or accuracy in non-sensitive applications.
From Table 3, we observe that the cosine schedule yields the most favorable results, with both the highest accuracy and the lowest MIA success rate. This can be attributed to the gradual noise decay pattern in the cosine schedule, which preserves more semantic content in the intermediate states used by D3P. The linear and quadratic schedules, while still effective, either introduce noise too aggressively early (linear) or too slowly (quadratic), leading to suboptimal trade-offs. Table 4 shows that finer search granularity improves privacy slightly (a 1.2-point reduction in MIA success rate) without sacrificing accuracy. This suggests that in deployments where computational resources allow, increasing the resolution of the $t^*$ search can yield measurable benefits.
Finally, Table 5 demonstrates the flexibility of D3P in adapting to different application needs. For example, with $\lambda = 0.75$, the MIA success rate is reduced to 22.4% (a 46.0-point reduction), albeit with a 0.8-point drop in accuracy compared to $\lambda = 0.25$. This tunability makes D3P suitable for a wide spectrum of privacy-sensitive environments, from medical AI systems requiring maximum confidentiality to commercial applications prioritizing performance. The trade-off curve in Figure 4 further confirms that $t^*_{\mathrm{opt}} \approx 200$ is the sweet spot for CIFAR-10 under the chosen settings. Deviating from this point in either direction results in lower $J(t^*)$ values, reinforcing the need for dataset-specific optimization.
To further stress-test the D3P framework and demonstrate its adaptability, we explore three new settings: (1) performance under limited training data availability; (2) resilience to stronger membership inference attacks; (3) scalability to higher-resolution image datasets. These settings aim to replicate real-world constraints where data scarcity, adaptive adversaries, and complex inputs are common.

Limited Training Data Scenario

We simulate a data-scarce environment by reducing the CIFAR-10 training set to 20 % of its original size, keeping the test set unchanged. In this scenario, overfitting risks are higher, and MIAs become more effective. Results in Table 6 show that D3P mitigates this risk, substantially reducing MIA success while retaining more accuracy than other defenses.
  • Adaptive Membership Inference Attack
We evaluate D3P against a stronger MIA variant where the attacker employs calibration-based thresholding on confidence scores [7] and uses feature augmentation on shadow models. Table 7 shows that while the stronger attack increases baseline leakage, D3P remains the most resilient defense.
  • Scaling to Higher-Resolution Data
We test D3P on the CelebA dataset (aligned and cropped faces, 64 × 64 resolution), evaluating binary attribute classification (Smiling vs. Not Smiling). This tests both computational scalability and effectiveness on a visually richer dataset. Table 8 shows that D3P maintains strong privacy performance while scaling to more complex inputs.
Analysis of Additional Results. In the limited data setting (Table 6), MIA success rates are significantly higher for the No Defense baseline due to increased overfitting. D3P reduces the success rate by 35.1 points, outperforming all baselines while maintaining accuracy within 1.1 points of No Defense. This demonstrates robustness in data-scarce scenarios where privacy risk is amplified.
Against the adaptive MIA (Table 7), which incorporates calibration and feature augmentation, all defenses show reduced effectiveness compared to the standard MIA. However, D3P still achieves a 38.1 point reduction, nearly 11 points better than the next-best defense (DP-SGD). This suggests that the structural obfuscation induced by partial denoising is difficult for adaptive attackers to circumvent. In the CelebA experiments (Table 8), D3P scales effectively to higher-resolution, visually complex inputs. It reduces MIA success rates by 32.0 points while incurring only a 1.4 point accuracy drop compared to No Defense. The scalability is partly due to the fact that the forward and reverse processes in diffusion models are resolution-agnostic once the network architecture is appropriately sized. Figure 5 visualizes the MIA success rates across all three new settings, reinforcing that D3P provides consistent and substantial privacy gains under diverse and challenging conditions.
It is also worth emphasizing that D3P is inherently parallelizable and thus well-suited for distributed environments. Since the preprocessing step operates independently across data samples, diffusion-based transformations can be executed concurrently on large-scale clusters or GPU arrays without inter-sample dependencies. This property not only reduces the wall-clock time of preprocessing but also makes D3P compatible with federated and cloud-based training scenarios, where computational resources are naturally distributed. Such parallelism highlights the practical feasibility of deploying D3P in real-world systems beyond the experimental settings presented here.

5.4. Evaluation on Out-of-Distribution (OOD) Test Sets

To comprehensively assess the robustness of D3P under distribution shift, we extend our experiments to out-of-distribution (OOD) test sets. Specifically, models trained on in-distribution (ID) data (CIFAR-10, Purchase-100, and Texas-100) are evaluated on structurally related but semantically disjoint datasets: CIFAR-100 for image classification, Purchase-20 for purchase behavior prediction, and Texas-50 for healthcare discharge records. This setting tests whether membership inference risks persist when the test distribution $P_{\mathrm{test}}$ deviates from the training distribution $P_{\mathrm{train}}$, i.e., $P_{\mathrm{test}} \neq P_{\mathrm{train}}$. Such scenarios are common in practice, as models are routinely deployed in environments where data distributions evolve over time.
Theoretically, OOD testing can amplify privacy risk because adversaries may exploit mismatched generalization behaviors. In particular, if $f_\theta$ is a model trained on $P_{\mathrm{train}}$, then for an OOD input $x \sim P_{\mathrm{test}}$ the prediction confidence $\max_y p_\theta(y \mid x)$ tends to diverge more sharply between member and non-member samples, providing additional attack surface. Thus, a robust defense must ensure that the divergence $\Delta_{\mathrm{MIA}} = \Pr[\mathcal{A}(x) = 1 \mid x \in D_{\mathrm{train}}] - \Pr[\mathcal{A}(x) = 1 \mid x \notin D_{\mathrm{train}}]$, where $\mathcal{A}(x) = 1$ denotes that the attacker predicts $x$ to be a member, remains bounded even under $P_{\mathrm{test}} \neq P_{\mathrm{train}}$.
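As a concrete way to estimate this divergence empirically, the snippet below sweeps all thresholds over member and non-member confidence scores and reports the largest achievable gap between true-positive and false-positive rates, i.e., the maximum empirical advantage of a confidence-threshold attacker; the function name and inputs are illustrative assumptions.

```python
import numpy as np

def mia_advantage(member_conf, nonmember_conf):
    """Empirical estimate of Delta_MIA for confidence-threshold attackers.

    member_conf / nonmember_conf : arrays of max-softmax confidences
    max_y p_theta(y | x) collected on training members and on held-out
    (possibly OOD) non-members. The returned value is the largest TPR - FPR
    over all thresholds, an empirical proxy for
    Pr[A(x) = 1 | x in D_train] - Pr[A(x) = 1 | x not in D_train].
    """
    thresholds = np.unique(np.concatenate([member_conf, nonmember_conf]))
    gaps = [np.mean(member_conf >= t) - np.mean(nonmember_conf >= t)
            for t in thresholds]
    return max(gaps)
```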
Empirical results are reported in Table 9, which shows test accuracy and MIA success rates across multiple ID→OOD transfer settings. The following trends emerge: (1) without defense, MIA success remains alarmingly high (≥65%) across all OOD datasets, indicating that distribution shift alone does not mitigate leakage; (2) Gaussian Noise (GN) and Adversarial Regularization (AR) reduce leakage moderately but incur noticeable utility drops (2.4–3.4 percentage points); (3) DP-SGD provides stronger privacy guarantees but sacrifices 8–9 percentage points of OOD utility; (4) D3P achieves the most favorable trade-off, reducing MIA success to as low as 27–32% while keeping accuracy within 1–2 percentage points of the undefended baseline.
These results underscore that D3P provides consistently strong protection against membership inference even under severe distributional shift. Unlike DP-SGD, which enforces privacy via gradient perturbation at the cost of broad utility degradation, D3P operates at the data level, preserving semantic structures and ensuring that OOD generalization remains intact. This robustness to distribution drift highlights the practicality of D3P for real-world deployments, where training and inference distributions rarely align perfectly.

5.5. Extended Evaluation

We further extend our evaluation along three complementary axes: (1) performance on additional healthcare datasets; (2) scalability to larger model architectures; (3) runtime and memory characteristics.
Healthcare Scenario. Beyond Texas-100, we evaluate on MIMIC-III (clinical notes) and eICU (ICU discharge records), both widely used in privacy-preserving machine learning because of their sensitivity. Results in Table 10 show that D3P consistently reduces membership inference attack (MIA) success rates to below 30% while keeping accuracy within 1–2 percentage points of the undefended baseline, outperforming Adversarial Regularization (AR) and DP-SGD.
Large-scale models. We additionally test ResNet-152 on CIFAR-10 and a Transformer-based architecture on Purchase-100. As shown in Table 11, D3P scales favorably: MIA success is reduced by roughly 40 percentage points relative to the undefended models, while accuracy remains within 2 percentage points of the baseline. In contrast, DP-SGD achieves privacy at the cost of substantial utility loss.
Runtime and memory analysis. Table 12 reports wall-clock training time per epoch and peak GPU memory usage for ResNet-50 (CIFAR-10) and a Transformer (Purchase-100). Gaussian Noise (GN) and AR incur negligible overhead, while DP-SGD increases training time by about 2.3 × due to gradient perturbation. D3P adds only ∼8% runtime overhead from diffusion preprocessing, with memory usage comparable to the baseline.
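For context on how such measurements can be reproduced, the snippet below shows one straightforward way to record per-epoch wall-clock time and peak GPU memory in PyTorch; it is a generic measurement harness rather than the exact instrumentation behind Table 12.

```python
import time
import torch

def timed_epoch(model, loader, optimizer, loss_fn, device):
    """Train for one epoch; return wall-clock seconds and peak GPU memory in GB."""
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()

    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    torch.cuda.synchronize(device)       # ensure all queued kernels have finished
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated(device) / 1e9
    return elapsed, peak_gb
```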
These results demonstrate that D3P provides strong privacy protection in healthcare domains, scales effectively to large-scale models, and achieves competitive runtime and memory efficiency compared to existing defenses.

6. Conclusions

In this work, we proposed a novel diffusion-model-based preprocessing framework designed to mitigate membership inference attacks by selectively obfuscating membership-sensitive high-frequency information while preserving semantically relevant task features. Unlike conventional model-centric or data-centric defenses, our approach operates independently of the training pipeline, enabling seamless integration into diverse AI systems without architectural modifications or retraining. Through a theoretically grounded optimization of the reverse diffusion depth, we achieved a principled balance between privacy and utility, validated across multiple datasets, model architectures, and adversarial settings. Experimental results demonstrated consistent reductions in membership inference attack success rates with minimal accuracy degradation, highlighting the method’s practicality for real-world privacy-preserving AI deployments.

Author Contributions

Conceptualization, Z.Y. and X.T.; Methodology, G.C. and X.T.; Software, Z.Y.; Validation, Z.Y., G.C. and X.T.; Formal analysis, G.C.; Investigation, X.Y.; Resources, X.Y.; Data curation, Z.Y.; Writing—original draft preparation, Z.Y.; Writing—review and editing, Z.Y., G.C. and X.T.; Visualization, Z.Y.; Supervision, X.T.; Project administration, X.T.; Funding acquisition, X.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Proceedings of the Theory of Cryptography Conference (TCC), New York, NY, USA, 4–7 March 2006.
  2. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks Against Machine Learning Models. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–24 May 2017; pp. 3–18.
  3. Nasr, M.; Shokri, R.; Houmansadr, A. Machine Learning with Membership Privacy using Adversarial Regularization. In Proceedings of the USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018.
  4. Pan, Z.; Ying, Z.; Wang, Y.; Zhang, C.; Zhang, W.; Zhou, W.; Zhu, L. Feature-Based Machine Unlearning for Vertical Federated Learning in IoT Networks. IEEE Trans. Mob. Comput. 2025, 24, 5031–5044.
  5. Salem, A.; Zhang, Y.; Humbert, M.; Berrang, P.; Fritz, M.; Backes, M. ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 24–27 February 2019.
  6. Long, Y.; Bindschaedler, V.; Wang, L.; Bu, D.; Wang, X.; Tang, H.; Gunter, C.A.; Chen, K. Understanding Membership Inferences on Well-Generalized Learning Models. arXiv 2018, arXiv:1802.04889.
  7. Yeom, S.; Giacomelli, I.; Fredrikson, M.; Jha, S. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. In Proceedings of the Computer Security Foundations Symposium (CSF), Oxford, UK, 9–12 July 2018.
  8. Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–22 May 2019.
  9. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  10. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
  11. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, 24–28 October 2016.
  12. Jayaraman, B.; Evans, D. Evaluating Differentially Private Machine Learning in Practice. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 1895–1912.
  13. Choquette, A.; Kulkarni, S.; Barman, A.; Hsu, J.; Li, J.; Rogers, R.; Steinhardt, J. Label-Only Membership Inference Attacks. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021.
  14. Liu, Y.; Gao, Y.; Ji, S.; Han, X.; Li, G. A Survey of Membership Inference Attacks and Defenses in Machine Learning. Comput. Secur. 2024, 2, 404–454.
  15. Al-Rubaie, M.; Chang, J.M. Privacy-Preserving Machine Learning: Threats and Solutions. IEEE Secur. Priv. 2019, 17, 49–58.
  16. Hasan, B.M.S.; Abdulazeez, A.M. A Review of Principal Component Analysis Algorithm for Dimensionality Reduction. J. Soft Comput. Data Min. 2021, 2, 20–30.
  17. Hitaj, B.; Ateniese, G.; Perez-Cruz, F. Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, 30 October–3 November 2017; pp. 603–618.
  18. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014.
  19. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
  20. Torkzadehmahani, R.; Abadi, M.; Zhao, A.; Chen, T.; Song, D. DP-GAN: Differentially Private Generative Adversarial Network. In Proceedings of the International Conference on Learning Representations (ICLR) Workshop, New Orleans, LA, USA, 6–9 May 2019.
  21. Xie, L.; Lin, K.; Wang, H.; Sang, Y.; Zhou, L. Differentially Private Generative Adversarial Network. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 2–8 December 2018.
  22. Chen, D.; Orekondy, T.; Fritz, M. GAN-Leaks: A Taxonomy of Membership Inference Attacks against GANs. In Proceedings of the ACM Workshop on Artificial Intelligence and Security (AISec), Virtual, 9–13 November 2020.
  23. Zhang, Z.; Wang, T.; Li, N.; Honorio, J.; Backes, M.; He, S.; Chen, J.; Zhang, Y. PrivSyn: Differentially Private Data Synthesis. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual, 11–13 August 2021; pp. 929–946.
  24. Jordon, J.; Yoon, J.; Van Der Schaar, M. PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
  25. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015.
  26. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020.
  27. Song, Y.; Sohl-Dickstein, J.; Kingma, D.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021.
  28. Dhariwal, P.; Nichol, A.Q. Diffusion Models Beat GANs on Image Synthesis. arXiv 2021, arXiv:2105.05233.
  29. Kong, Z.; Ping, W.; Huang, J.; Zhao, K.; Catanzaro, B. DiffWave: A Versatile Diffusion Model for Audio Synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021.
  30. Li, X.L.; Gao, T.; Li, Y.; Peng, H.; Khashabi, D.; Durme, B.V.; Han, J.; Wang, Y.; Callison-Burch, C.; Durrett, G.; et al. Diffusion-LM Improves Controllable Text Generation. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022.
  31. Nichol, A.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021.
  32. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021.
  33. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022.
  34. Lugmayr, A.; Danelljan, M.; Romero, A.; Suwajanakorn, S.; Schonlieb, C.B. RePaint: Inpainting using Denoising Diffusion Probabilistic Models. arXiv 2022, arXiv:2201.09865.
  35. Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.; Ghasemipour, S.; Ho, J.; Salimans, T.; Fleet, D.J. Image Super-Resolution via Iterative Refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–23 June 2022.
  36. Osorio-Roig, D.; Rathgeb, C.; Drozdowski, P.; Busch, C. Stable Hash Generation for Efficient Privacy-Preserving Face Identification. IEEE Trans. Biom. Behav. Identity Sci. 2021, 4, 333–348.
  37. Xue, D.; Ma, H.; Li, L.; Liu, D.; Xiong, Z. aiWave: Volumetric Image Compression with 3-D Trained Affine Wavelet-like Transform. arXiv 2022, arXiv:2203.05822.
Figure 1. Visual comparison of sample images before and after D3P preprocessing.
Figure 2. Accuracy drops relative to No Defense across defenses. The colored bars indicate different datasets: blue = CIFAR-10, orange = Purchase-100, and red = Texas-100. The x-axis categories correspond to defense mechanisms: No Defense (baseline), Gaussian Noise (GN), Differential Privacy-SGD (DP-SGD), Adversarial Regularization (AR), and D3P (ours).
Figure 3. Membership Inference Attack (MIA) success rates across defenses. The colored bars indicate different datasets: blue = CIFAR-10, orange = Purchase-100, and red = Texas-100. The x-axis categories correspond to defense mechanisms: No Defense, Gaussian Noise (GN), DP-SGD, Adversarial Regularization (AR), and D3P (ours). Lower values indicate stronger privacy protection.
Figure 4. Privacy–utility trade-off curve $J(t^*)$ for CIFAR-10 (cosine schedule, $\lambda = 0.5$). The peak at $t^* \approx 200$ corresponds to the optimal trade-off.
Figure 5. MIA success rates under different experimental settings: (i) limited training data (blue), (ii) adaptive MIA (red), and (iii) higher-resolution data (brown). The x-axis shows different defense methods: No Defense, Gaussian Noise (GN), DP-SGD, Adversarial Regularization (AR), and D3P (ours).
Table 1. Cross-dataset top-1 accuracy (%) and absolute accuracy drop (percentage points) relative to No Defense. D3P retains near-baseline utility across modalities.

| Dataset | Metric | No Defense | GN | DP-SGD | AR | D3P (Ours) |
|---|---|---|---|---|---|---|
| CIFAR-10 | Accuracy | 94.8 | 91.2 | 89.5 | 93.0 | 93.7 |
| | Drop | – | 3.6 | 5.3 | 1.8 | 1.1 |
| Purchase-100 | Accuracy | 85.4 | 79.6 | 76.1 | 82.0 | 83.5 |
| | Drop | – | 5.8 | 9.3 | 3.4 | 1.9 |
| Texas-100 | Accuracy | 90.2 | 84.0 | 81.3 | 87.5 | 88.7 |
| | Drop | – | 6.2 | 8.9 | 2.7 | 1.5 |
Table 2. Cross-dataset MIA success rate (%) and absolute reduction (percentage points) relative to No Defense. D3P consistently yields the largest reductions.

| Dataset | Metric | No Defense | GN | DP-SGD | AR | D3P (Ours) |
|---|---|---|---|---|---|---|
| CIFAR-10 | MIA Success | 68.4 | 49.2 | 38.7 | 44.8 | 26.1 |
| | Reduction | – | 19.2 | 29.7 | 23.6 | 42.3 |
| Purchase-100 | MIA Success | 64.7 | 50.8 | 41.5 | 47.3 | 33.4 |
| | Reduction | – | 13.9 | 23.2 | 17.4 | 31.3 |
| Texas-100 | MIA Success | 66.2 | 50.5 | 41.8 | 47.0 | 26.6 |
| | Reduction | – | 15.7 | 24.4 | 19.2 | 39.6 |
Table 3. Impact of diffusion noise schedules on CIFAR-10. Accuracy and MIA success rates are reported at $t^*_{\mathrm{opt}}$ for each schedule.

| Noise Schedule | Accuracy (%) | MIA Success (%) | Reduction (%) |
|---|---|---|---|
| Linear | 93.4 | 28.3 | 40.1 |
| Cosine | 93.7 | 26.1 | 42.3 |
| Quadratic | 93.1 | 29.5 | 38.9 |
Table 4. Impact of truncation search granularity on CIFAR-10 (cosine schedule).

| Granularity | Accuracy (%) | MIA Success (%) | Reduction (%) |
|---|---|---|---|
| Coarse ($\Delta t = 100$) | 93.5 | 27.3 | 41.1 |
| Fine ($\Delta t = 50$) | 93.7 | 26.1 | 42.3 |
Table 5. Impact of $\lambda$ on CIFAR-10 performance (cosine schedule, fine granularity).

| $\lambda$ | Accuracy (%) | MIA Success (%) | Reduction (%) |
|---|---|---|---|
| 0.25 | 94.0 | 29.8 | 38.6 |
| 0.50 | 93.7 | 26.1 | 42.3 |
| 0.75 | 93.2 | 22.4 | 46.0 |
Table 6. CIFAR-10 with 20% training data. Accuracy and MIA success rates under limited data conditions.

| Defense | Accuracy (%) | MIA Success (%) | Reduction (%) |
|---|---|---|---|
| No Defense | 88.1 | 74.5 | – |
| GN | 84.0 | 55.2 | 19.3 |
| DP-SGD | 81.7 | 47.5 | 27.0 |
| AR | 86.2 | 50.8 | 23.7 |
| D3P (ours) | 87.0 | 39.4 | 35.1 |
Table 7. Adaptive MIA results on CIFAR-10 (full training set).

| Defense | Accuracy (%) | MIA Success (%) | Reduction (%) |
|---|---|---|---|
| No Defense | 94.8 | 72.6 | – |
| GN | 91.2 | 55.4 | 17.2 |
| DP-SGD | 89.5 | 45.0 | 27.6 |
| AR | 93.0 | 49.1 | 23.5 |
| D3P (ours) | 93.7 | 34.5 | 38.1 |
Table 8. CelebA (64 × 64) attribute classification results.

| Defense | Accuracy (%) | MIA Success (%) | Reduction (%) |
|---|---|---|---|
| No Defense | 91.5 | 65.8 | – |
| GN | 88.0 | 50.9 | 14.9 |
| DP-SGD | 85.6 | 41.4 | 24.4 |
| AR | 89.3 | 45.7 | 20.1 |
| D3P (ours) | 90.1 | 33.8 | 32.0 |
Table 9. Performance of defenses under OOD evaluation. Models are trained on in-distribution (ID) datasets and tested on semantically related but disjoint OOD datasets. Reported values are test accuracy (Acc, %) and membership inference attack success rate (MIA, %).

| Train → Test | Defense | Acc (%) | MIA (%) |
|---|---|---|---|
| CIFAR-10 → CIFAR-100 | No Defense | 54.1 | 67.3 |
| | GN | 50.8 | 54.7 |
| | AR | 51.2 | 52.9 |
| | DP-SGD | 45.0 | 42.1 |
| | D3P | 52.6 | 29.4 |
| Purchase-100 → Purchase-20 | No Defense | 71.4 | 65.8 |
| | GN | 68.2 | 53.6 |
| | AR | 69.0 | 50.9 |
| | DP-SGD | 63.1 | 41.7 |
| | D3P | 70.3 | 31.8 |
| Texas-100 → Texas-50 | No Defense | 68.5 | 66.2 |
| | GN | 65.1 | 54.0 |
| | AR | 65.8 | 52.4 |
| | DP-SGD | 59.7 | 43.5 |
| | D3P | 67.0 | 27.9 |
Table 10. Healthcare domain evaluation. Reported values are test accuracy (Acc, %) and MIA success (MIA, %).

| Dataset | Defense | Acc (%) | MIA (%) |
|---|---|---|---|
| MIMIC-III | No Defense | 73.4 | 68.7 |
| | DP-SGD | 65.2 | 44.1 |
| | AR | 69.1 | 50.2 |
| | D3P | 71.8 | 28.9 |
| eICU | No Defense | 70.1 | 66.4 |
| | DP-SGD | 62.5 | 41.8 |
| | AR | 65.4 | 48.7 |
| | D3P | 68.9 | 29.7 |
Table 11. Evaluation on large-scale models. Reported values are test accuracy (Acc, %) and MIA success (MIA, %).

| Model | Defense | Acc (%) | MIA (%) |
|---|---|---|---|
| ResNet-152 (CIFAR-10) | No Defense | 94.1 | 74.8 |
| | DP-SGD | 86.3 | 46.9 |
| | D3P | 92.7 | 32.5 |
| Transformer (Purchase-100) | No Defense | 77.8 | 70.5 |
| | DP-SGD | 69.4 | 43.8 |
| | D3P | 76.1 | 30.7 |
Table 12. Runtime and memory comparison. Reported values are training time per epoch (s) and peak GPU memory (GB).

| Setting | Defense | Time (s) | Memory (GB) |
|---|---|---|---|
| ResNet-50 (CIFAR-10) | No Defense | 42.1 | 7.8 |
| | GN | 44.0 | 7.9 |
| | DP-SGD | 97.5 | 9.4 |
| | D3P | 45.6 | 8.0 |
| Transformer (Purchase-100) | No Defense | 58.7 | 11.2 |
| | GN | 61.3 | 11.4 |
| | DP-SGD | 134.9 | 13.5 |
| | D3P | 63.5 | 11.6 |