Review

Privacy Auditing in Differential Private Machine Learning: The Current Trends

Institute of Electronics and Computer Science, 14 Dzerbenes St., LV-1006 Riga, Latvia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(2), 647; https://doi.org/10.3390/app15020647
Submission received: 3 December 2024 / Revised: 31 December 2024 / Accepted: 8 January 2025 / Published: 10 January 2025

Abstract

Differential privacy has recently gained prominence, especially in the context of private machine learning. While the definition of differential privacy makes it possible to provably limit the amount of information leaked by an algorithm, practical implementations of differentially private algorithms often contain subtle vulnerabilities. Therefore, there is a need for effective methods that can audit (ϵ, δ)-differentially private algorithms before they are deployed in the real world. This article examines studies that assess the privacy guarantees of differentially private machine learning. It covers a wide range of topics on the subject and provides comprehensive guidance for privacy auditing schemes based on privacy attacks to protect machine-learning models from privacy leakage. Our results contribute to the growing literature on differential privacy in the realm of privacy auditing and beyond and pave the way for future research in the field of privacy-preserving models.

1. Introduction

In today’s data-driven world, more and more researchers and data scientists are using machine learning to develop better models or more innovative solutions for a better future. These models often tend to use sensitive (e.g., health-related personal data and proprietary data) [1] or private data (e.g., personally identifiable information, such as age, name, and user input data), which can lead to privacy issues [2]. When using data containing sensitive information, the individual’s right to privacy must be respected, both from an ethical and a legal perspective [3]. The functionality of privacy modeling for the privacy landscape ranges from descriptive queries to training large machine-learning (ML) models with millions of parameters [4]. Moreover, deep-learning algorithms, a subset of ML, can analyze and process large amounts of data collected from different users or devices to detect unusual patterns [5]. On the other hand, ML systems are exposed to several serious vulnerabilities. This naturally leads to the conclusion that trained ML models are vulnerable to privacy attacks. Therefore, it is crucial for the practical application of ML models and algorithms to protect the privacy of input datasets, training data, or data that must be kept secret during inference.
Numerous works have shown that data and parameters of ML models can leak sensitive information about their training, for example, in statistical modeling [6,7,8,9]. There are several causes of data leakage, such as overfitting and influence [10], model architecture [11], or memorization [12]. If your personal or important data are used to train an ML model, you may want to ensure that an intruder cannot steal your data. To measure and reduce the likelihood of sensitive data leakage, there are various mitigation and protection strategies.
A robust framework for protecting sensitive data in statistical databases, especially through mechanisms such as noise addition and gradient clipping, is the proven mathematical framework of differential privacy (DP) [2,13]. The core idea of DP is to add noise to the data or model parameters to obscure an individual’s influence on a data release [14], where the unit of privacy characterizes what you are trying to protect. To satisfy DP, it must be provably guaranteed that an attacker is not able to reliably predict whether or not a particular individual is included in the dataset. Consequently, such an approach can provide a strong privacy guarantee for individuals. In this context, DP is a powerful privacy-preserving tool to prevent sensitive information about an individual from being revealed in a variety of ML models and to analyze the privacy of the training data.
The integration of DP methods into ML models makes them more robust to privacy attacks and paves the way for differentially private machine learning (DPML) [15,16,17,18]. In this regard, ML models using DP algorithms can guarantee that each user’s contribution to the dataset does not result in a significantly different model [19]. However, DP’s strong privacy guarantees and ease of decentralization [20] come at a price in model accuracy [21], especially when aiming for a low privacy parameter [22]. For example, models trained with the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm [23] show a significant decrease in accuracy compared to non-DP models [24,25]. The main reason for this could be that the privacy analysis of existing DPML methods and algorithms (e.g., DP-SGD) is overly cautious for real-world scenarios.
Ensuring privacy in DPML raises the following key questions: How can we guarantee the privacy of the model? Does our model reveal private information? What level of differential privacy does an algorithm satisfy? Answering these questions is crucial, because overestimating the privacy guarantee leads to a decrease in the accuracy of the model, while underestimating it leads to a privacy leakage [26].
To prevent privacy leakage from ML models, we use a DP framework that adds a calculated amount of noise or randomness to hide each individual’s contribution to the data, thus reducing the risk of privacy leakage from small changes in a dataset [27]. A common approach is to add noise to the data during the training process [28]. The process of determining how to add noise is called a mechanism in the context of DP and can be influenced by several factors, including the specific noise distribution (e.g., Laplacian and Gaussian mechanisms), the desired level of privacy, and the type of query. DP can also facilitate effective data-partitioning strategies when sensitive information is distributed across multiple datasets or partitions. By ensuring that each portion adheres to DP standards, organizations can analyze aggregated data without compromising individual privacy. These strategies are used, for example, when data cannot be centralized due to privacy concerns (e.g., federated learning) [29]. There are other approaches where noise is added to the inputs, outputs, ground truth labels, or even to the whole model [30]. As a result, the algorithm can still learn from the data and make accurate predictions. Adding noise provides a strong worst-case privacy guarantee for ML algorithms [31]. Moreover, there is a crucial technique in the DP context, gradient clipping, which is used in training ML models. It helps to ensure that the contribution of individual data points to the model’s gradients remains bounded, improving privacy guarantees while preserving the performance of the model. The purpose of gradient clipping is twofold. First, bounding the gradients reduces the sensitivity of the model output to individual training examples, which is essential for ensuring DP. Second, gradient clipping helps prevent overfitting by avoiding extreme updates that could lead to the memorization of specific data points. DP is a formalization stating that a query should not reveal whether an individual is present in the training dataset. It should be noted that there are recent approaches in which ML models are trained non-privately and their predictions are de-noised before being released to satisfy DP [32]. This means that DP gives the user a quantitative guarantee of how distinguishable an individual’s information can be to a potential attacker.
Differential privacy [2,33] ensures that running the algorithm on two adjacent datasets, D and D′, which differ in one data point, results in two approximately equal output distributions. The privacy level is often characterized by the privacy parameters (also known as the privacy risk): ϵ, the privacy loss; and δ, the probability of deviation from the privacy guarantee. Together, these parameters form a mathematical framework for quantifying privacy and allow fine-tuning of the privacy level to balance data utility and privacy concerns. Choosing appropriate privacy parameters is challenging but crucial, as weak parameters can lead to excessive privacy leakage, while strong parameters can compromise the utility of the model [34]. A small ϵ ensures that an attacker cannot reliably distinguish whether the algorithm has processed D or D′; that is, it provides strong privacy but less accuracy. Meanwhile, a large ϵ provides weaker privacy guarantees [35,36]. This parameter controls the trade-off between privacy and utility. Since there are no guidelines on how to set the right values of ϵ and δ in practice, this can be a challenging process. Even when implemented correctly, there are several known cases where published DP algorithms with miscalculated privacy guarantees incorrectly report a higher level of privacy [37,38]. In order to verify the expected privacy guarantees of a DPML model, privacy auditing must be used.
Privacy auditing, the process of testing privacy guarantees, relies on multiple model training runs in different privacy configurations to effectively detect privacy leakage [39,40]. There are many reasons why one would want to audit the privacy guarantees of a differentially private algorithm. First, if we audit and the audited value of ϵ is greater than the (claimed) upper bound, the privacy proof is false, and there is an error or bug in our algorithm [34,41]. Second, if we audit and the audited value of ϵ matches the claimed bound, then we can say that our privacy proof gives a tight privacy estimate (tight auditing), and our privacy analysis does not need to be improved [42]. Tight auditing refers to the process of empirically estimating the privacy level of a DP algorithm in a way that closely matches its theoretical privacy guarantees. The goal is to obtain an accurate estimate of the actual privacy provided by the algorithm when applied to real-world data. Existing auditing scenarios for DP suffer from the limitation that they provide tight estimates only under implausible worst-case assumptions and require thousands or millions of training runs to produce non-trivial statistical estimates of privacy leakage [43]. Third, if we are unable to rigorously prove how private our model is, then auditing provides a heuristic measure of how private it is [44].
In practice, differential privacy auditing [45,46,47,48,49,50] of ML models has been proposed to empirically measure and analyze the privacy leakage of a DPML algorithm. To investigate and audit the privacy of data and models, one must first apply a specific type of attack, called a privacy attack, to a DP algorithm and then perform an analysis, for example, a statistical calculation. To evaluate data leakage in ML, we categorize privacy attacks into membership inference attacks [51,52,53], data-poisoning attacks [54,55], model extraction attacks [56,57], model inversion attacks [58,59], and property inference attacks [60]. In addition, assumptions must be made about the attacker’s knowledge and ability to access the model in either black-box or white-box settings. Finally, the attacker’s success is converted into an (ϵ, δ) estimate using an attack evaluation procedure. The privacy attack, together with the privacy assessment, forms an auditing scheme. Most auditing schemes [46,47,61,62], for example, have been developed for centralized settings.
Motivation for the research. The aim of this review paper is to provide a comprehensive and clear overview of the privacy auditing schemes proposed in the context of differentially private machine learning. The following aspects are considered:
  • The implementation of differential privacy in consumer-use cases makes greater privacy awareness necessary, thus raising both data-related and technical concerns. As a result, privacy auditors are looking for scalable, transparent, and powerful auditing methods that enable accurate privacy assessment under realistic conditions.
  • Auditing methods and algorithms have been researched and proven effective for DPML models. In general, auditing methods can be categorized according to privacy attacks. However, the implementation of sophisticated privacy auditing requires a comprehensive privacy-auditing methodology.
  • Existing privacy-auditing techniques are not yet well adapted to specific tasks and models, as there is no clear consensus on the privacy loss parameters to be chosen, such as ϵ, algorithmic vulnerabilities, and complexity issues. Therefore, there is an urgent need for effective auditing schemes that can provide empirical guarantees for privacy loss.
Contributions. This paper provides a comprehensive summary of privacy attacks and violations with practical auditing procedures for each attack or violation. The main contributions can be summarized as follows:
  • We systematically present types and techniques for privacy attacks in the context of differential privacy machine-learning modeling. Recent research on privacy attacks for privacy auditing is categorized into five main categories: membership inference attacks, data-poisoning attacks, model inversion attacks, model extraction attacks, and property inference.
  • A structured literature review of existing approaches to privacy auditing in differential privacy is conducted with examples from influential research papers. The comprehensive process of proving auditing schemes is presented. An in-depth analysis of auditing schemes is provided, along with an abridged description of each paper.
The rest of this article is organized as follows: The next section provides an overview of the theoretical foundations of differential privacy, including its mathematical definitions and basic properties. Section 3 describes the types of privacy attacks on ML models used to evaluate privacy leakage in the context of differential privacy. Section 4 presents various privacy auditing schemes based on privacy attacks and privacy violations, along with some influential paper examples. Section 5 concludes the manuscript and discusses future research trends.

2. Preliminaries

In this section, the theoretical and mathematical foundations of differential privacy are briefly presented.

2.1. Differential Privacy Fundamentals

An individual’s privacy is closely related to intuitive notions such as the privacy unit and the privacy loss of a data release [63]. The privacy unit (e.g., a person) quantifies how much influence a person can have on the dataset. The privacy loss quantifies how recognizable the data release is. The formalization of differential privacy is defined in relation to the privacy unit and the privacy loss. The DP framework can ensure that the insertion or deletion of a record in a dataset has no discernible effect on the query results, thus ensuring privacy. To satisfy DP, a randomized function called a “mechanism” is used. Any function can be a mechanism as long as we can mathematically prove that the function satisfies the given definition of differential privacy. The relevant definitions, proofs, and theorems are presented below.
DP mechanism: DP relies on rigorous mathematical proofs to ensure privacy guarantees. These foundations help us to understand the behavior of DP models and determine the privacy loss [64,65]. DP is defined in terms of the privacy unit (input) and the privacy loss (output) of a randomized function. A randomized function that satisfies DP is called a mechanism, M(·) [66].
Adjacent datasets: Two datasets, D and D′, are adjacent if they differ by the change of a single individual, i.e., |D′| = |D| ± 1. To determine whether your data analysis is a DP data analysis, you must specify the data transformations it applies, i.e., the functions from a dataset to a dataset. For example, if you are using functions to help you understand your data, the properties or statistics you compute are statistical queries.
Unbounded and bounded DP [67,68]: If the size of the dataset is not fixed, you are operating under unbounded DP (adjacent datasets are obtained by adding or removing a record, so the possible datasets can be of any size). In contrast, if the dataset size is fixed and known, you are operating under bounded DP (adjacent datasets have the same, known size and differ by the replacement of a record).
Pure DP [2]: In the original definition of DP, a mechanism, M, satisfies ϵ-DP if, for all pairs of adjacent datasets, D and D′, differing by one individual, and for all possible sets of outputs, S, of the algorithm, the following inequality holds:
$\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S],$
where Pr denotes probability, ϵ is the privacy budget (also known as the privacy risk or the privacy loss parameter) representing the degree of privacy protection, and e^ϵ bounds the amount of information leakage, i.e., the maximum ratio between the output distributions of the two transformations.
Approximate DP: In approximate DP, a small failure probability, δ, is added to pure DP to relax the constraint:
$\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S] + \delta.$
This relaxation makes it easier to design practical algorithms that retain strong privacy guarantees with higher utility, especially when the dataset is large. If δ = 0, we recover the stricter notion of ϵ-differential privacy.
The privacy loss [36,69]: Let M(·) be a mechanism and D and D′ adjacent datasets; the privacy loss for a given output, y, is defined as
$\mathcal{L}_{M(D),M(D')}(y) = \ln \frac{\Pr[M(D) = y]}{\Pr[M(D') = y]},$ which is bounded in magnitude by ϵ for an ϵ-DP mechanism.
The privacy loss quantifies how confident a potential attacker can be based on the odds ratio of the two possibilities. In this way, the distance between the output distributions for a given y can be measured. In other words, the pair of output distributions determines the distinguishability of the mechanisms. If the loss is zero, the probabilities match and the attacker has no advantage. If the loss is positive, the output is more likely under D, and the attacker chooses dataset D. If the loss is negative, the attacker chooses dataset D′. If the loss magnitude is large, there is a privacy violation.
Hypothesis test interpretation of DP [70]: DP can be interpreted as a hypothesis test with the null hypothesis that M was trained on D and the alternative hypothesis that M was trained on D′. A false-positive result (type-I error) is the probability of rejecting the null hypothesis when it is true, while a false-negative result (type-II error) is the probability of failing to reject the null hypothesis when the alternative hypothesis is true. For example, Kairouz et al. (2015) characterized (ϵ, δ)-DP in terms of the false-positive rate (FPR) and false-negative rate (FNR) that can be achieved by an acceptance region. This characterization enables the estimation of the privacy parameters as follows:
$\frac{\mathrm{TPR} - \delta}{\mathrm{FPR}} \le e^{\epsilon},$ where the true-positive rate is TPR = 1 − FNR.
Furthermore, from the hypothesis testing perspective [66] (Balle et al., 2020), the attacker can be viewed as solving the following hypothesis testing problem, given the output of either M(D) or M(D′):
H0: The underlying dataset is D.
H1: The underlying dataset is D′.
In other words, for a fixed type I error, α , the attacker tries to find a rejection rule, ϕ , that minimises the type II error, β .
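To make this conversion from attack error rates to a privacy estimate concrete, the following minimal Python sketch (an illustrative helper, not taken from any of the surveyed papers) turns empirical FPR/FNR values of an attacker into a point estimate of ϵ using the hypothesis-testing bound above; the statistical confidence intervals used by practical auditing schemes (Section 4) are omitted here.

```python
import math

def epsilon_lower_bound(fpr: float, fnr: float, delta: float = 0.0) -> float:
    """Convert an attacker's empirical error rates into an epsilon estimate.

    Uses the (epsilon, delta)-DP hypothesis-testing bounds
    FPR + e^eps * FNR >= 1 - delta and FNR + e^eps * FPR >= 1 - delta,
    each of which rearranges to eps >= ln((1 - delta - error) / other_error).
    """
    candidates = []
    if fnr > 0 and (1 - delta - fpr) > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and (1 - delta - fnr) > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    return max(candidates, default=0.0)

# Example: an attack with 5% false positives and 60% false negatives.
print(epsilon_lower_bound(fpr=0.05, fnr=0.60, delta=1e-5))  # ~2.08
```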
Private prediction interface [71]: A prediction interface, M, is ϵ-differentially private if, for any interactive query-generating algorithm, Q, the output of the interaction between Q and M(F) is ϵ-differentially private with respect to the model F, i.e., the sequence of queries and responses generated in the interaction of Q and M on model F does not violate ϵ-DP.
Rényi DP (RDP) [72]: This extends the standard concept of DP by allowing a continuum of privacy levels based on the Rényi divergence. In RDP, a randomized mechanism, M, satisfies (α, ϵ̄)-RDP if, for all neighbouring datasets, D and D′, the Rényi divergence of order α between the output distributions of the algorithm on D and D′ is bounded by ϵ̄:
$D_{\alpha}(M(D) \,\|\, M(D')) \le \bar{\epsilon}.$
The global sensitivity of a function [66]: The sensitivity, Δf, of a function f is the maximum absolute distance between the scalar outputs f(D) and f(D′) over all possible adjacent datasets, D and D′:
$\Delta f = \max_{D, D'} \left| f(D) - f(D') \right|,$
If the query dimension of the function is d = 1, the sensitivity of f is the maximum difference between the values that f may take on a pair of adjacent datasets.
Differential privacy mechanisms. One way to achieve ϵ-DP and (ϵ, δ)-DP is to add noise sampled from the Laplace and Gaussian distributions, respectively, where the noise is proportional to the sensitivity of the mechanism. In general, there are three main mechanisms for adding noise to data used in DP, namely the Laplace mechanism [2,73,74], the Gaussian mechanism [66,75,76], and the exponential mechanism [13]. It should be noted that the Laplace mechanism provides ϵ-DP and focuses on tasks that return numeric results. The mechanism achieves privacy via output perturbation, i.e., modifying the output with Laplace noise. The Laplace distribution has two adjustable parameters, its centre and its width, b. The Gaussian mechanism yields (ϵ, δ)-DP. In terms of staying close to the original data, the Laplace mechanism would be a better choice than the Gaussian mechanism, which corresponds to a more relaxed definition. It should be noted that the exponential mechanism is usually used for non-numerical data and performs tasks with categorical outputs. When ϵ is small, the transformation tends to be more private. The exponential mechanism is used to privately select the best-scoring response from a set of candidates. The mechanism assigns each candidate, c, a score via a scoring function, q(x, c).
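As a concrete illustration of the Laplace and Gaussian mechanisms described above, the following minimal Python sketch privately releases a counting query (sensitivity 1); the Gaussian calibration uses the classic σ = Δf·√(2 ln(1.25/δ))/ϵ bound, and the helper names are illustrative rather than part of any particular DP library.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """epsilon-DP release of a numeric query via Laplace noise (scale = sensitivity / epsilon)."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def gaussian_mechanism(true_value: float, sensitivity: float, epsilon: float, delta: float) -> float:
    """(epsilon, delta)-DP release via Gaussian noise with the classic calibration
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon (valid for epsilon < 1)."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return true_value + rng.normal(loc=0.0, scale=sigma)

# Example: privately release a counting query (sensitivity 1).
exact_count = 1234
print(laplace_mechanism(exact_count, sensitivity=1.0, epsilon=0.5))
print(gaussian_mechanism(exact_count, sensitivity=1.0, epsilon=0.5, delta=1e-5))
```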
Upper bound and lower bound [77]: The DP algorithm is accompanied by a mathematical proof that gives an upper bound for the privacy parameters, ϵ and δ . In contrast, a privacy audit provides a lower bound for the privacy parameters.

2.2. Differential Privacy Composition

Three core properties of differential privacy are defined for the development of suitable algorithms to preserve privacy and fulfil data protection guarantees. They play a central role in understanding the net privacy cost of a combination of DP algorithms. A crucial property of differential privacy is the composition of differentially private queries [50,78], which bounds the overall privacy guarantee.
Sequential composition [2,79] is the most fundamental, in which a series of queries is computed and released in a single batch. It limits the total privacy cost of obtaining multiple results from DP mechanisms on the same input data. Suppose a sequence of randomized algorithms, {M1(D), M2(D), …, Mk(D)}, consisting of k sequential steps, is performed with privacy budgets {ϵ1, ϵ2, …, ϵk} on the same dataset, D; then the combined mechanism G(D) = (M1(D), M2(D), …, Mk(D)) satisfies (ϵ1 + ϵ2 + … + ϵk)-DP.
Parallel composition [13] is a special case of DP composition in which different queries are applied to disjoint subsets of the dataset. If M(D) satisfies ϵ-DP and the dataset D is divided into k disjoint parts such that d1 ∪ … ∪ dk = D, then the mechanism that releases all results, (M(d1), …, M(dk)), satisfies ϵ-DP. In this case, the privacy loss is not the sum of the individual budgets but rather their maximum, ϵmax.
Postprocessing immunity [80] means that you can apply transformations (either deterministic or random) to the DP release and know that the result is still differentially private. If M(D) satisfies ϵ-DP, then for any randomized or deterministic function g, the composition g(M(D)) also satisfies ϵ-DP. Postprocessing immunity guarantees that the output of a DP mechanism can be used arbitrarily without additional privacy leakage [81]. Since postprocessing of DP outputs does not decrease privacy [61], we can choose a summary function τ to preserve as much information about M as possible.
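The following minimal sketch shows how the two composition rules above translate into simple privacy accounting; advanced composition and RDP accounting, which give tighter bounds, are deliberately not shown.

```python
def sequential_budget(epsilons: list[float]) -> float:
    """Basic sequential composition: budgets on the same data add up."""
    return sum(epsilons)

def parallel_budget(epsilons: list[float]) -> float:
    """Parallel composition: queries on disjoint partitions cost only the maximum budget."""
    return max(epsilons)

# Three queries with budgets 0.1, 0.3, and 0.5:
print(sequential_budget([0.1, 0.3, 0.5]))  # 0.9 if all run on the same dataset
print(parallel_budget([0.1, 0.3, 0.5]))    # 0.5 if each runs on a disjoint partition
```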

2.3. Centralized and Local Models of Differential Privacy

The two common models [82,83] for ensuring data privacy, each with different applications and mechanisms, are the central model and the local model. In differential privacy, user data are noised either at the data center after the clients’ data have been received or locally by each user before the data leave their control.
The classic centralized version of DP requires a trusted curator who is responsible for adding noise to the data before distributing or analyzing them. In centralized differential privacy, the data and model are collocated, and the noise is added to the original dataset after it has been aggregated in a trusted data center. A major problem with the centralized differential privacy setting is that users still need to trust a central authority, namely the administrator of the dataset, to maintain their privacy [84]. There is also the risk of a hostile curator [85].
In the local differential privacy model, the data are made differentially private before they come under the control of the curator of the dataset [86]. Noise is added directly to the user’s dataset [87] before it is transmitted to the data center for further processing [88]. In the trade-off between privacy and accuracy, both centralized and local paradigms of DP can reduce the overall accuracy of the converged model due to the randomization of information shared by users [85].
Another taxonomy distinguishes DP settings into single-party learning and multi-party learning [89]. Single-party learning means that the training data of the data owners are stored in a central place. In multi-party learning, on the other hand, there are several data owners who each keep their own datasets locally and are often unable to exchange raw data for data protection reasons.

2.4. Noise Injection

In the ML pipeline, there are multiple stages at which we can insert noise (perturbation) to achieve DP: (1) on the training data (the input level), (2) during training, (3) on the trained model, or (4) on the model outputs [90].
Input perturbation. At the input level, we distinguish between central and local settings [91]. Input perturbation is the simplest method to ensure that an algorithm satisfies DP. It refers to the introduction of random noise into the data themselves (the input of the algorithm). If a dataset is D = {x1, x2, …, xn} and each record xi is a d-dimensional vector, then a differentially private record is denoted as x̃ = x + noise, where noise is a random d-dimensional vector.
Output perturbation. Another common approach is output perturbation [92], which achieves DP by adding random noise to the intermediate output or the final model output [50]. By intermediate output, we mean the middle layers of the neural network, while the final model output refers to the optimal weights obtained by minimizing the loss function. The differentially private layer is denoted as h̃ = h + noise, where h represents a hidden layer of the neural network.
Objective perturbation. In objective perturbation, random noise is introduced into the underlying objective function of the machine-learning algorithm [50]. As the gradient depends on the privacy-sensitive data, randomization is introduced at each step of the gradient descent. We can expect the utility of the model to change slightly when noise is added to the objective function. A differentially private objective function is represented as L̃(D, w) = L(D, w) + noise.
Gradient perturbation. In gradient perturbation [19], noise is introduced into the gradients during the training process while solving for the optimal model parameters using gradient descent methods. The differentially private gradient descent update is w_(t+1) = w_t − η_t(λ·w_t + ∇L(x_i, w_t) + noise), where λ is a regularization parameter, and η_t is the learning rate.
Each option provides privacy protection at different stages of the ML development process, with privacy protection being weakest when the DP is introduced at the prediction level and strongest when it is introduced at the input level. Keeping the input data private in different ways means that any model trained on that data will also have DP guarantees. If you introduce DP during training, only that particular model will have DP guarantees. DP at the prediction level means that only the model’s predictions are protected, but the model itself is not differentially private. Note that if perturbations (noise) are added to data to protect privacy, the magnitude of this noise is often controlled using norms (also called scaling perturbations with norms).
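As an illustration of gradient perturbation, the following minimal sketch performs one DP-SGD-style step [23] with per-example gradient clipping and Gaussian noise; the clipping threshold, noise multiplier, and helper names are illustrative, and the privacy accounting (e.g., via RDP) that turns the noise multiplier into an (ϵ, δ) guarantee is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, X_batch, y_batch, grad_fn, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1):
    """One gradient-perturbation step in the style of DP-SGD:
    per-example gradients are clipped to `clip_norm`, and Gaussian noise with
    standard deviation `noise_multiplier * clip_norm` is added to their sum."""
    per_example_grads = [grad_fn(w, x, y) for x, y in zip(X_batch, y_batch)]
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip_norm))  # bound each example's influence
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X_batch)

# Toy example with a squared-error gradient for a linear model (illustrative only).
def grad_fn(w, x, y):
    return 2 * (w @ x - y) * x

w = np.zeros(3)
X = rng.normal(size=(8, 3)); y = rng.normal(size=8)
w = dp_sgd_step(w, X, y, grad_fn)
print(w)
```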

3. Privacy Attacks

In this section, privacy attacks relevant to differential privacy auditing are presented, and a classification is proposed for evaluating the privacy guarantees of DP mechanisms and algorithms.

3.1. Overview

Privacy attacks on ML models target sensitive assets: the training data, the model itself, its architecture, its parameters, and/or the hyperparameters. They can take place either in the training phase or in the inference phase. During model training, the attacker attempts to infer information about, or actively modify, the training process or the model. While privacy attacks provide a qualitative assessment of DP, they do not provide quantitative privacy guarantees, nor do they detect exact differential privacy violations with respect to the desired ϵ [84].
Based on a comprehensive overview of the current state of the art in privacy-related attacks and the proposed threat models in DP, attacks for DPML auditing are typically characterized by (1) the black-box or white-box setting, (2) the type of attack, and (3) the centralized or local DP setting. There is an extensive body of literature tailoring attacks to specific ML problems [93,94,95].

3.2. White-Box vs. Black-Box Attacks

Depending on the capability and knowledge of the attacker and the analysis of potential information leaks in DPML models, attacks are generally divided into two main classes: black-box and white-box attacks (also known as settings) [84,95]. If the attacker has full access to the target model, including its architecture, training algorithm, model parameters, hyperparameters, gradients, and data distribution, as well as outputs and inputs, we speak of white-box attacks. On the other hand, if an attacker evaluates the privacy guarantees of differentially private mechanisms without accessing their internal workings and only has access to the output of the model for arbitrary inputs, it is a black-box attack [26,48]. In this type of attack, the attacker can only query the target model and obtain the outputs, typically confidence scores or class labels.
White-box privacy attacks [96,97], in the context of DP, involve scenarios where the attacker has full access to the model parameters, architecture, and training data. The attacker can exploit this detailed knowledge to create an attack model that predicts whether specific records were part of the training dataset based on internal model behavior. Usually, the attacker tries to identify the most vulnerable space (e.g., the feature space) of the target model by using the available information and modifying the inputs. They may also analyze gradients, loss values, or intermediate activations to derive information about the training data. Wu et al. [96], for example, focus on a white-box scenario where the attacker has full access to the weights and outputs of the target model. In this case, we can speak of a strong attacker capability with respect to the model. Steinke et al. [48] implement white-box auditing in their auditing framework, where the attacker has access to all intermediate values.
Black-box privacy attacks [84,95] involve scenarios in which the attacker has limited access to an ML model, typically only being able to observe and retrieve its output for specific inputs. This means that the attacker has no knowledge of the internals of the target model. In this scenario, the vulnerabilities of the model are identified using information about past input/output pairs. Most black-box attacks require the presence of a predictor [84]. In black-box environments, only the predicted confidence probabilities or hard labels are available [97]. In privacy auditing through black-box access, the attacker only sees the final model weights or can only query the model [96]. Black-box attacks are also used to detect privacy violations that exploit vulnerabilities in differential privacy algorithms [26].
Grey-box privacy attacks in DPML represent a middle ground between white-box and black-box attacks, where the attacker has limited access to the model’s internals. This type of attack occurs when an attacker has some knowledge of the model, such as access to specific parameters or layers, but not to its complete internal workings.
Attacks on collaborative learning assume access to the model parameters or the gradients during training or deal with attacks during inference. In cases where the attacker has partial access, these are called grey-box attacks (partial white-box attacks) [98,99]. We consider attackers to be active if they interfere with the training in any way. On the other hand, if the attackers do not interfere with the training process and try to derive knowledge after the training, they are considered passive attackers. It is important to add here that most work assumes that the expected input is fully known, although some preprocessing may be required.

3.3. Type of Attacks

In the ML context, the attacker attempts to gain access to the model (e.g., to its parameters), intends to violate the privacy of the individuals in the training data, or performs an attack on the dataset used for inference and model evaluation. In this study, privacy attack techniques are categorized into five types: membership inference attacks, data-poisoning attacks, model inversion attacks, model extraction attacks, and property inference attacks [46,62]. The most general form corresponding to the assessment of a data leakage is membership inference: the inference of whether a particular data point was part of a model’s training set or not [52]. Far more powerful attacks, such as model inversion (e.g., attribute inference [100] or data extraction [101,102]), aim to recover partial or even complete training samples by interacting with an ML model. In the ML context, inference attacks that aim to infer private information from data analysis tasks are a significant threat to privacy-preserving data analysis.

3.3.1. Membership Inference Attack

A membership inference attack (MIA; also known as training data extraction attack) is used to determine whether a particular data point is part of the training dataset or not [10,51,52,103,104]. In other words, MIA tries to measure how much information about the training data leaks through the model [34,90,105]. The success rate of these attacks is influenced by various factors, such as data properties, model characteristics, attack strategies, and the knowledge of the attacker [106]. This type of attack is based on the attacker’s knowledge in both white-box and black-box settings [96]. The earlier works in MIAs use average-case success metrics, such as the attacker’s accuracy in guessing the membership of each sample in the evaluation set [52]. The MIAs typically consist of four main types:
  • White-box membership inference.
  • Black-box membership inference.
  • Label-only membership inference.
  • Transfer membership inference.
In white-box MIAs, the attacker (also known as a full-knowledge attacker) [84,107] has access to the internal parameters of the target model (e.g., gradients and weights), along with some additional information [10]. For example, the attacker has access to the internal weights of the model and thus to the activation values of each layer [108]. The goal of the attacker is to use the model parameters and gradients to identify differences in how the model processes training and non-training data. The main techniques are gradient-based approaches [41,109], which infer whether a specific data point was part of the training dataset by examining the gradients for target data points, as these gradients often differ between training and non-training data; and activation analysis [61], which uses the fact that activations for training data may differ from activations for non-training data in certain layers as a membership signal. In addition, white-box MIAs may also exploit the internal parameters and hidden layers of a model to analyze its internal representations and feature usage, as these often reveal training data [96].
In the black-box setting, the attacker simply queries the ML model with input data and observes the output, which can return either confidence-based scores, hard labels, or class probabilities. The attacker exploits the differences in the ML model between training data and unseen data, often leveraging high confidence scores for training data points as a signal of membership. In a black-box setting this type of attack is carried out through techniques such as shadow model training [90] or confidence-based score analysis [110].
By training a series of shadow models, i.e., local surrogate models, the attacker obtains an attack model that can infer the membership of a particular record in the training dataset [111]. This is performed by training a model that has the same architecture as the target model but uses its own data samples to approximate the training set of the target model. The attack only requires the exploitation of the model’s output prediction vector and is feasible against supervised machine-learning models. Instead of creating many shadow models, the authors of [112] use only one to model the loss or logit distributions for members and non-members.
In confidence score analysis (also known as confidence-based attacks), the attacker analyzes the confidence scores returned by the model by comparing the confidence on the trained samples with that on the untrained samples (unseen data). The attacker has access to the labels and prediction vectors and thus obtains confidence scores (probabilities) for the queried input. This approach is mainly investigated in works such as [52,113]. Carlini et al. [103] use a confidence-based analysis whose performance is maximized at low false-positive rates (FPRs). To improve performance at low FPRs, Tramèr et al. [49] introduce data poisoning during training. However, these attacks can be computationally expensive, especially when used together with shadow models to simulate the behavior of the target models [114].
In the label-only MIAs, the attacker only has access to the model’s labels to determine whether a specific data sample was part of the model’s training set. The attacker uses only the predicted labels to infer membership under input perturbation [20,115], often by leveraging inconsistencies in the model’s predictions between training and non-training data. Standard label-only MIAs often require a high number of queries to assess the distance of the target sample from the model’s decision boundary, making them less effective [54,116]. There are two main techniques in label-only MIAs: adaptive querying, where the inputs are slightly modified to see if the model changes the label, which could indicate that the data were part of the training set; and meta-classification, which means that a secondary model is trained to distinguish between the labels of training and non-training data to infer membership.
Transfer membership inference [117] covers the case where direct access to the target model is restricted. The attacker can train an independent model with a similar dataset or use publicly available models that have been trained on similar data. The attacker’s goal is to train a local model that approximates the behavior of the target model and to use this local model to launch MIAs. There are two main techniques: model approximation [118,119], which means that the attacker approximates the decision boundary of the target model and uses black-box access to infer membership through the surrogate model; and adversarial examples [120], meaning that the attacker generates adversarial examples that are more likely to be misclassified for non-training points to improve the accuracy of membership inference.
MIA is widely researched in the field of ML and can serve as a basis for stronger attacks or be used to audit different types of privacy leaks. Most MIAs follow a common scheme to quantify the information leakage of ML algorithms over training data. For example, Ye et al. [113] compare different strategies for selecting loss thresholds. Yeom et al. [10] consider the use of MIA for privacy testing with a global threshold, τ, for all samples. Later, attack threshold calibration was introduced to improve the attack, as some samples are more difficult to learn than others [22,103]. Another approach to MIA is defined by an indistinguishability game between a challenger and an adversary (i.e., the privacy auditor) [105]. The adversary tries to find out whether a particular data point or a sample of data points was part of the training dataset used to train a particular model. A list of privacy attacks is shown in Table 1.
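As a concrete illustration of the threshold-based MIAs discussed above, the following minimal sketch implements the global loss-threshold attack in the spirit of Yeom et al. [10] on synthetic loss values; per-example threshold calibration [22,103] and shadow-model training are not shown.

```python
import numpy as np

def loss_threshold_mia(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict 'member' (True) when the per-example loss is below the threshold,
    following a global-threshold attack in the spirit of Yeom et al. [10]."""
    return losses < threshold

def attack_error_rates(member_losses, nonmember_losses, threshold):
    """Return (TPR, FPR) of the attack, the quantities fed into the
    hypothesis-testing bound of Section 2.1."""
    tpr = float(np.mean(loss_threshold_mia(member_losses, threshold)))
    fpr = float(np.mean(loss_threshold_mia(nonmember_losses, threshold)))
    return tpr, fpr

# Toy example: members tend to have lower loss than non-members.
rng = np.random.default_rng(0)
member_losses = rng.exponential(scale=0.5, size=1000)
nonmember_losses = rng.exponential(scale=1.0, size=1000)
print(attack_error_rates(member_losses, nonmember_losses, threshold=0.4))
```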

3.3.2. Data-Poisoning Attack

In data-poisoning attacks, malicious data are injected into the training set in order to influence the behavior of the model. These attacks, whether untargeted (random) or targeted [54,121], are a form of undermining the functionality of the model. Common approaches either reduce the accuracy of the model (untargeted) or manipulate the model to output a label specified by the attacker (targeted), thus degrading the performance of the model or causing targeted misclassification or misprediction. If the attacker tries to elicit a specific behavior from the model, the attack is called targeted. A non-targeted attack, on the other hand, means that the attacker is trying to disrupt the overall functionality of the model. Targeted or non-targeted poisoning attacks can include both model-poisoning and data-poisoning attacks. The impact of poisoning attacks, for example, causes the classifier to change its decision boundary and achieves the attacker’s goal of violating privacy [121].
In the context of DP, during data-poisoning attacks [55,122], the attacker manipulates and falsifies the model at training time or at inference time by injecting adversarial examples into the training dataset [54]. In this way, the behavior of the model is manipulated, and meaningful information is extracted. Poisoning attacks are not limited to training data points; they also target model weights. Among these threats, data poisoning stands out due to its potential to manipulate and undermine the integrity of AI-driven systems. It is worth noting that this type of attack is not directly related to data privacy but still poses a threat to ML modeling [123]. In model poisoning, targeted model poisoning aims to misclassify selected inputs without modifying them. This is achieved by manipulating the training process. Data-poisoning attacks are relevant for DP auditing as they can expose potential vulnerabilities in privacy-preserving models. Data-poisoning attacks typically consist of five main types (as shown in Table 1):
Gradient manipulation attacks: In gradient manipulation attacks, especially gradient inversion attacks [96,124], the attacker manipulates the gradient update by injecting false or poisoned gradients that either distort the decision boundary of the model or lead to overfitting. These attacks allow the attacker to reconstruct private training data from shared gradients and undermine the privacy guarantees of ML models. This approach also aims to investigate whether gradient clipping and noise addition can effectively protect against excessive influence of individual gradients.
Targeted label flipping: This involves modification of the labels of certain data points in the training data without changing the data themselves, especially those in sensitive classes [125]. The attacker then checks whether this modified information can be restored from the model.
Influence limiting: To assess how DP mechanisms limit the influence of any single data point, poisoned records are inserted into the training data to see if their impact on the model predictions and accuracy can be detected [126].
Backdoor poisoning attacks: This type of attack aims to insert a specific trigger or “backdoor” [127,128,129] that later manipulates the behavior of the model when it is activated in the testing or deployment phase. If the model is influenced by the backdoor pattern in a way that compromises individual data points, this may indicate vulnerability to targeted privacy risks. These types of attacks are often evaluated against specific targeted perturbation learners. The attacker intentionally perturbs some training samples to change the parameter distribution [130]. Such attacks were originally developed for image datasets [129]. In the original backdoor attacks, the backdoor patterns are fixed [131], e.g., a small group of pixels in the corner of an image. More recent backdoor attacks can be dynamic [132] or semantic [133]. Backdoor attacks are a popular approach to poisoning ML classification models. DP can help prevent backdoor attacks by ensuring that the training process of the model includes noise addition or privacy amplification.
Data injection: In this type of attack, an attacker injects malicious data samples that are designed to disrupt the model’s training [134]. This differs from backdoor attacks in that it may not involve a specific trigger pattern but simply serves to corrupt the model’s decision making. Adding random, noisy samples to the training set can skew the model’s weights, leading to suboptimal performance.
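To illustrate how a poisoning-based audit can be set up, the following minimal sketch implements targeted label flipping on a toy label array; the flip fraction, class indices, and helper names are illustrative, and the subsequent step of checking whether the trained (DP) model reveals the poisoned records is left out.

```python
import numpy as np

def flip_labels(y: np.ndarray, target_class: int, new_class: int,
                fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Targeted label flipping: relabel a fraction of `target_class` examples
    as `new_class`, so an auditor can later test whether the trained model
    leaks the presence of these poisoned records."""
    y_poisoned = y.copy()
    candidates = np.flatnonzero(y == target_class)
    n_flip = int(fraction * len(candidates))
    flip_idx = rng.choice(candidates, size=n_flip, replace=False)
    y_poisoned[flip_idx] = new_class
    return y_poisoned

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=100)          # toy labels for 3 classes
y_poisoned = flip_labels(y, target_class=0, new_class=1, fraction=0.1, rng=rng)
print(int(np.sum(y != y_poisoned)), "labels flipped")
```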

3.3.3. Model Inversion Attack

These attacks exploit the released model to predict sensitive attributes of individuals using available background information. Existing DP mechanisms struggle to prevent these attacks while preserving the utility of the model [108,135,136].
The model inversion attack is a technique in which the attacker attempts to recover the training dataset from learned parameters. For example, Zhang et al. [58] use model inversion attacks to reconstruct training images from a neural network-based image recognition model. In these attacks, the attackers use the released model to infer sensitive attributes of individuals in the training data or the outputs of DPML models. These attacks allow attackers to infer sensitive attributes of individuals by exploiting the outputs of the model [29,135,136].
The idea of model inversion [137] is to invert a given pre-trained model, f θ , in order to recover a private dataset, D , such as images, texts, and graphs. The attacker who attempts to use the model inversion attack [28,138,139,140] queries the model with different inputs and observes the outputs. By comparing the outputs for different inputs, the attacker identifies and recognizes patterns. By testing each feature, the attacker can consequently infer the pattern in the original training data, resulting in a data leakage.
A common approach for this type of attack is to reconstruct the input data from the confidence score vectors predicted by the target model [100]. The attacker trains a separate attack model on an auxiliary dataset that acts as the inverse of the target model [141]. The attack model takes the confidence score vectors of the target model as input and tries to output the original data of the target model [113]. Formally, let f_θ be the target model and g_θ be the attack model. Given a data record (x, y), the attacker inputs x into f_θ and receives f(x), then feeds f(x) into g_θ and receives g(f(x)), which is expected to satisfy g(f(x)) ≈ x; that is, the output of the attack model should be very similar to the original input of the target model. These attacks can be categorized as follows:
  • Learning-based methods.
  • White-box inversion attacks.
  • Black-box inversion attacks.
  • The gradient-based inversion attacks.
Learning-based methods: These methods can reconstruct diverse data for different training samples within each class. Recent advances have improved their accuracy by regularizing the training process with semantic loss functions and introducing counterexamples to increase the diversity of class-related features [142].
White-box inversion attacks: In such attacks, the attackers have full access to the structure and parameters of the model. Auditors use white-box inversion as a worst-case scenario or use the model’s parameters [143] to assess whether the DP mechanisms preserve privacy even with highly privileged access.
Black-box inversion attacks: In such attacks, the attackers only have access to the output labels of the model [144] or obtain the confidence vectors. Black-box attacks simulate typical model usage scenarios, allowing auditors to assess how much information leakage occurs purely through interactions with the model’s interface.
Gradient-based inversion attacks (also known as input recovery from gradients): The attacker accesses gradients shared during the training rounds (especially in FL) and uses these to infer details about the training data [145]. Auditors use these attacks to check whether sensitive information can be recovered from shared gradients, particularly in collaborative and decentralized learning environments.
Traditional DP mechanisms often fail to prevent model inversion attacks while maintaining model utility. Model inversion attacks are a significant challenge for DP mechanisms, especially for regression models and graph neural networks (GNNs). These attacks allow attackers to infer sensitive attributes of individuals by exploiting the released model and some background information [135,136].
Model inversion enables an attacker to fully reconstruct private training samples [97]. For visual tasks, the model inversion attack is formulated as an optimization problem. The attack uses a trained classifier to extract representations of the training data. A successful model inversion attack generates diverse and realistic samples that accurately describe each class of the original private dataset.
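To make the optimization view of model inversion concrete, the following minimal sketch (illustrative only, not the method of any cited paper) inverts a toy linear softmax classifier by gradient ascent on the target-class confidence; real attacks operate on deep networks and typically add image priors or GAN-based regularization to obtain realistic samples.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def invert_class(W: np.ndarray, b: np.ndarray, target: int,
                 steps: int = 200, lr: float = 0.5) -> np.ndarray:
    """Gradient-ascent model inversion against a linear softmax classifier
    f(x) = softmax(W x + b): find an input x that maximizes the confidence
    of the target class, approximating a representative sample of that class."""
    d = W.shape[1]
    x = np.zeros(d)
    for _ in range(steps):
        p = softmax(W @ x + b)
        # gradient of log p[target] w.r.t. x:  W[target] - sum_k p[k] * W[k]
        grad = W[target] - p @ W
        x += lr * grad
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # toy 3-class classifier over 5 features
b = np.zeros(3)
x_rec = invert_class(W, b, target=0)
print(softmax(W @ x_rec + b))  # confidence concentrated on class 0
```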

3.3.4. Model Extraction Attack

Model extraction attacks (also known as reconstruction attacks) aim to steal the functionality of, replicate, and expose sensitive information about well-trained ML models [146]. The attacker can approximate or replicate a target model by sending numerous queries, observing its responses, and inferring model parameters or hyperparameters. In model extraction attacks [95,147,148], in the context of DP auditing, attackers attempt to derive a victim model by extensively querying it and using the responses to train a surrogate model. The attacker learns a duplicate model that behaves very similarly to the target model, thereby extracting its functionality and possibly fully reconstructing it. This type of attack targets only the model itself and not the training data.
This type of attack can be categorized into two classes [149]: (i) accuracy extraction attacks, which focus on building a replacement for the target model; and (ii) fidelity extraction attacks, which aim to closely match the predictions of the target model.
This threat can increase the privacy risk, as a successful model extraction can enable a subsequent attack, such as model inversion. There are two approaches to creating a surrogate model [62,147]. The first fits the surrogate model to the target model’s outputs on a set of input points that are not necessarily related to the learning task. The second creates a surrogate model that matches the accuracy of the target model on a test set that is related to the learning task and drawn from the input data distribution.
Model extraction attacks pose a significant security threat to ML models, especially those provided via cloud services or public APIs. In these attacks, an attacker repeatedly queries a target model to train a surrogate model that mimics the functionality of the target model.
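The following minimal sketch shows label-only model extraction with scikit-learn: a surrogate is fitted to the victim’s predicted labels on attacker-chosen queries, and its fidelity (agreement with the victim) is measured; the model families, query distribution, and sizes are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in for a deployed victim model exposed only through a prediction API.
X_private = rng.normal(size=(500, 10))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                       random_state=0).fit(X_private, y_private)

# Extraction: query the victim on attacker-chosen inputs and fit a surrogate
# on the (query, predicted label) pairs.
X_queries = rng.normal(size=(2000, 10))
y_stolen = victim.predict(X_queries)
surrogate = LogisticRegression().fit(X_queries, y_stolen)

# Fidelity: how often the surrogate agrees with the victim on fresh inputs.
X_test = rng.normal(size=(1000, 10))
print("agreement:", np.mean(surrogate.predict(X_test) == victim.predict(X_test)))
```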

3.3.5. Property Inference Attacks

Property inference attacks (PIAs; also called distribution inference attacks) [150,151,152,153] aim to infer global, aggregate properties of the training data used in machine-learning models, rather than details of individual data points. These sensitive properties are often based on ratios, such as the ratio of male to female records in a dataset. The attack attempts to extract statistical information about a training dataset from an ML model. In contrast to privacy attacks that focus on individuals in a training dataset (e.g., membership inference), PIAs aim to extract population-level features from trained ML models [60]. Existing property inference attacks can be broadly described as follows:
The attacker targets the training dataset and attempts to extract sensitive statistical information related to the dataset or a subset of the training dataset, such as specific attributes, which can have a significant impact on privacy in the model [150]. The attacker can also exploit the model’s ability to memorize explicit and implicit properties of the training data [154]. This can be achieved by poisoning a subset of the training data to increase the information leakage [152]. An attacker may also maliciously control a portion of the training data to increase the information leakage. This can lead to a significant increase in the effectiveness of the attack and is then referred to as a property inference poisoning attack [155].
Differential privacy (DP) auditing uses property inference attacks to test whether the DP mechanisms are robust against the leakage of information about specific properties, features, or patterns within the dataset.

4. Privacy Auditing Schemes

In this section, privacy auditing schemes for differentially private machine learning are presented.

4.1. Privacy Auditing in Differential Privacy

Testing and evaluating DPML models is important to ensure that they effectively protect privacy while retaining their utility. Since differential privacy is always a trade-off between privacy and utility, evaluating the privacy of the model helps in choosing an appropriate privacy budget: one that is high enough to ensure sufficient accuracy and low enough to ensure acceptable privacy. Which level of accuracy and/or privacy is sufficient depends on the application of the model [130]. To address security-critical privacy issues [156] and detect potential privacy violations and biases, privacy auditing procedures can be implemented to empirically evaluate DPML.
Privacy auditing is a set of techniques for empirically verifying the privacy leakage of an algorithm to ensure that it fulfils the defined privacy guarantees and standards. Privacy auditing of DP models [45,46,47,157,158] aims to ensure that privacy-preserving mechanisms are effective and reliable and that they provide the claimed privacy guarantees of DPML models and algorithms. For example, one approach uses a probabilistic automaton model to track privacy leakage bounds [159]; others use canaries to audit ML algorithms [160], efficiently detect (ϵ, δ)-violations [48], estimate the privacy loss during a single training run [118], or transform (ϵ, δ) into Bayesian posterior belief bounds [34]. To ensure robust privacy auditing in differential privacy (DP), it is important to follow key steps that rigorously attack the machine-learning model and verify the privacy guarantees:
Define the scope of the privacy audit: Establish the objectives and purposes of the audit. This includes determining which specific mechanisms or algorithms are to be evaluated and determining the privacy guarantees ( ϵ , δ ) that are relevant for the audit.
This requires a clear delineation of the privacy guarantees expected from the DP model (differential privacy mechanisms), the definition of data protection requirements tailored to the sensitivity of the data, compliance with standards, and the justification of the privacy parameters (ϵ, δ) (upper bound). For example, the authors of [161] describe a privacy auditing pipeline that is divided into two components: the attacker scheme and the auditing scheme.
Perform privacy attacks and vulnerability implementation: Implement privacy attacks (e.g., membership inference, model extraction, and model inversion) to evaluate the robustness of the DP mechanism or DPML algorithm. This means verifying that the DP mechanism provides robust privacy guarantees that effectively limit the amount of information that can be inferred about individual data points, regardless of a potential attacker’s strategy or the configuration of the dataset. For example, simulate black-box or white-box membership inference [162,163] to assess the impact of model access on privacy leakage. This makes it possible to test the resilience of the model and to measure the success/failure rates of these attacks.
Analyze and interpret the audit results: The final step is to empirically estimate the privacy leakage from a DPML model, denoted as ϵ_emp, and compare it with the theoretical privacy budget, ϵ [81]. An important goal of this process is the assessment of the tightness of the privacy budget. The audit is considered tight if the empirical estimate is close to the theoretical budget, i.e., ϵ_emp ≈ ϵ. Such an approach can be used to effectively validate DP implementations in the model or to detect DP violations in the case of ϵ_emp > ϵ [164,165,166].
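To make the comparison between ϵ_emp and ϵ concrete, the following Python sketch (an illustrative example of ours, not part of any cited tool) converts the error rates of a membership-style distinguishing attack into an empirical lower bound on ϵ using one-sided Clopper–Pearson confidence intervals; the counts and the δ value are placeholders.

```python
import numpy as np
from scipy.stats import beta


def clopper_pearson_upper(k, n, alpha=0.05):
    """One-sided upper confidence limit for a binomial proportion k/n."""
    if k >= n:
        return 1.0
    return beta.ppf(1.0 - alpha, k + 1, n - k)


def empirical_epsilon(fp, fn, n_neg, n_pos, delta=1e-5, alpha=0.05):
    """Empirical lower bound on epsilon from a membership test's error rates.

    fp, fn       -- false positives / false negatives of the membership test
    n_neg, n_pos -- number of non-member / member trials
    """
    fpr_up = clopper_pearson_upper(fp, n_neg, alpha)  # upper bound on type I error
    fnr_up = clopper_pearson_upper(fn, n_pos, alpha)  # upper bound on type II error
    candidates = [0.0]
    if fnr_up > 0 and (1 - delta - fpr_up) > 0:
        candidates.append(np.log((1 - delta - fpr_up) / fnr_up))
    if fpr_up > 0 and (1 - delta - fnr_up) > 0:
        candidates.append(np.log((1 - delta - fnr_up) / fpr_up))
    return max(candidates)


# Example: 1000 member and 1000 non-member trials, 120 false positives, 150 false negatives.
eps_emp = empirical_epsilon(fp=120, fn=150, n_neg=1000, n_pos=1000)
print(f"empirical epsilon lower bound: {eps_emp:.2f}")  # compare against the theoretical budget
```

Using one-sided confidence intervals keeps the reported lower bound statistically valid at the chosen confidence level.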

4.2. Privacy Auditing Techniques

Privacy auditing techniques in differential privacy are essential to ensure that privacy guarantees are met in practical implementations. Empirical auditing techniques establish practical lower bounds on privacy leakage, complementing the theoretical upper bounds provided by DP [32]. Before we address privacy auditing schemes, it is necessary to explain the main auditing techniques (Birhane et al., 2024) that have been used to evaluate the effectiveness of DP mechanisms and algorithms against privacy attacks in ML models.
Canary-based audits: Canary-based auditing is a technique for assessing the privacy guarantees of DPML algorithms by introducing specially designed examples, known as canaries, into the dataset [43,160]. The auditor then tests whether these canaries are included in the outputs of the model and distinguishes between models trained with different numbers of canaries. An effective DP mechanism should limit the sensitivity of the model to the presence of these canaries, minimizing the privacy risk. Canaries must be carefully designed to ensure that they can detect potential privacy leaks without jeopardizing overall privacy guarantees. Canary-based auditing often requires dealing with randomized datasets, which enables the development of randomized canaries. Lifted Differential Privacy (LiDP) [160] distinguishes between models trained with different numbers of canaries and leverages statistical tests and novel confidence intervals to improve the sample complexity. There are several canary strategies: (1) a random sample from a dataset distribution with a false label, (2) the use of an empty sample, (3) an adversarial sample, and (4) the canary crafting approach [42]. The disadvantage of canaries is that an attacker must have access to the underlying dataset and knowledge of the domain and model architecture.
Statistical auditing: In this context, statistical methods are used to empirically evaluate privacy guarantees [167]. These include influence-based attacks and improved privacy search methods that can be used to detect privacy violations and understand information leakage in datasets, thus greatly improving the auditing performance of various models, such as logistic regression and random forest [62].
Statistical hypothesis testing interpretation: The aim of this approach is to find the optimal trade-off between type I and type II errors [168]. This means that no test can effectively determine whether a specific individual’s data are included in a dataset, ensuring that high power and high significance are not simultaneously attainable [169,170]. It is used to derive the theoretical upper bound, is very useful in deriving tight compositions [171], and has even motivated a new relaxed notion of DP called f-DP [76].
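Concretely, for an (ϵ, δ)-DP mechanism, any test that distinguishes two neighboring datasets with type I error α and type II error β must satisfy the standard trade-off below, which auditors invert to obtain empirical lower bounds on ϵ:

```latex
\alpha + e^{\epsilon}\beta \;\ge\; 1-\delta,
\qquad
\beta + e^{\epsilon}\alpha \;\ge\; 1-\delta
\quad\Longrightarrow\quad
\epsilon \;\ge\; \max\left\{\log\frac{1-\delta-\beta}{\alpha},\;\log\frac{1-\delta-\alpha}{\beta}\right\}.
```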
Single training run auditing (also known as one-shot auditing): It enables privacy auditing during a single training run, eliminating the need for multiple retraining sessions. This technique utilizes the parallelism of independently adding or removing multiple training examples and enables meaningful empirical privacy estimation with only one training run of the model [43]. The technique is efficient and requires no prior knowledge of the model architecture or DP algorithm. This method is particularly useful in FL settings and provides accurate estimates of privacy loss under the Gaussian mechanism [118].
Empirical privacy estimation: In this technique, the actual privacy loss of an algorithm is evaluated by practical experiments rather than theoretical guarantees [172]. This technique is used to audit implementations of DP mechanisms or claims about models trained with DP [42]. It is also useful for estimating the privacy loss in cases where a tight analytical upper bound on ϵ is unknown.
Post hoc privacy auditing: This technique traditionally establishes a set of lower bounds for privacy loss (e.g., thresholds). However, it requires sharing intermediate model updates and data with the auditor, which can lead to high computational costs [173].
Worst-case privacy check: In the context of differential privacy, worst-case privacy auditing [32,102] refers to the specific data points or records in a dataset that, if added to, removed from, or altered in the dataset, could potentially have the greatest impact on the output of a differentially private mechanism. Essentially, these are the most “sensitive” records, for which the privacy guarantee is most at risk.

4.3. Privacy Audits

When we focus on auditing DPML models, we first need to know whether we have enough access to information to perform a white-box audit or a black-box audit. White-box auditing can be difficult to perform on a large scale in practice, as the algorithm to be audited needs to be significantly modified, which is not always possible [109]. Nevertheless, auditing DPML models in a white-box environment requires minimal assumptions about the algorithms [43]. Black-box audits, in contrast, are more realistic in practice, as the attacker can only observe the final trained model.
Privacy auditing schemes empirically evaluate the privacy leakage of a target ML model or its algorithm trained with DP [42,46,47,62,96]. Such schemes use the DP definition a priori to formalize and quantify the privacy leakage [174]. Currently, most auditing techniques are based on simulating different types of attacks [114] to determine an (ϵ, δ) lower bound on the privacy loss of an ML model or algorithm [62]. Privacy auditing can be performed using different attacker schemes (processes), which can be broadly categorized as follows (Table 2):
  • Membership inference auditing.
  • Poisoning auditing.
  • Model inversion auditing.
  • Model extraction auditing.
  • Property inference auditing.
In summary, privacy auditing schemes leverage various techniques to strike a balance between privacy, data utility, and auditing efficiency. A comprehensive overview of privacy auditing methodologies and privacy guarantees, with references, can be found in Appendix A. We review the most important works, starting with the use of black-box and white-box privacy attacks.

4.3.1. Differential Privacy Auditing Using Membership Inference

Membership inference audits: These audits test the resilience of the model against membership attacks, where an attacker tries to determine whether certain data points were included in the training set. The auditor performs MIAs to estimate how much information about individual records may have been leaked. This category is divided into the following subcategories:
  • Black-box membership inference auditing: This approach relies solely on assessing the privacy guarantees of machine-learning models by evaluating their vulnerability to membership inference attacks (MIAs) without accessing the internal workings of the model.
Song et al. [175] examine how robust training, including a differential privacy mechanism, affects the vulnerability to black-box MIAs. The success of MIAs is measured using metrics such as attack accuracy and the relationship between model overfitting and privacy leakage. The study also investigates MIAs under adversarial robustness and differential privacy conditions and shows that DP models are also vulnerable under black-box conditions. Carlini et al. [103] present a DP audit method related to black-box threshold MIAs by proposing a first-principles approach. The authors introduce the likelihood ratio attack (LiRA), which analyzes the most vulnerable points in the model predictions. The authors question the use of existing methodologies that rely on average-case accuracy metrics to evaluate empirical privacy, as these do not adequately capture an attacker’s ability to identify the actual members of the training dataset. Instead, they propose measuring the attacker’s ability to infer membership using the true-positive rate (TPR) at very low false-positive rates (FPR) (e.g., <0.1%), and they offer an ϵ-maximization strategy. The authors conclude that even a powerful DP mechanism can sometimes be vulnerable to carefully constructed black-box attacks. Lu et al. [172] present Eureka, a novel method for estimating relative DP guarantees in black-box settings, which defines a mechanism’s privacy with respect to a specific input set. At its core, Eureka uses a hypothesis testing technique to empirically estimate privacy loss parameters. By computing outputs on adjacent datasets, the potential leakage, and thus the degree of privacy guarantee, is determined. The authors use classifier-based MIAs to audit (ϵ, δ)-DP algorithms. They demonstrate that Eureka achieves tight accuracy bounds in estimating privacy parameters with relatively low computational cost for large output spaces.
Kazmi et al. [174] present a black-box privacy auditing method for target ML models based on an MIA that uses both training data (i.e., “members” (true positives)) and generated data not included in the training dataset (i.e., non-members (true negatives)). This method leverages membership inference as the primary means to audit datasets used in the training of ML models without retraining them (ensembled membership auditing (EMA)). EMA aggregates the membership scores of individual data points using statistical tests. The method, which the authors call PANORAMIA, quantifies privacy leakage for large-scale ML models without controlling the training process or retraining the model. Koskela et al. [176] use the total variation distance (TVD), a statistical measure that quantifies the difference between two probability distributions, applied to the output distributions of a model trained on two neighboring datasets. The authors suggest that the TV distance can serve as a robust indicator of the privacy guarantee when examining the outputs of a DP mechanism. The auditing process compares how much the output distributions generated from adjacent datasets differ from each other in order to approximate the privacy parameters. The TVD is directly related to the privacy parameter, ϵ, and provides a tangible way to evaluate privacy loss. The auditing process utilizes a small hold-out dataset that has not been exposed during training. Their approach allows for the use of an arbitrary hockey-stick divergence to measure the distance between the score distributions of audit training and test samples. This work fits well with FL scenarios.
  • White-box membership inference auditing: White-box audits leverage full access to the internal parameters of a model, including gradients and weights. They are often used in corporate ML research, where the internal parameters of the model are available and allow a detailed analysis of the DP efficiency.
Leino and Fredrikson [107] propose a calibrated white-box membership inference attack and evaluate the resulting privacy risk, leveraging the intermediate representations of the target model. The work investigates how MIAs exploit the tendency of deep networks to memorize specific data points, leading to overfitting. They linearly approximate each layer, launch a separate attack on each layer, and combine the outputs of the layer-wise attacks into a single attack on the target model trained with DP-SGD. The high-precision calibration ensures that the attack can confidently identify whether a data point was part of the training set. Chen et al. [177] evaluate a differentially private convolutional neural network (CNN) and a Lasso regression model with and without sparsity using an MIA on high-dimensional training data, using genomic data as an example. They show that, in contrast to the non-private setting, model sparsity can improve the accuracy of the differentially private model. By applying a regularization technique (e.g., Lasso), the study demonstrates that sparsity can complement DP efforts.
There are seminal works that use both white-box and black-box settings: Nasr et al. [47] extended the study of Jagielski et al. [46] on empirical privacy estimation techniques by analyzing DP-SGD through an increasingly strong series of attacks, from black-box membership inference to white-box poisoning attacks. They are the first to audit DP-SGD tightly. To do so, they use attacker-crafted datasets and active white-box attacks that insert canary gradients into the intermediate steps of DP-SGD. Tramèr et al. [49] propose a method for auditing the backpropagation clipping algorithm (a modification of the DP-SGD algorithm), assuming that it works in black-box or white-box settings. The goal is to empirically evaluate how often the outputs of M(D) and M(D′) are distinguishable. The auditor’s task with the MIA is to maximize the TPR/FPR ratio to assess the strength of the privacy mechanism. Nasr et al. [42] follow up on their earlier work (Nasr et al., 2021) and design an improved auditing scheme for testing DP implementations in black-box and white-box settings for DP-SGD with gradient canaries or input space canaries. This method provides a tight privacy estimation and significantly reduces the computational cost by leveraging tight composition theorems for DP. The authors audit each individual step of the DP-SGD algorithm; rather than converting each per-step lower bound into a separate guarantee for ϵ, they compose over all training steps to obtain an understanding of the privacy of the end-to-end algorithm.
  • Shadow model auditing: Shadow model membership auditing is a technique used to assess the privacy of machine-learning models by training replicas of the target model on similar datasets. These replicas allow the auditor to infer information about the target model’s training data without direct access to it. In this method, multiple shadow models are created that mimic the behavior of the target model so that the auditor can infer membership status based on the outputs of the shadow models. The primary purpose of using shadow models is to facilitate MIAs that determine whether specific data points were part of the training dataset.
The groundbreaking work of Shokri et al. [52] evaluates the membership inference attack in a black-box environment in which the attacker only has access to the target model via queries. The attack algorithm is based on the concept of shadow models: the attacker trains shadow models that are similar to the target model and uses them to train a membership inference model. The MIA is modeled as a binary classification task for an attack model that is trained using the predictions of the shadow models on the attacker’s dataset. Salem et al. [112] utilize data augmentation to create shadow models and analyze privacy leakage. Their work provides insights into the impact of data transformations on inference accuracy in both black-box and white-box settings.
Yeom et al. [10] investigate overfitting in ML model auditing using a threshold membership inference attack as the primary method and an attribute inference attack based on the distinction between training and test per-instance losses. The authors provide an upper bound on the success of the MIA as a function of the DP parameters. By training shadow models, the authors demonstrate how models that memorize training data are more susceptible to MIAs, especially when DP techniques are not optimally applied. They conclude that overfitting is sufficient for an attacker to perform an MIA. Sablayrolles et al. [22] focus on the comparison of black-box and white-box attacks by effectively estimating the model loss for a data point. The authors use the shadow model technique to demonstrate MIAs across architectures and training methods. They also investigate Bayes-optimal strategies for MIAs that leverage knowledge of model parameters in white-box settings. Their findings suggest that white-box attacks do not require specific information about model weights and losses but can still be performed effectively using probabilistic assumptions, and that optimal attacks depend on the loss function, so black-box attacks are as good as white-box attacks. The authors introduce the Inverse Hessian attack (IHA), which utilizes model parameters to enhance the effectiveness of membership inference. By computing inverse-Hessian vector products, these attacks can exploit the sensitivity of the model output to specific training examples.
  • Label-only membership auditing: Label-only membership inference auditing in differentially private machine learning is a privacy assessment method in which an auditor attempts to deduce whether a particular data point was part of the training dataset based on the model’s predicted labels alone (without access to probabilities or other model details). This form of auditing is particularly relevant for real-world scenarios.
Malek et al. [178] adapt a heuristic method to evaluate label differential privacy (label DP) in different configurations where the privacy of the labels associated with training examples is preserved, while features may be publicly accessible. The authors propose two primary approaches: Private Aggregation of Teacher Ensembles (PATE) and Additive Laplace with Iterative Bayesian Inference (ALIBI). They apply noise exclusively to the labels in the training data, leading to the development of different label-DP mechanisms, investigate model accuracy, and estimate lower bounds for the privacy parameter ϵ. They train several models with and without a given training point, while the rest of the training set remains unchanged. Choquette-Choo et al. [179] focus on a black-box attack in which only the labels of the model, and not the full probability distribution, are available to the attackers. They investigate privacy leakage in four private prediction algorithms: PATE, CaPC, PromptPATE, and Private-KNN. The authors show that DP provides the strongest protection against privacy violations in both the average-case and worst-case scenarios and when the model is trained with overconfidence. However, this may come at the expense of the model’s test accuracy. The authors show that an effective defense against label-only MIAs involves DP and strong regularization, which significantly reduces the leakage of private information.
  • Single-run membership auditing: This is a technique that estimates privacy leakage using a single execution of the training and auditing process.
Steinke et al. [43] propose a novel auditing scheme that uses only a single training of the model and can be evaluated using a one-time model output, making audits feasible in practical applications. The authors apply their auditing scheme specifically to the DP-SGD algorithm. The auditors select a set of canary data points (auditing examples) and apply MIA thresholds and model parameter tuning to maximize audit assurance from a single model output. The attack estimates the sensitivity of the model to individual data points, which is an indication of the privacy risk of the model. By adjusting the MIA parameters and interpreting the model’s response to the canary points, the auditor approximates the empirical privacy loss, ϵ_emp. This empirical estimate provides information on how closely the model’s practical privacy matches the theoretical DP guarantees. Their analysis exploits the parallelism of independently adding or removing multiple training examples in a single training run of the algorithm, together with a connection to statistical generalization. This auditing scheme requires minimal assumptions about the underlying algorithm, making it applicable in both black-box and white-box settings. Andrew et al. [118] propose a novel one-shot auditing framework that enables efficient auditing during a single training run without a priori knowledge of the model architecture, task, or DP training algorithm. The method provides provably correct estimates of the privacy loss under the Gaussian mechanism, and its performance is demonstrated on FL benchmark datasets. The proposed method is model and dataset agnostic, so it can be applied to any local DP task.
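The guessing game at the core of such one-shot audits can be sketched as follows (an illustrative simplification, not the authors’ exact procedure); it assumes that per-canary membership scores have already been computed from the single trained model, and the conversion of the correct-guess count into an (ϵ, δ) lower bound, which is derived in [43], is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)


def one_shot_audit(canary_scores, included, k_pos, k_neg):
    """Guess membership for canaries from a single trained model.

    canary_scores -- per-canary membership score (e.g., negative loss);
                     higher means "looks like a member"
    included      -- boolean array: was the canary actually in the training set?
    k_pos, k_neg  -- number of highest/lowest-scoring canaries to guess on
                     (the rest are abstentions)
    """
    order = np.argsort(canary_scores)
    guesses = np.full(len(canary_scores), -1)   # -1 = abstain
    guesses[order[-k_pos:]] = 1                 # highest scores -> guess "member"
    guesses[order[:k_neg]] = 0                  # lowest scores  -> guess "non-member"
    mask = guesses >= 0
    correct = np.sum(guesses[mask] == included[mask].astype(int))
    return correct, mask.sum()


# Toy run: 1000 canaries, each included independently with probability 1/2;
# scores are only weakly correlated with membership.
included = rng.random(1000) < 0.5
scores = rng.normal(loc=included.astype(float) * 0.3, scale=1.0)
correct, total = one_shot_audit(scores, included, k_pos=100, k_neg=100)
print(f"{correct}/{total} correct guesses")
```

Restricting guesses to the most extreme scores (abstaining on the rest) keeps the individual guesses confident enough to be informative from a single run.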
Annamalai et al. [109] present a one-shot, nearly tight black-box auditing scheme for the privacy guarantees of the DP-SGD algorithm to investigate the gap between empirical and theoretical privacy. The main idea behind the audit is to craft worst-case initial model parameters, since the privacy analysis of DP-SGD is agnostic to the choice of initial model parameters, whereas previous audits initialized the model with average-case parameters. The authors empirically estimate the privacy leakage from DP-SGD using gradient-based membership inference attacks. Their key finding is that, by crafting worst-case initial model parameters, more realistic privacy estimates can be obtained, addressing the limitations of the theoretical privacy analysis of DP-SGD.
  • Loss-based membership inference auditing: This is a technique that measures privacy leakage in differentially private models by evaluating the differences between the model’s loss values on training data and on non-training data.
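A minimal loss-threshold sketch in the spirit of this technique is shown below; the model, the data split, and the choice of the mean training loss as the threshold are illustrative assumptions of ours, and the resulting TPR/FPR can be fed into the empirical-ϵ estimate sketched earlier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder target model and data split (members vs. non-members).
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)
X_train, y_train = X[:1000], y[:1000]     # members
X_out, y_out = X[1000:], y[1000:]         # non-members
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)


def per_example_loss(model, X, y):
    """Cross-entropy loss of the model on each individual example."""
    p = model.predict_proba(X)
    return -np.log(p[np.arange(len(y)), y] + 1e-12)


# Loss-based membership guess: loss below the threshold -> "member".
train_losses = per_example_loss(model, X_train, y_train)
out_losses = per_example_loss(model, X_out, y_out)
tau = train_losses.mean()                  # threshold = mean training loss
tpr = (train_losses < tau).mean()
fpr = (out_losses < tau).mean()
print(f"attack TPR={tpr:.2f}, FPR={fpr:.2f}")
```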
Wang et al. [111] introduced a novel randomized approach to privacy accounting, which aims to improve on traditional deterministic methods by achieving tighter bounds on privacy loss. The method leverages the concept of Privacy Loss Distribution (PLD) to more accurately measure and track the cumulative privacy loss over a sequence of computations. This approach is particularly beneficial for large-scale data applications where the privacy budget is strict.
  • Confidence score membership auditing: In this type of audit, the vulnerability of the model is assessed on the basis of the confidence scores of the predictions. Higher confidence in the predictions for training data points compared to non-training points often indicates a leak. By examining the confidence values for predictions in a large sample, auditors can determine whether training points have a higher confidence than non-training points and thus estimate membership leakage.
Yeom et al. [10] establish a direct link between overfitting and membership inference vulnerabilities by analyzing confidence scores. It is shown that high-confidence predictions are often associated with data memorization, which increases privacy risks, especially when attackers exploit confidence scores.
  • Metric-based membership inference auditing: This refers to the use of various metrics and statistics calculated from an ML model’s outputs to assess the privacy risks in DPML systems. This approach applies membership inference techniques and calculates metrics and statistics that allow a quantitative assessment of privacy leakage. It is often used to compare different models and privacy parameters.
Rahman et al. [180] evaluate DP mechanisms against MIAs and use accuracy and F-score as privacy leakage metrics to measure the privacy loss of models trained with DP algorithms. Jayaraman and Evans [21] evaluate private mechanisms against both membership inference and attribute inference attacks. They use a balanced prior data distribution; note that if the prior probability is skewed, the above-mentioned methods are not applicable. Liu et al. [169] evaluate DP mechanisms using a hypothesis testing framework. They connect the precision, recall, and F-score metrics to the DP parameters (ϵ, δ) and, based on the attacker’s background knowledge, give insight into choosing these parameter values. Balle et al. [170] explain DP through a statistical hypothesis testing interpretation, in which conditions for a privacy definition based on statistical divergence are identified, allowing for improved conversion rules between divergence and differential privacy. Carlini et al. [102] investigate how neural networks unintentionally memorize specific training data. The authors develop an attack methodology to quantify unintended memorization by evaluating how easy it is to reconstruct specific data points (e.g., training examples with private information) from the trained model. This study uses metric-based approaches to measure memorization and unintended data retention, which are both critical components in determining membership inference. The research identifies factors contributing to memorization, including model size, training duration, and dataset characteristics.
Humphries et al. [181] investigate the effectiveness of DP in protecting against MIAs in ML. The authors perform an empirical evaluation by varying the ϵ values in DP-SGD and observing the effect on the success rate of MIAs, including black-box and white-box attacks. The authors suggest that DP needs to be complemented by other techniques that specifically target membership inference risk. Ha et al. [41] evaluate the impact of the privacy parameters, by adjusting ϵ, on the effectiveness of DP in mitigating gradient-based MIAs. The authors recommend specific DP parameter settings and training procedures to improve privacy without sacrificing model utility. Askin et al. [182] explore statistical methods for quantifying and verifying differential privacy (DP) claims. Their method provides estimators and confidence intervals for the optimal privacy parameter ϵ of a randomized algorithm and avoids the complex process of event selection, which simplifies the implementation. Liu and Oh [158] report on extensive hypothesis testing in DPML using the Neyman–Pearson criterion. They give guidance on setting the privacy budget based on assumptions about the attacker’s knowledge, considering different types of auxiliary information that an attacker can obtain to strengthen the MIA, such as the probability distribution of the data, record correlation, and temporal correlation.
Aerni et al. [183] design adaptive membership inference attacks based on the LiRA framework [103], which frames membership inference as a hypothesis testing problem. Given the score of the victim model on x, the attack applies a likelihood ratio test to distinguish between the two hypotheses. To estimate the score distributions, multiple shadow models must be trained by repeatedly sampling training sets and training models on them.
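The per-example likelihood-ratio test can be sketched as follows, assuming the auditor has already collected (e.g., logit-scaled) scores from shadow models trained with and without the target example; the Gaussian fits follow the parametric variant of LiRA, and all names and numbers are illustrative:

```python
import numpy as np
from scipy.stats import norm


def lira_score(target_score, in_scores, out_scores):
    """Parametric likelihood-ratio statistic for membership of one example.

    target_score -- score of the audited model on the example x
    in_scores    -- scores of shadow models trained WITH x
    out_scores   -- scores of shadow models trained WITHOUT x
    Returns the log-likelihood ratio; large values favour "member".
    """
    mu_in, sd_in = np.mean(in_scores), np.std(in_scores) + 1e-8
    mu_out, sd_out = np.mean(out_scores), np.std(out_scores) + 1e-8
    return (norm.logpdf(target_score, mu_in, sd_in)
            - norm.logpdf(target_score, mu_out, sd_out))


# Toy example with synthetic shadow-model scores.
rng = np.random.default_rng(1)
in_scores = rng.normal(2.0, 0.5, size=64)    # shadow models containing x
out_scores = rng.normal(0.5, 0.5, size=64)   # shadow models not containing x
print(lira_score(1.8, in_scores, out_scores))
```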
  • Data augmentation-based auditing: This form of auditing involves generating synthetic or modified versions of the data in order to assess and improve privacy guarantees. This approach is useful in evaluating models with overfitting tendencies, where small perturbations could reveal privacy weaknesses.
Kong et al. [184] present a notable connection between machine unlearning and MIAs. Their method provides a mechanism for privacy auditing without modifying the model. By leveraging forgeability (creating new, synthetic data samples), data owners can construct a Proof-of-Repudiation (PoR) that allows a model owner to refute claims made by MIAs, enhancing privacy protection and mitigating privacy risks.
Recently, there have been works on auditing lower bounds for Rényi differential privacy (RDP). Balle et al. [170] investigate the interpretation of RDP in terms of statistical hypothesis testing. The authors examine the conditions for a privacy definition based on statistical divergence, which allows for improved conversion rules between divergence and differential privacy. They provide precise privacy loss bounds under RDP and interpret these in terms of type I and type II errors in hypothesis testing. Kutta et al. [185] develop a framework to estimate lower bounds for RDP parameters (especially for ϵ_RDP) by investigating a mechanism in a black-box manner. Their framework allows auditors to derive minimal privacy guarantees without requiring internal access to the mechanism; the goal is to observe how much the outputs deviate for small perturbations of the inputs. Domingo-Enrich et al. [186] propose an auditing procedure for DP based on the regularized kernel Rényi divergence (KRD), which defines regularized kernel Rényi differential privacy (KRDP). Their auditing procedure can estimate privacy parameters from samples, even in high dimensions, for ϵ-DP, (ϵ, δ)-DP, and (α, ϵ)-Rényi DP. The proposed auditing method does not suffer from the curse of dimensionality and has parametric rates in high dimensions. However, this approach requires knowledge of the covariance matrix of the underlying mechanism, which is impractical for most mechanisms other than the Laplace and Gaussian mechanisms and inaccessible in black-box settings.
Kong et al. [40] introduce a family of function-based testers for Rényi DP (as well as for pure and approximate DP). The authors present DP-Auditorium, a DP auditing library implemented in Python that allows DP guarantees to be tested from black-box access to the mechanism. DP-Auditorium facilitates the development and execution of privacy audits, allowing researchers and practitioners to evaluate the robustness of DP implementations against various adversarial attacks, including membership inference and attribute inference. The library also supports multiple privacy auditing protocols and integrates configurable privacy mechanisms, allowing for testing across different privacy budgets and settings. Chadha et al. [32] propose a framework for auditing private predictions with different poisoning and querying capabilities. They investigate privacy leakage, in terms of Rényi DP, of four private prediction algorithms: PATE, CaPC, PromptPATE, and Private-KNN. The experiments show that some algorithms are easier to poison and lead to much higher privacy leakage. Moreover, the privacy leakage is significantly lower for attackers without query control than for attackers with full control.

4.3.2. Differential Privacy Auditing with Data Poisoning

Data-poisoning auditing: In data poisoning, “poisoned” data are introduced into the training dataset to observe whether they influence the model predictions and worsen the privacy guarantees. The auditor simulates various data-poisoning scenarios by inserting manipulated samples that distort the data distribution. The main scenarios that have been considered in the data-poisoning auditing literature are adversarial injection of data points, influence function analysis, manipulation of gradients in DP training, empirical evaluation of the privacy loss ϵ, simulation of worst-case poisoning scenarios, and privacy violations.
Influence function analysis is a statistical tool that helps to identify whether specific data points have an excessive influence on the model’s predictions, indicating possible poisoning. Influence functions provide a way to estimate how much specific training samples influence the model’s behavior without needing to retrain the model. The seminal work by Koh and Liang [187] provides a robust framework for influence functions to analyze and audit the predictions made by black-box ML models. It introduces techniques for measuring the influence of individual training points on model predictions, setting the stage for analyzing poisoning attacks. The authors utilize first-order Taylor approximations to derive influence functions. This method is particularly useful for diagnosing issues related to model outputs. Lu et al. [61] audit the tightness of DP algorithms using influence-based poisoning to detect privacy violations and understand information leakage. They manipulate the training data to influence the output of the model and thus violate privacy guarantees. Their main goal is to verify the privacy of a known mechanism whose inner workings may be hidden.
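For intuition, the following toy sketch computes the up-weighting influence of a single training point on the loss at a test point for a small regularized logistic regression model; the data, the regularization constant, and the explicit Hessian inverse are illustrative simplifications of ours (large models require Hessian-vector-product approximations, as discussed in [187]):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(int)

lam = 1e-2                                       # L2 regularization strength
clf = LogisticRegression(C=1.0 / (n * lam), fit_intercept=False, max_iter=1000).fit(X, y)
theta = clf.coef_.ravel()


def grad(x, yi):
    """Gradient of the per-example logistic loss at theta."""
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    return (p - yi) * x


# Hessian of the regularized empirical risk at theta.
p_all = 1.0 / (1.0 + np.exp(-X @ theta))
H = (X * (p_all * (1.0 - p_all))[:, None]).T @ X / n + lam * np.eye(d)

# Influence of up-weighting training point i on the loss at a test point.
x_test, y_test = rng.normal(size=d), 1
i = 0
influence = -grad(x_test, y_test) @ np.linalg.solve(H, grad(X[i], y[i]))
print(f"estimated influence of training point {i}: {influence:.4f}")
```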
To understand how poisoned gradients can influence privacy guarantees, gradient manipulation is used in DP training. By monitoring gradients, auditors can detect anomalies due to poisoned inputs, as these may cause the differentially private model to exhibit non-robust behavior. Chen et al. [188] investigate the potential for reconstructing training data through gradient leakage analysis during the training of neural networks. The reconstruction problem is formulated as a series of optimization problems that are solved iteratively for each layer of the neural network. An important contribution of this work is the proposal of a metric to measure the security level of DL models against gradient-based attacks. The seminal paper by Xie et al. [189] investigates the impact of gradient manipulation on both privacy guarantees and model accuracy, which is relevant to DP auditing in federated learning. Liu and Zhao [190] focus on the interaction of gradient manipulation with privacy and propose ways to improve the robustness of the model under these attacks. Ma et al. [54] establish complementary relationships between data poisoning and differential privacy by using a small-scale and a large-scale data-poisoning attack based on gradient ascent on the logistic regression parameters θ with respect to x to reach a target θ*. They evaluate the attack algorithms on two private learners targeting an objective perturbation and an output perturbation. They show that differentially private learners are provably resistant to data-poisoning attacks, with the protection decreasing exponentially as the attacker poisons more data. Jagielski et al. [46] investigate privacy vulnerabilities in DP-SGD, focusing on the question of whether the theoretical privacy guarantees hold under real-world conditions. The DP-SGD algorithm was audited by simulating a model-agnostic clipping-aware poisoning attack (ClipBKD) in black-box settings on logistic regression and fully connected neural network models. The models were initialized with average-case parameters. The empirical privacy estimates are derived from Clopper–Pearson confidence intervals of the FP and FN rates of the attacks. The authors provide an ϵ-maximization strategy to obtain a lower bound on the privacy leakage.
Empirical evaluation of privacy loss evaluates how the privacy budgets are affected by poisoned data. Auditors measure the effective privacy loss, or empirical epsilon, by feeding in poisoned data and calculating whether the privacy budget remains within acceptable bounds. Steinke and Ullman [191] introduce auditing mechanisms that track the empirical privacy losses and provide insights into the impact of poisoned data on privacy guarantees in real-world applications. The authors clarify the relationship between pure and approximate DP by establishing quantitative bounds on privacy loss under different conditions and introducing adaptive data analysis. Kairouz et al. [192] develop empirical privacy assessment methods applicable to DP-SGD in high-risk inversion settings. This method allows a detailed examination of how shuffling affects privacy guarantees. The authors evaluate the different parameters, such as batch size and privacy budget, in terms of privacy leakage.
Privacy violation: In the first work in the field of DP auditing, Li et al. [193] consider relaxing the DP notions to cover different types of privacy violations, such as unauthorized data collection, sharing, and targeting. The authors outline key dimensions that influence privacy, such as individual factors (e.g., awareness and knowledge), technological factors (e.g., data processing), and contextual factors (e.g., legal framework). However, the data leakage is not assessed. Hay et al. [194] evaluate existing DP implementations for correctness of implementation. The authors create a privacy evaluation framework, named DPBench, designed to evaluate, test, and validate privacy guarantees. Recent work proposes efficient solutions for auditing simple privacy mechanisms for scalar or vector inputs to detect DP violations (Ding et al., 2018; Bichsel et al., 2021). For each neighboring input pair, the corresponding output is determined, and Monte Carlo probabilities are measured to determine privacy. Ding et al. [45] were the first to propose practical methods for testing privacy claims with black-box access to a mechanism. The authors designed StatDP, a hypothesis testing pipeline for checking DP violations in many classical DP algorithms, including noisy argmax, and for identifying ϵ-DP violations in sparse algorithms, such as the sparse vector technique and local DP algorithms. Their work focuses on univariate testing of DP and evaluates the correctness of existing DP implementations.
Wang et al. [195] offer a code-analysis-based tool, CheckDP, to generate or prove counterexamples for a variety of algorithms, including the sparse vector technique. Barthe et al. [196] investigate the decidability of DP. CheckDP and DiPC can not only detect violations of privacy claims but can also be used for explicit verification. Bichsel et al. [26] present a privacy violation detection tool, DP-Sniper, which shows that a black-box setting can effectively identify privacy violations. It utilizes two strategies: (1) classifier training, in which a classifier is trained to predict whether an observed output is likely to have been generated from one of two inputs; and (2) optimal attack transformation, in which this classifier is transformed into an approximately optimal attack on differential privacy. DP-Sniper is particularly effective at exploiting floating-point vulnerabilities in naively implemented algorithms and detecting significant privacy violations.
Niu et al. [165] present DP-Opt(imizer), a disprover that attempts to find counterexamples whose lower bounds on differential privacy exceed the level of privacy claimed by the algorithm. The authors focus exclusively on ϵ-DP. They train a classifier to distinguish between the outputs M(a) and M(a′) and create an attack based on this classifier; statistical guarantees for the discovered attack are given. They transform the search task into an improved optimization objective that takes the empirical error into account and then solve it using various off-the-shelf optimizers. Lokna et al. [48] present a black-box attack for detecting privacy violations of DP, targeting potential (ϵ, δ) violations by grouping (ϵ, δ) pairs based on the observation that many pairs can be grouped together because they stem from the same algorithm. The key technical insight of their work is that many (ϵ, δ) differentially private algorithms combine ϵ and δ into a single privacy parameter ρ. By directly measuring the robustness or degree of privacy failure ρ, one can audit multiple privacy claims simultaneously. The authors implement their method in a tool called Delta-Siege.

4.3.3. Differential Privacy Auditing with Model Inversion

This is a DPML model evaluation scheme that examines how much information about individual data records can be inferred from the outputs of the trained model (usually confidence scores or gradients) to understand the level of privacy leakage. The model is inverted to extract information. The audit can be white-box, with access to gradients or internal layers, or black-box, with access only to the output labels. The main challenge in detecting model inversion attacks in differential privacy auditing is the need to prevent the inference of sensitive attributes of individuals from the shared model, especially in black-box scenarios. The main scenarios that have been considered in the literature for model inversion auditing are sensitivity analyses, gradient and weight analyses, empirical privacy loss, and embedding and reconstruction tests.
Sensitivity analyses quantify how much private information is embedded in the model’s outputs that could potentially be reversed. Auditors evaluate gradients or outputs to determine how well they reflect the characteristics of the data. This analysis often involves running a series of model inversions to assess how DP mechanisms (e.g., DP-SGD) protect against the disclosure of sensitive attributes.
Fredrikson et al. [100] present a seminal paper that introduces model inversion attacks that use confidence scores from model predictions to reconstruct sensitive input data. It explores how certain types of models, even when protected with DP, can be vulnerable to model inversion attacks that reveal certain features. Wang et al. [136] analyze the vulnerability of existing DP mechanisms. They use a functional-mechanism method that perturbs the coefficients of the polynomial representation of the objective function, balancing the privacy budget between sensitive and non-sensitive attributes to mitigate model inversion attacks. Hitaj et al. [197] focus primarily on collaborative learning settings, which is relevant to DP as it shows how generative adversarial networks (GANs) can be used for model inversion to reconstruct sensitive information, providing insights into potential vulnerabilities in DP-protected models. Song et al. [198] discuss how machine-learning models can memorize training data in a way that allows attackers to perform inversion attacks. The authors analyze scenarios in which DP cannot completely prevent the leakage of private data features through inversion techniques. Fang et al. [135] examine the vulnerability of existing DP mechanisms using a functional-mechanism method. They propose a differential privacy allocation model and optimize the regression model by adjusting the allocation of the privacy budget within the objective function. Cummings et al. [199] introduce individual sensitivity techniques, such as smooth sensitivity and sensitivity preprocessing, to improve the accuracy of private data by reducing sensitivity, which is crucial for mitigating model inversion risk.
Gradient and weight analyses show whether and how gradients expose sensitive attributes. By auditing gradients and weights, privacy auditors can check whether protected data attributes can be inferred directly or indirectly. Since model inversion often leverages gradients for black-box attacks, gradient clipping in DP-SGD helps mitigate exposure.
Works such as that of Phan et al. [200] investigate how model inversion can circumvent the standard DP defense by exploiting subtle dependencies in the model parameters. There are also works that use gradient-inversion attacks. Zhu et al. [201] show that gradient information commonly shared in DP or federated learning can reveal sensitive training data by inversion, i.e., by minimizing the difference between the observed gradients and those produced by candidate (dummy) inputs. It is shown that, even with DP mechanisms, gradient-based inversion attacks can reconstruct data and thus pose a privacy risk. Huang et al. [202] align the gradients of dummy data with those of the actual data, making the dummy images resemble the private images. The paper describes in detail how gradient inversion attacks work by recovering training samples from model gradients shared during federated learning. Wu et al. [203] use gradient compression to reduce the effectiveness of gradient inversion attacks. Zhu et al. [204] introduce a novel generative gradient inversion attack algorithm (GGI), in which the dummy images are generated from low-dimensional latent vectors through a pre-trained generator.
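To make the mechanism concrete, the following PyTorch sketch reconstructs a single input from one observed (unclipped, noise-free) gradient by optimizing a dummy input and a soft label so that their gradients match the observed ones, in the spirit of the gradient-inversion attacks discussed above; the tiny model and data are placeholders of ours, and DP-SGD’s clipping and noise would degrade such a reconstruction:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small model and a single private example (names are illustrative).
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))
x_true = torch.randn(1, 20)
y_true = torch.tensor([3])

# Gradient observed by the auditor (e.g., a single unclipped SGD update).
loss_fn = nn.CrossEntropyLoss()
observed = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())
observed = [g.detach() for g in observed]

# Dummy input and soft label, optimized so that their gradients match the observed ones.
x_dummy = torch.randn(1, 20, requires_grad=True)
y_dummy = torch.randn(1, 5, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy], lr=0.1)


def closure():
    opt.zero_grad()
    pred = model(x_dummy)
    soft_label = y_dummy.softmax(dim=-1)
    loss = -(soft_label * pred.log_softmax(dim=-1)).sum()
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    diff = sum(((g - o) ** 2).sum() for g, o in zip(grads, observed))
    diff.backward()
    return diff


for _ in range(20):
    opt.step(closure)

print("reconstruction error:", torch.norm(x_dummy.detach() - x_true).item())
```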
Empirical privacy loss approach calculates the difference between theoretical and empirical privacy losses in inversion scenarios. Auditors measure the privacy loss, ϵ , by performing a model inversion on a DP-protected model and comparing the result with the theoretical privacy budget. Large deviations indicate a possible weakness of DP in protecting against inversion.
Yang et al. [205] investigate defense mechanisms against model inversion and propose prediction purification techniques, which involve modifying the outputs of the model to obscure sensitive information while still providing useful predictions. It is shown how adding additional processing to predictions can mitigate the effects of inversion attacks. Zhang et al. [206] apply DP to software defect prediction (SDP) model sharing and investigate privacy disclosure through model inversion attacks. The authors introduce class-level and subclass-level DP and use DPRF (differentially private random forest) as part of the enhanced DP mechanism.
Embedding and reconstruction test: It examines whether latent representations or embeddings could be reversed to obtain private data. The auditors question whether embeddings of DP models are resistant to inversion by attempting to reconstruct data points from compressed representations.
Manchini et al. [207] show that stricter privacy restrictions can lead to a strong inferential bias, affecting the statistical performance of the model. They propose an approach to improve data privacy in regression models under heteroscedasticity. In addition, there are methods such as Graph Model Inversion (GraphMI) that are specifically designed to address the unique challenges of graph data [208]. Park et al. [139] recover the training images from the predictions of the model to evaluate the privacy loss of a face recognition model and measure the success of model inversion attacks based on the performance of an evaluation model. The results show that even a relatively high privacy budget of ϵ = 8 can provide protection against model inversion attacks.

4.3.4. Differential Privacy Auditing Using Model Extraction

When auditing DPML models using model extraction attacks, auditors evaluate how resistant a DP-protected model is to extraction attacks, where an attacker attempts to replicate or approximate the model by querying it multiple times and using the outputs to train a surrogate. This form of auditing is essential to verify that DP implementations truly protect against unintentional model replication, which can jeopardize privacy by allowing the attacker to learn sensitive information from the original model. The main scenarios that have been considered in the literature for auditing model extractions are query analyses.
Query analyses measure the extent to which queries can reveal model parameters or behaviors. Auditors simulate extraction attacks by extensively querying the model and analyzing how well they can replicate its outputs or decision boundaries.
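A minimal query-analysis sketch is shown below: a surrogate is fitted purely to the victim’s predicted labels on attacker-chosen queries, and the audit reports the surrogate/victim agreement on fresh inputs; the victim model, query distribution, and model classes are placeholders of ours standing in for a DP-protected model behind a prediction API:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical victim model (in a real audit, the DP-protected model behind an API).
X_private = rng.normal(size=(2000, 10))
y_private = (X_private[:, :3].sum(axis=1) > 0).astype(int)
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_private, y_private)

# Extraction: query the victim on attacker-chosen inputs and fit a surrogate.
X_query = rng.normal(size=(5000, 10))
y_query = victim.predict(X_query)                 # label-only access
surrogate = LogisticRegression(max_iter=1000).fit(X_query, y_query)

# Audit metric: agreement between surrogate and victim on fresh inputs.
X_test = rng.normal(size=(2000, 10))
agreement = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"surrogate/victim agreement: {agreement:.2%}")
```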
Carlini et al. [101] show that embeddings can reveal private data, advancing research on the robustness of embeddings for DP models. Dziedzic et al. [209] require users to perform computational tasks before accessing model predictions. The proposed calibrated proof-of-work (PoW) mechanism can deter attackers by increasing the model extraction effort and creating a balance between robustness and utility. Their work contributes to the broader field of privacy auditing by proposing proactive defenses instead of reactive measures in ML applications. Li et al. [210] investigate a novel personalized local differential privacy mechanism to defend against equation-solving and query-based attacks, in which the attacker solves for the model parameters through multiple queries. The authors conclude that such attacks are particularly effective against regression models and can be mitigated by adding high-dimensional Gaussian noise to the model coefficients. Li et al. [146] use a two-stage privacy auditing method with active verification to detect suspicious users based on their query patterns and to verify whether they are attackers. By analyzing how well the queries cover the feature space of the victim model, the method can detect potential model extraction. Once suspicious users are identified, an active verification module is employed to confirm whether these users are indeed attackers. Their proposed method is particularly useful for object detection models through its innovative use of feature space analysis and perturbation strategies. Zheng et al. [211] propose a novel privacy-preserving mechanism, Boundary Differential Privacy (ϵ-BDP), which modifies the output layer of the model. BDP is designed to introduce carefully controlled noise around the decision boundary of the model. This method guarantees that an attacker cannot learn the decision boundary between two classes with a certain accuracy, regardless of the number of queries. A special layer, the so-called boundary DP layer, which applies differential privacy principles, is implemented in the ML model. By integrating BDP into this layer, the model produces an output that preserves privacy around the boundary and effectively obscures information that could be exploited in extraction attacks. This boundary randomized response algorithm was developed for binary models and can be generalized to multiclass models. The extensive experiments (Zheng et al., 2022) [18] have shown that BDP obscures the prediction responses with noise and thus prevents attackers from learning the decision boundary of any two classes, regardless of the number of queries issued.
Yan et al. [212] propose an alternative to the BDP layer in which the privacy loss is adapted accordingly. The authors propose an adaptive query-flooding parameter duplication (QPD) extraction attack that allows the auditor to infer model information with black-box access and without prior knowledge of the model parameters or training data. A defense strategy called monitoring-based DP (MDP) dynamically adjusts the noise added to the model responses based on real-time evaluations, providing effective protection against QPD attacks. Pillutla et al. [160] introduce a method in which multiple randomized canaries are added to audit privacy guarantees by distinguishing between models trained with different numbers of canaries. Based on this idea, the authors introduce Lifted Differential Privacy (LiDP), which can effectively audit differentially private models, together with novel confidence intervals that adapt to empirical higher-order correlations to improve the accuracy and reliability of the auditing process.

4.3.5. Differential Privacy Auditing Using Property Inference

Auditing DP with property inference attacks typically focuses on extracting global features or statistical properties of the dataset used to train an ML model, such as average age, frequency of diseases, or frequency of geographic locations, rather than specific data records, to reveal sensitive information. The goal is to ensure that the DP mechanisms effectively prevent attackers from inferring sensitive characteristics even if they have access to the model outputs. The auditor checks whether the model reveals statistical properties of the training data that could violate privacy. The literature on property inference for differential privacy auditing covers different scenarios, such as evaluating property sensitivity with model outputs and attribute-based simulated worst-case scenarios.
Evaluating property sensitivity with model outputs tests how well DP obscures statistical dataset properties. Auditors analyze the extent to which an attacker could infer information at the aggregate or property level by examining model outputs across multiple queries. For example, changes in the distribution of outputs when querying specific demographics can reveal hidden patterns. An attribute-based simulation of a worst-case scenario is a case in which an attacker has partial information on certain attributes of the dataset. Auditors test the DP model by combining partially known data (e.g., location or age) with the model’s predictions to see whether the model can reveal other attributes. This type of adversarial testing helps validate DP protections against more informed attacks.
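A toy sketch of the meta-classifier approach commonly used in such property audits is given below: shadow models are trained on datasets with a low or high ratio of a sensitive group, and a meta-classifier is trained on their flattened parameters to predict that ratio; the data generator, model class, and the 0.2/0.8 ratios are illustrative assumptions of ours:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)


def train_shadow(ratio, n=500):
    """Train a tiny shadow model on data whose sensitive property is `ratio`
    (e.g., the fraction of records from group A) and return its flattened parameters."""
    group = rng.random(n) < ratio
    X = rng.normal(loc=group[:, None] * 0.5, size=(n, 8))
    y = (X.sum(axis=1) + group > 0).astype(int)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return np.concatenate([clf.coef_.ravel(), clf.intercept_])


# Meta-training set: shadow-model parameters labelled by the dataset property.
ratios = rng.choice([0.2, 0.8], size=200)            # low vs. high ratio of group A
params = np.stack([train_shadow(r) for r in ratios])
meta = LogisticRegression(max_iter=1000).fit(params, (ratios > 0.5).astype(int))

# Audit question: can the meta-classifier tell the property of a fresh target model?
target = train_shadow(0.8)
print("predicted high-ratio probability:", meta.predict_proba(target.reshape(1, -1))[0, 1])
```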
Suri et al. [213] introduce the concept of distribution inference attacks in both white-box and black-box settings, which motivated later DP studies to counteract these vulnerabilities. This type of inference attack aims to uncover sensitive properties of the underlying training data distribution, potentially exposing private information about individuals or groups within the dataset. The authors discuss auditing information disclosure at three granularity levels: distribution, user, and record level. This multifaceted approach allows for a comprehensive evaluation of the privacy risks associated with ML models. Ganju et al. [214] show how property inference attacks on attributes can reveal the characteristics of datasets even when neural networks use DP. The authors introduce an approach for inferring properties that a model inadvertently memorizes, using both synthetic and real-world datasets.
Melis et al. [215] focus on collaborative learning using property inference in the context of shared model updates, with a particular focus on how unintended feature leakage can jeopardize privacy. By analyzing the model’s outputs, the authors identify which features can leak information and which features contribute to the privacy risks. They introduce a method for inferring sensitive attributes that may only apply to subgroups, thereby revealing potential privacy vulnerabilities. Property inference attacks, in this case, rely on the detection of sensitive features in the training data. Attackers can exploit the linear property of queries to obtain multiple responses from DP mechanisms, leading to unexpected information leakage [215]. Huang and Zhou [215] address critical concerns about the limits of differential privacy (DP), especially in the context of linear queries. They show how the inherent linear properties of certain queries can lead to unexpected information leaks that undermine the privacy guarantees that DP is supposed to provide. Ben Hamida et al. [216] investigate how the implementation of differential privacy can reduce the likelihood of successful property inference attacks by obscuring the relationships between the model parameters and the underlying data properties. Song et al. [217] provide a comprehensive evaluation of privacy attacks on ML models, including property inference attacks. The authors propose attack strategies that target unintended model memorization, with empirical evaluations on DP-protected models. A list of privacy auditing schemes is shown in Table 2.

5. Discussion and Future Research

This paper presents the current trends in privacy auditing in DPML using membership inference, data poisoning, model inversion attacks, model extraction attacks, and property inference.
We consider the advantages of using membership inference for privacy auditing in DPML models through quantification of privacy risk, empirical evaluation, improved audit performance, and guidance for privacy parameter selection. MIAs can effectively quantify the privacy risk (the amount of private information) that a model leaks about individual data points in its training set. This makes them a valuable tool for auditing the privacy guarantees of DP models [41]. They provide a practical lower bound on inference risk, complementing the theoretical upper bound of DP [16]. MIAs enable an empirical evaluation of privacy guarantees in DP models, helping to identify potential privacy leaks and implementation errors [160]. They can be used to calculate empirical identifiability scores that enable a more accurate assessment of privacy risks [34]. Advanced methods that combine MIAs with other techniques, such as influence-based poisoning attacks, have been shown to provide significantly improved audit performance compared to previous approaches [61]. MIAs can help in selecting appropriate privacy parameters (ε, δ) by providing insights into the trade-off between privacy and model utility [11,163]. These attacks require relatively weak assumptions about the adversary’s knowledge, making them applicable in different scenarios. This flexibility allows for broader applicability in real-world settings where attackers may have limited information about the model [46].
We consider the drawbacks of using membership inference for privacy auditing in DPML models to be reduced model utility, the complexity of parameter selection, and non-uniform risk across classes. Implementing DP to defend against MIAs often leads to a trade-off in which stronger privacy results in lower model accuracy [217]. Excessive addition of noise, which is required for strong privacy guarantees, can significantly degrade the utility of the model, especially in scenarios with imbalanced datasets. Choosing the right privacy parameters is challenging due to the variability of data sensitivity and distribution, making it difficult to effectively balance privacy and utility [35]. Legal and social norms for anonymization are not directly addressed by differential privacy parameters, adding to the complexity. Some MIA methods, especially those that require additional model training or complex computations, can entail a significant computational overhead [96]. The development of robust auditing tools that can provide empirical assessments of privacy guarantees in DP models is crucial. These tools should take into account real-world data dependencies and provide practical measures of privacy loss [34]. Future research should focus on adaptive privacy mechanisms that can dynamically adjust privacy parameters based on the specific characteristics of the training data and the desired level of privacy.
Using data poisoning to audit privacy in DP provides valuable insight into vulnerabilities and helps quantify privacy guarantees, thus improving the understanding of the robustness of models. Data-poisoning attacks can reveal vulnerabilities in DP models by showing how easily an attacker can manipulate training data to influence model outputs. This helps identify vulnerabilities that are not obvious through standard auditing methods [46]. Data poisoning can also help evaluate the robustness of DP mechanisms against adversarial manipulation of the training set. By understanding how models react to poisoned data, we can improve their design and implementation [61]. By using data-poisoning techniques, auditors can quantitatively measure the privacy guarantees of differentially private algorithms [2]. This empirical approach complements theoretical analyses and provides a clearer understanding of how privacy is maintained in practice [43]. The use of data poisoning for auditing can be generalized to different models and algorithms, making it a versatile tool for evaluating the privacy of different machine-learning implementations [61].
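The following minimal Python sketch illustrates this canary-style use of data poisoning under simplifying assumptions: a hypothetical, non-private training routine is run repeatedly with and without an out-of-distribution, mislabeled canary, and a threshold test on the model’s confidence in the canary yields an empirical ε lower bound; the synthetic data, model choice, and threshold rule are placeholders rather than the procedure of any specific cited work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def canary_confidence(include_canary, canary_x, canary_y):
    """One training run on fresh synthetic data (a stand-in for the audited,
    possibly DP, training pipeline); returns the model's confidence on the canary."""
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(int)
    if include_canary:
        X = np.vstack([X, canary_x])
        y = np.append(y, canary_y)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return model.predict_proba(canary_x)[0, canary_y]

# An out-of-distribution poisoned canary with a label that contradicts the data.
canary_x = np.full((1, 5), 8.0)
canary_y = 0

with_canary = np.array([canary_confidence(True, canary_x, canary_y) for _ in range(50)])
without_canary = np.array([canary_confidence(False, canary_x, canary_y) for _ in range(50)])

# Distinguish "canary included" from "canary excluded" at a fixed false-positive rate.
threshold = np.quantile(without_canary, 0.95)
tpr = np.mean(with_canary > threshold)
fpr = np.mean(without_canary > threshold)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}, "
      f"empirical epsilon >= {np.log(max(tpr, 1e-6) / max(fpr, 1e-6)):.2f}")
```

Repeating the same audit against a DP-SGD-trained pipeline should drive the true-positive rate toward the false-positive rate, and the resulting empirical lower bound can then be compared against the claimed ε.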
However, it also poses challenges in terms of complexity, potential misuse, and limited scope, which must be carefully considered in practice. Conducting data-poisoning attacks requires significant computational resources and expertise, and developing effective poisoning strategies can be complex and may not be feasible for all organizations [43,61]. Data-poisoning attacks may not cover all aspects of privacy auditing: while they may reveal certain vulnerabilities, they may not address other types of privacy breaches or provide a comprehensive view of the overall security of a model [43]. Techniques developed for data poisoning in the context of audits could be misused by malicious actors to exploit vulnerabilities in ML models. This dual use raises ethical concerns about the impact of such research [43]. The effectiveness of data-poisoning attacks can vary greatly depending on the specific characteristics of the model being audited. If the model is robust against certain types of poisoning attacks, the auditing process may lead to misleading results regarding its privacy guarantees.
Future research could focus on developing more robust data-poisoning techniques that can effectively audit DP models. By refining these methods, auditors can better assess the resilience of models to different types of poisoning attacks, leading to improved privacy guarantees. As federated learning becomes more widespread, interest in how data-poisoning attacks can be used in this context is likely to grow. Researchers could explore how to audit federated learning models for DP, taking into account the particular challenges posed by decentralized data and model updates. The development of automated frameworks that utilize data poisoning for auditing could streamline the process of evaluating differentially private models. Such frameworks would allow organizations to routinely assess the privacy guarantees of their models without the need for extensive manual intervention. There is a trend towards introducing standardized quantitative metrics for evaluating the effectiveness of DP mechanisms using data poisoning [63]. This could lead to more consistent and comparable assessments across different models and applications.
Model inversion attacks can expose vulnerabilities in DP models by showing how easily an attacker can reconstruct sensitive training data from the model outputs. This helps identify vulnerabilities that may not be obvious through standard auditing methods [58]. Model inversion also serves as a benchmark for evaluating the effectiveness of different DP mechanisms: by evaluating a model’s resilience to inversion attacks, auditors can assess whether the model fulfills its privacy guarantees, enhancing trust in privacy protection technologies [200]. By using model inversion techniques, auditors can quantitatively measure the privacy guarantees provided by DP algorithms [199]. This empirical approach provides a better understanding of how well privacy is maintained in practice. The insights gained from model inversion can inform developers about necessary adjustments to strengthen privacy protection. This iterative feedback loop can lead to continuous improvement of model security against potential attacks. Model inversion techniques can be applied to different types of ML models, making them versatile tools for evaluating the privacy of different implementations.
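As a simple illustration of how such an inversion-based probe can be set up, the following Python sketch (a toy example under our own assumptions, not the attack of any cited work) performs gradient ascent on the predicted class probability of a small logistic regression target, regularized toward a public reference point, to recover a class prototype; the synthetic data, the surrogate target model, and the prior term are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# A small target model standing in for the audited classifier.
X, y = make_blobs(n_samples=400, centers=2, n_features=2, random_state=0)
target = LogisticRegression().fit(X, y)
w, b = target.coef_[0], target.intercept_[0]

def invert_class(target_class, x_init, steps=500, lr=0.1, prior_weight=0.05):
    """Gradient ascent on log P(target_class | x) plus a quadratic prior around
    x_init, recovering a high-confidence prototype input for the class."""
    x = x_init.copy()
    for _ in range(steps):
        p1 = 1.0 / (1.0 + np.exp(-(w @ x + b)))            # target's P(y=1 | x)
        grad = (1.0 - p1) * w if target_class == 1 else -p1 * w
        grad -= prior_weight * (x - x_init)                 # keep the prototype plausible
        x += lr * grad
    return x

prototype = invert_class(target_class=1, x_init=X.mean(axis=0))
print("recovered prototype:", prototype.round(2))
print("distance to class-1 mean:", np.linalg.norm(prototype - X[y == 1].mean(axis=0)).round(2))
print("distance to class-0 mean:", np.linalg.norm(prototype - X[y == 0].mean(axis=0)).round(2))
```

An auditor can compare how close such reconstructions come to actual training records with and without DP training, using the gap as an empirical indicator of inversion resistance.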
Performing model inversion audits can be computationally expensive and time-consuming, requiring significant resources for both implementation and analysis. This can limit accessibility for smaller organizations or projects with limited budgets [48]. Model inversion methods often rely on strong assumptions about the attacker’s capabilities, including knowledge of the architecture and parameters of the model [198]. This may not reflect real-world scenarios where attackers have limited access, which can lead to an overestimation of privacy risks. Practical implementations of differentially private algorithms often contain subtle vulnerabilities, making it difficult to audit at scale, especially in federated environments [118]. The results of model inversion audits can be complex and may require expert interpretation to fully understand their implications. This complexity can hinder the effective communication of results to stakeholders who may not have a technical background [48]. Moreover, while model inversion attacks are effective in detecting certain vulnerabilities, they may not cover all aspects of privacy auditing. Although DP is an effective means of protecting the confidentiality of data, it has problems preventing model inversion attacks in regression models [136]. Other types of privacy violations may not be captured by this method, resulting in an incomplete overview of the overall security of a model.
Future research could investigate the implementation of DP at the class and subclass level to strengthen defenses against model inversion attacks. These approaches could enable more granular privacy guarantees that protect sensitive attributes related to specific data classes while enabling useful model outputs [58]. Using the stochastic gradient descent (SGD) algorithm to guide the selection of an appropriate privacy budget points to a possible future application of model inversion for optimizing privacy budget selection. There may also be a trend towards dynamic privacy budgeting, where the privacy budget is adjusted in real time based on the context and sensitivity of the data being processed. This could help to better balance the trade-off between privacy and utility, especially in scenarios that are prone to model inversion attacks [136].
Model extraction attacks pose a significant privacy risk, even when DP mechanisms are used [101]. These attacks aim to replicate the functionality of a target model by querying it and using the answers to infer its parameters or training data. Model extraction attacks can derive the parameters of a machine-learning model through public queries [209]. Even with DP, which adds noise to the model outputs to protect privacy, these attacks can still be effective. For example, the adaptive query-flooding parameter duplication (QPD) attack can infer model information with black-box access and without prior knowledge of the model parameters or training data [212].
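The following Python sketch gives a minimal, hypothetical illustration of how extraction fidelity can be measured for auditing purposes: a surrogate model is trained purely on query responses from a target model, and the agreement between the two is recorded while a crude output-perturbation defense (a simple label-flipping stand-in, not the BDPL, PLDP, or MDP mechanisms of the cited works) is varied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Target model (the "victim") trained on private data.
X_priv, y_priv = make_classification(n_samples=1000, n_features=10, random_state=0)
target = LogisticRegression(max_iter=1000).fit(X_priv, y_priv)

def query(x, noise_scale=0.0):
    """Black-box prediction API; optional label flipping serves as a crude,
    illustrative stand-in for a randomized output-perturbation defense."""
    labels = target.predict(x)
    flip = rng.random(len(labels)) < noise_scale
    return np.where(flip, 1 - labels, labels)

def extraction_fidelity(n_queries, noise_scale):
    """Train a surrogate on query responses and measure its agreement with the target."""
    X_query = rng.normal(size=(n_queries, 10))           # attacker-chosen queries
    surrogate = LogisticRegression(max_iter=1000).fit(X_query, query(X_query, noise_scale))
    X_test = rng.normal(size=(2000, 10))
    return np.mean(surrogate.predict(X_test) == target.predict(X_test))

for noise in (0.0, 0.1, 0.3):
    print(f"label-flip prob {noise:.1f}: agreement {extraction_fidelity(2000, noise):.3f}")
```

Tracking such agreement curves as a function of the query budget and the deployed defense gives auditors a quantitative handle on how much functionality an adversary can replicate.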
Current trends in privacy auditing in the context of DPML show that the focus is on developing efficient, effective frameworks and methods for evaluating privacy guarantees. As the field continues to advance, ongoing research is critical to refine these auditing techniques and schemes, address the challenges related to the privacy–utility trade-off, and improve the practical applicability of DPML systems in real-world settings. We hope that this article provides insight into privacy auditing in both local and global DP.

Author Contributions

Conceptualization, I.N., K.S. and K.O.; methodology, I.N.; formal analysis, I.N.; investigation, I.N.; resources, I.N.; writing—original draft preparation, I.N.; writing—review and editing, K.S., A.N. and K.O.; project administration, K.O.; funding acquisition, K.O. All authors have read and agreed to the published version of the manuscript.

Funding

This work is the result of activities within the “Digitalization of Power Electronic Applications within Key Technology Value Chains” (PowerizeD) project, which has received funding from the Chips Joint Undertaking under grant agreement No. 101096387. The Chips-JU is supported by the European Union’s Horizon Europe Research and Innovation Programme, as well as by Austria, Belgium, Czech Republic, Finland, Germany, Greece, Hungary, Italy, Latvia, Netherlands, Spain, Sweden, and Romania.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The following table of privacy-auditing schemes provides an overview of the key privacy attacks, references, privacy guarantees, methods, and main contributions discussed in Section 4.
Table A1. Privacy auditing schemes.
Privacy-Attack Methodology | Reference | Privacy Guarantees | Methodology and the Main Contribution
Membership inference auditing
Black-box membership inference auditing | Song et al. [175] | Membership inference attack analysis:
Investigates the vulnerability of adversarial robust DL to MIAs and shows that there are significant privacy risks despite the claimed robustness.
Methodology: Performs a comprehensive analysis of MIAs targeting robust models proposing new benchmark attacks that improve existing methods by leveraging prediction entropy and other metrics to evaluate privacy risks. Empirical evaluations show that even robust models can leak sensitive information about training data.
Contribution: Reveals that adversarial robustness does not inherently protect against MIAs and challenges the assumption that such protection is sufficient for privacy. Introduces the privacy risk score, a new metric that quantifies the likelihood of an individual sample being part of the training set providing a more nuanced understanding of privacy vulnerabilities in ML models.
Carlini et al. [103] | Analyzes the effectiveness of MIAs against ML models:
Shows that existing metrics may underestimate the vulnerability of a model to MIAs.
Methodology: Introduces a new attack framework based on quantile regression of models’ confidence scores. Proposes a likelihood ratio attack (LiRA) that significantly improves TPR at low FNR.
Contribution: Establishes a more rigorous evaluation standard for MIAs and presents a likelihood ratio attack (LiRA) method to increase the effectiveness of MIAs by improving the accuracy in identifying training data members.
Lu et al. [172] | Introduces a black-box estimator for DP:
Allows domain experts to empirically estimate the privacy of arbitrary mechanisms without requiring detailed knowledge of these mechanisms.
Methodology: Combines different estimates of DP parameters with Bayes optimal classifiers. Proposes a relative DP framework that defines privacy with respect to a finite input set, T, which improves scalability and robustness.
Contribution: Establishes a theoretical foundation for linking black-box poly-time ( ϵ , δ ) parameter estimates to classifier performance and demonstrates the ability to handle large output spaces with tight accuracy bounds, thereby improving the understanding of privacy risks. Introduces a distributional DP estimator and compares its performance on different mechanisms.
Kazmi et al. [174] | Measuring privacy violations in DPML models:
Introduces a framework for measuring privacy leakage through MIAs without the need to retrain or modify the model.
Methodology: PANORAMIA uses generated data from non-members to assess privacy leakage, eliminating the dependency on in-distribution non-members included in the distribution from the same dataset. This approach enables privacy measurement with minimal access to the training dataset.
Contribution: The framework was evaluated with various ML models for image and tabular data classification, as well as with large-scale language models, demonstrating its effectiveness in auditing privacy without altering existing models or their training processes.
Koskela et al. [176] | DP:
Proposes an auditing method that does not require prior knowledge of the noise distribution or subsampling ratio in black-box settings.
Methodology: Uses a histogram-based density estimation technique to compute lower bounds for the total variation distance (TVD) between outputs from two neighboring datasets.
Contribution: The method generalizes existing threshold-based membership inference auditing techniques and improves prior approaches, such as f-DP auditing, by addressing the challenges of accurately auditing the subsampled Gaussian mechanism.
Kutta et al. [185] | Rényi DP:
Establishes new lower bounds for Rényi DP in black-box settings providing statistical guarantees for privacy leakage that hold with high probability for large sample sizes.
Methodology: Introduces a novel estimator for the Rényi divergence between the output distributions of algorithms. This estimator is converted into a statistical lower bound that is applicable to a wide range of algorithms.
Contribution: The work pioneers the treatment of Rényi DP in black-box scenarios and demonstrates the effectiveness of the proposed method by experimenting with previously unstudied algorithms and privacy enhancement techniques.
Domingo-Enrich et al. [186] | DP:
Proposes auditing procedures for different DP guarantees: ϵ-DP, (ϵ, δ)-DP, and (α, ϵ)-Rényi DP.
Methodology: The regularized kernel Rényi divergence can be estimated from random samples, which enables effective auditing even in high-dimensional settings.
Contribution: Introduces relaxations of DP using the kernel Rényi divergence and its regularized version.
White-box membership inference auditing | Leino and Fredrikson [107] | Membership inference attack analysis:
Introduces a calibrated attack that significantly improves the precision of membership inference.
Methodology: Exploits the internal workings of deep neural networks to develop a white-box membership inference attack.
Contribution: Demonstrates how MIAs can be utilized as a tool to quantify the privacy risks associated with ML models.
Chen et al. [177] | DP:
Evaluates the effectiveness of differential privacy as a defense mechanism by perturbing the model weights.
Methodology: Evaluates differentially private convolutional neural networks (CNNs) and Lasso regression models with and without sparsity.
Contribution: Investigates the impact of sparsity on privacy guarantees in CNNs and regression models and provides insights into model design for improved privacy.
Black- and white-box membership inference auditing | Nasr et al. [47] | DP:
Determines lower bounds on the effectiveness of MIAs against DPML models and shows that existing privacy guarantees may not be as robust as previously thought.
Methodology: Instantiates a hypothetical attacker that is able to distinguish between two datasets that differ only by a single example. Develops two algorithms, one for crafting these datasets and another for predicting which dataset was used to train a particular model. This approach allows users to analyze the impact of the attacker’s capabilities on the privacy guarantees of DP mechanisms such as DP-SGD.
Contribution: Provides empirical and theoretical insights into the limitations of DP in practical scenarios. It is shown that existing upper bounds may not hold up under stronger attacker conditions, and it is suggested that better upper bounds require additional assumptions on the attacker’s capabilities.
Tramèr et al. [49] | DP:
Investigates the reliability of DP guarantees in an open-source implementation of a DL algorithm.
Methodology: Explores auditing techniques inspired by recent advances in lower bound estimation for DP algorithms. Performs a detailed audit of a specific implementation to assess whether it satisfies the claimed DP guarantees.
Contribution: Shows that the audited implementation does not satisfy the claimed differential privacy guarantee with 99.9% confidence. This emphasizes the importance of audits in identifying errors in purported DP systems and shows that even well-established methods can have critical vulnerabilities.
Nasr et al. [42] | DP:
Provides tight empirical privacy estimates.
Methodology: Adversary instantiation to establish lower bounds for DP.
Contribution: Develops techniques to evaluate the capabilities of attackers, providing lower bounds that inform practical privacy auditing.
Sablayrolles et al. [22] | Membership inference attack analysis:
Analyzes MIAs in both white-box and black-box settings and shows that optimal attack strategies depend primarily on the loss function and not on the model architecture or access type.
Methodology: Derives the optimal strategy for membership inference under certain assumptions about parameter distributions and shows that both white-box and black-box settings can achieve similar effectiveness by focusing on the loss function. Provides approximations for the optimal strategy, leading to new inference methods.
Contribution: Establishes a formal framework for MIAs and presents State-of-the-Art results for various ML models, including logistic regression and complex architectures such as ResNet-101 on datasets such as ImageNet.
Shadow modeling membership inference auditing | Shokri et al. [52] | Membership inference attack analysis:
Develops an MIA that utilizes a shadow training technique.
Methodology: Investigates membership inference attacks using black-box access to models.
Contribution: Quantitatively analyzes how ML models leak membership information and introduces a shadow training technique for attacks.
Salem et al. [112] | Membership inference attack analysis:
Demonstrates that MIAs can be performed without needing to know the architecture of the target model or the distribution of the training data, highlighting a broader vulnerability in ML models.
Methodology: Introduces a new approach called “shadow training”. This involves training multiple shadow models that mimic the behavior of the target model using similar but unrelated datasets. These shadow models are used to generate outputs that inform an attack model designed to distinguish between training and non-training data.
Contribution: Presents a comprehensive assessment of membership inference attacks across different datasets and domains, highlighting the significant privacy risks associated with ML models. It also suggests effective defenses that preserve the benefits of the model while mitigating these risks.
Memorization auditing | Yeom et al. [10] | Membership inference and attribute inference analysis: Analyzes how overfitting and influence can increase the risk of membership inference and attribute inference attacks on ML models, highlighting that overfitting is sufficient but not necessary for these attacks.
Methodology: Conducts both formal and empirical analyses to examine the relationship between overfitting, influence, and privacy risk. Introduces quantitative measures of the advantage of an attacker attempting to infer training data membership or attributes of training data. The study evaluates different ML algorithms to illustrate how generalization errors and influential features impact privacy vulnerability.
Contribution: This work provides new insights into the mechanisms behind membership and attribute inference attacks. It establishes a clear connection between model overfitting and privacy risks, while identifying other factors that can increase an attacker’s advantage.
Carlini et al. [102] | Membership inference attack analysis:
Identifies the risk of unintended memorization in neural networks, especially in generative models trained on sensitive data, and shows that unique sequences can be extracted from the models.
Methodology: Develops a testing framework to quantitatively assess the extent of memorization in neural networks. It uses exposure metrics to assess the likelihood that specific training sequences will be memorized and subsequently extracted. The study includes hands-on experiments with Google’s Smart Compose system to illustrate the effectiveness of their approach.
Contribution: It becomes clear that unintentional memorization is a common problem with different model architectures and training strategies, and it occurs early in training and is not just a consequence of overfitting. Strategies to mitigate the problem are also discussed. These include DP, which effectively reduces the risk of memorization but may introduce utility trade-offs.
Label-only membership inference auditing | Malek et al. [178] | Label differential privacy: Proposes two new approaches, PATE (Private Aggregation of Teacher Ensembles) and ALIBI (additive Laplace noise coupled with Bayesian inference), to achieve strong label differential privacy (LDP) guarantees in machine-learning models.
Methodology: Analyzes and compares the effectiveness of PATE and ALIBI in delivering LDP. It demonstrates how PATE leverages a teacher–student framework to ensure privacy, while ALIBI is more suitable for typical ML tasks by adding Laplacian noise to the model outputs. The study includes a theoretical analysis of privacy guarantees and empirical evaluations of memorization properties for both approaches.
Contribution: It demonstrates that traditional comparisons of algorithms based solely on provable DP guarantees can be misleading, advocating for a more nuanced understanding of privacy in ML. Additionally, it illustrates how strong privacy can be achieved with the proposed methods in specific contexts.
Choquette-Choo et al. [179] | Membership inference attack analysis:
Introduces attacks that infer membership inference based only on labels and evaluate model predictions without access to confidence scores and shows that these attacks can effectively infer membership status.
Methodology: It proposes a novel attack strategy that evaluates the robustness of a model’s predicted labels in the presence of input perturbations such as data augmentation and adversarial examples. It is empirically confirmed that their label-only attacks are comparable to traditional methods that require confidence scores.
Contribution: The study shows that existing protection mechanisms based on confidence value masking are insufficient against label-only attacks. The study also highlights that training with DP or strong L2 regularization is a currently effective strategy to reduce membership leakage, even for outlier data points.
Single-training membership inference auditing | Steinke et al. [43] | DP:
Proposes a novel auditing scheme for DPML systems that can be performed with a single training run and increases the efficiency of privacy assessments.
Methodology: It utilizes the ability to independently add or remove multiple training examples during a single training run. It analyzes the relationship between DP and statistical generalization to develop its auditing framework. This approach can be applied in both black-box and white-box settings with minimal assumptions about the underlying algorithm.
Contribution: It provides a practical solution for privacy auditing in ML models without the need for extensive retraining. This reduces the computational burden while ensuring robust privacy assessment.
Andrew et al. [118] | DP:
Introduces a novel “one-shot” approach for estimating privacy loss in federated learning.
Methodology: Develops a one-shot empirical privacy evaluation method for federated learning.
Contribution: Provides a method for estimating privacy guarantees in federated learning environments using a single training run, improving the efficiency of privacy auditing in decentralized environments without a priori knowledge of the model architecture, tasks or DP training algorithm.
Annamalai et al. [109] | DP:
Proposes an auditing procedure for the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm that provides tighter empirical privacy estimates compared to previous methods, especially in black-box settings.
Methodology: It introduces a novel auditing technique that crafts worst-case initial model parameters, which significantly affects the privacy analysis of DP-SGD.
Contribution: This work improves the understanding of how the initial parameters affect the privacy guarantees in DP-SGD and provides insights for detecting potential privacy violations in real-world implementations, improving the robustness of differential privacy auditing.
Loss-based membership inference auditing | Wang et al. [111] | DP:
Introduces a new differential privacy paradigm called estimate–verify–release (EVR).
Methodology: Develops a randomized privacy verification procedure using Monte Carlo techniques and proposes an estimate–verify–release (EVR) paradigm.
Contribution: Introduces a tight and efficient auditing procedure that converts estimates of privacy parameters into formal guarantees, allowing for effective privacy accounting with only one training run, and leverages the concept of the Privacy Loss Distribution (PLD) to more accurately measure and track the cumulative privacy loss through a sequence of computations.
Confidence score membership inference auditing | Askin et al. [182] | DP:
Introduces a statistical method for quantifying differential privacy in a black-box setting, providing estimators for the optimal privacy parameter and confidence intervals.
Methodology: Introduces a local approach for the statistical quantification of DP in a black-box setting.
Contribution: Develops estimators and confidence intervals for optimal privacy parameters, avoiding event selection issues and demonstrating fast convergence rates through experimental validation.
Metric-based membership inference auditing | Rahman et al. [180] | DP:
Examines the effectiveness of differential privacy in protecting deep-learning models against membership inference attacks.
Methodology: Investigates membership inference attacks on DPML models.
Contribution: Analyzes the vulnerability of DP models to MIAs and shows that they can still leak information about training data under certain conditions, using accuracy and F-score as privacy leakage metrics.
Liu et al. [169] | DP:
Focuses on how differential privacy can be understood through hypothesis testing.
Methodology: Explores statistical privacy frameworks through the lens of hypothesis testing.
Contribution: Provides a comprehensive analysis of privacy frameworks, emphasizing the role of hypothesis testing in evaluating privacy guarantees in ML models, linking precision, recall, and F-score metrics to the privacy parameters; and uses hypothesis testing techniques.
Balle et al. [170] | Rényi DP:
Explores the relationship between differential privacy and hypothesis testing interpretations.
Methodology: Examines hypothesis testing interpretations in relation to Rényi DP.
Contribution: Establishes connections between statistical hypothesis testing and Rényi differential privacy, improving the theoretical understanding of privacy guarantees in the context of ML.
Humphries et al. [181] | Membership inference attack analysis:
Conducts empirical evaluations of various DP models across multiple datasets to assess their vulnerability to membership inference attacks.
Methodology: Analyzes the limitations of DP in the bounding of MIAs.
Contribution: Shows that DP does not necessarily prevent MIAs and points out vulnerabilities in current privacy-preserving techniques.
Ha et al. [41] | DP:
Investigates how DP can be affected by MIAs.
Methodology: Analyzes the impact of MIAs on DP mechanisms.
Contribution: Examines how MIAs can be used as an audit tool to quantify training data leaks in ML models and proposes new metrics to assess vulnerability disparities across demographic groups.
Data augmentation-based auditing | Kong et al. [184] | Membership inference attack analysis:
Investigates the relationship between forgeability in ML models and the vulnerability to MIAs and uncovers vulnerabilities that can be exploited by attackers.
Methodology: It proposes a framework to analyze forgeability—defined as the ability of an attacker to generate outputs that mimic a model’s behavior—and its connection to membership inference. It conducts empirical evaluations to show how certain model properties influence both forgeability and the risk of MIAs.
Contribution: It shows how the choice of model design can inadvertently increase vulnerability to MIAs. This suggests that understanding forgeability can help in the development of secure ML systems.
Data-poisoning auditing
Influence-function analysis | Koh and Liang [187] | Model interpretation:
Investigates how influence functions can be used to trace predictions back to training data and thus gain insight into the behavior of the model without direct access to the internal workings of the model.
Methodology: Uses influence functions from robust statistics to find out which training points have a significant influence on a particular prediction. Develops an efficient implementation that only requires oracle access to gradients and Hessian-vector products, allowing scalability in modern ML contexts.
Contribution: Demonstrates the usefulness of influence functions for various applications, including understanding model behavior, debugging, detecting dataset errors, and creating attacks on training sets, improving the interpretability of black-box models.
Jayaraman and Evans [21] | DP:
Investigates the limitations of DPML, particularly focusing on the impact of the privacy parameter ϵ on privacy leakage.
Methodology: Evaluates the practical implementation of differential privacy in machine-learning systems.
Contribution: Conducts an empirical analysis of differentially private machine-learning algorithms, assessing their performance and privacy guarantees in real-world applications.
Lu et al. [61] | DP:
Focuses on the auditing of DPML models for the empirical evaluation of privacy guarantees.
Methodology: Proposes a general framework for auditing differentially private machine-learning models.
Contribution: Introduces a comprehensive tight auditing framework that assesses the effectiveness and robustness of differential privacy mechanisms in various machine-learning contexts.
Gradient manipulation in DP training | Chen et al. [188] | Gradient leakage analysis:
Investigates the potential for training data leakage from gradients in neural networks, highlighting that gradients can be exploited to reconstruct training images.
Methodology: Analyzes training-data leakage from gradients in neural networks for image classification.
Contribution: Provides a theoretical framework for understanding how training data can be reconstructed from gradients, proposing a metric to measure model security against such attacks.
Xie et al. [189] | Generalization improvement:
Focuses on improving generalization in DL models through the manipulation of stochastic gradient noise (SGN).
Methodology: Introduces Positive–Negative Momentum (PNM) to manipulate stochastic gradient noise for improved generalization in machine-learning models.
Contribution: Proposes a novel approach that demonstrates the convergence guarantees and generalization of the model using PNM approach that leverages stochastic gradient noise more effectively without increasing computational costs.
Ma et al. [54] | DP:
Investigates the resilience of differentially private learners against data-poisoning attacks.
Methodology: Designs specific attack algorithms targeting two common approaches in DP, objective perturbation and output perturbation.
Contribution: Analyzes vulnerabilities of differentially private models to data-poisoning attacks and proposes defensive strategies to mitigate these risks.
Jagielski et al. [46] | DP:
Investigates the practical privacy guarantees of Differentially Private Stochastic Gradient Descent (DP-SGD).
Methodology: Audits differentially private machine-learning models, specifically examining the privacy guarantees of stochastic gradient descent (SGD).
Contribution: Evaluates the effectiveness of differential privacy mechanisms in SGD, providing insights into how private the training process really is under various conditions.
Empirical evaluation of privacy loss | Steinke and Ullman [191] | DP:
Establishes a new lower bound on the sample complexity of (ϵ, δ)-differentially private algorithms for accurately answering statistical queries.
Methodology: Derives a necessary condition for the number of records, n, required to satisfy (ϵ, δ)-differential privacy while achieving a specified accuracy.
Contribution: Introduces a framework that interpolates between pure and approximate differential privacy, providing optimal sample size requirements for answering statistical queries in high-dimensional databases.
Kairouz et al. [192] | DP:
Presents a new approach for training DP models without relying on sampling or shuffling, addressing the limitations of Differentially Private Stochastic Gradient Descent (DP-SGD).
Methodology: Proposes a method for practical and private deep learning without relying on sampling through shuffling techniques.
Contribution: Develops an auditing procedure for evaluating the effectiveness of shuffling in DPML models by leveraging various network parameters and likelihood ratio functions.
Privacy violation | Li et al. [193] | Information privacy:
Reviews various theories related to online information privacy, analyzing how they contribute to understanding privacy concerns.
Methodology: Conducts a critical review of theories in online information privacy research and proposes an integrated framework.
Contribution: Conducts a critical review of theories in online information privacy research and proposes an integrated framework.
Hay et al. [194] | DP:
Emphasizes the importance of rigorous evaluation of DP algorithms.
Methodology: Develops DPBench, a benchmarking suite for evaluating differential privacy algorithms.
Contribution: Proposes a systematic benchmarking methodology that includes various metrics to evaluate the privacy loss, utility, and robustness of differentially private algorithms.
Ding et al. [45] | DP:
Addresses the issue of verifying whether algorithms claiming DP actually adhere to their stated privacy guarantees.
Methodology: Develops a statistical approach to detect violations of differential privacy in algorithms.
Contribution: Proposes the first counterexample generator that produces human-understandable counterexamples specifically designed to detect violations of DP in algorithms.
Wang et al. [195] | DP:
Introduces CheckDP, an automated framework designed to prove or disprove claims of DP for algorithms.
Methodology: Utilizes a bidirectional Counterexample-Guided Inductive Synthesis (CEGIS) approach embedded in CheckDP, allowing it to generate proofs for correct systems and counterexamples for incorrect ones.
Contribution: Presents an integrated approach that automates the verification process for differential privacy claims, enhancing the reliability of privacy-preserving mechanisms.
Barthe et al. [196] | DP:
Addresses the problem of deciding whether probabilistic programs satisfy DP when restricted to finite inputs and outputs.
Methodology: Develops a decision procedure that leverages type systems and program analysis techniques to check for differential privacy in a class of probabilistic computations.
Contribution: Explores theoretical aspects of differential privacy, providing insights into the conditions under which differential privacy can be effectively decided in computational settings.
Niu et al. [165] | DP:
Presents DP-Opt, a framework designed to identify violations of DP in algorithms by optimizing for counterexamples.
Methodology: Utilizes optimization techniques to search for counterexamples that demonstrate when the lower bounds on differential privacy exceed the claimed values.
Contribution: Develops a disprover that searches for counterexamples where the lower bounds on differential privacy exceed claimed values, enhancing the ability to detect and analyze privacy violations in algorithms.
Lokna et al. [48] | DP:
Introduces a novel method for auditing (ϵ, δ)-differential privacy, highlighting that many (ϵ, δ) pairs can be grouped, as they result in the same algorithm.
Methodology: Develops a novel method for auditing differential privacy violations using a combined privacy parameter, ρ.
Contribution: Introduces Delta-Siege, an auditing tool that efficiently discovers violations of differential privacy across multiple claims simultaneously, demonstrating superior performance compared to existing tools and providing insights into the root causes of vulnerabilities.
Model inversion auditing
Sensitivity analysis | Fredrikson et al. [100] | Model inversion attack analysis: Explores vulnerabilities in ML models through model inversion attacks that exploit confidence information and pose significant risks to user privacy.
Methodology: A new class of model inversion attacks is developed that exploits the confidence values given next to the predictions. It empirically evaluates these attacks in two contexts: decision trees for lifestyle surveys and neural networks for face recognition. The study includes experimental results that show how attackers can infer sensitive information and recover recognizable images based solely on model outputs.
Contribution: It demonstrates the effectiveness of model inversion attacks in different contexts and presents basic countermeasures, such as training algorithms that obfuscate confidence values, that can mitigate the risk of these attacks while preserving the utility.
Wang et al. [136] | DP:
Proposes a DP regression model that aims to protect against model inversion attacks while preserving the model utility.
Methodology: A novel approach is presented that utilizes the functional mechanism to perturb the coefficients of the regression model. It analyzes how existing DP mechanisms cannot effectively prevent model inversion attacks. It provides a theoretical analysis and empirical evaluations showing that their approach can balance privacy for sensitive and non-sensitive attributes while preserving model performance.
Contribution: It demonstrates the limitations of traditional DP in protecting sensitive attributes in model inversion attacks and presents a new method that effectively mitigates these risks while ensuring that the utility of the regression model is preserved.
Hitaj et al. [197] | Information leakage analysis: Investigates vulnerabilities in collaborative DL models and shows that these models are susceptible to information leakage despite attempts to protect privacy through parameter sharing and DP.
Methodology: Develops a novel attack that exploits the real-time nature of the learning process in collaborative DL environments, showing how an attacker can train a generative adversarial network (GAN) to generate prototypical samples from the private training data of honest participants. It criticizes existing privacy-preserving techniques, particularly record-level DP at the dataset level, and highlights their ineffectiveness against the proposed attack.
Contribution: Reveals fundamental flaws in the design of collaborative DL systems and emphasizes that current privacy-preserving measures do not provide adequate protection against sophisticated attacks such as those enabled by GANs. It calls for a re-evaluation of privacy-preserving strategies in decentralized ML settings.
Song et al. [198] | Model inversion attack analysis: Investigates the risks of overfitting in ML models and shows that models can inadvertently memorize sensitive training data, leading to potential privacy violations.
Methodology: Analyzes different ML models to assess their vulnerability to memorization attacks. Introduces a framework to quantify the amount of information a model stores about its training data and conducts empirical experiments to illustrate how certain models can reconstruct sensitive information from their outputs.
Contribution: The study highlights the implications of model overfitting on privacy, showing that even well-regulated models can leak sensitive information. The study emphasizes the need for robust privacy-preserving techniques in ML to mitigate these risks.
Fang et al. [135] | DP:
Provides a formal guarantee that the output of the analysis will not change significantly if an individual’s data are altered.
Methodology: Utilizes a functional mechanism that adds calibrated noise to the regression outputs, balancing privacy protection with data utility.
Contribution: Introduces a functional mechanism for regression analysis under DP. Evaluates the performance of the model in terms of noise reduction and resilience to model inversion attacks.
Cummings et al. [199] | DP:
Ensures that the output of the regression analysis does not change significantly when the data of a single individual are changed.
Methodology: Introduces individual sensitivity preprocessing techniques for enhancing data privacy.
Contribution: Proposes preprocessing methods that adjust data sensitivity on an individual level, improving privacy protection while allowing for meaningful data analysis. Introduces an individual sensitivity metric technique to improve the accuracy of private data.
Gradient and weight analyses | Zhu et al. [200] | Model inversion attack analysis:
Utilizes gradients to reconstruct inputs from model outputs.
Methodology: Explores model inversion attacks enhanced by adversarial examples in ML models.
Contribution: Demonstrates how adversarial examples can significantly boost the effectiveness of model inversion attacks, providing insights into potential vulnerabilities in machine-learning systems.
Zhu et al. [201] | Gradient leakage analysis:
Exchanges gradients that lead to the leakage of private training data.
Methodology: Investigates deep leakage from gradients in machine-learning models.
Contribution: Analyzes how gradients can leak sensitive information about training data, contributing to the understanding of privacy risks associated with model training.
Huang et al. [202] | Gradient inversion attack analysis:
Evaluates gradient inversion attacks in federated learning.
Methodology: Empirically evaluates gradient inversion attacks and existing defenses under realistic federated learning settings.
Contribution: Assesses the effectiveness of gradient inversion attacks in federated learning settings and proposes defenses to mitigate these vulnerabilities.
Wu et al. [203] | Gradient inversion attack analysis:
Introduces a new gradient inversion method, Learning to Invert (LIT).
Methodology: Develops adaptive attacks for gradient inversion in federated learning environments.
Contribution: Introduces simple adaptive attack strategies to enhance the success rate of gradient inversion attacks (gradient compression), highlighting the risks in federated learning scenarios.
Zhu et al. [204] | Gradient inversion attack analysis:
Proposes a generative gradient inversion attack (GGI) in federated learning contexts.
Methodology: Utilizes generative models to perform gradient inversion without requiring prior knowledge of the data distribution.
Contribution: Presents a novel attack that utilizes generative models to enhance gradient inversion attacks, demonstrating new avenues for information leakage in collaborative settings.
Empirical privacy loss | Yang et al. [205] | DP:
Proposes a method to enhance privacy by purifying predictions.
Methodology: Proposes a defense mechanism against model inversion and membership inference attacks through prediction purification.
Contribution: Demonstrates that a purifier dedicated to one type of attack can effectively defend against the other, establishing a connection between model inversion and membership inference vulnerabilities, employing a prediction purification technique.
Zhang et al. [206] | DP:
Incorporates additional noise mechanisms specifically designed to counter model inversion attacks.
Methodology: Broadens differential privacy frameworks to enhance protection against model inversion attacks in deep learning.
Contribution: Introduces new techniques to strengthen differential privacy guarantees specifically against model inversion, improving the robustness of deep-learning models against such attacks, and proposes class and subclass DP within the context of random forest algorithms.
Reconstruction test | Manchini et al. [207] | DP:
Uses differential privacy in regression models that account for heteroscedasticity.
Methodology: Proposes a new approach to data differential privacy using regression models under heteroscedasticity.
Contribution: Develops methods to enhance differential privacy in regression analysis, particularly for datasets with varying levels of noise, improving privacy guarantees for ML applications.
Park et al. [139] | DP:
Evaluates the effectiveness of differentially private learning models against model inversion attacks.
Methodology: Evaluates differentially private learning against model inversion attacks through an attack-based evaluation method.
Contribution: Introduces an evaluation framework that assesses the robustness of differentially private models against model inversion attacks, providing insights into the effectiveness of privacy-preserving techniques.
Model extraction auditing
Query analysis | Carlini et al. [101] | Model extraction attack analysis:
Demonstrates that large language models, such as GPT-2, are vulnerable to training data-extraction attacks.
Methodology: Employs a two-stage approach for training data extraction, suffix generation and suffix ranking.
Contribution: Shows that attackers can recover individual training examples from large language models by querying them, highlighting vulnerabilities in model training processes and discussing potential safeguards.
Dziedzic et al. [209] | Model extraction attack analysis:
Addresses model extraction attacks, where attackers can steal ML models by querying them.
Methodology: Proposes a calibrated proof of work mechanism to increase the cost of model extraction attacks.
Contribution: Introduces a calibrated proof-of-work approach that raises the resource requirements for adversaries attempting to extract models, thereby enhancing the security of machine-learning systems against such attacks.
Li et al. [210] | Local DP:
Introduces a personalized local differential privacy (PLDP) mechanism designed to protect regression models from model extraction attacks.
Methodology: Uses a novel perturbation mechanism that adds high-dimensional Gaussian noise to the model outputs based on personalized privacy parameters.
Contribution: Personalized local differential privacy (PLDP) ensures that individual user data are perturbed before being sent to the model, thereby protecting sensitive information from being extracted through queries.
Li et al. [146] | Model extraction attack analysis:
Proposes a framework designed to protect object detection models from model extraction attacks by focusing on feature space coverage.
Methodology: Uses a novel detection framework that identifies suspicious users based on their query traffic and feature coverage.
Contribution: Develops a detection framework that identifies suspicious users based on feature coverage in query traffic, employing an active verification module to confirm potential attackers, thereby enhancing the security of object detection models and distinguishing between malicious and benign queries.
Zheng et al. [211] | Boundary Differential Privacy (ϵ-BDP):
Introduces Boundary Differential Privacy (ϵ-BDP), which protects against model extraction attacks by obfuscating prediction responses near the decision boundary.
Methodology: Uses a perturbation algorithm called boundary randomized response, which achieves ϵ-BDP by adding noise to the model’s outputs based on their proximity to the decision boundary.
Contribution: Introduces a novel layer that obfuscates prediction responses near the decision boundary to prevent adversaries from inferring model parameters, demonstrating effectiveness through extensive experiments.
Yan et al. [212] | DP:
Proposes a monitoring-based differential privacy (MDP) mechanism that enhances the security of machine-learning models against query flooding attacks.
Methodology: Introduces a novel real-time model extraction status assessment scheme called “Monitor”, which evaluates the model’s exposure to potential extraction based on incoming queries.
Contribution: Proposes a mechanism that monitors query patterns to detect and mitigate model extraction attempts, enhancing the resilience of machine-learning models against flooding attacks.
Property inference auditing
Evaluating property sensitivity with model outputs | Suri et al. [213] | Distribution inference attack analysis:
Investigates distribution inference attacks, which aim to infer statistical properties of the training data used by ML models.
Methodology: Introduces a distribution inference attack that infers statistical properties of training data using a KL divergence approach.
Contribution: Develops a novel black-box attack that outperforms existing white-box methods, evaluating the effectiveness of various defenses against distribution inference risks; performs disclosure at three granularities, namely distribution, user, and record levels; and proposes metrics to quantify observed leakage from models under attack.
Property inference framework | Ganju et al. [214] | Property inference attack analysis:
Explores property inference attacks on fully connected neural networks (FCNNs), demonstrating that attackers can infer global properties of the training data.
Methodology: Leverages permutation invariant representations to reduce the complexity of inferring properties from FCNNs.
Contribution: Analyzes how permutation invariant representations can be exploited to infer sensitive properties of training data, highlighting vulnerabilities in neural network architectures.
Melis et al. [215] | Feature leakage analysis:
Reveals that collaborative learning frameworks inadvertently leak sensitive information about participants’ training data through model updates.
Methodology: Uses both passive and active inference attacks to exploit unintended feature leakage.
Contribution: Examines how collaborative learning frameworks can leak sensitive features, providing insights into the risks associated with sharing models across different parties.
Empirical evaluation of linear queries | Huang and Zhou [216] | DP:
Discusses how DP mechanisms can inadvertently leak sensitive information when linear queries are involved.
Methodology: Studies unexpected information leakage in differential privacy due to linear properties of queries.
Contribution: Analyzes how certain (linear) query structures can lead to information leakage despite differential privacy guarantees, suggesting improvements for privacy-preserving mechanisms.
Analysis of DP implementation | Ben Hamida et al. [217] | DP:
Discusses how differential privacy (DP) enhances the privacy of machine-learning models by ensuring that individual data contributions do not significantly affect the model’s output.
Methodology: Explores various techniques for implementing DPML, including adding noise to gradients during training and employing mechanisms that ensure statistical outputs mask individual contributions.
Contribution: Explores the interplay between differential privacy techniques and their effectiveness in enhancing model security against various types of attacks.
Song et al. [218] | Privacy risk evaluation:
Methodology: Conducts a systematic evaluation of privacy risks in machine-learning models across different scenarios.
Contribution: Provides a comprehensive framework for assessing the privacy risks associated with machine-learning models, identifying key vulnerabilities and suggesting mitigation strategies.

References

  1. Choudhury, O.; Gkoulalas-Divanis, A.; Salonidis, T.; Sylla, I.; Park, Y.; Hsu, G.; Das, A. Differential Privacy-Enabled Federated Learning For Sensitive Health Data. arXiv 2019, arXiv:1910.02578. Available online: https://arxiv.org/abs/1910.02578 (accessed on 1 December 2024).
  2. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise To Sensitivity In Private Data Analysis. In Theory of Cryptography; Halevi, S., Rabin, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284. [Google Scholar]
  3. Williamson, S.M.; Prybutok, V. Balancing Privacy and Progress: A Review of Privacy Challenges, Systemic Oversight, and Patient Perceptions in AI-Driven Healthcare. Appl. Sci. 2024, 14, 675. [Google Scholar] [CrossRef]
  4. Barbierato, E.; Gatti, A. The Challenges of Machine Learning: A Critical Review. Electronics 2024, 13, 416. [Google Scholar] [CrossRef]
  5. Noor, M.H.M.; Ige, A.O. A Survey on State-of-the-art Deep Learning Applications and Challenges. arXiv 2024, arXiv:2403.17561. Available online: https://arxiv.org/abs/2403.17561 (accessed on 1 December 2024).
  6. Du Pin Calmon, F.; Fawaz, N. Privacy Against Statistical Inference. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing, Allerton, Monticello, IL, USA, 1–5 October 2012; pp. 1401–1408. [Google Scholar]
  7. Dehghani, M.; Azarbonyad, H.; Kamps, J.; de Rijke, M. Share Your Model Instead of Your Data: Privacy Preserving Mimic Learning for Ranking. arXiv 2017, arXiv:1707.07605. Available online: https://arxiv.org/abs/1707.07605 (accessed on 1 December 2024).
  8. Bouke, M.; Abdullah, A. An Empirical Study Of Pattern Leakage Impact During Data Preprocessing on Machine Learning-Based Intrusion Detection Models Reliability. Expert Syst. Appl. 2023, 230, 120715. [Google Scholar] [CrossRef]
  9. Xu, J.; Wu, Z.; Wang, C.; Jia, X. Machine Unlearning: Solutions and Challenges. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2150–2168. [Google Scholar] [CrossRef]
  10. Yeom, S.; Giacomelli, I.; Fredrikson, M.; Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. In Proceedings of the 2018 IEEE 31st Computer Security Foundations Symposium (CSF), Oxford, UK, 9–12 July 2018; pp. 268–282. [Google Scholar]
  11. Li, Y.; Yan, H.; Huang, T.; Pan, Z.; Lai, J.; Zhang, X.; Chen, K.; Li, J. Model Architecture Level Privacy Leakage In Neural Networks. Sci. China Inf. Sci. 2024, 67, 3. [Google Scholar] [CrossRef]
  12. Del Grosso, G.; Pichler, G.; Palamidessi, C.; Piantanida, P. Bounding information leakage in machine learning. Neurocomputing 2023, 534, 1–17. [Google Scholar] [CrossRef]
  13. McSherry, F.; Talwar, K. Mechanism Design via Differential Privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), Providence, RI, USA, 20–23 October 2007; pp. 94–103. [Google Scholar]
  14. Mulder, V.; Humbert, M. Differential privacy. In Trends in Data Protection and Encryption Technologies; Springer: Berlin/Heidelberg, Germany, 2023; pp. 157–161. [Google Scholar]
  15. Gong, M.; Xie, Y.; Pan, K.; Feng, K.; Qin, A. A Survey on Differential Private Machine Learning. IEEE Comput. Intell. Mag. 2020, 15, 49–64. [Google Scholar] [CrossRef]
  16. Liu, B.; Ding, M.; Shaham, S.; Rahayu, W.; Farokhi, F.; Lin, Z. When Machine Learning Meets Privacy: A Survey and Outlook. ACM Comput. Surv. 2021, 54, 1–31. [Google Scholar] [CrossRef]
  17. Blanco-Justicia, A.; Sanchez, A.; Domingo-Ferrer, J.; Muralidhar, K. A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning. ACM Comput. Surv. 2023, 55, 1–16. [Google Scholar] [CrossRef]
  18. Zheng, H.; Ye, Q.; Hu, H.; Fang, C.; Shi, J. Protecting Decision Boundary of Machine Learning Model With Differential Private Perturbation. IEEE Trans. Dependable Secur. Comput. 2022, 19, 2007–2022. [Google Scholar] [CrossRef]
  19. Ponomareva, N.; Hazimeh, H.; Kurakin, A.; Xu, Z.; Denison, C.; McMahan, H.B.; Vassilvitskii, S.; Chien, S.; Thakurta, A.G. A Practical Guide to Machine Learning with Differential Privacy. J. Artif. Intell. Res. 2023, 77, 1113–1201. [Google Scholar] [CrossRef]
  20. Choquette-Choo, C.A.; Dullerud, N.; Dziedzic, A.; Zhang, Y.; Jha, S.; Papernot, N.; Wang, X. CaPC Learning: Confidential and Private Collaborative Learning. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021. [Google Scholar]
  21. Jayaraman, B.; Evans, D. Evaluating Differentially Private Machine Learning in Practice. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 1895–1912. [Google Scholar] [CrossRef]
  22. Sablayrolles, A.; Douze, M.; Schmid, C.; Ollivier, Y.; Jégou, H. White-box vs black-box: Bayes optimal strategies for membership inference. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 5558–5567. [Google Scholar]
  23. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar] [CrossRef]
  24. Bagdasaryan, E. Differential Privacy Has Disparate Impact on Model Accuracy. Adv. Neural Inf. Process. Syst. 2019, 32, 161263. [Google Scholar] [CrossRef]
  25. Tran, C.; Dinh, M.H. Differential Private Empirical Risk Minimization under the Fairness Lens. Adv. Neural Inf. Process. Syst. 2021, 33, 27555–27565. [Google Scholar] [CrossRef]
  26. Bichsel, B.; Steffen, S.; Bogunovic, I.; Vechev, M. DP-Sniper: Black-Box Discovery Of Differential Privacy Violations Using Classifiers. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 391–409. [Google Scholar] [CrossRef]
  27. Dwork, C. Differential Privacy. In Automata, Languages and Programming; Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., Eds.; Lecture Notes in Computer Science; Springer: Berlin, Germany, 2006; pp. 1–12. [Google Scholar] [CrossRef]
  28. He, J.; Cai, L.; Guan, X. Differential Private Noise Adding Mechanism and Its Application on Consensus Algorithm. IEEE Trans. Signal Process. 2020, 68, 4069–4082. [Google Scholar] [CrossRef]
  29. Wang, R.; Fung, B.C.M.; Zhu, Y.; Peng, Q. Differentially Private Data Publishing for Arbitrary Partitioned Data. Inf. Sci. 2021, 553, 247–265. [Google Scholar] [CrossRef]
  30. Baraheem, S.S.; Yao, Z. A Survey on Differential Privacy with Machine Learning and Future Outlook. arXiv 2022, arXiv:2211.10708. Available online: https://arxiv.org/abs/2211.10708 (accessed on 1 December 2024).
  31. Dwork, C.; Roth, A. The Algorithmic Foundations Of Differential Privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407. Available online: https://www.nowpublishers.com/article/Details/TCS-042 (accessed on 1 December 2024). [CrossRef]
  32. Chadha, K.; Jagielski, M.; Papernot, N.; Choquette-Choo, C.A.; Nasr, M. Auditing Private Prediction. arXiv 2024, arXiv:2402.0940. [Google Scholar] [CrossRef]
  33. Papernot, N.; Abadi, M.; Erlingsson, Ú.; Goodfellow, I.; Talwar, K. Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. International Conference on Learning Representations. 2016. Available online: https://openreview.net/forum?id=HkwoSDPgg (accessed on 1 December 2024).
  34. Bernau, D.; Robl, J.; Grassal, P.W.; Schneider, S.; Kerschbaum, F. Comparing Local and Central Differential Privacy Using Membership Inference Attacks. In IFIP Annual Conference on Data and Applications Security and Privacy; Springer: Berlin/Heidelberg, Germany, 2021; pp. 22–42. [Google Scholar] [CrossRef]
  35. Hsu, J.; Gaboardi, M.; Haeberlen, A.; Khanna, S.; Narayan, A.; Pierce, B.C.; Roth, A. Differential Privacy: An Economic Method for Choosing Epsilon. In Proceedings of the Computer Security Foundations Workshop, Vienna, Austria, 19–22 July 2014; pp. 398–410. [Google Scholar] [CrossRef]
  36. Mehner, L.; Voigt, S.N.V.; Tschorsch, F. Towards Explaining Epsilon: A Worst-Case Study of Differential Privacy Risks. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy Workshop, Euro S and PW, Virtual, 6–10 September 2021; pp. 328–331. [Google Scholar]
  37. Busa-Fekete, R.I.; Dick, T.; Gentile, C.; Medina, A.M.; Smith, A.; Swanberg, M. Auditing Privacy Mechanisms via Label Inference Attacks. arXiv 2024, arXiv:2406.01797. Available online: https://arxiv.org/abs/2406.02797 (accessed on 1 December 2024).
  38. Desfontaines, D.; Pejó, B. SoK: Differential Privacies. arXiv 2022, arXiv:1906.01337. Available online: https://arxiv.org/abs/1906.01337 (accessed on 1 December 2024). [CrossRef]
  39. Lycklama, H.; Viand, A.; Küchler, N.; Knabenhans, C.; Hithnawi, A. Holding Secrets Accountable: Auditing Privacy-Preserving Machine Learning. arXiv 2024, arXiv:2402.15780. Available online: https://arxiv.org/abs/2402.15780 (accessed on 1 December 2024).
  40. Kong, W.; Medina, A.M.; Ribero, M.; Syed, U. DP-Auditorium: A Large Scale Library for Auditing Differential Privacy. arXiv 2023, arXiv:2307.05608. Available online: https://arxiv.org/abs/2307.05608 (accessed on 1 December 2024).
  41. Ha, T.; Vo, T.; Dang, T.K. Differential Privacy Under Membership Inference Attacks. Commun. Comput. Inf. Sci. 2023, 1925, 255–269. [Google Scholar]
  42. Nasr, M.; Hayes, J.; Steinke, T.; Balle, B.; Tramer, F.; Jagielski, M.; Carlini, N.; Terzis, A. Tight Auditing of Differentially Private Machine Learning. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), San Francisco, CA, USA, 9–11 August 2023; pp. 1631–1648. [Google Scholar]
  43. Steinke, T.; Nasr, M.; Jagielski, M. Privacy Auditing with One (1) Training Run. arXiv 2023, arXiv:2305.08846. Available online: https://arxiv.org/abs/2305.08846 (accessed on 1 December 2024).
  44. Wairimu, S.; Iwaya, L.H.; Fritsch, L.; Lindskog, S. On the Evaluation of Privacy Impact Assessment and Privacy Risk Assessment Methodologies: A Systematic Literature Review. IEEE Access 2024, 12, 19625–19650. [Google Scholar] [CrossRef]
  45. Ding, Z.; Wang, Y.; Wang, G.; Zhang, D.; Kifer, D. Detecting Violations Of Differential Privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 475–489. [Google Scholar] [CrossRef]
  46. Jagielski, M.; Ullman, J.; Oprea, A. Auditing Differentially Private Machine Learning: How Private is Private sgd? Adv. Neural Inf. Process. Syst. 2020, 33, 22205–22216. [Google Scholar] [CrossRef]
  47. Nasr, M.; Song, S.; Thakurta, A.; Papernot, N.; Carlini, N. Adversary instantiation: Lower bounds for differentially private machine learning. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 866–882. [Google Scholar]
  48. Lokna, J.; Paradis, A.; Dimitrov, D.I.; Vechev, M. Group and Attack: Auditing Differential Privacy. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23), Copenhagen, Denmark, 26–30 November 2023; ACM: New York, NY, USA, 2023; pp. 1–22. Available online: https://dl.acm.org/doi/10.1145/3576915.3616607 (accessed on 1 December 2024).
  49. Tramèr, F.; Terzis, A.; Steinke, T.; Song, S.; Jagielski, M.; Carlini, N. Debugging differential privacy: A case study for privacy auditing. arXiv 2022, arXiv:2202.12219. Available online: https://arxiv.org/abs/2202.12219 (accessed on 1 December 2024).
  50. Kifer, D.; Messing, S.; Roth, A.; Thakurta, A.; Zhang, D. Guidelines for Implementing and Auditing Differentially Private Systems. arXiv 2020, arXiv:2002.04049. Available online: https://arxiv.org/abs/2002.04049 (accessed on 1 December 2024).
  51. Homer, N.; Szelinger, S.; Redman, M.; Duggan, D.; Tembe, W.; Muehling, J.; Pearson, J.V.; Stephan, D.A.; Nelson, S.F.; Craig, D.W. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008, 4, e1000167. [Google Scholar] [CrossRef] [PubMed]
  52. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks against Machine Learning Models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (S&P), San Jose, CA, USA, 22–26 May 2017; pp. 3–18. [Google Scholar]
  53. Cui, G.; Ge, L.; Zhao, Y.; Fang, T. A Membership Inference Attack Defense Method Based on Differential Privacy and Data Enhancement. In Proceedings of the Communication in Computer and Information Science, Manchester, UK, 9–11 September 2024; Volume 2015 CCIS, pp. 258–270. [Google Scholar]
  54. Ma, Y.; Zhu, X.; Hsu, J. Data Poisoning against Differentially-Private Learners: Attacks and Defences. arXiv 2019, arXiv:1903.09860. Available online: https://arxiv.org/abs/1903.09860 (accessed on 1 December 2024).
  55. Cinà, A.E.; Grosse, K.; Demontis, A.; Biggio, B.; Roli, F.; Pelillo, M. Machine Learning Security Against Data Poisoning: Are We There Yet? Computer 2024, 7, 26–34. [Google Scholar] [CrossRef]
  56. Cheng, Z.; Li, Z.; Zhang, L.; Zhang, S. Differentially Private Machine Learning Model against Model Extraction Attack. In Proceedings of the IEEE 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), Rhodes, Greece, 2–6 November 2020; pp. 722–728. Available online: https://ieeexplore.ieee.org/document/9291542 (accessed on 3 December 2024).
  57. Miura, T.; Hasegawa, S.; Shibahara, T. MEGEX: Data-free model extraction attack against gradient-based explainable AI. arXiv 2021, arXiv:2107.08909. Available online: https://arxiv.org/abs/2107.08909 (accessed on 1 December 2024).
  58. Ye, Z.; Luo, W.; Naseem, M.L.; Yang, X.; Shi, Y.; Jia, Y. C2FMI: Coarse-to-Fine Black-Box Model Inversion Attack. IEEE Trans. Dependable Secur. Comput. 2024, 21, 1437–1450. Available online: https://ieeexplore.ieee.org/document/10148574 (accessed on 3 December 2024). [CrossRef]
  59. Qiu, Y.; Yu, H.; Fang, H.; Yu, W.; Chen, B.; Wang, X.; Xia, S.-T.; Xu, K. MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense. arXiv 2024, arXiv:2410.05159. Available online: https://arxiv.org/abs/2410.05159 (accessed on 1 December 2024).
  60. Stock, J.; Lange, L.; Erhard, R.; Federrath, H. Property Inference as a Regression Problem: Attacks and Defense. In Proceedings of the International Conference on Security and Cryptography, Bengaluru, India, 18–19 April 2024; pp. 876–885. Available online: https://www.scitepress.org/publishedPapers/2024/128638/pdf/index.html (accessed on 3 December 2024).
  61. Lu, F.; Munoz, J.; Fuchs, M.; LeBlond, T.; Zaresky-Williams, E.; Raff, E.; Ferraro, F.; Testa, B. A General Framework for Auditing Differentially Private Machine Learning. In Advances in Neural Information Processing Systems; Oh, A.H., Belgrave, A., Cho, K., Eds.; The MIT Press: Cambridge, MA, USA, 2022; Available online: https://openreview.net/forum?id=AKM3C3tsSx3 (accessed on 1 December 2024).
  62. Zanella-Béguelin, S.; Wutschitz, L.; Tople, S.; Salem, A.; Rühle, V.; Paverd, A.; Naseri, M.; Köpf, B.; Jones, D. Bayesian Estimation Of Differential Privacy. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Volume 202, pp. 40624–40636. [Google Scholar]
  63. Cowan, E.; Shoemate, M.; Pereira, M. Hands-On Differential Privacy; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2024; ISBN 9781492097747. [Google Scholar]
  64. Bailie, J.; Gong, R. Differential Privacy: General Inferential Limits via Intervals of Measures. Proc. Mach. Learn. Res. 2023, 215, 11–24. Available online: https://proceedings.mlr.press/v215/bailie23a/bailie23a.pdf (accessed on 3 December 2024).
  65. Kilpala, M.; Kärkkäinen, T. Artificial Intelligence and Differential Privacy: Review of Protection Estimate Models. In Artificial Intelligence for Security: Enhancing Protection in a Changing World; Springer Nature: Cham, Switzerland, 2024; pp. 35–54. [Google Scholar]
  66. Balle, B.; Wang, Y.-X. Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 394–403. Available online: http://proceedings.mlr.press/v80/balle18a/balle18a.pdf (accessed on 3 December 2024).
  67. Chen, B.; Hale, M. The Bounded Gaussian Mechanism for Differential Privacy. J. Priv. Confidentiality 2024, 14, 1. [Google Scholar] [CrossRef]
  68. Zhang, K.; Zhang, Y.; Sun, R.; Tsai, P.-W.; Ul Hassan, M.; Yuan, X.; Xue, M.; Chen, J. Bounded and Unbiased Composite Differential Privacy. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 19–23 May 2024; pp. 972–990. [Google Scholar]
  69. Nanayakkara, P.; Smart, M.A.; Cummings, R.; Kaptchuk, G. What Are the Chances? Explaining the Epsilon Parameter in Differential Privacy. In Proceedings of the 32nd USENIX Security Symposium, Anaheim, CA, USA, 9–11 August 2023; Volume 3, pp. 1613–1630. [Google Scholar] [CrossRef]
  70. Canonne, C.; Kamath, G.; McMillan, A.; Smith, A.; Ullman, J. The Structure Of Optimal Private Tests For Simple Hypotheses. In Proceedings of the Annual ACM Symposium on Theory of Computing, Phoenix, AZ, USA, 23–26 June 2019; pp. 310–321. Available online: https://arxiv.org/abs/1811.11148 (accessed on 3 December 2024).
  71. Dwork, C.; Feldman, V. Privacy-preserving Prediction. arXiv 2018, arXiv:1803.10266. Available online: https://arxiv.org/abs/1803.10266 (accessed on 1 December 2024).
  72. Mironov, I. Rényi Differential Privacy. In Proceedings of the 30th IEEE Computer Security Foundations Symposium, CSF, Santa Barbara, CA, USA, 21–25 August 2017; pp. 263–275. [Google Scholar] [CrossRef]
  73. Sarathy, R.; Muralidhar, K. Evaluating Laplace noise addition to satisfy differential privacy for numeric data. Trans. Data Priv. 2011, 4, 1–17. [Google Scholar] [CrossRef]
  74. Kumar, G.S.; Premalatha, K.; Uma Maheshwari, G.; Rajesh Kanna, P.; Vijaya, G.; Nivaashini, M. Differential privacy scheme using Laplace mechanism and statistical method computation in deep neural network for privacy preservation. Eng. Appl. Artif. Intell. 2024, 128, 107399. [Google Scholar] [CrossRef]
  75. Liu, F. Generalized Gaussian Mechanism for Differential Privacy. IEEE Trans. Knowl. Data Eng. 2018, 31, 747–756. [Google Scholar] [CrossRef]
  76. Dong, J.; Roth, A.; Su, W.J. Gaussian Differential privacy. arXiv 2019, arXiv:1905.02383. Available online: https://arxiv.org/abs/1905.02383 (accessed on 1 December 2024). [CrossRef]
  77. Geng, Q.; Ding, W.; Guo, R.; Kumar, S. Tight Analysis of Privacy and Utility Tradeoff in Approximate Differential Privacy. Proc. Mach. Learn. Res. 2020, 108, 89–99. Available online: http://proceedings.mlr.press/v108/geng20a/geng20a.pdf (accessed on 3 December 2024).
  78. Whitehouse, J.; Ramdas, A.; Rogers, R.; Wu, Z.S. Fully-Adaptive Composition in Differential Privacy. arXiv 2023, arXiv:2203.05481. Available online: https://arxiv.org/abs/2203.05481 (accessed on 1 December 2024).
  79. Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our Data, Ourselves: Privacy Via Distributed Noise Generation. In Advances in Cryptology—EUROCRYPT; Vaudenay, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 486–503. [Google Scholar]
  80. Zhu, K.; Fioretto, F.; Van Hentenryck, P. Post-processing of Differentially Private Data: A Fairness Perspective. In Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI), Vienna, Austria, 23–29 July 2022; pp. 4029–4035. [Google Scholar] [CrossRef]
  81. Ganev, G.; Annamalai, M.S.M.S.; De Cristofaro, E. The Elusive Pursuit of Replicating PATE-GAN: Benchmarking, Auditing, Debugging. arXiv 2024, arXiv:2406.13985. Available online: https://arxiv.org/abs/2406.13985 (accessed on 1 December 2024).
  82. Naseri, M.; Hayes, J.; De Cristofaro, E. Local and Central Differential Privacy for Robustness and Privacy in Federated Learning. arXiv 2022, arXiv:2009.03561. Available online: https://arxiv.org/abs/2009.03561 (accessed on 1 December 2024).
  83. Bebensee, B. Local Differential Privacy: A tutorial. arXiv 2019, arXiv:1907.11908. Available online: https://arxiv.org/abs/1907.11908 (accessed on 1 December 2024).
  84. Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. arXiv 2020, arXiv:1812.00910. Available online: https://arxiv.org/abs/1812.00910 (accessed on 1 December 2024).
  85. Galli, F.; Biswas, S.; Jung, K.; Cucinotta, T.; Palamidessi, C. Group privacy for personalized federated learning. arXiv 2022, arXiv:2206.03396. Available online: https://arxiv.org/abs/2206.03396 (accessed on 1 December 2024).
  86. Cormode, G.; Jha, S.; Kulkarni, T.; Li, N.; Srivastava, D.; Wang, T. Privacy At Scale: Local Differential Privacy in Practice. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 1655–1658. [Google Scholar] [CrossRef]
  87. Yang, M.; Guo, T.; Zhu, T.; Tjuawinata, I.; Zhao, J.; Lam, K.-Y. Local Differential Privacy And Its Applications: A Comprehensive Survey. Comput. Stand. Interfaces 2024, 89, 103827. [Google Scholar] [CrossRef]
  88. Duchi, J.; Wainwright, M.J.; Jordan, M.I. Local Privacy And Minimax Bounds: Sharp Rates For Probability Estimation. Adv. Neural Inf. Process. Syst. 2013, 26, 1529–1537. [Google Scholar] [CrossRef]
  89. Ruan, W.; Xu, M.; Fang, W.; Wang, L.; Wang, L.; Han, W. Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 21–25 May 2023; pp. 1926–1943. [Google Scholar]
  90. Pan, K.; Ong, Y.-S.; Gong, M.; Li, H.; Qin, A.K.; Gao, Y. Differential privacy in deep learning: A literature review. Neurocomputing 2024, 589, 127663. [Google Scholar] [CrossRef]
  91. Kang, Y.; Liu, Y.; Niu, B.; Tong, X.; Zhang, L.; Wang, W. Input Perturbation: A New Paradigm between Central and Local Differential Privacy. arXiv 2020, arXiv:2002.08570. Available online: https://arxiv.org/abs/2002.08570 (accessed on 1 December 2024).
  92. Chaudhuri, K.; Monteleoni, C.; Sarwate, A.D. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res. 2011, 12, 1069–1109. [Google Scholar] [CrossRef]
  93. De Cristofaro, E. Critical Overview of Privacy in Machine Learning. IEEE Secur. Priv. 2021, 19, 19–27. [Google Scholar] [CrossRef]
  94. Shen, Z.; Zhong, T. Analysis of Application Examples of Differential Privacy in Deep Learning. Comput. Intell. Neurosci. 2021, 2021, e4244040. [Google Scholar] [CrossRef]
  95. Rigaki, M.; Garcia, S. A Survey of Privacy Attacks in Machine Learning. ACM Comput.Surv. 2023, 56, 101. [Google Scholar] [CrossRef]
  96. Wu, D.; Qi, S.; Li, Q.; Cai, B.; Guo, Q.; Cheng, J. Understanding and Defending against White-Box Membership Inference Attack in Deep Learning. Knowl. Based Syst. 2023, 259, 110014. [Google Scholar] [CrossRef]
  97. Fang, H.; Qiu, Y.; Yu, H.; Yu, W.; Kong, J.; Chong, B.; Chen, B.; Wang, X.; Xia, S.-T. Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses. arXiv 2024, arXiv:2402.04013. Available online: https://arxiv.org/abs/2402.04013 (accessed on 1 December 2024).
  98. He, X.-M.; Wang, X.S.; Chen, H.-H.; Dong, Y.-H. Study on Choosing the Parameter ε in Differential Privacy. Tongxin Xuebo/J. Commun. 2015, 36, 12. [Google Scholar]
  99. Mazzone, F.; Al Badawi, A.; Polyakov, Y.; Everts, M.; Hahn, F.; Peter, A. Investigating Privacy Attacks in the Gray-Box Settings to Enhance Collaborative Learning Schemes. arXiv 2024, arXiv:2409.17283. Available online: https://arxiv.org/abs/2409.17283 (accessed on 1 December 2024).
  100. Fredrikson, M.; Jha, S.; Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, Denver, CO, USA, 12–16 October 2015; pp. 1322–1333. [Google Scholar]
  101. Carlini, N.; Tramèr, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. Extracting training data from large language models. arXiv 2020, arXiv:2012.07805. Available online: https://arxiv.org/abs/2012.07805 (accessed on 1 December 2024).
  102. Carlini, N.; Liu, C.; Erlingsson, Ú.; Kos, J.; Song, D. The secret sharer: Evaluating and testing unintended memorization in neural networks. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 267–284. [Google Scholar]
  103. Carlini, N.; Chien, S.; Nasr, M.; Song, S.; Terzis, A.; Tramèr, F. Membership Inference Attacks from First Principles. arXiv 2021, arXiv:2112.03570. Available online: https://arxiv.org/abs/2112.03570 (accessed on 1 December 2024).
  104. Hu, H.; Salcic, Z.; Sun, L.; Dobbie, G.; Yu, P.S.; Zhang, X. Membership Inference Attacks on Machine Learning: A Survey. arXiv 2022, arXiv:2103.07853. Available online: https://arxiv.org/abs/2103.07853 (accessed on 1 December 2024). [CrossRef]
  105. Zarifzadeh, S.; Liu, P.; Shokri, R. Low-Cost High-Power Membership Inference Attacks. arXiv 2023, arXiv:2312.03262. Available online: https://arxiv.org/abs/2312.03262 (accessed on 1 December 2024).
  106. Aubinais, E.; Gassiat, E.; Piantanida, P. Fundamental Limits of Membership Inference attacks on Machine Learning Models. arXiv 2024, arXiv:2310.13786. Available online: https://arxiv.org/html/2310.13786v4 (accessed on 1 December 2024).
  107. Leino, K.; Fredrikson, M. Stolen memories: Leveraging model memorization for calibrated white box membership inference. In Proceedings of the 29th {USENIX} Security Symposium {USENIX} Security 20, Online, 12–14 August 2020; pp. 1605–1622. Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/leino (accessed on 2 December 2024).
  108. Liu, R.; Wang, D.; Ren, Y.; Wang, Z.; Guo, K.; Qin, Q.; Liu, X. Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3958–3973. [Google Scholar] [CrossRef]
  109. Annamalai, M.S.M.S. Nearly Tight Black-Box Auditing of Differential Private Machine Learning. arXiv 2024, arXiv:2405.14106. Available online: https://arxiv.org/abs/2405.14106 (accessed on 1 December 2024).
  110. Lin, S.; Bun, M.; Gaboardi, M.; Kolaczyk, E.D.; Smith, A. Differential Private Confidence Intervals for Proportions Under Stratified Random Sampling. Electron. J. Stat. 2024, 18, 1455–1494. [Google Scholar] [CrossRef]
  111. Wang, J.T.; Mahloujifar, S.; Wu, T.; Jia, R.; Mittal, P. A Randomized Approach to Tight Privacy Accounting. arXiv 2023, arXiv:2304.07927. Available online: https://arxiv.org/abs/2304.07927 (accessed on 1 December 2024).
  112. Salem, A.; Zhang, Y.; Humbert, M.; Fritz, M.; Backes, M. ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. arXiv 2019, arXiv:1806.01246. Available online: https://arxiv.org/abs/1806.01246 (accessed on 1 December 2024).
  113. Ye, D.; Shen, S.; Zhu, T.; Liu, B.; Zhou, W. One Parameter Defense—Defending against Data Inference Attacks via Differential Privacy. arXiv 2022, arXiv:2203.06580. Available online: https://arxiv.org/abs/2203.06580 (accessed on 1 December 2024). [CrossRef]
  114. Cummings, R.; Desfontaines, D.; Evans, D.; Geambasu, R.; Huang, Y.; Jagielski, M.; Kairouz, P.; Kamath, G.; Oh, S.; Ohrimenko, O.; et al. Advancing Differential Privacy: Where We are Now and Future Directions. Harv. Data Sci. Rev. 2024, 6, 475–489. [Google Scholar] [CrossRef]
  115. Zhang, G.; Liu, B.; Zhu, T.; Ding, M.; Zhou, W. Label-Only Membership Inference attacks and Defense in Semantic Segmentation Models. IEEE Trans. Dependable Secur. Comput. 2023, 20, 1435–1449. [Google Scholar] [CrossRef]
  116. Wu, Y.; Qiu, H.; Guo, S.; Li, J.; Zhang, T. You Only Query Once: An Efficient Label-Only Membership Inference Attack. In Proceedings of the 12th International Conference on Learning Representations, ICLR 2024, Hybrid, Vienna, 7–11 May 2024; Available online: https://openreview.net/forum?id=7WsivwyHrS&noteId=QjoAoa8UVW (accessed on 3 December 2024).
  117. Li, N.; Qardaji, W.; Su, D.; Wu, Y.; Yang, W. Membership privacy: A Unifying Framework for Privacy Definitions. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, 4–8 November 2013; pp. 889–900. [Google Scholar] [CrossRef]
  118. Andrew, G.; Kairouz, P.; Oh, S.; Oprea, A.; McMahan, H.B.; Suriyakumar, V. One-shot Empirical Privacy Estimation for Federated Learning. arXiv 2024, arXiv:2302.03098. [Google Scholar]
  119. Patel, N.; Shokri, R.; Zick, Y. Model Explanations with Differential Privacy. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FaccT ’22), Seoul, Republic of Korea, 21–24 June 2022; ACM: New York, NY, USA, 2022. [Google Scholar] [CrossRef]
  120. Ding, Z.; Tian, Y.; Wang, G.; Xiong, J. Regularization Mixup Adversarial Training: A Defense Strategy for Membership Privacy with Model Availability Assurance. In Proceedings of the 2024 2nd International Conference on Big Data and Privacy Computing, BDPC, Macau, China, 10–12 January 2024; pp. 206–212. [Google Scholar]
  121. Qiu, W. A Survey on Poisoning Attacks Against Supervised Machine Learning. arXiv 2022, arXiv:2202.02510. Available online: https://arxiv.org/abs/2202.02510 (accessed on 1 December 2024).
  122. Zhao, B. Towards Class-Oriented Poisoning Attacks Against Neural Networks. In Proceedings of the 2022 IEEE/CVF Winter Conference on Application of Computer Vision, WACV, Waikoloa, HI, USA, 3–8 January 2022; pp. 2244–2253. [Google Scholar] [CrossRef]
  123. Koh, P.W.; Steinhardt, J.; Liang, P. Stronger data poisoning attacks break data sanitization defenses. arXiv 2021, arXiv:1811.00741. Available online: https://arxiv.org/abs/1811.00741 (accessed on 1 December 2024). [CrossRef]
  124. Zhang, R.; Gou, S.; Wang, J.; Xie, X.; Tao, D. A Survey on Gradient Inversion Attacks, Defense and Future Directions. In Proceedings of the 31st Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, Austria, 23–29 July 2022; pp. 5678–5685. Available online: https://www.ijcai.org/proceedings/2022/0791.pdf (accessed on 1 December 2024).
  125. Yan, H.; Wang, Y.; Yao, L.; Zhong, X.; Zhao, J. A Stationary Random Process based Privacy-Utility Tradeoff in Differential Privacy. In Proceedings of the 2023 International Conference on High Performance Big Data and Intelligence Systems, HDIS 2023, Macau, China, 6–8 December 2023; pp. 178–185. [Google Scholar]
  126. D’Oliveira, R.G.L.; Salamatian, S.; Médard, M. Low Influence, Utility, and Independence in Differential Privacy: A Curious Case of (32). IEEE J. Sel. Areas Inf. Theory 2021, 2, 240–252. [Google Scholar] [CrossRef]
  127. Chen, M.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted Backdoor attacks on deep learning systems using data poisoning. arXiv 2017, arXiv:1712.05526. Available online: https://arxiv.org/abs/1712.05526 (accessed on 1 December 2024).
  128. Feng, S.; Tramèr, F. Privacy Backdoors: Stealing Data with Corrupted Pretrained Models. arXiv 2024, arXiv:2404.00473. Available online: https://arxiv.org/abs/2404.00473 (accessed on 1 December 2024).
  129. Gu, T.; Dolan-Gavitt, B.; Garg, S. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv 2019, arXiv:1708.06733. Available online: https://arxiv.org/abs/1708.06733 (accessed on 1 December 2024).
  130. Demelius, L.; Kern, R.; Trügler, A. Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey. arXiv 2023, arXiv:2309.16398. Available online: https://arxiv.org/abs/2309.16398 (accessed on 1 December 2024).
  131. Oprea, A.; Singhal, A.; Vassilev, A. Poisoning attacks against machine learning: Can machine learning be trustworthy? Computer 2022, 55, 94–99. Available online: https://ieeexplore.ieee.org/document/9928202 (accessed on 1 December 2024). [CrossRef]
  132. Salem, A.; Wen, R.; Backes, M.; Ma, S.; Zhang, Y. Dynamic Backdoor Attacks Against Machine Learning Models. In Proceedings of the IEEE European Symposium Security Privacy (EuroS&P), Genoa, Italy, 6–10 June 2022; pp. 703–718. [Google Scholar] [CrossRef]
  133. Xu, X.; Chen, Y.; Wang, B.; Bian, Z.; Han, S.; Dong, C.; Sun, C.; Zhang, W.; Xu, L.; Zhang, P. CSBA: Covert Semantic Backdoor Attack Against Intelligent Connected Vehicles. IEEE Trans. Veh. Technol. 2024, 73, 17923–17928. [Google Scholar] [CrossRef]
  134. Li, X.; Li, N.; Sun, W.; Gong, N.Z.; Li, H. Fine-grained Poisoning attack to Local Differential Privacy Protocols for Mean and Variance Estimation. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security), Anaheim, CA, USA, 9–11 August 2023; Volume 3, pp. 1739–1756. Available online: https://www.usenix.org/conference/usenixsecurity23/presentation/li-xiaoguang (accessed on 3 December 2024).
  135. Fang, X.; Yu, F.; Yang, G.; Qu, Y. Regression Analysis with Differential Privacy Preserving. IEEE Access 2019, 7, 129353–129361. [Google Scholar] [CrossRef]
  136. Wang, Y.; Si, C.; Wu, X. Regression Model Fitting under Differential Privacy and Model Inversion Attack. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015; pp. 1003–1009. [Google Scholar]
  137. Dibbo, S.V. SoK: Model Inversion Attack Landscape: Taxonomy, Challenges, and Future Roadmap. In Proceedings of the IEEE 36th Computer Security Foundations Symposium (CSF), Dubrovnik, Croatia, 10–14 July 2023; Available online: https://ieeexplore.ieee.org/document/10221914 (accessed on 1 December 2024).
  138. Wu, X.; Fredrikson, M.; Jha, S.; Naughton, J.F. A methodology for formalizing model-inversion attacks. In Proceedings of the 2016 IEEE 29th Computer Security Foundations Symposium (CSF), Lisbon, Portugal, 27 June–1 July 2016; pp. 355–370. Available online: https://ieeexplore.ieee.org/document/7536387 (accessed on 3 December 2024).
  139. Park, C.; Hong, D.; Seo, C. An Attack-Based Evaluation Method for Differentially Private Learning Against Model Inversion Attack. IEEE Access 2019, 7, 124988–124999. [Google Scholar] [CrossRef]
  140. Zhao, J.; Chen, Y.; Zhang, W. Differential Privacy Preservation in Deep Learning: Challenges, Opportunities and Solutions. IEEE Access 2019, 7, 48901–48911. [Google Scholar] [CrossRef]
  141. Yang, Z.; Zhang, J.; Chang, E.-C.; Liang, Z. Neural Network Inversion in Adversarial Setting via Background Knowledge Alignment. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 225–240. [Google Scholar] [CrossRef]
  142. Han, G.; Choi, J.; Lee, H.; Kim, J. Reinforcement Learning-Based Black-Box Model Inversion Attacks. arXiv 2023, arXiv:2304.04625. Available online: https://arxiv.org/abs/2304.04625 (accessed on 1 December 2024).
  143. Han, G.; Choi, J.; Lee, H.; Kim, J. Reinforcement Learning-Based Black-Box Model Inversion Attacks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 20504–20513. [Google Scholar] [CrossRef]
  144. Bekman, T.; Abolfathi, M.; Jafarian, H.; Biswas, A.; Banaei-Kashani, F.; Das, K. Practical Black Box Model Inversion Attacks Against Neural Nets. Commun. Comput. Inf. Sci. 2021, 1525, 39–54. [Google Scholar] [CrossRef]
  145. Du, J.; Hu, J.; Wang, Z.; Sun, P.; Gong, N.Z.; Ren, K. SoK: Gradient Leakage in Federated Learning. arXiv 2024, arXiv:2404.05403. Available online: https://arxiv.org/abs/2404.05403 (accessed on 1 December 2024).
  146. Li, Z.; Pu, Y.; Zhang, X.; Li, Y.; Li, J.; Ji, S. Protecting Object Detection Models From Model Extraction Attack via Feature Space Coverage. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI), Jeju, Republic of Korea, 3–9 August 2024; pp. 431–439. [Google Scholar] [CrossRef]
  147. Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M.K.; Ristenpart, T. Stealing Machine Learning Models and Prediction APIs. In Proceedings of the USENIX Security Symposium (SEC), Austin, TX, USA, 10–12 August 2016; Available online: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer (accessed on 3 December 2024).
  148. Liang, J.; Pang, R.; Li, C.; Wang, T. Model Extraction Attacks Revisited. arXiv 2023, arXiv:2312.05386. Available online: https://arxiv.org/abs/2312.05386 (accessed on 1 December 2024).
  149. Liu, S. Model Extraction Attack and Defense on Deep Generative Models. J. Phys. Conf. Ser. 2022, 2189, 012024. [Google Scholar] [CrossRef]
  150. Parisot, M.P.M.; Pejo, B.; Spagnuelo, D. Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model’s Complexity. arXiv 2021, arXiv:2104.13061. Available online: https://arxiv.org/abs/2104.13061 (accessed on 1 December 2024).
  151. Zhang, W.; Tople, S.; Ohrimenko, O. Leakage of dataset properties in Multi-Party machine learning. In Proceedings of the 30th USENIX Security Symposium (USENIX Security), virtual, 11–13 August 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 2687–2704. Available online: https://www.usenix.org/conference/usenixsecurity21/presentation/zhang-wanrong (accessed on 1 December 2024).
  152. Mahloujifar, S.; Ghosh, E.; Chase, M. Property Inference from Poisoning. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; pp. 1120–1137. [Google Scholar] [CrossRef]
  153. Horigome, H.; Kikuchi, H.; Fujita, M.; Yu, C.-M. Robust Estimation Method against Poisoning Attacks for Key-Value Data Local Differential Privacy. Appl. Sci. 2024, 14, 6368. [Google Scholar] [CrossRef]
  154. Parisot, M.P.M.; Pejó, B.; Spagnuelo, D. Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model’s Complexity. In Proceedings of the 18th International Conference on Security and Cryptography, SECRYPT, Online, 6–8 July 2021; pp. 715–721. [Google Scholar] [CrossRef]
  155. Chase, M.; Ghosh, E.; Mahloujifar, S. Property Inference from Poisoning. arXiv 2021, arXiv:2101.11073. Available online: https://arxiv.org/abs/2101.11073 (accessed on 1 December 2024).
  156. Liu, X.; Xie, L.; Wang, Y.; Zou, J.; Xiong, J.; Ying, Z.; Vasilakos, A.V. Privacy and Security Issues in Deep Learning: A Survey. IEEE Access 2020, 9, 4566–4593. [Google Scholar] [CrossRef]
  157. Gilbert, A.C.; McMillan, A. Property Testing for Differential Privacy. In Proceedings of the 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2–5 October 2018; pp. 249–258. [Google Scholar] [CrossRef]
  158. Liu, X.; Oh, S. Minimax Optimal Estimation of Approximate Differential Privacy on Neighbouring Databases. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS’19), Vancouver, BC, Canada, 8 December 2019; pp. 2417–2428. Available online: https://dl.acm.org/doi/10.5555/3454287.3454504 (accessed on 1 December 2024).
  159. Tschantz, M.C.; Kaynar, D.; Datta, A. Formal Verification of Differential Privacy for Interactive Systems (Extended Abstract). Electron. Notes Theor. Comput. Sci. 2011, 276, 61–79. [Google Scholar] [CrossRef]
  160. Pillutla, K.; McMahan, H.B.; Andrew, G.; Oprea, A.; Kairouz, P.; Oh, S. Unleashing the Power of Randomization in Auditing Differential Private ML. Adv. Neural Inf. Process. Syst. 2023, 36, 198465. Available online: https://arxiv.org/abs/2305.18447 (accessed on 3 December 2024).
  161. Cebere, T.; Bellet, A.; Papernot, N. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. arXiv 2024, arXiv:2405.14457. Available online: https://arxiv.org/abs/2405.14457 (accessed on 1 December 2024).
  162. Zhang, J.; Das, D.; Kamath, G.; Tramèr, F. Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data. arXiv 2024, arXiv:2409.19798. Available online: https://arxiv.org/abs/2409.19798 (accessed on 3 December 2024).
  163. Yin, Y.; Chen, K.; Shou, L.; Chen, G. Defending Privacy against More Knowledge Membership Inference Attackers. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Singapore, 14–18 August 2021; pp. 2026–2036. [Google Scholar] [CrossRef]
  164. Bichsel, B.; Gehr, T.; Drachsler-Cohen, D.; Tsankov, P.; Vechev, M. DP-Finder: Finding Differential Privacy Violations by Sampling and Optimization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18), Toronto, ON, Canada, 15–19 October 2018; ACM: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
  165. Niu, B.; Zhou, Z.; Chen, Y.; Cao, J.; Li, F. DP-Opt: Identify High Differential Privacy Violation by Optimization. In Wireless Algorithms, Systems, and Applications. WASA 2022; Wang, L., Segal, M., Chen, J., Qiu, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13472. [Google Scholar] [CrossRef]
  166. Birhane, A.; Steed, R.; Ojewale, V.; Vecchione, B.; Raji, I.D. AI auditing: The broken bus on the road to AI accountability. arXiv 2024, arXiv:2401.14462. Available online: https://arxiv.org/abs/2401.14462 (accessed on 1 December 2024).
  167. Dwork, C. A Firm Foundation for Private Data Analysis. Commun. ACM 2011, 54, 86–95. [Google Scholar] [CrossRef]
  168. Dwork, C.; Su, W.J.; Zhang, L. Differential Private False Discovery Rate. J. Priv. Confidentiality 2021, 11, 2. [Google Scholar] [CrossRef]
  169. Liu, C.; He, X.; Chanyaswad, T.; Wang, S.; Mittal, P. Investigating Statistical Privacy Frameworks from the Perspective of Hypothesis Testing. Proc. Priv. Enhancing Technol. (PoPETs) 2019, 2019, 234–254. [Google Scholar] [CrossRef]
  170. Balle, B.; Barthe, G.; Gaboardi, M.; Hsu, J.; Sato, T. Hypothesis Testing Interpretations and Rényi Differential Privacy. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Online, 26–28 August 2020; Volume 108, pp. 2496–2506. [Google Scholar]
  171. Kairouz, P.; Oh, S.; Viswanath, P. The Composition Theorem for Differential Privacy. In Proceedings of the 32nd International Conference on Machine Learning, ICML, Lille, France, 6–11 July 2015; pp. 1376–1385. Available online: https://proceedings.mlr.press/v37/kairouz15.html (accessed on 3 December 2024).
  172. Lu, Y.; Magdon-Ismail, M.; Wei, Y.; Zikas, V. Eureka: A General Framework for Black-box Differential Privacy Estimators. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 19–23 May 2024; pp. 913–931. [Google Scholar]
  173. Shamsabadi, A.S.; Tan, G.; Cebere, T.I.; Bellet, A.; Haddadi, H.; Papernot, N.; Wang, X.; Weller, A. Confidential-DPproof: Confidential Proof of Differentially Private Training. In Proceedings of the 12th International Conference on Learning Representations, ICLR, Hybrid, Vienna, 7–11 May 2024; Available online: https://openreview.net/forum?id=PQY2v6VtGe#tab-accept-oral (accessed on 1 December 2024).
  174. Kazmi, M.; Lautraite, H.; Akbari, A.; Soroco, M.; Tang, Q.; Wang, T.; Gambs, S.; Lécuyer, M. PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining. arXiv 2024, arXiv:2402.09477. Available online: https://arxiv.org/abs/2402.09477 (accessed on 1 December 2024).
  175. Song, L.; Shokri, R.; Mittal, P. Membership Inference Attacks Against Adversarially Robust Deep Learning Models. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), San Francisco, CA, USA, 19–23 May 2019. [Google Scholar]
  176. Koskela, A.; Mohammadi, J. Black Box Differential Privacy Auditing Using Total Variation Distance. arXiv 2024, arXiv:2406.04827. Available online: https://arxiv.org/abs/2406.04827 (accessed on 1 December 2024).
  177. Chen, J.; Wang, W.H.; Shi, X. Differential Privacy Protection Against Membership Inference Attack on Machine Learning for Genomic Data. Pac. Symp. Biocomput. 2021, 26, 26–37. [Google Scholar] [CrossRef]
  178. Malek, M.; Mironov, I.; Prasad, K.; Shilov, I.; Tramèr, F. Antipodes of Label Differential Privacy: PATE and ALBI. arXiv 2021, arXiv:2106.03408. Available online: https://arxiv.org/abs/2106.03408 (accessed on 1 December 2024).
  179. Choquette-Choo, C.A.; Tramèr, F.; Carlini, N.; Papernot, N. Label-only Membership Inference Attacks. In Proceedings of the 38th International Conference on Machine Learning (ICML), Online, 18–24 July 2021; pp. 1964–1974. [Google Scholar]
  180. Rahman, M.A.; Rahman, T.; Laganière, R.; Mohammed, N.; Wang, Y. Membership Inference Attack against Differentially Private Deep Learning Models. Trans. Data Priv. 2018, 11, 61–79. [Google Scholar]
  181. Humphries, T.; Rafuse, M.; Lindsey, T.; Oya, S.; Goldberg, I.; Kerschbaum, F. Differential Private Learning does not Bound Membership Inference. arXiv 2020, arXiv:2010.12112. Available online: http://www.arxiv.org/abs/2010.12112v1 (accessed on 1 December 2024).
  182. Askin, Ö.; Kutta, T.; Dette, H. Statistical Quantification of Differential Privacy. arXiv 2022, arXiv:2108.09528. Available online: https://arxiv.org/abs/2108.09528 (accessed on 1 December 2024).
  183. Aerni, M.; Zhang, J.; Tramèr, F. Evaluations of Machine Learning Privacy Defenses are Misleading. arXiv 2024, arXiv:2404.17399. Available online: https://arxiv.org/abs/2404.17399 (accessed on 1 December 2024).
  184. Kong, Z.; Chowdhury, A.R.; Chaudhuri, K. Forgeability and Membership Inference Attacks. In Proceedings of the 15th ACM Workshop on Artificial Intelligence and Security (AISec ’22), Los Angeles, CA, USA, 11 November 2022. [Google Scholar] [CrossRef]
  185. Kutta, T.; Askin, Ö.; Dunsche, M. Lower Bounds for Rényi Differential Privacy in a Black-Box Setting. arXiv 2022, arXiv:2212.04739. Available online: https://arxiv.org/abs/2212.04739 (accessed on 1 December 2024).
  186. Domingo-Enrich, C.; Mroueh, Y. Auditing Differential Privacy in High Dimensions with the Kernel Quantum Rényi Divergence. arXiv 2022, arXiv:2205.13941. Available online: https://arxiv.org/abs/2205.13941 (accessed on 1 December 2024).
  187. Koh, P.W.; Liang, P. Understanding Black-box Predictions via Influence Functions. arXiv 2017, arXiv:1703.04730. Available online: https://arxiv.org/abs/1703.04730 (accessed on 1 December 2024).
  188. Chen, C.; Campbell, N.D. Understanding training-data leakage from gradients in neural networks for image classification. arXiv 2021, arXiv:2111.10178. Available online: https://arxiv.org/abs/2111.10178 (accessed on 1 December 2024).
  189. Xie, Z.; Yan, L.; Zhu, Z.; Sugiyama, M. Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization. arXiv 2021, arXiv:2103.17182. Available online: https://arxiv.org/abs/2103.17182 (accessed on 2 December 2024).
  190. Liu, F.; Zhao, X. Disclosure Risk from Homogeneity Attack in Differential Private Frequency Distribution. arXiv 2021, arXiv:2101.00311. Available online: https://arxiv.org/abs/2101.00311 (accessed on 1 December 2024).
  191. Steinke, T.; Ullman, J. Between Pure and Approximate Differential Privacy. arXiv 2015, arXiv:1501.06095. Available online: https://arxiv.org/abs/1501.06095 (accessed on 1 December 2024). [CrossRef]
  192. Kairouz, P.; McMahan, B.; Song, S.; Thakkar, O.; Xu, Z. Practical and Private (Deep) Learning without Sampling or Shuffling. In Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 5213–5225. Available online: https://proceedings.mlr.press/v139/kairouz21b.html (accessed on 3 December 2024).
  193. Li, Y. Theories in Online Information Privacy Research: A Critical Review and an Integrated Framework. Decis. Support. Syst. 2021, 54, 471–481. [Google Scholar] [CrossRef]
  194. Hay, M.; Machanavajjhala, A.; Miklau, G.; Chen, Y.; Zhang, D. Principled evaluation of differential private algorithms using DPBench. In Proceedings of the ACM SIGMOD Conference on Management of Data, San Francisco, CA, USA, 26 June–1 July 2016; pp. 919–938. [Google Scholar] [CrossRef]
  195. Wang, Y.; Ding, Z.; Kifer, D.; Zhang, D. Checkdp: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 9–13 November 2020; pp. 919–938. [Google Scholar] [CrossRef]
  196. Barthe, G.; Chadha, R.; Jagannath, V.; Sistla, A.P.; Viswanathan, M. Deciding Differential Privacy for Programs with Finite Inputs and Outputs. arXiv 2022, arXiv:1910.04137. Available online: https://arxiv.org/abs/1910.04137 (accessed on 2 December 2024).
  197. Hitaj, B.; Ateniese, G.; Perez-Cruz, F. Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning. arXiv 2017, arXiv:1702.07464. Available online: https://arxiv.org/abs/1702.07464 (accessed on 1 December 2024).
  198. Song, C.; Ristenpart, T.; Shmatikov, V. Machine Learning Models that Remember Too Much. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, 30 October–3 November 2017; pp. 587–601. [Google Scholar] [CrossRef]
  199. Cummings, R.; Durfee, D. Individual Sensitivity Preprocessing for Data Privacy. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Salt Lake City, UT, USA, 5–8 January 2020; pp. 528–547. [Google Scholar]
  200. Zhou, S.; Zhu, T.; Ye, D.; Yu, X.; Zhou, W. Boosting Model Inversion Attacks With Adversarial Examples. IEEE Trans. Dependable Secur. Comput. 2023, 21, 1451–1468. [Google Scholar] [CrossRef]
  201. Zhu, L.; Liu, Z.; Han, S. Deep Leakage from Gradients. arXiv 2019, arXiv:1906.08935. Available online: https://arxiv.org/abs/1906.08935 (accessed on 1 December 2024).
  202. Huang, Y.; Gupta, S.; Song, Z.; Li, K.; Arora, S. Evaluating Gradient Inversion Attacks and Defenses in Federated Learning. Adv. Neural Inf. Process. Syst. 2021, 9, 7232–7241. Available online: https://proceedings.neurips.cc/paper_files/paper/2021/hash/3b3fff6463464959dcd1b68d0320f781-Abstract.html (accessed on 3 December 2024).
  203. Wu, R.; Chen, X.; Guo, C.; Weinberger, K.Q. Learning to Invert: Simple Adaptive Attacks for Gradient Inversion in Federated Learning. In Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI), Pittsburgh, PA, USA, 31 July–4 August 2023; Volume 216, pp. 2293–2303. Available online: https://proceedings.mlr.press/v216/wu23a.html (accessed on 3 December 2024).
  204. Zhu, H.; Huang, L.; Xie, Z. GGI: Generative Gradient Inversion Attack in Federated Learning. In Proceedings of the 6th International Conference on Data-Driven Optimization of Complex Systems(DOCS), Hangzhou, China, 16–18 August 2024; pp. 379–384. Available online: http://arxiv.org/pdf/2405.10376.pdf (accessed on 3 December 2024).
  205. Yang, Z.; Zhang, B.; Chen, G.; Li, T.; Su, D. Defending Model Inversion and Membership Inference Attacks via Prediction Purification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1234–1243. [Google Scholar]
  206. Zhang, Q.; Ma, J.; Xiao, Y.; Lou, J.; Xiong, L. Broadening Differential Privacy for Deep Learning against Model Inversion Attacks. In Proceedings of the 2020 IEEE International Conference on Big Data, Atlanta, GA, USA, 10–13 December 2020; pp. 1061–1070. [Google Scholar] [CrossRef]
  207. Manchini, C.; Ospina, R.; Leiva, V.; Martin-Barreiro, C. A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data. Inf. Sci. 2023, 627, 280–300. [Google Scholar] [CrossRef]
  208. Zhang, Z.; Liu, Q.; Huang, Z.; Wang, H.; Lu, C.; Liu, C.; Chen, E. GraphMI: Extracting Private Graph Data from Graph Neural Networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 19–27 August 2021; pp. 3749–3755. [Google Scholar] [CrossRef]
  209. Dziedzic, A.; Kaleem, M.A.; Lu, Y.S.; Papernot, N. Increasing the Cost of Model Extraction with Calibrated Proof of Work. In Proceedings of the 10th International Conference on Learning Representations (ICLR), Online, 25 April 2022; Available online: https://openreview.net/forum?id=EAy7C1cgE1L (accessed on 3 December 2024).
  210. Li, X.; Yan, H.; Cheng, Z.; Sun, W.; Li, H. Protecting Regression Models with Personalized Local Differential Privacy. IEEE Trans. Dependable Secur. Comput. 2023, 20, 960–974. [Google Scholar] [CrossRef]
  211. Zheng, H.; Ye, Q.; Hu, H.; Fang, C.; Shi, J. BDPL: A Boundary Differential Private Layer Against Machine Learning Model Extraction Attacks. In Computer Security—ESORICS 2019; Sako, K., Schneider, S., Ryan, P., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11735. [Google Scholar] [CrossRef]
  212. Yan, H.; Li, X.; Li, H.; Li, J.; Sun, W.; Li, F. Monitoring-Based Differential Privacy Mechanism Against Query Flooding-based Model Extraction Attack. IEEE Trans. Dependable Secur. Comput. 2022, 19, 2680–2694. [Google Scholar] [CrossRef]
  213. Suri, A.; Lu, Y.; Chen, Y.; Evans, D. Dissecting Distribution Inference. In Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), Raleigh, NC, USA, 8–10 February 2023; pp. 150–164. [Google Scholar]
  214. Ganju, K.; Wang, Q.; Yang, W.; Gunter, C.A.; Borisov, N. Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 619–633. Available online: https://dl.acm.org/doi/10.1145/3243734.3243834 (accessed on 1 December 2024).
  215. Melis, L.; Song, C.; De Cristofaro, E.; Shmatikov, V. Exploiting Unintended Feature Leakage in Collaborative Learning. In Proceedings of the Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 691–706. [Google Scholar]
  216. Huang, W.; Zhou, S. Unexpected Information Leakage of Differential Privacy Due to the Linear Properties of Queries. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3123–3137. [Google Scholar] [CrossRef]
  217. Ben Hamida, S.; Hichem, M.; Jemai, A. How Differential Privacy Reinforces Privacy of Machine Learning Models? In Proceedings of the International Conference on Computational Collective Intelligence (ICCI), Leipzig, Germany, 9–11 September 2024. [Google Scholar]
  218. Song, L.; Mittal, P.; Gong, N.Z. Systematic Evaluation of Privacy Risks in Machine Learning Models. In Proceedings of the ACM on Asian Conference on Computer and Communication Security, Taipei, Taiwan, 5–9 October 2020. [Google Scholar]
Table 1. A list of alternative attacks to evaluate the privacy guarantees.

| Attack | DPML Stage Impacted | Type of Attack | Attack Techniques |
| --- | --- | --- | --- |
| Membership inference | Training data | White-box membership inference attack | Gradient-based approaches: exploiting gradients to determine whether specific data points were part of the training dataset. |
| | | | Activation analysis: exploiting the activations of training data, based on the assumption that they differ from the activations of non-training data in certain layers. |
| | | Black-box membership inference attack | Training shadow models: creating and training a set of models that mimic the behavior of the target model. |
| | | | Confidence score analysis: constructing and analyzing confidence scores or confidence intervals. |
| | | Label-only membership inference attack | Adaptive querying: modifying the inputs to issue individually selected queries, where each query depends on the answer to the previous one when the model changes the label. |
| | | | Meta-classification: training a secondary model to distinguish between the labels of training and non-training data. |
| | | Transfer membership inference attack | Model approximation: using approximation algorithms to probe the decision boundaries of the target model. |
| | | | Adversarial examples: using adversarial techniques to evaluate privacy guarantees. |
| Data poisoning | Training phase/model, data | Gradient manipulation attack | The gradients are intentionally altered during the model training process. |
| | | Targeted label flipping | The labels of certain data points in the training data are modified without changing the data themselves. |
| | | Backdoor poisoning | A specific trigger or “backdoor” is inserted. |
| | | Data injection | Malicious data samples designed to disrupt the model’s training are injected. |
| | | Adaptive querying and poisoning | Slightly modified versions of data points are injected, and the effect of these changes on label predictions is analyzed. |
| Model inversion | Model | White-box inversion attacks | The attacker uses detailed insight into the model’s structure and parameters (e.g., model weights or gradients) to recover private training data. |
| | | Black-box inversion attacks | The attacker iteratively queries the model and uses the outputs to infer sensitive information without access to the model’s internals. |
| | | Inferring sensitive attributes from the model | The privacy budget is balanced between sensitive and non-sensitive attributes. |
| | | Gradient-based inversion attacks | The attacker tries to recover private training data from shared gradients. |
| Model extraction | Model | Adaptive Query-Flooding Parameter Duplication (QPD) attack | Allows the attacker to infer model information with black-box access and no prior knowledge of model parameters or training data. |
| | | Equation-solving attack | Targets regression models by adding high-dimensional Gaussian noise to model coefficients. |
| | | Membership-based property inference | Combines membership inference with property inference, targeting specific subpopulations with unique features. |
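To make the membership inference entries in Table 1 more concrete, the following minimal sketch implements a black-box, confidence-threshold membership inference test of the kind surveyed above. The synthetic data, the scikit-learn logistic-regression target model, and the fixed threshold are illustrative assumptions rather than a reproduction of any cited attack.

```python
# Minimal sketch of a black-box, confidence-threshold membership inference test.
# Data, model, and threshold are illustrative assumptions, not from any cited work.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic binary-classification data standing in for a sensitive dataset.
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

# "Members" are training points; "non-members" are held-out points.
X_mem, X_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)
target_model = LogisticRegression(max_iter=1000).fit(X_mem, y_mem)

def confidence_on_true_label(model, X, y):
    """Predicted probability that the model assigns to each point's true label."""
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

conf_mem = confidence_on_true_label(target_model, X_mem, y_mem)
conf_non = confidence_on_true_label(target_model, X_non, y_non)

# Decision rule: guess "member" when the confidence exceeds a threshold.
threshold = 0.5
tpr = np.mean(conf_mem > threshold)   # members correctly flagged
fpr = np.mean(conf_non > threshold)   # non-members wrongly flagged
print(f"attack TPR={tpr:.3f}, FPR={fpr:.3f}, advantage={tpr - fpr:.3f}")
```

The gap between the attack’s true- and false-positive rates (its “advantage”) is what most auditing schemes subsequently translate into an empirical privacy estimate.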
Table 2. A list of privacy auditing schemes.

| Privacy Auditing Scheme | Privacy Attack Auditing | Auditing Methodology |
| --- | --- | --- |
| Membership inference audits | White-box membership inference auditing | Auditors analyze gradients, hidden layers, and intermediate activations, measuring how the training data influence model behavior. |
| | Black-box membership inference auditing | Auditors observe probability distributions and confidence scores, analyzing these outputs to assess the likelihood that certain samples were part of the training data. |
| | Shadow model membership auditing | Auditors use “shadow models” to mimic the behavior of the target model. |
| | Label-only membership inference auditing | The auditor evaluates the privacy guarantee leveraging only output labels, training shadow models, generating a separate classifier, and quantifying the true-positive rate and accuracy. |
| | Single-training-run membership inference auditing | The auditor leverages the ability to add or remove multiple training examples independently during a single run. This approach focuses on estimating lower bounds of the privacy parameters without extensive retraining of the models. |
| | Metric-based membership inference auditing | The auditor assesses privacy guarantees by directly evaluating metrics and statistics derived from the model’s outputs (precision, recall, and F1-score) on data points. |
| | Data augmentation-based auditing | The auditor generates augmented data samples similar to the training set and tests whether these samples reveal membership risk. |
| Data poisoning auditing | Influence-function analysis | Privacy is evaluated by introducing malicious data. |
| | Gradient manipulation in DP training | The auditor alters the training data using back-gradient optimization, gradient ascent poisoning, etc. |
| | Empirical evaluation of privacy loss | The auditor conducts quantitative analyses of how the privacy budget is affected. |
| | Simulation of worst-case poisoning scenarios | The auditor constructs approximate upper bounds on the privacy loss. |
| Model inversion auditing | Sensitivity analysis | The auditor quantifies how much private information is embedded in the model outputs. |
| | Gradient and weight analyses | The auditor attempts to recreate input features or private data points from model outputs using gradient-based or optimization methods. |
| | Empirical privacy loss | The auditor calculates the difference between the theoretical and empirical privacy losses. |
| | Embedding and reconstruction test | The auditor examines whether latent representations or embeddings can be reversed to obtain private data. |
| Model extraction auditing | Query analysis | Auditors simulate extraction attacks by extensively querying the model and analyzing how well they can replicate its outputs or decision boundaries. |
| Property inference auditing | Evaluating property sensitivity with model outputs | The auditor performs a test to infer whether certain properties can be derived from the model and whether the privacy parameters are sufficient to obscure such data properties. |
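Most of the auditing schemes in Table 2 ultimately translate attack outcomes into an empirical lower bound on the privacy parameter ε. The sketch below illustrates one common form of this step: attack true- and false-positive counts are converted into high-confidence estimates via Clopper-Pearson intervals and then plugged into the (ε, δ)-DP hypothesis-testing constraint ε ≥ ln((TPR − δ)/FPR). The counts, δ, and confidence level used here are illustrative assumptions, not results from any cited study.

```python
# Minimal sketch of empirical-epsilon estimation from membership inference outcomes.
# Counts, delta, and the confidence level are illustrative assumptions only.
import math
from scipy.stats import beta

def clopper_pearson(successes, trials, alpha=0.05):
    """Two-sided Clopper-Pearson confidence interval for a binomial proportion."""
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, trials - successes + 1)
    hi = 1.0 if successes == trials else beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
    return lo, hi

def empirical_epsilon_lower_bound(tp, n_members, fp, n_nonmembers, delta=1e-5, alpha=0.05):
    """Conservative epsilon lower bound from attack outcomes.

    Uses a lower confidence bound on TPR and an upper confidence bound on FPR,
    then applies epsilon >= ln((TPR - delta) / FPR), clipped at zero.
    """
    tpr_lo, _ = clopper_pearson(tp, n_members, alpha)
    _, fpr_hi = clopper_pearson(fp, n_nonmembers, alpha)
    if fpr_hi <= 0 or tpr_lo - delta <= 0:
        return 0.0
    return max(0.0, math.log((tpr_lo - delta) / fpr_hi))

# Hypothetical attack outcome: 600 of 1000 members and 150 of 1000 non-members flagged.
print(empirical_epsilon_lower_bound(tp=600, n_members=1000, fp=150, n_nonmembers=1000))
```

If such an empirical lower bound exceeds the ε claimed by the implementation, the audit has exposed a violation; if it falls well below the claim, the audit is inconclusive about tightness rather than a proof of privacy.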