1. Introduction
Differential privacy (DP) [1] has emerged as a de facto standard for privacy-preserving technologies in research and practice due to the quantifiable privacy guarantee it provides. DP involves randomizing the outputs of an algorithm in such a way that the presence or absence of a single individual’s information within a database does not significantly affect the outcome of the algorithm. DP typically introduces randomness in the form of additive noise, ensuring that an adversary cannot infer any information about a particular record with high confidence. The key challenge is to keep the performance or utility of the noisy algorithm close enough to that of the unperturbed one to be useful in practice [2].
In its pure form, DP measures privacy risk by a parameter $\epsilon$, which can be interpreted as the privacy budget, that bounds the log-likelihood ratio of the output of a private algorithm under two datasets differing in a single individual’s data. The smaller the $\epsilon$, the greater the privacy ensured, but at the cost of worse performance. In privacy-preserving machine learning models, higher values of $\epsilon$ are generally chosen to achieve acceptable utility. However, setting $\epsilon$ to arbitrarily large values severely undermines privacy, although there are no hard threshold values for $\epsilon$ above which the formal guarantees provided by DP become meaningless in practice [3]. In order to improve utility for a given privacy budget, a relaxed definition of differential privacy, referred to as $(\epsilon,\delta)$-DP, was proposed [4]. Under this privacy notion, a randomized algorithm is considered privacy-preserving if the privacy loss of the output is smaller than $\exp(\epsilon)$ with high probability (i.e., with probability at least $1-\delta$) [5].
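For reference, the $(\epsilon,\delta)$-DP guarantee described above can be stated formally (this is the standard definition [4], reproduced here for convenience): a randomized mechanism $\mathcal{M}$ satisfies $(\epsilon,\delta)$-DP if, for all pairs of neighboring datasets $\mathbb{D}, {\mathbb{D}}^{\prime}$ and all measurable output sets $\mathcal{S}$,

```latex
\Pr\left[\mathcal{M}(\mathbb{D}) \in \mathcal{S}\right]
  \le \exp(\epsilon)\,\Pr\left[\mathcal{M}({\mathbb{D}}^{\prime}) \in \mathcal{S}\right] + \delta .
```

Pure $\epsilon$-DP is the special case $\delta = 0$.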
Our current work is motivated by the necessity of a decentralized differentially private algorithm to efficiently solve practical signal estimation and learning problems that (i) offers a better privacy–utility tradeoff compared to existing approaches, and (ii) offers similar utility as the pooled-data (or centralized) scenario. Some noteworthy real-world examples of systems that may need such differentially private decentralized solutions include [6]: (i) medical research consortia of healthcare centers and labs, (ii) decentralized speech processing systems for learning model parameters for speaker recognition, and (iii) multi-party cyber-physical systems. To this end, we first focus on improving the privacy–utility tradeoff of a well-known DP mechanism, called the functional mechanism (FM) [7]. The FM approach is more general and requires fewer assumptions on the objective function than other objective perturbation approaches [8,9].
The functional mechanism was originally proposed for “pure” $\epsilon$-DP. However, it involves additive noise with very large variance for datasets with even moderate ambient dimension, leading to a severe degradation in utility. We propose a natural “approximate” $(\epsilon,\delta)$-DP variant using Gaussian noise and show that the proposed Gaussian FM scheme significantly reduces the additive noise variance. A recent work by Ding et al. [10] proposed relaxed FM using the Extended Gaussian mechanism [11], which also guarantees approximate $(\epsilon,\delta)$-DP instead of pure DP. However, we will show analytically and empirically that, just like the original FM, the relaxed FM also suffers from prohibitively large noise variance even for moderate ambient dimensions. Our tighter sensitivity analysis for the Gaussian FM, which is different from the technique used in [10], allows us to achieve much better utility for the same privacy guarantee. We further extend the proposed Gaussian FM framework to the decentralized or “federated” learning setting using the $\mathsf{CAPE}$ protocol [6]. Our $\mathsf{capeFM}$ algorithm can offer the same level of utility as the centralized case over a range of parameters. Our empirical evaluation of the proposed algorithms on synthetic and real datasets demonstrates the superiority of the proposed schemes over the existing methods. We now review the relevant existing research in this area before summarizing our contributions.
Related Works. There is a vast literature on perturbation techniques to ensure DP in machine learning algorithms. The simplest method for ensuring that an algorithm satisfies DP is input perturbation, where noise is introduced to the input of the algorithm [2]. Another common approach is output perturbation, which obtains DP by adding noise to the output of the problem. In many machine learning algorithms, the underlying objective function is minimized with gradient descent. As the gradient depends on the privacy-sensitive data, randomization is introduced at each step of the gradient descent [9,12]. The amount of noise we need to add at each step depends on the sensitivity of the function to changes in its input [4]. Objective perturbation [8,9,13] is another state-of-the-art method to obtain DP, where noise is added to the underlying objective function of the machine learning algorithm, rather than to its solutions. A newly proposed take on output perturbation [14] injects noise after model convergence, which imposes some additional constraints. In addition to optimization problems, Smith [15] proposed a general approach for computing summary statistics using the sample-and-aggregate framework and both the Laplace and Exponential mechanisms [16].
Zhang et al. originally proposed the functional mechanism (FM) [7] as an extension of the Laplace mechanism. FM has been used in numerous studies to ensure DP in practical settings. Jorgensen et al. applied FM in personalized differential privacy (PDP) [17], where the privacy requirements are specified at the user level, rather than by a single, global privacy parameter. FM has also been combined with homomorphic encryption [18] to obtain both data secrecy and output privacy, as well as with fairness-aware learning [10,19] in classification models. The work of Fredrikson et al. [20], which demonstrated privacy in pharmacogenetics using FM and other DP mechanisms, is of particular interest to us. Pharmacogenetic models [21,22,23,24] contain sensitive clinical and genomic data that need to be protected. However, poor utility of differentially private pharmacogenetic models can expose patients to increased risk of disease. Fredrikson et al. [20] tested the efficacy of such models against attribute inference by using a model inversion technique. Their study shows that, although not explicitly designed to protect attribute privacy, DP can prevent attackers from accurately predicting genetic markers if $\epsilon$ is sufficiently small (≤1). However, the small value of $\epsilon$ results in poor utility of the models due to excessive noise addition, leading them to conclude that when utility cannot be compromised much, the existing methods do not give an $\epsilon$ for which state-of-the-art DP mechanisms can be reasonably employed. As mentioned before, Ding et al. [10] recently proposed relaxed FM in an attempt to improve upon the original FM using the Extended Gaussian mechanism [11], which offers an approximate DP guarantee.
DP algorithms provide different guarantees than Secure Multiparty Computation (SMC)-based methods. Several studies [25,26,27] applied a combination of SMC and DP for distributed learning. Gade and Vaidya [25] demonstrated one such method in which each site adds and subtracts arbitrary functions to confuse the adversary. Heikkilä et al. [26] also studied the relationship of additive noise and sample size in a distributed setting. In their model, S data holders communicate their data to M computation nodes to compute a function. Tajeddine et al. [27] used DP-SMC on vertically partitioned data, i.e., where data of the same participants are distributed across multiple parties or data holders. Bonawitz et al. [28] proposed a communication-efficient method for federated learning over a large number of mobile devices. More recently, Heikkilä et al. [29] considered DP in a cross-silo federated learning setting by combining it with additive homomorphic secure summation protocols. Xu et al. [30] investigated DP for multi-party learning in the vertically partitioned data setting. Their proposed framework dissects the objective function into single-party and cross-party sub-functions, and applies functional mechanisms and secure aggregation to achieve the same utility as the centralized DP model. Inspired by the seminal work of Dwork et al. [31] that proposed distributed noise generation for preserving privacy, Imtiaz et al. [6] proposed the Correlation Assisted Private Estimation ($\mathsf{CAPE}$) protocol. $\mathsf{CAPE}$ employs a similar principle as Anandan and Clifton [32] to reduce the noise added for DP in decentralized-data settings.
Our Contributions. As mentioned before, we are motivated by the necessity of a decentralized differentially private algorithm that injects a smaller amount of noise (compared to existing approaches) to efficiently solve practical signal estimation and learning problems. To that end, we first propose an improvement to the existing functional mechanism. We achieve this by performing a tighter sensitivity analysis, which significantly reduces the additive noise variance. As we utilize the Gaussian mechanism [33] to ensure $(\epsilon,\delta)$-DP, we call our improved functional mechanism Gaussian FM. Using our novel sensitivity analysis, we show that the proposed Gaussian FM injects a much smaller amount of additive noise compared to the original FM [7] and the relaxed FM [10] algorithms. We empirically show the superiority of Gaussian FM in terms of privacy guarantee and utility by comparing it with the corresponding non-private algorithm, the original FM [7], the relaxed FM [10], the objective perturbation [8], and the noisy gradient descent [12] methods. Note that the original FM [7] and the objective perturbation [8] methods guarantee pure DP, whereas the other methods guarantee approximate DP. We compare our $(\epsilon,\delta)$-DP Gaussian FM with the pure DP algorithms as a means of investigating how much performance/utility gain one can achieve by trading off the pure DP guarantee for an approximate DP guarantee. Additionally, the noisy gradient descent method is a multi-round algorithm. Due to the composition theorem of differential privacy [33], the privacy budgets in multi-round algorithms accumulate across the number of iterations during training. In order to perform better accounting of the total privacy loss in the noisy gradient descent algorithm, we use Rényi differential privacy [34].
Considering the fact that machine learning algorithms are often used in decentralized/federated data settings, we adapt our proposed Gaussian FM algorithm to such settings following the $\mathsf{CAPE}$ protocol [6], and propose $\mathsf{capeFM}$. In many signal processing and machine learning applications, where privacy regulations prevent sites from sharing the local raw data, joint learning across datasets can yield discoveries that are impossible to obtain from a single site. Motivated by scientific collaborations that are common in human health research, $\mathsf{CAPE}$ improves upon the conventional decentralized DP schemes and achieves the same level of utility as the pooled-data scenario in certain regimes. It has been shown [6] that $\mathsf{CAPE}$ can benefit computations with sensitivities satisfying certain conditions, and many functions of interest in machine learning and deep neural networks have sensitivities that satisfy them. Our proposed $\mathsf{capeFM}$ algorithm utilizes the Stone–Weierstrass theorem [35] to approximate a cost function in the decentralized-data setting and employs the $\mathsf{CAPE}$ protocol.
To summarize, the goal of our work is to improve the privacy–utility tradeoff and reduce the amount of noise in the functional mechanism, at the expense of an approximate DP guarantee, for applications of machine learning in decentralized/federated data settings similar to those found in research consortia. Our main contributions are:
We propose Gaussian FM as an improvement over the existing functional mechanism by performing a tighter sensitivity analysis. Our novel analysis has two major features: (i) the sensitivity parameters of the data-dependent (hence, privacy-sensitive) polynomial coefficients of the Stone–Weierstrass decomposition of the objective function are free of the dataset dimensionality; and (ii) the additive noise for privacy is tailored to the order of the polynomial coefficient of the Stone–Weierstrass decomposition of the objective function, rather than being the same for all coefficients. These features give our proposed Gaussian FM a significant advantage by offering much less noisy function computation compared to both the original FM [7] and the relaxed FM [10], as shown for linear and logistic regression problems. We also empirically validate this on real and synthetic data.
We extend our Gaussian FM to decentralized/federated data settings to propose $\mathsf{capeFM}$, a novel extension of the functional mechanism for decentralized data. To this end, we note another significant advantage of our proposed Gaussian FM over the original FM: the Gaussian FM can be readily extended to decentralized/federated data settings by exploiting the fact that the sum of a number of Gaussian random variables is another Gaussian random variable, which is not true for Laplace random variables. We show that the proposed $\mathsf{capeFM}$ can achieve the same utility as the pooled-data scenario for some parameter choices. To the best of our knowledge, our work is the first functional mechanism for decentralized-data settings.
We demonstrate the effectiveness of our algorithms with varying privacy and dataset parameters. Our privacy analysis and empirical results on real and synthetic datasets show that the proposed algorithms can achieve much better utility than the existing state-of-the-art algorithms.
3. Functional Mechanism with Approximate Differential Privacy: Gaussian FM
Zhang et al. [7] computed the ${\mathcal{L}}_{1}$-sensitivity ${\Delta}^{fm}$ of the data-dependent terms for the linear regression and logistic regression problems. The ${\Delta}^{fm}$ is shown to be $\frac{2}{N}{(1+D)}^{2}$ for linear regression, and $\frac{1}{N}\left(\frac{{D}^{2}}{4}+3D\right)$ for logistic regression. We note that ${\Delta}^{fm}$ grows quadratically with the ambient dimension of the data samples, resulting in an excessively large amount of noise being injected into the objective function. Additionally, Ding et al. [10] proposed relaxed FM, a “utility-enhancement scheme”, by replacing the Laplace mechanism with the Extended Gaussian mechanism [11], and thus achieved slightly better utility than the original FM at the expense of an approximate DP guarantee instead of a pure DP guarantee. However, Ding et al. [10] showed that the ${\mathcal{L}}_{2}$-sensitivity of the data-dependent terms for the logistic regression problem is ${\Delta}^{rlxfm}=\frac{1}{N}\sqrt{\frac{{D}^{2}}{16}+D}$. Additionally, using the technique outlined in [10], it can be shown that the ${\mathcal{L}}_{2}$-sensitivity of the data-dependent terms is ${\Delta}^{rlxfm}=\frac{2}{N}\sqrt{1+4D+{D}^{2}}$ for the linear regression problem (please see Appendix A for details). For both cases, we observe that ${\Delta}^{rlxfm}$ grows linearly with the ambient dimension of the data samples. Therefore, the privacy-preserving additive noise variances in both the original FM and relaxed FM schemes depend on the data dimensionality, and can therefore be prohibitively large even for moderate D. Moreover, both the FM and relaxed FM schemes add the same amount of noise to each polynomial coefficient ${\lambda}_{\varphi n}$ irrespective of the order j. With a tighter characterization, we show in Section 4 that the sensitivities of these coefficients are different for different orders j. We reduce the amount of added noise by addressing these issues and performing a novel sensitivity analysis. The key points are as follows:
Instead of computing the $\epsilon$-DP approximation of the objective function using the Laplace mechanism, we use the Gaussian mechanism to compute the $(\epsilon,\delta )$-DP approximation of ${f}_{D}\left(\mathbf{w}\right)$. This gives a weaker privacy guarantee than pure differential privacy, but provides much better utility.
Recall that the original FM achieves $\epsilon$-DP by adding Laplace noise scaled to the ${\mathcal{L}}_{1}$-sensitivity of the data-dependent terms of the objective function ${f}_{D}\left(\mathbf{w}\right)$ in (2). As we use the Gaussian mechanism, we require an ${\mathcal{L}}_{2}$-sensitivity analysis. To compute the ${\mathcal{L}}_{2}$-sensitivity of the data-dependent terms of the objective function ${f}_{D}\left(\mathbf{w}\right)$ in (2), we first define an array ${\Lambda}_{j}$ that contains $\frac{1}{N}{\sum}_{n=1}^{N}{\lambda}_{\varphi n}$ as its entries for all $\varphi \left(\mathbf{w}\right)\in {\mathsf{\Phi}}_{j}$. The term “array” is used because the dimension of ${\Lambda}_{j}$ depends on the cardinality of ${\mathsf{\Phi}}_{j}$. For example, for $j=0$, ${\Lambda}_{0}$ is a scalar because ${\mathsf{\Phi}}_{0}=\left\{1\right\}$; for $j=1$, ${\Lambda}_{1}$ can be expressed as a D-dimensional vector because ${\mathsf{\Phi}}_{1}=\{{w}_{1},{w}_{2},\dots ,{w}_{D}\}$; and for $j=2$, ${\Lambda}_{2}$ can be expressed as a $D\times D$ matrix because ${\mathsf{\Phi}}_{2}=\{{w}_{{d}_{1}}{w}_{{d}_{2}}\mid {d}_{1},{d}_{2}\in \left[D\right]\}$.
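To make the $\{\Lambda_j\}$ concrete, the following sketch (our own illustrative NumPy code, not from the paper) constructs these arrays for the ordinary linear-regression cost $f_D(\mathbf{w})=\frac{1}{N}\sum_{n=1}^{N}(y_n-\mathbf{x}_n^\top\mathbf{w})^2$, whose polynomial expansion stops at order $J=2$; a second helper reassembles the objective from them:

```python
import numpy as np

def lambda_arrays(X, y):
    """Polynomial-coefficient arrays (Lambda_0, Lambda_1, Lambda_2) of the
    linear-regression cost f_D(w) = (1/N) * sum_n (y_n - x_n^T w)^2,
    expanded in the monomial basis Phi_j."""
    N = X.shape[0]
    lam0 = np.sum(y ** 2) / N      # scalar: multiplies the constant monomial 1
    lam1 = -2.0 * (X.T @ y) / N    # D-vector: multiplies each w_d
    lam2 = (X.T @ X) / N           # D x D matrix: multiplies each w_{d1} w_{d2}
    return lam0, lam1, lam2

def objective_from_lambdas(lams, w):
    """Reassemble f_D(w) = sum_j <Lambda_j, phi_j(w)>."""
    lam0, lam1, lam2 = lams
    return lam0 + lam1 @ w + w @ lam2 @ w
```

Here `lambda_arrays` and `objective_from_lambdas` are hypothetical helper names; the key point is that $\Lambda_0$, $\Lambda_1$, and $\Lambda_2$ are exactly a scalar, a $D$-vector, and a $D\times D$ matrix, matching the cardinalities of $\Phi_0$, $\Phi_1$, and $\Phi_2$.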
We rewrite the objective function as
where ${\overline{\varphi}}_{j}$ is the array containing all $\varphi \left(\mathbf{w}\right)\in {\mathsf{\Phi}}_{j}$ as its entries. Note that ${\overline{\varphi}}_{j}$ and ${\Lambda}_{j}$ have the same dimensions and numbers of elements. We define the ${\mathcal{L}}_{2}$-sensitivity of ${\Lambda}_{j}$ as
where ${\Lambda}_{j}^{\mathbb{D}}$ and ${\Lambda}_{j}^{{\mathbb{D}}^{\prime}}$ are computed on neighboring datasets $\mathbb{D}$ and ${\mathbb{D}}^{\prime}$, respectively. Following the Gaussian mechanism [33], we can compute the $(\epsilon,\delta )$ differentially private estimate of ${\Lambda}_{j}$, denoted ${\widehat{\Lambda}}_{j}$, as
where the noise array ${e}_{j}$ has the same dimension as ${\Lambda}_{j}$, and contains entries drawn i.i.d. from $\mathcal{N}(0,{\tau}_{j}^{2})$ with ${\tau}_{j}=\frac{{\Delta}_{j}}{\epsilon}\sqrt{2\log \frac{1.25}{\delta}}$. Finally, we have
As the function ${f}_{D}\left(\mathbf{w}\right)$ depends on the data only through $\left\{{\Lambda}_{j}\right\}$, this computation satisfies $(\epsilon,\delta )$-differential privacy. Our proposed Gaussian FM is shown in detail in Algorithm 1.
Theorem 2 (Privacy of the Gaussian FM (Algorithm 1)). Consider Algorithm 1 with privacy parameters $(\epsilon,\delta )$, and the empirical average cost function ${f}_{D}\left(\mathbf{w}\right)$ represented as in (3). Then Algorithm 1 computes an $(\epsilon,\delta )$ differentially private approximation ${\widehat{f}}_{D}\left(\mathbf{w}\right)$ to ${f}_{D}\left(\mathbf{w}\right)$. Consequently, the minimizer ${\widehat{\mathbf{w}}}^{*}={arg\; min}_{\mathbf{w}}\;{\widehat{f}}_{D}\left(\mathbf{w}\right)$ satisfies $(\epsilon,\delta )$-differential privacy.

Algorithm 1 Gaussian FM
Require: Data samples $({\mathbf{x}}_{n},{y}_{n})$ for $n\in \left[N\right]$; cost function ${f}_{D}\left(\mathbf{w}\right)$ represented as in (3); privacy parameters $(\epsilon,\delta )$.
1: for $0\le j\le J$ do
2:   Compute ${\Lambda}_{j}$ as shown in Section 4
3:   Compute ${\Delta}_{j}={max}_{\mathbb{D},{\mathbb{D}}^{\prime}}\parallel {\Lambda}_{j}^{\mathbb{D}}-{\Lambda}_{j}^{{\mathbb{D}}^{\prime}}{\parallel}_{2}$
4:   Compute ${\tau}_{j}=\frac{{\Delta}_{j}}{\epsilon}\sqrt{2\log \frac{1.25}{\delta}}$
5:   Draw ${e}_{j}\sim \mathcal{N}(0,{\tau}_{j}^{2})$ with the same dimension as ${\Lambda}_{j}$
6:   Release ${\widehat{\Lambda}}_{j}={\Lambda}_{j}+{e}_{j}$
7: end for
8: Compute ${\widehat{f}}_{D}\left(\mathbf{w}\right)={\sum}_{j=0}^{J}\left\langle {\widehat{\Lambda}}_{j},{\overline{\varphi}}_{j}\right\rangle $
9: return Perturbed objective function ${\widehat{f}}_{D}\left(\mathbf{w}\right)$
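As a minimal illustration (our own sketch, not the authors' code), the loop of Algorithm 1 can be written in a few lines of NumPy; the per-order sensitivities $\Delta_j$ are taken here as precomputed inputs, since in the paper they come from the analysis in Section 4:

```python
import numpy as np

def gaussian_fm(lambdas, sensitivities, eps, delta, rng=None):
    """Perturb each polynomial-coefficient array Lambda_j with Gaussian
    noise scaled to its own L2-sensitivity Delta_j (Algorithm 1, steps 1-7).
    `lambdas` and `sensitivities` are lists indexed by the order j."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = []
    for lam, delta_j in zip(lambdas, sensitivities):
        # tau_j = (Delta_j / eps) * sqrt(2 log(1.25 / delta))
        tau_j = (delta_j / eps) * np.sqrt(2.0 * np.log(1.25 / delta))
        lam = np.asarray(lam, dtype=float)
        # Noise array e_j has the same dimension as Lambda_j
        noisy.append(lam + rng.normal(0.0, tau_j, size=lam.shape))
    return noisy
```

The perturbed coefficients $\{\widehat{\Lambda}_j\}$ are then used to form $\widehat{f}_D(\mathbf{w})$, which can be minimized with any off-the-shelf optimizer; by post-processing invariance, the minimizer inherits the $(\epsilon,\delta)$ guarantee.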
Proof. The proof of Theorem 2 follows from the fact that the function ${\widehat{f}}_{D}\left(\mathbf{w}\right)$ depends on the data samples only through $\left\{{\widehat{\Lambda}}_{j}\right\}$. The computation of $\left\{{\widehat{\Lambda}}_{j}\right\}$ is $(\epsilon,\delta )$-differentially private by the Gaussian mechanism [4,33]. Therefore, the release of ${\widehat{f}}_{D}\left(\mathbf{w}\right)$ satisfies $(\epsilon,\delta )$-differential privacy. One way to rationalize this is to consider that the probability of the event of selecting a particular set of $\left\{{\widehat{\Lambda}}_{j}\right\}$ is the same as that of the event of formulating a function ${\widehat{f}}_{D}\left(\mathbf{w}\right)$ with that set of $\left\{{\widehat{\Lambda}}_{j}\right\}$. Therefore, it suffices to consider the joint density of the $\left\{{\widehat{\Lambda}}_{j}\right\}$ and find an upper bound on the ratio of the joint densities of the $\left\{{\widehat{\Lambda}}_{j}\right\}$ under two neighboring datasets $\mathbb{D}$ and ${\mathbb{D}}^{\prime}$. As we employ the Gaussian mechanism to compute $\left\{{\widehat{\Lambda}}_{j}\right\}$, this ratio is upper bounded by $\exp\left(\epsilon\right)$ with probability at least $1-\delta $. Therefore, the release of ${\widehat{f}}_{D}\left(\mathbf{w}\right)$ satisfies $(\epsilon,\delta )$-differential privacy. Furthermore, differential privacy is invariant to post-processing. Therefore, the computation of the minimizer ${\widehat{\mathbf{w}}}^{*}={arg\; min}_{\mathbf{w}}\;{\widehat{f}}_{D}\left(\mathbf{w}\right)$ also satisfies $(\epsilon,\delta )$-differential privacy. □
Privacy Analysis of Noisy Gradient Descent [12] using Rényi Differential Privacy. One of the most crucial properties of DP is that it allows us to evaluate the cumulative privacy loss over multiple computations [33]. The cumulative, or total, privacy loss is different from the per-round ($\epsilon,\delta $)-DP guarantee in multi-round machine learning algorithms. In order to demonstrate the superior privacy guarantee of the proposed Gaussian FM, we compare it to the existing functional mechanism [7], the relaxed functional mechanism [10], the objective perturbation [8], and the noisy gradient descent [12] methods. Note that, similar to objective perturbation, FM, and relaxed FM, the proposed Gaussian FM injects randomness in a single round, and therefore does not require privacy accounting. However, the noisy gradient descent method involves the addition of noise at each step that the gradient is computed. That is, noise is added to the computed gradients of the parameters of the objective function during optimization. Since it is a multi-round algorithm, the overall $\epsilon$ used during optimization is different from the $\epsilon$ for each iteration. We follow the analysis procedure outlined in [6] for the privacy accounting of the noisy gradient descent algorithm. Note that Proposition 3 described in Section 2.1 is defined for functions with unit ${\mathcal{L}}_{2}$-sensitivity. Therefore, if noise from $\mathcal{N}(0,{\tau}^{2})$ is added to a function with sensitivity $\Delta $, then the resulting mechanism satisfies $(\alpha ,\frac{\alpha {\Delta}^{2}}{2{\tau}^{2}})$-RDP. Now, according to Proposition 3, the T-fold composition of Gaussian mechanisms satisfies $(\alpha ,\frac{\alpha T{\Delta}^{2}}{2{\tau}^{2}})$-RDP. Finally, according to Proposition 1, it also satisfies $({\epsilon}_{r}+\frac{\log \frac{1}{{\delta}_{r}}}{\alpha -1},{\delta}_{r})$-differential privacy for any $0\le {\delta}_{r}\le 1$, where ${\epsilon}_{r}=\frac{\alpha T{\Delta}^{2}}{2{\tau}^{2}}$. For a given value of ${\delta}_{r}$, we can express the value of the optimal overall ${\epsilon}_{\mathrm{opt}}$ as a function of ${\alpha}_{\mathrm{opt}}$:
where ${\alpha}_{\mathrm{opt}}$ is given by
We compute the overall $\epsilon$ following this procedure for the noisy gradient descent algorithm [12] in our experiments in Section 6.
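The accounting procedure above can be sketched as follows (a hypothetical helper, assuming the minimizer of $\epsilon(\alpha)=\frac{\alpha T\Delta^2}{2\tau^2}+\frac{\log(1/\delta_r)}{\alpha-1}$ obtained by setting its derivative with respect to $\alpha$ to zero):

```python
import numpy as np

def overall_epsilon(tau, Delta, T, delta_r):
    """Overall (eps, delta_r)-DP guarantee of T-fold composed Gaussian
    mechanisms via RDP: minimize eps(alpha) = c*alpha + L/(alpha - 1)
    over alpha > 1, where c = T*Delta^2/(2*tau^2) and L = log(1/delta_r)."""
    c = T * Delta ** 2 / (2.0 * tau ** 2)   # per-alpha RDP slope
    L = np.log(1.0 / delta_r)
    alpha_opt = 1.0 + np.sqrt(L / c)        # root of d eps / d alpha = 0
    eps_opt = c * alpha_opt + L / (alpha_opt - 1.0)
    return eps_opt, alpha_opt
```

A one-dimensional numerical minimization over $\alpha$ would give the same result; the closed form is used here only because the objective is convex in $\alpha$ with a single stationary point.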
6. Experimental Results
In this section, we empirically compare the performance of our proposed Gaussian FM algorithm (gaussfm) with those of some state-of-the-art differentially private linear and logistic regression algorithms, namely noisy gradient descent (noisygd) [12], objective perturbation (objpert) [8], the original functional mechanism (fm) [7], and the relaxed functional mechanism (rlxfm) [10]. We also compare the performance of these algorithms with non-private linear and logistic regression (nonpriv). As mentioned before, we compute the overall $\epsilon$ using RDP for the multi-round noisygd algorithm. Additionally, we show how our proposed decentralized functional mechanism (capefm) can improve a decentralized computation if the target function has a sensitivity satisfying the conditions of Proposition 5 in Section 2.1. We show the variation in performance with the privacy parameters and the number of training samples. For the decentralized setting, we further show the empirical performance comparison by varying the number of sites.
Performance Indices. For the linear regression task, we use the mean squared error (MSE) as the performance index. Let the test dataset be ${\mathbb{D}}_{\mathrm{test}}=\{({\mathbf{x}}_{n},{y}_{n})\in \mathcal{X}\times \mathcal{Y}:n\in \left[{N}_{\mathrm{test}}\right]\}$. Then the MSE can be defined as $\mathrm{MSE}=\frac{1}{{N}_{\mathrm{test}}}{\sum}_{n=1}^{{N}_{\mathrm{test}}}{({\widehat{y}}_{n}-{y}_{n})}^{2}$, where ${\widehat{y}}_{n}$ is the prediction from the algorithm. For the classification task, we use accuracy as the performance index. The accuracy can be defined as $\mathrm{Accuracy}=\frac{1}{{N}_{\mathrm{test}}}{\sum}_{n=1}^{{N}_{\mathrm{test}}}\mathcal{I}\left(\mathtt{round}\left({\widehat{y}}_{n}\right)={y}_{n}\right)$, where $\mathcal{I}(\cdot )$ is the indicator function, and ${\widehat{y}}_{n}$ is the prediction from the algorithm. Note that, in addition to a small MSE or large accuracy, we want to attain a strict privacy guarantee, i.e., small overall $(\epsilon,\delta )$ values. Recall from Section 3 that the overall $\epsilon$ for multi-shot algorithms is a function of the number of iterations, the target $\delta $, the additive noise variance ${\tau}^{2}$, and the ${\mathcal{L}}_{2}$-sensitivity $\Delta $. To demonstrate the overall $\epsilon$ guarantee for a fixed target $\delta $, we plot the overall $\epsilon$ (with dotted red lines on the right y-axis) along with the MSE/accuracy (with solid blue lines on the left y-axis) as a means of visualizing how the privacy–utility tradeoff varies with different parameters. For a given privacy budget (or performance requirement), the user can use the overall $\epsilon$ plot on the right y-axis, shown with dotted lines (or the MSE/accuracy plot on the left y-axis, shown with solid lines), to find the required noise standard deviation $\tau $ on the x-axis and, thereby, find the corresponding performance (or overall $\epsilon$). We compute the overall $\epsilon$ for the noisygd algorithm using the RDP technique shown in Section 3.
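The two performance indices translate directly into code; a short sketch (our own illustrative helpers, not from the paper):

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error over the test partition."""
    return np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2)

def accuracy(y_hat, y):
    """Fraction of test samples whose rounded prediction equals the label,
    i.e., the empirical mean of the indicator I(round(y_hat_n) = y_n)."""
    return np.mean(np.round(y_hat) == np.asarray(y))
```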
6.1. Linear Regression
For the linear regression problem, we perform experiments on three real datasets (and a synthetic dataset, as shown in Appendix B). The pharmacogenetic dataset was collected by the International Warfarin Pharmacogenetics Consortium (IWPC) [23] for the purpose of estimating the personalized warfarin dose based on the clinical and genotype information of a patient. The data used for this study have ambient dimension $D=9$, and features are collected from $N=5052$ patients. Out of the wide variety of numerical modeling methods used in [23], linear regression provided the most accurate dose estimates. Fredrikson et al. [20] later implemented an attack model assuming an adversary who employed an inference algorithm to discover the genotype of a target individual, and showed that the existing functional mechanism (fm) failed to provide a meaningful privacy guarantee to prevent such attacks. We perform privacy-preserving linear regression on the IWPC dataset (Figure 1a–c) to show the effectiveness of our proposed gaussfm over fm, rlxfm, and other existing approaches. Additionally, we use the Communities and Crime dataset (crime) [45], which has a larger dimensionality $D=101$ (Figure 1d–f), and the Buzz in Social Media dataset (twitter) [46] with $D=77$ and a large sample size $N=10{,}000$ (Figure 1g–i). We refer the reader to [47] for a detailed description of these real datasets. For all the experiments, we preprocess the data so that the samples satisfy the assumptions $\parallel {\mathbf{x}}_{n}{\parallel}_{2}\le 1$ and ${y}_{n}\in [-1,1]$ ∀ $n\in \left[N\right]$. We divide each dataset into train and test partitions with a ratio of 90:10. We show the average performance over 10 independent runs.
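One simple way to satisfy these preprocessing assumptions (a sketch of a plausible pipeline, not necessarily the authors' exact one) is to rescale by the maximum feature norm and the maximum label magnitude, then make a shuffled 90:10 split:

```python
import numpy as np

def preprocess(X, y, rng=None):
    """Scale features so that ||x_n||_2 <= 1 and labels so that
    y_n is in [-1, 1], then split 90:10 into train/test partitions."""
    X = X / max(np.max(np.linalg.norm(X, axis=1)), 1e-12)
    y = y / max(np.max(np.abs(y)), 1e-12)
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(y))        # shuffle before splitting
    cut = int(0.9 * len(y))
    train, test = idx[:cut], idx[cut:]
    return X[train], y[train], X[test], y[test]
```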
Performance Comparison with Varying $\tau $. We first investigate the variation of the MSE with the DP additive noise standard deviation $\tau $. We plot the MSE against $\tau $ in Figure 1a,d,g. Recall from Definition 3 that, in the Gaussian mechanism, the noise is drawn from a Gaussian distribution with standard deviation $\tau =\frac{\Delta}{\epsilon}\sqrt{2\log \frac{1.25}{\delta}}$. We keep $\delta $ fixed at ${10}^{-5}$. Note that one can vary $\epsilon$ to vary $\tau $. Since the noise standard deviation is inversely proportional to $\epsilon$, increasing $\epsilon$ means decreasing $\tau $, i.e., a smaller noise variance. We observe from the plots that a smaller $\tau $ leads to a smaller MSE for all DP algorithms, indicating better utility at the expense of higher privacy loss. It is evident from these MSE vs. $\tau $ plots that our proposed method gaussfm has a much smaller MSE compared to all the other methods for the same $\tau $ values for all datasets. The objpert and fm algorithms offer pure DP by trading off utility, whereas the gaussfm and rlxfm algorithms offer approximate DP. Although rlxfm improves upon fm, the excess noise due to the linear dependence on the data dimension D leads to a higher MSE than gaussfm. Our proposed gaussfm outperforms all of these methods by reducing the additive noise with the novel sensitivity analysis shown in Section 4. We recall that the overall privacy loss for noisygd is calculated using the RDP approach, since noise is injected into the gradients in every iteration during optimization, with target $\delta ={10}^{-5}$. On the other hand, gaussfm, rlxfm, and fm add noise to the polynomial coefficients of the cost function ${f}_{D}\left(\mathbf{w}\right)$ before optimization, and objpert injects noise into the regularized cost function [8]. We plot the total privacy loss for all of the algorithms against $\tau $. We observe from the y-axis on the right that the total privacy loss of the multi-round noisygd is considerably higher than that of the single-shot algorithms.
Performance Comparison with Varying ${N}_{train}$. Next, we investigate the variation of MSE with the number of training samples
${N}_{train}$. For this task, we shuffle and divide the total number of samples
N into smaller partitions and perform the same preprocessing steps, while keeping the test partition untouched. We kept the values of the privacy parameters fixed:
$\u03f5=0.5$ and
$\delta ={10}^{5}$. We plot MSE against
${N}_{train}$ in
Figure 1b,e,h. We observe that performance generally improves with the increase in
${N}_{train}$, which indicates that it is easier to ensure the same level of privacy when the training dataset cardinality is higher. We also observe from the MSE vs.
${N}_{train}$ plots that our proposed method
gaussfm offers MSE very close to that of
nonpriv even for moderate sample sizes, outperforming
fm,
rlxfm,
noisygd, and
objpert. Again, we compute the overall
$\epsilon $ spent using RDP for
noisygd, and show that the multiround algorithm suffers from larger privacy loss. Recall from (
7) in
Section 3 that the overall
$\epsilon $ depends on sensitivity
$\Delta $, and the number of iterations
T. In the computation of
$\frac{{\tau}^{2}}{{\Delta}^{2}}$, the number of training samples
${N}_{train}$ is cancelled out. Thus, the overall
$\epsilon $ depends only on
T for
noisygd. We keep
T fixed at 1000 iterations for
noisygd and observe that the overall privacy risk exceeds 20. Note that we set the value of the target
${\delta}_{r}$ in (
7) to be equal to
$\delta $ in our computations.
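The cancellation of ${N}_{train}$ described above can be made concrete with a small sketch. We assume the standard RDP accounting for the Gaussian mechanism (per-iteration Rényi divergence of order $\alpha$ equal to $\alpha/(2\sigma^2)$ with noise multiplier $\sigma = \tau/\Delta$, additive composition over $T$ iterations, and conversion to $(\epsilon,\delta)$-DP minimized over $\alpha$); the function name and parameter choices below are illustrative, not the paper's exact accountant.

```python
import math

def overall_eps(noise_multiplier, T, delta, orders=range(2, 256)):
    """(eps, delta)-DP of T Gaussian-mechanism iterations via RDP accounting.

    Per-iteration RDP at integer order alpha is alpha / (2 * sigma^2);
    RDP composes additively over T iterations and is converted to
    (eps, delta)-DP via eps = rdp + log(1/delta) / (alpha - 1),
    minimized over the candidate orders.
    """
    return min(
        T * a / (2.0 * noise_multiplier**2) + math.log(1.0 / delta) / (a - 1)
        for a in orders
    )

# sigma = tau / Delta: since Delta is proportional to 1/N_train and tau is
# calibrated to it, N_train cancels and only T drives the overall eps.
# For example, sigma = 10 with T = 1000 and delta = 1e-5 already pushes
# the overall eps past 20, matching the multi-round behavior in the text.
```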
Performance Comparison with Varying $\delta $. Recall that we can interpret the privacy parameter
$\delta $ as the probability that an algorithm fails to provide privacy risk
$\epsilon $. The
objpert and
fm algorithms offer pure
$\epsilon $-DP, where the additional privacy parameter
$\delta $ is zero. Hence, we compare our proposed
gaussfm method with the
rlxfm and
noisygd methods, which also guarantee (
$\epsilon $,
$\delta $)-DP. In the Gaussian mechanism,
$\delta $ is in the denominator of the logarithmic term within the square root in the expression of
$\tau $. Therefore, the noise variance
${\tau}^{2}$ is not significantly changed by varying
$\delta $. We keep privacy parameter
$\epsilon $ fixed at
$0.5$ and observe from the MSE vs.
$\delta $ plots in
Figure 1c,f,i that the performance of our algorithm does not degrade much for smaller
$\delta $. For the
IWPC dataset in
Figure 1c, for a value of
$\delta $ as small as
${10}^{-2}$ (indicating
$1\%$ probability of the algorithm failing to provide
$\epsilon $-differential privacy), the MSE of
gaussfm is almost the same as that of the
nonpriv case. For the other datasets, our proposed method also gives better performance and overall
$\epsilon $, and thus a better privacy–utility tradeoff than
rlxfm and
noisygd.
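The weak dependence of $\tau $ on $\delta $ can be seen directly from the classical Gaussian-mechanism calibration $\tau = \Delta \sqrt{2\ln (1.25/\delta )}/\epsilon $ (a minimal sketch of that standard formula; the function name is ours, and tighter analytic calibrations exist):

```python
import math

def gaussian_sigma(eps, delta, sensitivity=1.0):
    """Classical Gaussian-mechanism calibration:
    tau = sensitivity * sqrt(2 * ln(1.25 / delta)) / eps."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

# delta sits inside a logarithm under a square root, so shrinking it
# from 1e-2 all the way to 1e-8 grows tau by less than a factor of 2:
for d in (1e-2, 1e-5, 1e-8):
    print(d, gaussian_sigma(0.5, d))
```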
6.2. Logistic Regression
For the logistic regression problem, we again perform experiments on three real datasets (and a synthetic dataset, as shown in
Appendix B): the
Phishing Websites dataset (
phishing) [
47] with dimensionality
$D=30$ (
Figure 2a–c), the
Census Income dataset (
adult) [
47] with
$D=13$ (
Figure 2d–f), and the
KDD Cup ’99 dataset (
kdd) [
47] with
$D=36$ (
Figure 2g–i). As before, we preprocess the data so that the feature vectors satisfy
$\parallel {\mathbf{x}}_{n}{\parallel}_{2}\le 1$, and
${y}_{n}\in \left\{0,1\right\}$ ∀
$n\in \left[N\right]$. Note for
objpert that the cost function is regularized and the labels are assumed to be
$\left\{-1,1\right\}$ in [
8]. We divide each dataset into train and test partitions with a ratio of 90:10. We use percent accuracy on the test dataset as the performance index for logistic regression, and show the average performance over 10 independent runs.
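The preprocessing above can be sketched as follows (a minimal NumPy illustration; the function name, seed, and per-row rescaling choice are ours, not necessarily the authors' exact pipeline — dividing all rows by the global maximum norm is an equally valid way to enforce the norm bound):

```python
import numpy as np

def preprocess(X, y, test_frac=0.1, seed=0):
    """Scale every feature vector to at most unit L2 norm, then shuffle
    and split into train/test partitions (90:10 by default)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1.0)          # leaves already-small rows alone
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))           # shuffle before splitting
    n_test = int(len(X) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]
```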
Performance Comparison with Varying $\tau $. We plot accuracy against the DP additive noise standard deviation
$\tau $ in
Figure 2a,d,g. We observe that accuracy degrades when the additive DP noise standard deviation
$\tau $ increases, indicating a greater privacy guarantee at the cost of performance. When noise is too high, privacypreserving logistic regression may not learn a meaningful
$\mathbf{w}$ at all, instead producing essentially random predictions. Depending on the class distribution, this failure may not be obvious, and the accuracy score may be misleading. We observe this for the
kdd dataset in
Figure 2g, where the classes are highly imbalanced, with ∼80% positive labels. Although the existing
fm performs poorly on this dataset, our proposed
gaussfm provides significantly higher accuracy for all datasets, outperforming
fm, as well as
rlxfm,
objpert, and
noisygd. As before, we observe the total privacy loss, i.e., overall
$\epsilon $ spent, from the
y-axis on the right.
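The caveat about imbalanced classes is easy to quantify: a constant classifier that always predicts the majority class already attains the majority fraction as its accuracy, so on a dataset with ∼80% positive labels, an accuracy near 80% carries no evidence of learning. A minimal sketch (function name ours):

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy of the constant classifier that always predicts the most
    frequent class; any model should be judged against this floor."""
    y = np.asarray(y)
    p = y.mean()                         # fraction of positive (label-1) samples
    return max(p, 1.0 - p)

y = np.array([1] * 80 + [0] * 20)        # kdd-like imbalance: ~80% positive
baseline = majority_baseline_accuracy(y) # 0.8
```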
Performance Comparison with Varying ${N}_{train}$. We perform the same steps described in
Section 6.1 and observe the variation in performance with the number of training samples,
${N}_{train}$, while keeping the privacy parameters fixed in
Figure 2b,e,h. Accuracy generally improves with increasing
${N}_{train}$. We observe that the same DP algorithm does not perform equally well for different datasets. For example,
objpert performs better than
noisygd on the
adult dataset (
Figure 2e), whereas
noisygd performs better than
objpert on the
phishing dataset (
Figure 2b). In general,
fm and
rlxfm suffer from too much noise due to the quadratic and linear dependence on
D of their sensitivities, respectively. However, our proposed
gaussfm overcomes this issue and consistently achieves accuracy close to the
nonpriv case even for moderate sample sizes. We also show the overall privacy guarantee, as before.
Performance Comparison with Varying $\delta $. Similar to the linear regression experiments shown in
Section 6.1, we keep
$\epsilon $ and
${N}_{train}$ fixed for this task and vary the other privacy parameter
$\delta $.
Figure 2c,f,i show that percent accuracy improves with increased
$\delta $. For sufficiently large
$\delta $ (indicating a 1–5% probability of the algorithm failing to provide privacy risk
$\epsilon $),
gaussfm accuracy can reach that of the
nonpriv algorithm in some datasets (e.g.,
Figure 2i). Although the accuracy of
noisygd also improves, it comes at the cost of additional privacy risk, as shown in the overall
$\epsilon $ vs.
$\delta $ plots along the
y-axes on the right. Due to the higher noise variance,
rlxfm achieves substantially lower accuracy than both
gaussfm and
noisygd.
6.3. Decentralized Functional Mechanism ($\mathsf{capeFM}$)
In this section, we empirically show the effectiveness of
$\mathsf{capeFM}$, our proposed decentralized Gaussian FM which utilizes the
$\mathsf{CAPE}$ [
6] protocol. We implement differentially private linear and logistic regression for the decentralized-data setting using the same datasets described in
Section 6.1 and
Section 6.2, respectively. Note that the IWPC [
23] data were collected from 21 sites across 9 countries. Informed consent to use de-identified data was obtained from patients prior to the study, and the Pharmacogenetics Knowledge Base has since made the dataset publicly available for research purposes. As mentioned before, the type of data contained in the IWPC dataset is similar to many other medical datasets containing private information [
20].
We implement our proposed
capefm according to Algorithm 3, along with
fm,
rlxfm,
objpert, and
noisygd according to the conventional decentralized DP approach. We compare the performance of these methods in
Figure 3 and
Figure 4. Similar to the pooled-data scenario, we also compare the performance of these algorithms with non-private linear and logistic regression (
nonpriv). For these experiments, we assume
${N}_{s}=\frac{N}{S}$ and
${\tau}_{s}=\tau $. Recall that the
$\mathsf{CAPE}$ scheme achieves the same noise variance as the pooleddata scenario in the symmetric setting (see Lemma 1 [
6] in
Section 2.1). As our proposed
$\mathsf{capeFM}$ algorithm follows the
$\mathsf{CAPE}$ scheme, we attain the same advantages. When varying privacy parameters and
${N}_{train}$, we keep the number of sites
S fixed. Additionally, we show the variation in performance due to change in the number of sites in
Figure 5. We preprocess each dataset as before, and use MSE and percent accuracy on test dataset as performance indices of the decentralized linear and logistic regression problems, respectively.
Performance Comparison by Varying $\tau $. For this experiment, we keep the total number of samples
N, privacy parameter
$\delta $, and the number of sites
S fixed. We observe from the plots (a), (d), and (g) in both
Figure 3 and
Figure 4 that as
$\tau $ increases, the performance degrades. The proposed
capefm outperforms conventional decentralized
noisygd,
objpert,
fm, and
rlxfm by a larger margin than in the pooled-data case. The reason for this is that we can achieve a much smaller noise variance at the aggregator due to the correlated noise scheme detailed in
Section 5.3. The utility of
capefm thus stays the same as in the centralized case even in the decentralized-data setting, whereas the conventional scheme’s utility always degrades by a factor of
S (see
Section 5.1). The overall
$\epsilon $ usage vs.
$\tau $ plots on the right y-axes for each site show that
noisygd suffers from much higher privacy loss.
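The variance advantage of the correlated-noise scheme can be illustrated numerically. The stylized sketch below only mimics the zero-sum structure of the aggregate (in the real $\mathsf{CAPE}$ protocol, the sites generate the correlated terms jointly and securely); the variance choices assume the symmetric setting, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
S, tau, trials = 10, 1.0, 200_000

# CAPE-style: each site adds e_s + g_s, where the e_s sum to zero across
# sites and the small residual g_s have variance tau^2 / S.
E = rng.normal(0.0, tau, size=(trials, S))
E -= E.mean(axis=1, keepdims=True)          # enforce sum_s e_s = 0 exactly
G = rng.normal(0.0, tau / np.sqrt(S), size=(trials, S))
cape_avg = (E + G).mean(axis=1)             # e_s cancel; only g_s survive

# Conventional: every site independently adds full-variance noise tau^2.
conv_avg = rng.normal(0.0, tau, size=(trials, S)).mean(axis=1)

# cape_avg has variance ~tau^2 / S^2 while conv_avg has ~tau^2 / S,
# i.e., the conventional aggregate is noisier by a factor of S.
```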
Performance Comparison by Varying ${N}_{train}$. We keep
$\epsilon $,
$\delta $, and
S fixed while investigating variation in performance with respect to
${N}_{train}$. As the sensitivities we computed in
Section 4.1 and
Section 4.2 are inversely proportional to the sample size, it is straightforward to infer that guaranteeing smaller privacy risk and higher utility is much easier when the sample size is large. Similar to the pooleddata cases in
Section 6.1 and
Section 6.2, we again observe from the plots (b), (e), and (h) in both
Figure 3 and
Figure 4 that, for sufficiently large
${N}_{train}=S{N}_{s,train}$, the utility of
capefm can reach that of the
nonpriv case. Note that the
nonpriv algorithms are the same as in the pooled-data scenario, because if privacy is not a concern, all sites can simply send their data to the aggregator for learning.
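The factor-$S$ degradation of the conventional scheme follows from sensitivity scaling: with ${N}_{s}=N/S$ samples per site, each local sensitivity (and hence noise standard deviation) is $S$ times the pooled one, and averaging $S$ independent draws only divides the standard deviation by $\sqrt{S}$. A back-of-the-envelope sketch (names ours):

```python
def conventional_agg_noise_std(S, pooled_std):
    """Aggregator noise std when each of S sites calibrates noise to its
    local sensitivity (S times the pooled sensitivity, since N_s = N/S)
    and the aggregator averages the S independent noisy estimates."""
    per_site_std = pooled_std * S        # local sensitivity grows as S/N
    return per_site_std / S**0.5         # averaging divides std by sqrt(S)

# Net effect: aggregate std = pooled_std * sqrt(S), so the aggregate
# noise VARIANCE grows linearly in S relative to the pooled case.
```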
Performance Comparison by Varying $\delta $. For this task, we keep
$\epsilon $,
${N}_{train}$, and
S fixed. Note that, according to the
$\mathsf{CAPE}$ scheme that the proposed
capefm algorithm guarantees
$(\epsilon ,\delta )$-DP, where
$(\epsilon ,\delta )$ satisfy the relation
$\delta =2\frac{{\sigma}_{z}}{\epsilon -{\mu}_{z}}\varphi \left(\frac{\epsilon -{\mu}_{z}}{{\sigma}_{z}}\right)$. Recall that
$\delta $ is the probability that the algorithm fails to provide privacy risk
$\epsilon $, and that we assumed a fixed number of colluding sites
${S}_{C}=\lceil \frac{S}{3}\rceil -1$. From the plots (c), (f), and (i) in both
Figure 3 and
Figure 4, we observe that even for moderate values of
$\delta $,
capefm easily outperforms
rlxfm and
noisygd. Moreover, as seen from the overall
$\epsilon $ plots,
noisygd provides a much weaker privacy guarantee. Thus, our proposed
capefm algorithm offers superior performance and privacy–utility tradeoff in the decentralized setting.
Performance Comparison by Varying S. Finally, we investigate performance variation with the number of sites
S, keeping the privacy and dataset parameters fixed. This automatically varies the number of samples
${N}_{s}$ at each site
$s\in \left[S\right]$, as we consider the symmetric setting.
Figure 5a–c shows the results for decentralized linear regression, and
Figure 5d–f shows the results for decentralized logistic regression. We observe that the variation in
S does not affect the utility of
capefm, as long as the number of colluding sites meets the condition
${S}_{C}\le \lceil \frac{S}{3}\rceil -1$. However, increasing
S leads to significant degradation in performance for conventional decentralized DP mechanisms, since the additive noise variance increases as
${N}_{s}$ decreases. We show additional experimental results on synthetic datasets in
Appendix B.