Article

Utility–Leakage Trade-Off for Federated Representation Learning

1 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
2 Lehrstuhl für Nachrichtentechnik, Technical University Dortmund, 44227 Dortmund, Germany
3 Information Theory and Security Laboratory (ITSL), Linköping University, 581 83 Linköping, Sweden
* Author to whom correspondence should be addressed.
Entropy 2025, 27(11), 1163; https://doi.org/10.3390/e27111163
Submission received: 12 October 2025 / Revised: 13 November 2025 / Accepted: 13 November 2025 / Published: 15 November 2025
(This article belongs to the Special Issue Information-Theoretic Approaches for Machine Learning and AI)

Abstract

Federated representation learning (FRL) is a promising technique for learning shared data representations that capture general features across decentralized clients without sharing raw data. However, there is a risk of sensitive information leakage from the learned representations. The conventional differential privacy (DP) mechanism protects the whole data by randomization (adding noise or randomized response) at the cost of deteriorated learning performance. Inspired by the fact that some data information may be public or non-private and only sensitive information (e.g., race) should be protected, we investigate information-theoretic protection of specific sensitive information for FRL. To characterize the trade-off between utility and sensitive information leakage, we adopt mutual information-based metrics to measure utility and sensitive information leakage, and propose a method that maximizes utility performance while restricting sensitive information leakage to below any positive value ϵ via a local DP mechanism. Simulations demonstrate that our scheme achieves the best utility–leakage trade-off among baseline schemes and, more importantly, can adjust the trade-off between leakage and utility by controlling the noise level in local DP.

1. Introduction

The rapid advancement of machine learning has exposed significant practical challenges in traditional approaches, particularly as data volumes generated by edge devices—such as smartphones and IoT sensors—continue to grow exponentially. Conventional machine learning methods, which rely on the centralized aggregation of raw datasets, increasingly struggle with scalability, communication bottlenecks, and operational inefficiency. In response to these limitations, federated learning (FL) [1] has emerged as a pivotal framework addressing inherent weaknesses in traditional machine learning. By avoiding the need to collect raw data in a central location, FL not only mitigates critical privacy concerns and reduces communication overhead but also helps comply with stringent regulatory constraints.
In FL, multiple edge clients jointly train models by computing local model parameters or gradients and sharing them with a central server for aggregation. To improve the generalization capabilities and support various machine learning tasks (like classification or recognition), federated representation learning (FRL) combines the principles of FL and representation learning. It captures the underlying structure of the data and learns robust representations.
Unfortunately, the extracted representations may pose a potential risk of sensitive information leakage. Specifically, if these representations inadvertently encode sensitive attributes—such as demographic or health information—they can become a proxy for reconstructing them, thereby serving as a primary vector for privacy violations. As a fundamental principle of machine learning, privacy seeks to safeguard private and sensitive information—such as patient healthcare records, social network addresses, and political affiliations—throughout the entire life cycle of a model, from training to deployment. This work addresses privacy risks arising from model outputs during the deployment phase, which stem from the leakage of sensitive information through the released data. To protect privacy, differential privacy (DP) [2,3,4,5,6,7] is the most popular context-free notion of privacy; its key idea is to add noise to the output or randomize the data to conceal private information before sharing. However, DP can negatively impact model performance and does not provide any guarantee on the average or maximum information leakage [8]. Moreover, in many scenarios, some data attributes are public or non-private, and only sensitive information (e.g., race and gender) needs to be protected. For example, in AI recommendation applications, some users might share their favorite restaurants to receive tailored recommendations, while others prioritize keeping such information private and avoid sharing it. Information-theoretic (IT) privacy focuses on designing mechanisms and metrics that use information-theoretic quantities (such as f-divergences and mutual information) to quantify the trade-off between privacy and utility, i.e., how much information an adversary can infer about private features from released data. Building upon this theoretical foundation, subsequent research has focused on adapting and applying these principles to privately disclose useful information. As an early exploration, ref. [9] proposed a mutual information-based privacy metric specifically designed for testing the effectiveness of privacy-preserving data mining algorithms. From a different perspective, the seminal work [10] addresses the fundamental problem of disclosing useful information under a perfect privacy constraint for a variable S. It establishes a necessary and sufficient condition: non-trivial disclosure is possible if and only if the smallest principal inertia component of the joint distribution (S, X) is zero. Furthermore, the work derives tight bounds for this trade-off and provides explicit constructions of privacy-assuring mappings that achieve these bounds. From a more practical perspective, ref. [11] pursued a data-driven framework for optimizing privacy-preserving data release mechanisms to attain the information-theoretically optimal privacy–utility trade-off; an adversarially trained neural network is introduced to implement randomized mechanisms and to perform a variational approximation of mutual information privacy. Moreover, the privacy–utility trade-off in data release under a rate constraint is investigated in [12], which can be considered a generalization of both the information bottleneck and privacy funnel problems; that work also establishes a necessary and sufficient condition for the existence of positive utility under perfect privacy.
Furthermore, a general family of optimization problems, termed complexity-leakage-utility bottleneck (CLUB), is introduced in [13], which provides a unified theoretical framework that generalizes and unifies a wide spectrum of information-theoretic privacy models.
In this paper, we consider a statistical framework where mutual information serves as the metric for both utility and leakage [10,13,14,15]. Consider a utility–leakage problem with the Markov chain $(Y,S) - X - Z$, where X denotes the random variable of the input data, Y represents the target objective (e.g., the label), S is the sensitive information to be protected, and Z denotes the extracted representation. Information utility and leakage are measured by $I(Z;Y)$ and $I(S;Z)$, respectively. It is worth mentioning that all the aforementioned works focused on classical representation learning rather than federated representation learning. It is unclear whether their schemes still guarantee sensitive information protection once local model updates, global aggregation, and the complicating factor of data heterogeneity are taken into account.
In this paper, we propose a leakage-restrained federated learning framework that theoretically guarantees the protection of sensitive information by applying an ϵ-local DP (LDP) mechanism to the extracted representation. This upper bounds the sensitive information leakage as $I(S;Z) \le \epsilon$, regardless of local model updates, global aggregation, or data heterogeneity. The proof follows by using the data processing inequality and leveraging the connection between ϵ-LDP and mutual information. Furthermore, leveraging the upper bound on the maximum information leakage, we propose a simple yet efficient training loss function that minimizes the conditional uncertainty $H(X|Z,S)$, offering a simpler alternative to directly minimizing $I(S;Z)$ (which is intractable for most high-dimensional data sources). Simulation results demonstrate that our scheme achieves the best utility–leakage trade-off among baseline schemes and, more importantly, can tune the trade-off between leakage and utility by controlling the noise level in local DP.
Notations: We denote random variables by capital letters and their realizations by lowercase letters. The probability distribution of a random variable X is denoted by $P_X$ and its probability density function by $p_X(x)$. Given a finite set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality. We may drop the capital-letter subscript when it is clear from the context (e.g., $p_X(x) = p(x)$), and use a subscript to emphasize the dependence of a measure on the choice of distribution parameterization (e.g., $p_\phi(\hat{z}|x)$). The expectation is denoted by $\mathbb{E}[\cdot]$. The Shannon entropy and mutual information are denoted by $H(X)$ and $I(X;Y)$, respectively.

2. Problem Formulation

2.1. FRL with Sensitive Attribute

Consider an FRL system consisting of one central server and K devices indexed by $\mathcal{K} = \{1, 2, \ldots, K\}$, as shown in Figure 1.
Each device $k \in \mathcal{K}$ has a local training dataset $\mathcal{D}_k$ with $|\mathcal{D}_k|$ data samples $(x, y, s) \in \mathcal{D}_k$, where x is the observed data sample, y is the corresponding utility attribute, and s is the sensitive attribute to be protected (e.g., gender). The entire dataset is denoted by $\mathcal{D} = \cup_{k \in \mathcal{K}} \mathcal{D}_k$.
Let $l(f(x;\omega), s, y)$ denote the sample-wise loss function on data sample $(x, y, s)$, where $f(\cdot;\omega)$ is the model f parameterized by $\omega$. The model f can be decomposed as $f(x;\omega) = \phi(\psi(x)) = \phi(z)$, where $\psi$ is the encoder, $z = \psi(x)$ is the representation vector, and $\phi$ is the decoder.
The local loss function of device k is given by $L_k(\omega) = \frac{1}{|\mathcal{D}_k|} \sum_{(x,y,s) \in \mathcal{D}_k} l(f(x;\omega), s, y)$. Accordingly, the global loss function is given by $L(\omega) = \sum_{k=1}^{K} \frac{|\mathcal{D}_k|}{|\mathcal{D}|} L_k(\omega)$. The objective of the federated learning system is to train a global model $f(\cdot;\omega)$ that minimizes the global loss, i.e., $\min_{\omega} L(\omega)$.
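For concreteness, a minimal Python sketch of this weighted global loss is shown below; `client_datasets` and `local_loss` are illustrative placeholder names, not from the paper.

```python
# Minimal sketch of the global loss L(omega) = sum_k (|D_k| / |D|) * L_k(omega).
# `client_datasets` (list of local datasets) and `local_loss` (a callable
# computing L_k) are illustrative placeholders.

def global_loss(omega, client_datasets, local_loss):
    """Weighted average of the per-client empirical losses."""
    total = sum(len(d) for d in client_datasets)      # |D|
    loss = 0.0
    for d in client_datasets:
        weight = len(d) / total                       # |D_k| / |D|
        loss += weight * local_loss(omega, d)         # L_k(omega)
    return loss
```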

2.2. Sensitive Information Leakage–Utility Model

Given input X, its representation $Z = \psi(X)$, and its ground-truth label Y, we define the utility metric as the mutual information between Z and Y:
Utility: $I(Z;Y) = H(Y) - H(Y|Z)$.  (1)
A higher value of $I(Z;Y)$ indicates that the representation contains more useful information about the target label Y. When $H(Y|Z) = 0$, the representation can perfectly estimate Y.
Given sensitive information S and extracted representation Z, we define the sensitive information leakage as
Sensitive information leakage: $I(S;Z)$.  (2)
The definition in (2) characterizes the amount of information contained in the representation about the sensitive attribute S. Additionally, by the data processing inequality, any operation on Z leaks no more sensitive information than $I(S;Z)$.
With the definitions above, we formally define the theoretical guarantee of sensitive information leakage as follows.
Definition 1. 
Given any positive value ϵ, an FRL system with sensitive attribute S and representation Z is said to satisfy ϵ-sensitive information leakage if $I(S;Z) \le \epsilon$.
Our goal is to design an FRL method that maximizes the information useful for utility while keeping the system within the sensitive information leakage restriction, i.e., ensuring the ϵ-sensitive information leakage guarantee. The optimization problem can be formulated as follows:
$\max_{p(z|x)} I(Y;Z) \quad \mathrm{s.t.} \quad I(S;Z) \le \epsilon$.  (3)
Remark 1. 
When the sensitive attribute S is associated with fairness attributes such as gender, race, etc., problem (3) can be viewed as a fairness problem in which the predictions should be unbiased across different groups. This formulation aligns with the established works [16,17,18,19].

3. Leakage-Restrained Federated Representation Learning

3.1. Proposed FRL Framework

Directly solving the optimization problem in (3) is infeasible, mainly because simultaneously maximizing $I(Y;Z)$ and minimizing $I(S;Z)$ via one encoder is challenging, and also because the joint distribution $p(s,y,z)$ is hard to obtain. To address this issue, we instead minimize an upper bound on $I(S;Z)$ whose computation does not explicitly require $p(s,z)$.
Rewrite the mutual information $I(S;Z)$ as follows:
$I(X,S;Z) = I(X;Z) + I(S;Z|X) \overset{(a)}{=} I(X;Z)$,  (4)
$I(X,S;Z) = I(S;Z) + I(X;Z|S)$,  (5)
where (a) holds due to the Markov chain $(Y,S) - X - Z$. From (4) and (5), we have
$I(S;Z) = I(X;Z) - I(X;Z|S)$.  (6)
By the data processing inequality and the Markov chain $(Y,S) - X - Z$, we have $I(S;Z) \le I(X;Z)$. With the observation (6), we aim to upper bound $I(X;Z)$ by ϵ,
$I(X;Z) \le \epsilon$,  (7)
which brings us two important advantages.
  • According to Definition 1, $I(S;Z) \le I(X;Z) \le \epsilon$ ensures that the system satisfies the ϵ-sensitive information leakage guarantee.
  • If $I(X;Z) \le \epsilon$, then
    $I(S;Z) = I(X;Z) - I(X;Z|S) \le \epsilon - I(X;Z|S)$.  (8)
    This enables us to minimize the upper bound $\epsilon - I(X;Z|S)$ on $I(S;Z)$ as follows:
    $\min\ \epsilon - I(X;Z|S) \;\Leftrightarrow\; \min\ \epsilon - \big(H(X|S) - H(X|Z,S)\big) \;\overset{(a)}{\Leftrightarrow}\; \min\ H(X|Z,S)$,  (9)
where (a) holds since $H(X|S)$ is a constant given the dataset. Minimizing $H(X|Z,S)$ can be easily achieved by constructing a decoder that recovers X from the representation Z and S.
To achieve $I(X;Z) \le \epsilon$, we adopt the ϵ-LDP mechanism [20], denoted by $M_\epsilon(\cdot)$, which is defined as follows:
Definition 2. 
For any $\epsilon > 0$, a randomized mechanism $M_\epsilon(x)$ satisfies ϵ-LDP if and only if, for every $x, x' \in \mathcal{X}$ and any measurable set $C \subseteq \mathcal{W}$, where $\mathcal{W}$ denotes the output space of the mechanism, it holds that
$\frac{\Pr[M_\epsilon(X) \in C \mid X = x]}{\Pr[M_\epsilon(X) \in C \mid X = x']} \le e^{\epsilon}$.  (10)
Unlike conventional private FL schemes that add random noise to the local parameters $\omega_k$, we place the ϵ-LDP mechanism after the output of the parameterized feature extractor, denoted by $\tilde{\psi}(\cdot)$. Thus, the processing in the model can be represented as a Markov chain
$(Y,S) - X \xrightarrow{\tilde{\psi}(\cdot)} V \xrightarrow{M_\epsilon(\cdot)} Z$,
where $V = \tilde{\psi}(X)$ and $Z = \psi(X) = M_\epsilon(\tilde{\psi}(X))$. With the ϵ-LDP mechanism applied to the extracted feature V, we can guarantee ϵ-sensitive information leakage and achieve $I(X;Z) \le I(V;Z) \le \epsilon$; the detailed proof is provided in Section 3.2.
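Table 1 later lists a Laplacian mechanism as the ϵ-LDP mechanism. A minimal sketch of such a mechanism on a feature vector is given below; the per-coordinate clipping bound and the resulting L1 sensitivity are assumptions of this sketch, as the paper does not specify them.

```python
import numpy as np

def laplace_ldp(v, epsilon, clip=1.0, rng=None):
    """Obfuscate a feature vector v with a Laplace mechanism satisfying eps-LDP.

    Assumptions (not specified in the paper): each coordinate is clipped to
    [-clip, clip], so the L1 sensitivity is 2 * clip * dim, and i.i.d. Laplace
    noise with scale sensitivity / epsilon is added to every coordinate.
    """
    rng = rng or np.random.default_rng()
    v = np.clip(np.asarray(v, dtype=float), -clip, clip)
    sensitivity = 2.0 * clip * v.size
    return v + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=v.shape)

# Example: obfuscate a 2-dimensional representation with epsilon = 4.
z = laplace_ldp(np.array([0.3, -0.7]), epsilon=4.0)
```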
From (7), (8), and the optimization problem in (3), we parameterize the encoding distribution $p(z|x)$ and introduce the following objective function $\mathcal{L}(p_\theta(z|x), \beta)$:
$\min_{p_\theta(z|x)} \; -I(Z;Y) + \beta H(X|S,Z)$.  (11)
Let $q_{\theta_1}(y|z)$ be a parameterized variational approximation of $p(y|z)$, and $q_{\theta_2}(x|z,s)$ be a parameterized variational approximation of $p(x|z,s)$. A variational upper bound of (11) can be obtained as follows:
$-I(Z;Y) + \beta H(X|S,Z) = -H(Y) + H(Y|Z) + \beta H(X|S,Z) \le \mathbb{E}_{p_\theta(x,y,s,z)}\big[-\log p(y|z) - \beta \log p(x|s,z)\big] \overset{(a)}{=} \mathbb{E}_{p(x,y,s)p_\theta(z|x)}\big[-\log p(y|z) - \beta \log p(x|s,z)\big] \overset{(b)}{\le} \mathbb{E}_{p(x,y,s)p_\theta(z|x)}\big[-\log q_{\theta_1}(y|z) - \beta \log p(x|s,z)\big] \overset{(c)}{\le} \mathbb{E}_{p(x,y,s)p_\theta(z|x)}\big[-\log q_{\theta_1}(y|z) - \beta \log q_{\theta_2}(x|s,z)\big]$,  (12)
where (a) follows from the Markov chain $(Y,S) - X - Z$, (b) holds since $D_{\mathrm{KL}}\big(p(y|z)\,\|\,q_{\theta_1}(y|z)\big) \ge 0$, and (c) holds since $D_{\mathrm{KL}}\big(p(x|s,z)\,\|\,q_{\theta_2}(x|s,z)\big) \ge 0$.
Given N data points $\{x^{(i)}\}_{i=1}^{N}$, as well as the corresponding samples of the utility and sensitive variables $\{y^{(i)}, s^{(i)}\}_{i=1}^{N}$, we form the Monte Carlo estimate of (12) by sampling M realizations $\{z^{(i,j)}\}_{j=1}^{M}$ of the representation z from $p_\theta(z|x)$ for each data point $x^{(i)}$. The Monte Carlo approximation of (12), denoted $\mathcal{L}(p_\theta(z|x), \beta, q_{\theta_1}(y|z), q_{\theta_2}(x|s,z))$, is
$\mathcal{L}\big(p_\theta(z|x), \beta, q_{\theta_1}(y|z), q_{\theta_2}(x|s,z)\big) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{j=1}^{M} \big[-\log q_{\theta_1}(y^{(i)}|z^{(i,j)}) - \beta \log q_{\theta_2}(x^{(i)}|z^{(i,j)}, s^{(i)})\big]$.  (13)
The loss function $L_k$ for client k with local dataset $\mathcal{D}_k$ is
$L_k = \frac{1}{|\mathcal{D}_k|} \sum_{(x,y,s) \in \mathcal{D}_k} \big[-\log q_{\theta_1^k}(y|z) - \beta \log q_{\theta_2^k}(x|z,s)\big]$,  (14)
where $\theta_1^k$ and $\theta_2^k$ are the parameters of the utility decoder and the side decoder, respectively.
Based on our proposed loss function (14), we have designed the local learning network at client k, as shown in Figure 2. The proposed framework consists of four modules:
  • The feature extractor $\tilde{\psi}_k(x) = p_{\theta^k}(v|x)$, parameterized by $\theta^k$, encodes the original data x into the feature vector v.
  • The ϵ-LDP mechanism $M_\epsilon(\cdot)$ maps the feature v to an obfuscated representation z.
  • The utility decoder takes the representation z as input and predicts the utility variable as $\hat{y}$.
  • The side decoder takes both the representation z and the sensitive attribute s as inputs to reconstruct the input data as $\hat{x}$.
We then instantiate the proposed loss function (14) by choosing the following:
  • $q_{\theta_1^k}(y|z) \sim \mathcal{B}(\hat{y})$ (Bernoulli)
  • $q_{\theta_2^k}(x|z,s) \sim \mathcal{N}(\hat{x}, 1)$ (Gaussian)
The resulting loss function of client k in our optimization becomes
$L_k = \frac{1}{|\mathcal{D}_k|} \sum_{(x,y,s) \in \mathcal{D}_k} \big[l_e(y, \hat{y}) + \beta\, l_m(x, \hat{x})\big]$,  (15)
where $l_e$ denotes the cross-entropy loss and $l_m$ denotes the mean squared error (MSE) loss. The complete training framework is outlined in Algorithm 1.
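Before turning to Algorithm 1, the following minimal PyTorch sketch illustrates how the four modules and loss (15) fit together at a single client. `ClientModel` and `client_loss` are illustrative names, not from the paper, and the clipping bound and Laplace scale inside forward() follow the same assumptions as the LDP sketch above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClientModel(nn.Module):
    """Sketch of the client modules in Figure 2; layer sizes follow Table 1."""

    def __init__(self, d_in, d_z, d_s, epsilon, clip=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 100), nn.ReLU(), nn.Linear(100, d_z))
        self.utility_decoder = nn.Sequential(nn.Linear(d_z, 100), nn.ReLU(), nn.Linear(100, 2))
        self.side_decoder = nn.Sequential(nn.Linear(d_z + d_s, 100), nn.ReLU(), nn.Linear(100, d_in))
        self.epsilon, self.clip, self.d_z = epsilon, clip, d_z

    def forward(self, x, s):
        v = self.encoder(x)
        v = torch.clamp(v, -self.clip, self.clip)            # bound the sensitivity (assumption)
        scale = 2.0 * self.clip * self.d_z / self.epsilon    # Laplace scale for eps-LDP (assumption)
        z = v + torch.distributions.Laplace(0.0, scale).sample(v.shape)
        y_logits = self.utility_decoder(z)                   # predicts y_hat from z
        x_hat = self.side_decoder(torch.cat([z, s], dim=1))  # reconstructs x from (z, s)
        return y_logits, x_hat

def client_loss(model, x, y, s, beta):
    """Loss (15): cross-entropy on the utility decoder plus beta * MSE on the side decoder.
    y is expected as a tensor of integer class labels."""
    y_logits, x_hat = model(x, s)
    return F.cross_entropy(y_logits, y) + beta * F.mse_loss(x_hat, x)
```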
Algorithm 1 FRL with sensitive information protection.
Input: Global update rounds T, K clients, client datasets $\{\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_K\}$, learning rate η, initial model parameters $\omega^0 = \{\theta^0, \theta_1^0, \theta_2^0\}$
Output: Aggregated model parameters $\omega^T = \{\theta^T, \theta_1^T, \theta_2^T\}$
for $t = 0, 1, 2, \ldots, T-1$ do
    Server executes:
        Broadcast parameters $\omega^t$
        Receive $\{\omega_k^{t+1}\}_{k=1}^{K}$ from the K clients
        $\omega^{t+1} \leftarrow \frac{1}{K} \sum_{k=1}^{K} \omega_k^{t+1}$
    Client k executes:   // update local $\omega_k^{t+1} = \{\theta_k^{t+1}, \theta_{1k}^{t+1}, \theta_{2k}^{t+1}\}$
        Receive global model parameters $\omega^t$
        Initialize local parameters $\omega_k^{t+1} \leftarrow \omega^t$
        for each $(x, y, s) \in \mathcal{D}_k$ do
            $v \leftarrow$ FeatureExtractor $p_{\theta_k^{t+1}}(v|x)$
            $z \leftarrow$ ϵ-LDP mechanism $M_\epsilon(v)$
            $\hat{x} \leftarrow$ SideDecoder$(z, s; \theta_{2k}^{t+1})$
            $\hat{y} \leftarrow$ UtilityDecoder$(z; \theta_{1k}^{t+1})$
            Compute $L_k$ from (15); $\omega_k^{t+1} \leftarrow \omega_k^{t+1} - \eta \nabla L_k$
        end for
        Return $\omega_k^{t+1}$ to the server
end for
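Under the same assumptions, one communication round of Algorithm 1 can be sketched as follows, reusing `client_loss` from the sketch above; simple averaging matches the server step $\omega^{t+1} \leftarrow \frac{1}{K}\sum_{k} \omega_k^{t+1}$.

```python
import copy
import torch

def federated_round(global_model, client_loaders, beta, lr):
    """One communication round in the spirit of Algorithm 1: broadcast the global
    parameters, run local updates on each client, and average the returned models.
    `client_loaders` is a list of iterables yielding (x, y, s) batches, and the
    ClientModel / client_loss sketches above are assumed."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)                  # client receives omega^t
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for x, y, s in loader:                               # local gradient steps
            opt.zero_grad()
            client_loss(local, x, y, s, beta).backward()
            opt.step()
        client_states.append(local.state_dict())             # return omega_k^{t+1}

    avg = copy.deepcopy(client_states[0])                    # server aggregation
    for key in avg:
        avg[key] = torch.stack([st[key].float() for st in client_states]).mean(dim=0)
    global_model.load_state_dict(avg)
    return global_model
```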

3.2. Guarantee of Sensitive Information Protection

The following theorem shows that our framework with the ϵ-LDP mechanism on the extracted feature V guarantees ϵ-sensitive information leakage and achieves $I(X;Z) \le \epsilon$.
Theorem 1. 
Consider the FRL framework in Section 3 with sensitive attribute S, feature extractor $\tilde{\psi}: \mathcal{X} \to \mathcal{V}$, and ϵ-LDP mechanism $M_\epsilon(\cdot): \mathcal{V} \to \mathcal{Z}$. The representation $Z = \psi(X) = M_\epsilon(V)$, where $V = \tilde{\psi}(X)$, satisfies
$I(X;Z) \le I(Z;V) \le \epsilon$,  (16)
$I(S;Z) \le \epsilon - I(X;Z|S) \le \epsilon$.  (17)
Proof. 
From the definition of ϵ-LDP, we obtain that, for all $v, v' \in \mathcal{V}$,
$\frac{p(M_\epsilon(V) = z \mid V = v)}{p(M_\epsilon(V) = z \mid V = v')} \le e^{\epsilon}$.  (18)
Then
$p(z) = \mathbb{E}_{V' \sim p(v')}\big[p(z|v')\big] = \mathbb{E}_{V' \sim p(v')}\big[p(M_\epsilon(V) = z \mid V = v')\big] \ge \mathbb{E}_{V' \sim p(v')}\big[p(M_\epsilon(V) = z \mid V = v)\, e^{-\epsilon}\big] = p(z|v)\, e^{-\epsilon}, \quad \forall z \in \mathcal{Z},\ v, v' \in \mathcal{V}$.  (19)
Thus, the mutual information $I(Z;V)$ can be bounded as
$I(Z;V) = \mathbb{E}_{p(z,v)}\left[\log \frac{p(z,v)}{p(z)p(v)}\right]$  (20)
$\le \mathbb{E}_{p(z,v)}\left[\log \frac{p(z|v)}{p(z|v)\, e^{-\epsilon}}\right] = \epsilon$,  (21)
where the inequality follows from (19). With the Markov chain $(Y,S) - X - V - Z$ and (21), the data processing inequality gives $I(X;Z) \le I(V;Z) \le \epsilon$.
From (6) and (16), we have $I(S;Z) \le \epsilon - I(X;Z|S)$. This completes the proof of Theorem 1. □
Remark 2. 
The proof requires no assumptions on the data distributions or the model update method, indicating that Theorem 1 always holds regardless of data heterogeneity. Since the ϵ-LDP mechanism $M_\epsilon(\cdot): \mathcal{V} \to \mathcal{Z}$ is always present in both the local and global models, the theoretical guarantee $I(S;Z) \le \epsilon$ holds for both local and global updates.

4. Simulation Results

In this section, we first present the simulation setting for the FL environment, datasets, simulation metrics, and baselines, and then evaluate the performance of the proposed framework.
Datasets: We perform simulations on two real-world datasets: the income-prediction Adult dataset from the UCI Machine Learning Repository [21] and the ProPublica COMPAS dataset [22]. The COMPAS dataset comprises 4320 training samples and 1852 testing samples, each with 11 variables indicating race, age, sex, among others. The Adult dataset comprises 32,561 training samples and 16,281 testing samples, each with 14 variables indicating age, workclass, sex, and more.
FL environment: We consider an FRL system with 20 clients and one server. For the COMPAS dataset, each client samples 1500 data samples, and we select race as the sensitive attribute S and the recidivism outcome as the utility attribute Y. For the Adult dataset, each client samples 6000 data samples, and we choose gender as the sensitive attribute S and income as the utility attribute Y (predicting whether an individual earns more than 50K per year).
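One plausible way to set up these per-client datasets is sketched below; the sampling strategy (independent uniform draws per client) is an assumption, as the paper does not describe it.

```python
import numpy as np

def sample_client_datasets(X, Y, S, num_clients=20, samples_per_client=1500, seed=0):
    """Sketch of the FL data setup: each client draws its own random subset of the
    training pool (X, Y, S as aligned numpy arrays). Independent uniform sampling
    without replacement is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    clients = []
    for _ in range(num_clients):
        idx = rng.choice(len(X), size=samples_per_client, replace=False)
        clients.append((X[idx], Y[idx], S[idx]))
    return clients
```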
Simulation metrics: Utility performance is measured by the mutual information $I(Z;Y)$ and the inference accuracy of Y, while leakage performance is measured by the mutual information $I(Z;S)$ and the inference accuracy of S. $I(Y;Z)$ and $I(S;Z)$ directly quantify the information contained in the representation Z about the utility label Y and the sensitive attribute S, which can be regarded as utility information and sensitive information leakage, respectively. Consequently, the trade-off between utility and leakage can be quantified by $I(Y;Z) - I(S;Z)$. Mutual information has an intrinsic drawback as a metric: the marginal and joint distributions are typically unknown, making the direct computation of mutual information between two variables intractable. To address this, the Mutual Information Neural Estimator (MINE) maximizes a lower bound on mutual information as an alternative estimate (a minimal sketch is given after this discussion). However, this approach introduces two key issues:
  • The estimated lower bound may fail to closely approximate the true mutual information, particularly when its actual value is small.
  • Neural network-based estimation can suffer from high variance. This problem is amplified when dealing with high-dimensional data.
To mitigate potential estimation inaccuracies, we employ the inference accuracy of predicting Y and S from Z as another metric. Higher inference accuracy for these variables validates a stronger dependency and indicates that Z encapsulates more relevant information about them. In addition, the difference in inference accuracy between Y and S serves as a complementary metric for evaluating the trade-off.
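For reference, a minimal PyTorch sketch of such a MINE-style (Donsker–Varadhan) lower-bound estimator for $I(S;Z)$ is given below; the critic architecture and training schedule are assumptions rather than the exact estimator used in the simulations.

```python
import math
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Donsker-Varadhan (MINE-style) lower bound on I(Z; S) with a small critic
    network. A minimal sketch; architecture and training details are assumptions."""

    def __init__(self, d_z, d_s, hidden=64):
        super().__init__()
        self.critic = nn.Sequential(nn.Linear(d_z + d_s, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def lower_bound(self, z, s):
        t_joint = self.critic(torch.cat([z, s], dim=1)).mean()    # E_{p(z,s)}[T]
        s_perm = s[torch.randperm(s.shape[0])]                    # shuffle s to mimic p(z)p(s)
        t_marg = self.critic(torch.cat([z, s_perm], dim=1))
        log_mean_exp = torch.logsumexp(t_marg, dim=0) - math.log(t_marg.shape[0])
        return (t_joint - log_mean_exp).squeeze()                 # lower bound on I(Z;S)

# Training maximizes the bound, e.g.: loss = -mine.lower_bound(z_batch, s_batch)
```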
Baselines: While existing private FL methods defend against privacy attacks during training by addressing privacy risks in gradients and parameters, our approach focuses on reducing the leakage of sensitive attributes from the deployed model's outputs. Since our method is compatible with most private federated learning approaches, we select baselines built on representation learning and federated learning to demonstrate its effectiveness. We compare our scheme with baselines that combine FedAvg [1] with centralized representation learning offering sensitive attribute protection, including the privacy funnel optimization-based PPVAE [23], the disentanglement-focused FFVAE [24], the variational approach VFAE [25], and the latent distribution learning-based FSNS [26], as well as the raw data without sensitive information protection.
Implementation: We implement the neural network models for the Adult and COMPAS datasets using fully connected layers; both models share the architecture presented in Table 1, with input dimension $d_{in} = 10$ for COMPAS and $d_{in} = 13$ for Adult. The representation dimension is uniformly set to $d_z = 2$, and the sensitive attribute dimension to $d_s = 1$. A one-layer fully connected network is adopted as the inference model for the utility attribute Y, while a random forest classifier serves as the inference model for the sensitive attribute S, yielding inference accuracies for both the utility label Y and the sensitive attribute S.
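These inference accuracies can be computed from frozen representations roughly as follows; a logistic-regression stand-in replaces the one-layer fully connected network, and the classifier hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def evaluate_representation(z_train, y_train, s_train, z_test, y_test, s_test):
    """Sketch of the inference-accuracy metrics: fit predictors on the released
    representations and report test accuracy for Y (utility) and S (leakage)."""
    util = LogisticRegression(max_iter=1000).fit(z_train, y_train)
    leak = RandomForestClassifier(n_estimators=100, random_state=0).fit(z_train, s_train)
    return {"accuracy_Y": util.score(z_test, y_test),
            "accuracy_S": leak.score(z_test, s_test)}
```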
Figure 3 and Figure 4 depict the trade-off between leakage and utility across β as well as $\epsilon \in \{2, 4, 6, 8, 100, 1000\}$. Under a higher ϵ, due to the smaller noise scale, β has a greater impact on the model's performance. As β increases, the protection of sensitive information improves while utility performance decreases, which is consistent with the structure of our loss function. Reducing ϵ leads to a lower upper bound on leakage and a larger noise variance; the increasing noise scale reduces the leakage of sensitive information but degrades utility. As can be observed, utility varies more drastically with changes in β, indicating that, compared to leakage, utility is more susceptible to the influence of β.
Figure 5 and Figure 6 depict the trade-off between sensitive information leakage and information utility across a range of ϵ for $\beta \in \{10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ on the COMPAS and Adult datasets. As the inference accuracy of the utility attribute increases, the inference accuracy of the sensitive attribute also increases, indicating the inherent trade-off between utility and leakage. For a given ϵ, diverse operating points can be attained by adjusting β. With a low ϵ, the proposed scheme achieves low sensitive information leakage at the expense of utility, indicating that less information leakage can be obtained by adding noise at the cost of utility performance. As can be observed, the noise substantially affects the utility–leakage trade-off of the overall model, while the effective range of β is noticeably affected by the magnitude of the noise.
To ensure a fair comparison, we tune the hyperparameters of all compared methods and select hyperparameters yielding a similar utility level. Owing to the low dimensionality of the datasets, we choose a representation dimension of $d_z = 2$ for our experiments. As shown in Table 2, our proposed scheme not only mitigates sensitive information leakage but also achieves a favorable trade-off, as validated by both inference accuracy and mutual information. The failure of FFVAE's representation learning is evident in the label classification accuracy, where classifiers using its extracted representations collapse to predicting only the majority class (all 0s or 1s). Consequently, we exclude its performance on the other metrics and mark them with "-".
As can be seen from Table 2, when the representation dimension $d_z$ is 2, the utility performance of the compared methods on the COMPAS dataset is already close to that of using the raw data directly for inference. However, disentanglement methods such as FFVAE often require higher dimensions, and the compared methods have not fully realized their potential on the Adult dataset, leaving room for improvement. Therefore, based on the input dimensions of the two datasets, we conducted further experiments with the representation dimension $d_z$ set to 4.
Based on Table 3, it can be observed that FFVAE still performs poorly, while all other methods show improvements in utility. On the COMPAS dataset, the improvement in utility is relatively limited, as the performance of all methods was already strong at $d_z = 2$. In contrast, the enhancement is more pronounced on the Adult dataset. From the perspective of sensitive information leakage, the inference accuracy of S exhibits different trends across the two datasets, with slight increases in some cases and significant decreases in others. On the other hand, the mutual information $I(S;Z)$ shows a clear increase in sensitive information leakage, which is reasonable, as a larger feature vector dimension can potentially incur more information leakage.
We further evaluate the scalability of our method by extending it to accommodate multiple sensitive attributes. This is achieved by generalizing the objective in (11) to include an additional term $H(X|S_2,Z)$ for a second sensitive attribute, coupled with an additional side decoder dedicated to it. To assess the impact of this extension, we conduct a comparative analysis between configurations with one and two sensitive attributes. We select two distinct sensitive attributes for our evaluation: on the COMPAS dataset we designate Prior as the second attribute ($S_2$), and on the Adult dataset we use relationship as $S_2$. The first sensitive attribute ($S_1$) for each dataset retains the original setting described in the FL environment. The performance is presented in Table 4.
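As a rough illustration of this extension (how the second term is weighted relative to the first is not specified in the paper, so separate weights are an assumption), the client loss gains one additional reconstruction term for the second side decoder:

```python
import torch.nn.functional as F

def loss_two_sensitive_attributes(y_logits, y, x_hat1, x_hat2, x, beta1, beta2):
    """Two-attribute extension of loss (15): one additional reconstruction term for a
    second side decoder conditioned on (z, s2). Separate weights beta1/beta2 are an
    assumption of this sketch."""
    return (F.cross_entropy(y_logits, y)
            + beta1 * F.mse_loss(x_hat1, x)     # side decoder conditioned on (z, s1)
            + beta2 * F.mse_loss(x_hat2, x))    # side decoder conditioned on (z, s2)
```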
With the introduction of a second sensitive attribute to protect, our method experiences a decline in both the utility of the data and the level of protection for the first sensitive attribute. This is because we must now strip more sensitive information from the learned representations, and the optimization process must balance preventing leakage from both attributes, which weakens its focus on removing correlations with the original one. Nevertheless, our method successfully strikes a balance, maintaining effective protection against leakage for both sensitive attributes while ensuring an acceptable level of utility preservation.

5. Conclusions

In this paper, we focus on FRL that specifically protects sensitive information, such as race or gender. We propose a method that simultaneously maximizes utility information in representations while restricting sensitive information leakage by the LDP mechanism and minimizing the upper bound of sensitive information leakage. We prove that our method theoretically guarantees sensitive information leakage below a predefined positive threshold and empirically demonstrate that it can achieve a better trade-off than baselines.
As further research directions, one could derive a tighter bound on the information leakage. A more refined leakage bound would enable the design of more effective schemes for optimizing the balance between data leakage and model utility. In addition, enhancing the robustness of extracted features against noise (perturbation) is also a highly valuable research direction: by augmenting feature robustness, a simultaneous reduction in information leakage and an improvement in utility performance can be attained when employing noise-based protection mechanisms. Furthermore, the leakage–utility trade-off can be further improved by selectively strengthening the robustness of those features within the feature vector that exhibit lower correlation with the sensitive attribute. Finally, extending the current framework to accommodate scenarios with multiple sensitive attributes constitutes an important and compelling avenue for future research.

Author Contributions

Conceptualization, Y.L., O.G. and Y.W.; Methodology, Y.L., O.G. and Y.W.; Software, Y.L.; Validation, Y.L., O.G., Y.S. and Y.W.; Formal analysis, Y.L., O.G., Y.S. and Y.W.; Investigation, Y.L., O.G., Y.S. and Y.W.; Resources, Y.W.; Writing—original draft, Y.L.; Writing—review & editing, Y.L., O.G., Y.S. and Y.W.; Supervision, Y.S. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

O. Günlü was partially supported by the ZENITH Research and Leadership Career Development Fund under Grant ID23.01 and the Swedish Foundation for Strategic Research (SSF) under Grant ID24-0087.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  2. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  3. Song, S.; Chaudhuri, K.; Sarwate, A.D. Stochastic gradient descent with differentially private updates. In Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing, Austin, TX, USA, 3–5 December 2013; pp. 245–248. [Google Scholar]
  4. Bassily, R.; Smith, A.; Thakurta, A. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, Philadelphia, PA, USA, 18–21 October 2014; pp. 464–473. [Google Scholar]
  5. Geyer, R.C.; Klein, T.; Nabi, M. Differentially private federated learning: A client level perspective. arXiv 2017, arXiv:1712.07557. [Google Scholar]
  6. McMahan, H.B.; Ramage, D.; Talwar, K.; Zhang, L. Learning differentially private recurrent language models. arXiv 2017, arXiv:1710.06963. [Google Scholar]
  7. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.; Poor, H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469. [Google Scholar] [CrossRef]
  8. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar]
  9. Agrawal, D.; Aggarwal, C.C. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, USA, 21–23 May 2001; pp. 247–255. [Google Scholar]
  10. Calmon, F.P.; Makhdoumi, A.; Médard, M. Fundamental limits of perfect privacy. In Proceedings of the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1796–1800. [Google Scholar] [CrossRef]
  11. Tripathy, A.; Wang, Y.; Ishwar, P. Privacy-Preserving Adversarial Networks. arXiv 2017, arXiv:1712.07008. [Google Scholar]
  12. Sreekumar, S.; Gündüz, D. Optimal Privacy-Utility Trade-off under a Rate Constraint. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 2159–2163. [Google Scholar] [CrossRef]
  13. Razeghi, B.; Calmon, F.P.; Gunduz, D.; Voloshynovskiy, S. Bottlenecks CLUB: Unifying Information-Theoretic Trade-Offs Among Complexity, Leakage, and Utility. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2060–2075. [Google Scholar] [CrossRef]
  14. du Pin Calmon, F.; Fawaz, N. Privacy against statistical inference. In Proceedings of the 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 1–5 October 2012; pp. 1401–1408. [Google Scholar] [CrossRef]
  15. Gündüz, D.; Gomez-Vilardebo, J.; Tan, O.; Poor, H.V. Information theoretic privacy for smart meters. In Proceedings of the 2013 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 10–15 February 2013; pp. 1–7. [Google Scholar]
  16. Rodríguez-Gálvez, B.; Thobaben, R.; Skoglund, M. A variational approach to privacy and fairness. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021; pp. 1–6. [Google Scholar]
  17. Hamman, F.; Dutta, S. Demystifying local and global fairness trade-offs in federated learning using information theory. In Proceedings of the International Conference on Machine Learning 2023, Honolulu, HI, USA, 28 July 2023. [Google Scholar]
  18. Kang, J.; Xie, T.; Wu, X.; Maciejewski, R.; Tong, H. Infofair: Information-theoretic intersectional fairness. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 1455–1464. [Google Scholar]
  19. Ghassami, A.; Khodadadian, S.; Kiyavash, N. Fairness in supervised learning: An information theoretic approach. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 176–180. [Google Scholar]
  20. Kasiviswanathan, S.P.; Lee, H.K.; Nissim, K.; Raskhodnikova, S.; Smith, A. What can we learn privately? SIAM J. Comput. 2011, 40, 793–826. [Google Scholar] [CrossRef]
  21. Asuncion, A.; Newman, D. UCI Machine Learning Repository; University of California: Irvine, CA, USA, 2007. [Google Scholar]
  22. Dieterich, W.; Mendoza, C.; Brennan, T. COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity; Northpointe Inc.: Traverse City, MI, USA, 2016; Volume 7, pp. 1–36. [Google Scholar]
  23. Nan, L.; Tao, D. Variational approach for privacy funnel optimization on continuous data. J. Parallel Distrib. Comput. 2020, 137, 17–25. [Google Scholar] [CrossRef]
  24. Creager, E.; Madras, D.; Jacobsen, J.H.; Weis, M.; Swersky, K.; Pitassi, T.; Zemel, R. Flexibly fair representation learning by disentanglement. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 1436–1445. [Google Scholar]
  25. Louizos, C.; Swersky, K.; Li, Y.; Welling, M.; Zemel, R. The variational fair autoencoder. arXiv 2015, arXiv:1511.00830. [Google Scholar]
  26. Jang, T.; Gao, H.; Shi, P.; Wang, X. Achieving Fairness through Separability: A Unified Framework for Fair Representation Learning. In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 2–4 May 2024; Dasgupta, S., Mandt, S., Li, Y., Eds.; Proceedings of Machine Learning Research (PMLR): Cambridge, MA, USA, 2024; Volume 238, pp. 28–36. [Google Scholar]
Figure 1. Federated representation learning framework.
Figure 2. Representation learning model at client k.
Figure 3. Leakage and utility performance on the Adult dataset with $\beta \in \{10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$: (a,b) leakage measured via the inference accuracy of the sensitive attribute S (gender) and the mutual information $I(S;Z)$; (c,d) utility measured via the inference accuracy of the label Y (income) and the mutual information $I(Y;Z)$.
Figure 4. Leakage and utility performance on the COMPAS dataset with $\beta \in \{10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$: (a,b) leakage measured via the inference accuracy of the sensitive attribute S (race) and the mutual information $I(S;Z)$; (c,d) utility measured via the inference accuracy of the label Y (recidivism outcome) and the mutual information $I(Y;Z)$.
Figure 5. Utility–leakage trade-offs on COMPAS dataset: (a) trade-off measured by inference accuracy; (b) trade-off measured by mutual information.
Figure 6. Utility–leakage trade-offs on Adult dataset: (a) trade-off measured by inference accuracy; (b) trade-off measured by mutual information.
Table 1. The neural network architecture for the Adult and COMPAS datasets.

Module          | Layer               | Input     | Output
Encoder         | Dense + ReLU        | d_in      | 100
                | Dense               | 100       | d_z
Utility Decoder | Dense + ReLU        | d_z       | 100
                | Dense + Sigmoid     | 100       | 2
Side Decoder    | Dense + ReLU        | d_z + d_s | 100
                | Dense               | 100       | d_in
LDP mechanism   | Laplacian mechanism | d_z       | d_z
Table 2. Utility–leakage trade-off on COMPAS and Adult datasets, d_z = 2.

Dataset | Method   | Accuracy (Y) | I(Y;Z) | Accuracy (S) | I(S;Z) | I(Y;Z) − I(S;Z) | Accuracy (Y) − Accuracy (S)
COMPAS  | ours     | 0.6691 | 0.1192 | 0.5983 | 0.0098 | 0.1094  | 0.0708
        | FFVAE    | 0.5377 | 0.011  | -      | -      | -       | -
        | PPVAE    | 0.6632 | 0.0897 | 0.9415 | 0.1485 | −0.0587 | −0.2783
        | VFAE     | 0.6546 | 0.0567 | 0.9834 | 0.2865 | −0.2118 | −0.3288
        | FSNS     | 0.6701 | 0.0760 | 0.6246 | 0.0207 | 0.0553  | 0.0455
        | Raw data | 0.6776 | 0.1776 | 0.6884 | 0.2506 | −0.0729 | −0.0108
Adult   | ours     | 0.8389 | 0.1938 | 0.6142 | 0.0325 | 0.1613  | 0.2247
        | FFVAE    | 0.7637 | 0.0    | -      | -      | -       | -
        | PPVAE    | 0.7879 | 0.1633 | 0.7479 | 0.0769 | 0.0864  | 0.040
        | VFAE     | 0.7865 | 0.1555 | 0.6672 | 0.0696 | 0.0859  | 0.1193
        | FSNS     | 0.8126 | 0.2423 | 0.6689 | 0.0850 | 0.1573  | 0.1437
        | Raw data | 0.8527 | 0.3374 | 0.8391 | 0.4150 | −0.0776 | 0.0136
Note: Bold values denote the best results for each metric.
Table 3. Utility–leakage trade-off on COMPAS and Adult datasets, d_z = 4.

Dataset | Method   | Accuracy (Y) | I(Y;Z) | Accuracy (S) | I(S;Z) | I(Y;Z) − I(S;Z) | Accuracy (Y) − Accuracy (S)
COMPAS  | ours     | 0.6717 | 0.1103 | 0.6187 | 0.0758 | 0.0344  | 0.0530
        | FFVAE    | 0.5377 | 0.0797 | -      | -      | -       | -
        | PPVAE    | 0.6627 | 0.1384 | 0.6639 | 0.5099 | −0.3715 | −0.0012
        | VFAE     | 0.6659 | 0.1154 | 0.5910 | 0.6574 | −0.5420 | 0.0749
        | FSNS     | 0.6722 | 0.1226 | 0.6293 | 0.1778 | −0.0552 | 0.0429
        | Raw data | 0.6776 | 0.1776 | 0.6884 | 0.2506 | −0.0729 | −0.0108
Adult   | ours     | 0.8364 | 0.2136 | 0.6595 | 0.0990 | 0.1146  | 0.1769
        | FFVAE    | 0.7637 | 0.0    | -      | -      | -       | -
        | PPVAE    | 0.8118 | 0.2262 | 0.6995 | 0.2593 | −0.0331 | 0.1123
        | VFAE     | 0.8073 | 0.2049 | 0.6684 | 0.1330 | 0.0719  | 0.1389
        | FSNS     | 0.8335 | 0.2701 | 0.7071 | 0.2290 | 0.0411  | 0.1264
        | Raw data | 0.8527 | 0.3374 | 0.8391 | 0.4150 | −0.0776 | 0.0136
Note: Bold values denote the best results for each metric.
Table 4. Performance comparison of our method with different numbers of sensitive attributes.

Dataset | Method             | Accuracy (Y) | I(Y;Z) | Accuracy (S_1) | I(S_1;Z) | Accuracy (S_2) | I(S_2;Z)
COMPAS  | ours (S_1 only)    | 0.6691 | 0.1192 | 0.5983 | 0.0098 | -      | -
        | ours (S_1 and S_2) | 0.6533 | 0.0764 | 0.5890 | 0.0501 | 0.6295 | 0.0178
Adult   | ours (S_1 only)    | 0.8389 | 0.1938 | 0.6142 | 0.0325 | -      | -
        | ours (S_1 and S_2) | 0.8125 | 0.1485 | 0.6622 | 0.0363 | 0.6986 | 0.1173
