Article

Adaptive Feature Representation Learning for Privacy-Fairness Joint Optimization

1 School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
2 Electric Power Research Institute, State Grid Heilongjiang Electric Power Co., Ltd., Harbin 150040, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(24), 13031; https://doi.org/10.3390/app152413031
Submission received: 31 October 2025 / Revised: 25 November 2025 / Accepted: 26 November 2025 / Published: 10 December 2025
(This article belongs to the Special Issue Advances in Cyber Security)

Abstract

Encoded text representations often contain a large amount of sensitive personal information, which can easily lead to privacy leakage and biased model predictions. Most existing methods optimize a single objective, making it difficult to balance model performance, fairness, and privacy protection. To address this, this paper proposes a new adaptive feature representation learning method, AMF-DP (adaptive matrix factorization with differential privacy). The method combines adaptive matrix factorization with a differential privacy technique to improve model fairness while providing privacy protection. Experimental results show that AMF-DP achieves a better balance among privacy protection, fairness, and model performance, providing a new perspective on text feature representation learning under multi-objective optimization.

1. Introduction

With the rapid development of natural language processing (NLP) technology, text feature representation learning has become an important foundation for driving practical applications such as information retrieval, sentiment analysis, and content recommendation. The emergence of massive text data brings opportunities for performance improvement of NLP models, but also exposes a series of security and ethical risks. On the one hand, encoded text representations often contain a large amount of sensitive user information, such as age, gender, race, etc., which is highly susceptible to leakage or misuse in the process of feature representation learning and model training [1,2,3]. How to effectively prevent the leakage of sensitive information and safeguard user privacy while improving model performance has become a core issue of common concern in the current academic community.
On the other hand, traditional NLP models are prone to bias when dealing with data involving sensitive attributes, leading to unfair treatment of certain groups. Sun et al. [4] pointed out that many sentiment analysis models show significant bias when handling texts with gender information: the probability of assigning negative sentiment scores is significantly higher for texts involving women than for texts involving men. These biases not only affect the accuracy of the model but also lead to unfair treatment of certain groups in practical applications. For example, biases in recruitment systems may affect job applicants' chances of being hired [5], and biases in medical diagnosis systems may affect patients' treatment plans [6].
Existing privacy-preserving approaches are mainly based on the theory of differential privacy (DP), which reduces the recognizability of sensitive attributes in a model by injecting noise into the feature representation or model gradient to conceal sensitive information [7,8]. However, such approaches tend to focus on privacy guarantees while ignoring the non-uniform impact of noise injection on the model performance of different groups, leading to a significant decrease in the accuracy of the model on certain groups, and even further exacerbating inter-group unfairness [9,10]. Meanwhile, fairness-oriented optimization methods mostly adopt strategies such as debiased regularization, adversarial training, or reweighting, aiming to reduce the model’s dependence on sensitive attributes, but these methods usually lack systematic protection against privacy threats and are difficult to effectively prevent the leakage of sensitive information [11,12,13].
In recent years, some works have attempted to combine privacy preservation and fairness optimization, for example by introducing fairness constraints under differential privacy mechanisms or adding privacy-preserving modules to fair representation learning frameworks [14,15,16]. However, these approaches mainly adopt a simple superposition of the two types of constraints and lack an in-depth characterization of the complex interrelationships between them, resulting in models that often struggle to balance privacy, fairness, and the performance of the main task when all three are pursued simultaneously. In addition, these methods significantly increase computational complexity and system overhead because they introduce additional regularization terms, adversarial networks, or multi-stage optimization into the training process. In text categorization tasks, the presence of high-dimensional sparse features and redundant information further amplifies the shortcomings of these methods in terms of efficiency and scalability, making noise injection and fairness constraints even more technically challenging in practice.
To address the above issues, this paper proposes the AMF-DP method, which introduces a flexible end-to-end architecture that innovatively integrates adaptive matrix decomposition with dynamic differential privacy mechanisms and introduces the concept of multi-objective collaborative optimization to systematically balance the three objectives of model performance, fairness, and privacy protection.
The main contributions of this paper include four aspects:
1. We propose an adaptive low-rank embedding learning (ALEL) module that reformulates matrix factorization as a learnable low-dimensional embedding process. By jointly optimizing task loss, fairness regularization, and privacy-related reconstruction loss, ALEL produces text representations that are simultaneously discriminative, less dependent on sensitive attributes, and less vulnerable to privacy leakage.
2. We design a collaborative noise injection mechanism that combines self-attention with adaptive sensitivity optimization. Differential privacy noise is dynamically allocated according to feature importance and inter-group disparities, which allows AMF-DP to protect sensitive information while maintaining accuracy and fairness for different demographic groups.
3. We introduce an adaptive dual-stream (ADS) encoder that processes fairness-related and privacy-related features in two parallel streams and fuses them through a dynamic gating module. This design enables fine-grained control over the trade-off between performance, fairness, and privacy, and leads to a more balanced solution than existing single-stream DP or fairness methods.
4. From a practical point of view, AMF-DP acts as a plug-and-play encoder for standard Transformer-based text classifiers. It provides a concrete training recipe and evaluation protocol for jointly optimizing accuracy, TPR-gap/GRMS, Leakage, and MDL. Extensive experiments on three public datasets show that AMF-DP consistently reduces fairness gaps and sensitive-attribute leakage while keeping accuracy and training cost at a level comparable to strong baselines such as FedFair-DP, making it suitable for real-world text classification systems with multi-objective requirements.
The remainder of this paper is organized as follows. Section 2 reviews existing work on differential privacy, fairness optimization, and privacy–fairness joint learning in text representation. Section 3 presents the proposed AMF-DP framework, including the adaptive low-rank embedding module, the collaborative noise injection mechanism, and the adaptive dual-stream encoder. Section 4 describes the experimental setup, baselines, and results, including ablation and trade-off analyses. Section 5 concludes the paper and outlines directions for future work.

2. Related Work

2.1. DP in Text Representation Learning

Text feature representation learning is a fundamental part of many downstream tasks in natural language processing, but it often unintentionally encodes a large amount of sensitive information, including age, gender, geographic location, social identity, etc., when processing data. It has been found that even if text features are deeply abstracted, attackers can still mine hidden sensitive attributes from them through techniques such as reconstruction attacks and attribute inference [3,17]. Therefore, how to achieve effective privacy protection in the text characterization phase has become an important issue of wide concern in the academic community.
Differential privacy (DP), as a rigorous privacy protection framework, can effectively reduce the risk of sensitive information leakage by introducing noise during data processing and model training. In recent years, differential privacy has been applied at various stages of text feature representation learning. Ghazi et al. [18] proposed introducing a differential privacy mechanism during word embedding training, limiting the influence of any single text on the word vectors by clipping and adding noise to the gradients, thus preventing sensitive words from being captured and reproduced by the model. Smith et al. [19] introduced a differential privacy optimization method in the fine-tuning stage of pre-trained models such as BERT, which preserves model performance while effectively reducing the risk that sensitive attributes of the training text can be inferred. Wang et al. [20] further proposed adding noise directly to text feature vectors (e.g., sentence vectors and document vectors) before release, so that the published feature vectors satisfy differential privacy and attackers cannot reconstruct the original text or infer sensitive attributes from the features.
Nonetheless, applying existing differential privacy methods to textual feature representation learning still faces a number of challenges. First, the high-dimensional sparsity of the text feature space makes noise injection affect model performance more drastically, with especially poor performance on low-resource or long-tail group samples [21]. Second, the distribution of sensitive attributes in text is extremely complex; a single noise injection strategy can hardly accommodate the privacy needs of all groups and may even reduce the model's ability to discriminate for vulnerable groups, thus triggering new fairness risks [22]. In addition, most current privacy protection mechanisms focus mainly on the global privacy budget and lack the ability to adapt dynamically to individual or group differences, making personalized privacy protection difficult to achieve [23].

2.2. Fairness Optimization in NLP Models

With the wide deployment of NLP models in various practical applications, the problem of model fairness has received increasing attention. NLP models often inadvertently learn social biases in the data during the training process, leading to systematic injustice of the model outputs to specific groups, including the discrimination of sensitive attributes such as gender, ethnicity, age, etc. [11,24]. Therefore, how to realize fairness guarantee in the learning stage of text feature representation has become an important research direction in the field of natural language processing.
Existing fairness methods mainly mitigate unfairness in the model through techniques such as projection elimination, adversarial training, and fairness constraints. For example, projection elimination reduces the model's dependence on sensitive attributes by removing components related to sensitive attributes in the representation space [25]; adversarial training introduces an adversarial loss so that the model learns effective features while making sensitive attributes difficult to distinguish [26]; fairness constraints limit the model's performance differences among groups by directly adding fairness-related regularization terms to the loss function [27]; in addition, data processing methods such as data reweighting and resampling have been used to mitigate prior bias in training data [28].
Although existing fairness optimization methods alleviate the bias problem of NLP models to some extent, they still face practical dilemmas and technical bottlenecks. First, while removing sensitive attribute information, these techniques may inadvertently harm useful features related to the main task, degrading model performance or even introducing new interpretability problems [29]. Second, many methods achieve fairness constraints on specific datasets, but the effectiveness of the fairness guarantee decreases significantly under domain shift or distribution drift [30]. Finally, some methods demand substantial computational resources and specific deployment environments, which limits their feasibility in large-scale practical applications [31].

2.3. Privacy–Fairness Joint Optimization

A growing line of research attempts to jointly address privacy and fairness. Lyu et al. [14] designed a federated deep learning framework that imposes adversarial fairness constraints under DP-SGD based training. Ghoukasian and Asoodeh [16] and Yang et al. [15] studied differentially private fair binary classification and collaborative filtering, respectively, by combining group fairness constraints with differential privacy mechanisms. Maheshwari et al. [32] further investigated fair NLP models with differentially private text encoders (FedFair-DP), where DP-SGD is used to train a private encoder and an adversarial classifier is introduced to reduce the dependence of the representations on sensitive attributes. These approaches demonstrate the feasibility of combining DP and fairness, but they are mainly designed at the gradient or parameter level and focus on model outputs rather than the structure of the representation space.
However, the above methods are not directly tailored to the high-dimensional sparse nature of textual features and the multi-objective trade-offs considered in this paper. First, most existing approaches inject homogeneous noise into gradients or parameters and do not explicitly decouple representation learning, noise calibration, and fairness control. As a consequence, the same amount of noise is applied to all features and groups, which can cause severe performance degradation on minority groups when the privacy budget is tight [22,33]. Second, adversarial debiasing based methods typically require additional discriminators and multi-stage optimization, greatly increasing computational and deployment costs, especially in resource-constrained or latency-sensitive scenarios.
Building on these observations, we design AMF-DP as a representation-level framework that explicitly couples adaptive matrix factorization, feature- and group-aware noise allocation, and a dual-stream encoder with dynamic fusion gates. The adaptive low-rank embedding learning module jointly optimizes reconstruction, task loss, fairness regularization, and privacy-related reconstruction error, thereby isolating sensitive information before noise injection. The collaborative noise injection module allocates differential privacy noise according to feature importance and group disparities, enabling fine-grained control of the privacy–fairness trade-off. Finally, the ADS encoder maintains two dedicated channels for fairness and privacy and adjusts their contributions with a learnable gate, which allows AMF-DP to achieve a more balanced optimization of accuracy, fairness, and privacy than existing DP-only or fairness-only methods, as confirmed by the experimental results in Section 4.

3. Approach

3.1. Model Overview

As shown in Figure 1, the AMF-DP (adaptive matrix factorization with differential privacy) method focuses on the “collaborative optimization of privacy protection and fairness.” The overall architecture consists of three core modules: adaptive low-rank embedding learning (Section 3.2), collaborative noise injection (Section 3.3), and dual-stream embedding encoder (Section 3.4). These modules are tightly coupled, forming an end-to-end multi-objective collaborative optimization framework.
First, the original text input X is processed by an adaptive low-rank embedding learning module to extract a low-rank feature representation X low . This module uses matrix decomposition and feature selection to effectively compress redundant information, reduce noise interference, and preliminarily isolate sensitive attributes, laying the foundation for subsequent privacy protection and fairness optimization.
Next, the low-rank feature X low is input into the dynamic collaborative noise injection module. This module innovatively combines self-attention mechanisms and adaptive sensitive optimization techniques to comprehensively assess the importance of each feature and group differences, dynamically allocating differential privacy noise to ultimately generate privacy-enhanced features X private .
Subsequently, the privacy-enhanced features X private are fed into the adaptive dual-stream encoder (ADS encoder). This encoder adopts a parallel dual-stream structure, performing deep feature transformation through fairness and privacy channels, respectively. The two feature streams are then adaptively fused in the dynamic gated fusion module to obtain the final fused feature representation E fused ( x ).
Finally, the fused feature representation E fused ( x ) is input to the main task classifier c to complete the downstream discrimination task, achieving feature utilization that balances privacy protection and fairness.
In this study, we focus on the comprehensive performance of AMF-DP in terms of model performance, fairness, and privacy protection by training and evaluating the system on the above end-to-end model structure. Specifically, we evaluate the model outputs using several quantitative metrics to validate the effectiveness and advantages of the proposed method in multi-objective co-optimization scenarios.
The overall training procedure of AMF-DP is summarized in Algorithm 1.
Algorithm 1 Training procedure of AMF-DP
1: Input: Training dataset $D = \{(x_i, y_i, s_i, a_i)\}_{i=1}^{N}$; learning rate $\eta$; clipping bound C; fairness weight $\lambda_1$; privacy weight $\lambda_2$; noise scale $\sigma$; privacy budget $(\varepsilon_{\mathrm{total}}, \delta)$; number of epochs T.
2: Output: Parameters $(U, V)$, ADS encoder parameters $\theta$, and classifier c.
3: Initialize model parameters and the DP moments accountant
4: for t = 1 to T do
5:     ▹ (1) Adaptive low-rank embedding learning (Section 3.2)
6:     Sample a mini-batch $B \subseteq D$
7:     Apply EWMA-based standardization to B and obtain X
8:     Compute low-rank embeddings $Z = UV$
9:     Compute losses $\mathcal{L}_{\mathrm{task}}$, $\mathcal{L}_{\mathrm{fair}}$, $\mathcal{L}_{\mathrm{priv}}$ using Equations (3)–(5)
10:    Optionally update embedding dimension k based on $\mathcal{L}_{\mathrm{fair}}$ and $\mathcal{L}_{\mathrm{priv}}$
11:    ▹ (2) Collaborative noise injection (Section 3.3)
12:    for each feature do
13:        Compute importance $I_i$ and group disparity $\Delta S_i$ via Equations (6) and (7)
14:        Compute combined scores $C_i$ and attention weights $a_i$ using Equations (8) and (9)
15:        Sample Laplace noise $d_i$ based on $a_i$ and the DP budget
16:        Set $\eta_i = a_i \cdot d_i$
17:    end for
18:    Compute privacy-enhanced features $X_{\mathrm{priv}} = X + \eta$
19:    ▹ (3) ADS encoder and classifier (Section 3.4)
20:    Feed $X_{\mathrm{priv}}$ into the fairness and privacy streams to obtain fair_feat and priv_feat
21:    Compute the dynamic gate g using Equation (15)
22:    Fuse the streams to obtain $E_{\mathrm{fused}}(x)$ by Equation (16)
23:    Predict labels $\hat{y} = c(E_{\mathrm{fused}}(x))$
24:    Compute the overall loss $\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda_1 \mathcal{L}_{\mathrm{fair}} + \lambda_2 \mathcal{L}_{\mathrm{priv}}$
25:    ▹ (4) Parameter and privacy accounting update
26:    Clip per-example gradients to norm C
27:    Add DP noise according to $\sigma$
28:    Update parameters using learning rate $\eta$
29:    Update the moments accountant with the current sampling rate and noise scale
30: end for
31: return $(U, V, \theta, c)$

3.2. Adaptive Low-Rank Embedding Learning

In real-world scenarios involving high-dimensional heterogeneous data, traditional matrix decomposition methods struggle to balance feature discriminability, fairness, and privacy protection, especially when the data distribution is uneven or sensitive attributes are present, and they are prone to information loss or leakage of sensitive information [34,35,36]. To this end, this paper proposes an adaptive low-rank embedding learning (ALEL) method, which transforms matrix decomposition into the construction of a learnable low-dimensional embedding space and incorporates task discriminativeness, fairness constraints, and privacy-preservation requirements into the decomposition objective, so as to achieve a synergistic balance among discriminability, fairness, and privacy in the learned embeddings.
Before matrix decomposition, to ensure the consistency of input data distribution across different batches and time points, this paper employs the Exponential Weighted Moving Average (EWMA) method for dynamic standardization. EWMA updates based on the weighted average of each feature’s historical values and current input values, thereby smoothing data fluctuations and ensuring consistent data scales across different time points or tasks. The specific calculation is as follows:
$$X_{\mathrm{std}} = \frac{x_t - u_t}{\sqrt{\sigma^2 + \epsilon}}$$
Among them, $x_t$ is the current data point, $u_t$ and $\sigma^2$ are the current mean and variance, respectively, and $\epsilon$ is a very small constant to prevent division by zero and to stabilize normalization when $\sigma^2$ is close to zero. In practice, $\sigma^2$ can become zero for features that take an identical value within a mini-batch (e.g., binary attributes after stratified sampling), which would otherwise lead to extremely large scaled values and unstable gradients. We therefore set $\epsilon = 10^{-6}$ in all experiments. After the original data is dynamically standardized as described above, it more accurately reflects the actual distribution characteristics and provides high-quality input for subsequent embedding learning.
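To make this standardization step concrete, the following NumPy sketch maintains exponentially weighted running statistics and applies Equation (1) to each mini-batch. The decay factor `rho` and the batch-wise update rule are illustrative assumptions, since the paper does not report the exact smoothing coefficient.

```python
import numpy as np

class EWMAStandardizer:
    """Running standardization with exponentially weighted mean and variance."""

    def __init__(self, dim, rho=0.99, eps=1e-6):
        self.rho = rho            # assumed smoothing coefficient
        self.eps = eps            # epsilon of Equation (1)
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)

    def update_and_standardize(self, x_t):
        # x_t: (batch, dim) mini-batch of raw features
        batch_mean = x_t.mean(axis=0)
        batch_var = x_t.var(axis=0)
        # EWMA update of the running statistics u_t and sigma^2
        self.mean = self.rho * self.mean + (1.0 - self.rho) * batch_mean
        self.var = self.rho * self.var + (1.0 - self.rho) * batch_var
        # X_std = (x_t - u_t) / sqrt(sigma^2 + eps), Equation (1)
        return (x_t - self.mean) / np.sqrt(self.var + self.eps)
```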
Next, the joint optimization objective function is defined as
$$\min_{U, V} \; \mathcal{L}_{\mathrm{task}}(Z, Y) + \lambda_1 \mathcal{L}_{\mathrm{fair}}(Z, S) + \lambda_2 \mathcal{L}_{\mathrm{priv}}(Z, A)$$
where $Z = UV$ is the low-rank embedding representation, Y is the task label, S is the sensitive attribute, A is the privacy-related feature, and $\lambda_1$ and $\lambda_2$ are the weight coefficients.
Here, $\lambda_1$ controls the strength of the fairness regularizer $\mathcal{L}_{\mathrm{fair}}$, whereas $\lambda_2$ weights the privacy-related reconstruction term $\mathcal{L}_{\mathrm{priv}}$. In practice, we first normalize the three losses so that they have comparable magnitudes and then treat $\lambda_1$ and $\lambda_2$ as trade-off coefficients between accuracy, fairness, and privacy. We restrict $\lambda_1, \lambda_2 \geq 0$ and select them on a held-out validation set through a small grid search. Across all datasets, we found that setting $\lambda_1 = 0.5$ and $\lambda_2 = 0.1$ provides a stable balance between classification accuracy, TPR-gap/GRMS, and Leakage; varying these coefficients within $\lambda_1 \in [0.4, 0.7]$ and $\lambda_2 \in [0.05, 0.15]$ leads to changes within 1–2 percentage points on the validation metrics, indicating that AMF-DP is relatively robust to their exact values.
Task loss:
$$\mathcal{L}_{\mathrm{task}}(f_\theta(Z), Y) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{ic} \log f_\theta(z_i)_c$$
Fairness regularization term (maximum mean discrepancy): make the embedding Z as independent of the sensitive attribute S as possible. In this way, the optimization drives the embedding means of the different sensitive attribute groups closer together and reduces the expression of sensitive attribute information in the embedding.
$$\mathcal{L}_{\mathrm{fair}}(Z, S) = \left\| \frac{1}{|S_0|} \sum_{i \in S_0} z_i - \frac{1}{|S_1|} \sum_{i \in S_1} z_i \right\|_2^2$$
where $S_0$ and $S_1$ denote the two sample groups defined by the values of the sensitive attribute, and $z_i$ is the low-rank embedding of the i-th sample.
Privacy sensitivity suppression term (reconstruction error): make it difficult to recover the privacy attribute A from the embedding Z. During training, the goal is to keep this loss large so that the privacy attribute is hard to reconstruct.
$$\mathcal{L}_{\mathrm{priv}}(Z, A) = \frac{1}{n} \sum_{i=1}^{n} \left\| g_\phi(z_i) - a_i \right\|_2^2$$
where $g_\phi$ is the privacy attribute reconstructor and $a_i$ is the privacy attribute of sample i.
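As a minimal illustration of Equations (2)–(5), the following PyTorch sketch computes the three loss terms on a mini-batch. The tensor shapes and the decision to return the terms separately (so that the training loop can weight them and, for the privacy term, keep it large as the text describes) are assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def amf_dp_losses(logits, y, z, s, a_hat, a):
    """Compute the three terms of the joint objective in Equation (2).

    logits : (n, C) classifier outputs on the low-rank embeddings
    y      : (n,)   task labels
    z      : (n, k) low-rank embeddings Z = UV
    s      : (n,)   binary sensitive attribute defining groups S0 / S1
    a_hat  : (n, d) output of the privacy reconstructor g_phi(z)
    a      : (n, d) privacy attributes to be protected
    """
    # Task loss: multi-class cross-entropy, Equation (3)
    l_task = F.cross_entropy(logits, y)

    # Fairness regularizer: squared distance between group means, Equation (4)
    mu0 = z[s == 0].mean(dim=0)
    mu1 = z[s == 1].mean(dim=0)
    l_fair = torch.sum((mu0 - mu1) ** 2)

    # Privacy term: reconstruction error of the privacy attribute, Equation (5);
    # the encoder is trained so that this quantity stays large (hard to reconstruct)
    l_priv = torch.mean(torch.sum((a_hat - a) ** 2, dim=1))

    return l_task, l_fair, l_priv
```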
In the dimension selection of the embedding, this paper introduces an adaptive adjustment mechanism. By monitoring changes in the target loss during training, the embedding dimension k is dynamically adjusted to minimize the separability of sensitive information while ensuring task performance. Specifically, if $\mathcal{L}_{\mathrm{fair}}$ or $\mathcal{L}_{\mathrm{priv}}$ does not reach the preset threshold, k is appropriately reduced to enhance feature compression and desensitization; otherwise, k is appropriately increased to ensure discriminative power.
From an implementation perspective, the matrix factors U and V are trained jointly with the encoder parameters using the Adam optimizer. To stabilize training and avoid exploding updates, we apply $\ell_2$ weight decay and gradient clipping to the parameters of U and V, and initialize them with a truncated SVD of the initial Transformer representations. This warm start makes the optimization more stable and accelerates convergence. During deployment, only the low-rank factors U and V, rather than the full high-dimensional matrices, are stored, so the additional memory cost of matrix training remains moderate.
This dynamic adjustment mechanism makes the extracted features more stable and reliable. Additionally, the reduced-dimension data has fewer features compared to the original high-dimensional data, resulting in lower computational complexity for adding noise, minimal impact on fairness, and better privacy protection for the data.
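The dimension-adjustment rule described above can be sketched as the simple heuristic below; the thresholds, step size, and bounds on k are illustrative assumptions, since the paper only specifies the direction of the adjustment.

```python
def adjust_embedding_dim(k, l_fair, l_priv,
                         fair_threshold=0.05, priv_threshold=1.0,
                         step=8, k_min=16, k_max=256):
    """Heuristic update of the embedding dimension k (Section 3.2).

    The thresholds and step size are assumed values; the paper only states
    that k is reduced when the fairness/privacy losses miss their targets
    and increased otherwise.
    """
    fairness_ok = l_fair <= fair_threshold   # sensitive groups close enough
    privacy_ok = l_priv >= priv_threshold    # attributes hard to reconstruct
    if not (fairness_ok and privacy_ok):
        k = max(k_min, k - step)             # compress harder to desensitize
    else:
        k = min(k_max, k + step)             # allow more capacity for the task
    return k
```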

3.3. Collaborative Noise Injection

As shown in Figure 2, the collaborative noise injection module achieves a dynamic balance between privacy protection and model fairness by combining self-attention mechanisms and adaptive sensitivity optimization techniques. Its core idea is to dynamically allocate differential privacy noise through a comprehensive assessment of feature importance I i and group diversity Δ S i , thereby protecting sensitive information while reducing bias in the model across different groups.
First, the importance of each feature I i is dynamically evaluated through the self-attention mechanism, and combined with the group difference indicator Δ S i for multi-angle analysis. Feature importance I i measures the contribution of each feature to the model output, while group variability Δ S i measures the distribution differences of a specific feature across different groups. If a feature exhibits significant differences across groups, it may lead to unfairness, necessitating greater attention during privacy protection. The calculation formula is as follows:
$$I_i = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial y_j}{\partial x_i}$$
$$\Delta S_i = \frac{1}{m} \sum_{g=1}^{m} \mathrm{Var}_{g,i}$$
where n is the total number of samples, $y_j$ is the model prediction for the j-th sample, $x_i$ denotes the i-th input feature, and $\mathrm{Var}_{g,i}$ denotes the variance of the i-th feature in group g. By dynamically adjusting the weights of feature importance $I_i$ and group variability $\Delta S_i$, we can effectively avoid any single feature or group dominating the model, thereby enhancing the overall fairness of the model while protecting privacy.
In order to strike the best balance between privacy protection and fairness, AMF-DP introduces a comprehensive assessment score $C_i$, which is calculated as follows:
$$C_i = \alpha \cdot I_i - \beta \cdot \Delta S_i$$
Among them, α and β are balance weights, which control the weights of feature importance and group differences in the comprehensive evaluation, allowing the priority of privacy protection and fairness to be flexibly adjusted according to specific task requirements. When privacy protection requirements are high, the weight of α can be increased to provide important features with higher protection; when fairness requirements are high, the weight of β can be increased to focus more on differences between groups.
In order to keep $C_i$ in a stable range, we normalize both $I_i$ and $\Delta S_i$ into $[0, 1]$ and impose the constraint $\alpha + \beta = 1$. A larger $\alpha$ emphasizes preserving task-relevant features through $I_i$, while a larger $\beta$ puts more weight on reducing inter-group disparities via $\Delta S_i$. We select $(\alpha, \beta)$ on the validation split by a small grid search with $\alpha \in \{0.5, 0.6, 0.7\}$ and $\beta = 1 - \alpha$. In all experiments, we use $\alpha = 0.6$ and $\beta = 0.4$, which achieves the best overall trade-off between accuracy, fairness, and privacy-related metrics.
Next, by learning the trainable weight matrix W and bias vector b, and combining them with the attention mechanism, the attention score z i is calculated as follows:
$$z_i = W \cdot C_i + b$$
where we use the softmax function to normalize them into relative importance scores a i . These normalized attention scores dynamically reflect the demand for each feature in the privacy protection process. Noise allocation is dynamically adjusted based on the relative importance a i of the feature. The intensity of noise η i is proportional to the importance of the feature, with the specific formula being
$$\eta_i = a_i \cdot d_i$$
where $d_i$ is Laplace noise with the same shape as the input data, and $a_i$ dynamically adjusts the noise intensity so that important features are assigned greater noise and secondary features are assigned less noise. The final result is the noisy data:
$$\tilde{x}_i = x_i + \eta_i$$
Among them, $\tilde{x}_i$ is the data point after noise is added, and $x_i$ is the feature vector of the input data point.
From the viewpoint of differential privacy, the collaborative noise injection module can be regarded as an instantiation of the Laplace mechanism. Let $f(x)$ denote the low-rank embedding of a single example and let $\Delta_1 f$ be the $\ell_1$-sensitivity of f, which we enforce by clipping each embedding to have $\ell_1$-norm at most C. We normalize the attention scores $a_i$ into non-negative coefficients $w_i$ with $\sum_i w_i = 1$ and allocate an individual privacy budget $\varepsilon_i = w_i \varepsilon_{\mathrm{noise}}$ to each feature i, where $\varepsilon_{\mathrm{noise}}$ is the total privacy budget assigned to this module. The noise term in Equation (10) is then sampled as $d_i \sim \mathrm{Lap}(\Delta_1 f / \varepsilon_i)$, and $\eta_i = a_i d_i$. According to the standard analysis of the Laplace mechanism, releasing $\tilde{x}_i = x_i + \eta_i$ ensures $(\varepsilon_{\mathrm{noise}}, 0)$-differential privacy with respect to the original embedding $f(x)$ for each individual record.
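Putting Equations (6)–(11) and the per-feature budget allocation together, the following NumPy sketch strings the steps of the collaborative noise injection module into one function. The min–max normalization, the total budget `eps_noise`, and the omission of the learnable layer (W, b) of Equation (9) are simplifying assumptions.

```python
import numpy as np

def collaborative_noise_injection(x, importance, group_var,
                                  alpha=0.6, beta=0.4,
                                  eps_noise=2.0, clip_c=1.0, seed=0):
    """Sketch of the collaborative noise injection module (Section 3.3).

    x          : (n, k) low-rank features, assumed clipped to l1-norm <= clip_c
    importance : (k,)  feature importance I_i, e.g. averaged gradients (Eq. 6)
    group_var  : (m, k) per-group feature variances used for Delta S_i (Eq. 7)
    eps_noise  : assumed total privacy budget for this module
    """
    rng = np.random.default_rng(seed)

    # Normalize both signals into [0, 1] as described in the text
    imp = (importance - importance.min()) / (np.ptp(importance) + 1e-12)
    disp = group_var.mean(axis=0)
    disp = (disp - disp.min()) / (np.ptp(disp) + 1e-12)

    # Combined score C_i = alpha * I_i - beta * Delta S_i (Equation 8)
    scores = alpha * imp - beta * disp

    # Attention weights a_i via softmax (Equation 9); the learnable W, b are omitted
    a = np.exp(scores - scores.max())
    a = a / a.sum()

    # Per-feature budget eps_i = a_i * eps_noise and Laplace noise d_i
    eps_i = np.maximum(a * eps_noise, 1e-12)
    d = rng.laplace(loc=0.0, scale=clip_c / eps_i, size=x.shape)

    # Noisy features x_tilde = x + a_i * d_i (Equations 10 and 11)
    return x + a * d
```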
Through this dynamic noise addition mechanism, AMF-DP achieves a balance between privacy protection and fairness: on the one hand, it protects sensitive information and prevents privacy leaks by applying more precise noise control to important features; on the other hand, it reduces bias in the model across different groups by focusing on group differences, thereby improving fairness. Ultimately, this approach not only effectively reduces the negative impact of noise on model performance but also ensures comprehensive optimization of the model in terms of privacy protection and fairness, meeting the diverse needs of different tasks and data scenarios.

3.4. Adaptive Dual-Stream Encoder

As shown in Figure 3, the ADS encoder module is based on a dual-stream processing mechanism, with independent feature processing channels for fairness and privacy protection. It integrates innovative modules such as fairness attention enhancement, differential privacy protection, and dynamic gate integration to achieve hierarchical privacy–fairness protection for features.
Specifically, input features are first parallelly fed into the fairness enhancement stream (Fairness Attention Block) and the privacy protection stream (DP Transformer Block). Within the privacy protection stream, differential privacy noise is injected, while the fairness enhancement stream directly performs bias-aware processing on the original features. Through a dynamic gating fusion module, the two feature streams are adaptively weighted and fused, ensuring that the final encoding results flexibly balance fairness and privacy.
Fairness Attention Block: Fairness enhancement introduces a bias-aware attention mechanism that actively eliminates bias by suppressing correlations between features and sensitive attributes. Specifically, based on the attention mechanism, a fairness enhancement mask is defined as follows:
$$M_{\mathrm{fair}} = 1 - \mathrm{corr}(K, S)$$
where K is the feature representation, S is the sensitive attribute, and corr ( · ) is the correlation coefficient. The attention score is calculated using the following formula:
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}} \odot M_{\mathrm{fair}}\right) V$$
This mechanism can effectively block the spread of bias, improve the fairness of the model in terms of sensitive attributes, and has the ability to reverse adversarial gradients, further reducing the predictability of features for sensitive attributes.
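The following PyTorch sketch illustrates one possible reading of Equations (13) and (14). Because the paper does not specify the exact shape of M_fair, the sketch assumes a per-dimension mask derived from the Pearson correlation between each key dimension and the sensitive attribute, applied by rescaling the keys before the scaled dot-product.

```python
import torch
import torch.nn.functional as F

def fairness_masked_attention(q, k, v, s):
    """Hedged sketch of the Fairness Attention Block (Equations 13 and 14).

    q, k, v : (n, d) query/key/value representations of n items
    s       : (n,)   binary sensitive attribute of each item
    """
    s_c = s.float() - s.float().mean()
    k_c = k - k.mean(dim=0, keepdim=True)
    # Per-dimension Pearson correlation corr(K_j, S), shape (d,)
    corr = (k_c * s_c.unsqueeze(1)).mean(dim=0) / (
        k_c.std(dim=0, unbiased=False) * s_c.std(unbiased=False) + 1e-8)
    m_fair = 1.0 - corr.abs()          # M_fair = 1 - corr(K, S)

    k_masked = k * m_fair              # down-weight dimensions correlated with S
    scores = q @ k_masked.t() / (k.shape[-1] ** 0.5)
    attn = F.softmax(scores, dim=-1)
    return attn @ v
```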
DP Transformer Block: Privacy protection deeply integrates the Rényi differential privacy mechanism with the Transformer structure, achieving hierarchical protection of feature privacy by injecting noise into each layer of the Transformer. Its core operation is
$$x = x + \mathrm{LaplaceNoise}(\mathrm{scale} = \mathrm{privacy\_budget})$$
Normalization and safety controls are applied before and after multi-head attention calculations to ensure a balance between the effectiveness of noise perturbations and the expressive power of the model. This mechanism can adaptively adjust noise intensity based on feature sensitivity to achieve dynamic privacy protection.
A similar analysis applies to the DP Transformer Block. Each application of the mechanism $x = x + \mathrm{LaplaceNoise}(\mathrm{scale})$ with sensitivity bounded by a clipping constant C provides an $(\varepsilon_\ell, 0)$-differential privacy guarantee for the corresponding layer $\ell$. Let $\varepsilon_{\mathrm{enc}} = \sum_\ell \varepsilon_\ell$ denote the privacy budget assigned to the encoder. During training, these layer-wise mechanisms are repeatedly applied at each optimization step. We track the cumulative privacy loss of all steps using a Rényi-DP based moments accountant, which provides a tighter bound than naïve linear composition. Putting the above together, AMF-DP as a whole satisfies an $(\varepsilon_{\mathrm{total}}, \delta)$-DP guarantee with $\varepsilon_{\mathrm{total}} = \varepsilon_{\mathrm{noise}} + \varepsilon_{\mathrm{enc}}$ and a target failure probability $\delta$.
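The layer-wise perturbation and budget bookkeeping can be sketched as follows. The sketch uses naive linear composition of the per-layer budgets instead of the Rényi-DP moments accountant used in the actual implementation, and it omits the attention and feed-forward computations between noise injections.

```python
import torch

def dp_transformer_noise(x, layer_budgets, clip_c=1.0):
    """Simplified sketch of the layer-wise Laplace perturbation in Section 3.4.

    x             : (n, d) hidden states fed through the encoder layers
    layer_budgets : per-layer budgets eps_l; summed here (naive composition),
                    whereas the paper tracks the cumulative loss with a
                    Renyi-DP moments accountant for a tighter bound.
    """
    eps_enc = 0.0
    for eps_l in layer_budgets:
        # Clip each hidden vector so its l1-norm (the sensitivity) is at most C
        norms = x.abs().sum(dim=-1, keepdim=True).clamp(min=1e-12)
        x = x * torch.clamp(clip_c / norms, max=1.0)
        # Laplace mechanism: noise scale = sensitivity / eps_l
        noise = torch.distributions.Laplace(0.0, clip_c / eps_l).sample(x.shape)
        x = x + noise
        # (the attention / feed-forward sublayers of this layer would run here)
        eps_enc += eps_l  # naive composition of the per-layer (eps_l, 0) guarantees
    return x, eps_enc
```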
Dynamic fusion gate: The ADS encoder introduces a dynamic gate integrator, which is responsible for adaptive fusion of fairness and privacy features to achieve multi-objective trade-offs. Specifically, the gate coefficient g is calculated by the following equation:
$$g = \sigma\!\left(W_g \cdot \left[\nabla \mathcal{L}_{\mathrm{task}};\, F_{\mathrm{priv}};\, F_{\mathrm{fair}}\right]\right)$$
Among them, $\nabla \mathcal{L}_{\mathrm{task}}$ is the gradient information of the task loss, and $F_{\mathrm{priv}}$ and $F_{\mathrm{fair}}$ are the cumulative privacy loss and the fairness bias, respectively.
The final output is
$$\mathrm{output} = g \cdot \mathrm{fair\_feat} + (1 - g) \cdot \mathrm{priv\_feat}$$
Through a gatekeeping mechanism, the model can perceive the current task’s privacy and fairness requirements in real time during training, dynamically adjust the feature fusion ratio, and achieve multi-objective collaborative optimization.
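A minimal sketch of the dynamic gate of Equations (15) and (16) is given below. Feeding the gate a three-dimensional vector of scalar statistics and producing a single scalar g are assumptions; the paper does not state whether g is scalar or per-dimension.

```python
import torch
import torch.nn as nn

class DynamicFusionGate(nn.Module):
    """Sketch of the dynamic gate in Equations (15) and (16)."""

    def __init__(self):
        super().__init__()
        # Maps [grad(L_task); F_priv; F_fair] to a single gate value
        self.w_g = nn.Linear(3, 1)

    def forward(self, fair_feat, priv_feat, task_grad, f_priv, f_fair):
        # Scalar summaries: task gradient magnitude, cumulative privacy loss, fairness bias
        stats = torch.tensor([float(task_grad), float(f_priv), float(f_fair)])
        g = torch.sigmoid(self.w_g(stats.unsqueeze(0)))        # shape (1, 1)
        # output = g * fair_feat + (1 - g) * priv_feat, Equation (16)
        return g * fair_feat + (1.0 - g) * priv_feat
```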
The hierarchical structure and dynamic integration mechanism of the ADS encoder enable it to exhibit high flexibility and scalability in multi-objective scenarios. Compared to traditional single-stream DP or fairness models, ADS can more precisely control feature flow and risk distribution, significantly enhancing the model’s controllability and robustness in practical applications.

4. Experiment

4.1. Experimental Setup

To validate the effectiveness of the AMF-DP approach in improving model fairness, privacy preservation, and performance, this paper performs end-to-end training and evaluation on a model structure containing an encoder and a classifier. The encoder is used to generate privacy-enhanced and fair feature representations, while the classifier accomplishes the downstream discriminative task. To systematically evaluate the performance, fairness, and privacy-preserving capability of the proposed method in text categorization tasks, this paper designs multiple sets of comparison experiments covering different data types, sensitive attributes, and task settings. Three representative public datasets are selected for the experiments, focusing on the tasks of sentiment analysis, career prediction, and income prediction, respectively, involving various sensitive attributes such as gender and race. By comparing with the mainstream fairness and privacy-preserving baseline methods, we comprehensively examine the performance of each method in terms of accuracy, fairness metrics, and privacy measures, so as to validate the effectiveness and advantages of the proposed methods.
We evaluate AMF-DP on three public datasets:
(1) Twitter Sentiment [37]: 200,000 tweets annotated with binary sentiment labels and race-related attributes (speakers of African American English and Standard American English);
(2) Bias in Bios [38]: 393,423 text biographies annotated with occupation labels (28 categories) and binary gender attributes;
(3) Adult Income [39]: a subset of the 1994 U.S. Census database containing 48,842 records, each with 14 features including age, education level, occupation, race, and gender, annotated with a binary label indicating whether annual income exceeds $50,000.
Fairness metrics: (1) For the Twitter Sentiment dataset, we use the true positive rate gap (TPR-gap), which measures the difference in true positive rates between two sensitive groups (gender/race) and is closely related to the concept of equal opportunity. For a given sensitive attribute A, such as gender or race, the data is divided into two subgroups, with A = 0 as the reference group and A = 1 as the target group. The TPR-gap is defined as
$$\mathrm{TPR}\text{-}\mathrm{gap} = \mathrm{TPR}_{A=0} - \mathrm{TPR}_{A=1}$$
where $\mathrm{TPR}_{A=0} = \frac{TP_{A=0}}{TP_{A=0} + FN_{A=0}}$ and $\mathrm{TPR}_{A=1} = \frac{TP_{A=1}}{TP_{A=1} + FN_{A=1}}$, with TP denoting the number of true positives and FN the number of false negatives.
(2) For the Bias in Bios dataset with 28 categories, this paper follows Romanov et al. [40] and uses the root mean square (RMS) of the TPR-gap over all occupations $y \in O$, where O is the set of occupational categories.
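For reference, the two fairness metrics can be computed with the short NumPy sketch below. Reporting the absolute difference for the binary TPR-gap and binarizing each occupation for the RMS aggregation are straightforward assumptions about the evaluation protocol.

```python
import numpy as np

def tpr_gap(y_true, y_pred, group):
    """Absolute TPR-gap between the two groups of a binary sensitive attribute."""
    tprs = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)
        tprs.append((y_pred[mask] == 1).mean())
    return abs(tprs[0] - tprs[1])

def tpr_gap_rms(y_true, y_pred, group, labels):
    """Root mean square of per-class TPR-gaps (GRMS), as used for Bias in Bios."""
    gaps = []
    for y in labels:
        bin_true = (y_true == y).astype(int)
        bin_pred = (y_pred == y).astype(int)
        gaps.append(tpr_gap(bin_true, bin_pred, group))
    return float(np.sqrt(np.mean(np.square(gaps))))
```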
Privacy metrics: (1) Leakage: The accuracy of a two-class classifier in predicting sensitive attributes in encoded representations. (2) Minimum Description Length (MDL): Used to quantify the “effort” required to extract sensitive attributes from text representations [41].
In all experiments, the sensitive information to be protected is the sensitive attribute S associated with each example. For Twitter Sentiment and Bias in Bios, S corresponds to race and gender, respectively, while for Adult Income, it corresponds to gender. We assume that a potential adversary can observe only the released representation E ( x ) but not the raw text x, and aims to infer S from E ( x ) . Therefore, a lower Leakage value and a higher MDL score indicate that the learned representations contain less exploitable information about these sensitive attributes and thus provide stronger privacy protection.
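A typical way to measure Leakage is to train a probe classifier on the frozen representations, as in the hedged scikit-learn sketch below; the choice of a logistic-regression probe and the 80/20 split are assumptions, and stronger probes yield more conservative leakage estimates.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def leakage(representations, sensitive, seed=0):
    """Leakage: accuracy of a probe that predicts S from the released encodings."""
    x_tr, x_te, s_tr, s_te = train_test_split(
        representations, sensitive, test_size=0.2,
        random_state=seed, stratify=sensitive)
    probe = LogisticRegression(max_iter=1000)
    probe.fit(x_tr, s_tr)
    # For a binary S, values close to 50% indicate little exploitable information
    return probe.score(x_te, s_te)
```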
To ensure a fair comparison, we keep the overall privacy budget ( ε total , δ ) fixed across all DP-based methods and datasets, so that differences in performance can be attributed to the modeling choices rather than to different privacy levels. A more exhaustive empirical study over multiple privacy budgets (e.g., varying ε total ) would require retraining all DP-enabled models for several privacy levels on three datasets and re-tuning the associated hyperparameters, which is beyond our current computational budget and page limit. We therefore focus on a representative moderate privacy level in this work and leave a detailed exploration of privacy–utility trade-off curves as an important direction for future research.
Unless otherwise stated, all models are trained with the Adam optimizer, a learning rate of $2 \times 10^{-5}$, a batch size of 64, a dropout rate of 0.1, and 45 training epochs. For AMF-DP, we use the same architecture across all datasets and select hyperparameters on a held-out validation set. The trade-off coefficients in Equation (2) are fixed to $\lambda_1 = 0.5$ (fairness) and $\lambda_2 = 0.1$ (privacy). In the collaborative noise injection module, we set $\alpha = 0.6$ and $\beta = 0.4$, assigning slightly more weight to feature importance than to group disparity. For the differentially private mechanisms, we clip the $\ell_2$-norm of per-example gradients to $C = 1.0$ and adopt a noise multiplier $\sigma = 1.1$, which, together with the sampling rate and the number of optimization steps, corresponds to an overall privacy budget of approximately $(\varepsilon_{\mathrm{total}} \approx 5.0, \delta = 10^{-5})$ as described in Section 3.4. We further conducted a small sensitivity study by varying $\lambda_1$, $\lambda_2$, $\alpha$, and $\beta$ within $\pm 0.1$ around their default values; the resulting changes in accuracy, TPR-gap/GRMS, and Leakage on the validation sets were within 1–2 percentage points. Qualitatively, larger fairness-related weights ($\lambda_1$, $\beta$) push the model towards smaller TPR-gap/GRMS with a slight reduction in accuracy, whereas larger privacy-related weights ($\lambda_2$, $\alpha$) mainly reduce Leakage and increase MDL at the cost of moderate performance loss, so the chosen configuration corresponds to a balanced point on the privacy–fairness–performance trade-off.
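For completeness, the gradient sanitization step implied by the clipping bound C = 1.0 and noise multiplier σ = 1.1 can be sketched as follows; obtaining per-example gradients (e.g., via a per-sample backward pass) is assumed and not shown, and the noise calibration follows the standard DP-SGD recipe rather than the authors' exact code.

```python
import torch

def sanitize_gradients(per_example_grads, clip_c=1.0, noise_multiplier=1.1):
    """DP-SGD style gradient sanitization matching the reported C and sigma.

    per_example_grads : (batch, p) flattened per-example gradients.
    """
    # Clip each example's gradient to l2-norm at most C
    norms = per_example_grads.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = per_example_grads * torch.clamp(clip_c / norms, max=1.0)
    # Average over the batch and add Gaussian noise; the noise on the summed
    # gradient has std sigma * C, so the averaged gradient gets sigma * C / batch
    batch = per_example_grads.shape[0]
    mean_grad = clipped.mean(dim=0)
    noise = torch.randn_like(mean_grad) * (noise_multiplier * clip_c / batch)
    return mean_grad + noise
```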
Baselines: To evaluate the performance of different methods, this paper compares the performance of the following methods on three datasets:
  • Standard Text Classifier (STC): A traditional deep text classification model based on a Transformer-based classifier, without introducing any privacy protection or fairness constraints, with classification accuracy as the primary optimization objective. This method serves as a reference for the upper limit of performance.
  • Fairness-Aware Classifier (FAIR): A representational fairness optimization method that introduces a group fairness regularization term into the loss function to reduce the correlation between model predictions and sensitive attributes (such as gender, race, etc.) [42]. This method reflects the trade-off between fairness and performance.
  • Differential Privacy Classifier (DP): This method uses differential privacy mechanisms (DP-SGD, [43]) to inject noise into the model parameter update process, thereby protecting the privacy of training data. This method is used to evaluate changes in model performance and fairness under different privacy protection strengths.
  • Adversarial Debiasing (ADV): Combines fairness enhancement methods from adversarial training (e.g., Zhang et al., 2018) with the introduction of a sensitive attribute discriminator to force the main model to learn feature representations independent of sensitive attributes, thereby enhancing fairness [11]. This method focuses on fairness enhancement but has high computational costs.
  • Privacy-Preserving Matrix Factorization (PMF): This method employs privacy-preserving matrix factorization to perform low-rank embedding on input features while injecting noise, balancing feature representation capability with privacy protection [44]. This method is used to compare the ability of different matrix factorization strategies to balance privacy and performance.
  • Federated Fairness with DP (FedFair-DP): A method combining adversarial training and differential privacy, which enhances fairness and privacy protection in a distributed environment through a differential privacy encoder and adversarial training [32]. This method reflects the latest advancements in multi-objective optimization.
  • AMF-DP (the method in this paper): the adaptive matrix factorization with differential privacy method proposed in this paper, which incorporates multi-objective optimization, dynamic noise injection, and a dual-stream encoding mechanism to systematically improve model performance, fairness, and privacy protection.

4.2. Analysis

These baselines were chosen to cover complementary design choices: STC represents a strong accuracy-oriented model without any privacy or fairness constraints; FAIR and ADV focus on fairness enhancement through regularization or adversarial debiasing; DP and PMF are privacy-preserving approaches without explicit fairness objectives; and FedFair-DP is a recent joint DP-and-fairness framework for text classification. Comparing AMF-DP with these methods therefore allows us to disentangle the effects of (i) fairness regularization, (ii) DP noise injection, and (iii) the proposed adaptive low-rank and dual-stream design on accuracy, fairness, and privacy.
To make the following results easier to follow, we briefly recall the roles of the three main modules in AMF-DP. As illustrated in Figure 1, Figure 2 and Figure 3, the adaptive low-rank embedding learning (ALEL) module compresses high-dimensional text features and weakens the explicit dependence on sensitive attributes. The collaborative noise injection module then adds carefully calibrated Laplace noise to these compact features to protect privacy while controlling the impact on model performance. Finally, the adaptive dual-stream (ADS) encoder processes fairness-related and privacy-related features in two separate streams and fuses them through a gating mechanism, so that the final representation can simultaneously support accuracy, fairness, and privacy.

4.2.1. Performance Analysis

Figure 4 and Table 1 summarize the performance of all methods on the four core metrics (Accuracy, TPR-gap/GRMS, Leakage, and MDL) across the three datasets. Overall, the results show clear and consistent patterns.
First, the standard text classifier (STC) achieves the highest or near-highest accuracy on all datasets, indicating that it can fully exploit the data features when no additional constraints are imposed. However, its TPR-gap/GRMS and Leakage values are much worse than those of the other methods, which reveals strong dependence on sensitive attributes and a high risk of privacy leakage. This illustrates a typical “high performance but high risk” behaviour of conventional NLP models.
Second, fairness-oriented methods such as FAIR and ADV substantially reduce TPR-gap/GRMS by adding group-fairness regularization terms or adversarial debiasing objectives. These methods effectively mitigate model bias and bring the predictions of different groups closer together. Nevertheless, their Leakage and MDL metrics only show limited improvement, meaning that sensitive attributes can still be inferred from the learned representations. Focusing on fairness alone is therefore insufficient to guarantee privacy protection.
Third, privacy-preserving methods based on differential privacy (DP) and privacy-preserving matrix factorization (PMF) significantly strengthen privacy protection by injecting noise. Both methods noticeably reduce the Leakage of sensitive attributes and increase MDL, indicating that more effort is required to recover private information from the encoded features. However, this gain in privacy comes at the cost of lower classification accuracy, and the overall performance is clearly degraded.
In contrast, FedFair-DP and the proposed AMF-DP exhibit more balanced radar shapes on all four indicators, which reflects their ability to jointly consider accuracy, fairness, and privacy. Among them, AMF-DP achieves the best overall performance: on all three datasets, it maintains competitive accuracy while achieving lower TPR-gap/GRMS and Leakage and higher MDL compared with most baselines. This shows that AMF-DP can effectively integrate the benefits of fairness optimization and privacy protection instead of sacrificing one for the other.
Furthermore, Figure 5 compares the training time of FedFair-DP and AMF-DP. The results demonstrate that AMF-DP converges significantly faster on both the Twitter Sentiment and Bias in Bios datasets. This efficiency gain mainly comes from the adaptive low-rank embedding and collaborative noise injection modules, which compress redundant features and reduce unnecessary computation, as well as from the dual-stream encoder that simplifies the feature separation process. Therefore, AMF-DP not only improves the trade-off among accuracy, fairness, and privacy, but also reduces training overhead.
In summary, the proposed framework allows us to obtain (1) high accuracy that is close to STC on all datasets; (2) substantially improved fairness, comparable to or better than dedicated fairness methods; (3) strong privacy protection, with lower Leakage and higher MDL than most privacy baselines; (4) higher training efficiency than FedFair-DP, thanks to the synergy of ALEL, collaborative noise injection, and the ADS encoder.

4.2.2. Trade-Off Analysis

Figure 6 and Figure 7 provide a more intuitive view of the trade-offs between accuracy, fairness, and privacy. In both figures, points closer to the lower-right corner correspond to methods that achieve better accuracy while simultaneously reducing unfairness or privacy leakage.
For the trade-off between accuracy and fairness (Figure 6), most methods lie on a clear accuracy–fairness curve. FAIR and ADV move the points towards smaller TPR-gap/GRMS compared with STC, but their accuracy drops noticeably. DP and PMF obtain only modest fairness improvements and even lower accuracy, which means that noise injection alone cannot solve the bias problem effectively. In contrast, FedFair-DP and AMF-DP are located near the Pareto frontier: they keep accuracy at a relatively high level while significantly reducing TPR-gap/GRMS on all datasets.
For the trade-off between accuracy and privacy leakage (Figure 7), traditional methods (STC, FAIR, and ADV) show high accuracy but also high Leakage, indicating limited privacy protection. DP and PMF clearly reduce Leakage, but this again comes with a noticeable loss in accuracy. FedFair-DP and AMF-DP achieve a better balance between the two dimensions. In particular, AMF-DP consistently lowers Leakage on multiple datasets while incurring only minor accuracy loss, demonstrating strong multi-objective optimization capability.
Overall, although AMF-DP is not the best method on every single metric, it consistently lies in the region where accuracy, fairness, and privacy are jointly satisfactory. This makes it more suitable for realistic scenarios where multiple requirements must be satisfied at the same time.

4.3. Ablation Experiment

To further understand the contribution of each core module, we conduct ablation studies by progressively removing the low-rank embedding (w/o Low-Rank), the collaborative noise injection (w/o Noise), and the dual-stream encoder (w/o ADS) from AMF-DP. Figure 8 reports the results on all three datasets.
On the Twitter Sentiment dataset, removing the low-rank embedding enlarges the TPR-gap from 3.15 to 5.80 and increases Leakage from 60.23% to 65.40%, which confirms that ALEL effectively suppresses sensitive attributes while preserving discriminative features. When the collaborative noise injection module is disabled, accuracy slightly improves but both fairness and privacy metrics deteriorate sharply, and Leakage rises to 72.30%. This verifies that our noise allocation strategy is crucial for protecting sensitive information and maintaining a reasonable performance–privacy balance. Replacing the ADS encoder with a standard encoder also leads to clear degradation in fairness and privacy, showing the benefit of processing fairness- and privacy-related features in two dedicated streams.
Similar trends are observed on the Bias in Bios and Adult Income datasets. Across all datasets, the full AMF-DP model achieves the most balanced combination of accuracy, fairness, and privacy. Any removal of the key modules causes noticeable regression, especially in TPR-gap and Leakage, which indicates that each component (ALEL, collaborative noise injection, and ADS) plays an indispensable role in the overall framework.

4.4. Practical Applications and Limitations

From an application perspective, AMF-DP can be plugged into existing text classification pipelines as a drop-in encoder. In a typical deployment, raw texts are first encoded by a pre-trained Transformer, and the resulting representations are then processed by the ALEL module, the collaborative noise injection mechanism, and the ADS encoder to obtain privacy- and fairness-aware features. The final classifier can be trained on domain-specific data (e.g., hiring, lending, healthcare, or online content moderation) while monitoring accuracy, TPR-gap/GRMS, Leakage, and MDL on a held-out validation set. Before releasing a model, practitioners can test AMF-DP in simulated scenarios by varying the trade-off hyperparameters and privacy budgets to explore operating points that satisfy application-specific constraints on performance, fairness, and privacy.
Despite these advantages, AMF-DP also has several limitations in practical use. First, the framework assumes access to reliable sensitive-attribute labels during training in order to define the fairness loss and evaluation metrics; in many real-world systems, such attributes may be incomplete, noisy, or unavailable, which can weaken the effectiveness of our approach. Second, although AMF-DP is more efficient than FedFair-DP, its training and inference costs are still higher than those of a standard text classifier due to the additional low-rank decomposition, noise injection and dual-stream encoding modules, which may limit deployment in highly resource-constrained environments. Third, the current experiments focus on centralized training for single-label classification with one binary sensitive attribute. Extending AMF-DP to multiple intersecting sensitive attributes, multi-label or sequence generation tasks, and federated or streaming settings is left for future work.

5. Conclusions

In this paper, we propose a novel text processing method, AMF-DP (adaptive matrix factorization with differential privacy), to address the risks of privacy leakage and prediction bias in encoded text representations. An end-to-end multi-objective learning framework for the joint optimization of privacy protection and fairness is constructed through adaptive low-rank embedding learning, collaborative noise injection, and the cascaded optimization of a dual-stream encoder. AMF-DP achieves privacy protection and fairness enhancement while improving model performance, which opens up new directions for privacy and fairness research in NLP and has both theoretical and practical value. From a practical perspective, AMF-DP can be used as a plug-and-play encoder on top of existing Transformer-based text classifiers, together with a concrete training and evaluation protocol for jointly optimizing accuracy, fairness, and privacy in real-world text categorization systems.
Nevertheless, AMF-DP still has several limitations. The current framework depends on a pre-trained Transformer encoder as the backbone, so its computational and memory costs, as well as part of its bias characteristics, are inherited from this underlying model. In addition, AMF-DP requires explicit annotations of sensitive attributes to optimize and evaluate fairness, which may not always be available due to legal or organizational constraints. Although the adaptive low-rank embedding and collaborative noise injection reduce the overhead compared with FedFair-DP, the overall training and inference cost is still higher than that of a standard text classifier, which may increase deployment cost in large-scale systems. Finally, our experiments are restricted to single-label text classification; evaluating AMF-DP on other tasks and domains, and under stronger distribution shifts, remains an important direction. In future work, we plan to design lighter-weight variants of AMF-DP, investigate techniques that can operate with weak or unobserved sensitive attributes, and conduct more comprehensive case studies in real-world application scenarios.

Author Contributions

Conceptualization, Z.Y.; Methodology, M.D.; Validation, M.D.; Formal analysis, Z.G.; Resources, Z.G.; Writing—original draft, M.D.; Writing—review and editing, Z.Y. and Y.H.; Supervision, Z.Y., X.W. and H.H.; Project administration, C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62172123) and the Key Research and Development Program of Heilongjiang (Grant No. 2022ZX01A36).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study uses three publicly available datasets. Twitter Sentiment (Blodgett et al., 2016 [37]) contains 200,000 tweets annotated with binary sentiment labels and demographic attributes related to race; it is available at https://github.com/slanglab/twitteraae (accessed on 25 November 2025). Bias in Bios (De-Arteaga et al., 2019 [38]) includes 393,423 biography texts annotated with profession labels (28 categories) and binary gender attributes; it is available at https://github.com/microsoft/biosbias (accessed on 25 November 2025). Adult Income (Kohavi, 1996 [39]) is derived from a subset of the 1994 U.S. Census database and contains 48,842 records, each with 14 features including age, education level, occupation, race, and gender, annotated with a binary label indicating whether annual income exceeds $50,000; it is available at https://archive.ics.uci.edu/ml/datasets/adult (accessed on 25 November 2025). All datasets are publicly accessible and were used in accordance with their respective licenses or terms of use.

Conflicts of Interest

Author Xiaoyu Wang was employed by the company Electric Power Research Institute, State Grid Heilongjiang Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Li, H.; Xu, M.; Song, Y. Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence. arXiv 2023, arXiv:2305.03010. [Google Scholar]
  2. Lehman, E.; Jain, S.; Pichotta, K.; Goldberg, Y.; Wallace, B.C. Does BERT pretrained on clinical notes reveal sensitive data? arXiv 2021, arXiv:2104.07762. [Google Scholar] [CrossRef]
  3. Pan, X.; Zhang, M.; Ji, S.; Yang, M. Privacy risks of general-purpose language models. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–20 May 2020; pp. 1314–1331. [Google Scholar]
  4. Sun, T.; Gaut, A.; Tang, S.; Huang, Y.; Elsherief, M.; Zhao, J.; Mirza, D.; Belding-Royer, E.M.; Chang, K.-W.; Wang, W.Y. Mitigating Gender Bias in Natural Language Processing: Literature Review. arXiv 2019, arXiv:1906.08976. [Google Scholar] [CrossRef]
  5. Fabris, A.; Baranowska, N.N.; Dennis, M.J.; Graus, D.; Hacker, P.; Saldivar, J.; Zuiderveen Borgesius, F.J.; Biega, A.J. Fairness and Bias in Algorithmic Hiring: A Multidisciplinary Survey. ACM Trans. Intell. Syst. Technol. 2023, 16, 1–54. [Google Scholar] [CrossRef]
  6. Cross, J.L.; Choma, M.A.; Onofrey, J.A. Bias in medical AI: Implications for clinical decision-making. PLoS Digit. Health 2024, 3, e0000651. [Google Scholar] [CrossRef]
  7. Phan, N.; Jin, R.; Thai, M.T.; Hu, H.; Dou, D. Preserving Differential Privacy in Adversarial Learning with Provable Robustness. arXiv 2019, arXiv:1903.09822. [Google Scholar]
  8. Panda, A.; Wu, T.; Wang, J.T.; Mittal, P. Privacy-Preserving In-Context Learning for Large Language Models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  9. Bagdasaryan, E.; Poursaeed, O.; Shmatikov, V. Differential Privacy Has Disparate Impact on Model Accuracy. arXiv 2019, arXiv:1905.12101. [Google Scholar] [CrossRef]
  10. Yang, M.; Ding, M.; Qu, Y.; Ni, W.; Smith, D.; Rakotoarivelo, T. Privacy at a Price: Exploring its Dual Impact on AI Fairness. arXiv 2024, arXiv:2404.09391. [Google Scholar] [CrossRef]
  11. Zhang, B.; Lemoine, B.; Mitchell, M. Mitigating Unwanted Biases with Adversarial Learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA, 2–3 February 2018. [Google Scholar]
  12. Roh, Y.; Lee, K.; Whang, S.E.; Suh, C. FairBatch: Batch Selection for Model Fairness. arXiv 2020, arXiv:2012.01696. [Google Scholar]
  13. Lahoti, P.; Beutel, A.; Chen, J.; Lee, K.; Prost, F.; Thain, N.; Wang, X.; Chi, E.H. Fairness without Demographics through Adversarially Reweighted Learning. arXiv 2020, arXiv:2006.13114. [Google Scholar] [CrossRef]
  14. Lyu, L.; Yu, J.; Nandakumar, K.; Li, Y.; Ma, X.; Jin, J.; Yu, H.; Ng, K.S. Towards Fair and Privacy-Preserving Federated Deep Models. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 2524–2541. [Google Scholar] [CrossRef]
  15. Yang, Z.; Ge, Y.; Su, C.; Wang, D.; Zhao, X.; Ying, Y. Fairness-aware Differentially Private Collaborative Filtering. In Proceedings of the WWW ’23 Companion: Companion Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023. [Google Scholar]
  16. Ghoukasian, H.; Asoodeh, S. Differentially Private Fair Binary Classifications. In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 611–616. [Google Scholar]
  17. Staab, R.; Vero, M.; Balunović, M.; Vechev, M. Beyond Memorization: Violating Privacy via Inference with Large Language Models. arXiv 2023, arXiv:2310.07298. [Google Scholar] [CrossRef]
  18. Ghazi, B.; Huang, Y.; Kamath, P.; Kumar, R.; Manurangsi, P.; Sinha, A.; Zhang, C. Sparsity-Preserving Differentially Private Training of Large Embedding Models. arXiv 2023, arXiv:2311.08357. [Google Scholar] [CrossRef]
  19. Smith, V.; Shamsabadi, A.S.; Ashurst, C.; Weller, A. Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey. arXiv 2023, arXiv:2310.01424. [Google Scholar] [CrossRef]
  20. Wang, Y.; Meng, X.; Liu, X. Differentially Private Recurrent Variational Autoencoder for Text Privacy Preservation. Mob. Netw. Appl. 2023, 28, 1565–1580. [Google Scholar] [CrossRef]
  21. Silva, B.C.C.D.; Ferraz, T.P.; Lopes, R.D.D. Enriching GNNs with Text Contextual Representations for Detecting Disinformation Campaigns on Social Media. arXiv 2024, arXiv:2410.19193. [Google Scholar] [CrossRef]
  22. Chen, H.; Zhu, T.; Zhang, T.; Zhou, W.; Yu, P. Privacy and Fairness in Federated Learning: On the Perspective of Tradeoff. ACM Comput. Surv. 2023, 56, 1–37. [Google Scholar] [CrossRef]
  23. Niu, B.; Chen, Y.; Wang, B.; Cao, J.; Li, F. Utility-Aware Exponential Mechanism for Personalized Differential Privacy. In Proceedings of the 2020 IEEE Wireless Communications and Networking Conference (WCNC), Seoul, Republic of Korea, 25–28 May 2020; pp. 1–6. [Google Scholar]
  24. Blodgett, S.L.; Barocas, S.; Daumé, H.; Wallach, H.M. Language (Technology) is Power: A Critical Survey of “Bias” in NLP. arXiv 2020, arXiv:2005.14050. [Google Scholar] [CrossRef]
  25. Ravfogel, S.; Elazar, Y.; Gonen, H.; Twiton, M.; Goldberg, Y. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Seattle, WA, USA, 5–10 July 2020. [Google Scholar]
  26. Sun, C.; Xu, C.; Yao, C.; Liang, S.; Wu, Y.; Liang, D.; Liu, X.; Liu, A. Improving Robust Fairness via Balance Adversarial Training. arXiv 2022, arXiv:2209.07534. [Google Scholar] [CrossRef]
  27. Yao, W.; Zhou, Z.; Li, Z.; Han, B.; Liu, Y. Understanding Fairness Surrogate Functions in Algorithmic Fairness. arXiv 2023, arXiv:2310.11211. [Google Scholar]
  28. Yang, J.; Jiang, J.; Sun, Z.; Chen, J. A Large-Scale Empirical Study on Improving the Fairness of Image Classification Models. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 15–19 July 2024. [Google Scholar]
  29. Ke, J.X.C.; DhakshinaMurthy, A.; George, R.B.; Branco, P. The Effect of Resampling Techniques on the Performances of Machine Learning Clinical Risk Prediction Models in the Setting of Severe Class Imbalance: Development and Internal Validation in a Retrospective Cohort. Discov. Artif. Intell. 2024, 4, 91. [Google Scholar] [CrossRef]
  30. Lin, Y.; Li, D.; Zhao, C.; Wu, X.; Tian, Q.; Shao, M. Supervised Algorithmic Fairness in Distribution Shifts: A Survey. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Jeju, Republic of Korea, 3–9 August 2024. [Google Scholar]
  31. Palakkadavath, R.; Le, H.; Nguyen-Tang, T.; Gupta, S.; Venkatesh, S. Fair Domain Generalization with Heterogeneous Sensitive Attributes Across Domains. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 26 February–6 March 2025; pp. 7389–7398. [Google Scholar]
  32. Maheshwari, G.; Denis, P.; Keller, M.; Bellet, A. Fair NLP Models with Differentially Private Text Encoders. arXiv 2022, arXiv:2205.06135. [Google Scholar] [CrossRef]
  33. Bagdasaryan, E.; Poursaeed, O.; Shmatikov, V. Differential Privacy Has Disparate Impact on Model Accuracy. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019); Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 15479–15488. [Google Scholar]
  34. Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  35. Liang, D.; Krishnan, R.G.; Hoffman, M.D.; Jebara, T. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference (WWW), Lyon, France, 23–27 April 2018; pp. 689–698. [Google Scholar]
  36. Sedhain, S.; Menon, A.K.; Sanner, S.; Xie, L. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW), Florence, Italy, 18–22 May 2015; pp. 111–112. [Google Scholar]
  37. Blodgett, S.L.; Green, L.; O’Connor, B. Demographic Dialectal Variation in Social Media: A Case Study of African-American English. arXiv 2016, arXiv:1608.08868. [Google Scholar] [CrossRef]
  38. De-Arteaga, M.; Romanov, A.; Wallach, H.; Chayes, J.; Borgs, C.; Chouldechova, A.; Geyik, S.; Kenthapadi, K.; Kalai, A.T. Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT), Atlanta, GA, USA, 29–31 January 2019; pp. 120–128. [Google Scholar]
  39. Kohavi, R. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 202–207. [Google Scholar]
  40. Romanov, A.; De-Arteaga, M.; Wallach, H.; Chayes, J.; Borgs, C.; Chouldechova, A.; Geyik, S.; Kenthapadi, K.; Rumshisky, A.; Kalai, A.T. What’s in a Name? Reducing Bias in Bios without Access to Protected Attributes. arXiv 2019, arXiv:1904.05233. [Google Scholar]
  41. Voita, E.; Titov, I. Information-Theoretic Probing with Minimum Description Length. arXiv 2020, arXiv:2003.12298. [Google Scholar] [CrossRef]
  42. Xu, D.; Yuan, S.; Zhang, L.; Wu, X. FairGAN: Fairness-Aware Generative Adversarial Networks. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 570–575. [Google Scholar]
  43. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar]
  44. Hua, J.; Xia, C.; Zhong, S. Differentially Private Matrix Factorization. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
Figure 1. Overall architecture of the proposed AMF-DP framework. The method consists of (i) an adaptive low-rank embedding learning (ALEL) module that compresses high-dimensional text features and weakens explicit sensitive information, (ii) a collaborative noise injection mechanism that allocates DP noise according to feature importance and group disparities, and (iii) an adaptive dual-stream (ADS) encoder that fuses fairness- and privacy-oriented representations for the downstream classifier.
Figure 2. Illustration of the collaborative noise injection module. Feature importance and group disparity are combined to compute attention weights that determine the allocation of Laplace noise under a fixed privacy budget. More important and group-sensitive features receive stronger perturbations to protect sensitive information while controlling the impact on accuracy.
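To make this allocation rule concrete, a minimal sketch is shown below. It assumes that per-feature importance scores and group-disparity scores are already available and combines them with a simple softmax weighting; the function name, the multiplicative combination, and the global sensitivity term are illustrative assumptions rather than the exact attention mechanism used in AMF-DP.

```python
# Minimal sketch of budget-aware Laplace noise allocation (assumptions noted above).
import numpy as np


def allocate_laplace_noise(features, importance, disparity, epsilon=1.0, sensitivity=1.0):
    """Perturb features with Laplace noise whose per-dimension scale grows with a
    weight combining importance and group disparity, under a fixed budget epsilon."""
    scores = importance * disparity                      # high score = important AND group-sensitive
    weights = np.exp(scores) / np.exp(scores).sum()      # attention-style normalization
    scales = sensitivity * weights / epsilon             # larger weight -> stronger perturbation
    noise = np.random.laplace(loc=0.0, scale=scales, size=features.shape)
    return features + noise


# Toy usage: a batch of 4 examples with 8-dimensional (already low-rank) features.
toy_features = np.random.randn(4, 8)
importance_scores = np.abs(np.random.randn(8))           # hypothetical feature-importance scores
disparity_scores = np.abs(np.random.randn(8))            # hypothetical group-disparity scores
noisy = allocate_laplace_noise(toy_features, importance_scores, disparity_scores, epsilon=2.0)
print(noisy.shape)                                       # (4, 8)
```

Under this scheme, the normalized weights decide how strongly each feature dimension is perturbed while the overall perturbation level is governed by the fixed budget epsilon, mirroring the behavior described in the caption.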
Figure 3. Structure of the adaptive dual-stream (ADS) encoder. The fairness-enhancement stream applies a bias-aware attention mechanism to reduce correlations with sensitive attributes, whereas the privacy-preserving stream injects differential privacy noise within a Transformer block. A dynamic gating module adaptively fuses the two streams into the final representation E_fused(x).
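As a compact summary of this fusion step, one plausible form of the gating operation is given below; the concrete parameterization of the gate (a sigmoid over the concatenated stream outputs) is an assumption for illustration rather than the exact formulation in the paper.

\[
\mathbf{g}(x) = \sigma\big(W_g\,[E_{\mathrm{fair}}(x);\, E_{\mathrm{priv}}(x)] + b_g\big), \qquad
E_{\mathrm{fused}}(x) = \mathbf{g}(x) \odot E_{\mathrm{fair}}(x) + \big(1 - \mathbf{g}(x)\big) \odot E_{\mathrm{priv}}(x),
\]
where \(\sigma\) is the element-wise sigmoid, \([\cdot;\cdot]\) denotes concatenation, and \(\odot\) is the Hadamard product.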
Figure 4. Multi-indicator radar charts for each method on different datasets.
Figure 5. Training time comparison between FedFair-DP and AMF-DP on Twitter Sentiment and Bias in Bios datasets.
Figure 6. Scatter plot of accuracy versus fairness gap for each method on the different datasets. Points closer to the lower-right corner correspond to higher accuracy and a smaller fairness gap, so methods in that region achieve a better accuracy-fairness trade-off.
Figure 7. Scatter plot of accuracy versus privacy leakage for each method on the different datasets. Points closer to the lower-right corner correspond to higher accuracy and lower privacy leakage, so methods in that region achieve a better accuracy-privacy trade-off.
Figure 8. Ablation study of AMF-DP on the three datasets. The figure compares the full model with variants without low-rank embedding (w/o Low-Rank), without collaborative noise injection (w/o Noise), and without the dual-stream encoder (w/o ADS) in terms of accuracy, TPR-gap/GRMS, Leakage, and MDL, illustrating the contribution of each core module.
Table 1. Results on three datasets.

| Dataset | Method | Accuracy ↑ | Gap/GRMS ↓ | Leakage ↓ | MDL ↑ |
|---|---|---|---|---|---|
| Twitter Sentiment | STC | 73.13 | 26.63 | 86.73 | 15.77 |
| Twitter Sentiment | FAIR | 72.21 | 11.82 | 85.92 | 16.05 |
| Twitter Sentiment | DP | 69.80 | 28.75 | 61.15 | 23.42 |
| Twitter Sentiment | ADV | 72.85 | 7.21 | 83.44 | 17.63 |
| Twitter Sentiment | PMF | 71.23 | 23.10 | 68.20 | 23.24 |
| Twitter Sentiment | FedFair-DP | 74.57 | 4.47 | 66.56 | 25.05 |
| Twitter Sentiment | AMF-DP | 73.92 | 3.15 | 60.23 | 26.98 |
| Bias in Bios | STC | 79.52 | 17.04 | 78.20 | 173.66 |
| Bias in Bios | FAIR | 78.63 | 11.81 | 77.36 | 180.43 |
| Bias in Bios | DP | 75.30 | 18.84 | 60.32 | 237.55 |
| Bias in Bios | ADV | 79.20 | 9.01 | 68.64 | 190.02 |
| Bias in Bios | PMF | 76.35 | 15.12 | 57.44 | 243.79 |
| Bias in Bios | FedFair-DP | 78.08 | 10.54 | 55.67 | 255.52 |
| Bias in Bios | AMF-DP | 77.43 | 8.14 | 53.40 | 257.43 |
| Adult Income | STC | 85.42 | 17.81 | 84.21 | 19.13 |
| Adult Income | FAIR | 84.97 | 7.33 | 83.12 | 20.42 |
| Adult Income | DP | 82.56 | 8.11 | 69.24 | 28.37 |
| Adult Income | ADV | 82.24 | 5.97 | 78.13 | 21.33 |
| Adult Income | PMF | 83.78 | 7.66 | 67.41 | 29.02 |
| Adult Income | FedFair-DP | 84.11 | 4.23 | 65.90 | 30.25 |
| Adult Income | AMF-DP | 84.74 | 3.18 | 62.36 | 31.48 |
Note: ↑ indicates that higher values are better, while ↓ indicates that lower values are better. All results are averaged over five random seeds, with the maximum and minimum values excluded.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
