Enhancing Fairness Without Demographic Labels via Identifying and Mitigating Potential Biases

Lee, Pilhyeon; Park, Sungho

doi:10.3390/sym18020344


by Pilhyeon Lee ¹ and Sungho Park ¹,²,*

¹ Department of Artificial Intelligence, Inha University, Incheon 22212, Republic of Korea
² Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
* Author to whom correspondence should be addressed.

Symmetry 2026, 18(2), 344; https://doi.org/10.3390/sym18020344

Submission received: 6 January 2026 / Revised: 4 February 2026 / Accepted: 9 February 2026 / Published: 12 February 2026

(This article belongs to the Special Issue Symmetry/Asymmetry in Computer Vision and Artificial Intelligence)


Abstract

Asymmetries in data distributions and performance across subgroups can induce systematic unfairness in real-world systems. Previous studies have substantially improved the fairness of deep learning models; however, most of them require additional labels for sensitive attributes (e.g., ethnicity and gender). Since sensitive attributes often correspond to personal information, collecting such labels can be restricted and may raise privacy concerns. Although recent work has sought to address these issues by training a model without sensitive attribute labels, we point out that it has limitations, as it assumes specific characteristics of sensitive attributes and is validated in simplistic, constrained environments. Therefore, we propose an Unsupervised Fairness-aware Framework (UFF) that trains a fair classification model without pre-defining the characteristics of the sensitive attributes. It includes branches that capture various types of biases and eliminates them through adversarial training. In various scenarios on benchmark datasets for facial attribute classification (i.e., CelebA and UTK Face), the proposed method significantly enhances fairness without assuming specific characteristics of sensitive attributes. Moreover, we introduce g-FAT, a new metric that measures the generalized trade-off between classification accuracy and fairness. For example, on CelebA, our method reduces Equalized Odds (EO) from 11.8 to 7.6 for malignant bias and from 15.6 to 9.6 for benign bias, while improving g-FAT from 80.7 to 84.9 and from 79.0 to 85.2, respectively. In terms of g-FAT, our method achieves the highest trade-off performance among the compared methods on the benchmarks.

Keywords:

bias and fairness in AI; symmetry in AI; FAI; trustworthy AI; bias mitigation

1. Introduction

Deep learning has been widely deployed in real-world application systems, such as face recognition [1,2], object detection [3], and image generation [4,5], due to its tremendous advances. However, recent studies have revealed severe ethical issues behind the convenience [6,7,8]. For example, COMPAS [9], an assistant system for judges used in the United States, predicted the recidivism risk of African-Americans to be higher than that of Caucasians. Therefore, researchers have tried to ensure fairness in terms of sensitive attributes, i.e., properties on the basis of which people should not be discriminated against, aiming to establish performance symmetry across demographic groups [10,11,12,13,14]. Within the broad topic of fairness, we narrow the scope to fairness-aware image classification, which is a popular and fundamental task in computer vision [15,16,17].

A variety of existing studies have tried to mitigate the asymmetry in model predictions (i.e., fairness-aware classification) by leveraging additional information with respect to sensitive attributes. To this end, they should pre-define the attributes for which fairness would be improved and annotate the labels before the training process. Under the supervision of the extra labels, they could significantly improve fairness.

However, such methods have non-negligible limitations. First, annotating sensitive attributes can introduce ethical issues and privacy concerns. Since most sensitive attributes, such as ethnicity and gender, correspond to the personal information of data providers, their collection and use are often strictly restricted by institutional policies or legal regulations (e.g., GDPR [18]). Therefore, the annotation task requires caution, is labor-intensive, and is sometimes forbidden [15,19]. Second, these methods can be impractical in real-world applications, because they only improve fairness for the sensitive attributes that are predefined at training time. For example, a model trained only with gender labels can remain unfair with respect to race or age. Moreover, there is no agreement on which attributes to treat as sensitive, and the choice can vary depending on the environment or user. Whenever the sensitive attribute changes, the model must be retrained, which limits scalability and practicality. These issues increase the need for a new approach that improves fairness in an unsupervised manner.

To this end, recent studies [20,21,22,23,24,25,26,27] have proposed methods to ensure fairness without using sensitive attribute labels. However, a substantial portion of these studies [20,21,23,24,25,26] assume that the sensitive attributes possess specific characteristics, such as texture, color [21,23], malignant bias [20], or a high correlation between the sensitive attributes and the target class [24,25,26]. In addition, several studies [21,22,25,26,27] designed and validated their proposed methods in simplistic and constrained environments, such as tabular datasets or synthesized image datasets. To address these limitations, our objective is to propose a new method that (1) enhances fairness without making any assumptions about the sensitive attributes and (2) can be effectively deployed in a more complex environment, i.e., real-world image datasets.

Specifically, we draw inspiration from [20], which shows that sensitive attributes can be classified into two types: malignant and benign biases, which are, respectively, easier or harder to learn than the target information. Motivated by this, we first compose two modules to capture malignant and benign biases independently, based on the finding that a model tends to learn malignant bias, target information, and benign bias sequentially. While LfF [20] primarily focuses on mitigating malignant bias, our method is designed to handle both malignant and benign biases. To this end, we exploit the model’s learning dynamics to derive bias proxies using only the target labels, enabling fairness-aware training without sensitive attribute labels. In particular, we introduce an additional branch to capture benign bias separately and then suppress both bias proxies in the target classifier via adversarial training. As a result, our unsupervised fairness-aware framework learns more symmetric representations with respect to sensitive attributes [10,15] by leveraging the derived bias proxies.

In the experimental section, we validate our approach on two benchmark datasets for fairness-aware image classification, CelebA [28] and UTK Face [29]. Across various scenarios involving sensitive attributes, our approach significantly improves fairness regardless of attribute type and outperforms state-of-the-art methods. Moreover, we introduce g-FAT, a new metric for measuring generalized trade-offs between classification accuracy and fairness. In terms of g-FAT, ours achieves superior trade-off performances on the benchmarks.

We summarize the main contributions as follows.

We introduce branches that capture the sensitive attribute information without making any assumptions about the specific characteristics.
Based on the capturing branches, we design an Unsupervised Fairness-aware Framework for image classification (UFF) that ensures fairness without sensitive attribute labels.
Through extensive experiments on CelebA, UTK Face, and Cat and Dog [30], we validate that ours significantly improves fairness with respect to various sensitive attributes.
We propose a new metric, namely g-FAT, to measure generalized trade-off performances between classification accuracy and fairness. The proposed method achieves the best trade-off performance in terms of g-FAT.

2. Related Works

2.1. Fairness in Classification Using Sensitive Attributes

To provide context for our setting, we first review fairness-aware classification methods that utilize sensitive attribute labels, as they establish standard fairness metrics and common mechanisms for mitigating group disparities. Many researchers have tried to improve fairness with respect to sensitive attributes in classification tasks. Some studies [17,31,32] supposed the major cause of unfairness is learning the unwanted information for the sensitive attributes. Therefore, they sought to improve fairness by removing sensitive information from the feature representation. To this end, Raff et al. [33] introduced a Gradient Reversal Layer (GRL) between the encoder network and classifier. Subsequently, they adversarially train the encoder network and classifier so that the former cannot predict the sensitive attributes. On the other hand, the authors of [15,17,34] separated the representation into subspaces for the target and sensitive attributes. Sarhan et al. [34] introduced a regularization term to make the subspaces orthogonal to each other, and Creager et al. [17] disentangled them with adversarial training to eliminate their correlation. In the test phase, they only utilized the subspace for the target class to ensure fairness. On top of that, Park et al. [15] introduced an additional subspace including the intersected information between the target and sensitive attributes. It helps reduce unnecessary loss of target information within the subspace of sensitive attributes.

Recently, more diverse approaches to fairness have been proposed. Jung et al. [16] designed a fair knowledge distillation method based on Maximum Mean Discrepancy (MMD), which transfers the informative knowledge of the teacher network to the student network independently of the sensitive attributes. Several works [35,36,37,38] generated a new balanced dataset using a generative adversarial network. Different from the previous work [35,36,38], Ramaswamy et al. [37] utilized a pre-trained GAN and perturbed the latent space to decorrelate the target and sensitive attributes. Park et al. [39] proposed a fair supervised contrastive learning algorithm that pushes the samples with different classes but prevents the samples with different sensitive attributes from being distinguished.

While the previous methods have effectively enhanced fairness, they require additional information about sensitive attributes, which can lead to labeling costs and privacy issues. Therefore, we propose to enhance fairness without any knowledge of sensitive attributes.

2.2. Improving Fairness Without Sensitive Attributes

To address the inefficiency and ethical concerns associated with the annotation and use of sensitive attribute labels, several recent works [20,21,22,23,24,25,26,27] have sought to improve the fairness of models without such labels. The majority of studies [20,21,23,25,26] make assumptions about specific characteristics of sensitive attributes. Specifically, the authors of [21,23] predefined simple biases (e.g., texture and color) and designed models dedicated to them. Furthermore, Nam et al. [20] divided all types of bias into two categories, malignant and benign bias. When a classifier is trained to predict the target class, the malignant bias is easier to learn than the target information, whereas the benign bias is harder to learn than the target information. Since the malignant bias is a major factor in biased classification performance, they proposed a method that identifies and eliminates it. Lastly, some studies [24,25,26] assumed that the sensitive attributes and target classes are sufficiently correlated. Based on this assumption, the authors of [24,25] indirectly inferred the sensitive attribute using the features of a biased classifier, and Lahoti et al. [26] gave more weight to computationally identifiable errors. However, these methods have the drawback of performing well only under specific assumptions about sensitive attributes.

On the other hand, some existing methods were designed for or only validated in simplistic and constrained environments [21,22,25,26,27]. Since their effectiveness has been demonstrated only on tabular or synthesized image datasets, the scalability of these methods to complex real-world environments appears limited.

Unlike previous work, we introduce a method that mitigates unfairness without relying on any assumptions about the sensitive attributes. Furthermore, we demonstrate that the proposed method can effectively work in real-world image classification tasks.

3. Proposed Method

3.1. Preliminary

Before describing our proposed method in detail, we specify the notation as follows.

$x \in \mathbb{R}^{N}$ : $N$-dimensional input data.
$y \in \{0, 1\}$ : binary target label we want to predict.
$s \in \{0, 1\}$ : binary sensitive attribute (unknown in the training phase).
$E(\cdot)$, $C(\cdot)$ : encoder network and classifier in each module.

Given the input data $x$, a classification model $M$ predicts the target label as $\hat{y} = C(E(x))$. The goal is to ensure fairness (i.e., predictive symmetry) with respect to the sensitive attribute $s$ as follows:

$$P(\hat{y} = 1 \mid y = c, s = 0) = P(\hat{y} = 1 \mid y = c, s = 1), \tag{1}$$

where $c \in \{0, 1\}$. This definition of fairness is called Equalized Odds (EO) [40]. We adopt this population-statistics formulation to define and measure fairness, while sensitive attributes are unavailable during training and used only for evaluation. To this end, we propose an unsupervised fairness-aware framework (UFF) for image classification, which consists of two bias-capturing branches, debiasing modules, and a target classification network. As shown in Figure 1, given an input image, the bias-capturing branches extract bias proxies using only the target label. These proxies are then employed as supervision to suppress biased representations in the target classification network through adversarial training, without access to sensitive attribute labels.

3.2. Malignant Bias Capturing Branch

Following the previous work [20], we classify biases into two categories: malignant and benign. When a classification model is trained to predict the target class, the malignant bias is easier to learn than the target information, whereas the benign bias is harder to learn than the target information. Based on this, we design two branches, the malignant-bias-capturing branch (MCB) and the benign-bias-capturing branch (BCB), to capture both types of bias.

First, we utilize the same architecture as [20] for the malignant-bias-capturing branch. It consists of the encoder $E_{m}$ and classifier $C_{m}$, which outputs the softmax prediction $S(C_{m}(E_{m}(x)))$ for the target label $y$. Here, $S$ denotes the softmax function. To ensure the model is sufficiently overfitted to the malignant bias before learning the target information, we utilize the generalized cross-entropy (GCE) [41] loss, which amplifies the weights of easy-to-learn samples as follows:

$$L^{GCE} = \frac{1 - S_{y}(C_{m}(E_{m}(x)))^{q}}{q}, \tag{2}$$

where $q \in (0, 1]$ is a hyperparameter, and $S_{y}$ is the softmax output for the target class $y$. As $q$ increases, the loss places greater weight on the easier samples.
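As a minimal sketch of Eq. (2), the following pure-Python snippet evaluates the GCE loss for a single sample. The function names (`softmax`, `gce_loss`, `gce_grad_weight`) and the example logits are illustrative assumptions, not part of the original implementation. The helper `gce_grad_weight` returns the factor $S_{y}^{q}$ by which the GCE gradient rescales the standard cross-entropy gradient, which is one way to see why confidently classified (easy) samples receive more weight.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gce_loss(logits, y, q=0.7):
    """Generalized cross-entropy of Eq. (2) for one sample with target class y."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p_y = softmax(logits)[y]
    return (1.0 - p_y ** q) / q


def gce_grad_weight(logits, y, q=0.7):
    """Factor p_y**q that multiplies the cross-entropy gradient under GCE."""
    return softmax(logits)[y] ** q


# Hypothetical logits for two samples whose true class is y = 0.
easy = [3.0, 0.0]    # high probability on the true class
hard = [-1.0, 0.0]   # low probability on the true class

print(gce_loss(easy, 0), gce_loss(hard, 0))
print(gce_grad_weight(easy, 0), gce_grad_weight(hard, 0))
```

In this sketch, the easy sample gets a larger gradient weight than the hard one, and the gap widens as `q` grows; as `q` approaches 0 the loss approaches the ordinary cross-entropy.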

3.3. Benign Bias Capturing Branch

To capture the other kind of bias (i.e., benign bias), we design a benign-bias-capturing branch (BCB). To this end, we suppose that a classification network learns target information before the benign bias. In Figure 2, we measure the training accuracy for the target class, malignant bias, and benign bias during the early stage of training. In the initial epoch, the model sufficiently learns the malignant bias and target information, but has not yet properly learned the benign bias. It is worth noting that training accuracy for the benign bias is almost 50%. Based on this observation, we deploy two identical classification networks, i.e., the benign exclusionary (BE) network and the benign dedicated (BD) network, in the BCB.

Benign exclusionary network. It is trained to classify the target labels $y$ with the cross-entropy loss as follows:

$$L^{BE} = -\log S_{y}(C_{be}(E_{be}(x))). \tag{3}$$

Here, the loss encourages the network to learn the malignant bias and target information in the initial stage, as aforementioned. Consequently, since the output of the BE network $\hat{y}_{be}$ primarily includes the knowledge of the target class and malignant bias, we exploit $\hat{y}_{be}$ as weak supervision for the target class and malignant bias.

Benign dedicated network. To capture only the benign bias, the benign dedicated network is adversarially trained not to predict $\hat{y}_{be}$ while accurately predicting the target labels $y$ as follows:

$$L^{BD} = -\log S_{y}(C_{bd}(E_{bd}(x))) + \lambda_{1} \log S_{\hat{y}_{be}}(C_{bd}(E_{bd}(x))), \tag{4}$$

where $S_{\hat{y}_{be}}$ is the softmax output for the argmax class of $\hat{y}_{be}$ and $\lambda_{1}$ is a hyperparameter. By adversarially training the two networks (i.e., BE and BD), we encourage $E_{bd}$, which is the encoder network of BD, to exclude information for the malignant bias and target labels, only including benign bias information.
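The two losses of Eqs. (3) and (4) can be sketched for a single sample as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the networks are replaced by raw logit lists, and the sketch only evaluates loss values (how gradients are routed between BE and BD is not specified in the text and is not modeled here).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def be_loss(logits_be, y):
    """Eq. (3): cross-entropy of the benign exclusionary (BE) network."""
    return -math.log(softmax(logits_be)[y])


def bd_loss(logits_bd, logits_be, y, lambda1=0.3):
    """Eq. (4): BD predicts y while being pushed away from BE's argmax class."""
    probs_bd = softmax(logits_bd)
    # Argmax class of the BE output, used as the class to be "un-predicted".
    y_be = max(range(len(logits_be)), key=lambda k: logits_be[k])
    return -math.log(probs_bd[y]) + lambda1 * math.log(probs_bd[y_be])


# Hypothetical logits for one sample with target label y = 1.
logits_be = [1.5, -0.5]   # BE currently predicts class 0
logits_bd = [0.2, 0.8]
print(be_loss(logits_be, 1), bd_loss(logits_bd, logits_be, 1))
```

Note that when BE's argmax class coincides with $y$, the two terms of Eq. (4) act on the same softmax entry, so the loss reduces to $(1 - \lambda_{1})$ times the cross-entropy in this sketch.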

3.4. Debiasing Module

As described above, the outputs of MCB and BCB capture the malignant bias and benign bias, respectively. Using them to supervise bias, we aim to eliminate all kinds of bias in the target classification network (TC). Specifically, we train the target classification network to avoid predicting the outputs of MCB and BCB based on the previous work [10] as follows.

$$L^{TC} = -\log S_{y}(C_{tc}(E_{tc}(x))) + \lambda_{2} \log S_{\hat{y}_{m}}(C_{tc}(E_{tc}(x))) + \lambda_{3} \log S_{\hat{y}_{bs}}(C_{tc}(E_{tc}(x))), \tag{5}$$

where $\hat{y}_{m}$ and $\hat{y}_{bs}$ are the outputs of MCB and BCB, and $\lambda_{2}$ and $\lambda_{3}$ are hyperparameters. This ensures fairness across all potential sensitive attributes, where sensitive attributes correspond to benign or malignant bias.

Ultimately, the overall framework is trained in an end-to-end manner with the following total loss:

$$L^{Total} = L^{GCE} + \alpha L^{BE} + \beta L^{BD} + \gamma L^{TC}, \tag{6}$$

where $\alpha$, $\beta$, and $\gamma$ are hyperparameters for balancing the losses.
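To make Eqs. (5) and (6) concrete, the sketch below evaluates the target-classifier loss and the weighted total loss for one sample. It assumes, by analogy with Eq. (4), that the MCB and BCB outputs enter Eq. (5) through their argmax class indices (`y_m`, `y_bs`); this reading, the function names, and the example values are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tc_loss(logits_tc, y, y_m, y_bs, lambda2=0.3, lambda3=0.3):
    """Eq. (5): predict y while suppressing the MCB and BCB proxy classes.

    y_m and y_bs are assumed to be the argmax classes of the MCB and BCB outputs.
    """
    p = softmax(logits_tc)
    return (-math.log(p[y])
            + lambda2 * math.log(p[y_m])
            + lambda3 * math.log(p[y_bs]))


def total_loss(l_gce, l_be, l_bd, l_tc, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (6): weighted sum of the four branch losses."""
    return l_gce + alpha * l_be + beta * l_bd + gamma * l_tc


# Hypothetical per-sample values.
l_tc = tc_loss([0.4, -0.1], y=0, y_m=1, y_bs=1)
print(total_loss(l_gce=0.2, l_be=0.5, l_bd=0.4, l_tc=l_tc))
```

With $\alpha = \beta = \gamma = 1$, the total loss is a plain sum of the four terms.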

4. Experiment

In this section, we validate the proposed method on three datasets: CelebA [28], UTK Face [29], and Cat and Dog [30]. To demonstrate that our method can ensure fairness across various sensitive attributes, we conduct extensive experiments across diverse scenarios. We include LfF [20] and BPA [24] as comparison methods, which are widely used baselines for mitigating demographic bias without requiring explicit group labels. In all the experiments, ours achieves a significant improvement in fairness without any knowledge of the sensitive attributes.

4.1. Dataset

CelebA [28]: It consists of about 200 k face images with 40 binary attributes. Among them, we set Male and Young as the sensitive attributes and the others as the target classes. As an exception, we exclude three target attributes (5-o-Clock Shadow, Goatee, and Mustache) because the test set contains no data for some groups, e.g., no data exists for women with mustaches. For all experiments, we follow the original composition of the training, validation, and test sets.
UTK Face [29]: It contains about 20 k face images with three attributes: Gender, Race, and Age. Following the previous work [15,39], we first reform Race and Age into binary attributes as follows. Race—Caucasian or not; Age—over 35 or not. Next, we compose two subsets for training (i.e., Sub1 and Sub2) with the different correlations between the attributes.
‒ Sub1: The target attribute “Age” has a high correlation with the sensitive attribute “Gender”.
‒ Sub2: The target attribute “Race” has a high correlation with the sensitive attribute “Gender”.
For example, according to the bias intensity $η$ , we set male data to be $η$ times higher than female data for the young, and vice versa for the old in Sub1; that is, the male data are correlated with the old in this subset. Similarly, we set male data to be $η$ times the female data for Caucasians, and vice versa for the other races in Sub2. To validate the effectiveness of the proposed method in various levels of data imbalance, we set $η$ to 2, 3, and 4. Unlike the training set, we compose the validation and test sets to be fully balanced.
Cat and Dog [30]: It is a relatively small-scale dataset, including about 40 k dog and cat images. Following the setting of previous work [42], we set the species as the target attribute and color (i.e., bright or dark) as the bias. To establish a correlation between them, we set the white dogs to be five times as many as the black dogs, and the black cats to be five times as many as the white cats. As with UTK Face, we set the validation and test sets to be fully balanced. Although it is not common to regard color bias as a sensitive attribute, we design this experiment to show that our method can improve fairness with respect to not only facial sensitive attributes but also various kinds of bias.
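The biased training subsets described above can be built by subsampling so that, within each target class, the bias-aligned group is a fixed multiple of the bias-conflicting group. The following sketch shows one way to do this; the function name `biased_subset`, the label encoding (for Cat and Dog: target 1 = dog, 0 = cat; bias 1 = bright, 0 = dark, so that aligned means `bias == target`), and the random subsampling strategy are assumptions, since the text does not describe the exact sampling procedure.

```python
import random


def biased_subset(labels, eta, seed=0):
    """Select indices so aligned groups are eta times the conflicting groups.

    labels: list of (target, bias) pairs with binary values.
    eta:    integer bias intensity (e.g., 2, 3, 4 for UTK Face, 5 for Cat and Dog).
    """
    rng = random.Random(seed)
    groups = {}
    for i, (t, b) in enumerate(labels):
        groups.setdefault((t, b), []).append(i)

    chosen = []
    for t in (0, 1):
        aligned = groups.get((t, t), [])          # bias agrees with target
        conflicting = groups.get((t, 1 - t), [])  # bias disagrees with target
        # Largest conflicting count that still allows an exact eta:1 ratio.
        n_conf = min(len(conflicting), len(aligned) // eta)
        chosen += rng.sample(aligned, eta * n_conf)
        chosen += rng.sample(conflicting, n_conf)
    return sorted(chosen)


# Toy pool with 100 samples in each (target, bias) cell.
pool = [(t, b) for t in (0, 1) for b in (0, 1) for _ in range(100)]
subset = biased_subset(pool, eta=5)
print(len(subset))
```

The validation and test sets are described as fully balanced, so a sketch like this would apply to the training split only.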

4.2. Implementation Detail

For all compared models, we use ResNet-18 [43] as the encoder and a single-layer Multi-Layer Perceptron (MLP) as the classifier. We use the Adam optimizer [44] with a learning rate of $1 \times 10^{-5}$. We resize input images to 128 × 128 for CelebA and UTK Face, and to 224 × 224 for Cat and Dog. These resolutions were chosen to standardize the input size across experiments and to keep the computational and memory costs reasonable for fair comparison. We train each model until it sufficiently converges (i.e., 10 epochs for CelebA, and 50 epochs for UTK Face and Cat and Dog). All results are averaged over three independent runs. For LfF [20], the original version shows poor classification performance on CelebA and UTK Face because the pre-trained encoder network was not trained on the target dataset (e.g., CelebA). Therefore, we fine-tune its encoder network on the target dataset, which yields comparable performance. For BPA [24], we set the number of clusters $k$ to 4 since the performance saturates from $k = 4$. For ours, we first fix the hyperparameters for balancing losses between the classification branches (i.e., $\alpha$, $\beta$, $\gamma$) to 1 for simplicity. We then find the other hyperparameters by jointly searching $\{\lambda_{1}, \lambda_{2}, \lambda_{3}\}$ from 0.1 to 1.0. As a result, we set $\lambda_{1} = \lambda_{2} = \lambda_{3} = 0.3$ for CelebA and UTK Face, and 0.1 for Cat and Dog. For both LfF and ours, we set $q = 0.7$ in GCE. We note that ours achieves the best trade-off performance even with such simplified hyperparameter tuning.
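For reference, the settings reported in this subsection can be collected into a single configuration object. The dictionary layout, key names, and the helper `lambdas_for` below are hypothetical; only the values are taken from the text.

```python
# Hypothetical configuration layout; values follow the implementation details above.
CONFIG = {
    "encoder": "ResNet-18",
    "classifier": "single-layer MLP",
    "optimizer": "Adam",
    "learning_rate": 1e-5,
    "image_size": {"CelebA": 128, "UTK Face": 128, "Cat and Dog": 224},
    "epochs": {"CelebA": 10, "UTK Face": 50, "Cat and Dog": 50},
    "num_runs": 3,                       # results are averaged over three runs
    "loss_weights": {"alpha": 1.0, "beta": 1.0, "gamma": 1.0},
    "lambdas": {"CelebA": 0.3, "UTK Face": 0.3, "Cat and Dog": 0.1},
    "gce_q": 0.7,
    "bpa_num_clusters": 4,               # used only for the BPA comparison method
}


def lambdas_for(dataset):
    """Return (lambda_1, lambda_2, lambda_3) for a dataset; all three share one value."""
    value = CONFIG["lambdas"][dataset]
    return (value, value, value)


print(lambdas_for("CelebA"))
```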

4.3. Analysis of Attributes of CelebA

To verify the effectiveness of the proposed method in addressing malignant and benign biases, we first analyze the bias types of sensitive attributes (i.e., Male and Young) across all target attributes. Following the previous work [20], we observe whether the training loss of bias-aligned samples is in contrast to that of bias-conflicting samples in the early training stage. If the sensitive attribute is the malignant bias, the trends of the losses are contrasted, i.e., the loss of the bias-conflicting samples is increased, and that of the bias-aligned samples is decreased; otherwise, the trends are similar to each other. Based on these observations, we categorize the bias types of target attributes as shown in Table 1. For a more reliable evaluation, we mark some attributes with inconsistent or unclear tendencies as “confused”.
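The categorization rule described above can be sketched as a small function that compares the early-training loss trends of bias-aligned and bias-conflicting samples. This is only an illustrative reading of the procedure: measuring a trend as the net change between the first and last recorded loss, the tolerance `tol`, and the function names are assumptions, and the actual analysis may rely on visual inspection of the loss curves.

```python
def early_trend(losses):
    """Net change of a loss curve over the recorded early-training steps."""
    return losses[-1] - losses[0]


def bias_type(aligned_losses, conflicting_losses, tol=0.0):
    """Label a sensitive attribute as malignant, benign, or confused.

    malignant: aligned loss decreases while conflicting loss increases.
    benign:    both losses move in the same direction.
    confused:  anything else (unclear or inconsistent tendency).
    """
    d_aligned = early_trend(aligned_losses)
    d_conflict = early_trend(conflicting_losses)
    if d_aligned < -tol and d_conflict > tol:
        return "malignant"
    same_down = d_aligned < -tol and d_conflict < -tol
    same_up = d_aligned > tol and d_conflict > tol
    if same_down or same_up:
        return "benign"
    return "confused"


# Hypothetical loss curves recorded during the first training steps.
print(bias_type([1.0, 0.6, 0.4], [1.0, 1.3, 1.6]))
print(bias_type([1.0, 0.7, 0.5], [1.0, 0.8, 0.6]))
```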

4.4. Evaluation Metric

In the experiments, we evaluate the performance of the compared models from two perspectives: fairness and classification accuracy. In general, it is known that there is a trade-off between fairness and classification accuracy [12,39], and thus the goal is to improve fairness while maintaining accuracy as much as possible. Among a variety of metrics for fairness, such as Demographic Parity (DP) [40], Equal Opportunity (EOPP) [40], and Equalized Odds (EO) [40], we adopt EO, which is the average disparity of the true positive rate and false positive rate between different sensitive groups [40]. EO is particularly suitable for our setting since it evaluates group disparities in both positive and negative outcomes, thereby capturing overall error asymmetry across sensitive groups. In contrast, other metrics, such as DP and EOPP, may be satisfied by skewed prediction behaviors (e.g., indiscriminately increasing positive predictions) and can overlook unfairness in negative outcomes [40]. For classification accuracy, we measure the balanced accuracy (BAcc.), which is the average accuracy over all sensitive groups. We use BAcc. rather than overall accuracy, since overall accuracy can be dominated by majority groups under imbalanced test distributions, masking poor performance on underrepresented groups. BAcc. reflects not only the classification ability of a model but also its robustness to distribution shifts in the test environment.
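As a rough sketch of the two metrics, the snippet below computes EO as the mean of the absolute true-positive-rate and false-positive-rate gaps between the two sensitive groups, and BAcc. as the mean of the per-group accuracies. Several details are assumptions: the percentage scale (suggested by the reported values), the use of groups defined by $s$ alone for BAcc. (groups defined by each $(y, s)$ pair would be another plausible reading), and all function names.

```python
def positive_rate(y_true, y_pred, s, group, label):
    """P(pred = 1 | y = label, s = group), estimated from binary lists."""
    preds = [p for t, p, g in zip(y_true, y_pred, s) if t == label and g == group]
    if not preds:
        raise ValueError("no samples for this (label, group) cell")
    return sum(preds) / len(preds)


def equalized_odds_gap(y_true, y_pred, s):
    """Average of the TPR gap and FPR gap between s = 0 and s = 1, in percent."""
    tpr_gap = abs(positive_rate(y_true, y_pred, s, 0, 1)
                  - positive_rate(y_true, y_pred, s, 1, 1))
    fpr_gap = abs(positive_rate(y_true, y_pred, s, 0, 0)
                  - positive_rate(y_true, y_pred, s, 1, 0))
    return 100.0 * (tpr_gap + fpr_gap) / 2.0


def balanced_accuracy(y_true, y_pred, s):
    """Mean of per-group accuracies over s = 0 and s = 1, in percent."""
    accs = []
    for group in (0, 1):
        hits = [int(t == p) for t, p, g in zip(y_true, y_pred, s) if g == group]
        accs.append(sum(hits) / len(hits))
    return 100.0 * sum(accs) / len(accs)


# Toy example with eight samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
s      = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(equalized_odds_gap(y_true, y_pred, s), balanced_accuracy(y_true, y_pred, s))
```

In the toy example, the classifier is perfect on group $s = 0$ and half right on group $s = 1$, so both the TPR gap and the FPR gap are 0.5.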

4.5. g-FAT: Generalized Fairness-Accuracy Trade-Off Metric

Moreover, we introduce a generalized version of FAT [45], called g-FAT, to evaluate the generalized trade-off performance between classification accuracy and fairness. While FAT measures trade-off performance using the harmonic mean of accuracy and fairness, a subjective decision is required about the importance ratio $\alpha$ between them. To overcome this limitation, we define g-FAT by integrating FAT values over $\alpha$ as follows:

$$g\text{-}FAT = \int \frac{1}{\alpha \left( \frac{1}{100 - f_{p}} \right) + (1 - \alpha) \frac{1}{BAcc.}} \, d\alpha, \tag{7}$$

where $f_{p}$ is a metric for fairness, i.e., equalized odds. For measurement convenience, we utilize the following approximation:

$$g\text{-}FAT \approx \frac{1}{|A|} \sum_{\alpha \in A} \frac{1}{\alpha \left( \frac{1}{100 - f_{p}} \right) + (1 - \alpha) \frac{1}{BAcc.}}, \tag{8}$$

where $A = \{0.1, 0.2, \dots, 1.0\}$.
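The discrete approximation in Eq. (8) is straightforward to compute. The sketch below assumes that BAcc. and the fairness metric $f_{p}$ (EO) are both given in percent, as the term $100 - f_{p}$ suggests; the function names `fat` and `g_fat` and the example inputs are illustrative.

```python
def fat(bacc, eo, alpha):
    """Weighted harmonic mean of fairness (100 - eo) and accuracy for one alpha."""
    return 1.0 / (alpha / (100.0 - eo) + (1.0 - alpha) / bacc)


def g_fat(bacc, eo, alphas=None):
    """Eq. (8): average of FAT over the grid A = {0.1, 0.2, ..., 1.0}."""
    if alphas is None:
        alphas = [k / 10 for k in range(1, 11)]
    return sum(fat(bacc, eo, a) for a in alphas) / len(alphas)


# Hypothetical inputs: balanced accuracy of 85% and an EO gap of 10 points.
print(g_fat(bacc=85.0, eo=10.0))
```

Because each term is a weighted harmonic mean, the score always lies between BAcc. and $100 - f_{p}$, and it equals both when they coincide. Note also that the grid $A$ includes $\alpha = 1.0$ but not $\alpha = 0$, so the fairness-only endpoint is sampled while the accuracy-only endpoint is not.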

g-FAT offers several advantages. First, compared to the original FAT, which requires selecting a single importance ratio $\alpha$, g-FAT aggregates the trade-off scores over a range of $\alpha$ values, thereby reducing sensitivity to any particular choice of $\alpha$ and providing a more robust summary of the fairness–accuracy trade-off. Second, g-FAT summarizes the fairness–accuracy trade-off into a single scalar score, enabling convenient comparison across different methods. Third, because of its harmonic-mean formulation, g-FAT assigns lower scores to methods that improve one objective at the expense of the other, thereby favoring balanced improvements in both fairness and accuracy.

We also note several limitations. First, because distinct pairs of balanced accuracy and equalized odds can yield similar g-FAT values, g-FAT should be interpreted alongside the underlying metrics. Second, g-FAT depends on the choice of the fairness metric $f_{p}$, and its absolute value may change under alternative fairness criteria. Finally, we approximate the integral using a discrete set $A$. While the score may vary slightly with different samplings of $\alpha$, we use a fixed $A$ to ensure consistent evaluation across all methods.

4.6. Classification Results on CelebA

For all target attributes in CelebA, we measure balanced accuracy (BAcc.), equalized odds (EO), and generalized FAT (g-FAT) with respect to the two sensitive attributes (i.e., Male and Young). In Table 2, we first summarize the results by sensitive attribute. As demonstrated in the previous work [20], LfF achieves a significant improvement of balanced accuracy in all the settings. However, compared with enhancing EO for malignant bias, it fails to substantially improve EO for benign bias. Conversely, BPA [24] shows limited progress in improving fairness against malignant bias. The proposed method significantly outperforms them in terms of EO while also demonstrating a substantial improvement in BAcc. compared to the baseline. As a result, ours achieves the best trade-off performance in terms of g-FAT. It is noted that our method is the only approach among all comparison methods that enhances fairness across all types of bias. In addition, we report all classification results for two sensitive attributes, Male and Young, respectively, in Table 3 and Table 4. We note that the type and strength of bias can vary substantially across target attributes, which may lead to attribute-wise performance variations across methods. Accordingly, while some comparison methods can outperform ours on specific attributes, our method is designed to jointly handle both malignant and benign biases, resulting in the best average EO and g-FAT over all attributes.

4.7. Classification Results on UTK Face

In Table 5 and Table 6, we report BAcc., EO, and g-FAT according to the intensity of data imbalance $\eta$ for Sub1 and Sub2, respectively. In Table 5, although LfF [20] and BPA [24] improve both BAcc. and EO over the baseline, the improvements are not significant. Meanwhile, ours substantially improves EO over the baseline across all the environments while maintaining balanced accuracy, resulting in superior g-FAT scores. Moreover, the improvement in fairness becomes larger as $\eta$ increases. Overall, Table 6 exhibits a similar trend to Table 5, but all the compared methods notably improve BAcc. as well as EO.

4.8. Ablation Study on Benign Bias

In contrast to the previous study that inspired our work (i.e., LfF [20]), our method is designed to improve fairness not only for malignant bias but also for the other kind of bias (i.e., benign bias). Moreover, we claim that the benign-bias-capturing branch (BCB) is the more crucial component for improving fairness with respect to benign bias. To demonstrate this, we validate the effectiveness of the capturing branches (i.e., BCB and MCB) for mitigating benign bias on CelebA. Specifically, we set Arched-Eyebrows as the target attribute and Male, which is a benign bias for it, as the sensitive attribute, and ablate each branch from the proposed method. In Table 7, ours without BCB aggravates EO relative to the baseline despite an increase in balanced accuracy, resulting in a degradation of g-FAT. This indicates that MCB alone contributes little to fairness for benign bias. Meanwhile, ours without MCB achieves a level of fairness comparable to the complete model, demonstrating that BCB predominantly contributes to enhancing fairness for benign bias.

4.9. Extension to General Bias

To verify that our method can robustly handle general data asymmetry (i.e., color bias) as well as facial sensitive attributes, we further conduct experiments on Cat and Dog [30]. Because the training dataset is severely imbalanced, the baseline achieves highly unfair classification performance. Table 8 shows that all compared methods significantly improve balanced accuracy and fairness over the baseline. Among them, ours shows the best trade-off between the two, indicating the effectiveness of the proposed method for general bias.

5. Conclusions

In this paper, we proposed an unsupervised fairness-aware framework for image classification. Unlike previous work, the proposed method improves fairness regardless of the type of sensitive attribute. To this end, we first demonstrated that a classification network learns target information before benign bias. Based on this observation, we designed a benign-bias-capturing branch. By integrating it with the malignant-bias-capturing branch, we can capture diverse types of biases (i.e., malignant and benign biases). Subsequently, we mitigated these biases through adversarial training to learn symmetric representations with respect to undisclosed sensitive attributes. In the experiments, ours shows consistent improvements in the fairness–accuracy trade-off across multiple datasets. Specifically, on the CelebA dataset, ours reduced Equalized Odds (EO) for malignant bias from 11.8 to 7.6 and for benign bias from 15.6 to 9.6 compared to the baseline model. Furthermore, for the newly proposed g-FAT metric, our method achieved the highest score of 85.2 among the compared unsupervised methods. Moreover, we show that our method can handle not only facial sensitive attributes but also other types of data asymmetry; on the color-biased Cat and Dog dataset, our method improved balanced accuracy from 79.9 to 87.1 compared with the baseline model. These results suggest that ours achieves a favorable trade-off between fairness and accuracy without requiring any sensitive attribute labels.

Finally, we discuss the limitations of our work and outline directions for future research. First, exploring how to further reduce the performance gap between unsupervised approaches and fully supervised methods that leverage sensitive attribute labels remains an important direction for future work. Second, our study primarily focuses on binary target labels and a binary sensitive attribute; extending the framework to multi-class targets and intersectional sensitive attributes is a promising avenue. Third, while adversarial optimization is effective, improving its training stability and computational efficiency remains an important topic for future study.

Author Contributions

Conceptualization, S.P.; methodology, S.P. and P.L.; investigation, S.P. and P.L.; software, S.P.; writing—original draft preparation, S.P.; writing—review and editing, S.P. and P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study were derived from the following publicly available sources: CelebA (Liu et al. [28], available online: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, accessed on 8 February 2026), UTKFace (Zhang et al. [29], available online: https://susanqq.github.io/UTKFace/, accessed on 8 February 2026), and the Cat and Dog dataset (Kaggle [30], available online: https://www.kaggle.com/c/dogs-vs-cats, accessed on 8 February 2026).

Acknowledgments

This work was supported by Incheon National University Research Grant in 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhao, J.; Cheng, Y.; Xu, Y.; Xiong, L.; Li, J.; Zhao, F.; Jayashree, K.; Pranata, S.; Shen, S.; Xing, J.; et al. Towards Pose Invariant Face Recognition in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
2. Tran, L.; Yin, X.; Liu, X. Disentangled Representation Learning GAN for Pose-Invariant Face Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 1283–1292.
3. Wang, J.; Song, L.; Li, Z.; Sun, H.; Sun, J.; Zheng, N. End-to-End Object Detection With Fully Convolutional Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2021; pp. 15849–15858.
4. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems; IEEE: Piscataway, NJ, USA, 2014; pp. 2672–2680.
5. Kaneko, T.; Harada, T. Blur, Noise, and Compression Robust Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2021; pp. 13579–13589.
6. Dougherty, C. Google Photos Mistakenly Labels Black People ‘Gorillas’. The New York Times (Bits Blog), 1 July 2015. Available online: https://archive.nytimes.com/bits.blogs.nytimes.com/2015/07/01/google-photos-mistakenly-labels-black-people-gorillas/ (accessed on 8 February 2026).
7. Lomas, N. FaceApp Apologizes for Building a Racist AI; TechCrunch: Bay Area, CA, USA, 2018.
8. Gong, S.; Liu, X.; Jain, A. Jointly De-biasing Face Recognition and Demographic Attribute Estimation. In Proceedings of the European Conference on Computer Vision, Virtual, 23–28 August 2020.
9. Angwin, J.; Larson, J.; Mattu, S.; Kirchner, L. Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks; ProPublica: New York, NY, USA, 2016.
10. Zhang, B.H.; Lemoine, B.; Mitchell, M. Mitigating Unwanted Biases with Adversarial Learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society; AIES ’18; ACM: New York, NY, USA, 2018; pp. 335–340.
11. Amini, A.; Soleimany, A.P.; Schwarting, W.; Bhatia, S.N.; Rus, D. Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society; AIES ’19; ACM: New York, NY, USA, 2019; pp. 289–295.
12. Wang, T.; Zhao, J.; Yatskar, M.; Chang, K.W.; Ordonez, V. Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2019; pp. 5309–5318.
13. Burns, K.; Hendricks, L.A.; Darrell, T.; Rohrbach, A. Women also Snowboard: Overcoming Bias in Captioning Models. In Proceedings of the ECCV; ACM: New York, NY, USA, 2018.
14. Bolukbasi, T.; Chang, K.W.; Zou, J.; Saligrama, V.; Kalai, A. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems; NIPS’16; NIPS: Red Hook, NY, USA, 2016; pp. 4356–4364.
15. Park, S.; Hwang, S.; Kim, D.; Byun, H. Learning Disentangled Representation for Fair Facial Attribute Classification via Fairness-aware Information Alignment. In Proceedings of the AAAI-2021; AAAI Press: Menlo Park, CA, USA, 2021.
16. Jung, S.; Lee, D.; Park, T.; Moon, T. Fair Feature Distillation for Visual Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2021; pp. 12115–12124.
17. Creager, E.; Madras, D.; Jacobsen, J.H.; Weis, M.; Swersky, K.; Pitassi, T.; Zemel, R. Flexibly Fair Representation Learning by Disentanglement. arXiv 2019, arXiv:1906.02589.
18. European Parliament and the Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Off. J. Eur. Union 2016, 119, 1–88.
19. Jung, S.W.; Chun, S.; Moon, T. Learning Fair Classifiers with Partially Annotated Group Labels. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 10338–10347.
20. Nam, J.; Cha, H.; Ahn, S.; Lee, J.; Shin, J. Learning from Failure: De-biasing Classifier from Biased Classifier. In Proceedings of the Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Nice, France, 2020; Volume 33, pp. 20673–20684.
21. Bahng, H.; Chun, S.; Yun, S.; Choo, J.; Oh, S.J. Learning De-biased Representations with Biased Representations. arXiv 2020, arXiv:1910.02806.
22. Hashimoto, T.B.; Srivastava, M.; Namkoong, H.; Liang, P. Fairness Without Demographics in Repeated Loss Minimization. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; Dy, J.G., Krause, A., Eds.; PMLR: Cambridge, MA, USA, 2018; Volume 80, pp. 1934–1943.
23. Hong, Y.; Yang, E. Unbiased Classification through Bias-Contrastive and Bias-Balanced Learning. In Proceedings of the Advances in Neural Information Processing Systems; Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; NIPS: Grenada, Spain, 2021.
24. Seo, S.; Lee, J.Y.; Han, B. Unsupervised Learning of Debiased Representations With Pseudo-Attributes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 16742–16751.
25. Zhao, T.; Dai, E.; Shu, K.; Wang, S. Towards Fair Classifiers Without Sensitive Attributes: Exploring Biases in Related Features. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, New York, NY, USA, 2022; WSDM ’22; IEEE: Piscataway, NJ, USA, 2022; pp. 1433–1442.
26. Lahoti, P.; Beutel, A.; Chen, J.; Lee, K.; Prost, F.; Thain, N.; Wang, X.; Chi, E.H. Fairness without Demographics through Adversarially Reweighted Learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2020; NIPS’20; IEEE: Piscataway, NJ, USA, 2020.
27. Grari, V.; Lamprier, S.; Detyniecki, M. Fairness without the Sensitive Attribute via Causal Variational Autoencoder. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22; Raedt, L.D., Ed.; International Joint Conferences on Artificial Intelligence Organization: Sydney, Australia, 2022; pp. 696–702.
28. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep Learning Face Attributes in the Wild. In Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
29. Zhang, Z.; Song, Y.; Qi, H. Age Progression/Regression by Conditional Adversarial Autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
30. Kaggle. Dogs vs. Cats. 2013. Available online: https://www.kaggle.com/c/dogs-vs-cats (accessed on 8 February 2026).
31. Hwang, S.; Park, S.; Lee, P.; Jeon, S.; Kim, D.; Byun, H. Exploiting Transferable Knowledge for Fairness-aware Image Classification. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan, 30 November–4 December 2020.
32. Locatello, F.; Abbati, G.; Rainforth, T.; Bauer, S.; Schölkopf, B.; Bachem, O. On the Fairness of Disentangled Representations. arXiv 2019, arXiv:1905.13662.
33. Raff, E.; Sylvester, J. Gradient Reversal against Discrimination: A Fair Neural Network Learning Approach. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); IEEE: Piscataway, NJ, USA, 2018; pp. 189–198.
34. Sarhan, M.H.; Navab, N.; Eslami, A.; Albarqouni, S. Fairness by Learning Orthogonal Disentangled Representations. In Proceedings of the Computer Vision—ECCV 2020—16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIX Lecture Notes in Computer Science; Vedaldi, A., Bischof, H., Brox, T., Frahm, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12374, pp. 746–761.
35. Xu, D.; Yuan, S.; Zhang, L.; Wu, X. FairGAN: Fairness-aware Generative Adversarial Networks. In 2018 IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2018; pp. 570–575.
36. Sattigeri, P.; Hoffman, S.C.; Chenthamarakshan, V.; Varshney, K.R. Fairness GAN: Generating datasets with fairness properties using a generative adversarial network. IBM J. Res. Dev. 2019, 63, 3:1–3:9.
37. Ramaswamy, V.V.; Kim, S.S.Y.; Russakovsky, O. Fair Attribute Classification Through Latent Space De-Biasing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2021; pp. 9301–9310.
38. Hwang, S.; Park, S.; Kim, D.; Do, M.; Byun, H. FairFaceGAN: Fairness-aware Facial Image-to-Image Translation. In Proceedings of the BMVC 2020, Virtual Event, 7–10 September 2020; Volume 2020.
39. Park, S.; Lee, J.; Lee, P.; Hwang, S.; Kim, D.; Byun, H. Fair Contrastive Learning for Facial Attribute Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 10389–10398.
40. Hardt, M.; Price, E.; Srebro, N. Equality of Opportunity in Supervised Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2016; NIPS’16; ACM: New York, NY, USA, 2016; pp. 3323–3331.
41. Zhang, Z.; Sabuncu, M.R. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2018; NIPS’18; ACM: New York, NY, USA, 2018; pp. 8792–8802.
42. Kim, B.; Kim, H.; Kim, K.; Kim, S.; Kim, J. Learning Not to Learn: Training Deep Neural Networks With Biased Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
44. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; Conference Track Proceedings; Bengio, Y., LeCun, Y., Eds.; INSPIRE Consultancy Ltd.: London, UK, 2015.
45. Kim, D.; Park, S.; Hwang, S.; Byun, H. Fair classification by loss balancing via fairness-aware batch sampling. Neurocomputing 2023, 518, 231–241.

Figure 1. The overall flow of the proposed method. It consists of a malignant-bias capturing branch (a), a benign-bias capturing branch (b), and a target classification network (c). $L^{CE}$ and $L^{GCE}$ represent the standard cross-entropy and generalized cross-entropy, respectively. Dashed lines denote the backward pass of adversarial objectives. The red dashed lines suppress malignant bias in the TC (a) and discourage the BCB from encoding target information and malignant bias (b), while the purple dashed line suppresses benign bias in the TC (c).
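As a point of reference for the two losses named in the Figure 1 caption, the following minimal sketch compares the standard cross-entropy with the generalized cross-entropy of Zhang and Sabuncu, (1 − p^q)/q, for the probability p assigned to the true class. The value q = 0.7 and the function names are illustrative assumptions; the setting actually used in the proposed framework is not stated in this section.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Standard cross-entropy for the probability assigned to the true class."""
    return -math.log(p_true)


def generalized_cross_entropy(p_true: float, q: float = 0.7) -> float:
    """Generalized cross-entropy (1 - p^q) / q with q in (0, 1].

    q = 0.7 is an illustrative default, not a value taken from this paper.
    As q approaches 0 the loss approaches the standard cross-entropy;
    at q = 1 it reduces to 1 - p.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - p_true ** q) / q


# Compare both losses on a confident, an uncertain, and a wrong prediction.
for p in (0.9, 0.5, 0.1):
    print(f"p={p}: CE={cross_entropy(p):.3f}, GCE={generalized_cross_entropy(p):.3f}")
```

Because the generalized loss is bounded by 1/q, poorly fit samples contribute less than under the standard cross-entropy, so training emphasizes samples the network already fits well.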

Figure 2. Classification accuracy in the early training stage. The target class, malignant bias, and benign bias are respectively set to Big-Nose, Young, and Male attributes on CelebA. Epochs 1–3 correspond to the first three full passes over the training set. We show these initial epochs (out of 10) to illustrate early learning dynamics, where malignant bias and target information emerge earlier than benign bias.

Table 1. Analysis of the bias type of sensitive attributes for all target attributes on CelebA. We first exclude three target attributes and then categorize Male and Young as malignant or benign bias for each of the remaining target attributes. If the bias type is not clear, it is marked “Confused” to ensure reliable evaluation.

Attribute	Male			Young			Excluded
Attribute	Malignant	Benign	Confused	Malignant	Benign	Confused
5-o-Clock-Shadow							✓
Arched-Eyebrows		✓		✓
Attractive	✓			✓
Bags-Under-Eyes		✓		✓
Bald		✓		✓
Bangs		✓		✓
Big-Lips		✓		✓
Big-Nose		✓		✓
Black-Hair		✓		✓
Blond-Hair		✓		✓
Blurry		✓		✓
Brown-Hair		✓		✓
Bushy-Eyebrows		✓		✓
Chubby		✓		✓
Double-Chin		✓		✓
Eyeglasses	✓			✓
Goatee							✓
Gray-Hair			✓		✓
Heavy-Makeup	✓			✓
High-Cheekbones		✓		✓
Mouth-Slightly-Open			✓	✓
Mustache							✓
Narrow-Eyes		✓		✓
No-Beard	✓			✓
Oval-Face	✓			✓
Pale-Skin		✓		✓
Pointy-Nose		✓		✓
Receding-Hairline		✓			✓
Rosy-Cheeks	✓			✓
Sideburns		✓			✓
Smiling		✓				✓
Straight-Hair	✓			✓
Wavy-Hair	✓			✓
Wearing-Earrings			✓	✓
Wearing-Hat		✓			✓
Wearing-Lipstick			✓		✓
Wearing-Necklace	✓			✓
Wearing-Necktie		✓				✓

Table 2. Summarized results on CelebA. We summarize the classification results over all target attributes of CelebA, setting Male and Young as the sensitive attributes and measuring BAcc., EO, and g-FAT with respect to each. We also regroup the results by malignant and benign bias.

Method	Male			Young			Malignant			Benign
Method	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT
Baseline [43]	72.7	19.0	76.6	76.8	7.9	83.9	74.1	11.8	80.7	74.1	15.6	79.0
LfF [20]	77.5	19.8	78.8	82.4	7.4	87.2	79.6	10.6	84.2	81.0	15.1	82.9
BPA [24]	79.4	17.1	81.1	82.1	8.2	86.7	79.4	11.5	83.7	81.7	13.8	83.9
Ours	78.2	12.7	82.7	81.3	5.1	87.7	78.3	7.6	84.9	80.6	9.6	85.2
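To make the BAcc. and EO columns concrete, the sketch below computes both for a binary target and a binary sensitive attribute. The EO formulation used here (the mean, over the two target classes, of the absolute gap in per-class accuracy between the two groups) is one common variant and is an assumption; the definition used in this paper is given in Section 4.4 and may differ. All function names are hypothetical.

```python
def _group_rate(y_true, y_pred, s, label, group):
    """P(pred == label | y == label, s == group), or None if the cell is empty."""
    hits = [p == label for t, p, g in zip(y_true, y_pred, s)
            if t == label and g == group]
    return sum(hits) / len(hits) if hits else None


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true, in percent."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(hits) / len(hits))
    return 100.0 * sum(recalls) / len(recalls)


def equalized_odds_gap(y_true, y_pred, s):
    """Mean over target classes of the between-group rate gap, in percent."""
    gaps = []
    for label in (0, 1):
        r0 = _group_rate(y_true, y_pred, s, label, 0)
        r1 = _group_rate(y_true, y_pred, s, label, 1)
        if r0 is not None and r1 is not None:
            gaps.append(abs(r0 - r1))
    if not gaps:
        raise ValueError("no target class is observed in both groups")
    return 100.0 * sum(gaps) / len(gaps)


# Toy example: the classifier is perfect on group 0 and 50% correct on group 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
s      = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred), equalized_odds_gap(y_true, y_pred, s))
```

The toy data illustrates why the two columns are reported together: overall per-class recall can look acceptable while the errors fall entirely on one group.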

Table 3. Overall classification results for sensitive attribute “Male” on CelebA. We report all experimental results for 35 target attributes with respect to sensitive attribute “Male”.

Attribute	Baseline [43]			LfF [20]			BPA [24]			Ours
Attribute	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT
Arched-Eyebrows	64.8	31.5	66.6	64.0	32.7	65.6	68.1	20.6	73.4	71.3	17.1	76.7
Attractive	76.1	24.4	75.8	76.4	11.5	82.1	75.2	21.6	76.7	76.2	8.0	83.5
Bags-Under-Eyes	67.6	20.7	73.1	72.3	21.1	75.4	72.7	16.2	77.9	70.7	11.0	79.0
Bald	71.5	13.8	78.3	70.4	46.3	61.2	91.0	13.3	88.8	96.0	7.8	94.0
Bangs	92.1	6.9	92.5	93.5	3.7	94.8	91.1	4.3	93.3	92.3	2.7	94.7
Big-Lips	62.0	18.2	70.8	61.9	38.9	61.4	71.0	17.3	76.5	61.2	9.5	73.7
Big-Nose	69.4	24.5	72.3	68.4	18.6	74.4	70.0	35.2	67.3	68.1	18.5	74.3
Black-Hair	81.8	6.7	87.2	86.8	5.0	90.7	85.8	7.5	89.0	85.1	6.6	89.1
Blond-Hair	77.6	28.2	74.6	87.5	14.9	86.2	90.3	9.4	90.4	84.8	17.2	83.7
Blurry	56.6	5.0	72.1	85.2	3.2	90.7	87.8	5.2	91.2	76.1	1.1	86.4
Brown-Hair	76.4	14.4	80.8	83.5	3.9	89.4	75.9	10.1	82.4	79.4	1.0	88.4
Bushy-Eyebrows	78.5	13.8	82.2	83.4	6.9	88.0	84.2	4.6	89.5	80.2	12.2	83.8
Chubby	64.3	1.2	78.8	75.4	28.0	75.3	70.2	23.7	73.1	75.6	17.9	78.7
Double-Chin	62.0	17.5	71.1	75.7	25.0	73.6	85.1	16.9	84.0	76.0	18.7	78.5
Eyeglasses	97.3	1.3	97.9	98.5	0.7	98.8	96.7	3.6	96.5	98.6	0.3	99.1
Gray-Hair	78.4	5.4	85.9	83.9	16.9	83.4	83.7	11.1	86.2	89.2	12.9	88.1
Heavy-Makeup	72.1	43.1	63.8	71.4	47.2	61.0	78.3	37.2	69.9	71.4	37.9	66.5
High-Cheekbones	84.3	10.1	87.0	84.8	3.2	90.5	81.0	6.8	86.7	84.3	9.0	87.5
Mouth-Slightly-Open	92.1	3.4	94.3	93.3	0.9	96.1	89.8	1.9	93.8	93.2	0.1	96.4
Narrow-Eyes	70.9	7.3	80.7	76.2	3.6	85.4	75.8	2.3	85.7	73.9	3.5	84.0
No-Beard	74.6	39.1	67.2	77.0	32.7	71.9	71.4	50.3	59.1	79.1	26.3	76.3
Oval-Face	61.5	25.6	67.5	59.9	11.2	72.2	61.9	12.1	73.2	64.8	3.7	78.2
Pale-Skin	75.6	10.6	82.0	86.4	8.9	88.7	87.5	3.1	92.0	87.2	7.6	89.7
Pointy-Nose	61.8	15.3	71.9	67.5	6.0	79.1	65.6	28.1	68.6	59.4	14.2	70.8
Receding-Hairline	70.5	15.7	76.9	82.6	16.5	83.0	82.6	12.6	84.9	81.0	2.6	88.6
Rosy-Cheeks	66.4	43.7	61.0	74.0	39.7	66.6	78.3	30.1	73.9	59.6	21.7	68.0
Sideburns	68.7	45.5	60.9	68.6	53.2	56.1	68.7	53.3	56.1	66.9	35.0	65.9
Smiling	90.9	1.1	94.7	91.8	1.9	94.8	90.4	2.5	93.8	91.7	2.0	94.7
Straight-Hair	63.1	12.4	73.8	65.9	9.6	76.7	73.6	6.5	82.6	75.3	12.2	81.1
Wavy-Hair	70.0	28.3	70.8	73.1	4.2	83.3	78.2	19.5	79.3	79.7	7.8	85.6
Wearing-Earrings	71.9	31.5	70.1	70.4	39.9	64.9	78.8	22.0	78.3	74.4	29.1	72.6
Wearing-Hat	72.5	41.2	65.1	93.6	2.9	95.3	94.2	5.5	94.3	95.1	1.9	96.5
Wearing-Lipstick	78.1	33.9	71.7	74.6	41.3	65.9	79.1	35.8	71.0	79.0	34.0	72.0
Wearing-Necklace	52.5	5.6	68.9	62.4	57.7	50.9	65.6	14.7	74.5	60.8	20.8	69.1
Wearing-Necktie	70.8	20.3	75.0	74.2	34.2	69.8	79.3	35.7	71.2	80.0	13.3	83.2
Average	72.7	19.0	76.6	77.5	19.8	78.8	79.4	17.1	81.1	78.2	12.7	82.7

Table 4. Overall classification results for sensitive attribute “Young” on CelebA. We report all experimental results for 35 target attributes with respect to sensitive attribute “Young”.

Attribute	Baseline [43]			LfF [20]			BPA [24]			Ours
Attribute	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT
Arched-Eyebrows	74.4	6.5	83.1	70.6	5.6	81.2	71.3	10.5	79.6	76.8	5.6	84.9
Attractive	78.1	18.8	79.6	75.2	16.9	79.0	77.6	18.0	79.7	75.6	18.4	78.5
Bags-Under-Eyes	70.8	15.5	77.2	67.3	7.7	78.3	73.8	7.3	82.4	71.8	3.6	82.7
Bald	63.4	6.4	76.3	54.2	2.0	71.4	78.2	13.3	82.2	92.7	2.5	95.0
Bangs	93.0	1.2	95.8	91.4	1.3	94.9	91.9	1.6	95.0	92.8	0.2	96.2
Big-Lips	63.0	14.2	73.1	54.4	0.9	71.8	72.1	11.8	79.5	61.6	7.7	74.6
Big-Nose	71.1	22.9	74.0	66.0	14.5	74.8	73.2	17.4	77.6	70.7	13.1	78.1
Black-Hair	81.6	3.5	88.5	83.3	4.2	89.2	86.8	3.5	91.4	85.6	0.8	92.0
Blond-Hair	88.2	2.4	92.7	91.4	1.3	94.9	92.3	2.6	94.7	90.9	2.0	94.3
Blurry	56.8	1.9	73.3	53.2	1.1	70.9	98.6	3.3	97.6	76.1	1.5	86.2
Brown-Hair	78.9	2.4	87.5	77.4	1.7	86.9	76.5	5.9	84.6	79.2	1.8	87.9
Bushy-Eyebrows	81.9	3.2	88.8	79.6	2.3	87.9	85.8	9.2	88.2	82.7	1.8	89.9
Chubby	61.4	9.3	73.9	60.0	5.8	74.2	70.7	14.1	77.7	78.0	10.0	83.6
Double-Chin	64.1	11.7	74.7	49.5	1.7	67.8	81.4	23.8	78.7	77.7	13.4	81.9
Eyeglasses	97.1	1.1	97.9	96.8	1.9	97.4	96.1	4.0	96.0	98.6	0.6	98.9
Gray-Hair	68.2	25.0	71.4	72.2	19.7	76.0	76.7	17.6	79.4	86.1	17.1	84.4
Heavy-Makeup	88.5	7.0	90.7	88.6	2.2	93.0	91.0	4.6	93.1	87.2	4.6	91.1
High-Cheekbones	85.4	3.4	90.7	84.2	4.3	89.6	81.1	6.4	87.0	85.2	3.9	90.4
Mouth-Slightly-Open	92.0	1.7	95.0	92.5	1.7	95.3	89.4	6.2	91.5	93.0	1.1	95.8
Narrow-Eyes	70.0	2.4	82.1	54.8	3.0	71.5	75.1	2.1	85.3	73.8	2.6	84.4
No-Beard	90.9	2.6	94.0	85.7	0.9	92.0	91.5	2.7	94.3	91.3	1.9	94.6
Oval-Face	60.6	18.5	69.9	57.2	9.8	70.9	61.7	5.3	75.6	63.6	5.4	76.8
Pale-Skin	76.7	6.5	84.4	65.9	5.4	78.3	87.7	2.3	92.5	87.1	7.1	89.9
Pointy-Nose	63.3	8.6	75.4	62.7	9.0	74.9	67.8	17.2	74.7	60.2	8.9	73.3
Receding-Hairline	70.6	16.3	76.7	63.3	2.2	77.8	81.8	10.2	85.6	80.5	5.6	87.0
Rosy-Cheeks	84.1	5.1	89.2	67.4	3.3	80.1	82.5	7.6	87.2	67.8	6.1	79.2
Sideburns	89.8	1.7	93.9	72.6	1.9	83.9	92.1	1.8	95.0	85.4	2.4	91.2
Smiling	89.8	5.4	92.1	91.1	2.7	94.1	89.9	3.2	93.2	91.7	1.7	94.9
Straight-Hair	60.0	17.4	69.9	59.2	2.0	75.0	70.9	12.8	78.4	75.4	3.1	85.1
Wavy-Hair	74.3	11.9	80.7	74.7	3.9	84.4	78.3	1.8	87.4	82.2	4.3	88.5
Wearing-Earrings	83.8	2.3	90.3	63.8	1.5	78.4	83.1	6.5	88.0	84.9	2.8	90.7
Wearing-Hat	90.3	3.4	93.3	89.6	2.4	93.4	94.0	3.6	95.1	95.0	1.9	96.5
Wearing-Lipstick	90.4	7.1	91.6	92.3	3.4	94.4	93.6	4.5	94.5	93.3	3.7	94.7
Wearing-Necklace	55.8	5.1	71.5	62.1	5.1	75.9	73.7	10.9	80.8	69.0	7.3	79.5
Wearing-Necktie	79.5	4.5	86.9	79.1	6.2	85.9	84.8	14.3	85.2	84.6	6.5	88.8
Average	76.8	7.9	83.9	82.4	7.4	87.2	82.1	8.2	86.7	81.3	5.1	87.7

Table 5. Results on Sub1 of UTK Face. We respectively set Age and Gender as the target and sensitive attributes. $η$ represents the severity of data imbalance. As $η$ increases, ours improves fairness to a greater extent than the baseline.

Method	Severity of Data Imbalance $(η) = 2$			Severity of Data Imbalance $(η) = 3$			Severity of Data Imbalance $(η) = 4$
Method	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT
Baseline [43]	82.1	17.7	82.1	81.9	21.0	80.4	80.3	26.0	77.0
LfF [20]	82.9	11.6	85.5	82.2	17.6	82.2	81.4	21.5	79.9
BPA [24]	83.1	16.6	83.2	82.0	21.8	80.0	80.3	22.6	78.8
Ours	82.2	6.8	87.4	81.3	10.5	85.2	80.3	14.4	82.8

Table 6. Results on Sub2 of UTK Face. We respectively set Race and Gender as the target and sensitive attributes.

Method	Severity of Data Imbalance $(η) = 2$			Severity of Data Imbalance $(η) = 3$			Severity of Data Imbalance $(η) = 4$
Method	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT	BAcc.	EO	g-FAT
Baseline [43]	81.8	10.1	85.7	81.6	15.2	83.1	79.3	21.2	79.0
LfF [20]	88.2	5.3	91.3	87.7	10.1	88.7	86.9	11.5	87.8
BPA [24]	87.7	6.7	90.4	86.5	12.5	86.9	85.8	14.3	85.7
Ours	87.9	4.1	91.7	87.4	8.1	89.6	87.2	9.3	88.9

Table 7. Ablation study for the bias-capturing branches. We set Arched-Eyebrows as the target attribute and Male, which is a benign bias for it, as the sensitive attribute. We remove either the MCB or the BCB from the overall framework and compare BAcc., EO, and g-FAT with respect to benign bias.

Method	BAcc.	EO	g-FAT
Baseline [43]	64.8	31.5	66.6
Ours without BCB	69.3	38.0	65.5
Ours without MCB	72.4	20.1	76.0
Ours	71.3	17.1	76.7
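The g-FAT values in Table 7 are numerically consistent with a harmonic mean of BAcc. and (100 − EO). The sketch below checks that reading against the four rows of the table. This is an inference from the reported numbers and an assumption, not the definition given in Section 4.5, which may include additional terms or a weighting parameter; the function name `harmonic_tradeoff` is hypothetical.

```python
def harmonic_tradeoff(bacc: float, eo: float) -> float:
    """Harmonic mean of balanced accuracy and (100 - EO), both in percent.

    Assumed reading of the trade-off score, inferred from the table values.
    """
    fairness = 100.0 - eo
    return 2.0 * bacc * fairness / (bacc + fairness)


# (BAcc., EO, reported g-FAT) copied from Table 7.
table7 = {
    "Baseline": (64.8, 31.5, 66.6),
    "Ours without BCB": (69.3, 38.0, 65.5),
    "Ours without MCB": (72.4, 20.1, 76.0),
    "Ours": (71.3, 17.1, 76.7),
}

for name, (bacc, eo, reported) in table7.items():
    computed = harmonic_tradeoff(bacc, eo)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

Under this reading, the computed values agree with the reported ones to within 0.1, a difference that rounding of the tabulated BAcc. and EO inputs can account for. It also shows why removing the BCB lowers the score below the baseline: the gain in BAcc. is outweighed by the rise in EO.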

Table 8. Classification results on Cat and Dog. We respectively set the target attribute to species and the sensitive attribute to color. The results show that our method can effectively mitigate a more general type of bias.

Method	Balanced Accuracy	EO	g-FAT
Baseline [43]	79.9	20.7	79.5
LfF [20]	81.6	14.0	83.7
BPA [24]	87.7	10.7	88.4
Ours	87.1	8.3	89.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lee, P.; Park, S. Enhancing Fairness Without Demographic Labels via Identifying and Mitigating Potential Biases. Symmetry 2026, 18, 344. https://doi.org/10.3390/sym18020344

