1. Introduction
Deep learning has been widely deployed in real-world application systems, such as face recognition [1,2], object detection [3], and image generation [4,5], owing to its tremendous advances. However, recent studies have revealed severe ethical issues behind this convenience [6,7,8]. For example, COMPAS [9], a decision-support system used by judges in the United States, predicted a higher recidivism risk for African-Americans than for Caucasians. Therefore, researchers have tried to ensure fairness with respect to sensitive attributes, i.e., properties on the basis of which people should not be discriminated against, aiming to establish performance symmetry across demographic groups [10,11,12,13,14]. Within the broad topic of fairness, we narrow the scope to fairness-aware image classification, a popular and fundamental task in computer vision [15,16,17].
A variety of existing studies have tried to mitigate the asymmetry in model predictions (i.e., fairness-aware classification) by leveraging additional information about sensitive attributes. To this end, they must predefine the attributes for which fairness is to be improved and annotate the corresponding labels before training. Under the supervision of these extra labels, they can significantly improve fairness.
However, such methods have non-negligible limitations. First, annotating sensitive attributes can raise ethical issues and privacy concerns. Since most sensitive attributes, such as ethnicity and gender, correspond to the personal information of data providers, their collection and use are often strictly restricted by institutional policies or legal regulations (e.g., GDPR [18]). Therefore, annotation must be done cautiously, is labor-intensive, and is sometimes forbidden [15,19]. Second, these methods can be impractical in real-world applications, because they only improve fairness with respect to the sensitive attributes predefined at training time. For example, a model trained only with gender labels can still be unfair with respect to race or age. However, there is no agreement on which attributes should be treated as sensitive, and this choice can vary depending on the environment or user. Whenever the sensitive attribute changes, the model must be retrained, which limits scalability and practicality. These issues motivate a new approach that improves fairness in an unsupervised manner.
To this end, recent studies [20,21,22,23,24,25,26,27] have proposed methods to ensure fairness without using sensitive attribute labels. However, a substantial portion of these studies [20,21,23,24,25,26] assume that the sensitive attributes possess specific characteristics, such as texture or color [21,23], malignant bias [20], or a high correlation between the sensitive attributes and the target class [24,25,26]. In addition, several studies [21,22,25,26,27] designed and validated their methods in simplistic and constrained environments, such as tabular datasets or synthetic image datasets. To address these limitations, we aim to propose a new method that (1) enhances fairness without making any assumptions about the sensitive attributes and (2) can be effectively deployed in more complex environments, i.e., real-world image datasets.
Specifically, we draw inspiration from [20], which shows that sensitive attributes can be classified into two types: malignant and benign biases, which are, respectively, easier and harder to learn than the target information. Motivated by this, we first construct two modules that capture malignant and benign biases independently, based on the finding that a model tends to learn malignant bias, target information, and benign bias sequentially. While LfF [20] primarily focuses on mitigating malignant bias, our method is designed to handle both malignant and benign biases. To this end, we exploit the model’s learning dynamics to derive bias proxies using only the target labels, enabling fairness-aware training without sensitive attribute labels. In particular, we introduce an additional branch to capture benign bias separately and then suppress both bias proxies in the target classifier via adversarial training. As a result, by leveraging the derived bias proxies, our unsupervised fairness-aware framework learns more symmetric representations with respect to sensitive attributes [10,15].
In the experimental section, we validate our approach on two benchmark datasets for fairness-aware image classification, CelebA [28] and UTK Face [29]. Across various scenarios involving different sensitive attributes, our approach significantly improves fairness regardless of the attribute type and outperforms state-of-the-art methods. Moreover, we introduce g-FAT, a new metric for measuring the generalized trade-off between classification accuracy and fairness. In terms of g-FAT, ours achieves superior trade-off performance on the benchmarks.
We summarize the main contributions as follows.
We introduce branches that capture sensitive attribute information without making any assumptions about its specific characteristics.
Based on these capturing branches, we design an Unsupervised Fairness-aware Framework for image classification (UFF) that ensures fairness without sensitive attribute labels.
Through extensive experiments on CelebA, UTK Face, and Cat and Dog [30], we validate that ours significantly improves fairness with respect to various sensitive attributes.
We propose a new metric, g-FAT, to measure the generalized trade-off between classification accuracy and fairness. The proposed method achieves the best trade-off performance in terms of g-FAT.
4. Experiment
In this section, we validate the proposed method on three datasets: CelebA [28], UTK Face [29], and Cat and Dog [30]. To demonstrate that our method can ensure fairness across various sensitive attributes, we conduct extensive experiments in diverse scenarios. We compare against LfF [20] and BPA [24], which are widely used baselines for mitigating demographic bias without explicit group labels. In all experiments, ours achieves a significant improvement in fairness without any knowledge of the sensitive attributes.
4.1. Dataset
CelebA [28]: It consists of about 200k face images annotated with 40 binary attributes. Among them, we set the Male and Young attributes as the sensitive attributes and the others as the target attributes. As an exception, we exclude three target attributes, 5-o-Clock Shadow, Goatee, and Mustache, because the test set contains no data for some groups, e.g., no women with mustaches. For all experiments, we follow the original split into training, validation, and test sets.
UTK Face [29]: It contains about 20k face images with three attributes: Gender, Race, and Age. Following previous work [15,39], we first binarize Race (Caucasian or not) and Age (over 35 or not). Next, we compose two training subsets (i.e., Sub1 and Sub2) with different correlations between the attributes.
- Sub1: The target attribute “Age” has a high correlation with the sensitive attribute “Gender”.
- Sub2: The target attribute “Race” has a high correlation with the sensitive attribute “Gender”.
For example, according to the bias intensity η, we set the number of male samples to be η times that of female samples for the young, and vice versa for the old in Sub1; that is, gender and age are strongly correlated in this subset. Similarly, in Sub2 we set the number of male samples to be η times that of female samples for Caucasians, and vice versa for the other races. To validate the effectiveness of the proposed method under various levels of data imbalance, we set η to 2, 3, and 4. Unlike the training set, the validation and test sets are fully balanced.
Cat and Dog [30]: It is a relatively small-scale dataset containing about 40k dog and cat images. Following the setting of previous work [42], we set the species as the target attribute and color (i.e., bright or dark) as the bias. To establish a correlation between them, we set white dogs to be five times as many as black dogs, and black cats to be five times as many as white cats. As with UTK Face, the validation and test sets are fully balanced. Although color is not commonly regarded as a sensitive attribute, we design this experiment to show that our method can improve fairness not only with respect to facial sensitive attributes but also with respect to other kinds of bias.
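The subset construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the public UTKFace file-naming convention `[age]_[gender]_[race]_[date&time].jpg` (gender 0 = male, race 0 = White), reads “over 35” as `age > 35`, and uses plain random subsampling. The helpers `binarize_utk` and `biased_subset` are hypothetical names.

```python
import random
from collections import defaultdict


def binarize_utk(filename: str) -> dict:
    """Binarize UTKFace labels from a file name such as '36_0_0_20170117.jpg'.

    Assumes the '[age]_[gender]_[race]_[date&time].jpg' convention, where
    gender 0 = male and race 0 = White; 'over 35' is taken as age > 35.
    """
    age, gender, race = (int(tok) for tok in filename.split("_")[:3])
    return {
        "old": int(age > 35),         # Age: over 35 or not
        "male": int(gender == 0),     # Gender
        "caucasian": int(race == 0),  # Race: Caucasian or not
    }


def biased_subset(samples, target, sensitive, eta, major_for_target0, seed=0):
    """Hypothetical helper: subsample so that, within each target value, one
    sensitive value is eta times as frequent as the other, with the dominant
    sensitive value flipped between target = 0 and target = 1 (integer eta >= 1)."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s[target], s[sensitive])].append(s)

    rng = random.Random(seed)
    subset = []
    for t in (0, 1):
        major = major_for_target0 if t == 0 else 1 - major_for_target0
        minor = 1 - major
        # Largest minority count that still leaves eta * n_minor majority samples.
        n_minor = min(len(groups[(t, minor)]), len(groups[(t, major)]) // eta)
        subset += rng.sample(groups[(t, major)], n_minor * eta)
        subset += rng.sample(groups[(t, minor)], n_minor)
    rng.shuffle(subset)
    return subset
```

Following the ratios stated for Sub1 (male samples dominate among the young), one would call `biased_subset(samples, "old", "male", eta, major_for_target0=1)`; for Sub2 (male samples dominate among Caucasians), `biased_subset(samples, "caucasian", "male", eta, major_for_target0=0)`. With species as the target (dog = 1), color as the sensitive attribute (white = 1), `eta=5`, and `major_for_target0=0`, the same helper would mimic the Cat and Dog split. Keeping the minority count as large as the ratio allows is only one design choice; the exact sampling procedure behind the reported subsets may differ.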
4.2. Implementation Detail
For all compared models, we use ResNet-18 [43] as the encoder and a single-layer Multi-Layer Perceptron (MLP) as the classifier. We use the Adam optimizer [44] with a learning rate of
. We resize input images to 128 × 128 for CelebA and UTK Face, and to 224 × 224 for Cat and Dog. These resolutions standardize the input size across experiments and keep the computational and memory costs reasonable for a fair comparison. We train each model until it sufficiently converges (i.e., 10 epochs for CelebA, and 50 epochs for UTK Face and Cat and Dog). All results are averaged over three independent runs. For LfF [20], the original version shows poor classification performance on CelebA and UTK Face because its pre-trained encoder network was not trained on the target dataset (e.g., CelebA). Therefore, we fine-tune its encoder network on the target dataset, which yields comparable performance. For BPA [24], we set the number of clusters k to 4 since the performance saturates from
. For ours, we first fix the loss-balancing hyperparameters between the classification branches (i.e.,
,
,
) to 1 for simplicity. We then determine the remaining hyperparameters by jointly searching
from 0.1 to 1.0. As a result, for CelebA and UTK Face, we set
, and for Cat and Dog, we set them to 0.1. For both LfF and ours, we set
in GCE. We note that ours achieves the best trade-off performance even with this simplified hyperparameter tuning.
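The generalized cross-entropy (GCE) loss mentioned above, as introduced by Zhang and Sabuncu (2018) and adopted in LfF [20], can be sketched as follows. The per-sample loss is (1 − p_y^q)/q, where p_y is the predicted probability of the true class. The default q = 0.7 below is an illustrative assumption, not the setting used in these experiments, and `gce_loss` is a hypothetical helper name.

```python
import numpy as np


def gce_loss(probs: np.ndarray, labels: np.ndarray, q: float = 0.7) -> np.ndarray:
    """Per-sample generalized cross-entropy: (1 - p_y ** q) / q.

    probs: (N, C) class probabilities; labels: (N,) integer class indices.
    q = 0.7 is an illustrative assumption, not the value used in the experiments.
    """
    p_y = probs[np.arange(len(labels)), labels]  # probability of the true class
    return (1.0 - p_y ** q) / q
```

As q → 0 the loss approaches the standard cross-entropy −log p_y, and at q = 1 it becomes 1 − p_y. Its gradient equals the cross-entropy gradient scaled by p_y^q, so confidently predicted (easy) samples receive relatively more weight; this is why LfF uses GCE to train a deliberately biased model that latches onto easy-to-learn cues.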
4.3. Analysis of Attributes of CelebA
To verify the effectiveness of the proposed method in addressing malignant and benign biases, we first analyze the bias types of the sensitive attributes (i.e., Male and Young) across all target attributes. Following previous work [20], we examine whether the training losses of bias-aligned and bias-conflicting samples follow contrasting trends in the early stage of training. If the sensitive attribute is a malignant bias, the trends contrast: the loss of bias-conflicting samples increases, while that of bias-aligned samples decreases; otherwise, the two trends are similar. Based on these observations, we categorize the bias type of each target attribute as shown in Table 1. For a more reliable evaluation, we mark attributes with inconsistent or unclear tendencies as “confused”.
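The loss-trend criterion above can be illustrated with a small heuristic. The sketch below is an assumption rather than the authors' procedure: it fits a linear slope to the early-training losses of bias-aligned and bias-conflicting samples and labels the sensitive attribute “malignant” when the slopes have opposite signs (aligned loss falling, conflicting loss rising), “benign” when both move in the same direction, and “confused” otherwise. The tolerance `tol` and the function name `bias_type` are hypothetical.

```python
import numpy as np


def bias_type(aligned_loss, conflicting_loss, tol=1e-3):
    """Classify a sensitive attribute from early-training loss curves.

    aligned_loss / conflicting_loss: mean training loss of bias-aligned and
    bias-conflicting samples at successive early steps or epochs.
    The linear-slope test and `tol` are illustrative assumptions.
    """
    steps = np.arange(len(aligned_loss))
    slope_aligned = np.polyfit(steps, aligned_loss, 1)[0]          # degree-1 fit: [slope, intercept]
    slope_conflicting = np.polyfit(steps, conflicting_loss, 1)[0]

    if slope_aligned < -tol and slope_conflicting > tol:
        return "malignant"  # contrasting trends
    if (slope_aligned < -tol and slope_conflicting < -tol) or (
        slope_aligned > tol and slope_conflicting > tol
    ):
        return "benign"  # similar trends
    return "confused"  # flat or inconsistent tendencies
```

In practice, curves averaged over several runs would make such a test less sensitive to noise, which is in the spirit of marking unclear cases as “confused”.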
4.4. Evaluation Metric
In the experiments, we evaluate the compared models from two perspectives: fairness and classification accuracy. It is generally known that there is a trade-off between fairness and classification accuracy [12,39]; thus, the goal is to improve fairness while maintaining accuracy as much as possible. Among the various fairness metrics, such as Demographic Parity (DP) [40], Equal Opportunity (EOPP) [40], and Equalized Odds (EO) [40], we adopt EO, which is the average disparity in the true positive rate and false positive rate between different sensitive groups [40]. EO is particularly suitable for our setting since it evaluates group disparities in both positive and negative outcomes, thereby capturing the overall error asymmetry across sensitive groups. In contrast, other metrics, such as DP and EOPP, may be satisfied by skewed prediction behaviors (e.g., indiscriminately increasing positive predictions) and can overlook unfairness in negative outcomes [40]. For accuracy, we measure the balanced accuracy (BAcc.), which is the average accuracy over all sensitive groups. We use BAcc. rather than overall accuracy because overall accuracy can be dominated by majority groups under imbalanced test distributions, masking poor performance on underrepresented groups. BAcc. reflects not only the classification ability of a model but also its robustness to distribution shifts in the test environment.
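Both metrics can be computed as in the following minimal sketch, which assumes binary target labels, binary predictions, and a binary sensitive attribute, and follows the definitions above: EO as the mean of the absolute TPR and FPR gaps between the two sensitive groups, and BAcc. as the unweighted mean of per-group accuracies. The function names are hypothetical, values lie in [0, 1] (multiply by 100 for percentages), and some works instead average accuracy over (target, sensitive) subgroups.

```python
import numpy as np


def equalized_odds(y_true, y_pred, s):
    """Mean of the absolute TPR gap and FPR gap between sensitive groups 0 and 1."""
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))

    def rates(g):
        yt, yp = y_true[s == g], y_pred[s == g]
        tpr = np.mean(yp[yt == 1] == 1)  # true positive rate within group g
        fpr = np.mean(yp[yt == 0] == 1)  # false positive rate within group g
        return tpr, fpr

    (tpr0, fpr0), (tpr1, fpr1) = rates(0), rates(1)
    return float(0.5 * (abs(tpr0 - tpr1) + abs(fpr0 - fpr1)))


def balanced_accuracy(y_true, y_pred, s):
    """Unweighted mean of per-group accuracies over all sensitive groups."""
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))
    return float(np.mean([np.mean(y_pred[s == g] == y_true[s == g]) for g in np.unique(s)]))
```

For example, if one group is classified perfectly while the other has a TPR and an FPR of 0.5, EO is 0.5 and BAcc. is 0.75, even though a skewed test distribution could make the overall accuracy look much higher.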
4.5. g-FAT: Generalized Fairness-Accuracy Trade-Off Metric
Moreover, we introduce a generalized version of FAT [45], called g-FAT, to evaluate the generalized trade-off performance between classification accuracy and fairness. While FAT measures trade-off performance using the harmonic mean of accuracy and fairness, it requires a subjective decision about the importance ratio
between them. To overcome this limitation, we define g-FAT by integrating FAT values for
as follows:
where
is a metric for fairness, i.e., equalized odds. For measurement convenience, we utilize the following approximation:
where
.
First, compared with the original FAT, which requires selecting a single importance ratio , g-FAT aggregates the trade-off scores over a range of values, thereby reducing sensitivity to any particular choice of and providing a more robust summary of the fairness-accuracy trade-off. Second, g-FAT summarizes the fairness–accuracy trade-off into a single scalar score, enabling convenient comparison across different methods. Third, because of its harmonic-mean formulation, g-FAT assigns lower scores to methods that improve one objective at the expense of the other, thereby favoring balanced improvements in both fairness and accuracy.
We also note several limitations. First, because distinct pairs of balanced accuracy and equalized odds can yield similar g-FAT values, g-FAT should be interpreted alongside the underlying metrics. Second, g-FAT depends on the choice of the fairness metric , and its absolute value may change under alternative fairness criteria. Finally, we approximate the integral using a discrete set A. While the score may vary slightly with different samplings of , we use a fixed A to ensure consistent evaluation across all methods.
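To make the aggregation idea concrete, the sketch below assumes an F_β-style weighted harmonic mean for FAT, FAT_λ = (1 + λ²)·acc·F / (λ²·acc + F), with acc the balanced accuracy and F = 1 − EO, and approximates g-FAT by averaging FAT_λ over a user-supplied discrete set of ratios. The exact FAT formula of [45], the integration range, and the set A used in the experiments are not reproduced here, so both the formula and the hypothetical helpers `fat` and `g_fat` are assumptions for illustration only.

```python
import numpy as np


def fat(acc: float, eo: float, lam: float) -> float:
    """Assumed F-beta-style weighted harmonic mean of accuracy and fairness (1 - EO).

    Larger `lam` puts more weight on fairness; both inputs lie in [0, 1].
    """
    fair = 1.0 - eo
    return (1.0 + lam**2) * acc * fair / (lam**2 * acc + fair)


def g_fat(acc: float, eo: float, ratios) -> float:
    """Discrete approximation of g-FAT: mean of FAT over a set of ratios (assumed form)."""
    return float(np.mean([fat(acc, eo, lam) for lam in ratios]))
```

Because a harmonic mean is pulled toward the smaller of its arguments, a model with BAcc. 0.90 and EO 0.50 scores about 0.64 at λ = 1, below a model with BAcc. 0.80 and EO 0.20, which scores 0.80 for any λ. This illustrates how the harmonic-mean formulation penalizes improving one objective at the expense of the other.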
4.6. Classification Results on CelebA
For all target attributes in CelebA, we measure balanced accuracy (BAcc.), equalized odds (EO), and generalized FAT (g-FAT) with respect to the two sensitive attributes (i.e., Male and Young). In Table 2, we first summarize the results by sensitive attribute. As demonstrated in previous work [20], LfF achieves a significant improvement in balanced accuracy in all settings. However, while it improves EO for malignant bias, it fails to substantially improve EO for benign bias. Conversely, BPA [24] shows limited progress in improving fairness against malignant bias. The proposed method significantly outperforms both in terms of EO while also substantially improving BAcc. over the baseline. As a result, ours achieves the best trade-off performance in terms of g-FAT. Notably, our method is the only approach among all compared methods that enhances fairness across all types of bias. In addition, Table 3 and Table 4 report the complete classification results for the sensitive attributes Male and Young, respectively. We note that the type and strength of bias can vary substantially across target attributes, which may lead to attribute-wise performance variations across methods. Accordingly, while some comparison methods outperform ours on specific attributes, our method, which is designed to jointly handle both malignant and benign biases, achieves the best average EO and g-FAT over all attributes.
4.7. Classification Results on UTK Face
Table 5 and Table 6 report BAcc., EO, and g-FAT under different intensities of data imbalance η for Sub1 and Sub2, respectively. In Table 5, although LfF [20] and BPA [24] improve both BAcc. and EO over the baseline, the improvements are not significant. Meanwhile, ours substantially improves EO over the baseline in all settings while maintaining balanced accuracy, resulting in superior g-FAT scores. Moreover, the improvement in fairness becomes larger as η increases. Overall, Table 6 exhibits a similar trend to Table 5, but all the compared methods notably improve BAcc. as well as EO.
4.8. Ablation Study on Benign Bias
Unlike the previous study that inspired our work (i.e., LfF [20]), our method is designed to improve fairness with respect to not only malignant bias but also the other kind of bias (i.e., benign bias). Moreover, we claim that the benign bias-capturing branch (BCB) plays the more crucial role in improving fairness for benign bias. To demonstrate this, we validate the effectiveness of the capturing branches (i.e., BCB and the malignant bias-capturing branch, MCB) for mitigating benign bias on CelebA. Specifically, we set Arched-Eyebrows as the target attribute and Male, which is a benign bias for it, as the sensitive attribute, and we ablate each branch from the proposed method. In Table 7, ours without BCB even aggravates EO compared with the baseline, despite an increase in balanced accuracy, resulting in a degradation of g-FAT. This indicates that MCB contributes little to improving fairness for benign bias. Meanwhile, ours without MCB achieves a level of fairness comparable to that of the complete model, demonstrating that BCB is the predominant contributor to enhancing fairness for benign bias.
4.9. Extension to General Bias
To verify that our method can robustly handle general data asymmetry (e.g., color bias) as well as facial sensitive attributes, we further conduct experiments on Cat and Dog [30]. Because the training dataset is severely imbalanced, the baseline classifier is highly unfair. Table 8 shows that all compared methods significantly improve balanced accuracy and fairness over the baseline. Among them, ours achieves the best trade-off between the two, indicating the effectiveness of the proposed method for general bias.
5. Conclusions
In this paper, we proposed an unsupervised fairness-aware framework for image classification. Unlike previous work, the proposed method ensures fairness regardless of the type of sensitive attribute. To this end, we first demonstrated that a classification network learns target information before benign bias. Based on this observation, we designed a benign bias-capturing branch. By integrating it with the malignant bias-capturing branch, we can capture diverse types of bias (i.e., malignant and benign biases). Subsequently, we mitigated these biases through adversarial training to learn symmetric representations with respect to undisclosed sensitive attributes. In the experiments, ours shows consistent improvements in the fairness–accuracy trade-off across multiple datasets. Specifically, on the CelebA dataset, ours reduced Equalized Odds (EO) for malignant bias from 11.8 to 7.6 and for benign bias from 15.6 to 9.6 compared with the baseline model. Furthermore, on the newly proposed g-FAT metric, our method achieved the highest score of 85.2 among the compared unsupervised methods. We also showed that our method can handle not only facial sensitive attributes but also other types of data asymmetry; on the color-biased Cat and Dog dataset, our method improved balanced accuracy from 79.9 to 87.1 compared with the baseline model. These results suggest that ours achieves a favorable trade-off between fairness and accuracy without requiring any sensitive attribute labels. Finally, we discuss the limitations of our work and outline directions for future research. First, further reducing the performance gap between unsupervised approaches and fully supervised methods that leverage sensitive attribute labels remains an important direction for future work.
Second, our study primarily focuses on binary target labels and a binary sensitive attribute; extending the framework to multi-class targets and intersectional sensitive attributes is a promising avenue. Third, while adversarial optimization is effective, improving its training stability and computational efficiency remains an important topic for future study.