Local Data Debiasing for Fairness Based on Generative Adversarial Training

The widespread use of automated decision processes in many areas of our society raises serious ethical issues with respect to the fairness of the process and the possible resulting discrimination. To solve this issue, we propose a novel adversarial training approach called GANSan for learning a sanitizer whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our method GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by only modifying the other attributes as little as possible, thus preserving the interpretability of the sanitized data. Consequently, once the sanitizer is trained, it can be applied to new data locally by an individual on their profile before releasing it. Finally, experiments on real datasets demonstrate the effectiveness of the approach as well as the achievable trade-off between fairness and utility.


INTRODUCTION
In recent years, the availability and the diversity of large-scale datasets, the algorithmic advancements in machine learning and the increase in computational power have led to the development of personalized services and prediction systems to such an extent that their use is now ubiquitous in our society. For instance, machine learning-based systems are now used in banking for assessing the risk associated with loan applications [26], in hiring systems [11] and in predictive justice to quantify the recidivism risk of an inmate [5]. Despite their usefulness, the predictions performed by these algorithms are not exempt from biases, and numerous cases of discriminatory decisions have been reported over the last years.
For example, returning to the case of predictive justice, a study conducted by ProPublica showed that the recidivism prediction tool COMPAS, which is currently used in Broward County (Florida), is strongly biased against black defendants, displaying a false positive rate twice as high for black defendants as for white ones [19].
If the dataset exhibits strong detectable biases towards a particular sensitive group (e.g., an ethnic or minority group), the naïve solution of removing the attribute identifying the sensitive group prevents only direct discrimination. Indeed, indirect discrimination can still occur due to correlations between the sensitive attribute and other attributes.
In this paper, we propose a novel approach called GANSan (for Generative Adversarial Network Sanitizer) to address the problem of discrimination due to the biased underlying data.
In a nutshell, our approach learns a sanitizer (in our case a neural network) transforming the input data in a way that maximizes the following two metrics: (1) fidelity, in the sense that the transformation should modify the data as little as possible, and (2) non-discrimination, which means that the sensitive attribute should be difficult to predict from the sanitized data.
A typical use case might be one in which a company, during its recruitment process, offers job applicants a tool to remove racial correlations from their data before they submit their sanitized profile on the job application platform. If built appropriately, this tool would make the recruitment process of the company free from racial discrimination, as it never had access to the original profiles.
Overall, our contributions can be summarized as follows.
• We propose a novel adversarial approach, inspired by Generative Adversarial Networks (GANs) [16], in which a sanitizer is learned from data representing the population. The sanitizer can then be applied to a profile in such a way that the sensitive attribute is removed, as well as existing correlations with other attributes, while ensuring that the sanitized profile is modified as little as possible, preventing both direct and indirect discrimination.
• Our objective is more generic than simply building a non-discriminating classifier, in the sense that we aim at debiasing the data with respect to the sensitive attribute. Thus, one of the main benefits of our approach is that the sanitization can be performed without any knowledge of the tasks that will later be conducted on the sanitized data.
• Another strength of our approach is that once the sanitizer has been learned, it can be used locally by an individual (e.g., on a device under his control) to generate a modified version of his profile that still lives in the same representation space, but from which it is very difficult to infer the sensitive attribute. In this sense, our method can be considered to fall under the category of randomized response techniques [35], as it can be distributed before being used locally by a user to sanitize his data. Thus, it does not require his true profile to be sent to a trusted third party. Of all the approaches that currently exist in the literature to reach algorithmic fairness [14], we are not aware of any other work that has considered local sanitization, with the exception of [31], which focuses on the protection of privacy but could also be applied to enhance fairness.
• To demonstrate its usefulness, we have proposed and discussed four different evaluation scenarios and assessed our approach on real datasets for these four scenarios. In particular, we have analyzed the achievable trade-off between fairness and utility, measured both in terms of the perturbations introduced by the sanitization framework and with respect to the accuracy of a classifier learned on the sanitized data.
The outline of the paper is as follows. First, in Section 2, we introduce the system model before reviewing the background notions on fairness metrics. Afterwards, in Section 3, we review the related work on methods for enhancing fairness belonging, like ours, to the preprocessing approach, before describing GANSan in Section 4. Finally, we evaluate our approach experimentally in Section 5 before concluding in Section 6.

PRELIMINARIES
In this section, we first present the system model used in this paper before reviewing the background notions on fairness metrics.

System model
In this paper, we consider the generic setting of a dataset D composed of N records. Each record r_i typically corresponds to the profile of individual i and is made of d attributes, which can be categorical, discrete or continuous. Amongst those, the sensitive attribute S (e.g., gender, ethnic origin, religious belief, . . . ) should remain hidden to prevent discrimination. In addition, the decision attribute Y is typically used for a classification task (e.g., accept or reject an individual for a job interview). The other attributes of the profile, which are neither S nor Y, will be referred to hereafter as A.
For simplicity, in this work we restrict ourselves to the situations in which these two attributes are binary (i.e., S ∈ {0, 1} and Y ∈ {0, 1}). However, our approach can also be generalized to multivalued attributes, although quantifying fairness for multivalued attributes is much more challenging than for binary ones [23]. Our main objective is to prevent the possibility of inferring the sensitive attribute from the sanitized data.
This objective is similar to the protection against group membership inference, which in our context amounts to distinguishing between the two groups generated by the values of S, which we will refer to as the sensitive group (for which S = 0) and the default group (for which S = 1).

Fairness metrics
First, we would like to point out that many different definitions of fairness exist in the literature [1,6,9,14,18,33] and that the choice of the appropriate definition is highly dependent on the context considered.
For instance, one natural approach for defining fairness is the concept of individual fairness [9], which states that individuals that are similar except for the sensitive attribute should be treated similarly (i.e., receive similar decisions). This notion relates to the legal concept of disparate treatment [2], which occurs if the decision is made based on sensitive attributes. This definition is relevant when discrimination is caused by the decision process. Therefore, it cannot be used in situations in which the objective is to directly redress biases in the data.
In contrast to individual fairness, group fairness relies on statistics of the outcomes of the subgroups indexed by S and can be quantified in several ways, such as demographic parity [3] and equalized odds [17]. More precisely, demographic parity corresponds to the absolute difference of the rates of positive outcomes in the sensitive and default groups (for which respectively S = 0 and S = 1):

DemoParity = |P(Ŷ = 1|S = 0) − P(Ŷ = 1|S = 1)|, (1)

while equalized odds is the absolute difference of odds in each subgroup:

EqOddGap = Σ_{y ∈ {0,1}} |P(Ŷ = 1|S = 0, Y = y) − P(Ŷ = 1|S = 1, Y = y)|, (2)

in which Ŷ denotes the decision predicted by the classifier. Compared to demographic parity, equalized odds is more suitable when the base rates in both groups differ (P(Y = 1|S = 0) ≠ P(Y = 1|S = 1)). Note that these definitions are agnostic to the cause of the discrimination and are based solely on the assumption that statistics of outcomes should be similar between subgroups.
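As a concrete illustration, both group-fairness gaps can be computed directly from predictions. The sketch below is not part of the paper's codebase; it assumes binary label arrays and uses absolute differences as in the definitions above.

```python
import numpy as np

def demo_parity(y_pred, s):
    """Absolute difference of positive-outcome rates between the
    sensitive (S = 0) and default (S = 1) groups."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def eq_odd_gap(y_pred, y_true, s):
    """Sum over true labels y of the absolute difference of
    positive-prediction rates between the two groups."""
    y_pred, y_true, s = map(np.asarray, (y_pred, y_true, s))
    gap = 0.0
    for y in (0, 1):
        p0 = y_pred[(s == 0) & (y_true == y)].mean()
        p1 = y_pred[(s == 1) & (y_true == y)].mean()
        gap += abs(p0 - p1)
    return gap
```

For instance, if the sensitive group receives positive predictions half the time and the default group always does, `demo_parity` returns 0.5.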
In our work, we follow a different line of research by defining fairness in terms of the inability to infer S from other attributes [12,36]. This approach stems from the observation that it is impossible to discriminate based on the sensitive attribute if the latter is unknown and cannot be predicted from other attributes. Thus, our approach aims at sanitizing the data in such a way that no classifier should be able to infer the sensitive attribute from the sanitized data.
The inability to infer the attribute S can be measured by the accuracy of a predictor Adv trained to recover the hidden S (sAcc), as well as by the balanced error rate (BER) introduced in [12]:

BER = (P(Ŝ = 1|S = 0) + P(Ŝ = 0|S = 1)) / 2,

in which Ŝ denotes the prediction of S. The BER captures the predictability of both classes, and a value of 1/2 can be considered optimal for protecting against inference, in the sense that the inferences made by the predictor are then no better than a random guess. In addition, the BER is more relevant than the accuracy sAcc of a classifier at predicting the sensitive attribute for datasets with imbalanced proportions of sensitive and default groups. Thus, a successful sanitization would lead to a significant drop of the accuracy while raising the BER close to its optimal value of 0.5.
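The BER of an adversary can be sketched as follows; this is a minimal illustration with binary arrays, not the paper's implementation.

```python
import numpy as np

def ber(s_pred, s_true):
    """Balanced error rate of an adversary predicting S: the average of
    the per-group error rates. A value of 0.5 means the adversary does
    no better than a random guess."""
    s_pred, s_true = np.asarray(s_pred), np.asarray(s_true)
    err0 = (s_pred[s_true == 0] != 0).mean()  # error on the sensitive group
    err1 = (s_pred[s_true == 1] != 1).mean()  # error on the default group
    return (err0 + err1) / 2
```

Note that a constant predictor (always answering the majority class) can achieve high accuracy on an imbalanced dataset, yet its BER is exactly 0.5, which is why the BER is the more informative quantity here.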

RELATED WORK
In recent years, many approaches have been developed to enhance the fairness of machine learning algorithms. Most of these techniques can be classified into three families of approaches, namely (1) the preprocessing approach [10,12,24,38], in which fairness is achieved by changing the characteristics of the input data (e.g., by suppressing undesired correlations with the sensitive attribute), (2) the algorithmic modification approach (also sometimes called constrained optimization), in which the learning algorithm is adapted to ensure that it is fair by design [22,37], and (3) the postprocessing approach, which modifies the output of the learning algorithm to increase the level of fairness [17,21]. Due to the limited space, and as our approach falls within the preprocessing family, we will review hereafter only the methods of this category that make use of adversarial training.
Several approaches have been explored to enhance fairness based on adversarial learning. For instance, Edwards and Storkey [10] have trained an encoder to output a representation from which an adversary is unable to predict the group membership, from which a decoder can reconstruct the data and on which a decision predictor still performs well. Madras, Creager, Pitassi and Zemel [25] extended this framework to satisfy the equality of opportunities constraint [17] and explored the theoretical guarantees for fairness provided by the learned representation, as well as the ability of the representation to be used for different classification tasks. Beutel, Chen, Zhao and Chi [4] have studied the impact of data quality on fairness in the context of adversarial learning, demonstrating for instance that learning a representation independent of the sensitive attribute with a balanced dataset ensures statistical parity. Zhang, Lemoine and Mitchell [39] have designed a decision predictor satisfying group fairness by ensuring that an adversary is unable to infer the sensitive attribute from the predicted outcome. McNamara, Ong and Williamson [27] have investigated the benefits and drawbacks of fair representation learning, demonstrating that techniques building fair representations restrict the space of possible decisions, limiting the usages of the resulting data while providing fairness.
None of these previous approaches preserves the interpretability of the data, in the sense that the modified profile lives in a different space than the original one. One notable exception is FairGan [36], which maintains the interpretability of the profile. Its objective is to learn a fair classifier on a dataset that has been generated such that it is discrimination-free and whose distribution on attributes is close to the original one. While FairGan generates a synthetic dataset close to the original data while being discrimination-free, one key difference with GANSan is that FairGan cannot be used to directly sanitize a particular profile. Following a similar line of work, there is a growing body of research investigating the use of adversarial training to protect the privacy of individuals during the collection or disclosure of data. Feutry, Piantanida, Bengio and Duhamel [13] have proposed an anonymization procedure based on the learning of an encoder, an adversary and a label predictor. The authors have ensured the convergence of these three networks during training by proposing an efficient optimization procedure with bounds on the probability of misclassification. Pittaluga, Koppal and Chakrabarti [29] have designed a procedure based on adversarial training to hide a private attribute of a dataset. Romanelli, Palamidessi and Chatzikokolakis [31] have designed a mechanism to create a dataset preserving the original representation. They have developed a method for learning an optimal privacy protection mechanism, also inspired by GANs [32], which they have applied to location privacy. The objective is to minimize the amount of information (measured by the mutual information) preserved between S and the prediction made on the decision attribute by a classifier, while respecting a bound on utility. With respect to local sanitization and randomized response techniques, most of them are applied in the context of privacy protection [34]. Our approach is among the first that places the protection of information at the individual level, as the user can locally sanitize his data before publishing it.

ADVERSARIAL TRAINING FOR DATA DEBIASING
As previously explained, removing the sensitive attribute is rarely sufficient to guarantee non-discrimination as correlations are likely to exist between other attributes and the sensitive one.
In general, detecting and suppressing complex correlations between attributes is a difficult task.
To address this challenge, our approach GANSan relies on the modelling power of GANs to build a sanitizer that can cancel out correlations with the sensitive attribute without requiring an explicit model of those correlations. In particular, it exploits the capacity of the discriminator to distinguish the subgroups indexed by the sensitive attribute. Once the sanitizer has been trained, any individual can apply it locally on his profile before disclosing it. The sanitized data can then be safely used for any subsequent task.

Generative adversarial network sanitization
High-level overview. Formally, given a dataset D, the objective of GANSan is to learn a function San, called the sanitizer, that perturbs the individual profiles of D such that a distance measure called the fidelity fid (in our case the L2 norm) between the original and the sanitized datasets (D̄ = San(D) = {Ā, Ȳ}) is minimal, while ensuring that S cannot be recovered from D̄. Our approach differs from a classical conditional GAN [28] in that the objective of our discriminator is to reconstruct the hidden sensitive attribute from the generator output, whereas the discriminator in a classical conditional GAN has to discriminate between the generator output and samples from the true distribution.
Figure 1 presents a high-level overview of the training procedure, while Algorithm 1 describes it in detail.
The first step corresponds to the training of the sanitizer San (Algorithm 1, Lines 7−17). The sanitizer can be seen as the generator of a standard GAN but with a different purpose. In a nutshell, it learns the empirical distribution of the sensitive attribute and generates a new distribution that concurrently respects two objectives: (1) finding a perturbation that will fool the discriminator in predicting S, while (2) minimizing the damage introduced by the sanitization. More precisely, the sanitizer takes as input the original dataset D (including S and Y) plus some noise P_z. The noise is used to prevent the over-specialization of the sanitizer on the training set while making the reverse mapping of sanitized profiles to their original versions more difficult, as the mapping will be probabilistic rather than deterministic. As a result, even if the sanitizer is applied twice to the same profile, it can produce two different modified profiles.
The second step consists in training the discriminator Disc to predict the sensitive attribute from the data produced by the sanitizer San (Algorithm 1, Lines 18−24). The rationale of our approach is that the better the discriminator is at predicting the sensitive attribute S, the worse the sanitizer is at hiding it, and thus the higher the potential risk of discrimination.
These two steps are run iteratively until convergence of the training.
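The two alternating steps above can be sketched as follows. This is a deliberately simplified numpy illustration, not the paper's implementation: a linear map stands in for the neural-network sanitizer and a logistic regression for the discriminator, and all dimensions and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-ins for D = {A, Y, S}: profiles X and a binary sensitive attribute S.
n, d = 256, 6
X = rng.normal(size=(n, d))
S = rng.integers(0, 2, size=n).astype(float)

W_san = np.eye(d)        # linear "sanitizer" (the paper uses a neural network)
w_disc = np.zeros(d)     # logistic "discriminator"
alpha, lr = 0.9, 0.05    # fairness/fidelity trade-off and learning rate

for step in range(200):
    Z = X + 0.01 * rng.normal(size=X.shape)   # input plus the noise P_z
    X_san = Z @ W_san                         # sanitized profiles

    # Step 2: train the discriminator to recover S from San(D).
    p = sigmoid(X_san @ w_disc)
    w_disc += lr * X_san.T @ (S - p) / n      # logistic-regression gradient step

    # Step 1: train the sanitizer to push the discriminator toward a
    # random guess (p = 0.5) while modifying the data as little as possible.
    p = sigmoid(X_san @ w_disc)
    c = (p - 0.5) * p * (1 - p)               # gradient of (p - 0.5)^2 / 2 w.r.t. the logit
    grad_fair = 2.0 / n * Z.T @ (c[:, None] * w_disc[None, :])
    grad_fid = 2.0 / n * Z.T @ (X_san - X)    # gradient of the reconstruction term
    W_san -= lr * (alpha * grad_fair + (1 - alpha) * grad_fid)

# After training, predicting S from the sanitized data should be harder.
acc = ((sigmoid((X @ W_san) @ w_disc) > 0.5) == (S == 1)).mean()
```

The design point the sketch illustrates is the alternation itself: the discriminator is always trained to its best on the current sanitizer output, and the sanitizer is then updated against that best effort.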
Training objective of GANSan. Let Ŝ be the prediction of S by the discriminator (Ŝ = Disc(San(D))). The objective of the discriminator is to accurately predict S; thus it aims at minimizing the loss J_Disc(S, Ŝ) = d_disc(S, Ŝ). In practice, we instantiate d_disc as the Mean Squared Error (MSE).
Given a hyperparameter α representing the desired trade-off between fairness and fidelity, the sanitizer minimizes a loss combining two objectives, namely maximizing the error of the discriminator on the sensitive attribute while minimizing the damage done to the data:

J_San = α · |J_Disc(S, Ŝ) − 1/2| + (1 − α) · d_r(D, D̄),

in which d_r denotes the reconstruction loss. The term 1/2 is due to the objective of maximizing the error of the discriminator (i.e., recall that the optimal value of the BER is 0.5).
Concerning the reconstruction loss d_r, we first tried the classical Mean Absolute Error (MAE) and MSE losses. However, our initial experiments have shown that these losses produce datasets that are highly problematic, in the sense that the sanitizer always outputs the same profile whatever the input profile, which protects against attribute inference but renders the profile unusable. Therefore, we had to design a slightly more complex loss function. More precisely, we chose not to merge the respective losses of the individual attributes, yielding instead a vector of attribute losses whose components are iteratively used in the gradient descent.

Hence, each node of the output layer of the generator is optimized to reconstruct a single attribute from the representation obtained from the intermediate layers. The vector formulation of the loss is J⃗_San = (e_{A_1}, . . ., e_{A_d}, e_S)^T, in which e_a denotes the loss on attribute a, and the objective is to minimize all its components.

[Algorithm 1: pseudocode of the GANSan training procedure, alternating the sanitizer update (Lines 7−17) and the discriminator update (Lines 18−24) at each epoch, before saving the San and Disc states.]
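The vector-of-losses idea can be illustrated as follows. This is a sketch only: the per-attribute loss is assumed here to be a column-wise MSE, which may differ from the exact per-attribute losses used by GANSan.

```python
import numpy as np

def attribute_losses(original, sanitized):
    """One reconstruction loss per attribute (column) instead of a single
    merged scalar, so that each output node is optimized for its own
    attribute. Returns the vector (e_{A_1}, ..., e_{A_d})."""
    original = np.asarray(original, dtype=float)
    sanitized = np.asarray(sanitized, dtype=float)
    return ((original - sanitized) ** 2).mean(axis=0)

# Each component of the returned vector would then be back-propagated in
# turn, rather than back-propagating their merged mean once.
```

Keeping the losses separate prevents a large error on one attribute from being averaged away by small errors on the others, which is one way to discourage the degenerate "single median profile" solution described above.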
The details of the parameters used for the training are given in Appendices A and B.

Performance metrics
The performance of GANSan will be evaluated by taking into account both the fairness enhancement and the fidelity to the original data. With respect to fairness, we will quantify it primarily through the inability of a predictor Adv, hereafter referred to as the adversary, to infer the sensitive attribute (cf. Section 2), using its Balanced Error Rate (BER) [12] and its accuracy sAcc (cf. Section 2.2). We will also assess fairness using metrics (cf. Section 2) such as demographic parity (Equation 1) and equalized odds (Equation 2).
To measure the fidelity fid between the original and the sanitized data, we have to rely on a notion of distance. More precisely, our approach does not require any specific assumption on the distance used, although it is conceivable that it may work better with some than with others. For the rest of this work, we will instantiate fid with the L2 norm, as it does not differentiate between attributes.
Note however that a high fidelity is a necessary but not a sufficient condition for a good reconstruction of the dataset. In fact, as mentioned previously, early experiments showed that the sanitizer might find a "median" profile to which it maps all input profiles. Thus, to quantify the ability of the sanitizer to preserve the diversity of the dataset, we introduce a diversity measure: while fid quantifies how different the original and the sanitized datasets are, the diversity measures how diverse the profiles are within each dataset. We will also provide a qualitative discussion of the amount of damage for a given fidelity and fairness, to give a better understanding of the qualitative meaning of the fidelity. Finally, we evaluate the loss of utility induced by the sanitization by relying on the accuracy yAcc of a classifier on a prediction task. More precisely, the difference in yAcc between a classifier trained on the original data and one trained on the sanitized data can be used as a measure of the loss of utility introduced by the sanitization with respect to the classification task.
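The two dataset-level measures can be sketched as follows. The exact formulas are assumptions for illustration: fidelity is taken here as the average per-record L2 distance between original and sanitized profiles, and diversity as the average pairwise L2 distance within one dataset; the paper's own definitions may normalize these differently.

```python
import numpy as np

def avg_damage(original, sanitized):
    """Average per-record L2 distance between original and sanitized
    profiles (a lower distance corresponds to a higher fidelity fid)."""
    diff = np.asarray(original, dtype=float) - np.asarray(sanitized, dtype=float)
    return np.linalg.norm(diff, axis=1).mean()

def diversity(data):
    """Average pairwise L2 distance between the profiles of one dataset.
    A sanitizer collapsing everything onto a single 'median' profile
    drives this toward 0 even when the fidelity looks acceptable."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    dists = [np.linalg.norm(data[i] - data[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))
```

Comparing `diversity(sanitized)` to `diversity(original)` exposes the degenerate case that fidelity alone misses.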

EXPERIMENTAL EVALUATION
In this section, we describe the experimental setting used to evaluate GANSan as well as the results obtained.

Experimental setting
Dataset description. We have evaluated our approach on two datasets that are classical in the fairness literature, namely Adult Census Income and German Credit, both available on the UCI repository. Adult Census reports the financial situation of individuals, with 45222 records after the removal of rows with empty values. Each record is characterized by 15 attributes, among which we selected the gender (i.e., male or female) as the sensitive one and the income level (i.e., over or below 50K$) as the decision. German Credit is composed of 1000 applicants to a credit loan, described by 21 of their banking characteristics. Previous work [20] has found that using the age as the sensitive attribute, binarized with a threshold of 25 years to differentiate between old and young, yields the maximum discrimination based on DemoParity. In this dataset, the decision attribute is the quality of the customer with respect to his credit score (i.e., good or bad). Due to lack of space, we will mostly discuss the results on the Adult dataset in this section; the results obtained on German Credit were quite similar.
Training process. We will evaluate GANSan using several metrics, among which the fidelity fid, the BER and the demographic parity DemoParity (cf. Section 4.2). For this, we have conducted a 10-fold cross-validation during which the dataset is divided into ten blocks. During each fold, 8 blocks are used for the training, while another one is retained as the validation set and the last one as the test set.
We computed the BER and sAcc using the internal discriminator of GANSan and three external classifiers independent of the GANSan framework, namely Support Vector Machines (SVM) [7], Multilayer Perceptron (MLP) [30] and Gradient Boosting (GB) [15]. For all these external classifiers and all epochs, we report the space of achievable points with respect to the fidelity/fairness trade-off. Note that most approaches described in the related work (cf. Section 3) do not validate their results with independent external classifiers trained outside of the sanitization procedure.
Relying on three different families of classifiers is not foolproof, in the sense that there might exist another classifier, not tested here, that performs better, but it provides a higher confidence in the strength of the sanitization than simply relying on the internal discriminator.
For each fold and each value of α, we train the sanitizer during 40 epochs. At the end of each epoch, we save the state of the sanitizer and generate a sanitized dataset on which we compute the BER, sAcc and fid. Afterwards, a heuristic called HeuristicA is used to select the sanitized dataset that is closest to the "ideal point" (BER = 0.5, fid = 1). More precisely, HeuristicA selects the epoch maximizing

{−(0.5 − BER_min^e)² + fid_e, for e ∈ {1, . . ., MaxEpoch}},

with BER_min referring to the minimum value of the BER obtained with the external classifiers. In other words, for each value of α ∈ [0, 1], HeuristicA selects, among the sanitizers saved at the end of each epoch, the one achieving the highest fairness in terms of BER for the lowest damage. We will use the three families of external classifiers for computing yAcc, DemoParity and EqOddGap. We also used the same chosen test set to conduct a detailed analysis of its reconstruction quality (diversity and quantitative damage on attributes).
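The epoch selection can be sketched as below. The scoring expression is one plausible reading of HeuristicA (rewarding fidelity while penalizing the squared distance of the BER from 0.5), not the paper's verbatim formula.

```python
def select_epoch(bers, fids):
    """Pick the saved sanitizer closest to the ideal point
    (BER = 0.5, fid = 1): the epoch with the highest fidelity for a BER
    near 0.5. `bers` holds the per-epoch minimum BER over the external
    classifiers, `fids` the per-epoch fidelity."""
    scores = [fid - (0.5 - ber) ** 2 for ber, fid in zip(bers, fids)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, an epoch with (BER = 0.45, fid = 0.95) scores higher than one with (BER = 0.1, fid = 0.99), since the latter barely protects the sensitive attribute despite its better fidelity.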

Evaluation scenarios
Recall that GANSan takes as input the whole original dataset (including the sensitive and the decision attributes) and outputs a sanitized dataset (without the sensitive attribute) in the same space as the original one, but from which it is impossible to infer the sensitive attribute. In this context, the overall performance of GANSan can be evaluated by analyzing the reachable space of points characterizing the trade-off between the fidelity fid to the original dataset and the fairness enhancement. More precisely, during our experimental evaluation, we will measure the fidelity between the original and the sanitized data, as well as the diversity, both in relation to the BER and sAcc computed on this dataset.
However, in practice, our approach can be used in several situations that differ slightly from one another. In the following, we detail four scenarios that we believe represent most of the possible use cases of GANSan. To ease the understanding, we will use the following notation: the subscript tr (respectively ts) will denote the data in the training set (respectively the test set). For instance, {A}_tr and {Y}_tr denote respectively the attributes and the decision in the original training set, while {Ā}_tr and {Ȳ}_tr denote the attributes and the decision in the sanitized training set. Table 2 describes the composition of the training and test sets for these four scenarios.

[Table 1: Distribution of the different groups with respect to the protected attribute and the decision one for both the Adult Census Income and the German Credit datasets.]
Table 2: Scenarios envisioned for the evaluation of GANSan .Each set is composed of either the original attributes or their sanitized versions, coupled with either the original or sanitized decision.

Scenario 1: complete data debiasing. This setting corresponds to the typical use of the sanitized dataset, namely the prediction of a decision attribute through a classifier. The decision attribute is also sanitized, as we assume that the original decision holds information about the sensitive attribute. Here, we quantify the accuracy of the prediction of {Ȳ}_ts as well as the discrimination, represented by the demographic parity (Equation 1) and equalized odds (Equation 2).

Scenario 2: partial data debiasing. In this scenario, similarly to the previous one, the training and the test sets are sanitized, with the exception that the sanitized decision in both these datasets {Ā, Ȳ} is replaced with the original one {Ā, Y}. As this scenario is the one considered in the majority of papers on fairness enhancement [10,25,38], the accuracy loss in the prediction of the original decision {Y}_ts between this classifier and another trained on the original dataset without modifications {A}_tr is a straightforward way to quantify the utility loss due to the sanitization.

Scenario 3: building a fair classifier. This scenario was considered in [36] and is motivated by the fact that the sanitized dataset might introduce some undesired perturbations (e.g., changing the education level from Bachelor to PhD).
Thus, a third party might build a fair classifier but still apply it directly to the unperturbed data, to avoid the data sanitization process and the associated risks. More precisely, in this scenario a fair classifier is obtained by training it on the sanitized dataset {Ā}_tr to predict the sanitized decision {Ȳ}_tr. Afterwards, this classifier is tested on the original data ({A}_ts) by measuring its fairness through the demographic parity (Equation 1, Section 2). We also compute the accuracy of the fair classifier with respect to the original decision of the test set {Y}_ts.

Scenario 4: local sanitization. The local sanitization scenario corresponds to the local use of the sanitizer by the individual himself. For instance, the sanitizer could be part of a mobile phone application providing individuals with a means to remove some sensitive attributes from their profile before disclosing it to an external entity. In this scenario, we assume the existence of a biased classifier, trained to predict the original decision {Y}_tr on the original dataset {A}_tr. The user has no control over this classifier, but he can nonetheless perform the sanitization locally on his profile before submitting it to the existing classifier, similarly to the recruitment scenario discussed in the introduction. This classifier is applied on the sanitized test set {Ā}_ts and its accuracy is measured with respect to the original decision {Y}_ts, as well as its fairness, quantified by DemoParity.
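The four train/test compositions above can be restated compactly (a programmatic restatement of Table 2; the scenario keys are illustrative names, and each tuple is (attributes, decision), with "san" for the sanitized version and "orig" for the original one):

```python
# Composition of the sets used to train and test the downstream classifier
# in each evaluation scenario.
SCENARIOS = {
    "S1_complete_debiasing": {"train": ("san", "san"), "test": ("san", "san")},
    "S2_partial_debiasing":  {"train": ("san", "orig"), "test": ("san", "orig")},
    "S3_fair_classifier":    {"train": ("san", "san"), "test": ("orig", "orig")},
    "S4_local_sanitization": {"train": ("orig", "orig"), "test": ("san", "orig")},
}
```

Scenarios 3 and 4 are mirror images of each other: in Scenario 3 only the training side is sanitized, while in Scenario 4 only the test-time profile is.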

Experimental results
General results on Adult. Figure 2 describes the achievable trade-off between fairness and fidelity obtained on Adult. First, we can observe that, as expected, fairness improves when α increases. Even with α = 0 (i.e., maximum utility with no focus on fairness), we cannot reach a perfect fidelity to the original data, as we get at most fid_{α=0} ≈ 0.982 (cf. Figure 2). Increasing the value of α from 0 to a low value such as 0.2 provides a fidelity close to the highest possible (fid_{α=0.2} = 0.98) but leads to a poor BER (i.e., not higher than 0.2). Nonetheless, we still observe a fairness enhancement compared to the original data (fid_orig = 1, BER ≤ 0.15).
At the other extreme, in which α = 1, the data is sanitized without any consideration for fidelity. In this case, the BER is optimal as expected and the fidelity is 10% lower than the maximum achievable (fid_{α=1} ≈ 0.88). However, slightly decreasing the value of α, such as setting α = 0.96, allows the sanitizer to significantly remove the unwarranted correlations (BER ≈ 0.45) at a cost of 2.24% on fidelity (fid_{α=0.96} ≈ 0.95).
With respect to sAcc, the accuracy drops significantly when the value of α increases (cf. Figure 3). Here, the optimal value is the proportion of the majority class, and GANSan renders the accuracy of predicting S from the sanitized set closer to that value. However, even at the extreme α = 1, it is nearly impossible to reach this optimal value. Similarly to the BER, slightly decreasing α from this extreme value by setting α = 0.85 improves the sanitization while preserving a fidelity closer to the maximum achievable.
The quantitative analysis with respect to the diversity is shown in Figure 4. More precisely, the smallest drop of diversity obtained is 3.57%, which is achieved when we set α ≤ 0.2. Among all values of
α, the biggest drop observed is 36%. The application of GANSan therefore introduces an irreversible perturbation, as observed with the fidelity. This loss of diversity implies that the sanitization reinforces the similarity between sanitized profiles as α increases, rendering them almost identical or mapping the input profiles to a small number of stereotypes. When α is in the range [0.98, 1) (i.e., complete sanitization), 75% of categorical attributes have a proportion of modified records between 10 and 40% (cf. Figure 4).
For numerical attributes, we compute the relative change (RC) normalized by the mean of the original and sanitized values: RC(a, ā) = |a − ā| / ((a + ā)/2), in which a and ā denote respectively the original and sanitized values. We normalize the RC by the mean (since all values are positive), as it allows us to handle situations in which the original values are equal to 0. With the exception of the extreme sanitization (α = 1), at least 70% of records in the dataset have a relative change lower than 0.25 for most of the numerical attributes. Selecting α = 0.9875 ≥ 0.98 leads to 80% of records being modified with a relative change of less than 0.5 (cf. Figure 9 in Appendix C).
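The exact RC formula is not spelled out in the text; the following is a standard mean-normalized relative change consistent with the description (including the convention, assumed here, that two zero values count as no change):

```python
import numpy as np

def relative_change(original, sanitized):
    """Relative change normalized by the mean of the two values, which
    stays well defined when the original value is 0 (as long as both
    values are not 0)."""
    original = np.asarray(original, dtype=float)
    sanitized = np.asarray(sanitized, dtype=float)
    mean = (original + sanitized) / 2.0
    diff = np.abs(original - sanitized)
    # Both values 0 -> no change at all (assumed convention).
    return np.where(mean == 0, 0.0, diff / np.where(mean == 0, 1.0, mean))
```

For instance, a value changing from 4 to 6 yields RC = 2 / 5 = 0.4, while a value changing from 0 to any positive amount yields the maximal RC of 2.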
General results on German. Similarly to Adult, the protection increases with α. More precisely, α = 0 (maximum reconstruction) achieves a fidelity of almost 0.96. The maximum protection of BER = 0.5 corresponds to a fidelity of 0.81 and a sensitive attribute accuracy of sAcc = 0.76.
We can observe in Figure 6 that most values are concentrated on the 0.76 plateau, regardless of the fidelity and the value of α. We believe this is due to the high disparity of the dataset. The fairness on German credit is initially quite high, being close to 0.33. Nonetheless, we can observe three interesting trade-offs in Figure 5, each located at a different shoulder of the Pareto front. These trade-offs are A (BER ≈ 0.43, fid ≈ 0.94), B (BER ≈ 0.45, fid ≈ 0.84) and C (BER ≈ 0.5, fid ≈ 0.81), achievable with α = 0.6 for the first one and α = 0.9968 for the other two.
We review the diversity and the sanitization-induced damage on categorical attributes in Figure 7. As expected, the diversity decreases with α, rendering most profiles identical at α = 1. We can also observe some instabilities: higher α values (i.e., α ≥ 0.9) produce a shallow range of diversities, while smaller values yield a wider range. Such instability is mainly explained by the size and the imbalance of the dataset, which do not allow the sanitizer to correctly learn the distribution (such a phenomenon is common when training GANs with a small dataset). Nonetheless, most of the diversity results remain close to the original one, which is 0.51. The same trend is observed on the categorical attribute damage. For most values of α, the median damage is below or equal to 20%, meaning that we have to modify only two categorical columns in a record to remove the unwanted correlations. For the numerical damage, most columns have a relative change lower than 0.5 for more than 70% of the dataset, regardless of the value of α. Only the columns Duration in month and Credit amount have a higher damage. This is due to the fact that these columns have a very large range of possible values compared to the other columns (respectively 33 and 921), especially the column Credit amount, which also exhibits a nearly uniform distribution. Our reference points A, B and C have a median damage close to 10% for A and 20% for both B and C. The damage on categorical columns is also acceptable.
To summarize our results, GANSan is able to maintain an important part of the dataset structure despite the sanitization, making it usable for other analysis tasks. However, at the individual level, the perturbation might be more important for some profiles than for others. Future work will investigate the relationship between the position of a profile in the distribution and the damage introduced. For the different scenarios investigated hereafter, we fixed the value of α to 0.9875, which provides a nearly perfect level of sensitive attribute protection while leading to an acceptable damage on Adult. Due to space limitations, we do not discuss the results obtained on German; the scenario analysis is available in Appendix D.2. Scenario 1: complete data debiasing. In this scenario, we observe that GANSan preserves the accuracy of the dataset. More precisely, it increases the accuracy of the decision prediction on the sanitized dataset for all classifiers (cf. Figure 8, Scenario S1), compared to the original one, which is 0.86, 0.84 and 0.78 respectively for GB, MLP and SVM. This increase can be explained by the fact that GANSan modifies the profiles to make them more coherent with the associated decision, by removing correlations between the sensitive attribute and the decision. As a consequence, similar profiles in the protected and the default groups are assigned the same decision. In fact, nearly the same distribution of the decision attribute is observed before and after the sanitization, but some records' decisions are shifted (7.56% ± 1.23% of decisions shifted in the whole sanitized set, 11.44% ± 2.74% in the sanitized sensitive group for α = 0.9875). Such decision shifts could be explained by the similarity of those profiles to others with the opposite decision in the original dataset.
We also believe that the increase in accuracy is correlated with the drop in diversity. More precisely, if profiles become more similar to each other, the decision boundary might be easier to find.
The discrimination is reduced, as observed through DemoParity, EqOddGap_1 and EqOddGap_0, which all exhibit a negative slope. When the correlations with the sensitive attribute are significantly removed (α ≥ 0.6), those metrics also significantly decrease. For instance, at α = 0.9875, BER ≥ 0.48, yAcc = 0.96, DemoParity = 0.0453, EqOddGap_1 = 0.0286 and EqOddGap_0 = 0.0062 for GB, whereas the original demographic parity gap and equalized odds gaps are respectively DemoParity = 0.16, EqOddGap_1 = 0.083 and EqOddGap_0 = 0.060 (cf. Tables 5 and 6 in the appendices for more details). In this setup, FairGan [36] achieves a BER of 0.3862 ± 0.0036, an accuracy of 0.82 ± 0.01 and a demographic parity of 0.04 ± 0.02. Scenario 2: partial data debiasing. Somewhat surprisingly, we observe an increase in accuracy for most values of α. The demographic parity also decreases, while the equalized odds remain nearly constant (EqOddGap_1, green line in Figure 8). Table 3 compares the results obtained to other existing works from the state of the art. We include the classifier with the highest accuracy (MLP) and the one with the lowest (SVM). From these results, we can observe that our method outperforms the others in terms of accuracy, but the lowest demographic parity is achieved by the method proposed in [39] (DemoParity = 0.01), which is not surprising as this method is precisely tailored to reduce this metric.
Even though our method is not specifically constrained to mitigate the demographic parity, we can observe that it significantly improves it. Thus, while partial data debiasing is not the best application scenario for our approach, as the original decision might be correlated with the sensitive attribute, it still mitigates its effect to some extent. Scenario 3: building a fair classifier. The sanitizer helps to reduce discrimination based on the sensitive attribute, even when using the original data on a classifier trained on the sanitized one. As presented in the third row of Figure 8, as we force the system to completely remove the unwarranted correlations, the discrimination observed when classifying the original unperturbed data is reduced. On the other hand, the accuracy exhibits here the strongest negative slope among all the scenarios investigated.

Table 3: Comparison with other works on the basis of accuracy and demographic parity on Adult.
FairGan [36], which also investigated this scenario, achieves yAcc = 0.82 and DemoParity = 0.05 ± 0.01, whereas our best classifier in accuracy (GB) achieves yAcc = 0.72 ± 0.033 and DemoParity = 0.12 ± 0.06 for α = 0.9875. Scenario 4: local sanitization. In this setup, we observe that the discrimination is lowered as the coefficient α increases. Similarly to the other scenarios, the more the correlations with the sensitive attribute are removed, the higher the drop of discrimination as quantified by DemoParity, EqOddGap_1 and EqOddGap_0, and the lower the accuracy on the original decision attribute. For instance, with GB, we obtain yAcc = 0.83 ± 0.039 and DemoParity = 0.035 ± 0.022 at α = 0.9875 (the original values were yAcc = 0.86 and DemoParity = 0.16). With MLP, which has the best DemoParity, we observe yAcc = 0.77 ± 0.060 and DemoParity = 0.025 ± 0.017. This demonstrates that GANSan can be used locally, thus allowing users to contribute to large datasets by sanitizing and sharing their information without relying on any third party, with the guarantee that the sensitive attribute GANSan has been trained for is removed.
The drop of accuracy due to the local sanitization is 3.68% with GB (8% with MLP). Thus, for applications requiring a time-consuming training phase, using GANSan to sanitize profiles without retraining the classifier seems to be a good compromise.
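The group fairness metrics reported across the four scenarios (DemoParity, EqOddGap_1, EqOddGap_0) can be computed directly from predictions. The following is a minimal sketch assuming binary decisions and a binary sensitive attribute encoded as 0/1; the function names are ours, not from the paper's code.

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """|P(Yhat = 1 | S = 0) - P(Yhat = 1 | S = 1)|."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def equalized_odds_gap(y_true, y_pred, s, y_value):
    """Gap in P(Yhat = 1 | Y = y_value, S) across the two groups:
    y_value = 1 gives the true positive rate gap (EqOddGap_1),
    y_value = 0 gives the false positive rate gap (EqOddGap_0)."""
    y_true, y_pred, s = map(np.asarray, (y_true, y_pred, s))
    def rate(group):
        mask = (s == group) & (y_true == y_value)
        return y_pred[mask].mean()
    return abs(rate(0) - rate(1))
```

A value of 0 for all three gaps corresponds to a classifier whose positive rates are identical across the protected and default groups.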

CONCLUSION
In this work, we have introduced GANSan, a novel sanitization method inspired by GANs, achieving fairness by removing the correlations between the sensitive attribute and the other attributes of the profile. Our experiments demonstrate that GANSan can prevent the inference of the sensitive attribute while limiting the loss of utility, as measured in terms of the accuracy of a classifier learned on the sanitized data as well as the damage on the numerical and categorical attributes. In addition, one of the strengths of our approach is that it offers the possibility of local sanitization, modifying the attributes as little as possible while preserving the space of the original data (and thus its interpretability). As a consequence, GANSan is agnostic to subsequent uses of the data, as the sanitized data is not tied to a particular task.
While we have relied on three different types of external classifiers to capture the difficulty of inferring the sensitive attribute from the sanitized data, it is still possible that a more powerful classifier exists that could infer the sensitive attribute with higher accuracy. Note that this is an inherent limitation of all preprocessing techniques and not only of our approach. Nonetheless, as future work, we would like to investigate other families of learning algorithms to complete the range of external classifiers.
Finally, much work still needs to be done to assess the relationship between the different fairness notions, namely the impossibility of inference and the individual and group fairness.

APPENDICES A PREPROCESSING OF DATASETS
The preprocessing step consists first in one-hot encoding categorical attributes as well as numerical attributes with fewer than 5 values, followed by a scaling between 0 and 1.
In addition, on the Adult dataset, we need to apply a logarithm to the columns capital-gain and capital-loss prior to any other step, because those attributes exhibit a distribution close to a Dirac delta [8], with maximal values of respectively 9999 and 4356, and a median of 0 for both (respectively 91% and 95% of records have a value of 0). Since most values are equal to 0, the sanitizer would otherwise always nullify both attributes and the approach would not converge. Afterwards, a postprocessing step reversing the preprocessing ones is performed in order to remap the generated data to the original shape. The training rate represents the number of times an instance is trained during a single iteration. For instance, for an iteration i, the discriminator is trained with 100 × 50 = 5000 records while the sanitizer is trained with 1 × 100 = 100 records. The number of iterations is equal to iterations = dataset size / batch size. Our experiments were run for a total of 40 epochs. We varied the α value using a geometric progression:
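The preprocessing described above can be sketched as follows. The helper `preprocess` is hypothetical (the authors' exact encoder may differ), and `log1p` is used here as a convenient log transform that keeps the zero values finite:

```python
import numpy as np
import pandas as pd

def preprocess(df, skewed_cols=("capital-gain", "capital-loss")):
    """Sketch of the described pipeline: log-transform the near-Dirac
    columns, one-hot encode categorical columns and numerical columns
    with fewer than 5 distinct values, then min-max scale to [0, 1]."""
    df = df.copy()
    for col in skewed_cols:
        if col in df.columns:
            df[col] = np.log1p(df[col])  # log(1 + x) keeps the zeros finite
    onehot = [c for c in df.columns
              if df[c].dtype == object or df[c].nunique() < 5]
    df = pd.get_dummies(df, columns=onehot, dtype=float)
    # Min-max scaling; constant columns would divide by 0, so map them to 0.
    span = df.max() - df.min()
    return (df - df.min()) / span.replace(0, 1)
```

The postprocessing step mentioned above would invert this pipeline (rescale, take the argmax over each one-hot group, and apply expm1 to the log-transformed columns) to remap sanitized data to the original space.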

C EVALUATION OF ADULT
This appendix is composed of supplementary results of the evaluation of the Adult dataset.

C.2 Evaluation of group-based discrimination
Table 5 summarizes the results obtained in terms of discrimination, and Table 6 presents the sensitive attribute protection level for all classifiers. These results are computed with α = 0.9875.

C.3 Utility of GANSan
We present in Table 7 the utility of GANSan as measured in terms of the accuracy of the decision prediction, the fidelity and the diversity on Adult. In Tables 8 and 9, we present the records that have been maximally and minimally damaged by the sanitization.

D.1 Damage and qualitative analysis
Looking at the numerical column damage (Figure 10), we can observe that most columns have a relative change lower than 0.5 for more than 70% of the dataset, regardless of the value of α. Only the columns Duration in month and Credit amount have a higher damage. This is due to the fact that these columns have a very large range of possible values compared to the other columns (respectively 33 and 921), especially the column Credit amount, which also exhibits a nearly uniform distribution.
First of all, we can observe that for all scenarios, the accuracy is mostly stable as α increases, for all classifiers. The sanitization does not significantly affect the quality of prediction, which is mostly around 75%, 7.143% greater than the proportion of positive outcomes in the dataset. On scenarios S3 and S4, this observation contrasts with Adult, where the accuracy decreases as the protection coefficient increases. If we take a closer look at the fairness metrics as provided in Figure 12, we observe that DemoParity and EqOddGap_1 have a negative slope, which steepens with α. In contrast, EqOddGap_0 is rather unstable, especially when α > 0.8. S1: complete data debiasing. In this scenario, we observe that the sanitization renders the profiles in each decision group more easily separable, which in turn improves the accuracy, as we can observe. The sanitization also reduces the risk of discrimination, just as we have seen on the Adult dataset. S2: partial data debiasing. Even though the sanitized and original decisions do not share the same distribution, the sanitization is able to transform the dataset in such a way that it improves the classification performance of all classifiers. The discrimination, on the other hand, is almost constant, meaning that the original decision still preserves a certain amount of discrimination that is harder to remove from the non-sensitive attributes alone, especially on a small dataset. S3: building a fair classifier. Just as for the results observed on the Adult dataset, building a fair classifier by training it on sanitized data and testing/using it on unprocessed data proved to be less conservative of the accuracy. We observe a slight drop, from 0.75 to almost 0.65 for the first value of α, after which it stays stable across all α. The decision boundaries learned by the fair classifier cannot be directly transferred to another type of data, as they do not share the same distribution. Concerning the fairness metrics, we observe two behaviours: for α ≤ 0.9, the fairness metrics are nearly constant, in contrast to Adult, where they all seem to increase; the discrimination is reduced when we push the system close to the maximum (α > 0.9), but not to the extreme, where the discrimination increases. The increase for extreme values is due to the fact that the sanitization has almost completely perturbed the dataset, losing all of its structure. Thus, as this dataset is highly imbalanced both on the sensitive attribute distribution and on the decision one, all classifiers tend to predict the majority labels, which are in favor of the default group (cf. Table 10). S4: local sanitization. On this dataset, this scenario provides the most significant results. As a matter of fact, the accuracy of the classifier is almost not affected by the sanitization, while the discrimination is reduced. MLP provides the most unstable EqOddGap_0, and for all classifiers, we observe a reduction of DemoParity and EqOddGap_0, which becomes significant for higher values of α (α > 0.8). This result differs from Adult, where we observed a negative slope. A deeper analysis of this behaviour is left as future work.
Table 12 provides the quantitative results for these four scenarios for the values of α that correspond to points A and B.

Figure 1:
Figure 1: Overview of the framework of GANSan. The objective of the discriminator is to predict S from the output of the sanitizer. The two objective functions that the framework aims at minimizing are respectively the discriminator and sanitizer losses, namely J_Disc and J_San.

Figure 2 :
Figure 2: Fidelity/fairness trade-off on Adult. Each point represents the minimum possible BER over all the external classifiers. The fairness improves as α increases; a small value provides a low fairness guarantee, while a high one causes greater damage to the sanitized data.

Figure 3 :
Figure 3: Fidelity/fairness trade-off on Adult. Each point represents the minimum possible sAcc over all the external classifiers. sAcc decreases as α increases; a small value provides a low fairness guarantee, while a larger one usually introduces higher damage. Remark that even with α = 0, a small damage is to be expected. Points with fidelity = 1 (lower right) represent the BER on the original (i.e., unperturbed) dataset.

Figure 4 :Figure 5 :
Figure 4: Boxplots of the quantitative analysis of the sanitized datasets selected using HeuristicA. These metrics are computed on the whole sanitized dataset. Modified records correspond to the proportion of records with categorical attributes affected by the sanitization.

Figure 6 :
Figure 6: Fidelity/fairness trade-off on German Credit. Each point represents the minimum possible sAcc over all the external classifiers.

Figure 7 :
Figure 7: Diversity and categorical damage on German.

Figure 8 :
Figure 8: Accuracy (blue), demographic parity gap (orange) and equalized odds gap (true positive rate in green, false positive rate in red) computed for scenarios 1, 2, 3 and 4 (top to bottom), with the classifiers GB, MLP and SVM (left to right), on the Adult dataset. The greater the value of α, the better the fairness. Using only the sanitized data Ā (S1, S2) increases the accuracy, while a combination of the original (A) and sanitized (Ā) data decreases it.

Figure 9
Figure 9 summarizes the numerical damage on Adult computed with the formula detailed in Appendix ??.

Figure 9 :
Figure 9: Cumulative distribution of the relative change (x-axis) for numerical attributes, versus the proportion of records affected in the dataset (y-axis).

Figure 11 :
Figure 11: Accuracy (blue), demographic parity gap (orange) and equalized odds gap (true positive rate in green, false positive rate in red) computed for scenarios 1, 2, 3 and 4 (top to bottom), with the classifiers GB, MLP and SVM (left to right), on the German credit dataset.

Figure 12 :
Figure 12: Fairness metrics evaluated for the different scenarios on the German credit dataset.
). For instance, {Z}_tr, in which Z can either be A, Y, Ā or Ȳ, represents respectively the attributes of the original training set (not including the sensitive attribute). Protected group (S_x = S_0): Female (Adult), Young (German); default group (S_x = S_1): Male (Adult), Old (German); with proportions Pr(S = S_x).

Table 4
details the parameters of the classifiers that have yielded the best results respectively on the Adult and German credit datasets.

Table 4 :
Hyperparameter tuning for the Adult dataset.

Table 5 :
Equalized odds and demographic parity on Adult.

Table 6 :
Evaluation of GANSan 's sensitive attribute protection on Adult.

Table 7 :
Evaluation of GANSan's utility on the Adult Census dataset.

Table 8 :
Most damaged profiles for α = 0.9875 on the first and fourth folds. Only the perturbed attributes are shown.