Article

Fair Models for Impartial Policies: Controlling Algorithmic Bias in Transport Behavioural Modelling

by María Vega-Gonzalo 1,2,* and Panayotis Christidis 2
1 Centro de Investigación del Transporte (TRANSyT), Universidad Politécnica de Madrid, 28040 Madrid, Spain
2 Joint Research Centre (JRC), European Commission, 41092 Seville, Spain
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(14), 8416; https://doi.org/10.3390/su14148416
Submission received: 30 May 2022 / Revised: 30 June 2022 / Accepted: 5 July 2022 / Published: 9 July 2022
(This article belongs to the Section Psychology of Sustainability and Sustainable Development)

Abstract:
The increasing use of new data sources and machine learning models in transport modelling raises concerns with regard to potentially unfair model-based decisions that rely on gender, age, ethnicity, nationality, income, education or other socio-economic and demographic data. We demonstrate the impact of such algorithmic bias and explore the best practices to address it using three representative supervised learning models of varying levels of complexity. We also analyse how different kinds of data (survey data vs. big data) could be associated with different levels of bias. The methodology we propose detects the model’s bias and implements measures to mitigate it. Specifically, three bias mitigation algorithms are implemented, one at each stage of the model development pipeline—before the classifier is trained (pre-processing), when training the classifier (in-processing) and after the classification (post-processing). As these debiasing techniques have an inevitable impact on the accuracy of predicting the behaviour of individuals, the comparison of different types of models and algorithms allows us to determine which techniques provide the best balance between bias mitigation and accuracy loss for each case. This approach improves model transparency and provides an objective assessment of model fairness. The results reveal that mode choice models are indeed affected by algorithmic bias, and we show that the implementation of off-the-shelf mitigation techniques allows us to achieve fairer classification models.

1. Introduction

Machine learning (ML) models have gained considerable importance in the development of behavioural transport models that aim to predict individuals’ mode choices [1]. One of the main characteristics of ML techniques is their ability to determine the relationships between features on the basis of patterns revealed by the data, instead of establishing a theoretical framework to describe and quantify causality, as traditional discrete choice models do [2]. ML models are in general more flexible in terms of required input data types and structures, which highlights the potential of new sources of mobility data [3]. The use of multimodal platforms for trip planning, social media networks, transport cards, navigation devices, etc., has resulted in the generation of unprecedented volumes of personal mobility data, that could be of great use for transport planning and policy making [4].
Although the combination of big data and machine learning techniques for travel behaviour modelling promises new opportunities for the field, it also poses several risks that must be considered in order to ensure that these models efficiently fulfil the task they were designed for. One of these risks is algorithmic bias or unfairness, which occurs when models make systematic errors on the classification of unprivileged socio-economic groups. This lack of fairness is caused by the use of training data that misrepresent the behaviour of these groups, reflecting discriminatory beliefs or attitudes occurring in society [5]. One of the first examples of this problem was the COMPAS model—an algorithm used in multiple U.S. jurisdictions to predict the risk of recidivism for criminal defendants. The analysis of the classification errors by race revealed that, despite race not being included as an explanatory feature, black defendants were twice as likely as whites to be labelled with a high risk of reoffending but not actually reoffend [6]. As in this case, machine learning is present in many domains of society in which the predicted outcomes can significantly affect an individual’s life opportunities (loan default prediction, college admissions, recruitment processes, etc.). For this reason, the need to ensure that these automated decisions are not biased has become an urgent topic for the artificial intelligence community [7].
As of yet, the problem of algorithmic bias in transport behaviour modelling has received only limited attention, presumably due to the still limited adoption of machine learning techniques in the field. However, using biased models for transport policy development might lead to the deployment of inefficient or even unfair policies that would perpetuate or enhance inequality and discrimination. Therefore, to ensure the success of transport policy design, it is necessary that the detection and mitigation of algorithmic bias becomes part of the modelling process. This work aims to provide further insights into how existing mitigation algorithms can be implemented in the development of mode choice models. In this sense, two research questions have been formulated. First, which kind of existing bias mitigation algorithm provides the best balance between bias mitigation and accuracy loss? Second, could the use of training data acquired through uncontrolled methods (opportunistic data) increase the bias in the classification results? To answer these questions, three prediction models trained with different kinds of data have been tested for the existence of bias and thereafter subjected to three algorithmic bias mitigation techniques implemented at different stages of the model pipeline (pre-processing, in-processing and post-processing).
This paper is structured as follows. After the introductory chapter, a literature review (Section 2) addressing the existing works about machine learning and transport modelling (Section 2.1), algorithmic fairness (Section 2.2), bias mitigation algorithms (Section 2.3) and inequality in transport models (Section 2.4) is presented. Section 3 explains the materials and methods that have been used during this research. Specifically, Section 3.1 presents an overview of the methodology, Section 3.2 explains the fairness metrics and bias mitigation algorithms that have been chosen and Section 3.3 describes the models that have been analysed, as well as the data that have been used to create those models. Section 4 presents the results obtained from the bias mitigation process and Section 5 discusses those results. Finally, Section 6 gathers the conclusions drawn from this work and proposes further research lines.

2. Literature Review

2.1. Machine Learning and Transport Modelling

As machine learning techniques are gaining ground in transport modelling, an increasing variety of works using different algorithms fed with different kinds of input data can be found in the literature [8]. These techniques have been applied for a broad variety of goals, such as detecting changes in travel patterns [9], flow prediction [10] or travel time estimation [11].
One of the most frequent applications of ML in the transport modelling field is mode choice prediction [12,13,14]. In general, ML techniques have been shown to achieve a higher prediction accuracy than traditional discrete choice models [15,16], partly due to their greater flexibility in capturing underlying interactions between variables [2]. At the same time, they also present some drawbacks with respect to traditional econometric models, such as their limited interpretability [17,18], the delivery of elasticities that are incongruent with the existing behavioural literature [19] or the difficulty of consistently adapting ML algorithms to discrete choice data [20]. These limitations have been addressed in the literature by comparing different ML methodologies [21,22,23,24] or by comparing the performance and variable importance of ML algorithms with different logit models [18,25,26].
The use of ML for transport modelling applications already includes a noticeable number of contributions that propose and compare different methodologies using a wide variety of data types. These works are a key contribution to gaining a deeper understanding of the different levels of accuracy, interpretability and behavioural coherence that can be obtained. However, a problem that has been frequently observed in ML models [27,28] but seldom mentioned in the literature on ML and mode choice modelling is the issue of algorithmic fairness. This methodological challenge must also be considered when using this kind of technique for transport modelling.

2.2. Fairness Definitions and Metrics

Defining algorithmic fairness can be a controversial issue in which several moral, legal and technical aspects should be considered. From a legal point of view, lack of fairness (i.e., discrimination) occurs when a member of a legally protected group suffers prejudicial treatment. This can be a consequence of disparate treatment or of disparate impact. The former occurs when the membership of these groups is explicitly considered in the decision-making process. The latter happens when, even if the sensitive attribute is not considered for deciding, the outcomes are unequal across different groups [29].
From the moral perspective, the definition of group fairness is related to how differences across socio-economic groups are understood. In this regard, Friedler et al. [30] provide two definitions of fairness corresponding to two different conceptions of the world. On the one hand, it can be considered that all groups are equal and that the differences found in the data are a consequence of a structural bias that affects data acquisition. This worldview receives the name of “We Are All Equal”. On the other hand, it can be assumed that real differences exist in the behaviour of different groups and that researchers can capture them without bias. This worldview is referred to as “What You See Is What You Get”.
The technical dimension of algorithmic group fairness aims to bring together the legal and moral considerations explained above by building metrics that allow us to detect discrimination in ML algorithms. A comprehensive overview of the existing fairness metrics can be found in Majumder et al. [31] and Verma and Rubin [32].
The first worldview (considering that all groups are equal) implies that positive rates must be equal across groups and therefore the predicted outcome must be independent from the sensitive attribute. This consideration leads to the definition of fairness as Statistical Parity also called Equal Acceptance Rate, which requires all groups to have an equal probability of being assigned the favourable outcome, no matter what their true label is [33].
According to the second worldview (considering that there are behavioural differences across groups), a model would only be fair when the classification accuracy is equal across these groups (i.e., equality of error rates must be achieved). As different types of error rates can be evaluated when assessing the accuracy of a model, a wider range of fairness definitions is included in this group. Predictive Parity considers that a model is fair when the True Positive Rate (TPR) is equal across groups [34]. Predictive Equality uses equality of the False Positive Rate (FPR) across groups to evaluate fairness [35]. The Equality of Opportunity criterion considers the equality of the False Negative Rate (FNR) across groups [36]. The Equalized Odds definition requires both the TPR and the FPR to be equal between the different groups [36,37]. The decision of which definition, and consequently which fairness metric, is more appropriate to evaluate the existence of bias depends to a large extent on the purpose of the model. Models supporting decisions that might limit an individual’s life opportunities (e.g., default prediction or recidivism prediction) should prioritize the minimisation of the FPR for the discriminated groups. Conversely, models whose outcome would be favourable for the individual should minimise the FNR for these groups [38].
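As an illustration of these error-rate-based definitions, the sketch below computes the per-group TPR, FPR and FNR that the metrics above compare. It is a minimal example of ours, not code from the cited works; the inputs are assumed to be NumPy-compatible arrays of binary labels and a boolean group mask.

```python
# Minimal sketch (not from the cited works): per-group error rates behind the
# definitions above. y_true/y_pred are binary labels, group_mask selects one group.
import numpy as np
from sklearn.metrics import confusion_matrix

def group_error_rates(y_true, y_pred, group_mask):
    """Return (TPR, FPR, FNR) for the observations selected by group_mask."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group_mask = np.asarray(group_mask, dtype=bool)
    tn, fp, fn, tp = confusion_matrix(y_true[group_mask], y_pred[group_mask],
                                      labels=[0, 1]).ravel()
    tpr = tp / (tp + fn) if (tp + fn) else float('nan')
    fpr = fp / (fp + tn) if (fp + tn) else float('nan')
    fnr = fn / (fn + tp) if (fn + tp) else float('nan')
    return tpr, fpr, fnr

# Equality of Opportunity compares FNR (equivalently TPR) across groups,
# Predictive Equality compares FPR, and Equalized Odds requires both to match.
```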

2.3. Bias Mitigation Algorithms

The existing algorithms for bias mitigation can be classified according to the stage of the pipeline in which they are implemented [7].
A first group of techniques is based on the pre-processing of the input data before the classifier is trained. The simplest approach to bias mitigation is the suppression of the sensitive attribute, known as “Fairness through unawareness”. However, the correlation between the sensitive attribute and the other attributes has been observed to make this method ineffective [39]. Other pre-processing approaches involve the transformation of the data, aiming to remove the bias by mapping the original dataset to a new data space that resembles the original as much as possible [40]. These methods include the modification of labels for some of the individuals [40,41], resampling the data [42], mapping the data to a new representation space [43,44] or assigning weights to the observations [45].
The in-processing bias mitigation algorithms address the issue at the training stage by introducing regularisation terms in the loss function or by subjecting the optimisation problem to fairness constraints. Kamishima et al. [46] introduce a regulariser that accounts for the mutual information between the protected feature and the predicted variable, in such a way that the larger the correlation is, the less influence this feature has on the prediction (Prejudice Remover). B. H. Zhang et al. [47] propose a debiasing technique for Neural Networks that adds a layer predicting the protected attribute as a function of all the other attributes (Adversarial Debiasing). Fairness is achieved when the predicted outcome of the main model has no impact on the adversary’s classification. Another group of in-processing algorithms achieves fairness through the introduction of constraints in the optimisation problem [48,49,50]. Finally, some works propose the creation of new classification algorithms that consider both the accuracy and a selected fairness metric in their loss function, instead of modifying already existing algorithms [35,51,52].
Post-processing techniques modify the output provided by a calibrated classifier to make the predictions fairer while minimising the impact on accuracy. This kind of algorithm has fewer contributions in the literature, probably because it may be sub-optimal if non-discrimination constraints are not relaxed [53]. Most of the proposed methods compute the probabilities with which the predicted labels should be changed to optimise a chosen fairness metric [36,54,55].

2.4. Transport Modelling and Equality

The differences in transport choices and preferences across socio-economic groups, as well as the drivers of these differences, have been frequently addressed in the literature. Women have been observed to make significantly different choices when deciding when, why and how to use the available transport modes [56,57]. Race also seems to play a role in access to transport opportunities, and consequently in decisions on how to move [58,59,60]. Several authors have also pointed out the effect of low income, low level of education and suburban residential location on transport behaviour [61,62,63].
Understanding why different groups make different choices is undoubtedly the first step towards creating policies that reduce the transport gap between privileged and unprivileged groups [64]. Nevertheless, if the modelling framework used to analyse individuals’ behaviours misrepresents specific socio-economic groups, the results could cause researchers and policy makers to draw biased conclusions. Therefore, accounting for algorithmic bias when developing transport choice models is fundamental to advance towards transport equality.
To the best of the authors’ knowledge, the main work in the field of algorithmic fairness in transport models is the study by Zheng et al. [65]. In that work, a Binary Logistic Regression and a Deep Neural Network are employed to predict travel behaviour using three different travel surveys. The bias is mitigated by introducing a regularisation term in the loss function that accounts for the correlation between the predicted probability distributions of two socio-economic groups. The contribution of this work with respect to the previous literature and, in particular, to the work of Zheng et al. [65], is twofold. First, in this work, three bias mitigation techniques have been applied at three different stages of the model development process, which allows us to determine which type of algorithm provides the best balance between bias mitigation and accuracy loss for the models under study. Second, while the models analysed in Zheng et al. [65] are solely based on survey data, this work includes both a model built using survey data and a model which employs opportunistic data containing user information from a multimodal trip planner. The consideration of these two data sources allows us to analyse whether the impact of algorithmic bias might depend on the kind of data used to train the model. Therefore, this research aims to evaluate the performance of different types of bias mitigation algorithms and to analyse whether the level of bias of a model, and the extent to which mitigation algorithms are able to remove it, depends on the method of acquiring the training data (survey data vs. multimodal trip planning data).

3. Materials and Methods

3.1. Methodology Overview

Transport models that employ ML techniques to predict human transport behaviour can, as any other ML model [28], provide unfair predictions that lead to discrimination against disadvantaged socio-economic groups. As the ML community is starting to gain awareness of the impact of this problem and the urgency to tackle it, several bias mitigation techniques are being proposed. However, these techniques have only recently been implemented and made available as open tools for researchers of all fields to analyse the existence of bias in their algorithms and mitigate it. The methodology designed in this paper employs the AIF360 library [38], developed by IBM to detect and mitigate algorithmic bias.
The bias mitigation methodology has been applied to three different models. The first one is COMPAS, a widely known model in the field of algorithmic fairness, which has been selected as a benchmark to test our methodology, so that the obtained results are comparable to a certain extent with already existing works. The second model has been built using survey data and an XGBoost classifier as modelling tools. Finally, the third one is an XGBoost classifier that has been trained on user data from a multimodal trip planner of the city of Beijing. From the socio-economic features included as predictors in each of those models, two protected attributes (one for the benchmark model) were identified as potentially discriminatory. Thereupon, two bias metrics based on two different definitions of fairness were computed for each protected attribute. Next, three mitigation algorithms (each of them at a different stage of the pipeline) were implemented for each model–protected attribute pair, allowing us to compare the bias reduction and the impact on the prediction accuracy achieved by each of the debiasing algorithms.
It is noteworthy that both the biased and the de-biased models were trained with a training dataset, while the classification accuracy and bias metrics are computed using the classification results of an unseen (test) dataset. Given that the fairness metrics vary considerably across different test and training sets, a K-fold cross-validation procedure with five folds has been employed to take this variability into account when analysing the results. Therefore, the values of the fairness metrics are presented as the average and standard deviation of the measurements.
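As an illustration of this evaluation protocol, the sketch below shows a five-fold cross-validation loop in which the fairness metrics are computed on each held-out fold and summarised as mean and standard deviation. The classifier, the simple metric helpers and the array inputs are illustrative stand-ins, not the authors’ implementation.

```python
# Minimal sketch of the 5-fold evaluation protocol described above; the
# logistic regression is a stand-in for the actual classifiers used in the paper.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def spd(y_pred, unpriv):                 # Statistical Parity Difference
    return y_pred[unpriv].mean() - y_pred[~unpriv].mean()

def eod(y_true, y_pred, unpriv):         # Equal Opportunity Difference (TPR gap)
    tpr = lambda m: y_pred[m & (y_true == 1)].mean()
    return tpr(unpriv) - tpr(~unpriv)

def cv_fairness(X, y, unpriv, seed=0):
    """X, y, unpriv are NumPy arrays; unpriv flags the unprivileged group."""
    spd_vals, eod_vals = [], []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=seed).split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        y_hat = clf.predict(X[te])
        spd_vals.append(spd(y_hat, unpriv[te]))
        eod_vals.append(eod(y[te], y_hat, unpriv[te]))
    # report each metric as mean and standard deviation across folds
    return ((np.mean(spd_vals), np.std(spd_vals)),
            (np.mean(eod_vals), np.std(eod_vals)))
```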

3.2. Fairness Metrics and Bias Mitigation Algorithms

The measurement of model fairness considers two metrics, each of them representing different definitions of fairness. Both metrics have a differential nature, zero being the value at which the model would be considered unbiased.
(1) The Statistical Parity Difference (SPD) measures the difference in the probability of being labelled with the favourable outcome between an individual that belongs to the unprivileged group and an individual that belongs to the privileged group. The SPD represents a definition of fairness that does not consider the accuracy of the predictions (i.e., this metric does not consider the true label of an individual), but only the parity in the probability of being assigned the positive label. Equation (1) shows the mathematical expression of this metric, where D represents the sensitive attribute.
$$SPD = P(\hat{Y} = 1 \mid D = \text{unprivileged}) - P(\hat{Y} = 1 \mid D = \text{privileged}) \quad (1)$$
(2) The Equal Opportunity Difference (EOD) [36] is defined as the difference in the TPR between the unprivileged and the privileged group. The EOD measures how accurate the model is when correctly predicting a favourable label for the unprivileged group with respect to the privileged group. Equation (2) shows the mathematical expression of this metric (a minimal code sketch computing both metrics follows Equation (2)).
$$EOD = TPR_{D = \text{unprivileged}} - TPR_{D = \text{privileged}} \quad (2)$$
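Both metrics can be computed directly with the AIF360 library used in this work. The following is a minimal sketch, not the authors’ code; the toy data and the attribute name 'gender' are illustrative only.

```python
# Minimal sketch: computing SPD and EOD with AIF360 on a toy example.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import ClassificationMetric

df_true = pd.DataFrame({'gender': [0, 0, 1, 1, 0, 1],
                        'label':  [1, 0, 1, 0, 1, 1]})
df_pred = df_true.copy()
df_pred['label'] = [0, 0, 1, 0, 1, 1]            # model predictions

to_blds = lambda d: BinaryLabelDataset(df=d, label_names=['label'],
                                       protected_attribute_names=['gender'])
metric = ClassificationMetric(to_blds(df_true), to_blds(df_pred),
                              unprivileged_groups=[{'gender': 0}],
                              privileged_groups=[{'gender': 1}])
print(metric.statistical_parity_difference())    # Equation (1)
print(metric.equal_opportunity_difference())     # Equation (2): TPR_unpriv - TPR_priv
```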
The metrics described above were measured for the original classification model and after the implementation of each of the three bias mitigation algorithms that have been selected. Figure 1 shows the pipeline of the methodology that was implemented to mitigate the bias. As can be observed, three algorithms (represented in the diagram by trapezoidal shape elements) have been applied separately at different stages of the pipeline. (1) The Reweighting Algorithm [42,45] is applied to the training data in the pre-processing stage, before the classifier is trained; (2) the MetaFairClassifier [51] is an in-processing algorithm that substitutes the baseline classifier; and (3) the Calibrated Equalized Odds [55] is a post-processing algorithm that is applied to the prediction provided by the baseline classifier.

3.2.1. Reweighting

The Reweighting Algorithm intends to mitigate the bias by assigning weights to the tuples “protected attribute–label”. The weight (W_i) assigned to each observation (x_i) is computed as the probability of the observation having a protected attribute value (D) and a label (Y) assuming independence between the two, divided by the observed probability of that combination (Equation (3)).
$$W_i = \frac{P_{\text{expected}}(D = x_i^D \wedge Y = x_i^Y)}{P_{\text{observed}}(D = x_i^D \wedge Y = x_i^Y)} \quad (3)$$
The main advantage of this method is that, unlike most of the available pre-processing techniques, it avoids modifying the labels. The baseline classifier is provided with a weighted dataset that preserves all values from the original dataset, while facilitating the construction of a fair model.
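A minimal sketch of this pre-processing step using AIF360’s implementation (the library spells it “Reweighing”) is shown below; the dataset object and the attribute name 'gender' are illustrative assumptions carried over from the earlier sketch.

```python
# Minimal sketch of the pre-processing step with AIF360's Reweighing.
from aif360.algorithms.preprocessing import Reweighing

# train_blds is a BinaryLabelDataset built as in the previous sketch.
rw = Reweighing(unprivileged_groups=[{'gender': 0}],
                privileged_groups=[{'gender': 1}])
train_rw = rw.fit_transform(train_blds)        # computes the Equation (3) weights
weights = train_rw.instance_weights            # one weight per observation

# The baseline classifier is then trained on the unchanged features and labels,
# e.g. XGBClassifier().fit(X_train, y_train, sample_weight=weights).
```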

3.2.2. MetaFairClassifier

The MetaFairClassifier is a Bayesian classifier that incorporates fairness metrics as constraints in the optimisation of the loss function. In the AIF360 library, the implementation of this algorithm offers two possible fairness metrics to be used as constraints: the Disparate Impact (DI, Equation (4)) and the False Discovery Rate (FDR) Difference (Equation (5)). As explained before, DI, together with SPD, only considers the rate of positive outcomes without considering the accuracy of the classification. The FDR is defined as the proportion of instances wrongly classified as positive with respect to the total number of positive predictions, and the FDR difference is the difference between this ratio for the privileged and the unprivileged group. In the case of travel mode choice models, it can be assumed that different groups behave differently and the goal of debiasing is to ensure that models are equally accurate across groups. This equality target is better expressed through the FDR difference. Thus, this metric has been used as a constraint.
$$DI = \frac{P(\hat{Y} = 1 \mid D = \text{unprivileged})}{P(\hat{Y} = 1 \mid D = \text{privileged})} \quad (4)$$
$$FDR\ \text{difference} = FDR_{D = \text{unprivileged}} - FDR_{D = \text{privileged}} = \left(\frac{FP}{FP + TP}\right)_{D = \text{unprivileged}} - \left(\frac{FP}{FP + TP}\right)_{D = \text{privileged}} \quad (5)$$
Additionally, the MetaFairClassifier allows a variable level of bias repair varying from 0 (no debiasing) to 1 (full repair). As perfect fairness has in general a great impact on the performance of the classifier, a balance between debiasing and loss of accuracy should be reached through the bias repair parameter. For the tuning of this parameter, the value that provides the best trade-off between balanced accuracy and FDR was selected following an iterative process.
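The sketch below illustrates this tuning loop with AIF360’s MetaFairClassifier, sweeping the repair level tau under the FDR constraint. The dataset objects, the attribute name and the tau grid are illustrative assumptions, not the authors’ exact procedure.

```python
# Minimal sketch of the in-processing step: MetaFairClassifier with the FDR
# constraint and an iterative sweep of the repair level tau.
from aif360.algorithms.inprocessing import MetaFairClassifier
from aif360.metrics import ClassificationMetric

results = {}
for tau in [0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]:
    mfc = MetaFairClassifier(tau=tau, sensitive_attr='gender', type='fdr')
    pred = mfc.fit(train_blds).predict(test_blds)
    m = ClassificationMetric(test_blds, pred,
                             unprivileged_groups=[{'gender': 0}],
                             privileged_groups=[{'gender': 1}])
    bal_acc = 0.5 * (m.true_positive_rate() + m.true_negative_rate())
    fdr_diff = m.difference(m.false_discovery_rate)   # Equation (5)
    results[tau] = (bal_acc, fdr_diff)
# choose the tau with the best balanced-accuracy / FDR-difference trade-off
```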

3.2.3. Calibrated Equalized Odds

Equal Odds is a fairness condition that is satisfied when none of the potentially discriminated groups suffer from a disproportionate FPR or FNR. This condition can only be satisfied by perfect classifiers. However, when only one of the error rates (either false positive or false negative) is considered, it is possible to achieve the condition while minimising the loss of accuracy. The Calibrated Equalized Odds algorithm modifies the scores provided by an already calibrated classifier for a random set of individuals from the unprivileged group and assigns them new scores that minimise the error rate difference between groups [55]. In this work, the error rate used as a cost constraint for each of the protected attributes is the one that yielded the best trade-off between bias mitigation and accuracy loss.
It is worth noting that the Calibrated Equalized Odds technique requires an additional partition of the original dataset (see validation dataset in Figure 1) to train the post-processor and thereafter apply it to the unseen test dataset.
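A minimal sketch of this post-processing step with AIF360’s CalibratedEqOddsPostprocessing is given below, fitted on the validation split and applied to the test predictions. The variable names, group definitions and seed are illustrative assumptions.

```python
# Minimal sketch of the post-processing step with Calibrated Equalized Odds.
from aif360.algorithms.postprocessing import CalibratedEqOddsPostprocessing

# valid_blds / test_blds hold the true labels; valid_pred / test_pred are copies
# whose `scores` field contains the calibrated classifier's predicted
# probabilities on each split.
cpp = CalibratedEqOddsPostprocessing(unprivileged_groups=[{'gender': 0}],
                                     privileged_groups=[{'gender': 1}],
                                     cost_constraint='fnr',   # or 'fpr'/'weighted'
                                     seed=42)
cpp = cpp.fit(valid_blds, valid_pred)        # learn mixing rates on the validation split
test_pred_fair = cpp.predict(test_pred)      # relabelled test predictions
```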

3.3. Description of the Models and Data

In this section, a description of the models that have been analysed is provided. The main elements of the description are additionally summarized in Table 1.

3.3.1. COMPAS

The COMPAS model is widely known in the algorithmic fairness literature and has been frequently used in scientific works on algorithmic fairness [35,66,67,68]. In this research, a simplified version of the COMPAS model serves as a benchmark to provide a reference for the fairness metric values before and after the implementation of the mitigation algorithms.
The COMPAS model predicts the probability of reoffending as a function of some characteristics of the individual and their previous criminal behaviour. The characteristics of the individual that are included are gender, race and age. Race is included as a binary variable that distinguishes African Americans (race = 1) from Caucasians (race = 0). Age is a three-category variable that has been encoded using three dummy variables for less than 25 years old, between 25 and 45 years old and more than 45 years old. The information on previous criminal behaviour includes a categorical variable with three categories indicating the number of prior arrests (no prior arrests, between one and three prior arrests, more than three priors), as well as a binary variable specifying whether the charges were due to felony or misdemeanour. The dataset contains information for 5278 individuals. Table 2 shows the sample distribution of the data used for the COMPAS model. The sample is clearly unbalanced for the three socio-economic variables—80.5% of the defendants are male, 60.2% are African Americans and almost 60% are between 25 and 45 years old. Regarding the number of priors, the sample is well distributed across the three categories that have been defined (31.6% of the individuals have no priors, 37.0% have between one and three prior charges and 31.4% have more than three charges). On the other hand, most of the defendants (65.2%) are charged with a felony, while only 34.8% are charged with a misdemeanour.
The modelling framework used for the COMPAS case is a logistic regression model, and the protected attribute that was identified as potentially discriminatory is race, where Caucasians are the privileged group and African Americans the unprivileged group.
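For illustration, a simplified COMPAS-style baseline of the kind described above could be fitted as in the sketch below; the file name and column names are hypothetical placeholders, not the actual dataset schema.

```python
# Minimal sketch of a simplified COMPAS-style logistic regression baseline
# (not the authors' exact code); column and file names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

features = ['sex', 'race',                          # race: 1 = African American
            'age_lt_25', 'age_25_45', 'age_gt_45',  # dummy-coded age categories
            'priors_none', 'priors_1_3', 'priors_gt_3',
            'charge_felony']                        # 1 = felony, 0 = misdemeanour
df = pd.read_csv('compas_simplified.csv')           # hypothetical input file
clf = LogisticRegression(max_iter=1000).fit(df[features], df['two_year_recid'])
```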

3.3.2. Active Modes Model

This model was built by Pisoni et al. [69] as part of their research on the role of active modes in the reduction of urban mobility external costs. It has been trained with data from the 2018 EU Survey that collects information on citizens’ mobility and travel habits across the 28 countries belonging to the European Union. It includes a total of 26,500 respondents—1000 per country, except in the case of Malta, Luxembourg and Cyprus, for which only 500 responses were gathered.
The selected modelling framework is an XGBoost classifier that predicts the probability of an individual choosing active modes (walking or biking) for their most frequent trip. The variables included as predictors are the socio-economic features of the individual, including gender, age, education, type of employment, household income and country of residence; built environment characteristics, including the size of the city of residence and whether the individual lives in the city centre or in the suburbs; and mobility characteristics and habits, including vehicle ownership, the frequency and distance of the most frequent trip and whether the trip is made within the urban area of residence, to another urban area or to a non-urban area.
Table 3 shows the variables included in the model, together with their distributions in the sample. The number of men and women in the sample is balanced, with nearly 50% of the sample in each category. Most of the surveyed individuals (84.4%) have an upper secondary or higher level of education and are employed full time (63.14%). Regarding the household income, the biggest share of individuals belongs to the middle categories—12.7% declared a high middle income, 50.3% a middle income and 22.3% a low middle income. The distribution across the different urban environments shows that almost half of the sample (44.2%) lives in small cities, while the rest lives mostly in rural areas (23.8%), followed by mid-sized (19.0%) and big cities (12.9%). Those who live in urban areas are evenly distributed between the city centre (47.9%) and the suburbs (52.1%). The differences with respect to the European averages are small—according to the Eurostat classification for the level of urbanization, 39.2% of Europeans live in cities, 31.6% in towns or suburbs and 29.1% in rural areas. Therefore, the sample slightly overrepresents residents in urban areas [70]. The characteristics of the most frequent trip show that most of the respondents make this trip every day or every working day (65.2%), that most of them travel a distance between 3 and 20 km (58.8%) and that almost half (49.3%) of the sample has the urban area of residence as the destination. Finally, the use of active modes is rather unbalanced, with 73.2% of the sample not using this kind of transportation for their most frequent trip.
The protected attributes that are studied for this model are (1) gender, with females and males as the unprivileged and privileged groups, respectively; and (2) residential location, with individuals living in the city suburbs as the unprivileged group and residents in the city centre as the privileged group.
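As an illustration of the modelling set-up described in this subsection, the sketch below fits an XGBoost classifier to dummy-coded survey features; the file name, feature names and hyperparameters are placeholders, not those of Pisoni et al. [69].

```python
# Minimal sketch of an XGBoost active-modes classifier on survey data
# (illustrative only); file, column names and hyperparameters are placeholders.
import pandas as pd
from xgboost import XGBClassifier

df = pd.read_csv('eu_survey_2018.csv')                     # hypothetical input file
X = pd.get_dummies(df[['gender', 'age', 'education', 'employment', 'income',
                       'country', 'city_size', 'suburb', 'car_ownership',
                       'trip_frequency', 'trip_distance', 'trip_destination']])
y = df['active_modes']                                     # 1 = walks or bikes
model = XGBClassifier(n_estimators=300, max_depth=6,
                      learning_rate=0.1).fit(X, y)
```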

3.3.3. Multimodal Transportation in Beijing

The Beijing Multimodal Transportation model (from now on, Beijing model) was constructed using the data provided by the “Context-Aware Multi-Modal Transportation Recommendation” competition launched by the Chinese company Baidu as part of the KDD 2019 [71]. This dataset provides information on users’ trip planning requests to the Baidu Maps multimodal platform, the proposed trip plans offered by the platform, the estimated trip duration and cost for each option, as well as the trip option that was finally selected by the user. The interesting aspect of the dataset is that it includes 66 variables on user characteristics that are collected by the provider but have been obfuscated for confidentiality reasons. Such obfuscation of the variable descriptions is a usual practice in order to ensure data privacy for the users and to protect business interests. The dataset thus allows the analysis of revealed preference using fully anonymized data and socio-economic information, but it can also be an example of how bias may be present in a model even if the data do not include visible information.
The trips included in the dataset were made between 1 October and 30 November 2018. The number of planned trips contained in the dataset is 1,424,848, corresponding to 453,336 requests. Each request has a personal ID that identifies the individual that requests a trip plan, for each of whom 66 anonymized socio-economic variables are provided. In total, the number of personal IDs in the sample is 42,343. Furthermore, in order to decrease the dimensionality of the data and reduce potential noise that could harm the accuracy of the model, the socio-economic categories containing fewer than 1000 individuals were removed. Therefore, the total number of binary socio-economic variables included as features in the model is 39.
A request to go from the origin coordinates to the destination coordinates results in one to seven possible trip plans that are offered to the user. The data contain planned trips with eight modal or multi-modal possibilities: walking, biking, private car, taxi, bus, metro, bus-metro combination and a last category grouping other minority modes, such as long-distance train (mainly trips to the mountains nearby Beijing) or more complex multimodal combinations. Each of these plans contains a mode or combination of modes, the estimated trip time, the distance and the price of the trip. The selected plan by each of the personal IDs is also provided, together with the date and time when it was selected.
The variables included as predictors in the model are trip characteristics (i.e., distance, trip time and time), the socio-economic characteristics and eight dummy variables that indicate which modes are available for that trip. Additionally, the hour and the day of the week in which the trip was selected, whether that day was a holiday and whether it was raining or not, have also been introduced as predictors. The availability of the modes has been included through the construction of eight dummy variables that indicate whether a plan using that mode has been suggested for the trip request.
The full description of the trip variables that have been included in the model is shown in Table 4. The average trip distance is around 17 km, with an average duration of roughly 50 min and a price of 15 Yuan (around 2 EUR). However, these three variables show a very high variance. Regarding mode availability, walking is the least frequently offered mode in response to users’ trip planning requests, while the private car and taxi are very frequently available to reach the desired destination. Wednesday is the day on which the fewest trip plans are requested (8.9%), while over the weekend the number of requests rises significantly (around 16% on each day of the weekend). The number of observations per hour of the day has been grouped into periods of two hours. However, the hour variable that has been supplied to the model considers periods of one hour. As expected, most trip requests are made between eight in the morning and six in the afternoon, when the biggest share of travel is carried out [72].
As mentioned before, the socio-economic variables in the multimodal trip planner data are given as obfuscated binary variables. In this context, it is not possible to identify which attributes could be potentially discriminatory from a cultural or legal point of view. However, it is certainly relevant to analyse if the model systematically makes wrong predictions regarding the travel decisions of individuals with a specific socio-economic attribute. For this reason, the fairness metrics have been computed for the 39 profile attributes and the two with the highest bias have been selected as sensitive attributes to implement the bias mitigation methodologies. The distribution of the 39 socio-economic variables is shown in Table 5. These sensitive attributes, p0 and p19, are highlighted in bold. While p0 shows a rather balanced distribution, p19 is highly skewed with 0.6% of the sample taking value one for this socio-economic feature.
An XGBoost classifier was implemented with an outcome variable that uses a binary one-vs-rest structure. A trip plan is labelled as one if it is selected by the user to complete a specific trip, and label zero is assigned to the rest of the trip plans that were suggested for that trip but were not selected. A similar approach is used in Moons et al. [73].
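The one-vs-rest labelling described above can be illustrated with the following sketch. The file and column names loosely follow the public KDD Cup 2019 data layout but are assumptions here, and the plans table is taken to be already expanded to one row per suggested plan.

```python
# Minimal sketch of the one-vs-rest labelling (assumed data layout): `plans`
# holds one row per suggested trip plan, `clicks` maps each request (sid) to
# the plan that was actually chosen. Column names are assumptions.
import pandas as pd

plans = pd.read_csv('train_plans_expanded.csv')    # one row per suggested plan
clicks = pd.read_csv('train_clicks.csv')           # sid -> chosen transport mode

labelled = plans.merge(clicks[['sid', 'click_mode']], on='sid', how='left')
labelled['label'] = (labelled['transport_mode'] == labelled['click_mode']).astype(int)
# label == 1: the plan the user selected; label == 0: suggested but not chosen
```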

4. Results

This section presents the results obtained through the implementation of the three bias mitigation algorithms that were previously explained. For each protected attribute, two fairness metrics are analysed, the Statistical Parity Difference and the Equal Opportunity Difference. The results are plotted using two scatter plots for each protected attribute, in which each point represents an algorithm, with the x-axis showing the fairness metric value and the y-axis the balanced accuracy achieved by that specific algorithm. The optimal value for these metrics is zero; therefore, the incidence of bias is indicated by the absolute value.
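For reference, scatter plots of this kind can be produced with a few lines of matplotlib, as in the sketch below; the plotted values are illustrative placeholders, not the results reported in this section.

```python
# Minimal sketch of the result plots: one point per algorithm, fairness metric
# on the x-axis and balanced accuracy on the y-axis. Values are placeholders.
import matplotlib.pyplot as plt

points = {'Original': (0.20, 0.70), 'Reweighting': (0.05, 0.69),
          'MetaFairClassifier': (0.12, 0.55), 'Calibrated Eq. Odds': (0.03, 0.68)}
fig, ax = plt.subplots()
for name, (metric_val, bal_acc) in points.items():
    ax.scatter(metric_val, bal_acc, label=name)
ax.axvline(0.0, linestyle='--', linewidth=0.8)   # zero = unbiased
ax.set_xlabel('Statistical Parity Difference')
ax.set_ylabel('Balanced accuracy')
ax.legend()
plt.show()
```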

4.1. COMPAS Model

The protected attribute of race in the COMPAS model has African Americans as the unprivileged group and Caucasians as the privileged group. Figure 2 shows the fairness metric values before and after the implementation of the bias mitigation algorithms. The original model shows a balanced accuracy of 0.66 and a rather high level of bias. The SPD shows that the probability of being labelled as a reoffender is 0.212 points higher for African Americans than for Caucasians. Regarding the EOD metric, it shows that the probability of being correctly labelled as a reoffender is 0.135 lower for African Americans. These results are consistent with previous literature in which this dataset is analysed [6,38].
The different bias mitigation algorithms have very different impacts on both the accuracy and the fairness metrics. The pre-processing algorithm has a very limited impact on the balanced accuracy and significantly reduces the bias with respect to the original model for both metrics. The in-processing and post-processing algorithms, in contrast, generate an increase in the bias, albeit with very different levels of accuracy loss. The MetaFairClassifier (in-processing) brings a 50% drop in both the balanced and the global accuracy, while the Calibrated Equalized Odds maintains the same accuracy as the original model.

4.2. Active Modes Model

The results for the protected attribute of gender in the active modes model are shown in Figure 3. The original model has a balanced accuracy of 0.757 (global accuracy = 0.876) and an SPD of 0.024, revealing that the model is more likely to predict that women would use active modes for their most frequent trip. This difference does not necessarily imply that there is discrimination, given that the existing literature regarding gender differences in active mode adoption has pointed out that women are more prone to walk for their daily trips [74]. When it comes to biking, women only outnumber men when the share of biking trips exceeds a critical threshold, which is the case in many European cities [75]. The EOD shows that the probability of the model correctly predicting the use of active modes is 0.058 higher for women than for men. This metric indicates that the model is to some extent discriminating against men, as it misclassifies them as non-users of active modes more frequently than women. If this model were to be used for policy support, it could lead to the deployment of unnecessary policies to promote walking or biking among men.
The Reweighting Algorithm assigns weights to each instance based on its gender–label combination. The highest weight is given to the combination male–active modes, followed by female–not active modes and male–not active modes, and the lowest weight is given to the combination female–active modes. The XGBoost classifier using the reweighted dataset provides a model with a very similar balanced accuracy to the original model, 0.759 (global accuracy = 0.877). The value of the SPD metric decreases by more than 50% (from 0.024 to 0.011) and the value of the EOD also decreases by 38% (from 0.058 to 0.036). As the mitigation is achieved through the reduction in female individuals being classified as users of active modes, and as neither the balanced accuracy nor the global accuracy decreases, this correction clearly provides a model that gives a fairer picture of the travel behaviour of both groups.
The MetaFairClassifier algorithm was implemented with a level of debiasing of 0.1. The resulting model has a lower balanced accuracy than the original model (0.736; global accuracy = 0.597). The SPD increases significantly from 0.024 to 0.122 (419%), while the EOD decreases from 0.058 to 0.029 (50%). The reason why all metrics except the EOD worsen is that the in-processing algorithm optimises the FDR difference across genders by disproportionately increasing the rate of false positive predictions for the privileged group (in this case, women).
The Calibrated Equalized Odds post-processing algorithm was implemented using the FNR as a constraint. The balanced accuracy goes down to 0.739, but the global accuracy remains almost at the same level as in the original model, with a value of 0.874. The SPD practically reaches zero (0.001, 96% reduction), while the absolute value of the EOD decreases to a lesser extent (from 0.058 to −0.020, 65% reduction). However, it is worth noting that the sign changes with respect to the previous models, revealing that the post-processed model, apart from reducing the absolute value of the EOD, also reverses the direction of the discrimination—men are now more likely to be correctly classified than women. In this case, the algorithm minimises the FNR difference across groups by increasing the FNR of women while keeping the FNR of men untouched, which accounts for the slight loss in balanced accuracy.
Figure 4 shows the results for the residence location attribute in the Active Modes model, where the privileged group is made up of individuals living in the city centre and the unprivileged group of those who live in the suburbs. Suburban areas have been observed to have lower levels of active mode adoption due to the lack of infrastructure [76]. Therefore, it seems reasonable that the SPD takes a negative value of −0.10, showing that the number of individuals living in the suburbs predicted to be users of active modes is smaller than for those living in the city centre. However, the EOD metric takes a negative value of −0.08, which means that the model is more likely to misclassify residents in the suburbs as non-users of active transport. A relevant effect of algorithmic bias in transport models for residential location variables has also been observed in Zheng et al. [65]. An underestimation of the active modes demand in these areas could lead to a lack of adequate infrastructure supply and ultimately perpetuate the discrimination against residents in the suburbs.
The Reweighting Algorithm assigns the highest weight to the combination suburbs–active modes, followed by city centre–not active modes, suburbs–not active modes and city centre–active modes. As a result of the reweighting, the probability of being classified as an active transport user for residents in the suburbs rises until it is practically equal to the probability for city centre residents (SPD = 0.006, 94% reduction). On the other hand, the absolute value of the EOD decreases by 20%. While the bias reduction is considerable, the reduction in both the balanced and the global accuracy is minimal (from 0.757 to 0.748 and from 0.877 to 0.872, respectively).
For the residence attribute, the MetaFairClassifier algorithm has been implemented with a debiasing level of 0.9. The effect of this technique on the residence attribute is rather similar to that on the gender attribute. The balanced accuracy decreases slightly (from 0.757 to 0.736, 3% reduction), while the global accuracy drops by 30% (from 0.876 to 0.603), as a result of the increase in false positive labels in both groups. This increase does not lead to a significant reduction in the EOD, which remains unchanged with respect to the original model. The SPD shows great variability, taking values from −0.576 to −0.242, but its absolute value is always higher than that of the original model.
Finally, the Calibrated Equalized Odds algorithm performs best for this attribute when using the FNR as a constraint for the reassignment of the labels. This post-processing algorithm provides a very similar balanced accuracy (0.724) to that of the MetaFairClassifier, but in this case the global accuracy remains virtually at the same level as in the original model (0.874). The bias reduction follows the same pattern as for the gender attribute, with the SPD achieving a reduction of 55%, while the absolute value of the EOD only decreases by 8%.

4.3. Beijing Model

The Beijing model aims to predict whether an individual will choose a trip plan to carry out a specific journey. Therefore, the outcome variable takes value one when the trip plan has been chosen and zero when it has not. The original model achieves a balanced accuracy of 0.748 and a global accuracy of 0.843 when predicting whether a plan is chosen or not. As mentioned in the description of the model, the XGBoost classifier also takes as features 39 obfuscated (i.e., encrypted as binary anonymized variables) profile variables that provide information on the characteristics of the user. As the real meaning of these variables is unknown, the protected attributes have been selected as those with the highest values for the bias metrics considered in this paper, which turn out to be p0 and p19. The privileged group has been arbitrarily defined as the individuals with a null value for these categories, and the unprivileged group as those with value one. Regarding the parameters that maximize fairness for the in- and post-processing algorithms, the MetaFairClassifier has been implemented with a level of debiasing of 0.8 and the Calibrated Equalized Odds optimises weighted combinations of the FNR and the FPR for both attributes.
Figure 5 shows the results for the p0 attribute, where it can be observed that the bias metrics are positive. Therefore, the model is discriminating against individuals who do not belong to the p0 socio-economic category. In terms of the SPD, the discrimination is mild—an individual that has characteristic p0 has only a 0.032 lower probability of being predicted to choose a plan than the rest of the individuals. By contrast, the difference in the TPR across groups (EOD = 0.113) is almost at the same level as in the COMPAS model (EOD = 0.135), revealing that this model is making systematic errors when predicting the transport choices of the p0 group.
All three bias mitigation techniques yield fairer models than the original. Both the Reweighting and Calibrated Equalized Odds algorithms reduce the bias with a very limited reduction in the accuracy. The reduction is especially efficient for the post-processing technique, as it reduces the SPD by 70% and the EOD by 60%, with a balanced and global accuracy loss of 2% and 1%, respectively. The MetaFairClassifier (implemented with a level of debiasing of 0.8) achieves very similar results to the Calibrated Equalized Odds algorithm in terms of debiasing. In contrast, the balanced accuracy drops by 25% and the global accuracy by 11%. This is probably because, in order to optimise the FDR difference (i.e., the proportion of observations for which the trip plan was predicted to be chosen but was not), the classifier significantly decreases the number of positive predictions for both groups.
Finally, the results for the p19 attribute are displayed in Figure 6. Again, the SPD exhibits a rather low value of −0.028, while the EOD is significantly higher in absolute terms (−0.046). These metrics show that, as in the previous cases, the bias affecting the model comes mainly from the differences in the TPR across groups rather than from the frequency with which the favourable outcome is predicted.
The debiasing techniques yield very similar results to those for the p0 attribute. The pre-processing algorithm achieves a low level of debiasing for the SPD (16%, from −0.028 to −0.023) while keeping a level of accuracy similar to the original model. The MetaFairClassifier causes similar reductions in the balanced accuracy (25%) and in the global accuracy (11%), while achieving good levels of fairness (SPD = −0.011, EOD = −0.016). In terms of trade-off between fairness and accuracy loss, the Calibrated Equalized Odds algorithm offers a better balance. It reaches an SPD of −0.009 (68% reduction) and an EOD of −0.037 (56% reduction), while only losing 2% of balanced accuracy and 1% of global accuracy.

5. Discussion

This research studies the level of algorithmic bias that affects two different transport mode choice models and implements three different off-the-shelf bias mitigation algorithms. This study analyses the extent to which fairness can be achieved by the different debiasing techniques, their impact on the global and balanced accuracy, and how the modifications would affect the policies designed on the basis of those debiased models.
The results obtained for the bias metrics before implementing the bias mitigation techniques have shown that mode choice models are indeed affected by algorithmic bias against socially discriminated groups (i.e., women, residents in the suburbs), as had already been observed in Zheng et al. [65]. With respect to the COMPAS benchmark model, the SPD found in the Active Modes model and the Beijing model is considerably lower, while the EOD is of the same order of magnitude. The comparison with the values obtained in the previous literature reveals that, with respect to the SPD metric, our models can hardly be considered biased. On the contrary, for the EOD, the values that have been obtained are at the same level as those of previous models that have undoubtedly been considered biased [38,51,77]. It would have been expected that the Beijing model would have a higher bias than the Active Modes model, as the latter was trained with data from a dedicated survey in which the acquisition process was controlled. However, the comparison between the two models reveals a slightly higher level of SPD for the Active Modes model than for the Beijing model—the average SPD value is 0.062 for the former and 0.030 for the latter. By contrast, and in line with what would have been expected, the EOD is on average 14% higher for the Beijing model than for the Active Modes model.
Regarding the effectiveness of the bias mitigation algorithms, Table 6 compiles the average loss of global and balanced accuracy, as well as the average reduction in the bias metrics, for each of the models and each of the implemented algorithms.
  • The Reweighting (pre-processing) algorithm provides the most consistent results across all cases. It always achieves a reasonable level of debiasing with a very limited accuracy loss—less than 1% in all cases. Since population sampling is fundamental for a transport model to be representative, weighting the different socio-economic groups depending on the frequency with which they are predicted to choose a specific mode seems a rather logical and easy-to-implement technique to make models fairer. Furthermore, as neither the classifier nor the outcomes provided by the classifier are modified, Reweighting would still allow for the interpretation of the model to draw conclusions regarding transport behaviour, which is a key aspect to consider when building mode choice models.
  • The MetaFairClassifier (in-processing) provides a model with a significantly worse global accuracy than the original models and a very variable level of debiasing. When predicting the use of active modes, the bias measured by the SPD drastically increases by 363.54%, while only a limited decrease in the EOD is achieved. A very different performance is observed for the protected attributes of the Beijing model, for which the bias reduction reaches an average of 67.5% for the SPD and 78% for the EOD. This difference between models might indicate that the MetaFairClassifier needs to be trained with big datasets in order to accomplish good levels of debiasing. In both cases, the reduction is achieved at the expense of an accuracy loss that derives from the drastic reduction in positive predictions aimed at reducing the difference in the FDR across groups. The consequences of this reduction would be severe for the policy-making process, as it would entail an underestimation of the demand for all groups.
  • The Calibrated Equalized Odds (post-processing) algorithm has very little impact on the accuracies of both models, although slightly more than the pre-processing technique. The balanced accuracy suffers a moderate loss because, in order to minimise the FNR gap across groups, the number of positive labels assigned to the unprivileged group decreases slightly, while the number of positive labels assigned to the privileged group remains untouched. The redistribution of labels allows us to successfully remove the bias for both metrics, since it achieves an average reduction of 72% for the SPD and 48% for the EOD. It is noteworthy that debiasing is especially efficient for the gender attribute in the Active Modes model, which is also the variable with the most balanced distribution across categories. These results could suggest a higher efficiency of the post-processing algorithms for balanced data.

6. Conclusions

In this work, a methodological framework to test and mitigate algorithmic bias in behavioural transport models has been proposed. This methodology tests the efficiency of three different debiasing techniques at three stages of the model development pipeline (pre-processing, in-processing, post-processing) for three different prediction models.
The implementation of three bias elimination procedures at three different stages allows us to draw conclusions on how each of the procedures affects the bias and the accuracy of the models. Furthermore, the proposed methodology has been applied to two transport mode choice models built with data of a different nature. The Active Modes model uses survey data, which in general are gathered through a specifically designed process aiming to obtain a heterogeneous and representative sample. The Beijing model uses data from a multimodal transportation planner application in which the researcher has no control over the acquisition process. Checking and removing bias is becoming more and more relevant as an increasing number of transport models use mobile application data to model human mobility [78,79,80]. These kinds of acquisition technologies have been shown to have an inherent bias due to the different levels of technology adoption across socio-economic groups [81].
The contribution that this work aims to make is to provide transport modellers with an overview of the best available techniques to detect and mitigate the algorithmic bias that might be present in mode choice models. For this purpose, recommendations are provided for transport modellers regarding which approaches to bias detection and quantification are more suitable, and what the advantages and disadvantages of some of the existing debiasing procedures at different stages of the modelling pipeline are. The main outcome in this respect is that pre-processing and post-processing techniques yield better results in terms of balance between accuracy loss and bias mitigation than in-processing techniques. Furthermore, as ML techniques are often used with big data coming from opportunistic data sources, the comparison of the two models allows us to draw conclusions on the relation between the nature of the data and the existence of bias. In this regard, both models exhibit a certain level of bias, but no significant differences between them have been found. This result implies that there is no reason to assume that the use of new mobility data sources will lead to less fair results than the use of survey data. Nonetheless, the issue of algorithmic fairness has been observed to affect models trained on both kinds of data, which highlights the importance of taking this problem into account when developing any kind of mode choice model based on ML techniques.
The main limitation of this research is the restriction to binary outcomes imposed by the tools and techniques developed so far for bias mitigation. The bias mitigation algorithms provided in IBM's AIF360 toolkit require a binary label dataset as input, which, for models considering more than two transport alternatives, might be a barrier to introducing bias mitigation methods in the pipeline (a minimal recoding sketch is given below). Furthermore, in-processing and post-processing methods fundamentally alter the classifier or its predictions, which hinders the behavioural interpretation of the models. The authors therefore highlight this issue as the main avenue for future research. A next step towards the creation of fair models for transport policy support would be to analyse how the implementation of bias mitigation techniques might modify the behavioural interpretation of the models (i.e., changes in the feature importance or modification of the regression coefficients).
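To make the binary-label restriction concrete, the sketch below shows one possible one-vs-rest recoding of a multi-class mode choice variable before wrapping it in AIF360's dataset class. The toy DataFrame and the column names (mode, gender, dist_km) are hypothetical, and in practice any categorical features would need to be numerically encoded first.

```python
# Hypothetical one-vs-rest recoding: "active modes" vs. all other alternatives.
import pandas as pd
from aif360.datasets import BinaryLabelDataset

# Toy data: a multi-class mode choice plus a binary-coded protected attribute.
df = pd.DataFrame({
    "mode":    ["walk", "car", "bike", "bus", "car"],
    "gender":  [1, 0, 0, 1, 1],
    "dist_km": [1.2, 14.0, 3.5, 8.1, 22.3],
})

ACTIVE_MODES = {"walk", "bike"}                      # favourable outcome: active modes
df["active"] = df["mode"].isin(ACTIVE_MODES).astype(int)

dataset = BinaryLabelDataset(
    favorable_label=1,
    unfavorable_label=0,
    df=df.drop(columns=["mode"]),                    # AIF360 expects numeric columns only
    label_names=["active"],
    protected_attribute_names=["gender"],
)
```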

Author Contributions

Conceptualization, M.V.-G. and P.C.; Data curation, M.V.-G.; Formal analysis, M.V.-G.; Funding acquisition, P.C.; Investigation, M.V.-G.; Methodology, M.V.-G.; Project administration, P.C.; Software, M.V.-G.; Supervision, P.C.; Visualization, M.V.-G.; Writing—original draft, M.V.-G.; Writing—review and editing, P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was developed within the Collaborative Doctoral Partnership programme between the Joint Research Centre (JRC) of the European Commission and the Centro de Investigación del Transporte (TRANSyT) of Universidad Politécnica de Madrid [Agreement nº 35364].

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

EU Disclaimer

The views expressed are purely those of the authors and may not in any circumstances be regarded as stating an official position of the European Commission.

References

  1. ITF. Governing Transport in the Algorithmic Age; ITF: London, UK, 2019; Available online: https://www.itf-oecd.org/governing-transport-algorithmic-age (accessed on 25 May 2022).
  2. van Cranenburgh, S.; Wang, S.; Vij, A.; Pereira, F.; Walker, J. Choice modelling in the age of machine learning. arXiv 2021, arXiv:2101.11948. [Google Scholar]
  3. ITF. Big Data and Transport. Corporate Partnership Board Report. 2015. Available online: https://www.itf-oecd.org/big-data-and-transport (accessed on 2 February 2022).
  4. Anda, C.; Erath, A.; Fourie, P.J. Transport modelling in the age of big data. Int. J. Urban Sci. 2017, 21 (Suppl. S1), 19–42. [Google Scholar] [CrossRef]
  5. Kleinberg, J.; Ludwig, J.; Mullainathan, S.; Rambachan, A. Algorithmic Fairness. AEA Pap. Proc. 2020, 108, 22–27. [Google Scholar] [CrossRef]
  6. Larson, J.; Mattu, S.; Kirchner, L.; Angwin, J. How We Analyzed the COMPAS Recidivism Algorithm. 2016. Available online: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm (accessed on 25 January 2022).
  7. Barocas, S.; Hardt, M.; Narayanan, A. Fairness and Machine Learning—Limitations and Opportunities. 2019. Available online: https://fairmlbook.org/ (accessed on 2 February 2022).
  8. Wang, Y.; Zeng, Z. Overview of Data-Driven Solutions. Data-Driven Solut. Transp. Probl. 2019, 2019, 1–10. [Google Scholar] [CrossRef]
  9. Zhao, Z.; Koutsopoulos, H.N.; Zhao, J. Detecting pattern changes in individual travel behavior: A Bayesian approach. Transp. Res. Part B Methodol. 2018, 112, 73–88. [Google Scholar] [CrossRef]
  10. Liu, Z.; Liu, Y.; Meng, Q.; Cheng, Q. A tailored machine learning approach for urban transport network flow estimation. Transp. Res. Part C: Emerg. Technol. 2019, 108, 130–150. [Google Scholar] [CrossRef]
  11. Zhang, K.; Jia, N.; Zheng, L.; Liu, Z. A novel generative adversarial network for estimation of trip travel time distribution with trajectory data. Transp. Res. Part C Emerg. Technol. 2019, 108, 223–244. [Google Scholar] [CrossRef]
  12. Cheng, L.; Chen, X.; de Vos, J.; Lai, X.; Witlox, F. Applying a random forest method approach to model travel mode choice behavior. Travel Behav. Soc. 2019, 14, 1–10. [Google Scholar] [CrossRef]
  13. Hillel, T. New Perspectives on the Performance of Machine Learning Classifiers for Mode Choice Prediction; Ecole Polytechnique Fédérale de Lausanne: Lausanne, Switzerland, 2020. [Google Scholar]
  14. Omrani, H.; Charif, O.; Gerber, P.; Awasthi, A.; Trigano, P. Prediction of Individual Travel Mode with Evidential Neural Network Model. Transp. Res. Rec. 2013, 2399, 1–8. [Google Scholar] [CrossRef]
  15. Hagenauer, J.; Helbich, M. A comparative study of machine learning classifiers for modeling travel mode choice. Expert Syst. Appl. 2017, 78, 273–282. [Google Scholar] [CrossRef]
  16. Xie, C.; Lu, J.; Parkany, E. Work Travel Mode Choice Modeling with Data Mining: Decision Trees and Neural Networks. Transp. Res. Rec. 2003, 1854, 50–61. [Google Scholar] [CrossRef]
  17. Karlaftis, M.G.; Vlahogianni, E.I. Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. Transp. Res. Part C Emerg. Technol. 2011, 19, 387–399. [Google Scholar] [CrossRef]
  18. Wang, F.; Ross, C.L. Machine Learning Travel Mode Choices: Comparing the Performance of an Extreme Gradient Boosting Model with a Multinomial Logit Model. Transp. Res. Rec. 2018, 2672, 35–45. [Google Scholar] [CrossRef] [Green Version]
  19. Zhao, X.; Yan, X.; Yu, A.; van Hentenryck, P. Prediction and behavioral analysis of travel mode choice: A comparison of machine learning and logit models. Travel Behav. Soc. 2020, 20, 22–35. [Google Scholar] [CrossRef]
  20. Hillel, T.; Bierlaire, M.; Elshafie, E.B.; Jin, Y. A systematic review of machine learning classification methodologies for modelling passenger mode choice. J. Choice Model. 2021, 38, 100221. [Google Scholar] [CrossRef]
  21. Chang, X.; Wu, J.; Liu, H.; Yan, X.; Sun, H.; Qu, Y. Travel mode choice: A data fusion model using machine learning methods and evidence from travel diary survey data. Transp. A Transp. Sci. 2019, 15, 1587–1612. [Google Scholar] [CrossRef]
  22. Kim, E.J. Analysis of Travel Mode Choice in Seoul Using an Interpretable Machine Learning Approach. J. Adv. Transp. 2021, 2021, 6685004. [Google Scholar] [CrossRef]
  23. Omrani, H. Predicting Travel Mode of Individuals by Machine Learning. Transp. Res. Procedia 2015, 10, 840–849. [Google Scholar] [CrossRef] [Green Version]
  24. Tang, L.; Xiong, C.; Zhang, L. Decision tree method for modeling travel mode switching in a dynamic behavioral process. Transp. Plan. Technol. 2015, 38, 833–850. [Google Scholar] [CrossRef]
  25. Ceccato, R.; Chicco, A.; Diana, M. Evaluating car-sharing switching rates from traditional transport means through logit models and Random Forest classifiers. Transp. Plan. Technol. 2021, 44, 160–175. [Google Scholar] [CrossRef]
  26. Zhao, D.; Shao, C.; Li, J.; Dong, C.; Liu, Y. Travel Mode Choice Modeling Based on Improved Probabilistic Neural Network. In Proceedings of the Conference on Traffic and Transportation Studies (ICTTS), Kunming, China, 3–5 August 2010; Volume 383, pp. 685–695. [Google Scholar] [CrossRef]
  27. Calders, T.; Žliobaitė, I. Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures. Stud. Appl. Philos. Epistemol. Ration. Ethics 2013, 3, 43–57. [Google Scholar] [CrossRef]
  28. Kleinberg, J.; Ludwig, J.; Mullainathan, S.; Sunstein, C.R. Discrimination in the Age of Algorithms. J. Leg. Anal. 2018, 10, 113–174. [Google Scholar] [CrossRef]
  29. Yarbrough, M.V. Disparate Impact, Disparate Treatment, and the Displaced Homemaker. Law Contemp. Probl. 1986, 49, 107. [Google Scholar] [CrossRef] [Green Version]
  30. Friedler, S.A.; Scheidegger, C.; Venkatasubramanian, S. On the (Im)Possibility of Fairness. arXiv 2016, arXiv:1609.07236. [Google Scholar] [CrossRef]
  31. Majumder, S.; Chakraborty, J.; Bai, G.R.; Stolee, K.T.; Menzies, T. Fair Enough: Searching for Sufficient Measures of Fairness. arXiv 2021, arXiv:2110.13029. [Google Scholar]
  32. Verma, S.; Rubin, J. Fairness Definitions Explained. In Proceedings of the 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), Gothenburg, Sweden, 29 May 2018; pp. 1–7. [Google Scholar] [CrossRef]
  33. Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; Zemel, R. Fairness Through Awareness. In Proceedings of the ITCS 2012—Innovations in Theoretical Computer Science Conference, Cambridge, MA, USA, 8–10 January 2012; pp. 214–226. [Google Scholar] [CrossRef] [Green Version]
  34. Simoiu, C.; Corbett-Davies, S.; Goel, S.; Ermon, S.; Feller, A.; Flaxman, S.; Gelman, A.; Mackey, L.; Overgoor, J.; Pierson, E. The Problem of Infra-marginality in Outcome Tests for Discrimination. Ann. Appl. Stat. 2016, 11, 1193–1216. [Google Scholar] [CrossRef]
  35. Corbett-Davies, S.; Pierson, E.; Feller, A.; Goel, S.; Huq, A. Algorithmic decision making and the cost of fairness. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Part F129685, Halifax, NS, Canada, 13–17 August 2017; pp. 797–806. [Google Scholar] [CrossRef]
  36. Hardt, M.; Price, E.; Srebro, N. Equality of Opportunity in Supervised Learning. Adv. Neural Inf. Process. Syst. 2016, 29, 3323–3331. [Google Scholar]
  37. Zafar, M.B.; Valera, I.; Rodriguez, M.G.; Gummadi, K.P. Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment. In Proceedings of the 26th International World Wide Web Conference (WWW), Perth, Australia, 3–7 April 2017; pp. 1171–1180. [Google Scholar] [CrossRef] [Green Version]
  38. Bellamy, R.; Dey, K.; Hind, M.; Hoffman, S.C.; Houde, S.; Kannan, K.; Lohia, P.; Martino, J.; Mehta, S.; Mojsilovic, A.; et al. AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias. 2018. Available online: https://github.com/ibm/aif360 (accessed on 15 December 2021).
  39. Pedreshi, D.; Ruggieri, S.; Turini, F. Discrimination-aware data mining. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 560–568. [Google Scholar] [CrossRef] [Green Version]
  40. Kamiran, F.; Calders, T. Classifying without discriminating. In Proceedings of the 2009 2nd International Conference on Computer, Control and Communication, Karachi, Pakistan, 17–18 February 2009; pp. 1–6. [Google Scholar] [CrossRef]
  41. Feldman, M.; Friedler, S.A.; Moeller, J.; Scheidegger, C.; Venkatasubramanian, S. Certifying and removing disparate impact. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 259–268. [Google Scholar] [CrossRef] [Green Version]
  42. Kamiran, F.; Calders, T. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 2011, 33, 1–33. [Google Scholar] [CrossRef] [Green Version]
  43. Calmon, F.P.; Wei, D.; Vinzamuri, B.; Ramamurthy, K.N.; Varshney, K.R. Optimized Data Pre-Processing for Discrimination Prevention. Adv. Neural Inf. Processing Syst. 2017, 1, 3993–4002. [Google Scholar]
  44. Zemel, R.; Ledell, Y.W.; Swersky, K.; Pitassi, T.; Dwork, C. Learning Fair Representations. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; pp. 325–333. Available online: https://proceedings.mlr.press/v28/zemel13.html (accessed on 3 February 2022).
  45. Calders, T.; Kamiran, F.; Pechenizkiy, M. Building classifiers with independency constraints. In Proceedings of the ICDM Workshops 2009—IEEE International Conference on Data Mining, Miami, FL, USA, 6 December 2009; pp. 13–18. [Google Scholar] [CrossRef]
  46. Kamishima, T.; Akaho, S.; Sakuma, J. Fairness-aware learning through regularization approach. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Vancouver, BC, Canada, 11 December 2011; pp. 643–650. [Google Scholar] [CrossRef] [Green Version]
  47. Zhang, B.H.; Lemoine, B.; Mitchell, M. Mitigating Unwanted Biases with Adversarial Learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA, 2–3 February 2018; pp. 335–340. [Google Scholar] [CrossRef] [Green Version]
  48. Agarwal, A.; Beygelzimer, A.; Dudík, M.; Langford, J.; Wallach, H. A Reductions Approach to Fair Classification. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; Volume 1, pp. 102–119. Available online: https://arxiv.org/abs/1803.02453v3 (accessed on 3 February 2022).
  49. Agarwal, A.; Dudík, M.; Wu, Z.S. Fair Regression: Quantitative Definitions and Reduction-based Algorithms. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 166–183. Available online: https://arxiv.org/abs/1905.12843v1 (accessed on 3 February 2022).
  50. Kearns, M.; Roth, A.; Neel, S.; Wu, Z.S. An Empirical Study of Rich Subgroup Fairness for Machine Learning. In Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 100–109. [Google Scholar] [CrossRef] [Green Version]
  51. Elisa Celis, L.; Huang, L.; Keswani, V.; Vishnoi, N.K. Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees. In Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA, 29–31 January 2019; pp. 319–328. [Google Scholar] [CrossRef]
  52. Menon, A.K.; Williamson, R.C. The cost of fairness in binary classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, New York, NY, USA, 23–24 February 2018; Friedler, S.A., Wilson, C., Eds.; Volume 81, pp. 107–118. Available online: https://proceedings.mlr.press/v81/menon18a.html (accessed on 10 December 2021).
  53. Woodworth, B.; Gunasekar, S.; Ohannessian, M.I.; Srebro, N. Learning Non-Discriminatory Predictors. arXiv 2017, arXiv:1702.06081v3. [Google Scholar]
  54. Kamiran, F.; Karim, A.; Zhang, X. Decision theory for discrimination-aware classification. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Brussels, Belgium, 10–13 December 2012; pp. 924–929. [Google Scholar] [CrossRef] [Green Version]
  55. Pleiss, G.; Raghavan, M.; Wu, F.; Kleinberg, J.; Weinberger, K.Q. On Fairness and Calibration. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5684–5693. [Google Scholar]
  56. Best, H.; Lanzendorf, M. Division of labour and gender differences in metropolitan car use. An empirical study in Cologne, Germany. J. Transp. Geogr. 2005, 13, 109–121. [Google Scholar] [CrossRef]
  57. Scheiner, J. Gendered key events in the life course: Effects on changes in travel mode choice over time. J. Transp. Geogr. 2014, 37, 47–60. [Google Scholar] [CrossRef]
  58. Hu, L. Racial/ethnic differences in job accessibility effects: Explaining employment and commutes in the Los Angeles region. Transp. Res. Part D Transp. Environ. 2019, 76, 56–71. [Google Scholar] [CrossRef]
  59. Rosenbloom, S.; Waldorf, B. Older travelers: Does place or race make a difference? Transp. Res. Circ. 2001, E-C026, 103–120. [Google Scholar]
  60. Tehrani, S.O.; Wu, S.J.; Roberts, J.D. The Color of Health: Residential Segregation, Light Rail Transit Developments, and Gentrification in the United States. Int. J. Environ. Res. Public Health 2019, 16, 3683. [Google Scholar] [CrossRef] [Green Version]
  61. Calafiore, A.; Dunning, R.; Nurse, A.; Singleton, A. The 20-minute city: An equity analysis of Liverpool City Region. Transp. Res. Part D Transp. Environ. 2022, 102, 103111. [Google Scholar] [CrossRef]
  62. Farber, S.; Bartholomew, K.; Li, X.; Páez, A.; Nurul Habib, K.M. Assessing social equity in distance based transit fares using a model of travel behavior. Transp. Res. Part A Policy Pract. 2014, 67, 291–303. [Google Scholar] [CrossRef]
  63. Giuliano, G. Low Income, Public Transit, and Mobility. Transp. Res. Rec. 2005, 1927, 63–70. [Google Scholar] [CrossRef]
  64. Stanley, J.; Stanley, J.; Vella-Brodrick, D.; Currie, G. The place of transport in facilitating social inclusion via the mediating influence of social capital. Res. Transp. Econ. 2010, 29, 280–286. [Google Scholar] [CrossRef]
  65. Zheng, Y.; Wang, S.; Zhao, J. Equality of opportunity in travel behavior prediction with deep neural networks and discrete choice models. Transp. Res. Part C Emerg. Technol. 2021, 132, 103410. [Google Scholar] [CrossRef]
  66. Corbett-Davies, S.; Goel, S.; Chohlas-Wood, A.; Chouldechova, A.; Feller, A.; Huq, A.; Hardt, M.; Ho, D.E.; Mitchell, S.; Overgoor, J.; et al. The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning. arXiv 2018, arXiv:1808.00023. [Google Scholar]
  67. Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Artif. Intell. Law 2016, 25, 5–27. [Google Scholar] [CrossRef] [PubMed]
  68. Rudin, C.; Wang, C.; Coker, B. The age of secrecy and unfairness in recidivism prediction. Harv. Data Sci. Rev. 2018, 2, 6ed64b30. [Google Scholar] [CrossRef]
  69. Pisoni, E.; Christidis, P.; Cawood, E.N. Active mobility versus motorized transport? User choices and benefits for the society. Sci. Total Environ. 2022, 806, 150627. [Google Scholar] [CrossRef] [PubMed]
  70. Eurostat. Urban and Rural Living in the EU; Eurostat: Luxembourg, 2020. Available online: https://ec.europa.eu/eurostat/web/products-eurostat-news/-/edn-20200207-1 (accessed on 5 February 2022).
  71. Zhou, W.; Roy, T.D.; Skrypnyk, I. The KDD Cup 2019 Report. ACM SIGKDD Explor. Newsl. 2020, 22, 8–17. [Google Scholar] [CrossRef]
  72. TomTom. Beijing Traffic Report. 2020. Available online: https://www.tomtom.com/en_gb/traffic-index/beijing-traffic/ (accessed on 22 December 2021).
  73. Moons, E.; Wets, G.; Aerts, M. Nonlinear Models for Determining Mode Choice. In Proceedings of the Progress in Artificial Intelligence, Guimarães, Portugal, 3–7 December 2007; pp. 183–194. [Google Scholar] [CrossRef]
  74. Goel, R.; Oyebode, O.; Foley, L.; Tatah, L.; Millett, C.; Woodcock, J. Gender differences in active travel in major cities across the world. Transportation 2022, 2021, 1–17. [Google Scholar] [CrossRef]
  75. Goel, R.; Goodman, A.; Aldred, R.; Nakamura, R.; Tatah, L.; Garcia, L.M.T.; Diomedi-Zapata, B.; de Sa, T.H.; Tiwari, G.; de Nazelle, A.; et al. Cycling Behaviour in 17 Countries across 6 Continents: Levels of Cycling, Who Cycles, for What Purpose, and How Far? Transp. Rev. 2021, 42, 58–81. Available online: https://doi.org/10.1080/01441647.2021.1915898/SUPPL_FILE/TTRV_A_1915898_SM5155.ZIP (accessed on 2 December 2021). [CrossRef]
  76. Aldred, R.; Croft, J.; Goodman, A. Impacts of an active travel intervention with a cycling focus in a suburban context: One-year findings from an evaluation of London’s in-progress mini-Hollands programme. Transp. Res. Part A Policy Pract. 2019, 123, 147–169. [Google Scholar] [CrossRef]
  77. Aasheim, T.H.; Sølveånneland KT, H.; Sølveånneland, S.; Brynjulfsen, H.; Slavkovik, M. Bias Mitigation with AIF360: A Comparative Study. Nor. IKT-Konf. Forsk. Og Utdanning 2020, 1, 833. Available online: https://ojs.bibsys.no/index.php/NIK/article/view/833 (accessed on 10 January 2022).
  78. Burgdorf, C.; Mönch, A.; Beige, S. Mode choice and spatial distribution in long-distance passenger transport—Does mobile network data deliver similar results to other transportation models? Transp. Res. Interdiscip. Perspect. 2020, 8, 100254. [Google Scholar] [CrossRef]
  79. Sun, X.; Wandelt, S. Transportation mode choice behavior with recommender systems: A case study on Beijing. Transp. Res. Interdiscip. Perspect. 2021, 11, 100408. [Google Scholar] [CrossRef]
  80. González, M.C.; Hidalgo, C.A.; Barabási, A.L. Understanding individual human mobility patterns. Nature 2008, 453, 779–782. [Google Scholar] [CrossRef] [PubMed]
  81. Wesolowski, A.; Eagle, N.; Noor, A.M.; Snow, R.W.; Buckee, C.O. The impact of biases in mobile phone ownership on estimates of human mobility. J. R. Soc. Interface 2013, 10, 20120986. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Pipeline diagrams of the three debiasing methodologies implemented at different stages of the pipeline. Adapted from [38].
Figure 2. Fairness metrics for the protected attribute of race in the COMPAS model.
Figure 3. Bias mitigation results for the gender attribute in the Active Modes model.
Figure 4. Bias mitigation results for the residence attribute in the Active Modes model.
Figure 5. Bias mitigation results for variable p0 of the Beijing model.
Figure 6. Bias mitigation results for variable p19 of the Beijing model.
Table 1. Overview of the models under study.
Model | Baseline Model | Data | Protected Attributes | Favourable Outcome
COMPAS | Logistic regression | Personal and criminal behaviour characteristics of criminal defendants | Race | Individual labelled with high risk of recidivism
Active Modes | XGBoost | Travel survey | Gender; Residential location | Individual uses active modes for his most frequent trip
Multimodal transportation Beijing model | XGBoost | Multimodal trip planner (Baidu) data | Anonymized socio-economic variables with highest bias: p0, p19 | The planned trip is chosen for a given trip
Table 2. Sample distribution of the COMPAS dataset (N. of observations and percentage per category).
Socio-economic variables
  Gender: Male 4247 (80.5%); Female 1031 (19.5%)
  Race: Caucasian 2103 (39.8%); African American 3175 (60.2%)
  Age: Less than 25 years old 1156 (21.9%); 25–45 years old 3026 (57.3%); More than 45 years old 1096 (20.8%)
Criminal behaviour
  Number of priors: No priors 1667 (31.6%); 0–3 priors 1953 (37.0%); More than 3 priors 1658 (31.4%)
  Type of charges: Felony 3440 (65.2%); Misdemeanour 1838 (34.8%)
Table 3. Sample distribution of the survey data used in the Active Modes model (mean and standard deviation for continuous variables; N. of observations and percentage for categorical variables).
Socio-economic variables
  Gender: Male 12,986 (49.0%); Female 13,514 (51.0%)
  Age (continuous): mean 41.21; standard deviation 13.75
  Education: Primary 738 (2.8%); Low secondary 3167 (11.9%); Upper secondary 11,365 (42.9%); Tertiary and higher 11,230 (42.4%)
  Type of employment: Full-time employed 15,954 (60.2%); Part-time employed 2845 (10.7%); Unemployed 1696 (6.4%); Studying 1933 (7.3%); Retired 2490 (9.4%); Other 1319 (5.0%); I prefer not to answer 263 (1.0%)
  Household income: High 509 (1.9%); Higher middle 3276 (12.4%); Middle 14,017 (52.9%); Lower middle 5958 (22.5%); Low 1780 (6.7%); I prefer not to answer 960 (3.6%)
Urban environment
  Size of the city: >1 million inhabitants 3416 (12.9%); 250,000–50,000 inhabitants 5041 (19.0%); <250,000 inhabitants 11,726 (44.3%); Rural area 6317 (23.8%)
  Area of residence: Centre 9657 (36.4%); Suburbs 10,525 (39.8%); Not living in a city 6317 (23.8%)
Most frequent trip characteristics
  Vehicles per person in household (continuous): mean 0.612; standard deviation 0.377
  Frequency of the most frequent trip: Every day/every working day 17,286 (65.2%); 2–4 days/week 7041 (26.6%); Once per week or less 2173 (8.2%)
  Distance of the most frequent trip: Less than 3 km 4274 (16.1%); 3–5 km 5002 (18.9%); 6–10 km 5444 (20.5%); 11–20 km 5154 (19.4%); 21–30 km 2904 (11.0%); 31–50 km 1842 (7.0%); More than 50 km 1880 (7.1%)
  Area of destination of most frequent trip: Urban area of residence 13,073 (49.3%); Urban area different from that of residence 9108 (34.4%); Outside an urban area 4319 (16.3%)
  Active modes for most frequent trip: Yes 7112 (26.8%); No 19,388 (73.2%)
Table 4. Sample distribution of the multimodal trip planner data used in the Beijing model—trip characteristics (mean and standard deviation for continuous variables; N. of observations and percentage for categorical variables).
Continuous variables
  Distance (m): mean 17,087; standard deviation 16,037
  Time (sec.): mean 2934; standard deviation 2017
  Price (Yuan cents): mean 1514; standard deviation 3100
Categorical variables
  Availability: walking: Yes 385,763 (27.1%); No 1,039,085 (72.9%)
  Availability: biking: Yes 595,855 (41.8%); No 828,993 (58.2%)
  Availability: private car: Yes 1,394,930 (97.9%); No 29,918 (2.1%)
  Availability: taxi: Yes 1,319,705 (92.6%); No 105,143 (7.4%)
  Availability: bus: Yes 838,947 (58.9%); No 585,901 (41.1%)
  Availability: metro: Yes 623,295 (43.7%); No 801,553 (56.3%)
  Availability: metro–bus: Yes 628,790 (44.1%); No 796,058 (55.9%)
  Availability: other: Yes 711,594 (49.9%); No 713,254 (50.1%)
  Weather: Not raining 1,389,698 (97.5%); Raining 35,150 (2.5%)
  Holidays: Holidays 241,794 (17.0%); Not holidays 1,183,054 (83.0%)
  Day of the week: Monday 199,942 (14.0%); Tuesday 182,565 (12.8%); Wednesday 146,127 (10.3%); Thursday 226,191 (15.9%); Friday 195,225 (13.7%); Saturday 235,745 (16.5%); Sunday 239,053 (16.8%)
  Hour of the day: 00:00–01:59 13,557 (1.0%); 02:00–03:59 4358 (0.3%); 04:00–05:59 15,207 (1.0%); 06:00–07:59 86,467 (6.1%); 08:00–09:59 184,422 (12.9%); 10:00–11:59 194,840 (13.7%); 12:00–13:59 209,346 (14.7%); 14:00–15:59 203,881 (14.3%); 16:00–17:59 203,535 (14.3%); 18:00–19:59 148,026 (10.4%); 20:00–21:59 104,059 (7.3%); 22:00–23:59 57,150 (4.0%)
Table 5. Sample distribution of the multimodal trip planner data used for the Beijing model—obfuscated socio-economic characteristics (N. of observations and percentage per category).
  P0: 0 = 663,397 (46.6%); 1 = 761,451 (53.4%)
  P2: 0 = 1,058,116 (74.3%); 1 = 366,732 (25.7%)
  P3: 0 = 1,238,550 (86.9%); 1 = 186,298 (13.1%)
  P4: 0 = 1,356,173 (95.2%); 1 = 68,675 (4.8%)
  P7: 0 = 884,770 (62.1%); 1 = 540,078 (37.9%)
  P8: 0 = 540,366 (37.9%); 1 = 884,482 (62.1%)
  P9: 0 = 1,036,282 (72.7%); 1 = 388,566 (27.3%)
  P10: 0 = 804,972 (56.5%); 1 = 619,876 (43.5%)
  P16: 0 = 1,407,166 (98.8%); 1 = 17,682 (1.2%)
  P17: 0 = 1,390,173 (97.6%); 1 = 34,675 (2.4%)
  P18: 0 = 1,404,906 (98.6%); 1 = 19,942 (1.4%)
  P19: 0 = 1,415,639 (99.4%); 1 = 9209 (0.6%)
  P21: 0 = 1,386,895 (97.3%); 1 = 37,953 (2.7%)
  P26: 0 = 1,060,262 (74.4%); 1 = 364,586 (25.6%)
  P27: 0 = 1,213,610 (85.2%); 1 = 211,238 (14.8%)
  P28: 0 = 1,258,215 (88.3%); 1 = 166,633 (11.7%)
  P29: 0 = 1,010,106 (70.9%); 1 = 414,742 (29.1%)
  P30: 0 = 286,014 (20.1%); 1 = 1,138,834 (79.9%)
  P31: 0 = 1,196,366 (84.0%); 1 = 228,482 (16.0%)
  P32: 0 = 966,387 (67.8%); 1 = 458,461 (32.2%)
  P34: 0 = 1,069,356 (75.1%); 1 = 355,492 (24.9%)
  P35: 0 = 1,062,231 (74.6%); 1 = 362,617 (25.4%)
  P36: 0 = 1,013,116 (58.4%); 1 = 411,732 (41.6%)
  P37: 0 = 831,986 (58.4%); 1 = 592,862 (41.6%)
  P38: 0 = 1,292,685 (90.7%); 1 = 132,163 (9.3%)
  P39: 0 = 1,335,421 (93.7%); 1 = 89,427 (6.3%)
  P40: 0 = 1,363,782 (95.7%); 1 = 61,066 (4.3%)
  P45: 0 = 1,404,475 (98.6%); 1 = 20,373 (1.4%)
  P46: 0 = 1,370,546 (96.2%); 1 = 54,302 (3.8%)
  P47: 0 = 1,205,040 (84.6%); 1 = 219,808 (15.4%)
  P49: 0 = 1,380,596 (96.9%); 1 = 44,252 (3.1%)
  P54: 0 = 1,311,549 (92.0%); 1 = 113,299 (8.0%)
  P56: 0 = 1,366,041 (95.9%); 1 = 58,807 (4.1%)
  P57: 0 = 1,324,310 (92.9%); 1 = 100,538 (7.1%)
  P60: 0 = 803,062 (56.4%); 1 = 621,786 (43.6%)
  P61: 0 = 1,374,630 (96.5%); 1 = 50,218 (3.5%)
  P62: 0 = 1,167,877 (82.0%); 1 = 256,971 (18.0%)
  P63: 0 = 1,248,581 (87.6%); 1 = 176,267 (12.4%)
Table 6. Average impact on accuracy and bias metrics per model and mitigation algorithm.
Algorithm—Model | Average Global Accuracy Loss | Average Balanced Accuracy Loss | Average SPD Reduction | Average EOD Reduction
Reweighting—Active Modes | 0.14% | 0.55% | 74.01% | 29.37%
Reweighting—Beijing | 0.02% | −0.06% | 26.99% | 8.60%
MetaFairClassifier—Active Modes | 31.51% | 2.84% | −363.54% | 27.77%
MetaFairClassifier—Beijing | 11.43% | 25.13% | 67.50% | 78.37%
Calibrated Equalized Odds—Active Modes | 0.52% | 3.48% | 74.48% | 37.17%
Calibrated Equalized Odds—Beijing | 0.72% | 2.25% | 69.65% | 59.03%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
