Transport Choice Modeling for the Evaluation of New Transport Policies

Abstract: Quantifying the impact of sustainable transport policies is essential in order to mitigate the effects of greenhouse gas emissions produced by the transport sector. One of the most common approaches used for this purpose is traffic modelling and simulation, which consists of emulating the operation of an entire road network. This article presents the results of fitting eight well-known data science methods for transport choice modelling, the area in which more research is needed. The models have been trained with information from Biscay province in Spain in order to match as many of its commuters as possible. Results show that the best models correctly forecast more than 51% of the trips recorded. Finally, the results have been validated with a second data set from the Silesian Voivodeship in Poland, showing that all models indeed maintain their forecasting ability.


Introduction
Transport systems performance evaluation and forecasting is a difficult task currently under consideration by many research centers. This complexity emerges mainly from its stochastic nature, whereby a large number of variables and personal preferences come into play [1]. Transport infrastructures are highly expensive, and if the solution does not fit the needs of the citizens, the infrastructure might need to be upgraded, increasing the cost, or, even worse, become a worthless investment [2]. Therefore, a cheap and flexible method for testing future transport developments is to use computer simulations supported by advanced mathematical models [3].
To calculate the impact of the transport sector, traditional approaches rely on simulations, a paradigm that has become a popular and effective technique to analyze a wide range of dynamically changing systems. The essence of simulating is to carry out a series of computational experiments using a model that describes the operations of the real system (how drivers, vehicles, signals, pedestrians, etc. interact) and to generate output data that characterizes the subject system [4]. Traffic simulation allows one to check, monitor, and evaluate the behavior of real systems under different realistic conditions in an artificial, computer-based environment [5]. The most common traffic simulations focus on simulating everyday traffic activity in search of congestion, assessing different ways of managing traffic at intersections (crossings, roundabouts, traffic lights, etc.), or applying dynamic speed limits to avoid jams.
Over the last decades, many universal approaches have been widely used for modeling stochastic processes, such as Monte Carlo techniques [6], Markov chain models [7], or queuing theory methods [8]. Today's state-of-the-art simulation packages (listed later) add features such as graphical interfaces.
A great amount of research effort has been put into understanding transport mode choice modeling and how the travelers' background, trips, and transport facilities influence the attractiveness of transport [22]. The evaluation of transport systems depends completely on how people adapt to them. This implies modeling social preferences that have particular importance when traveling, such as time, cost, comfort, or environmental awareness. Even though the aforementioned simulators are relatively easy to run and have many proven success stories, most focus primarily on the calculation and estimation of routes and congestion avoidance, leaving out the social behavior that influences the traveler's choice of one transport mode over another.
For those simulators that model transport selection, discrete choice regression models (probit and logit models) [23] are the most widespread techniques [24]. In fact, the three simulators that include transport choice models (Aimsun, TRANSIMS, and TransCAD) implement these types of models. However, the addition of new transport modes, the alteration of the transport network, or the evaluation of new incentives and restrictions requires calibrating (and sometimes building) the entire model again.
Within this scope, this article compares eight methodologies to model human decision making when choosing between transport modes. The objective is to highlight the models that perform better and to assess the difficulty of extending them to evaluate the introduction of new policies or transport methods. All models will be trained with a real data set, and their results will be validated with another data set. This process is of great importance for analyzing the impacts that sustainable transport policies will have when applied over the specific geographical and demographic characteristics of a certain area.
The rest of the paper is organized as follows. Section 2 describes the materials used and details of the proposed transport choice methodologies. Section 3 presents and analyzes the results derived from the application of these methodologies to two different scenarios. The final section presents the conclusions and discussion that emerge from the analysis of the results.

Materials and Methods
In order to better understand the methodology presented, the next subsections detail its implementation according to the stages defined in the four-step model (FSM).

Trip Generation and Distribution
In order to generate the information to train the models, agent-based simulations were extensively used. The main actors of the simulation are the citizens: each citizen agent has a set of personal preferences that directly influence the transport choice for traveling. For the generation and distribution of the trips for the experimentation, two data sets were used. On the one hand, Biscay's commute data (depicted in Figure 1a and Figure 2a) was extracted from the census performed every 10 years by the Spanish National Statistics Institute [25]. This data set covers the region of Biscay, in the Basque Country, with an area of 2217 km². The census also details the points of origin and destination of travel for the 11 most populated municipalities of the region, with a total population of about 1,100,000 citizens. This data set is used to train the different models. On the other hand, Silesia's commutes (depicted in Figure 1b and Figure 2b) were extracted from the green traveling project's surveys [26]. The data set covers 19 municipalities of the central part of the Silesian Voivodeship, with a population of 4,710,000 citizens. This data set is used to validate the results.
When the data set is brought to life by the simulation, each citizen agent registers its point of origin, point of destination, departure time, and the itinerary for the five transport modes. The platform preprocesses the input data to geolocate each citizen agent in the environment through a weighted distribution. This process assigns the origin of each citizen agent to household buildings in the municipality and the destination to available amenities or to commercial or industrial buildings. The weighted distributions are based on the buildings' size and levels, i.e., the bigger and the taller the building, the higher the probability of assigning a citizen agent to it. Once located, citizens calculate the five itineraries for CAR, MOTORCYCLE, TRANSIT, BICYCLE, and WALK. For each itinerary, the main characteristics of every choice (see Section 2.2 for details) are stored in order to build their decision process. Some itineraries might be null due to the lack of a nearby public transport stop or an inability to calculate the route.
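The weighted assignment of citizen agents to buildings can be illustrated with a short sketch. It is not the platform's implementation: the `Building` record, the weight formula (footprint area times number of levels), and all sample values are assumptions made only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Building:
    name: str
    area_m2: float  # footprint size (hypothetical attribute)
    levels: int     # number of floors (hypothetical attribute)


def building_weight(b: Building) -> float:
    # Assumed weight: bigger and taller buildings are more likely to be chosen.
    return b.area_m2 * b.levels


def assign_building(buildings, rng):
    # Draw one building with probability proportional to its weight.
    weights = [building_weight(b) for b in buildings]
    return rng.choices(buildings, weights=weights, k=1)[0]


# Made-up household buildings of one municipality.
households = [
    Building("small house", 100.0, 1),
    Building("apartment block", 400.0, 8),
]
rng = random.Random(42)
origins = [assign_building(households, rng) for _ in range(1000)]
share_block = sum(b.name == "apartment block" for b in origins) / len(origins)
```

With these made-up values, the apartment block carries 3200 of the 3300 total weight units, so most citizen agents are expected to be placed in it. The same draw could be used for destinations, replacing the household list with amenities, commercial, or industrial buildings.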

Transport Mode Choice
There are many features that determine the modal split of a territory. More than 45 years ago, the authors of [27] identified the main features that influence commuters' choices, and since then other important characteristics have been added to the list [28]. In spite of using different techniques, researchers agree that itinerary-dependent variables can be grouped into four main categories that represent the physical and factual features of a trip [29]: duration, price, length, and environmental impact. Table 1 describes these categories and shows how they are usually further divided into second-level features. Nonetheless, these four variables do not cover some other features listed in the literature, which are directly linked to the traveler's background. These background variables, presented in Table 2, appear neither in the census nor in the surveys and are therefore beyond the scope of this study (see Section 4 for details).

Length: Distance of the itinerary (in kilometers). For long- and medium-length trips, this is not a determining factor; it matters only when walking or cycling is an option. Second-level features: walking distance, cycling distance.
Environmental impact: Contribution to climate change (measured in CO2 emissions). For short itineraries, where transport modes compete under similar conditions, environmental awareness can be the trigger that determines user decisions. Second-level features: environmental impact of the trip.
Climatology: Variables that may have a direct impact on the physical variables of the trip, e.g., if it rains, a trip may take longer than usual. Second-level features: environment climatology.

Missing values have been the main source of problems. In these data sets, missing values appear as NaN (not a number), representing that the itinerary is undefined. There are valid reasons to have missing values in the data set (for example, it is not possible to travel from one place to another using public transport), but a closer look reveals that the missing values are randomly distributed and do not follow any logical pattern (for example, itineraries that can be performed by MOTORCYCLE and not by CAR). This suggests that the missing values originated from time-out errors of the routing algorithm, caused by its inability to calculate complex routes. For this reason, all NaN values were replaced with a large value, no smaller than any of the real measured values (in this case, the maximum value in the data set).
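The replacement of undefined itineraries can be sketched as follows. This is a minimal illustration that treats a single column of made-up trip durations; whether the original preprocessing took the maximum per variable or over the whole data set is not stated, so the per-column choice here is an assumption.

```python
import math


def replace_missing_with_max(values):
    # Largest value actually measured (NaN entries are ignored).
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        raise ValueError("no measured values to take the maximum from")
    sentinel = max(observed)
    # Undefined itineraries receive the sentinel, so they become the
    # least attractive option for that variable.
    return [sentinel if math.isnan(v) else v for v in values]


durations_min = [12.0, math.nan, 45.5, 30.0, math.nan]  # made-up sample
cleaned = replace_missing_with_max(durations_min)
```

The design intent is that an itinerary that could not be routed is never preferred over one that could, while the variable stays within its observed range.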
The rest of the variables do not present any obvious problems. In fact, they clearly show an expected distribution. For example, Figure 3 shows the length of the trips made with BICYCLE. As expected, there is a point beyond which trips using BICYCLE are too costly and their number falls sharply. Another expected result is, for example, the distribution of time spent in CAR trips. As seen in the left panel of Figure 3, the distribution almost follows a χ² distribution. As citizens tend to be normally geographically distributed [30], the distance traveled follows a χ² distribution, as does the time spent.
The influence of these variables on transport choice was modeled through eight modeling techniques: k-nearest neighbors (KNN) [31], multinomial logit (M) [32], support vector machines (SVM) [33], neural networks (NN) [34], naïve Bayes (B) [35], fuzzy logic (CE) [36], expert knowledge (EK) [37], and random search (RA).

Multinomial Logit Models
Following the so-called latent construct interpretation, multinomial models work by adjusting a helper function p that takes as input a vector of external factors that affect the choice and gives as output the selected one. For a two-class case, the function p takes the form:
$$p(X) = \begin{cases} 1 & \text{if } \beta \cdot X + \varepsilon > 0 \\ 0 & \text{otherwise} \end{cases}$$
where X is a vector of the variables considered to make the regression, β is the vector of regressors to be estimated, and ε is a random variable following the logistic (logit) or normal (probit) distribution. Details can be consulted in [32].
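For more than two classes, a common formulation of the logit model assigns one linear utility per transport mode and converts the utilities into probabilities with a softmax. The sketch below assumes that formulation; the feature vector and the coefficient values are invented for illustration and are not the fitted model of this article.

```python
import math


def mode_probabilities(features, coefficients):
    # One linear utility per mode: u_c = beta_c . X
    utilities = {
        mode: sum(b * x for b, x in zip(beta, features))
        for mode, beta in coefficients.items()
    }
    # Softmax, shifted by the maximum utility for numerical stability.
    top = max(utilities.values())
    exp_u = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}


# Features: [1 (intercept), duration in hours, price in euros], illustrative only.
x = [1.0, 0.5, 2.0]
beta = {
    "CAR": [0.0, -1.0, -0.2],
    "TRANSIT": [0.3, -1.5, -0.1],
    "WALK": [-0.5, -3.0, 0.0],
}
probs = mode_probabilities(x, beta)
predicted = max(probs, key=probs.get)  # mode with the highest probability
```

Here the utilities are -0.9 (CAR), -0.65 (TRANSIT), and -2.0 (WALK), so TRANSIT is the predicted mode for this hypothetical traveler.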

Support Vector Machines
This technique builds a black-box model that can be tailored for classification and regression problems [33]. A more detailed explanation is beyond the scope of this document, and interested readers can consult [33] for details. Since transport choice is a classification problem, the default approach, ε-classification, was followed. This technique features three hyper-parameters: C controls the penalization of the errors in the training sample, ε controls the margin of tolerance to errors, and γ controls the bandwidth of the radial function. These hyper-parameters have been optimized using a grid search following the advice given in [38].

Neural Networks
This technique builds a black-box model that tries to mimic the behavior of natural neurons. A neural network is composed of a set of artificial neurons arranged into three layers: an input layer, a hidden layer, and an output layer (see Figure 4). The outputs of one layer are connected to the inputs of the subsequent layer, building in this way a network of neurons. Finally, each neuron computes the following function:
$$y = \phi(w \cdot x + b)$$
where x is the input vector, w and b are the parameters of the model (weights and bias), and $\phi(z) := \frac{1}{1 + e^{-z}}$ denotes the sigmoid function [39]. Further details can be consulted in [34]. The hidden layer features a configurable number of neurons, and the parameters of the model have been adjusted through backpropagation.
The main hyper-parameter of this model is the number of neurons m in the hidden layer. In this case, a grid search was used to optimize this value, and m = 100 was found to be the best value.
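The computation carried out by each neuron, and the way the layers chain together, can be sketched in a few lines. The weights below are arbitrary numbers chosen so that the result is easy to check by hand; they are not the trained parameters, and training by backpropagation is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    # Weighted sum of the inputs plus bias, passed through the sigmoid.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def forward(x, hidden, output):
    # hidden and output are lists of (weights, bias) pairs, one per neuron.
    h = [neuron(x, w, b) for w, b in hidden]
    return [neuron(h, w, b) for w, b in output]


# Two inputs, two hidden neurons, one output neuron (illustrative values).
hidden_layer = [([0.5, -0.25], 0.0), ([1.0, 1.0], -3.0)]
output_layer = [([1.0, -1.0], 0.0)]
result = forward([1.0, 2.0], hidden_layer, output_layer)
```

Both hidden neurons receive a weighted sum of zero and therefore output 0.5; the output neuron again receives zero, so the network returns 0.5 for this input.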

The Naïve Bayes (B) model
This model extends Bayes' theorem to perform classifications. It estimates a probability p_c(X) for every transport mode c given the vector X of variables considered to make the prediction. To this end, the algorithm just needs to calculate an estimation of the probability of X conditional on every transport mode c and then use Bayes' theorem. The output class is the one with the greatest probability. Details of this model can be found in [35].
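Written out, the classification rule described above takes the following form. The symbols p(c) for the prior probability of mode c and x_i for the individual components of X are notation introduced here; the factorization in the second line is the usual "naïve" independence assumption that gives the method its name.

```latex
\begin{align}
p_c(X) &= \frac{p(c)\, p(X \mid c)}{\sum_{c'} p(c')\, p(X \mid c')}, \\
p(X \mid c) &\approx \prod_{i} p(x_i \mid c), \\
\hat{c} &= \arg\max_{c} \; p_c(X).
\end{align}
```

Since the denominator is the same for every mode, comparing the numerators p(c) p(X | c) is enough to select the output class.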

k-Nearest Neighbours
This method uses the notion of closeness between inputs to perform classifications. Given a new element y to classify, the algorithm searches for the k nearest known elements in the training set with respect to a given metric (usually the Euclidean one) and classifies y as the weighted mean value of the nearest neighbours (in a regression setting) or as the most common class among the k nearest elements (in a classification setting such as this one). Using a grid search procedure, it was found that k = 5 was the best value for this project.
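A compact sketch of the classification rule and of a grid search over k is given below. It assumes the Euclidean metric and an unweighted majority vote, and it selects k on a single held-out set; the resampling scheme actually used for tuning is not described in the text, so that part is an assumption, as are the toy data.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    # train: list of (feature_vector, label) pairs; Euclidean metric.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def grid_search_k(train, held_out, candidates):
    # Keep the candidate k with the highest held-out accuracy.
    return max(candidates, key=lambda k: accuracy(train, held_out, k))


# Toy data: (trip length in km, duration in min), rescaled; made-up values.
train = [
    ((1.0, 1.0), "WALK"), ((1.5, 1.2), "WALK"), ((2.0, 0.8), "WALK"),
    ((9.0, 9.0), "CAR"), ((10.0, 8.5), "CAR"), ((8.5, 10.0), "CAR"),
]
held_out = [((1.1, 0.9), "WALK"), ((9.2, 9.1), "CAR")]
best_k = grid_search_k(train, held_out, [1, 3, 5])
```

In practice the features would be rescaled to comparable ranges before computing distances, since otherwise the variable with the largest magnitude dominates the metric.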

Fuzzy Logic
All methods presented above are black-box models or are not straightforward to extend with new transport modes or new input variables. Fuzzy logic [36], however, has been shown to achieve high success in the emulation of individual preferences for transport mode choice [40,41], therefore enabling a wider level of parametrization for each citizen's profile. A well-trained fuzzy logic engine and accurately defined rule sets are expected to provide consistent results even when the characteristics of the transport network vary.
Fuzzy logic engines build discrete choice models through a simple natural language rule-based approach [42]. Classical logic only allows conclusions to be true or false. In contrast, fuzzy logic defines a set of rules that map numeric data into linguistic terms and creates fuzzy sets, indicating the extent to which a value belongs to each term. That is, a variable can have several values (called terms) that can be overlapped and shaped. Thereby, it is possible to model how each feature increases or decreases the likelihood of choosing a given transport mode. Fuzzy logic engines are parametrized through their inputs, outputs, and rules. For this research, the fuzzy logic engine was built as follows:
Inputs, terms, and membership functions: For the engine to translate from linguistic terms to numbers, inputs are composed of a name that identifies the input, the possible terms or values that the input can take, and a membership function that limits the extent to which each term can be classified. The inputs of the model presented here are built from the four previously mentioned itinerary features: DURATION, PRICE, LENGTH, and ENVIRONMENTAL IMPACT. Each of these inputs features a fixed set of three terms (LOW, MEDIUM, and HIGH) shaped as triangles. The anchors of these triangles (α, β, γ, and δ), shown in Figure 5, are key values that describe how citizens understand these qualitative variables.
Outputs, terms, and membership functions: Similarly to the inputs, the outputs are composed of a name that identifies the output, the possible terms or values the output can take, and a membership function for each term. The model presented here features a single output with five fixed, non-overlapping terms that represent the transport modes among which to choose: CAR, MOTORCYCLE, PUBLIC TRANSPORT, BICYCLE, and WALK, as represented in Figure 6.
Rules: Rules have the following structure: if INPUT is INPUT_TERM and ... then OUTPUT is OUTPUT_TERM. The engine needs a set of rules, each with one output but with one or more terms, as depicted in Table 3. Additionally, the engine provides fuzzy value modifiers, or hedges, through natural language keywords that help describe border cases without the need to include additional terms: ANY, NOT, EXTREMELY, SELDOM, SOMEWHAT, or VERY. The defuzzifier takes the maximum among the output terms; that is, the term that receives the maximum value is the one chosen by the citizen. Please note that only AND conjunctions are used between rule terms, since disjunctions can be represented by splitting a rule in two or by using the NOT hedge.
A first definition of the fuzzy logic inputs and rules was created according to both the literature [37] and internal transport surveys carried out for the green traveling project [43,44]. From this information, a simple set of rules was extracted. These rules can be consulted in Table 4.
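How inputs, rules, and the maximum defuzzifier fit together can be sketched as follows. This is a simplified illustration and not the engine used in the study: the triangle anchors, the three rules, and the use of the minimum membership for AND are assumptions, only two of the four inputs are included, and hedges such as VERY or NOT are not implemented.

```python
def triangle(x, left, peak, right):
    # Triangular membership: 0 outside (left, right), 1 at the peak.
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical anchors (DURATION in minutes, LENGTH in kilometers).
TERMS = {
    "DURATION": {"LOW": (0, 10, 30), "MEDIUM": (10, 30, 60), "HIGH": (30, 60, 120)},
    "LENGTH": {"LOW": (0, 1, 3), "MEDIUM": (1, 3, 10), "HIGH": (3, 10, 50)},
}

# Hypothetical rules: (conditions joined by AND, output term).
RULES = [
    ({"LENGTH": "LOW"}, "WALK"),
    ({"LENGTH": "MEDIUM", "DURATION": "LOW"}, "BICYCLE"),
    ({"LENGTH": "HIGH"}, "CAR"),
]


def choose_mode(inputs, terms=TERMS, rules=RULES):
    scores = {}
    for conditions, output in rules:
        # AND between rule terms is taken as the minimum membership.
        strength = min(
            triangle(inputs[name], *terms[name][term])
            for name, term in conditions.items()
        )
        scores[output] = max(scores.get(output, 0.0), strength)
    # Defuzzification: the output term with the highest activation wins.
    return max(scores, key=scores.get), scores


mode, scores = choose_mode({"DURATION": 15, "LENGTH": 1.5})
```

For a 1.5 km, 15 min itinerary, LENGTH is LOW to degree 0.75 and MEDIUM to degree 0.25, so the WALK rule fires more strongly than the BICYCLE rule and WALK is selected.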

Duration membership function (α)
Table 4. Fuzzy engine rule sets extracted from surveys and expert knowledge. Source: own research generated from the surveys in [43,44].
While expert knowledge can lead to accurate solutions and is crucial for a better understanding of the problem, it is not flexible enough to accommodate variations in the inputs used to build that knowledge. Therefore, it is always advisable to search further for a better solution. Trying to manually improve the expert knowledge rules to match a larger number of citizens from the census involves applying many changes to interrelated inputs and rule sets in a very large search space. One approach to automating this is to use an evolutionary algorithm and to divide the search space into subsets of problems.
Genetic programming techniques have already been used to train fuzzy logic models [45]. Here, a cooperative co-evolution programming method (CE) is presented. CE is a form of evolutionary computation that divides a large problem into sub-components and solves them independently in order to solve the large problem [46]. The sub-components are implemented as sub-populations, and the only interaction between sub-populations is in the cooperative evaluation of each individual of the sub-populations. The cooperative evaluation of each individual in a sub-population is done by concatenating the current individual with the best individuals from the rest of the sub-populations. The algorithm presented here tackles both the adjustment of the membership functions of the input linguistic variables and the search for the rule sets that most faithfully fit the census data. These two variables are strongly related, since any change in the extent of the input terms requires adapting the rules to the new meaning of the linguistic variable. Thus, the CE algorithm is well suited to the problem of optimizing two interrelated variables in two sub-components:
Input-evolving sub-component: This is composed of a population of 200 input sets. Each input set features the four input variables already mentioned: DURATION, PRICE, LENGTH, and ENVIRONMENTAL IMPACT. Each input has three fixed and ordered terms (LOW, MEDIUM, and HIGH) whose triangle anchors α, β, γ, and δ (see Figure 5 for details) can be modified in search of limits that better fit the citizens' appreciation of these qualitative terms.
Rule-evolving sub-component: This is composed of a population of 200 rule sets. Each rule set features a list with a variable number of rules, and, for each rule, its terms and output are modified to extract the combination of rules that most faithfully represents the citizens' transport choices from the census.
Each combination of input membership functions and rules defines what is called a fuzzy experiment. On evaluation, each fuzzy experiment configures the fuzzy logic engine to test how accurate the outputs of the engine are compared to the census. During the evolution of the input sets, whenever a change is applied to any of the triangle anchors, a fuzzy experiment is created with the corresponding input set, the fixed output, and the best rule set found so far in order to calculate the input set's fitness. Likewise, during the evolution of the rule sets, whenever a change is applied to the rules (whether it is an addition, a deletion, or a change in the terms), a fuzzy experiment is created with the corresponding rule set and the best input set found so far in order to calculate the rule set's fitness.
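The cooperative evaluation can be sketched independently of the fuzzy engine. In the sketch below, `evaluate` stands for any function that builds a fuzzy experiment from an input set and a rule set and returns its fitness; it is a placeholder, and the toy version used in the example simply rewards numbers close to two targets so that the result can be checked by hand.

```python
def cooperative_fitness(individual, best_partner, evaluate, role):
    # Pair the candidate with the best individual of the other sub-population.
    if role == "inputs":
        return evaluate(input_set=individual, rule_set=best_partner)
    if role == "rules":
        return evaluate(input_set=best_partner, rule_set=individual)
    raise ValueError(f"unknown role: {role}")


def coevolve_step(input_pop, rule_pop, best_inputs, best_rules, evaluate):
    # Score every candidate against the other population's current best.
    scored_inputs = [
        (cooperative_fitness(i, best_rules, evaluate, "inputs"), i)
        for i in input_pop
    ]
    scored_rules = [
        (cooperative_fitness(r, best_inputs, evaluate, "rules"), r)
        for r in rule_pop
    ]
    new_best_inputs = max(scored_inputs, key=lambda pair: pair[0])[1]
    new_best_rules = max(scored_rules, key=lambda pair: pair[0])[1]
    return new_best_inputs, new_best_rules


def toy_evaluate(input_set, rule_set):
    # Stand-in for a fuzzy experiment: the optimum is input_set=3, rule_set=7.
    return -((input_set - 3) ** 2) - ((rule_set - 7) ** 2)


best = coevolve_step([1, 3, 5], [6, 7, 8], 1, 6, toy_evaluate)
```

Each sub-population is therefore judged only through complete fuzzy experiments, which is what allows the two interrelated components to be searched separately.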
Although evolutionary algorithm operators traditionally use mutation and crossover, rule set mutations produce only slight changes in the population. Therefore, a more abrupt approach has been designed by relying only on the crossover operator. The crossover operators for evolving rule sets and evolving input sets are detailed below.

Operators used for Rules:
rule set slice crossover: Given two rule sets, this operator creates a new rule set, taking the rules from the first up to the n-th from the first set and the rules from the n-th to the last from the second set.
rule set combine crossover: Given two rule sets, this operator creates a new rule set, appending for each position a randomly selected rule from the same position of the two provided rule sets.
rule set slice terms crossover: Given two rule sets, this operator clones the first rule set and selects a rule p in the set. The terms of rule p are replaced by the terms from the first up to the n-th of rule p of the first set and the terms from the n-th to the last of rule p of the second provided set.
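The three rule set operators can be sketched on a simple representation in which a rule is a pair (terms, output) and a rule set is a list of rules. The representation, the convention that position n itself comes from the second parent, and the truncation to the shorter parent in the combine operator are assumptions of this sketch.

```python
import random


def slice_crossover(first, second, n):
    # First n rules from the first set, rules from position n onward
    # from the second set.
    return first[:n] + second[n:]


def combine_crossover(first, second, rng):
    # For each position, pick the rule of either parent at random
    # (zip stops at the shorter parent).
    return [rng.choice(pair) for pair in zip(first, second)]


def slice_terms_crossover(first, second, p, n):
    # Clone the first set, then rebuild the terms of rule p from both parents.
    child = list(first)
    terms_a, output = first[p]
    terms_b, _ = second[p]
    child[p] = (terms_a[:n] + terms_b[n:], output)
    return child


# Made-up parents; each rule is (tuple of terms, output term).
a = [(("LENGTH is LOW",), "WALK"),
     (("LENGTH is HIGH", "PRICE is LOW"), "CAR")]
b = [(("DURATION is LOW",), "BICYCLE"),
     (("DURATION is HIGH", "PRICE is HIGH"), "TRANSIT")]

sliced = slice_crossover(a, b, 1)
combined = combine_crossover(a, b, random.Random(0))
term_sliced = slice_terms_crossover(a, b, 1, 1)
```

All three operators return new lists, so the parents remain available for further crossovers within the same generation.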

Fuzzy Logic Input Membership Functions:
input set combine crossover: Given two input sets, this function creates a new input set, appending for each position a randomly selected element from the same position of the two provided input sets.
input set shape mutation: Given an input set, this function modifies one of the α, β, γ, or δ values of the shapes in Figure 5. It displaces at the same time the third point of the first term's triangle, the mid-point of the second term's triangle, and the first point of the third term's triangle.
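A sketch of the two input set operators follows. It assumes that an input set is a mapping from input name to its four shared anchors (α, β, γ, δ), so that moving one anchor moves every triangle point tied to it; how the triangles are derived from the anchors (Figure 5) is not reproduced here, and the clamping that keeps the anchors ordered is an additional assumption.

```python
import random


def combine_inputs(first, second, rng):
    # For each input variable, inherit the anchors of either parent at random.
    return {name: rng.choice((first[name], second[name])) for name in first}


def shape_mutation(input_set, name, index, delta):
    # Shift one anchor of one input; the terms that share it move together.
    anchors = list(input_set[name])
    lower = anchors[index - 1] if index > 0 else float("-inf")
    upper = anchors[index + 1] if index < len(anchors) - 1 else float("inf")
    # Clamp so the anchors stay ordered (an assumption of this sketch).
    anchors[index] = min(max(anchors[index] + delta, lower), upper)
    mutated = dict(input_set)
    mutated[name] = tuple(anchors)
    return mutated


# Made-up anchors (alpha, beta, gamma, delta) for two of the inputs.
parent_a = {"DURATION": (5, 15, 30, 60), "PRICE": (1, 2, 4, 8)}
parent_b = {"DURATION": (10, 20, 40, 90), "PRICE": (0.5, 1.5, 3, 6)}

child = combine_inputs(parent_a, parent_b, random.Random(0))
shifted = shape_mutation(parent_a, "DURATION", 1, 4)
clamped = shape_mutation(parent_a, "DURATION", 1, 40)
```

In the example, shifting β by 4 yields (5, 19, 30, 60), whereas a shift of 40 is clamped at γ and yields (5, 30, 30, 60).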
The fitness of a fuzzy experiment is given by the number of citizens that match the transport choice originally registered in the census (that is, the accuracy of the forecasting model). The goal of the CE algorithm is, therefore, to maximize the fitness in order to estimate as many correct transport choices as possible. For this, a confusion matrix is calculated within each fitness evaluation. Confusion matrices are square matrices with a number of rows and columns equal to the number of output terms. Columns denote the real elements and rows denote the predicted elements of each class; that is, each column contains a split of 100% of the elements of the corresponding real class. Namely, cell m_ij denotes the percentage of elements of class j that are classified as i. The diagonal of this matrix (namely the elements m_ii) describes the correctly classified elements. For instance, in Table A1, 19.8004% of C (CAR) instances were forecast correctly, but 14.4995% of C users were forecast as M (MOTORCYCLE) users. The global fitness is composed of the best element of the input and rules populations, as seen in Figure 7.
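The column-normalised confusion matrix described above can be computed as in the following sketch. The class labels and the observed and predicted trips are made up for the example.

```python
def confusion_matrix(observed, predicted, classes):
    # counts[i][j]: elements predicted as classes[i] whose real class is classes[j].
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for real, guess in zip(observed, predicted):
        counts[index[guess]][index[real]] += 1
    # Normalise every column so that it splits 100% of the real elements.
    totals = [sum(row[j] for row in counts) for j in range(len(classes))]
    return [
        [100.0 * counts[i][j] / totals[j] if totals[j] else 0.0
         for j in range(len(classes))]
        for i in range(len(classes))
    ]


classes = ["C", "M", "T"]  # CAR, MOTORCYCLE, TRANSIT
observed = ["C", "C", "C", "C", "M", "T", "T"]
predicted = ["C", "C", "M", "T", "M", "T", "C"]
m = confusion_matrix(observed, predicted, classes)
```

In this toy case, 50% of the real C trips are forecast as C, 25% as M, and 25% as T, so the first column reads 50, 25, 25 from top to bottom.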
Since the number of elements for each vehicle type is not balanced, a fuzzy experiment that matched only CAR would obtain a better fitness than another producing a more distributed modal split. Therefore, the previously defined measure of correctly classified elements has been improved to account for balanced correctly classified elements. In this sense, let t_k denote the relative partial matches of class k, namely,
$$t_k = \frac{\text{number of correctly classified elements of class } k}{\text{number of elements of class } k}$$
Then, the fitness function f is defined as the sum of the logarithms of the partial matches t_k. Namely,
$$f = \sum_{k} \log(1 + t_k)$$
Please note that we add 1 to the partial matches to avoid undefined values of the fitness function when a model does not correctly classify any element of a class. Thus, with this approximation, not only are all vehicle types balanced, but rules that match the census in a more distributed way also weigh more.
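The effect of the balanced fitness can be checked on a small example. The sketch assumes that the relative partial matches are expressed as fractions between 0 and 1 and that the natural logarithm is used; neither detail is fixed by the text, and the trips are made up.

```python
import math


def balanced_fitness(observed, predicted, classes):
    fitness = 0.0
    for k in classes:
        total_k = sum(1 for real in observed if real == k)
        hits_k = sum(
            1 for real, guess in zip(observed, predicted) if real == k == guess
        )
        # Relative partial matches of class k (0 when the class is absent).
        t_k = hits_k / total_k if total_k else 0.0
        # Adding 1 keeps the logarithm defined when nothing is matched.
        fitness += math.log(1.0 + t_k)
    return fitness


classes = ["CAR", "WALK"]
observed = ["CAR"] * 8 + ["WALK"] * 2
only_car = ["CAR"] * 10                                   # 80% accuracy
spread = ["CAR"] * 6 + ["WALK"] * 2 + ["WALK", "CAR"]     # 70% accuracy

f_only_car = balanced_fitness(observed, only_car, classes)
f_spread = balanced_fitness(observed, spread, classes)
```

The model that forecasts only CAR matches 80% of the trips but scores log(2), about 0.69, whereas the more distributed forecast matches 70% of the trips and scores log(1.75) + log(1.5), about 0.97, so the balanced fitness prefers the latter.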
Additionally, a gradually growing subset of the census is used to calculate the fitness. This subset is extended as the matched percentage of that subset increases. That is, the first iteration's fitness is calculated with a 10% subset of the entire census data set. In the following iterations, the subset is kept equal or increased in parallel with the matched percentage. In this way, as matched elements increase, more elements are introduced into the subset against which the next iteration's fitness is evaluated. Moreover, in order to go through all the elements of the census, every 15 iterations the window from which the subset is extracted is moved by 10%. This change in the reference data used to evaluate the fitness generates the repeating pattern visible in Figure 7.
In summary, the pseudo code for the CE algorithm is described in Figure 8. The algorithm is divided into two main procedures:
1. initialize the population of rule sets and input sets that will take part in the co-evolutionary process, and
2. run the co-evolutionary process itself, where the rule sets and input sets evolve together through the application of the evolutionary operators, the calculation of the fuzzy experiments' fitness on each iteration, and the update of the percentage of the census being used for evaluation (initially 10% of the whole census).

Random Search (RA)
In order to assess whether the results had been produced by chance or, indeed, by the evolutionary process, a simple yet powerful method, random search, was also trained and validated. For this purpose, the same number of random fuzzy models as fitness evaluations for the CE were produced. As 200 rules are in the population and 200 generations have been made, 400,000 random search rules have been generated. All models in general, and the CE algorithm in particular, are expected to produce far better results than this method.

Transport Simulation
Since the current research focuses primarily on transport choice modeling, the description and evaluation of the transport simulation stage will be featured in future works. Section 4 describes how this phase will be tackled.

Experimental Results
This section presents the results for the execution of the eight data science methods described in Section 2. The experiments were carried out using the following:
• GeoWorldSim, a powerful software that eases the integration of multi-agent systems with reference simulation tools such as Matlab or EPA-NET.
• The Caret package [49] of the R Statistical Suite [50]. This package unifies access to the libraries and stages needed to build data-based models. For example, it provides a uniform way to tune the hyper-parameters, train the models, and produce a forecast, among other facilities.
Simulations were executed on two computers: large experiments were run on a system with an AMD Opteron 6168 CPU and 32 GB of RAM, and short experiments were run on a system with an Intel Core i7-2600 CPU and 16 GB of RAM.
Table 5 shows the global accuracy results, that is, the percentage of trips correctly classified. The first row (5-TM) shows the results with the five transport modes, while the second row (3-TM) shows the results after grouping CAR and MOTORCYCLE into PRIVATE VEHICLE as well as WALK and BICYCLE into WALK.
Table 5. Global accuracy (%) of the algorithm. Column T (testing) shows the results for Biscay's data set and Column V (validation) for Silesia. Source: own research by extracting the forecast accuracy of the different models in Table A1. EK: expert knowledge; CE: fuzzy logic; SVM: support vector machine; M: multinomial logit; NN: neural network; B: naïve Bayes; KNN: k-nearest neighbor; RA: random search.
As seen, all techniques are able to transfer the results almost seamlessly, since the differences between the accuracy of the training (T) and validation (V) sets are rather small. As explained previously, each method was trained with data from the census of Biscay (Basque Country, Spain) and validated over the data set of the Silesian Voivodeship (Poland). The models can be classified in three groups: the cluster of the best models, the contrast models, and the rest of the models. As expected, the worst model is the random search (RA). The best models are the most complex ones: SVM, NN, and M. Surprisingly, the KNN method, while being very simple, achieves the best results. The remaining models (EK, CE, and B) complete the cluster of the other models. In all cases, the best models can correctly predict the transport choice of almost half of the trips registered in the data set.
For more detail, Table A1 contains the confusion matrices of all models included in this article. As previously stated, within these matrices, columns show the observed modes, and rows show the forecasted modes. In these tables, it is clear that some of the models have difficulties differentiating between CAR and MOTORCYCLE and between WALK and BICYCLE. This may be caused by the existence of additional variables, such as socio-demographics, car/motorcycle ownership, or climatology factors, that are not taken into consideration in this article. Therefore, since the census lacked information about these features, the five transport modes were grouped into three categories. This way, CAR and MOTORCYCLE were gathered into PRIVATE VEHICLE, WALK and BICYCLE into WALK, and TRANSIT is maintained as its own class.
With these three transport modes, all techniques are able to correctly transfer the results from the training to the validation set (Table 5). As before, three clusters of techniques appear. On the one hand, RA continues to be the worst method, but it is close to B. On the other hand, EK has greatly improved its performance and now belongs to the best cluster. In any case, the best models have not improved their performance, yet they are still able to correctly predict the transport choice of half of the trips made.
In order to validate the previous claims, a bootstrapping validation was used [51]. Thus, 100 samples were randomly built following the original distribution of the Silesia data set. Figure 9 shows a box plot with the results of the different models, bringing to light the significant differences among them. In fact, a Friedman test confirms this hypothesis (p-value < 10^−16). In order to assess the differences among the different models, a post-hoc analysis was made following the procedure described in [52]. This procedure clusters all methods with similar results and gives them the same letter. It should be taken into consideration that these groups are not sorted; namely, the best algorithms are not necessarily labeled with an a. The results are shown in Figure 10. In this figure, the mean value of the different algorithms is plotted and groups are identified with a letter. As can be seen, the test confirms our claims. The three groups are clearly visible.
Again, when analyzing the results in detail, problems arise for some of the models. Looking at the confusion matrices (Table A2) highlights the fact that some models continue to have problems with some of the classes. For example, the SVM cannot correctly predict any WALK trip (even if it has correctly predicted half of the trips). The same problem occurs when looking at the M and NN models, though it is not as acute. In the end, it seems that only the KNN, EK, and CE models do not suffer these problems in this setting.
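The bootstrapping validation described above can be illustrated with a minimal Python sketch. It assumes that each bootstrap sample is drawn with replacement from the validation trips and that the same resampled trips are scored for every model, so that the accuracies are paired as the Friedman test requires. The function names are hypothetical, the statistic is computed without a tie correction, and this is not the R code used for the study.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of trips whose predicted mode matches the recorded mode."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_accuracies(y_true, predictions, n_samples=100, seed=0):
    """Accuracy of every model on the same bootstrap samples.

    predictions maps a model name to its list of predicted modes.
    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = {name: [] for name in predictions}
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        for name, y_pred in predictions.items():
            scores[name].append(accuracy(truth, [y_pred[i] for i in idx]))
    return scores


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction) over paired samples."""
    names = list(scores)
    k = len(names)
    n = len(scores[names[0]])
    rank_sums = [0.0] * k
    for b in range(n):
        for j, r in enumerate(average_ranks([scores[m][b] for m in names])):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

Under the null hypothesis of equal performance, the statistic approximately follows a chi-square distribution with k − 1 degrees of freedom, where k is the number of models. A large value, such as the one behind the p-value reported above, indicates that at least one model differs from the rest.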
Finally, Table 6 shows the modal split forecast by the different models. As before, 100 bootstrapping samples were built from the Silesia data set. Column Real shows the results of the survey carried out in 2015 [26], and Columns L.C.I. and U.C.I. contain the lower and upper bounds of the confidence interval at the 0.05 significance level (α = 0.05) for the forecast value made. As expected, given the low variance of the bootstrapped samples (see Figure 9), the modal split forecasts of all models are quite similar (the confidence intervals are quite narrow). Nevertheless, the real value is not included in the confidence interval.
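As an illustration of how such interval bounds can be obtained, the following sketch computes a percentile bootstrap interval for the share of one transport mode among a model's predictions. The percentile method, the function name, and the index rounding are assumptions of this sketch; the text does not state which interval construction was used.

```python
import random


def modal_share_ci(predicted_modes, mode, n_samples=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval (in %) for the share of `mode`.

    predicted_modes is the list of modes a model forecasts for each trip.
    """
    rng = random.Random(seed)
    n = len(predicted_modes)
    shares = []
    for _ in range(n_samples):
        sample = [predicted_modes[rng.randrange(n)] for _ in range(n)]
        shares.append(100.0 * sample.count(mode) / n)
    shares.sort()
    # Positions of the alpha/2 and 1 - alpha/2 percentiles in the sorted shares.
    lower_idx = int((alpha / 2) * (n_samples - 1))
    upper_idx = int(round((1 - alpha / 2) * (n_samples - 1)))
    return shares[lower_idx], shares[upper_idx]
```

With many trips per sample, the shares vary little between resamples, which is consistent with the narrow intervals noted above. A narrow interval only reflects sampling variability and does not guarantee that the surveyed value falls inside it.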

Conclusions and Future Work
This article presents eight methodologies for building transport choice models according to FSM. The methodologies were trained for one scenario, Biscay (Basque Country, Spain), and later evaluated with a second scenario, Silesian Voivodeship (Poland). As explained in Section 3, the results suggest the existence of three groups of models. On the one hand, as expected, the most complex models tested (SVM, CE, and M) perform quite well, hitting almost half of the trips. Next, there is another set of complex algorithms but with slightly worse results (NN, EK, and KNN). The results of the KNN algorithm were in fact unexpected, as it is a very simple algorithm. On the other hand, the control methods (B and RA) do not achieve good results, as expected. In order to draw meaningful conclusions, a qualitative analysis should also be made:
• Some of the models, like M, SVM, and NN, produce unbalanced predictions that completely ignore some of the transport modes. Therefore, given the similar prediction capabilities, the models with more balanced predictions should be used, namely, the EK, CE, and KNN models.

• The number of parameters and complexity of the models are not the same either. In this sense, simpler models should be preferred to more complex models. However, EK, CE, and KNN are non-parametric, and their complexity is difficult to assess. On the one hand, the number of parameters of KNN is indeed the number of points in the training data set, but the complexity of the model is quite low (understanding here the complexity in terms of its VC dimension or similar measures [53]). On the other hand, the number of parameters in the EK and CE models is quite low (compared to the number of parameters of KNN), but the complexity of the model is higher [54]. However, the EK rules are only composed of one or two terms, which makes them easy to follow and modify, while the CE evolved rules usually have about 10 or more terms with a relation between them that is not very clear.

• It is always better to use a model that is easy to understand than a data-driven model. Following this advice, EK and M should be preferred to the rest of the models.
Based on the above considerations, it is clear that the best model is EK: it can be understood, it is easy to extend so as to cover new transport policies, its complexity is similar to that of other models that produce similar numerical results, and its predictions are well balanced among the classes.
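To illustrate why short fuzzy rules are easy to follow and modify, the sketch below evaluates three two-term rules of the kind listed in Table 3. The membership function shapes, their anchor values, the units, and the decision step (taking the mode with the highest rule strength instead of a full defuzzification) are assumptions made only for this example and are not the configuration of the EK model.

```python
def low(x, full, zero):
    """Left shoulder: 1 up to `full`, then falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def high(x, zero, full):
    """Right shoulder: 0 up to `zero`, then rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


# Hypothetical anchors (minutes, currency units, kilometres).
TERMS = {
    ("TRANSIT_DURATION", "HIGH"): lambda x: high(x, 20.0, 60.0),
    ("TRANSIT_PRICE", "HIGH"): lambda x: high(x, 1.0, 3.0),
    ("PRIVATE_DURATION", "LOW"): lambda x: low(x, 10.0, 30.0),
    ("WALK_LENGTH", "LOW"): lambda x: low(x, 0.5, 2.0),
}

# Rule structure taken from Table 3: (antecedent terms, chosen transport).
RULES = [
    ([("TRANSIT_DURATION", "HIGH"), ("TRANSIT_PRICE", "HIGH")], "PRIVATE"),
    ([("TRANSIT_DURATION", "HIGH"), ("PRIVATE_DURATION", "LOW")], "PRIVATE"),
    ([("WALK_LENGTH", "LOW"), ("TRANSIT_PRICE", "HIGH")], "WALK"),
]


def infer(trip):
    """Return the mode with the strongest rule support and all strengths."""
    strengths = {}
    for antecedents, mode in RULES:
        # AND is modelled with min; rules for the same mode are combined with max.
        s = min(TERMS[(var, term)](trip[var]) for var, term in antecedents)
        strengths[mode] = max(strengths.get(mode, 0.0), s)
    return max(strengths, key=strengths.get), strengths
```

Under these assumptions, covering a new transport policy amounts to appending one entry to `RULES` or moving an anchor in `TERMS`, which is the kind of extension that is harder to carry out on a purely data-driven model.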
Even though the results are good, the models need to be improved. At the moment, the methodology does not take into account socio-economic variables, which are not present in the tested data sets (see Table 2). Recent works [55,56] have achieved successful results by using demographic and socio-economic features. As shown in [56] (where the authors almost reached a 90% chance of success), socio-economic information from commuters enables the clustering and creation of custom utility functions for each group. Furthermore, climatic variables have been identified as relevant [28]. Future refinements of the model may introduce both sets of variables in search of reducing error. A first approach should consist of adding these non-trip related features to the global census models and evaluating their improvement. In case this is not satisfactory, the next step will be to try clustering the commuters and adjusting the models for each group.
Additionally, the ownership of private vehicles is a relevant parameter not detailed in the data sets used. In order to introduce this information, the model will use Monte Carlo simulations so that several distributions of these variables are simulated and the results assessed.
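A possible shape for such a Monte Carlo experiment is sketched below. Everything in it is hypothetical: `choose_mode` is a toy stand-in for a fitted transport choice model, vehicle ownership is drawn independently for each commuter from a single assumed ownership rate, and the trip feature `length_km` is invented for the example.

```python
import random


def choose_mode(trip, owns_car):
    """Toy stand-in for a fitted transport choice model (hypothetical)."""
    if trip["length_km"] < 1.0:
        return "WALK"
    return "CAR" if owns_car else "TRANSIT"


def simulate_modal_split(trips, ownership_rate, n_runs=200, seed=0):
    """Average modal split (in %) over n_runs random ownership assignments."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_runs):
        for trip in trips:
            # Each commuter owns a vehicle with probability ownership_rate.
            owns_car = rng.random() < ownership_rate
            mode = choose_mode(trip, owns_car)
            counts[mode] = counts.get(mode, 0) + 1
    total = n_runs * len(trips)
    return {mode: 100.0 * c / total for mode, c in counts.items()}
```

Repeating the call for several assumed rates, for instance `for rate in (0.3, 0.5, 0.7): simulate_modal_split(trips, rate)`, shows how sensitive the forecast modal split is to the unknown ownership distribution.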
Finally, once the transport choice model is fitted, the next step is to determine the extent to which different non-physical incentives, such as changes in the price of the journey, discounts on public transport, additional taxes on the use of private vehicles, reductions in the duration of the journey, and congestion charges, affect commuters' choices.These incentives alter the features of the itineraries, resulting in a different modal split.In order to check the suitability of these incentives, the citizen agents within the simulation will be given a set of preferences that will determine whether they are prone to accept or ignore a certain policy, based on the itinerary features already modeled in the transport choice model.This approach will involve defining the thresholds at which the applications of transport policies will be effective, all the while seeking an equilibrium between the objectives of the policies and the citizens' preferences in order to avoid social rejection or oversizing.

Figure 2. Modal splits for the Biscay and Silesia data sets. Source: own research from the data sources of [25,26].

Figure 3. Distribution of time (left) and length (right) for CAR and BICYCLE transport modes in Silesian Voivodeship.

Figure 5. Fuzzy engine input membership functions, with the configurable anchors that modify the shape of the terms. Source: own research for better understanding the input membership functions.

Figure 6. Fuzzy engine output. Each term represents a mode of transport one may choose. Source: own research for better understanding the fuzzy logic output membership function.

Figure 7. Iterations of the co-evolutive fuzzy algorithm. Source: own research by plotting the evolution in time of the algorithm.

Figure 8. Pseudocode of the co-evolutionary fuzzy logic algorithm. Source: own research simplification of the real code programmed in the algorithm.

Figure 9. Box plot of the distribution of the models' accuracy of the three transport modes on 100 bootstrapping samples. In Biscay, the probability of using BICYCLE, CAR, MOTORCYCLE, TRANSIT, and WALK is 1.75%, 43.51%, 2.1%, 37.6%, and 15.04%, respectively. In Silesia, the probability of using BICYCLE, CAR, MOTORCYCLE, TRANSIT, and WALK is 0.82%, 53.96%, 0.12%, 36.33%, and 8.75%, respectively. Therefore, if a model can produce such a forecast, it will achieve an accuracy of 100%. Source: own research generated from R Statistical Suite [50].

Figure 10. Results of an ALL vs. ALL post-hoc analysis of the accuracy of the models. Source: own research by plotting the results from R Statistical Suite [50].

Table 2. Itinerary features are not present in the data sets and are beyond the scope of the study.

Table 3. Example of fuzzy engine rule sets. Source: own research for explaining the structure and examples of fuzzy logic rules.
if TRANSIT_DURATION is HIGH and TRANSIT_PRICE is HIGH then TRANSPORT is PRIVATE
if TRANSIT_DURATION is HIGH and PRIVATE_DURATION is LOW then TRANSPORT is PRIVATE
if PRIVATE_PRICE is HIGH and TRANSIT_PRICE is LOW and TRANSIT_DURATION is MEDIUM then TRANSPORT is TRANSIT
if WALK_LENGTH is LOW and TRANSIT_PRICE is HIGH then TRANSPORT is WALK

Expert Fuzzy Knowledge (EK) -TM 24.3265 22.0289 37.0689 39.9139 50.6600 50.05 50.1700 49.47 49.9950 47.62 36.030028.13 62.6800 47.22 14.1254 2.700 535 3-TM 46.8343 44.878 67 49.218 45.725 21 50.880049.32 51.0700 50.61 50.9500 45.73 37.9900 33.68 64.2500 47.50 31.27626.261 14

Table 6. Bootstrapping values (in %) for the modal split forecast by the different models. Source: own research.