1. Introduction
The performance evaluation and forecasting of transport systems is a difficult task currently being studied by many research centers. This complexity stems mainly from their stochastic nature, in which a large number of variables and personal preferences come into play [1]. Transport infrastructures are highly expensive, and if a solution does not fit the needs of citizens, the infrastructure might need to be upgraded, increasing its cost or, even worse, becoming a worthless investment [2]. Therefore, computer simulations supported by advanced mathematical models offer a cheap and flexible method for testing future transport developments [3].
To calculate the impact of the transport sector, traditional approaches rely on simulation, a paradigm that has become a popular and effective technique for analyzing a wide range of dynamically changing systems. The essence of simulation is to carry out a series of computational experiments using a model that describes the operations of the real system (how drivers, vehicles, signals, pedestrians, etc. interact) and to generate output data that characterize the system under study [4]. Traffic simulation allows one to check, monitor, and evaluate the behavior of real systems under different realistic conditions in an artificial, computer-based environment [5]. The most common traffic simulations focus on everyday traffic activity: detecting congestion, assessing different ways of managing traffic at intersections (crossings, roundabouts, traffic lights, etc.), or applying dynamic speed limits to avoid jams.
Over the last decades, many general-purpose approaches have been widely used for modeling stochastic processes, such as Monte Carlo techniques [6], Markov chain models [7], and queuing theory methods [8]. Today's state-of-the-art simulation packages (listed later) add features such as graphical interfaces and object-oriented programming and even allow one to simulate both continuous and discrete events. Regarding transport simulation, two mutually interrelated pieces of information are needed [9]: the demand, which characterizes the need for movement (of passengers or freight), and a description of the transport network, traffic zones, and vehicles. By combining these data sets, state-of-the-art models follow the standard concept of the four-step model (FSM) [10]. The FSM is a primary tool for analyzing transport systems, assessing their performance, and predicting their future behavior. It is composed of the following phases:
Trip generation: Generate a specific number of vehicles, whether fixed or approximated, that represents the population.
Trip distribution: Spread the population across the scenario following a realistic distribution.
Transport choice: Schedule the transport modes and itineraries that the population uses throughout the city to reach their destinations.
Traffic simulation: Estimate the global impact derived from the configuration of a single scenario and compare the results with those obtained from other scenarios.
In this sense, multi-agent systems (MASs) have been an active area of research in the last three decades [11]. What started as isolated systems for simple problem solving has turned into full-scale simulation platforms able to integrate several state-of-the-art technologies. In particular, agent-based simulation models have proven very useful for evaluating transport systems [12]. In these models, each vehicle is represented as a software object, a so-called intelligent agent, with its own set of parameters and attributes that define how it behaves, interacts, makes decisions, and communicates with the other agents in the system [13]. This approach allows one to build a thoroughly detailed description of the model population. According to their accuracy and scope, simulation models can be classified as follows:
Micro-simulation models: These models describe traffic with a high level of detail and distinguish separate elements in the traffic flow, such as types of vehicles and pedestrians. Although this level of detail enables very precise analyses, its complexity limits these models to small areas or single intersections. The most popular micro-simulation tools are PTV Vissim [14], Aimsun [15], and Corsim [16].
Meso-simulation models: These models describe traffic with an intermediate level of detail, distinguishing separate elements in the traffic flow but not taking into account the interactions between them. They are less precise but can cover larger areas than micro-simulation models. The most popular meso-simulation tools are Dynasmart [17] and Transims [18].
Macro-simulation models: These models describe traffic with a high level of aggregation, as a uniform traffic flow. They are based on deterministic relationships between the parameters characterizing the traffic flow, such as volume, speed, or density. Macroscopic simulation has been developed to model an entire transport network and/or system. The most popular macro-simulation tools are Emme/2 [19], PTV Visum [20], and Transcad [21].
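As one concrete instance of the deterministic relationships mentioned for macro-simulation models, the fundamental relation of traffic flow links volume q (vehicles/h), density k (vehicles/km), and space-mean speed v (km/h). Greenshields' linear speed-density model is shown below only as a classic textbook example; the tools listed above may use other formulations. Here v_f denotes the free-flow speed and k_j the jam density.

```latex
\begin{align}
q &= k\,v \\
v(k) &= v_f\left(1 - \frac{k}{k_j}\right) \\
q(k) &= v_f\,k\left(1 - \frac{k}{k_j}\right), \qquad
q_{\max} = q\!\left(\tfrac{k_j}{2}\right) = \frac{v_f\,k_j}{4}
\end{align}
```

Under this example model, flow grows with density until k = k_j/2 and then decreases, which is how an aggregate model can reproduce the onset of congestion without representing individual vehicles.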
A great deal of research effort has been devoted to transport mode choice modeling and to how travelers' backgrounds, trips, and transport facilities influence the attractiveness of each mode [22]. The evaluation of transport systems depends entirely on how people adapt to them. This implies modeling the social preferences that are particularly important when traveling, such as time, cost, comfort, or environmental awareness. Even though the aforementioned simulators are relatively easy to run and have many proven success stories, most focus primarily on route calculation and congestion avoidance, leaving out the social behavior that influences a traveler's choice of one transport mode over another.
Among the simulators that model transport selection, discrete choice regression models (probit and logit models) [23] are the most widespread techniques [24]. In fact, the three simulators that include transport choice models (Aimsun, TRANSIMS, and TransCAD) implement these types of models. However, adding new transport modes, altering the transport network, or evaluating new incentives and restrictions requires recalibrating (and sometimes rebuilding) the entire model.
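To make the logit family concrete, the sketch below computes multinomial logit choice probabilities, P(i) = exp(V_i) / Σ_j exp(V_j), from linear utilities. The utility specification, the mode names, and all coefficients (`asc`, `beta_time`, `beta_cost`) are hypothetical placeholders, not values taken from any of the simulators above.

```python
import math


def utility(time_min: float, cost_eur: float, asc: float,
            beta_time: float = -0.05, beta_cost: float = -0.3) -> float:
    """Linear systematic utility V = ASC + beta_time * time + beta_cost * cost.

    The coefficients are illustrative placeholders, not estimated values.
    """
    return asc + beta_time * time_min + beta_cost * cost_eur


def logit_probabilities(utilities: dict[str, float]) -> dict[str, float]:
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtracting the maximum utility avoids overflow and leaves probabilities unchanged.
    v_max = max(utilities.values())
    weights = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}


# Hypothetical trip: travel times in minutes and costs in euros for each mode.
trip = {
    "CAR": utility(20, 3.0, asc=0.0),
    "TRANSIT": utility(35, 1.5, asc=-0.5),
    "WALK": utility(60, 0.0, asc=-1.0),
}
probs = logit_probabilities(trip)
print({mode: round(p, 3) for mode, p in probs.items()})
```

Adding a new mode means adding a utility term whose constant and coefficients are unknown until the model is re-estimated. In addition, the multinomial logit's independence of irrelevant alternatives property keeps the ratio between any two existing modes fixed, so a new mode draws share proportionally from all of them. Both effects help explain why such models usually need recalibration when the mode set changes.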
Within this scope, this article compares eight methodologies for modeling human decision making when choosing between transport modes. The objective is to identify the best-performing models and to assess how difficult it is to extend them to the introduction of new policies or transport modes. All models are trained with a real data set and validated with another. This process is of great importance for analyzing the impacts that sustainable transport policies will have when applied to the specific geographical and demographic characteristics of a given area.
The rest of the paper is organized as follows. Section 2 describes the materials used and details the proposed transport choice methodologies. Section 3 presents and analyzes the results derived from applying these methodologies to two different scenarios. The final section presents the conclusions and discussion that emerge from the analysis of the results.
3. Experimental Results
This section presents the results of executing the eight data science methods described in Section 2. The experiments were carried out using the following:
The GeoWorldSim platform [47] with the open-source fuzzy logic library Fuzzylite [48]. GeoWorldSim is a powerful software platform that eases the integration of multi-agent systems with reference simulation tools such as Matlab or EPA-NET.
The Caret package [49] of the R Statistical Suite [50]. This package provides uniform access to the several libraries and stages needed to build data-based models. For example, it provides a uniform way to tune hyper-parameters, train models, and produce forecasts, among other facilities.
Simulations were executed on two computers: large experiments were run on a system with an AMD Opteron 6168 CPU and 32 GB of RAM, and short experiments were run on a system with an Intel Core i7-2600 CPU and 16 GB of RAM.
Table 5 shows the global accuracy results, that is, the proportion of trips correctly classified. The first row (5-TM) shows the results with the five transport modes, while the second row (3-TM) shows the results after grouping CAR and MOTORCYCLE into PRIVATE VEHICLE, and WALK and BICYCLE into WALK.
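A minimal sketch of how the 5-TM and 3-TM accuracy figures relate, assuming trips are stored as parallel lists of observed and forecast mode labels (the label strings and the toy data are illustrative, not the article's data). It shows how CAR/MOTORCYCLE and WALK/BICYCLE confusions turn into hits once the labels are grouped.

```python
# Grouping used for the 3-TM results (five modes collapsed into three classes).
GROUPING = {
    "CAR": "PRIVATE VEHICLE",
    "MOTORCYCLE": "PRIVATE VEHICLE",
    "WALK": "WALK",
    "BICYCLE": "WALK",
    "TRANSIT": "TRANSIT",
}


def accuracy(observed: list[str], predicted: list[str]) -> float:
    """Global accuracy: share of trips whose forecast mode equals the observed mode."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def grouped_accuracy(observed, predicted, grouping=GROUPING) -> float:
    """Accuracy after mapping both observed and forecast modes to their group."""
    return accuracy([grouping[o] for o in observed],
                    [grouping[p] for p in predicted])


# Toy example (not data from the article).
observed = ["CAR", "MOTORCYCLE", "WALK", "BICYCLE", "TRANSIT", "CAR"]
predicted = ["MOTORCYCLE", "MOTORCYCLE", "BICYCLE", "BICYCLE", "TRANSIT", "TRANSIT"]
print(accuracy(observed, predicted))          # 5-TM accuracy
print(grouped_accuracy(observed, predicted))  # 3-TM accuracy
```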
As seen, all techniques transfer their results almost seamlessly, since the differences between the accuracy on the training (T) and validation (V) sets are rather small. As explained previously, each method was trained with data from the census of Biscay (Basque Country, Spain) and validated over the data set of Silesian Voivodeship (Poland). The models can be classified into three groups: the cluster of the best models, the contrast models, and the rest of the models. As expected, the worst model is the random search (RA). The best models are the most complex ones: SVM, NN, and M. Surprisingly, the KNN method, while very simple, also ranks among the best models. The remaining models (EK, CE, and B) make up the group of the other models. In all cases, the best models correctly predict the transport choice of almost half of the trips registered in the data set.
For more detail, Table A1 contains the confusion matrices of all models included in this article. As previously stated, within these matrices, columns show the observed (real) modes, and rows show the forecasted modes. These tables clearly show that some of the models have difficulties differentiating between CAR and MOTORCYCLE and between WALK and BICYCLE. This may be caused by additional variables, such as socio-demographics, car/motorcycle ownership, or climatic factors, that are not taken into consideration in this article. Therefore, since the census lacked information about these features, the five transport modes were grouped into three categories: CAR and MOTORCYCLE were merged into PRIVATE VEHICLE, WALK and BICYCLE into WALK, and TRANSIT was kept as its own class.
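The grouping step can be sketched with NumPy, using the layout stated above (rows are forecast modes, columns are observed modes). An aggregation matrix `a` with `a[g, m] = 1` when mode `m` belongs to group `g` turns a 5×5 matrix into the 3×3 one via `a @ cm5 @ a.T`. The helper names and the toy trips are illustrative assumptions, not the article's code or data.

```python
import numpy as np

MODES_5 = ["CAR", "MOTORCYCLE", "WALK", "BICYCLE", "TRANSIT"]
MODES_3 = ["PRIVATE VEHICLE", "WALK", "TRANSIT"]
GROUP_OF = {"CAR": "PRIVATE VEHICLE", "MOTORCYCLE": "PRIVATE VEHICLE",
            "WALK": "WALK", "BICYCLE": "WALK", "TRANSIT": "TRANSIT"}


def confusion_matrix(observed, predicted, labels):
    """Rows = forecast mode, columns = observed mode."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for obs, pred in zip(observed, predicted):
        cm[index[pred], index[obs]] += 1
    return cm


def collapse(cm5, labels5=MODES_5, labels3=MODES_3, group_of=GROUP_OF):
    """Merge a 5x5 matrix into 3x3 by summing the rows and columns of grouped modes."""
    # a[g, m] = 1 when mode m belongs to group g, so a @ cm5 @ a.T sums each block.
    a = np.zeros((len(labels3), len(labels5)), dtype=int)
    for j, mode in enumerate(labels5):
        a[labels3.index(group_of[mode]), j] = 1
    return a @ cm5 @ a.T


observed = ["CAR", "MOTORCYCLE", "WALK", "BICYCLE", "TRANSIT", "CAR"]
predicted = ["MOTORCYCLE", "MOTORCYCLE", "BICYCLE", "BICYCLE", "TRANSIT", "TRANSIT"]
cm5 = confusion_matrix(observed, predicted, MODES_5)
cm3 = collapse(cm5)
print(cm3)
print("3-class accuracy:", np.trace(cm3) / cm3.sum())
```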
With these three transport modes, all techniques are able to correctly transfer the results from the training to the validation set (Table 5). As before, three clusters of techniques appear. On the one hand, RA continues to be the worst method, but it is close to B. On the other hand, EK has greatly improved its performance and joins the best cluster. In any case, the best models have not improved their performance, yet they are still able to correctly predict the transport choice of half of the trips made.
In order to validate the previous claims, bootstrap validation was used [51]. Thus, 100 samples were randomly built from the Silesia data set following its original distribution. Figure 9 shows a box plot of the results of the different models, revealing significant differences among them. In fact, a Friedman test confirms this hypothesis (p-value ). To assess the differences among the models, a post-hoc analysis was performed following the procedure described in [52]. This procedure clusters all methods with similar results and gives them the same letter. Note that these groups are not sorted; that is, the best algorithms are not necessarily labeled with an a. The results are shown in Figure 10, where the mean value of each algorithm is plotted and the groups are identified by a letter. As can be seen, the test confirms our claims: the three groups are clearly visible.
Again, a detailed analysis reveals problems with some of the models. The confusion matrices (Table A2) show that some models still have difficulties with some of the classes. For example, the SVM cannot correctly predict any WALK trip, even though it correctly predicts about half of all trips. The same problem occurs with the M and NN models, although it is less acute. In the end, it seems that only the KNN, EK, and CE models are free of these problems in this setting.
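A minimal sketch of the per-class check behind this observation, using the convention stated for the confusion matrices (rows forecast, columns observed). The matrix is invented for illustration, not taken from Table A2; it shows how a model can reach a respectable global accuracy while never forecasting one class.

```python
import numpy as np


def per_class_recall(cm, labels):
    """Recall per observed mode for a matrix with rows = forecast, columns = observed.

    Recall of mode j = correct forecasts of j / all trips observed as j (column total).
    """
    cm = np.asarray(cm, dtype=float)
    col_totals = cm.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        recall = np.where(col_totals > 0, np.diag(cm) / col_totals, np.nan)
    return dict(zip(labels, recall))


def never_forecast(cm, labels):
    """Modes the model never predicts: their row of the matrix is all zeros."""
    cm = np.asarray(cm)
    return [label for label, row in zip(labels, cm) if row.sum() == 0]


labels = ["PRIVATE VEHICLE", "WALK", "TRANSIT"]
# Hypothetical matrix (not Table A2): overall accuracy is 0.6, yet WALK is never forecast.
cm = np.array([[40, 15, 10],
               [0, 0, 0],
               [10, 5, 20]])
print("accuracy:", np.trace(cm) / cm.sum())
print(per_class_recall(cm, labels))
print(never_forecast(cm, labels))
```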
Finally, Table 6 shows the modal shares forecast by the different models. As before, 100 bootstrap samples were built from the Silesia data set. Column Real shows the results of the survey carried out in 2015 [26], and columns L.C.I. and U.C.I. contain the lower and upper bounds of the confidence interval of each forecast value at the 0.05 significance level. As expected, given the low variance of the bootstrapped samples (see Figure 9), the modal split forecasts of all models are quite similar (the confidence intervals are quite narrow). Nevertheless, the real value falls outside the confidence interval.
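One way to obtain L.C.I./U.C.I. bounds like those in Table 6 is a percentile bootstrap interval of each forecast mode share. The text does not state which interval method was used, so the percentile choice, the helper names, and the toy forecasts below are assumptions for illustration only.

```python
import numpy as np


def modal_split_ci(predicted_modes, labels, n_samples=100, alpha=0.05, seed=0):
    """Bootstrap mean and percentile interval of each mode's forecast share."""
    rng = np.random.default_rng(seed)
    pred = np.asarray(predicted_modes)
    n = pred.size
    shares = np.empty((n_samples, len(labels)))
    for b in range(n_samples):
        sample = pred[rng.integers(0, n, size=n)]  # resample trips with replacement
        shares[b] = [np.mean(sample == label) for label in labels]
    lower = np.quantile(shares, alpha / 2, axis=0)
    upper = np.quantile(shares, 1 - alpha / 2, axis=0)
    return {label: (lower[i], shares[:, i].mean(), upper[i])
            for i, label in enumerate(labels)}


def covers(interval, real_share):
    """True when the observed (survey) share lies inside [L.C.I., U.C.I.]."""
    lo, _, hi = interval
    return lo <= real_share <= hi


labels = ["PRIVATE VEHICLE", "WALK", "TRANSIT"]
forecasts = ["PRIVATE VEHICLE"] * 500 + ["WALK"] * 300 + ["TRANSIT"] * 200  # toy forecasts
ci = modal_split_ci(forecasts, labels)
for label in labels:
    lo, mean, hi = ci[label]
    print(f"{label}: {mean:.3f} [{lo:.3f}, {hi:.3f}]")
```

With a thousand or more trips, the resampled shares vary little, so the interval is narrow and a survey value only a few points away can fall outside it, consistent with the behavior described above.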
4. Conclusions and Future Work
This article presents eight methodologies for building transport choice models according to the FSM. The methodologies were trained for one scenario, Biscay (Basque Country, Spain), and later evaluated with a second scenario, Silesian Voivodeship (Poland). As explained in Section 3, the results suggest the existence of three groups of models. First, as expected, the most complex models tested (SVM, CE, and M) perform quite well, correctly predicting almost half of the trips. Next, there is another set of algorithms with slightly worse results (NN, EK, and KNN). The results of the KNN algorithm were in fact unexpected, as it is a very simple algorithm. Finally, the control methods (B and RA) do not achieve good results, as expected. In order to draw meaningful conclusions, a qualitative analysis is also needed:
Some of the models, such as M, SVM, and NN, produce unbalanced predictions that completely ignore some of the transport modes. Therefore, given their similar predictive capabilities, the models with more balanced predictions should be used, namely, the EK, CE, and KNN models.
The number of parameters and the complexity of the models also differ. In this sense, simpler models should be preferred to more complex ones. However, EK, CE, and KNN are non-parametric, and their complexity is difficult to assess. On the one hand, the number of parameters of KNN is the number of points in the training data set, but the complexity of the model is quite low (understanding complexity here in terms of its VC dimension or similar measures [53]). On the other hand, the number of parameters of the EK and CE models is quite low (compared to that of KNN), but their complexity is higher [54].
Comparing the EK and CE fuzzy rule sets, both have a similar number of rules (EK has 33 and CE has 30). However, the EK rules are composed of only one or two terms, which makes them easy to follow and modify, while the evolved CE rules usually have about 10 or more terms whose interrelations are not very clear.
A model that is easy to understand is always preferable to a purely data-driven model. Following this advice, EK and M should be preferred to the rest of the models.
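The readability argument for short fuzzy rules can be illustrated with a tiny hand-written rule base in which each rule has one or two antecedent terms. This is a plain-Python sketch, not Fuzzylite's API and not the EK rule set: the input variables (`distance_km`, `transit_time_min`), the trapezoidal term boundaries, and the three rules are all hypothetical. Each rule fires with the minimum of its term memberships (AND), and the mode with the strongest firing wins.

```python
import math

INF = math.inf


def trapezoid(a, b, c, d):
    """Membership rising on (a, b), equal to 1 on [b, c], falling on (c, d)."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


# Hypothetical linguistic terms for two hypothetical input variables.
TERMS = {
    "distance_km": {"short": trapezoid(-INF, -INF, 1.0, 2.5),
                    "long": trapezoid(1.5, 4.0, INF, INF)},
    "transit_time_min": {"low": trapezoid(-INF, -INF, 15.0, 30.0),
                         "high": trapezoid(20.0, 40.0, INF, INF)},
}

# Each rule: ({variable: term, ...}, consequent mode); one or two terms per rule.
RULES = [
    ({"distance_km": "short"}, "WALK"),
    ({"distance_km": "long", "transit_time_min": "low"}, "TRANSIT"),
    ({"distance_km": "long", "transit_time_min": "high"}, "PRIVATE VEHICLE"),
]


def classify(inputs, rules=RULES, terms=TERMS):
    """Return the mode with the highest firing strength and all strengths."""
    strength = {}
    for antecedent, mode in rules:
        # AND = minimum of the memberships of the rule's terms.
        w = min(terms[var][term](inputs[var]) for var, term in antecedent.items())
        # OR across rules sharing a consequent = maximum.
        strength[mode] = max(strength.get(mode, 0.0), w)
    return max(strength, key=strength.get), strength


print(classify({"distance_km": 0.8, "transit_time_min": 25.0})[0])
print(classify({"distance_km": 6.0, "transit_time_min": 10.0})[0])
print(classify({"distance_km": 6.0, "transit_time_min": 45.0})[0])
```

With rules this short, covering a new policy amounts to appending one readable rule (for example, a hypothetical `({"transit_time_min": "low"}, "TRANSIT")`) without touching the existing ones, whereas rules with ten or more interrelated terms are much harder to edit safely.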
Based on the above considerations, the best model is clearly EK: it is interpretable, it is easy to extend to cover new transport policies, its complexity is similar to that of other models with similar numerical results, and its predictions are well balanced among the classes.
Even though the results are good, the models need to be improved. At the moment, the methodology does not take into account socio-economic variables, which are not present in the tested data sets (see Table 2). Recent works [55,56] have achieved successful results by using demographic and socio-economic features. As shown in [56], where a success rate of almost 90% was reached, socio-economic information about commuters makes it possible to cluster them and create custom utility functions for each group. Furthermore, climatic variables have been identified as relevant [28]. Future refinements of the model may introduce both sets of variables in order to reduce the error. A first approach should consist of adding these non-trip-related features to the global census models and evaluating the resulting improvement. If this is not satisfactory, the next step will be to cluster the commuters and fit the models for each group.
Additionally, private vehicle ownership is a relevant parameter that is not detailed in the data sets used. To introduce this information, the model will use Monte Carlo simulations, so that several distributions of these variables are simulated and the results assessed.
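A sketch of how such a Monte Carlo step might look, under loudly stated assumptions: ownership is drawn as an independent Bernoulli variable per commuter at a hypothetical rate, non-owners cannot choose PRIVATE VEHICLE (its probability is removed and the rest renormalized), and each run samples one mode per commuter from a fitted model's probabilities. The function name, the ownership rates, and the toy probabilities are illustrative, not part of the article's model.

```python
import numpy as np

LABELS = ["PRIVATE VEHICLE", "WALK", "TRANSIT"]


def simulate_modal_split(probs, ownership_rate, n_runs=200, seed=0):
    """Monte Carlo modal split when vehicle ownership is unknown.

    probs: (n_commuters, 3) choice probabilities from a fitted model, columns as in
    LABELS; every row is assumed to give some probability to WALK or TRANSIT.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    # Non-owners cannot pick PRIVATE VEHICLE: drop that column and renormalize.
    no_vehicle = probs.copy()
    no_vehicle[:, 0] = 0.0
    no_vehicle /= no_vehicle.sum(axis=1, keepdims=True)
    splits = np.empty((n_runs, k))
    for r in range(n_runs):
        owns = rng.random(n) < ownership_rate          # assumed Bernoulli ownership
        p = np.where(owns[:, None], probs, no_vehicle)
        u = rng.random(n)[:, None]
        # Inverse-CDF sampling: chosen index = number of cumulative bounds <= u.
        choice = np.minimum((p.cumsum(axis=1) <= u).sum(axis=1), k - 1)
        splits[r] = np.bincount(choice, minlength=k) / n
    return splits.mean(axis=0), splits.std(axis=0)


# Toy input: 2,000 identical commuters; scan a few hypothetical ownership rates.
probs = np.tile([0.5, 0.2, 0.3], (2000, 1))
for rate in (0.0, 0.5, 1.0):
    mean, std = simulate_modal_split(probs, rate)
    print(rate, dict(zip(LABELS, mean.round(3))), std.round(3))
```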
Finally, once the transport choice model is fitted, the next step is to determine the extent to which different non-physical incentives, such as changes in journey prices, discounts on public transport, additional taxes on the use of private vehicles, reductions in journey duration, and congestion charges, affect commuters' choices. These incentives alter the features of the itineraries, resulting in a different modal split. To check the suitability of these incentives, the citizen agents within the simulation will be given a set of preferences that determine whether they are prone to accept or ignore a certain policy, based on the itinerary features already modeled in the transport choice model. This approach will involve defining the thresholds at which transport policies become effective, while seeking an equilibrium between the objectives of the policies and the citizens' preferences in order to avoid social rejection or oversizing.
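The threshold idea can be sketched with a hypothetical agent model: each citizen agent compares a generalized cost (monetary cost plus value of time multiplied by travel time) for car and transit and accepts a policy only when the gain exceeds a personal threshold. A grid search then finds the smallest car tax that reaches a target transit share. The generalized-cost form, every field name, and all numbers are assumptions for illustration, not the planned implementation.

```python
from dataclasses import dataclass


@dataclass
class CitizenAgent:
    """Hypothetical agent: every field is an illustrative assumption."""
    value_of_time: float  # euros per minute
    threshold: float      # minimum gain (euros) needed to accept a policy and switch
    car_time: float       # minutes
    car_cost: float       # euros
    transit_time: float   # minutes
    transit_cost: float   # euros

    def generalized_cost(self, time_min: float, cost_eur: float) -> float:
        return cost_eur + self.value_of_time * time_min


def switches_to_transit(agent, car_tax=0.0, transit_discount=0.0):
    """Accept the policy only if transit beats the car by more than the threshold."""
    car = agent.generalized_cost(agent.car_time, agent.car_cost + car_tax)
    transit = agent.generalized_cost(
        agent.transit_time, max(agent.transit_cost - transit_discount, 0.0))
    return car - transit > agent.threshold


def transit_share(agents, **policy):
    return sum(switches_to_transit(a, **policy) for a in agents) / len(agents)


def minimum_effective_tax(agents, target_share, step=0.25, max_tax=20.0):
    """Smallest car tax (on a grid) reaching the target transit share, or None."""
    n_steps = int(round(max_tax / step))
    for i in range(n_steps + 1):
        tax = i * step
        if transit_share(agents, car_tax=tax) >= target_share:
            return tax
    return None


agents = [
    CitizenAgent(0.2, 1.0, car_time=20, car_cost=3.0, transit_time=30, transit_cost=2.0),
    CitizenAgent(0.1, 0.5, car_time=15, car_cost=2.0, transit_time=25, transit_cost=1.0),
]
print(transit_share(agents))              # share with no policy
print(minimum_effective_tax(agents, 0.5))
print(minimum_effective_tax(agents, 1.0))
```

Returning `None` when no tax on the grid reaches the target mirrors the idea of a policy that cannot be made effective without oversizing it.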