Travel Mode Detection Based on Neural Networks and Particle Swarm Optimization

The collection of massive Global Positioning System (GPS) data from travel surveys has increased exponentially worldwide since the 1990s. A number of methods, which range from rule-based to advanced classification approaches, have been applied to detect travel modes from GPS positioning data collected in travel surveys based on GPS-enabled smartphones or dedicated GPS devices. Among these approaches, neural networks (NNs) are widely adopted because they can extract subtle information from training data that cannot be directly obtained by humans or other analysis techniques. However, traditional NNs, which are generally trained by back-propagation algorithms, are likely to be trapped in a local optimum. Therefore, particle swarm optimization (PSO) is introduced to train the NNs. The resulting PSO-NNs are employed to distinguish among four travel modes (walk, bike, bus, and car) with GPS positioning data collected through a smartphone-based travel survey. As a result, 95.81% of samples are correctly identified for the training set and 94.44% for the test set. Results from this study indicate that smartphone-based travel surveys provide an opportunity to supplement traditional travel surveys.


Introduction
With worsening traffic congestion, significant attention has been given to travel behavior research, as well as transportation demand analysis, to develop and assess transportation demand management strategies. In practice, travel surveys are generally utilized to collect the required information and data for travel demand modeling and analysis. However, conventional travel surveys, which are typically conducted using paper-and-pencil interviews, computer-assisted telephone interviews, or computer-assisted self-interviews, may significantly burden respondents because all relevant information related to each trip, including travel modes, travel start and end times, trip destinations, and trip purposes, must be reported throughout the entire survey period. Respondents of multi-day travel surveys experience severe fatigue, which could decrease the quality of retrieved data and adversely affect travel behavior analysis. Furthermore, some travel details are typically recalled or recorded only approximately. Wolf et al. [1] revealed that respondents were significantly burdened by these surveys and indicated that trip timing and trip distance were prone to be rounded in traditional travel surveys. In addition, trip rates may be underestimated in conventional travel surveys. Du and Aultman-Hall [2] indicated that trips or trip chains with a short duration or distance were usually neglected.
It has been widely accepted that data collection efforts based on GPS technology present evident advantages over traditional travel surveys. One of the most important benefits is the alleviation of the burden on respondents [1,3]. Burden reduction enables researchers to collect detailed travel information for a longer period without imposing additional burdens on respondents. Thus, the quality of the collected data is enhanced and researchers are able to investigate the dynamics of travel patterns over several days or weeks. For example, investigating the regularity and variability in trip destination decisions, trip purpose frequencies, trip duration distributions, the number of daily trips, and temporal trip-making distributions over days of the week or even weeks of the month has become viable [4,5]. Additionally, employing GPS technology significantly improves the availability of accurate travel information. For example, trip rates are expected to be corrected through location acquisition with high precision [6,7]. GPS-enabled smartphones and dedicated GPS devices are typically utilized to record positioning data in GPS-based travel surveys. In recent years, travel surveys based on the latter have encountered more difficulties for the following reasons: (1) dedicated GPS devices are expensive; (2) the collected data are likely to be incomplete because respondents tend to forget to take the devices with them or to charge them; (3) device distribution and recovery are required each time a new respondent is recruited; and (4) the number of respondents simultaneously participating in the survey is limited by the number of available dedicated GPS devices [8].
The focus of the current study is to infer travel modes according to GPS positioning data. Rule-based algorithms [9][10][11] and advanced machine learning methods [12] are two frequently used approaches to detecting the travel mode. Although rule-based algorithms are readily understandable and applicable, these algorithms exhibit low generalizability [13]. That is to say, the rules obtained from one case can be generalized to another only to a limited extent, particularly when travel modes are detected in one city with rules trained from data collected in another. Zheng et al. [14] collected the GPS positioning data of 65 respondents for approximately 10 months and employed decision trees to correctly match 75.6% of trips. Subsequently, Rudloff and Ray [15] split 792 trajectories into training and test sets (70% to 30%) and applied a multilayer perceptron to properly infer travel modes for 82.70% of trips. Thereafter, Bolbol et al. [16] explored feature selection and employed a framework with a support vector machine (SVM) to achieve a promising accuracy of 88% for detecting travel modes. A more recent study by Broach et al. [17] used a multinomial logit (MNL) model to distinguish walk, bike, car and bus modes and correctly labelled 90.8% of trips. Table 1 shows the different methods applied to travel mode detection and a summary of the corresponding accuracies.

| Study | Method | Input data | Sample size | Accuracy |
|---|---|---|---|---|
| Gong et al. [10] | Rule-based algorithm | GIS, speed, acceleration | 340 segments | 82.6% |
| Zheng et al. [14] | Decision trees | Speed, acceleration | 65 respondents | 75.6% |
| Rudloff and Ray [15] | Multilayer perceptron | GIS, speed, acceleration | 792 trajectories | 82.70% |
| Bolbol et al. [16] | Support vector machine | GIS, speed, acceleration | 81 respondents | 88% |
| Broach et al. [17] | Multinomial logit | GIS, speed, acceleration | 926 segments | 90.8% |
| Gonzalez et al. [18] | Neural networks | Speed, acceleration, data quality | 114 trips | 91.23% |

A study by Gonzalez et al. [18] indicated that NNs are an appropriate approach for detecting travel modes from GPS positioning data since they can extract subtle information from training data that cannot be directly obtained by humans or other analysis techniques. They used a number of statistics derived from the GPS data as input to the NNs. An individual looking at the same numbers may not necessarily be able to determine whether the trip was made by a certain mode. However, these features can be used to classify travel modes with NNs. In addition, NNs have a powerful nonlinear pattern classification capacity, which makes them appropriate for extracting travel modes from GPS data. Gonzalez et al. collected positioning data on 114 trips with smartphones and achieved a promising accuracy of 91.23%, which outperforms most existing studies. However, the employed NNs were trained by back-propagation (BP) algorithms, which utilize the gradient descent technique to train the parameters of the network. In this way, the search is prone to being trapped in a local optimum [19]. To address this issue, we include the particle swarm optimization (PSO) technique to search for a global optimum. Although PSO can itself exhibit partial optimism, which may lower the accuracy with which its speed and direction are regulated [20], the algorithm is a gradient-free approach, which makes it conceptually simple. During the iteration process of the search, only the globally best particle or the best particle in a group can transmit information to other particles, resulting in a high-speed search. In fact, a high searching speed is crucial to travel mode detection in GPS-based travel surveys, particularly when respondents are required to check the travel modes that are automatically derived from the GPS data. The resulting PSO-NNs are expected to achieve a favorable accuracy for travel mode detection.
The rest of this paper is organized as follows. Data collection and description are presented in Section 2, and the theoretical and methodological background of NNs and PSO is presented in Section 3. The construction of the classifier is described in Section 4. Finally, the main conclusions and directions for future research are provided in Section 5.

Respondent Recruitment and Positioning App
The data used to detect travel modes were retrieved from a smartphone-based travel survey conducted in Shanghai from mid-October 2013 to mid-February 2014. Respondents were typically limited to commuters in Shanghai to generate a relatively large number of trips, and most of these respondents were recruited over the Internet. More details on the survey design are described in one of our previous studies [21]. A mobile app was developed to record GPS positioning data before the survey was conducted. Android and iOS were the platforms used in app development because of their considerable market penetration rates. The app records UTC time, latitude, longitude, altitude, instantaneous speed, bearing, the number of satellites in view and the Horizontal Dilution of Precision (HDOP) once every second. HDOP represents how the satellites in view are arranged in the sky and is a factor in determining the relative accuracy of a horizontal position. The smaller the HDOP value is, the better the geometric arrangement is. In practice, battery consumption might be an intractable problem with such a high recording frequency, although this frequency increases the accuracy of travel mode imputation. Thus, we gave each respondent a mobile power supply to avoid battery drainage. This package is also expected to motivate more respondents to participate in the travel survey. In addition, the app automatically shuts down when the smartphone remains stationary for over five minutes and automatically restarts when the smartphone moves again. This function is designed to minimize battery consumption without adversely affecting regular location recording during trips.

Requirements of the Travel Survey
Respondents were asked to start the app to enable automatic location acquisition before their first trip, and to keep the device running until their final arrival back home each day during the entire survey period. Respondents were requested to shut down the app at home to minimize battery consumption and to ease privacy concerns, particularly for those who are unwilling to be positioned at home. Respondents were asked to upload the positioning data to a database on our server with a single touch and were advised to upload the data in a Wi-Fi environment to decrease transmission cost.
Single-day GPS positioning data were first divided into trips, each of which was then split into single-mode segments (hereafter, segments) prior to detecting travel modes. The approach utilized for trip division is described in one of our previous studies [22]. The basic assumption in that study is that trip ends are characterized by either GPS point clustering or a sudden direction change in the context of normal GPS location acquisition, while they display a dwell time exceeding the threshold during GPS signal loss. The segments are extracted from trips based on the approach suggested by Gong et al. [10]. After automatically deriving the segments from GPS trajectories, we contacted respondents by telephone and asked them to check these segments. The partition of the segments was corrected when necessary. The respondents were also asked to check the travel modes of each segment. Such a travel survey is also called a prompted recall survey. As expected, the respondents did not experience a significant burden because most trip ends and travel modes can be inferred correctly. The travel mode of each segment that was validated by the respondents is taken as the "ground truth" in the subsequent study.

Sample Data
Each respondent was requested to participate in the travel survey for at least five continuous days. A total of 113 respondents participated in this survey but only 102 completed it. Those who did not complete the survey quit for various reasons, such as smartphone failure and privacy issues. The respondents who completed this survey were categorized by the number of days for which they uploaded GPS data: 83 respondents with five days and 19 respondents with six to eight days. In addition, 73 of the 102 respondents kept their survey days continuous. Although this continuity requirement does not substantially affect travel mode detection, it can prompt the respondents to complete the survey quickly. A total of 535 days of GPS data were collected.
In the current study, we distinguished among walk, bike, bus and car modes. Signal loss during subway trips is a serious issue. Thus, the subway mode is not included in the analysis. The signal loss for the other four travel modes is handled as follows. The GPS positioning data were expected to be retrieved at a regular interval of one second. Thus, the duration of each missing trajectory can be calculated easily. If the duration of any missing trajectory is less than 10 s, the gap is filled with a straight line, with the lost positioning points located uniformly on this line. Based on the experience gained in the travel survey, 10 s is an appropriate criterion that balances respondent burden with the accuracy required by the travel survey. The positioning data that incorporated missing trajectories greater than 10 s were not included in the current analysis to achieve high data accuracy. A total of 1654 segments were extracted from the recorded trajectories, and all of these segments were validated by respondents. The number of segments for the walk, bike, car and bus modes is 820, 148, 326, and 360, respectively. The walk mode accounts for by far the largest share because walk segments usually play a transitional role when intermodal transfers are made. For example, a person usually needs to walk to a subway station after getting off a bus when the bus and subway are taken sequentially during a trip. In addition, walk segments usually exist before the bus segment and after the subway segment. Based on the preceding analysis, this trip is generally broken down into three walk segments, one subway segment and one bus segment. Therefore, effectively imputing walk segments is crucial to the overall detection performance.
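The gap handling described above can be sketched as follows. This is a minimal illustration rather than the survey's actual processing code: the `Fix` layout, the function names, and the choice to treat a gap of exactly 10 s as too long are assumptions.

```python
from typing import List, Tuple

# One GPS fix as (time in seconds, latitude, longitude); a hypothetical layout.
Fix = Tuple[float, float, float]


def fill_short_gaps(fixes: List[Fix], max_gap_s: float = 10.0) -> List[Fix]:
    """Fill gaps shorter than max_gap_s with points spaced evenly on a straight line.

    Assumes a nominal 1 s logging interval and fixes sorted by time. Gaps of
    max_gap_s or longer are left untouched so the caller can discard them.
    """
    if not fixes:
        return []
    filled = [fixes[0]]
    for prev, cur in zip(fixes, fixes[1:]):
        gap = cur[0] - prev[0]
        if 1.0 < gap < max_gap_s:
            n_missing = int(round(gap)) - 1  # number of lost 1 s fixes
            for k in range(1, n_missing + 1):
                frac = k / (n_missing + 1)  # relative position along the line
                filled.append((
                    prev[0] + frac * gap,
                    prev[1] + frac * (cur[1] - prev[1]),
                    prev[2] + frac * (cur[2] - prev[2]),
                ))
        filled.append(cur)
    return filled


def has_long_gap(fixes: List[Fix], max_gap_s: float = 10.0) -> bool:
    """Return True if any gap is too long to be filled (assumed: >= max_gap_s)."""
    return any(b[0] - a[0] >= max_gap_s for a, b in zip(fixes, fixes[1:]))
```

A caller would first drop any trajectory for which `has_long_gap` is true and then apply `fill_short_gaps` to the rest.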

Feature Selection
A highly distinctive feature set provides an opportunity to greatly improve classification performance. According to existing studies, the average speed, median speed, average absolute acceleration, travel distance and the 95th percentile speed are typically used to infer travel modes (as shown in Figure 1a-e). However, distinguishing bus segments from car segments is difficult when only the aforementioned speed-related features are utilized [9,18]. In most cases, the inclusion of a transit network layer can significantly improve the separation of bus segments from car segments [23]. However, the transit network layer is not available to most researchers, and we also encountered the same difficulty. In addition, the bus network is updated every month, and at times even every day, in a megacity such as Shanghai. A timely update is required to effectively infer bus segments. Thus, it is costly to maintain an up-to-date GIS layer of the bus network. All these factors motivated us to develop a new feature called the "low-speed point rate", i.e., the rate of points with a speed of less than 1 m/s, which is expected to capture the characteristics of the periodic stops of buses. In fact, we investigated four critical speeds, i.e., 0.5 m/s, 1.0 m/s, 1.5 m/s and 2.0 m/s, for this feature. According to the two-sample Kolmogorov-Smirnov test of the low-speed point rate between bus segments and car segments at each critical speed, 1 m/s achieves the minimum p value ($p = 1.9456 \times 10^{-7}$). The p values of 0.5 m/s, 1.5 m/s and 2 m/s are $8.9686 \times 10^{-7}$, $9.7462 \times 10^{-5}$, and $7.1505 \times 10^{-4}$, respectively. Thus, 1 m/s is applied because this value can best differentiate these two types of segments (as shown in Figure 1f). The distribution of the six features is shown in Figure 1.
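As an illustration of the six features, the sketch below derives them from the 1 Hz instantaneous speeds of one segment. It is not the authors' implementation: the function names are hypothetical, acceleration is approximated by differencing consecutive speeds, and travel distance is approximated by summing speed times the sampling interval, whereas the study may have computed both differently (for example from coordinates).

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def segment_features(speeds: Sequence[float], dt: float = 1.0,
                     low_speed_threshold: float = 1.0) -> Dict[str, float]:
    """Compute the six segment-level features from speeds in m/s sampled every dt seconds."""
    # Assumption: acceleration is estimated from consecutive speed differences.
    accels = [abs(b - a) / dt for a, b in zip(speeds, speeds[1:])]
    return {
        "average_speed": mean(speeds),
        "median_speed": median(speeds),
        "p95_speed": percentile(speeds, 95),
        "average_abs_acceleration": mean(accels) if accels else 0.0,
        # Assumption: distance is approximated as the sum of speed * dt.
        "travel_distance": sum(speeds) * dt,
        # Share of points slower than the critical speed (1 m/s in the study).
        "low_speed_point_rate": sum(1 for s in speeds if s < low_speed_threshold) / len(speeds),
    }
```

Passing `low_speed_threshold` values of 0.5, 1.0, 1.5 and 2.0 would give the four candidate versions of the low-speed point rate that the text compares.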

Neural Networks
A neural network consists of input, hidden and output neurons. Generally, any two neurons in adjacent layers are directly connected. The neural network collects the inputs, extracts useful information through the hidden neurons and produces outputs that can be used to classify the input sample. In the current study, the weights are trained by the PSO algorithm, which is described in the next subsection. The trained neural network, which is represented by a series of weighting matrices, is employed to evaluate data whose correct outputs are unknown to the researchers. The neural network can be highly flexible because various types of data are readily fitted when the settings of the network are adjusted, including the number of hidden layers, the number of neurons in each layer, and the learning rate. In this study, we employed a three-layer neural network to detect travel modes from the GPS positioning data (as shown in Figure 2). Additionally, we applied the commonly used logistic function as the activation function.
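A forward pass through such a three-layer network with logistic activations might look like the sketch below. The weight layout (one row per neuron, with the bias weight stored last) and the output order walk, bike, bus, car are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence

# Assumed order of the four output neurons.
MODES = ("walk", "bike", "bus", "car")


def logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def forward(features: Sequence[float],
            w_hidden: Sequence[Sequence[float]],
            w_out: Sequence[Sequence[float]]) -> List[float]:
    """Propagate one sample through the input, hidden and output layers.

    Each row of w_hidden has len(features) + 1 weights and each row of w_out
    has len(w_hidden) + 1 weights; the extra weight multiplies a bias unit of 1.
    """
    x = list(features) + [1.0]  # append the input-layer bias unit
    hidden = [logistic(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    h = hidden + [1.0]  # append the hidden-layer bias unit
    return [logistic(sum(w * v for w, v in zip(row, h))) for row in w_out]


def predict_mode(features: Sequence[float],
                 w_hidden: Sequence[Sequence[float]],
                 w_out: Sequence[Sequence[float]]) -> str:
    """Return the mode whose output neuron has the largest activation."""
    outputs = forward(features, w_hidden, w_out)
    return MODES[max(range(len(outputs)), key=outputs.__getitem__)]
```

With six input features, each hidden row holds seven weights; training then amounts to choosing the entries of `w_hidden` and `w_out`.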

Particle Swarm Optimization
PSO is essentially an evolutionary computation technique proposed by Kennedy and Eberhart [24]. PSO originates from a natural system and incorporates a global search ability. The algorithm is developed according to studies based on the social behavior of animals, such as fish schooling and bird flocking. To improve the probability of convergence to a global optimum, we employed a common ring topology [25]. In this topology, each particle corresponds to a specific group of particles, which consists of the particle in question and its immediate adjacent neighbors. In addition, the connections within a group are undirected and unweighted. The algorithm is initialized with a swarm of random solutions. Each initialized solution, also known as a particle, is represented by a random initial location and velocity. At each step $n$, the velocity is updated according to the following equation:

$$\mathbf{v}_i(n+1) = \omega\,\mathbf{v}_i(n) + c_1\,\mathrm{rand}_1\,\bigl(\mathbf{p}_{pbest\_i} - \mathbf{x}_i(n)\bigr) + c_2\,\mathrm{rand}_2\,\bigl(\mathbf{p}_{lbest} - \mathbf{x}_i(n)\bigr) \qquad (1)$$

In the velocity updating equation, the new velocity vector $\mathbf{v}_i(n+1)$ corresponding to particle $i$ is obtained by computing the sum of three components. The first component is known as "inertia" and the parameter $\omega$ is the inertia weight. The second component is known as "individual knowledge", which represents the effect on the current velocity of the best known position vector $\mathbf{p}_{pbest\_i}$ that particle $i$ has found. The third component is known as "group knowledge", which indicates the effect on the current velocity of the best known position $\mathbf{p}_{lbest}$ that all the particles in the group have found. The acceleration parameters $c_1$ and $c_2$ are included in the second and third components to adjust these two types of impact, whereas two random numbers $\mathrm{rand}_1$ and $\mathrm{rand}_2$, which have a uniform distribution between 0 and 1, are applied to provide random search.
After the new velocity is calculated, the particle position is updated based on the following equation:

$$\mathbf{x}_i(n+1) = \mathbf{x}_i(n) + \mathbf{v}_i(n+1) \qquad (2)$$

where $\mathbf{x}_i(n)$ indicates the position vector of particle $i$ in the previous iteration.
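The velocity and position updates can be illustrated for a single particle as follows. This is a sketch under stated assumptions: the function name and argument layout are hypothetical, and one pair of random numbers is drawn per particle per step, following the wording of the text (drawing a fresh pair for every dimension is a common variant).

```python
import random
from typing import List, Sequence, Tuple


def update_particle(x: Sequence[float], v: Sequence[float],
                    pbest: Sequence[float], lbest: Sequence[float],
                    omega: float, c1: float, c2: float,
                    rng=random) -> Tuple[List[float], List[float]]:
    """Apply one velocity update and one position update to a single particle.

    rng is any object with a random() method returning a value in [0, 1).
    """
    r1, r2 = rng.random(), rng.random()
    new_v = [
        omega * vi               # inertia
        + c1 * r1 * (pb - xi)    # individual knowledge
        + c2 * r2 * (lb - xi)    # group knowledge
        for xi, vi, pb, lb in zip(x, v, pbest, lbest)
    ]
    new_x = [xi + nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.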
For each iteration, we need to evaluate the fitness of each particle based on the desired optimization purpose. To evaluate a neural network, the root-mean-square error (RMSE) is utilized to describe the average detection error and can be calculated according to the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{nTr \cdot nOut}\sum_{i=1}^{nTr}\sum_{j=1}^{nOut}\bigl(t_{ij} - y_{ij}\bigr)^2} \qquad (3)$$

where $nTr$ represents the number of training samples, $nOut$ is the number of network outputs, and $t_{ij}$ and $y_{ij}$ are the $j$th derived output and reported output corresponding to the $i$th training sample, respectively. For example, if the reported travel mode of the $i$th training sample is walk, $y_{i1}$ is equal to 1. Accordingly, if the derived travel mode of the $i$th training sample is bus, $t_{i1}$ and $t_{i3}$ are equal to 0 and 1, respectively. The fitness function is defined in Equation (4) and can be used to measure the performance of a neural network. According to the evaluation results of the initial swarm, the personal best of each particle and the local best of the group are stored. Obviously, the personal best of each particle is equal to its initial position in the initial state, whereas the local best records the position of the particle that achieves the best fitness value calculated with Equation (4) in a group. The optimization process continues by moving the particles with Equations (1) and (2) and then updating the personal best and the local best. If the fitness of a particle is greater than that of its personal best, the personal best is replaced with the current particle. Similarly, if the fitness of a particle is greater than that of the local best, the local best is replaced with the current particle. The process does not stop until the criterion set by the user is met. The criterion may include a threshold for the RMSE or a maximum number of iterations of the optimization process.
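As a worked illustration of the RMSE over one-hot encoded outputs, consider the sketch below. The output order walk, bike, bus, car is an assumption consistent with the walk and bus example in the text, and the helper names are hypothetical. The fitness function itself is not reproduced here.

```python
import math
from typing import List, Sequence

# Assumed order of the network outputs.
MODES = ("walk", "bike", "bus", "car")


def one_hot(mode: str) -> List[float]:
    """Encode a travel mode as a 0/1 vector with one entry per output neuron."""
    return [1.0 if m == mode else 0.0 for m in MODES]


def rmse(derived: Sequence[Sequence[float]],
         reported: Sequence[Sequence[float]]) -> float:
    """Root-mean-square error over all training samples and all outputs."""
    n_tr = len(derived)
    n_out = len(derived[0])
    total = sum(
        (t - y) ** 2
        for t_row, y_row in zip(derived, reported)
        for t, y in zip(t_row, y_row)
    )
    return math.sqrt(total / (n_tr * n_out))
```

For a single sample reported as walk but derived as bus, two of the four outputs differ by 1, so the RMSE is the square root of 2/4, roughly 0.707.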

Detecting Travel Modes with PSO-NNs
According to the sample partition applied by Feng and Timmermans [13], three quarters of the total sample is taken as the training set, whereas the remaining one quarter of the total sample is regarded as the test set. In this way, the number of samples in the training set and the test set is 1240 and 414, respectively. In the current study, the acceleration parameters $c_1$ and $c_2$ are taken to be constants 1 and 2 according to Lin and Hsieh [19]. To increase the global search ability at the beginning of the iteration process and improve the local search ability near the end of the iteration process, we used a linearly decreasing inertia weight, which is represented as follows:

$$\omega(t) = \omega_s - (\omega_s - \omega_e)\,\frac{t}{t_{max}} \qquad (5)$$

where $t_{max}$ and $t$ are the maximum number of iterations and the current iteration number, and $\omega_s$ and $\omega_e$ are the initial and final values of the inertia weight, respectively. Following the recommendation of Shi and Eberhart [26], we apply the values of 0.9 and 0.4 for $\omega_s$ and $\omega_e$. In addition, the maximum number of iterations $t_{max} = 5000$ is used in this study [27].
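The linearly decreasing inertia weight can be written as a one-line function. The sketch below uses the values stated in the text as defaults; the function name is hypothetical.

```python
def inertia_weight(t: int, t_max: int = 5000,
                   w_start: float = 0.9, w_end: float = 0.4) -> float:
    """Inertia weight that falls linearly from w_start at t = 0 to w_end at t = t_max."""
    return w_start - (w_start - w_end) * t / t_max
```

Halfway through the run (t = 2500) the weight is about 0.65, midway between the initial and final values.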
We defined the population size as 50 particles. A particle in the current study is encoded as a vector. For a given neural network, each particle represents all the weights of the network structure. If the number of neurons in the hidden layer in Figure 2 is 5, the neural network has the structure of 7-6-4, including a bias unit in both of the first two layers. In this case, each particle is a vector with a length of 66 (7 × 6 + 6 × 4). Before implementing the classification, we rescaled four dimensions (travel distance, average speed, median speed and 95th percentile speed) to the range [0, 1] so that all the features have the same range. The formula is given as follows:

$$x'_i = \frac{x_i - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})} \qquad (6)$$

where $x_i$ and $x'_i$ are the $i$th original and normalized sample values for a specific feature, respectively, and $\mathbf{x}$ is the vector consisting of all sample values for the feature. To prevent a particle from exceeding the range of the problem, $x_{max} = 1$ and $x_{min} = 0$ are used for each dimension. Accordingly, the velocity limits $v_{max} = 1$ and $v_{min} = -1$ are used in this study [28]. Figure 3 illustrates the pseudocode of a PSO using a ring topology. In this study, a "wrap-around" ring topology is used. In other words, the last particle is the neighbor of the first particle and vice versa. The function neighborhood(.) returns the best personal best in the neighborhood of particle $i$.
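A ring-topology PSO along the lines described above can be sketched in Python as follows. This is a hedged illustration, not the procedure of Figure 3: the function names, the random initialization ranges, the immediate update of personal bests within an iteration, and the reading of the acceleration parameters as $c_1 = 1$ and $c_2 = 2$ are assumptions. The fitness is passed in as a generic function to be maximized, so the sketch does not depend on how network weights are packed into a particle.

```python
import random
from typing import Callable, List, Sequence, Tuple


def neighborhood_best(i: int, pbest_fit: Sequence[float]) -> int:
    """Index of the best personal best among particle i and its two ring neighbors."""
    n = len(pbest_fit)
    candidates = [(i - 1) % n, i, (i + 1) % n]  # wrap-around ring
    return max(candidates, key=lambda k: pbest_fit[k])


def ring_pso(fitness: Callable[[Sequence[float]], float], dim: int,
             n_particles: int = 50, t_max: int = 5000,
             c1: float = 1.0, c2: float = 2.0,
             w_start: float = 0.9, w_end: float = 0.4,
             x_min: float = 0.0, x_max: float = 1.0,
             v_min: float = -1.0, v_max: float = 1.0,
             seed: int = 0) -> Tuple[List[float], float]:
    """Maximize fitness with a local-best PSO; returns (best position, best fitness)."""
    rng = random.Random(seed)

    def clamp(value: float, lo: float, hi: float) -> float:
        return max(lo, min(hi, value))

    # Random initial positions and velocities (assumed uniform within the limits).
    xs = [[rng.uniform(x_min, x_max) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(v_min, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]

    for t in range(t_max):
        w = w_start - (w_start - w_end) * t / t_max  # linearly decreasing inertia
        for i in range(n_particles):
            lbest = pbest[neighborhood_best(i, pbest_fit)]
            r1, r2 = rng.random(), rng.random()
            for d in range(dim):
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (lbest[d] - xs[i][d]))
                vs[i][d] = clamp(v, v_min, v_max)
                xs[i][d] = clamp(xs[i][d] + vs[i][d], x_min, x_max)
            f = fitness(xs[i])
            if f > pbest_fit[i]:  # keep the better position as the personal best
                pbest[i] = xs[i][:]
                pbest_fit[i] = f

    best = max(range(n_particles), key=lambda k: pbest_fit[k])
    return pbest[best], pbest_fit[best]
```

To train a network, `fitness` would decode a particle into weight matrices, run the training set through the network, and return a score that grows as the error shrinks. The defaults mirror the settings reported in the text, so a smaller `t_max` is advisable for quick experiments.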
As stated above, we employed a three-layer neural network to detect travel modes. We tested multiple NNs with the number of hidden neurons ranging from 1 to 20 [27]. For each neural network, the classification accuracy for the test set is shown in Figure 4. The classification accuracy is defined as the ratio of the number of samples that were correctly classified to the total number of samples. Figure 4 indicates that the classification accuracy increases rapidly when the number of neurons increases from 1 to 8, whereas it gradually decreases after the number of neurons exceeds 12. The highest accuracy of 94.44% was achieved for the neural network that incorporates 12 hidden neurons, which is higher than the results of most existing studies [29]. An exception is the study by Stopher et al. [9], which achieves an accuracy of approximately 95%. However, it is difficult to compare different studies because the accuracy of travel mode detection depends on the spatial context in which GPS is used, the number of identified transportation modes, the type of input variables and the data used for validation [13]. Therefore, we applied the neural network that includes 12 neurons in the hidden layer.
To further explore the performance of the classifier, we calculated the confusion matrix. Recall and precision are used to measure the classification capacity of the classifier and their definitions are described by Forman [30] in detail. As shown in Table 2, the highest recall is achieved for walk segments and it reaches 96.59% for the test set and 97.24% for the training set. This result is predictable because several features, such as the average speed and travel distance, exhibit an obviously distinct distribution between the walk mode and the other modes. Recall for bus and car segments is relatively low among the four modes. In addition, most bus segments that are incorrectly flagged are identified as car segments, and vice versa. This result was also obtained by other studies [9]. Regarding precision, the lowest value is obtained for bike segments, which may be due to the fact that the speed-related features for the bike mode lie between those of the walk mode and those of the bus and car modes.
To compare PSO-NNs with other frequently used machine learning methods, we chose several representative classifiers, consisting of SVM, MNL, and neural networks with back-propagation (BP-NNs), to detect the travel modes. The SVM classifier is implemented with the package "probsvm" in R, with one-versus-rest used as the multiclass method. The MNL model [31] is used in its primary version and all the features are alternative-specific attributes. BP-NNs are performed with the same acceleration parameters, inertia parameter strategy, population size, velocity limits, maximum number of iterations and layers of neurons as the PSO-NNs. In addition, BP-NNs are also tested with 1 to 20 neurons in the hidden layer, and the highest accuracy was achieved with 15 neurons. All these approaches share the same training data and test data, ensuring that the classification capacity is comparable among classifiers. The results of the comparison of these classifiers are shown in Table 3 and they are predictable to a certain extent. For example, although the MNL model is readily understandable and practicable, this model does not fit the problem well because the basic assumption of independence of irrelevant alternatives (IIA) may not hold for this issue. The comparison between BP-NNs and PSO-NNs indicates that implementing the PSO algorithm evidently improves the classification ability of the neural network. More specifically, the classification accuracy for the training set is improved from 91.85% to 95.81%, while that for the test set is improved from 89.37% to 94.44%. According to McNemar's test, the resulting M value (M = 9.30) is greater than the critical value 3.84 (95% confidence level). Thus, this test rejects the null hypothesis that BP-NNs and PSO-NNs have the same classification errors. This result indicates that it is more reliable to apply the PSO-NN classifier to evaluate new samples.
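The evaluation measures used above can be sketched as follows. The helper names and the example counts in the usage note are hypothetical and are not the study's data. The text does not state whether McNemar's statistic was computed with a continuity correction, so the sketch exposes both forms.

```python
from typing import List, Sequence, Tuple


def recall_precision(confusion: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Per-class (recall, precision) from a confusion matrix.

    Assumes rows are the reported (true) modes and columns the derived modes.
    """
    result = []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        actual = sum(confusion[k])                      # samples that truly belong to class k
        predicted = sum(row[k] for row in confusion)    # samples labelled as class k
        result.append((tp / actual if actual else 0.0,
                       tp / predicted if predicted else 0.0))
    return result


def mcnemar_statistic(b: int, c: int, continuity: bool = True) -> float:
    """McNemar's chi-square statistic with one degree of freedom.

    b: samples misclassified by the first classifier only;
    c: samples misclassified by the second classifier only.
    """
    if b + c == 0:
        return 0.0
    diff = abs(b - c) - (1 if continuity else 0)
    return max(diff, 0) ** 2 / (b + c)
```

For hypothetical counts b = 5 and c = 20, the corrected statistic is 7.84 and the uncorrected one is 9.0; either value exceeds the critical value of 3.84 and would reject the null hypothesis of equal error rates.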

Summaries and Conclusions
The current study employed PSO-NNs to distinguish the four travel modes (walk, bike, bus, and car) with GPS positioning data collected through a smartphone-based travel survey. The derived travel modes for each segment were compared with the "ground truth", which was obtained by respondents validating or correcting the primary travel information that was automatically detected from trajectories uploaded by the respondents themselves. Based on the comparison of PSO-NNs with several representative classifiers, the PSO-NNs achieved the best classification accuracy of 94.44% for the test set and 95.81% for the training set.
By improving the classification accuracy of travel modes with PSO-NNs, respondent burdens are expected to be further reduced and the survey duration can be prolonged without requiring additional effort from the respondents. With the improvement of the classification accuracy for travel mode detection and the decrease of burdens on respondents, smartphone-based travel surveys provide an opportunity to supplement traditional travel surveys. According to the investigation of the confusion matrix, newly targeted features may be included to decrease the classification errors between bus and car modes in future studies.

Figure 1. Distribution of six features.

Figure 3. Pseudocode of a PSO using a ring topology.

Figure 4. Classification accuracy for different numbers of hidden neurons.

Table 1. Summary of different methods for detecting travel modes.

Table 2. Confusion matrix of the neural network.

Table 3. Comparison among several classifiers for detecting travel modes.