## 1. Introduction

Environmental pollution and traffic congestion are major problems facing large cities. Traffic emissions are an important source of air pollution in urban environments and have been associated with several adverse health effects, including cardiorespiratory morbidity, mortality, and cancer [1]. Shanghai is a rapidly developing city, and the main cause of its air pollution has shifted from conventional coal combustion to a mix of coal combustion and motor vehicle emissions, owing to the rapid increase in motor vehicles within the city [2,3]. The increase in vehicles causes not only air pollution but also congestion. Several studies have demonstrated that, as Chinese urbanization continues to accelerate, urban traffic congestion will become increasingly severe [4,5]. Urban traffic congestion causes inconvenience and concern to commuting residents [6] and severe air contamination [7]. The rationality of residents’ travel structure directly affects the rationality of the urban transportation structure and the sustainable development of urban transportation, and indirectly affects environmental pollution. Therefore, studying travel mode choice can help alleviate both environmental and congestion problems. Commuting is the most important travel behavior for urban residents, and traffic congestion occurs mainly during commuting hours. Hence, a better understanding of the factors that influence multi-day commute mode choice will help in finding effective measures for reducing private vehicle use and for improving the appeal of public transport in Shanghai, China. Additionally, a better understanding of the day-to-day variability in commute mode is needed for alleviating traffic congestion and developing sustainable transportation systems.

The purpose of this study is to provide theoretical support for policymakers and transportation planners seeking to alleviate traffic congestion and develop sustainable transportation systems. In this paper, we develop a commute mode choice model for Shanghai using multi-day GPS data, which provide accurate and reliable travel information. Furthermore, multi-day GPS data enable a better understanding of the day-to-day dynamics of individual commute mode choice, which is crucial for modeling. Studying the commute mode choice model can help reveal how commuting behavior changes and provide a reference for urban transportation development planning and transportation policy formulation.

The structure of this paper is as follows. The second section reviews the literature on travel mode choice and on variability in mode choice across multiple days. The third section describes the data used in this paper. The fourth section presents a multinomial logit model for commuter travel mode choice and a binary logit model for multi-day travel mode change, and then quantitatively analyzes the factors that influence commuters’ choices and the factors that cause changes in commute mode choice. The final section concludes the paper and offers suggestions for future research.

## 2. Literature Review

Previous studies have used different methods and considered different factors to study travel mode choice. The logit model is popular among scholars; its main variants are the multinomial logit model, nested logit model, cross-nested logit model, binary logit model, and mixed logit model. Some scholars have compared these logit models to find the most suitable one [8,9]. Apart from the logit model, the random forest approach [10], latent class model [11], game theory approach [12], cluster analysis method [13], and regret theory [14] are also commonly used to study travel mode choice.

Travel mode choice is influenced by the characteristics of the trip maker and of the trip. Scholars differ slightly on which factors influence travel mode choice. The socio-economic attributes of travelers and the characteristics of each travel mode are commonly considered. Apart from these common factors, the speed of adjustment, the resistance to change [15], working hours [16], subjective lifestyle [17], residential density [18], geographical factors [19], and housing price [20] have also been considered. However, the results show that the influence of objective socio-economic characteristics (e.g., age, gender, and employment) exceeds that of subjective factors [17]. Among the numerous determining factors, objective socio-economic characteristics such as car availability, income, age, and household characteristics are most often studied and found to be significant [18]. In this study, we consider only factors that were verified to be significant in previous studies, and add other factors that we consider significant but that previous studies ignored.

Different types of cities, and different regions within a city, call for different factors and yield different travel mode choice results. The specific characteristics of the city are therefore an important consideration before research begins [21,22,23]. Because populations differ between countries, the conclusions obtained in different countries also differ. For example, China has a large population and is more suitable for the development of public transport. The population of some countries is relatively small, and developing public transport there may result in a waste of funds. However, cities with large populations such as Los Angeles also adopt effective strategies for increasing the competitiveness of transit relative to the car, and hence attract people out of their cars [24]. In addition to the difference in population, there are differences in travel modes. Electric vehicles are very common in China, but not that common in other countries. Therefore, the classification of travel modes should reflect the specific conditions of the city [25].

However, many studies of travel mode choice use only one day’s data and do not take into account the variability in commuting mode choice across multiple days. The shift in transport policy towards travel demand management has directed the attention of transport research towards the dynamic processes in travel behavior: learning and change on the one hand, and rhythms and routines on the other [26]. Previous studies have shown that travel behavior is neither totally repetitious nor totally variable [27]. Though many behaviors that make up the daily pattern are highly repetitious, the similarity between daily travel patterns on different days in an individual’s longitudinal record is quite low [28]. Hence, intrapersonal variability needs to be considered in studying travel behavior.

Scholars have considered intrapersonal variability in daily urban travel behavior, including variability in trip frequency, trip chaining, daily travel time [29], travel modes, and routes [30]. Based on previous studies, the variability of travel modes is related to life stage and spatial mobility constraints. Most adolescents are multimodal, mainly out of necessity, and the percentage of multimodal people declines drastically on entry into professional life [31]. Modal variability is determined by different types of spatial mobility constraints: reduced modal variability is predicted for having mobility difficulties, being aged over 60, being non-white, working full-time, living in smaller settlements, having lower household income, having regular access to a car, having no public transport pass/season ticket, and not owning a bicycle [32].

Regarding the variety of travel modes, single people tend to be more multimodal than married ones. The car is often the better choice for families, whereas public transport was a better option for single people in specific multimodal situations [33]. Regarding the bicycle as part of multimodal travel, the decisions of occasional cyclists to commute by bicycle are more affected by positive weather conditions, whereas frequent cyclists are discouraged from cycling by more practical barriers, including wind speed and the need to be at multiple locations [34].

Due to disparities in sampling strategies and in the land use/transportation/cultural milieu, travel mode choice shows some similarities and some differences across countries [35]. Therefore, considering the situation in Shanghai, China, a commute mode choice model that accounts for the variability in commuting mode choice was established.

Currently, many of the data sources for travel mode studies are revealed-preference and stated-preference surveys. Such single-day data are not accurate enough. Because of the limits of single-day data, although some studies examine the variability in travel mode, the influence of travel mode change on travel mode choice is rarely considered. Therefore, we want to find out whether the probability of mode change influences mode choice.

The data used in this paper come from a questionnaire, and the travel modes are identified from GPS data [36]; the resulting multi-day travel data are therefore more accurate. The factors considered in this article are more comprehensive, including the socio-economic attributes of individuals, family attributes, and the attributes of each travel mode. Apart from these factors, the probability of commute mode change is also considered in the commute mode choice model.

## 3. Data Description

The data used in this article are from a smartphone-based travel survey conducted in Shanghai from October 2013 to April 2015. Respondents were generally required to be residents who work or study in Shanghai, and were recruited mainly through online recruitment and commissioned survey companies, which can effectively improve the representativeness of the sample. All respondents were recruited before the designated survey day and were sent survey guidance and privacy documents. In this survey, respondents were required to report their socio-demographic attributes online. These attributes include gender, age, education, marital status, work, income, driver’s license, working hours, number of family members, school-age children, home address, work address, telephone number, number of bicycles, number of cars, the longitude and latitude of the home, the longitude and latitude of the workplace, and so on.

Additionally, to obtain the daily travel modes of respondents, our research group developed an application for collecting location-based data. The application records time, longitude, latitude, altitude, heading, and the number of satellites in view every second (these values are captured automatically). To avoid battery drainage, we gave each respondent an external battery pack. In addition, the application closes automatically when the smartphone is stationary for more than five minutes and restarts when the smartphone moves again, which effectively reduces battery consumption with no adverse effects on normal data recording. After the survey, each respondent received a mobile recharge card valued at ¥50, which in turn attracted more respondents to participate.

Respondents were required to start the application before leaving home and to upload their GPS records after the last arrival home every day. After the GPS data streams were uploaded to our server, travel information, including trip ends, travel modes, and trip purposes, was derived and displayed on a map. The respondents were then called by our group members to validate and, if necessary, correct the travel information. This intervention aims to help the respondents recall more details of their trips, which improves the accuracy of the travel information as much as possible. Next, each travel segment needed to be identified. Subway, bus, car, electric bicycle, bike, and walking were considered as travel modes. Special rules were employed to detect subway trips, because they differ significantly from the other travel modes. Subsequently, a random forest classifier was applied to distinguish the remaining five travel modes [36]. In this way, the multi-day travel modes of each volunteer were obtained.

Based on completeness and validity requirements, a total of 312 respondents were required to complete at least five days of the survey. However, not all respondents show commute behavior across several days. As this article studies travel mode choice for commuters, we retain the 152 respondents who show multi-day commute behavior. On average, each commuter has seven days of commuting routes and modes.

After initial data processing, the socio-economic and family attributes in this paper were taken from the individual and household information of the 152 respondents. The travel mode attributes mainly include public transport commuting time, car commuting time, the cost of public transport, the cost of driving, bicycle commuting time, and walk commuting time. Public transport commuting time, bicycle commuting time, and walk commuting time combine estimates from Google Maps with the time actually used by commuters. (If the commuter takes only public transport, then the public transport time is the actual time used, and the bicycle commuting time and walk commuting time are estimated by Google Maps from the home address and workplace address.) The commuter’s multi-day travel modes are identified by the random forest model and confirmed by call-back [36]. The travel mode assigned to each commuter in this paper is the mode that commuter used most frequently. The probability of mode change is the frequency of the other, minor modes divided by the frequency of all modes. Because some modes share similar factors, we combine bus and subway into a public transport mode, and combine electric bicycle and bike into a bicycle mode. Therefore, the travel modes are divided into four categories: public transport, car, walk, and bicycle.
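The two per-commuter quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: it assumes that the probability of mode change is the share of observed commutes made with modes other than the main one (so that the value lies between 0 and 1), and the function name and the seven-day example record are hypothetical.

```python
from collections import Counter


def main_mode_and_change_probability(daily_modes):
    """Return (main_mode, p_change) for one commuter.

    daily_modes: list of commute-mode labels, one per observed commute.
    main_mode is the most frequently used mode; p_change is the share of
    observations made with any other (minor) mode. Assumed definition.
    """
    if not daily_modes:
        raise ValueError("at least one observed commute is required")
    counts = Counter(daily_modes)
    main_mode, main_count = counts.most_common(1)[0]
    p_change = (len(daily_modes) - main_count) / len(daily_modes)
    return main_mode, p_change


# Hypothetical commuter with seven observed commutes (illustrative only).
example = ["car", "car", "public transport", "car", "car", "bicycle", "car"]
mode, p = main_mode_and_change_probability(example)
print(mode, round(p, 3))
```

For this invented record the main mode is car, and two of the seven commutes use a minor mode, so the probability of change is 2/7.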

Table 1 shows the data from the sample. Public transport is the main commute mode for 46.7% of commuters, and car is the main commute mode for 23.1%. Only 17.7% of commuters mainly cycle to work. In the one-week data, 16.4% of commuters changed their commute mode. Among these, 16.0% used three travel modes for commuting and 84.0% used two. Most of the commuters who changed mode use car as their main commute mode. Therefore, we want to find out whether these easy-to-change commuters can be led to a more intensive and sustainable travel mode.

The factors that influence commute mode choice include personal and family attributes, travel mode attributes, and home and workplace address attributes, as listed in Table 1. However, not all the factors are independent, and some are highly correlated. We conducted Pearson correlation analysis on the variables above and found that the Pearson correlation coefficients among commute time by public transport, commute time by walking, commute time by bicycle, commute time by car, and commute distance are all greater than 0.8; that is, these factors are highly correlated. Therefore, of these variables, we take only commute distance into account.
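The correlation screening step can be illustrated as follows. This is a hedged sketch using NumPy rather than the software used in the study; the data are synthetic (travel times generated from distance plus noise), and the helper name and variable names are assumptions made for the example.

```python
import numpy as np


def highly_correlated_pairs(data, names, threshold=0.8):
    """List variable pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic data for illustration only (not the survey data).
rng = np.random.default_rng(0)
distance = rng.uniform(1.0, 30.0, size=152)            # km
walk_time = distance * 12.0 + rng.normal(0, 5, 152)    # minutes
car_time = distance * 2.0 + rng.normal(0, 3, 152)      # minutes
n_bicycles = rng.integers(0, 4, size=152).astype(float)

data = np.column_stack([distance, walk_time, car_time, n_bicycles])
names = ["commute distance", "walk time", "car time", "number of bicycles"]

pairs = highly_correlated_pairs(data, names)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

In this synthetic case the time variables are near-linear functions of distance, so keeping only commute distance removes the redundancy while the unrelated variable is unaffected.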

Many studies use only one day’s travel mode obtained from a questionnaire to study travel mode choice, and consider relatively few factors. This paper takes the commuter’s multi-day travel data into account and considers a more comprehensive set of factors. We include the probability of commute mode change and find that it is significant. We then investigate which factors cause this change. To address these questions, two methods were used in this study.

## 4. Modeling and Results

#### 4.1. Multi-Day Commute Mode Choice

#### 4.1.1. Multi-Day Commute Mode Choice Model

The multinomial logit model is a commonly used forecasting method in travel behavior research. According to random utility theory, the traveler chooses the alternative with the greatest utility under given circumstances, and the choice is influenced by factors such as the characteristics of the traveler and the characteristics of the travel mode. If the factors that influence the traveler’s utility are known, then the traveler’s choice behavior can be predicted. In addition to following the two basic assumptions of random utility theory and the utility maximization principle, the multinomial logit model also assumes that the random terms ${\epsilon}_{it}$ of the utility function are independent of each other and obey the same double exponential distribution.

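Assume that traveler $t$’s travel option set is ${A}_{t}$, which contains $k$ different modes; in this paper, ${A}_{t}$ contains public transport, car, walk, and bicycle. Mode $i$’s utility is ${U}_{it}$. Based on the principle of utility maximization, if option $i$ brings the greatest utility to the individual, then traveler $t$ will choose $i$. The equation is as follows:

$${U}_{it} > {U}_{jt}, \quad \forall j \in {A}_{t},\ j \neq i \quad (1)$$

The utility ${U}_{it}$ of the mode is composed of two parts, the fixed item ${V}_{it}$ and the random item ${\epsilon}_{it}$, so that ${U}_{it} = {V}_{it} + {\epsilon}_{it}$, and the probability ${P}_{it}$ of traveler $t$ selecting mode $i$ is:

$${P}_{it} = \mathrm{Prob}\left({U}_{it} > {U}_{jt};\ \forall j \in {A}_{t},\ j \neq i\right) = \mathrm{Prob}\left({V}_{it} + {\epsilon}_{it} > {V}_{jt} + {\epsilon}_{jt};\ \forall j \in {A}_{t},\ j \neq i\right) \quad (2)$$

Since the random term ${\epsilon}_{it}$ obeys the double exponential distribution, the basic form of the multinomial logit model can be derived as:

$${P}_{it} = \frac{\exp\left({V}_{it}\right)}{\sum_{j \in {A}_{t}} \exp\left({V}_{jt}\right)} \quad (3)$$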

There are many methods for estimating the parameters of the multinomial logit model, such as the linear least squares method, the nonlinear least squares method, the maximum-likelihood estimation method, etc., and the most widely used is the maximum-likelihood estimation method. In this paper, SPSS software is used to estimate the parameters of the multinomial logit model by maximum-likelihood estimation.
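To make the estimation target concrete, the following sketch computes multinomial logit choice probabilities and the log-likelihood that maximum-likelihood estimation maximizes. It assumes the standard form in which the probability of alternative $i$ is $\exp({V}_{it})$ divided by the sum of $\exp({V}_{jt})$ over all alternatives, with the utility of a reference alternative fixed at zero. The function names, the linear-in-parameters utility, and the toy inputs are assumptions for illustration; the study itself used SPSS.

```python
import numpy as np


def mnl_probabilities(V):
    """Row-wise multinomial logit probabilities from systematic utilities V (n x k)."""
    V = np.asarray(V, dtype=float)
    shifted = V - V.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    expV = np.exp(shifted)
    return expV / expV.sum(axis=1, keepdims=True)


def mnl_log_likelihood(beta, X, chosen):
    """Log-likelihood of observed choices.

    X: (n, p) explanatory variables per traveler.
    beta: (k-1, p) coefficients, one row per non-reference alternative.
    chosen: (n,) index of the chosen alternative; index 0 is the reference
            alternative, whose utility is fixed at zero.
    """
    n = X.shape[0]
    V = np.hstack([np.zeros((n, 1)), X @ beta.T])
    P = mnl_probabilities(V)
    return float(np.log(P[np.arange(n), chosen]).sum())


# Toy inputs (hypothetical): two travelers, an intercept plus one attribute,
# four alternatives with the first as reference.
X = np.array([[1.0, 0.5], [1.0, 2.0]])
beta = np.zeros((3, 2))        # all-zero coefficients give equal shares
chosen = np.array([0, 2])
ll = mnl_log_likelihood(beta, X, chosen)
print(ll)
```

A numerical optimizer would search over `beta` to maximize this function; with all coefficients at zero every alternative has probability 0.25, which gives a simple check on the implementation.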

#### 4.1.2. Results of Multi-Day Commute Mode Choice Model

Some highly correlated variables (commute time by public transport, commute time by walking, commute time by bicycle, commute time by car, and commute distance) were identified in the data description section. Of these, only commute distance is retained; the personal and family attributes, the commute mode attributes, and the home and workplace address attributes were also considered in the commute mode choice model. The personal and family attributes include monthly income, age, gender, education, marriage, school-age children, number of cars, and number of bicycles. The commute mode attributes include the cost of public transport and the cost of commuting by car. The home and workplace address attributes include distance to the nearest public transport station, number of public transport lines, number of transfers, and commute distance. Apart from these factors, we also considered each commuter’s probability of commute mode change.

When all these factors are put into the model, we find that not all of them are significant for commute mode choice. The significance values (p-values) of age, gender, education, marriage, cost of public transport, cost of commuting by car, and number of public transport lines are greater than 0.05, which means that these factors are not statistically significant for mode choice. We then remove these insignificant factors one by one until the significance values of all remaining variables are less than 0.05. The significance values of the remaining factors are listed in Table 2; all are less than 0.05. This means that these factors can be considered significant in the model at the 95% confidence level.
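The one-by-one removal of insignificant factors can be sketched as a backward elimination loop. The text does not state the order of removal, so this sketch assumes that the factor with the largest p-value is dropped first and the model is refitted each time. The function `fit_p_values` is a hypothetical callback standing in for a model fit, and the p-values in the stub are invented for illustration, not taken from the study.

```python
def backward_eliminate(factors, fit_p_values, alpha=0.05):
    """Drop the least significant factor until all p-values are below alpha.

    factors: list of factor names to start from.
    fit_p_values: callable taking the current factor list and returning a
                  dict {factor: p-value} from a refitted model (hypothetical).
    """
    kept = list(factors)
    while kept:
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda f: p_values[f])
        if p_values[worst] < alpha:
            break  # every remaining factor is significant
        kept.remove(worst)
    return kept


def toy_fit(current):
    """Stub model fit returning made-up p-values (illustration only)."""
    made_up = {
        "commute distance": 0.001,
        "number of cars": 0.010,
        "gender": 0.40,
        "education": 0.75,
    }
    return {f: made_up[f] for f in current}


remaining = backward_eliminate(
    ["commute distance", "number of cars", "gender", "education"], toy_fit
)
print(remaining)
```

In practice the callback would refit the multinomial logit model on the current factor set, since p-values generally change when a variable is removed.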

As the aim of this paper is to lead commuters to more sustainable and intensive modes, we take public transport as the reference category. The parameters of the multinomial logit model, calibrated by maximum-likelihood estimation in SPSS 24, are shown in Table 3.

The significant factors are not the same across travel modes. We put only significant factors into the equations as the main factors affecting travel mode choice. Three logit model equations (containing only significant factors) can be derived from Table 3, and the final classification results are shown in Table 4.

Finally, the overall prediction accuracy of the multinomial logit model is 84.2%, which indicates that the model fits the data well.
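Overall prediction accuracy of this kind is the share of correctly classified commuters in the classification table, that is, the sum of the diagonal divided by the total. The sketch below uses a small invented confusion matrix purely to show the calculation; the counts are not the values of Table 4.

```python
import numpy as np


def overall_accuracy(confusion):
    """Share of correct predictions: diagonal sum over total count."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())


def per_mode_accuracy(confusion):
    """Share of each observed mode (row) that was predicted correctly."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)


# Invented counts for illustration: rows are observed modes, columns are
# predicted modes (public transport, car, walk, bicycle).
toy_table = [
    [18, 1, 0, 1],
    [2, 7, 0, 1],
    [0, 0, 4, 1],
    [1, 0, 1, 3],
]
accuracy = overall_accuracy(toy_table)
print(accuracy)
print(per_mode_accuracy(toy_table))
```

Reporting the per-mode shares alongside the overall figure shows whether a high overall accuracy is driven mainly by the most common mode.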

As can be seen from Table 3, taking public transport as the reference, the significant factors for the car commute mode are the number of bicycles, the number of cars, the average number of transfers, the probability of mode change, the distance to the nearest public transport station, and monthly income (less than ¥5000). The number of bicycles and the monthly income both negatively influence car mode choice: the more bicycles in the family, the more the commuter is inclined to choose public transport, and commuters whose monthly income is less than ¥5000 are less inclined to choose car for commuting. The number of cars, the number of transfers, and the distance to the nearest public transport station all positively influence car mode choice. This means that the more cars in the family, the higher the probability of choosing car for commuting; the more transfers required, the higher the probability of choosing car; and the farther from the public transport station, the higher the probability of choosing car. These results are consistent with our expectations. The parameter of the probability of change is 14.070, which is positive. This means that car commuters are more inclined to change mode.

For the walking commute mode, the significant factors are the average number of transfers and the commute distance. As can be seen from Equation (5), the parameter of the average number of transfers is 6.425, which is positive. This means that the more transfers required, the less the commuter is inclined to choose public transport. The parameter of the commute distance is negative, which means that the longer the commute distance, the more the commuter is inclined to choose public transport rather than walking. This is consistent with our expectations.

For the bicycle commute mode, the significant factors are the distance to the nearest public transport station, the number of bicycles, the commute distance, and whether the family has school-age children. The distance to the nearest station and the number of bicycles in the family both positively influence bicycle mode choice: the farther from the station, the more inclined the commuter is to choose bicycle over public transport, and the more bicycles, the higher the probability of choosing bicycle for commuting. The commute distance and having no school-age children in the family have a negative influence on bicycle mode choice. This means that the longer the commute distance, the lower the probability of choosing bicycle rather than public transport, and that commuters without school-age children are more inclined to choose public transport.

#### 4.2. Factors Affecting the Probability of Commute Mode Change

#### 4.2.1. Multi-Day Commute Mode Change Model

Whether a commuter’s travel mode changes is a binary outcome, and the factors that affect it cannot be directly tested. Therefore, the binary logit model is suitable for analyzing the factors that affect commute mode change.

Equation (7) can be fitted by logit regression.

$$\ln\left(\frac{P}{1 - P}\right) = a + \sum_{i=1}^{m} {b}_{i}{x}_{i} \quad (7)$$

where $P$ is the probability that the multi-day commute mode does not change under the influence of factors (${x}_{1}$, ${x}_{2}$, …, ${x}_{m}$); $1 - P$ is the probability of multi-day commute mode change; ${x}_{i}$ ($i$ = 1, 2, …, $m$) is the $i$th factor that affects whether the travel mode changes; and $a$ and ${b}_{i}$ ($i$ = 1, 2, …, $m$) are the parameters to be estimated.
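The relationship between the linear predictor and the probability can be sketched as follows, assuming the standard binary logit form in which the log-odds $\ln(P/(1-P))$ equal $a$ plus the sum of ${b}_{i}{x}_{i}$. The function names and the numeric inputs are hypothetical and serve only to illustrate the transformation.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def no_change_probability(a, b, x):
    """P(mode does not change) given intercept a, coefficients b, factors x."""
    z = a + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters and factor values (illustration only).
p_stay = no_change_probability(0.3, [0.5], [2.0])
print(p_stay, 1.0 - p_stay)  # probability of no change, probability of change
print(logit(p_stay))         # recovers the linear predictor 0.3 + 0.5 * 2.0
```

Applying `logit` to the returned probability recovers the linear predictor, which is why a positive coefficient raises the probability of not changing mode and a negative one lowers it.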

#### 4.2.2. Results of Multi-Day Commute Mode Change Model

As shown in the results of the multinomial logit model, some personal and family attributes are not significant, which differs from previous research results. We speculate that this is because the probability of change is included in the travel mode choice model, and that these personal attributes may instead affect whether the travel mode changes. Because the probability of mode change is a continuous value, it cannot serve as the dependent variable of a binary logit model. Travel mode change, by contrast, is a dichotomous variable, so we use it in place of the probability of change as the dependent variable of the commute mode change model. Travel mode change is defined in Table 1 (1 = commuters who do not change their commuting mode across multiple days, 2 = commuters who change their commuting mode across multiple days).

Age, gender, education, and marital status are not significant in the travel mode choice model. We therefore examine whether these four factors influence travel mode change. Except for education, the other three factors are significant for travel mode change.

The estimates of the significant factors are listed in Table 5.

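From Table 5, the equation obtained by logit regression can be expressed as:

$$\ln\left(\frac{P}{1 - P}\right) = a - 3.661\,\mathrm{Gender} + 0.162\,\mathrm{Age} - 2.119\,\mathrm{Marriage} \quad (8)$$

where $P$ is the probability that the multi-day commute travel mode does not change under the influence of factors ($\mathrm{Gender}$, $\mathrm{Age}$, $\mathrm{Marriage}$); $1 - P$ is the probability of multi-day commute travel mode change; and $a$ is the estimated constant term.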

The accuracy of the model is shown in Table 6.

As per column 2 in Table 5, the estimate for gender is −3.661, which means that gender has a negative impact on travel mode change: females are more inclined to change their commute mode than males. The estimate for age is 0.162, which is positive and means that younger commuters are more inclined to change their commute mode. The estimate for marriage is −2.119, which is negative and means that unmarried commuters are more inclined to change their commute modes than married commuters. The significance values of these three factors are all less than 0.05, so all three have a significant impact on the change of commuting mode.
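One way to read these estimates is through odds ratios: in a binary logit model, a one-unit increase in a factor multiplies the odds $P/(1-P)$ by the exponential of its coefficient. The sketch below applies this to the three estimates quoted above, taking $P$ as the probability that the commute mode does not change. How a "unit" maps to categories depends on the variable coding in Table 1, which is not reproduced here, so the output should be read as illustrative.

```python
import math

# Estimates quoted in the text for the commute mode change model.
coefficients = {"Gender": -3.661, "Age": 0.162, "Marriage": -2.119}

# exp(b) is the multiplicative effect of a one-unit increase in the factor
# on the odds that the commute mode does NOT change.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1.0 else "lowers"
    print(f"{name}: odds ratio {ratio:.4f} ({direction} the odds of no change)")
```

A ratio above 1 (age) means the odds of keeping the same mode rise with the factor, which matches the statement that younger commuters are more inclined to change; ratios below 1 (gender, marriage) mean the odds of keeping the same mode fall as the coded value increases.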

As the multinomial logit model above shows, the probability of mode change has a significant effect on the choice between public transport and car. The binary logit model shows that female, young, and unmarried commuters are more inclined to change commute modes. Combining the results of these two models, we can recommend environmentally friendly travel modes to these easy-to-change commuters as a replacement for commuting by car.

## 5. Conclusions and Recommendation

Using multi-day GPS-based travel data from Shanghai, China, this paper used logit models to study the factors that influence commute mode choice and the factors that cause multi-day commute mode change. In the commute mode choice study, the probability of commute mode change was considered, and the results show that it has a significant effect on the choice between public transport and car. This result is useful for guiding car users to adopt a more intensive and environmentally friendly travel mode.

Several conclusions can be drawn from this study. First, commute mode choice is influenced by many factors. Monthly income, school-age children, number of bicycles, number of cars, number of transfers, distance to the nearest public transport station, commute distance, and probability of mode change directly influence commute mode choice. Age, gender, and marriage influence the probability of change directly, and influence commute mode choice indirectly. Second, although the cost of a mode is usually assumed to influence choice, the results show that the cost of public transport and the cost of car are not significant in the choice between public transport and car. Third, the probability of mode change mainly influences car mode choice, which means that commuters who choose car as their main commute mode usually combine car with other modes for commuting. Therefore, policies can be made to guide car commuters to commute by public transport. Fourth, the factors that restrict commuters from choosing public transport are mainly the number of transfers and the distance to the nearest public transport station. The number of public transport lines is not the main factor. This finding is useful for improving the public transport system to attract more commuters. Finally, the factors that restrict commuters from choosing bicycle are mainly the number of bicycles and the commuting distance. According to this result, policies can be made to guide commuters whose commute distance is short to commute by bicycle.

Based on the findings above, the following policies are suggested for the government and transportation agencies to alleviate road congestion and vehicle pollution.

First, with the economic development of Shanghai, a developed city, the cost of modes is no longer the main factor affecting commute mode choice. Therefore, public transport cannot attract commuters simply by lowering prices; it should improve other aspects of the service. For example, customized bus services that are slightly more expensive but more convenient and comfortable could attract more commuters to this intensive mode. Where necessary, research should be conducted to identify residential and office areas that are far from public transport stations, and stations should be built in these areas.

Second, the policy of limiting car ownership should continue. As having a car decreases the probability of commuting by public transport, it is necessary to keep this policy to decrease car use and increase the use of public transport such as the metro. Apart from limiting car ownership, policymakers could take measures such as eliminating parking spaces within certain areas. Restricting cars by license plate number and by time of day can also reduce car use to some extent.

Third, the number of routes has little effect on the choice of public transport options. This may be because commuters are not aware of the information on these routes. Therefore, timely information about public transport routes should be sent to commuters, such as when the next bus will arrive, the nearest station, and the best transfer method, which can guide more commuters to travel by public transport.

Fourth, full use should be made of the development of shared bicycles in China, since commuters who have bicycles at home are more likely to commute by bicycle. Commuters without bicycles will take public transport when their workplace is close to home, which causes congestion on the public transport system and reduces its comfort. When the workplace is far from home, commuters without a bicycle will drive to work, which increases road congestion and environmental pollution. With the development of shared bicycles in recent years, commuters who live close to the workplace but have no bicycle can use shared bicycles for commuting. Commuters who live far from public transport stations should be guided to use shared bicycles to reach the stations or to travel the final miles.

Finally, intensive travel should be promoted to those who are more likely to change their commuting mode. According to this study, environmentally friendly modes should be promoted mainly to unmarried young people. Providing quality transit services and a highly walkable environment can give them positive impressions of alternative modes, so that these commuters may shift to sustainable and intensive travel modes.

Guiding car commuters to use public transport can alleviate road traffic congestion. Guiding short-distance commuters who use public transport to use bicycles can ease the congestion of public transport during peak hours. This is more sustainable and can reduce pollution and improve commuting efficiency.

This study can be improved in several respects that require further research. First, this study is unable to include all influential factors; for example, the reliability and comfort of the various travel modes are not considered. Second, this paper treats public transport as a single mode and does not discuss bus and rail separately. Future studies should collect more data to support the commuter travel mode choice model and further improve the credibility of the model.