Modelling the impact of weather and context data on transport mode choices: A case study of GPS trajectories from Beijing

Abstract: Over the years, researchers have been studying the effect of weather and context data on transport mode choice. The majority of these works are based on survey data; however, the accuracy of their findings depends on whether respondents give accurate and honest answers. In this paper, the potential of using GPS trajectories as an alternative to travel surveys in studying the impact of weather and context data on transport mode choices is investigated for the city of Beijing. In the analysis, we apply both descriptive and statistical models such as the MNL and MNP models. Our findings indicate that temperature has the most prominent effect among weather conditions. For instance, for temperatures greater than 25 °C, the walking share increases by 27% and the bike share reduces by 21%, which is in line with the results from several survey studies. In addition, the effect of government policy on transport regulation is revealed when the air quality becomes hazardous, as people are encouraged to use environmentally friendly travel modes such as the bike instead of the bus and car, which are known CO₂ emitters. Moreover, due to a series of traffic restrictions introduced by the Beijing government during the 2008 summer Olympics, a decrease of 17.5% in the car share and an increase of 13% and 10% in the walking and bus shares, respectively, are observed. These findings provide a scientific basis for effective transport regulation and planning.


Introduction
At present, the transport sector accounts for 29% of global greenhouse gas emissions and 23% of global carbon dioxide (CO₂) emissions, thereby contributing to global climate change [1]. Climate change mitigation requires finding ways of achieving sustainable mobility options without compromising economic growth and social inclusion, which necessitates effective government planning and regulation of the transport sector. Therefore, it is crucial to investigate the scientific evidence that shows how changes in climate conditions, transport planning, and transport regulation are intrinsically interrelated.
With climate change, weather is now an important topic in transportation research. Over the years, researchers have been studying the effect of weather and context on the transport mode choice. We define context data as information that can provide perspective into a person or an event such as culture and habits. These studies relate context information such as individual characteristics [2], temporal factors [3], transportation supply [4], travel demands [5], and weather conditions [6] to existing or self-gathered travel behaviour data. Most of the works focus on weather effects on transport mode decisions across different cities [6][7][8].
The existing literature on weather and daily mobility focuses particularly on precipitation [9][10][11][12][13], temperature [14][15][16], humidity [17,18], wind speed [18], air quality [16,18], and seasons [19]. The findings of these studies give insights into the separate roles of not only weather parameters, but also geographical context, cultures and habits in influencing transport mode choices. For instance, the authors in [15,20] indicate that in the Netherlands, higher but not too high temperatures favour walking and biking over motorised transport; hence, temperature is more important than precipitation. In contrast, researchers in [9,10] conclude that precipitation creates more significant ridership fluctuations than temperature in Nanjing, China and Flanders, Belgium.
In search of a better understanding of how geographical context and habits affect mobility decisions, Böcker et al. [12] investigate weather and daily mobility across Dutch, Norwegian and Swedish cities. The authors find that biking is favoured by dry and warm conditions and that the use of the car is favoured during wet and windy weather. The authors also highlight differences in the effects of weather on mobility across the different cities and countries.
With respect to seasonality, Hyland et al. [19] report that the car is highly likely to be the chosen mode of transport during bad weather for commuters in Chicago. In fact, the authors show that millennials are more inclined to choose the car in winter than in summer, while for non-millennials, seasonality had little impact on the choice of transport mode. In [21], Ton et al. further highlight that apart from seasonality and weather characteristics, other categories of context information such as work conditions, trip and household characteristics, and the built environment also influence transport mode choices. In [22], Böcker et al. investigate how emotional travel experiences influence transport mode choices and find that biking is favoured by sunny, dry, calm, and warm but not too hot weather conditions, which lead to more satisfying emotions.
While the aforementioned studies improve our understanding of the effects of weather and context data on daily mobility choices, one key issue stands out: many of the studies rely on travel surveys or travel diaries to characterise individual travel behaviour. The disadvantage of surveys is that the accuracy of their findings depends on whether respondents give accurate and honest answers. However, information sources such as smartphones, which can log satellite positioning data, can now be used as an alternative to surveys in transportation research. To the best of our knowledge, there is no study that analyses the relationship among weather conditions, context information, and different transport modes using satellite positioning data.
The goal of this paper is to show the possibility of using GPS trajectories to investigate the impact of weather and context data on transport mode choices in Beijing. We study not only the impact of weather conditions such as temperature, precipitation, relative humidity, wind speed, and air quality on transport mode choices, but also the effect of context information such as rush hours, holidays, day/night, an event such as the Olympics, and trip distance on individual travel behaviour. We apply both descriptive and statistical models such as the multinomial logit (MNL) and multinomial probit (MNP) models. The specific contribution of this paper is twofold:
1. The potential of using GPS trajectories to analyse and model the relationship between transport mode choices, weather and context information is investigated.
2. The relationship among weather and context information, transport planning, and transport regulation is analysed.
The structure of the remainder of this paper is as follows. Section 2 presents the databases applied in this work. The statistical analysis and models are described in Section 3 and Section 4, respectively. The results are discussed in Section 5 while the conclusions and future work are presented in Section 6.

Databases
The case study is Beijing (see Figure 1), which is the capital of the People's Republic of China. With a population of over 21 million residents, it is regarded as the most populous capital city in the world. Due to its high population density, rapid urbanization and motorization, Beijing faces severe congestion and air quality problems.
Beijing is an interesting case study due to the sheer number of solutions that have been adopted to overcome its transport related problems such as the development of bus rapid transit corridors, new extensive bus lanes, policies like congestion charging, among others. In this section, we describe the three databases with the GPS trajectories, weather and context information in Beijing.

GPS trajectories
The dataset adopted in this study has GPS trajectories of 182 users collected during the GeoLife project conducted between April 2007 and August 2012 [23]. The GPS trajectories were recorded by GPS loggers and phones and are described by a sequence of time-stamped points, with each containing longitude, latitude, altitude, and transportation mode label. Although the dataset contains trajectories distributed in over 30 cities worldwide, in this work, we use a total of 2,671 labeled trajectories from Beijing. The total traveled trip distance and duration is 21350 km and 1296.6 h, respectively. Each traveler has an average of 74 trips, with an average distance and duration of 5.75 km and 0.5 h, respectively. In Figure 1, we show the geographical location and demarcation of Beijing and the starting point of the trajectories.
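The trip distances quoted above are derived from sequences of time-stamped points. As a minimal sketch of one way such a distance could be obtained, the snippet below sums great-circle (haversine) distances between consecutive points. The paper does not state how distances were computed, so the haversine choice, the function names, and the Earth radius value are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_length_km(points):
    """Sum the distances between consecutive (lat, lon) points of one trajectory."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```

Altitude is ignored in this sketch, which is usually acceptable for urban trips but slightly underestimates distances on steep terrain.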
The transport modes considered in this work include: walk, bike, car, bus, and train. The percentage of the transport mode in the dataset is shown in Figure 2. During the period in which the data was collected, walking accounts for 46% of the transport mode labels in Beijing. This result is expected given that travel guides recommend getting around Beijing on foot as the best and most efficient commuting mode [24].
It is worth noting that the Geolife trajectory dataset is natural, since it was recorded while the users went about their daily routines [25]. For instance, trajectories were recorded as users made trips from home to work and vice versa, and to entertainment, sports, shopping, and sightseeing venues.
The Geolife trajectory dataset has been used in different research fields such as in privacy preserving location data [26], measuring trajectory stops and moves [27], user identification [28], trajectory completion [29], and transport mode detection [30].

Weather information
The weather conditions considered include temperature, precipitation, wind speed, air quality, and relative humidity, collected from a meteorological station located within the city of Beijing. This database can be accessed from the National Aeronautics and Space Administration (NASA) website [31].
Owing to its continental climate, Beijing has hot, sultry, and rainy summers, cold and sunny winters, and a precipitation of about 545 mm annually [32]. However, a summary of the conditions in Table 1 shows that during the period of study: i) the average temperature was about 19 °C, ii) rain was not abundant, with a mean of 0.083 mm/h, iii) the air was less humid, with a mean relative humidity of 53%, iv) the wind was very light according to the Beaufort scale, with a mean wind speed of 3 m/s, and v) the air quality was unhealthy, with a mean of 153 µg/m³.

Context information
This study also includes the impact of context information such as the Beijing 2008 summer Olympics, held from 8 to 24 August, on transport mode choices. The event was awarded to Beijing in 2001, after which the city's leaders embarked on massive projects to transform its transport system. For instance, the rail network was expanded from 50 km to 200 km and 286 km of dedicated on-road lanes were introduced [33]. In addition, a new set of restrictions was passed in the months leading up to the Games. For instance, in some parts of the city people were not allowed to use their private vehicles, trucks from outside Beijing had to avoid the city, and flexible retail and shopping hours were introduced to spread traffic loads.
Other context information attributes we look at include rush hour or off-rush hour, holiday or non-holiday, day time or night time, and trip distance. In Beijing, morning and evening rush hour traffic demand runs from about 7 AM to 9 AM and from 4 PM to 8 PM, respectively, according to [33]; we apply these periods in our analysis. As for the holidays, there are 10 public holidays spread throughout the year, which include: New Year's, Chinese New Year, Lunar New Year, Qingming Festival, Labour Day, Dragon Boat Festival, Mid-Autumn Festival, Golden Week, and Christmas [34]. In total, our analysis considers 28 days of the year as public holidays. Day time in Beijing is taken as the period between sunrise (6 AM) and sunset (6 PM), and night time as the period between sunset and sunrise.
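The context attributes described above can be derived from the departure timestamp of a trip. The following is a minimal sketch under stated assumptions: the function name `context_flags`, the dictionary keys, and the use of half-open hour intervals (for example, 7:00 up to but not including 9:00) are illustrative choices, and the holiday calendar is passed in by the caller rather than hard-coded.

```python
from datetime import date, datetime

# Olympics period as stated in the text: 8 to 24 August 2008 (inclusive).
OLYMPICS_START = date(2008, 8, 8)
OLYMPICS_END = date(2008, 8, 24)


def context_flags(departure, holidays=frozenset()):
    """Return context indicators for one trip departure time (a datetime).

    `holidays` is a set of datetime.date objects supplied by the caller;
    the actual 28-day calendar used in the study is not reproduced here.
    """
    hour = departure.hour
    day = departure.date()
    return {
        "olympics": OLYMPICS_START <= day <= OLYMPICS_END,
        # Rush hours: about 7 AM to 9 AM and 4 PM to 8 PM.
        "rush_hour": 7 <= hour < 9 or 16 <= hour < 20,
        # Day time: sunrise (6 AM) to sunset (6 PM).
        "daytime": 6 <= hour < 18,
        "holiday": day in holidays,
    }
```

For example, `context_flags(datetime(2008, 8, 10, 8, 30))` flags the trip as an Olympics-period, rush-hour, daytime departure.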

Matching weather and context information with GPS trajectories
As mentioned earlier, the GPS trajectories from the Geolife dataset contain the location, date, time, and the mode of transport chosen by the traveler. The starting point of the recorded trajectory provides the departure information of the traveler such as their departure location, date, and time.
In matching the weather data to the GPS trajectories, the starting point of each trajectory is linked to the hourly historical weather data from the nearest meteorological station. The result generates weather-related variables corresponding to the departure time of each trip. It is worth noting that hourly weather data are preferred over daily weather data because the former generates higher temporal accuracy, given the constantly changing weather.
Similarly, the timing corresponding to each of the context information attributes is linked to the departure time of each trip. From the combined datasets, we analyse the impact of weather and context data on transport mode choices, treating the weather and context information as inputs and the modal choices as outputs.
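The matching step described in this section can be sketched as a lookup keyed on the hour of departure. This is an illustration only: it assumes a single station, assumes the departure time is floored to the hour (rather than rounded to the nearest hour), and uses hypothetical field names such as `departure` and `mode`.

```python
from datetime import datetime


def floor_to_hour(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def attach_weather(trips, hourly_weather):
    """Join each trip with the weather record for its departure hour.

    trips: list of dicts with at least a 'departure' datetime.
    hourly_weather: dict mapping hour-aligned datetimes to dicts of weather variables.
    Trips whose departure hour has no weather record are skipped.
    """
    matched = []
    for trip in trips:
        record = hourly_weather.get(floor_to_hour(trip["departure"]))
        if record is not None:
            matched.append({**trip, **record})
    return matched
```

Skipping unmatched trips keeps the sketch simple; an alternative would be to fall back to the closest available hour.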

Statistical analysis
In this section, the weather conditions, context information and GPS trajectories are analysed statistically. We apply two main statistical methods which are commonly used in data analysis: i) descriptive statistics, where we summarize the findings using bar graphs, and ii) statistical inference, where we apply the MNL and MNP models.
In order to track the weather changes, we classify the weather conditions into different levels according to ratings taken from published weather knowledge. For instance, the temperature scale with seven levels in Figure 3a is adopted from [35,36], the precipitation scale with five levels in Figure 3b from [37,38], the wind speed scale with six levels in Figure 3c from the Beaufort number [39], the air quality scale with six levels running from 0 to 500 in Figure 3d from the monitoring of fine particulate matter concentrations (PM2.5) [40], and the relative humidity scale with four levels in Figure 3e from [41]. These classifications, as well as the timing corresponding to the context information, are used to compute the share of each transport mode under the effects of weather and context data in Figure 3 and Figure 4, respectively.
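The share computation described above amounts to binning each trip by the level of a weather variable and normalising the mode counts within each level. The sketch below illustrates this for air quality. Only the 0-50, 201-300 and 300+ ranges are stated in the text; the intermediate break points (100, 150, 200) are an assumption based on the commonly used PM2.5 index categories, and the field names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

# Upper bounds of the first five air quality levels; the sixth level is open-ended.
# The 100/150/200 break points are assumptions, not taken from the paper.
AIR_QUALITY_EDGES = [50, 100, 150, 200, 300]
AIR_QUALITY_LABELS = ["0-50", "51-100", "101-150", "151-200", "201-300", "300+"]


def mode_shares(trips, variable, edges, labels):
    """Return {level: {mode: share}} for trips binned by `variable`."""
    counts = defaultdict(Counter)
    for trip in trips:
        # bisect_left places a value equal to an edge in the lower level.
        level = labels[bisect_left(edges, trip[variable])]
        counts[level][trip["mode"]] += 1
    return {
        level: {mode: n / sum(c.values()) for mode, n in c.items()}
        for level, c in counts.items()
    }
```

The same function can be reused for temperature, precipitation, wind speed, or relative humidity by passing the corresponding edges and labels.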

Influence of the weather condition on the transport mode choices
In Figure 3a, it is observed that during snowfall, cars have a higher share compared to other transport modes. However, walking becomes the most preferred transport mode as the temperature increases. At temperatures greater than 25 °C, the bike and walking shares reduce and the bus share increases. In Figure 3b, we see that the share of the transport modes is similar across the precipitation levels except when precipitation is greater than 2 mm/h, i.e., little rain hardly impacts passengers' travel choices, which is in line with the observations of Junlong Li et al. [9].
In Figure 3c, it is observed that as the intensity of the wind increases from calm (greater than 0.27 m/s) to strong wind (greater than 7.8 m/s), the bike and walk shares reduce, while the car share increases. In Figure 3d, it is noticeable that as the air quality turns from good (0-50 µg/m³) to unhealthy (201-300 µg/m³), the transport mode share remains constant. A potential explanation is that Beijing averages an air quality of about 150 µg/m³ (see Table 1), so the air quality is unhealthy most of the time and people carry on with their daily lives regardless. However, when the air quality becomes hazardous (300+ µg/m³), bike and walk have a larger share and there is a reduction in the car and bus shares in order to reduce CO₂ emissions. In Figure 3e, the share of each transport mode is the same regardless of the relative humidity, which is in line with the findings of Aultman-Hall et al. [36], who found that humidity has a limited effect on transport choices.

Influence of context information on the transport mode choices
During the Olympics, an increase in the bus share and a reduction in the car share are shown in Figure 4a. It is intuitive that during international events of this magnitude some roads are closed off or restricted to the public. Our findings show that the alternative means of travel is public transport, the bus in particular. In Figure 4b, an increase in the bus share and a reduction in the train and car shares are noticeable at rush hours in comparison with non-rush hours. This finding is in line with the study by the Asian Development Bank [33], which shows that during rush hours passengers prefer the bus to the train.
In Figure 4c, we see an increase in the car share and a reduction in the bus share during holidays, perhaps because people choose to travel by car on holidays. In Figure 4e, it can be observed that trip distance has a strong influence on the transport mode choice. For instance, for shorter distances (0-6 km), walk and bike have a larger share, but between 6 and 12 km the bus share is larger. Between 12 and 20 km, however, the bus share reduces as the car and train shares increase. Beyond 20 km, the train share reduces and the car share increases.

Statistical Modelling
Given that transport mode choices are a typical example of discrete outcomes, we adopt the MNL and MNP models in this study to link the probabilities of choosing a given mode of transport to the weather and context information. These are among the most popular models for analysing and predicting travel decisions. They analyse the simultaneous effects of meteorological and context information on transport decisions in one integrated model, which is important in understanding mobility decisions.

Methods
According to Ben-Akiva et al. [42], the framework of these models is based on four general assumptions: i) the decision maker, who is the individual or group of people making the choice, ii) the alternatives, which are the available choices to choose from, iii) the attributes, which are parameters that characterise each alternative, and iv) the decision rule, which is the process the decision maker uses to evaluate the alternatives.
The decision rule in the models used in travel behaviour analysis is based on the utility $U_{in}$ that individual $n$ derives from alternative $i$, which takes the form

$$U_{in} = V_{in} + \varepsilon_{in}. \tag{1}$$

The alternative with the highest utility is chosen; therefore, the probability that alternative $i$ is chosen by an individual $n$ from a choice set $C_n = \{1, 2, \ldots, i, \ldots, J_n\}$ takes on the general form

$$P_{in} = P(U_{in} \geq U_{jn},\ \forall j \in C_n) = P(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn},\ \forall j \in C_n). \tag{2}$$

$V_{in}$ is the deterministic or systematic part of the utility function. It is defined in (3) by a vector of observable variables $z_n$ and their corresponding coefficients $\gamma_i$:

$$V_{in} = z_n^{\top} \gamma_i. \tag{3}$$

To identify the model, one set of coefficients of the systematic term needs to be normalized to zero, e.g. $\gamma_1 = 0$, which makes the corresponding transport mode choice ($i = 1$) the base mode. The coefficients of the other alternatives are interpreted in reference to the base outcome.
$\varepsilon_{in}$ is the random term, which expresses the errors of the utility function. Its distribution is usually assumed to be known, which makes the problem easier to characterise empirically. If the error terms are independent and identically distributed "Extreme Value", i.e., Gumbel, the model is MNL and $P_{in}$ takes on the form

$$P_{in} = \frac{\exp(V_{in})}{\sum_{j \in C_n} \exp(V_{jn})}. \tag{4}$$

However, when the error terms follow a multivariate normal distribution, the model is MNP and $P_{in}$ takes on the form

$$P_{in} = \int_{-\infty}^{\xi_{1n}} \cdots \int_{-\infty}^{\xi_{J_n n}} \phi(\tilde{\varepsilon}_{1n}, \ldots, \tilde{\varepsilon}_{J_n n}) \, d\tilde{\varepsilon}_{1n} \cdots d\tilde{\varepsilon}_{J_n n}, \tag{5}$$

where $\phi(\cdot)$ is the density function, which follows a multivariate normal distribution with zero means and a covariance matrix of size $J_n \times J_n$, and $\xi_{jn} = V_{in} - V_{jn}$ and $\tilde{\varepsilon}_{jn} = \varepsilon_{jn} - \varepsilon_{in}$, as also seen in (2), are the differences between the systematic and error terms, respectively.
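To make the MNL case concrete, the sketch below computes the choice probabilities from a linear systematic utility, with the base mode carrying all-zero coefficients. It is an illustration under assumptions: the function name, the dictionary layout, and the example coefficient values are hypothetical and are not the estimates reported in this paper.

```python
import math


def mnl_probabilities(z, gammas):
    """MNL choice probabilities for one individual.

    z: list of explanatory variable values.
    gammas: dict mapping each mode to its coefficient list (same length as z);
            the base mode has all-zero coefficients.
    """
    # Systematic utilities: inner product of the variables and the mode's coefficients.
    utilities = {
        mode: sum(g * x for g, x in zip(coefs, z)) for mode, coefs in gammas.items()
    }
    # Subtracting the maximum utility avoids overflow and leaves the ratios unchanged.
    shift = max(utilities.values())
    weights = {mode: math.exp(v - shift) for mode, v in utilities.items()}
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}


# Hypothetical coefficients for a single explanatory variable; walking is the base mode.
example_gammas = {"walk": [0.0], "bike": [-0.4], "car": [0.2], "bus": [0.1], "train": [-0.1]}
example_probs = mnl_probabilities([1.0], example_gammas)
```

The MNP probabilities have no closed form and are typically approximated by simulation, which is one reason the estimation is delegated to a statistical package.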
The difficulty of estimating these models grows with the number of discrete choices, so dedicated commercial software packages are recommended for their estimation. In our multivariate analysis, the MNL and MNP models have been implemented in the software package Stata. We refer the reader to the Stata base manual [43] for a detailed discussion of the estimation procedure.
Note that during the analysis, the five modes of transport are used as the dependent variable (see Figure 2), and two groups of individual-level variables, the weather and context information attributes, are used as the independent (explanatory) variables (see Table 1). However, we excluded relative humidity, precipitation, and holiday information from the model because, as seen previously, they have a limited impact on transport mode decisions. In addition, the categories of the remaining weather conditions, namely temperature, wind speed, and air quality, were adjusted according to the relationships and patterns observed during the statistical analysis.

Table 2 (excerpt). McFadden R-squared: 0.3289, 0.012. ** Significant at α < 0.01; * Significant at α < 0.05.

Table 2 shows the results of both models, with a summary of the relationships between the transport mode choices, the weather, and the context information. In terms of model fit, the MNL model performs better than the MNP model because it has a larger McFadden R-squared and smaller AIC and BIC values, so the results favour the MNL model. The values in the table represent the standardised regression coefficients, with the stars indicating their statistical significance. Note that the coefficients are a relative measure of transport choice compared to walking, which is the reference mode. For instance, the negative temperature coefficients for the bike, car, bus, and train indicate that these modes are less likely to be used than walking.
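The three fit measures used to compare the models follow standard definitions based on the maximised log-likelihood. The sketch below states them explicitly; the function and argument names are illustrative, and the log-likelihood values would come from the fitted models rather than from this code.

```python
import math


def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def aic(ll_model, n_params):
    """Akaike information criterion: 2k - 2 LL."""
    return 2 * n_params - 2 * ll_model


def bic(ll_model, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 LL."""
    return n_params * math.log(n_obs) - 2 * ll_model
```

A larger McFadden R-squared and smaller AIC and BIC values indicate a better fit, which is the comparison rule applied to the two models in this section.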

Results
Other than their sign, the coefficients in Table 2 offer limited interpretation, since their magnitudes cannot be interpreted or quantified directly. Thus, the marginal effects of the observable variables $z_n$ on the choice of alternative $i$ are required. Marginal effects are defined in (6) as the amount of change in the dependent variable due to a one-unit change of an explanatory variable in the model system. Table 3 reports the marginal effects for each variable according to the MNP and MNL models. Positive values represent an increase in the probability of selecting an alternative, expressed as a percentage, while negative values indicate a decrease. It should be noted that the qualitative results are consistent between the MNP and MNL models.
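As a rough illustration of what a marginal effect measures in the MNL case, the sketch below computes the change in each mode's choice probability when one explanatory variable is increased by one unit and all others are held fixed. This discrete-change formulation is an assumption made for the example; the exact definition and averaging scheme used by the statistical package may differ, and all names and coefficient values are hypothetical.

```python
import math


def mnl_probabilities(z, gammas):
    """MNL choice probabilities; gammas maps each mode to its coefficient list."""
    utilities = {m: sum(g * x for g, x in zip(c, z)) for m, c in gammas.items()}
    shift = max(utilities.values())  # numerical stabilisation only
    weights = {m: math.exp(v - shift) for m, v in utilities.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def unit_change_effect(z, gammas, k):
    """Change in each mode's probability when variable k increases by one unit."""
    z_up = list(z)
    z_up[k] += 1
    before = mnl_probabilities(z, gammas)
    after = mnl_probabilities(z_up, gammas)
    return {mode: after[mode] - before[mode] for mode in gammas}
```

Because the probabilities always sum to one, the effects across modes sum to zero: a gain in one mode's share is offset by losses in the others, which matches the way the shares move in opposite directions in Table 3.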

Discussion
In the previous section, we have presented statistical models that help in understanding the relationship between the weather and context information with transport mode choices. In this section, the results in Section 4.2 are discussed.

Air Quality
When the air quality changes from the very unhealthy category to hazardous, Table 3 shows that this change would increase the bike share by 31% and reduce the car and bus shares by 12% and 10%, respectively. The likely explanation is that hazardous air quality causes serious health concerns, such as loss of lung capacity and decreased lung function, according to epidemiological studies. Therefore, the government in Beijing introduces vehicle controls to reduce CO₂ emissions and to protect the population from long-term exposure to polluted air. Consequently, the population is encouraged to use environmentally friendly transport modes, such as the bike, to get to their preferred destinations.

Temperature
Among all weather conditions, temperature has the most prominent effect on mobility choices, with strong significant effects on almost all travel modes. According to Table 3, when the temperature increases, people walk more, which decreases the relative likelihood of choosing other transport modes. For instance, with a unit increase in temperature in the 0-15 °C and 15-25 °C ranges, the probability of walking is expected to increase by 23% and 32%, respectively, while the probability of choosing the bus is expected to decrease by 21% and 17%, respectively. However, a unit increase in temperature above 25 °C would reduce the bike share by 21% while the walking share would increase by 27% according to the MNL model, which is in line with the survey results in [44].

Wind speed
Table 3 shows that a unit increase in the wind speed between 3 and 7.8 m/s would result in an increase of 1.8% in the train modal share according to the MNL model. This result shows that wind speed, just like relative humidity and precipitation, has a very limited effect on transport mode choices in Beijing. One possible reason is that during the period in which this study was conducted, the average wind speed was 3 m/s, which is classified as very light according to the Beaufort scale.

Trip distance
Trip distance significantly influences all transport choices, as shown in Table 3. For instance, a unit increase in distances of 4-8 km is expected to decrease walking and biking by 51% and 30%, respectively, while increasing the bus, car, and train shares by 6%, 12.5%, and 2.7%, respectively. A unit increase in distances of 8-12 km is expected to lead to a further decrease in walking by 62%, while increasing the car, bus, and train shares by 12.3%, 39.8%, and 12.3%, respectively. For a unit increase in distances of 12-20 km, walking and biking are expected to decrease by 66% and 16.2%, respectively, while the car, bus, and train shares are expected to increase by 22.5%, 24.6%, and 35.3%, respectively.
Beyond 20 km, the trend is similar, except that a further increase in the car, bus, and train shares by 67.2%, 16.2%, and 6.6%, respectively, and a further reduction in the walking and bike shares by 72.5% and 17.6%, respectively, are expected. These results are in line with the findings of the survey studies in [16] and [45], whose authors find that shorter distances are likely to be covered by walking and biking, and longer distances by car, bus, and train.

Olympics
Generally, it is highly likely that different transport modes will be disrupted by the large number of people that an event like the summer Olympics attracts. This study helps to analyse how the Olympics influenced the transport mode share. During the Olympics, we see a decrease of 17.5% in the car share and an increase of 13% and 10% in the walk and bus shares, respectively.
The decrease in the car share can be attributed to the private car restrictions imposed by the government, and the increase in the bus share is explained by the additional bus lines put in place as an alternative to accommodate the number of passengers. Our findings are consistent with the findings of the authors in [33] who mention that during the preparation for the 2008 summer Olympics, a series of traffic restrictions on private vehicles were introduced by the Beijing government.

Rush hour
During rush hours, we see a reduction in the likelihood of traveling by car and train by 7.1% and 4.4%, respectively, and an increase in biking by 5.8%. A general explanation for this travel behaviour may be the desire to avoid congestion, i.e., traffic jams on the roads and crowding in trains.

Day/Night
During the day, Table 3 shows that the probability of biking would decrease by 6.4% and the use of the bus would increase by 7.3%. A possible reason is that more buses operate during the day than at night. At night, more people bike because transport operators reduce the number of buses.
However, our result is contrary to the findings obtained using survey data in the Scandinavian cities of Stockholm and Oslo [45], where there are fewer trips by active transport modes such as biking at night than during the day.

Conclusions
Our main objective is to provide a better understanding of the possibility of using GPS data for studying the impact of weather and context data on transport mode choices. In the methodology, we linked GPS trajectories with weather and context information and then analysed their relationship by descriptive means and statistical models.
The first conclusion is that trip distances and the transport mode choice were very much interrelated. In fact, this study highlights that the trip distance is the most significant factor in choosing a transport mode. Of the five weather conditions, temperature had the most prominent effect on the transport mode choice.
The second conclusion is that we can observe the effect of governmental regulations on the choice of transport mode once the air quality becomes hazardous. We observe that traffic restrictions imposed by the government encourage the population to choose environmentally friendly transport modes such as the bike instead of the car and bus, which are greenhouse gas emitters. Moreover, our study concludes that hosting an event such as the summer Olympics would require transport regulators and operators to provide additional bus lines to accommodate the number of passengers.
Future studies could add more explanatory variables, such as income, gender, occupation, age, and trip purpose, to improve the performance of the models. In addition, it would be interesting to use advanced models such as structural equation modelling to reveal potential unobservable heterogeneity in the effects of weather and context data on transport mode choices.