1. Introduction
Due to its convenient online booking services and door-to-door pickup and drop-off services, the emerging travel mode of online car-hailing has become an important choice for people to travel in the city. China’s Ministry of Transportation reported that China’s online taxi regulatory information interaction platform received a total of 660 million orders in November 2020, and most of these car-hailing orders involved intracity travel.
In recent years, China has elevated its urban agglomeration development strategy to a new level. The Yangtze River Delta, the Pearl River Delta, and the Beijing-Tianjin-Hebei urban agglomeration have become the engine of China’s rapid economic growth. With improvements in road networks and enhanced transportation convenience, intercity travel behaviors such as commuting to workplaces across cities and going to and from international airports have become increasingly common. In turn, this trend has stimulated growth in intercity car-hailing demand. Thus, research on intercity car-hailing travel holds great significance.
However, few studies have investigated intercity car-hailing travel. Most likely, the reason is the limited availability of intercity travel data. Similar to intracity travel analysis, it is critical to analyze the factors that influence online car-hailing trips and to study the relationship between the urban built environment and intercity car-hailing travel. The analysis in this paper can provide practical help to improve intercity traffic management and to develop better transportation policies.
Decades’ worth of extensive studies have investigated the factors affecting travel behavior. Early researchers relied on questionnaire surveys. Cervero et al. [1] analyzed three dimensions of the impact of the built environment on travel behavior: density, diversity, and design. Later, in 2001, Ewing et al. [2] added four more factors: destination accessibility, distance to transit, demand management, and demographics. Questionnaire surveys were soon replaced by information technology, which offers better efficiency and accuracy. Schaller et al. [3] developed multiple regression models to estimate taxi requirements using Global Positioning System (GPS) data. Subsequently, Pan et al. [4], Tang et al. [5], Yao et al. [6], Liu et al. [7], and Yang et al. [8] conducted research based on GPS data to analyze the spatial and temporal factors of taxi travel. In addition, numerous studies [9,10,11] have investigated the impact of primary land use factors on online car-hailing ridership with Didi or Uber. In this article, we use online car-hailing GPS data to mine intercity ridership.
Recently, scholars have conducted many studies on the factors that influence urban traffic trips based on online car-hailing GPS data. Cervero et al. [1] found that vehicle miles traveled (VMT) is most strongly related to destination accessibility. Wang et al. [12] concluded that bus trips are most affected by the distance to transit and the street network design. Li et al. [13] found that entertainment districts and residential districts are the factors with the greatest influence on night-time online car-hailing travel, and the land use mix was also found to have a positive effect. However, all the studies above mainly focus on different areas within a city, and very few studies have analyzed the influencing factors of intercity car-hailing travel. By analyzing intercity travel data, we found that the peak hours for intercity car-hailing trips are between 9:00 and 10:00 and between 16:00 and 18:00, which differ significantly from those for intracity trips. Moreover, intercity online car-hailing trips are largely influenced by weather, which is very different from the characteristics of intracity trips.
To better understand the influencing factors of intercity online car-hailing travel, it is essential to determine whether the factors that influence intracity travel are still applicable to intercity travel, as well as the important factors that influence intercity online car-hailing travel. Based on intercity car-hailing data, we classified influencing factors into four types, i.e., spatial factors, temporal factors, passenger-related factors, and weather factors, and we analyzed the importance of these factors for intercity car-hailing travel.
On the other hand, some early studies have shown that the urban built environment plays an important role in urban travel [2,14,15], together with population density and land use [16,17]. Pan et al. [4] studied the influences of 8 types of land use patterns: train/coach stations, hospitals, commercial districts, office buildings, campuses, scenic spots, entertainment districts, and residential districts. Among them, the number of residential districts was nearly five times that of other categories. Sun et al. [9], Li et al. [14], and Zhang et al. [10] investigated ten types of urban built environments but did not describe the classification standard of destinations. To the best of our knowledge, most studies have not provided clear explanations of the classification of the urban built environment.
To address the issue above, we performed a comprehensive review of papers and classified urban built environments. A recent study [18] pointed out that researchers have replaced land use factors with points of interest (POIs). Inspired by Jiang et al. [19], we categorized POIs into 13 types based on Amap and previous papers [4,5,6,7,9,10], including catering POIs, healthcare POIs, recreational and entertainment POIs, tourist destination POIs, residential POIs, accommodation POIs, finance and insurance POIs, cooperative and business POIs, government POIs, educational and cultural POIs, and transportation POIs.
Some researchers have also used regression models to explore the influence of the built environment on ridership, such as ordinary least squares (OLS) models [8,20,21] and two-stage least squares (2SLS) regression [3]. Traditional methods of passenger travel demand analysis mainly include global regression methods [22,23], which assume that all variables are stationary and independent across the study area and therefore ignore spatial heterogeneity. Geographically weighted regression (GWR) models overcome this shortcoming by allowing the regression coefficients of the independent variables to vary over space. A GWR model effectively reveals the spatial variation in the influence coefficient across a study area [4,5], and GWR models have been widely used for transportation planning [6,7,24,25]. Therefore, in this paper, we choose a GWR model to analyze the relationship between the urban built environment and intercity online car-hailing travel and to explore how the built environment influences intercity online car-hailing travel.
In summary, we focus on analyzing the factors that influence online car-hailing trips and studying the relationship between the urban built environment and intercity car-hailing travel based on intercity car-hailing data. First, we investigate whether the factors that influence intracity travel are still applicable to intercity travel and identify the important factors that influence intercity online car-hailing travel. Second, we classify the urban built environment based on Amap and previous papers. Third, we adopt a GWR model to analyze the relationship between the urban built environment and intercity online car-hailing travel and to predict intercity travel demand. The purpose of this paper is to study the characteristics of intercity car-hailing travel in depth and to predict intercity travel demand more accurately. We hope that the analysis of influencing factors and the prediction of travel demand in this article can help city managers better plan intercity transportation facilities and improve travel policies. At the same time, our research can help online car-hailing companies deploy vehicles for intercity travel more accurately in time and space, so as to make full use of public resources.
The rest of this paper is organized as follows. Section 2 describes the study data, and Section 3 provides a detailed analysis of the factors that influence intercity car-hailing travel. Section 4 uses a GWR model to forecast the demand for intercity online car-hailing travel. Section 5 summarizes the content of this research and suggests future research.
4. Demand Forecasting of Intercity Car-Hailing Travel
4.1. Methodology
Geographical data are widely utilized in geography, economics, environmental science, and many other fields. The first law of geography emphasizes that spatial correlations are stronger between closer objects. Initially, the OLS method was used to deal with problems of multivariate linear regression. GWR was then proposed by Brunsdon et al. in 1996 [26] to explore the spatial nonstationarity of spatial data, serving as a remarkable solution to spatial correlation. The fundamental formula is as follows [27]:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{m} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
Each $i$ ($i = 1, 2, \ldots, n$) represents a spatial grid; $y_i$ is the value of the dependent variable in grid $i$; $x_{ik}$ represents the value of the $k$th ($k = 1, 2, \ldots, m$) variable of grid $i$; $\varepsilon_i$ is the residual between the actual value and the fitted value; $(u_i, v_i)$ represents the coordinates of the center point of the grid; $\beta_0(u_i, v_i)$ is the intercept; and $\beta_k(u_i, v_i)$ is the regression coefficient of the $k$th variable, expressed as a function of geographic location.
In this paper, 13 features are selected as the independent variables: catering POIs, healthcare POIs, recreational and entertainment POIs, tourist destination POIs, residential POIs, accommodation POIs, finance and insurance POIs, cooperative and business POIs, government POIs, educational and cultural POIs, transportation POIs, GDP, and population density, where m equals 13. Moreover, n equals 978, referring to the 978 grid units contained in the study area.
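Repeating the formula for all 978 grids, it can be written in matrix form:
$$Y = (\beta \otimes X^{\mathsf{T}})\,\mathbf{1} + \varepsilon$$
where $Y$ equals $(y_1, y_2, \ldots, y_n)^{\mathsf{T}}$, meaning the number of passengers in the 978 grids; $\beta$ is a matrix with a size of $n \times (m+1)$, representing the coefficients for the 13 independent variables in the 978 grids and the intercept; $\otimes$ denotes the element-wise product; and $\mathbf{1}$ is an $(m+1) \times 1$ vector of ones. $X$ is the feature matrix with $m+1$ rows and $n$ columns, consisting of the $x_{ik}$ mentioned above and an extra intercept row of ones.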
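Given coordinates $(u_i, v_i)$, the regression coefficient is estimated by the locally weighted least squares method:
$$\hat{\beta}(u_i, v_i) = \arg\min_{\beta} \sum_{j=1}^{n} w_{ij} \Big( y_j - \beta_0 - \sum_{k=1}^{m} \beta_k x_{jk} \Big)^{2}$$
The minimizer can be computed by the following matrix operation, with $X$ of size $(m+1) \times n$:
$$\hat{\beta}(u_i, v_i) = \big( X\, W(u_i, v_i)\, X^{\mathsf{T}} \big)^{-1} X\, W(u_i, v_i)\, Y$$
where $W(u_i, v_i)$ is a diagonal matrix with diagonal elements $w_{ij}$ ($j = 1, 2, \ldots, n$), usually calculated by a Gaussian function:
$$w_{ij} = \exp\!\left( -\left( \frac{d_{ij}}{b} \right)^{2} \right)$$
In the formula above, $d_{ij}$ represents the Euclidean or Manhattan distance between grids $i$ and $j$; $b$ is the bandwidth, the most important parameter for GWR, representing the attenuation parameter of $w_{ij}$.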
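The locally weighted estimation described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the implementation used in this paper: the function names, the synthetic coordinates, and the bandwidth value are all hypothetical, and the kernel $\exp(-(d_{ij}/b)^2)$ is one common Gaussian variant assumed here. The design matrix is stored as $n \times (m+1)$, that is, the transpose of the $X$ used in the text.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights w_ij of every grid j relative to grid i."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distances d_ij
    return np.exp(-(d / bandwidth) ** 2)

def gwr_fit(coords, features, y, bandwidth):
    """Return an n x (m+1) array of local coefficients, intercept first."""
    n = len(y)
    design = np.column_stack([np.ones(n), features])  # n x (m+1)
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        w = gaussian_weights(coords, i, bandwidth)
        xtw = design.T * w  # same as design.T @ diag(w)
        # Solve the weighted normal equations for grid i.
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic data (hypothetical): the slope on the first feature drifts eastward.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(200, 2))
features = rng.normal(size=(200, 2))
slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + slope * features[:, 0] - 0.5 * features[:, 1]

betas = gwr_fit(coords, features, y, bandwidth=2.0)
print("local slope range:", betas[:, 1].min(), betas[:, 1].max())
```

Each row of `betas` holds the intercept and slopes for one grid, so mapping a column of `betas` over the grid centers gives the kind of coefficient surface discussed in Section 4.2.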
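In this paper, the bandwidth is chosen based on the corrected Akaike information criterion (AICc) [28], which is also used to compare the performance of the OLS and GWR models. Note the following:
$$\mathrm{AICc} = 2n \ln(\hat{\sigma}) + n \ln(2\pi) + n\, \frac{n + \operatorname{tr}(S)}{n - 2 - \operatorname{tr}(S)}$$
where $\hat{\sigma}$ is the estimated standard deviation of the residuals and $\operatorname{tr}(S)$ is the trace of the hat matrix $S$, which depends on the bandwidth $b$. The best bandwidth is chosen based on the following:
$$b^{*} = \arg\min_{b}\, \mathrm{AICc}(b)$$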
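Bandwidth selection by AICc can be illustrated with a simple search over candidate values. The sketch below is an assumption-laden example rather than the procedure used in this paper: it uses the AICc form common in the GWR literature with $\hat{\sigma}^2 = \mathrm{RSS}/n$, a Gaussian kernel $\exp(-(d_{ij}/b)^2)$, synthetic data, and a hypothetical candidate list. The function names are illustrative only.

```python
import numpy as np

def gwr_aicc(coords, design, y, bandwidth):
    """AICc of a Gaussian-kernel GWR fit for one bandwidth."""
    n = len(y)
    fitted = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-(d / bandwidth) ** 2)
        xtw = design.T * w
        # Row i of the hat matrix S: x_i^T (X^T W_i X)^-1 X^T W_i
        hat_row = design[i] @ np.linalg.solve(xtw @ design, xtw)
        fitted[i] = hat_row @ y
        trace_s += hat_row[i]
    sigma = np.sqrt(np.sum((y - fitted) ** 2) / n)
    return (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

def select_bandwidth(coords, design, y, candidates):
    """Return the candidate with the smallest AICc and all scores."""
    scores = [gwr_aicc(coords, design, y, b) for b in candidates]
    return candidates[int(np.argmin(scores))], scores

# Synthetic data (hypothetical) with a spatially drifting slope plus noise.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))
design = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * design[:, 1] + rng.normal(scale=0.5, size=n)

candidates = [2.0, 3.0, 5.0, 10.0]
best, scores = select_bandwidth(coords, design, y, candidates)
print("selected bandwidth:", best)
```

A grid search keeps the example short; a golden-section search over a continuous bandwidth range is a common alternative. As the bandwidth grows, all weights approach 1 and the score approaches the AICc of a global OLS fit, which is why the same criterion can compare the two models.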
4.2. Model Results and Discussion
A total of 224,822 orders were divided into 978 grid units with a size of 500 m × 660 m. The number of buildings of each POI type in each grid was obtained from the Amap developer application programming interface (API). The GDP and population density [29] of each county were retrieved from the Ningxia Statistical Yearbook 2020 and then assigned to each grid. Thirteen independent variables were selected: catering POIs, healthcare POIs, recreational and entertainment POIs, tourist destination POIs, residential POIs, accommodation POIs, finance and insurance POIs, cooperative and business POIs, government POIs, educational and cultural POIs, transportation POIs, GDP, and population density. Shopping POIs and domestic POIs were excluded on statistical grounds. Passenger flows served as the dependent variable.
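As an illustration of the gridding step, the following sketch assigns order locations to 500 m × 660 m cells and counts the orders per cell. It assumes the coordinates have already been projected to meters, and it assumes that 500 m is the east-west extent and 660 m the north-south extent; neither detail is stated in the text, and the function name and sample points are hypothetical.

```python
import numpy as np

CELL_WIDTH_M = 500.0   # assumed east-west cell size
CELL_HEIGHT_M = 660.0  # assumed north-south cell size

def grid_counts(x_m, y_m, x_min, y_min):
    """Count points per grid cell; keys are (row, col) measured from (x_min, y_min)."""
    col = np.floor((np.asarray(x_m) - x_min) / CELL_WIDTH_M).astype(int)
    row = np.floor((np.asarray(y_m) - y_min) / CELL_HEIGHT_M).astype(int)
    cells, counts = np.unique(np.column_stack([row, col]), axis=0, return_counts=True)
    return {(int(r), int(c)): int(k) for (r, c), k in zip(cells, counts)}

# Four hypothetical order locations in projected meters.
x = [10.0, 490.0, 510.0, 1200.0]
y = [5.0, 650.0, 700.0, 100.0]
counts = grid_counts(x, y, x_min=0.0, y_min=0.0)
print(counts)
```

The same row and column indices can be used to attach POI counts, GDP, and population density to each cell before fitting the regression models.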
Intercity travel follows more specific routes than ordinary car-hailing travel, which implies strong aggregation at certain addresses. Our data contain order records from two fleets and three routes. The Dawukou Fleet operates two routes, one from the Dawukou district to Yinchuan Hedong Airport and the other from the Dawukou district to Yinchuan. The Huinong Fleet operates only one route, from the Huinong district to Yinchuan Hedong Airport. Aggregation is most evident on a single route to the airport, as only 181 grids are occupied by the Huinong Fleet, while the destinations of the route to the urban district are distributed in more grids (978). In our study, all routes were included as real travel scenarios.
4.2.1. Model Results
Table 2 shows the results of the OLS model, which was first used to identify the general influence of each independent variable on ridership. The adjusted R² is only 0.61546, indicating that the OLS model fits the data poorly. The AICc of the OLS model is 152.651687, which is consistent with this poor performance. The variance inflation factor (VIF) of every variable is below 7.5, indicating that there is no severe multicollinearity. Significant results are observed for five variables: residential POIs, finance and insurance POIs, government POIs, transportation POIs, and GDP. GDP and population density cannot be directly compared with POIs given the differences in their units. Attention should be paid to the relative size of the GDP coefficient. Given the characteristics discussed above, the coefficients suggest that areas with a larger number of people receive more orders and that most orders come from travelers traveling from economically undeveloped areas to developed areas, which explains the negative coefficient of GDP. Unlike in previous studies [8,9,30], the variables here were analyzed only from a global perspective, so it was impossible to explore how the built environment influences intercity ridership in each specific grid. Therefore, a GWR model was applied for further analysis.
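The multicollinearity check mentioned above can be reproduced with a short VIF computation. The sketch below relies on the identity that the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix; the function name and the random feature matrix are hypothetical, and the 7.5 threshold is the one quoted in the text.

```python
import numpy as np

VIF_THRESHOLD = 7.5  # threshold quoted in the text

def variance_inflation_factors(features):
    """VIF of each column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical feature matrix: 500 grids, 3 roughly independent variables.
rng = np.random.default_rng(2)
features = rng.normal(size=(500, 3))
vif = variance_inflation_factors(features)
print("all below threshold:", bool(np.all(vif < VIF_THRESHOLD)))
```

A VIF near 1 means a variable is almost uncorrelated with the others, while values above the threshold suggest that the variable is largely explained by the remaining ones and may need to be dropped.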
The GWR result is a coefficient matrix with a 978-grid sample and 13 independent variables. Table 3 presents the minimum, maximum, standard deviation, and average values for every independent variable. The standard deviation of every variable except healthcare POIs remains at a high level, emphasizing strong regional differences. Tourist destination POIs, residential POIs, finance and insurance POIs, transportation POIs, and GDP show larger geographical variation. Every variable has a corresponding grid where it makes the greatest contribution to the estimated value. Among the variables, residential POIs have the highest deviation, with coefficients varying from −19.3051 to 39.1386, which indicates large regional differences and requires further analysis. Recreational and entertainment POIs are also identified as influential by the GWR model, supplementing the OLS results. Healthcare POIs show higher spatial stability than other variables.
As summarized in Table 4, the adjusted R² of the GWR model is about 0.2 higher than that of the OLS model, reaching 0.81324, which indicates that about 81% of the variation in ridership is explained. This comparison makes it clear that the GWR model is more suitable for analyzing geographical data. The AICc value decreases from 152.651687 to 112.895317, showing that the GWR model is a better fit. However, the AICc value of about 112 is still higher than that reported for general car-hailing data [14], suggesting that there are obvious spatial aggregation characteristics in intercity travel.
4.2.2. Intercity Car-Hailing Travel Demand Forecasting Results
The ridership for each grid is described in this section. Grids are colored according to the size of their values, which gives a clear overview of the spatial distribution of ridership and other variables. As in many relevant studies, we overlay the prediction maps of the different models, and the effects of residential POIs and transportation POIs are then analyzed from a temporal and spatial view. For prediction, the mean squared error (MSE) of GWR is 55.863785, which is 33.372751 lower than that of OLS.
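For reference, the OLS error implied by these two figures, and the relative reduction achieved by GWR, follow from simple arithmetic (derived here from the reported numbers rather than quoted from a table):

```latex
\mathrm{MSE}_{\mathrm{OLS}} = 55.863785 + 33.372751 = 89.236536,
\qquad
\frac{\mathrm{MSE}_{\mathrm{OLS}} - \mathrm{MSE}_{\mathrm{GWR}}}{\mathrm{MSE}_{\mathrm{OLS}}}
= \frac{33.372751}{89.236536} \approx 0.374
```

In other words, GWR reduces the prediction MSE by roughly 37% relative to OLS.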
In Figure 11, grids are colored from blue to red based on the ridership count. The ridership counts are then compared with the estimated OLS and GWR values for the Yinchuan district simply and intuitively. Better performance is observed with the GWR model than with the OLS model. Figure 11a shows that orders are mostly concentrated in the blue regions of Yinchuan, whose orders are far higher than those of the surrounding areas. The OLS and GWR estimations retain this trend but behave conservatively in the prediction of the surrounding flow. Figure 11c appears closer to Figure 11a than Figure 11b does.
The results for Yinchuan are then isolated for a more in-depth analysis. From the above discussion, tourist destination, residential, finance and insurance, government, and transportation POIs are highlighted as having strong spatial heterogeneity. The statistical analysis in Section 3 finds that residential and transportation POIs appear to be the main source of passengers, and they are also found to have large spatial heterogeneity with a high standard deviation. Consequently, the distributions of POIs and the coefficients for residential areas and transportation are further analyzed.
Figure 12 compares the results and GWR coefficients of residential POIs. It appears more clearly that residential orders mainly come from people in the Dawukou district of Shizuishan. As the capital of the Ningxia Hui Autonomous Region, Yinchuan is the most developed city in this region, and citizens tend to undertake intercity travel to Yinchuan for different reasons and then return to the Dawukou and Huinong districts where they reside. Based on this interpretation, despite the larger population in Yinchuan, the main passengers taking intercity trips from Yinchuan are not Yinchuan locals. Therefore, places with a higher concentration of residences do not appear to be the center of orders or of the coefficient, as shown in Figure 12c,e.
Figure 13 compares the numbers and GWR coefficients of transportation POIs. Yang et al. [8] emphasized that car-hailing does not supplement public transport modes such as public buses; rather, it is a competitor. People tend to spend less on car-hailing services if public transport is more available. In Figure 13a,b, stations and road intersections are highlighted, but the demand coefficient is far larger in Shizuishan than in Yinchuan. We conclude that intercity car-hailing travel is mainly used to meet the needs of Shizuishan locals [31]. Passengers tend to undertake intercity travel from their homes to a transportation hub and back home from Yinchuan, which is in line with the observations made in Section 3. The purpose of intercity travel is very clear.
5. Conclusions
The development of urban agglomerations has led to a gradual increase in demand for intercity travel, making intercity car-hailing travel an important part of urban mobility. However, few studies have explored the influencing factors of intercity travel or forecasted intercity car-hailing travel demand. This paper used data-mining methods to explore the influence of temporal, spatial, passenger-related, and environmental factors on intercity car-hailing travel. Regarding the temporal factors, we found that the highest peak hours for intercity car-hailing trips are between 16:00 and 18:00, which differ significantly from the peak hours for intracity trips. In terms of the spatial factors, residential districts and transportation facilities are the most influential factors for intercity online car-hailing travel. Interestingly, departures from and drop-offs at residential districts increase as the weekend approaches and drop after Mondays. Concerning the passenger-related factors, we found that elderly individuals and children are more likely to be accompanied by other people when traveling between cities, and youths and children travel more often on weekends. With respect to the environmental factors, we found that intercity travel is largely affected by weather. Bad weather conditions, such as rainy days, stormy days, and snowy days, limit residents’ mobility and willingness to travel. Based on the conclusions of the influencing factor analysis, the characteristics of intercity and intracity car-hailing travel are very different. Thus, analyzing the influencing factors of intercity car-hailing travel holds great significance.
To further study the impact of the urban built environment on intercity car-hailing travel, this paper uses a GWR model to predict the demand for car-hailing travel in each grid. The advantage of the GWR model is that it takes into account spatial heterogeneity, and the experimental results show that the GWR model achieves better prediction accuracy than the OLS model. Additionally, the GWR results indicate that residential facilities and transportation facilities are important influencing factors for intercity car-hailing travel, which is consistent with the conclusions drawn from our influencing factor analysis.
Our research could help city managers better plan intercity transportation facilities and improve travel policies, and it can also help online car-hailing companies deploy vehicles for intercity travel more accurately. However, there is still room for improvement. In future research, we will consider more intercity factors, such as city distance, to better study the influencing factors of intercity car-hailing travel. In addition, more complex models, such as geographically and temporally weighted regression (GTWR) models, can be introduced to consider both the temporal and spatial heterogeneity in the demand for intercity online car-hailing travel. Furthermore, we will verify whether our findings are still valid in other urban agglomerations.