Exploring Spatiotemporal Variation in Hourly Metro Ridership at Station Level: The Influence of Built Environment and Topological Structure

Reliable and accurate estimates of metro demand can provide metro authorities with insightful information for the planning of route alignment and station locations. Many existing studies examine metro demand through daily or annual ridership profiles, but only a few address the variation in hourly ridership. In this paper, a geographically and temporally weighted regression (GTWR) model was used to examine the spatial and temporal variation in the relationship between hourly ridership and factors related to the built environment and topological structure. Taking Nanjing, China as a case study, an empirical study was conducted with three weeks of automatic fare collection (AFC) data. An analysis of variance (ANOVA) showed that the GTWR model produced the best fit for hourly ridership data compared with traditional regression models. Four built-environment factors, namely residence, commerce, scenery, and parking, and two topological-structure factors, namely degree centrality and closeness centrality, were found to be significantly related to station-level ridership. The spatial distribution pattern and temporal nonstationarity of these six variables were further analyzed. The results of this study confirm that the GTWR model can provide more realistic and useful information by capturing spatiotemporal heterogeneity.


Introduction
Due to its competitive advantages in capacity, clean energy, and land conservation, the metro has been widely perceived as a preferred public transport mode for major metropolises [1,2]. With traffic congestion and overpopulation aggravated by rapid urbanization and motorization, a growing number of Chinese cities have taken steps to build well-connected metro networks. For travelers making medium- or long-distance trips, metro service is more attractive than other public transportation modes because of its high speed, reliable schedule, and smooth, comfortable ride. As a consequence, it can promote a significant modal shift that reduces heavy car use [3,4]. Although the construction of metro lines has turned out to be an effective measure to alleviate transportation pressure and improve city image [5], some problems have occurred during the development and expansion of metro networks, such as uneven distribution of passengers [6], overcrowded carriages [7], and increased security risk [8]. A particular source of these problems is the unreasonable planning of route alignment and station location.
Ridership demand forecasting is a vital component of the analysis of project viability and sustainability in metro planning [9]. A traditional method, namely the four-step model, has been widely used for travel demand forecasting and transportation infrastructure planning because of its universal applicability [10]. Since the four-step model requires a large number of activity surveys and complex modeling processes [11,12], many recent studies adopted an alternative approach by investigating the regression relationship between ridership demand and the characteristics of metro stations [9,12–18]. Most of those studies focus on the ridership demand of metro stations in daily or annual profiles, but only a few address the variation in hourly ridership demand. Hourly ridership demand, especially demand during peak periods, which often results in heavy crowding in the metro systems of many Chinese cities, can provide considerable new insights for metro planning. This paper attempts to address this gap with a case study from Nanjing, China.
In a recent study, Zhao et al. built an ordinary least squares (OLS) multiple regression model to investigate the influence of land utilization, external connectivity, intermodal connection, and station type on average weekday ridership at station level with data from Nanjing metro [15]. The results were compared with the outcomes of regression models developed for cities in the United States [13] and Seoul [14], and it was found that both the $R^2$ value and the significance of variables vary greatly across cities. Another experiment on the Madrid Metro network demonstrated that the correlation between station-level transit ridership and station characteristics differs significantly across space. In that study, geographically weighted regression (GWR), which has advantages in measuring spatial instability, proved superior to the traditional OLS model [9]. A similar spatial autocorrelation across metro stations was found in commercial property values in Wuhan [19] and metro-bikeshare transfers in Nanjing [20]. In addition, short-term metro ridership has been shown to be temporally autocorrelated in many studies [17,21], which means that hourly ridership in contiguous periods could present analogous correlations with station characteristics.
The objective of this study is to investigate the spatial and temporal variations in hourly metro ridership demand at station level. The rest of this paper is organized as follows. The next section reviews the existing literature on the factors influencing metro ridership and on direct forecasting models. Section 3 describes the methodology used in this paper. Section 4 introduces the study area and the data set. Modelling results and discussion are presented in Section 5, and our conclusions are presented in Section 6.

Characteristics Affecting Ridership
For ridership forecasting, it is essential to explore which factors affect metro ridership at station level and how. In past decades, studies have found that station-level ridership demand in metro systems is significantly associated with the characteristics of the station environment [9,13,14,16,22–25]. Characteristics affecting station-level ridership mainly include built environment variables, socio-economic variables, and connectivity with other metro stations and other traffic modes. Built environment is usually measured by building use and gross building amount or density. Typical uses can be categorized into residence, service facilities, companies/offices, attractions, education, hotels, bus stops, roads, and parking lots [26]. The built environment is thought of as the crucial driver of transit ridership [12]. Heavy land use means that more people live and/or work in the walking area around metro stations [27]. In general, metro stations with a higher level of land use in their neighborhood, such as retail, hotels, education, and bus routes, should be more likely to attract heavy passenger flow; nevertheless, this does not always hold. For instance, station-level ridership decreases with increasing storage area according to Zhang and Wang [25], and large-scale commercial building density has a negative correlation with station-level ridership in the Seoul metro when 1 km is taken as the service boundary [24].
Many studies have found that population and employment within the walking area are important factors and have positive correlations with metro ridership [9,12,13,15]. According to the results of Kuby et al., lower income might lead to increasing metro use [13]. For the Rio de Janeiro metro, ridership presents a stronger correlation with the number of jobs than with average income and population [16]. Apart from population, employment, and income, some other socioeconomic characteristics, such as age group, gender, ethnicity, and unemployment rate, are also considered to be correlated with transit ridership [28–30].
The influence of connectivity with other metro stations is usually quantified with the topological features of the network structure. For example, the distance to the nearest station was found to be negatively associated with station-level ridership [16,31]. A more comprehensive study conducted by Sohn and Shim [14] introduced several external connectivity features: closeness centrality, betweenness centrality, straightness centrality, and average number of transfers, each calculated for both the metro and highway networks. Taking the Seoul metro as a case study, they found that only closeness centrality and average number of transfers have significant impacts on metro ridership at station level, and both associations are negative. The impact of connectivity with other traffic modes can be captured by some indicators of the built environment, such as feeder bus lines/stops, road length, and number of parking lots [13–15].
Due to the socioeconomic, cultural, and political specificities of Nanjing (e.g., a large floating population), accurate socio-economic variables within the walking area of metro stations are difficult to obtain. On the other hand, although population and employment are not synonymous with the density or type of built environment, built environment variables can be used as indicators of socioeconomic status. For example, high residential building density can be taken to indicate more residents around stations. Considering the accessibility of data, this paper mainly explores the correlation between station-level ridership and the variables of built environment and topological structure.

Direct Ridership Models
The history of demand modeling has been dominated by the four-step model [10]. The four-step model formulates the process as a sequence of interrelated models (trip generation, distribution, mode choice, and route choice). It has gone through two stages: activity-based and trip-based. Both versions require an enormous database and are designed for regional scales. Thus, the four-step model is less effective for forecasting traffic demand at station level [9]. Regression-based direct ridership models, as a complement to the four-step model, are able to formulate the relationship between ridership and station-level characteristics [32]. This method is calibrated with data that are easily accessible, such as smart card data and land use density, which makes direct ridership models relatively concise and inexpensive compared with the four-step model.
Linear regression is a statistical analysis method used to determine the linear quantitative relationship between two or more variables. It can be categorized into two general types: global and local regression models [9]. OLS regression is the most representative global model for revealing the influence of various factors on metro ridership [13,22,31]. It is based on the hypothesis that the prediction errors for all samples are independent. However, according to Tobler's First Law of Geography [33], spatial data normally present similar patterns over short distances. Metro ridership has been shown to be spatially autocorrelated according to the results of Cardozo et al. [9] and Jun et al. [18], who introduced the GWR model to station-level ridership forecasting.
Instead of the constant parameters of the global regression model, the parameters of GWR change with the samples' position to capture spatial variations [34]. Extending the OLS model, the GWR model can be expressed as:

$$y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{1}$$

where the spatial coordinates of metro station $i$ are denoted as $(u_i, v_i)$; $y_i$, $x_{ik}$, and $\varepsilon_i$ are the ridership, the $k$th explanatory variable, and the error term for metro station $i$, respectively; $\beta_0(u_i, v_i)$ and $\beta_k(u_i, v_i)$ represent the regression parameters at metro station $i$, which are allowed to vary across space.
As summarized by Cardozo et al. [9], the GWR model has some important advantages, mainly greater detail and accuracy, stronger explanatory power, smaller estimation errors, place-based decision support, and the ability to measure the degree of spatial similarity. Although the GWR model is clearly superior in determining the spatial dependencies of station-level ridership, it is not adequate for modeling spatiotemporal data because it needs to aggregate or average the time-scale data over a certain period [26], such as the average weekday boardings investigated in most previous studies on station-level ridership. An approach proposed by Huang et al. [35], geographically and temporally weighted regression (GTWR), can offer a better fit than the traditional OLS or GWR models by considering both spatial and temporal heterogeneity. It has been applied to topics such as housing prices [35] and the environment [36,37]. In a notable study, Ma et al. explored the influence of built environment on transit ridership using a GTWR model [26]. However, their work focused on bus ridership in the region, in contrast to our study of metro stations.

Methodology
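The GTWR model, which is an extended version of the GWR model, can take both spatial and temporal nonstationarity in real data into consideration. Thus, the GTWR model can be presented as:

$$y_i = \beta_0(u_i, v_i, t_i) + \sum_{k} \beta_k(u_i, v_i, t_i)\, x_{ik} + \varepsilon_i \tag{2}$$

where $t_i$ is the index of the observed time interval of metro ridership.
The estimates of the regression parameters can be written as follows:

$$\hat{\beta}(u_i, v_i, t_i) = \left[X^{T} W(u_i, v_i, t_i)\, X\right]^{-1} X^{T} W(u_i, v_i, t_i)\, Y \tag{3}$$

where $W(u_i, v_i, t_i)$ is a weighting matrix whose diagonal elements denote weights based on the space-time distance to observation $i$, and whose off-diagonal elements are zero.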
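The local estimator can be illustrated with a minimal sketch. It assumes the weights for one observation point are already known and solves the weighted normal equations $(X^{T} W X)\beta = X^{T} W Y$ with a small Gaussian elimination routine. The function names `solve` and `local_wls` and the toy data are hypothetical and are not taken from the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_wls(X, y, w):
    """Weighted least squares for one regression point.

    X: rows [1, x_1, ..., x_K]; y: responses; w: diagonal of W for this point.
    """
    n, p = len(y), len(X[0])
    xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtwx, xtwy)


# Toy data (hypothetical): y = 2 + 3x exactly, so any positive weights recover (2, 3).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 5.0, 8.0, 11.0]
w = [1.0, 0.6, 0.3, 0.1]
beta = local_wls(X, y, w)
print(beta)
```

In a full GTWR fit this estimate would be repeated for every station-hour observation, each with its own weight vector.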
Considering that location and time usually have different scaling effects, we convert the temporal distance into an additional spatial-scale distance with a time-space scale factor. If the Euclidean distance and the absolute time difference are adopted as the spatial distance and the temporal distance, the spatiotemporal distance $D^{ST}_{ij}$ between observation $i$ and observation $j$ can be formulated as a linear combination of the spatial distance $D^{S}_{ij}$ and the temporal distance $D^{T}_{ij}$:

$$D^{ST}_{ij} = D^{S}_{ij} + \tau D^{T}_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2} + \tau \left| t_i - t_j \right| \tag{4}$$

where the value of the time-space scale factor $\tau$ means that the decay of weight for an increase in temporal distance of one unit (hour) equals that for an increase in spatial distance of $\tau$ units (kilometers).
The GWR model is a special case of the GTWR model with $\tau = 0$, because the temporal term then vanishes from the spatiotemporal distance. The diagonal elements of the weighting matrix $W(u_i, v_i, t_i)$ are calculated according to a weighting function formulated with the spatiotemporal distance and a kernel function. An eligible kernel function should ensure that neighboring observation points in the spatiotemporal data are allocated relatively larger weights. The most common weighting kernel is the fixed Gaussian-based function, which can be expressed as:

$$w_{ij} = \exp\left[-\frac{\left(D^{ST}_{ij}\right)^2}{h^2}\right] \tag{5}$$

where $h$ denotes a fixed spatio-temporal bandwidth parameter.
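As an illustration of how the weights for one observation could be computed, the sketch below combines a Euclidean spatial distance with a scaled time difference and applies a Gaussian kernel of the form $\exp[-(D^{ST}/h)^2]$. The kernel form, the function names, and the three toy observations are assumptions made for this example, and coordinates are taken to be already projected to kilometers.

```python
import math


def st_distance(a, b, tau):
    """Spatiotemporal distance between observations a and b.

    a, b: tuples (u_km, v_km, t_hours). Spatial part is Euclidean,
    temporal part is the absolute time difference scaled by tau (km per hour).
    """
    d_s = math.hypot(a[0] - b[0], a[1] - b[1])
    d_t = abs(a[2] - b[2])
    return d_s + tau * d_t


def gaussian_weight(d_st, h):
    """Fixed Gaussian kernel (assumed form): weight decays with (d_st / h)^2."""
    return math.exp(-(d_st / h) ** 2)


def weight_row(i, obs, tau, h):
    """Diagonal of the weighting matrix for regression point obs[i]."""
    return [gaussian_weight(st_distance(obs[i], o, tau), h) for o in obs]


# Hypothetical observations: same station at 8:00, a station 2 km away at 8:00,
# and the same station one hour later.
obs = [(0.0, 0.0, 8), (2.0, 0.0, 8), (0.0, 0.0, 9)]
w = weight_row(0, obs, tau=3.0, h=3.0)
print(w)
```

Setting `tau=0.0` removes the temporal term, so two observations at the same station in different hours receive the same weight, which is the GWR special case.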
A cross-validation (CV) minimization procedure is usually conducted to select suitable values of the time-space scale factor and the bandwidth [26,34,37]:

$$CV(\tau, h) = \sum_{i} \left[ y_i - \hat{y}_{\neq i}(\tau, h) \right]^2 \tag{6}$$

where $\hat{y}_{\neq i}(\tau, h)$ is the estimate of observation $i$ from the GTWR model established with the training data after eliminating observation $i$, given time-space scale factor $\tau$ and bandwidth $h$.
The over-fitting that would result from an extraordinarily large weight for observation $i$ itself can be effectively avoided with the CV procedure. To improve the efficiency of the parameter optimization, we use a hybrid approach that combines grid search and particle swarm optimization (PSO) to search for an approximately optimal solution of Equation (6).
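The leave-one-out criterion can be sketched as follows. To stay short, the example fits a local model with an intercept and a single explanatory variable in closed form, assumes the Gaussian kernel $\exp[-(D^{ST}/h)^2]$, and covers only the grid-search stage; the PSO refinement is omitted. All names, the toy data, and the grid values are hypothetical.

```python
import math


def weight(a, b, tau, h):
    """Gaussian weight (assumed form) from the spatiotemporal distance of a and b."""
    d = math.hypot(a[0] - b[0], a[1] - b[1]) + tau * abs(a[2] - b[2])
    return math.exp(-(d / h) ** 2)


def predict_without(i, obs, tau, h):
    """Fit y = b0 + b1 * x locally at obs[i] with obs[i] left out, then predict y_i.

    obs rows are (u_km, v_km, t_hours, x, y).
    """
    sw = swx = swy = swxx = swxy = 0.0
    for j, o in enumerate(obs):
        if j == i:
            continue  # leave observation i out of its own fit
        w = weight(obs[i], o, tau, h)
        x, y = o[3], o[4]
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        return float("inf")  # no usable neighbors for this (tau, h)
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        return swy / sw  # degenerate design: fall back to the weighted mean
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * obs[i][3]


def cv_score(obs, tau, h):
    """Sum of squared leave-one-out prediction errors."""
    return sum((o[4] - predict_without(i, obs, tau, h)) ** 2
               for i, o in enumerate(obs))


def grid_search(obs, taus, hs):
    """Return (cv, tau, h) with the smallest CV value on the grid."""
    return min((cv_score(obs, t, h), t, h) for t in taus for h in hs)


# Toy data: four stations on a line, three hours, slope that changes with the hour.
obs = [(u, 0.0, t, u + 0.5 * t, (1.0 + 0.5 * (t - 7)) * (u + 0.5 * t))
       for u in (0.0, 2.0, 4.0, 6.0) for t in (7, 8, 9)]
taus = [0.5, 1.0, 2.0, 4.0]
hs = [1.0, 2.0, 3.0, 5.0]
best_cv, best_tau, best_h = grid_search(obs, taus, hs)
print(best_cv, best_tau, best_h)
```

A coarse grid like this gives a starting region; a continuous optimizer such as PSO could then search around the best grid cell.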

Study Area and Data
As the capital of Jiangsu Province, Nanjing is the second largest city in eastern China, with the Yangzi River flowing through the city. Like many other metropolises in China, Nanjing has been experiencing rapid urbanization and motorization since the reform and opening-up in 1978. By 2016, the city contained 11 districts with an administrative area of 6600 km² (2500 mi²) and a total population of 8.27 million, 82% of whom have urban registration [38]. The steadily growing motor vehicle ownership reached 2.28 million by the end of 2017.
To address the growing problems of urban expansion and traffic congestion, Nanjing began to construct its urban rail transit (URT) system in 2000. The first line formally started to operate in September 2005, and Nanjing Metro had 7 lines and 128 stations in operation as of November 2017 (Figure 1). By 2030, the projected metro network will contain 25 lines and cover 80% of the suburban area, with 800 m as the service distance for metro stations [39]. Under such circumstances, investigating the correlation between station-level metro ridership and the influence factors of topological structure and built environment, with insight into spatiotemporal variation, will provide more useful decision support for line alignment and station location.

Metro Ridership Data
The automatic fare collection (AFC) system records the boarding station identification (ID), tap-in time, alighting station ID, and tap-out time for almost every passenger. In this study, the AFC data were collected over three weeks in October 2017 (9 to 29 October 2017), a period without public holidays. The raw transaction data were aggregated into hourly intervals from 5:00 to 24:00 (19 h in total, with non-service time omitted) for all 128 stations. The patterns of weekday travel and weekend travel are distinctly different from each other. Because weekday travel demand generally attracts more concern, station ridership in this study refers only to the number of weekday boardings. The dependent variable, average hourly ridership, was obtained by averaging the boardings in the same hour across weekdays (Monday to Friday). As shown in Figure 2, two obvious peaks can be observed for most stations: the morning peak mainly appears during 7:00-9:00 and the afternoon peak mainly appears during 17:00-19:00.
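The aggregation step can be sketched in a few lines. The example assumes each AFC record has been reduced to a boarding station ID and a tap-in timestamp; the record layout, the station IDs, and the function name are hypothetical. Weekend records are skipped, and the hourly totals are divided by the number of distinct weekdays present in the data.

```python
from collections import defaultdict
from datetime import datetime


def average_weekday_hourly_boardings(records):
    """Average boardings per weekday for each (station, hour).

    records: iterable of (station_id, tap_in) with tap_in a datetime.
    Returns a dict {(station_id, hour): mean boardings per observed weekday}.
    """
    totals = defaultdict(int)
    weekdays = set()
    for station, tap_in in records:
        if tap_in.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        weekdays.add(tap_in.date())
        totals[(station, tap_in.hour)] += 1
    n_days = len(weekdays)
    return {key: total / n_days for key, total in totals.items()}


# Hypothetical records: 9 and 10 October 2017 are weekdays, 14 October is a Saturday.
records = [
    ("S1", datetime(2017, 10, 9, 7, 15)),
    ("S1", datetime(2017, 10, 9, 7, 40)),
    ("S1", datetime(2017, 10, 10, 7, 5)),
    ("S1", datetime(2017, 10, 14, 7, 30)),
    ("S2", datetime(2017, 10, 10, 18, 20)),
]
hourly = average_weekday_hourly_boardings(records)
print(hourly)
```

With real data, restricting the hour range to service hours would be a further filtering step before averaging.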

Influence Factor Measurement
Considering the findings of existing studies and the accessibility of data, two categories of factors that may influence metro demand at station level, namely built environment and topological structure, were investigated in this paper. The measured factors were regarded as the potential independent variables in our GTWR model. A summary of the potential independent variables for station-level ridership is shown in Table 1.
The service area of a metro station is usually taken as the area within a threshold walking distance of 800 m (0.5 miles) in many previous studies [9,13,15,16]. The built-environment related factors were calculated within this area with the assistance of a geographic information system (GIS) and point of interest (POI) data collected from Baidu Map. Among the built-environment related factors, the land use surrounding the metro station was categorized into eight types: residence, office, commerce, service, education, hotel, scenery, and parking. The intensity of each land use was described by the number of corresponding POIs within the walking area of the station, while the five remaining built-environment related factors were used to indicate the convenience of intermodal connectivity with other traffic modes. Regarding the topological-structure-related factors, three common centrality metrics, namely degree centrality, closeness centrality, and betweenness centrality [40], were adopted in this paper for the undirected metro network, with the line lengths between adjacent stations as edge weights.
The degree centrality of an observation station is measured as the total number of edges adjacent to this station. Degree centrality is the simplest notion of centrality, and it can indicate the type of station (i.e., transfer station, terminal station, or intermediate station) from the perspective of network connectivity.
However, a well-connected transfer station may be located in a relatively remote region. Closeness centrality provides a complementary definition of centrality. Its aim is to describe accessibility, determined by how easily passengers can get to all other stations from an observation station. It can be calculated as the reciprocal of the average shortest-path distance from observation station $i$ to all other stations in the metro network, as given by:

$$CC_i = \frac{R - 1}{\sum_{r \neq i} d_{ir}} \tag{7}$$

where $d_{ir}$ is the shortest topological distance between stations $i$ and $r$, and $R$ is the total number of metro stations. Betweenness centrality is a different extension of centrality, which is based on how often a specific node acts as a bridge on the shortest path between any two other nodes. It can be introduced as a measure for quantifying the importance of a station to the connectivity between other stations in a metro network. The betweenness centrality of observation station $i$ can be compactly defined as:

$$BC_i = \sum_{k \neq i \neq m} \frac{\varphi_{km}(i)}{\varphi_{km}} \tag{8}$$

where $\varphi_{km}$ is the total number of shortest paths from station $k$ to station $m$, and $\varphi_{km}(i)$ is the number of these paths that pass through station $i$.
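The three centrality measures can be computed on a small weighted network as in the sketch below. It assumes a connected, undirected graph with positive edge lengths, takes closeness as $(R-1)$ divided by the sum of shortest-path distances, and counts each unordered station pair once in the betweenness sum; these conventions, the function name, and the toy networks are assumptions for illustration only.

```python
from itertools import combinations


def centralities(nodes, edges, tol=1e-9):
    """Degree, closeness, and betweenness for an undirected weighted network.

    nodes: list of station names; edges: list of (a, b, length_km).
    """
    inf = float("inf")
    adj = {n: {} for n in nodes}
    for a, b, length in edges:
        adj[a][b] = adj[b][a] = float(length)

    # All-pairs shortest distances (Floyd-Warshall).
    dist = {a: {b: 0.0 if a == b else adj[a].get(b, inf) for b in nodes}
            for a in nodes}
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if dist[a][k] + dist[k][b] < dist[a][b]:
                    dist[a][b] = dist[a][k] + dist[k][b]

    # Number of shortest paths from each source, visiting nodes by distance.
    paths = {}
    for s in nodes:
        cnt = {n: 0 for n in nodes}
        cnt[s] = 1
        for v in sorted(nodes, key=lambda n: dist[s][n]):
            if v != s:
                cnt[v] = sum(cnt[u] for u, w in adj[v].items()
                             if abs(dist[s][u] + w - dist[s][v]) < tol)
        paths[s] = cnt

    r = len(nodes)
    degree = {n: len(adj[n]) for n in nodes}
    closeness = {i: (r - 1) / sum(dist[i][j] for j in nodes if j != i)
                 for i in nodes}
    betweenness = {i: 0.0 for i in nodes}
    for k, m in combinations(nodes, 2):
        for i in nodes:
            on_path = abs(dist[k][i] + dist[i][m] - dist[k][m]) < tol
            if i not in (k, m) and on_path:
                betweenness[i] += paths[k][i] * paths[i][m] / paths[k][m]
    return degree, closeness, betweenness


# Hypothetical three-station line: A --2 km-- B --3 km-- C.
deg, clo, btw = centralities(["A", "B", "C"], [("A", "B", 2), ("B", "C", 3)])
print(deg, clo, btw)
```

On the toy line, the intermediate station B has the highest value on all three measures, while the two terminals have degree 1 and betweenness 0.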

Variables Selection for GTWR
Although the GTWR model has a remarkable ability to capture spatial and temporal heterogeneity, it is difficult to select candidate variables with it because the test results of the parameters differ across spatiotemporal observation points. Thus, a multiple OLS regression, based on the average daily ridership data of the 128 stations in Nanjing, was conducted first. A stepwise procedure was executed to identify the significant ridership-related variables, where the selection of variables is a bidirectional elimination based on the criterion that variables are included with a confidence level above 85% or excluded with a confidence level below 70%. Table 2 shows the summary of the final OLS model with the parameter estimates and parametric diagnosis.
Four built-environment related factors and two topological-structure related factors were included in the final OLS model. All of these factors show a highly significant correlation with daily ridership demand, and most of them have only slight collinearity with other variables (VIF < 4). Only the variables Commerce and Parking are multicollinear, but they are retained in the OLS model because their Variance Inflation Factor (VIF) values are markedly smaller than 10 (the threshold indicating significant collinearity) [41]. Table 2 shows that the variables Commerce, Parking, Degree Centrality, and Closeness Centrality have a positive impact on daily ridership demand, which demonstrates that an area with prosperous commerce, a friendly parking system, well-connected metro lines, and good accessibility to other metro stations will attract more passengers. However, the negative coefficients of Residence and Scenery are contrary to general expectation and existing findings [13–15].

Modeling
Before building the GTWR model, we need to test the sample data for spatial and temporal nonstationarity. If the spatial and/or temporal nonstationarity is significant, the GTWR model can provide a significantly better fit for the sample data than the OLS model. The spatial and temporal nonstationarity can be assessed with an analysis of variance (ANOVA) [35]. The results of the ANOVA based on the average hourly ridership data of Nanjing Metro are presented in Table 3. In this table, nonstationarity was diagnosed from the spatial, temporal, and combined spatial and temporal perspectives by comparing the residual mean squares (MS) of the global model (OLS) and different local models, namely GWR, temporally weighted regression (TWR), and GTWR. As shown in the last two columns, the statistics of the F-test demonstrate that there is both spatial and temporal heterogeneity in the correlation between ridership and influence factors in the Nanjing Metro. In addition, it was found that the GTWR model describes the data significantly better than models that consider nonstationarity on a single scale (GWR or TWR). According to the values of $R^2$ in Table 3, 93% of the variation in station-level ridership is explained by GTWR, compared with 24% by OLS, 52% by TWR, and 42% by GWR. Notably, the TWR model achieved a better goodness-of-fit than the GWR model in terms of $R^2$, which indicates that nonstationarity is more prominent on the temporal scale than on the spatial scale.
To select the time-space scale factor $\tau$ and the bandwidth $h$ simultaneously, we converted the spatiotemporal distance into the spatial scale (km) and combined grid search and PSO. We conducted the parameter optimization with the following steps. First, we generated two sequences for the parameters $\tau$ and $h$ and built GTWR models with each pair of values. The minimum CV appeared at the set (3, 3), as shown in Figure 3.
We then initialized a population of 50 particles in PSO, each with a random position in two dimensions. For each particle, its fitness was calculated as the CV value with the set $(\tau, h)$ equal to its two-dimensional coordinates.
Finally, we defined a search range surrounding the set (3, 3) and further minimized the CV value with iterations of PSO. The final optimal CV was attained with the parameter set ($\tau$ = 3.13, $h$ = 3.10).
As stated before, the value of $\tau$ means that the decay of weight for an increase in temporal distance of one hour equals that for an increase in spatial distance of 3.13 km. Since the average distance between adjacent metro stations is about 2 km for Nanjing metro, the weights for adjacent metro stations within the same period approach twice those for adjacent time periods at the same station. In other words, the spatial data could provide more information for ridership forecasting than the temporal data.
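The "approach twice" statement can be checked with a short calculation. The sketch assumes the fixed Gaussian kernel has the form $\exp[-(D^{ST}/h)^2]$ and that the spatiotemporal distance is the spatial distance plus $\tau$ times the time difference; under other kernel conventions the ratio would differ. The 2 km station spacing is the approximate figure quoted in the text.

```python
import math

tau, h = 3.13, 3.10        # optimal parameters reported in the text
station_gap_km = 2.0       # approximate spacing of adjacent stations

# Adjacent station, same hour: distance is purely spatial.
w_space = math.exp(-(station_gap_km / h) ** 2)
# Same station, adjacent hour: one hour counts as tau kilometers.
w_time = math.exp(-(tau * 1.0 / h) ** 2)

ratio = w_space / w_time
print(round(w_space, 3), round(w_time, 3), round(ratio, 2))
```

Under these assumptions the ratio comes out a little below 2, which is consistent with the statement that the spatial neighbor carries nearly twice the weight of the temporal neighbor.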

Results and Discussion
The GTWR model was estimated with the optimal parameters based on average hourly data, and the results are presented in Table 4. Seven statistics were selected to describe the distributions of the estimated coefficients.Specifically, the lower quartile (LQ) and the upper quartile (UQ) were used to indicate interquartile range.The coefficient mathematically implicates that the metro ridership during the observed period will increase by the value of this coefficient due to unit change in the corresponding variable.Thereby, the positive mean coefficient for the Residence variable suggests that the metro station with more residual communities within its service area will generally attract more ridership, which is opposite to the result of OLS in Section 5.1 but identical with previous studies [14,15].The sign of the mean Scenery is negative as same as that in OLS.A possible reason is that famous sceneries are usually located in underpopulated regions.Another counter-example is the variable of Closeness Centrality, where the signs of both mean and medium Closeness Centrality are negative.It is difficult to explain this abnormal phenomenon based on the information in Table 4.The positive mean and medium values of the variables of Parking and Degree Centrality arrive at a similar inference as the OLS model.Since the parameters of the GTWR model were locally estimated and various across spatial and temporal scales, it could offer deep insight into the spatial and temporal heterogeneities of the influence of built environment and topological factors on metro ridership by visualizing the distribution patterns of GTWR-based estimators.Understanding how station demands are affected and determined, especially for the demand during peak period, is more of a concern in metro planning work.A comparison of the coefficient distributions during morning peak hours and evening peak hours is presented in Figures 4 and 5.
It is observed that all six variables have both positive and negative effects on metro ridership for the morning and evening peaks, which differs from the fixed negative correlation of the OLS model in Table 2, as well as from the result of Zhao et al. [15], which is also a case study of Nanjing. The coefficients for Residence are negative in the core urban area, which is generally work-oriented and/or commercial-oriented, and positive at stations close to terminal stations, which are generally residential-oriented. The negative coefficients in the core urban area could be reasonable, because car ownership there is higher and commuting distances are shorter. During evening peak hours, the effect of Residence on metro ridership becomes negative for most stations in suburban areas, and the average absolute value of the negative coefficients is less than that in morning peak hours. This may be because the number of residential communities primarily affects trips departing from home, as far as boarding ridership is concerned.
Figures 4 and 5 display the spatial distribution of the Commerce coefficients during peak hours. The number of shopping malls, restaurants, retail stores, and entertainment centers has a negative association with metro ridership in the core urban area during morning peak hours, and this association becomes positive during evening peak hours. A possible explanation for this finding is that boarding trips in the morning peak are relatively concentrated on commuting, whereas ridership in the evening peak may be mixed with a large number of other trip purposes, such as dining, shopping, and entertainment. However, for the business districts in suburban areas, the opposite trend (i.e., from positive to negative) can be observed. This is not surprising, since the suburban area is relatively underdeveloped. Concerns about security and long travel distances could be possible reasons for the reduced intensity of business activity in remote areas.
Closeness Centrality has a negative effect on station demand in the central area of the metro network and tends to show a generally positive correlation with metro ridership away from the central area. The median value of the coefficients for this variable during evening peak hours is much larger than that during morning peak hours (5403.02 versus 0.34). This may result from a lifestyle in which many people live in rural areas and work in urban areas.
An obvious tidal traffic phenomenon can be observed in Figure 6. Stations at the junctures of suburban and exurban areas experience high densities of boarding passengers during morning peak hours, whereas the vast majority of crowded boarding occurs at the core urban stations during evening peak hours. This unbalanced movement can also partially explain the temporal variation between the morning peak and the evening peak.

Conclusions
This paper contributes by exploring the spatial and temporal distributions of the association between hourly boarding ridership at station level and the factors related to built environment and topological structure in Nanjing city. Two statistical techniques (OLS and GTWR) were combined to investigate this relationship. First, the global OLS model was used to select the potential variables automatically through a bidirectional stepwise procedure. The GTWR model, an extended version of the GWR model, can capture the spatial and temporal variation in the relationship between hourly ridership and the selected variables. In the GTWR model, we converted temporal distance into a spatial scale, which not only reduces the number of parameters but also gives the conversion factor a practical meaning. To improve optimization efficiency, we combined grid search with PSO to select the optimal weighting parameters.
The results of OLS based on daily ridership data suggested that six variables, including four built-environment factors (i.e., Residence, Commerce, Scenery, and Parking) and two topological-structure factors (i.e., Degree Centrality and Closeness Centrality), had a significant effect on station-level ridership. The ANOVA diagnosis demonstrated that there is significant spatial and temporal nonstationarity in the relationship between ridership and the influencing factors. The GTWR model offered a significantly better goodness-of-fit for hourly ridership data than the traditional OLS, GWR, and TWR models by producing a more complete picture of the ridership data from the perspective of both spatial and temporal scales. The results of the GTWR model suggested that residential communities primarily affect trips departing from home, as far as boarding ridership is concerned. The commercial buildings in the central city mainly attract metro ridership during evening peak hours, whereas in suburban areas they serve more metro ridership during morning peak hours. Nevertheless, the findings of GTWR are not completely different from those of OLS; for example, adequate parking lots and transfer stations could attract more ridership. Consequently, the findings of the GTWR model can not only provide more reliable and accurate estimates of metro demand, but also help planners to better understand the spatial and temporal variation in the correlation between ridership and influencing factors.
Future work can enrich the factor data related to metro ridership, such as the frequency of buses, the number of shared bikes, and traffic conditions. The POIs used in this study were counted within the same service area without considering their differing attractiveness or the decay of their influence with increasing distance. In addition, more advanced weighting kernel functions and a GTWR model that can estimate both local and global parameters deserve further investigation.

Figure 1. A map of the study area.

Sustainability 2018, 18

Figure 6. Spatial distribution of hourly ridership during peak hours.

Table 1. Potential factors and their descriptions.

Table 2. Summary and diagnosis of ordinary least squares (OLS) model coefficients. a The correlation is significant at the 0.05 level. b VIF (Variance Inflation Factor) is greater than 4.

Table 3. Analysis of variance (ANOVA) comparison between OLS and Local Regression Models.